Session: Apache Hudi 1.0 preview: A database experience on the data lake

Bhavani Sudha Saktheeswaran, Software Engineer @ Onehouse

Sagar Sumit, Software Engineer @ Onehouse

Hudi is a top-level Apache open-source project and community that is breaking the boundaries of what is possible on a data lake. This talk unveils the essence of Apache Hudi 1.0, a pivotal version that will encapsulate a ground-up reimagination of Hudi's transactional database layer while staying true to its foundational principles. Diving deep, we'll explore:

  1. State of the project: How Hudi today is a versatile data lake platform, enabling automated, near real-time data ingestion and incremental processing, integrated seamlessly with powerful frameworks such as Apache Spark, Flink, and Kafka Connect.
  2. Hudi 1.0: A look at how Hudi is architecting the foundational blocks of its database kernel. From non-blocking concurrency control and faster access methods with improved indexing and metadata to an LSM tree-style timeline for infinite time travel, Hudi is redesigning every facet to optimize data lakes at scale.
  3. The road ahead: Understanding the potential of transforming Hudi's core into a universal database experience for the lake, diving into deep query engine integrations, employing a hybrid server architecture, and expanding capabilities for complex data types, including images, videos, and formats conducive to ML/AI.

As Hudi 1.0 prepares to set a new benchmark in the world of streaming data lakes, this talk invites feedback, ideas, and collaborations to augment its scope and deliver unparalleled value to the user community. Join us to be part of this transformational journey!

Interested in learning more about Hudi 1.0?

Where & when?

Open Source Data Summit 2025 will be held on October 8th, 2025.

What is the cost of access to the live virtual sessions?

OSDS is always free and open for all to attend.

What is Open Source Data Summit?

OSDS is a peer-to-peer gathering of data industry professionals, experts, and enthusiasts to explore the dynamic landscape of open source data tools and storage.

The central theme of OSDS revolves around the advantages of open source data products and their pivotal role in modern data ecosystems.

OSDS is the annual peer hub for knowledge exchange that fosters a deeper understanding of open source options and their role in shaping the data-driven future.

Who attends OSDS?

OSDS is attended by data engineers, data architects, developers, DevOps practitioners and managers, and data leadership.

Anyone who is looking for enriched perspectives on open source data tools and practical insights to navigate the evolving data landscape should attend this event.

On October 2nd, 2024, we convened for discussions about:
  • Benefits of open source data tools
  • Cost/performance trade-offs
  • Building data storage solutions
  • Challenges surrounding open source data tool integration
  • Solutions for the cost of storing, accessing, and managing data
  • Data streams and ingestion
  • Hub-and-spoke data integration models
  • Choosing the right engine for your workload

Are you interested in speaking or sponsoring the next Open Source Data Summit?

Submit a talk proposal here or reach out to astronaut@solutionmonday.com.

Register for on-demand access to the OSDS 2024 sessions and announcements about OSDS 2025

