The streaming-first lakehouse: Handling high-frequency mutable workloads with Apache Hudi™

Shiyan Xu, Founding Team Member @ Onehouse

Handling updates and deletes in a streaming lakehouse is a major challenge: high-frequency mutable workloads can degrade performance, produce many small files, and waste resources when concurrent writers conflict and must retry. How can you build a truly streaming-first lakehouse without sacrificing performance? This session demystifies the streaming-first designs that Apache Hudi™ built to handle exactly these problems.

We'll dive into how Apache Hudi™ uses Merge-on-Read (MOR) tables to efficiently absorb frequent updates and record-level indexing to maintain low-latency writes for mutable data. Discover how auto-file sizing and asynchronous compaction proactively solve the "small file problem." We'll also cover Apache Hudi™ 1.0’s Non-Blocking Concurrency Control (NBCC) to avoid costly retries and the LSM Timeline for optimized metadata access.
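For readers who want a concrete picture before the session, the features named above map onto Hudi write configs roughly as follows. This is a minimal sketch, not the speaker's setup: the helper `hudi_mor_write_options`, the table name, and the field names are hypothetical, and the file-size values are illustrative. Non-Blocking Concurrency Control is left out because its requirements depend on the engine and index in use.

```python
def hudi_mor_write_options(table_name, record_key, precombine_field):
    """Build a dict of Hudi write options for a mutable streaming table.

    Hypothetical helper for illustration; the config values are assumptions.
    """
    return {
        "hoodie.table.name": table_name,
        # Merge-on-Read: updates land in log files and are merged later.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Record-level index kept in the metadata table for key lookups.
        "hoodie.metadata.record.index.enable": "true",
        "hoodie.index.type": "RECORD_INDEX",
        # Compact asynchronously instead of inline with each write.
        "hoodie.compact.inline": "false",
        "hoodie.datasource.compaction.async.enable": "true",
        # File sizing: files below the limit receive new inserts until
        # they approach the maximum size (illustrative values, in bytes).
        "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
        "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
    }


# Hypothetical table and column names.
options = hudi_mor_write_options("events", "event_id", "updated_at")
```

With PySpark and the Hudi bundle on the classpath, such a dict would typically be passed as `df.write.format("hudi").options(**options).mode("append").save(base_path)`, where `df` and `base_path` are placeholders.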

Key Takeaways:

  • Understand Apache Hudi™'s core designs for handling streaming mutable workloads at scale.
  • Solve challenging workloads involving high-frequency updates and large, mutable datasets.
  • Leverage Apache Hudi™'s advanced concurrency and metadata optimizations to build a stable, low-latency lakehouse.

Learn more from our co-chair sponsor, Onehouse.

Where & when?

Open Source Data Summit 2025 was held on November 13th, 2025.

What is the cost of access to the live virtual sessions?

OSDS is always free and open to all.

What is Open Source Data Summit?

OSDS is a peer-to-peer gathering of data industry professionals, experts, and enthusiasts to explore the dynamic landscape of open source data tools and storage.

The central theme of OSDS revolves around the advantages of open source data products and their pivotal role in modern data ecosystems.

OSDS is the annual peer hub for knowledge exchange, fostering a deeper understanding of open source options and their role in shaping the data-driven future.

Who attends OSDS?

OSDS is attended by data engineers, data architects, developers, DevOps practitioners and managers, and data leadership.

Anyone looking for enriched perspectives on open source data tools and practical insights to navigate the evolving data landscape should attend this event.

Example topics for Open Source Data Summit:
  • Benefits of open source data tools
  • Cost/performance trade-offs
  • Building data storage solutions
  • Challenges surrounding open source data tool integration
  • Solutions for the cost of storing, accessing, and managing data
  • Data streams and ingestion
  • Hub-and-spoke data integration models
  • Choosing the right engine for your workload

Are you interested in speaking at or sponsoring the next Open Source Data Summit?

Submit a talk proposal here or reach out to astronaut@solutionmonday.com.

That's a wrap for 2025! Enter your email address below for on-demand access to the 2025 sessions and news about the 2026 summit.