Session: Open data foundations with OneTable - Hudi, Delta, and Iceberg interoperability

Ashvin Agrawal, Senior Researcher @ Microsoft

Tim Brown, Engineering @ Onehouse

Anoop Johnson, Senior Staff Software Engineer @ Google

OneTable is a brand new open source project that unlocks omni-directional interoperability between the popular lakehouse projects Apache Hudi, Delta Lake, and Apache Iceberg. When your data is at rest in your lake, Hudi, Delta, and Iceberg are not so different. They each offer a metadata layer over a set of Parquet files. OneTable offers lightweight conversion mechanisms that can take a source metadata format and sync it into one or more target metadata formats.
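To make the idea concrete, here is a toy sketch of metadata-only conversion. It is not the OneTable API: the metadata shapes, the `TableSnapshot` class, and the `sync` function are all hypothetical and heavily simplified compared with the real Hudi, Delta, and Iceberg layouts. It only illustrates the point above, that the Parquet files stay where they are and only the metadata describing them is rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical format-neutral view: column types plus Parquet file paths."""
    schema: tuple  # (column name, type) pairs
    files: tuple   # Parquet file paths relative to the table root


# Toy readers: each parses one simplified, made-up metadata shape.
def read_hudi(meta):
    return TableSnapshot(tuple(meta["schema"].items()),
                         tuple(meta["commit"]["files"]))


def read_delta(meta):
    return TableSnapshot(tuple(meta["schema"].items()),
                         tuple(action["path"] for action in meta["add"]))


def read_iceberg(meta):
    return TableSnapshot(tuple(meta["schema"].items()),
                         tuple(meta["manifest"]))


# Toy writers: each renders the same snapshot in a target shape.
def write_hudi(snap):
    return {"schema": dict(snap.schema), "commit": {"files": list(snap.files)}}


def write_delta(snap):
    return {"schema": dict(snap.schema),
            "add": [{"path": path} for path in snap.files]}


def write_iceberg(snap):
    return {"schema": dict(snap.schema), "manifest": list(snap.files)}


READERS = {"HUDI": read_hudi, "DELTA": read_delta, "ICEBERG": read_iceberg}
WRITERS = {"HUDI": write_hudi, "DELTA": write_delta, "ICEBERG": write_iceberg}


def sync(source_format, source_meta, target_formats):
    """Read source metadata once, then emit metadata for every target format.

    No data file is read or copied; only file references are carried over.
    """
    snapshot = READERS[source_format](source_meta)
    return {fmt: WRITERS[fmt](snapshot) for fmt in target_formats}


# Example: one Hudi-style table exposed as Delta-style and Iceberg-style metadata.
hudi_meta = {
    "schema": {"id": "long", "city": "string"},
    "commit": {"files": ["part-0001.parquet", "part-0002.parquet"]},
}
targets = sync("HUDI", hudi_meta, ["DELTA", "ICEBERG"])
```

Because every reader and writer goes through the same neutral snapshot, adding a format in this sketch means adding one reader and one writer rather than a converter for every pair of formats.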

This session will feature a live demo and describe real-world applications of how to build open data foundations that can accelerate your workloads across a variety of open source query engines, including Spark, Presto, Trino, Flink, and more. We will describe the technical foundations for Hudi, Delta, and Iceberg and lay out the nuts and bolts of how OneTable seamlessly converts data between these formats. We will detail our journey to create the project, share the vision for the future, and show how you can join this new open community.

Interested in trying out OneTable or contributing to building out an interoperable lakehouse future?

Where & when?

Open Source Data Summit 2024 was held on October 2nd, 2024.

What is the cost of access to the live virtual sessions?

OSDS is always free and open for all to attend.

What is Open Source Data Summit?

OSDS is a peer-to-peer gathering of data industry professionals, experts, and enthusiasts to explore the dynamic landscape of open source data tools and storage.

The central theme of OSDS revolves around the advantages of open source data products and their pivotal role in modern data ecosystems.

OSDS is the annual peer hub for knowledge exchange that fosters a deeper understanding of open source options and their role in shaping the data-driven future.

Who attends OSDS?

OSDS is attended by data engineers, data architects, developers, DevOps practitioners and managers, and data leadership.

Anyone who is looking for enriched perspectives on open source data tools and practical insights to navigate the evolving data landscape should attend this event.

On October 2nd, 2024, we convened for discussions about:
  • Benefits of open source data tools
  • Cost/performance trade-offs
  • Building data storage solutions
  • Challenges surrounding open source data tool integration
  • Solutions for the cost of storing, accessing, and managing data
  • Data streams and ingestion
  • Hub-and-spoke data integration models
  • Choosing the right engine for your workload

Are you interested in speaking or sponsoring the next Open Source Data Summit?

Submit a talk proposal here or reach out to astronaut@solutionmonday.com.

Register for on-demand access to the OSDS 2024 sessions and announcements about OSDS 2025