Session: Scaling and governing Robinhood's data lakehouse

Balaji Varadarajan, Senior Staff Software Engineer @ Robinhood

Pritam Dey, Technical Lead @ Robinhood

Robinhood Markets’ mission is to democratize finance for all. Continuous data analysis and data-driven decision-making are fundamental to achieving this. The data required for analysis comes from varied sources: OLTP databases, event streams, and various third-party sources. Powering the many reporting and business-critical pipelines and dashboards requires a reliable lakehouse with an interoperable data ecosystem and a fast data ingestion service. Because Robinhood operates in the financial domain, effective data governance is crucial for regulatory compliance and for ensuring that data is consistent and trustworthy.

In this talk, we describe the evolution of the big data ecosystem at Robinhood, not only in terms of the scale of data stored and queries made but also in terms of the use cases it supports. We then go in depth into the lakehouse and the data ingestion services we built with open source tools to reduce the data freshness latency for our core datasets from one day to under 15 minutes. Finally, we describe our approach to data governance and compliance and how we are using open source components to implement it.

Interested in diving deeper into scaling and governing a data lakehouse?

Where & when?

Open Source Data Summit 2025 will be held on October 8th, 2025.

What is the cost of access to the live virtual sessions?

OSDS is always free and open for all to attend.

What is Open Source Data Summit?

OSDS is a peer-to-peer gathering of data industry professionals, experts, and enthusiasts to explore the dynamic landscape of open source data tools and storage.

The central theme of OSDS revolves around the advantages of open source data products and their pivotal role in modern data ecosystems.

OSDS is the annual peer hub for knowledge exchange that fosters a deeper understanding of open source options and their role in shaping the data-driven future.

Who attends OSDS?

OSDS is attended by data engineers, data architects, developers, DevOps practitioners and managers, and data leadership.

Anyone who is looking for enriched perspectives on open source data tools and practical insights to navigate the evolving data landscape should attend this event.

On October 2nd, 2024, we convened for discussions about:
  • Benefits of open source data tools
  • Cost/performance trade-offs
  • Building data storage solutions
  • Challenges surrounding open source data tool integration
  • Solutions for the cost of storing, accessing, and managing data
  • Data streams and ingestion
  • Hub-and-spoke data integration models
  • Choosing the right engine for your workload

Are you interested in speaking at or sponsoring the next Open Source Data Summit?

Submit a talk proposal here or reach out to astronaut@solutionmonday.com.

Register for on-demand access to the OSDS 2024 sessions and for announcements about OSDS 2025.
