Session: Maximizing efficiency by templating Glue jobs and serverless architecture in Hudi data lakes

Soumil Shah, Data Engineering Team Lead @ JobTarget

By combining templated AWS Glue jobs with a serverless architecture in Hudi data lakes, this project aims to simplify how organizations handle large volumes of data. We explore the challenges of data lake management and introduce a solution that uses Glue jobs to automate data ingestion, transformation, and loading tasks. The templated approach reduces the effort required to manage complex ETL processes, because new tables reuse an existing job definition rather than requiring a new one.

Additionally, the project incorporates a serverless architecture, minimizing operational overhead and costs associated with traditional infrastructure management. This approach not only streamlines data processing but also ensures scalability, fault tolerance, and high availability.

Through practical examples, source code, and architectural insights, this project equips data engineers and architects with the knowledge and tools to implement these techniques in their own data lake ecosystems. Attendees will learn how to maximize efficiency, optimize ETL workflows, and harness the full potential of Hudi data lakes, ultimately improving their organization's data management capabilities.
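As a rough illustration of what a "templated" job can mean, the sketch below maps a small set of per-table parameters onto Hudi write options, so that one job script can serve many tables. It is not the speaker's implementation. The `TableTemplate` class, its field names, and `build_hudi_options` are hypothetical names chosen for this example; only the `hoodie.*` keys are standard Hudi datasource write options.

```python
from dataclasses import dataclass

# Operations this sketch accepts; Hudi supports others as well.
ALLOWED_OPERATIONS = {"upsert", "insert", "bulk_insert"}


@dataclass(frozen=True)
class TableTemplate:
    """Hypothetical per-table parameters a templated job might accept."""
    table_name: str
    record_key: str
    precombine_field: str
    partition_field: str = ""          # empty means non-partitioned
    operation: str = "upsert"
    table_type: str = "COPY_ON_WRITE"  # or "MERGE_ON_READ"


def build_hudi_options(template: TableTemplate) -> dict:
    """Translate template parameters into Hudi datasource write options."""
    if template.operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {template.operation}")

    options = {
        "hoodie.table.name": template.table_name,
        "hoodie.datasource.write.recordkey.field": template.record_key,
        "hoodie.datasource.write.precombine.field": template.precombine_field,
        "hoodie.datasource.write.operation": template.operation,
        "hoodie.datasource.write.table.type": template.table_type,
    }
    # Only set a partition path when the table is actually partitioned.
    if template.partition_field:
        options["hoodie.datasource.write.partitionpath.field"] = template.partition_field
    return options


# Example: the same function serves two different tables.
orders = build_hudi_options(
    TableTemplate("orders", "order_id", "updated_at", partition_field="order_date")
)
customers = build_hudi_options(
    TableTemplate("customers", "customer_id", "updated_at", operation="insert")
)
```

Inside a Glue job, the template values would typically arrive as job arguments (for example via `getResolvedOptions` from `awsglue.utils`), and the resulting dictionary would be passed to a Spark write such as `df.write.format("hudi").options(**orders).mode("append").save(path)`, where `df` and `path` are supplied by the job.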

Interested in diving deeper?
Where & when?

Open Source Data Summit 2024 will be held on October 2nd, 2024.

What is the cost of access to the live virtual sessions?

OSDS is always free and open for all to attend.

What is Open Source Data Summit?

OSDS is a peer-to-peer gathering of data industry professionals, experts, and enthusiasts to explore the dynamic landscape of open source data tools and storage.

The central theme of OSDS revolves around the advantages of open source data products and their pivotal role in modern data ecosystems.

OSDS is the annual peer hub for knowledge exchange that fosters a deeper understanding of open source options and their role in shaping the data-driven future.

Who attends OSDS?

OSDS is attended by data engineers, data architects, developers, DevOps practitioners and managers, and data leadership.

Anyone who is looking for enriched perspectives on open source data tools and practical insights to navigate the evolving data landscape should attend this event.

Join us again on October 2nd, 2024 for discussions around:
  • Benefits of open source data tools
  • Cost/performance trade-offs
  • Building data storage solutions
  • Challenges surrounding open source data tool integration
  • Solutions for the cost of storing, accessing, and managing data
  • Data streams and ingestion
  • Hub-and-spoke data integration models
  • Choosing the right engine for your workload

Interested in speaking or sponsoring Open Source Data Summit 2024?

Submit a talk proposal here or reach out to astronaut@solutionmonday.com.

Don't miss out on important updates! Register for access to Open Source Data Summit 2024.
