
How to Become a Data Engineer?

Watch the video below:

YouTube - What is Data Engineering

Must-go-through free resources:

Azure Data Factory


Azure Synapse


Azure Databricks


PySpark


SQL


Python


Git


Azure Fundamentals

Advanced Spark videos with slides (a must for interviews)

  1. Making Apache Spark Better with Delta Lake [Presentation slides here]
  2. Understanding Query Plans and Spark UIs - Xiao Li (Databricks) [Presentation slides here]
  3. Optimizing Delta Parquet Data Lakes for Apache Spark - Matthew Powers [Presentation slides here]
  4. Easy, Scalable, Fault Tolerant Stream Processing with Structured Streaming in Apache Spark [Presentation slides here]
  5. Everyday I'm Shuffling - Tips for Writing Better Apache Spark Programs [Presentation slides here]
  6. Optimizing Apache Spark SQL Joins - Spark Summit East talk by Vida Ha [Presentation slides here]
  7. Apache Spark Core: Deep Dive: Proper Optimization - Daniel Tomes (Databricks) [Presentation slides here]
  8. Apache Spark 2.0: A Deep Dive Into Structured Streaming - Tathagata Das [Presentation slides here]
  9. Deep Dive: Apache Spark Memory Management [Presentation slides here]
  10. The Parquet Format and Performance Optimization Opportunities - Boudewijn Braams [Presentation slides here]
  11. Designing ETL Pipelines with Structured Streaming and Delta Lake: How to Architect Things Right [Presentation slides here]
  12. Advanced Apache Spark Training - Sameer Farooqui [Presentation slides here]
  13. Deeper Understanding of Spark Internals - Aaron Davidson [Presentation slides here]
  14. Top 5 Mistakes When Writing Spark Applications [Presentation slides here]
  15. Spark Architecture - Alexey Grishchenko [Presentation slides here]
  16. Spark SQL: A Compiler from Queries to RDDs - Sameer Agarwal [Presentation slides here]
  17. Deep Dive into Spark SQL with Advanced Performance Tuning - Xiao Li & Wenchen Fan [Presentation slides here]
  18. A Deep Dive into Spark SQL's Catalyst Optimizer - Yin Huai [Presentation slides here]
  19. Spark + Parquet In Depth - Spark Summit East talk by Emily Curtin and Robbie Strickland [Presentation slides here]
  20. Tuning and Debugging in Apache Spark - Patrick Wendell [Presentation slides here]
  21. Tuning Apache Spark for Large-Scale Workloads - Gaoxiang Liu and Sital Kedia [Presentation slides here]
  22. Understanding the Performance of Spark Applications - Patrick Wendell [Presentation slides here]
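A join strategy that comes up in the join-optimization talks above is the broadcast hash join: when one side of a join is small, it is copied to every executor and turned into a hash map, so the large side can be joined partition by partition without a shuffle. The sketch below simulates that idea in plain Python; it is not Spark code, and the function name `broadcast_hash_join` and the sample `orders`/`countries` data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_table, key):
    """Inner-join every partition of a large table against a small table.

    The small table is hashed once on the join key (the "broadcast" side),
    and each partition of the large table probes that hash map locally,
    so the large table never has to be reshuffled by key.
    """
    # Build side: hash the small table on the join key.
    build = defaultdict(list)
    for row in small_table:
        build[row[key]].append(row)

    results = []
    # Probe side: each partition is handled independently, like one executor task.
    for partition in large_partitions:
        for row in partition:
            # .get() avoids inserting empty entries into the defaultdict.
            for match in build.get(row[key], []):
                results.append({**row, **match})
    return results


# Hypothetical data: a "large" fact table split into two partitions, and a small lookup table.
orders = [
    [{"order_id": 1, "country": "IN", "amount": 100}],
    [{"order_id": 2, "country": "US", "amount": 250},
     {"order_id": 3, "country": "XX", "amount": 75}],  # no match, dropped by the inner join
]
countries = [
    {"country": "IN", "name": "India"},
    {"country": "US", "name": "United States"},
]

for joined in broadcast_hash_join(orders, countries, "country"):
    print(joined)
```

In PySpark, the same strategy can be requested explicitly with `large_df.join(broadcast(small_df), "country")`, where `broadcast` comes from `pyspark.sql.functions`; Spark may also choose it automatically when the small side is below `spark.sql.autoBroadcastJoinThreshold`.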

Fast-track Interview

WIP

Optional resources for Mechanical Engineers:

Certifications:

Advanced Reads:

Free Cloud Resources:

Data Engineering Burger


Road Map to Data Engineer


Data Warehouse vs Lake vs Mesh


Data Warehouse vs Lake vs Lakehouse vs Mesh


Cloud Platform Models


ETL vs ELT vs Reverse ETL

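A minimal sketch of the three patterns, assuming an in-memory SQLite database (Python's built-in `sqlite3`) stands in for the warehouse and a plain dict stands in for an operational tool such as a CRM. In ETL the data is cleaned in the pipeline before loading; in ELT the raw data is loaded first and transformed with SQL inside the warehouse; in reverse ETL a warehouse-computed result is pushed back out to an operational system. The table names and sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw records from a source system (ids and amounts arrive as strings).
raw_orders = [("1", "alice", "120.50"), ("2", "bob", "80.00"), ("3", "alice", "19.50")]

warehouse = sqlite3.connect(":memory:")

# --- ETL: transform in the pipeline, then load only the clean result ---
clean = [(int(i), c.title(), float(a)) for i, c, a in raw_orders]
warehouse.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", clean)

# --- ELT: load the raw data as-is, then transform inside the warehouse with SQL ---
warehouse.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)
warehouse.execute("""
    CREATE TABLE orders_elt AS
    SELECT CAST(id AS INTEGER) AS id,
           UPPER(SUBSTR(customer, 1, 1)) || SUBSTR(customer, 2) AS customer,
           CAST(amount AS REAL) AS amount
    FROM orders_raw
""")

# --- Reverse ETL: push a warehouse-computed metric back to an operational tool ---
crm = {}  # hypothetical stand-in for a CRM or marketing tool's API
for customer, total in warehouse.execute(
    "SELECT customer, SUM(amount) FROM orders_elt GROUP BY customer ORDER BY customer"
):
    crm[customer] = {"lifetime_value": total}

print(crm)  # {'Alice': {'lifetime_value': 140.0}, 'Bob': {'lifetime_value': 80.0}}
```

Both `orders_etl` and `orders_elt` end up with the same rows; the difference is only where the transformation runs.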

Star vs Snowflake Schema

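A minimal sketch of the difference, again assuming SQLite via Python's `sqlite3`. Both designs share one fact table; in the star schema the product dimension is denormalized (the category name is stored on each product row), while in the snowflake schema the category is split into its own table, so the same report needs one extra join. All table names and rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Fact table shared by both designs: one row per sale.
    CREATE TABLE fact_sales (product_id INTEGER, quantity INTEGER, revenue REAL);
    INSERT INTO fact_sales VALUES (1, 2, 20.0), (2, 1, 15.0), (3, 4, 8.0);

    -- Star schema: denormalized product dimension (category stored inline).
    CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    INSERT INTO dim_product_star VALUES
        (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Apple', 'Grocery');

    -- Snowflake schema: the same dimension normalized into product -> category.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Grocery');
    CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Notebook', 10), (3, 'Apple', 20);
""")

# Star: a single join from the fact table to the dimension.
star = db.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product_star p ON f.product_id = p.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: an extra join is needed to reach the normalized category table.
snowflake = db.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product_snow p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star)       # [('Grocery', 8.0), ('Stationery', 35.0)]
print(snowflake)  # [('Grocery', 8.0), ('Stationery', 35.0)]
```

The results are identical; the trade-off is fewer joins (star) versus less duplicated dimension data (snowflake).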

Medallion Architecture

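A minimal sketch of the three medallion layers in plain Python, assuming a list of JSON strings as the raw source: Bronze keeps records exactly as ingested, Silver parses, type-casts, validates, and deduplicates them, and Gold holds a business-level aggregate. In practice each layer is usually a table (often a Delta Lake table on Databricks); the function names `to_silver`/`to_gold` and the sample events are hypothetical.

```python
import json
from collections import defaultdict

# Bronze: raw events exactly as ingested, including a duplicate and an invalid record.
bronze = [
    '{"event_id": "e1", "user": "u1", "amount": "10.0", "ts": "2024-06-01"}',
    '{"event_id": "e2", "user": "u2", "amount": "5.5", "ts": "2024-06-01"}',
    '{"event_id": "e1", "user": "u1", "amount": "10.0", "ts": "2024-06-01"}',
    '{"event_id": "e3", "user": "u1", "amount": "not-a-number", "ts": "2024-06-02"}',
    '{"event_id": "e4", "user": "u2", "amount": "2.5", "ts": "2024-06-02"}',
]


def to_silver(raw_lines):
    """Parse JSON, cast amount to float, skip invalid rows, deduplicate on event_id."""
    seen, rows = set(), []
    for line in raw_lines:
        rec = json.loads(line)
        try:
            rec["amount"] = float(rec["amount"])
        except ValueError:
            continue  # skip records that fail validation (a real pipeline might quarantine them)
        if rec["event_id"] in seen:
            continue  # drop duplicate events
        seen.add(rec["event_id"])
        rows.append(rec)
    return rows


def to_gold(silver_rows):
    """Business-level aggregate: total amount per user."""
    totals = defaultdict(float)
    for rec in silver_rows:
        totals[rec["user"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'u1': 10.0, 'u2': 8.0}
```

Keeping Bronze untouched means Silver and Gold can always be rebuilt from the original data if the cleaning rules change.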