Data Engineering
Batch vs Streaming
Bounded vs unbounded data, MapReduce to Spark, stream processing with Flink/Kafka Streams, Lambda vs Kappa architecture, and when to choose each
Stream Processing Engines
Flink internals (checkpointing, RocksDB state, watermarks), Kafka Streams (library model, changelog-backed state), Spark Structured Streaming (micro-batch), and engine selection framework
Data Warehouse Basics
OLTP vs OLAP workloads, columnar storage and compression, star schema with fact/dimension tables, materialized views, and Snowflake vs BigQuery vs Redshift
Change Data Capture (CDC)
Reading database transaction logs, Debezium with PostgreSQL WAL and MySQL binlog, cache/search/warehouse sync, outbox pattern integration, and schema evolution