Data Pipelines
ETL, orchestration, and data pipeline patterns
Pipeline stages
- Extract — Pull from sources (APIs, DBs, files)
- Transform — Clean, join, aggregate
- Load — Write to warehouse or downstream systems
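The three stages above can be sketched as small, composable functions. This is a minimal illustration, not a production pattern; the table name `orders` and the hardcoded source rows are stand-ins for a real API, database, or file source.

```python
# Minimal ETL sketch: extract rows, transform (clean/cast), load into SQLite.
import sqlite3

def extract():
    # Stand-in for pulling from an API, DB, or file.
    return [{"id": 1, "amount": "12.50"}, {"id": 2, "amount": "7.25"}]

def transform(rows):
    # Clean: cast amounts to float; drop malformed rows.
    out = []
    for r in rows:
        try:
            out.append({"id": r["id"], "amount": float(r["amount"])})
        except (KeyError, ValueError):
            continue
    return out

def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
    )
    # INSERT OR REPLACE keys on the primary key, so re-running load is safe.
    conn.executemany(
        "INSERT OR REPLACE INTO orders (id, amount) VALUES (:id, :amount)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
```

Keeping each stage a pure function of its input makes the pipeline easy to test stage-by-stage and to re-run safely.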
Orchestration
- Scheduling — Cron, Airflow, Dagster, Prefect
- Dependencies — Task order and failure handling
- Idempotency — Safe to re-run without duplicates
- Incremental vs full — Process only new/changed data vs. a full refresh
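Idempotency and incremental processing work together: track a watermark (last sync timestamp), read only rows newer than it, and upsert so a re-run produces the same end state. A minimal sketch, assuming illustrative `src`/`dst`/`watermark` tables in SQLite:

```python
# Incremental, idempotent sync: process only rows with updated_at past the
# stored watermark; upserts make re-runs safe (no duplicates).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, updated_at TEXT, val TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE watermark (last_sync TEXT)")
conn.execute("INSERT INTO watermark VALUES ('1970-01-01')")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(1, "2024-01-01", "a"), (2, "2024-01-02", "b")])

def sync(conn):
    (last,) = conn.execute("SELECT last_sync FROM watermark").fetchone()
    rows = conn.execute(
        "SELECT id, updated_at, val FROM src WHERE updated_at > ?", (last,)
    ).fetchall()
    for rid, _, val in rows:
        # Upsert: running this twice yields the same end state.
        conn.execute("INSERT OR REPLACE INTO dst VALUES (?, ?)", (rid, val))
    if rows:
        conn.execute("UPDATE watermark SET last_sync = ?",
                     (max(ts for _, ts, _ in rows),))
    conn.commit()
    return len(rows)

first = sync(conn)   # processes 2 new rows, advances the watermark
second = sync(conn)  # re-run: 0 rows, no duplicates
```

The same shape applies whether the watermark lives in a table, an orchestrator variable, or a state file.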
Design patterns
- Medallion — Bronze (raw), silver (cleaned), gold (modeled)
- Incremental — Track last sync; only process new records
- Backfill — Reprocess historical data when logic changes
- Checkpointing — Resume from last successful state
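Checkpointing can be as simple as persisting the index of the last completed batch, so a crashed run resumes where it left off instead of starting over. A file-based sketch with illustrative names (`pipeline_ckpt.json`, `run`, `fail_at`); the write-then-rename dance keeps the checkpoint file valid even if the process dies mid-write:

```python
# Checkpointing sketch: record progress after each successful batch and
# resume from the last recorded state after a failure.
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "pipeline_ckpt.json")
if os.path.exists(CKPT):
    os.remove(CKPT)  # start this demo from a clean slate

def load_checkpoint():
    try:
        with open(CKPT) as f:
            return json.load(f)["next_batch"]
    except (FileNotFoundError, KeyError, ValueError):
        return 0

def save_checkpoint(next_batch):
    # Write-then-rename so a crash mid-write never leaves a corrupt file.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_batch": next_batch}, f)
    os.replace(tmp, CKPT)

def run(batches, fail_at=None):
    done = []
    for i in range(load_checkpoint(), len(batches)):
        if i == fail_at:
            raise RuntimeError("simulated crash")
        done.append(batches[i])    # process the batch
        save_checkpoint(i + 1)     # record success before moving on
    return done

batches = ["b0", "b1", "b2", "b3"]
try:
    run(batches, fail_at=2)        # crashes after completing b0, b1
except RuntimeError:
    pass
resumed = run(batches)             # resumes at b2, finishes b2 and b3
```

Note the ordering: the checkpoint is saved only after the batch succeeds, so a crash can at worst cause one batch to be reprocessed, which is harmless if the load step is idempotent.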