Bored Analyst

Data Pipelines

ETL, orchestration, and data pipeline patterns

Pipeline stages

  1. Extract — Pull from sources (APIs, DBs, files)
  2. Transform — Clean, join, aggregate
  3. Load — Write to warehouse or downstream systems
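The three stages above can be sketched end to end with the standard library. This is a minimal illustration, not a reference implementation: the sample CSV, the customer_totals table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a real source (API, DB, file).
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.50
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, then total per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # clean: skip incomplete records
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals

def load(conn, totals):
    """Load: write the aggregated totals to a table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(result)
```

Keeping each stage as a separate function makes it easier to test stages in isolation and to swap a source or destination without touching the transform logic.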

Orchestration

  • Scheduling — Cron, Airflow, Dagster, Prefect
  • Dependencies — Task order and failure handling
  • Idempotency — Re-running a task yields the same result, with no duplicate rows
  • Incremental vs. full — Process only new or changed data, or rebuild everything from scratch
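To make the dependency and idempotency points concrete, here is a small sketch that runs tasks in dependency order, stops on the first failure, and uses a keyed write so a re-run does not duplicate rows. It assumes Python 3.9+ for graphlib; the task names, the orders table, and the sample rows are hypothetical. Schedulers such as Airflow, Dagster, or Prefect provide this ordering and retry logic themselves, so treat this only as an illustration of the idea.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")

def extract():
    # Hypothetical source data; a real task would call an API or query a DB.
    return [(1, 10.5), (2, 7.0)]

def load(rows):
    # Idempotent: the primary key plus INSERT OR REPLACE means a re-run
    # overwrites existing rows instead of appending duplicates.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?)", rows)
    conn.commit()

# Each task maps to the set of tasks it depends on.
dag = {"extract": set(), "load": {"extract"}, "report": {"load"}}

def run_pipeline():
    results = {}
    executed = []
    for task in TopologicalSorter(dag).static_order():
        try:
            if task == "extract":
                results["rows"] = extract()
            elif task == "load":
                load(results["rows"])
            elif task == "report":
                query = "SELECT COUNT(*) FROM orders"
                results["count"] = conn.execute(query).fetchone()[0]
        except Exception as exc:
            # Failure handling: stop so downstream tasks never run on partial data.
            print(f"{task} failed: {exc}")
            break
        executed.append(task)
    return executed, results.get("count")

first = run_pipeline()
second = run_pipeline()  # re-run: same row count, no duplicates
print(first, second)
```

Because the load is keyed on order_id, the second run reports the same row count as the first, which is what makes retries after a failure safe.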

Design patterns

  • Medallion — Bronze (raw), silver (cleaned), gold (modeled)
  • Incremental — Track last sync; only process new records
  • Backfill — Reprocess historical data when logic changes
  • Checkpointing — Resume from last successful state
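The incremental, checkpointing, and backfill patterns above can be combined in one small sketch: a checkpoint file stores the last synced timestamp, each run processes only newer records, and deleting the checkpoint forces a full reprocess. The record shape, the updated_at field, and the checkpoint file name are assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical source records; "updated_at" is an assumed change-tracking field.
SOURCE = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00"},
]

def read_checkpoint(path):
    """Return the last synced timestamp, or "" if no run has succeeded yet."""
    if not os.path.exists(path):
        return ""
    with open(path) as f:
        return json.load(f)["last_sync"]

def write_checkpoint(path, last_sync):
    """Persist the watermark by writing a temp file, then renaming over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_sync": last_sync}, f)
    os.replace(tmp, path)

def sync(path, source):
    """Process only records newer than the checkpoint, advancing it as we go."""
    last_sync = read_checkpoint(path)
    new = [r for r in source if r["updated_at"] > last_sync]
    processed = []
    for record in sorted(new, key=lambda r: r["updated_at"]):
        processed.append(record["id"])  # stand-in for real processing
        write_checkpoint(path, record["updated_at"])
    return processed

checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = sync(checkpoint, SOURCE[:2])  # initial run sees two records
second = sync(checkpoint, SOURCE)     # next run picks up only the new one
third = sync(checkpoint, SOURCE)      # nothing new to process

os.remove(checkpoint)                 # backfill: reset the watermark
backfill = sync(checkpoint, SOURCE)   # reprocesses all history
print(first, second, third, backfill)
```

Two caveats on this sketch: the strict greater-than comparison can skip records that share a timestamp with the checkpoint, and comparing timestamps as strings only works when every value uses the same ISO 8601 format.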
