Intro

https://newsletter.pragmaticengineer.com/p/what-is-data-engineering-part-1

Broad list of resources

https://github.com/DataExpert-io/data-engineer-handbook

Extensive notes

https://www.ssp.sh/brain/data-engineering/

Data mesh, data as product, modern data engineering

  • data as product with clear metadata (freshness, origin, schema)
  • domain team owns the data product
  • self-serve data platform (infra) provides storage and a query engine
  • federated governance sets policies on access, security, documentation
  • enabling team provides consulting, best practices, examples
  • analytics
    • ingest
    • raw vs events & entities
    • use of external data products
    • aggregations
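
A minimal sketch of what "data as product with clear metadata" could look like in code, assuming a plain Python descriptor. The class name DataProduct, its fields, and the example values (orders_daily, checkout, the 24-hour freshness promise) are hypothetical and are not taken from the linked site or any data mesh tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product owned by a domain team."""

    name: str
    owner_team: str          # domain team that owns the product
    origin: str              # upstream source the data comes from
    schema: Dict[str, str]   # column name -> type name
    last_updated: datetime   # when the data was last refreshed
    max_age: timedelta       # freshness promise made to consumers

    def is_fresh(self, now: Optional[datetime] = None) -> bool:
        """Return True if the last refresh is within the promised max_age."""
        now = now or datetime.now(timezone.utc)
        return now - self.last_updated <= self.max_age

    def missing_columns(self, required: Dict[str, str]) -> List[str]:
        """Return required columns that are absent or have a different type."""
        return sorted(
            col for col, typ in required.items() if self.schema.get(col) != typ
        )


# Example values are made up for illustration.
orders = DataProduct(
    name="orders_daily",
    owner_team="checkout",
    origin="orders service database",
    schema={"order_id": "string", "amount": "decimal", "created_at": "timestamp"},
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    max_age=timedelta(hours=24),
)
```

A consumer of an external data product could call is_fresh and missing_columns before building aggregations on top of it, so that stale or incompatible inputs fail early.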

https://www.datamesh-architecture.com/

Best practices

Versioning data and models

https://news.ycombinator.com/item?id=37694701

Useful tools

lakeFS - Data versioning

https://docs.lakefs.io/

dbt - SQL-based data transformation

DVC - Data versioning and experiment tracking

Airflow - Workflow orchestration

Alternatives:

  • Metaflow
  • Prefect

See also