Intro
https://newsletter.pragmaticengineer.com/p/what-is-data-engineering-part-1
Broad list of resources
https://github.com/DataExpert-io/data-engineer-handbook
Extensive notes
https://www.ssp.sh/brain/data-engineering/
Data mesh, data as product, modern data engineering
- data as product with clear metadata (freshness, origin, schema)
- domain team owns data product
- infra / data platform provides storage and query engine
- federated policies on access, security, documentation
- enabling team provides consulting, best practices, examples
- analytics
- ingest
- raw vs events & entities
- use of external data products
- aggregations
https://www.datamesh-architecture.com/
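Sketch of the "clear metadata" point above: a data product can carry its own descriptor with freshness, origin and schema. This is only an illustration; the class `DataProductMetadata`, its field names and the `orders` example are hypothetical and not taken from the linked site.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataProductMetadata:
    """Hypothetical descriptor for a data product; field names are illustrative."""
    name: str
    owner_team: str            # domain team that owns the product
    origin: str                # upstream source system
    schema: dict[str, str]     # column name -> type name
    last_updated: datetime     # timezone-aware timestamp of the last refresh
    max_staleness: timedelta   # freshness promise made to consumers

    def is_fresh(self, now: datetime) -> bool:
        """True if the product was refreshed within its freshness promise."""
        return now - self.last_updated <= self.max_staleness


# Example values are made up for illustration.
orders = DataProductMetadata(
    name="orders",
    owner_team="checkout",
    origin="orders-service database",
    schema={"order_id": "string", "amount": "decimal", "created_at": "timestamp"},
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    max_staleness=timedelta(hours=24),
)
```

A consumer could call `orders.is_fresh(datetime.now(timezone.utc))` before reading, and a federated policy could require that every product publishes such a descriptor.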
Best practices
Versioning data and models
https://news.ycombinator.com/item?id=37694701
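One common way to version data and models is to identify each snapshot by a hash of its content, so a short version id can be recorded next to the code that produced or consumed it. A minimal sketch of that idea in plain Python; `content_version` and `VersionStore` are hypothetical names, and this is not the API or storage format of any specific tool.

```python
import hashlib


def content_version(data: bytes) -> str:
    """Derive a version id from the bytes themselves (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class VersionStore:
    """Hypothetical in-memory store; a real tool would persist objects
    to disk or object storage instead of a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def commit(self, data: bytes) -> str:
        """Store a snapshot and return its content-derived version id."""
        version = content_version(data)
        self._objects[version] = data  # identical content maps to the same key
        return version

    def checkout(self, version: str) -> bytes:
        """Return the exact bytes that were committed under this version id."""
        return self._objects[version]


store = VersionStore()
v1 = store.commit(b"id,amount\n1,10\n")
v2 = store.commit(b"id,amount\n1,10\n2,25\n")
```

Because the id depends only on the content, committing the same bytes twice yields the same version, and any change to the data yields a new one.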
Useful tools
lakeFS - Data versioning
dbt
DVC - Data versioning and experiment tracking
Airflow
Alternatives:
- Metaflow
- Prefect
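Airflow and the alternatives listed above are workflow orchestrators: a pipeline is described as tasks plus dependencies, and each task runs only after the tasks it depends on. A minimal sketch of that core idea using only the Python standard library; it does not use the API of Airflow, Metaflow or Prefect, and the task names and `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def run_pipeline(deps, tasks):
    """Run each task after all of its dependencies; return the execution order."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order


# Stand-in tasks that only record that they ran.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
order = run_pipeline(dependencies, tasks)
```

Real orchestrators add scheduling, retries, logging and distributed execution on top of this ordering step.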