- NOAA dataset
Parquet format
- experiment with pyarrow parquet
- https://blog.datasyndrome.com/python-and-parquet-performance-e71da65269ce
Data I/O
- experiment with awswrangler
- https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/019%20-%20Athena%20Cache.ipynb
Local pipeline with Metaflow
- metaflow
- get from API
- save as Parquet files
Pipeline
Local
- practical example for investment assistant
Scheduling
https://docs.prefect.io/core/concepts/schedules.html#overview
Running on Fargate
https://docs.prefect.io/orchestration/agents/ecs.html#flow-configuration
Running on EC2
Scheduled tasks on AWS Fargate
- time-based, cron-like, and event-based scheduling
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.htm
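For the time-based and cron-like cases, ECS scheduled tasks are driven by EventBridge rules that take a `rate(...)` or a six-field `cron(...)` schedule expression. The sketch below only builds those strings and the keyword arguments for a rule. The helper names and the rule name are hypothetical, and nothing here calls AWS. Passing the resulting dict to boto3's `put_rule` is an assumed next step, not something shown.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate expression such as 'rate(5 minutes)'."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour' or 'day'")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def cron_expression(
    minute: str = "0",
    hour: str = "*",
    day_of_month: str = "*",
    month: str = "*",
    day_of_week: str = "?",
    year: str = "*",
) -> str:
    """Build a six-field cron expression such as 'cron(0 6 * * ? *)'."""
    # Day-of-month and day-of-week cannot both carry a value or '*';
    # one of the two has to be '?'.
    if day_of_month != "?" and day_of_week != "?":
        raise ValueError("one of day_of_month / day_of_week must be '?'")
    fields = (minute, hour, day_of_month, month, day_of_week, year)
    return "cron(" + " ".join(fields) + ")"


def rule_params(name: str, schedule: str) -> dict:
    """Keyword arguments for an EventBridge rule (hypothetical helper).

    Assumed usage: boto3.client("events").put_rule(**rule_params(...)).
    """
    return {"Name": name, "ScheduleExpression": schedule, "State": "ENABLED"}


# Every day at 06:00 UTC; the rule name is made up for illustration.
daily = rule_params("daily-noaa-ingest", cron_expression(minute="0", hour="6"))
print(daily["ScheduleExpression"])
print(rate_expression(5, "minute"))
```

Event-based scheduling uses an event pattern on the rule instead of a schedule expression and is not covered by this sketch.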