A Do-It-Yourself ETL Framework in Python

The diagram below outlines the process:

[Diagram: pipeline routing based on data source]

The same overall pipeline object is responsible for all data sources, but it pulls from a bank of different functions depending on the data source, so that it reads from the right source. The load and stage processes are identical across all pipeline sources.

The Monitoring Front End

Originally, I added a few Flask RESTful endpoints just to make testing the application easier. Over time, I ended up adding a UI to start and stop pipeline runs, which became a real-ish monitoring tool. Of all the skills that I dabble in but have no real expertise in, front end applications are the weakest. This is my proudest UI accomplishment:

[Screenshot: the monitoring UI]

The front end is built in Flask, which I, as a total front end scrub, love working in. The endpoints are also managed through the Flask application.

The Deployment Framework

With Python applications, I’ve always had a pretty significant gap between local and production.

Enter containers.

This was my first attempt at using containers as a deployment mechanism instead of just as a local convenience for a database or whatever else. There’s a whole post on containerized Python upcoming, but the basic outline is:

- Dockerfiles for the main Python web app, RabbitMQ, and nginx
- Docker Compose to organize those 3 containers
- AWS ECS/ECR to manage deployment

It worked like a dream, and I’m never going back to any other method for deploying Python code unless I absolutely have to.

The Invocation Framework

With everything deployed in ECS, the only thing left to do was invoke pipelines on a regular basis. This ended up being a little more complex than expected. We ended up using a combination of CloudWatch, Flask, and SQS. The details aren’t particularly interesting. The core concept was to use CloudWatch to trigger posts to the pipeline endpoints at particular intervals. All in all, it works great.

Wrap

We ended up diving into a lot more tools and services to get this pipeline running than we expected.
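To make the routing idea from the top of this post a bit more concrete, here is a minimal sketch of one pipeline object that picks its read function from a bank of functions keyed by data source, while stage and load stay the same for every source. This is not the actual framework code: the names `READERS`, `Pipeline`, `read_csv_source`, and `read_api_source` are hypothetical, the readers return placeholder records, and the read, stage, load ordering is an assumption.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def read_csv_source(location: str) -> Iterable[Record]:
    # Hypothetical reader: a real one would open and parse the file.
    yield {"source": "csv", "location": location}


def read_api_source(location: str) -> Iterable[Record]:
    # Hypothetical reader: a real one would call the remote API.
    yield {"source": "api", "location": location}


# The "bank of functions": one read function per data source type.
READERS: Dict[str, Callable[[str], Iterable[Record]]] = {
    "csv": read_csv_source,
    "api": read_api_source,
}


class Pipeline:
    """One pipeline class for every source; only the read step varies."""

    def __init__(self, source_type: str, location: str) -> None:
        try:
            self._read = READERS[source_type]
        except KeyError:
            raise ValueError(
                f"no reader registered for source type {source_type!r}"
            ) from None
        self.location = location
        self.loaded: List[Record] = []

    def stage(self, records: Iterable[Record]) -> List[Record]:
        # Identical for all sources: here we just mark each record as staged.
        return [dict(record, staged=True) for record in records]

    def load(self, records: List[Record]) -> int:
        # Identical for all sources: here "loading" appends to an in-memory list.
        self.loaded.extend(records)
        return len(records)

    def run(self) -> int:
        # Assumed ordering: read from the routed source, then stage, then load.
        return self.load(self.stage(self._read(self.location)))


if __name__ == "__main__":
    pipeline = Pipeline("csv", "data/orders.csv")
    print(pipeline.run(), "record(s) loaded")
```

Adding a new data source in this sketch means writing one more read function and registering it in `READERS`; the `Pipeline` class itself does not change.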