Airflow 2

As part of the GCP Supportability Programme, we believe that migrating to Airflow 2 with Google Composer is the best way to improve our data orchestration capabilities. This tool will provide us with the features and flexibility we need to manage our data pipelines more effectively.

Problem Description

Currently, we use Airflow 1 with Google Composer to orchestrate our data pipelines. However, this setup is showing its age and is mainly used as a cron scheduler. We want to upgrade most of our DAGs, and we would rather perform those upgrades on an upgraded platform.

Additionally, the tool lacks proper alerting, which can make it difficult to identify and troubleshoot problems.

Another reason to consider an upgrade of our current Airflow is that both Airflow 1 and Composer 1 are no longer maintained.

Solution

We propose migrating to Airflow 2 with Google Composer. Airflow 2 is a major upgrade over Airflow 1 and brings a number of new features and improvements, including:

  • Improved monitoring capabilities that make it easier to track the health of our data pipelines.
  • Better Airflow infrastructure, such as auto-scaling, which can help us improve the performance and reliability of our pipelines.

On top of that we want to improve our processes and build better tooling on top of Airflow so that:

  • We have a proper alerting module that sends notifications when DAGs fail or encounter errors.
  • We have better methods for error recovery and data backfills, customized to our specific needs and requirements.
  • We have more flexibility to perform data backfills.

Integrations

Secret Manager

A feature that Composer offers out of the box is integration with Secret Manager. Secret Manager is where we currently host all our secrets, so we will be able to keep all our sensitive information in a single place. More information about this integration can be found here.
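As a rough illustration of how this integration resolves names: with the Secret Manager backend from the Airflow Google provider (`CloudSecretManagerBackend`), Airflow looks up connections and variables as secrets whose IDs are built from a prefix, a separator, and the connection or variable name. The sketch below mirrors that naming using the provider's documented defaults (`airflow-connections`, `airflow-variables`, `-`). The helper `secret_id` and the example names are hypothetical, and our actual prefixes may differ.

```python
# Illustrative only: mirrors the secret naming convention of the
# Secret Manager backend. It does not call Secret Manager or Airflow.

# Provider defaults (assumed unchanged in our environment).
CONNECTIONS_PREFIX = "airflow-connections"
VARIABLES_PREFIX = "airflow-variables"
SEP = "-"


def secret_id(kind: str, name: str) -> str:
    """Return the secret ID that would be looked up for a connection or variable."""
    prefixes = {"connection": CONNECTIONS_PREFIX, "variable": VARIABLES_PREFIX}
    if kind not in prefixes:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{prefixes[kind]}{SEP}{name}"


if __name__ == "__main__":
    # Hypothetical connection and variable names.
    print(secret_id("connection", "my_postgres"))
    print(secret_id("variable", "alert_channel"))
```

Under these assumptions, a DAG that references the connection `my_postgres` only needs a secret with the matching ID to exist in Secret Manager; no credentials have to be stored in the Airflow metadata database.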

Slack/Jira (Alerting)

We have created a custom integration that connects Airflow 2 with Slack and Jira. It was built for alerting: when a DAG fails, the relevant people are notified through Slack and a Jira ticket is created automatically to track the issue or error.

You can find more information about this integration here.
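To make the alerting flow more concrete, here is a minimal sketch of what a failure callback could look like. It is not the actual integration: the function names, message format, and Jira fields are hypothetical. The only Airflow behavior assumed is that a function passed as `on_failure_callback` receives a context dictionary containing `task_instance` and `exception`. Sending the payloads to Slack and Jira is deliberately left out.

```python
from types import SimpleNamespace


def build_failure_alert(context: dict) -> dict:
    """Build Slack and Jira payloads from a failure-callback context."""
    ti = context["task_instance"]
    error = str(context.get("exception", "unknown error"))
    summary = f"Airflow task failed: {ti.dag_id}.{ti.task_id}"
    details = f"Error: {error}\nLogs: {ti.log_url}"
    return {
        # Hypothetical Slack message body.
        "slack": {"text": f"{summary}\n{details}"},
        # Hypothetical Jira issue fields.
        "jira": {"summary": summary, "description": details, "issuetype": "Bug"},
    }


def on_failure(context: dict) -> None:
    """Hypothetical callback: a real one would post to Slack and create the Jira issue."""
    alert = build_failure_alert(context)
    print(alert["slack"]["text"])


if __name__ == "__main__":
    # Stand-in for the context Airflow would pass on failure.
    fake_context = {
        "task_instance": SimpleNamespace(
            dag_id="example_dag", task_id="load", log_url="http://example.invalid/log"
        ),
        "exception": ValueError("boom"),
    }
    on_failure(fake_context)
```

Keeping payload construction separate from delivery makes the message format easy to unit test without network access.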

Alternatives

A possible alternative to Airflow 2 is Prefect. Prefect is a newer tool that is similar to Airflow, but it has some different features and capabilities. However, Prefect is still quite new, and it has a smaller community of users than Airflow.

Another alternative would be to use Airflow without Composer, hosting and maintaining our own Airflow instance/service. This would give us slightly more flexibility, but the gain is marginal since GCP Composer already gives us a lot of flexibility. We would also lose the Google integrations that come with Composer, such as the connection to Secret Manager, and most importantly we would have to spend resources on managing Airflow.

Service Ownership

Service ownership can be divided into two sides: Airflow as infrastructure, and the processes running inside Airflow.

The infrastructure side will be owned and maintained by the current DIO team. For now, any processes/DAGs running inside Airflow will also be owned and maintained by the DIO team. That said, this last part can change: in the future we could open Airflow to other teams, enabling them to create their own DAGs and leverage the capabilities of Airflow.

Caveats

N/A

Security Impact

Security Model

Airflow 2 already gives us some pre-configured roles that we will leverage:

  • Viewer
      • Default role of every user.
      • Can view all DAGs; cannot perform any action.
  • User
      • Given to people who need to be able to control DAGs.
      • Main permissions added over the previous role: ability to perform re-executions and change the status of tasks and DAGs.
  • Op
      • Given to full owners of Airflow 2, currently the DIO team.
      • Main permissions added over the previous role: ability to configure variables, connections and pools.
  • Admin
      • Restricted to a very small number of people: only the Airflow Champions (the people in charge of Airflow).
      • Main permission: ability to manage access.

For the time being, no other roles will be created. This might change if we want to create custom roles so that, for example, a team can only access and manage the specific set of DAGs it owns.
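The roles above are cumulative: each one keeps the permissions of the previous role and adds its own. The following sketch models only that layering as described in this document. The permission labels are hypothetical shorthand, not Airflow's actual RBAC permission names, and the code does not interact with Airflow.

```python
# Roles ordered from least to most privileged, as listed above.
ROLE_ORDER = ["Viewer", "User", "Op", "Admin"]

# Permissions each role adds over the previous one (labels are hypothetical).
ADDED_PERMISSIONS = {
    "Viewer": {"view_dags"},
    "User": {"rerun_dags", "change_task_and_dag_status"},
    "Op": {"configure_variables", "configure_connections", "configure_pools"},
    "Admin": {"manage_access"},
}


def effective_permissions(role: str) -> set:
    """Return every permission a role holds, including inherited ones."""
    if role not in ROLE_ORDER:
        raise ValueError(f"unknown role: {role!r}")
    permissions = set()
    # Accumulate permissions from Viewer up to and including the given role.
    for name in ROLE_ORDER[: ROLE_ORDER.index(role) + 1]:
        permissions |= ADDED_PERMISSIONS[name]
    return permissions


if __name__ == "__main__":
    for role in ROLE_ORDER:
        print(role, sorted(effective_permissions(role)))
```

A custom per-team role, if we ever add one, would not fit this linear ordering, which is one reason it is treated as a separate future decision.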

Security Access

For security access, we will leverage the procedure we already have set up with security for all GCP resources, through the service desk.

You can find more information about this here. (Keep in mind that, at the time of writing, this doc is still a draft.)

Performance Impact

N/A

Data Contracts

N/A

Data Sources

N/A

Deployment

Currently, our deployment is a simple Cloud Build pipeline: when a PR is merged to master, all relevant folders are rsynced to the Airflow storage bucket.

In the future we want a better CI/CD pipeline, mainly to integrate the staging Airflow environment. This environment is already created and set up, but it is not yet integrated into the CI/CD pipeline.
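The current merge-to-master sync can be pictured as one `gsutil rsync` invocation per folder. The sketch below only builds those command lines and does not run them. The bucket name and folder list are hypothetical, and the real Cloud Build step may use different flags or tooling.

```python
def rsync_commands(bucket: str, folders: list) -> list:
    """Return one `gsutil rsync` argument list per folder to sync into the bucket."""
    return [
        # -m runs operations in parallel; -r recurses into subdirectories.
        ["gsutil", "-m", "rsync", "-r", folder, f"gs://{bucket}/{folder}"]
        for folder in folders
    ]


if __name__ == "__main__":
    # Hypothetical bucket and folders; the real values live in the pipeline config.
    for command in rsync_commands("example-composer-bucket", ["dags", "plugins"]):
        print(" ".join(command))
```

Integrating staging would then mostly be a matter of running the same sync against the staging bucket first and promoting to production afterwards.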

Dependencies

N/A

Future improvements

As Airflow 2 matures, we will continue to evaluate new features and improvements that can be implemented. Some possible future improvements include:

  • The ability to integrate with other data tools (e.g. Monte Carlo).
  • Improve the current deployment method for DAGs - integrate the staging environment.