Kafka Connect for development environment in Kubernetes
Configure Kafka Connect and Debezium connectors so that database changes are streamed by default in the Kubernetes development environment (ED8K), using the same configuration as in the staging and production environments.
Problem Description
As adoption of the Ebury 2.0 architecture and Kubernetes deployments progresses, developers will need to test how their services integrate with events streamed from database changes, following the CDC (change data capture) pattern. In addition, it would be useful to test new Debezium connectors and configurations without having to apply them to the staging infrastructure.
Background
ED8K
ED8K is a multitenant development environment for Kubernetes where developers can deploy replicas of the Ebury platform. It includes add-ons that simulate parts of the actual production infrastructure in a more cost-efficient way, so Amazon services such as RDS or MSK are replaced by Kubernetes resources.
Kafka in ED8K
Kafka is available by default in every ED8K development workspace, following the guidelines in the Kafka for ED8K blueprint. How topics are created has not been agreed yet.
Databases in ED8K
There are two ways to have a database in ED8K workspaces:
- Service deployment specifications (Helm charts) are free to create ad-hoc PostgreSQL services as part of the deployment, if configured to do so through the chart's values. The recommended way is to include an ed8kMode value in the Helm chart and implement logic in the Helm template that creates test resources when working in that mode.
- By default, a PostgreSQL service is also provided in each workspace, so any service can create databases there as part of its deployment, again governed by whatever the values define for working in ed8kMode.
Connectors configuration
Debezium currently runs in staging and production as ECS tasks, which run a modified version of the Kafka Connect container images with the Debezium plugin installed. The connectors are configured with the Terraform provider for Kafka Connect, and the resources are defined directly in terraform-backoffice together with the infrastructure.
This approach has several caveats:
- The lifecycle of the connector configuration is coupled with the lifecycle of the underlying infrastructure.
- It is not possible to use or test the configurations in other environments (like EBOX or ED8K workspaces) using the same code as in staging and production.
- The Terraform provider for Kafka Connect is not an official provider.
Current approach
The current approach for testing integration with Kafka Connect is to write synthetic events and stream them from specific local scripts with kafkacat.
Solution
Extract the configuration in the terraform-backoffice repository into a Terraform module, and use this module both in staging/production and in ED8K workspaces.
Terraform modules, in general, can be used for two purposes:
- Wrap resources, defining an opinionated way of creating them in terms of naming, tags, etc. In this sense, modules act like libraries, providing building blocks for environments.
- Define whole submodules of an infrastructure. For instance, a module for networking, a different one for clusters, etc.
We will create two modules, one for generic connectors, and a different one actually defining the specific connectors. Both will be versioned, so it is always possible to apply the latest version of connectors in Ebury platform to a given environment. The parameters for this second module will be just the provider configuration. In this way, the same connectors configuration could be deployed in any environment.
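As a rough illustration of this split, the two modules might look like the sketch below. Everything named here is an assumption: this document does not say which community provider is used, so the sketch assumes Mongey/kafka-connect and its kafka-connect_connector resource. The module paths, the orders-cdc connector and its settings are hypothetical placeholders.

```hcl
# --- modules/connector/main.tf (hypothetical generic module) ---
terraform {
  required_providers {
    # Assumption: the community Mongey/kafka-connect provider.
    kafka-connect = {
      source = "Mongey/kafka-connect"
    }
  }
}

variable "name" {
  type        = string
  description = "Connector name as registered in Kafka Connect."
}

variable "config" {
  type        = map(string)
  description = "Connector-specific settings merged over the shared defaults."
}

resource "kafka-connect_connector" "this" {
  name = var.name

  # Opinionated defaults live here; callers only supply what differs.
  config = merge(
    {
      "name"            = var.name
      "connector.class" = "io.debezium.connector.postgresql.PostgresConnector"
      "plugin.name"     = "pgoutput"
    },
    var.config,
  )
}

# --- modules/ebury-connectors/main.tf (hypothetical specific module) ---
# Declares no input variables: it relies on the provider configured by the
# caller. A required_providers block like the one above is omitted for brevity.
module "orders_cdc" {
  source = "../connector"

  name = "orders-cdc" # hypothetical connector
  config = {
    "database.hostname"  = "postgres" # placeholder
    "database.dbname"    = "orders"   # placeholder
    "table.include.list" = "public.orders"
    # Credentials are deliberately left out of this sketch.
  }
}
```

Keeping the defaults in the generic module means a new connector only adds one short module block to the specific module.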
Code in terraform-backoffice related to Kafka connectors will be replaced with a reference to the connectors module, and the state file will be updated accordingly using Terraform state management tools (e.g. terraform import, terraform state rm).
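One possible shape for that replacement is sketched below. It swaps in moved blocks for the terraform import and terraform state rm commands, since a moved block records the same address change declaratively. This assumes a recent Terraform release (moving a resource into a module from a separate package needs Terraform 1.3 or later). The repository URL, the tag and both resource addresses are hypothetical.

```hcl
# terraform-backoffice: connectors are now defined by the shared module.
module "connectors" {
  # Hypothetical repository and tag.
  source = "git::https://github.com/example-org/terraform-kafka-connectors.git?ref=v1"
}

# Tells Terraform that the existing connector was relocated rather than
# destroyed and recreated. Both addresses are hypothetical.
moved {
  from = kafka-connect_connector.orders_cdc
  to   = module.connectors.module.orders_cdc.kafka-connect_connector.this
}
```

With statements like this in place, terraform plan should report the connectors as moved, with no resources to add or destroy, which is a convenient check before applying in production.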
A post-install hook will be included in the ED8K Helm chart, with a single-shot task running terraform apply for the connectors module. Because these environments are ephemeral, state management is not needed. Examples of this technique can be found at
https://github.com/Ebury/ed8k-example-service-v2/blob/master/docs/03_vault.md
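The root configuration applied by such a hook could be as small as the sketch below. The provider source, the in-cluster service address and the module repository are assumptions made for illustration, not the actual ED8K setup.

```hcl
# Root configuration applied by the post-install task in an ED8K workspace.
# There is no backend block: state stays on the task's local disk and is
# discarded together with it.
terraform {
  required_providers {
    kafka-connect = {
      source = "Mongey/kafka-connect" # assumption: community provider
    }
  }
}

provider "kafka-connect" {
  # Hypothetical in-cluster address of the workspace's Kafka Connect REST API.
  url = "http://kafka-connect:8083"
}

module "connectors" {
  # Hypothetical repository; "v1" stands for a tag that moves with releases.
  source = "git::https://github.com/example-org/terraform-kafka-connectors.git?ref=v1"
}
```

The task would then run terraform init followed by terraform apply -auto-approve against this directory. Only the provider block differs from the staging/production configuration.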
Alternatives
- Define connectors as Kubernetes CRDs provided by the Strimzi operator.
  Although Strimzi is intended for managing a complete Kafka deployment in Kubernetes, the operator in charge of managing Debezium connectors could perhaps be used standalone.
  A clear advantage is that connectors could also be defined by the services that provide the database schema, although most of those services are outside Kubernetes at the moment.
  This alternative would be an ideal solution in the long term, but it would mean a significant change in how connectors are currently defined in staging/production, delaying the rollout of Kafka Connect support in the development environment and affecting current operations in production.
- Use CRDs only for the development environment. That would defeat the purpose of keeping the development environment up to date with production changes.
- Duplicate the Terraform code currently in terraform-backoffice and include it in the ED8K Helm chart, or in the services' Helm charts when working in ed8kMode. Again, the configuration would quite likely drift soon.
Caveats
- Connector deployment and definition would still be the responsibility of Platform teams. Although development teams could add their own connectors through pull requests to the Terraform repositories, this has been seen as an impediment or blocker in the past for other resources.
- A significant refactor of the current Kafka Connect definition would be needed, including changes to the Terraform state file, which are always troublesome.
Operation
The new Terraform modules will be applied as part of the terraform-backoffice infrastructure, and also as part of the Helm charts deployed by default in each ED8K workspace. In both places we will use floating tags, so changes in the connectors module will be propagated automatically unless they are backwards-incompatible, which should be avoided.
Security Impact
As the changes to staging/production will be deployed through the same infrastructure pipelines, they have no security impact.
Performance Impact
N/A
Developer Impact
Developers will be able to test the CDC pattern in a setup closer to production.
Development teams will need to be involved in connector definitions.
Data Consumer Impact
The changes must not alter the configuration actually running in production. They only change how that configuration is defined; the resources existing in AWS and in Kafka Connect must remain the same as now.
Deployment
Changes shall be applied first in the environment running in the development account, and then in production. No other changes should be in progress while the Terraform state file is being adapted to the new module structure.
Dependencies
Changes to the production/staging infrastructure are needed in order to provide Kafka Connect configuration modules that can be deployed anywhere.
References
N/A