WIP: Testing environment for BOS Data Migration EMP <-> Core
A testing environment will be needed during the development phase of the EMP<->Core migration.
This document is a work in progress. Feel free to comment and add new information as needed.
Problem Description
Currently, there is no environment where the solutions proposed by this RFC can be developed and tested.
Background
The testing environment needs to have access to both EMP and Core databases, along with a proper connection with Kafka and its components in order to generate events and serve as a "playground" for the developers.
The current testing environment for BOS does not meet these requirements, as it only has the Core database.
The current staging environments must not be touched: this new solution and environment are experimental, and they must not disrupt or impact any project that is already in progress.
The developers must also have control over the environment, so that it can be wiped clean and rebuilt as needed without any dependencies.
This testing environment does not need to be up and running on weekends and holidays.
Solution
The proposed environment follows:
Diagram.
Notes:
- Both databases (EMP RDS and Migration RDS) need to be obfuscated copies of the production databases, so the developers can get an idea of their size and of how long operations would take in real-life scenarios;
- The developers must have credentials to access both databases;
- The developers must have credentials to access Kafka topics (list topics and read them) through the web-ui;
- The testing by the developers will use both fully reconciled and partially reconciled data, so different scenarios can be covered by the tests;
More details on the solution can be found in this document provided by the Devops team, managed through the SRE-1287 epic.
To save costs, we are going to use the same Kafka cluster (MSK) we already have in place today, prefixing the topics related to the EMP RDS database with emp-devel-events-bos, as described in the document above.
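Because the cluster is shared, any tooling for this environment has to tell the EMP test topics apart from everything else on the cluster. The sketch below shows one way to do that by filtering on the prefix. The function name and the sample topic names are hypothetical and only serve as illustration; the actual topic layout is the one defined in the Devops document.

```python
EMP_TOPIC_PREFIX = "emp-devel-events-bos"  # prefix named in this RFC


def emp_topics(all_topics, prefix=EMP_TOPIC_PREFIX):
    """Return the sorted topic names that start with the EMP test prefix."""
    return sorted(name for name in all_topics if name.startswith(prefix))


if __name__ == "__main__":
    # Hypothetical topic names, for illustration only.
    sample = [
        "emp-devel-events-bos.public.accounts",
        "core-events.public.payments",
        "emp-devel-events-bos.public.clients",
    ]
    for topic in emp_topics(sample):
        print(topic)
```

Matching on the prefix alone keeps the filter independent of how many tables the connector reads.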
Delivery
The delivery can be done in phases.
- Phase 1
  - Environment as described on the diagram above;
  - CORE DB and EMP DB obfuscated, bearing in mind that the EMP obfuscation has to be done manually;
  - Kafka and Debezium connector reading events from all tables in the EMP database;
- Phase 2
  - A copy of the obfuscated EMP DB made available to the developers, either as a volume or by another means, so they can also run the EMP database locally;
  - Access to the Terraform documents for the developers, so they can include or exclude any table from the Debezium connector through a pull request;
  - In this phase, any data migration on the EMP database will be done manually by the developers as needed;
- Phase 3
  - Automatic deployment of new versions of BOS code/migrations to the EMP RDS as well, the same way it happens today with the "DEMO" environments, so the developers do not need to apply migrations manually;
  - Automation tools to restore both CORE and EMP DBs, applying obfuscation;
  - Automation tool to wipe out all Kafka topics from the EMP connector and start from scratch;
- Phase 4
  - Automation tools to take the environment down on Friday nights and bring it up again on Monday mornings.
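The Phase 4 schedule can be sketched as a small decision function that a scheduler could call before starting or stopping the environment. This is only an illustration: the cut-over times (20:00 on Friday, 07:00 on Monday) are assumptions, since this RFC only says "Friday nights" and "Monday mornings", and the holiday set is a hypothetical input that covers the holiday requirement from the Background section.

```python
from datetime import date, datetime, time

# Assumed cut-over times; the RFC does not specify exact hours.
SHUTDOWN_TIME = time(20, 0)  # Friday night
STARTUP_TIME = time(7, 0)    # Monday morning


def environment_should_be_up(now, holidays=frozenset()):
    """Return True if the testing environment should be running at `now`."""
    if now.date() in holidays:
        return False
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    if weekday in (5, 6):  # Saturday or Sunday
        return False
    if weekday == 4 and now.time() >= SHUTDOWN_TIME:
        return False
    if weekday == 0 and now.time() < STARTUP_TIME:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical holiday, for illustration only.
    holidays = frozenset({date(2024, 1, 1)})
    print(environment_should_be_up(datetime.now(), holidays))
```

Keeping the decision in a pure function makes the schedule easy to test without touching any infrastructure.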
Alternatives
- Replicate this environment on the developers' machines.
  - PRO: Easy to develop.
  - CON: Won't replicate the actual environment.
  - CON: Size might be a problem.
  - CON: Conflicts with other projects or tickets for the developer.
Caveats
IMPORTANT NOTE: The solution tested on this environment will be the same solution applied on the production environment. The infrastructure team has to take this into consideration when designing the testing environment.
The current production environment uses different AWS accounts for EMP and Core. If this testing environment does not replicate the same structure, we may face different problems in production.
Once the migration is done, the whole testing environment will be decommissioned. It doesn't need to be perfect.
Restoring the obfuscated DB on either the EMP or Core instance may put the Kafka topics out of sync, so it is important to wipe the topics and take Kafka down before the restore process. From a development point of view, wiping out the topics in this environment is not a problem.
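The ordering constraint in the previous paragraph can be sketched as a small wrapper that always runs the steps in the same sequence. All four callables are hypothetical placeholders for whatever tooling ends up performing each step, and reading "take Kafka down" as "stop the EMP connector" is an assumption made here because the MSK cluster is shared.

```python
def restore_with_topic_reset(stop_connector, wipe_topics, restore_database, start_connector):
    """Run the restore steps in a fixed order.

    If any step raises, the exception propagates and the later steps do not
    run, so the connector stays stopped after a failed restore.
    """
    stop_connector()     # stop producing events before touching anything
    wipe_topics()        # drop topics that would otherwise be out of sync
    restore_database()   # restore the obfuscated copy
    start_connector()    # resume event capture from a clean state


if __name__ == "__main__":
    # Placeholder steps that only print, for illustration.
    restore_with_topic_reset(
        lambda: print("stop connector"),
        lambda: print("wipe topics"),
        lambda: print("restore database"),
        lambda: print("start connector"),
    )
```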
Operation
At first, the Devops team needs to be allocated to create the first version of the environment and deliver all the necessary information to the developers. During Phase 1, developers won't be able to recreate the environment, and that's expected.
Security Impact
All the security concerns we have today with BOS and EMP staging environments must be taken into consideration when creating this new environment, as it'll contain databases from both entities.
Although the databases are obfuscated, access should be restricted to the necessary people inside the Ebury tech team and closed to the outside world.
Performance Impact
None expected. This is a new environment, created solely for the purposes of testing the EMP migration. It should not impact current environments.
Developer Impact
Positive in this case. Having a dedicated environment to test the project gives the developers freedom of movement, allowing more testing and prototyping and making recovery from mistakes easier.
Data Consumer Impact
Not applicable.
Deployment
A new pipeline needs to be created to build this environment, which will impact the Devops team.
This new environment can be isolated from other Ebury instances (staging and prod, for example), and must not affect any other deployments.
Dependencies
The Devops team needs to be allocated to work on all aspects of this environment, from the first delivery to the automation tools.