Automation of BI Contract tests
The Data team needs data from BOS database to perform their analysis. To achieve that, a Kafka based system will pull data from certain tables in BOS and then push the data through Kafka topics into S3 files in AWS(to form a data lake).

This system expects that the related BOS tables have certain structure(e.g. column names, type etc). If any of these tables is changed by BOS devs, the Kafka extractor might be unable to read the data. Therefore, we need an automated way of detecting such changes before they break the data extractor.
The main purpose of this document is to propose a High Level Approach for designing an automated solution that detects structural changes in certain BOS tables during PR Phase.
Background
Over a meeting Steve McHugh has explained how Data team will extract data from BOS into a data lake that consists of files in AWS S3. In the future, other processes may extract data and store it outside BOS for processing. Therefore it is important to establish contracts that specify what is the expected structure of specific BOS tables so that if the contract is broken as part of BOS software development, other interested parties will get early notification. In order to enforce these contracts, an automated tool is required for monitor the tables and send an early notification in case of a breaking change. After the meeting, a detailed document with requirements has been prepared. We followed up with another meeting to discuss the requirements.
Problem Description
In essence, the requirements for the automated solution are as follows:
- Data team will maintain a git repo that contains JSON files specifying expected table structure e.g.
Contract for the PostgreSQL table public.trades_deal
{
"table_name": "trades_deal",
"columns": [
{
"name": "id",
"is_nullable": false,
"data_type": "integer",
"numeric_precision": 32
},
{
"name": "created",
"is_nullable": false,
"data_type": "timestamp with time zone",
"datetime_precision": 6
}
]
}
-
A list of breaking and not breaking rules will be defined by Data team and kept in their repo in JSON format. For example:

-
Automation Squad will implement a simple solution which parses all JSON files and checks if any validation rule has been broken.
-
The checks will be performed as part of the CI Pipeline on each PR. In case of any failures, Slack notification as well as email will be sent.
Solution
Feasibility Study
There are at least two ways to check the Validation rules(a third one is described in alternatives):
-
Check the Validation Rules on the ORM level e.g. against the Model classes without even starting the database
-
Apply all new migrations in the PR and check the Validation Rules on the Database level e.g. against actual db tables
In order to decide which option is better, we can do a simple feasibility study e.g. create a Python function which parses the JSON files with the table definitions as well as the Validation Rules. Then create two more functions: one that checks the rules using the first approach and another one for the second approach.
If at all feasible, the first approach seems quite promising because it will be faster and maybe simpler to implement. Also, we can have a standalone Jenkins Job which can be pointed to any BOS branch(e.g. master, dev, staging etc). On the other hand, the second approach will require a running DB with all migrations so we may need to piggyback on the "Run Migrations" stage which already runs Postgre and applies the migrations.
In any case, the functionality for checking the Validation Rules should be pluggable so that we can plug any implementation based on the feasibility study.
Minimum Viable Product
Next, we will implement an MVP which is a simple java/python program running in Jenkins. It should do the following: * Check if there are any migrations by means of migrations_files list in ci.jenkinsfile. If empty, skip the stage.
-
Pull the docker image for the automation tool from ECS. It can be uploaded there manually because it should not change often.
-
Clone ebury-data-team-extract from git to obtain the Table Files as well as Validation Rules(can be implemented through 'checkout scm' in the Jenkinsfile).
-
Launch the automation system by means of 'make bi-contract-tests'
- If approach 1) is selected during the feasibility study, we need to run Postgre and apply all migrations
- The results are displayed in the log and stored into contract_tests_results.txt
-
If contract_tests_results.txt is not empty
- Attach the file as artifact to the Jenkins Job and send its contents as Slack and Email notifications. Consider enhancements as per [AUT-1647] (https://fxsolutions.atlassian.net/browse/AUT-1647) e.g. do not send same notification more than once, initial notification should be concise and all details should be in thread etc
- Mark the stage as unstable(yellow) leaving the job as stable(green) similar to sql-migrations check
Full Blown Solution
Once the MVP is up and kicking, we should consider building a Full Blown automation solution by adding support for Gherkin and Allure. Gherkin will help us to document what the contract test is doing. We can describe the test in Gherkin syntax like this:
Given I have access to a BOS Database with some migrations applied
And a couple of JSON files describing expected table structures
And a JSON file containing Validation Rules
When I evaluate all rules against each table
Then no Validation Rule will be broken
Allure will help us to present the results of the tests in HTML Report. We can attach the validation rules and table definitions into the report. Adding these to the solution makes sense if we want to extend the scope of the automation solution by adding support for other types of tests e.g. check if data from BOS is correctly transfered into the data lake etc.
Programming Language If we pursue option 1) above we should use Python. However, if we decide to pursue option 2) I think we should consider Java as well.
The Automation Squad is cross functional across Python, Java and JavaScript. We choose the programming language for the automation solution based on the programming language for the System Under Tests(SUT). For example, the Salesforce automation system is written in Java, the Avoka automation system is written in JavaScript and the BOS automation system is written in Python. In this particular case, the SUT system is based on Kafka and does not use any of these languages. Therefore we should be free to pick Java.
Alternatives
In addition to the two implementation approaches mentioned above, yet a third way to check Validation Rules would be to compare the Validation Rules against the SQL queries generated for the migrations. However, it is probably better to use the Model class than parsing SQL to find which table columns have changed.
Caveats
None
Operation
Once the automation solution is implemented, there will be a new stage in ci.jenkinsfile for checking the BI data contracts. So it will operate automatically.
The Jenkins Job will be maintained by Automation Squad. The Table Definitions and Validation Rules will be maintained by the Data team.
Security Impact
None
Performance Impact
If we check Validation Rules on ORM level, performance impact will be smaller. Apart from the git checkout operations, the execution of the tests should take seconds. So we should fit in a Small Jenkins Node and should finish in less than a minute.
If we check Validation Rules on DB level, there will be some impact related to starting Postgre and applying all migrations. Still it should not increase the total time of the CI pipeline because there are slower stages. In any case, we can consider to piggyback on other stages that already run Postgre and apply the migrations. So we will need XLarge node and will probably take a few minutes to do the checks.
Developer Impact & Data Consumer Impact
This change should help Data Consumers(e.g. Data team) but should not impact Developers directly.
The change is aimed at helping Data team discover early changes of the Database Model that could impact the data extraction so, on a notification, BOS Developers can be contacted to discuss Model changes.
Deployment
No special deployment is required, automation will be running in Jenkins.
Dependencies
This will be a standalone automation solution. Only dependency would be on BOS Model Layer if we choose to check Validation Rules on the Model level.
References
Meeting to discuss BI solution and Contract Testing