BOS Audit
We are looking to resolve an existing issue with the delivery and storage of audit data from BOS. The way MongoDB is used in this process is causing multiple issues, so we need to remove it from the critical path and put an alternative solution in place.
Problem Description
The main issue is that any disruption to MongoDB prevents the BOS application from making checks before an event is sent to a Celery task. The Django ORM also manages the MongoDB model, so there is no separation of the interface between BOS and MongoDB even though the events themselves are fired through Celery tasks.

This can manifest in a number of ways, such as MongoDB running out of disk space, or a long-running, heavy query on the BOS Audit data impacting the ability of audit events to be written expediently. The tight coupling of the audit process to MongoDB and the current usage of MongoDB are the issues that this RFC will resolve. The Celery task queues are not fault tolerant, and audit events can be lost when an error occurs. No retry mechanism is currently enabled.

Changes to the audit policy, usage of the audit data and regulatory responsibilities with regard to audit data are out of scope for this RFC. Enhancing the Admin site is also out of scope except where functional changes to connectivity are required.
Background
Ebury-audit is responsible for tracking two things:
- Changes to the data through the Django models.
- HTTP requests.
The various data changes and requests are pushed into a defined structure consisting of Access, ModelAction and Process:
- Access - Information about the HTTP requests and responses.
- ModelAction - Changes to the data through the Django model.
- Process - Information related to the process that changed the data.
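
The three record types above can be pictured as plain data structures. The sketch below is illustrative only: every field name (`method`, `url`, `status_code`, `model`, `action`, `changes`, `name`, `user`) is an assumption and is not taken from the ebury-audit code base, and how the three records relate to each other is deliberately left out.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Access:
    """Information about an HTTP request and its response (hypothetical fields)."""
    method: str
    url: str
    status_code: int


@dataclass
class ModelAction:
    """A change made to data through a Django model (hypothetical fields)."""
    model: str
    action: str  # for example "create", "update" or "delete"
    changes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Process:
    """Information about the process that changed the data (hypothetical fields)."""
    name: str
    user: str


access = Access(method="POST", url="/clients/42/", status_code=200)
action = ModelAction(model="Client", action="update", changes={"name": "Acme"})
process = Process(name="update-client", user="jsmith")

# asdict() turns each record into a plain dict, ready for JSON serialisation.
records = [asdict(access), asdict(action), asdict(process)]
```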

Ebury-audit monitors an explicit set of models through the Django ORM, processes some data and writes to a Celery task queue, which then writes to MongoDB.

The behaviour of ebury-audit is controlled through a number of configuration parameters:
- EBURY_AUDIT_ACTIVATE - Activate or deactivate audit within the current component (e.g. BOS / FXSuite).
- EBURY_AUDIT_RUN_ASYNC - Use Celery to run in async mode. This should be on.
- EBURY_AUDIT_LOGGED_MODELS - List of the Django models that will be logged by the audit process.
- EBURY_AUDIT_BLACKLIST - List of URLs and fields that will be excluded from auditing.
- EBURY_AUDIT_CUSTOM_PROVIDER - Allows additional fields to be supplied to the access model.
- EBURY_AUDIT_LOGGING - Enables logging for the Ebury-audit process.
- EBURY_AUDIT_TRANSLATE_URLS - Translate ebury-audit URLs.
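
As a rough illustration, the parameters above might appear in a Django settings module as follows. Only the setting names come from this document. Every value, and the shape of each value (lists, a dict, a dotted path), is an assumption made for the example and should be checked against the ebury-audit source.

```python
# settings.py fragment. All values below are illustrative assumptions.
EBURY_AUDIT_ACTIVATE = True
EBURY_AUDIT_RUN_ASYNC = True  # the parameter list says this should be on

# Hypothetical model labels; the real format may differ.
EBURY_AUDIT_LOGGED_MODELS = ["clients.Client", "payments.Payment"]

# Assumed shape: URLs and fields excluded from auditing.
EBURY_AUDIT_BLACKLIST = {"urls": ["/health/"], "fields": ["password"]}

# Hypothetical dotted path to a callable that supplies extra Access fields.
EBURY_AUDIT_CUSTOM_PROVIDER = "myapp.audit.extra_access_fields"

EBURY_AUDIT_LOGGING = True
EBURY_AUDIT_TRANSLATE_URLS = False
```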
In addition to the audit capture and delivery process, there is an admin site that provides access to the data within MongoDB. It is a separate Django application that also uses the ORM, but all objects in MongoDB are owned and managed by the BOS application.
Solution
The immediate requirement is to decouple the audit capture and write process from MongoDB. This will involve implementing an intermediate, fault-tolerant pipeline for the audit events to be written to. Additional processes will receive the messages from the pipeline and write them to the data store. Secondly, the tight coupling of MongoDB to BOS for the Process Model check will be removed and either replaced with a post-stream process or handled gracefully within the DocumentDB write action. This will require the audit write process to be customised, with either a new service being responsible for pushing data to DocumentDB or the use of Step Functions / Lambda to write the data. This will allow the complete removal of any Mongo entities from the BOS ORM and completely isolate the admin application. In time the expectation is that this data can be passed to the data team and the Audit Admin application can be decommissioned.
For this implementation we are proposing the following:

Ebury-audit will be updated to comprise two sub-components. The first will manage the capture of events and behave in the same way as the current audit process. It will also serialise the event into JSON to be passed on. The second will manage the delivery of an event to the Kafka topics through the Kafka REST interface. The delivery component is kept separate so that events could in future be published directly to an immutable ledger or other endpoint without impacting the capture and event creation logic. The request will be synchronous and allow for configurable retries. If a format other than the JSON produced by the first sub-component is required, the delivery component will also be responsible for the translation. Whilst this transfers the existing reliance on Mongo / Celery to Kafka REST proxy / AWS MSK, the scalability and fault tolerance of the new architecture will mitigate that risk, and availability is expected to exceed that of BOS currently.
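
The split between capture and delivery, with a synchronous request and configurable retries, could look roughly like the sketch below. It assumes the Kafka REST interface is the Confluent REST Proxy v2 API (a POST to `/topics/<topic>` with a `records` envelope). The function names, the retry defaults and the exponential backoff are all hypothetical choices for illustration, not the ebury-audit design.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Content type used by the Confluent REST Proxy v2 API for JSON payloads.
KAFKA_JSON_V2 = "application/vnd.kafka.json.v2+json"


def serialise_event(event: Dict[str, Any]) -> bytes:
    """Capture side: wrap one audit event in the REST Proxy 'records' envelope."""
    return json.dumps({"records": [{"value": event}]}, default=str).encode("utf-8")


def post_to_rest_proxy(base_url: str, topic: str, body: bytes, timeout: float = 5.0) -> None:
    """Delivery side: one synchronous POST to <base_url>/topics/<topic>."""
    request = urllib.request.Request(
        f"{base_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": KAFKA_JSON_V2},
        method="POST",
    )
    # urlopen raises HTTPError on 4xx/5xx and URLError on connection failures;
    # both are subclasses of OSError.
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def deliver(
    event: Dict[str, Any],
    send: Callable[[bytes], None],
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one event, retrying on OSError. Returns the number of attempts used."""
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    body = serialise_event(event)
    for attempt in range(1, max_retries + 2):
        try:
            send(body)
            return attempt
        except OSError:
            # This simple sketch retries every OSError, including HTTP 4xx
            # responses that a real implementation would probably not retry.
            if attempt > max_retries:
                raise
            sleep(backoff_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In use, `send` would be bound to a real endpoint, for example `lambda body: post_to_rest_proxy("http://localhost:8082", "bos-audit", body)`, where both the URL and the topic name are placeholders. Passing `send` in as a parameter mirrors the reason given above for separating delivery: a different endpoint can be substituted without touching the capture and serialisation code.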
The audit topics will be consumed by either Lambda / Step Functions or a custom audit write component and loaded into DocumentDB. This consumer will also be responsible for replacing the Process Model check currently inside BOS.
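
If the Lambda option is chosen, the consumer could be sketched as below. It relies on the event shape that Lambda uses for MSK triggers (a `records` map keyed by topic-partition, with base64-encoded values). The `make_handler` and `write` names are hypothetical, the `_id` scheme is an assumption chosen so that redelivered records overwrite rather than duplicate, and the replacement for the Process Model check is not shown.

```python
import base64
import json
from typing import Any, Callable, Dict, List


def decode_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an MSK-triggered Lambda event into a list of audit documents."""
    documents = []
    for batch in event.get("records", {}).values():
        for record in batch:
            payload = json.loads(base64.b64decode(record["value"]))
            # Assumed scheme: topic, partition and offset identify a record
            # uniquely, so a redelivered record maps to the same document.
            payload["_id"] = f'{record["topic"]}-{record["partition"]}-{record["offset"]}'
            documents.append(payload)
    return documents


def make_handler(write: Callable[[Dict[str, Any]], None]):
    """Build a Lambda handler that passes each decoded document to `write`."""
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
        documents = decode_records(event)
        for document in documents:
            write(document)
        return {"written": len(documents)}
    return handler
```

Against DocumentDB, `write` might wrap a pymongo upsert such as `collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)`. Keeping the store behind a callable means the decoding logic can be exercised without a database.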
To enable this to be deployed in manageable steps we are proposing this rollout:
- Use AWS Database Migration Service (DMS) to start the migration process from the existing MongoDB implementation to AWS DocumentDB.
- Update the Mongo library in BOS from version 2 to version 3 so that DocumentDB can be supported.
- Migrate, validate and move the admin application on to DocumentDB, separating it out so it has no reliance on BOS.
- Redirect the connection from the old Mongo endpoint to the new DocumentDB endpoint.
- Establish the topic and the Kafka Connect data load into DocumentDB.
- Remove the Process Model check to take out the link between BOS and DocumentDB.
- Implement changes to Ebury-audit to use the topic-based audit load instead of a direct connection.
- Remove the MongoDB library from BOS.
Alternatives
The alternative option is to supply data directly to a ledger store and use that as a source to supply reporting. This would provide complete auditability but extend the delivery timeframes and potentially introduce additional costs.

With this option any downstream consumers of audit data would source it from the ledger. Initially the data may be loaded into a secondary store such as DocumentDB, or supplied directly to the data team, who can consume it and supply reporting similar to the functionality found in the current admin site.
Caveats
The improvements will only be realised once all steps are completed. The initial goal is to put in place a more robust replacement for MongoDB. Further steps are to improve overall stability and prepare for possible additional changes in the future.
Operation
This will run on existing infrastructure, except for DocumentDB, for which a new instance will need to be created. AWS states that the migration process is zero cost for 6 months, which allows plenty of time to migrate completely.
Security Impact
We will now have to implement security on the DocumentDB instance that is created.
Performance Impact
We expect performance to improve by migrating to this architecture. All components will reside in the cloud.
Developer Impact
There will be changes to the ebury-audit component, but its interaction with day-to-day development should continue as it does currently. Migrating to a Mongo-compatible data store should mean a negligible impact on changes to the admin site.
Data Consumer Impact
Access to audit data will improve, as the data will ultimately be made available through Kafka topics, allowing real-time access.
Deployment
As this has a significant impact on multiple components, a staged rollout as described earlier should be performed.
Dependencies
All dependencies are referenced in this RFC.