EMP Infrastructure Parity Plan
Problem Description
When the Ebury EMP environment was created, it was built manually in a manner significantly different from how Core builds and manages its environments. This has resulted in:
- Significant differences between Core and EMP infrastructure.
- Different application deployment mechanisms between Core and EMP.
- Some EMP-specific components running within Core.
- A number of EMP components that were built manually and are not part of IaC.
- A DR scenario that would involve significant time to rebuild the EMP environments manually.
Abbreviations, acronyms, and definitions
- Ebury Core or Core: Current Ebury production data, infrastructure and services.
- EMP, Ebury Mass Payments or Mass payments: Current Ebury mass payments production data and services (originally FrontierPay and/or TheFXFirm)
- ECR: Elastic Container Registry
- ECS: Elastic Container Service
- EKS: Elastic Kubernetes Service.
- IaC: Infrastructure as Code.
- PQS: Payment Query Service.
- EBO: Ebury Online
- BOS: Back office Systems
- RDS: AWS Relational Database Service
Background
The EMP environment was originally built at the start of the relationship between Ebury and FrontierPay, to allow FrontierPay to utilise Ebury's application stack. The build process was entirely manual and did not use IaC processes. The environment has never received any of the updates that were performed in Core, including containerisation and Kubernetes.
What is meant by parity?
EMP will be rebuilt on separate infrastructure in its own AWS account(s). EMP infrastructure is to be built using IaC processes and will reuse as many Core components as possible (TF modules, Ansible playbooks, Jenkins pipelines). Where existing IaC components cannot be used because of localisation or different business requirements, custom EMP components are to be made as similar as possible to the Core components and processes. Application deployment is included in the parity plan and will follow the same principles.
Solution
The overall plan is to build out new, dedicated EMP environments in EMP-specific AWS accounts, with the goal of segregating EMP from Core infrastructure.
These environments will be set up to match their Core equivalents as much as practical and will utilise IaC and CI/CD processes.
EMP have requested the creation of new dedicated environments:
- Staging/pre-prod
- Production
These will be split across the new AWS accounts, with staging/pre-prod hosted in one and production hosted in the other.
Once the migration is finished there may be the possibility of removing (or running in an idle state) the staging/pre-prod environment depending on business needs.
Business Applications
The following application stacks will be hosted as part of the EMP platform. All application instances will be dedicated to handling EMP-related traffic only.
It is recommended that the applications be built using the existing Core CI infrastructure and then deployed into the EMP environments. This will require some effort, either to enable replication of artifacts into EMP or to set permissions so that EMP can access them.
There are a number of applications such as BOS, EBO and FXSuite that are used in both Core and EMP but have different release mechanisms. The recommended approach is to bring these processes as close to parity as practical. This will for the most part mean converting the EMP process over to use the Core build pipelines.
There are a number of applications that are in the process of being migrated from ECS to EKS. The recommendation is to use EKS for an application only if its EKS deployment is currently taking traffic in production within Ebury Core.
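The ECS/EKS recommendation above can be read as a simple decision rule. The following is a minimal sketch of that rule, not part of any existing tooling: the `AppStatus` type, its field names and the `example-app` entry are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppStatus:
    """Deployment status of one application in Ebury Core (hypothetical type)."""
    name: str
    core_deployment: str               # deployment type currently used in Core, e.g. "ECS"
    eks_takes_core_prod_traffic: bool  # True once EKS serves production traffic in Core


def recommended_emp_target(app: AppStatus) -> str:
    """Return the deployment target recommended for the EMP build of `app`."""
    if app.eks_takes_core_prod_traffic:
        return "EKS"
    # Otherwise stay on whatever Core runs in production today.
    return app.core_deployment


# Sherlock is targeted for EKS migration but is not yet ready, so it stays on ECS.
sherlock = AppStatus("Sherlock", core_deployment="ECS", eks_takes_core_prod_traffic=False)
# Hypothetical application whose EKS deployment already serves Core production traffic.
example = AppStatus("example-app", core_deployment="ECS", eks_takes_core_prod_traffic=True)

print(recommended_emp_target(sherlock))  # ECS
print(recommended_emp_target(example))   # EKS
```

Keeping the rule as a single boolean per application makes it easy to revisit the recommendation as each Core migration reaches production.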
Ebury Online
Ebury Online is fronted by WAF and the CloudFront CDN.
The EMP EBO architecture is different, as it currently deploys to EC2 via the Ebury/eburyonline-ansible playbooks. Core uses container-driven ECS. The recommendation is to change EMP to use ECS.
Deployment type: ECS
NOTE: EBO does not use the standard ECS release pipeline.
Dependencies:
- Foundations
- ECS
- ALB
- S3
- Route53
- CloudFront
- Vault
- WAF
Core TF source
terraform-publicfrontal/src/app_ebo.tf

Core Release
Jenkins jobs development/online/ebo.deploy.prod
FXSuite
The current FXS setup is simple and is used only for quotes. It does not use any of the bridges that send payments to SWIFT, Faster Payments or SEPA, and it does not upload QTM files.
EMP currently deploys to EC2 via Ebury/fxsuite-ansible.
Deployment type: ECS
NOTE: FXS does not use the standard ECS release pipeline.
Dependencies:
- Foundations
- ECS
- SNS
- SQS
- ALB
- Route53
- Elasticache
- NLB
- S3
- RDS Postgres
- Audit / DocumentDB
Core TF source
terraform-backoffice/src/app_fxs.tf

Core Release
Jenkins jobs development/online/fxs.deploy.prod
BOS
The EMP BOS architecture is different, as it currently deploys via the Ebury/bos-ansible-git playbooks to a number of EC2 instances that were previously created manually.
Core uses a process that creates AMIs and then deploys them to AWS via a blue/green switch-over model.
Core is in the process of building out a container-driven ECS model; however, it is not yet ready for production deployment.
BOS code also includes a custom middleware component (ebury_audit.middleware.AuditMiddleware, from the Ebury/ebury-audit repo) that stores "audited" events as Celery tasks in a specific Redis database. A dedicated Celery worker picks up those tasks and stores them in an AWS DocumentDB database.
The recommendation for the initial parity project is to replicate the current Core AMI process. Once the Core container-driven ECS project is complete, it should be relatively straightforward to adapt it for EMP. This may change if the containerisation process progresses significantly during the project.
Dependencies:
- Foundations
- EC2
- Elasticache
- RDS Postgres
- Vault
- ALB
- Route53
- SQS
- SNS
- S3
- Elasticsearch
- DocumentDB for audit
Core Ansible
In Core, BOS is deployed via the following Ansible repos, which deploy a prebuilt AMI:
- iac-ami-creation
- iac-bos
- iac-ec2-creation
- iac-launcher
There are a number of BOS services running in Ebury Core that do not run in EMP, so effort will need to be spent comparing the build processes between Core and EMP. Given this, it is unlikely that EMP will be able to reuse these repos; it will need EMP-specific ones.
Ansible Repos
There are a number of common Ansible repos used by Core that handle the build and deployment of several apps (primarily BOS):
- iac-ami-creation
- iac-bos
- iac-ec2-creation
- iac-launcher
Sherlock
Anti-money-laundering lookup service.
Deployment type: ECS
The application has been targeted for EKS migration but is not yet at a stage where it can be considered ready for deployment.
Dependencies:
- Foundations
- ECS
- SQS
- S3
- ALB
- MSK
- Kafka Connect
- RDS Postgres
Core TF source
terraform-backoffice/src/app_sherlock.tf

Core Release
Jenkins jobs development/etc/sherlock
Verify
EMP infrastructure uses the Ebury Verify service for 2FA, and BOS requires it for sending emails. Verify is an internal backend service that issues 2FA tokens for EBO and authenticates payments. It is used by BOS as well.
Deployment type: ECS
The application has been targeted for EKS migration but is not yet at a stage where it can be considered ready for deployment.
Dependencies:
- Foundations
- ECS
- S3
- Secrets Manager
- ALB
- Route53
- Kafka Connect
Core TF source
terraform-backoffice/src/app_verify.tf

Core Release
Jenkins jobs development/ata/verify
Ebury API
Note: There is a desire from Steve McHugh to include API/API Gateway in the scope of the parity project. However, this will not be done as part of the initial migration and will be treated as a separate task afterwards. To initially use API as a shared service, peering will need to be set up.
Ebury API has an endpoint that reaches directly to the BOS EMP frontapp instance, and to FXS.
Client traffic flow is: client -> Kong API gateway -> API
API deployment is controlled through Ebury/ebury-manifests.
Open question: will this work for EMP, given that it references specific environments in its environments folder?
The application is composed of two repos: ebury-api-webapp and ebury-api-auth.
API services are managed by the DXP or API teams.
Deployment type: ECS/EKS
Dependencies:
- Foundations
- EKS
IAM
The first integration of the IAM tool with our systems is the Ebury API; as such, IAM will integrate with BOS, BOS EMP and Verify in a similar way to how the Ebury API does today. For now this service will be an exception to parity and will work exactly like the Ebury API, which means peering will need to be set up. All EMP-related functionality will live in separate realms, completely isolated from Core integrations, so that moving EMP integrations to a new separate deployment later should not be difficult.
IAM deployment is controlled through Ebury/ebury-manifests.
The application is composed of one repo: ebury-keycloak.
Service managed by JAM.
Deployment type: EKS
Dependencies:
- Foundations
- EKS
- RDS Postgres
- Vault
- Kong (only used to expose the service to the internet, could be replaced with anything sensible)
Core TF source
API services are deployed differently from other infrastructure components and don't follow the same structure as other Terraform infrastructure modules.
ebury-api-iac/terraform
API Gateway (Kong)
Note: There is a desire from Steve McHugh to include API/API Gateway in the scope of the parity project. However, this will not be done as part of the initial migration and will be treated as a separate task afterwards. To initially use API as a shared service, peering will need to be set up.
Kong-based API Gateway platform that feeds traffic into the Ebury API service.
Deployment type: ECS
Dependencies:
- Foundations
- ECS
- Elasticache
- RDS Postgres
- WAF
- ALB
- Vault
- Route53
Core TF source
terraform-publicfrontal/src/app_api_gateway.tf
Safeguarding
Core generates this report through Kafka Connect/Debezium, with Debezium reading from the BOS database and writing into an S3 bucket.
Deployment type: ???
Dependencies:
- Foundations
- MSK
- Kafka Connect
- S3
- RDS Postgres
- ?
Core TF source
??
Smart Date Service
Deployed in Core ECS but not used by Core.
Deployment type: ECS
Dependencies:
- Foundations
- ECS
- ALB
- Route53
- Vault
- Elasticache
Core TF source
terraform-natonly/src/app_smart_date_service.tf

Payment Query Service
Currently deployed in Core EKS but not used by Core.
Repo: https://github.com/Ebury/payment-query-service
Deployment type: EKS
Dependencies:
- Foundations
- MSK
- Kafka Connect
- RDS Postgres
Core TF source
terraform-backoffice/src/app_pqs.tf (covers the database build but not the deployment)
QuickFix Service (QFS)
Talks to BarX (Barclays) over a FIX session.
Deployment type: ECS
Dependencies:
- Foundations
- ECS
- Secrets Manager
- Vault
- Stunnel
Core TF source
terraform-natonly/src/app_qfs.tf

QuickFix Connect (QFC)
Service to handle FIX connections.
Deployment type: ECS
Dependencies:
- Foundations
- ECS
- Secrets Manager
- Vault
Core TF source
terraform-natonly/src/app_qfc.tf

Infrastructure components
The following infrastructure components are used by the applications:
| Component | Used by | Comment | Core Implementation |
|---|---|---|---|
| Vault | Foundation | Currently defined terraform-publicfrontal | |
| Prometheus | Foundation | Central Prometheus built out as part of terraform-global, federated nodes built out in terraform-publicfrontal | |
| Grafana | Foundation | TBC | Foundation, Built out as part of terraform-global. Then Ebury/ansible-role-grafana-dashboard-deploy |
| AlertManager | Foundation | TBC | Foundation, Built out as part of terraform-global |
| Kibana | Foundation | TBC | Foundation, Built out as part of terraform-global |
| Jenkins | All | Full Jenkins master in emp staging with agents in staging and production | TBC |
| MSK | Sherlock, Safeguard, PQS | Use terraform-kafka, but clone an EMP-specific copy | Ebury/terraform-kafka and Ebury/ansible-playbook-kafka-topics |
| Kafka Connect | Sherlock, Safeguard, PQS | Runs within ECS and utilises IAM roles. Some connectors in Core for EMP will need to be ported over | |
| RDS Postgres | FXS, BOS, Sherlock, Safeguard, PQS, API Gateway | Separate DB for each app | |
| ECS | EBO, FXS, Sherlock, Verify, SDS, QFS, API Gateway | TBC | Built out as part of terraform-global. There might also be a dependency on the ebury-manifest repo to set up the ECS cluster |
| EC2 | BOS | Through Ansible repos via prebuilt AMI deployment | |
| EKS | PQS | EKS is under active development in the terraform-internal repo terraform-kubernetes-clusters (?). A dedicated EMP one may need to be created, as it references global_vars heavily | terraform-kubernetes-clusters |
| WAF | EBO, BOS, API Gateway | Managed by Security | Example implementation in terraform-publicfrontal/src/app_ebo.tf (module "ebo_wafv2") |
| ElasticSearch | BOS | TBC | |
| EC2 | BOS | TBC | TBC |
| Cloudfront | EBO | TBC | terraform-publicfrontal/src/app_ebo.tf (module "ebo_cloudfront") |
| DocumentDB | BOS | Think MongoDB is used in core? Used for Audit? | TBC |
| AWX | SRE managed | terraform-global/accounts/prod/awx/terragrunt.hcl. Then also Ebury/ansible-role-awx | |
| Aquasec | ?? | Is there a licence requirement for this? Is it needed | terraform-publicfrontal/src/aquasec_enforcer.tf |
| SQS/SNS | Sherlock, FXS, BOS | ?? | |
| Elasticache | FXS, BOS, SDS, API Gateway | | |
Monitoring
Monitoring is done through a combination of Nagios and Prometheus/AlertManager.
Once the foundation infrastructure is set up, the following Ansible repo can be used to set up monitoring for the environment:
- Ebury/ansible-playbook-monitoring
Nagios is currently running on manually created EC2 instances (called management) in all environments, both Core and EMP. The code for the Nagios configuration is in https://github.com/Ebury/ebury-infrastructure-scripts. In some cases the configuration has been changed manually on the EC2 instances and never pushed to the git repository.
This process will need to be fixed so that it is fully pipeline/automation controlled.
Logging
Logging is shipped to a single central ELK stack. This was originally created manually and will need to be implemented through IaC.
AWS Environments
EMP is currently hosted within its own AWS account. This account only contains the EMP production resources. For the EMP infrastructure parity project it is anticipated that two new AWS accounts will be set up: one for development/staging and one for production.
Note: Need to discuss with DXP team how this could work in regards to credentials etc.
A number of VPCs have been created to contain various services. There is a VPC for:
- publicfrontal
- backoffice
- natonly
- legacy BOS
- MSK
- Global (contains support tools such as Jenkins/AWS/Prom)
If we maintain the VPC segregation of the environments, it will look like this:

Pipeline Automation
Where possible, the plan is to reuse as many Ebury Core TF modules as possible. This will assist with providing environment parity. EMP-specific TF modules will need to be created to retain environment independence.
Required new EMP Modules/Configurations
Two options were considered for creating the Terraform modules and infrastructure configurations required for EMP. The first option was to create EMP-specific modules based on forks/clones of the existing Core ones. The second option was to reuse as much of the Core modules as possible and insert EMP-related environment information into them.
It was decided to use the first option of creating EMP-specific modules, mainly because it allows the EMP environment to remain independent and avoids scenarios where an update to a shared module would trigger pipeline execution in both EMP and Core environments.
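The pipeline-trigger argument behind this decision can be illustrated with a small model. This is a sketch only: the `triggered_environments` helper and the two layout dictionaries are hypothetical and simply map a changed repo to the environments whose pipelines would run.

```python
def triggered_environments(changed_repo, consumers):
    """Return the environments whose pipelines run when `changed_repo` changes."""
    return sorted(consumers.get(changed_repo, set()))


# Option 2 (rejected): one shared repo carries both Core and EMP configuration.
SHARED_LAYOUT = {
    "terraform-backoffice": {"core", "emp"},
}

# Option 1 (chosen): EMP gets its own fork/clone of the Core repo.
FORKED_LAYOUT = {
    "terraform-backoffice": {"core"},
    "terraform-emp-backoffice": {"emp"},
}

print(triggered_environments("terraform-backoffice", SHARED_LAYOUT))      # ['core', 'emp']
print(triggered_environments("terraform-backoffice", FORKED_LAYOUT))      # ['core']
print(triggered_environments("terraform-emp-backoffice", FORKED_LAYOUT))  # ['emp']
```

With the forked layout a change can only ever reach one platform, at the cost of having to port fixes between the two repos by hand.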
There will still be shared terraform modules that are dependencies of the infrastructure modules but these perform generic functions and are not environment specific.
The notable exception to this is terraform-module-emp-globalvars. This module is used heavily by a number of other lower-level Terraform modules, and cloning it would require cloning almost the entire Terraform module estate.
EMP-specific variables will be inserted into terraform-module-emp-globalvars using an emp_ prefix, e.g. emp_devel, emp_prod.
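As an illustration of the emp_ prefix convention, the sketch below shows how EMP and Core entries could sit side by side under distinct keys. It is written in Python purely for illustration; the real module is Terraform, and the Core key names (`devel`, `prod`), the `account_alias` field and all values are assumptions, not taken from the module.

```python
# Hypothetical contents: only the emp_devel / emp_prod key names come from the plan.
GLOBAL_VARS = {
    "devel": {"account_alias": "core-devel-placeholder"},
    "prod": {"account_alias": "core-prod-placeholder"},
    "emp_devel": {"account_alias": "emp-devel-placeholder"},
    "emp_prod": {"account_alias": "emp-prod-placeholder"},
}


def globalvars_key(environment, emp=False):
    """Build the lookup key, adding the emp_ prefix for EMP environments."""
    return f"emp_{environment}" if emp else environment


def lookup(environment, emp=False):
    """Return the variables for one environment of one platform."""
    return GLOBAL_VARS[globalvars_key(environment, emp)]


print(globalvars_key("prod", emp=True))  # emp_prod
print(lookup("devel", emp=True)["account_alias"])  # emp-devel-placeholder
```

Because the prefix keeps the key spaces disjoint, existing Core lookups are unaffected when EMP entries are added.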
terraform-module-globalvars
Not a new module, but the existing one will be significantly modified to contain EMP-specific variables.
terraform-emp-global
Clone of terraform-global used to set up the core environment components. This module sets up generic components such as aquasec, user policies, IAM users, IAM roles, avoka and ECR repos, among others, that are global to the AWS account.
ebury-manifest / platform-manifest
Mechanism to trigger API version deployments.
Investigation will need to be performed to see if this can be reused or needs to be duplicated.
terraform-emp-backoffice|natonly|publicfrontal
Used to build out the main environment configuration. Currently this is in three separate repos: terraform-backoffice, terraform-natonly and terraform-publicfrontal. Adhering to the definition of parity above, it is recommended to maintain the three repos.
terraform-emp-kafka
Used to build out an MSK instance
terraform-emp-kubernetes-clusters
Used to build out a Kubernetes cluster
Jenkins
To maintain independence from Ebury Core, it is recommended to have a separate Jenkins instance. The Jenkins configuration will follow the Core layout: the master resides in the AWS staging account, with agents deployed in the staging and production zones.
Migration Process
The migration process will be broken up into multiple phases.
The first phase will deliver an initial EMP environment that is functionally equivalent to the current one and has parity with the Core environment in terms of operations.
Phase 1 - Parity via IaC
The first phase will deliver the core components of the environment including networking, operations and observability as well as the application components required to provide the equivalent environment.
Basic networking and Foundations
This is the core AWS networking setup, creating VPCs, VPC peerings and subnets. It also delivers a number of components required for the management of the environments:
- Vault
- Prom+Grafana+AlertManager
- Kibana resizing
- Jenkins master and agents
- Database obfuscation process
- Prometheus federated nodes in place
- Alert runbooks and BCP review
Cluster / Shared dependent components
Creation of shared infrastructure components required to support applications:
- ECS cluster creation
- Redis/Elasticache
- RDS Postgres
Equivalence Applications
Creation and setup of applications for the initial equivalence:
- BOS
- FXS
- EBO
Phase 2 - Additional Infrastructure Work
The second phase will start to increase the capability of the EMP platform by adding new components.
Cluster / Shared dependent components
Creation of shared infrastructure components required to support phase 2 applications:
- MSK
- EKS
- Kafka Connect
Phase 2 applications
The following applications are required, based on other in-progress EMP RFCs:
- PQS + SDS
- Sherlock
- QFS/QFC
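The phase split can be cross-checked against the per-application dependency lists given earlier in this plan. The sketch below computes which components first become necessary in each phase. The dependency sets are transcribed from the application sections, with two assumptions: EKS is added to PQS because its deployment type is EKS, and "Audit / DocumentDB" is normalised to "DocumentDB". The function name and data layout are illustrative only.

```python
# Dependencies per application, transcribed from the sections above.
APP_DEPENDENCIES = {
    "EBO": {"Foundations", "ECS", "ALB", "S3", "Route53", "CloudFront", "Vault", "WAF"},
    "FXS": {"Foundations", "ECS", "SNS", "SQS", "ALB", "Route53", "Elasticache",
            "NLB", "S3", "RDS Postgres", "DocumentDB"},
    "BOS": {"Foundations", "EC2", "Elasticache", "RDS Postgres", "Vault", "ALB",
            "Route53", "SQS", "SNS", "S3", "Elasticsearch", "DocumentDB"},
    # Assumption: EKS added because the PQS deployment type is EKS.
    "PQS": {"Foundations", "EKS", "MSK", "Kafka Connect", "RDS Postgres"},
    "SDS": {"Foundations", "ECS", "ALB", "Route53", "Vault", "Elasticache"},
    "Sherlock": {"Foundations", "ECS", "SQS", "S3", "ALB", "MSK", "Kafka Connect",
                 "RDS Postgres"},
    "QFS": {"Foundations", "ECS", "Secretmanager", "Vault", "Stunnel"},
    "QFC": {"Foundations", "ECS", "Secretmanager", "Vault"},
}

PHASES = [
    ("Phase 1", ["BOS", "FXS", "EBO"]),
    ("Phase 2", ["PQS", "SDS", "Sherlock", "QFS", "QFC"]),
]


def new_components_per_phase(phases, dependencies):
    """Map each phase to the components not already delivered by an earlier phase."""
    delivered = set()
    result = {}
    for phase_name, apps in phases:
        needed = set().union(*(dependencies[app] for app in apps))
        result[phase_name] = needed - delivered
        delivered |= needed
    return result


for phase, components in new_components_per_phase(PHASES, APP_DEPENDENCIES).items():
    print(phase, sorted(components))
```

Under these assumptions, the components that first appear in phase 2 are EKS, MSK, Kafka Connect, Secretmanager and Stunnel. The first three match the shared components listed for phase 2, while Secretmanager and Stunnel are needed only by QFS/QFC.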
Clean up of Core
Any existing EMP components that are currently hosted in Core, such as PQS and SDS, will need to be decommissioned.
Alternatives
An initial attempt to merge the two environments within Ebury was abandoned. Although there are several alternatives for how we can create and host this new environment, they can be considered implementation details compared to the overall requirement to have a managed, scalable infrastructure.