Infrastructure for processing tracking files received from Santander

This RFC proposes an infrastructure solution for processing tracking files received from Santander. It also describes an alternative and its pros and cons.

Background

Santander will act as the sponsoring bank for Ebury's SEPA Credit Transfer incoming funds and outgoing payments. We are currently implementing the inbound flow. As part of this project, we need to implement a procedure for processing tracking files. Tracking files contain all movements on the settlement account and will be used by the Operations team for reconciliation.

SEPA Core is a service responsible for processing incoming SEPA payments received from Santander.

Problem Description

Tracking files are received 5 times per day, after each EBA cycle. They are copied from Santander's S3 bucket to the SEPA Core S3 bucket sepa-core-santander-tracking-files using S3 replication. As a result, files are stored in our bucket with the same structure they have on Santander's side.

We need to implement infrastructure and a procedure that will:

  1. Take each tracking file from the SEPA Core S3 bucket sepa-core-santander-tracking-files and store it in another SEPA Core S3 bucket, sepa-core-files, organised by file type and date for easier future reference. The sepa-core-files bucket also contains a backup of all pacs messages that we exchange with Santander, so it is a logical place to store tracking files as well

  2. Convert the tracking file from CSV format to the more readable Excel format, for use by the Operations team

  3. Store the converted tracking file in a Google Drive folder accessible to the Operations team

The first phase of implementation covers only point 1, which is the scope of this RFC.

Solution

We propose to use the following infrastructure entities:

  • S3 notifications that will be triggered when a new tracking file is added to the bucket

  • An SQS queue for receiving the notifications

  • A consumer, implemented as an ECS task, that will consume messages from SQS and perform the required actions using code implemented in the SEPA Core service

When a new tracking file is added to the S3 bucket sepa-core-santander-tracking-files, a message with information about the received file will be published to the SQS queue sepa-core-santander-incoming-tracking-file-notification. The consumer will continuously poll the SQS queue for new messages. When a new message is received, the consumer will move the tracking file from the S3 bucket sepa-core-santander-tracking-files to the S3 bucket sepa-core-files, following the path structure: /tracking-files/yy/mm/dd
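The flow above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SEPA Core implementation: the function names are hypothetical, the s3 and sqs arguments are assumed to be boto3-style clients passed in by the caller, the date used for the yy/mm/dd path is assumed to be supplied by the caller (the RFC does not say which date applies), and the leading slash of the path is dropped to form the object key.

```python
import json
from datetime import date
from urllib.parse import unquote_plus

TARGET_BUCKET = "sepa-core-files"


def extract_objects(message_body):
    """Return (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(message_body)
    pairs = []
    # S3 test events carry no "Records" entry, so default to an empty list.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((s3_info["bucket"]["name"], key))
    return pairs


def target_key(source_key, day):
    """Build tracking-files/yy/mm/dd/<file name> (assumed key layout)."""
    file_name = source_key.rsplit("/", 1)[-1]
    return f"tracking-files/{day:%y/%m/%d}/{file_name}"


def move_object(s3, bucket, key, day):
    """Copy the object to the target bucket, then delete the original."""
    new_key = target_key(key, day)
    s3.copy_object(
        Bucket=TARGET_BUCKET,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)
    return new_key


def poll_once(sqs, s3, queue_url, day):
    """Receive one batch of messages, move the referenced files, then ack."""
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    moved = []
    for message in response.get("Messages", []):
        for bucket, key in extract_objects(message["Body"]):
            moved.append(move_object(s3, bucket, key, day))
        # Delete the message only after all of its files have been moved.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return moved
```

The consumer would call poll_once in a loop. Because a message is deleted only after the move succeeds, a failed move leaves the message on the queue and it becomes visible again for a retry.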

Future developments

In the next stage, we will extend the consumer's functionality by adding conversion from CSV to Excel format and uploading the Excel file, via Documents Service, to a Google Drive folder accessible to the Operations team.

Alternatives

An alternative solution can be implemented using Lambda functions. In this case, S3 notifications would invoke a Lambda function that runs the same code as the consumer would:

  • move the tracking file from one S3 bucket to another, in the first phase

  • convert the tracking file and upload it to Google Drive, in the second phase

Pros:

  • We do not need to keep a consumer running all the time, which matters because we will receive files only 5 times per business day

  • We do not need additional infrastructure (SQS queue) for publishing the message that will invoke code execution

Cons:

  • In the first phase, we will implement simple code for moving a file from one S3 bucket to another. In the second phase, however, we will need more complex code for converting files and uploading them to Google Drive, which requires external libraries. Lambda functions have several limits, including code size, memory, and execution time limits. Depending on the tracking file size, those limits may be exceeded.

  • Currently, we use Lambda functions for infrastructure-related tasks (e.g. starting and stopping resources, backups), but not for project-specific tasks, so we would be mixing infrastructure-related code with project-specific code.

  • We already pay for ECS resources, so we should use them instead of deploying new ones

  • We would need to investigate how to include monitoring of Lambda functions in the project scope. Currently, we use Nagios for monitoring SQS queues and consumers

Operation

There is no impact on current operations, as the SEPA Core service is not yet in production.

Security Impact

There is no security impact.

Performance Impact

There is no performance impact on existing services.

Developer Impact

We will need to extend the existing Terraform AWS S3 module with an option to subscribe an SQS queue to S3 notifications. This option will then be available to other projects as well.
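To illustrate what such an option has to manage, the sketch below builds the two pieces of configuration involved in subscribing a queue to bucket events: the bucket notification configuration and the SQS access policy that lets S3 publish to the queue. It is written as plain Python dictionaries purely for illustration; the actual change would be made in Terraform, and the function names here are hypothetical. The dictionary shapes follow the structures accepted by the AWS S3 and SQS APIs.

```python
import json


def queue_notification_config(queue_arn):
    """Notification configuration sending object-created events to a queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }


def queue_policy(queue_arn, bucket_name):
    """SQS access policy allowing S3 to send messages for one bucket only."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict publishing to events originating from this bucket.
            "Condition": {
                "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"}
            },
        }]
    })
```

Without the queue policy, S3 cannot deliver notifications to the queue, so the module option would need to create both pieces together.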

Deployment

To deploy the infrastructure, we will use Terraform to define all new resources that need to be created. Infrastructure will be deployed in terraform-natonly.

To deploy the code, we will use the existing Jenkins CI.

Dependencies

This RFC has no dependencies.