ICE Dynamic Forward Data Extraction (Interin Solution)

The data managed by Ice applications need to be consumed by key processes at Ebury. This document covers the mechanisms we'll use to extract Dynamic Forwards information as a short term solution.

This implementation will be replaced in the future for a definitive and long term solution.

Reference Documents

Reference Document Location
DFXE0001 Dynamic Forward Extraction

Problem Description

There are some proccesses at Ebury that requires Dynamic forward data which is managed by a third party application ICE. We need to automate data extraction from ICE application in order to enable Ebury's teams to have such data available for business workloads. The third party application has an API which will be used in a future project that will implement an integration using the provided endpoints. However, we need to provide Dynamic Forward data earlier in order to create reports on Ebury side. Ebury will provide a list of data requirements to ICE team and they will generate a custom reports.

Background

Dynamic Forward is an Ebury's product which uses ICE as the tool to manage it. For reporting Dynamic Forwards metrics we need to fill our data lake/warehouse with data coming from ICE.

ICE platform provides an API which we could use to extract required data. This will be implemented in a future integration project but due to the urgency on providing critical reports, we are setting up this short term solution. Ebury's analytics team is going to provide a list of required data to ICE team and they will create a custom report on their application. This report will be executed automatically on a daily basis and provided to Ebury through a SFTP directory. Then Ebury will be able to download, transform and use that data in internal key processes.

See sample of a report here.

Solution

The short term solution presented in this document relies on ICE team developing a custom report with data required by Ebury Analytics Team. Such report will be delivered in a csv file in a SFTP server. Such report will be generated following an agreed data contract (column names, column order, data types). ICE team must provide the requested report on daily basis. Each report will contain data for X date window (TBD). Ebury data engineering team will create a DCS (data cron service) that runs on a daily basis and will download the provided report file using SFTP protocol with credentials provided by ICE team.

Once downloaded, the raw file will be stored on a Google Cloud Storage path without any transformation for auditability of the input. Ebury data engineering team will create a DCS that runs on daily basis and will get the raw data, run some tests for data quality and compleatness based on the requirements raised by analytics, run small transformations (datetime formatting, data modeling for bigquery), save the transformed file to Google Cloud Storage and load it to BigQuery.

Then Analytics team will have the dataset automatically updated periodically and will be able to create the required anlytics reports.

Overview of requirements

  • Ice application must generate the report with Ebury required data
  • Report Data contract must be agreed between Ebury and ICE (column names, column order, file format)
  • Report execution on ICE will run daily
  • Report file must be placed on SFTP server on a agreed path
  • ICE needs to provide SFTP credential to be used by DCS application
  • DCS Extration job will run daily
  • DCS Extraction job will save the raw report file on an Google Cloud Storage path
  • DCS Extraction job must not change anything on raw report file to enable future auditing
  • DCS Load job will test the file for data quality and compleatness based on the requirements raised by analytics
  • DCS Load job will run small transformations to adequate the file to be loaded to Google BigQuery
  • Data loaded on BigQuery must follow SCD4 data modeling Dynamic Forward data extraction diagram

Implementation steps

  1. ICE to create custom report following an agreed data contract
  2. Define the date range each report execution will handle
  3. ICE to run the custom report on a agreed schedule
  4. ICE to provide the report output in a SFTP server on a agreed path and defined file format
  5. ICE to provide the SFTP credentials (SSH or user/password)
  6. Ebury DCS Extraction job to pull the raw report file using SFTP protocol
  7. Ebury DCS Loader job to run data validations, small transformations, save on Google Cloud Storage
  8. Ebury DCS Loader job to load the transformed report file to BigQuery

Error management

  • If ICE report doesn't execute on time, we will retry for one hour and if it's not there we will send an email to Data Team and DIO Team;

  • If Aristoteles Job fails, Google Cloud Composer (Airflow) will retry for one hour. In case the error persists, it will raise an notification to Data Team and DIO Team;

  • If DCS Loader job fails, Google Cloud Composer (Airflow) will retry for one hour. In case the error persists, it will raise an notification to DIO Team and Data Team.

Data Contract

Columns Definition

Column Name Data Type Column Order Nullable?
id int8 1 []
name varchar(100) 2 [x]

File Format

ICE will provide the report in a csv file

Data Quality Checks

The data quality checks will be providade by Data Analytics team once they have reviewed the data.

Service Ownership

New Service Service Name Service Owner
Yes Aristoteles Job DIO Team
Yes DCS Dynamic Forward Ice Extractor DIO Team

Alternatives

There is no alternatives becuase this is an interin solution to be deprecated.

Operation

New services will not have impact on existing operations.

Security Impact

We are going to use SFTP protocol for downloading the data from Ice's SFTP server.

The authentication will be by user/password provided by Ice Team.

Data Sources

  • Input data sources: SFTP csv file

  • Transformation requirements: data quality and compleatness based on the requirements raised by analytics team

  • Output data: Google Cloud Storage for Data Auditability and BigQuery for access by Data Analytics Team.

Deployment

Google Cloud Composer (Airflow)

  • Aristoteles Job
  • dcs-dynamic-forwards-ice-extractor