FenX Data Extraction

The data mastered in FenX needs to be consumed by key processes at Ebury. This document covers the mechanisms we'll use going forward to extract such information.

Reference Documents

Reference	Document Location
FXDE0001	Ebury-FenX MI Options
FXDE0002	Salesforce API integration
FXDE0003	FenX Events Registry
FXDE0004	Data Reg. Reporting requirements

Problem Description

Moving the onboarding process and client maintenance to FenX means that a large amount of data no longer belongs in the Salesforce model. However, some consumers of that data still need access to it so that their business processes keep working as they do today.

This was taken into account during the migration to keep it as transparent as possible. In a nutshell, a small amount of client information is synced from FenX back to Salesforce, so that BOS does not notice any change. But there is a lot more. The Data Team generates a large number of reports, not only from client data but also from associations and business events in Salesforce. The most important ones, and our focus here, are the "regulatory reporting" reports (MVP), but in the longer term we will still need to support others, such as MI reports. FenX has some limitations regarding reporting and MI, so we need to design our own data extraction mechanism.

From an architectural perspective, applying the same strategy and syncing associations and events back to Salesforce is pointless: it would not only waste resources and money, but also go against the rationale behind the migration to FenX. It was nevertheless evaluated as a short-term solution for data extraction in the "Ebury-FenX MI Options" document (FXDE0001) and discarded, as you'll see in the Alternatives section.

Therefore, the problem this document aims to solve is the design of a new data extraction mechanism from FenX, focused on the main gap left and on the most urgent consumer, the Data Team, with Ebury 2.0 in mind. Only the Regulatory Reporting defined in the "Data Reg. Reporting requirements" document (FXDE0004) is considered MVP and will be taken into account for the solution.

Background

Ebury is moving its current onboarding and compliance processes from Salesforce to FenX (a complete CLM tool provided by Fenergo) to streamline them, along with tasks such as risk assessment, Anti-Money Laundering or ID&V. The migration also aims to speed up and simplify the adoption of new regulations, which is something the FenX platform excels at.

This project has been handled by different work-streams, since it involves changing a number of processes that are managed by the compliance team but also have a direct impact on other business processes, not only Salesforce. It has been running for months, with special emphasis on Salesforce, as it provides the current flows and data and its migration is a big effort. Ingesting the data from other services needs specific attention, which is why this document has been created.

Solution

FenX is able to send Domain Events to up to 10 target webhooks (see FXDE0003 for more information), and although they don't contain fully qualified payloads, these in conjunction with additional API calls would allow us to integrate FenX in our Ebury 2.0 architecture.

However, this would take a substantial amount of time to develop, and the FenX go-live date limits the number of options we can apply in a timely manner. This option remains our target, but we have to provide a short-term solution to stay compliant with our regulations. We are looking for something as close as possible to the target solution above, to allow a natural transition in the future. It also has to be delivered in a way that the Data Team can adopt in time.

The selected solution is a daily scheduled Lambda function that extracts the information and publishes it to Kafka. Kafka Connect then writes the information to S3 for the Data Team to consume.

FenX Entity Polling Diagram

The solution rationale is:

Low resource impact (compared to other options).
Splits out the resource dependencies to make it quicker to implement.
Achievable within the timelines provided by the project (available for March).
An enabler for further data extraction, which reduces future implementation costs and aligns with Ebury's architecture goals. We can expand on this solution to meet short, medium and long term requirements.
Tech-debt that is easier to move (compared to other options).

The Entity Polling Service has special relevance in this document as it'll be the key development piece. So let's describe it.

Entity Polling Service

As mentioned, FenX doesn't provide an API for the delta data extraction. But, as we'll see next, we can fill that gap by performing a sequence of calls to their available API methods.

The FenX API is built following the CQRS pattern, and Fenergo consider themselves API first: "If there is a feature in the UI, then there is also a corresponding API endpoint available for you to perform the same operations". However, this approach falls short for our purposes, as the API resolves UI requirements rather than wider client business needs. In essence, it was not designed with bulk operations in mind. The result is a set of GET API endpoints that can only return a single record by its ID.

Luckily, they introduced a "reporting functionality" that returns the IDs of the Entities that have changed between two dates, but only the IDs. So to solve our problem we'll use both: the reporting API to extract the delta IDs and then, one by one, the single-record getters to obtain the fully qualified records.

Last but not least, as highlighted in the diagram, to keep consistency between calls and fetch only what's new, on every run we'll read our own writes from Kafka, where the timestamp of the last poll is stored. This guarantees that there is no data loss or duplication.
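As an illustration of the "read our own writes" idea, here is a minimal sketch of how the start of the next polling window could be derived from the last message on the topic. The message schema is not defined in this document, so the `polled_until` field and the `resolve_poll_start` helper are assumptions made only for this example, as is the use of ISO 8601 timestamps.

```python
import json
from datetime import datetime
from typing import Optional

# Hypothetical field name: the real message schema is still to be defined.
CHECKPOINT_FIELD = "polled_until"


def resolve_poll_start(last_message: Optional[bytes], initial_start: datetime) -> datetime:
    """Return the lower bound of the next delta query.

    last_message is the value of the most recent record on our own topic,
    or None when the topic is still empty (first run ever).
    """
    if last_message is None:
        # Nothing has been written yet, so fall back to a configured start date.
        return initial_start
    payload = json.loads(last_message)
    # The upper bound of the previous poll becomes the lower bound of this one.
    return datetime.fromisoformat(payload[CHECKPOINT_FIELD])
```

Reusing the previous upper bound as the new lower bound is what keeps consecutive windows contiguous, so nothing falls between two polls.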

In conclusion, the extraction process will roughly look as follows:

Fetch the last event sent to Kafka to determine the timestamp of the last poll from FenX (API call).
Use the timestamp to ask the FenX reporting API which entities and relationships are new or updated since the last data poll (API call).
Collect the IDs of the records modified in that period (service internal logic).
Extract the list of associations, type and ID (API call).
Collect the IDs of the associations (service internal logic).
Extract the internal data of those associations by ID, one by one, to identify who the related party is (API call).
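The sequence above can be sketched as a single function. This is only an outline under stated assumptions: the four callables stand in for FenX API calls whose real names and signatures are not covered here, `PollResult` is a hypothetical container, and fetching the full entity records follows the "single record getters" approach described earlier rather than an explicit step in the list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass
class PollResult:
    entities: List[dict]
    associations: List[dict]
    polled_until: datetime  # becomes the checkpoint for the next run


def poll_fenx(
    last_polled: datetime,  # step 1: timestamp read back from Kafka
    now: datetime,
    changed_entity_ids: Callable[[datetime, datetime], Iterable[str]],  # reporting API (hypothetical wrapper)
    get_entity: Callable[[str], dict],  # single-record getter (hypothetical wrapper)
    list_associations: Callable[[str], Iterable[dict]],  # returns items with "type" and "id"
    get_association: Callable[[str], dict],  # single-record getter (hypothetical wrapper)
) -> PollResult:
    # Steps 2-3: ask for the delta and collect the distinct entity IDs.
    entity_ids = sorted(set(changed_entity_ids(last_polled, now)))
    entities = [get_entity(entity_id) for entity_id in entity_ids]

    # Steps 4-5: list the associations of each entity and collect their distinct IDs.
    association_ids = sorted(
        {assoc["id"] for entity_id in entity_ids for assoc in list_associations(entity_id)}
    )

    # Step 6: fetch each association one by one to resolve the related party.
    associations = [get_association(assoc_id) for assoc_id in association_ids]

    return PollResult(entities, associations, polled_until=now)
```

Passing the API calls in as parameters keeps the routine independent of the HTTP client and makes it testable with in-memory fakes.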

As mentioned, this routine will be scheduled to run daily. The nitty-gritty details of the underlying technology need to be assessed by the team implementing it. At a glance, it seems suitable for a Lambda function, which will be our preferred choice if no big impediments are found during implementation.

Assumptions

Extraction will be Entity and Associations only: For the short term, we only need to extract the entity and relationship information identified in the "Data Reg. Reporting requirements" document (FXDE0004). The Legal Entity level information will continue to be extracted from Salesforce until the long term solution has been implemented.
Efficiency MI's out of scope: MI reporting has been excluded from the MVP. Thus, journey information won't be extracted as part of this solution either.
Extraction frequency will be daily: The MVP reports need to be generated once a day. We are assuming that a daily snapshot of the data is valid for this purpose. I.e. if there have been multiple state changes within a day, the latest state within the day is enough and meets our requirements.
Error handling: If the service cannot complete the polling and the publication to Kafka of all records in a single transaction, the timestamp won't be updated, meaning that the next attempt will pick up those records along with any new ones. If error scenarios are problematic for the accuracy of the reporting, we can mitigate this by polling twice a day.
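The error handling assumption can be illustrated with an all-or-nothing publish. This is a sketch only: the `TransactionalProducer` interface is a hypothetical abstraction whose method names mirror the transactional calls offered by common Kafka clients, and writing the checkpoint as the last message of the batch is an assumption about how the timestamp would be stored.

```python
from typing import Iterable, Protocol


class TransactionalProducer(Protocol):
    """Hypothetical minimal interface over a transactional Kafka producer."""

    def begin_transaction(self) -> None: ...
    def produce(self, topic: str, value: bytes) -> None: ...
    def commit_transaction(self) -> None: ...
    def abort_transaction(self) -> None: ...


def publish_batch(
    producer: TransactionalProducer,
    topic: str,
    records: Iterable[bytes],
    checkpoint: bytes,
) -> bool:
    """Publish every record plus the new checkpoint, or nothing at all."""
    producer.begin_transaction()
    try:
        # records may be a lazy iterable, so a failing FenX call also lands here.
        for record in records:
            producer.produce(topic, record)
        # The checkpoint goes last and is only committed together with the data.
        producer.produce(topic, checkpoint)
        producer.commit_transaction()
        return True
    except Exception:
        # Nothing becomes visible, so the next run starts from the old timestamp.
        producer.abort_transaction()
        return False
```

With Kafka transactions, downstream consumers only skip aborted records when they read with the `read_committed` isolation level, so that setting would need to be checked for whatever writes the data to S3.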

Alternatives

The "Ebury-FenX MI Options" document (FXDE0001) was created promptly, with a limited understanding of the actual requirements. It was an exercise to come up with a number of feasible plans while all the information was being collected. Now that we have the "Data Reg. Reporting requirements" (FXDE0004) and we know what's MVP for us, we can come up with better defined solutions. Thus, apart from our choice above, here we have a number of other alternatives that are relevant to our current level of understanding of the problem. Other options in the document have been omitted as they no longer make sense.

Target State

FenX integration into Ebury's Event Driven Architecture is our strategic solution. In fact, this alternative is close to the selected short-term one. The FenX information will be published to Kafka topics and then to S3 so that the Data Team can ingest it easily.

The main difference is how the extraction is performed. We would listen to the events defined in the FenX Events Registry (FXDE0003), fetch the relevant information and finally publish a new, richer message to Kafka.
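A minimal sketch of that listen, fetch and publish flow follows. The event field names (`eventType`, `entityId`) are hypothetical, since the actual payloads are defined in the FenX Events Registry (FXDE0003) and not reproduced here, and `get_entity` and `publish` stand in for the FenX API call and the Kafka producer.

```python
import json
from typing import Callable


def handle_domain_event(
    event: dict,
    get_entity: Callable[[str], dict],  # hypothetical wrapper over a FenX single-record getter
    publish: Callable[[bytes], None],  # hypothetical wrapper over the Kafka producer
) -> None:
    """Enrich a thin FenX domain event and publish the richer message."""
    # The incoming event only says what happened, not the data behind it,
    # so the full record has to be fetched with an additional API call.
    entity = get_entity(event["entityId"])
    enriched = {"eventType": event["eventType"], "entity": entity}
    publish(json.dumps(enriched).encode("utf-8"))
```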

However, dealing with events has its own intricacies, which add to those of the FenX-specific architecture we would have to deal with. The polling, in contrast, is a sequence of calls run once a day in a Lambda, so its infrastructure and maintenance costs are much lower. That's why the target state has been discarded for now.

Pros:

Strategic solution with data definitions done upon sourcing.

Cons:

Time to implement will be excessive.

Fully Sync Back To Salesforce

To reduce the impact on the Data Team's data ingestion, we considered temporarily keeping the relationships in sync, as is being done for the legal entities. The flows would still live in FenX, with the data mastered there but also replicated into Salesforce. That would let the Data Team read from a known source (Salesforce), ideally with little or no adoption effort on their side.

However, the model in FenX is quite different, especially when it comes to UBOs. The work required to adapt this sync on the Salesforce side was sized at around 5 sprints, which does not fit the current resource availability. It would also add a high risk to the in-progress FenX work, jeopardizing the delivery dates.

Pros:

Almost transparent to the Data Team. Small adoption effort.

Cons:

Adds a lot of effort on the Salesforce side affecting the project delivery.
Salesforce would need to re-engineer part of the application that will be removed afterwards.
The data would be duplicated and consuming unnecessary resources from Salesforce.

Salesforce As Bridge Between FenX and Data

Salesforce is going to store the FenX events it is subscribed to so that it can process them asynchronously. These events don't contain the related data that caused them, but that data can be retrieved with some additional calls to the FenX APIs.

On the other hand, the Data Team already connects to the Salesforce APIs to obtain data, so it would be straightforward for them to consume events from there and perform further processing afterwards. So the proposal was for Salesforce to store the whole range of FenX events instead of only the ones relevant to it. The Data Team would then read them (polling) and make further calls to FenX to extract what changed.

Additionally, Salesforce could provide APIs as a layer for the FenX integration, which is something Salesforce has already resolved.

Pros:

Small work for integration (compared with other solutions), as it doesn't require a pub/sub service. Direct polling instead.
The Data Team can focus on adapting their reports to the new shape of the data, so that effort is not wasted later.
They can choose between subscribing to Salesforce Platform events and polling the stored Salesforce events.
The Data Team doesn't need to implement or learn new technology: the one involved is known and heavily used by them, so no new skills are required.

Cons:

Additional work for Data Engineering to integrate with the new Salesforce endpoint.
Not utilising the direct source of the data during this phase. Not real time events.
The Data Team would have to extend their queries to read from a different object (events).
Additional storage used in Salesforce. It'd require a cleansing policy/system to keep the DB healthy.
Small extra work for the Salesforce team to publish their internal FenX API implementation.
Not aligned with our strategic solution. Everything would have to be re-engineered afterwards.
Adds delivery risks for both Salesforce and Data teams.

Caveats

We need to stress that the selected option is the tactical one that is closest to Ebury's strategy. The polling mechanism will have to be replaced with the target state in the longer term. Once that is implemented, the S3 bucket provided for the Data Team will hold a new directory with the new information, and the Data Team will then have to adopt the new "events".

Operation

This is intended for internal access, with the Data Team as the first stakeholder. In the future, any other service would be able to consume FenX information from Kafka if necessary.

Engineering will be responsible for creating and maintaining the service (lambda) and the cron trigger. The Salesforce team should be consulted as they have a deeper knowledge of the FenX platform and its APIs.

Security Impact

The data handled by this service is client data. All of it will be stored in S3 within our AWS infrastructure and consumed by the Data Team to produce the relevant reports required by the regulators.

Some lower-level details that are nonetheless important to mention in this document are:

The S3 buckets provided for the Data Team will have to implement encryption at rest and restrict access by IP.
Encryption in transit is assumed.
The REST API credentials are stored in the usual secret stores.
Encryption in the Data Team's infrastructure is already provided by GCP.
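As a rough illustration of the IP filtering and in-transit requirements, the sketch below builds an S3 bucket policy document using the standard `aws:SourceIp` and `aws:SecureTransport` condition keys. The bucket name and CIDR ranges are placeholders, the `build_bucket_policy` helper is hypothetical, and encryption at rest would be configured separately on the bucket.

```python
from typing import List


def build_bucket_policy(bucket: str, allowed_cidrs: List[str]) -> dict:
    """Build a bucket policy that denies non-allow-listed IPs and plain HTTP."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any request whose source IP is outside the allow list.
                "Sid": "DenyOutsideAllowedIps",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
            },
            {
                # Deny any request that does not use TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }
```

A blanket deny on source IP may also block internal writers that reach S3 without a public source address (for example through a VPC endpoint), so the real policy would likely need an exception for whatever writes the data into the bucket.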

Performance Impact

The new service will run once a day and will poll FenX data through their APIs. The data will then flow through Kafka. This hasn't been identified as a performance risk or issue. Fenergo confirmed that the system and users won't be affected by requests like this.

That said, we will preferably run the extraction during low user activity time windows.

Developer Impact

The data is intended to be sourced through Kafka, as this is part of our strategy for Ebury 2.0. This does not change how services communicate; it lays the foundations for the future while adhering to our standards.

Data Contracts

The compliance-related information is going to be moved to FenX (new data source). There are a number of working groups where all stakeholders and consumers agree on the migration steps and strategy. The Data Team, the main consumer discussed here, is also aware of, informed and consulted about this change. For the rest of the consumers, Salesforce will continue surfacing this information, so there are no changes for anyone else.

As mentioned, the current solution targets a single consumer (the Data Team) and will eventually be replaced by the strategic solution. Therefore, a new contract will be defined taking into account the compatibility strategy.

The tactical solution is expected to be forwards compatible, but as a short-term solution it depends on the FenX data model. The schema evolution strategy is yet to be confirmed as part of the wider FenX migration project.

Data Sources

The data will be extracted from FenX only. It will be transformed into payloads for the Kafka messages, but no other changes will be made. The data is expected to land in S3.

Deployment

No specific deployment steps identified.

Dependencies

No dependencies. This work can start immediately.
