smartTrade Internal Book

Proposed architectural changes to be implemented during the smartTrade Internal Book project.

Problem Description

Ebury maintains pricing and trading FIX sessions with the 3rd-party provider smartTrade. When a quote is booked through the trading session, BOS and FXS create new BrokerDeal and related model instances in their databases. The picture below shows the high-level architecture we currently run (boxes in red mark components that don't have high availability).

Current Architecture

The smartTrade endpoint for the aforementioned sessions is called LALO. However, as part of the Internal Book project, we need to upgrade to a newer version of smartTrade, which deprecates LALO in favor of two different endpoints: LOLD and Post Trade. The ExecutionReports we currently receive from LALO to create our deal models will be received from Post Trade instead. We can temporarily use LALO and Post Trade simultaneously, but by the end of the project we should be using only LOLD and Post Trade. Also, all components emit log statements for observability and troubleshooting. The Data team uses some of these logs to produce offline market data reports, and they must be able to keep running them during and after this project.

We could implement the new flows with smartTrade without changing our existing infrastructure. However, that infrastructure has some shortcomings that we could start addressing (or at least take the first steps towards addressing) in this project.

The QuickFix Service (QFS) has worked fine so far. However, we cannot run multiple instances simultaneously, because we're only allowed to maintain one FIX session with each endpoint at a time. Among other things, this means that we must follow a convoluted process to avoid downtime during a release.

Also, QFS uses Redis pub/sub channels to implement the asynchronous FIX protocol flows with the booking platforms BarX and smartTrade. While there's nothing wrong with this decision, it might be worthwhile to start consolidating our platform around Kafka on our path to Ebury 2.0.

Background

Ebury uses 3rd-party providers to fulfill various core business functions, including getting quotes for FX trades on behalf of our clients and creating deals with the Liquidity Providers (LPs) that back those trades. For the latter, smartTrade is our preferred provider for most currency pairs. In addition to providing us with market rates from different LPs, it allows us to book quotes with them through a single FIX interface.

As of this writing, we're running smartTrade v4.02. We will upgrade to v4.23 because it includes some useful functionality that Ebury agreed upon with smartTrade.

Right now, the ExecutionReports we receive from LALO include the venue (liquidity provider) where the deal was executed. However, in smartTrade v4.23, that information comes from the Post Trade feed rather than from LOLD reports. Therefore, we must adapt our existing implementation to update our broker deals using this new flow.

Once upgraded, we'll also take advantage of the smartTrade "internal book" feature. Instead of creating a contract with an LP for each deal Ebury needs, smartTrade can serve a deal directly, i.e. using Ebury as the LP, according to some pre-configured rules. These internal deals increase our exposure to different currencies, and hence our risk. So these rules also determine under which circumstances smartTrade may automatically create a deal with an LP to "net" our position in a specific currency (auto-hedging).

Again, Ebury will receive the information about these netting deals from the Post Trade FIX session. Therefore, we need to implement a new flow to support these netting deals.

Note that Ebury currently also runs an internal book of its own. When we get a rate from Xignite (a 3rd-party aggregator) and create a deal, we're also using Ebury as the LP. The Operations team then checks our exposure periodically in BOS (Position Aggregator) so that they can net our position. Also note that the smartTrade internal book doesn't replace the existing internal book, but it might reduce its use as we support more and more currencies in smartTrade, thereby reducing the workload of the Operations team.

Finally, by making much greater use of an internal book, we'll significantly reduce the workload of the Trade Support and Treasury teams. Each deal with an LP must be verified and the corresponding payments must be confirmed. Some of these processes are complex, require significant human intervention and, worse yet, differ from LP to LP. Overall, we'll significantly reduce SWIFT MT942 traffic for all the associated debits and credits in the settlement accounts, as well as Recon processing. The more we exploit the smartTrade internal book, the more we'll avoid these costly processes.

We could implement this project without changing our current architecture. However, Ebury is trying to evolve internally towards a more robust and scalable platform (Ebury 2.0). Given the significant benefits this project is expected to bring, the solution described below tries to strike a balance between the time required to deliver the smartTrade internal book and architectural changes that help both now and in that future architecture.

Solution

The smartTrade Internal Book Project has been divided into five phases, each of them bringing a specific business benefit. In the remainder of this section, we summarize these phases and the proposed architectural changes.

Phase 1: Upgrade smartTrade to version 4.23

Business benefit

We'll be able to use new functionalities provided with version 4.23, but not the internal book yet.

High level tasks

In this phase, we're only interested in addressing some of the existing shortcomings and then upgrading smartTrade.

  1. Internal Book RFC
  2. Improve current platform observability (logs and monitoring)
  3. Improve automated testing
  4. Improve QFS deployment pipeline
  5. Upgrade smartTrade to 4.23 in production

Architecture

No architectural changes are required for this phase.

Phase 2: Consume the new Post Trade feed

Business benefit

Broker deals are created from the new Post Trade feed while risks are minimized by keeping the LALO endpoint we've been using so far.

High level tasks

  1. Create infrastructure component to connect with smartTrade's Post Trade endpoint via FIX (QFC smartTrade)
  2. Create gateway to provide Ebury services with access to the FIX domain of smartTrade (smartTrade Gateway)
  3. Confirm Post Trade feed conveys the same data as the LALO confirmation messages (via Data report)
  4. Replace LALO confirmations with the Post Trade feed

Architecture

In this phase, we introduce all the architectural changes required to accomplish this project.

Internal Book Phase 2

QFC smartTrade

The QuickFix Connection (QFC) is an infrastructure component that maintains a connection with an external FIX provider (smartTrade in this case) and the corresponding FIX sessions (Post Trade in this case). As part of maintaining the FIX session, it implements the admin interface of the protocol flow, i.e. Logon, Logout, Heartbeat, and TestRequest messages.

The remaining protocol messages (application interface) are forwarded to/from a FIX Gateway (smartTrade Gateway in this case) via Kafka for further processing. In particular, QFC receives ExecutionReport FIX messages from smartTrade. QFC doesn't have to fully understand these messages; it only needs to know enough about them to obtain an appropriate Kafka record key, so that reports related to the same market order are delivered in order (records with the same key are written to the same partition, and Kafka preserves order within a partition).

  • Producer
    • Topic: fix-protocol-v42.smarttrade.posttrade.v1
    • Message: FixMessage
    • Key: ClOrdLinkID
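
A minimal sketch of the QFC routing decision, assuming raw FIX strings with the standard SOH delimiter. In practice QuickFIX already separates the two interfaces through its `fromAdmin`/`fromApp` callbacks; the sketch only illustrates the classification and how the record key could be derived. The ClOrdLinkID tag number (583) is its standard FIX value and is an assumption here: the topic targets FIX 4.2, so smartTrade's specification should be checked.

```python
from typing import Dict, Optional, Tuple

SOH = "\x01"  # standard FIX field delimiter

# Session-level (admin) MsgType values handled by QFC itself: Heartbeat (0),
# TestRequest (1), ResendRequest (2), Reject (3), SequenceReset (4),
# Logout (5) and Logon (A).
ADMIN_MSG_TYPES = {"0", "1", "2", "3", "4", "5", "A"}

MSG_TYPE_TAG = "35"
# Assumption: smartTrade sends ClOrdLinkID with its standard tag number.
CLORDLINKID_TAG = "583"


def parse_fields(raw: str) -> Dict[str, str]:
    """Split 'tag=value<SOH>...' into a dict (repeating groups are ignored)."""
    fields: Dict[str, str] = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


def route(raw: str) -> Tuple[str, Optional[str]]:
    """Return ('admin', None) for session messages, or ('forward', key) for
    application messages to be produced to Kafka with that record key."""
    fields = parse_fields(raw)
    if fields.get(MSG_TYPE_TAG) in ADMIN_MSG_TYPES:
        return "admin", None
    # Reports of the same order share ClOrdLinkID, so they land in the same
    # partition. A missing key (None) would lose that ordering guarantee.
    return "forward", fields.get(CLORDLINKID_TAG)
```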

Note that v42 is the version of the FIX protocol, while v1 is the version of the topic itself.
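
One reading of the naming convention, inferred from the two topic names in this RFC, is `<domain>.<provider>.<stream>.v<topic version>`, with the FIX protocol version embedded in the domain segment. The hypothetical `TopicName` helper below parses and renders names under that assumption, which keeps a FIX protocol upgrade and a change to the topic's own format independently versioned.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: <domain>.<provider>.<stream>.v<topic version>
TOPIC_RE = re.compile(
    r"^(?P<domain>[a-z0-9-]+)\.(?P<provider>[a-z0-9-]+)"
    r"\.(?P<stream>[a-z0-9-]+)\.v(?P<version>\d+)$"
)


@dataclass(frozen=True)
class TopicName:
    domain: str    # e.g. "fix-protocol-v42" (FIX 4.2) or "fix-gateway"
    provider: str  # e.g. "smarttrade"
    stream: str    # e.g. "posttrade" or "execution-reports"
    version: int   # version of the topic itself

    def __str__(self) -> str:
        return f"{self.domain}.{self.provider}.{self.stream}.v{self.version}"

    @classmethod
    def parse(cls, name: str) -> "TopicName":
        match = TOPIC_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected topic name: {name!r}")
        return cls(match["domain"], match["provider"], match["stream"], int(match["version"]))
```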

Finally, QFC logs the content of the Post Trade FIX messages, just like QFS does for LALO FIX messages. This allows the Data team to keep running the market reports they currently run. Also, it will help us confirm that the information contained in ExecutionReports from LALO and Post Trade is the same.
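
To help confirm that LALO and Post Trade ExecutionReports carry the same information, a comparison could pair reports by ClOrdLinkID (assuming both feeds carry the same value) and diff a selected set of tags. The tag list below is hypothetical (standard FIX 4.2 tag numbers; which tag carries the venue in smartTrade's feeds must be confirmed), and the inputs are assumed to be already-parsed `tag -> value` dictionaries.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical selection of FIX 4.2 tags: Symbol, Side, LastShares, LastPx, LastMkt.
COMPARED_TAGS = ("55", "54", "32", "31", "30")

Mismatch = Tuple[str, str, Optional[str], Optional[str]]


def diff_reports(
    lalo: Dict[str, Dict[str, str]],
    post_trade: Dict[str, Dict[str, str]],
    tags: Iterable[str] = COMPARED_TAGS,
) -> List[Mismatch]:
    """Return (ClOrdLinkID, tag, LALO value, Post Trade value) for every difference.
    A report missing on one side shows up as None values for that side."""
    tags = tuple(tags)
    mismatches: List[Mismatch] = []
    for link_id in sorted(lalo.keys() | post_trade.keys()):
        left = lalo.get(link_id, {})
        right = post_trade.get(link_id, {})
        for tag in tags:
            if left.get(tag) != right.get(tag):
                mismatches.append((link_id, tag, left.get(tag), right.get(tag)))
    return mismatches
```

Values are compared as strings, so numeric fields such as prices may need normalizing (for example with `decimal.Decimal`) to avoid false mismatches like `1.10` vs `1.1`.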

smartTrade Gateway

The smartTrade Gateway is able to fully (de)serialize FIX messages and respond appropriately. For instance, for the Post Trade session, it processes ExecutionReport FIX messages and publishes an integration event to notify interested systems.

  • Consumer:
    • Topic: fix-protocol-v42.smarttrade.posttrade.v1
    • Message: FixMessage
  • Producer:
    • Topic: fix-gateway.smarttrade.execution-reports.v1
    • Event: ExecutionReportReceived (integration event)
    • Key: ClOrdLinkID
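
A sketch of the translation the smartTrade Gateway performs, from a parsed ExecutionReport to a keyed ExecutionReportReceived payload. The event fields, the JSON encoding and the tag mapping are all assumptions, since the integration event schema isn't defined in this RFC. Prices and quantities are kept as strings so no precision is lost to floating point.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ExecutionReportReceived:
    """Hypothetical integration-event schema."""
    cl_ord_link_id: str
    exec_id: str
    symbol: str
    side: str
    last_qty: str        # LastShares in FIX 4.2
    last_px: str
    venue: Optional[str]  # assumption: venue carried in LastMkt


# FIX 4.2 tag numbers; ClOrdLinkID (583) and the venue tag (30) are assumptions.
TAGS = {"cl_ord_link_id": "583", "exec_id": "17", "symbol": "55", "side": "54",
        "last_qty": "32", "last_px": "31", "venue": "30"}
OPTIONAL_FIELDS = {"venue"}


def to_event(fields: Dict[str, str]) -> Optional[Tuple[str, bytes]]:
    """Map a parsed FIX message to (record key, JSON payload), or None if it
    isn't an ExecutionReport (MsgType 8)."""
    if fields.get("35") != "8":
        return None
    values = {name: fields.get(tag) for name, tag in TAGS.items()}
    missing = [n for n, v in values.items() if v is None and n not in OPTIONAL_FIELDS]
    if missing:
        raise ValueError(f"ExecutionReport missing required fields: {missing}")
    event = ExecutionReportReceived(**values)
    payload = json.dumps({"type": "ExecutionReportReceived", **asdict(event)})
    return event.cl_ord_link_id, payload.encode("utf-8")
```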

Taken together, QFC and the smartTrade Gateway replace QFS. We split them into two services so that all protocol logic can be implemented in the gateway, which can run with high availability and be deployed with zero downtime.

FXS Gateway

The FXS Gateway consumes execution reports and, when appropriate, calls a new FXS REST endpoint to create the corresponding Swap and BrokerDeal models in its database.

  • Consumer:
    • Topic: fix-gateway.smarttrade.execution-reports.v1
    • Event: ExecutionReportReceived (integration event)
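
One way the FXS Gateway handler could work is sketched below. Because Kafka delivers messages at least once, the same event may arrive twice, so the handler skips ExecIDs it has already processed. `create_deal` stands in for the new (still undefined) FXS REST endpoint, and `is_relevant` for whatever "when appropriate" ends up meaning; both are hypothetical.

```python
from typing import Callable, Dict, Optional, Set

Event = Dict[str, Optional[str]]


class FxsGatewayHandler:
    """Hypothetical handler for ExecutionReportReceived events in the FXS Gateway."""

    def __init__(self, create_deal: Callable[[Event], None],
                 is_relevant: Callable[[Event], bool]) -> None:
        self._create_deal = create_deal    # e.g. wraps an HTTP POST to the new FXS endpoint
        self._is_relevant = is_relevant    # the "when appropriate" rule, still to be defined
        self._processed: Set[str] = set()  # ExecIDs already turned into deals (in memory only)

    def handle(self, event: Event) -> bool:
        """Return True if FXS was called for this event."""
        exec_id = event["exec_id"]
        if exec_id is None:
            raise ValueError("event has no exec_id")
        if exec_id in self._processed or not self._is_relevant(event):
            return False
        # An exception here leaves the ExecID unrecorded, so a redelivery retries it.
        self._create_deal(event)
        self._processed.add(exec_id)
        return True
```

The in-memory set is lost on restart, so the FXS endpoint should also be idempotent (for instance, refusing a second BrokerDeal for the same ExecID); the local check only avoids redundant calls.
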

BOS Gateway

The BOS Gateway consumes execution reports and, when appropriate, calls a new BOS REST endpoint to create the corresponding Swap and BrokerDeal models in its database.

  • Consumer:
    • Topic: fix-gateway.smarttrade.execution-reports.v1
    • Event: ExecutionReportReceived (integration event)
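
Both gateways consume the same topic, so they share the same delivery concern: a deal must not be lost if the REST call fails. A common approach, sketched here against a simplified in-memory record list rather than a real Kafka client, is to disable auto-commit (`enable.auto.commit=false`) and advance the committed offset only after the BOS (or FXS) call succeeds. As in Kafka, the committed offset is the next offset to read. The `Record` type and `consume_batch` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    offset: int
    value: Dict[str, str]


def consume_batch(records: List[Record], committed: int,
                  handle: Callable[[Dict[str, str]], None]) -> int:
    """Process records in offset order and return the new committed offset.

    Stops at the first failure so the failing record is delivered again on
    the next poll (at-least-once processing)."""
    next_offset = committed
    for record in records:
        if record.offset < next_offset:
            continue  # already processed in an earlier batch
        try:
            handle(record.value)  # e.g. the call to the new BOS REST endpoint
        except Exception:
            break  # leave the offset uncommitted; log and count the failure here
        next_offset = record.offset + 1
    return next_offset
```

Stopping at the first failure, rather than skipping the record, preserves per-key ordering at the cost of blocking the partition until the error clears, which is why such failures need to be alerted on.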

Phase 3: Identify the booking platform

Business benefit

We'll extend the Position Aggregator in BOS so that it can differentiate between the current internal book and the new one from smartTrade.

High level tasks

  1. Add “booking platform” information to BOS for broker deals created via smartTrade
  2. Extend the Position Aggregator page with a new booking platform filter
  3. Update QXT files with booking platform information (TBC)
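
The booking platform filter could work roughly as follows. The `BookingPlatform` values, the `BrokerDeal` fields and the sign convention (currency bought counts as positive, currency sold as negative) are hypothetical, not BOS's actual models.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Dict, Iterable, Optional


class BookingPlatform(Enum):
    # Hypothetical values; the actual list is to be defined in BOS.
    EBURY_INTERNAL = "ebury-internal"
    SMARTTRADE = "smarttrade"


@dataclass(frozen=True)
class BrokerDeal:
    buy_currency: str
    buy_amount: Decimal
    sell_currency: str
    sell_amount: Decimal
    booking_platform: BookingPlatform


def net_positions(deals: Iterable[BrokerDeal],
                  platform: Optional[BookingPlatform] = None) -> Dict[str, Decimal]:
    """Net position per currency, optionally restricted to one booking platform."""
    positions: Dict[str, Decimal] = defaultdict(Decimal)
    for deal in deals:
        if platform is not None and deal.booking_platform is not platform:
            continue
        positions[deal.buy_currency] += deal.buy_amount
        positions[deal.sell_currency] -= deal.sell_amount
    return dict(positions)
```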

Architecture

No architectural changes with respect to Phase 2 are required.

Phase 4: Internal booking with manual hedging

Business benefit

Migrate the FIX session from LALO to the new LOLD endpoint. The business will be able to activate internal booking in smartTrade, but our position must still be netted manually on a regular basis (e.g. daily).

High level tasks

  1. Migrate FIX session from LALO to LOLD
  2. Review Quantum reports
  3. Create General Ledger within Quantum

Architecture

In this phase, we keep the architecture we introduced before but change the implementation of the QFS smartTrade service. Instead of using LALO, QFS will maintain a FIX session with the LOLD internal endpoint of smartTrade.

Internal Book Phase 4

Phase 5: Internal booking with automated hedging

Business benefit

Our position within the internal book will be automatically netted thanks to the auto-hedging mechanism within smartTrade.

High level tasks

  1. Mark auto-hedging deals with the label “Treasury FX Deal”
  2. Add “type” as a filter option in BOS page “All Broker Deals”

Architecture

No architectural changes with respect to Phase 4 are required.

Alternatives

Support LOLD in QFC instead of QFS

This option is very attractive because it would allow us to migrate gradually from LALO to LOLD. For example, we might start by moving just 10% of the quote requests and orders to LOLD, and then increase that percentage in steps until we reach 100%. During the migration, QFS would maintain a LALO session with smartTrade while QFC simultaneously maintains a LOLD session.
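
If this alternative were chosen, the percentage-based rollout could be implemented by hashing a stable identifier into one of 100 buckets. A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is salted per process and would route the same request differently across instances. The function name and the use of the quote request ID as routing key are assumptions; the order that books a quote would need to follow its quote's session.

```python
import hashlib


def use_lold(routing_key: str, rollout_percent: int) -> bool:
    """Return True if the request identified by `routing_key` should go to LOLD.

    Hypothetical helper: the key is hashed into one of 100 buckets, and buckets
    below `rollout_percent` go to LOLD. The same key always lands in the same
    bucket, and raising the percentage only adds buckets, so requests already
    routed to LOLD stay there.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```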

Another benefit of this approach is that it mitigates an additional risk. With our proposal above, we'll be booking quotes using a piece of infrastructure (Redis) and receiving the information of the corresponding deals from a different one (Kafka). We must consider how to proceed if Kafka fails but Redis doesn't.

However, this requires significantly more work, since the smartTrade Gateway would have to support all the flows we already have in QFS. While this is our objective in the medium term, it might be too much work for this project. In any case, it's worth considering.

Keep using QFS instead of QFC + smartTrade Gateway

Going in the opposite direction to the previous alternative, we might drop QFC + smartTrade Gateway altogether and focus on the existing QFS. This option is viable, but splitting the service into two components has the advantages discussed above. Also, we think this is a good opportunity to take the first steps towards our future event-driven platform.

In any case, we could fall back to this alternative if we found an unexpected issue with the proposed approach that could significantly delay the delivery of the project.

Keep using Redis instead of Kafka

For similar reasons, we believe this project is a good opportunity to start gaining operational experience with Kafka and consolidating our infrastructure around this message broker. In the future, we'll be able to stop using Redis as a pub/sub broker and therefore maintain a more homogeneous platform.

Create 'Deal Service' instead of 'FXS Gateway'

Since the logic behind Swaps and Broker Deals is much simpler in FXS than in BOS, we might take a different approach for the former.

Instead of creating a gateway, in this case we might set up a new component that could eventually become a Deal Service. The Deal Service wouldn't ask FXS for anything. On the contrary, it would be responsible for deal aggregates and FXS would query this service whenever it must show any Broker Deal.

We find this option quite interesting, but it opens up additional questions and therefore it could delay the completion of this project.

Caveats

We cannot provide QFC with high availability for the same reason we can't with QFS: we only have one FIX session per application and environment with the 3rd-party booking platform.

Operation

QFC and the smartTrade Gateway must be properly monitored with Prometheus, and we'll create the corresponding alerts in Alertmanager. Any message the Gateway cannot parse will be logged, and an alert will be raised if the parsing error rate exceeds a predefined threshold.
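
In production, the parse error rate would typically be computed from two Prometheus counters (parsed and failed messages) by an alerting rule, with the resulting alert routed through Alertmanager. The sketch below only illustrates the threshold logic over a sliding time window; the class name, window and threshold are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateMonitor:
    """Share of FIX messages the Gateway failed to parse over the last window."""

    def __init__(self, window_seconds: float, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, timestamp: float, failed: bool) -> None:
        self._events.append((timestamp, failed))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def error_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_alert(self, now: float) -> bool:
        return self.error_rate(now) > self.threshold
```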

The Kafka cluster must be monitored and logs must be enabled. In particular, the lag between consumers and producers should be tracked in order to anticipate any potential performance bottleneck.
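
Consumer lag per partition is the difference between the partition's log-end offset and the consumer group's committed offset; `kafka-consumer-groups.sh --describe` reports the same figure in its LAG column. A small sketch of that computation, with a hypothetical alerting threshold:

```python
from typing import Dict, List, Optional


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int],
                 start_offsets: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset.

    A partition with no commit yet is measured from its start offset
    (0 unless the caller passes the actual log-start offsets)."""
    start_offsets = start_offsets or {}
    return {
        partition: end - committed.get(partition, start_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    }


def partitions_over(lag: Dict[int, int], max_lag: int) -> List[int]:
    """Partitions whose lag exceeds the (hypothetical) alerting threshold."""
    return sorted(p for p, value in lag.items() if value > max_lag)
```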

Failures to publish or consume ExecutionReportReceived events must be logged, tracked, and alerted on. In other words, once a deal with an LP has been created, our systems must eventually process it, even if our platform suffers a temporary error.
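
On the producer side, a transient Kafka failure shouldn't drop an ExecutionReportReceived event. One option, sketched below, is to retry with exponential backoff and log every failure, re-raising after the last attempt so the gateway doesn't commit the offset of the consumed FixMessage and processes it again later. `send` is a placeholder for a wrapper around a real producer call; the attempt count and delays are illustrative. Kafka clients also retry internally (see the `retries` and `delivery.timeout.ms` producer settings); this outer loop covers failures that surface after those are exhausted.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("smarttrade_gateway")


def publish_with_retry(send: Callable[[], None], max_attempts: int = 5,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Call `send` until it succeeds, doubling the delay after each failure.

    Every failure is logged; the last one is re-raised so the caller can
    leave the source offset uncommitted and raise an alert."""
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return
        except Exception:
            logger.exception("Failed to publish ExecutionReportReceived (attempt %d/%d)",
                             attempt, max_attempts)
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```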

Security Impact

Communication with Kafka is encrypted.

The communication between QFC and smartTrade flows through stunnel like in the case of QFS.

Performance Impact

Since messages exchanged via Redis pub/sub aren't encrypted whereas messages exchanged via Kafka are, the FIX flow might suffer a slight performance impact. However, we expect the added latency to be negligible.

Developer Impact

N/A

Data Consumer Impact

The Data team will be asked to confirm that information coming from both LALO and Post Trade is consistent.

Deployment

The release pipeline for the FIX Gateway must allow for blue/green deployments with graceful shutdowns.
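
Blue/green deployments with graceful shutdowns rely on the old instances finishing their in-flight work when asked to stop; orchestrators such as Kubernetes send SIGTERM and wait a grace period before killing the process. A minimal, hypothetical worker that stops taking new messages on SIGTERM:

```python
import signal
import threading
from types import FrameType
from typing import Callable, Iterable, Optional


class GracefulWorker:
    """Hypothetical consume loop that stops taking new messages after SIGTERM."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def install_signal_handler(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum: int, frame: Optional[FrameType]) -> None:
        self.request_stop()

    def request_stop(self) -> None:
        self._stop.set()

    def run(self, messages: Iterable[str], handle: Callable[[str], None]) -> int:
        """Process messages until the iterable ends or a stop is requested.
        The message being handled when the stop arrives is still finished."""
        processed = 0
        for message in messages:
            if self._stop.is_set():
                break
            handle(message)
            processed += 1
        return processed
```

After `run` returns, the gateway would commit offsets for the processed messages and close its Kafka consumer; a clean close makes the consumer leave its group, so its partitions are reassigned to the new instances without waiting for the session timeout.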

Dependencies

N/A

References