Asynchronous Command Interface Pattern

Problem Description

Services need to collaborate and communicate with each other.

Synchronous communication results in tight coupling, where both the client and the service must be available for the duration of the request. In a chain of synchronous inter-service calls, all services must be fully operational, and the overall reliability is the product of the reliabilities of each ‘hop’ in the chain.
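As a rough worked example of the reliability claim above, the sketch below multiplies per-hop reliabilities. The 99% figure and the function name chain_reliability are illustrative assumptions, not measurements from any real system, and the calculation assumes hops fail independently.

```python
from math import prod


def chain_reliability(hop_reliabilities):
    """Return the end-to-end reliability of a chain of synchronous calls.

    Assumes each hop fails independently, so the chain succeeds only
    when every hop succeeds.
    """
    return prod(hop_reliabilities)


# Hypothetical figures: five hops, each available 99% of the time.
five_hops = chain_reliability([0.99] * 5)
print(f"{five_hops:.4f}")  # prints 0.9510
```

Under these assumptions, five hops at 99% each give roughly 95% end to end, which is why longer synchronous chains become less reliable.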

As the system scales, latency increases and reliability decreases.

Solution

Use asynchronous, event-based Commands in Kafka to invoke actions in other services.

Asynchronous Command Service Pattern

1: The Client writes a Command Event on a Command Topic dedicated to the Processing Service.

The Command Event includes:

  • client_id - uniquely identifying the client service.
  • request_id - uniquely identifying the request.

The Command Event is generated using the Schema owned and defined by the Processing Service.
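A Command Event carrying the two identifiers above might look like the following minimal sketch. Only client_id and request_id come from the pattern; the action and payload fields, the JSON encoding, and the example values are assumptions made for illustration. In practice the Schema owned by the Processing Service would define the real shape and encoding.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class CommandEvent:
    client_id: str   # uniquely identifies the client service
    action: str      # hypothetical field: the requested action
    payload: dict    # hypothetical field: parameters of the action
    # Uniquely identifies the request; generated per command by default.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_bytes(self) -> bytes:
        """Serialise the event as UTF-8 JSON (an assumed encoding)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


# Hypothetical client name and action.
command = CommandEvent(
    client_id="billing-service",
    action="generate_report",
    payload={"month": "2024-01"},
)
message = command.to_bytes()  # the value a client would write to the Command Topic
```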

2: The Processing Service consumes the event and performs the requested action. In the simplest case, the Command Topic contains one partition and one consumer processing the events in sequence.

3: On completing the request, the Processing Service creates a Reply Event on the dedicated Reply Topic recording the status of the action. The Reply Event includes the client_id and the request_id of the initiating Command.

The Schema of the Reply Event is owned and defined by the Processing Service.

4: If the Client Service is interested in the reply (it may not be), it consumes Reply Events containing its client_id.
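The filtering in step 4 can be sketched as below. The Reply Topic is modelled as a plain list of dictionaries, and the status field, the service names, and the idea of tracking pending request_ids on the client are assumptions for illustration rather than part of the pattern.

```python
def replies_for_client(reply_events, client_id, pending_request_ids):
    """Yield Reply Events addressed to this client for requests it still awaits."""
    for event in reply_events:
        if event["client_id"] != client_id:
            continue  # reply belongs to another client
        if event["request_id"] in pending_request_ids:
            yield event


# Hypothetical contents of a shared Reply Topic.
reply_topic = [
    {"client_id": "svc-a", "request_id": "r1", "status": "OK"},
    {"client_id": "svc-b", "request_id": "r2", "status": "OK"},
    {"client_id": "svc-a", "request_id": "r3", "status": "FAILED"},
]

mine = list(replies_for_client(reply_topic, "svc-a", {"r1", "r3"}))
```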

5: In performing the action initiated by the Command Event, the state of the Processing Service may have changed. If so, the service publishes one or more Domain Events on the Domain Topic(s).

For traceability, each Domain Event includes the client_id and the request_id of the Command that initiated the changes.

The Schema of the Domain Event(s) is owned and defined by the Processing Service.

6: A dependent downstream service consumes the Domain Event(s) and takes appropriate action in response to the change.

Variations

Different use cases may use a subset of the above flows. For example, a Reply Event may not be required if there are no error conditions to report, and Domain Events may not be generated if the command does not affect Domain Data.

Example

Consider a service that has multiple ledgers. A command would be 'Move $X from A to B'. A Reply Event would record success or failure (e.g. 'Insufficient Funds'). On success the Domain Events would be 'New Balance A', 'New Balance B' - recording the new state.
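The ledger example can be sketched in memory as follows. Topics are modelled as Python lists rather than Kafka topics, amounts are whole units, and the class name LedgerService and the field names (source, target, amount, status, reason, type, ledger, balance) are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerService:
    balances: dict
    reply_topic: list = field(default_factory=list)   # stand-in for the Reply Topic
    domain_topic: list = field(default_factory=list)  # stand-in for the Domain Topic

    def handle(self, command: dict) -> None:
        """Process one 'Move $X from A to B' command."""
        src, dst, amount = command["source"], command["target"], command["amount"]
        # Carry the initiating identifiers on every event for traceability.
        ids = {"client_id": command["client_id"], "request_id": command["request_id"]}

        if self.balances.get(src, 0) < amount:
            self.reply_topic.append(
                {**ids, "status": "FAILED", "reason": "Insufficient Funds"}
            )
            return  # no state change, so no Domain Events

        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self.reply_topic.append({**ids, "status": "OK"})
        for ledger in (src, dst):
            self.domain_topic.append(
                {**ids, "type": "NewBalance", "ledger": ledger,
                 "balance": self.balances[ledger]}
            )


service = LedgerService(balances={"A": 100, "B": 20})
service.handle({"client_id": "payments", "request_id": "req-1",
                "source": "A", "target": "B", "amount": 30})
service.handle({"client_id": "payments", "request_id": "req-2",
                "source": "A", "target": "B", "amount": 500})
```

In this sketch the first command succeeds and produces two Domain Events, while the second produces only a failed Reply Event because the state did not change.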

Deprecated Service Integration


Ambassador is a term used to describe a gateway into an internal legacy system, such as BOS. A service inside BOS can be made accessible externally via an Ambassador.

Each exposed service has its own Ambassador using this Asynchronous Command Interface Pattern.

This is a temporary arrangement until the service, and its interface, is moved outside of BOS.

Advantages

Well Defined Interfaces

In most synchronous REST/HTTP API implementations, the interface is not formally specified. There is no explicit contract between Client and Server. Parameter checking is ad hoc. Results are serialised in a free format. There is no automated validation of calls or return values. Interfaces rely heavily on extensive testing, hand-coded checking and explicit serialisation. Interface evolution is complex, often requiring multiple interfaces to be supported simultaneously.

With asynchronous events, the interface between Client and Server is a persistent, static data contract with well-defined and well-behaved schemas. The schemas formally specify the calling data and the return data. They enable automated serialisation and validation of calls and return values. They enable interfaces to evolve in a controlled manner. They provide a robust and well-defined interface that simplifies testing.
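To illustrate what automated validation against a schema means, here is a deliberately small hand-rolled sketch. The schema contents, the amount field, and the validate function are hypothetical. A real deployment would more likely rely on an established schema format and tooling than on code like this.

```python
# Hypothetical schema, owned and defined by the Processing Service.
COMMAND_SCHEMA = {
    "client_id": str,
    "request_id": str,
    "amount": int,
}


def validate(event: dict, schema: dict) -> list:
    """Return a list of contract violations; an empty list means the event is valid."""
    errors = []
    for name, expected_type in schema.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    for name in event:
        if name not in schema:
            errors.append(f"unexpected field: {name}")
    return errors


good = {"client_id": "payments", "request_id": "req-1", "amount": 30}
bad = {"client_id": "payments", "amount": "30"}
```

Because the check is driven entirely by the schema, the same function validates any event type, which is the property the text attributes to schema-based interfaces.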

Clients are Decoupled from their Servers

Synchronous calls require both Client and Server to be running and communicating at the same time.

With asynchronous events, the interface is static data. The Server does not have to be running for the Client to make a request. The Client does not have to be running for the Server to process requests. This decoupling improves reliability and resiliency.

Clients are Isolated from the Server Configuration

In a synchronous call, the routing between different providers has to be done in the client. The client has to determine which service can fulfil a request and call the service directly. This logic needs to be replicated in each client. The logic needs to change in each client if the providers change.

With an asynchronous event, an event is emitted onto a topic (queue). The client does not need to know which service will process the event. A provider consumes events it is responsible for and generates responses. There is no routing logic in the client and it is isolated from the providers. The client is decoupled and does not change when the provider configuration changes.

High Load is Managed Gracefully

If there are many concurrent synchronous calls to a service, each with a compute or memory intensive request (e.g. generate a report), the service may fail under heavy load. To prevent this, synchronous calls must be rate limited. Each client must then cope with failed connections, timeouts and retries whenever rate limiting is applied. As the server capability changes, the rate limits must also be changed.

Client side complexity is introduced to protect the server side capability.

An event-driven service processes events at its own pace, so there is no need for throttling. The client only has to create an event and typically does not have to handle the service being temporarily unavailable. As the service capability increases, events are processed more quickly, so performance scales automatically with no change to the configuration or to the clients.
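The idea of a service working through a backlog at its own pace can be sketched as below. The topic is modelled as an in-memory deque, and the batch size of 3, the drain function, and the event names are assumptions for illustration only.

```python
from collections import deque


def drain(topic: deque, max_batch: int, handle) -> int:
    """Process up to max_batch events from the front of the topic.

    Returns the number of events handled in this poll.
    """
    handled = 0
    while topic and handled < max_batch:
        handle(topic.popleft())
        handled += 1
    return handled


# A burst of 10 commands arrives at once; the client is not throttled.
topic = deque(f"report-{i}" for i in range(10))

done = []
polls = 0
while topic:
    # The service takes only what it can handle per poll.
    drain(topic, max_batch=3, handle=done.append)
    polls += 1
```

Raising max_batch (a stand-in for greater service capacity) clears the same backlog in fewer polls without any change on the client side.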

Resilient to Client Crashes

With synchronous calls, each client is responsible for ensuring the request is completed successfully. This means managing the persistence of the request in a local database. If a client crashes after making a request but before receiving a response, it needs to retry the request on restarting.

With asynchronous events, once a client persists the event, it is guaranteed to be processed in an orderly fashion at some point in the future.

Resilient to Server Crashes

With synchronous calls, circuit breakers must be implemented to handle scenarios where the provider is unavailable. They minimise the disruption caused by a synchronous cascade of timeouts. Furthermore, when a provider recovers, it may be swamped with clients retrying requests (see rate limiting above). Client side complexity is required to manage server side availability.

With asynchronous events, once a client persists the event, no further action is required. Kafka is highly available by design. Providers process events within their capacity.

Enables High Priority Requests

With synchronous calls, all requests to an API have equal priority. There is no mechanism for providing faster responses to high priority clients or endpoints. The only indirect mechanism of achieving this is to artificially throttle most clients so that high priority clients have uncontended access.

With asynchronous events, a high priority and a low priority topic (queue) can be provided. The service consumes both topics and processes events on the high priority topic before any events on the low priority topic.
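One way a consumer could give precedence to the high priority topic is sketched below. The two topics are modelled as in-memory deques, and next_event is a hypothetical helper. The ordering is a choice made by the consuming service in this sketch, not something the topics provide by themselves.

```python
from collections import deque


def next_event(high: deque, low: deque):
    """Return the next event to process, preferring the high priority topic."""
    if high:
        return high.popleft()
    if low:
        return low.popleft()
    return None  # both topics are empty


high = deque(["h1", "h2"])
low = deque(["l1", "l2"])

order = []
while (event := next_event(high, low)) is not None:
    order.append(event)
```

A strict preference like this can starve the low priority topic under sustained high priority load, so a real service may want to reserve some capacity for low priority events.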

Avoids Complex Concurrency Protection

With synchronous calls, concurrency needs to be handled explicitly by the server. Some mechanism (e.g. pessimistic locking) is required to ensure concurrent requests are handled safely. This adds significant complexity to the server and is too often the source of obscure bugs.

With an event interface on one partition, requests are serialised and frequently there is no need for server-side concurrency protection. With multiple partitions, events are routed to partitions (for example, by key) so that related events stay in order on the same partition and multiple consumers can operate independently. This simplifies server side implementation (avoiding locks) and scales with load.
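Key-based routing to partitions can be sketched as follows. The partition_for function, the account keys, and the use of CRC32 are assumptions chosen for a stable, self-contained example. Kafka's own partitioner uses a different hash, so the partition numbers here will not match a real cluster.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # hypothetical partition count

# Hypothetical events keyed by the account they affect.
events = [
    ("account-A", "debit 10"),
    ("account-B", "credit 5"),
    ("account-A", "credit 7"),
]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, body in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, body))

# All events for one key land on one partition, in their original order,
# so a single consumer per partition can process them without locks.
a_partition = partitions[partition_for("account-A", NUM_PARTITIONS)]
a_events = [body for key, body in a_partition if key == "account-A"]
```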

Disadvantages

Mindset Change: Designing Asynchronous interactions requires a different mindset to designing with synchronous calls.

Additional Resources: If a reply is required, the Client needs to consume Reply Events from the Reply Topic.

Debugging: Tracking asynchronous events is more difficult than tracking the results of synchronous calls.

References

  1. “Communicating using the Asynchronous messaging pattern” in Microservices Patterns by Chris Richardson.

  2. “Messaging Request-Reply” in Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions by Gregor Hohpe & Bobby Woolf.

  3. Kafka Standards and Patterns Blueprint