Event Standard

Problem

This document details high-level event standards that help Ebury engineering teams build event-driven services efficiently and help operational teams troubleshoot those services rapidly.

Solution

Document and approve event standards and relevant patterns, including the data definitions associated with events (e.g. type and fields) and how the logic that triggers an event is communicated.

This document aims to provide guiding principles and best practices. It will be the responsibility of the engineering teams to assess individual use cases in accordance with these principles, and discuss with the relevant stakeholders where exceptions are required or a decision point is needed.

Event Definitions

In the context of event-driven design, there are two main event types: domain events and change events. Distinguishing between them is important for identifying the relevant patterns for processing events.

  • Domain events – an explicit event, created as part of a business domain, which is generated by an application/service. These events describe something that has happened, and are represented in the past tense e.g., AccountOpened, AccountClosed, etc. These events are the primary concern for event sourcing patterns.

  • Change events – an event that is generated from a database transaction log when a state transition has occurred. The log is considered internal to the database and captures all changes. These events are the primary concern for change data capture patterns.

Domain and change events are generally not related, unless a change event contains a domain event. For example, an account database logs a change that moves an account from a ‘pending’ status to an ‘open’ status; this change event contains an ‘account opened’ domain event.
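To make this relationship concrete, the sketch below derives a domain event from a change event. It assumes a hypothetical change-event shape with `before` and `after` row images, similar to what many change data capture tools emit; the field names and the `to_domain_event` helper are illustrative only, not an Ebury or tool-specific API.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def to_domain_event(change: Dict[str, Optional[Row]]) -> Optional[Dict[str, Any]]:
    """Return an AccountOpened domain event if this row change opened an account.

    `change` is a hypothetical CDC record holding the row before and after
    the transaction; either image may be None (for an insert or a delete).
    """
    before = change.get("before") or {}
    after = change.get("after") or {}
    if before.get("status") == "pending" and after.get("status") == "open":
        return {"type": "AccountOpened", "account_id": after["account_id"]}
    # Most row changes (e.g. an address update) carry no domain event.
    return None


change = {
    "before": {"account_id": "A-1", "status": "pending"},
    "after": {"account_id": "A-1", "status": "open"},
}
print(to_domain_event(change))  # {'type': 'AccountOpened', 'account_id': 'A-1'}
```

Only the pending-to-open transition yields a domain event; every other row change is still a change event, but not one with business meaning of its own.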

There are other event types relating to the event structure and purpose, but at a high level there are Domain events and Change events.

Other Event Types to consider

  • Command event - a command ‘event’ is a request to carry out an action, usually named with a verb in the imperative. An event, by contrast, is a statement of fact, describing an action in the past tense because it has already occurred. A command is therefore the intent to do something, whereas the event is the record of the action taken.

  • Notification event - a notification that some action has been completed. Notifications are usually generated on the back of another event; whether that is a domain, change or command event depends on the use case.
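The roles of a command, the event it produces and a follow-on notification can be sketched in code. The class and function names below are hypothetical and only illustrate the distinction: the command may be rejected, the event records what happened, and the notification is generated on the back of the event.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class OpenAccount:
    """Command: an imperative request that the handler may still reject."""
    account_id: str


@dataclass(frozen=True)
class AccountOpened:
    """Domain event: a past-tense statement of fact."""
    account_id: str


@dataclass(frozen=True)
class AccountOpenedNotification:
    """Notification: tells interested parties that the action has completed."""
    account_id: str
    message: str


def handle(command: OpenAccount, existing: Set[str]) -> AccountOpened:
    """Validate the command; only if it succeeds is the event recorded."""
    if command.account_id in existing:
        raise ValueError(f"account {command.account_id} already exists")
    existing.add(command.account_id)
    return AccountOpened(command.account_id)


def notify(event: AccountOpened) -> AccountOpenedNotification:
    """Generate a notification on the back of the domain event."""
    return AccountOpenedNotification(event.account_id, f"Account {event.account_id} is open")


accounts: Set[str] = set()
event = handle(OpenAccount("A-1"), accounts)
print(notify(event).message)  # Account A-1 is open
```

Sending `OpenAccount("A-1")` a second time raises an error instead of producing a second `AccountOpened`, which is the practical difference between intent and fact.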

Event Structure

Events are typically structured as key-value pairs. The value stores the detail of the event, while the key provides identification and is used for routing and for aggregating events that share the same key. A key is not required for all event types.

Events are typically structured in one of three ways:

  • Entity events

    -- An entity is a unique ‘object’ in a business context and is keyed on the unique ID of that object.

    -- An entity event describes the properties and state of the entity at that point in time.

    -- For example, at Ebury a broker deal is an entity: the broker deal ID is the key, and the value contains all necessary information relating to that specific broker deal.

    -- Entity events are important in event-driven design, as they record the history of the entity, allowing its state to be materialised for any point in time.

    -- Only the latest entity event is required to determine the current state of the entity.

  • Keyed events

    -- Contains a key, but doesn’t represent an entity.

    -- Usually used for partitioning the stream of events to guarantee data with the same key is allocated to the same partition.

    -- The distinction between an entity event and a keyed event is that a keyed event describes an interaction with an entity, but not the entity itself.

    -- All events with the same key can be aggregated to create a single entity event if appropriate.

    -- An Ebury example would again be a broker deal, but here the events describe the individual changes and interactions with that broker deal. These events could then be combined to create an entity event that represents the current state of the broker deal.

  • Unkeyed events

    -- Used to describe an event as a singular statement of fact.

    -- There is no key involved in this type of event.
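The sketch below contrasts the three structures using hypothetical broker deal events (the `BD-42` key and all field names are illustrative). It shows how keyed events sharing a key can be folded, in order, into an entity event whose value is the deal's current state, and how unkeyed events cannot be attributed to any entity. Amounts use `Decimal` rather than strings.

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional, Tuple

# An event is modelled as a (key, value) pair; unkeyed events have key None.
Event = Tuple[Optional[str], Dict[str, Any]]

# Keyed events: same key, each describing one interaction with the deal.
keyed_events: List[Event] = [
    ("BD-42", {"action": "created", "status": "created",
               "amount": Decimal("1000.00"), "currency": "EUR"}),
    ("BD-42", {"action": "amount_amended", "amount": Decimal("1200.00")}),
    ("BD-42", {"action": "booked", "status": "booked"}),
]

# Unkeyed event: a standalone statement of fact.
unkeyed_event: Event = (None, {"message": "Daily rates published"})


def aggregate(events: List[Event]) -> Dict[str, Dict[str, Any]]:
    """Fold keyed events, in order, into one entity state per key."""
    state: Dict[str, Dict[str, Any]] = {}
    for key, value in events:
        if key is None:
            continue  # unkeyed events cannot be attributed to an entity
        changes = {k: v for k, v in value.items() if k != "action"}
        state.setdefault(key, {}).update(changes)
    return state


# Entity event: keyed on the broker deal ID, value holds the full current state.
entity_event: Event = ("BD-42", aggregate(keyed_events + [unkeyed_event])["BD-42"])
print(entity_event)
```

Because later events overwrite earlier fields, only the latest entity event is needed to know the deal's current state, whereas the keyed events must all be replayed in order.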

While this document aims to be technology agnostic, it’s worth noting, taking Kafka as an example, that the key, if provided, ensures all events with the same key are directed to the same topic partition. The same concept applies in other event brokers such as Apache Pulsar. If no key is provided, events are spread across the topic’s partitions, traditionally in a round-robin fashion (more recent Kafka producers use a ‘sticky’ partitioner that fills a batch for one partition before moving to the next).
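The following sketch shows key-based partition assignment in the abstract. It is not Kafka's partitioner: Kafka's default partitioner hashes the serialised key with murmur2, whereas this stand-in uses CRC32, so real partition numbers differ. The property it illustrates, that a given key always maps to the same partition, is the same. `NUM_PARTITIONS` is a hypothetical value.

```python
import itertools
import zlib
from typing import Iterator, Optional

NUM_PARTITIONS = 6  # hypothetical partition count for the topic

# Cycles 0, 1, ..., NUM_PARTITIONS - 1 for events that have no key.
_round_robin: Iterator[int] = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[bytes]) -> int:
    """Pick a partition: hash the key if present, otherwise round robin."""
    if key is None:
        return next(_round_robin)
    # Stand-in for Kafka's murmur2: deterministic, so same key -> same partition.
    return zlib.crc32(key) % NUM_PARTITIONS
```

Note that the mapping depends on the partition count, so adding partitions to an existing topic changes where a key lands and can break per-key ordering for in-flight data.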

Event keys can be used to compact topics, create aggregates, perform joins and build materialised views.
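As a simplified model of topic compaction, the sketch below keeps only the latest value for each key and treats a `None` value as a tombstone that deletes the key, mirroring how Kafka treats null-valued records in compacted topics. Real compaction runs in the background on log segments rather than instantly; the result here is effectively a materialised view of current state. Keys and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[dict]]


def compact(log: List[Record]) -> Dict[str, dict]:
    """Return the latest value per key; a None value (tombstone) removes the key."""
    latest: Dict[str, dict] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest


log: List[Record] = [
    ("BD-1", {"status": "created"}),
    ("BD-2", {"status": "created"}),
    ("BD-1", {"status": "booked"}),
    ("BD-2", None),  # tombstone: BD-2 deleted
]
print(compact(log))  # {'BD-1': {'status': 'booked'}}
```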

Further detail on the low-level design (LLD) of event structures within Ebury is available in the Event Standards document (link to be provided when that document is published).

The structure of an event is defined by its schema. If the event structure needs to change, the impact on the schema must be assessed to determine whether the change is backwards compatible, the schema must be version controlled, and the change must be communicated to downstream consumers.

Schema Definitions

For producers and consumers to understand the events that form the basis of their communication, both teams must share a common understanding of those events. This shared understanding forms the basis of a ‘data contract’.

The data contract ensures both the data definition (the event structure) and the trigger logic (what causes the event to be generated) for the event are understood. Both the definition and trigger logic may evolve over time.

The best way to enforce a data contract is to define a schema for each event. The schema is defined by the producer, which details the data definition and trigger logic, allowing consumers to build logic and extract the required data based on this schematized data.
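As an illustration of a data contract, the sketch below expresses a hypothetical broker deal schema in the shape of an Apache Avro record schema (written as a Python dict), using Avro's `doc` attribute to record the trigger logic at record level and to clarify individual fields. The `validate` helper is a deliberately minimal, hypothetical checker covering a few primitive types; in practice a schema library and schema registry would perform this check.

```python
from typing import Any, Dict, List

# Hypothetical Avro-style schema: record "doc" = trigger logic, field "doc" = meaning.
BROKER_DEAL_BOOKED: Dict[str, Any] = {
    "type": "record",
    "name": "BrokerDealBooked",
    "doc": "Emitted once, when a broker deal moves to the 'booked' status.",
    "fields": [
        {"name": "deal_id", "type": "string", "doc": "Broker deal ID; also the event key."},
        {"name": "amount_minor", "type": "long", "doc": "Amount in minor currency units."},
        {"name": "currency", "type": "string", "doc": "ISO 4217 currency code."},
        {"name": "is_amended", "type": "boolean", "doc": "True if amended before booking."},
    ],
}

# Minimal mapping from Avro primitive type names to Python types.
_PY_TYPES = {"string": str, "long": int, "boolean": bool, "double": float}


def validate(event: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations (empty if the event conforms)."""
    errors: List[str] = []
    for field in schema["fields"]:
        name, expected = field["name"], _PY_TYPES[field["type"]]
        if name not in event:
            errors.append(f"missing field: {name}")
        # Exact type check, so a bool is not accepted where a long is expected.
        elif type(event[name]) is not expected:
            errors.append(f"{name}: expected {field['type']}, got {type(event[name]).__name__}")
    return errors
```

The amount is carried as an integer number of minor units rather than a string, in line with the narrow-types practice described later in this document.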

Schema Evolution

The schema format must support schema evolution rules, so that producers can update their services without impacting consumers. Such changes may include adding new fields, removing old fields, expanding the scope of an existing field or renaming a field.

Following on from the previous Ebury example, if a broker deal event did not previously hold the fields ‘branch’ and ‘entity’, and we wanted to add them to the event payload, we could use schema evolution to add these fields with a default value where necessary.

Following a schema evolution framework allows these changes to be made while producers and consumers are updated independently of each other.
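The broker deal example can be modelled with a simplified form of Avro-style schema resolution: the reader's schema decides which fields appear, fields the reader does not know are ignored, and fields missing from the data take the reader's default. The field lists and the `UNKNOWN` default below are hypothetical, and the helper ignores Avro features such as type promotion and aliases.

```python
from typing import Any, Dict, List

Fields = List[Dict[str, Any]]

# Hypothetical v1 and v2 field lists for a broker deal event.
BROKER_DEAL_V1: Fields = [
    {"name": "deal_id", "type": "string"},
    {"name": "amount_minor", "type": "long"},
]
BROKER_DEAL_V2: Fields = BROKER_DEAL_V1 + [
    {"name": "branch", "type": "string", "default": "UNKNOWN"},
    {"name": "entity", "type": "string", "default": "UNKNOWN"},
]


def resolve(record: Dict[str, Any], reader: Fields) -> Dict[str, Any]:
    """Project a record written with any schema onto the reader's schema."""
    out: Dict[str, Any] = {}
    for field in reader:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]  # field missing from the data
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return out  # fields unknown to the reader are dropped


old_event = {"deal_id": "BD-42", "amount_minor": 120000}
new_event = {**old_event, "branch": "BRANCH-1", "entity": "ENTITY-1"}
print(resolve(old_event, BROKER_DEAL_V2))  # new consumer, old data: defaults applied
print(resolve(new_event, BROKER_DEAL_V1))  # old consumer, new data: extra fields ignored
```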

Schemas can have multiple compatibility types:

  • None - no checks for schema compatibility (not recommended).

  • Full - the new schema is both forwards and backwards compatible.

  • Forwards compatibility - data produced with the new schema can be read as if it was produced with the old schema. Consumers can therefore continue to read new data without being updated, unless they want to access the new fields.

    -- Using the broker deal example above, the new entity and branch fields would be ignored by the consumer until it is updated; the event could still be consumed without problems with these fields omitted.

  • Backwards compatibility - data produced with the old schema can be read as if it was produced with the new schema. This allows a consumer to release an update before the producer does.

    -- Using the broker deal example above, the consumer could update its own read schema to include entity and branch, applying their default values until the producer releases the update that provides these fields in the event. The consumer can therefore adopt the new fields before they exist in the producer payload.
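The compatibility types above can be expressed as checks over two schema versions. The sketch below is a simplified, hypothetical checker in the spirit of the checks performed by schema registries: a reader can read a writer's data if every reader field either exists in the writer with the same type or has a default. It ignores type promotion, unions and nested records, and the field lists are hypothetical.

```python
from typing import Any, Dict, List

Fields = List[Dict[str, Any]]


def _can_read(reader: Fields, writer: Fields) -> bool:
    """True if data written with `writer` can be read with `reader`."""
    writer_types = {f["name"]: f["type"] for f in writer}
    for field in reader:
        if field["name"] in writer_types:
            if writer_types[field["name"]] != field["type"]:
                return False  # simplification: no type promotion
        elif "default" not in field:
            return False  # reader needs a value the writer never supplies
    return True


def backward_compatible(old: Fields, new: Fields) -> bool:
    """New-schema consumers can read old data."""
    return _can_read(reader=new, writer=old)


def forward_compatible(old: Fields, new: Fields) -> bool:
    """Old-schema consumers can read new data."""
    return _can_read(reader=old, writer=new)


def full_compatible(old: Fields, new: Fields) -> bool:
    return backward_compatible(old, new) and forward_compatible(old, new)


V1: Fields = [{"name": "deal_id", "type": "string"}, {"name": "amount_minor", "type": "long"}]
V2: Fields = V1 + [
    {"name": "branch", "type": "string", "default": "UNKNOWN"},
    {"name": "entity", "type": "string", "default": "UNKNOWN"},
]
print(full_compatible(V1, V2))  # True
```

Adding `branch` without a default would still be forwards compatible but no longer backwards compatible, which is why defaults matter when consumers release first.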

Breaking Schema Changes

Breaking changes may be required because a business requirement change alters the model of the original domain, or because the original domain was improperly scoped or contained a human error. In either situation the producer should consider the impact on downstream systems. The producing team should speak to consumer owners as early as possible, communicate the changes, and version the schema and streaming API, setting a deadline by which the original version will be deprecated.

A potential example would be if Ebury redefined the relationship between clients and deals, resulting in a restructure of the data model.

Breaking changes have a large impact on entities that exist indefinitely, and less of an impact on events that expire after a short period of time.

Breaking changes should be a last resort, as most changes can be accommodated elegantly with schema evolution.

Event Data Definitions

Events form the basis for long term and implementation agnostic data storage, and the communication mechanism between services (producers and consumers). Therefore it’s important that all who publish and subscribe to these events have a common understanding of the meaning of the data.

Schema definition formats such as Avro and Protobuf establish a common understanding of event definitions, minimise the impact of changes on consumers where possible and, where a change is breaking, help ensure it is communicated between producers and consumers.

Well-designed events minimise otherwise repetitive pain points for both producers and consumers. The following best practices should therefore be implemented and, where that is not possible, the trade-offs considered.

  • An event should contain all the information required to determine the action that occurred, so that the consumer does not have to consult any other source to establish further information about the event. It should be treated as the complete source of truth.

  • Use the narrowest appropriate data types to avoid ambiguity, e.g. don’t use a string to store a numeric value, or an integer to store a boolean. This helps serialisation, unit tests, code generators and language type checking (where applicable).

  • Minimise event size - small, well-defined, easily processed events are the aim. Very large events do occur, especially when following the first principle above and there is a lot of contextual information, but it is important to ensure the data is directly relevant to the event, rather than added in case of a potential, as yet undefined, future use case. It is important to recognise the use case and bounded context in this scenario.

  • The time at which an event took place should be easy to locate in the event, as should the time at which the event was processed.

  • Schema definition comments - comments explaining why an event is triggered (typically in the header of the event definition) and comments clarifying individual fields in a schema provide useful context for an event. They may be used by operational staff during troubleshooting, or by engineers building consumers to avoid misinterpretation.

  • Data quality – When creating data entities for an event stream it is important to ensure the entity is known by the data governance team, so the data can be defined and owners allocated to ensure ongoing maintenance. The data governance process will be detailed in a separate document (reference to be provided once it is available).

  • Keep events single purpose, avoiding overuse of ‘type’ values to identify sub-features of an event. Each ‘type’ often has a different business meaning, which can change or evolve over time. Some ‘types’ may also need different parameters to track type-specific information, until eventually the schema holds very different events under a single definition.

  • Use a single event definition per stream - it is not advisable to mix different types of events within an event stream, as doing so complicates both the definition of the event and the decision of when to use that stream. It also adds complexity to the schema, so it should be avoided where possible.

  • There are exceptions to this principle that will be discussed below.

    When can multiple event types be put in the same Kafka topic?

    The general rule is that all events of the same type go in the same topic, and different topics are used for different event types. This keeps schema management clean: it avoids overloaded event definitions, complex consumer logic and the need to monitor different trigger logic, which in turn reduces the number of changes to existing schemas.

    Having multiple event types in the same topic can also lead to consumers consuming many events that they are not interested in.

    However, where Kafka is used in a more database-like fashion, rather than just streaming events, a more important factor to consider is event order.

    We may have an entity on which different actions can occur, where the ordering of those actions matters. A generic example is a customer entity, where the customer can have the following actions:

    -- Created
    -- Change an address
    -- Change status

    If a change of address event is consumed prior to the creation event, this can cause issues in consumer logic in some use cases.

    Events about the same entity - most commonly, the order of events matters when they are about the same entity, so all events that define an aggregate for that entity go into the same topic under the same partition key.

    Events about different entities - if one entity depends on another, or the two are often needed together (e.g. customer and address), they can go in the same topic; however, if they are unrelated and managed by different teams, they should be in separate topics. Also, if one entity has a much higher event throughput than the other, the two are better split into different topics to avoid overwhelming consumers who only want the events for the low-throughput entity.

    Events about multiple entities - if an event relates to multiple entities, for example a purchase that relates to a customer, a product, a payment and an account, it may need to be processed as a single atomic message and placed on a topic as one event. Give the event a UUID so that, if necessary, it can be split further in downstream processes while remaining traceable.

    Kafka Streams State Store - A changelog topic for a KTable should be kept separate from all other topics. This is managed by a Kafka Streams process.

    Where this exception applies will depend on the use case, requirements and architecture patterns for event processing being followed.
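The sketch below illustrates why keying by entity matters for ordering. It uses hypothetical customer events keyed by customer ID: each event is appended to a partition chosen from its key (CRC32 stands in for the broker's real hash), each partition preserves append order, and a consumer reading partition by partition therefore sees a customer's creation before its later changes. Applying an address change for an unknown customer fails, which is the consumer issue described above. All names and values are illustrative.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical customer event: (customer_id, action, changed fields)
CustomerEvent = Tuple[str, str, dict]


def partition_log(events: List[CustomerEvent], num_partitions: int) -> Dict[int, List[CustomerEvent]]:
    """Append each event to the partition for its key; order within a partition is kept."""
    partitions: Dict[int, List[CustomerEvent]] = defaultdict(list)
    for event in events:
        partitions[zlib.crc32(event[0].encode()) % num_partitions].append(event)
    return partitions


def apply_event(state: Dict[str, dict], event: CustomerEvent) -> None:
    """Apply one event to the consumer's view of customers."""
    customer_id, action, fields = event
    if action == "Created":
        state[customer_id] = dict(fields)
    elif customer_id not in state:
        # e.g. AddressChanged consumed before Created: nothing to update
        raise LookupError(f"{action} for unknown customer {customer_id}")
    else:
        state[customer_id].update(fields)


events: List[CustomerEvent] = [
    ("C-1", "Created", {"address": "1 High St", "status": "pending"}),
    ("C-2", "Created", {"address": "5 Mill Ln", "status": "pending"}),
    ("C-1", "AddressChanged", {"address": "2 Park Rd"}),
    ("C-1", "StatusChanged", {"status": "active"}),
]

view: Dict[str, dict] = {}
for partition in partition_log(events, num_partitions=3).values():
    for event in partition:  # each partition is consumed in order
        apply_event(view, event)
print(view["C-1"])  # {'address': '2 Park Rd', 'status': 'active'}
```

Ordering is only guaranteed within a partition, so the order in which partitions are consumed does not matter here: all of C-1's events share a key and therefore a partition.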

Common Event Attributes

The following data attributes should be common across events:

  • Id - Unique event identifier.
  • SchemaId - identifies the schema version used to create the event. The specific implementation of this isn’t the subject of this document, but should be available.
  • CorrelationId - Unique identifier attached to the message to support referencing a particular transaction or chain of events.
  • EventAction - Type of the event (CREATE, UPDATE, DELETE).
  • Payload - Contains all of the event data in a single field.
  • Timestamp - Timestamp of the event.
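One way to capture these common attributes is a typed envelope. The sketch below is a hypothetical Python rendering, not a prescribed implementation. It uses narrow types (`uuid.UUID`, an `Enum` for the event action, a timezone-aware `datetime`) in line with the event data best practices, and adds a hypothetical `processed_at` field for the processing timestamp. The integer `schema_id` assumes a registry-assigned ID, which this document deliberately leaves open.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Optional


class EventAction(Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


@dataclass(frozen=True)
class EventEnvelope:
    schema_id: int                 # hypothetical: ID of the schema version used
    correlation_id: uuid.UUID      # shared by every event in one chain
    event_action: EventAction
    payload: Dict[str, Any]        # all event data in a single field
    timestamp: datetime            # when the event occurred (timezone-aware)
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # unique per event
    processed_at: Optional[datetime] = None  # hypothetical processing timestamp

    def __post_init__(self) -> None:
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


chain = uuid.uuid4()
created = EventEnvelope(1, chain, EventAction.CREATE, {"deal_id": "BD-42"},
                        datetime.now(timezone.utc))
updated = EventEnvelope(1, chain, EventAction.UPDATE, {"deal_id": "BD-42", "status": "booked"},
                        datetime.now(timezone.utc))
```

Both envelopes share a `correlation_id`, so the chain can be traced, while each receives its own `id`.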

This section may be moved into the existing ‘events based architecture’ document in time, but captured here currently.