Schema Evolution Implementation

As mentioned in the Events Standards RFC, schema management is required to ensure consumers and producers understand an event in the same way, and can identify when an event does not meet the expected standards.

Over time, schemas may need to change to accommodate scenarios such as redundant fields being removed, enriched data being added and definitions changing.

It is important that we have a well-defined communication process between producers and consumers, so that schema changes are accommodated without unwanted impacts on downstream processes.

Schema Registry

A Schema Registry is any central service that enables the sharing of schemas between all services and supports the evolution and checking of those schemas.

Schemas are registered and persisted in a Schema Registry and are allocated an identity.

Events serialised with a schema are associated with its schema identity, which allows consumers to retrieve the schema and deserialise the message.

A Schema Registry typically keeps a version history of all schemas and provides configurable compatibility checks for new schemas.
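The registration and lookup flow described above can be sketched as follows. This is a minimal in-memory illustration only, not a proposal for the registry technology: the class `InMemorySchemaRegistry`, the JSON envelope with a `schema_id` key, and the schema shape `{"required": [...]}` are all hypothetical names chosen for the example.

```python
import json


class InMemorySchemaRegistry:
    """Toy stand-in for a Schema Registry (hypothetical, not a real product API)."""

    def __init__(self):
        self._schemas = {}   # schema identity -> schema definition
        self._versions = {}  # subject -> list of schema identities, oldest first

    def register(self, subject, schema):
        """Persist a schema, append it to the subject's history, return its identity."""
        schema_id = len(self._schemas) + 1
        self._schemas[schema_id] = schema
        self._versions.setdefault(subject, []).append(schema_id)
        return schema_id

    def get(self, schema_id):
        """Look up a schema by the identity carried in a message."""
        return self._schemas[schema_id]

    def history(self, subject):
        """Return the identities of all versions registered for a subject."""
        return list(self._versions.get(subject, []))


def serialise(schema_id, event):
    """Attach the schema identity to the payload (JSON envelope is an assumption)."""
    return json.dumps({"schema_id": schema_id, "payload": event}).encode("utf-8")


def deserialise(registry, message):
    """Fetch the schema named in the message and reject events that do not match it."""
    envelope = json.loads(message.decode("utf-8"))
    schema = registry.get(envelope["schema_id"])
    payload = envelope["payload"]
    missing = [name for name in schema["required"] if name not in payload]
    if missing:
        raise ValueError(f"event is missing required fields: {missing}")
    return payload
```

A producer would call `register` once per schema version and `serialise` per event; a consumer only needs the registry and the message to call `deserialise`.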

The technology choice for implementing a Schema Registry will be covered in a subsequent RFC.

Schema Changes

The producer owns the schema and is responsible for registering the schema with a Schema Registry.

Depending on the compatibility type, the new schema sometimes needs to be implemented by the consumer before it can be used by the producer; this is the case for backward-compatible changes.

Compatibility   Producer Schema   Consumer Schema   Changes Allowed in New Schema
Backward        Old               New               Remove fields; add optional fields
Forward         New               Old               Add fields; remove optional fields
Full            Any               Any               Add optional fields; remove optional fields

The table above illustrates compatibility types.
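As a rough illustration of the rules in the table, the sketch below checks a proposed schema against an existing one. It assumes a deliberately simplified schema model, a dict mapping each field name to `True` when the field is optional, and it only considers added and removed fields; type changes and renames are out of scope. The function names are hypothetical.

```python
def is_backward_compatible(old, new):
    """Consumers on the new schema can read old events:
    fields may be removed freely, but every added field must be optional."""
    added = new.keys() - old.keys()
    return all(new[name] for name in added)


def is_forward_compatible(old, new):
    """Consumers on the old schema can read new events:
    fields may be added freely, but every removed field must have been optional."""
    removed = old.keys() - new.keys()
    return all(old[name] for name in removed)


def is_fully_compatible(old, new):
    """Both directions hold: only optional fields are added or removed."""
    return is_backward_compatible(old, new) and is_forward_compatible(old, new)


# Example schemas: field name -> True if the field is optional.
current = {"order_id": False, "note": True}
adds_required = {"order_id": False, "note": True, "amount": False}
drops_required = {"note": True}
adds_optional = {"order_id": False, "note": True, "coupon": True}
```

With these examples, `adds_required` is forward but not backward compatible with `current`, `drops_required` is backward but not forward compatible, and `adds_optional` is fully compatible.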

A Schema Registry typically checks a new schema against the latest registered version to ensure compatibility.

If transitive compatibility is configured, a Schema Registry checks for compatibility against all previous versions of the schema.
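The difference between checking against the latest version and checking transitively can be sketched as below. The schema model (field name mapped to `True` when optional), the backward check, and the function names are assumptions made for illustration; a real registry applies its own rules.

```python
def backward(old, new):
    """Simplified backward check: every field added by `new` must be optional."""
    return all(new[name] for name in new.keys() - old.keys())


def is_compatible(history, candidate, check=backward, transitive=False):
    """Check `candidate` against the latest version, or against every
    previous version when `transitive` is True. `history` is oldest first."""
    if not history:
        return True  # the first version of a schema has nothing to conflict with
    previous = history if transitive else history[-1:]
    return all(check(old, candidate) for old in previous)


# v2 adds "ref" as optional; the candidate then makes "ref" required.
v1 = {"id": False}
v2 = {"id": False, "ref": True}
candidate = {"id": False, "ref": False}

latest_only = is_compatible([v1, v2], candidate)                    # compares with v2 only
all_versions = is_compatible([v1, v2], candidate, transitive=True)  # compares with v1 and v2
```

Here the candidate passes the latest-only check, because "ref" already exists in v2, but fails the transitive check, because events written with v1 carry no "ref" and the candidate requires it.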

Identifying Impacted Consumers

A Google Sheet, Kafka Topic Configurations, is maintained manually and details the Kafka topics and their producers and consumers.

The Terraform configuration in the 'terraform-backoffice' repo is the definitive configuration reference.

Service Ownership

Full details on service ownership can be found in a separate Service Ownership Blueprint.

Each service documents the Kafka topics it consumes and produces.
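Given that per-service documentation, finding the consumers affected by a schema change on a topic is a simple lookup. The sketch below assumes the documentation has been collected into a dict keyed by service name with `produces` and `consumes` lists; that structure, the service names and the topic names are all hypothetical.

```python
def impacted_consumers(services, topic):
    """Return the names of services that document `topic` as consumed, sorted."""
    return sorted(
        name for name, doc in services.items()
        if topic in doc.get("consumes", ())
    )


def producers_of(services, topic):
    """Return the names of services that document `topic` as produced, sorted."""
    return sorted(
        name for name, doc in services.items()
        if topic in doc.get("produces", ())
    )


# Hypothetical documentation gathered from each service.
services = {
    "order-service": {"produces": ["orders"], "consumes": []},
    "billing-service": {"produces": ["invoices"], "consumes": ["orders"]},
    "reporting-service": {"produces": [], "consumes": ["orders", "invoices"]},
}
```

For the example data, `impacted_consumers(services, "orders")` names the billing and reporting services, whose owners would need notice of a change to the `orders` schema.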

Process

When changing schemas:

  • Adequate notice should be given to impacted service owners.
  • A rollout plan needs to be agreed between the owners of the producing and consuming services.
  • Typically, only the current and previous schemas are expected to be active at any given time.

Developer Impact

Schemas are currently shared either at source code level or included in each message.

As the number of services and schemas grows, schemas need to be checked, versioned and shared via a common Schema Registry.

When a Schema Registry is available for use in production:

  • New producers must use the Schema Registry.
  • Schema changes should initiate a refactoring of producers and consumers to use the Schema Registry.

References