Schema Evolution Implementation
As mentioned in the Events Standards RFC, schema management is required to ensure consumers and producers understand an event in the same way, and can identify when an event does not meet the expected standards.
Over time, schemas may need to change to accommodate scenarios such as redundant fields being removed, enriched data being added, and definitions changing.
It is important that we have a well-defined communication process between producers and consumers so that schema changes are accommodated without unwanted impacts on downstream processes.
Schema Registry
A Schema Registry is any central service that enables the sharing of schemas between all services and supports the evolution and checking of those schemas.
Schemas are registered and persisted in a Schema Registry and are allocated an identity.
Events serialised with a Schema are associated with its Schema Identity, which allows consumers to retrieve the Schema and deserialise the message.
A Schema Registry typically keeps a version history of all schemas and provides configurable compatibility checks for new schemas.
The technology choice for implementing a Schema Registry will be covered in a subsequent RFC.
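Independently of the technology chosen, the register-and-look-up flow described above can be sketched as follows. This is a minimal illustration only: `InMemorySchemaRegistry`, `serialise`, `deserialise`, the 4-byte identity prefix, and the `required` key in the schema are all assumptions made for the example, not the API or wire format of any particular Schema Registry.

```python
import json


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a Schema Registry; not a real product's API."""

    def __init__(self):
        self._schemas = {}   # schema identity -> schema definition
        self._history = {}   # subject name -> list of schema identities

    def register(self, subject, schema):
        """Persist a schema and allocate it an identity."""
        schema_id = len(self._schemas) + 1
        self._schemas[schema_id] = schema
        self._history.setdefault(subject, []).append(schema_id)
        return schema_id

    def get(self, schema_id):
        """Return the schema registered under the given identity."""
        return self._schemas[schema_id]

    def versions(self, subject):
        """Return the version history (schema identities) for a subject."""
        return list(self._history.get(subject, []))


def serialise(schema_id, event):
    # Assumed framing: 4-byte big-endian schema identity, then a JSON payload.
    return schema_id.to_bytes(4, "big") + json.dumps(event).encode("utf-8")


def deserialise(registry, message):
    # The consumer reads the identity, fetches the schema, then checks the event.
    schema_id = int.from_bytes(message[:4], "big")
    schema = registry.get(schema_id)
    event = json.loads(message[4:].decode("utf-8"))
    missing = [name for name in schema["required"] if name not in event]
    if missing:
        raise ValueError(f"event does not match schema {schema_id}: missing {missing}")
    return schema_id, event
```

A producer would call `register` once per schema version and pass the returned identity to `serialise`; a consumer only needs the message and access to the registry.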
Schema Changes
The producer owns the schema and is responsible for registering the schema with a Schema Registry.
Sometimes the new schema needs to be implemented by the consumer before it can be used by the producer.
| Compatibility | Producer Schema | Consumer Schema | Changes Allowed in New Schema |
|---|---|---|---|
| Backward | Old | New | Remove fields; add optional fields |
| Forward | New | Old | Add fields; remove optional fields |
| Full | Any | Any | Add optional fields; remove optional fields |
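The table above illustrates the compatibility types: for each type, it shows which schema version the producer and the consumer may be using, and which changes the new schema may contain.
A Schema Registry typically checks a new schema against the last version of the schema to ensure compatibility.
If transitive compatibility is configured, a Schema Registry checks for compatibility against all previous versions of the schema.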
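The compatibility rules can be sketched in a few lines. This is a simplified model under stated assumptions, not how any specific Schema Registry implements its checks: a schema is represented as a dict mapping each field name to `True` if the field is optional and `False` if it is required, and the helper names `can_read` and `is_compatible` are hypothetical.

```python
def can_read(reader, writer):
    """True if a consumer on `reader` can read events produced with `writer`.

    Assumed model: a schema maps field name -> True (optional) / False (required).
    Extra fields in the writer's data are ignored by the reader.
    """
    return all(optional or name in writer for name, optional in reader.items())


def is_compatible(new, history, mode="backward", transitive=False):
    """Check `new` against the last version, or all versions if transitive."""
    previous = history if transitive else history[-1:]
    checks = {
        # Backward: consumers on the new schema read data from the old schema.
        "backward": lambda old: can_read(new, old),
        # Forward: consumers on the old schema read data from the new schema.
        "forward": lambda old: can_read(old, new),
        # Full: both directions must hold.
        "full": lambda old: can_read(new, old) and can_read(old, new),
    }
    return all(checks[mode](old) for old in previous)


v1 = {"id": False}                   # only a required id
v2 = {"id": False, "email": True}    # adds an optional email
v3 = {"id": False, "email": False}   # makes email required

print(is_compatible(v2, [v1], mode="full"))                 # True
print(is_compatible(v3, [v1, v2]))                          # True
print(is_compatible(v3, [v1, v2], transitive=True))         # False
```

The last two calls show why the transitive setting matters: `v3` can read data written with `v2`, but not data written with `v1`, which has no `email` field.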
Identifying Impacted Consumers
A manually maintained Google Sheet, Kafka Topic Configurations, details the Kafka topics and their producers and consumers.
The Terraform configuration in the 'terraform-backoffice' repo is the definitive configuration reference.
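Once the topic-to-service mapping is available in a machine-readable form, finding the consumers affected by a schema change is a simple lookup. The sketch below assumes such a mapping has already been extracted; the topic names, service names, and the `TopicConfig` and `impacted_consumers` helpers are hypothetical and do not reflect the actual contents of the sheet or the Terraform configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicConfig:
    producers: tuple
    consumers: tuple


# Hypothetical example data; real values would come from the topic configuration.
TOPICS = {
    "orders.created": TopicConfig(
        producers=("order-service",),
        consumers=("billing-service", "notification-service"),
    ),
    "payments.settled": TopicConfig(
        producers=("billing-service",),
        consumers=("reporting-service",),
    ),
}


def impacted_consumers(topics, topic):
    """Return the services that consume `topic`, sorted for stable output."""
    return sorted(topics[topic].consumers)


def topics_produced_by(topics, service):
    """Return the topics a service produces, i.e. the schemas it owns."""
    return sorted(name for name, cfg in topics.items() if service in cfg.producers)
```

For example, `impacted_consumers(TOPICS, "orders.created")` lists the services whose owners would need to be notified before the schema for that topic changes.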
Service Ownership
Full details on service ownership can be found in a separate Service Ownership Blueprint.
Each service documents the Kafka topics it consumes and produces.
Process
When changing schemas:
- Adequate notice should be given to impacted service owners.
- A rollout plan needs to be agreed between the owners of the producing and consuming services.
- Typically, only the current and previous schemas are expected to be active at any given time.
Developer Impact
Schemas are currently shared either at source code level or included in each message.
As the number of services and schemas grows, schemas need to be checked, versioned and shared via a common Schema Registry.
The technology choice for implementing a Schema Registry will be covered in a subsequent RFC.
When a Schema Registry is available for use in production:
- New producers must use the Schema Registry.
- Schema changes should initiate a refactoring of producers and consumers to use the Schema Registry.