A schema language to describe services and libraries for building RPC stacks

Introduce protobuf / gRPC as part of our services stack.

Problem Description

As we move towards a micro-service driven architecture, we see:

A diversity of technologies in use, with the interfaces of those services becoming buried in the code.
A growing number of hand-crafted client libraries to consume those services.
No way for automated CI checks to confirm whether changes would break consumers (short of writing often convoluted e2e tests that are hard to keep up to date).
An increase in JSON REST APIs meant for internal use only, which waste performance both on the network layer and in data serialization.

Goals

Establish Protobuf as a common language to define services and messages.
Decouple implementation from the contract and expose interfaces in a unified way.
Establish gRPC as the RPC stack for internal service communication. Protobuf is decoupled from the RPC implementation, but gRPC is the most widely used implementation.
Showcase the available tooling and libraries that can leverage the schema, as a tour of ecosystem features (client generation, breaking change detection, REST proxies, testing capabilities, etc.).

Non-goals

Establish a path for rewriting existing services.
Propose a migration from our current Facade pattern (this will be addressed in a separate RFC).

Background

In the beginning there was BOS, with just a few services around it. Then we figured BOS had grown too much on its own and that it was time to move functionality out of it. As we implement features in new, separate services, we now see an increase in non-standard approaches to API documentation, design and client authoring.

At the same time, each of the new services has its client library crafted manually, which is already becoming a maintainability burden that will keep growing as we create more services.

Moreover, by coupling the implementation and the service schema, we make it harder to swap the language and implementation of each service. For example, we already have some services in Python and Node.js, and it may be desirable to create others in Go or other languages (perhaps for Form3 or Kubernetes integrations). If we keep the contracts the same, the implementation becomes less relevant and less risky to replace.

Also, plain old HTTP does not offer the best performance for internal communication, and JSON serialization is not the fastest either. As more services become part of the system and communicate with each other, it's very important that we establish a scalable baseline technology to build upon.

Finally, with more services around, having a unified description of the interfaces will help us by enabling static analysis and ensuring no breaking API changes are introduced when the service ecosystem evolves.

Solution

Protobuf is a language for describing services and the messages they expect. It has a lot of tooling around it, which provides:

Generation of libraries for working with the defined services and messages for all major languages.
Generation of both gRPC and REST API clients (for REST exposed services).
Automatic detection of breaking API changes via static analysis.
Easy generation of stubs to implement gRPC-based services (or service doubles for tests!).
Linters to ensure complete documentation, project layout and syntax.

What a service definition looks like

Here's a definition for a simple rates service like the one required in our Backend interview project:

syntax = "proto3";

package ebury.rates.v1;

import "ebury/type/v1/currency.proto";
import "ebury/type/v1/decimal.proto";


// [Rates](rates.ebury.com) API.
//
// Retrieves rates.
//
// Rates are retrieved from the fixer.io external service.
service RatesService {
  // Retrieves a rate.
  rpc GetRate(GetRateRequest) returns (GetRateResponse) {}
}

// Request for asking a rate.
message GetRateRequest {
  // The 3-letter currency code defined in ISO 4217 for the base.
  ebury.type.v1.Currency base_currency = 1;
  // The 3-letter currency code defined in ISO 4217 for the target.
  ebury.type.v1.Currency target_currency = 2;
}

// Response for a rate.
message GetRateResponse {
  // The rate returned for the given currency pair.
  ebury.type.v1.Decimal rate = 1;
}

We won't get into the details of the syntax or the specification; it is recommended you read the Language Guide to get a full grasp of all the features and types provided by the language.

In this example, we use a few custom types defined in ebury.type.v1 to construct two messages representing our request and response. Those messages are then used in the service definition, and with that we have effectively specified our interface.

With just this, our tooling can now start to generate a lot of things.

The following examples consider this file-system structure:

$ tree ebury/
ebury/
├── rates
│   └── v1
│       └── rates.proto
├── trades
│   └── v1
│       └── trades.proto
└── type
    └── v1
        ├── currency.proto
        ├── decimal.proto
        └── money.proto

Generating Protobuf libraries for serialization

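See Protocol Buffer Basics: Python for an introduction. Note that protoc only compiles the files it is given: the ebury/type/v1 files imported by rates.proto must be compiled as well (e.g. by appending them to the same command) before the generated module can be imported.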

$ protoc -I $(pwd) --python_out=out/ ebury/rates/v1/rates.proto
$ tree out/
out/
└── ebury
    └── rates
        └── v1
            └── rates_pb2.py

Generating gRPC libraries for our RPC server

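See gRPC's Python Quick Start for an introduction. The gRPC Python code generator is a protoc plugin that ships with the grpcio-tools package, so here protoc is invoked through grpc_tools.protoc:

$ python -m grpc_tools.protoc -I $(pwd) --python_out=out/ --grpc_python_out=out/ ebury/rates/v1/rates.proto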
$ tree out/
out/
└── ebury
    └── rates
        └── v1
            ├── rates_pb2_grpc.py
            └── rates_pb2.py

Writing a server with generated stubs

# server.py
"""The Python implementation of the rates server."""

from concurrent import futures
import logging

import grpc

from ebury.rates.v1 import rates_pb2, rates_pb2_grpc
from ebury.type.v1 import decimal_pb2
from grpc_reflection.v1alpha import reflection


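class RatesService(rates_pb2_grpc.RatesServiceServicer):
    """Implements the RatesService defined in ebury/rates/v1/rates.proto."""

    def GetRate(self, request, context):
        # TODO: use the requested currencies, query fixer.io and return that instead.
        return rates_pb2.GetRateResponse(rate=decimal_pb2.Decimal(digits=314, scale=-2))


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    rates_pb2_grpc.add_RatesServiceServicer_to_server(RatesService(), server)
    # Advertise the service actually registered above (plus reflection itself)
    # so tools such as grpcurl can discover it.
    SERVICE_NAMES = (
        rates_pb2.DESCRIPTOR.services_by_name['RatesService'].full_name,
        reflection.SERVICE_NAME,
    )
    reflection.enable_server_reflection(SERVICE_NAMES, server)
    # Plaintext port for local development only (see Security Impact).
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()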


if __name__ == '__main__':
    logging.basicConfig()
    serve()

$ python3 server.py &
$ grpcurl -plaintext -d '{"base_currency": "USD", "target_currency": "EUR"}' localhost:50051 ebury.rates.v1.RatesService/GetRate
{
  "rate": {
    "digits": "314",
    "scale": -2
  }
}
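The `rate` in the response above comes back as a `digits`/`scale` pair. The decimal.proto file is not shown in this RFC, so the sketch below assumes `ebury.type.v1.Decimal` represents the value digits × 10^scale, which matches `digits=314, scale=-2` standing for 3.14. It also accepts `digits` as a string: the proto3 JSON mapping used by grpcurl renders 64-bit integers as strings, which would explain why "314" is quoted (suggesting `digits` is a 64-bit field). The helper names are hypothetical.

```python
"""Hypothetical helpers converting between Python's Decimal and the assumed
ebury.type.v1.Decimal representation (value = digits * 10**scale)."""

from decimal import Decimal


def to_python_decimal(digits, scale):
    """Build a Decimal from (digits, scale).

    `digits` may be an int or a str, since the proto3 JSON mapping renders
    64-bit integers as strings (e.g. "314" in the grpcurl output).
    """
    # scaleb only shifts the exponent; it is exact for int64-sized digits
    # under the default 28-digit decimal context.
    return Decimal(int(digits)).scaleb(scale)


def from_python_decimal(value):
    """Split a finite Decimal into (digits, scale)."""
    if not value.is_finite():
        raise ValueError(f"cannot encode non-finite value {value!r}")
    sign, digit_tuple, exponent = value.as_tuple()
    digits = int("".join(map(str, digit_tuple)))
    return (-digits if sign else digits), exponent


print(to_python_decimal("314", -2))           # 3.14
print(from_python_decimal(Decimal("3.14")))  # (314, -2)
```

Going through Python's `Decimal` rather than `float` keeps the value exact, which matters when the number is a rate.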

Lint and API compatibility checkers

See the Buf Tour for a comprehensive overview of features.

Some key features:

Enforcement of docstrings for every single entity and field (API docs are then generated from them).
Enforcement of the recommended layout, so API versioning and evolution are supported naturally.
Detection of breaking API changes against master or a specific revision.
A collection of lint rules that encapsulates current best practices for maintaining Protobuf libraries.
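To make breaking-change detection more concrete, here is a toy sketch of the kind of comparison such a checker performs: it diffs two versions of a message keyed by field number (what actually identifies a field on the wire) and flags deleted fields, changed types and renames. The field tables and the `breaking_changes` function are hypothetical; Buf's real checker works on compiled descriptors and covers many more rules.

```python
"""Toy breaking-change check between two versions of a message.

Each version is a hypothetical table {field_number: (field_name, field_type)}.
"""


def breaking_changes(old, new):
    """Return human-readable problems found when moving from `old` to `new`."""
    problems = []
    for number, (name, type_) in sorted(old.items()):
        if number not in new:
            # Consumers still using this field break; it should be
            # deprecated and its number reserved instead.
            problems.append(f"field {number} ({name}) was deleted")
            continue
        new_name, new_type = new[number]
        if new_type != type_:
            # Same number, different type: existing payloads may decode incorrectly.
            problems.append(f"field {number} ({name}) changed type {type_} -> {new_type}")
        if new_name != name:
            # Wire-compatible, but breaks generated code and JSON clients.
            problems.append(f"field {number} renamed {name} -> {new_name}")
    # Fields that only exist in `new` are additions, which are not breaking.
    return problems


v1 = {1: ("base_currency", "Currency"), 2: ("target_currency", "Currency")}
v2 = {1: ("base_currency", "string")}
for problem in breaking_changes(v1, v2):
    print(problem)
# field 1 (base_currency) changed type Currency -> string
# field 2 (target_currency) was deleted
```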

Exposing gRPC APIs to browsers through a proxy (gRPC-Web)

This option suits situations where we don't need a REST interface and don't care about HTTP-only clients. See the gRPC-Web quickstart for examples of generating browser libraries and serving gRPC APIs via the Envoy proxy.

Exposing gRPC via REST (with swagger output)

This option suits situations where we do care about HTTP-only clients. See the gRPC-Gateway example for how to generate the REST proxy and its Swagger definition.

Generating API documentation

Sphinx can be used to document Protobuf APIs, and several projects, such as Envoy and Protobuf itself, have scripts that we can base ours upon. See: https://github.com/protocolbuffers/protobuf/pull/6525

Since we will be requiring docstrings on Protobuf files via lint, we can ensure a baseline quality of API docs across the board.

Testing

Testing is another area that will benefit greatly from the Protobuf definitions and the gRPC stubs.

All types are provided and available to consumers of services, so the types enforce well-formed requests to external services. This solves a big problem we have right now: we do a lot of mocking and build responses from external services that don't reflect reality.

Another upside is that, since each field carries information about its type, it's possible to introspect messages and have factories that generate random contents for tests just by following those types.
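As an illustration of such a factory, the sketch below walks a message schema and draws a random value for each field according to its declared type, recursing into nested messages. To keep it runnable without generated code, `MESSAGES` is a hypothetical plain-dict stand-in for the type information that generated Protobuf classes expose through their descriptors. The `Currency` values and the `Decimal` field types are assumptions, since those .proto files are not shown here, and a real factory would build actual message instances rather than dicts.

```python
import random
import string

# Hypothetical scalar generators, keyed by Protobuf scalar type name.
SCALARS = {
    "int32": lambda rng: rng.randint(-2**31, 2**31 - 1),
    "int64": lambda rng: rng.randint(-2**63, 2**63 - 1),
    "bool": lambda rng: rng.random() < 0.5,
    "string": lambda rng: "".join(rng.choices(string.ascii_letters, k=8)),
}

# Hypothetical enum values (the real ones live in ebury/type/v1/currency.proto).
ENUMS = {"Currency": ["USD", "EUR", "GBP"]}

# Hypothetical stand-in for descriptor data: message name -> {field: type name}.
MESSAGES = {
    "GetRateRequest": {"base_currency": "Currency", "target_currency": "Currency"},
    "GetRateResponse": {"rate": "Decimal"},
    "Decimal": {"digits": "int64", "scale": "int32"},  # assumed field types
}


def random_value(type_name, rng):
    """Draw a random value for one field, dispatching on its declared type."""
    if type_name in SCALARS:
        return SCALARS[type_name](rng)
    if type_name in ENUMS:
        return rng.choice(ENUMS[type_name])
    # Anything else is a nested message: recurse into its fields.
    return random_message(type_name, rng)


def random_message(message_name, rng=None):
    """Build a dict holding a random value for every field of `message_name`."""
    rng = rng or random.Random()
    return {
        field: random_value(type_name, rng)
        for field, type_name in MESSAGES[message_name].items()
    }


# Seeding the generator makes a failing test reproducible.
print(random_message("GetRateResponse", random.Random(42)))
```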

Additionally, since the compiler generates stubs for the exported services, consumers of those services can easily replicate them locally with fake services that return the desired responses. Just as we implemented the real Rates service above, we can implement an equivalent fake Rates service that responds with fixed values, which lets us test code against the service without mocking the client.

Finally, as clients for all our services will be auto-generated, every change to our proto files ensures that the latest features are always available to our E2E tests.

Alternatives

These notes come from my own experience. I don't have much experience with Avro, so you are all invited to fill that gap. Disclaimer: it's been a while since I used Thrift extensively (~2012), so the list may be outdated or incomplete.

Protobuf (Google)

Since 2001, open-sourced in 2008.

Pros:

Battle tested, trusted and stable.
One of the main components of Google infrastructure.
All Google services are described in this language.
Officially supports: Python, Go, JS, Java, C++ and more.
Outstanding documentation.
Great examples everywhere, plus Google's API design recommendations.

Cons:

RPC stack is provided externally, the one most widely used is gRPC.

Thrift (Apache)

Since 2007, originally developed internally at Facebook. It arguably has a slightly cleaner syntax, though I don't think that still holds since Protobuf's proto3. Its documentation was not great when I used it (~2012); it has surely improved a lot since.

Pros:

Supports more languages out of the box than Protobuf.
Built-in RPC stack so the toolchain is a bit simpler.

Cons:

Non-equivalent set of features across languages makes it hard to work with when the combination is large.

Avro (Apache)

Since 2009.

Pros:

Built-in RPC stack.
The schema evolution story looks particularly nice as the schema is stored with the data when transferred over the wire.
No need for compilation, which helps with quick prototyping when tooling is not yet set up.

Cons:

Limited language support: C, C++, Java, Python and Ruby.

OpenAPI

OpenAPI could be considered a subset of Protobuf+gRPC, since APIs defined with gRPC can be exposed in a RESTful way, with the corresponding OpenAPI definitions generated for RESTful clients.

Pros:

Standard, widely known JSON format, human-readable out of the box.
Provides code generation for clients and documentation.
Allows extra static checks from the schema.

Cons:

Sub-optimal over the wire when compared with Protobuf serialization.
JSON serialization is slower than Protobuf serialization (see Performance Impact for more details).
No options for bi-directional communication and streaming.

Caveats

Our toolchain gets a little more involved by introducing Protobuf and gRPC, so the usual project scaffold may become a bit more complex than it is today. On the other hand, the separation of concerns adds to the overall flexibility of the technology, allowing drop-in replacements for the code generation of stubs, clients or libraries.

Operation

Operating these services is very similar to operating our current ones: the containers built for our services can be deployed on ECS and consumed by clients as usual. Load balancer rules for gRPC-only services may need some adjustments; our DevOps team has kindly recommended Amazon NLB, or Amazon ELB in "pass-through" mode, to handle gRPC traffic.

Security Impact

To ensure the security of our gRPC deployed services, we must:

Enable TLS in production gRPC apps.
Ensure that gRPC services only listen and respond over secured ports.

In addition, gRPC has a few useful security-related defaults:

Incoming messages are limited to a 4MB size (can be increased).
Exceptions are not propagated to clients (can be changed).

Performance Impact

Protobuf serialization is generally faster than standard JSON serialization across the languages it supports; in the Python benchmark below it is roughly 3 to 4.5 times faster.

gRPC comes with an extensive performance test suite that showcases the behavior of servers and clients implemented with different languages.

Protobuf performance improvements on serialization vs JSON

A simple Python 3 benchmark of serialization and deserialization, using timeit's default of 1,000,000 iterations:

Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import timeit
>>>
>>> from ebury.rates.v1 import rates_pb2
>>> from ebury.type.v1 import currency_pb2
>>>
>>> req_pb = rates_pb2.GetRateRequest(base_currency=currency_pb2.USD, target_currency=currency_pb2.EUR)
>>> req_py = {"base_currency": "USD", "target_currency": "EUR"}
>>> req_pb_serialized = req_pb.SerializeToString()
>>> req_py_serialized = json.dumps(req_py)
>>>
>>> # protobuf vs JSON serialization
... timeit.timeit(req_pb.SerializeToString)
0.9764512259862386
>>> timeit.timeit(lambda: json.dumps(req_py))
3.1570051899761893
>>> # protobuf vs JSON deserialization
... timeit.timeit(lambda: rates_pb2.GetRateRequest.FromString(req_pb_serialized))
0.5846345609752461
>>> timeit.timeit(lambda: json.loads(req_py_serialized))
2.5946265590027906

The numbers show roughly a 3.2x speedup for serialization (3.16 s vs 0.98 s) and roughly 4.4x for deserialization (2.59 s vs 0.58 s). Note that the comparison is still somewhat incomplete: serialization in our REST APIs is usually handled by abstractions heavier than plain JSON (e.g. Django REST Framework serializers), so this compares Protobuf against JSON at its best.

Finally, the serialized payload is much smaller (10x in this example), which is one of the advantages of using Protobuf over the wire:

>>> len(req_pb_serialized)
5
>>> len(req_py_serialized)
50
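The size gap comes from the Protobuf wire format: each set field is written as a varint tag, `(field_number << 3) | wire_type`, followed by its value. Enums are plain varints, field names never go over the wire, and proto3 skips fields left at their default (zero) value. The sketch below hand-encodes a two-enum message shaped like `GetRateRequest`, for illustration only (real code should call the generated `SerializeToString`). The enum numbers are hypothetical since currency.proto is not shown, so its byte count (4) differs from the 5 bytes measured above; the real count depends on the actual enum numbers, as values of 128 or more need a second varint byte.

```python
import json

WIRE_TYPE_VARINT = 0  # wire type used for int32/int64/bool/enum fields


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low-order 7-bit group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_enum_fields(fields):
    """Encode {field_number: enum_value} as proto3 does for singular enum fields."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value == 0:
            continue  # proto3 does not serialize default (zero) values
        out += encode_varint((number << 3) | WIRE_TYPE_VARINT)  # tag
        out += encode_varint(value)
    return bytes(out)


# Hypothetical enum numbers: base_currency (field 1) = 1, target_currency (field 2) = 2.
payload = encode_enum_fields({1: 1, 2: 2})
print(payload.hex(), len(payload))  # 08011002 4
print(len(json.dumps({"base_currency": "USD", "target_currency": "EUR"})))  # 50
```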

gRPC performance across platforms

See gRPC Benchmarking for the current performance testing infrastructure and the results.

Here's the dashboard with the numbers of the latest release: gRPC Performance Multi-language (@upstream/master)

Developer Impact

Pros:

Interfaces are normalized across services and can be picked up to do static analysis or code generation.
Replacement of languages and implementations can be done with ease.
Improved development speed because of automatic code generation (clients, test doubles, stubs and scaffolding).
Increased confidence in API compatibility (enforced by breaking-change detection).
Standard API documentation process built in (enforced by the linter).
Ensured Protobuf syntax and layout best practices (enforced by the linter).
REST APIs can be exposed from the defined gRPC services, with OpenAPI/Swagger output generation.
Comprehensive list of Best Practices available from Google.

Cons:

More tooling involved in the default development environment.
New technologies often cause friction until the first couple of projects are out.
