General Application Guidelines and Expectations

Reference Documents

Reference Document Location
EPIC01 RFC template
Kubernetes Service Requirements
Python Logging
Pattern Library
Software Testing

Problem Description

Building and running applications that are correct, secure, and reliable is difficult and requires significant care and effort from both development and operations. Ebury has additional requirements arising from its technology platform and business needs.

As Ebury moves away from monoliths, responsibility for ensuring that application code is well designed and built is distributed across many smaller services. Developers can no longer rely (rightly or wrongly) on a larger host application to provide critical functionality such as logging, metrics, and security.

This document attempts to provide a very high-level view - almost a checklist - of what a good application should look like and the key areas developers must consider when adding functionality. Some aspects will be obvious and can be covered here; others may require more detail and should reference other documents or example "current best practice" implementations.

Solution

This section is organised around aspects that are common to most (probably all) applications, and aspects that are specific to a particular type of application workload, e.g. HTTP APIs, Kafka consumers, etc.

Subsections are deliberately terse to help with scanning. Typical RFC wording is used to indicate requirement levels.

Common Aspects

Logging

  • MUST NOT include secrets, personally identifiable information (PII), and similar sensitive information
  • SHOULD provide an effective trace of an application's process or state
  • SHOULD emit structured logs
  • SHOULD NOT be excessive
  • Messages MUST be retained for a reasonable period
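
The logging rules above can be illustrated with a minimal sketch using only the Python standard library. It emits one JSON object per log line and masks fields whose names look sensitive. The `SENSITIVE_KEYS` set, the `context` attribute, and the `build_logger` helper are assumptions made for this example, not an existing Ebury convention; the Python Logging reference document remains the authority.

```python
import json
import logging

# Hypothetical set of field names treated as sensitive; adjust per application.
SENSITIVE_KEYS = {"password", "token", "secret", "authorization"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking sensitive fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical attribute supplied via `extra=`.
        for key, value in getattr(record, "context", {}).items():
            masked = key.lower() in SENSITIVE_KEYS
            payload[key] = "[REDACTED]" if masked else value
        return json.dumps(payload, default=str)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes structured lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger("payments").info("payment created", extra={"context": {"payment_id": "p-1"}})` would then produce a single machine-parseable line. Masking by field name is only a safety net; the primary control is not passing sensitive values to the logger at all.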

Secrets

  • MUST be stored securely
  • MUST be securely generated
  • MUST follow the principle of least privilege
  • MUST NOT be visible in production
  • SHOULD NOT be shared

Client Requests

  • SHOULD be avoided if possible, i.e. conform to Ebury 2.0 architecture
  • SHOULD be authenticated
  • SHOULD be authorised
  • SHOULD time out to avoid blocking
  • SHOULD encrypt network traffic
  • SHOULD support rotation of credentials with no downtime
  • MAY retry idempotent/safe requests, ideally with back-off
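
As an illustration of the retry and back-off point, the following is a minimal sketch of a retry helper with exponential back-off and jitter. `TransientError` and `call_with_retry` are hypothetical names introduced for this example; the wrapped `request` callable is assumed to enforce its own timeout and to be idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s, ...)."""


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient failures with back-off and jitter.

    Only use this for idempotent/safe requests; `request` itself should
    enforce a timeout so a slow upstream cannot block the caller.
    """
    for attempt in range(attempts):
        try:
            return request()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter" spreads out retries
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Non-transient errors are deliberately not caught, so they propagate immediately.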

Monitoring

  • MUST emit metrics
  • MUST provide alerts
  • SHOULD provide a dashboard
  • SHOULD NOT cause false positive alerts

Scalability

  • SHOULD be scalable by design
  • MUST support having multiple instances running concurrently

Deployment

  • MUST be deployed to Kubernetes
  • MUST deploy correctly for the first time
  • MUST upgrade correctly and safely, e.g. considering:
      • Backwards compatibility
      • Migrations
      • Rolling restarts
  • SHOULD be fully automated

Data

  • MUST NOT assume inputs can be trusted
  • MUST be validated for correctness
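
A minimal sketch of validating an untrusted payload at the boundary, using only the standard library. The `PaymentRequest` type, its fields, and the `SUPPORTED_CURRENCIES` allow-list are hypothetical and exist only to make the example concrete.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Hypothetical allow-list; a real service would take this from reference data.
SUPPORTED_CURRENCIES = {"EUR", "GBP", "USD"}


@dataclass(frozen=True)
class PaymentRequest:
    amount: Decimal
    currency: str


def parse_payment_request(raw: object) -> PaymentRequest:
    """Validate an untrusted payload, raising ValueError on anything unexpected."""
    if not isinstance(raw, dict):
        raise ValueError("payload must be an object")
    amount_raw = raw.get("amount")
    if not isinstance(amount_raw, str):
        raise ValueError("amount must be a decimal string")
    try:
        amount = Decimal(amount_raw)
    except InvalidOperation:
        raise ValueError("amount is not a valid decimal") from None
    # Reject NaN/Infinity before comparing, then enforce the business rule.
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")
    currency = raw.get("currency")
    if not isinstance(currency, str) or currency not in SUPPORTED_CURRENCIES:
        raise ValueError("unsupported currency")
    return PaymentRequest(amount=amount, currency=currency)
```

The rest of the application can then work with `PaymentRequest` values and assume they are well formed, rather than re-checking raw dictionaries in several places.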

Schema

e.g. messages, APIs, databases, etc.

  • MUST take all reasonable precautions to ensure breaking changes will not affect dependents
  • SHOULD be backwards and forwards compatible
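
One common way to approach backwards and forwards compatibility for messages is a "tolerant reader": ignore fields the reader does not know about and give newly added fields a default. The sketch below assumes a hypothetical `AccountOpened` event whose `currency` field was added in a later schema version.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AccountOpened:  # hypothetical event, version 2 of its schema
    account_id: str
    currency: str = "EUR"  # added in v2; the default keeps v1 messages readable


def decode(message: Mapping[str, Any]) -> AccountOpened:
    """Tolerant reader: ignore unknown fields, default missing optional ones."""
    known = {f.name for f in fields(AccountOpened)}
    return AccountOpened(**{k: v for k, v in message.items() if k in known})
```

With this shape, an old producer that omits `currency` and a newer producer that adds further fields can both be read by the same consumer. Removing or renaming a required field is still a breaking change and needs coordination with dependents.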

Data State/Storage

  • MUST be safely persisted
  • MUST NOT be lost after a restart
  • MUST be stored for read consistency or not at all
  • SHOULD make use of constraints in the storage engine
  • SHOULD use indexes to make known access plans efficient

Programming Errors

  • MUST produce an alert
  • MUST NOT be ignored
  • SHOULD NOT be directly surfaced to an end user in production

Runtime Errors

  • MUST NOT be ignored
  • MAY be logged and dropped
  • MUST NOT surface potentially sensitive details (variables, stack traces, etc.) to an end user in production
  • SHOULD provide sufficient context to help debug the issue
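
A minimal sketch of how these rules might combine at a request boundary: the full stack trace and context go to the logs, while the caller receives only a generic message and a correlation identifier. `handle_request`, the response shape, and the `error_id` field are assumptions made for this example.

```python
import logging
import uuid
from typing import Callable

logger = logging.getLogger("app.errors")


def handle_request(handler: Callable[[dict], dict], request: dict) -> dict:
    """Run `handler`; on failure, log full context and return a generic error."""
    try:
        return {"status": 200, "body": handler(request)}
    except Exception:
        # The identifier lets support correlate a user report with the log entry.
        error_id = uuid.uuid4().hex
        # logger.exception records the stack trace in the logs only.
        logger.exception(
            "request failed",
            extra={"error_id": error_id, "path": request.get("path")},
        )
        return {
            "status": 500,
            "body": {"error": "internal error", "error_id": error_id},
        }
```

Whether an error of this kind should also raise an alert depends on whether it is a programming error or an expected runtime condition, which this sketch does not try to distinguish.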

Code Quality & Security

  • Changes MUST be peer-reviewed
  • Changes MUST be approved by at least one code owner
  • Security vulnerabilities SHOULD be fixed as quickly as possible
  • Dependencies SHOULD be updated regularly

Rate Limiting

  • Processes MAY need to be rate limited, e.g. for:
      • Load management
      • Service availability
      • Cost management
      • etc.
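
One widely used rate-limiting technique is the token bucket, sketched below. This version is in-process and not thread-safe, so it is only a starting point: because applications must support multiple concurrent instances, a real limit would usually need shared state or enforcement at the platform level. The `TokenBucket` class and its parameters are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the time that has passed, capped at the bucket size.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would check `bucket.try_acquire()` before doing the limited work and decide for itself whether to drop, queue, or delay when it returns `False`.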

Testing

  • MUST be tested
  • SHOULD have automated tests
  • Automated tests MUST be stable

See Software Testing

Auditable

TBD

Tracing

TBD

Application Types

APIs

There are various reasons for building and maintaining APIs, e.g.

  • Admin API - internal, heavily restricted API only for administration purposes
  • Service API - internal API for other services and UIs
  • API gateway - proxy between a client and one or more backend APIs
  • Channel API - external, i.e. public-facing, API

In general, all API requests:

  • MUST require the client to be authenticated
  • MUST require the client to be authorised
  • MUST encrypt network traffic
  • SHOULD support rotation of credentials with no downtime

Kafka Producer

  • MUST emit events with a well-designed and maintained schema
  • SHOULD publish the schema in a shared repository
  • SHOULD be authorised to send events to a topic
  • MUST emit events containing a consistent identifier to allow idempotent handling in consumers
  • MUST use partition keys that ensure events do not arrive out of order
  • SHOULD avoid emitting the same event multiple times
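
The partition key rule relies on the fact that Kafka preserves order only within a partition, so all events for one entity need to map to the same partition. The sketch below illustrates the idea with a stable hash; it is not Kafka's own partitioner, and `partition_for` and the example keys are hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    Illustrative only: real clients apply their own partitioner to the key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Keying on the entity identifier keeps that entity's events in one partition.
events = [
    {"event_id": "e-1", "account_id": "account-42", "type": "opened"},
    {"event_id": "e-2", "account_id": "account-42", "type": "funded"},
]
partitions = {partition_for(e["account_id"], 12) for e in events}
```

Here `partitions` contains a single value because both events share the key `account-42`. Keying on something that changes per event, such as `event_id`, would spread one entity's events across partitions and lose the ordering guarantee.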

Kafka Command Event Consumer

  • MUST handle events idempotently (exactly-once delivery is effectively a myth)
  • MUST NOT acknowledge unhandled events
  • MUST assume events for different entities will arrive in arbitrary order
  • SHOULD send a Reply event as acknowledgement (success or failure)

Kafka Domain Event Consumer

  • MUST handle events idempotently (exactly-once delivery is effectively a myth)
  • MUST NOT acknowledge unhandled events
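
The two consumer rules above (idempotent handling, and no acknowledgement of unhandled events) can be sketched independently of any Kafka client library. `IdempotentHandler`, the `event_id` field, and the in-memory set are assumptions for illustration; a real consumer would record processed identifiers in durable storage, ideally in the same transaction as the side effect.

```python
from typing import Callable, Set


class IdempotentHandler:
    """Apply each event at most once, keyed on its identifier."""

    def __init__(self, apply: Callable[[dict], None]) -> None:
        self._apply = apply
        self._seen: Set[str] = set()  # in-memory for illustration only

    def handle(self, event: dict) -> bool:
        """Return True when the event may be acknowledged (offset committed)."""
        event_id = event["event_id"]  # hypothetical field name
        if event_id in self._seen:
            return True  # duplicate delivery: safe to acknowledge, nothing to do
        self._apply(event)  # if this raises, the event is NOT acknowledged
        self._seen.add(event_id)
        return True
```

The consuming loop would commit the offset only after `handle` returns. If `apply` raises, the exception propagates, the offset is not committed, and the event is redelivered later.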

Periodic Tasks

  • MUST NOT leave the system in an inconsistent state
  • MUST support concurrent execution
  • SHOULD NOT cause duplication
  • SHOULD NOT take longer than the run frequency
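
One way to make a periodic task safe when several instances fire at once is to let the storage engine decide which instance wins, using a uniqueness constraint on (task, period). The sketch below uses an in-memory SQLite database purely as a stand-in for the service's shared database; the `task_runs` table and both function names are hypothetical.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the table whose primary key enforces one run per task and period."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_runs ("
        " task TEXT NOT NULL,"
        " period TEXT NOT NULL,"
        " PRIMARY KEY (task, period))"
    )


def claim_run(conn: sqlite3.Connection, task: str, period: str) -> bool:
    """Return True for exactly one caller per (task, period)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_runs (task, period) VALUES (?, ?)",
                (task, period),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another instance already claimed this period
```

Each instance would call `claim_run` at the start of a run and skip the work when it returns `False`. This addresses duplication only; a task that can fail part-way through still needs its own steps to be idempotent or transactional.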

Gateway Service

  • MUST encapsulate upstream location
  • MUST encapsulate any upstream authentication
  • MAY transform data
  • MAY proxy/bridge protocols

Reference Data

TBD

User Interface

  • MUST NOT expose internal secrets
  • MUST NOT control authorisation
  • MAY use authorisation checks to improve the user experience

Developer Impact

Some aspects may be impossible, impractical, or prohibitively expensive for application developers to tackle on their own or per-application. However, an application is only part of the full picture:

  1. Platform
  2. Shared libraries
  3. Application templates
  4. Application helpers
  5. Custom application code

Improvements at lower levels of the stack should be prioritised to benefit as many applications as possible.

In reality, implementations are likely to start at the highest level, i.e. as custom application code, and work their way down towards the platform, being generalised as they go. In fact, this is often a useful process as generalising too early can lead to inflexible, poorly designed, and overly complex implementations.



Based on RFC Template Version 1.1