Software Testing

Software Testing is a broad topic, surrounded by much debate and confusion.

In this blueprint we define standards and guidelines for Ebury, ensuring reliable, high-quality testing principles. This document is a high-level overview of what we want testing to be like at Ebury, including our aims and approaches. Complementary in-depth documents will be created to cover language-specific best practices, tooling, and other specific topics whenever needed.

NOTE: We concentrate on Software Testing that is suitable for automated verification (typically CI/CD pipelines).

Problem Description

In Ebury there has been no global concept regarding Software Testing, so some applications have inconsistent, flaky, sometimes insufficient or overly complex tests.

This not only makes the life of developers harder, but also puts a completely unnecessary load on the CI pipelines.

Background

It is important to keep in mind the reasons why we are testing our software:

Catching regressions
Enabling smooth refactoring
Faster development

These objectives can help to make the right decisions in case of doubt.

Solution

There are well-established principles, guidelines and best practices for Software Testing, together with reliable and convenient tooling.

Catch regressions

We are working on Software Systems that are complex well beyond our brain capacity. Thus a set of tests is defined to specify the correct behavior.

Furthermore, these tests also protect the correct behavior. They prevent the introduction of bugs by failing whenever expected behavior might be broken.

There MUST be at least one test for every expected behavior, so that a broken behavior results in a failing test
There MUST be at least one failing test if an expected behavior is broken

Enables refactoring

Software Applications constantly face new requirements, so the application code is always changing. As a side effect, internal re-organization and re-structuring of the application code may be required from time to time.

Tests ensure that internal refactoring changes do not break the expected behavior.

Tests mustn't break due to internal implementation changes
Tests must provide full coverage, yet should not be tightly coupled to the actual code

NOTE: If test modules have to be renamed after a refactoring, those modules were probably testing implementation details ;-)

Faster development

Working with automated tests means getting a fast feedback loop on code changes.

Tests should support the feedback loop, instead of slowing it down
The number of tests a code change breaks should be in a healthy proportion to the size of the change
Extending functionality (backwards compatible changes) shouldn't break any tests

Testing Principles

Generally, all tests have to be:

deterministic
reproducible
self-contained
stateless
idempotent

Furthermore, from a developer's perspective, tests have to be:

simple
clear
easy to change or extend
modular

Furthermore, unit and integration tests have to:

be decoupled
only break when the tested component(s) break
- no "false" alarms
focus on functionality, NOT implementation
- as the latter may change

Determinism, Reproducibility

Tests MUST be deterministic and reproducible.

This means they should not rely on anything whose behaviour is not predictable, e.g.:

date and time
order of database rows
order of file listings
factories using random values as default
etc.

There is only one good reason for a test to fail: something is broken in the system. Unpredictable behaviours like these can make tests fail even though the system keeps working as expected.
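As a rough illustration of these rules, the sketch below removes two common sources of non-determinism: the wall clock and the ordering of rows. The functions `is_expired` and `active_usernames` are hypothetical examples, not part of any Ebury codebase. The clock is passed in as a parameter here; a library such as freezegun is an alternative way to reach the same goal.

```python
from datetime import datetime, timezone


def is_expired(expires_at, now=None):
    """Return True if expires_at is not in the future.

    `now` is injectable so that tests do not depend on the wall clock.
    (Hypothetical example function.)
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at <= now


def active_usernames(rows):
    """Return the names of active users in a stable, sorted order.

    (Hypothetical example function.)
    """
    return sorted(row["name"] for row in rows if row["active"])


def test_is_expired_with_fixed_clock():
    # A fixed "now" makes the result identical on every run.
    fixed_now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    assert is_expired(datetime(2024, 1, 15, 11, 59, tzinfo=timezone.utc), now=fixed_now)
    assert not is_expired(datetime(2024, 1, 15, 12, 1, tzinfo=timezone.utc), now=fixed_now)


def test_active_usernames_ignores_row_order():
    rows = [
        {"name": "zoe", "active": True},
        {"name": "adam", "active": True},
        {"name": "bob", "active": False},
    ]
    # The same expectation holds whatever order the rows arrive in.
    assert active_usernames(rows) == ["adam", "zoe"]
    assert active_usernames(list(reversed(rows))) == ["adam", "zoe"]
```

Neither test reads the system clock or relies on the order in which a data source happens to return its rows, so both give the same result on every run and on every machine.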

NOTE: There is a space for randomized input in testing. It is an excellent way to detect bugs on "unexpected" data. However, randomized tests are NOT suitable for automated checks (like CI/CD pipelines), and thus outside of the scope of this blueprint.

Testing on the right levels

Testing pyramid

Given real-life constraints, we focus only on the levels of the Testing Pyramid that are used in Ebury.

Simplified Testing Pyramid

Component Testing

Individual building blocks, small logical units are tested decoupled from other components and workflows
Testing if the component corresponds to the functional requirements
Lightweight (generally)
Often referred to as Unit Testing
(As much as possible) separate from external dependencies
- i.e. dependencies outside of the boundaries of the codebase (external libraries, API calls, etc.).

Component Testing is mandatory and non-replaceable in Ebury for all Software Components.

NOTE: Ideally we should be able to cut all dependencies (network, database, etc.) at this level, but this may not be possible or beneficial in some cases. Typically, when the functionality of a component is particularly elaborate and virtually impossible to separate from its underlying dependencies (like the database in Django), Component Testing may involve these dependencies as well. It is better to use the Django database than to write meaningless or obfuscated tests.

Long story short: Unit Testing is NOT equal to "cutting all possible dependencies". Unit Testing translates to: testing a logical component, a functional unit.

Workflow Testing

Combination/sequence of components tested together
- Partial or full workflows
- Typically: part of the business logic, a functional chunk of the execution flow
Often referred to as Integration Testing
May involve external dependencies
More heavyweight than Component Testing

Workflow (integration) Testing is mandatory in Ebury, ensuring the correctness of workflows.

NOTE: Workflow Testing is NOT redundant to Component Testing. Since Software Components can be combined in various workflows, Workflow Testing is to ensure that these combinations/sequences are correct. (While Component Testing was there to ensure that the behavior of each Software Component is correct.)
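The difference between the two levels can be sketched with a small, hypothetical example: two components (`parse_amount`, `apply_fee`) and one workflow (`quote_payment`) that combines them. All names and the 1% default fee are assumptions made for illustration only.

```python
from decimal import Decimal


def parse_amount(text):
    """Component: parse a user-supplied amount such as '1,250.50'."""
    return Decimal(text.replace(",", ""))


def apply_fee(amount, fee_rate):
    """Component: add a proportional fee, rounded to two decimal places."""
    return (amount * (1 + fee_rate)).quantize(Decimal("0.01"))


def quote_payment(text, fee_rate=Decimal("0.01")):
    """Workflow: parse the input, then apply the fee."""
    return apply_fee(parse_amount(text), fee_rate)


# Component tests: each building block is verified on its own.
def test_parse_amount_component():
    assert parse_amount("1,250.50") == Decimal("1250.50")


def test_apply_fee_component():
    assert apply_fee(Decimal("100"), Decimal("0.01")) == Decimal("101.00")


# Workflow test: verifies that the components are combined correctly.
def test_quote_payment_workflow():
    assert quote_payment("1,000") == Decimal("1010.00")
```

The component tests would still pass if `quote_payment` called the components in the wrong order or with the wrong arguments; only the workflow test can catch that kind of mistake.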

End-to-end Testing

Full user experience
Testing complete workflows (from the database up to the user interface)
Often referred to as End-to-end Testing
Could be automated (typically for APIs)
May not be done by the developers directly

Ebury is looking into options for automated end-to-end testing, at least for certain applications.

Mocking

This section primarily concerns Component and Workflow testing, in particular their reliability and independence from any other factors than purely the application code.

Sometimes tests may interact with external dependencies -- implicitly or explicitly (via the tested code). To keep tests self-contained, the dependency has to be replaced with a so-called test double of one of the following types. (Just a brief mention for completeness; a detailed description is outside the scope of this RFC.)

Fake: an object with a working implementation, but usually with an internal "shortcut" that makes it efficient for testing and unsuitable for production.
Stub: provides canned answers to the calls made during the test, usually not responding to anything outside of what it is programmed for.
Mock: a pre-programmed object with expectations on the (sub)set of the specification that it should simulate.
Spy: a stub that also records information about how it was called.
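The four kinds of test double can be sketched as follows, assuming a hypothetical `RateProvider` interface that sits at the boundary to an external FX rate service. The fake and the spy are written by hand, while the stub and the mock use `unittest.mock.Mock` from the standard library. In practice the same `Mock` object can play several of these roles; the distinction lies in how the test uses it.

```python
from unittest.mock import Mock


class RateProvider:
    """Hypothetical boundary interface to an external FX rate service."""

    def get_rate(self, base, quote):
        raise NotImplementedError


class FakeRateProvider(RateProvider):
    """Fake: a working but simplified in-memory implementation."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def get_rate(self, base, quote):
        return self._rates[(base, quote)]


class SpyRateProvider(RateProvider):
    """Spy: a stub that also records how it was called."""

    def __init__(self, rate):
        self._rate = rate
        self.calls = []

    def get_rate(self, base, quote):
        self.calls.append((base, quote))
        return self._rate


def convert(amount, base, quote, provider):
    """Code under test (hypothetical): convert using the provider's rate."""
    return round(amount * provider.get_rate(base, quote), 2)


def test_with_fake():
    provider = FakeRateProvider({("EUR", "GBP"): 0.85})
    assert convert(100, "EUR", "GBP", provider) == 85.0


def test_with_stub():
    # Stub: a canned answer, with no expectations about how it is called.
    provider = Mock(spec=RateProvider)
    provider.get_rate.return_value = 0.5
    assert convert(10, "EUR", "GBP", provider) == 5.0


def test_with_mock():
    # Mock: verifying the expected interaction is part of the test.
    provider = Mock(spec=RateProvider)
    provider.get_rate.return_value = 2.0
    convert(10, "USD", "EUR", provider)
    provider.get_rate.assert_called_once_with("USD", "EUR")


def test_with_spy():
    provider = SpyRateProvider(rate=0.5)
    assert convert(10, "EUR", "GBP", provider) == 5.0
    assert provider.calls == [("EUR", "GBP")]
```

Note that the double is passed in through the public interface of the code under test, so nothing inside `convert` is patched or replaced.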

The dependencies primarily targeted (functions, modules, services, etc.) are external to the application, and they should be replaced at the boundary of the application, using the public interfaces. Replacing internal dependencies is highly discouraged.

There is a very important place for testing the application code connected to external systems. This is End-to-end Testing or Systems Testing, which is outside of the scope of this chapter.

There may be certain particularly reliable systems that are so closely coupled with the application code that we consider them an "implementation detail" rather than an "external service". A typical example is a database, without which meaningful tests are often impossible. These systems are NOT expected to be replaced, as using a test double may do more harm than good to the quality and the value of the tests.

Beyond the boundary of the application code

Respecting public interfaces as a border is particularly important. Either partially replacing or including any external elements is implicitly making assumptions about implementation details of external code.

Component and Workflow testing, being closest to the code, has to concentrate strictly on application behavior. Allowing any external factor to interfere with that can easily introduce flaky tests with unreliable results (as test failures may relate to external bugs). This is suitable neither for verifying application code behavior nor for automated (CI pipeline) execution.

Within the boundary of the application code

Within the boundaries of the application code we may consider using test doubles. This means that we would be swapping application code with a test double. However, this is a dangerous, artificial alteration of application workflows. The elevated risk comes from the following reasons:

Bugs may be introduced when the test double and the real implementation diverge. This harms the ability of the tests to catch regressions.
Tests may become tightly coupled to the implementation (e.g. when using stubs/mocks), which harms the refactoring support our tests provide.

Having said that, on certain occasions it makes sense to replace application code with a test double, as doing so allows for cleaner tests. However, in this case specific tests have to be put in place to ensure that the test double and the real implementation are kept in sync at all times.
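One possible way to keep a test double and the real implementation in sync is a shared "contract" check that is run against both. The sketch below assumes a hypothetical internal audit log component; `FileAuditLog`, `InMemoryAuditLog` and `check_audit_log_contract` are illustrative names only.

```python
import json
import os
import tempfile


class FileAuditLog:
    """Stand-in for the real implementation (hypothetical): JSON lines in a file."""

    def __init__(self, path):
        self._path = path

    def record(self, event):
        with open(self._path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")

    def events(self):
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle]


class InMemoryAuditLog:
    """Test double (hypothetical) used in place of FileAuditLog in other tests."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(dict(event))

    def events(self):
        return list(self._events)


def check_audit_log_contract(log):
    """Behavior that every audit log implementation must share."""
    assert log.events() == []
    log.record({"action": "login", "user": "alice"})
    log.record({"action": "logout", "user": "alice"})
    # Events come back complete and in insertion order.
    assert log.events() == [
        {"action": "login", "user": "alice"},
        {"action": "logout", "user": "alice"},
    ]


def test_real_audit_log_honours_contract():
    with tempfile.TemporaryDirectory() as tmp_dir:
        check_audit_log_contract(FileAuditLog(os.path.join(tmp_dir, "audit.log")))


def test_in_memory_audit_log_honours_contract():
    check_audit_log_contract(InMemoryAuditLog())
```

If the real implementation changes its behavior, the contract check has to change with it, and the double then fails the same check until it is updated. With pytest, the two test functions could also be written as a single parametrized test.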

Testing the behavior not the implementation

By application behavior we mean what the code is doing, as opposed to how the code is doing it (which is the 'implementation').

It is important that Software Testing is targeting application behavior and not the implementation. Otherwise the tests will be coupled to a particular implementation, meaning that the test will fail if the implementation changes.

Tests are there to ensure the correct behavior of applications and interfaces, making sure no contracts are broken. How all of that is achieved internally doesn't really matter.

The tests should target the behavior of:

building bricks of the application (components, units)
workflows connecting components as expected

Refactoring is the process of changing application code implementation without modifying the behavior. Refactoring is essential to avoid software rot, and to allow codebases to evolve with new requirements. Refactoring should require no changes in the test code.

Tests act as a safety net, allowing us to refactor without breaking the behavior. If implementation changes require test modifications, there is a good chance that the modified tests diverge from the original ones. They would then no longer be testing the right behavior, and a newly introduced bug could go unnoticed.

Another issue with testing implementation instead of behavior is that it slows down the development process. Whenever the implementation changes, the tests have to be modified as well: a single change implies double the effort.

Testing implementation is much easier and faster than testing behavior, but it doesn't really add value.
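The contrast can be sketched with a hypothetical `Order` class. The first test asserts only on inputs and outputs. The second, shown as a discouraged pattern, asserts on how the result is computed by patching a private helper.

```python
from unittest.mock import patch


class Order:
    """Hypothetical example class."""

    def __init__(self, items):
        self._items = list(items)

    def _line_total(self, item):
        # Internal helper: an implementation detail.
        return item["unit_price"] * item["quantity"]

    def total(self):
        # Public behavior: the total price of the order.
        return sum(self._line_total(item) for item in self._items)


ITEMS = [
    {"unit_price": 10, "quantity": 2},
    {"unit_price": 5, "quantity": 1},
]


def test_total_behavior():
    # Recommended: asserts only on what the code does.
    assert Order(ITEMS).total() == 25


def test_total_implementation_coupled():
    # Discouraged: asserts on how the code does it. The private helper
    # is replaced, and the test checks that it was called once per item.
    with patch.object(Order, "_line_total", return_value=0) as helper:
        Order(ITEMS).total()
    assert helper.call_count == 2
```

If `_line_total` were inlined or renamed during a refactoring, the behavior test would keep passing, while the implementation-coupled test would fail even though the total is still correct. The second test also never checks the actual total, so it would not notice a real pricing bug.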

Tools

As a well-established Python best practice, pytest is recommended for all projects.

Note: Thanks to pytest's support for running unittest-style test cases, "hybrid" tests are perfectly fine.

Furthermore, the use of a number of helper libraries is highly encouraged, such as:

responses or pook: Cutting external API dependencies
freezegun: Pinning down date and time of the test execution

Caveats

There will always be occasions when it is unclear which guideline or pattern is the right one.

However, keeping these principles in mind should give good results overall.

Operation

Testing has a particularly high impact on CI pipelines.

Following the principles described in this blueprint is essential to prevent unnecessary costs on the CI infrastructure.

Security Impact

Good testing is essential for reliable software.

Performance Impact

Integration tests (generally, tests involving external sources) are expensive to execute.

This is part of the reason why we should be able to group tests in smart ways, so that only the necessary sets are run.

Developer Impact

In the long term, following these guidelines results in readable, modular, easy-to-extend, easy-to-modify tests, allowing for a smooth development cycle.

Deployment

N/A
