Data Integrity on Database

Building databases with strong data integrity to not lose data are essential for systems with important data for the business.

Problem Description

In repositories with an old version of Django, we are delegating the integrity of the data to the application when it should be on the database.

Also the application should know nothing about the persistence layer.

If the application has a bug and a row can be removed from the database, this can delete “ON CASCADE” tons of related rows with important data that should NOT be removed.

Then to restore the removed data we need to waste a lot of time to analyse what we lose and execute a few scripts to amend everything with the risk that this can cause.

By moving the integrity of the data to the database we can avoid this problem, which means, we can be sure that our data is not going to be lost in any case.

Background

Almost all the Foreign Keys that we have in applications do not have defined the ON DELETE action and the actual Django version that we have installed. If you do not have defined this action, it is going to set by default the action “CASCADE”.

Then every time a row with that relationship is removed from the database is going to remove the related rows too.

We needed to include extra code to check the integrity of the rows that we are going to delete.

Moving the integrity of the data to the database using a proper on delete action, we can avoid this extra code and also prevent losing important data.

Also, delete data from a system is always a bad idea, we should do a soft delete and remove it logically not physically.

Solution

The solution consists of 3 steps:

  • Include ON DELETE action into all Foreign Keys. (Mandatory since Django 2.0)
  • Use soft delete instead of delete
  • Include into the Quality Gate where we check the migrations, that FK can not be created without defining the ON DELETE action.

Caveats

It's a sensitive process that requires a lot of effort and care. Fortunately, this is going to prevent losing data so if the database raises an integrity error is because we are doing something that we should not be doing.

If developers overwrite delete methods, they need to take into consideration the post_delete event from Django. This event is being used by ElasticSearch and CQRS to remove objects from there.

Operation

We’ll need to apply the migrations with the changes to the actual database of the application in staging to run all the automatic tests, the capabilities for this are in place so should be a non-problem.

Security Impact

The security is going to be improved as we delegate to the database the integrity of the data instead of having it in the application.

Performance Impact

Performance is not going to be affected by this change.

Developer Impact

Developer experience will improve since data integrity is not going to be handled by the application code, so developers do not need to worry about it.

Deployment

No special instructions for deployment required, we'll use the standard pipeline to push these changes forward once ensured the quality and stability of the database. Dependencies We'll need to coordinate with teams to make sure we are not blocking each other on the migration step.