Assets caching and bundling
A proposal for changes to EBO and its assets that improve caching and cache-busting, which should reduce the number of requests and the space needed to store assets in S3.
This RFC discusses issues with the current approach and how to fix them, focusing on two areas: caching/cache-busting and asset bundling.
Problem Description
Caching and cache-busting
Recent changes in the automation of EBO's release process created new problems with our static assets. Our static files have the EBO version in the URL, e.g. the script required for the login page is https://d3pd80oeif52cn.cloudfront.net/6.7.0/static/js/auth.js. The content is cached for one day, so browsers don't have to download the asset with every request.
Whenever we release a new version of EBO, the URL changes, which forces browsers to re-download the asset. This is called cache-busting, and it is very inefficient when combined with a rapid release process: a new version invalidates all 5000 assets, even if 4999 of them didn't change between versions. Browsers therefore have to re-download lots of scripts, CSS files, images, and SVGs every time a new release is deployed.
With the ONL team aiming to do a release (or multiple releases) a day, this must be solved with a different approach. We're going to start using an MD5 hash of the file content instead of the EBO version, e.g. the URL for auth.js should become https://d3pd80oeif52cn.cloudfront.net/assets/static/js/auth.2ecc13654c22.js, and it should keep that URL until its content changes.
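A minimal Python sketch of the naming scheme, assuming the convention Django's hashed static storage uses: the first 12 hex characters of the MD5 of the file content, inserted before the extension. `hashed_name` is a hypothetical helper for illustration, not a Django API.

```python
import hashlib
import posixpath


def hashed_name(path: str, content: bytes) -> str:
    """Insert the first 12 hex chars of the content's MD5 before the extension."""
    digest = hashlib.md5(content).hexdigest()[:12]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"


if __name__ == "__main__":
    v1 = b"console.log('login');\n"
    # Same content -> same name, so an unchanged file keeps its URL across releases.
    assert hashed_name("static/js/auth.js", v1) == hashed_name("static/js/auth.js", v1)
    # Any change to the content -> a new name, which busts the cache for that file only.
    assert hashed_name("static/js/auth.js", v1 + b"//") != hashed_name("static/js/auth.js", v1)
```

Because the name depends only on the content, the version number no longer affects which URLs change between releases.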
Overzealous asset bundling
Cache-busting is made worse by the fact that lots of files are bundled together, so if one file in a bundle changes, the entire bundle changes and must be re-downloaded.
Bundling everything into a single file used to be best practice because of performance issues in HTTP/1.x, but HTTP/2 resolves all of these issues. Bundling is therefore no longer needed, and it's now considered bad practice. Without it, individual files can be cached, and browsers and Cloudfront only need to re-download the files that changed in a release.
However, some level of bundling is tolerable, e.g. a library like Lodash consists of 1000 tiny files, and downloading them one by one can add more overhead than downloading a single bundle.
Some Ebury environments don't have cache-busting at all
Some production environments (DEMO, SOLUTIONS, MP) don't use the EBO version in the asset URLs, e.g. the same auth.js file in MP has the URL https://thefxfirm-ebostatic.s3.amazonaws.com/eburyonline-prod-fxfirm/static/js/auth.js. This occasionally causes issues when deploying these environments, because the assets are deployed to S3 before the new server instances are spawned, so all existing files in the S3 bucket are overwritten while the old instances are still running. It's the same problem as running a DB migration that deletes a table while an old instance is still trying to use it. Appending MD5 hashes to each filename solves this problem too, because existing files are then never overwritten with different content.
Rolling back assets for some environments is hard
The previous problem also causes issues when we need to roll EBO back to a previous version. DEMO, SOLUTIONS, and MP don't have versioned assets, so the assets for the version we want to roll back to have to be regenerated and re-uploaded to the S3 bucket.
Background
Assets in EBO
Currently, EBO has around 5000 assets (approx. 90MB). They live in S3 buckets, and each version has its own folder, so each time EBO is deployed to PROD, a new folder is created and all 5000 items are copied into it. In DEMO and MP, the 5000 items are simply copied over the existing ones, with no way to roll them back, because the version is always the same string.
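A back-of-the-envelope comparison, assuming the whole ~90MB asset set is copied on every PROD release and one release a day, as the ONL team is aiming for. $\Delta_r$ stands for the size of the files that actually changed in release $r$; it has not been measured here.

```latex
\begin{aligned}
S_{\text{versioned}} &\approx 90\,\text{MB} \times 365 = 32\,850\,\text{MB} \approx 32.9\,\text{GB per year},\\
S_{\text{hashed}}    &\approx 90\,\text{MB} + \sum_{r=1}^{365} \Delta_r .
\end{aligned}
```

Since most files rarely or never change, $\sum_r \Delta_r$ is expected to be far smaller than the versioned total.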
Thousands of files haven't changed since the beginning of EBO, or change only once a year. Creating new URLs for them with every release is a huge waste of resources, because Cloudfront and browsers have to re-download them. These files could be cached in browsers with a one-year expiration, significantly reducing the number of requests that our platform, WAF, Cloudfront, and S3 have to handle.
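The caching rule this enables could be sketched as below, assuming hashed names carry 12 hex characters (as in auth.2ecc13654c22.js) and unhashed files keep today's one-day lifetime. `cache_control` is a hypothetical helper; `immutable` is a standard Cache-Control extension that tells supporting browsers not to revalidate the file while it is fresh.

```python
import re

# Matches names like "auth.2ecc13654c22.js": a dot, 12 hex chars, then the extension.
HASHED = re.compile(r"\.[0-9a-f]{12}\.[A-Za-z0-9]+$")

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds
ONE_DAY = 24 * 60 * 60  # today's lifetime for unhashed, versioned URLs


def cache_control(path: str) -> str:
    """Return a Cache-Control value: one year for hashed files, one day otherwise."""
    if HASHED.search(path):
        # The URL changes whenever the content changes, so this copy never goes stale.
        return f"public, max-age={ONE_YEAR}, immutable"
    return f"public, max-age={ONE_DAY}"
```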
Files that don't change should also exist only once in the bucket, so the space needed to store assets can be much smaller than it is now.
HTTP/2
Bundling assets was one of the most important performance techniques back in 2010 because of limitations in HTTP/1.0 and 1.1. A connection could handle only one request at a time, so parallel requests required separate TCP connections, and browsers could open only 6-10 parallel connections to the same domain. Many techniques for delivering assets to the browser faster were invented to work around these obstacles:
- Domain sharding (serving assets from multiple domains, e.g. images.ebury.com, statics.ebury.com, api.ebury.com) was very popular because it allowed 18-24 parallel requests.
- Asset bundling (concatenating hundreds of small JS/CSS files into one bundle) had massive benefits because only one request was needed to get the entire content to the browser.
- Image sprites, font icons, and SVG sprites were techniques for combining many images into a single file and then positioning it so that only the right image is shown.
EBO uses all of these techniques. HTTP/2, however, solves these connection issues: a single open TCP connection can handle hundreds of parallel requests.
Currently, 95.4% of all requests to EBO's static assets are served via HTTP/2. Deprecating IE and ending support for it in November 2020 helped a lot to get this number this high. The remaining 4.6% of requests come from people on Windows XP or old macOS versions, or from crawlers/bots.
Solution
Cache-busting via hash
EBO's static assets are collected and uploaded to S3 using Django's collectstatic command. This command collects all JS, CSS, SVG, and PNG files and uploads them to S3. It can also post-process the files: compute an MD5 hash of each file's content and add the hash to the filename. This can be achieved via ManifestStaticFilesStorage.
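A possible settings sketch, assuming Django 4.2+ (older versions use the STATICFILES_STORAGE setting instead) and django-storages 1.14+, whose S3ManifestStaticStorage combines Django's ManifestFilesMixin with S3 storage. The STATIC_ASSETS_BUCKET environment variable and the bucket name are hypothetical. By default the manifest is saved to the same storage as the assets, so keeping it out of the public bucket needs extra configuration not shown here.

```python
# settings.py (sketch; assumes Django 4.2+ and django-storages 1.14+)
import os

STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        # ManifestFilesMixin on top of S3: collectstatic uploads auth.<hash>.js
        # and writes a staticfiles.json manifest mapping original -> hashed names.
        "BACKEND": "storages.backends.s3.S3ManifestStaticStorage",
        "OPTIONS": {
            # Hypothetical env var selecting the staging or prod bucket.
            "bucket_name": os.environ.get("STATIC_ASSETS_BUCKET", "ebo-static-staging"),
            # One shared, unversioned prefix instead of one folder per EBO version.
            "location": "assets/static",
            "custom_domain": "d3pd80oeif52cn.cloudfront.net",
            # Hashed files never change in place, so they can be cached for a year.
            "object_parameters": {"CacheControl": "public, max-age=31536000, immutable"},
        },
    },
}
```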
The final folder structure can look like this:
assets/static/js/
- auth.78d908e02312.js # contains changes from 6.4.0
- auth.e8bd1e9dc0fc.js # contains changes from 6.11.0
- auth.f656de4abf70.js # contains changes from 7.2.0
...
assets/static/css/
- auth.f656de4abf70.css # contains changes from 6.4.0
- auth.ab478dd009aa.css # contains changes from 6.13.0
All files live in one folder, and a new file is created only when its content changes. Because auth.js didn't change between versions 6.4.0 and 6.11.0, every EBO version from 6.4.0 up to (but not including) 6.11.0 uses the same file, auth.78d908e02312.js.
Using hashed URLs in the Django app
Django templates can keep using the unhashed path, because ManifestStaticFilesStorage resolves it to the hashed one. For example,
{% load static %}
<script src="{% static 'js/auth.js' %}"></script>
will generate:
<script src="https://d3pd80oeif52cn.cloudfront.net/assets/static/js/auth.f656de4abf70.js"></script>
This works because the collectstatic command, together with ManifestStaticFilesStorage, generates a staticfiles.json file containing a map of original filenames to hashed filenames, e.g.:
{
  "paths": {
    "js/auth.js": "js/auth.f656de4abf70.js",
    "css/auth.css": "css/auth.ab478dd009aa.css"
  }
}
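To illustrate the lookup, here is a simplified Python version of what resolving `{% static 'js/auth.js' %}` against this manifest amounts to. `static_url` and the STATIC_URL value are assumptions for illustration; Django's real implementation handles more cases.

```python
import json

# Assumed public prefix for the hashed assets.
STATIC_URL = "https://d3pd80oeif52cn.cloudfront.net/assets/static/"


def static_url(manifest_json: str, path: str) -> str:
    """Resolve an unhashed static path to its hashed URL via a staticfiles.json manifest."""
    paths = json.loads(manifest_json)["paths"]
    if path not in paths:
        # Fail loudly instead of silently serving an unhashed (uncacheable) URL.
        raise ValueError(f"Missing staticfiles manifest entry for '{path}'")
    return STATIC_URL + paths[path]
```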
The staticfiles.json file should be added to the generated docker image for each environment. That means online:6.4.0 should carry its own staticfiles.json with its own paths to specific assets in the S3 bucket, and online:6.5.0 should have its own file too, with paths that point either to files unchanged since 6.4.0 or to new files modified in 6.5.0.
This means all environments should be ready to roll back immediately, because the files are already in S3.
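Rollback readiness could then be verified with a check like the following sketch, which assumes bucket keys are listed relative to the assets/static/ prefix. `missing_for_rollback` is a hypothetical helper; an empty result means every file the version's manifest references is still in the bucket.

```python
def missing_for_rollback(manifest: dict[str, str], bucket_keys: set[str]) -> list[str]:
    """Hashed files a version's manifest references but the bucket no longer has."""
    # manifest is the "paths" dict of that version's staticfiles.json.
    return sorted(hashed for hashed in manifest.values() if hashed not in bucket_keys)
```

Rolling back is then just deploying the older docker image, since its manifest already points at files that exist.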
Less bundling of assets
CSS and JS files should no longer be bundled into large bundles, although we can keep bundling individual modules or libraries. This is easy to achieve with the modern Vue.js stack, which already uses code splitting and tree shaking. Very big modules can be split further by setting a maximum bundle size in webpack. We will aim to keep each bundle below 10KB.
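In webpack the limit itself would be set with `optimization.splitChunks.maxSize`, which webpack treats as a hint rather than a hard limit. A hypothetical post-build check like the Python sketch below could flag bundles that still exceed the target. It measures uncompressed size on disk; whether the 10KB target refers to raw or compressed size is an assumption.

```python
from pathlib import Path

MAX_BUNDLE_BYTES = 10 * 1024  # the 10KB target, assumed to mean uncompressed bytes


def oversized_bundles(build_dir: str, limit: int = MAX_BUNDLE_BYTES) -> list[tuple[str, int]]:
    """List built .js/.css files larger than `limit` bytes, largest first."""
    found = []
    for path in Path(build_dir).rglob("*"):
        if path.is_file() and path.suffix in {".js", ".css"}:
            size = path.stat().st_size
            if size > limit:
                found.append((str(path.relative_to(build_dir)), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```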
Third-party libraries, e.g. vue, lodash, date-fns, and ky, should each get their own bundle instead of being bundled together into one gigantic vendor.js file. This can also be configured in webpack. For legacy code, UMD builds of each library are already available and can be used directly in the browser, so we will just drop the concatenation step from the gulp config.
We will also stop concatenating images and SVGs into image sprites and use them directly in the code instead. The current image sprites work like this:
images/flags/
- can.png
- uk.png
- de.png
- all.png # all the PNGs above are joined together in this file
<div style="background: url('images/flags/all.png'); background-position: -32px -16px; width: 16px; height: 16px;"></div>
Instead, we will use the images directly, e.g. <img src="images/flags/uk.png" />. This way, the browser downloads only the assets the application really needs; there is no situation in which a user needs all flags at once.
Alternatives
N/A
Caveats
HTTP/1 usage
4.6% of requests to EBO still use HTTP/1.1 or 1.0. For those users, pages will load more slowly than for everyone else. Better caching and less aggressive cache-busting should compensate for the added delay. On top of that, all of these users are on desktop browsers, whose connections should generally be fast enough to reduce the lag too.
Zlib and small files
Some articles argue that not bundling increases the size of downloaded content because gzip compression becomes less effective: gzipping a bundle of 100 files gives a smaller result than gzipping the 100 files individually. The difference is somewhere between 2.5% and 5% of the size. However, we aim to cache files much longer than was possible before and to reduce the number of requests, which should compensate for serving slightly bigger content.
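The effect can be reproduced with Python's gzip module. The modules below are synthetic and tiny, so the gap is much larger than the 2.5% to 5% quoted above; the sketch shows the mechanism (a header and trailer per file, and no compression history shared between files), not a measurement of EBO's assets. Running `compare` on real build output would give representative numbers.

```python
import gzip


def make_modules(n: int) -> list[bytes]:
    """Generate n small, similar JS-like modules (synthetic, for illustration only)."""
    return [
        (f"export function helper{i}(items) {{\n"
         f"  return items.filter((item) => item.id !== {i}).map((item) => item.name);\n"
         f"}}\n").encode()
        for i in range(n)
    ]


def compare(modules: list[bytes]) -> tuple[int, int]:
    """Return (gzip size of one concatenated bundle, sum of gzip sizes of each file)."""
    bundled = len(gzip.compress(b"".join(modules)))
    separate = sum(len(gzip.compress(module)) for module in modules)
    return bundled, separate


if __name__ == "__main__":
    bundled, separate = compare(make_modules(100))
    print(f"bundled: {bundled} B, separate: {separate} B")
```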
DEMO and MP not using Cloudfront
Neither the DEMO environments nor MP use Cloudfront to serve assets; they connect directly to S3. S3 doesn't support HTTP/2, so all communication goes over HTTP/1.1. The proposed cache-busting is not affected by this, but the proposed changes in bundling would likely have a significant negative impact on the performance of these environments. There are two options for handling this:
- Wait until MP is merged into PROD and DEMO is moved to Kubernetes, so that all environments sit behind Cloudfront. In that case, only the cache-busting would be implemented initially.
- Enable Cloudfront for DEMO and MP now. We are not aware of any issue that would prevent that.
Operation
N/A
Security Impact
No changes in security. The entire RFC is about assets that are already public.
Performance Impact
We should see a spike in requests to Cloudfront after the solution is deployed, but it shouldn't be bigger than the spikes we see after each EBO release. After that, the number of requests to Cloudfront and S3 should drop below previous levels.
We'd like to measure how many requests to static assets are made on average, so we can verify that the solution works. The counts can be collected from the WAF logs in production/staging, which record each request to Cloudfront.
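A minimal counting sketch, assuming the WAF logs are exported as one JSON record per line with a millisecond `timestamp` and an `httpRequest` object holding `uri` and `httpVersion`, as in the AWS WAF log format. Matching on "/static/" counts both today's versioned URLs and the proposed /assets/static/ URLs, so before and after can be compared. `count_static_requests` is hypothetical; its per-version counter could also track the HTTP/2 share.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def count_static_requests(lines, marker="/static/"):
    """Count static-asset requests per UTC day and per HTTP version.

    Each line is assumed to be one JSON WAF log record with a millisecond
    `timestamp` and an `httpRequest` object containing `uri` and `httpVersion`.
    """
    per_day, per_version = Counter(), Counter()
    for line in lines:
        record = json.loads(line)
        request = record["httpRequest"]
        if marker not in request["uri"]:
            continue  # not a static asset, e.g. an API call
        moment = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
        per_day[moment.date().isoformat()] += 1
        per_version[request.get("httpVersion", "unknown")] += 1
    return per_day, per_version
```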
Developer Impact
No impact on the development process, because the assets are collected and processed only when deploying to production environments, EBOX, or staging.
Data Consumer Impact
N/A
Deployment
The cache-busting via hash and the changes in bundling can be deployed independently, with help from devops to make sure everything works.
EBO's deployment pipeline
The deployment pipeline for EBO would need adjustments too. Currently, the statics for production are generated when a new EBO version is deployed to staging. Later, they are promoted from the staging S3 bucket to the prod S3 bucket: Jenkins downloads the statics from the staging bucket and uploads them to the prod bucket. The statics live in a folder named after the version being deployed, so only files related to the current release are copied. Once all assets move into a single folder without the version in its name, the number of files in that folder will grow with each release, so promoting statics this way would take longer and longer and is no longer feasible.
Instead, we should run the collectstatic command twice: once with the S3 bucket set to staging and once with it set to prod.
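A sketch of how the deployment step might build the two runs, assuming the settings read the target bucket from an environment variable (STATIC_ASSETS_BUCKET, a hypothetical name) and that manage.py is in the working directory. The bucket names and both helpers are illustrative.

```python
import os
import subprocess
import sys

# Hypothetical bucket names; the real ones live in the deployment configuration.
BUCKETS = {"staging": "ebo-static-staging", "prod": "ebo-static-prod"}


def collectstatic_commands(buckets: dict[str, str]) -> list[tuple[dict[str, str], list[str]]]:
    """Build one `manage.py collectstatic` invocation per target bucket."""
    commands = []
    for bucket in buckets.values():
        # Same environment as the current process, with the target bucket overridden.
        env = {**os.environ, "STATIC_ASSETS_BUCKET": bucket}
        argv = [sys.executable, "manage.py", "collectstatic", "--noinput"]
        commands.append((env, argv))
    return commands


def run_collectstatic(commands: list[tuple[dict[str, str], list[str]]]) -> None:
    """Run each invocation in turn and stop at the first failure."""
    for env, argv in commands:
        subprocess.run(argv, env=env, check=True)
```

The Jenkins job would then call `run_collectstatic(collectstatic_commands(BUCKETS))`.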
Cleaning up existing buckets
We'd also like to remove unused versions from the existing buckets. Some of them contain assets for EBO versions 4.X.X and 5.X.X, which are 2-4 years old and which we will never roll back to.
To clean up files that have a hash in the URL, we can create a folder in the S3 bucket to store every generated staticfiles.json. This folder must not be publicly available, because it reveals the entire structure of the S3 bucket. A simple script can then read all staticfiles.json files, determine which files in the bucket were used by which versions, and delete only the files that are no longer referenced by any version from a given cutoff onwards, e.g. 7.0.0.
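A minimal sketch of the cleanup decision, assuming each stored manifest has been loaded into a dict mapping the EBO version to the "paths" entry of its staticfiles.json. `files_to_delete` is hypothetical and only decides what to delete; listing and deleting objects in S3 is left out. A file shared with any version at or after the cutoff is kept, which is what makes deleting hashed files safe.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '6.11.0' into (6, 11, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def files_to_delete(manifests: dict[str, dict[str, str]], cutoff: str) -> set[str]:
    """Hashed files referenced only by versions older than `cutoff`.

    `manifests` maps an EBO version to the "paths" dict of its staticfiles.json.
    A file is kept if any version >= cutoff still references it.
    """
    limit = parse_version(cutoff)
    old, kept = set(), set()
    for version, paths in manifests.items():
        target = old if parse_version(version) < limit else kept
        target.update(paths.values())
    return old - kept
```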
Dependencies
N/A