Sherlock audit certificates
Problem Description
Comply Advantage (CA) provides PDF certificates for individual searches. We currently store those certificates manually in S3 and Google Drive for audit purposes.
This proposal will automate the process of storing those certificates from Sherlock.
Requirements
- Search and hit certificates should be stored in S3 and Google Drive.
Background
CA search certificates are PDF documents that summarize the results of one particular search performed on CA. Each certificate contains details about the search itself (e.g. search term, date created, types...) and a list of matches with some details (relevance, status, risk level...). This type of certificate is available through both the CA web interface and CA's API.
Another PDF certificate type available on CA is the "entity certificate", which offers more information for each search match, including comments made on the CA platform. This type of certificate is only available through the CA web interface.
Currently, search certificates are uploaded to Google Drive along with the rest of the client's documentation in "the client's file" (a directory inside one of multiple "CLIENT DOCUMENTATION - XXX" shared folders on Google Drive). This is a manual procedure and happens almost exclusively as part of the onboarding process. Certificates are kept for audit of the performed screening.
To get the certificate into the client's file, the PDF is manually downloaded from CA (which often requires momentarily reassigning the case on CA), renamed, and uploaded to the corresponding location on Google Drive.
To keep more complete records for audit purposes and avoid the time-consuming manual downloading and uploading, we'd like to automate the procedure and not restrict it to recently onboarded clients.
Solution
Sherlock will be responsible for retrieving the certificates from Comply Advantage's API, uploading them to S3, and sending a request to the file upload service to upload them to Google Drive.
The point in the lifecycle of the search/hit at which certificates will be uploaded is yet to be fleshed out with the product and compliance teams.
The most basic approach would be to generate a new search certificate every time a status update is detected in Sherlock. In that case a certificate URL would be requested from CA's API and the certificate would then be uploaded to Google Drive using the file upload service.
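As a rough illustration of the basic trigger described above, the sketch below compares the status last certified for each search against the status Sherlock currently observes and returns the searches that would need a fresh certificate. The function name, the dictionary shapes, and the status strings are all hypothetical placeholders, and treating a never-certified search as "changed" is an assumption rather than an agreed rule.

```python
from typing import Dict, List


def searches_needing_certificates(
    last_certified: Dict[str, str],
    observed: Dict[str, str],
) -> List[str]:
    """Return ids of searches whose observed status differs from the one last certified.

    Assumption: a search with no certified status yet also needs a certificate.
    """
    return sorted(
        search_id
        for search_id, status in observed.items()
        if last_certified.get(search_id) != status
    )


# Placeholder data: search ids and status names are illustrative only.
last_certified = {"search-1": "potential_match", "search-2": "no_match"}
observed = {
    "search-1": "true_positive",    # status changed since the last certificate
    "search-2": "no_match",         # unchanged, so no new certificate
    "search-3": "potential_match",  # never certified before
}
print(searches_needing_certificates(last_certified, observed))
```

Any extra rules agreed with product and compliance could be layered on as additional filters over the same inputs.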
Extra rules on top of this basic mode of operation shouldn't further complicate the design as long as all the information is available to Sherlock.
To store the certificates we'll split the implementation into two stages:
- The certificates will be uploaded to S3 by Sherlock, and allowed users will be able to download them through Sherlock's admin interface.
- The certificates will be uploaded to Google Drive using the file upload service.
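The flow and the two stages above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Sherlock's implementation: `CertificateArchiver`, `BlobStore`, `InMemoryStore`, the key layout, and the shape of the file upload service request are all hypothetical. The CA client, the S3 client, and the file upload service are injected as plain callables and objects so that the second stage can be enabled later without touching the first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class BlobStore(Protocol):
    """Hypothetical minimal interface over the S3 bucket."""

    def put(self, key: str, data: bytes) -> None: ...


@dataclass
class InMemoryStore:
    """Stand-in for S3 so the sketch runs without network access."""

    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class CertificateArchiver:
    # Hypothetical: takes a CA search id and returns the certificate PDF bytes.
    fetch_pdf: Callable[[str], bytes]
    store: BlobStore
    # Hypothetical: asks the file upload service to copy (client_id, key) to Drive.
    # Left as None in stage one, when only S3 storage is in place.
    request_drive_upload: Optional[Callable[[str, str], None]] = None

    def archive(self, client_id: str, search_id: str) -> str:
        pdf = self.fetch_pdf(search_id)
        # PDF files start with the "%PDF-" marker; reject anything else early.
        if not pdf.startswith(b"%PDF-"):
            raise ValueError(f"response for {search_id} is not a PDF")
        key = f"certificates/{client_id}/{search_id}.pdf"  # assumed key layout
        self.store.put(key, pdf)
        if self.request_drive_upload is not None:
            self.request_drive_upload(client_id, key)
        return key


store = InMemoryStore()
drive_requests: List[Tuple[str, str]] = []


def fake_fetch(search_id: str) -> bytes:
    return b"%PDF-1.7 placeholder"


stage_one = CertificateArchiver(fetch_pdf=fake_fetch, store=store)
stage_two = CertificateArchiver(
    fetch_pdf=fake_fetch,
    store=store,
    request_drive_upload=lambda client_id, key: drive_requests.append((client_id, key)),
)
key_one = stage_one.archive("client-42", "search-1")
key_two = stage_two.archive("client-42", "search-2")
```

Storing to S3 before requesting the Drive upload means a failure in the file upload service would still leave an audit copy in S3; whether that ordering is the right one is a design choice still to be confirmed.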
Alternatives
N/A
Caveats
- Currently CA doesn't offer a way to export hit certificates through their API. It's still unclear whether that would completely block this feature or whether a first version with only search certificates would already add value to the teams involved.
Operation
This feature won't change the current operation of Sherlock.
Security Impact
Certificates uploaded to S3 from Sherlock will only be accessible through the admin interface, leveraging Django's permissions system.
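The access rule for S3-backed certificates could look roughly like the sketch below. It relies only on the `has_perm` method that Django user objects provide; the permission codename `sherlock.download_certificate`, the `read_certificate` helper, and the exception class are hypothetical, and a `FakeUser` stands in for a real Django user so the example runs without a configured project. In Sherlock itself the check would more likely live in the admin's own permission hooks.

```python
from typing import Iterable, Mapping

# Hypothetical permission in Django's "app_label.codename" format.
DOWNLOAD_PERMISSION = "sherlock.download_certificate"


class CertificateAccessDenied(Exception):
    """Stand-in for the error the admin view would raise on a failed check."""


def read_certificate(user, key: str, objects: Mapping[str, bytes]) -> bytes:
    """Return the stored certificate only if the user holds the download permission."""
    if not user.has_perm(DOWNLOAD_PERMISSION):
        raise CertificateAccessDenied(key)
    return objects[key]


class FakeUser:
    """Minimal double exposing the same has_perm(perm) call as a Django user."""

    def __init__(self, perms: Iterable[str]) -> None:
        self._perms = set(perms)

    def has_perm(self, perm: str) -> bool:
        return perm in self._perms


objects = {"certificates/client-42/search-1.pdf": b"%PDF-1.7 placeholder"}
auditor = FakeUser([DOWNLOAD_PERMISSION])
other = FakeUser([])

pdf = read_certificate(auditor, "certificates/client-42/search-1.pdf", objects)
```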
Certificates uploaded to Google Drive should only be accessible to those who would have access to the client's folder in regular Google Drive operation. This is ensured by the file upload service.
Developer Impact
N/A
Deployment
We have to make sure Sherlock has access to the file upload service for this feature to be deployed correctly.
Dependencies
This feature depends on access to the CA API, which is already core to Sherlock, and on access to the file upload service.