Sending email via SendGrid API
A central service for sending emails from all applications through an email provider. This RFC does not attempt to answer every question; instead, it is a first step towards a discussion of our plans for email sending. In the first milestone of the project, all intended recipients are internal Ebury email accounts.
Problem Description
Currently, each application has its own way of sending emails. Typically they use an SMTP interface with their own SMTP settings (e.g. authentication data) and their own Ebury templates. This is not ideal from a maintenance standpoint: if something changes, every application must be updated individually. Furthermore, the SendGrid API offers more capabilities than a traditional SMTP connection.
Background
Without a central email sending service, applications had no real option other
than managing email sending themselves. Django applications tend to use Django's
built-in email functionality, which is backed by Python's standard smtplib
module.
Solution
Create a separate, containerized service that allows any other service to send emails without managing its own SMTP connection. In addition, create a client library for this service that mimics the API of Django's email classes (e.g. EmailMessage), so that one implementation can be quickly and easily swapped for the other. This way the SendGrid API can be leveraged, while it remains possible to use a fallback email provider with the standard SMTP feature set.
SendGrid API advantages
- Recipient email address checking
- Good control of outgoing flow (IP warmup and anti-spam measures)
- Two-level authorization control (user and subuser)
- Link branding
- Store and manage rich email templates in the Design Library
- Schedule mail sending
- Mass mailing (e.g. for monthly reports)
Architecture for mail sending
The Email Service has an SQS queue for incoming messages. One or more daemon processes consume the messages (using the locking system to avoid sending the same message more than once), archive them in an S3 bucket, and send them either through SendGrid's API or through the fallback provider's SMTP connection. Technically, the Email Service itself is this consumer process, running in one or more container instances.
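A minimal sketch of how a single consumer might handle one queue message, assuming hypothetical interfaces: an in-memory stand-in for the locking system, plus plain callables for the S3 archive and the two providers. Receiving and deleting SQS messages (for example with boto3's receive_message and delete_message) is left out, and the "id" field on each task is an assumption. If both providers fail, the lock is released so that a later redelivery of the same message can retry it.

```python
import json
from typing import Callable, Set


class InMemoryLock:
    """Stand-in for the shared locking system (hypothetical interface)."""

    def __init__(self) -> None:
        self._held: Set[str] = set()

    def acquire(self, key: str) -> bool:
        # False means another consumer has already claimed this message.
        if key in self._held:
            return False
        self._held.add(key)
        return True

    def release(self, key: str) -> None:
        self._held.discard(key)


def process_message(
    body: str,
    lock: InMemoryLock,
    archive: Callable[[str, str], None],
    send_primary: Callable[[dict], None],
    send_fallback: Callable[[dict], None],
) -> str:
    """Handle one raw queue message and report which path was taken."""
    message = json.loads(body)
    message_id = message["id"]  # assumes every task carries a unique "id"
    if not lock.acquire(message_id):
        return "duplicate"  # already handled (or being handled) elsewhere
    try:
        archive(message_id, body)  # e.g. store the raw task in S3
        try:
            send_primary(message)  # e.g. SendGrid v3 API call
            return "sent"
        except Exception:
            # Sketch: any primary-provider error triggers the SMTP fallback.
            send_fallback(message)
            return "sent-fallback"
    except Exception:
        lock.release(message_id)  # let a redelivery of this message retry it
        raise
```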
Sandbox mode
In sandbox mode, the service runs as usual, but the queue consumer does not
actually send emails; it only writes them to the log. This is useful for
development and debugging. The Makefile includes a dry-run target that runs
the service in sandbox mode.
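One way the sandbox switch could be wired into the consumer is to choose the send function once at startup. This is a sketch only: make_sender and the flag that feeds its sandbox argument (e.g. whatever the dry-run Makefile target sets) are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("email_service")

Send = Callable[[dict], None]


def make_sender(sandbox: bool, deliver: Send) -> Send:
    """Pick the send function the consumer will use.

    deliver is the real provider call (SendGrid API or SMTP fallback).
    In sandbox mode it is never called; the message is only logged.
    """
    if not sandbox:
        return deliver

    def log_only(message: dict) -> None:
        logger.info(
            "SANDBOX: not sending %r to %s",
            message.get("subject"),
            message.get("to"),
        )

    return log_only
```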
Mock objects
The Email Service client library contains a set of mock and mommy objects that can be used in any test to verify that emails are sent correctly in response to certain events.
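The exact test helpers are not specified here. A minimal sketch of the idea, assuming a hypothetical client with a send(user, to, subject, body) method: a fake client records each message in an outbox list instead of enqueueing it, much like Django's test runner collects mail in django.core.mail.outbox. The application function under test is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeEmailClient:
    """Test double for a hypothetical Email Service client.

    Records each message in outbox instead of putting it on SQS.
    """

    outbox: List[dict] = field(default_factory=list)

    def send(self, user: str, to: List[str], subject: str, body: str) -> None:
        self.outbox.append(
            {"user": user, "to": list(to), "subject": subject, "body": body}
        )


def report_failed_job(client, job_name: str) -> None:
    """Example application code under test (hypothetical)."""
    client.send(
        "techops",
        ["techops.team@ebury.com"],
        f"Job {job_name} failed",
        "See the logs.",
    )
```

A test then triggers the event and inspects client.outbox, without touching SQS or SendGrid.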
Environments
As with any other component, the Email Service can be deployed in several ways. On a dev box it will most likely run in sandbox mode, i.e. emails are not sent at all and only appear in the logs or on the console. In staging, and optionally also in dev, the Email Service can be set up to use a testing email account. In such a deployment the emails are actually sent through SendGrid, but the SendGrid user (or subuser) it logs in with is a very limited one that can only send emails to a predefined list of addresses. In the first milestone of the project only internal emails, e.g. alerts and reports, are sent this way, so this restriction mainly prevents spamming the inboxes of unsuspecting bystanders. This way a UAT or even a dev test can exercise the whole process end to end and verify that the email is fully correct, including its design, templates, and the like.
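The environment matrix above could be captured in a small configuration lookup. This is a sketch only: the environment names, the DeliveryConfig type and the (sub)user names are placeholders, and the recipient restriction itself is enforced by the limited SendGrid account, not by this code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DeliveryConfig:
    sandbox: bool                 # True: log messages, never send them
    sendgrid_user: Optional[str]  # SendGrid (sub)user whose credentials are used


# Placeholder (sub)user names; the real ones are created in SendGrid.
SANDBOX = DeliveryConfig(sandbox=True, sendgrid_user=None)
TEST_ACCOUNT = DeliveryConfig(sandbox=False, sendgrid_user="email-service-test")
PRODUCTION = DeliveryConfig(sandbox=False, sendgrid_user="email-service-prod")

DEFAULTS: Dict[str, DeliveryConfig] = {
    "dev": SANDBOX,
    "staging": TEST_ACCOUNT,
    "production": PRODUCTION,
}


def config_for(environment: str, use_test_account: bool = False) -> DeliveryConfig:
    """Return the delivery settings for an environment.

    A dev box may opt in to the restricted test account for end-to-end checks.
    """
    if environment == "dev" and use_test_account:
        return TEST_ACCOUNT
    return DEFAULTS[environment]
```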
Client library
The client library is a very thin layer that receives the parameters of the email to be sent and adds the corresponding task to the SQS queue. The task format must be generic and must not depend on how the task is processed; for example, it cannot contain SendGrid-specific fields such as a campaign ID or template name. Bear in mind that the provider, and even the protocol, may change, so these messages might become push messages in the future.
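A minimal sketch of such a client, loosely modelled on django.core.mail.EmailMessage (subject and body first, and a send() that returns the number of messages sent). The class, its user parameter (the Email Service username, taking the place of Django's from_email) and the task field names are hypothetical. The enqueue callable is injected so the sketch runs without AWS; in production it could wrap boto3's SQS client, e.g. lambda body: sqs.send_message(QueueUrl=queue_url, MessageBody=body).

```python
import json
import uuid
from typing import Callable, Optional, Sequence


class EmailMessage:
    """Hypothetical client-side message for the Email Service."""

    def __init__(
        self,
        subject: str = "",
        body: str = "",
        user: str = "",  # Email Service username, resolved to a sender by the service
        to: Optional[Sequence[str]] = None,
        cc: Optional[Sequence[str]] = None,
        bcc: Optional[Sequence[str]] = None,
        enqueue: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.subject = subject
        self.body = body
        self.user = user
        self.to = list(to or [])
        self.cc = list(cc or [])
        self.bcc = list(bcc or [])
        self._enqueue = enqueue

    def to_task(self) -> dict:
        # Provider-agnostic payload: no SendGrid-specific fields.
        return {
            "id": str(uuid.uuid4()),
            "user": self.user,
            "subject": self.subject,
            "body": self.body,
            "to": self.to,
            "cc": self.cc,
            "bcc": self.bcc,
        }

    def send(self) -> int:
        if self._enqueue is None:
            raise RuntimeError("no queue configured for this message")
        self._enqueue(json.dumps(self.to_task()))
        return 1  # like Django's send(): number of messages handed over
```

Keeping the task a plain JSON document means the consumer side can switch providers, or even transports, without any change to callers.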
Possible future developments
- SendGrid's API supports templates that only need to be "personalized" before sending. This way all templates can be managed within SendGrid, making it easier to ensure a consistent visual experience for our customers.
- Once the Prometheus-based monitoring RFC is implemented in production,
the Email Service shall follow the guidelines of that RFC and expose a
/metrics endpoint for Prometheus to scrape.
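For illustration, a request body for SendGrid's v3 mail send endpoint using a stored dynamic template might be built as below. Because the client task format must stay provider-agnostic, code like this would live inside the Email Service, which would map its own generic fields to a SendGrid template ID. The function name and the template ID in the test values are hypothetical.

```python
from typing import Dict, List


def build_template_payload(
    sender: str,
    recipients: List[str],
    template_id: str,
    data: Dict[str, object],
) -> dict:
    """Build a JSON body for POST https://api.sendgrid.com/v3/mail/send.

    The stored template is rendered by SendGrid; only the per-message
    "personalization" values are sent here.
    """
    return {
        "from": {"email": sender},
        "template_id": template_id,
        "personalizations": [
            {
                "to": [{"email": address} for address in recipients],
                "dynamic_template_data": data,
            }
        ],
    }
```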
Alternatives
Alternatively, we could build an in-house SMTP server that acts as a "man-in-the-middle" relay between the applications and the email provider. The advantage is that existing applications would need no code changes; only their SMTP server parameters would need to be updated. The new SMTP server would then call SendGrid's API with the received email, or forward it to an alternative provider via SMTP.
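To make the "no code change" point concrete: with the relay alternative, a Django application would only need different SMTP settings. The setting names below are Django's standard email settings; the host name, port, user and environment variable are placeholders.

```python
import os

# Django settings for sending through a hypothetical in-house SMTP relay.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"  # Django's SMTP backend
EMAIL_HOST = "smtp-relay.internal.example"  # placeholder host of the relay
EMAIL_PORT = 587                            # placeholder submission port
EMAIL_USE_TLS = True
EMAIL_HOST_USER = "bos"                     # placeholder per-application user
EMAIL_HOST_PASSWORD = os.environ.get("EMAIL_RELAY_PASSWORD", "")
```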
Caveats
- Using a non-standard API (i.e. SendGrid's) locks us in to that specific vendor to some extent. The alternative solution (an SMTP relay) retains the standard interface between the applications and the email sending service. A delicate balance must be struck between using SendGrid's features and keeping the Email Service provider-agnostic.
- The Email Service must be highly reliable: if it stops, important alerts can be lost, which affects the service we provide to our customers.
Operation
SRE operates the platform, and the development team operates the application. Since the monitoring RFC is not fully implemented yet, the first release will continue to use Nagios for alerting. The monitored data consists of the SQS metrics and the SendGrid (or fallback provider) usage statistics.
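As an illustration of the Nagios side, a check on the SQS backlog could map the number of waiting messages to the standard Nagios plugin exit codes, with performance data after the "|". The thresholds and the function name are placeholders; the value could come from the queue's ApproximateNumberOfMessages attribute (e.g. via boto3's get_queue_attributes), and a small wrapper script would print the text and exit with the returned status.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_queue_depth(
    visible_messages: Optional[int], warn: int = 100, crit: int = 1000
) -> Tuple[int, str]:
    """Return (exit status, status line with perfdata) for the SQS backlog."""
    if visible_messages is None:
        return UNKNOWN, "UNKNOWN - queue depth unavailable"
    perfdata = f"depth={visible_messages};{warn};{crit}"
    if visible_messages >= crit:
        return CRITICAL, f"CRITICAL - {visible_messages} messages waiting | {perfdata}"
    if visible_messages >= warn:
        return WARNING, f"WARNING - {visible_messages} messages waiting | {perfdata}"
    return OK, f"OK - {visible_messages} messages waiting | {perfdata}"
```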
Security Impact
This service sends emails in Ebury's name, so security is of the utmost importance. To preserve flexibility, applications (as users of this service) are mapped to the users/subusers registered with the email provider. The user specified in the SQS task is a username (e.g. "techops"), which the Email Service resolves to a sender email address (e.g. "techops.team@ebury.com"). This mapping is part of the source code, so it can be updated by releasing a new minor version of the product. Authentication and authorization are both handled by SQS: only clients permitted to write to the queue can submit emails.
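A minimal sketch of the username-to-sender resolution, assuming the mapping is a plain dictionary in the service's source code. The two entries reuse the examples from this RFC; the names SENDER_ADDRESSES, resolve_sender and UnknownSenderError are hypothetical. Rejecting unknown usernames, rather than falling back to a default address, keeps an unregistered application from sending in Ebury's name.

```python
from typing import Dict

# Hypothetical mapping of Email Service usernames to sender addresses.
SENDER_ADDRESSES: Dict[str, str] = {
    "techops": "techops.team@ebury.com",
    "bos": "bos@ebury.com",
}


class UnknownSenderError(LookupError):
    """Raised when a task names a user with no registered sender address."""


def resolve_sender(username: str) -> str:
    """Map the username from an SQS task to the From address to use."""
    try:
        return SENDER_ADDRESSES[username]
    except KeyError:
        raise UnknownSenderError(username) from None
```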
Since the Email Service is not reachable from outside the intranet, it is not exposed to DoS attacks or other external attack vectors.
Performance Impact
During normal operation, the service is not expected to have a significant performance impact. It does add the cost of HTTPS communication, but email sending is not expected to be real-time with the SMTP client either, so this makes no practical difference. The only critical period is when large numbers of emails must be sent, each with large attachments, e.g. monthly statements to customers. Such statements are currently not sent by email, so the service should be safe from a performance standpoint.
Developer Impact
This service has a considerable developer impact. Even if Django's email class can be replaced, other services use different approaches, and the exact number of these is unknown. There is no need to switch all existing services to the new service immediately, but whenever that happens, it will affect every service that sends email. That said, the migration itself should not be particularly difficult, since most (if not all) services already use some existing mail-sending API, so the level of abstraction is hopefully already sufficient.
Deployment
Before the Email Service is started, the mapping between applications and sender email addresses must be created. For example, if BOS needs to send emails, it must have a user called "bos" in the Email Service, and when an email from BOS is consumed from the queue, it could be sent with the sender address "bos@ebury.com". When an application wants to start using the Email Service, it has to request two new users: one for testing and one for production.
Dependencies
This RFC has no dependencies.
Notes
The project has been put on hold, so the Kafka implementation mentioned in the previous version of this RFC has not been implemented.
For more information, see the PR.
References
- SendGrid API description: https://sendgrid.com/docs/api-reference/