Identity and Access Management (IAM)
This is a Request For Comments on a proposal to handle Identity and Access Management for Ebury services.
Scope
The scope of this document covers authentication and authorization of Ebury services.
Background
- Services (old and new) each have to devise their own way to authenticate and authorize users, often neglecting security best practices and re-solving what is a hard problem.
- There's no single view of the authentication and authorization policies enforced by the services. Each service needs a thorough independent review, which is time-consuming and error-prone.
- Security-wise, the current global state is not ideal, as we are missing simple features like limiting the number of failed login attempts: https://docs.google.com/document/d/1kJs3LTMMOL-ikvUEWodWjSYraNzcaC6XNDX4X5rwfKE/edit
- There's no cost-effective way to enforce security best practices globally (e.g. password strength, MFA, login limits, passwordless authentication).
- We have rolled our own custom OpenID provider, API Auth (originally written 7 years ago), which is sub-optimally maintained, instead of relying on a well-maintained, proven platform.
- For service-to-service authentication, teams often store plain tokens with no expiration in services like Vault.
- Services implement MFA flows through Verify, but that's an optional feature left up to the service owners instead of being globally enforced by security policies.
- Operations Dashboard/Bosporus has taken a step in the right direction by using an IAM solution (Keycloak) for its needs.
- Bosporus has SAML integration with Ebury's Google Workspace and applies role-based permissions based on Google Workspace roles (originally from hibob), showcasing some of the capabilities of Keycloak.
- There's a fair amount of know-how accumulated around Keycloak (configuration, infrastructure, features) from the work done in Operations Dashboard and API.
Problem Description
We want a single source of truth when it comes to authentication and authorization.
We want to use the best industry standards available for all our services and avoid re-implementing authentication and authorization across Ebury.
Solution
Leverage Keycloak as the central IAM solution for Ebury.
Keycloak is a mature, widely used authentication and access management solution from Red Hat, an established Open Source software provider. Keycloak is also comparatively easy to configure and develop against, and we have built a lot on top of it during the Operations Dashboard development.
Keycloak is FAPI compliant and is actively following the latest developments of the specification.
Notably it supports:
- WebAuthn/Passkeys for passwordless login.
- Brute force attack prevention on login.
- Client Initiated Backchannel Authentication: useful, for example, for payment authorization flows (PSD2/SCA) and other offline authentication uses (e.g. client servicing requests).
- MFA: TOTP/HOTP support by default, easy to extend to support other methods like SMS 2FA and push notifications.
- mTLS support: this would take over the uses we have today in Ebury API Auth, Ebury API Webapp and EBO.
- Full SAML 2.0 and OpenID Connect support, which includes "social login" through Google (as used by the Operations Dashboard) and others (like Facebook, Twitter, GitHub, LinkedIn, Microsoft and more). This would render Ebury API Auth obsolete.
- A RESTful admin API that allows for automation and other use cases.
- A mature Terraform module for configuration as code; there's also a Kubernetes operator available.
- Service Provider Interfaces (SPI) that allow us to extend Keycloak with features like other MFA devices, login flows, user providers and even custom REST endpoints.
- [Authorization Services](https://www.keycloak.org/docs/21.1.1/authorization_services/): supports attribute/role/user/context/rule/time-based access control. Can also support custom access control mechanisms via SPI.
- Realms: realms allow for complete isolation of data. Each realm is in essence a separate identity provider. This allows a single Keycloak to manage multiple uses safely.
- Migration: there are a handful of useful plugins to migrate things into Keycloak. For example, there's a Keycloak User Migration plugin that makes migrating existing service users into Keycloak's authentication quick and gradual.
- Themes: Keycloak allows customizing the look and feel of end-user facing pages so they can integrate seamlessly with our applications. JAM has work already scheduled on this.
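To make the service-to-service case concrete, the following is a minimal sketch of how a service might request a short-lived access token through the OpenID Connect support, using the standard OAuth 2.0 client-credentials grant. The host name, realm, client ID and secret are hypothetical placeholders, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, realm: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # Keycloak's per-realm token endpoint (older versions prefix the path with /auth).
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://iam.example.internal", "example-realm", "example-service", "example-secret"
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON document containing an `access_token` and its `expires_in` lifetime, in contrast with the non-expiring tokens described in the Background section.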
Service Ownership
Keycloak is currently deployed into the risk-admin namespace in our Kubernetes clusters and used only by the Operations Dashboard. We'll move the Keycloak deployment into publicfrontal in its own namespace (iam) so it's clear that it is available for any service. Operations Dashboard will then become just one of the many applications that use Keycloak as the IAM system.
| New Service | Service Name | Service Owner |
|---|---|---|
| No | Keycloak | JAM Team |
Initially JAM will manage this service but the focus will be to provide the tools and processes such that other teams can use and configure Keycloak effectively for their own needs on the realms they'll own.
Alternatives
Alternatives include, among others:
- Auth0
- Authlete
- Okta
- OneLogin
- Ory
- Raidiam
The alternatives listed all provide similar functionality. Some of them include explicit FAPI support.
The key reasons to continue with Keycloak are:
- We have infrastructure in place.
- We have configuration as code ready.
- We have a lot of knowledge around the tool, as it's already serving well in production.
- Red Hat paid support is an option if needed at any point.
- There are cloud offerings available if we decide to move it out of our infrastructure, e.g. Cloud IAM or Phase Two.
Caveats
Setting up the IAM solution is just the first step; each integration and use case will require careful planning and coordination.
We expect each project to provide its own RFC under the IAM umbrella.
Operation
The JAM team will set up the following to ensure proper operation of the service:
- Infrastructure code: Charts, images and other infrastructure automation to deploy this service in our development, staging and production Kubernetes clusters.
- Configuration Management: The ebury-keycloak repository will manage all the configuration applied to development, staging and production environments.
- Monitoring and Auditing: Will instrument complete metrics and logging to ensure auditing and monitoring can be successfully done on authentication and authorization actions.
- Incident Response: Develop and implement any special incident response procedures (over the currently established ones) to address any security incidents or breaches related to IAM.
Provisioning and setting up each application will be the responsibility of the team that owns it, with review and oversight from JAM and the Security Engineering Team.
Security Impact
The impact will be gradual as more and more services migrate to the platform:
- Increased global security: As more projects implement Keycloak as their IAM for authentication and authorization, they'll get global security best practices enforced (MFA requirement, passwordless authentication, login limits, Clickjacking protection, etc).
- Centralized Auditability and Compliance: Improved auditability of authentication and authorization requests and policies. Keycloak is also already instrumented with metrics and log shipping.
- Key rotation capabilities: As Keycloak will manage keys, it will be possible to rotate such keys (of course with client services coordination). No more (leaked) credentials hardcoded forever.
- Speedy handling of CVEs: Keycloak has historically handled CVEs in a timely manner: https://www.cvedetails.com/product/46161/Redhat-Keycloak.html?vendor_id=25
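Key rotation only works smoothly if client services look up the signing key by its identifier instead of pinning a single key. The sketch below illustrates that lookup, assuming the service has already fetched the realm's JWKS document (Keycloak publishes it under `/realms/<realm>/protocol/openid-connect/certs`). The token and key set are fabricated for illustration, and the code does not verify the signature; a real service would hand the selected key to a JWT library for that.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def select_signing_key(token: str, jwks: dict) -> dict:
    """Return the JWKS entry whose `kid` matches the token header.

    This only selects the key; it does NOT verify the token signature.
    """
    header = json.loads(b64url_decode(token.split(".")[0]))
    kid = header.get("kid")
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    # Unknown kid: the keys may have rotated, so the caller should refresh the JWKS.
    raise LookupError(f"no key with kid={kid!r}")


# Fabricated token header and key set, for illustration only.
header_segment = b64url_encode(json.dumps({"alg": "RS256", "kid": "key-2"}).encode("utf-8"))
token = f"{header_segment}.e30.not-a-real-signature"
jwks = {"keys": [{"kid": "key-1", "kty": "RSA"}, {"kid": "key-2", "kty": "RSA"}]}

selected = select_signing_key(token, jwks)
print(selected["kid"])
```

With this pattern, a rotated key shows up as a new `kid` in the JWKS, and services that refresh the key set on an unknown `kid` pick it up without a redeploy.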
Performance Impact
N/A
Developer Impact
- Faster development, as new services and projects don't need to re-implement authentication and authorization or work through the security implications themselves.
- Globally enforced security best practices will be available (e.g. limits on failed logins, clickjacking protection, recovery flows, password policies).
- Increased auditing and logging out of the box for anything related to authentication and authorization.
- A policy framework will allow developers to express their resource authorization rules in a uniform way.
- A user account management dashboard provided by the IAM. Here users can update their information, passwords, MFA settings and applications, and review their device activity. This potentially removes the burden of building such functionality into each service. Account management APIs are still available if there's a need for custom management pages.
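As a small illustration of uniform authorization checks, the sketch below tests for a role in the claims of an already validated access token. It relies on the `realm_access` and `resource_access` claims that Keycloak's default role mappers add to access tokens; the role and client names are hypothetical, and this simple check is not a substitute for the Authorization Services policy framework.

```python
from typing import Optional


def has_role(claims: dict, role: str, client_id: Optional[str] = None) -> bool:
    """Check a realm role, or a client role when `client_id` is given.

    `claims` must come from a token whose signature and expiry were
    already verified; this function performs no validation itself.
    """
    if client_id is None:
        roles = claims.get("realm_access", {}).get("roles", [])
    else:
        roles = claims.get("resource_access", {}).get(client_id, {}).get("roles", [])
    return role in roles


# Hypothetical decoded claims, for illustration only.
claims = {
    "realm_access": {"roles": ["employee"]},
    "resource_access": {"example-service": {"roles": ["payments:approve"]}},
}

print(has_role(claims, "employee"))
print(has_role(claims, "payments:approve", client_id="example-service"))
print(has_role(claims, "payments:approve"))
```

Keeping the check in one shared helper means every service reads roles from the same place in the token, so a change in role mapping only has to be made once.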
Deployment
The risk-admin Keycloak will move into a general namespace (proposed iam) and applications will use it as needed, configuring capabilities and needs as code in the ebury-keycloak repository.
The chart and images for deploying Keycloak are already available as it's serving the production Operations Dashboard.
We'll ensure the admin dashboard and admin APIs are reachable only from internal networks.
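To give a feel for what configuring capabilities as code could look like, here is a rough sketch of realm settings expressed as data. The field names follow Keycloak's realm representation as exposed by the admin API; the realm name and the values are illustrative assumptions, and the actual format used in the ebury-keycloak repository may differ.

```python
import json

# Illustrative realm settings; the name and values are assumptions, not agreed policy.
realm_settings = {
    "realm": "example-service",     # hypothetical realm name
    "enabled": True,
    "bruteForceProtected": True,    # enable brute force detection on login
    "failureFactor": 5,             # failed logins tolerated before lockout (example value)
    "passwordPolicy": "length(12) and digits(1) and upperCase(1)",  # example policy string
}

# Serialized form, e.g. for review in a pull request.
print(json.dumps(realm_settings, indent=2, sort_keys=True))
```

Keeping settings like these under version control gives the single reviewable view of authentication policy that the Background section identifies as missing.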
Dependencies
N/A