User Authentication & Authorisation in BOS Channels

This is a Request For Comments on a proposal to handle user identity and access management for BOS Channels with Keycloak (https://www.keycloak.org/).

This RFC relates to the "Bosporus" GraphQL gateway and Operations Dashboard.

Problem Description

We want to identify and authenticate Operators (users) via Google SSO for a simpler user provisioning story, and an overall better user experience.

We want to authorise Operator actions by user role and by other, more fine-grained authorisation policies.

We cannot depend on BOS Django as it is a legacy project.

Background

Bosporus is a Backend For Frontend (BFF) to enable a new BOS Channel: The Operations Dashboard.

Frontend: A web server that presents the Operator Interface (OI) and requires authentication of Operators (users) in order to perform sensitive tasks.

Backend: An API gateway implementing GraphQL that requires fine-grained authorisation rules in order to manage access, and to report on that access.

Operations Dashboard currently authenticates and authorises users via a BOS Django session; Bosporus services exist on the same URI as BOS via a Reverse Proxy on the BOS side. (This was a decision taken to speed up development of the project.)

Solution

The solution must meet the following constraints to ensure it is as much of an implementation detail as possible (making change easier):

  1. Identity management is out of scope (handled in Google Workspace)
  2. Authentication via well established standards
  3. Authorisation code not tightly bound to application code

Keycloak is a free, open source identity and access management solution from Red Hat. In relation to the above constraints, Keycloak provides an off-the-shelf solution with:

  1. Identity Brokering (out of the box support for Google, Bitbucket, Github, LinkedIn...)
  2. OAuth 2.0/OpenID Connect support
  3. JavaScript and Node.js adapters, along with community software like keycloak-connect-graphql

Keycloak is a mature, widely used authentication and access management solution from Red Hat, an established Open Source software provider. Keycloak is also comparatively easy to configure and develop against, with a Docker image, comprehensive documentation, and JavaScript and Node.js adapters, which helps reduce lead time for the overall project.

Authentication

The Operations Dashboard frontend will use the OpenID Connect identity verification flow, with Keycloak acting as an identity broker for Google SSO (the IdP).

Google will be the default and only “social login” provider configured in this Keycloak, with users being redirected to Google directly without an interstitial login on Keycloak.
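One way to skip the Keycloak login page is sketched below: the frontend builds the OIDC authorisation request itself and adds Keycloak's kc_idp_hint parameter, which names the brokered provider to redirect to. This is a minimal sketch, not the agreed implementation. The host names, the realm name "bosporus", the client ID and the provider alias "google" are all hypothetical, and the /realms/... path assumes a recent Keycloak distribution (older versions prefix it with /auth). Configuring Google as the default identity provider in the realm's browser flow would be an alternative that needs no client-side change.

```javascript
// Sketch: build an OIDC authorisation URL that asks Keycloak to hand off
// straight to a brokered identity provider. Every value passed in below is
// a hypothetical placeholder, not a real deployment setting.
function buildAuthUrl({ baseUrl, realm, clientId, redirectUri, state, idpAlias }) {
  const url = new URL(
    `${baseUrl}/realms/${encodeURIComponent(realm)}/protocol/openid-connect/auth`
  );
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('response_type', 'code'); // Authorization Code flow
  url.searchParams.set('scope', 'openid'); // required for OpenID Connect
  url.searchParams.set('state', state); // opaque value checked on callback
  url.searchParams.set('kc_idp_hint', idpAlias); // Keycloak-specific hint
  return url.toString();
}

const authUrl = buildAuthUrl({
  baseUrl: 'https://keycloak.example.com',
  realm: 'bosporus',
  clientId: 'operations-dashboard',
  redirectUri: 'https://dashboard.example.com/callback',
  state: 'random-opaque-value',
  idpAlias: 'google',
});
```

In practice the state value would be generated per request and verified when Keycloak redirects back to the frontend.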

Operations Dashboard will use a session store for tokens (rather than storing the token in a cookie), so that a Keycloak admin can log out user sessions from Keycloak.

There is no user creation on Keycloak; the IdP side is responsible for all user provisioning.

Keycloak logout will not trigger IdP logout.

Operations Dashboard OIDC client will allow revocation of all session and access tokens in case of systems compromise.
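The session-store and revocation points above can be illustrated with a small in-memory sketch. It shows why keeping tokens server-side matters: a session can be ended by deleting its entry, which is not possible when the token itself lives in the user's cookie. The class and method names are hypothetical, and a real deployment would more likely use a shared store behind the Keycloak Node.js adapter than a Map in process memory.

```javascript
// Sketch: server-side token store keyed by session ID. The browser cookie
// holds only the session ID, so deleting an entry ends that session.
class TokenSessionStore {
  constructor() {
    this.sessions = new Map(); // sessionId -> { userId, tokens }
  }

  save(sessionId, userId, tokens) {
    this.sessions.set(sessionId, { userId, tokens });
  }

  get(sessionId) {
    return this.sessions.get(sessionId) || null;
  }

  // Admin-directed logout: drop every session belonging to one user.
  revokeUser(userId) {
    let removed = 0;
    for (const [sessionId, session] of this.sessions) {
      if (session.userId === userId) {
        this.sessions.delete(sessionId);
        removed += 1;
      }
    }
    return removed;
  }

  // Systems compromise: drop every session for every user.
  revokeAll() {
    const removed = this.sessions.size;
    this.sessions.clear();
    return removed;
  }
}

const store = new TokenSessionStore();
store.save('sess-1', 'alice', { accessToken: 'token-a1' });
store.save('sess-2', 'alice', { accessToken: 'token-a2' });
store.save('sess-3', 'bob', { accessToken: 'token-b1' });

const removedForAlice = store.revokeUser('alice');
const bobStillLoggedIn = store.get('sess-3') !== null;
const removedOnCompromise = store.revokeAll();
```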

Each deployment of a frontend + backend represents a Keycloak Realm. A Realm secures and manages security metadata for a set of users, applications, and registered OAuth clients.

Authorisation

Authorisation patterns are described in https://www.keycloak.org/docs/latest/authorization_services/index.html. This section covers how an authorisation scheme is applied to Bosporus services.

For example, a policy can be configured in Keycloak that only allows write access to a resource when all of the following conditions are met:

  • A user has a mandatory role
  • A user has at least one role from a list of roles
  • The time of day is between 6am and 10pm
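The three conditions above combine as a logical AND. The sketch below spells that out in plain JavaScript purely to make the decision logic concrete. In Keycloak the same rule would be expressed as role and time policies in configuration rather than application code, and the role names and the policy object shape here are hypothetical.

```javascript
// Sketch: the decision logic of the example policy. Not Keycloak code;
// Keycloak would evaluate equivalent policies on the server side.
function canWrite(user, now, policy) {
  const hasMandatoryRole = user.roles.includes(policy.mandatoryRole);
  const hasAnyListedRole = policy.anyOfRoles.some((role) => user.roles.includes(role));
  const hour = now.getHours(); // local time, 0-23
  const inTimeWindow = hour >= policy.startHour && hour < policy.endHour;
  return hasMandatoryRole && hasAnyListedRole && inTimeWindow;
}

// Hypothetical policy: 6am (inclusive) to 10pm (exclusive).
const writePolicy = {
  mandatoryRole: 'operator',
  anyOfRoles: ['country-manager', 'risk-manager'],
  startHour: 6,
  endHour: 22,
};

const operator = { roles: ['operator', 'risk-manager'] };
const allowedInWindow = canWrite(operator, new Date(2024, 0, 15, 9, 0), writePolicy);
const allowedOutOfHours = canWrite(operator, new Date(2024, 0, 15, 23, 0), writePolicy);
const allowedWithoutMandatory = canWrite(
  { roles: ['risk-manager'] },
  new Date(2024, 0, 15, 9, 0),
  writePolicy
);
```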

Roles

User groups in Google Workspace should be added as custom claims on users (https://cloud.google.com/identity-platform/docs/how-to-configure-custom-claims). These claims can then be mapped to roles in Keycloak via Identity Provider Mappers (https://www.keycloak.org/docs/latest/server_admin/index.html#_mappers).
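The effect of such a mapping can be sketched as a lookup from claim values to role names. This is only an illustration of what the mappers achieve: in Keycloak the mapping is configured on the identity provider, not written as code. The claim name "groups", the group addresses and the role names below are all hypothetical.

```javascript
// Sketch: translate group claims from the IdP token into Keycloak-style
// role names. Unknown groups are ignored, so they grant no access.
const groupToRole = new Map([
  ['country-managers@example.com', 'country-manager'],
  ['risk-team@example.com', 'risk-manager'],
]);

function mapGroupsToRoles(claims, mapping) {
  const groups = Array.isArray(claims.groups) ? claims.groups : [];
  const roles = new Set(); // a Set removes duplicates
  for (const group of groups) {
    if (mapping.has(group)) {
      roles.add(mapping.get(group));
    }
  }
  return [...roles].sort();
}

const mappedRoles = mapGroupsToRoles(
  { email: 'alice@example.com', groups: ['risk-team@example.com', 'social-club@example.com'] },
  groupToRole
);
```

A user whose token carries no mapped group ends up with an empty role list, which matches the behaviour described under Operation: they can authenticate but cannot reach sensitive areas.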

Resource Servers

Frontend

Simple page/URL level authorisation control (authenticated user can or cannot see this page).

server.get('/basic-auth-required', keycloak.protect(), (req, res, next) => {
  ...
});

Resource-Based Authorisation allows us to define policies in Keycloak to protect resources, keeping auth and business logic apart:

server.get('/bos-sensitive', keycloak.enforcer('bos:read'), (req, res, next) => {
  ...
});

Backend

Field level authorisation control.

Auth directives, for example @auth, @hasRole and @hasPermission, can be used in the GraphQL schema. This declarative approach means auth logic is never mixed with business logic:

Simple check for identity based authorisation (if authenticated, then authorised):
type Query {
  listCurrencies: [Currency!]! @auth
}
Role based authorisation:
type Mutation {
  updateCurrency(currency: Currency!): Currency! @hasRole(role: "country-manager")
}
Permission based authorisation:
type Mutation {
  updateCurrencyRisk(currencyCode: String!, risk: Risk!): CurrencyRisk! @hasPermission(resources: ["Risk:update", "Currency:update"])
}
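Conceptually, a directive such as @hasRole wraps the field's resolver in a check that runs before any business logic. The sketch below shows that idea with a plain higher-order function. It is not how keycloak-connect-graphql is implemented: requireRole and the context.user.roles shape are hypothetical names chosen for illustration.

```javascript
// Sketch: wrap a resolver so the role check runs before the resolver body.
// The resolver itself stays free of auth logic.
function requireRole(role, resolver) {
  return (parent, args, context, info) => {
    const roles = (context.user && context.user.roles) || [];
    if (!roles.includes(role)) {
      throw new Error(`Forbidden: missing role "${role}"`);
    }
    return resolver(parent, args, context, info);
  };
}

// Business logic only: echoes the updated currency back.
const updateCurrencyResolver = requireRole('country-manager', (_parent, args) => args.currency);

const managerContext = { user: { roles: ['country-manager'] } };
const updatedCurrency = updateCurrencyResolver(
  null,
  { currency: { code: 'EUR' } },
  managerContext,
  null
);
```

Calling the same resolver with a context that lacks the role throws before the wrapped function is reached, which is the property the declarative directives provide.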

Two Person Principle Auth

This is an authorisation pattern where a task requires two different authorisations. The authorisation for each part is handled as per above. Each authorisation can have different levels of permissions.

The application under authentication is responsible for knowing when both authorisations have been given. As such this is not detailed in this RFC.
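Although the bookkeeping is out of scope for this RFC, a minimal sketch may help show what "knowing when both authorisations have been given" means on the application side. The class name and task ID are hypothetical, and the sketch assumes each approval has already passed its own Keycloak authorisation check before being recorded.

```javascript
// Sketch: a task is approved only once two *different* users have
// authorised it. A Set means a repeat approval by one user counts once.
class TwoPersonApproval {
  constructor(taskId) {
    this.taskId = taskId;
    this.approvers = new Set();
  }

  // Record an approval and report whether the task is now fully approved.
  approve(userId) {
    this.approvers.add(userId);
    return this.isApproved();
  }

  isApproved() {
    return this.approvers.size >= 2;
  }
}

const approval = new TwoPersonApproval('update-currency-risk-42');
const afterFirst = approval.approve('alice');
const afterRepeat = approval.approve('alice'); // same person again
const afterSecond = approval.approve('bob');
```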

Auditing and Events

Keycloak provides a rich set of auditing capabilities. Every single login action can be recorded and stored in the database and reviewed in the Admin Console. All admin actions can also be recorded and reviewed.

Alternatives

Auth0/Okta https://auth0.com/ - same idea but with a different provider. This would come with a service charge, whereas Keycloak does not.

Google SSO with simple RBAC, using something like maticzav/graphql-shield and custom claims in the Google auth token. We’d have to provide the auditing ourselves.

Ory Hydra / Ory Oathkeeper / OPA - Hydra for OAuth/OIDC, Oathkeeper for authorisation, OPA for policy decisions (we aren't yet talking about policy decisions but Keycloak supports them). This is potentially a future direction we want to take and an RFC along these lines may supersede this one in the long term, but the ecosystem and ease of setup of Keycloak are preferred at this stage.

Caveats

n/a

Operation

Client Services will add new users to Google Workspace and assign roles to each user. Users without the required roles will be able to authenticate, but will have no access to sensitive areas of applications. The roles will be preconfigured within Google Workspace and Keycloak (see Role / Configuration Management).

Hosting

Day-to-day running of a Keycloak instance would be the responsibility of the team whose domain the service is authenticating (in the case of Risk Admin, the JAM team), with review oversight from the Security Team.

Role / Configuration Management

A single script can configure roles in both Keycloak and in Google Workspace and custom claims associated with these roles.

https://developers.google.com/admin-sdk/directory/reference/rest/v1/roles https://www.keycloak.org/docs-api/13.0/rest-api/index.html https://cloud.google.com/identity-platform/docs/how-to-configure-custom-claims

GitOps management of roles via the config script would be the responsibility of the Security and Support Teams (a similar process to the current Django roles management https://github.com/Ebury/ebury_support/blob/master/utils/role_positions/positions.json).

Security Impact

Pros:

  • Industry standard authentication.
  • Off-the-shelf, and well supported implementation.
  • Single system to maintain, e.g. upgrade, for all services.
  • Single place to manage user BOS access.

Cons:

  • Potential single point of attack (We may split BOS user interface functionality into multiple applications and "firewall" access within each either by using Keycloak Realms or by having multiple Keycloak instances)

See also Threat Model Mitigation (https://www.keycloak.org/docs/latest/server_admin/#threat-model-mitigation).

Security Updates

https://www.cvedetails.com/product/46161/Redhat-Keycloak.html?vendor_id=25

CVEs for Keycloak appear to be handled in a timely manner.

Performance Impact

n/a

Developer Impact

n/a

Data Consumer Impact

n/a

Deployment

Kubernetes

https://www.keycloak.org/getting-started/getting-started-kube https://www.keycloak.org/getting-started/getting-started-operator-kubernetes

Upgrades

https://www.keycloak.org/docs/latest/upgrading/

Dependencies

  • PostgreSQL

References

  1. Keycloak Node.js OIDC Adapter https://www.keycloak.org/docs/latest/securing_apps/index.html#_nodejs_adapter
  2. GraphQL Keycloak auth https://github.com/aerogear/keycloak-connect-graphql
  3. Keycloak REST API https://www.keycloak.org/docs-api/13.0/rest-api/index.html
  4. Backends For Frontends https://samnewman.io/patterns/architectural/bff/

Other

We assume a running Service Mesh in k8s to handle machine-to-machine auth. If there is none, Keycloak can provide the Client Credentials Grant flow.
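For reference, a Client Credentials Grant is a single form-encoded POST to the realm's token endpoint, as defined by OAuth 2.0. The sketch below only builds that request so the shape is visible. The host, the realm name "bosporus", the client ID and the secret are hypothetical placeholders, and the /realms/... path assumes a recent Keycloak distribution (older versions prefix it with /auth).

```javascript
// Sketch: build (but do not send) an OAuth 2.0 Client Credentials request
// for a service account. All values passed in are hypothetical.
function buildClientCredentialsRequest({ baseUrl, realm, clientId, clientSecret }) {
  const body = new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: `${baseUrl}/realms/${encodeURIComponent(realm)}/protocol/openid-connect/token`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: body.toString(),
    },
  };
}

const tokenRequest = buildClientCredentialsRequest({
  baseUrl: 'https://keycloak.example.com',
  realm: 'bosporus',
  clientId: 'reporting-service',
  clientSecret: 'not-a-real-secret',
});
```

A service would pass tokenRequest.url and tokenRequest.options to an HTTP client such as fetch and read access_token from the JSON response. The secret would come from a secrets manager rather than source code.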