Network segregation in Kubernetes
Prerequisites
- [x] Support for Kubernetes network policies is available in our Kubernetes clusters.
- [x] VPC peering between the Kubernetes cluster in the backoffice VPC and the publicfrontal and natonly VPCs is configured.
Reference Documents
| Reference | Document Location |
|---|---|
| NETWORKING01 | ECS clusters networking |
| NETWORKING02 | Kubernetes network policies |
| NETWORKING03 | Security groups for pods |
| SECURITY01 | Ebury Network and Cloud Security Policy |
| ARCHITECHTURE01 | Ebury 2.0 architecture |
Problem Description
At Ebury, we follow "defense in depth" and "principle of least privilege" when it comes to access to resources (see Ebury Network and Cloud Security Policy). This implies that all traffic between workloads in Kubernetes MUST be denied by default and allowed only when needed.
We need to provide guidelines and an easy-to-use interface that allows development teams to define which connections must be allowed for their workloads to fulfill their tasks.
Background
ECS migration
A migration of workloads from Amazon ECS to Kubernetes is being planned.
The current setup in ECS provides network segmentation following the guidelines specified in the NETWORKING01 document.
The current segmentation defines three VPCs (publicfrontal, backoffice and natonly), each with one
or more ECS clusters. Workloads are placed in a VPC according to the access they must accept
(respectively: available from the internet, available to Ebury operations offices and
VPN, and available internally to other services).
In addition, the workloads in ECS access a number of AWS resources (Databases, Cache, etc.) that are currently reachable only in private networks within those VPCs.
Furthermore, the legacy BOS infrastructure has its own VPC.
In order to add workloads from ECS into Kubernetes, the new infrastructure MUST support the same level of segmentation as we have in ECS.
New services
In addition to workloads migrated from ECS, new services belonging to the Ebury 2.0 architecture are also being deployed in Kubernetes (mainly Query Services so far). Those workloads usually need access to the Kafka cluster and their own databases, and are accessed via API from the different channels.
Tools
Several tools available in the cluster either need access to all workloads (like Prometheus for scraping metrics) or need to be accessed from all workloads (like Vault for serving secrets). In addition, services exposed outside the cluster will need to be reachable by the different ingress controllers.
Network policies
A NetworkPolicy object in Kubernetes can be applied to one or many pods in a namespace, and can allow ingress from selected pods in the same or other namespaces. Namespaces and pods are selected through their labels. Allowing ingress from specific IP addresses is also possible, but not very useful for our use case. One capability of interest is defining egress rules to specific CIDR blocks, which should be useful for restricting traffic to external resources.
Network policies in Kubernetes work at OSI Layers 3 and 4.
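As an illustration of the selectors described above, the following sketch builds a NetworkPolicy manifest as a plain Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The namespace, labels, ports and CIDR used here are hypothetical placeholders, not values taken from our clusters.

```python
import json


def network_policy(name, namespace, pod_labels, ingress=None, egress=None):
    """Build a networking.k8s.io/v1 NetworkPolicy manifest as a plain dict."""
    spec = {"podSelector": {"matchLabels": pod_labels}, "policyTypes": []}
    if ingress is not None:
        spec["policyTypes"].append("Ingress")
        spec["ingress"] = ingress
    if egress is not None:
        spec["policyTypes"].append("Egress")
        spec["egress"] = egress
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Hypothetical workload: every name, label, port and CIDR below is a placeholder.
EXAMPLE = network_policy(
    name="example-api",
    namespace="example",
    pod_labels={"app": "example-api"},
    ingress=[{
        # Both selectors in one peer: the namespace AND the pod must match.
        "from": [{
            "namespaceSelector": {"matchLabels": {"team": "channels"}},
            "podSelector": {"matchLabels": {"app": "gateway"}},
        }],
        "ports": [{"protocol": "TCP", "port": 8080}],
    }],
    egress=[{
        # Egress restricted to a CIDR block (Layer 3) and a TCP port (Layer 4).
        "to": [{"ipBlock": {"cidr": "10.50.0.0/16"}}],
        "ports": [{"protocol": "TCP", "port": 5432}],
    }],
)

if __name__ == "__main__":
    print(json.dumps(EXAMPLE, indent=2))
```

Because the rules only speak of labels, IP blocks, protocols and ports, anything above Layer 4 (HTTP paths, identities) is outside what such a policy can express.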
Egress traffic
Since we have a single K8s cluster in production, but three VPCs with different AWS resources,
we will need to control which pods and namespaces in the cluster can reach resources in those VPCs.
As we cannot include those ACLs as ingress rules in the VPCs, we will need some sort of egress ACL
in the K8s cluster. This requirement can be addressed using a service mesh, but until we have
this capability we can implement it using a set of egress network policy objects.
Accessing AWS resources
Services make use of AWS services and resources. Some of these resources are accessed through an API (SQS, SNS, S3, etc.), with authorization granted through IAM roles. Those are out of scope for this document.
Other resources, like RDS, Elasticache, Elasticsearch, MSK clusters or DocumentDB, are accessed over the network.
| Service | RDS | Elasticache | Elasticsearch | MSK | DocumentDB |
|---|---|---|---|---|---|
| Autoclient Balancer | [x] | ||||
| Account Details Service | [x] | [x] | |||
| API | [x] | ||||
| Beneficiary Services | [x] | [x] | |||
| BOS | [x] | [x] | [x] | [x] | |
| BOS BrokerDeal Gateway | [x] | ||||
| Bosporus | [x] | ||||
| Cornado | |||||
| Documents Service | [x] | ||||
| EBO | [x] | [x] | |||
| EF3 (Form 3) | |||||
| Email Service | [x] | ||||
| FeeTier Service | [x] | ||||
| FileAct Sentenial Proxy | [x] | ||||
| FXSuite | [x] | [x] | [x] | ||
| FXS BrokerDeal Gateway | [x] | ||||
| Http Kafka Connector | [x] | ||||
| Payments Query Service | [x] | [x] | |||
| PSSDS | [x] | ||||
| QFC | [x] | ||||
| QFS | [x] | ||||
| SEPA Core | [x] | ||||
| SEPA Service | [x] | [x] | |||
| Sherlock | [x] | [x] | |||
| Smart Date | |||||
| SmartTrade FIX Gateway | [x] | [x] | |||
| SWIFT Funds Categorizer | [x] | ||||
| SWIFT Service | [x] | ||||
| Token.io | |||||
| Transformer INTERNAL | |||||
| Transformer SEPA | |||||
| Transformer SWIFT | |||||
| Verify | [x] | ||||
| Webhooks | [x] | [x] | |||
Solution
While migrating from ECS
Different workloads in the Ebury transactional platform have access to different data, with different levels of clearance required. Here we propose a model where, in a first phase, all data in AWS resources is reachable from any workload in the cluster (though protected by password in the case of databases). Follow-up phases will allow us to restrict access from workloads to only the required resources.
Taking this into account, we will elaborate a migration plan where workloads with the least sensitive data are migrated first.
Access to workloads in Kubernetes cluster
Namespaces are created as part of infrastructure. All namespaces will include a label
(networkArea), mapping to each of the zones specified in
ECS clusters networking:
- networkArea: publicfrontal
- networkArea: backoffice
- networkArea: internalservices (formerly natonly in the legacy infrastructure)
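A minimal sketch of how a namespace carrying the networkArea label could be generated and validated. The function name and the example namespace are illustrative; only the label key and its three values come from this document.

```python
import json

# The three network areas defined in this document.
NETWORK_AREAS = ("publicfrontal", "backoffice", "internalservices")


def namespace_manifest(name, network_area):
    """Return a Namespace manifest labelled with its network area."""
    if network_area not in NETWORK_AREAS:
        raise ValueError(f"unknown network area: {network_area!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"networkArea": network_area}},
    }


if __name__ == "__main__":
    # "example-domain" is a placeholder namespace name.
    print(json.dumps(namespace_manifest("example-domain", "internalservices"), indent=2))
```

Rejecting unknown values early (for instance the legacy name natonly) keeps the label usable as a selector in the default policies.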
Each network area will have a different ingress controller, with different external traffic allowed:
- publicfrontal: Accessible from the internet (protected by WAF).
- backoffice: Accessible from Ebury offices, VPN_GENERIC, VPN_IT, VPN_IT2, and the NAT Gateways for the legacy publicfrontal, backoffice and natonly VPCs.
- internalservices: Accessible from VPN_IT2 and the NAT Gateways for the legacy publicfrontal, backoffice and natonly VPCs.
Each namespace in the cluster will have some ingress network policies defined by default:
- Ingress traffic is denied by default.
- Ingress traffic within the same namespace is allowed.
- Ingress traffic from namespaces in the same network area is allowed.
- Ingress traffic from ingress controller for the specific network area is allowed.
- Ingress traffic from publicfrontal area will be allowed in backoffice and internalservices areas.
- Ingress traffic from backoffice area will be allowed in internalservices area.
- Ingress traffic from monitoring namespace is allowed.
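The default ingress rules listed above could be generated per namespace along these lines. This is a sketch under assumptions: the label used to identify ingress controller namespaces (`ingressControllerFor`) and the monitoring namespace name (`monitoring`) are hypothetical, and the `kubernetes.io/metadata.name` label is only set automatically on recent Kubernetes versions.

```python
import json

# Source areas accepted by each destination area, as listed above.
ALLOWED_SOURCE_AREAS = {
    "publicfrontal": ["publicfrontal"],
    "backoffice": ["backoffice", "publicfrontal"],
    "internalservices": ["internalservices", "publicfrontal", "backoffice"],
}

# Assumptions, not taken from this document.
INGRESS_CONTROLLER_LABEL = "ingressControllerFor"
MONITORING_NAMESPACE = "monitoring"


def _policy(name, namespace, spec):
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def default_ingress_policies(namespace, area):
    """Return the default-deny policy plus the default allow rules for a namespace."""
    # Selecting all pods with no ingress rules denies all ingress by default.
    deny_all = _policy("default-deny-ingress", namespace,
                       {"podSelector": {}, "policyTypes": ["Ingress"]})

    peers = [{"podSelector": {}}]  # any pod in the same namespace
    peers += [{"namespaceSelector": {"matchLabels": {"networkArea": src}}}
              for src in ALLOWED_SOURCE_AREAS[area]]
    peers.append({"namespaceSelector": {"matchLabels": {INGRESS_CONTROLLER_LABEL: area}}})
    peers.append({"namespaceSelector": {
        "matchLabels": {"kubernetes.io/metadata.name": MONITORING_NAMESPACE}}})

    allow = _policy("default-allow-ingress", namespace,
                    {"podSelector": {}, "policyTypes": ["Ingress"],
                     "ingress": [{"from": peers}]})
    return [deny_all, allow]


if __name__ == "__main__":
    print(json.dumps(default_ingress_policies("example-domain", "backoffice"), indent=2))
```

Network policies are additive, so the allow policy only widens what the deny policy closes; a namespace in publicfrontal never receives a rule admitting backoffice or internalservices traffic.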
Services will be able to define their own additional, specific network policies by including them in their Helm chart definitions. It is the responsibility of each service to define the contract that any service connecting to it must fulfill (e.g. "Traffic is allowed for all pods in a given area that have a specific label"). It is then the responsibility of the client to fulfill the contract (e.g. "Be deployed in a specific area and define the required label").
Building blocks for including ad-hoc network policies may be provided as part of the available Helm library.
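One possible shape for such a building block, sketched here in Python rather than as a Helm template: the service publishes a policy admitting only pods that carry an agreed label and run in a given network area. The function name, the `app` label on the server pods and the example values are all hypothetical.

```python
import json


def contract_policy(service, namespace, client_area, client_label, port):
    """Admit only pods carrying `client_label` in namespaces of `client_area`."""
    key, value = client_label
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{service}-contract", "namespace": namespace},
        "spec": {
            # Assumes the service's pods are labelled app=<service>.
            "podSelector": {"matchLabels": {"app": service}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    # One peer with both selectors: namespace AND pod must match.
                    "namespaceSelector": {"matchLabels": {"networkArea": client_area}},
                    "podSelector": {"matchLabels": {key: value}},
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder contract: clients must run in backoffice and set the label below.
    policy = contract_policy("example-api", "example-domain", "backoffice",
                             ("example-api-client", "true"), 8080)
    print(json.dumps(policy, indent=2))
```

The client side of the contract is then just a matter of deploying in the stated area and adding the agreed label to its pods.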
Access from Kubernetes cluster to workloads in legacy infrastructure
NAT Gateway IPs for backoffice (where the production cluster is located) will be allowed in the
Load Balancers for all ECS services as well as in the BOS Load Balancer.
Network access from Kubernetes cluster to AWS resources
VPC peering for private networks will be configured between the VPC where the production cluster is
located (backoffice) and the other VPCs (natonly, publicfrontal), and also the Kafka VPC.
Kafka cluster is already opened by security group to all the VPCs, so no further action is needed for granting access.
For other resources, traffic will be opened to the CIDRs for private network in the Kubernetes cluster (analogous to what has been done for BQS database).
Each namespace in the cluster will have some egress network policies defined by default, effectively
restricting access from arbitrary pods to sensitive resources in AWS (e.g. pods in publicfrontal
will not be able to reach RDS in backoffice over the network).
- Egress traffic to public IPs is allowed by default.
- Egress traffic to private IPs (besides own cluster CIDRs) is denied by default.
- Egress traffic to backoffice CIDRs will be allowed from backoffice area.
- Egress traffic to publicfrontal CIDRs will be allowed from publicfrontal area.
- Egress traffic to natonly CIDRs will be allowed from internalservices area.
- Egress traffic to kafka CIDRs will be allowed from backoffice, publicfrontal and internalservices areas.
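The default egress rules above could be expressed as one policy per namespace, roughly as follows. All CIDRs in this sketch are hypothetical placeholders, and allowing in-cluster traffic through an empty namespace selector is an assumption about how "own cluster CIDRs" would be handled.

```python
import ipaddress
import json

# RFC 1918 private ranges, excluded from the "public IPs" rule.
PRIVATE_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# Hypothetical CIDRs; the real values come from the VPC definitions.
VPC_CIDRS = {
    "backoffice": ["10.10.0.0/16"],
    "publicfrontal": ["10.20.0.0/16"],
    "natonly": ["10.30.0.0/16"],
    "kafka": ["10.40.0.0/16"],
}

# Network area to legacy VPC mapping described in this document.
AREA_TO_VPC = {
    "backoffice": "backoffice",
    "publicfrontal": "publicfrontal",
    "internalservices": "natonly",
}


def is_private(cidr):
    """True if the CIDR lies inside one of the private ranges."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(ipaddress.ip_network(r)) for r in PRIVATE_RANGES)


def default_egress_policy(namespace, area):
    """Build the default egress policy for a namespace in the given area."""
    allowed = VPC_CIDRS[AREA_TO_VPC[area]] + VPC_CIDRS["kafka"]
    rules = [
        # In-cluster traffic: any pod in any namespace (assumption).
        {"to": [{"namespaceSelector": {}}]},
        # Public IPs: everything except the private ranges.
        {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": PRIVATE_RANGES}}]},
        # Private CIDRs explicitly opened for this area, plus Kafka.
        {"to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed]},
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-egress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Egress"], "egress": rules},
    }


if __name__ == "__main__":
    print(json.dumps(default_egress_policy("example-domain", "internalservices"), indent=2))
```

With this shape, a namespace in internalservices gets the natonly and Kafka CIDRs but not the backoffice ones, which is exactly the situation the note below addresses.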
NOTE: Several services are placed in the internalservices network area but use a Vault
server and database in the backoffice VPC. Enforcing the egress policies would prevent these services
from connecting to these external resources. The proper fix is for each team to
migrate their Vault and DB servers to the natonly VPC. However, since this would be a costly effort, the
following workaround can be pursued:
- Move the impacted namespaces to the backoffice network area. This will only imply a change in the label, not requiring the namespace/workloads to be re-created, thus preventing outages.
- For each impacted namespace, create a new ingress network policy that will ensure traffic from the internalservices ingress is explicitly allowed.
Namespaces impacted by this workaround are: beneficiaries, gpi-gateway, gpi-interface,
payments and risk-admin.
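The two workaround steps could be produced for every impacted namespace as sketched below. The label identifying the internalservices ingress controller namespace (`ingressControllerFor`) and the policy name are assumptions made for illustration.

```python
import json

# Namespaces listed above as impacted by the workaround.
IMPACTED_NAMESPACES = ["beneficiaries", "gpi-gateway", "gpi-interface",
                       "payments", "risk-admin"]

# Hypothetical label identifying ingress controller namespaces per area.
INGRESS_CONTROLLER_LABEL = "ingressControllerFor"


def workaround_manifests(namespace):
    """Relabel the namespace to backoffice and keep internalservices ingress open."""
    relabel = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace, "labels": {"networkArea": "backoffice"}},
    }
    allow_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-internalservices-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"namespaceSelector": {
                "matchLabels": {INGRESS_CONTROLLER_LABEL: "internalservices"}}}]}],
        },
    }
    return [relabel, allow_ingress]


if __name__ == "__main__":
    for ns in IMPACTED_NAMESPACES:
        print(json.dumps(workaround_manifests(ns), indent=2))
```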
Mid term solution
Alongside the migration of less sensitive workloads, operational and security enhancements will be developed.
Linkerd service mesh will be supported in the cluster. The service mesh will provide mTLS-based authentication to workloads, allowing us to have fine-grained control over which inbound and outbound traffic is allowed for meshed workloads. By default, all traffic will be disallowed, and each workload will need to allow it explicitly through annotations and CustomResourceDefinitions (CRDs).
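As a rough illustration of the inbound side of this model, the sketch below builds a Linkerd Server and a ServerAuthorization resource. It assumes Linkerd's policy CRDs in the policy.linkerd.io/v1beta1 API group and the config.linkerd.io/default-inbound-policy annotation; the API version should be checked against the Linkerd release actually installed. Workload names, the port name and the service account are hypothetical.

```python
import json

# Annotation assumed to switch a namespace or workload to deny-by-default.
DEFAULT_DENY_ANNOTATION = {"config.linkerd.io/default-inbound-policy": "deny"}


def linkerd_inbound_contract(namespace, app, port_name,
                             client_service_account, client_namespace):
    """Describe one port of a workload and the mTLS identity allowed to call it."""
    server_name = f"{app}-{port_name}"
    server = {
        "apiVersion": "policy.linkerd.io/v1beta1",
        "kind": "Server",
        "metadata": {"name": server_name, "namespace": namespace},
        "spec": {
            # Assumes the workload's pods are labelled app=<app>.
            "podSelector": {"matchLabels": {"app": app}},
            "port": port_name,
        },
    }
    authorization = {
        "apiVersion": "policy.linkerd.io/v1beta1",
        "kind": "ServerAuthorization",
        "metadata": {"name": f"{server_name}-clients", "namespace": namespace},
        "spec": {
            "server": {"name": server_name},
            # Only meshed clients presenting this service account identity.
            "client": {"meshTLS": {"serviceAccounts": [
                {"name": client_service_account, "namespace": client_namespace}]}},
        },
    }
    return [server, authorization]


if __name__ == "__main__":
    manifests = linkerd_inbound_contract("example-domain", "example-api", "http",
                                         "example-client", "example-clients")
    print(json.dumps(manifests, indent=2))
```

Unlike a NetworkPolicy, the authorization here is tied to the client's mTLS identity rather than to its IP address or labels.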
Access to workloads in Kubernetes cluster
The backoffice area shall be deprecated over time, as all access from Ebury offices shall
eventually go through an authenticated proxy or through one of the channels.
In the long term, the policies will be modified to be more restrictive:
- Ingress traffic is denied by default.
- Traffic from namespaces in the same network area is allowed to pods with specific label (endpoint: 'true').
- Traffic from ingress controller for the specific network area is allowed to pods with specific label (endpoint: 'true').
- Traffic from publicfrontal area will be allowed in internal area to pods with specific label (endpoint: 'true').
- Traffic from monitoring namespace is allowed.
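A sketch of what the more restrictive defaults could look like: only pods labelled endpoint: 'true' accept traffic from other namespaces, while monitoring still reaches every pod. The ingress controller label and monitoring namespace name are hypothetical, and treating the "internal" area as the internalservices label value is an assumption.

```python
import json

ENDPOINT_LABEL = {"endpoint": "true"}

# Assumptions, not taken from this document.
INGRESS_CONTROLLER_LABEL = "ingressControllerFor"
MONITORING_NAMESPACE = "monitoring"


def _policy(name, namespace, spec):
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def long_term_ingress_policies(namespace, area, extra_source_areas=()):
    """Deny by default; expose only endpoint pods; keep monitoring access."""
    deny_all = _policy("default-deny-ingress", namespace,
                       {"podSelector": {}, "policyTypes": ["Ingress"]})

    peers = [{"namespaceSelector": {"matchLabels": {"networkArea": src}}}
             for src in (area, *extra_source_areas)]
    peers.append({"namespaceSelector": {"matchLabels": {INGRESS_CONTROLLER_LABEL: area}}})
    endpoints = _policy("allow-endpoints", namespace, {
        # Only pods that opt in with the endpoint label are reachable.
        "podSelector": {"matchLabels": ENDPOINT_LABEL},
        "policyTypes": ["Ingress"],
        "ingress": [{"from": peers}],
    })

    monitoring = _policy("allow-monitoring", namespace, {
        "podSelector": {},
        "policyTypes": ["Ingress"],
        "ingress": [{"from": [{"namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": MONITORING_NAMESPACE}}}]}],
    })
    return [deny_all, endpoints, monitoring]


if __name__ == "__main__":
    policies = long_term_ingress_policies("example-domain", "internalservices",
                                          extra_source_areas=["publicfrontal"])
    print(json.dumps(policies, indent=2))
```

Compared with the migration-phase defaults, the blanket same-namespace and same-area allowances disappear, so a pod without the endpoint label is reachable only by monitoring unless a service-specific contract says otherwise.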
Further connections between services will be specified by each service through "contracts" defined as network policies or rules in the service mesh.
Alternatives
- Use three different clusters in three different networks (though in the same VPC), one for each of the current VPCs. It would be inefficient in cost and operations, and it would need some refactoring of the infrastructure code.
- Instead of VPC Peering, use VPC Endpoints for resources in AWS. Though that would increase the level of isolation, it would also increase complexity and time to deliver. In addition, we will most likely move all the AWS resources to a single VPC or to the same VPC as the cluster, controlling outbound traffic with the service mesh, which makes VPC Endpoints less interesting.
- Use Security groups for pods for controlling access to AWS resources from specific workloads in the cluster.
- Centralize Network policies operation with Calico Enterprise. Overall, it would provide better visibility of connections between workloads, but in terms of defining policies, we do not perceive great advantages.
- Define and implement support for Security Groups at pod level straight away and delay migration of workloads in ECS until security concerns are addressed.
- Use a different service mesh. We have considered Istio and Kuma as alternatives, conducting a PoC on both. Even though both are richer in features than Linkerd, Istio was discarded because of its complexity and big footprint, while Kuma was discarded because of its small community, being supported basically by Kong. In addition, it is quite likely we would need Kuma enterprise features at some point, and our prior experience with Kong licenses and pricing model is not very good. Furthermore, in both cases it would mean a change in how we manage ingress controllers, replacing them with gateways. Although we will probably want to do that at some point, it is not strictly required at the moment. In the case of Kuma, as it is not possible to disable mTLS for specific pods, if we want to enable mTLS (and we would need to do so for enabling traffic permissions), traffic coming from the external Load Balancer in AWS would be unauthenticated, so traffic between ingress and pods would be disallowed if we keep our ingress setup.
Caveats
Although mitigated by other authentication and authorization means and by perimeter security, "defense in depth" would be softened while the migration is ongoing and we have workloads in both Kubernetes and ECS.
Creation of namespaces is still handled by the platform team. Since a namespace should correspond roughly to a business domain, the number of namespaces is not expected to grow with each project, so the impact on being a self-service platform is not high.
Operation
Namespaces and default network policies are created as code and supervised and approved by Platform teams.
Specific Network policies are defined alongside the projects in their Helm charts, using building blocks provided for ease of use.
Security Impact
- By opening the ingresses to ECS NAT gateways, the legacy VPCs will have access to services already migrated to the Kubernetes cluster that were not available to them before. The limitation arises because in Kubernetes we have a single Load Balancer per network area, while in ECS we have one Load Balancer per service. Once every service is in Kubernetes, that will not be a limitation, but while the migration is being conducted and we have workloads in both ECS and Kubernetes, there will be an impact.
- By allowing traffic from Kubernetes NAT Gateways in all the Load Balancers in the legacy infrastructure, for as long as the migration is ongoing, services already migrated to Kubernetes will have access to some services in ECS that were not available before.
- Some resources (mostly databases) that were not reachable over the network before will now be reachable by any service inside the cluster, at least until the long-term solution is implemented. This is mitigated as the databases are protected by user and password.
- The Kafka cluster will be reachable and accessible by any workload in the cluster. This is no different from the current situation in ECS, but it is still a security concern as authentication and authorization are not yet implemented in our Kafka setup.
Performance Impact
N/A
Developer Impact
- Services will need to specify, by including a label, which pods in their deployments are expected to be accessed from other services.
Data Contracts
N/A
Deployment
The support for network policies will be staged in two phases:
- First, during the migration, we will focus on reachability between the legacy platform and Kubernetes.
- Secondly, we will focus on proper segmentation and defense in depth for the workloads running in Kubernetes.