Jenkins split by domains

Problem Description

Our current CI/CD environment is a single Jenkins instance acting as the controller (a.k.a. Jenkins master) for dozens of agents that are dynamically created and destroyed on EC2. As a result, whenever there is an issue, a bug, maintenance work, etc., the entire technology department is affected.

This infrastructure is the same for both existing environments (test and production).

Background

Jenkins has been the CI/CD tool at Ebury since the beginning of the tech department, but as the department and its projects have grown, we have started to experience poor performance. In addition, we have been unable to upgrade to the latest versions (of both Jenkins itself and its plugins), because maintaining this platform is high risk: a failed upgrade could easily cause a SEV-2 by blocking deployments.

Solution

Create several Jenkins instances (one per business area) so that each one is an isolated node that is easy to manage, maintain, fix, and test. This also directly reduces incident severity, because an issue will affect only one business area instead of the whole department.

About this solution:

  • Pros:
    • We already have this infrastructure in place (two EC2 instances: Jenkins Test and Jenkins Master).
    • Easy and flexible migration for the development teams.
    • On the DXP team side, we only need to adapt the jenkins-jobs repository pipeline to deploy each Jenkins job according to its business area.
    • The Jenkins structure improves by being organized by business area/domain.
  • Cons:
    • We are still using EC2 instances.
    • Maintenance across several instances.
    • We need to review the webhook configuration (probably adding the different domains in GitHub).
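The jenkins-jobs pipeline change essentially means deciding, for each job definition, which controller it is deployed to. The following is a minimal Python sketch of that decision, assuming each job definition is tagged with its business area. The area names, controller URLs and the `plan_deployments` helper are hypothetical placeholders, not part of the existing repository.

```python
from collections import defaultdict

# Hypothetical mapping of business areas to their dedicated Jenkins controllers.
AREA_TO_CONTROLLER = {
    "payments": "https://jenkins-payments.example.internal",
    "onboarding": "https://jenkins-onboarding.example.internal",
}
# Hypothetical URL of the current shared controller; areas not yet migrated keep deploying here.
DEFAULT_CONTROLLER = "https://jenkins.example.internal"


def plan_deployments(jobs):
    """Group job definitions by the controller they should be deployed to.

    `jobs` is an iterable of (job_name, business_area) pairs.
    Returns a dict mapping controller URL -> list of job names, in input order.
    """
    plan = defaultdict(list)
    for name, area in jobs:
        # Unknown or not-yet-migrated areas fall back to the shared controller.
        controller = AREA_TO_CONTROLLER.get(area, DEFAULT_CONTROLLER)
        plan[controller].append(name)
    return dict(plan)
```

Falling back to the shared controller lets the migration proceed one area at a time: moving an area only requires adding one entry to the mapping.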

Service Ownership

These Jenkins instances will still be maintained by the DXP team.

Alternatives

  • Keep a single Jenkins controller for the whole tech department/all services, but migrate to the Kubernetes (K8s) plugin.
    • Pros:
      • No prerequisites beyond analysing how to do it.
    • Cons:
      • Every change affects the entire development team and could cause incidents.
  • Separate Jenkins instances migrated to K8s:
    • Pros:
      • We already have an RFC related to this proposal, although it covers the Jenkins agents rather than the controller (so it remains valid even if we implement the controller split on EC2).
      • We already have the tasks defined in JIRA.
      • The tooling K8s cluster could also be used for other tools such as SECO, PyPI, etc.
    • Cons:
      • Requires a K8s cluster for tooling (with the specific requirement of HDD storage).
      • Agents must be configured to use a shared EBS volume.
      • Maintenance across several instances.
    • This alternative would be covered by a separate RFC, which should be a prerequisite for this one.

Caveats

  • We will probably lose the ability to search across all jobs in Jenkins.
  • Maintenance of all the Jenkins instances instead of just one.
  • A balancer is needed so that the different Jenkins instances can be reached through a single URL.
  • Webhook configuration.

Operation

  • Duplicate the current Jenkins instance in AWS and move one Jenkins job to the new instance (as a POC).
  • Create the infrastructure so that the current Jenkins acts as the main balancer and redirects requests for that specific job to the new instance.
  • Once everything has been confirmed to work correctly, create a new instance and move a full business area to it.
  • Adapt the jenkins-jobs repository deployments to the instances/business areas that have been moved.
  • Configure the webhooks if required.
  • Repeat the process until all business areas have been moved to separate Jenkins instances.
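The balancer step boils down to a routing decision: requests for moved jobs go to their new instance, and everything else stays on the current controller. The following is a minimal Python sketch of that decision only. In practice it would be expressed as rules in whatever balancer or reverse proxy is chosen. The `MOVED_JOBS` table, the URLs and `resolve_backend` are hypothetical. The sketch relies on the standard Jenkins URL layout `/job/<name>/...`.

```python
# Hypothetical table of jobs (or top-level folders) already moved off the main controller.
MOVED_JOBS = {
    "payments-api": "https://jenkins-payments.example.internal",
}
# Hypothetical URL of the current controller, which keeps serving everything else.
MAIN_CONTROLLER = "https://jenkins.example.internal"


def resolve_backend(path: str) -> str:
    """Return the controller that should serve a request path such as /job/<name>/..."""
    parts = [p for p in path.split("/") if p]
    # Jenkins job URLs start with /job/<name>/; nested folders continue as /job/<name>/job/<child>/,
    # so routing on the first job segment sends a whole top-level folder to the same instance.
    if len(parts) >= 2 and parts[0] == "job":
        backend = MOVED_JOBS.get(parts[1])
        if backend is not None:
            return backend
    # Non-job paths (e.g. /manage/, /github-webhook/) stay on the main controller.
    return MAIN_CONTROLLER
```

Keying on the top-level segment means that if each business area lives in its own top-level folder, moving a whole area becomes a single entry in the table.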

Security Impact

From a security point of view, this could be a significant step towards better management of roles and user access, since not everyone needs access to every Jenkins instance.

Performance Impact

This should improve performance, since each instance can be sized to its own requirements (depending on the area, volume of jobs, executions, etc.) instead of all areas sharing the same resources.

Developer Impact

Developers will need to use a different domain to access Jenkins, depending on their business area.

Deployment

  • Clone the current Jenkins to different AWS instances.
  • Extract one domain to its own instance as a pilot.
  • Test the pilot to confirm that everything works correctly.
  • Repeat until all domains have been moved.

Dependencies

  • Define and create an automated backup process so that instances can be recovered if needed.
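Since every new instance adds its own state to protect, the backup process could start as a scheduled file-level archive of each instance's `JENKINS_HOME`. The following is a minimal Python sketch of one such approach; EBS snapshots or a dedicated backup plugin are alternatives. The choice to skip `workspace` (assumed to be rebuildable) and the `backup_jenkins_home` helper are assumptions for illustration, not an agreed design.

```python
import tarfile
import time
from pathlib import Path

# Assumption: top-level JENKINS_HOME directories that can be rebuilt and need no backup.
EXCLUDED_DIRS = {"workspace"}


def backup_jenkins_home(jenkins_home: str, dest_dir: str) -> Path:
    """Write a timestamped .tar.gz of jenkins_home into dest_dir and return its path."""
    home = Path(jenkins_home)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"jenkins-home-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"

    def skip_rebuildable(info: tarfile.TarInfo):
        # Member names look like "jenkins_home/<top-level>/..."; returning None
        # excludes the entry and, for directories, everything beneath it.
        parts = Path(info.name).parts
        if len(parts) > 1 and parts[1] in EXCLUDED_DIRS:
            return None
        return info

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(home, arcname="jenkins_home", filter=skip_rebuildable)
    return archive
```

Running this per instance (e.g. from a scheduled job) keeps each business area's backups independent, which matches the isolation goal of the split.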



Based on RFC Template Version 1.1