CircleCI Runner Autoscaler

This Go application manages the scaling out of CircleCI runners. It is composed of two types of background workers:

  • Discovery Worker: discovers new resource classes that should be scaled and spawns a new scaling worker for each of them.
  • Scaling Worker: checks CircleCI for unclaimed tasks in the resource class it manages, scales out the ASG related to that resource class, and waits for the instances to be up before continuing.

It supports both EC2 and Kubernetes-based runners.

CircleCI API Client

There was no open-source client for the CircleCI Runner API, so the one we use is generated automatically by oapi-codegen from the OpenAPI definition available here.

If you make any changes to the YAML file, regenerate the client code by running:

$ make generate-client

Installing the circleci-runner-autoscaler

We provide a Helm chart for deploying the service.

For the AWS EC2 scaler, you need to create an IAM role that is allowed to perform autoscaling actions on your autoscaling groups. There's an example Terraform module here.

If you are running the service on Kubernetes, make sure the service account deployed by the chart uses that IAM role.

We currently don't have any public repositories for the Docker image or the Helm chart, but it is something we are looking into.

Configurations

All configurations are loaded from environment variables using envconfig.

| Configuration Name | Environment Variable | Default Value | Description |
| --- | --- | --- | --- |
| KubernetesScalerEnabled | APP_KUBERNETES_SCALER_ENABLED | true | Enable the Kubernetes discovery and autoscaler |
| KubernetesNamespace | APP_KUBERNETES_NAMESPACE | circleci-runners | Kubernetes namespace to use for runner discovery and scaling |
| CircleToken | APP_CIRCLE_TOKEN | | CircleCI API token to use for the runners API |
| CircleResourceNamespace | APP_CIRCLE_RESOURCE_NAMESPACE | vela-games | CircleCI resource namespace to use for runner discovery |

How it works

EC2 Runners

As part of a previous project, we open-sourced a Terraform module to manage the runners' autoscaling groups. We recommend you use that module, as it already has the necessary code to support this autoscaler.

The service discovers the resource classes it has to scale by listing all autoscaling groups that carry the resource-class tag. It then manages the desired capacity of each ASG based on the number of unclaimed tasks for that resource class.

The service only handles scaling out. To scale in, we depend on the self-hosted runner's configuration to make the runner kill itself after a certain timeout is reached. After the process is killed, a script on the instance detaches it from the ASG and shuts it down.
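The scale-out decision can be sketched as a small pure function. The exact policy the service applies is not spelled out here, so the following is an assumed policy for illustration: request one additional instance per unclaimed task, cap the result at the ASG's maximum size, and never lower the capacity, since scaling in is left to the runners themselves. The function name and parameters are hypothetical.

```go
package main

import "fmt"

// desiredCapacity returns the capacity to request from the ASG.
// Assumed policy (not necessarily the service's): one extra instance per
// unclaimed task, capped at maxSize, and never below the current capacity.
func desiredCapacity(current, unclaimed, maxSize int) int {
	if unclaimed < 0 {
		unclaimed = 0
	}
	want := current + unclaimed
	if want > maxSize {
		want = maxSize
	}
	if want < current {
		want = current // scale-out only: never reduce capacity here
	}
	return want
}

func main() {
	fmt.Println(desiredCapacity(2, 3, 10)) // 5
	fmt.Println(desiredCapacity(8, 5, 10)) // 10, capped at the ASG maximum
	fmt.Println(desiredCapacity(4, 0, 10)) // 4, nothing to do
}
```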

Kubernetes Runners (EXPERIMENTAL)

⚠️ DEPRECATED: We created this feature as a POC for scaling runners on Kubernetes before CircleCI released their Container Runner. We still use this feature internally at Vela but it never reached GA status. We recommend using CircleCI's official operator.

For this feature, we used a suspended CronJob as a pod template for the runners. The autoscaler discovers the resource classes managed by k8s by looking for all CronJobs in the configured namespace with the labels resource-class-org and resource-class-name. Here's an example of a CronJob you can use:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: small-runner
  namespace: circleci-runners
  labels:
    resource-class-org: "vela-games"
    resource-class-name: "k8s-small"
spec:
  suspend: true
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 21600
      template:
        metadata:
          labels:
            resource-class-org: "vela-games"
            resource-class-name: "k8s-small"
        spec:
          containers:
          - name: ci-small
            image: circleci-image:latest
            command: ["/opt/circleci/start.sh"]
            env:
            - name: LAUNCH_AGENT_RUNNER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: LAUNCH_AGENT_API_AUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: api-token
                  key: LAUNCH_AGENT_API_AUTH_TOKEN
                  optional: false
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: 2000m
                memory: 1Gi
                ephemeral-storage: "15Gi"
              requests:
                cpu: 2000m
                memory: 1Gi
                ephemeral-storage: "15Gi"
          restartPolicy: OnFailure
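As an illustration of the label-based discovery, the sketch below derives a resource class from the two labels on a CronJob such as the one above. It uses only the standard library and plain maps rather than a Kubernetes client. The `resourceClass` function is hypothetical, and joining the two labels as org/name is an assumption based on CircleCI's namespace/name format for resource classes.

```go
package main

import "fmt"

// Label keys the autoscaler looks for on CronJobs.
const (
	orgLabel  = "resource-class-org"
	nameLabel = "resource-class-name"
)

// resourceClass builds "org/name" from a CronJob's labels.
// It returns false when either label is missing or empty, in which case
// the CronJob would not be treated as a runner template.
func resourceClass(labels map[string]string) (string, bool) {
	org, name := labels[orgLabel], labels[nameLabel]
	if org == "" || name == "" {
		return "", false
	}
	return org + "/" + name, true
}

func main() {
	labels := map[string]string{
		"resource-class-org":  "vela-games",
		"resource-class-name": "k8s-small",
	}
	if rc, ok := resourceClass(labels); ok {
		fmt.Println(rc) // vela-games/k8s-small
	}
}
```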

At the moment we do not provide any publicly accessible base images for the k8s runners, but we provide an example here.