GitLab Runner on Kubernetes - Pod-Per-Job CI/CD

I use GitLab for source control and CI/CD. Running pipelines on GitLab's shared runners works, but there are reasons to want your own: privacy, cost, specific tooling, or just because you can.

This post covers how I run a self-hosted GitLab Runner on Kubernetes, using the Kubernetes executor to spin up pods for each job. While I'm running this in a homelab, the setup works for any Kubernetes cluster with network access to your GitLab instance.

Built on the homelab: This post uses the Kubernetes cluster and GitOps patterns from the Homelab Kubernetes Series - specifically ArgoCD app-of-apps for deployment and External Secrets for token management. You don't need to follow that series first, but it explains the infrastructure this runner runs on.

The Kubernetes Executor

GitLab Runner supports multiple executors: Shell, Docker, Kubernetes, and others. The Kubernetes executor is the interesting one for cluster deployments.

Instead of running jobs on a static VM or inside a single container, the Kubernetes executor creates a new pod for each CI/CD job. The pod runs your job, then gets deleted. Clean isolation, automatic scaling, and no leftover state between builds.

[Diagram: GitLab Runner pod-per-job architecture on Kubernetes]

Why this matters:

  • Isolation: Each job runs in its own pod with fresh containers
  • Scaling: Kubernetes handles scheduling across nodes
  • Resource control: CPU/memory limits per job, not shared
  • Cleanup: Pods are deleted after jobs complete

The Helm Deployment

GitLab publishes an official Helm chart. I deploy it through ArgoCD as a multi-source Application - the chart from GitLab's registry, plus custom manifests from my Git repo:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitlab-runner
  namespace: argocd
spec:
  sources:
    # Source 1: GitLab Runner Helm chart
    - repoURL: https://charts.gitlab.io
      chart: gitlab-runner
      targetRevision: 0.69.0
      helm:
        values: |
          gitlabUrl: https://gitlab.com/
          concurrent: 2
          checkInterval: 30
          # ... more config

    # Source 2: External Secrets from our repo
    - repoURL: https://gitlab.com/your-org/homelab.git
      path: manifests
      directory:
        include: "gitlab-runner-*.yaml"
```

The multi-source pattern lets me keep the Helm chart separate from my custom resources (like ExternalSecrets). ArgoCD renders both sources and applies the combined result.

The Runner Token Problem

GitLab Runners need an authentication token to register with GitLab (the older registration-token flow is deprecated). You get this token from GitLab's UI when creating a runner. The question is: where do you store it?

Not in Git, obviously. An environment variable injected by CI/CD? That's circular - the runner needs the token before it can run any pipeline.

I use External Secrets Operator with Infisical as the backend. The token lives in Infisical, and ESO syncs it into a Kubernetes Secret:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: gitlab-runner-token
  namespace: gitlab-runner
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-cluster-secretstore
    kind: ClusterSecretStore
  target:
    name: gitlab-runner-token
    creationPolicy: Owner
    template:
      type: Opaque
      data:
        runner-token: "{{ .RUNNER_TOKEN }}"
        # Intentionally empty - the chart expects the key, but the
        # deprecated registration-token flow goes unused
        runner-registration-token: ""
  data:
    - secretKey: RUNNER_TOKEN
      remoteRef:
        key: "/gitlab/runners/homelab/RUNNER_TOKEN"
```

The Helm chart references this secret:

```yaml
runners:
  secret: gitlab-runner-token
  secretPath: runner-token
```

No tokens in Git. The secret refreshes every 15 minutes, so token rotation is straightforward - update Infisical, wait for sync.
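
The ClusterSecretStore referenced above is what wires ESO to Infisical. A minimal sketch, assuming the machine-identity credentials live in a plain Kubernetes Secret - the secret name, namespace, and project/environment slugs below are placeholders, not my actual values:

```yaml
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: infisical-cluster-secretstore
spec:
  provider:
    infisical:
      auth:
        universalAuthCredentials:
          # Infisical machine-identity credentials, stored as a regular Secret
          clientId:
            name: infisical-machine-identity   # placeholder
            namespace: external-secrets
            key: clientId
          clientSecret:
            name: infisical-machine-identity   # placeholder
            namespace: external-secrets
            key: clientSecret
      secretsScope:
        projectSlug: homelab        # placeholder project slug
        environmentSlug: prod       # placeholder environment
        secretsPath: "/"
```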

Runner Configuration

The runner config lives in the Helm values. The key section is runners.config, which becomes the config.toml the runner uses:

```toml
[[runners]]
  [runners.kubernetes]
    namespace = "gitlab-runner"
    image = "ubuntu:22.04"
    privileged = false

    cpu_limit = "1000m"
    memory_limit = "2Gi"
    cpu_request = "100m"
    memory_request = "128Mi"

    helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"
    service_account = "gitlab-runner"
    pull_policy = ["if-not-present"]

    [[runners.kubernetes.volumes.empty_dir]]
      name = "docker-certs"
      mount_path = "/certs/client"
      medium = "Memory"
```

Key settings explained:

  • namespace: Where job pods run (same namespace as runner)
  • image: Default container image for jobs (override in .gitlab-ci.yml)
  • privileged: false: No privileged containers - more on this below
  • Resource limits: Each job pod gets up to 1 CPU and 2GB RAM
  • pull_policy: if-not-present: Avoid pulling images that already exist locally
  • The empty_dir volume provides a memory-backed tmpfs for Docker certificates
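
Worth spelling out: the default image only applies when a job doesn't set its own. Any job can override it with the standard image: keyword in .gitlab-ci.yml:

```yaml
# This job ignores the runner's ubuntu:22.04 default and runs in node:20
lint:
  image: node:20
  script:
    - npm ci
    - npm run lint
```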

RBAC: Least Privilege

The runner needs permissions to create and manage job pods. The Helm chart creates RBAC rules, but I've tuned them to the minimum required:

```yaml
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["list", "get", "watch", "create", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create"]
    - apiGroups: [""]
      resources: ["pods/log"]
      verbs: ["get"]
    - apiGroups: [""]
      resources: ["pods/attach"]
      verbs: ["list", "get", "create", "delete", "update"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["list", "get", "create", "delete", "update"]
    - apiGroups: [""]
      resources: ["configmaps"]
      verbs: ["list", "get", "create", "delete", "update"]
```

The runner can manage pods, secrets, and configmaps in its namespace. It cannot do anything else - no access to other namespaces, no cluster-level permissions, no node operations.

Security Hardening

Running as non-root and dropping capabilities:

```yaml
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: true
  runAsUser: 999
  capabilities:
    drop: ["ALL"]

podSecurityContext:
  runAsUser: 999
  runAsGroup: 999
  fsGroup: 999
```

The runner process runs as UID 999 (the gitlab-runner user in the image). All Linux capabilities are dropped. Privilege escalation is blocked.

One catch: the runner writes state to /.gitlab-runner, and a non-root user can't create that directory at the filesystem root. An init container handles this:

```yaml
initContainers:
  - name: init-dirs
    image: busybox:1.35
    command: ['sh', '-c', 'mkdir -p /.gitlab-runner && chown 999:999 /.gitlab-runner && chmod 755 /.gitlab-runner']
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: gitlab-runner-home
        mountPath: /.gitlab-runner
```

The init container runs as root briefly to set up permissions, then the main runner runs unprivileged.
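
For that volumeMount to resolve, the pod spec also needs a matching gitlab-runner-home volume. An emptyDir is enough here, since the runner re-authenticates on restart and keeps nothing worth persisting:

```yaml
volumes:
  - name: gitlab-runner-home
    emptyDir: {}
```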

The Privileged Container Question

You'll notice privileged = false in the config. This means job pods cannot run in privileged mode. For many CI/CD tasks - running tests, building Go/Node/Python applications, deploying to Kubernetes - this is fine.

The problem is Docker-in-Docker (DinD). Building container images traditionally requires Docker, and Docker needs privileged mode. If your pipelines build Docker images, you have options:

  1. Enable privileged mode - Works, but security risk
  2. Kaniko - Builds images without Docker daemon, no privileges needed
  3. Buildah/Podman - Daemonless, can run rootless
  4. Use GitLab's shared runners - Let them deal with DinD

I use Kaniko for image builds, so privileged mode stays off:

```yaml
# .gitlab-ci.yml example
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
```
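
One detail the job above glosses over: Kaniko needs registry credentials to push. GitLab's documented Kaniko pattern writes a Docker config.json from the built-in CI variables first - note the -debug image tag, which includes the shell that before_script requires:

```yaml
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug   # -debug variant ships a shell
    entrypoint: [""]
  before_script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf '%s:%s' "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
```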

Resource Limits

The runner pod itself is lightweight:

```yaml
resources:
  limits:
    memory: 256Mi
    cpu: 200m
  requests:
    memory: 128Mi
    cpu: 100m
```

It's just polling GitLab and launching pods. The actual work happens in job pods, which get their own limits (1 CPU, 2GB in my config).

For a homelab, concurrent: 2 is reasonable - two jobs running at once. Production clusters might want more, and you'd consider running multiple runner pods for higher throughput.
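
The job limits don't have to be one-size-fits-all, either. The Kubernetes executor supports per-job overrides via CI variables, capped by overwrite ceilings in config.toml - a sketch of the relevant settings:

```toml
[[runners]]
  [runners.kubernetes]
    # Defaults, as above
    cpu_limit = "1000m"
    memory_limit = "2Gi"
    # Ceilings for overrides requested from .gitlab-ci.yml via the
    # KUBERNETES_CPU_LIMIT / KUBERNETES_MEMORY_LIMIT job variables
    cpu_limit_overwrite_max_allowed = "2"
    memory_limit_overwrite_max_allowed = "4Gi"
```

A heavy job then opts in by setting, for example, KUBERNETES_MEMORY_LIMIT: "4Gi" in its variables: block.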

ArgoCD Sync Configuration

The Application uses automated sync with self-healing:

```yaml
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - PruneLast=true
  retry:
    limit: 5
    backoff:
      duration: 5s
      factor: 2
      maxDuration: 3m
```

If someone manually edits the runner deployment, ArgoCD reverts it. If I delete the Application YAML from Git, ArgoCD removes the runner. The retry policy handles transient failures during initial deployment.

What a Pipeline Looks Like

With the runner deployed, pipelines in your GitLab repos automatically pick it up (assuming the runner is registered for those projects). A simple pipeline:

```yaml
stages:
  - test
  - build

test:
  stage: test
  image: golang:1.22
  script:
    - go test ./...

build:
  stage: build
  image: golang:1.22
  script:
    - go build -o app ./cmd/server
  artifacts:
    paths:
      - app
```

When this runs:

  1. GitLab triggers the pipeline
  2. Runner polls GitLab, sees the job
  3. Runner creates a pod with golang:1.22 image
  4. Pod clones the repo, runs tests
  5. Pod completes, gets deleted
  6. Repeat for build stage

Each stage gets a fresh pod. Clean, isolated, ephemeral.
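
One routing detail: if the runner was created with tags in GitLab's UI (and isn't set to run untagged jobs), jobs only land on it when they request those tags - the tag name below is hypothetical:

```yaml
test:
  stage: test
  image: golang:1.22
  tags:
    - homelab-k8s   # hypothetical runner tag
  script:
    - go test ./...
```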

Observability

The runner integrates with the cluster's observability stack:

  • Logs: Runner logs go to stdout, picked up by Loki
  • Metrics: Runner exposes Prometheus metrics (enable with metrics.enabled: true)
  • Network: Cilium/Hubble shows traffic between runner and GitLab, and between job pods

If a job fails, logs are in GitLab's UI. If the runner itself has issues, logs are in Loki. Network problems show up in Hubble.
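
The metrics side takes two chart values - and with the Prometheus Operator installed, the chart can render a ServiceMonitor as well (sketch of the Helm values):

```yaml
metrics:
  enabled: true

serviceMonitor:
  enabled: true
```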

Beyond Homelab

This setup isn't homelab-specific. The same configuration works for:

  • On-prem Kubernetes clusters connecting to GitLab.com
  • Self-hosted GitLab - just change gitlabUrl
  • Cloud Kubernetes (EKS, GKE, AKS) - same Helm chart, same patterns
  • Air-gapped environments - mirror images, point to internal GitLab

The External Secrets pattern works with any secrets manager: Vault, AWS Secrets Manager, GCP Secret Manager. The RBAC and security context work anywhere.

What I'd Change

Distributed caching: Job pods start fresh each time, which means downloading dependencies repeatedly. GitLab supports distributed caches backed by S3-compatible storage. MinIO is running in my cluster - connecting them would speed up builds significantly.
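
For reference, pointing the cache at in-cluster MinIO would look roughly like this in the Helm values - the bucket name, MinIO address, and credentials secret (keys accesskey/secretkey) are placeholders:

```yaml
runners:
  cache:
    secretName: minio-runner-cache   # placeholder; holds accesskey/secretkey
  config: |
    [[runners]]
      [runners.cache]
        Type = "s3"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "minio.minio.svc.cluster.local:9000"   # placeholder
          BucketName = "runner-cache"                            # placeholder
          Insecure = true   # plain HTTP inside the cluster
```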

Runner autoscaling: Right now I have one runner pod with concurrent: 2. For bursty workloads, Kubernetes HPA could scale runner pods based on job queue depth. The official chart supports this.

Network policies: The runner namespace has no explicit network policies. Adding Cilium policies to restrict egress to just GitLab and internal registries would tighten security further.
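
A first pass at such a policy might pin the namespace's egress to DNS plus GitLab over HTTPS - a sketch, noting that toFQDNs relies on Cilium's DNS proxy and that job pods also need egress to whatever registries they pull from:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: gitlab-runner-egress
  namespace: gitlab-runner
spec:
  endpointSelector: {}   # every pod in the namespace
  egress:
    # DNS via kube-dns, inspected so toFQDNs can learn the resolved IPs
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # HTTPS to GitLab only
    - toFQDNs:
        - matchName: gitlab.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```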

Per-project runners: Currently one runner handles all projects. For larger setups or sensitive projects, dedicated runners with different configurations (more resources, specific tags) make sense.

Files Reference

If you want to replicate this setup:

  • argocd-apps/gitlab-runner.yaml - ArgoCD Application (Helm chart + manifests)
  • manifests/gitlab-runner-external-secret.yaml - Token sync from Infisical

The runner token itself comes from GitLab's UI: Settings > CI/CD > Runners > New project runner. Create the runner there, copy the token, store it in your secrets manager.


This is the first post in the CI/CD & Automation series covering self-hosted pipeline infrastructure on Kubernetes. See also the Homelab Kubernetes Series for the cluster setup this runner is deployed on.
