Crossplane - Kubernetes-Native Infrastructure as Code

[Figure: Crossplane - Kubernetes-Native Infrastructure as Code architecture diagram]

Terraform is the go-to for infrastructure as code, but it has friction in a GitOps workflow. State files need to be stored somewhere, drift detection requires running terraform plan, and there's no natural integration with Kubernetes. Crossplane takes a different approach: infrastructure as Kubernetes resources, managed by Kubernetes controllers.

This post covers how I use Crossplane to manage Grafana alerting configuration as code, the issues I ran into, and why this approach makes sense for certain use cases.

Series context: This post kicks off the Platform Engineering series. It builds on concepts from the Homelab Kubernetes Series (especially GitOps with ArgoCD) and the Observability Series. If you've read Alerting Done Right, this post goes deeper into the Crossplane side of that setup.

What Crossplane Does

Crossplane extends Kubernetes with Custom Resource Definitions (CRDs) that represent external resources. Instead of running terraform apply to create a cloud database, you apply a Kubernetes manifest. A Crossplane controller watches the resource and reconciles reality with the desired state.

[Figure: Crossplane architecture diagram]

The architecture has three layers:

  1. Crossplane Core - The control plane that manages providers and resources
  2. Providers - Plugins that know how to manage specific external systems (AWS, GCP, Grafana, etc.)
  3. Managed Resources - Your actual infrastructure defined as Kubernetes manifests

The promise: declarative infrastructure with the same tools you use for applications. GitOps for everything.

Crossplane vs Terraform

Both tools solve the same problem. The difference is how they fit into your workflow:

| Aspect | Terraform | Crossplane |
| --- | --- | --- |
| State | External file (S3, Terraform Cloud) | Kubernetes etcd |
| Drift detection | Manual terraform plan | Continuous reconciliation |
| Execution | CLI or CI/CD runner | Kubernetes controller |
| Access control | Terraform Cloud, Vault | Kubernetes RBAC |
| Dependencies | HCL modules, providers | Compositions, XRDs |

Use Terraform when:

  • Managing cloud infrastructure before Kubernetes exists
  • Team is already proficient with HCL
  • Resources don't benefit from continuous reconciliation

Use Crossplane when:

  • Resources should be managed alongside applications
  • You want GitOps for infrastructure
  • Continuous drift correction is valuable
  • Infrastructure is consumed by Kubernetes workloads

For this homelab, I use Terraform to bootstrap the cluster (External Secrets Operator, initial ArgoCD config), then Crossplane takes over for resources that benefit from Kubernetes-native management.

Installing Crossplane

Crossplane installs via Helm. I deploy it through ArgoCD like everything else (see GitOps All The Things for the app-of-apps pattern):

yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: crossplane-core
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: default  # assumed; use your own ArgoCD project if different
  source:
    repoURL: https://charts.crossplane.io/stable
    chart: crossplane
    targetRevision: 2.0.2

    helm:
      values: |
        resourcesCrossplane:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi

        leaderElection: true

        packageCache:
          sizeLimit: 1Gi

        args:
          - --debug

        metrics:
          enabled: true

        rbacManager:
          leaderElection: true

        # RBAC manager resources are a top-level chart value,
        # not a key nested under rbacManager
        resourcesRBACManager:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 50m
            memory: 128Mi

  destination:
    server: https://kubernetes.default.svc
    namespace: crossplane-system

The packageCache setting is worth noting - Crossplane downloads provider packages, and caching them speeds up restarts. The --debug flag turns on verbose logging in the Crossplane core pod, which is useful when troubleshooting package installation and provider startup issues.

The Grafana Provider

For this homelab, I use Crossplane to manage Grafana alerting. The LGTM stack provides the monitoring, and I want alert rules, contact points, and notification policies defined as code in Git.

First, install the provider:

yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-grafana
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  package: xpkg.upbound.io/grafana/provider-grafana:v0.34.0
  packagePullPolicy: IfNotPresent
  revisionActivationPolicy: Automatic
  revisionHistoryLimit: 3

Providers are Crossplane's plugin system. The Grafana provider knows how to talk to Grafana's API and manage resources like dashboards, folders, data sources, and alerting configuration.
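
When the provider itself misbehaves, its own pod logs tend to be more useful than the core pod's. A minimal sketch of turning on debug logging for the provider, assuming the DeploymentRuntimeConfig API (pkg.crossplane.io/v1beta1) is available in your Crossplane version; the name provider-grafana-debug is hypothetical:

```yaml
# Hypothetical runtime config that adds --debug to the provider container.
# Crossplane names the provider's main container "package-runtime".
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-grafana-debug
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime
              args:
                - --debug
```

To use it, the Provider above would reference it with spec.runtimeConfigRef.name: provider-grafana-debug. It's worth removing again once the issue is found, since debug logs are noisy.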

Provider Configuration

The provider needs credentials. For Grafana, that's the admin password and URL. I pull this from Infisical via External Secrets (see Secrets Management with Infisical and External Secrets for the full setup):

yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-credentials
  namespace: crossplane-system
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-cluster-secretstore
    kind: ClusterSecretStore
  target:
    name: grafana-credentials
    creationPolicy: Owner
    template:
      type: Opaque
      data:
        credentials: |
          {
            "auth": "admin:{{ .GF_SECURITY_ADMIN_PASSWORD }}",
            "url": "http://lgtm-simple.monitoring.svc.cluster.local:3000",
            "org_id": "1"
          }
  data:
    - secretKey: GF_SECURITY_ADMIN_PASSWORD
      remoteRef:
        key: "/lgtm/GF_SECURITY_ADMIN_PASSWORD"

The credentials format is specific to the Grafana provider - a JSON object with auth (basic auth format), url, and org_id.

Then configure the provider to use this secret:

yaml
apiVersion: grafana.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: grafana-config
  annotations:
    argocd.argoproj.io/sync-wave: "5"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: grafana-credentials
      key: credentials

All managed resources reference this ProviderConfig to know how to connect to Grafana.

Managing Grafana Resources

With the provider configured, I can create Grafana resources as Kubernetes manifests.

Folder for organising alerts:

yaml
apiVersion: oss.grafana.crossplane.io/v1alpha1
kind: Folder
metadata:
  name: homelab-alerts-folder
  annotations:
    argocd.argoproj.io/sync-wave: "5"
spec:
  forProvider:
    title: "Homelab Alerts"
    uid: "homelab-alerts"
    preventDestroyIfNotEmpty: true
  providerConfigRef:
    name: grafana-config

Contact point for Discord notifications:

yaml
apiVersion: alerting.grafana.crossplane.io/v1alpha1
kind: ContactPoint
metadata:
  name: discord-homelab-alerts
  annotations:
    argocd.argoproj.io/sync-wave: "10"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  forProvider:
    name: discord-homelab-alerts

    discord:
      - title: "Homelab Alert"
        message: |
          HOMELAB ALERT

          Alert: {{ .GroupLabels.alertname }}
          Status: {{ .Status }}
          Severity: {{ .GroupLabels.severity }}

          {{ range .Alerts }}
          Summary: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

        urlSecretRef:
          name: discord-webhook-secret
          namespace: monitoring
          key: DISCORD_WEBHOOK_URL

        disableResolveMessage: false

  providerConfigRef:
    name: grafana-config
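
The contact point reads the webhook URL from a Secret called discord-webhook-secret in the monitoring namespace. One way to create that Secret is another ExternalSecret following the same pattern as the Grafana credentials. This is a sketch under assumptions: the remote key path /lgtm/DISCORD_WEBHOOK_URL is hypothetical, so substitute wherever the webhook actually lives in your secret store:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: monitoring
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-cluster-secretstore
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret        # Secret name the ContactPoint's urlSecretRef points at
    creationPolicy: Owner
  data:
    - secretKey: DISCORD_WEBHOOK_URL    # key the ContactPoint's urlSecretRef points at
      remoteRef:
        key: "/lgtm/DISCORD_WEBHOOK_URL"  # hypothetical path
```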

Notification policy for routing:

yaml
apiVersion: alerting.grafana.crossplane.io/v1alpha1
kind: NotificationPolicy
metadata:
  name: discord-routing-policy
  annotations:
    argocd.argoproj.io/sync-wave: "11"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  forProvider:
    contactPoint: discord-homelab-alerts
    groupBy: ["grafana_folder", "alertname"]
    groupWait: "10s"
    groupInterval: "5m"
    repeatInterval: "12h"

    policy:
      - contactPoint: discord-homelab-alerts
        matcher:
          - label: grafana_folder
            match: "="
            value: "Homelab Alerts"
        groupWait: "5s"
        repeatInterval: "30m"

  providerConfigRef:
    name: grafana-config

Alert rules:

yaml
apiVersion: alerting.grafana.crossplane.io/v1alpha1
kind: RuleGroup
metadata:
  name: ztunnel-alerts
  annotations:
    argocd.argoproj.io/sync-wave: "12"
spec:
  forProvider:
    name: ztunnel-alerts
    intervalSeconds: 60
    folderUid: "homelab-alerts"

    rule:
      - name: ZtunnelCertificateExpired
        for: "2m"
        condition: B
        noDataState: OK
        execErrState: Alerting
        data:
          - refId: A
            datasourceUid: loki
            model: |
              {
                "expr": "sum(count_over_time({namespace=\"istio-system\", app=\"ztunnel\"} |~ \"certificate.*[Ee]xpired\" [5m]))",
                "refId": "A"
              }
          - refId: B
            datasourceUid: __expr__
            model: |
              {
                "conditions": [
                  {
                    "evaluator": { "params": [5], "type": "gt" },
                    "query": { "params": ["A"] },
                    "reducer": { "type": "last" },
                    "type": "query"
                  }
                ],
                "type": "classic_conditions"
              }
        annotations:
          summary: "Istio ztunnel workload certificates have expired"
          description: "Ztunnel is reporting certificate expiry errors. Run: kubectl rollout restart daemonset/ztunnel -n istio-system"
        labels:
          severity: critical
          team: homelab

  providerConfigRef:
    name: grafana-config

All of this lives in Git. Push a change, ArgoCD syncs it, Crossplane reconciles Grafana. Alert rules as code.

The Issues I Hit

Getting this working wasn't smooth. Three issues in particular:

Issue 1: ExternalSecret API Version

The first manifests used external-secrets.io/v1beta1, but my cluster had a newer version of External Secrets Operator that required v1:

yaml
# Wrong
apiVersion: external-secrets.io/v1beta1

# Right
apiVersion: external-secrets.io/v1

The symptom was the ExternalSecret staying in a pending state, with ArgoCD showing it as out of sync. Check your ESO version and use the matching API version.

Issue 2: org_id Type Mismatch

The Grafana provider expects org_id as a string in the credentials JSON, not an integer:

json
// Wrong - causes provider authentication failures
{
  "auth": "admin:password",
  "url": "http://grafana:3000",
  "org_id": 1
}

// Right
{
  "auth": "admin:password",
  "url": "http://grafana:3000",
  "org_id": "1"
}

This one took time to track down. The provider would fail to authenticate with no obvious error message. The fix was changing "org_id": 1 to "org_id": "1" in the ExternalSecret template.

Issue 3: Sync Wave Timing

The ProviderConfig depends on two things: the Provider being installed and healthy (so its CRDs exist) and the credentials Secret existing. Without proper ordering, ArgoCD would try to create the ProviderConfig before the provider was installed, failing dry-run validation.

The solution was sync waves (see GitOps All The Things for more on sync wave ordering):

Wave 0:  External Secrets (credentials)
Wave 1:  Crossplane Core
Wave 2:  Provider installation
Wave 5:  ProviderConfig, Folders
Wave 10: Contact Points
Wave 11: Notification Policies
Wave 12: Rule Groups

On top of that, resources whose kinds come from provider-installed CRDs carry the SkipDryRunOnMissingResource=true annotation. Without it, ArgoCD's dry-run fails because it can't validate resources for CRDs that don't exist yet.

yaml
annotations:
  argocd.argoproj.io/sync-wave: "5"
  argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
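
Sync waves order resources, but a provider can still take a while to become healthy after its wave is applied. A complementary option is to set the dry-run option once at the Application level and let ArgoCD retry failed syncs with backoff. A sketch, assuming a child Application that holds the Grafana resources; the Application name, repoURL, and path are hypothetical, and the retry numbers are arbitrary starting points:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: crossplane-grafana-config   # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/homelab/gitops.git   # hypothetical
    path: crossplane/grafana                               # hypothetical
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: crossplane-system
  syncPolicy:
    syncOptions:
      # Same effect as the per-resource annotation, applied to the whole app
      - SkipDryRunOnMissingResource=true
    retry:
      limit: 5            # give up after five failed attempts
      backoff:
        duration: 10s     # wait before the first retry
        factor: 2         # double the wait on each retry
        maxDuration: 3m   # cap on the wait between retries
```

The per-resource annotation is more targeted; the app-level option is less to remember when adding new resources.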

The Provider → ProviderConfig → Resource Pattern

Understanding Crossplane's resource hierarchy helps avoid issues:

Provider (pkg.crossplane.io/v1)
└── Installs provider pod + CRDs
    └── Creates new API types (grafana.crossplane.io/*)

ProviderConfig (grafana.crossplane.io/v1beta1)
└── Credentials + connection settings
    └── One per external system instance

Managed Resource (e.g., Folder, ContactPoint)
└── References ProviderConfig
    └── Controller manages external resource

The full ecosystem, from the Upbound Marketplace through to your external systems:

[Figure: Crossplane ecosystem diagram]

Common mistakes:

  • Creating ProviderConfig before Provider is ready
  • Missing providerConfigRef on managed resources
  • Wrong namespace for secrets (providers run in crossplane-system)

What Crossplane Gives You

Once working, the benefits are real:

GitOps for infrastructure: Alert rules are in Git. Review them in merge requests. Roll back with git revert. Same workflow as application code.

Continuous reconciliation: Someone manually changes an alert rule in Grafana? Crossplane reverts it on the next reconcile. Desired state wins.

Kubernetes-native access control: Who can create alert rules? Configure it with RBAC. Same tools as pod security.
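
To make the access control point concrete, here is a sketch of a ClusterRole that lets one group manage alert rule groups and nothing else. The managed resources in this post carry no namespace, so I'm assuming they are cluster-scoped, hence ClusterRole rather than Role. The role name, the group name, and the resource plural rulegroups are assumptions; kubectl api-resources --api-group=alerting.grafana.crossplane.io shows the actual names on your cluster:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-alert-editor          # hypothetical
rules:
  - apiGroups: ["alerting.grafana.crossplane.io"]
    resources: ["rulegroups"]         # assumed plural for kind RuleGroup
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-alert-editor-binding  # hypothetical
subjects:
  - kind: Group
    name: platform-team               # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: grafana-alert-editor
  apiGroup: rbac.authorization.k8s.io
```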

Dependency management: Resources can reference secrets, which can come from External Secrets, which pull from Infisical. The Kubernetes ecosystem composes naturally.

What I'd Change

More providers: The Grafana provider is useful, but Crossplane really shines with cloud providers. AWS, GCP, Azure providers let you manage databases, queues, storage - all as Kubernetes resources. For a homelab with cloud resources, this would be more valuable.

Compositions: Crossplane supports Composite Resource Definitions (XRDs), which define a custom API, and Compositions, which map that API onto multiple managed resources. I'm not using this yet, but it's how you'd build platform abstractions like "give me a database" that creates the instance, user, password secret, and network configuration together.
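
To give a flavour of the "give me a database" idea, here is a sketch of an XRD that defines a hypothetical Database API. It assumes the apiextensions.crossplane.io/v2 API that ships with Crossplane 2.x; the group platform.example.org, the kind, and the schema fields are all made up for illustration. A matching Composition (not shown) would map this API onto the actual managed resources:

```yaml
apiVersion: apiextensions.crossplane.io/v2
kind: CompositeResourceDefinition
metadata:
  # Must be <plural>.<group>
  name: databases.platform.example.org
spec:
  scope: Namespaced
  group: platform.example.org        # hypothetical group
  names:
    kind: Database                   # hypothetical kind
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:              # hypothetical field
                  type: string
                  enum: ["postgres", "mysql"]
                storageGB:           # hypothetical field
                  type: integer
              required:
                - engine
```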

Better error visibility: When something fails, debugging often means reading controller logs. More specific status conditions on resources would help.

Exploring Available Providers

The Grafana provider is just one of many. The Upbound Marketplace is the official registry for Crossplane packages, with providers for:

  • Cloud platforms: AWS, Azure, GCP, Oracle Cloud - manage EC2 instances, S3 buckets, Cloud SQL databases, all as Kubernetes resources
  • Kubernetes: Manage resources in remote clusters
  • Databases: Direct providers for PostgreSQL, MySQL, MongoDB
  • Observability: Grafana, Datadog, New Relic
  • DNS & Networking: Cloudflare, Route53
  • Git & CI/CD: GitHub, GitLab
  • And many more: Terraform (yes, you can run Terraform from Crossplane), Helm, Vault

Providers are published at xpkg.upbound.io and can be installed by referencing the package in a Provider resource. The marketplace shows documentation, supported resources, and version history for each provider.

For homelabs, interesting options beyond Grafana include the Cloudflare provider (manage DNS records as code), the Kubernetes provider (multi-cluster management), and the Terraform provider (for services without native Crossplane support).


This is Part 1 of the Platform Engineering series. See also the Homelab Kubernetes Series and Observability Series for the foundational setup this builds upon.
