The LGTM All-in-One Stack - Unified Observability for Homelabs

Running a full observability stack usually means deploying Prometheus, Loki, Tempo, and Grafana separately. That's four Helm charts, four sets of configuration, and four things that can break independently. For a homelab, that felt like overkill.

Then I discovered Grafana's otel-lgtm image - an all-in-one container that bundles everything together. This is Part 1 of my observability series, covering the foundation of my monitoring setup.

What is LGTM?

LGTM stands for Loki, Grafana, Tempo, Mimir. It's Grafana's complete observability stack:

  • Loki: Log aggregation (like Elasticsearch, but simpler)
  • Grafana: Dashboards and visualisation
  • Tempo: Distributed tracing backend
  • Mimir: Long-term metrics storage (Prometheus-compatible)

The grafana/otel-lgtm image packages all of these (plus Pyroscope for profiling) into a single container. One deployment, one service, complete observability.
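The quickest way to kick the tyres is outside Kubernetes entirely. Per Grafana's own docs for the image, a single docker run is enough:

```shell
# Grafana UI on 3000, OTLP gRPC on 4317, OTLP HTTP on 4318
docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 grafana/otel-lgtm
```

Grafana comes up on localhost:3000 once the container finishes starting its components.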

Why All-in-One?

For a homelab, the benefits are significant:

Resource efficiency: Instead of 4+ deployments with their own memory footprints, you get one container with shared resources. My LGTM stack runs happily with 512Mi-2Gi of memory.

Simpler configuration: One place to configure everything. No worrying about Prometheus scrape configs pointing to the right Loki endpoint.

Easier debugging: When something breaks, there's only one thing to look at.

Built-in OTLP support: The container includes an OpenTelemetry collector, so everything speaks the same protocol.

The trade-off is obvious: it's not production-grade. No high availability, no horizontal scaling. But for a homelab? Perfect.

[Diagram: the all-in-one LGTM stack]

The Deployment

Here's how I deploy the LGTM stack:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lgtm-simple
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lgtm-simple
  template:
    metadata:
      labels:
        app: lgtm-simple
    spec:
      containers:
        - name: lgtm
          image: grafana/otel-lgtm:latest
          ports:
            - containerPort: 3000   # Grafana
            - containerPort: 9090   # Prometheus
            - containerPort: 4317   # OTLP gRPC
            - containerPort: 4318   # OTLP HTTP
          resources:
            requests:
              memory: "512Mi"
              cpu: "200m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          volumeMounts:
            - name: grafana-data
              mountPath: /data/grafana
            - name: loki-data
              mountPath: /data/loki
            - name: prometheus-data
              mountPath: /data/prometheus
            - name: tempo-data
              mountPath: /data/tempo
      volumes:
        # PVC names assumed to match the volume names above
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-data
        - name: loki-data
          persistentVolumeClaim:
            claimName: loki-data
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: prometheus-data
        - name: tempo-data
          persistentVolumeClaim:
            claimName: tempo-data

Each component gets its own PersistentVolume for data retention:

Component  | Size  | Purpose
-----------|-------|----------------------------
Grafana    | 1Gi   | Dashboards, users, settings
Loki       | 2Gi   | Log storage
Prometheus | 2Gi   | Metrics TSDB
Tempo      | 1Gi   | Trace storage
Pyroscope  | 500Mi | Profiling data

Exposed Endpoints

The Service exposes multiple ports for different protocols:

yaml
apiVersion: v1
kind: Service
metadata:
  name: lgtm-simple
  namespace: monitoring
spec:
  selector:
    app: lgtm-simple
  ports:
    - name: grafana
      port: 3000
    - name: prometheus
      port: 9090
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318

  • Port 3000: Grafana UI - dashboards, alerting, exploration
  • Port 9090: Prometheus-compatible API - for tools that query metrics
  • Port 4317: OTLP gRPC - for high-throughput telemetry
  • Port 4318: OTLP HTTP - for simpler integrations
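For a quick look before any external access is wired up, port-forwarding against this Service works (names taken from the manifest above):

```shell
# Forward Grafana and both OTLP ports to localhost
kubectl -n monitoring port-forward svc/lgtm-simple 3000:3000 4317:4317 4318:4318
```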

Everything flows through OTLP. Metrics, logs, and traces all use the same protocol and endpoint.
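To make that concrete, here's a rough sketch of what a single log record looks like on the wire using the OTLP/HTTP JSON encoding. The service name, log body, and in-cluster endpoint are my assumptions; the payload shape follows the OTLP specification:

```python
import json
import time
import urllib.request

# Minimal OTLP/HTTP JSON payload for one log record
# (simplified; see the OTLP spec for the full schema).
payload = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "demo-app"}}
        ]},
        "scopeLogs": [{
            "logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityText": "INFO",
                "body": {"stringValue": "hello from the homelab"},
            }]
        }],
    }]
}

# Endpoint assumes the in-cluster Service above; swap in
# localhost:4318 when port-forwarding.
req = urllib.request.Request(
    "http://lgtm-simple.monitoring:4318/v1/logs",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when the endpoint is reachable
```

Metrics and traces use the same pattern against /v1/metrics and /v1/traces.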

External Access via Gateway API

Grafana and Prometheus get external access through HTTPRoutes:

yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana-route
  namespace: monitoring
spec:
  parentRefs:
    - name: main-gateway
      namespace: istio-ingress
  hostnames:
    - "grafana.homelab.example.com"
  rules:
    - backendRefs:
        - name: lgtm-simple
          port: 3000

Same pattern for Prometheus at prometheus.homelab.example.com. TLS termination happens at the gateway.

Grafana Configuration

The admin password comes from Infisical via External Secrets:

yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: lgtm-grafana-admin
  namespace: monitoring
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-store
    kind: ClusterSecretStore
  target:
    name: lgtm-grafana-admin
  data:
    - secretKey: GF_SECURITY_ADMIN_PASSWORD
      remoteRef:
        key: /lgtm/GF_SECURITY_ADMIN_PASSWORD

Key environment variables:

yaml
env:
  - name: GF_SECURITY_ADMIN_USER
    value: "admin"
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: lgtm-grafana-admin
        key: GF_SECURITY_ADMIN_PASSWORD
  - name: GF_SERVER_ROOT_URL
    value: "https://grafana.homelab.example.com"
  - name: GF_AUTH_ANONYMOUS_ENABLED
    value: "false"
  # Use Loki for alert state history
  - name: GF_UNIFIED_ALERTING_STATE_HISTORY_ENABLED
    value: "true"
  - name: GF_UNIFIED_ALERTING_STATE_HISTORY_BACKEND
    value: "loki"

That last bit is nice - Grafana stores alert state history in Loki, so you can query alert history like any other logs.

The Data Flow

Everything speaks OTLP. The LGTM container's built-in collector routes:

  • Metrics → Prometheus/Mimir
  • Logs → Loki
  • Traces → Tempo
  • Profiles → Pyroscope

What You Get Out of the Box

With this single deployment, you immediately have:

Log exploration: Query logs with LogQL

{namespace="kafka"} |= "error"

Metrics queries: Standard PromQL

rate(http_requests_total[5m])

Trace exploration: Search spans by service, duration, error status

Correlated data: Click from a log line to see related traces, or from a trace to see metrics at that time
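The logs-to-traces jump is driven by Loki derived fields. The otel-lgtm image ships its own datasource wiring, but as a hedged sketch of what that provisioning looks like (the regex, datasource UID, and URL are assumptions to adapt to your log format):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://localhost:3100
    jsonData:
      derivedFields:
        # Assumed log line format: "... trace_id=abc123 ..."
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'
          url: '$${__value.raw}'   # $$ escapes env expansion in provisioning files
          datasourceUid: tempo     # assumes the Tempo datasource has uid "tempo"
```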

Resource Considerations

The otel-lgtm image is designed for development and testing. My production-ish homelab settings:

yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "200m"
  limits:
    memory: "2Gi"
    cpu: "1000m"

This handles:

  • ~50 pods worth of metrics
  • Moderate log volume
  • 100% trace sampling (more on that in Part 3)

If you're seeing OOM kills or slow queries, the first lever to pull is memory limits. The second is reducing data retention periods.
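A couple of commands I'd reach for first when diagnosing this (namespace and label taken from the manifests above; `top` needs metrics-server):

```shell
# Live usage vs requests/limits
kubectl -n monitoring top pod -l app=lgtm-simple

# Was the last restart an OOMKill?
kubectl -n monitoring get pod -l app=lgtm-simple \
  -o jsonpath='{.items[0].status.containerStatuses[0].lastState}'
```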

When NOT to Use This

The all-in-one approach breaks down when:

  • You need high availability
  • You have multiple clusters sending telemetry
  • Your data volume exceeds what a single container can handle
  • You need separate scaling for metrics vs logs vs traces

At that point, deploy the components separately. But for a homelab? The simplicity wins.

What's Next

The LGTM stack is the backend. But where does the data come from? In Part 2, I'll cover the metrics collection layer - Grafana's k8s-monitoring chart, service exporters for Kafka and PostgreSQL, and the blackbox exporter for external monitoring.


This is Part 1 of a 4-part series on homelab observability.
