Running a full observability stack usually means deploying Prometheus, Loki, Tempo, and Grafana separately. That's four Helm charts, four sets of configuration, and four things that can break independently. For a homelab, that felt like overkill.
Then I discovered Grafana's otel-lgtm image - an all-in-one container that bundles everything together. This is Part 1 of my observability series, covering the foundation of my monitoring setup.
What is LGTM?
LGTM stands for Loki, Grafana, Tempo, Mimir. It's Grafana's complete observability stack:
- Loki: Log aggregation (like Elasticsearch, but simpler)
- Grafana: Dashboards and visualisation
- Tempo: Distributed tracing backend
- Mimir: Long-term metrics storage (Prometheus-compatible)
The grafana/otel-lgtm image packages all of these (plus Pyroscope for profiling) into a single container. One deployment, one service, complete observability.
Why All-in-One?
For a homelab, the benefits are significant:
Resource efficiency: Instead of 4+ deployments with their own memory footprints, you get one container with shared resources. My LGTM stack runs happily with 512Mi-2Gi of memory.
Simpler configuration: One place to configure everything. No worrying about Prometheus scrape configs pointing to the right Loki endpoint.
Easier debugging: When something breaks, there's only one thing to look at.
Built-in OTLP support: The container includes an OpenTelemetry collector, so everything speaks the same protocol.
The trade-off is obvious: it's not production-grade. No high availability, no horizontal scaling. But for a homelab? Perfect.
The Deployment
Here's how I deploy the LGTM stack:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lgtm-simple
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lgtm-simple
  template:
    metadata:
      labels:
        app: lgtm-simple
    spec:
      containers:
        - name: lgtm
          image: grafana/otel-lgtm:latest  # consider pinning a specific tag
          ports:
            - containerPort: 3000 # Grafana
            - containerPort: 9090 # Prometheus
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          resources:
            requests:
              memory: "512Mi"
              cpu: "200m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          volumeMounts:
            - name: grafana-data
              mountPath: /data/grafana
            - name: loki-data
              mountPath: /data/loki
            - name: prometheus-data
              mountPath: /data/prometheus
            - name: tempo-data
              mountPath: /data/tempo
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-data  # claim names illustrative; match your PVCs
        - name: loki-data
          persistentVolumeClaim:
            claimName: loki-data
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: prometheus-data
        - name: tempo-data
          persistentVolumeClaim:
            claimName: tempo-data
```

Each component gets its own PersistentVolume for data retention:
| Component | Size | Purpose |
|---|---|---|
| Grafana | 1Gi | Dashboards, users, settings |
| Loki | 2Gi | Log storage |
| Prometheus | 2Gi | Metrics TSDB |
| Tempo | 1Gi | Trace storage |
| Pyroscope | 500Mi | Profiling data |
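Each claim is an ordinary PersistentVolumeClaim. A minimal sketch for the Loki volume (the claim name and access mode are assumptions; add a `storageClassName` if your cluster needs one):

```yaml
# Illustrative PVC for Loki's 2Gi volume; repeat per component with the sizes above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```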
Exposed Endpoints
The Service exposes multiple ports for different protocols:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: lgtm-simple
  namespace: monitoring
spec:
  selector:
    app: lgtm-simple
  ports:
    - name: grafana
      port: 3000
    - name: prometheus
      port: 9090
    - name: otlp-grpc
      port: 4317
    - name: otlp-http
      port: 4318
```

- Port 3000: Grafana UI - dashboards, alerting, exploration
- Port 9090: Prometheus-compatible API - for tools that query metrics
- Port 4317: OTLP gRPC - for high-throughput telemetry
- Port 4318: OTLP HTTP - for simpler integrations
Everything flows through OTLP. Metrics, logs, and traces all use the same protocol and endpoint.
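To see what "everything speaks OTLP" looks like on the wire, here is a sketch of a single log record in the OTLP/HTTP JSON encoding, built with only the Python standard library. The endpoint URL and service name are assumptions; inside the cluster you would target the `lgtm-simple` Service on port 4318:

```python
import json
import time
import urllib.request

# Minimal OTLP/HTTP JSON payload carrying one log record.
payload = {
    "resourceLogs": [{
        "resource": {"attributes": [
            # service.name is the standard resource attribute Loki/Grafana key off
            {"key": "service.name", "value": {"stringValue": "demo-app"}}
        ]},
        "scopeLogs": [{
            "logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityText": "INFO",
                "body": {"stringValue": "hello from the homelab"},
            }]
        }]
    }]
}

body = json.dumps(payload).encode()
req = urllib.request.Request(
    "http://lgtm-simple.monitoring:4318/v1/logs",  # assumed in-cluster DNS name
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when the endpoint is reachable
print(body.decode()[:80])
```

Metrics and traces use the same shape against `/v1/metrics` and `/v1/traces`, which is why a single collector endpoint covers all three signals.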
External Access via Gateway API
Grafana and Prometheus get external access through HTTPRoutes:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana-route
  namespace: monitoring
spec:
  parentRefs:
    - name: main-gateway
      namespace: istio-ingress
  hostnames:
    - "grafana.homelab.example.com"
  rules:
    - backendRefs:
        - name: lgtm-simple
          port: 3000
```

Same pattern for Prometheus at prometheus.homelab.example.com. TLS termination happens at the gateway.
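For completeness, the Prometheus route would look like this (the route name is an assumption; the hostname and backend port follow the pattern above):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: prometheus-route  # name assumed
  namespace: monitoring
spec:
  parentRefs:
    - name: main-gateway
      namespace: istio-ingress
  hostnames:
    - "prometheus.homelab.example.com"
  rules:
    - backendRefs:
        - name: lgtm-simple
          port: 9090
```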
Grafana Configuration
The admin password comes from Infisical via External Secrets:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: lgtm-grafana-admin
  namespace: monitoring
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: infisical-store
    kind: ClusterSecretStore
  target:
    name: lgtm-grafana-admin
  data:
    - secretKey: GF_SECURITY_ADMIN_PASSWORD
      remoteRef:
        key: /lgtm/GF_SECURITY_ADMIN_PASSWORD
```

Key environment variables:
```yaml
env:
  - name: GF_SECURITY_ADMIN_USER
    value: "admin"
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: lgtm-grafana-admin
        key: GF_SECURITY_ADMIN_PASSWORD
  - name: GF_SERVER_ROOT_URL
    value: "https://grafana.homelab.example.com"
  - name: GF_AUTH_ANONYMOUS_ENABLED
    value: "false"
  # Use Loki for alert state history
  - name: GF_UNIFIED_ALERTING_STATE_HISTORY_ENABLED
    value: "true"
  - name: GF_UNIFIED_ALERTING_STATE_HISTORY_BACKEND
    value: "loki"
```

That last bit is a nice touch: Grafana stores its alert state history in Loki, so you can query past alert transitions like any other log stream.
The Data Flow
Everything speaks OTLP. The LGTM container's built-in collector routes:
- Metrics → Prometheus/Mimir
- Logs → Loki
- Traces → Tempo
- Profiles → Pyroscope
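On the application side, pointing a workload at this pipeline needs nothing more than the standard OpenTelemetry exporter environment variables. A hedged sketch, assuming an app instrumented with an OTel SDK and the in-cluster Service name from above:

```yaml
env:
  # Standard OTel SDK variables; the endpoint assumes the lgtm-simple Service
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://lgtm-simple.monitoring.svc:4318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: OTEL_SERVICE_NAME
    value: "my-app"  # illustrative service name
```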
What You Get Out of the Box
With this single deployment, you immediately have:
Log exploration: Query logs with LogQL

```logql
{namespace="kafka"} |= "error"
```

Metrics queries: Standard PromQL

```promql
rate(http_requests_total[5m])
```

Trace exploration: Search spans by service, duration, error status

Correlated data: Click from a log line to see related traces, or from a trace to see metrics at that time
Resource Considerations
The otel-lgtm image is designed for development and testing. My production-ish homelab settings:
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "200m"
  limits:
    memory: "2Gi"
    cpu: "1000m"
```

This handles:
- ~50 pods worth of metrics
- Moderate log volume
- 100% trace sampling (more on that in Part 3)
If you're seeing OOM kills or slow queries, the first lever to pull is raising the memory limit. The second is shortening data retention periods.
When NOT to Use This
The all-in-one approach breaks down when:
- You need high availability
- You have multiple clusters sending telemetry
- Your data volume exceeds what a single container can handle
- You need separate scaling for metrics vs logs vs traces
At that point, deploy the components separately. But for a homelab? The simplicity wins.
What's Next
The LGTM stack is the backend. But where does the data come from? In Part 2, I'll cover the metrics collection layer - Grafana's k8s-monitoring chart, service exporters for Kafka and PostgreSQL, and the blackbox exporter for external monitoring.
This is Part 1 of a 4-part series on homelab observability.