Kubernetes Network Observability: Monitoring Container Traffic
Kubernetes networking is complex and dynamic. Learn how to gain visibility into pod-to-pod traffic, service mesh metrics, and network policies.
Why Kubernetes Networking Is Different
Traditional network monitoring tracks physical interfaces and static IP addresses. Kubernetes throws this out the window: pods are ephemeral, IPs change constantly, and traffic flows through virtual overlays. The network fabric itself is software-defined and abstracted.
The Challenge
A pod might exist for 30 seconds. By the time you investigate an alert, the pod is gone, its IP reassigned, and logs scattered across nodes. You need observability built into the platform, not bolted on.
Kubernetes Network Architecture
Understanding what to monitor starts with understanding the layers:
Pod Network
Every pod gets a unique IP. Pods can communicate directly without NAT. Implemented by CNI plugins like Calico, Cilium, or Flannel.
Service Network
ClusterIP services provide stable endpoints. kube-proxy or eBPF handles load balancing. Service IPs exist only in iptables/IPVS rules.
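A minimal ClusterIP Service manifest makes the stable-endpoint idea concrete (the names and ports here are illustrative, not from any real cluster):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical service name
spec:
  type: ClusterIP       # virtual IP; exists only in iptables/IPVS rules
  selector:
    app: web            # routes to pods carrying this label
  ports:
    - port: 80          # stable port clients connect to
      targetPort: 8080  # container port on the backing pods
```

The service IP never changes even as the pods behind the selector churn, which is exactly why monitoring at the service level survives pod turnover.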
Ingress Layer
External traffic enters through Ingress controllers or LoadBalancer services. This is where external monitoring typically starts.
Network Policies
Kubernetes-native firewalling. Defines allowed traffic between pods. Enforcement depends on CNI plugin capabilities.
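As a sketch, a namespace-wide default-deny ingress policy looks like this (the namespace name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod       # hypothetical namespace
spec:
  podSelector: {}       # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress           # no ingress rules listed, so all inbound traffic is denied
```

Note that this is only enforced if the CNI plugin implements NetworkPolicy; with plain Flannel, the object is accepted by the API server but has no effect.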
Key Metrics to Monitor
| Metric | Source | What It Reveals |
|---|---|---|
| container_network_receive_bytes_total | cAdvisor | Pod ingress traffic volume |
| container_network_transmit_bytes_total | cAdvisor | Pod egress traffic volume |
| kube_pod_status_phase | kube-state-metrics | Pod lifecycle state |
| node_network_receive_drop_total | node_exporter | Node-level packet drops |
| nginx_ingress_controller_requests | Ingress controller | External request rates |
```
# PromQL: Network bytes by pod (top 10)
topk(10, sum by (pod) (
  rate(container_network_receive_bytes_total[5m])
))

# PromQL: Pods with network errors
sum by (pod) (
  rate(container_network_receive_errors_total[5m])
) > 0
```
CNI Plugin Observability
Your CNI choice determines available network visibility:
| CNI | Observability Features |
|---|---|
| Cilium | eBPF-based flow visibility, Hubble UI, L7 policy metrics |
| Calico | Flow logs, network policy metrics, Felix stats |
| Weave | Connection tracking, Prometheus metrics endpoint |
| Flannel | Basic - relies on node-level monitoring |
Recommendation: If network observability is a priority, choose Cilium. Its eBPF-based Hubble layer provides the deepest flow-level visibility into Kubernetes networking, with low overhead because flows are observed in the kernel rather than proxied.
Service Mesh Metrics
If you're running Istio, Linkerd, or similar service mesh:
Request Metrics
Request count, latency histograms, error rates per service pair. The mesh sidecar captures every request without application changes.
TCP Metrics
Bytes sent/received, connection duration, active connections. Useful for non-HTTP traffic.
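Assuming Istio's standard telemetry metrics are enabled, TCP throughput per workload pair can be queried much like the HTTP metrics — a hedged sketch:

```
# Istio: TCP bytes sent per source/destination workload pair
sum by (source_workload, destination_workload) (
  rate(istio_tcp_sent_bytes_total[5m])
)
```

Pairing this with istio_tcp_received_bytes_total gives both directions of non-HTTP traffic between services.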
mTLS Status
Track which connections are encrypted. Identify services that haven't enrolled in the mesh or policy violations.
```
# Istio: Request rate by source and destination
sum by (source_workload, destination_workload) (
  rate(istio_requests_total[5m])
)

# Istio: P99 latency
histogram_quantile(0.99,
  sum by (destination_service, le) (
    rate(istio_request_duration_milliseconds_bucket[5m])
  )
)
```
Network Policy Monitoring
Network policies are only useful if you know they're working:
- Policy hit counts: How often each policy allows or denies traffic. A policy that never matches may be misconfigured.
- Denied connections: Alert on unexpected denials. They may indicate legitimate traffic being blocked, or attack attempts.
- Policy coverage: What percentage of pods are covered by at least one policy? Default-deny should be the goal.
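How you measure these depends on the CNI. With Cilium, for example, packet drops (including policy denials) are exported as the cilium_drop_count_total metric with a reason label — a sketch, assuming that metric is scraped:

```
# Cilium: drop rate broken down by reason
# (policy denials appear under reason="Policy denied")
sum by (reason) (
  rate(cilium_drop_count_total[5m])
)
```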
Troubleshooting Network Issues
Common Kubernetes networking problems and how to diagnose them:
| Symptom | Possible Cause | How to Check |
|---|---|---|
| Pods can't reach each other | CNI misconfiguration, network policy | kubectl exec + ping/curl, policy audit |
| Service DNS not resolving | CoreDNS issues, service selector | nslookup from pod, check endpoints |
| Intermittent timeouts | Node network saturation, conntrack limits | Node metrics, conntrack stats |
| External traffic failing | Ingress config, load balancer health | Ingress controller logs, LB status |
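For the conntrack-limits case, node_exporter exposes connection-tracking table usage, so you can alert before the table fills and connections start being silently dropped:

```
# Conntrack table utilization per node (alert when this approaches 1.0)
node_nf_conntrack_entries
  / node_nf_conntrack_entries_limit
```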
Recommended Observability Stack
Prometheus + Grafana
Standard for Kubernetes metrics. Scrape cAdvisor, kube-state-metrics, and CNI exporters. Pre-built dashboards available.
Hubble (with Cilium)
Deep flow visibility. See every connection between pods with source, destination, protocol, and verdict. UI and CLI available.
Kiali (with Istio)
Service mesh visualization. Shows traffic flow between services, health status, and configuration validation.