Time-Series Data in Network Monitoring: Storage, Retention, and Analysis
Network monitoring generates massive amounts of time-stamped data. Learn how to store, manage, and extract insights from your metrics effectively.
What Is Time-Series Data?
Time-series data is a sequence of data points indexed by time. In network monitoring, this includes bandwidth measurements, latency samples, error counts, CPU utilization, and any other metric collected at regular intervals over time.
```
# Example time-series data point
{
  "metric": "interface.bytes_in",
  "timestamp": "2026-01-14T10:30:00Z",
  "value": 1547823456,
  "tags": {
    "device": "router-core-01",
    "interface": "GigabitEthernet0/1"
  }
}
```
Unlike traditional databases optimized for transactions, time-series databases are designed for high write throughput and efficient time-range queries.
Storage Requirements
Network monitoring can generate enormous data volumes. Understanding your storage needs is crucial for capacity planning:
| Scale | Metrics/Second | Daily Storage |
|---|---|---|
| 100 interfaces @ 60s | ~2/sec | ~50 MB |
| 1,000 interfaces @ 60s | ~17/sec | ~500 MB |
| 10,000 interfaces @ 60s | ~170/sec | ~5 GB |
| 100,000 interfaces @ 60s | ~1,700/sec | ~50 GB |
These estimates assume basic metrics (bytes in/out, errors, discards). Additional per-interface counters, QoS queue statistics, or sub-second polling multiply storage requirements.
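To sanity-check these numbers against your own environment, a back-of-the-envelope calculation along the lines of the sketch below can help. The figures of one data point per interface per poll and roughly 300 bytes per serialized point are assumptions (about the size of the JSON example above), not properties of any particular database:

```python
# Back-of-the-envelope sizing for raw (uncompressed) ingest.
# Assumptions: one data point per interface per poll (carrying the basic
# counter fields), and ~300 bytes per serialized point including tags,
# roughly the size of the JSON example above. Adjust for your environment.

def daily_storage_gb(interfaces: int,
                     poll_interval_s: int = 60,
                     bytes_per_point: int = 300) -> float:
    points_per_second = interfaces / poll_interval_s
    return points_per_second * 86_400 * bytes_per_point / 1e9

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} interfaces @ 60s -> ~{daily_storage_gb(n):.2f} GB/day raw")
```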
Retention Strategies
You can't keep full-resolution data forever. A tiered retention strategy balances detail with storage costs:
Raw Data (1-7 days)
Full resolution for recent data. Essential for troubleshooting current issues and detailed analysis.
5-Minute Averages (30-90 days)
Aggregated data for medium-term trends. Sufficient for weekly reports and capacity reviews.
Hourly Averages (1-2 years)
Long-term trend analysis. Useful for year-over-year comparisons and capacity planning.
Daily Summaries (5+ years)
Archive data for compliance and historical reference. Minimal storage footprint.
Tip: When downsampling, preserve min/max/avg/count instead of just averages. This lets you reconstruct peak utilization even from aggregated data.
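As an illustration of that tip, here is a minimal downsampling sketch in plain Python that keeps min/max/avg/count per bucket. In practice you would use your database's built-in continuous queries or rollup tasks rather than application code:

```python
from collections import defaultdict

def downsample(points, bucket_seconds=300):
    """Aggregate (epoch_seconds, value) pairs into fixed-size buckets,
    preserving min, max, avg, and count rather than only the average."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)

    rollup = []
    for start in sorted(buckets):
        values = buckets[start]
        rollup.append({
            "bucket_start": start,
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        })
    return rollup
```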
Time-Series Database Options
Several databases are optimized for time-series workloads:
| Database | Strengths | Best For |
|---|---|---|
| InfluxDB | Easy setup, good compression | Small to medium deployments |
| Prometheus | Pull model, great ecosystem | Cloud-native environments |
| TimescaleDB | SQL queries, PostgreSQL base | Complex analytics |
| VictoriaMetrics | High performance, low resources | Large-scale deployments |
| ClickHouse | Blazing fast queries | Massive analytical workloads |
Compression Techniques
Time-series data compresses extremely well due to its predictable patterns:
- Delta encoding: Store differences between values instead of absolute values (sketched below). Consecutive timestamps and slowly changing metrics compress to just a few bits.
- Gorilla compression: Facebook's algorithm achieves 10-15x compression for typical metrics through XOR-based encoding.
- Dictionary encoding: Repeated tag values (device names, interface names) stored once and referenced by ID.
Modern time-series databases achieve 10-20x compression ratios, meaning 50GB of raw data might require only 3-5GB on disk.
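To see why delta encoding pays off on timestamps and counters, consider this simplified sketch. It only computes the deltas; real encoders such as Gorilla go further and pack the small residuals into a handful of bits:

```python
def delta_encode(values):
    """Store the first value, then differences between consecutive values.
    Regular timestamps and slowly changing counters yield tiny deltas."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(encoded):
    out, total = [], 0
    for i, d in enumerate(encoded):
        total = d if i == 0 else total + d
        out.append(total)
    return out

timestamps = [1736850600, 1736850660, 1736850720, 1736850780]
print(delta_encode(timestamps))   # [1736850600, 60, 60, 60]
assert delta_decode(delta_encode(timestamps)) == timestamps
```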
Querying Time-Series Data
Effective time-series queries require thinking in terms of time windows and aggregations:
```
# Common query patterns

# Average bandwidth over the last hour
SELECT mean(bytes_in) FROM interface_metrics
WHERE time > now() - 1h
GROUP BY device, interface

# 95th percentile latency per day
SELECT percentile(latency, 95) FROM ping_metrics
WHERE time > now() - 30d
GROUP BY time(1d), target

# Detect anomalies (values > 2 std dev)
SELECT * FROM cpu_metrics
WHERE value > mean(value) + 2 * stddev(value)
```
Pre-compute common aggregations to speed up dashboard queries. Calculating a 30-day trend on every page load is expensive; doing it once per hour in the background is efficient.
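A background rollup can be as simple as a scheduled task that aggregates the previous hour and writes the result to a summary series. The sketch below is hypothetical: read_raw and write_rollup stand in for whatever calls your database client provides.

```python
import time

def rollup_previous_hour(read_raw, write_rollup):
    """Hypothetical background task, run once per hour (cron, scheduler, etc.):
    aggregate the previous full hour of raw values into a rollup row that
    dashboards query instead of the raw series."""
    hour = 3600
    end = int(time.time()) // hour * hour      # top of the current hour
    start = end - hour                         # previous full hour
    values = read_raw(start, end)              # placeholder for a DB query
    if values:
        write_rollup({                         # placeholder for a DB write
            "bucket_start": start,
            "avg": sum(values) / len(values),
            "max": max(values),
            "count": len(values),
        })
```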
Best Practices
Use Consistent Timestamps
Always use UTC for storage. Apply timezone conversion only at display time. Align collection intervals to clock boundaries (start of minute, not 37 seconds past).
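For example, flooring the current time to the collection interval in UTC takes only the standard library; this is a small sketch, not tied to any particular collector:

```python
from datetime import datetime, timezone

def aligned_utc_timestamp(interval_seconds: int = 60) -> str:
    """Return the current time in UTC, floored to the collection interval,
    formatted as ISO 8601 for storage."""
    now = int(datetime.now(timezone.utc).timestamp())
    aligned = now - now % interval_seconds
    return datetime.fromtimestamp(aligned, tz=timezone.utc).isoformat()

print(aligned_utc_timestamp())  # e.g. 2026-01-14T10:30:00+00:00
```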
Design Tags Carefully
Tags with high cardinality (unique IP addresses, session IDs) destroy performance. Use tags for grouping dimensions, not unique identifiers.
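A quick way to gauge the impact of a proposed tag is to multiply the distinct values of each tag, since that product approximates the number of series the database must track. The tag names and counts below are hypothetical:

```python
from math import prod

# Hypothetical tag sets: each entry is (tag name, number of distinct values)
good_tags = [("device", 500), ("interface", 48), ("direction", 2)]
bad_tags = good_tags + [("client_ip", 100_000)]   # high-cardinality tag

print(prod(n for _, n in good_tags))  # 48,000 series: manageable
print(prod(n for _, n in bad_tags))   # 4,800,000,000 series: unworkable
```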
Handle Gaps Gracefully
Missing data points happen. Decide whether to interpolate, use last known value, or leave gaps. Document your approach.
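If you opt for last-known-value filling, a minimal sketch looks like the following. It is a reasonable choice for gauges but can be misleading for counters, and expected_interval is an assumption about your polling cadence:

```python
def fill_gaps_last_value(samples, expected_interval=60):
    """Given (epoch_seconds, value) pairs sorted by time, insert the last
    known value at any expected timestamp that is missing."""
    if not samples:
        return []
    filled = [samples[0]]
    for ts, value in samples[1:]:
        prev_ts, prev_value = filled[-1]
        # Repeat the previous value for every missing interval.
        while ts - prev_ts > expected_interval:
            prev_ts += expected_interval
            filled.append((prev_ts, prev_value))
        filled.append((ts, value))
    return filled

samples = [(0, 10), (60, 12), (300, 9)]          # 120, 180, 240 are missing
print(fill_gaps_last_value(samples))
# [(0, 10), (60, 12), (120, 12), (180, 12), (240, 12), (300, 9)]
```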
Monitor Your Monitoring
Track ingestion rate, query latency, and storage growth. Set alerts before you run out of disk space.