Time-Series Data in Network Monitoring: Storage, Retention, and Analysis

Network monitoring generates massive amounts of time-stamped data. Learn how to store, manage, and extract insights from your metrics effectively.

What Is Time-Series Data?

Time-series data is a sequence of data points indexed by time. In network monitoring, this includes bandwidth measurements, latency samples, error counts, CPU utilization, and any other metric collected at regular intervals over time.

# Example time-series data point
{
  "metric": "interface.bytes_in",
  "timestamp": "2026-01-14T10:30:00Z",
  "value": 1547823456,
  "tags": {
    "device": "router-core-01",
    "interface": "GigabitEthernet0/1"
  }
}

Unlike traditional databases optimized for transactions, time-series databases are designed for high write throughput and efficient time-range queries.

Storage Requirements

Network monitoring can generate enormous data volumes. Understanding your storage needs is crucial for capacity planning:

Scale                    | Metrics/Second | Daily Storage
100 interfaces @ 60s     | ~2/sec         | ~50 MB
1,000 interfaces @ 60s   | ~17/sec        | ~500 MB
10,000 interfaces @ 60s  | ~170/sec       | ~5 GB
100,000 interfaces @ 60s | ~1,700/sec     | ~50 GB

These estimates assume basic metrics (bytes in/out, errors, discards). Additional metrics like per-interface counters, QoS queues, or sub-second polling multiply storage requirements.
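
A quick way to sanity-check these numbers, or adapt them to your own polling setup, is a back-of-envelope calculator. This is a minimal sketch; one data point per interface per poll and ~300 bytes per uncompressed point are assumptions chosen to roughly reproduce the table above, not measured values.

# Back-of-envelope storage estimator
def estimate_daily_storage(interfaces, poll_interval_s=60,
                           points_per_interface=1, bytes_per_point=300):
    points_per_second = interfaces * points_per_interface / poll_interval_s
    daily_bytes = points_per_second * 86400 * bytes_per_point
    return points_per_second, daily_bytes / 1e6  # (points/sec, MB/day)

rate, mb = estimate_daily_storage(1000)
print(f"{rate:.0f} points/sec, ~{mb:.0f} MB/day")  # ~17 points/sec, ~432 MB/day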

Retention Strategies

You can't keep full-resolution data forever. A tiered retention strategy balances detail with storage costs:

Raw Data (1-7 days)

Full resolution for recent data. Essential for troubleshooting current issues and detailed analysis.

5-Minute Averages (30-90 days)

Aggregated data for medium-term trends. Sufficient for weekly reports and capacity reviews.

Hourly Averages (1-2 years)

Long-term trend analysis. Useful for year-over-year comparisons and capacity planning.

Daily Summaries (5+ years)

Archive data for compliance and historical reference. Minimal storage footprint.

Tip: When downsampling, preserve min/max/avg/count instead of just averages. This lets you reconstruct peak utilization even from aggregated data.
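
A minimal downsampling sketch using pandas (an assumption; any aggregation pipeline works the same way), keeping min/max/mean/count rather than the mean alone:

# Downsample 60-second samples into 5-minute buckets
import pandas as pd

raw = pd.DataFrame(
    {"bytes_in": [100, 900, 150, 120]},  # one short traffic spike at 10:01
    index=pd.to_datetime([
        "2026-01-14T10:00:00Z", "2026-01-14T10:01:00Z",
        "2026-01-14T10:02:00Z", "2026-01-14T10:03:00Z",
    ]),
)

rollup = raw["bytes_in"].resample("5min").agg(["min", "max", "mean", "count"])
print(rollup)  # the 900 peak survives in "max"; a mean-only rollup would hide it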

Time-Series Database Options

Several databases are optimized for time-series workloads:

Database        | Strengths                       | Best For
InfluxDB        | Easy setup, good compression    | Small to medium deployments
Prometheus      | Pull model, great ecosystem     | Cloud-native environments
TimescaleDB     | SQL queries, PostgreSQL base    | Complex analytics
VictoriaMetrics | High performance, low resources | Large-scale deployments
ClickHouse      | Blazing fast queries            | Massive analytical workloads

Compression Techniques

Time-series data compresses extremely well due to its predictable patterns:

  • Delta encoding: Store differences between values instead of absolute values. Consecutive timestamps and slowly changing metrics compress to just a few bits.
  • Gorilla compression: Facebook's algorithm achieves 10-15x compression for typical metrics through XOR-based encoding.
  • Dictionary encoding: Repeated tag values (device names, interface names) are stored once and referenced by ID.

Modern time-series databases achieve 10-20x compression ratios, meaning 50 GB of raw data might require only 2.5-5 GB on disk.
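
To see why delta encoding works so well on counters, here is a minimal sketch (illustrative only; real engines bit-pack the deltas, and Gorilla goes further with deltas-of-deltas):

# Delta encoding: store the first value, then successive differences
def delta_encode(values):
    deltas = [values[0]]
    for prev, curr in zip(values, values[1:]):
        deltas.append(curr - prev)
    return deltas

def delta_decode(deltas):
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values

# A monotonically increasing byte counter: large absolutes, tiny deltas
counter = [1547823456, 1547824001, 1547824570, 1547825103]
print(delta_encode(counter))  # [1547823456, 545, 569, 533] - small numbers pack into few bits
assert delta_decode(delta_encode(counter)) == counter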

Querying Time-Series Data

Effective time-series queries require thinking in terms of time windows and aggregations:

# Common query patterns

# Average bandwidth over last hour
SELECT mean(bytes_in) FROM interface_metrics
WHERE time > now() - 1h
GROUP BY device, interface

# 95th percentile latency per day
SELECT percentile(latency, 95) FROM ping_metrics
WHERE time > now() - 30d
GROUP BY time(1d), target

# Detect anomalies (values more than 2 standard deviations above the mean)
# Most engines reject aggregates in WHERE, so compare against a subquery
SELECT * FROM cpu_metrics
WHERE value > (SELECT mean(value) + 2 * stddev(value) FROM cpu_metrics)

Pre-compute common aggregations to speed up dashboard queries. Calculating a 30-day trend on every page load is expensive; doing it once per hour in the background is efficient.
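
A sketch of that hourly background job in Python. The query text mirrors the pseudo-SQL patterns above; the client class, target table, and scheduling are illustrative stand-ins, not a specific driver's API.

# Hourly background rollup: compute the expensive aggregate once, serve it many times
ROLLUP_QUERY = (
    "SELECT min(bytes_in), max(bytes_in), mean(bytes_in), count(bytes_in) "
    "INTO interface_metrics_hourly FROM interface_metrics "
    "WHERE time > now() - 1h GROUP BY time(1h), device, interface"
)

class PrintClient:
    # stand-in for a real database client; replace query() with your driver's call
    def query(self, q):
        print("executing:", q)

def refresh_rollups(client):
    client.query(ROLLUP_QUERY)

refresh_rollups(PrintClient())  # in production, schedule this hourly (cron or a sleep loop)

Dashboards then read from interface_metrics_hourly instead of scanning raw data on every page load.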

Best Practices

Use Consistent Timestamps

Always use UTC for storage. Apply timezone conversion only at display time. Align collection intervals to clock boundaries (start of minute, not 37 seconds past).
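
A minimal sketch of both rules using only the standard library:

# Store UTC, align to clock boundaries
from datetime import datetime, timezone

# Rule 1: record timestamps in UTC; convert to local time only for display
now_utc = datetime.now(timezone.utc)

# Rule 2: align collection timestamps to the start of the minute
def align_to_minute(ts: datetime) -> datetime:
    return ts.replace(second=0, microsecond=0)

print(align_to_minute(now_utc).isoformat())  # e.g. 2026-01-14T10:30:00+00:00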

Design Tags Carefully

Tags with high cardinality (unique IP addresses, session IDs) destroy performance. Use tags for grouping dimensions, not unique identifiers.
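
For example, using the JSON point format shown earlier (the specific tag names here are illustrative):

# Good: bounded dimensions you actually group by
good_tags = {
    "device": "router-core-01",          # hundreds of devices
    "interface": "GigabitEthernet0/1",   # dozens per device
}

# Bad: unbounded identifiers create one series per unique combination
bad_tags = {
    "src_ip": "203.0.113.47",   # millions of possible values
    "flow_id": "a91f3c0e",      # effectively unique per data point
}

In databases that distinguish tags from fields (InfluxDB, for example), unbounded values belong in fields, which are not indexed.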

Handle Gaps Gracefully

Missing data points happen. Decide whether to interpolate, use last known value, or leave gaps. Document your approach.
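
A sketch of the three options with pandas (an assumption; most query languages offer equivalents):

# Two missed polls in a five-minute window
import pandas as pd

idx = pd.date_range("2026-01-14T10:00:00Z", periods=5, freq="1min")
series = pd.Series([10.0, 12.0, None, None, 20.0], index=idx)

print(series.interpolate())  # option 1: linear interpolation across the gap
print(series.ffill())        # option 2: carry the last known value forward
print(series)                # option 3: leave NaN and let charts show the gap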

Monitor Your Monitoring

Track ingestion rate, query latency, and storage growth. Set alerts before you run out of disk space.
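
A minimal disk-runway check, assuming you track daily growth yourself; the path, growth rate, and two-week threshold are illustrative:

# Alert well before the metrics volume fills, not when it does
import shutil

def days_until_full(path, daily_growth_gb):
    free_gb = shutil.disk_usage(path).free / 1e9  # free space in GB
    return free_gb / daily_growth_gb

# "/" stands in for your metrics volume; 5 GB/day matches the 10,000-interface row above
if days_until_full("/", daily_growth_gb=5.0) < 14:
    print("WARNING: metrics volume projected full within two weeks")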