SNMP Polling Best Practices - NetGraph Pro Blog

Choosing the Right Polling Interval

The polling interval determines how often your monitoring system queries devices for metrics. Polling too frequently wastes resources and can degrade device performance; polling too infrequently means you miss short-lived issues.

Common Intervals

60 seconds - Standard for most interface metrics
300 seconds (5 min) - Suitable for system metrics like CPU, memory
30 seconds - Critical links where you need faster detection

For bandwidth monitoring, 60-second intervals work well: you get enough granularity to spot issues while keeping SNMP traffic reasonable. With 1,000 interfaces polled every 60 seconds, you generate about 17 SNMP requests per second (1,000 / 60, assuming one request per interface per cycle), which is manageable for most collectors.
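
The request-rate arithmetic is easy to rerun for your own inventory. This is a minimal sketch that assumes one request per interface per polling cycle; the requests_per_interface parameter is a hypothetical knob for cases where each interface needs several separate requests.

```python
# Sketch: average SNMP request rate for a given inventory and polling interval.
# Assumes one request per interface per cycle unless told otherwise.


def requests_per_second(
    interfaces: int, interval_s: float, requests_per_interface: int = 1
) -> float:
    """Average number of SNMP requests the collector issues per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return interfaces * requests_per_interface / interval_s


if __name__ == "__main__":
    for interval in (30, 60, 300):
        rate = requests_per_second(1000, interval)
        print(f"1,000 interfaces every {interval} s: {rate:.1f} requests/s")
```

Halving the interval to 30 seconds doubles the load to roughly 33 requests per second, while the 5-minute interval used for system metrics drops it to about 3.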

Timeout Configuration

SNMP timeouts define how long to wait for a response before considering a request failed. Setting this correctly prevents both false alerts and delayed detection.

Network Type	Recommended Timeout
Local LAN	2-3 seconds
WAN / Remote sites	5-10 seconds
High-latency links	15-20 seconds

Start with conservative (longer) timeouts and reduce them once you understand your network's baseline latency. Monitor timeout rates: if more than 1-2% of requests to healthy devices time out, increase the timeout value.
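
One way to automate that check is to keep per-device request and timeout counters over a window and flag devices above the threshold. This is a sketch under assumptions: PollStats and both helpers are hypothetical, the caller is assumed to pass only devices known to be healthy (for example, ones answering ping), and the 1.5x step and 20-second cap in suggest_timeout are illustrative values, not recommendations from this post.

```python
from dataclasses import dataclass


@dataclass
class PollStats:
    """Hypothetical per-device counters over an observation window."""

    requests: int = 0
    timeouts: int = 0

    @property
    def timeout_rate(self) -> float:
        return self.timeouts / self.requests if self.requests else 0.0


def devices_needing_longer_timeout(
    stats: dict[str, PollStats], threshold: float = 0.02
) -> list[str]:
    """Healthy devices whose timeout rate exceeds threshold (default 2%)."""
    return sorted(name for name, s in stats.items() if s.timeout_rate > threshold)


def suggest_timeout(current_s: float, factor: float = 1.5, max_s: float = 20.0) -> float:
    """Raise the timeout by a fixed factor, capped at max_s (both are assumptions)."""
    return min(current_s * factor, max_s)


# One day of 60-second polls is 1,440 requests per device.
stats = {
    "core-sw1": PollStats(requests=1440, timeouts=5),      # ~0.3% timeouts
    "branch-rtr7": PollStats(requests=1440, timeouts=60),  # ~4.2% timeouts
}
for name in devices_needing_longer_timeout(stats):
    print(name, "->", suggest_timeout(5.0), "s")  # branch-rtr7 -> 7.5 s
```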

Retry Logic

Retries help handle transient failures - a single dropped UDP packet shouldn't trigger an alert. But too many retries delay detection of real problems.

# Example retry configuration
snmp:
  timeout: 5s
  retries: 2
  retry_interval: 2s

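With 2 retries and a 5-second timeout, a failing request makes three attempts (the initial request plus 2 retries), so it spends 15 seconds waiting on timeouts; if retry_interval adds a 2-second pause before each retry, the worst-case detection time is about 19 seconds. For most environments, 1-2 retries provide a good balance between reliability and speed.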
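
The worst case can be written as (1 + retries) × timeout plus retries × retry_interval. The sketch below assumes every attempt waits out the full timeout and that retry_interval is a fixed pause before each retry; how a given collector interprets that setting is an assumption here, so check your tool's documentation.

```python
def worst_case_detection_s(
    timeout_s: float, retries: int, retry_interval_s: float = 0.0
) -> float:
    """Seconds from the first request until the final attempt times out.

    Assumes every attempt waits the full timeout and that retry_interval_s is a
    fixed pause before each retry (an interpretation, not a documented setting).
    """
    attempts = 1 + retries  # initial request plus retries
    return attempts * timeout_s + retries * retry_interval_s


print(worst_case_detection_s(5.0, 2))       # 15.0: three 5 s timeouts
print(worst_case_detection_s(5.0, 2, 2.0))  # 19.0: plus two 2 s pauses
```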

Tip: Use exponential backoff for retries. First retry after 2 seconds, second after 4 seconds. This reduces load during network congestion.
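
The tip can be sketched as a retry loop whose pause doubles each time. This is a minimal illustration, not any particular library's API: send_request is a hypothetical stand-in for your SNMP GET call and is assumed to raise TimeoutError when no reply arrives, and sleep is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_with_backoff(
    send_request: Callable[[float], T],
    timeout_s: float = 5.0,
    retries: int = 2,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send_request(timeout_s), retrying on TimeoutError with exponential backoff.

    The pause before retry n (0-based) is base_delay_s * 2**n, i.e. 2 s then 4 s
    with the defaults.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return send_request(timeout_s)
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: let the caller mark the poll as failed
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage with a fake request that times out twice, then answers.
pauses: list[float] = []
calls: list[float] = []


def flaky_get(timeout_s: float) -> str:
    calls.append(timeout_s)
    if len(calls) < 3:
        raise TimeoutError
    return "response"


print(poll_with_backoff(flaky_get, sleep=pauses.append), pauses)  # response [2.0, 4.0]
```

With a 5-second timeout, this schedule's worst case is three timeouts plus 2 s and 4 s of pauses, 21 seconds in total.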

Handling Unreachable Devices

When a device becomes unreachable, your polling strategy matters. Continuing to poll at full rate wastes resources. Stopping completely means delayed recovery detection.

Backoff Strategy

After 3 consecutive failures, reduce polling frequency. Poll every 5 minutes instead of every minute until the device responds.
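
A per-device failure counter is enough to implement this rule. The sketch below hard-codes the numbers from this section (3 failures, 60 s normal, 300 s backed off); DevicePollState and its method names are hypothetical.

```python
from dataclasses import dataclass

NORMAL_INTERVAL_S = 60    # poll every minute while the device answers
BACKOFF_INTERVAL_S = 300  # poll every 5 minutes once it is considered down
FAILURE_THRESHOLD = 3     # consecutive failures before backing off


@dataclass
class DevicePollState:
    """Hypothetical per-device record used to pick the next polling interval."""

    consecutive_failures: int = 0

    def record(self, success: bool) -> None:
        # A successful poll clears the streak; a failed one extends it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def interval_s(self) -> int:
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            return BACKOFF_INTERVAL_S
        return NORMAL_INTERVAL_S


state = DevicePollState()
for ok in (False, False, False):
    state.record(ok)
print(state.interval_s)  # 300: three failures in a row
state.record(True)
print(state.interval_s)  # 60: back to normal once it responds
```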

Parallel ICMP Checks

Run ICMP ping alongside SNMP. If ping succeeds but SNMP fails, it's likely an SNMP configuration issue, not a network problem.
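
The two probe results map onto a small decision table. In this sketch ping and snmp_get are hypothetical callables returning True on success, run concurrently with the standard library's ThreadPoolExecutor. The fourth case (SNMP answers, ping does not) is an addition not covered above; labeling it as ICMP filtering is an assumption, since firewalls commonly drop ping.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable


class Diagnosis(Enum):
    HEALTHY = "healthy"
    SNMP_ISSUE = "snmp_issue"        # host answers ping, SNMP agent does not
    UNREACHABLE = "unreachable"      # neither ping nor SNMP answers
    ICMP_FILTERED = "icmp_filtered"  # SNMP answers, ping does not (assumed ICMP filtering)


def diagnose(ping_ok: bool, snmp_ok: bool) -> Diagnosis:
    if snmp_ok:
        return Diagnosis.HEALTHY if ping_ok else Diagnosis.ICMP_FILTERED
    return Diagnosis.SNMP_ISSUE if ping_ok else Diagnosis.UNREACHABLE


def check_device(ping: Callable[[], bool], snmp_get: Callable[[], bool]) -> Diagnosis:
    """Run both probes concurrently and classify the combined result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ping_future = pool.submit(ping)
        snmp_future = pool.submit(snmp_get)
        return diagnose(ping_future.result(), snmp_future.result())


print(check_device(lambda: True, lambda: False))  # Diagnosis.SNMP_ISSUE
```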

Recovery Detection

When a device recovers, resume normal polling immediately. Don't wait for the next scheduled backoff interval.
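
In scheduling terms, recovery should move the device's next poll forward rather than leave it parked at the backoff slot. This sketch assumes some out-of-band recovery signal is available (for example, a ping reply from a parallel ICMP check); PollSchedule and its methods are hypothetical, and times are plain seconds on any monotonic clock.

```python
from dataclasses import dataclass

NORMAL_INTERVAL_S = 60.0
BACKOFF_INTERVAL_S = 300.0


@dataclass
class PollSchedule:
    """Hypothetical per-device schedule: when the next SNMP poll is due."""

    next_poll_at: float
    backed_off: bool = False

    def enter_backoff(self, now: float) -> None:
        self.backed_off = True
        self.next_poll_at = now + BACKOFF_INTERVAL_S

    def on_recovery_signal(self, now: float) -> None:
        # Pull the next poll forward instead of waiting out the backoff slot.
        if self.backed_off:
            self.next_poll_at = min(self.next_poll_at, now)

    def on_poll_success(self, now: float) -> None:
        # Device answered SNMP again: resume the normal cadence from this moment.
        self.backed_off = False
        self.next_poll_at = now + NORMAL_INTERVAL_S


sched = PollSchedule(next_poll_at=0.0)
sched.enter_backoff(now=0.0)        # next poll would be at t=300
sched.on_recovery_signal(now=90.0)  # e.g. a ping reply at t=90
print(sched.next_poll_at)           # 90.0
sched.on_poll_success(now=90.0)
print(sched.next_poll_at)           # 150.0
```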

Scaling Considerations

As your monitored device count grows, polling configuration becomes critical for performance.

- Stagger polling: Don't poll all devices at the same second; spread requests across the interval.
- Bulk requests: Use SNMP GetBulk (SNMPv2c and later) instead of multiple Get requests to reduce round trips.
- Connection pooling: Reuse SNMP sessions instead of creating a new one per request.
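
Staggering can be as simple as giving each device a fixed offset within the interval. The sketch below derives the offset from a SHA-256 hash of a device ID, so a device keeps the same slot across collector restarts and inventory changes; evenly spaced offsets (i × interval / n) spread load more uniformly but reshuffle whenever devices are added. The function name and the device IDs are hypothetical.

```python
import hashlib
from collections import Counter


def poll_offset_ms(device_id: str, interval_s: int = 60) -> int:
    """Stable offset in [0, interval_s * 1000) ms, derived from a hash of the device ID."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (interval_s * 1000)


# Spread 1,000 hypothetical devices over a 60-second interval.
offsets = [poll_offset_ms(f"device-{i}") for i in range(1000)]
per_second = Counter(ms // 1000 for ms in offsets)
print(max(per_second.values()))  # busiest one-second slot (the average is about 17)
```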

A well-configured collector can handle 50,000+ interfaces on modest hardware. The bottleneck is usually network latency, not CPU or memory.
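
Little's law (L = λ × W) shows why latency rather than CPU sets the limit: a collector must keep roughly rate × round-trip time requests in flight to finish each cycle on time, and a poller that waits for each reply before sending the next completes at most 1 / RTT requests per second, about 5 per second at 200 ms. The sketch below assumes one request per interface per cycle and ignores retries and timeouts, which hold slots longer and push the number up.

```python
import math


def required_concurrency(
    interfaces: int, interval_s: float, rtt_s: float, requests_per_interface: int = 1
) -> int:
    """Average requests in flight needed to finish each cycle on time.

    Little's law: L = lambda * W, with lambda the request rate and W the
    per-request round-trip time.
    """
    rate = interfaces * requests_per_interface / interval_s  # lambda, requests/s
    return math.ceil(rate * rtt_s)                            # L, requests in flight


print(required_concurrency(50_000, 60, 0.005))  # 5 on a LAN with ~5 ms RTT
print(required_concurrency(50_000, 60, 0.2))    # 167 across ~200 ms WAN links
```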