Network monitoring is a vital part of your security architecture and layered security portfolio. Step one in your network monitoring safety checklist is making sure you have network monitoring in the first place. The rest of our steps assume you have network monitoring and lay out five ways to optimize its value.
1. Consolidate Around a Central Tool
Even if you have a network monitoring solution, you may still be relying upon myriad tools to monitor all of your network devices. We're not advising that you throw away all these tools, but don't depend upon them for things that your centralized network monitoring solution can more easily and effectively provide.
Our advice is to take full advantage of a consolidated view of your entire network infrastructure. This holistic view creates broad and deep visibility that can't be matched by an array of disparate monitoring solutions. With network monitoring you can see your complex and interrelated network resources in an end-to-end way, and understand all the dependencies amongst your network components. Silo-specific monitoring tools simply don't understand all these dependencies.
When it comes time to troubleshoot, all these siloed solutions are roadblocks slowing down the process. “If you are using different tools to monitor network devices, servers, virtual environments and applications, it can be a nightmare to try to quickly locate the root cause of a problem. With disparate monitoring solutions, you lack visibility into your complex, interdependent network, applications and servers. This makes the process of finding the root cause of performance issues slow and painful,” explains our WhatsUp Gold piece on Best Practices for Network Troubleshooting. “So, when there is a problem, users put up with delays and difficulties for far longer than they are prepared to tolerate. Complaints escalate, and management gets frustrated… and involved.”
2. Sidestep Alert Storms
Not understanding dependencies can also lead to an aggravating condition called an alert storm. This happens when your network monitoring solution or disparate monitoring tools are set to send alarms and alerts whenever there's trouble with a particular component. The problem is that the component itself is probably not at fault; instead, a network resource that component depends upon has gone awry. In fact, there may be many dependent components all sending off alarms when a core component such as a router goes down.
IT needs to know which component is actually the problem and not be sidetracked by all the dependent components' calls for help.
These alerts from dependent components are not false alarms, but they are unnecessary ones.
If you have a network monitoring solution, and we hope you do, leverage its awareness of dependencies so that only necessary alerts are sent out.
3. Depend on Dependencies
In order to gain the benefits we just discussed, your network monitoring must be dependency-aware. Not only should your solution see all your network devices and services and how they interconnect, it should analyze these dependencies automatically. Now, instead of that vexing alarm storm, IT is only alerted to the device that is actually at fault.
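As a rough illustration, dependency-aware suppression amounts to walking up the dependency graph and alerting only on the highest failed device. The topology and device names below are hypothetical, not from any particular product:

```python
# Minimal sketch of dependency-aware alerting: suppress alerts for devices
# whose upstream dependency is itself down. Topology is hypothetical.
DEPENDS_ON = {
    "server-1": "switch-a",
    "server-2": "switch-a",
    "switch-a": "router-core",
    "router-core": None,  # no upstream dependency
}

def root_cause(device, down):
    """Walk up the dependency chain and return the highest failed device."""
    cause = device
    parent = DEPENDS_ON.get(device)
    while parent is not None and parent in down:
        cause = parent
        parent = DEPENDS_ON.get(parent)
    return cause

def alerts(down):
    """Alert only on root causes, not on every dependent device."""
    return {root_cause(d, down) for d in down}

# Switch-a fails, taking both servers with it; only switch-a is reported.
print(alerts({"server-1", "server-2", "switch-a"}))
```

With a topology map like this, the storm of three alarms collapses to the single device IT actually needs to fix.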
4. Embrace the Complexity
Complexity is a fact of life for network pros. Instead of fearing it, like a great chess champion learn to master it. And here, network monitoring can really up your game. “Today’s networks can be astounding in their complexity. Routers, switches, and hubs link the multitude of workstations to critical applications on myriad servers and to the Internet. In addition, there are numerous security and communications utilities and applications installed, including firewalls, virtual private networks (VPNs), and spam and virus filters. These technologies span all verticals and companies of all sizes. Network management, therefore, is not confined to only certain industries or solely to large, public companies,” the WhatsUp Gold Network Monitoring Best Practices blog explained. “Understanding the composition and complexity of your network, and having the capacity to be informed of how all the individual elements are performing at any given time, is a key success factor in maintaining the performance and integrity of the network – and often of the business – as a whole. There are potentially thousands of data points to monitor on a network, and it is critical to be able to access meaningful, accurate, and current information at any given time. Network administrators need to feel confident that they know what’s happening on their network from end to end at any given point in time. It is critical to ‘know your network’ at all times.”
5. Decide What to Monitor
Let's face it, your network is most likely a monster, and if you monitor every single element in depth, the results become unwieldy. Like any good general, you have to pick your battles. “For something as mission critical as your network, it’s important to have the right information at the right time. Of primary importance is to capture status information about current network devices (e.g., routers and switches) and critical network servers. A network administrator also needs to know that essential services (e.g., email, website, and file transfer services) are consistently available,” explained the WhatsUp Gold blog on Network Monitoring Best Practices.
Here are 11 key items your network monitoring solution should be tracking and alerting on:
Network device availability, including switches, routers, gateways, servers, etc.
These devices are your network’s “plumbing” and must be cared for to keep the network flowing.
Availability of key services running on the network
The CEO really freaks out when the entire network is down. But that's not the only problem. When discrete services are down, that still has a business-crushing impact. Let's say your email doesn't work or your FTP servers are not available. That can cripple the entire operation or certainly large swaths of it.
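As a hedged sketch (not any particular product's check), a basic service-availability probe can be as simple as a timed TCP connection attempt. The throwaway localhost listener below stands in for a real mail, web, or FTP host:

```python
import socket

def service_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway listener on localhost; a real check would
# target your mail, web, or FTP servers on their well-known ports.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))       # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
print(service_up("127.0.0.1", port))  # listener is accepting connections
listener.close()
```

A monitoring loop would run probes like this on a schedule and alert when a service that was up stops answering.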
Disk space use and available space on key servers
Storage demands are always increasing, which means disk space is often at a premium. Usage grows so quickly that even the largest disks fill up, and when they do, the applications using that disk space grind to a halt. You should track disk space and set thresholds so you can take action as you approach capacity, but also be alerted to any anomalies in disk usage, which can point to either a problem with the application or perhaps a cybercriminal event.
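For illustration, a threshold check of this kind might look like the following sketch; the 80/90 percent warning and critical thresholds are arbitrary examples, not recommendations:

```python
import shutil

def classify(used, total, warn_pct=80, crit_pct=90):
    """Map raw byte counts to an alert level; thresholds are illustrative."""
    pct = used / total * 100
    if pct >= crit_pct:
        return "critical"
    if pct >= warn_pct:
        return "warning"
    return "ok"

def disk_alert(path="/", **thresholds):
    """Check a real mount point using the standard library."""
    u = shutil.disk_usage(path)
    return classify(u.used, u.total, **thresholds)

print(disk_alert("/"))
```

Keeping the threshold logic separate from the disk query makes the levels easy to tune per server.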
The percentage of your routers’ or switches’ maximum throughput used on average
Your network bandwidth, i.e. the size of the pipes, is critical to proper performance, but the routers and switches that move traffic over those links are just as important. By tracking how much of the maximum throughput is used, you can set thresholds which indicate a dangerous situation. And by knowing the average throughput use, IT can plan for router or switch upgrades to keep the network running effectively and safely.
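The throughput percentage itself is simple arithmetic over two interface octet-counter samples; the counter values and 1 Gbit/s link speed below are hypothetical:

```python
def utilization_pct(bytes_t0, bytes_t1, interval_s, link_bps):
    """Percent of link capacity used, from two octet-counter samples."""
    bits_transferred = (bytes_t1 - bytes_t0) * 8
    return bits_transferred / interval_s / link_bps * 100

# Hypothetical samples from a 1 Gbit/s switch port, taken 60 s apart.
pct = utilization_pct(1_000_000_000, 4_750_000_000, 60, 1_000_000_000)
print(f"{pct:.1f}% of capacity")  # 50.0% of capacity
```

In practice the two byte counts would come from successive polls of the interface's traffic counters, and the result would be checked against your threshold.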
Average memory and processor utilization and current utilization of key CPUs/servers
Just as router and switch throughput is important to track, so is the memory and processor utilization of your key servers and CPU-based devices. Again, knowing the average use lets you create thresholds for what is normal, while current use shows how close you are to a problem. Don’t wait until memory is used up.
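One hedged way to turn "average versus current" into an alert is a simple standard-deviation threshold over recent readings; the CPU percentages below are made up for the example:

```python
def over_threshold(samples, current, sigma=2.0):
    """Flag a current reading that sits far above the historical average."""
    avg = sum(samples) / len(samples)
    var = sum((s - avg) ** 2 for s in samples) / len(samples)
    return current > avg + sigma * var ** 0.5

# Hypothetical recent CPU readings (percent) and two new readings.
history = [22, 25, 24, 23, 26, 24]
print(over_threshold(history, 31))  # True: well above normal
print(over_threshold(history, 25))  # False: within normal range
```

The same check works unchanged for memory utilization; only the samples differ.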
The functioning of security tools such as antivirus/antimalware, firewalls, updating services, and intrusion detection/prevention
If your security tools aren't working properly, do you really have security?
Level of traffic coming in and out of routers, switches, and gateways
Traffic monitoring is critical, as traffic anomalies can point to cybercriminal attacks. At the same time, understanding your peak periods and comparing them to your maximum throughput is crucial for ensuring optimal performance. This may mean moving to higher-capacity routers, switches, and gateways, or adding load balancing, bandwidth shaping, or other traffic management solutions.
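Comparing peak traffic to maximum throughput can be sketched in a few lines; the hourly samples and 1 Gbit/s capacity below are illustrative:

```python
def peak_headroom(samples_bps, capacity_bps):
    """Return peak utilization percent and remaining headroom percent."""
    peak = max(samples_bps)
    used = peak / capacity_bps * 100
    return used, 100 - used

# Hypothetical hourly traffic peaks on a 1 Gbit/s uplink.
day = [120e6, 310e6, 780e6, 640e6, 200e6]
used, headroom = peak_headroom(day, 1e9)
print(f"peak {used:.0f}% used, {headroom:.0f}% headroom")
```

When headroom at peak shrinks past your comfort level, that's the planning signal for a capacity upgrade or traffic-management changes.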
Performance and availability of heterogeneous devices
Let's face it, your network likely has an array of heterogeneous devices, all chosen for their best-of-breed capabilities or specific use cases. That is why IT should effectively monitor Windows, Linux, and UNIX systems, as well as a bevy of servers, workstations, and printers, for performance and availability.
Event logs, such as WinEvent or Syslog
Log management begins with collecting logs and having a systematic way of analyzing them and leveraging the information they contain. Event logs not only explain recent problems and anomalies, but are also critical for security forensics. In many cases, a security breach or data breach isn't found for months or years after it occurs. Analyzing event logs, if you're able to archive them for that long, lets you figure out how the breach occurred and helps you stop those types of breaches from happening again.
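Systematic analysis starts with parsing the raw lines into structured fields. As an assumption-laden sketch, the minimal RFC 3164-style parser below handles a made-up Syslog line; real logs vary in format:

```python
import re

# Minimal RFC 3164-style syslog parser; the sample line is invented.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d+)>(?P<ts>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s(?P<msg>.*)$"
)

def parse_syslog(line):
    """Split a syslog line into priority, timestamp, host, and message."""
    m = SYSLOG_RE.match(line)
    if not m:
        return None
    d = m.groupdict()
    # The priority encodes facility and severity: pri = facility * 8 + severity.
    d["facility"], d["severity"] = divmod(int(d["pri"]), 8)
    return d

rec = parse_syslog("<34>Oct 11 22:14:15 fw01 su: 'su root' failed on /dev/pts/8")
print(rec["host"], rec["severity"])  # fw01, severity 2 (critical)
```

Once lines are structured like this, they can be filtered by host and severity, archived, and searched long after the fact for forensics.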
Critical SNMP traps, including server room temperature probes and printer information
Network monitoring is also terrific for physical infrastructure such as server rooms. You want that server room to be nice and cool to protect the equipment inside. Here, temperature probes let you know if you have a problem with the air conditioning or if there's some other reason why the temperature is askew. On a broader level, SNMP traps can carefully track all your printers, letting IT know if they are working or when the toner is close to running out. These are just two examples of the many things SNMP traps can do.
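A temperature-trap handler might be sketched as follows. The OID and thresholds are invented for illustration, and a real deployment would receive the decoded varbinds from an SNMP library rather than a plain dictionary:

```python
# Hypothetical handler for an already-decoded temperature trap; the OID
# and thresholds are illustrative, not taken from a real vendor MIB.
TEMP_OID = "1.3.6.1.4.1.99999.1.1"  # made-up vendor temperature OID

def handle_trap(varbinds, warn_c=27, crit_c=32):
    """Turn a temperature varbind into an alert level."""
    temp = varbinds.get(TEMP_OID)
    if temp is None:
        return "ignored"          # not a temperature trap
    if temp >= crit_c:
        return f"CRITICAL: server room at {temp}C"
    if temp >= warn_c:
        return f"WARNING: server room at {temp}C"
    return "ok"

print(handle_trap({TEMP_OID: 34}))  # air conditioning has likely failed
```

A printer-supplies trap would follow the same shape, just keyed on different OIDs and thresholds.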
Critical apps such as Windows applications running on Windows servers
Not everyone thinks of applications as network resources, but they run on the network, depend on the network, and their failure or lack of performance impacts productivity and can indicate an outside invader. Ideally your network monitoring solution should also track the performance and availability of critical apps such as SQL Server or Exchange. In the case of Windows applications, this is done through WMI monitors.