Over the years I’ve worked in IT, this is probably the most common and frustrating issue I’ve seen in corporate networks. It’s often extremely hard to solve and can have a huge impact on every part of an organisation. When you sit and watch a domain logon regularly take over twenty minutes on state-of-the-art (and expensive) infrastructure, you know something is seriously wrong. Equally, though, you’ll find plenty of false accusations of ‘slow networks’ from frustrated users running resource-hungry applications on under-powered machines, simply because there is a network component involved.
The first step in analysing a slow network problem is to ascertain whether there really is a network problem to be solved. There could be a whole host of reasons why people are complaining, and many of them might have little to do with a slow network. For example, it’s easy for an application or a piece of code to grind to a halt when accessing the network purely because of bad programming. An application with poor error handling, or a line of code searching for resources which are not accessible, can make your super-fast network look extremely unresponsive.
Most network engineers and experienced IT support staff follow a certain methodology when trying to resolve performance problems on a network. If they’re familiar with the environment, they’ll likely have some suspicions about what’s to blame. It’s perfectly feasible for a rogue or misconfigured server with a gigabit network port to bring down a whole network on its own. Finding the source of high latency on a network can be fairly straightforward or fiendishly difficult, but there are many tools available to help you.
A packet analysis program is essential, though. Without one, and somewhere to plug it in, you’re not going to get far. It doesn’t need much: a cheap laptop, a free copy of Wireshark and a spanned (mirrored) network port to plug it into are easy enough to find in most places. There are also lots of pointers to help once you have access to such data. TCP’s error recovery features are some of the best indicators to use for both locating and diagnosing these sorts of problems.
The key is to identify the cause of the latency in the network, that is, the point at which a packet is delayed between its transmission and receipt. Pretty much all slow network problems come down to this, although the reasons can be quite varied, ranging from hardware faults and incorrectly configured servers to a host of other possibilities. One of the first things to check for is TCP retransmissions, the feature of TCP which resends packets when they are not acknowledged in time. Unfortunately this error recovery feature can itself add to the problem, so it’s worth checking. Networks can often be flooded with these sorts of packets because of one small issue somewhere on the network. The problems may also be instigated from outside the network, particularly if there’s lots of remote access through things like residential VPN systems configured for employees to reach their data from the internet.
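To make that check concrete, here is a minimal sketch of how retransmissions might be counted from a capture once it has been exported into simple records. The `Segment` record, the `count_retransmissions` function and the sample addresses are all hypothetical names made up for illustration, and the rule used (a data-carrying segment whose flow and sequence number have been seen before) is a deliberately simplified heuristic. Wireshark does its own, more thorough analysis and exposes the result through its `tcp.analysis.retransmission` display filter.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    """One captured TCP segment (hypothetical record, not a Wireshark type)."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    payload_len: int


def count_retransmissions(segments):
    """Count suspected retransmissions per (src, dst) pair.

    Simplified heuristic: a segment carrying data whose flow and
    sequence number have already been seen is counted as a resend.
    """
    seen = set()
    retransmissions = Counter()
    for seg in segments:
        if seg.payload_len == 0:
            continue  # pure ACKs legitimately repeat sequence numbers
        key = (seg.src, seg.dst, seg.sport, seg.dport, seg.seq)
        if key in seen:
            retransmissions[(seg.src, seg.dst)] += 1
        else:
            seen.add(key)
    return retransmissions


# Made-up sample data: one client resending the same segment twice.
capture = [
    Segment("10.0.0.5", "10.0.0.9", 50512, 445, 1000, 1460),
    Segment("10.0.0.5", "10.0.0.9", 50512, 445, 2460, 1460),
    Segment("10.0.0.9", "10.0.0.5", 445, 50512, 1, 0),
    Segment("10.0.0.5", "10.0.0.9", 50512, 445, 2460, 1460),  # resent
    Segment("10.0.0.5", "10.0.0.9", 50512, 445, 2460, 1460),  # resent again
]

for (src, dst), count in count_retransmissions(capture).most_common():
    print(f"{src} -> {dst}: {count} suspected retransmissions")
```

Sorting by count like this is a quick way to see whether the retransmissions are spread evenly or concentrated on one host or link, which is usually the more interesting question.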
The first thing to check is the mechanism which determines when a retransmission happens, which is called the retransmission timer. The timer starts when a packet is transmitted using TCP and stops when the ACK is received. If no ACK arrives before the timer reaches a value called the RTO (retransmission timeout), the packet is sent again. The time between a packet being sent and its ACK arriving is called the round trip time (RTT). The RTO is calculated from the RTT, which makes the RTT a useful place to start analysing any significant network latency issue.
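The relationship between RTT and RTO can be sketched in a few lines. The calculation below follows the standard one described in RFC 6298 (a smoothed RTT plus four times its variation), but the `RtoEstimator` class, its parameter names and the sample RTT values are my own illustrative assumptions. Real TCP stacks differ in detail, most obviously in the minimum RTO they enforce: the RFC recommends one second, while the 200 ms used in the example is only an assumption chosen to make the effect visible.

```python
class RtoEstimator:
    """Sketch of the RTO calculation described in RFC 6298."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation
    K = 4          # multiplier applied to the RTT variation

    def __init__(self, clock_granularity=0.001, min_rto=1.0):
        self.srtt = None    # smoothed round trip time, in seconds
        self.rttvar = None  # round trip time variation, in seconds
        self.rto = 1.0      # initial RTO before any RTT has been measured
        self.g = clock_granularity
        self.min_rto = min_rto

    def update(self, rtt):
        """Feed one measured RTT (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement seeds both values.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variation is updated using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto


# Made-up RTT samples: two steady measurements, then a latency spike.
estimator = RtoEstimator(min_rto=0.2)  # 200 ms floor is an assumption
for sample in (0.100, 0.120, 0.400):
    rto = estimator.update(sample)
    print(f"RTT {sample * 1000:.0f} ms -> RTO {rto * 1000:.1f} ms")
```

The point to take away is that a single slow round trip pushes the RTO up sharply because of the variation term, so a capture showing widely varying RTTs is a hint that retransmissions will be arriving late as well as often.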