Performance testing aims to reduce the risk of substandard performance. These risks can be divided into three areas, and each is best tackled in a corresponding stage. In the previous articles I demonstrated how the first stage, optimising the performance potential of the code, is carried out.
The next stage is about optimising the capacities. We have to keep in mind that this concerns both hardware and software resources. As long as we provide sufficient capacity for any load, the application under test will show little change in its transaction profiles and response times. Insufficient capacity of any resource causes waiting times that extend the response times and may ruin performance. It is our mission to reduce that risk, prevent problems and, if prevention fails, to correct them as quickly as possible. In this and the next article I focus on hardware capacities.
Before I demonstrate our model-centric solution for stage 2, I provide some background on capacities and the load model in the next articles.
A capacity can be seen as a certain amount of resources: the number of cores, the number of servers in a cluster, the number of spindles, the number of SSDs, the number of (teaming) network connections. Not only the amounts matter, but also the speeds of the resources. A resource that is twice as fast offers twice the capacity. However, there is a big difference between two units offering double capacity and one unit with double speed.
Utilisation is an important aspect. When we use a resource half of the time, we have a utilisation of 50%. When we use one of a set of four resources all of the time, the set has a utilisation of 25%. An important question is: at what utilisation do we want to run our resources? Since we spend money on hardware resources, we (and especially the CFO) would like to see utilisations of 100%. However, the higher the utilisation, the longer the waiting times for the resource due to queuing. Waiting times increase in a non-linear way: the higher the utilisation, the faster the waiting times grow! Hence we have to maintain lower utilisations. How low depends on how much waiting time, added on top of the response times, we tolerate. A resource that is loaded too heavily we call a bottleneck: it causes excessive extension of response times. So when you have one or more overloaded resources, you have a performance problem.
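To make the non-linear growth concrete, here is a minimal sketch. It assumes the simplest queuing model there is, a single server with exponential arrivals and service times (M/M/1); real systems differ, but the shape of the curve is the same:

```python
# Illustration (assumed M/M/1 model, not from the article): with service
# time S and utilisation U, the mean time spent waiting in the queue is
#   Wq = U / (1 - U) * S
# which grows non-linearly as U approaches 100%.

def mm1_wait(utilisation: float, service_time: float = 1.0) -> float:
    """Mean queueing delay for an M/M/1 server (excludes the service itself)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return utilisation / (1.0 - utilisation) * service_time

for u in (0.5, 0.8, 0.9, 0.95):
    print(f"utilisation {u:.0%}: waiting time {mm1_wait(u):.1f} x service time")
```

At 50% utilisation a job waits on average one service time; at 90% it waits nine, and at 95% nineteen. This is why driving utilisation towards 100% ruins response times long before the resource is actually "full".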
Queuing theory teaches us that there are several factors other than utilisation that determine how long waiting times become. For example, a server with four cores shows lower waiting times than a server with only one core at the same utilisation. With a large number of users you get higher waiting times than with a few users; the limiting case is a single user, who experiences no waiting time at all. And if the dispersion in the times a resource is engaged goes up, the waiting times increase as well. What does that mean in practice? If you collect the right data, you can calculate the utilisations of resources and predict waiting times for a certain load model.
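As a sketch of such a prediction, and of the earlier point that two units at double capacity are not the same as one unit at double speed, the standard M/M/c (Erlang C) formulas can be applied. The arrival and service rates below are invented for the example:

```python
from math import factorial

# Illustration using standard M/M/c queuing formulas: compare one server
# at double speed with two servers at normal speed, at the same offered
# load. The numbers are chosen for the example only.

def mmc_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time in system (waiting + service) for an M/M/c queue."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / servers                        # per-server utilisation
    assert rho < 1.0, "system is overloaded"
    # Erlang C: probability that an arriving job has to wait
    top = a**servers / factorial(servers)
    bottom = (1.0 - rho) * sum(a**k / factorial(k) for k in range(servers)) + top
    p_wait = top / bottom
    wq = p_wait / (servers * service_rate - arrival_rate)
    return wq + 1.0 / service_rate           # add the service time itself

lam = 1.5                                    # arrivals per second (assumed)
one_fast = mmc_response_time(lam, service_rate=2.0, servers=1)
two_slow = mmc_response_time(lam, service_rate=1.0, servers=2)
print(f"one double-speed server: {one_fast:.2f} s")
print(f"two normal servers:      {two_slow:.2f} s")
```

Both configurations run at 75% utilisation, yet the single fast server delivers the lower response time, because each individual job is also served twice as fast; the two-server configuration only reduces the waiting, not the service time.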
If we test at the average values of the load, everything may look all right, but peak values with high utilisations can still cause performance problems. For many applications peak values are around 1.5 times the average. Performance should not only be OK at average load, but also at peak load.
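A small sketch of why that factor of 1.5 matters so much (again assuming the simple M/M/1 waiting-time formula, with invented utilisation figures):

```python
# Illustration (assumed numbers): with a peak factor of 1.5, a resource
# that averages 50% utilisation runs at 75% during peaks. In a simple
# M/M/1 model the waiting time then triples.

PEAK_FACTOR = 1.5

def wait_factor(utilisation: float) -> float:
    """Mean M/M/1 waiting time as a multiple of the service time."""
    return utilisation / (1.0 - utilisation)

avg = 0.50
peak = avg * PEAK_FACTOR
print(f"average load: {avg:.0%} busy, wait = {wait_factor(avg):.1f} x service time")
print(f"peak load:    {peak:.0%} busy, wait = {wait_factor(peak):.1f} x service time")
```

A test that only exercises the comfortable average would never reveal the tripled waiting times that users experience during the peaks.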
How to handle peak load? The load model should reflect peak load.
What type of values should we use? Personally I prefer the 95th-percentile value for most situations, which means that only 5% of the time the load may be higher. Consequently 5% of your transactions may have substandard response times.
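To show what taking a 95th-percentile load value looks like in practice, here is a minimal sketch; the sample data is invented, and a simple nearest-rank percentile is used:

```python
import math

# Illustration: pick the 95th-percentile value from measured load samples
# (the data below is made up for the example).

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# transactions per second, sampled during the busy hour (made-up data)
samples = [80, 95, 100, 110, 120, 90, 85, 105, 150, 115,
           100, 95, 130, 125, 90, 100, 110, 140, 95, 105]
print("average load:        ", sum(samples) / len(samples))
print("95th-percentile load:", percentile(samples, 95))
```

In this invented data set the 95th-percentile load is well above the average, which is exactly why sizing the load model on the average alone understates the risk.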