Among other things, a performance model captures the relationship between resource utilisation and waiting times. In other words, it shows the waiting times and response times for the utilisations that result from a given load model. From this we can get an idea of which utilisations are adequate.
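To give a feel for how waiting time grows with utilisation, here is a minimal sketch. It assumes the textbook single-server queue (M/M/1), where response time equals service time divided by (1 - utilisation). That formula is an assumption chosen for illustration and is not necessarily what a full performance model uses. The function name and the 10 ms service time are hypothetical.

```python
def mm1_times(service_time_ms: float, utilisation: float) -> tuple[float, float]:
    """Return (waiting time, response time) in ms for an M/M/1 queue.

    Assumption: a single server with random arrivals and service times.
    utilisation is a fraction, e.g. 0.6 for 60%.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    response = service_time_ms / (1 - utilisation)
    waiting = response - service_time_ms
    return waiting, response


# Hypothetical service time of 10 ms at a few utilisation levels.
for u in (0.4, 0.6, 0.7, 0.9):
    wait, resp = mm1_times(10.0, u)
    print(f"utilisation {u:.0%}: waiting {wait:.1f} ms, response {resp:.1f} ms")
```

Under this assumption the waiting time rises slowly at low utilisation and steeply as utilisation approaches 100%, which is why target utilisations sit well below that limit.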
The following table is an example with utilisation values in three categories: Target, Min and Max. You aim for the target values. When the measured utilisation exceeds the Max value, you should provide extra resources. For example, if three cores are loaded at 90%, then four cores will be loaded at 90 × 3/4 = 67.5%. When the utilisation drops below the Min value, you may consider withdrawing resources.
Resource   Target   Min   Max
CPU        60%      40%   70%
Memory     90%      70%   95%
Disks      30%      10%   40%
Networks   30%      10%   40%
Performance does not degrade immediately at utilisations slightly above the target, but you have to keep headroom for unexpected events.
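The rule of thumb above can be sketched in a few lines of code. The thresholds are copied from the example table. The function names are hypothetical, and the rescaling assumes that the load spreads evenly over the units of a resource.

```python
# Example thresholds from the table, in percent: (target, minimum, maximum).
THRESHOLDS = {
    "CPU": (60, 40, 70),
    "Memory": (90, 70, 95),
    "Disks": (30, 10, 40),
    "Networks": (30, 10, 40),
}


def utilisation_after_resize(utilisation_pct: float, old_units: int, new_units: int) -> float:
    """Utilisation after changing the number of units, assuming an even spread of load."""
    return utilisation_pct * old_units / new_units


def advise(resource: str, utilisation_pct: float) -> str:
    """Compare a utilisation with the Min and Max values for the resource."""
    _target, minimum, maximum = THRESHOLDS[resource]
    if utilisation_pct > maximum:
        return "add resources"
    if utilisation_pct < minimum:
        return "consider withdrawing resources"
    return "keep as is"


print(advise("CPU", 90))                      # three cores at 90%
after = utilisation_after_resize(90, 3, 4)    # the same load on four cores
print(after, advise("CPU", after))
```

With four cores the utilisation lands between Min and Max, so no further action is needed for this load.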
Networks and disk storage have developed tremendously over the last decades.
Home connections with download speeds in excess of 100 Mbps are no longer exceptional in many places. Corporate networks commonly have bandwidths of 10 Gbps.
Many infrastructures use grossly oversized, self-optimising storage systems. Non-volatile disk cache is quite normal, and SSDs offer access times below 1 ms. As a performance specialist you no longer have any influence on disk optimisation.
As a consequence of these technological advances, most transaction profiles show only small contributions from disk and network. In contrast, disk and network completely dominated the transaction profiles that I created in 1990.
By nature, a resource cannot be utilised beyond 100%. With a model, however, we can calculate the utilisation an application seeks for a certain load, even above 100%, and this is a handy feature. For example, when your analysis shows that an application at a certain load model seeks 180% CPU utilisation, the above table tells you at a glance that you have to provide three times the capacity to resolve this bottleneck: 180% / 3 = 60%, which is the CPU target.
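The arithmetic behind this can be written down as a small helper. This is a sketch under the assumption that capacity is added in whole multiples and that utilisation falls in proportion to capacity. The function name is hypothetical.

```python
import math


def capacity_factor(sought_pct: float, target_pct: float) -> int:
    """Smallest whole capacity multiplier that brings utilisation to the target or below."""
    return max(1, math.ceil(sought_pct / target_pct))


factor = capacity_factor(180, 60)   # CPU target from the table is 60%
print(f"provide {factor} times the capacity, "
      f"resulting utilisation {180 / factor:.0f}%")
```

For a sought utilisation that is not a whole multiple of the target, the helper rounds up, so the resulting utilisation ends up somewhat below the target rather than above it.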
When you have multiple bottlenecks, in other words multiple overloaded resources, it is not trivial to identify all of them at once. An overloaded resource reduces throughput, so other resources with insufficient capacity may still show only low utilisation. There is also the possibility of secondary performance problems, e.g. in your worker threads.
Then there is the challenge of a balanced infrastructure, in which the utilisations of the resources are more or less equal across all hosts.
With load and stress testing you have to resolve multiple bottlenecks one by one, starting with the hardware: run the test, increase the capacity of the bottleneck, rerun the test, increase the capacity of the next bottleneck, and so on. Each resolved bottleneck requires another load test to identify the next one. It may take quite a few load tests to arrive at an infrastructure that is both free of bottlenecks and balanced.
With a performance model, one test gives you a complete overview of all bottlenecks, and you can optimise all capacities in one stroke. Inspect the loads, change the numbers for all overloaded resources, do a recalc et voilà.
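The following sketch illustrates why a load test can hide a second bottleneck while a model exposes all of them at once. It rests on simplifying assumptions: throughput is throttled in proportion to the most overloaded resource, utilisation scales linearly with throughput, and capacity is added in whole multiples. The resource names and sought utilisations are made up for illustration.

```python
import math

# Hypothetical model output: utilisation (%) each resource seeks at the given load,
# with Target and Max values per resource taken from the example table.
SOUGHT = {"app CPU": 180.0, "db CPU": 120.0, "db disks": 20.0}
TARGET = {"app CPU": 60.0, "db CPU": 60.0, "db disks": 30.0}
MAXIMUM = {"app CPU": 70.0, "db CPU": 70.0, "db disks": 40.0}


def observed_in_load_test(sought):
    """Utilisations a load test would show when the worst resource caps throughput."""
    worst = max(sought.values())
    throttle = min(1.0, 100.0 / worst)
    return {name: pct * throttle for name, pct in sought.items()}


def capacity_multipliers(sought, target, maximum):
    """Whole capacity multiplier for every resource that seeks more than its Max."""
    return {
        name: math.ceil(pct / target[name]) if pct > maximum[name] else 1
        for name, pct in sought.items()
    }


observed = observed_in_load_test(SOUGHT)
multipliers = capacity_multipliers(SOUGHT, TARGET, MAXIMUM)
for name in SOUGHT:
    print(f"{name}: seeks {SOUGHT[name]:.0f}%, load test shows {observed[name]:.0f}%, "
          f"capacity x{multipliers[name]}")
```

In this made-up case the load test shows the database CPU below its Max value, although it seeks more than 100% at full throughput. The model output flags both CPUs in a single pass.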
Finally, with the model you can produce estimates of elapsed time and resource usage for a certain load model, which predict your costs in serverless computing such as AWS Lambda.
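As a rough sketch of such a cost prediction, assume a billing scheme with a fee per request plus a charge per GB-second of execution time. The function name is hypothetical, and the prices in the example are placeholders, not actual rates from any provider. Substitute the current figures from your provider's price list.

```python
def serverless_cost(
    invocations: int,
    elapsed_s: float,
    memory_gb: float,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    """Estimated cost: compute charge (GB-seconds) plus request charge.

    elapsed_s is the modelled elapsed time per invocation in seconds.
    Both prices are inputs; no real rates are built in.
    """
    compute = invocations * elapsed_s * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


# Placeholder prices for illustration only.
estimate = serverless_cost(
    invocations=1_000_000,
    elapsed_s=0.5,
    memory_gb=1.0,
    price_per_gb_second=0.00002,
    price_per_million_requests=0.25,
)
print(f"estimated cost: {estimate:.2f}")
```

The elapsed time per invocation and the number of invocations come straight from the model and the load model, so the estimate can be recalculated for each load scenario.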