While working on my method-GAPP (see the method-GAPP overview presentation), I have been challenged with the task of modelling a real system rather than a lab system with a programmed load profile. The big issue with a real system is that the load profile changes all the time; the best we can do is recognize periods in which the workload profile stays reasonably stable. For example, an OLTP system will do comparable things during production hours, say from 9:30 till 11:30 in the morning and from 14:00 till 16:00 in the afternoon, but will do something totally different during the night from 01:00 till 06:00. This example may match some OLTP systems, but the pattern could look totally different on your OLTP production system.
When modelling the response times of end-user processes you need to take into account that this is how real systems behave. In method-GAPP I try to map known queueing model curves onto the data. For example, the application server CPU metric can be modelled with an M/M/n queueing model curve. So it is very well possible that the application server has 8 cores available, but that in reality the M/M/6 curve fits the data best on average during production hours, while the M/M/2 curve fits best on average during night hours. This difference can be explained by the fact that some cores are on average occupied by long-running processes and therefore not available to your OLTP business process, leaving you in effect with fewer cores than the specs suggest.
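To make the M/M/n curve idea concrete, here is a minimal sketch of the standard Erlang C based "stretch factor" for an M/M/n queue, i.e. mean response time divided by mean service time as a function of per-core utilization. The function names are my own and this is not the method-GAPP code itself; the point is only to show how an M/M/6 curve bends upward much earlier than an M/M/8 curve at the same utilization, which is why a pair of "lost" cores shows up clearly in the fitted curve.

```python
import math

def erlang_c(n, rho):
    """Erlang C: probability that an arriving job has to wait in an
    M/M/n queue with n servers and per-server utilization rho (0 < rho < 1)."""
    a = n * rho                                    # offered load in Erlangs
    partial = sum(a**k / math.factorial(k) for k in range(n))
    tail = a**n / (math.factorial(n) * (1 - rho))
    return tail / (partial + tail)

def stretch_factor(n, rho):
    """M/M/n mean response time divided by mean service time:
    R/S = 1 + C(n, rho) / (n * (1 - rho))."""
    return 1.0 + erlang_c(n, rho) / (n * (1 - rho))

# The same per-core utilization hurts far more with fewer effective cores:
for n in (2, 6, 8):
    curve = [round(stretch_factor(n, u), 2) for u in (0.5, 0.7, 0.9)]
    print(f"M/M/{n}: R/S at 50/70/90% utilization = {curve}")
```

Plotting these stretch factors against measured CPU utilization is, in essence, what it means for "the M/M/6 curve" to fit the production-hours data better than the M/M/8 curve.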
The great power of using data mining within Method-GAPP is that it makes it possible to calculate the best-fitting average curve for each component in the architecture over a given time period. It remains very important to know your production system and to know when it has a comparable load profile: the better you can separate similar workload profiles for your analysis, the better the resulting predictive curves and formulas will be. Because the data mining uses a linear ridge regression model and the queueing models are known, you can easily calculate what the effect will be if the application server is upgraded from 8 cores to 16 cores. Based on the newly calculated model graph you can then judge whether an investment in the hardware will have the desired effect on your end-user performance.
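As a rough illustration of the ridge regression step and the what-if calculation, the sketch below fits a ridge model on synthetic data where the features are M/M/n stretch factors for two components and the target is an end-user response time. All names, the synthetic measurements, the regression weights and the choice of effective core counts are assumptions for illustration only, not the actual GAPP implementation; the point is that once the queueing curves are known, swapping in the curve for a larger core count immediately gives a predicted response time for the upgraded hardware.

```python
import math
import numpy as np
from sklearn.linear_model import Ridge

def stretch_factor(n, rho):
    """M/M/n mean response time / mean service time (repeated from the sketch above)."""
    a = n * rho
    partial = sum(a**k / math.factorial(k) for k in range(n))
    tail = a**n / (math.factorial(n) * (1 - rho))
    return 1.0 + (tail / (partial + tail)) / (n * (1 - rho))

# Synthetic measurements for two components during comparable load-profile hours.
rng = np.random.default_rng(42)
util_app = rng.uniform(0.30, 0.85, 500)   # application server CPU utilization
util_db  = rng.uniform(0.20, 0.70, 500)   # database server CPU utilization

# Features: the queueing curves that fitted best for this time period
# (M/M/6 for the app server, M/M/4 for the database -- assumed values).
X = np.column_stack([
    [stretch_factor(6, u) for u in util_app],
    [stretch_factor(4, u) for u in util_db],
])
# Synthetic end-user response time in seconds (weights are made up).
y = 0.08 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.005, 500)

model = Ridge(alpha=1.0).fit(X, y)

# What-if: double the application server from 8 to 16 physical cores.
# Assumption: the long-running background work still occupies about 2 cores,
# so the OLTP load now sees roughly 14 effective cores, and the same arrival
# rate spreads out to a proportionally lower per-core utilization (6/14 of today's).
X_upgraded = X.copy()
X_upgraded[:, 0] = [stretch_factor(14, u * 6 / 14) for u in util_app]

print("mean predicted end-user response time, current hardware :",
      round(model.predict(X).mean(), 3), "s")
print("mean predicted end-user response time, after the upgrade:",
      round(model.predict(X_upgraded).mean(), 3), "s")
```

In a real analysis the utilizations and response times would of course come from your monitoring data for a well-chosen, comparable time window, not from a random generator.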