On Sunday 21 September 2008, at Oracle Open World, I had the opportunity to present my method “GAPP” once more (after HOTSOS 2008 and Planboard in May 2008). This time I also covered how the method can be used with Service Oriented Architectures (SOA). For readers who do not know what “GAPP” is about, I will give a short introduction to the method. I would also like to explain why I started with “GAPP” in the first place and what the method adds over other methods.
“GAPP” stands for General Approach Performance Profiling and can be used to find out which parts of your architecture explain most of the wait time variance of your business process. Even with very little data, and in highly complex technical infrastructures, “GAPP” makes it possible to find the performance bottlenecks for a specific business process. The nice thing about the method is that it can pinpoint not only a bottleneck that is already there, but also a future bottleneck in a system that is still running normally. To my knowledge, this is something only “GAPP” can do.
What makes “GAPP” special:
- The method can analyse the full infrastructure, from front end to back end
- The method does not focus on one piece of the infrastructure, such as only the database
- The method is able to predict how the response time of a business process will react to changes in the involved factors
- The method is able to predict when a certain bottleneck will evolve into a real problem
Another big advantage of the method is the gaps it can bridge:
- not being allowed to trace
- not being allowed to change anything in the application (e.g. to get log data)
- not having a representative test environment
- an overly complex technical infrastructure (e.g. an SOA environment, or environments with more than 2 involved servers)
In the past I often saw organizations where, at the end of the week, the system administrators sent an email to their management with a response time graph of a business process (e.g. from LoadRunner) and some graphs of the CPU, memory and I/O of the database servers. When I looked at such emails I was not sure what to make of them. The only thing I could say was that most of the time the spikes in the response time graph were not explained by the other graphs in the email. This made me wonder whether it would be possible to explain the spikes in the response time based on data from the underlying architecture. That brought me to the idea of creating a method that could do this job. After a lot of customer cases and investigation, the method is worthy of the name “GAPP”. It is now so powerful that it works even if you have very little information, as mentioned in the bullets above. In many situations architectures are simply getting too complicated, or organizations have no way to reproduce problems in their test environments. The method makes it possible to pinpoint, with the available data, where in the architecture a certain performance bottleneck resides.
The method is also truly general, because it can analyze any architecture. It consists of four main steps:
1) Data Collection
- Which parts of the technical infrastructure are involved in the business process
- What data is available and in which time granularity
2) Data Synchronization
- Aggregation of all data to the lowest granularity found among the sources
- Decision on whether the data granularity of each factor is high enough
3) Data Mining
- Factorial Analysis
- Model Creation
4) Data Interpretation
- Mapping results with architecture knowledge
- Comparison of the results with measured data
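To make the synchronization step a bit more concrete, here is a minimal Python sketch of how samples with different granularities could be averaged into common time buckets and then joined. This is only an illustration under my own assumptions, not the tooling used for “GAPP”: the function names, the series names and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, bucket_seconds):
    """Average (timestamp, value) samples per time bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Map every timestamp to the start of the bucket it falls into.
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: mean(values) for start, values in buckets.items()}


def synchronize(series, bucket_seconds):
    """Join several named series on the buckets they all have in common."""
    aggregated = {name: aggregate(s, bucket_seconds) for name, s in series.items()}
    common = set.intersection(*(set(a) for a in aggregated.values()))
    return [
        {"bucket": b, **{name: a[b] for name, a in aggregated.items()}}
        for b in sorted(common)
    ]


# Hypothetical input: response times every 30 s, CPU figures every 60 s.
series = {
    "response_time": [(0, 1.0), (30, 3.0), (60, 2.0), (90, 4.0)],
    "cpu_busy": [(0, 40.0), (60, 80.0)],
}

# The coarsest source (60 s) dictates the bucket size.
rows = synchronize(series, bucket_seconds=60)
for row in rows:
    print(row)
```

The bucket size is set by the coarsest source. If that turns out to be too coarse to see the response time spikes at all, the factor is probably not usable at that granularity.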
The data used in the analysis can come from any source. It is important to have the response times of the involved business process over time. Data from each server in the chain is important too; this can come from sar, vmstat, nmon, etc. For the involved database(s), AWR or Statspack data, for example, can be used. All this data is brought together in one place, where it can then be mined. This is where it gets really interesting. At the moment the method uses Oracle Data Mining (ODM). Although there are open source data mining engines, the ODM engine is very easy to use and very powerful.
For my presentation on Sunday I had invited Charlie Berger, Senior Product Manager for Oracle Data Mining. After the presentation he was pleasantly surprised by the use of Oracle Data Mining for my performance profiling method. We had a very nice chat, and later that week we met again at the Oracle Demo Grounds here at Oracle Open World in San Francisco.
Charlie Berger and me on the Oracle Demo Ground for Oracle Data Mining
Oracle Data Mining is very powerful, and “GAPP” makes more and more use of the great algorithms in ODM. In the market it is still not widely known how powerful this software actually is. To give you an idea of why ODM is so powerful, keep the following in mind:
- ODM is as powerful as products like SAS and SPSS.
- ODM has the big advantage of being part of the database, so it is not necessary to move the data to be analyzed to another place.
- So no security issues, the data stays where it is.
- The ODM GUI keeps getting better, and it can easily be obtained from OTN.
- ODM is also very powerful from SQL, e.g. through DBMS_PREDICTIVE_ANALYTICS, which I personally used a lot for “GAPP”
- Models are stored in the database and can be used right away in the queries
- 50 of the statistical functions are free to use
- ODM itself has a license fee which is pretty cheap compared with SAS or SPSS.
The picture below gives an impression of what the GUI tool Oracle Data Miner looks like:
I have personally used “GAPP” for quite a long time now (more than two years). At first it was in an experimental state, but I now mention the method by name when I use it for a customer. To give you an impression of how powerful the method can be, I will use the example of a very recent case (September 2008) that I investigated with “GAPP”.
At the customer, the “Employee information” screen in their application showed a very high variance in response time. Sometimes the response times were so bad that the screen was unacceptable to work with. Because of this issue, adding more users to the system was not an option. First, an investigation was needed into what caused the large variance in response time. The large number of components in the technical infrastructure, and the fact that the infrastructure was shared with many other applications, made it very hard for the customer to decide how to proceed. Normal tracing was also very difficult to do. The technical infrastructure at the customer site looked (simplified) like the following:
The business process went through many steps, such as:
- Request from client
- Going through a firewall
- Going through a webserver
- Going through a portal server (including a portal database call)
- Going through single sign on via Oracle Internet Directory (including repository database call)
- Going through ADF application
- Retrieving information from database
- Etc, etc.
Although I will not put the complete customer report in this blog, I would like to show the following factorial analysis, which I did based on the data gathered from all the involved systems and the response time of the business process gathered from the ADF application.
The graph shows only the important factors on the x-axis; the other factors, approximately 180 in total, are not really important. Based on the factorial analysis, the following graph of the response time and its prediction can be created:
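The general idea of ranking factors by how much of the response time variance they explain can be sketched in a few lines of Python. Please note that this is not the algorithm ODM uses. As a simple stand-in I use the squared Pearson correlation of each factor with the response time, and the factor names and numbers are made up for the illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series cannot explain any variance
    return cov / sqrt(vx * vy)


def rank_factors(response, factors):
    """Sort factors by squared correlation with the response time, highest first."""
    scores = {name: pearson(values, response) ** 2 for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, already synchronized data (one value per time bucket).
response = [1.0, 2.0, 3.0, 4.0]
factors = {
    "db_cpu": [10.0, 20.0, 30.0, 40.0],  # moves exactly with the response time
    "web_io": [1.0, 3.0, 2.0, 4.0],      # moves partly with the response time
    "app_mem": [5.0, 5.0, 5.0, 5.0],     # constant, so it explains nothing
}

ranking = rank_factors(response, factors)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A single-factor score like this ignores interactions between factors, which is one of the reasons a real data mining engine is worth using for the actual analysis.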
The model shows that some of the spikes cannot be explained by the analyzed factors. This is because some factors that did have an influence were not part of the analysis. Even so, the prediction is pretty good most of the time. This makes it possible to use measured metrics to predict the performance the end user experiences. So basically “GAPP” gives you an instrument to see what the end-user experience is like just by looking at the metrics.
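Comparing a prediction with the measured response times can be illustrated with a small Python sketch. It computes the share of variance the prediction explains (R squared) and lists the samples where the measured value lies far above the prediction, which are candidates for a factor missing from the analysis. The numbers and the threshold are invented for this example and have nothing to do with the customer case.

```python
def r_squared(measured, predicted):
    """Share of the variance in the measured values explained by the prediction."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant, R squared is undefined")
    return 1.0 - ss_res / ss_tot


def unexplained_spikes(measured, predicted, threshold):
    """Indexes where the measured value exceeds the prediction by more than threshold."""
    return [
        i
        for i, (m, p) in enumerate(zip(measured, predicted))
        if m - p > threshold
    ]


# Hypothetical response times in seconds, one value per time bucket.
measured = [1.0, 2.0, 6.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0, 2.0, 3.0]

score = r_squared(measured, predicted)
spikes = unexplained_spikes(measured, predicted, threshold=1.0)
print(f"R squared: {score:.2f}, unexplained spikes at buckets: {spikes}")
```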
In this customer case I was able to give them advice on what to do in their environment to cope with the extra users they wanted to serve. In this case the model showed that problems with memory on GH200 and GH300 would lead to unexpected performance drops in their application. The model can also predict how big these impacts are. The whole analysis took three days.
If you want to go a step further with “GAPP”, you can also change factors to other values to predict how the response time of the business process will react. This makes “GAPP” really powerful and sets it apart from other methods.
Currently I am very busy finishing my white paper on “GAPP” and making the approach easier to use in short performance tuning assignments. I really hope that people start to see the big advantage of the method and understand that, with increasing complexity, “GAPP” will be the right method to use.
If you have any requests regarding the method, please send me an email (see the About section).
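A "what if" prediction of this kind can be shown with a very small Python example. It fits a straight line through one factor and the response time using ordinary least squares, and then asks the model what the response time would be for a factor value that has not been seen yet. The real models in “GAPP” are built with ODM on many factors; the single factor, the names and the numbers here are assumptions purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the factor is constant, no line can be fitted")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predict(model, x):
    """Predicted response time for factor value x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical measurements: memory usage (%) against response time (s).
memory_used_pct = [50.0, 60.0, 70.0, 80.0]
response_s = [1.0, 1.5, 2.0, 2.5]

model = fit_line(memory_used_pct, response_s)

# What if memory usage grows to 90%?
what_if = predict(model, 90.0)
print(f"predicted response time at 90% memory: {what_if:.2f} s")
```

Extrapolating outside the measured range, as done here, is always risky. A linear fit will not see a knee in the curve that only shows up at higher load.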