AWR Data further mined with Method-GAPP

Last week I had the great opportunity to present on Method-GAPP again at UKOUG 2011 (see the UKOUG 2011 presentation). This time the focus of the presentation was partly on the multiple linear regression and partly, in particular, on AWR data. The multiple linear regression makes it possible to derive a linear equation for the end-user response time, which in turn makes it possible to get a complete breakdown of all the components involved in the end-user response time, as shown in the graph below. The graph shows the test and modelling from the white paper:

Breakdown of all the involved components for the end-user response time

In the breakdown, UTILR80 is the I/O utilization and UTILRAU is the CPU utilization. The breakdown shows that REST is essentially time that is always there, but it could be split out into more components if the model is enhanced, so that more of the variance found in the end-user response time (R) is explained.
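To make the regression step concrete, here is a minimal sketch of such a breakdown with hypothetical numbers. The variable names mirror the graph above, but the data and the plain least-squares fit are illustrative only and not the exact modelling from the white paper.

```python
import numpy as np

# Hypothetical, already-aligned samples: measured end-user response time R (s)
# and the utilization series found in step 1 (names mirror the graph above).
R       = np.array([0.82, 0.91, 1.10, 1.35, 1.02, 1.48])
UTILR80 = np.array([0.20, 0.25, 0.35, 0.50, 0.30, 0.55])  # I/O utilization
UTILRAU = np.array([0.30, 0.35, 0.45, 0.60, 0.40, 0.70])  # CPU utilization

# Multiple linear regression: R ~ b0 + b1 * UTILR80 + b2 * UTILRAU.
# The intercept b0 plays the role of REST: time that is always there.
X = np.column_stack([np.ones_like(R), UTILR80, UTILRAU])
(b0, b1, b2), *_ = np.linalg.lstsq(X, R, rcond=None)

# Breakdown of one observation into its components.
print(f"REST (always present): {b0:.3f} s")
print(f"I/O part at 35% util : {b1 * 0.35:.3f} s")
print(f"CPU part at 45% util : {b2 * 0.45:.3f} s")
```

Each term of the fitted equation then gives the contribution of one component to R, and the intercept gives the REST time.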

So Method-GAPP always works, in a way, in two important steps:

1. Which components of the technical infrastructure are involved (mostly based on primary components)? Is it, for example, the database server, the application server, etc.?

2. Which details are involved within the component found in step 1? For example, the involved SQL and the involved wait events in the database.

To continue with the second step: as we already know from step 1 that the database server is the most involved component, we can start focusing on AWR data. As shown in the white paper and in an earlier blog post, the AWR data can be seen as secondary component data in the method.

While gathering the end-user response time (R) data from the client, the data gathered in AWR from the views DBA_HIST_SQLSTAT (columns elapsed_time_delta/executions_delta) and DBA_HIST_SYSTEM_EVENT (column time_waited_micro_fg) can help to find relations between R and the involved SQL and wait events. Below is a sketch of how this data could be pulled per snapshot, followed by a graph with all the involved SQL statements.
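The sketch is illustrative only and not the exact extraction used for the case above: the connection details, the snapshot range and the python-oracledb driver are assumptions, and because DBA_HIST_SYSTEM_EVENT holds cumulative values, deltas are taken between consecutive snapshots (note that querying the DBA_HIST_* views requires the Diagnostics Pack licence).

```python
import oracledb  # python-oracledb thin driver

# Hypothetical connection and snapshot range.
conn = oracledb.connect(user="perf", password="secret", dsn="dbhost/orclpdb1")
begin_snap, end_snap = 1000, 1050

# Average elapsed time per execution, per SQL_ID, per snapshot.
sql_stat = """
    select snap_id, sql_id,
           elapsed_time_delta / nullif(executions_delta, 0) as avg_elapsed_us
    from   dba_hist_sqlstat
    where  snap_id between :b and :e
"""

# Foreground wait time per event, per snapshot; the column is cumulative,
# so take the difference between consecutive snapshots.
sys_event = """
    select snap_id, event_name,
           time_waited_micro_fg
             - lag(time_waited_micro_fg)
               over (partition by event_name order by snap_id) as waited_delta_us
    from   dba_hist_system_event
    where  snap_id between :b and :e
"""

with conn.cursor() as cur:
    sql_rows   = cur.execute(sql_stat,  b=begin_snap, e=end_snap).fetchall()
    event_rows = cur.execute(sys_event, b=begin_snap, e=end_snap).fetchall()
```

From here the per-SQL and per-event series can be lined up, snapshot by snapshot, against the R samples gathered at the client.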

Mining SQL information via Method-GAPP

The graph shows that the statements found to be involved in the transactions of the researched case are indeed statements that do work for those transactions. The found statements are also prominent in the AWR report, which shows that Method-GAPP delivers, via statistics, information similar to what a hard measurement such as an AWR report gives. Method-GAPP pulls out the SQL that is actually involved in the researched end-user process, rather than simply showing the SQL that was used most on that particular database; a simple sketch of such a statistical ranking follows below.
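The sketch ranks the per-SQL series by how strongly they move together with the measured R. Plain correlation is used here only as an illustrative stand-in for the method's statistical modelling, and the SQL_IDs and numbers are made up.

```python
import numpy as np

# Snapshot-aligned series (hypothetical values): end-user response time R and
# per-SQL elapsed time deltas for the same snapshots, rescaled to seconds.
R = np.array([0.82, 0.91, 1.10, 1.35, 1.02, 1.48])
sql_elapsed = {
    "7hxqs9d12abcd": np.array([0.30, 0.34, 0.45, 0.58, 0.40, 0.63]),  # moves with R
    "0kfz55u98wxyz": np.array([0.90, 0.88, 0.91, 0.89, 0.92, 0.90]),  # heavy but unrelated
}

# Statements whose elapsed time moves with R are the ones involved in the
# researched end-user process, even if other statements burn more total time.
ranking = sorted(
    ((float(np.corrcoef(series, R)[0, 1]), sql_id)
     for sql_id, series in sql_elapsed.items()),
    reverse=True,
)
for corr, sql_id in ranking:
    print(f"{sql_id}: correlation with R = {corr:.2f}")
```

The same ranking can be applied to the per-event series from DBA_HIST_SYSTEM_EVENT.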

If we look at the wait events we indeed find a similar picture; the graph is shown below.

Wait events mined by Method-GAPP

The found SQL and wait events can help to optimize the end-user process, by tuning SQL or making other changes. In general we can say that there are three ways to optimize a system:

1. Scheduling: do tasks at other moments. In the end a system is always sized for peak utilization, so by scheduling tasks better we spread the workload more evenly over the available capacity. The amount of resources per task does not change, but because the work is better distributed, the current capacity is used better and the system can ultimately handle more tasks than before.

2. Optimization: change application / SQL code so that fewer resources are used. In other words, more work can be done with the same capacity, because one task costs fewer resources than before.

3. Capacity: put more hardware in. This solution is usually fast, but usually also far from the best one. Nowadays a lot of customers have good agreements with hardware vendors, so when a customer needs capacity for a short period of time, the capacity is added only for that period and given back afterwards.

In general it can be said that optimizing and scheduling tasks is the best way forward, most of all because additional licence fees have to be paid when more machine capacity is added. Method-GAPP can help a lot as a structured way to do performance management and to find out where in a complicated technical infrastructure the problems are. Furthermore, it can help with capacity planning by making performance predictions before changing or adding hardware.

6 thoughts on “AWR Data further mined with Method-GAPP”

  1. I especially like that you highlight scheduling, in addition to individual task optimization, for accommodating peak load within service levels. In today’s “I want it now” society, we often lose sight of this.

    Urging people to create application suites that routinely include specification of deadline goals and elapsed time expectations for longer jobs will ultimately allow automation of load shifting of non-interactive and “real soon now” jobs to ebb load periods for execution.

    Automating this sort of workshift planning against the historical record can then become a usable tool for avoiding the purchase of unneeded capacity by reducing the actual experienced peak demand.

    Bravo.

  2. Hi Mark, thank you for your feedback. Cary Millsap once showed these three ways to optimize a system in a presentation. The idea has never let go of me, and I still find it very useful in my daily job of advising on and optimizing systems for performance.

    In the past I worked a lot with Oracle E-Business Suite, and in this application there are concurrent managers, which are processes that schedule jobs. By optimizing the workshifts and roles of these managers you can get a much better distribution of the tasks over the day. The same principle is used in several other applications, and of course Oracle Resource Manager is also an implementation of this kind of workload scheduling. It is something a lot of people forget about, but it is often a very cheap yet very effective way of optimizing a system.

    Regards,
    Gerwin

  3. G’day Gerwin, I’m having some problems with your slides.

    p.28 In what sense is this a “corrected” M/M/1 curve? M/M/1 is a hyperbola. You show a linear plot as a function of some kind of unidentified nonlinear x-axis (not logarithmic either). Please explain.

    p.29 The red M/M/1 curve appears to be incorrect. The service time is roughly 145 ms, i.e., the shortest possible response time. At 50% utilization, the M/M/1 response time should be TWICE that value. Your plot shows response time of only about 150 ms. (cf. correct M/M/1 plot on p.34)

    p.35 Although these data have the correct convex shape, they can’t possibly be modeled as an M/M/4 queue. Which begs the question: Where is the modeling curve on that slide?

    –njg

  4. Hi Neil, it is great to hear from you. I really enjoyed our discussion last time at HOTSOS 2011 in Dallas. I would be glad to answer the questions you asked me.

    The first question, regarding slide 28: I agree that it looks a bit odd, but slides 27 and 28 are in the presentation to explain that the data modelling step is actually used to turn the curved relationship between utilization and normalized response time (R/S) into a straight line. This can only be done by correcting the x-axis of the graph. The far right end of the x-axis will never reach 1.0, while the y-axis goes to infinity. So the graph is only a plot of some utilizations on the x-axis plotted against the normalized response time (R/S) as a straight line. That is why the scale looks so weird.

    The second question, regarding slide 29: you would be right if the service time (S) were the value on the y-axis at utilization “0%”. But that is not the case here: in this example there is a lot of unaccounted time on top of the service time (S), which means that the graph (see slide 34) is in fact shifted up. See also pages 19, 20 and 21 of the white paper (v1.0): the service time (S) is around 6-7 ms. We had a long discussion about the service time at HOTSOS 2011, and I will soon show you some new data that will demonstrate that the service time (S) can be calculated correctly.

    The third question, regarding slide 35: I think this slide can be very confusing in a lot of ways. One of the things you said to me is that different utilizations appear to show the same “increase of response time”, like shadows. At the time I explained that this effect in the graph was caused by rounding. The other thing you refer to here is that it doesn’t look like an M/M/4 queue graph. That statement appears to be correct, but when you use the formula found in the white paper v1.0, in figure 22 on page 20, and only use the following part:

    16193.15 x Mcpu,n=4

    you are actually able to reproduce the plotted graph. Because the formula belonging to the graph is so odd, and an M/M/4 curve plotted into it would also look odd, I did not plot it. For presentation reasons it might be better not to present the slide at all, because it may cause confusion.

    Currently I am verifying more data and working on some real customer cases, and so far everything looks really promising. I would really like to do some tests with PDQ and verify the actual found service times, as we did at HOTSOS 2011.

    I really hope I gave you satisfying answers and hope to hear from you.

    Regards and thanks,

    Gerwin

  5. Gerwin,

    After 3 yrs (Hotsos 2008 http://goo.gl/5kk8S) or since Hotsos 2011 (9 months), depending on how one counts, I think it’s time that your claims were held to a higher standard. Otherwise, you are just saying the same thing over and over in a confusing manner.

    Because you are using *conventional* queueing models (M/M/m) and *conventional* statistical factor analysis (ODM), in an *unconventional* way (your claimed method), it is incumbent upon you to demonstrate your procedure more mathematically, not just repeating the same (possibly wrong) data analysis over and over. Data alone can never be totally convincing b/c you could be making a sequence of mistakes (some of which might even cancel each other out) and you would never know it. I can’t tell from reading your slides (these or any older versions), but several of your steps appear unconvincing at best, and possibly wrong, at worst.

    Moreover, this sort of informal exposition would never be acceptable in other scientific discourse. Yet, I believe you want your “method” to be treated as though it were scientifically legitimate. Therefore, you have to show clearly and rigorously how your procedure locks together step by step. These slides don’t accomplish that, IMHO.

    To paraphrase Carl Sagan: Extraordinary claims require extraordinary proof.

    –njg

  6. Hi Neil,

    Thank you for your comment again, and sorry for the very late response. I am currently extremely busy at a customer here in the Netherlands. Normally that would not be such a problem, but due to the long travel time to and from the customer I am making very long days at the moment.

    I agree with your statements, also about giving more mathematical proof of my claims, and I would like to do that. The point is that I am currently looking into productizing the method, and I think it is better to be a bit further along with that before explaining things in deeper detail.

    I hope you understand my concern.

    Regards,

    Gerwin
