Friday, 27 February 2015

Elapsed time per exec of a SQL - AWR report puzzle!

I recently faced an issue when the user was complaining that a particular SQL was taking more time than expected. It usually runs for 40-50 minutes but on an unfortunate day it took more time.

AWR reports have always been my savior but when I generated one I was mildly surprised. Read on.

AWR reports:
                  Snap Time
Begin snap      1:00
End snap         2:00

SQL ordered by Elapsed Time
Elapsed Time (s) Executions    Elapsed Time per Exec (s) %Total  %CPU     %IO SQL Id
2,900.00                          0                                             74.04        6.07     94.19     acbcdefg

I see that the SQL was still executing (obvious due to the face that the executions still showed zero) at the end of 2:00 am.

              Snap Time
Begin snap     2:00
End snap        3:00

SQL ordered by Elapsed Time
Elapsed Time (s) Executions   Elapsed Time per Exec (s)       %Total   %CPU    %IO SQL Id 
2,800.00                        1                  2,800.00                68.03      5.38    96.15      acbcdefg

I generated another one during the interval 2:00-3:00 expecting I would have the SQL completed in this duration.

There it is , as expected Executions = 1, which means the SQL has completed by the end of snapshot at 3:00 am. 

As Elapsed Time per Exec showed 2,800 seconds without checking further informed my user saying the SQL has completed in 2,800 seconds as it was earlier feeling relieved that there was nothing wrong with the SQL and I could get a good night's sleep.

Soon after I told the user he immediately called me at 3.30 am stating that the AWR isn't showing  accurate results and that the SQL took about 90 minutes to complete.

Woken up from my sleep, rubbing my ever so drooping eyes I had to login and had to check what exactly was going on.

So, I generated another AWR , this time for snapshots between 1:00 - 3:00 am , I saw that the SQL had taken 5,700 seconds which shook me up and I was wide awake!.

             Snap Time
Begin snap     1:00
End snap        3:00

SQL ordered by Elapsed Time
Elapsed Time (s) Executions   Elapsed Time per Exec (s)       %Total   %CPU    %IO SQL Id 
5,700.00                        1                  5,700.00                                71.09      5.73     95.12      acbcdefg

How could this be? why was the AWR I had taken earlier lying to me? My assumptions were in for a rude shock!

Moments later realized that AWR only picks information for that particular snapshot and doesn't show the whole truth. My mistake really.

To confirm the same, I joined DBA_HIST_SQLSTAT DBA_HIST_SNAPSHOT for a couple of snapshots and voila! , AWR wasn't lying after all !.

I immediately fixed the query with an SQL plan baseline and it ran fine there after.

The solution was quite simple. 

The main idea of this post was to emphasize on a minor detail that we often skip when your mind starts making assumptions. 

Assumptions are good but then again they can go horribly wrong and wake up at 3.30 in the morning! :)

I shared this information on OTN as well which is one of my favorite forums. Here is the link