Background & Overview
Tracing Oracle sessions and reading the resultant trace files is one of the key tools in the Oracle DBA's armoury. Yet, from experience, it seems that many DBAs still do not trace sessions when encountering problems.
This real life example looks at an intermittent real life problem that perplexed an application team for several months. A very simple application problem which caused a major issue to a company’s user population was solved in minutes once a trace of the offending session was initiated.
Description of Problem
Every so often a companies warehouse application would suddenly become very slow and impact on the users ability to manage the companies inventory. Despite various application analyst looking at the problem over a number of weeks, none of them could pinpoint the issue or solve the problem.
As a result of this inability to find the underlying cause of the problem, the application analysts decided to write a standard operating procedure that describe the symptoms of the problem and stated that the fix for this problem was to bounce the application.
This "sledge hammer to crack a nut" example is sure to ring bells with both DBAs and application analysts. However, this needn't be the case.
Using the Trace Script found on this site and analysing the resultant trace file produced by this script, the problem described above was rapidly identified and resolved.
Summary of Analysis and Findings
On Monday 10th May between 10am and 10:13am the session being traced on mysid ran the following SQL statement.
update Task set Priority=:b0,Update_DStamp=SYSDATE
where
(Task_Key=:b1 and Priority=:b2)
This statement ran 445,363 times. Each time it ran bind variable “:b0” was equal to 2, bind variable “:b1” was equal to 16755645 and bind variable “:b2” was equal to 3.
No other statements were run by this session across this time period and none of the bind variables changed. This indictaed abnormal application behaviour and was passed to the vendor. The vendor confirmed a bug and released a code fix.
Below are the indepth details of the analysis undertaken.
Overview of Trace Files
The trace, using the trace script, was performed for approximately 13 minutes. The trace resulted in a single trace file being created called mysid_ora_20000.trc. The trace level defined in the script was performed at level 12, which results in all bind variables and their values being outputted to the trace file.
This trace file was then processed using the Oracle utility TKPROF. This utility analyses the trace file and outputs a processed trace file detailing all of the statements run both by Oracle the database (recursive SQL) and the User (User SQL). This file was named mysid_ora_20000.prof.
Analysis of the TKPROF Output (mysid_ora_20000.prof)
The output of the TKPROF file mysid_ora_20000.prof is listed below. This file clearly shows (highlighted in red) that during the 13 minute trace period, only 1 unique SQL statement was run, but this SQL statement was executed 445,363 times. No recursive SQL statements were run. Confirmation of this fact is given in the summary at the end of the trace file (Highlighted in Purple).
mysid_ora_20000.prof
myhost: cat mysid_ora_20000.prof
TKPROF: Release 10.2.0.4.0 - Production on Mon May 10 10:30:29 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Trace file: mysid_ora_20000.trc
Sort options: default********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************
update Task set Priority=:b0,Update_DStamp=SYSDATE
where
(Task_Key=:b1 and Priority=:b2)
aacall aaaaaaacount aaaaaaaacpu aaaaaaaaelapsed aaaaaadisk aaaaaaaaaquery aaaaaaacurrent aaaaaaarows
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Parseaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
Executeaaaaa445363aaaaaa433.09aaaaaaaa421.69aaaaaaaaa0aaaaaaaa890726aaaaaaaaaaaa0aaaaaaaaaaa0
Fetchaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Total aaaaaaa445363aaaaaa433.09aaaaaaaa421.69aaaaaaaaa0aaaaaaaa890726aaaaaaaaaaaa0aaaaaaaaaaa0Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 28Elapsed times include waiting on following events:
Event waited onaaaaaaaaaaaaaaaaaaaaaaaTimesaMax. WaitaTotal Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to clientaaaaaaaaaaaaa445363aaaa0.00aaaaaaa0.95
SQL*Net message from clientaaaaaaaaaaa445363aaaa0.01aaaaaaa234.14
latch: cache buffers chainsaaaaaaaaaaaaa1aaaaaaaaa0.00 aaaaa 0.00
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
aacall aaaaaaacount aaaaaaaacpu aaaaaaaaelapsed aaaaaadisk aaaaaaaaaquery aaaaaaacurrent aaaaaaarows
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Parseaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
Executeaaaaa445363aaaaaa433.09aaaaaaaa421.69aaaaaaaaa0aaaaaaaa890726aaaaaaaaaaaa0aaaaaaaaaaa0
Fetchaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Total aaaaaaa445363aaaaaa433.09aaaaaaaa421.69aaaaaaaaa0aaaaaaaa890726aaaaaaaaaaaa0aaaaaaaaaaa0Misses in library cache during parse: 0
Elapsed times include waiting on following events:
Event waited onaaaaaaaaaaaaaaaaaaaaaaaTimesaMax. WaitaTotal Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to clientaaaaaaaaaaaaa445363aaaa0.00aaaaaaa0.95
SQL*Net message from clientaaaaaaaaaaa445363aaaa0.01aaaaaaa234.14
latch: cache buffers chainsaaaaaaaaaaaaa1aaaaaaaaa0.00 aaaaa 0.00
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTSaacall aaaaaaacount aaaaaaaacpu aaaaaaaaelapsed aaaaaadisk aaaaaaaaaquery aaaaaaacurrent aaaaaaarows
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Parseaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
Executeaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
Fetchaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0
------- aaaaa------ aaaaaa-------- aaaaa---------- aa---------- aaaa---------- aaaa---------- aaa----------
Totalaaaaaaaaaa0aaaaaaaaaa0.00aaaaaaaaaa0.00aaaaaaaaa0aaaaaaaaaaaaa0aaaaaaaaaaaa0aaaaaaaaaaa0Misses in library cache during parse: 0
1 user SQL statements in session.
0 internal SQL statements in session.
1 SQL statements in session.
********************************************************************************Trace file: mysid_ora_20000.trc
Trace file compatibility: 10.01.00
Sort options: default1 session in tracefile.
1 user SQL statements in trace file.
0 internal SQL statements in trace file.
1 SQL statements in trace file.
1 unique SQL statements in trace file.
8907345 lines in trace file.
797 elapsed seconds in trace file.
Analysis of the Raw Trace Output (mysid_ora_20000.trc)
Having established that there was only one statement run during the 13 minute tracing period, it is important to identify whether the bind values for that statement were changing or staying the same. If changing bind variables are found then this tends to suggest “normal” application behaviour. Finding bind variables that are static tends to suggest “abnormal” application behaviour, or more particular a looping session, although clearly any findings would have to be confirmed by the application vendor.
Analysis of the first 60 lines of the raw trace file (See below) shows that the first two occurrences of bind variables are indeed reoccurring (Highlighted in green). To confirm this is the case throughout the file a UNIX grep statement was run against the file collecting up all binds and values. This was then pushed through a UNIX sort statement to group values and binds together before being piped through the UNIX uniq statement to remove duplicate strings.
If the binds and values are the same throughout the raw trace file, then the output of the grep statement should produce 6 lines, 3 Lines for the three binds and three lines for the three values of each bind. This is exactly what is found. (See the output of the grep statement below in red).
mysid_ora_20000.trc
myhost: head -60 mysid_ora_20000.trc
/u01/app/oracle/admin/mysid/udump/mysid_ora_20000.trcOracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0
System name:aaaSunOS
Node name:aaaaaamyhosy
Release:aaaaaaaa5.10
Version:aaaaaaaaGeneric_xxxxxx-xx
Machine:aaaaaaaasun4v
Instance name: mysid
Redo thread mounted by this instance: 1
Oracle process number: 67
Unix process pid: 25522, image: oracle@myhost*** 2010-05-10 08:53:45.281
*** ACTION NAME:()
*** MODULE NAME:(mod1@myhost (TNS V1-V3))
*** SERVICE NAME:(mysid)
*** SESSION ID:(928.8439)
BINDS #2:
kkscoacd
Bind#0
aoacdty=96 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=01 csi=873 siz=88 off=0
akxsbbbfp=ffffffff7b243350 bln=32 avl=01 flg=05
avalue="2"
Bind#1
aoacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=32
akxsbbbfp=ffffffff7b243370 bln=22 avl=05 flg=01
avalue=16755645
Bind#2
aoacdty=96 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=01 csi=873 siz=0 off=56
akxsbbbfp=ffffffff7b243388 bln=32 avl=01 flg=01
avalue="3"
=====================
PARSING IN CURSOR #2 len=104 dep=0 uid=28 oct=6 lid=28 tim=15987803131908 hv=524080716 ad='b28aa6c0'
update Task set Priority=:b0,Update_DStamp=SYSDATE where (Task_Key=:b1 and Priority=:b2)
END OF STMT
EXEC #2:c=10000,e=2072,p=0,cr=2,cu=0,mis=0,r=0,dep=0,og=4,tim=15987803131894
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=9963 tim=15987803132199
WAIT #2: nam='SQL*Net message from client' ela= 303 driver id=1413697536 #bytes=1 p3=0 obj#=9963 tim=15987803132630
BINDS #2:
kkscoacd
Bind#0
aoacdty=96 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=01 csi=873 siz=88 off=0
akxsbbbfp=ffffffff7b243350 bln=32 avl=01 flg=05
avalue="2"
Bind#1
aoacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=32
akxsbbbfp=ffffffff7b243370 bln=22 avl=05 flg=01
avalue=16755645
Bind#2
aoacdty=96 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
aoacflg=00 fl2=1000000 frm=01 csi=873 siz=0 off=56
akxsbbbfp=ffffffff7b243388 bln=32 avl=01 flg=01
avalue="3"
Grep Command and Output
myhost: egrep -i "Bind#|value=" mysid_ora_20000.trc |sort |uniq
value="2"
value="3"
value=16755645
Bind#0
Bind#1
Bind#2