When we're in a high CPU usage situation on Linux/Unix, usually we will use "top" or "prstat" (on Solaris) to find out which process is consuming a large percent of CPU. If we can find the top sessions in an Oracle database, it will be very useful in performance diagnosis. Also we can go by PID from "top" or "prstat" output, and then join V$PROCESS and V$SESSION to get the top sessions. This is not as convenient. In addition, you may ask how can you find the session which caused most of the "consistent gets" or "physical read"?

OTop can answer these questions in about 5 seconds. Oracle will collect a lot of statistics when running with TIMED_STATISTICS = TRUE, a parameter very useful in performance diagnosis. If we join V$SESSION and V$SESSTAT, we will be able to find top sessions ordered by any statistic name. Thus we can find which sessions are causing most of the consistent gets or physical read. As a matter of fact, I have been using OTop to resolve performance issues for many times. I initially wrote it in a few hours, and found it more powerful than I have imagined.

1, Free Download

2, Installation & Syntax

OTop is a small powerful utility. You must get Oracle client or server software installed on your machine. If you want to connect to a remote database, config the oracle network client also. Lastly, create a user ANYSQL with password ANYSQL and grant him CREATE SESSION and SELECT_CATALOG_ROLE privileges.

C:\MYDUL\utility>dcba -1 -h
AnySQL.net OTop for Oracle, Release 3.0.0
(c) Copyright Lou Fangxin (AnySQL.net) 2004-2006, all rights reserved.

Usage: dcba -[option][value]
          -{1|2|3|4} 1:OTop, 2:OPMon, 3:OTune, 4:OLoad
          -H    print help message
          -U    Oracle user name with select_catalog_role.
          -S    tns alias if remote database.
          -C    number of top SQLs displayed (min 5, max 16384)
          -Q    run in quiet mode, write to file
          -D    directory of output file in quiet mode
          -O    Order by (GETS/CPUT/DISK/SORT/WAIT/REDO/EXEC)
For OTop, the default time interval is 5 seconds, and the display count is 20. The default order is by "consistent gets". You can specify the order with "-o" option. There're some short terms for some common statistic names, as shown in the following:
  • DISK = physical reads
  • GETS = consistent gets
  • EXEC = execute count
  • CPUT = CPU used by this session
  • SORT = sorts (memory)
  • REDO = redo size
  • WAIT = enqueue waits
  • SENT = bytes sent via SQL*Net to client
    You can also specify the full statistic name with option -o"statistic name".

3, How to connect to database?

If we use SQL*Plus to connect to the database as "sqlplus anysql/anysql@prod", then you should run OTop as "dcba -1 -uanysql/anysql -sprod".
Find top sessions according to physical reads by "dcba -1 -uanysql/anysql -sprod -odisk ".
Find top sessions according to redo generation by "dcba -1 -uanysql/anysql -sprod -oredo ".
Find top sessions according to any statistics by "dcba -1 -uanysql/anysql -sprod -o"statistic_name" ".

4, Output Sample

 Top Session ( 16:54:47 - 16:54:53 ) Order by gets
-Statistics-------------------------------------------------------------------------------------
     1     0  logons cumulative   735   122  opened cursors        4     0  user commits
  8567  1427  user calls          826   137  CPU this session      0     0  enqueue waits
    48     8  db block gets      165K   27K  consistent gets    2739   456  physical reads
    55     9  db block changes      0     0  consistent change     0     0  physical writes
     0     0  DBWR buffers scan  2206   367  free buffer req       1     0  free buffer ins
     0     0  CR blocks created   17K  2959  redo size            37     6  redo blks written
  180K   30K  Table scan rows   1125K  187K  Table fetch rowid   744   124  Total parse count
     1     0  Hard parse count   3151   525  execute count     8025K 1337K  bytes via SQL*Net
   529    88  sorts (memory)        0     0  sorts (disk)

-Get/S--Mis/S--Slp/S-IGet/S-IMis/s-SGet/S--Latch------------------------------(Top Latch)-------
   18K    980     17      0      0    963  library cache
  5851    143      0      0      0    143  row cache objects
  7748     74      1      0      0     72  shared pool
  9475     35      1      0      0     34  library cache pin
  6248     26      0      0      0     26  library cache pin allocation

---SID-Serial--VALUE--VAL/S----PCT-----PrevSQL------CurSQL---Host-------------------------------
  1366  40755    61K    10K  37.28  2277403718           0   hostname (username)
  2031   1881    23K   3931  14.37  2277403718  2277403718   hostname (username)
   706  12041    23K   3926  14.35  2277403718  2277403718   hostname (username)
  2677  36301    22K   3768  13.77  2277403718  2277403718   hostname (username)
  1700  64805    10K   1803   6.59           0  1925564194   hostname (username)
   700  44726   2386    397   1.45           0           0   hostname (username)
   864  45587    799    133   0.49   755074748  3481933384   hostname (username)
   141  51698    772    128   0.47           0  3093740634   hostname (username)
   260  47501    711    118   0.43           0           0   hostname (username)
   604  50029    645    107   0.39  3093740634  3093740634   hostname (username)
  1086  42825    587     97   0.36           0  3093740634   hostname (username)
  2400  10629    544     90   0.33  3093740634  3093740634   hostname (username)
   683  23077    463     77   0.28           0  3093740634   hostname (username)
  2124   7060    415     69   0.25  3093740634           0   hostname (username)
   383  60489    400     66   0.24  2254641363  2254641363   hostname (username)
  1993  47707    365     60   0.22           0           0   hostname (username)
    18  21110    338     56   0.21           0           0   hostname (username)
   441  59330    334     55   0.20    97002241  2170752115   hostname (username)
  2313  33672    315     52   0.19           0  1907819273   hostname (username)
    98  27416    312     52   0.19           0           0   hostname (username)

All the values are the delta between the two time points. First part is the statistic change, the second part the top session.
You can see that the top 4 sessions caused most of the consistent gets. You can then find out the SQL text according to the SQL hash value, and then tune the bad SQL.
You will find that this utility is really a good tool for Oracle DBAs.

5, Notes

OTop will not cause noticeable load on the database server. I get the above report on a database server with more than 2000 concurrent sessions, and the query used in OTop can return in 1 second.