数据库的事务数高了, 再好的机器, 也顶不了多少, Log Write这块会成关键, 在一个每秒有几百上千个事务的库上, 发现了这样的负载现象, Rq是Run Que, Act是Active Session.
www.AnySQL.net Load SY/WT/US Rq Act
07/23-22:18:02 6.09 11/27/31 64 16
07/23-22:18:12 6.43 10/29/29 48 41
07/23-22:18:22 5.82 7/42/21 39 1173
07/23-22:18:32 6.07 12/41/34 72 196
07/23-22:18:42 5.62 8/52/24 52 329
07/23-22:18:52 5.41 16/27/35 60 50
07/23-22:19:02 5.03 13/27/33 44 26
看一下V$SYSTEM_EVENT中的内容, 就会明白了, 206是单块读事件, 203是log file sync事件.
07/23-22:18:02 206-32714:206455:63, 203-12736:14704:11
07/23-22:18:12 206-30236:227577:75, 203-11799:12400:10
07/23-22:18:22 203-8999:1922849:2136, 206-19601:358029:182
07/23-22:18:32 206-38475:3520870:915, 203-16434:1425936:867
07/23-22:18:42 203-12308:3448652:2801, 206-27495:3031272:1102
07/23-22:18:52 206-34469:322209:93, 203-13581:107794:79
07/23-22:19:02 206-32916:234741:71, 203-13072:16812:12
两周前Oracle的人来交流时, 我反映过这个问题, 他们说有几个和log file sync超长时间等待有关的Bug. 我也向他们要这些Bug的patch了, 可还没有发送给我们, 有些bug, 发现的晚了, 在metalink是找不到老版本的patch的, 需要特殊对待.
Bug 5065930 - "log file sync" timeouts can occur
Bug 3748799 - Average "log file sync" wait per transaction greater than 1 (no timeouts)
Bug 5087592 - "log file sync" waits from read only commits
Bug 6319685 - LGWR posts do not scale on some platforms
上面是我自已搜索到的相关Bug, 和交流时Oracle工程师给我看的有些不一样.
留言 (3)
What's your version? If I were you, I would talk to the Oracle support and apply one patch at a time (if needed), to see which one of them can solve the problem, and stop applying more if the problem is gone.
Posted by Yong Huang | Jul 26, 2008 1:38 AM
I don't have down time to test the patch, and I have create a SR for this.
Posted by anysql | Jul 26, 2008 7:24 AM
Work with Oracle support to narrow down to the exact problem so you only apply the needed patches. It's possible you can do that by taking a few error stacks and compare the stack trace with those in the bug reports. You can use oradebug errorstack or an OS level debugger to do it.
Posted by Yong Huang | Jul 30, 2008 2:54 AM