
Oracle 10.2.0.3 CRS - Missed Heart Beats Format in ocssd.log

February 4th, 2007 Alex Gorbachev

The Oracle CRS 10.2.0.3 patchset changed how CSS logs missed heartbeats.

Here is an example of how missed heartbeats are logged in ocssd.log in 10.2.0.3:

[    CSSD]2007-02-02 14:41:06.867 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 50% heartbeat fatal, eviction in 29.440 seconds
[    CSSD]2007-02-02 14:41:21.865 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 75% heartbeat fatal, eviction in 14.440 seconds
[    CSSD]2007-02-02 14:41:30.864 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 5.440 seconds
[    CSSD]2007-02-02 14:41:31.866 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 4.440 seconds
[    CSSD]2007-02-02 14:41:32.868 [1199618400] >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig
[    CSSD]2007-02-02 14:41:32.868 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 3.440 seconds
[    CSSD]2007-02-02 14:41:32.868 [1199618400] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2007-02-02 14:41:33.870 [1199618400] >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig
[    CSSD]2007-02-02 14:41:33.870 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 2.430 seconds
[    CSSD]2007-02-02 14:41:34.862 [1199618400] >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig
[    CSSD]2007-02-02 14:41:34.862 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 1.440 seconds
[    CSSD]2007-02-02 14:41:35.864 [1199618400] >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig
[    CSSD]2007-02-02 14:41:35.864 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 0.440 seconds
[    CSSD]2007-02-02 14:41:36.306 [1199618400] >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig
[    CSSD]2007-02-02 14:41:36.306 [1199618400] >TRACE:   clssnmPollingThread: Eviction started for node node1 (1), flags 0x000f, state 3, wt4c 0

Note that 10.2.0.2 would start logging each heartbeat miss from the second miss:

[ CSSD]2006-11-07 22:15:59.420 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(2) checkin(s)
[ CSSD]2006-11-07 22:16:00.422 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(3) checkin(s)
[ CSSD]2006-11-07 22:16:01.424 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(4) checkin(s)
[ CSSD]2006-11-07 22:16:02.426 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(5) checkin(s)
[ CSSD]2006-11-07 22:16:03.428 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(6) checkin(s)
[ CSSD]2006-11-07 22:16:04.430 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(7) checkin(s)

10.2.0.3 has a somewhat more user-friendly format and tells you when a potential eviction would occur, but it starts logging only after 50% of heartbeats are missed. This means that you won’t be aware of “short” interconnect instability if there is any. I would prefer something like “missed i checkins out of n”.
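If you need to scan logs from both patchsets, the two formats can be picked apart with a pair of regular expressions. This is only a minimal sketch based on the sample lines above; the function name `parse_heartbeat_line` and the dictionary keys are my own, and the patterns are assumptions that may not hold on other platforms or versions.

```python
import re
from datetime import datetime

# Common prefix of clssnmPollingThread lines, derived from the samples above.
_PREFIX = (
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[\d+\] >\w+:\s+clssnmPollingThread: "
    r"node (?P<node>\S+) \((?P<num>\d+)\) "
)
# 10.2.0.3 style: percentage of the timeout used and seconds left to eviction.
_NEW = re.compile(
    _PREFIX
    + r"at (?P<pct>\d+)% heartbeat fatal, eviction in (?P<secs>\d+\.\d+) seconds"
)
# 10.2.0.2 style: running count of missed checkins.
_OLD = re.compile(_PREFIX + r"missed\((?P<missed>\d+)\) checkin\(s\)")


def parse_heartbeat_line(line):
    """Return a dict for a missed-heartbeat line, or None for any other line."""
    for fmt, rx in (("10.2.0.3", _NEW), ("10.2.0.2", _OLD)):
        m = rx.search(line)
        if m is None:
            continue
        rec = {
            "format": fmt,
            "node": m.group("node"),
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        }
        if fmt == "10.2.0.3":
            rec["percent"] = int(m.group("pct"))
            rec["eviction_in"] = float(m.group("secs"))
        else:
            rec["missed"] = int(m.group("missed"))
        return rec
    return None
```

Feeding every line of ocssd.log through `parse_heartbeat_line` and keeping the non-None results gives one record per warning. Other lines, such as the "impending reconfig" traces, are skipped.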

The output above is from the Linux platform. It might be different on other operating systems. On Linux, even with 10g, you should have hangcheck-timer installed, but this is another topic and I might blog about it soon.
