Oracle 10.2.0.3 CRS - Missed Heart Beats Format in ocssd.log
Published by Alex Gorbachev February 4th, 2007 in Alex @ Pythian
Oracle CRS 10.2.0.3 patchset changed the logging of missed heartbeats by CSS.
Here is an example of how heartbeat misses are logged in ocssd.log in 10.2.0.3:
[ CSSD]2007-02-02 14:41:06.867 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 50% heartbeat fatal, eviction in 29.440 seconds
[ CSSD]2007-02-02 14:41:21.865 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 75% heartbeat fatal, eviction in 14.440 seconds
[ CSSD]2007-02-02 14:41:30.864 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 5.440 seconds
[ CSSD]2007-02-02 14:41:31.866 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 4.440 seconds
[ CSSD]2007-02-02 14:41:32.868 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig
[ CSSD]2007-02-02 14:41:32.868 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 3.440 seconds
[ CSSD]2007-02-02 14:41:32.868 [1199618400] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2007-02-02 14:41:33.870 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig
[ CSSD]2007-02-02 14:41:33.870 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 2.430 seconds
[ CSSD]2007-02-02 14:41:34.862 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig
[ CSSD]2007-02-02 14:41:34.862 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 1.440 seconds
[ CSSD]2007-02-02 14:41:35.864 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig
[ CSSD]2007-02-02 14:41:35.864 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 0.440 seconds
[ CSSD]2007-02-02 14:41:36.306 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig
[ CSSD]2007-02-02 14:41:36.306 [1199618400] >TRACE: clssnmPollingThread: Eviction started for node node1 (1), flags 0x000f, state 3, wt4c 0
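The "eviction in" field is easiest to read as a countdown to a fixed point in time. As a minimal sketch, adding the remaining seconds to each warning's timestamp gives the projected eviction time. The regex and the function name `projected_evictions` are my own illustrative choices, and the line layout is assumed from this single Linux sample:

```python
import re
from datetime import datetime, timedelta

# Pattern for the 10.2.0.3 WARNING lines shown above. The layout is an
# assumption taken from one Linux sample; other platforms may differ.
WARNING_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] "
    r">WARNING: clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, eviction in (?P<left>[\d.]+) seconds"
)

def projected_evictions(lines):
    """Yield (node, percent, projected eviction datetime) per WARNING line."""
    for line in lines:
        m = WARNING_RE.search(line)
        if not m:
            continue  # TRACE lines and anything else are skipped
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        # Log timestamp plus the remaining countdown = expected eviction time
        yield m["node"], int(m["pct"]), ts + timedelta(seconds=float(m["left"]))

sample = [
    "[ CSSD]2007-02-02 14:41:06.867 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 50% heartbeat fatal, eviction in 29.440 seconds",
    "[ CSSD]2007-02-02 14:41:32.868 [1199618400] >TRACE: clssnmPollingThread: node node1 (1) is impending reconfig",
    "[ CSSD]2007-02-02 14:41:35.864 [1199618400] >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 0.440 seconds",
]
for node, pct, eta in projected_evictions(sample):
    print(node, pct, eta.time())  # node1 50 14:41:36.307000, then node1 90 14:41:36.304000
```

Applied to every WARNING line in the log above, the projections all land between 14:41:36.300 and 14:41:36.308, which matches the "Eviction started" TRACE entry at 14:41:36.306.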
Note that 10.2.0.2 would start logging each heartbeat miss from the second miss:
[ CSSD]2006-11-07 22:15:59.420 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(2) checkin(s)
[ CSSD]2006-11-07 22:16:00.422 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(3) checkin(s)
[ CSSD]2006-11-07 22:16:01.424 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(4) checkin(s)
[ CSSD]2006-11-07 22:16:02.426 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(5) checkin(s)
[ CSSD]2006-11-07 22:16:03.428 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(6) checkin(s)
[ CSSD]2006-11-07 22:16:04.430 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(7) checkin(s)
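Because 10.2.0.2 logs from the second miss, even a short hiccup leaves a trace. A minimal Python sketch for spotting such traces reports the highest miss count logged for each node. It assumes the TRACE line layout shown above, and the pattern and the helper name `max_missed_per_node` are hypothetical:

```python
import re
from collections import defaultdict

# Pattern for the 10.2.0.2 TRACE lines shown above (layout assumed from
# this sample only; other platforms or patch levels may differ).
MISSED_RE = re.compile(
    r"clssnmPollingThread: node (?P<node>\S+) \(\d+\) missed\((?P<n>\d+)\) checkin\(s\)"
)

def max_missed_per_node(lines):
    """Return {node: highest missed-checkin count logged}."""
    worst = defaultdict(int)
    for line in lines:
        m = MISSED_RE.search(line)
        if m:
            worst[m["node"]] = max(worst[m["node"]], int(m["n"]))
    return dict(worst)

sample = [
    "[ CSSD]2006-11-07 22:15:59.420 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(2) checkin(s)",
    "[ CSSD]2006-11-07 22:16:04.430 [1107360096] >TRACE: clssnmPollingThread: node node1 (1) missed(7) checkin(s)",
]
print(max_missed_per_node(sample))  # {'node1': 7}
```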
10.2.0.3 has a somewhat more user-friendly format and tells you when a potential eviction would occur, but it starts logging only once a node reaches 50% of the fatal heartbeat threshold. This means that you won't be aware of "short" interconnect instabilities, if there are any. I would prefer something like "missed i checkin(s) out of n".
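To make the trade-off concrete, here is a toy model in Python. It assumes one checkin per second (both samples log at roughly one-second intervals), that 10.2.0.2 logs from the second miss and 10.2.0.3 from 50% of the threshold, and a hypothetical misscount of 60, which the 10.2.0.3 sample roughly fits (50% reached with about 29.4 seconds left). The helper names and the message text in `preferred_message` are illustrative only, not an Oracle format:

```python
def logged_by(missed: int, misscount: int) -> dict:
    """Toy model: does a run of `missed` checkins show up in ocssd.log?

    Assumptions (inferred from the two sample logs, not from Oracle docs):
    one checkin per second, 10.2.0.2 logs from the 2nd miss,
    10.2.0.3 logs only once 50% of `misscount` has been reached.
    """
    return {
        "10.2.0.2": missed >= 2,
        "10.2.0.3": missed >= misscount * 0.5,
    }

def preferred_message(node: str, node_num: int, missed: int, misscount: int) -> str:
    """Hypothetical message in the 'missed i checkin(s) out of n' style."""
    return (f"clssnmPollingThread: node {node} ({node_num}) "
            f"missed {missed} checkin(s) out of {misscount}")

# A 20-second interconnect hiccup with an assumed misscount of 60:
print(logged_by(20, 60))  # {'10.2.0.2': True, '10.2.0.3': False}
print(preferred_message("node1", 1, 20, 60))
```

Under these assumptions, a 20-second interconnect problem appears in a 10.2.0.2 log but leaves no trace at all in 10.2.0.3.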
The output above is from the Linux platform and might be different on other operating systems. On Linux, even with 10g, you should have hangcheck-timer installed, but that is another topic and I might blog about it soon.

