CRS eating CPU on VMware

Some time ago (yeah… shame on me) I mentioned having trouble running CRS on virtual machines under VMware Server. I found a solution a while ago and, since I promised to share anything I found, now is the time.

First of all, I’m happy to admit that my observation that Windows-hosted VMs ran better than Linux-hosted ones was wrong. Indeed, how can Windows run faster than Linux?! ;-)

I used VMware Server 1.0.3. The host OS was either 64-bit Ubuntu or 32-bit Windows. The guest OS was 32-bit Oracle Enterprise Linux 4 (a la Larry Hat 4). As you will see later, I also tried VMware Workstation 6.0 without any visible improvement. For shared storage I used either NFS exports from the host OS (on Ubuntu) or Openfiler (on Windows, which meant even more CPU saturation).

To recall the problem… The virtual machine started to eat CPU like crazy as soon as I started CRS inside it. Even without an Oracle database, just starting CRS was enough. I could see the vmware-vmx process consuming about 60% of one CPU core (AMD Athlon64 3800 X2). Inside the virtual machine, top showed init.cssd only from time to time, while average CPU consumption jumped between 10% and 90% without any visible process accounting for it. I ran strace on the vmware-vmx processes in the host OS but could only see that most of the time was spent in the poll system call.
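To check where a traced process spends its time, the strace output can be totalled per system call. Here is a minimal Python sketch, assuming the trace was captured with strace's -T option, which appends the time spent in each call as <seconds> at the end of the line. The function name time_per_syscall is made up for illustration, and lines split by strace into "unfinished" and "resumed" halves are simply skipped.

```python
import re
from collections import defaultdict

# Matches e.g.:  poll([{fd=3, events=POLLIN}], 1, 1000) = 0 (Timeout) <1.001000>
# An optional "[pid 1234] " prefix (added by strace -f) is tolerated.
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$")

def time_per_syscall(lines):
    """Sum the <seconds> suffix per syscall name, largest total first."""
    totals = defaultdict(float)
    for line in lines:
        match = SYSCALL_LINE.match(line.strip())
        if match:  # signals, exit lines and split calls do not match
            totals[match.group(1)] += float(match.group(2))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the lines of a saved trace file returns a list of (syscall, seconds) pairs, so a dominant poll shows up as the first entry.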

It seemed there were some short-lived processes consuming CPU that I couldn’t track. I remember that on HP-UX I could easily catch them with glance, but Linux is behind as usual (or is it me who is “behind”?). After some time spent inside the init.cssd, init.crsd and init.evmd scripts, I tried increasing the sleep time in a couple of places and, I couldn’t believe it, the CPU was relieved: below 10% of a single core used per virtual machine.
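One rough way to catch short-lived processes on Linux is to poll /proc much faster than top refreshes and record every PID that appears. The sketch below is only that, a sketch: snapshot and watch_new_processes are hypothetical helper names, anything that lives for less than the polling interval is still missed, and a process caught between fork and exec shows its parent's command line.

```python
import os
import time

def snapshot(proc_root="/proc"):
    """Return {pid: command line} for every process visible right now."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # Arguments in cmdline are separated by NUL bytes.
        procs[int(entry)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return procs

def watch_new_processes(duration=10.0, interval=0.05, proc_root="/proc"):
    """Poll for `duration` seconds; return {pid: cmdline} of PIDs that appeared."""
    seen = set(snapshot(proc_root))
    new = {}
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for pid, cmd in snapshot(proc_root).items():
            if pid not in seen:
                seen.add(pid)
                new[pid] = cmd
        time.sleep(interval)
    return new
```

Running watch_new_processes() inside the guest while CRS is up and counting how often each command line appears should show whether the init scripts are spawning helpers in a tight loop.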

What I did was increase some sleep times in the /etc/init.d/init.cssd file. In 10.2.0.3 I had to do it in two places:
- line 1132: $SLEEP 1 -> $SLEEP 30
- line 1249: $SLEEP 5 -> $SLEEP 35
Actually, the first one should probably be enough.
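Since line numbers differ between patch levels, it is safer to change a line only after confirming it really contains the expected sleep. A minimal Python sketch of such an edit follows; patch_sleep is a hypothetical helper, not anything shipped with CRS, and it works on the script text so the original file can be backed up before anything is written back.

```python
import re

def patch_sleep(script_text, line_no, old_secs, new_secs):
    """Replace '$SLEEP <old_secs>' with '$SLEEP <new_secs>' on one line.

    line_no is 1-based. Raises ValueError if that line does not contain
    the expected sleep, so a script from another patch level is left alone.
    """
    lines = script_text.splitlines(keepends=True)
    # \b keeps '$SLEEP 1' from matching inside '$SLEEP 10'.
    pattern = re.compile(r"\$SLEEP\s+%d\b" % old_secs)
    index = line_no - 1
    if not 0 <= index < len(lines) or not pattern.search(lines[index]):
        raise ValueError("line %d does not contain $SLEEP %d" % (line_no, old_secs))
    lines[index] = pattern.sub("$SLEEP %d" % new_secs, lines[index], count=1)
    return "".join(lines)

# Example for the 10.2.0.3 line numbers given above (test systems only):
#   with open("/etc/init.d/init.cssd") as f:
#       original = f.read()
#   patched = patch_sleep(original, 1132, 1, 30)
#   # keep a copy of `original` somewhere safe before writing `patched` back
```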

After that CRS started to run very smoothly. Of course, this should never be done in a production environment, because it reduces the frequency of checks on critical processes. It’s fine in my test/research environment but should never be done anywhere else. I warned you. Anyway, you should be sane enough to avoid running RAC inside virtual machines.
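The trade-off can be put in rough numbers with a toy model. This is a sketch, assuming the loop performs exactly one check per sleep and that the check itself costs almost nothing; the real init.cssd logic is more involved, and polling_tradeoff is a hypothetical helper.

```python
def polling_tradeoff(sleep_secs, check_cost_secs=0.0):
    """Toy model of a 'check, then sleep' loop.

    Returns how many checks run per hour and the longest time a failure
    can go unnoticed (one full loop period).
    """
    period = sleep_secs + check_cost_secs
    return {
        "checks_per_hour": 3600.0 / period,
        "worst_case_delay_secs": period,
    }

before = polling_tradeoff(1)    # original $SLEEP 1
after = polling_tradeoff(30)    # modified $SLEEP 30
```

Under these assumptions the loop wakes up 30 times less often, which is where the CPU saving comes from, and a dead process can go unnoticed for up to 30 seconds instead of about one.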

I should mention that before I came up with the solution, I switched to VMware Workstation 6.0 (trial) and tried a few settings that can be set in the .vmx file. One of the most useful options is the setting that controls which processors the virtual machine may use. I set them so that one VM runs on CPU0 and the other on CPU1. I still use this configuration.
LH1.vmx:
processor0.use = TRUE
processor1.use = FALSE

LH2.vmx:
processor0.use = FALSE
processor1.use = TRUE
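With more cores or more VMs, the same lines can be generated instead of typed. A small Python sketch, assuming the processorN.use syntax shown above carries over unchanged to higher core numbers; affinity_lines is a made-up helper name.

```python
def affinity_lines(core, total_cores):
    """Return processorN.use lines that pin a VM to a single host core."""
    if not 0 <= core < total_cores:
        raise ValueError("core must be between 0 and total_cores - 1")
    return [
        "processor%d.use = %s" % (n, "TRUE" if n == core else "FALSE")
        for n in range(total_cores)
    ]

# One VM per core on a dual-core host, matching LH1.vmx and LH2.vmx above.
for vm_name, core in (("LH1", 0), ("LH2", 1)):
    print(vm_name + ".vmx:")
    print("\n".join(affinity_lines(core, 2)))
```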

A few other settings I tried:
MemTrimRate = "0"
sched.mem.pshare.enable = "FALSE"

Here is some info about memory trimming and page sharing from http://www.virtualization.info/2005/11/how-to-improve-disk-io-performances.html:

# Memory trimming
Workstation checks which part of the guest OS virtual memory is not used and allocates it back to the host OS. This permits to have more concurrent virtual machines running but everytime the guest OS asks back for its memory it suffers a performance degradation.

So, if you have enough free RAM for all planned concurrent VMs, be sure to disable memory trimming for guest OSes adding the following line to the virtual machine configuration (.vmx) file:

MemTrimRate=0

Note: Memory trimming can be disabled through GUI since Workstation 6.0.

# Page sharing (quoted from VMware documentation)
VMware uses a page sharing technique to allow guest memory pages with identical contents to be stored as a single copy-on-write page. Page sharing decreases host memory usage, but consumes system resources, potentially including I/O bandwidth.

You may want to avoid this overhead for guests for which host memory is plentiful and I/O latency is important. To disable page sharing, add the following line to the virtual machine configuration (.vmx) file:

sched.mem.pshare.enable=FALSE
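Adding these lines by hand to several .vmx files invites duplicate keys, so here is a minimal Python sketch that updates a key if it is already present and appends it otherwise. It writes values in the key = "value" form used earlier in this post and compares key names case-insensitively, which is an assumption about how VMware reads them; set_vmx_options is a hypothetical helper. Edit the file only while the VM is powered off, and keep a backup.

```python
def set_vmx_options(vmx_text, options):
    """Return vmx_text with each option updated in place or appended."""
    remaining = dict(options)
    out = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip().lower() if "=" in line else None
        # Find a requested option whose name matches this line's key.
        match = next((k for k in remaining if k.lower() == key), None)
        if match is not None:
            out.append('%s = "%s"' % (match, remaining.pop(match)))
        else:
            out.append(line)
    # Options not found in the file are added at the end.
    for name, value in remaining.items():
        out.append('%s = "%s"' % (name, value))
    return "\n".join(out) + "\n"

wanted = {"MemTrimRate": "0", "sched.mem.pshare.enable": "FALSE"}
```

Calling set_vmx_options(text, wanted) on the contents of a .vmx file returns the updated text, leaving every other line as it was.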

I also tried playing with monitor.idleLoopSpinUS but couldn’t see any improvement.

Below you can see how my CPU consumption dropped after the change:
optimizing-initcssd-scaled.png

I did the same magic trick with 11g (11.1.0.6):
- line 1334: $SLEEP 1 -> $SLEEP 30

That’s it for today. I’m off to bed and you should go and check how your virtual machines are doing.


4 Responses to “CRS eating CPU on VMware”  

  1. Martin Decker

    Alex,

    you might be interested in MetaLink Note 415665.1 and Patch 5679560. This should also fix the cpu utilization of CRS during idle time.

    Best regards,
    Martin

  2. Alex Gorbachev

    Thanks Martin. That looks promising. However, this issue doesn’t seem to affect 11g and that’s what I’m using more often now. I switched the host OS again and now I’m on Mac OS X Leopard and VMware Fusion. I’m not sure, but it seems to be no better than VMware Server on Linux or Windows, to say the least.

  3. Tee

    Many thanks for this Alex. I observed the same problem on my RAC10g / VMware Server setup. I changed $SLEEP 1 to $SLEEP 5 and reclaimed 20% of my total server CPU.

  4. Alex Gorbachev

    Tee, I’m glad it helped. I switched to Mac OS X as my host OS, and VMware Fusion is unfortunately not as advanced as VMware on Linux or Windows. CPU consumption increased dramatically. Interestingly, when I tried to set up a RAC cluster with Ubuntu guests, CPU consumption dropped significantly. However, installing RAC on Ubuntu is not a straightforward task, and since the environment is completely unsupported, something might not work as expected. I had to play quite a few tricks.

    Martin, I tried that patch, thanks, but unfortunately it didn’t bring any noticeable improvement. :( Note that I had the same issue with 11g and, as far as I could see, this bug is supposed to be fixed there already.
