
Archive for the ‘Alex @ Pythian’ Category

Presenting at NoCOUG Spring Conference & Interview in NoCOUG Journal

May 10th, 2010 Alex Gorbachev

I’ve never attended the Northern California Oracle User Group conferences, even though they are organized every quarter. However, I’ve always been jealous of the great agendas they put together. A couple of months ago, Chen Shapira reminded me once again that the next NoCOUG conference was coming up and asked whether I would be able to come and present. What a chance, I thought: easy to plan, as I have no other conferences in May.

So, at the NoCOUG Spring Conference 2010 in just 10 days, I’ll be giving my two-hour presentation, Demystifying Oracle RAC Workload Management. If it’s your local conference, I hope you can attend and say hello. You might also want to download the whitepaper I put together a few years ago for the Hotsos Symposium, Oracle RAC Workload Management.

The conference is free for NoCOUG members and only $50 for non-members, but it makes more sense to just join the user group, as its annual fees are unbelievably low. I couldn’t say it better than Iggy Fernandez did:

How much does a NoCOUG membership cost? It doesn’t cost $400, as you might expect to pay for so much educational value. It doesn’t cost $300 and it doesn’t cost $200. It doesn’t even cost $100. Yes, a calendar-year NoCOUG membership only costs $95! Won’t you join today?

NoCOUG also has its own printed publication, and it’s been an honor to be interviewed by Iggy Fernandez for the NoCOUG Journal. Among the topics was Battle Against Any Guess, as a follow-up to the chapter I contributed to the latest book by OakTable Network members, Expert Oracle Practices: Oracle Database Administration from the Oak Table. I quote a small fragment below, but you can read the whole interview in the May issue. “This could be the most important issue of the NoCOUG Journal you will ever read,” as Iggy put it, and NoCOUG has made the online version of the May issue publicly available. The BAAG chapter from the aforementioned book is also reprinted in the journal, along with a detailed book review by Dave Abercrombie.

Battle Against Any Guess

Tell us a story. Tell us two. We love stories!

It’s June 2007, and I still have enough time left in my day to be active on the Oracle-L list. I’m reading the threads, and once again there is a thread full of guesswork-based solutions to a particular performance problem. Not the first one and not the last. After entering into the discussion, I felt the conversation was the same one I’d had time and time again (like a broken record), and this prompted me to create a place on the internet that I and others can refer to whenever we need to point out the fallacies of guesswork solutions. And so the BAAG Party was born: www.BattleAgainstAnyGuess.com. The name idea came from the BAARF Party (Battle Against Any Raid Five) organized by fellow OakTable Network members James Morle and Mogens Nørgaard.

What’s wrong with making an educated guess? We have limited data, limited knowledge, limited experience, limited tools, and limited time. Can we ever really know?

“Yes we can!” At least, we should strive to know.

I’ll never forget how enlightened I felt the moment I saw the slide “Why Guess When You Can Know?” presented by Cary Millsap, another fellow member of the OakTable Network. Most real-life problems can be solved with knowledge that is available in the public domain, using data that can be extracted by applying the right experience and tools and taking enough time to do the job properly.

The purpose of the Battle is to promote knowledge over ignorance: selecting the right tools for the job, popularizing appropriate troubleshooting techniques, gaining experience, and learning to take the time to diagnose an issue before applying a solution. One might think the BAAG motto is a bit extreme, but that’s a political decision to emphasize the importance of the goal.

I have elaborated on the concept of the “educated guess” in the first chapter of the book Expert Oracle Practices: Oracle Database Administration from the Oak Table. The chapter is titled “Battle Against Any Guess.” I would like to quote the following from page 11:

Oracle Database is not only a complex product, it’s also proprietary software. Oracle Corporation introduced significant instrumentation and provided lots of new documentation in the last decade, but there are still many blanks about how the product works, especially when it comes to the implementation of new features and of some advanced deployments that hit the boundaries of software and hardware. Whether it’s because Oracle wants to keep some of its software secrets or because documentation and instrumentation are simply lagging, we always face situations that are somewhat unique and require deeper research into the software internals.

When I established the Battle Against Any Guess Party, a number of people argued that guesswork is the cruel reality with Oracle databases because sometimes we do hit the wall of the unknown. The argument is that at such a point, there is nothing else left but to employ guesswork. Several times people have thrown out the refined term “educated guess.” However, I would argue that even in these cases, or especially in these cases, we should be applying scientific techniques. Two good techniques are deduction and induction.

When we have general knowledge and apply it to the particular situation, we use deductive reasoning or deductive logic. Deduction is often known as a “top-down” method. It’s easy to use when we have no gaps in our understanding. Deduction is often the path we take when we know a lot about the problem domain and can formulate a hypothesis that we can confirm or deny by observation (problem symptoms).

Inductive reasoning is often considered the opposite of deductive reasoning and represents a bottom-up approach. We start with particular observations, then recognize a pattern, and based on that pattern we form a hypothesis and a new general theory.

While these techniques are quite different, we can find ourselves using both at different stages as verification that our conclusions are correct. The more unknowns we face, the more we favor inductive reasoning when we need to come up with the generic theory while explaining a particular problem. However, when we form the theory via inductive logic, we often want to prove it with additional experiments, and that’s when we enter into a deduction exercise.

When we take a deductive approach first, applying known knowledge and principles, we often uncover inconsistencies in the results that require us to review existing theories and formulate new hypotheses. This is when research reverts to the inductive reasoning path.

Deduction and induction each have their place; they are both tools in your arsenal. The trick is to use the correct tool at the correct time.

How do we decide which competing methodology to use? Which tool is the best tool for the job? In matters of performance tuning, should we trace, sample, or summarize?

Good questions. Logic and common sense come to mind as the universal methodology for any troubleshooting. If we focus on performance then we should define what it means to improve performance. For me, performance tuning is all about reducing the response time of a business activity. When I think performance, I think response time. This is what Cary Millsap taught me through his book Optimizing Oracle Performance–he shifted my paradigm of performance tuning back then (by the way, you can read more about the paradigm shift concept in my chapter referenced above).

Since we identified that response time is what matters, the next step is to analyze where the time goes: build the response time profile. Adopting a top-down approach, we might find that 2% of the time is spent in the application tier and 98% in the database. Drilling down to the next level of granularity, we could identify two SQL statements that each consume 42% of the response time. Focusing on those two, we drill down further into, say, wait events. We could pinpoint the reason for excessive response time at this stage, or we might need to dig even deeper, somewhere where timed information isn’t available. This is where the current battle lies, and we could win it by introducing the right instrumentation and tools.
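The top-down drill-down described above can be sketched as a small tree of timed components. This is a minimal illustration only. It assumes hypothetical timings that reproduce the example: a 100-second business activity, with 2% in the application tier, 98% in the database, and two SQL statements at 42% each. The `Component` class and the `profile` function are made up for this sketch and are not part of any Oracle tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One node of a response time profile (hypothetical structure)."""
    name: str
    seconds: float = 0.0  # time attributed directly to this node
    children: list[Component] = field(default_factory=list)

    def total(self) -> float:
        # A node's time is its own time plus the time of everything beneath it.
        return self.seconds + sum(child.total() for child in self.children)


def profile(node: Component, whole: float) -> list[tuple[str, float]]:
    """One drill-down level: each child's share of the whole response time, largest first."""
    rows = [(child.name, 100.0 * child.total() / whole) for child in node.children]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical timings matching the 2% / 98% / 42% + 42% example.
activity = Component("business activity", children=[
    Component("application tier", 2.0),
    Component("database", children=[
        Component("SQL statement A", 42.0),
        Component("SQL statement B", 42.0),
        Component("other database time", 14.0),
    ]),
])
whole = activity.total()  # 100.0 seconds
db = activity.children[1]

print(profile(activity, whole))  # [('database', 98.0), ('application tier', 2.0)]
print(profile(db, whole))        # [('SQL statement A', 42.0), ('SQL statement B', 42.0), ('other database time', 14.0)]
```

Each call to `profile` corresponds to one drill-down level. The next level (wait events per statement) would simply add more children under each SQL node.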

More than a decade ago, Oracle database performance analysts didn’t have the luxury of the wait interface and had to rely on various aggregations and ratios as time proxies. The same happens now on another level: when the granularity of the wait interface is not enough, we have to rely on counters and methods such as call-stack sampling. The same goes for when execution leaves the database, for example, to do storage I/O. Current I/O systems are not instrumented to provide a clear response time profile.

However, I want to emphasize that the vast majority of mistakes in performance diagnostics happen much earlier, when we do have enough knowledge and tools to avoid guesswork solutions but often don’t use them.

I digressed in my response from the original question about what the best tools are, but, unfortunately, I will have to disappoint you: there is no magic-bullet performance tool that will diagnose all problems. The soundest advice I can give is to study the available performance methods and tools, and to understand how they work, when they should be used, what their limitations are, and why. A number of books have been published on the subject, and if you ask me to single out a recent one, I would mention Troubleshooting Oracle Performance by Christian Antognini.

Should we extend the scientific method to Oracle recommendations or should we adhere to the party line: use the cost-based optimizer, don’t use hints, collect statistics every night, upgrade to Oracle 11g, apply the latest patch set, CPU, and PSU, etc.? After all, nobody gets fired for following vendor recommendations. Many years ago, I lost a major political battle about Optimal Flexible Architecture (OFA) and never recovered my credibility there. Once Bitten, Twice Shy is now my motto.

I’ve touched on the issue of best practices in the BAAG chapter:

“Best practices” has become an extremely popular concept in recent years, and the way IT best practices are treated these days is very dangerous. Initially, the concept of best practices came around to save time and effort on a project. Best practices represent a way to reuse results from previous, similar engagements. Current application of best practices has changed radically as the term has come into vogue.

What are now called best practices used to be called “rules of thumb,” or “industry standards,” or “guidelines.” They were valuable in the initial phase of a project to provide a reasonable starting point for analysis and design. Unfortunately, modern best practices are often treated as IT law–if your system doesn’t comply, you are clearly violating that commonly accepted law.

Vendor recommendations are very valuable in the early stages of a project and even later on, as progress is made. To apply vendor recommendations correctly, one should understand the reasoning behind the advice, what problems it solves specifically, and what else could be affected. Take the example of collecting statistics every night: it makes sense for the majority of Oracle databases. There are plenty of exceptions, however, and at Pythian we often modify the default collection schedule for our customers. A sound understanding of what a vendor recommends, and why, is the key to a successful implementation.

In some cases, it might be hard to act contrary to generic vendor recommendations, and convincing management otherwise is usually very difficult. Here are some basic principles to keep in mind when deciding on your course of action:

  • Vendor recommendations are generic. Consider them as the default configuration of init.ora parameters. Nobody runs with all default parameters.
  • Instead of going against vendor recommendations, call it modifying or adapting to a particular environment.
  • Find a precedent where a recommendation has failed and why. It’s like being in court–nothing beats a precedent.
  • Playing politics is a whole different game. Either you are a player or you stay away.

If for some reason you can’t be at the conference, you can always schedule some time to catch up by emailing events [at] pythian.com. See you soon!

Categories: Alex @ Pythian Tags:

MOW2010: Slides for Alex Gorbachev’s Sessions

April 22nd, 2010 Alex Gorbachev

As the Icelandic volcanic ash clears and we finally have high hopes of flying home, I want to post the slides of the two presentations I gave.

My first presentation was a double-slot session about Oracle Clusterware internals. Presenting first thing in the morning on the first day is not easy at this event. Miracle Open World is traditionally organized as a 160% conference: 80% technical content and 80% networking and social interaction. Of course, the last 80% goes deep into the night. Needless to say, the 5am wake-up call was tough: I had to craft a few more slides to add some 11gR2 information and publish the first production of We Do Not Use TV Studio.

But I felt surprisingly good and fresh. The presentation itself was quite dynamic, and all demos worked as planned except pausing OPROCD: it was 50/50 whether an eviction would happen during my actions, and it took me 10 attempts to repeat it. I couldn’t recall ever being lucky more than twice in a row until that session, but… things happen. You can see the slides below for reference. However, without the text and demos behind the slides, they are not very useful, I’m afraid.

The second presentation was much more difficult for me to deliver, not only because it was a topic I’m much less proficient in, but also because the previous night was even longer and more intense, with waterpark adventures (and I tell you, it was the best waterpark ever. Well done!). The good news is that Graham Wood and I went to bed at the same time and had our presentations in the same slot (and I think we consumed comparable amounts of magic liquids). Of course, we woke up synchronously and stayed in sync the whole day, including our condition, which we managed to improve significantly before 11:30am, when our sessions started. I was afraid it might turn into a disaster, but it went reasonably well, especially since I had an excuse whenever I couldn’t remember what I was talking about: I just reached for the water and everyone understood. :)

So my second presentation was about how we designed and run a 1TB MySQL database in a high-availability setup. I’m not a hardcore MySQL DBA, but I learned enough about it to talk smart, and thanks to my special interest in everything HA, I could actually talk the real deal. Slides are here for your reference.

Perhaps I’ll manage to publish more details and photos from that fantastic conference later, so stay tuned. For now, I can tell you that it was the longest Oracle conference ever.

Categories: Alex @ Pythian Tags:

Miracle OpenWorld 2010 Opening: We Do Not Use TV

April 15th, 2010 Alex Gorbachev

I’m in Denmark these days at the wonderful Oracle conference organized by Miracle A/S, Miracle OpenWorld 2010. This is the sort of 80/80 conference we all love: we spend 80% of our time on technical content and 80% networking with our peers. Of course, you have to sacrifice something, like sleep, but such is life.

The opening session was something special this year (as if it isn’t special every other year): Jonathan Lewis talked about something he admitted he is not an expert in! He presented on Microsoft SQL Server and whether it’s an enterprise-ready RDBMS. His conclusions were positive.

The opening was rounded up with a new film production demo from We Do Not Use TV Studio.

Of course, the strongest continued at the party house as usual. This year it was Mogens’s house, and I still hadn’t seen him this morning. Because my presentation starts at 9am, I had to leave the party early, but we still managed to stick around past 1am sipping very nice and smooth tequila brought by Graham Wood.

Time to grab a bite and get ready for my dual-slot presentation, Under the Hood of Oracle Clusterware, which is the very first presentation of the day. More to come, so stay tuned.

Categories: Alex @ Pythian Tags:

Welcome Chen Shapira to Pythian

April 8th, 2010 Alex Gorbachev

I’m excited to announce that Chen Shapira has started with Pythian this week. :) Chen is no stranger here; many of my colleagues already know her and have been in touch. She just naturally fits in.

Chen is a world-class production Oracle DBA. She has been maintaining a popular Oracle blog and is a great addition to the Pythian bloggers: she posted on the Pythian blog before her official start date, authoring Log Buffer last week. She is also on Twitter, and you can follow her @gwenshap. It turns out that even Pythian’s SQL Server DBAs are frequent readers of her blog. Who knew?

Chen is also a frequent presenter at conferences such as RMOUG10, Hotsos09, OOW09 and OOW08.

Chen is an Oracle ACE and also a member of the OakTable Network. She is very active in her local user group, the Northern California Oracle User Group (NoCOUG), where she carries out the duties of Training Day Coordinator.

Welcome Chen to the Pythian team! I’m sure you are already working on acceptance testing of that new Oracle 11g RAC cluster that is slated to go live… eh… this weekend? ;-)

Categories: Alex @ Pythian Tags:

Oracle Enterprise Manager 11g Documentation Available Online! + new features preview

April 7th, 2010 Alex Gorbachev

Update 9-Apr-10: The documentation was pulled from public view, which was expected. It looks like someone “leaked” it, hoping it wouldn’t be discovered as long as it wasn’t referenced from OTN. Guess what? How many of us expected that and downloaded it locally just in case? ;-) The documentation is back.


Thanks to Marco Gralike for drawing my attention to it: yes, Oracle Enterprise Manager 11g Grid Control documentation is available online!

There you can find the list of EM 11g new features. After quickly skimming through it, I can point out a few areas:

  • Full blown Fusion Middleware management support
  • More complete support for RAC, Grid Infrastructure and ASM
  • Integration with My Oracle Support
  • Improvements in provisioning (I hope it really works now and can actually start saving us time and not make us spend more time troubleshooting failures)
  • Better virtualization support (must be focused on Oracle VM though)


Here is a surprise you might not have expected: you now need to install Oracle WebLogic Server 10.3.2 (Oracle Fusion Middleware 11g Release 1 Patch Set 1) as a prerequisite for the Oracle Grid Control installation. It doesn’t come pre-packaged with an application server, as was the case with iAS in the EM 10g release. I fully expected EM 11g to run on WebLogic Server infrastructure, but installation might be a bit less straightforward for some of you.

But worry not — managing WebLogic Server is generally easier than managing iAS. Plus you all wanted to learn Oracle Fusion Middleware. Didn’t you?

So my conclusion is that EM 11g is not a breakthrough in terms of redesign, user interface navigation, or functionality. However, it has a new engine under the hood, which I’m personally very excited about.

Categories: Alex @ Pythian Tags:

When Was Your Last Disaster Recovery Test?

April 7th, 2010 Alex Gorbachev

If your answer is anything other than something like “last month and every month before that,” you are probably in trouble. Learn from Wikipedia’s Data Center Overheating.

That doesn’t mean they didn’t regularly test their disaster recovery process. Maybe they did, but the failover mechanism broke after the last test.

Regular validation of the DR procedure is designed to minimize the risk of a broken process going unnoticed. If a failure is detected during a planned switchover, you are far better prepared to handle it (or you can simply leave services on the current primary site) than during an emergency failover, when you hit the “Oh shit!” moment under tremendous pressure to get services back.

The business has to find a balance between switchover frequency, the risk it is prepared to take, and its change management processes (the more cowboy-style your operations, the higher the risk and the more often you need to test your DR scenario).
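Here is a back-of-the-envelope Python sketch of that trade-off. It rests on an assumption that is not from the post: a failover path breaks at a uniformly random moment between two scheduled DR tests and stays broken until the next test catches it. Under that assumption, the average time a broken failover goes unnoticed is half the test interval, and the worst case is the whole interval. The function names are hypothetical.

```python
import random


def undetected_window_days(test_interval_days: float) -> tuple[float, float]:
    """(average, worst-case) days a broken failover stays unnoticed, assuming
    it breaks at a uniformly random moment between two scheduled DR tests."""
    return test_interval_days / 2, test_interval_days


def simulated_average(test_interval_days: float, trials: int = 100_000, seed: int = 42) -> float:
    """Monte Carlo check of the average: break at a random time, detect at the next test."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    total = 0.0
    for _ in range(trials):
        broke_at = rng.uniform(0, test_interval_days)
        total += test_interval_days - broke_at  # time until the next scheduled test
    return total / trials


for interval in (30, 90, 365):
    avg, worst = undetected_window_days(interval)
    print(f"test every {interval:>3} days: ~{avg:.0f} days unnoticed on average, up to {worst} days")
```

Under this simple model, a monthly switchover leaves a silently broken failover unnoticed for about two weeks on average, while a yearly test leaves it unnoticed for about six months.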

Most of our customers who use regular DR switchovers do one every month or two and run for a while with either site as the primary. This is the ideal scenario, and I wish everyone could adhere to similar business continuity strategies. Do you?

Categories: Alex @ Pythian Tags:

Nobody Killed OpenSolaris: Stop the FUD!

April 7th, 2010 Alex Gorbachev

I’m tired of reading it all over the internet: Oracle taking back OpenSolaris, Open Solaris May Die, Solaris Is Dead, Save Open Solaris, Oracle taking back OpenSolaris.

I’m so sick of it!

I see that some don’t even know the difference between OpenSolaris and the commercial Oracle Solaris (formerly Sun Solaris 10)!

Wake up, people! Oracle did make commercial Solaris 10… eh… commercial, that is. They (well, Sun, but Oracle paid big $$ for it) have invested a lot in Solaris IP, they have every right to charge money for it, and they probably should. Struggling Sun made commercial Solaris free to use in a desperate attempt to hold on to its rapidly shrinking market share. Oracle doesn’t need that; it is not desperate. You’ve made the right decision, Oracle: keep Solaris commercial and use these funds to continue developing this great operating system (or whatever makes business sense).

Having said all this, what does it have to do with OpenSolaris? Nothing!

OpenSolaris was and is free. I have just quickly skimmed through the licensing (Binary License and CDDL), and there are no caveats that I can see, such as a 90-day limitation or anything of the sort. All the OpenSolaris goodies are still available to everyone for free.

Now the whining starts that Oracle will not contribute to OpenSolaris anymore. Come on, people! Couldn’t you just appreciate what’s been done already and what a great product OpenSolaris is? If you forgot what open source is about, it’s about community contributions, not about a single vendor giving away its IP so that everyone around can shout about how great the open-source movement is and what great products it produces. If one vendor pulls out and the community can’t sustain product development, then the product cannot live a normal open-source life.

Get over it! Want high-quality software with great support and no fuss? Pay $$. Want high-quality free open-source software? Make it happen!

Categories: Alex @ Pythian Tags:

Meet the First Oracle ACE Director in MySQL: Sheeri Cabral

April 6th, 2010 Alex Gorbachev

I’m excited to share the news that the Oracle ACE program has now been extended to cover the MySQL community, and Pythian’s Sheeri Cabral has become the very first Oracle ACE Director in the MySQL expertise area. It’s a special privilege for me to blog about it because I had the pleasure of nominating Sheeri in the first place. Being an Oracle ACE Director myself and knowing what’s involved, I believed that if the Oracle ACE program were extended to MySQL, Sheeri had to be the number one candidate.

It’s impossible to overestimate Sheeri’s role in the MySQL community: her advocacy for the technology and her commitment to building and supporting the community. She has presented about MySQL countless times and has been actively involved in several community projects and organizations. She blogs frequently and with passion. It’s no surprise that Sheeri has been awarded MySQL’s Community Advocate, Communicator and Facilitator of the Year two years in a row (2007 and 2008).

Sheeri was already engaged in Oracle ACE activities when she co-presented with Oracle ACE Director Dan Norris in the session So You Want to Be an Oracle ACE? at Oracle Open World 2008. Back then, Sheeri shared what it means to be a true community advocate and contributor, regardless of which technology you use. Now that Oracle and MySQL database technologies live under a single umbrella, I’m very excited to see that the communities are uniting as well.

I know there are more excellent candidates for the Oracle ACE program, so if you think of someone, read the guidelines and nominate that person. The Oracle ACE program is looking for more MySQL participants. Remember that the Oracle ACE program is about recognizing community contributors and advocates; see the Oracle ACE program FAQ for more details on how nominations work.

Sheeri, congratulations and welcome to the Oracle ACE program.

Categories: Alex @ Pythian Tags:

Hotsos Symposium 2010: Battle Against Any Guess Is Won

March 9th, 2010 Alex Gorbachev

Video fragments of my session are posted at the end, so read on.

I arrived at the Omni Mandalay Hotel on Sunday evening with Dan Norris. I was flying through Chicago, and it turned out that Dan was on the same flight, only a few rows behind me. Small world.

My preparations for the conference were very chaotic and, of course, I didn’t have either of my presentations ready. I was very stressed and getting sick as well; it looked like a complete disaster waiting to happen. I’d say I was feeling like Doug Burns, who often manages to get sick just before a conference. Of course, I worked on my slides for the last few days as well as on the flight, and the presentation was slowly getting there, but boy, was I tired!

I quickly said hello to the crowd in the bar on the way to my room and rushed away to do some more damage to my slides. And then I had a brilliant idea: I could still see one of my best mates and do something good about my presentation! I asked Doug if he was interested in a preview (he probably wasn’t, but he couldn’t say so to me), especially since my session wasn’t on his original agenda. Of course, that meant he had to leave a bunch of other good friends and spend some time tête-à-tête. Knowing Doug, this is one of the hardest things to ask of him, but it shows what a good friend he is! (Plus, everyone thinks he is antisocial anyway. Shhhh!)

Doug made my day: while he provided lots of ideas and feedback on a few things I was lacking, he generally approved of the idea and confirmed that it wasn’t totally crazy. I guess that was all I needed back then, and Doug knew how nervous I was about it. (Thanks, mate!)

So I called it a day very early on Sunday and went to bed before midnight. I really needed some sleep. The alarm woke me at 5AM (I woke up a few times during the night to look at the clock, making sure I hadn’t slept through), and the slides were ready just before lunch. I even managed to do a test run, and it took 65 minutes, a wee bit too long for a one-hour session. But it was a good test, and I knew I had to be a bit more concise in a few parts.

My morning was very productive. Unfortunately, I missed the opening keynote from Tom Kyte. Such a pity! If what Doug wrote is true, Tom was talking about the mistakes we make *because* of our experience and our assumptions. This was exactly one of the points I was making in my Battle Against Any Guess: experience is danger. I wish I could have seen Tom’s example. Oh well, maybe another time.

I managed to attend half of Richard Foote’s session on indexes, but my mind was far away, with my own slides. Still, I did manage to focus on the bitmap index part and the myth that bitmap indexes don’t work well for high-cardinality columns. Very interesting conclusions. I’m still wondering how much overhead updates will add to such a bitmap index.

After lunch, it was my turn. I ordered a few copies of the latest OakTable book, Expert Oracle Practices: Oracle Database Administration from the Oak Table, which I co-authored with a bunch of other Oakies. I contributed chapter 1, titled just like my presentation: Battle Against Any Guess. The plan was to give one copy away during the presentation and draw for another at the end of the session. I was so nervous that I forgot about it until the end of the session, so I just drew for two copies. The lucky winners were Lynn-Georgia Tesch and Surendra Anchula. Congratulations! For the rest of you who left your contact details, please stay tuned, and we’ll organize a few things online.

Now the main topic of this post: my presentation. What’s unusual about this session is that it’s not the technical stuff I usually do but a more conceptual and motivational talk. Could I pull it off? Well, I think it went fairly well in general, even though I did identify a few rough places and my lack of English language mastery. I might need to work a bit more on the flow of the presentation.

We had quite a few good laughs. Later, people in the next hall were asking about it, and Dan was making jokes about it on stage, so it must have been loud. Anyway, I think nobody fell asleep, and I managed to get people thinking about the topic. I received many “thank you” notes and compliments on a good session yesterday, so by the end of the day I was more and more pleased. Thanks, everyone, for attending, and especially big thanks to those of you who brought examples from your own battles to my attention. If you have more to discuss, contact me by email at (my last name) {at} pythian.com.

Thanks to Marco Gralike for recording some fragments and sharing them. I think he has more to come.

These are the introductory couple of minutes. You can definitely notice how nervous I am at the start:

Solving the wrong problem example:

That’s all for now. Stay tuned — more to come.

Categories: Alex @ Pythian Tags:

Oracle 11gR2 Grid Infrastructure: Memory Footprint

March 6th, 2010 Alex Gorbachev

Upgrading to 11g Release 2 Grid Infrastructure? You probably want to read on…

Oracle 11g Release 2 Grid Infrastructure has been dramatically redesigned compared to 10g and 11gR1 Clusterware. Along with an impressive set of new features, Grid Infrastructure also uses much more memory. While RAM is rather inexpensive these days, this is an inconvenience in some scenarios, particularly for the sandbox-type installations that I use all the time for my own tests and demonstrations. For production upgrades, you need to be aware of the increased memory usage and plan for it.

I’ve been able to easily run a 2-node 10g RAC cluster on my MacBook with 4 GB of RAM, allocating less than 1 GB of RAM to each virtual machine. That was even enough for a mini database instance with a very small memory footprint. Oracle 11g Release 1 was pretty much the same, except that the database instance itself may have needed a bit more memory, but one node could still fit within 1 GB of RAM.

In 11gR2, the bare-bones Grid Infrastructure processes alone consume 10+ times more memory (11.2.0.1 on 32-bit Linux, to be precise):

[gorby@cheese1 ~]$ ps -eo pid,%mem,rss,user,cmd --sort=rsz --cols 100 | grep -e '^ *PID' -e grid -e ohasd | grep -v grep
  PID %MEM   RSS USER     CMD
 3614  0.0  1080 root     /bin/sh /etc/init.d/init.ohasd run
 4322  0.2  3368 oracle   /nfs/11.2.0/grid/opmn/bin/ons -d
 4323  0.4  5164 oracle   /nfs/11.2.0/grid/opmn/bin/ons -d
 4117  0.6  7860 root     /nfs/11.2.0/grid/bin/oclskd.bin
 3830  0.6  8788 oracle   /nfs/11.2.0/grid/bin/gipcd.bin
 5048  0.7  8992 oracle   /nfs/11.2.0/grid/bin/tnslsnr LISTENER -inherit
 4167  0.7 10052 oracle   /nfs/11.2.0/grid/bin/evmlogger.bin -o /nfs/11.2.0/grid/evm/log/evmlogger.i
 3969  0.9 12412 oracle   /nfs/11.2.0/grid/bin/diskmon.bin -d -f
 3860  0.9 12736 oracle   /nfs/11.2.0/grid/bin/mdnsd.bin
 4067  1.1 14648 root     /nfs/11.2.0/grid/bin/octssd.bin reboot
 5016  1.2 15860 root     /nfs/11.2.0/grid/bin/orarootagent.bin
 3956  1.3 16964 root     /nfs/11.2.0/grid/bin/orarootagent.bin
 4292  1.4 17984 oracle   /nfs/11.2.0/grid/bin/oraagent.bin
 3874  1.5 20112 oracle   /nfs/11.2.0/grid/bin/gpnpd.bin
 3817  1.5 20300 oracle   /nfs/11.2.0/grid/bin/oraagent.bin
 4083  1.8 23700 oracle   /nfs/11.2.0/grid/bin/evmd.bin
 4372  2.4 31548 oracle   /nfs/11.2.0/grid/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eo
 3564  3.2 41532 root     /nfs/11.2.0/grid/bin/ohasd.bin reboot
 4081  3.5 44932 root     /nfs/11.2.0/grid/bin/crsd.bin reboot
 3906 18.6 239428 root    /nfs/11.2.0/grid/bin/cssdagent
 3887 18.6 239444 root    /nfs/11.2.0/grid/bin/cssdmonitor
 3924 20.1 258564 oracle  /nfs/11.2.0/grid/bin/ocssd.bin

The third column (RSS) above gives you the amount of resident memory in KB for the processes related to Grid Infrastructure. As you can clearly see, the CSS processes alone consume over 700MB! In total, we can account for about 1 GB (these calculations are flawed, though; see below).
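To show where the 700MB figure comes from, here is a minimal Python sketch that adds up the RSS column of the ps listing. It assumes the exact "ps -eo pid,%mem,rss,user,cmd" column order used above (RSS is the third field, in KB); the function name sum_rss_kb is hypothetical. Like the listing itself, it counts shared pages once per process, so it inherits the same overstatement.

```python
# Minimal sketch: sum the RSS column (KB) of "ps -eo pid,%mem,rss,user,cmd"
# output. Shared pages are counted once per process, so the total is an
# upper bound on real memory usage, not an exact figure.


def sum_rss_kb(ps_output: str) -> int:
    """Return the sum of the RSS column (third field) in KB."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank/short lines and the "PID %MEM RSS USER CMD" header.
        if len(fields) < 5 or fields[0] == "PID":
            continue
        total += int(fields[2])
    return total


# The three CSS processes copied from the 11gR2 listing above.
css = """\
  PID %MEM   RSS USER     CMD
 3906 18.6 239428 root    /nfs/11.2.0/grid/bin/cssdagent
 3887 18.6 239444 root    /nfs/11.2.0/grid/bin/cssdmonitor
 3924 20.1 258564 oracle  /nfs/11.2.0/grid/bin/ocssd.bin
"""

css_kb = sum_rss_kb(css)
print(css_kb, "KB =", round(css_kb / 1024), "MB")  # prints: 737436 KB = 720 MB
```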

Compare that with 10g (10.2.0.3 on 32-bit Linux), where the bare-bones Clusterware processes consume only about 60MB:

[oracle@lh1 ~]$ ps -eo pid,%mem,rss,user,cmd --sort=rsz --cols 100 | grep -e '^ *PID' -e nfs -e crs -e css -e evm | grep -v grep
  PID %MEM  RSS USER     CMD
 6524  0.0  348 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/opmn/bin/ons -d
 4892  0.1  992 oracle   /bin/sh -c cd /nfs1/oracle/oracle/product/10.2.0/crs/log/lh1/cssd/oclsomon;
 3262  0.1 1072 root     /bin/sh /etc/init.d/init.evmd run
 3506  0.1 1100 root     /bin/sh /etc/init.d/init.crsd run
 4575  0.1 1116 root     /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /nfs1/oracle/oracle/pro
 4890  0.1 1120 root     /bin/su -l oracle -c /bin/sh -c 'cd /nfs1/oracle/oracle/product/10.2.0/crs/
 4664  0.1 1180 root     /bin/sh /etc/init.d/init.cssd oclsomon
 3263  0.1 1188 root     /bin/sh /etc/init.d/init.cssd fatal
 4677  0.1 1188 root     /bin/sh /etc/init.d/init.cssd daemon
 6525  0.5 4792 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/opmn/bin/ons -d
 4922  0.6 5224 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/bin/oclsomon.bin
 5915  0.7 6280 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/bin/evmlogger.bin -o /nfs1/oracle/or
 4576  1.1 9312 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/bin/evmd.bin
 5018  1.1 9428 oracle   /nfs1/oracle/oracle/product/10.2.0/crs/bin/ocssd.bin
 4606  2.0 16712 root    /nfs1/oracle/oracle/product/10.2.0/crs/bin/crsd.bin reboot
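As a quick sanity check of the "more than 10 times" claim, the following Python sketch sums the RSS values copied from the two listings above. These are naive totals with shared memory counted repeatedly, so the ratio (roughly 17x) is an upper bound rather than a precise measurement.

```python
# RSS values (KB) copied from the 11gR2 and 10g ps listings above.
rss_11gr2_kb = [
    1080, 3368, 5164, 7860, 8788, 8992, 10052, 12412, 12736, 14648, 15860,
    16964, 17984, 20112, 20300, 23700, 31548, 41532, 44932,
    239428, 239444, 258564,
]
rss_10g_kb = [
    348, 992, 1072, 1100, 1116, 1120, 1180, 1188, 1188,
    4792, 5224, 6280, 9312, 9428, 16712,
]

# Naive totals: shared pages are counted once per process.
total_11gr2 = sum(rss_11gr2_kb)
total_10g = sum(rss_10g_kb)
ratio = total_11gr2 / total_10g

print(f"11gR2: {total_11gr2} KB, 10g: {total_10g} KB, ratio {ratio:.1f}x")
```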

The memory usage above is somewhat overstated, because some shared memory is counted multiple times. The smaps interface gives better per-process statistics. For example, you can see that each of the three “top offenders” (the CSS binaries) has about 40MB of shared libraries:

[root@cheese1 ~]# ./smaps.pl 3924 | head
VMSIZE:     258576 kb
RSS:       258564 kb total
            39164 kb shared
             5180 kb private clean
           214220 kb private dirty
PRIVATE MAPPINGS
     vmsize   rss clean   rss dirty  file
   15052 kb        0 kb    15052 kb
   12016 kb        0 kb    12016 kb
   11184 kb        0 kb    11184 kb
[root@cheese1 ~]# ./smaps.pl 3887 | head
VMSIZE:     239456 kb
RSS:       239444 kb total
            40096 kb shared
             6200 kb private clean
           193148 kb private dirty
PRIVATE MAPPINGS
     vmsize   rss clean   rss dirty  file
   14624 kb        0 kb    14624 kb
   10240 kb        0 kb    10240 kb
   10240 kb        0 kb    10240 kb
[root@cheese1 ~]# ./smaps.pl 3906 | head
VMSIZE:     239440 kb
RSS:       239428 kb total
            40096 kb shared
             6200 kb private clean
           193132 kb private dirty
PRIVATE MAPPINGS
     vmsize   rss clean   rss dirty  file
   14624 kb        0 kb    14624 kb
   10240 kb        0 kb    10240 kb
   10240 kb        0 kb    10240 kb
[root@cheese1 ~]#

One way to get a practical number is to check system memory usage with and without Grid Infrastructure running. The difference is about 750MB (compare the “free” column of the “-/+ buffers/cache” row before and after the shutdown).

[root@cheese1 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1283040    1131584     151456          0      18504     295668
-/+ buffers/cache:     817412     465628
Swap:       655328         76     655252
[root@cheese1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'cheese1'
...
...
CRS-4133: Oracle High Availability Services has been stopped.
[root@cheese1 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1283040     397144     885896          0      18640     316632
-/+ buffers/cache:      61872    1221168
Swap:       655328         76     655252
[root@cheese1 ~]# ps -eo pid,%mem,rss,user,cmd --sort=rsz --cols 100 | grep -e '^ *PID' -e grid -e ohasd | grep -v grep
  PID %MEM   RSS USER     CMD
 3614  0.0  1084 root     /bin/sh /etc/init.d/init.ohasd run
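The before/after comparison can be scripted. Here is a minimal Python sketch, assuming the older procps free output layout shown above with its "-/+ buffers/cache" row (newer versions of free no longer print that row). The function name is hypothetical; the two samples are the "-/+ buffers/cache" rows copied from the transcript.

```python
def buffers_cache_free_kb(free_output: str) -> int:
    """Return the 'free' value (KB) of the '-/+ buffers/cache:' row."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            # Row layout: "-/+ buffers/cache:  <used>  <free>"
            return int(line.split(":", 1)[1].split()[1])
    raise ValueError("no '-/+ buffers/cache:' row found")


# Rows copied from the free output above (with and without Grid Infrastructure).
with_gi = "-/+ buffers/cache:     817412     465628\n"
without_gi = "-/+ buffers/cache:      61872    1221168\n"

delta = buffers_cache_free_kb(without_gi) - buffers_cache_free_kb(with_gi)
print(f"{delta} KB (~{delta / 1024:.0f} MB)")  # prints: 755540 KB (~738 MB)
```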

I don’t have an 11gR1 test cluster handy, so I can’t verify this, but Oracle 11g Release 1 Clusterware is not much different from 10g, so its memory usage should be similar.

The lesson is that if you are upgrading your Oracle RAC cluster to 11gR2 from 10g or 11gR1, you have to account for an additional 700MB of memory on each node for Grid Infrastructure alone. Note that this doesn’t take into account the higher memory usage of the database instances themselves.
