Filed Under (Uncategorized) by Dave Mast on August-7-2008

I could have sworn I posted this earlier in the week, but I don’t see it anywhere on my blog. So, here it is for real.

Last week we were in what I would call a “very dangerous position”: the RAID controller on our VM host began throwing PCI parity errors, which REALLY doesn’t go over well with our Linux host OS. This past Saturday night, I was able to take the host machine down and make it right. I shut our VMs down and copied them over to a separate disk array to keep them safe. Once that was done, I went ahead and swapped the HighPoint RR2320 card out for an Adaptec 3805. The Adaptec card got very good reviews from other Newegg customers, and the price/features balance was the deal maker.

After getting the build started on the new RAID 50 array (w00t for background builds), I did a reinstall of Fedora 8, formatted the new array, and started copying our VMs back. After installing VMware Server and the web UI, I took a deep breath and pressed the start button to begin bringing the VMs online. I got more and more relieved as each VM came back to life, and I probably broke out in a cheer once we were 100% back online. All that was left after that was to install NRPE so that our Nagios box could monitor the health of the VM host.

Some thoughts from this project…

- This couldn’t have been timed any better… since there was no church on Sunday, I was able to take 75% of our systems down with minimal impact on our users.

- I was REALLY wrestling with the thought of trying ESXi out on the host instead of Fedora… I imagine it would have worked. However, the ability to monitor the host’s hardware with Dell OpenManage trumps that. Although OM is not supported on Fedora, it will run if all the dependencies are satisfied.

- I’m glad we’ve got a coffee machine in the church…I wouldn’t have made it through the night otherwise.

- My initial plan was to install CentOS on this server, but I had problems with GRUB hanging at boot time after the installation. I went back to Fedora because I just didn’t have time to mess around.

- I’m very thankful that my fiancée (or wife, depending on when you read this) understands what I do and accepts the fact that I’m passionate about this stuff.

- Our server room’s AC works very well… almost too well.  Again, hot coffee was a plus.


