Filed Under (Uncategorized) by Dave Mast on July-28-2008

I’ve been looking for ways to make my life as an IT guy easier.  I know, what a fresh concept, right?

Most of the time I find myself either in the middle of a big project or fighting small fires disguised as helpdesk tickets.  The past 2-3 weeks have seen me doing little-to-no helpdesk work, so I’ve taken the opportunity to look at various incarnations of monitoring and alerting tools.  As I’ve looked around, the traits for monitoring/alerting software match the model of most other software:  "Inexpensive, easy to configure, good features…choose two." That left me with two options: either spend money I don’t have, or get into things that I don’t know much about.  I decided to take a dive into the open source world to experiment with a couple of monitoring tools, which meant brushing up on my *nix-fu and swimming through documentation and forums.

My first endeavor was setting up Cacti - a pretty well-known graphing tool that allows you to create rather sexy-looking RRD graphs of just about any metric you can grab with SNMP and WMI (you need to do some wrenching for WMI though).  In addition, Cacti has a very large user community and is very well-documented.  Chances are that someone’s made a graph or data template for whatever you’d like to monitor.  I’ve currently got Cacti up on a test box and am very pleased with it so far.  It took a while for me to even partially get my head around how data flows through Cacti, from creating the initial query to putting that data on a graph. Once I have it nailed down, I’ll explain it in terms that are easy for me (and hopefully you) to remember.  After Jess and I get back from Maine, I hope to get a fully-documented production box up and running.

Cacti is great for graphing, but unless I apply some add-ons (which I’m not ready for yet), I don’t get notified when thresholds are crossed or when a host or service drops out completely. I set up a Nagios box to fill in the gaps that Cacti leaves open.  Nagios is another open-source monitoring tool that lets you keep close track of stats and services, and lets you set thresholds for "warning" and "critical" statuses.  It’s very flexible and can monitor just about any SNMP or WMI element that you want.  Depending on where you download it from (the VMware Appliances page has it available in a number of flavors), you can use a WebGUI to edit the config files.  Personally, I found that to be more complicated than just creating/editing config files by hand.  Again, once you understand the way the configurations work and everything "clicks", it becomes a piece of cake.
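To make the "warning" and "critical" idea a bit more concrete, here is a minimal Python sketch of how a threshold check maps a reading to a status. It is not taken from my actual setup: the function names (classify, check), the "disk_used_pct" metric and the 80/90 thresholds are all made up for illustration, and it assumes a metric where higher is worse. The one thing it borrows from Nagios is the plugin convention of status codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN).

```python
# Nagios plugin convention: the status code tells Nagios the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a reading to a status code, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check(name, value, warn, crit):
    """Return (status_code, one-line message) for a single reading.

    'name' is a hypothetical metric label used only in the message.
    """
    code = classify(value, warn, crit)
    message = f"{LABELS[code]} - {name} is {value} (warn {warn}, crit {crit})"
    return code, message


# Example: a made-up disk usage reading of 85% against 80/90 thresholds.
code, message = check("disk_used_pct", 85.0, 80.0, 90.0)
print(message)
```

In a real plugin script you would print the message and then exit with that status code (for example with sys.exit(code)), which is how Nagios decides whether to raise a warning or a critical alert.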

I’m hoping that these two solutions will help keep things running smoothly.  So far the price has been right.  Like I said, I had a LOT of brushing-up to do on my Linux (thank goodness for the ability to snapshot a VM while it’s hot. I’ve definitely used that feature while getting back into the swing of things), and I’ve definitely felt some frustration while getting my head around how Nagios and Cacti are configured.  It’s been worth it though, and I’m anxious to see how far I can go with these products.


