Archive for the ‘servers’ Category

Filed Under (servers, virtualization) by Dave Mast on May-2-2008

Here’s how you know your week is going to contain 50% less sleep … one of your co-workers walks up to you and asks "Hey, did you know there’s a loud beeping noise coming from your server room?"

I should have gone home right then and gotten my jammies and pillow to prepare for the week.

As it turns out, the beeping was exactly what I thought it would be.  Another hard drive had bitten the dust in our would-be file server, a PowerEdge 1400 that up to this point had been rock-solid for us.  With dual 1GHz P3 CPUs and 2GB of RAM, it would have made a great file server.  WOULD HAVE … except that it had somehow managed to eat 3 hard drives before I could even put it into production.  This was the last straw.  3 dead hard drives in 2 months is enough to convince me that I don’t want this machine in the lineup anymore.

The plan this week was supposed to be simple.  Copy our file server and EMS Lite data to the PE1400 and bring it online.  After that, install a second array of disks in the PE1800 (where the file server once was), install Ubuntu on it, and begin using it as our VM host.  Why Ubuntu?  Because I’m budget-tight at the moment, and also because the current install of Win 2k3 Standard wasn’t utilizing all 8GB of RAM or the 64-bit CPUs that the 1800 now has.

This plan seemed pretty airtight, except I didn’t plan on losing server hardware just before making this transition.  However, the migration needed to proceed, and so I loaded a working RAID5 array (controller and drives) into a newer desktop box, threw Server 2k3 on it, and began the Robocopying all over again.  By now it’s Tuesday, and tonight I’m scheduled to take all the servers down and transition our PE1800 over to Ubuntu so it can be a big bad 64-bit VM host.  However, in the midst of copying file server data, I forgot about EMS Lite.

EMS is our current calendaring software, and the only SQL (MSDE) database we have on-site that gets any end-user interaction.  It figures that this tiny-but-critical program would hold things up for about 24 hours while I learned how to successfully migrate the database from one instance to another without breaking things.  HUGE thanks to Jeremy Marx for taking time out of his day to help me through this.  (That’s the power of the CITRT community!)

That 24-hour period was not wasted though … during that time I did some test runs with Ubuntu 64-bit and also got our new file server straightened out.  By 3:30pm Wednesday (yes, it’s Wednesday now) I had been up for about 30 hours, but I was very pleased just to have conquered the EMS data issue.  I fell asleep around 4pm Wednesday and didn’t wake up until about 8am Thursday morning.

It’s now Thursday night (almost Friday morning), and I’m on the last leg of this transition.  All of our VMs are being copied over to another server, and once that’s done, I’ll take the old array out of our PE1800 and install a new array.  That new array will have Kubuntu 8.04 on it, and will serve as our new 64-bit VM host.

I’m already starting to wear down a bit (I wouldn’t ever make it as a Bering Sea crabber), but I’m pumped to see this project finally coming to a close.  It took longer than I thought and it cost a few hours of sleep, but I feel like the benefit will be worth the trouble.



Filed Under (active directory, servers, troubleshooting) by Dave Mast on November-23-2007

Last week, before I went on a mini vacation, I set up a new user account for a staffer.  I issued them their username and password, got them connected to Exchange, and everything seemed to be just hunky-dory.

This past Tuesday I got a call from my boss.  Apparently the user wasn’t able to receive email, although they could send it just fine.  I checked out the normal stuff… permissions, time sync across the servers, the usual.  Not knowing what to do yet, I went ahead and backed up the user’s Exchange data and deleted the mailbox with the intent of starting over.  After re-creating the mailbox, I was quite perplexed to find that I could no longer even connect to Exchange with this user’s account.

I’m not sure what led me to do it, but I remoted to both DCs to take a look at their AD structure side-by-side.  Imagine my surprise when I discovered that there were user accounts missing from our #2 DC.  A further look into the event logs shows that replication between the 2 DCs has been stopped due to a bad computer account.  Because of this, not only is Active Directory broken on this DC, but DNS services (which are relying on AD) are broken as well.  Digging further into the event log, I find that the system is getting hardware errors while attempting to write to the hard disk, which is what corrupted the computer account responsible for shutting down AD replication.

Fast-forward a little to Black Friday.  I don’t shop on Black Friday.  Ever.  However, I am looking over the hard disk that our 2nd DC runs off of.  A disk scan is turning up massive amounts of physical errors on this drive, and although I’d like to try ghosting the system onto a new disk, there’s a good chance that I am going to be “ripping out” this system from the domain with a little help from the MS Knowledge Base.  It’s a little frustrating to be repairing this domain with a rebuild happening at the same time, but I don’t want to take any chances.

We’ll see how this works out.



Filed Under (active directory, domain rebuild, servers, work night) by Dave Mast on October-26-2007

I didn’t have a whole lot of time to spend on the domain rebuild this week, but I did get it started as of Tuesday night.  Currently it’s in the form of a Server 2003 VM.  AD was installed, and the OU structure has been replicated from our existing domain.

I used the Group Policy Management Tool to make printouts of our active GPOs.  It would be nice if the tool also allowed you to print out a list of OUs that link to each object as well, but a little bit of handwriting never hurt anyone.  I also started a mind map of everything I can think of that will need to happen for this domain migration to go smoothly.

Next week I’ll concentrate on recreating the GPOs in the new domain, as well as moving over any login scripts.  I’m also going to continue mind mapping so I can get my mind around the magnitude of this project… I’m nowhere near done. :-)

I’m just glad I’m not on a hard timeline.



Filed Under (active directory, domain rebuild, infrastructure, servers) by Dave Mast on October-16-2007

One of the projects I’ve been wanting to take on over the past year has been a rename/rebuild of our domain.  We’re still carrying our old domain name around from our previous name and location.  This hasn’t been a high-priority matter, but I do want to get it done.

After some research on how to go about it properly, I feel like I’m ready to take this on.  There’s a lot to think about and plan for, though.  The fact that I’m now able to do testing and pre-production building in VMware is a HUGE benefit, and since I’m not really on a time limit, I’m going to be able to work without being under the gun.

I’ll be posting on this more as the project takes shape.



Filed Under (cool tools, exchange, servers, storage) by Dave Mast on June-25-2007

It’s about 2AM right now in Northeast Ohio, and earlier this evening I started the task of taking all of our virtual servers down so that I could defrag the host machines that they live on.

For taking care of large virtual disk files, I use Contig, which is part of the Windows Sysinternals software lineup.  Basically Contig is a tool for defragmenting large files.  It can take wildcards and even recurse subdirectories if you want it to.  This makes it pretty simple to go to the directory where your virtual machines are kept and defrag all of your .vmdk (virtual disk) files in one sweep.
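As a rough sketch of the one-sweep defrag described above, the following Python builds the Contig command line for every .vmdk file under a VM directory. The directory `D:\VirtualMachines`, the function names, and the assumption that `contig.exe` is on the PATH are all hypothetical; only the `-s` (recurse subdirectories) switch and the wildcard support come from Contig itself.

```python
import ntpath
import subprocess


def build_contig_command(vm_root, pattern="*.vmdk", contig_exe="contig.exe"):
    """Return the argument list for a recursive Contig run under vm_root."""
    # ntpath joins with Windows separators no matter where this script runs.
    target = ntpath.join(vm_root, pattern)
    # -s tells Contig to recurse into subdirectories.
    return [contig_exe, "-s", target]


def defrag_virtual_disks(vm_root):
    """Run Contig on the host (the VMs should already be powered off)."""
    return subprocess.run(build_contig_command(vm_root), check=True)


if __name__ == "__main__":
    # Hypothetical VM directory; this only prints the command it would run.
    print(" ".join(build_contig_command(r"D:\VirtualMachines")))
```

Keeping the command construction separate from the call that runs it makes it easy to check the command line before pointing it at a production host.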

I was nervous for a good while this evening, because the Contig utility was taking an EXTRA long time to defragment a piece of the virtual hard drive that is part of our Exchange server, and perfmon was showing little-to-no disk activity at the same time.  About halfway through writing the second paragraph of this post, however, the virtual disk finally finished up and Contig continued on to the next file.  WHEW!

Looking at our file server’s disk usage, I am amazed at how our storage needs have skyrocketed.  When I started in 2005, our dinky little file server had a 30something-GB SCSI drive on it, and it was enough to hold everyone’s information.  Since then we’ve moved to a 425GB RAID array, and we’ve managed to fill over 80% of that space.  Safe to say we’ll be looking for another storage solution sooner rather than later.



Filed Under (power, servers) by Dave Mast on June-21-2007

This past Saturday night I was able to get a script working on our servers that would take them all down gracefully in the event of a power outage.  This was in response to a previous blunder on my part that had allowed every last server in our building to go down hard (although nothing was damaged in the event).

Fast forward to this Tuesday (earlier this week).  Our area got rocked by a couple of huge storms.  About 3/4 of the way through the first storm, the lights in the office flickered, dimmed, and finally went out.  I didn’t think the UPS script would get tested this quickly.

When I flipped open the KVM on our server rack, I was pleased to see that a countdown was already running on each machine.  The UPS software had run its script, and now each server was about 60 seconds away from shutting down automatically.  When it was all said and done, every last server shut down on its own, with plenty of battery life to spare.

The only tweak I ended up making to this process so far involved our physical domain controller (we have two, and the other one is a VM).  It resides in one of the IDFs and it shut down too quickly in response to the power outage.  As a result, after the VM-based DC went down, the remaining servers had no DC to talk to, and thus took a longer time to shut down cleanly.  All-in-all though, the real-life test proved successful, and as a result I have one more reason to sleep better at night.
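The post doesn’t spell out how the tweak was made, so the following is only a sketch of the underlying idea: member servers go down first and domain controllers go down last, so there is always a DC to talk to while the rest shut down.  The host names and the function name are hypothetical.

```python
def order_for_shutdown(servers, domain_controllers):
    """Return servers with member servers first and domain controllers last."""
    # Compare names case-insensitively, since Windows host names are.
    dc_names = {name.lower() for name in domain_controllers}
    members = [s for s in servers if s.lower() not in dc_names]
    dcs = [s for s in servers if s.lower() in dc_names]
    return members + dcs


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(order_for_shutdown(["DC1", "FILE01", "MAIL01", "DC2"], ["dc1", "dc2"]))
```

The same ordering could also be approximated by giving the DCs a longer shutdown delay than everything else.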



Filed Under (power, servers) by Dave Mast on June-16-2007

After tinkering around with PowerAlert a little more tonight (yeah, I know, it’s Saturday), I stumbled onto some interesting things about the program, and ultimately got it working the way I want.

First, as far as executing command scripts:  You need to make sure that the file, even if it’s a CMD or BAT file, has its permissions set properly.  PowerAlert attempts to run the scripts as SYSTEM (<LOCALMACHINE>\SYSTEM), so not only did I need to set permissions on the script files to reflect that, I also had to set permissions on psshutdown.exe as well.

After doing that, things worked like a charm.  Once the UPS loses power, the monitoring server starts a 2-minute timer.  At the end of those 2 minutes, PsShutdown starts, reads a text file containing the names of the servers to shut down, and sends the shutdown command to each one.  Plus, if the power happens to kick back on within those 60 seconds, a second script is run that calls PsShutdown to cancel the previous shutdown commands.
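Here is a minimal sketch of that flow in Python rather than the CMD scripts actually used: read the server list, build one PsShutdown command per server, and build the matching abort command in case the power comes back.  The server names, the helper names, the 60-second countdown, and the assumption that `psshutdown.exe` is on the PATH are all illustrative; the `-s`, `-f`, `-t`, and `-a` switches are PsShutdown’s own.

```python
from pathlib import Path

PSSHUTDOWN = "psshutdown.exe"  # assumed to be on the PATH


def parse_server_list(text):
    """Return server names from list text, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_server_list(list_path):
    """Read the server names from a text file like the one described above."""
    return parse_server_list(Path(list_path).read_text())


def shutdown_command(server, countdown=60):
    # -s shuts down, -f forces running applications closed,
    # -t sets the countdown in seconds (60 is an assumed value).
    return [PSSHUTDOWN, "\\\\" + server, "-s", "-f", "-t", str(countdown)]


def abort_command(server):
    # -a aborts a shutdown whose countdown is still running.
    return [PSSHUTDOWN, "\\\\" + server, "-a"]


if __name__ == "__main__":
    # Hypothetical server names; this only prints the commands.
    for name in parse_server_list("FILE01\nMAIL01\n"):
        print(" ".join(shutdown_command(name)))
        print(" ".join(abort_command(name)))
```

Because the abort only works while a countdown is in progress, the countdown length is what sets the window for cancelling.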

Since our firewalls (we run pfSense) are running on PCs as well, one of the next steps will be to find a utility that can automatically SSH-telnet to each firewall and shut it down as well.

There are still a couple of minor tweaks that I want to do, but they will have to wait until the UPS battery is back up to 100%.  For now though, I’m pretty happy that PowerAlert is working like it’s supposed to.  AND, I feel a lot better knowing that the servers are going to take themselves down gracefully next time we lose power.



Filed Under (power, servers) by Dave Mast on June-16-2007

It seems like things I’ve put on the back burner since the move have started to move quickly towards the edge of the stove on their own…

During my drive home from Granger on Wednesday evening, my phone started receiving text messages stating that either our Exchange Server had gone down, or our internet connection had been totally hosed.  Minutes passed, then an hour, and I didn’t receive any notification that things had come back up.  Finally, about 90 minutes later, my Q started buzzing again to tell me that connectivity had been restored.  Relieved that things appeared to be back to normal, I continued my trek back to Dover.

The next day I came in to work and found no signs that anything was wrong.  Servers were working properly, my phone didn’t have any voicemail on it, and it seemed that our infrastructure HADN’T turned into a flaming ball of molten aluminum while I was gone.  However, when I logged into our backup server to work on our CommVault configuration, I got hit up with a prompt asking me to explain an unexpected shutdown that took place on Wednesday night.

As I logged onto each server and looked at the event logs and did some asking around, it became pretty apparent what had happened:  The notices that I received on the way home from GCC were actually a result of our server rack’s UPS running out of juice after a lengthy power outage.  Our servers, every last one, had gone down.  Hard.  Ouch.

How EVERY server managed to start back up with no errors is beyond me…WAY beyond me.  But needless to say, the once-back-burner task of automating a shutdown process for our servers has come straight to the front. 

I’ve spent the past 2 days dorking around with Tripp-Lite’s PowerAlert software.  My plan is to have it execute a PsTools command called PsShutdown, which has the ability to shut down any Windows machine on your network remotely.  So far though, I’ve yet to see any evidence of PowerAlert even trying to run the script.  I’m going to mess with it a little more and then give Tripp-Lite’s tech support a call.  In any event, I found myself thanking God yesterday that we were able to get off so easy from my mistake of doing IT “on the edge” like that.  Things could have been much MUCH different on Thursday, in which case I might still be in the server room right now.



Filed Under (cool stuff, infrastructure, servers) by Dave Mast on May-30-2007

I just came across this tonight.  I wish this product had been available when we were in our previous building!

Here’s a link to the manufacturer’s product page.



