It seems like things I’ve put on the back burner since the move have started to move quickly towards the edge of the stove on their own…
During my drive home from Granger on Wednesday evening, my phone started receiving text messages stating that either our Exchange Server had gone down, or our internet connection had been totally hosed. Minutes passed, then an hour, and I didn’t receive any notification that things had come back up. Finally, about 90 minutes later, my Q started buzzing again to tell me that connectivity had been restored. Relieved that things appeared to be back to normal, I continued my trek back to Dover.
The next day I came in to work and found no signs that anything was wrong. Servers were working properly, my phone didn’t have any voicemail on it, and it seemed that our infrastructure HADN’T turned into a flaming ball of molten aluminum while I was gone. However, when I logged into our backup server to work on our CommVault configuration, I got hit with a prompt asking me to explain an unexpected shutdown that took place on Wednesday night.
As I logged onto each server, looked at the event logs, and did some asking around, it became pretty apparent what had happened: the notices that I received on the way home from GCC were actually a result of our server rack’s UPS running out of juice after a lengthy power outage. Our servers, every last one, had gone down. Hard. Ouch.
How EVERY server managed to start back up with no errors is beyond me…WAY beyond me. But needless to say, the once-back-burner task of automating a shutdown process for our servers has come straight to the front.
I’ve spent the past 2 days dorking around with Tripp-Lite’s PowerAlert software. My plan is to have it run a script that calls PsShutdown, a utility from the PsTools suite that can remotely shut down any Windows machine on your network. So far, though, I’ve yet to see any evidence of PowerAlert even trying to run the script. I’m going to mess with it a little more and then give Tripp-Lite’s tech support a call. In any event, I found myself thanking God yesterday that we got off so easy after my mistake of doing IT “on the edge” like that. Things could have been much, MUCH different on Thursday, in which case I might still be in the server room right now.
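For anyone curious what a script like that might look like, here is a rough sketch, not the actual script from our rack. It assumes Python is installed on the machine that PowerAlert runs on and that `psshutdown` is on the PATH; the server names are made up. The `-s`, `-f`, and `-t` switches are PsShutdown's shut down, force-close applications, and countdown options. The sketch defaults to a dry run that only prints the commands, so nothing goes down by accident while testing.

```python
import subprocess

# Hypothetical server names; substitute the machines in your own rack.
SERVERS = ["EXCHANGE01", "BACKUP01", "FILE01"]


def build_shutdown_command(host, delay_seconds=60):
    """Return the PsShutdown argument list for one remote host."""
    return [
        "psshutdown",
        rf"\\{host}",              # target machine, e.g. \\EXCHANGE01
        "-s",                      # shut the machine down
        "-f",                      # force running applications to close
        "-t", str(delay_seconds),  # countdown in seconds before shutdown
    ]


def shut_down_all(hosts, dry_run=True):
    """Print (dry run) or execute a PsShutdown command for every host."""
    commands = [build_shutdown_command(host) for host in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: one unreachable server should not stop the rest.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    shut_down_all(SERVERS)
```

Running it as-is just prints one `psshutdown` line per server. Passing `dry_run=False` would actually send the shutdowns, so that part is best tried against a single test box first.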