Archive for the ‘storage’ Category

Hard Disk ICU
Originally uploaded by npccdave
This is what you call a “last ditch effort…”
I got a call from our Music Director late Friday night. He said that he had big problems booting his computer, and everything he described pointed to file corruption, which in this case was a sure sign of disk failure.
So I’ve got the offending member hooked up to my desktop PC right now, pulling everything I can off of it with Stellar Phoenix, and hoping for the best.
|
It’s been a while since I talked about this, as things have been moving a little slowly. However, over the past week we have taken the final steps in getting our video data protected in a manner that helps me sleep better at night.
About a month ago, we installed a PC and drive chassis in our editing room. This system has two 2.8TB RAID5 arrays (with 2 additional hot spare drives) that will do nothing but hold video data, SFX, production music, and Final Cut project files. We’ve been slowly moving old projects and new finished work onto these arrays over the past couple of weeks.
Yesterday morning, I installed a new gigabit switch in our IDF that serves the editing and control rooms. This switch also links back to our MDF, and so we now have a much faster link back to our servers.
Finally, earlier this evening, I was able to get our editing storage server talking to our Galaxy Express backup server using some VLAN voodoo and an extra network card. Our network is mixed 100/1000, so we opted to do our backing up on a separate network. Since our video server is the first server NOT to be in the rack, I opened up a new VLAN and routed it back to the switch in the server rack that connects the “backup network.”
Note to self: Time to update the Visio charts of our network layout. Yippee.
Seriously, I don’t know why I ever considered NOT buying managed switches. Sure, I would have saved money, but I would have been at a serious disadvantage when it came time to do stuff like this.
|
Last night’s main event involved attaching a new drive cabinet to our CommVault backup server to increase its storage capacity. This machine already had a 1.5TB RAID5 array mounted internally, and it was very close to being full, so we purchased a 12-bay cabinet and filled it with 750GB drives. I also ended up buying a RocketRaid 2440 controller and some multilane cables to make the connection between the PC and the drive cabinet.
The installation went very smoothly except for a minor detail. The controller for the internal drives (a RocketRaid 1740) was causing me a little bit of grief when I tried to set up the new controller. I was able to get to the BIOS settings for the 1740, but not the 2440. I ended up pulling the 1740 out of the case temporarily, and this allowed me to get to the BIOS settings for the new 2440 card. Once the new RAID array was set up on the 2440, I popped the 1740 card back in and both cards worked flawlessly, giving the backup server roughly 9TB of total storage space.
Here’s a shot of the drive cabinet next to our whitebox backup server. It will get rack-mounted sometime in the next couple of weeks; I just have to get some other equipment moved around in the server rack first.
Once we get a gigabit switch mounted in our #1 IDF, we’ll begin backing up our video content across the network to this server as well. Once all that content is in 2 places simultaneously, I’ll be breathing a lot easier.
|
I was sitting at my desk yesterday when I heard a noise come from outside my cubicle. I turned around just in time to see a FedEx driver leaving the scene with his dolly.
This is what he left behind.

This is actually going to be a storage upgrade for our backup server. The price was good (under $5k for the cabinet, twelve 750GB drives, a controller and cables), and this unit will provide plenty of room to last us for a while.
It goes in next week. Yes, I’m pumped. 
|
It’s about 2AM right now in Northeast Ohio, and earlier this evening I started the task of taking all of our virtual servers down so that I could defrag the host machines that they live on.
For taking care of large virtual disk files, I use Contig, which is part of the Windows Sysinternals software lineup. Basically Contig is a tool for defragmenting large files. It can take wildcards and even recurse subdirectories if you want it to. This makes it pretty simple to go to the directory where your virtual machines are kept and defrag all of your .vmdk (virtual disk) files in one sweep.
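If you would rather see exactly which files are about to be touched before turning Contig loose, a small wrapper script can do the directory walk for you. This is only a rough sketch of the idea, not what I actually ran: it assumes `contig` is on the PATH, it calls Contig once per file with no extra switches, and the function names and the `dry_run` flag are made up for illustration.

```python
from pathlib import Path
import subprocess


def find_virtual_disks(vm_root, pattern="*.vmdk"):
    """Return every file under vm_root (recursively) that matches pattern."""
    return sorted(p for p in Path(vm_root).rglob(pattern) if p.is_file())


def build_contig_commands(disks, exe="contig"):
    """Build one command per virtual disk: contig <file>.

    'contig' being on the PATH is an assumption; adjust exe if it lives elsewhere.
    """
    return [[exe, str(disk)] for disk in disks]


def defrag_all(vm_root, dry_run=True):
    """Print the commands (dry run) or run them one file at a time."""
    commands = build_contig_commands(find_virtual_disks(vm_root))
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep if Contig reports a failure
            subprocess.run(cmd, check=True)
    return commands
```

Calling `defrag_all` on the folder that holds your virtual machines with `dry_run=True` just prints the list, which is a cheap sanity check before committing to a multi-hour defrag with the virtual servers powered off.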
I was nervous for a good while this evening, because the Contig utility was taking an EXTRA long time to defragment a piece of the virtual hard drive that is part of our Exchange server, and perfmon was showing little-to-no disk activity at the same time. About halfway through writing the second paragraph of this post, however, the virtual disk finally finished up and Contig continued on to the next file. WHEW!
Looking at our file server’s disk usage, I am amazed at how our storage needs have skyrocketed. When I started in 2005, our dinky little file server had a 30-something-GB SCSI drive on it, and it was enough to hold everyone’s information. Since then we’ve moved to a 425GB RAID array, and we’ve managed to fill over 80% of that space. Safe to say we’ll be looking for another storage solution sooner rather than later.
|
My adventures with the Norco disk system have continued throughout the day, and after some more tinkering and some software-aided intervention, our storage array for the editor seems to be back up and running.
Apple’s Disk Utility program failed to do any sort of repair work on the array. After many repair attempts, all I could get were directory errors. After browsing around for some Mac disk repair utilities, I landed on one that showed promise: DiskWarrior by Alsoft.
I fired up DiskWarrior, pointed it to our now-unmounted disk array, and watched it go to work. It identified the array and cleaned up the directory structure, various file attributes, and various other things, and in about 5 minutes the array was back online. SWEET! This was WELL worth the price tag (about $80).
Thinking things were back to normal, I went ahead and shut everything down so I could install a UPS in front of the Mac and the resurrected RAID box. After plugging everything in, I turned on the Norco drives, powered the Mac up, and watched in utter astonishment….as the system failed to recognize five of the twelve installed drives.
At this point I was about at my limit with this RAID array. Not really knowing what else to do, I went ahead and shut both the disks and the Mac completely down, and then restarted the RAID after about a minute. I let the disk system run for about 2 minutes before powering up the Mac. My thought on this was that if the backplane or anything else in the RAID box has to run a POST or anything, I’m going to give it plenty of time to do so before restarting the computer.
I pushed the power button on the Mac, and lo and behold, I heard many drives starting to spin up all at one time. In a few seconds, all 12 drives had spun up, and the RAID seemed to be back on its feet. A quick look through the Finder revealed that all of our stuff on the array appeared to be intact.
So is this over? I really don’t know. I really haven’t felt like testing to see if I can replicate the issue by starting the Mac “too soon” after the RAID array is powered up. What I am going to do is exchange our eSATA controller card for one that has been tested and is more compatible with the unit. If that clears things up, then I’ll feel more confident about marking this down as a hardware issue. At this point, it makes sense that 2 different chipsets wouldn’t play well together. But at the same time, you’d think there would be more consistency to it.
Maybe the constant here is just sheer unreliability. Time will tell. Until then, I’ll be copying our FCP project files onto a safer hard drive.
|
Well, after playing around with the RAID array throughout the wee hours of the morning, it’s pretty apparent that something went seriously wrong. A massive power flux? A dicey hard drive? I really don’t know at this point. S.M.A.R.T. status on all the drives shows that they’re running just fine. So far 2 ideas are floating around in my head:
- The system suffered a massive power fluctuation that totally ticked off the Mac, or the RAID unit, or both.
- There is a major compatibility problem with the RAID unit and the HighPoint Technology card that I had to use in place of the bundled controller card. The only thing I can think of is that there might be 2 different chipsets between the RAID unit and the controller that don’t like each other at all.
Either way, I’m glad this problem decided to rear its head NOW instead of later, when we’ve got the drive populated with irreplaceable data.
Speaking of which, it just so happens that most of the files that were on that RAID5 array are still sitting in other areas! THAT is letting me breathe so much easier right now. However, there were quite a few Final Cut project files that were only on that array, which is still a bummer. I’m in the process of looking through data recovery software to see if there’s anything decent that I can try.
In the meantime, I’ve got a UPS set to hook into that system immediately. Plus, if I can’t make the array work after another rebuild, I’m going to set a separate PC up there and connect it to the MacPro. Since the PC has a PCI slot on it, I can use the Norco’s bundled controller card. I’m not exactly thrilled about putting a PC up there JUST to act as a bridge between the editing system and the RAID, but I may find that I have no choice.
More updates as the plot unfolds.
|
This morning started with a phone call from Jeff in the editing room. He was having trouble getting Final Cut to recognize our Sony MiniDV deck over FireWire.
I head upstairs and we begin tinkering around with it. After a few unsuccessful attempts, we decide to reboot the machine and start from scratch. No big deal, right?
Well, after rebooting, I get an error on the desktop saying that a disk is unreadable by OS X. I’m used to seeing these when I insert DVDs, but the only thing in the DVD drive was an audio disc. Then I noticed that the RAID5 array that we just put into production at the end of last week is missing from the desktop. Oh no…this is not good. This drive was working just fine with no signs of errors. Now I can see it, but I can’t mount it. Even a look at the RAID controller shows that there are no errors on the array.
As soon as I attempt to run a verify operation though, the event log shows that there are inconsistencies on the array, and the controller starts a rebuild. For the moment, I was slightly relieved. The array will rebuild and I’ll be able to remount the drive with no problems, right?
—
Well, here I am 12 hours later (rebuilding a 6.75TB array takes a LONG time!). I’ve been at home monitoring the rebuild, and it finished just a few minutes ago. After holding my breath and attempting to remount the array in OS X, I was a little surprised (and very disappointed) to get nothing but errors after attempting a verify and then a repair.
So now I begin the task of looking for a disk repair utility for the Mac. I’m not sure if it will do any good, but I’m rapidly running out of ideas.
It’s going to be a long night.
|
Last night was another work night at NewPointe. Here’s what went down.
–> Data/phone lines were run to the kitchen. This is funny, because I remember doing the wiring plan thinking, “When will we ever need phones and data in there?” Well, here we are, 6 months into the building, and they’re being installed. It just goes to show you, never say “never” and don’t lay down too many absolutes when planning a network.
–> A 12-drive eSATA array made its way to our Final Cut desk. Since mid-March, we’ve been making it a point to archive video from at least one of our weekend services. The problem is that our video system is HD (it looks great, but it’s a double-edged sword). Recording HD video will stretch your hard drives to the limit. Compressed HD runs about 60 or so GB per hour (which is great compared to uncompressed HD, which runs upwards of about 650GB per hour).
Enter the Norco DS-1220. It’s got some pretty good reviews and so far I’ve really liked what I’ve seen. We loaded ours up with twelve 750GB drives: 10 for the array and 2 for hot spares. The only drawback I’ve seen with this unit is that it comes with a PCI-X eSATA controller, and since our MacPros don’t have PCI-X slots, I ended up buying a HighPoint RR2314 eSATA controller to work with our Mac’s PCI Express slots. Other than that, I’m pretty happy with it thus far.
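For anyone who wants to run the numbers for their own setup, here is a quick back-of-the-napkin sketch in Python. It takes the figures from this post (ten 750GB drives in RAID5, roughly 60GB per hour for compressed HD and 650GB per hour for uncompressed) and assumes the usual RAID5 rule of thumb that one drive’s worth of space goes to parity. The function names are just mine, and it ignores filesystem overhead and the decimal-versus-binary gigabyte difference.

```python
def raid5_usable_gb(drives, drive_gb):
    """Usable RAID5 capacity: one drive's worth of space is spent on parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def recording_hours(capacity_gb, gb_per_hour):
    """How many hours of footage fit at a given data rate."""
    return capacity_gb / gb_per_hour


# Figures from the post: 10 array drives (the 2 hot spares add no capacity)
usable = raid5_usable_gb(drives=10, drive_gb=750)

print(f"Usable space:    {usable / 1000:.2f} TB")
print(f"Compressed HD:   {recording_hours(usable, 60):.1f} hours")
print(f"Uncompressed HD: {recording_hours(usable, 650):.1f} hours")
```

With these inputs the array works out to 6.75TB usable, which is a bit over 112 hours of compressed HD but only about 10 hours of uncompressed footage.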
Next Work Night: It’s gonna be a cable-pulling extravaganza. We’re putting some much-needed data and phone drops into the control and editing rooms, and also taking care of some AV lines in the process.
|