Archive for the ‘raid’ Category

Filed Under (backup, raid, storage, video) by Dave Mast on August-22-2007

It’s been a while since I talked about this, as things have been moving a little slowly.  However, over the past week we have taken the final steps in getting our video data protected in a manner that helps me sleep better at night.

About a month ago, we installed a PC and drive chassis in our editing room.  This system has two 2.8TB RAID5 arrays (with 2 additional hot spare drives) that will do nothing but hold video data, SFX, production music, and Final Cut project files.  We’ve been slowly moving old projects and new finished work onto these arrays over the past couple of weeks.

Yesterday morning, I installed a new gigabit switch in our IDF that serves the editing and control rooms.  This switch also links back to our MDF, and so we now have a much faster link back to our servers.

Finally, earlier this evening, I was able to get our editing storage server talking to our Galaxy Express backup server using some VLAN voodoo and an extra network card.  Our network is mixed 100/1000, so we opted to do our backing up on a separate network.  Since our video server is the first server NOT to be in the rack, I opened up a new VLAN and routed it back to the switch in the server rack that connects the “backup network.”

Note to self:  Time to update the Visio charts of our network layout.  Yippee. ;-)

Seriously, I don’t know why I ever considered NOT buying managed switches.  Sure I would save money, but I would be at a serious disadvantage when it came time to do stuff like this.



Filed Under (backup, raid, storage) by Dave Mast on July-26-2007

I was sitting at my desk yesterday when I heard a noise come from outside my cubicle.  I turned around just in time to see a FedEx driver leaving the scene with his dolly.

This is what he left behind.

[Photo: img032]

This is actually going to be a storage upgrade for our backup server.  The price was good (under $5k for the cabinet, twelve 750GB drives, a controller and cables), and this unit will provide plenty of room to last us for a while.

It goes in next week.  Yes, I’m pumped. ;-)
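
For a rough sense of what twelve 750GB drives buy you, here is a minimal Python sketch of the usual RAID5 capacity rule (one member drive's worth of space goes to parity). The RAID level and the hot-spare counts are assumptions for illustration only, since nothing above says how this cabinet will be laid out, and the function name is made up.

```python
def raid5_usable_gb(total_drives: int, drive_gb: int, hot_spares: int = 0) -> int:
    """Usable space (decimal GB) of one RAID5 array built from the non-spare drives."""
    members = total_drives - hot_spares
    if members < 3:
        raise ValueError("RAID5 needs at least three member drives")
    # RAID5 spends one member drive's worth of capacity on parity.
    return (members - 1) * drive_gb


# Hypothetical layouts for a 12-bay cabinet of 750GB drives.
for spares in (0, 1, 2):
    usable = raid5_usable_gb(12, 750, hot_spares=spares)
    print(f"{spares} hot spare(s): {usable} GB usable")
```

These are decimal gigabytes, the way drive vendors count them; an operating system that counts in binary units will report a smaller number for the same array.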



Filed Under (macs, raid, storage, troubleshooting, video) by Dave Mast on June-6-2007

My adventures with the Norco disk system have continued throughout the day, and after some more tinkering and some software-aided intervention, our storage array for the editor seems to be back up and running.

Apple’s Disk Utility program failed to do any sort of repair work on the array.  After many repair attempts, all I could get were directory errors.  After browsing around for some Mac disk repair utilities, I landed on one that showed promise:  DiskWarrior by Alsoft.

I fired up DiskWarrior, pointed it to our now-unmounted disk array, and watched it go to work.  It identified the array and cleaned up the directory structure, various file attributes, and various other things, and in about 5 minutes the array was back online.  SWEET!  This was WELL worth the price tag (about $80).

Thinking things were back to normal, I went ahead and shut everything down so I could install a UPS in front of the Mac and the resurrected RAID box.  After plugging everything in, I turned on the Norco drives, powered the Mac up, and watched in utter astonishment….as the system failed to recognize five of the twelve installed drives.

At this point I was about at my limit with this RAID array.  Not really knowing what else to do, I went ahead and shut both the disks and the Mac completely down, and then restarted the RAID after about a minute.  I let the disk system run for about 2 minutes before powering up the Mac.  My thought on this was that if the backplane or anything else in the RAID box has to run a POST or anything, I’m going to give it plenty of time to do so before restarting the computer.

I pushed the power button on the Mac, and lo and behold, I heard many drives starting to spin up all at one time.  In a few seconds, all 12 drives had spun up, and the RAID seemed to be back on its feet.  A quick look through the Finder revealed that all of our stuff on the array appeared to be intact.

So is this over?  I really don’t know.  I really haven’t felt like testing to see if I can replicate the issue by starting the Mac “too soon” after the RAID array is powered up.  What I am going to do is exchange our eSATA controller card for one that has been tested and is more compatible with the unit.  If that clears things up, then I’ll feel more confident about marking this down as a hardware issue.  At this point, it makes sense that 2 different chipsets wouldn’t play well together.  But at the same time, you’d think there would be more consistency to it.

Maybe the constant here is just sheer unreliability.  Time will tell.  Until then, I’ll be copying our FCP project files onto a safer hard drive.



Filed Under (macs, power, raid, storage, troubleshooting, video) by Dave Mast on June-6-2007

Well, after playing around with the RAID array throughout the wee hours of the morning, it’s pretty apparent that something went seriously wrong.  A massive power flux?  A dicey hard drive?  I really don’t know at this point.  S.M.A.R.T. status on all the drives shows that they’re running just fine.  So far 2 ideas are floating around in my head:

  1. The system suffered a massive power fluctuation that totally ticked off the Mac, or the RAID unit, or both.
  2. There is a major compatibility problem with the RAID unit and the HighPoint Technology card that I had to use in place of the bundled controller card.  The only thing I can think of is that there might be 2 different chipsets between the RAID unit and the controller that don’t like each other at all.

Either way, I’m glad this problem decided to rear its head NOW instead of later, when we’ve got the drive populated with irreplaceable data.

Speaking of which, it just so happens that most of the files that were on that RAID5 array are still sitting in other areas!  THAT is letting me breathe so much easier right now.  However, there were quite a few Final Cut project files that were only on that array, which is still a bummer.  I’m in the process of looking through data recovery software to see if there’s anything decent that I can try.

In the meantime, I’ve got a UPS set to hook into that system immediately.  Plus, if I can’t make the array work after another rebuild, I’m going to set a separate PC up there and connect it to the Mac Pro.  Since the PC has a PCI slot on it, I can use the Norco’s bundled controller card.  I’m not exactly thrilled about putting a PC up there JUST to act as a bridge between the editing system and the RAID, but I may find that I have no choice.

More updates as the plot unfolds.



Filed Under (raid, storage, troubleshooting) by Dave Mast on June-5-2007

This morning started with a phone call from Jeff in the editing room.  He was having trouble getting Final Cut to recognize our Sony MiniDV deck over FireWire.

I head upstairs and we begin tinkering around with it.  After a few unsuccessful attempts, we decide to reboot the machine and start from scratch.  No big deal, right?

Well, after rebooting, I get an error on the desktop saying that a disk is unreadable by OS X.  I’m used to seeing these when I insert DVDs, but the only thing in the DVD drive was an audio disc.  Then I noticed that the RAID5 array that we just put into production at the end of last week is missing from the desktop.  Oh no…this is not good.  This drive was working just fine with no signs of errors.  Now I can see it, but I can’t mount it.  Even a look at the RAID controller shows that there are no errors on the array.

As soon as I attempt to run a verify operation though, the event log shows that there are inconsistencies on the array, and the controller starts a rebuild.  For the moment, I was slightly relieved.  The array will rebuild and I’ll be able to remount the drive with no problems, right?

Well, here I am 12 hours later (rebuilding a 6.75TB array takes a LONG time!).  I’ve been at home monitoring the rebuild, and it finished just a few minutes ago.  After holding my breath and attempting to remount the array in OS X, I was a little surprised (and very disappointed) to get nothing but errors after attempting a verify and then a repair.

So now I begin the task of looking for a disk repair utility for the Mac.  I’m not sure if it will do any good, but I’m rapidly running out of ideas.

It’s going to be a long night.
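
To put that 12-hour rebuild in perspective, here is a back-of-the-envelope Python sketch of the average throughput it implies. It assumes the controller worked through the full 6.75TB (decimal) at a steady rate, which may not be how the card actually schedules a rebuild; the function name is invented for this example.

```python
def average_rate_mb_s(total_bytes: float, hours: float) -> float:
    """Average throughput in decimal MB/s for a job that took `hours` hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_bytes / (hours * 3600) / 1e6


# Figures taken from the post: a 6.75TB array, roughly 12 hours.
rate = average_rate_mb_s(6.75e12, 12)
print(f"Average rebuild rate: about {rate:.0f} MB/s")
```

Under those assumptions the rebuild averaged roughly 156 MB/s across the whole array, so a run of about half a day is in line with an array this size.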



Filed Under (raid) by Dave Mast on February-21-2007

Last week I came into the office to find that a drive on our domain controller had failed.  No big deal, right?  I overnighted 2 drives, and installed them both the next day; one as a replacement for the failed drive, and another as a hot spare.

I was hurried into work this morning by a voicemail from K.  She informed me that no one was able to log into the Exchange server.  I know I had everything running last night before I left, so this really caught me off-guard.

I rush into the office (fortunately I’m only about 5 miles from the church) and sit down to begin diagnosis.  My tests led me once again to the domain controller I had just worked on less than a week ago.  I tried to click on the start menu, and the whole system froze up.  Wow, the same symptoms that I experienced last week.  I winced, hit the reset button on the machine, and sure enough, the RAID controller’s BIOS shows a rebuild in progress.  Once the server was back up and running, I could see in the event log that the failure had taken place at 7:30.

I was partially relieved to see that the drive I had just installed wasn’t the culprit, but was another one of the original drives for that array.  Wow…2 drives in under a week’s time.  These were both Maxtor DiamondMax hard drives that had failed on me, and they’ve been in use for less than a year.  The controller is a Promise Technology TX4310, and we’ve had no issues up to this point.

[Screenshot: screenhunter_09-feb-21-1303.jpg]

[Screenshot: screenhunter_10-feb-21-1312.jpg]


Another weird point is that the rebuild actually started BEFORE the machine locked up.  The hot spare stepped in just like it was supposed to, but the controller shows 5 device timeouts before the system gave up on it.

Hopefully this is a drive issue and not a controller issue.  Nonetheless, the task of getting a second DC up and running just went up in priority by many notches.



