Archive for the ‘backup’ Category

Filed Under (backup, storage) by Dave Mast on January-13-2008


Hard Disk ICU

Originally uploaded by npccdave

This is what you call a “last ditch effort…”

I got a call from our Music Director late Friday night. He said that he had big problems booting his computer, and everything he described pointed to file corruption, which, in this case, was a sure sign of disk failure.

So I’ve got the offending member hooked up to my desktop PC right now, pulling everything I can off of it with Stellar Phoenix, and hoping for the best.



Filed Under (backup, networking, windows) by Dave Mast on September-4-2007

One of the best pieces of troubleshooting advice I’ve received was something I got from Ed during this ordeal with the video and backup servers:  Write down everything you know about the situation, [no matter how minute] and draw pictures if you can.  Use this knowledge to aid in your troubleshooting.

I noted each and every little detail about the situation that I could think of, and then started hammering away at the variables.  Shortly after lunch I came across an avenue that hadn’t yet been explored:  This whole time I had the same NIC assigned to the backup network, and I had treated the connection to the core network like the “problem area.”  What if that wasn’t the case?  What if the problem wasn’t with the connection to the core network, but the backup network connection instead?

So after thinking through this, I unplugged both NICs from the network, swapped IP address assignments, and plugged both cables back in to their cards according to their new assignments.  Lo and behold, the CommVault server opened a stream to our video box and the data started flowing freely. 

“And there was much rejoicing…”   -Monty Python and the Holy Grail

 

To say that some weight has been lifted off my shoulders would be an understatement.  I was seriously a couple “elimination steps” away from doing a rebuild of the video server.

So what was the problem?  Windows network services was trying to access the “core network” NIC ahead of the backup NIC.  This is why the backup server could initiate a connection to start a backup request, but would not open a stream.  Apparently the stream is initiated on the client end, and the video server was trying to do this with the wrong NIC.  There’s a way you can avoid this using Windows settings that I didn’t even know existed until today.

1.) Go to Control Panel -> Network Connections.  Click on the Advanced menu and select Advanced Settings.


2.)  From here, you can set the order in which your network adapters are accessed by Windows.  Set the order however it suits your needs and click OK.


I felt a little silly for not exploring Windows a little more than I did before uncovering this, but nonetheless I’m very glad to have this new piece of knowledge in my arsenal.  You can bet I’m going to be spending some time later this afternoon checking the rest of the servers that are attached to CommVault and making sure their settings are correct.
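
For future reference, here is a rough Python sketch of the general idea behind the fix: a client that opens a connection can pin the source address itself instead of leaving the choice to the OS adapter order.  This is only an illustration of the socket technique.  Nothing in this post says the CommVault client works this way or exposes such a setting, and the function name and addresses are made up.

```python
import socket


def connect_from(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to local_ip.

    local_ip must be an address already assigned to one of this machine's
    NICs. Port 0 lets the OS pick any free source port on that address.
    """
    return socket.create_connection(
        (remote_host, remote_port),
        timeout=timeout,
        source_address=(local_ip, 0),
    )
```

A call like `connect_from("172.16.0.20", "172.16.0.10", 5000)` (hypothetical addresses and port) would force the outgoing stream onto the NIC that owns 172.16.0.20, whatever the adapter order happens to be.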



Filed Under (backup, networking) by Dave Mast on September-4-2007

(Part 1 and Part 2)

Shortly after services were done on Sunday, I had some downtime while I was waiting on KidStuf to finish up, so I figured “what the heck…may as well try that new NIC out.”

I had gone to Staples the previous day and purchased a Linksys NIC for the video server so that we didn’t have 2 identical cards in the box.  I know it’s a long shot, but I’ve seen weird stuff happen with 2 like-branded printers trying to coexist on the same PC…maybe Intel NICs do the same thing?  I popped the new adapter into the box, fired it up, installed the drivers, set IP addresses, and…..

*insert chirping cricket SFX here*

…still nothing.  The system acts just as it previously did.  Backups still won’t run unless the video server is disconnected from our core network.

So by now you can probably guess that I’m getting frustrated.  I’m pretty good at understanding things, even if I just understand that I don’t know enough about the issue.  But everything I see from this experience tells me that our video and backup servers SHOULD BE TALKING.

  1. Both systems reply to pings that are sent on the proper network, and display the proper IP addresses.
  2. Both systems can run a successful traceroute, which again, shows the proper IP addresses.
  3. A netstat list shows actual ESTABLISHED connections between the 2 machines…there are TIME_WAIT-status connections present also.
  4. Changing NICs out did not affect this issue, so my having 2 PRO 1000/GT cards in the video server is not a problem.
  5. Moving the backup network off of its VLAN and onto a separate piece of copper did not resolve the issue either, so switch/VLAN configurations are ruled out.
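
The netstat check in item 3 can be scripted so it is easy to repeat after each change.  Below is a minimal Python sketch that counts connection states for one peer, assuming the column layout that Windows `netstat -n` prints.  The sample text, addresses, and ports are invented for illustration and are not real output from these servers.

```python
from collections import Counter

# Invented sample in the usual `netstat -n` column layout (not real output).
SAMPLE_NETSTAT = """\
  Proto  Local Address          Foreign Address        State
  TCP    172.16.0.20:1042       172.16.0.10:5000       ESTABLISHED
  TCP    172.16.0.20:1043       172.16.0.10:5000       TIME_WAIT
  TCP    10.80.0.20:1050        10.80.0.5:445          ESTABLISHED
"""


def states_for_peer(netstat_text, peer_ip):
    """Count TCP connection states for connections to peer_ip."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows have exactly four columns; the header row has six words.
        if len(fields) != 4 or fields[0] != "TCP":
            continue
        _, _local, foreign, state = fields
        # Strip the port from the foreign address before comparing.
        if foreign.rsplit(":", 1)[0] == peer_ip:
            counts[state] += 1
    return counts


print(dict(states_for_peer(SAMPLE_NETSTAT, "172.16.0.10")))
```

Keep in mind that an ESTABLISHED entry only shows that some connection exists between the two machines.  It says nothing about which end opened it or which NIC a later data stream will use.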

I had a chance to talk to Ed Buford about this issue (who by the way is not only a gentleman’s gentleman, he’s also a network admin’s gentleman).  His response was the same as mine… “this is just weird.”  It made me feel good to know that I hadn’t missed something stupid in my search.  Out of our conversation, I’ve got some more tricks to try, and since I’m not doing a work night this week, I’ll have a chance to try them first thing this morning:

  1. Reinstall the client software.  Is it possible that the CommVault client-side software looks at your installed NICs and binds to them?  I have changed adapters out since the client software was installed.  This won’t take long either.  May as well try it first.
  2. Disconnect the backup server from the core network.  This is definitely not my first choice, but I’m ready to try it if we need to.  All backup operations happen on a network that is physically separate from the core (except for the VLAN’ed connection coming from the video box), and we’ve got a KVM in the server rack for when I need to access the backup server.
  3. Permanently remove the video server from the core network.  Yet another thing I really don’t want to do, but when it comes down to it, the video server does serve ONLY our editing and playback system for storage purposes.  As long as the Final Cut rigs and the backup server can talk to it, then it’s ultimately still doing its job.

Another thing I just thought of too… the video server is the ONLY backup client that isn’t a domain member.  Is this a big deal?  In my mind it shouldn’t be, but it’s another thing to explore…

I’ve had 4 cups of coffee and 2 waffles this morning and I’m feeling pretty psyched up.  I want to see this problem go away before the end of this week.

Stay tuned for the outcome.



Filed Under (backup, networking) by Dave Mast on September-1-2007

(Story starts here…)

12:00pm - I took a walk down to the IDF that our video server plugs into and opened the door.  WHOA.  Blinking lights everywhere.  According to the “blinkey light test method,” there appears to be something very wrong here.  BOTH switches in the IDF are completely lit up, and when I checked the second IDF, I found the same thing.  Is it related to the backup server?  Maybe.  Is it still an issue?  Absolutely.

12:15pm - After cutting off the links between racks, it’s pretty apparent that our “talker” is connected to the first IDF.  I disconnected the link between the #1 and #2 switches, and bingo… loads of traffic is still pouring in from somewhere on the #2 switch.  Even more curious now, I decided to start disconnecting one link at a time until the switch showed that things were calming down.  As soon as I pulled the first cable, every light on the switch stopped flashing.  What luck.  I plugged it in again, and sure enough, the traffic began to pile up on the switch.  A quick trace showed that the culprit was a wireless access point in one of our venues.

12:40pm - Back at my desk, I attempted to run a backup from our video server with it connected to our primary LAN… still no luck though.  Now it appears we have 2 issues to deal with, though the access point is probably just a configuration issue (one can hope, anyway).  In any case, I’ll deal with it later.

1:20pm - I’ve now reduced the network to only the core switch and the gigabit switch that the video server is connected to.  Still no luck getting backups to run.  This leaves me a little perplexed.  WHAT is causing this miscommunication between the 2 servers?  Furthermore, HOW can the servers ping each other but still not connect?

1:50pm - A trip back to the IDF reveals some interesting stuff.  I am unable to get a link established between an unmanaged switch and the patch to the video server…I’m going to try to move the connection between the backup server and video server to a different physical network, but first I need to be able to get a link.  A look at one of the NICs in the video server reveals the problem:  It was set to a certain link speed instead of auto-negotiation.  While this doesn’t cure the root issue, it does allow me to link the 2 servers together without a VLAN.

2:20pm - After putting the video server’s “backup NIC” on a completely different physical network, I am still unable to start a backup stream in Galaxy Express unless I disconnect the server from the primary network.  This pretty much kills my theory about the switch/VLAN configurations being an issue, and also rests the blame back on the video server itself.

So the question now is “what’s the problem with the server?”  It has 2 Intel Pro/1000 GT cards in it.  The IP settings are correct, and the MAC addresses aren’t even close to similar.  A trip to Staples might be in order today to pick up a non-Intel card to see how it functions.  I’ve run into issues before with printers where having two same-brand printers hooked to one machine caused some serious funk.  Is this possible with NICs?  For the record, I HOPE that it’s that easy at this point.



Filed Under (backup, networking) by Dave Mast on September-1-2007

So last week I was pretty psyched up about having our video storage server connected to Galaxy Express…so much so that I forgot to monitor a backup job AFTER the server was plugged back in to the primary network.

You can imagine my dismay once I found out that backup jobs from video were failing. The reason? Galaxy Express could not establish a connection back to the video server.

The weird part? If I disconnect the cable connecting the server to the primary network (leaving only the connection VLAN’d to Galaxy Express), then things work just fine. As soon as I plug back into the primary network, Galaxy Express ceases to work, BUT I can still ping either server. VERY weird.

I’ve already tried swapping out NICs on the video server, but to no avail. This tells me that it probably has to be something in the switching…but how? Everything I see tells me that the VLANs are configured correctly on each switch. And as I said before, pings are going through with no problem, so the connectivity seems to be there.
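
A caveat on the ping test: ping only exercises ICMP, so a reply does not prove that a TCP session can be opened to the service on the other end. A quick way to test that separately is a small TCP probe. The Python sketch below is just that, a sketch; the function name is made up, and the host and port you pass in depend on whatever service you are testing (this post does not say which ports Galaxy Express uses).

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection performs the full TCP handshake, which is a
        # stronger check than an ICMP echo reply.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only checks one direction. If the other machine is the one that has to open the connection, the probe needs to be run from that side too.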

More on this as the plot unfolds. Possible film at 11.



Filed Under (backup, raid, storage, video) by Dave Mast on August-22-2007

It’s been a while since I talked about this, as things have been moving a little slow.  However, over the past week we have made the final steps in getting our video data protected in a manner that helps me sleep better at night.

About a month ago, we installed a PC and drive chassis in our editing room.  This system has 2 2.8TB RAID5 arrays (with 2 additional hot spare drives) that will do nothing but hold video data, SFX, production music, and Final Cut project files.  We’ve been slowly moving old projects and new finished work onto these arrays over the past couple of weeks.

Yesterday morning, I installed a new gigabit switch in our IDF that serves the editing and control rooms.  This switch also links back to our MDF, and so we now have a much faster link back to our servers.

Finally, earlier this evening, I was able to get our editing storage server talking to our Galaxy Express backup server using some VLAN voodoo and an extra network card.  Our network is mixed 100/1000, so we opted to do our backing up on a separate network.  Since our video server is the first server NOT to be in the rack, I opened up a new VLAN and routed it back to the switch in the server rack that connects the “backup network.”

Note to self:  Time to update the Visio charts of our network layout.  Yippee. ;-)

Seriously, I don’t know why I ever considered NOT buying managed switches.  Sure I would save money, but I would be at a serious disadvantage when it came time to do stuff like this.



Filed Under (backup, storage, work night) by Dave Mast on August-1-2007

Last night’s main event involved attaching a new drive cabinet to our CommVault backup server to increase its storage capacity.  This machine already had 1.5TB in a RAID5 array mounted internally, and it was very close to being full, so we purchased a 12-bay cabinet and filled it with 750GB drives.  I also ended up buying a RocketRaid 2440 controller and some multilane cables to make the connection between the PC and the drive cabinet.

The installation went very smoothly except for a minor detail.  The controller for the internal drives (a RocketRaid 1740) was causing me a little bit of grief when I tried to set up the new controller.  I was able to get to the BIOS settings for the 1740, but not the 2440.  I ended up pulling the 1740 out of the case temporarily and this allowed me to get to the BIOS settings for the new 2440 card.  Once the new RAID array was set up on the 2440, I popped the 1740 card back in and both cards worked flawlessly, giving the backup server roughly 9TB of total storage space.
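
As a sanity check on that 9TB figure, here is the capacity arithmetic as a short Python sketch.  It rests on assumptions this post does not state: that the new cabinet runs as a single RAID5 array, that the 1.5TB internal figure is usable space, and that sizes are decimal (1TB = 1000GB).  The number of hot spares is not mentioned either, so the sketch tries zero and one.

```python
def raid5_usable_gb(drive_count, drive_size_gb, hot_spares=0):
    """Usable capacity of one RAID5 array: one member's worth goes to parity."""
    members = drive_count - hot_spares
    if members < 3:
        raise ValueError("RAID5 needs at least three member drives")
    return (members - 1) * drive_size_gb


INTERNAL_GB = 1500  # existing internal array, assumed to be usable space

for spares in (0, 1):
    external_gb = raid5_usable_gb(12, 750, hot_spares=spares)
    total_tb = (INTERNAL_GB + external_gb) / 1000
    print(f"{spares} hot spare(s): {external_gb} GB external, {total_tb:.2f} TB total")
# 0 hot spare(s): 8250 GB external, 9.75 TB total
# 1 hot spare(s): 7500 GB external, 9.00 TB total
```

Both cases land in the neighborhood of the total quoted above.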

100_0413

Here’s a shot of the drive cabinet next to our whitebox backup server.  It will get rack-mounted sometime in the next couple of weeks, I just have to get some other equipment moved around in the server rack first.

Once we get a gigabit switch mounted in our #1 IDF, we’ll begin backing up our video content across the network to this server as well.  Once all that content is in 2 places simultaneously, I’ll be breathing a lot easier.



Filed Under (backup, raid, storage) by Dave Mast on July-26-2007

I was sitting at my desk yesterday when I heard a noise come from outside my cubicle.  I turned around just in time to see a FedEx driver leaving the scene with his dolly.

This is what he left behind.

img032

This is actually going to be a storage upgrade for our backup server.  The price was good (under $5k for the cabinet, 12 750GB drives, a controller and cables), and this unit will provide plenty of room to last us for a while.

It goes in next week.  Yes, I’m pumped. ;-)



Filed Under (backup, networking) by Dave Mast on April-23-2007
[Disclaimer:  This is a long post, and primarily for my own future reference.]

One of the things I’ve wanted to do with our backup server is get it running backups on a network separate from our LAN.  This would allow us to run backups at any time without loading down the network.

I’m no network guru (not even close), so I had a little bit of trouble getting my mind around how this would work.  After talking with my good friend Ed, it made much more sense.  Basically what we needed to do was put a new NIC in each physical machine and give it an address that’s a different class from our LAN.  (our LAN is 10.80.*; we chose 172.16.* for the auxiliary).  After that, I would set the backup server and clients to listen on their 172.16.* addresses.  Thus, all backup traffic should get pushed through the new NICs and onto the auxiliary network, and life would be good.
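
Here is a small Python sketch for double-checking an addressing plan like this one: it confirms the two ranges don’t overlap and tells you which network a given address lands on.  The /16 prefix lengths are my assumption based on the 10.80.* and 172.16.* notation above, and the function name is made up.

```python
import ipaddress

# Prefix lengths are assumed from the 10.80.* / 172.16.* notation.
LAN = ipaddress.ip_network("10.80.0.0/16")
BACKUP = ipaddress.ip_network("172.16.0.0/16")

# The two ranges must not overlap, or traffic could end up on either NIC.
assert not LAN.overlaps(BACKUP)


def which_network(address):
    """Return 'lan', 'backup', or 'other' for a dotted-quad address."""
    ip = ipaddress.ip_address(address)
    if ip in LAN:
        return "lan"
    if ip in BACKUP:
        return "backup"
    return "other"


print(which_network("172.16.0.10"))  # backup
```

Running a list of server and client addresses through `which_network` is a quick way to catch a machine that was left listening on the wrong side.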

After installing the new hardware, the next task is to go into the virtual network settings of your VMware host and bridge each of your physical NICs to a specific VMnet.  By default, the VMware server bridges your first adapter to VMnet0.  You’ll need to disable this in your host’s virtual network settings.

I bridged the physical NIC for the 10.80.* network to VMnet0 and then bridged the new NIC to VMnet3.

The next task is to add a new virtual network adapter.  You’ll want to select a Custom network connection for the new adapter and point it directly at a specific VMnet (VMnet3 for me).

After you’re done, go into your VM, configure your new virtual NIC for the right network, and you’re good to go!

We use CommVault Galaxy Express for backups, and so changing the client to run on the alternate network is pretty easy.

 

I am by no means saying that “this is how you should back up your stuff,” but this is what’s working for us.  I’ve only been using Galaxy Express for a couple of weeks, but I am already a huge fan of the software.  It is extremely flexible and as far as I’m concerned, quite easy to use.  If you’re in the market for a backup solution, this is definitely a tool that you’ll want to consider.



Filed Under (backup, infrastructure, networking, work night) by Dave Mast on April-21-2007

On Tuesday night I took the opportunity to shut our servers down for the night, clean up the wiring in our server rack and MDF, and move the server rack to the other side of the room.  The moving of the server rack wasn’t super-crucial, but it did line the rack’s exhaust fans up with the air return in the room, so our air flow is now slightly more efficient.

I said on Wednesday morning that I would post some pics… really more for my benefit than anyone else’s.  So here they are.

A picture of the server room before the cleanup.  We’ve got wires hanging all over the back of the MDF/Telco rack. 

Wires hanging out of the server rack too.  I didn’t take pictures of the inside of the server rack, but believe me it was UGLY.

Here’s a couple pics of the server room and rack after the cleanup.  The wiring at the MDF is cleaned up, and the CAT5e wiring from the servers has all been replaced with CAT6 and is running to the ceiling through a piece of flex tubing.

The wiring in the server rack was also tied down with velcro ties (just in case we need to loosen it up to run more wires through).  Again, no pictures of the inside of the rack, because some goof (me) forgot to take them.

Why so much trouble to clean this up?  I admit, I’ve got a little bit of OCD when it comes to keeping things neat and orderly (and I’m still learning how to do it better as I go).  But I also believe that if something is hard to look at (like messy, unlabelled cables strewn loosely inside racks and such), it’s going to be hard to WORK ON, too. 

I’d rather invest the time to get things in order than pay the price for it when something goes down and we have to start chasing down a patch to this or that.  We’ve still got a bunch to do in this room (as well as our IDFs), but Tuesday night got us off to a good start, and I’m looking forward to doing more of this in the future.



