Archive for the ‘troubleshooting’ Category

Filed Under (troubleshooting, video) by Dave Mast on February-22-2008

Over the past couple weeks, we’ve had intermittent (my favorite) issues with one of our cameras — the signal would start getting jumpy and eventually go back to normal after a few seconds.  It looked just like someone had unplugged the genlock connection and put it right back.

I finally had a chance to take a look at it today, and I was able to get a nice glimpse of the problem with my camera:

IMG_0018

See how the copper in the center of the left connector is flared out?  That’s the camera’s genlock connection, and most likely where our issue is.

I found a small pair of needle-nose pliers and closed the copper up as best I could… it’s looking a little better now.

IMG_0020

I thought about getting in there with a small soldering tip and dripping some solder on that copper to brace it a little.  Going in there makes me a little nervous though.  See that white at the back of the connector?  That’s plastic.  I don’t really want to risk melting it, or worse, accidentally shorting the connector out.

So far, the camera is passing all the stress-tests (or different wire-wiggling techniques) that I’m subjecting it to, so I’m feeling good about this being the cause of our signal issues.  We’ll see in a couple weeks…



Filed Under (support, time warner, troubleshooting) by Dave Mast on February-17-2008

It doesn’t matter how many situations and problems you’ve dealt with as an IT guy, something new and different will eventually come along to make you scratch your head.

On Friday morning I got a notice that our internet connection had dropped out for a short period of time.  After just a minute of being down, the connection came back up.  No big deal, right?

On Saturday morning, I received another notice stating that our connection had gone down and then come back up.  This happened twice that morning.  Slightly concerned, I downloaded a little utility (Uptime Scout) to constantly ping our network and log any timeouts that occurred.  At the same time, I called Time Warner Tech Support and opened a ticket with them so they at least had it on record.

When I returned later in the evening, what I found in the logs was very interesting.  Our internet connection was dropping out every 33 minutes, almost to the second.  The outage didn’t last long … a minute at the very most.  However, we use F1, and if this happens on Sunday morning things could get a little messy.
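For anyone who wants to pull the same kind of pattern out of a ping log, here is a minimal Python sketch. It is not how Uptime Scout works internally; the function names, the 5-minute grouping gap, and the sample timestamps are all made up for illustration. It groups individual ping timeouts into outages and then measures the time between the start of each outage.

```python
from datetime import datetime, timedelta


def outage_starts(timeouts, gap=timedelta(minutes=5)):
    """Collapse a sorted list of timeout timestamps into outage start times.

    Timeouts that follow the previous one by `gap` or less are treated as
    part of the same outage (the 5-minute default is an arbitrary choice).
    """
    starts = []
    previous = None
    for ts in timeouts:
        if previous is None or ts - previous > gap:
            starts.append(ts)
        previous = ts
    return starts


def intervals_minutes(starts):
    """Minutes between consecutive outage starts."""
    return [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]


# Made-up sample data: three outages, each with four failed pings 10 s apart.
base = datetime(2008, 2, 16, 18, 0, 0)
sample = []
for n in range(3):
    start = base + timedelta(minutes=33 * n)
    sample += [start + timedelta(seconds=10 * i) for i in range(4)]

print(intervals_minutes(outage_starts(sample)))  # [33.0, 33.0]
```

If the numbers in that list all land on the same value, you are probably looking at something on a timer rather than a random fault.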

Now I’m very intrigued about our problem, and so I decide to go on-site to look at our cable modem and firewall.  I’m figuring that SURELY our modem is resetting itself or that there’s something that I’ll be able to see.  However, after sitting in the cold datacenter for 50 minutes and watching 2 more outages happen, it’s quite apparent that there’s nothing wrong with the cable modem.  It’s not flinching, and the traffic indicators are still flashing despite my inability to ping anything on the internet.

Well, that leaves us with 2 possible causes … faulty equipment directly outside our building, or a faulty network card in our pfSense box.  As unlikely as I think it is that our pfSense box would have a bad NIC (I just replaced it), it’s definitely not outside the realm of possibility.

So, for the last 40 minutes, my laptop has been connected directly to the cable modem in an attempt to rule out any faulty component on our network, and just as I started typing this paragraph, my pings began to time out and Uptime Scout started making noises at me.  From what I can see, whatever is puking every 33 minutes is not connected to our network or even in the building.

I called Time Warner back with this extra tidbit of information, and I must say that their tech support has been very nice to work with, even at 1:00am.



Filed Under (active directory, servers, troubleshooting) by Dave Mast on November-23-2007

Last week before I went on a mini vacation, I set up a new user account for a staffer.  I issued them their username and password, got them connected to Exchange, and everything seemed to be just hunky-dory.

This past Tuesday I got a call from my boss.  Apparently the user wasn’t able to receive email, although they could send it just fine.  I checked out the normal stuff… permissions, time sync across the servers, the usual.  Not knowing what to do yet, I went ahead and backed up the user’s Exchange data and deleted the mailbox with the intent of starting over.  After re-creating the mailbox, I was quite perplexed to find that I could no longer even connect to Exchange with this user’s account.

I’m not sure what led me to do it, but I remoted to both DCs to take a look at their AD structure side-by-side.  Imagine my surprise when I discovered that there were user accounts missing from our #2 DC.  A further look into the event logs shows that replication between the 2 DCs has been stopped due to a bad computer account.  Because of this, not only is Active Directory broken on this DC, but DNS services (which are relying on AD) are broken as well.  Digging further into the event log, I find that the system is getting hardware errors while attempting to write to the hard disk, which is what corrupted the computer account responsible for shutting down AD replication.

Fast-forward a little to Black Friday.  I don’t shop on Black Friday.  Ever.  However, I am looking over the hard disk that our 2nd DC runs off of.  A disk scan is turning up massive amounts of physical errors on this drive, and although I’d like to try ghosting the system onto a new disk, there’s a good chance that I am going to be “ripping out” this system from the domain with a little help from the MS Knowledge Base.  It’s a little frustrating to be repairing this domain with a rebuild happening at the same time, but I don’t want to take any chances.

We’ll see how this works out.



Filed Under (troubleshooting, windows) by Dave Mast on September-23-2007

It’s now 11:12am, and I just walked back to my desk to check something. It now appears that DC #2 as well as all our PCs on the network are now 1/2 hour off… the other way!

I’m going to add the syncing program to DC #2 and see if that doesn’t resolve the issue.

This is too weird.

Update: After installing TimeSync on the second DC, things seem to be back on track at least for now. Hopefully they will stay that way. :-)

Another Update:  I actually watched DC #1 fall out of sync by 2 hours a little later in the afternoon.  Not knowing what else to do, I went ahead and rebooted both DCs simultaneously.  It’s been about 24 hours now and I haven’t seen any clock skews as of yet.  Isolated incident?  We’ll see…



Filed Under (troubleshooting, windows) by Dave Mast on September-23-2007

Yesterday I came in to install updates on all our servers.  I updated every server except our #2 DC.  No particular reason, I just forgot.

I walked into the building this morning and sat down at my desk only to notice that my computer clock (along with everyone else’s) is exactly 1/2 hour off GMT-5 time.  The only clock that isn’t off is the one domain controller that syncs its time externally, which is not the aforementioned #2 DC.
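If you want to put a number on that kind of skew, one option is to compare the local clock against an SNTP reply. The sketch below is a rough Python illustration, not what Windows time sync or TimeSync does; the function names are made up, and it ignores network delay entirely. The only protocol facts it relies on are that an SNTP packet is 48 bytes, that the server's transmit timestamp sits in bytes 40 through 47, and that NTP counts seconds from 1900 rather than 1970.

```python
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\0"


def server_unix_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("SNTP reply must be at least 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def skew_seconds(local_unix_time, packet):
    """Positive result means the local clock is ahead of the server."""
    return local_unix_time - server_unix_time(packet)


# Synthetic reply for illustration: pretend the server says `true_time`
# while the local clock is running 30 minutes fast.
true_time = 1190557920  # arbitrary Unix timestamp
fake_reply = 40 * b"\0" + struct.pack("!II", true_time + NTP_EPOCH_OFFSET, 0)
print(skew_seconds(true_time + 1800, fake_reply))  # 1800.0
```

To check a real machine, you would send `build_sntp_request()` to a time server over UDP port 123 and pass the reply, along with `time.time()`, to `skew_seconds`.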

After installing updates on the forgotten DC and rebooting, everything seemed to go back to normal.  In fact, all of the PCs on the network synced back to the correct time almost instantly.

Aside from “Patch your Domain Controllers simultaneously,” I don’t know what the moral of this story is, but it’s definitely an interesting issue.



Filed Under (macs, raid, storage, troubleshooting, video) by Dave Mast on June-6-2007

My adventures with the Norco disk system have continued throughout the day, and after some more tinkering and some software-aided intervention, our storage array for the editor seems to be back up and running.

Apple’s Disk Utility program failed to do any sort of repair work on the array.  After many repair attempts, all I could get were directory errors.  After browsing around for some Mac disk repair utilities, I landed on one that showed promise:  DiskWarrior by Alsoft.

I fired up DiskWarrior, pointed it to our now-unmounted disk array, and watched it go to work.  It identified the array and cleaned up the directory structure, various file attributes, and various other things, and in about 5 minutes the array was back online.  SWEET!  This was WELL worth the price tag (about $80).

Thinking things were back to normal, I went ahead and shut everything down so I could install a UPS in front of the Mac and the resurrected RAID box.  After plugging everything in, I turned on the Norco drives, powered the Mac up, and watched in utter astonishment….as the system failed to recognize five of the twelve installed drives.

At this point I was about at my limit with this RAID array.  Not really knowing what else to do, I went ahead and shut both the disks and the Mac completely down, and then restarted the RAID after about a minute.  I let the disk system run for about 2 minutes before powering up the Mac.  My thought on this was that if the backplane or anything else in the RAID box has to run a POST or anything, I’m going to give it plenty of time to do so before restarting the computer.

I pushed the power button on the Mac, and lo and behold, I heard many drives starting to spin up all at one time.  In a few seconds, all 12 drives had spun up, and the RAID seemed to be back on its feet.  A quick look through the Finder revealed that all of our stuff on the array appeared to be intact.

So is this over?  I really don’t know.  I really haven’t felt like testing to see if I can replicate the issue by starting the Mac “too soon” after the RAID array is powered up.  What I am going to do is exchange our eSATA controller card for one that has been tested and is more compatible with the unit.  If that clears things up, then I’ll feel more confident about marking this down as a hardware issue.  At this point, it makes sense that 2 different chipsets wouldn’t play well together.  But at the same time, you’d think there would be more consistency to it.

Maybe the constant here is just sheer unreliability.  Time will tell.  Until then, I’ll be copying our FCP project files onto a safer hard drive.



Filed Under (macs, power, raid, storage, troubleshooting, video) by Dave Mast on June-6-2007

Well, after playing around with the RAID array throughout the wee hours of the morning, it’s pretty apparent that something went seriously wrong.  A massive power flux?  A dicey hard drive?  I really don’t know at this point.  S.M.A.R.T. status on all the drives shows that they’re running just fine.  So far 2 ideas are floating around in my head:

  1. The system suffered a massive power fluctuation that totally ticked off the Mac, or the RAID unit, or both.
  2. There is a major compatibility problem with the RAID unit and the HighPoint Technology card that I had to use in place of the bundled controller card.  The only thing I can think of is that there might be 2 different chipsets between the RAID unit and the controller that don’t like each other at all.

Either way, I’m glad this problem decided to rear its head NOW instead of later, when we’ve got the drive populated with irreplaceable data.

Speaking of which, it just so happens that most of the files that were on that RAID5 array are still sitting in other areas!  THAT is letting me breathe so much easier right now.  However, there were quite a few Final Cut project files that were only on that array, which is still a bummer.  I’m in the process of looking through data recovery software to see if there’s anything decent that I can try.

In the meantime, I’ve got a UPS set to hook into that system immediately.  Plus, if I can’t make the array work after another rebuild, I’m going to set a separate PC up there and connect it to the Mac Pro.  Since the PC has a PCI slot on it, I can use the Norco’s bundled controller card.  I’m not exactly thrilled about putting a PC up there JUST to act as a bridge between the editing system and the RAID, but I may find that I have no choice.

More updates as the plot unfolds.



Filed Under (raid, storage, troubleshooting) by Dave Mast on June-5-2007

This morning started with a phone call from Jeff in the editing room.  He was having trouble getting Final Cut to recognize our Sony MiniDV deck over FireWire.

I head upstairs and we begin tinkering around with it.  After a few unsuccessful attempts, we decide to reboot the machine and start from scratch.  No big deal, right?

Well, after rebooting, I get an error on the desktop saying that a disk is unreadable by OS X.  I’m used to seeing these when I insert DVDs, but the only thing in the DVD drive was an audio disc.  Then I noticed that the RAID5 array that we just put into production at the end of last week is missing from the desktop.  Oh no…this is not good.  This drive was working just fine with no signs of errors.  Now I can see it, but I can’t mount it.  Even a look at the RAID controller shows that there are no errors on the array.

As soon as I attempt to run a verify operation though, the event log shows that there are inconsistencies on the array, and the controller starts a rebuild.  For the moment, I was slightly relieved.  The array will rebuild and I’ll be able to remount the drive with no problems, right?

Well, here I am 12 hours later (rebuilding a 6.75TB array takes a LONG time!).  I’ve been at home monitoring the rebuild, and it finished just a few minutes ago.  After holding my breath and attempting to remount the array in OS X, I was a little surprised (and very disappointed) to get nothing but errors after attempting a verify and then a repair.

So now I begin the task of looking for a disk repair utility for the Mac.  I’m not sure if it will do any good, but I’m rapidly running out of ideas.

It’s going to be a long night.



DST
Filed Under (IT, troubleshooting) by Dave Mast on March-13-2007

With everything going on last week in the land of video, I had this one thought in the back of my mind… it was like a splinter, causing me to cringe out of nowhere and sometimes causing severe discomfort.

That thought?  DST updates.

I don’t think it could’ve happened on a more hectic week.  What with 10 video pieces to crank out, all airing on or BEFORE Sunday, I was really wondering “where on earth is this going to fit in?”  I already had plans to set up WSUS, but I wasn’t sure how that was going to work out, and after reading all the KB articles on the Exchange Calendar Tool, I was about sick to my stomach wondering how this was all going to come together.

I decided that we would go the WSUS route to get our updates.  I set up a VM, and ended up taking the host machine to my house to download the updates (yes, I have MORE bandwidth at home than I do at the office….for now).  After I downloaded what we needed (and then some), I took the server back to the office, reconnected it, and set a GPO to point all of our computers at the new WSUS server, now loaded with every critical and security update we would possibly need at the time.  After doing some random spot checks, it appeared that on Friday, every last server and PC (except our lonely Win2k box in the control room), had received the DST update and was ready to roll.

*whew*… ok.  So far, so good.  Now, it was time to update Exchange and all of our calendars.  The Exchange patch installed with no issues, but I had MAJOR issues trying to update everyone’s calendar using Exchange Calendar Update.  I decided that since there were only about 30 calendars to update, and since everyone worked in the same office area, I would load up the Outlook calendar update onto a flash drive and spend my Monday visiting everyone and updating their calendar.  Was this the most desirable course of action?  No, not by a long shot.  I’m all about remote administration, so this was not in my Top 10 List of Things I Wanna Do On Monday.  However, it worked out very well.  The update tool worked like a charm, and by the end of the day, everyone’s calendar was back in order.

Was this smooth?  To me, yes.  The process went MUCH smoother than it could have.  It amazes me sometimes how God shows up on the scene when we humans think that we’re pretty much hosed.  I honestly don’t know HOW the update process went as smoothly as it did… it just did.  I’m amazed, but really, should I be so surprised?  No… that’s just how God works.  I just kinda stand there with my jaw dropped.



