Tuesday, April 04, 2006

Computer Fun

Warning: No questionably amusing anecdotes in today's post. Just nerdy computer stuff.

I'm not sure that I've mentioned it on the blog, but my computer has been blue screening on startup. It flashes a blue screen for less than a second and then reboots itself. The message is physically impossible to read, because the monitor takes longer to switch modes than the blue screen stays up, so all you see is a twisting blue color and then you're back at the BIOS.

This doesn't happen on every startup. Sometimes when it does happen, a few reboots will do the trick and the computer eventually starts up. Sometimes I have to boot to the recovery console and run chkdsk on the drive. I've never had it fail to start after running chkdsk (which usually finds a few problems and fixes them).

I've run full chkdsk /f /r several times and found no bad blocks on the hard drive.

I've uninstalled a number of programs that have been installed over the last few months thinking one might be the culprit. The last program to go was Alcohol 120%, and I hadn't seen the problem for a week or two, until yesterday.

Yesterday morning the computer wouldn't boot up, but in the afternoon when I came home it would. I ran chkdsk /f, and the computer had some trouble restarting to the point where chkdsk could run, but it eventually worked. chkdsk found some problems, fixed them, and also complained about a failure trying to read a group of blocks (a fairly large number of blocks, actually). It completed whatever it was doing, didn't change the number of bad blocks, and dumped some hex information about something to the screen.

The computer booted up just fine after that, and I ran sfc /scannow, which I don't think found any problems.

So... where does that leave me?

I could have a bad driver or some other problem in Windows that is causing problems on startup, and ends up causing some minor filesystem corruption as a result. A fresh installation of Windows would probably fix this.

I could have a hard drive that is failing. I don't think it is failing in the 'no longer magnetic' way, since no blocks show up as bad when chkdsk scans them. But it could have some sort of circuit board problem that causes it to return bad data sometimes. That bad data could cause the blue screen on reboot. Replacing the hard drive would solve this.

I could have a bad SATA chip on the motherboard that causes reads/writes to the hard drive to fail intermittently. This would show the same symptoms as above, but through no fault of the hard drive. Replacing the motherboard would solve this.

Fixing any of these involves a fairly significant investment of time or money (or both). In the case of the motherboard, it might lead me to spend many hundreds of dollars upgrading my system at the same time, since I would probably be reluctant to buy a fancy new motherboard without also getting a new CPU and a new video card (my existing card is AGP).

I'm worried about the failing hard drive theory, because I have a bunch of stuff on that drive that doesn't have reasonable backups, and which I can't really back up reliably anymore because, for all I know, all those videos of our wedding have bad blocks in the middle now. I also don't have enough space to copy all that data anywhere else for a format/re-install.
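One way to at least find out whether those files can still be read end to end would be a small script that opens every file under a folder and reads it all the way through, noting any file the operating system refuses to return. The following is only a minimal sketch in Python, not anything I have run on this machine; the function name `find_unreadable_files` and the chunk size are made up for illustration. It also has a real limit: a clean read only shows that the drive handed back data without reporting an error, not that the data is correct.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every file under `root` from start to finish.

    Returns a list of (path, error message) pairs for files that raised
    an OS-level error while being opened or read. The name and the
    1 MiB chunk size are illustrative assumptions.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large videos don't have to fit in memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                # Record the failure and keep going with the next file.
                failures.append((path, str(exc)))
    return failures
```

Calling it on the folder holding the pictures and videos (for example `find_unreadable_files("D:/Videos")`, where the path is just a placeholder) would return an empty list if everything could be read, or the list of files to worry about otherwise.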

Buying a new hard drive is a relatively cheap solution, and would solve the 'no space for temporary storage during a new installation' problem. But I am worried that it could be a problem with the motherboard that would continue to persist after the swap out, causing me to waste money on a hard drive I don't (really) need.

The reason the motherboard popped into my head this morning as a potential culprit is that I replaced some of the fans with Zalman heatsinks and fans, and I used to occasionally get heat warnings while playing Civ 4. It could be that I had some heat problems that have now screwed up something on the motherboard.

On the other hand, it doesn't make much sense that anything related to heat would show up only on startup, especially in the morning when the house is relatively cold. Also, if it were the motherboard, wouldn't problems show up with at least the other SATA hard drive in the system?

So, am I missing something obvious here?

I pretty much have to do something, as both Linzy and I are tired of not being able to count on the computer starting. I also don't really want to lose any of the pictures and videos that exist only on the hard drive that may or may not be failing.

At this point I am leaning towards the buy-a-new-hard-drive plan. I'll do a re-install of Windows at the same time, killing two birds with one stone. Plus I would get another 100 GB of storage out of the deal and a drive with NCQ and faster transfer rates (though I'm not sure my motherboard supports these). But I'm open to suggestions.

4 comments:

Bill Roehl said...

Could it be flaky/failing RAM? That's an easy test: take out what you have one stick at a time (if you only have one, get a 128MB DIMM from somewhere else and test with that) and see if it stops the blue screening.

Anonymous said...

I guess that I am wondering why you even think the problem is a hardware problem. A blue screen error is a software problem. Hardware problems usually present themselves in the BIOS before the computer even reads information off the hard drive. As for your comment that you don't want to waste money on a drive that you don't need: if you do not have the disk space to perform a format/reload, you do need the drive.

Steve Eck said...

Bill, I hadn't really considered the RAM, since the problem only occurs during startup rather than randomly during operation. But that is a good point; I might have to get a memory checker program or something.

Anonymous, I agree that blue screens are a software issue, but they can certainly be caused by hardware problems. Take, for example, a hard drive that goes south and ends up with a bad block in the middle of a DLL. The result is an instant blue screen when code tries to call into that section of the DLL, and/or a failure to load the DLL at all.

Bill Roehl said...

Last time I had RAM go south I was getting freezes on boot as well as during operation, but I was running Linux.

I haven't had any experience with Windows and bad RAM.

Good luck.