Monday, July 7, 2008

Frying Eggs on your MacBook Pro...

So, I'd been having problems with the MacBook Pro recently. It'd work fine, stable as a rock, when I'd run OS X. When I'd reboot to Windows XP SP3, *usually* it'd be ok, but when I did large network transfers or tried to play a game it'd lock up almost every time. When I say lock up, I mean total hang - no mouse, audio card buffer stuck, no key response except the power button, etc. No blue screen either, so no hints. Frustrating....

I assumed it was software at first - probably drivers - mainly because OS X was fine, but XP wasn't (although XP running in a virtual machine under VMware Fusion did just fine). Rolled XP SP3 back to SP2, disabled various pieces of hardware (especially network, since it seemed related to large transfers), changed drivers, the works. Some things seemed to reduce the problem, but nothing actually fixed it.

While trying to find new 8600M GT drivers (wild goose chase btw - NVIDIA doesn't want to release reference drivers apparently), I grabbed NVIDIA's nTune software, figuring I might check out its diagnostics. Painful stuff to use, but you can get it to do a nice Task Manager-style display of CPU, disk, memory, and GPU temperatures.

Noticed that the crashes tended to happen while it was running a bit warm (~85 - 91C) on the GPU... Ugh. I hate heat problems. I'm so not an overclocker....

Found a program called Input Remapper that allows manual control of the fans - turned 'em all full, underclocked the GPU with nTune - much better, but still occasional crashes during gaming.

Took off the aftermarket plastic cover I had put on the MacBook Pro to protect its finish (my last Mac, a Titanium G4, scratched *extremely* easily and really looked like hell after a year or two, so I had put these on). Then propped up the back of the MacBook to give it some air under it, and set the fans to run at about 80% minimum.

The idle temperature on the GPU has gone from ~65C to about 57C, and the max under load has gone from about 91C down to about 85C. And no crashes. I don't think it's the NVIDIA card objecting to the heat and crashing, but their diags definitely point to crappy cooling in the system, and the covers making it worse. Put the clocks on the GPU back up to normal - little bit warmer, still no crashes.
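For anyone who wants to compare before/after readings a bit more methodically than eyeballing a graph, here's a minimal Python sketch. It assumes you've somehow exported GPU temperature samples (in C) into a plain list; the sample values, the `summarize` helper, and the 85C warning threshold are all made up for illustration (the threshold is just the low end of the range where the crashes tended to show up), and taking the lowest reading as "idle" is a rough proxy, nothing more.

```python
def summarize(samples_c, warn_at_c=85.0):
    """Summarize a list of GPU temperature samples (degrees C).

    Returns the lowest reading (a rough stand-in for idle), the peak,
    and the fraction of samples at or above warn_at_c.
    """
    if not samples_c:
        raise ValueError("no samples")
    hot = sum(1 for t in samples_c if t >= warn_at_c)
    return {
        "idle_c": min(samples_c),
        "peak_c": max(samples_c),
        "hot_fraction": hot / len(samples_c),
    }


# Hypothetical readings, loosely shaped like the idle/peak numbers above.
before = [65, 66, 65, 78, 88, 91, 90, 67, 65]
after = [57, 58, 57, 70, 82, 85, 84, 58, 57]

for label, samples in (("before", before), ("after", after)):
    s = summarize(samples)
    print(f"{label}: idle {s['idle_c']}C, peak {s['peak_c']}C, "
          f"{s['hot_fraction']:.0%} of samples at or above 85C")
```

The point is less the numbers than having the same three figures for each configuration, so a change like removing the covers or raising the fan floor can be judged on more than a single glance at the display.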

Somewhere during all of this I came across a few articles. 1 2.

Ugh. Apparently some Genius at Apple thinks thermal paste is a magic cooling compound that chips should bathe in, rather than just a thin heat-conductive layer to help transmit heat from the chips to the heatsinks.

I really don't want to void my warranty, but I loathe handing my development machines to any local retail tech shop (and when I do have to, it's sans hard drives), whether they're called 'Geniuses', 'Geeks', 'Nerds', or whatever. Guess I'll order some thermal paste and fix it sometime when I'm bored. Raising it up, removing the covers, and blasting the fans has made it stable so far though, so hopefully I can make do for now.

And why was it stable under Mac OS X? Probably mostly behavioural on my part. Mac games are.... rare. I also think that the stock drivers for running the fan are tweaked differently between OS X and XP for the MacBook Pro, although I haven't verified this.

In other fun "I hate computer hardware" news, my wife's PC was also constantly crashing recently. Hers at least had more direct clues - sometimes the BIOS couldn't find the SATA drives, or it would get stuck detecting IDE drives (on an EVGA nForce 590 SLI AM2 motherboard, running two SATA drives in RAID-0 and the DVD-RW). After several attempts at diagnosing the problem, which went away temporarily each time only to come back later, I finally tracked it down to a $25 ASUS Lightscribe DVD-RW (DRW-1814BLT, Feb 07 manufacture date). Not sure how/why it hoses the system so badly, but it does. Replaced it with a Pioneer DVD-RW that cost $30, and no problems at all anymore. Bizarre.

Previously, the hard drive access on that machine had been periodically HORRIBLY slow, and I had tracked that problem down to the ASUS DVD-RW as well. That problem had gone away when I updated the firmware on the drive (and I presume that was when the nasty crashing problem was first introduced).

