So I am borrowing this title from Mark Russinovich because this one took a while to figure out. I am writing this down mostly for my own records, but perhaps someone may find value in this encounter.
Rewind to SQL Saturday 285 Atlanta 2014. At the end of the session, I get distracted and close the lid on my Lenovo X230 Tablet without turning the system off. Put the system in the backpack. For most, that’s not a problem, but I set Power Options to Do Nothing when closing the lid (this has to do with fingerprint reader access while docked under a monitor stand). The next day, I get the laptop out and turn it on. No power. Makes sense, I didn’t turn it off so it drained the battery and eventually shut down. Connect to power and the battery charge light blinks orange (it either means “really empty” or “bad battery” – the former in my case).
Then I swipe my finger, which normally turns on the system, performs pre-boot authentication, boots into Windows and logs me on automatically. No response from the fingerprint reader. It was not a bad swipe, because then the light would blink orange. Try again, to no avail. Still no lights. Turn the system on manually. It asks for the power-on password – but still no finger swipe accepted. I type in the power-on password. It continues to boot. In Windows, no option to log on with the fingerprint reader. Log on with password and get to work.
By this time, I am suspecting that the fingerprint reader died. I have several conferences and personal travel trips coming up, so I figure I’ll deal with it later. I can live without it for a few weeks. I happily use the tablet for a whole week. Then I get to Houston for TechEd North America 2014. I receive an e-mail from our backup system that the tape drive needs cleaning. No problem, I’ll log on to the VPN, move a cleaning tape from the library to the drive and back and move on. I fire up the VPN client, my password manager and attempt to connect. All of a sudden, after entering the password, my system freezes. No LBSOD (light-blue screen of death, in Windows 8), no response to power, etc.
About two weeks earlier, our IT group had asked me to test the VPN on Windows 8.1 because other users had reported stability issues. It worked fine for me, but my thoughts went back to that. I thought perhaps connecting to the VPN caused the freeze. Pull power cable, pull battery, turn it back on.
Then I get the dreaded HDD0 not found BIOS message. Uh oh. Now I am in real trouble. Or maybe not. Maybe my backpack got bumped too hard and something inside came loose. Like any good boy scout (even though I only made it two years) I come prepared. I open up the laptop, remove the SSD, inspect the connectors and put it back. Same result…
So, now I am faced with a broken SSD. But it’s a nearly-new 1 TB Samsung 840 EVO, supposedly the most reliable SSD ever built? My thoughts are racing to the work I might have lost building some VMs for an upcoming talk, etc. I decide to hook up the SSD to my Surface Pro tablet (thank you Microsoft for the full size USB 3 port!). It is recognized immediately and works fine. I am still considering the possibility that perhaps the drive is about to fail, so I don’t leave it connected for too long. My Surface Pro doesn’t have the kind of storage needed to back up what I need to safeguard. And while I have a 1.5 TB external drive with me, I don’t have a USB 3 hub that would allow me to transfer that kind of data in any reasonable amount of time.
I even try to boot the system from a USB connection thinking if the fingerprint reader went bad, why couldn’t the SATA connector have gone bad?
I ask one of my IT helpdesk techs to overnight a Windows 8.1 USB installation disk and a new SSD. That will allow me to see if the laptop is dead, and if not, install a new OS and recover my critical data that wasn’t backed up yet. In between, I take a look at the BIOS of the machine. Everything seems in order. I am thinking about the possibility that this dead fingerprint reader has something to do with it – but why now?
Turns out, I am having a lot of trouble accessing the BIOS. Sometimes I can get in, sometimes not. It seems to work more reliably to access the BIOS with the 1 TB SSD removed. I am hunting in the BIOS for a way to turn off the fingerprint reader, but can’t find it (it exists, I just couldn’t find it). While in the BIOS, I realize that I never enabled UEFI Secure Boot and make a note to make sure to do that with the next OS install.
When the spare drive comes in, I mount it, install the OS. It goes off without a hitch. So now I am pretty convinced that the SSD is bad. After installing the OS, I hook up the “bad” SSD and my 1.5 TB HDD and start copying data. I might as well get everything off as long as I am copying data, so I end up copying about 150 GB. The SSD performs great. How is that possible?
I decide to find a tool that can read S.M.A.R.T. data over the USB bus. I find one, and it reports no issues at all.
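For anyone who wants to repeat that kind of check, here is a minimal sketch in Python. It is not the tool I used. It assumes smartmontools is installed and that the USB enclosure passes SMART commands through (not all USB bridges do). The `-H` flag asks `smartctl` for the overall health verdict, and `-d sat` tells it to use SCSI-to-ATA translation, which is what many USB-to-SATA adapters need. The function names and the device path are hypothetical.

```python
import re
import subprocess
from typing import Optional

# The verdict line smartctl prints for ATA drives when called with -H.
_HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\w+)"
)


def parse_health(output: str) -> Optional[bool]:
    """Return True for PASSED, False for any other verdict,
    and None when no verdict line is present at all."""
    match = _HEALTH_RE.search(output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


def read_health(device: str) -> Optional[bool]:
    """Run `smartctl -H -d sat <device>` and parse the verdict.

    `device` is whatever name smartctl uses for the drive on your
    system (hypothetical example: "/dev/sdb").
    """
    result = subprocess.run(
        ["smartctl", "-H", "-d", "sat", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bits
    )
    return parse_health(result.stdout)
```

A `None` result is worth treating differently from `False`: it usually means the adapter did not pass the command through, not that the drive is unhealthy.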
Now, I decide to take things a step further. Back to the BIOS, where I reset all security data, including TPM keys and secure boot keys (even though Secure Boot is disabled, the keys exist), and I finally manage to find the setting to disable the fingerprint reader. (Note the new OS on the new SSD didn’t mind the faulty fingerprint reader.) One more attempt at mounting the “bad” SSD. IT WORKS!
Boot into Windows, no problems. So what happened? I call upon an old friend, the Reliability report. My reliability index tanked after SQL Saturday…because the Windows Biometric Service kept crashing. What I realize now is that bad hardware, or a bad connection to the fingerprint reader, made the system keep “finding” the reader, then losing it, then finding it again. This presumably caused the Windows service to go nuts and crash – many times that week. I just never noticed it.
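Had I tallied those crashes per day, the pattern would have jumped out much sooner. A minimal sketch of that tally, assuming the failure entries have already been exported somehow into `(date, source)` pairs; the export step, the record shape and the function name are all my assumptions, not something the Reliability report hands you in this form.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def crashes_per_day(
    records: Iterable[Tuple[date, str]], source: str
) -> Counter:
    """Count failure records per calendar day for one source.

    `records` is a hypothetical export: one (day, source name) pair
    per failure entry. Only entries whose source name matches
    `source` exactly are counted.
    """
    counts: Counter = Counter()
    for day, name in records:
        if name == source:
            counts[day] += 1
    return counts
```

Sorting the result by date and looking for the first day with a non-zero count is a quick way to line up the start of the crashes with whatever happened to the machine that day.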
What caused the system to finally freeze may or may not have been related to the VPN connection. I also don’t quite understand why it wouldn’t let me reboot anymore until I cleared the TPM keys and/or disabled the fingerprint reader. Clearly, one or both of those steps did the trick.
For my upcoming conference trips, I’ll be sure to take a spare laptop, just in case I have still misdiagnosed the issue or additional hardware fails.
Conclusion
My best guess at this time is that after SQL Saturday, my laptop got really hot in my bag. That caused an issue, fortunately only with the fingerprint reader, which subsequently caused the system freeze.