I just replaced the primary hard disk in this system. Here’s the story:
I began to notice weeks ago that my system seemed a bit sluggish. This server is a few years old and probably underconfigured for what it is being asked to do. I’m constantly piling more stuff on it, and for the most part it hums along without complaint. I attributed the periodic sluggishness to insufficient memory and the associated swapping.
A few days BC (before crash) I decided there was something more to it than swapping. I’m not sure exactly why. I think it had to do with one specific operation that I was doing over and over again. With little else running on the system, it didn’t make sense to blame the slowness on swapping: there was no time or reason for anything to be swapped out. I also think I noticed that some operations would slow down in the middle; something would load partway quickly, then get really slow.
I started looking for a problem. Nothing new in the system logs that I noticed. I realize this is all very vague, but I decided it was the disk and that the disk was having trouble. I started trying to monitor the disk health.
I installed S.M.A.R.T. Monitoring Tools (smartmontools). SMART is a standard for reporting the health of a disk. Disks are pretty, ah, smart these days, and they track lots of things that are related to their health. The tools interact with the disk to report and control this monitoring.
The quick SMART checks reported some problems but concluded, I think, that the disk was OK. I asked the disk to perform the more comprehensive tests, which apparently take 30 minutes or more. They never finished; every test was aborted. I tried this a few times and never got a complete test. The system reported that the test was canceled on request. I did not cancel the tests and I haven’t figured out what did. Some have reported that a system going to sleep will cancel a test, though my system is not configured to sleep.
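For anyone wanting to repeat these checks, here is a minimal sketch of the three steps above (quick health verdict, extended self-test, self-test log) wrapped in Python. It assumes the smartmontools command-line tool `smartctl` is installed and that the script runs with enough privilege to talk to the disk. The device path `/dev/sda` and the helper names are mine, for illustration only; this is not what I actually typed at the time.

```python
import shutil
import subprocess


def smartctl_commands(device):
    """Return smartctl invocations for a basic health check of `device`."""
    return {
        # Overall health verdict as judged by the drive itself.
        "health": ["smartctl", "-H", device],
        # Start the extended (long) self-test; it runs inside the drive.
        "long_test": ["smartctl", "-t", "long", device],
        # Show the log of past self-tests, including aborted ones.
        "test_log": ["smartctl", "-l", "selftest", device],
    }


def run_step(device, step):
    """Run one step and return its text output, or None if smartctl is missing."""
    if shutil.which("smartctl") is None:
        return None
    cmd = smartctl_commands(device)[step]
    # smartctl uses its exit status as a bitmask of findings, so a
    # nonzero status is not treated as a failure of the command here.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # "/dev/sda" is an assumed device path; substitute your own.
    print(run_step("/dev/sda", "health"))
```

The extended test only starts from `long_test`; its outcome has to be read back later with `test_log`, which is where an aborted run like mine would show up.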
I spent days, perhaps a week, messing with this. I thought I had time. At some point I decided that my disk was hurting and that I needed to replace it, so I bought a new drive and brought it home.
It wasn’t till then that I decided to make a backup.
This was a crucial mistake. What I perceived to be a very gradual slide into a non-optimal but still working state was really a drive plunging off the cliff of its life.
An aside: I may have contributed. Did I accelerate the destruction of this drive? Often you (read “I”) aggravate the situation more than you (I) help. I’m an amateur administrator of these systems and I don’t see failures like this very often, so each time I’m learning or relearning what to do, because the last failure happened so long ago. When the drive was not working well I tried to “tune it up” with hdparm. I tried several settings but none moved the drive performance above “abysmal”.
Of course the backup failed. Now the drive was reporting consistency problems. Files that appeared to be there were unreadable. I made several attempts to salvage pieces and had some success. Unfortunately there were recent files that were not readable.
I’ve put the system back together from a combination of a six-month-old full backup, fragments that I was able to salvage from the failing drive, and various copies from other systems.
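The salvage step can be sketched as a copy that keeps going when a file turns out to be unreadable, and records the casualties instead of stopping. This is a rough illustration, not the procedure I followed; the function name `salvage_tree` and both directory arguments are hypothetical. On a drive that is actively dying, a dedicated recovery tool is likely a better choice, since every extra read may make things worse.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Returns the list of source paths that could not be copied.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same relative directory layout under dst_root.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also preserves timestamps where it can.
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError:
                # Read errors from a failing disk surface as OSError;
                # note the file and move on to the next one.
                failed.append(src)
    return failed
```

The returned list doubles as a to-do list: those are the files to go hunting for in old backups and on other systems.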
* * *