Entries from September 2006

The myth of the Gutmann method

Most people know that when you sell or give away a computer, you should format its hard drive to make sure your sensitive information doesn’t fall into the wrong hands. And most of those people know that formatting a drive doesn’t actually erase all the data. Instead, you should use a special utility that overwrites every block of data on the drive. And a smaller portion of those people know that overwriting a block just once isn’t enough. If you really want to be safe, you should apply the Gutmann method, which overwrites every data block 35 times. And an even smaller portion of those people know that the Gutmann method is a myth.

I had always thought that the idea of overwriting the same data block 35 times was a bit dubious. (Why would 35 times be secure but 34 times not be?) And yet, most disk utilities provide an option to erase a hard drive 35 times over.

[Screenshot: Disk Utility erase options]

In a recent Slashdot article, I discovered a comment that shed some light on this issue. User Psionicist wrote (with typos faithfully reproduced):

I would like to take the oppertunity here to debunk a very common myth regarding hard drive erasure.

You DO NOT have to overwrite a file 35 times to be “safe”. This number originates from a misunderstanding of a paper about secure file erasure, written by Gutmann.

The 35 patterns/passes in the table in the paper are for all different hard disk encodings used in the 90:s. A single drive only use one type of encoding, so the extra passes for another encoding has no effect at all. The 35 passes are maybe useful for drives where the encoding is unknown though.

For new 2000-era drives, simply overwriting with random bytes is sufficient.

Here’s an epilogue by Gutmann for the original paper:

Epilogue

In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don’t understand that statement, re-read the paper). If you’re using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, “A good scrubbing with random data will do about as well as can be expected”. This was true in 1996, and is still true now.

Looking at this from the other point of view, with the ever-increasing data density on disk platters and a corresponding reduction in feature size and use of exotic techniques to record data on the medium, it’s unlikely that anything can be recovered from any recent drive except perhaps one or two levels via basic error-cancelling techniques. In particular the drives in use at the time that this paper was originally written have mostly fallen out of use, so the methods that applied specifically to the older, lower-density technology don’t apply any more. Conversely, with modern high-density drives, even if you’ve got 10KB of sensitive data on a drive and can’t erase it with 100% certainty, the chances of an adversary being able to find the erased traces of that 10KB in 80GB of other erased traces are close to zero.

So it seems my suspicions have been confirmed: You do not need to erase a hard drive 35 times before selling it on eBay. A quick zeroing out of the data is sufficient.
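
If you want to do that single random pass yourself rather than through Disk Utility, a few lines of Python will do it. This is only a minimal sketch: it assumes the target is a Linux block device (say, /dev/sdb) that has been unmounted, or an ordinary file, and that seeking to the end reports the size, which holds for both. Double-check the path before running anything like this.

    import os
    import sys

    def random_scrub(path, block_size=1024 * 1024):
        # One pass of random data over the whole device or file.
        with open(path, "r+b", buffering=0) as dev:
            size = dev.seek(0, os.SEEK_END)  # total size in bytes
            dev.seek(0)
            written = 0
            while written < size:
                chunk = os.urandom(min(block_size, size - written))
                written += dev.write(chunk)  # write() may be partial; count actual bytes
        print(f"overwrote {written // (1024 * 1024)} MiB with random data")

    if __name__ == "__main__":
        random_scrub(sys.argv[1])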

Top ten longest titles of research papers

One of the things you learn as a Ph.D. student is how to do research. Though I’m still far from mastering that particular lesson, there’s something I’ve discovered along the way: Academic researchers love coming up with long titles for their papers. In fact, a colleague’s recent Ph.D. thesis, whose title runs to 27 words, had me wondering, “Just how long do these titles get?” I decided to find out. I wrote a little script that scans the DBLP database and spits out the longest titles it finds (based on number of characters, not words); a rough sketch of such a script follows the list. Excluding non-English titles, here’s the top-ten list:

  1. In silico exploration of the fructose-6-phosphate phosphorylation step in glycolysis: genomic evidence of the coexistence of an atypical ATP-dependent along with a PPi-dependent phosphofructokinase in Propionibacterium freudenreichii subsp. shermanii
  2. A Comparative Study of Artificial Neural Networks Using Reinforcement Learning and Multidimensional Bayesian Classification Using Parzen Density Estimation for Identification of GC-EIMS Spectra of Partially Methylated Alditol Acetates on the World Wide Web
  3. Performance of empirical potentials (AMBER, CFF95, CVFF, CHARMM, OPLS, POLTEV), semiempirical quantum chemical methods (AM1, MNDO/M, PM3), and ab initio Hartree-Fock method for interaction of DNA bases: Comparison with nonempirical beyond Hartree-Fock results
  4. Joint quantum chemical and polarizable molecular mechanics investigation of formate complexes with penta- and hexahydrated Zn2+: Comparison between energetics of model bidentate, monodentate, and through-water Zn2+ binding modes and evaluation of nonadditivity effects
  5. A Simple Flexible Program for the Computational Analysis of Amyl Acyl Residue Distribution in Proteins: Application to the Distribution of Aromatic versus Aliphatic Hydrophobic Amino Acids in Transmembrane alpha-Helical Spanners of Integral Membrane Transport Proteins
  6. Three-Dimensional Quantitative Structure-Property Relationship (3D-QSPR) Models for Prediction of Thermodynamic Properties of Polychlorinated Biphenyls (PCBs): Enthalpies of Fusion and Their Application to Estimates of Enthalpies of Sublimation and Aqueous Solubilities
  7. WEB OBJECTS TIME: When Microsoft Started Speaking Like a Good Open-Standards Citizen, The Netscape Extensions Tail Tried to Wag The Dog and Object-Oriented Software Turned Static Web Pages Into Dynamically-Linked Access Boulevards to Significant Online Collection Databases
  8. Hydrogen bonding in diols and binary diol-water systems investigated using DFT methods. II. Calculated infrared OH-stretch frequencies, force constants, and NMR chemical shifts correlate with hydrogen bond geometry and electron density topology. A reevaluation of geometrical criteria for hydrogen bonding
  9. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and N-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases
  10. The nucleotide sequence of a 3.2 kb segment of mitochondrial maxicircle DNA from Crithidia fasciculata containing the gene for cytochrome oxidase subunit III, the N-terminal part of the apocytochrome b gene and a possible frameshift gene; further evidence for the use of unusual initiator triplets in trypanosome mitochondria
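
For the curious, here’s a rough sketch of the kind of script I mean (not the exact one I ran). It streams the public DBLP XML dump, assumed to be saved locally as dblp.xml, and keeps the ten longest titles by character count; the isascii() check is only a crude stand-in for filtering out non-English titles, and this bare-bones parser ignores the DTD that the real dump uses for some character entities.

    import heapq
    import xml.etree.ElementTree as ET

    RECORD_TAGS = {"article", "inproceedings", "incollection", "phdthesis"}

    def longest_titles(path="dblp.xml", top_n=10):
        heap = []  # min-heap of (length, title), capped at top_n entries
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag in RECORD_TAGS:
                title_elem = elem.find("title")
                if title_elem is not None:
                    # itertext() also picks up text nested in markup like <i> or <sub>
                    title = " ".join("".join(title_elem.itertext()).split())
                    if title and title.isascii():  # crude "English only" filter
                        item = (len(title), title)
                        if len(heap) < top_n:
                            heapq.heappush(heap, item)
                        else:
                            heapq.heappushpop(heap, item)
                elem.clear()  # keep memory use down on the huge dump
        return sorted(heap, reverse=True)

    if __name__ == "__main__":
        for length, title in longest_titles():
            print(f"{length:4d}  {title}")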

Of course, a trivia researcher’s work is never done. For future analysis, I’ll focus on papers with the highest number of authors. (I’ve already discovered a potential candidate.)

orbit burp

After discovering Spamusement, the website of cartoons inspired by actual spam, I began to notice that some of my junk mail would make pretty good cartoons. Though I’m no artist, I thought I’d try my hand at making some “spamusement” of my own. Here’s my first attempt:

[Cartoon: orbit burp]

Yes, I actually received some spam titled “orbit burp.” It was an ad for a penny stock, but the subject line was randomly generated, obviously.

I posted my drawing in the Spamusement forums, and it was surprisingly well-received!