[Noisebridge-discuss] Linux disk stress test & burn-in

Fri Jul 8 06:19:49 UTC 2011

On Thu, Jul 07, 2011 at 06:48:36PM -0700, Seth David Schoen wrote:
> http://www.coker.com.au/bonnie++/

I prefer fio(1).  http://linux.die.net/man/1/fio and
http://git.kernel.dk/?p=fio.git;a=summary

> It seems to me that trying to check for errors is very unlikely to
> find anything, because hard drives have extensive internal soft
> error correction.  The probability of an error that gets detected
> by the drive and reported to SMART must be _much_ higher than the
> probability of an error where bad data silently reaches the
> application.

Soft errors get reported in SMART as Hardware_ECC_Recovered,
Reallocated_Sector_Ct and Raw_Read_Error_Rate (though how to interpret
various vendors' Raw_Read_Error_Rate values is only documented in NDA
materials AFAIK).  You can also simply measure how long a given I/O took
to complete as a proxy for sector unreliability: a sector the drive has
to retry or ECC-correct internally takes noticeably longer to come back.
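
A rough sketch of the timing idea above, in Python.  This is not what
fio does internally, just an illustration; the names time_reads and
slow_blocks, the 1 MiB block size and the "10x the median" cutoff are
all assumptions I made up for the example.

```python
import time


def time_reads(path, block_size=1 << 20, max_blocks=None):
    """Read `path` sequentially and return a list of (offset, seconds) per block."""
    timings = []
    # buffering=0 turns off Python's own buffering; the OS page cache can
    # still hide slow sectors unless it is bypassed or dropped first.
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while max_blocks is None or len(timings) < max_blocks:
            start = time.perf_counter()
            data = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not data:
                break  # end of file or device
            timings.append((offset, elapsed))
            offset += len(data)
    return timings


def slow_blocks(timings, factor=10.0):
    """Return the (offset, seconds) entries slower than `factor` times the median."""
    if not timings:
        return []
    latencies = sorted(t for _, t in timings)
    median = latencies[len(latencies) // 2]
    return [(off, t) for off, t in timings if t > factor * median]
```

Something like slow_blocks(time_reads("/dev/sdX")) (as root, with
/dev/sdX standing in for the real device) would then list the offsets
worth a second look.  Reads served from the page cache will look fast
no matter what the disk is doing, so only a first pass over cold data
tells you much.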

I am pretty skeptical of the value of a burn-in cycle for disks; if you
care so much about your data, just buy enterprise spindles (and pay the
premium) or else use a software reliability layer above your cheap-ass
SATA storage.  But, y'know, shrug.  To each his own data reliability
model. :)

-andy