• For RAM, the soft error rate is typically measured in FIT ("Failures In Time") per megabit.

  • 1 FIT per megabit == 1 single-bit failure per megabit of RAM per billion hours of operation (at sea level in New York State; actually at IBM's Thomas J. Watson Research Center).

  • For a cluster with 200 gigabytes of RAM and an error rate of 600 FIT/megabit = an average of 1 single-bit error every hour (!).

  • For a cluster with 200 gigabytes of RAM and an error rate of 6 FIT/megabit = an average of 1 single-bit error roughly every 4 days.

  • Future trends in device susceptibility unclear.

  • If you have ECC, you are probably alright for a while.
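
The two cluster figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original material: the function name `mean_hours_between_errors` is made up for illustration, and it assumes 1 gigabyte = 1024 megabytes, 8 bits per byte, and that errors are independent and spread evenly across all of the RAM.

```python
# 1 FIT = 1 failure per 10^9 hours of operation.
HOURS_PER_FIT_UNIT = 1e9


def mean_hours_between_errors(ram_gigabytes, fit_per_megabit):
    """Average number of hours between single-bit errors across all RAM.

    Assumes 1 gigabyte = 1024 megabytes and 8 bits per byte (an assumption;
    using decimal gigabytes changes the result by only a few percent).
    """
    megabits = ram_gigabytes * 1024 * 8
    errors_per_hour = megabits * fit_per_megabit / HOURS_PER_FIT_UNIT
    return 1.0 / errors_per_hour


if __name__ == "__main__":
    for fit in (600, 6):
        hours = mean_hours_between_errors(200, fit)
        print(f"{fit} FIT/megabit: one error every {hours:.1f} hours "
              f"({hours / 24:.1f} days)")
```

For 200 gigabytes this gives about one error per hour at 600 FIT/megabit and about one every 4.2 days at 6 FIT/megabit, consistent with the figures quoted above.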