
Ephemera

JANUARY/FEBRUARY 2010 Copublished by the IEEE CS and the AIP 1521-9615/10/$26.00 © 2010 IEEE

From the Editors

The word “ephemera” derives from ancient Greek, originally meaning things lasting no more than a day. The connection to computational science and engineering will become clear soon, certainly in under a day.

One of the less interesting lectures I’ve ever heard was on the topic of error detection and correction. This wasn’t because the subject is dull. In fact, detecting and correcting errors in digital data involves a fascinating set of ideas and techniques, without which computers and indeed all telecommunications would be impossible. The talk I heard was less than great because the speaker decided that the way to introduce an audience of nonspecialists to the subject was by reciting the excruciating details of a particular computation. “If this number is a one and this number is a one then we add them and get a zero. And then we look at the next bit …” As many CiSE readers know, the idea is to send extra bits and then do a short computation on the received data that will tell you if what was received is what was sent. How exactly this is done and why it works is a deep subject dating back at least to the work of Claude Shannon.

Shannon’s techniques are robust and quite reliable. By the time computers had been around for 15 years, machine errors were extremely rare. A good friend tells me that in 1965 he got into a very heated argument with a fellow graduate student over whether a certain program was buggy or a true machine error had occurred. It got so bad that the guy who had the “bug infested” program finally stomped off in a huff. The argument was never settled because the next time the program was run, without any changes, it ran correctly. No error was found in either the program or the hardware. Possibly this was an example of a “soft” error.
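The extra-bits idea described above, and what a single flipped bit looks like to it, can be sketched with the simplest scheme of this kind: one even-parity bit. This is only an illustration under stated assumptions. The editorial does not name a particular code, and the function names below (`add_parity`, `looks_intact`, `flip`) are hypothetical.

```python
def add_parity(bits):
    """Sender side: append one extra bit so the total number of ones is even."""
    parity = sum(bits) % 2  # mod-2 addition: one plus one gives zero
    return bits + [parity]


def looks_intact(word):
    """Receiver side: the short computation done on the received data."""
    return sum(word) % 2 == 0


def flip(word, index):
    """Simulate a transient single-bit flip; the input list is left unchanged."""
    flipped = list(word)
    flipped[index] ^= 1
    return flipped


sent = add_parity([1, 0, 1, 1, 0, 1, 0])
received = flip(sent, 3)  # one bit changes somewhere along the way

print(looks_intact(sent))               # True
print(looks_intact(received))           # False: the single flip is detected
print(looks_intact(flip(received, 5)))  # True: a second flip hides the first
```

A single parity bit can only detect an odd number of flips. It cannot say which bit changed, and two flips cancel each other out. Correcting an error, or catching several at once, takes more extra bits and a more careful computation.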
According to Wikipedia, a soft error is data that’s wrong, but not because of a programming mistake or hardware failure. After a soft error, there’s no implication that the system is any less reliable than before. If the data is rewritten, the circuit will work perfectly again. Soft errors involve changes to data (the electrons in a storage circuit, for example) but not changes to the physical circuit itself, the atoms. A bit is flipped and the cause of the flip vanishes. The error is ephemeral.

Soft errors are thought to be caused by an alpha particle passing through exactly the right part of the circuit at exactly the right time. Soft errors used to be extremely rare events. But now that terabytes of data are accessed and feature sizes on chips are well below 100 nanometers, soft errors seem to be cropping up. They’re certainly being offered more and more as a reason for program execution failure.

It seems to me that this is an excellent time for research in theory and algorithms for detection and correction of soft errors. I suspect this will be much harder than the classical theory. Let’s call it the hard science of soft errors.

Yesterday upon the stair
I saw a man who wasn’t there
He wasn’t there again today
Oh how I wish he’d go away.

This verse, “Antigonish,” was written by William Hughes Mearns in 1899.

I’m grateful to Francis Sullivan for his interesting comments and insight on this subject.

By Isabel Beichl, Editor in Chief

Selected articles and columns from IEEE Computer Society publications are also available for free at http://ComputingNow.computer.org.

Get Involved
CiSE is always looking for interesting articles and insightful peer reviewers. For submission guidelines, editorial calendar, and other information, visit www.computer.org/cise.