https://buttersideup.com/mediawiki/api.php?action=feedcontributions&user=TimSmall&feedformat=atomEdacWiki - User contributions [en-gb]2024-03-28T16:31:16ZUser contributionsMediaWiki 1.35.13https://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingMemoryErrors&diff=159WhyAmIgettingMemoryErrors2010-11-09T11:00:28Z<p>TimSmall: /* Symptoms */</p>
<hr />
<div>To help understand why you are seeing memory errors, please have a look at [[HowMemoryEdacHardwareWorks]].<br />
<br />
You may well have been experiencing these errors for a while; nothing was checking for them until you enabled the EDAC module. Note that if you are getting UEs (uncorrectable errors), your system is probably experiencing data corruption, so you should investigate promptly (this is why EDAC is set to `panic()` on UEs by default).<br />
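<br />
To see which kind of error you are getting, you can read the per-controller counters that the EDAC core exports through sysfs. The following is only a sketch: it assumes the counters live under /sys/devices/system/edac/mc as `ce_count` and `ue_count` (check the kernel's Documentation/edac.txt for your version), and the function name `edac_totals` and its optional directory argument are made up for illustration.<br />

```shell
#!/bin/sh
# Sum corrected (CE) and uncorrected (UE) error counts over all memory
# controllers. The directory argument is an assumption made so the
# function can be pointed at a copy; it defaults to the usual location.
edac_totals() {
    edac_dir=${1:-/sys/devices/system/edac/mc}
    ce_total=0
    ue_total=0
    for mc in "$edac_dir"/mc*; do
        [ -d "$mc" ] || continue          # no controllers registered
        if [ -r "$mc/ce_count" ]; then
            ce_total=$((ce_total + $(cat "$mc/ce_count")))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue_total=$((ue_total + $(cat "$mc/ue_count")))
        fi
    done
    echo "CE=$ce_total UE=$ue_total"
}
```

Calling `edac_totals` with no argument reads the live counters. A non-zero UE figure is the case that deserves immediate attention.<br />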
<br />
The reason that you are seeing problems is very likely to be one of:<br />
<br />
* Your RAM is bad.<br />
* Your Motherboard is bad.<br />
* Your CPU is bad (for CPUs which have the memory controller built into the CPU core, such as the AMD Opteron/Athlon-64).<br />
* The connection between your motherboard and your CPU or memory module is bad.<br />
* Some of your hardware is being operated outside of its design specification, such as:<br />
** Things are being run too hot.<br />
** Timings are being violated (e.g. running memory too fast, or bad DRAM clock generation).<br />
** Supply voltages to the critical components are too high/low (this may even happen very briefly, as a supply "spike", or "droop").<br />
* You have seen one or more "Single Event Upsets" - see [[SoftErrors]].<br />
* Memory ECC check bits are not properly initialised by BIOS prior to Linux boot.<br />
* The EDAC module is buggy.<br />
* Memory loading is exceeded.<br />
* The power supply is insufficient.<br />
<br />
== So Which One Is It Then? ==<br />
<br />
Good question. Time to try some things:<br />
<br />
=== Symptoms ===<br />
<br />
Here are the most likely symptoms.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| <b>Problem</b> <br />
| <b>Error Addresses</b><br />
| <b>Error Slot or Row</b> <br />
| <b>Error Frequency</b> <br />
|-<br />
| Bad Memory Module(s) <br />
| Single/Few <br />
| Probably only 1 <br />
| May vary if part is marginally out of spec <br />
|-<br />
| Bad Motherboard <br />
| Probably many <br />
| Maybe 1, maybe many <br />
| ? <br />
|-<br />
| Bad Connection <br />
| Probably many <br />
| 1 (bad mem), prob all (bad CPU)<br />
| ? <br />
|-<br />
| Temp out of spec <br />
| Probably few <br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher, with higher temp <br />
|-<br />
| Timings out of spec <br />
| Probably few <br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher, with higher temp <br />
|-<br />
| Voltages out of spec <br />
| Probably random<br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher at higher system load<br />
|-<br />
| Bad BIOS check bit init <br />
| Probably random<br />
| Probably all <br />
| High/very high (stops after a while for systems with background scrub)<br />
|-<br />
| Single event upsets <br />
| Random <br />
| Varies with effective "cross-section" of part<br />
| Rare - more common with some part designs, and at high altitude etc.<br />
|}<br />
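<br />
The "Error Slot or Row" column can be checked against the per-row counters in sysfs. The sketch below assumes the csrow layout (mcN/csrowM/ce_count) under /sys/devices/system/edac/mc; the function name `edac_rows_with_errors` and the directory argument are hypothetical, and your kernel may present the rows differently.<br />

```shell
#!/bin/sh
# Print every chip-select row with a non-zero corrected-error count and
# finish with the number of rows affected. One affected row hints at a
# single module; many rows point away from the module itself.
edac_rows_with_errors() {
    edac_dir=${1:-/sys/devices/system/edac/mc}
    affected=0
    for f in "$edac_dir"/mc*/csrow*/ce_count; do
        [ -r "$f" ] || continue           # glob did not match anything
        count=$(cat "$f")
        if [ "$count" -gt 0 ]; then
            row=${f#"$edac_dir"/}         # e.g. mc0/csrow2/ce_count
            echo "${row%/ce_count}: $count"
            affected=$((affected + 1))
        fi
    done
    echo "rows affected: $affected"
}
```

Errors confined to one row fit the "Bad Memory Module(s)" line of the table, while errors spread over all rows fit the motherboard, connection or check-bit lines better.<br />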
<br />
=== Things to try to isolate the problem ===<br />
<br />
General:<br />
<br />
* Get a second opinion, e.g. from [http://www.memtest.org/] or [http://www.memtest86.com/]. Make sure that either:<br />
** The memory testing software knows how to disable ECC on your system, or<br />
** You have disabled ECC before running the memory tester (note that memtest86 currently displays "ECC: No" on chipsets which have ECC, but which it doesn't know about!).<br />
* A memory tester may not catch problems (e.g. power-supply related ones) that don't occur while the tester is running.<br />
* Use a system stress tester such as "burnbx" from [http://pages.sbcglobal.net/redelm/].<br />
* Put your system under stress by (e.g.) running a parallelised Linux kernel build, whilst doing some heavy 3D graphics display, and a lot of disk I/O.<br />
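<br />
When stressing the system it helps to know how many errors a particular workload provoked. This is a rough sketch, assuming a corrected-error counter file such as /sys/devices/system/edac/mc/mc0/ce_count; the function name `ce_delta_during` is invented for this example.<br />

```shell
#!/bin/sh
# Run a command and report how many corrected errors the given counter
# file gained while it ran. The first argument is the counter file, the
# remaining arguments are the workload and its own arguments.
ce_delta_during() {
    ce_file=$1
    shift
    before=$(cat "$ce_file")
    "$@"                                  # the workload to run
    after=$(cat "$ce_file")
    echo "new corrected errors: $((after - before))"
}
```

For example, from inside a kernel source tree: `ce_delta_during /sys/devices/system/edac/mc/mc0/ce_count make -j4`. Comparing the figure for an idle period with the figure under load gives a first hint as to whether the errors are load-related.<br />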
<br />
Suspected bad module:<br />
<br />
* Remove Module.<br />
* Move Module to different slot (do the errors move with the module?).<br />
* Move Module to different machine.<br />
* See "suspected temp out of spec".<br />
* See "suspected timings out of spec".<br />
* See "suspected voltage out of spec".<br />
* Clean connections.<br />
* Check memory loading:<br />
** Some memory controllers can only support so many 'ranks' of memory at a given speed.<br />
** For example, Opterons/Athlon64s can support only 4 ranks of 2 GB at PC3200.<br />
** See http://www.valueram.com/memoryranks/default.asp for definitions.<br />
<br />
Suspected bad motherboard:<br />
<br />
* Check motherboard docs for memory module compatibility.<br />
* Move modules to different slots.<br />
* Clean connections.<br />
* Upgrade BIOS.<br />
* Select BIOS "fail-safe defaults" or equivalent, then change settings from there to isolate the cause.<br />
<br />
Suspected bad connection:<br />
<br />
* Visually check connectors, pins, modules etc.<br />
* [[HowToCleanEdgeConnectors]].<br />
<br />
Suspected temp out of spec:<br />
<br />
* Measure temp, compare to published specs:<br />
** Use internal machine sensors (motherboard, hard drive etc.) if possible.<br />
** Use a temperature probe or infra-red thermometer.<br />
* Check airflow.<br />
* De-dust.<br />
* Lower temp:<br />
** Lower room temp.<br />
** Increase cooling.<br />
** Improve airflow (tidy cables etc.).<br />
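<br />
The internal sensors can usually be read through the kernel's hwmon sysfs interface once the right sensor driver is loaded. The sketch below assumes readings appear as /sys/class/hwmon/hwmonN/tempM_input in millidegrees Celsius; the function name `hwmon_temps`, the 70 C default limit and the directory argument are illustrative assumptions, so take the real limit from the published spec of the part.<br />

```shell
#!/bin/sh
# List hwmon temperature readings in whole degrees Celsius and mark
# those at or above a limit. Arguments: limit in degrees C (default 70),
# then the hwmon directory (default /sys/class/hwmon).
hwmon_temps() {
    limit_c=${1:-70}
    hwmon_dir=${2:-/sys/class/hwmon}
    for f in "$hwmon_dir"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue           # no sensors found
        milli=$(cat "$f")                 # sysfs reports millidegrees C
        temp_c=$((milli / 1000))
        chip=$(cat "${f%/*}/name" 2>/dev/null || echo unknown)
        flag=ok
        [ "$temp_c" -ge "$limit_c" ] && flag=HOT
        echo "$chip ${f##*/}: ${temp_c}C $flag"
    done
}
```

Running `hwmon_temps 60` while the machine is under load, and again when idle, shows whether error frequency tracks temperature as the symptoms table suggests.<br />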
<br />
Suspected timings out of spec:<br />
<br />
* Try different BIOS version.<br />
* Set pessimistic memory timings in BIOS.<br />
* Compare memory controller timings to DIMM specs, using decode-dimms.pl from the Linux i2c project.<br />
* Try disabling "spread spectrum" in the BIOS (easy if available), or by using an i2c driver for your board's clock generator (hard).<br />
<br />
Suspected voltage out of spec:<br />
<br />
* Check PSU specs vs. total demand of system components.<br />
* Swap power supply with another machine.<br />
* Fit voltage regulator/spike suppressor to machine power supply.<br />
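<br />
Checking PSU specs against total demand is simple arithmetic per supply rail. The helper below is a hypothetical sketch (the name `rail_headroom` and the milliamp convention are assumptions); the figures you pass in must come from the PSU label and your own component datasheets or measurements.<br />

```shell
#!/bin/sh
# Compare the summed current draw on one supply rail with the PSU rating
# for that rail. All figures are in milliamps to keep the arithmetic in
# integers. Arguments: rail rating, then one figure per component.
rail_headroom() {
    rating_ma=$1
    shift
    total_ma=0
    for load_ma in "$@"; do
        total_ma=$((total_ma + load_ma))
    done
    headroom_ma=$((rating_ma - total_ma))
    if [ "$headroom_ma" -lt 0 ]; then
        echo "over capacity by $((0 - headroom_ma)) mA"
    else
        echo "headroom $headroom_ma mA"
    fi
}
```

For instance, `rail_headroom 18000 9000 6000 2500` describes an 18 A rail feeding three loads. Bear in mind that steady-state figures say nothing about brief spikes or droops, which is why swapping the supply remains a useful test.<br />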
<br />
Suspected single event upsets:<br />
<br />
* Fit less susceptible components<br />
* Move to a lower altitude, or area with lower cosmic radiation.<br />
* Move your data centre underground.<br />
* Improve error-reporting utilities to ignore them.<br />
<br />
Suspected bad check-bit init:<br />
<br />
* Upgrade BIOS.<br />
* Don't enable BIOS "quick boot".<br />
* Don't manually skip BIOS memory check.<br />
<br />
Suspected insufficient power supply:<br />
<br />
* Try detaching some devices that are hardly used. Start with USB devices.<br />
* If the problems stop, either permanently reduce the number of devices, or get a higher capacity power supply.<br />
* Use a DC current clamp (preferably one with a peak/inrush measurement function) to check for over-capacity at a particular voltage.<br />
* This is closely related to voltage out of spec, which can also be caused by a faulty supply.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=EdacWiki:About&diff=156EdacWiki:About2010-11-08T20:58:26Z<p>TimSmall: Created page with "If you find problems with this site, please email me at [mailto:tim@buttersideup.com]."</p>
<div>If you find problems with this site, please email me at [mailto:tim@buttersideup.com].</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=155Main Page2010-11-08T20:53:23Z<p>TimSmall: /* Other Resources */</p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project]<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* [http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction System RAM errors] (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* [http://en.wikipedia.org/wiki/Memory_scrubbing RAM scrubbing] - some memory controllers support "scrubbing" DRAM during normal operation. Continuously scrubbing DRAM allows for actively detecting and correcting ECC errors.<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
* Cache ECC errors<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module, in time this will be split out... As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
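<br />
The manual steps in the session above can be wrapped in a small function. This is only a sketch built on the two sysfs files shown there (check_pci_parity and pci_parity_count); the function name `enable_pci_parity_check` and its optional directory argument are invented for illustration, and on a real system it needs root and the edac_mc module loaded.<br />

```shell
#!/bin/sh
# Turn on PCI parity checking (if it is off) and print the running count
# of parity errors. The directory argument is an assumption made so the
# function can be tried on a copy; it defaults to the path used above.
enable_pci_parity_check() {
    pci_dir=${1:-/sys/devices/system/edac/pci}
    if [ ! -w "$pci_dir/check_pci_parity" ]; then
        echo "cannot write $pci_dir/check_pci_parity (is edac_mc loaded?)" >&2
        return 1
    fi
    if [ "$(cat "$pci_dir/check_pci_parity")" != 1 ]; then
        echo 1 > "$pci_dir/check_pci_parity"
    fi
    echo "pci_parity_count=$(cat "$pci_dir/pci_parity_count")"
}
```

Run it once to switch checking on, exercise the suspect device, then run it again and compare the two counts.<br />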
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/edac.txt Documentation/edac.txt ].<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives (for both the [http://marc.info/?l=linux-edac current] and the [http://sourceforge.net/mail/?group_id=93775 previous] mailing lists) for your problem first.<br />
<br />
If you have exhausted these possibilities, then by all means post to [http://vger.kernel.org/vger-lists.html#linux-edac the mailing list]...<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace needs some help, please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Prior to May 2007, things can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMCC <br />
| 4xx <br />
| [[ppc4xx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 8111 <br />
| [[amd8111_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| Freescale <br />
| MPC83xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Freescale <br />
| MPC85xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Freescale <br />
| P2020 <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| i3100 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.26)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|-<br />
| Intel<br />
| 3000/3010<br />
| [[i3000_edac.c]]<br />
| [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Intel<br />
| X38<br />
| [[x38_edac.c]]<br />
| [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.28)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
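<br />
As a rough illustration of the "disable it altogether" option, the sketch below writes 0 or 1 to the <code>check_pci_parity</code> control file that the Main Page shell session uses (under <code>/sys/devices/system/edac/pci/</code>). The function name and its directory argument are made up for this example, and this page does not name a control for the polling rate, so only the on/off switch is covered.<br />
<br />
```shell
#!/bin/sh
# Sketch: switch EDAC PCI parity checking on (1) or off (0).
# $1 is the sysfs directory holding check_pci_parity, $2 is the new value.
# The function name and argument order are illustrative, not part of EDAC.
set_pci_parity_check() {
    pci_dir="$1"
    value="$2"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    echo "$value" > "$pci_dir/check_pci_parity" || return 1
    # Read the setting back so the caller can confirm it took effect.
    cat "$pci_dir/check_pci_parity"
}
```
<br />
Run as root, <code>set_pci_parity_check /sys/devices/system/edac/pci 0</code> should turn the checking off on a system that exposes that file.<br />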
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
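<br />
To give an idea of the kind of user-space script meant above, here is a minimal sketch of a cron-friendly check. It assumes the <code>pci_parity_count</code> file shown in the Main Page shell session; the function name and the state file are invented for this example.<br />
<br />
```shell
#!/bin/sh
# Sketch of a cron-friendly check: report when the PCI parity error count
# has grown since the previous run.
# $1: EDAC PCI sysfs directory, $2: state file remembering the last count.
# Both the function name and the state-file idea are illustrative only.
check_parity_growth() {
    pci_dir="$1"
    state_file="$2"
    current=$(cat "$pci_dir/pci_parity_count") || return 1
    previous=0
    [ -f "$state_file" ] && previous=$(cat "$state_file")
    echo "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        # cron typically mails any output, so only print when there is news.
        echo "EDAC: $((current - previous)) new PCI parity error(s)"
    fi
}
```
<br />
A cron entry could source this file and call <code>check_parity_growth /sys/devices/system/edac/pci /var/tmp/edac-pci.count</code>; the state file path is only a suggestion.<br />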
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
Mailing list [http://vger.kernel.org/vger-lists.html#linux-edac]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=134Main Page2009-01-16T18:04:26Z<p>TimSmall: Link to main project page, and wikipedia</p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project]<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
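<br />
The before/after comparison in the session above (the count going from 1 to 15 while <code>arecord</code> ran) can be wrapped in a small helper. This is only a sketch: the function name is invented, and the sysfs directory is passed in as an argument rather than hard-coded. Because detection is done by polling, a very short command may finish before the counter has been updated.<br />
<br />
```shell
#!/bin/sh
# Sketch: count PCI parity errors reported while a command runs.
# $1 is the EDAC PCI sysfs directory (a parameter so the function can be
# tried against a scratch directory); remaining arguments are the command.
parity_errors_during() {
    pci_dir="$1"
    shift
    before=$(cat "$pci_dir/pci_parity_count") || return 1
    # Run the workload; its output and exit status are deliberately ignored.
    "$@" > /dev/null 2>&1
    after=$(cat "$pci_dir/pci_parity_count") || return 1
    echo $((after - before))
}
```
<br />
For example, <code>parity_errors_during /sys/devices/system/edac/pci arecord -d 5</code> would print the number of new parity errors logged during a five-second recording.<br />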
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
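<br />
As a hedged illustration of reading that sysfs API, the sketch below prints the correctable (CE) and uncorrectable (UE) error counters for each memory controller. It assumes the usual layout of <code>mcN/ce_count</code> and <code>mcN/ue_count</code> under <code>/sys/devices/system/edac/mc/</code>; check the in-kernel documentation for your kernel version. The function name is made up for this example.<br />
<br />
```shell
#!/bin/sh
# Sketch: print correctable/uncorrectable error totals per memory controller.
# $1 is the EDAC memory-controller sysfs directory (assumed layout:
# <dir>/mc0/ce_count, <dir>/mc0/ue_count, and so on for mc1, mc2 ...).
edac_mc_summary() {
    mc_root="$1"
    for mc in "$mc_root"/mc[0-9]*; do
        # Skip the unexpanded pattern when no controllers are registered.
        [ -d "$mc" ] || continue
        ce=$(cat "$mc/ce_count")
        ue=$(cat "$mc/ue_count")
        echo "$(basename "$mc"): CE=$ce UE=$ue"
    done
}
```
<br />
Calling <code>edac_mc_summary /sys/devices/system/edac/mc</code> prints one line per controller, and nothing at all if no EDAC memory controller driver is loaded.<br />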
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=107Main Page2008-04-22T08:21:16Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=105Main Page2007-06-23T11:49:07Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Currently, hardware which detects the following errors is supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
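
The steps in the transcript above could be wrapped in a small script along the following lines. This is only a sketch: the function names and the optional directory argument are hypothetical (the argument exists so the functions can be tried against a scratch directory), and the only sysfs files used are the two that appear in the transcript.

```shell
#!/bin/sh
# Sketch: enable EDAC PCI parity checking and read the error counter.
# The default directory is the one shown in the transcript; the function
# names and the optional directory argument are illustrative assumptions.

edac_pci_enable() {
    dir="${1:-/sys/devices/system/edac/pci}"
    if [ ! -w "$dir/check_pci_parity" ]; then
        echo "cannot write $dir/check_pci_parity (is edac_mc loaded? are you root?)" >&2
        return 1
    fi
    # Same effect as the "echo 1 > check_pci_parity" step in the transcript.
    echo 1 > "$dir/check_pci_parity"
}

edac_pci_count() {
    dir="${1:-/sys/devices/system/edac/pci}"
    # Prints the number of parity errors seen so far.
    cat "$dir/pci_parity_count"
}
```

Called with no arguments (as root, after <code>modprobe edac_mc</code>), <code>edac_pci_enable</code> followed by <code>edac_pci_count</code> should reproduce the manual steps above.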
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace needs some help, please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
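
As a rough illustration of using the sysfs API from a script (for example from a cron job), the sketch below prints the correctable and uncorrectable error counts for each memory controller. It assumes the usual layout of one <code>mcN</code> directory per controller under <code>/sys/devices/system/edac/mc</code>, each containing <code>ce_count</code> and <code>ue_count</code> files; that layout is not described on this page, so check it against your kernel. The function name and the optional directory argument are hypothetical.

```shell
#!/bin/sh
# Sketch: summarise EDAC memory error counters from sysfs.
# Assumes <base>/mcN/ce_count and <base>/mcN/ue_count exist; the function
# name and the optional base-directory argument are illustrative.

edac_mc_summary() {
    base="${1:-/sys/devices/system/edac/mc}"
    for mc in "$base"/mc*; do
        # Skip anything without both counters (including an unmatched glob).
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        printf '%s ce=%s ue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

Running <code>edac_mc_summary</code> with no arguments prints one line per controller found, and prints nothing if no EDAC memory controller driver is loaded.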
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
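
If the polling overhead matters on a given machine, one way to switch PCI checking off again might look like the sketch below. It assumes that writing 0 to the <code>check_pci_parity</code> file (the same file that the transcript on this page writes 1 to) turns the check off; the function name and the optional directory argument are hypothetical. No interface for changing the polling rate is shown here, since none is described on this page.

```shell
#!/bin/sh
# Sketch: turn EDAC PCI parity checking off and confirm the new setting.
# Assumes check_pci_parity accepts 0 to disable checking; the function
# name and the optional directory argument are illustrative.

edac_pci_disable() {
    dir="${1:-/sys/devices/system/edac/pci}"
    if [ ! -w "$dir/check_pci_parity" ]; then
        echo "cannot write $dir/check_pci_parity" >&2
        return 1
    fi
    echo 0 > "$dir/check_pci_parity"
    # Read the value back so the caller can see what the kernel now reports.
    echo "check_pci_parity is now $(cat "$dir/check_pci_parity")"
}
```

The read-back is there because a write to sysfs can be accepted without changing anything on some kernels, so it is worth checking the value that is actually reported.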
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=104Main Page2007-06-22T15:29:16Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Currently, hardware which detects the following errors is supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace needs some help, please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=103Main Page2007-06-21T12:56:37Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Currently, hardware which detects the following errors is supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace needs some help, please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16 and later. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
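<br />
As a rough illustration of the sysfs API, the sketch below reads the correctable and uncorrectable error counters for each memory controller. It assumes the counters are exposed as <code>ce_count</code> and <code>ue_count</code> under <code>/sys/devices/system/edac/mc/mc*/</code>; the layout may differ between kernel versions, and the function name <code>read_error_counts</code> is only illustrative.<br />

```python
from pathlib import Path

# Assumed default location of the EDAC memory-controller sysfs tree.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_error_counts(root=EDAC_MC_ROOT):
    """Return {controller_name: (ce_count, ue_count)} for each mc* directory."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts  # EDAC not loaded, or a different sysfs layout
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters are missing or unreadable
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_error_counts().items():
        print(f"{name}: correctable={ce} uncorrectable={ue}")
```

A script like this could be run from a cron job to warn when the correctable count starts to rise.<br />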
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or fetch it by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS. See the sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
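<br />
Once the relationship is known for a board, it can be recorded as a simple lookup table. The sketch below is purely illustrative: the labels and the (csrow, channel) pairs are hypothetical values for an imaginary board, not data from any real motherboard.<br />

```python
# Hypothetical mapping for an imaginary board:
# (csrow, channel) as reported by the EDAC driver -> silk-screen label.
SLOT_LABELS = {
    (0, 0): "DIMM_A1",
    (0, 1): "DIMM_B1",
    (1, 0): "DIMM_A2",
    (1, 1): "DIMM_B2",
}


def slot_label(csrow, channel, labels=SLOT_LABELS):
    """Translate a driver-reported location into a physical slot label."""
    return labels.get((csrow, channel), f"unknown (csrow {csrow}, channel {channel})")


if __name__ == "__main__":
    print(slot_label(0, 1))
    print(slot_label(5, 0))
```

Falling back to the raw csrow and channel numbers keeps the report usable on boards whose mapping has not been worked out yet.<br />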
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
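<br />
To give an idea of what one polling pass involves, the userspace sketch below reads each device's Status register (offset 0x06 in PCI configuration space) and decodes the parity-related bits. This is not the EDAC driver's code; it assumes the configuration space is readable via <code>/sys/bus/pci/devices/*/config</code>, and the function names are illustrative only.<br />

```python
import struct
from pathlib import Path

PCI_STATUS_OFFSET = 0x06              # 16-bit Status register in config space
PCI_STATUS_PARITY = 0x0100            # bit 8: Master Data Parity Error
PCI_STATUS_SIG_SYSTEM_ERROR = 0x4000  # bit 14: Signalled System Error
PCI_STATUS_DETECTED_PARITY = 0x8000   # bit 15: Detected Parity Error


def decode_status(config):
    """Decode the parity-related Status bits from raw config-space bytes."""
    (status,) = struct.unpack_from("<H", config, PCI_STATUS_OFFSET)
    return {
        "master_data_parity": bool(status & PCI_STATUS_PARITY),
        "signalled_system_error": bool(status & PCI_STATUS_SIG_SYSTEM_ERROR),
        "detected_parity": bool(status & PCI_STATUS_DETECTED_PARITY),
    }


def poll_once(root="/sys/bus/pci/devices"):
    """Return {device_address: flags} for devices with any error bit set."""
    flagged = {}
    for dev in sorted(Path(root).glob("*")):
        try:
            config = (dev / "config").read_bytes()[:64]
        except OSError:
            continue  # device vanished or config space not readable
        if len(config) < PCI_STATUS_OFFSET + 2:
            continue  # too short to contain the Status register
        flags = decode_status(config)
        if any(flags.values()):
            flagged[dev.name] = flags
    return flagged


if __name__ == "__main__":
    for address, flags in poll_once().items():
        print(address, flags)
```

Each pass touches every device, which is why the cost grows with the number of devices. Unlike a real driver, this sketch only reads the bits and does not clear them, so a device stays flagged on later passes.<br />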
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=102Main Page2007-06-21T12:48:56Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace needs some help, please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16 and later. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or fetch it by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ FIXME<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=101Main Page2007-06-19T15:54:46Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or fetch it by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ FIXME<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
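<br />
As a starting point for the cron-job idea above, here is a small sketch which sums the correctable error counters exposed through sysfs and prints a message only when the total is non-zero. The directory layout (<code>mc*/ce_count</code> below the EDAC memory-controller directory) and the function names are assumptions for illustration; adjust them to match your kernel.<br />

```shell
# Sketch only: the sysfs layout below is an assumption and may differ per kernel.
EDAC_MC_DIR="${EDAC_MC_DIR:-/sys/devices/system/edac/mc}"

# Print the sum of all per-controller correctable error counters.
total_correctable_errors() {
    total=0
    for f in "$EDAC_MC_DIR"/mc*/ce_count; do
        [ -r "$f" ] || continue    # glob matched nothing, or file unreadable
        count=$(cat "$f")
        total=$((total + count))
    done
    echo "$total"
}

# Cron-friendly check: stay silent unless errors have been counted,
# so that cron only sends mail when there is something to report.
report_correctable_errors() {
    n=$(total_correctable_errors)
    if [ "$n" -gt 0 ]; then
        echo "EDAC: $n correctable memory error(s) counted"
    fi
}
```

Calling <code>report_correctable_errors</code> from a script run by cron would then produce output (and hence mail) only on machines which have logged correctable errors.<br />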
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=100Main Page2007-06-19T15:49:45Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting is part of the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=98Main Page2007-06-19T12:47:01Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting is part of the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=93Main Page2007-05-30T07:53:42Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
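<br />
As a rough sketch of how the polling might be tuned from userspace, the helper below writes a value to an EDAC control file under sysfs. The base path and the file names in the commented examples (<code>mc/poll_msec</code>, <code>pci/check_pci_parity</code>) are assumptions based on the 2.6.16-era sysfs layout, which is still changing, and <code>edac_set</code> and <code>EDAC_SYSFS</code> are hypothetical names chosen for illustration. Check the in-kernel documentation for your kernel version before relying on any of them.<br />
<br />
```shell
# Assumed base path of the EDAC sysfs tree; override EDAC_SYSFS if your
# kernel exposes it elsewhere.
EDAC_SYSFS="${EDAC_SYSFS:-/sys/devices/system/edac}"

# Hypothetical helper: write VALUE to the control file at the relative path
# CONTROL, failing cleanly if the file is missing or not writable.
# usage: edac_set CONTROL VALUE
edac_set() {
    file="$EDAC_SYSFS/$1"
    if [ ! -w "$file" ]; then
        echo "edac_set: $file is missing or not writable" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$file"
}

# Examples (run as root; the control file names are assumptions):
#   edac_set mc/poll_msec 5000        # poll every 5 seconds
#   edac_set pci/check_pci_parity 0   # stop checking PCI parity status
```
<br />
Because the helper refuses to write to a file which does not exist, running it on a kernel with a different sysfs layout should produce an error message rather than a silent no-op.<br />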
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=90Main Page2007-05-26T17:23:30Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or obtain one by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
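<br />
To give an idea of the kind of user-space script meant above, here is a minimal sketch of a check which could be run from a cron job. It assumes that each memory controller appears as a directory <code>mc/mcN</code> under the EDAC sysfs tree, containing <code>ce_count</code> and <code>ue_count</code> counter files. The base path and the names <code>edac_total</code>, <code>edac_report</code> and <code>EDAC_SYSFS</code> are assumptions made for illustration, and the sysfs layout may differ on your kernel.<br />
<br />
```shell
# Assumed base path of the EDAC sysfs tree; override EDAC_SYSFS if needed.
EDAC_SYSFS="${EDAC_SYSFS:-/sys/devices/system/edac}"

# Hypothetical helper: print the sum of one counter (ce_count or ue_count)
# across all memory controllers. Unreadable or non-numeric files are skipped.
edac_total() {
    total=0
    for f in "$EDAC_SYSFS"/mc/mc*/"$1"; do
        [ -r "$f" ] || continue
        n=$(cat "$f")
        case "$n" in
            ''|*[!0-9]*) continue ;;
        esac
        total=$((total + n))
    done
    echo "$total"
}

# Hypothetical helper: print a one-line summary and return non-zero if any
# errors have been counted; print nothing and return zero otherwise.
edac_report() {
    ce=$(edac_total ce_count)
    ue=$(edac_total ue_count)
    if [ "$ce" -gt 0 ] || [ "$ue" -gt 0 ]; then
        echo "EDAC: $ce correctable, $ue uncorrectable memory errors"
        return 1
    fi
    return 0
}

# In a cron job, call edac_report; it is silent when no errors are counted.
```
<br />
The script stays silent when the counters are all zero, so a cron daemon which mails job output would only send a message when there is something to report.<br />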
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=88Main Page2007-05-09T12:20:00Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
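<br />
One quick way to see whether EDAC has found anything to monitor is to count the memory controllers it has registered. The sketch below assumes that controllers appear as <code>mc/mcN</code> directories under the EDAC sysfs tree; the base path and the names <code>edac_mc_count</code> and <code>EDAC_SYSFS</code> are assumptions made for illustration. A count of zero does not by itself prove that the BIOS has left ECC disabled: the driver may simply not be loaded, or the chipset may be unsupported.<br />
<br />
```shell
# Assumed base path of the EDAC sysfs tree; override EDAC_SYSFS if needed.
EDAC_SYSFS="${EDAC_SYSFS:-/sys/devices/system/edac}"

# Hypothetical helper: print the number of memory controllers which EDAC
# has registered (directories named mc0, mc1, ... under mc/).
edac_mc_count() {
    count=0
    for d in "$EDAC_SYSFS"/mc/mc[0-9]*; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example:
#   if [ "$(edac_mc_count)" -eq 0 ]; then
#       echo "EDAC has not registered any memory controller" >&2
#   fi
```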
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or obtain one by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=87Main Page2007-05-09T10:14:43Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
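<br />
As a rough illustration of how user space might check for such errors, here is a minimal sketch that reads the per-controller error counters from sysfs. It assumes the layout found in later kernels (<code>/sys/devices/system/edac/mc/mc*/ce_count</code> and <code>ue_count</code>); the function name <code>read_edac_counts</code> is made up for this example, and the paths may differ on your kernel.<br />
<br />
```python
from pathlib import Path


def read_edac_counts(mc_root="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for each mc* directory.

    The default path is an assumption based on the sysfs layout of later
    kernels; pass a different root if your system differs.
    """
    counts = {}
    root = Path(mc_root)
    if not root.is_dir():
        return counts  # EDAC not loaded, or a different sysfs layout
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: correctable={ce} uncorrectable={ue}")
```
<br />
A script like this could be run from a cron job and raise an alert when the correctable count starts to climb, before the errors become uncorrectable.<br />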
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
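<br />
To make the idea concrete, the sketch below shows one possible way a user-space tool could translate the row/channel pair reported by an EDAC driver into the label printed on the board. The table contents and the names <code>SLOT_LABELS</code> and <code>slot_label</code> are entirely hypothetical; a real table would have to be worked out for each motherboard.<br />
<br />
```python
# Hypothetical mapping for one imaginary motherboard:
# (csrow, channel) as reported by the driver -> silk-screen label.
SLOT_LABELS = {
    (0, 0): "DIMM_A1",
    (0, 1): "DIMM_B1",
    (1, 0): "DIMM_A2",
    (1, 1): "DIMM_B2",
}


def slot_label(csrow, channel, labels=SLOT_LABELS):
    """Return the physical slot label, or a fallback naming the raw indices."""
    return labels.get(
        (csrow, channel), f"unknown (csrow {csrow}, channel {channel})"
    )


if __name__ == "__main__":
    print(slot_label(1, 0))
    print(slot_label(7, 0))
```
<br />
Falling back to the raw indices rather than raising an error keeps the report usable on boards that nobody has mapped yet.<br />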
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
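<br />
For readers who want to see what is being checked, here is a small sketch that decodes the parity-related bits of a device's PCI Status register (offset 0x06 in configuration space; bit 15 is "Detected Parity Error", bit 8 is "Master Data Parity Error"). It is not the EDAC driver's code, just an illustration; the function name <code>parity_flags</code> is invented here, and reading the header from <code>/sys/bus/pci/devices/*/config</code> is one assumed way of obtaining the bytes on Linux.<br />
<br />
```python
import struct

PCI_STATUS_OFFSET = 0x06                # Status register in config space
PCI_STATUS_DETECTED_PARITY = 0x8000     # bit 15: Detected Parity Error
PCI_STATUS_MASTER_DATA_PARITY = 0x0100  # bit 8: Master Data Parity Error


def parity_flags(config_header: bytes):
    """Decode the parity error bits from raw PCI configuration space bytes."""
    if len(config_header) < PCI_STATUS_OFFSET + 2:
        raise ValueError("need at least 8 bytes of config space")
    # The Status register is a 16-bit little-endian value.
    (status,) = struct.unpack_from("<H", config_header, PCI_STATUS_OFFSET)
    return {
        "detected_parity_error": bool(status & PCI_STATUS_DETECTED_PARITY),
        "master_data_parity_error": bool(status & PCI_STATUS_MASTER_DATA_PARITY),
    }


if __name__ == "__main__":
    import glob

    # Assumed location of per-device config space on Linux.
    for path in sorted(glob.glob("/sys/bus/pci/devices/*/config")):
        with open(path, "rb") as f:
            print(path, parity_flags(f.read(64)))
```
<br />
Note that this only reads the bits; the status bits stay set until software clears them, so a set bit does not say when the error happened.<br />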
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=85Main Page2007-05-09T08:35:37Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=83Main Page2007-05-05T21:31:05Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=81Main Page2007-05-05T12:51:40Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
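<br />
As a rough illustration of how the correctable-error case above can be watched from userspace, here is a minimal sketch of a script that could be run by hand or from a cron job. It assumes that each memory controller appears as a directory <code>mcN</code> under <code>/sys/devices/system/edac/mc</code> and exposes <code>ce_count</code> and <code>ue_count</code> files; the exact sysfs layout depends on your kernel version, so please check your own system. The names <code>EDAC_MC_DIR</code> and <code>report_edac_counts</code> are made up for this example.<br />
<br />
```sh
#!/bin/sh
# Directory holding the per-controller EDAC entries (assumed location;
# overridable so the function can be tried against a scratch directory).
EDAC_MC_DIR="${EDAC_MC_DIR:-/sys/devices/system/edac/mc}"

# Print one line per memory controller: "<controller> ce=<n> ue=<n>".
# Controllers without a readable ce_count file are skipped.
report_edac_counts() {
    for mc in "$EDAC_MC_DIR"/mc[0-9]*; do
        [ -r "$mc/ce_count" ] || continue
        ce=$(cat "$mc/ce_count")
        ue=0                                   # default if ue_count is absent
        [ -r "$mc/ue_count" ] && ue=$(cat "$mc/ue_count")
        echo "${mc##*/} ce=$ce ue=$ue"
    done
}
```
<br />
Calling <code>report_edac_counts</code> at regular intervals and comparing the output with the previous run is one simple way to notice a module whose correctable error count is climbing.<br />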
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model<br />
! EDAC Driver<br />
! Tech Docs<br />
! Controller Capabilities<br />
! Status<br />
! <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
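<br />
Once you have worked out the relationship for a board, it can be recorded in a small text file and looked up by script. The following is only a sketch: the file format (one <code>csrow channel label</code> triple per line, <code>#</code> for comments, labels without spaces), the function name <code>slot_label</code> and the example labels are all invented for illustration and are not something the EDAC code itself reads.<br />
<br />
```sh
#!/bin/sh
# Look up the board label for a chip-select row / channel pair.
#   $1 = mapping file, $2 = csrow number, $3 = channel number
# Prints the label and returns 0, or returns 1 if no entry matches.
slot_label() {
    awk -v row="$2" -v chan="$3" '
        # Skip comment lines, then match on the first two columns.
        $1 !~ /^#/ && $1 == row && $2 == chan { print $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$1"
}

# Example mapping file contents (hypothetical board):
#   # csrow channel label
#   0 0 DIMM_A1
#   0 1 DIMM_B1
#   1 0 DIMM_A2
```
<br />
With such a file in place, <code>slot_label mapping.txt 0 1</code> would print the label recorded for csrow 0, channel 1, which is easier to act on than a raw row number when you are standing in front of the machine.<br />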
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
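<br />
A minimal sketch of how the polling settings might be changed from a shell follows. It assumes the control files <code>mc/poll_msec</code> and <code>pci/check_pci_parity</code> under <code>/sys/devices/system/edac</code>; these names are an assumption based on 2.6.16-era kernels and may well differ on your kernel, so check your own sysfs tree first. The helper name <code>set_edac_knob</code> and the variable <code>EDAC_SYSFS</code> are made up for this example.<br />
<br />
```sh
#!/bin/sh
# Root of the EDAC sysfs tree (assumed location; overridable so the
# helper can be tried against a scratch directory first).
EDAC_SYSFS="${EDAC_SYSFS:-/sys/devices/system/edac}"

# Write a value to an EDAC control file.
#   $1 = control file, relative to $EDAC_SYSFS   $2 = value to write
# Returns 1 without writing if the file is missing or not writable.
set_edac_knob() {
    knob="$EDAC_SYSFS/$1"
    if [ ! -w "$knob" ]; then
        echo "skipping $knob: missing or not writable" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$knob"
}

# Example use, as root (file names are assumptions, see above):
#   set_edac_knob mc/poll_msec 5000        # poll every 5 seconds
#   set_edac_knob pci/check_pci_parity 0   # stop checking PCI parity
```
<br />
The helper refuses to create files which do not already exist, so a mistyped or unsupported control name produces a message rather than a stray file.<br />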
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=79Main Page2007-05-04T08:48:13Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or use an anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model<br />
! EDAC Driver<br />
! Tech Docs<br />
! Controller Capabilities<br />
! Status<br />
! <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=77Main Page2007-04-11T10:10:02Z<p>TimSmall: Add Intel 5000 support entry to table.</p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
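<br />
As a starting point for such userspace tools, the sketch below collects the correctable and uncorrectable error counters that the memory controller drivers export through sysfs. It assumes the layout <code>/sys/devices/system/edac/mc/mc*/ce_count</code> and <code>ue_count</code>; as the API is still changing, check the paths on your own kernel. The function name <code>read_edac_counts</code> is illustrative only and is not part of EDAC.<br />

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for every mc* directory under root.

    The sysfs layout is an assumption; pass a different `root` if your kernel differs.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as f:
                    values.append(int(f.read().strip()))
            except (OSError, ValueError):
                values.append(None)  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts


if __name__ == "__main__":
    for mc, (ce, ue) in read_edac_counts().items():
        print(f"{mc}: correctable={ce} uncorrectable={ue}")
```

A script like this could be run from a cron job and made to send mail whenever a count changes between runs.<br />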
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below), or fetch the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
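<br />
One way to record such a relationship is a small lookup table keyed by the position the driver reports. The sketch below is hypothetical throughout: the board name, the use of a (csrow, channel) pair as the key, and the socket labels are invented for illustration and do not describe any real motherboard.<br />

```python
# Hypothetical mapping from the (csrow, channel) position reported by an
# EDAC driver to the label printed next to the memory socket.
# Every board name and label here is invented for illustration.
SLOT_LABELS = {
    "ExampleBoard-1": {
        (0, 0): "DIMM1A",
        (0, 1): "DIMM1B",
        (1, 0): "DIMM2A",
        (1, 1): "DIMM2B",
    },
}


def slot_label(board, csrow, channel):
    """Return the socket label, or a fallback naming the raw position."""
    return SLOT_LABELS.get(board, {}).get(
        (csrow, channel), f"unknown (csrow {csrow}, channel {channel})"
    )


if __name__ == "__main__":
    print(slot_label("ExampleBoard-1", 0, 1))
    print(slot_label("SomeOtherBoard", 2, 0))
```

Falling back to the raw position rather than raising an error means that a board missing from the table still yields a usable report.<br />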
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
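<br />
For a rough feel of the cost, the overhead can be estimated as (number of devices × time per status read) ÷ polling interval. The sketch below does only that arithmetic. The device count and per-read cost are made-up inputs rather than measurements, so time your own system before drawing conclusions.<br />

```python
def polling_overhead(num_devices, read_cost_us, poll_interval_ms):
    """Estimate the fraction of one CPU spent polling PCI status registers.

    num_devices      -- PCI devices checked on each poll
    read_cost_us     -- assumed cost of one status read, in microseconds
    poll_interval_ms -- time between polls, in milliseconds
    """
    busy_us = num_devices * read_cost_us
    return busy_us / (poll_interval_ms * 1000.0)


if __name__ == "__main__":
    # Illustrative inputs only: 40 devices, 2 us per read, one poll per second.
    print(f"{polling_overhead(40, 2.0, 1000):.4%}")
```

The cost scales linearly with the device count and inversely with the interval, so doubling the polling interval halves the overhead.<br />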
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=76Main Page2007-04-11T09:22:21Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below), or fetch the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in CVS<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=75Main Page2007-03-31T13:05:38Z<p>TimSmall: Add CVS web link.</p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below), or fetch the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the CVS at:<br />
<br />
[http://bluesmoke.cvs.sourceforge.net/bluesmoke/bluesmoke/edac/patches/?pathrev=edac]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=31542754&forum_id=43090 Beta driver (see mailing list)][http://sourceforge.net/mailarchive/forum.php?thread_id=31727532&forum_id=43090 GX update]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as exposed by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of all PCI devices can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable polling altogether, on such systems.<br />
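<br />
As a rough userspace illustration of what polling a device's error status involves, the sketch below decodes the error bits of the standard PCI Status register (offset 0x06 of configuration space). It is not the EDAC driver's code: the function names are made up for this example, and it assumes the per-device <code>config</code> files that Linux exposes under <code>/sys/bus/pci/devices/</code>.<br />

```python
import glob
import struct

# Standard PCI Status register (16 bits at config-space offset 0x06).
PCI_STATUS_OFFSET = 0x06
PCI_STATUS_MASTER_PARITY = 0x0100    # bit 8: Master Data Parity Error
PCI_STATUS_SIG_SYSTEM_ERROR = 0x4000 # bit 14: Signaled System Error
PCI_STATUS_DETECTED_PARITY = 0x8000  # bit 15: Detected Parity Error


def decode_pci_status(config):
    """Return a list of error flags set in the Status register.

    `config` is the start of a device's configuration space as bytes.
    (Hypothetical helper for illustration only.)
    """
    if len(config) < PCI_STATUS_OFFSET + 2:
        raise ValueError("need at least 8 bytes of configuration space")
    (status,) = struct.unpack_from("<H", config, PCI_STATUS_OFFSET)
    flags = []
    if status & PCI_STATUS_DETECTED_PARITY:
        flags.append("detected parity error")
    if status & PCI_STATUS_SIG_SYSTEM_ERROR:
        flags.append("signaled system error")
    if status & PCI_STATUS_MASTER_PARITY:
        flags.append("master data parity error")
    return flags


def scan(sysfs_root="/sys/bus/pci/devices"):
    """Poll every device once; the cost grows with the number of devices."""
    results = {}
    for path in sorted(glob.glob(f"{sysfs_root}/*/config")):
        try:
            with open(path, "rb") as f:
                flags = decode_pci_status(f.read(64))
        except (OSError, ValueError):
            continue  # unreadable or truncated config space: skip the device
        if flags:
            results[path.split("/")[-2]] = flags
    return results
```

Each pass reads one register per device, which is why a machine with many devices may prefer a slower polling interval.<br />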
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=74Main Page2007-03-31T13:03:48Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel source tree under Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the Sourceforge download page (see below), or get the code by anonymous CVS checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ cvs -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke login<br />
$ cvs -z3 -d:pserver:anonymous@bluesmoke.cvs.sf.net:/cvsroot/bluesmoke co -r edac bluesmoke<br />
$ less bluesmoke/edac/patches/README<br />
</nowiki></pre><br />
<br />
You will need a recent Linux kernel tree to apply the patches to.<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=31542754&forum_id=43090 Beta driver (see mailing list)][http://sourceforge.net/mailarchive/forum.php?thread_id=31727532&forum_id=43090 GX update]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as exposed by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of all PCI devices can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable polling altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
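<br />
As a starting point for the kind of cron-job script mentioned in the list above, here is a minimal sketch that reads per-controller error counters and reports any that are non-zero. It assumes the sysfs layout found in later kernels (<code>/sys/devices/system/edac/mc/mcN/ce_count</code> and <code>ue_count</code>); since the sysfs API was still in flux when this page was written, treat the paths and the function names as assumptions to check against your kernel.<br />

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller entries in sysfs.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller name: (correctable, uncorrectable)} error counts."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # counter missing or unreadable: skip this controller
        counts[mc.name] = (ce, ue)
    return counts


def report(counts):
    """Return one line per controller that has logged any errors."""
    lines = []
    for name, (ce, ue) in sorted(counts.items()):
        if ce or ue:
            lines.append(f"{name}: {ce} correctable, {ue} uncorrectable")
    return lines
```

A cron job could print each line of <code>report(read_edac_counts())</code>, so that cron mails the administrator only when something has been logged.<br />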
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=73Main Page2007-02-27T14:39:17Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel source tree under Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=31542754&forum_id=43090 Beta driver (see mailing list)][http://sourceforge.net/mailarchive/forum.php?thread_id=31727532&forum_id=43090 GX update]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as exposed by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
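<br />
One simple way to record such a relationship is a per-board lookup table from the driver's view of a slot to the label printed on the board. The sketch below is purely illustrative: the <code>(csrow, channel)</code> keys and the label names are invented for a hypothetical board and are not taken from any real [[MemorySlotLabels]] entry.<br />

```python
# Hypothetical mapping for one imaginary motherboard:
# (csrow, channel) as reported by the driver -> silkscreen label on the board.
SLOT_LABELS = {
    (0, 0): "DIMM1A",
    (0, 1): "DIMM1B",
    (2, 0): "DIMM2A",
    (2, 1): "DIMM2B",
}


def slot_label(csrow, channel, labels=SLOT_LABELS):
    """Translate a driver-reported location into a physical slot label."""
    return labels.get(
        (csrow, channel),
        f"unknown (csrow {csrow}, channel {channel})",
    )
```

With a table like this, an error reported against csrow 2, channel 1 can be traced to the module in the socket marked "DIMM2B" rather than guessed at.<br />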
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of all PCI devices can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable polling altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=72Main Page2007-02-27T14:29:53Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.), if it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel source tree under Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&amp;forum_id=43090 Beta driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
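<br />
As a rough illustration of what such a mapping records, the sketch below translates the chip-select row and channel reported by a driver into the label printed on the board. The table entries and the <code>slot_label</code> helper are invented for this example; real values have to be worked out for each motherboard.<br />

```python
# Hypothetical mapping from the (csrow, channel) pair reported by an EDAC
# driver to the label printed next to the socket. These entries are made up
# for illustration and do not describe any real motherboard.
SLOT_LABELS = {
    (0, 0): "DIMM1A",
    (0, 1): "DIMM1B",
    (2, 0): "DIMM2A",
    (2, 1): "DIMM2B",
}


def slot_label(csrow, channel, labels=SLOT_LABELS):
    """Return the board label for a csrow/channel pair, or a fallback string."""
    fallback = "unknown (csrow %d, channel %d)" % (csrow, channel)
    return labels.get((csrow, channel), fallback)
```

A lookup that misses the table returns a fallback string rather than raising, so an incomplete mapping still yields a usable error message.<br />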
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
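<br />
A minimal sketch of how these settings might be changed from user space follows. It assumes the sysfs control files <code>mc/poll_msec</code> and <code>pci/check_pci_parity</code> under <code>/sys/devices/system/edac</code>, which is the layout as we understand it from the 2.6.16-era in-kernel documentation. The names have changed in later kernels, so please check your own tree before relying on them. The helper function names are ours, and writing to the real files requires root.<br />

```python
import os

# Assumed sysfs location (2.6.16-era layout); verify against your kernel.
EDAC_ROOT = "/sys/devices/system/edac"


def write_control(name, value, root=EDAC_ROOT):
    """Write an integer to an EDAC control file and return the path used."""
    path = os.path.join(root, name)
    with open(path, "w") as f:
        f.write("%d\n" % value)
    return path


def slow_polling(msec, root=EDAC_ROOT):
    """Set the interval between polls, in milliseconds (assumed file name)."""
    return write_control("mc/poll_msec", msec, root)


def disable_pci_parity_check(root=EDAC_ROOT):
    """Turn off PCI parity polling altogether (assumed file name)."""
    return write_control("pci/check_pci_parity", 0, root)
```

The <code>root</code> parameter exists so that the functions can be tried against a scratch directory before being pointed at the real sysfs tree.<br />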
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
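<br />
As a starting point for the user-space item above, here is a small sketch of a script that could be run from cron. It assumes that each memory controller appears as an <code>mc0</code>, <code>mc1</code>, ... directory under <code>/sys/devices/system/edac/mc</code> with <code>ce_count</code> and <code>ue_count</code> files. Please treat those paths as an assumption to check against your kernel, since the sysfs API is still changing. The function names are ours.<br />

```python
import glob
import os

# Assumed sysfs location of the per-controller counters; check your kernel.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_counts(root=EDAC_MC_ROOT):
    """Return {controller: (ce_count, ue_count)} for each mc* directory."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            with open(os.path.join(mc_dir, name)) as f:
                values.append(int(f.read().strip()))
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts


def report(counts):
    """Return one line per controller that has a non-zero error count."""
    return ["%s: %d correctable, %d uncorrectable errors" % (mc, ce, ue)
            for mc, (ce, ue) in sorted(counts.items()) if ce or ue]


if __name__ == "__main__":
    # cron mails any output to the administrator; stay silent when clean.
    for line in report(read_counts()):
        print(line)
```

Printing nothing when every counter is zero means that cron only sends mail when there is something to look at.<br />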
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=71Main Page2007-02-27T11:52:47Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant, e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&amp;forum_id=43090 Beta driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=70Main Page2007-02-27T11:24:31Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant, e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&amp;forum_id=43090 Beta driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/]<br />
| [[EDAC]] <br />
| Author Needed ;o)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=69Main Page2007-02-27T11:23:16Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter over the PCI bus), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
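<br />
As a rough illustration of how the correctable and uncorrectable error counts can be inspected from user space, here is a minimal Python sketch. It assumes the memory-controller counters are exposed as <code>ce_count</code> and <code>ue_count</code> files under <code>/sys/devices/system/edac/mc/mcN/</code>; the sysfs layout is still changing, so treat the paths as assumptions and check them on your own kernel. The function name <code>read_edac_counts</code> is made up for this example.<br />
<br />
```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (ce_count, ue_count)} for each mcN directory.

    The default root is an assumption about the sysfs layout; pass a
    different path if your kernel exposes the counters elsewhere.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        # EDAC module not loaded, or a different sysfs layout.
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counter files are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} correctable, {ue} uncorrectable")
```
<br />
An empty result means no counters were found at the assumed location, not that the memory is healthy.<br />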
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&forum_id=43090 Beta driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694<br />
| <br />
| [http://buttersideup.com/files/via133docs/]<br />
| [[EDAC]] <br />
| Author Needed ;o)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
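<br />
To make the idea of a slot-label mapping concrete, here is a small Python sketch of a lookup table from the driver's view of memory (chip-select row and channel) to the label printed on the board. The board data below is entirely hypothetical and only illustrates the shape of the information wanted for the [[MemorySlotLabels]] page; the names <code>SLOT_LABELS</code> and <code>label_for</code> are invented for this example.<br />
<br />
```python
# Hypothetical mapping for an imaginary motherboard:
# (csrow, channel) as reported by the driver -> label printed by the socket.
SLOT_LABELS = {
    (0, 0): "DIMM1A",
    (0, 1): "DIMM1B",
    (2, 0): "DIMM2A",
    (2, 1): "DIMM2B",
}


def label_for(csrow, channel, labels=SLOT_LABELS):
    """Return the board label for a csrow/channel pair, or a fallback string."""
    return labels.get(
        (csrow, channel),
        f"csrow{csrow}/channel{channel} (no label known)",
    )


if __name__ == "__main__":
    print(label_for(0, 1))
    print(label_for(4, 0))
```
<br />
The fallback string keeps an error report usable even when nobody has yet worked out the mapping for a given board.<br />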
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
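<br />
As a sketch of how PCI parity polling might be switched off or on from user space, the Python helper below writes 0 or 1 to a sysfs control file. The path <code>/sys/devices/system/edac/pci/check_pci_parity</code> is an assumption based on the sysfs interface of this period; the file name and location may differ on your kernel, so check the in-kernel documentation first. Writing to it requires root. The function names are invented for this example.<br />
<br />
```python
from pathlib import Path

# Assumed location of the control file; it may differ between kernel versions.
PCI_CHECK_FILE = "/sys/devices/system/edac/pci/check_pci_parity"


def set_pci_parity_check(enabled, control=PCI_CHECK_FILE):
    """Write 1 (poll PCI devices for parity errors) or 0 (skip them)."""
    Path(control).write_text("1\n" if enabled else "0\n")


def pci_parity_check_enabled(control=PCI_CHECK_FILE):
    """Return True unless the control file currently contains 0."""
    return Path(control).read_text().strip() != "0"
```
<br />
For example, <code>set_pci_parity_check(False)</code> would disable the polling on a machine with many PCI devices, provided the assumed control file exists.<br />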
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
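<br />
As a starting point for the kind of user-space script suggested above (for example, something run from a cron job), here is a minimal Python sketch that compares two snapshots of per-controller error counts and reports only what has increased. How the snapshots are collected and stored is left open; the function name <code>new_errors</code> and the snapshot format (controller name mapped to a pair of correctable and uncorrectable counts) are assumptions made for this example.<br />
<br />
```python
def _delta(current, previous):
    """Increase since the last snapshot.

    If the counter went backwards (e.g. after a reboot or module reload),
    treat the current value as the number of new errors.
    """
    return current - previous if current >= previous else current


def new_errors(previous, current):
    """Return {name: (new_ce, new_ue)} for controllers whose counts rose.

    Both arguments map a controller name to (ce_count, ue_count).
    """
    deltas = {}
    for name, (ce, ue) in current.items():
        old_ce, old_ue = previous.get(name, (0, 0))
        d_ce = _delta(ce, old_ce)
        d_ue = _delta(ue, old_ue)
        if d_ce or d_ue:
            deltas[name] = (d_ce, d_ue)
    return deltas


if __name__ == "__main__":
    before = {"mc0": (1, 0)}
    after = {"mc0": (4, 0), "mc1": (0, 0)}
    for name, (ce, ue) in new_errors(before, after).items():
        print(f"{name}: {ce} new correctable, {ue} new uncorrectable")
```
<br />
A cron job could save the latest snapshot to a file and send mail only when this function returns a non-empty result.<br />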
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=63Main Page2006-08-16T19:47:07Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project.<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&forum_id=43090 Alpha driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694<br />
| <br />
| Please email if you have it!<br />
| [[EDAC]] <br />
| Datasheet Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=R82600_edac.c&diff=30R82600 edac.c2006-08-16T19:36:09Z<p>TimSmall: </p>
<hr />
<div>= Radisys 82600 Memory Controller EDAC Driver =<br />
<br />
The r82600 memory controller EDAC driver supports the following hardware:<br />
<br />
* Radisys 82600 (all known revisions)<br />
<br />
== Memory Controller Capabilities ==<br />
<br />
== Products Which Use the Radisys 82600 Memory Controller ==<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Product Name <br />
| Product Model<br />
| Version<br />
| Tested by <br />
| <br />
|-<br />
| Radisys <br />
| EPC-6315 <br />
| EPC-6315 <br />
| <br />
| tim@buttersideup.com<br />
| <br />
|}<br />
<br />
== Known Issues and Workarounds ==<br />
<br />
The module has only been tested on boards which have a single (soldered down) bank of 256M of ECC PC133 SDRAM.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=HardwareWanted&diff=15HardwareWanted2006-08-16T19:33:36Z<p>TimSmall: </p>
<hr />
<div>= Hardware Wanted =<br />
<br />
The nature of the EDAC project is such that if developers don't have access to the necessary hardware, development is next to impossible, and maintenance is hard. Broken hardware is often very useful (particularly DIMMs).<br />
<br />
= How to Donate =<br />
<br />
== General Hardware ==<br />
<br />
If you have hardware which is not listed here, but which you think could be useful to the project, please [http://sourceforge.net/mail/?group_id=93775 contact the developer mailing list].<br />
<br />
== Specific Hardware Requests ==<br />
<br />
If you have any pieces of hardware around which you are willing to donate, then please contact the developers below, and CC the developer mailing list.<br />
<br />
PC100 ECC DIMMs (any size) - [mailto:tim@buttersideup.com]<br />
<br />
PC133 ECC DIMMs (any size) - [mailto:tim@buttersideup.com]<br />
<br />
Intel 440GX based motherboard - [mailto:tim@buttersideup.com]<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=62Main Page2006-08-16T18:50:35Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project.<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus whilst travelling to/from your NIC/storage adapter), and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you then have no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
There is some documentation in the kernel in Documentation/drivers/edac/ .<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&forum_id=43090 Alpha driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
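<br />
As a rough illustration of turning PCI parity checking off or on from user space, here is a minimal Python sketch. It assumes that the kernel exposes a <code>check_pci_parity</code> control file under <code>/sys/devices/system/edac/pci</code>; both the directory and the file name are assumptions which may not match your kernel version, since the sysfs interface has changed over time. The helper names are hypothetical.<br />

```python
from pathlib import Path

# Assumed location of the EDAC PCI controls; verify this on your own kernel.
EDAC_PCI_DIR = Path("/sys/devices/system/edac/pci")


def read_attr(directory, name):
    """Return the stripped contents of an attribute file, or None if it is absent."""
    try:
        return (Path(directory) / name).read_text().strip()
    except FileNotFoundError:
        return None


def write_attr(directory, name, value):
    """Write a value to an attribute file (needs root on a real system)."""
    (Path(directory) / name).write_text(f"{value}\n")


def set_pci_parity_checking(enabled, directory=EDAC_PCI_DIR):
    """Enable (1) or disable (0) PCI parity polling via the assumed control file."""
    write_attr(directory, "check_pci_parity", 1 if enabled else 0)
```

Passing the directory as a parameter keeps the path assumption in one place, and lets the helpers be tried against an ordinary directory before touching a live system.<br />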
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
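<br />
As a starting point for the user-space scripts mentioned above, the following Python sketch could be run from a cron job to report memory controllers which have logged errors. It assumes that each controller appears as a directory <code>mcN</code> under <code>/sys/devices/system/edac/mc</code> containing <code>ce_count</code> and <code>ue_count</code> files; check this against your kernel version, as the sysfs layout is not guaranteed to be stable. The function names are hypothetical.<br />

```python
from pathlib import Path

# Assumed location of the per-controller EDAC counters; verify on your kernel.
EDAC_MC_DIR = Path("/sys/devices/system/edac/mc")


def read_error_counts(mc_root=EDAC_MC_DIR):
    """Map each controller name to (correctable, uncorrectable) error counts."""
    counts = {}
    for mc in sorted(Path(mc_root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers with missing or unreadable counters
        counts[mc.name] = (ce, ue)
    return counts


def format_report(counts):
    """Return one line per controller that has logged at least one error."""
    return [
        f"{name}: {ce} correctable, {ue} uncorrectable"
        for name, (ce, ue) in sorted(counts.items())
        if ce or ue
    ]


if __name__ == "__main__":
    # cron mails any output, so printing only on errors gives a simple alert.
    for line in format_report(read_error_counts()):
        print(line)
```

The script stays silent when all counters are zero, so a cron job only produces mail when something has actually been logged.<br />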
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=60Main Page2006-04-21T12:55:04Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter on the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| [http://sourceforge.net/mailarchive/forum.php?thread_id=9944021&forum_id=43090 Alpha driver (see mailing list)]<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=59Main Page2006-03-24T14:35:37Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter on the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate dimm labels, whitelists from the WIKI contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=58Main Page2006-03-24T14:31:53Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter on the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it (the memory may later fail completely, and you won't know anything until your system receives an NMI and/or crashes). With EDAC, you get to know about bad memory modules before the errors become uncorrectable and you have potentially corrupted data. This includes memory modules which are bad as-shipped, before such systems are put into service.<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
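
As a rough illustration of what such a relationship looks like once it has been worked out, here is a minimal Python sketch. It assumes the driver identifies a slot by a chip-select row and a channel number, which may not match every driver, and the board and its labels (<code>DIMM1A</code> etc.) are entirely hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping for one imaginary board:
# (csrow, channel) as seen by the driver -> label printed next to the socket.
SLOT_LABELS: Dict[Tuple[int, int], str] = {
    (0, 0): "DIMM1A",
    (0, 1): "DIMM1B",
    (1, 0): "DIMM2A",
    (1, 1): "DIMM2B",
}


def slot_label(csrow: int, channel: int,
               labels: Dict[Tuple[int, int], str] = SLOT_LABELS) -> str:
    """Return the physical label for a csrow/channel pair, or a fallback string."""
    return labels.get((csrow, channel),
                      f"unknown (csrow {csrow}, channel {channel})")
```

A table like this is the kind of information that belongs on the [[MemorySlotLabels]] page. The fallback string keeps an unmapped slot visible rather than silently guessing a label.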
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
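
The following Python sketch shows one way the PCI parity check might be inspected or switched off from user space. The sysfs directory and the control file name <code>check_pci_parity</code> are assumptions here, since the userspace API is still changing; check the documentation shipped with your kernel before relying on them. Both are passed as parameters so that they can be adjusted.

```python
from pathlib import Path

# Assumed location of the PCI parity controls; verify against your kernel.
EDAC_PCI_DIR = Path("/sys/devices/system/edac/pci")


def pci_parity_checking_enabled(pci_dir: Path = EDAC_PCI_DIR,
                                control: str = "check_pci_parity") -> bool:
    """Return True unless the control file contains "0"."""
    return (pci_dir / control).read_text().strip() != "0"


def set_pci_parity_checking(enabled: bool,
                            pci_dir: Path = EDAC_PCI_DIR,
                            control: str = "check_pci_parity") -> None:
    """Write "1" or "0" to the control file (needs root on a real system)."""
    (pci_dir / control).write_text("1\n" if enabled else "0\n")
```

On a machine with many devices, calling <code>set_pci_parity_checking(False)</code> as root would be the "disable it altogether" option described above, provided the control file exists under that name.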
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=57Main Page2006-03-24T14:27:21Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. corrupted network or storage I/O) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it (the memory may later fail completely, and you won't know anything until your system receives an NMI and/or crashes).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
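
With the EDAC modules loaded, error counts become visible to user space. The Python sketch below shows how a script might collect them. It assumes that each memory controller appears as a directory <code>mc0</code>, <code>mc1</code>, ... under <code>/sys/devices/system/edac/mc</code>, containing <code>ce_count</code> (correctable) and <code>ue_count</code> (uncorrectable) files. Treat these names as assumptions and check them against your kernel, as the sysfs interface is not yet stable.

```python
from pathlib import Path
from typing import Dict

# Assumed sysfs location of the memory controller entries.
EDAC_MC_DIR = Path("/sys/devices/system/edac/mc")


def read_error_counts(mc_dir: Path = EDAC_MC_DIR) -> Dict[str, Dict[str, int]]:
    """Collect ce_count/ue_count for every mcN directory found under mc_dir."""
    counts: Dict[str, Dict[str, int]] = {}
    for controller in sorted(mc_dir.glob("mc[0-9]*")):
        entry: Dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            counter = controller / name
            if counter.is_file():  # skip counters this kernel does not expose
                entry[name] = int(counter.read_text().strip())
        counts[controller.name] = entry
    return counts
```

An empty result means that no controller directories were found, for example because no EDAC driver is loaded. A cron job could call <code>read_error_counts()</code> and send a warning whenever a count rises between runs.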
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find out things which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=56Main Page2006-03-24T14:23:40Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature part of the project) - many computers support RAM EDAC, (especially for chipsets which are aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. corrupted network or storage I/O) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it (the memory may later fail completely, and you won't know anything until your system receives an NMI and/or crashes).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find out things which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm up until 2.6.17. Please contribute to this effort, and help develop the necessary userspace tools (see below).<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| EDAC <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (such as some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time-consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingPciErrors&diff=26WhyAmIgettingPciErrors2006-03-23T12:04:26Z<p>TimSmall: </p>
<hr />
<div>You probably have been for ages, it's just that nothing was checking them until you enabled the EDAC module. Note that your system may be experiencing data corruption, so you should really check this out.<br />
<br />
* Your PCI device is broken by design, and reports parity errors when none occur (if you become fairly sure that this is the case, then please add it to the list of [[PCIDevicesWithBrokenParityDetection]])<br />
* Your PCI device is faulty (Please add it below)<br />
* Your Motherboard is faulty (try moving device to another PCI slot)<br />
* Bad connection caused by dirty connectors - see [[HowToCleanEdgeConnectors]]<br />
* Your power supply is faulty/underspecified<br />
* The electrical supply is faulty (e.g. transient spikes / droops)<br />
* Other electrical noise (either from inside, or outside the system) is causing the problems<br />
<br />
Please expand on this and add links!<br />
<br />
== Devices which are known to cause Genuine PCI Parity Errors ==<br />
<br />
* [[SuperMicro]] H8DAR-E motherboards - see mailing list archives (FIXME - which revisions?)<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingPciErrors&diff=25WhyAmIgettingPciErrors2006-03-23T11:12:27Z<p>TimSmall: </p>
<hr />
<div>You probably have been for ages, it's just that nothing was checking them until you enabled the EDAC module. Note that your system may be experiencing data corruption, so you should really check this out.<br />
<br />
* Your PCI device is broken by design, and reports parity errors when none occur (if you become fairly sure that this is the case, then please add it to the list of PCIDevicesWithBrokenParityDetection)<br />
* Your PCI device is faulty (Please add it below)<br />
* Your Motherboard is faulty (try moving device to another PCI slot)<br />
* Bad connection caused by dirty connectors - see [[HowToCleanEdgeConnectors]]<br />
* Your power supply is faulty/underspecified<br />
* The electrical supply is faulty (e.g. transient spikes / droops)<br />
* Other electrical noise (either from inside, or outside the system) is causing the problems<br />
<br />
Please expand on this and add links!<br />
<br />
== Devices which are known to cause Genuine PCI Parity Errors ==<br />
<br />
* [[SuperMicro]] H8DAR-E motherboards - see mailing list archives (FIXME - which revisions?)<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingPciErrors&diff=24WhyAmIgettingPciErrors2006-03-23T11:11:58Z<p>TimSmall: </p>
<hr />
<div>You probably have been for ages, it's just that nothing was checking them until you enabled the EDAC module. Note that your system may be experiencing data corruption, so you should really check this out.<br />
<br />
* Your PCI device is broken by design, and reports parity errors when none occur (if you become fairly sure that this is the case, then please add it to the list of PCIDevicesWithBrokenParityDetection)<br />
* Your PCI device is faulty (Please add it below)<br />
* Your Motherboard is faulty (try moving device to another PCI slot)<br />
* Bad connection caused by dirty connectors - see [[HowToCleanEdgeConnectors]]<br />
* Your power supply is faulty/underspecified<br />
* The electrical supply is faulty (e.g. transient spikes / droops)<br />
* Other electrical noise (either from inside, or outside the system) is causing the problems<br />
<br />
Please expand on this and add links!<br />
<br />
== Devices which are known to cause <b>Genuine</b> PCI Parity Errors ==<br />
<br />
* [[SuperMicro]] H8DAR-E motherboards - see mailing list archives (FIXME - which revisions?)<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingPciErrors&diff=23WhyAmIgettingPciErrors2006-03-23T11:02:20Z<p>TimSmall: </p>
<hr />
<div>You probably have been for ages, it's just that nothing was checking them until you enabled the EDAC module. Note that your system may be experiencing data corruption, so you should really check this out.<br />
<br />
* Your PCI device is broken by design, and reports parity errors when none occur (if you become fairly sure that this is the case, then please add it to the list of PCIDevicesWithBrokenParityDetection)<br />
* Your PCI device is faulty (Please add it below)<br />
* Your Motherboard is faulty (try moving device to another PCI slot)<br />
* Bad connection caused by dirty connectors - see [[HowToCleanEdgeConnectors]]<br />
* Your power supply is faulty/underspecified<br />
* The electrical supply is faulty (e.g. transient spikes / droops)<br />
* Other electrical noise (either from inside, or outside the system) is causing the problems<br />
<br />
Please expand on this and add links!<br />
<br />
== Devices which are known to cause <b>Genuine</b> PCI Parity Errors ==<br />
<br />
* [[SuperMicro]] H8DAR-E motherboards (FIXME - which revisions?)<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=55Main Page2006-03-23T10:57:39Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Help! ==<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find out things which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is expected to be in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm-up until 2.6.17.<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| EDAC <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
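<br />
As a rough illustration of what such a mapping might look like, here is a minimal sketch. It assumes the driver identifies a module by a (chip-select row, channel) pair; the label strings and the <code>SLOT_LABELS</code> and <code>slot_label</code> names are made up for this example and do not describe any particular motherboard.<br />
<br />
```python
# Hypothetical mapping from the (csrow, channel) pair reported by a driver
# to the label printed next to the socket. All values are illustrative.
SLOT_LABELS = {
    (0, 0): "DIMM1A",
    (0, 1): "DIMM1B",
    (1, 0): "DIMM2A",
    (1, 1): "DIMM2B",
}


def slot_label(csrow: int, channel: int) -> str:
    """Return the physical label for a slot, or a fallback if unknown."""
    fallback = f"csrow{csrow}/ch{channel} (unlabelled)"
    return SLOT_LABELS.get((csrow, channel), fallback)
```
<br />
With a table like this, an error reported against csrow 1, channel 0 can be translated into the socket to inspect; entries which have not been worked out yet fall back to the raw identifiers.<br />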
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
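<br />
For readers unfamiliar with the check itself: conventional PCI drives a PAR signal so that the number of 1 bits across AD[31:0], C/BE[3:0]# and PAR is even. The sketch below only illustrates that arithmetic; the function names are invented for this example, and real devices do this in hardware.<br />
<br />
```python
def pci_par_bit(ad: int, cbe: int) -> int:
    """Return the PAR value that makes AD[31:0], C/BE[3:0]# and PAR even."""
    ones = bin(ad & 0xFFFFFFFF).count("1") + bin(cbe & 0xF).count("1")
    return ones & 1  # 1 if the count is odd, so that adding PAR makes it even


def parity_ok(ad: int, cbe: int, par: int) -> bool:
    """Check a received transfer against the PAR bit that accompanied it."""
    return pci_par_bit(ad, cbe) == (par & 1)
```
<br />
A single flipped bit anywhere in the transfer changes the count from even to odd, which is what the receiving device detects. Two flipped bits cancel out, so parity is a detection aid rather than a guarantee.<br />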
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
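<br />
To give a feel for what one polling pass involves, here is a user-space sketch. It is not the EDAC driver's code. It assumes a Linux system which exposes each device's configuration space as a <code>config</code> file under <code>/sys/bus/pci/devices</code>, and it decodes the standard Status register at configuration offset 0x06; the function and dictionary names are invented for this example.<br />
<br />
```python
import struct
from pathlib import Path

PCI_STATUS_OFFSET = 0x06  # 16-bit Status register in the config header

# Error-related bits of the Status register.
STATUS_BITS = {
    "detected_parity_error": 1 << 15,
    "signaled_system_error": 1 << 14,
    "master_data_parity_error": 1 << 8,
}


def decode_status(config: bytes) -> dict:
    """Decode the error bits from raw configuration-space bytes."""
    (status,) = struct.unpack_from("<H", config, PCI_STATUS_OFFSET)
    return {name: bool(status & mask) for name, mask in STATUS_BITS.items()}


def poll_once(root: str = "/sys/bus/pci/devices") -> dict:
    """Read every device's Status register once (assumed sysfs layout)."""
    results = {}
    for dev in sorted(Path(root).glob("*")):
        try:
            config = (dev / "config").read_bytes()[:64]
        except OSError:
            continue  # device vanished or is unreadable; skip it
        if len(config) >= 8:
            results[dev.name] = decode_status(config)
    return results
```
<br />
Each pass costs one configuration read per device, which is why the overhead grows with the number of devices. This sketch only reads the bits; it does not clear them, so a set bit may reflect an error from long ago.<br />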
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
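<br />
As a starting point for the user-space item above, here is a minimal sketch of a script which could be run from cron. It assumes the memory controller counters appear as <code>ce_count</code> and <code>ue_count</code> files under <code>/sys/devices/system/edac/mc/mcN/</code>; since the sysfs interface is still settling, treat those paths as an assumption and check them against your kernel. The function names are invented for this example.<br />
<br />
```python
from pathlib import Path


def read_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Collect corrected/uncorrected error counts per memory controller."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # attribute missing or unreadable
        counts[mc.name] = entry
    return counts


def summarise(counts: dict) -> list:
    """Format one human-readable line per memory controller."""
    return [
        f"{mc}: {c['ce_count']} correctable, {c['ue_count']} uncorrectable"
        for mc, c in sorted(counts.items())
    ]
```
<br />
A cron job could print <code>summarise(read_counts())</code> and mail the result, or compare the counts with the previous run and only report when they have increased.<br />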
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=SoftErrors&diff=18SoftErrors2006-03-23T10:56:14Z<p>TimSmall: </p>
<hr />
<div>Soft Errors ("Single Event Upsets") are caused by radiation.<br />
<br />
A Single Event Upset basically means that a high energy particle (e.g. from a cosmic "particle shower") arrives in a semiconductor device (e.g. a DRAM part), and redistributes or dumps enough charge there to change the logic state of information stored in some part of the device (i.e. it flips a bit).<br />
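<br />
To illustrate how extra check bits let hardware recover from a single flipped bit, here is a toy Hamming(7,4) sketch. It is only a teaching example: ECC memory typically uses a much wider code (commonly 8 check bits per 64 data bits), and the function names here are invented.<br />
<br />
```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so that indices match bit positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    return code[1:]


def correct(codeword: list) -> tuple:
    """Return (data nibble, position of the corrected bit or 0 if none)."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1  # a single flipped bit sits at this position
    data = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data)), syndrome
```
<br />
Flipping any one of the seven bits of an encoded value and passing the result to <code>correct</code> yields the original data together with the position of the bit which was flipped. Two simultaneous flips are beyond what this small code can repair.<br />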
<br />
This radiation can come from within, or close to, the device itself (e.g. from lead used in device manufacture or soldering), or from distant sources, such as cosmic radiation.<br />
<br />
As of the time of writing, the vast majority of soft errors are believed to originate from cosmic radiation.<br />
<br />
The use of BPSG (Boro-Phospho-Silicate-Glass) [http://www.semiconfareast.com/dielectric.htm dielectric] layers in semiconductors can make a device many times more vulnerable to cosmic radiation, and most manufacturers have switched to other passivation layers as a result.<br />
<br />
There are a few references in this mailing list post:<br />
<br />
[http://sourceforge.net/mailarchive/message.php?msg_id=12124702]<br />
<br />
Note, however, that recent manufacturer datasheets which I have seen quote error rates approximately (from memory) two orders of magnitude <b>lower</b> than the levels predicted by most references in the above mailing list posting.<br />
<br />
If you have a large number of systems, then please share your experiences here and/or on the mailing list.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=54Main Page2006-03-23T10:08:12Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Help! ==<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find out things which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Status ==<br />
<br />
The EDAC code is expected to be in Linux Kernel version 2.6.16. The userspace API (via sysfs) is still a work in progress, and is not expected to firm-up until 2.6.17.<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| EDAC <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=52Main Page2006-03-14T14:40:43Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Help! ==<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. Please try and check out the possibilities listed before you post to the mailing list!<br />
<br />
If you think you've found a bug, please search [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
== Status ==<br />
<br />
The EDAC code is expected to be in Linux Kernel version 2.6.16<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver, and the physical labels present next to the memory module socket. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in or on-motherboard designs) support the PCI parity error detection and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include the functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=51Main Page2006-03-14T14:39:04Z<p>TimSmall: </p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Help! ==<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. Please try and check out the possibilities listed before you post to the mailing list!<br />
<br />
If you think you've found a bug, please search [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.<br />
<br />
== Status ==<br />
<br />
The EDAC code is expected to be in Linux Kernel version 2.6.16<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model <br />
! EDAC Driver <br />
! Tech Docs<br />
! Controller Capabilities <br />
! Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
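<br />
As a rough illustration of what is being checked: PCI uses even parity, with a single PAR signal covering the 32 address/data lines and the 4 command/byte-enable lines. The sketch below is illustrative Python only, not EDAC driver code, and the function names are invented for this example.<br />

```python
def pci_par_bit(ad: int, cbe: int) -> int:
    """Return the PAR value that gives even parity over AD[31:0], C/BE[3:0]# and PAR."""
    if not 0 <= ad < (1 << 32):
        raise ValueError("ad must fit in 32 bits")
    if not 0 <= cbe < (1 << 4):
        raise ValueError("cbe must fit in 4 bits")
    # Count the 1 bits on the 36 covered lines; PAR is set when that count is odd,
    # so that the total including PAR is always even.
    ones = bin(ad).count("1") + bin(cbe).count("1")
    return ones & 1


def parity_error(ad: int, cbe: int, par: int) -> bool:
    """Return True if the sampled PAR does not match the parity of AD and C/BE."""
    return pci_par_bit(ad, cbe) != par


if __name__ == "__main__":
    ad, cbe = 0xDEADBEEF, 0x6
    par = pci_par_bit(ad, cbe)
    print(parity_error(ad, cbe, par))      # transfer arrived intact
    print(parity_error(ad ^ 0x1, cbe, par))  # one AD bit flipped in transit
```

A single flipped bit on any covered line changes the computed parity, so it is detected; two flipped bits cancel out and go unnoticed, which is why parity is a detection-only mechanism.<br />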
<br />
=== Error Detection Overhead ===<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
=== Faulty Hardware ===<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
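<br />
As a starting point for the user-space item above, here is a minimal Python sketch that reads the per-controller error counters. It assumes the sysfs layout <code>/sys/devices/system/edac/mc/mcN/ce_count</code> and <code>ue_count</code> used by mainline kernels; check the EDAC documentation shipped with your kernel, since the layout may differ. The function name <code>read_edac_counts</code> is invented for this example.<br />

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": int | None, "ue_count": int | None}, ...}.

    The default root is an assumption about the sysfs layout; pass another
    directory if your kernel exposes the counters elsewhere.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        # No EDAC memory controller driver loaded, or a different layout.
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable: record it as unknown.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for mc, entry in read_edac_counts().items():
        print(mc, "corrected:", entry["ce_count"], "uncorrected:", entry["ue_count"])
```

Run from a cron job, a script like this could compare the counts with the previous run and send mail when they increase.<br />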
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmallhttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=50Main Page2006-03-14T14:38:34Z<p>TimSmall: </p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Help! ==<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. Please check the possibilities listed there before you post to the mailing list! If you think you've found a bug, please search [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, then please create a new bug report.<br />
<br />
== Status ==<br />
<br />
The EDAC code is expected to be in Linux Kernel version 2.6.16<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model <br />
! EDAC Driver <br />
! Tech Docs<br />
! Controller Capabilities <br />
! Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| EDAC, Error Scrub, Background Scrub <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| EDAC, Error Scrub<br />
| Alpha driver (see mailing list)<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| EDAC, Error Scrub<br />
| Supported (Linux 2.6.16)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which can be used in either add-in or on-motherboard designs) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built into a motherboard chipset) typically do not include this functionality.<br />
<br />
=== Error Detection Overhead ===<br />
<br />
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether, on such systems.<br />
<br />
=== Faulty Hardware ===<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted ==<br />
<br />
Please feel free to:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Related Articles ==<br />
<br />
Sourceforge web page - [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page <br />
* Create a link to another page with joined capitalized words (like [[WikiSandBox]]) or with <code><nowiki>[[quoted words in brackets]]</nowiki></code><br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]]. <br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>TimSmall