https://buttersideup.com/mediawiki/api.php?action=feedcontributions&user=WikiSysop&feedformat=atom
EdacWiki - User contributions [en-gb]
2024-03-28T21:53:20Z
User contributions
MediaWiki 1.35.13
https://buttersideup.com/mediawiki/index.php?title=MediaWiki:Common.css&diff=647
MediaWiki:Common.css
2018-04-09T19:31:32Z
<p>WikiSysop: Created page with "/* CSS placed here will be applied to all skins */ #ca-nstab-main { display:none!important; } #ca-nstab-user { display:none!important; } #ca-talk { display:none!important; } #..."</p>
<hr />
<div>/* CSS placed here will be applied to all skins */<br />
#ca-nstab-main { display:none!important; }<br />
#ca-nstab-user { display:none!important; }<br />
#ca-talk { display:none!important; }<br />
#ca-view { display:none!important; }<br />
#ca-edit { display:none!important; }<br />
#ca-history { display:none!important; }<br />
#ca-watch { display:none!important; }<br />
#ca-unwatch { display:none!important; }<br />
#ca-delete { display:none!important; }<br />
#ca-move { display:none!important; }<br />
#ca-protect { display:none!important; }<br />
#ca-viewsource { display:none!important; }</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=EdacWiki:About&diff=646
EdacWiki:About
2014-01-07T11:45:48Z
<p>WikiSysop: </p>
<hr />
<div>If you find problems with this site, please email me at tim at buttersideup dot com.</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=EdacWiki:Terms_of_Service&diff=645
EdacWiki:Terms of Service
2014-01-07T11:39:06Z
<p>WikiSysop: </p>
<hr />
<div>Don't spam or otherwise abuse this site. Really, don't. It'll make me grumpy and you wouldn't like me when I'm grumpy - I'll try my best to get any email addresses closed down, report you to your ISP - in fact anything I can to inconvenience you. So, if you were thinking of spamming this site, then please do yourself a favour, and just fuck off instead.<br />
<br />
<br />
Once you've created your request, '''you MUST email me''' at tim at seoss dot co dot uk '''and''' tim at buttersideup dot com, asking me to approve your request and giving the user name which you used. Otherwise I'll assume you're a spammer and ignore it. Sorry for the hassle, but the ratio of real to spam requests on this site is worse than 1 to 1000...</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=EdacWiki:Terms_of_Service&diff=644
EdacWiki:Terms of Service
2014-01-07T11:37:23Z
<p>WikiSysop: add emphasis for skimmers</p>
<hr />
<div>Don't spam or otherwise abuse this site. Really, don't. It'll make me grumpy and you wouldn't like me when I'm grumpy - I'll try my best to get any email addresses closed down, report you to your ISP - in fact anything I can to inconvenience you. Please don't do it.<br />
<br />
<br />
Once you've created your request, '''you MUST email me''' at tim at seoss dot co dot uk '''and''' tim at buttersideup dot com, asking me to approve your request and giving the user name which you used. Otherwise I'll assume you're a spammer and ignore it. Sorry for the hassle, but the ratio of real to spam requests on this site is worse than 1 to 1000...</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=EdacWiki:Terms_of_Service&diff=643
EdacWiki:Terms of Service
2014-01-07T11:29:53Z
<p>WikiSysop: </p>
<hr />
<div>Don't spam or otherwise abuse this site. Really, don't. It'll make me grumpy and you wouldn't like me when I'm grumpy - I'll try my best to get any email addresses closed down, report you to your ISP - in fact anything I can to inconvenience you. Please don't do it.<br />
<br />
<br />
Once you've created your request, you MUST email me at tim at seoss dot co dot uk and ask me to approve your request. Otherwise I'll assume you're a spammer and ignore it. Sorry for the hassle, but the ratio of real to spam requests on this site is worse than 1 to 1000...</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=EdacWiki:Terms_of_Service&diff=642
EdacWiki:Terms of Service
2014-01-07T11:16:44Z
<p>WikiSysop: Created page with "Don't spam or otherwise abuse this site. Really, don't. It'll make me grumpy and you wouldn't like me when I'm grumpy - I'll try my best to get any email addresses closed do..."</p>
<hr />
<div>Don't spam or otherwise abuse this site. Really, don't. It'll make me grumpy and you wouldn't like me when I'm grumpy - I'll try my best to get any email addresses closed down, report you to your ISP - in fact anything I can to inconvenience you. Please don't do it.</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=HowToCleanEdgeConnectors&diff=220
HowToCleanEdgeConnectors
2012-09-25T13:28:07Z
<p>WikiSysop: Reverted edits by Janhouston (Talk) to last revision by WikiSysop</p>
<hr />
<div>* Observe anti-static precautions<br />
* Don't set fire to yourself<br />
* Don't blame me if you break stuff<br />
* Wear suitable gloves, avoid inhaling solvents, use appropriate ventilation etc.<br />
* Clean edge-connectors with a cotton bud (Q-tip) dipped in isopropyl alcohol ("IPA"/isopropanol)<br />
* Clean DIMM/PCI slots using a stiff piece of card (the same thickness as the DIMM/PCI PCB), wetted with isopropyl alcohol<br />
<br />
Isopropanol is available from many chemists/pharmacists.<br />
<br />
You can also buy commercial cleaning solutions and devices for connectors and PCBs.<br />
<br />
__NOTOC__</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=185
Main Page
2011-11-16T12:36:50Z
<p>WikiSysop: /* The EDAC Mailing List */</p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project].<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a set of Linux kernel modules that make use of the error detection facilities of computer hardware. Hardware that detects the following kinds of error is currently supported:<br />
<br />
* [http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction System RAM errors] (this is the original and most mature part of the project). Many computers support RAM EDAC, especially those whose chipsets are aimed at high-reliability applications, but RAM with extra storage capacity ("ECC RAM") is needed for these facilities to operate.<br />
* [http://en.wikipedia.org/wiki/Memory_scrubbing RAM scrubbing] - some memory controllers support "scrubbing" DRAM during normal operation. Continuous scrubbing allows ECC errors to be detected and corrected actively, rather than only when the affected memory is next read.<br />
* PCI bus transfer errors - most PCI bridges and peripherals support such error detection.<br />
* Cache ECC errors<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus while travelling to or from your NIC or storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused it).<br />
* If you have ECC memory and are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed, you will find out about bad memory modules before the errors become uncorrectable and leave you with potentially corrupted data (and a crashed machine). This includes finding out about memory modules which are bad as shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
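<br />
With the EDAC modules loaded, the error counts described above can be read from sysfs. The following is a minimal sketch, assuming the csrow-based layout described in the kernel's Documentation/edac.txt (<code>mc/mcN/ce_count</code>, <code>mc/mcN/ue_count</code> and the same counters in each <code>csrowN</code> directory); newer kernels may expose a different layout. The function name <code>edac_mem_report</code> and its optional root-directory argument are hypothetical, added only so the sketch can be tried against a copy of the tree.<br />
<br />
```shell
#!/bin/sh
# Sketch: report corrected (CE) and uncorrected (UE) memory error counts
# per EDAC memory controller and per chip-select row (csrow).
# Assumes the csrow-style sysfs layout of Documentation/edac.txt.

# edac_mem_report [ROOT]   (hypothetical helper)
# ROOT defaults to /sys/devices/system/edac; it is a parameter only so
# the function can be run against a copy of the tree.
edac_mem_report() {
    edac_root=${1:-/sys/devices/system/edac}
    for mc in "$edac_root"/mc/mc[0-9]*; do
        [ -d "$mc" ] || continue            # glob matched nothing
        printf '%s: CE=%s UE=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
        for row in "$mc"/csrow[0-9]*; do
            [ -d "$row" ] || continue
            printf '  %s: CE=%s UE=%s\n' "${row##*/}" \
                "$(cat "$row/ce_count")" "$(cat "$row/ue_count")"
        done
    done
}
```
<br />
Run it as <code>edac_mem_report</code> once an EDAC memory controller driver is loaded; a CE count that keeps rising on one csrow is the kind of early warning about a failing module described above.<br />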
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment, PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see above, PCI error checking is turned off by default and needs to be turned on (using the "echo" command above).<br />
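<br />
The manual steps in the transcript can be wrapped in a script suitable for a cron job. This is a sketch, not part of EDAC: it assumes the <code>check_pci_parity</code> and <code>pci_parity_count</code> files shown in the transcript, while the function name <code>edac_pci_check</code> and the state-file path <code>/var/lib/edac-pci-parity.last</code> are hypothetical choices.<br />
<br />
```shell
#!/bin/sh
# Sketch of a cron-friendly PCI parity check (not part of EDAC itself).
# Uses the check_pci_parity and pci_parity_count files from the transcript.

# edac_pci_check [ROOT] [STATE_FILE]   (hypothetical helper)
#   ROOT       defaults to /sys/devices/system/edac
#   STATE_FILE remembers the count seen on the previous run; the default
#              path below is an arbitrary, hypothetical choice.
# Returns 0 if the count has not grown, 2 if it has, 1 on write failure.
edac_pci_check() {
    pci_dir=${1:-/sys/devices/system/edac}/pci
    state=${2:-/var/lib/edac-pci-parity.last}

    # Checking is off by default, so switch it on (requires root).
    if [ "$(cat "$pci_dir/check_pci_parity")" = 0 ]; then
        echo 1 > "$pci_dir/check_pci_parity" || return 1
    fi

    now=$(cat "$pci_dir/pci_parity_count")
    last=$(cat "$state" 2>/dev/null || true)
    last=${last:-0}                      # first run: no previous count
    echo "$now" > "$state"

    if [ "$now" -gt "$last" ]; then
        echo "WARNING: PCI parity error count rose from $last to $now;" \
             "see dmesg for the affected device"
        return 2
    fi
    return 0
}
```
<br />
When the counter has grown since the previous run it prints a warning and returns 2, so cron will normally mail the output; the affected device can then be identified with <code>dmesg</code> and <code>lspci -s</code>, as in the transcript.<br />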
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> check the possibilities listed there, and elsewhere on this wiki, before you either open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see whether it has already been reported. If it has, you can add yourself to the CC list for that bug so that you are automatically informed of updates; if it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel source, in [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/edac.txt Documentation/edac.txt].<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers follow the EDAC mailing list to a greater or lesser extent, but please remember that few of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running). So check the wiki, the bug database and the mailing list archives (for both the [http://marc.info/?l=linux-edac current] and the [http://sourceforge.net/mail/?group_id=93775 previous] mailing lists) for your problem first.<br />
<br />
If you have exhausted these possibilities, then by all means post to [http://vger.kernel.org/vger-lists.html#linux-edac the mailing list]...<br />
<br />
* Be polite<br />
* Give all the information which might be relevant, e.g. your exact kernel version<br />
* Be patient<br />
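<br />
To help with the information requested above, here is a small sketch that collects the exact kernel version, the loaded EDAC modules and recent EDAC kernel messages. The function name <code>edac_bug_info</code> is hypothetical, and the selection of details is a suggestion rather than a list the developers require.<br />
<br />
```shell
#!/bin/sh
# Sketch: collect details that are often useful in an EDAC bug report.
# edac_bug_info   (hypothetical helper; prints to stdout)
edac_bug_info() {
    echo "Kernel: $(uname -r)"           # the exact kernel version
    echo "Architecture: $(uname -m)"
    echo "Loaded EDAC modules:"
    # /proc/modules is the file lsmod reads; keep only EDAC-related names.
    grep -i edac /proc/modules 2>/dev/null | cut -d ' ' -f 1
    echo "Recent EDAC kernel messages:"
    # dmesg may need root on some systems; its errors are suppressed.
    dmesg 2>/dev/null | grep -i edac | tail -n 20
    return 0
}
```
<br />
Paste the output into your bug report or mailing-list post.<br />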
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. A userspace API (via sysfs) is available in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below) or do an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the SourceForge project page for CVS access details.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model<br />
! EDAC Driver<br />
! Tech Docs<br />
! Controller Capabilities<br />
! Status<br />
! <br />
|-<br />
| AMCC <br />
| 4xx <br />
| [[ppc4xx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 8111 <br />
| [[amd8111_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| Freescale <br />
| MPC83xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Freescale <br />
| MPC85xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Freescale <br />
| P2020 <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| i3100 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.26)<br />
| <br />
|-<br />
| Intel <br />
| i3200 and i3210<br />
| [[i3200_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.??)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|-<br />
| Intel<br />
| 3000/3010<br />
| [[i3000_edac.c]]<br />
| [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Intel<br />
| X38<br />
| [[x38_edac.c]]<br />
| [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.28)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as presented to the EDAC driver and the physical labels next to the memory module sockets. You can help by working out the relationship for your hardware and adding it to the [[MemorySlotLabels]] page.<br />
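<br />
Once the mapping is known, it can be applied through the writable <code>chN_dimm_label</code> files that the csrow-based EDAC sysfs layout provides (see Documentation/edac.txt), so that error reports name the physical slot. A minimal sketch follows; the mapping-file format (one <code>mcN csrowN chN LABEL</code> entry per line), the function name <code>edac_set_labels</code> and any example labels are hypothetical.<br />
<br />
```shell
#!/bin/sh
# Sketch: write DIMM slot labels into EDAC's chN_dimm_label files.
# edac_set_labels MAPPING_FILE [ROOT]   (hypothetical helper)
# MAPPING_FILE uses a hypothetical format, one entry per line:
#     mcN csrowN chN LABEL TEXT
# Blank lines and lines starting with '#' are ignored.
edac_set_labels() {
    mc_root=${2:-/sys/devices/system/edac}/mc
    while read -r mc row ch label; do
        case "$mc" in ''|'#'*) continue ;; esac
        f="$mc_root/$mc/$row/${ch}_dimm_label"
        if [ -w "$f" ]; then
            printf '%s' "$label" > "$f"  # label may contain spaces
        else
            echo "skipping $mc $row $ch: $f is not writable" >&2
        fi
    done < "$1"
}
```
<br />
For example, a line <code>mc0 csrow0 ch0 DIMM A1</code> in a file <code>labels.txt</code>, applied as root with <code>edac_set_labels labels.txt</code>, would label channel 0 of csrow 0 on the first memory controller "DIMM A1".<br />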
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are included in the PCI specification, and most add-in cards (and chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors built into a motherboard chipset) typically lack this functionality.<br />
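<br />
To illustrate what is being checked: the PCI specification uses even parity, with the PAR signal driven so that the total number of 1 bits across AD[31:0], C/BE[3:0]# and PAR is even. The receiving device recomputes this and, on a mismatch, sets the Detected Parity Error bit in its status register, which is what EDAC reports. The sketch below (hypothetical function names <code>popcount</code> and <code>pci_par</code>) computes PAR for a 32-bit transfer; it is illustrative arithmetic only and does not touch hardware.<br />
<br />
```shell
#!/bin/sh
# Illustrative arithmetic only: the PCI PAR bit (even parity).

# popcount N -> number of 1 bits in the non-negative integer N
popcount() {
    n=$1 c=0
    while [ "$n" -gt 0 ]; do
        c=$(( c + (n & 1) ))
        n=$(( n >> 1 ))
    done
    echo "$c"
}

# pci_par AD CBE -> PAR bit that makes the number of 1s across
# AD[31:0], C/BE[3:0]# and PAR even
pci_par() {
    ones=$(( $(popcount $(( $1 & 0xFFFFFFFF ))) + $(popcount $(( $2 & 0xF ))) ))
    echo $(( ones % 2 ))
}
```
<br />
Flipping any single bit of AD or C/BE# changes the computed PAR, so the receiver detects the error; flipping two bits leaves PAR unchanged, which is why parity catches any odd number of bit errors but not every corruption.<br />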
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of all the PCI devices can be time-consuming, especially on machines with many devices, so you may wish to slow the polling rate, or disable it altogether, on such systems.<br />
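<br />
On systems where the polling overhead matters, PCI parity checking can be switched off again through the same <code>check_pci_parity</code> file used to enable it in the transcript earlier on this page. A sketch, with a hypothetical function name <code>edac_pci_polling</code>; it covers only the on/off switch, because how the polling interval itself is exposed may differ between EDAC versions (check Documentation/edac.txt for your kernel).<br />
<br />
```shell
#!/bin/sh
# Sketch: turn EDAC PCI parity polling on or off via check_pci_parity.
# edac_pci_polling on|off [ROOT]   (hypothetical helper; needs root on a
# real system). Prints the resulting value of check_pci_parity.
edac_pci_polling() {
    ctl=${2:-/sys/devices/system/edac}/pci/check_pci_parity
    case "$1" in
        on)  echo 1 > "$ctl" || return 1 ;;
        off) echo 0 > "$ctl" || return 1 ;;
        *)   echo "usage: edac_pci_polling on|off [ROOT]" >&2; return 1 ;;
    esac
    cat "$ctl"
}
```
<br />
For example, <code>edac_pci_polling off</code> on a machine with many PCI devices, and <code>edac_pci_polling on</code> when you want checking back.<br />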
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection and report false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to run from a cron job, extensions to SNMP daemons etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
Mailing list [http://vger.kernel.org/vger-lists.html#linux-edac]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site where anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>
WikiSysop
https://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=184
Main Page
2011-11-16T12:35:38Z
<p>WikiSysop: Reverted edits by Hokky (Talk) to last revision by Amigadave</p>
<hr />
<div>
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project].<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a set of Linux kernel modules that make use of the error detection facilities of computer hardware. Hardware that detects the following kinds of error is currently supported:<br />
<br />
* [http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction System RAM errors] (this is the original and most mature part of the project). Many computers support RAM EDAC, especially those whose chipsets are aimed at high-reliability applications, but RAM with extra storage capacity ("ECC RAM") is needed for these facilities to operate.<br />
* [http://en.wikipedia.org/wiki/Memory_scrubbing RAM scrubbing] - some memory controllers support "scrubbing" DRAM during normal operation. Continuous scrubbing allows ECC errors to be detected and corrected actively, rather than only when the affected memory is next read.<br />
* PCI bus transfer errors - most PCI bridges and peripherals support such error detection.<br />
* Cache ECC errors<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus while travelling to or from your NIC or storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no further information about what caused it).<br />
* If you have ECC memory and are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed, you will find out about bad memory modules before the errors become uncorrectable and leave you with potentially corrupted data (and a crashed machine). This includes finding out about memory modules which are bad as shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment, PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see above, PCI error checking is turned off by default and needs to be turned on (using the "echo" command above).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> check the possibilities listed there, and elsewhere on this wiki, before you either open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see whether it has already been reported. If it has, you can add yourself to the CC list for that bug so that you are automatically informed of updates; if it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel source, in [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/edac.txt Documentation/edac.txt].<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers follow the EDAC mailing list to a greater or lesser extent, but please remember that few of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running). So check the wiki, the bug database and the mailing list archives (for both the [http://marc.info/?l=linux-edac current] and the [http://sourceforge.net/mail/?group_id=93775 previous] mailing lists) for your problem first.<br />
<br />
If you have exhausted these possibilities, then by all means post to [http://vger.kernel.org/vger-lists.html#linux-edac the mailing list]...<br />
<br />
* Be polite<br />
* Give all the information which might be relevant, e.g. your exact kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. A userspace API (via sysfs) is available in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below) or do an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the SourceForge project page for CVS access details.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
! Manufacturer<br />
! Model<br />
! EDAC Driver<br />
! Tech Docs<br />
! Controller Capabilities<br />
! Status<br />
! <br />
|-<br />
| AMCC <br />
| 4xx <br />
| [[ppc4xx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 8111 <br />
| [[amd8111_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| Freescale <br />
| MPC83xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Freescale <br />
| MPC85xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Freescale <br />
| P2020 <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| i3100 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.26)<br />
| <br />
|-<br />
| Intel <br />
| i3200 and i3210<br />
| [[i3200_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.??)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|-<br />
| Intel<br />
| 3000/3010<br />
| [[i3000_edac.c]]<br />
| [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Intel<br />
| X38<br />
| [[x38_edac.c]]<br />
| [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.28)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels next to the memory module sockets. You can help by working out the mapping for your hardware and adding it to the [[MemorySlotLabels]] page.<br />
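Once you have worked out the mapping, a small script can apply it. The following is a minimal sketch, not part of the EDAC tools. It assumes the csrow-based sysfs layout described in the kernel's EDAC documentation, in which each memory controller directory (e.g. /sys/devices/system/edac/mc/mc0) contains writable csrowN/chC_dimm_label files. Newer kernels may expose a different layout. The mapping file format and the apply_dimm_labels name are hypothetical.<br />
<br />
```shell
#!/bin/bash
# Sketch: write DIMM labels into the EDAC sysfs tree.
# ASSUMPTION: the legacy per-csrow interface exposes writable
# csrowN/chC_dimm_label files under /sys/devices/system/edac/mc/mcM/.
# HYPOTHETICAL mapping file format: one "csrow channel label" per line;
# blank lines and lines starting with # are ignored.

# apply_dimm_labels MC_DIR MAP_FILE
#   MC_DIR   e.g. /sys/devices/system/edac/mc/mc0 (or a scratch dir for testing)
#   MAP_FILE e.g. a line "0 0 DIMM A1" labels csrow0, channel 0 as "DIMM A1"
apply_dimm_labels() {
    local mc_dir=$1 map_file=$2
    local csrow chan label target
    # "label" receives the rest of the line, so labels may contain spaces
    while read -r csrow chan label; do
        case $csrow in ''|'#'*) continue ;; esac
        target="$mc_dir/csrow$csrow/ch${chan}_dimm_label"
        if [ ! -e "$target" ]; then
            echo "no such label file: $target" >&2
            return 1
        fi
        printf '%s\n' "$label" > "$target"
    done < "$map_file"
}
```
<br />
For example, as root: apply_dimm_labels /sys/devices/system/edac/mc/mc0 /etc/edac/mc0-labels (the mapping file path is only an example).<br />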
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and most add-in cards (and chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected to a PCI bus (e.g. some ATA host adapters built into a motherboard chipset) typically do not implement this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines with many devices, so you may wish to reduce the polling rate, or disable polling altogether, on such systems.<br />
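One way to act on this is sketched below. It counts the devices under /sys/bus/pci/devices. If the count is above a threshold you choose for your own system, it turns checking off by writing 0 to the check_pci_parity file, the same file that is used to enable it. The function names and the threshold are hypothetical. The directory arguments exist only so that the functions can be exercised against a scratch directory.<br />
<br />
```shell
#!/bin/bash
# Sketch: disable EDAC PCI parity polling on machines with many PCI devices.
# ASSUMPTION: check_pci_parity lives in /sys/devices/system/edac/pci
# (as in the edac_mc example on this wiki); the threshold is yours to pick.

# count_pci_devices [PCI_DEVICES_DIR]
count_pci_devices() {
    local dir=${1:-/sys/bus/pci/devices} n=0 entry
    for entry in "$dir"/*; do
        # An unmatched glob stays literal and fails the -e test
        [ -e "$entry" ] && n=$((n + 1))
    done
    echo "$n"
}

# maybe_disable_pci_parity THRESHOLD [EDAC_PCI_DIR] [PCI_DEVICES_DIR]
maybe_disable_pci_parity() {
    local threshold=$1
    local edac_dir=${2:-/sys/devices/system/edac/pci}
    local n
    n=$(count_pci_devices "${3:-}")
    if [ "$n" -gt "$threshold" ]; then
        echo 0 > "$edac_dir/check_pci_parity"
        echo "disabled: $n devices > $threshold"
    else
        echo "kept: $n devices"
    fi
}
```
<br />
For example, maybe_disable_pci_parity 64 would switch polling off on a machine with more than 64 PCI functions. The value 64 is purely illustrative.<br />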
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) have broken PCI parity detection and report false positives. You can check (and add to) the list of affected devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
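When such a device is present, it can help to separate its reports from the rest. The following sketch tallies EDAC PCI parity messages in a kernel log by device address, and skips any addresses listed (one per line) in an optional ignore file. It assumes log lines of the form "EDAC PCI: Detected Parity Error on 0000:00:09.0", as in the example output on this page. The function name and the ignore-file format are hypothetical.<br />
<br />
```shell
#!/bin/bash
# Sketch: count EDAC PCI parity errors per device, skipping known-broken ones.
# ASSUMPTION: messages look like "EDAC PCI: Detected Parity Error on 0000:00:09.0".

# parity_errors_by_device [IGNORE_LIST_FILE] < kernel_log
#   Prints "COUNT ADDRESS" lines, most frequent first.
parity_errors_by_device() {
    local ignore=${1:-/dev/null}
    # 1. sed extracts the PCI address from each matching log line.
    # 2. grep drops addresses listed exactly in the ignore file
    #    (an empty pattern file matches nothing, so -v keeps everything).
    # 3. awk counts occurrences per address; sort orders by count.
    sed -n 's/.*EDAC PCI: Detected Parity Error on \([0-9a-fA-F:.]*\).*/\1/p' |
        grep -v -x -F -f "$ignore" |
        awk '{ count[$1]++ } END { for (dev in count) print count[dev], dev }' |
        sort -k1,1nr -k2,2
}
```
<br />
For example: dmesg | parity_errors_by_device /etc/edac/pci-ignore (the ignore file path is only an example).<br />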
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons, etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
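The following is an illustration of the kind of cron job meant above. This minimal sketch reports how many PCI parity errors have been counted since its previous run. It reads the pci_parity_count file from the "How to turn it on" example. The state file location, the function name and the exit-status convention are assumptions.<br />
<br />
```shell
#!/bin/bash
# Sketch: report new PCI parity errors since the last run (cron-friendly).
# ASSUMPTION: pci_parity_count lives in /sys/devices/system/edac/pci.

# check_parity_delta STATE_FILE [EDAC_PCI_DIR]
#   Prints the number of new errors; returns 0 if none, 1 if some,
#   2 if the counter could not be read.
check_parity_delta() {
    local state=$1
    local edac_dir=${2:-/sys/devices/system/edac/pci}
    local now prev=0
    now=$(cat "$edac_dir/pci_parity_count") || return 2
    if [ -r "$state" ]; then
        prev=$(cat "$state")
        prev=${prev:-0}
    fi
    echo "$now" > "$state"
    # If the counter went backwards (e.g. after a module reload),
    # treat everything since the reset as new
    if [ "$now" -lt "$prev" ]; then
        prev=0
    fi
    echo $((now - prev))
    [ "$now" -eq "$prev" ]
}
```
<br />
A wrapper script could call check_parity_delta /var/lib/edac/pci-count from cron and mail its output whenever the exit status is non-zero. The path is only an example.<br />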
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
Mailing list [http://vger.kernel.org/vger-lists.html#linux-edac]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=HardwareWanted&diff=182HardwareWanted2011-11-01T09:20:27Z<p>WikiSysop: /* Specific Hardware Requests */</p>
<hr />
<div>= Hardware Wanted =<br />
<br />
The nature of the EDAC project is such that if developers don't have access to the necessary hardware, development is next to impossible, and maintenance is hard. Broken hardware is often very useful (particularly DIMMs).<br />
<br />
= How to Donate =<br />
<br />
== General Hardware ==<br />
<br />
If you have hardware which is not listed here, but which you think could be useful to the project, please [http://sourceforge.net/mail/?group_id=93775 contact the developer mailing list]<br />
<br />
== Specific Hardware Requests ==<br />
<br />
If you have any pieces of hardware around which you are willing to donate, then please contact the developers below, and CC the developer mailing list.<br />
<br />
Intel Blackford/Greencreek (5000P/V/X chipset) motherboard, to port the bluesmoke MC driver<br />
to EDAC - [mailto:norsk5@xmission.com]<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=154Main Page2010-11-08T20:46:56Z<p>WikiSysop: /* The EDAC Mailing List */ Update mailing list info...</p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project]<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following types of error is currently supported:<br />
<br />
* [http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction System RAM errors] (this is the original and most mature part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM with extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* [http://en.wikipedia.org/wiki/Memory_scrubbing RAM scrubbing] - some memory controllers support "scrubbing" DRAM during normal operation. Continuous scrubbing allows ECC errors to be actively detected and corrected, rather than only being found when the affected memory happens to be read.<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
* Cache ECC errors<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus while travelling to or from your NIC or storage adapter) and not know about it, as most systems do not check PCI devices for reported parity errors (some may trigger an NMI, but you get no further information about what caused it).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
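To see whether correctable errors have already been recorded, you can read the per-csrow counters directly. The sketch below assumes the csrow-based sysfs layout described in the kernel's EDAC documentation (/sys/devices/system/edac/mc/mcN/csrowM/ce_count), which is available from 2.6.18 onwards. Newer kernels may use a different layout. It prints each csrow with a non-zero count and returns a non-zero status if it finds any. The function name is hypothetical.<br />
<br />
```shell
#!/bin/bash
# Sketch: list csrows with recorded correctable (CE) errors.
# ASSUMPTION: legacy EDAC sysfs layout, mcN/csrowM/ce_count.

# report_ce_counts [EDAC_MC_DIR]
#   Prints "mcN/csrowM/ce_count: COUNT" for every non-zero counter.
#   Returns 1 if any correctable errors were found, 0 otherwise.
report_ce_counts() {
    local base=${1:-/sys/devices/system/edac/mc}
    local f n status=0
    for f in "$base"/mc*/csrow*/ce_count; do
        # If the glob matched nothing, $f is the literal pattern: skip it
        [ -r "$f" ] || continue
        n=$(cat "$f")
        if [ "$n" -gt 0 ]; then
            echo "${f#"$base"/}: $n"
            status=1
        fi
    done
    return "$status"
}
```
<br />
If a csrow keeps accumulating corrected errors, the DIMM behind it is a candidate for replacement. The [[MemorySlotLabels]] page can help you find which physical module that is.<br />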
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module, in time this will be split out... As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
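The steps above can be wrapped in a small helper that is safe to run repeatedly. This is a sketch only. It assumes the edac_mc module is already loaded and that the sysfs files are at the paths used above. enable_pci_parity is a hypothetical name.<br />
<br />
```shell
#!/bin/bash
# Sketch: turn on EDAC PCI parity checking (idempotent) and show the count.
# ASSUMPTION: edac_mc is loaded and exposes check_pci_parity and
# pci_parity_count in /sys/devices/system/edac/pci, as in the session above.

# enable_pci_parity [EDAC_PCI_DIR]
enable_pci_parity() {
    local dir=${1:-/sys/devices/system/edac/pci}
    if [ ! -w "$dir/check_pci_parity" ]; then
        echo "cannot write $dir/check_pci_parity (is edac_mc loaded? are you root?)" >&2
        return 1
    fi
    # Only write when checking is not already on
    if [ "$(cat "$dir/check_pci_parity")" != "1" ]; then
        echo 1 > "$dir/check_pci_parity"
    fi
    echo "pci_parity_count: $(cat "$dir/pci_parity_count")"
}
```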
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> check the possibilities listed there, and elsewhere on this wiki, before you open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (if so, you can add yourself to the cc list for that bug so that you are automatically informed of updates). If it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/edac.txt Documentation/edac.txt ].<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the wiki, the bug database, and the mailing list archives (for both the [http://marc.info/?l=linux-edac current] and the [http://sourceforge.net/mail/?group_id=93775 previous] mailing lists) for your problem first.<br />
<br />
If you have exhausted these possibilities, then by all means post to [http://vger.kernel.org/vger-lists.html#linux-edac the mailing list]...<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help, so please get involved!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. A userspace API (via sysfs) is available in 2.6.18 and above.<br />
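A script that relies on the sysfs interface may want to check the running kernel first. This sketch compares dot-separated version numbers and ignores distribution suffixes such as "-5-amd64". The function names are hypothetical, and the 2.6.18 cut-off is taken from the paragraph above.<br />
<br />
```shell
#!/bin/bash
# Sketch: decide whether a kernel version should have the EDAC sysfs API.

# version_at_least VERSION MINIMUM
#   Returns 0 if VERSION >= MINIMUM, comparing up to four numeric
#   components; anything from the first non-digit, non-dot character
#   onwards (e.g. "-rc2", "-5-amd64") is ignored.
version_at_least() {
    local IFS=.
    local -a v m
    read -r -a v <<< "${1%%[!0-9.]*}"
    read -r -a m <<< "${2%%[!0-9.]*}"
    local i a b
    for i in 0 1 2 3; do
        # Missing components count as 0, so "2.6.18" equals "2.6.18.0"
        a=${v[i]:-0} b=${m[i]:-0}
        if [ "$a" -gt "$b" ]; then return 0; fi
        if [ "$a" -lt "$b" ]; then return 1; fi
    done
    return 0
}

# has_edac_sysfs [VERSION]  (defaults to the running kernel)
has_edac_sysfs() {
    version_at_least "${1:-$(uname -r)}" 2.6.18
}
```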
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the Sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMCC <br />
| 4xx <br />
| [[ppc4xx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 8111 <br />
| [[amd8111_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.30)<br />
| <br />
|-<br />
| Freescale <br />
| MPC83xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Freescale <br />
| MPC85xx <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Freescale <br />
| P2020 <br />
| [[mpc85xx_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.32)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| i3100 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.26)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|-<br />
| Intel<br />
| 3000/3010<br />
| [[i3000_edac.c]]<br />
| [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Intel<br />
| X38<br />
| [[x38_edac.c]]<br />
| [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.28)<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels next to the memory module sockets. You can help by working out the mapping for your hardware and adding it to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and most add-in cards (and chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected to a PCI bus (e.g. some ATA host adapters built into a motherboard chipset) typically do not implement this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines with many devices, so you may wish to reduce the polling rate, or disable polling altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or just particular revisions of those devices) have broken PCI parity detection and report false positives. You can check (and add to) the list of affected devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons, etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=144Main Page2009-07-24T09:05:00Z<p>WikiSysop: Reverted edits by Harry23 (Talk); changed back to last version by 129.33.49.251</p>
<hr />
<div><!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more<br />
--><br />
<!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]].<br />
--><br />
<!-- ##master-page:[[FrontPage]]<br />
--><br />
<!-- #format wiki<br />
--><br />
<!-- #language en<br />
--><br />
<!-- #pragma section-numbers off<br />
--><br />
= EDAC Wiki =<br />
<br />
This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project]<br />
<br />
== What is it? ==<br />
<br />
[http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following types of error is currently supported:<br />
<br />
* System RAM errors (this is the original and most mature part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM with extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted on the PCI bus while travelling to or from your NIC or storage adapter) and not know about it, as most systems do not check PCI devices for reported parity errors (some may trigger an NMI, but you get no further information about what caused it).<br />
* If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module, in time this will be split out... As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" statement above).<br />
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> check the possibilities listed there, and elsewhere on this wiki, before you open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (if so, you can add yourself to the cc list for that bug so that you are automatically informed of updates). If it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel in Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the wiki, the bug database, and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help, so please get involved!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the mainline Linux kernel since version 2.6.16. A userspace API (via sysfs) is available in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or by anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the Sourceforge main page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[amd64_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|-<br />
| Intel<br />
| 3000/3010<br />
| [[i3000_edac.c]]<br />
| [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.25)<br />
| <br />
|-<br />
| Intel<br />
| X38<br />
| [[x38_edac.c]]<br />
| [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel]<br />
| [[EDAC]]<br />
| Supported (Linux 2.6.28)<br />
| <br />
|}<br />
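<br />
As a convenience, the table can be turned into a lookup from chipset model to the module you would load. This is a minimal sketch covering only a subset of the rows above. It assumes each module is named after its driver source file (e.g. e7xxx_edac.c becomes e7xxx_edac). Drivers marked "Patch in CVS" or "Patch in SVN" will not be present in a stock kernel.<br />
<br />
```shell
#!/bin/sh
# edac_module_for MODEL: print the EDAC module name for a chipset model.
# Covers only a subset of the supported-hardware table; module names are
# assumed to match the driver file names listed there.
edac_module_for() {
    case $1 in
        760|762|768)                   echo amd76x_edac ;;
        e7500|e7501|e7505)             echo e7xxx_edac ;;
        e7520|e7525)                   echo e752x_edac ;;
        82875p)                        echo i82875p_edac ;;
        82860)                         echo i82860_edac ;;
        5000P|5000V|5000X)             echo i5000_edac ;;
        82443BX|82443GX|440BX|440GX)   echo i82443bxgx_edac ;;
        82600)                         echo r82600_edac ;;
        3000|3010)                     echo i3000_edac ;;
        X38)                           echo x38_edac ;;
        *) echo "no driver listed for '$1'" >&2; return 1 ;;
    esac
}
```
<br />
For example, "modprobe "$(edac_module_for e7501)"" would load e7xxx_edac, if your kernel provides it.<br />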
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels next to the memory module sockets. You can help by working out the mapping for your hardware and adding it to the [[MemorySlotLabels]] page.<br />
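<br />
Once you know the mapping, you can record it in sysfs so that error reports name the physical slot. This is a minimal sketch. It assumes the older EDAC sysfs layout, with writable csrowN/chM_dimm_label files under each /sys/devices/system/edac/mc/mcN/ directory. The map-file format and the function names are made up for illustration.<br />
<br />
```shell
#!/bin/sh
# Sketch: write DIMM labels into the EDAC sysfs tree from a small map file.
# Assumes the older csrowN/chM_dimm_label layout; the map-file format is made up.

# set_dimm_label MC_DIR CSROW CHANNEL LABEL
set_dimm_label() {
    attr="$1/csrow${2}/ch${3}_dimm_label"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi
    printf '%s\n' "$4" > "$attr"
}

# apply_label_map MC_DIR MAP_FILE
# Each MAP_FILE line (hypothetical format): "<csrow> <channel> <label text>".
# Blank lines and lines starting with '#' are skipped.
apply_label_map() {
    while read -r csrow chan label; do
        case $csrow in ''|\#*) continue ;; esac
        set_dimm_label "$1" "$csrow" "$chan" "$label" || return 1
    done < "$2"
}
```
<br />
For example, "apply_label_map /sys/devices/system/edac/mc/mc0 my-board-labels.txt" (file name hypothetical), run as root, applies every line of the map.<br />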
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and most add-in cards (and chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected to a PCI bus (e.g. some ATA host adaptors built into a motherboard chipset) typically do not implement this functionality.<br />
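<br />
To see which parity-related bits a device has set, you can decode its PCI Status register. This sketch relies on the bit positions from the PCI Local Bus Specification: bit 8 Master Data Parity Error, bit 14 Signaled System Error, bit 15 Detected Parity Error. The hex value can be read with setpci from pciutils, e.g. "setpci -s 00:09.0 STATUS". The function name is illustrative.<br />
<br />
```shell
#!/bin/sh
# decode_pci_status HEX: print the parity/system-error bits set in a PCI
# Status register value, e.g. "8100" or "0x8100". Prints nothing if none are set.
decode_pci_status() {
    val=$(( 0x${1#0x} ))
    [ $(( val & 0x0100 )) -ne 0 ] && echo "master-data-parity-error"
    [ $(( val & 0x4000 )) -ne 0 ] && echo "signaled-system-error"
    [ $(( val & 0x8000 )) -ne 0 ] && echo "detected-parity-error"
    return 0
}
```
<br />
For example, "decode_pci_status "$(setpci -s 00:09.0 STATUS)"" (run as root) lists any of these bits set on that device.<br />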
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines with many devices, so you may wish to reduce the polling rate, or disable polling altogether, on such systems.<br />
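<br />
Here is a minimal sketch of both options. It assumes the sysfs layout described in the in-kernel EDAC documentation of the 2.6.1x era. That layout has an mc/poll_msec file holding the polling period in milliseconds, and the pci/check_pci_parity on/off switch. Whether poll_msec also governs the PCI checks depends on your kernel version, so check your kernel's documentation first. The function names are illustrative.<br />
<br />
```shell
#!/bin/sh
# Sketch only: assumes mc/poll_msec and pci/check_pci_parity exist under EDAC_SYSFS.
EDAC_SYSFS=${EDAC_SYSFS:-/sys/devices/system/edac}

# edac_set_poll_msec MSEC: set the polling period (larger = less overhead).
edac_set_poll_msec() {
    case $1 in
        ''|*[!0-9]*) echo "poll period must be a number of milliseconds" >&2; return 1 ;;
    esac
    echo "$1" > "$EDAC_SYSFS/mc/poll_msec"
}

# edac_pci_checking on|off: turn PCI parity checking on or off altogether.
edac_pci_checking() {
    case $1 in
        on)  echo 1 > "$EDAC_SYSFS/pci/check_pci_parity" ;;
        off) echo 0 > "$EDAC_SYSFS/pci/check_pci_parity" ;;
        *)   echo "usage: edac_pci_checking on|off" >&2; return 1 ;;
    esac
}
```
<br />
For example, "edac_set_poll_msec 10000" would poll every ten seconds instead of (typically) every second, trading detection latency for lower overhead.<br />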
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or particular revisions of them) have broken PCI parity detection and report false positives. You can check (and add to) the list of known-broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
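<br />
To check your own machine against that list, you can compare each device's vendor:device ID with a local copy of it. This is a minimal sketch. It reads the vendor and device attributes that Linux exposes under /sys/bus/pci/devices/. The local list format, one "vvvv:dddd" hex ID per line, is an assumption and not the wiki page's own format. The function name is illustrative.<br />
<br />
```shell
#!/bin/sh
# Sketch: print "<slot> <vendor:device>" for every PCI device whose ID is in LIST_FILE.
# LIST_FILE format (hypothetical): one hex "vvvv:dddd" ID per line, any case.
PCI_DEV_DIR=${PCI_DEV_DIR:-/sys/bus/pci/devices}

# find_broken_parity_devices LIST_FILE
find_broken_parity_devices() {
    for dev in "$PCI_DEV_DIR"/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        # sysfs stores IDs like "0x8086"; strip the 0x prefix.
        vendor=$(cat "$dev/vendor"); device=$(cat "$dev/device")
        id="${vendor#0x}:${device#0x}"
        if grep -qix "$id" "$1"; then
            echo "${dev##*/} $id"
        fi
    done
}
```
<br />
Devices reported by this check are candidates for ignoring any parity errors EDAC logs against them.<br />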
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to run from a cron job, extensions to SNMP daemons, etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
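<br />
As a starting point for the cron-job idea, here is a minimal sketch. It compares each memory controller's corrected (CE) and uncorrected (UE) error counts with the values saved on the previous run, and prints a line for any increase. Cron mails such output to the job's owner. It assumes per-controller ce_count and ue_count files under /sys/devices/system/edac/mc/mcN/. The state directory path and function name are hypothetical. Counts reset on reboot or module reload; the sketch simply saves the new, lower value.<br />
<br />
```shell
#!/bin/sh
# Sketch of a cron check: report increases in EDAC ce_count/ue_count since the last run.
MC_SYSFS=${MC_SYSFS:-/sys/devices/system/edac/mc}
STATE_DIR=${STATE_DIR:-/var/lib/edac-check}   # hypothetical location

# check_edac_counts: prints one line per increased counter; returns 1 if any rose.
check_edac_counts() {
    mkdir -p "$STATE_DIR" || return 1
    found=0
    for mc in "$MC_SYSFS"/mc[0-9]*; do
        [ -d "$mc" ] || continue
        name=${mc##*/}
        for kind in ce ue; do
            now=$(cat "$mc/${kind}_count") || continue
            saved="$STATE_DIR/$name.$kind"
            prev=$(cat "$saved" 2>/dev/null || echo 0)
            if [ "$now" -gt "$prev" ]; then
                echo "$name: ${kind}_count rose from $prev to $now"
                found=1
            fi
            echo "$now" > "$saved"      # remember for next run
        done
    done
    return $found
}
```
<br />
A crontab entry could source this file and call check_edac_counts every few minutes.<br />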
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=HowToWriteNewMemoryControllerDrivers&diff=132HowToWriteNewMemoryControllerDrivers2009-01-16T14:09:29Z<p>WikiSysop: New page: * Take a look at the documentation under Documentation/edac.txt in the kernel.org tree. * Obtain the manufacturer's datasheet (and add to the main page of this wiki). * Take a look at the ...</p>
<hr />
<div>* Take a look at the documentation under Documentation/edac.txt in the kernel.org tree.<br />
* Obtain the manufacturer's datasheet (and add a link to it on the main page of this wiki).<br />
* Take a look at the sample code "test_device_edac", and try to find the existing driver which is most similar to the driver you intend to write.<br />
* Make sure you're not duplicating effort with someone else! Consult the mailing list!<br />
* Get writing!<br />
* Release early and often.</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=FrontPage&diff=131FrontPage2009-01-16T12:45:52Z<p>WikiSysop: FrontPage moved to Main Page: moinmoin import</p>
<hr />
<div>#REDIRECT [[Main Page]]</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=130Main Page2009-01-16T12:45:52Z<p>WikiSysop: FrontPage moved to Main Page: moinmoin import</p>
<hr />
<div>= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project.<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:<br />
<br />
* System RAM errors (this is the original and most mature part of the project). Many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM with extra storage capacity ("ECC RAM") is needed for these facilities to operate.<br />
* PCI bus transfer errors: most PCI bridges and peripherals support such error detection.<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data may be corrupted on the PCI bus while travelling to or from your NIC/storage adapter) without knowing it, as most systems do not check PCI devices for reported parity errors (some may trigger an NMI, but you get no further information about what caused it).<br />
* If you have ECC memory and are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed, you will find out about bad memory modules before the errors become uncorrectable and you have potentially corrupted data (and a crashed machine). This includes finding out about memory modules which are bad as shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it until your machine crashes with unexplained memory errors (you won't even get an NMI), and the extra money spent on ECC memory will have been wasted.<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment the PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As you can see from the above, PCI error checking is turned off by default and needs to be turned on (using the "echo" command above).<br />
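<br />
The arecord experiment above can be wrapped up so that it reports how many new parity errors were counted while a command ran. This is a minimal sketch, assuming edac_mc is loaded and check_pci_parity has already been set to 1. Errors are only picked up on the next poll, so the sketch waits briefly before re-reading the count. EDAC_SETTLE_SECS and the function name are made up for this example; they are not kernel settings.<br />
<br />
```shell
#!/bin/sh
# Sketch: report how many PCI parity errors EDAC counted while a command ran.
EDAC_PCI_DIR=${EDAC_PCI_DIR:-/sys/devices/system/edac/pci}
# Made-up knob: seconds to wait so the next poll can notice new errors.
EDAC_SETTLE_SECS=${EDAC_SETTLE_SECS:-2}

# parity_errors_during CMD [ARGS...]: prints the increase in pci_parity_count.
parity_errors_during() {
    before=$(cat "$EDAC_PCI_DIR/pci_parity_count") || return 1
    "$@" >&2                      # keep CMD's output away from our result
    sleep "$EDAC_SETTLE_SECS"
    after=$(cat "$EDAC_PCI_DIR/pci_parity_count") || return 1
    echo $(( after - before ))
}
```
<br />
For example, "parity_errors_during arecord -d 5 /dev/null" exercises the sound device for five seconds; a non-zero result points at that device.<br />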
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> check the possibilities listed there, and elsewhere on this wiki, before you open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (if so, you can add yourself to the cc list for that bug, so that you are automatically informed of updates). If it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel source tree under Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by SourceForge) to a greater or lesser extent, but please remember that few of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the wiki, the bug database and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Give all the information which might be relevant, e.g. your exact kernel version<br />
* Be patient<br />
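<br />
A small helper can gather the obvious details in one go. This is a minimal sketch using standard tools: uname, lsmod, lspci -nn (which shows numeric vendor:device IDs) and dmesg. What counts as "relevant" varies, so treat it as a starting point. The function name is illustrative.<br />
<br />
```shell
#!/bin/sh
# Sketch: collect basic details worth including in a mailing-list post or bug report.
edac_report() {
    echo "Kernel: $(uname -r)"
    echo "--- Loaded EDAC modules ---"
    lsmod 2>/dev/null | grep -i edac || echo "(none found)"
    echo "--- PCI devices (lspci -nn) ---"
    lspci -nn 2>/dev/null || echo "(lspci not available)"
    echo "--- EDAC kernel messages ---"
    dmesg 2>/dev/null | grep EDAC || echo "(none found, or dmesg not readable)"
}
```
<br />
For example, "edac_report > edac-report.txt" produces a file you can paste from.<br />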
<br />
If you get a reply, or find things out which weren't known before, please add the information to this wiki to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved!<br />
<br />
== Status ==<br />
<br />
The EDAC code has been in the Linux kernel since version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
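<br />
Scripts that rely on the sysfs API can check the running kernel first. This is a minimal sketch of a numeric version comparison for strings like those printed by "uname -r" (e.g. "2.6.18-4-686"). It ignores anything after the first "-" and any non-digit suffix. Distribution kernels may backport features, so treat the result as a hint only. The function name is illustrative.<br />
<br />
```shell
#!/bin/sh
# kernel_at_least VERSION MAJOR MINOR PATCH: succeed if VERSION >= MAJOR.MINOR.PATCH.
kernel_at_least() {
    want_maj=$2 want_min=$3 want_pat=$4
    base=${1%%-*}                      # "2.6.18-4-686" -> "2.6.18"
    old_ifs=$IFS; IFS=.
    set -- $base 0 0 0                 # split on dots, pad missing fields with 0
    IFS=$old_ifs
    maj=${1%%[!0-9]*} min=${2%%[!0-9]*} pat=${3%%[!0-9]*}
    maj=${maj:-0} min=${min:-0} pat=${pat:-0}
    if [ "$maj" -ne "$want_maj" ]; then [ "$maj" -gt "$want_maj" ]; return; fi
    if [ "$min" -ne "$want_min" ]; then [ "$min" -gt "$want_min" ]; return; fi
    [ "$pat" -ge "$want_pat" ]
}
```
<br />
For example: "if kernel_at_least "$(uname -r)" 2 6 18; then echo "EDAC sysfs API expected"; fi". The comparison is numeric, so 2.6.9 correctly sorts below 2.6.18.<br />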
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the SourceForge project page for CVS access details.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Alternatively, if you just want to look at the recent changes, you can browse the SVN repository at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as reported by the EDAC driver and the physical labels next to the memory module sockets. You can help by working out the mapping for your hardware and adding it to the [[MemorySlotLabels]] page.<br />
<br />
== PCI Error Reporting ==<br />
<br />
PCI parity error reporting facilities are part of the PCI specification, and most add-in cards (and chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected to a PCI bus (e.g. some ATA host adaptors built into a motherboard chipset) typically do not implement this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines with many devices, so you may wish to reduce the polling rate, or disable polling altogether, on such systems.<br />
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or particular revisions of them) have broken PCI parity detection and report false positives. You can check (and add to) the list of known-broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to run from a cron job, extensions to SNMP daemons, etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A wiki is a collaborative site; anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top of the page<br />
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=HowToCleanEdgeConnectors&diff=128HowToCleanEdgeConnectors2009-01-16T12:42:13Z<p>WikiSysop: 3 revisions</p>
<hr />
<div>* Observe anti-static precautions<br />
* Don't set fire to yourself<br />
* Don't blame me if you break stuff<br />
* Wear suitable gloves, avoid inhaling solvents, use appropriate ventilation etc.<br />
* Clean edge connectors with a cotton bud (Q-tip) dipped in isopropyl alcohol ("IPA"/isopropanol)<br />
* Clean DIMM/PCI slots using a stiff piece of card (the same thickness as the DIMM/PCI PCB), wetted with isopropyl alcohol<br />
<br />
Isopropanol is available from many chemists/pharmacists.<br />
<br />
You can also buy commercial connector and PCB cleaning solutions and tools.<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingMemoryErrors&diff=124WhyAmIgettingMemoryErrors2009-01-16T12:42:13Z<p>WikiSysop: 15 revisions</p>
<hr />
<div>To help understand why you are seeing memory errors, please have a look at [[HowMemoryEdacHardwareWorks]].<br />
<br />
You may well have been experiencing these errors for a while; it's just that nothing was checking for them until you enabled the EDAC module. Note that if you are getting UEs (uncorrectable errors), your system is probably experiencing data corruption, so you should really investigate (this is why EDAC is set to <code>panic()</code> on UEs by default).<br />
<br />
The reason that you are seeing problems is very likely to be one of:<br />
<br />
* Your RAM is bad.<br />
* Your Motherboard is bad.<br />
* Your CPU is bad (for CPUs which have the memory controller built into the CPU, such as the AMD Opteron/Athlon 64).<br />
* The connection between your motherboard and your CPU, or memory module is bad.<br />
* Some of your hardware is being operated outside of its design specification, such as:<br />
** Things are being run too hot.<br />
** Timings are being violated (e.g. running memory too fast, or bad DRAM clock generation).<br />
** Supply voltages to the critical components are too high or too low (this may even happen very briefly, as a supply "spike" or "droop").<br />
* You have seen one or more "Single Event Upsets" - see [[SoftErrors]].<br />
* The memory ECC check bits are not properly initialised by the BIOS before Linux boots.<br />
* The EDAC module is buggy.<br />
* The memory controller's loading limit is exceeded (too many ranks of memory installed).<br />
* The power supply is insufficient.<br />
<br />
== So Which One Is It Then? ==<br />
<br />
Good question. Time to try some things:<br />
<br />
=== Symptoms ===<br />
<br />
Here are the most likely symptoms.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| <b>Problem</b> <br />
| <b>Error Addresses</b><br />
| <b>Error Slot or Row</b> <br />
| <b>Error Frequency</b> <br />
| <br />
|-<br />
| Bad Memory Module(s) <br />
| Single/Few <br />
| Probably only 1 <br />
| May vary if part is marginally out of spec <br />
| <br />
|-<br />
| Bad Motherboard <br />
| Probably many <br />
| Maybe 1, maybe many <br />
| ? <br />
| <br />
|-<br />
| Bad Connection <br />
| Probably many <br />
| 1 (bad mem), prob all (bad CPU)<br />
| ? <br />
| <br />
|-<br />
| Temp out of spec <br />
| Probably few <br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher, with higher temp <br />
| <br />
|-<br />
| Timings out of spec <br />
| Probably few <br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher, with higher temp <br />
| <br />
|-<br />
| Voltages out of spec <br />
| Probably random<br />
| Maybe one, if different mem mfrs/parts<br />
| Usually higher at higher system load<br />
| <br />
|-<br />
| Bad check bit init <br />
| Probably random<br />
| Probably all <br />
| High/very high (stops after a while for systems with background scrub)<br />
| <br />
|-<br />
| Single event upsets <br />
| Random <br />
| Varies with effective "cross-section" of part<br />
| Very rare <br />
| <br />
|}<br />
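<br />
To work out which "Error Slot or Row" pattern applies to you, the following minimal sketch lists the corrected (CE) and uncorrected (UE) error counts for each csrow. It assumes the memory controller sysfs layout used by EDAC kernels of this era (<code>/sys/devices/system/edac/mc/mcN/csrowM/ce_count</code> and <code>ue_count</code>). The layout may differ on your kernel, so check its EDAC documentation first. The function name <code>edac_row_counts</code> is made up for this example.<br />
<br />
```shell
# Print "mcN csrowM ce=X ue=Y" for every csrow of every memory controller.
# Assumed sysfs layout (check your kernel's EDAC documentation):
#   MC_DIR/mcN/csrowM/ce_count and MC_DIR/mcN/csrowM/ue_count
# Usage: edac_row_counts [MC_DIR]
edac_row_counts() {
    mc_dir=${1:-/sys/devices/system/edac/mc}
    for row in "$mc_dir"/mc*/csrow*; do
        # Skip the literal pattern left behind when the glob matches nothing.
        [ -r "$row/ce_count" ] || continue
        ce=$(cat "$row/ce_count")
        # Fall back to 0 if a row has no ue_count file.
        ue=$(cat "$row/ue_count" 2>/dev/null || echo 0)
        mc=$(basename "$(dirname "$row")")
        printf '%s %s ce=%s ue=%s\n' "$mc" "$(basename "$row")" "$ce" "$ue"
    done
}
```
<br />
If the counts are concentrated on one csrow, suspect a single module or slot. If they are spread across all rows, the "bad check bit init", motherboard and voltage rows of the table are a better fit.<br />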
<br />
=== Things to try to isolate the problem ===<br />
<br />
General:<br />
<br />
* Get a second opinion, e.g. from [http://www.memtest.org/ Memtest86+] or [http://www.memtest86.com/ Memtest86]. Make sure that either:<br />
** The memory testing software knows how to disable ECC on your system, or<br />
** You have disabled ECC before running the memory tester (note that memtest86 currently displays "ECC: No" on chipsets which have ECC but which it doesn't know about!).<br />
* Note that a memory tester may not catch problems, such as power-supply related ones, which don't occur while it is running.<br />
* Use a system stress tester such as "burnbx" from [http://pages.sbcglobal.net/redelm/].<br />
* Put your system under stress by (e.g.) running a parallelised Linux kernel build while doing heavy 3D graphics display and a lot of disk I/O.<br />
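<br />
When stressing the system, it helps to compare the EDAC error counters before and after the run. This is a minimal sketch. It assumes per-csrow <code>ce_count</code> files under <code>/sys/devices/system/edac/mc</code>, as in EDAC kernels of this era. <code>edac_total_ce</code> and <code>ce_delta_during</code> are illustrative names, not part of any EDAC tool.<br />
<br />
```shell
# Sum the corrected-error counters of all csrows.
# Assumed sysfs layout: MC_DIR/mcN/csrowM/ce_count
# Usage: edac_total_ce [MC_DIR]
edac_total_ce() {
    mc_dir=${1:-/sys/devices/system/edac/mc}
    total=0
    for f in "$mc_dir"/mc*/csrow*/ce_count; do
        [ -r "$f" ] || continue   # glob matched nothing
        total=$((total + $(cat "$f")))
    done
    echo "$total"
}

# Run a command (e.g. a stress test) and print how many corrected errors
# were counted while it ran. The command's own output goes to stderr so
# that stdout carries only the final count.
# Usage: ce_delta_during MC_DIR COMMAND [ARGS...]
ce_delta_during() {
    delta_dir=$1
    shift
    before=$(edac_total_ce "$delta_dir")
    "$@" >&2
    after=$(edac_total_ce "$delta_dir")
    echo $((after - before))
}
```
<br />
For example, <code>ce_delta_during /sys/devices/system/edac/mc make -j8</code> run from a configured kernel source tree prints the number of corrected errors counted during the build. Because EDAC polls for errors, they may be reported slightly after they happen, so waiting a polling period before the second reading may be needed.<br />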
<br />
Suspected bad module:<br />
<br />
* Remove the module.<br />
* Move the module to a different slot (do the errors move with the module?).<br />
* Move the module to a different machine.<br />
* See "suspected temp out of spec".<br />
* See "suspected timings out of spec".<br />
* See "suspected voltage out of spec".<br />
* Clean connections.<br />
* Check memory loading:<br />
** Some memory controllers can only support a limited number of memory 'ranks' at a given speed. For example, Opterons/Athlon64s can support only 4 ranks of 2 GB at PC3200. See http://www.valueram.com/memoryranks/default.asp for definitions.<br />
<br />
Suspected bad motherboard:<br />
<br />
* Check the motherboard documentation for memory module compatibility.<br />
* Move modules to different slots.<br />
* Clean connections.<br />
* Upgrade the BIOS.<br />
* Select the BIOS "fail-safe defaults" (or equivalent), then change settings one at a time from there to isolate the cause.<br />
<br />
Suspected bad connection:<br />
<br />
* Visually check connectors, pins, modules etc.<br />
* [[HowToCleanEdgeConnectors]].<br />
<br />
Suspected temp out of spec:<br />
<br />
* Measure temp, compare to published specs:<br />
** Use internal machine sensors (motherboard, hard drive etc.) if possible.<br />
** Use a temperature probe.<br />
* Check airflow.<br />
* De-dust.<br />
* Lower temp:<br />
** Lower room temp.<br />
** Increase cooling.<br />
** Improve airflow (tidy cables etc.).<br />
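<br />
If a Linux hwmon driver is loaded for your motherboard's sensors, the readings are also available in sysfs. This sketch assumes the standard hwmon interface, in which each <code>tempN_input</code> file holds a temperature in millidegrees Celsius. Some older drivers put these files under <code>hwmonN/device/</code> instead. The function name is hypothetical. The <code>sensors</code> program from lm-sensors shows the same information in a friendlier form.<br />
<br />
```shell
# Print every temperature exposed through the hwmon sysfs interface.
# tempN_input files hold millidegrees Celsius; some older drivers keep
# them under hwmonN/device/ instead of hwmonN/.
# Usage: hwmon_temps [HWMON_CLASS_DIR]
hwmon_temps() {
    hw_dir=${1:-/sys/class/hwmon}
    for f in "$hw_dir"/hwmon*/temp*_input "$hw_dir"/hwmon*/device/temp*_input; do
        [ -r "$f" ] || continue   # glob matched nothing
        temp=$(awk '{ printf "%.1f", $1 / 1000 }' "$f")
        printf '%s: %s C\n' "$f" "$temp"
    done
}
```
<br />
Compare the figures with the published limits for your components, and repeat the measurement under load.<br />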
<br />
Suspected timings out of spec:<br />
<br />
* Try different BIOS version.<br />
* Set pessimistic memory timings in BIOS.<br />
* Compare memory controller timings to DIMM specs, using decode-dimms.pl from the Linux i2c project.<br />
* Try disabling "spread spectrum" in the BIOS (easy), or by using an i2c driver for your board's clock generator (hard).<br />
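<br />
Tools such as decode-dimms.pl and dmidecode often give memory speed as a clock cycle time (tCK) in ns rather than as a frequency. The conversion is f [MHz] = 1000 / tCK [ns], so 10 ns is 100 MHz (PC100) and 7.5 ns is about 133 MHz (PC133). The following small sketch performs that arithmetic. It also checks that the controller is not clocking the DIMM faster than the DIMM's rated minimum cycle time. Both function names are made up for this example. Only the clock is compared; CAS latency and the other timings are not.<br />
<br />
```shell
# f [MHz] = 1000 / tCK [ns]; e.g. 10 ns -> 100 MHz, 7.5 ns -> 133 MHz.
# Usage: cycle_ns_to_mhz TCK_NS
cycle_ns_to_mhz() {
    awk -v t="$1" 'BEGIN { printf "%.0f\n", 1000 / t }'
}

# Exit 0 if the memory controller's cycle time (first argument, ns) is no
# shorter than the DIMM's minimum rated cycle time (second argument, ns),
# i.e. the DIMM is not being clocked faster than it is rated for.
# Usage: timing_within_spec CONTROLLER_NS DIMM_MIN_NS
timing_within_spec() {
    awk -v c="$1" -v d="$2" 'BEGIN { exit !(c + 0 >= d + 0) }'
}
```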
<br />
Suspected voltage out of spec:<br />
<br />
* Check PSU specs vs. total demand of system components.<br />
* Swap power supply with another machine.<br />
* Fit a voltage regulator/spike suppressor to the machine's power supply.<br />
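<br />
A rough power budget is simple arithmetic: add up the maximum draw of each component and compare the total with the PSU rating, leaving some headroom. The sketch below reads "component watts" lines on stdin. Its 20% default margin is an assumed rule of thumb, not a figure from any specification. A total-watts check also ignores the per-rail current limits printed on the PSU label, which can be exceeded even when the total fits.<br />
<br />
```shell
# Read "component watts" lines on stdin (the watts must be the last field),
# print the total demand and the usable capacity, and exit 0 only if the
# demand fits within the PSU rating minus a safety margin.
# The 20% default margin is an assumed rule of thumb, not a specification.
# Usage: psu_budget_ok RATING_WATTS [MARGIN_PERCENT]
psu_budget_ok() {
    rating_w=$1
    margin_pct=${2:-20}
    awk -v r="$rating_w" -v m="$margin_pct" '
        NF { total += $NF }
        END {
            usable = r * (100 - m) / 100
            printf "demand %d W, usable %d W\n", total, usable
            exit !(total <= usable)
        }'
}
```
<br />
For example, <code>printf 'CPU 89\nDisks 40\n' | psu_budget_ok 300</code> (with figures taken from your own components' datasheets) prints the totals and succeeds if they fit.<br />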
<br />
Suspected single event upsets:<br />
<br />
* Fit less susceptible components.<br />
* Move to a lower altitude, or area with lower cosmic radiation.<br />
* Move your data centre underground.<br />
* Improve error-reporting utilities to ignore them.<br />
<br />
Suspected bad check-bit init:<br />
<br />
* Upgrade BIOS.<br />
* Don't enable BIOS "quick boot".<br />
* Don't manually skip BIOS memory check.<br />
<br />
Suspected insufficient power supply:<br />
<br />
* Try detaching some rarely used devices, starting with USB devices.<br />
* If the problems stop, either permanently reduce the number of attached devices, or get a more powerful power supply.<br />
* This is closely related to "voltage out of spec", which can also be caused simply by a faulty power supply.<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=Main_Page&diff=108Main Page2009-01-16T12:42:13Z<p>WikiSysop: 73 revisions</p>
<hr />
<div>= EDAC Wiki =<br />
<br />
This is a wiki for the Linux EDAC project.<br />
<br />
== What is it? ==<br />
<br />
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Currently, hardware which detects the following errors is supported:<br />
<br />
* System RAM errors (this is the original, and most mature, part of the project). Many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate<br />
* PCI bus transfer errors - the majority of PCI bridges, and peripherals support such error detection<br />
<br />
== Why do I need it? ==<br />
<br />
Without the EDAC modules, on most current Linux systems:<br />
<br />
* You may be experiencing PCI data corruption (e.g. your data is being corrupted while travelling over the PCI bus to or from your NIC or storage adapter) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you get no more information about what caused it).<br />
* If you have ECC memory and are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed, you will find out about bad memory modules before the errors become uncorrectable and potentially corrupt data (and crash the machine). This includes finding out about memory modules which are bad as shipped, before such systems are put into service (saving you time and hassle).<br />
* If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).<br />
<br />
= How to turn it on =<br />
<br />
* PCI error checking can be enabled with:<br />
<br />
<pre><nowiki>dougal:~# modprobe edac_mc<br />
dougal:~# cd /sys/devices/system/edac/pci/<br />
dougal:/sys/devices/system/edac/pci# cat check_pci_parity<br />
0<br />
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
1<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
usb0: rxqlen 0 --> 4<br />
usb0: no IPv6 routers present<br />
EDAC MC: Ver: 2.0.1 May 9 2007<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0<br />
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)<br />
dougal:/sys/devices/system/edac/pci# arecord > /dev/null<br />
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono<br />
Aborted by signal Interrupt...<br />
dougal:/sys/devices/system/edac/pci# cat pci_parity_count<br />
15<br />
dougal:/sys/devices/system/edac/pci# dmesg | tail -4<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
EDAC PCI: Detected Parity Error on 0000:00:09.0<br />
</nowiki></pre><br />
<br />
Oh dear, my laptop sound device seems to be broken! At the moment, PCI checking is built into the edac_mc (memory controller) kernel module; in time it will be split out. As you can see from the above, PCI error checking is turned off by default, and needs to be turned on (using the "echo" command above).<br />
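<br />
When the log fills up with parity messages, a short filter can count them per device. This sketch relies only on the message format shown in the transcript above ("EDAC PCI: Detected Parity Error on" followed by the PCI address). Other kernel versions may word the message differently. The function name is hypothetical.<br />
<br />
```shell
# Count "EDAC PCI: Detected Parity Error on ADDRESS" kernel messages per
# PCI device, reading log text (e.g. dmesg output) on stdin. Prints
# "count address" lines, most frequent device first.
# Usage: dmesg | pci_parity_by_device
pci_parity_by_device() {
    awk '/EDAC PCI: Detected Parity Error on / { n[$NF]++ }
         END { for (dev in n) print n[dev], dev }' | sort -rn
}
```
<br />
Each address can then be passed to <code>lspci -s</code>, as in the transcript, to identify the device.<br />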
<br />
= Help! =<br />
<br />
=== About the Errors that EDAC generates ===<br />
<br />
If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]] and [[WhyAmIgettingPciErrors]]. <b>Please</b> try to rule out the possibilities listed there, and elsewhere on this wiki, before you either open a new bug report or post to the mailing list.<br />
<br />
=== The EDAC Bug Database ===<br />
<br />
If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported. If it has, you can add yourself to the cc list for that bug so that you are automatically informed of updates; if it hasn't, please create a new bug report.<br />
<br />
=== In-kernel documentation ===<br />
<br />
There is some documentation in the kernel source tree, in Documentation/drivers/edac/.<br />
<br />
=== The EDAC Mailing List ===<br />
<br />
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by SourceForge) to a greater or lesser extent. Please remember that not many of them work on EDAC as part of their job (and those who do are paid to keep their employer's systems running), so check the wiki, the bug database and the mailing list archives for your problem first. If you have exhausted these possibilities, then by all means post to [http://sourceforge.net/mail/?group_id=93775 the mailing list]!<br />
<br />
* Be polite<br />
* Please make sure you give all information which might be relevant e.g. your (exact) kernel version<br />
* Be patient<br />
<br />
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.<br />
<br />
== Userspace Tools ==<br />
<br />
There are userspace tools in development at http://sourceforge.net/projects/edac-utils<br />
<br />
The userspace tools need some help; please get involved and help out!<br />
<br />
== Status ==<br />
<br />
The EDAC code is included in Linux kernel version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.<br />
<br />
=== Getting the code ===<br />
<br />
If you want a more recent version than the one in your current kernel, you can download a quilt stack from the SourceForge download page (see below), or use an anonymous SVN checkout:<br />
<br />
<pre><nowiki><br />
$ cd mydev-dir<br />
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/<br />
$ less edac-trunk/edac/patches/README<br />
</nowiki></pre><br />
<br />
Code from before May 2007 can be found in CVS; see the SourceForge project page for CVS information.<br />
<br />
You will need a recent Linux kernel tree to apply the patches to. Alternatively, if you just want to look at the recent changes, you can browse the SVN repository at:<br />
<br />
[http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/]<br />
<br />
== History ==<br />
<br />
The EDAC project was renamed from "bluesmoke" prior to its submission to the mainline Linux kernel. The bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project, developed by Dan Hollis and others, was EDAC's predecessor and its major inspiration; it is no longer maintained.<br />
<br />
== Supported Hardware ==<br />
<br />
=== System Main Memory EDAC ===<br />
<br />
==== Supported Memory Controllers ====<br />
<br />
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Model <br />
| EDAC Driver <br />
| Tech Docs<br />
| Controller Capabilities <br />
| Status <br />
| <br />
|-<br />
| AMD <br />
| Opteron <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| Athlon64 <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| AthlonFX <br />
| [[k8_edac.c]] <br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD]<br />
| [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] <br />
| Supported Development Tree<br />
| <br />
|-<br />
| AMD <br />
| 760 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 762 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| AMD <br />
| 768 <br />
| [[amd76x_edac.c]]<br />
| [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD]<br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7500 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7501 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7505 <br />
| [[e7xxx_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7520 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7525 <br />
| [[e752x_edac.c]] <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82875p <br />
| [[i82875p_edac.c]]<br />
| <br />
| [[EDAC]] <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| e7210 <br />
| <br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 82860 <br />
| [[i82860_edac.c]]<br />
| <br />
| <br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Intel <br />
| 5000(P/V/X)<br />
| [[i5000_edac.c]]<br />
| <br />
| <br />
| Patch in CVS<br />
| <br />
|-<br />
| Intel <br />
| 82443BX/GX(440BX/GX)<br />
| [[i82443bxgx_edac.c]]<br />
| [http://www.intel.com/design/chipsets/440bx/ Intel]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Patch in SVN<br />
| <br />
|-<br />
| [http://www.radisys.com Radisys]<br />
| 82600<br />
| [[r82600_edac.c]]<br />
| [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys]<br />
| [[EDAC]], [[ErrorScrub]]<br />
| Supported (Linux 2.6.16)<br />
| <br />
|-<br />
| Via <br />
| VT82c693/694(Pro133)<br />
| <br />
| [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local]<br />
| [[EDAC]] <br />
| Author Needed<br />
| <br />
|}<br />
<br />
=== Customisation for your Hardware ===<br />
<br />
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as seen by the EDAC driver and the physical labels printed next to the memory module sockets. You can help by working out the relationship for your hardware and adding the information to the [[MemorySlotLabels]] page.<br />
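<br />
Once you know the mapping, EDAC kernels of this era let you store it in sysfs, so that error reports name the physical socket. This sketch assumes a writable per-channel label file at <code>/sys/devices/system/edac/mc/mcN/csrowM/chK_dimm_label</code>, so check that this file exists on your kernel before relying on it. The function names and the mapping format (one "mc csrow channel label" line per entry) are made up for this example.<br />
<br />
```shell
# Write one slot label into the EDAC sysfs entry for a csrow/channel.
# Assumed sysfs file: MC_DIR/mcMC/csrowCSROW/chCHANNEL_dimm_label
# Usage: set_dimm_label MC CSROW CHANNEL LABEL [MC_DIR]
set_dimm_label() {
    mc=$1 row=$2 chan=$3 label=$4
    mc_dir=${5:-/sys/devices/system/edac/mc}
    f="$mc_dir/mc$mc/csrow$row/ch${chan}_dimm_label"
    if [ ! -w "$f" ]; then
        echo "no writable label file: $f" >&2
        return 1
    fi
    printf '%s\n' "$label" > "$f"
}

# Apply a mapping read from stdin, one "mc csrow channel label" entry per
# line; blank lines and lines starting with # are ignored.
# Usage: apply_dimm_labels [MC_DIR]
apply_dimm_labels() {
    while read -r m r c l; do
        case $m in ''|'#'*) continue ;; esac
        set_dimm_label "$m" "$r" "$c" "$l" "$1" || return 1
    done
}
```
<br />
For example, <code>echo '0 0 0 DIMM1' | apply_dimm_labels</code> (the slot name here is a placeholder, not a real board's mapping). sysfs settings do not survive a reboot, so something like this would need to run at each boot, e.g. from an init script.<br />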
<br />
== PCI Error Reporting ==<br />
<br />
The PCI specification includes parity error reporting facilities, and the majority of add-in cards (and of chips designed for use either on add-in cards or on motherboards) support PCI parity error detection and reporting. Some "fake" PCI devices which are not physically connected by a PCI bus (e.g. some ATA host adaptors which are built in to a motherboard chipset) typically do not include this functionality.<br />
<br />
==== Error Detection Overhead ====<br />
<br />
The driver currently only supports error detection via polling. Polling the error status registers of every PCI device can be time-consuming, especially on machines which have many devices, so on such systems you may wish to slow the polling rate, or disable polling altogether.<br />
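<br />
PCI checking can be switched off again using the <code>check_pci_parity</code> switch from the "How to turn it on" example. The sketch below wraps that switch. It also wraps the memory controller polling period, which is less well established: the <code>poll_msec</code> file (the polling period in milliseconds under <code>/sys/devices/system/edac/mc</code>) is not mentioned elsewhere on this page. So is the assumption that it also paces the PCI scan on kernels where PCI checking lives in edac_mc. Check the in-kernel documentation for your version before relying on either. The function names are hypothetical.<br />
<br />
```shell
# Turn EDAC PCI parity polling on (1) or off (0) and print the new value.
# check_pci_parity is the switch used in the "How to turn it on" example.
# Usage: set_pci_parity_check 0|1 [PCI_DIR]
set_pci_parity_check() {
    pci_dir=${2:-/sys/devices/system/edac/pci}
    case $1 in
        0|1) ;;
        *) echo "usage: set_pci_parity_check 0|1 [PCI_DIR]" >&2; return 2 ;;
    esac
    echo "$1" > "$pci_dir/check_pci_parity" || return 1
    cat "$pci_dir/check_pci_parity"
}

# Lengthen the polling period (milliseconds) to poll less often.
# Assumption: poll_msec under the EDAC mc directory sets the period, and on
# kernels where PCI checking is part of edac_mc it also paces the PCI scan.
# Usage: set_edac_poll_msec MILLISECONDS [MC_DIR]
set_edac_poll_msec() {
    mc_dir=${2:-/sys/devices/system/edac/mc}
    case $1 in
        ''|*[!0-9]*) echo "poll period must be a whole number of ms" >&2; return 2 ;;
    esac
    echo "$1" > "$mc_dir/poll_msec" || return 1
    cat "$mc_dir/poll_msec"
}
```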
<br />
==== Faulty Hardware ====<br />
<br />
Some PCI devices (or particular revisions of them) have broken PCI parity detection and report false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page.<br />
<br />
== Help Wanted! ==<br />
<br />
We need your help:<br />
<br />
* Improve this documentation<br />
* [[HowToWriteNewMemoryControllerDrivers]]<br />
* [[HardwareWanted]]<br />
* Test the code<br />
* Report broken hardware for the blacklists<br />
* Create memory slot entries for your hardware<br />
* Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc.)<br />
* Create a script to generate DIMM labels and whitelists from the wiki contents<br />
<br />
== Other Resources ==<br />
<br />
Sourceforge project page [http://bluesmoke.sourceforge.net/]<br />
<br />
An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection]<br />
<br />
The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/]<br />
<br />
A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/]<br />
<br />
== How to use this site ==<br />
<br />
A Wiki is a collaborative site, anyone can contribute and share:<br />
* Edit any page by pressing <b>Edit</b> at the top or the bottom of the page<br />
* Create a link to another page by putting its name in double square brackets (like [[WikiSandBox]] or <code><nowiki>[[quoted words in brackets]]</nowiki></code>)<br />
* Search for page titles or text within pages using the search box at the top of any page<br />
* See [[HelpForBeginners]] to get you going, [[HelpContents]] for all help pages.<br />
<br />
To learn more about what a [[WikiWikiWeb]] is, read about [[MoinMoin]]:[[WhyWikiWorks]] and the [[MoinMoin]]:[[WikiNature]]. Also, consult the [[MoinMoin]]:[[WikiWikiWebFaq]].<br />
<br />
This wiki is powered by MediaWiki.<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=MemoryControllerEdacDriverTemplate&diff=34MemoryControllerEdacDriverTemplate2009-01-16T12:42:05Z<p>WikiSysop: 2 revisions</p>
<hr />
<div>= XYZ Memory Controller EDAC Driver =<br />
<br />
The xyz memory controller EDAC driver supports the following hardware:<br />
<br />
* YYY Corp's XYZ Chipset<br />
<br />
== Memory Controller Capabilities ==<br />
<br />
== Products Which Use the XYZ Hardware ==<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Product Name <br />
| Product Model<br />
| Version<br />
| Tested by <br />
| <br />
|-<br />
| ABC Corp <br />
| Cool Motherboard DEF<br />
| DEFMB4 <br />
| 1a <br />
| youremail@here.com<br />
| <br />
|-<br />
| ABC Corp <br />
| XXX PC Motherboard <br />
| XXX-server <br />
| 1 to 5 <br />
| youremail@here.com<br />
| <br />
|}<br />
<br />
== Known Issues and Workarounds ==<br />
<br />
On the Blah chipset, you have to enable the Froob option in the BIOS<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=R82600_edac.c&diff=31R82600 edac.c2009-01-16T12:42:04Z<p>WikiSysop: 2 revisions</p>
<hr />
<div>= Radisys 82600 Memory Controller EDAC Driver =<br />
<br />
The r82600 memory controller EDAC driver supports the following hardware:<br />
<br />
* Radisys 82600 (all known revisions)<br />
<br />
== Memory Controller Capabilities ==<br />
<br />
== Products Which Use the Radisys 82600 Memory Controller ==<br />
<br />
{| border="1" cellpadding="2" cellspacing="0"<br />
| Manufacturer<br />
| Product Name <br />
| Product Model<br />
| Version<br />
| Tested by <br />
| <br />
|-<br />
| Radisys <br />
| EPC-6315 <br />
| EPC-6315 <br />
| <br />
| tim@buttersideup.com<br />
| <br />
|}<br />
<br />
== Known Issues and Workarounds ==<br />
<br />
The module has only been tested on boards which have a single (soldered-down) bank of 256 MB of ECC PC133 SDRAM.<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=WhyAmIgettingPciErrors&diff=28WhyAmIgettingPciErrors2009-01-16T12:42:04Z<p>WikiSysop: 8 revisions</p>
<hr />
<div>You have probably been getting these errors for ages; it's just that nothing was checking for them until you enabled the EDAC module. Note that your system may be experiencing data corruption, so you should really investigate. The most likely causes are:<br />
<br />
* Your PCI device is broken by design, and reports parity errors when none occur (if you are fairly sure that this is the case, please add it to the list of [[PCIDevicesWithBrokenParityDetection]])<br />
* Your PCI device is faulty (please add it below)<br />
* Your motherboard is faulty (try moving the device to another PCI slot)<br />
* Bad connection caused by dirty connectors - see [[HowToCleanEdgeConnectors]]<br />
* Your power supply is faulty or under-specified<br />
* The electrical supply is faulty (e.g. transient spikes / droops)<br />
* Other electrical noise (either from inside, or outside the system) is causing the problems<br />
<br />
Please expand on this and add links!<br />
<br />
== Devices which are known to cause Genuine PCI Parity Errors ==<br />
<br />
* [[SuperMicro]] H8DAR-E motherboards - see mailing list archives (FIXME - which revisions?)<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=SoftErrors&diff=19SoftErrors2009-01-16T12:42:04Z<p>WikiSysop: 1 revision</p>
<hr />
<div>Soft Errors ("Single Event Upsets") are caused by radiation.<br />
<br />
A Single Event Upset, basically means that a high energy particle (e.g. from a cosmic "particle shower") arrives in a semiconductor device (e.g. a DRAM part), and redistributes, or dumps enough charge there to change the logic state of information stored in some part of the device (i.e. flips a bit).<br />
<br />
This radiation can come from within, or close to, the device itself (e.g. from lead used in device manufacture or soldering), or from distant sources, such as cosmic radiation.<br />
<br />
As of the time of writing, the vast majority of soft errors are believed to originate from cosmic radiation.<br />
<br />
The use of BPSG (Boro-Phospho-Silicate-Glass) [http://www.semiconfareast.com/dielectric.htm dielectric] layers in semiconductors can make a device many times more vulnerable to cosmic radiation, and most manufacturers have switched to other passivation layers as a result.<br />
<br />
There are a few references in this mailing list post:<br />
<br />
[http://sourceforge.net/mailarchive/message.php?msg_id=12124702]<br />
<br />
Note, however, that recent manufacturer datasheets which I have seen quote error rates roughly (from memory) two orders of magnitude <b>lower</b> than the levels predicted by most references in the above mailing list post.<br />
<br />
If you have a large number of systems, then please share your experiences here and/or on the mailing list.<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=HardwareWanted&diff=17HardwareWanted2009-01-16T12:42:04Z<p>WikiSysop: 2 revisions</p>
<hr />
<div>= Hardware Wanted =<br />
<br />
The nature of the EDAC project is such that if developers don't have access to the necessary hardware, development is next to impossible, and maintenance is hard. Broken hardware is often very useful (particularly DIMMs).<br />
<br />
= How to Donate =<br />
<br />
== General Hardware ==<br />
<br />
If you have hardware which is not listed here, but which you think could be useful to the project, please [http://sourceforge.net/mail/?group_id=93775 contact the developer mailing list]<br />
<br />
== Specific Hardware Requests ==<br />
<br />
If you have any pieces of hardware around which you are willing to donate, then please contact the developers below, and CC the developer mailing list.<br />
<br />
PC100 ECC DIMMs (any size) - [mailto:tim@buttersideup.com]<br />
<br />
PC133 ECC DIMMs (any size) - [mailto:tim@buttersideup.com]<br />
<br />
Intel 440GX based motherboard - [mailto:tim@buttersideup.com]<br />
<br />
Intel Blackford/Greencreek (5000P/V/X chipset) motherboard, to port the bluesmoke MC driver to EDAC - [mailto:norsk5@xmission.com]<br />
<br />
__NOTOC__</div>WikiSysophttps://buttersideup.com/mediawiki/index.php?title=MemorySlotLabels&diff=14MemorySlotLabels2009-01-16T12:42:04Z<p>WikiSysop: 2 revisions</p>
<hr />
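<div>Use this page to record how the physical memory slot labels on your motherboard correspond to the memory banks/slots reported by the EDAC driver.<br />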
<br />
Please include the BIOS "DMI" string (as produced by the "dmidecode" program) with your definition.<br />
<br />
= Tyan Tiger MP (S2460) (AMD760) =<br />
<pre><nowiki><br />
# dmidecode 2.7<br />
SMBIOS 2.3 present.<br />
31 structures occupying 914 bytes.<br />
Table at 0x000EF590.<br />
<br />
Handle 0x0000, DMI type 0, 20 bytes.<br />
BIOS Information<br />
Vendor: Phoenix Technologies Ltd.<br />
Version: PGNA02-8<br />
Release Date: 12/12/2002<br />
Address: 0xE4B40<br />
Runtime Size: 111808 bytes<br />
ROM Size: 256 kB<br />
Characteristics:<br />
PCI is supported<br />
PNP is supported<br />
APM is supported<br />
BIOS is upgradeable<br />
BIOS shadowing is allowed<br />
ESCD support is available<br />
Boot from CD is supported<br />
Selectable boot is supported<br />
EDD is supported<br />
5.25"/360 KB floppy services are supported (int 13h)<br />
5.25"/1.2 MB floppy services are supported (int 13h)<br />
3.5"/720 KB floppy services are supported (int 13h)<br />
3.5"/2.88 MB floppy services are supported (int 13h)<br />
Print screen service is supported (int 5h)<br />
8042 keyboard services are supported (int 9h)<br />
Serial services are supported (int 14h)<br />
Printer services are supported (int 17h)<br />
CGA/mono video services are supported (int 10h)<br />
ACPI is supported<br />
USB legacy is supported<br />
AGP is supported<br />
LS-120 boot is supported<br />
[...]<br />
Handle 0x0002, DMI type 2, 8 bytes.<br />
Base Board Information<br />
Manufacturer: Tyan<br />
Product Name: Guinness<br />
[...]<br />
Handle 0x0006, DMI type 5, 24 bytes.<br />
Memory Controller Information<br />
Error Detecting Method: 128-bit ECC<br />
Error Correcting Capabilities:<br />
Other<br />
Supported Interleave: One-way Interleave<br />
Current Interleave: One-way Interleave<br />
Maximum Memory Module Size: 4096 MB<br />
Maximum Total Memory Size: 16384 MB<br />
Supported Speeds:<br />
Other<br />
Supported Memory Types:<br />
DIMM<br />
Memory Module Voltage: 3.3 V<br />
Associated Memory Slots: 4<br />
0x0006 <--Wrong! Should be 0x7-0xA<br />
0x0007<br />
0x0008<br />
0x0009<br />
Enabled Error Correcting Capabilities:<br />
Unknown<br />
<br />
Handle 0x0007, DMI type 6, 12 bytes.<br />
Memory Module Information<br />
Socket Designation: DIM1<br />
Bank Connections: 7 6<br />
Current Speed: 10 ns<br />
Type: DIMM SDRAM<br />
Installed Size: Not Installed<br />
Enabled Size: Not Installed<br />
Error Status: OK<br />
<br />
Handle 0x0008, DMI type 6, 12 bytes.<br />
Memory Module Information<br />
Socket Designation: DIM2<br />
Bank Connections: 5 4<br />
Current Speed: 10 ns<br />
Type: DIMM SDRAM<br />
Installed Size: Not Installed<br />
Enabled Size: Not Installed<br />
Error Status: OK<br />
<br />
Handle 0x0009, DMI type 6, 12 bytes.<br />
Memory Module Information<br />
Socket Designation: DIM3<br />
Bank Connections: 3 2<br />
Current Speed: 10 ns<br />
Type: DIMM SDRAM<br />
Installed Size: 8 MB (Single-bank Connection)<br />
Enabled Size: 8 MB (Single-bank Connection)<br />
Error Status: OK<br />
<br />
Handle 0x000A, DMI type 6, 12 bytes.<br />
Memory Module Information<br />
Socket Designation: DIM4<br />
Bank Connections: 1 0<br />
Current Speed: 10 ns<br />
Type: DIMM SDRAM<br />
Installed Size: 8 MB (Single-bank Connection)<br />
Enabled Size: 8 MB (Single-bank Connection)<br />
Error Status: OK<br />
</nowiki></pre><br />
Homepage for this board: http://www.tyan.com/archive/products/html/tigermp.html<br />
Manual: ftp://ftp.tyan.com/manuals/m_s2460_100.pdf (rev 1.00) or ftp://ftp.tyan.com/manuals/m_s2460_103.pdf (rev 1.03); see pages 11 and 21.<br />
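<br />
Long dmidecode listings are easier to work with if you extract just the socket names and bank connections from the "Memory Module Information" (DMI type 6) records. This sketch assumes the dmidecode 2.x text format shown above, and the function name is hypothetical. As the "Wrong!" note in the listing shows, BIOS DMI data can be inaccurate. Confirm any mapping by testing, e.g. by moving a module and watching which row the EDAC driver reports it on.<br />
<br />
```shell
# Summarise dmidecode "Memory Module Information" (DMI type 6) records as
# "socket | banks N M | installed size" lines, reading dmidecode output on stdin.
# Usage: dmidecode | dmi_socket_map
dmi_socket_map() {
    awk -F': ' '
        /^Handle / { in6 = ($0 ~ /DMI type 6,/); sock = ""; banks = "" }
        in6 && /Socket Designation:/ { sock = $2 }
        in6 && /Bank Connections:/   { banks = $2 }
        in6 && /Installed Size:/     { print sock " | banks " banks " | " $2 }
    '
}
```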
<br />
__NOTOC__</div>WikiSysop