Difference between revisions of "Main Page"
Marionjajo (talk | contribs) |
(Undo revision 176 by Marionjajo (Talk)) |
||
Line 1: | Line 1: | ||
+ | <!-- ## Please edit system and help pages ONLY in the moinmaster wiki! For more |
||
||
+ | --> |
||
+ | <!-- ## information, please see [[MoinMaster]]:[[MoinPagesEditorGroup]]. |
||
+ | --> |
||
+ | <!-- ##master-page:[[FrontPage]] |
||
+ | --> |
||
+ | <!-- #format wiki |
||
+ | --> |
||
+ | <!-- #language en |
||
+ | --> |
||
+ | <!-- #pragma section-numbers off |
||
+ | --> |
||
+ | = EDAC Wiki = |
||
+ | |||
+ | This is a wiki for the [http://bluesmoke.sourceforge.net/ Linux EDAC project] |
||
+ | |||
+ | == What is it? == |
||
+ | |||
+ | [http://en.wikipedia.org/wiki/Error_detection_and_correction EDAC] stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported: |
||
+ | |||
+ | * [http://en.wikipedia.org/wiki/Dynamic_random_access_memory#Errors_and_error_correction System RAM errors] (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate |
||
+ | * [http://en.wikipedia.org/wiki/Memory_scrubbing RAM scrubbing] - some memory controllers support "scrubbing" DRAM during normal operation. Continuously scrubbing DRAM allows for actively detecting and correcting ECC errors. |
||
+ | * PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection |
||
+ | * Cache ECC errors |
||
+ | |||
+ | == Why do I need it? == |
||
+ | |||
+ | Without the EDAC modules, on most current Linux systems: |
||
+ | |||
+ | * You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter on the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI). |
||
+ | * If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle). |
||
+ | * If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted). |
||
+ | |||
+ | = How to turn it on = |
||
+ | |||
+ | * PCI error checking can be enabled with: |
||
+ | |||
+ | <pre><nowiki>dougal:~# modprobe edac_mc |
||
+ | dougal:~# cd /sys/devices/system/edac/pci/ |
||
+ | dougal:/sys/devices/system/edac/pci# cat check_pci_parity |
||
+ | 0 |
||
+ | dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity |
||
+ | dougal:/sys/devices/system/edac/pci# cat pci_parity_count |
||
+ | 1 |
||
+ | dougal:/sys/devices/system/edac/pci# dmesg | tail -4 |
||
+ | usb0: rxqlen 0 --> 4 |
||
+ | usb0: no IPv6 routers present |
||
+ | EDAC MC: Ver: 2.0.1 May 9 2007 |
||
+ | EDAC PCI: Detected Parity Error on 0000:00:09.0 |
||
+ | dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0 |
||
+ | 00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02) |
||
+ | dougal:/sys/devices/system/edac/pci# arecord > /dev/null |
||
+ | Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono |
||
+ | Aborted by signal Interrupt... |
||
+ | dougal:/sys/devices/system/edac/pci# cat pci_parity_count |
||
+ | 15 |
||
+ | dougal:/sys/devices/system/edac/pci# dmesg | tail -4 |
||
+ | EDAC PCI: Detected Parity Error on 0000:00:09.0 |
||
+ | EDAC PCI: Detected Parity Error on 0000:00:09.0 |
||
+ | EDAC PCI: Detected Parity Error on 0000:00:09.0 |
||
+ | EDAC PCI: Detected Parity Error on 0000:00:09.0 |
||
+ | </nowiki></pre> |
||
+ | |||
+ | Oh dear, my laptop sound device seems to be broken! At the moment, PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As the session above shows, PCI error checking is turned off by default and must be turned on (using the "echo" command above). |
||
+ | |||
+ | = Help! = |
||
+ | |||
+ | === About the Errors that EDAC generates === |
||
+ | |||
+ | If the EDAC subsystem is reporting errors on your system, please see [[WhyAmIgettingMemoryErrors]], and [[WhyAmIgettingPciErrors]]. <b>Please</b> try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list. |
||
+ | |||
+ | === The EDAC Bug Database === |
||
+ | |||
+ | If you think you've found a bug, please <b>search</b> [http://edacbugs.buttersideup.com/ the EDAC Bugzilla] to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report. |
||
+ | |||
+ | === In-kernel documentation === |
||
+ | |||
+ | There is some documentation in the kernel in [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/edac.txt Documentation/edac.txt ]. |
||
+ | |||
+ | === The EDAC Mailing List === |
||
+ | |||
+ | Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives (for both the [http://marc.info/?l=linux-edac current] and the [http://sourceforge.net/mail/?group_id=93775 previous] mailing lists) for your problem first. |
||
+ | |||
+ | If you have exhausted these possibilities, then by all means post to [http://vger.kernel.org/vger-lists.html#linux-edac the mailing list]... |
||
+ | |||
+ | * Be polite |
||
+ | * Please make sure you give all information which might be relevant e.g. your (exact) kernel version |
||
+ | * Be patient |
||
+ | |||
+ | If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others. |
||
+ | |||
+ | == Userspace Tools == |
||
+ | |||
+ | There are userspace tools in development at http://sourceforge.net/projects/edac-utils |
||
+ | |||
+ | The userspace tools need some help, so please get involved and help out! |
||
+ | |||
+ | == Status == |
||
+ | |||
+ | The EDAC code is in Linux Kernel version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above. |
||
+ | |||
+ | === Getting the code === |
||
+ | |||
+ | If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or by anonymous SVN checkout: |
||
+ | |||
+ | <pre><nowiki> |
||
+ | $ cd mydev-dir |
||
+ | $ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/ |
||
+ | $ less bluesmoke/edac/patches/README |
||
+ | </nowiki></pre> |
||
+ | |||
+ | Prior to May 2007, things can be found in CVS. See the sourceforge main page for CVS information. |
||
+ | |||
+ | You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at: |
||
+ | |||
+ | [http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/] |
||
+ | |||
+ | == History == |
||
+ | |||
+ | The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained. |
||
+ | |||
+ | == Supported Hardware == |
||
+ | |||
+ | === System Main Memory EDAC === |
||
+ | |||
+ | ==== Supported Memory Controllers ==== |
||
+ | |||
+ | Please see the individual driver pages for information on supported revisions, motherboard-specific information etc. |
||
+ | |||
+ | {| border="1" cellpadding="2" cellspacing="0" |
||
+ | | Manufacturer |
||
+ | | Model |
||
+ | | EDAC Driver |
||
+ | | Tech Docs |
||
+ | | Controller Capabilities |
||
+ | | Status |
||
+ | | |
||
+ | |- |
||
+ | | AMCC |
||
+ | | 4xx |
||
+ | | [[ppc4xx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.30) |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | Opteron |
||
+ | | [[amd64_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD] |
||
+ | | [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] |
||
+ | | Supported Development Tree |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | Athlon64 |
||
+ | | [[amd64_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD] |
||
+ | | [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] |
||
+ | | Supported Development Tree |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | AthlonFX |
||
+ | | [[amd64_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF AMD] |
||
+ | | [[EDAC]], [[ErrorScrub]], [[BackgroundScrub]] |
||
+ | | Supported Development Tree |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | 760 |
||
+ | | [[amd76x_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD] |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | 762 |
||
+ | | [[amd76x_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24462.pdf AMD] |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | 768 |
||
+ | | [[amd76x_edac.c]] |
||
+ | | [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24467.pdf AMD] |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | AMD |
||
+ | | 8111 |
||
+ | | [[amd8111_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.30) |
||
+ | | |
||
+ | |- |
||
+ | | Freescale |
||
+ | | MPC83xx |
||
+ | | [[mpc85xx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.32) |
||
+ | | |
||
+ | |- |
||
+ | | Freescale |
||
+ | | MPC85xx |
||
+ | | [[mpc85xx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.25) |
||
+ | | |
||
+ | |- |
||
+ | | Freescale |
||
+ | | P2020 |
||
+ | | [[mpc85xx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.32) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7500 |
||
+ | | [[e7xxx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7501 |
||
+ | | [[e7xxx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7505 |
||
+ | | [[e7xxx_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7520 |
||
+ | | [[e752x_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7525 |
||
+ | | [[e752x_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | i3100 |
||
+ | | [[e752x_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.26) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | i3200 and i3210 |
||
+ | | [[i3200_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.??) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | 82875p |
||
+ | | [[i82875p_edac.c]] |
||
+ | | |
||
+ | | [[EDAC]] |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | e7210 |
||
+ | | |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | 82860 |
||
+ | | [[i82860_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | 5000(P/V/X) |
||
+ | | [[i5000_edac.c]] |
||
+ | | |
||
+ | | |
||
+ | | Patch in CVS |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | 82443BX/GX(440BX/GX) |
||
+ | | [[i82443bxgx_edac.c]] |
||
+ | | [http://www.intel.com/design/chipsets/440bx/ Intel] |
||
+ | | [[EDAC]], [[ErrorScrub]] |
||
+ | | Patch in SVN |
||
+ | | |
||
+ | |- |
||
+ | | [http://www.radisys.com Radisys] |
||
+ | | 82600 |
||
+ | | [[r82600_edac.c]] |
||
+ | | [http://www.radisys.com/products/ds-page.cfm?ProductDatasheetsID=1054 Radisys] |
||
+ | | [[EDAC]], [[ErrorScrub]] |
||
+ | | Supported (Linux 2.6.16) |
||
+ | | |
||
+ | |- |
||
+ | | Via |
||
+ | | VT82c693/694(Pro133) |
||
+ | | |
||
+ | | [http://buttersideup.com/files/Via_Apollo_133_datasheets/ Local] |
||
+ | | [[EDAC]] |
||
+ | | Author Needed |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | 3000/3010 |
||
+ | | [[i3000_edac.c]] |
||
+ | | [http://www.intel.com/products/server/chipsets/3010/3010-overview.htm Intel] |
||
+ | | [[EDAC]] |
||
+ | | Supported (Linux 2.6.25) |
||
+ | | |
||
+ | |- |
||
+ | | Intel |
||
+ | | X38 |
||
+ | | [[x38_edac.c]] |
||
+ | | [http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm Intel] |
||
+ | | [[EDAC]] |
||
+ | | Supported (Linux 2.6.28) |
||
+ | | |
||
+ | |} |
||
+ | |||
+ | === Customisation for your Hardware === |
||
+ | |||
+ | For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver and the physical labels present next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the [[MemorySlotLabels]] page. |
||
+ | |||
+ | == PCI Error Reporting == |
||
+ | |||
+ | PCI Parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection, and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (such as e.g. some ATA host adaptors which are built-in to a motherboard chipset) typically do not include the functionality. |
||
+ | |||
+ | ==== Error Detection Overhead ==== |
||
+ | |||
+ | The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems. |
||
+ | |||
+ | ==== Faulty Hardware ==== |
||
+ | |||
+ | Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the [[PCIDevicesWithBrokenParityDetection]] page. |
||
+ | |||
+ | == Help Wanted! == |
||
+ | |||
+ | We need your help: |
||
+ | |||
+ | * Improve this documentation |
||
+ | * [[HowToWriteNewMemoryControllerDrivers]] |
||
+ | * [[HardwareWanted]] |
||
+ | * Test the code |
||
+ | * Report broken hardware for the blacklists |
||
+ | * Create memory slot entries for your hardware |
||
+ | * Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.) |
||
+ | * Create a script to generate dimm labels, whitelists from the WIKI contents |
||
+ | |||
+ | == Other Resources == |
||
+ | |||
+ | Sourceforge project page [http://bluesmoke.sourceforge.net/] |
||
+ | |||
+ | An overview of EDAC technologies on Wikipedia [http://en.wikipedia.org/wiki/Error_correction_and_detection] |
||
+ | |||
+ | The original Linux ECC project (Dan Hollis et al) - [http://www.anime.net/~goemon/linux-ecc/] |
||
+ | |||
+ | A talk delivered by Tim Small at UKUUG 2006 - [http://buttersideup.com/edac-ukuug-2006-talk/slides/] |
||
+ | |||
+ | Mailing list [http://vger.kernel.org/vger-lists.html#linux-edac] |
||
+ | |||
+ | == How to use this site == |
||
+ | |||
+ | A Wiki is a collaborative site; anyone can contribute and share: |
||
+ | * Edit any page by pressing <b>Edit</b> at the top of the page |
||
+ | * [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ] |
Revision as of 10:13, 14 October 2011
EDAC Wiki
This is a wiki for the Linux EDAC project
What is it?
EDAC stands for "Error Detection and Correction". The Linux EDAC project comprises a series of Linux kernel modules which make use of the error detection facilities of computer hardware. Hardware which detects the following errors is currently supported:
- System RAM errors (this is the original, and most mature, part of the project) - many computers support RAM EDAC (especially those with chipsets aimed at high-reliability applications), but RAM which has extra storage capacity ("ECC RAM") is needed for these facilities to operate
- RAM scrubbing - some memory controllers support "scrubbing" DRAM during normal operation. Continuously scrubbing DRAM allows for actively detecting and correcting ECC errors.
- PCI bus transfer errors - the majority of PCI bridges and peripherals support such error detection
- Cache ECC errors
Why do I need it?
Without the EDAC modules, on most current Linux systems:
- You may be experiencing PCI data corruption (e.g. your data is being corrupted whilst travelling to/from your NIC/storage adapter on the PCI bus) and not know about it, as most systems do not check PCI devices for reported PCI parity errors (some may trigger an NMI, but you have no more info about what caused the NMI).
- If you have ECC memory, and you are experiencing correctable ECC errors, you probably won't know anything about it. With the EDAC modules installed on your system, you will get to know about bad memory modules before the errors become uncorrectable, and you have potentially corrupted data (and a crashed machine) - this includes finding out about memory modules which are bad as-shipped, before such systems are put into service (saving you time and hassle).
- If you have a motherboard which claims to support ECC, but the BIOS is not correctly enabling ECC mode, you won't know anything about it (until your machine crashes with unexplained memory errors - you won't even get an NMI, and the extra money spent on ECC memory will be wasted).
How to turn it on
- PCI error checking can be enabled with:
dougal:~# modprobe edac_mc
dougal:~# cd /sys/devices/system/edac/pci/
dougal:/sys/devices/system/edac/pci# cat check_pci_parity
0
dougal:/sys/devices/system/edac/pci# echo 1 > check_pci_parity
dougal:/sys/devices/system/edac/pci# cat pci_parity_count
1
dougal:/sys/devices/system/edac/pci# dmesg | tail -4
usb0: rxqlen 0 --> 4
usb0: no IPv6 routers present
EDAC MC: Ver: 2.0.1 May 9 2007
EDAC PCI: Detected Parity Error on 0000:00:09.0
dougal:/sys/devices/system/edac/pci# lspci -s 0000:00:09.0
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)
dougal:/sys/devices/system/edac/pci# arecord > /dev/null
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono
Aborted by signal Interrupt...
dougal:/sys/devices/system/edac/pci# cat pci_parity_count
15
dougal:/sys/devices/system/edac/pci# dmesg | tail -4
EDAC PCI: Detected Parity Error on 0000:00:09.0
EDAC PCI: Detected Parity Error on 0000:00:09.0
EDAC PCI: Detected Parity Error on 0000:00:09.0
EDAC PCI: Detected Parity Error on 0000:00:09.0
Oh dear, my laptop sound device seems to be broken! At the moment, PCI checking is built into the edac_mc (memory controller) kernel module; in time this will be split out. As the session above shows, PCI error checking is turned off by default and must be turned on (using the "echo" command above).
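The manual steps above can be wrapped in a small shell function. This is only a sketch: the function name `enable_pci_parity_check` is made up for illustration, and it assumes the sysfs directory and file names (`check_pci_parity`, `pci_parity_count`) are exactly those shown in the session above, which may differ between kernel versions.

```shell
#!/bin/sh
# Sketch: turn on EDAC PCI parity checking and print the current error count.
# The function name is hypothetical; the default directory and the file names
# are the ones used in the session above and may differ on other kernels.

enable_pci_parity_check() {
    dir="${1:-/sys/devices/system/edac/pci}"
    if [ ! -w "$dir/check_pci_parity" ]; then
        echo "cannot write $dir/check_pci_parity (is edac_mc loaded? are you root?)" >&2
        return 1
    fi
    # Same effect as the manual "echo 1 > check_pci_parity" step.
    echo 1 > "$dir/check_pci_parity"
    echo "check_pci_parity=$(cat "$dir/check_pci_parity")"
    echo "pci_parity_count=$(cat "$dir/pci_parity_count")"
}
```

After sourcing the file, run `enable_pci_parity_check` as root, or pass a different directory as the first argument if your kernel exposes the files elsewhere.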
Help!
About the Errors that EDAC generates
If the EDAC subsystem is reporting errors on your system, please see WhyAmIgettingMemoryErrors, and WhyAmIgettingPciErrors. Please try and check out the possibilities listed here, and elsewhere on this wiki, before you either open a new bug report, or post to the mailing list.
The EDAC Bug Database
If you think you've found a bug, please search the EDAC Bugzilla to see if it has already been reported (you can then add yourself to the cc list for that bug, so that you are automatically informed of updates etc.). If it hasn't, then please create a new bug report.
In-kernel documentation
There is some documentation in the kernel in Documentation/edac.txt.
The EDAC Mailing List
Most of the EDAC developers keep an eye on the EDAC mailing list (hosted by Sourceforge) to a greater or lesser extent, but please remember that not many of them work on EDAC as part of their job, (and if they do, then they are paid to keep their employer's systems running), so check the Wiki, the bug database, and the mailing list archives (for both the current and the previous mailing lists) for your problem first.
If you have exhausted these possibilities, then by all means post to the mailing list...
- Be polite
- Please make sure you give all information which might be relevant e.g. your (exact) kernel version
- Be patient
If you get a reply, or find things out which weren't known about before, please add the information to this Wiki, in order to help others.
Userspace Tools
There are userspace tools in development at http://sourceforge.net/projects/edac-utils
The userspace tools need some help, so please get involved and help out!
Status
The EDAC code is in Linux Kernel version 2.6.16. There is a userspace API (via sysfs) in 2.6.18 and above.
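As an illustration of what the sysfs API makes possible, the sketch below totals the correctable and uncorrectable error counters across all memory controllers. It assumes each controller appears as a directory `mcN` under `/sys/devices/system/edac/mc` containing `ce_count` and `ue_count` files; that layout is taken from the in-kernel EDAC documentation rather than from this page, so check it against your kernel. The function name `edac_mc_totals` is hypothetical.

```shell
#!/bin/sh
# Sketch: sum correctable (CE) and uncorrectable (UE) error counts over all
# EDAC memory controllers. Assumes <root>/mcN/ce_count and <root>/mcN/ue_count
# exist; this layout is an assumption and may vary between kernel versions.

edac_mc_totals() {
    root="${1:-/sys/devices/system/edac/mc}"
    ce_total=0
    ue_total=0
    for mc in "$root"/mc[0-9]*; do
        # Skip controllers without readable counters; this also covers the
        # case where the glob matched nothing and is passed through literally.
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        ce_total=$((ce_total + $(cat "$mc/ce_count")))
        ue_total=$((ue_total + $(cat "$mc/ue_count")))
    done
    echo "ce=$ce_total ue=$ue_total"
}
```

A non-zero `ce` total is the early warning described earlier on this page: correctable errors are being fixed by the hardware, and the module may be worth replacing before the errors become uncorrectable.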
Getting the code
If you want a more recent version than the version in your current kernel, you can download a quilt stack from the sourceforge download page (see below), or by anonymous SVN checkout:
$ cd mydev-dir
$ svn checkout https://bluesmoke.svn.sourceforge.net/svnroot/bluesmoke/trunk edac-trunk/
$ less bluesmoke/edac/patches/README
Prior to May 2007, things can be found in CVS. See the sourceforge main page for CVS information.
You will need a recent Linux kernel tree to apply the patches to. Or, if you just want to have a look at the recent changes, you can browse the SVN at: http://bluesmoke.svn.sourceforge.net/viewvc/bluesmoke/
History
The EDAC project was renamed from "bluesmoke" prior to submission to the mainline Linux kernel. The Bluesmoke code was created by Thayne Harbaugh. The Linux-ECC project was EDAC's predecessor and its major inspiration. Developed by Dan Hollis and others, the Linux-ECC project is no longer maintained.
Supported Hardware
System Main Memory EDAC
Supported Memory Controllers
Please see the individual driver pages for information on supported revisions, motherboard-specific information etc.
Manufacturer | Model | EDAC Driver | Tech Docs | Controller Capabilities | Status
AMCC | 4xx | ppc4xx_edac.c | - | - | Supported (Linux 2.6.30)
AMD | Opteron | amd64_edac.c | AMD | EDAC, ErrorScrub, BackgroundScrub | Supported Development Tree
AMD | Athlon64 | amd64_edac.c | AMD | EDAC, ErrorScrub, BackgroundScrub | Supported Development Tree
AMD | AthlonFX | amd64_edac.c | AMD | EDAC, ErrorScrub, BackgroundScrub | Supported Development Tree
AMD | 760 | amd76x_edac.c | AMD | - | Supported (Linux 2.6.16)
AMD | 762 | amd76x_edac.c | AMD | - | Supported (Linux 2.6.16)
AMD | 768 | amd76x_edac.c | AMD | - | Supported (Linux 2.6.16)
AMD | 8111 | amd8111_edac.c | - | - | Supported (Linux 2.6.30)
Freescale | MPC83xx | mpc85xx_edac.c | - | - | Supported (Linux 2.6.32)
Freescale | MPC85xx | mpc85xx_edac.c | - | - | Supported (Linux 2.6.25)
Freescale | P2020 | mpc85xx_edac.c | - | - | Supported (Linux 2.6.32)
Intel | e7500 | e7xxx_edac.c | - | - | Supported (Linux 2.6.16)
Intel | e7501 | e7xxx_edac.c | - | - | Supported (Linux 2.6.16)
Intel | e7505 | e7xxx_edac.c | - | - | Supported (Linux 2.6.16)
Intel | e7520 | e752x_edac.c | - | - | Supported (Linux 2.6.16)
Intel | e7525 | e752x_edac.c | - | - | Supported (Linux 2.6.16)
Intel | i3100 | e752x_edac.c | - | - | Supported (Linux 2.6.26)
Intel | i3200 and i3210 | i3200_edac.c | - | - | Supported (Linux 2.6.??)
Intel | 82875p | i82875p_edac.c | - | EDAC | Supported (Linux 2.6.16)
Intel | e7210 | - | - | - | Supported (Linux 2.6.16)
Intel | 82860 | i82860_edac.c | - | - | Supported (Linux 2.6.16)
Intel | 5000(P/V/X) | i5000_edac.c | - | - | Patch in CVS
Intel | 82443BX/GX(440BX/GX) | i82443bxgx_edac.c | Intel | EDAC, ErrorScrub | Patch in SVN
Radisys | 82600 | r82600_edac.c | Radisys | EDAC, ErrorScrub | Supported (Linux 2.6.16)
Via | VT82c693/694(Pro133) | - | Local | EDAC | Author Needed
Intel | 3000/3010 | i3000_edac.c | Intel | EDAC | Supported (Linux 2.6.25)
Intel | X38 | x38_edac.c | Intel | EDAC | Supported (Linux 2.6.28)
Customisation for your Hardware
For many chipsets and motherboards, there is no consistent relationship between the memory banks/slots as made available to the EDAC driver and the physical labels present next to the memory module sockets. You can help by working out the relationship for your hardware, and adding the info to the MemorySlotLabels page.
PCI Error Reporting
PCI Parity error reporting facilities are included in the PCI specification, and the majority of add-in cards (and chips which are capable of being included in either add-in, or on-motherboard designs) support the PCI parity error detection, and reporting functionality. Some "fake" PCI devices which are not physically connected by a PCI bus (such as e.g. some ATA host adaptors which are built-in to a motherboard chipset) typically do not include the functionality.
Error Detection Overhead
The driver currently only supports error detection via polling. Polling all of the PCI devices' error status registers can be time consuming, especially on machines which have many devices. You may wish to slow the error polling rate, or disable it altogether on such systems.
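To get a rough idea of how much work each polling pass involves, you can count the PCI devices the kernel knows about. The sketch below assumes they are listed one entry per device under `/sys/bus/pci/devices`; the function name `count_pci_devices` is made up for illustration, and the count is only a proxy for polling cost, not a measurement of it.

```shell
#!/bin/sh
# Sketch: count PCI devices as a rough proxy for per-poll workload.
# Assumes one directory entry per device under /sys/bus/pci/devices.

count_pci_devices() {
    dir="${1:-/sys/bus/pci/devices}"
    n=0
    for dev in "$dir"/*; do
        # Guard against an empty directory, where the glob stays literal.
        [ -e "$dev" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

If the count is large and polling proves too costly, writing 0 to the `check_pci_parity` file shown earlier should switch the checking back off, on the assumption that it simply reverses the `echo 1` step.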
Faulty Hardware
Some PCI devices (or just particular revisions of those devices) are broken with respect to PCI parity detection, and display false positives. You can check (and add to) the list of broken devices on the PCIDevicesWithBrokenParityDetection page.
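Before reporting a device as broken, it can help to see which device addresses account for the parity errors in the kernel log. The sketch below tallies log lines of the form "EDAC PCI: Detected Parity Error on <address>", as shown in the example session earlier on this page; the function name `tally_parity_errors` is hypothetical, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Sketch: count EDAC PCI parity error log lines per device address.
# Reads kernel log text on stdin and prints "<count> <address>" lines,
# most frequent first. The message format is assumed to match the
# example session on this page.

tally_parity_errors() {
    grep 'EDAC PCI: Detected Parity Error on ' |
        sed 's/.*Detected Parity Error on \([0-9a-fA-F:.]*\).*/\1/' |
        sort | uniq -c | sort -rn
}
```

Typical use would be `dmesg | tally_parity_errors`, followed by `lspci -s <address>` on the top entry to identify the device.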
Help Wanted!
We need your help:
- Improve this documentation
- HowToWriteNewMemoryControllerDrivers
- HardwareWanted
- Test the code
- Report broken hardware for the blacklists
- Create memory slot entries for your hardware
- Create some user-space code (e.g. scripts to go in a cron job, extensions to SNMP daemons etc. etc.)
- Create a script to generate dimm labels, whitelists from the WIKI contents
Other Resources
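Sourceforge project page - http://bluesmoke.sourceforge.net/
An overview of EDAC technologies on Wikipedia - http://en.wikipedia.org/wiki/Error_correction_and_detection
The original Linux ECC project (Dan Hollis et al) - http://www.anime.net/~goemon/linux-ecc/
A talk delivered by Tim Small at UKUUG 2006 - http://buttersideup.com/edac-ukuug-2006-talk/slides/
Mailing list - http://vger.kernel.org/vger-lists.html#linux-edac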
How to use this site
A Wiki is a collaborative site; anyone can contribute and share:
- Edit any page by pressing Edit at the top of the page
- MediaWiki FAQ