Howto setup IPMI under Linux (Debian / Sarge) on the Intel
SR2300 Server Chassis (Intel Server Board SE7501WV2)
$Revision: 1.11 $ $Author: tim $
Introduction
This document describes how to set up Debian / Sarge to take
advantage of the management features of the Intel SR2300. This
chassis uses the Intel Server Board SE7501WV2, but nearly all of
this is also relevant to other related Intel server motherboards
(such as the SE7501BR2 and the SE7501HG2), and a lot of it will be
relevant to other boards which implement IPMI v1.5 or later.
At the time of writing (July 2004), the Linux IPMI support is quite
mature, but I found that information was on the sparse side, and
getting a working system together seemed to require a lot of
googling, reading of thick technical documentation, and stabbing in
the dark. Hence this document - the purpose of which is to allow
Semantico staff to recreate the IPMI-based installation which I
carried out during July, but which will hopefully be helpful to
others as well.
What is IPMI, and why should I care?
The original motivation for setting up IPMI for me was to make use
of Serial Over LAN - this allows you to deploy these servers in a
remote location, make only power, and Ethernet connections to each
server, and yet still get nearly all of the benefits of expensive
KVM, or other remote control systems - such as those built around
serial concentrators, with:
- Less wiring
- Less hardware
- Lower cost
IPMI stands for Intelligent Platform
Management Interface and is an open standard for machine
health, and control (including remote control), and is implemented
by many hardware vendors - Intel is one of the originators, and
early adopters of the standard. Here are some useful things
that IPMI can do on the SR2300 with Linux:
- Check on hardware health, and report on problems (via the OS,
or autonomously via the network)
- Provide a watchdog timer (in case the OS goes away, or programs
can otherwise not run, the machine will be reset)
- Provide remote "lights out" access to both the Linux console,
and the BIOS via ethernet (no serial concentrators, multi-port
serial cards, or extra cabling required)
- Provide remote, OS independent control over the reset, and
power buttons via ethernet (no funny remote control power sockets,
relays, or other hacks required)
- Provide remote control of a server over a modem connection
- Make emergency remote management possible from a variety of
simple devices (e.g. PDAs)
If you would like to know more, then
this document from the 2003 Linux Symposium provides more
detail. IPMI is a large standard, with a slight whiff of committee
about it, so I'm just going to consider what I think are the most
useful bits, and the implementation which the SE7501WV2 makes (this
is supposed to be about a single type of server, after all).
Note that IPMI seems to have more than its fair share of
TLAs.
How IPMI works, and jargon
It is useful to know a bit about how IPMI does its stuff - so I'll
give an overview, and try to bust some weird IPMI/Intel
jargon. There is a second autonomous computer on the
motherboard (or baseboard, in IPMI's politically correct /
obfuscated language), this is a very simple, low power-consumption
device, which should operate as long as power is connected to the
machine (including when the majority of the server is powered down)
- in IPMI speak, this computer is called the BMC - the Baseboard
Management Controller - it uses its own firmware, which is
independent of the system BIOS. On the SE7501WV2, and
particularly on the SR2300 the BMC is connected to:
- The power (at all times)
- The main PC via something that looks like a keyboard controller
to the PC - the KCS interface
- All of the hardware sensors on the motherboard via its i2c bus
(this is why these boards show up no sensors with the normal
lm_sensors drivers, whilst similar non-IPMI boards do)
- Both of the NICs via a sneaky secondary interface to the NIC
chipsets
- In-line between Serial Port B's RJ45 connector, and the
motherboard Super I/O controller
- The "ID" button, and blue LED
- The power, and reset switch circuitry
- The SCSI backplane, and redundant power supplies
This means that you can talk to the BMC from the server itself
under Linux, or from a remote machine via the network (if
configured). The IPMI standard allows for other interfaces as
well.
Install
Getting the Software
The packages and tools that I used to gain access to IPMI
functionality are:
- Kernel space tools:
- The kernel.org 2.4.26 kernel (with the rmap patch, but this shouldn't
make any difference to the IPMI side of things)
- OpenIPMI - this provides local machine communication with the BMC -
I had a crash with the kcs driver included in 2.4.26, so I updated
to v30; there are now newer versions available. If you didn't want
to (or couldn't)
use this, you could use the Intel bootable CD that comes with the
board to set up IPMI for LAN access only, and do all of your access
via this, instead - you would only be able to access IPMI from
other machines via the LAN, in that case.
- i2c v2.8.7, and
lm-sensors v2.8.7 - with a minor patch to get it to work with
newer versions of OpenIPMI - not needed if you do not want
lm_sensors integration (not required to check sensor values, but
probably good if you want to use the various other user-land
utilities which have been written to the lm_sensors
interface).
- If you like, you can then add the patch to fix RTS/CTS serial
console support to the kernel - I was unable to find an up to date
version of the patch, but it could probably be manually patched in
using the version for 2.4.22 (this is left as an exercise for the
reader) - without this patch, you face either:
- Risking losing serial data from the console output, if you have
not configured it for RTS/CTS hardware serial flow control - this
happens if the BMC cannot send serial data over the LAN quickly
enough, and fills up its buffers, thus dropping data.
- Having the kernel block (very bad - especially during a reboot)
whilst timing out sending kernel output to the serial port, if the
serial over LAN session is down
- User space tools:
- ipmitool - a reasonable command line utility to interact with the
BMC from a Linux box; it supports both LAN and OpenIPMI interfaces - not
currently part of Debian, but includes the necessary files to build
a Debian package in the tar ball (untar, chdir to the ipmitool
top-level directory and run "dpkg-buildpackage")
- The Intel DPC (Direct Platform Control) command line utilities
(dpccli and dpcproxy) for Linux - these can be downloaded from the
Intel website, under the support/downloads
for the SE7501WV2 as part of the ISM (Intel Server Management)
suite. The downloaded ism*.exe is a self extracting zip file,
and can be extracted on Debian using the unzip command. The
alien tool can be used to convert the Redhat8.0 rpm -
Software/linux/cli/8.0/CLI-2.0-1.i386.rpm to a .deb
Setup
Setting up IPMI
ipmitool + OpenIPMI
I will assume here that you will want local access to the BMC from
Linux, using the OpenIPMI drivers. The advantages of doing this
over using Intel's bootable CD are:
- Can alter BMC settings (e.g. passwords etc.) from within the
OS
- Can access the BMC from the local machine (if you only use the
LAN interface, this is otherwise a tricky proposition)
- Can make use of features such as the IPMI watchdog driver to
automatically reset the machine on OS failure
- Can be automated over many machines, and carried out remotely
(e.g. using ssh)
The disadvantages are:
- You may need to compile your own kernel (the OpenIPMI which
shipped with 2.4.26 didn't seem reliable to me, YMMV)
Once you have what you feel is a suitable kernel installed, you
will want to load the appropriate modules, on my machine these
are:
ipmi_si_drv
ipmi_devintf
The kernel should say something like this (check /var/log/kern.log):
IPMI System Interface driver version v30, KCS version v30, SMIC version v30, BT version v30
ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca2
IPMI kcs interface initialized
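A quick way to confirm that the driver came up is to search the kernel log for the last of those messages. This is only a sketch: the function name kcs_initialized is made up for this example, and the message text is the one printed by the v30 driver on my machine, so other driver versions may word it differently.

```shell
#!/bin/sh
# kcs_initialized [LOGFILE]  (hypothetical helper, not part of any package)
# Succeeds if LOGFILE (default /var/log/kern.log) contains the KCS
# initialisation message printed by the v30 OpenIPMI driver.
kcs_initialized() {
    grep -q 'IPMI kcs interface initialized' "${1:-/var/log/kern.log}"
}

# Typical use, after loading the modules:
#   kcs_initialized && echo "BMC interface is up"
```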
If you aren't using devfs, ensure that you have a /dev/ipmi0
device for ipmitool to talk to:
# mknod -m 0600 /dev/ipmi0 c 254 0
Note that, as far as I know, the IPMI device is most likely
to end up at device major number 254, but it will take devices
from the 240-254 block, which according to
linux/Documentation/devices.txt is "Allocated for
local/experimental use. For devices not assigned official numbers,
these ranges should be used in order to avoid conflicting with
future assignments." I believe that this is because device numbers
are no longer being "officially" assigned, in preparation for the
introduction of fully dynamic device number allocation. So, if you
have another driver which uses character device numbers from this
block, and this other driver gets there first, then ipmi will end
up at c 253 0 or lower...
To check where your ipmidev has ended up:
# cat /proc/devices
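Rather than reading the list by eye, the major number can be picked out with awk and fed to mknod. The following is a sketch under the assumption that the driver registers itself under the name "ipmidev", as it does on my machine; the function name ipmi_major is invented for this example.

```shell
#!/bin/sh
# ipmi_major [FILE]  (hypothetical helper)
# Prints the character-device major number registered as "ipmidev".
# FILE defaults to /proc/devices; the argument exists only to ease testing.
ipmi_major() {
    awk '$2 == "ipmidev" { print $1; exit }' "${1:-/proc/devices}"
}

# Example use (as root), left commented out so that nothing is created
# merely by sourcing this file:
#   major=$(ipmi_major)
#   [ -n "$major" ] && mknod -m 0600 /dev/ipmi0 c "$major" 0
```

If the function prints nothing, the ipmi_devintf module is probably not loaded.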
Testing ipmitool + OpenIPMI
You should now be able to speak to the BMC using ipmitool locally,
e.g.
# ipmitool -I open chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : previous
Last Power Event     : ac-failed
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
If this works, you may want to try out the following, and/or have a
look through the manual page to see what else you can do -
# ipmitool -I open sdr list
# ipmitool -I open sel list
# ipmitool -I open chassis identify 1
# ipmitool -I open chassis identify 0
ipmitool + IPMI over LAN
As shipped, the BMC in the WV2 boards doesn't listen on the LAN
interfaces - it must be configured to do so. There are at least two
ways of doing this; the ones I know about are:
- The bootable CD that ships with the Intel motherboards - I
haven't used this, but there is documentation in Intel's Platform
Guide document
- Using ipmitool, with the OpenIPMI driver to set up LAN access
on the local machine
- The /usr/bin/bmcautoconf.sh script which comes with ipmitool
will automate the majority of the setup - this currently needs some
editing to select the correct ethernet interface - also note that
unpatched versions of this script will need to be altered on Debian
to use gawk instead of awk (you may need to install gawk), and a
few binary paths may be wrong
- Unless you are feeling very trusting, you will need to set up
passworded access - IPMI includes the concept of multiple users,
and privilege levels, but I have not looked into this
closely. To set up a single password which gives full
access (this is partially set up by the bmcautoconf.sh script), run
these commands to set the password for both of the SE7501WV2's
interfaces:
- ipmitool -I open lan set 6 password <your password here>
- ipmitool -I open lan set 7 password <your password here>
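If you are doing this on several machines, the two commands can be wrapped in a small loop. This is a sketch only: the function name set_lan_password and the IPMITOOL override variable are inventions of this example (the override just makes a dry run possible), and channels 6 and 7 are the ones used on the SE7501WV2 above. Bear in mind that the password is visible in the process list while the command runs.

```shell
#!/bin/sh
# set_lan_password PASSWORD  (hypothetical helper)
# Sets the same LAN password on BMC channels 6 and 7 via the OpenIPMI
# interface. Set IPMITOOL=echo to print the commands instead of running them.
set_lan_password() {
    for channel in 6 7; do
        "${IPMITOOL:-ipmitool}" -I open lan set "$channel" password "$1" || return 1
    done
}

# Dry run:
#   IPMITOOL=echo set_lan_password 'your password here'
```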
Testing ipmitool + IPMI over LAN
You will now want to test IPMI over LAN, two things are worth
pointing out at this stage:
- You will need to do this from another machine (due to the way
that the BMC conspires with the NIC to intercept packets - if you
try to send the packets from the local machine, Linux will deliver
the packets locally, without touching the NIC, so the BMC doesn't
get a chance to steal the packets)
- Any local firewall on the target machine will not need altering
(for the same reason as above, if the BMC is configured correctly,
the Linux kernel on the target machine doesn't get to see the
packets at all)
Install ipmitool on another machine (this machine needn't
have OpenIPMI, or an IPMI equipped board), then, from that machine,
try the following:
$ IPMI_PASSWORD=<your password here> ipmitool -I lan -H <target hostname, or IP address> -E chassis status
This should give you similar output to running the command
locally. Now for something more interesting (well, I think it
is more interesting, anyway): shut down the target machine (e.g.
# shutdown -h now). You should still be
able to run the previous command, except that amongst the output you
should see:
System Power         : off
Now you can do this:
$ IPMI_PASSWORD=<your password here> ipmitool -I lan -H <target hostname, or IP address> -E chassis power on
The machine should power up, and boot
automatically. You can also use "power
reset" to effect a remote hardware reset of the target
machine.
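For scripting, it can be handy to pull just the power state out of the "chassis status" output. The sketch below assumes the "System Power : on/off" line format shown above; the function name power_state is made up for this example.

```shell
#!/bin/sh
# power_state  (hypothetical helper)
# Reads "ipmitool ... chassis status" output on stdin and prints the
# value of the "System Power" line (e.g. "on" or "off").
power_state() {
    awk -F: '/^System Power/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Example use from the management machine (placeholders as in the text):
#   state=$(ipmitool -I lan -H <target> -E chassis status | power_state)
#   [ "$state" = off ] && ipmitool -I lan -H <target> -E chassis power on
```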
Setting up Serial Over LAN (SOL)
If you like, you can set up the BMC on these Intel boards to
talk to the main PC via its serial port B (the rj45 port), and
relay the input, and output to another machine, over the LAN
interface.
The IPMI v2.0 specification specifies that RTS/CTS flow control
should be used between the main PC, and the BMC during serial
communication - this is necessary because the BMC needs a way to
tell the main PC to stop sending it data, in the case that it is
about to run out of buffer space - because it hasn't been able to
send data to the SOL client quickly enough. As previously
mentioned, the Linux kernel currently has a problem with RTS/CTS
on serial consoles (although serial log-ins are unaffected, since
the console output seems to be independent of the settings that the
getty sets on the same port).
At the time of writing, the BMC code on the SE7501WV2 implements
IPMI v1.5 - and IPMI v1.5 does not define SOL support, so the SOL
implementation on these boards is proprietary, and Intel is not
currently releasing details except under an NDA for some strange
reason (boo, hiss). The SOL support in IPMI v2.0 is believed
to be based on the Intel implementation, so it may be possible to
reverse engineer the implementation, but in the mean time you must
use Intel's "dpccli / dpcproxy" programs to use the SOL
functionality, unfortunately at the time of writing, these:
- Are closed source
- Are only available in an RPM, as part of a large download
- Are flaky (they seem to crash on me quite frequently)
- Have a sucky user interface - which also makes them difficult
to script
- The dpccli program munges the enter key, so that it is unusable
with serial BIOSes - you must use the telnet interface instead
But they are all that can be used at the moment, and they work well
enough to be useful.
A Quick Note About Security
Although the dpcproxy program must be spoken to using telnet (if
you need any security at all on the LAN, then I recommend making
the dpcproxy bind to the loopback interface only, and ssh <dpcproxyserver> telnet localhost 623), the
SOL session itself (between dpcproxy, and the BMC) is encrypted by
default (although Intel gives no details of the encryption), so
passwords typed over the SOL session are (probably) not
trivially interceptable.
IPMI includes some security, but I have not looked into what
strength of security ipmitool is able to use with the SE7501 boards.
Any answers welcome.
The Recipe
The basic setup procedure is as follows:
On the machine(s) from which you will manage the other
servers:
- Download the Intel Server Management suite from http://support.intel.com/support/motherboards/server/SE7501WV2/
- I used v5.5.7 - a version should have shipped with the
motherboard, you could try using this instead if you like
- Use the unzip program (apt-get install unzip) to extract the
archive (which is a Windows self-extracting zip file), e.g.
unzip /path/to/ism557_build2.exe 'Software/linux/*'
- Use alien to convert the Redhat8 binary rpm to a .deb -
# alien -d Software/linux/cli/8.0/CLI-2.0-1.i386.rpm
- Install the .deb - dpkg -i cli_2.0-2_i386.deb
- # ln -s /usr/local/cli/dpccli /usr/local/bin
- # mv /etc/rc.d/init.d/cliservice /etc/init.d/
- Optionally edit the init script to start dpcproxy with the '-L'
switch to bind to the loop back address (127.0.0.1) only
- # /etc/init.d/cliservice start
On the target machine(s), you will need to set up serial console
support - this is covered in detail in http://www.tldp.org/HOWTO/Remote-Serial-Console-HOWTO/
but I will include a brief recipe here:
- In /etc/inittab, put a line like this "T1:23:respawn:/sbin/getty -h -L ttyS1 19200 vt100"
- # killall -HUP init
- Skip forward to the testing section now if you like, then come
back to complete the setup once you are satisfied that it is
working
- Set up the BIOS for console redirection - 19200 baud, 8n1,
RTS/CTS hardware flow control
- Set up the Linux kernel for serial console operation, e.g. in
/boot/grub/menu.lst "kernel /boot/vmlinuz root=/dev/md2 ro console=tty0 console=ttyS1,19200n8"
- If and only if you have patched the kernel for better RTS/CTS
support, make it: "kernel /boot/vmlinuz root=/dev/md2 ro console=tty0 console=ttyS1,19200n8r".
If you use the 'r' flag on an unpatched kernel, then it will block
whilst it times out on each character of serial output when the SOL
session is not running - this is bad. If you don't use hardware flow
control (i.e. you leave out the 'r' flag), then you will lose some
kernel output - this is also bad, but nowhere near as bad.
- On these machines, do not set up your
boot loader (e.g. grub) for serial output - as the BIOS redirects
the boot loader's video console output, and they will tread on each
other's toes - once the linux kernel is loaded, and enters
protected mode, the BIOS doesn't get a look in, so the kernel
output is OK.
- Ensure ttyS1 is mentioned in /etc/securetty - otherwise you will not be able to log
in as root on the serial console (this should be in there by
default on Debian)
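The two file edits in this recipe are easy to get subtly wrong, so here is a sketch of a sanity check for them. The function name check_serial_console is invented for this example, and the patterns assume the ttyS1 getty line and securetty entry described above; the file arguments exist only so that the check can be tried against copies.

```shell
#!/bin/sh
# check_serial_console [INITTAB] [SECURETTY]  (hypothetical helper)
# Checks that an uncommented respawning getty exists for ttyS1, and that
# ttyS1 is listed for root logins. Returns non-zero if either is missing.
check_serial_console() {
    inittab=${1:-/etc/inittab}
    securetty=${2:-/etc/securetty}
    status=0
    if ! grep -Eq '^[^#][^:]*:[^:]*:respawn:.*getty.*ttyS1' "$inittab"; then
        echo "no getty for ttyS1 in $inittab" >&2
        status=1
    fi
    if ! grep -qx 'ttyS1' "$securetty"; then
        echo "ttyS1 not listed in $securetty" >&2
        status=1
    fi
    return $status
}
```

It does not look at the BIOS or boot loader settings, which still have to be checked by hand.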
Enabling and Testing SOL
If you like, you can test the above configuration, using a real
serial null-modem cable, and a terminal program such as "minicom",
or "gkermit". In either case, you should then do the
following:
On the management machine (you could also do the first step on the
target machine, using the OpenIPMI interface)
# IPMI_PASSWORD=<your password here> ipmitool -I lan -H <target hostname, or IP address> -E chassis sol setup
Note that the ipmitool "sol" command is likely to be renamed when
IPMI v2.0 sol support is added to the program. I have read
documentation which hints that it might be possible to set SOL
support to "always on" using the Intel bootable CD - using the
ipmitool "sol setup" command, the SOL session can be interrupted by
some actions (e.g. local keyboard activity, as simulated by some
KVMs), but I have not verified this.
# telnet localhost 623
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.
Server: <your server name>
Username:
Password: *********
Login successful
dpccli> console
myservername login:
You should then be able to log in as root, and reboot the machine,
following the entire boot process on the serial console. To
get out of the SOL session, you need to send the sequence "~." to
the dpcproxy program. This clashes nicely with the ssh escape
sequence ("<cr>~." tells ssh to
terminate an ssh session - so that you may need to remember to type
"~~." if you are typing the sequence after
a new line), and also means that you cannot type the tilde
character on the console (at least I have been unable to figure out
how to, and there is no man page). Nice one Intel.
Other defects include the fact that you cannot send a serial break
(for sysrq), and that it quite often seems to exit, and get
confused when there is a lot of I/O going on.
Using the Serial BIOS
The serial BIOS interface is a bit brain damaged in that it does
not recognise the "F11", and "F12" key escape codes that most
terminal programs send, instead you can send "Esc-!", and "Esc-@"
(yes very logical, as long as the '@' key is normally typed using
'Shift-2' - as on US keyboards, not miles away from the '2' key, as
on many non-US keyboards). These escapes from HP and Dell
serial BIOSes may or may not be useful:
Defined As      F1      F2      F3      F4      F5      F6      F7      F8      F9      F10     F11     F12
Keyboard Entry  <ESC>1  <ESC>2  <ESC>3  <ESC>4  <ESC>5  <ESC>6  <ESC>7  <ESC>8  <ESC>9  <ESC>0  <ESC>!  <ESC>@

Defined As      Home    End     Insert  Delete  PageUp  PageDn
Keyboard Entry  <ESC>h  <ESC>k  <ESC>+  <ESC>-  <ESC>?  <ESC>/
Use the <ESC><Ctrl><M> key sequence for <Ctrl><M>
Use the <ESC><Ctrl><H> key sequence for <Ctrl><H>
Use the <ESC><Ctrl><I> key sequence for <Ctrl><I>
Use the <ESC><Ctrl><J> key sequence for <Ctrl><J>
Use the <ESC><X><X> key sequence for <Alt><x>, where x is any letter key, and X is the upper case of that key
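If you drive the serial BIOS from a script (e.g. an expect wrapper), the table above can be encoded as a lookup. This is a sketch: the function name bios_key is made up, and the mapping is simply the HP / Dell table above, so whether every entry works on the Intel serial BIOS is an assumption that you should check for yourself.

```shell
#!/bin/sh
# bios_key NAME  (hypothetical helper)
# Prints the escape sequence listed in the table above for key NAME
# (F1..F12, Home, End, Insert, Delete, PageUp, PageDn).
bios_key() {
    case $1 in
        F[1-9]) suffix=${1#F} ;;
        F10)    suffix=0 ;;
        F11)    suffix='!' ;;
        F12)    suffix='@' ;;
        Home)   suffix=h ;;
        End)    suffix=k ;;
        Insert) suffix=+ ;;
        Delete) suffix=- ;;
        PageUp) suffix='?' ;;
        PageDn) suffix=/ ;;
        *) echo "unknown key: $1" >&2; return 1 ;;
    esac
    printf '\033%s' "$suffix"
}
```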
Adding Remote Logging of the Serial Output
Interactive use of the SOL consoles is very useful, but the
addition of unattended logging of the output is even more useful -
it can be used to catch kernel panics, and pre-cleaned kernel
output, in the case that a box is compromised, to pick two
examples.
The conserver program
can carry out this kind of logging for other types of serial
consoles, and with the addition of the "solsession" expect script
from the IPMI_on_Debian_files/
directory - it can be made to speak to the Intel dpcproxy program.
An example conserver configuration is also provided.
Conserver will also automatically restart dead dpcproxy
connections, and continue to log output during interactive use
(unless you tell it not to).
Setting up the IPMI watchdog
In this context, a watchdog is a device which has the job of
resetting a computer system if it thinks the software running on
the system has hung, or is otherwise not operating as it should. A
watchdog is often implemented as follows: the software running on
the computer system must carry out a regular task (such as writing
a character to a device file, on Linux) in order to reassure the
watchdog that everything is as it should be - if the software fails
to carry out the task, the watchdog assumes that the computer has
hung, and will reset it.
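To make the mechanism concrete, here is a minimal sketch of the "regular task" in shell. It is not how the Debian watchdog daemon is implemented, and the function name pet_watchdog and its arguments are invented for this example. Be careful: on a real system, opening /dev/watchdog arms the timer, and the final 'V' (the Linux watchdog "magic close" convention) only disarms it if the driver supports that, which I have not checked for ipmi_watchdog.

```shell
#!/bin/sh
# pet_watchdog DEVICE INTERVAL COUNT  (hypothetical helper, illustration only)
# Writes one character to DEVICE every INTERVAL seconds, COUNT times,
# then writes 'V' before closing the device.
pet_watchdog() {
    device=$1
    interval=$2
    count=$3
    exec 3> "$device"          # keep the device open between writes
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3         # each write reassures the watchdog
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    printf 'V' >&3             # "magic close" request (assumption, see above)
    exec 3>&-
}

# Try it against an ordinary file first, e.g.:
#   pet_watchdog /tmp/fake-watchdog 1 5
```

If the loop stops writing for longer than the watchdog's time-out while the device is still open, the machine is reset, which is the whole point.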
A watchdog is usually implemented so that it is likely to survive
problems that might otherwise take out the operating system (a
partial exception to this is the Linux "softdog" watchdog module,
which runs in the kernel - it is still useful if the kernel is
partially, but not entirely knackered). The IPMI standard includes
a watchdog, and the OpenIPMI Linux drivers include a module which
provides an implementation of the Linux watchdog interface, which is
backed by the IPMI BMC - such that it will reset (reset is the
default behaviour) the computer if the watchdog device is not
attended to in a timely fashion.
A simple recipe for setting up the IPMI watchdog on Debian/Sarge is
presented below:
- Ensure that the kernel ipmi_watchdog module has been
built.
- # echo '# Watchdog' >> /etc/modutils/aliases
- # echo 'alias char-major-10-130 ipmi_watchdog' >> /etc/modutils/aliases
- # echo 'options ipmi_watchdog timeout=40' >> /etc/modutils/watchdog
- # update-modules
- # apt-get install watchdog
- # echo 'watchdog-device = /dev/watchdog' > /etc/watchdog.conf
- # /etc/init.d/watchdog start
- # tail /var/log/kern.log
- # tail /var/log/daemon.log
Note that the default time-out for the ipmi_watchdog module is 10
seconds; this is also the default write interval for the watchdog
daemon which is included in the Debian "watchdog" package, so make
sure you change one of them, otherwise there will be no margin for
error at all (in the example, I've upped the kernel module
time-out).
TODO
The IPMI system event log will currently (AFAIK) fill up, and stop
being appended to (or maybe old entries will be nuked, I am not
sure, as it hasn't happened yet..). Ipmitool will allow you to
clear the log, so the cron jobs should probably be modified to do
this, and archive old entries under /var/log (so that e.g.
logrotate can take care of them). Contributions welcome!
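As a starting point, something along these lines might do for the archive-then-clear step. It is an untested sketch: the function name archive_sel, the log file name, and the IPMITOOL override variable are all inventions of this example, and any event logged between the "sel list" and the "sel clear" would be lost.

```shell
#!/bin/sh
# archive_sel [LOGFILE]  (hypothetical helper)
# Appends the current system event log to LOGFILE, then clears the log
# in the BMC, but only if the append succeeded.
archive_sel() {
    log=${1:-/var/log/ipmi-sel.log}
    tool=${IPMITOOL:-ipmitool}
    "$tool" -I open sel list >> "$log" || return 1
    "$tool" -I open sel clear
}

# From cron (as root), e.g. once a day:
#   archive_sel /var/log/ipmi-sel.log
```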
Credits
The original version of this document was written by Tim Small for
a client of WPAD Ltd. - Semantico Ltd. Semantico support the
publishing of this document as part of their backing of open source
software. Thank you, Semantico.
Thanks also to:
- Erwan Velu at Mandrakesoft, for an email which prompted the
movable character device major number note.
Feedback
Please send feedback, patches (e.g. remote modem support), etc. to
tim@buttersideup.com