Pixie Chronicles: Part 3 PXE boot
The Story So Far
In Part 1 I outlined my plans: to build a server using network install. However, I got sidetracked by problems. In Part 2 I made some progress and dealt with one of the problems. Now I'm going to detail what I did when I got it right.
The task is to install Fedora 10 on a machine (the target machine) which will become a web and email server. I'm not going to use CDs; I'm going to install from a network. So, somewhere on the network, I need a machine to supply all the information normally supplied by CDs or a DVD.
How booting from a disk works
When a machine is (re)started, the BIOS takes control. Typically, a user configures the BIOS to try various devices in order (DVD/CD, Floppy, HDD). If there is no removable disk in any drive, the boot proceeds from the hard drive. The BIOS typically reads the first block from the hard drive into memory and executes the code it finds (typically GRUB). This leads to more reading and more executing.
How PXE works
When a machine using PXE is (re)started, it behaves like a DHCP client. It broadcasts a request on a NIC using the MAC address (aka Ethernet address) of the NIC as an identifier.
If the machine is recognised, the DHCP server sends a reply. For PXE, the reply contains the client's IP address and a file name (in my case pxelinux.0, the PXELINUX network boot loader). The PXE client uses tftp to download the specified file into memory and executes it. This leads to more reading and more executing.
Get the software for the PXE server
You'll need tftp-server, a DHCP server and syslinux.
I was already running dnsmasq as a DHCP server, so I didn't need to install anything. There are many packages which can act as DHCP servers. If you are already running DHCP software, you should be able to use that.
I didn't have tftp. Getting it on a Fedora machine is just:
yum install tftp-server
PXELINUX is part of the syslinux RPM (which came as part of my server's Fedora distribution), but it's also available on the installation images.
Since I was already running dnsmasq as a DHCP server, I just needed to add a couple of lines:
These lines say: if the host with MAC address 00:D0:B7:4E:31:1B asks, tell him his hostname is b2, he should use IP address 192.168.0.60 and tell him to boot the file pxelinux.0 (by implication, using tftp).
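In dnsmasq's configuration syntax, settings matching that description would look something like the output of the sketch below. dhcp-host and dhcp-boot are standard dnsmasq options; the function name dnsmasq_pxe_lines and its argument order are made up for illustration, and this is not necessarily the exact text I had in my config file.

```shell
#!/bin/sh
# Print dnsmasq settings for one PXE client.
# dhcp-host ties a MAC address to a hostname and an IP address;
# dhcp-boot names the file the client should fetch by tftp.
# (Function name and argument order are hypothetical.)
dnsmasq_pxe_lines() {
    mac=$1 name=$2 ip=$3 bootfile=$4
    printf 'dhcp-host=%s,%s,%s\n' "$mac" "$name" "$ip"
    printf 'dhcp-boot=%s\n' "$bootfile"
}

# Values taken from the description above
dnsmasq_pxe_lines 00:D0:B7:4E:31:1B b2 192.168.0.60 pxelinux.0
```

Written this way, dhcp-boot is not tied to one host: every client that asks for a boot file is told pxelinux.0. The output would be appended to the dnsmasq config file (typically /etc/dnsmasq.conf, but check your own setup).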
NB I'm using a non-routable IP address from the subnet 192.168.0/24 during the build of the server. Later, I will configure different IP addresses in preparation for using this server on a different subnet.
NB Although I can specify a hostname here, in this case it acts merely as documentation because the install process will require that the name be specified again.
In other words, at this stage, these are interim values which may or may not be the same as final values.
Having made changes to the config file, I restarted the DHCP server so that it would pick them up (for dnsmasq on Fedora, something like service dnsmasq restart).
By default, tftp, which runs from xinetd, is turned off. Edit /etc/xinetd.d/tftp and make this change:
< disable = yes
---
> disable = no
If you examine /etc/xinetd.d/tftp, you will see that the default configuration wants to serve /tftpboot. I don't like to have such directories on my root partition, so I created /tftpboot as a symlink pointing to the directory which contained the requisite data.
mkdir /Big/PXEBootServer/tftpboot
ln -s /Big/PXEBootServer/tftpboot /
Note that the symlink keeps the path short. There are limits on how long these paths can be, and a short path also means there is less to type.
Either I wasn't thinking too clearly, or perhaps I was concerned that I might not be able to get the net install to work, because I downloaded 6 CD images instead of a single DVD. Knowing what I know now, if I had to do it over, I would download just the DVD image. (The old machine on which I want to install Fedora 10 only has a CD drive, not DVD.)
One of the CD images which comes with Fedora 10 is called Fedora-10-i386-netinst.iso; it has a copy of pxelinux.0 and some other needed files.
mount -o loop -r /Big/downloads/Fedora-10-i386-netinst.iso /mnt2
cp /mnt2/isolinux/vmlinuz /mnt2/isolinux/initrd.img /tftpboot
cp /usr/lib/syslinux/pxelinux.0 /tftpboot
(Note: I could also have typed
cp /usr/lib/syslinux/pxelinux.0 /Big/PXEBootServer/tftpboot
but using the symlink saves typing.)
What do we have so far?
ls -lA /tftpboot/.
total 19316
-rw-r--r-- 1 root root  17159894 Nov 20 11:50 initrd.img
-rw-r--r-- 1 root root     13100 Feb  9  2006 pxelinux.0
drwxrwxr-x 3 root staff     4096 Nov 27 15:42 pxelinux.cfg
-rwxr-xr-x 1 root root   2567024 Nov 20 11:50 vmlinuz
Most of the documentation says to create (touch) a whole stack of files, but I prefer to make just two: one called default as a backstop, and one with a name like 01-00-d0-b7-4e-31-1b. What is this?
Get the MAC address of the NIC to be used. This is nowhere near as easy as it sounds. My PC has 2 NICs. Which is the one to use? It's probably the one that corresponds to eth0. I discovered which one was eth0 by first booting into the Knoppix CD - a very good idea for all sorts of reasons (see Part 2). Once booted into Knoppix, ifconfig eth0 will give the MAC address. In my case it was 00:D0:B7:4E:31:1B.
Now convert the colons to hyphens and upper- to lower-case:
echo 00:D0:B7:4E:31:1B | tr ':A-Z' '-a-z'
00-d0-b7-4e-31-1b
The required filename is this string preceded by 01- ie 01-00-d0-b7-4e-31-1b.
- NB Many documents disagree with me. They indicate that the file name should be in upper-case. Possibly different PXE clients operate in different ways, but my PXE client explicitly looks for lower-case.
- The 01 represents "ARP type 1" ie Ethernet.
- See /usr/share/doc/syslinux-3.10/pxelinux.doc.
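The conversion can be wrapped in a small helper. This is only a sketch: the function name mac_to_pxecfg is invented here, and it assumes a client that, like mine, asks for the lower-case form.

```shell
#!/bin/sh
# Turn a MAC address into the PXELINUX config file name:
# "01-" (ARP type 1, Ethernet) followed by the MAC address in lower case
# with hyphens instead of colons.
# The sets are written so that neither tr argument starts with a hyphen.
mac_to_pxecfg() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}

mac_to_pxecfg 00:D0:B7:4E:31:1B
```

With that in place, creating the file for a new target machine could be as simple as touch "/tftpboot/pxelinux.cfg/$(mac_to_pxecfg 00:D0:B7:4E:31:1B)".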
I chose to be very specific. I figured if the net install is successful, I can use the same technique for other target machines. To achieve that, each machine would be distinguished by its MAC address. Consequently, I did not think it was a good idea to use default for a specific machine. Rather, I configured default to contain generic parameters to handle the possible case of an as yet unconfigured machine performing a PXE boot. default would also handle the case where I got the MAC address wrong (for example while I was trying to figure out whether the other file should be in upper-case or lower-case). Of course, the PXE boot would behave incorrectly, but at least it would show that there was some life in the system.
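For context, the "whole stack of files" in other documentation comes from PXELINUX's search order. As described in pxelinux.doc, after the MAC-based name it tries the client's IP address written as eight upper-case hex digits, then drops one digit at a time, and only then falls back to default. The sketch below lists those fallback names for a given IP address; the function name pxelinux_ip_fallbacks is hypothetical, and the search order should be confirmed against the documentation for your syslinux version.

```shell
#!/bin/sh
# List the config file names PXELINUX falls back to (under pxelinux.cfg/)
# when no MAC-based file is found, for a dotted-quad IPv4 address.
pxelinux_ip_fallbacks() {
    ip=$1
    # Split the address on dots into the positional parameters
    old_ifs=$IFS; IFS=.
    set -- $ip
    IFS=$old_ifs
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    # Full hex address first, then one hex digit shorter each time
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex=${hex%?}
    done
    # The backstop
    printf 'default\n'
}

pxelinux_ip_fallbacks 192.168.0.60
```

For 192.168.0.60 the first name printed is C0A8003C and the last is default. Making just two files (one MAC-based, one default) sidesteps all the names in between.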
Here are the contents of the /tftpboot/pxelinux.cfg/default:
prompt 1
default linux
timeout 100
label linux
  kernel vmlinuz
  append initrd=initrd.img ramdisk_size=9216 noapic acpi=off
Readers familiar with GRUB or LILO will recognise this as a boot config file.
default is a generic config file. I could use it for the installation, but I'd have to type a lot of extra args at the boot: prompt, for example
boot: linux ks=nfs:192.168.0.3:/NFS/b2
To save typing and to provide me with documentation and an audit trail, I've tailored a config file specifically for my current server. And how better to tie it to the target machine than by the target machine's MAC address? (In the absence of user intervention, MAC addresses are unique.)
prompt 1
default lhd
timeout 100
display help.txt
f1 help.txt
f2 help.txt
f3 b2.cfg
# Boot from local disk (lhd = local hard drive)
label lhd
  localboot 0
label linux
  kernel vmlinuz
  append initrd=initrd.img ramdisk_size=9216 noapic acpi=off
label b2
  kernel vmlinuz
  append initrd=initrd.img ramdisk_size=9216 noapic acpi=off ksdevice=eth0 ks=nfs:192.168.0.3:/NFS/b2
NB It may look ugly, but the append line must be a single line. As far as I can tell, PXELINUX (which reads this file and passes the arguments to the kernel) has no mechanism for splitting the text over more than one line.
See /usr/share/doc/syslinux-3.10/syslinux.doc for info on the items in the config file.
Please note that this is vastly different from the first file I used (the one that got me into trouble). By this stage, the config file has been refined to within an inch of its life.
Here are the little wrinkles that make this config file so much better:
- default lhd tells the PXE client to use the entries under the label lhd by default
- ksdevice=eth0 forces the install to use eth0 and avoids having anaconda (the Linux installer) ask the user which interface to use
- ks=nfs:192.168.0.3:/NFS/b2 tells anaconda where to find the kickstart file
When the target machine is started, the PXE client will eventually display the contents of help.txt, followed by the boot: prompt.
If the user presses Enter, the machine will attempt to boot from the local hard drive. If the user does not respond within 10 seconds (timeout 100 has units of 1/10 of a second), the boot will continue with the default label (lhd). Alternatively, the user can enter linux (with or without arguments) for different behaviour. In truth, I used this label earlier in the development of this config file; I would lose little if I now removed this label.
If the user presses function-key 1 or function-key 2, help is displayed. If the user presses function-key 3, the actual boot config file is displayed (on the PXE server b2.cfg is just a symlink pointing to /tftpboot/pxelinux.cfg/01-00-d0-b7-4e-31-1b). I added the f3 entry largely for debugging and understanding - there's nothing the user can do but choose menu options which are displayed by help.txt.
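A symlink like b2.cfg can be made with a relative target, so that it still resolves if the whole tftp tree is moved. The sketch below builds one in a scratch directory rather than in the real /tftpboot; the function name link_boot_cfg is invented for illustration.

```shell
#!/bin/sh
# Create <root>/<link> as a relative symlink to pxelinux.cfg/<cfg>.
link_boot_cfg() {
    root=$1 cfg=$2 link=$3
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$root" && ln -s "pxelinux.cfg/$cfg" "$link" )
}

# Try it out in a throwaway directory standing in for /tftpboot
scratch=$(mktemp -d)
mkdir "$scratch/pxelinux.cfg"
: > "$scratch/pxelinux.cfg/01-00-d0-b7-4e-31-1b"
link_boot_cfg "$scratch" 01-00-d0-b7-4e-31-1b b2.cfg
ls -l "$scratch/b2.cfg"
rm -r "$scratch"
```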
This completes this part of the exercise. This is the point at which you can try out the PXE process.
Try it out
Machines differ and I daresay PXEs differ. I will describe how my machine behaves and what to expect.
When turned on, my machine, a Compaq Deskpro EP/SB Series, announces its own Ethernet (aka MAC) address and that it is trying DHCP. The MAC address indicates which NIC is being used. This machine will present this MAC address to the DHCP server, which hopefully will respond with the offer of an IP address. It is also used at PXE boot time when searching the TFTP directory for its config file.
If the DHCP server responds, then a series of dots appears as my machine downloads the boot file (pxelinux.0). When PXELINUX is given control, it announces itself with the prompt boot:.
If you get to this point, then the PXE server has been configured correctly, and the client and server are working harmoniously. In Part 4 I'll discuss the rest of the install.
It didn't work for me!
You might recall that in Part 1, under "Lessons from Mistakes", I discussed this subject. I will also confess that the first few times I tried to get things going, I was not successful.
Here's what to do if things don't behave as expected.
This terminology is misleading. As much as you might like things to work first time, if you're human, chances are they don't. Consequently, you should expect things to not work. (You can feel that you are batting better than average when they do work.)
Here's what to do when things go wrong. First, the target machine is in a very primitive state so it is not likely to be of much help. Since the process under investigation involves a dialogue between 2 machines, you would like to monitor the conversation (A said U1, then B said U2, then A said U3, ... ). Who was the last to speak? What did he say? What did I expect him to say?
Ideally, you would have a third machine on the same subnet as the 2 machines (the target machine and the PXE server). But that's unlikely, and adds very little. Clearly, the place to monitor the conversation is on the PXE server. Use tcpdump, tshark and/or ethereal (now called Wireshark).
As my notes from the exercise say,
This gave me the opportunity to try and try, again and again, turning on tcpdump and tshark, and fiddling and fiddling until about the 15th time I got it right.
Here's the expected conversation:
PXE client sends broadcast packet with its own MAC address
PXE server sends BOOTP/DHCP reply
I see the above repeated another 3 times. Then,
PXE client arp enquiry
PXE server arp reply
The arp enquiry comes from an IP address (earlier packets have an IP address of 0.0.0.0), and confirms that the DHCP server has sent the client an IP address and that the client has used the value specified.
PXE client sends a RRQ (tftp read request) specifying pxelinux.0
PXE server sends the file
There should be many packets, large ones from the server (containing the file data); small (length 4) acks from the client.
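The packet sizes can be predicted from the file size and the negotiated block size. Each tftp data packet carries a 4-byte header (opcode and block number) ahead of the data, and a transfer ends with a block shorter than the block size. The sketch below does the arithmetic; the function name tftp_blocks is hypothetical.

```shell
#!/bin/sh
# Given a file size and a tftp block size (both in bytes), report how many
# data packets to expect and the UDP payload lengths tcpdump would show.
tftp_blocks() {
    size=$1 blksize=$2
    full=$((size / blksize))   # blocks that are completely filled
    last=$((size % blksize))   # data bytes in the final, short block
    echo "data packets: $((full + 1))"
    echo "full packet UDP length: $((blksize + 4))"
    echo "last packet UDP length: $((last + 4))"
}

# pxelinux.0 is 13100 bytes and my client asked for blksize 1456
tftp_blocks 13100 1456
```

For pxelinux.0 that comes to 9 data packets: 8 of length 1460 and a final one of length 1456, each answered by a 4-byte ack. The same sum explains the single 368-byte packet for help.txt (364 bytes of data plus the header).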
PXE client sends a RRQ specifying pxelinux.cfg/01-00-d0-b7-4e-31-1b
This is where you find out what exactly your PXE client is asking for.
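To pick the requested file names out of a busy capture, a small filter helps. This is a sketch that assumes tcpdump's text output looks like mine, with the file name in double quotes after RRQ; the function name rrq_files is made up.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and print the file name from every
# tftp read request (RRQ) line; all other lines are dropped.
rrq_files() {
    sed -n 's/.* RRQ "\([^"]*\)".*/\1/p'
}

# Demonstration on a few lines taken from my own capture
rrq_files <<'EOF'
12:03:13.143716 arp reply 192.168.0.3 is-at 00:01:6c:31:ec:77
12:03:13.143912 IP 192.168.0.60.2070 > 192.168.0.3.69: 27 RRQ "pxelinux.0" octet tsize 0
12:03:13.261195 IP 192.168.0.3.34558 > 192.168.0.60.2070: UDP, length 14
12:03:13.584802 IP 192.168.0.60.57090 > 192.168.0.3.69: 38 RRQ "help.txt" octet tsize 0 blksize 1440
EOF
```

On the PXE server, something like tcpdump -l -n -i eth0 port 69 | rrq_files (run as root, and assuming eth0 is the interface facing the target machine) would show each name as the client asks for it. If the name requested is not the name of the file you created, you have found your problem.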
I guess there are zillions of other ways that things could go wrong, but the above covers pretty much everything I can think of.
Here's a complete conversation:
12:03:06.208042 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:06.243002 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:07.192231 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:07.192641 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:09.169864 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:09.170287 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:13.125134 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:13.142644 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:13.143673 arp who-has 192.168.0.3 tell 192.168.0.60
12:03:13.143716 arp reply 192.168.0.3 is-at 00:01:6c:31:ec:77
12:03:13.143912 IP 192.168.0.60.2070 > 192.168.0.3.69: 27 RRQ "pxelinux.0" octet tsize 0
12:03:13.261195 IP 192.168.0.3.34558 > 192.168.0.60.2070: UDP, length 14
12:03:13.261391 IP 192.168.0.60.2070 > 192.168.0.3.34558: UDP, length 17
12:03:13.261601 IP 192.168.0.60.2071 > 192.168.0.3.69: 32 RRQ "pxelinux.0" octet blksize 1456
12:03:13.284171 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 15
12:03:13.284367 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.309059 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.310453 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.333326 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.334717 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.358162 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.359557 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.381382 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.382769 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.405583 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.406981 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.430128 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.431518 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.453394 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.454782 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.478026 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.479420 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.503527 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1456
12:03:13.504917 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.517760 IP 192.168.0.60.57089 > 192.168.0.3.69: 63 RRQ "pxelinux.cfg/01-00-d0-b7-4e-31-1b" octet tsize 0 blks
12:03:13.544383 IP 192.168.0.3.34558 > 192.168.0.60.57089: UDP, length 25
12:03:13.544592 IP 192.168.0.60.57089 > 192.168.0.3.34558: UDP, length 4
12:03:13.584227 IP 192.168.0.3.34558 > 192.168.0.60.57089: UDP, length 373
12:03:13.584728 IP 192.168.0.60.57089 > 192.168.0.3.34558: UDP, length 4
12:03:13.584802 IP 192.168.0.60.57090 > 192.168.0.3.69: 38 RRQ "help.txt" octet tsize 0 blksize 1440
12:03:13.615601 IP 192.168.0.3.34559 > 192.168.0.60.57090: UDP, length 25
12:03:13.615803 IP 192.168.0.60.57090 > 192.168.0.3.34559: UDP, length 4
12:03:13.639528 IP 192.168.0.3.34559 > 192.168.0.60.57090: UDP, length 368
12:03:13.640023 IP 192.168.0.60.57090 > 192.168.0.3.34559: UDP, length 4
12:03:18.259021 arp who-has 192.168.0.60 tell 192.168.0.3
12:03:18.259210 arp reply 192.168.0.60 is-at 00:d0:b7:4e:31:1b
The contents of /tftpboot/help.txt:
The default is to boot from the local hard drive.
You can enter one of:
lhd    (default)       boot from local hard drive
linux  (legacy)        normal (ie manual) install
b2     (hmg addition)  kickstart install for b2
                       ksdevice=eth0 ks=nfs:192.168.0.3:/NFS/b2
f3                     display the current boot config file
The final contents of /tftpboot:
ls -lAt /tftpboot/.
total 19328
drwxrwxr-x 3 root staff     4096 Dec 21  2008 pxelinux.cfg
-rw-rw-r-- 1 root staff      364 Dec  7  2008 help.txt
lrwxrwxrwx 1 root staff       33 Dec  7  2008 b2.cfg -> pxelinux.cfg/01-00-d0-b7-4e-31-1b
-rw-r--r-- 1 root root  17159894 Nov 20  2008 initrd.img
-rwxr-xr-x 1 root root   2567024 Nov 20  2008 vmlinuz
-rw-r--r-- 1 root root     13100 Feb  9  2006 pxelinux.0
ls -lAt /tftpboot/pxelinux.cfg/
total 24
-rw-rw-r-- 1 root staff 369 Dec  7  2008 01-00-d0-b7-4e-31-1b
-rw-rw-r-- 1 root staff 123 Nov 26  2008 default
Henry has spent his days working with computers, mostly for computer manufacturers or software developers. His early computer experience includes relics such as punch cards, paper tape and mag tape. It is his darkest secret that he has been paid to do the sorts of things he would have paid money to be allowed to do. Just don't tell any of his employers.
He has used Linux as his personal home desktop since the family got its first PC in 1996. Back then, when the family shared the one PC, it was a dual-boot Windows/Slackware setup. Now that each member has his/her own computer, Henry somehow survives in a purely Linux world.
He lives in a suburb of Melbourne, Australia.