"Linux Gazette...making Linux just a little more fun!"

From Word Processors to Super Computers
Donald Becker Speaks about Beowulf at NYLUG

By Stephen Adler

Editor's note: In the original article on Adler's website, many of the inline images display a larger copy of themselves when clicked on. These larger images are not included in the Linux Gazette version, to keep the total size of the Gazette small.

I got an e-mail over the weekend announcing that Donald Becker would be addressing the NYLUG on May 19th. That's the New York Linux users group for those of you out west. From out here on Long Island, NYC is a long way away. But I figured I would rough out the commute into NYC to catch what Donald had to say about his Beowulf project. Actually, if you can keep a secret, I'll admit to having fun writing up my encounters with Internet luminaries like Donald and publishing them on the Internet. This would give me a chance to do so once again, so the long commute into NYC didn't seem so bad.

A rainy day in New York City, and I'm hustling around looking for a parking lot.
Wednesday came flying along. I spent most of the afternoon beating up on an Alpha Personal Workstation 433au, trying to get Linux installed on it. Hey, Red Hat 6.0 was out, and since it has a generic kernel which seems to run on all Alpha variants, I figured this should be a snap. Wrong! For some reason, MILO refuses to boot on the machine. I've been trying off and on to get Alpha/Linux installed on this machine since January. It belongs to a professor at Stony Brook who is a real Linux enthusiast; he started down the path of the Linux install and ran into this MILO problem. I gave it a try, a graduate student from Columbia gave it a try, and we have all failed. The Relativistic Heavy Ion Collider is coming online soon, so we don't have much time to spend on this box. It has become somewhat like King Arthur's sword. Whoever can pull that sword out of the rock, or install Linux on that machine, will lead a blessed life... Roy (the professor who owns the Alpha) has now put up a reward for whoever can get Linux installed on the damn thing. The reward right now stands at 2 tickets to see the NY Yanks. (Or Knicks if you are of that persuasion...)

Gucci bags and Rolex watches for sale abound. Where are the damn umbrella sellers!
Time flies when you are having trouble getting Linux installed on something, and it did that Wednesday afternoon. I ended up missing the 4:05pm train into Penn Station and decided to drive in. To my dismay, it would have taken just as long to wait for the next train as it did to drive in. Rain poured out of the sky as I topped out at 20 MPH on the Long Island Expressway heading west into Manhattan. I wanted to get to the meeting early enough to meet Donald and the rest of the NYLUG members. That was just not going to happen. At this rate, I would be lucky to hear him speak at all.

It's 6:20pm and I'm heading up 3rd Ave in search of a parking lot. The meeting starts at 6:30pm. Damn, I'm always running just on time. With little effort, I find a very reasonable parking lot which charges $10 until closing. It's usually about $25 for a midtown parking lot. I drop the car off and dash out in search of the IBM building where the NYLUG is meeting. Rain is coming down, I'm getting soaked, and I'm looking all over the place for those street vendors who always have what you don't need at the time. Fake Rolex watches are up for sale, as are Gucci bags, but no umbrellas. I can feel the rain starting to seep onto my scalp as I run across Madison, heading north towards 57th St.

IBM, a while back, started to get a clue about the benefits of Open Source/Free software and has now donated one of its meeting rooms to the NYLUG, which meets about once a month. (Rasterman is talking at the next one.) The IBM building stands very tall on the corner of 57th and Madison. It boasts a piece of modern art of some sort at its entrance. Visitors need to sign in to be let into the building. The meeting was being held on the 9th floor.

I arrive at the meeting room where the NYLUG is gathered. A projector is set up with Donald's laptop plugged into it. There are about 30 or 40 people present. Jim Gleason, the organizer of the meeting, who works for VA Research, is there talking with Donald, looking rather busy. He sees me and introduces me to Donald. I had just driven through about 2.5 hours of LIE traffic and dashed across several streets and avenues in the rain, and my bladder had been screaming at me since exit 40 on the LIE that it needed to be relieved. I couldn't concentrate much on what I was saying at the time. I shook hands with Donald and muttered something like, "We use lots of computers at BNL". I don't remember how he responded; I think he didn't say anything. I then managed to get myself away, find a seat, stow my laptop and look for a good place to take a photo of the room.

A shot of the NYLUG meeting room, courtesy of IBM. By the time Donald's talk was well underway, there was basically standing room only.

Jim Gleason took the mike and asked people to sit down. He wanted to get the meeting going on time (it was getting close to 7pm by now). I settled into my seat, booted my laptop, and proceeded to ignore my aching bladder. I had more important business to take care of at the time.

A solemn moment for Jim Gleason, the VA Research guy who is one of the NYLUG contacts and organizers. Actually, the shot was taken just as he happened to look down at his notes, the only time he did so during his introduction. Murphy's law at work. Jim is a very energetic guy who is excited about his work.
At this point, I started to take notes as Donald started talking. Since my notes are always rather jumbled, it will be easier for me to cover in broad strokes the topics he talked about instead of trying to give a word-for-word reproduction of what he said.

His introductory slide showed two things: his affiliation with the NASA Goddard Space Center and a company called Scyld Computing Corporation. My guess is that NASA has given him the liberty to work with this Scyld startup to help bring Beowulf into the private sector. Good for him. At this point, something rather annoying started to happen. The projector hooked up to Donald's laptop started to lose sync with it. Donald, who is somewhat shy, was having a hard time giving his talk while at the same time pressing various auto-sync buttons on the projector to try to get his slides back up on the screen. This went on throughout his talk. It really didn't matter, since he didn't bother to walk through his slides; rather, he just talked off the top of his head about what he has been doing for the past 7 years.

Donald's talk went on until 8:20pm. During that time I got the following out of his talk.

A bad picture of Donald at the beginning of his talk. It looks like my camera is getting out of sync with the fabric of space-time. (One of these days I'm going to upgrade my analog camera to a digital one. But with the increase in property taxes out on Long Island and the small salary a "junior" scientist makes at BNL, it will be some time before I do so.)
He introduced the concept of a Beowulf system. Basically, it is a cluster of many off-the-shelf PCs, running Linux and tied together through a high-speed, low-latency networking infrastructure. The network topology of such a system tends to be flat, which makes things easier on the application side. Fast Ethernet, tied together through a Fast Ethernet switch, is the current network hardware of choice for a Beowulf cluster. ATM is too expensive at this point, and I believe he mentioned that its latency tends to be greater than that of Fast Ethernet. (But don't hold me to that statement.) He did mention that the ATM "overhead" was way too large. After the talk was over, a question from someone in the audience revealed that Beowulf is basically a library of software which one uses to help implement a distributed application. This includes facilities such as a global PID and methods of remote process execution, much like rsh, etc. There was some mention of MPI/PVM (and MPI-2), which are parallel processing abstractions sitting above the Beowulf distributed processing layer. One of the tasks on my list is to learn about this software properly, but unfortunately, Donald's talk was not a HOWTO on using Beowulf to parallelize your application. It was more like, "I've worked on Beowulf, and here are some interesting things about it...". So, the specifics of Beowulf still elude me.
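To make the "global PID" idea a little more concrete, here is a minimal Python sketch of one way a cluster could give every process a single cluster-wide name: pack the node number and that node's local PID into one integer. This is a hypothetical illustration, not how the Beowulf software actually does it; the bit widths and the functions make_global_pid and split_global_pid are assumptions made up for this example.

```python
from typing import Tuple

# Hypothetical layout: high bits hold the node number, low bits the local PID.
PID_BITS = 32    # assumed: room for any local PID on a node
NODE_BITS = 16   # assumed cluster size limit: 65,536 nodes


def make_global_pid(node: int, local_pid: int) -> int:
    """Combine a node number and a node-local PID into one cluster-wide ID."""
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError(f"node {node} out of range")
    if not 0 <= local_pid < (1 << PID_BITS):
        raise ValueError(f"local_pid {local_pid} out of range")
    return (node << PID_BITS) | local_pid


def split_global_pid(gpid: int) -> Tuple[int, int]:
    """Recover (node, local_pid) so a request can be routed to the right machine."""
    return gpid >> PID_BITS, gpid & ((1 << PID_BITS) - 1)
```

With this layout, process 1234 on node 3 becomes a single number any node can pass around, and split_global_pid tells a remote-execution or signal request which machine to contact.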

Donald talked a bit about the open source nature of the project. In short, being an open source project was crucial in making it as reliable as it is. This also holds for the Linux kernel itself. While building Beowulf clusters, Donald ran across some problems with the Linux kernel which he had to fix. Limits such as only being able to mount 64 file systems got in his way. Having hundreds of PCs talking to each other on the network stressed the kernel's networking data structures, which he also had to deal with. Because he had the source code to the kernel, he was able to make the Beowulf project work. He also took in contributions from outsiders: if contributed software was relevant and worked well, he would include it.

The side of the IBM building, as I face Madison Ave. Thank you, IBM, for letting the NYLUG use your meeting rooms so that we could hear Donald speak. It would be nice, though, if you guys got a Linux-friendly projector. It's OK if the projector is not Y2K certified. We'll take it anyway.
Donald spoke a bit about the history of his project. His first cluster was made up of 100MHz DX4-somethings (DX486?). (Because the projector could not sync properly to Donald's PC, I could only read part of the slides. You have to give the IBM folks credit, though: the projector was certified as Y2K compliant. It had a rather official-looking sticker on its side saying so...) In 1996, a 2.2 GF/sec cluster was built, followed by a 10 GF/sec system in 1997. This was a threshold-crossing system: NASA considered 10 GF/sec to be the minimum computing power for a system to be called a "super computer". In 1998, a 40+ GF/sec system was put together (at Los Alamos National Laboratory, I believe). What made all this possible was that price/performance was improving rather rapidly for PC-based machines. The threshold was crossed in 1996/1997, making the Beowulf type of system competitive with the big Cray-type systems. The Beowulf project crossed another watershed when a Beowulf system won the Gordon Bell prize for price/performance. (I believe this was around 1997.) The NASA Goddard Space Center at the time had a "super computer" in its basement, called the T3D I believe. It was a 10 GF/sec machine. Through open source software, a good network and cheap PCs, Donald was, in essence, able to beat it.

Donald spent some time showing pictures of current Beowulf clusters in operation. Some were rack-mounted systems; some were bunches of PCs on shelves. The PCs-on-shelves Beowulf system is called LOBOS, which stands for Lots of Boxes on Shelves. One of the systems built in 19-inch racks was called the Hive because of the noise its large cabinet fans made.

The art work standing at the entrance to the IBM building. Unfortunately, I can't tell the difference between this and a bunch of steel beams welded together.
Some applications currently using Beowulf systems are climate modeling, ray tracing and galaxy evolution modeling. He was particularly intrigued by the galaxy evolution modeling application. In order to model a galaxy, you need every star in the galaxy to interact with every other star in the galaxy, since gravity's force is felt at infinite distances; the number of pairwise interactions grows as the square of the number of stars. One would think that this kind of fine-grained application would not work well on a network-distributed system. But the guys at Los Alamos came up with a tree-structured algorithm which mapped very well onto a network topology, thus making a Beowulf architecture work for this type of computing problem. NASA uses its Beowulfs for image processing of satellite and Hubble images. The Hubble images had to be refocused because of the "oversight" of one of the mirror polishers. One application of satellite image processing is to splice together all the satellite photos taken from various angles and positions of one area on earth to form one large, coherent image.
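The tree trick can be sketched in a few dozen lines of Python. The code below is a generic two-dimensional Barnes-Hut style quadtree, an assumption standing in for the Los Alamos tree code, which I know only from Donald's description. A distant clump of stars is treated as a single mass at its center of mass whenever the cell looks small from where you stand (cell size divided by distance below THETA), so each star no longer has to visit every other star. THETA, EPS, the class Cell and the helper names are all made up for illustration; stars are assumed to lie in the unit square at distinct positions, with G = 1.

```python
import math

THETA = 0.5   # opening criterion: cell size / distance below this -> approximate
EPS = 1e-3    # small softening length so close encounters don't blow up


def pull(x, y, sx, sy, m):
    """Acceleration at (x, y) toward mass m at (sx, sy), with G = 1."""
    dx, dy = sx - x, sy - y
    r2 = dx * dx + dy * dy + EPS * EPS
    s = m / (r2 * math.sqrt(r2))
    return dx * s, dy * s


class Cell:
    """A square region: a leaf holding at most one star, or a cell with four children."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # center and half-width
        self.mass = 0.0
        self.mx = self.my = 0.0   # running sums of m*x and m*y
        self.body = None          # (x, y, m) when this leaf holds one star
        self.children = None      # four sub-cells once split

    def _child_for(self, x, y):
        return self.children[(0 if x < self.cx else 2) + (0 if y < self.cy else 1)]

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)                  # empty leaf: just store the star
        else:
            if self.children is None:              # occupied leaf: split it in four
                h = self.half / 2
                self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                                 for dx in (-1, 1) for dy in (-1, 1)]
                bx, by, bm = self.body
                self.body = None
                self._child_for(bx, by).insert(bx, by, bm)
            self._child_for(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def accel(self, x, y):
        """Approximate acceleration at (x, y) due to everything in this cell."""
        if self.mass == 0.0:
            return 0.0, 0.0
        if self.body is not None:
            bx, by, bm = self.body
            if (bx, by) == (x, y):
                return 0.0, 0.0                    # a star does not pull on itself
            return pull(x, y, bx, by, bm)
        comx, comy = self.mx / self.mass, self.my / self.mass
        d = math.hypot(comx - x, comy - y)
        if d > 0 and 2 * self.half / d < THETA:
            return pull(x, y, comx, comy, self.mass)  # far away: one big star
        ax = ay = 0.0
        for child in self.children:
            cax, cay = child.accel(x, y)
            ax += cax
            ay += cay
        return ax, ay


def build_tree(stars):
    """stars: list of (x, y, m) inside the unit square, at distinct positions."""
    root = Cell(0.5, 0.5, 0.5)
    for x, y, m in stars:
        root.insert(x, y, m)
    return root


def direct_accel(stars, i):
    """Brute force: star i feels every other star (the N-squared approach)."""
    x, y, _ = stars[i]
    ax = ay = 0.0
    for j, (sx, sy, m) in enumerate(stars):
        if j != i:
            px, py = pull(x, y, sx, sy, m)
            ax += px
            ay += py
    return ax, ay
```

Comparing root.accel against direct_accel for a few hundred random stars is a quick way to see the approximation at work; a smaller THETA buys accuracy at the cost of opening more cells.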

Some of the specifics about Beowulf clusters he mentioned were the following. Usually one of the nodes is set aside and dedicated to managing the rest of the nodes in the cluster; it's the job distributor. Some very simple techniques are used to keep track of which systems have not crashed: either each machine sends out a multicast that the controlling node receives, or the controlling node pings the rest of the nodes in the cluster. If one of the nodes goes down, the controller stops submitting jobs to it. There are some mechanisms within the Beowulf software for migrating processes from one node to another. He also talked about how he uses RPM extensively to maintain the software on the nodes. He referred to RPM as a "key technology" enabling easy software maintenance of large clusters of PCs. Someone asked how he maintains his Beowulf code. He didn't really answer the question, since he did not want to endorse any kind of source code management software like RCS or CVS. But he did stress that RPM was key to being able to distribute software to many PCs.
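Here is a minimal Python sketch of that bookkeeping on the control node: record when each node was last heard from (by multicast or ping reply), treat a node that has been silent longer than a timeout as down, and hand new jobs round-robin to the live nodes only. The Controller class, its method names, the 10-second timeout and the round-robin policy are all hypothetical choices for illustration, not the Beowulf software's actual interface.

```python
import time
from typing import Dict, Iterable, List, Optional


class Controller:
    """Hypothetical job distributor running on the cluster's control node."""

    def __init__(self, nodes: Iterable[str], timeout: float = 10.0,
                 now: Optional[float] = None):
        start = time.monotonic() if now is None else now
        self.timeout = timeout  # seconds of silence before a node counts as down
        self.last_seen: Dict[str, float] = {n: start for n in nodes}
        self._next = 0          # round-robin position

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Record a multicast heartbeat or ping reply from `node`."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def alive(self, now: Optional[float] = None) -> List[str]:
        """Nodes heard from within the timeout, in a stable (sorted) order."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def pick_node(self, now: Optional[float] = None) -> Optional[str]:
        """Choose the next node for a job; down nodes never get new work."""
        live = self.alive(now)
        if not live:
            return None
        node = live[self._next % len(live)]
        self._next += 1
        return node
```

Passing `now` explicitly, as the optional parameters allow, makes the logic easy to exercise without waiting for real timeouts.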

Who's that impostor! (I've gotta upgrade that damn camera...)
He also talked about the stability of the machines he works with. Most of his systems had been up for over 100 days. I believe some of the Beowulf clusters had been up for over 200 days. What is important is not that a single machine has been up that long, but that large numbers of machines have been up and running for that amount of time. Because of the long-running nature of a Beowulf cluster, one tends not to use the latest and greatest release of anything. He was using a 2.0.3x version of Linux on his machines. He also pointed out a critical benefit of having the source code available for the kernel and all the software which makes up a Beowulf system. If a bug is found, one can fix it by modifying a few lines of code. That one module or program gets recompiled and you're off and running again, with a minimum of administrative work. With closed source systems, it is often the case that when a similar small bug is found and fixed, a whole cascade of software upgrades results. This is because the bug fix comes in the form of a new software release. This release then upgrades your shared libraries. The shared library upgrades then force you to upgrade all your applications, and on and on. After all that, you are forced to revalidate your whole cluster for production use, which can take a long time. Donald mentioned that he validates his systems by running Linux kernel compilations for two days to "burn in" his systems.

Donald also spent some time talking about how one runs a Beowulf cluster and keeps it reliable. This is done by monitoring the hardware for signs of future failures. The most common failure is a failing fan. There seems to be a host of applications which monitor system health, from the temperature of the boxes to network packet errors. Keeping an eye on these problems helps keep a Beowulf cluster healthy.
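As a toy illustration of that kind of monitoring, the Python sketch below scans per-node readings and flags slow fans, hot CPUs and piled-up network receive errors. The Reading fields, the check_readings function and the threshold values are hypothetical, picked only for the example; real tools gather these numbers from hardware sensors and kernel counters.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, not values from the talk.
MIN_FAN_RPM = 2000
MAX_TEMP_C = 60.0
MAX_RX_ERRORS = 100


@dataclass
class Reading:
    node: str
    fan_rpm: int        # a dying fan slows down before it stops
    cpu_temp_c: float
    rx_errors: int      # network receive errors since the last check


def check_readings(readings: List[Reading]) -> List[str]:
    """Return one warning line per problem found, in node order."""
    out = []
    for r in readings:
        if r.fan_rpm < MIN_FAN_RPM:
            out.append(f"{r.node}: fan slow ({r.fan_rpm} rpm)")
        if r.cpu_temp_c > MAX_TEMP_C:
            out.append(f"{r.node}: hot ({r.cpu_temp_c:.1f} C)")
        if r.rx_errors > MAX_RX_ERRORS:
            out.append(f"{r.node}: {r.rx_errors} receive errors")
    return out
```

The point of the design is to catch a failing fan while the node is still up, so the node can be drained of jobs and serviced before it crashes.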

Donald answering questions after his talk. A nice shot of his left back side.
One last thing worth mentioning. With all this talk of running Linux systems for hundreds of days on end, a Windows "story" came up. It turns out that there is a bug in the timer software for Windows which will cause your PC to crash after 49 days. The bug has been around for a long time but was only recently found, since a Windows system rarely stays up that long.
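The 49-day figure is consistent with an unsigned 32-bit counter of milliseconds since boot wrapping around; that explanation is an assumption here, not something reported above. A short Python sketch of the arithmetic, and of the kind of elapsed-time bug a wrapping counter causes (the function names are made up for illustration):

```python
# Assumption: the timer is an unsigned 32-bit count of milliseconds since boot.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 ms per day
WRAP = 2 ** 32                     # the counter rolls over to 0 here


def wrap_days() -> float:
    """Days of uptime before a 32-bit millisecond counter wraps."""
    return WRAP / MS_PER_DAY


def tick_count(ms_since_boot: int) -> int:
    """What the 32-bit counter reads after ms_since_boot milliseconds."""
    return ms_since_boot % WRAP


def naive_elapsed(start: int, end: int) -> int:
    # Buggy: goes hugely negative if the counter wrapped between readings.
    return end - start


def wrap_safe_elapsed(start: int, end: int) -> int:
    # Modular subtraction gives the right answer across a single wrap.
    return (end - start) % WRAP


if __name__ == "__main__":
    print(f"counter wraps after {wrap_days():.2f} days")
    start = tick_count(WRAP - 1000)   # one second before the wrap
    end = tick_count(WRAP + 500)      # half a second after it
    print(naive_elapsed(start, end), wrap_safe_elapsed(start, end))
```

Run as a script, it reports a wrap after about 49.71 days; across the wrap, the naive difference comes out negative while the wrap-safe one gives 1500 ms.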

One person in the audience asked why Donald used Linux as the kernel for his Beowulf project instead of one of the BSD kernels. Donald had an interesting answer to that question. First off, the BSD kernels were not as stable as Linux back when he started working on his project. He then proceeded to complain that working with the BSD developers was very difficult. They tend to hide the development process, making it harder to contribute the needed changes. (Remember that Donald had to work with the internal data structures of the kernel in order to make his project scale.) He then said that these BSD developers had very large egos. "Their egos would fill this room," he said, implying how difficult they were to work with. He went on to say that he was quite able to work with Linus. Linus was a laid-back guy.

Another shot of Donald's left backside, although I'm starting to work my way around towards his front. If I'm lucky, I may get him looking into the camera.

There were many other interesting questions which were discussed during Donald's talk. You can read my jumbled notes if you care to try and decipher them for more information.

Well, that's as far forward as I could get. I did, however, get a nice shot of him and his book, which I proudly display at the top of this write-up.
The session came to an end at about 8:20pm. During his session he plugged his new book about Beowulf clusters, titled How to Build a Beowulf. The book was written in collaboration with several of the Beowulf developers and is a compilation of a lot of the tutorials and documentation on the software. It's published by MIT Press and fits in with the other "definitive" references on MPI, MPI-2 and PVM, also published by MIT Press. He said that he makes about 30 cents per book sold and was counting up the number of people in the audience to see if he could buy dinner with the proceeds if everyone bought one. One guy in the audience offered him 60 cents for the book he had in his hand, doubling his take-home profit. Donald declined the offer.

People got up and started to leave the room after the talk was over. I stuck around to take some pictures of Donald as he talked to some NYLUGers. I was eventually able to re-introduce myself to him. I gave him my card and invited him out to BNL if he were ever in the area again. (I'm really bad at this sort of thing.) I then asked him if he had trouble getting funding for his first Beowulf system. He told me that he got the Beowulf idea back when he was working for the NSA. He presented the idea to his superiors. He needed $50K to put a cluster of nodes together. For the NSA, $50K was just too little to bother with, and his request was declined. So he took his idea over to NASA. NASA thought it was worth funding, so he got a job there specifically to work on his Beowulf idea. The rest, as they say, is history.

My last shot of Donald as we start receiving our dinner orders. I was fortunate enough to take this shot just as the waiter held Donald's plate right over his mouth. It is truly amazing how often Murphy's law kicks in. The guy looking right into the camera is named Judd. He works for Netscape and used the NYLUG meeting to announce an installfest he was organizing.

I left the room and spent some time with Jim Gleason in the hallway just outside. VA Research is about to deliver a 36-node system to BNL this coming week, and we talked about that a bit. Suddenly, my bladder screamed at me and I broke off in mid-sentence: "Where's the men's room!" Fortunately, it was about 10 feet behind me. I don't know how I was able to ignore my bodily functions from exit 40 of the LIE until then...

A picture of the other table where the 2nd half of the NYLUGers hung out while waiting for their food to show up.
A small fraction of the group then headed over to Kapland's deli for a real NYC deli meal. I ordered an extra-lean pastrami sandwich. In typical NY deli fashion, I got just that: a mountain of extra-lean pastrami sandwiched between two thin slices of rye bread, nothing else. The pickles and cole slaw were delivered on dishes as we sat down. I had to apply the Russian dressing myself.

I sat across from a guy who seemed to do business with Wall Street. One tidbit I found rather interesting was that he had a friend who put systems together for Wall Street trading firms. One would assume that these systems are bulletproof, 100% reliable. It turns out that they crash all the time; there is enough redundancy in these systems that the crashes can be afforded. After hearing Donald talk about large numbers of systems being up for hundreds of days at a time, hearing that Wall Street trading systems crash continuously was a real shock. Maybe Wall Street will begin to understand the power of Open Source. Until then, my retirement fund will not be as safe as it could be.

Another shot of Jim Gleason along with Matthew Hunt and Ari. Ari is the guy in the back who also works for VA Research. He's coming out to BNL to set up the 36-node machine I'm aching to submit my jobs to. The guy in the middle is Matthew Hunt, President of the Linux Users of NY group (LUNY).
At about 9:30pm, Jim Gleason was getting worried about getting Donald to JFK to catch his 11:30pm flight to NC. Donald was headed down to attend LinuxExpo. It was getting late for me as well. I said goodbye to the crowd of NYLUGers and headed out in search of the lot where I had parked my car. The drive back to where I live on Long Island proceeded in standard form. After giving the MTA guy the $3.50 toll for using the Midtown Tunnel, I started counting the exits along the LIE as I drove by them. 1, 2, ... 10, 11, ... 20, ..., 30...

Driving along the LIE always sets my mind wandering. This time, my mind wandered around open source land. I still cannot get a grip on the power of the Internet. What really made Donald's project possible was the fact that he had access to Linux. You could never build a Beowulf cluster out of Windows 3.1 machines. Think about it: this is what was running on those powerful 100MHz DX486 machines back when he started this project. I can imagine going to one of the NSA administrators and trying to convince him that you could take all those PCs the secretaries were using to write up memos in MS Word, gang them together and turn them into a super computer. And do so for only $50K. Back in 1992, that was a radical idea! And look at what we have now: super computers popping up and the beginning of a new industry. Also, has anyone ever heard of an NT Beowulf cluster? I'm sure Microsoft would boast of one if there was one. (And take credit for the idea as well.) That would be a good way to test the stability of NT. Run 100 NT machines in a cluster and see how long you could keep them all up and running. It would be nice to see Mindcraft perform such a feat. Having 100 Linux machines running for over a hundred days translates to 10,000 CPU-days of continuous running. Benchmark that, Mindcraft...

Exit number 67, exit number 68. Ahhh, exit 68, home at last.

Please e-mail me your comments, if you have any. I'm always interested in what you may have to say related to this write up or anything else on your mind.


Copyright © 1999, Stephen Adler
Published in Issue 42 of Linux Gazette, June 1999