The HyperNews Linux KHG Discussion Pages

Block Device Drivers

[Note: This has not been updated since changes were made in the block device interface to support block device loadable modules. The changes shouldn't make it impossible for you to apply any of this...]

To mount a filesystem on a device, it must be a block device driven by a block device driver. This means that the device must be a random access device, not a stream device. In other words, you must be able to seek to any location on the physical device at any time.

You do not provide read() and write() routines for a block device. Instead, your driver uses block_read() and block_write(), generic functions provided by the VFS. These call the strategy routine, or request() function, which you write in place of read() and write() for your driver. The buffer cache also calls this strategy routine on behalf of the VFS routines, which is how normal files on normal filesystems are read and written.

Requests for I/O are given by the buffer cache to a routine called ll_rw_block(), which builds lists of requests and orders them with an elevator algorithm so that accesses are faster and more efficient. It, in turn, calls your request() function to actually do the I/O.
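The ordering step can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct req and add_req() are hypothetical names, and ordering purely by ascending sector is a simplification. The kernel's own code also has to take account of the request currently being serviced.

```c
#include <stddef.h>

/* Hypothetical, simplified request: only the fields this sketch needs. */
struct req {
    unsigned long sector;   /* starting sector of the transfer */
    struct req *next;       /* next request in the queue */
};

/*
 * Insert r into the list at *head, keeping the list sorted by ascending
 * sector.  Because it is called once per request, the overall effect is
 * an insertion sort.  Requests for the same sector keep arrival order.
 */
void add_req(struct req **head, struct req *r)
{
    struct req **pp = head;

    /* Walk past every queued request that should be served before r. */
    while (*pp != NULL && (*pp)->sector <= r->sector)
        pp = &(*pp)->next;

    r->next = *pp;
    *pp = r;
}
```

Queuing requests for sectors 50, 10, and 30 in that order leaves the list as 10, 30, 50, so the device is swept in one direction rather than seeking back and forth.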

Note that although SCSI disks and CDROMs are considered block devices, they are handled specially (as are all SCSI devices). Refer to Writing a SCSI Driver for details. (SCSI tapes, like other tapes, are generally character devices.)

Initialization

Initialization of block devices is a bit more complex than initialization of character devices, especially as some "initialization" has to be done at compile time. There is also a register_blkdev() call that corresponds to the character device register_chrdev() call, which the driver must call to say that it is present, working, and active.

The file blk.h

At the top of your driver code, after all other included header files, you need to write two lines of code:

#define MAJOR_NR DEVICE_MAJOR
#include "blk.h"

where DEVICE_MAJOR is the major number of your device. drivers/block/blk.h requires the use of the MAJOR_NR define to set up many other defines and macros for your driver.

Now you need to edit blk.h. Under #ifdef MAJOR_NR, there is a section of defines that are conditionally included for certain major numbers, protected by #elif (MAJOR_NR == DEVICE_MAJOR). At the end of this list, you will add another section for your driver. In that section, the following lines are required:

#define DEVICE_NAME        "device"
#define DEVICE_REQUEST     do_dev_request
#define DEVICE_ON(device)  /* usually blank, see below */
#define DEVICE_OFF(device) /* usually blank, see below */
#define DEVICE_NR(device)  (MINOR(device))

DEVICE_NAME is simply the device name. See the other entries in blk.h for examples.

DEVICE_REQUEST is your strategy routine, which will do all the I/O on the device. See The Strategy Routine for more details on the strategy routine.

DEVICE_ON and DEVICE_OFF are for devices that need to be turned on and off, like floppies. In fact, the floppy driver is currently the only device driver which uses these defines.

DEVICE_NR(device) is used to determine the number of the physical device from the minor device number. For instance, in the hd driver, since the second hard drive starts at minor 64, DEVICE_NR(device) is defined to be (MINOR(device)>>6).
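The hd arithmetic can be checked with a few lines of C. This is a sketch: the definition of MINOR() as the low eight bits of the device number is an assumption about kernels of this era, and PARTITION_NR() is a hypothetical helper added only for illustration.

```c
/* Assumed for this sketch: the minor number is the low 8 bits of the
 * device number. */
#define MINOR(dev)        ((dev) & 0xff)

/* The hd example from the text: 64 minors (2^6) per physical drive,
 * so shifting right by 6 gives the drive number. */
#define DEVICE_NR(dev)    (MINOR(dev) >> 6)

/* Hypothetical helper: the position within the drive (low 6 bits). */
#define PARTITION_NR(dev) (MINOR(dev) & 0x3f)
```

With these definitions, minors 0 through 63 map to drive 0 and minors 64 through 127 map to drive 1. For example, a device number of 0x0341 has minor 65, so DEVICE_NR() yields 1 and PARTITION_NR() yields 1.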

If your driver is interrupt-driven, you will also set

#define DEVICE_INTR do_dev

which will become a variable automatically defined and used by the remainder of blk.h, specifically by the SET_INTR() and CLEAR_INTR macros.

You might also consider setting these defines:

#define DEVICE_TIMEOUT DEV_TIMER
#define TIMEOUT_VALUE n

where n is the number of jiffies (clock ticks; hundredths of a second on Linux/386; thousandths or so on Linux/Alpha) after which to time out if no interrupt is received. These are used if your device can become "stuck": a condition where the driver waits indefinitely for an interrupt that will never arrive. If you define these, they will automatically be used in SET_INTR to make your driver time out. Of course, your driver will have to be able to handle the possibility of being timed out by a timer.
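As a rough illustration of choosing n, the following sketch converts a timeout in seconds to jiffies and models the deadline test. The names SECONDS_TO_JIFFIES() and timed_out() are hypothetical, HZ is assumed to be 100 as on Linux/386, and the three-second value is arbitrary.

```c
/* Assumption: HZ is the number of jiffies per second (100 on Linux/386). */
#ifndef HZ
#define HZ 100
#endif

/* Hypothetical helper macro: convert whole seconds to jiffies. */
#define SECONDS_TO_JIFFIES(s)  ((s) * HZ)

/* Arbitrary example: give the device three seconds before giving up. */
#define TIMEOUT_VALUE  SECONDS_TO_JIFFIES(3)

/*
 * Model of a deadline check: returns 1 once TIMEOUT_VALUE jiffies have
 * passed since `started`.  The difference is computed in unsigned
 * arithmetic and then viewed as signed, so the answer stays right when
 * the jiffies counter wraps around (this relies on the usual
 * two's-complement conversion).
 */
int timed_out(unsigned long now, unsigned long started)
{
    return (long)(now - (started + TIMEOUT_VALUE)) >= 0;
}
```

Writing the timeout in terms of HZ rather than as a bare number keeps the same wall-clock delay on a port whose clock ticks at a different rate.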

Recognizing PC standard partitions

[Inspect the routines in genhd.c and include detailed, correct instructions on how to use them to allow your device to use the standard DOS partitioning scheme. By now, BSD disklabel and Sun's SMD labelling are also supported, and I still haven't gotten around to documenting this. Shame on me--but people seem to have been able to figure it out anyway :-)]

The Buffer Cache

[Here, it should be explained briefly how ll_rw_block() is called, about getblk() and bread() and breada() and bwrite(), etc. A real explanation of the buffer cache is reserved for the VFS reference section. Jean-Marc Lugrin wrote one, but I can't find him now.]

The Strategy Routine

All reading and writing of blocks is done through the strategy routine. This routine takes no arguments and returns nothing, but it knows where to find a list of requests for I/O (CURRENT, defined by default as blk_dev[MAJOR_NR].current_request), and knows how to get data from the device into the blocks. It is called with interrupts disabled so as to avoid race conditions, and is responsible for turning on interrupts with a call to sti() before returning.

The strategy routine first calls the INIT_REQUEST macro, which makes sure that requests are really on the request list and does some other sanity checking. add_request() will have already sorted the requests into the proper order according to the elevator algorithm (using an insertion sort, as it is called once for every request). The strategy routine therefore "merely" has to satisfy the request at the head of the list and call end_request(1), which takes that request off the list. If there is still another request on the list, it satisfies that one and calls end_request(1) again, and so on until there are no more requests on the list, at which time it returns.

If the driver is interrupt-driven, the strategy routine need only schedule the first request to occur, and have the interrupt handler call end_request(1) and then call the strategy routine again, in order to schedule the next request. If the driver is not interrupt-driven, the strategy routine must not return until all I/O is complete.

If for some reason I/O fails permanently on the current request, end_request(0) must be called to destroy the request.
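A polled strategy routine can be modelled in user space as follows. Everything here is a sketch under stated assumptions: the cut-down struct request, the in-memory disk array standing in for hardware, the uptodate field used to record the result, and this version of end_request() are all hypothetical simplifications rather than the kernel's definitions. Reads and writes are told apart by the request's cmd field.

```c
#include <stddef.h>
#include <string.h>

#define READ        0
#define WRITE       1
#define SECTOR_SIZE 512
#define NR_SECTORS  8            /* size of the pretend device */

/* Hypothetical, cut-down request; the kernel's structure has more fields. */
struct request {
    int cmd;                     /* READ or WRITE */
    unsigned long sector;        /* first sector of the transfer */
    unsigned long nr_sectors;    /* number of sectors to transfer */
    char *buffer;                /* memory to read into or write from */
    int uptodate;                /* result recorded by end_request() */
    struct request *next;
};

static char disk[NR_SECTORS * SECTOR_SIZE];  /* stands in for the hardware */
static struct request *current_request;      /* head of the sorted queue */
#define CURRENT current_request

/* Take the current request off the list; record success (1) or failure (0). */
static void end_request(int uptodate)
{
    struct request *req = CURRENT;

    req->uptodate = uptodate;
    CURRENT = req->next;
    req->next = NULL;
}

/* Polled strategy routine: does not return until the queue is empty. */
void do_dev_request(void)
{
    /* The loop test stands in for the check INIT_REQUEST would make. */
    while (CURRENT != NULL) {
        struct request *req = CURRENT;
        size_t off = req->sector * SECTOR_SIZE;
        size_t len = req->nr_sectors * SECTOR_SIZE;

        if (req->sector >= NR_SECTORS ||
            req->nr_sectors > NR_SECTORS - req->sector) {
            end_request(0);      /* past the end of the device: fail it */
            continue;
        }
        if (req->cmd == READ) {
            memcpy(req->buffer, disk + off, len);
        } else if (req->cmd == WRITE) {
            memcpy(disk + off, req->buffer, len);
        } else {
            end_request(0);      /* unknown command: fail it */
            continue;
        }
        end_request(1);          /* success: move on to the next request */
    }
}
```

A caller links requests into a list, points current_request at the first one, and calls do_dev_request() once. When it returns, every request has been removed from the list and marked as succeeded or failed.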

A request may be for a read or write. The driver determines whether a request is for a read or write by examining CURRENT->cmd. If CURRENT->cmd == READ, the request is for a read, and if CURRENT->cmd == WRITE, the request is for a write. If the device has separate interrupt routines for handling reads and writes, SET_INTR(n) must be called to ensure that the proper interrupt routine will be called.
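The interrupt-driven case, with separate read and write interrupt routines selected through SET_INTR, can be modelled in user space along these lines. This is a sketch, not kernel code: the cut-down struct request, the in-memory disk array, the one-sector-per-request limit, this simplified SET_INTR, and fake_interrupt() (which stands in for the hardware raising an interrupt) are all hypothetical. In this model the data is copied when the "interrupt" fires.

```c
#include <stddef.h>
#include <string.h>

#define READ        0
#define WRITE       1
#define SECTOR_SIZE 512
#define NR_SECTORS  8            /* size of the pretend device */

/* Hypothetical, cut-down request: one sector per request in this sketch. */
struct request {
    int cmd;                     /* READ or WRITE */
    unsigned long sector;
    char *buffer;
    int uptodate;                /* result recorded by end_request() */
    struct request *next;
};

static char disk[NR_SECTORS * SECTOR_SIZE];  /* stands in for the hardware */
static struct request *current_request;
#define CURRENT current_request

/* Model of DEVICE_INTR: the routine the next "interrupt" should run. */
static void (*do_dev)(void) = NULL;
#define SET_INTR(x) (do_dev = (x))

void do_dev_request(void);

static void end_request(int uptodate)
{
    struct request *req = CURRENT;

    req->uptodate = uptodate;
    CURRENT = req->next;
    req->next = NULL;
}

/* Read interrupt routine: copy the sector into the caller's buffer. */
static void read_intr(void)
{
    memcpy(CURRENT->buffer, disk + CURRENT->sector * SECTOR_SIZE, SECTOR_SIZE);
    end_request(1);
    do_dev_request();            /* schedule the next request, if any */
}

/* Write interrupt routine: copy the caller's buffer onto the device. */
static void write_intr(void)
{
    memcpy(disk + CURRENT->sector * SECTOR_SIZE, CURRENT->buffer, SECTOR_SIZE);
    end_request(1);
    do_dev_request();
}

/* Strategy routine: start the first valid request, then return at once. */
void do_dev_request(void)
{
    SET_INTR(NULL);
    while (CURRENT != NULL) {
        if (CURRENT->sector >= NR_SECTORS ||
            (CURRENT->cmd != READ && CURRENT->cmd != WRITE)) {
            end_request(0);      /* cannot succeed: fail it, try the next */
            continue;
        }
        SET_INTR(CURRENT->cmd == READ ? read_intr : write_intr);
        return;                  /* real hardware would be programmed here */
    }
}

/* Stand-in for a hardware interrupt: run whichever routine is pending. */
int fake_interrupt(void)
{
    void (*handler)(void) = do_dev;

    if (handler == NULL)
        return 0;                /* nothing was pending */
    SET_INTR(NULL);
    handler();
    return 1;
}
```

After queuing requests and calling do_dev_request() once, each call to fake_interrupt() completes one request and schedules the next. It returns 0 once the queue has drained.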

[Here I need to include samples of both a polled strategy routine and an interrupt-driven one. The interrupt-driven one should provide separate read and write interrupt routines to show the use of SET_INTR.]

Copyright (C) 1992, 1993, 1994, 1996 Michael K. Johnson, johnsonm@redhat.com.

