"Linux Gazette...making Linux just a little more fun!"

Writing Documentation, Part III: DocBook/XML

By Christoph Spiel

To quote from ``DocBook -- The Definitive Guide'' (see Further Reading at the end of this section), DocBook provides a system for writing structured documents using SGML or XML. In the following, I shall focus on the XML variant of DocBook, because the SGML variant is being phased out.

DocBook has been developed with a slightly different mindset than the systems I discussed in the two previous articles (POD article, LaTeX/latex2html article).

The particular features of DocBook mentioned imply uses of DocBook documents that are not possible, at least not easily, with POD or LaTeX documents.

Being general-purpose translators, neither tool is restricted to transforming DocBook documents. If you feed them the right style sheets, they will do other translations, too.


The DocBook/XML syntax resembles HTML. The fundamental difference between the two is the strictness with which the syntax is enforced. Many HTML browsers are extremely forgiving about unterminated elements, and they often silently ignore unknown elements or attributes. DocBook/XML translators reject input that does not comply with the DTD, print detailed error messages, and refuse to produce any output in such cases.

DocBook is spoken in several dialects that differ in how an element is closed. The most verbose one always closes <tag> with </tag>. SGML additionally allows abbreviating the closing tag to </> or dropping the closing tag of empty elements altogether, whereas XML insists on an explicit end, either </tag> or the empty-element form <tag/>. I prefer writing out every end tag, a style that has proven advantageous in deeply nested structures such as nested lists. So, in this article only the form <tag> ... </tag> will appear.
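
To illustrate why written-out end tags pay off in nested structures, here is a minimal sketch of a two-level list. The elements itemizedlist, listitem, and para are standard DocBook; the item texts are made up for this example.

    <itemizedlist>
        <listitem>
            <para>Outer item one</para>
            <itemizedlist>
                <listitem>
                    <para>Inner item</para>
                </listitem>
            </itemizedlist>
        </listitem>
        <listitem>
            <para>Outer item two</para>
        </listitem>
    </itemizedlist>

Every closing tag names the element it ends, so a misplaced or forgotten end tag is easy to spot by eye, and the translator's error message points at a named element.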

Special characters are written with the ampersand-semicolon convention as they are in HTML. The most frequently used special characters are ``&lt;'' for ``<'', ``&gt;'' for ``>'', and ``&amp;'' for ``&''.

Comments are bracketed between ``<!--'' and ``-->''.
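
A short sketch that combines both conventions: a comment followed by a paragraph whose text needs the characters ``<'', ``>'', and ``&''. The sentence itself is invented for illustration.

    <!-- This comment does not show up in the translated document. -->
    <para>
        The test a &lt; b &amp;&amp; b &gt; c compares three values.
    </para>

After translation the paragraph should read ``The test a < b && b > c compares three values.''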

Document Structure

As already mentioned, DocBook documents must adhere to the structure that is defined in a DTD. Every document starts by selecting a particular DTD:

    <!DOCTYPE                                       (1)
     book                                           (2)
     PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"    (3)
     "/usr/share/sgml/db41xml/docbookx.dtd"         (4)
     [ ]                                            (5)
    >

where I have broken the expression (from ``<'' to ``>'') into several lines for easier analysis, and added numbers in parentheses for reference.

Part (1) tells the system that we are about to choose our DTD. Part (2) defines element book to be the root element of our document. Part (3), the public identifier, selects the DTD to use; the public identifier is the string in quotes. The system identifier, part (4), tells the translation tools where to find the DTD on the local computer system. Within the square brackets, part (5), we could place so-called entity definitions, but I do not want to go into detail on entities in this introduction, so we leave this space empty.

Now, we start the text with the root element, in our case book. Which elements go into book is defined in the DocBook DTD. These are, for example, bookinfo or chapter. For a comprehensive list of allowed elements, consult ``The Definitive Guide''. The elements allowed within bookinfo or chapter are also defined in the DocBook DTD, as are all elements. The only way to construct a valid document is to obey all the rules prescribed by the DTD.

What might look like a drag at first sight -- Rules? Rules suck! -- is the key to opening up the document to programmatic access. As the document complies with the DTD, all post-processing can rely on that very fact. Good for the programmers of the post-processors! I have to admit that the number of elements and the elements' mutual relationships are tough to pick up. However, the relations are logical: a chapter contains one or more (introductory) paragraphs and one or more Level 1 sections. No section, on the other hand, contains a chapter; that would be nonsense. Having a copy of ``The Definitive Guide'' right next to the keyboard also helps to learn DocBook. Further down, there is a short compilation of commonly used tags.

Here is a very short but complete DocBook document.

    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
                          "/usr/share/sgml/db41xml/docbookx.dtd" []>
    <book>
        <bookinfo>
            <title>XYZ (version 0.8.15) User's Manual</title>
        </bookinfo>
        <chapter id = "chapter-introduction">
            <title>Introduction</title>
            <para>This chapter provides a quick introduction to XYZ.</para>
            <sect1 id = "section-syntax">
                <title>Syntax</title>
                <para>
                    In this section we present an outline of the
                    syntax of the XYZ language.
                </para>
            </sect1>
            <sect1 id = "section-core-library">
                <title>Core Library</title>
                <para>
                    Even if no additional libraries are loaded to a
                    XYZ program, it has access to some core library.
                </para>
            </sect1>
        </chapter>
        <chapter id = "chapter-commands">
            <title>Commands</title>
            <sect1 id = "section-interactive-commands">
                <title>Interactive Commands</title>
                <sect2 id = "section-interactive-commands-argumentless">
                    <title>Argumentless Commands</title>
                    <para>...</para>
                </sect2>
            </sect1>
            <sect1 id = "section-non-interactive-commands">
                <title>Non-Interactive Commands</title>
                <sect2 id = "section-non-interactive-commands-argumentless">
                    <title>Argumentless Commands</title>
                    <para>...</para>
                </sect2>
            </sect1>
        </chapter>
    </book>

Useful Tags

To help the aspiring DocBook writer make sense of the many elements the DocBook standard defines, I have compiled a list of useful, frequently used tags.

Root Section Tags

Root section tags define the outermost element of any document.

<book>
  paragraphs or chapters
</book>

<article>
  paragraphs or level 1 sections
</article>

Sectioning Tags

Sectioning elements divide the document into logical parts like chapters, sections, paragraphs, and so on.

chapter, sect1, ..., sect6

<chapter id = "label">
  paragraphs or level 1 sections
</chapter>

<sectN id = "label">
  paragraphs or level N+1 sections
</sectN>

Define a section. Commonly, chapter and section elements carry the id attribute, which allows for referencing the elements with, for example, <xref linkend = "label"></xref>.


<para>

paragraph text

</para>

Group several lines of text together to form a paragraph. This is the workhorse element in many documents.

<programlisting role = "language">

program text

</programlisting>

Render a longish piece of program text -- preserving the line breaks. The program is assumed to be written in the language specified in the role attribute. Note that within programlisting all special characters retain their meaning!

This means in particular that you cannot use the control characters ``<'', ``>'', and ``&'' inside of it. There are several workarounds for this problem. Either you replace all control characters with their mnemonic equivalents (``&lt;'', ``&gt;'', and ``&amp;'' in our example), or you wrap the program code in a CDATA section, like, for example,

        <programlisting><![CDATA[
            cout << "value = <" << &p << ">\n";
        ]]></programlisting>

or, if the program is stored in a file, pull in the whole file with

                <imagedata format = "linespecific"
                           fileref = ""></imagedata>
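
For comparison, here is a sketch of the first workaround mentioned above, in which every control character of the same C++ line is replaced with its mnemonic equivalent. The role value "c++" is only an assumed label for this example.

    <programlisting role = "c++">
    cout &lt;&lt; "value = &lt;" &lt;&lt; &amp;p &lt;&lt; "&gt;\n";
    </programlisting>

The escaped form is valid everywhere, but it quickly becomes unreadable, which is why a CDATA section is usually the more comfortable choice for longer listings.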

List-Making Tags

Generate the three typical types of lists.

The items or definitions are typically formed by one or more paragraphs, but they are allowed to contain program listings, too. The terms usually are one or more words, not paragraphs.
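
As a minimal sketch, the three list types could look as follows. I assume the standard DocBook elements itemizedlist (bulleted), orderedlist (numbered), and variablelist (terms with definitions); all item texts are placeholders.

    <itemizedlist>
        <listitem><para>An unnumbered item</para></listitem>
    </itemizedlist>

    <orderedlist>
        <listitem><para>A numbered item</para></listitem>
    </orderedlist>

    <variablelist>
        <varlistentry>
            <term>term</term>
            <listitem><para>Definition of the term</para></listitem>
        </varlistentry>
    </variablelist>

Note that even a one-line item is wrapped in para; a listitem does not take bare text.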

Inline Markup Tags

<emphasis>text to be emphasized</emphasis>

Highlight a short part of the document; usually a single word.

<filename>filename or directory name</filename>

Mark a word as a filename.

<literal>literal something</literal>

<literal role = "classification">literal something</literal>

Mark a word as being a literal expression. Use this tag only as a last resort, if no other more specific tag matches. To calm one's bad conscience, literal often gets decorated with a role attribute, which describes more precisely the kind of literal.

<replaceable>placeholder name</replaceable>

Mark a meta-variable.

<title>title text</title>

Give a name to a section or a formal element, like a table.
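
To show how the inline tags combine inside running text, here is a small sketch. The file name xyz.conf, the placeholder prefix, and the role value "keyword" are invented for this example.

    <para>
        Copy <filename>xyz.conf</filename> to
        <filename><replaceable>prefix</replaceable>/etc</filename>,
        but <emphasis>never</emphasis> edit the line that starts
        with <literal role = "keyword">version</literal>.
    </para>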

Cross References

Cross references refer to other parts of the same DocBook document or to other documents on the World Wide Web. Targets of the former are all elements that carry an id attribute; targets of the latter are selected with uniform resource locators (URLs).

<link linkend = "target">item</link>

Install a (hyper-)link to the spot identified via target within the current document.

<ulink url = "complete URL">item</ulink>

Install a hyper-link to a WWW-accessible document identified by a complete URL. A complete URL includes the protocol, for example, http://.

<xref linkend = "target"></xref>

Install a (hyper-)link to the spot identified via target within the current document. A translator will add text around an xref element. For example, an xref to a section might be decorated with the text ``see section''.
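
The three kinds of cross reference can be sketched in a single paragraph. The target section-core-library is the id of a section in the sample document above; the URL is a placeholder, not a real manual.

    <para>
        The built-in functions are described in
        <link linkend = "section-core-library">the core library
        section</link>; see also
        <xref linkend = "section-core-library"></xref>.
        An online copy is available at
        <ulink url = "http://www.example.org/xyz-manual.html">the
        XYZ web site</ulink>.
    </para>

With link the author supplies the link text, whereas with xref the translator generates it from the target, so the generated text follows the section title if that title ever changes.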

What I Have Left Out

Ugh, I left out tons of stuff, but only to give you a smooth, non-frightening introduction. Some great things DocBook handles that I have not discussed are

Also left out is everything related to changing the DTD or changing the style sheets.

Pros and Cons


Further Reading

Next month: Texinfo

Christoph Spiel

Chris runs an Open Source Software consulting company in Upper Bavaria, Germany. Despite being trained as a physicist -- he holds a PhD in physics from Munich University of Technology -- his main interests revolve around numerics, heterogeneous programming environments, and software engineering. He can be reached at

Copyright © 2002, Christoph Spiel.
Copying license
Published in Issue 75 of Linux Gazette, February 2002
