
Re: On automatic conversion

>>>>> "Frederik" == Frederik Fouvry <fouvry@sfs.nphil.uni-tuebingen.de> writes:

Frederik> It is entirely true (even though I'm not a real expert ;-) that
Frederik> the conversion cannot happen automatically.  However, it is
Frederik> possible to convert LinuxDoc to _valid_ DocBook documents.  This
Frederik> is the KDE approach: the documents are first converted
Frederik> automatically to valid DocBook documents (20% are valid without
Frederik> changes, a large majority need trivial changes, which could have
Frederik> been avoided if the main aim had been to generate valid
Frederik> documents; the rest are slightly more complicated cases, or come
Frederik> from non-valid LinuxDoc documents).  In the next step, the valid
Frederik> DocBook is converted to "proper DocBook".

This is somewhat similar to the way we converted from LaTeX to DocBook.  In
some of our older LaTeX, we simply used the standard LaTeX commands to get
a certain typographic effect, leading to sentences like:

"Use the \texttt{ls} command to obtain a listing of the file \texttt{foo}."

Some of our newer (more enlightened :-) LaTeX would have represented this
sentence as:

"Use the \cmd{ls} command to obtain a listing of the file \fil{foo}."

So our conversion scripts had to take both cases into account.  It
obviously should convert \cmd{} to <command></command>, and \fil{} to
<filename></filename>.  But what about \texttt{}?  It had been used in a
semantically ambiguous way -- we would need to get rid of that ambiguity.
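
A minimal Python sketch of the unambiguous half of that mapping follows. It is illustrative only, not the original conversion script: the names SEMANTIC_MACROS and convert_semantic_macros are hypothetical, and it assumes macro arguments never contain nested braces.

```python
import re

# Hypothetical table of semantic LaTeX macros and the DocBook elements
# they map to. Only macros whose meaning is unambiguous belong here.
SEMANTIC_MACROS = {
    "cmd": "command",
    "fil": "filename",
}

# Matches \name{argument} for the macros above; assumes the argument
# contains no nested braces.
MACRO_RE = re.compile(r"\\(" + "|".join(SEMANTIC_MACROS) + r")\{([^{}]*)\}")


def convert_semantic_macros(text: str) -> str:
    """Rewrite \\cmd{...} and \\fil{...} as the matching DocBook elements."""
    def replace(match: re.Match) -> str:
        element = SEMANTIC_MACROS[match.group(1)]
        return f"<{element}>{match.group(2)}</{element}>"

    return MACRO_RE.sub(replace, text)


if __name__ == "__main__":
    line = r"Use the \cmd{ls} command to obtain a listing of the file \fil{foo}."
    print(convert_semantic_macros(line))
```

Note that \texttt{} is deliberately left alone here, since no single element is correct for it.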

What we did was to convert \texttt{} to <tt?></tt?>.  While not valid
DocBook, consider how quickly an interactive search and replace command
(I've used Emacs' query-replace command, although I'm sure it could be done
with other tools) could be used to change all the appropriate occurrences of
"tt?" to "filename".  A second run could change "tt?" to "command".  If
there were any "tt?"s left in the file, we'd have to deal with them one at
a time.  Generally, it would take 10-15 minutes to go through a
several-thousand-line file in this manner, so it's not *that* painful.
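
The placeholder trick can be sketched in Python as below. This is a hypothetical reconstruction, not the original script and not Emacs itself: mark_ambiguous, query_replace and remaining are made-up names, the confirm callback stands in for the y/n answer a human gives during query-replace, and nested braces inside \texttt{} are assumed not to occur.

```python
import re
from typing import Callable, List

# \texttt{} was used ambiguously, so it becomes a deliberately invalid
# placeholder element that a human has to resolve later.
TEXTTT_RE = re.compile(r"\\texttt\{([^{}]*)\}")
PLACEHOLDER_RE = re.compile(r"<tt\?>(.*?)</tt\?>")


def mark_ambiguous(text: str) -> str:
    """Turn every \\texttt{...} into <tt?>...</tt?>."""
    return TEXTTT_RE.sub(r"<tt?>\1</tt?>", text)


def query_replace(text: str, element: str,
                  confirm: Callable[[str, str], bool]) -> str:
    """One query-replace pass: turn <tt?> into <element> where confirm() says yes.

    confirm receives the placeholder's content and a little surrounding
    context, much as a human sees it at the query-replace prompt.
    """
    def replace(match: re.Match) -> str:
        start, end = match.span()
        context = text[max(0, start - 30):end + 30]
        if confirm(match.group(1), context):
            return f"<{element}>{match.group(1)}</{element}>"
        return match.group(0)  # answered "no": leave it for a later pass

    return PLACEHOLDER_RE.sub(replace, text)


def remaining(text: str) -> List[str]:
    """Contents of the placeholders still to be dealt with one at a time."""
    return PLACEHOLDER_RE.findall(text)


if __name__ == "__main__":
    doc = mark_ambiguous(r"Use the \texttt{ls} command to list \texttt{foo}.")
    # Scripted answers stand in for a human at the keyboard.
    doc = query_replace(doc, "filename", lambda content, ctx: content == "foo")
    doc = query_replace(doc, "command", lambda content, ctx: content == "ls")
    print(doc)
    print("left for manual fixing:", remaining(doc))
```

In real use, confirm would display the context and read a y/n answer (for example with input()); the scripted lambdas only make the example reproducible.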

Frederik> I would suggest you try to convert a few documents automatically,
Frederik> and consider what you want to do based on that, without
Frederik> dismissing conversion out-of-hand.

That is how we crafted our conversion scripts -- run the script against a
few files, look at the results, fine-tune the script, and repeat.  Once
the script's output *looked* OK, we started the process of turning it into
a valid DocBook document.  This in turn resulted in a bit more fine-tuning
of the scripts.
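
For the "look at the results" step, a small report of what the script left behind can help decide what to fine-tune next. This is a minimal sketch, not something from our actual toolchain: residue_report is a hypothetical name, and it assumes that any backslash followed by letters in the converted output is an unconverted LaTeX macro.

```python
import re
import sys
from collections import Counter

# Anything that still looks like a LaTeX macro after conversion
# (assumption: backslash followed by letters).
LEFTOVER_MACRO_RE = re.compile(r"\\([A-Za-z]+)")
# Unresolved ambiguity placeholders.
PLACEHOLDER_RE = re.compile(r"<tt\?>")


def residue_report(converted: str) -> Counter:
    """Count the constructs the conversion script did not handle."""
    counts = Counter("\\" + name for name in LEFTOVER_MACRO_RE.findall(converted))
    placeholders = len(PLACEHOLDER_RE.findall(converted))
    if placeholders:
        counts["<tt?>"] = placeholders
    return counts


if __name__ == "__main__":
    # Sum the residue over every converted file named on the command line.
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            total += residue_report(f.read())
    for construct, n in total.most_common():
        print(f"{n:6d}  {construct}")
```

Run over a handful of converted files, the most frequent leftovers are the natural candidates for the next round of script tuning; the rare ones are the kind of cases better left to a human.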

At the end of it, I'd estimate our script handled 80% of the grunt work of
the conversion, and the remaining 20% was so diverse (or the logic was so
complex) that it was just not worth our effort to have the script handle
those cases.  For those, it was easier to have a human take up the slack.

It was the classic case of "how much effort do you put into code you'll use
exactly once?" :-)

Ed Bailey        Red Hat, Inc.          http://www.redhat.com/

To UNSUBSCRIBE, email to ldp-docbook-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org