
Re: Tags (was: RE: New Threads (Was...))

To: LDP <ldp-discuss@lists.linuxdoc.org>
Subject: Re: Tags (was: RE: New Threads (Was...))
From: Gary Preckshot <garrell@inreach.com>
Date: Mon, 12 Jun 2000 19:28:44 -0700
References: <A5F46F4ED18FD211ABEE00105AC6CF070109377D@email.cu-portland.edu>
Resent-date: Mon, 12 Jun 2000 22:39:06 -0400 (EDT)
Resent-from: ldp-discuss@lists.debian.org
Resent-message-id: <x8FBPC.A.Ko.A7ZR5@murphy>
Resent-sender: ldp-discuss-request@lists.debian.org

Gregory Leblanc wrote:

> 
> We've got most of this already, sort of.  I think that there should be three
> lists, as you propose, although I think that your second list shouldn't ever
> be clearly defined.

I'm not sure that a list that isn't completely
defined is very useful.

As a way of starting this tags discussion off, I
have a text file with a hierarchical list of
DocBook tags automatically generated from DocBook
itself. The list has the form

.
.
.
Article
Article Abstract
Article Address
Article Anchor
Article AuthorBlurb
Article BlockQuote
Article CalloutList
Article Caution
Article CmdSynopsis
Article Comment
Article Equation
Article Example
Article Figure
Article FormalPara
.
.
.

It's 3345 lines long, and each line consists of
parent_tag child_tag

For instance, in the excerpt above, if Article is
open, then Abstract through FormalPara (and the
entries elided after them) are legal children. I
haven't got the unreverted tags yet, but that
would be a similar, shorter file.
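Here is a minimal Python sketch of loading the parent/child file into a lookup table. It assumes one whitespace-separated pair per line, as in the excerpt above. The names `load_children` and `is_legal` are illustrative and are not part of any existing LDP tool. Lines that do not have exactly two fields, such as the `.` elision markers and the bare `Article` line, are skipped.

```python
from collections import defaultdict


def load_children(lines):
    """Build a parent -> set-of-legal-children map from 'parent child' lines.

    Lines that are not exactly two whitespace-separated fields are skipped.
    """
    children = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        parent, child = fields
        children[parent].add(child)
    return children


def is_legal(children, parent, child):
    """True if child may appear directly inside parent according to the list."""
    # .get() avoids inserting empty entries into the defaultdict.
    return child in children.get(parent, ())


sample = """\
.
Article
Article Abstract
Article Address
Article FormalPara
"""
allowed = load_children(sample.splitlines())
print(sorted(allowed["Article"]))
```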

As to what we can do with this: we can mark each
line up with indicators for the lists it belongs
to (allowed, search, deprecated, and whatever
other lists we choose). A syntax checker can then
use those lists to flag deviations from LDP's
policy. Because the list was generated
automatically from the DocBook DTD, we won't miss
any tags.
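Below is one way the list markers could be represented and used by a checker, sketched in Python. The line format is an assumption for illustration: `parent child` followed by zero or more list names. The sample policy is also made up. It does not reflect any actual LDP decision about which tags are deprecated.

```python
def load_policy(lines):
    """Map each (parent, child) pair to the set of list names it is marked with.

    Hypothetical format: 'parent child [list ...]'. A pair with no list names
    is legal per the DTD but has not been assigned to any list yet.
    """
    policy = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            policy[(fields[0], fields[1])] = set(fields[2:])
    return policy


def check(pairs, policy):
    """Return one message per (parent, child) use that deviates from policy."""
    problems = []
    for parent, child in pairs:
        labels = policy.get((parent, child))
        if labels is None:
            problems.append(f"{child} inside {parent}: not a legal child per the DTD list")
        elif "deprecated" in labels:
            problems.append(f"{child} inside {parent}: deprecated")
        elif not labels:
            problems.append(f"{child} inside {parent}: not assigned to any list yet")
    return problems


# Illustrative markings only.
policy_text = """\
Article Abstract allowed search
Article Comment deprecated
Article Figure allowed
Article Equation
"""
policy = load_policy(policy_text.splitlines())
used = [("Article", "Abstract"), ("Article", "Comment"),
        ("Article", "Equation"), ("Article", "Book")]
for msg in check(used, policy):
    print(msg)
```

A pair can carry several list names at once (here Abstract is both allowed and searchable), which lets the overlapping lists discussed in this thread live in one file.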

> Again, I don't think we have anything to define subsets, except to remove
> those "deprecated" tags.  Until we have some search engines that take
> advantage of the DocBook markup, there's no reason to define any more than
> "ok to use" and "don't use"

See above. You can use "grep -v string filename"
to remove tags you don't like, so you can start
with the complete list above and derive several
lists from it.
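The same filtering can be done from a script rather than the shell. This Python version is roughly equivalent to `grep -Ev ' (Comment|Caution)$' tags.txt > trimmed.txt`. Excluding Comment and Caution is an arbitrary example, and `grep_v` is an illustrative name. For a simple pattern like this one, Python's `re` and POSIX extended regular expressions behave the same.

```python
import re


def grep_v(pattern, lines):
    """Return the lines that do NOT match pattern, like `grep -Ev pattern file`."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]


full = [
    "Article Abstract",
    "Article Caution",
    "Article Comment",
    "Article Figure",
]
# Drop every pair whose child is Comment or Caution.
trimmed = grep_v(r" (Comment|Caution)$", full)
print(trimmed)  # ['Article Abstract', 'Article Figure']
```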

> Hmm, that would make four, the way that I count.  However, they would
> definitely have some overlap.  Required would be the ones that you MUST
> have, in order to have a valid HOWTO document.  Permitted would be ones that
> are allowed in HOWTOs, but not required.  Searchable would be some from both
> sets, although not necessarily all of either set.  These would be the ones
> that our search engine/viewer understands.  The last set would be restricted
> tags, which would basically be any tags that we don't want people to use.
> 

Not a big deal if you use mechanical help. We
should be able to try things on for size. Note
that if you forget something, you can grep out
what you need, append it to the file you're
building, sort it, and voilà, a new list.
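The grep, append and sort round trip can be sketched the same way. In the shell it is roughly `grep -E ' FormalPara$' full.txt >> list.txt` followed by `sort -u -o list.txt list.txt`. The function name `add_back` and the sample lists are hypothetical.

```python
import re


def add_back(current, full, pattern):
    """Add entries from the full list that match pattern, then sort.

    Duplicates are dropped (like `sort -u`), so re-adding an entry that is
    already present does no harm.
    """
    rx = re.compile(pattern)
    extra = [line for line in full if rx.search(line)]
    return sorted(set(current) | set(extra))


full = ["Article Abstract", "Article Caution", "Article Figure", "Article FormalPara"]
permitted = ["Article Figure", "Article Abstract"]
# FormalPara was forgotten; pull it back in from the complete list.
permitted = add_back(permitted, full, r" FormalPara$")
print(permitted)  # ['Article Abstract', 'Article Figure', 'Article FormalPara']
```

Python's `sorted` orders by code point. That matches `LC_ALL=C sort` but may differ from a locale-aware `sort`.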

> > > Perhaps we should begin by stripping all tags from the template
> > > (and example.sgm?) and annotate them?  Is that a good start for defining
> > > our subset?
> >
> > It's a start, but the issue of searching needs to
> > be dealt with.
> 
> I've put some minor thought into doing this, but it's a big enough project
> that I need to get back up to speed with programming first.

Yes, it is a big project, and by using mechanized
help we can avoid mistakes.

> 
> > > Where do we start...with index tags? section tags? something else?
> > > What are the good tags to use for intelligent context sensitive searches?
> >
> > We need required structure tags (like
> > <sect1>,<Article> etc.) required identification
> > tags (like <Author> and subsidiary tags), required
> > history tags (like <RevisionHistory> and
> > subsidiary tags), search tags (like keyword
> > lists), indexing tags (I'm not sure what they are,
> > but they should mark points in the text. Maybe
> > link tags.) Deprecated tags. Other tags that are
> > OK, but not special. Whichever of us gets to it
> > first.
> 
> Alrighty, I think I'll give that a shot this evening, in between Solaris
> installs.
>

Hopefully I can provide you with a starting point.
I think I'll send you the file separately.

> > > > 3) Put together an
> > > > on-line thesaurus of keywords.
> > >
> > > Ok, I've seen a Glossary suggested, but no thesaurus suggestion so far.
> > > Why a thesaurus?
> >
> > A glossary would make a good howto. I suggested a
> > thesaurus because keywords can get out of hand. A
> > thesaurus would do two things: authors could avoid
> > new keywords if one already existed that met their
> > requirements. People doing searches could find out
> > which keywords were likely to hit their subject.
> 
> What kind of structure are you looking at for the thesaurus?   Is this for
> people to read, or for authors/maintainers to use in trying to make their
> document show up in searches more appropriately?
> 

We need an online database that folks can search,
and add to if they can't find what they need. Each
entry should record the keyword's meaning and its
intended use. If we do it right, the thesaurus
should expand with use.
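Here is a toy, in-memory Python sketch of the search-then-add workflow. It assumes each entry stores a keyword, its meaning and its intended use. The class and field names are made up. A real online service would need persistent storage and probably some review of new entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    keyword: str
    meaning: str
    intended_use: str


class Thesaurus:
    """In-memory stand-in for the proposed online keyword database."""

    def __init__(self):
        self._entries = {}  # lowercased keyword -> Entry

    def search(self, text):
        """Entries whose keyword or meaning contains text, case-insensitively."""
        needle = text.lower()
        return [
            e for e in self._entries.values()
            if needle in e.keyword.lower() or needle in e.meaning.lower()
        ]

    def add(self, keyword, meaning, intended_use):
        """Add keyword unless it already exists; return the stored entry."""
        key = keyword.lower()
        if key not in self._entries:
            self._entries[key] = Entry(keyword, meaning, intended_use)
        return self._entries[key]


t = Thesaurus()
t.add("networking", "configuring network interfaces and protocols",
      "HOWTOs about setting up TCP/IP")
# An author searches first and only adds a new keyword when nothing fits.
if not t.search("network"):
    t.add("network", "duplicate of networking", "should never be added")
print([e.keyword for e in t.search("network")])  # ['networking']
```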

> 
> I think, maybe, possibly, that the indexterm tags can do some of these, and
> the <TOC> stuff may be able to do the rest.  Not completely sure though.

Something to look at.

> 
> DocBook is actually a simple language, less complex than C or Pascal.  SGML
> is pretty much as complex as you choose to make it, since it's not a
> language, but a language for describing other languages.  But that's just
> being petty.

Yep. As far as I can tell, it should be easy to
parse.
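To illustrate how simple the parsing side can be, here is a Python sketch. It walks start and end tags with a stack and emits the (parent, child) pairs a document actually uses, which could then be compared against the generated list. It assumes every non-EMPTY element has an explicit end tag. It ignores SGML tag minimization, comments and marked sections, and attribute values containing `>`. `EMPTY_ELEMENTS` is a hypothetical, incomplete set. A real checker would be better built on a validating parser such as `onsgmls` from OpenSP.

```python
import re

# Start tags (<name ...>, <name .../>) and end tags (</name>). Declarations,
# comments and processing instructions begin with "<!" or "<?" and are not
# matched, but tags inside comments are not skipped.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*?(/?)>")

# Hypothetical, deliberately incomplete set of elements treated as EMPTY
# (no end tag) in DocBook SGML.
EMPTY_ELEMENTS = {"xref", "anchor", "co", "footnoteref", "colspec", "spanspec"}


def parent_child_pairs(sgml):
    """Return (parent, child) pairs, with names lowercased, in document order."""
    stack = []
    pairs = []
    for m in TAG_RE.finditer(sgml):
        is_end, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if is_end:
            # Pop back to the matching start tag; ignore stray end tags.
            if name in stack:
                while stack.pop() != name:
                    pass
            continue
        if stack:
            pairs.append((stack[-1], name))
        if not self_closing and name not in EMPTY_ELEMENTS:
            stack.append(name)
    return pairs


doc = """<article><articleinfo><title>Demo</title></articleinfo>
<sect1><title>Intro</title><para>See <xref linkend="x"> here.</para></sect1></article>"""
for parent, child in parent_child_pairs(doc):
    print(parent, child)
```

Names are lowercased because SGML element names are case-insensitive by default, so any comparison with the generated list should ignore case too.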

Gary


References:
- re: Tags (was: RE: New Threads (Was...))
  - From: Gregory Leblanc <GLeblanc@cu-portland.edu>
