10 Putting It All Together

An SGML-conformant document has a number of parts, not all of which have been discussed in this chapter, and many of which the user of these Guidelines may safely ignore. For completeness, however, the following summary of how the parts are interrelated may be useful.

An SGML document consists of an SGML prolog and a document instance. The prolog contains an SGML declaration (described below) and a document type definition, which contains element and entity declarations such as those described above. Different software systems may provide different ways of associating the document instance with the prolog; in some cases, for example, the prolog may be 'hard-wired' into the software used, so that it is completely invisible to the user.

10.1 The SGML Declaration

The SGML declaration specifies basic facts about the dialect of SGML being used, such as the character set, the codes used for SGML delimiters, and the length of identifiers. Its content for TEI-conformant document types is discussed further in chapter 28 and chapter 39. Normally the SGML declaration will be held in the form of compiled tables by the SGML processor and will thus be invisible to the user.
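
As a rough orientation only, the outline below sketches the overall shape of an SGML declaration. It is an abridged skeleton, not a usable declaration: the ellipses stand for parameter values that are not given here, and the comments are added for illustration. The values actually required for TEI-conformant document types are those discussed in the chapters cited above.

    <!SGML "ISO 8879:1986"
         CHARSET   ...      -- document character set --
         CAPACITY  ...      -- capacity set --
         SCOPE     DOCUMENT
         SYNTAX    ...      -- concrete syntax: delimiters, name lengths, etc. --
         FEATURES  ...      -- optional features in use --
         APPINFO   NONE
    >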

10.2 The DTD

The document type declaration specifies the document type definition against which the document instance is to be validated. Like the SGML declaration, it may be held in the form of compiled tables within the SGML processor, or associated with the processor in some way that is invisible to the user or that requires only that the name of the document type be specified before the document is validated.

At its simplest, the document type declaration consists of a base document type definition (and possibly one or more concurrent document type definitions) prefixed to the document instance. For example:

    <!DOCTYPE my.dtd [
         <!-- all declarations for MY.DTD go here -->
         ...
    ]>
    <my.dtd>
         This is an instance of a MY.DTD type document
    </my.dtd>

More usually, the document type definition will be held in a separate file and invoked by reference, as follows:

    <!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
    ]>
    <tei.2>
         This is an instance of an unmodified TEI type document
    </tei.2>

Here, the text of the TEI.2 document type definition is not given explicitly, but the SGML processor is told that it may be read from the file with the system identifier given in quotation marks. The square brackets may still be supplied, as in this example, even though they enclose nothing.

The part enclosed by square brackets is known as the document type declaration subset or 'DTD subset'. Its purpose is to specify any modification to be made to the DTD being invoked, thus:

    <!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
         <!ENTITY tla "Three Letter Acronym">
         <!ELEMENT my.tag - - (#PCDATA)>
         <!-- any other special-purpose declarations or
              re-definitions go in here -->
    ]>
    <tei.2>
         This is an instance of a modified TEI.2 type document,
         which may contain <my.tag>my special tags</my.tag> and
         references to my usual entities such as &tla;.
    </tei.2>

In this case, the document type definition in force includes first the contents of the DTD subset, and then the contents of the file specified after the keyword SYSTEM. The order is important, because in SGML only the first declaration of an entity counts. In the above example, therefore, the declaration of the entity tla in the DTD subset would take precedence over any declaration of the same entity in the file tei2.dtd. It is perfectly legal SGML for entities to be declared twice; this is the usual method for allowing user modification of SGML DTDs. (Elements, by contrast, may not be declared more than once; if a declaration for <my.tag> were contained in the file tei2.dtd, the SGML parser would signal an error.) Combining and extending the TEI document type definitions is discussed further in chapter 3: Structure of the TEI Document Type Definition.
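
To illustrate the precedence rule described above, suppose, purely as an assumption for this example, that the file tei2.dtd itself contained a declaration for the entity tla:

    <!-- hypothetical line inside tei2.dtd; not taken from the actual TEI DTD -->
    <!ENTITY tla "TLA">

A document that declares the same entity in its DTD subset would then be read as follows:

    <!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
         <!-- read first, so this is the declaration that counts -->
         <!ENTITY tla "Three Letter Acronym">
    ]>
    <tei.2>
         The reference &tla; expands to "Three Letter Acronym"; the
         later declaration in tei2.dtd is ignored.
    </tei.2>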

10.3 The Document Instance

The document instance is the content of the document itself. It contains only text, markup and general entity references, and thus may not contain any new declarations. A convenient way of building up large documents in a modular fashion might be to use the DTD subset to declare entities for the individual pieces or modules, thus:

    <!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
         <!ENTITY chap1 SYSTEM "chap1.txt">
         <!ENTITY chap2 SYSTEM "chap2.txt">
         <!ENTITY chap3 "-- not yet written --">
    ]>
    <tei.2>
    <teiHeader> ... </teiHeader>
    <text>
         <front> ... </front>
         <body>
              &chap1;
              &chap2;
              &chap3;
              ...
         </body>
    </text>
    </tei.2>

In this example, the DTD contained in file tei2.dtd has been extended by entity declarations for each chapter of the work. The first two are system entities referring to the files in which the text of particular chapters is to be found; the third is a dummy, indicating that the text does not yet exist (alternatively, an entity with a null value could be used). In the document instance, the entity references &chap1; etc. will be resolved by the parser to give the required contents. The chapter files themselves will not, of course, contain any element, attribute list, or entity declarations; they contain only tagged text.
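
As a sketch of what such a chapter file might hold, chap1.txt could contain nothing but tagged text along the following lines. The element names and the content shown here are assumptions made for illustration only:

    <div1>
         <head>Chapter One</head>
         <p>The text of the first chapter begins here ...</p>
    </div1>

The null-valued alternative mentioned above for the unwritten chapter would be declared with an empty literal:

    <!ENTITY chap3 "">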

