An SGML document consists of an SGML prolog and a document instance. The prolog contains an SGML declaration (described below) and a document type definition, which contains element and entity declarations such as those described above. Different software systems may provide different ways of associating the document instance with the prolog; in some cases, for example, the prolog may be `hard-wired' into the software used, so that it is completely invisible to the user.
At its simplest, the document type definition consists of a base document type definition (and possibly one or more concurrent document type definitions) prefixed to the document instance. For example:
<!DOCTYPE my.dtd [
<!-- all declarations for MY.DTD go here -->
...
]>
<my.dtd>
This is an instance of a MY.DTD type document
</my.dtd>
More usually, the document type definition will be held in a separate file and invoked by reference, as follows:
<!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
]>
<tei.2>
This is an instance of an unmodified TEI type document
</tei.2>
Here, the text of the TEI.2 document type definition is not given
explicitly, but the SGML processor is told that it may be read from the file
with the system identifier given in quotation marks. The square brackets may
still be supplied, as in this example, even though they enclose nothing. The part enclosed by square brackets is known as the document type declaration subset or `DTD subset'. Its purpose is to specify any modification to be made to the DTD being invoked, thus:
<!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
<!ENTITY tla "Three Letter Acronym">
<!ELEMENT my.tag - - (#PCDATA)>
<!-- any other special-purpose declarations or
re-definitions go in here -->
]>
<tei.2>
This is an instance of a modified TEI.2 type document,
which may contain <my.tag>my special tags</my.tag> and
references to my usual entities such as &tla;.
</tei.2>
In this case, the document type definition in force includes first the
contents of the DTD subset, and then the contents of the file specified after
the keyword SYSTEM. The order is important, because in SGML only
the first declaration of an entity counts. In the above example, therefore, the
declaration of the entity tla in the DTD subset would take
precedence over any declaration of the same entity in the file
tei2.dtd. It is perfectly legal SGML for entities to be
declared twice; this is the usual method for allowing user modification of SGML
DTDs. (Elements, by contrast, may not be declared more than once; if a
declaration for <my.tag> were contained in file tei2.dtd,
the SGML parser would signal an error.) Combining and extending the TEI
document type definitions is discussed further in chapter
3: Structure of the TEI Document Type Definition.
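The precedence rule described above can be sketched by listing the declarations in the order the parser reads them. The second declaration below is purely hypothetical: it assumes, for the sake of illustration only, that the file tei2.dtd happened to contain its own declaration of tla.

```sgml
<!-- Read first: the DTD subset inside the document -->
<!ENTITY tla "Three Letter Acronym">

<!-- Read second: a hypothetical declaration inside tei2.dtd -->
<!-- It is ignored, because tla has already been declared -->
<!ENTITY tla "Some Other Expansion">
```

Under that assumption, a reference to &tla; in the document instance would still expand to "Three Letter Acronym".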
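The DTD subset may also be used to declare entities for the separate files that make up a document, as in the following example:
<!DOCTYPE tei.2 SYSTEM "tei2.dtd" [
<!ENTITY chap1 SYSTEM "chap1.txt">
<!ENTITY chap2 SYSTEM "chap2.txt">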
<!ENTITY chap3 "-- not yet written --">
]>
<tei.2>
<teiHeader> ... </teiHeader>
<text>
<front> ... </front>
<body>
&chap1;
&chap2;
&chap3;
...
</body>
</text>
</tei.2>
In this example, the DTD contained in file tei2.dtd has been extended by entity declarations for each chapter of the work. The first two are system entities referring to the files in which the text of particular chapters is to be found; the third is a dummy, indicating that the text does not yet exist (alternatively, an entity with a null value could be used). In the document instance, the entity references &chap1; etc. will be resolved by the parser to give the required contents. The chapter files themselves will not, of course, contain any element, attribute list, or entity declarations, just tagged text.
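As a rough illustration of the last point, a chapter file such as chap1.txt might look something like the following. The element names, attribute, and wording are assumptions made for this sketch and are not taken from any actual file:

```sgml
<div1 n="1">
<head>A hypothetical first chapter</head>
<p>The tagged text of the chapter goes here ...</p>
</div1>
```

The null-valued alternative mentioned above for the unwritten chapter could be declared in the DTD subset as <!ENTITY chap3 "">, in which case the reference &chap3; would contribute nothing to the document.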