International SGML/XML Users' Group

Markup syntax ...

SGML/XML syntax is the set of rules governing how an instance of these meta-languages can be constructed. SGML and XML have very similar rules, so both can be understood at the same time, with the differences explored later. XML/SGML instances consist of structured data contained within tags. In XML, tags are always denoted by the < and > characters. SGML can use other characters to denote tags, but in practice almost all applications use the < and > characters.

All SGML and XML instances should start with a document declaration. This tells computer applications that what they have encountered is an SGML/XML document and provides some other information such as what character set is used within the instance and what schema the document conforms to (conformity to a schema is optional in XML).
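
As a rough illustration of what a declaration carries, the sketch below uses Python's standard xml.dom.minidom module to read the version and character set from an XML declaration. The sample document and the variable names are invented for this example, and schema association is not shown.

```python
from xml.dom import minidom

# Hypothetical sample: an XML declaration followed by one empty element.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?><organization/>'

doc = minidom.parseString(SAMPLE)

# The parser records what the declaration said about the document.
declared_version = doc.version
declared_encoding = doc.encoding
root_name = doc.documentElement.tagName

print(declared_version, declared_encoding, root_name)
```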

All SGML/XML documents contain elements. Elements are named containers that can hold text, other elements or any combination of these. An element's opening and closing tags are wrapped in the delimiters < and >. The name of the tag should describe the element's purpose. Elements do not have to contain content and can be empty. Elements can have almost any name, but the name must start with an alphabetic character (XML also allows an underscore) and cannot contain spaces:

    <organization>
    <name>International SGML/XML Users' Group</name>
    <acronym>ISUG</acronym>
    </organization>

Attributes are metadata about the element. Attributes are contained within an element's opening tag, and their values are delimited by quotation marks:

    <organization acronym="ISUG">
    International SGML/XML Users' Group</organization>

The above examples illustrate a central issue in the design of XML schemas: whether data should be represented as an element or as an attribute often depends on how it is likely to be used, or on general rules of thumb applied by the designer of the schema. Some general principles that have been applied include: metadata should be attributes; information to be displayed to users should be in element content; instructions to processing software should be placed in attributes.
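
To make the trade-off concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, that reads the acronym from each of the two forms shown above. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The acronym stored as a child element (first example in the text).
AS_ELEMENT = """
<organization>
  <name>International SGML/XML Users' Group</name>
  <acronym>ISUG</acronym>
</organization>
"""

# The acronym stored as an attribute (second example in the text).
AS_ATTRIBUTE = """
<organization acronym="ISUG">International SGML/XML Users' Group</organization>
"""

element_form = ET.fromstring(AS_ELEMENT)
attribute_form = ET.fromstring(AS_ATTRIBUTE)

# Element content is reached by navigating to the child element...
acronym_from_element = element_form.findtext("acronym")
# ...whereas an attribute is looked up on the element itself.
acronym_from_attribute = attribute_form.get("acronym")

print(acronym_from_element, acronym_from_attribute)
```

Either form yields the same value to the application; the difference lies in how the data is addressed and in what else the schema can say about it.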

Entities are another syntactic component of SGML and XML. Entities are a mechanism for substituting content into a document when it is processed. They are used for two purposes: to insert text that can change (and which is available to the entity resolver of the parser), or to represent special characters so that they are not misinterpreted by the application. A frequently seen example of this is in the documentation of SGML and XML on the internet, where the < character would be taken to denote a new element and must be replaced by &lt; in the instance to escape it from the application's processing for display purposes (view the source of this document to see examples). Entity references start with an ampersand (&), cannot contain spaces, and are terminated by a semi-colon, e.g. &ISUG; could be expanded to 'International SGML/XML Users' Group'.
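
Both uses of entities can be sketched with Python's standard library. In this example the ISUG entity is declared in an internal DTD subset, which is one assumed way of making it available to the parser; the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical document: the internal DTD subset declares the ISUG entity.
DOC = """<!DOCTYPE organization [
  <!ENTITY ISUG "International SGML/XML Users' Group">
]>
<organization>&ISUG;</organization>"""

# The parser substitutes the entity's replacement text while parsing.
expanded = ET.fromstring(DOC).text

# Predefined entities keep markup characters from being read as tags.
escaped = escape("<name>")
unescaped = ET.fromstring("<p>1 &lt; 2</p>").text

print(expanded)
print(escaped)
print(unescaped)
```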

As mentioned above, SGML and XML are processed by software called parsers. Parsers look for elements, attributes and entities. Whitespace inside tags, such as the spacing between attributes, is not significant; an XML parser does pass whitespace in element content through to the application, which commonly chooses to ignore it. Most parsers output a normalized version of their input and report any errors found in the document. For documents with an associated schema, the schema is used to check that the document uses only valid tags and follows the rules of the schema.

But XML does not require that documents be associated with a schema. So what is checked under these circumstances? All XML documents must be, at least, well-formed. Well-formedness means that all tags must be paired: an element is either written as an empty tag, such as <acronym/>, or has an opening tag accompanied by a closing tag. Hierarchical integrity must also be preserved: an element opened inside another element must be closed before the element above it in the hierarchy is closed.
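
A well-formedness check can be sketched by handing a document to a non-validating parser and seeing whether it reports an error. The helper name is_well_formed is hypothetical, and Python's standard xml.etree.ElementTree module is assumed as the parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Every tag is paired or empty, and nesting is preserved.
paired = is_well_formed("<a><b>text</b><c/></a>")
# <b> is never closed before its parent <a> is closed.
unclosed = is_well_formed("<a><b>text</a>")
# <a> is closed while its child <b> is still open.
overlapping = is_well_formed("<a><b></a></b>")

print(paired, unclosed, overlapping)
```

No schema is involved here: the parser only checks that tags are paired and properly nested.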

Copyright 2002 ISUG