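One of the most significant design improvements in XML is that it is easy to use with modern compiler tools. Part of this improvement is that the syntax of XML can be expressed in Extended Backus-Naur Form (EBNF) [Section 1.4]. If you've never seen EBNF before, think of it this way: EBNF is a set of rules, called productions, each of which describes a specific fragment of syntax. A string is syntactically correct if, by repeatedly applying the rules, it can be reduced to a single starting rule with no input left over.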
Let's take a simple example that has nothing to do with XML (or with the real rules of English):
[1] Word ::= Consonant Vowel+ Consonant
[2] Consonant ::= [^aeiou]
[3] Vowel ::= [aeiou]
Rule 1 states that a word is a consonant followed by one or more vowels followed by another consonant. Rule 2 states that a consonant is any letter other than a, e, i, o, or u. Rule 3 states that a vowel is any of the letters a, e, i, o, or u. (The exact notation, including the meaning of square brackets and other special symbols, is laid out in the XML specification.)
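To make the rules concrete, here is a minimal hand-written recognizer for the three productions, sketched in Python. The function names (`is_vowel`, `is_consonant`, `parse_word`) are illustrative and not taken from any library, and `is_consonant` assumes the prose reading of Rule 2 (a lowercase letter other than a, e, i, o, or u) rather than the literal "any character" meaning of `[^aeiou]`. Each rule becomes one check, the input is consumed left to right, and a string counts as a Word only if the whole input is used up.

```python
VOWELS = frozenset("aeiou")


def is_vowel(ch: str) -> bool:
    # Rule 3: Vowel ::= [aeiou]
    return ch in VOWELS


def is_consonant(ch: str) -> bool:
    # Rule 2: Consonant ::= [^aeiou]
    # Assumption: restricted to lowercase ASCII letters, following the prose
    # "any letter other than a, e, i, o, or u".
    return ch.isascii() and ch.isalpha() and ch.islower() and ch not in VOWELS


def parse_word(s: str) -> bool:
    # Rule 1: Word ::= Consonant Vowel+ Consonant
    pos = 0

    # Consonant
    if pos >= len(s) or not is_consonant(s[pos]):
        return False
    pos += 1

    # Vowel+ : one or more vowels
    start = pos
    while pos < len(s) and is_vowel(s[pos]):
        pos += 1
    if pos == start:
        return False

    # Consonant
    if pos >= len(s) or not is_consonant(s[pos]):
        return False
    pos += 1

    # The match succeeds only if no input is left over
    return pos == len(s)


for w in ["red", "reed", "road", "xeaiioug", "rate"]:
    print(w, parse_word(w))  # True for the first four, False for "rate"
```

Consuming vowels greedily is safe here because no character is both a Vowel and a Consonant, so the Vowel+ step can never "steal" the final consonant.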
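Using the above example, is "red" a Word? Yes. "r" is a Consonant by Rule 2, "e" is a Vowel by Rule 3, and "d" is a Consonant by Rule 2, so "red" is the sequence Consonant Vowel Consonant. Because Vowel+ matches one or more vowels, that sequence matches Consonant Vowel+ Consonant, which is a Word by Rule 1, with no input left over.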
By the same analysis, "reed", "road", and "xeaiioug" are also words, but "rate" is not: it has the form Consonant Vowel Consonant Vowel, and there is no way to match that sequence using the EBNF above.
XML is defined by an EBNF grammar of about 80 rules. Although the rules are more complex, the same sort of analysis allows an XML parser to determine that <greeting>Hello World</greeting> is a syntactically correct XML document while <greeting]Wrong Bracket!</greeting> is not.
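As a hedged illustration of that last point, Python's standard-library `xml.etree.ElementTree` module (an Expat-based parser, used here only as a convenient example of a conforming XML parser) accepts the first `greeting` document and rejects the second with a `ParseError`. The helper name `is_well_formed` is hypothetical, and it checks syntax (well-formedness) only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    # Try to parse the string as an XML document; a syntax error
    # raises ET.ParseError instead of returning an element.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<greeting>Hello World</greeting>"))     # True
print(is_well_formed("<greeting]Wrong Bracket!</greeting>"))  # False
```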
In very general terms, that's all there is to it. You'll find the details about EBNF in Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman, or in any modern compiler textbook.
While EBNF isn't the easiest notation for humans to read, there are programs (parser generators) that can automatically turn EBNF into a parser. This makes it a particularly efficient way to represent the syntax of a language that will be parsed by a computer.
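To give a feel for how a program can turn EBNF into a recognizer automatically, here is a deliberately tiny sketch in Python that translates the three Word rules into a regular expression. It is not how real parser generators work: they handle recursive grammars such as XML's, which a single regular expression cannot express, whereas this toy works only because the Word rules never refer back to themselves. `GRAMMAR` and `to_regex` are hypothetical names, and only the notation used in the example (rule names, `[...]` classes, and a trailing `+`, `*`, or `?`) is supported. `[^aeiou]` is copied through literally, so here a Consonant is any character other than the five vowels, not only a letter.

```python
import re
from typing import Dict

# The example grammar, one production per entry (hypothetical representation).
GRAMMAR: Dict[str, str] = {
    "Word": "Consonant Vowel+ Consonant",
    "Consonant": "[^aeiou]",
    "Vowel": "[aeiou]",
}


def to_regex(grammar: Dict[str, str], symbol: str) -> str:
    """Expand `symbol` into an equivalent regular expression.

    Assumes the grammar is non-recursive; a self-referencing rule
    would exhaust Python's recursion limit.
    """
    parts = []
    for token in grammar[symbol].split():
        # Separate a trailing repetition operator (+, *, ?) if present.
        op = token[-1] if token[-1] in "+*?" else ""
        body = token[:-1] if op else token
        if body.startswith("["):
            # Character class: same meaning in this EBNF subset and in regex.
            piece = body
        else:
            # Rule name: expand its production in place, grouped so the
            # operator applies to the whole expansion.
            piece = "(?:" + to_regex(grammar, body) + ")"
        parts.append(piece + op)
    return "".join(parts)


WORD_RE = re.compile(to_regex(GRAMMAR, "Word"))
print(WORD_RE.pattern)  # (?:[^aeiou])(?:[aeiou])+(?:[^aeiou])

for w in ["red", "reed", "road", "xeaiioug", "rate"]:
    print(w, WORD_RE.fullmatch(w) is not None)
```

Using `fullmatch` rather than `search` mirrors the EBNF requirement that the whole input be consumed.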
Contact Robin Cover with corrections and updates, or to submit contributions to the ISUG online document database.
