
A Technical Introduction to XML - Appendix

Appendix

Extended Backus-Naur Form (EBNF)

One of the most significant design improvements in XML is to make it easy to use with modern compiler tools. Part of this improvement involves making it possible to express the syntax of XML in Extended Backus-Naur Form (EBNF) [Section 1.4]. If you've never seen EBNF before, think of it this way:

  1. An EBNF grammar is a set of rules, called "productions".
  2. Every rule describes a specific fragment of syntax.
  3. An input is syntactically correct if repeated application of the rules can reduce it to a single, specific rule (the start rule) with no input left over.

Let's take a simple example that has nothing to do with XML (or with the real rules of English):

[1] Word ::= Consonant Vowel+ Consonant

[2] Consonant ::= [^aeiou]

[3] Vowel ::= [aeiou]

Rule 1 states that a word is a consonant followed by one or more vowels followed by another consonant. Rule 2 states that a consonant is any character other than a, e, i, o, or u. Rule 3 states that a vowel is any of the letters a, e, i, o, or u. (The exact notation, including the meaning of square brackets and other special symbols, is defined in the XML specification.)
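Because none of these three rules refers back to itself, this toy language is simple enough to express as a single regular expression. The Python sketch below is only an illustration; the names `CONSONANT`, `VOWEL`, `WORD`, and `is_word` are hypothetical and do not come from the XML specification. `re.fullmatch` plays the role of "no input left": the whole string, not just a prefix, must match Rule 1.

```python
import re

# Rules 2 and 3 become character classes. As in the EBNF, [^aeiou]
# matches ANY character except a, e, i, o, u (digits and punctuation too).
CONSONANT = r"[^aeiou]"
VOWEL = r"[aeiou]"

# Rule 1: Consonant Vowel+ Consonant
WORD = re.compile(CONSONANT + VOWEL + "+" + CONSONANT)


def is_word(s: str) -> bool:
    """True if the entire string matches Rule 1, with nothing left over."""
    return WORD.fullmatch(s) is not None


for candidate in ["red", "reed", "road", "xeaiioug", "rate"]:
    # prints True for the first four candidates and False for "rate"
    print(candidate, is_word(candidate))
```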

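Using the above example, is "red" a Word? Yes: "r" is a Consonant, "e" is a single Vowel, and "d" is a Consonant, so Rule 1 matches the whole string with no input left.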
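The same check can be sketched by hand as a recursive-descent recognizer, with one function per production. Each function takes a position in the string and returns the position just past what it matched, or `None` if it cannot match there. The function names are hypothetical. Because vowels and consonants never overlap, consuming vowels greedily for `Vowel+` is safe in this grammar; a grammar with overlapping alternatives would need backtracking.

```python
from typing import Optional

VOWELS = "aeiou"


def consonant(s: str, i: int) -> Optional[int]:
    """Rule 2: one character that is not a vowel. Returns the next position, or None."""
    if i < len(s) and s[i] not in VOWELS:
        return i + 1
    return None


def vowel(s: str, i: int) -> Optional[int]:
    """Rule 3: one of a, e, i, o, u. Returns the next position, or None."""
    if i < len(s) and s[i] in VOWELS:
        return i + 1
    return None


def word(s: str, i: int) -> Optional[int]:
    """Rule 1: Consonant Vowel+ Consonant."""
    pos = consonant(s, i)
    if pos is None:
        return None
    pos = vowel(s, pos)                       # Vowel+ needs at least one vowel...
    if pos is None:
        return None
    while (j := vowel(s, pos)) is not None:   # ...then takes any further vowels
        pos = j
    return consonant(s, pos)


def recognize_word(s: str) -> bool:
    """Accept only if Rule 1 matches and no input is left over."""
    return word(s, 0) == len(s)


print(recognize_word("red"))   # True
```

For "red", `consonant` consumes "r", the vowel step consumes "e", and the final `consonant` consumes "d", ending at position 3, which equals the length of the string.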

By the same analysis, "reed", "road", and "xeaiioug" are also words, but "rate" is not. There is no way to match Consonant Vowel Consonant Vowel using the EBNF above.

XML is defined by an EBNF grammar of about 80 rules. Although the rules are more complex, the same sort of analysis allows an XML parser to determine that <greeting>Hello World</greeting> is a syntactically correct XML document while <greeting]Wrong Bracket!</greeting> is not.
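Real XML parsers apply this kind of analysis to the full grammar. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which uses the expat parser), the two `greeting` examples can be checked like this; `is_well_formed` is a hypothetical helper name. This tests syntax (well-formedness) only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<greeting>Hello World</greeting>"))     # True
print(is_well_formed("<greeting]Wrong Bracket!</greeting>"))  # False
```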

In very general terms, that's all there is to it. You'll find all the details about EBNF in Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman or in any modern compiler textbook.

EBNF is not the most readable way to present syntax to people, but programs called parser generators can turn a grammar written in EBNF, or a close variant of it, into a parser automatically. That makes it an especially effective way to specify a language that will be parsed by a computer.
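To make the idea of treating a grammar as data concrete, here is a toy grammar interpreter. It is not a parser generator: instead of emitting parser source code, it reads the three productions from a dictionary and decides membership directly. The encoding (tuples for character classes, lists of `(rule name, suffix)` pairs for sequences) and all names are assumptions made for this illustration; it supports only character classes, sequences, and the `+` suffix.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding of EBNF as data:
#   a character class is a tuple (chars, negated): ("aeiou", True) means [^aeiou]
#   a sequence is a list of (rule name, suffix) pairs, where suffix is "" or "+"
CharClass = Tuple[str, bool]
SeqRule = List[Tuple[str, str]]
Rule = Union[CharClass, SeqRule]

GRAMMAR: Dict[str, Rule] = {
    "Word": [("Consonant", ""), ("Vowel", "+"), ("Consonant", "")],  # [1]
    "Consonant": ("aeiou", True),                                    # [2]
    "Vowel": ("aeiou", False),                                       # [3]
}


def end_positions(grammar: Dict[str, Rule], name: str, s: str, i: int) -> Set[int]:
    """Every position where rule `name` can finish if it starts at position i."""
    rule = grammar[name]
    if isinstance(rule, tuple):                      # character class
        chars, negated = rule
        if i < len(s) and (s[i] in chars) != negated:
            return {i + 1}
        return set()
    positions = {i}                                  # sequence: match items in order
    for item, suffix in rule:
        reached: Set[int] = set()
        for p in positions:
            reached |= end_positions(grammar, item, s, p)
        if suffix == "+":                            # repeat while new positions appear
            frontier = set(reached)
            while frontier:
                more: Set[int] = set()
                for p in frontier:
                    more |= end_positions(grammar, item, s, p)
                frontier = more - reached
                reached |= more
        positions = reached
    return positions


def accepts(grammar: Dict[str, Rule], start: str, s: str) -> bool:
    """The input is in the language if the start rule can consume all of it."""
    return len(s) in end_positions(grammar, start, s, 0)


for w in ["red", "reed", "road", "xeaiioug", "rate"]:
    print(w, accepts(GRAMMAR, "Word", w))
```

Tracking every reachable end position, rather than committing to one, gives this sketch backtracking for free. A left-recursive rule (one whose sequence starts with itself) would recurse without consuming input and raise a `RecursionError`, which is one reason real tools transform or analyze the grammar first.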


Contact Robin Cover with corrections and updates, or to submit contributions to the ISUG online document database.

Copyright © 1998 International SGML Users' Group