








Charles F. Goldfarb, Information Management Consulting; Tel: +1 (408)867-5553; 13075 Paramount Drive, Saratoga CA 95070, USA. Email: Charles@SGMLsource.com.
Few things since HTML have aroused interest and controversy in the SGML community like XML.
When HTML first became important a few years ago, there was a lot of consternation in some quarters of the SGML world. Some resented the attention being given to HTML because it was "bad" SGML (it describes a rendered document, rather than the usual un-rendered abstraction). But this "bad" SGML proved the feasibility of SGML on a mass scale and helped to legitimize SGML within the mass market, even though that market has yet to adopt full SGML. Prior to HTML and the Web there appeared articles about how SGML was "old-fashioned" and a "throwback" to the mainframe days. We don't see such articles any more.
More to the point, HTML helped educate the Web community to the benefits of generalized markup. Now, when a site designer feels limited by HTML, the solution is more of SGML, rather than dedicated rendition representations, such as PDF or JavaScript. XML is the "more of SGML" solution that has been calculated to appeal to the Web user.
Although XML and HTML are both derivatives of full SGML, they are very different. HTML is a complete SGML application; that is, there is a DTD (several versions of it, actually) and prescribed processing for the elements, implemented by Web browsers. An HTML user is presented with a fixed "vocabulary" of element types, each of which will be rendered in a predictable way.
XML is different. It is an "application profile" -- a set of rules for constructing SGML applications. As a profile, it is more concerned with the syntax of SGML than the vocabulary. Users can define their own element types, DTDs, and the style sheets that govern their rendition. In other words, there can be an unlimited number of XML applications.
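To make the contrast with HTML's fixed vocabulary concrete, here is a minimal sketch in Python. The element type names (recipe, title, step) and the STYLE table are invented for illustration; the table is only a stand-in for a real style sheet, with one rendition rule per element type rather than per instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined vocabulary: none of these element types
# come from HTML or from any published DTD.
DOC = (
    "<recipe>"
    "<title>Toast</title>"
    "<step>Slice bread.</step>"
    "<step>Toast it.</step>"
    "</recipe>"
)

# Stand-in for a style sheet: one rendition rule per element TYPE.
# Every <step> is rendered the same way, wherever it occurs.
STYLE = {
    "title": lambda text: text.upper(),
    "step": lambda text: "  * " + text,
}

def render(text, style):
    """Apply the rule for each element type, in document order."""
    root = ET.fromstring(text)
    lines = []
    for element in root.iter():
        rule = style.get(element.tag)
        if rule is not None:  # element types without a rule are skipped
            lines.append(rule(element.text or ""))
    return lines

if __name__ == "__main__":
    for line in render(DOC, STYLE):
        print(line)
```

Changing the vocabulary means changing only DOC and STYLE; the processing code does not need to know the element types in advance, which is the sense in which the number of possible applications is unlimited.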
What distinguishes XML applications from the general run of SGML applications is primarily their use of a restricted subset of SGML functionality. The subset was chosen to meet several objectives important to the Web community: efficiency in a networked environment, simplicity of explanation, and ease of implementation of both formal processors and casual application scripts. All of these objectives involved difficult design trade-offs. The objectives partially contradict one another, and they contradict aspects of other important objectives of SGML use, such as long-term preservation of documents.
In meeting these objectives, the XML design responded to requirements that are meaningful for non-Web SGML applications as well. For example, one of the most controversial-seeming aspects of XML is its ability to process a document instance without respect to a DTD (a feature of "well-formed" XML documents). However, DTD-less processing was actually an original objective of SGML, which is why ISO 8879 defines both validating and non-validating parsers: the former were to be used when creating documents and the latter when formatting them. (Unfortunately, the inability to specify end-tags for empty elements effectively prevented DTD-less processing, but this problem is in the process of being remedied by the WebSGML Adaptations TC.)
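DTD-less processing can be illustrated with a small Python sketch that uses the standard library's non-validating parser. The sample documents and element type names (memo, to, pagebreak, body) are assumptions made up for this example. Note that the empty element is written as <pagebreak/>, so the parser can tell it is empty without consulting a declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: no DTD is supplied anywhere.
WELL_FORMED = (
    "<memo><to>Staff</to><pagebreak/>"
    "<body>Lunch at noon.</body></memo>"
)

# Same content, but <to> is never closed, so the tags do not nest properly.
NOT_WELL_FORMED = "<memo><to>Staff<body>Lunch at noon.</body></memo>"

def element_names(text):
    """Return the element type names in document order, using no DTD."""
    root = ET.fromstring(text)
    return [element.tag for element in root.iter()]

def is_well_formed(text):
    """Report whether the parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    print(element_names(WELL_FORMED))
    print(is_well_formed(WELL_FORMED), is_well_formed(NOT_WELL_FORMED))
```

The parser recovers the full element structure from the tags alone. It checks well-formedness only; whether a memo is allowed to contain a pagebreak is a question for a validating parser with a DTD.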
As long as the World Wide Web Consortium (W3C) -- the industry "standardizers" of XML -- allows it to remain a conforming profile of SGML, the SGML community should welcome XML. Although new users will first see XML as a way to "add your own tags to HTML", they will soon enough become aware of the advantages (for some applications) of rule-based, type-driven styles, as opposed to ad-hoc instance-based markup. At the same time, these design-oriented users will likely show us new ways to exploit structural information to achieve better graphic communication.
This technical and artistic synergy is already being mirrored in a political synergy. The Web community has a historic aversion to the official International Organization for Standardization (ISO) -- the government-sponsored standardizers of SGML -- because an ISO committee promulgated the "OSI" networking standards, which attempted to compete with those of the Internet. (Ironically, OSI had a related "Office Document Architecture" that competed unsuccessfully with SGML for many years.)
Because of XML, the W3C and the ISO are working together. The XML developers have presented their requirements to the ISO SGML committee (JTC1/SC18/WG8), which has evaluated them in the context of the needs of the SGML community as a whole. The result is the WebSGML Adaptations TC, the first installment of the SGML revision, which extends SGML to address the network communications needs of all SGML users while maintaining compatibility with existing conforming SGML documents.
The World Wide Web is an increasingly important communications medium, and may eventually become the most important electronic medium. No one expects it to eliminate printed documentation any more than television and radio eliminated books and magazines, but no commercial or enterprise publisher can afford to ignore it. XML has already begun to play a vital role in assuring that SGML can continue to support the full needs and multiple distribution environments of its user community, while still enabling optimal document processing for the World Wide Web.
Contact Robin Cover with corrections and updates, or to submit contributions to the ISUG online document database.
