One of the aspects of using SGML or XML that many people new to the technology find off-putting is the up-front cost involved in determining how you’re going to use it. If you’d decided to use a different technology – say, standard word processing, desktop publishing, spreadsheet or even a simple database application – the set-up costs (creating templates, database designs, or whatever) would be considerably less.
If you decide to use SGML, you have no choice in the matter: you must decide what particular application you are going to use and – unless you’re lucky enough to be able to use an existing application, such as HTML or DocBook – this is going to take time and cost you money.
If you decide to use XML, you could decide not to decide, and just make up your XML application as you go along. But you’re not likely to find this an easy thing to do with any consistency, so you’ll probably end up with data that doesn’t suit the purposes that you had in mind.
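To make the consistency problem concrete, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. The two records and their element names are invented for illustration; they stand for the same kind of information marked up on two different occasions without an agreed application.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records, written at different times with no agreed model.
record_a = "<book><title>XML Basics</title><author>A. Writer</author></book>"
record_b = "<book><name>SGML Notes</name><writer>B. Author</writer></book>"


def child_tags(xml_text):
    """Return the set of child element names directly under the root."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return {child.tag for child in root}


tags_a = child_tags(record_a)
tags_b = child_tags(record_b)

# Both records are well-formed XML, yet they share no child element names,
# so software written for one will not find the data in the other.
print(sorted(tags_a))           # ['author', 'title']
print(sorted(tags_b))           # ['name', 'writer']
print(sorted(tags_a & tags_b))  # []
```

An XML parser accepts both records without complaint. The inconsistency only shows up later, when you try to process them together.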
The purpose of SGML and XML is to enable the structure of information to be represented in an open, system-independent form. But in order for either SGML or XML to be used in this way, decisions need to be taken as to the structure of the information in question and how best to represent it using SGML or XML markup.
How are these decisions on structure and representation to be made?
In fact, regardless of the circumstances, these decisions are always made, whether formally or informally. Even if you make up your XML markup as you go along, each time you enter markup you’re having to make a decision as to the markup that seems appropriate at that point in the process.
Most SGML and XML users end up putting time and effort up-front into deciding what markup they’re going to use and how, simply because this makes the most sense, even if in the case of XML it is not strictly required. The pay-off for doing this properly is twofold. You get a clear definition of what markup you’re going to use and how, and you can communicate that definition in a precise, machine-readable form to anyone else who needs to know, whether your colleagues, your business partners or the world in general.
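As a rough illustration of what ‘machine-readable’ buys you, the Python sketch below checks documents against a very small content model. In practice the definition would normally be a DTD or schema (a declaration such as `<!ELEMENT book (title, author)>`); the dictionary used here is only a stand-in for that, and the names MODEL and conforms, like the element names, are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element name maps to the child elements
# it must contain, in order. An empty list means no child elements.
MODEL = {
    "book": ["title", "author"],
    "title": [],
    "author": [],
}


def conforms(element, model=MODEL):
    """Return True if the element and all its descendants match the model."""
    if element.tag not in model:
        return False
    children = [child.tag for child in element]
    if children != model[element.tag]:
        return False
    return all(conforms(child, model) for child in element)


good = ET.fromstring(
    "<book><title>XML Basics</title><author>A. Writer</author></book>"
)
bad = ET.fromstring("<book><name>SGML Notes</name></book>")

print(conforms(good))  # True
print(conforms(bad))   # False
```

Because the model is data rather than a prose description, anyone you send it to can run the same check and get the same answer.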
Creating, selecting or modifying an information model
What you probably need to do is to create a model of the information that you intend to represent in a structured form using SGML or XML markup. ‘You’ may mean you individually, your department, your company, your company in collaboration with business partners, your business sector standards body or some other body that represents all those who are going to use this particular information model.
If you’re lucky, someone else has already developed an information model to meet the same needs and, assuming you’re not in direct competition with them, you may well be able to take advantage of it. However, even in such cases, it is likely that you’re going to have to spend some time checking whether or not this model really does exactly meet your own specific needs, and if not, making the necessary modifications.
What is more likely is that you’ll find some existing information models that in some respects meet some of your needs. You’ll find that you can re-use parts of these information models, and the good news is that this will certainly save you time and effort.
Whether you’re modifying an existing model or creating a new one from scratch, you’ll probably start off by creating a fairly informal model of your information. This could be as simple as some rough diagrams on the back of an envelope. You’ll need to share these initial ideas with others who are going to have to use the model that you eventually create, so it’s probably helpful to create one in a readable, electronic form at an early stage.