XML crib sheet
- Information Age, May 2004
XML is the 'lingua franca' for e-documents. It provides a standard mark-up language for defining the structure of documents, thereby enabling their transmission, validation and interpretation between applications and between organisations.
What's all the fuss about?
The sudden boom in the web during the 1990s brought home to many vendors and IT organisations the value of a document format that virtually any application could read.
The web's HTML document format is text-based and marks out content using a fixed set of tags determined by the World Wide Web Consortium (W3C). As a result, any program that understands those tags can parse an HTML document.
This set of tags, while useful for web pages, is too limited to serve as a basis for all document formats. SGML, a more sophisticated and extensible system than HTML, already exists, but it is too difficult to implement just for a web browser. Full SGML systems solve large, complex problems that justify their expense; viewing structured documents sent over the web rarely carries such justification.
A halfway house between HTML and SGML was needed, one that would bring the flexibility and extensibility of SGML to the web and to general applications. Hence XML was developed.
XML is a mark-up language for documents containing both content (words, pictures and so on) and some indication of what role that content plays. Content in a section heading, for example, means something different from content in a footnote, a figure caption or a database table. By defining sets of tags appropriate to the application, XML can store pretty much any piece of content in a way that any XML-compatible application can read.
Virtually every new file or messaging format developed in the last two years has been based on XML. As more and more mainstream applications begin to use it, and as the tools to develop and deploy it become ubiquitous, XML is going to be as prevalent in the IT world as Windows or even the text file.
The shape of XML
To anyone familiar with HTML or the coding of a web page, XML looks both familiar and different. After a series of machine- and human-readable statements at the beginning of the file (the XML declaration and, optionally, a document type declaration), the content follows, with each area marked up with an opening tag (eg <tag>) and a closing tag (eg </tag>). But while HTML has preset tags, such as <h1> ... </h1> and <p> ... </p>, which define a headline and a paragraph respectively, XML allows arbitrary tags: it would be perfectly possible to have one XML document use <h1> ... </h1> and <p> ... </p> to mark areas of content as headlines or paragraphs, and to have another use <headline> ... </headline> and <paragraph> ... </paragraph>.
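As a rough illustration of that point, the Python sketch below feeds two small documents to the same parser, the standard library's xml.etree.ElementTree. Both documents, their tag names and the helper list_elements are invented for this example; the only thing being shown is that a generic XML parser does not care which tag vocabulary a document uses.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same content under different tag names.
DOC_HTML_STYLE = """<?xml version="1.0"?>
<article>
  <h1>XML crib sheet</h1>
  <p>XML is a mark-up language for documents.</p>
</article>"""

DOC_CUSTOM_STYLE = """<?xml version="1.0"?>
<article>
  <headline>XML crib sheet</headline>
  <paragraph>XML is a mark-up language for documents.</paragraph>
</article>"""


def list_elements(xml_text):
    """Return (tag, text) pairs for each direct child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


html_style = list_elements(DOC_HTML_STYLE)
custom_style = list_elements(DOC_CUSTOM_STYLE)

for tag, text in html_style + custom_style:
    print(f"{tag}: {text}")
```

The parser reads both documents without complaint. Deciding that <h1> and <headline> play the same role is left to whatever application, DTD or schema sits on top.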
The set of permitted tags, and the way they may be nested, is defined elsewhere in a Document Type Definition (DTD) or schema file.
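The idea of keeping the rules separate from the document can be sketched in a few lines of Python. This is not a DTD or schema validator: the dictionary ALLOWED_CHILDREN is a hypothetical stand-in for the rules such a file would carry, the function structure_errors is invented for the example, and it checks only which tags may appear inside which, ignoring order and repetition.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the rules a DTD or schema would hold.
# A real DTD might say: <!ELEMENT article (headline, paragraph+)>
ALLOWED_CHILDREN = {
    "article": {"headline", "paragraph"},
    "headline": set(),
    "paragraph": set(),
}


def structure_errors(xml_text, allowed=ALLOWED_CHILDREN):
    """Return messages for elements that the rules do not permit."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag not in allowed:
        errors.append(f"unknown root element <{root.tag}>")
    # Walk every element and check each of its direct children.
    for parent in root.iter():
        permitted = allowed.get(parent.tag, set())
        for child in parent:
            if child.tag not in permitted:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


good = "<article><headline>Hi</headline><paragraph>Text</paragraph></article>"
bad = "<article><headline>Hi</headline><footnote>Aside</footnote></article>"

print(structure_errors(good))
print(structure_errors(bad))
```

Both documents are well-formed XML, so any parser will read them. Only the second breaks the agreed rules, which is the kind of distinction a DTD or schema exists to make.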
Easing the transfer of documents
Using XML as the basis of a document format means that there is a whole range of tools and software that can already parse it, making it unnecessary for the end user to obtain a particular program in order to read a document.
Even the doyen of the proprietary file format, Microsoft, is keen to embrace XML. Office 2003 enables Microsoft Word and its stablemates to save and read documents in XML-based formats. Microsoft has also published the schemas it has used for these formats (such as 'SpreadsheetML' and 'WordprocessingML'), making it possible for other programs to understand, not just read, files saved in these formats and to save files in them as well.
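To give a flavour of what this makes possible, the Python sketch below reads cell values from a small workbook written in the style of the Office 2003 spreadsheet format. The fragment is heavily simplified and written from memory of that format rather than taken from the published schema, so the element names and the namespace URI should be treated as assumptions; the sheet contents and the function read_rows are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Office 2003 spreadsheet files; check the published schema.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"

# Simplified, hypothetical workbook fragment.
WORKBOOK = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sales">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Region</Data></Cell>
        <Cell><Data ss:Type="String">Units</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">North</Data></Cell>
        <Cell><Data ss:Type="Number">42</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def read_rows(xml_text):
    """Return the workbook's rows as lists of cell values."""
    ns = {"ss": SS_NS}
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iterfind(".//ss:Row", ns):
        cells = []
        for data in row.iterfind("ss:Cell/ss:Data", ns):
            value = data.text
            # Convert cells marked as numbers; leave everything else as text.
            if data.get(f"{{{SS_NS}}}Type") == "Number":
                value = float(value)
            cells.append(value)
        rows.append(cells)
    return rows


rows = read_rows(WORKBOOK)
print(rows)
```

Because the type of each cell is stated in the mark-up, the program can treat 42 as a number rather than as two characters, which is the difference between reading a file and understanding it.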
