XML Schemas In Action
Jun. 3, 2001
XML Schemas truly entered the picture when the World Wide Web Consortium (W3C), the Web's standards body, advanced the specification to Candidate Recommendation status in late October 2000. You may ask, "Who cares?" Any developer with an interest in XML should. Candidate Recommendation status signals that XML Schemas are here to stay, so it's time to start learning their ins and outs.
A great aspect of the XML Schema specification is its syntax. Schemas are written in standard XML, so your existing XML knowledge carries over. The days of learning the grimy details of document type definitions (DTDs) are ending. The major drawback of a DTD is its arcane syntax, which derives from Standard Generalized Markup Language (SGML). SGML was XML's precursor, so it made sense that DTDs were adopted quickly, but problems arose when XML-savvy developers encountered the DTD syntax. I won't delve into DTD basics; rather, I'll discuss a recent opportunity I had to put the XML Schema standard to work.
The Data
This project required retrieving commodity data from the Chicago Board of Trade (CBOT). The data is available through a variety of services offered by companies other than the CBOT, so you must deal with a middleman to get it. After an extensive search, we selected a Chicago-based vendor, FutureSource, mainly because it supported XML: its data is delivered as an XML stream in response to a standard Web (HTTP) request.
The basic flow of information is as follows: a request is sent to the FutureSource server as a Web address (URL) carrying a variable number of parameters, and an XML document is returned. One oddity was that the vendor didn't provide any DTD or schema. Its sales staff explained that the data is always consistent: because they control it, there's no need to check its validity.
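As a rough illustration of that request/response flow, here is a minimal Java sketch. It assumes the parameters travel as an ordinary HTTP GET query string. The endpoint http://quotes.example.com/xml and the parameter names symbol and fields are hypothetical placeholders, since the vendor's actual URL and parameters aren't given here; only standard JDK classes (URLEncoder, DocumentBuilder) are used.

```java
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class QuoteRequest {
    /** Appends each parameter to the base URL as name=value, URL-encoding names and values. */
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?'; // first parameter starts the query string
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&'; // later parameters are joined with '&'
        }
        return url.toString();
    }

    /** Sends an HTTP GET to the URL and parses the XML document that comes back. */
    static Document fetch(String url) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return builder.parse(in);
        }
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e); // cannot happen
        }
    }
}
```

Because the number of parameters varies from request to request, buildUrl takes a map rather than fixed arguments; passing a LinkedHashMap keeps the parameters in insertion order.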
My client disagreed with this explanation, and I can't say I blame them. They requested an accompanying schema for the XML data, mainly because the data would be massaged and used throughout the organization, so it was better to pin down expectations now rather than later. One major drawback was that the vendor controls the XML and can change its structure at any time; any change has an immediate impact on our design. The client was not fazed, so we moved forward with the project by selecting a standard set of tools.
The Technology
The following list shows the technologies used:
- Java JDK 1.2
- Xerces XML parser 1.3
- Xalan XSLT processor 2.0
- Text editor for editing various files
Xerces 1.3 was chosen because it supports the XML Schema standard without code modifications: parsing is handled the same way whether the document uses a DTD, an XML Schema, or neither, because the schema or DTD is referenced from within the XML document itself. Now let's examine the contents of the XML we expect to receive via an HTTP request.
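The Xerces 1.3 calls themselves aren't shown here, so the following is only a sketch of the same idea using the javax.xml.validation API that ships with current JDKs (it postdates this project). Calling SchemaFactory.newSchema() with no arguments produces a Schema that validates each document against whatever schema its own xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute names, so the Java code never hardcodes a schema file. The class and method names are hypothetical. Current parsers expect the final 2001 namespace URIs, such as http://www.w3.org/2001/XMLSchema-instance, rather than the draft URIs in circulation around the Candidate Recommendation.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class HintValidation {
    /**
     * Returns true if the file validates against the schema it names in its own
     * xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute.
     */
    static boolean isValid(File xmlFile) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // No-argument newSchema(): schemas are located through the instance document's hints.
            Schema schema = factory.newSchema();
            Validator validator = schema.newValidator();
            // StreamSource(File) records the file's URI, so a relative hint such as
            // "quotes.xsd" resolves against the document's own directory.
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation failed: " + e.getMessage());
            return false;
        }
    }
}
```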
The XML
Listing 1 shows the XML returned for one commodity. This is the standard set of data we expect, so every response must be formatted this way.
Explanation of Listing 1
- The root element of the XML document is opened.
- The XML Schema instance namespace (prefix xsi) is declared, using the URI the W3C defines for schema instances. This declaration must appear on the root element.
- An xsi attribute specifies the location of the XML Schema file, here a schema in the same directory as the XML file. This attribute must also appear on the root element.
- The quote element's opening tag appears, carrying three attributes.
- The type element appears inside the quote element.
- The root element is closed.
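Once a response like Listing 1 arrives, reading it is ordinary DOM work. The sketch below assumes a document shaped as just described: a quotes root carrying the xsi attributes, and quote elements with status and request attributes and a type child. The class name and the sample values used with it (request codes, a status of "ok", commodity names) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QuoteReader {
    /** Returns one "request/status/type" string per quote element, in document order. */
    static List<String> summarize(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // treat xmlns:xsi and xsi:* as namespaced, not plain names
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> rows = new ArrayList<>();
        NodeList quotes = doc.getElementsByTagName("quote");
        for (int i = 0; i < quotes.getLength(); i++) {
            Element quote = (Element) quotes.item(i);
            // Assumes the document was already validated, so every quote has a type child.
            String type = quote.getElementsByTagName("type").item(0).getTextContent();
            rows.add(quote.getAttribute("request") + "/" + quote.getAttribute("status") + "/" + type);
        }
        return rows;
    }
}
```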
The XML isn't complicated, but it comes with requirements; for example, the elements within a quote element must always arrive in the same order. We'll now discuss the XML Schema developed for our XML.
The XML Schema
Developing the schema wasn't complicated, but it was a challenge (that is, a learning curve), since it was my introduction to the technology. Actually, it was an indoctrination. Listing 2 shows the schema we developed.
Explanation of Listing 2
- The XML declaration opens the document.
- The XML Schema namespace is declared.
- The root element, quotes, is declared.
- The quotes element has a complex type (complexType) because it contains elements as well as attributes. The alternative, simpleType, is for elements that hold only text and carry no attributes.
- The quote element is declared and may occur zero or more times.
- The quote element has a complex type as well.
- The child elements of quote must appear in the designated order; the sequence element enforces this.
- The elements contained within the quote element are declared. The type element contains string data.
- The quote element's status attribute is declared and is required.
- The quote element's request attribute is declared and is required.
- The quote element's declaration is closed.
- The quotes element's request attribute is declared and is required.
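To make these rules concrete, here is a hypothetical reconstruction of such a schema, checked with the standard javax.xml.validation API. It follows the description: quotes with a required request attribute; zero or more quote elements, each with required status and request attributes; children in a fixed sequence; and type as a string. The second child element, last, is invented purely to show that xs:sequence rejects out-of-order children, and the schema uses the final 2001 XML Schema namespace that current parsers expect.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class QuoteSchema {
    // Hypothetical reconstruction of the rules described for Listing 2.
    // The "last" element is invented so that ordering can be demonstrated.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='quotes'>"
      + "  <xs:complexType>"
      + "   <xs:sequence>"
      + "    <xs:element name='quote' minOccurs='0' maxOccurs='unbounded'>"
      + "     <xs:complexType>"
      + "      <xs:sequence>" // children must appear in exactly this order
      + "       <xs:element name='type' type='xs:string'/>"
      + "       <xs:element name='last' type='xs:string'/>"
      + "      </xs:sequence>"
      + "      <xs:attribute name='status' type='xs:string' use='required'/>"
      + "      <xs:attribute name='request' type='xs:string' use='required'/>"
      + "     </xs:complexType>"
      + "    </xs:element>"
      + "   </xs:sequence>"
      + "   <xs:attribute name='request' type='xs:string' use='required'/>"
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, otherwise a description of the first error. */
    static String check(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage() != null ? e.getMessage() : e.toString();
        }
    }
}
```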
The great aspect of the XML Schema in Listing 2 is that it's XML. The schema follows XML construction rules just like any other XML document: it must be well formed, with every element properly opened and closed. The other advantage is that I didn't have to learn a separate syntax, as I would have with DTDs. I was already familiar with XML, which made the learning process much friendlier and more straightforward.
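Because a schema is just XML, an ordinary non-validating DOM parser can read it like any other document. The sketch below (class and method names are hypothetical) parses a schema with DocumentBuilder and lists its xs:element declarations in document order; a schema with a missing closing tag is rejected as not well formed, exactly as any other XML document would be.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaIsXml {
    static final String XS_NS = "http://www.w3.org/2001/XMLSchema";

    /** Parses a schema with a plain, non-validating DOM parser and returns its element names. */
    static List<String> declaredElements(String xsd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        // A malformed schema never gets past this line: parse() throws SAXParseException.
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
        NodeList decls = doc.getElementsByTagNameNS(XS_NS, "element");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < decls.getLength(); i++) {
            names.add(((Element) decls.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```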
If you don't believe DTD syntax was a problem, examine the DTD snippet in Listing 3, a portion of a DTD developed for our XML. Compare it with the XML Schema and you'll see the sheer elegance of the XML Schema specification.
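For a side-by-side feel, here is a hypothetical DTD expressing rules similar to those described for the schema. It is not the article's Listing 3: it is an illustration embedded as an internal DTD subset and checked by a validating JAXP DocumentBuilder, and the last element is invented to show ordering. Note that the DTD's declarations are not XML, and a DTD can only say that type holds #PCDATA, with no notion of data types for element content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class QuoteDtd {
    // Hypothetical DTD, prepended to each document as an internal subset.
    static final String DTD =
        "<!DOCTYPE quotes ["
      + " <!ELEMENT quotes (quote*)>"
      + " <!ATTLIST quotes request CDATA #REQUIRED>"
      + " <!ELEMENT quote (type, last)>"
      + " <!ATTLIST quote status CDATA #REQUIRED request CDATA #REQUIRED>"
      + " <!ELEMENT type (#PCDATA)>"
      + " <!ELEMENT last (#PCDATA)>"
      + "]>";

    /** Returns true if the document body validates against the DTD above. */
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Validity errors are only reported to the handler, so make them fatal here.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```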
Only the Beginning
The XML Schema work was only one portion of the project, so this article doesn't show its full complexity. The main point is XML Schema's ease of use; in addition, it lets you standardize the structure of an XML document. The power of XML Schemas extends well beyond the scope of this article: for example, you can specify the data type and format of an element's content, and much more. I suggest a thorough perusal of the XML Schema documentation on the W3C Web site.
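As one example of specifying an element's type and format, the hypothetical schema below (held in a Java string and checked with javax.xml.validation) restricts a price element, last, to xs:decimal and limits the type element to one to three uppercase letters with an xs:pattern facet. Both the last element and the pattern are invented for illustration; they are not part of the FutureSource data.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TypedQuote {
    // Hypothetical schema: "type" must be 1-3 uppercase letters, "last" must be a decimal.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='quote'>"
      + "  <xs:complexType>"
      + "   <xs:sequence>"
      + "    <xs:element name='type'>"
      + "     <xs:simpleType>"
      + "      <xs:restriction base='xs:string'>"
      + "       <xs:pattern value='[A-Z]{1,3}'/>" // XSD patterns match the whole value
      + "      </xs:restriction>"
      + "     </xs:simpleType>"
      + "    </xs:element>"
      + "    <xs:element name='last' type='xs:decimal'/>"
      + "   </xs:sequence>"
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    /** Returns true if the document satisfies the type and format rules above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```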
Resources
- Chicago Board of Trade: www.cbot.com
- DTD: www.oasis-open.org/docbook/
- FutureSource: www.futuresource.com
- SGML: www.oasis-open.org/cover/sgml-xml.html
- Sun JDK: java.sun.com
- Xerces: xml.apache.org
- Xalan: xml.apache.org
- XML Schema spec: www.w3.org/XML/Schema
- W3C: www.w3c.org
About Tony Patton
Tony Patton works with various technologies such as Java, XML, HTML, and Domino. He's the author of Practical LotusScript and Domino Development With Java, both available from Manning Publications.