XML Schemas And DTDs Working Together

This article compares and contrasts the broad functionality of XML Schemas (whose approval by the World Wide Web Consortium is imminent) with that of document type definitions (DTDs), currently part of the XML 1.0 Recommendation.

Let's begin with some background to describe the need for schemas and DTDs.

XML holds out the promise of sharing all kinds of information effortlessly, with automated and loss-free exchanges among disparate information systems. XML defines how to mark up content to give it both structure and additional meta-information. This increases the worth of the data many times by allowing a wide variety of applications to make greater use of the content.

However, as with any structured information exchange, all parties must adhere to a mutual agreement on the syntax and semantics of the data or chaos will ensue. The XML 1.0 specification fulfills the need for such an agreement by providing for a DTD, which describes the tags and hierarchy that the XML data may include.

For example, consider an engine manufacturer that provides service information to its OEM customers. Because XML provides a media-neutral and processible data format, the company can deliver its service information in XML so its OEM customers can automatically incorporate this data into their own service manuals. This allows each OEM the freedom to apply its own formatting and behaviors so the service information automatically appears as if the OEM had produced it.

To achieve this kind of exchange, the engine manufacturer and the OEM must agree on which tags will be used, and how. That's where a DTD or schema comes in.

The XML 1.0 specification provides the mechanism of a document type declaration, which defines a set of tags and attributes and describes how they can be put together to form a valid document. Such declarations are commonly put into a file separate from the document itself; this external subset of the document type declaration is usually called the DTD, a term this article uses informally in place of "document type declaration." (There can also be an internal subset, a part of the declaration contained at the top of the document itself.)
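
As a minimal sketch of the two places declarations can live, the hypothetical document below points at an external subset and also carries a small internal subset. The element names, the file name engine-service.dtd, and the entity are invented for illustration and don't come from any real DTD.

```xml
<?xml version="1.0"?>
<!-- The document type declaration names the document element (service-info). -->
<!-- SYSTEM points at the external subset, the file informally called "the DTD". -->
<!DOCTYPE service-info SYSTEM "engine-service.dtd" [
  <!-- Internal subset: declarations carried at the top of the document itself. -->
  <!ENTITY model "X-200">
]>
<service-info>
  <procedure>Drain the oil from the &model; engine.</procedure>
</service-info>
```

A validating parser would read both subsets; the element declarations for service-info and procedure are assumed to live in the external file.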

To be merely well formed, an XML document doesn't need to be associated with a DTD. However, for most if not all applications to be able to process an XML document as intended, that document must conform to some set of expectations about what tags can appear where, what attributes can be on which tags, which kinds of values the attributes can have, and so on. A DTD describes the expectations or constraints desired for a given application. If a well-formed XML document satisfies all the constraints defined in its DTD, it's said to be valid. The kinds of constraints a DTD can define include:

  • Names of all the elements
  • For each element, the names of all its attributes
  • For each element, which elements it can contain and in what order (the element's *content model*)
  • For each attribute, some general indication of what its value can be
Such levels of constraints have been used to great advantage over the past 20 years, first in SGML and now also in XML. But for many applications, the ideal constraints go beyond what DTDs can express. For example, a DTD lets you say that a given attribute must be a name token or one of a fixed list of values, but there's no way to say that it must be an integer between 1 and 100. And if you have an element whose content must be constrained, such as a "published-date" element whose value should be a valid date, DTDs provide no way to constrain what text can go into an element. Many people wanted more powerful data typing (as such constraints are often called) for both element content and attribute values.
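
The four kinds of constraints listed above, and their limits, can be sketched with a small self-contained document. Everything here (the manual vocabulary, the torque attribute, the sample values) is a made-up illustration. NMTOKEN is used because it is about as close as an XML 1.0 DTD comes to a numeric attribute type.

```xml
<?xml version="1.0"?>
<!DOCTYPE manual [
  <!-- Element names and content models: which children, in what order. -->
  <!ELEMENT manual (title, chapter+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT chapter (title, published-date?, para*)>
  <!ELEMENT published-date (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
  <!-- Attribute names plus a general indication of their values. -->
  <!ATTLIST chapter
      id      ID                  #REQUIRED
      status  (draft | released)  "draft"
      torque  NMTOKEN             #IMPLIED>
]>
<manual>
  <title>Engine Service Manual</title>
  <chapter id="c1" status="released" torque="not-a-number">
    <title>Lubrication</title>
    <published-date>sometime last spring</published-date>
    <para>Check the oil level weekly.</para>
  </chapter>
</manual>
```

This document should be valid against its DTD even though torque isn't numeric and published-date isn't a date, which is exactly the gap that stronger data typing is meant to close.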

DTDs in XML have a different syntax than that of XML documents; DTDs use a "declaration" syntax instead of XML tags. In other words, a DTD isn't an XML document, so parsing it requires additional technology. For a number of reasons, XML experts believe there are benefits to creating an XML syntax for DTDs that would replace the special syntax of a DTD with the now-standard XML syntax.

Both the need for stronger data typing and the desire for an XML-based syntax for constraint declarations led to the development of an XML Schema language.

The XML Schema Working Group of the W3C has developed a "Proposed Recommendation" for a schema language that provides a means for defining the constraints on the structure and content of an XML document. This Proposed Recommendation is likely to become a full-fledged standard in the near future.

Both DTDs and schemas support the validation of a document's structure, that is, both specify valid elements, their content models, valid attributes, valid attribute types, and default attribute values. Schemas offer a couple of significant additional capabilities:

  • They can specify a few kinds of content models that XML DTDs cannot. For example, schemas support some "and" models that could constrain a date element's content model to be exactly one day element, one month element, and one year element, but in any order.
  • They can also specify context-sensitive content models. For example, in an engine repair manual schema you could specify that a "part" element that is a child of the "manual" document element has a content model that allows one or more "chapter" subelements, whereas a part element that is a child of a "parts-list" element has a content model that allows only character data.
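
Both capabilities can be sketched in a single schema. This is an illustration only: it assumes the element syntax and namespace URI of the final XML Schema Recommendation (which may differ in detail from the Proposed Recommendation discussed here), and the choice of xs:string for leaf elements is arbitrary.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- An "and" model: exactly one day, month, and year, in any order. -->
  <xs:element name="date">
    <xs:complexType>
      <xs:all>
        <xs:element name="day" type="xs:string"/>
        <xs:element name="month" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- Context-sensitive models: "part" is declared locally, once per parent. -->
  <!-- Under "manual", a part holds one or more chapter subelements. -->
  <xs:element name="manual">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="part" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="chapter" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Under "parts-list", a part holds only character data. -->
  <xs:element name="parts-list">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="part" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

In a DTD, an element name has a single declaration, so the two meanings of part would need two different names.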

(A point of controversy that hasn't been settled is the claim that schemas fall short compared to DTDs because they lack support for parameter entities, which allow fine-grained customization and modularization of DTDs. Because of this, some have argued that maintaining complex constraint declarations will be more difficult as schemas than as DTDs. Even if this concern proves valid, it will affect only those who create and maintain DTDs, not the vast majority of users whose job is to create and maintain information.)

The primary advantage of schemas over DTDs is their support for validating element content. While DTDs allow for very basic constraints on attribute values, schemas not only strengthen the data-typing constraints that can be applied to attribute values to a great degree, but also allow the same strong level of data-typing constraints to be applied to element content. This means that your schema can require, for example, that the content of your "NumberOfDependents" element is a valid integer between 0 and 30, or that the content of your "Telephone" element is a string that matches a certain pattern (e.g., 3-3-4) that you have determined all phone numbers must match.
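
The two examples in the preceding paragraph might be written roughly as follows. This sketch assumes the facet names and namespace of the final XML Schema Recommendation, and it assumes "3-3-4" means three digits, three digits, and four digits separated by hyphens.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Element content must be an integer from 0 through 30. -->
  <xs:element name="NumberOfDependents">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="30"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Element content must match the 3-3-4 pattern, e.g. 555-123-4567. -->
  <!-- A pattern facet is matched against the whole value, so no anchors are needed. -->
  <xs:element name="Telephone">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{3}-[0-9]{3}-[0-9]{4}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```

The same simple types could be applied to attributes as well, which is how schemas extend one typing mechanism to both attribute values and element content.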

The degree to which content validation provides an added benefit is application-specific. One could make a general argument that document-oriented applications are likely to benefit relatively less from data typing than data-oriented applications. However, these generalizations won't apply universally, so existing applications will migrate at different speeds. Migration from DTDs to schemas will take place over many years.

Even after schemas become an approved standard, they will coexist with - but not replace - DTDs. As a part of XML 1.0, DTDs will always be supported by validating XML 1.0 parsers, and for many applications, DTDs can supply all the constraints they need.

Even for those who find some benefit in migrating to schemas, parts of DTDs may still remain. One reason is that the declaration of entities, something that can be done as part of a DTD, isn't supported in the current version of XML Schemas. If entity declarations are needed, they still must be done using standard XML 1.0 declarations in either the internal subset (which goes at the top of the document) or the external subset (i.e., within the DTD). A given document may thus have both a DTD and an XML Schema that provides additional constraints.
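
A document that mixes the two might look like the sketch below. The entity, the element names, and the file name manual.xsd are hypothetical, and the xsi:noNamespaceSchemaLocation hint is taken from the final XML Schema Recommendation rather than from anything stated in this article.

```xml
<?xml version="1.0"?>
<!-- The internal subset supplies only the entity declaration. -->
<!DOCTYPE manual [
  <!ENTITY company "Example Engine Co.">
]>
<!-- The schema hint on the document element supplies the structural and data-typing constraints. -->
<manual xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="manual.xsd">
  <title>Service manual from &company;</title>
</manual>
```

Because this internal subset declares no elements, the document would not pass DTD validation on its own. The sketch assumes a processor that reads the internal subset for entity definitions and relies on the schema for validation.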

Companies that currently use DTDs will be able to convert to schemas later, but will never be required to do so. Validating XML processors will always have to support DTDs.

Tools for converting XML DTDs to schemas include Arbortext's Epic Architect, a developer's kit that bundles in TIBCO Extensibility's schema development tool, which automates many aspects of DTD-to-schema conversions. (Conversion of SGML DTDs to schemas may present additional obstacles.)

Regardless of the use of DTDs and schemas, the specifications that describe the semantics of XML documents will remain unchanged, and therefore XML data will remain interoperable. There will be no significant issues with regard to the coexistence of DTDs and schemas.

Companies that use DTDs for validation may exchange documents with companies that use schemas. The only risk is that documents created in a DTD-based application may contain content that fails to comply with the potentially more highly constrained data-typing specifications of the related schema. There are two basic ways to deal with this:

  1. Incorporate a public domain schema validator into the workflow; several of these validators are already available, although you should realize that the schema standard is somewhat in flux, and any existing validator may not be in sync with the latest version of the spec.
  2. Plan to use an authoring tool that supports schema validation so that authors are constrained to create valid information in the first place. Some existing tools plan to add schema support, and others will no doubt be brought to market.
For More Information...
You can locate the W3C drafts and reports from the index at www.w3.org/TR/. Although the XML Schema Recommendation, which comes in two parts, is a substantial bit of reading, you can get a good overview by reading the XML Schema Primer, available at www.w3.org/TR/xmlschema-0/. A public page on the W3C site at www.w3.org/XML/Schema provides a good set of pointers to various schema-related resources.
About Paul Grosso
Paul Grosso has over 25 years of experience in the field of software
design, electronic publishing, and markup technologies. He is
Arbortext's chief representative to the W3C, chairing the XML Core
Working Group.
