XML and Data Interchange

We've all heard the proclamations: XML is the language standard that enables seamless data interchange between disparate applications. First the Internet provided the physical connection. Now XML completes this by providing a common language that enables every application to exchange data. Because of this ability, XML will replace HTML as the new lingua franca of the Internet. This certainly sounds like the IT version of nirvana. Unfortunately, reaching nirvana is not that simple.

XML provides an important foundation for representing richly structured data - it simplifies the problem of data interchange. Most companies are building some level of support for XML into their applications. But XML has an oft-overlooked weakness: it does not, by itself, solve the data interchange problem. It provides some common ground rules for how data is represented, but doesn't specify the contents or structure of that data. Because different applications use different content and data structures, there are many different approaches to modeling data in XML. As a result, we have multiple dialects of XML and are forced to transform data from one dialect into another so it can be consumed by a disparate array of applications.

Solving the data-transformation problem is critical since data interchange is at the heart of the burgeoning field of business-to-business e-commerce. This article looks at the problem as well as some new technologies and approaches to addressing it.

XML standardizes many things including tagging, hierarchies, nesting, character coding, and more. These standards provide a core level of commonality across all implementations of XML. However, XML is an acronym for eXtensible Markup Language, and the key word is extensibility, which is what creates the challenge inherent in interchanging data. XML was made extensible so as to support the evolving needs of future applications. In short, developers are able to create their own tags - and the relationships among these tags - to suit their personal needs. This information is captured in schemata called document type definitions (DTDs) that can be included with XML or implied, as is the case with well-formed XML. Since XML's standardization by the W3C (World Wide Web Consortium) in 1998, the number of disparate DTDs has increased phenomenally. The problem is that many of these DTDs are competing to solve the same problem. Much like the Tower of Babel, there's been a fragmentation of dialects of XML - represented by DTDs.

Some pundits predict a consolidation of XML dialects. The Internet, they claim, favors universal standards that facilitate interoperability. However, the nature of mankind and early evidence suggest that we can anticipate an increased balkanization of XML. As soon as a few companies form a group to define a standard DTD for their industry, another competing group is formed.

Application vendors also play a role in this fragmentation. Companies benefit in many ways by defining the leading dialect in their market.

  • They're perceived as technology leaders by the market; the mantle of technology leadership typically translates into market-share leadership.
  • The company that leads such an effort can influence the direction of the standard to support or feature the capabilities of its own application, again increasing its positioning vis-à-vis the competition.
  • By having advanced insight into the direction of the standard, these companies can tune their products to exploit the future standard, thereby gaining a time-to-market advantage.

    These are strong commercial reasons for application vendors to drive standards efforts. Add to this the unique needs of the various segments of any market and you have a recipe for increased incompatibility. For example, a consortium of companies led by Ariba has created cXML; a consortium led by CommerceOne has created a competing standard, CBL. SAP and Oracle are also creating their own dialects of XML in hopes of making them standards. As a result, XML, which many people viewed as the complete solution for data interchange, is turning into the modern-day Tower of Babel.

    Competing dialects of XML can differ in their content, the naming scheme for the tags and their hierarchical structure. One dialect might use the tag <fname> to describe a person's first name. Another might use the tag <first_name>, causing incompatibility. Or one dialect might have separate fields for first name and last name, while a competing dialect groups the first and last names together into a single full-name field. While the naming of the tags and the structure of the tagging scheme are considerations, the most difficult issue to deal with when transforming from one dialect of XML into another is when the basic content of the various dialects of XML differs. If one dialect includes the person's middle name and another doesn't, then there's a bigger problem. In this scenario you can perform a complete transformation only from the dialect with more data to the dialect with less data. But what do you do if you need to transform in the opposite direction? The problem grows when each dialect includes its own unique data. In this scenario you can't perform a complete translation between dialects in either direction.
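The naming, structural and content differences described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dialects, their element names (`fname`, `mname`, `lname`, `full_name`) and the function `to_dialect_b` are hypothetical and are not taken from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialect A: separate name elements, including a middle name.
DIALECT_A = """<person>
  <fname>Ada</fname>
  <mname>King</mname>
  <lname>Lovelace</lname>
</person>"""


def to_dialect_b(xml_text: str) -> str:
    """Convert hypothetical dialect A into hypothetical dialect B.

    Dialect B has a single <full_name> element and no middle name,
    so the middle name is dropped: the conversion is lossy.
    """
    source = ET.fromstring(xml_text)
    first = source.findtext("fname", default="")
    last = source.findtext("lname", default="")

    target = ET.Element("person")
    ET.SubElement(target, "full_name").text = f"{first} {last}".strip()
    return ET.tostring(target, encoding="unicode")


converted = to_dialect_b(DIALECT_A)
print(converted)
```

Going the other way, from dialect B back to dialect A, there is no reliable way to recover the middle name, or even to split `full_name` safely into its parts. That is the asymmetry the paragraph above describes.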

    The complexity of various dialects of XML continues to increase at an amazing pace, exacerbating the problem. Not only are the various dialects becoming more complex as companies try to one-up the competition, but the base functionality of XML is also increasing through the addition of new extensions. An example is the schema definition language XML Schema, which enables the addition of data-type information to XML. This is a valuable capability because it enables type-aware processing and querying. For example, you can find a product whose expiration date is coming up soon. However, this additional data increases the difficulty in translating among various dialects, particularly those that offer different levels of support for these extensions. To make matters worse, XML Schema isn't a standard yet and there are alternative approaches to solving this problem as well.
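To make the expiration-date query concrete, here is a rough Python sketch. The catalog, its element names (`product`, `name`, `expires`) and the function `expiring_within` are assumptions made for this example. The sketch converts the dates by hand, which is exactly the knowledge that schema-supplied type information would give a processor up front.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Hypothetical catalog; the element names are assumptions for this sketch.
CATALOG = """<catalog>
  <product><name>Milk</name><expires>2000-03-04</expires></product>
  <product><name>Rice</name><expires>2001-01-15</expires></product>
  <product><name>Eggs</name><expires>2000-03-10</expires></product>
</catalog>"""


def expiring_within(xml_text, today, days):
    """Return names of products that expire within `days` of `today`.

    Without type information every value in the XML is just text, so the
    code must know on its own that <expires> holds an ISO-format date.
    """
    cutoff = today + timedelta(days=days)
    names = []
    for product in ET.fromstring(xml_text).iter("product"):
        expires = date.fromisoformat(product.findtext("expires"))
        if today <= expires <= cutoff:
            names.append(product.findtext("name"))
    return names


soon = expiring_within(CATALOG, today=date(2000, 3, 1), days=7)
print(soon)
```

If a second dialect stored the same date in a different format, or carried no type information at all, the conversion step would have to be rewritten for it. This is one way extensions add to the translation burden.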

    Fortunately, there is hope.

    One of the primary values of XML is the promise of simplified data interchange. This is most critical in business-to-business e-commerce, where companies need to exchange data among various back-office systems that store their data in disparate ways. XML greatly simplifies this process, but the fragmentation of dialects makes the development process very painful. To address this problem, there are two new technologies that solve the dialect transformation problem:

  • XSL Transformations (XSLT): An extension of the XSL draft proposal to the W3C, it provides a means for building standard templates for transformation between dialects, then populating the templates with the data from the source XML.
  • XML Script: Developed by Decisionware, it's a scripting language that defines the specific transformation procedures and is embedded directly into the XML itself.

    XSLT is now at the recommendation stage with the W3C. XML Script, on the other hand, is attempting to become a de facto standard. And, of course, achieving standardization by a standards body doesn't ensure that a technology will be accepted as a standard. Ultimately, it must succeed in the marketplace to become a standard.

    Determining the best approach for transformation also depends on your role in the process. If you're a recipient of XML data, you can fall back on a business solution. You may attempt to enforce your XML standards on the creator of the data. For example, in the buyer-seller relationship the buyer is in a superior position to coerce suppliers to support the buyer's standard form of XML. This relative strength has been demonstrated in the EDI world where companies such as the Big 3 automakers and WalMart have forced their respective suppliers to support their standard data formats. However, the recipient isn't always in such a position and must therefore look at alternative technology-based approaches, such as XSLT and XML Script, to address these needs.

    From the sender's perspective, there are two primary options:

    1. The sender can generate a standard form of XML, then use tools like XSLT or XML Script to transform this data into the formats required by the recipient.

    2. The sender can build an intermediary data store that aggregates the necessary data from various internal sources, then stages it for transformation. The data can be stored as objects to preserve its richness and to benefit from the superior data manipulation functionality offered by a database. For example, by storing the data in a database, the sender gains rich, standard query capabilities and data type information. These capabilities are coming to XML via XML Query and XML Schema, but since these technologies aren't standards yet, using a standard database can insulate the sender from the risk of building on evolving and nonstandard technologies. By leveraging a staging database, you can programmatically assemble the data into various XML dialects on the fly by simply populating XML templates. The richness of the programming and database languages simplifies this effort.
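The second option, staging the data and populating XML templates on the fly, might look roughly like the Python sketch below. It is not a description of any particular product. The `StagedPerson` record, the two dialect names and the templates are hypothetical, and a plain in-memory object stands in for the staging database.

```python
from dataclasses import dataclass
from string import Template
from xml.sax.saxutils import escape


@dataclass
class StagedPerson:
    """Staged record holding a superset of what any one dialect needs."""
    first: str
    middle: str
    last: str


# Hypothetical templates, one per output dialect.
TEMPLATES = {
    "dialect_a": Template(
        "<person><fname>$first</fname><mname>$middle</mname>"
        "<lname>$last</lname></person>"
    ),
    "dialect_b": Template("<person><full_name>$first $last</full_name></person>"),
}


def render(record: StagedPerson, dialect: str) -> str:
    """Populate the template for `dialect`, escaping XML special characters."""
    fields = {
        "first": escape(record.first),
        "middle": escape(record.middle),
        "last": escape(record.last),
    }
    # Fields a template does not mention are simply ignored.
    return TEMPLATES[dialect].substitute(fields)


person = StagedPerson(first="Ada", middle="King", last="Lovelace")
as_a = render(person, "dialect_a")
as_b = render(person, "dialect_b")
print(as_a)
print(as_b)
```

Because the staged record holds a superset of the data, supporting a new dialect in this sketch means adding one more template rather than writing a converter for each pair of dialects.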

    POET Software was faced with this problem when designing its supplier-side business-to-business e-commerce product, the POET eCatalog Suite (eCS). While data transformation is just one feature of the complete eCatalog solution, it's an important one. In this case the other primary features included data cleansing (so the data is suitable for outside consumption), catalog customization and catalog delivery. To address the data-cleansing requirements, POET needed to stage the data in a separate repository, which also facilitated the ultimate solution for the transformation problem.

    POET used its object database, and this offered many advantages. Object databases use a hierarchical structure much like XML, are more extensible than their relational counterparts, offer superior performance when using a complex object model and offer zero administration, making them more embeddable. POET chose to store rich objects instead of native XML. The rich objects are less verbose and more easily searchable. More important, they enabled the product to store a superset of the data required by the various XML dialects necessary for output. From this data eCS generates the actual XML on the fly in the dialect required by the customer. This approach enabled POET to overcome the dialect problem because it could support various dialects, easily evolving as new ones were added or as the current ones evolved. Customers receive their data in the format they need, so the dialect transformation issue was moot.

    Of course, if every company generating XML data implemented a flexible data staging solution that translated the data into all of the various formats required by the recipients, the dialect problem would disappear. Unfortunately, as evidenced by the fragmentation of XML dialects, we can't expect all companies to solve their problems the same way. So don't expect the XML dialect problem to go away any time soon.

    About Mike Hogan
    Mike Hogan is the vice president of corporate development at POET Software, a B2B e-commerce company. In this role he crafts the company's strategy and establishes and manages strategic partnerships.
