XML and Data Interchange
By: Mike Hogan
Apr. 12, 2000 12:00 AM
We've all heard the proclamations: XML is the language standard that enables seamless data interchange between disparate applications. First the Internet provided the physical connection. Now XML completes the picture by providing a common language that enables every application to exchange data. Because of this ability, XML will replace HTML as the new lingua franca of the Internet.
This certainly sounds like the IT version of nirvana. Unfortunately, reaching nirvana is not that simple. XML provides an important foundation for representing richly structured data, and it simplifies the problem of data interchange. Most companies are building some level of support for XML into their applications. But XML has an oft-overlooked weakness: it does not, by itself, solve the data interchange problem. It provides common ground rules for how data is represented, but doesn't specify the contents or structure of that data. Because different applications use different content and data structures, there are many different approaches to modeling data in XML. As a result, we have multiple dialects of XML and are forced to transform data from one dialect into others so it can be consumed by a disparate array of applications.
Solving the data-transformation problem is critical, since data interchange is at the heart of the burgeoning field of business-to-business e-commerce. This article looks at the problem as well as some new technologies and approaches to addressing it.
XML standardizes many things, including tagging, hierarchies, nesting, character encoding, and more. These standards provide a core level of commonality across all implementations of XML. However, XML is an acronym for eXtensible Markup Language, and the key word is extensible: extensibility is what creates the challenge inherent in interchanging data. XML was made extensible so as to support the evolving needs of future applications.
In short, developers are able to create their own tags, and the relationships among these tags, to suit their own needs. This information is captured in schemata called document type definitions (DTDs), which can be included with the XML or left implicit, as is the case with XML that is merely well-formed. Since XML's standardization by the W3C (World Wide Web Consortium) in 1998, the number of disparate DTDs has increased phenomenally. The problem is that many of these DTDs are competing to solve the same problem. Much like the Tower of Babel, XML has fragmented into dialects, each represented by a DTD.
Some pundits predict a consolidation of XML dialects. The Internet, they claim, favors universal standards that facilitate interoperability. However, human nature and early evidence suggest that we can anticipate an increased balkanization of XML. As soon as a few companies form a group to define a standard DTD for their industry, another competing group is formed. Application vendors also play a role in this fragmentation. Companies benefit in many ways by defining the leading dialect in their market.
These are strong commercial reasons for application vendors to drive standards efforts. Add to this the unique needs of the various segments of any market and you have a recipe for increased incompatibility. For example, a consortium of companies led by Ariba has created cXML; a consortium led by CommerceOne has created a competing standard, CBL. SAP and Oracle are also creating their own dialects of XML in hopes of making them standards. As a result, XML, which many people viewed as the complete solution for data interchange, is turning into the modern-day Tower of Babel.
Competing dialects of XML can differ in their content, the naming scheme for the tags, and their hierarchical structure. One dialect might use the tag <fname> to describe a person's first name. Another might use the tag <first_name>, causing incompatibility. Or one dialect might have separate fields for first name and last name, while a competing dialect groups the first and last names together into a single full-name field.
While the naming of the tags and the structure of the tagging scheme are considerations, the most difficult issue in transforming one dialect of XML into another arises when the basic content of the dialects differs. If one dialect includes the person's middle name and another doesn't, then there's a bigger problem. In this scenario you can perform a complete transformation only from the dialect with more data to the dialect with less data. But what do you do if you need to transform in the opposite direction? The problem grows when each dialect includes its own unique data. In this scenario you can't perform a complete translation between dialects in either direction. The complexity of the various dialects of XML continues to increase at an amazing pace, exacerbating the problem.
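The naming and content mismatches described above can be made concrete with a small sketch. It uses Python's standard xml.etree.ElementTree module; only the <fname> and <first_name> tags come from the text, while <mname>, <lname>, <last_name>, the <person> wrapper, the sample name, and both function names are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialect A: separate name fields, including a middle name.
# The sample data is made up.
DIALECT_A = """<person>
  <fname>Jane</fname>
  <mname>Quinn</mname>
  <lname>Doe</lname>
</person>"""


def a_to_b(xml_text: str) -> str:
    """Convert dialect A to hypothetical dialect B.

    Dialect B renames the tags and has no middle-name field,
    so the middle name is dropped on the way through.
    """
    src = ET.fromstring(xml_text)
    dst = ET.Element("person")
    ET.SubElement(dst, "first_name").text = src.findtext("fname")
    ET.SubElement(dst, "last_name").text = src.findtext("lname")
    return ET.tostring(dst, encoding="unicode")


def b_to_a(xml_text: str) -> str:
    """Convert dialect B back to dialect A.

    Renaming is reversible, but the middle name cannot be recovered,
    so <mname> is emitted empty.
    """
    src = ET.fromstring(xml_text)
    dst = ET.Element("person")
    ET.SubElement(dst, "fname").text = src.findtext("first_name")
    ET.SubElement(dst, "mname")  # no source data exists for this field
    ET.SubElement(dst, "lname").text = src.findtext("last_name")
    return ET.tostring(dst, encoding="unicode")


as_b = a_to_b(DIALECT_A)
round_trip = b_to_a(as_b)
print(as_b)
print(round_trip)
```

Going from A to B is complete from B's point of view, but the round trip back to A loses the middle name. That is the one-directional limitation in miniature: renaming tags is mechanical, while missing content has no technical fix.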
Not only are the various dialects becoming more complex as companies try to one-up the competition, but the base functionality of XML is also growing through new extensions. An example is the schema language XML Schema, which enables the addition of data-type information to XML. This is a valuable capability because it enables type-aware processing and querying. For example, you can find a product whose expiration date is coming up within a given period. However, this additional data increases the difficulty of translating among various dialects, particularly those that offer different levels of support for these extensions. To make matters worse, XML Schema isn't a standard yet, and there are alternative approaches to solving this problem as well.
Fortunately, there is hope. One of the primary values of XML is the promise of simplified data interchange. This is most critical in business-to-business e-commerce, where companies need to exchange data among various back-office systems that store their data in disparate ways. XML greatly simplifies this process, but the fragmentation of dialects makes the development process very painful. Two new technologies address the dialect transformation problem: XSLT and XML Script.
XSLT is now at the recommendation stage with the W3C. XML Script, on the other hand, is attempting to become a de facto standard. And, of course, approval by a standards body doesn't ensure that a technology will be accepted as a standard. Ultimately, it must succeed in the marketplace to become one.
Determining the best approach for transformation also depends on your role in the process. If you're a recipient of XML data, you can fall back on a business solution. You may attempt to enforce your XML standards on the creator of the data. For example, in the buyer-seller relationship the buyer is in a superior position to coerce suppliers to support the buyer's standard form of XML. This relative strength has been demonstrated in the EDI world, where companies such as the Big 3 automakers and WalMart have forced their respective suppliers to support their standard data formats. However, the recipient isn't always in such a position and must therefore look at alternative technology-based approaches, such as XSLT and XML Script, to address these needs.
From the sender's perspective, there are two primary options:
1. The sender can generate a standard form of XML, then use tools like XSLT or XML Script to transform this data into the formats required by the recipient.
2. The sender can build an intermediary data store that aggregates the necessary data from various internal sources, then stages it for transformation. The data can be stored as objects in order to maintain its richness and the superior data manipulation functionality offered by a database. For example, by storing the data in a database, the sender can leverage functionality including rich, standard query capabilities and data type information. These capabilities are coming to XML via XMLQuery and XML Schema, but since these technologies aren't standards yet, using a standard database can insulate the sender from the risk of building on evolving and nonstandard technologies.
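As a rough illustration of the second option, the sketch below stages a record as a typed object and renders it into two output dialects by filling in per-dialect templates. Everything in it is an assumption made for illustration: the CatalogItem class, its fields, the dialect names, and every tag name are invented and do not correspond to cXML, CBL, or any real product. An in-memory object stands in for the staging database.

```python
from dataclasses import dataclass
from string import Template
from xml.sax.saxutils import escape


@dataclass
class CatalogItem:
    """Hypothetical staged record: a superset of what any one dialect needs."""
    sku: str
    name: str
    price: float
    currency: str
    expires: str  # ISO date; only some dialects carry it


# One template per output dialect; all tag names are invented.
TEMPLATES = {
    "dialect_a": Template(
        '<Item id="$sku"><Name>$name</Name>'
        '<Price currency="$currency">$price</Price></Item>'
    ),
    "dialect_b": Template(
        "<product><sku>$sku</sku><title>$name</title>"
        "<cost>$price</cost><expires>$expires</expires></product>"
    ),
}


def render(item: CatalogItem, dialect: str) -> str:
    """Populate the chosen dialect's template from the staged object."""
    quote = {'"': "&quot;"}  # extra escaping for values used in attributes
    fields = {
        "sku": escape(item.sku, quote),
        "name": escape(item.name),
        "price": f"{item.price:.2f}",
        "currency": escape(item.currency, quote),
        "expires": escape(item.expires),
    }
    # substitute() ignores fields a given template does not mention,
    # so each dialect simply takes the subset of data it supports.
    return TEMPLATES[dialect].substitute(fields)


item = CatalogItem("X-100", "Nuts & Bolts", 4.5, "USD", "2000-12-31")
for dialect in TEMPLATES:
    print(render(item, dialect))
```

Because the staged object holds the superset of the data, supporting another dialect in this sketch means adding one more template, not writing a new dialect-to-dialect translation.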
By leveraging a staging database, you can programmatically assemble the data into various XML dialects on the fly by simply populating XML templates. The richness of the programming and database languages simplifies this effort.
POET Software was faced with this problem when designing its supplier-side business-to-business e-commerce product, the POET eCatalog Suite (eCS). While data transformation is just one feature of the complete eCatalog solution, it's an important one. In this case the other primary features included data cleansing (so the data is suitable for outside consumption), catalog customization, and catalog delivery. To address the data-cleansing requirements, POET needed to stage the data in a separate repository, which also facilitated the ultimate solution for the transformation problem.
POET used its object database, and this offered many advantages. Object databases use a hierarchical structure much like XML, are more extensible than their relational counterparts, offer superior performance when using a complex object model, and offer zero administration, making them more embeddable. POET chose to store rich objects instead of native XML. The rich objects are less verbose and more easily searchable. More important, they enabled the product to store a superset of the data required by the various XML dialects necessary for output. From this data eCS generates the actual XML on the fly in the dialect required by the customer.
This approach enabled POET to overcome the dialect problem because it could support various dialects, easily evolving as new ones were added or as the current ones evolved. Customers receive their data in the format they need, so the dialect transformation issue is moot. Of course, if every company generating XML data implemented a flexible data staging solution that translated the data into all of the various formats required by the recipients, the dialect problem would disappear.
Unfortunately, as evidenced by the fragmentation of XML dialects, we can't expect all companies to solve their problems the same way. So don't expect the XML dialect problem to go away any time soon.