Preparing for Tomorrow - Today

Today many companies are evaluating the application of XML to their technology initiatives. For all its potential, XML has performance, scalability, and accessibility implications that need to be considered when developing an implementation strategy.

We all know that XML is the enabler for doing business over the Internet. Its self-descriptive nature simplifies the exchange of data between parties, making it a powerful standard that simplifies B2B communication. Yet, while this ease of interaction has many benefits - speeding transaction times, opening up new channels, establishing access to data never before attainable - it introduces a challenge that could actually inhibit XML's mass adoption. Namely, if XML is as successful as we expect it to be and the number of XML business transactions grows exponentially, can the currently installed B2B infrastructure support such unprecedented levels of activity?

This four-part series addresses this question by studying several system components likely to be impacted by the large volume of XML transactions generated in an automated commerce chain. The series will focus on key attributes that are critical for a successful B2B system and demonstrate that proper advanced planning will ensure the scalability and performance of XML-based systems as XML transaction levels increase.

This first article explores what happens to a transaction when it's represented in XML and how it impacts the performance and scalability of B2B e-commerce systems. In addition, I'll identify the key attributes of a B2B-XML transaction and the impact of this additional context on performance and scalability. Understanding this cause-and-effect relationship is the first step in assessing the impact of using XML.

The Nature of XML
To discuss the performance and scalability of XML-based systems, it's important to understand the impact of converting a transaction to XML. You're probably familiar with many of the attributes of XML: it's self-describing, flexible, and actionable, making it the ideal foundation for B2B interoperability. However, each of its positive attributes (flexibility, extensibility, ease of use, and platform independence) comes at a price: transaction size, externally defined data structures, text representation of data, and text representation of attributes and qualifiers that establish a well-formed XML transaction. Each attribute contributes in a different way to the overall performance and scalability of the XML-based system.

Figure 1 illustrates a typical business transaction, a purchase order (P.O.) between Buyer 1 and Supplier A for 100 widgets. The P.O. transaction content is shown in a printed format, as well as a more compressed delimited format. Historically, both of these formats have communicated the transaction between business partners. But though they contain the information required by the business user, they fail to provide the supporting information required by the user's application to automatically process and act on the content of the transaction. Typically, users rekey this information into their business application in order to process the transaction.

Now let's take a look at this same transaction when represented in XML. One advantage of XML is that it provides a description of the content in the transaction that's separate from the actual transaction data. This document type definition (DTD) contains information that describes the data contained in the transaction. In some cases the DTD is included within the body of the XML transaction; in others it's a separate file that's referenced only within the XML transaction. Listing 1 shows the same P.O. transaction in XML. Note that this transaction has a unique DTD - <!DOCTYPE PURCHASEORDER SYSTEM> - that's not included within the XML representation of this P.O.

While XML defines the alphabet (encoding) and grammar of a language, it doesn't provide any context, which is needed if a B2B conversation is to have any meaning. A wide range of standards bodies are driving parallel efforts to create dictionaries that will provide context for specific business communities. These efforts hold the promise of providing the context that's lacking in the XML specification itself. One standards body, the Open Applications Group, Inc. (OAGI) (www.openapplications.org), defines its dictionary in terms of business object documents (BODs). Each BOD represents a specific function within the business process. When the data from a P.O. transaction is represented in the form of an OAGI BOD, there will always be an increase in the size of the transaction. Listing 2 demonstrates this increase when the data from Listing 1 is encoded in the OAGI Process P.O. BOD.

These examples demonstrate that the OAGI version of the XML is more comprehensive in its content and context than any of the formats shown previously in Figure 1 and Listing 1. The P.O. BOD is flexible enough to handle variations in language, time zone, and units of measure. This additional specification allows more flexibility and enables the recipient of the transaction to act more precisely while executing the order. The OAGI BOD attributes include:

  • Greater flexibility: Note the date format and ability to specify time zones.
  • Extensibility: User and partner specific areas provide for additional information.
  • Ease of use: The tags that provide business context and structure are well defined.
The question remains: Are these benefits worth the cost from a performance and scalability perspective?

It's Not the Size of the XML That Matters - It's What's in It That Counts
First and foremost, the growth in the size of the transaction when it's flexibly described using a specific XML business dialect, as in Listing 2, is significant. In Figure 1, the size of the transaction with formatting is 300B, and this could be compressed into a delimited data interchange that would further reduce the size to 180B. This same transaction represented in XML without a business-oriented dialect would expand more than three times (see Listing 1). When you add the flexibility offered by a business-oriented DTD, such as the OAGI BOD, the file becomes five to 10 times larger, as shown in Listing 2. In this example our 300B transaction grew to 2,992B - almost 10 times the size of the original transaction - once the OAGI XML BOD was applied.
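The effect of tag overhead can be sketched with a small measurement. The field values below come from the Figure 1 scenario (Buyer 1, Supplier A, 100 widgets), but the tag names and the pipe delimiter are assumptions made for illustration; this does not reproduce Figure 1, Listing 1, or the OAGI BOD, so the byte counts will differ from the 180B, 300B, and 2,992B quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the values follow the article's P.O. scenario.
fields = {
    "BUYER": "Buyer 1",
    "SUPPLIER": "Supplier A",
    "ITEM": "Widget",
    "QUANTITY": "100",
}

# Delimited form: values only, separated by an assumed "|" delimiter.
delimited = "|".join(fields.values()).encode("ascii")

# Plain XML form: every value wrapped in an opening and closing tag.
root = ET.Element("PURCHASEORDER")
for tag, value in fields.items():
    ET.SubElement(root, tag).text = value
as_xml = ET.tostring(root)  # bytes, no XML declaration

ratio = len(as_xml) / len(delimited)
print(f"delimited: {len(delimited)} bytes, XML: {len(as_xml)} bytes, ratio: {ratio:.1f}x")
```

Even with only four short fields and no business dialect, the markup accounts for most of the bytes, which is consistent with the "more than three times" growth described above.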

If the source transaction is increased by a factor of 10, then on a linear basis one can predict the impact. If the current B2B capabilities support 100 transactions per supplier per day, then 10 suppliers require 600MB of capacity for the XML transactions alone over the course of a year. Compare this to the 40MB required when the transaction is in a delimited form, and the effect of XML utilization on bandwidth and storage capacity becomes apparent. Note that these figures don't take into account the overhead associated with the indexing, filtering, and segmentation of the transaction for the purpose of retrieval, nor do they account for the size of the DTD, which is accessed with the transaction at the time of processing.
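The yearly figures can be checked with a short calculation. The transaction sizes and volumes are taken from the text; the number of business days per year is not stated in the article, and 200 is an assumption chosen here because it yields roughly the quoted 600MB and 40MB.

```python
# Sizes and volumes from the article.
DELIMITED_BYTES = 180
OAGI_BOD_BYTES = 2_992
TRANSACTIONS_PER_SUPPLIER_PER_DAY = 100
SUPPLIERS = 10

# Assumption: the article does not say how many days make up its year.
BUSINESS_DAYS_PER_YEAR = 200


def yearly_megabytes(bytes_per_transaction: int) -> float:
    """Linear estimate of raw transaction volume per year, in MB (10**6 bytes)."""
    transactions = (
        TRANSACTIONS_PER_SUPPLIER_PER_DAY * SUPPLIERS * BUSINESS_DAYS_PER_YEAR
    )
    return bytes_per_transaction * transactions / 1_000_000


xml_mb = yearly_megabytes(OAGI_BOD_BYTES)
delimited_mb = yearly_megabytes(DELIMITED_BYTES)
print(f"OAGI BOD: {xml_mb:.1f} MB/year, delimited: {delimited_mb:.1f} MB/year")
```

Under that assumption the estimate comes to about 598MB for the OAGI BOD form and 36MB for the delimited form. With a 365-day year the same arithmetic gives roughly 1.1GB and 66MB, so the absolute numbers depend on the day count while the ratio between the two formats does not.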

All Tags Are Equal... But Some Are More Equal Than Others
Another key attribute of XML is its self-describing capability through the use of tags that surround the XML data elements. The DTD provides a data representation for any given transaction, and this representation may be included as part of the transaction. To properly parse the XML transaction, a system must first read the DTD, which tells the system what elements to expect and what relationships exist between the various elements. It's this information contained within the DTD that enables an application to act on the data contained in a given XML transaction. When processing text using XML, no additional processing is required for transforming or presenting the data. But when processing dates or numbers, it's an entirely different matter.

The application that receives an XML transaction must typically take some action based on a numeric calculation performed on the data contained in the transaction. Since numeric fields in XML are represented as text, the parser must first convert the data into a numeric representation. This must be done prior to the application performing its calculation.
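A minimal sketch of this conversion step, using Python's standard library. The element names and values are hypothetical and are not taken from Listing 1 or the OAGI BOD. Note that in this sketch the parser hands every value back as a string and the conversion happens in application code; other toolkits may place that step elsewhere.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical line item; tag names are illustrative only.
LINE_ITEM_XML = """
<LINEITEM>
  <DESCRIPTION>Widget</DESCRIPTION>
  <QUANTITY>100</QUANTITY>
  <UNITPRICE>2.50</UNITPRICE>
  <NEEDBY>2000-06-30</NEEDBY>
</LINEITEM>
"""

item = ET.fromstring(LINE_ITEM_XML)

# Text needs no conversion: the parsed string is used as-is.
description = item.findtext("DESCRIPTION")

# Numbers and dates arrive as text and must be converted before any calculation.
quantity = int(item.findtext("QUANTITY"))
unit_price = Decimal(item.findtext("UNITPRICE"))
need_by = date.fromisoformat(item.findtext("NEEDBY"))

line_total = quantity * unit_price
print(description, line_total, need_by.isoformat())
```

Each conversion is cheap on its own, but it is repeated for every numeric and date element in every transaction, which is why the cost scales with transaction volume.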

The performance implications of this XML implementation detail are more difficult to predict. In general, numeric calculations are more performance intensive than date calculations, and text calculations are the least intensive. To estimate the impact of this attribute, the logic applied to a given transaction must be broken down by the type of element it operates on: number, date, or text. The more calculations based on a number or date element, the greater the performance and scalability impact to the system.

Validate Now or Validate Later....Either Way, You Must Pay
Finally, consider the use of qualifiers and required elements within the DTD and their impact on performance. One of the key strengths of XML is its flexibility. By defining elements as optional, it's possible to generalize transactions by type, so the transaction can be made applicable to many diverse situations. As a result of this flexibility, however, the processing required to resolve these transactions requires more application logic. This additional logic will also influence system performance.

One alternative to improve performance is to embed the logic wherever possible in the DTD as required elements or through qualifiers. While this reduces the runtime processing, it also makes validating the transaction more processor intensive, since there are additional rules to test the transaction against. This too impacts performance and affects the scalability of the system. Either way, flexibility costs an extra step; moving it into the DTD means that step is finished as soon as the transaction is validated.
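The "validate later" side of this trade-off can be sketched as follows. Python's standard `xml.etree.ElementTree` parser does not validate against a DTD, so every required or optional element has to be checked in application code. The element names and the default values are hypothetical, chosen only to illustrate the pattern.

```python
import xml.etree.ElementTree as ET


def shipping_terms(po_xml: str) -> dict:
    """Resolve required and optional elements in application logic."""
    po = ET.fromstring(po_xml)

    # Required element: without DTD validation, the application must check it.
    supplier = po.findtext("SUPPLIER")
    if supplier is None:
        raise ValueError("SUPPLIER element is required")

    # Optional elements: each one adds a branch or a default at run time.
    # "USD" and "UTC" are assumed defaults, not values from the article.
    return {
        "supplier": supplier,
        "currency": po.findtext("CURRENCY", default="USD"),
        "timezone": po.findtext("TIMEZONE", default="UTC"),
    }


print(shipping_terms(
    "<PURCHASEORDER><SUPPLIER>Supplier A</SUPPLIER>"
    "<TIMEZONE>EST</TIMEZONE></PURCHASEORDER>"
))
```

Declaring such elements as required in the DTD and running a validating parser would move these checks out of the application and into the validation pass, which is the "validate now" alternative.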

Planning for Performance and Scalability
The performance impact of these attributes, while considerable, shouldn't be used as an argument to dismiss XML. On the contrary, by knowing the cause and effect of these attributes on a given XML system, it's possible to balance performance and scalability against the flow of transactions into the production environment. If a system requires flexibility, extensibility, ease of use, and platform independence, then XML is very appropriate. By knowing the impact of these key attributes on the performance of a specific system, it becomes apparent how to plan for maximum performance and scalability.

For example, consider the choice between a system that stores XML transactions and one that resolves transactions into a specific database schema. Considering the performance implications of XML's key attributes, it's important to weigh the need for change in the transaction and the variations to those supplier transactions versus the need to resolve the transactional content into its constituent data types and relationships for application performance. If an XML system must accept a significant volume of transactions, and those transactions are consistent across suppliers, then the need for adaptability is secondary. In this case the system could be designed to utilize XML as an interoperability layer, and the transactions can be resolved into a database as they arrive.
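A minimal sketch of XML used as an interoperability layer, with each transaction resolved into a relational table on arrival. The table layout, element names, and the use of an in-memory SQLite database are assumptions for illustration; the article does not prescribe a schema or a database product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming transaction; tag names are illustrative only.
PO_XML = (
    "<PURCHASEORDER><BUYER>Buyer 1</BUYER><SUPPLIER>Supplier A</SUPPLIER>"
    "<ITEM>Widget</ITEM><QUANTITY>100</QUANTITY></PURCHASEORDER>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order ("
    "buyer TEXT NOT NULL, supplier TEXT NOT NULL, "
    "item TEXT NOT NULL, quantity INTEGER NOT NULL)"
)


def store_purchase_order(connection: sqlite3.Connection, po_xml: str) -> None:
    """Parse once on arrival and store typed columns; the XML is then discarded."""
    po = ET.fromstring(po_xml)
    connection.execute(
        "INSERT INTO purchase_order VALUES (?, ?, ?, ?)",
        (
            po.findtext("BUYER"),
            po.findtext("SUPPLIER"),
            po.findtext("ITEM"),
            int(po.findtext("QUANTITY")),  # text-to-number cost is paid once
        ),
    )


store_purchase_order(conn, PO_XML)
total = conn.execute(
    "SELECT SUM(quantity) FROM purchase_order WHERE supplier = ?",
    ("Supplier A",),
).fetchone()[0]
print(total)
```

The parsing and type conversion happen once per transaction, and later queries run against typed columns. The price is that the schema must be known in advance, which only holds when transactions are consistent across suppliers.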

Mapping directly into a database structure, however, isn't always appropriate. For example, if there are many different types of transactions across a range of time with many systems, it becomes extremely difficult, if not impossible, to know in advance exactly what data elements will need to be stored or the relationship between elements. Without this advance knowledge, the database schema can't be designed to maximize performance.

If the situation is more dynamic, however, the decision may be different. Consider the need to provide dynamic analytics on a multitude of transaction types, across a wide span of time, and with various systems. In this instance the required flexibility will almost certainly demand the data be stored in native XML. Knowing that more performance is required when handling XML, one must design appropriate performance and scalability into the system. Knowing the influencing XML factors, such as the cost of numeric calculations or growth in size of the transaction, allows the system to be architected so it can grow as needed.
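By contrast, a sketch of the native-XML approach might keep each transaction as received and query it without a fixed schema. The three document types, their tag names, and the in-memory list standing in for a document store are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored transactions of different types and shapes.
STORED_DOCUMENTS = [
    "<PURCHASEORDER><SUPPLIER>Supplier A</SUPPLIER>"
    "<QUANTITY>100</QUANTITY></PURCHASEORDER>",
    "<INVOICE><HEADER><SUPPLIER>Supplier A</SUPPLIER></HEADER>"
    "<AMOUNT>250.00</AMOUNT></INVOICE>",
    "<SHIPNOTICE><CARRIER>Carrier X</CARRIER></SHIPNOTICE>",
]


def documents_mentioning(tag: str, value: str) -> list:
    """Return the root tag of every stored document containing tag == value."""
    matches = []
    for raw in STORED_DOCUMENTS:
        root = ET.fromstring(raw)  # each document is re-parsed on every query
        # iter() walks the whole tree, so the element's depth need not be known.
        if any(element.text == value for element in root.iter(tag)):
            matches.append(root.tag)
    return matches


print(documents_mentioning("SUPPLIER", "Supplier A"))
```

The query works across document types whose structure was not known when they were stored, but every query pays the parsing cost again. That repeated cost is the kind of overhead that has to be designed for when transactions stay in native XML.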

Parting Thoughts
The performance impact of XML attributes shouldn't discourage programmers and system architects from using XML. On the contrary, by knowing the cause and effect between the attributes and a given XML system, it's possible to balance performance and scalability against the flow of transactions into one's environment. By understanding how XML is used in an environment, it becomes apparent how to plan for performance and scalability.

To maximize the performance and scalability of an XML system, you must balance the requirements of the enterprise with the role of XML. If the requirements dictate an adaptable dynamic solution where the evolution of the solution is unclear, then maintaining the transactions within the system as native XML will pay dividends that far outweigh the performance impact of doing so. If it's clear that the enterprise solution can be bound to a single data model that will evolve slowly and in a predictable way, then performance and scalability can be allowed to dictate the system implementation, and the role of XML becomes that of an integration layer that maps nonconforming transactions into the relational data model.

Given the turbulence and infancy of today's B2B landscape, I recommend focusing on the opportunity XML offers while fully understanding the impact of that decision from the performance and scalability perspective. As the adoption of XML grows and as XML tools and applications become more prevalent, the performance and scalability discussion will focus on the specific implementation details.

Part 2 will focus on storage and retrieval issues associated with using XML. I'll discuss the scalability, performance, and context implications associated with storing XML in its native format versus resolving it to a database. If you'd like me to discuss some particular aspect of this topic, e-mail me at the address below.

Reference
The Open Applications Group, Inc. www.openapplications.org.
