The Problem with XML-Based Storage

As Java and XML continue to serve as the de facto standards for developing enterprise applications, new issues arise in using these technologies, such as how to store XML data and how to select the appropriate repository. Here's a real-world example of how applying XML to the wrong problem can lead to the wrong solution.

Last year I inherited a solution from a team that had needed to build it quickly, without a complete set of requirements. They knew some of the basic structures and relationships between data elements, but they were convinced that the initial schema wouldn't be the final one. Most development teams face that problem, but this team was also expected to release new functionality every week.

The problem involved the development and maintenance of a B2B fulfillment application whose main components were customer catalogs and orders. The team chose XML as the mechanism for defining customer catalogs and expressing orders. Initially this worked well for catalogs, because they could extend and add relationships between existing and new elements in near real time. It also worked well for order objects, which needed to be shared among multiple systems. Integration with other systems was achieved through a combination of Java servlet channels and XML messages. For storage, they ended up selecting a company with an innovative XML repository.

The partnership blossomed until we encountered some operational and growth issues. Some stemmed from the product's immaturity. As with most nonrelational databases, some concerned write performance, which was poor compared with small reads. Large reads also became a problem because the engine had no built-in caching or cursor mechanism. HTTP, the mandated protocol for querying the database and managing data, isn't the most efficient protocol, as many of you know. It makes sense from a Web-accessibility perspective, but all access to the database came from servlets inside the internal hosted network, never from the Web.
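The missing cursor mechanism is easier to see in code. The Java sketch below assumes a hypothetical `PageSource` hook that returns a bounded slice of query results (for example, through offset and limit parameters on a query call); the repository described here did not necessarily offer one. `PagedCursor` pulls one page at a time, so a large read never has to arrive as a single response. An in-memory list stands in for the repository.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Client-side cursor: pulls one bounded page at a time instead of one huge response.
class PagedCursor implements Iterator<String> {

    // Hypothetical query hook: return up to `limit` documents starting at `offset`.
    interface PageSource {
        List<String> fetch(int offset, int limit);
    }

    private final PageSource source;
    private final int pageSize;
    private List<String> page = new ArrayList<>();
    private int indexInPage = 0;
    private int offset = 0;
    private boolean exhausted = false;
    int pagesFetched = 0; // round trips made so far, kept for illustration

    PagedCursor(PageSource source, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) return true;
        if (exhausted) return false;
        page = source.fetch(offset, pageSize); // one round trip per page
        pagesFetched++;
        offset += page.size();
        indexInPage = 0;
        if (page.size() < pageSize) exhausted = true; // a short page marks the end
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(indexInPage++);
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) docs.add("<order id=\"" + i + "\"/>");
        // In-memory stand-in for the repository's query endpoint.
        PageSource source = (off, lim) ->
                docs.subList(Math.min(off, docs.size()), Math.min(off + lim, docs.size()));
        PagedCursor cursor = new PagedCursor(source, 10);
        int count = 0;
        while (cursor.hasNext()) {
            cursor.next();
            count++;
        }
        System.out.println(count + " documents in " + cursor.pagesFetched + " pages");
    }
}
```

Running it prints `25 documents in 3 pages`. Offset-based paging is simply the easiest variant to show; it is not equivalent to a true server-side cursor, because each page is a separate query and results can shift if documents change between fetches.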

We also discovered a space reclamation problem. Documents marked for deletion were never physically removed, which meant that replacing a document took twice as much space as creating one. To reclaim space, we had to run a defrag utility every few weeks that searched through the system and deleted the marked documents. Furthermore, the database's instability forced us to reindex it every time it was brought down for backups. An additional side effect was that, to reduce the number of persistence engines, some RDBMS behavior ended up being replicated inside the XML repository. Some of these issues were unique to the vendor the team had chosen, but others apply to most nonrelational XML repository solutions.
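The space problem follows directly from deletion by marking. The sketch below is a hypothetical in-memory model, not the vendor's engine: `put` and `delete` only mark the previous version of a document, so replacing a document doubles its footprint until `compact`, a stand-in for the periodic defrag utility, physically drops the marked records. It requires Java 11 or later for `String.repeat`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a store that deletes by marking, not removing.
class TombstoneStore {

    private static final class StoredDoc {
        final byte[] body;
        boolean markedForDeletion = false;
        StoredDoc(byte[] body) { this.body = body; }
    }

    private final List<StoredDoc> storage = new ArrayList<>();      // everything physically kept
    private final Map<String, StoredDoc> current = new HashMap<>(); // latest live version per key

    void put(String key, String xml) {
        StoredDoc previous = current.get(key);
        if (previous != null) previous.markedForDeletion = true; // old version still occupies space
        StoredDoc doc = new StoredDoc(xml.getBytes(StandardCharsets.UTF_8));
        storage.add(doc);
        current.put(key, doc);
    }

    String get(String key) {
        StoredDoc doc = current.get(key);
        return doc == null ? null : new String(doc.body, StandardCharsets.UTF_8);
    }

    void delete(String key) {
        StoredDoc doc = current.remove(key);
        if (doc != null) doc.markedForDeletion = true; // marked, not reclaimed
    }

    // Bytes physically occupied, including documents marked for deletion.
    long bytesUsed() {
        long total = 0;
        for (StoredDoc doc : storage) total += doc.body.length;
        return total;
    }

    // Stand-in for the periodic defrag utility: physically drop marked documents.
    void compact() {
        storage.removeIf(doc -> doc.markedForDeletion);
    }

    public static void main(String[] args) {
        TombstoneStore store = new TombstoneStore();
        store.put("order-1", "a".repeat(100));
        System.out.println("after create:  " + store.bytesUsed());
        store.put("order-1", "b".repeat(100));
        System.out.println("after replace: " + store.bytesUsed());
        store.compact();
        System.out.println("after compact: " + store.bytesUsed());
    }
}
```

The demo reports 100, then 200, then 100 bytes: a replacement holds two copies until compaction runs, which is the doubling described above.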

At the time, we didn't consider RDBMS tools: we believed our schema had to be adaptable and feared the limitations of a rigid schema. Also, RDBMS vendors have only recently added credible XML support to their products. At our customer's request, and to alleviate some of our operational issues, we decided to look into porting the system to an RDBMS with XML support.
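When porting to an RDBMS with XML support, one common pattern is to keep each document intact while copying the fields that take part in relationships into ordinary columns. The sketch below assumes a hypothetical order format (an `<order id="..." customer="...">` root with `<line>` children) and uses only the JDK's DOM parser. The resulting `OrderRow` could then be written through JDBC, for example using `Connection.createSQLXML()` for the document column, though the exact column types depend on the database.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Copies relationship keys out of a (hypothetical) order document for relational columns.
final class OrderShredder {

    // Hypothetical row shape: key columns plus the untouched XML document.
    static final class OrderRow {
        final String orderId;
        final String customerId;
        final int lineCount;
        final String xml;

        OrderRow(String orderId, String customerId, int lineCount, String xml) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.lineCount = lineCount;
            this.xml = xml;
        }
    }

    static OrderRow toRow(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid external-entity (XXE) attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // getAttribute returns "" when the attribute is absent.
            String orderId = root.getAttribute("id");
            String customerId = root.getAttribute("customer");
            int lines = root.getElementsByTagName("line").getLength();
            return new OrderRow(orderId, customerId, lines, xml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse order XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order id=\"A-100\" customer=\"C-7\">"
                + "<line sku=\"S1\" qty=\"2\"/><line sku=\"S2\" qty=\"1\"/>"
                + "</order>";
        OrderRow row = toRow(xml);
        System.out.println(row.orderId + " / " + row.customerId + " / " + row.lineCount);
    }
}
```

The demo prints `A-100 / C-7 / 2`. Holding the keys as columns lets the database index and enforce those relationships, while the XML column preserves the flexible document structure the team originally wanted.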

Let me outline the lessons we learned:

  • Many XML repositories are built on object-oriented databases, and their maturity and operational tooling are not very robust. When evaluating an XML repository, keep in mind that many of the other relationships your application needs to define are not XML. Unless you want to maintain two separate systems, one for XML documents and one for data relationships, you'll end up recreating relational structures inside the XML repository. Based on our findings, there's little benefit to this approach.
  • Identify the speed requirements of your system. If your application performs many writes and very few reads, it won't benefit from an XML repository.
  • Define the data access protocols. Are they binary-based, like JDBC, or character-based, like HTTP?
  • Evaluate the operational utilities of the solution. What is the procedure to perform a backup? How long does it take to perform a 10GB backup? How are backups restored? Are there any memory leaks or space reclamation problems? How often is the repository required to be indexed?

In summary, don't assume. You know what they say!

About Israel Hilerio
Israel Hilerio is a program manager at Microsoft on the Windows Workflow Foundation team. He has more than 15 years of experience developing business applications and holds a PhD in Computer Science.
