Complete Data Integration Through XQuery
Vastly simplifying SOA implementations

Most businesses have an urgent need for up-to-date, accurate information based on data from multiple data sources. It would be much easier if all your data were stored in one database so it can be queried as a whole, but this is rarely practical. In the real world, data integration is required. You need a simple, efficient way to query data found in various data sources.

Suppose you use information about customers in your Service Oriented Architecture (SOA). Your company might use an external CRM system like salesforce.com for leads and customer data, an internal ERP system like PeopleSoft for processing orders, a dedicated software solution to track technical support calls, and one or more databases to store customer information not captured by these other systems. Each of these data sources has a different representation of a customer, a different API and data model, and perhaps a different query language. Nevertheless, you have to be able to combine this data intelligently to get an overview of a customer.

For instance, you may want to generate one report with the overall status of a customer, or you may want to find all customers with outstanding tech support issues who are deciding whether to make a major purchase this quarter. If all of your data were in a single database, you would retrieve the information with a simple query. Because the data is in many different sources, you have to write a good deal of code to get the same result, and the code is quite different for each data source. This is time-consuming and error-prone, and it complicates security and auditing. With XQuery, you query each data source as though it were XML, no matter how the underlying data is physically stored.

XQuery is the World Wide Web Consortium (W3C) standard XML query language, designed for both XML processing and data integration. Using XQuery for data integration vastly simplifies SOA implementations, making your developers more productive and improving the performance of your systems. An XML Integrated Development Environment (IDE) that supports XQuery makes it much easier for you to visualize data sources, generate and test queries, and debug. The queries you develop can be exposed via a data access layer, which is accessed using SOAP or HTTP, so that they can be reused in different SOAP message formats or in other applications.

XQuery Simplifies Data Integration
XQuery simplifies data integration in two ways. First, it provides native support for XML and for the operations most frequently needed when working with XML. Today, XML is at the heart of most data integration, and this is certainly true for SOA environments, where every SOAP message is expressed in XML. Most languages don't support XML natively. In general, programming languages are based on objects or structures, query languages are based on relational tables, and scripting languages are based on text. XQuery is based on XML, and XML is the only data structure in XQuery. In the same way that SQL queries tables to produce tables, XQuery queries XML to produce XML - and the XML produced by an XQuery can be used directly in XML applications. For example, a query result might be the payload of a SOAP message. XQuery provides direct support for querying, creating, and transforming XML.

One frequently used expression in XQuery, the FLWOR expression (for, let, where, order by, return), is similar to SQL's SELECT-FROM-WHERE. Because XML structures are more complex than SQL tables, XQuery provides path expressions that can identify any item in an XML structure. To create structures in query results, it also provides constructors, using a syntax that looks like the XML to be constructed. A typical XQuery might use path expressions to locate data, FLWOR expressions to perform joins and combine data, and constructors to create the structures of the query result. These tasks are much more tedious with conventional programming languages. For instance, achieving the same result with the Java DOM API would require parsing, navigating object structures, casting values from XML into Java data types, creating a result tree structure, and appending nodes to that result tree. In general, conventional programming languages require seven to 20 times more code than an equivalent XQuery. Not only are XML applications harder to write in conventional programming languages, but performance can also be much better in a good XQuery implementation, because XQuery is a declarative language that allows the implementation to do many useful kinds of query optimization.
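
To make the comparison concrete, here is a minimal sketch of the hand-coded steps listed above. It is written in Python with the standard library's `xml.etree.ElementTree` rather than the Java DOM API, purely for brevity. The two sample documents, their element names, and the `open_ticket_report` function are all hypothetical, invented for this illustration. The comment at the top shows roughly what an equivalent XQuery might look like, assuming the same data lived in two files named `customers.xml` and `tickets.xml`.

```python
import xml.etree.ElementTree as ET

# Roughly equivalent XQuery (file names are assumptions):
#
#   <report>{
#     for $c in doc("customers.xml")/customers/customer
#     let $t := doc("tickets.xml")/tickets/ticket[@customer = $c/@id]
#                                                [status = "open"]
#     where exists($t)
#     order by $c/name
#     return <customer name="{$c/name}" open-tickets="{count($t)}"/>
#   }</report>

# Hypothetical sample data standing in for two separate XML sources.
CUSTOMERS_XML = """
<customers>
  <customer id="c1"><name>Acme</name></customer>
  <customer id="c2"><name>Birch</name></customer>
  <customer id="c3"><name>Cobalt</name></customer>
</customers>
"""

TICKETS_XML = """
<tickets>
  <ticket customer="c1"><status>open</status></ticket>
  <ticket customer="c1"><status>closed</status></ticket>
  <ticket customer="c3"><status>open</status></ticket>
  <ticket customer="c3"><status>open</status></ticket>
</tickets>
"""


def open_ticket_report(customers_xml, tickets_xml):
    """Join customers to their open tickets and build a result tree by hand."""
    # Step 1: parse both documents into trees.
    customers = ET.fromstring(customers_xml)
    tickets = ET.fromstring(tickets_xml)

    # Step 2: navigate the ticket tree and count open tickets per customer ID.
    open_counts = {}
    for ticket in tickets.findall("ticket"):
        if ticket.findtext("status") == "open":
            customer_id = ticket.get("customer")
            open_counts[customer_id] = open_counts.get(customer_id, 0) + 1

    # Step 3: create the result tree and append one node per customer that
    # has at least one open ticket, ordered by customer name.
    report = ET.Element("report")
    by_name = sorted(customers.findall("customer"),
                     key=lambda c: c.findtext("name"))
    for customer in by_name:
        count = open_counts.get(customer.get("id"), 0)
        if count > 0:
            ET.SubElement(report, "customer", {
                "name": customer.findtext("name"),
                "open-tickets": str(count),  # convert the number back to text
            })
    return report


result = open_ticket_report(CUSTOMERS_XML, TICKETS_XML)
report_text = ET.tostring(result, encoding="unicode")
```

Even in this small case, the parsing, navigation, type conversion, and tree construction are all written out explicitly, whereas the query in the comment only states which data is wanted and what shape the result should take.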

The second way XQuery simplifies data integration is by eliminating the need to work with different APIs and data models for each data source. The XQuery language is defined in terms of XML structures, but since almost any data can be mapped into XML structures, an XQuery implementation can use XQuery to query just about anything. For instance, an XQuery implementation can provide support for relational data, implementing queries by generating efficient SQL for the database, but allowing a user to query the data as though it were XML.

By treating all data sources as XML, this kind of XQuery implementation lets a developer query relational data, Web Service calls, and other data sources together, with a small amount of declarative code, in one uniform data model, without mastering the idiosyncrasies of each system.
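
The idea of querying relational data as though it were XML can be sketched in a few lines. The example below is not how any particular XQuery engine works internally. It is a simplified illustration, in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules, of one plausible mapping: each row becomes an element and each column becomes a child element, while the filter is pushed down into SQL so the database does the selection. The `orders` table, its columns, and the `orders_as_xml` function are assumptions made up for this sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET


def orders_as_xml(conn, min_total):
    """Return orders with total >= min_total as an <orders> element tree."""
    # The predicate is evaluated by the database, not after conversion,
    # so only matching rows are ever turned into XML.
    cursor = conn.execute(
        "SELECT id, customer_id, total FROM orders "
        "WHERE total >= ? ORDER BY id",
        (min_total,),
    )
    # cursor.description holds one tuple per column; the name is item 0.
    columns = [description[0] for description in cursor.description]

    root = ET.Element("orders")
    for row in cursor:
        order = ET.SubElement(root, "order")
        for column, value in zip(columns, row):
            ET.SubElement(order, column).text = str(value)
    return root


# Hypothetical in-memory database standing in for an order-processing system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "c1", 250.0), (2, "c2", 40.0), (3, "c1", 990.5)],
)

large_orders = orders_as_xml(conn, 100)
```

A consumer of `large_orders` sees only XML elements such as `order/customer_id` and never touches SQL or the database API, which is the uniformity the text describes.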

Consider the customer example in the introduction. With an XQuery implementation that supports all of the underlying data sources, a developer can write a simple query to do a join among the different systems that represent different aspects of a customer. This dramatically simplifies software development in most business environments. The developer focuses on the information that's needed, not on the representation used in each system. Typically, the code savings in data integration environments are even greater than in pure XML environments.

The available data sources and the implementation strategy vary widely among XQuery implementations. For relational data, an implementation may translate an XQuery into SQL, then translate the SQL result sets to XML when returning results to the query engine. For flat file formats, an implementation can provide XML converters that convert data to XML on the fly when it's queried. Web Service calls may be supported using functions that can be called from within a query. When choosing an XQuery implementation, make sure that it fits in your computing environment and can handle the data sources needed in your architecture. The XQuery implementations from most database vendors are designed to query only data stored in their own database, but most companies have more than one database, as well as data that is not stored in any database.
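
The on-the-fly conversion of flat files mentioned above can be illustrated with a small sketch. This is an assumption about how such a converter might behave, not a description of any vendor's implementation. It uses Python's standard `csv` and `xml.etree.ElementTree` modules. The `csv_records_as_xml` function and the sample support-call data are hypothetical, and the sketch assumes the CSV header names are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_records_as_xml(lines, record_tag="record"):
    """Lazily yield one XML element per CSV record.

    Because this is a generator, a record is converted only when the
    consumer asks for it; the whole file is never materialized as XML.
    """
    for row in csv.DictReader(lines):
        record = ET.Element(record_tag)
        for field, value in row.items():
            # Assumes each header name is a valid XML element name.
            ET.SubElement(record, field).text = value
        yield record


# Hypothetical flat-file export from a support-call tracking system.
SUPPORT_CALLS_CSV = (
    "call_id,customer_id,status\n"
    "101,c1,open\n"
    "102,c2,closed\n"
    "103,c1,open\n"
)

calls = csv_records_as_xml(io.StringIO(SUPPORT_CALLS_CSV), record_tag="call")

# The consumer works with XML elements and never sees the CSV layout.
open_call_ids = [
    call.findtext("call_id")
    for call in calls
    if call.findtext("status") == "open"
]
```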

The XQuery implementations from application server vendors or XML integration server vendors can query a wider range of data sources, but require the adoption of their server, which may not fit in your architecture, or may increase the footprint of the system. If you're writing Web Services in a Java environment, make sure your implementation supports the XQuery API for Java (XQJ), which is the standard Java interface for XQuery - it lets your servlets use XQuery the same way that JDBC lets servlets use SQL. Also, the performance of XQuery implementations varies dramatically - make sure that you test performance for the data you work with, especially if you're using XQuery for relational data or very large XML files. Because XQuery is declarative and can be optimized, a good implementation will provide performance better than you normally achieve with hand-coded Java, JDBC, SQL, and an XML API.

Using XQuery vastly simplifies data integration, offering loosely coupled access, and providing one way to query any data source supported by the query engine. And because an XQuery implementation can talk directly to the original data source, it can do optimizations that are no longer available once the data is extracted and converted to physical XML. As a result, what is easier for the developer also results in better performance.

XML Development Environments for Data Integration
It's hard to understand the relationships among data without some way to visualize the data. This is particularly true when working with data from multiple sources. When doing data integration, look for an IDE that lets you visualize as many of your data sources as possible, supports general XML functionality, and has good support for XQuery. Some of these tools let you establish database connections, drag-and-drop from data sources to create XQuery code, run queries and see their output, and run a debugger to help find bugs. These tools make developers more productive.

When choosing an IDE for data integration with XQuery, consider related functionality that you may need. For instance, some IDEs also provide support for developing XML pipelines and publishing. Several IDEs can generate XQJ code to run an XQuery as part of a program. One XQuery IDE is implemented as an Eclipse plug-in, which is very convenient for Java developers who use Eclipse. Several IDEs also provide good support for writing and testing XSLT stylesheets, W3C XML Schemas and DTDs, and related XML development.

The Data Access Layer
In most companies, several data consumers need to access the same information. For instance, if one of your Web Services needs a description of a customer, this same description might also be useful for other Web Services, and also for dynamic Web sites, AJAX clients, publishing applications, or any other application that needs customer data. Frequently companies design for a single project, coding very similar interfaces for each data consumer, an obvious waste of programming effort. And if the data sources change, each of these interfaces has to be rewritten. In environments where security and auditability are important, much more code must be audited.

A data access layer lets many data consumers access data using the same well-defined interface. For each request, the data access layer calls a data service. Data services should represent the business model, hiding underlying systems and the data integration task from data consumers. For instance, you might write a data service that provides the data for a single customer. A data service can be parameterized - a parameter might identify the customer ID or the name of a particular view of the customer.

Many data services do nothing more than query data from one or more data sources to produce XML. These data services can be written directly in XQuery, using external variables to allow queries to be parameterized. In other data services, an XQuery may be part of a Java program that performs business logic or interacts with other systems, or it may be part of an XML pipeline. A small focused team can be responsible for writing the queries to implement data services, and for documenting available services, allowing data consumers to access these services using standard Web and XML interfaces.
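
The shape of a parameterized data service behind a data access layer can be sketched as follows. In XQuery the parameters would be declared as external variables (for example, `declare variable $customerId external;`) and bound by the caller. The Python below only mimics that arrangement with ordinary function arguments. Every name here (`CRM`, `ORDERS`, `customer_service`, `DATA_SERVICES`, `data_access_layer`) and the sample data are hypothetical, and the dictionaries stand in for real underlying systems.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for the underlying systems.
CRM = {"c1": {"name": "Acme", "region": "EMEA"}}
ORDERS = {"c1": [{"id": "o-17", "total": "990.50"}]}


def customer_service(customer_id, view="summary"):
    """Data service: return one customer as XML in the requested view."""
    record = CRM.get(customer_id)
    if record is None:
        raise KeyError("unknown customer: " + customer_id)

    customer = ET.Element("customer", {"id": customer_id})
    ET.SubElement(customer, "name").text = record["name"]

    # The "full" view adds data integrated from a second system.
    if view == "full":
        ET.SubElement(customer, "region").text = record["region"]
        orders = ET.SubElement(customer, "orders")
        for order in ORDERS.get(customer_id, []):
            ET.SubElement(orders, "order", dict(order))
    return customer


# The registry is the single, well-defined interface seen by consumers.
DATA_SERVICES = {"customer": customer_service}


def data_access_layer(service_name, **params):
    """Look up a data service by name, run it, and return serialized XML."""
    service = DATA_SERVICES[service_name]
    return ET.tostring(service(**params), encoding="unicode")


summary = data_access_layer("customer", customer_id="c1")
full = data_access_layer("customer", customer_id="c1", view="full")
```

Consumers call `data_access_layer` with a service name and parameters and receive XML. If the CRM or order system behind `customer_service` changes, only that one service needs to be rewritten and audited.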

Summary
XQuery provides a simple way to query data across data sources, providing simple, efficient data integration. With a good XQuery implementation, any data source can be queried as though it were XML, and any desired XML structure can be created as the result of a query. For instance, a query can take relational data and other data sources as input, and return the payload for a SOAP message. XQuery increases productivity by freeing developers from the need to learn a different API and data model for each data source, and provides direct support for the operations commonly used in XML. Depending on the XQuery implementation, data sources might include relational data from one or more databases, XML files, Web Service calls, EDI, and legacy file formats among others. An XML development environment that allows visualization of multiple data sources and provides support for XQuery can further enhance developer productivity. When many data consumers need access to the same kinds of data, data integration can be done in a data access layer that provides a set of data services, representing the business model, that hide the details of data integration and allow reuse of data integration code.

Because businesses need up-to-date information that comes from a variety of data sources, but the proper tools and development methods have lagged behind, today's software systems are often needlessly complex and ad hoc. Modern data integration tools are the solution. Using XQuery, an XML IDE, and a data access layer simplifies development significantly, improves performance, increases code reusability, and makes systems more maintainable.

About Jonathan Robie
Jonathan Robie is the XML program manager at DataDirect Technologies. Before joining DataDirect, Jonathan was an XML research specialist at Software AG. Jonathan works very closely with the W3C; he is a co-author of the XQuery specification, has participated in several W3C Working Groups, and speaks regularly at XML conferences. Jonathan wrote an XQuery tutorial for a book called XQuery from the Experts, which is now available on Amazon.com.
