XML Protocols
Exchanging Data Between Globally Accepted Technologies
Sep. 20, 2001 12:00 AM
Today's decentralized, distributed environment for exchanging information can be confusing at times, but for the most part it is built on standardized, globally accepted technologies. Each of these technologies serves its own domain well. The difficulties begin when data has to be exchanged between them.
For instance, CORBA achieves interoperability between languages, hardware platforms, and operating systems quite well. But on the Internet a CORBA-based system is harder to deploy, not least because its traffic has to get through firewalls and proxies. In short, almost every technology has its pros and cons.
Interoperability Between Languages
This article focuses on the CORBA Web interface, keeping in mind the interoperability issue and how to make it work on the Internet. Consider the hardware platform first: some processors store data in little-endian format and others in big-endian format. The char data type occupies 1 byte of memory, so the question of little- or big-endian storage doesn't arise. A 16-bit integer, on the other hand, is made up of 2 bytes, which can be stored in memory in two different orders (see Figure 1).
The processor decides how to write such data into memory. Some processors store the low-order byte at the starting address, that is, the low-order byte at address x and the high-order byte at address x+1; this is known as little endian. Other processors store the high-order byte at the starting address, that is, the high-order byte at address x and the low-order byte at address x+1; this is known as big endian. There is no agreement on which byte order to use, so if you exchange information between systems without taking byte order into account, you receive wrong data.
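The two layouts can be seen with Python's standard struct module, where the format prefix `<` selects little endian and `>` selects big endian. This is a minimal sketch; the value 0x1234 is an arbitrary example and the helper names are our own.

```python
import struct

def to_bytes(value: int, byte_order: str) -> bytes:
    """Pack a 16-bit unsigned integer; '<' = little endian, '>' = big endian."""
    return struct.pack(byte_order + "H", value)

def from_bytes(data: bytes, byte_order: str) -> int:
    """Unpack two bytes as a 16-bit unsigned integer in the given byte order."""
    return struct.unpack(byte_order + "H", data)[0]

value = 0x1234
little = to_bytes(value, "<")  # low-order byte 0x34 at address x
big = to_bytes(value, ">")     # high-order byte 0x12 at address x

print(little.hex(), big.hex())       # 3412 1234
# A reader that assumes big endian but receives little-endian bytes gets the wrong value.
print(hex(from_bytes(little, ">")))  # 0x3412
```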
If the data exchanged consists only of single bytes, processors that use different byte orders can communicate without any problem. This is why some technologies use characters as the basic medium of data transfer.
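Because a single byte has no byte order, one way to sidestep the problem is to transmit numbers as character data, which is the approach text formats such as XML take. A minimal sketch, assuming a single-byte encoding such as ASCII; the function names are hypothetical.

```python
def encode_as_text(value: int) -> bytes:
    """Encode an integer as ASCII digits, one byte per character."""
    return str(value).encode("ascii")

def decode_from_text(data: bytes) -> int:
    """Parse ASCII digits back into an integer."""
    return int(data.decode("ascii"))

wire = encode_as_text(4660)    # 4660 == 0x1234
print(wire)                    # b'4660'
# Each character is a single byte, so the sequence reads the same on any processor.
print(decode_from_text(wire))  # 4660
```

The price is size and parsing cost: the text form takes four bytes here instead of two and must be converted on both ends.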
But the world is not made up only of characters. There are also primitive data types, such as C++'s int, short, and long, and user-defined types built from them, and there has to be some way of handling them. Languages such as C++ and Java, and CORBA through its IDL, take care of these constructs. Other technologies, such as XML and HTML, do not.
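CORBA handles byte order in its Common Data Representation (CDR): the sender marshals data in its own byte order and records that order with a flag in the GIOP message header, and the receiver swaps bytes only when needed. The sketch below imitates that idea for a sequence of 16-bit integers. It is not real CDR (there is no GIOP header, alignment, or type information), and the function names are hypothetical.

```python
import struct
import sys

# Flag values follow GIOP's convention: 0 = big endian, 1 = little endian.
BIG, LITTLE = 0, 1

def marshal_shorts(values, sender_order=sys.byteorder):
    """Write a byte-order flag, then the 16-bit values in the sender's own order."""
    flag = LITTLE if sender_order == "little" else BIG
    fmt = ("<" if flag == LITTLE else ">") + "h" * len(values)
    return bytes([flag]) + struct.pack(fmt, *values)

def unmarshal_shorts(message):
    """Read the flag, then decode the values in whatever order the sender used."""
    flag, payload = message[0], message[1:]
    fmt = ("<" if flag == LITTLE else ">") + "h" * (len(payload) // 2)
    return list(struct.unpack(fmt, payload))

# Simulate a little-endian sender and a big-endian sender.
from_little = marshal_shorts([1, -2, 300], sender_order="little")
from_big = marshal_shorts([1, -2, 300], sender_order="big")
print(unmarshal_shorts(from_little), unmarshal_shorts(from_big))  # [1, -2, 300] [1, -2, 300]
```

This "receiver makes it right" design means two machines with the same byte order never pay for a conversion.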
XML for CORBA
XML is a streamlined subset of SGML. Where CORBA is concerned with infrastructure, XML is concerned with structure. XML is powerful because it lets developers create their own markup languages, which can be used by any number of applications beyond Web browsers.
All text that isn't markup constitutes the character data of the document. Choosing XML for CORBA brings an added benefit: you can define markup for each of your IDL definitions (see Table 1).
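Table 1's mapping is not reproduced here, so the following shows only one plausible convention, assumed for illustration: each IDL struct becomes an element, and each member becomes a child element tagged with its IDL type. The `Point` struct is hypothetical; the sketch uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def struct_to_xml(name, fields):
    """Encode a struct as an element named after its IDL type.

    `fields` is a list of (member name, IDL type, value) tuples; each member
    becomes a child element whose 'type' attribute records its IDL type.
    """
    root = ET.Element(name)
    for member, idl_type, value in fields:
        child = ET.SubElement(root, member, {"type": idl_type})
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical IDL:  struct Point { short x; short y; };
print(struct_to_xml("Point", [("x", "short", 3), ("y", "short", -4)]))
# <Point><x type="short">3</x><y type="short">-4</y></Point>
```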
HTTP and the WWW
HTTP is a well-known protocol for transferring data over the Internet and has been in use since 1990. It is an application-level protocol that runs on top of TCP/IP, and it is light enough to provide the speed needed by distributed, collaborative, hypermedia information systems. It is a generic, stateless protocol that can be used for many tasks, such as name servers and distributed object management systems, through the extension of its request methods (commands). A notable feature of HTTP is the "typing of data representation" (for example, the Content-Type header that labels each message body), which allows systems to be built independently of the data being transferred.
When HTTP, XML, and CORBA Work Together
It is therefore possible to combine the features of HTTP, XML, and CORBA to achieve overall interoperability while still meeting the speed requirement.
HTTP defines a set of common methods, such as GET and POST. These methods are flexible: a request can carry additional information, in its headers and body, that is used while the method is processed. The POST method sends a block of data, such as the result of submitting a form, to a data-handling process on the server. This makes POST a natural carrier for CORBA's request/response model: the client posts its request to the server and receives the result in the HTTP response.
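To make the client side concrete, this sketch assembles a raw HTTP/1.1 POST request by hand so that each piece is visible: request line, headers (including Content-Type, which types the body), a blank line, and the XML body. The host, the `/corba` path, and the element names in the request document are hypothetical.

```python
def build_post(host, path, xml_body):
    """Return the raw bytes of an HTTP/1.1 POST request carrying an XML body."""
    body = xml_body.encode("utf-8")
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml; charset=utf-8",
        f"Content-Length: {len(body)}",
    ]
    # Header lines end with CRLF; an empty line separates the headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body

request = build_post("example.com", "/corba",
                     "<request><operation>getBalance</operation></request>")
print(request.decode("utf-8"))
```

In a real client, Python's http.client.HTTPConnection.request("POST", path, body, headers) would send an equivalent message without building the bytes manually.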
The CORBA client request can be encoded into an XML request document and passed to the HTTP server using the POST method. On receiving it, the HTTP server can tell it apart from a normal HTTP request, either by a special CORBA flag sent with the request or by the URL the request was sent to, and hand it over to the CORBA layer.
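The server's routing decision can be sketched as a small function. Both the flag header name `X-CORBA-Request` and the `/corba/` URL prefix are hypothetical; the article does not specify either.

```python
CORBA_FLAG_HEADER = "x-corba-request"  # hypothetical flag header, stored lowercase
CORBA_PATH_PREFIX = "/corba/"          # hypothetical URL prefix for CORBA objects

def route(method, path, headers):
    """Return "corba" if the request should go to the CORBA layer, else "http"."""
    if method != "POST":
        return "http"
    # HTTP header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    flagged = normalized.get(CORBA_FLAG_HEADER, "").strip().lower() == "true"
    if flagged or path.startswith(CORBA_PATH_PREFIX):
        return "corba"  # hand the XML body to the CORBA layer
    return "http"       # ordinary Web request

print(route("POST", "/corba/Account", {}))                  # corba
print(route("POST", "/form", {"X-CORBA-Request": "true"}))  # corba
print(route("POST", "/form", {"Content-Type": "text/xml"})) # http
```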
Before the method is invoked, the server side rebuilds the real CORBA request by parsing the XML document. The results of the method are encoded into an XML response document, which is sent back to the client in the HTTP response. The scenario looks simple, but a lot of innovation is required to accomplish it.
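The server-side round trip can be sketched as follows, assuming a hypothetical request format in which the operation name is an attribute and each argument carries its IDL type. A plain Python object stands in for the CORBA servant; a real bridge would build a genuine CORBA request (for example through the Dynamic Invocation Interface) rather than call the object directly.

```python
import xml.etree.ElementTree as ET

class Account:
    """Stand-in for a CORBA servant; the class and its operation are hypothetical."""
    def deposit(self, amount):
        return 100 + amount

# Converters from XML text back to values; only two IDL types are shown.
DECODERS = {"long": int, "string": str}

def dispatch(servant, request_xml):
    """Rebuild the call from the XML request, invoke it, and encode the result."""
    req = ET.fromstring(request_xml)
    operation = req.get("operation")
    args = [DECODERS[arg.get("type")](arg.text) for arg in req.findall("arg")]
    # A real adapter would check the operation against the IDL interface first.
    result = getattr(servant, operation)(*args)
    resp = ET.Element("response", {"operation": operation})
    # The result type is fixed to long for brevity; a real mapping would come from the IDL.
    ET.SubElement(resp, "result", {"type": "long"}).text = str(result)
    return ET.tostring(resp, encoding="unicode")

request = '<request operation="deposit"><arg type="long">50</arg></request>'
print(dispatch(Account(), request))
# <response operation="deposit"><result type="long">150</result></response>
```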
One of the problems encountered is how to encode primitive data types and user-defined types into the XML request/response documents. The W3C XML Schema Working Group is currently working on this issue.
XML documents fall into two types:
- Document-oriented: Carry messages
- Data-oriented: Carry real, typed data
What Is a Data Type?
A data type is a 3-tuple consisting of:
- A set of distinct values called its value space
- A set of lexical representations called its lexical space
- A set of facets that characterize the properties of the value space, individual values, or lexical items
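The 3-tuple can be modeled directly. In this sketch, which is our own simplification rather than anything XML Schema prescribes, a value-space predicate, a lexical-to-value parser, and a facet dictionary together define a type; the facet names minInclusive and maxInclusive are borrowed from XML Schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class DataType:
    """A data type as a 3-tuple: value space, lexical space, and facets."""
    name: str
    in_value_space: Callable[[object], bool]  # membership test for the value space
    parse_lexical: Callable[[str], object]    # maps a lexical form to a value
    facets: Dict[str, int] = field(default_factory=dict)

    def parse(self, lexical: str):
        """Map a lexical form to its value, then check value space and facets."""
        value = self.parse_lexical(lexical)
        low = self.facets.get("minInclusive")
        high = self.facets.get("maxInclusive")
        if (not self.in_value_space(value)
                or (low is not None and value < low)
                or (high is not None and value > high)):
            raise ValueError(f"{lexical!r} is not a valid {self.name}")
        return value

# A 16-bit 'short': an integer value space narrowed by two range facets.
# Several lexical forms ("7", "+7", "007") map to the same value.
short = DataType(
    name="short",
    in_value_space=lambda v: isinstance(v, int),
    parse_lexical=int,
    facets={"minInclusive": -32768, "maxInclusive": 32767},
)

print(short.parse("007"), short.parse("+7"))  # 7 7
```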
The requirements for the XML Schema language are to:
- Provide for primitive data typing, including byte, date, integer, sequence, SQL, and Java primitive data types.
- Define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP).
- Distinguish requirements relating to lexical data representation versus those governing an underlying information set.
- Allow creation of user-defined data types, such as data types that are derived from existing data types by constraining certain of their properties (e.g., range, precision, length, format).
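Derivation by constraining properties can be sketched as follows: a derived type inherits its base's facets and may only narrow them. The class handles range facets only; the `percentage` type is a hypothetical user-defined example, and the design is our own simplification of XML Schema's derivation by restriction.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IntegerType:
    """An integer type whose value space is bounded by range facets."""
    name: str
    min_inclusive: Optional[int] = None  # None means unbounded below
    max_inclusive: Optional[int] = None  # None means unbounded above

    def is_valid(self, value: int) -> bool:
        return ((self.min_inclusive is None or value >= self.min_inclusive) and
                (self.max_inclusive is None or value <= self.max_inclusive))

    def restrict(self, name: str, min_inclusive: Optional[int] = None,
                 max_inclusive: Optional[int] = None) -> "IntegerType":
        """Derive a new type; omitted facets are inherited, new bounds may only narrow."""
        low = self.min_inclusive if min_inclusive is None else min_inclusive
        high = self.max_inclusive if max_inclusive is None else max_inclusive
        for bound in (low, high):
            if bound is not None and not self.is_valid(bound):
                raise ValueError(f"{name}: bound {bound} lies outside {self.name}")
        if low is not None and high is not None and low > high:
            raise ValueError(f"{name}: empty value space")
        return IntegerType(name, low, high)

integer = IntegerType("integer")
short = integer.restrict("short", -32768, 32767)
percentage = short.restrict("percentage", 0, 100)  # hypothetical user-defined type

print(percentage.is_valid(42), percentage.is_valid(150))  # True False
```

Because a derived type can only shrink its base's value space, every valid percentage is also a valid short and a valid integer.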
Using HTTP with CORBA has an added advantage: HTTP works on a well-known port that firewalls and proxies typically let through. The second issue to be solved is the adapter that picks up an HTTP request carrying a CORBA flag and converts between HTTP and CORBA requests and responses. People are working in this direction as well. One example is XORBA, a protocol adapter that acts as a Web server adapter and automatically handles requests for CORBA services. Its requests and responses use SOAP, an object-access protocol that uses HTTP as its transport and XML to encode information.
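XORBA's own message format isn't described here, so rather than guess at it, the sketch below shows only the generic SOAP 1.1 shape such an adapter exchanges: an Envelope containing a Body, whose child element names the operation. The envelope namespace is the standard SOAP 1.1 one; the `urn:example:bank` namespace and the `deposit` operation are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:bank"                            # hypothetical service namespace

def soap_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP 1.1 Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("m", APP_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

print(soap_request("deposit", {"amount": 50}))
```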
The Direction of Integration
We have seen how powerful it is to integrate the features of HTTP, CORBA, and XML. The "typing of data representation," a notable feature of HTTP, allows systems to be built independently of the data being transferred. Similarly, the use of XML as the data exchange format between systems will continue to grow, and we can look forward to many more innovative ways of using XML and CORBA together. It's just a matter of time.
Sidebar 1
The definition of a character from the XML Specification: "A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. The use of 'compatibility characters,' as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]), is discouraged."
Sidebar 2
The definition of markup from the XML Specification: "Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (i.e., outside the document element and not inside any other markup)."
About Rupak Kumar
Rupak Kumar is a Software Architect at iCMG (http://iCMGworld.com). He is involved in the design and development of K2 Component Server, the first of its kind to adopt the CORBA® Component Model (CCM), including the K2 CCM Container. He is also involved in building bridges that let CORBA Components interwork with EJB Components and vice versa. His other areas of interest include IDL2XML round-trip messaging. Rupak can be reached at rupak@icmg.nu