Data Mapping to JDBC, XML, and Beyond

These paradigm changes have greatly increased my power to express program logic, such that my programs have gotten smaller, simpler, and much easier to understand, while supporting ever-increasing user capabilities. When I started programming, I worked with simple command-line interfaces and text-based “green screens.” Next I produced “fat-client” graphical user interfaces, and now I work on Web-enabled user interfaces. Again, each paradigm switch has greatly increased user power, flexibility, and ease of use while the code required to produce the interfaces has decreased and is much simpler to understand.

Data Storage and Retrieval Problems
Unfortunately, I haven’t seen the same kinds of advances in data retrieval and storage. In fact, I think we’ve declined in that area as an increasing number of data source/data sink technologies such as XML, guaranteed messaging, and directory services have come into mainstream development. Apart from the user interface, relational databases used to be the sole data source/sink technology I dealt with, so the programming environments of the recent past provided first-class support for them. My PowerBuilder and Oracle Forms developer friends have extolled the virtues of these environments over the somewhat primitive JDBC support in Java. My only defense has been the promise of reusable logic in my Java objects, which transcends the hard-coded data mappings between PowerBuilder or Oracle Forms screens and the database. Unfortunately, it takes a great deal of JDBC code to map the data in the Java objects to the database. Add XML documents, queued messages, and LDAP directories to the mix, and things get even worse. Each of these technologies requires a different Java API, a new learning curve, and a great deal of code to implement. In a recent code survey at my workplace, I found that over 50% of a major application was devoted to nothing but data retrieval, translation, and storage. That left under 50% of the system to do the real work, namely providing a user interface and logic to do something useful with the data users provide us.

Another problem I encountered as I tried to modify and extend the systems at work was the hard-coded data mappings that proliferated throughout. I couldn’t add inheritance hierarchies or new classes and relationships. Related classes required inefficient secondary database queries as I moved from one class to another. It was practically impossible to get the existing data mapping code to recognize the need to instantiate the correct subclasses of an object in a class hierarchy as instances were read from a data source.

My most discouraging finding of all was the large number of critical defects in the data storage, manipulation, and retrieval code. There was little care in the placement of transaction boundaries, allowing for all kinds of data integrity problems under less than ideal system operating circumstances. Resources like database connections, statements, and result sets were not being freed correctly, resulting in problems as the application ran over an extended period of time. When processing message queues, the code was committing transactions to the database without a synchronization strategy, such as a two-phase commit, to properly remove messages in the same unit of work. XML documents were not being parsed or generated in an “extensible” way, thus eliminating the crucial X in XML.

To solve the problem, I started looking into new Java technologies and APIs like XML data binding, Java data objects, and message-driven EJBs. Each of these technologies had limitations as I tried to hook them up to the logic in my application. Where should I put the logic for objects that crossed data source/data sink technologies? For example, information for my Customer class came in from both the user interface and a message queue, was created or updated in the database, and was output as XML documents to the user interface or other enterprise systems. Nearly every data mapping technology I tried, including the more traditional commercial object/relational mapping frameworks on the market, had either a heavy or exclusive bias toward a particular data source/sink technology. I was forced to create multiple Customer classes, one per data source/sink technology (for example, DatabaseCustomer, XMLCustomer, MessageCustomer). Then I’d either have to duplicate the application logic concerned with processing a customer or I’d need to have one Customer class with the logic and transformations to and from the other Customer classes. None of these designs is object-oriented. In responsibility-driven design, a Customer class shouldn’t have any logic in it to communicate with a data storage or retrieval mechanism. Instead it should perform the responsibilities of a Customer as abstracted from the problem domain. Other classes in the system should be responsible for the data mapping.

JLF Prototype Data Mapping Framework
Being something of a framework buff, I wondered if I could design a framework that abstracted the dirty details of data source/sink technologies but still provided much of the power and flexibility of the native JDBC, XML, JMS, and JNDI APIs. I came up with the data mapping portion of an open source framework called Java Layered Frameworks (JLF), located at http://jlf.sourceforge.net. This framework works to minimize the amount of code in your application needed to map your Java objects to any number of different data sources/sinks. It also helps you execute complex mappings in a relatively efficient way. For example, when using a JDBC data source/sink, JLF can help reduce the number of SQL statements sent to the database, and it can cache relatively static data so you don’t have to read the same data every time you use it.

JLF Data Mapping Overview
JLF is a set of layered frameworks designed to help Java application developers create their applications more quickly and with less code. It includes the following frameworks:

1. Configuration framework

2. Logging framework

3. Utility library

4. Data mapping framework

5. HTTP request processing framework

The configuration framework basically initializes JLF by identifying where property files are located. Java property files configure the operation of the remainder of the frameworks in JLF, and the configuration framework helps the other frameworks to find those property files.

The logging framework is an evolution of my JLog logging framework. It helps to instrument events and log errors in your application so you can detect and correct defects more quickly.

The utility library portion of JLF contains code that performs some common coding tasks in Java. Examples include properly creating hash values for complex objects and using the Reflection API.

The data mapping framework is the main framework in JLF and the focus of this article. It’s designed to help you map data in your Java objects to any number of different data source/sink technologies. Most of the capabilities of the current version of the framework deal with the JDBC API, but JLF accommodates other types of data sources and sinks as well (for example, output to XML documents or input from servlets). It’s also extensible to fit any number of other transactional or nontransactional data source/sink technologies.

The framework layers described above are shown graphically in Figure 1. Each layer shows where the Java package is implemented in parentheses, so you know which package to import in your code.

To use the data mapping framework in JLF, you must understand three core concepts:

1. Data mapped objects: These are the Java classes you create for your application. They hold the data you want to map to your data source/sink.

2. Data mappers: The JLF framework provides these objects for you to map your data to and from the data source/sink.

3. Data location property files: These are the Java property files you create. They tell the data mappers how to map data between the data mapped objects and the data source/sink.

All three concepts go hand-in-hand to accomplish data mapping. We’ll now go through each concept in further detail.

Data Mapped Objects
Any Java classes that you want JLF to map to a data source/sink must be subclasses of JLF’s DataMappedObject class. This class contains all the core code to help you define and access variables, relationships, and inheritance hierarchies, so the framework can map these for you. Instead of defining instance variables in your object, define DataAttributeDescriptors. When you want to create relationships between DataMappedObjects in your design, create RelationshipDescriptors. If you have an inheritance hierarchy in your DataMappedObject subclasses, create a hierarchy table so JLF can instantiate the proper types of objects automatically. Figure 2 shows the primary classes in the JLF framework you use to define your DataMappedObjects.
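To make the idea of a hierarchy table more concrete, here is a minimal sketch of one way such a table could work. It assumes each stored row carries a type code that selects the subclass to instantiate; that mechanism, and every name in the sketch (TypeCodeTable, Party, Person, Company), is invented for illustration and is not JLF's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical domain classes, invented for this sketch.
class Party {
    String describe() { return "party"; }
}

class Person extends Party {
    @Override String describe() { return "person"; }
}

class Company extends Party {
    @Override String describe() { return "company"; }
}

// Hypothetical stand-in for a hierarchy table; not JLF's actual API.
// Assumption: each stored row carries a type code naming its subclass.
final class TypeCodeTable<T> {
    private final Map<String, Supplier<? extends T>> factories = new HashMap<>();

    // Associates a type code with a factory for the matching subclass.
    TypeCodeTable<T> register(String typeCode, Supplier<? extends T> factory) {
        factories.put(typeCode, factory);
        return this;
    }

    // Creates an empty instance of the subclass registered for the code.
    T instantiate(String typeCode) {
        Supplier<? extends T> factory = factories.get(typeCode);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown type code: " + typeCode);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        TypeCodeTable<Party> table = new TypeCodeTable<Party>()
                .register("P", Person::new)
                .register("C", Company::new);
        System.out.println(table.instantiate("C").describe());
    }
}
```

With a table like this, the code that reads rows never needs an if/else chain over subclasses; adding a subclass means adding one registration.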

Once you’ve defined your DataMappedObject subclasses with the proper attributes, relationships, and an optional hierarchy table, the data mapping framework goes to work. It creates DataAttribute and Relationship objects as it maps data back and forth between your Java objects and the database. These two classes of objects help the data mapping framework coordinate the data flowing to and from the database.

DataAttributes are used to replace instance variables in your classes. You may wonder why you can’t simply use instance variables like any other JavaBean class would. The answer is twofold. DataAttributes help the data mapping framework efficiently map the data to a database, and they also help to do optimistic locking. In the first case, if you don’t change a value in your object after it’s read from the database, there’s no need to send an update SQL statement when you store your object back to the database. Since you’ve made no change to the object, sending a SQL statement to the database uses up database resources to change a row to the same values it already contains. Not only would this consume precious database resources, it would also delay application response time to the application user.

The data mapping framework, in the execution of an update() method, first checks to see if anything has really changed in the object before it executes the SQL update statement. If you use simple instance variables in your design, the JLF data mapping framework would have a much more difficult time discovering if you’ve updated your object. Second, the most efficient way to use a database in a very high-volume transactional system is almost always to use optimistic locking. To use this, execute a locking query before you update or delete an object in the database. The locking query makes sure another process hasn’t modified the object since you originally read it from the database. One common way to do this locking query is to check the values of the object in the database and make sure they haven’t changed since the original query. With a simple instance variable in your objects, there’s no initial value to do the locking query before you update the row with the new value. DataAttributes keep the original value read from the database, as well as the new value that you wish to change the object to.
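The two benefits described above, skipping needless updates and supporting a locking query, both follow from keeping the original value next to the current one. The following is a minimal sketch of that idea only. TrackedAttribute and UpdatePlanner are hypothetical names, not JLF classes, and the SQL shapes are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration; not JLF's actual DataAttribute class.
final class TrackedAttribute<T> {
    private final String column; // column this value maps to
    private final T original;    // value as read from the data source
    private T current;           // value the application wants to store

    TrackedAttribute(String column, T valueReadFromSource) {
        this.column = column;
        this.original = valueReadFromSource;
        this.current = valueReadFromSource;
    }

    void set(T newValue) { current = newValue; }
    T get() { return current; }
    T getOriginal() { return original; }
    String getColumn() { return column; }

    // True only if the current value differs from the value originally read.
    boolean isModified() { return !Objects.equals(original, current); }
}

final class UpdatePlanner {
    // Returns an UPDATE covering only modified columns, or empty if nothing changed.
    static Optional<String> buildUpdate(String table, String keyColumn,
                                        List<TrackedAttribute<?>> attributes) {
        List<String> assignments = new ArrayList<>();
        for (TrackedAttribute<?> attribute : attributes) {
            if (attribute.isModified()) {
                assignments.add(attribute.getColumn() + " = ?");
            }
        }
        if (assignments.isEmpty()) {
            return Optional.empty(); // no round trip to the database needed
        }
        return Optional.of("UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + keyColumn + " = ?");
    }

    // Builds a locking query whose parameters would be the key followed by each
    // attribute's ORIGINAL value; no row back means another process changed it.
    // Simplification: a null original would need "IS NULL", which is not handled.
    static String buildLockingQuery(String table, String keyColumn,
                                    List<TrackedAttribute<?>> attributes) {
        StringBuilder sql = new StringBuilder(
                "SELECT 1 FROM " + table + " WHERE " + keyColumn + " = ?");
        for (TrackedAttribute<?> attribute : attributes) {
            sql.append(" AND ").append(attribute.getColumn()).append(" = ?");
        }
        return sql.toString();
    }
}
```

A plain instance variable could support neither method: once overwritten, it can no longer say whether it changed or what the database originally held.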

DataAttributes have different subclasses to help overcome the limitations of Java native types. For example, Java String variables have no limit on the number of characters you can store in them. When using a relational database, you almost always define a maximum string length for each character column. The StringAttribute subclass of DataAttribute allows you to define and enforce a maximum string length. Use LongAttribute for int and long variables, DoubleAttribute for float and double variables, DateAttribute for Dates, and, of course, StringAttribute for strings.
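As a rough illustration of enforcing a maximum length in the object rather than discovering the violation at the database, consider the sketch below. BoundedStringAttribute is a hypothetical class written for this example; it does not reproduce the interface of JLF's StringAttribute.

```java
// Hypothetical illustration of a length-checked string attribute;
// it does not reproduce JLF's StringAttribute.
final class BoundedStringAttribute {
    private final int maxLength;
    private String value;

    BoundedStringAttribute(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        this.maxLength = maxLength;
    }

    // Rejects values that would not fit the column.
    // Note: String.length() counts UTF-16 code units; a database may count
    // characters or bytes instead, so this check is an approximation.
    void set(String newValue) {
        if (newValue != null && newValue.length() > maxLength) {
            throw new IllegalArgumentException("Value has " + newValue.length()
                    + " characters; the limit is " + maxLength);
        }
        value = newValue;
    }

    String get() { return value; }

    public static void main(String[] args) {
        BoundedStringAttribute state = new BoundedStringAttribute(2);
        state.set("MN");
        System.out.println(state.get());
    }
}
```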

Relationship objects help you efficiently map related DataMappedObjects to a database. They help to introduce different database mapping optimizations. For example, you can use them when you deem it more efficient to use one query to populate any number of related Java objects. On the other hand, in cases where you rarely traverse a relationship, you don’t want to take the time to populate the objects on the other side of the relationship until you know you need them. Otherwise you’d be inefficiently pulling back large quantities of unused data from the database. The data mapping framework uses relationship objects to “lazy read,” or read on demand, such objects when you deem that approach to be more efficient.
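The trade-off between reading related objects up front and reading them on demand can be sketched in a few lines. LazyRelationship below is a hypothetical class for illustration, not JLF's relationship class; the loader stands in for whatever secondary query would fetch the related objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of read-on-demand relationships; not JLF's API.
// Not thread-safe: a sketch for single-threaded use only.
final class LazyRelationship<T> {
    private final Supplier<List<T>> loader; // stands in for a secondary query
    private List<T> related;
    private boolean loaded;

    LazyRelationship(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Eager variant: the related objects were already read with the parent.
    static <T> LazyRelationship<T> preloaded(List<T> alreadyRead) {
        LazyRelationship<T> relationship = new LazyRelationship<T>(() -> alreadyRead);
        relationship.related = alreadyRead;
        relationship.loaded = true;
        return relationship;
    }

    // Runs the loader on first traversal only; later calls reuse the result.
    List<T> get() {
        if (!loaded) {
            related = loader.get();
            loaded = true;
        }
        return related;
    }

    boolean isLoaded() { return loaded; }

    public static void main(String[] args) {
        LazyRelationship<String> orders = new LazyRelationship<String>(
                () -> Arrays.asList("order-1", "order-2"));
        System.out.println(orders.isLoaded());   // nothing read yet
        System.out.println(orders.get().size()); // triggers the read
    }
}
```

If the relationship is never traversed, the loader never runs, which is the saving the lazy-read approach is after.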

Figure 3 shows how the DataAttribute and Relationship objects described earlier work with DataMappedObjects.

Data Mappers
The data mapping framework uses a data mapping “plug-in” called a DataMapper. DataMappers map objects to and from a particular data source/sink technology. The goal behind the data mapping plug-in design is to hide the complexity of mapping data to and from that technology. For example, say your Java application needs to map data in its objects to a relational database using the JDBC API, to XML documents using an XML-parsing API, from HTML input forms via the Servlet API, and then send messages to queues using the JMS API. You’d have to learn the complexities of four different and complex APIs to get your work done. You’d also need to write a lot of code, as each API is different and requires completely different code to execute the mapping.

The data mapping framework hides this complexity from you. The code to map your objects to a relational database looks almost identical to the code that maps your objects to an XML document or from the input parameters of a servlet. The DataMapper plug-in deals with the appropriate Java API, so under ideal circumstances your code has no technology-specific API code in it. There will always be cases where the framework doesn’t do what you need it to do when using, for example, the JDBCDataMapper. In those cases you write a little bit of JDBC code and hopefully the JDBCDataMapper will do the rest of the work for you. Data mappers in the JLF framework, including the JDBCDataMapper, are shown in Figure 4.
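The plug-in idea can be illustrated with a small sketch in which the application code is written once against an interface and the technology-specific work lives in the implementations. All names here (RecordMapper, InMemoryMapper, XmlWritingMapper, CustomerService) are hypothetical and much simpler than JLF's DataMapper classes, whose actual interfaces are not shown in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in interface for this sketch; not JLF's DataMapper.
interface RecordMapper {
    void create(String entity, Map<String, String> fields);
}

// Stands in for a database-backed mapper so the sketch runs without a driver.
final class InMemoryMapper implements RecordMapper {
    final List<Map<String, String>> rows = new ArrayList<>();

    @Override
    public void create(String entity, Map<String, String> fields) {
        rows.add(new LinkedHashMap<>(fields));
    }
}

// Writes the same fields as a simple XML element.
// Assumption: entity and field names are already valid XML names.
final class XmlWritingMapper implements RecordMapper {
    final StringBuilder out = new StringBuilder();

    @Override
    public void create(String entity, Map<String, String> fields) {
        out.append('<').append(entity).append('>');
        for (Map.Entry<String, String> field : fields.entrySet()) {
            out.append('<').append(field.getKey()).append('>')
               .append(escape(field.getValue()))
               .append("</").append(field.getKey()).append('>');
        }
        out.append("</").append(entity).append('>');
    }

    // Escapes the characters that are significant in XML element content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

final class CustomerService {
    // Application code: identical no matter which mapper is plugged in.
    static void saveCustomer(RecordMapper mapper, String name, String city) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("city", city);
        mapper.create("customer", fields);
    }

    public static void main(String[] args) {
        XmlWritingMapper xml = new XmlWritingMapper();
        saveCustomer(xml, "Ann", "Oslo");
        System.out.println(xml.out);
    }
}
```

Calling saveCustomer with an InMemoryMapper instead of an XmlWritingMapper requires no change to CustomerService, which is the property the plug-in design aims for.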

Data Location Property Files
Each DataMapper looks to a property file for information on how to map objects to the data source/sink technology it supports. These property files are called data locations. They describe how to get to a particular data location and map the data between Java objects and that location. To open a connection to a JDBC data location, the data mapper needs information such as the database URL, the appropriate JDBC driver, and perhaps a user ID and password. Once the connection is established, the data mapper needs to know which SQL statements to send to CRUD (create, read, update, delete) the data. You also tell the data mapper how you want to efficiently map your relationships – reading them in the same query as the original object, or perhaps lazy reading them on demand. In a future article, I hope to explain how each of the data mappers works to rid you of the burdens of data mapping API code.
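To give a feel for what a data location might contain, the sketch below loads a small property file with the standard java.util.Properties class. The key names, the driver class, and the URL are all made up for illustration; JLF's real property names and layout may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DataLocationExample {
    // Hypothetical keys and values; JLF's real property names may differ.
    static final String SAMPLE = String.join("\n",
            "# Connection details for a JDBC data location",
            "customerDb.driver=org.example.Driver",
            "customerDb.url=jdbc:example://localhost/customers",
            "# SQL the mapper should send to read one Customer",
            "customerDb.Customer.read=SELECT id, name FROM customer WHERE id = ?",
            "# How to populate the orders relationship: lazy or eager",
            "customerDb.Customer.orders.readMode=lazy");

    // Parses property-file text using the standard Properties format.
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Looks up the read statement for a class at a given data location.
    static String readStatement(Properties properties, String location, String className) {
        return properties.getProperty(location + "." + className + ".read");
    }

    public static void main(String[] args) {
        Properties properties = load(SAMPLE);
        System.out.println(readStatement(properties, "customerDb", "Customer"));
    }
}
```

Keeping the SQL and the lazy/eager choice in a file like this means they can change without recompiling the Java classes being mapped.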

Conclusion
Enterprise Java software developers who exercise due diligence in their object design have a difficult task at hand. Java’s APIs for dealing with data sources and data sinks are quite different from technology to technology. The JDBC, XML parsing, JNDI, and JMS APIs have only the Java programming language in common. As a result, object designers typically hard-code the data mapping between their Java classes and the data source/sink technology they currently deal with. In most cases, this hard-coding is tedious, error-prone, and takes quite a bit of code to carry out.

Inheritance hierarchies, involved in almost any nontrivial object design, are typically abandoned because of data mapping difficulties. In addition, if the data source/sink design changes, it has a direct impact on the Java code (for example, the Java code is tightly coupled to the design of a database). When the same Java class needs to communicate with another data source/sink technology, it’s often easier to start from scratch rather than incorporate a second data source/sink mapping into the current class.

The JLF data mapping framework tries to address all these problems by separating the design of your Java classes from the mapping of them to and from a data source/sink. JLF abstracts the details of executing different technology mappings using data mappers. It provides default implementations of JDBC, XML (currently write only), and servlet data mappers and is hopefully extensible for you to add your own. This should leave you free to concentrate on good object design instead of dealing with all of Java’s data mapping APIs.

About Todd Lauinger
Todd Lauinger is currently employed as a Software Construction Fellow at Best Buy Co., Inc. He has over 10 years of experience developing large, mission-critical software systems for engineering and business organizations. Todd is also an experienced instructor, mentor, conference speaker, and published author, and has a master’s degree in software engineering.

Your Feedback
anonymous wrote: What's your point? Yet more levels of abstraction from the problem domain? How many levels of someone else's buggy code do you have to bury yourself under to retrieve data? Come on ...