How to Store and Share Your Java Objects
Apr. 1, 1998 12:00 AM
Need to store your Java objects? Files can do this, with a little programming to flatten them. Need to share them with others and guarantee integrity? Traditional DBMSs can do this, if you translate your Java objects to SQL. Need 24x7 availability, scalability, distribution over WANs and flexibility for schema changes? ODBMSs can do this, and they can do it easily, by automatically making your Java objects persistent. We'll present the basics of object databases and contrast them with relational and object-relational systems; explain how to determine whether your application is a good fit for an ODBMS; and show how to deal with legacy issues and how to use ODBMSs with Java and the Web. Examples, chosen from over 150,000 users in production, are included.
Where to Store Shared Information
File systems support the basic need of persistence. Information is still there after the process terminates, and this information can be accessed later. Beyond this, files offer little, and they do require work. The programmer must somehow flatten his Java objects into streams of primitive data, and then manually write those streams to files. The reverse is necessary to access the information later. Any changes in the object types will likely require changes in this flattening code, in the file format and perhaps in applications that use it. Any concurrent access control is up to the application programmers, or conflicts will result. Files are useful when you have a small amount of information, which is unlikely to change much, accessed by only a single user (at a time), with no need for reliability features such as recovery or usability features such as relationships, distribution and versioning.
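To make the flattening work concrete, here is a minimal sketch using the standard java.io data streams. The Employee class and its fields are hypothetical, chosen only for illustration, and an in-memory byte array stands in for the file (a FileOutputStream could be wrapped the same way).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FlattenDemo {
    // Hypothetical application class, used only for illustration.
    static final class Employee {
        final String name;
        final int age;
        final double salary;

        Employee(String name, int age, double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }
    }

    // Flatten: write each field as primitive data, in a fixed order.
    static byte[] flatten(Employee e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(e.name);
            out.writeInt(e.age);
            out.writeDouble(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Unflatten: read the fields back in exactly the same order.
    static Employee unflatten(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Employee(in.readUTF(), in.readInt(), in.readDouble());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Employee original = new Employee("Ada", 36, 5200.0);
        Employee restored = unflatten(flatten(original));
        System.out.println(restored.name + " " + restored.age + " " + restored.salary);
    }
}
```

Note that the read order must mirror the write order exactly. Adding or reordering a field in Employee means changing both methods and invalidating any data already written, which is the maintenance cost described above.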
The next choice for persistence is to use a traditional DBMS, such as a relational one (RDBMS). These systems have been very successful in business applications that use very simple, primitive, fixed-length data types, organized in tables. They add support for concurrency, so multiple users can access the same information without destroying each other's work. They also add recovery, so the stored information can be restored to a known, good state even after a power outage or other catastrophic failure. They add powerful searching (or query) capabilities.
Unfortunately, RDBMSs were designed for a different generation of software technology, in which users dealt with raw (unencapsulated) data, third-generation programming languages (COBOL, FORTRAN, C) and a data-specific language (SQL), with the programmer manually translating back and forth between the two. With objects, this means the programmer must translate his objects to flat, primitive types and sort them into tables. Then, when restoring the objects from the RDBMS, the programmer must reassemble the objects from various tables, using slow inter-table connections called joins. This mapping code results in three problems.
ODBMSs include the capabilities of traditional databases, but add several new ones. First, they support objects. The very same objects you define and create in Java are transparently managed by the ODBMS, including saving them on disk, recovering from failures and coordinating concurrent access. This means there is no need for the mapping code (described previously) with all of its problems. It also means that all the DBMS capabilities, including recovery, concurrency, and query with object methods, apply directly to objects rather than to the primitive, disassembled pieces of objects. Because all access to the ODBMS goes to the objects themselves, they can automatically enforce integrity.
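The relational mapping burden described above can be sketched in plain Java. This is an in-memory illustration only, not real database code: the Order class, the two row types and the Tables holder are all hypothetical, and simple lists stand in for relational tables. It shows an object being disassembled into rows and then reassembled by comparing keys, which is what a join does.

```java
import java.util.ArrayList;
import java.util.List;

public class MappingDemo {
    // Hypothetical domain object: an order that owns a list of item names.
    static final class Order {
        final int id;
        final String customer;
        final List<String> items = new ArrayList<>();

        Order(int id, String customer) {
            this.id = id;
            this.customer = customer;
        }
    }

    // Flat rows standing in for two relational tables.
    static final class OrderRow {
        final int orderId;
        final String customer;

        OrderRow(int orderId, String customer) {
            this.orderId = orderId;
            this.customer = customer;
        }
    }

    static final class ItemRow {
        final int orderId; // acts as the foreign key back to OrderRow
        final String item;

        ItemRow(int orderId, String item) {
            this.orderId = orderId;
            this.item = item;
        }
    }

    static final class Tables {
        final List<OrderRow> orders = new ArrayList<>();
        final List<ItemRow> items = new ArrayList<>();
    }

    // Disassemble one object into rows spread over two tables.
    static void store(Order o, Tables t) {
        t.orders.add(new OrderRow(o.id, o.customer));
        for (String item : o.items) {
            t.items.add(new ItemRow(o.id, item));
        }
    }

    // Reassemble: find the order row, then scan the item table comparing keys.
    // Returns null when no order with that id exists.
    static Order load(int orderId, Tables t) {
        for (OrderRow row : t.orders) {
            if (row.orderId == orderId) {
                Order o = new Order(row.orderId, row.customer);
                for (ItemRow itemRow : t.items) {
                    if (itemRow.orderId == orderId) {
                        o.items.add(itemRow.item);
                    }
                }
                return o;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Tables t = new Tables();
        Order a = new Order(1, "Acme");
        a.items.add("modem");
        a.items.add("cable");
        Order b = new Order(2, "Initech");
        b.items.add("router");
        store(a, t);
        store(b, t);
        Order restored = load(1, t);
        System.out.println(restored.customer + " " + restored.items);
    }
}
```

Even in this toy form, every new field or nested collection on Order requires matching changes to the row types, to store and to load.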
Even graphical ODBC tools can be forced (using security restrictions, by user and group) to go through high-level object methods in order to maintain integrity.
Where RDBMSs have mainframe-like central-server architectures, in which all storage and processing occurs on a centralized machine, certain ODBMSs have been developed with a distributed architecture. This allows objects to live on any computer accessible in the networked environment, to execute anywhere and to be accessed transparently by all users, with all operations working across this distributed single logical view. In addition, objects make a natural unit for replication, and implementations now exist that keep replicas in sync even across failures. The ability to add servers transparently is a major part of scalability, and other capabilities have been added, too, including concurrency modes that support multiple readers and up to one writer running simultaneously without blocking.
Also, ODBMSs bring new features. The ability to define many-to-many relationships allows the ODBMS to generate and manage the code to maintain such relationships, to dynamically add and remove elements and to maintain referential integrity, all without the need for users to write code or manually manage secondary data structures such as foreign keys. Moreover, traversal of such relationships is direct, without the need to search down tables and compare keys as is done in the relational join. By connecting networks of objects with these relationships, users can construct composite objects, which allow any number of levels of depth, and also any number of composites threading through a single object. Objects also provide the natural unit for versioning, keeping track of the history of the object's state, or even allowing simultaneous creation of multiple branches.
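As a rough illustration of what a managed many-to-many relationship involves, the sketch below hand-writes in plain Java the kind of bookkeeping the text says an ODBMS generates for you. The Engineer and Project classes and the link, unlink and delete helpers are hypothetical names, not any product's API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RelationshipDemo {
    static final class Project {
        final String name;
        final Set<Engineer> members = new LinkedHashSet<>();

        Project(String name) {
            this.name = name;
        }
    }

    static final class Engineer {
        final String name;
        final Set<Project> projects = new LinkedHashSet<>();

        Engineer(String name) {
            this.name = name;
        }
    }

    // Adding a link updates both ends, so the two sides never disagree.
    static void link(Engineer e, Project p) {
        e.projects.add(p);
        p.members.add(e);
    }

    // Removing a link also updates both ends.
    static void unlink(Engineer e, Project p) {
        e.projects.remove(p);
        p.members.remove(e);
    }

    // Referential integrity: detach an engineer from every project so that
    // no project is left pointing at a deleted object.
    static void delete(Engineer e) {
        for (Project p : e.projects) {
            p.members.remove(e);
        }
        e.projects.clear();
    }

    public static void main(String[] args) {
        Engineer kim = new Engineer("Kim");
        Project switching = new Project("Switch");
        Project routing = new Project("Router");
        link(kim, switching);
        link(kim, routing);

        // Traversal is direct: follow the references, no key comparison.
        for (Project p : kim.projects) {
            System.out.println(kim.name + " works on " + p.name);
        }

        delete(kim);
        System.out.println("Members left on Switch: " + switching.members.size());
    }
}
```

The point of the comparison is that here the application owns this code, whereas the article's claim is that an ODBMS maintains both directions and the integrity rule itself.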
Finally, since these features are in the DBMS, rather than layered over some simple data-only store, the ODBMS can integrate them to properly handle complex object models; e.g., recovery of composites and relationships, and the behavior of relationships when one of the objects is versioned. In brief, an ODBMS brings the advantages of files and traditional DBMSs, and also adds support for objects and additional features.
Faced with customer requests for object support, the RDBMS vendors have come up with an approach called object-relational, or ORDBMS. To understand this mixed approach, we'll look at the high-level architectural description shown in Table 2. A DBMS architecture can be split into the front end, which interfaces to the user, and the back end, which stores and retrieves the persistent information. Either of these may be based on either relational or object technology, providing the four alternatives shown. The first, with relational front and back ends, gives a typical RDBMS, while the second does the same for ODBMS. The third shows an object front end layered over a relational back-end engine. This is the approach of the RDBMS vendors, largely because they have a large investment in their back ends and it's very hard to change them. Adding the object front end does add value; e.g., it might allow better integration with some object tools and it might allow some new data types. However, the back end is still relational, which means the objects are still being disassembled into flat tables, or stored as BLOBs, whose internal structure is unknown to the rest of the DBMS. Some ORDBMSs are adding data "blades" or "cartridges," which are effectively pre-built class libraries. Unfortunately, they miss the point of objects by dealing only with data. Also, they require kernel modifications, so they are hard for typical users to build, or even modify.
In contrast, ODBMSs allow users to freely build any classes of objects, with any operations and relationships and to freely extend others' classes. All of these can be used in exactly the same ways as any pre-built classes. For completeness, the last column of Table 2 shows how a relational technology front end (including query and ODBC) can be layered on top of an object database back end. This not only adds functionality, including ad hoc query of objects and off-the-shelf use of all the familiar tools, but also plays a key role in legacy support, as we'll see below.
When to Use an ODBMS
An ODBMS may well be a better tool for maintaining your persistent information if any of the following three items apply to your system:
Complex, Interconnected Information
Distributed Environment
Users
The earliest users of ODBMSs were those who had no choice, because they simply couldn't use the traditional DBMSs, yet they still had a significant need for persistence of large amounts of information, concurrency, scaling and recovery. These were engineering applications such as CAD/CAE, both mechanical and electronic, and they remain users today. Scientific applications also are major users. Examples here include CERN, in Geneva, storing the results of high-energy physics experiments (pictured on pages 8 and 9). They're building the world's largest database, 100 PB (a petabyte = 1,000 terabytes = 1,000,000 gigabytes). Similarly, the Sloan Digital Sky Survey (FermiLab, Johns Hopkins, etc.) is building a 40 TB database containing the first digital survey of the sky, storing the stars, galaxies, quasars, etc., as objects in the ODBMS.
From there, the user base expanded into telecommunications, where network management and real-time call routing require the performance, direct relationships, scalability and flexibility of ODBMSs. Examples here include Qualcomm (and their customers Nortel, Sprint, etc.), creators of the CDMA cellular standard, who build all their base stations on an ODBMS. Other examples include Siemens' Multiplexor, Intecom's Voice/Video/Data PBX, COM21's cable TV-based very high-speed modems (up to 1 Mbps) and Motorola's Iridium satellite-based worldwide cellular system.
Manufacturing and process control form another major user group, with real-time support for controlling distributed environments as well as databases of historical information for off-line analysis and query.
Users in this area include Fisher-Rosemount, whose manufacturing control systems are widely used in the petroleum, chemical and pharmaceutical industries; Landis & Gyr, whose environmental control systems are used to maintain the world's busiest airport, Chicago's O'Hare, as well as the Transamerica Pyramid, hospital suites, etc.; and KLA-Tencor, the market leader in semiconductor manufacturing.
Financial services are just now becoming users of ODBMSs, as exemplified by Citibank's currency trading system, deployed across Europe and the USA. Logistics systems such as BBN's Target are used in military and commercial environments, as are transportation systems. Others include document management, library management and healthcare systems, plus the utilities industry, where American Meter has built a data collection application for remote meter reading and demand-side management.
The ODMG Java Interface
The normal syntax is used within Java to define object types, instantiate objects and access them. Persistence is via reachability, which means that once an object is connected to a persistent object (including "root" objects), it becomes persistent. This is a natural extension for dynamic, garbage-collected languages in which unconnected objects are considered garbage and (eventually) deleted. Objects connected to other transient objects are retained transiently (until the end of the process), while those connected to persistent objects are retained persistently (across processes, until they become garbage). A brief example is shown in Listing 1.
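The reachability rule itself can be illustrated with a toy model in plain Java. This is not the ODMG binding and not the article's Listing 1: the Node class and the reachableFrom helper are hypothetical, and the sketch only computes which objects would count as persistent if a given object were a persistent root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReachabilityDemo {
    // Hypothetical object type: a label plus references to other objects.
    static final class Node {
        final String label;
        final List<Node> refs = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    // Returns every object reachable from the root by following references.
    // Under persistence by reachability, these are the objects that would be
    // kept; anything outside the set stays transient.
    static Set<Node> reachableFrom(Node root) {
        // Identity-based set: objects are distinguished by reference, not equals().
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (seen.add(current)) { // false when already visited, so cycles terminate
                for (Node next : current.refs) {
                    pending.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node orphan = new Node("orphan");

        root.refs.add(a);
        a.refs.add(b);
        b.refs.add(root);   // a cycle back to the root is harmless
        orphan.refs.add(a); // pointing AT a persistent object does not make orphan persistent

        Set<Node> persistent = reachableFrom(root);
        System.out.println("a persistent: " + persistent.contains(a));
        System.out.println("orphan persistent: " + persistent.contains(orphan));
    }
}
```

The direction of the reference matters: an object becomes persistent when something persistent refers to it, not when it refers to something persistent.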
Legacy System Access
Since some ODBMSs now fully support SQL and ODBC, these well-known interfaces may be used to simultaneously access both the objects in the ODBMS and the tables in legacy RDBMSs. Programs written in SQL can access all such systems, as can the familiar graphical tools (Crystal Reports, Microsoft Access, Visual Basic, etc.), almost all of which support ODBC (see Figure 1). The advantage of this approach is that it leverages existing investments in programs, tools and also in personnel training. Experienced database users can immediately access the new (as well as old) databases, starting where they're already familiar, and over time learn more and more about objects in order to get more benefits.
For the object user, a preferable approach would be to make the legacy systems accessible as objects. This is done by creating surrogate objects, which stand for information in legacy systems. For the major RDBMSs, class libraries to do this can be purchased; for these or other systems, the user can also write his own surrogate methods to read and write legacy information. The result is that these surrogates fit transparently into the distributed, single logical view. When they're accessed, they go off to the legacy systems but, except for performance considerations, they look exactly like any other objects.
Although the mapping of tables to objects can be done automatically in a straightforward way, it is usually best to reanalyze the entire system, define the desired view of objects and then bury in the surrogate's methods the translation to any historical structures. Objects might then be pieced together out of different tables, or go through legacy modules, as needed to meet the application's and user's functionality. The result is that the new object users have full access to the legacy systems, but the legacy systems themselves continue to work unchanged.
Evolution is now possible at the user's discretion and timetable: legacy information can be moved into native objects if and when desired, with no change for object users, though of course at that point legacy systems will need to be changed to use the native objects (see Figure 2).
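The surrogate idea described in this section can be sketched as follows. Everything here is hypothetical and for illustration only: the Account interface, the "owner|balance" record format and the LegacyStore class, which uses a map to stand in for a real legacy system.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SurrogateDemo {
    // What object users see, whether the data is native or legacy.
    interface Account {
        String owner();
        long balanceCents();
    }

    // An ordinary object holding its own state.
    static final class NativeAccount implements Account {
        private final String owner;
        private final long balanceCents;

        NativeAccount(String owner, long balanceCents) {
            this.owner = owner;
            this.balanceCents = balanceCents;
        }

        public String owner() { return owner; }
        public long balanceCents() { return balanceCents; }
    }

    // Stand-in for a legacy system: flat "owner|balance" records by account number.
    static final class LegacyStore {
        final Map<Integer, String> records = new HashMap<>();

        String fetch(int accountNumber) {
            return records.get(accountNumber);
        }
    }

    // Surrogate: holds no data itself, and goes to the legacy store on each
    // access. The translation from the flat record is buried in its methods.
    // Assumes the record exists; a missing key would fail with a NullPointerException.
    static final class LegacyAccountSurrogate implements Account {
        private final LegacyStore store;
        private final int accountNumber;

        LegacyAccountSurrogate(LegacyStore store, int accountNumber) {
            this.store = store;
            this.accountNumber = accountNumber;
        }

        public String owner() {
            return store.fetch(accountNumber).split("\\|")[0];
        }

        public long balanceCents() {
            return Long.parseLong(store.fetch(accountNumber).split("\\|")[1]);
        }
    }

    // Client code works on Account and cannot tell the two kinds apart.
    static long total(List<Account> accounts) {
        long sum = 0;
        for (Account account : accounts) {
            sum += account.balanceCents();
        }
        return sum;
    }

    public static void main(String[] args) {
        LegacyStore legacy = new LegacyStore();
        legacy.records.put(7, "Grace|2500");

        List<Account> accounts = Arrays.<Account>asList(
                new NativeAccount("Ada", 1000),
                new LegacyAccountSurrogate(legacy, 7));
        System.out.println("Total cents: " + total(accounts));
    }
}
```

Because clients depend only on the interface, a legacy record can later be migrated into a NativeAccount without touching the client code, which mirrors the evolution path described above.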
Conclusion
Finally, unless you like the "bleeding edge," check for references that are successful in production, using the features or capabilities you need.