Comments
Richard Davies wrote: The UK has a good crop of technology pioneers in cloud computing - for example ElasticHosts, FlexiScale, Flexiant, OnApp - and also some strong government initiatives such as G-Cloud. We will have to see whether this kind of technical leadership converts into swift mass-market adoption or not.
Cloud Expo on Google News

SYS-CON.TV
Cloud Expo & Virtualization 2009 East
PLATINUM SPONSORS:
IBM
Smarter Business Solutions Through Dynamic Infrastructure
IBM
Smarter Insights: How the CIO Becomes a Hero Again
Microsoft
Windows Azure
GOLD SPONSORS:
Appsense
Why VDI?
CA
Maximizing the Business Value of Virtualization in Enterprise and Cloud Computing Environments
ExactTarget
Messaging in the Cloud - Email, SMS and Voice
Freedom OSS
Stairway to the Cloud
Sun
Sun's Incubation Platform: Helping Startups Serve the Enterprise
POWER PANELS:
Cloud Computing & Enterprise IT: Cost & Operational Benefits
How and Why is a Flexible IT Infrastructure the Key To the Future?
Click For 2008 West
Event Webcasts
Building Multimedia Repositories
Building Multimedia Repositories

Information repositories are essential. They allow data to be shared within or outside an organization, bringing us closer to the reality of the paperless office.

With the toolset shown in Table 1, you can build an enterprise-class, scalable Web-enabled repository that fully incorporates various forms of media. Document files, photographs, video clips and sound files can easily be included in the repository using Java and Oracle8i's LOB (Large Objects) data types.

In this article I'm going to focus on how you build a repository to store and search documents such as Microsoft Word, and HTML and XML files stored in a LOB column of a database table. The example used here populates the repository with Microsoft Word résumés, indexes it using Oracle Intermedia and reads it using Java streams from a servlet (see Figure 1).

Benefits of Java and Oracle8i
Building repositories using Java and Oracle8i has several benefits. The documents inherently take advantage of the transaction management and ACID (atomicity, concurrency, integrity and durability) properties of the relational database, which means that changes to an internal LOB can be committed or rolled back. Moreover, associated applications can seamlessly take advantage of database features such as backup and recovery. This makes things easier for the system administrators, who no longer have to perform separate database and file system backups for relational information and documents. All data housed in the database, whether structured (relational) or unstructured (document files), can be written, searched and accessed using a single industry standard interface ­ SQL. These SQL statements can be executed from Java using JDBC (Java Database Connectivity).

Working with Large Objects
Oracle8i supports several types of LOB columns. One type, a BLOB (Binary Large Object), can house binary information such as audio, video, images and documents internally within the database. Each row can store up to 4 gigabytes of data. I used the BLOB data type to store the Microsoft Word résumés in my example.

The Oracle database stores a locator inline with the data. The locator is a pointer to the actual location of the data (LOB value). The LOB data can be stored in the same table or a separate one. The advantage of the locator is that the database doesn't have to scan the LOB data each time it reads multiple rows because only the LOB's locator value is read; the actual LOB data is read only when required.

When working with Java and LOBs, first execute the SELECT statement to get the LOB locator, then read or write LOBs using JDBC. (Oracle JDBC driver's type extension classes from oracle.sql package is used to read and write from an Oracle database.) The actual LOB data is materialized as a Java stream from the database, with the locator representing the data in the table. The following code reads the résumé of an employee whose employee number, 7900, is stored in a LOB column called résumé in the sam_emp table.

Statement st = cn.createStatement();
ResultSet rs = st.executeQuery
("Select resume from sam_emp where empno=7900");
rs.next();
oracle.sql.BLOB blob=((OracleResultSet)rs).getBLOB(1);
InputStream is=blob.getBinary-
Stream();

Populating the Repository
The documents can be written to LOB columns using Java, PL/SQL or a bulk utility called Oracle SQL*Loader. To insert a new row, do the following:

  1. Execute the SQL insert statement with an empty BLOB.
  2. Query the same row to get the locator object. Use this locator to write your document to the LOB column. Note: Java streams are employed to write the documents to the LOB column.
  3. Create the Java output stream using the getBinaryOutputStream() method of this object to write your document or any binary information to that column.
For example, to insert information in the sam_emp table about a new employee whose employee number is 9001, first insert all the structured information along with an empty BLOB using JDBC. Next, select the LOB column, résumé, from the same row to get the oracle.sql.BLOB object (the locator). Finally, create the Java output stream from this object:

st.execute("INSERT INTO sam_emp(empno, resume)
VALUES(9001,empty_blob())");
ResultSet rs = st.executeQuery(
"Select résumé from sam_emp where empno=9001 for update");
rs.next();
oracle.sql.BLOB blob = ((OracleResultSet)rs).getBLOB(1);
OutputStream os = blob.getBinaryOutputStream();

Optionally, you may use the java.awt.FileDialog class and java.io package to dynamically select and read a file from your PC. Then load it to a LOB column using the preceding code.

The way you search and retrieve documents is independent of how you load the documents. For example, you can store the documents using PL/SQL or SQL*Loader, then search and retrieve using Java servlets. Using PL/SQL, Listing 1 loads an employee's résumé, saved as a Microsoft Word file, to the résumé column of the sam_emp table.

Searching the Repository
The documents stored in the LOB columns can be indexed using Oracle Intermedia, which provides advance search capabilities such as fuzzy, stemming, proxy, phrases and more. It can also generate thematic searches and gist. The documents can be indexed via the "Create Index" database command.

Refer to Listing 2 to see how the index is built on the résumé column of the sam_emp table. Once the index is created, the Java applications can search the repository by simply submitting SELECT statements.

The MyServletCtx servlet in Listing 3 searches the term passed to it as a parameter in the résumé column of the sam_emp table. The servlet returns the rows matching the search criteria in HTML table format. The employee names in the HTML table are hyperlinked to another servlet, MyServlet, which reads the entire résumé from the database in its original format.

Retrieving from the Repository
Document retrieval using Java is similar to writing documents to the repository. The "Working with Large Objects" section earlier in this article describes how to read LOBs from the database. The MyServlet in Listing 4 reads a Microsoft Word résumé from the sam_emp table, sets the content type, then streams it out to the browser using an output stream.

Summary
In this article I've shown how you store, search and retrieve Word documents using LOB data types and Java.

You can also use the Oracle8i database to store, index, parse and transform XML documents. Storing XML documents in the database removes the need to administer and manage multiple repositories for relational and XML data. The Oracle8i's JVM makes it possible to run a Java XML parser in the database. Using the parser, you can parse and transform the XML files inside the database before outputting it to an application server.

About Samir Shah
Samir Shah, a former
Oracle employee and a certified Oracle
professional, is a manager of database and Web technologies at Wall Street Systems. He has over 10 years' industry experience.

In order to post a comment you need to be registered and logged in.

Register | Sign-in

Reader Feedback: Page 1 of 1

Latest Cloud Developer Stories
Rackspace Hosting, the service leader in cloud computing, on Thursday announced its acquisition of SharePoint911, an industry leader in SharePoint consulting, training, and "JumpStart" services within SharePoint. The unification of both companies provides capabilities to deliver ...
With Cloud Expo 2012 New York (10th Cloud Expo) now under four months away, what better time to start introducing you in greater detail to the distinguished individuals in our incredible Speaker Faculty for the technical and strategy sessions at the conference... We have techn...
Nimble, the social CRM platform has announced the launch of Nimble 2.0, billed as the “most social” CRM platform on the market today. Nimble was designed entirely with social CRM in mind and is the first social business platform that empowers companies with the ability to get clo...
2011 was a year of rapid adoption for public and private cloud services. Instant and on-demand server provisioning was the driving force behind the massive growth. On top, cloud server templates and script automation simplified application installation for simple and pre-defined ...
"Having been in the IT field for many years, I believe the cloud computing chapter in the industry is an exciting one and I am proud to be a part of it," said National Reconaissance Office (NRO) Chief Information Officer Jill T. Singer Tuesday, as it was announced that she was on...
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

SYS-CON Featured Whitepapers
ADS BY GOOGLE

Breaking Cloud Computing News
Joyent Cloud, the highest performance public cloud, and Amplify, a startup accelerator focused on su...