Working with Xindice

In my free time, I've been working on a CMS/Portal application using Java and XML. I was glad to discover some XML database tools that are now available - as more and more data is being stored and transmitted in XML format, XML databases are worth considering. Moving an XML application to Xindice (pronounced zin-dee-chay) is an interesting experience.

Xindice is a "new" open-source database engine, so a lot of issues must be resolved by brute force. However, getting started using the Java API was fairly easy, and getting a test class put together only takes an hour or two. Xindice actually has an HTTP server built in that runs on port 4080. However, the query mechanism over HTTP didn't work on my machine, so I started to build my own servlet that could search documents stored in Xindice.

Why Switch to Xindice?
After I spent some time building the portal, the documents were getting out of hand; going to the next level of functionality called for a more robust engine for delivering content. XML documents for Web content, posted articles, events, and catalogs would soon become unmanageable. Putting this information in a relational database would solve most of the problems, but a relational database would be a lot of overhead and would limit the user's platform (since everyone has their own favorite flavor of SQL relational database). In an earlier article, I mentioned searching XML with XSLT (XML-J Vol. 3, issue 7). The methods described in that article could handle 30 or 50 documents, but what if a site needed to scale to hundreds or thousands of documents? Using a server such as Xindice facilitates the storage of documents and provides indexing as well as retrieval mechanisms. Xindice is written in Java, so it's platform independent as well.

Working with Xindice requires setting it up and building a Java package that includes a few classes to query the database. I also built a servlet class so the database could be queried over HTTP. This is new technology, so be sure to test and design well before deciding whether it's the right solution.

Setting Up Xindice
Setting up Xindice is pretty straightforward. Java 1.3 or later must be installed, and the XINDICE_HOME environment variable must be set before the software will run.

Documents are stored in collections on the server. For my articles, I created an "articles" collection using the xindiceadmin tool: ./bin/xindiceadmin add_collection -c /db -n articles.

Collections are repositories for XML documents. The fact that collections can contain other collections distinguishes them from relational models, and it gives the document database a human-readable hierarchy. For now, all my articles are placed in an "articles" collection, but I could (and probably will) separate this into "articles/xml", "articles/msdotnet", and "articles/java" collections.

To begin with, you may want to add a few documents using the command-line utility. Documents are stored in Xindice in a special format to which indexes can be added to increase performance. From the command line, adding a document is simple: xindice add_document -c /db/articles -f fx102.xml -n SSL1.

The -n parameter is the unique key for the document. Even though it's an optional parameter, it's a good idea to supply your own. Later, if you build a mechanism to retrieve a single document, looking it up by the generated key (e.g., 0625df60001a5d4000bc49d00060bf5) won't be very convenient. SSL1 or SSL09-2002 is a lot easier to type in and retrieve later. Beware: Xindice currently allows you to store documents with duplicate keys. Querying will return the results from both documents, but you'll only be allowed to retrieve the first document added.

Before diving into the Java classes, test to make sure everything is running well through the command-line interface. One note about the command line: double quotes need to be placed around XPath expressions. This isn't normally required for XPath, and it's not required when using the Java classes. Use bin/xindice retrieve_document -c /db/articles -f fx102.xml -n SSL1 or bin/xindice xpath -c /db/articles -q /article["contains(title, 'SSL')"]/title.
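To see what the XPath expression above selects without a running Xindice server, it can be tried against a small in-memory document. The following is only a sketch: it uses the javax.xml.xpath package from a modern JDK rather than Xindice, and the XPathTitleDemo class name and the sample article are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xindice: evaluates an XPath expression
// with the JDK's own XPath engine against an XML string.
class XPathTitleDemo {
    // Returns the text content of every node matched by the expression.
    static List<String> select(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            matches.add(nodes.item(i).getTextContent());
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document standing in for a stored DocBook article.
        String xml = "<article><title>Configuring SSL</title>"
                + "<para>Body text</para></article>";
        // No extra double quotes are needed when the expression is a Java string.
        System.out.println(select(xml, "/article[contains(title, 'SSL')]/title"));
    }
}
```

The expression keeps only articles whose title contains "SSL" and then returns the title element, which is the same shape of query the command-line example sends to the articles collection.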

The documentation that comes with Xindice explains everything. Check the User Guide, Developer Guide, and Administrator Guide - the information you need may be in any one of these guides (you can find the documentation at http://localhost:4080).

Creating a Java Class to Search DocBook Articles
To implement an XML database, I divided the project into four classes: the xmlQuery class, which handles calls to Xindice; the docBookSearch class, which creates XPath queries specific to DocBook XML; a "parameters" class representing the search parameters needed for the DocBook searches; and finally, a dbSearchServlet class that takes the parameters from an HTTP request and passes them to the docBookSearch class.

The xmlQuery class shown in Listing 1 is based on the examples provided with Xindice. The terminology isn't the same as that of a relational database class, but there are enough similarities that this type of connecting and querying should be very familiar to most developers. Instead of connections, tables, and queries, it is managers, collections, and services that provide the interface to get data from the system. The concept of a collection may be more familiar to developers who have previously worked with content management tools.

In the xmlQuery class, the getResults method builds a nonvalid XML document as a string from the Results object's getContent() method. In the future these results may be appended to a valid XML document or sent to the client as a stream of results. Xindice returns a complete XML document with an XML declaration, so the first line (the XML declaration) is stripped from the results. Since Xindice is open source, the source code can also be modified to return result streams as one document. The JavaDoc documentation, which explains other classes and methods to use, is also included with the distribution.
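The declaration-stripping step can be sketched in plain Java. This is not the code from Listing 1; the ResultJoiner class and its method names are hypothetical, and it assumes each result arrives as a string that may or may not begin with an XML declaration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating how per-result XML declarations
// could be removed before the results are concatenated.
class ResultJoiner {
    // Removes a leading <?xml ...?> declaration if one is present.
    static String stripXmlDeclaration(String xml) {
        String trimmed = xml.trim();
        boolean hasDeclaration = trimmed.startsWith("<?xml")
                && trimmed.length() > 5
                && Character.isWhitespace(trimmed.charAt(5));
        if (hasDeclaration) {
            int end = trimmed.indexOf("?>");
            if (end >= 0) {
                return trimmed.substring(end + 2).trim();
            }
        }
        return trimmed;
    }

    // Concatenates the stripped results, one per line. The output has
    // several root elements, so it is a fragment rather than a document.
    static String join(List<String> results) {
        StringBuilder joined = new StringBuilder();
        for (String result : results) {
            joined.append(stripXmlDeclaration(result)).append('\n');
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList(
                "<?xml version=\"1.0\"?>\n<title>SSL Basics</title>",
                "<?xml version=\"1.0\"?>\n<title>SSL and Tomcat</title>");
        System.out.print(join(results));
    }
}
```

Wrapping the joined string in a single root element would be one way to turn the fragment into a valid document, which is the kind of follow-up the paragraph above anticipates.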

The second class, searchParams.java (see Listing 2), encapsulates the parameters needed to search for documents. At first the package used a parameter list, but after it grew to about five parameters, switching to a class made more sense. To keep things simple, I created a public class with six public fields. For a real implementation, more than one class may be involved, using either a bean model or something different.

The third class, docBookSearch, does most of the work (see Listing 3), creating a valid XPath query for searching documents from the parameters passed in from the servlet. The best thing about this class is that it can be tested from a standard Java class with a "Main" method and, after testing, called from the servlet class below. The logic in this class needs to be refined and will become more complex to handle different types of queries and produce different results.
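As a rough illustration of how a parameters class and a query builder might fit together, here is a minimal sketch. It is not Listing 2 or Listing 3: the class names are hypothetical, the field names are borrowed from the query-string parameters used later in this article (the sixth field isn't named in the text, so it is left out), and the meaning given to fullText is an assumption.

```java
// Hypothetical query builder; the real docBookSearch logic may differ.
class XPathBuilder {
    // XPath 1.0 has no escape character, so pick whichever quote style
    // the search string does not use.
    static String quote(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";
        }
        throw new IllegalArgumentException("search string mixes both quote styles");
    }

    static String build(SearchParams sp) {
        String literal = quote(sp.searchString);
        // Assumption: fullText=true means substring match, false means exact match.
        String predicate = sp.fullText
                ? "contains(" + sp.searchNode + ", " + literal + ")"
                : sp.searchNode + " = " + literal;
        String query = "/" + sp.doc + "[" + predicate + "]";
        // Assumption: titlesOnly narrows the result to the title element.
        return sp.titlesOnly ? query + "/title" : query;
    }

    public static void main(String[] args) {
        SearchParams sp = new SearchParams();
        sp.searchString = "SSL";
        System.out.println(build(sp)); // prints /article[contains(title, 'SSL')]/title
    }
}

// Simple public-field holder in the spirit of the searchParams class.
class SearchParams {
    public String doc = "article";
    public String searchNode = "title";
    public String searchString = "";
    public boolean fullText = true;
    public boolean titlesOnly = true;
}
```

Because build is a static method with no servlet dependencies, it can be exercised from a main method first and wired into the servlet afterward, which is the testing approach described above.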

The fourth class (see Listing 4), dbSearchServlet, is the servlet itself. The following code retrieves the results from the DocBookSearch class and passes them to the browser:

response.getOutputStream().println(DocBookSearch.getSearchResults(sp));

This class could function with less code, but some parameter checking has been included. Once compiled, the class should be placed in the ./WEB-INF/classes directory. To make things easier, I put everything into a package, added it to the ./WEB-INF/lib directory, and then added the code shown in Listing 5 to the ./WEB-INF/web.xml file.

Finishing Up
Using the new com.sonoma.xmldb package, the results can be accessed from the URL: http://localhost:8080/examples/xSearch?doc=article&searchString=SSL&searchNode=title&fullText=true&titlesOnly=true.

The goal, however, is to make the "searchString" parameter the only required parameter. Another servlet class, "getArticle", will take an article key as a parameter. Using the example given earlier, an entire article will be accessible at the URL: http://localhost:8080/examples/xArticle?SSL1.
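One way to make "searchString" the only required parameter is to fill in defaults for everything else before building the query. The sketch below is an assumption about how that could look, not the servlet's actual code: the RequestDefaults class is hypothetical, and the default values are simply the ones used in the example URL above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parameter check: rejects requests without a search string
// and supplies defaults for the optional parameters.
class RequestDefaults {
    static Map<String, String> withDefaults(Map<String, String> request) {
        String searchString = request.get("searchString");
        if (searchString == null || searchString.trim().isEmpty()) {
            throw new IllegalArgumentException("searchString is required");
        }
        Map<String, String> params = new LinkedHashMap<>();
        // Assumed defaults, taken from the example URL in this article.
        params.put("doc", "article");
        params.put("searchNode", "title");
        params.put("fullText", "true");
        params.put("titlesOnly", "true");
        // Values supplied by the caller override the defaults.
        params.putAll(request);
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> request = new HashMap<>();
        request.put("searchString", "SSL");
        System.out.println(withDefaults(request));
    }
}
```

In a servlet, the request map would be populated from the HTTP request parameters, and the IllegalArgumentException could be translated into an error response.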

On the portal, the search will be invoked in an XSLT document. This is another advantage of having an XML database incorporated with a servlet. In a previous article I wrote about searching through one XML document. With Xindice, implementing a variety of searches will be much simpler. The XSLT code looks something like Listing 6.

Of course, the search terms will be replaced by parameters or variables. The database can also be queried by other Web applications and programs. A few more parameters could produce results in RSS (Rich Site Summary) format as well; an XML-based application provides a lot of possibilities and a very open system. With XML architecture, the same type of system could be achieved. With Xindice as a datastore, the problem of storing and querying large numbers of XML documents is solved, enabling more XML-driven applications.

References

  • The Sonoma Project (XML-based portal): http://sonomap.sourceforge.net
  • Apache Xindice: http://xml.apache.org/xindice

Other XML Database Products

  • XML DBMS: www.rpbourret.com/xmldbms/index.htm
  • eXist: http://exist.sourceforge.net

About Roy Hoobler
Roy Hoobler has been developing custom Web applications since 1996. After completing his MCSD certification, he spent the mid-'90s at a large consulting firm focused on intranet/extranet applications for Fortune 1000 companies. In 1998 Roy joined Net@Work (www.netatwork.com) as director of Internet technologies, specializing in systems architecture, project management, and research into emerging programming methods.
