Comments
Patrick Collands wrote: collands (AT) gmail com I'd be very grateful for an invitation. Thank you.
Cloud Expo on Google News

SYS-CON.TV

2009 East
PLATINUM SPONSORS:
IBM
Smarter Business Solutions Through Dynamic Infrastructure
IBM
Smarter Insights: How the CIO Becomes a Hero Again
Microsoft
Windows Azure
GOLD SPONSORS:
Appsense
Why VDI?
CA
Maximizing the Business Value of Virtualization in Enterprise and Cloud Computing Environments
ExactTarget
Messaging in the Cloud - Email, SMS and Voice
Freedom OSS
Stairway to the Cloud
Sun
Sun's Incubation Platform: Helping Startups Serve the Enterprise
POWER PANELS:
Click For 2008 West
Event Webcasts
Using XSLT to Generate SQL
Using XSLT to Generate SQL

Examples of selecting data from a database to produce XML are not too difficult to find. A Google search or a visit to the XML section at your local bookstore will no doubt reveal several such examples. It can, however, be more difficult to locate examples of how to insert data from XML into a database. Even if you locate such examples, it can be a challenge (to say the least) to adapt them to large XML files in which the data you need resides in a selected number of elements nested several levels deep within the file.

This article illustrates how to use XSLT to generate SQL, which can then be loaded into an Oracle database using Unix shell scripts. For the purpose of illustrating the concept, the example given is relatively short and straightforward, but the method used in this article could easily be adapted to more complex projects. Furthermore, although this article focuses on getting data into an Oracle database from XML in a Unix environment, the concepts could be adapted to work with other database software running on other platforms.

There are at least two known methods for taking data from XML files and inserting it into Oracle. One method is to use the Java XSU insert method with an XML file. With this method, the structure of the XML file mirrors the structure of the Oracle table. Another method is to use Oracle's XML data type, which is available in Oracle9i. Using this method, a developer can insert the XML file directly into Oracle "as is." Both methods have their advantages in working with smaller XML files in which the elements are not heavily nested. However, in working with files in which the elements are heavily nested, both methods require the generation of a secondary XML file for database loading. This is a requirement if you wish to select only certain elements from a large, heavily nested XML file. Using the method proposed in this article, generating a secondary XML file for database loading is not necessary, provided the needed data resides in the original XML file.

If you've worked with Oracle SQL*Plus, and with files having a .sql extension, the battle is half won. These files contain SQL code and are used to load data into Oracle in the SQL*Plus environment. In the same way that you use an XSLT processor - such as SAXON or Xalan - to generate XML or HTML output from XML, it is possible to use XSLT to generate a text file output. SAXON can be used to generate a text file having a .sql extension in the following manner: java -jar saxon.jar -o file.sql file.xml file.xsl. This command may be appropriate to execute in this way if only a small number of files are to be processed. If there is a need to batch process a large number of files, it may be helpful to write a shell script to shorten the command syntax. To do this in a Unix environment, a shell script like the one below may be used. This script, which is called gensql in this article, is also referenced in the sample code for this article, available at www.sys-con.com/xml/sourcec.cfm.

#!/bin/sh

SQLFILE=$3
XMLFILE=$1
XSLFILE=$2
# Here we would test for the correct number of arguments.

if [ $# != 3 ]
then
echo "Usage: gensql file.xml file.xsl file.sql"
else
/usr/bin/java -jar saxon.jar -o $SQLFILE $XMLFILE $XSLFILE
fi

Before using this script to build a SQL file, you must first identify the elements within the XML file from which the data will be extracted. Even if you're relatively familiar with the data within the XML file, if it is a large file (or a group of large files), it may be helpful to build a table of values. Before addressing this table, it's helpful to describe the kind of data that will be used in our fictitious example.

In this example, consider the needs of a national coordinator working with high school baseball teams. To assist in this work, XML files must be built for each state in the U.S. containing the names of high schools, their location, where each team plays its home baseball games, and each team's mascot. For this example, consider the XML file for the state of Indiana focusing on three Indiana towns whose teams play each other: Littleville, Flatville, and Lillyville. For the database example, it's necessary to extract from the sample XML file the name of each town, the field where each team plays its home games, and each team's mascot. Table 1 is helpful in extracting this information.

Now the XSLT file may be built. In getting the SQL file output from the XML file, "text" must be specified as the output type within the XSLT file in the following manner:

<xsl:stylesheet xmlns:xsl="<http://www.w3.org/1999/XSL/Transform>"
version="1.0">
<xsl:output method="text"/>

Next, the root level element of the XML document must be matched with the following:

<xsl:template match="/">

Although the XML example file focuses on the state of Indiana, XML files also exist for other states in the U.S., and the values for each state must be differentiated. At this point, the top- level element must match an attribute value of "IN", which is the two-letter U.S. abbreviation for Indiana. Although the sample XSLT file will contain only values associated with Indiana, information could be retrieved about other states by repeating this same if statement for each state. After entering the INSERT statements for each state, the if statement for the values associated with the state in question is closed. There may be several INSERT statements for a particular state, but by matching the two-letter abbreviation for that state in the XSLT file, all the values for a state will be selected from an XML file containing only data about that state. Thus, there may exist one XML file for each state, but only one XSLT file for all 50 states.

<xsl:if test="/State/@id='IN'">

At this point in building the XSLT file, you're ready to add the code necessary for building the SQL INSERT statement. The values to insert into the Oracle table will be the ones identified earlier in the sample table. When extracting these values for the sample data, the XSLT normalize-space function is used. This function will remove leading and trailing spaces from the data contained within the XML tags. This function is used to limit the number of characters allowable in each column in the database table. The XML tag names in our sample XML file correspond directly to the column names in our sample Oracle table, so we would build the XSLT code for our SQL INSERT statement in the following manner:

INSERT INTO INDIANA
(TOWN, FIELD, MASCOT)
VALUES
('<xsl:value-of select="normalize-space
(//HighSchool[@Number='01']/Town)">',
'<xsl:value-of select="normalize-space
(//HighSchool[@Number='01']/Field)">',
'<xsl:value-of select="normalize-space
(//HighSchool[@Number='01']/Mascot)">');

In writing the XSLT file for the sample data, you'd need to repeat this same INSERT statement three times in order to get all the data from the XML file into the SQL file. To select the additional elements from the XML file, you'd simply replace the [@Number='01'] above in the XSLT example with [@Number='02'] and then [@Number='03'].

Instead of using a DTD (as done in the sample code), an XML Schema could be used. In using an XML Schema a developer could possibly avoid having to use the normalize-space function present in the XSLT file. The data in the XML file could be constrained using an XML Schema, which would eliminate leading and trailing spaces. While schemas have their advantages and are effective for constraining data in some XML files, they are ineffective in constraining elements that need to allow for mixed content. In other words, in using an XML Schema, a developer cannot constrain the value of an element that would allow for the ability to insert other elements (such as footnote reference elements) within that element tag. Generating a separate XML file using XSLT can solve this problem. With this separate file, you can then constrain the content in that XML file using a schema. This, however, creates a separate file that must be managed in some way. Thus, in cases where the data is known and in cases in which mixed content must be allowed in the XML files, it is best to use a DTD and constrain the data in an XSLT file.

The last step in the process is to load the SQL file into Oracle. The simplest way to do this would be to log manually into SQL*Plus and execute the @file.sql command at the SQL prompt. This solution may be fine in cases in which you have a small amount of data to load. But in cases in which you need to batch load a larger amount of data, it is best to load the data through a script. Sample Unix scripts are included for loading the SQL file into Oracle in the sample code.

In the Unix scripts, the following main steps load the XML data into Oracle:
1.   Delete the old SQL files generated from previous data loads.
2.   Create a new SQL file from our XML with the gensql script (mentioned earlier).
3.   Create an empty file with a .sql extension.
4.   Append the TRUNCATE TABLE command to the file created in Step 3. (This command will empty the table but leave the table structure intact.)
5.   Append the file created in Step 2 to the file created in Step 3.
6.   Append the "exit" command to the file created in Step 3.
7.   Execute the remote hostname command (rsh) to the Oracle Unix server. When executing this command, execute the Unix script containing the following command: sqlplus userid/password @SQLFileCreatedInStepThree. Within the script containing the sqlplus command, it may be necessary to set ORACLE environment variables to their value on the remote Oracle Unix server.

These steps may seem somewhat involved, but they represent a direct way to load XML data into Oracle. The main factor to watch out for in implementing these steps is making sure the ORACLE environment variables are set properly. Depending on how your Unix environment is set up, it may be unnecessary to set these environment variables in the script. If you have an Oracle DBA or access to an Oracle power user on site, it would be best to get their advice on setting the environment variables for the script. If you have to fend for yourself in setting the variables, log in to the Oracle Unix server, run the "env" command, and note the value of the ORACLE_ environment variables. Then log out of the remote Oracle server and run the env command locally. After doing this, compare the value of the ORACLE environment variables on the remote server with those on the local server. If the variables are set differently, try setting them in the sqlplus login script to their value on the remote Oracle server.

Summary
I've looked mainly at how to generate a SQL file from XML using XSLT, and then how to take this file and load it into an Oracle database using Unix scripts. I've also compared this method of loading data from XML files into Oracle with other methods well-known to anyone required to work between XML files and a database, and I've considered the advantages and disadvantages of using XML Schemas to constrain data within XML element tags.

In closing, I hope that the example in this article is helpful. Any questions about the example or any of the sample code may be directed to me via e-mail; I'll be happy to help.

References

  • "How to Specify Output File Extension in Saxon." www.biglist.com/lists/xsl-list/archives/ 200304/msg01412.html
  • SAXON XSLT Processor: http://saxon.sourceforge.net
  • Kay, M. (2001). XSLT Programmer's Reference, 2nd edition. Wrox.
  • Naude, F. "Oracle SQL*Plus FAQ." www.orafaq.net/faqplus.htm
  • Naude, F. "Oracle XML FAQ." www.orafaq.net/faqxml.htm
  • Sobell, M. (1995). A Practical Guide to the Unix System, 3rd edition. Addison-Wesley.
  • Sobell, M. (1999). A Practical Guide to SOLARIS. Addison-Wesley.
    About Greg Watson
    Greg Watson is a Computer Systems Analyst working in the area of XML development at the
    Defense Intelligence Agency’s Missile and Space Intelligence Center. In 2002, he spoke at the
    Intelligence Community Conference on XML Metadata.

  • In order to post a comment you need to be registered and logged in.

    Register | Sign-in

    Reader Feedback: Page 1 of 1

    Latest Cloud Developer Stories
    CloudBench Applications, Inc. announced its financial results for the three months and nine months ending September 30, 2009. All amounts are stated in Canadian dollars unless otherwise noted. Revenues from BasicGov, the Company's cloud computing solution for local government, gr...
    The new contract is an industry first, with CSC being the first Microsoft partner to lead and win a cloud computing services agreement of this scale. Under terms of the contract, CSC will provide Royal Mail Group's 30,000 employees with access to new IT services using Microsoft's...
    Operates in over 170 countries and is one of the world’s leading providers of communications solutions and services. Richard Tarboton talks for MeettheBoss.TV on his role as Head of Energy & Carbon for BT and what they are doing towards reducing carbon emissions.
    CA is going to put its Agile Planner software on salesforce.com’s Force.com platform in the first half to accelerate development time and give users visibility over their development initiatives to reduce time-to-market. Customers are supposed to be able to accelerate the deploym...
    Despite its uncertain fate Sun soldiers on. Monday it trotted out a cloud-based multiplatform desktop as a service for K-12 and community colleges that can run Windows, the Mac OS, Linux and Solaris applications to nearly any client device, including its own Sun Ray thin clients....
    Subscribe to the World's Most Powerful Newsletters
    Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
    Click to Add our RSS Feeds to the Service of Your Choice:
    Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
    myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
    Publish Your Article! Please send it to editorial(at)sys-con.com!

    Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

    SYS-CON Featured Whitepapers
    ADS BY GOOGLE

    Breaking Cloud Computing News
    CloudBench Applications, Inc. announced its financial results for the three months and nine months e...