|
Comments
Did you read today's front page stories & breaking news?
SYS-CON.TV
|
XML Protocols Using XSLT to Generate SQL
Using XSLT to Generate SQL
By: Greg Watson
Aug. 6, 2003 12:00 AM
Examples of selecting data from a database to produce XML are not too difficult to find. A Google search or a visit to the XML section at your local bookstore will no doubt reveal several such examples. It can, however, be more difficult to locate examples of how to insert data from XML into a database. Even if you locate such examples, it can be a challenge (to say the least) to adapt them to large XML files in which the data you need resides in a selected number of elements nested several levels deep within the file. This article illustrates how to use XSLT to generate SQL, which can then be loaded into an Oracle database using Unix shell scripts. For the purpose of illustrating the concept, the example given is relatively short and straightforward, but the method used in this article could easily be adapted to more complex projects. Furthermore, although this article focuses on getting data into an Oracle database from XML in a Unix environment, the concepts could be adapted to work with other database software running on other platforms. There are at least two known methods for taking data from XML files and inserting it into Oracle. One method is to use the Java XSU insert method with an XML file. With this method, the structure of the XML file mirrors the structure of the Oracle table. Another method is to use Oracle's XML data type, which is available in Oracle9i. Using this method, a developer can insert the XML file directly into Oracle "as is." Both methods have their advantages in working with smaller XML files in which the elements are not heavily nested. However, in working with files in which the elements are heavily nested, both methods require the generation of a secondary XML file for database loading. This is a requirement if you wish to select only certain elements from a large, heavily nested XML file. Using the method proposed in this article, generating a secondary XML file for database loading is not necessary, provided the needed data resides in the original XML file. If you've worked with Oracle SQL*Plus, and with files having a .sql extension, the battle is half won. These files contain SQL code and are used to load data into Oracle in the SQL*Plus environment. In the same way that you use an XSLT processor - such as SAXON or Xalan - to generate XML or HTML output from XML, it is possible to use XSLT to generate a text file output. SAXON can be used to generate a text file having a .sql extension in the following manner: java -jar saxon.jar -o file.sql file.xml file.xsl. This command may be appropriate to execute in this way if only a small number of files are to be processed. If there is a need to batch process a large number of files, it may be helpful to write a shell script to shorten the command syntax. To do this in a Unix environment, a shell script like the one below may be used. This script, which is called gensql in this article, is also referenced in the sample code for this article, available at www.sys-con.com/xml/sourcec.cfm.
#!/bin/sh Before using this script to build a SQL file, you must first identify the elements within the XML file from which the data will be extracted. Even if you're relatively familiar with the data within the XML file, if it is a large file (or a group of large files), it may be helpful to build a table of values. Before addressing this table, it's helpful to describe the kind of data that will be used in our fictitious example. In this example, consider the needs of a national coordinator working with high school baseball teams. To assist in this work, XML files must be built for each state in the U.S. containing the names of high schools, their location, where each team plays its home baseball games, and each team's mascot. For this example, consider the XML file for the state of Indiana focusing on three Indiana towns whose teams play each other: Littleville, Flatville, and Lillyville. For the database example, it's necessary to extract from the sample XML file the name of each town, the field where each team plays its home games, and each team's mascot. Table 1 is helpful in extracting this information. Now the XSLT file may be built. In getting the SQL file output from the XML file, "text" must be specified as the output type within the XSLT file in the following manner:
<xsl:stylesheet xmlns:xsl="<http://www.w3.org/1999/XSL/Transform>" Next, the root level element of the XML document must be matched with the following: <xsl:template match="/"> Although the XML example file focuses on the state of Indiana, XML files also exist for other states in the U.S., and the values for each state must be differentiated. At this point, the top- level element must match an attribute value of "IN", which is the two-letter U.S. abbreviation for Indiana. Although the sample XSLT file will contain only values associated with Indiana, information could be retrieved about other states by repeating this same if statement for each state. After entering the INSERT statements for each state, the if statement for the values associated with the state in question is closed. There may be several INSERT statements for a particular state, but by matching the two-letter abbreviation for that state in the XSLT file, all the values for a state will be selected from an XML file containing only data about that state. Thus, there may exist one XML file for each state, but only one XSLT file for all 50 states. <xsl:if test="/State/@id='IN'"> At this point in building the XSLT file, you're ready to add the code necessary for building the SQL INSERT statement. The values to insert into the Oracle table will be the ones identified earlier in the sample table. When extracting these values for the sample data, the XSLT normalize-space function is used. This function will remove leading and trailing spaces from the data contained within the XML tags. This function is used to limit the number of characters allowable in each column in the database table. The XML tag names in our sample XML file correspond directly to the column names in our sample Oracle table, so we would build the XSLT code for our SQL INSERT statement in the following manner:
INSERT INTO INDIANA In writing the XSLT file for the sample data, you'd need to repeat this same INSERT statement three times in order to get all the data from the XML file into the SQL file. To select the additional elements from the XML file, you'd simply replace the [@Number='01'] above in the XSLT example with [@Number='02'] and then [@Number='03']. Instead of using a DTD (as done in the sample code), an XML Schema could be used. In using an XML Schema a developer could possibly avoid having to use the normalize-space function present in the XSLT file. The data in the XML file could be constrained using an XML Schema, which would eliminate leading and trailing spaces. While schemas have their advantages and are effective for constraining data in some XML files, they are ineffective in constraining elements that need to allow for mixed content. In other words, in using an XML Schema, a developer cannot constrain the value of an element that would allow for the ability to insert other elements (such as footnote reference elements) within that element tag. Generating a separate XML file using XSLT can solve this problem. With this separate file, you can then constrain the content in that XML file using a schema. This, however, creates a separate file that must be managed in some way. Thus, in cases where the data is known and in cases in which mixed content must be allowed in the XML files, it is best to use a DTD and constrain the data in an XSLT file. The last step in the process is to load the SQL file into Oracle. The simplest way to do this would be to log manually into SQL*Plus and execute the @file.sql command at the SQL prompt. This solution may be fine in cases in which you have a small amount of data to load. But in cases in which you need to batch load a larger amount of data, it is best to load the data through a script. Sample Unix scripts are included for loading the SQL file into Oracle in the sample code.
In the Unix scripts, the following main steps load the XML
data into Oracle: These steps may seem somewhat involved, but they represent a direct way to load XML data into Oracle. The main factor to watch out for in implementing these steps is making sure the ORACLE environment variables are set properly. Depending on how your Unix environment is set up, it may be unnecessary to set these environment variables in the script. If you have an Oracle DBA or access to an Oracle power user on site, it would be best to get their advice on setting the environment variables for the script. If you have to fend for yourself in setting the variables, log in to the Oracle Unix server, run the "env" command, and note the value of the ORACLE_ environment variables. Then log out of the remote Oracle server and run the env command locally. After doing this, compare the value of the ORACLE environment variables on the remote server with those on the local server. If the variables are set differently, try setting them in the sqlplus login script to their value on the remote Oracle server.
Summary In closing, I hope that the example in this article is helpful. Any questions about the example or any of the sample code may be directed to me via e-mail; I'll be happy to help.
References Reader Feedback: Page 1 of 1
Latest Cloud Developer Stories
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
|
SYS-CON Featured Whitepapers
Most Read This Week
Breaking Cloud Computing News
|
|||||||||||||||||||||||||||||||||||||||||||||||||