Developing Complex XSLT Scripts
By: Yuhang Sun
Feb. 27, 2003 12:00 AM
XSLT is a declarative language designed for transforming XML documents into documents in any format. In developing large-scale XSLT scripts, software qualities such as flexibility and maintainability become issues. To address those issues, this article will first discuss a design approach that emphasizes a decomposition and recomposition view of transformations; then, a technique for modularizing XSLT scripts will be described; and finally, a few design patterns will be introduced.
Identify Responsibilities and Assign Them to Code Units
Having identified code units in XSLT, we should study how to divide a complex transformation into smaller ones and assign each to a code unit. At the most abstract level, because XSLT is stateless, the responsibility of any XSLT script can be described as transforming a set of input streams into an output stream. Intuitively, outputting a segment of the output stream is a candidate for a smaller responsibility. Outputting each line in Listing 2 could be a responsibility. However, abstraction is required to identify reusable and semantically significant segments. Identifying those segments, which may not even be contiguous, requires insight into the output stream. A C programmer will readily recognize outputting the conditional expressions as a meaningful responsibility. The template that fulfills this responsibility is implemented in Listing 3.
Identifying smaller responsibilities is a decomposition process: it breaks an output stream into pieces. Consequently, those smaller segments must be assembled, or recomposed, into the output stream. The template in Listing 4 assembles the outputs from the template in Listing 3 and outputs the C++ code in Listing 2. An example XML context for calling the template in Listing 4 is also listed.
Decomposition and recomposition are not only design techniques; they are also an accurate vocabulary for describing the functionality of a code unit. When using this approach, it's important to constantly consider what a code unit does from the perspective of decomposition and recomposition. For example, the template in Listing 3 outputs conditional expressions, which are segments of the implementations of equality operators. The template can be reused elsewhere because it abstracts a common concept. The template in Listing 4 outputs the implementations of equality operators by assembling and decorating the outputs from the template in Listing 3.
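As a rough illustration of this split (not the article's Listings 3 and 4), the stylesheet below separates a template that emits only the conditional expression from a template that recomposes it into a complete equality operator. The input vocabulary (a "class" element with "field" children carrying "name" attributes) and both template names are assumptions made up for this sketch.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input context:
     <class name="Point"><field name="x"/><field name="y"/></class> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Decomposition: outputs only the conditional expression,
       one "name == rhs.name" comparison per field, joined by &&. -->
  <xsl:template name="conditional-expression">
    <xsl:param name="class" select="."/>
    <xsl:for-each select="$class/field">
      <xsl:if test="position() &gt; 1"> &amp;&amp; </xsl:if>
      <xsl:value-of select="concat(@name, ' == rhs.', @name)"/>
    </xsl:for-each>
  </xsl:template>

  <!-- Recomposition: decorates the segment above with the operator
       signature and body to form the final C++ code. -->
  <xsl:template name="equality-operator">
    <xsl:param name="class" select="."/>
    <xsl:value-of select="concat('bool ', $class/@name,
        '::operator==(const ', $class/@name, '&amp; rhs) const {&#10;')"/>
    <xsl:text>  return </xsl:text>
    <xsl:call-template name="conditional-expression">
      <xsl:with-param name="class" select="$class"/>
    </xsl:call-template>
    <xsl:text>;&#10;}&#10;</xsl:text>
  </xsl:template>

  <!-- Entry point for the hypothetical input document. -->
  <xsl:template match="/class">
    <xsl:call-template name="equality-operator"/>
  </xsl:template>
</xsl:stylesheet>
```

For the hypothetical Point input shown in the comment, this should produce an operator whose body is "return x == rhs.x && y == rhs.y;". Because the conditional expression lives in its own template, another template (say, one generating an inequality operator) could reuse it unchanged. The sketch assumes every class has at least one field.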
Figure 1 shows the composition relationships among the templates in Listings 3 and 4 and some other templates; such a diagram is a good way to show design and implementation structure. The use of decomposition and recomposition in XSLT is very much like functional decomposition in structured programming, which produces finer-grained code units that call each other hierarchically.
Modularize Code Units into Files
The commonalities of the input or output of code units are intuitive and effective criteria for grouping. Figure 2 shows the stylesheet files used in a project and their interdependencies. The code units in "class-wrapper.xsl" take "class.xml" as their input; the code units in "intermediate-wrapper.xsl" call the public code units of "class-wrapper.xsl" and "datatype-wrapper.xml"; "c-source-file.xsl" contains the code unit that assembles the segments output by the public code units in "intermediate-wrapper.xsl" and outputs a complete C++ source file, "class.cpp". "class-wrapper.xsl" and "intermediate-wrapper.xsl" use input commonalities as their grouping criteria, while "c-source-file.xsl" uses output commonality.
This grouping approach, along with the decomposition and recomposition view of transformations, results in traceable code: given a responsibility, locating the implementing code unit is easy. First, in the vocabulary of decomposition and recomposition, the input and output of the code unit should be clear. Second, from that input and output, the stylesheet file containing the implementation can be identified. Finally, the code unit can be located by reading through the identified stylesheet file. This last step may take a long time if the file is large; it can be made faster by grouping code units within the file and commenting them systematically.
Listing 5 shows the content of "intermediate-wrapper.xsl", whose public code units are grouped by the file they will be called from. The comments starting with "Group" mark the boundaries between those groups. Since XSLT has no mechanism to mark a code unit as public, a naming convention is used that prefixes the names of public code units with the name of their containing stylesheet file. For example, the names of the templates in Listing 5 are prefixed with "intermediate-wrapper".
The naming convention makes public code units prominent; moreover, it identifies their implementing file, which makes the code more traceable. Grouping serves both to organize code units and to encapsulate them. Besides being the physical shelves for code units, stylesheet files are also units of encapsulation. A stylesheet file exposes its functionality through public code units. Each public unit should fulfill a well-defined responsibility in terms of decomposition and recomposition. Don't expose private data through any public code unit. Call-by-context templates require callers and callees to share context information and so tend to break encapsulation. Therefore, only call-by-name templates and global variables should be used for public code units.
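The conventions above might look like the following inside a stylesheet file. This is only a sketch: the "." separator after the file-name prefix, the template names, the parameter names, and the template bodies are all assumptions, not the contents of Listing 5.

```xml
<?xml version="1.0"?>
<!-- intermediate-wrapper.xsl (illustrative sketch; names and bodies are hypothetical) -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Group: public code units called from c-source-file.xsl -->

  <!-- Public, call-by-name: the name carries the file-name prefix, and all
       input arrives through parameters rather than the caller's context. -->
  <xsl:template name="intermediate-wrapper.class-name">
    <xsl:param name="class"/>
    <xsl:call-template name="normalize-identifier">
      <xsl:with-param name="text" select="$class/@name"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Group: private helpers (no prefix; not meant to be called from other files) -->

  <!-- Private: replaces spaces and hyphens with underscores. -->
  <xsl:template name="normalize-identifier">
    <xsl:param name="text"/>
    <xsl:value-of select="translate(normalize-space($text), ' -', '__')"/>
  </xsl:template>

</xsl:stylesheet>
```

A calling file would typically pull this stylesheet in with xsl:include or xsl:import and then invoke the public unit with xsl:call-template, passing the class node explicitly via xsl:with-param. Since the public template never reads the caller's context node, the caller does not need to know how the file navigates its input.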
Applying Design Patterns
Intermediate XML Tree
Problem
Another problem is that the formats of input and output documents keep changing throughout stylesheet development, even though those documents convey relatively stable semantics. With direct transformation, the transforming XSLT scripts need significant modification whenever the formats change. With this pattern, the intermediate XML trees convey the stable semantics in stable formats. Because the formats of the intermediate trees are relatively stable, there is less coupling between subtransformations (see Listing 6).
The logical model in Listing 6 defines two data types, Alphanumeric and LastName. Data type LastName is derived from Alphanumeric, adding a max-length facet. The logical model clearly describes the semantics of the data types, but it cannot easily be used for generating code. The intermediate tree, which can be transformed from the logical model, includes a semantics element for each data-type element, and this element can be used directly for generating code. Data types with unbounded string semantics could be mapped to the pointer type in C or the String class in Java, while data types with bounded string semantics could be mapped to arrays in both languages.
Imagine that the structure of the logical model in Listing 6 changes to the structure in Listing 7. The two logical models convey exactly the same information in different formats. Because the intermediate tree is abstract, it remains unchanged. Therefore, the code that depends on the intermediate tree is not affected.
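A first-stage subtransformation of this kind could be sketched as follows. The logical-model format (a "datatypes" root with "datatype" elements, a "base" attribute, and a "maxLength" facet) and the intermediate format (a "semantics" element with "kind" and "max-length" attributes) are invented for illustration; they are not the formats of Listings 6 and 7.

```xml
<?xml version="1.0"?>
<!-- Hypothetical logical model:
     <datatypes>
       <datatype name="Alphanumeric"/>
       <datatype name="LastName" base="Alphanumeric">
         <maxLength value="30"/>
       </datatype>
     </datatypes> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Lets a derived type look up its base type by name. -->
  <xsl:key name="datatype-by-name" match="datatype" use="@name"/>

  <xsl:template match="/datatypes">
    <intermediate>
      <xsl:apply-templates select="datatype"/>
    </intermediate>
  </xsl:template>

  <!-- Each data type is copied by name and annotated with its semantics. -->
  <xsl:template match="datatype">
    <datatype name="{@name}">
      <xsl:call-template name="semantics"/>
    </datatype>
  </xsl:template>

  <!-- Walks up the derivation chain until a max-length facet is found.
       Assumes the chain of base types is acyclic. -->
  <xsl:template name="semantics">
    <xsl:param name="type" select="."/>
    <xsl:choose>
      <xsl:when test="$type/maxLength">
        <semantics kind="bounded-string"
                   max-length="{$type/maxLength/@value}"/>
      </xsl:when>
      <xsl:when test="$type/@base">
        <xsl:call-template name="semantics">
          <xsl:with-param name="type"
              select="key('datatype-by-name', $type/@base)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <semantics kind="unbounded-string"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With the hypothetical input above, Alphanumeric would be annotated as an unbounded string and LastName as a bounded string of length 30. A code-generating stylesheet would then read only the semantics elements, so a change to how facets or derivation are written in the logical model would be absorbed by this one stylesheet.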
Discussion
Builder
Problem
Following the builder pattern, the code in Listing 10 realizes the transformation from the input document in Listing 8 to the output documents in Listing 9. When invoking the director template in Listing 10, the caller should pass in the element node "tp:marshalling", which uniquely identifies the callback builder template for marshalling. Refer to "The Functional Programming Language XSLT - A proof through examples" for how to treat templates as first-class data. The director template invokes the callback builder templates at certain points while parsing the input document. Both callback builder templates take a parameter, "style", which is set to either "optional" or "field" depending on the parsing context in which the templates are invoked. The callback builder templates output C++ instructions according to the value of this parameter.
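The director and callback arrangement can be sketched as below. Only the "tp:marshalling" marker and the "style" parameter with its two values come from the text; the namespace URI, the "tp:unmarshalling" marker, the "field" parameter, the input vocabulary, the rule for choosing the style, and the emitted C++ fragments are all hypothetical and are not taken from Listings 8 to 10.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input:
     <message><field name="id"/><field name="note" optional="true"/></message> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tp="urn:example:template-pointers"
    exclude-result-prefixes="tp">
  <xsl:output method="text"/>

  <!-- Top-level marker elements; each identifies one callback builder template. -->
  <tp:marshalling/>
  <tp:unmarshalling/>

  <!-- Entry point: selects a builder by passing its marker node to the director.
       document('') refers to this stylesheet itself. -->
  <xsl:template match="/message">
    <xsl:call-template name="director">
      <xsl:with-param name="builder" select="document('')/*/tp:marshalling"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Director: parses the input and calls back the builder for each field. -->
  <xsl:template name="director">
    <xsl:param name="builder"/>
    <xsl:for-each select="field">
      <xsl:variable name="style">
        <xsl:choose>
          <xsl:when test="@optional = 'true'">optional</xsl:when>
          <xsl:otherwise>field</xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <!-- Applying templates to the marker node dispatches to the matching
           builder; the input node travels along as a parameter because the
           current node becomes the marker element. -->
      <xsl:apply-templates select="$builder">
        <xsl:with-param name="style" select="string($style)"/>
        <xsl:with-param name="field" select="."/>
      </xsl:apply-templates>
    </xsl:for-each>
  </xsl:template>

  <!-- Callback builder for marshalling. -->
  <xsl:template match="tp:marshalling">
    <xsl:param name="style"/>
    <xsl:param name="field"/>
    <xsl:if test="$style = 'optional'">
      <xsl:value-of select="concat('if (has_', $field/@name, ') ')"/>
    </xsl:if>
    <xsl:value-of select="concat('out.write(', $field/@name, ');&#10;')"/>
  </xsl:template>

  <!-- Callback builder for unmarshalling. -->
  <xsl:template match="tp:unmarshalling">
    <xsl:param name="style"/>
    <xsl:param name="field"/>
    <xsl:if test="$style = 'optional'">
      <xsl:value-of select="concat('if (in.has(&quot;', $field/@name, '&quot;)) ')"/>
    </xsl:if>
    <xsl:value-of select="concat('in.read(', $field/@name, ');&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Switching the entry point to pass document('')/*/tp:unmarshalling would reuse the same director with the other builder. Note that document('') depends on the processor being able to re-read the stylesheet from its base URI, so this sketch assumes the stylesheet is loaded from a file or URL rather than from an in-memory string.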
Discussion
Summary
Acknowledgments
References