Developing Complex XSLT Scripts

XSLT is a declarative language designed for transforming XML documents into documents in any format. In developing large-scale XSLT scripts, software qualities such as flexibility and maintainability become issues. To address those issues, this article will first discuss a design approach that emphasizes a decomposition and recomposition view of transformations; then, a technique for modularizing XSLT scripts will be described; and finally, a few design patterns will be introduced.

Identify Responsibilities and Assign Them to Code Units
The goal of any kind of application design is to divide responsibilities into smaller pieces and conquer them in separate code units. In XSLT, the code units are call-by-name templates, call-by-context templates, and global variables; they are equivalent to functions in structured programming languages. Global variables are equivalent to call-by-name templates without any parameters. Call-by-context templates require callers and callees to share context information. Examples of the three types of code units are shown in Listing 1.
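
As a rough illustration of the three kinds of code unit, here is a minimal XSLT 1.0 sketch. It is not the article's Listing 1: the input structure (a class element with field children) and the names "class-name" and "declare-field" are hypothetical, chosen only for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Global variable: behaves like a call-by-name template without
       parameters. The /class/@name path is a hypothetical input format. -->
  <xsl:variable name="class-name" select="/class/@name"/>

  <!-- Call-by-name template: invoked explicitly with xsl:call-template
       and fed only through its parameters. -->
  <xsl:template name="declare-field">
    <xsl:param name="type"/>
    <xsl:param name="name"/>
    <xsl:value-of select="concat('  ', $type, ' ', $name, ';&#10;')"/>
  </xsl:template>

  <!-- Call-by-context template: selected by pattern matching, so it
       relies on the current node (a field element) set up by its caller. -->
  <xsl:template match="field">
    <xsl:call-template name="declare-field">
      <xsl:with-param name="type" select="@type"/>
      <xsl:with-param name="name" select="@name"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Entry point: assembles a class declaration from the units above. -->
  <xsl:template match="/class">
    <xsl:value-of select="concat('class ', $class-name, ' {&#10;')"/>
    <xsl:apply-templates select="field"/>
    <xsl:text>};&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that the "field" template only works when the caller has made a field element the current node, which is the shared context the text refers to.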

Having identified code units in XSLT, we should study how to divide a complex transformation into smaller ones and assign each to a code unit.

At the most abstract level, due to the stateless nature of XSLT, the responsibility of any XSLT script can be described as transforming a set of input streams into an output stream. Intuitively, outputting a segment of the output stream is a candidate for a smaller responsibility.

Outputting each line in Listing 2 could be a responsibility. However, abstraction is required to identify reusable and semantically significant segments. Identifying those segments, which may not even be contiguous, requires insight into the output stream. A C programmer will readily recognize outputting the conditional expressions as a meaningful responsibility. The template that fulfills this responsibility is implemented in Listing 3.

Identifying smaller responsibilities is a decomposition process, which decomposes an output stream into pieces. Consequently, the assembly or recomposition of those smaller segments into the output stream is necessary. The template in Listing 4 assembles the outputs from the template in Listing 3 and outputs the C++ code in Listing 2. An example XML context while calling the template in Listing 4 is also listed.
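
The split between a segment-producing template and an assembling template can be sketched as follows. This is an illustration under assumptions, not a reproduction of Listings 3 and 4: the input format (a class element whose field children carry a name attribute), the template name "equality-condition", and the shape of the generated C++ are all hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Decomposition: output only the conditional expression,
       e.g. "x == rhs.x && y == rhs.y".
       Assumes the current node is a class element with at least one field. -->
  <xsl:template name="equality-condition">
    <xsl:for-each select="field">
      <xsl:if test="position() &gt; 1"> &amp;&amp; </xsl:if>
      <xsl:value-of select="concat(@name, ' == rhs.', @name)"/>
    </xsl:for-each>
  </xsl:template>

  <!-- Recomposition: decorate the segment above into a complete
       equality operator for the class. -->
  <xsl:template match="class">
    <xsl:value-of select="concat('bool ', @name, '::operator==(const ',
                                 @name, '&amp; rhs) const {&#10;')"/>
    <xsl:text>  return </xsl:text>
    <xsl:call-template name="equality-condition"/>
    <xsl:text>;&#10;}&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For an input such as `<class name="Point"><field name="x"/><field name="y"/></class>`, the assembling template should emit an operator whose body is `return x == rhs.x && y == rhs.y;`. The condition template knows nothing about the surrounding function, so it could be reused wherever the same comparison is needed.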

Decomposition and recomposition are not only design techniques; they also provide an accurate vocabulary for describing the functionality of a code unit. Using this approach, it's important to constantly consider what a code unit does from the perspective of decomposition and recomposition. For example, the template in Listing 3 outputs conditional expressions, which are segments of the implementations of equality operators. The template can be reused elsewhere because it abstracts a common concept. The template in Listing 4 outputs the implementations of equality operators by assembling and decorating the outputs from the template in Listing 3. Figure 1 shows the composition relationship among the templates in Listings 3 and 4 and some other templates; such a diagram is a good way to show design and implementation structure.

The use of decomposition and recomposition in XSLT is very much like the process of functional decomposition in structured programming, which produces finer-grained code units calling each other hierarchically.

Modularize Code Units into Files
Grouping code units into coarse-grained code modules is a basic technique for managing complexity. XSLT has no concept such as the class in object-oriented programming for grouping code units, but files provide natural boundaries. For any stylesheet file, it's crucial for complexity management to be able to state its grouping criterion. The criterion should be recorded in the leading comment of the file. The grouping criterion of a stylesheet file summarizes the characteristics of the code units that are both contained in the file and called from other files. Those code units will be referred to as public code units, as opposed to the ones used only internally.

The commonalities of the input or output of code units are intuitive and effective criteria for grouping. Figure 2 shows the stylesheet files used in a project and their interdependencies. The code units in "class-wrapper.xsl" take "class.xml" as their input; the code units in "intermediate-wrapper.xsl" call the public code units of "class-wrapper.xsl" and "datatype-wrapper.xml"; "c-source-file.xsl" contains the code unit that assembles the segments output by the public code units in "intermediate-wrapper.xsl" and outputs a complete C++ source file, "class.cpp". "class-wrapper.xsl" and "intermediate-wrapper.xsl" use input commonalities as their grouping criteria, while "c-source-file.xsl" uses output commonality.

This grouping approach, along with the decomposition and recomposition view of transformations, results in traceable code: given a responsibility, locating the implementing code unit is easy. First, in the vocabulary of decomposition and recomposition, the input and output of the code unit should be clear. Second, from the input and output, the stylesheet file containing the implementation of the code unit can be identified. Finally, the code unit can be located by reading through the identified stylesheet file. This last step may take a long time if the file is large. It can be made faster by grouping code units within the file and commenting them systematically. Listing 5 shows the content of "intermediate-wrapper.xsl", whose public code units are grouped by the file they will be called from. The comments starting with "Group" mark the boundaries between those groups.

Since there is no mechanism to mark a code unit as public, a naming convention that prefixes the names of public code units with the names of their containing stylesheet files is used. For example, the names of templates in Listing 5 are prefixed with "intermediate-wrapper". The naming convention makes public code units prominent; moreover, it specifies their implementing file, which makes the code more traceable.
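
A file laid out according to these conventions might look like the sketch below. It is not Listing 5: the grouping criterion text, the "." separator after the file-name prefix, and the template names "member-declaration" and "indent" are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!--
  intermediate-wrapper.xsl
  Grouping criterion (hypothetical wording): code units that wrap the
  intermediate representation. Public units carry the
  "intermediate-wrapper." prefix; unprefixed units are private.
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Group: called from c-source-file.xsl -->

  <!-- Public: outputs one member declaration, e.g. "  int count;" -->
  <xsl:template name="intermediate-wrapper.member-declaration">
    <xsl:param name="type"/>
    <xsl:param name="name"/>
    <xsl:call-template name="indent"/>
    <xsl:value-of select="concat($type, ' ', $name, ';&#10;')"/>
  </xsl:template>

  <!-- Group: private helpers (not called from other files) -->

  <xsl:template name="indent">
    <xsl:text>  </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A calling file such as "c-source-file.xsl" would pull this module in with xsl:import or xsl:include and invoke the public template by its prefixed name, so every call site also names the file that implements it.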

Grouping serves to organize code units and to encapsulate them. Besides being the physical shelves of code units, stylesheet files are also units of encapsulation. A stylesheet file exposes its functionality through public code units. Each public unit should fulfill a well-defined responsibility in terms of decomposition and recomposition. Don't expose private data through any public code unit. Call-by-context templates require callers and callees to share context information and so tend to break encapsulation. Therefore, only call-by-name templates and global variables should be used for public code units.

Applying Design Patterns
Grouping code units into files based on the commonalities of their input and output is effective in organizing code. However, applying fundamental software engineering principles and design patterns plays the key role in managing complexity. The two patterns described next are typical and may apply in a variety of contexts.

Intermediate XML Tree
Instead of directly transforming input documents to output documents, transform the input documents to some intermediate XML trees and then transform the intermediate XML trees to the output documents.

Problem
Direct transformation of input documents to output documents can be so complex that it forces the use of convoluted XPath expressions. Using this pattern, the original transformation is divided into two simpler subtransformations: one takes the input documents and creates the intermediate XML trees, while the other takes the intermediate XML trees and generates the output documents.

Another problem is that the formats of input and output documents keep changing throughout the stylesheet development, even though those documents convey relatively stable semantics. With direct transformation, the transforming XSLT scripts need significant modification according to the changes. Using this pattern, the intermediate XML trees convey the stable semantics with stable formats. Because the formats of the intermediate trees are relatively stable, there is less coupling between subtransformations (see Listing 6).

The logical model in Listing 6 defines two data types, Alphanumeric and LastName. Data type LastName is derived from Alphanumeric, adding a max-length facet. The logical model clearly describes the semantics of the data types, but it cannot easily be used for generating code. The intermediate tree, which can be produced from the logical model, includes a semantics element for each data-type element, and this element can be used directly for generating code. The data types with unbounded string semantics could be mapped to the pointer type in C or the String class in Java, while the data type with bounded string semantics could be mapped to arrays in both languages.

Imagine that the structure of the logical model in Listing 6 changes to the structure in Listing 7.

The two logical models convey exactly the same information with different formats. Due to its abstractness, the intermediate tree will remain unchanged. Therefore, the code depending on the intermediate tree is not affected.
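
The two-stage shape of this pattern can be sketched in a single XSLT 1.0 stylesheet. Everything specific here is an assumption for illustration: the logical-model format (datatype elements with optional base and max-length attributes), the intermediate element names, the "bounded-string" and "unbounded-string" values, and the generated typedefs. The sketch also ignores facets inherited from a base type. XSLT 1.0 cannot navigate a result tree fragment directly, so the example relies on the EXSLT `exsl:node-set` extension function, which the processor must support.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <!-- Stage 1: logical model to intermediate tree. Only this stage
       depends on the (hypothetical) input format. -->
  <xsl:variable name="intermediate-fragment">
    <xsl:for-each select="/datatypes/datatype">
      <datatype name="{@name}">
        <semantics>
          <xsl:choose>
            <xsl:when test="@max-length">bounded-string</xsl:when>
            <xsl:otherwise>unbounded-string</xsl:otherwise>
          </xsl:choose>
        </semantics>
        <xsl:if test="@max-length">
          <max-length><xsl:value-of select="@max-length"/></max-length>
        </xsl:if>
      </datatype>
    </xsl:for-each>
  </xsl:variable>

  <!-- Convert the result tree fragment into a navigable node-set. -->
  <xsl:variable name="intermediate"
                select="exsl:node-set($intermediate-fragment)"/>

  <!-- Stage 2: intermediate tree to C typedefs. This stage never
       touches the input document. -->
  <xsl:template match="/">
    <xsl:for-each select="$intermediate/datatype">
      <xsl:choose>
        <xsl:when test="semantics = 'bounded-string'">
          <!-- One extra char for the terminating NUL. -->
          <xsl:value-of select="concat('typedef char ', @name, '[',
                                       max-length + 1, '];&#10;')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="concat('typedef char* ', @name, ';&#10;')"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the logical model were restructured, only the stage 1 variable would need to change; the stage 2 template depends solely on the intermediate tree.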

Discussion
The goal of using intermediate XML trees is to reduce both the complexity of the transformation and the impact of changes to the input or output documents. The intermediate XML trees must reflect the stable semantics of input and output documents with stable structures; moreover, the structures of intermediate XML trees should be easier to use for generating final outputs. Intermediate XML trees are more than temporary structures; they often reflect the most important design abstractions.

Builder
Separate the responsibility for assembling a complex output from its representation. A director template is responsible for parsing and assembling. Different builder templates, which the director template invokes as callbacks, are responsible for creating different representations.

Problem
An input document needs to be transformed into multiple output documents in different formats. The transformation to each format calls for the same parsing and assembling process. Using this pattern, the director template encapsulates the parsing and assembling process. The builder templates can therefore focus on the representations of their output documents without knowing the assembling process or depending on the structure of the input document.

Following the builder pattern, the code in Listing 10 realizes the transformation from the input document in Listing 8 to the output documents in Listing 9.

When the director template in Listing 10 is invoked, the element node "tp:marshalling", which uniquely identifies the callback builder template for marshalling, should be passed in. Refer to "The Functional Programming Language XSLT - A proof through examples" for how to treat templates as first-class data. The director template invokes the callback builder templates at certain points while parsing the input document. Both callback builder templates take a parameter, "style", which is set to either "optional" or "field" depending on the parsing context in which the templates are invoked. The callback builder templates output C++ instructions according to the value of this parameter.
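
The callback mechanism can be sketched as follows. This is a simplified illustration, not Listing 10: the namespace URI bound to "tp", the input format (a message element whose children are named field or optional), the "field-name" parameter, and the emitted C++ calls are hypothetical. Only the "tp:marshalling" marker and the "style" parameter come from the text. The marker elements are reached through `document('')`, which loads the stylesheet itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:tp="urn:example:template-pointers"
                exclude-result-prefixes="tp">
  <xsl:output method="text"/>

  <!-- Marker elements: each node identifies one builder template. -->
  <tp:marshalling/>
  <tp:unmarshalling/>

  <!-- Director: walks the input and calls back into whichever builder
       the $builder node selects. It knows nothing about the output. -->
  <xsl:template name="director">
    <xsl:param name="builder"/>
    <xsl:for-each select="/message/*">
      <xsl:apply-templates select="$builder">
        <xsl:with-param name="field-name" select="@name"/>
        <!-- "field" or "optional", taken from the element name. -->
        <xsl:with-param name="style" select="name()"/>
      </xsl:apply-templates>
    </xsl:for-each>
  </xsl:template>

  <!-- Builder for marshalling code. -->
  <xsl:template match="tp:marshalling">
    <xsl:param name="field-name"/>
    <xsl:param name="style"/>
    <xsl:if test="$style = 'optional'">
      <xsl:value-of select="concat('if (has_', $field-name, ') ')"/>
    </xsl:if>
    <xsl:value-of select="concat('write(', $field-name, ');&#10;')"/>
  </xsl:template>

  <!-- Builder for unmarshalling code. -->
  <xsl:template match="tp:unmarshalling">
    <xsl:param name="field-name"/>
    <xsl:param name="style"/>
    <xsl:if test="$style = 'optional'">
      <xsl:value-of select="concat('if (has_', $field-name, ') ')"/>
    </xsl:if>
    <xsl:value-of select="concat('read(', $field-name, ');&#10;')"/>
  </xsl:template>

  <!-- Entry point: pass the marker node that selects the builder. -->
  <xsl:template match="/">
    <xsl:call-template name="director">
      <xsl:with-param name="builder"
                      select="document('')/*/tp:marshalling"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Switching the select expression to `document('')/*/tp:unmarshalling` would produce the other representation without any change to the director.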

Discussion
This pattern is very similar to the object-oriented Builder pattern discussed in "Design Patterns: Elements of Reusable Object-Oriented Software." The ability to treat templates as first-class data types makes some of the techniques used in object-oriented or functional programming applicable to XSLT programming.

Summary
The decomposition and recomposition view of XSLT scripts is essential. It is always possible to describe a transformation in terms of decomposition and recomposition steps, no matter what advanced design patterns are used. XSLT lacks built-in constructs for modularization and encapsulation. The commonalities of the inputs and outputs of code units are intuitive criteria for modularization. A few coding and naming conventions enforce modularization and encapsulation. Intermediate XML Tree is a powerful pattern for simplifying problems, and it also results in code that adapts easily to change. The Builder pattern shows how to treat templates as first-class data types and how to borrow ideas from the object-oriented world.

Acknowledgments
Special thanks to Mario Aquino for his review and excellent comments.

References

  • Gamma, Erich; Helm, Richard; Johnson, Ralph; and Vlissides, John. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.
  • Novatchev, Dimitre. "The Functional Programming Language XSLT - A proof through examples": www.topxml.com/xsl/articles/fp/1.asp

About Yuhang Sun
Yuhang Sun is a software engineer for Object Computing, Inc. He is the major XML/XSLT developer for a project that's successful in applying XML-related technologies
