Java Developer's Journal: A Blueprint For Developing Language Tools
A proven approach to making them modular, extensible, and maintainable

Language tools such as compilers, interpreters, and code generators are a critical part of the software development landscape. Any software project will include several procured tools and very likely several in-house tools. Experience shows that the only guarantee with such tools is change: the underlying language may change through improvements or extensions, and the functionality provided by the tool will expand, driven by user-requested features and the need to stay ahead of the competition. The specific changes are rarely known at the outset, but change is coming.

This implies that extensibility is paramount when designing such a tool, so it's important that the design is modular. A clean division of responsibilities is needed to support maintainability, which in turn is needed to support the rapid pace of change typically associated with language tools.

This article describes a proven approach to developing language-based tools in a way that is modular, extensible, and maintainable. The approach is based on two principles: establishing the core modules at the outset and using the visitor pattern to interact with language sentences.

To illustrate the approach, a simple calculator example is used. This calculator supports the addition, subtraction, multiplication, and division of integers. The language supported by this calculator is informally described below:

expression = constant |
   expression op expression
constant = 0 | 1 | 2 | ...
op = + | - | * | /

This language is very simple, but it's sufficient to illustrate the main concepts related to the development of language tools. The example is developed in Java, based on the JavaCC parser generator. However, the concepts presented are language-independent and apply equally to C++ and C#. All of the code shown in this article is available for download.

A key objective when developing a language tool is to ensure that the tool isn't dependent on the actual representation of the language. This allows simple support for multiple language formats, imports from other tools, and so on. To meet this objective, a distinction is made between concrete and abstract syntax, as described below. The notion of context is then introduced, followed by a description of the actual mechanism for parsing. Finally, the use of the generated parse tree is explained.

Concrete Syntax
Concrete syntax represents the way in which a specific file format represents the input for the tool. This is often described using BNF or a similar structured representation. If a parser generator such as Lex/Yacc, ANTLR, or JavaCC is used, the concrete syntax will be described in a generator-specific manner. A simple concrete syntax for the calculator using JavaCC is shown in Listing 1.

Note that normally semantic actions would be included in such a JavaCC description. These are presented later in the article.

In general there can be several concrete syntaxes for a language (plain text, XML, RTF, etc.). The tool design should be sufficiently flexible to support multiple concrete syntaxes (as well as the ability to add further concrete syntaxes) without having a major impact on the rest of the tool.

Abstract Syntax
Abstract syntax is a representation of the language that's independent of the concrete syntax. The abstract syntax representation contains only information that's necessary for the tool to perform its task; any other information is discarded. In principle this necessary information should be included in all concrete syntaxes. So there should only be one abstract syntax regardless of the number of concrete syntaxes. In an OO setting, an abstract syntax is typically a tree-structure that reflects the way in which language sentences can be constructed. The abstract syntax fulfils a number of functions. It:

  • Represents the program
  • Stores any necessary information related to the concrete representation (e.g., for pretty printing, relating error messages to specific file locations, etc.). This is explored further in the next section.
  • Exposes the above information to tools without revealing implementation details.
The tree-like structure of the abstract syntax is represented using inheritance; we use context information objects to store the necessary information from the concrete representation; exposing information to tools without revealing implementation is achieved with interfaces. And since the abstract syntax has a tree-like structure, tools will access it using a visitor pattern. This basic structure is shown in Figure 1.

Notice that the package arrangement in Figure 1 follows the convention used in Eclipse that dictates that classes in a package named intern are not to be exposed to other tools.
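
The structure described above can be sketched in Java roughly as follows. This is a minimal illustration rather than the article's downloadable code: all type names (Expression, Constant, BinaryExpression, ExpressionVisitor, the Impl classes, and Evaluator) are assumptions chosen for this sketch, and everything sits in one file for brevity, whereas the implementation classes would normally live in a package named intern.

```java
// Public interfaces: the only types a client tool is meant to see.
// All names here are hypothetical, chosen for this sketch.
interface Expression {
    <R> R accept(ExpressionVisitor<R> visitor);
}

interface Constant extends Expression {
    int getValue();
}

interface BinaryExpression extends Expression {
    Expression getLeft();
    char getOperator(); // one of + - * /
    Expression getRight();
}

interface ExpressionVisitor<R> {
    R visitConstant(Constant constant);
    R visitBinary(BinaryExpression binary);
}

// Implementation classes: in the layout described above these would be
// hidden in an "intern" package and never referenced by client tools.
final class ConstantImpl implements Constant {
    private final int value;

    ConstantImpl(int value) { this.value = value; }

    public int getValue() { return value; }

    public <R> R accept(ExpressionVisitor<R> visitor) {
        return visitor.visitConstant(this);
    }
}

final class BinaryExpressionImpl implements BinaryExpression {
    private final Expression left;
    private final char operator;
    private final Expression right;

    BinaryExpressionImpl(Expression left, char operator, Expression right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    public Expression getLeft() { return left; }
    public char getOperator() { return operator; }
    public Expression getRight() { return right; }

    public <R> R accept(ExpressionVisitor<R> visitor) {
        return visitor.visitBinary(this);
    }
}

// A client tool written purely against the interfaces: it evaluates a tree.
final class Evaluator implements ExpressionVisitor<Integer> {
    public Integer visitConstant(Constant constant) {
        return constant.getValue();
    }

    public Integer visitBinary(BinaryExpression binary) {
        int left = binary.getLeft().accept(this);
        int right = binary.getRight().accept(this);
        switch (binary.getOperator()) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right; // integer division; throws on zero
            default:
                throw new IllegalStateException("Unknown operator: " + binary.getOperator());
        }
    }
}

final class AstDemo {
    public static void main(String[] args) {
        // Tree for 2 + (3 * 4), built by hand instead of by a parser.
        Expression tree = new BinaryExpressionImpl(
                new ConstantImpl(2), '+',
                new BinaryExpressionImpl(new ConstantImpl(3), '*', new ConstantImpl(4)));
        System.out.println(tree.accept(new Evaluator())); // prints 14
    }
}
```

A further tool, such as a pretty printer, would be one more ExpressionVisitor implementation; the node classes would not need to change.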

Context Information
The idea of creating abstract syntax is that the tool will use these objects to achieve its goals. This means that the tool is not dependent on the specific concrete format being used. However, this separation can be problematic since in some situations information from the concrete representation is actually needed. For instance, a type checker might generate an error message that has to be displayed to the user. For this message to be of value, the specific location of the error has to be provided.

The solution to this problem is to associate each abstract syntax node with an object defining the context of that node. This might, for example, be the start and end line and column for the node; the information required here may vary according to the nature of the language, the tool, and the concrete format in question. As with the abstract syntax, client tools should be shielded from the implementation details of context information, so an interface is used and classes implementing this interface are internal. It's possible that there may be multiple context information classes according to the concrete representation. This is suggested in Figure 2 where a plain text context information class is used.
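
As a rough sketch of this idea, the following Java fragment defines a context information interface, a plain text implementation of it, and a helper that turns a context object into an error message. The names ContextInfo, PlainTextContextInfo, and Diagnostics, as well as the message format, are assumptions made for this illustration and are not taken from the article's figures.

```java
// Interface seen by client tools; implementations stay internal.
// All names and the message format are hypothetical.
interface ContextInfo {
    int getStartLine();
    int getStartColumn();
    int getEndLine();
    int getEndColumn();
}

// Context information for a plain text concrete syntax: a source range.
final class PlainTextContextInfo implements ContextInfo {
    private final int startLine;
    private final int startColumn;
    private final int endLine;
    private final int endColumn;

    PlainTextContextInfo(int startLine, int startColumn, int endLine, int endColumn) {
        this.startLine = startLine;
        this.startColumn = startColumn;
        this.endLine = endLine;
        this.endColumn = endColumn;
    }

    public int getStartLine() { return startLine; }
    public int getStartColumn() { return startColumn; }
    public int getEndLine() { return endLine; }
    public int getEndColumn() { return endColumn; }
}

// Example of a tool using context information to locate an error.
final class Diagnostics {
    private Diagnostics() { }

    static String error(ContextInfo where, String message) {
        return "Error at line " + where.getStartLine()
                + ", column " + where.getStartColumn()
                + ": " + message;
    }
}

final class ContextDemo {
    public static void main(String[] args) {
        ContextInfo where = new PlainTextContextInfo(3, 5, 3, 9);
        // prints: Error at line 3, column 5: division by zero
        System.out.println(Diagnostics.error(where, "division by zero"));
    }
}
```

In a full tool each abstract syntax node would hold a reference to such an object and expose it through the node's interface, so that a type checker sees only ContextInfo and never the plain text class.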

Parsing
To create a parser, the concrete syntax presented earlier needs to be married with the abstract syntax classes. This is shown in Listing 2. Note that there will typically be one parser for each concrete syntax supported.

This example is based on JavaCC, but the principle applies to other parser generators: the semantic actions in the matching rules in the parser definition are used to create and instantiate abstract syntax objects, resulting in the creation of an abstract syntax tree corresponding to the input text. This abstract syntax tree will be the input to other components in the tool that require the input text.
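
To make the principle concrete without depending on generated code, here is a small hand-written recursive-descent parser in Java. It is a stand-in for, not a reproduction of, the JavaCC parser in Listing 2: each parsing method plays the role of a matching rule, and the object creation inside it plays the role of a semantic action. The class names (Node, Num, BinOp, CalcParser) are hypothetical, the tree is a pared-down one so that the sketch stands alone, and the usual operator precedence and left associativity are assumed because the informal grammar leaves them unspecified.

```java
// Pared-down abstract syntax used only by this sketch (hypothetical names).
abstract class Node { }

final class Num extends Node {
    final int value;

    Num(int value) { this.value = value; }

    @Override public String toString() { return Integer.toString(value); }
}

final class BinOp extends Node {
    final Node left;
    final char op;
    final Node right;

    BinOp(Node left, char op, Node right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }

    @Override public String toString() {
        return "(" + left + " " + op + " " + right + ")";
    }
}

final class CalcParser {
    private final String input;
    private int pos = 0;

    private CalcParser(String input) { this.input = input; }

    // Parses the whole text and returns the root of the abstract syntax tree.
    static Node parse(String text) {
        CalcParser parser = new CalcParser(text);
        Node root = parser.expression();
        parser.skipSpaces();
        if (parser.pos < parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + parser.pos);
        }
        return root;
    }

    // expression = term (("+" | "-") term)*
    private Node expression() {
        Node left = term();
        while (atOperator('+', '-')) {
            char op = input.charAt(pos++);
            left = new BinOp(left, op, term()); // "semantic action": build a node
        }
        return left;
    }

    // term = constant (("*" | "/") constant)*
    private Node term() {
        Node left = constant();
        while (atOperator('*', '/')) {
            char op = input.charAt(pos++);
            left = new BinOp(left, op, constant());
        }
        return left;
    }

    // constant = one or more decimal digits
    private Node constant() {
        skipSpaces();
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Constant expected at offset " + pos);
        }
        return new Num(Integer.parseInt(input.substring(start, pos)));
    }

    // True if, after skipping spaces, the next character is one of the two operators.
    private boolean atOperator(char first, char second) {
        skipSpaces();
        return pos < input.length()
                && (input.charAt(pos) == first || input.charAt(pos) == second);
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }
}

final class ParserDemo {
    public static void main(String[] args) {
        System.out.println(CalcParser.parse("1 + 2 * 3")); // prints (1 + (2 * 3))
    }
}
```

A second concrete syntax would get its own parser class that builds the same node types, leaving the tools that consume the tree untouched.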

About Paul Mukherjee
Paul Mukherjee works as a consultant for Systematic Software Engineering, and is a Sun Certified Java Programmer and Sun Certified Java Developer. In his role as a consultant he is used to helping make projects successful, and he also tries to help the individual members of a project become better at what they do.
