Are Elements And Attributes Interchangeable?

In designing markup languages, one of the first questions customers ask religiously is, "Do you prefer elements or attributes?" In fact, if you examine many of the current markup languages on the Internet, you often see a strict schism between those that use mostly (or even exclusively) attributes and those that use mostly or exclusively elements.

I believe this should not be an either/or proposition because elements and attributes are not interchangeable even though they have similarities. This article will examine why this problem exists, the nature of elements and attributes, and a logical set of rules to determine when each is appropriate to use.

Element-Centric Approach
Table 1 compares an element-centric approach to an attribute-centric approach on a canonical example.

The problem stems from a collision of two opposing biases. On one side is a bias toward elements, rooted in simplicity, consistency, and coverage. On the other is a historical bias from DTDs, which favor attributes for validation. The bias toward elements begins with the learning sequence.

You must learn elements before you learn attributes because attributes are part of an element. Therefore, you cannot have attributes without elements, but you can have elements without attributes. This is an important axiom since it is the basis of an element-centric bias: elements don't need attributes but attributes need elements. In other words, while it is possible to use an element-only approach, it's not possible to use an attribute-only approach.

This brings us to the second reason for an element-centric bias: consistency. Since a markup language must have elements, it is more consistent to express characteristics as child elements as well, so that every piece of information uses the same construct.

The final bias, tutorial book coverage, is a direct result of the first two. Authors teaching XML devote more space to elements than to attributes and therefore tend to use an element-centric approach in most of their XML examples. As an illustration, Elliotte Rusty Harold states on page 101 of the XML Bible, "When in doubt, put the info in the elements."

Attribute-Centric Approach
Now let's turn to the other side of the coin: a historical bias toward using attributes for limited validation. Attributes can have a primitive type (although the XML 1.0 types are all text-based) and can be constrained to a set of enumerated values. An element with a single text node is not typed and has no constraints (e.g., the #PCDATA content model). Thus, historically, for a markup language to enforce validation of instance values, the best choice was attributes. This historical bias was erased by the advent of XML Schema, whose strong data typing and constraint facilities apply equally to elements and attributes.

Let's begin with formal definitions from the XML 1.0 specification (second edition), dated October 6, 2000. The definitions are, "Each element has a type, identified by name, sometimes called its 'generic identifier' (GI), and may have a set of attribute specifications. Each attribute specification has a name and value." The specification goes on to say that "An element type declaration constrains the element's content," and an element's content is "the text between the start-tag and end-tag." This formal specification states that an element's type has only a single dimension: membership defined by the content model. Attribute types, on the other hand, are "of three kinds: a string type, a set of tokenized types, and enumerated types." The formal definitions thus give us two distinct notions of type: an element type, which is a content model, and an attribute type, which can be one of a set of primitive types.

The confusion arises from the different nature of an element as the root and branch of a tree versus its nature as a leaf node. An element as a leaf node can be one of two varieties: an empty element or an element containing a single text node. Thus we actually have four distinct categories of an element's type: subelement content, mixed content (subelements and text), empty, and text-only. The last one in the list overlaps the functionality (though not exactly) of an attribute value of type CDATA. Therein lies the problem, confusion, and assumption of interchangeability.
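The four categories can be made concrete with a short sketch using Python's standard `xml.etree.ElementTree`. The function name `content_category` and the sample documents are hypothetical, and the sketch treats white-space-only text as insignificant, which is an assumption (a strict reading of mixed content would keep it). The last lines show the overlap described above: a text-only element and a CDATA attribute can carry exactly the same string.

```python
import xml.etree.ElementTree as ET


def content_category(elem):
    """Classify an element's content into one of four categories.
    Assumption: white-space-only text does not count as text."""
    has_children = len(elem) > 0
    # Text may appear before the first child (elem.text) or after any child (child.tail).
    text_parts = [elem.text or ""] + [child.tail or "" for child in elem]
    has_text = any(part.strip() for part in text_parts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "subelement"
    if has_text:
        return "text-only"
    return "empty"


for source in ("<window><button/></window>", "<p>Click <b>here</b> now</p>",
               "<button/>", "<height>400</height>"):
    print(f"{source:32} {content_category(ET.fromstring(source))}")

# A text-only element and a CDATA attribute yield the same string value,
# which is where the assumption of interchangeability comes from.
as_element = ET.fromstring("<window><height>400</height></window>").findtext("height")
as_attribute = ET.fromstring('<window height="400"/>').get("height")
print(as_element == as_attribute)
```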

Design Rules
This schism should not exist. Attributes and elements are sufficiently different to develop judgment criteria to determine which to use under different circumstances. I will cover eight design rules for using attributes and elements. First, I'll present a brief listing of all eight rules and then explain each rule in detail and provide an example.

  1. For containment, you must use elements.
  2. For characteristics where white space may be significant and the value could be a multiline string or paragraph, use elements.
  3. For DTDs, to constrain an instance value to an enumerated or primitive type, use an attribute.
  4. For repeating identical parts (homogeneous aggregation), use child elements.
  5. For DTDs, to set a default or fixed value, you must use an attribute.
  6. For common composite characteristics, use global attributes.
  7. For performance-sensitive applications, you can reduce the size of large XML files by using attributes for an object's characteristics and parts.
  8. To reference an object part or characteristic using an IDREF, you must use an element.
Now let's examine each rule in detail.
  • Rule 1:
    For containment, you must use elements. How do you determine when you need containment? When examining the pieces of information your document will contain, you can separate them into one of three categories: objects, object characteristics, or object parts.

    The top-level objects are normally obvious, usually nouns, and form the root and top-level branches of your document tree. Examples of objects are "window," "body," and "borrower." As in object-oriented programming, these are often modeled after real-world objects. The distinction between part and characteristic is key. A characteristic is commonly represented via an attribute, as it describes a facet of the object and has little or no semantic meaning if isolated from the object. For example, an XML User Interface Language (XUL) window element has height and width characteristics.

    <window height="400" width="600">
    On the other hand, an object part can be semantically separated from its parent object. Therefore, except for stringent and validated performance constraints (see Rule 7), use a child element to express object parts. For example, a button can be placed inside a window as a part yet be semantically independent. Here is a simple XUL window that contains a button:
    <window height="400" width="600">
      <button label="My Button" />
    </window>
    In modeling terms, a characteristic relates to its parent object via composition while a part is related via aggregation. While this division of characteristic versus part is a good rule of thumb, the next rule covers an exception.
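To see how a consuming application experiences this split, here is a minimal sketch using Python's `xml.etree.ElementTree` on the XUL window above (without the XUL namespace, as in the original fragment). Characteristics arrive through the element's attribute map. Parts arrive as child elements that can be detached and still make sense on their own. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XUL = '<window height="400" width="600"><button label="My Button"/></window>'

window = ET.fromstring(XUL)

# Characteristics: facets of the window, read through its attribute map.
size = (int(window.get("height")), int(window.get("width")))

# Parts: child elements that remain meaningful when detached from the window.
detached = window.findall("button")[0]
window.remove(detached)

print(size)                   # the window's own characteristics
print(detached.get("label"))  # the button still carries its own data
print(len(window))            # the window no longer contains the button
```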

  • Rule 2:
    For characteristics where white space may be significant and the value is a multiline string or several paragraphs, use a child element. The reason for this exception is that attribute values are normalized: one of the normalization rules replaces each line break or tab in the value with a space. For example, consider the description attribute in the following element:
    <product name="Teddy Bear"
    description="Soft cuddly fur.
    Button nose. 3 feet tall. Double
    stitching on all seams. Guaranteed to
    last more than 5 years.
    A great gift for children and a
    favorite among hospitals nationwide.">
    Its value would be normalized to: "Soft cuddly fur. Button nose. 3 feet tall. Double stitching on all seams. Guaranteed to last more than 5 years. A great gift for children and a favorite among hospitals nationwide."

    Therefore, if the paragraph divisions are useful to the receiving application, a subelement should be used like this:

    <product name="Teddy Bear">
    <description>
    Soft cuddly fur. Button nose. 3 feet
    tall. Double stitching on all seams.
    Guaranteed to last more than 5 years.
    A great gift for children and a
    favorite among hospitals nationwide.
    </description>
    </product>
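The normalization difference is easy to observe with Python's `xml.etree.ElementTree`, which uses the non-validating expat parser. This sketch uses a shortened version of the Teddy Bear description. The constant names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same two-line description, once as an attribute and once as element content.
AS_ATTRIBUTE = """<product name="Teddy Bear" description="Soft cuddly fur.
Button nose. 3 feet tall."/>"""

AS_ELEMENT = """<product name="Teddy Bear"><description>Soft cuddly fur.
Button nose. 3 feet tall.</description></product>"""

attr_value = ET.fromstring(AS_ATTRIBUTE).get("description")
elem_value = ET.fromstring(AS_ELEMENT).findtext("description")

# Attribute-value normalization replaced the line break with a space...
print(repr(attr_value))
# ...while the element's text node keeps it.
print(repr(elem_value))
```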
  • Rule 3 (DTD-Specific):
    To constrain an instance value to an enumerated or primitive type, use an attribute. For example, to constrain a value of a color characteristic to the values "Red," "Green," or "Blue," you would declare an attribute like this:
    <!ATTLIST canvas color (Red | Green | Blue) #REQUIRED >
    With XML Schema, you instead declare a named simple type, which can then be attached to either an element or an attribute:
    <xsd:simpleType name="RGB">
      <xsd:restriction base="xsd:NMTOKEN">
        <xsd:enumeration value="Red"/>
        <xsd:enumeration value="Green"/>
        <xsd:enumeration value="Blue"/>
      </xsd:restriction>
    </xsd:simpleType>
    Since Schema allows you to declare a type for either an attribute or element, this rule applies only to the use of DTDs. Here's how to declare an element with the RGB type.
    <xsd:element name="Color" type="RGB" />
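A DTD enumeration constrains values only when a validating parser checks it. Python's standard-library expat parser does not validate. The sketch below therefore uses a hypothetical helper, `find_enum_violations`, that collects enumerations from the internal DTD subset through expat's `AttlistDeclHandler` and checks attribute values itself. It illustrates what the `ATTLIST` declaration for `canvas` above constrains. It is not a full validator.

```python
import xml.parsers.expat


def find_enum_violations(xml_text):
    """Hypothetical helper: report attribute values that fall outside an
    enumeration declared in the internal DTD subset. expat does not
    validate, so the check is done by hand."""
    enums = {}        # (element name, attribute name) -> set of allowed values
    violations = []   # (element name, attribute name, offending value)

    def on_attlist(elname, attname, type_, default, required):
        # expat reports an enumerated type as a parenthesized, "|"-separated list.
        if type_ and type_.startswith("("):
            allowed = {value.strip() for value in type_.strip("()").split("|")}
            enums[(elname, attname)] = allowed

    def on_start(name, attrs):
        for attname, value in attrs.items():
            allowed = enums.get((name, attname))
            if allowed is not None and value not in allowed:
                violations.append((name, attname, value))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return violations


DTD = "<!DOCTYPE canvas [<!ATTLIST canvas color (Red | Green | Blue) #REQUIRED >]>"
print(find_enum_violations(DTD + '<canvas color="Green"/>'))
print(find_enum_violations(DTD + '<canvas color="Purple"/>'))
```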
  • Rule 4:
    For repeating identical parts (homogeneous aggregation), you must use child elements. The XML 1.0 specification states: "No attribute name may appear more than once in the same start-tag or empty-element tag." Therefore, the only way to have multiple attributes with the same semantic meaning would be to differentiate their names with a one-up number, such as button, button1, button2, and so on. Attaching a count to an attribute name is poor design, on the order of using a wrench to pound nails. Instead, use child elements for repeating parts. For example, a XUL menupopup may contain any number of menuitem elements, like this:
    <menupopup>
      <menuitem label="New" />
      <menuitem label="Open" />
      <menuitem label="Exit" />
    </menupopup>
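Every conforming parser enforces the well-formedness constraint quoted above. A short sketch with Python's `xml.etree.ElementTree` shows this. Repeated `menuitem` children parse normally. A start tag that repeats the `label` attribute is rejected. That second input is deliberately broken and purely illustrative.

```python
import xml.etree.ElementTree as ET

MENU = """<menupopup>
  <menuitem label="New"/>
  <menuitem label="Open"/>
  <menuitem label="Exit"/>
</menupopup>"""

# Repeating parts as child elements: any number, kept in document order.
labels = [item.get("label") for item in ET.fromstring(MENU).findall("menuitem")]
print(labels)

# Repeating an attribute name in one start tag is a well-formedness error.
try:
    ET.fromstring('<menupopup label="New" label="Open"/>')
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```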
  • Rule 5 (DTD-Specific):
    To set a default value or a fixed value, you must use an attribute, because in a DTD only the attribute-list declaration lets you specify a default or fixed value. For example, let's add a default value to our previous color example (Rule 3):
    <!ATTLIST canvas color ( Red | Green | Blue ) "Red" >
    The above example sets Red as the default value. With XML Schema, a default value is specified with the default attribute and a fixed value with the fixed attribute, and both work on element and attribute declarations alike:
    <xsd:element name="Color" type="RGB" default="Red" />
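Even non-validating parsers must supply attribute defaults declared in the internal DTD subset, so the default in the DTD declaration above is visible without validation. This minimal sketch uses Python's standard-library expat binding. The helper name `root_attributes` is hypothetical. It shows the defaulted `color` being supplied. It also shows how expat's `specified_attributes` flag reveals only what the instance document actually wrote.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE canvas [
  <!ATTLIST canvas color ( Red | Green | Blue ) "Red" >
]>
<canvas/>"""


def root_attributes(xml_text, specified_only=False):
    """Hypothetical helper: return the attributes expat reports for the root element.
    With specified_only=True, attributes supplied by DTD defaults are omitted."""
    reported = []

    def on_start(name, attrs):
        reported.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.specified_attributes = specified_only
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return reported[0]  # the first start tag is the root element


print(root_attributes(DOC))                       # color filled in from the DTD default
print(root_attributes(DOC, specified_only=True))  # nothing was written in the instance
print(root_attributes(DOC.replace("<canvas/>", '<canvas color="Blue"/>')))
```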
  • Rule 6:
    Common composite characteristics should be expressed with global attributes. In a DTD, you express a global attribute by declaring a parameter entity and then referencing it (as %ename;) in each attribute-list declaration that needs it:
    <!ENTITY % ename "name CDATA #REQUIRED">
    In XML Schema, you instead create a base type and extend it. Listing 1 shows an example of type extension from XML Schema Part 0: Primer.

    Once again, since Schema's type system applies to both elements and attributes, this rule applies only to DTDs.

  • Rule 7 (Performance):
    To reduce the size of large XML files, use attributes for both characteristics and parts. Attributes are more space-efficient than elements since they do not require end tags. A note of caution here: do not assume your application needs high performance. An attribute-centric approach trades flexibility for speed and size.
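To put a rough number on the space savings, this Python sketch serializes the same records both ways with `xml.etree.ElementTree` and compares character counts. The records, field names, and helper functions are hypothetical. Actual savings depend on tag-name length and on the data.

```python
import xml.etree.ElementTree as ET


def element_centric(records):
    root = ET.Element("borrowers")
    for record in records:
        borrower = ET.SubElement(root, "borrower")
        for name, value in record.items():
            ET.SubElement(borrower, name).text = value  # start tag + text + end tag
    return ET.tostring(root, encoding="unicode")


def attribute_centric(records):
    root = ET.Element("borrowers")
    for record in records:
        ET.SubElement(root, "borrower", record)  # name="value" pairs, no end tags
    return ET.tostring(root, encoding="unicode")


records = [{"id": str(i), "name": f"Borrower {i}", "state": "VA"} for i in range(1000)]

elements = element_centric(records)
attributes = attribute_centric(records)
print(len(elements), len(attributes), f"{1 - len(attributes) / len(elements):.0%} smaller")
```

Per the caution above, the smaller form gives up flexibility. For example, it cannot hold repeated parts or multiline text (Rules 2 and 4).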

  • Rule 8:
    To reference a part via an IDREF, you must use an element, because an IDREF can only point to an element that carries an attribute of type ID; it cannot point to an attribute. When to use IDREFs versus containment will be the focus of a later article. Besides IDREFs, there are many XPath expressions for selecting elements. As with Rule 7, considering Moore's law, it's often better to choose flexibility over performance.

    A simple example of an IDREFS attribute referring to elements is shown in Listing 2.
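As with enumerations, a non-validating parser does not check ID/IDREF links for you. The Python sketch below reads the ID and IDREF/IDREFS declarations from the internal DTD subset through expat and resolves each reference. The helper `resolve_idrefs` and the `library` document are hypothetical, and the content of Listing 2 is not reproduced here. Every target the helper finds is an element, which is the point of Rule 8.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE library [
  <!ATTLIST book id ID #REQUIRED>
  <!ATTLIST shelf books IDREFS #IMPLIED>
]>
<library>
  <book id="b1"/>
  <book id="b2"/>
  <shelf books="b1 b2"/>
</library>"""


def resolve_idrefs(xml_text):
    """Hypothetical helper: map each IDREF/IDREFS token to the name of the
    element carrying the matching ID attribute."""
    id_attrs, ref_attrs = set(), set()  # (element name, attribute name) pairs
    targets = {}                        # ID value -> element name
    references = []                     # (referring element, ID value)

    def on_attlist(elname, attname, type_, default, required):
        if type_ == "ID":
            id_attrs.add((elname, attname))
        elif type_ in ("IDREF", "IDREFS"):
            ref_attrs.add((elname, attname))

    def on_start(name, attrs):
        for attname, value in attrs.items():
            if (name, attname) in id_attrs:
                targets[value] = name
            elif (name, attname) in ref_attrs:
                # An IDREFS value is a white-space-separated list of ID values.
                references.extend((name, token) for token in value.split())

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    # Resolve after parsing so that forward references also work.
    resolved = []
    for referrer, token in references:
        if token not in targets:
            raise ValueError(f"IDREF {token!r} on <{referrer}> matches no ID")
        resolved.append((referrer, token, targets[token]))
    return resolved


print(resolve_idrefs(DOC))
```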

Conclusion
Elements and attributes are not interchangeable. Ignore the bias of some and the pressure to choose one or the other; instead, use the guidelines above to determine which is suitable for your markup language.
About Michael Daconta
Michael C. Daconta is the director of Web and technology services for McDonald Bradley, Inc., where he conducts training seminars and develops advanced systems with Java, JavaScript, and XML. Over the past 15 years, Daconta has held every major development position, including chief scientist, technical director, chief developer, team leader, systems analyst, and programmer. He is a Sun-certified Java programmer and coauthor of Java Pitfalls (John Wiley & Sons, 2000), Java 2 and JavaScript for C and C++ Programmers (John Wiley & Sons, 1999), and XML Development with Java 2 (Sams Publishing, 2000). In addition, he is the author of C++ Pointers and Dynamic Memory Management (John Wiley & Sons, 1995).
