Semantic Technology
An extensible solution to enterprise data integration

All organizations, including multinational corporations and government agencies, face a common problem of enterprise data integration. The most obvious large-scale sources of the problem are mergers and acquisitions. When a large company is formed from other pieces, each piece brings with it its own data, in its own form.

But the problem isn't restricted to large conglomerates. All businesses have information trapped in a wide variety of forms, including e-mail, spreadsheets, Web pages and a variety of proprietary sources. More than ever, it's difficult for a business to know what it knows.

This isn't a new problem, and a variety of enterprise data integration products have been on the market for a number of years to address it. But the problem remains. Why? All too often, a new enterprise integration solution is designed for a particular use, and while it's successful in that context, it fails to be extensible to other uses. In extreme cases, the need for integrated information has evolved so much that the new integrated system is obsolete before it even goes live. The integrated enterprise system simply becomes yet another information source, competing now with the originals. The problem has become bigger, not smaller.

The solution to enterprise data integration can't just be another data system added into the enterprise mix. It has to be a living, extensible network of information in the enterprise. In short, it has to work like the Web.

The Semantic Web & Enterprise Data
The World Wide Web Consortium (W3C) has been working on the problem of extending the Web we know today into a Web of Data. This new generation of the Web - dubbed the Semantic Web by W3C and sometimes known as Web 3.0 - harnesses the power of the World Wide Web for managing data. Many of the features that are now familiar to us from the Web are directly relevant to the real problem of enterprise data integration:

  • The Web is extensible, and not just by its designers
  • Anyone can refer to any Web resource
  • Information in any format is available

Agreement & the Semantic Web
A common misconception about the Semantic Web is that it is based on a universal agreement about the meaning of terms. Indeed, if we could get everyone in the world to agree on what the word customer means, then information integration on a worldwide scale would be greatly simplified. But this is an unrealistic expectation. Different companies, and even different workers in a single company, have legitimate and differing notions about what even such a basic word as customer means.

Consider two branch offices of one company, one in Kansas City and one in New York, each with its own notion of customer. While it is unrealistic to ask the two branches to agree to use the word the same way, it's not unrealistic to discuss which use is more general, and how. But before we can even say, "The Kansas City office uses the word customer in a more specific way than the New York office does," we have to be able to refer to "The Kansas City office's use of the word customer" and "The New York office's use of the word customer." The Semantic Web provides agreement just at this level - agree on how to refer to your terms, so that you can discuss how you agree and disagree on their meaning.

This kind of agreement is achieved by having a single global reference for everything. This may seem like an ambitious goal, but it's in fact the part of the Semantic Web that's borrowed lock, stock, and barrel from the current Web, where entities are identified and managed with identifiers called URIs (a generalization of the familiar URLs we use in Web browsers every day). The URI is the key to the extensibility of the World Wide Web we know today, and serves as the basis for the extensibility in the Semantic Web.
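As a minimal sketch of this idea, each office's use of the word customer can be given its own URI, after which the relationship between the two uses becomes something that can be stated and checked. The `example.com` URIs and the `is_more_specific` helper are hypothetical, chosen only for illustration; `rdfs:subClassOf` is the standard RDF Schema property for "is a more specific kind of".

```python
# Standard RDF Schema property meaning "every member of the subject class
# is also a member of the object class".
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical URIs (assumptions for illustration, not from the article).
KC_CUSTOMER = "http://example.com/kansas-city/terms#Customer"
NY_CUSTOMER = "http://example.com/new-york/terms#Customer"

# The offices do not share one definition; they only agree on how to
# refer to each other's term, which lets us relate the two.
statements = {(KC_CUSTOMER, RDFS_SUBCLASS_OF, NY_CUSTOMER)}


def is_more_specific(narrow, broad, triples):
    """Return True if `narrow` equals `broad` or reaches it by following
    subClassOf statements in `triples`."""
    seen, frontier = set(), [narrow]
    while frontier:
        term = frontier.pop()
        if term == broad:
            return True
        if term in seen:
            continue
        seen.add(term)
        frontier.extend(
            o for s, p, o in triples if s == term and p == RDFS_SUBCLASS_OF
        )
    return False
```

Neither office has to change its own definition; the single statement relating the two URIs is enough to record which use is narrower.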

Representing Data on the Semantic Web
Data representation in the Semantic Web is based on a standard called RDF, which breaks data representation down to its most basic part. In RDF, this is called a triple. A triple is a basic statement about a relationship. The three parts of the triple are called the subject, predicate, and object (borrowing notation from basic grammar), where the subject and the object are two entities that are related to one another, and the predicate specifies the relation. A triple holds the same information as a cell in a spreadsheet or a database: the row id, the column id, and the cell contents make up the three parts of the triple (see Figure 1).
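The correspondence between table cells and triples can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a real RDF toolkit: the `BASE` namespace, the `table_to_triples` helper, and the sample rows are all hypothetical, and triples are modeled as simple (subject, predicate, object) tuples.

```python
# Hypothetical namespace used to mint URIs for rows and columns.
BASE = "http://example.com/crm/"


def table_to_triples(rows, id_column):
    """Turn each non-id cell of a table into one (subject, predicate, object)
    triple: row id -> subject, column name -> predicate, cell -> object."""
    triples = []
    for row in rows:
        subject = BASE + "row/" + str(row[id_column])
        for column, value in row.items():
            if column == id_column:
                continue  # the id identifies the row; it is not a cell value
            predicate = BASE + "column/" + column
            triples.append((subject, predicate, value))
    return triples


# Sample data (invented for illustration).
rows = [
    {"id": 1, "name": "Acme Corp", "city": "Kansas City"},
    {"id": 2, "name": "Globex", "city": "New York"},
]
triples = table_to_triples(rows, "id")
```

Two rows with two value columns each yield four triples, one per cell.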

Using this simple model, information from any data source (spreadsheets, databases, XML documents, Web pages, RSS feeds, e-mail, ...) can be represented in a uniform way. Since all information is referenced via global URIs, any data source can refer to any other. This is how the Semantic Web achieves the same extensibility as the familiar Web.
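To illustrate why global identifiers matter here, the sketch below combines triples derived from two unrelated sources. All URIs, property names, and the `about` helper are hypothetical. Because the e-mail source refers to the customer by the same URI the database uses, combining the sources is just a set union.

```python
# Hypothetical identifiers (assumptions for illustration).
EX = "http://example.com/terms#"
CUSTOMER = "http://example.com/crm/row/1"
MESSAGE = "http://example.com/mail/msg-42"

# Triples extracted from a customer database.
from_database = {
    (CUSTOMER, EX + "name", "Acme Corp"),
    (CUSTOMER, EX + "city", "Kansas City"),
}

# Triples extracted from an e-mail archive; one of them points at the
# database's customer by its global URI.
from_email = {
    (MESSAGE, EX + "mentions", CUSTOMER),
    (MESSAGE, EX + "subject", "Renewal terms"),
}

# Integration is a union of statements; no shared schema is required.
merged = from_database | from_email


def about(resource, triples):
    """Return every triple in which `resource` is the subject or the object."""
    return sorted(t for t in triples if resource in (t[0], t[2]))


customer_facts = about(CUSTOMER, merged)
```

After the merge, a single lookup on the customer URI returns its database attributes together with the e-mail that mentions it.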

Enterprise Data Integration - Before and After
There are a number of approaches to enterprise data integration today. While there are some key differences in these approaches, they have some things in common. In all these approaches, a model, corresponding loosely to a Master Data File in earlier technology, is built to reflect the requirements of the integrated data set. Existing data is then mapped to this model. The approaches differ in the expressiveness of this model and the details of the mapping (e.g., is data transformed and warehoused, or left in situ and proxied), but in all cases, the model itself is rigid and proprietary.

Semantic Data Integration differs in a number of ways. While it also relies on a model of the integrated data, the model itself is represented in RDF. This means that the model itself is extensible and flexible. If a Semantic Data Integration model is obsolete, it can be extended easily. And not just by its designer; as a Web model, it can be extended by anyone. Representing the model in RDF also means that it is backed by a standard; any RDF model can be loaded into a wide variety of vendor tools with no loss of information. Unlike previous proprietary approaches, the enterprise is not locked into a particular vendor's technology.
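A rough sketch of both properties, extensibility and vendor neutrality, follows. The `core` and `sales` namespaces and the class names are hypothetical; the `rdf:type`, `rdfs:Class`, and `rdfs:subClassOf` URIs are the standard ones. The `to_ntriples` helper writes the model in the W3C N-Triples line format, and is simplified on the assumption that every term is a URI (literal values would need quoting).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespaces: one owned by the original modelers,
# one owned by a different team that extends the model later.
CORE = "http://example.com/model/core#"
SALES = "http://example.com/model/sales#"

# The original model, frozen to show that extension never edits it.
core_model = frozenset({
    (CORE + "Customer", RDF_TYPE, RDFS_CLASS),
})

# An extension written by someone else, referring to the core term by URI.
sales_extension = frozenset({
    (SALES + "KeyAccount", RDF_TYPE, RDFS_CLASS),
    (SALES + "KeyAccount", RDFS_SUBCLASS_OF, CORE + "Customer"),
})

extended_model = core_model | sales_extension


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


exchange_text = to_ntriples(extended_model)
```

The extension adds statements without touching the original ones, and the resulting text is plain N-Triples rather than a tool-specific format.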

Barriers to Adoption
The Web sparked a revolution in how information is managed in the world-at-large. The unruly, almost chaotic way in which anyone can put up a Web page challenged our thinking about publishing, libraries, and information management on the whole. Semantic Data Integration represents a similar challenge for the enterprise; conventional wisdom has left control of corporate data in the hands of a small number of professional data managers who made sure that data did not get out of control. But the proliferation of extracurricular data in e-mail and spreadsheets attests to the fact that there is a need for individual workers to have a stronger hand in the management of their data. This tension isn't a result of a Semantic Data Integration approach; it's a real force in the enterprise. Semantic Data Integration is a reasoned approach to engaging with and managing that force for the benefit of the enterprise.

Current & Future State of Semantic Data Integration
The Semantic Web standards have been several years in the making, but are now proving themselves in real data integration situations. In the enterprise, adoption is understandably cautious, as it is with any new technology. But we're seeing successful deployments that exploit the extensibility and flexibility of semantic data integration to create applications that are resilient in the face of fast-changing data requirements.

Non-functional requirements like scalability, privacy, and security are always concerns for a data-intensive technology. While many open source RDF systems offer some assurances in these areas, it is database giant Oracle's entry into the field (with its reputation for non-functional support) that has done the most to calm any uneasiness along these lines.

More and more companies are feeling the pain of disintegrated data in their daily business. As other approaches continue to fail, it's becoming clear that while Semantic Data Integration may not be a silver bullet, it is a revolutionary capability; whoever is the first to master it will dominate their space. Successful adoption of Semantic Data Integration isn't without its problems, but more and more enterprises are turning to Semantic standards to address their enterprise information needs.

About Dean Allemang
Dr. Dean Allemang specializes in innovative applications of knowledge technology. He was awarded his PhD in AI in 1990, worked at five different AI labs in Europe between 1990 and 1996, and co-founded a company in the mid-90s that tried to invent the Semantic Web when the standards were just a gleam in the eye of a few W3C folks. He has won the Swiss Technology Prize twice, and has filed two patents on the application of graph matching algorithms to the problems of semantic information interchange. As an internationally recognized expert in the Semantic Web, he participated in the review board for the Digital Enterprise Research Institute, the world's largest Semantic Web research institute. He leads TopQuadrant's successful TopMIND training series, from which he drew much of the inspiration for his recent book (co-authored with Prof. Jim Hendler), Semantic Web for the Working Ontologist.
