Mission Impossible XML(MI-XML)

In previous articles in XML-J (Vol. 2, issues 3, 4) we documented the thought processes involved in Crossmark's development of a knowledge management system using NeoCore's XML Information Server. Although the first version of the application isn't yet complete, we'd like to share some of the insights we've gained during the development period.

The goal of the KMS is to provide a framework for gathering information from diverse sources and constructing relationships between the pieces of information. We believe that revealing the relationships between information will provide insight that has heretofore been unavailable.

Another goal of the project is to assess a new paradigm for development. We're not sure what type of insight will be available to us through the application, but we do know that the application must accommodate change - not just in its functionality, but also in its intended use. We may discover that it isn't much use as a CRM application, but that it's indispensable as a resource-planning tool. Therefore, the solution must be flexible enough to target new goals. When the architecture of a product is implemented in a relational database, we believe that the solution becomes locked: its use becomes tied to a rigid structure. Modifying a database to meet changing requirements creates significant challenges, and changing entire goals requires complete rearchitecture of the data structure. We propose that implementing the data tier of an application in native XML will provide benefits to developers facing changing requirements. To that end, we haven't spent significant time defining the structure of the data. We know that use of the application will reveal features that must be changed, new functions that are required, and incorrect assumptions that have to be corrected. In fact, we've tried not to anticipate the changes that will be required, even though anticipating them is a habit that's hard for an experienced developer to break.

Although the theory sounds fine, actually achieving it would have been impossible had we not discovered NeoCore's XML Information Server. Providing the ability to index every node and attribute of any XML document, and allowing us to query on the indexed data, seems to be a task of overwhelming complexity. But that's exactly what the server achieves. The final goal of the project has therefore been to assess the capabilities of NeoCore's product to determine if it lives up to its potential.

Status of Project
Currently, development of the first version is more than three-quarters finished. In terms of functionality, we're focusing our efforts on managing contact relationships and documents. To further this, one of our initial development tasks was to create utilities that would allow rapid gathering of significant amounts of useful data. The first type of utility, a database-import utility, pulls contact data from various standard data sources. As a result, we've gathered over 50,000 contacts in our repository before the application has officially been delivered.

The second utility is a control that imports contact data from Microsoft Outlook. Its import routines build the relationships between the contacts and the users who own the contacts, and also link imported contacts to existing individuals in the contact repository. We quickly discovered that a CRM application can provide true insight only when it exposes the relationships that exist between the overwhelming amount of transaction data and other information that companies capture. At present we're developing tools to allow users to easily gather and tag various documents, including e-mail, Web pages, and Microsoft Office documents. The tags will indicate relationships back to concepts, thereby allowing a user to retrieve all documents that relate to a certain idea.

In terms of architecture, we've finished developing the core objects that encapsulate the behavior of the KMS and provide the interface with the NeoCore server. We're currently developing a Web client to interface with this application. The client will allow users to edit and create new entities in the KMS. It'll provide an interface for gathering and tagging documents, and will also display entities and their relationships in a meaningful manner.

The design of the application is simple. Four building blocks make up the entire architecture:

  • Entity Definitions (ED): EDs are XML Schema definitions that describe the possible types of entities, along with their properties, that can exist in the repository. The properties described by an ED can be typed, single- or multivalued, and enumerated. The collection of available properties described by an ED can be expanded without breaking existing instances of the entity.
  • Relationship Definitions (RD): RDs define the possible relationships that may exist between two or more entity instances, along with the available properties of the relationship.
  • Entity Instance (EI): An EI is a single case of an entity that's described in an ED. The properties an EI can possess are inherited from its ED. Every EI should conform to the specification described in its definition.
  • Relationship Instance (RI): An RI is a single case of a relationship between two or more EIs. The properties an RI can possess are inherited from its RD. Like EIs, RIs should conform to the specifications described in their definitions.
Figure 1 is a screenshot of the Web interface showing, on the left, an individual along with his contacts and the company he works for. On the right are the editable properties for that individual.
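
To make the four building blocks concrete, here is a minimal Python sketch of how definitions and instances might relate. It is not Crossmark's implementation (the article's components are COM+ objects over XML Schema definitions); the class names, field names, and validation rules below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyDef:
    """One property a definition allows (hypothetical shape)."""
    name: str
    type_name: str = "string"   # recorded only; this sketch does not enforce types
    multivalued: bool = False
    allowed_values: tuple = ()  # empty tuple means "not enumerated"


@dataclass
class Definition:
    """Shared shape for an ED or an RD: a name plus its property definitions."""
    name: str
    properties: dict = field(default_factory=dict)

    def add_property(self, prop: PropertyDef) -> None:
        # Adding a property never invalidates instances that do not use it.
        self.properties[prop.name] = prop


@dataclass
class Instance:
    """Shared shape for an EI or an RI: values checked against a definition."""
    definition: Definition
    values: dict = field(default_factory=dict)

    def violations(self) -> list:
        problems = []
        for name, value in self.values.items():
            prop = self.definition.properties.get(name)
            if prop is None:
                problems.append(f"unknown property: {name}")
                continue
            items = value if isinstance(value, list) else [value]
            if len(items) > 1 and not prop.multivalued:
                problems.append(f"{name} is single-valued")
            for item in items:
                if prop.allowed_values and item not in prop.allowed_values:
                    problems.append(f"{name} has unlisted value: {item}")
        return problems


@dataclass
class RelationshipInstance(Instance):
    """An RI also names the two or more entity instances it connects."""
    members: list = field(default_factory=list)

    def violations(self) -> list:
        problems = super().violations()
        if len(self.members) < 2:
            problems.append("a relationship needs at least two entities")
        return problems
```

In this sketch, an instance created before a new property is added to its definition still reports no violations afterward, which mirrors the claim that an ED can be expanded without breaking existing instances.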

The core business objects are COM+ components written in Visual Basic. These components are currently running on a separate COM+ Application Server. The Web front end is being developed using Active Server Pages (ASP). All communication between the Web client and the business objects is wrapped as XML. This poor man's version of SOAP - Simple Object Access Protocol - is a technique we've used to very good effect in previous applications. The business logic necessary to process XML doesn't differ much from the logic used to handle data saved in a database. Most of the differences lie only in the functionality to retrieve and persist the data.
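
As a rough illustration of wrapping calls as XML, the Python sketch below builds and reads a simple request envelope. The article does not show its message format, so the `Request` and `Param` element names and the `method` attribute are assumptions, and the production code is Visual Basic and ASP rather than Python.

```python
import xml.etree.ElementTree as ET


def build_request(method, params):
    """Wrap a method name and its parameters in a small XML envelope."""
    request = ET.Element("Request", {"method": method})
    for name, value in params.items():
        param = ET.SubElement(request, "Param", {"name": name})
        param.text = str(value)
    return ET.tostring(request, encoding="unicode")


def read_request(request_xml):
    """Unwrap the envelope; all parameter values come back as strings."""
    request = ET.fromstring(request_xml)
    params = {p.get("name"): p.text or "" for p in request.findall("Param")}
    return request.get("method"), params
```

The client would send the string returned by `build_request`, and the business object would call `read_request` before dispatching to the named method. Unlike SOAP proper, this carries no type information, so the receiver must know how to interpret each value.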

In terms of implementation, the business objects have only one significant difference from the types of components that make up a typical database-driven application. We've developed the components to gracefully handle changes in the structure of the data. Every entity in the repository is defined in terms of what its structure should be. When presented with a fragment of XML that represents an existing entity, the business component compares the structure of the fragment with the structure it expects. If the two differ, the component processes the bits of the structure it recognizes, ignoring the bits it doesn't and treating them simply as chunks of XML that it will save "as is." This behavior was the only allowance we made to anticipate future changes in the data structure, and it should allow the system to bend but not break.
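
The "bend but not break" behavior described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `split_fragment`, the sample element names, and the rule of matching on direct child tags are ours, not the actual COM+ component logic.

```python
import xml.etree.ElementTree as ET


def split_fragment(fragment_xml, expected_children):
    """Separate the child elements we understand from those we do not.

    Recognized children are returned as a tag -> text mapping for normal
    processing; unrecognized children are returned as raw XML strings so
    they can be saved back unchanged.
    """
    root = ET.fromstring(fragment_xml)
    recognized = {}
    preserved = []
    for child in root:
        if child.tag in expected_children:
            # A repeated tag would overwrite the earlier value in this sketch.
            recognized[child.tag] = (child.text or "").strip()
        else:
            child.tail = None  # drop trailing whitespace before serializing
            preserved.append(ET.tostring(child, encoding="unicode"))
    return recognized, preserved
```

For example, given a fragment that contains a `Name` element the component expects and a `Nickname` element it has never seen, the function hands back `Name` for processing and the `Nickname` element as an opaque string to be written back "as is."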

Some challenges we faced during development were unique to this project. The project is currently two weeks behind schedule, but not because of technical problems. Developers are used to trying to anticipate changes or problems their applications may encounter, which is normally a highly valuable ability. Here it worked against us: the design phase ran beyond its allotted time because we kept trying to incorporate flexibility instead of developing the functionality that we knew was required immediately and letting NeoCore's server handle the flexibility.

Other challenges we faced are unique to NeoCore's XML Information Server. What the product does, it does exceedingly well; however, because it's a new breed and not fully mature, development time was required to provide functionality that developers currently take for granted.

Related to that, we've identified a number of enhancements we believe would make the server a more powerful and useful product. To their credit, NeoCore has indicated that they have addressed or will address all of our concerns in future versions:

  • Version 1.0 of the server is designed as a Web service and thus uses a Web server as a front end. Version 2.0 migrates to Netscape for improved performance and security. While HTTP will remain the primary interface to the server, the company has worked with us to develop Java and C++ classes that wrap the HTTP functionality. These classes are included in version 1.0. Future versions will support COM as well as other wrapper interfaces.
  • We needed the ability to do nested queries (analogous to one form of SQL join), and had to code this in our application. NeoCore is adding this functionality to version 2.0.
  • We need sorting, math, and string-manipulation routines. NeoCore has stated that these are in the works for a future release.
  • We encountered intensive memory use by one component of the XML server. NeoCore has indicated that they have significantly reduced memory footprint and also made it tunable in version 2.0.
  • When returning the value of attributes, you must use the Flat command, which outputs the result like this:
<Flat-Results>
<line>ND>EntityDefinition>@EntityName>Sector</line>
<line>ND>EntityDefinition>@EntityName>Company</line>
<line>ND>EntityDefinition>@EntityName>Individual</line>
<line>ND>EntityDefinition>@EntityName>Industry</line>
</Flat-Results>

This requires additional string processing of the results. NeoCore has indicated that this is fixed in version 2.0.

  • We'd like to see more support of the XPath specification. Querying and access to XML are done using XPath statements, so more support for the specification would allow greater flexibility in the methods developers could use to get to the data. NeoCore does plan to support more of the functionality described in the specification, and claims that it's a matter of evolution of their product, and of the evolution of all query languages relative to XML currently under development.
  • We have a minor quibble related to the foregoing issue. We'd like to see the "*" operator available as a wildcard outside of XPath filters (i.e., be able to use "@*" and have all attributes of a node be returned). Again, NeoCore says this will be available in version 2.0.
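
The extra string processing that the Flat command's output requires might look like the following Python sketch. It assumes that each `line` element holds a ">"-separated path followed by the attribute value, as in the sample above, and that values never contain ">" themselves; the function name `parse_flat_results` is ours.

```python
import xml.etree.ElementTree as ET


def parse_flat_results(flat_xml):
    """Turn Flat output into (path, value) pairs.

    Each <line> is split on its last ">" so that everything before it is
    the path and everything after it is the attribute value.
    """
    root = ET.fromstring(flat_xml)
    pairs = []
    for line in root.iter("line"):
        text = (line.text or "").strip()
        path, _, value = text.rpartition(">")
        pairs.append((path.strip(), value.strip()))
    return pairs
```

Applied to the sample output, this yields the path `ND>EntityDefinition>@EntityName` paired in turn with Sector, Company, Individual, and Industry. Stripping whitespace around the value also tolerates a line break between the path and the value.
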
Future of the Project
We hope to finish development of the Web client in a couple of weeks. Helping our efforts is the fact that front-end coding doesn't differ at all from the coding for an application using a database back end. Once the client is available, our focus will be on rapidly gathering contact data from our sales force, along with documents, including internal office documents and Web documents. By this summer we expect to have over 100,000 contacts, as well as two to three times that number of documents, stored in the repository. By that time we should begin to see patterns that will provide insight.

Because of its simplicity and flexibility, our software department has already anticipated using the existing design as a documentation tool. Using the system's architecture, it would be easy to describe (1) our existing applications, (2) which layers communicate with each other, and (3) the servers that each module of a distributed application resides on. At that point our network engineers will be able to see the relationships between our applications and their servers - we'll be able to see what applications will be affected, and how, by hardware problems.
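
The impact question described above (which applications are affected when a server has problems) reduces to following two kinds of relationship. The Python sketch below assumes relationships are available as simple (kind, source, target) triples; the relationship names `resides_on` and `part_of`, and the sample data, are hypothetical.

```python
from collections import defaultdict


def affected_applications(relationships, failed_server):
    """Return the applications with at least one module on the failed server."""
    modules_on_server = defaultdict(set)
    apps_by_module = defaultdict(set)
    for kind, source, target in relationships:
        if kind == "resides_on":      # module -> server
            modules_on_server[target].add(source)
        elif kind == "part_of":       # module -> application
            apps_by_module[source].add(target)
    affected = set()
    for module in modules_on_server[failed_server]:
        affected |= apps_by_module[module]
    return sorted(affected)


# Hypothetical relationship instances for two distributed applications.
relationships = [
    ("part_of", "OrderEntryUI", "OrderEntry"),
    ("part_of", "OrderEntryLogic", "OrderEntry"),
    ("part_of", "ReportingUI", "Reporting"),
    ("resides_on", "OrderEntryUI", "web01"),
    ("resides_on", "ReportingUI", "web01"),
    ("resides_on", "OrderEntryLogic", "app01"),
]
```

With this data, a failure of `web01` touches both applications, while a failure of `app01` touches only `OrderEntry`.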

Conclusions
Though the project isn't yet complete, we've already learned some useful things from it. First, this has been an ideal project to implement with an XML repository. Unlike a mission-critical transactional application requiring airtight reliability and a data structure requiring referential integrity, this application as a whole doesn't fail when data is lost. The lost data is simply a bit of knowledge that is gone - not a good thing, but not fatal either.

The most contentious question about the project has been whether we've proved the viability of our development paradigm: that persisting data as XML allows developers to design applications without having to anticipate unforeseen changes. Our developers have had a hard time accepting this theory. If this approach does prove useful, convincing developers to adopt it and change their old habits will be more than half the battle of successfully implementing it. One thing we've determined is that this development model requires programmers to be more adaptive and flexible, as opposed to the old paradigm, which requires more up-front planning.

We've definitely been satisfied with the NeoCore XML Information Server. While not yet fully mature, it does what it advertises. It permits storage of XML documents in their native format and has eliminated the bottlenecks inherent in accessing those documents. With more and more information exchanged via XML, it seems silly to go to the trouble of breaking down the information so as to store it in a database, then reconstructing it later. We've also been delighted with NeoCore's responsiveness to our needs. They recognize the potential of their product, and have worked to improve it based on our concerns. They've been responsive to every issue we've had with their product, even designing a custom API just for our needs. NeoCore has indicated that version 2.0 will have full transactional support and fine-grained, content-based access control along with additional scaling and performance enhancements. This version should become available for general release early this summer.

About John Thompson
John Thompson is president of the performance group at Crossmark, Inc., one of the nation's largest sales and marketing organizations. He's responsible for driving all Web-based initiatives, including the development and successful execution of strategic e-Alliances. John is a member of the XML/EDI group, charged with developing the next generation of B2B transaction standards.
