Populating Word Documents on the Server with Microsoft .NET
Using VSTO to create and manipulate data islands in Office documents

Consider the following portion of an all-too-common server scenario. An authenticated user, perhaps a salesperson, requests a Word document from a server. The document is an expense report, and the server is an ASP, ASP.NET, or SharePoint Server. The server code looks up some information about the user from a database, Active Directory, or Web service. For example, perhaps the server has a list of recent corporate credit card activity that it will prepopulate into the expense list. The server starts up Word but keeps it "invisible" because there is no interactive user on the server. It then uses the Word object model to insert the data into a table, saves the result, and serves up the resulting file to the user.

This document life cycle is seriously flawed for two reasons. First, Microsoft does not support it and strongly recommends against it. Word and Excel were designed to be run interactively on client machines, with perhaps a few instances of each running at the same time. They were not designed to be scalable and robust in the face of thousands of Web server hits creating many instances on "headless" servers that allow no graphical user interfaces.

Second, this process thoroughly conflates the "view" with the data. The server needs to know exactly how the document is laid out visually so that it can insert and remove the right fields in the right places. A simple change in the document format can necessitate many tricky changes in the server code.

However, automatically serving up documents full of a user's data is such a compelling scenario that many organizations have ignored Microsoft's guidelines and built solutions around server-side manipulation of Word and Excel documents. Those solutions tend to have serious scalability and robustness problems.

What can we do to mitigate these two problems?

Data-Aware VSTO Documents
One way to solve this problem is to move the processing onto the client. Visual Studio 2005 Tools for the Microsoft Office System (VSTO), which targets Office 2003, allows you to use Visual Studio to associate a customization written in C# or Visual Basic 2005 with a Word or Excel document. You can serve up a blank document that has a VSTO-managed customization behind it that runs on the client. When the document is opened, the customization can connect to the database server, retrieve data, and place it in the document. When the client is ready to send the data back, the customization connects again and updates the database from the edited data in the document. No special document customization has to happen on the server at all, and the database server is doing exactly what it was designed to do.

This solution has a major drawback, however: it requires that every user have access to the database. From a security perspective, it might be smarter to only give the document server access to the database, thereby decreasing the "attack surface" exposed to malicious hackers. What we really want to do is have the document ready to go with the user data in it from the moment the user obtains the document, but without having to start up Word or Excel on the server.

XML File Formats
Avoiding the necessity of starting up a client application on the server is key. Consider the first half of the scenario above: the server takes an existing on-disk document and uses Word to produce a modified version of the document. Word is just a means to an end; if you know what changes need to be made to the bits of the document and how to manipulate the file format, you have no need to start up the client application.

The Word and Excel binary file formats are "opaque," but Word and Excel now support persisting documents in a much more transparent XML format. It is not too hard to write a program that manipulates the XML document without ever starting up Word or Excel. Word provides mechanisms to map an XML schema into the document and then create an XSLT that can transform XML data that matches that schema into the original mapped document.
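To make the idea of editing the XML format without starting Word concrete, here is a minimal sketch. It assumes the document was saved in the Word 2003 XML format and that the template contains a text run holding the literal placeholder "[EmployeeName]"; the placeholder convention, the class name WordXmlFiller, and the sample employee name are illustrative assumptions, not something Word provides. Word frequently splits text across several runs, so a real document may need more careful matching than this.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class WordXmlFiller
{
    // Namespace used by the Word 2003 XML format (WordprocessingML).
    private const string WordNs = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Replaces the text of every w:t element whose content equals the
    // placeholder and returns the number of elements that were changed.
    public static int ReplacePlaceholder(XmlDocument doc, string placeholder, string value)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", WordNs);

        // Copy the matches first so the tree is not modified while it is
        // still being enumerated.
        List<XmlNode> textNodes = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//w:t", ns))
        {
            textNodes.Add(node);
        }

        int replaced = 0;
        foreach (XmlNode node in textNodes)
        {
            if (node.InnerText == placeholder)
            {
                node.InnerText = value;
                replaced++;
            }
        }
        return replaced;
    }

    public static void Main()
    {
        // A tiny hand-written document; a file saved by Word is far larger.
        string xml =
            "<w:wordDocument xmlns:w=\"" + WordNs + "\">" +
            "<w:body><w:p><w:r><w:t>[EmployeeName]</w:t></w:r></w:p></w:body>" +
            "</w:wordDocument>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        int count = ReplacePlaceholder(doc, "[EmployeeName]", "Pat Example");
        Console.WriteLine("Replaced {0} run(s)", count);
        Console.WriteLine(doc.OuterXml);
    }
}
```

Note that the code has to know where in the document's markup the data belongs, so this approach avoids starting Word but still ties the server code to the document layout.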

However, the XML file formats have some drawbacks. Although it is certainly faster and easier to manipulate the XML format directly, parsing large XML files is still not blazingly fast. XML files tend to be quite a bit larger than the corresponding binary files. Word's schema mapping capability is sometimes too constraining for certain solutions - for example, Word's schema mapping is element-centric and doesn't do very well when mapping schemas that are attribute-heavy, such as a typed dataset schema.

We need a way to solve these additional problems: a solution that works on binary, non-human-readable files, works with VSTO-customized documents, handles cases that are difficult to achieve with Word's XML and XSLT transform techniques, and cleanly separates view from data.

The VSTO Data Island
VSTO allows you to associate a managed class called a "host item" with a Word document. VSTO allows you to cache the state of public host item class members that contain data in a "data island" so that they are persisted into the Word document as XML, independent of their user-interface representation. The document format can be either the standard Word binary DOC file format or the new Word XML format.

You can cache almost any kind of data in the XML data island. To be cacheable by the VSTO run time, the data must meet the following criteria:

  • The data must be stored in a public member variable or property of a host item (e.g., a customized Word document class)
  • If stored in a property, the property must have no parameters and be both readable and writable
  • The run-time type of the data must be DataSet (or a subclass), DataTable (or a subclass), or any type that is serializable by the System.Xml.Serialization.XmlSerializer object

To tell VSTO that you would like to cache a member variable, just add the Cached attribute to its declaration. Before using the member variable in your code, you can check whether the member was filled in from the data island by calling the NeedsFill method provided in a VSTO host item class.
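One way to check the third criterion before caching a custom type is to round-trip it through XmlSerializer outside of Office. The sketch below does that with a hypothetical ExpenseItem class; the class, its fields, and the helper names are assumptions made for illustration. In a host item, the corresponding declaration would be a public member marked with the Cached attribute, as noted in the comment.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data type. XmlSerializer requires a public type with a public
// parameterless constructor and serializes its public fields and properties.
public class ExpenseItem
{
    public string Description;
    public decimal Amount;
}

// In a VSTO host item the member would be declared along these lines:
//     [Cached]
//     public ExpenseItem LastExpense;
public static class CacheabilityCheck
{
    // Serializes the item to an XML string.
    public static string ToXml(ExpenseItem item)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(ExpenseItem));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, item);
            return writer.ToString();
        }
    }

    // Rebuilds an item from XML produced by ToXml.
    public static ExpenseItem FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(ExpenseItem));
        using (StringReader reader = new StringReader(xml))
        {
            return (ExpenseItem)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        ExpenseItem original = new ExpenseItem();
        original.Description = "Taxi";
        original.Amount = 12.50m;

        string xml = ToXml(original);
        ExpenseItem copy = FromXml(xml);

        Console.WriteLine(xml);
        Console.WriteLine(copy.Description + " " + copy.Amount);
    }
}
```

If the XmlSerializer constructor or the Serialize call throws for your type, the type does not satisfy the serializability criterion as written.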

Creating a Word VSTO Customization
Listing 1 shows a simple Word VSTO customization that has two cached member variables called EmpName and Expenses and a bookmark called EmployeeName. To create this customization, launch VSTO or Visual Studio Team System. Choose File > New > Project. Click the Office category under Visual C# as shown in Figure 1. Then click on Word Document as the project type and click OK. Name the project ExpenseReport.

A second dialog will appear prompting you to pick a document to use. Select the "create a new document" option to have VSTO create a new empty Word document. In the newly created project you will see the host item called ThisDocument.cs. Double-click ThisDocument.cs to display a Word editing view inside of Visual Studio. While in the Word editing view, select the place in the Word document where you want to insert the bookmark. From the Insert menu, choose Bookmark and name the bookmark EmployeeName. Figure 2 shows the editing experience inside of Visual Studio. You can edit both the Word document itself and the host item code associated with the Word document without leaving the Visual Studio environment.

Now, right-click the ThisDocument.cs host item and choose View Code. Edit the code to look like Listing 1. The code declares two cached member variables, EmpName and Expenses. In the ThisDocument_Startup handler, it checks whether each of these member variables has been filled from the cache. If the string EmpName is filled, the code sets the text of the bookmark we created to the value of EmpName. If the dataset Expenses is filled, we iterate over the dataset and put the data into a Word table; the code to do this is omitted for brevity.
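Listing 1 itself is not reproduced in this text. As a rough guide, a host item matching the description might look like the sketch below. It is an approximation under stated assumptions, not the original listing: it assumes the project is named ExpenseReport, that the bookmark host control is named EmployeeName, and that the startup wiring generated by the project template is left as is. It compiles only inside a VSTO Word document project, because ThisDocument is a partial class completed by designer-generated code.

```csharp
using System;
using System.Data;
using Microsoft.VisualStudio.Tools.Applications.Runtime; // CachedAttribute

namespace ExpenseReport
{
    public partial class ThisDocument
    {
        // Public members marked Cached are persisted in the data island.
        [Cached]
        public string EmpName;

        [Cached]
        public DataSet Expenses;

        private void ThisDocument_Startup(object sender, EventArgs e)
        {
            if (NeedsFill("EmpName"))
            {
                // Nothing was found in the data island: supply a default.
                EmpName = "Unknown Employee";
            }
            else
            {
                // The value came from the data island: show it in the bookmark.
                this.EmployeeName.Text = EmpName;
            }

            if (!NeedsFill("Expenses"))
            {
                // Iterate over Expenses and copy the rows into a Word table
                // (omitted, as in the description above).
            }
        }

        // The template-generated Shutdown handler and InternalStartup method
        // are assumed to remain unchanged and are not repeated here.
    }
}
```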

Press F5 to run the document customization and verify the cached data feature. Word will start up and the ThisDocument_Startup method will be called. On the first run, the data island will be empty, so the first call to NeedsFill will return true. The code sets EmpName to the string "Unknown Employee" but does nothing more. Save the document and close it. As the document is saved, the VSTO run time detects that a member variable marked as cached was changed and saves the state of that variable into the data island in the document - in this case the value of the variable EmpName. Next, reopen the document. On the second run, the call to NeedsFill will return false because the member variable EmpName is found in the data island. The code then sets the EmployeeName bookmark's text to the string read out of the data island.

About Eric Carter
Eric Carter is the development manager for the Visual Studio Tools for Office (VSTO) team at Microsoft. He helped invent, design, and implement many of the features that are in VSTO today. Previously at Microsoft he worked on Visual Studio for Applications, the Visual Studio Macros IDE, and Visual Basic for Applications for Office 2000 and Office 2003. For more information about VSTO, visit Eric's blog at http://blogs.msdn.com/eric_carter/default.aspx.

About Eric Lippert
Eric Lippert's primary focus during his nine years at Microsoft has been on improving the lives of developers by designing and implementing useful programming languages and development tools. He has worked on the Windows Scripting family of technologies, Visual Studio Tools for Office, and most recently, the new Language Integrated Query features of C# 3.0. For more information about VSTO, visit Eric's blog at http://blogs.msdn.com/ericlippert/.

Your Feedback
Frans wrote: The need to customize each client before one can view the word documents makes this feature useless. Also the xml loads extreeeemely slow. A waste of developer-time so far.