Best Practices for Advanced Data Persistence in Enterprise .NET
What .NET can learn from Java
By: Greg Aloi; Tobias Grasl
Mar. 22, 2005 12:00 AM
The .NET framework has become the platform of choice for many enterprise applications. Not surprisingly, many .NET concepts parallel those in CORBA and J2EE. The benefit of hindsight gives .NET architects and developers the opportunity to imitate the successes while avoiding the pitfalls inherent in complex distributed development. This is especially true for data persistence, which often gets less attention than it deserves. Issues with data persistence are often not discovered until the later stages of development and testing, when the options for addressing them are limited and expensive. A well-designed data persistence layer can drastically improve the performance of the entire application if it's addressed throughout the development process.

This article outlines best practices for developing advanced data persistence in a .NET application. It discusses the development and runtime requirements of an effective data layer and then shows the benefits of an object-oriented approach and intelligent persistent data caching, comparing these with the data persistence facilities available in the .NET framework.

Requirements for an Efficient Data Persistence Layer

Enterprise applications have similar requirements for data persistence regardless of the programming languages they were developed with and the platforms they're deployed on. One lesson learned early on in the Java community was that representing relational data as objects yields huge returns. The Java world continues to focus on this object-oriented approach with EJB, JDO and, more recently, Hibernate. Successful .NET architects can benefit from Java's lessons.

Nobody wants to spend a lot of time writing, testing and debugging data persistence code. For the most part, this is plumbing code that "should just work" without a lot of effort. Developers want to focus their efforts on the custom business logic of their application.
However, because data persistence is an important element of the overall application, it can't be ignored. Ideally, a solid data persistence layer requires minimal developer intervention. Choosing good tools that assist in object-relational mapping and code generation, along with the careful design of an efficient data persistence model, helps meet both these requirements.

However, the ability to design and build an application quickly doesn't guarantee success at deployment. Performance and scalability are possibly the two most popular terms in enterprise development. A poorly designed data access layer can significantly impact application performance, and ultimately availability and scalability. While some approach this problem by throwing more hardware at it, caching is the most common and effective solution for performance problems.

Although Web page caching is a common way to improve response times, it doesn't address data bottlenecks for enterprise applications. Applications with complex data requirements quickly become unmanageable when data access is coded directly in Web components. Instead, business logic and data access typically reside in the server tier between the database and Web components, as shown in Figure 1. Scalability and performance problems occur when the volume of requests from the server tier overwhelms the limited database resources. Server-tier caching of persistent data increases the application's capacity for handling many concurrent requests.

So, important success factors for a data persistence layer include developer productivity, performance, scalability and high availability.
Advantages of an Object-Oriented Approach for Developer Productivity

When Microsoft first introduced the .NET Framework, it also introduced a data layer API called ADO.NET. Derived from ADO, an earlier form of data access, ADO.NET's goal is to work (through third-party adapters) with various data sources throughout the Internet. Its status as a de facto standard, coupled with its tight integration with Visual Studio .NET, quickly made ADO.NET useful to developers who were working with relational data.

Unfortunately, ADO.NET is not an object-oriented approach. Although it facilitates access to various data sources, ADO.NET remains a relatively low-level API whose code can be generated either through Visual Studio or by third-party tools. Regardless of what tool is used, complex systems usually need some additional data persistence code, which must be written manually. In addition, ADO.NET can't directly satisfy the requirement of transparent object persistence. This is because the API is geared towards the processing of tabular result sets (the DataSet object) built from SQL queries, rather than towards the business objects used by the application.

In contrast, the O-R (Object-Relational) mapping approach takes all that's good from object-oriented programming and applies it to the data layer. Concepts such as encapsulation, abstraction and reusability make the data layer easier to manipulate and use, meaning faster, more efficient development.

The Data Persistence Object Model

For example, a relational table A_Table can be modeled and mapped to the C# class A.cs. Column names become property names, and the CRUD operations are replaced with methods. Figure 2 shows two tables (Department and Employee) that have a one-to-many relationship. The foreign key (DeptID) is on the Employee table. Suppose we want to get all Employees associated with the "sales" department.
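To make the Department/Employee example concrete, here is an illustrative sketch of the two styles, not the authors' original listings. It assumes an in-memory DataSet in place of a real database so that it runs without a connection, and the names `PersistenceStyles`, `DeptEmployees`, `EmpID` and the sample rows are hypothetical. The `ToObjects` method stands in for the work an O-R mapping tool would do for you.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical persistent classes of the kind an O-R mapping tool might generate.
public class Employee
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public class Department
{
    public int DeptID { get; set; }
    public string Name { get; set; }
    // The one-to-many relationship surfaces as an ordinary collection.
    public List<Employee> Employees { get; } = new List<Employee>();
}

public static class PersistenceStyles
{
    // Builds the two tables and the DeptID relation by hand (sample data is made up).
    public static DataSet BuildDataSet()
    {
        var ds = new DataSet("Company");
        DataTable dept = ds.Tables.Add("Department");
        dept.Columns.Add("DeptID", typeof(int));
        dept.Columns.Add("Name", typeof(string));
        DataTable emp = ds.Tables.Add("Employee");
        emp.Columns.Add("EmpID", typeof(int));
        emp.Columns.Add("Name", typeof(string));
        emp.Columns.Add("DeptID", typeof(int));
        ds.Relations.Add("DeptEmployees", dept.Columns["DeptID"], emp.Columns["DeptID"]);

        dept.Rows.Add(1, "sales");
        dept.Rows.Add(2, "support");
        emp.Rows.Add(10, "Ann", 1);
        emp.Rows.Add(11, "Bob", 1);
        emp.Rows.Add(12, "Cy", 2);
        return ds;
    }

    // Tabular style: filter rows with a string expression, then walk the relation.
    public static List<string> SalesNamesFromRows(DataSet ds)
    {
        DataRow[] sales = ds.Tables["Department"].Select("Name = 'sales'");
        return sales[0].GetChildRows("DeptEmployees")
                       .Select(r => (string)r["Name"])
                       .ToList();
    }

    // Stand-in for the mapping layer: copies rows into plain objects.
    public static List<Department> ToObjects(DataSet ds)
    {
        var result = new List<Department>();
        foreach (DataRow d in ds.Tables["Department"].Rows)
        {
            var dept = new Department { DeptID = (int)d["DeptID"], Name = (string)d["Name"] };
            foreach (DataRow e in d.GetChildRows("DeptEmployees"))
            {
                dept.Employees.Add(new Employee { EmpID = (int)e["EmpID"], Name = (string)e["Name"] });
            }
            result.Add(dept);
        }
        return result;
    }

    // Object style: no table names, column names or casts in the calling code.
    public static List<string> SalesNamesFromObjects(IEnumerable<Department> depts)
    {
        Department sales = depts.First(d => d.Name == "sales");
        return sales.Employees.Select(e => e.Name).ToList();
    }
}
```

Both lookups return the same employees. The difference is in what the calling code has to know: the tabular version depends on table, column and relation names passed as strings, while the object version only touches typed properties.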
In ADO.NET, the code for this lookup is written against SQL and tabular result sets. Now let's take a look at how the same operation could look using an O-R mapping system. Notice how the O-R mapped solution uses no SQL or database-specific code. In fact, the relational data is treated the same as any ordinary .NET object. A closer look at Sample 2 shows how relationships are represented and used in the O-R approach (dept.employees). Obtaining information in related tables becomes as simple as accessing an object attribute.

Advanced Features of an Object-Oriented Approach

Inheritance is a powerful feature of object-oriented programming. Applying this capability to the object model therefore further enhances the usability of the data persistence code. Now the developer can work with and understand the data persistence objects in a more natural way.

Products that offer this level of O-R mapping functionality usually provide various tools to define the mapping and generate code for persistent objects. Choosing a solution with a complete toolset will greatly improve productivity. For example, some products offer a tool that will read existing database schemas and generate object models directly from them. This means that with very little time and effort, an object model can be created, code can be generated and the developer can enjoy the benefits the object-oriented approach brings to the data persistence layer.

Integrating Caching for Performance and Scalability

ADO.NET can provide a form of lightweight caching by using disconnected datasets. A disconnected DataSet is held in memory apart from its database connection and can be serialized to XML, so its contents can be modified "offline" in local processes. Some folks in the Microsoft world feel that disconnected datasets are the .NET way to implement data caching. However, caching wasn't the initial intent behind disconnected datasets; designers were simply looking for a way to reduce the number of open database connections.
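The disconnected pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the `Load` method stands in for a data adapter filling the DataSet before the connection is closed, and `DisconnectedDemo`, the table layout and the sample rows are hypothetical. The `WriteXml`, `ReadXml`, `AcceptChanges` and `GetChanges` calls are the standard DataSet members.

```csharp
using System;
using System.Data;
using System.IO;

public static class DisconnectedDemo
{
    // Stand-in for data loaded from the database before the connection closed.
    public static DataSet Load()
    {
        var ds = new DataSet("Company");
        DataTable emp = ds.Tables.Add("Employee");
        emp.Columns.Add("EmpID", typeof(int));
        emp.Columns.Add("Name", typeof(string));
        emp.Rows.Add(10, "Ann");
        emp.Rows.Add(11, "Bob");
        ds.AcceptChanges(); // mark every row as unchanged, as after a fresh load
        return ds;
    }

    // Serializes data and schema to an XML string; no connection is involved.
    public static string ToXml(DataSet ds)
    {
        using (var writer = new StringWriter())
        {
            ds.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Rebuilds an independent DataSet from the XML.
    public static DataSet FromXml(string xml)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(xml))
        {
            ds.ReadXml(reader, XmlReadMode.ReadSchema);
        }
        ds.AcceptChanges(); // rows read from plain XML arrive flagged as added
        return ds;
    }

    // Edits one row offline and reports how many rows are pending for the database.
    public static int EditOffline(DataSet ds)
    {
        DataRow row = ds.Tables["Employee"].Select("EmpID = 10")[0];
        row["Name"] = "Anne";
        DataSet changes = ds.GetChanges();
        return changes == null ? 0 : changes.Tables["Employee"].Rows.Count;
    }
}
```

In a real application the pending rows reported by `GetChanges` would be sent back through a data adapter once a connection is reopened. Nothing in this sketch keeps two copies of the data consistent with each other, which is one reason a disconnected DataSet is only a lightweight form of caching.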
By tearing down the connection after data has been transferred to or from the database, disconnected datasets improve performance on both the database and application side (clients are no longer blocked).

Various other types of caching solutions are available. However, it's important to realize that not all caching strategies are equal. Depending on the complexity of your application, the choice of caching solution may result in significant differences in performance and scalability. Some solutions also require much more development effort to integrate and manage the cache. Questions you need to ask when looking for a caching solution include:
Let's start the discussion with the most basic type of caching: static caching. A static cache (sometimes also called a read-only cache) contains data that never changes, usually reference data that is potentially used a number of times. Instead of retrieving this data from the database every time it's needed, an application can simply query the cache, thus reducing the overhead of a database call.

Although a static cache can be a huge improvement over querying the database, it only solves performance problems for read-only data, which may be only a portion of the entire data model. For most enterprise systems, many elements in the data model are very dynamic (they change frequently) and also need to be accessed at near real-time speeds. The old 80-20 rule applies: 80% of an application's transactions will be performed on 20% of the data. Caching that 20% can significantly improve performance, even for transactional data.

Dynamic Caching

When looking at caching systems, make sure to understand the locking behavior for updates. One approach is to lock the data that's being updated (pessimistic locking). For example, if an Employee instance is being updated, then all other threads sharing this cache will be blocked from reading (and possibly updating) this Employee until the update is complete. Although this method works, pessimistic locking tends to reduce performance for transactions on highly requested objects and negate the cache benefits.

More sophisticated caching solutions offer optimistic locking. By assigning version numbers to different revisions of the data, the cache will know if a thread is attempting to update stale data. By using a versioning mechanism, data in the cache never needs to be locked, thus increasing performance.

Relationship Caching

Usability

Distributed Caching To Increase Scalability and High Availability

Just as with caching, clustering technologies vary widely.
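Returning briefly to the optimistic locking described under Dynamic Caching, the version-number check can be sketched as follows. This is a minimal illustration, not any particular product's API: `Versioned<T>`, `OptimisticCache` and its methods are hypothetical names, and .NET's `ConcurrentDictionary` is assumed here as the underlying store.

```csharp
using System;
using System.Collections.Concurrent;

// An immutable cached revision: the payload plus its version number.
public sealed class Versioned<T>
{
    public T Value { get; }
    public int Version { get; }

    public Versioned(T value, int version)
    {
        Value = value;
        Version = version;
    }
}

public sealed class OptimisticCache<TKey, T>
{
    private readonly ConcurrentDictionary<TKey, Versioned<T>> _entries =
        new ConcurrentDictionary<TKey, Versioned<T>>();

    // Stores a value as revision 1, replacing any existing entry for the key.
    public void Put(TKey key, T value)
    {
        _entries[key] = new Versioned<T>(value, 1);
    }

    // Returns the current revision; throws KeyNotFoundException if the key is absent.
    public Versioned<T> Get(TKey key)
    {
        return _entries[key];
    }

    // Applies the update only if the caller's version is still the current one.
    // Callers never hold a lock between reading a revision and writing the next.
    public bool TryUpdate(TKey key, int expectedVersion, T newValue)
    {
        Versioned<T> current;
        if (!_entries.TryGetValue(key, out current)) return false;
        if (current.Version != expectedVersion) return false; // caller read stale data

        var next = new Versioned<T>(newValue, expectedVersion + 1);
        // Atomic swap: fails if another thread replaced the entry in the meantime.
        return _entries.TryUpdate(key, next, current);
    }
}
```

If two threads read revision 1 of the same Employee and both try to write, the first call to `TryUpdate` succeeds and produces revision 2, and the second returns false because its version is stale. The losing caller can then re-read the entry and retry or report a conflict, which is the usual trade-off of optimistic schemes.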
Most clustering technologies simply invalidate objects when a change occurs to the underlying data. This means that the next request for any of the changed objects requires an expensive database call. True cache synchronization provides the updated information to all of the distributed caches, delivering the best performance.

Not only can a distributed caching system meet the performance and scalability requirements of an enterprise application, it also provides a safe foundation for advanced enterprise deployments such as those providing failover and load balancing. When all application caches are in sync (as in Figure 3), then if one server goes down, the other servers can easily satisfy the additional failover requests with little to no performance impact on the end user. Ideally, cache synchronization complements server clustering mechanisms to provide better uptime in the case of database or network outages, as well as allowing for server-side failover.

Conclusion

This article identified developer productivity, performance, scalability and high availability as important requirements for the data persistence layer of enterprise .NET applications. Productivity gains can be realized by adopting an object-oriented approach and choosing code generation tools with flexible features. Choosing a good caching and distribution technology can increase performance, scalability and availability with minimal code changes. The combination of O-R mapping, caching and distribution for advanced data persistence can play a significant role in reducing risks and contributing to the success of enterprise applications.

Resources

For more on caching, see the following Web sites.
www.alachisoft.com
www.deklarit.com
www.objectstore.com/products/edgextend/index.ssp