The In-Memory DBMS Opportunity: IMDB and Performance Enhancement
Why are IMDBs important? Why is In-Memory DBMS so different?
Jun. 8, 2010 10:15 AM
The database has changed. Or at the very least it is changing, possibly quite rapidly. One of the major influences driving this change is the in-memory DBMS (IMDB). Some argue that the IMDB emerged specifically to meet the needs of embedded systems and, as the name makes evident, an IMDB manages its data entirely in memory rather than on disk. In fact, in-memory systems have blossomed in recent times, evolving from a period when they were used only for caching or in high-speed data systems to a place, now in 2010, where they may form a far more prevalent part of the mainstream IT landscape.

But should we consider an IMDB as nothing more than a traditional database that has been loaded into memory? The answer is no, and it's not just about boosting performance and scalability while keeping a tight rein on storage costs (although those matter too). Let's find out why.

IMDBs are a rather different proposition from traditional databases, as they are inherently less complex. As well as removing disk I/O considerations, IMDBs have fewer moving parts and dependent processes. This means that both RAM and CPU power are used more efficiently and overall performance is higher; specifically, an IMDB will be faster than a traditional DBMS deployed in memory.

In-memory DBMS Technology Defined

IMDBs are generally regarded as faster than disk-optimized databases because their internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides faster and more predictable performance than accessing it on disk. In applications where response time is critical, an IMDB is often perceived to be the best choice.

An IMDB usually features a strict memory-based architecture and direct data manipulation. All its data is stored and manipulated exactly in the form used by the application, removing overheads associated with caching and translation.
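To make the "translation" overhead concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any particular IMDB is built: the `Order` record, the `RECORD_FORMAT` byte layout and both read functions are hypothetical names invented for this example. The first path decodes a storage-oriented byte layout on every read, as a disk-oriented engine has to; the second keeps the record in the form the application already uses.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed-width storage layout: 4-byte order id, 8-byte float amount.
RECORD_FORMAT = "<id"


@dataclass
class Order:
    order_id: int
    amount: float


def read_via_translation(stored: bytes) -> Order:
    """Disk-style path: decode the storage layout into an application object."""
    order_id, amount = struct.unpack(RECORD_FORMAT, stored)
    return Order(order_id, amount)


def read_direct(table: dict, order_id: int) -> Order:
    """In-memory path: the stored object already is the application object."""
    return table[order_id]


order = Order(42, 19.99)

# Disk-style storage holds bytes, so every read pays for a decode step
# and produces a fresh copy of the record.
stored_bytes = struct.pack(RECORD_FORMAT, order.order_id, order.amount)
decoded = read_via_translation(stored_bytes)

# In-memory storage holds the object itself; a read is a single lookup
# that hands back the very same object.
in_memory_table = {order.order_id: order}
same_object = read_direct(in_memory_table, 42)
```

In this toy model `decoded` is an equal but separate copy, while `same_object` is the original record itself, which is the sense in which the translation step disappears.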
Read and write access times are on the order of a few microseconds. IMDB technology can support real-time data management, application-tier deployment and the ACID properties (see the box-out later in this article for a definition of ACID).

In-Memory Database Systems 3-key USPs
Why IMDB Is Important

Put simply, today there is a specific need for high-performance DBMSs that can take the data center to a new level of operational excellence. There is a trade-off here between increased performance and the wider robustness (or, if you prefer, 'durability') of the system in question. With specific relevance to Sybase technology, ASE 15.5 does boast in-memory database capability, and there are considerations here if full ASE application compatibility is to be retained.

Why Are IMDBs Important?

"In-memory databases are managed in main memory instead of on disk, so the disk is relegated to the role of a recovery, rather than data management, platform. This greatly reduces both the dependency on disk storage and the volume of disk storage required, especially when multiple copies of data are used in distributed architectures," said IDC's Carl W. Olofson.

Key Challenges for Datacenters Today

Writing on IT Today, journalists Marty Ward and Sean Derrington said that "Data center managers are caught between a rock and a hard place. They are expected to do more than ever, including protecting rapidly expanding volumes of data and a growing number of mission-critical applications, managing highly complex and wildly heterogeneous environments, meeting more challenging service level agreements (SLAs), and implementing a variety of emerging 'green' business initiatives."

ACID - Atomicity, Consistency, Isolation and Durability
Figure 1: Data flow in a typical DBMS. Red lines show data transfer. Grey lines show message path.

Figure 1 depicts a typical DBMS scenario that will be familiar to most readers. An in-memory database system stands in some contrast to this. An IMDB has comparatively low data transfer needs because it gives the application a reference that points directly at the data item in the database. Although this leaves a perceived 'gap' in the transport layer, the data itself is still fully protected, because this reference (or pointer, if you prefer) exists only as part of the database API, which ensures its correct use. The result is that multiple data transfers are no longer needed, and processing is streamlined as a direct consequence. Removing multiple data copies reduces memory consumption, and the simplicity of this design allows for greater reliability.

Why In-Memory DBMS Is So Different

For an insight into why an IMDB should perform so much faster, we turn again to IDC's Carl W. Olofson for an explanation of what happens in a traditional system, so that we can get a proper grip on the inherent differences: "A disk-based database is designed from the ground up to optimize its data management based on an optimal disk layout and I/O minimization strategy. For this reason, data is assigned to preallocated spaces on disk that are mapped to files or, if the access method is direct, to designated segments on designated disk volumes. Often, to minimize disk head movement and contention, the data is spread across many volumes. When data is required for a database operation, the DBMS must identify the database page required, determine where that page is stored on disk, access the appropriate volume, retrieve the data, allocate the data to a buffer, and update a page buffer table."
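The read path Olofson describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the design of any real DBMS: the class names, the `PAGE_SIZE` value and the dictionary standing in for a disk volume are all hypothetical. It contrasts a disk-style lookup (identify the page, fetch it, buffer it, record it in a page buffer table) with a direct in-memory reference to the same row.

```python
PAGE_SIZE = 4  # rows per page (hypothetical, kept tiny for illustration)


class DiskBasedStore:
    """Toy model of the disk-based read path described in the quote."""

    def __init__(self, rows):
        # Simulated disk volume: page number -> rows stored on that page.
        self.disk = {}
        for row_id, row in enumerate(rows):
            self.disk.setdefault(row_id // PAGE_SIZE, []).append(row)
        self.page_buffer_table = {}  # page number -> buffered copy of the page
        self.disk_reads = 0

    def get(self, row_id):
        # Identify the page required and the slot within it.
        page_no, slot = divmod(row_id, PAGE_SIZE)
        if page_no not in self.page_buffer_table:
            # Page is not buffered: access the "volume" and retrieve it.
            self.disk_reads += 1
            page_copy = list(self.disk[page_no])
            # Allocate the data to a buffer and update the page buffer table.
            self.page_buffer_table[page_no] = page_copy
        # Locate the requested row inside the buffered page.
        return self.page_buffer_table[page_no][slot]


class InMemoryStore:
    """Toy model of direct access: the row is reached by its position."""

    def __init__(self, rows):
        self.rows = list(rows)

    def get(self, row_id):
        return self.rows[row_id]


rows = [f"row-{i}" for i in range(10)]
disk_store = DiskBasedStore(rows)
memory_store = InMemoryStore(rows)

from_disk = disk_store.get(5)      # first touch of page 1: one simulated disk read
from_buffer = disk_store.get(6)    # same page, now served from the buffer
from_memory = memory_store.get(5)  # no page lookup or copy at all
```

Even when the page is already buffered, the disk-style path still has to work out the page number and consult the page buffer table before it can reach the row; the in-memory path has no such bookkeeping.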
Olofson goes on to point out that even when a required piece of data is in the buffer memory and its database key (the actual physical location of the data) is already known, the system must dereference the database key, determine that the page is in memory, locate the page and find the referenced location in the buffer where the requested data is located. In contrast, when an IMDB looks for data by its location, it simply uses a memory offset to load a pointer and gets the data directly.

"When data is updated on a disk-based system, the buffer is updated and marked as 'dirty.' A lock is set to prevent another user from reading the old data from disk until the transaction can be committed," said Olofson. "When the transaction is committed, a record is written to the log. In some systems, the buffers are flushed at this point. In others, the buffer page table is updated to indicate that requests for that data should be referred to the now committed data in buffer, which is eventually written to disk either when the buffer must be flushed to make way for another database page or when the I/O optimization algorithm finds an idle moment to perform the write. In an IMDB, most of these operations do not exist. A user has a temporary image of updated data until it is committed, at which point the temporary image replaces the permanent image," he added.

Coming of Age: Is 2010 The Year of the IMDB?

"It was about seven years ago when I was first introduced to the concept of in-memory databases. At the time it was a lesser-known database vendor called Times-Ten that offered an in-memory database with blazing performance metrics, hence 'times ten'. It was the perfect answer to solid-state disk drives that could drain an IT budget in a hurry. Just recently Sybase announced that its Sybase ASE server, in version 15.5, will have an in-memory engine equivalent that will provide the same functionality and manageability as the standard Sybase ASE server.
"This is a remarkable step, because it provides performance gains transparent to client applications and the database engine will not challenge DBAs to learn new skills. To me this is a win-win situation," says Peter Dobler, President of Dobler Consulting, a database consulting services and application support services firm serving clients in the southeast region of the United States.

Dobler says that the answer lies in the architecture of in-memory databases. They are designed to improve transaction-processing volume for classic OLTP applications. While data warehouses would not necessarily benefit from in-memory databases, IMDBs do provide extremely high-speed transaction processing without the need to confirm disk write success.

Traditional databases have one thing they must do to ensure data integrity: wait for the disk I/O to confirm a write to disk. Database vendors have come up with very complex and sophisticated caching techniques to overcome this performance challenge, but they cannot ignore this fundamental requirement. In-memory databases bypass this disk-writing requirement, and that is what improves the speed.

"If you think about it, IMDBs are designed for high-volume transaction systems, like e-commerce shopping carts. In-memory databases are unbeatable when it comes to writing transaction data," says Dobler. "This is fundamentally different to data caching of traditional database engines. Data caching improves read performance, but does nothing to improve write performance. There is a downside to these databases as well; they offer alternatives to performance problems in poorly written applications. Like powerful hardware, in-memory databases have the potential to mask poor application development. We might see an explosion of in-memory database implementations due to this matter."

In-Memory DBMS and Scalability

"An IMDB, on the other hand, will only need to use the disk in the case of a data recovery situation.
It writes the log (if there is one) to disk, and it dumps its memory contents to disk from time to time," says IDC's Olofson. "Because the disk is not involved in database transactions, the data that is dumped to disk can be packed together. As a result, there is no problem with filling a volume right up to the brim. Also, because the disk has no impact on performance, cheap lower-tier storage may be used. In scale-out situations, where clustered databases use partitioned and redundantly stored data, this savings effect is compounded," he added.

But why is tier 1 storage expensive?

The 3 Tiers of Storage

Tier 1 Storage - Suitable for business-critical 24x7 databases, file servers and email applications, and data warehouses. This is a redundant, cache-based tiered storage model with fast response times and high data transfer and availability rates.

Tier 2 Storage - This model is best suited to "seldom-used, non-critical databases" such as historical data held for reference purposes only. It uses less expensive media in storage area networks (SAN), but does not feature 24x7 availability or extensive backup.

Tier 3 Storage - This is used for data that is rarely accessed. It may even feature inexpensive media such as DVD-ROMs and compact discs.

Sybase ASE and IMDB

ASE 15.5 also increases efficiency in the data center with the integration of the ASE Backup Server with Tivoli® Storage Manager (TSM) from IBM. With this support, ASE databases can be backed up on any TSM-supported media, providing faster backups and restores with fewer network and storage resources required. This new integration provides a cost-effective solution for storage management.

"The database management features offered by Sybase ASE 15.5 coupled with the storage management features offered by IBM TSM provide a powerful solution to overcome the challenges of data protection faced in today's business environment," said Richard Vining, Product Marketing Manager, IBM Tivoli Storage. "We are pleased to provide Sybase ASE with the Ready for Tivoli software designation, which shows customers that the solution meets or exceeds IBM compatibility criteria and successfully integrates with one or more IBM Tivoli Software products."

"In data centers, the IT challenge is to increase efficiency and availability while lowering data center costs. At the same time, application deployments in grid and cloud computing environments are increasing the requirements of transaction processing systems to support large volumes of concurrent users with high transaction rates," said Brian Vink, vice-president, data management products, Sybase. "ASE 15.5 addresses these extreme requirements by delivering increased data throughput and greater concurrent activity while elevating productivity and uptime."