Database as a Service: A Different Way to Manage Data
An important tool in a developer’s toolbox for rapid development
Nov. 30, 2009 02:00 PM
SaaS Journal on Ulitzer
SaaS has rapidly evolved from an online application source to providing application building blocks such as
- Platform as-a-Service (PaaS)
- Infrastructure-as-a-Service (IaaS) and
- Database-as-a-Service (DaaS)
DaaS is the latest entrant into the "as a Service" realm and typically provides tools for defining logical data structures, data services like APIs and web service interfaces, customizable user interfaces, and data storage, backup, recovery and export policies. To ensure successful DaaS implementations, developers and database professionals need to address traditional challenges associated with data design and performance tuning. They will also need to address new challenges introduced by the lack of physical access for backup, recovery and integration.
What Is DaaS?
DaaS provides traditional database features, typically data definition, storage and retrieval, on a subscription basis over the web. To subscribers DaaS appears as a black box supporting logical data operations, and logical data stores where customers can only see their organization's data. Physical access is seen as a security risk and thus it is not available. As with SaaS, DaaS vendors build and manage data centers incorporating best practices in security, back-up, recovery and customer support. Data services typically are provided as SOAP or REST APIs allowing users to define data structures, perform CRUD operations, manage entitlements and query the database using a subset of standard SQL.
Real-World Examples: Force.com and Amazon SimpleDB
Two real-world examples of DaaS are Salesforce.com's Force.com, which provides data services in its toolkit for building applications, and Amazon's SimpleDB, which provides an API for creating data stores which can be used for applications or pure data storage.
Force.com supports the Model-View-Controller paradigm for application development where Model refers to the data model.
- Database schema: Developers can configure pick list values for fields in standard CRM objects (tables), or create custom objects and fields via the Salesforce.com Setup menu. Data elements can also be defined programmatically through the Metadata API, which is used by the Force.com IDE, an add-on for Eclipse. Lookup fields and parent-child relationships allow foreign key relationships between tables.
- CRUD operations: Data entry, updates and deletes can be performed using Force.com pages that are automatically generated for each table, or through the Force.com Web Services API. Apex, Force.com's programming language, provides the ability to develop object oriented code to perform data operations.
- Database queries: Querying data is done through SOQL, Force.com's subset of SQL. SOQL provides read-only access via the Web Services API or Apex, Force.com's development language.
- Stored procedures: Custom business rules can be implemented as Triggers, the equivalent of database stored procedures written in Apex.
- Pro: Force.com database development and functionality parallels traditional database development.
- Con: Force.com database design requires careful design and coding.
Amazon.com's SimpleDB service appears to be geared to developing applications quickly with minimal effort on database design and definition:
- Database schema: SimpleDB stores data in "domains," the equivalent of a spreadsheet tab. Once a domain is created attributes (fields) are created when records are added to the domain. Each record requires a unique ID string for each item (record) and attributes are added as name-value pairs such as ("First Name", "Tara"). Items are limited to 256 name-value pairs, and domains are limited to 1 billion attributes.
- CRUD operations: SimpleDB uses the Put to insert and update items, Get to retrieve an item by unique number, and Delete to delete records.
- Database queries: SimpleDB supports a subset of SQL for read-only access to data. SimpleDB does not support queries across domains, so SQL joins are not available. The Developer Guide suggests storing related data in a single domain as a workaround.
- Stored procedures: SimpleDB currently does not support stored procedures.
- Pro: SimpleDB allows rapid development of web based applications requiring data services.
- Con: SimpleDB does not support joins, foreign keys and stored procedures. Porting complex applications to SimpleDB may not be feasible.
What Is Data Management?
One could argue that data management began as early as man invented written communication. Even cataloging, bookkeeping, and archiving, which are all forms of data management, are known to exist in ancient times. In recent history, the first computerized database management systems started to evolve in the 1960's when the primary data storage media were magnetic tapes. In the 1980's, Data Management also became known as Data Resource Management and Enterprise Information Management as organizations recognized corporate data as assets that must be managed. The publication of the DAMA-DMBOK Guide  this year is a major step to formalizing data management as a science and practice. Data management functions discussed in this article are based upon this guide.
The DAMA-DMBOK Guide identifies ten data management functions found in most organizations. These functions are briefly described below:
- Data Governance - planning, supervision and control over data management and use
- Data Architecture Management - as an integral part of the enterprise architecture
- Data Development - analysis, design, building, testing, deployment and maintenance
- Database Operations Management - support for structured physical data assets
- Data Security Management - ensuring privacy, confidentiality and appropriate access
- Reference & Master Data Management - managing golden versions and replicas
- Data Warehousing & Business Intelligence Management - enabling access to decision support data for reporting and analysis
- Document & Content Management - storing, protecting, indexing and enabling access to data found in unstructured sources (electronic files and physical records)
- Meta Data Management - integrating, controlling and delivering meta data
- Data Quality Management - defining, monitoring and improving data quality
Figure 1 shows the scope of each of these functions. This article focuses on the Data Development and Database Operations Management functions as they relate to DaaS. It points out similarities and differences in managing data that resides in a DaaS environment versus a non-DaaS environment as well as key implementation challenges in present day technologies.
DaaS Data Management
How does data management in a DaaS environment differ from traditional environments?
All of the data management functions shown in Figure 1 apply to DaaS but with a twist introduced by the additional layer of abstraction presented by the Service Oriented Architecture (SOA) that defines the DaaS. As with other types of SOA implementations, a DaaS provider hides the physical implementation and complexities of managing the data stores from its consumers, while at the same time providing a ubiquitous language agnostic API, such as XML, to enter, retrieve, and manipulate data.
For those familiar with relational database managements systems (RDBMS) such as Oracle and SQL Server, DaaS is analogous to RDBMS as RDBMS is to flat files. In both cases, the abstractions introduced by the newer platform simplify access and management of data but in some cases limit what you can do with the data. Limitations are usually overcome as the platform matures. For example, a DaaS query language might initially allow access to one object (table) at a time but future releases may provide two or more objects to be joined together.
Just like the logical abstractions of RDBMS hides the fact that tables are implemented as files; DaaS objects might actually be implemented as RDBMS tables behind the scene. However, a DaaS consumer does not need to know that and does not have to be concerned with the maintenance of the underlying RDBMS. Table 1 shows more examples on how DaaS differs from RDBMS using Force.com and Oracle as bases for comparison.
DaaS presents many advantages and promises. However, adopters of this new paradigm may find some new challenges, some of which are highlighted below using Force.com as an example.
In order to optimize performance and simplify data access DaaS typically limits on resource intensive queries and reports. For queries this may mean that certain join statements or outer joins are not supported, or the number of entities that can be queried is limited. Similarly, report writers may limit joins by controlling what entities are available. Approaches for dealing with joins include copying some attributes of master objects into child objects, or writing code to merge master and detail results.
Physical Database Access
DaaS by nature hides the underlying details of the physical database implementation. At this point in time, troubleshooting and performance tuning require cobbling together various tools and approaches. While a best practice is to always follow vendor recommendations carefully, empirical data can be gathered via commercial performance testing tools, custom scripts/code, manual testing and vendor profiling tools. Analyze the results carefully and consult the vendor if their best practices do not mesh with the data.
Since DaaS provides logical database services there is no standard for partitioning data. Best practice is to review vendor documentation on performance, especially for large data volumes. Approaches to data partitioning include defining tables or namespaces in lieu of partitions, creating indexed fields as partition filters, creating hierarchies and entitlements to control data visibility, or licensing multiple DaaS instances.
Backup and Recovery
While DaaS provides high performance tools for querying and exporting data, it can be difficult to perform a "database dump" that includes exporting data, Metadata and code as one operation. And once the data is "dumped" there may be no facility to rebuild a database from the database dump. Administrators used to these features with on-premise software must develop custom scripts for dumping and loading data for DaaS.
Some DaaS implementations allow the equivalent of stored procedures to support referential integrity and transaction logic. One workaround is the tried and true polling service that looks for updated records and performs the appropriate operations for inserts, deletes and updates. Regardless of the approach, pay careful attention to commit/rollback logic and error handling.
Despite some of the limitations listed above, DaaS adoption is being driven by multiple factors that speed application delivery including:
Ease of Deployment
Without the need to procure, install and configure equipment DaaS can be rapidly deployed. And since vendors have already done extensive performance tuning on logical data services there may be little need to do performance tuning or extensive data design.
Because DaaS is web-based most vendors comply with web standards, providing interoperability with desktops, servers and development tools from many vendors. Web service APIs for SOAP or REST are typically interoperable with multiple development platforms such as Adobe, Java, Mac OS X Cocoa, .NET and Visual Basic, Ruby On Rails, Perl, PHP and Python.
Simplified Database Administration
Database administrators may not need to understand SQL or APIs to configure DaaS databases. Data objects, custom fields, validation rules and data entry forms may be configured via the DaaS user interface. These are logical operations changes are usually available immediately via reports, SQL queries and web services.
Standardized Data Integration
DaaS web services provide programmatic access to data via the vendor's API. Ubiquitous support for SOAP and REST services in ETL tools, middleware and application servers facilitate integration with most platforms. And since most DaaS vendors provide text-based file import and export, batch processing or semi-manual procedures allow integration with legacy systems and new applications.
Database as a Service can be an important tool in a developer's toolbox for rapid development as well as for organizations that have more limited IT infrastructure resources. As illustrated above there are some limitations but often the overall benefits to the project outweigh them. Regardless of the approach, Database as a Service is something all Data Architects need to know and understand as we move into the next decade. It is critical to enabling the movement of enterprise applications to the cloud.
- Olle, T. William, 2006,"Nineteen Sixties History of Data Base Management", (ISBN: 978-0-397-34637-3), Springer Boston.
- DAMA International, 2009, "The DAMA Guide to the Data Management Body of Knowledge", (ISBN: 0977140083), Technics Publications, LLC.