Real-World Use Cases: Cloud Storage Workloads
What is your data workload?
Jan. 22, 2009 04:02 PM
In my previous article, "Cloud Computing Public or Private? How to Choose Cloud Storage," we covered choosing between public and private cloud storage and the appropriate data types for cloud storage. This month we will dig deeper into the workloads and file creation patterns that best fit cloud storage with a focus on private clouds. Rather than file types, the discussion will cover how files are managed and where cloud storage fits, along with a few real-world use cases.
When choosing any storage solution it's important to consider the workload and data usage patterns. This even goes beyond storage - application workloads drive server, network and all IT infrastructure decisions. Sure, most vendors will tell you that their product is the best solution for any workload, and when choices were few, that was somewhat accurate. However, today there are many different offerings, each with strengths and weaknesses in different situations. This article will review six workload scenarios and identify where cloud storage is a good fit and where it is a poor fit.
Rapidly Changing Single File Workloads
Examples of a rapidly changing single file workload include the I/O patterns of a database, a source code repository, or an active spreadsheet. In this workload there is either a single very powerful server or many users sharing a single file. In both cases, updates to a single file are constant and rapid, driving the need for a tier-one class of storage. To support this workload, the system should have plenty of memory, fast hard drives, and the ability to create snapshots for instant data protection. Today this market is well served by enterprise NAS vendors such as EMC and NetApp.
Data Ingestion Workloads
The best example of a data ingestion workload is video surveillance. Consider, for example, the city of London and its thousands of cameras, each streaming write operations to storage. Every camera creates its own set of files and needs fast access to storage. This is an excellent workload for private cloud storage. A private storage cloud has many storage nodes that can ingest streams of information independently, so there is no data bottleneck. A camera-to-storage-node ratio can be established, say 10 cameras per node, and then replicated out to hundreds of nodes, enabling thousands of cameras. Since the cloud is centrally managed, a single administrator can easily manage the video surveillance storage for the entire city.
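The camera-to-node arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 10-cameras-per-node ratio is the figure used above, and a real deployment would size nodes from measured stream bandwidth rather than a fixed count.

```python
import math
from typing import Dict, List


def nodes_needed(cameras: int, cameras_per_node: int = 10) -> int:
    """Return how many storage nodes are needed for a given camera count."""
    if cameras < 0 or cameras_per_node <= 0:
        raise ValueError("cameras must be >= 0 and cameras_per_node must be > 0")
    # Round up: a partially filled node is still a whole node.
    return math.ceil(cameras / cameras_per_node)


def assign_cameras(cameras: int, cameras_per_node: int = 10) -> Dict[int, List[int]]:
    """Map each node index to the camera indices that stream to it."""
    assignment: Dict[int, List[int]] = {}
    for camera in range(cameras):
        # Cameras 0-9 go to node 0, cameras 10-19 to node 1, and so on.
        assignment.setdefault(camera // cameras_per_node, []).append(camera)
    return assignment


if __name__ == "__main__":
    print(nodes_needed(5000))        # nodes for 5,000 cameras at 10 per node
    print(assign_cameras(25)[2])     # cameras handled by the third node
```

Because each node ingests only its own cameras, adding cameras means adding nodes at the same ratio instead of pushing more streams through one controller.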
Read-Intensive Workloads
Video streaming and online video sharing are categorized as read-intensive workloads. Consider the example of the Beijing Olympics last summer. There was unbelievable demand for online video of the events, and in the U.S. the focus was on men's swimming. When the U.S. relay team won by a fraction of a second, everybody wanted to watch. Millions of people flocked to the web and video servers churned out views. This creates a unique storage demand. With thousands of web servers trying to read a single file, the architecture must support parallel reads. With hundreds of independent nodes serving out many copies of the same file, cloud storage provides the ideal solution to read-intensive workloads.
High Performance Computing (HPC) Workloads
HPC workloads are similar to data ingestion workloads with one important difference - access to a single file. Rather than every client creating a unique file, hundreds or thousands of systems access a single file that is striped across many nodes for performance. This workload requires tight coordination between every node in the cluster to ensure data integrity, file locking, and cache coherence. HPC storage is used extensively in oil and gas exploration and financial data modeling, where complex transactions are processed by compute clusters. There are a number of established HPC storage vendors, including Panasas, Isilon and NetApp GX.
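To make striping concrete, here is a minimal sketch that maps a byte offset in one large file to the node holding it. It assumes simple round-robin striping with a fixed stripe size. That layout and the function names are illustrative assumptions, not a description of how any particular vendor's product places data.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, node_count: int) -> Tuple[int, int, int]:
    """Return (node, stripe_index, offset_in_stripe) for a byte offset."""
    if offset < 0 or stripe_size <= 0 or node_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and node_count must be > 0")
    stripe_index = offset // stripe_size
    node = stripe_index % node_count          # stripes are dealt out round-robin
    return node, stripe_index, offset % stripe_size


def nodes_for_range(offset: int, length: int, stripe_size: int, node_count: int) -> List[int]:
    """Return the sorted list of nodes touched by a read of `length` bytes."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({stripe % node_count for stripe in range(first, last + 1)})


if __name__ == "__main__":
    MIB = 1024 * 1024
    print(locate(5 * MIB + 10, MIB, 4))        # byte 10 of stripe 5
    print(nodes_for_range(0, 3 * MIB, MIB, 4)) # a 3 MiB read spans three nodes
```

A large sequential read fans out across several nodes at once, which is where the performance comes from. It is also why the nodes must coordinate locking and caching: every client is touching pieces of the same file.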
Single Producer, Many Consumer Workloads
In June 2008, the NASA Phoenix Mars Lander discovered ice crystals on the surface of Mars. The world reacted, scientists and religious organizations confirmed their unique theories about the universe, and everybody wanted access to the data. Given the challenges of landing on Mars and collecting soil samples, it's safe to say this is an example of a write-once, consume-many workload. Other examples include genomic sequence findings and quarterly business results. All share a single creation event with demand for multiple points of read access. Cloud storage protects data by replicating files to one or more nodes. This same activity can create many access points, enabling a single creation event to be easily shared amongst many consumers.
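The idea that replication doubles as read fan-out can be sketched as follows. The placement rule here (hash the file name, then take the next few nodes in order) is an assumption chosen for brevity, and the node and file names are hypothetical. Real systems use their own placement policies.

```python
import hashlib
from typing import List


def replica_nodes(file_name: str, nodes: List[str], copies: int) -> List[str]:
    """Pick `copies` distinct nodes to hold replicas of a file."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    # Hash the name so the same file always lands on the same nodes.
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


def pick_replica(replicas: List[str], request_id: int) -> str:
    """Spread read requests across the replicas in round-robin order."""
    return replicas[request_id % len(replicas)]


if __name__ == "__main__":
    cluster = [f"node-{i}" for i in range(8)]            # hypothetical node names
    replicas = replica_nodes("phoenix-ice.dat", cluster, copies=3)
    for request_id in range(5):
        print(request_id, pick_replica(replicas, request_id))
```

The file is written once, yet every replica made for protection is also a node that can answer reads, so consumer demand is divided by the number of copies.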
Archive or Content Depot Workloads
In most cases, as data ages it becomes less active. Whether it is corporate information or media content, it is important that this data be kept available, but at a cost relative to its value. Private cloud storage economics and scale capabilities are designed to address this use case. Data can be copied to the cloud to free up more expensive tier-one storage devices and delay costly infrastructure upgrades. Cloud storage can be expanded on demand using the latest (or oldest) commodity hardware and a few simple mouse clicks. When it comes time to retire cloud hardware, it can be removed without downtime, preserving access and enabling 50-year archives.
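One simple way to decide what moves off tier-one storage is an age-based rule. The sketch below is illustrative only: the 180-day idle threshold, the function name, and the file names are assumptions for this example, and a real policy would weigh the data's value as well as its age.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def archive_candidates(
    last_access: Dict[str, datetime],
    now: datetime,
    max_idle_days: int = 180,   # hypothetical threshold, tune per organization
) -> List[str]:
    """Return the files that have not been accessed within the idle window."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, accessed in last_access.items() if accessed < cutoff)


if __name__ == "__main__":
    files = {
        "q3-results.xls": datetime(2008, 10, 30),
        "old-footage.avi": datetime(2007, 5, 1),
    }
    # Files idle for more than 180 days are candidates for the archive cloud.
    print(archive_candidates(files, now=datetime(2009, 1, 22)))
```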
What Is Your Data Workload?
When considering storage choices, ignore the "we can do everything" vendors and think about your workload. Once you understand your requirements and how the data will be used, your answer will emerge.