Comments
yourfanat wrote: I am using another tool for Oracle developers - dbForge Studio for Oracle. This IDE has lots of usefull features, among them: oracle designer, code competion and formatter, query builder, debugger, profiler, erxport/import, reports and many others. The latest version supports Oracle 12C. More information here.
Cloud Expo on Google News
SYS-CON.TV
Cloud Expo & Virtualization 2009 East
PLATINUM SPONSORS:
IBM
Smarter Business Solutions Through Dynamic Infrastructure
IBM
Smarter Insights: How the CIO Becomes a Hero Again
Microsoft
Windows Azure
GOLD SPONSORS:
Appsense
Why VDI?
CA
Maximizing the Business Value of Virtualization in Enterprise and Cloud Computing Environments
ExactTarget
Messaging in the Cloud - Email, SMS and Voice
Freedom OSS
Stairway to the Cloud
Sun
Sun's Incubation Platform: Helping Startups Serve the Enterprise
POWER PANELS:
Cloud Computing & Enterprise IT: Cost & Operational Benefits
How and Why is a Flexible IT Infrastructure the Key To the Future?
Click For 2008 West
Event Webcasts
Data Landscape at Facebook By @JnanDash | @CloudExpo [#BigData]
What does the data landscape look like at Facebook with its 1.3 billion users across the globe?

Data Landscape at Facebook

What does the data landscape look like at Facebook with its 1.3 billion users across the globe? They classify small data referring to OLTP-like queries that process and retrieve a small amount of data, usually 1-1000 objects requested by their id. Indexes limit the amount of data accessed during a single query, regardless of the total volume. Big data refers to queries that process large amounts of data, usually for analysis: trouble-shooting, identifying trends, and making decisions. The total volume of data is massive for both small and big data, ranging from hundreds of terabytes to hundreds of petabytes on disk.

The heart of Facebook’s core data is TAO (The Association of Objects) – distributed data store for the social graph. The workload on this is extremely demanding. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends — the list goes on. The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds. TAO provides access to tens of petabytes of data, but answers most queries by checking a single page in a single machine. The challenge here is how to optimize for mobile devices which have intermittent connectivity and higher network latencies than most web clients.

Big data stores are the workhorses for data analysis and they grow by millions of events (inserts) per second and process tens of petabytes and hundreds of thousands of queries per day. The three data stores are: 1) ODS (Operational Data Store) has 2 billion time series of counters (used mostly in alerts and dashboards) and processes 40000 queries per second. 2) Scuba is the fast slice-and-dice data store with 100 terabytes in memory. It ingests millions of new rows per second and deletes just as many. Throughput peaks around 100 queries per second scanning 100 billion rows per second. 3) Hive is the data warehouse with 300 petabytes of data in 800,000 tables. Facebook generates 4 new petabytes of data and runs 600,000 queries and 1 million map-reduce jobs per day. Presto, HiveQL, Hadoop, and Giraph are the common query engines over Hive.

What is interesting to note here is that ODS, Scuba, and Hive are not traditional relational databases. They process data for analysis, not to serve users, so they do not need ACID guarantees for data storage and retrieval. Instead, challenge arises from high data insertion rates and massive data quantities.

Facebook represents the new generation Big Data user demanding innovative solutions not seen in the traditional database world. A big challenge is how to accommodate the seismic shift to mobile devices with their peculiar intermittent network connectivity.

No wonder, Facebook  hosted a data faculty summit last September with many known academic researchers from around the country to find answers to its future data challenges.

Read the original blog entry...

About Jnan Dash
Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

Latest Cloud Developer Stories
Whenever a new technology hits the high points of hype, everyone starts talking about it like it will solve all their business problems. Blockchain is one of those technologies. According to Gartner's latest report on the hype cycle of emerging technologies, blockchain has just ...
In his session at 21st Cloud Expo, Michael Burley, a Senior Business Development Executive in IT Services at NetApp, described how NetApp designed a three-year program of work to migrate 25PB of a major telco's enterprise data to a new STaaS platform, and then secured a long-term...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high deman...
Despite being the market leader, we recognized the need to transform and reinvent our business at Dynatrace, before someone else disrupted the market. Over the course of three years, we changed everything - our technology, our culture and our brand image. In this session we'll di...
Cloud Storage 2.0 has brought many innovations, including the availability of cloud storage services that are less expensive and much faster than previous generations of cloud storage. Cloud Storage 2.0 has also delivered new and faster methods for migrating your premises storage...
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021



SYS-CON Featured Whitepapers
Most Read This Week
ADS BY GOOGLE