From the Blogosphere
AWS Outage and High Availability | @DevOpsSummit #SDN #AWS #Monitoring
Thoughts for developing highly available cloud services or more important making decisions what our services’ SLAs should be
By: Toddy Mladenov
Mar. 11, 2017 10:00 AM
After yet another cloud outage yesterday (see AWS’s S3 outage was so bad Amazon couldn’t get into its own dashboard to warn the world) the world (or at least its North American part) once again went crazy how dangerous the cloud is and how you should go build your own data center because you know better what is good for your business.
Putting aside all the hype as well as some quite senseless social media posts about AWS SLAs, here is our thought process for developing highly available cloud services or more importantly making conscious decisions what our services’ SLAs should be.
I will base this post on my customer experience with Docker that was impacted by the outage yesterday but also walk you through the thought process for our own services. Without knowing Dockers business strategy I will speculate a bit but my goal is to walk you through the process and not define Docker’s HA strategy. For those who are not familiar what the problem with Docker was, Docker’s public repository is hosted on S3 and was not accessible during the outage.
The first thing we look at is, of course, the business impact of the service. Nothing new here! Thinking about Docker’s registry outage here are my thoughts:
With just those simple points, one can now make a conscious decision that the impact of Docker’s public repository being down is most probably high. What to do about it?
The simplest thing you can do in such a situation is to set the expectations upfront. Calculate a realistic availability SLA and publish it on your site. Unfortunately, looking at Docker Hub’s site I was not able to find one. In general, I think cloud providers bury their SLAs so deep that it is hard for customers to find them. Thus, people search on Google or Bing and start citing the first number they find (relevant or not), which makes the PR issue even worse. I would go even further – I would publish not only the 9s of my SLA but also what those 9s equate to in time, and whether this is per week, month or year. Taking, for example, the Amazon’s S3 SLA, after being down for approximately 3 hours yesterday, if we consider it annually, they are still within their 8h 45min allowed downtime.
Now that you made sure that you have a good answer to your customers, let’s think how can you make sure that you keep those SLAs intact. However, this doesn’t mean that you should go ahead and overdesign your infrastructure and spin up a multimillion project that will provide redundancy for every component of every application you manage. There were a lot of voices we’ve heard yesterday calling for you to start multi-cloud deployments immediately. You could do that but is this the right thing?
I personally like to think about this problem gradually and revisit the HA strategy on a regular basis. During those reviews, you should look at the business requirements as well as what is the next logical step to make improvements. Multi-cloud can be in your strategy long term but this is certainly much bigger undertaking than providing quick HA solution with your current provider. In yesterday’s incident, the next logical step for Docker would be to have a second copy of the repository in the US West and the ability to quickly switch to it if something happens with US East (or vice versa). This is a small incremental improvement that will make a huge difference for the customers and boost Docker’s PR because they can say: “Look! we host our repository on S3 but their outage had minimal or no impact on us. And, by the way, we know how to do this cloud stuff.” After that, you can think about multi-cloud and how to implement it.
Last, but not least your HA strategy should be also tied to your monitoring, alerting, remediation but also to your customer support strategy. Monitoring and alerting is clear – you want to know if your site or parts of it are down and take the appropriate actions as described in your remediation plan. But why, your customer support strategy? Well, if you haven’t noticed – AWS Service Dashboard was also down yesterday. The question comes up, how do you notify your customers of issues with your service if your standard channel is also down? I know that a lot of IT guys don’t think of it but Twitter turns out to be a pretty good communication tool – maybe you should think of it next time your site is down.
Developing solid HA strategy doesn’t need to be a big bang approach. As everything else, you should ask good questions, do incremental steps, fail and learn. And most importantly, take responsibilities for your decision and don’t blame the cloud for all bad things that happen with your site.
@DevOpsSummit at Cloud Expo taking place June 6-8, 2017, at Javits Center, New York City, and is co-located with the 20th International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world.
DevOps at Cloud Expo / @ThingsExpo 2017 New York
DevOps at Cloud Expo / @ThingsExpo 2017 Silicon Valley
Download Show Prospectus ▸ Here
The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce software that is obsolete at launch. DevOps may be disruptive, but it is essential.
@DevOpsSummit will expand the DevOps community, enable a wide sharing of knowledge, and educate delegates and technology providers alike. Recent research has shown that DevOps dramatically reduces development time, the amount of enterprise IT professionals put out fires, and support time generally. Time spent on infrastructure development is significantly increased, and DevOps practitioners report more software releases and higher quality. Sponsors of @DevOpsSummit will benefit from unmatched branding, profile building and lead generation opportunities through:
For more information on sponsorship, exhibit, and keynote opportunities, contact Carmen Gonzalez by email at events (at) sys-con.com, or by phone 201 802-3021.
The World's Largest "Cloud Digital Transformation" Event
@CloudExpo / @ThingsExpo 2017 New York
@CloudExpo / @ThingsExpo 2017 Silicon Valley
Full Conference Registration Gold Pass and Exhibit Hall ▸ Here
Register For @CloudExpo ▸ Here via EventBrite
Register For @ThingsExpo ▸ Here via EventBrite
Register For @DevOpsSummit ▸ Here via EventBrite
Sponsors of Cloud Expo / @ThingsExpo will benefit from unmatched branding, profile building and lead generation opportunities through:
For more information on sponsorship, exhibit, and keynote opportunities, contact Carmen Gonzalez (@GonzalezCarmen) today by email at events (at) sys-con.com, or by phone 201 802-3021.
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
Track 1. FinTech
Delegates to Cloud Expo / @ThingsExpo will be able to attend 8 simultaneous, information-packed education tracks.
There are over 120 breakout sessions in all, with Keynotes, General Sessions, and Power Panels adding to three days of incredibly rich presentations and content.
Join Cloud Expo / @ThingsExpo conference chair Roger Strukhoff (@IoT2040), June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA for three days of intense Enterprise Cloud and 'Digital Transformation' discussion and focus, including Big Data's indispensable role in IoT, Smart Grids and (IIoT) Industrial Internet of Things, Wearables and Consumer IoT, as well as (new) Digital Transformation in Vertical Markets.
Financial Technology - or FinTech - Is Now Part of the @CloudExpo Program!
Accordingly, attendees at the upcoming 20th Cloud Expo / @ThingsExpo June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA will find fresh new content in a new track called FinTech, which will incorporate machine learning, artificial intelligence, deep learning, and blockchain into one track.
Financial enterprises in New York City, London, Singapore, and other world financial capitals are embracing a new generation of smart, automated FinTech that eliminates many cumbersome, slow, and expensive intermediate processes from their businesses.
FinTech brings efficiency as well as the ability to deliver new services and a much improved customer experience throughout the global financial services industry. FinTech is a natural fit with cloud computing, as new services are quickly developed, deployed, and scaled on public, private, and hybrid clouds.
More than US$20 billion in venture capital is being invested in FinTech this year. @CloudExpo is pleased to bring you the latest FinTech developments as an integral part of our program, starting at the 20th International Cloud Expo June 6-8, 2017 in New York City and October 31 - November 2, 2017 in Silicon Valley.
The upcoming 20th International @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA announces that its Call For Papers for speaking opportunities is open.
Submit your speaking proposal today! ▸ Here
Our Top 100 Sponsors and the Leading "Digital Transformation" Companies
(ISC)2, 24Notion (Bronze Sponsor), 910Telecom, Accelertite (Gold Sponsor), Addteq, Adobe (Bronze Sponsor), Aeroybyte, Alert Logic, Anexia, AppNeta, Avere Systems, BMC Software (Silver Sponsor), Bsquare Corporation (Silver Sponsor), BZ Media (Media Sponsor), Catchpoint Systems (Silver Sponsor), CDS Global Cloud, Cemware, Chetu Inc., China Unicom, Cloud Raxak, CloudBerry (Media Sponsor), Cloudbric, Coalfire Systems, CollabNet, Inc. (Silver Sponsor), Column Technologies, Commvault (Bronze Sponsor), Connect2.me, ContentMX (Bronze Sponsor), CrowdReviews (Media Sponsor) CyberTrend (Media Sponsor), DataCenterDynamics (Media Sponsor), Delaplex, DICE (Bronze Sponsor), EastBanc Technologies, eCube Systems, Embotics, Enzu Inc., Ericsson (Gold Sponsor), FalconStor, Formation Data Systems, Fusion, Hanu Software, HGST, Inc. (Bronze Sponsor), Hitrons Solutions, IBM BlueBox, IBM Bluemix, IBM Cloud (Platinum Sponsor), IBM Cloud Data Services/Cloudant (Platinum Sponsor), IBM DevOps (Platinum Sponsor), iDevices, Industrial Internet of Things Consortium (Association Sponsor), Impinger Technologies, Interface Masters, Intel (Keynote Sponsor), Interoute (Bronze Sponsor), IQP Corporation, Isomorphic Software, Japan IoT Consortium, Kintone Corporation (Bronze Sponsor), LeaseWeb USA, LinearHub, MangoApps, MathFreeOn, Men & Mice, MobiDev, New Relic, Inc. (Bronze Sponsor), New York Times, Niagara Networks, Numerex, NVIDIA Corporation (AI Session Sponsor), Object Management Group (Association Sponsor), On The Avenue Marketing, Oracle MySQL, Peak10, Inc., Penta Security, Plasma Corporation, Pulzze Systems, Pythian (Bronze Sponsor), Cosmos, RackN, ReadyTalk (Silver Sponsor), Roma Software, Roundee.io, Secure Channels Inc., SD Times (Media Sponsor), SoftLayer (Platinum Sponsor), SoftNet Solutions, Solinea Inc., SpeedyCloud, SSLGURU LLC, StarNet, Stratoscale, Streamliner, SuperAdmins, TechTarget (Media Sponsor), TelecomReseller (Media Sponsor), Tintri (Welcome Reception Sponsor), TMCnet (Media Sponsor), Transparent Cloud Computing Consortium, Veeam, Venafi, Violin Memory, VAI Software, Zerto
About SYS-CON Media & Events
Cloud Expo®, Big Data Expo® and @ThingsExpo® are registered trademarks of Cloud Expo, Inc., a SYS-CON Events company.
Latest Cloud Developer Stories
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
SYS-CON Featured Whitepapers
Most Read This Week