"The Security Group Does Not Exist" - Working with the AWS APIs at Scale

by Igor Savchenko on Jul 30, 2014 7:00:00 AM

At Scalr we provide SaaS and on-premises software to manage cloud infrastructure (and it’s open-source), so we end up making lots and lots of calls to the AWS APIs.

What’s great about scale is that when you’re making thousands of API calls a day, events with a 0.1% probability of occurring tend to happen multiple times a day, sometimes resulting in surprise, confusion, and hair pulling! One of those low-probability events is the subject of today’s blog post.
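To put rough numbers on this: if each call independently has probability p of hitting such an event, n calls a day yield n·p occurrences on average, and the chance of seeing at least one is 1 − (1 − p)^n. The sketch below assumes 3,000 calls a day, a figure picked purely for illustration, which works out to about 3 occurrences a day and roughly a 95% chance of at least one.

```python
def expected_occurrences(p: float, n: int) -> float:
    """Average number of rare events across n independent calls."""
    return n * p


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that a rare event shows up at least once in n independent calls."""
    return 1.0 - (1.0 - p) ** n


# Illustrative assumption: 3,000 calls a day, 0.1% event probability per call
calls_per_day = 3000
p_event = 0.001

print(f"expected per day: {expected_occurrences(p_event, calls_per_day):.1f}")
print(f"P(at least one per day): {prob_at_least_one(p_event, calls_per_day):.3f}")
```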

Let’s start with a quick test, shall we? Can you spot what’s wrong with this code?

import boto.ec2

# Connect to EC2 (credentials are read from the environment!)
conn = boto.ec2.connect_to_region("us-east-1")
# Create a new security group
security_group = conn.create_security_group("test-sg", "Just making a point!")
# Define SG rules as (protocol, port, CIDR); 0.0.0.0/0 opens the port to everyone
ip_rules = [("tcp", 80, "0.0.0.0/0"), ("tcp", 443, "0.0.0.0/0")]
# Update the SG with the rules
for protocol, port, network in ip_rules:
    security_group.authorize(protocol, port, port, network)

If you can’t: read on!

The AWS APIs Are Fundamentally Eventually Consistent

When you hit the AWS API to create a security group, or allocate an Elastic IP, the response you get usually includes an ID for the resource you just created — and when it doesn’t, you get an error message explaining what went wrong, so you can fix your API call.

Now, popular belief (and the AWS documentation) suggests that as soon as you have the resource ID, you can make further API requests against it, like adding security rules, or associating an EIP, or adding tags to an instance. And in most cases, this will work.

But the truth is that the AWS APIs are in fact eventually consistent: the mere fact that you have a resource ID does not guarantee that the underlying resource actually exists.
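A toy model makes the gap concrete. The sketch below is not how AWS works internally; it simply assumes that a created resource becomes visible to reads only after a fixed propagation delay, and it uses an injectable clock so the behavior is deterministic. The class and method names are hypothetical.

```python
import itertools


class ResourceNotFound(Exception):
    """Raised when a lookup targets an ID that is not visible yet."""


class EventuallyConsistentRegistry:
    """Toy store: IDs are issued immediately, but reads only see a resource after `delay` seconds."""

    def __init__(self, clock, delay: float):
        self._clock = clock            # callable returning the current time in seconds
        self._delay = delay            # hypothetical propagation delay
        self._visible_at = {}          # resource ID -> time from which reads succeed
        self._ids = itertools.count(1)

    def create(self, prefix: str) -> str:
        resource_id = f"{prefix}-{next(self._ids):08x}"
        self._visible_at[resource_id] = self._clock() + self._delay
        return resource_id             # the caller gets an ID right away

    def describe(self, resource_id: str) -> str:
        visible_at = self._visible_at.get(resource_id)
        if visible_at is None or self._clock() < visible_at:
            raise ResourceNotFound(f"The resource ID '{resource_id}' does not exist")
        return resource_id


# Usage with a fake clock that we advance by hand
now = [0.0]
registry = EventuallyConsistentRegistry(clock=lambda: now[0], delay=2.0)
sg_id = registry.create("sg")
try:
    registry.describe(sg_id)
except ResourceNotFound as exc:
    print(exc)                         # we hold an ID, but the resource is not visible yet
now[0] += 2.0
print(registry.describe(sg_id))        # visible once the delay has elapsed
```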

In practice, this means that once in a while, an API call will return an ID that you can’t use right away, because the resource doesn’t exist yet. And if you try to use it anyway, you’ll get an error message such as:

The security group 'sg-xxxxxxxx' does not exist

The allocation ID 'eipalloc-xxxxxxxx' does not exist

The instance ID 'i-xxxxxxxx' does not exist

Of course, when you retry the call a few minutes later (after digging through your logs), the security group is there, and so is the Elastic IP, and you’re left wondering: “Oh AWS, why can’t I use the security group I just created?”
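A common remedy is to treat these “does not exist” errors as transient for a short window and retry with exponential backoff. The sketch below is library-agnostic: it takes any callable plus a predicate deciding which exceptions are retryable. With boto, these errors surface as `boto.exception.EC2ResponseError`, and checking its `error_code` against codes such as `InvalidGroup.NotFound`, `InvalidAllocationID.NotFound` and `InvalidInstanceID.NotFound` is one plausible predicate. `retry_with_backoff` is a hypothetical helper, not part of boto, and the attempt count and delays are arbitrary assumptions.

```python
import time

# EC2 error codes that typically mean "not visible yet" right after creation
NOT_FOUND_CODES = {
    "InvalidGroup.NotFound",
    "InvalidAllocationID.NotFound",
    "InvalidInstanceID.NotFound",
}


def is_not_found(exc: Exception) -> bool:
    """Retry only errors whose `error_code` attribute is one of the NotFound codes above."""
    return getattr(exc, "error_code", None) in NOT_FOUND_CODES


def retry_with_backoff(fn, should_retry, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception wait base_delay * 2**i seconds and try again.

    The last exception is re-raised once all attempts are used up, so genuine
    errors (or resources that really never appear) still reach the caller.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if not should_retry(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage with the boto snippet above:
# for protocol, port, network in ip_rules:
#     retry_with_backoff(
#         lambda: security_group.authorize(protocol, port, port, network),
#         should_retry=is_not_found,
#     )
```

Keeping the predicate separate from the retry loop means only the “not visible yet” class of errors is retried; a malformed request still fails on the first attempt.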


Topics: API, Technical, Tips, Amazon

Back To Basics: All the Wrong Reasons To Discard Cloud (And A Right One To Adopt It)

by Thomas Orozco on Jul 23, 2014 9:00:00 AM

Scalr being a Cloud Management company, we often blog about how Cloud Management can increase the value you get out of your cloud (in the IaaS sense of the term).

But how do you decide to use cloud in the first place? Our experience has been that the reasons for organizations to adopt or discard cloud are sometimes quite nebulous. This post seeks to provide clarity.

AWS is the Largest Cloud, but it isn’t the Only Cloud

Cloud infrastructure was invented by Amazon when the company launched its AWS (Amazon Web Services) offering. AWS’s value proposition was (and still is) characterized by:

  • Trading capital investments (CAPEX) for operational expenditure (OPEX), thanks to a pay-per-use billing model

  • Consolidating workloads and increasing utilization, thanks to virtualization that enables a vast range of diverse instance types

  • Providing self-service and instant access to infrastructure resources, thanks to a UI and an API that make it possible to provision instances in minutes, and thanks to a billing model based on hourly increments.

Evidently, these are tradeoffs, not straight-up benefits. In fact, your organization might decide not to adopt cloud specifically because it has existing capital investments (CAPEX) it intends to leverage, or because its high-performance computing (HPC) workloads can’t afford to run on virtualized hardware.

But fortunately, you don’t have to throw the baby out with the bathwater! There are options that let you “unbundle” this value proposition.


Topics: API, AWS, Cloud Management, Private Cloud, cloud adoption

Growing Up Fast: OpenStack Turns 4

by Sebastian Stadil on Jul 21, 2014 7:00:00 AM

OpenStack celebrates its 4th birthday today, and while the champagne corks are popping over at OpenStack.org, we thought it would be another good opportunity to reflect on what we have observed in the OpenStack community over the last 4 years. We shared some of our impressions in our blog post “OpenStack and the Top 5 Reasons for Private Cloud Adoption” after we returned from the OpenStack Summit in Atlanta in May.

OpenStack can now proudly state that there are more than 70 global user groups and 17,000 community members across 139 countries, spanning more than 370 organizations. What we find compelling is that these are no longer just early adopters dipping their toes in the water. While we love working with those brave enough to go where no one has gone before, it’s exciting to see the plethora of traditional retail, media, banking, and finance companies now adopting OpenStack. And we think OpenStack is only beginning to hit its prime.


Topics: Community, OpenStack, Cloud Platform, enterprise cloud

Top 3 IT Concerns When Adopting A Cloud Platform, And How To Approach Them

by Thomas Orozco on Jul 17, 2014 5:38:00 PM

Here at Scalr, we often work with IT organizations that are evaluating or adopting a Cloud Platform (such as AWS, Google Compute Engine, OpenStack or CloudStack), largely due to the nature of the software we are building (the Scalr Cloud Management Platform).

In our experience, IT departments that evaluate clouds are very aware of their business’s requirements for a cloud platform. They know that their cloud needs to be self-service, that it needs to be fast, that it needs to be flexible, etc.

But what about IT’s own requirements? Regardless of whether the company adopts cloud or not, IT is responsible for:

  • The security of the company’s infrastructure

  • The cost of operating said infrastructure

  • The enforcement of change management policies across said infrastructure

Oftentimes, IT departments are not sure how to solve those problems once the company adopts cloud. In this post, we’d like to share our experience working with IT departments that have successfully identified and solved these problems.

The Underlying Problem: With Cloud, IT Is Accountable But Gives Up Control

In a non-cloud environment, IT can enforce policies by carefully reviewing provisioning requests that are made by developers, and making sure they comply with the company’s policies.

However, when the company adopts cloud, developers gain the ability to provision resources on a self-service basis: they can use their cloud’s API to provision the resources that they need. IT is kept out of the loop, and is left unable to enforce its policies.

In turn, this means that IT must now rely on developers to:

  • Follow IT’s security policies

  • Follow IT’s cost-control policies

  • Follow IT’s change-management policies

Unfortunately — and regardless of their best intentions — developers usually fail to meet IT’s requirements, if only because they already have plenty on their plate, and because they aren’t particularly qualified to follow numerous and ever-evolving IT policies.

One solution is for IT and developers to work more efficiently together (that is, to adopt DevOps), and that’s ultimately what the organization should strive for. But regardless of how enthusiastic the organization is about DevOps, IT departments usually need more control, and stronger guarantees, than “let’s trust that people will do the right thing”.


Topics: Cloud Platform, Cost, Security, compliance

Cloud APIs Are An Assembly Language For Infrastructure

by Thomas Orozco on Jul 8, 2014 11:59:00 AM

What will your compiler choice be?

Cloud is a transformational technology, because it introduces instantaneous self-service provisioning where there used to be long-winded provisioning processes. Consider this:

  • Prior to cloud, developers had to make a request to IT and wait for days on end to get access to resources.

  • With cloud, developers can simply make an API call (or use a web-based frontend) and instantly get access to what they need, greatly increasing their agility and ability to react to customer needs and market changes.

Of course, I’m not trying to say that cloud invented self-service provisioning. Surely, forms of self-service existed before cloud. But the truth is: cloud made self-service so pervasive that it literally changed what we mean when we say “self-service”.

Developers used to have “self-service” access to a shared MySQL database and an Apache VirtualHost or IIS environment if they were lucky. Now, they have self-service access to entire operating systems, complete with scalable multi-node templates including Node.js or Rails application stacks, software and hardware load balancers, and even Hadoop clusters.

But is that enough? Is cloud the nirvana of developer agility? Probably not.

Cloud is a Foundational Technology

Cloud is first and foremost an abstraction over hardware (mainly compute, storage, and networking). In turn, this new abstraction enables new abstractions to be created.

In fact, you can draw a parallel with how programming evolved.  

We started with assembly languages (in our example, they are the equivalent of hardware). Then, we added new languages (e.g. C) and their associated compilers, libraries and runtimes (the equivalent of cloud here). In turn, these low-level languages were used to create new higher-level languages (e.g. Java, JavaScript, Ruby, and many, many others).

High-level languages like Java offer higher-level abstractions that enable programmers to create software more efficiently than if they were using low-level languages such as C. In turn, this increases the business’s agility. Today, these languages are used to create the applications that power Google, Facebook, Twitter, and many others.

So how does that relate to cloud? Just as many successful companies built their businesses on high-level programming languages, successful companies built their cloud architectures on high-level abstractions layered on top of what the cloud provided.

Netflix is the quintessential example of success with cloud. The company achieved that success by building higher-level abstractions on top of what existed in their cloud (AWS), such as Asgard and the famous Simian Army.


Topics: API, Strategy, Multi-Cloud, Opinion, Cloud Platform, AWS, Cloud Management, Amazon, enterprise cloud, Automation

Welcome to the Scalr blog!

We build a Cloud Management tool that helps businesses efficiently design and manage infrastructure across multiple clouds.

Here, we post about our experience working with the Cloud, and building Scalr. On average, we do that twice a week.

Sometimes, we'll also cover Cloud-related news.
