Resource tagging can be useful, but it’s hardly an effective way to control cloud costs.
Enterprises are moving to multi-cloud in droves. Why? The drivers cloud adopters cite most often are speed, agility, platform flexibility, and reduced costs, or at least more predictable costs. It’s ironic, then, that more than half of these companies say runaway cloud costs are their biggest post-migration pain point.
How can we get cloud budgets under control? First, we need to understand what we’re working with. But when costs are accruing from multiple teams, using multiple accounts, involving multiple products across multiple geographies on multiple cloud platforms, getting a clear picture can be a nearly impossible task. For that reason, infrastructure and operations teams often turn to cloud and cost management solutions to gain better visibility.
Resource tagging: An incomplete answer
One common way teams and cost management solutions have tried to increase visibility is through tags. Tagging is essentially the process of attaching names or other labels (typically key-value pairs) to infrastructure such as servers, databases, and storage volumes, and in some cases to applications or projects. Tags might record useful information like geographic region, department, environment, the purpose of the server, or even the name of the person who provisioned it. For example, I might provision a database in the Northern Virginia region of AWS and tag it like this:
Tagging can increase visibility into which instances are running where and how budget is being allocated, augmenting the data teams get from their cloud providers. One of the reasons IT teams most often cite for implementing tagging is to prevent runaway costs from shadow IT. To that end, they write best practices and guidelines specifying what data tags must include so they can keep track of everything happening in the environment.
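A tagging guideline only helps if it can be checked mechanically. Below is a minimal Python sketch of such a check, assuming a hypothetical guideline with four required keys (`region`, `department`, `environment`, `owner`) and simple regular-expression rules; the keys, patterns, and resource names are illustrative and not taken from any particular cloud provider.

```python
from __future__ import annotations

import re

# Hypothetical tagging guideline: each key is required and its value
# must fully match the pattern. Real guidelines will use other keys/rules.
REQUIRED_TAGS = {
    "region": re.compile(r"[a-z]{2}-[a-z]+-\d"),       # e.g. "us-east-1"
    "department": re.compile(r"[a-z][a-z0-9-]*"),
    "environment": re.compile(r"dev|staging|prod"),
    "owner": re.compile(r"[a-z][a-z0-9.]*"),
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """List every way a resource's tags break the guideline."""
    problems = []
    for key, pattern in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing tag {key!r}")
        elif pattern.fullmatch(value) is None:
            problems.append(f"tag {key!r} has non-conforming value {value!r}")
    return problems


def compliance_rate(resources: dict[str, dict[str, str]]) -> float:
    """Fraction of resources whose tags follow the guideline exactly."""
    if not resources:
        return 1.0
    compliant = sum(1 for tags in resources.values() if not tag_violations(tags))
    return compliant / len(resources)


# Three hypothetical databases in us-east-1, tagged by three different people.
resources = {
    "db-1": {"region": "us-east-1", "department": "eng",
             "environment": "prod", "owner": "jose"},
    "db-2": {"region": "US-East-1", "department": "eng",
             "environment": "prod", "owner": "jose"},
    "db-3": {"region": "us-east-1", "department": "eng",
             "environment": "production"},
}

for name, tags in resources.items():
    print(name, tag_violations(tags) or "ok")
print(f"compliance: {compliance_rate(resources):.0%}")
```

Even in this tiny sample, two of the three resources drift from the guideline: one through a capitalization change, the other through a longer environment name and a missing owner.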
But there’s an inherent problem with this approach: it ignores why shadow IT started in the first place, which was to avoid the processes IT put in place. Tagging can only succeed if IT can be sure that every tag is correct and follows guidelines 100 percent of the time. With teams spread across locations and using multiple private and public cloud platforms, that quickly becomes unlikely.
Here’s a case in point. Across three different teams, three nearly identical databases provisioned in the Northern Virginia region of AWS may follow IT’s guidelines but end up with wildly different names:
What’s more, tags intrinsically tie infrastructure and policies together, which becomes a problem at scale. In enterprises, or in any company that consumes a lot of cloud infrastructure, resources are in constant flux, changing purpose all the time. And as teams morph, change, or combine over time, so do their resources. Two teams may have different tagging policies, and when they merge or resources move between them, tagging conventions often break. In the first database example, the tags might look like this:
Evan-mysql-eats-1 (what happens to your tagging strategy when one word is misspelled?)
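To see why a single misspelled word matters, consider a cost report that derives its grouping key from tag values. The minimal Python sketch below assumes a hypothetical `<owner>-<engine>-<region>-<index>` naming convention and made-up monthly costs; apart from `Evan-mysql-eats-1`, the resource names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical naming convention: <owner>-<engine>-<region>-<index>.
# Hypothetical monthly costs; the last name misspells "east" as "eats".
monthly_costs = {
    "Evan-mysql-east-1": 412.00,
    "Evan-mysql-east-2": 398.00,
    "Evan-mysql-eats-1": 405.00,
}


def cost_by_region(costs: dict[str, float]) -> dict[str, float]:
    """Sum cost per region token parsed from each resource name."""
    totals: dict[str, float] = defaultdict(float)
    for name, cost in costs.items():
        _owner, _engine, region, _index = name.split("-")
        totals[region] += cost  # exact string match: a typo makes a new bucket
    return dict(totals)


for region, total in sorted(cost_by_region(monthly_costs).items()):
    print(f"{region:6} ${total:,.2f}")
```

Exact-match grouping puts the misspelled database in its own `eats` bucket, so a third of the spend in this sample silently drops out of the `east` total.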
Central IT could solve much of the problem simply by owning all provisioning and tagging, making sure to follow policies. But that slows everything down. And again, that’s what generally causes shadow IT in the first place.
Logical grouping: A partial solution
Tagging was never meant to be used for something as important and granular as cost management. Monitoring costs per application or server doesn’t usually make much business sense anyway. Instead, enterprises might consider how to logically group applications or provisioned infrastructure into “projects” or even teams. Then projects and teams could be assigned budgets, making cost allocation and reporting much simpler and removing the reliance on tags. Developers provisioning in the cloud could associate their applications with the projects they belong to or with the cost centers they report to.
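A minimal Python sketch of project-based allocation, assuming each resource is associated with a project when it is provisioned. The project names, cost centers, budgets, and resource IDs are hypothetical. Costs roll up to the project rather than being read from per-resource tags, and anything not associated with a project surfaces as `unassigned` instead of disappearing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cost_center: str
    monthly_budget: float


# Hypothetical projects, resource-to-project assignments, and month-to-date costs.
projects = {
    "checkout": Project("checkout", "retail-eng", 5_000.0),
    "analytics": Project("analytics", "data-platform", 8_000.0),
}
resource_project = {"i-0a1": "checkout", "i-0b2": "checkout", "db-7": "analytics"}
month_to_date = {"i-0a1": 2_100.0, "i-0b2": 3_300.0, "db-7": 4_000.0, "vol-9": 250.0}


def spend_by_project(costs: dict[str, float], mapping: dict[str, str],
                     known: dict[str, Project]) -> dict[str, float]:
    """Roll resource costs up to projects; unmapped resources go to 'unassigned'."""
    totals = {name: 0.0 for name in known}
    for resource, cost in costs.items():
        project = mapping.get(resource, "unassigned")
        totals[project] = totals.get(project, 0.0) + cost
    return totals


for name, spent in spend_by_project(month_to_date, resource_project, projects).items():
    project = projects.get(name)
    if project is None:
        status = "no budget"
    else:
        status = "OVER" if spent > project.monthly_budget else "ok"
    print(f"{name:10} ${spent:>9,.2f}  {status}")
```

Because budgets attach to projects, a report like this stays correct even if individual resources are renamed or retagged.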
But this is only a partial solution. Even if teams could guarantee perfectly accurate tags, or could move to a project-based cost model, visibility into cloud costs is only a first step. It is a reactive approach to cost management and doesn’t solve the problem completely.
Suppose you notice an EC2 instance that is exceeding its budget. You have perfect tags in place telling you that Jose is using the instance to host a MySQL database in the Northern Virginia (US East) region. Now what? The problem here is what we call visibility without context: you have no idea what the consequences of shutting down that machine will be. You’ll have to reach out to Jose manually (assuming you know who that is) to find out how to proceed.
Additionally, chances are you haven’t been monitoring that machine continuously, so you only find out about the cost overrun at the end of whatever period you report on (week, month, quarter). By that point, you may be 30 or 90 days too late. You can fix the situation after the fact, but the money has already been spent, and the best you can do is try to be more diligent about monitoring. Now rinse and repeat, and brace for the next blown budget.
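More diligent monitoring can at least shorten that delay. A minimal Python sketch, assuming spend-to-date is available every day and that spending continues at its average daily rate so far; this is a deliberately naive linear projection, and real usage is often bursty. The dollar figures and date are hypothetical.

```python
from __future__ import annotations

import calendar
from datetime import date


def projected_month_end(spend_to_date: float, today: date) -> float:
    """Project month-end spend by assuming the average daily rate so far holds."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month


def check_budget(spend_to_date: float, budget: float, today: date) -> tuple[bool, float]:
    """Return (will_overrun, projected_spend) for the current month."""
    projected = projected_month_end(spend_to_date, today)
    return projected > budget, projected


# Hypothetical figures: $2,000 spent by June 10 against a $5,000 monthly budget.
will_overrun, projected = check_budget(2_000.0, 5_000.0, date(2024, 6, 10))
print(f"projected ${projected:,.2f}, overrun: {will_overrun}")
```

Here $2,000 over 10 days projects to $6,000 for a 30-day month, so the overrun is flagged with 20 days left rather than after the month closes. It is still visibility, though, not control.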
Proactive multi-cloud cost management
Proactive cost control measures will always be more effective at managing cloud budgets. Unfortunately, few solutions help teams do that today. In my view, best practice is to set budget policies at the project and team level and enforce them through automated tools. Applications can then be grouped into projects associated with team or business unit budgets: IT and finance set cost controls for business units, and business units or individual teams set budgets for projects. These policies serve as guardrails, ensuring that applications and projects don’t exceed their expected budgets while still giving teams the freedom to be productive through methods such as automatic self-provisioning.
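A minimal Python sketch of such a guardrail, assuming a two-level hierarchy (business unit above project) and that every provisioning request arrives with an estimated monthly cost. The `Budget` class, the `provision` function, and all names and limits are hypothetical. A request is approved automatically only if it fits under its project budget and every budget above it, which keeps self-provisioning fast while enforcing the limits IT and finance set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """A monthly budget node; projects point at their business unit."""
    name: str
    limit: float
    committed: float = 0.0
    parent: Optional[Budget] = None


def can_provision(budget: Budget, monthly_cost: float) -> bool:
    """True only if the cost fits this budget and every budget above it."""
    node: Optional[Budget] = budget
    while node is not None:
        if node.committed + monthly_cost > node.limit:
            return False
        node = node.parent
    return True


def provision(budget: Budget, monthly_cost: float) -> bool:
    """Self-service provisioning gated by the budget guardrail."""
    if not can_provision(budget, monthly_cost):
        return False
    node: Optional[Budget] = budget
    while node is not None:  # commit the cost at every level of the hierarchy
        node.committed += monthly_cost
        node = node.parent
    return True


# Hypothetical hierarchy: IT/finance sets the business-unit limit,
# the business unit sets project limits.
retail = Budget("retail", limit=10_000.0)
checkout = Budget("checkout", limit=4_000.0, parent=retail)
search = Budget("search", limit=7_000.0, parent=retail)

print(provision(checkout, 3_000.0))  # fits both budgets
print(provision(checkout, 1_500.0))  # would exceed the checkout project limit
print(provision(search, 6_500.0))    # fits both budgets
print(provision(checkout, 900.0))    # fits checkout, but would exceed retail
```

The last request shows why both levels matter: the project still has room, but the business unit as a whole does not.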
With these practices in place, IT and finance teams will have greater insight into where costs come from, even on the reactive side. They can flexibly assign and reallocate budgets and adapt to changes without losing context. You will also need an analytics engine powerful enough to examine applications and usage trends and suggest cost improvements in advance. For instance, a cost analysis might recommend reserved instances where they would provide significant savings, or suggest right-sizing workloads, changes that teams can either allow automatically or approve manually.
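The reserved-instance and right-sizing suggestions reduce to simple arithmetic. A minimal Python sketch, assuming hypothetical hourly prices, a 15 percent threshold for "significant" savings, and a hypothetical size ladder in which each step doubles capacity; it is not any provider's pricing model or recommendation algorithm. The key quantity is break-even utilization, the ratio of reserved to on-demand hourly price: with prices of $0.10 and $0.06, a reservation pays off only if the workload runs more than 60 percent of the time.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12


def reservation_savings(on_demand_hourly: float, reserved_hourly: float,
                        utilization: float) -> float:
    """Monthly savings of a reservation vs. on-demand (negative = loss).

    Simplification: the reservation is paid for every hour, while
    on-demand is paid only for the hours the workload actually runs.
    """
    on_demand = on_demand_hourly * HOURS_PER_MONTH * utilization
    reserved = reserved_hourly * HOURS_PER_MONTH
    return on_demand - reserved


def recommend_reservation(on_demand_hourly: float, reserved_hourly: float,
                          utilization: float, min_savings_pct: float = 0.15) -> bool:
    """Recommend only when savings clear a hypothetical 'significant' threshold."""
    on_demand = on_demand_hourly * HOURS_PER_MONTH * utilization
    if on_demand <= 0:
        return False
    savings = reservation_savings(on_demand_hourly, reserved_hourly, utilization)
    return savings / on_demand >= min_savings_pct


# Hypothetical instance sizes, smallest first; each step doubles capacity.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]


def rightsize(current: str, p95_cpu: float, threshold: float = 0.40) -> str:
    """Suggest one size down if p95 CPU stays below the threshold.

    Halving capacity roughly doubles utilization, so a 40% threshold keeps
    the smaller instance under about 80% at p95 (assumes CPU-bound load).
    """
    i = SIZES.index(current)
    return SIZES[i - 1] if p95_cpu < threshold and i > 0 else current


print(recommend_reservation(0.10, 0.06, utilization=0.95))
print(recommend_reservation(0.10, 0.06, utilization=0.50))
print(rightsize("2xlarge", p95_cpu=0.22))
```

Either suggestion can then be applied automatically or queued for manual approval, depending on the team's policy.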
By enabling a proactive cost control system and a more powerful and contextual cost analysis mechanism, you can make runaway cloud costs a thing of the past. Speed, agility, flexibility, and cost efficiency—that’s the holy grail, and the future, of multi-cloud.