The Promise of FinOps

Cloudability’s Cloud Economic Summit put the spotlight on the importance of accountability and cloud cost management.

Our partner, Cloudability, recently hosted the Cloud Economic Summit in San Francisco, providing a look into the current and future state of cloud cost management. Cloudability CEO Mat Ellis, CTO Erik Onnen, Co-founder J.R. Storment, 451 Research Director Owen Rogers, and AWS Worldwide Business Development lead Keith Jarrett presented alongside speakers from Autodesk and OLX Group, addressing the need for FinOps – a disciplined approach to managing cloud costs. Supporting the event, Cloudability published a press release, “FinOps Operating Model Codifies Best Practices of the World’s Largest Cloud Spenders, Enabling Enterprises to Bring Financial Accountability to the Variable Spend of Cloud.”

“Celebrate achievement, get better every day – this is FinOps.”

—Mat Ellis, CEO of Cloudability

Opening the day, Mat set the stage: public cloud adoption is part of a much bigger trend seen across many industries throughout history – managing a supply chain. Milestone innovations disrupt at an astronomical scale, from the printing press, to rubber, to the internet, and now cloud computing. We’ve all felt the disruption created by cloud computing, and many of us have been part of the 21st Century IT revolution. As seen at AWS re:Invent last year, the adoption of DevOps culture to foster innovation and enable competitive advantage has been embraced by large insurance organizations like Guardian and the world-famous guitar manufacturer Fender.

However, with AWS now 13 years old, many cloud technology buying decisions are still based on an outdated model. There is a need for iterative, ongoing monitoring of and accounting for cloud spend. Enter Cloudability. Analyzing hundreds of millions of dollars in cloud spend per month, and billions per year, Cloudability’s platform delivers keen insights and benchmarking tools that enable a clear path to cloud cost diligence and FinOps success.

Cost Management in the Cloud Age

Digging into the data behind the mission of FinOps, Owen Rogers, Research Director at 451 Research, presented some stark realities about the current state of cloud cost management (full report available here). The study found that more than half of large enterprises worry about cloud costs on a daily basis, and 80% believe that poor cloud financial management has a negative impact on their business. These enterprises need a comprehensive platform to manage multi-million-dollar cloud budgets.

Another eye-opening data point: 85% of respondents overspend their budgets, with nearly 10% spending two to four times their allocated budget. Pair this with the 18% of respondents who were unaware they were overspending, and the picture is not pretty. The biggest reasons cited for not addressing the issue were “too small of an overspend to resolve” and “not wanting to hinder innovation.”

While respondents are well-intentioned, the study showed that “not wanting to hinder innovation” by putting off a responsible approach to cloud cost management does exactly what they are trying to avoid: it halts cloud adoption, cripples innovation, lowers the quality of service, increases cost, and creates a sprawling, underutilized cloud footprint.

Cloud Cost Management Directly Impacts Company Culture and Business Bottom Line

The reality is that cloud cost management directly impacts business. Thankfully, there are steps to take to mitigate the commonplace inefficiencies Owen identified. For example, 33% of respondents are manually extracting and aggregating cloud costs in a spreadsheet – the epitome of anti-agile. Only 52% of instances are rightsized for their workload and, beyond that, only 52% of respondents are taking advantage of Reserved Instance discounts.
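The spreadsheet anti-pattern is also the easiest to retire, because billing data can be pulled programmatically. Below is a minimal sketch using the AWS Cost Explorer API via boto3; the date range is illustrative, and other cloud providers expose similar billing APIs.

```python
# Minimal sketch: replace manual spreadsheet extraction with a call
# to the AWS Cost Explorer API (assumes boto3 and IAM permissions).
import boto3

ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-01-01", "End": "2019-02-01"},  # illustrative range
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],  # break spend out per service
)

for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service}: ${amount:,.2f}")
```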

The tools and opportunities to improve the health and efficiency of your cloud environments are readily available. In fact, the 451 Research report shows that average savings of 27% were achieved through the use of a cost management platform. With cloud spending expected to grow at a 17% CAGR between 2017 and 2022, now is the time to implement the behavioral changes that instill a culture of FinOps within your organization.
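To see why those two numbers matter together, here is a back-of-the-envelope illustration. The $1M starting bill is hypothetical; the growth and savings rates are the figures cited above.

```python
# How a 17% CAGR compounds a cloud bill, and what 27% average
# savings from a cost management platform means at that scale.
annual_spend = 1_000_000  # hypothetical starting cloud bill (USD/year)
CAGR = 0.17               # projected growth rate, 2017-2022
SAVINGS_RATE = 0.27       # average savings reported with a cost platform

for year in range(2017, 2023):
    savings = annual_spend * SAVINGS_RATE
    print(f"{year}: spend ${annual_spend:,.0f}, potential savings ${savings:,.0f}")
    annual_spend *= 1 + CAGR  # compound the bill into next year
```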

The problem is shared accountability – The solution is a FinOps culture

What became apparent in the research presented by Owen Rogers is a distinct need for IT and Finance teams to come to the table together to discuss the path forward. The good news is that there are companies pushing the envelope and leading the way in diligent, responsible cloud cost management. Those who have embraced a FinOps culture are utilizing performance benchmarking and have a clear understanding of the fully-loaded costs of their cloud infrastructure. This is the promise we can aspire to, and it starts with collaboration among IT, finance, and individual lines of business.

FinOps high performers have near real-time visibility into all cloud spend. Individual teams understand their portion of total spend, are enabled to budget and track against targets, and utilize Reserved Instances for 80-95% of their cloud services.
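As a simple illustration of that last metric, RI coverage is commonly tracked as reserved instance-hours over total instance-hours consumed in a period. The figures below are hypothetical.

```python
# Reserved Instance coverage: the share of instance-hours in a period
# that ran under a reservation rather than at on-demand pricing.
def ri_coverage(reserved_hours: float, total_hours: float) -> float:
    """Return RI coverage as a percentage of total instance-hours."""
    if total_hours == 0:
        return 0.0
    return 100 * reserved_hours / total_hours

# A team in the high-performer band:
print(f"{ri_coverage(reserved_hours=8_800, total_hours=10_000):.1f}% coverage")  # 88.0%
```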

Much like keeping a clear understanding of household finances, this level of diligence affords more benefits than cost savings alone. A remarkable side effect of FinOps culture is a 10-40% improvement in operational efficiency within your organization.

FinOps Foundation

In addition to the information presented at the Cloud Economic Summit, Cloudability launched the FinOps Foundation. With founding members from Atlassian, Nationwide, Spotify, Autodesk, letgo, and many others, the FinOps Foundation is a non-profit trade organization bringing people together to create best practices around cloud spend.

J.R. Storment, Cloudability Co-founder, takes on the role of President of the FinOps Foundation. J.R. describes the need for the organization here.

“…Why is the Foundation needed? At many companies I talk with, engineering teams spend more than needed with little understanding of cost efficiency.”

—J.R. Storment, Cloudability

We are excited to see our partner defining this space and eager to participate in the FinOps Foundation. We are also looking forward to reading “Cloud Financial Management Strategies, Creating a Culture of FinOps,” their O’Reilly Media book, which is slated to be published later this year.

Thanks again to Cloudability for hosting us at the event; we are looking forward to an exciting year together.

Robb Allen is the CEO of Effectual, Inc.

When Best Efforts Aren’t Good Enough

“Have you tried rebooting it?”

There was a time, not so long ago, when that was the first question a technician would ask when attempting to resolve an issue with a PC or with the servers that evolved from PCs. This was not limited to servers; IT appliances, network equipment, and other computing devices could all be expected to behave oddly if not regularly rebooted. As enterprise IT departments matured, reboot schedules were developed for equipment as part of routine preventative maintenance. Initially, IT departments developed policies, procedures, and redundant architectures to minimize the impact of regular reboots on clients. Hardware and O/S manufacturers did their part by addressing most of the issues that caused the need for these reboots, and the practice has gradually faded from memory. While the practice of routine reboots is mostly gone, the architectures, metrics, and SLAs remain.

Five Nines (or 99.999%) availability SLAs became the gold standard for infrastructure and are assumed in most environments today. As business applications have become more complex, integrated, and distributed, the availability of the individual systems supporting them has become increasingly critical. Fault tolerance in application development is not trivial, and in application integration efforts it is orders of magnitude more difficult, particularly when the source code is not available to the team performing the integration. These complex systems are fragile and will behave in unpredictable ways if not shut down and restarted in an orderly fashion. If a single server supporting a piece of a large distributed application fails, it can cause system or data corruption that takes significant time to resolve, impacting client access to applications. The fragile nature of applications makes Five Nines architectures very important. Today, applications hosted in data centers rely on infrastructure and operating systems that are rock solid, never failing, and reliable to a Five Nines standard or better.
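To make those targets concrete, a quick calculation shows how little downtime each “nines” level actually permits per year.

```python
# Maximum downtime per year allowed by common availability SLAs.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

for label, availability in [
    ("Three Nines (99.9%)", 0.999),
    ("Four Nines (99.99%)", 0.9999),
    ("Five Nines (99.999%)", 0.99999),
]:
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label}: up to {downtime:.1f} minutes of downtime per year")
```

Five Nines allows roughly five minutes of downtime a year; Four Nines allows nearly an hour. That order-of-magnitude gap is exactly the difference discussed below.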

As we look at cloud, it’s easy to believe there is an equivalency between a host in your data center and an instance in the cloud. While the specifications look similar, critical differences exist that often get overlooked. For example, instances in the cloud (as well as all other cloud services) carry a significantly lower SLA standard than we are used to; some are even provided on a Best Efforts basis. It’s easy to understand why this important difference is missed: the hardware and operating systems we currently place in data centers are designed to meet Five Nines standards, so that level of reliability is assumed, and nobody asks about it anymore. Cloud-hosted services are designed to resemble the systems we deploy to our data centers, and although the various cloud providers are clear and honest about their SLAs, they don’t exactly trumpet from the rooftops the difference between traditionally accepted SLAs and the ones they offer.

A Best Efforts SLA essentially boils down to your vendor promising to do whatever they are willing to do to make your systems available to you. There is no guarantee of uptime, availability, or durability of systems, and if a system goes down, you have little or no legal recourse. Of course, it is in the interest of the vendor and their reputation to restore systems as quickly as possible, but they (not you) determine how the outage will be addressed and how resources will be applied to resolve issues. For example, if the vendor decides that their most senior technicians should not be redirected from other priorities to address the outage, you’ll have more junior technicians handling the issue, who may take longer to resolve it – a situation driven by your vendor’s self-determined best interest, not yours.

There are cases where a cloud provider will offer an SLA better than the Best Efforts default. An example is AWS S3, where Amazon is proud of their Eleven Nines of data durability. Don’t be confused by this: it is a promise that your data stored there won’t be lost, not a promise that you’ll be able to access it whenever you want. You can find published SLAs for several AWS services, but none of them exceed Four Nines. This represents effectively 10x the potential outage time of Five Nines, and it applies only to the services provided by the cloud provider, not the infrastructure you use to connect to them or the applications you run on top of them.

The nature of a cloud service outage is also different from one that happens in a data center. In your data center, catastrophic all-encompassing outages are rare, and your technicians will typically still have access to systems and data while your users do not. They can work on restoring services and on “Plan B” approaches concurrently. When systems fail in the cloud, oftentimes there is no access for technicians, and the work of restoring services cannot begin until the cloud provider has restored access. This typically leads to more application downtime. Additionally, when systems go down in your data center, your teams can usually provide an ETA for restoration and status updates along the way. Cloud providers are notorious for not offering status updates while systems are down, and in some cases the systems they use to report failures and provide status updates rely on the failed systems themselves – meaning you’ll get no information regarding the outage until it is resolved. Admittedly, these types of events are rare, but the possibility should still give you pause.

So, you’ve decided to move your systems to the cloud, and now you’re wondering how you are going to deal with the inevitable outages. There are really only a few options available to you. First, you can do nothing and hope for the best; for some business applications, this may be the optimal (although most risky) path. Second, you can design your cloud infrastructure the way your data centers have been designed for years; my last two posts explored how expensive this path is, and depending on how you design, it may not offer you the availability that you desire anyway. Third, you can implement cloud infrastructure automation and develop auto-scaling/healing designs that identify outages as they happen and often respond before your team is even aware of a problem (a minimal sketch of this pattern follows below); this option is more cost-effective than the second, but it requires significant upfront capital, and its effectiveness depends on people well-versed in deploying this type of solution – people who are in high demand and hard to find right now. Finally, the ideal way to handle this challenge is to rewrite application software to be cloud-native: modular, fault-tolerant applications that are infrastructure-aware and able to self-deploy and re-deploy through CI/CD patterns and embedded infrastructure as code. For most enterprise applications, this would be a herculean effort and a bridge too far.
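For readers weighing option three, here is a minimal sketch of the self-healing piece of that pattern: an AWS Auto Scaling group that automatically replaces instances failing load balancer health checks. The group name, launch template, and subnets are hypothetical placeholders, not a definitive design.

```python
# Sketch: an Auto Scaling group that detects unhealthy instances via
# load balancer health checks and replaces them automatically.
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-tier",                       # hypothetical
    LaunchTemplate={"LaunchTemplateName": "web-tier-lt"},  # hypothetical
    MinSize=2,                    # never fewer than two instances
    MaxSize=6,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",   # spread across two AZs
    HealthCheckType="ELB",        # treat a failed load balancer check as unhealthy
    HealthCheckGracePeriod=300,   # seconds before new instances are checked
)
```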
Don’t be like so many others and pay a premium for lower uptime. Be aware that there are hazards out there and bring in experienced people to help you identify the risks and mitigate them. You’re looking for people who view your moves toward the cloud as a business effort, not merely a technical one. Understand the challenges that lie ahead, make informed decisions regarding the future of your cloud estate, and above all, Cloud Confidently™!

Don’t Take Availability for Granted
Over the past several decades, as we’ve made progress in IT towards total availability of services, you’ve come to rely on, take comfort in, and expect your applications and business features to be available all the time. Without proper thought, planning, and an understanding of the revolutionary nature of cloud-hosted infrastructure, that availability is likely to take a step backward. Bring in experienced people to help you identify the risks and mitigate them.

Cloud: The Mirage of Massive Cost Savings (Ketchup on the side)

“Why are you moving to the cloud?” is a question I’ve asked more times than I can count. It’s one of the first questions posed to a potential client for multiple reasons.

The two most important reasons are: first, I want a little insight into how thoughtful and educated the potential client is in relation to cloud, and second, I want to understand what metrics will be used to determine the success or failure of the project we are considering undertaking. Potential clients respond to this question in various ways, but almost always, one of their first answers involves saving money and/or cutting costs. When I hear this response, I ask a couple of follow-up questions to clarify how they plan on accomplishing this ambiguous goal. More often than not, they have no idea how they will realize cost savings, and many just expect it to be a natural benefit of moving their VMs to the cloud.

This near-universal acceptance of a broad notion, with little factual basis, reminds me of the story of ketchup. In the mid-19th century, a doctor took the ketchup of the time (which was basically fermented mushroom sauce or ground-up fish innards – further reading is available if you are so inclined) and added tomatoes to it. He made somewhat dubious claims regarding the maladies that could be cured by his new ketchup, which were picked up by the press. By the latter part of the 19th century, with the help of unscrupulous hucksters along the way, nearly everyone believed that ketchup cured all ills. While ketchup does have some definite health benefits and is a very tasty condiment (it’s rich in Vitamin C and antioxidants), a cure-all it is most definitely not.

The truth is, simple cloud migration, even when instances are rightsized and Reserved Instances (RIs) are purchased, is unlikely to produce significant cost efficiency for infrastructure that isn’t properly architected to take advantage of cloud services. In my last post I shared an example of two different cloud deployment strategies for a sample application. The five-year total operating costs were roughly $350k for one strategy and $14k for the other. The difference: to realize the operational cost savings of $336k over five years, an enterprise would need to spend roughly $100k and several months of effort upfront. Enterprises are wary of the upfront costs (125% of the expensive model’s projected first-year operating costs, or 571% of its first-quarter operating costs), so they make a short-term financial decision to proceed with the $350k option. More often than not, this decision is made in the IT department, based on its limited budget visibility, not at the executive level where greater budgetary visibility and enterprise strategy reside.

Another dirty little secret about cloud cost management that doesn’t often percolate up to executive levels is that the flexibility of cloud allows your IT teams to immediately spin up services that generate significant cost with little or no financial oversight until the invoice comes due. For example, common compute instances at AWS can cost from $6 to $10 per hour, with specialized services available through the marketplace costing several times that. An otherwise well-meaning IT employee (with no purchasing authority) could spin up a single $8/hour resource with no oversight, which by the time the bill has been received could total $6k-$7k in charges. While this situation would likely be recognized and addressed when the invoice was reviewed, dozens and dozens of smaller instances could take years of invoice cycles to clear up and, over time, have a much greater but less initially obvious impact. I have been involved in several remediation efforts for clients’ IT departments that were spending $500k+ annually on unaccounted-for cloud services.
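The arithmetic behind that “invisible” spend is simple but sobering.

```python
# What one unmonitored $8/hour resource costs by the time the
# monthly invoice arrives.
hourly_rate = 8.00            # USD/hour for a large compute instance
hours_per_month = 24 * 30     # left running around the clock

monthly_cost = hourly_rate * hours_per_month
print(f"One forgotten instance: ${monthly_cost:,.0f}/month")  # $5,760

# A dozen smaller $1/hour instances scattered across teams adds up
# more slowly but just as surely:
print(f"Twelve small instances: ${12 * 1.00 * hours_per_month:,.0f}/month")  # $8,640
```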

Cross-functional governance

Contrast this with the IT provisioning model, where costs were governed prior to the time of purchase. When an IT department needed additional infrastructure, a capital expenditure process was in place that required finance approval. Budgets and expenditures were relatively easily managed, and without proper authority, individual purchasing power was limited. In the revolutionary world of 21st Century IT, we need revolutionary methods of governance.

The most successful way I’ve seen enterprises address this new requirement is to create a cross-functional governance committee that includes representation from finance, IT, and core business units, with the charter of managing cloud-related costs. In enterprises that have Cloud Steering or Governance Committees or a Cloud Center of Excellence, this cross-functional group works under their direction. My good friends at Cloudability, who have developed what is probably the most comprehensive cloud cost reporting and management toolset in the industry, refer to this committee as the Cloud Financial Office (CFO – I believe the pun is intended).

This committee evaluates the needs of the business, the reporting and cost management/containment requirements of finance, and the operational/support requirements of IT to determine the best approach for meeting all three stakeholders. It develops the strategy, policies, and procedures for IT, finance, and the business that lead to a deployed cloud infrastructure that is manageable from a cost perspective. As I mentioned above, there are tools that support this mission, but without the insights of the entire committee to interpret and act on the data, you will not realize the value of the tools or succeed in being cost-efficient with the capital you spend on cloud-based infrastructure. Tools are not a silver bullet.

Just like there’s a nugget of truth underlying the health benefits of ketchup, when thoughtfully planned, considered, and executed, transforming your infrastructure to the cloud can deliver significant IT cost reductions as well as several other powerful benefits. On the other hand, just as drinking a bottle of ketchup a day won’t cure or prevent any maladies, ignoring the revolutionary nature of the cloud and how your enterprise must adapt in order to “Cloud Confidently™” won’t lead to any promised savings. It will likely result in higher costs for fewer benefits than you enjoy now. As you approach cloud adoption, remember: not everyone making claims of free and instant IT savings has your particular best interests at heart. Many of them, much like the 19th century ketchup hucksters, benefit handsomely as you overspend in the cloud. It’s the 21st Century – don’t drink the ketchup, Cloud Confidently™!
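To close on something concrete, here is one guardrail such a committee might put in place as a first step: a monthly budget alert, sketched with boto3 and the AWS Budgets API. The account ID, dollar limit, and email address are hypothetical.

```python
# Sketch: notify stakeholders when actual monthly spend crosses 80%
# of a $50k budget.
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # hypothetical account
    Budget={
        "BudgetName": "monthly-cloud-spend",
        "BudgetLimit": {"Amount": "50000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",         # trigger on real spend, not forecast
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,                    # percent of the budget limit
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [
            {"SubscriptionType": "EMAIL", "Address": "finops@example.com"},
        ],
    }],
)
```

An alert like this doesn’t replace the committee’s judgment, but it restores some of the early visibility that the old capital expenditure process used to provide.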