CCM Container Costs

One of the most technically complex projects for Cloud Cost Management

Role

Research/Design

Company/Year

Datadog/2023

Team

Front-end Eng: Jonathan Quach

Back-end Eng: Gui

Product Manager: Kayla Talor

Tech lead: Tyler

01. Project overview

What is Container Costs?

In cloud computing, container costs are the expenses incurred when running containerized applications (for this project, Kubernetes) on cloud vendors or platforms such as AWS, GCP, and Azure.

How to allocate container costs?

To understand the cost of each container, the most challenging part for a designer on this project was understanding Kubernetes infrastructure. Below is the breakdown of a Kubernetes cluster.

The Problems

  • Users lack an understanding of what is_cluster_idle means, which metrics they should use in their queries, and why.
  • Users lack visibility into the costs associated with running their applications.
User types

There are two main user types: Platform Engineers and Application Developers.

1) Platform Engineers manage Kubernetes clusters and make sure they are provisioned properly. They want to understand costs at the cluster level, as well as by namespace and deployment.

2) Application Developers do not care much about cluster-level detail. They want to know how much their teams and services are spending and whether they have high workload idle costs, so that they can take appropriate action on their spend.

Goals

  • Help users understand the basic concepts of container costs and key information about the costs for their clusters and workloads.
  • Create a clear and simple guided experience for the core users to gain deeper visibility into their container costs at both cluster and workload levels.
Proposed Solutions

  • Guide users through an opinionated experience by helping them understand the granularity of their container costs while also allowing them to explore container costs with the tags they care about.
  • While helping users better understand container costs, we also want to support an optimized workflow that connects the Container Costs page with the Containers App and the CCM Recommendations page.
Opportunities

  • Facilitate workflows with other Datadog products by integrating with the Containers App and Service Catalog.
  • Cross-selling for Recommendations
02. Design Breakdowns

1) Kubernetes at a high level

2) Investigation and facilitated workflows

Kubernetes at a high level

In this section, we want users to understand their Kubernetes costs at a high level by navigating through each sub-section. The description on each tile explains what its content covers.

  • Breakdown of your spend on K8s: Shows the breakdown of spend on k8s, which comprises workload usage cost, workload idle cost, and cluster idle cost.
  • Kubernetes Spend on AWS Bill: Shows k8s spending as a share of the entire AWS bill, which includes k8s cost and the costs of other AWS resources such as EC2, S3, and RDS.
  • Resource allocated to Kubernetes spend: Surfaces the key resources that contribute to k8s cost, such as nodes, clusters, and AWS accounts.
  • Cost Recommendations for your costs: Highlights cost-saving opportunities for workloads with high idle cost in k8s and ties CCM Recommendations into this page.
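To make the three-part breakdown concrete, here is a minimal sketch of how one node's cost could be split into workload usage, workload idle, and cluster idle. It assumes a simple CPU-only, proportional model; the case study does not state the actual formulas, and the NodeCost type, its fields, and the breakdown function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeCost:
    """Hypothetical inputs for one node over one billing period."""
    total_cost: float     # amount billed for the node
    capacity_cpu: float   # CPU cores the node offers
    requested_cpu: float  # CPU cores reserved by workloads
    used_cpu: float       # CPU cores the workloads actually consumed


def breakdown(node: NodeCost) -> dict[str, float]:
    """Split a node's cost into usage, workload idle, and cluster idle.

    Assumption: cost is spread evenly across CPU cores.
    """
    cost_per_core = node.total_cost / node.capacity_cpu
    # Usage above the reservation is capped so the three parts sum to the total.
    used = min(node.used_cpu, node.requested_cpu)
    return {
        "workload_usage": used * cost_per_core,
        "workload_idle": (node.requested_cpu - used) * cost_per_core,
        "cluster_idle": (node.capacity_cpu - node.requested_cpu) * cost_per_core,
    }


parts = breakdown(
    NodeCost(total_cost=100.0, capacity_cpu=10.0, requested_cpu=6.0, used_cpu=4.0)
)
```

In this sketch, capacity that is reserved but unused counts as workload idle, and capacity that no workload reserved counts as cluster idle, so the three parts always add up to the node's total cost.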
Dive deeper into cost investigation

Remove the barrier of writing complex queries so users can quickly see their container costs.

I worked closely with our tech lead and partnered with the graphing team to create new tags in the tag pipeline and to find the right visualization and data for costs at the cluster level.

Users can see which cluster, service, or other grouping has the highest absolute idle cost, so they can adjust resource allocation and optimize their spend. The view also breaks down the total cost within each cluster or service into usage versus idle.
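The ranking described above can be sketched as follows. This is an illustrative example only: the cluster names and dollar amounts are invented, and rank_by_idle is a hypothetical helper, not part of any Datadog API.

```python
def rank_by_idle(costs: dict[str, dict[str, float]]) -> list[tuple[str, float, float]]:
    """Return (name, idle_cost, idle_share) rows, highest idle cost first."""
    rows = []
    for name, cost in costs.items():
        total = cost["usage"] + cost["idle"]
        # Guard against empty groups so the share is always defined.
        share = cost["idle"] / total if total else 0.0
        rows.append((name, cost["idle"], share))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample data: cost per cluster, split into usage and idle.
clusters = {
    "staging": {"usage": 50.0, "idle": 150.0},
    "prod": {"usage": 700.0, "idle": 300.0},
    "dev": {"usage": 20.0, "idle": 30.0},
}

ranked = rank_by_idle(clusters)
for name, idle, share in ranked:
    print(f"{name}: idle ${idle:.2f} ({share:.0%} of total)")
```

Sorting by absolute idle cost rather than idle share puts the largest savings opportunity first: in this made-up data, prod has the lowest idle share but the most idle dollars.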

Iterations

We wanted to make the investigation workflows more seamless while also giving users more control over drilling into granular levels of container costs.

Challenges

  • Ambiguity, with many technical challenges, such as no tags being available in the tag pipeline to yield costs and the difficulty of visualizing data broken into granular slices.
  • Designing solutions that support product maturity, such as how the design would look with expansion from Kubernetes to EC2 and to multiple clouds.
  • Facilitating and leading cross-product initiatives that surface container cost workflows, to ensure a seamless cost investigation experience for users.
Results

  • 45% increase in the number of redirects to the Recommendations page.
  • Expanded integrated investigation workflows with other products (Containers App and Service Catalog), increasing the number of visits to those products.
  • In 9 intensive interviews with companies including Chime, Mettle Bank, Akkiko, and Earnin, the page scored 4.5/5 on how actionable, useful, and informative its data is.
  • Helped Site Reliability Engineers and Application Developers investigate and gain visibility into their container costs without creating complex queries and calculations, as measured by the number of dashboards created.