
In recent years, GCP has made strong efforts to compete with AWS, steadily rising in popularity. Efforts such as offering generous credits to attract new users are working in its favour: as of 2025, GCP holds roughly 12% of the cloud computing market. It is an appealing choice for organizations already invested in the Google ecosystem.

However, GCP still holds the third spot in terms of cloud market share. AWS benefits from a mature ecosystem that covers almost every use case—ranging from performance optimization and cloud cost optimization to partners who guide customers through discount programs and more.

As GCP adoption continues, so will the challenges common to all cloud infrastructures—bill shocks, runaway costs, and unoptimized cloud spend—yet the resources for addressing these issues in GCP remain limited.

And that’s precisely why, in this edition of Ask the Cloud Expert, we’ll explore the lesser-discussed aspects of GCP cost optimization, drawing from real-world experiences and hands-on expertise shared by our expert.

Today’s Expert: Puneet Malhotra

Puneet Malhotra is a Senior Manager at CloudKeeper, responsible for all of CloudKeeper’s GCP clients as well as CloudKeeper’s partnership with GCP. A seasoned expert with extensive GCP experience across geographies, having delivered value to multiple clients, he’s the best person to walk you through the nitty-gritty of extracting the full potential of your GCP investment.

So, let’s get started! 

Q1. Despite incentive campaigns and promotions, why is GCP adoption still relatively low compared to AWS?

There is no single reason; rather, a combination of factors explains why Google Cloud Platform (GCP) adoption, while rising, still trails AWS and Azure. The situation is fairly nuanced.

a) Fear of Google Pulling the Plug: Google has a history of shutting down several major products — such as Google Reader, Google Hangouts, Google App Engine Standard for Python 2, Google Stadia, and Firebase Dynamic Links. This created uncertainty for enterprises. Companies like Snap and Evernote had to re-architect parts of their stack when Google deprecated certain APIs or products.

This sentiment resurfaced in January 2024, when GCP announced the removal of egress fees for customers migrating off the platform. While seen as a positive move, it also triggered apprehension that Google might again shift strategies abruptly.

b) Lack of SES Equivalent and Missing Latency-Based DNS: AWS offers Simple Email Service (SES), a highly scalable email sending and receiving service used for transactional emails, notifications, and marketing campaigns.

GCP lacks a native equivalent, leaving developers to rely on third-party integrations for this critical workload. Similarly, GCP does not yet have a latency-based DNS routing service.

c) Late to the Game: By the time GCP seriously picked up pace against competitors, most large enterprises had already locked in their workloads with AWS or Azure. This “first-mover advantage” gap meant that Google had to fight harder to win enterprise trust.

d) Perception of Enterprise Readiness: For a long time, GCP was perceived as more developer- and startup-focused, especially strong in data analytics (BigQuery, ML, Kubernetes). Enterprises, however, preferred AWS (and later Azure) because of their breadth of enterprise-ready services, mature compliance/certification coverage, and extensive enterprise sales and support.

However, GCP has worked on these concerns and has taken many steps to boost customer confidence, such as:

a) Renewed Focus on GCP: Google has significantly increased its investment and focus on Google Cloud Platform.

b) AI & ML Innovation:

  • Introduction of Tensor Processing Units (TPUs) — custom hardware designed specifically for AI and ML workloads.
  • Aggressive competition with AWS and Azure in artificial intelligence.
  • The Vertex AI platform, which enables end-to-end ML model training and deployment at a lower cost than AWS SageMaker, while delivering comparable performance.

c) Customer-Centric Investments:

  • Free training engagements.
  • Proactive technical and non-technical support.
  • Dedicated account managers for enterprises.

Market Outlook (2025)

As of 2025, most of these earlier concerns have been largely addressed, and GCP is widely adopted both as a primary cloud infrastructure and in hybrid settings alongside AWS and Azure.

Q2. What are the common causes of GCP cloud cost runaways for organizations?

The most common causes aren't usually one big mistake, but a combination of unchecked automation, lack of cloud infrastructure governance practices, and simple waste piling up. These are the common ones I’ve seen our customers trying to rectify:

a) Unmonitored Autoscaling: This is the #1 runaway trigger. Autoscaling is amazing until a misconfigured policy or a code bug creates an infinite loop, spinning up thousands of preemptible VMs or instances that you don't notice until the bill arrives. You need to follow the recommended Autoscaling best practices to realise its potential fully.

b) Orphaned and Idle Resources:

  • Unattached Persistent Disks: Storage for deleted VMs that you're still paying for.
  • Idle VMs: Development or test instances left running 24/7, often because no one knows exactly what they do and teams fear something will break if they are turned off.
  • Old Snapshots & Images: Forgotten backups that accumulate storage costs over months.

c) Lack of Commitment Discounts: Without Committed Use Discounts (CUDs) or Sustained Use Discounts, you're leaving 30-70% in savings on the table.

d) Over-Provisioning ("Right-Sizing" Failure): Developers often overestimate needs, provisioning 8-CPU machines for workloads that barely use 2. You pay for the entire oversized instance, 24/7.

e) Poor Resource Hierarchy & Tagging: If you can't tell which project, team, or product is generating a cost, you can't hold anyone accountable. A flat structure with no labels or tags makes cost allocation and shutdown protocols impossible.

f) Spike in Data Processing or Egress: A sudden, large data analytics job (e.g., a misconfigured BigQuery query) or a spike in data transferred out of Google Cloud (egress) can add thousands of dollars in a single day.

The root cause of most of these is a lack of cloud cost visibility into the GCP infrastructure and governance. Without budget alerts, cost dashboards per team, or policies to delete temporary resources, these small oversights can add thousands of dollars to your cloud spend.
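One practical guard against the BigQuery-driven spikes described in (f) is to estimate a query's cost from its projected bytes scanned before running it (BigQuery also natively supports a maximum-bytes-billed limit on jobs). A minimal sketch, assuming an illustrative on-demand price per TiB — verify against current GCP pricing before relying on the number:

```python
TIB = 1 << 40  # bytes in one TiB

def estimate_query_cost(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Estimate on-demand query cost in USD (price per TiB is an assumption)."""
    return bytes_scanned / TIB * price_per_tib

def check_query_budget(bytes_scanned: int, max_cost_usd: float) -> bool:
    """Return True if the query fits the budget; acts like a pre-run cost gate."""
    return estimate_query_cost(bytes_scanned) <= max_cost_usd
```

A 10 TiB scan at this assumed rate would cost about $62.50, so a $50 gate would reject it before it runs.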

Q3. What role can AI and automation tools play in GCP cost optimization?

AI and automation are instrumental if you’re looking to optimize your GCP setup. Given the sheer number of services and instances in a typical infrastructure, a cloud engineer can't be at the dashboard observing the GCP cloud round the clock.

Primarily, there are three use cases where AI-enabled automation tools have taken over:

  1. Right-sizing of Compute Engine resources: Consistently choosing the best-suited VM type for a workload by hand is error-prone, but AI tools do it automatically, selecting the most apt VM (for example, recommending an e2-standard-4 instead of an n2-standard-8 when the workload does not fully utilize the larger machine).
  2. Autoscaling: Spinning up and down cloud resources by assessing and predicting demand ensures maximum utilization of your resources when there is workload, and turning them off when not needed—thus saving cost while not impacting performance.
  3. Spot VM provisioning automation: This has largely been taken over by AI and automation tools since it is not possible for a human to select and provision a Spot VM within seconds of availability. This use case is one of the most prevalent in GCP cost optimization.
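To illustrate the right-sizing logic such tools automate, here is a minimal sketch that recommends the smallest machine type covering a workload's observed peak usage plus headroom. The machine shapes are real e2/n2 sizes, but the selection rule and headroom factor are simplifying assumptions, not Google's actual Recommender algorithm:

```python
# (vCPU, GiB memory) for a few common machine types; illustrative subset only.
MACHINE_TYPES = {
    "e2-standard-2": (2, 8),
    "e2-standard-4": (4, 16),
    "n2-standard-8": (8, 32),
}

def recommend(current: str, peak_cpu_frac: float, peak_mem_frac: float,
              headroom: float = 1.3) -> str:
    """Suggest the smallest type whose capacity covers peak usage plus headroom."""
    cur_cpu, cur_mem = MACHINE_TYPES[current]
    need_cpu = cur_cpu * peak_cpu_frac * headroom
    need_mem = cur_mem * peak_mem_frac * headroom
    candidates = [
        name for name, (cpu, mem) in MACHINE_TYPES.items()
        if cpu >= need_cpu and mem >= need_mem
    ]
    # Pick the smallest adequate machine; keep the current one if nothing fits.
    return min(candidates, key=lambda n: MACHINE_TYPES[n]) if candidates else current
```

With an n2-standard-8 peaking at 30% CPU and memory, this rule lands on an e2-standard-4 — the same downgrade used in the example above.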

CloudKeeper Lens is one of the few tools in the market that empowers users with the ability to comprehensively monitor their GCP infrastructure. And CloudKeeper AZ is a well-rounded solution for reducing GCP spend.
We’ve perfected these offerings after continuous feedback from customers, and they continue to deliver tangible value.

Q4. How important is ongoing optimization for organizations seeking to reduce their GCP cloud bill, and what is the best approach for implementing it?

The best approach to ongoing optimization is a two-layered strategy:

  1. Foundational Optimization: Start by aligning your infrastructure with Google’s Cloud Architecture Framework, ensuring that your environment follows best practices around performance, reliability, security, and cost efficiency. This establishes a strong baseline that prevents recurring inefficiencies.
  2. Automation and AI-driven Tools: Use automation to handle day-to-day adjustments. Tools like Google’s own Recommender and Active Assist can right-size instances, automate autoscaling, and provision Spot VMs more effectively than manual intervention. These ensure continuous optimization without requiring engineers to monitor dashboards around the clock.

By combining architectural best practices with intelligent automation, organizations can move from one-time cost-cutting exercises to a culture of sustained cost efficiency, achieving long-term savings while maintaining performance.

Q5. Are there any discount programs or special pricing plans that Google offers to its GCP customers?

Yes, Google offers several discount plans, driven by key factors such as the duration of commitment and the spend on particular services and instances:

Sustained Use Discounts and Committed Use Discounts are the two discount programs GCP offers. However, before you enter into these pricing models, there are considerations you need to make:

a) Committed Use Discounts: In exchange for making either a spend-based commitment or a resource-usage-based commitment, you get up to 55% off for most resources like machine types or GPUs, and up to 70% off for memory-optimized machine types. There are two variants of Google’s CUDs:

  • Resource-based Committed Use Discounts: As the name suggests, you get discounted pricing — up to about 55% (or up to 70% for memory-optimized machines) — on Compute Engine resources, but only for a predetermined region. The discount can also be shared across projects under the same billing account.
  • Spend-based Committed Use Discounts: This is analogous to AWS Savings Plans. A spend-based discount requires you to commit to a minimum dollar amount spent on GCP resources. It’s best if you have a steady and predictable workload. After the commitment period, regular on-demand pricing applies. Discounts for spend-based CUDs range from roughly 28% for a 1-year commitment to 46% for a 3-year commitment.

Drawbacks of Committed Use Discounts

  1.  Lock-in: CUDs are generally for a period of 1 year to 3 years, and for maximum discount, you typically make upfront or committed payments. Thus, if there is a reduction in workload, you may end up paying significantly more than the savings you received—defeating the purpose of the discount.
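This lock-in risk can be quantified: with a discount d, the committed capacity costs (1 − d) of its on-demand price whether you use it or not, so the commitment only pays off while you actually use more than a fraction (1 − d) of it. A small sketch:

```python
def cud_break_even_utilization(discount: float) -> float:
    """Fraction of committed capacity you must actually use for the CUD to beat
    on-demand pricing. Committed cost = (1 - d) * C; on-demand cost for the
    used fraction u is u * C. They cross at u = 1 - d."""
    return 1.0 - discount
```

With the roughly 46% three-year discount cited above, the commitment beats on-demand as long as you keep using more than about 54% of what you committed to; below that, you are paying for capacity you no longer need.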

b) Sustained Use Discounts: Sustained Use Discounts are applied automatically to your billing account at the end of the billing cycle—the more a particular resource is used during a month, the higher the discount. The maximum discount is up to 30% depending on the machine type. 
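The incremental nature of Sustained Use Discounts can be sketched as follows. The tier widths and rates below follow the classic N1-style schedule (each successive quarter of the month billed at 100%, 80%, 60%, then 40% of the base rate); treat them as illustrative assumptions and verify against current GCP pricing documentation:

```python
# Incremental SUD tiers: (fraction of month, billing rate). Illustrative N1-style values.
TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]

def sustained_use_discount(usage_frac: float) -> float:
    """Effective discount for running a resource `usage_frac` of the month."""
    billed = 0.0
    remaining = usage_frac
    for width, rate in TIERS:
        step = min(width, remaining)
        billed += step * rate
        remaining -= step
        if remaining <= 0:
            break
    return 1.0 - billed / usage_frac if usage_frac else 0.0
```

Running a resource the full month averages the four tier rates to 70% of the base price — the "up to 30%" maximum discount — while running it half the month yields only about 10%.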

It is essential that you first gauge your workload, then go in for these discount plans.

c) Discounted Pricing Agreements for Startups and Enterprises: Similar to AWS PPAs, Google also enters into private contracts with enterprises with considerable cloud spend. For startups, Google offers credits if there's strong growth potential. However, these agreements are generally negotiated behind closed doors, and the terms—including discount percentages—are not publicly disclosed.

These are the top discount programs and pricing models through which customers can save on their cloud spend instead of spending multiple times more by provisioning instances on demand. 

Q6. For an organization just beginning its GCP cost optimization journey, what quick-win strategies or action plans would you recommend?

For an organization just starting, I’d recommend a mix of immediate quick wins and a foundational strategy:

Quick Wins in the First 48 Hours:

  1. Enable the Recommender API: Turn on and review the Idle VM Recommender and Right-Sizing Recommender in the console. They give you instant, actionable shutdown and downsizing suggestions.
  2. Create Budget Alerts: Set up Google Cloud billing alerts at 50%, 90%, and 100% of your forecasted budget to prevent surprise bills.
  3. Delete Unattached Disks: These are pure waste. Run a quick query in the console to find and delete them.
  4. Review Sustained Use Discounts: Check your report to see which discounts are being applied automatically. This identifies your steady-state workloads for future commitments.
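The budget-alert logic in step 2 boils down to checking spend against each threshold; a minimal sketch using the 50/90/100% levels suggested above:

```python
def triggered_alerts(spend: float, budget: float,
                     thresholds=(0.5, 0.9, 1.0)) -> list:
    """Return every budget threshold the current spend has crossed."""
    return [t for t in thresholds if spend >= t * budget]
```

A real setup would fire these from billing-export data on a schedule (or use Google Cloud's built-in budget alerts, which notify by email or Pub/Sub), but the comparison itself is this simple.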

However, these were quick wins. For ongoing optimization efforts, it is necessary to make corrections at a foundational level—specifically, by sorting out the architecture of your GCP infrastructure.

And that foundation can be established by aligning your GCP setup with Google’s Cloud Architecture Framework (now known as the Well-Architected Framework). The framework provides a comprehensive set of recommendations along with actionable steps to implement them, with the end goal of optimizing performance, security, reliability, and cost efficiency in Google Cloud.

Reorganizing your Google Cloud environment is essential for long-term cost reduction: it forms the foundation, and without it, all other cost optimization gains remain temporary. The Architecture Framework review document helps cloud architects, developers, and admins design robust architectures and simplify the administration of Google Cloud resources.

Q7. What are the top strategies for reducing GCP costs? 

A lower GCP spend is what every organization strives for. However, it’s essential not to overstep and cut costs on resources you actually need. Think of cost optimization as maximizing ROI per dollar spent on cloud, rather than simply “cutting down cloud spend.”
Here are some tried-and-tested strategies to follow for managing your GCP bill:

1. Use Spot VMs for non-critical workloads

Compared to the on-demand pricing of GCP instances, you can save up to 91% with Spot VMs, depending on configurations and region. The caveat, however, is that Google can reclaim these resources with just 30 seconds’ notice. To mitigate this, make sure you save snapshots so tasks can be resumed when the instance becomes available again.

Additionally, creating instance groups of Spot VMs increases your chances of securing the required machines for your workload.
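Making a workload preemption-tolerant mostly means checkpointing progress cheaply enough that losing a VM on 30 seconds' notice costs little work. A minimal, self-contained sketch — the doubling step stands in for real work, and file-based JSON checkpointing is an illustrative choice, not a GCP API:

```python
import json
import os

def process(items, checkpoint_path):
    """Process `items`, persisting progress so a preempted Spot VM can resume.
    Returns (start_index, results_for_this_run)."""
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]  # resume where the last run stopped
    results = []
    for i in range(done, len(items)):
        results.append(items[i] * 2)           # stand-in for real work
        with open(checkpoint_path, "w") as f:  # cheap, frequent checkpoint
            json.dump({"done": i + 1}, f)
    return done, results
```

If the VM is reclaimed mid-run, the next instance (or the same one, restarted) picks up from the last recorded index instead of redoing the whole job.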

2. Rightsize your VMs

Rightsizing is one of the most critical steps in optimizing cloud spend, but it requires careful planning. Rushing this process can result in excessive downsizing, leading to performance bottlenecks and application crashes.

Key considerations when rightsizing VMs include:

  • Match workload to resources: Define CPU count, memory allocation, storage size, and network bandwidth based on workload needs. For example, a high-performance analytics workload may require more memory and compute, whereas a web server may need lighter resources.
  • Avoid unnecessary premium SSDs: SSD storage is significantly more expensive than HDDs. Use premium SSDs only when workloads require fast or frequent data transfers.
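Both considerations above can be framed as one cost-aware selection problem: pick the cheapest machine type that fits the workload, and only pay for SSD when the workload needs it. The prices and shapes below are illustrative placeholders, not real GCP rates:

```python
# Illustrative monthly prices (USD) and shapes — look up actual GCP pricing.
VM_PRICES = {"e2-standard-2": 49, "e2-standard-4": 98, "e2-highmem-4": 132}
VM_SHAPES = {"e2-standard-2": (2, 8), "e2-standard-4": (4, 16), "e2-highmem-4": (4, 32)}
DISK_PRICE_PER_GB = {"pd-standard": 0.04, "pd-ssd": 0.17}  # illustrative

def cheapest_fit(cpus_needed, mem_gib_needed, disk_gb, needs_fast_disk):
    """Return (machine_type, disk_type, monthly_cost) for the cheapest fit."""
    disk_type = "pd-ssd" if needs_fast_disk else "pd-standard"
    fits = [n for n, (c, m) in VM_SHAPES.items()
            if c >= cpus_needed and m >= mem_gib_needed]
    vm = min(fits, key=VM_PRICES.get)  # cheapest machine that satisfies the workload
    return vm, disk_type, VM_PRICES[vm] + disk_gb * DISK_PRICE_PER_GB[disk_type]
```

A memory-hungry analytics workload lands on the highmem shape, while a light web server gets the smallest machine; neither pays the SSD premium unless it asks for fast disk.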

3. Enforce a standardized tagging policy

Establish a clear, organization-wide tagging policy for all GCP resources (e.g., by project, environment, or department). Standardized tagging improves accountability and visibility, making it easier to track resource usage and enforce proactive cost optimization strategies.
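A tagging policy is only useful if it is enforced. Here is a minimal audit sketch that flags resources missing any required label; the required label set and resource fields are examples to adapt to your organization, not a GCP schema:

```python
REQUIRED_LABELS = {"team", "env", "cost-center"}  # example policy

def unlabeled(resources):
    """Return names of resources missing any required label, for follow-up."""
    return [r["name"] for r in resources
            if not REQUIRED_LABELS <= set(r.get("labels", {}))]
```

Running a check like this on a schedule (against an inventory export) turns "who owns this cost?" from an unanswerable question into a short remediation list.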

4. Leverage GCP recommendations

Like other cloud providers, GCP regularly publishes recommendations, best practices, and updates on new instance types or services. Staying on top of these ensures you adopt the latest cost-saving measures and configurations.

5. Use Autoscaling: Rather than running instances at maximum capacity all the time, use autoscaling to scale according to actual traffic. GCP's load balancers maintain performance while helping with cost savings—two of the key pillars of GCP optimization.
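A target-utilization autoscaler effectively solves for the replica count that brings average utilization back to its target. A simplified sketch of that proportional rule — GCP's managed instance group autoscaler is more sophisticated (cool-down periods, scale-in controls), so treat this as an illustration of the core idea only:

```python
import math

def target_replicas(current_replicas, avg_cpu_util, target_util=0.6,
                    min_replicas=1, max_replicas=10):
    """Replica count that brings average CPU back toward the target utilization."""
    desired = math.ceil(current_replicas * avg_cpu_util / target_util)
    return max(min_replicas, min(max_replicas, desired))  # clamp to configured bounds
```

When traffic drops, the same rule scales the group back down toward the minimum, which is where the cost savings come from.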

By implementing these strategies, you can achieve sustainable cost reductions on your GCP infrastructure while ensuring performance and reliability remain intact.

Q8. What are the common mistakes organizations make when starting their GCP cost optimization journey?

The biggest mistakes stem from a reactive, overly aggressive mindset that alienates engineers and misses the bigger picture.

  1. Focusing Only on Unit Cost: Slashing spend without context. For example, forcing a service to use a smaller machine type that then crashes under load, costing more in lost revenue than it saved.
  2. Treating It as a One-Time Project: Thinking of cost optimization as a "cleanup" task you do once. It's an ongoing cultural practice (FinOps) that needs continuous monitoring and adjustment.
  3. Ignoring Commitment-Based Discounts: Staying on pure on-demand pricing for predictable workloads is the most common and expensive mistake. It's like refusing to use a subscription for a service you use daily.
  4. "Set and Forget" Policies: Creating autoscaling rules or budget alerts and never reviewing them. Usage patterns change, and your policies need to evolve with them.

  5. Not Tagging Resources from Day One: Launching resources without labels or tags makes it impossible to answer the question, "Who owns this cost?" This single failure cripples accountability.

The core mistake is prioritizing cost-cutting over cost intelligence. The goal isn't just to reduce the bill; it's to understand why the bill is what it is and spend smarter.

Q9. What are the key cost metrics every GCP user should track to manage and save costs effectively?

You can't manage what you don't measure. These metrics move you from guessing to knowing.

  1. Total Cost (Monthly & YTD): Your absolute baseline. Track it in the Billing Reports to understand overall trends and the impact of your changes.
  2. Cost per Project or Product: Use labels to break down your bill. This tells you which parts of the business are driving spend and enables accountability.
  3. Committed Use Discount (CUD) Coverage: The percentage of your eligible compute spend covered by commitments. A low percentage (<50%) means you're leaving significant savings on the table.
  4. Idle Resource Cost: The amount spent on resources that are powered on but doing no work (e.g., VMs with <5% CPU utilization). This is pure, uncontroversial waste.
  5. Data Egress Costs: Often a hidden killer. Monitor costs for data transferred out of GCP to the internet or other regions, as these can spike unexpectedly.

Track these in Google Cloud's Billing Reports and set up Budget Alerts on them to catch anomalies before they become catastrophes.
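Two of these metrics reduce to straightforward calculations over billing and monitoring data. A sketch using the <5% CPU idle threshold mentioned above; the field names are illustrative, not a Billing export schema:

```python
def idle_cost(vms, cpu_threshold=0.05):
    """Monthly spend on powered-on VMs whose average CPU sits under the threshold."""
    return sum(v["monthly_cost"] for v in vms
               if v["running"] and v["avg_cpu"] < cpu_threshold)

def cud_coverage(committed_spend, eligible_spend):
    """Share of eligible compute spend covered by commitments (0 when no spend)."""
    return committed_spend / eligible_spend if eligible_spend else 0.0
```

Wiring calculations like these to a scheduled report is usually enough to spot both pure waste and under-commitment long before the monthly invoice arrives.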

To Sum Up

Optimizing a GCP infrastructure can be more challenging than optimizing with other cloud providers, such as AWS, mainly due to the smaller body of learning resources, fewer third-party optimization service providers, and an overall lack of awareness of the GCP ecosystem.

However, the fundamentals remain the same: right-sizing, visibility, and the use of discount plans. While these fundamentals are consistent across the cloud ecosystem, it is essential to have nuanced, hands-on knowledge of GCP and its ecosystem to get the most out of any optimization effort.

Make the Most of Your GCP Infrastructure with CloudKeeper

CloudKeeper is your end-to-end solutions partner for maximizing your ROI in GCP. We bring multiple competencies and partner programs, along with 60+ highly skilled and certified practitioners and consultants who will guide you through every step of your Google Cloud journey.

These are the Google Cloud services CloudKeeper offers:

  • End-to-End GCP Consulting: Our team provides strategic guidance and actionable insights to help you achieve your business objectives with a scalable GCP infrastructure setup.
  • Deployment and Migration Assistance: Shifting from another provider to GCP? Our team will help you efficiently deploy and migrate your workloads with minimal disruption while ensuring maximum performance.
  • Comprehensive Wellness Reviews: We thoroughly evaluate your current setup, identify areas for improvement, and provide recommendations to enhance efficiency, security, and cost optimization.
  • Best-in-class GCP visibility tool that offers all-rounded insights: CloudKeeper Lens is one of the few tools in the industry that provides comprehensive visibility into your GCP infrastructure. It delivers resource-level cost visibility and a unified view of multiple accounts—all without requiring access to your GCP account!

Get in touch with our GCP experts today!

Meet the Author

Team CloudKeeper is a collective of certified cloud experts with a passion for empowering businesses to thrive in the cloud.
