Massive GCP Outage Thursday 6/12/25


On Thursday June 12, 2025, beginning at 1:49pm Eastern, Google Cloud Platform and Google’s Workspace products experienced cascading failures across a large number of services. You can see in Google’s Incident Report that the culprit was a bad code push for a component they internally call “Service Control” which authorizes API requests. Google concluded mitigation for the incident after three hours, and has acknowledged that some services were inaccessible for almost that long.

Unsurprisingly, Google’s cloud engineering team was able to patch Service Control after about 40 minutes of downtime, but the recovery process also had to absorb an elevated number of transactions that flooded Google’s systems once they came back online. Officially, the incident ran for 3 hours, from 1:49pm EDT to 4:49pm EDT on 12 June, 2025. Downtime would have varied significantly from customer to customer, depending on which services they were consuming and which regions they were attached to.

Downstream Impact

If you experienced downtime courtesy of Google on Thursday, you weren’t alone. The Downdetector website showed over 13,000 reported incidents for Google Cloud at around 2:30pm Eastern. And in the aftermath, we’ve learned that the list of companies impacted is long and distinguished. It’s a testament to how interconnected these companies and technologies are. When an infrastructure provider like GCP goes down, millions of their customers’ end users feel the impact and the cost is enormous. That interconnectedness can also impact how they serve each other. Many of the companies listed below are both GCP customers and GitHub customers. Their service delivery and their service development could have been simultaneously affected. This is just a taste of the scope of this outage:

SLA Impact

Credit for GCP’s infrastructure services with SLAs is awarded in three tranches. The first tranche, a 10% service credit, is triggered when monthly service availability falls below 99.99% or 99.95% (depending on the service). The second tranche is a 25% credit triggered when availability falls below 99.0%. The third tranche at 50% is triggered below 95.0%. To get a sense of what this means, the chart below spells out downtime thresholds that GCP measures:

Availability    Downtime per Month
99.99%          4.38 minutes
99.95%          21.92 minutes
99.0%           7.31 hours
95.0%           36.53 hours
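
To see where those figures come from, here is a rough Python sketch of the arithmetic. It assumes an average month of 365.25 / 12 days (730.5 hours), which is inferred from the numbers in the chart rather than taken from Google's SLA text. The function names are ours, purely for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: an "average" month of 365.25 / 12 days (730.5 hours).
# This is inferred from the chart's figures, not quoted from Google's SLA.
MINUTES_PER_MONTH = Decimal("365.25") * 24 * 60 / 12  # 43,830 minutes


def downtime_budget_minutes(availability_pct: str) -> Decimal:
    """Monthly downtime, in minutes, at which availability equals the given percentage."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_pct)) / Decimal(100)
    return unavailable_fraction * MINUTES_PER_MONTH


def round2(value: Decimal) -> Decimal:
    """Round to two decimal places, rounding halves up."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Print each threshold in both minutes and hours.
for pct in ("99.99", "99.95", "99.0", "95.0"):
    minutes = downtime_budget_minutes(pct)
    print(f"{pct}%: {round2(minutes)} minutes ({round2(minutes / 60)} hours)")
```

Decimal arithmetic is used here only so that the rounded values line up with the chart; ordinary floats would give the same thresholds to within a hundredth.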

For many services, just a few minutes of downtime can make you credit-eligible. The 40 minutes it took Google to implement a fix to Service Control represents the bare minimum impairment time, and that alone exceeds both the 4.38-minute and the 21.92-minute monthly thresholds. This single event on Thursday therefore qualified any impacted services for a 10% monthly credit. Many services were impacted for several hours. Because downtime is calculated over the course of the month, Next Signal recommends waiting until the end of the month to submit a credit request. Additional fallout from this event (think tremors) or downtime from unrelated events could push that monthly downtime over 7.31 hours, which would move your credit request into the 25% tranche. Next Signal recommends submitting the largest credit request that you can.
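
As a rough illustration of how accumulated downtime maps to a tranche, here is a minimal Python sketch. It assumes downtime is counted as simple wall-clock minutes against an average month of 43,830 minutes. Both are simplifications, since the actual measurement method is defined per service in Google's SLAs. The function names and defaults are hypothetical.

```python
# Assumption: an average month of 365.25 / 12 days = 43,830 minutes.
MINUTES_IN_MONTH = 43830.0


def monthly_availability_pct(downtime_minutes: float) -> float:
    """Availability for the month, treating downtime as wall-clock minutes (a simplification)."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_IN_MONTH)


def credit_tranche(availability_pct: float, first_threshold_pct: float = 99.95) -> int:
    """Return the service credit percentage: 0, 10, 25, or 50.

    first_threshold_pct is 99.99 or 99.95 depending on the service.
    """
    if availability_pct < 95.0:
        return 50
    if availability_pct < 99.0:
        return 25
    if availability_pct < first_threshold_pct:
        return 10
    return 0


# The 40-minute minimum impairment and the full 3-hour incident window.
for minutes in (40, 180):
    availability = monthly_availability_pct(minutes)
    print(f"{minutes} minutes of downtime: {credit_tranche(availability)}% credit")
```

Under these assumptions, both the 40-minute minimum and the full three-hour window land in the 10% tranche. It takes more than roughly 438 minutes of cumulative downtime in the month to reach 25%.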

Where do you start? Next Signal

Google reported 76 individual infrastructure services affected by this outage across all regions worldwide. Restoration times were not uniform. In fact, downtime varied widely, primarily with the size of the region. Larger regions experienced more downtime because Service Control could not handle the surge in demand from so many customers hitting the APIs simultaneously when they came back online. The good news is that a couple of hours of downtime can get you back 10% of your monthly spend on those services. The less great news is that it’s a pain to pinpoint exactly what you need to ask for.

Determining which services were affected, in which regions, and then comparing that to what you have deployed is what we do. Supply Next Signal with your most recent month of billing data, and we can tell you what is owed, help you collect the evidence you need to submit, and format the request for you. If your development team would rather spend their time building and future-proofing your network, cloud products, and cloud services, let Next Signal do the heavy lifting of credit retrieval. If your finance team wants more visibility into outage events and their impact, and wants to be empowered to make claims, reach out to Next Signal and schedule a conversation. Don’t pay for downtime.