We’ve journeyed across the entire landscape of Google Cloud compute. We’ve laid our own foundation with Compute Engine (IaaS). We’ve mastered the complex but powerful city-planning of GKE (CaaS). And we’ve moved into the effortless luxury apartment of App Engine (PaaS). But a new desire has emerged. Your developers love the scale-to-zero efficiency and zero-server-management of App Engine Standard. However, they also love the freedom of Docker containers that they get with GKE or App Engine Flexible, allowing them to use any language, any library, any binary. They come to you with a simple request: “Can we have both? Can we have the ‘just run my code’ simplicity of serverless, but with the ‘run any container’ flexibility of Docker?”
For years, the answer was “pick one.” Today, the answer is a resounding “yes.” And the service that makes this possible is Cloud Run.
What is Cloud Run?
Cloud Run is a fully managed, serverless platform for running stateless containers. Let’s break that down. It takes the best of serverless and the best of containers and merges them into a single, powerful service.
- From Serverless: You get automatic scaling (including scaling down to zero), no infrastructure to manage, and a pay-per-use billing model.
- From Containers: You get the freedom to package your application in a standard Docker container, giving you total control over your runtime environment.
The analogy: Cloud Run is like a magical, self-replicating shipping container. When a shipment (an HTTP request) arrives at the port, the container instantly appears, processes the shipment, and then vanishes. If a thousand shipments arrive at once, a thousand containers appear instantly to handle the load. You just provide the blueprint for the container; Google handles the magic.
How It Works: Revisions, Concurrency, and CPU
The workflow is beautifully simple. You package your web application into a container image, push it to Artifact Registry, and then tell Cloud Run to deploy it.
```shell
gcloud run deploy my-cool-service \
  --image us-central1-docker.pkg.dev/my-project/my-repo/my-app:v1 \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```
Cloud Run takes your container, gives you a secure HTTPS endpoint, and handles everything else. When you deploy a new version of your container or change its configuration, Cloud Run creates a new, immutable Revision. Just like with App Engine, you can split traffic between different revisions for safe, gradual rollouts.
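Inside the container, Cloud Run's contract is simple: listen for HTTP on the port passed in the `PORT` environment variable (8080 by default). A minimal stateless service using only Python's standard library (the response body is illustrative):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Minimal stateless handler: nothing is stored between requests."""

    def do_GET(self):
        body = b"Hello from Cloud Run!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example's container logs quiet

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT env var.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Package this with any base image you like; as long as it listens on `$PORT`, Cloud Run can run it.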
The Secret Sauce: Container Concurrency
Here’s a key feature that makes Cloud Run so efficient. A 1st-gen Cloud Functions instance handles only one request at a time. A single Cloud Run container instance, however, can process multiple requests simultaneously. By default, an instance can handle up to 80 concurrent requests.
This is a game-changer for cost and performance. If you have 80 simultaneous users hitting your API, you might only need one Cloud Run instance, whereas you’d need 80 Cloud Function instances. This makes Cloud Run exceptionally cost-effective for services with steady traffic.
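As a rough back-of-the-envelope sketch (the real autoscaler also weighs CPU utilization, startup time, and traffic spikes), the steady-state instance count is the concurrent load divided by per-instance concurrency, rounded up:

```python
import math

def instances_needed(concurrent_requests: int, concurrency_per_instance: int) -> int:
    """Rough lower bound on instances needed to serve a steady concurrent load."""
    if concurrent_requests <= 0:
        return 0  # no traffic: scale to zero
    return math.ceil(concurrent_requests / concurrency_per_instance)

print(instances_needed(80, 80))  # default Cloud Run concurrency -> 1 instance
print(instances_needed(80, 1))   # one-request-per-instance model -> 80 instances
print(instances_needed(0, 80))   # idle -> 0 instances
```

The same 80-user load that needs 80 single-request instances fits on one Cloud Run instance at the default concurrency setting.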
CPU Allocation: A Tale of Two Models
- CPU is allocated only during request processing (Default): This is the classic serverless model. If your container instance is idle (not actively handling a request), its CPU is severely throttled. You only pay for the CPU you use while processing requests. This is perfect for standard web services.
- CPU is always allocated: You can configure your service to have its CPU available at all times, even between requests. This is for applications that need to perform background work between requests. Note that this changes billing: you pay for an instance’s entire lifetime, not just request time. On its own it does not keep instances alive; to guarantee an instance is always running, also set a minimum instance count.
Controlling Access: Ingress and IAM
Who can access the HTTPS endpoint for your Cloud Run service? You have granular control.
- Ingress Control: You can set the ingress to:
- All: The service is public and accessible from anywhere on the internet.
- Internal: The service is private and can only be reached from within your VPC network.
- Internal and Cloud Load Balancing: The service is internal but can also be used as a backend for an Internal or External Load Balancer.
- Authentication: Ingress controls where traffic may come from; IAM controls who may invoke the service. By default, a service is private: to call it, the user or service account must hold the Cloud Run Invoker (`roles/run.invoker`) IAM role. The `--allow-unauthenticated` flag during deployment grants this role to the special `allUsers` principal, making the service truly public.
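For service-to-service calls to a private Cloud Run service, the caller fetches an identity token from the metadata server and presents it as a bearer token. A sketch using only the standard library (`call_private_service` and the service URL are illustrative; actually running it requires a Google Cloud environment where the metadata server exists):

```python
import urllib.request

# Metadata-server endpoint that mints an ID token for a given audience.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/identity")

def id_token_request(audience: str) -> urllib.request.Request:
    """Build the metadata-server request for an ID token scoped to `audience`."""
    return urllib.request.Request(
        f"{METADATA_URL}?audience={audience}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )

def call_private_service(url: str) -> bytes:
    """Call a private Cloud Run service using the runtime service account."""
    token = urllib.request.urlopen(id_token_request(url)).read().decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    return urllib.request.urlopen(req).read()
```

The caller’s service account still needs `roles/run.invoker` on the target service; the token only proves identity.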
Connecting to Your Private World: VPC Access
Just like Cloud Functions and App Engine, your Cloud Run container runs in a secure, Google-managed environment. If it needs to connect to a Cloud SQL database or a Memorystore instance inside your private VPC, it needs a bridge.
This is accomplished using a Serverless VPC Access Connector. You create the connector in your VPC, and then configure your Cloud Run service to use it. This allows your container to securely communicate with other resources using their private IP addresses.
The Final Showdown: When to Use What?
This is one of the most important topics for the ACE exam. With GCE, GKE, App Engine, Cloud Functions, and Cloud Run, which do you choose?
| Service | Best For… | You Manage… | Scales to Zero? |
|---|---|---|---|
| Compute Engine (GCE) | Full control, legacy apps, custom OS | Everything (VMs, OS, patching, scaling) | No |
| GKE | Complex, orchestrated microservices, stateful apps | Containers, cluster configuration, node pools | No (Standard) <br> Yes (Autopilot pods) |
| App Engine Standard | Web apps in specific runtimes, rapid scaling | Just your code | Yes |
| Cloud Functions | Single-purpose, event-driven code snippets | Just your code/functions | Yes |
| Cloud Run | Stateless, request-driven web services in containers | Just your container image | Yes |
The simple rule of thumb:
- Need a full VM? -> GCE
- Need Kubernetes? -> GKE
- Have a simple web app in a supported language? -> Start with App Engine Standard.
- Have a small piece of code that reacts to an event (like a file upload)? -> Cloud Functions.
- Have a web application you want to run as a container and want serverless scaling? -> Cloud Run.
Common Pitfalls & Best Practices
- Pitfall: Trying to run a stateful application (like a traditional database) in Cloud Run. The container’s local file system is ephemeral.
- Best Practice: Design your containers to be stateless. Externalize all state to a managed service like Cloud SQL, Firestore, or Memorystore.
- Pitfall: Setting concurrency too high for a CPU-intensive application. A single instance might get overwhelmed trying to handle too many requests at once.
- Best Practice: Tune your concurrency settings. For CPU-heavy work, a lower concurrency (even 1) might be more appropriate. For I/O-bound work, a higher concurrency is fine.
- Pitfall: Assuming “CPU always allocated” only changes performance. It also changes billing: you pay for an instance’s entire running time, and if you pair it with a minimum instance count, those instances incur costs 24/7.
- Best Practice: Use the default “CPU during requests” model unless you have a clear need for background processing.
- Pitfall: Not building your container images efficiently, leading to large images and slow cold starts.
- Best Practice: Use multi-stage builds and minimal base images (like `alpine` or `distroless`) to keep your container images small and lean.
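One way to apply that best practice is a multi-stage Dockerfile: build with the full toolchain, then copy only the finished artifact into a minimal base. An illustrative sketch for a Go binary (the image tags and paths are assumptions, not a prescription):

```dockerfile
# Stage 1: build with the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: copy only the static binary into a minimal base image
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The final image contains just the binary and a stripped-down runtime layer, which keeps pulls fast and cold starts short.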
Quick Reference Command Center
Here’s a cheatsheet of `gcloud` commands for managing Cloud Run.
| Action | Command |
|---|---|
| Deploy a Service | `gcloud run deploy [SERVICE_NAME] --image [IMAGE_URL] --region [REGION]` |
| List Deployed Services | `gcloud run services list` |
| Describe a Service | `gcloud run services describe [SERVICE_NAME] --region [REGION]` |
| Set IAM Policy (Make Private) | `gcloud run services remove-iam-policy-binding [SERVICE_NAME] --member=allUsers --role=roles/run.invoker` |
| Set IAM Policy (Make Public) | `gcloud run services add-iam-policy-binding [SERVICE_NAME] --member=allUsers --role=roles/run.invoker` |
| Update Traffic Split | `gcloud run services update-traffic [SERVICE_NAME] --to-revisions=REV1=50,REV2=50` |
| View Service Logs | `gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=[SERVICE_NAME]"` |