Your journey as a cloud engineer has progressed. You’ve built robust networks, configured secure identity systems, and deployed applications on virtual machines. But a new problem has emerged. Your lead developer comes to you and says the magic words: “But it works on my machine!”
You’ve entered the world of dependency hell, where an application works perfectly in a developer’s controlled environment but fails in production because of a missing library or a different OS version. The first step in solving this is containerization.
We’ll use tools like Docker to package our application, its code, its libraries, and all its dependencies into a single, neat box called a container. This box can run anywhere—on a laptop, a VM, on-premises—and it will behave identically. Problem solved, right?
Not quite. Now you have a new problem: container sprawl. Instead of a few VMs, you now have hundreds of these container boxes. How do you deploy them? How do you make them talk to each other? What happens if one crashes in the middle of the night? Who restarts it? You need an orchestrator. You need a city planner for your digital metropolis. You need Kubernetes.
The Conductor of the Orchestra: What is Kubernetes?
Kubernetes (often shortened to K8s, because there are 8 letters between ‘K’ and ‘s’) is an open-source platform that automates the deployment, scaling, and management of containerized applications. It’s the de facto operating system for the cloud.
Think of it as the conductor of a massive, complex orchestra. Each container is a musician. Kubernetes tells them when to play, how loudly to play (scaling), brings in new musicians if needed, and makes sure a replacement is ready if one of them faints (crashes). It’s incredibly powerful, but setting up the conductor’s podium and the concert hall—the underlying infrastructure for Kubernetes itself—can be a complex task.
This is where Google Cloud steps in with a managed solution.
Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) is a managed Kubernetes service. This means Google handles the tedious and complex parts of running Kubernetes for you, allowing you to focus on what you actually care about: deploying your applications.
What does “managed” mean here? In Kubernetes, the architecture is split into two parts:
- The Control Plane (The Brains): It makes all the decisions: where to schedule containers, how to respond to failures, etc. In GKE, Google manages the control plane entirely. They ensure it’s available, patched, and secure. You just interact with it via an API.
- The Nodes (The Brawn): They are the worker machines—essentially GCE virtual machines—that actually run your containers. You are responsible for these, but GKE gives you powerful tools to manage them.
Choosing Your Management Style: Standard vs. Autopilot
GKE offers two distinct modes of operation, and choosing between them is one of your first big decisions.
GKE Standard
This is the classic, hands-on GKE experience. You have full control over your nodes. You decide:
- What machine type to use (e.g., `e2-standard-4`).
- How many nodes to start with.
- How to configure their autoscaling.
- When to perform upgrades.
You are responsible for managing your node pools—groups of nodes that share the same configuration. This gives you maximum flexibility but also more operational responsibility.
GKE Autopilot
This is the future of Kubernetes. Autopilot takes the “managed” concept to the next level. With Autopilot, Google manages both the control plane and the nodes.
- You don’t create or manage nodes at all.
- You simply deploy your application containers.
- GKE automatically provisions the necessary compute resources to run them.
- You pay per-pod for the CPU, memory, and storage your application requests, not for the underlying VMs.
Autopilot is a truly serverless Kubernetes experience. It’s a bit more opinionated and has some limitations, but for a vast number of workloads, it dramatically reduces operational overhead and can be more cost-efficient.
Your Application’s Building Blocks: Core K8s Objects
Whether you choose Standard or Autopilot, you’ll interact with your cluster using the same Kubernetes API objects. These are the fundamental building blocks of your application.
The Pod: The Smallest Unit
A container doesn’t run by itself in Kubernetes. It runs inside a wrapper called a Pod. A Pod is the smallest, most basic deployable object in Kubernetes. Think of a pea pod: the pod is the wrapper, and the peas inside are your containers. While a Pod can contain multiple containers, 95% of the time, you’ll have one main container per Pod. Pods are ephemeral; they can be created, destroyed, and replaced at any time.
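To make this concrete, here is a minimal Pod manifest. It's a sketch: the names and the public nginx image are illustrative, not from a real project.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod        # hypothetical name
  labels:
    app: my-app
spec:
  containers:
  - name: my-app          # the single main container in this Pod
    image: nginx:1.25     # illustrative public image
    ports:
    - containerPort: 80   # the port the container listens on
```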
The Blueprint: The Deployment
You rarely create a Pod directly. Instead, you declare the desired state of your application using a Deployment. A Deployment is a set of instructions that tells Kubernetes: “I want to run my application using this container image, and I always want 3 replicas (3 identical Pods) of it running.”
The Deployment’s job is to be the foreman. It will create the 3 Pods. If one of them crashes, the Deployment will automatically create a new one to replace it, ensuring you always have 3 running. This provides self-healing and scalability.
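A minimal Deployment manifest for that desired state might look like the following sketch (the `web-server` name and the nginx image are placeholders for your own):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  replicas: 3               # desired state: always 3 identical Pods
  selector:
    matchLabels:
      app: web-server       # must match the Pod template's labels
  template:                 # the blueprint for the Pods it creates
    metadata:
      labels:
        app: web-server
    spec:
      containers:
      - name: web-server
        image: nginx:1.25   # placeholder; pin your own image to a specific tag
        ports:
        - containerPort: 80
```

You'd apply it with `kubectl apply -f my-deployment.yaml` (as in the command reference below), and the Deployment controller takes over from there.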
The Front Door: The Service
So, you have 3 Pods running your web server, thanks to your Deployment. But each Pod has its own internal, unstable IP address. If a Pod is replaced, its IP changes. How can other applications reliably connect to your web server?
You need a stable front door. In Kubernetes, this is a Service. A Service provides a single, stable IP address and DNS name that fronts a group of Pods. When traffic hits the Service, it automatically load-balances it to one of the healthy Pods behind it.
There are several types of Services, but the most important one is:
- LoadBalancer: This is the magic wand. When you create a Service of type `LoadBalancer` in GKE, it automatically communicates with Google Cloud and provisions a real, external Cloud Load Balancer with a public IP address. This is the primary way you expose your application to the internet.
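Here's what that looks like as a manifest, a sketch assuming the `web-server` Deployment from earlier (i.e., Pods labeled `app: web-server`):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-server-service
spec:
  type: LoadBalancer    # GKE provisions an external Cloud Load Balancer
  selector:
    app: web-server     # traffic is load-balanced across healthy Pods with this label
  ports:
  - port: 80            # port exposed on the Service's stable IP
    targetPort: 80      # port the containers listen on
```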
The Smart Gatekeeper: GKE Ingress
The LoadBalancer Service is great for giving one external IP to one set of Pods. But what if you have multiple services in your cluster—say, `api-service` and `web-frontend-service`—and you want them all accessible through a single external IP, but with different paths or hostnames? You don’t want to pay for a separate Load Balancer for every single service.
This is where GKE Ingress comes in.
An Ingress acts like a smart traffic controller at the edge of your cluster. It uses a single, global Cloud HTTP(S) Load Balancer to route incoming traffic to different Services within your cluster based on rules you define.
Example:
- Traffic for `api.your-app.com` goes to your `api-service`.
- Traffic for `www.your-app.com` goes to your `web-frontend-service`.
- Traffic for `www.your-app.com/admin` goes to your `admin-panel-service`.
All of this happens through one efficient and cost-effective Cloud Load Balancer, saving you IP addresses and management overhead.
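Here is a sketch of an Ingress implementing those routing rules (the service names and port 80 are assumed for illustration):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - host: api.your-app.com          # hostname-based routing
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
  - host: www.your-app.com
    http:
      paths:
      - path: /admin                # more specific path first
        pathType: Prefix
        backend:
          service:
            name: admin-panel-service
            port:
              number: 80
      - path: /                     # everything else on this host
        pathType: Prefix
        backend:
          service:
            name: web-frontend-service
            port:
              number: 80
```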
The Art of Scaling
Your application is a success! Traffic is surging. How do you handle the load? GKE provides two levels of autoscaling.
- Horizontal Pod Autoscaler (HPA): This scales your Pods. You can create an HPA policy that says, “Watch the CPU utilization of the Pods in my web-server Deployment. If the average CPU goes above 60%, create more Pods.” The HPA will automatically increase the replica count in your Deployment from 3 to 4, 5, or whatever is needed to handle the load. (A manifest sketch of this policy follows after this list.)
- Cluster Autoscaler: This scales your Nodes (in Standard mode). What if the HPA wants to scale up to 100 pods, but your 3 nodes are already full? The Cluster Autoscaler detects that Pods are “Pending” because of a lack of resources. It then automatically adds new nodes to your node pool to provide the needed capacity. When the load decreases, it will consolidate Pods and safely remove the unneeded nodes to save you money. (In Autopilot, this is all done for you automatically and transparently).
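Here's what that HPA policy might look like as a manifest, a minimal sketch targeting the hypothetical `web-server` Deployment from earlier:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-server-hpa
spec:
  scaleTargetRef:                # the Deployment this HPA watches and scales
    apiVersion: apps/v1
    kind: Deployment
    name: web-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # add Pods when average CPU exceeds 60%
```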
Security & Identity: Workload Identity
Your Pod needs to access a Cloud Storage bucket. How does it authenticate securely? You could download a service account key and store it as a Kubernetes Secret, but managing and rotating keys is a security risk.
The modern, secure, and recommended way is Workload Identity.
Workload Identity is a GKE feature that lets you link a Kubernetes Service Account (KSA)—an identity for a Pod inside the cluster—to a Google Cloud IAM Service Account (GSA). This allows your Pod to inherit the permissions of the IAM service account without ever touching a JSON key. It’s the most secure way for your GKE workloads to interact with other Google Cloud services.
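The Kubernetes half of the wiring is an annotation on the KSA naming the GSA it should act as. A sketch with hypothetical names (`my-app-ksa`, `my-app-gsa`, project `my-project`):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-ksa      # hypothetical KSA name
  namespace: default
  annotations:
    # Link this KSA to the IAM service account it should act as
    # (hypothetical GSA and project):
    iam.gke.io/gcp-service-account: my-app-gsa@my-project.iam.gserviceaccount.com
```

The other half lives in IAM: the GSA grants the `roles/iam.workloadIdentityUser` role to the member `serviceAccount:my-project.svc.id.goog[default/my-app-ksa]` (for example via `gcloud iam service-accounts add-iam-policy-binding`), and your Pod spec sets `serviceAccountName: my-app-ksa`.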
Storing Container Images: Artifact Registry
Your Deployment manifest will specify a container image, like `my-app:v1.2.3`. But where does that image physically live? While you could pull from public registries like Docker Hub, for production applications, you’ll want a private, secure, and integrated home for your images.
This is the job of Artifact Registry. It’s Google Cloud’s fully managed service for storing, managing, and securing your container images (as well as other build artifacts like Maven packages or npm packages).
Key benefits:
- Private Storage: Your images are kept private within your Google Cloud project.
- Vulnerability Scanning: Integrates with Container Analysis to automatically scan your images for known security vulnerabilities, giving you a crucial security layer.
- Fast Access: As it’s native to Google Cloud, your GKE clusters can pull images from Artifact Registry very quickly and securely.
It’s the central hub for your application’s building blocks, right where they’re built and deployed.
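In your manifests, an Artifact Registry image is referenced by its full registry path. A sketch with hypothetical project, repository, and image names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    # Format: LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
    image: us-central1-docker.pkg.dev/my-project/my-repo/my-app:v1.2.3
```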
Handling State: Persistent Storage
What about applications that need to store data, like a database? Containers and Pods are ephemeral. If a database Pod restarts, all its data is wiped. This is where PersistentVolumes come in.
The model works like this:
- A developer creates a PersistentVolumeClaim (PVC). This is a request for storage, like “I need 100GB of fast, SSD-like storage.”
- GKE sees this PVC and uses a StorageClass to dynamically provision a matching piece of underlying storage. In Google Cloud, this is typically a GCP Persistent Disk.
- This Persistent Disk is represented in the cluster as a PersistentVolume (PV). Kubernetes binds the PV to the PVC.
- You configure your database Pod to mount this PVC.
The magic is that the lifecycle of the Persistent Disk is now separate from the Pod. If the Pod crashes and is reattached to a different node, Kubernetes will detach the disk from the old node and reattach it to the new one. Your data is safe.
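Here's a sketch of steps 1 and 4 together, assuming GKE's built-in `premium-rwo` StorageClass (SSD-backed Persistent Disk) and a hypothetical Postgres Pod (in practice you'd likely run a database via a StatefulSet):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
  - ReadWriteOnce               # one node can mount the disk read-write at a time
  storageClassName: premium-rwo # assumed: GKE's SSD Persistent Disk class
  resources:
    requests:
      storage: 100Gi            # "I need 100GB of fast, SSD-like storage"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-database
spec:
  containers:
  - name: postgres
    image: postgres:16
    env:
    - name: POSTGRES_PASSWORD
      value: change-me          # demo only; use a Secret in practice
    - name: PGDATA
      value: /var/lib/postgresql/data/pgdata  # subdir avoids the disk's lost+found
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: db-data        # binds this Pod to the PVC above
```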
Private Clusters
For the most secure workloads, you might want your GKE worker nodes to be completely isolated from the public internet. This is where a Private Cluster becomes essential.
In a private GKE cluster, the worker nodes (the VMs that run your Pods) have only private IP addresses. This means they cannot be reached directly from the public internet, significantly reducing your attack surface.
But what if your Pods need to reach the internet? For example, to pull images from external registries (though Artifact Registry is preferred) or to download software updates? Since the nodes have no public IPs, they cannot reach the internet directly. The solution, as we learned in our networking article, is Cloud NAT. If you have a private cluster and need internet egress, you must configure a Cloud NAT gateway in the VPC where your cluster resides.
Managing Configuration & Secrets: ConfigMaps and Secrets
Your application code is neatly packaged in a container. But what about its configuration? Database connection strings, API keys, feature flags—these shouldn’t be hardcoded into your image. This is where ConfigMaps and Secrets save the day.
- ConfigMaps: These Kubernetes objects are used to store non-sensitive configuration data. Think environment variables, command-line arguments, or entire configuration files. They allow you to change your application’s settings without rebuilding the container image.
- Secrets: These are specifically designed for sensitive data, like passwords, API keys, and authentication tokens. Kubernetes Secrets provide a more secure way to store and distribute this information to your Pods. Note that Secret values are merely base64-encoded (encoding, not encryption); on GKE the backing store is encrypted at rest by default, and you can enable application-layer encryption with GCP’s Key Management Service for defense in depth. Secrets are designed to be exposed only to authorized Pods.
Your Pods can then consume these ConfigMaps and Secrets as environment variables or mount them as files within the container’s file system, keeping your sensitive data out of your code and container images.
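A sketch showing both objects and a Pod consuming them as environment variables (all names and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"            # non-sensitive settings, stored in plain text
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:                    # written as plain text; stored base64-encoded
  DB_PASSWORD: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: nginx:1.25          # illustrative image
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:       # injected from the ConfigMap above
          name: app-config
          key: LOG_LEVEL
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:          # injected from the Secret above
          name: app-secrets
          key: DB_PASSWORD
```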
Common Pitfalls & Best Practices
- Pitfall: Using the default GKE node service account, which has broad (Editor) permissions.
- Best Practice: Use Workload Identity. Create dedicated, least-privilege IAM service accounts for your applications and link them to Kubernetes service accounts.
- Pitfall: Using the `:latest` tag for your container images in a Deployment. This can lead to unpredictable deployments.
- Best Practice: Always use a specific, immutable image tag or hash (e.g., `my-app:v1.2.3` or `my-app@sha256:...`). This ensures your deployments are repeatable and reliable.
- Pitfall: Not setting resource requests and limits on your containers. This leads to “noisy neighbor” problems and makes autoscaling unreliable.
- Best Practice: Always define CPU and memory requests and limits for your containers. This allows the Kubernetes scheduler to place Pods intelligently and the HPA to make accurate scaling decisions. (A manifest sketch follows after this list.)
- Pitfall: Choosing GKE Standard when your team doesn’t have Kubernetes operational experience.
- Best Practice: Start with GKE Autopilot for new workloads. It dramatically simplifies operations and forces you to adopt best practices (like setting resource requests). Move to Standard only if you have a specific need for the extra control.
- Pitfall: Hardcoding sensitive data or configuration into container images.
- Best Practice: Use ConfigMaps for non-sensitive data and Secrets for sensitive data. Manage these Kubernetes objects separately from your application code.
- Pitfall: Exposing GKE nodes to the public internet unnecessarily.
- Best Practice: For production, consider using Private Clusters and manage external access strictly through Ingress or specific Services. If egress to the internet is needed from private nodes, configure Cloud NAT.
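To make the requests-and-limits practice concrete, a minimal sketch (the values are illustrative, not sizing advice):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: nginx:1.25      # illustrative
    resources:
      requests:            # reserved by the scheduler when placing the Pod
        cpu: 250m          # a quarter of one vCPU
        memory: 256Mi
      limits:              # hard ceilings: CPU beyond this is throttled,
        cpu: 500m          # memory beyond this gets the container OOM-killed
        memory: 256Mi
```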
Quick Reference Command Center
You’ll use two main command-line tools: gcloud for managing the GKE cluster itself, and kubectl for managing the workloads inside the cluster.
| Tool | Action | Command |
|---|---|---|
| gcloud | Create an Autopilot Cluster | gcloud container clusters create-auto [CLUSTER_NAME] --region=[REGION] |
| gcloud | Create a Standard Cluster | gcloud container clusters create [CLUSTER_NAME] --num-nodes=3 --zone=[ZONE] |
| gcloud | Create a Private Cluster | gcloud container clusters create [NAME] --enable-private-nodes --master-ipv4-cidr=[CIDR] --enable-master-authorized-networks --master-authorized-networks=[CIDR] |
| gcloud | Get credentials to connect to cluster | gcloud container clusters get-credentials [CLUSTER_NAME] --region=[REGION] |
| gcloud | Resize a Node Pool (Standard) | gcloud container clusters resize [CLUSTER_NAME] --node-pool=[POOL_NAME] --num-nodes=5 |
| gcloud | Create Artifact Registry Repository | gcloud artifacts repositories create [REPO_NAME] --repository-format=docker --location=[LOCATION] |
| kubectl | Apply a configuration file | kubectl apply -f my-deployment.yaml |
| kubectl | See all running Pods | kubectl get pods |
| kubectl | See all Services | kubectl get services (or svc) |
| kubectl | See all Ingresses | kubectl get ingress |
| kubectl | See all ConfigMaps | kubectl get configmaps |
| kubectl | See all Secrets | kubectl get secrets |
| kubectl | Get logs from a Pod | kubectl logs [POD_NAME] |
| kubectl | Get a shell inside a running Pod | kubectl exec -it [POD_NAME] -- /bin/sh |
| kubectl | Scale a Deployment | kubectl scale deployment/[DEPLOYMENT_NAME] --replicas=5 |

