The “It Works on My Machine” Apocalypse: Taming Vertex AI

Let’s be honest. Most “AI projects” start as a Jupyter notebook named Untitled12_final_final_v2.ipynb. It lives on a laptop that hasn’t been rebooted in three weeks, runs on a specific version of Python that no longer exists in the wild, and relies on a CSV file that Bob from accounting emailed you once.

Then management says, “Great, let’s put it in production.”

And that’s when the screaming starts.

If you want to survive the transition from “hacky script” to “enterprise system” without losing your sanity, you need to stop treating ML like a science fair project and start treating it like software engineering. Google Cloud’s Vertex AI is the toolkit for this, but it’s a beast. Let’s break down the parts that actually matter: Experiments, the Model Registry, the Model Garden, Endpoints, and Pipelines.

Vertex AI Experiments: The “Messy Desk”

Before you have a model, you have a mess. You’re tweaking hyperparameters, swapping datasets, and trying eight different architectures. If you don’t track this, you will forget which combination gave you that 98% accuracy.

Vertex AI Experiments is essentially a logbook for your chaos. It tracks:

  • Parameters: Learning rate, batch size, dropout.
  • Metrics: Accuracy, loss, F1 score.
  • Context: Which dataset version you used.

Think of it as git commit messages, but for math. You run a job, you log the results to an Experiment. Later, when your boss asks why the new model is worse, you can pull up the Experiment Dashboard and prove that actually, this model is faster and cheaper, even if it’s 0.1% less accurate.
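
Here’s what that logging looks like in practice. A minimal sketch using the Python SDK (google-cloud-aiplatform); the project, region, experiment, and run names are placeholders:

```python
from google.cloud import aiplatform

# Placeholder project/region/experiment names; swap in your own.
aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="churn-model-tuning",
)

aiplatform.start_run("run-lr-0-01")
aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64, "dropout": 0.2})
# ... actual training happens here ...
aiplatform.log_metrics({"accuracy": 0.981, "f1": 0.962})
aiplatform.end_run()
```

Every run shows up in the Experiment Dashboard, side by side, sortable by metric. That’s the receipt you pull up for your boss.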

Pro Tip: Don’t log everything. Log the stuff that changes. If you log the value of Pi every time, you’re just wasting storage.

Model Registry: The “Filing Cabinet”

Once you have a model that isn’t terrible, you need to put it somewhere safe. That is NOT an S3 bucket or a folder on your desktop.

The Model Registry is the source of truth. It solves the “which version is running in prod?” problem.

  • Versioning: It handles v1, v2, v3 automatically.
  • Aliasing: You can tag a model as default, staging, or production.
  • Governance: You can see who trained it and when.

The Registry doesn’t store the massive model files (those live in Cloud Storage); it stores the metadata and pointers. It’s the difference between a pile of books on the floor and a library card catalog.

The Workflow:

  1. Experiments: Try 50 things.
  2. Winner: Pick the best one.
  3. Registry: “Register” that winner. This is now an immutable artifact.
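
Registration itself is one SDK call. A sketch, assuming a scikit-learn model already saved to Cloud Storage (the bucket, names, and serving container image are placeholders; check Google’s current list of prebuilt serving containers):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# upload() creates v1; pass parent_model to add v2, v3, ... to an existing entry.
model = aiplatform.Model.upload(
    display_name="churn-classifier",
    artifact_uri="gs://my-bucket/models/churn/",  # the big files stay in Cloud Storage
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
    version_aliases=["staging"],  # tag it; promote the alias when it earns production
)
print(model.resource_name, model.version_id)
```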

Model Garden: The “Shopping Mall”

Sometimes, you don’t need to build a car; you just need to rent a taxi. Model Garden is Google’s catalog of pre-trained models.

  • First-Party (Google): Gemini, PaLM, Chirp (speech), Imagen.
  • Open Source: Llama, BERT, RoBERTa, Mistral.
  • Third-Party: Models from partners.

The cynical take: It’s great for prototyping or generic tasks (sentiment analysis, OCR, chat). But if you have a highly specific domain—like detecting defects in underwater welding seams—you’re still going to need to fine-tune these or build your own. Treat Model Garden as a starting point, not a magic wand.
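
For the “rent a taxi” case, calling a first-party model is a few lines. A sketch with the vertexai SDK; the exact model name is an assumption, so use whatever current Gemini version the Garden lists:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

# No training, no endpoint management; Google hosts the model.
model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Classify the sentiment: 'The product broke in a day.'")
print(response.text)
```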

Endpoints: Where the Rubber Meets the Road

An “Endpoint” is just a URL that accepts data and spits out predictions. But in the cloud, nothing is “just” a URL.

Deployment Modes

You have a few ways to expose your model, and choosing the wrong one will either bankrupt you or get you hacked.

  1. Public Endpoints:
    • Standard: Public internet IP. Secured by IAM (you need a Google token to hit it). Easiest to set up.
    • Good for: Mobile apps, public-facing web services, dev/test.
  2. Private Endpoints:
    • Private Service Connect (PSC): The modern way. Exposes the model as a service inside your VPC. No public internet exposure.
    • Private Service Access (VPC Peering): The older way. Messier networking. Avoid unless you have legacy reasons.
    • Good for: Internal enterprise apps, sensitive financial data, compliance-heavy workloads.
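
Deploying to a standard public endpoint looks like this. A sketch, assuming a model already sitting in the Registry (resource names, machine type, and the instance format are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint.create(display_name="churn-endpoint")
model = aiplatform.Model("projects/123/locations/us-central1/models/456")

model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    min_replica_count=1,  # always-on replicas cost money even at zero traffic
    max_replica_count=3,  # autoscaling ceiling
)

# The endpoint is now "just a URL," with an SDK wrapper around it.
prediction = endpoint.predict(instances=[{"tenure": 12, "plan": "pro"}])
print(prediction.predictions)
```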

The “Traffic Splitting” Superpower

This is the coolest feature, provided you are using Public Endpoints. (Private endpoints generally support only one deployed model per endpoint, a nasty gotcha.)

You can deploy two models to the same endpoint ID and tell Vertex AI:

  • “Send 90% of traffic to Model v1 (Old Faithful)”
  • “Send 10% of traffic to Model v2 (The New Hotness)”

This is Canary Deployment. If v2 starts throwing errors, you just flip the switch back to 100% v1. Zero downtime. If you aren’t doing this, you are deploying on hope.
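
The canary itself is a single parameter on the second deploy. A sketch, continuing from the endpoint above (the model ID is a placeholder):

```python
# v1 already serves 100% of traffic on this endpoint.
challenger = aiplatform.Model("projects/123/locations/us-central1/models/789")

challenger.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    traffic_percentage=10,  # v2 gets 10%; v1 keeps the remaining 90%
)

# If v2 misbehaves, undeploying it snaps traffic back to v1:
# endpoint.undeploy(deployed_model_id="<v2-deployed-model-id>")
```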

Modularizing Model Usage: Pipelines

If you are running your training by manually clicking “Run” in a notebook, you are doing it wrong.

Vertex AI Pipelines (based on Kubeflow) is how you modularize. You break your massive script into small, reusable Components:

  1. Data Ingestion Component: Pulls data from BigQuery.
  2. Preprocessing Component: Cleans the data.
  3. Training Component: Crunches the numbers.
  4. Evaluation Component: Checks if the model sucks.
  5. Deployment Component: Pushes to Registry and Endpoint if it passes.
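
In code, each component is a decorated Python function, and the pipeline wires their outputs together. A two-step sketch using the Kubeflow Pipelines SDK (kfp v2); the component bodies are stand-ins for the real work:

```python
from kfp import dsl, compiler

# Each @dsl.component runs in its own container image.
@dsl.component(base_image="python:3.11")
def ingest(table: str) -> str:
    # Stand-in for "pull from BigQuery"; returns a data URI.
    return f"gs://my-bucket/exports/{table}.parquet"

@dsl.component(base_image="python:3.11")
def train(data_uri: str) -> str:
    # Stand-in for real training; returns a model artifact URI.
    return data_uri.replace("exports", "models")

@dsl.pipeline(name="churn-training")
def churn_pipeline(table: str = "users"):
    data = ingest(table=table)
    train(data_uri=data.output)

if __name__ == "__main__":
    compiler.Compiler().compile(churn_pipeline, "churn_pipeline.json")
```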

Why bother?

  • Reusability: You write the “Pull from BigQuery” component once and reuse it in 50 pipelines.
  • Caching: If the data hasn’t changed, the pipeline skips steps 1 and 2 and goes straight to training. Saves money.
  • Sanity: If step 3 fails, you fix step 3. You don’t re-run the whole universe.
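
Compiling produces a JSON spec that you submit as a job; caching is controlled per run. A sketch (paths and names are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="churn-training",
    template_path="churn_pipeline.json",  # the compiled spec from above
    enable_caching=True,  # unchanged steps are skipped on re-runs
)
job.run()  # blocks until the pipeline finishes; use submit() to fire and forget
```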

The Bottom Line

Vertex AI is complex because production ML is complex.

  1. Use Experiments to make your messiness searchable.
  2. Use Registry to lock down your winners.
  3. Use Endpoints with traffic splitting to deploy without sweating bullets.
  4. Use Pipelines so you don’t have to manually babysit scripts at 3 AM.