nvargus crash and CaptureScheduler deadlock on Jetson — debug guide
nvargus failures fall into two distinct categories with different debugging paths: crashes (the process exits) and deadlocks (the process hangs). Confusing the two wastes significant debugging time. This guide covers both failure modes, how to diagnose them, and the known JetPack 6 libnvscf regression that causes random CaptureScheduler deadlocks in production.
Key Insights
- Crash vs deadlock is the first fork: check whether argus_daemon is still in ps aux when frames stop; alive = deadlock, absent = crash
- LIBARGUS_ENABLE_LOGS=1 is your first tool: without logs, crashes and deadlocks produce no useful output
- The libnvscf CaptureScheduler bug affects JP 6.0–6.2.1: upgrading to JP 6.2.2 or applying the patch fixes most random production deadlocks
- Restarting argus_daemon recovers a deadlock: build a watchdog into production systems that restarts the daemon when no frames arrive for N seconds
- strace -p on the deadlocked process reveals which syscall it is stuck on: this distinguishes kernel-level blocking from a userspace mutex wait
Enabling nvargus logs
# Minimum: capture errors and warnings
export LIBARGUS_ENABLE_LOGS=1
# Verbose: capture scheduling, session events, internal state
export LIBARGUS_ENABLE_LOGS=3
# Logs go to stderr and syslog
# For the argus_daemon service:
sudo systemctl edit argus-daemon
# Add:
# [Service]
# Environment="LIBARGUS_ENABLE_LOGS=3"
sudo systemctl restart argus-daemon
# View daemon logs
journalctl -u argus-daemon -f
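When the camera application is started by a Python supervisor rather than from a shell, the variable can be set for that one process instead of being exported globally. A minimal sketch: the helper names `argus_log_env` and `launch_with_logs` are hypothetical, and the log levels are the ones described above.

```python
import os
import subprocess


def argus_log_env(level=3, base_env=None):
    """Return a copy of the environment with LIBARGUS_ENABLE_LOGS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBARGUS_ENABLE_LOGS"] = str(level)
    return env


def launch_with_logs(cmd, log_path, level=3):
    """Start cmd with Argus logging enabled and append its stderr to log_path."""
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(cmd, env=argus_log_env(level), stderr=log_file)
    log_file.close()  # the child keeps its own copy of the descriptor
    return proc
```

Keeping stderr in a file per launch makes it easier to line up the application's log with the daemon's journal after a failure.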
Diagnosing: crash vs deadlock
# Step 1: check if argus_daemon is alive
ps aux | grep argus_daemon
# Step 2a: if process is GONE → crash
# Check for core dump
ls -la /tmp/core* /var/core* 2>/dev/null
# Check dmesg for SIGABRT / segfault
dmesg | grep -E "argus|nvargus|segfault|killed" | tail -20
# Check syslog
journalctl -u argus-daemon --since "5 minutes ago"
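# Step 2b: if process is ALIVE but no frames → deadlock
# Check which syscalls the daemon's threads are blocked in
# (-f attaches to every thread; futex is the syscall behind pthread mutex waits)
sudo timeout 10 strace -f -p $(pidof argus_daemon) -e trace=futex,ppoll,epoll_pwait 2>&1 | head -30
# If a thread sits in futex(..., FUTEX_WAIT...) and never returns → mutex deadlock
# If threads sit in ppoll/epoll_pwait (how poll/epoll_wait appear on aarch64) → waiting on hardware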
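The same fork can be automated inside a monitoring script. The sketch below is an assumption about how one might wire it up, not part of any NVIDIA tooling: `daemon_pids` shells out to `pidof`, and `classify_failure` is a hypothetical helper that combines "is the daemon alive" with "how long since the last frame".

```python
import subprocess


def daemon_pids(name="argus_daemon"):
    """Return the PIDs that pidof reports for the given process name."""
    result = subprocess.run(["pidof", name], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def classify_failure(daemon_alive, seconds_since_frame, stall_after=30.0):
    """Map the two observations onto the crash/deadlock fork."""
    if not daemon_alive:
        return "crash"       # process is gone: look for core dumps and dmesg
    if seconds_since_frame >= stall_after:
        return "deadlock"    # process alive but silent: attach strace/gdb
    return "healthy"
```

A caller would typically pass `bool(daemon_pids())` as the first argument and its own frame timestamp delta as the second.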
CaptureScheduler deadlock — the libnvscf regression
The most common cause of random deadlocks in JP 6.x production systems is a bug in libnvscf.so. The CaptureScheduler thread acquires a mutex for each capture request and can enter a state where the mutex is never released.
Identifying it:
# Check libnvscf version
strings /usr/lib/aarch64-linux-gnu/libnvscf.so | grep -E "version|build|r36"
# Attach gdb to the running daemon and get thread backtraces
sudo gdb -p $(pidof argus_daemon) -batch \
-ex "thread apply all bt" \
-ex "quit" 2>&1 | grep -A5 "CaptureScheduler"
# Deadlocked output will show:
# #0 0x... in __pthread_mutex_lock
# #1 0x... in CaptureScheduler::...
# #2 0x... in libargus::...
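Reading a long `thread apply all bt` dump by eye is error-prone, so the pattern above can be checked mechanically. This is a rough sketch with a hypothetical function name; it assumes gdb's usual layout (a `Thread N (...)` header followed by `#k` frame lines) and simply looks for a stack that contains both a `pthread_mutex_lock` frame and a `CaptureScheduler` frame.

```python
import re

THREAD_HEADER = re.compile(r"^Thread (\d+) ")


def blocked_scheduler_threads(gdb_output):
    """Return gdb thread numbers whose stack has a mutex wait and a CaptureScheduler frame."""
    blocked = []
    current, frames = None, []

    def flush():
        # Evaluate the stack collected for the thread we just finished reading.
        text = "\n".join(frames)
        if current is not None and "pthread_mutex_lock" in text and "CaptureScheduler" in text:
            blocked.append(current)

    for line in gdb_output.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            flush()
            current, frames = int(match.group(1)), []
        elif line.lstrip().startswith("#"):
            frames.append(line)
    flush()
    return blocked
```

Feed it the text produced by the gdb command above, for example captured with `subprocess.run(..., capture_output=True, text=True).stdout`. A non-empty result is a hint, not proof; confirm by checking that the same thread is still blocked a few seconds later.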
Workarounds:
- Upgrade to JP 6.2.2: the latest point release has the libnvscf patch
- Apply patched library: NVIDIA support can provide a patched libnvscf.so for specific JP subversions
- Watchdog restart: implement a frame watchdog that restarts argus_daemon on deadlock, for example:
#!/bin/bash
# /usr/local/bin/argus-watchdog.sh
# Restart argus_daemon if no frames for 30 seconds
FRAME_COUNT_FILE=/tmp/.argus_frame_count
LAST_COUNT=""                  # last frame count seen
LAST_FRAME_TIME=$(date +%s)    # when the count last changed
while true; do
    # Your application writes frame count to this file
    CURRENT_COUNT=$(cat "$FRAME_COUNT_FILE" 2>/dev/null || echo 0)
    if [[ "$CURRENT_COUNT" != "$LAST_COUNT" ]]; then
        LAST_COUNT=$CURRENT_COUNT
        LAST_FRAME_TIME=$(date +%s)
    fi
    NOW=$(date +%s)
    ELAPSED=$((NOW - LAST_FRAME_TIME))
    if [[ $ELAPSED -gt 30 ]]; then
        logger "argus-watchdog: no frames for ${ELAPSED}s, restarting argus_daemon"
        systemctl restart argus-daemon
        LAST_FRAME_TIME=$(date +%s)
        sleep 5  # wait for daemon to restart
    fi
    sleep 1
done
# /etc/systemd/system/argus-watchdog.service
[Unit]
Description=Argus daemon watchdog
After=argus-daemon.service
[Service]
ExecStart=/usr/local/bin/argus-watchdog.sh
Restart=always
[Install]
WantedBy=multi-user.target
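The watchdog script relies on the application writing its frame count to /tmp/.argus_frame_count. One way to do that from a Python capture loop is sketched below; `write_frame_count` is a hypothetical helper, and the write-then-rename step is there so the watchdog never reads a half-written file.

```python
import os
import tempfile


def write_frame_count(count, path="/tmp/.argus_frame_count"):
    """Atomically publish the latest frame count for the watchdog to read."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".frame_count_")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(f"{count}\n")
        os.replace(tmp_path, path)  # rename is atomic within one filesystem
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temp files behind
        raise
```

Calling it once per second rather than once per frame is usually enough, since the watchdog only polls at that rate.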
Debugging crashes (process exits)
Session destruction crash
Common cause: destroying a CaptureSession while a capture request is still in flight.
// WRONG: race condition
captureSession.reset(); // destroys the session; crashes if a request is still in flight
// CORRECT: stop and drain before destroying
// (captureSession is an Argus::UniqueObj<Argus::CaptureSession>)
Argus::ICaptureSession *iSession =
    Argus::interface_cast<Argus::ICaptureSession>(captureSession);
iSession->stopRepeat();
iSession->waitForIdle(); // blocks until all in-flight requests complete
captureSession.reset();
Sensor mode mismatch crash
The process aborts with SIGABRT if the requested sensor mode does not match one of the modes defined in the device tree:
# List available sensor modes
v4l2-ctl -d /dev/video0 --list-formats-ext
# The Argus sensor mode index must correspond to a valid V4L2 format
# Mode 0 = first entry in sensor_modes in device tree
// Always validate mode index against available modes
std::vector<Argus::SensorMode*> sensorModes;
if (iCameraProperties->getAllSensorModes(&sensorModes) != Argus::STATUS_OK ||
    modeIndex >= sensorModes.size()) {
    // Handle invalid mode; don't crash
    return false;
}
Argus::SensorMode *sensorMode = sensorModes[modeIndex];
Memory leak leading to OOM crash
Long-running capture sessions accumulate frame buffers if EGLStream consumers don’t consume frames promptly:
// In a GStreamer nvarguscamerasrc pipeline, timeout=0 (the default) only means
// "capture indefinitely"; it does not bound buffering. Bound the queue downstream:
// nvarguscamerasrc sensor-id=0 ! ... ! queue leaky=downstream max-size-buffers=4 ! ...
// For custom libargus: release frames promptly
Argus::UniqueObj<EGLStream::Frame> frame(iFrameConsumer->acquireFrame(timeout));
// ... process frame ...
frame.reset(); // destroying the Frame returns its buffer to the stream; not automatic until it goes out of scope
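A slow leak is easier to catch by sampling memory than by waiting for the OOM killer. The sketch below reads `VmRSS` from `/proc/<pid>/status`, which is standard on Linux; the function names and the 50 MB growth threshold are assumptions to tune for your workload.

```python
def parse_vm_rss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process in kB."""
    with open(f"/proc/{pid}/status") as status:
        return parse_vm_rss_kb(status.read())


def is_leaking(samples_kb, min_growth_kb=50_000):
    """Flag steady growth: no sample lower than the previous one, and total growth above the threshold."""
    if len(samples_kb) < 2:
        return False
    rising = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return rising and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

Sampling both the capture application and the daemon every minute or so, and alerting when `is_leaking` stays true across a long window, gives warning well before the kernel starts killing processes.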
GStreamer nvarguscamerasrc reset on deadlock
For GStreamer-based pipelines, the cleanest recovery is restarting the entire pipeline rather than just the daemon:
import subprocess
import threading
import time

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst


def camera_watchdog(pipeline, appsink, timeout_sec=30):
    """Monitor pipeline; restart it and argus_daemon if frames stall."""
    last_good = time.time()

    def on_new_sample(sink):
        nonlocal last_good
        last_good = time.time()
        sample = sink.emit('pull-sample')
        # process sample
        return Gst.FlowReturn.OK

    def watchdog_thread():
        nonlocal last_good  # without this, the assignment below makes last_good local
        while True:
            time.sleep(5)
            if time.time() - last_good > timeout_sec:
                pipeline.set_state(Gst.State.NULL)
                subprocess.run(['systemctl', 'restart', 'argus-daemon'])
                time.sleep(2)
                pipeline.set_state(Gst.State.PLAYING)
                last_good = time.time()

    # The appsink needs emit-signals=true for new-sample to fire
    appsink.connect('new-sample', on_new_sample)
    t = threading.Thread(target=watchdog_thread, daemon=True)
    t.start()
    return t
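In the watchdog above, `last_good` is written from the GStreamer streaming thread and read from the watchdog thread, and it uses wall-clock time. A slightly more defensive variant, sketched here with a hypothetical `StallDetector` class, guards the timestamp with a lock and uses `time.monotonic`, so a clock step (for example an NTP sync after boot) does not trigger or mask a restart.

```python
import threading
import time


class StallDetector:
    """Thread-safe record of when the last frame arrived."""

    def __init__(self, timeout_sec=30.0, clock=time.monotonic):
        self._timeout = timeout_sec
        self._clock = clock          # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._last_good = clock()

    def mark_frame(self):
        """Call from the new-sample callback each time a frame arrives."""
        with self._lock:
            self._last_good = self._clock()

    def stalled(self):
        """Return True when no frame has arrived within the timeout."""
        with self._lock:
            return self._clock() - self._last_good > self._timeout
```

The callback would call `mark_frame()` and the watchdog loop would poll `stalled()`, calling `mark_frame()` again after a restart to reset the timer.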
For the V4L2 capture path that serves as a fallback when Argus deadlocks, see the fault isolation pattern in MIPI CSI camera driver setup on Jetson. For the gotcha that explains why v4l2-ctl succeeds but argus_camera fails, see Jetson camera works with v4l2-ctl but fails with argus_camera.
FAQ
What is the CaptureScheduler deadlock in nvargus on Jetson?
A threading bug in libnvscf.so where the capture scheduler thread acquires a mutex and never releases it. The argus_daemon process stays alive but stops delivering frames. Most prevalent in JP 6.0–6.2.1. Fix: upgrade to JP 6.2.2 or apply the patched libnvscf.so.
How do I enable nvargus debug logs on Jetson?
Set LIBARGUS_ENABLE_LOGS=1 (or 3 for verbose) before launching your application or argus_daemon. Logs go to stderr and syslog, readable via journalctl -u argus-daemon.
How do I tell the difference between an nvargus crash and a deadlock?
If argus_daemon is gone from ps aux → crash. If it is still alive but no frames arrive → deadlock. Use strace -p $(pidof argus_daemon) to see what syscall the deadlocked process is stuck on.
What is the libnvscf.so regression in JetPack 6?
A CaptureScheduler threading bug affecting JP 6.0–6.2.1 that causes random deadlocks under sustained capture. Fixed in JP 6.2.2. NVIDIA support can also provide a patched libnvscf.so for earlier releases.
Frequently Asked Questions
What is the CaptureScheduler deadlock in nvargus on Jetson?
The CaptureScheduler deadlock is a threading issue in libnvscf.so where the capture scheduling thread stops processing requests while holding an internal mutex. The process remains alive but stops delivering frames. It manifests as a camera pipeline that ran correctly for minutes or hours, then silently stopped. Restarting argus_daemon recovers it.
How do I enable nvargus debug logs on Jetson?
Set LIBARGUS_ENABLE_LOGS=1 before launching your camera application or argus_daemon. For more detail, set LIBARGUS_ENABLE_LOGS=3. Logs go to stderr and syslog. The argus_daemon logs are also accessible via journalctl -u argus-daemon.
How do I tell the difference between an nvargus crash and a deadlock?
A crash produces a core dump or SIGABRT and the process exits — your pipeline will error out immediately. A deadlock leaves the argus_daemon process running but stops delivering frames — the pipeline hangs with no error. Check with ps aux to see if argus_daemon is still alive after frames stop.
What is the libnvscf.so regression in JetPack 6?
Several JetPack 6.x builds have a CaptureScheduler threading bug in libnvscf.so that causes random deadlocks under sustained multi-session or high-framerate capture workloads. NVIDIA issued patches for specific JP 6.x subversions. The fix is to update to the latest JP 6.2.x point release or apply the patched libnvscf.so from NVIDIA support.
Written by
Andrés Campos, Co-Founder & CTO · ProventusNova
8 years deep in embedded systems, from underwater ROVs to edge AI. Andrés leads every technical delivery personally.