nvargus crash and CaptureScheduler deadlock on Jetson — debug guide

Q: What is the CaptureScheduler deadlock in nvargus on Jetson?

The CaptureScheduler deadlock is a threading issue in libnvscf.so where the capture scheduling thread stops processing requests while holding an internal mutex. The process remains alive but stops delivering frames. It manifests as a camera pipeline that ran correctly for minutes or hours, then silently stopped. Restarting argus_daemon recovers it.

Q: How do I enable nvargus debug logs on Jetson?

Set LIBARGUS_ENABLE_LOGS=1 before launching your camera application or argus_daemon. For more detail, set LIBARGUS_ENABLE_LOGS=3. Logs go to stderr and syslog. The argus_daemon logs are also accessible via journalctl -u argus-daemon.

Q: How do I tell the difference between an nvargus crash and a deadlock?

A crash produces a core dump or SIGABRT and the process exits — your pipeline will error out immediately. A deadlock leaves the argus_daemon process running but stops delivering frames — the pipeline hangs with no error. Check with ps aux to see if argus_daemon is still alive after frames stop.

Q: What is the libnvscf.so regression in JetPack 6?

Several JetPack 6.x builds have a CaptureScheduler threading bug in libnvscf.so that causes random deadlocks under sustained multi-session or high-framerate capture workloads. NVIDIA issued patches for specific JP 6.x subversions. The fix is to update to the latest JP 6.2.x point release or apply the patched libnvscf.so from NVIDIA support.

nvargus failures fall into two distinct categories with different debugging paths: crashes (the process exits) and deadlocks (the process hangs). Confusing the two wastes significant debugging time. This guide covers both failure modes, how to diagnose them, and the known JetPack 6 libnvscf regression that causes random CaptureScheduler deadlocks in production.

Key Insights

Crash vs deadlock is the first fork — check if argus_daemon is still in ps aux when frames stop; alive = deadlock, absent = crash
LIBARGUS_ENABLE_LOGS=1 is your first tool — without logs, crashes and deadlocks produce no useful output
The libnvscf CaptureScheduler bug affects JP 6.0–6.2.1 — upgrading to JP 6.2.2 or applying the patch fixes most random production deadlocks
Restarting argus_daemon recovers a deadlock — build a watchdog into production systems that restarts the daemon when no frames arrive for N seconds
strace -f -p on the deadlocked process shows which syscall each thread is blocked in, which distinguishes kernel-level blocking from a userspace mutex wait
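
The first fork above can be written down as a small decision function. This is only a sketch: the process name argus_daemon follows this guide, the 30-second stall threshold is an illustrative assumption, and both function names are hypothetical helpers rather than part of any NVIDIA tool.

```python
import subprocess


def daemon_is_running(process_name: str = "argus_daemon") -> bool:
    """Return True if pgrep finds a process with exactly this name."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify_failure(daemon_alive: bool, seconds_since_last_frame: float,
                     stall_threshold: float = 30.0) -> str:
    """Apply the crash-vs-deadlock fork described in this guide."""
    if not daemon_alive:
        return "crash"       # process exited: look for core dumps and dmesg
    if seconds_since_last_frame > stall_threshold:
        return "deadlock"    # process alive but silent: use strace and gdb
    return "healthy"
```

In a monitoring loop this would be called as classify_failure(daemon_is_running(), time.time() - last_frame_time), where last_frame_time is whatever timestamp your application records for its most recent frame.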

Enabling nvargus logs

# Minimum: capture errors and warnings
export LIBARGUS_ENABLE_LOGS=1

# Verbose: capture scheduling, session events, internal state
export LIBARGUS_ENABLE_LOGS=3

# Logs go to stderr and syslog
# For the argus_daemon service:
sudo systemctl edit argus-daemon
# Add:
# [Service]
# Environment="LIBARGUS_ENABLE_LOGS=3"
sudo systemctl restart argus-daemon

# View daemon logs
journalctl -u argus-daemon -f
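
When the camera application is launched from a supervisor script rather than an interactive shell, the same variable can be set per process. The sketch below assumes only what this guide states (LIBARGUS_ENABLE_LOGS with levels 1 and 3, output on stderr); the helper names and the log path argument are hypothetical.

```python
import os
import subprocess


def build_log_env(level: int, base_env=None) -> dict:
    """Return a copy of the environment with LIBARGUS_ENABLE_LOGS set."""
    if level not in (1, 3):
        raise ValueError("this guide only describes levels 1 and 3")
    env = dict(os.environ if base_env is None else base_env)
    env["LIBARGUS_ENABLE_LOGS"] = str(level)
    return env


def run_with_logs(cmd, log_path, level=3):
    """Run cmd with logging enabled, appending its stderr to log_path."""
    with open(log_path, "ab") as log_file:
        completed = subprocess.run(cmd, env=build_log_env(level), stderr=log_file)
    return completed.returncode
```

Copying the environment instead of mutating os.environ keeps the verbose level scoped to the one child process, so other services started by the same supervisor are unaffected.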

Diagnosing: crash vs deadlock

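# Step 1: check if argus_daemon is alive
# (the [a] keeps grep from matching its own command line)
# On stock JetPack images the daemon is typically named nvargus-daemon;
# substitute that name here and below if argus_daemon is not found
ps aux | grep '[a]rgus_daemon'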

# Step 2a: if process is GONE → crash
# Check for core dump
ls -la /tmp/core* /var/core* 2>/dev/null
# Check dmesg for segfaults and OOM kills
# (-i because the kernel logs "Killed process" with a capital K)
dmesg | grep -iE "argus|nvargus|segfault|killed" | tail -20
# Check syslog
journalctl -u argus-daemon --since "5 minutes ago"

# Step 2b: if process is ALIVE but no frames → deadlock
# Check which syscall each thread is blocked in
# (-f follows every thread of the process; timeout detaches after 10 s)
sudo timeout 10 strace -f -p "$(pidof argus_daemon)" 2>&1 | head -30
# If a thread sits in futex(..., FUTEX_WAIT...) and never returns → blocked on a userspace lock
# If threads sit in ppoll/epoll_pwait → waiting on hardware or IPC
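
Reading interleaved strace -f output by eye gets tedious once the daemon has many threads. The following is a rough sketch of a parser that tracks, per thread, the last syscall that was entered and never returned. It assumes the usual strace line shapes ("[pid N] name(args <unfinished ...>" and "<... name resumed>"); the function names are hypothetical. Idle worker threads also park in futex during normal operation, so treat the result as a hint about where to point gdb, not as proof of a deadlock.

```python
import re
from collections import Counter

# Optional "[pid N]" prefix, then the rest of the line.
_LINE_RE = re.compile(r"^(?:\[pid\s+(\d+)\]\s+)?(.*)$")
# A syscall entry starts with the syscall name and an opening parenthesis.
_ENTRY_RE = re.compile(r"^([a-z0-9_]+)\(")


def blocked_syscalls(lines):
    """Map each thread to the syscall it entered and has not returned from."""
    state = {}
    for raw in lines:
        pid, rest = _LINE_RE.match(raw.strip()).groups()
        key = pid or "main"
        entry = _ENTRY_RE.match(rest)
        if entry and ") = " not in rest:
            state[key] = entry.group(1)   # entered, no return value yet
        elif entry or rest.startswith("<... "):
            state.pop(key, None)          # call completed or was resumed
    return state


def summarize(state):
    """Count still-blocked threads per syscall name."""
    return Counter(state.values())
```

Feed it the captured lines, for example summarize(blocked_syscalls(open("strace.txt"))), where strace.txt is a hypothetical file holding the strace output.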

CaptureScheduler deadlock — the libnvscf regression

The most common cause of random deadlocks in JP 6.x production systems is a bug in libnvscf.so. The CaptureScheduler thread acquires a mutex for each capture request and can enter a state where the mutex is never released.

Identifying it:

# Check libnvscf version
strings /usr/lib/aarch64-linux-gnu/libnvscf.so | grep -E "version|build|r36"

# Attach gdb to the running daemon and get thread backtraces
# (-B2 keeps the frames above the match, which is where the lock call appears)
sudo gdb -p "$(pidof argus_daemon)" -batch \
    -ex "thread apply all bt" \
    -ex "quit" 2>&1 | grep -B2 -A5 "CaptureScheduler"

# Deadlocked output will show:
# #0  0x... in __pthread_mutex_lock
# #1  0x... in CaptureScheduler::...
# #2  0x... in libargus::...
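
If you save the full "thread apply all bt" output to a file, the pattern above can be searched for mechanically. This is a minimal sketch that assumes gdb's standard backtrace layout ("Thread N (...)" headers followed by "#k ... in function ()" frames). The lock function names in LOCK_HINTS are common glibc symbols chosen as an assumption, and the helper names are hypothetical. If libnvscf.so is stripped on your build, the CaptureScheduler symbol will not appear and this finds nothing.

```python
import re

_THREAD_RE = re.compile(r"^Thread (\d+) \(")
_FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)")
LOCK_HINTS = ("pthread_mutex_lock", "__lll_lock_wait", "futex_wait")


def split_threads(gdb_output):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads, current = {}, None
    for line in gdb_output.splitlines():
        line = line.strip()
        thread = _THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            threads[current] = []
            continue
        frame = _FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def scheduler_lock_waiters(gdb_output, symbol="CaptureScheduler"):
    """Thread numbers waiting on a lock with a `symbol` frame in their stack."""
    waiters = []
    for number, frames in split_threads(gdb_output).items():
        # Only the innermost few frames say what the thread is doing right now.
        in_lock = any(hint in name for name in frames[:3] for hint in LOCK_HINTS)
        touches_symbol = any(symbol in name for name in frames)
        if in_lock and touches_symbol:
            waiters.append(number)
    return sorted(waiters)
```

A non-empty result that stays the same across two dumps taken a few seconds apart is a stronger signal than a single snapshot, since a healthy thread can be caught briefly inside a lock call.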

Workarounds:

Upgrade to JP 6.2.2 — the latest point release has the libnvscf patch
Apply patched library — NVIDIA support can provide a patched libnvscf.so for specific JP subversions
Watchdog restart — implement a frame watchdog that restarts argus_daemon on deadlock:

#!/bin/bash
# /usr/local/bin/argus-watchdog.sh
# Restart argus_daemon if no frames for 30 seconds

LAST_FRAME_TIME=$(date +%s)
LAST_COUNT=""  # last frame count seen; empty until the first read
FRAME_COUNT_FILE=/tmp/.argus_frame_count

while true; do
    # Your application writes frame count to this file
    CURRENT_COUNT=$(cat "$FRAME_COUNT_FILE" 2>/dev/null || echo 0)

    if [[ "$CURRENT_COUNT" != "$LAST_COUNT" ]]; then
        LAST_COUNT=$CURRENT_COUNT
        LAST_FRAME_TIME=$(date +%s)
    fi

    NOW=$(date +%s)
    ELAPSED=$((NOW - LAST_FRAME_TIME))

    if [[ $ELAPSED -gt 30 ]]; then
        logger "argus-watchdog: no frames for ${ELAPSED}s, restarting argus_daemon"
        systemctl restart argus-daemon
        LAST_FRAME_TIME=$(date +%s)
        sleep 5  # wait for daemon to restart
    fi

    sleep 1
done

# /etc/systemd/system/argus-watchdog.service
[Unit]
Description=Argus daemon watchdog
After=argus-daemon.service

[Service]
ExecStart=/usr/local/bin/argus-watchdog.sh
Restart=always

[Install]
WantedBy=multi-user.target
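
The watchdog script relies on the application updating /tmp/.argus_frame_count, which the guide leaves to the reader. One way to do that from a Python capture loop is sketched below. The function name is hypothetical, and the write-to-temp-then-rename approach is an assumption chosen so the watchdog never reads a half-written file.

```python
import os
import tempfile

FRAME_COUNT_FILE = "/tmp/.argus_frame_count"  # path read by argus-watchdog.sh


def write_frame_count(count: int, path: str = FRAME_COUNT_FILE) -> None:
    """Atomically replace `path` with the current frame count."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".frame_count.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(f"{count}\n")
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling this once per second with the running total is enough for a 30-second watchdog and avoids a file write per frame. Files created by mkstemp are readable only by their owner, which is fine when the watchdog runs as root as in the unit above.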

Debugging crashes (process exits)

Session destruction crash

Common cause: destroying an ICaptureSession while a capture request is still in flight.

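// WRONG: race condition
// (captureSession is assumed to be an Argus::UniqueObj<Argus::CaptureSession>)
captureSession.reset();  // destroys the session; crashes if a request is still in flight

// CORRECT: wait for all requests to complete
Argus::ICaptureSession *iCaptureSession =
    Argus::interface_cast<Argus::ICaptureSession>(captureSession);
iCaptureSession->stopRepeat();
iCaptureSession->waitForIdle();  // blocks until all in-flight requests complete
captureSession.reset();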

Sensor mode mismatch crash

The process aborts with SIGABRT if the requested sensor mode does not match a mode defined in the device tree:

# List available sensor modes
v4l2-ctl -d /dev/video0 --list-formats-ext

# The Argus sensor mode index must correspond to a valid V4L2 format
# Mode 0 = first entry in sensor_modes in device tree

// Always validate the mode index against the modes the driver actually reports
std::vector<Argus::SensorMode*> sensorModes;
if (iCameraProperties->getAllSensorModes(&sensorModes) != Argus::STATUS_OK) {
    return false;
}

if (modeIndex >= sensorModes.size()) {
    // Handle invalid mode; don't crash
    return false;
}
Argus::SensorMode *sensorMode = sensorModes[modeIndex];

Memory leak leading to OOM crash

Long-running capture sessions accumulate frame buffers if EGLStream consumers don’t consume frames promptly:

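// In a GStreamer pipeline, note that timeout on nvarguscamerasrc is the capture
// duration in seconds; 0 (the default) means run indefinitely. It does not bound
// buffering. To keep a slow consumer from accumulating frames, put a leaky queue
// after the source, for example:
// nvarguscamerasrc sensor-id=0 ! queue leaky=downstream max-size-buffers=2 ! ...

// For custom libargus: destroy each acquired Frame promptly
// (acquireFrame returns a Frame*; UniqueObj destroys it on reset or scope exit)
Argus::UniqueObj<EGLStream::Frame> frame(iFrameConsumer->acquireFrame(timeout));
// ... process frame ...
frame.reset();  // returns the buffer to the stream; MUST happen for every frame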
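
A leak of this kind is easier to catch before the OOM killer fires by sampling the resident memory of the suspect process over time. The sketch below reads VmRSS from /proc/<pid>/status, which is a standard Linux interface; the helper names and the 50 MB growth threshold are illustrative assumptions. Buffers held in driver or GPU memory may not be counted in RSS, so a flat curve here does not rule a leak out.

```python
import re

_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of `pid` in kB."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())


def is_growing(samples_kb, min_growth_kb=50_000):
    """True if RSS never decreased across samples and rose by min_growth_kb."""
    if len(samples_kb) < 2:
        return False
    never_fell = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_fell and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

Sampling read_rss_kb once a minute and passing the last ten or so samples to is_growing gives a coarse early warning that can be logged alongside the frame watchdog.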

GStreamer nvarguscamerasrc reset on deadlock

For GStreamer-based pipelines, the cleanest recovery is to tear down the entire pipeline, restart the daemon, and bring the pipeline back up, rather than restarting the daemon alone:

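import subprocess
import threading
import time

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def camera_watchdog(pipeline, appsink, timeout_sec=30):
    """Monitor pipeline, restart if stalled.

    Assumes Gst.init(None) has been called and that appsink has
    emit-signals=true so that 'new-sample' fires.
    """
    last_good = time.time()

    def on_new_sample(sink):
        nonlocal last_good
        last_good = time.time()
        sample = sink.emit('pull-sample')
        # process sample
        return Gst.FlowReturn.OK

    def watchdog_thread():
        # Without nonlocal, the assignment below would make last_good local
        # and the comparison would raise UnboundLocalError.
        nonlocal last_good
        while True:
            time.sleep(5)
            if time.time() - last_good > timeout_sec:
                pipeline.set_state(Gst.State.NULL)
                # Restarting the service requires root privileges
                subprocess.run(['systemctl', 'restart', 'argus-daemon'])
                time.sleep(2)
                pipeline.set_state(Gst.State.PLAYING)
                last_good = time.time()

    # Hook the frame callback up so last_good actually advances
    appsink.connect('new-sample', on_new_sample)
    t = threading.Thread(target=watchdog_thread, daemon=True)
    t.start()
    return t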

For the V4L2 capture path that serves as a fallback when Argus deadlocks, see the fault isolation pattern in MIPI CSI camera driver setup on Jetson. For the gotcha that explains why v4l2-ctl succeeds but argus_camera fails, see Jetson camera works with v4l2-ctl but fails with argus_camera.

FAQ

What is the CaptureScheduler deadlock in nvargus on Jetson?

A threading bug in libnvscf.so where the capture scheduler thread acquires a mutex and never releases it. The argus_daemon process stays alive but stops delivering frames. Most prevalent in JP 6.0–6.2.1. Fix: upgrade to JP 6.2.2 or apply the patched libnvscf.so.

How do I enable nvargus debug logs on Jetson?

Set LIBARGUS_ENABLE_LOGS=1 (or 3 for verbose) before launching your application or argus_daemon. Logs go to stderr and syslog, readable via journalctl -u argus-daemon.

How do I tell the difference between an nvargus crash and a deadlock?

If argus_daemon is gone from ps aux → crash. If it is still alive but no frames arrive → deadlock. Use strace -f -p $(pidof argus_daemon) to see which syscall each thread of the deadlocked process is blocked in.

What is the libnvscf.so regression in JetPack 6?

A CaptureScheduler threading bug affecting JP 6.0–6.2.1 that causes random deadlocks under sustained capture. Fixed in JP 6.2.2. NVIDIA support can also provide a patched libnvscf.so for earlier releases.