nvvidconv performance collapse with multiple GStreamer processes on Jetson
Key Insights
- Multiple GStreamer processes each using nvvidconv on Jetson serialize on the VIC (Video Image Compositor) hardware; throughput can drop 5–10x
- VIC has a fixed number of concurrent contexts; when all are in use, other processes queue and wait
- The fix is architectural: run all pipelines in one GStreamer process, not multiple separate processes
- Keep every buffer in NVMM memory throughout the pipeline — system memory buffers add two DMA copies per frame
- On headless Jetson, DRM loading can also cause nvvidconv to use a slower path; disable DRM or run jetson_clocks on boot
What’s actually happening inside nvvidconv
nvvidconv is not a pure software color converter. It uses VIC — the Video Image Compositor hardware unit on Jetson — for color space conversion and image scaling. VIC is a fixed-function hardware block separate from both the GPU and the CPU.
VIC has a fixed number of concurrent execution contexts. On Orin-class hardware, this is small — typically 2 to 4 contexts depending on the variant. Each GStreamer process that calls nvvidconv acquires a VIC context when it processes a buffer.
When you run two processes, each with an nvvidconv element handling 30 fps video, here’s what happens:
- Process A acquires VIC context, submits frame 0
- Process B tries to acquire VIC context, waits (all contexts in use)
- Process A completes, releases context
- Process B acquires context, submits its frame 0
- Process A tries to acquire for frame 1, waits again
Both pipelines end up running at roughly half the speed they’d achieve alone. With three processes it’s worse. The nvvidconv elements in each process appear to work — no errors, no log messages — but frames queue up and the pipeline stalls.
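The round-robin above can be reduced to simple fair-share arithmetic. The VIC throughput figure in this sketch is a made-up illustration, not a measured number:

```python
def effective_fps(vic_max_fps, n_processes, target_fps):
    # Each process gets an equal time slice of the single shared VIC context
    fair_share = vic_max_fps / n_processes
    return min(target_fps, fair_share)

# Hypothetical VIC that can convert ~40 frames/s at this resolution:
print(effective_fps(40, 1, 30))  # 30.0 -- one pipeline alone keeps up
print(effective_fps(40, 2, 30))  # 20.0 -- each of two processes stalls
print(effective_fps(40, 3, 30))  # ~13.3 each with three processes
```

The model ignores context-switch overhead on the VIC, so real numbers are typically worse than the fair share it predicts.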
The forum thread that surfaced this problem had a 9-reply discussion because the symptoms look like a hardware limitation (“Jetson can’t handle this many streams”) when it’s actually a process architecture problem.
How to confirm VIC contention
tegrastats is the tool. Run it while your pipelines are active:
sudo tegrastats --interval 500
Look for the VIC% field in the output:
RAM 2048/7764MB ... VIC 98% ...
If VIC% is near 100% and you have multiple processes, you’re hitting contention. Each process’s nvvidconv is queuing up behind the others. GPU% and CPU% may look fine — this is exclusively a VIC bottleneck.
For comparison, run a single pipeline alone and check VIC%. If it’s 40% for one pipeline and 98% with two pipelines, the math confirms serialization.
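If you log tegrastats output to a file, a small parser can trend VIC% over time. The regex below matches the `VIC 98%` field format shown above; some JetPack releases append a clock suffix (e.g. `VIC 42%@729`), which it also handles, but treat it as a sketch rather than a complete parser:

```python
import re

def vic_percent(tegrastats_line):
    # Matches fields like "VIC 98%" or "VIC 42%@729" in a tegrastats line
    m = re.search(r"\bVIC\s+(\d+)%", tegrastats_line)
    return int(m.group(1)) if m else None

print(vic_percent("RAM 2048/7764MB ... VIC 98% ..."))  # 98
print(vic_percent("RAM 2048/7764MB ..."))              # None (field absent)
```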
You can also use tegrastats --verbose on some JetPack versions for per-engine breakdowns.
The architectural fix: single process, multiple pipelines
The right architecture for multi-camera or multi-stream processing on Jetson is one GStreamer process running multiple pipelines, not multiple separate processes.
Before (broken architecture):
# Process 1
gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! nvvidconv ! appsink &
# Process 2
gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! nvvidconv ! appsink &
Each process holds its own VIC context. They serialize.
After (correct architecture):
# Single process: nvstreammux batches both streams for the tiler
gst-launch-1.0 \
nvstreammux name=mux batch-size=2 width=1920 height=1080 ! \
nvmultistreamtiler rows=1 columns=2 width=3840 height=1080 ! \
nvvidconv ! nvoverlaysink \
nvarguscamerasrc sensor-id=0 ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_0 \
nvarguscamerasrc sensor-id=1 ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_1
Or, in application code, use the GStreamer API to create multiple GstPipeline objects within a single process, all driven by one shared GLib main loop:
gst_init(&argc, &argv);
// Two independent pipelines in one process; they share VIC contexts
GstElement *pipeline1 = gst_parse_launch(
    "nvarguscamerasrc sensor-id=0 ! nvvidconv ! fakesink", NULL);
GstElement *pipeline2 = gst_parse_launch(
    "nvarguscamerasrc sensor-id=1 ! nvvidconv ! fakesink", NULL);
gst_element_set_state(pipeline1, GST_STATE_PLAYING);
gst_element_set_state(pipeline2, GST_STATE_PLAYING);
GMainLoop *loop = g_main_loop_new(NULL, FALSE);
g_main_loop_run(loop);
Within a single process, VIC contexts are shared efficiently across pipelines. The serialization problem goes away.
Keep buffers in NVMM memory throughout the pipeline
A secondary cause of nvvidconv slowdowns — separate from multi-process contention — is buffer memory type mixing.
When nvvidconv receives a GStreamer buffer allocated in system memory (CPU-accessible, non-NVMM), it has to:
- Copy the buffer from system memory to hardware memory (DMA transfer)
- Process it in VIC
- Copy the result back
That’s two extra DMA transfers per frame. At 4K30, this adds up fast.
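The cost is easy to quantify. NV12 stores a full-resolution luma plane plus half-resolution interleaved chroma, so one frame is width × height × 1.5 bytes. A back-of-envelope script (frame sizes only, no measurement) shows what two extra copies per frame cost at 4K30:

```python
def nv12_frame_bytes(width, height):
    # Luma plane (w*h) plus interleaved chroma plane (w*h/2)
    return width * height * 3 // 2

def extra_copy_bandwidth_mb(width, height, fps, copies=2):
    # Memory bandwidth consumed by the in/out DMA copies nvvidconv
    # must add when handed a system-memory buffer
    return nv12_frame_bytes(width, height) * copies * fps / 1e6

print(nv12_frame_bytes(3840, 2160))             # 12441600 bytes per 4K frame
print(extra_copy_bandwidth_mb(3840, 2160, 30))  # ~746.5 MB/s of avoidable traffic
```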
Keeping buffers in NVMM memory throughout the pipeline eliminates both copies:
# Bad: system memory in the middle
... ! nvvidconv ! video/x-raw,format=NV12 ! nvvidconv ! ...
# ^^^^^^^^ no memory:NVMM = system memory
# Good: NVMM throughout
... ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nvvidconv ! ...
Check every caps negotiation point in your pipeline with GST_DEBUG=2 and look for buffer allocations that don’t include memory:NVMM. Each one is a DMA copy insertion point.
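One way to triage a captured log is to flag raw-video caps that lack the NVMM feature. The helper below is a line-based heuristic over saved GST_DEBUG output, not a real caps parser:

```python
def caps_missing_nvmm(log_text):
    # Flag lines mentioning raw-video caps without the NVMM memory feature;
    # each hit marks a spot where a system-memory copy can be inserted
    return [line for line in log_text.splitlines()
            if "video/x-raw" in line and "memory:NVMM" not in line]

log = """\
caps = video/x-raw(memory:NVMM), format=(string)NV12
caps = video/x-raw, format=(string)NV12
"""
print(caps_missing_nvmm(log))  # ['caps = video/x-raw, format=(string)NV12']
```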
The DRM loading problem on headless Jetson
On headless Jetson deployments (no display connected), some configurations load the DRM (Direct Rendering Manager) subsystem at boot despite having no display. This can cause nvvidconv to initialize its buffer mapping in a compatibility mode that’s significantly slower — sometimes an order of magnitude slower.
The symptom: nvvidconv performs fine when tested with a monitor connected, but is slow in headless production.
Fix options:
- Add nvpmodel -m 0 && jetson_clocks to your boot sequence to ensure clocks are at full speed before the pipeline starts
- Disable display output if truly headless: add video=efifb:off to the kernel command line
- Load the NVMM drivers explicitly before the pipeline starts (nvcamerasrc or nvjpeg initialization handles this in many cases)
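To verify whether DRM modules are actually loaded on a headless unit, check /proc/modules. The candidate module names below (nvidia_drm, tegra_drm) are assumptions that vary by JetPack release; confirm with lsmod on your board:

```python
def loaded_drm_modules(proc_modules_text, names=("nvidia_drm", "tegra_drm")):
    # First column of each /proc/modules line is the module name;
    # the candidate names are assumptions, check lsmod on your JetPack release
    loaded = {line.split()[0] for line in proc_modules_text.splitlines()
              if line.strip()}
    return [n for n in names if n in loaded]

sample = "tegra_drm 102400 2 - Live 0x0\nsnd_soc_core 212992 1 - Live 0x0"
print(loaded_drm_modules(sample))  # ['tegra_drm']
# On the device: loaded_drm_modules(open("/proc/modules").read())
```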
For more on GStreamer pipeline performance generally, see our GStreamer pipeline performance on Jetson post. If you’re running CSI cameras into these pipelines and hitting frame drops before the VIC stage, see V4L2 uncorr_err on Jetson for the upstream diagnosis.
The GStreamer nvvidconv element is documented in NVIDIA’s Accelerated GStreamer User Guide. For multi-stream pipeline design patterns, the GStreamer documentation on pipeline construction covers the fundamentals that apply across all platforms.
Multi-process nvvidconv checklist
| Problem | Fix |
|---|---|
| Multiple gst-launch processes competing | Refactor to single process with multiple pipelines |
| VIC% at 100% with two or more pipelines | Confirm single-process architecture |
| NVMM buffers becoming system memory mid-pipeline | Add memory:NVMM to caps at every handoff point |
| Slow on headless, fast with display | Disable DRM or run jetson_clocks on boot |
| nvvidconv inside a loop or per-frame callback | Move to pipeline — don’t call nvvidconv manually per frame |
Frequently Asked Questions
Why does nvvidconv get slower when multiple GStreamer processes run on Jetson?
nvvidconv uses the VIC hardware unit for processing. VIC has a limited number of concurrent contexts. When multiple processes each hold a VIC context, they serialize on the hardware, and each process sees dramatically lower throughput than when running alone.
How do I check VIC utilization on Jetson?
Use sudo tegrastats and look for the VIC% column. Unlike GPU utilization, VIC utilization doesn’t appear in nvidia-smi. High VIC% with multiple pipelines confirms hardware contention.
Should I use one GStreamer process or multiple for a multi-camera system on Jetson?
One process. Run all pipelines in a single GStreamer process using GLib mainloop and multiple GstPipeline objects, or use nvmultistreamtiler. Multiple processes fight for VIC contexts and cause serialization.
Does mixing NVMM and system memory in a GStreamer pipeline hurt performance?
Yes. When nvvidconv receives a system-memory buffer, it adds two DMA transfers per frame. Keep all buffers in NVMM memory throughout the pipeline using memory:NVMM caps.
What is the DRM issue that makes nvvidconv 10x slower on Jetson?
Loading the DRM display subsystem on headless Jetson can cause nvvidconv to use a slower buffer mapping path. Disabling DRM on headless configs or ensuring NVMM drivers are initialized before DRM resolves it.
ProventusNova optimizes GStreamer pipelines for production Jetson deployments — multi-camera, multi-stream, and headless. If nvvidconv performance is blocking your product, book a scoping call.