
nvvidconv performance collapse with multiple GStreamer processes on Jetson

Andres Campos

Key Insights

  • Multiple GStreamer processes each using nvvidconv on Jetson serialize on the VIC (Video Image Compositor) hardware — throughput can drop 5–10x
  • VIC has a fixed number of concurrent contexts; when all are in use, other processes queue and wait
  • The fix is architectural: run all pipelines in one GStreamer process, not multiple separate processes
  • Keep every buffer in NVMM memory throughout the pipeline — system memory buffers add two DMA copies per frame
  • On headless Jetson, DRM loading can also cause nvvidconv to use a slower path — disable DRM or run jetson_clocks on boot

What’s actually happening inside nvvidconv

nvvidconv is not a pure software color converter. It uses VIC — the Video Image Compositor hardware unit on Jetson — for color space conversion and image scaling. VIC is a fixed-function hardware block, separate from both the GPU and the CPU.

VIC has a fixed number of concurrent execution contexts. On Orin-class hardware, this is small — typically 2 to 4 contexts depending on the variant. Each GStreamer process that calls nvvidconv acquires a VIC context when it processes a buffer.

When you run two processes, each with an nvvidconv element handling 30 fps video, here’s what happens:

  • Process A acquires VIC context, submits frame 0
  • Process B tries to acquire VIC context, waits (all contexts in use)
  • Process A completes, releases context
  • Process B acquires context, submits its frame 0
  • Process A tries to acquire for frame 1, waits again

Both pipelines end up running at roughly half the speed they’d achieve alone. With three processes it’s worse. The nvvidconv elements in each process appear to work — no errors, no log messages — but frames queue up and the pipeline stalls.

The forum thread that surfaced this problem ran nine replies deep because the symptoms look like a hardware limitation (“Jetson can’t handle this many streams”) when the real cause is process architecture.

How to confirm VIC contention

tegrastats is the tool. Run it while your pipelines are active:

sudo tegrastats --interval 500

Look for the VIC% field in the output:

RAM 2048/7764MB ... VIC 98% ...

If VIC% is near 100% and you have multiple processes, you’re hitting contention. Each process’s nvvidconv is queuing up behind the others. GPU% and CPU% may look fine — this is exclusively a VIC bottleneck.

For comparison, run a single pipeline alone and check VIC%. If one pipeline uses about 40% but two pipelines pin the engine at 98%, the numbers confirm serialization: VIC is saturated and the pipelines are taking turns.

You can also use tegrastats --verbose on some JetPack versions for per-engine breakdowns.
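To capture that comparison over time, here’s a minimal sketch that filters tegrastats down to just the VIC token. It assumes the JetPack 5-style VIC 45%@601 output format; older releases print VIC_FREQ instead, so adjust the pattern for your image:

# Timestamped VIC utilization log; Ctrl-C to stop
sudo tegrastats --interval 500 \
  | grep --line-buffered -oE 'VIC[_A-Z]* [0-9]+%[^ ]*' \
  | while read -r line; do echo "$(date +%T) $line"; done

Run it once with a single pipeline active, then again with all of them, and compare the logged figures.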

The architectural fix: single process, multiple pipelines

The right architecture for multi-camera or multi-stream processing on Jetson is one GStreamer process running multiple pipelines, not multiple separate processes.

Before (broken architecture):

# Process 1
gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! nvvidconv ! appsink &

# Process 2
gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! nvvidconv ! appsink &

Each process holds its own VIC context. They serialize.

After (correct architecture):

# Single process: nvstreammux batches both cameras,
# nvmultistreamtiler lays the batch out side by side
gst-launch-1.0 \
  nvstreammux name=mux batch-size=2 width=1920 height=1080 ! \
  nvmultistreamtiler rows=1 columns=2 width=3840 height=1080 ! \
  nvvidconv ! nvoverlaysink \
  nvarguscamerasrc sensor-id=0 ! \
    nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_0 \
  nvarguscamerasrc sensor-id=1 ! \
    nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_1
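Note that nvstreammux and nvmultistreamtiler ship with DeepStream rather than base L4T. On a stock JetPack image without DeepStream, nvcompositor (documented in the Accelerated GStreamer User Guide) fills the same single-process compositing role with sink_0/sink_1 request pads.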

Or in your application code, use the GStreamer API to create multiple GstPipeline objects within a single process, each running in the shared mainloop:

#include <gst/gst.h>

gst_init(&argc, &argv);  // inside main(int argc, char *argv[])
GstPipeline *pipeline1 = GST_PIPELINE(gst_pipeline_new("cam0"));
GstPipeline *pipeline2 = GST_PIPELINE(gst_pipeline_new("cam1"));
// ... build each pipeline, then set both to PLAYING ...
gst_element_set_state(GST_ELEMENT(pipeline1), GST_STATE_PLAYING);
gst_element_set_state(GST_ELEMENT(pipeline2), GST_STATE_PLAYING);
// Both pipelines share VIC contexts within the same process
GMainLoop *loop = g_main_loop_new(NULL, FALSE);
g_main_loop_run(loop);

Within a single process, VIC contexts are shared efficiently across pipelines. The serialization problem goes away.
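For a quick test without writing application code, one gst-launch invocation can host both chains as unlinked branches of a single pipeline, which keeps everything in one process. A sketch, with fakesink standing in for your real sinks and 720p output caps so nvvidconv performs an actual VIC scale:

# One process, two independent branches
gst-launch-1.0 \
  nvarguscamerasrc sensor-id=0 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1280,height=720,format=NV12' ! fakesink \
  nvarguscamerasrc sensor-id=1 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1280,height=720,format=NV12' ! fakesink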

Keep buffers in NVMM memory throughout the pipeline

A secondary cause of nvvidconv slowdowns — separate from multi-process contention — is buffer memory type mixing.

When nvvidconv receives a GStreamer buffer allocated in system memory (CPU-accessible, non-NVMM), it has to:

  1. Copy the buffer from system memory to hardware memory (DMA transfer)
  2. Process it in VIC
  3. Copy the result back

That’s two extra DMA transfers per frame. At 4K30, this adds up fast.
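To put a number on it: an NV12 frame at 3840×2160 is 3840 × 2160 × 1.5 ≈ 12.4 MB, so two extra copies per frame at 30 fps is roughly 12.4 × 2 × 30 ≈ 745 MB/s of added memory traffic, taken out of the DRAM bandwidth every other engine on the module shares.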

Keeping buffers in NVMM memory throughout the pipeline eliminates both copies:

# Bad: system memory in the middle
... ! nvvidconv ! video/x-raw,format=NV12 ! nvvidconv ! ...
#                ^^^^^^^^ no memory:NVMM = system memory

# Good: NVMM throughout (caps quoted: parentheses are shell metacharacters)
... ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvvidconv ! ...

Check every caps negotiation point in your pipeline by running it with gst-launch-1.0 -v and looking for negotiated caps that don’t include memory:NVMM. Each one is a DMA copy insertion point.
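As a concrete check, here is a sketch that runs the bad shape above with -v and prints only the negotiated raw-video caps lacking memory:NVMM; any line it prints marks a system-memory hop (the grep patterns are illustrative, not exhaustive):

gst-launch-1.0 -v \
  nvarguscamerasrc num-buffers=30 ! \
  nvvidconv ! 'video/x-raw,format=NV12' ! \
  nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fakesink \
  2>&1 | grep 'caps = video/x-raw' | grep -v 'memory:NVMM'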

The DRM loading problem on headless Jetson

On headless Jetson deployments, some configurations load the DRM (Direct Rendering Manager) subsystem at boot even though no display is connected. This can cause nvvidconv to initialize its buffer mapping in a compatibility mode that is significantly slower, sometimes by an order of magnitude.

The symptom: nvvidconv performs fine when tested with a monitor connected, but is slow in headless production.

Fix options:

  • Add nvpmodel -m 0 && jetson_clocks to your boot sequence so clocks are at full speed before the pipeline starts (see the sketch after this list)
  • Disable display output if truly headless: add video=efifb:off to kernel command line
  • Load the NVMM drivers explicitly before the pipeline starts (the nvcamerasrc or nvjpeg initialization handles this in many cases)
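For the first option, one way to wire it into boot is a one-shot systemd unit. A sketch, with a unit name of our choosing; the nvpmodel and jetson_clocks paths are the usual JetPack locations, so verify them with which on your image:

sudo tee /etc/systemd/system/jetson-maxclocks.service >/dev/null <<'EOF'
[Unit]
Description=Lock Jetson clocks before GStreamer pipelines start
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/nvpmodel -m 0
ExecStart=/usr/bin/jetson_clocks

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable jetson-maxclocks.service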

For more on GStreamer pipeline performance generally, see our GStreamer pipeline performance on Jetson post. If you’re running CSI cameras into these pipelines and hitting frame drops before the VIC stage, see V4L2 uncorr_err on Jetson for the upstream diagnosis.

The GStreamer nvvidconv element is documented in NVIDIA’s Accelerated GStreamer User Guide. For multi-stream pipeline design patterns, the GStreamer documentation on pipeline construction covers the fundamentals that apply across all platforms.

Multi-process nvvidconv checklist

Problem                                            Fix
Multiple gst-launch processes competing            Refactor to single process with multiple pipelines
VIC% at 100% with two or more pipelines            Confirm single-process architecture
NVMM buffers becoming system memory mid-pipeline   Add memory:NVMM to caps at every handoff point
Slow on headless, fast with display                Disable DRM or run jetson_clocks on boot
nvvidconv inside a loop or per-frame callback      Move to pipeline — don’t call nvvidconv manually per frame

Frequently Asked Questions

Why does nvvidconv get slower when multiple GStreamer processes run on Jetson?

nvvidconv uses the VIC hardware unit for processing. VIC has a limited number of concurrent contexts. When multiple processes each hold a VIC context, they serialize on the hardware, and each process sees dramatically lower throughput than when running alone.

How do I check VIC utilization on Jetson?

Use sudo tegrastats and look for the VIC% field. VIC utilization doesn’t appear in GPU tools such as nvidia-smi; tegrastats is the standard way to see it. High VIC% with multiple pipelines confirms hardware contention.

Should I use one GStreamer process or multiple for a multi-camera system on Jetson?

One process. Run all pipelines in a single GStreamer process using a GLib main loop and multiple GstPipeline objects, or batch the streams with nvstreammux and nvmultistreamtiler. Multiple processes fight for VIC contexts and serialize.

Does mixing NVMM and system memory in a GStreamer pipeline hurt performance?

Yes. When nvvidconv receives a system-memory buffer, it adds two DMA transfers per frame. Keep all buffers in NVMM memory throughout the pipeline using memory:NVMM caps.

What is the DRM issue that makes nvvidconv 10x slower on Jetson?

Loading the DRM display subsystem on headless Jetson can cause nvvidconv to use a slower buffer mapping path. Disabling DRM on headless configurations, or ensuring the NVMM drivers initialize before DRM does, resolves it.


ProventusNova optimizes GStreamer pipelines for production Jetson deployments — multi-camera, multi-stream, and headless. If nvvidconv performance is blocking your product, book a scoping call.