nvvidconv performance collapse with multiple GStreamer processes on Jetson
Key Insights
- Multiple GStreamer processes each using nvvidconv on Jetson serialize on the VIC (Video Image Compositor) hardware; throughput can drop 5–10x
- VIC has a fixed number of concurrent contexts; when all are in use, other processes queue and wait
- The fix is architectural: run all pipelines in one GStreamer process, not multiple separate processes
- Keep every buffer in NVMM memory throughout the pipeline — system memory buffers add two DMA copies per frame
- On headless Jetson, DRM loading can also cause nvvidconv to use a slower path; disable DRM or run jetson_clocks on boot
What’s actually happening inside nvvidconv
nvvidconv is not a pure software color converter. It uses VIC — the Video Image Compositor hardware unit on Jetson — for color space conversion and image scaling. VIC is a fixed-function hardware block separate from both the GPU and the CPU.
VIC has a fixed number of concurrent execution contexts. On Orin-class hardware, this is small — typically 2 to 4 contexts depending on the variant. Each GStreamer process that calls nvvidconv acquires a VIC context when it processes a buffer.
When you run two processes, each with an nvvidconv element handling 30 fps video, here’s what happens:
- Process A acquires VIC context, submits frame 0
- Process B tries to acquire VIC context, waits (all contexts in use)
- Process A completes, releases context
- Process B acquires context, submits its frame 0
- Process A tries to acquire for frame 1, waits again
Both pipelines end up running at roughly half the speed they’d achieve alone. With three processes it’s worse. The nvvidconv elements in each process appear to work — no errors, no log messages — but frames queue up and the pipeline stalls.
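The round-robin above can be reduced to simple fair-share arithmetic. The VIC throughput figure in this sketch is a made-up illustration, not a measured number:

```python
def effective_fps(vic_max_fps, n_processes, target_fps):
    # Each process gets an equal time slice of the single shared VIC context
    fair_share = vic_max_fps / n_processes
    return min(target_fps, fair_share)

# Hypothetical VIC that can convert ~40 frames/s at this resolution:
print(effective_fps(40, 1, 30))  # 30.0 -- one pipeline alone keeps up
print(effective_fps(40, 2, 30))  # 20.0 -- each of two processes stalls
print(effective_fps(40, 3, 30))  # ~13.3 each with three processes
```

The model ignores context-switch overhead on the VIC, so real numbers are typically worse than the fair share it predicts.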
The forum thread that surfaced this problem had a 9-reply discussion because the symptoms look like a hardware limitation (“Jetson can’t handle this many streams”) when it’s actually a process architecture problem.
How to confirm VIC contention
tegrastats is the tool. Run it while your pipelines are active:
sudo tegrastats --interval 500
Look for the VIC% field in the output:
RAM 2048/7764MB ... VIC 98% ...
If VIC% is near 100% and you have multiple processes, you’re hitting contention. Each process’s nvvidconv is queuing up behind the others. GPU% and CPU% may look fine — this is exclusively a VIC bottleneck.
For comparison, run a single pipeline alone and check VIC%. If it’s 40% for one pipeline and 98% with two pipelines, the math confirms serialization.
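If you log tegrastats output to a file, a small parser can trend VIC% over time. The regex below matches the `VIC 98%` field format shown above; some JetPack releases append a clock suffix (e.g. `VIC 42%@729`), which it also handles, but treat it as a sketch rather than a complete parser:

```python
import re

def vic_percent(tegrastats_line):
    # Matches fields like "VIC 98%" or "VIC 42%@729" in a tegrastats line
    m = re.search(r"\bVIC\s+(\d+)%", tegrastats_line)
    return int(m.group(1)) if m else None

print(vic_percent("RAM 2048/7764MB ... VIC 98% ..."))  # 98
print(vic_percent("RAM 2048/7764MB ..."))              # None (field absent)
```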
You can also use tegrastats --verbose on some JetPack versions for per-engine breakdowns.
The architectural fix: single process, multiple pipelines
The right architecture for multi-camera or multi-stream processing on Jetson is one GStreamer process running multiple pipelines, not multiple separate processes.
Before (broken architecture):
# Process 1
gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! nvvidconv ! appsink &
# Process 2
gst-launch-1.0 nvarguscamerasrc sensor-id=1 ! nvvidconv ! appsink &
Each process holds its own VIC context. They serialize.
After (correct architecture):
# Single process: nvstreammux batches both streams for the tiler
gst-launch-1.0 \
nvstreammux name=mux batch-size=2 width=1920 height=1080 ! \
nvmultistreamtiler rows=1 columns=2 width=3840 height=1080 ! \
nvvidconv ! nvoverlaysink \
nvarguscamerasrc sensor-id=0 ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_0 \
nvarguscamerasrc sensor-id=1 ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! mux.sink_1
Or, in application code, use the GStreamer API to create multiple GstPipeline objects within a single process, all driven by one shared GLib main loop:
gst_init(&argc, &argv);
// Two independent pipelines in one process; they share VIC contexts
GstElement *pipeline1 = gst_parse_launch(
    "nvarguscamerasrc sensor-id=0 ! nvvidconv ! fakesink", NULL);
GstElement *pipeline2 = gst_parse_launch(
    "nvarguscamerasrc sensor-id=1 ! nvvidconv ! fakesink", NULL);
gst_element_set_state(pipeline1, GST_STATE_PLAYING);
gst_element_set_state(pipeline2, GST_STATE_PLAYING);
GMainLoop *loop = g_main_loop_new(NULL, FALSE);
g_main_loop_run(loop);
Within a single process, VIC contexts are shared efficiently across pipelines. The serialization problem goes away.
Keep buffers in NVMM memory throughout the pipeline
A secondary cause of nvvidconv slowdowns — separate from multi-process contention — is buffer memory type mixing.
When nvvidconv receives a GStreamer buffer allocated in system memory (CPU-accessible, non-NVMM), it has to:
- Copy the buffer from system memory to hardware memory (DMA transfer)
- Process it in VIC
- Copy the result back
That’s two extra DMA transfers per frame. At 4K30, this adds up fast.
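The cost is easy to quantify. NV12 stores a full-resolution luma plane plus half-resolution interleaved chroma, so one frame is width × height × 1.5 bytes. A back-of-envelope script (frame sizes only, no measurement) shows what two extra copies per frame cost at 4K30:

```python
def nv12_frame_bytes(width, height):
    # Luma plane (w*h) plus interleaved chroma plane (w*h/2)
    return width * height * 3 // 2

def extra_copy_bandwidth_mb(width, height, fps, copies=2):
    # Memory bandwidth consumed by the in/out DMA copies nvvidconv
    # must add when handed a system-memory buffer
    return nv12_frame_bytes(width, height) * copies * fps / 1e6

print(nv12_frame_bytes(3840, 2160))             # 12441600 bytes per 4K frame
print(extra_copy_bandwidth_mb(3840, 2160, 30))  # ~746.5 MB/s of avoidable traffic
```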
Keeping buffers in NVMM memory throughout the pipeline eliminates both copies:
# Bad: system memory in the middle
... ! nvvidconv ! video/x-raw,format=NV12 ! nvvidconv ! ...
# ^^^^^^^^ no memory:NVMM = system memory
# Good: NVMM throughout
... ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nvvidconv ! ...
Check every caps negotiation point in your pipeline with GST_DEBUG=2 and look for buffer allocations that don’t include memory:NVMM. Each one is a DMA copy insertion point.
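One way to triage a captured log is to flag raw-video caps that lack the NVMM feature. The helper below is a line-based heuristic over saved GST_DEBUG output, not a real caps parser:

```python
def caps_missing_nvmm(log_text):
    # Flag lines mentioning raw-video caps without the NVMM memory feature;
    # each hit marks a spot where a system-memory copy can be inserted
    return [line for line in log_text.splitlines()
            if "video/x-raw" in line and "memory:NVMM" not in line]

log = """\
caps = video/x-raw(memory:NVMM), format=(string)NV12
caps = video/x-raw, format=(string)NV12
"""
print(caps_missing_nvmm(log))  # ['caps = video/x-raw, format=(string)NV12']
```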
The DRM loading problem on headless Jetson
On headless Jetson deployments (no display connected), some configurations load the DRM (Direct Rendering Manager) subsystem at boot despite having no display. This can cause nvvidconv to initialize its buffer mapping in a compatibility mode that’s significantly slower — sometimes an order of magnitude slower.
The symptom: nvvidconv performs fine when tested with a monitor connected, but is slow in headless production.
Fix options:
- Add nvpmodel -m 0 && jetson_clocks to your boot sequence to ensure clocks are at full speed before the pipeline starts
- Disable display output if truly headless: add video=efifb:off to the kernel command line
- Load the NVMM drivers explicitly before the pipeline starts (nvcamerasrc or nvjpeg initialization handles this in many cases)
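To verify whether DRM modules are actually loaded on a headless unit, check /proc/modules. The candidate module names below (nvidia_drm, tegra_drm) are assumptions that vary by JetPack release; confirm with lsmod on your board:

```python
def loaded_drm_modules(proc_modules_text, names=("nvidia_drm", "tegra_drm")):
    # First column of each /proc/modules line is the module name;
    # the candidate names are assumptions, check lsmod on your JetPack release
    loaded = {line.split()[0] for line in proc_modules_text.splitlines()
              if line.strip()}
    return [n for n in names if n in loaded]

sample = "tegra_drm 102400 2 - Live 0x0\nsnd_soc_core 212992 1 - Live 0x0"
print(loaded_drm_modules(sample))  # ['tegra_drm']
# On the device: loaded_drm_modules(open("/proc/modules").read())
```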
For more on GStreamer pipeline performance generally, see our GStreamer pipeline performance on Jetson post. If you’re running CSI cameras into these pipelines and hitting frame drops before the VIC stage, see V4L2 uncorr_err on Jetson for the upstream diagnosis.
The GStreamer nvvidconv element is documented in NVIDIA’s Accelerated GStreamer User Guide. For multi-stream pipeline design patterns, the GStreamer documentation on pipeline construction covers the fundamentals that apply across all platforms.
Multi-process nvvidconv checklist
| Problem | Fix |
|---|---|
| Multiple gst-launch processes competing | Refactor to single process with multiple pipelines |
| VIC% at 100% with two or more pipelines | Confirm single-process architecture |
| NVMM buffers becoming system memory mid-pipeline | Add memory:NVMM to caps at every handoff point |
| Slow on headless, fast with display | Disable DRM or run jetson_clocks on boot |
| nvvidconv inside a loop or per-frame callback | Move to pipeline — don’t call nvvidconv manually per frame |
Frequently Asked Questions
Why does nvvidconv get slower when multiple GStreamer processes run on Jetson?
nvvidconv uses the VIC hardware unit for processing. VIC has a limited number of concurrent contexts. When multiple processes each hold a VIC context, they serialize on the hardware, and each process sees dramatically lower throughput than when running alone.
How do I check VIC utilization on Jetson?
Use sudo tegrastats and look for the VIC% column. Unlike GPU utilization, VIC utilization doesn’t appear in nvidia-smi. High VIC% with multiple pipelines confirms hardware contention.
Should I use one GStreamer process or multiple for a multi-camera system on Jetson?
One process. Run all pipelines in a single GStreamer process using GLib mainloop and multiple GstPipeline objects, or use nvmultistreamtiler. Multiple processes fight for VIC contexts and cause serialization.
Does mixing NVMM and system memory in a GStreamer pipeline hurt performance?
Yes. When nvvidconv receives a system-memory buffer, it adds two DMA transfers per frame. Keep all buffers in NVMM memory throughout the pipeline using memory:NVMM caps.
What is the DRM issue that makes nvvidconv 10x slower on Jetson?
Loading the DRM display subsystem on headless Jetson can cause nvvidconv to use a slower buffer mapping path. Disabling DRM on headless configs or ensuring NVMM drivers are initialized before DRM resolves it.
ProventusNova optimizes GStreamer pipelines for production Jetson deployments — multi-camera, multi-stream, and headless. If nvvidconv performance is blocking your product, book a scoping call.