How to Reduce End-to-End Latency in a GStreamer Pipeline on Jetson

A GStreamer pipeline that works correctly at low load often develops latency problems in production: frame delay that grows over time, display output that lags behind the capture stream, or simply sluggish end-to-end response. On Jetson, most of these problems come from four sources: display sync, queue buffering, CPU copies, and software decode. Each is fixable once you know where to look.

Key Insights

The biggest single latency fix on a live camera pipeline is setting sync=false on the output sink — it can reduce display latency from hundreds of milliseconds to single digits
Staying in NVMM memory (video/x-raw(memory:NVMM)) for as long as possible avoids DMA copies that add both latency and CPU load
Queue elements with default settings hold up to 200 buffers, 10 MB, or 1 second of data, whichever limit is reached first; on a live feed, reduce max-size-buffers to 1 and set leaky=2 to drop old frames
Hardware decode with nvv4l2decoder keeps the output in NVMM and runs on the VPU, not the CPU; software decoders (avdec_h264) defeat this entirely
Measuring latency with GST_DEBUG=GST_LATENCY:5 before tuning tells you which element is actually the bottleneck

The four latency sources and how to fix them

1. Display sync

The default sync=true on a GStreamer sink tells the element to hold each frame until the pipeline clock reaches its presentation timestamp plus the latency configured for the pipeline. For a live camera feed, that wait is added delay on every frame, even when the frame reached the sink early. Set sync=false on your output sink:

# Before: renders at presentation timestamp, adds buffering delay
nvvidconv ! nv3dsink

# After: renders frames immediately as they arrive
nvvidconv ! nv3dsink sync=false

This is the first thing to change on any live pipeline. The tradeoff is that without sync, the pipeline will not pace itself — if upstream produces faster than the display can consume, you need leaky queues (below) to manage the overflow.
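To make the effect concrete, here is a minimal Python sketch of how a sink decides when to show a frame. It is a simplified model, not GStreamer code: it assumes the sink waits until the clock reaches pts + latency when sync is on, and shows the frame on arrival when sync is off. The function name render_times and all numbers (8 ms transit time, 100 ms configured latency) are hypothetical.

```python
def render_times(arrivals_ms, pts_ms, sink_latency_ms, sync):
    """Return the time (ms) at which each frame is rendered.

    Simplified model: with sync enabled the sink waits until the clock
    reaches pts + latency; a frame that arrives later than that is shown
    on arrival. With sync disabled every frame is shown on arrival.
    """
    times = []
    for arrival, pts in zip(arrivals_ms, pts_ms):
        if sync:
            times.append(max(arrival, pts + sink_latency_ms))
        else:
            times.append(arrival)
    return times


# Hypothetical numbers: 30 fps capture, each frame reaches the sink 8 ms
# after its timestamp, and the pipeline is configured for 100 ms of latency.
pts = [0.0, 33.3, 66.7, 100.0]
arrivals = [p + 8.0 for p in pts]

synced = render_times(arrivals, pts, sink_latency_ms=100.0, sync=True)
unsynced = render_times(arrivals, pts, sink_latency_ms=100.0, sync=False)

for p, s, u in zip(pts, synced, unsynced):
    print(f"pts={p:6.1f}  sync=true delay={s - p:6.1f} ms  "
          f"sync=false delay={u - p:6.1f} ms")
```

In this model the sync=true delay equals the configured latency no matter how quickly frames arrive, while the sync=false delay is only the transit time through the pipeline.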

2. Queue buffering

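GStreamer's queue element holds up to 200 buffers, 10 MB, or 1 second of data by default, whichever limit is reached first. If only the buffer count applied, 200 frames at 30fps would be almost 7 seconds; with timestamped buffers the 1-second limit is normally reached first, which is still far too much delay for a live view. A queue only adds this latency when downstream consumes more slowly than upstream produces, but once it fills, every frame waits behind the backlog. For a live pipeline where you always want the most recent frame:

# Default limits: 200 buffers, 10 MB, or 1 s of data, whichever fills first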
queue

# Buffers 1 frame, drops old frames when full
queue max-size-buffers=1 leaky=2

leaky=2 means the queue drops the oldest buffer when full (downstream leaky). This keeps the output at the most recent frame rather than working through a backlog.
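The arithmetic and the drop behaviour can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GStreamer code: the helper names are hypothetical, the byte limit of the queue is ignored, and a bounded collections.deque stands in for a downstream-leaky queue because it also discards its oldest entry when a new one is appended to a full container.

```python
from collections import deque

# Default limits of the GStreamer queue element; 0 disables a limit.
# The 10 MB byte limit is left out of this sketch.
DEFAULT_MAX_BUFFERS = 200
DEFAULT_MAX_TIME_S = 1.0


def worst_case_delay_s(fps, max_buffers=DEFAULT_MAX_BUFFERS,
                       max_time_s=DEFAULT_MAX_TIME_S):
    """Upper bound on the delay a full queue adds, in seconds."""
    bounds = []
    if max_buffers:
        bounds.append(max_buffers / fps)   # time to drain max_buffers frames
    if max_time_s:
        bounds.append(max_time_s)          # time limit on queued data
    return min(bounds) if bounds else float("inf")


def consume_every_nth(frames, n, maxlen):
    """Push frames into a bounded deque; pop one frame after every n pushes."""
    q = deque(maxlen=maxlen)
    seen = []
    for i, frame in enumerate(frames, start=1):
        q.append(frame)                    # a full deque discards its oldest entry
        if i % n == 0:
            seen.append(q.popleft())
    return seen


print("defaults at 30 fps:     ", worst_case_delay_s(30), "s")
print("count limit only:       ", round(worst_case_delay_s(30, max_time_s=0), 2), "s")
print("max-size-buffers=1:     ", round(worst_case_delay_s(30, max_buffers=1), 4), "s")

# A consumer three times slower than the producer.
print("deep queue sees frames: ", consume_every_nth(range(9), n=3, maxlen=200))
print("1-deep leaky sees frames:", consume_every_nth(range(9), n=3, maxlen=1))
```

With the deep queue the slow consumer works through old frames in order and falls further behind; with the one-slot leaky queue it always receives the newest frame and the intermediate ones are dropped.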

3. NVMM memory path

Every transition from video/x-raw(memory:NVMM) to video/x-raw (system RAM) forces a full-frame copy out of hardware buffer memory. This adds latency, increases memory traffic and CPU load, and takes downstream elements off the zero-copy hardware path. The rule is simple: stay in NVMM until you absolutely need system RAM.

# This pipeline forces a CPU copy between capture and display
nvarguscamerasrc ! video/x-raw(memory:NVMM) ! nvvidconv ! video/x-raw,format=BGRx ! nv3dsink sync=false

# This stays in NVMM all the way to the sink — no CPU copy
nvarguscamerasrc ! video/x-raw(memory:NVMM) ! nvvidconv ! video/x-raw(memory:NVMM),format=I420 ! nv3dsink sync=false

If you need system RAM for a specific consumer (OpenCV, Python appsink), extract it as late in the pipeline as possible and keep the display path in NVMM on a separate branch via tee.
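A sketch of that tee layout, written as a Python helper that only builds the pipeline description string. The element names come from this article and the standard GStreamer tee and appsink elements; the helper names, the tee name t, the appsink name app, and the choice of BGRx for the application branch are assumptions for illustration. The second helper estimates how much data the system-RAM branch copies per second.

```python
def tee_pipeline(width=1920, height=1080, fps=30, sensor_id=0):
    """Build a description with an NVMM display branch and a system-RAM branch."""
    nvmm_caps = (f"video/x-raw(memory:NVMM),width={width},height={height},"
                 f"format=NV12,framerate={fps}/1")
    # Display branch: never leaves NVMM.
    display = "queue max-size-buffers=1 leaky=downstream ! nv3dsink sync=false"
    # Application branch: the copy to system RAM happens here, as late as possible.
    app = ("queue max-size-buffers=1 leaky=downstream ! nvvidconv ! "
           "video/x-raw,format=BGRx ! "
           "appsink name=app max-buffers=1 drop=true sync=false")
    return (f"nvarguscamerasrc sensor-id={sensor_id} ! {nvmm_caps} ! "
            f"tee name=t t. ! {display} t. ! {app}")


def copy_bytes_per_second(width, height, fps, bytes_per_pixel=4):
    """Data copied out of NVMM per second for a packed format such as BGRx."""
    return width * height * bytes_per_pixel * fps


description = tee_pipeline()
print(description)
print(copy_bytes_per_second(1920, 1080, 30) / 1e6, "MB/s copied for the app branch")
```

The string can be passed to Gst.parse_launch from the GStreamer Python bindings, or pasted after gst-launch-1.0 with the caps quoted for the shell. Only the appsink branch pays for the copy, roughly 249 MB/s at 1080p30 in BGRx.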

4. Software decode

If your pipeline includes video decoding, the decoder choice matters significantly. avdec_h264 is a libav software decoder that runs on the CPU and outputs system RAM. nvv4l2decoder is the Jetson hardware VPU decoder that outputs NVMM:

# Software decoder — CPU load, system RAM output, higher latency
... ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! ...

# Hardware decoder — VPU, NVMM output, lower latency
... ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! ...

At 1080p30, the difference is typically 20 to 40ms of additional latency with avdec_h264, plus significant CPU overhead.
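If the same application has to run on a Jetson and on a development machine without the NVIDIA plugins, one option is to pick the decoder at startup. The sketch below assumes gst-inspect-1.0 is on the PATH and uses its --exists option; the function names are hypothetical, and the silent fallback to avdec_h264 is a design choice for development convenience, not something this article prescribes for production.

```python
import shutil
import subprocess


def element_exists(name):
    """Return True if gst-inspect-1.0 reports that the element is installed."""
    tool = shutil.which("gst-inspect-1.0")
    if tool is None:
        return False  # GStreamer tools not installed on this machine
    result = subprocess.run(
        [tool, "--exists", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def h264_decode_chain(exists=element_exists):
    """Return the decode segment of a pipeline description."""
    if exists("nvv4l2decoder"):
        # Hardware decode, output stays in NVMM.
        return "h264parse ! nvv4l2decoder ! nvvidconv"
    # Software fallback, output is in system RAM.
    return "h264parse ! avdec_h264 ! videoconvert"


print("... ! rtph264depay ! " + h264_decode_chain() + " ! ...")
```

For scale, a frame interval at 30fps is about 33 ms, so 20 to 40ms of extra decode latency is roughly 0.6 to 1.2 frames of delay.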

A low-latency camera display pipeline

Combining all four fixes:

gst-launch-1.0 \
  nvarguscamerasrc sensor-id=0 ! \
  'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12,framerate=30/1' ! \
  queue max-size-buffers=1 leaky=2 ! \
  nvvidconv ! \
  'video/x-raw(memory:NVMM),format=I420' ! \
  nv3dsink sync=false

This keeps the full pipeline in NVMM, drops stale frames from the queue, and renders immediately without presentation timestamp sync.
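The same pipeline can be started from an application. The following is a minimal sketch using the GStreamer Python bindings (PyGObject), which are assumed to be installed on the Jetson; the run function is a hypothetical helper, and error handling is reduced to printing the first error. Unlike a shell command line, the description string passed to Gst.parse_launch needs no quoting around the parentheses in the caps.

```python
PIPELINE = (
    "nvarguscamerasrc sensor-id=0 ! "
    "video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12,framerate=30/1 ! "
    "queue max-size-buffers=1 leaky=2 ! "
    "nvvidconv ! "
    "video/x-raw(memory:NVMM),format=I420 ! "
    "nv3dsink sync=false"
)


def run(description=PIPELINE):
    """Play the pipeline until an error or end-of-stream message arrives."""
    # Imported here so the module loads on machines without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    try:
        msg = bus.timed_pop_filtered(
            Gst.CLOCK_TIME_NONE,
            Gst.MessageType.ERROR | Gst.MessageType.EOS,
        )
        if msg is not None and msg.type == Gst.MessageType.ERROR:
            err, _debug = msg.parse_error()
            print(f"pipeline error: {err.message}")
    finally:
        # Always release the camera and display, even on Ctrl+C.
        pipeline.set_state(Gst.State.NULL)
```

Calling run() on the device blocks until the stream ends or fails; on a machine without a camera it reports the error from nvarguscamerasrc and returns.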

Measuring before you tune

Before changing anything, enable latency tracing to see where the time actually goes:

GST_DEBUG_NO_COLOR=1 GST_DEBUG=GST_LATENCY:5 gst-launch-1.0 \
  nvarguscamerasrc sensor-id=0 ! \
  'video/x-raw(memory:NVMM),format=NV12' ! \
  nvvidconv ! nv3dsink sync=false 2>&1 | grep latency

This logs the latency contribution of each element. Most pipelines show the queue or the display sync as the dominant source, not the conversion elements.
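GStreamer also ships a latency tracer, enabled with the GST_TRACERS environment variable and logged under the GST_TRACER debug category, which is worth trying if the command above prints little on your build. The Python sketch below prepares that environment and averages the per-element figures from the log. It rests on assumptions: the flags=pipeline+element option needs GStreamer 1.16 or newer, the regular expression is written against the element-latency line format as printed by recent releases and may need adjusting, and the helper names are hypothetical.

```python
import os
import re


def tracer_env(base=None):
    """Return a copy of the environment with the latency tracer enabled."""
    env = dict(os.environ if base is None else base)
    env["GST_TRACERS"] = "latency(flags=pipeline+element)"
    env["GST_DEBUG"] = "GST_TRACER:7"
    env["GST_DEBUG_NO_COLOR"] = "1"
    return env


# Matches the element name and the latency in nanoseconds on tracer lines.
_ELEMENT_LATENCY = re.compile(
    r"element-latency,.*?element=\(string\)([^,]+),.*?time=\(guint64\)(\d+)"
)


def mean_element_latency_ms(log_lines):
    """Average the reported latency per element, in milliseconds."""
    totals = {}
    for line in log_lines:
        match = _ELEMENT_LATENCY.search(line)
        if match:
            name, ns = match.group(1), int(match.group(2))
            total, count = totals.get(name, (0, 0))
            totals[name] = (total + ns, count + 1)
    return {name: total / count / 1e6 for name, (total, count) in totals.items()}
```

One way to use it is to start gst-launch-1.0 through subprocess.run with env=tracer_env() and stderr captured, then pass the captured lines to mean_element_latency_ms and sort the result to find the slowest element.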

For more complete pipeline examples including RTSP streaming and multi-camera setups, the GStreamer pipeline examples for Jetson post covers the full range of production configurations.

Frequently Asked Questions

What causes high latency in a GStreamer pipeline on Jetson?

The three most common causes are large queue buffers, unnecessary CPU copies from leaving the NVMM memory path, and display sync enabled on the output sink. Each adds independent latency that compounds.

Does setting sync=false on a GStreamer sink reduce latency?

Yes. With sync=true (the default), GStreamer holds frames until their presentation timestamp. With sync=false, frames render immediately on arrival. For live camera feeds, sync=false is almost always the right choice.

What is NVMM memory and why does it affect GStreamer latency on Jetson?

NVMM is GPU-mapped DMA memory. Buffers in NVMM move between elements without CPU copies. Every transition to system RAM forces a DMA copy, adding latency and CPU load. Keep the pipeline in NVMM as long as possible.

How do I measure GStreamer pipeline latency on Jetson?

Use GST_DEBUG=GST_LATENCY:5 to enable GStreamer's latency logging and find the bottleneck before you start tuning. For frame-to-frame timing, add a pad probe or appsink callback that timestamps buffers as they arrive.

Should I use nvv4l2decoder or avdec_h264 for decoding on Jetson?

Always use nvv4l2decoder on Jetson. It uses the hardware VPU and keeps output in NVMM. avdec_h264 is a software decoder that produces system RAM buffers and runs significantly slower, particularly at 1080p and above.

ProventusNova builds and optimizes GStreamer pipelines for production Jetson deployments. Talk to us about your pipeline.
