MediaTek Genio for computer vision: a practical guide

Q: Should I use GStreamer or OpenCV for camera capture on Genio?

Use GStreamer for camera capture and preprocessing. GStreamer can use hardware-accelerated elements (mtk-video decode, ISP pipeline) and keep frames in device memory between those elements. Opening a camera with OpenCV's default backend, cv2.VideoCapture(0), works for USB cameras, but on MIPI CSI cameras it bypasses that path and copies every frame through CPU memory. Feed preprocessed frames from GStreamer's appsink into OpenCV or TFLite for inference.

Q: What is the fastest way to run object detection on Genio?

The fastest end-to-end path is: GStreamer MIPI CSI capture → hardware ISP → tensor_filter with TFLite INT8 model + NeuronStableDelegate → NNStreamer tensor_decoder → overlay on Wayland display. This path keeps data in device memory between pipeline stages and uses the NPU for inference.

Genio is well-matched to computer vision applications: it has MIPI CSI-2 interfaces with hardware ISP, a Mali GPU for OpenCL/Vulkan compute, and an MDLA NPU for inference. The challenge is wiring these together efficiently — the wrong capture path adds unnecessary CPU copies that hurt latency and throughput. This guide covers the full stack from camera to inference to output.

Key Insights

GStreamer is the right capture layer — it uses hardware ISP and keeps frames in device memory; OpenCV VideoCapture forces a CPU copy on MIPI cameras
NNStreamer bridges GStreamer and inference — it runs TFLite/ONNX inference as a GStreamer element with no copy between the pipeline and the model
INT8 quantized models on the NPU are 4–6× faster than FP32 on CPU; quantize your models before deployment
USB cameras are simpler to start — they appear as standard V4L2 devices, no ISP bring-up required; switch to MIPI CSI for production
OpenCV for post-processing, GStreamer for capture — use each for what it’s good at; don’t use OpenCV VideoCapture for MIPI cameras

Camera capture options

Option 1: USB camera (V4L2 / UVC)

The easiest starting point. USB cameras appear as /dev/videoN on Genio and work with standard V4L2 tools immediately.

# List connected cameras
v4l2-ctl --list-devices

# Check supported formats
v4l2-ctl -d /dev/video0 --list-formats-ext

# Capture test frame
v4l2-ctl -d /dev/video0 --stream-mmap --stream-count=1 \
  --stream-to=frame.raw

GStreamer capture from USB camera:

gst-launch-1.0 \
  v4l2src device=/dev/video0 ! \
  video/x-raw,format=YUY2,width=640,height=480,framerate=30/1 ! \
  videoconvert ! \
  video/x-raw,format=RGB ! \
  autovideosink

Option 2: MIPI CSI camera (ISP pipeline)

MIPI CSI cameras connect to the Genio CSI connector and go through the hardware ISP. The ISP handles auto-exposure, auto-white-balance, and noise reduction automatically.

# Check that the camera sensor was probed
dmesg | grep -i "sensor\|imx\|ov"

# List V4L2 devices including subdevices
v4l2-ctl --list-devices
media-ctl -d /dev/media0 --print-topology

GStreamer MIPI CSI capture (using libcamera on Ubuntu):

gst-launch-1.0 \
  libcamerasrc camera-name="/base/soc/seninf@1a040000/port@0" ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
  videoconvert ! autovideosink

On Yocto with V4L2 backend:

gst-launch-1.0 \
  v4l2src device=/dev/video0 ! \
  video/x-raw,format=NV12,width=1920,height=1080 ! \
  videoconvert ! autovideosink

OpenCV with GStreamer capture

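Avoid cv2.VideoCapture(0) for MIPI CSI cameras: the default V4L2 backend bypasses the GStreamer hardware path and adds a CPU copy per frame. Instead, pass a GStreamer pipeline ending in appsink to cv2.VideoCapture with the cv2.CAP_GSTREAMER backend (this requires an OpenCV build with GStreamer support):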

import cv2

# GStreamer pipeline that feeds into OpenCV.
# videoconvert produces BGR, since most cameras do not output BGR natively.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink name=sink max-buffers=1 drop=true sync=false"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("Could not open pipeline; check OpenCV's GStreamer support")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # frame is a 480x640x3 BGR numpy array
    # Process with OpenCV
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)

    cv2.imshow("Edges", edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
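
The pipeline string above can also be generated from a small helper, so that device, resolution, and frame rate are set in one place. This is a minimal sketch: build_capture_pipeline is a hypothetical name, not part of OpenCV or GStreamer, and it assumes the camera's native format can be handled by videoconvert.

```python
def build_capture_pipeline(device="/dev/video0", width=640, height=480, fps=30):
    """Return a GStreamer launch string ending in an appsink for OpenCV.

    Hypothetical helper for illustration; adjust the elements to your camera.
    """
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw,width={width},height={height},framerate={fps}/1 ! "
        # Convert to BGR in the pipeline so OpenCV receives its native layout
        "videoconvert ! video/x-raw,format=BGR ! "
        # Keep only the newest frame so a slow consumer never builds up latency
        "appsink max-buffers=1 drop=true sync=false"
    )


if __name__ == "__main__":
    print(build_capture_pipeline("/dev/video2", 1280, 720))
```

The returned string is passed to cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER) exactly as in the example above.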

TFLite inference with Neuron Stable Delegate

For object detection (SSD MobileNet example):

import tflite_runtime.interpreter as tflite
import cv2
import numpy as np

# Load model with NPU acceleration
interpreter = tflite.Interpreter(
    model_path="ssd_mobilenet_v2_int8.tflite",
    experimental_delegates=[
        tflite.load_delegate("libNeuronStableDelegate.so")
    ]
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']  # [1, 300, 300, 3]
height, width = int(input_shape[1]), int(input_shape[2])

# GStreamer capture, scaled to the model input size.
# The appsink delivers BGR, the layout OpenCV's GStreamer backend expects.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480 ! "
    "videoconvert ! videoscale ! "
    f"video/x-raw,format=BGR,width={width},height={height} ! "
    "appsink max-buffers=1 drop=true sync=false"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess: convert to RGB for the model and add a batch dimension
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_data = np.expand_dims(rgb, axis=0)
    if input_details[0]['dtype'] == np.uint8:
        input_data = input_data.astype(np.uint8)
    else:
        # Float model: normalize pixel values to [-1, 1]
        input_data = ((input_data / 255.0 - 0.5) / 0.5).astype(np.float32)

    # Inference
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    # SSD MobileNet outputs. The order (boxes, classes, scores) varies
    # between model exports; check output_details for your model.
    boxes   = interpreter.get_tensor(output_details[0]['index'])[0]
    classes = interpreter.get_tensor(output_details[1]['index'])[0]
    scores  = interpreter.get_tensor(output_details[2]['index'])[0]

    # Draw boxes with score >= 0.5 (boxes are normalized ymin, xmin, ymax, xmax)
    h, w = frame.shape[:2]
    for i, score in enumerate(scores):
        if score < 0.5:
            continue  # do not assume the scores are sorted
        ymin, xmin, ymax, xmax = boxes[i]
        cv2.rectangle(frame,
            (int(xmin * w), int(ymin * h)),
            (int(xmax * w), int(ymax * h)),
            (0, 255, 0), 2)

    cv2.imshow("Detection", frame)  # frame is already BGR
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
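
The loop above draws on the model-sized frame. To draw on a frame of a different size (for example the original 640x480 capture), the normalized boxes have to be scaled to that frame instead. The following is a minimal sketch of that step in plain Python; scale_detections is a hypothetical helper, and it assumes the boxes use the normalized [ymin, xmin, ymax, xmax] layout shown above.

```python
def scale_detections(boxes, scores, frame_w, frame_h, threshold=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel rectangles.

    Returns a list of (x0, y0, x1, y1, score) tuples for detections whose
    score is at least `threshold`. Hypothetical helper for illustration.
    """
    def clamp01(v):
        # Detectors can emit coordinates slightly outside the image
        return min(max(float(v), 0.0), 1.0)

    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < threshold:
            continue
        results.append((
            int(clamp01(xmin) * frame_w),
            int(clamp01(ymin) * frame_h),
            int(clamp01(xmax) * frame_w),
            int(clamp01(ymax) * frame_h),
            float(score),
        ))
    return results


if __name__ == "__main__":
    demo_boxes = [[0.25, 0.5, 0.75, 1.2], [0.0, 0.0, 1.0, 1.0]]
    demo_scores = [0.9, 0.3]
    print(scale_detections(demo_boxes, demo_scores, 640, 480))
    # → [(320, 120, 640, 360, 0.9)]
```

The function accepts plain lists or numpy arrays, so the boxes and scores arrays from the interpreter can be passed in directly.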

NNStreamer: inference inside GStreamer

NNStreamer runs TFLite inference as a native GStreamer element, eliminating the copy between the pipeline and the model:

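# Object detection with NNStreamer + NPU.
# A tee splits one camera stream: one branch goes straight to the display,
# the other runs through the model. The decoder's box overlay is composited
# on top of the camera image. option1=mobilenet-ssd-postprocess is the decoder
# mode that takes a tensor mapping in option3.
gst-launch-1.0 \
  v4l2src device=/dev/video0 ! \
  video/x-raw,width=640,height=480,framerate=30/1 ! \
  videoconvert ! video/x-raw,format=RGB ! \
  tee name=t \
  t. ! queue ! mix.sink_0 \
  t. ! queue leaky=2 max-size-buffers=2 ! \
  videoscale ! video/x-raw,width=300,height=300 ! \
  tensor_converter ! \
  tensor_filter \
    framework=tflite \
    model=ssd_mobilenet_v2_int8.tflite \
    accelerator=true:npu ! \
  tensor_decoder \
    mode=bounding_boxes \
    option1=mobilenet-ssd-postprocess \
    option2=labels.txt \
    option3=0:1:2:3 \
    option4=640:480 \
    option5=300:300 ! \
  mix.sink_1 \
  compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! \
  videoconvert ! waylandsink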

NNStreamer is included in packagegroup-rity-ai-ml in the RITY Yocto image.

Multi-camera setup

For applications requiring multiple cameras simultaneously:

import threading
import cv2

class CameraThread(threading.Thread):
    def __init__(self, device, name):
        # daemon=True so a stalled camera cannot block interpreter exit
        super().__init__(name=name, daemon=True)
        self.cap = cv2.VideoCapture(
            f"v4l2src device={device} ! "
            "video/x-raw,width=640,height=480 ! "
            "videoconvert ! video/x-raw,format=BGR ! "
            "appsink max-buffers=1 drop=true sync=false",
            cv2.CAP_GSTREAMER
        )
        self.frame = None  # latest frame, overwritten on every read
        self.running = True

    def run(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                self.frame = frame
        self.cap.release()

    def stop(self):
        # Ask the capture loop to exit; the camera is released in run()
        self.running = False

cam0 = CameraThread("/dev/video0", "cam0")
cam1 = CameraThread("/dev/video2", "cam1")
cam0.start()
cam1.start()

Performance tips for CV pipelines on Genio

Use INT8 quantized models. On the Genio NPU, INT8 inference is 2–3× faster than FP16 and uses less memory bandwidth. Quantize with TFLite’s post-training quantization before deployment.
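
The arithmetic behind INT8 quantization can be sketched in a few lines. TFLite's quantization scheme maps a real value x to an integer as q = round(x / scale) + zero_point and recovers it as x ≈ scale * (q - zero_point). The scale and zero_point values below are illustrative only; for a real model they come from input_details[0]['quantization'].

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to a signed 8-bit integer (affine quantization)."""
    # Python's round() rounds halves to even; a runtime may round differently
    q = round(value / scale) + zero_point
    # Saturate instead of wrapping when the value is out of range
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from a quantized integer."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    scale, zero_point = 1 / 128, 0            # illustrative parameters
    print(quantize(0.5, scale, zero_point))   # → 64
    print(quantize(2.0, scale, zero_point))   # → 127 (saturated)
    print(dequantize(64, scale, zero_point))  # → 0.5
```

Values outside the representable range saturate at the integer limits, which is why the calibration data used for post-training quantization should cover the range of real inputs.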

Skip frames if needed. If your pipeline can’t sustain real-time at 30fps, drop frames at the capture stage rather than queuing them. Use max-buffers=1 drop=true on the GStreamer appsink.
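
The same drop-oldest policy can be applied anywhere in application code that hands frames from a fast producer to a slower consumer. The sketch below mimics the effect of max-buffers=1 drop=true with a one-slot deque; LatestFrameBuffer is a hypothetical class, and integers stand in for frames.

```python
from collections import deque


class LatestFrameBuffer:
    """Hold only the newest frame; older unread frames are discarded."""

    def __init__(self):
        self._slot = deque(maxlen=1)  # appending to a full deque evicts the old item
        self.dropped = 0              # approximate count of frames never consumed

    def push(self, frame):
        if self._slot:
            self.dropped += 1
        self._slot.append(frame)

    def pop(self):
        """Return the newest frame, or None if nothing new has arrived."""
        return self._slot.popleft() if self._slot else None


if __name__ == "__main__":
    buf = LatestFrameBuffer()
    for frame_id in range(5):   # producer runs ahead of the consumer
        buf.push(frame_id)
    print(buf.pop())            # → 4
    print(buf.dropped)          # → 4
    print(buf.pop())            # → None
```

Tracking the dropped count is optional, but it gives a cheap signal of how far the consumer is falling behind the camera.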

Separate capture and inference threads. Camera capture and NPU inference are independent hardware blocks. Running them in separate threads allows full hardware utilization — the NPU runs the previous frame while the ISP captures the next.
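
A minimal sketch of this two-thread arrangement, using only the standard library, is shown below. The capture_fn and infer_fn arguments are stand-ins for cap.read() and interpreter.invoke(); run_pipeline is a hypothetical name. The one-slot queue lets the capture thread fetch the next frame while the inference thread is still working on the previous one.

```python
import queue
import threading


def run_pipeline(capture_fn, infer_fn, num_frames):
    """Run capture and inference in separate threads joined by a one-slot queue."""
    frames = queue.Queue(maxsize=1)
    results = []
    done = object()  # sentinel that marks the end of the stream

    def capture_worker():
        for i in range(num_frames):
            # put() blocks while the slot is full, so capture stays at most
            # one frame ahead of inference
            frames.put(capture_fn(i))
        frames.put(done)

    def inference_worker():
        while True:
            frame = frames.get()
            if frame is done:
                break
            results.append(infer_fn(frame))

    workers = [
        threading.Thread(target=capture_worker),
        threading.Thread(target=inference_worker),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Stand-ins: "capture" returns the frame index, "inference" doubles it
    print(run_pipeline(lambda i: i, lambda f: f * 2, 5))  # → [0, 2, 4, 6, 8]
```

This version blocks the capture thread rather than dropping frames, which keeps the example deterministic; a live pipeline would typically combine it with the drop-oldest policy described above.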

Avoid the default cv2.VideoCapture backend for MIPI cameras. cv2.VideoCapture(0) forces a CPU copy at every frame. Open a GStreamer pipeline ending in appsink with cv2.CAP_GSTREAMER instead and receive the frames as numpy arrays.

For the NPU inference stack details including model conversion and quantization, see on-device AI without the cloud on Genio. For MIPI CSI camera driver bring-up, see MIPI CSI camera driver setup on Genio.

FAQ

What camera interfaces does MediaTek Genio support for computer vision?

Genio supports MIPI CSI-2 cameras (2–3 four-lane interfaces depending on platform) and USB UVC cameras. MIPI CSI cameras go through the hardware ISP via libcamera or V4L2. USB cameras appear as standard V4L2 video devices.

Should I use GStreamer or OpenCV for camera capture on Genio?

Use GStreamer for camera capture. It uses hardware-accelerated elements and keeps frames in device memory. Feed preprocessed frames from GStreamer’s appsink into OpenCV or TFLite for inference.

What is the fastest way to run object detection on Genio?

GStreamer MIPI CSI capture → hardware ISP → tensor_filter with TFLite INT8 model + NeuronStableDelegate → NNStreamer tensor_decoder → overlay on Wayland display. This keeps data in device memory across all stages and uses the NPU for inference.

Does OpenCV support hardware acceleration on Genio?

OpenCV on Genio uses the ARM CPU (NEON SIMD) for most operations. For AI inference, TFLite and ONNX Runtime with NeuronEP are faster than OpenCV’s DNN module because they use the dedicated MDLA NPU.