MediaTek Genio for computer vision: a practical guide
Genio is well-matched to computer vision applications: it has MIPI CSI-2 interfaces with hardware ISP, a Mali GPU for OpenCL/Vulkan compute, and an MDLA NPU for inference. The challenge is wiring these together efficiently — the wrong capture path adds unnecessary CPU copies that hurt latency and throughput. This guide covers the full stack from camera to inference to output.
Key Insights
- GStreamer is the right capture layer: it uses the hardware ISP and keeps frames in device memory, whereas cv2.VideoCapture with its default V4L2 backend forces a CPU copy on MIPI cameras
- NNStreamer bridges GStreamer and inference — it runs TFLite/ONNX inference as a GStreamer element with no copy between the pipeline and the model
- INT8 quantized models on the NPU are 4–6× faster than FP32 on CPU; quantize your models before deployment
- USB cameras are simpler to start — they appear as standard V4L2 devices, no ISP bring-up required; switch to MIPI CSI for production
- OpenCV for post-processing, GStreamer for capture: use each for what it’s good at; don’t open MIPI cameras with cv2.VideoCapture(0), pass a GStreamer pipeline with cv2.CAP_GSTREAMER instead
Camera capture options
Option 1: USB camera (V4L2 / UVC)
The easiest starting point. USB cameras appear as /dev/videoN on Genio and work with standard V4L2 tools immediately.
# List connected cameras
v4l2-ctl --list-devices
# Check supported formats
v4l2-ctl -d /dev/video0 --list-formats-ext
# Capture test frame
v4l2-ctl -d /dev/video0 --stream-mmap --stream-count=1 \
--stream-to=frame.raw
GStreamer capture from USB camera:
gst-launch-1.0 \
v4l2src device=/dev/video0 ! \
video/x-raw,format=YUY2,width=640,height=480,framerate=30/1 ! \
videoconvert ! \
video/x-raw,format=RGB ! \
autovideosink
Option 2: MIPI CSI camera (ISP pipeline)
MIPI CSI cameras connect to the Genio CSI connector and go through the hardware ISP. The ISP handles auto-exposure, auto-white-balance, and noise reduction automatically.
# Check that the camera sensor was probed
dmesg | grep -i "sensor\|imx\|ov"
# List V4L2 devices including subdevices
v4l2-ctl --list-devices
media-ctl -d /dev/media0 --print-topology
GStreamer MIPI CSI capture (using libcamera on Ubuntu):
gst-launch-1.0 \
libcamerasrc camera-name="/base/soc/seninf@1a040000/port@0" ! \
video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
videoconvert ! autovideosink
On Yocto with V4L2 backend:
gst-launch-1.0 \
v4l2src device=/dev/video0 ! \
video/x-raw,format=NV12,width=1920,height=1080 ! \
videoconvert ! autovideosink
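To put the cost of an extra CPU copy into numbers, here is a rough sketch that computes the size of one raw frame and the number of bytes moved per second if every frame is copied once. The bytes-per-pixel values are the standard ones for these formats. The helper names are illustrative and not part of any Genio or GStreamer API, and the calculation assumes no row padding.

```python
from fractions import Fraction

# Average bytes per pixel for the raw formats used in this guide.
BYTES_PER_PIXEL = {
    "NV12": Fraction(3, 2),  # full-size Y plane + half-size interleaved UV plane
    "YUY2": Fraction(2),     # packed 4:2:2
    "RGB": Fraction(3),
    "BGR": Fraction(3),
}

def frame_bytes(fmt, width, height):
    """Size of one raw frame in bytes, assuming no row padding."""
    return int(BYTES_PER_PIXEL[fmt] * width * height)

def copied_mb_per_s(fmt, width, height, fps, copies=1):
    """Bytes copied per second (in units of 10^6 bytes) for `copies` full-frame copies per frame."""
    return frame_bytes(fmt, width, height) * fps * copies / 1e6

if __name__ == "__main__":
    for fmt, w, h in [("YUY2", 640, 480), ("NV12", 1920, 1080)]:
        print(fmt, w, h, frame_bytes(fmt, w, h), "bytes/frame,",
              copied_mb_per_s(fmt, w, h, 30), "MB/s per copy at 30 fps")
```

The same frame_bytes value is a quick sanity check for the frame.raw file written by v4l2-ctl earlier: if the driver adds no padding, the file size should match the negotiated format and resolution.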
OpenCV with GStreamer capture
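Avoid cv2.VideoCapture(0) for MIPI CSI cameras: it opens the device through OpenCV's default V4L2 backend and adds a CPU copy. Instead, pass a GStreamer pipeline ending in appsink to cv2.VideoCapture with cv2.CAP_GSTREAMER (this requires an OpenCV build with GStreamer support):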
import cv2

# GStreamer pipeline that feeds into OpenCV.
# videoconvert is included because most cameras do not output BGR natively.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink name=sink max-buffers=1 drop=true sync=false"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("Could not open the GStreamer pipeline")
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # frame is a 480x640x3 BGR numpy array
    # Process with OpenCV
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    cv2.imshow("Edges", edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
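When several scripts need the same capture string with different devices or resolutions, it can help to build it in one place. The following is a minimal sketch, assuming a V4L2 source and an appsink consumer as in the example above. The function name and its defaults are hypothetical. It places videoconvert before the output caps so that cameras without native BGR output can still negotiate.

```python
def build_appsink_pipeline(device="/dev/video0", width=640, height=480,
                           fps=30, out_format="BGR"):
    """Return a GStreamer launch string ending in appsink, for cv2.VideoCapture."""
    if width <= 0 or height <= 0 or fps <= 0:
        raise ValueError("width, height and fps must be positive")
    elements = [
        f"v4l2src device={device}",
        f"video/x-raw,width={width},height={height},framerate={fps}/1",
        "videoconvert",                       # convert the camera format to out_format
        f"video/x-raw,format={out_format}",
        "appsink max-buffers=1 drop=true sync=false",  # keep only the newest frame
    ]
    return " ! ".join(elements)

if __name__ == "__main__":
    print(build_appsink_pipeline("/dev/video2", 1280, 720, 30))
```

The returned string would be passed as the first argument to cv2.VideoCapture together with cv2.CAP_GSTREAMER.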
TFLite inference with Neuron Stable Delegate
For object detection (SSD MobileNet example):
import tflite_runtime.interpreter as tflite
import cv2
import numpy as np

# Load model with NPU acceleration
interpreter = tflite.Interpreter(
    model_path="ssd_mobilenet_v2_int8.tflite",
    experimental_delegates=[
        tflite.load_delegate("libNeuronStableDelegate.so")
    ]
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']  # [1, 300, 300, 3]
height, width = input_shape[1], input_shape[2]

# GStreamer capture, scaled to the model input size.
# OpenCV's GStreamer backend works with BGR frames, so RGB conversion happens in Python.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480 ! "
    "videoconvert ! videoscale ! "
    f"video/x-raw,format=BGR,width={width},height={height} ! "
    "appsink max-buffers=1 drop=true sync=false"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Preprocess: the model takes RGB input
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_data = np.expand_dims(rgb, axis=0)
    if input_details[0]['dtype'] == np.uint8:
        input_data = input_data.astype(np.uint8)
    else:
        input_data = ((input_data / 255.0 - 0.5) / 0.5).astype(np.float32)
    # Inference
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    # SSD MobileNet outputs (the order can differ between exported models; check output_details)
    boxes = interpreter.get_tensor(output_details[0]['index'])[0]
    classes = interpreter.get_tensor(output_details[1]['index'])[0]
    scores = interpreter.get_tensor(output_details[2]['index'])[0]
    # Draw boxes with score > 0.5
    h, w = frame.shape[:2]
    for i, score in enumerate(scores):
        if score < 0.5:
            continue  # do not rely on the scores being sorted
        ymin, xmin, ymax, xmax = boxes[i]
        cv2.rectangle(frame,
                      (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)),
                      (0, 255, 0), 2)
    cv2.imshow("Detection", frame)  # frame is already BGR
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
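The loop above covers uint8 and float32 inputs. Some fully quantized models take int8 inputs instead, in which case the input needs to be mapped with the tensor's scale and zero point, which TFLite reports as input_details[0]['quantization']. The sketch below shows that affine mapping with NumPy. The function names are hypothetical and the parameter values in the demo are illustrative, not taken from any particular model.

```python
import numpy as np

def quantize_input(x_float, scale, zero_point, dtype=np.int8):
    """Affine quantization: q = round(x / scale) + zero_point, clipped to the dtype range."""
    if scale == 0:
        # TFLite reports (0.0, 0) for tensors that are not quantized
        raise ValueError("tensor is not quantized (scale is 0)")
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x_float, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize_output(q, scale, zero_point):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale

if __name__ == "__main__":
    # Illustrative parameters; real ones come from input_details[0]['quantization']
    scale, zero_point = 0.0078125, 0
    normalized = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
    q = quantize_input(normalized, scale, zero_point)
    print(q, dequantize_output(q, scale, zero_point))
```

The same dequantize step applies to quantized output tensors, using the scale and zero point from the corresponding output_details entry.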
NNStreamer: inference inside GStreamer
NNStreamer runs TFLite inference as a native GStreamer element, eliminating the copy between the pipeline and the model:
# Object detection with NNStreamer + NPU
gst-launch-1.0 \
  compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! \
    videoconvert ! waylandsink \
  v4l2src device=/dev/video0 ! \
    videoconvert ! \
    video/x-raw,format=RGB,width=640,height=480,framerate=30/1 ! \
    tee name=t \
  t. ! queue ! mix.sink_0 \
  t. ! queue leaky=2 max-size-buffers=2 ! \
    videoscale ! video/x-raw,width=300,height=300 ! \
    tensor_converter ! \
    tensor_filter \
      framework=tflite \
      model=ssd_mobilenet_v2_int8.tflite \
      accelerator=true:npu ! \
    tensor_decoder \
      mode=bounding_boxes \
      option1=mobilenet-ssd-postprocess \
      option2=labels.txt \
      option3=0:1:2:3 \
      option4=640:480 \
      option5=300:300 ! \
    mix.sink_1
NNStreamer is included in packagegroup-rity-ai-ml in the RITY Yocto image.
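One detail that tends to cause confusion when debugging these pipelines: NNStreamer writes tensor dimensions innermost-first, so a TFLite input shape of [1, 300, 300, 3] shows up as 3:300:300:1 in the caps that gst-launch-1.0 -v prints after tensor_converter. A small sketch of that conversion follows; the function name is hypothetical.

```python
def nhwc_to_nnstreamer_dim(shape):
    """Convert a TFLite-style shape (batch, height, width, channels)
    to NNStreamer's innermost-first text form 'channels:width:height:batch'."""
    return ":".join(str(int(d)) for d in reversed(list(shape)))

if __name__ == "__main__":
    print(nhwc_to_nnstreamer_dim([1, 300, 300, 3]))
    print(nhwc_to_nnstreamer_dim([1, 480, 640, 3]))
```

Comparing this string with the model's input shape is a quick way to confirm that the videoscale caps match what the model expects.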
Multi-camera setup
For applications requiring multiple cameras simultaneously:
import threading
import cv2

class CameraThread(threading.Thread):
    def __init__(self, device, name):
        super().__init__(name=name, daemon=True)
        self.cap = cv2.VideoCapture(
            f"v4l2src device={device} ! "
            "video/x-raw,width=640,height=480 ! "
            "videoconvert ! video/x-raw,format=BGR ! "
            "appsink max-buffers=1 drop=true sync=false",
            cv2.CAP_GSTREAMER
        )
        self.frame = None  # most recent frame, replaced on every successful read
        self.running = True

    def run(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                self.frame = frame
        self.cap.release()

    def stop(self):
        # Ask the capture loop to exit, then wait until the camera is released
        self.running = False
        self.join()

cam0 = CameraThread("/dev/video0", "cam0")
cam1 = CameraThread("/dev/video2", "cam1")
cam0.start()
cam1.start()
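Two free-running capture threads do not deliver frames at the same instant, so applications that combine views often need to pair frames that were captured close together. The sketch below assumes each thread also records a capture timestamp per frame (for example in integer milliseconds), which the class above does not do. The function name and the tolerance value are hypothetical.

```python
def pair_within_tolerance(ts_a, ts_b, tolerance):
    """Pair entries of two ascending timestamp lists whose difference is
    at most `tolerance`. Returns (index_a, index_b) tuples; each frame is
    used at most once and unmatched frames are skipped."""
    pairs = []
    i = j = 0
    while i < len(ts_a) and j < len(ts_b):
        diff = ts_a[i] - ts_b[j]
        if abs(diff) <= tolerance:
            pairs.append((i, j))
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # frame from camera A is older than anything left in B
        else:
            j += 1  # frame from camera B is older than anything left in A
    return pairs

if __name__ == "__main__":
    cam0_ms = [0, 33, 66, 100]
    cam1_ms = [2, 40, 64, 98, 130]
    print(pair_within_tolerance(cam0_ms, cam1_ms, tolerance=5))
```

This is a simple greedy match that works for streams running at roughly the same rate. Hardware-synchronized sensors would make it unnecessary.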
Performance tips for CV pipelines on Genio
Use INT8 quantized models. On the Genio NPU, INT8 inference is 2–3× faster than FP16 and uses less memory bandwidth. Quantize with TFLite’s post-training quantization before deployment.
Skip frames if needed. If your pipeline can’t sustain real-time at 30fps, drop frames at the capture stage rather than queuing them. Use max-buffers=1 drop=true on the GStreamer appsink.
Separate capture and inference threads. Camera capture and NPU inference are independent hardware blocks. Running them in separate threads allows full hardware utilization — the NPU runs the previous frame while the ISP captures the next.
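Avoid the default cv2.VideoCapture backend for MIPI cameras. Opening the device by index, as in cv2.VideoCapture(0), forces a CPU copy at every frame. Open a GStreamer pipeline ending in appsink with cv2.CAP_GSTREAMER and receive frames as numpy arrays from there.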
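The frame-dropping and thread-separation tips can be combined in a single hand-off object between the capture thread and the inference thread. The following is a minimal sketch using only the Python standard library. The class name is hypothetical, and the integers in the demo stand in for camera frames and for the call to interpreter.invoke().

```python
import threading

class LatestFrameSlot:
    """Single-slot mailbox: the producer overwrites, the consumer takes the newest item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._has_item = False
        self.dropped = 0  # items overwritten before the consumer read them

    def put(self, item):
        with self._cond:
            if self._has_item:
                self.dropped += 1  # consumer was too slow, the old frame is discarded
            self._item = item
            self._has_item = True
            self._cond.notify()

    def get(self, timeout=None):
        """Block until an item is available; return None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_item, timeout):
                return None
            item, self._item, self._has_item = self._item, None, False
            return item

if __name__ == "__main__":
    slot = LatestFrameSlot()
    stop = object()   # sentinel that tells the worker to exit
    processed = []

    def inference_worker():
        while True:
            frame = slot.get(timeout=1.0)
            if frame is None or frame is stop:
                break
            processed.append(frame)  # stand-in for running the model

    worker = threading.Thread(target=inference_worker)
    worker.start()
    for n in range(5):               # stand-in for the capture loop
        slot.put(n)
    slot.put(stop)
    worker.join()
    print("processed:", processed, "dropped:", slot.dropped)
```

Because the slot holds one item, the inference thread always works on the newest frame and latency does not build up when the model is slower than the camera. How many frames are dropped in the demo depends on thread scheduling.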
For the NPU inference stack details including model conversion and quantization, see on-device AI without the cloud on Genio. For MIPI CSI camera driver bring-up, see MIPI CSI camera driver setup on Genio.
Frequently Asked Questions
What camera interfaces does MediaTek Genio support for computer vision?
Genio supports MIPI CSI-2 cameras (2–3 four-lane interfaces depending on platform) and USB UVC cameras. MIPI CSI cameras go through the hardware ISP via libcamera or V4L2. USB cameras appear as standard V4L2 video devices. For multi-camera setups, MIPI CSI is preferred due to lower latency and synchronization support.
Should I use GStreamer or OpenCV for camera capture on Genio?
Use GStreamer for camera capture and preprocessing. GStreamer uses hardware-accelerated elements (mtk-video decode, ISP pipeline) and keeps frames in device memory. OpenCV VideoCapture works for USB cameras but copies frames through CPU memory on MIPI CSI cameras. Feed preprocessed frames from GStreamer's appsink into OpenCV or TFLite for inference.
What is the fastest way to run object detection on Genio?
The fastest end-to-end path is: GStreamer MIPI CSI capture → hardware ISP → tensor_filter with TFLite INT8 model + NeuronStableDelegate → NNStreamer tensor_decoder → overlay on Wayland display. This path keeps data in device memory between pipeline stages and uses the NPU for inference.
Does OpenCV support hardware acceleration on Genio?
OpenCV on Genio uses the ARM CPU (NEON SIMD) for most operations. OpenCV's OpenCL backend can offload some ops to the Mali GPU if built with OpenCL support. For AI inference, TFLite and ONNX Runtime with NeuronEP are faster than OpenCV's DNN module on Genio because they use the dedicated MDLA NPU.
Written by
Aarón Angulo, Co-Founder & CEO · ProventusNova
Obsessed with client outcomes. Aarón ensures every engagement delivers real results, on time, on scope, no exceptions.
Related Articles
On-device AI without the cloud on MediaTek Genio
Run AI inference on MediaTek Genio without cloud. NeuroPilot NPU, TFLite, ONNX Runtime, model conversion, and practical deployment patterns for edge AI.
MIPI CSI camera driver setup on MediaTek Genio
How to set up a MIPI CSI-2 camera on MediaTek Genio: device tree, sensor driver, seninf, V4L2 pipeline, and capturing frames with GStreamer.
ISP differences between Genio 510/700 and Genio 520/720
What changed in the MediaTek Genio ISP between Gen 1 (510/700) and Gen 2 (520/720): pipeline depth, virtual channels, and camera driver differences.
APU, NPU, VPU, and MDLA on MediaTek Genio: what each one does
Clear explanation of APU, NPU, VPU, and MDLA on MediaTek Genio. What each accelerator handles, which Genio SoCs include them, and when to use each.