What is the difference between APU and NPU on MediaTek Genio?

APU (AI Processing Unit) is MediaTek's umbrella term for the entire AI hardware subsystem. NPU (Neural Processing Unit) refers specifically to the programmable inference cores within that subsystem. MediaTek's published TOPS figures cover the NPU cores and the MDLA combined. The NPU is the execution target for NeuroPilot and ONNX Runtime inference.

What does the VPU do on MediaTek Genio, and is it related to AI?

The VPU (Video Processing Unit) handles hardware-accelerated video encode and decode only — H.264, H.265, VP9, and AV1 depending on the platform. It has nothing to do with AI inference. In GStreamer it is exposed through mtkvdec and mtkh265enc elements, which require the NDA build.

What is MDLA and how does it differ from the NPU?

MDLA (MediaTek Deep Learning Accelerator) is a fixed-function matrix accelerator within the APU. It is extremely fast for the operations it supports, primarily convolutions and matrix multiply in CNN inference, but any op it does not support falls back to the NPU. TFLite models compiled with ncc-tflite target the MDLA.

Does the Genio 350 have an NPU or MDLA?

No. The Genio 350 (MT8365) has no NPU or MDLA. It has a Mali-G52 GPU that can run OpenCL-based inference, but there is no dedicated AI accelerator. For NPU workloads, the minimum platform is Genio 510 or 700.

Why do MediaTek's TOPS numbers not match real inference performance?

Peak TOPS is measured under ideal conditions with fully supported ops running entirely on MDLA. Real models have mixed operations, some of which fall back to NPU or CPU. Each fallback reduces throughput significantly. Always benchmark your specific model rather than relying on the SoC spec sheet.
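As a starting point for that benchmarking, here is a minimal latency harness in Python. It is a sketch, not a NeuroPilot tool: benchmark and run_once are names made up for this example, and run_once stands for whatever zero-argument call performs one inference in your own stack.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 5, runs: int = 50) -> dict:
    """Time a zero-argument inference callable and report latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up iterations are discarded: early calls often include one-time
    # costs such as model loading or cache fills.
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "runs": runs,
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with your own inference call.
    print(benchmark(lambda: sum(range(10_000)), warmup=2, runs=20))
```

Run it once per backend (accelerator, CPU fallback) with the same model and input, and compare medians rather than single runs.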

APU, NPU, VPU, and MDLA on MediaTek Genio: what each one does

MediaTek uses four terms (APU, NPU, VPU, and MDLA) that appear throughout the Genio documentation, forum posts, and NeuroPilot SDK. They are not interchangeable: the APU is the umbrella for the NPU and MDLA, while the VPU is a separate video block. Confusing them leads to misrouted workloads and incorrect performance expectations. Here is what each one actually does.

Key Insights

VPU is for video codec only — it is not an AI accelerator and has no connection to the NPU or MDLA
APU is the marketing name for the entire AI subsystem; NPU and MDLA are the actual hardware blocks inside it
MDLA is fast for convolutions but falls back to NPU for unsupported ops — your model’s layer mix determines real throughput
Genio 350 has no NPU or MDLA; the minimum AI-capable platform is Genio 510/700
Published TOPS figures combine NPU and MDLA under peak conditions — benchmark your own model

What is the VPU?

The VPU (Video Processing Unit) is the hardware video codec engine. It handles:

Decode: H.264, H.265, VP9, AV1 (platform-dependent)
Encode: H.264, H.265

In GStreamer, the VPU is accessed through two proprietary elements included in the NDA BSP:

mtkvdec — hardware video decoder
mtkh265enc — hardware H.265 encoder

Without the NDA build, these elements are not available and video encode/decode falls back to CPU-based elements (avdec_h264, x264enc). The VPU has no relationship to AI inference. It does not appear in NeuroPilot, ONNX Runtime, or TFLite contexts.
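The fallback described above can be sketched in Python. The element names mtkvdec and avdec_h264 come from the text; the pipeline layout (filesrc, qtdemux, h264parse, fakesink) and the assumption that mtkvdec accepts the output of h264parse are ours and may need adjusting for your BSP. The helper names are hypothetical.

```python
import shutil
import subprocess

HW_DECODER = "mtkvdec"     # NDA BSP element named in the text
SW_DECODER = "avdec_h264"  # CPU fallback named in the text


def element_available(name: str) -> bool:
    """Return True if gst-inspect-1.0 reports that the element exists."""
    tool = shutil.which("gst-inspect-1.0")
    if tool is None:
        return False
    result = subprocess.run([tool, "--exists", name], capture_output=True)
    return result.returncode == 0


def h264_decode_pipeline(path: str, hw_available: bool) -> str:
    """Build a gst-launch-1.0 style description for an H.264 MP4 file."""
    decoder = HW_DECODER if hw_available else SW_DECODER
    # Assumed layout: demux the MP4, parse H.264, decode, discard frames.
    return f'filesrc location="{path}" ! qtdemux ! h264parse ! {decoder} ! fakesink'


if __name__ == "__main__":
    print(h264_decode_pipeline("input.mp4", element_available(HW_DECODER)))
```

Checking for the element at startup makes the CPU fallback explicit, so a missing NDA build shows up in logs rather than as an unexplained CPU load.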

What is the APU?

APU stands for AI Processing Unit. It is MediaTek’s umbrella term for the entire AI hardware subsystem on the SoC. When MediaTek publishes TOPS figures in datasheets or marketing materials, those figures refer to the APU as a whole — which internally contains the NPU cores and the MDLA.

The APU is the correct term when referring to the hardware block in system-level discussions or power management. In software, you do not program the APU directly — you target the NPU or MDLA through NeuroPilot, ONNX Runtime, or TFLite.

What is the NPU?

The NPU (Neural Processing Unit) refers to the programmable inference cores within the APU. It supports a broad range of operation types and is the execution target for:

NeuroPilot SDK (neuron_delegate)
ONNX Runtime with NeuronExecutionProvider
TFLite with Neuron delegate

The NPU can handle operations that the MDLA cannot — activations, normalizations, recurrent layers, and custom ops. It is more flexible but slower than the MDLA for operations they share.
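For the ONNX Runtime path, a common pattern is to request the accelerator provider first and keep the CPU provider as a fallback. The sketch below assumes the vendor build registers the provider under the name used in this article, NeuronExecutionProvider; provider_order is a hypothetical helper, and the session line is left as a comment because it needs a real model file.

```python
from typing import Iterable, List

NEURON_EP = "NeuronExecutionProvider"  # name as given in the text (assumed)
CPU_EP = "CPUExecutionProvider"        # standard ONNX Runtime CPU provider


def provider_order(available: Iterable[str]) -> List[str]:
    """Prefer the Neuron provider when present, always keep CPU as fallback."""
    order = []
    if NEURON_EP in set(available):
        order.append(NEURON_EP)
    order.append(CPU_EP)
    return order


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime not installed; order would be", provider_order([]))
    else:
        providers = provider_order(ort.get_available_providers())
        print(providers)
        # session = ort.InferenceSession("model.onnx", providers=providers)
```

If the printed list contains only the CPU provider, the model will run, but not on the NPU.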

NPU generation differs across Genio platforms:

Genio 510/700: NPU Gen 1 (APUSYS 2.0)
Genio 520/720: NPU Gen 2 (APUSYS 2.5) with MDLA 5.x
Genio 1200: dual MDLA with NPU Gen 2

What is MDLA?

MDLA (MediaTek Deep Learning Accelerator) is a fixed-function matrix accelerator within the APU. It is optimized specifically for the operations that dominate CNN inference — 2D convolutions, depthwise convolutions, matrix multiply, and batch normalization.

For models where these operations account for most of the compute (ResNet, MobileNet, YOLO), the MDLA provides the highest throughput and lowest power. TFLite models compiled with ncc-tflite are partitioned so supported layers go to MDLA and the rest fall back to NPU or CPU.

The TOPS numbers MediaTek publishes are peak figures that assume every operation stays on the accelerator. A model that runs 100% on MDLA can approach that number. A model with 20% unsupported ops will fall well short of it, because the fallback ops run on slower units.
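To see why a small share of fallback ops matters so much, here is a rough Amdahl-style estimate in Python. It is an illustration only: the slowdown factor of the fallback path is an assumed input, not a MediaTek figure, and the model ignores data transfer between units.

```python
def effective_tops(peak_tops: float, fallback_fraction: float, fallback_slowdown: float) -> float:
    """Estimate sustained throughput when part of the compute leaves the MDLA.

    fallback_fraction: share of the model's operations (0..1) that fall back.
    fallback_slowdown: how many times slower the fallback unit is (assumed).
    """
    if peak_tops <= 0:
        raise ValueError("peak_tops must be positive")
    if not 0.0 <= fallback_fraction <= 1.0:
        raise ValueError("fallback_fraction must be between 0 and 1")
    if fallback_slowdown < 1.0:
        raise ValueError("fallback_slowdown must be at least 1")
    # Time per unit of work, relative to running everything at peak.
    relative_time = (1.0 - fallback_fraction) + fallback_fraction * fallback_slowdown
    return peak_tops / relative_time


if __name__ == "__main__":
    # Hypothetical case: 4 TOPS peak, 20% of ops fall back, fallback 10x slower.
    print(round(effective_tops(4.0, 0.2, 10.0), 2))  # 1.43
```

In this hypothetical case the 20% of ops that fall back account for most of the runtime, and the sustained rate is roughly a third of the peak figure.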

How do they map across Genio platforms?

Platform	SoC	VPU	NPU Gen	MDLA	AI TOPS
Genio 350	MT8365	H.264/H.265 decode	None	None	—
Genio 510	MT8370	H.264/H.265/VP9	Gen 1	None	~2 TOPS
Genio 700	MT8390	H.264/H.265/VP9/AV1	Gen 1	MDLA 4.x	~4 TOPS
Genio 520	MT8371	H.264/H.265/VP9	Gen 2	MDLA 5.x	~4 TOPS
Genio 720	MT8395S	H.264/H.265/VP9/AV1	Gen 2	MDLA 5.x	~6 TOPS
Genio 1200	MT8395	H.264/H.265/VP9/AV1	Gen 2	Dual MDLA 5.x	~8 TOPS

TOPS figures are approximate and reflect typical published values. Actual throughput depends on model and op support.
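The table can also be held as data for a quick shortlist. The values below are copied from the table, so they carry the same approximations. GenioPlatform and shortlist are names made up for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class GenioPlatform:
    name: str
    codecs: FrozenSet[str]
    npu_gen: Optional[int]        # None means no NPU
    mdla: Optional[str]           # None means no MDLA
    approx_tops: Optional[float]  # approximate, as in the table


BASE = frozenset({"H.264", "H.265"})
PLATFORMS = [
    GenioPlatform("Genio 350", BASE, None, None, None),
    GenioPlatform("Genio 510", BASE | {"VP9"}, 1, None, 2.0),
    GenioPlatform("Genio 700", BASE | {"VP9", "AV1"}, 1, "MDLA 4.x", 4.0),
    GenioPlatform("Genio 520", BASE | {"VP9"}, 2, "MDLA 5.x", 4.0),
    GenioPlatform("Genio 720", BASE | {"VP9", "AV1"}, 2, "MDLA 5.x", 6.0),
    GenioPlatform("Genio 1200", BASE | {"VP9", "AV1"}, 2, "Dual MDLA 5.x", 8.0),
]


def shortlist(min_tops: float = 0.0, codec: Optional[str] = None, need_mdla: bool = False) -> List[str]:
    """Return platform names that meet all the given requirements."""
    names = []
    for p in PLATFORMS:
        if (p.approx_tops or 0.0) < min_tops:
            continue
        if codec is not None and codec not in p.codecs:
            continue
        if need_mdla and p.mdla is None:
            continue
        names.append(p.name)
    return names


if __name__ == "__main__":
    print(shortlist(min_tops=4.0, codec="AV1"))
```

Treat the result as a first filter only; the benchmark on your own model decides.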

For guidance on choosing between these platforms, see Genio 510 vs 700 vs 1200: which MediaTek module for your product.

When to use which

VPU: Any video encode or decode workload. Use mtkvdec (decode) or mtkh265enc (encode) in GStreamer. Keep video processing on the VPU to avoid burning GPU or CPU on codec work.

NPU via NeuroPilot: General inference workloads. Use when your model has operations the MDLA does not support, or when using ONNX Runtime with NeuronExecutionProvider.

MDLA via ncc-tflite: CNN-heavy models (object detection, image classification, segmentation). Compile your TFLite model with ncc-tflite to partition it for MDLA execution. Best throughput per watt for supported architectures.

GPU (Mali): OpenCL workloads, compute shaders, image processing. Not the primary inference path in NeuroPilot but useful for preprocessing and custom operators.
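The routing rules in this section can be written down as a small decision function. This is only a restatement of the guidance above in Python; the workload labels and the choose_engine name are invented for the example.

```python
def choose_engine(workload: str, cnn_heavy: bool = False) -> str:
    """Map a workload type to the hardware path suggested in this article."""
    if workload == "video_codec":
        return "VPU"
    if workload == "inference":
        # CNN-heavy models benefit most from MDLA; everything else goes to
        # the more flexible NPU path.
        return "MDLA via ncc-tflite" if cnn_heavy else "NPU via NeuroPilot"
    if workload == "image_processing":
        return "GPU (Mali)"
    raise ValueError(f"unknown workload type: {workload!r}")


if __name__ == "__main__":
    print(choose_engine("inference", cnn_heavy=True))
```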
