Performance of Qualcomm's NPU
madires:
Benchmarking Qualcomm's NPU on the Microsoft Surface Tablet (https://github.com/usefulsensors/qc_npu_benchmark):
--- Quote ---TL;DR - We see 1.3% of Qualcomm's NPU 45 Teraops/s claim when benchmarking Windows AI PCs
--- End quote ---
--- Quote ---The first obvious thing is that the NPU results, even without float conversion, are slower than the CPU.
--- End quote ---
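For scale, a quick sanity check of what that headline figure works out to, using only the numbers quoted above:

```python
# Back-of-the-envelope: 1.3% of Qualcomm's claimed 45 Teraops/s.
claimed_ops_per_s = 45e12     # Qualcomm's claimed NPU throughput
measured_fraction = 0.013     # ~1.3% reported by the benchmark
measured_ops_per_s = claimed_ops_per_s * measured_fraction
print(f"{measured_ops_per_s / 1e9:.0f} billion ops/s")  # 585 billion ops/s
```

So the benchmark is seeing on the order of 0.6 Teraops/s against a 45 Teraops/s marketing number.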
SiliconWizard:
That sounds unfortunate. But we just need more NPUs. :popcorn:
Marco:
--- Quote ---By contrast, running the same model on an Nvidia Geforce RTX 4080 Laptop GPU runs in 3.2ms, an equivalent of 2,160 billion operations per second, almost four times the throughput.
--- End quote ---
If you are that far from compute bound on an RTX 4080, maybe reconsider whether you've actually built a compute-bound benchmark. I'm pretty sure people get far, far more than that on prefill with good code on small transformer models.
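To make Marco's point concrete: 2,160 billion ops/s is a small fraction of what an RTX 4080 Laptop GPU can sustain on dense matmuls. A rough sketch, where the peak figure is an assumed order-of-magnitude value (tens of TFLOPS for FP16 tensor-core work), not a datasheet number:

```python
# How far the quoted GPU result is from compute bound.
measured_ops_per_s = 2_160e9       # from the quoted benchmark result
assumed_peak_ops_per_s = 60e12     # assumed FP16 tensor-core peak, order of magnitude only
utilization = measured_ops_per_s / assumed_peak_ops_per_s
print(f"utilization ~ {utilization:.1%}")  # a few percent of peak
```

If the GPU itself is only at a few percent of peak, the benchmark is limited by something other than raw compute, which undercuts the comparison for both devices.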
DimitriP:
45 trillion operations per second was the limit of their reimagination for the prerelease product [3].
https://www.microsoft.com/en-us/surface/devices/surface-pro-11th-edition#sup3
Marco:
Their tensors all have batch size 1 ... this will never be compute bound.
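The batch-size-1 point can be shown with arithmetic intensity (FLOPs per byte moved). A matrix-vector product reads the whole weight matrix to do only two FLOPs per weight, so it is memory bound; batching amortizes the weight traffic. A minimal sketch, assuming FP16 weights and activations and the illustrative shape 4096x4096:

```python
def arithmetic_intensity(m, n, batch, bytes_per_elem=2):
    """FLOPs per byte for a (batch x n) @ (n x m) matmul, fp16 storage."""
    flops = 2 * m * n * batch                                  # multiply-accumulate count
    bytes_moved = bytes_per_elem * (m * n + batch * n + batch * m)  # weights + inputs + outputs
    return flops / bytes_moved

print(arithmetic_intensity(4096, 4096, 1))    # ~1 FLOP/byte: firmly memory bound
print(arithmetic_intensity(4096, 4096, 256))  # ~227 FLOPs/byte: can be compute bound
```

At batch 1 every hardware target, NPU or GPU, is limited by memory bandwidth rather than its Teraops rating, so a batch-1 benchmark cannot validate a peak-compute claim.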