
Performance of Qualcomm's NPU


madires:
Benchmarking Qualcomm's NPU on the Microsoft Surface Tablet (https://github.com/usefulsensors/qc_npu_benchmark):

--- Quote ---TL;DR - We see 1.3% of Qualcomm's NPU 45 Teraops/s claim when benchmarking Windows AI PCs

--- End quote ---


--- Quote ---The first obvious thing is that the NPU results, even without float conversion, are slower than the CPU.

--- End quote ---
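
For context, a minimal sketch of how an "effective ops per second" figure like the one quoted above is derived from a timed matmul. This is plain NumPy on the CPU, not the repo's actual ONNX/NPU harness, and the matrix sizes are illustrative assumptions:

--- Code: ---
# Time a matmul and convert the measured runtime into ops/s.
# NumPy on the CPU only; shapes are illustrative, not the benchmark's.
import time
import numpy as np

M, K, N = 1024, 1024, 1024
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

a @ b                                   # warm-up
runs = 10
start = time.perf_counter()
for _ in range(runs):
    a @ b
elapsed = (time.perf_counter() - start) / runs

ops = 2 * M * K * N                     # one multiply + one add per MAC
print(f"{elapsed * 1e3:.2f} ms/run, {ops / elapsed / 1e9:.1f} Gop/s")
--- End code ---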

SiliconWizard:
That sounds unfortunate. But we just need more NPUs. :popcorn:

Marco:

--- Quote ---By contrast, running the same model on an Nvidia Geforce RTX 4080 Laptop GPU runs in 3.2ms, an equivalent of 2,160 billion operations per second, almost four times the throughput.
--- End quote ---

If you are that far from compute bound on an RTX 4080, maybe reconsider your ability to write a compute-bound benchmark. I'm pretty sure people get far, far more than that on prefill with good code on small transformer models.
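
Back-of-envelope using only the figures quoted above: 3.2 ms at an effective 2,160 billion ops/s works out to roughly 7 Gop of work per inference, a small workload by GPU standards.

--- Code: ---
# Uses only the numbers quoted in the post above; nothing here is measured.
time_s = 3.2e-3             # quoted RTX 4080 Laptop GPU runtime
throughput = 2_160e9        # quoted effective operations per second
work = throughput * time_s  # total operations per inference
print(f"{work / 1e9:.1f} Gop per inference")   # ~6.9 Gop
--- End code ---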

DimitriP:
45 trillion operations per second was the limit of their reimagination for the prerelease product [3].

https://www.microsoft.com/en-us/surface/devices/surface-pro-11th-edition#sup3

Marco:
Their tensors all have batch size 1 ... this will never be compute bound.
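
A rough arithmetic-intensity estimate for a single dense layer (illustrative fp16 shape, not taken from the benchmark) shows why batch size 1 stays memory bound: the whole weight matrix is read for only about one flop per byte, and the ratio climbs roughly in proportion to the batch size.

--- Code: ---
# Arithmetic intensity (flops per byte moved) for a dense K x N layer in
# fp16 at various batch sizes. Shapes are illustrative assumptions.
BYTES = 2                    # fp16
K, N = 4096, 4096
for batch in (1, 8, 64, 512):
    flops = 2 * batch * K * N                          # multiply-adds
    bytes_moved = BYTES * (K * N + batch * (K + N))    # weights + in/out activations
    print(f"batch {batch:4d}: {flops / bytes_moved:6.1f} flop/byte")
--- End code ---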
