Topic: Performance of Qualcomm's NPU (Read 632 times)


madires (Topic starter)

  • Super Contributor
  • Posts: 8259
  • Country: de
  • A qualified hobbyist ;)
Performance of Qualcomm's NPU
« on: October 17, 2024, 12:07:56 pm »
Benchmarking Qualcomm's NPU on the Microsoft Surface Tablet (https://github.com/usefulsensors/qc_npu_benchmark):
Quote
TL;DR - We see 1.3% of Qualcomm's NPU 45 Teraops/s claim when benchmarking Windows AI PCs

Quote
The first obvious thing is that the NPU results, even without float conversion, are slower than the CPU.
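For scale, the TL;DR percentage can be turned into an absolute number. This is only arithmetic on the two figures quoted above (Qualcomm's 45 TOPS claim and the 1.3% measured fraction); the helper name `effective_gops` is made up for illustration and is not part of the linked benchmark.

```python
# Convert "1.3% of the 45 Teraops/s claim" into an absolute throughput.
CLAIMED_TOPS = 45.0        # Qualcomm's advertised NPU peak, tera-ops per second
MEASURED_FRACTION = 0.013  # 1.3%, as reported in the benchmark's TL;DR


def effective_gops(claimed_tops: float, fraction: float) -> float:
    """Hypothetical helper: achieved throughput in giga-ops (billion ops) per second."""
    return claimed_tops * fraction * 1000.0


achieved = effective_gops(CLAIMED_TOPS, MEASURED_FRACTION)
print(f"NPU achieved roughly {achieved:.0f} billion ops/s")
```

So the measured NPU throughput is on the order of 0.6 TOPS, not 45.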
 

SiliconWizard

  • Super Contributor
  • Posts: 15736
  • Country: fr
Re: Performance of Qualcomm's NPU
« Reply #1 on: October 17, 2024, 02:26:51 pm »
That sounds unfortunate. But we just need more NPUs. :popcorn:
 

Marco

  • Super Contributor
  • Posts: 7021
  • Country: nl
Re: Performance of Qualcomm's NPU
« Reply #2 on: October 17, 2024, 05:22:56 pm »
Quote
By contrast, running the same model on an Nvidia Geforce RTX 4080 Laptop GPU runs in 3.2ms, an equivalent of 2,160 billion operations per second, almost four times the throughput.

If you are that far from compute bound on an RTX 4080, maybe reconsider your ability to make a compute-bound benchmark. I'm pretty sure people get far, far more than that on prefill with good code on small transformer models.
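A quick back-of-envelope check of the quoted GPU figures, using only numbers from this thread: 3.2 ms per run, 2,160 billion ops/s, and (assumed to be the comparison point for "almost four times") the NPU throughput implied by the TL;DR, i.e. 1.3% of 45 TOPS. It does not say anything about the RTX 4080's actual peak; that figure is not given here.

```python
# Sanity-check the quoted RTX 4080 Laptop numbers against the NPU figure.
LATENCY_S = 3.2e-3              # quoted GPU latency per model run: 3.2 ms
GPU_OPS_PER_S = 2160e9          # quoted GPU throughput: 2,160 billion ops/s
NPU_OPS_PER_S = 0.013 * 45e12   # assumption: 1.3% of the 45 TOPS claim (from the TL;DR)

ops_per_run = GPU_OPS_PER_S * LATENCY_S  # total operations implied for one run
ratio = GPU_OPS_PER_S / NPU_OPS_PER_S    # GPU throughput relative to the NPU

print(f"implied work per run: {ops_per_run / 1e9:.2f} billion ops")
print(f"GPU vs NPU throughput: {ratio:.2f}x")
```

The implied workload is only about 7 billion operations per run, and the ratio comes out a little under 4x, consistent with the quote.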
« Last Edit: October 17, 2024, 05:25:39 pm by Marco »
 

DimitriP

  • Super Contributor
  • Posts: 1418
  • Country: us
  • "Best practices" are best not practiced.© Dimitri
Re: Performance of Qualcomm's NPU
« Reply #3 on: October 17, 2024, 06:06:53 pm »
45 trillion operations per second was the limit of their reimagination for the prerelease product [3].

https://www.microsoft.com/en-us/surface/devices/surface-pro-11th-edition#sup3
If three 100 Ohm resistors are connected in parallel, and in series with a 200 Ohm resistor, how many resistors do you have?
 

Marco

  • Super Contributor
  • Posts: 7021
  • Country: nl
Re: Performance of Qualcomm's NPU
« Reply #4 on: October 18, 2024, 06:42:42 am »
Their tensors all have batch size 1 ... this will never be compute bound.
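Why batch size 1 hurts: in a matrix multiply Y = X @ W with X of shape (batch, k) and W of shape (k, n), every weight is fetched but used only `batch` times, so the ops-per-byte ratio (arithmetic intensity) grows roughly linearly with batch size. The sketch below assumes fp16 operands (2 bytes each), square 4096 x 4096 weights, and ideal caching (each operand read once, result written once); these shapes are illustrative and are not the benchmark's actual ones. The function name `matmul_intensity` is hypothetical.

```python
# Arithmetic intensity (ops per byte of memory traffic) of Y = X @ W,
# X: (batch, k), W: (k, n). Assumes ideal caching: X and W read once, Y written once.


def matmul_intensity(batch: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    ops = 2 * batch * k * n                # one multiply + one add per multiply-accumulate
    elems = batch * k + k * n + batch * n  # elements of X read, W read, Y written
    return ops / (elems * bytes_per_elem)


for batch in (1, 8, 64, 512):
    print(f"batch={batch:4d}: {matmul_intensity(batch, 4096, 4096):7.1f} ops/byte")
```

At batch 1 this comes out to about 1 op per byte, so throughput is capped by memory bandwidth long before the compute units are saturated; only larger batches move the workload toward being compute bound.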
 

