Jimmyjames
Elite Member
- Joined: Jul 13, 2022
- Posts: 1,429
This investigation into the M4 ANE has been passed around various online forums and networks today. So far there are two parts.
Part 1:
https://maderix.substack.com/p/inside-the-m4-apple-neural-engine
Part 2:
maderix.substack.com ("Inside the M4 Apple Neural Engine, Part 2: ANE Benchmarks")
In part 1 they claim to access the ANE directly, without going through CoreML, to discover how it works. In part 2 they investigate Apple's "38 TOPS" claim and argue it is misleading: they measure FP16 and INT8 as having equal compute throughput, with INT8 weights "dequantized" on the fly, which saves memory but not compute. Their conclusion is that the 38 TOPS figure comes from Apple following the industry convention of taking FP16 performance and doubling it for the INT8 TOPS number, even though the ANE does not actually double INT8 throughput.
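To make the doubling convention concrete, here is a minimal sketch of the arithmetic. All of the numbers (MAC count, clock) are made-up illustrations, not Apple's published specs:

```python
# Sketch of how a headline "TOPS" figure is typically derived.
# All numbers here are illustrative assumptions, not Apple's specs.

def peak_tops(mac_units: int, clock_hz: float) -> float:
    """Peak trillions of ops/sec; each MAC counts as 2 ops (multiply + add)."""
    return mac_units * clock_hz * 2 / 1e12

# e.g. 9,500 hypothetical MAC units at ~1 GHz:
fp16_tops = peak_tops(9_500, 1.0e9)    # 19.0 TOPS at FP16

# Marketing convention: quote INT8 at double the FP16 rate.
int8_tops_quoted = fp16_tops * 2       # 38.0 "TOPS"

# But if INT8 weights are dequantized to FP16 before hitting the MACs
# (as the articles claim for the ANE), measured INT8 throughput is the
# same as FP16 -- only memory footprint and bandwidth improve.
int8_tops_measured = fp16_tops         # still 19.0
```

The point of the sketch is just that the quoted number and the measured number can differ by exactly 2x without any benchmark being wrong.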
I'm a little surprised by this, since Geekbench AI frequently shows a ~40% uplift for INT8 over FP16. Perhaps the memory savings explain that improvement?
Furthermore, is there really a convention of quoting INT8? I could be misremembering, but I thought the TOPS number is very often unclear about how it was achieved. I believe some manufacturers have quoted INT4 (Qualcomm's Hexagon NPU) and some have aggregated TOPS across CPU, GPU and NPU (Intel's Lunar Lake).
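Those differing conventions are why headline TOPS figures are hard to compare across vendors. A toy illustration, with every number invented for the example:

```python
# Illustration only: the same hypothetical chip yields different headline
# "TOPS" depending on the counting convention. All numbers are made up.
fp16_tops = 19.0  # assumed measured FP16 peak

headline = {
    "FP16, as measured": fp16_tops,
    "INT8, 2x convention": fp16_tops * 2,  # the doubling discussed above
    "INT4, 4x convention": fp16_tops * 4,  # quoting INT4 doubles it again
    # Aggregating across blocks, with hypothetical GPU and CPU contributions:
    "platform total (NPU + GPU + CPU)": fp16_tops * 2 + 8.0 + 2.0,
}
```

Same silicon, four different marketing numbers, which is roughly the ambiguity being complained about.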
In any case, perhaps some will find something interesting or of value in these articles.