Hi.
Hopefully someone on here can enlighten me as to why something seems to be stripping my intrinsics.
My initial code had the same problem, so I'm just trying something simple now: load two vectors, shift all the elements right by one with a narrow to 16-bit words, combine the results into an int16x8, then store it:
int32x4V1 = vld1q_s32(src);
int32x4V2 = vld1q_s32(src+4);
int16x4V1 = vshrn_n_s32(int32x4V1, 1);
int16x4V2 = vshrn_n_s32(int32x4V2, 1);
int16x8_t cmb = vcombine_s16(int16x4V1,int16x4V2);
vst1q_s16(dst, cmb);
src+=8;
dst+=8;
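To check what the binary actually computes, regardless of what a disassembler displays, the output can be compared against a plain scalar version of the same operation. The following is only a sketch: the function name shrn1_reference is made up for illustration, and it assumes vshrn_n_s32(x, 1) shifts each lane right by one and keeps the low 16 bits, with no saturation or rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for vshrn_n_s32(x, 1):
 * shift each 32-bit element right by 1, then keep the low 16 bits.
 * Bit 31 of the shifted value is discarded by the narrowing, so a
 * logical shift on the unsigned bit pattern gives the same low half
 * as an arithmetic shift would. */
void shrn1_reference(const int32_t *src, int16_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < count; ++i) {
        uint32_t shifted = (uint32_t)src[i] >> 1;
        /* Truncate to 16 bits. The final cast to int16_t wraps
         * modulo 2^16 on GCC and Clang. */
        dst[i] = (int16_t)(uint16_t)shifted;
    }
}
```

Running this over the same src buffer and comparing with dst from the NEON loop shows whether the result is the same, whichever instructions ended up in the binary.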
If I look at the assembly listing from GCC, it looks good:
308 02ee 61F98D0A vld1.32 {d16-d17}, [r1]!
309 02f2 61F98F2A vld1.32 {d18-d19}, [r1]
310 02f6 DFEF3008 vshrn.i32 d16, q8, #1
311 02fa DFEF3228 vshrn.i32 d18, q9, #1
312 02fe 62EFB211 vmov d17, d18 @ v4hi
313 0302 42F94D0A vst1.16 {d16-d17}, [r2]!
But the disassembly from Eclipse shows something completely different (only part of it shown because it's huge):
272 int32x4V2 = vld1q_s32(src+4);
00012258: ldr r3, [r11, #-8]
0001225c: ldr r2, [r3]
00012260: movw r3, #53276
00012264: movt r3, #129
00012268: ldr r3, [r3]
0001226c: asr r3, r2, r3
00012270: sxth r2, r3
00012274: ldr r3, [r11, #-12]
00012278: strh r2, [r3]
273 int16x4V1 = vshrn_n_s32(int32x4V1, 1);
0001227c: ldr r3, [r11, #-8]
00012280: add r3, r3, #4
00012284: str r3, [r11, #-8]
274 int16x4V2 = vshrn_n_s32(int32x4V2, 1);
00012288: ldr r3, [r11, #-12]
0001228c: add r3, r3, #2
00012290: str r3, [r11, #-12]
275 int16x8_t cmb = vcombine_s16(int16x4V1,int16x4V2);
00012294: ldr r3, [r11, #-40]
00012298: add r2, r3, #2
0001229c: str r2, [r11, #-40]
000122a0: ldrh r3, [r3]
000122a4: mov r2, r3
000122a8: ldr r3, [r11, #-8]
000122ac: ldr r3, [r3]
000122b0: add r2, r2, r3
000122b4: ldr r3, [r11, #-8]
000122b8: str r2, [r3]
So it almost looks like it's stripping the SIMD instructions out and replacing them with scalar versions.
What's happening here? Does the Eclipse disassembler just interpret the instructions wrong, or are the SIMD instructions being replaced at some point? Are my compiler options incorrect?
CPU: Zynq7000 – running PetaLinux
GCC: arm-none-linux-gnueabihf 10.2
GCC options (tried many different combinations): -O2 -mfpu=neon -fcommon -march=armv7-a+simd+neon+vfpv3-fp16+neon-fp16+mp "-Wa,-ahl=$*.s"
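One way to confirm which options a given build really used is to have the code report the compiler's predefined macros. This is a rough sketch with made-up function names. It relies on __ARM_NEON (the ACLE macro, with __ARM_NEON__ as the older GCC spelling) being defined when NEON code generation is enabled, and on GCC defining __OPTIMIZE__ when optimization is on.

```c
/* Hypothetical helpers: report how this translation unit was compiled. */

/* Returns 1 if the compiler was targeting NEON for this file, else 0. */
int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if GCC optimization (-O1 or higher) was enabled, else 0. */
int optimization_enabled_at_compile_time(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Printing both values from the program running on the target shows whether the binary being debugged was built with the flags listed above or comes from a different build configuration.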