EEVblog Electronics Community Forum

Topic started by: AkiTaiyo on March 29, 2021, 02:41:06 pm

Title: NEON Nonsense (ZYNQ-7000)?
Post by: AkiTaiyo on March 29, 2021, 02:41:06 pm
Hi.
Hopefully someone on here can enlighten me as to why something seems to be stripping my intrinsics.
My initial code had the same problem, so I’m now trying something simple: load two vectors, shift every element right by one while narrowing to 16 bits, combine the two halves into an int16x8_t, then store it:
Code:
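int32x4V1 = vld1q_s32(src);              // load src[0..3]
int32x4V2 = vld1q_s32(src+4);            // load src[4..7]
int16x4V1 = vshrn_n_s32(int32x4V1, 1);   // shift each lane right by 1, narrow to 16 bits
int16x4V2 = vshrn_n_s32(int32x4V2, 1);
int16x8_t cmb = vcombine_s16(int16x4V1,int16x4V2); // V1 -> lanes 0..3, V2 -> lanes 4..7
vst1q_s16(dst, cmb);                     // store 8 x int16
src+=8;
dst+=8;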
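For reference, the snippet's intended result can be written as a plain C loop: `dst[i]` is the low 16 bits of `src[i] >> 1` for 8 elements. The sketch below assumes `src` is `const int32_t *` and `dst` is `int16_t *` (the types `vld1q_s32` and `vst1q_s16` take); `shrn8` and `shrn8_scalar` are hypothetical names. The NEON path is only compiled when `__ARM_NEON` is defined, so the scalar version doubles as a check on the intrinsic version.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: dst[i] = low 16 bits of (src[i] >> 1), for 8 elements.
 * Assumes GCC/Clang semantics: >> on a negative int32_t is an arithmetic
 * shift, and converting to int16_t keeps the low 16 bits (no saturation),
 * which matches what VSHRN does. */
static void shrn8_scalar(const int32_t *src, int16_t *dst)
{
    for (size_t i = 0; i < 8; i++) {
        dst[i] = (int16_t)(src[i] >> 1);
    }
}

/* Hypothetical wrapper: NEON path when NEON is enabled at build time,
 * otherwise fall back to the scalar reference. */
static void shrn8(const int32_t *src, int16_t *dst)
{
#if defined(__ARM_NEON)
    int16x4_t lo = vshrn_n_s32(vld1q_s32(src), 1);     /* src[0..3] >> 1, narrowed */
    int16x4_t hi = vshrn_n_s32(vld1q_s32(src + 4), 1); /* src[4..7] >> 1, narrowed */
    vst1q_s16(dst, vcombine_s16(lo, hi));              /* lo -> lanes 0..3, hi -> lanes 4..7 */
#else
    shrn8_scalar(src, dst);
#endif
}
```

Note that the narrowing truncates: 100000 >> 1 = 50000 comes out as -15536. If clamping to the int16 range is wanted instead, `vqshrn_n_s32` is the saturating counterpart.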


If I look at the assembly output from GCC, it looks good:
Code:
308 02ee 61F98D0A vld1.32 {d16-d17}, [r1]!
309 02f2 61F98F2A vld1.32 {d18-d19}, [r1]
310 02f6 DFEF3008 vshrn.i32 d16, q8, #1
311 02fa DFEF3228 vshrn.i32 d18, q9, #1
312 02fe 62EFB211 vmov d17, d18  @ v4hi
313 0302 42F94D0A vst1.16 {d16-d17}, [r2]!

But the disassembly from Eclipse shows something completely different (only part is shown because it’s huge):
Code:
272        int32x4V2 = vld1q_s32(src+4);
00012258:   ldr     r3, [r11, #-8]
0001225c:   ldr     r2, [r3]
00012260:   movw    r3, #53276
00012264:   movt    r3, #129
00012268:   ldr     r3, [r3]
0001226c:   asr     r3, r2, r3
00012270:   sxth    r2, r3
00012274:   ldr     r3, [r11, #-12]
00012278:   strh    r2, [r3]
273        int16x4V1 = vshrn_n_s32(int32x4V1, 1);
0001227c:   ldr     r3, [r11, #-8]
00012280:   add     r3, r3, #4
00012284:   str     r3, [r11, #-8]
274        int16x4V2 = vshrn_n_s32(int32x4V2, 1);
00012288:   ldr     r3, [r11, #-12]
0001228c:   add     r3, r3, #2
00012290:   str     r3, [r11, #-12]
275        int16x8_t cmb = vcombine_s16(int16x4V1,int16x4V2);
00012294:   ldr     r3, [r11, #-40]
00012298:   add     r2, r3, #2
0001229c:   str     r2, [r11, #-40]
000122a0:   ldrh    r3, [r3]
000122a4:   mov     r2, r3
000122a8:   ldr     r3, [r11, #-8]
000122ac:   ldr     r3, [r3]
000122b0:   add     r2, r2, r3
000122b4:   ldr     r3, [r11, #-8]
000122b8:   str     r2, [r3]

So it almost looks like something is stripping out the SIMD instructions and replacing them with scalar versions.
What’s happening here? Does the Eclipse disassembler just misinterpret the instructions, or are the SIMD instructions being replaced at some point? Are my compiler options incorrect?
CPU: Zynq-7000 – running PetaLinux
GCC: arm-none-linux-gnueabihf 10.2
GCC options (tried many different combinations): -O2 -mfpu=neon -fcommon -march=armv7-a+simd+neon+vfpv3-fp16+neon-fp16+mp "-Wa,-ahl=$*.s"
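On the compiler-options question: the GCC listing above already contains `vshrn`, so the flags were taking effect, but a cheap way to rule them out on every build is to make compilation fail when NEON is not enabled. This is a sketch assuming GCC's ACLE feature macros (`__ARM_NEON`, and `__ARM_PCS_VFP` for the hard-float calling convention); `neon_enabled_at_build` and `report_build_flags` are hypothetical names, and the `#error` only fires when targeting 32-bit ARM without NEON.

```c
#include <stdio.h>

/* Fail the build outright if we are targeting 32-bit ARM without NEON.
 * Assumes GCC's ACLE feature macros are available. */
#if defined(__arm__) && !defined(__ARM_NEON)
#error "NEON not enabled: check the -mfpu / -march options"
#endif

/* Hypothetical helper: 1 if this translation unit was compiled with NEON, else 0. */
static int neon_enabled_at_build(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print what the compiler was actually told,
 * to compare against the intended flags. */
static void report_build_flags(void)
{
    printf("NEON at build time: %s\n", neon_enabled_at_build() ? "yes" : "no");
#if defined(__ARM_PCS_VFP)
    puts("Float ABI: hard (VFP registers used for FP arguments)");
#else
    puts("Float ABI: soft/softfp, or not an ARM target");
#endif
}
```

Putting the `#error` check in a header that every NEON source includes means a misconfigured build stops at compile time instead of silently producing scalar code.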
Title: Re: NEON Nonsense (ZYNQ-7000)?
Post by: AkiTaiyo on March 29, 2021, 03:39:39 pm
Never mind... it turns out GDB wasn't replacing the binary on the target, so it was still running an old build...  :palm:
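A simple guard against this kind of stale-binary mix-up is to print a build stamp at startup and compare it with the time of your last build. This is a minimal sketch using the standard `__DATE__`/`__TIME__` macros; `build_stamp`, `get_build_stamp` and `print_build_stamp` are hypothetical names, and the stamp only changes when the file containing it is recompiled.

```c
#include <stdio.h>

/* Baked in when this file is compiled: "built Mmm dd yyyy hh:mm:ss".
 * It only changes when THIS file is recompiled, so make sure the
 * build system rebuilds it every time. */
static const char build_stamp[] = "built " __DATE__ " " __TIME__;

/* Hypothetical accessor for the stamp. */
static const char *get_build_stamp(void)
{
    return build_stamp;
}

/* Call at program start; if the printed time is older than your last
 * build, the target is still running a stale binary. */
static void print_build_stamp(void)
{
    fprintf(stderr, "%s\n", get_build_stamp());
}
```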