That to me indicates that the values that syntax333 uses to call arm_cfft_f32 are 12, 0, 0, 0, 10, 0, 0, 0, 13, 0, 0, 0 etc (12+0i, 0+0i, 10+0i, 0+0i, 13+0i, 0+0i, ...) instead of 12, 0, 10, 0, 13, 0 etc (12+0i, 10+0i, 13+0i, ...).
I modified my wording to hopefully make it clear: by "samples", I mean logical samples as in complex numbers if doing a complex DFT, and real numbers if doing a real DFT; not elements of the array supplied to some specific arm*fft*() call. (Remember, we want the discussion to encompass the Matlab or other-tool derived DFTs for comparison.)
(In the frequency domain, I prefer to use the term 'bin', as in 'frequency bin'. These are always complex numbers. When the input samples are real, only the first half of the bins are useful, since the latter half corresponds to negative frequencies and thus contains no additional information, just a mirrored, complex-conjugated copy. The very first bin is the DC component, i.e. the average of the input samples, up to the scaling of the particular DFT implementation.)
Assuming OP is using github.com/ARM-software/CMSIS/blob/master/CMSIS/DSP_Lib/Source/TransformFunctions/arm_cfft_f32.c:arm_cfft_f32() (or an earlier version), then there are at least two issues in OP's approach. One is the extra zero insertion; the other is using a complex FFT when the samples are purely real.
As I showed above, OP's current ARM code uses a DFT over 512 complex samples, where every odd complex number is zero, 0.0+0.0i, and the last 256 complex numbers are also zero, 0.0+0.0i. The first 128 even complex samples are the real values copied from the data OP supplied.
For the arm_cfft_f32(), the input array contents are indeed { 12.0f, 0.0f, 0.0f, 0.0f, 10.0f, 0.0f, 0.0f, 0.0f, 13.0f, 0.0f, 0.0f, 0.0f, ... }, because for arm_cfft_f32(), each consecutive pair of input f32's forms one complex sample.
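For comparison, this is a minimal sketch of how purely real samples would be laid out for an interleaved complex FFT without the extra zero pairs. pack_real_as_complex() is a hypothetical helper, not a CMSIS function, and plain float is used here in place of CMSIS's float32_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CMSIS: store n real samples as n complex
   samples in interleaved (re, im) order, with every imaginary part zero.
   The output array must have room for 2*n floats. */
void pack_real_as_complex(const float *in, size_t n, float *out)
{
    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i];  /* real part */
        out[2 * i + 1] = 0.0f;   /* imaginary part */
    }
}
```

Given the real samples 12, 10, 13, this produces 12, 0, 10, 0, 13, 0, i.e. 12+0i, 10+0i, 13+0i, with no all-zero complex sample in between.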
The proper function to use with real samples is arm_rfft_fast_f32() (which supersedes the older arm_rfft_f32() and related functions). The input array to this is simply the float samples; the output array is an array of floats, with each bin described by two consecutive elements. The first pair is special: its first element is the DC bin, whose imaginary part is always zero (because the DC component for real samples is always real), and CMSIS uses the second element of that pair to store the likewise purely real Nyquist bin instead of that zero.
Exactly where OP's error is, I cannot tell. It could be that tempBuf_1[] already contains complex samples, i.e. all odd indexes (imaginary parts) are already zero, and the call to PreFFTProcess() expands that so that every other complex sample, i.e. every other f32 pair, supplied to arm_cfft_f32(), ends up being zero. Or, it could be a bug in PreFFTProcess() itself. Also, we haven't seen the contents of the arm_cfft_instance_f32 fftInstance; its fftLen is the number of complex samples, not the number of f32's, supplied to arm_cfft_f32(); here, it should be 256, not 512.
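Since we cannot see PreFFTProcess(), one way to narrow this down is to inspect the buffer just before the FFT call. The following is only a sketch under that assumption; both function names are hypothetical and not part of CMSIS, and plain float stands in for float32_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of floats an interleaved complex buffer
   must hold for an FFT over fft_len complex samples. */
size_t floats_needed(size_t fft_len)
{
    return 2 * fft_len;
}

/* Hypothetical diagnostic: buf holds fft_len complex samples, laid out as
   re, im, re, im, ... Returns 1 if every odd-numbered complex sample
   (1, 3, 5, ...) is exactly 0+0i, which is the symptom of the extra zero
   insertion; returns 0 otherwise. For fft_len below 2 there is no
   odd-numbered sample to check, and the result is 1. */
int odd_complex_samples_all_zero(const float *buf, size_t fft_len)
{
    assert(buf != NULL);

    for (size_t k = 1; k < fft_len; k += 2) {
        if (buf[2 * k] != 0.0f || buf[2 * k + 1] != 0.0f) {
            return 0;
        }
    }
    return 1;
}
```

If the check returns 1 for real-world input, the zero insertion happened before the FFT call. floats_needed(256) is 512, which matches an fftLen of 256 for a 512-element f32 array.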