The C application in the pipeline partly avoids this issue because of buffering. I suggest you experiment with different period sizes and buffer sizes in arecord. I see you were specifying the `-B` switch, which sets the buffer time, but this doesn't control the number of buffers. The flags to play with are `--period-size` and `--buffer-size`.
That's exactly why I wrote my suggested converter the way I did: it does the minimum number of read syscalls needed to obtain the data already in the pipe, as a whole number of convertible samples up to a compile-time maximum, and immediately forwards the converted data. That should yield minimum latency, but it will still be affected by those arecord options.
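To make that concrete, here is a minimal sketch of the read-then-forward-immediately idea. It is not the converter from the thread: the sample formats (native-endian signed 16-bit in, 32-bit float out), the `MAX_SAMPLES` value, and the names `struct converter`, `convert_sample()`, `write_all()` and `convert_once()` are all assumptions made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define MAX_SAMPLES 4096  /* compile-time maximum samples per pass (assumed value) */

/* Hypothetical conversion: native-endian signed 16-bit to float in [-1, 1). */
static float convert_sample(int16_t s) { return (float)s / 32768.0f; }

struct converter {
    unsigned char in[MAX_SAMPLES * sizeof(int16_t)];
    size_t        have;  /* leftover bytes of a partial sample, kept between passes */
};

/* Write all of buf, retrying on short writes and EINTR. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One pass: a single read() takes whatever is already in the pipe (it only
 * blocks when the pipe is empty), whole samples are converted and forwarded
 * right away, and a trailing partial sample is carried to the next pass.
 * Returns 1 on progress, 0 at end of input, -1 on error. */
static int convert_once(struct converter *c, int in_fd, int out_fd)
{
    float   out[MAX_SAMPLES];
    ssize_t n;

    assert(c->have < sizeof(int16_t));
    do {
        n = read(in_fd, c->in + c->have, sizeof c->in - c->have);
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return (int)n;  /* 0 = end of input, -1 = error */
    c->have += (size_t)n;

    size_t samples = c->have / sizeof(int16_t);
    for (size_t i = 0; i < samples; i++) {
        int16_t s;
        memcpy(&s, c->in + i * sizeof s, sizeof s);
        out[i] = convert_sample(s);
    }

    /* Keep only the bytes that did not form a whole sample. */
    size_t used = samples * sizeof(int16_t);
    memmove(c->in, c->in + used, c->have - used);
    c->have -= used;

    if (samples > 0 && write_all(out_fd, out, samples * sizeof out[0]) < 0)
        return -1;
    return 1;
}
```

A filter built on this would just zero-initialize a `struct converter` and call `convert_once(&c, STDIN_FILENO, STDOUT_FILENO)` in a loop until it stops returning 1. There is no buffering beyond the one kernel pipe buffer on each side, so the period and buffer sizes chosen for arecord are what set the latency.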
If one uses the `<stdio.h>` functions (`fread()`, `fgetc()`, `getc()`, `getchar()`, `fwrite()`, `fputc()`, `putchar()`), then the standard C library will add its own software buffers in between.
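If one wants to keep the stdio interface anyway, the standard way to switch that extra buffer off is `setvbuf()` with `_IONBF`; the alternative is to skip stdio and use `read()`/`write()` directly. A minimal sketch, where `make_unbuffered()` is just a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Ask the C library not to buffer this stream. This has to be called after
 * the stream is opened and before any other operation on it.
 * Returns 0 on success, nonzero on failure (same as setvbuf()). */
static int make_unbuffered(FILE *stream)
{
    assert(stream != NULL);
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

Calling `make_unbuffered(stdout)` first thing in `main()` means each `fwrite()` or `fputc()` is handed to the kernel straight away instead of waiting for the library's buffer to fill. The price is typically many more syscalls, which is one reason to prefer plain `read()`/`write()` on whole blocks of samples.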
While I do approve of the Unix philosophy, and chaining tools to get complex stuff done, when they start interfering with the task at hand (like dropping data), it is time to write a better tool, methinks.
I have already written a clunky VU meter program for Ed Kloonk on top of ALSA in a different thread around here somewhere, which could easily be modified to do e.g. a DFT on the raw data using just the ALSA and FFTW3 libraries (and Gtk+3 for the UI, if desired). Then, the optimum buffer/period size would be either one or one half DFT/FFT window (depending on the desired overlap and windowing function). This stuff isn't difficult, except when you are also fighting against hardware or its drivers... (or are a burned out husk of a man like I am, I guess; which is why I never finished that program for Ed. Sorry, Ed.)
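For a feel of the numbers, the period size that follows from a given DFT window is simple arithmetic. A rough sketch; the helper names `period_frames()` and `period_ms()` are made up here, and only the two cases mentioned above (no overlap, or 50% overlap) are handled:

```c
#include <assert.h>
#include <stddef.h>

/* Frames per period for a DFT window of window_frames frames:
 * the whole window with no overlap, half of it with 50% overlap. */
static size_t period_frames(size_t window_frames, int half_overlap)
{
    assert(window_frames > 0 && window_frames % 2 == 0);
    return half_overlap ? window_frames / 2 : window_frames;
}

/* Time it takes one period to fill at the given sample rate, in milliseconds. */
static double period_ms(size_t frames, unsigned rate_hz)
{
    return 1000.0 * (double)frames / (double)rate_hz;
}
```

For example, a 1024-frame window with 50% overlap gives a 512-frame period, which at 48000 Hz fills in a bit under 11 ms. That frame count is the sort of value one would try with `--period-size`.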