Another thought on RLE -- this time not so much why it's bad, but the most likely way it could work: if your 16-bit samples are split into 8-bit bytes, we'd expect the high byte to change less, and less often, than the low byte, which is basically noise (incompressible). I still think it's more likely than not that the high byte changes between consecutive samples, but in the case where it doesn't, RLE on every other byte would work. You'd then have to interleave the runs with literal low bytes, and merge them back together for the output stream.
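A minimal sketch of that idea (names are mine, not from any particular codec): slice the 16-bit samples into separate high- and low-byte streams, RLE the slow-moving high bytes, and keep the low bytes as literals.

```python
# Sketch: byte-slice 16-bit samples, then RLE only the high bytes.
# The low-byte stream stays literal since it's essentially noise.

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode as (value, run_length) pairs, runs capped at 255."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        runs.append((data[i], j - i))
        i = j
    return runs

def split_bytes(samples: list[int]) -> tuple[bytes, bytes]:
    """Slice 16-bit samples into separate high- and low-byte streams."""
    hi = bytes((s >> 8) & 0xFF for s in samples)
    lo = bytes(s & 0xFF for s in samples)
    return hi, lo

samples = [0x1234, 0x1241, 0x1230, 0x12FF, 0x1305]
hi, lo = split_bytes(samples)
runs = rle_encode(hi)  # high bytes 0x12 x4, 0x13 x1 -> just two runs
```

The output stream would then interleave each run header with its corresponding literal low bytes; the break-even point is how often the high byte actually holds still.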
Basically, RLE works great on sequences with little high-frequency content -- where values don't often change from point to point -- but you have literally the opposite case here: the signal is largely high-frequency, when it's anything at all.
And, intuitively at least, it's unlikely that any other kind of skip-N pattern applies. Perhaps the values [nearly] repeat every so many samples, perhaps following Hadamard sequences or something -- but there are a LOT of sequences to check (sequency is equivalent to frequency, i.e. you'd just be doing a crummy DFT), and it's highly unlikely that the signal goes through a perfect (remember: bit-perfect) cycle over any given span. That's at least a little more likely if restricted to the upper bits, hence the note above.
Which also means that LZ compression may struggle -- I bet it would perform a lot better with the high and low bytes sliced apart, by the same logic as above; but for the typical case, it's unlikely that more than modest-length patterns can be found. Again: they must be bit-perfect matches. FLAC seems to do well enough on typical audio (ca. 50%?), so it must not be too bad. But that's also mostly low-frequency or impulsive content (i.e., percussion dominates much of the HF spectrum, so we might expect the HF part to be relatively sparse, while the LF part is more diverse).
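Easy enough to test the slicing claim with DEFLATE (zlib) standing in for a generic LZ coder -- it's LZ77 plus Huffman, not pure LZ, but close enough for a sanity check. The test signal below is synthetic (slow ramp in the high byte, seeded random noise in the low byte), purely to illustrate the effect, not a stand-in for your actual data.

```python
# Compare DEFLATE on the interleaved byte stream vs. on the high and
# low bytes sliced into separate streams. Synthetic 16-bit signal:
# slowly ramping high byte, random (incompressible) low byte.
import random
import zlib

random.seed(0)
n = 4096
samples = [((k // 64) << 8) | random.randrange(256) for k in range(n)]

interleaved = b"".join(s.to_bytes(2, "big") for s in samples)
hi = bytes(s >> 8 for s in samples)
lo = bytes(s & 0xFF for s in samples)

inter_c = zlib.compress(interleaved, 9)
sliced_c = zlib.compress(hi, 9) + zlib.compress(lo, 9)
print(len(inter_c), len(sliced_c))  # sliced should come out noticeably smaller
```

With the noisy low bytes breaking up every potential match in the interleaved stream, the sliced version wins easily here; on real data the gap would depend on how steady the high byte actually is.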
This is probably a bit redundant, as I see FLAC is described as "tailored to audio" -- it's not just naive LZ compression. As I understand it, FLAC fits a linear predictor to each block and entropy-codes the (hopefully small) residuals; and most music is stereo, which contains considerable redundancy it can exploit via inter-channel decorrelation. Probably both together are where the main savings are found? Stereo is a good point: since your signal is mono, you won't have that redundancy to save either.
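For concreteness, FLAC's stereo decorrelation (as I understand the format) stores mid = (L+R)>>1 and side = L-R; the bit dropped from the sum is recoverable from the side channel's parity, so it stays lossless. In miniature:

```python
# Mid/side stereo decorrelation, FLAC-style: for correlated channels
# the side signal is small and cheap to code. Lossless because the
# dropped low bit of (L+R) equals the parity of (L-R).

def to_mid_side(left, right):
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    # (L+R) and (L-R) always share parity, so side&1 restores the lost bit.
    left = [m + ((s + (s & 1)) >> 1) for m, s in zip(mid, side)]
    right = [l - s for l, s in zip(left, side)]
    return left, right

left = [0, 100, -5000, 32767]
right = [0, -100, 123, -32768]
mid, side = to_mid_side(left, right)
assert from_mid_side(mid, side) == (left, right)  # round trip is exact
```

A mono signal simply has no second channel to decorrelate against, so this whole avenue is closed to you.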
So, aside from noise blanking, bit slicing and some lucky pattern repeats, there's probably not much else to save here -- unless there's something much more specific (but also potentially lossy, even if only minimally so) about the signal that can be exploited (e.g., how spectrally pure are these whistles -- would a wavelet transform give low error?).
Tim