Start with Octave (a free Matlab replacement) first and then test your implementations on a PC. Once you know your algorithms work it is time to add real embedded hardware in the mix. By the way: you should seriously consider using a regular ARM based microcontrollers. These have lots of processing power and are more suitable for general purpose work as well. I'd only consider a DSP if I really need lots of processing power.
There’s a lot to agree with here.
The OP has a number of significant hoops to jump through.
Understanding basic DSP IMHO is best done on the PC first.
Octave isn’t a bad start, but frankly you can also do an awful lot with just Excel - yes, a spreadsheet - and you’ll learn a fair bit while you’re at it.
Get those algorithms working with various generated test data sets on your PC in batch mode first, in Octave and then C.
The drawback of PCs is latency, but that does not stop you processing ready-made wav files and listening to them in real time. The benefit of the PC is that you can generate massive amounts of reproducible test data on demand, and test in a native environment.
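As a rough sketch of what "reproducible test data on demand" can look like in C: the three generators below (impulse, step, seeded noise) are names I've made up for illustration, not from any library. The noise uses a small fixed-seed linear congruential generator, so the same seed should give the same samples on every run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unit impulse, handy for reading off an impulse response. */
void gen_impulse(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0.0f;
    if (n > 0)
        buf[0] = 1.0f;
}

/* Hypothetical helper: 0.0 before index 'start', 1.0 from 'start' onwards. */
void gen_step(float *buf, size_t n, size_t start)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (i >= start) ? 1.0f : 0.0f;
}

/* Hypothetical helper: uniform noise in [-1, 1) from a 32-bit LCG.
   Same seed in, same samples out, so a failing test can be replayed exactly. */
void gen_noise(float *buf, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++) {
        state = state * 1664525u + 1013904223u;   /* wraps modulo 2^32 */
        /* Top 24 bits give a value in [0, 1), then rescale to [-1, 1). */
        float u = (float)(state >> 8) / 16777216.0f;
        buf[i] = 2.0f * u - 1.0f;
    }
}
```

Write the buffers to a file (or straight into the algorithm under test) and keep the seed alongside your results, so any odd output can be regenerated later.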
My personal method is to always have working C code, with the algorithm running on both the PC and the embedded hardware. That makes it easy to tell whether your errors are due to the hardware or the algorithm. Sometimes you have to hit assembly language, and need a knowledge of pipelines, memory buses, wait states, optimum register loading techniques and maximising throughput for a given target processor. Sometimes that means re-working your algorithms to minimise memory I/O and maximise target processor core register usage, such as on ARM.
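One way to keep the same C code running on both sides is to put the algorithm in a plain function with no I/O, no globals and no hardware access. The sketch below uses a small FIR filter as a stand-in; `fir_state`, `fir_init`, `fir_process` and the tap count are all hypothetical names and values picked for illustration, and the code is deliberately unoptimised.

```c
#include <stddef.h>

#define FIR_TAPS 4  /* hypothetical filter length, for illustration only */

typedef struct {
    float coeff[FIR_TAPS];  /* filter coefficients h[0..FIR_TAPS-1] */
    float delay[FIR_TAPS];  /* most recent inputs, delay[0] is the newest */
} fir_state;

void fir_init(fir_state *f, const float *coeff)
{
    for (size_t i = 0; i < FIR_TAPS; i++) {
        f->coeff[i] = coeff[i];
        f->delay[i] = 0.0f;
    }
}

/* Process a block of samples. Nothing in here knows whether it is running
   on the PC or the target: the PC harness feeds it file or generated data,
   the embedded build feeds it codec buffers. */
void fir_process(fir_state *f, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Shift the delay line by one and insert the new sample. */
        for (size_t k = FIR_TAPS - 1; k > 0; k--)
            f->delay[k] = f->delay[k - 1];
        f->delay[0] = in[i];

        /* Multiply-accumulate across the taps. */
        float acc = 0.0f;
        for (size_t k = 0; k < FIR_TAPS; k++)
            acc += f->coeff[k] * f->delay[k];
        out[i] = acc;
    }
}
```

Feed it a unit impulse on the PC and the output should simply be the coefficients. If the same check fails on the board, the problem is in the hardware or I/O layer rather than the algorithm.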
M4F indeed isn’t a bad way to go. It isn’t the same as a “proper” DSP, but that is probably a good thing when you’re starting out. Just be aware that a few of the CMSIS DSP functions are not optimised, despite what you might think, but they are definitely a good place to start. I found floating point on the M4F to be similar in speed overall to using its fixed point DSP extensions in many applications, especially once optimised.
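To make the float versus fixed point trade-off a bit more concrete, here is the same dot product written both ways in plain portable C. This is only a sketch: the function names are made up, it does not use CMSIS or any M4F intrinsics, and it says nothing about actual cycle counts. What it does show is the extra scaling and saturation bookkeeping that Q15 brings with it.

```c
#include <stddef.h>
#include <stdint.h>

/* Floating point version: no scaling to think about. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Q15 version: each int16_t represents raw / 32768, i.e. roughly [-1, 1).
   Each product is Q30, so the sum is kept in a wide accumulator, scaled
   back to Q15 at the end, and saturated to the int16_t range. */
int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];

    acc /= 32768;  /* Q30 -> Q15, truncating toward zero */

    if (acc > 32767)
        acc = 32767;
    if (acc < -32768)
        acc = -32768;
    return (int16_t)acc;
}
```

For example, 0.5 * 0.5 + 0.5 * (-0.25) = 0.125, which in Q15 is 4096. Running both versions on the PC against the same data is a cheap way to see how much precision the fixed point path costs before worrying about speed on the target.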
On the hardware side, for audio you need a demo board with an audio codec and a decent set of example demo code. Start by getting it to generate a sine wave, the blinky of DSP.
Unless the OP already has a good grounding in FPGA, I really wouldn’t recommend touching FPGA with a barge pole: the OP already has to deal with understanding DSP itself. Getting an FPGA based solution to work involves a horrendously steep learning curve and is an unnecessary distraction for what they are looking to achieve. That distraction is far more likely to leave the OP completely overwhelmed, with no end result at all.
I would also resist the Linux route. Again there is a latency issue, and if you are looking to make something low power that’s battery powered from a PP3 or a pair of AAs, Linux is probably best avoided.
Stick to the PC, plus a microcontroller (preferably with floating point) demo board with an audio codec.