Thank you for your response.
The filter is not being optimized away: it actually works, and I can see that Quartus has assigned some logic to it.
It's quite unexpected, but I've found that neither Analysis nor the Fitter is to blame for not using embedded multipliers. FIR Compiler II itself decides how many DSP blocks or LUTs to use. Changing parameters such as Clock Frequency, Input Sample Rate, or the decimation factor can cause the compiler to use a different number of DSP blocks.
Sadly, FIR Compiler II does not use DSP blocks for my single-rate filter when its Clock Frequency is equal to the Input Sample Rate.
What I am building is a CIC decimator followed by a FIR compensation filter. The CIC input runs at 20 MS/s; I clock the FIR filter at 400 kHz and then decimate by 2 again by sampling its output at 200 kHz. It looks like clocking the FIR filter at 20 MHz instead, and using the 'decimation' option built into the Altera core, saves a lot of logic elements. How does that work? I thought the structure of a filter should be the same no matter how I decimate the signal before it.
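To make the numbers concrete, here is a rough Python sketch of my rate plan. The CIC factor of 50 follows from 20 MS/s down to 400 kHz. The tap count of 64 is made up for illustration, and the time-sharing model (one multiplier reused once per clock cycle between output samples) is only my guess at why the resource usage changes, not something I have confirmed for FIR Compiler II. It also says nothing about whether a multiplier ends up in a DSP block or in LUTs.

```python
import math


def rate_plan(f_in_hz, cic_decimation, fir_decimation):
    """Return (FIR input rate, FIR output rate) in Hz for a CIC + FIR chain."""
    fir_in_hz = f_in_hz // cic_decimation
    fir_out_hz = fir_in_hz // fir_decimation
    return fir_in_hz, fir_out_hz


def multipliers_if_time_shared(num_taps, f_clk_hz, f_out_hz):
    """Estimate multipliers needed under an ASSUMED time-sharing model.

    Assumption: each multiplier does one tap product per clock cycle, and
    only the output samples that are kept have to be computed. Coefficient
    symmetry is ignored.
    """
    cycles_per_output = f_clk_hz // f_out_hz
    return math.ceil(num_taps / cycles_per_output)


NUM_TAPS = 64  # hypothetical tap count, not my real filter

# My current setup: single-rate FIR clocked at its 400 kHz sample rate,
# with the decimation by 2 done afterwards, outside the core.
separate = multipliers_if_time_shared(NUM_TAPS, 400_000, 400_000)

# Alternative: FIR clocked at 20 MHz, decimating by 2 inside the core,
# so only the 200 kHz output samples are computed.
fir_in_hz, fir_out_hz = rate_plan(20_000_000, 50, 2)
shared = multipliers_if_time_shared(NUM_TAPS, 20_000_000, fir_out_hz)

print(fir_in_hz, fir_out_hz, separate, shared)
```

If that model is anywhere near right, the first setup has only one clock cycle per output sample and needs a multiplier for every tap, while the second has 100 cycles per output sample and could reuse a single multiplier for all taps.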
Sorry if my questions are stupid; I'm quite new to DSP.