I usually say that if you can multiply and add, then you know the very essence of DSP.
The rest is just elaborate arrangement of those operations.
The traditional hallmark of a DSP processor has indeed been the ability to do efficient MAC (multiply-accumulate) operations, one MAC per machine cycle. Of course, to sustain that rate it is not enough to have a single-cycle multiplier-adder; the next operands must also be fetched simultaneously. This operand-fetch requirement usually excludes Von Neumann-type architectures with a single memory for data and code (although there exist some kludges for making simultaneous accesses, like in TI DSPs). Another typical feature is automatic saturation in arithmetic, so that small overflows of the accumulator cause just small clipping instead of wrapping over to the opposite sign extreme (which would be disastrous in control applications).
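To make the saturation point concrete, here is a minimal C sketch of the difference between a wrapping and a saturating 16-bit add. The function names add_wrap and add_sat are made up for this illustration, and the 16-bit word size is just an assumption; a real DSP does this in hardware on its accumulator.

```c
#include <stdint.h>

/* Wrapping add: an overflow rolls over to the opposite sign extreme.
 * The final conversion back to int16_t is implementation-defined in older
 * C standards; on ordinary two's complement targets it simply wraps. */
static int16_t add_wrap(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)sum;
}

/* Saturating add: compute in a wider type, then clip to the 16-bit range. */
static int16_t add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX) {
        return INT16_MAX;   /* small positive overflow -> small clipping */
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;   /* small negative overflow -> small clipping */
    }
    return (int16_t)sum;
}
```

With a = 32000 and b = 1000, add_sat gives 32767 (slightly clipped), while add_wrap gives -32536 on a two's complement machine, which is the kind of sign flip you do not want to feed into a control loop.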
The first DSP chip I learned about was the Motorola (now Freescale) DSP56002, where you could write the following assembly line, which I think condenses the very essence of a FIR filter into one instruction:
mac x0, y0, a X:(r0)+,x0 y:(r4)+,y0
That performs quite a few simultaneous operations in one machine cycle. First, it multiplies the x0 and y0 registers and adds the result to accumulator a. Second, it pre-fetches the next x0 and y0 from the X and Y data memories (yes, there were 3 independent memory spaces: code, X data and Y data). Third, it post-increments index registers r0 and r4 after the fetches, while taking care of the modulo addressing (the index registers wrap back to the beginning of the buffer automagically, without separate if()'s). Combined with the REP instruction (which repeats the following instruction n times without any software loop), this makes for very cycle-efficient code. The whole FIR filter could be written in, IIRC, 4 assembly instructions (excluding possible initializations). I wonder how you could achieve that in C without resorting to inline assembly. But on the other hand, it usually only makes sense to optimize the most time-critical parts of the code and write the rest in C.
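For comparison, here is roughly what the same work looks like when spelled out in portable C. This is only a sketch under my own assumptions: 16-bit Q15 samples and coefficients, a 64-bit accumulator standing in for the DSP's wide accumulator, and made-up names (NTAPS, fir_state, fir_step). Whether a given compiler turns the inner loop into a single-cycle MAC with hardware modulo addressing is a separate question.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4  /* hypothetical filter length, chosen for illustration */

typedef struct {
    int16_t delay[NTAPS];  /* circular buffer holding the latest samples */
    size_t  head;          /* index of the most recent sample */
} fir_state;

/* Feed one input sample through the filter and return one output sample.
 * coeff[] holds Q15 coefficients, coeff[0] applies to the newest sample. */
static int16_t fir_step(fir_state *s, const int16_t coeff[NTAPS], int16_t in)
{
    int64_t acc = 0;
    size_t idx;
    size_t k;

    /* Store the new sample; the index wraps by hand (modulo addressing). */
    s->head = (s->head + 1) % NTAPS;
    s->delay[s->head] = in;

    idx = s->head;
    for (k = 0; k < NTAPS; k++) {
        /* The MAC itself: multiply and add to the accumulator. */
        acc += (int32_t)coeff[k] * (int32_t)s->delay[idx];
        /* Step back to the next older sample, wrapping at the buffer start. */
        idx = (idx == 0) ? (size_t)(NTAPS - 1) : idx - 1;
    }

    acc /= 32768;  /* scale the Q30 product sum back to Q15 */

    /* Saturate instead of wrapping when the result does not fit. */
    if (acc > INT16_MAX) {
        acc = INT16_MAX;
    }
    if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (int16_t)acc;
}
```

Everything the loop body does explicitly here (multiply, accumulate, two fetches, two index updates with wrap-around) is what the single mac line does in one cycle, and the for loop is the part REP replaces.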
The dsPIC actually looks like it has borrowed some ideas from the 56k; there are even X and Y data spaces.
Recently, I have done some experiments with Altera FPGA multipliers (a 3-way linear-phase active crossover), and they too make quite a nice platform for these things, as an FPGA is a parallel beast by nature and many DSP operations parallelize easily.
Another area closely related to DSP is digital control. Advanced AC motor control in particular (like FOC, field-oriented control) is very important nowadays for improving the energy efficiency of motor drives. I believe that motor control is one of the most important application areas of the dsPICs.
Regards,
Janne