That being said, the switch statement implementation will almost always compile to smaller code. (If you don't believe that this is the case then a simple mental exercise should convince you that it is--or better yet, write some code. A look-up table, by the way, will typically be faster.)
Not really, at least not in the generalized way you argued. Generally speaking, for a small lookup table, the memory footprint will be roughly equivalent to that of switch statements. As soon as the lookup table needs more entries, the memory footprint of an equivalent implementation with switch blocks will grow larger.
The reason is simple: adding a new state and the necessary new state transitions to a switch-based implementation adds not only the data for the new state+transitions, but also new code. In a lookup-based implementation, by contrast, only the data for the new state+transitions is added, with no additional code.
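To illustrate what "adds new code" means, here is a minimal sketch in C of a switch-based transition function. The states, events and the name next_state_switch are made up for illustration and are not taken from the article; a real machine would also run actions in each branch.

```c
/* Hypothetical states and events, for illustration only. */
typedef enum { ST_IDLE, ST_RUNNING, ST_PAUSED } state_t;
typedef enum { EV_START, EV_PAUSE, EV_STOP } event_t;

/* Switch-based transition function: every state is its own block of code. */
state_t next_state_switch(state_t s, event_t e)
{
    switch (s) {
    case ST_IDLE:
        if (e == EV_START) return ST_RUNNING;
        break;
    case ST_RUNNING:
        if (e == EV_PAUSE) return ST_PAUSED;
        if (e == EV_STOP)  return ST_IDLE;
        break;
    case ST_PAUSED:
        if (e == EV_START) return ST_RUNNING;
        if (e == EV_STOP)  return ST_IDLE;
        break;
    }
    return s; /* unhandled event: stay in the current state */
}
```

Adding a fourth state here means adding another case block with its own comparisons and returns, so the compiled function grows with every state and transition.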
Now, the question would be at which point (number of states+transitions) the switch-based solution becomes significantly worse than a lookup-based implementation. That is not easy to answer, as it will depend on a number of factors (the CPU's instruction set, memory barriers, performance requirements)...
Perhaps you were thinking about a particular implementation of a lookup table? If the lookup table is implemented as a simple array whose elements represent state transitions (with the current state and the event used as the index into this array), then the switch-based implementation will be larger even for relatively few states. Compare the memory footprint of the required switch blocks against the code that calculates the array index from current state+event and accesses an element in one or two arrays, plus the memory footprint of the array(s) themselves. I wager that for even half a dozen states, the switch-based approach will have a larger memory footprint... (Of course, if you have to use data structures other than arrays, then half a dozen states will in all likelihood not yield a memory gain vs. switch statements.)
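For reference, the simple-array variant I have in mind looks roughly like the following C sketch. The states, events, the transition array and next_state_table are hypothetical names chosen for illustration. The table stores only the next state; actions would need a second array, for example of function pointers.

```c
#include <stdint.h>

/* Hypothetical states and events; the last enumerator counts the entries. */
enum { ST_IDLE, ST_RUNNING, ST_PAUSED, NUM_STATES };
enum { EV_START, EV_PAUSE, EV_STOP, NUM_EVENTS };

/* One byte per (state, event) pair: the entry is the next state. */
static const uint8_t transition[NUM_STATES][NUM_EVENTS] = {
    /*                EV_START    EV_PAUSE    EV_STOP */
    /* ST_IDLE    */ { ST_RUNNING, ST_IDLE,    ST_IDLE },
    /* ST_RUNNING */ { ST_RUNNING, ST_PAUSED,  ST_IDLE },
    /* ST_PAUSED  */ { ST_RUNNING, ST_PAUSED,  ST_IDLE },
};

/* The lookup code stays the same no matter how many states exist. */
uint8_t next_state_table(uint8_t s, uint8_t e)
{
    if (s >= NUM_STATES || e >= NUM_EVENTS) {
        return s; /* out-of-range input: stay in the current state */
    }
    return transition[s][e];
}
```

A new state costs one more row (NUM_EVENTS bytes in this sketch), while next_state_table itself does not change. How that compares with an extra case block depends on the compiler and the target, so it is worth checking the actual size of the generated code.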
Jakeypoo's article was an introduction to state machines--seemingly aimed at Arduino users. I think he made the right decision to use the switch statement implementation. He likely would have had to do a lot of sidetracking if he had used the look-up table method. He likely would have been teaching two new concepts. Why muddy the waters?
What "sidetracking", and what second new concept, are you talking about? Do you mean the typical Arduino programmer would already be overwhelmed when confronted with basic data structures?