There are some libraries available that use polynomial-based linearization, but those are not really suitable for embedded applications, as floating-point operations and math functions take a lot of program space.
I disagree... If you think floats and math.h are required for this sort of thing, you're not trying nearly hard enough... if at all!

If that's the case, then I suggest embracing fixed point. Do the hard work and figure out what's actually going on, in the core, down to the bit level if possible! It's hard work, but it's rewarding, and often worthwhile.
Just don't get carried away: standard advice -- save the optimization for the end, once you've got a working program, trimmed down, that finally needs it!
Anyway, I've written a 7th-order polynomial function -- for thermistors, not thermocouples -- which runs on AVR and takes little time (around 300 cycles) and little memory (74 words, I think, including the coefficient table). It could be optimized further for either constraint, if desired.
The function converts a 12-bit ADC reading to a 12.4 fixed-point temperature with < +/- 0.25C error against the manufacturer's typical data. The actual error of a given 1% thermistor will probably be worse than this, so I consider it more than sufficient numerically. The math is exact, in the sense that the final result lands within one LSB of a high-precision calculation rounded to the same scale.
The alternative on this particular platform would be a massive 4k word lookup table with a few cycles' worth of code to grab it, or a reduced lookup table with linear interpolation. I consider the former unsuitable because of its size. I consider the latter unsuitable for both size (the minimal table still requires ~40 words) and computation time (the divisions required for a linear interpolation will probably not cost much space, but will easily take over 400 cycles; the table lookup also has to be either a linear or binary search, which takes extra time, and a variable amount of it).
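For comparison, here's a sketch of the reduced-table version with made-up table values, and with a simplifying assumption: the table is evenly spaced in ADC code at a power-of-2 step, so the segment index falls straight out of the top ADC bits and the "division" degenerates to a shift. The smaller, non-uniformly spaced table described above would also need the search and a true division:

```c
#include <stdint.h>

/* 17 entries spanning the 12-bit ADC range in steps of 256.
   Values are placeholder temperatures in 12.4 format -- not real
   thermistor data, just monotone numbers for illustration. */
static const int16_t table[17] = {
    -200, -80, 10, 90, 160, 230, 300, 370,
    440, 515, 595, 685, 790, 915, 1070, 1270, 1530
};

static int16_t lut_interp(uint16_t adc)   /* adc: 0..4095 */
{
    uint8_t  i    = adc >> 8;             /* which 256-wide segment */
    uint16_t frac = adc & 0xFF;           /* position inside the segment */
    int16_t  a = table[i], b = table[i + 1];
    /* linear interpolation: a + (b - a) * frac / 256, done as a shift */
    return a + (int16_t)(((int32_t)(b - a) * frac) >> 8);
}
```

Even in this friendliest case, a 256-code step is too coarse for a thermistor's curvature at the ends of the range, which is why the table either grows or goes non-uniform -- and then the cycle count argument above kicks in.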
I also considered a more creative approximation particular to thermistors of this type (a sum-of-reciprocals form), which took around 420 cycles and 81 words. Not bad, but even a single crude division costs more than four MACs, so it's just not a good approach.
These methods could easily be applied to 16 and 32 bit platforms; the sum-of-reciprocals would probably end up preferable on a platform without hardware multiply (although these days, how many architectures don't offer it?), but otherwise, the boring old polynomial is hard to beat.
For thermocouples in particular, there might be some interesting analytical functions that could be applied (that error term has sort of an oscillating tail to it, which might be a close fit to some polynomial or trig functions, or combinations thereof), which might lead to an interesting solution. But otherwise, yeah, the polynomial method again is hard to beat. Especially with how easily most platforms can calculate it.
On something like an ARM Cortex M4, there should be plenty of resources (bits, clock cycles, hardware, and powerful instructions) to pull together even more bits (16+ bit ADC?), a higher-order polynomial (11th order or above would give a near-exact fit to manufacturers' data) and desired output formatting (whatever number format, or string, you want) in even fewer cycles.
BTW, I developed the functions and coefficients for these functions with only a spreadsheet. Matlab/Octave might be a little neater -- certainly if larger datasets were involved. The spreadsheet is kind of ugly at times (I needed a matrix of Chebyshev polynomials to get the "solver" to do anything -- the raw polynomial was simply too awkward to manipulate directly), but I'm reasonably certain that the coefficients were close to optimal in any of the approaches I've mentioned. This includes simulating the derived (and truncated) coefficients in fixed point, as the final hardware would -- which again isn't hard to do, just a little messy. The main thing is to make sure you get the rounding right! (I also checked the "simulation" against the real in silico* results, which were exact.)
*Err... wait.
in silico is usually meant as "we simulated it on a computer". But what if I'm comparing a computer...to a simulation of that computer? Is the simulation doubly-in-silico?!.....nevermind.
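Concretely, the rounding trap: a plain right shift floors the result (toward minus infinity, given an arithmetic shift), so each fixed-point product needs half an LSB added before the shift if the simulation and the hardware are to agree. A minimal sketch, again with Q16.16 picked arbitrarily:

```c
#include <stdint.h>

#define FRAC 16

/* Truncating fixed-point multiply: the shift simply floors. */
static int32_t mul_trunc(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC);
}

/* Round-to-nearest: add half an LSB of the result before shifting.
   (Assumes arithmetic right shift of negatives, as gcc/avr-gcc do.) */
static int32_t mul_round(int32_t a, int32_t b)
{
    return (int32_t)((((int64_t)a * b) + (1LL << (FRAC - 1))) >> FRAC);
}
```

With a = 3 (i.e. 3/65536) and b = 0x8000 (0.5), the true product is 1.5 LSB: mul_trunc gives 1 while mul_round gives 2. Half-LSB errors like that accumulate over a 7th-order evaluation if the spreadsheet rounds one way and the code the other.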
Tim