Not sure how you'd implement it; software emulation of existing routines? (i.e. sub out, or trap, FPU instructions, replacing with calls to your custom implementations instead.) That could work for special-case routines that handle everything in the FPU then dump out the final result; but anything that handles the arguments in memory must immediately drop the modified format. And that's probably a very narrow case where such a method is applicable. And, needless to say, it will run far slower than hardware FP.
If
double or better is available (in hardware), you're definitely better off using that, truncating the result if needed.
I guess the use case is, if you only have
single hardware available, so anything extended beyond that, will be significant software effort anyway? (
double on such a system might not be wholly software implemented -- hardware multiply helps considerably, if nothing else -- but division and transcendental functions likely benefit less?)
Assuming such a library were available, well tested, good performing, and ready to go -- I suppose it would be fine? Presumably it could perform better than
double on such a system. But therein lies the problem: every compiler and its uncle has IEEE-754 support. Are you going to write and test that entire library yourself? (And you still need to convert back to common formats, sooner or later -- even if you implement every single libc function that handles floats, you can't account for 3rd party libs. Well, fine, you might avoid those in your own project that no one else ever touches. But if it's your own thing, just go and do it however you see fit?
)
As for hacks of functions, if the function is continuous*, you could perhaps take the derivative at a few values around the desired value, and interpolate based on that. Note that this takes on the order of N evaluations to test N extended-precision parameters, times 2^B sample points to (hopefully) win an additional B bits of precision. The samples would probably be arranged in a N-dimensional grid, each axis spaced according to the inverse partial derivative of its parameter, so that on average there's about 1LSB between adjacent results, say.
*In a... somewhat roundabout and complicated manner, since this is a discrete function mimicking a real-valued one. The epsilon-delta definition is still good I think, but rather than letting their values be arbitrary, stipulate that they must cover a region ⊇ sampling region. And that the curvature within that region meets certain assumptions, limited by whatever interpolation method we happen to choose (e.g., instead of multilinear interpolation, Bezier splines could be used, of order, at most, corresponding to the width of the sample space).
Note that, if we assume ideal inputs to the function, and the function itself is perfect, but it loses precision on its output (bits have been rounded out): this is the sense in which we can increase resolution. Compare with the situation of dithering an ADC: the random noise smooths out the quantization noise on average, and since the step sizes (between any given adjacent values) generally vary, we can use this to also reduce DNL. We cannot however reduce INL, which manifests as a systematic error over this process. Note that, for random additive noise, the sample size is 2^(2B), because the variance goes as 1/sqrt(N) for N samples; we might hope to have better luck here (with a complete grid, rather than a random sampling of that space)
So, too, we can only hope to get extended precision with respect to what the function itself represents, or is. If it's supposed to represent some more ideal characteristic (like sin(x) say), it can only be as good as its own curve fits. You can't add higher order polynomial terms to a function that simply doesn't have them. (Well, sort of. A sampling process will have the effect of convolving the function with the filter function over the sampled area -- and that won't be exactly the original polynomial, either. I'm not actually sure what this comes out to, offhand.) We can reduce local errors due to rounding (also assuming the rounding is evenly distributed among every operation between input and output), but the systematic error is not removable.
Interesting thought, I guess, if unfortunate: it's probably pretty clear, this would only ever make sense as an last resort. Like, you have some black-box function that you can't even diassemble or RE any other way than by sampling it for values. I mean, even then, you should be able to collect enough data to reconstruct it; for it to still be a reasonably continuous function (that would be susceptible to the above method), it can't be inordinately complex. (Let's say, an implementation of an elliptic function, which will use some manner of approximation, which we could figure out; compare to a function on the Mandelbrot set, which is anything but continuous! Though, knowing that it's a specific type of discontinuous function, we might still be able to solve for it.)
Tim