Dividing by 25.4 is the same as multiplying by 5/127 = 0.039370078740157480314960... (exactly, since 25.4 = 127/5).
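Since 25.4 = 127/5, the exact factor is the repeating fraction 5/127. A quick check with Python's standard `fractions` and `decimal` modules (Python is just my choice for illustration) shows the expansion, and shows that the double Python computes for 1 / 25.4 is close to 5/127 but not equal to it:

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# 25.4 = 254/10 = 127/5, so dividing by 25.4 multiplies by exactly 5/127.
exact = 1 / Fraction("25.4")
assert exact == Fraction(5, 127)

# 5/127 is a repeating decimal; show its first 30 significant digits.
with localcontext() as ctx:
    ctx.prec = 30
    print(Decimal(5) / Decimal(127))

# Decimal(float) prints the exact binary value of the double computed for
# 1 / 25.4. A double has a power-of-two denominator, so it cannot be 5/127.
print(Decimal(1 / 25.4))
print(Fraction(1 / 25.4) == exact)  # False
```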

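For ranges less than a meter, multiplying by 0.03937008 will give you the correct answer to over six digits. Add 0.5 ULP before discarding superfluous digits and, with micrometer input like the six-digit dddddd below, you'll even get correct rounding for everything under 312.5 mm. Above that, because 0.03937008 is slightly larger than 1/25.4, a result lying just below a rounding midpoint occasionally gets rounded up: 312.5 mm is 12.3031496... inches, but comes out as 12.3032.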
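The rounding behaviour can be checked exhaustively. This Python sketch assumes micrometer input (0 to 999999, matching the digit positions used later in the post); `approx_ten_thousandths` and `exact_ten_thousandths` are hypothetical names. The first uses the 0.03937008 shortcut in integer form, the second the exact ratio 5/127, both rounding half up to 0.0001 in:

```python
def approx_ten_thousandths(um: int) -> int:
    # um micrometers times 0.03937008 is the inch value scaled by 10**11
    # (um * 3937008); adding 5_000_000 (half of 10**7) and dropping the
    # low seven digits rounds half up to 0.0001 in.
    return (um * 3937008 + 5_000_000) // 10_000_000


def exact_ten_thousandths(um: int) -> int:
    # Exact: um / 25400 inches = um * 50 / 127 ten-thousandths, and
    # floor(um * 50 / 127 + 1/2) == (um * 100 + 127) // 254.
    return (um * 100 + 127) // 254


# Compare every input from 0 to 999.999 mm in 1 micrometer steps.
wrong = [um for um in range(1_000_000)
         if approx_ten_thousandths(um) != exact_ten_thousandths(um)]
print(len(wrong), wrong[:3])
```

The first mismatch is at 312500 µm, where the shortcut gives 123032 and the exact ratio 123031. Because 0.03937008 slightly exceeds 5/127, the shortcut can only err upward, and the excess grows with the input until it is large enough to push a result over a rounding midpoint.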

You'd need a result/scratchpad of 14 digits for the long BCD multiplication dddddd × 3937008 (though the highest digit will always be zero). The constant multiplier has only four distinct nonzero digits, so you could use just four ten-byte tables, one each for the BCD multiplication by 3, 7, 8, and 9. Also remember that the multiplier digits can be handled in any order, as long as the result is still mathematically dddddd × 3937008. I would probably do
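A minimal Python sketch of the four tables, assuming one unpacked decimal digit per byte with carries resolved later, so each entry is simply the binary product d × k (at most 81). `TABLES`, `add_multiple` and `to_int` are hypothetical names; a `bytearray` raises `ValueError` for any value above 255, which mimics an 8-bit byte overflowing:

```python
# One ten-entry table per distinct nonzero digit of 3937008. Each entry is
# the plain binary product d * k (at most 81); carries are resolved later,
# so no decimal adjustment happens here.
TABLES = {k: bytes(d * k for d in range(10)) for k in (3, 7, 8, 9)}


def add_multiple(scratch: bytearray, digits: list[int], k: int, shift: int) -> None:
    """Add (number in digits) * k * 10**shift into scratch, without carrying.

    Both sequences hold one decimal digit per element, least significant
    first. A bytearray raises ValueError if an element would exceed 255.
    """
    table = TABLES[k]
    for i, d in enumerate(digits):
        scratch[shift + i] += table[d]


def to_int(scratch: bytearray) -> int:
    """Value of an unnormalized digit array (elements may exceed 9)."""
    return sum(v * 10**i for i, v in enumerate(scratch))


digits = [int(c) for c in reversed("123456")]  # least significant digit first
scratch = bytearray(14)
add_multiple(scratch, digits, 7, 3)            # adds 123456 * 7000
print(to_int(scratch))                         # 864192000
```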

dddddd×8 + dddddd×7000 + dddddd×3030000 + dddddd×900000 + 5000000

with each digit in a separate byte. Note that the third term just adds from the ×3 table twice per input digit (once for the 3000000 and once for the 30000), and the last addition is for rounding. Counting the least significant digit as the 1st, the 12th and 13th digits form the whole inches, and the 8th through 11th digits contain the ten-thousandths of an inch, rounded to the nearest ten-thousandth.
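Putting the plan together: a Python sketch (the function name `um_to_inches` is hypothetical, input is assumed to be micrometers, and scratch digits are plain Python ints, so byte overflow is not modelled here) that adds the terms in the order above, resolves carries with ordinary division, and reads the result from the digit positions described:

```python
TABLES = {k: bytes(d * k for d in range(10)) for k in (3, 7, 8, 9)}


def um_to_inches(um: int) -> str:
    """Convert 0..999999 micrometers to inches with four decimals.

    Follows dddddd*8 + dddddd*7000 + dddddd*3030000 + dddddd*900000 + 5000000
    with one decimal digit per element, least significant first. Elements
    are Python ints, so 8-bit overflow is not modelled.
    """
    digits = [(um // 10**i) % 10 for i in range(6)]
    scratch = [0] * 14

    def add(k: int, shift: int) -> None:
        for i, d in enumerate(digits):
            scratch[shift + i] += TABLES[k][d]

    add(8, 0)            # dddddd * 8
    add(7, 3)            # dddddd * 7000
    add(3, 4)            # dddddd * 3030000: the x3 table is added
    add(3, 6)            # at two positions per input digit
    add(9, 5)            # dddddd * 900000
    scratch[6] += 5      # + 5000000: half a unit of the 8th digit

    for i in range(13):  # resolve carries with plain division
        scratch[i + 1] += scratch[i] // 10
        scratch[i] %= 10

    # Counting the least significant digit as the 1st, digits 13 and 12
    # are whole inches and digits 11..8 are ten-thousandths.
    whole = scratch[12] * 10 + scratch[11]
    frac = "".join(str(scratch[i]) for i in range(10, 6, -1))
    return f"{whole}.{frac}"


print(um_to_inches(25400), um_to_inches(999999))
```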

You can simply ignore the carries while adding, as long as you do a ripple pass over the hundreds before the multiplication by nine and a full pass over the tens at the end. Leaving all the carries to the end could overflow a byte: (3+9+3+7+8)×9 = 270, plus 5 for rounding, is more than 255. Doing the hundreds pass after the addition of five million but before the multiplication by nine works just fine.
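The effect of the pass ordering can be observed with a small simulation. This hypothetical `convert` function (Python ints, so nothing actually overflows) records the largest value any scratch byte reaches, either with the hundreds pass placed after the five million and before the ×9 terms, or with every carry left to the end:

```python
TABLES = {k: bytes(d * k for d in range(10)) for k in (3, 7, 8, 9)}


def convert(um: int, hundreds_pass_first: bool) -> tuple[int, int]:
    """Return (um * 3937008 + 5000000, largest value any scratch byte held).

    Scratch "bytes" are Python ints so the peak can be observed; on an
    8-bit machine any peak above 255 would be an overflow.
    """
    digits = [(um // 10**i) % 10 for i in range(6)]
    s = [0] * 14
    peak = 0

    def bump(i: int, v: int) -> None:
        nonlocal peak
        s[i] += v
        peak = max(peak, s[i])

    def add(k: int, shift: int) -> None:
        for i, d in enumerate(digits):
            bump(shift + i, TABLES[k][d])

    add(8, 0)                    # dddddd * 8
    add(7, 3)                    # dddddd * 7000
    add(3, 4)                    # dddddd * 3030000, first half
    add(3, 6)                    # dddddd * 3030000, second half
    bump(6, 5)                   # + 5000000 for rounding
    if hundreds_pass_first:
        for i in range(12):      # ripple hundreds two digits up, low to high
            while s[i] >= 100:
                s[i] -= 100
                bump(i + 2, 1)
    add(9, 5)                    # dddddd * 900000
    for i in range(13):          # final tens pass, low to high
        while s[i] >= 10:
            s[i] -= 10
            bump(i + 1, 1)
    return sum(v * 10**i for i, v in enumerate(s)), peak


print(convert(999999, True), convert(999999, False))
```

For 999999 µm, leaving every carry to the end pushes one byte past 255 while the tens carries ripple upward. With the hundreds pass, a byte holds at most 9 × (8+7+3+3) + 5 = 194 before that pass and at most 99 + 81 = 180 once the ×9 terms are added.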

A good trick here is to repeatedly compare to 100, subtract 100, and increment the digit two places higher. For the tens, going from low to high, you might compare to 30, subtract 30, and add 3 to the next higher digit, followed by a simple compare to 10, subtract 10, and increment the next higher digit. The total number of comparison iterations stays surprisingly low.
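Here is a Python sketch of the two carry passes described above, counting every comparison (including the final one that ends each loop); the function names and the `use_thirty` switch are hypothetical. The scratchpad is filled in the order described earlier in the post: ×8, ×7000, ×3030000 and the five million, then the hundreds pass, then ×900000:

```python
def hundreds_pass(s: list[int]) -> int:
    """Carry hundreds two digits up, from low to high; return the compare count."""
    compares = 0
    for i in range(len(s) - 2):
        while True:
            compares += 1
            if s[i] < 100:
                break
            s[i] -= 100
            s[i + 2] += 1
    return compares


def tens_pass(s: list[int], use_thirty: bool = True) -> int:
    """Reduce every digit below 10, from low to high; return the compare count."""
    compares = 0
    for i in range(len(s) - 1):
        if use_thirty:
            while True:              # take out 30 at a time, carrying 3
                compares += 1
                if s[i] < 30:
                    break
                s[i] -= 30
                s[i + 1] += 3
        while True:                  # then 10 at a time, carrying 1
            compares += 1
            if s[i] < 10:
                break
            s[i] -= 10
            s[i + 1] += 1
    return compares


def product_digits(um: int, use_thirty: bool = True) -> tuple[list[int], int]:
    """Return (digits of um*3937008 + 5000000, tens-pass compare count)."""
    d = [(um // 10**i) % 10 for i in range(6)]
    s = [0] * 14
    for k, shift in ((8, 0), (7, 3), (3, 4), (3, 6)):
        for i, x in enumerate(d):
            s[shift + i] += k * x    # digit products, no carrying yet
    s[6] += 5                        # rounding: + 5000000
    hundreds_pass(s)
    for i, x in enumerate(d):        # dddddd * 900000
        s[5 + i] += 9 * x
    return s, tens_pass(s, use_thirty)


digits, fast = product_digits(999999)
_, slow = product_digits(999999, use_thirty=False)
print(fast, slow)
```

With these counting rules, 999999 µm needs 70 comparisons in the tens pass with the compare-to-30 step and 121 without it; both versions produce the same carries, since taking out 30 and carrying 3 is just three steps of 10 at once.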