A slow but compact approach is to perform the division by successive subtraction.
For example, assume a function with the following prototype:
//return: quotient of dividend / divisor; the remainder is stored in *remainder
uint32_t div10(uint32_t dividend, uint32_t divisor, uint32_t *remainder);
Something like this may work for you:
//convert val (0...9999) to four ASCII digits in vRAM[] (not NUL-terminated)
vRAM[0]=div10(val, 1000, &val) + '0';
vRAM[1]=div10(val, 100, &val) + '0';
vRAM[2]=div10(val, 10, &val) + '0';
vRAM[3]=val + '0';
Writing div10() is fairly easy.
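One way it could look, as a minimal sketch only: the body below is my assumption of a plain successive-subtraction loop matching the prototype above, and it is not necessarily the routine the size figures refer to. It assumes the divisor is never zero.

```c
#include <stdint.h>

// Sketch: divide by repeated subtraction.
// Returns the quotient and stores the remainder through *remainder.
// Assumption: divisor != 0 (a zero divisor would loop forever).
uint32_t div10(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;

    // Subtract the divisor until what is left is smaller than it.
    while (dividend >= divisor) {
        dividend -= divisor;
        quotient++;
    }

    // dividend is a local copy, so remainder may point at the
    // caller's own variable, as in div10(val, 1000, &val).
    *remainder = dividend;
    return quotient;
}
```

For val in 0...9999 each call loops at most 9 times, so the three calls in the conversion cost at most 27 subtractions in total. That is what keeps the approach tolerable here, although it gets slow quickly when the quotient can be large.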
edit: compiled under an old WinAVR, the routine takes 148 bytes of flash unoptimized and 26 bytes of flash optimized.