and dumped the count by UART every second
That is one of the points I am making: conversion speed does not matter at all if you only output once a second. Even when you want to dump lots of (formatted) data, it matters very little.
Your original post focused on conversion speed, but if you had actually output all the data at your specified 9600 baud (960 bytes/sec), you would have found that any formatting you can come up with ends up blocking on the UART as it sends out the data, whether you use the Arduino print you had access to or your own custom BCD conversion.
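To put a rough number on that, here is a small sketch of the arithmetic. It assumes 8N1 framing (1 start + 8 data + 1 stop = 10 bits per byte), which is where the 960 bytes/sec figure comes from. The function name uart_tx_time_us is just something I made up for illustration.

```c
#include <stdint.h>

/* Time in microseconds to shift `bytes` bytes out of a UART at `baud`.
   Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte. */
uint32_t uart_tx_time_us(uint32_t bytes, uint32_t baud)
{
    const uint32_t bits_per_byte = 10;
    /* 64-bit intermediate so bytes * 10 * 1000000 cannot overflow */
    return (uint32_t)(((uint64_t)bytes * bits_per_byte * 1000000u) / baud);
}
```

The longest 32-bit count, "4294967295" plus a newline, is 11 bytes, so uart_tx_time_us(11, 9600) comes out to about 11.5 ms on the wire for a single line. Whatever the conversion costs, it has to be compared against that.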
Do what you like, but existing formatting code such as printf or Arduino print will be plenty fast. Code size will also be of little concern unless you have an mcu with 4k or less. In the case of an AVR, the printf code is about 1.5k, and once it is brought in you can use it as much as you want (so use it for everything that needs formatting). For Arduino (which I don't use), I imagine no one using it will concern themselves with how the details of formatting are done; they just use print when the need arises.
printf( "%lu\n", my10MHzCounter() ); //%lu for 32bits on avr
Serial.println( my10MHzCounter() );
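If you want to try the printf-style formatting off-target first, something like the following should work on a PC as well as on an AVR. It is only a sketch: format_count is a hypothetical helper name, and the cast to unsigned long is there so that %lu is correct on any platform, not just where a 32-bit value already is an unsigned long.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit count as decimal text followed by a newline.
   Returns the number of characters written (excluding the terminating 0),
   as reported by snprintf. */
int format_count(char *buf, size_t size, uint32_t count)
{
    /* cast so %lu matches the argument type on every platform */
    return snprintf(buf, size, "%lu\n", (unsigned long)count);
}
```

A 16-byte buffer is enough, since the largest 32-bit value needs 10 digits plus the newline and terminator.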