UTF-8 is one thing, but UTF-16 is another, and ISO-8859-1 is still around, even Ye Olde CP437 from time to time. And the glyph rendered for a given character varies widely from font to font.
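To make the byte-level difference concrete, here's a minimal Python sketch (assuming Python 3.8+, for `bytes.hex` with a separator). It encodes the same character, é, in each of the encodings above, then decodes the UTF-8 bytes as if they were ISO-8859-1. I used `utf-16-le` rather than plain `utf-16` so Python doesn't prepend a byte-order mark.

```python
# Encode one character with several encodings and compare the raw bytes.
text = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE ("é")

encodings = ["utf-8", "utf-16-le", "iso-8859-1", "cp437"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print each encoding's bytes as space-separated hex pairs.
    print(f"{name:>10}: {data.hex(' ')}")

# Reading UTF-8 bytes with the wrong decoder yields classic mojibake.
mojibake = encoded["utf-8"].decode("iso-8859-1")
print(mojibake)
```

Same character, four different byte sequences (`c3 a9`, `e9 00`, `e9`, `82`), and the mismatched decode prints "Ã©" instead of "é".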
The great thing about standards is that there are so many of them... and developers just pick whichever one they're familiar with, or whatever their tools provide (more or less transparently, at least until something breaks). Those tool defaults, in turn, may reflect better-informed decisions that still had to stay backwards-compatible.
I don't know that there's any necessary (required, or even reliable) correspondence between the glyph on the page and the character code in OCR'd documents. Considering how ugly some of those algorithms are, it's a wonder the results are text-searchable at all.
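One concrete way glyph and code can drift apart: a typographic ligature like "ﬁ" has its own code point (U+FB01), so if an OCR engine emits it, a plain substring search for "office" misses. The sketch below assumes that hypothetical OCR output (`ocr_text` is made up for illustration) and uses Unicode NFKC normalization, which folds the ligature back into a separate "f" and "i".

```python
import unicodedata

# Hypothetical OCR output: "office" with the "fi" captured as the ligature U+FB01.
ocr_text = "of\ufb01ce"
query = "office"

# A naive search compares code points, so the ligature doesn't match "fi".
found_raw = query in ocr_text

# NFKC applies compatibility decompositions, turning U+FB01 into "f" + "i".
normalized = unicodedata.normalize("NFKC", ocr_text)
found_normalized = query in normalized

print(found_raw, found_normalized)
```

This prints `False True`. Normalizing both the indexed text and the query the same way is one common mitigation, though it only covers cases Unicode defines as compatibility equivalents, not outright misrecognized glyphs.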
Tim