| Google world brainL *trying* to digitize every paper book = FAIL |
| Beamin:
EDIT: the L is my world brain malfunctioning like Google's. Have you tried to download an old book, like from the early 1900s, off it onto an ebook reader? A Nook in my case. Half the time it's just pages of junk characters. Why is that? It's kind of like 100 pages of: $ %^%^ ^*(^| ~ ! ~ + |(&^gb #$^$^bbfd _ & % And it goes on and on. I've found some really interesting reads, like: _OF THE WAY METAL ALLOYS HARDEN WITH DIFFERENT COOLING FACTORS BASED ON GRAIN BOUNDARY SIZE CONDITIONS_ You know it's an old book when they make the title almost a blurb, but there are lots of great theories, and sometimes it's fun to read about things that were definitely wrong, like plum pudding flavored electrons. TONS of these old books, where the typesetting is pretty modern, just come out as junk. They are all free, by the way, but still, you never know what you are going to get. |
| ejeffrey:
Example? Link? |
| T3sl4co1l:
Well... OCR is a challenging problem at best. There's also a probably separate issue, often seen with PDFs (more the ones that are composed as such than the OCR'd ones, I think): some tools optimize the embedded font / character set down to only the characters used, or I suppose sorted by frequency of usage or whatever. You can read perfectly legible text on screen, but select and copy it and it's just gibberish -- albeit a simple substitution cipher, but pretty well useless for copy-pasting anyway. I don't know which of these problems eBooks have; evidently, given your experience, they rely on the OCR text, and perhaps built-in fonts, to show documents. So a faulty OCR is truly fatal to the experience. Tim |
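To make the font-subsetting issue concrete, here is a minimal sketch. It assumes a hypothetical document whose embedded font renumbers its glyphs in order of first use and carries no mapping back to real characters. The function names (`subset_codes`, `copy_paste`, `recover`) are invented for illustration and are not part of any real PDF or eBook tool.

```python
# Hypothetical illustration only: no real PDF library is used here.

def subset_codes(text):
    """Give each distinct character a new code, in order of first appearance."""
    table = {}
    for ch in text:
        if ch not in table:
            table[ch] = len(table)
    return table

def copy_paste(text, table, base=ord("A")):
    """What a viewer with no reverse map might put on the clipboard:
    the renumbered glyph codes read back as ordinary characters."""
    return "".join(chr(base + table[ch]) for ch in text)

def recover(garbled, table, base=ord("A")):
    """Undo the substitution, given the table (a real file may not expose it)."""
    reverse = {chr(base + code): ch for ch, code in table.items()}
    return "".join(reverse[c] for c in garbled)

page = "grain boundary"
table = subset_codes(page)
garbled = copy_paste(page, table)
print(garbled)                  # looks like nonsense, but is a 1:1 substitution
print(recover(garbled, table))  # the original text again
```

The glyphs still draw correctly on screen because the font only cares about the codes. It is the step from code back to character that is lost, which is why the copied text reads like a substitution cipher.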
| BrianHG:
:palm: This is Google. They shouldn't be the ones using garbage run-of-the-mill OCR software. They should be using one of their neural-net-based deep learning systems, powered by a few hundred GPUs or dedicated processors, with full language-, context-, and layout-aware interpreters. Like Beamin said, FAIL on a pitiful level for a company of this tech caliber. |
| T3sl4co1l:
You're assuming they have infinite resources. Why scan books with your GPU farm when you can make a few extra cents doing instant ad auctions? Tim |