Author Topic: Google world brainL *trying* to digitize every paper book = FAIL  (Read 1155 times)


Offline BeaminTopic starter

  • Super Contributor
  • ***
  • Posts: 1567
  • Country: us
  • If you think my Boobs are big you should see my ba
EDIT: the L is my world brain malfunctioning, like Google's.

Have you tried to download an old book, like from the early 1900s, off Google onto an e-reader? A Nook in my case. Half the time it's just pages of junk characters. Why is that?

It's kind of like 100 pages of:
 $   %^%^      ^*(^|   ~  !  ~  +   |(&^gb  #$^$^bbfd    _   &   %

And it goes on and on. I found some really interesting reads, like: _OF THE WAY METAL ALLOYS HARDEN WITH DIFFERENT COOLING FACTORS BASED ON GRAIN BOUNDARY SIZE CONDITIONS_

You know it's an old book when they make the title almost a blurb, but there are lots of great theories, and sometimes it's fun to read about things that were definitely wrong, like plum pudding flavored electrons. TONS of these old books, where the typesetting is pretty modern, just come out as junk. They are all free, by the way, but still, you never know what you are going to get.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 4033
  • Country: us
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #1 on: August 26, 2021, 11:26:25 pm »
Example? Link?
 
The following users thanked this post: thm_w

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #2 on: August 27, 2021, 12:29:19 am »
Well.. OCR is a challenging problem, at best.

Also, there's a probably separate issue, often seen with PDFs (more the ones composed as such than the OCR'd ones, I think): some tools optimize the font / character set down to only what's used, or I suppose sort it by frequency of usage or whatever.  You can read perfectly legible text on screen, but select and copy it and it's just gibberish -- albeit a simple substitution cipher, but pretty well useless for copy-pasting anyway.
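A toy sketch of that copy-paste effect, for illustration only. It assumes a hypothetical subsetter that numbers characters in order of first use and a hypothetical viewer that has no reverse mapping and simply guesses; the function names are made up here, and no real PDF tool or e-reader is being modeled.

```python
def assign_glyph_ids(text):
    """Toy subsetter: number each distinct character in order of first use."""
    glyph_ids = {}
    for ch in text:
        if ch not in glyph_ids:
            glyph_ids[ch] = len(glyph_ids)
    return glyph_ids


def encode(text, glyph_ids):
    """The codes a toy document would store instead of the characters."""
    return [glyph_ids[ch] for ch in text]


def naive_copy(codes, base=ord("A")):
    """Copy-paste without the mapping: guess that code n means chr(base + n)."""
    return "".join(chr(base + n) for n in codes)


def copy_with_mapping(codes, glyph_ids):
    """Copy-paste with the reverse mapping available: the text comes back."""
    chars = {gid: ch for ch, gid in glyph_ids.items()}
    return "".join(chars[n] for n in codes)


text = "grain boundary"
glyph_ids = assign_glyph_ids(text)
codes = encode(text, glyph_ids)
print(naive_copy(codes))                    # ABCDEFGHIEJCBK
print(copy_with_mapping(codes, glyph_ids))  # grain boundary
```

The glyph shapes still draw correctly on screen because the font knows which shape each code refers to; only the code-to-character step is lost, which is why the pasted text behaves like a substitution cipher.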

I don't know which of these problems eBooks have; evidently, given your experience, they rely on the OCR, and perhaps built-in fonts, to show documents.  So a faulty OCR is truly fatal to the experience.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8275
  • Country: ca
    • LinkedIn
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #3 on: August 27, 2021, 01:14:06 am »
 :palm: This is Google.  They shouldn't be the ones using garbage run-of-the-mill OCR software.  They should be using one of their neural-net-based deep learning systems powered by a few hundred GPUs or dedicated processors, with full language-, context- and layout-aware interpreters.

Like Beamin said, FAIL on a pitiful level for a company of this tech caliber.
 
The following users thanked this post: Beamin

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #4 on: August 27, 2021, 01:15:43 pm »
You're assuming they have infinite resources. Why scan books with your GPU farm when you can make a few extra cents doing instant ad auctions?

Tim
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 7521
  • Country: ca
  • Non-expert
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #5 on: August 27, 2021, 09:23:39 pm »
Quote from: BrianHG on August 27, 2021, 01:14:06 am
:palm: This is Google.  They shouldn't be the ones using garbage run-of-the-mill OCR software.  They should be using one of their neural-net-based deep learning systems powered by a few hundred GPUs or dedicated processors, with full language-, context- and layout-aware interpreters.

Like Beamin said, FAIL on a pitiful level for a company of this tech caliber.

We don't know if the problem is the source or the conversion to ebook reader unless OP provides an actual link...
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5569
  • Country: us
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #6 on: August 27, 2021, 10:09:36 pm »
I have never encountered a problem on this scale, but I only use a desktop or laptop under Windows, and have downloaded only a couple of dozen or so books and portions thereof.  I would suspect either a translation problem to the Nook or possibly a poor communication protocol that doesn't recognize/correct errors.

My guess would be that Google sticks with, or at least emphasizes, the OCR versions just to minimize bandwidth on the transmission end.  They surely have the resources to retain the original scanned images, though perhaps with methods that don't provide high speed and random access.

My personal experience with OCR is that the original needs to be pretty bad before you get the level of errors described.  And that is with the fairly garden-variety OCR sold at retail by ABBYY, running on a not very high-end desktop computer.  It can bog down on technical texts that have complex math formulas with glyphs from several languages, but that is usually handled well by just making an image out of the equations.  In all but the very densest of textbooks, that still dramatically reduces the image storage required.
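For a rough sense of what "that level of errors" looks like in numbers, here is a minimal sketch that scores a page by the fraction of its visible characters that are letters or digits. The `alnum_ratio` name and the 0.5 cutoff are assumptions made up for this illustration, not anything Google, ABBYY or the Nook is known to use. The junk string is the sample from the first post and the clean string is the start of the title quoted there.

```python
def alnum_ratio(page):
    """Fraction of non-whitespace characters that are letters or digits."""
    visible = [ch for ch in page if not ch.isspace()]
    if not visible:
        return 0.0  # an empty page carries no readable text
    return sum(ch.isalnum() for ch in visible) / len(visible)


# Hypothetical cutoff, chosen only for this example.
THRESHOLD = 0.5

junk = " $   %^%^      ^*(^|   ~  !  ~  +   |(&^gb  #$^$^bbfd    _   &   %"
title = "OF THE WAY METAL ALLOYS HARDEN WITH DIFFERENT COOLING FACTORS"

for name, page in (("junk sample", junk), ("title", title)):
    ratio = alnum_ratio(page)
    verdict = "looks like junk" if ratio < THRESHOLD else "looks like text"
    print(f"{name}: {ratio:.2f} -> {verdict}")
```

A score like this only says that a page is unreadable; it cannot say whether the scan, the OCR or the conversion to the reader is at fault.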

 

Offline BeaminTopic starter

  • Super Contributor
  • ***
  • Posts: 1567
  • Country: us
  • If you think my Boobs are big you should see my ba
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #7 on: August 29, 2021, 09:38:55 pm »
Quote from: ejeffrey on August 26, 2021, 11:26:25 pm
Example? Link?

The only way I know to see it is through the Nook, which is missing right now, sorry.

I think the fonts are built into the Nook, so maybe on Google's end the old books look fine, but once they're on the Nook they're garbage. I don't know what the current B&W e-ink reader is, but go into the free books and try downloading stuff from around 1900; you will see A LOT of books that are not readable.
« Last Edit: August 29, 2021, 09:43:57 pm by Beamin »
 

Offline TimFox

  • Super Contributor
  • ***
  • Posts: 8998
  • Country: us
  • Retired, now restoring antique test equipment
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #8 on: August 29, 2021, 11:37:01 pm »
When Oxford Univ Press wanted to digitize the Oxford English Dictionary, they used 128 typists for 18 months.
 

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5569
  • Country: us
Re: Google world brainL *trying* to digitize every paper book = FAIL
« Reply #9 on: August 29, 2021, 11:37:42 pm »
I did a bit of browsing, and on my system all of the books turned up as images and quite readable.  An English translation of Fourier's Theory of Heat was very legible, but dense with equations, not just as separate lines but embedded in the text.  It is one that I think OCR would have a lot of trouble with.

As others have suggested, a specific link would be useful.
 

