Anybody wants old data books (UK)?
tggzzz:

--- Quote from: nctnico on April 28, 2022, 03:56:13 pm ---
--- Quote from: TerraHertz on April 28, 2022, 03:30:36 pm ---
--- Quote from: tom66 on April 27, 2022, 08:20:59 pm ---On JPEG in PDF, it is possible to set the quality to near lossless (generally agreed to be Q=96..99), where file sizes are large but image compression still occurs. Note that most JPEG codecs divide the DCT coefficients by divisors determined by the Q factor. Very high Q factors leave most of the DCT components intact (few components rounded towards zero), so you are left only with the artefacts of the DCT itself, which can be very good with the right JPEG encoder. Compare a JPEG image compressed with a modern libjpeg codec at Q=96 against the original and tell me if you can distinguish the difference: certainly, it is not possible without looking closely at the pixel level. Yet the file size will be 1/4 or less of BMP.

PNG is not a good codec for compressing anything scanned due to variation in the page brightness or fixed pattern noise on the sensor.  It will not work well with such documents and I would be surprised if it offers much over BMP.

--- End quote ---

It's funny you are so conversant with codecs, but so basic on scanning. The trick with scanning documents (text, diagrams) is to pick levels and post processing, so that 'white' areas are really all white, and 'black' is really all black, leaving only edges to be maintained with minimal gray-levels. PNG works brilliantly with such. 4 bits per pixel for edge shading, and all blank areas vanishing into the run-length compression.

PNG still isn't ideal for documents, but it vastly beats other contenders. And if you're comparing something to BMP, then you should know you're wasting your time.

--- End quote ---
Still, I reckon that original (color!) scans at 600dpi or better resolution would be ideal (in PNG format, for example) because you can always post-process these to improve quality once newer technology for digitising paper records comes along. Storage is super cheap nowadays anyway. Epub (which supports PNG natively) could be a good alternative to PDF as a distribution format.

BTW: choosing a single level between black/white doesn't sound like a good solution. AFAIK there are better ways available nowadays that use a dynamic threshold to determine which parts are black / white.

--- End quote ---

The problem is not storing such large files, it is transmitting them (if you pay for bytes) and viewing them.

I have some old scanned NBS Weston Cell documents that take a ridiculous time to display on a (somewhat old) desktop, and which would be completely intolerable on an e-ink class reader.

I think it is still worth doing compression in a "write-once-read-many" environment.
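
To put a rough number on "such large files", here is a back-of-envelope sketch of the uncompressed size of a single page at the 600dpi colour setting mentioned above. The A4 page size and 24-bit colour depth are assumptions for illustration, not figures from this thread.

```python
# Raw (uncompressed) size of one scanned page.
# Assumptions (not from the thread): A4 paper, 24-bit colour (3 bytes/pixel).
MM_PER_INCH = 25.4

def raw_page_bytes(width_mm, height_mm, dpi, bytes_per_pixel):
    """Return (width_px, height_px, size_bytes) for an uncompressed scan."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel

w, h, size = raw_page_bytes(210, 297, dpi=600, bytes_per_pixel=3)
print(f"{w} x {h} px, about {size / 1e6:.0f} MB per page before compression")
```

That is on the order of 100 MB per page before any compression, which is why transfer and rendering time matter even when disk space does not.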
nctnico:
That is why I mentioned the Epub format for distribution / viewing purposes. The original scans can be of much higher quality, as they are only used as the source from which the distribution format is generated.
TerraHertz:
Speaking of shelves of data books, here's most of my collection. Some are not shown since the aisles between the bookshelves are too narrow to get a decent camera view. And there are various piles, boxes full and other shelves with special categories. Like a pile upstairs of 'vintage data books' where I was sorting some old parts recently.

TerraHertz:

--- Quote from: nctnico on April 28, 2022, 03:56:13 pm ---BTW: choosing a single level between black/white doesn't sound like a good solution. AFAIK there are better ways available nowadays that use a dynamic threshold to determine which parts are black / white.

--- End quote ---

That's not what I suggested. You're thinking of two-tone, i.e. fax mode, which is evil even for simple text.  I meant, choose the upper and lower scan cutoff levels to give true white and black in areas that are supposed to be white and black. I say 'supposed to be' because on paper they never actually are, unless you're printing with vantablack and surface-of-the-Sun plasma. But the publisher's intent was pure white and black, so it's valid to assign ffffff and 000000 codes to those pixels.
There still need to be gray levels in between. Just how many levels depends on the context. For black and white text, where all that's needed is to preserve visually clean curves on character edges, 16 levels (4 bits/pixel) total is adequate with sensible pixel sizing relative to the font. For B&W photos, you need at least 256 and preferably 64K levels to avoid visible posterization effects. For full colour, 24 bit or better.
But the main point is to remove visually insignificant noise in flat color areas, so PNG's lossless compression scheme can work best.
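
A minimal sketch of that idea, for anyone who wants to experiment: clip to a black point and white point, stretch, and quantize to 16 gray levels, then compare how well the result compresses. Everything here is an assumption for illustration: the synthetic "scan", the cutoff values 60 and 200, and the helper names. `zlib` is used as a stand-in because PNG's compression is DEFLATE-based, but a real PNG encoder also applies per-row filters, so actual file sizes will differ. The fake page has no anti-aliased edges, so it only shows the effect on flat areas.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

def synthetic_scan(seed=0):
    """Fake 8-bit grayscale scan: noisy light paper with noisy dark strokes."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(HEIGHT):
        for x in range(WIDTH):
            ink = (y % 32 < 4) and (x % 8 < 5)  # crude stand-in for text strokes
            base = 30 if ink else 225           # "black" ink and "white" paper
            pixels.append(base + rng.randint(-12, 12))  # sensor/paper noise
    return bytes(pixels)

def apply_levels(pixels, black_point, white_point, levels=16):
    """Clip to the cutoffs, stretch to 0..255, then quantize to `levels` grays."""
    span = white_point - black_point
    step = 255 / (levels - 1)  # 17 for 16 levels
    out = bytearray()
    for p in pixels:
        clipped = min(max(p, black_point), white_point)
        stretched = (clipped - black_point) * 255 / span
        out.append(round(round(stretched / step) * step))
    return bytes(out)

raw = synthetic_scan()
cleaned = apply_levels(raw, black_point=60, white_point=200)  # hypothetical cutoffs

raw_size = len(zlib.compress(raw, 9))
cleaned_size = len(zlib.compress(cleaned, 9))
print(f"noisy scan: {raw_size} bytes, after levels: {cleaned_size} bytes")
```

On real pages the cutoffs would have to come from trial scans, and the edge pixels would land on the intermediate gray levels rather than on pure black or white.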

Btw, 'dynamic threshold' can't work for multi-page documents. It will adapt differently on pages of different content, resulting in digital page representations that look different when they should be the same. You have to do trial scans of representative pages, then choose a scanning and post-processing profile that works best for all of them, then stick with that one profile through all the work. Unless there are radically different types of pages, in which case you need a profile for each type.
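
A toy illustration of that consistency argument, under loud assumptions: a simple per-page auto-levels (black point = darkest pixel, white point = lightest) stands in for "dynamic threshold", which is much cruder than real adaptive thresholding algorithms. The pixel values and the fixed profile (60, 200) are made up.

```python
def stretch(value, black_point, white_point):
    """Map one gray value through a levels profile to the 0..255 range."""
    clipped = min(max(value, black_point), white_point)
    return round((clipped - black_point) * 255 / (white_point - black_point))

def auto_profile(page):
    """Per-page adaptive profile: darkest and lightest pixel on that page."""
    return min(page), max(page)

FIXED_PROFILE = (60, 200)  # hypothetical, as if chosen from trial scans

# Made-up pixel samples: both pages contain the same mid gray (120).
text_page = [30, 35, 120, 220, 225]      # solid ink, an edge gray, paper
figure_page = [120, 125, 180, 220, 225]  # light shading and paper, no solid ink

for name, page in (("text page", text_page), ("figure page", figure_page)):
    adaptive = stretch(120, *auto_profile(page))
    fixed = stretch(120, *FIXED_PROFILE)
    print(f"{name}: gray 120 -> adaptive {adaptive}, fixed {fixed}")
```

With the per-page profile, the same input gray comes out different on the two pages because each page's darkest pixel sets its own black point. With the single fixed profile it comes out identical on both.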
tooki:

--- Quote from: TerraHertz on April 30, 2022, 07:48:22 am ---That's not what I suggested. You're thinking of two-tone, i.e. fax mode, which is evil even for simple text.  I meant, choose the upper and lower scan cutoff levels to give true white and black in areas that are supposed to be white and black.

--- End quote ---
That’s called “setting the white point” (or black point, respectively). But your original description goes beyond that, strongly suggesting increasing contrast to largely eliminate grays.

The problem when scanning is that the backgrounds are rarely as uniform as we think they are.

Random pro tip when dealing with thin paper where the reverse side bleeds through when scanning: rather than putting a white backing sheet behind it, use a black one! This is far more effective at eliminating bleed-through, and the overall darker (but now more uniform) background can easily be adjusted back to white.