

T3sl4co1l (Topic starter)

  • Super Contributor
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Text Search in Excel PDF Export...
« on: July 02, 2018, 11:49:16 pm »
So I just noticed this little perversion.

I'm looking at a PDF, which is an exported version of a spreadsheet.  It's some boring requirements thing, lots of text, no actual formulas.  So it doesn't matter that I'm reading it in PDF form, right?

I search for a multi-word term.  Nothing.  What?  No, that must be used in here, a dozen times.

I search for one word, and find an occurrence.  Yeah, no shit, it's there.  Okay, select and copy the text, and search on that.  Finds all of them.

WTF?!

Start poking at the search string.  The letters are normal letters.  The spaces are... breaking it?!  What?

In fact, Excel, in its infinite wisdom, has chosen to use non-breaking spaces, " ", \u00a0, in the export, making text search nearly useless.

Thanks Microsoft.

And no, it's not in the source document, those are normal spaces (\u0020).
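A quick way to see what is actually in a copied search string is to dump its code points. Below is a minimal Python sketch, assuming you paste the copied text into a string; the function name and sample strings are made up for illustration. A "U+00A0 NO-BREAK SPACE" line where you expected "U+0020 SPACE" is the giveaway.

```python
import unicodedata


def describe_codepoints(text):
    """Return one 'U+XXXX NAME' line per character so look-alike spaces stand out."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Hypothetical example strings: one typed by hand, one pasted out of the PDF.
typed = "quick brown"
copied = "quick\u00a0brown"

for line in describe_codepoints(copied):
    print(line)

# The two strings look identical on screen but compare unequal.
print(typed == copied)  # False
```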

Just FYI, in case you encounter this, and in case you needed another occasion to vent about M$...  :blah:

So on a related note, I wonder if there is a way to add a plug-in to Windows itself, to add more functionality to OS widgets.  Namely, for the present case -- more features for text inputs and edit boxes, like viewing and editing direct Unicode code points, and rich-text formatting elements.  Maybe this already exists?  Maybe as an Office or $PDF-viewer plugin rather than the OS?  That'd be a bit less useful, but might help illuminate cases like the present one.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

iainwhite

  • Supporter
  • Posts: 317
  • Country: us
  • Measure twice...
Re: Text Search in Excel PDF Export...
« Reply #1 on: July 03, 2018, 12:17:08 am »
The only way I can think of is to post-process your PDF with a search/replace of \u00a0 to \u0020.
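A minimal sketch of that search/replace in Python, assuming the text has already been extracted from the PDF into a UTF-8 text file by some extraction tool. Doing the replacement directly on the raw PDF bytes usually won't work, since page content streams are often compressed and text is encoded through each font's own character mapping. The function and file names here are hypothetical.

```python
import sys

NBSP = "\u00a0"   # no-break space, as found in the export
SPACE = "\u0020"  # ordinary space, as in the source spreadsheet


def normalize_spaces(text):
    """Replace every no-break space with an ordinary space."""
    return text.replace(NBSP, SPACE)


def normalize_file(src, dst):
    """Copy a UTF-8 text file with NBSPs replaced; return how many were replaced."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    count = text.count(NBSP)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(normalize_spaces(text))
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fix_nbsp.py extracted.txt fixed.txt
    n = normalize_file(sys.argv[1], sys.argv[2])
    print(f"replaced {n} no-break spaces")
```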

I think MS has used this so that table cells don't ever get broken up with a newline dividing the two halves of your cell data.
Shame you can't turn off this feature in the pdf export options in excel.
 

tpowell1830

  • Frequent Contributor
  • Posts: 863
  • Country: us
  • Peacefully retired from industry, active in life
Re: Text Search in Excel PDF Export...
« Reply #2 on: July 03, 2018, 12:36:55 am »
Hmmm, I wonder if the same thing would happen if you printed to PDF using one of the PDF document apps, such as PDFCreator or similar.


EDIT: I know you can search the text with the resulting PDFs.
PEACE===>T
 

JohnnyMalaria

  • Super Contributor
  • Posts: 1154
  • Country: us
    • Enlighten Scientific LLC
Re: Text Search in Excel PDF Export...
« Reply #3 on: July 03, 2018, 12:44:38 am »
I set up a simple sheet in Excel 365 with each of the following words in adjacent cells: the quick brown fox (i.e., 4 cells). I also put the same words as a single string in one cell.

I tried Excel's share as PDF/XPS document and also via Microsoft Print to PDF. I opened both in SumatraPDF and searched for "quick brown". It found both (i.e., in two adjacent cells and in a string). So everything behaves as it should.

What version of Excel? How are you creating the PDF from Excel? How are you viewing the PDF?
 

T3sl4co1l (Topic starter)

  • Super Contributor
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Text Search in Excel PDF Export...
« Reply #4 on: July 03, 2018, 12:51:53 am »
I don't know how it was generated, probably Export PDF.  We have Office 365 here, yes.

Reading in Foxit Reader.

Looking at it in Chrome, the spaces seem to be normal.

Maybe most readers automatically convert \u00a0, or read metadata telling them to do so, and this version doesn't?
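For what it's worth, a viewer could make search insensitive to this by treating the whitespace in a query as "any run of whitespace" in the text. This is a hypothetical Python sketch of that idea, not a claim about what Chrome or Foxit actually do internally; `find_phrase` is a made-up name. It relies on the fact that, for `str` patterns in Python 3, `\s` matches Unicode whitespace, which includes U+00A0.

```python
import re


def find_phrase(haystack, phrase):
    """Return start offsets of phrase in haystack, ignoring which whitespace separates words."""
    words = phrase.split()  # str.split() with no argument splits on any Unicode whitespace
    if not words:
        return []
    # Join the escaped words with a pattern matching one or more whitespace chars.
    # For str patterns, \s matches U+00A0 as well as the ordinary space.
    pattern = r"\s+".join(re.escape(w) for w in words)
    return [m.start() for m in re.finditer(pattern, haystack)]


text = "the quick\u00a0brown fox; the quick brown dog"
print(find_phrase(text, "quick brown"))  # [4, 25]
```

Here the first hit is separated by a no-break space and the second by a normal space, and both are found.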

Tim
 

JohnnyMalaria

  • Super Contributor
  • Posts: 1154
  • Country: us
    • Enlighten Scientific LLC
Re: Text Search in Excel PDF Export...
« Reply #5 on: July 03, 2018, 12:58:42 am »
I just tried Nuance Power PDF and Foxit Reader - both worked fine.
 