Topic: Printing from html  (Read 1442 times)


westfw (topic starter)

  • Super Contributor
  • Posts: 4303
  • Country: us
Printing from html
« on: October 10, 2022, 09:20:47 am »
What tools are there for printing (or, perhaps, converting to PDF) snippets of HTML? (That aren’t full web browsers.)
Something multi-platform would be great.
You can assume that the HTML is “printable”, containing text (with funny HTML characters like &nbsp;), font, face, and color settings….
 

sokoloff

  • Super Contributor
  • Posts: 1799
  • Country: us
Re: Printing from html
« Reply #1 on: October 10, 2022, 09:28:26 am »
I’d look into a browser rendering engine (Blink and similar), using headless Chrome, or one of the browser automation tools.

It’s a bit hard to answer your question because, on the spectrum between “can read and correctly interpret HTML” and “isn’t a full browser”, it’s hard to know which part of a full browser you’re trying to avoid.
 

voltsandjolts

  • Supporter
  • Posts: 2420
  • Country: gb
Re: Printing from html
« Reply #2 on: October 10, 2022, 09:39:02 am »
Maybe you're looking for an XSLT processor, although I don't know to what extent that would support the HTML tags you want.
 

Hogwild

  • Regular Contributor
  • Posts: 186
  • Country: ca
Re: Printing from html
« Reply #3 on: October 10, 2022, 05:04:24 pm »
MS Word has been able to import HTML for a long time. It may mangle things a little, so I'd do some quick tests to verify.

There are things like this (written in pure .js):
https://github.com/eKoopmans/html2pdf.js
 

boB

  • Frequent Contributor
  • Posts: 341
  • Country: us
    • my work www
Re: Printing from html
« Reply #4 on: October 10, 2022, 05:22:45 pm »
I use the Fireshot plugin.   It may be free too but I gave them like, $20 or $30 for lifetime support.

I tried MANY different things to be able to print HTML pages and this is the only thing I found that works well.

The mode I normally use is to save as PDF though.   I have successfully printed at least a couple hundred pages of forum posts.

I find that print to PDF from windoze only works for the simplest of pages.

https://getfireshot.com/

It will also save as text and links when it can.

boB

« Last Edit: October 10, 2022, 05:24:36 pm by boB »
K7IQ
 

Picuino

  • Super Contributor
  • Posts: 1033
  • Country: es
    • Picuino web
Re: Printing from html
« Reply #5 on: October 10, 2022, 05:22:48 pm »
I use phantomjs to automatically convert (with makefiles) from HTML source code to an image.

https://phantomjs.org/

"""
PhantomJS is a headless web browser scriptable with JavaScript. It runs on Windows, macOS, Linux, and FreeBSD.

Using QtWebKit as the back-end, it offers fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

The following simple script for PhantomJS loads Google homepage, waits a bit, and then captures it to an image.

var page = require('webpage').create();
page.open('http://www.google.com', function() {
    setTimeout(function() {
        page.render('google.png');
        phantom.exit();
    }, 200);
});

PhantomJS is an optimal solution for:
Page automation
    Access webpages and extract information using the standard DOM API, or with usual libraries like jQuery.
Screen capture
    Programmatically capture web contents, including SVG and Canvas. Create web site screenshots with thumbnail preview.
Headless website testing
    Run functional tests with frameworks such as Jasmine, QUnit, Mocha, WebDriver, etc.
Network monitoring
    Monitor page loading and export as standard HAR files. Automate performance analysis using YSlow and Jenkins.
"""
« Last Edit: October 10, 2022, 05:26:04 pm by Picuino »
 

Picuino

  • Super Contributor
  • Posts: 1033
  • Country: es
    • Picuino web
Re: Printing from html
« Reply #6 on: October 10, 2022, 05:31:08 pm »
 

ledtester

  • Super Contributor
  • Posts: 3249
  • Country: us
Re: Printing from html
« Reply #7 on: October 10, 2022, 05:48:11 pm »
Also check out Prince:

https://www.princexml.com/

Cross platform and runnable from the command line. "It's free to download Prince software for non-commercial use."
 

voltsandjolts

  • Supporter
  • Posts: 2420
  • Country: gb
Re: Printing from html
« Reply #8 on: October 10, 2022, 06:38:08 pm »
Well, this is turning out to be a bit of a party ;D

More FOSS options:

https://weasyprint.org/
https://pandoc.org/
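Several of the tools mentioned in this thread (headless Chrome/Chromium, Prince, WeasyPrint, pandoc) are plain command-line programs, so they are easy to drive from a script or makefile. Below is a rough Node.js sketch, assuming the tool you pick is installed and on the PATH. The executable names are assumptions (Chrome in particular may be `chromium`, `google-chrome` or `chrome.exe` depending on platform), `commandFor` and `convert` are hypothetical helper names, and the flags are only the basic documented ones, so check them against your installed versions. Note that pandoc parses HTML into its own document model, so inline colours and fonts may not survive that route.

```javascript
// Sketch: build command lines for a few HTML-to-PDF converters and run one.
// Executable names and flags are assumptions; adjust them for your system.
const { spawnSync } = require("child_process");
const path = require("path");
const { pathToFileURL } = require("url");

// Each entry maps (input .html, output .pdf) to [executable, args].
const converters = {
  // Headless Chrome/Chromium: the binary name varies by platform.
  chrome: (input, output) => [
    "chromium",
    [
      "--headless",
      `--print-to-pdf=${path.resolve(output)}`,
      pathToFileURL(path.resolve(input)).href, // Chrome expects a URL
    ],
  ],
  prince: (input, output) => ["prince", [input, "-o", output]],
  weasyprint: (input, output) => ["weasyprint", [input, output]],
  // pandoc needs an external PDF engine; WeasyPrint is one it can use.
  pandoc: (input, output) => ["pandoc", [input, "-o", output, "--pdf-engine=weasyprint"]],
};

// Returns [executable, args] for the named tool, or throws if unknown.
function commandFor(tool, input, output) {
  const build = converters[tool];
  if (!build) throw new Error(`unknown converter: ${tool}`);
  return build(input, output);
}

// Runs the chosen converter; returns true only if it exited with status 0.
function convert(tool, input, output) {
  const [exe, args] = commandFor(tool, input, output);
  const result = spawnSync(exe, args, { stdio: "inherit" });
  if (result.error) {
    console.error(`could not start ${exe}: ${result.error.message}`);
    return false;
  }
  return result.status === 0;
}

// Usage: node html2pdf.js <chrome|prince|weasyprint|pandoc> in.html out.pdf
if (process.argv.length === 5) {
  const [tool, input, output] = process.argv.slice(2);
  process.exit(convert(tool, input, output) ? 0 : 1);
}
```

Keeping the argument building separate from the spawning makes it easy to swap tools in a makefile rule and compare their output on the same snippet.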
 
The following users thanked this post: boB

boB

  • Frequent Contributor
  • Posts: 341
  • Country: us
    • my work www
Re: Printing from html
« Reply #9 on: October 10, 2022, 07:53:11 pm »
Here is an example of Fireshot if you want to take a look.

Just captured this particular page to PDF here...


Well... not a great example, because just ^P and printing to PDF appeared to work fine here.
« Last Edit: October 10, 2022, 07:55:34 pm by boB »
 

DiTBho

  • Super Contributor
  • Posts: 4227
  • Country: gb
Re: Printing from html
« Reply #10 on: October 11, 2022, 07:56:00 am »
What if a page contains tons of crappy JavaScript?
The browser on my new 2022 Kindle is unable to correctly show some HTML documents.

Yeah, modern web sucks! :scared:
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: boB

westfw (topic starter)

  • Super Contributor
  • Posts: 4303
  • Country: us
Re: Printing from html
« Reply #11 on: October 11, 2022, 09:09:20 am »
Like I said, “assume the html is printable.”


The actual thing I’m looking at is Arduino’s “copy to html” command, which does a fine job of creating a formatted and colorized source code snippet, as opposed to its “print” command, which is unbelievably crappy. Apparently the “copy” function uses some common library for tokenizing input into colorized HTML, but doesn’t do other “printer-friendly” formats.
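One possible approach for this case (a sketch, not something the Arduino IDE does itself) is to paste the copied HTML into a small standalone page with print-specific CSS, then open it in a browser and print, or feed it to a converter such as WeasyPrint or Prince. The CSS choices are assumptions: `@page` sets paper size and margins, `white-space: pre-wrap` keeps indentation while wrapping long lines instead of clipping them, and `print-color-adjust: exact` asks the renderer not to drop backgrounds or adjust colours to save ink (support varies between renderers). `wrapForPrint` and its options are hypothetical names.

```javascript
// Sketch: wrap an HTML snippet (e.g. from a "copy as HTML" command) in a
// standalone, print-friendly page. wrapForPrint is a hypothetical helper.
const fs = require("fs");

function wrapForPrint(snippet, { title = "Listing", paper = "A4", marginMm = 15, fontPt = 9 } = {}) {
  // The title is inserted as text, so escape characters special in HTML.
  const safeTitle = title.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  // The snippet itself is inserted unescaped, because it already is HTML.
  return `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>${safeTitle}</title>
<style>
  @page { size: ${paper}; margin: ${marginMm}mm; }
  body {
    font-family: monospace;
    font-size: ${fontPt}pt;
    /* Ask the renderer not to drop or lighten colours when printing */
    -webkit-print-color-adjust: exact;
    print-color-adjust: exact;
  }
  /* Preserve indentation, but wrap very long lines instead of clipping */
  pre { white-space: pre-wrap; margin: 0; }
</style>
</head>
<body>
${snippet}
</body>
</html>
`;
}

// Usage: node wrap.js snippet.html page.html
if (process.argv.length === 4) {
  const [input, output] = process.argv.slice(2);
  fs.writeFileSync(output, wrapForPrint(fs.readFileSync(input, "utf8"), { title: input }));
}
```

Declaring `charset="utf-8"` also means entities such as `&nbsp;` and any literal non-ASCII characters in the snippet are interpreted consistently by whichever renderer prints the page.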


Cross-platform standardized printing seems to have deteriorated since the days you could send 132 column output to a line printer with a bit of Fortran code, and have it be portable…
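As a worked example of getting line-printer-style 132-column output onto a page: a monospace glyph typically advances about 0.6 em (exactly 0.6 em for Courier; for other monospace fonts this is an assumption), so the largest font size that fits N columns is the printable width divided by 0.6 × N, converted to points (1 pt = 25.4/72 mm). `fontSizeForColumns` is a hypothetical helper; its result could be plugged into a print stylesheet.

```javascript
// Sketch: largest monospace font size (in points) that fits `columns`
// characters across the printable width. Assumes each glyph advances
// `emAdvance` em (0.6 is exact for Courier, approximate for other fonts).
const MM_PER_PT = 25.4 / 72;

function fontSizeForColumns(columns, pageWidthMm, marginMm, emAdvance = 0.6) {
  const printableMm = pageWidthMm - 2 * marginMm;
  if (printableMm <= 0 || columns <= 0) throw new RangeError("nothing fits");
  const glyphMm = printableMm / columns;   // width available per character
  return glyphMm / emAdvance / MM_PER_PT;  // em size in mm, converted to pt
}

// A4 landscape (297 mm wide) with 10 mm margins, 132 columns:
console.log(fontSizeForColumns(132, 297, 10).toFixed(1)); // "9.9"
```

So 132 columns fit on landscape A4 at roughly 9.9 pt, which is still readable.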

 

Warhawk

  • Frequent Contributor
  • Posts: 832
  • Country: 00
    • Personal resume
Re: Printing from html
« Reply #12 on: October 11, 2022, 09:30:22 am »
Quote from: westfw on October 10, 2022, 09:20:47 am
    What tools are there for printing (or, perhaps, converting to PDF) snippets of HTML? (That aren’t full web browsers.)
    Something multi-platform would be great.
    You can assume that the HTML is “printable”, containing text (with funny HTML characters like &nbsp;), font, face, and color settings….

I am not sure if you're asking for printing the HTML tags or parts of the webpage. For parts of the websites I use this: https://www.printfriendly.com/
I started backing up different tutorials and forum snippets if I find something I may need in future (e.g. Mikrotik Router know-how, etc.).

JustMeHere

  • Frequent Contributor
  • Posts: 812
  • Country: us
Re: Printing from html
« Reply #13 on: October 16, 2022, 06:57:56 pm »
I've never encountered anything that does a perfect job. That's what PostScript and PDF are for; HTML is just way too open to "interpretation" by the rendering agent.

Adobe had their Distiller product which was good, but expensive, and I believe long gone.

iText has a product I haven't used.

These days all OSes have some type of print driver that outputs PDFs. 
 

DiTBho

  • Super Contributor
  • Posts: 4227
  • Country: gb
Re: Printing from html
« Reply #14 on: October 17, 2022, 04:19:39 pm »
I wrote a program that takes C code as input and outputs a coloured PDF.
It was useful for including the sources in my thesis.
I can adapt it for HTML, with or without CSS.

Umm. LaTeX should have a similar module  :-//
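The core of this kind of tool (and, apparently, of the Arduino copy-as-HTML feature described earlier in the thread) is tokenizing the source and wrapping each token class in a coloured span. Below is a much-simplified sketch; it is not DiTBho's program, the keyword list and colours are arbitrary assumptions, and it only recognizes comments, string/char literals, decimal numbers and a few keywords (no preprocessor, hex literals or other C corner cases). `colourize` is a hypothetical name.

```javascript
// Much-simplified sketch of a C syntax colourizer that emits HTML.
// Keyword list and colours are illustrative assumptions, not a full C lexer.
const KEYWORDS = new Set([
  "int", "char", "void", "if", "else", "for", "while", "return",
  "const", "unsigned", "long", "static", "struct",
]);

const COLOURS = {
  comment: "#888888",
  string: "#a31515",
  number: "#098658",
  keyword: "#0000ff",
};

// Escape characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Groups: 1 = comment, 2 = string/char literal, 3 = number, 4 = identifier.
const TOKEN = /(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')|(\b\d+(?:\.\d+)?\b)|([A-Za-z_]\w*)/g;

function colourize(source) {
  let out = "";
  let last = 0;
  for (const m of source.matchAll(TOKEN)) {
    out += escapeHtml(source.slice(last, m.index)); // plain text between tokens
    const [text, comment, string, number, ident] = m;
    let cls = null;
    if (comment !== undefined) cls = "comment";
    else if (string !== undefined) cls = "string";
    else if (number !== undefined) cls = "number";
    else if (KEYWORDS.has(ident)) cls = "keyword"; // other identifiers stay plain
    out += cls
      ? `<span style="color:${COLOURS[cls]}">${escapeHtml(text)}</span>`
      : escapeHtml(text);
    last = m.index + text.length;
  }
  out += escapeHtml(source.slice(last)); // trailing plain text
  return `<pre>${out}</pre>`;
}

console.log(colourize('int x = 42; // answer\nif (x < 50) puts("ok");'));
```

The output is a `<pre>` block with inline colours, so it can be printed or converted like any other HTML snippet; a real tool would use a proper lexer instead of one regular expression.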
 

AndyBeez

  • Frequent Contributor
  • Posts: 856
  • Country: nu
Re: Printing from html
« Reply #15 on: October 17, 2022, 04:45:35 pm »
Define 'html' on a web page. Many web pages are now dynamically generated from Javascript frameworks feeding off of JSON callbacks. In English, the 'page source' is just links to Google and Github APIs. Only the browser's developer mode can reveal the true shape and context of the web page, after the page framework has achieved a 'post render' state.

If you want to 'pretty print' html, most web developer IDEs will pretty print code snippets to a target output.
 

T3sl4co1l

  • Super Contributor
  • Posts: 22373
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Printing from html
« Reply #16 on: October 17, 2022, 04:48:14 pm »
What's really wrong with a browser, you're using one right now, everyone has one? :)

IIRC, most browsers have command line options, that aren't obvious since you normally just use it plain, but a perusal of those features might yield a ready-to-go solution.  Otherwise the "headless" suggestions above sound promising.

Don't know if LaTeX has a more fancy (auto language parsing / coloring) package for that; probably.  Well, maybe a big PITA to write language parsers in TeX for whatever languages people want... but technically possible, at least.

See also Doxygen, which I think? uses HTML or LaTeX backend depending on desired output formats.  Also don't know if coloring is implemented.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

