Author Topic: Generating documents .docx from lightweight markup language  (Read 2734 times)

Offline Picuino (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1072
  • Country: es
    • Picuino web
Generating documents .docx from lightweight markup language
« on: February 07, 2022, 12:38:09 pm »
Hi all. I am looking for a way to generate complex documents (docx or pdf) from a lightweight markup language.
The lightweight languages I know are Markdown and ReStructuredText, but I could adapt to any other.
I know that this task can be done from the LaTeX language, but I think that LaTeX is too complex for what I'm looking for.

So far I have used pandoc (https://pandoc.org/) to try to do the task, but I can't define the .docx template to change the fonts, margins, etc.

I have also used Sphinx (https://www.sphinx-doc.org/) on occasion to generate PDF files. It's a bit complex, generates a single file rather than several, and won't let me change the font formatting, etc., although the result is usually very good. Another problem is that it cannot generate .docx.

Is there any good alternative or solution for what I'm looking for?
« Last Edit: February 07, 2022, 12:42:28 pm by Picuino »
 

Offline Whales

  • Super Contributor
  • ***
  • Posts: 2121
  • Country: au
    • Halestrom
Re: Generating documents .docx from lightweight markup language
« Reply #1 on: February 07, 2022, 12:47:51 pm »
Re pandoc: see if this does what you want https://pandoc.org/MANUAL.html#option--reference-doc
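A minimal sketch of the workflow that manual section describes, as far as I understand it: export pandoc's built-in reference.docx, edit its styles (fonts, margins, etc.) in Word or LibreOffice, then hand the edited file back with --reference-doc. The file names and the helper functions are placeholders made up for illustration; only the pandoc options themselves come from the manual.

```python
import subprocess


def export_default_reference_cmd(path="custom-reference.docx"):
    """Command that writes pandoc's built-in reference.docx to `path`."""
    return ["pandoc", "-o", path, "--print-default-data-file", "reference.docx"]


def convert_cmd(source, output, reference_doc):
    """Command that converts `source`, taking the styles from `reference_doc`."""
    return ["pandoc", source, f"--reference-doc={reference_doc}", "-o", output]


def run(cmd):
    """Run a command built above; needs pandoc on the PATH."""
    subprocess.run(cmd, check=True)


# Typical use (hypothetical file names):
#   run(export_default_reference_cmd())   # once, then edit the styles by hand
#   run(convert_cmd("document.md", "document.docx", "custom-reference.docx"))
```

Only the styles of the reference document are used; its text content is ignored, so it cannot serve as a fill-in form.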
 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
Re: Generating documents .docx from lightweight markup language
« Reply #2 on: February 07, 2022, 01:18:36 pm »
Quote from: Picuino
"Hi all. I am looking for a way to generate complex documents (docx or pdf) from a lightweight markup language."


Those two requirements contradict each other.

Complex documents such as PDFs and the sick .doc/.docx family will not play nicely with simple markup; the results will be messy.

For that kind of document (as opposed to simple man pages, PODs or HTML-template-based output), TeX is the tool.

Quote from: Picuino
"I know that this task can be done from the LaTeX language, but I think that LaTeX is too complex for what I'm looking for."

It is simpler than PostScript itself, with even better results.

The best possible baseline is SGML with proper DocBook and sgmlcat templates.
You simply cannot generate complex documents without at least that.

Bare-bones things can be done directly with Perl POD structured markup.

Paul

PS: Alas, if I recall correctly, the last time I needed to generate brain-damaged DOCs I used SGML, which JADE translates directly to RTF.

SGML plus DocBook templates can produce a wide range of output formats.
« Last Edit: February 07, 2022, 01:22:39 pm by PKTKS »
 

Offline PKTKS

Re: Generating documents .docx from lightweight markup language
« Reply #3 on: February 07, 2022, 02:47:09 pm »
Just checked the status of my converters as of today...

And before some modern folk start whining about SGML:

It still does the job amazingly well today, even though SGML gets next to no attention these days. As a matter of preference I very much like NEdit for running my macros and commands (vi and Emacs can do the same thing), even though NEdit is still Motif-only.

I am pretty sure it would be hard to replace all the simple, functional parts of this solution.

JADE can output RTF, PDF, HTML, XML and MIF from a single simple source.
Just make sure all the OASIS templates and SGML catalogs are properly available.

Code:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"/share/sgml/docbook/sgml-dtd-4.5/docbookx.dtd" [

<!ENTITY genindex.sgm SYSTEM "genindex.sgm">
]>

<!-- "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" -->

<book>
<bookinfo>
<title>SGML bare bones</title>
<date>Version $Revision$, $Date$</date>
<authorgroup>
<author>
<firstname>Nobody</firstname><surname>None</surname>
</author>
</authorgroup>
<copyright>
<year>$Date$</year>
<holder></holder>
</copyright>
</bookinfo>

<toc></toc>

<chapter><title>1st Chapter</title>
<para>
SGML bare bones minimum
</para>
</chapter>

<!-- &genindex.sgm; -->

</book>

Cheers
Paul
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #4 on: February 07, 2022, 03:20:14 pm »
This is a skeleton of what I want.
I'm going to try several solutions to see which one comes closest.
 

Online nfmax

  • Super Contributor
  • ***
  • Posts: 1624
  • Country: gb
Re: Generating documents .docx from lightweight markup language
« Reply #5 on: February 07, 2022, 03:24:49 pm »
groff? https://en.wikipedia.org/wiki/Groff_(software)

Brings back memories of writing "programmer's notes" for custom hardware using ms macros and nroff on Unix system V release 3 back in the 80's...
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #6 on: February 07, 2022, 03:26:03 pm »
Thanks for the SGML tip. That might work for me, but not for other people. It is important that the source files are simple so that anyone can write them. Markdown is my first choice, and perhaps ReStructuredText, although that is already starting to get very complicated for normal people.
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #7 on: February 07, 2022, 03:33:34 pm »
Quote from: nfmax
"groff? https://en.wikipedia.org/wiki/Groff_(software)

Brings back memories of writing "programmer's notes" for custom hardware using ms macros and nroff on Unix system V release 3 back in the 80's..."

It seems to be a very good tool. The problem is that Groff mixes content and presentation.
I want to separate the content (in a lightweight markup language) from the presentation (in a template), so that users who write content cannot change the presentation and all documents end up with an identical format.


Edit: I'm looking for something similar to what the forum, or any content manager like WordPress, does when you write as a user: it presents the content in a uniform way. But instead of generating a web page, I want the output to be a docx, odt, rtf or similar document.
« Last Edit: February 07, 2022, 03:36:23 pm by Picuino »
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #8 on: February 07, 2022, 04:08:42 pm »
It seems that complex (multi-line) tables can't be handled well in Markdown.
I'll have to think about starting with a ReStructuredText or AsciiDoc document.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2275
  • Country: 00
Re: Generating documents .docx from lightweight markup language
« Reply #9 on: February 07, 2022, 04:16:12 pm »
I use wkhtmltopdf, but you need to know at least some basic HTML.

https://wkhtmltopdf.org/
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #10 on: February 07, 2022, 04:58:01 pm »
HTML is difficult for normal people to learn. Thank you anyway.

I have found a ReStructuredText to ODT converter in the Docutils package. Seems to be a good solution allowing complex tables.
https://docutils.sourceforge.io/docs/user/odt.html
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #11 on: February 07, 2022, 08:27:04 pm »
I have managed to create complex tables (with lists inside) easily with Docutils. The problem now is that I can't change the template of the output .odt file.

I'm thinking of using Docutils to generate an html file and then using pandoc to transform the html to .docx or .odt.
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #12 on: February 07, 2022, 09:46:02 pm »
I've found that the newly installed version of Pandoc can interpret ReStructuredText list-tables (https://docutils.sourceforge.io/docs/ref/rst/directives.html#list-table), so I don't need Docutils; I can do all the work with Pandoc.
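For anyone who has not seen the list-table directive: each table row is a bullet item, and each cell is a nested bullet item, which is why cells can hold lists. Here is a small sketch that generates the markup from Python data. The function name rst_list_table and the sample rows are made up for illustration; only the directive syntax comes from the Docutils page linked above.

```python
def rst_list_table(rows, title="", header_rows=1):
    """Render rows (lists of cell strings) as a reStructuredText list-table.

    A cell may span several lines (for example a nested bullet list);
    continuation lines are indented so they stay inside the cell.
    """
    lines = [
        f".. list-table:: {title}".rstrip(),
        f"   :header-rows: {header_rows}",
        "",
    ]
    for row in rows:
        for col, cell in enumerate(row):
            # The first cell of a row opens a new "*" item; the rest are "-" items.
            marker = "   * - " if col == 0 else "     - "
            first, *rest = cell.split("\n")
            lines.append((marker + first).rstrip())
            lines.extend(("       " + line).rstrip() for line in rest)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rst_list_table(
        [["Step", "Notes"],
         ["Install", "- pandoc\n- docutils"]],
        title="Tools",
    ))
```

Every row needs the same number of cells, otherwise the directive reports an error.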

Now the problem is to change the format of the numbered lists so that they take up less space in the table.
I don't know how to change reference.docx so that the numbering takes up fewer millimeters of the document.
« Last Edit: February 07, 2022, 09:50:45 pm by Picuino »
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 7508
  • Country: va
Re: Generating documents .docx from lightweight markup language
« Reply #13 on: February 08, 2022, 01:26:59 am »
Perhaps not what you're looking for, but Scrivener will import Markdown and compile to whatever format and template you fancy. It's a writer's tool that focuses on the writing part and treats fancy formatting and output as a compile step. It is notionally aimed at authors of fiction and/or screenplays, but I've used it to put together project documentation.

Some chap detailing how he uses it with Markdown, which appears to be similar to what you want:

http://www.raydanielmystery.com/rays-scrivenermarkdown-flow
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #14 on: February 08, 2022, 01:22:02 pm »
I finally got something similar to what I was looking for. Using Pandoc you can set (in Windows) the following options.

pandoc --data-dir=%~dp0 --reference-doc=template.docx --toc -s document.rst -o document.docx

template.docx = template document with headers, footers, corporate images, etc. Used as the base document.
reference.docx = template document used to copy the text styles.
document.rst = ReStructuredText file with the text content.
document.docx = output document.
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #15 on: February 08, 2022, 08:11:10 pm »
Doesn't work as well as expected.
I'm going to investigate how to generate printable HTML, which I think will be much more flexible and easier.
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5155
  • Country: au
    • send complaints here
Re: Generating documents .docx from lightweight markup language
« Reply #16 on: February 08, 2022, 10:02:26 pm »
If the output file can be PDF, then going through TeX seems like the sensible way:
https://ctan.org/pkg/markdown
Then all the structure and templating can be isolated in TeX, taking some plain-text Markdown files as the content.
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #17 on: February 09, 2022, 07:59:42 am »
I managed to do that with LaTeX templates with Pandoc.
The problem is that I don't know how to make the tables display correctly, with separations and divisions between pages.
I attach an example
 

Offline PKTKS

Re: Generating documents .docx from lightweight markup language
« Reply #18 on: February 09, 2022, 03:05:00 pm »
Quote from: Picuino
"I managed to do that with LaTeX templates with Pandoc.
The problem is that I don't know how to make the tables display correctly, with separations and divisions between pages.
I attach an example"

IMHO, if you are going with LaTeX, then do it the proper way and take no shortcuts.
No newbie-friendly modern shit will make it easier for you.

Tables in TeX are very simple, and they can span multiple pages and multiple columns in PDF, RTF and HTML output, as long as you include the required directives.

Do not mix TeX with strange front ends; I would almost bet on a bad result.

Attached are a skeleton and the rendered PDF, made today with latex.

Code:
\input{class_ARTICLE.tex}  % my own preamble file (not shown); presumably holds \documentclass and loads lipsum
\usepackage{multirow}      % table cells that span several rows
\usepackage{multicol}      % multi-column text environments

\begin{document}
  \title{PRINCIPAL TITLE}
  \pagenumbering{gobble}   % no page number on the title page
  \maketitle
  \newpage
  \pagenumbering{arabic}   % normal page numbers from here on

\section{OneSection}
\begin{table}[h!]
  \centering
  \caption{Caption for the table.}
  \label{tab:table1}
  \begin{tabular}{l|c||r}
    1 & 2 & 3\\
    \hline
    a & b & c\\
  \end{tabular}
\end{table}
\lipsum[1-10]              % filler text
\end{document}

Included just for fun - page numbering in BARCODEs...  :popcorn:
and watermarking

Paul
« Last Edit: February 10, 2022, 08:08:27 am by PKTKS »
 

Offline Picuino (Topic starter)

Re: Generating documents .docx from lightweight markup language
« Reply #19 on: February 11, 2022, 04:27:44 pm »
After many tests with LaTeX, docx, ODT, Pandoc and Docutils I have finally decided to use Docutils with ODT.
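For readers who land here with the same problem, a rough sketch of driving the Docutils ODT writer with a custom style file. The Docutils ODT documentation linked earlier in the thread describes a --stylesheet option that takes an .odt file holding the styles. Everything else below is an assumption for illustration: the file names are placeholders, the helper is made up, and the front-end may be installed as rst2odt.py or rst2odt depending on the Docutils version. This is not necessarily how the attached project does it.

```python
import subprocess


def rst2odt_cmd(source, output, stylesheet=None, frontend="rst2odt.py"):
    """Build the command line for the Docutils ODT front-end.

    `stylesheet` is an .odt file whose styles replace the default ones;
    `frontend` is the script name, which differs between Docutils versions.
    """
    cmd = [frontend]
    if stylesheet:
        cmd.append(f"--stylesheet={stylesheet}")
    cmd += [source, output]
    return cmd


def run(cmd):
    """Run a command built above; needs Docutils installed and on the PATH."""
    subprocess.run(cmd, check=True)


# Typical use (hypothetical file names):
#   run(rst2odt_cmd("document.rst", "document.odt", stylesheet="styles.odt"))
```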

I have attached the project in case it helps someone else. The PDF document was generated from the ODT file.

Thank you very much for the comments.

 

