I'm writing a Django application which renders a docx document based on user input. The main idea is: "the docx is a Django template". So, I need to unzip the Word file, extract the XML with the markup, render it as a Django template and put everything back.
Everything works just fine, except for an annoying problem with creating the initial document (I use standard editors such as MS Word itself and LibreOffice Writer): while I am setting up the markup, the editor can split template blocks ({{ }} and {% %}) across XML tags.
For example: <w:t>{{doc_title}}</w:t> can become
<w:t>{{</w:t><w:t>doc_title</w:t>}}
Is there any way I could avoid this, other than inserting the template blocks directly into the XML?
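The unzip, render, re-zip round trip described in the question can be sketched with the standard library alone. This is only an illustration: `render_docx` and the `render` callable are hypothetical names, and in a Django project `render` would presumably wrap `Template(xml).render(Context(data))`. It does not address the split-tag problem itself; it only shows where the XML is read and written back.

```python
import io
import zipfile

DOCUMENT_XML = "word/document.xml"  # main body part of a .docx package


def render_docx(template_bytes, render):
    """Return a new .docx (as bytes) with document.xml passed through `render`.

    `render` is a hypothetical callable that takes the XML as str and
    returns the rendered XML as str.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == DOCUMENT_XML:
                data = render(data.decode("utf-8")).encode("utf-8")
            # Every other part (styles, images, relationships) is copied as-is.
            target.writestr(item.filename, data)
    return output.getvalue()
```

Passing the renderer in as a callable keeps the zip handling independent of the template engine, so it can be exercised with a plain string replacement before Django is wired in.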
Related
I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution; however, I see that its output format is HTML. The goal is to store user input and then serve it inside a Word document using python-docx (which does not take HTML).
Do you know of any solution for this? Either a feature of TinyMCE, an HTML-to-Word converter which keeps styles, or maybe another rich text editor similar to TinyMCE?
UPDATE:
This is another option which I found to be working fine. I am still trying to convert HTML to Word without losing styles. One solution may be pywin32, as stated here, but it doesn't help me much, and it's Windows only.
Update2
After quite some digging I found pandoc and pypandoc, which appear to be able to convert to any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to integrate such an input to python-docx.
I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work), then translate the HTML tags into a format that python-docx can use. Pandoc and pypandoc will convert files for you (e.g. you start with a LaTeX file and need to convert it to Word), but will not provide the tools you need to convert to and from XML/HTML.
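A minimal sketch of that idea follows. To keep it dependency-free, the standard library's `html.parser` is swapped in for Beautiful Soup; the same walk could be written with Beautiful Soup. It assumes the editor emits only simple `<p>`, `<b>`/`<strong>` and `<i>`/`<em>` markup, and the names `RunCollector`, `html_to_runs` and `write_docx` are made up for illustration. The python-docx calls are imported lazily so the parsing part runs without the package installed.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Flatten simple editor HTML into paragraphs of (text, bold, italic) runs."""

    BOLD = {"b", "strong"}
    ITALIC = {"i", "em"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of paragraphs, each a list of runs
        self._bold = 0        # nesting depth of bold tags
        self._italic = 0      # nesting depth of italic tags

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs.append([])
        elif tag in self.BOLD:
            self._bold += 1
        elif tag in self.ITALIC:
            self._italic += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD:
            self._bold = max(0, self._bold - 1)
        elif tag in self.ITALIC:
            self._italic = max(0, self._italic - 1)

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace between tags
        if not self.paragraphs:
            self.paragraphs.append([])  # text that appears before any <p>
        self.paragraphs[-1].append((data, self._bold > 0, self._italic > 0))


def html_to_runs(html):
    collector = RunCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


def write_docx(paragraphs, path):
    """Write the collected runs with python-docx (third-party package)."""
    from docx import Document  # imported here so parsing works without it

    document = Document()
    for runs in paragraphs:
        paragraph = document.add_paragraph()
        for text, bold, italic in runs:
            run = paragraph.add_run(text)
            run.bold = bold
            run.italic = italic
    document.save(path)
```

Lists, headings, links and inline styles would each need their own mapping; this only covers the two most common character styles.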
Good luck!
Currently I have a working Django 1.9 application using Python 3.5 in development. The database is Postgres 9.4.2.0.
I have a TEXT type field in the database which contains raw input gathered from users, which is then rendered back for other users to read.
The raw text contains newlines and whatnot which look like:
chat.freenode.net\r\n#randomchannel
The HTML template itself attempts to replace the line breaks with break tags and escape anything else
{{ post.body|linebreaksbr|escape }}
But it doesn't seem to matter what filters I add to the post.body, it always renders the raw \r\n and never replaces the values with <br> tags.
I am not getting any errors in the development server and the rendering of the template works fine, it just seems the filters are not working.
I'm pulling my hair out trying to work out why these filters are not working. Does anyone have any ideas?
Turns out this had nothing to do with Django itself, which is not surprising.
The data migration which happened between the last and current version broke the newlines in the raw data. So the linebreaksbr filter was working; it simply didn't find any line breaks to replace.
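One way to check for this kind of damage is to test whether the stored text contains real newline characters or the literal characters backslash, "r", backslash, "n". The sketch below assumes the migration stored the literal sequences, which the answer does not actually state; the helper names are hypothetical.

```python
def has_real_line_breaks(text):
    """True if the text contains actual CR or LF characters."""
    return "\n" in text or "\r" in text


def has_literal_escapes(text):
    """True if the text contains a backslash followed by 'n' or 'r'."""
    return "\\n" in text or "\\r" in text


def restore_line_breaks(text):
    """Turn literal backslash sequences back into real newlines.

    Caution: this also rewrites legitimate text that happens to contain a
    backslash followed by 'n' or 'r' (for example a Windows path).
    """
    return (
        text.replace("\\r\\n", "\n")
        .replace("\\n", "\n")
        .replace("\\r", "\n")
    )
```

Printing `repr(post.body)` in a Django shell gives the same information at a glance: real line breaks show up as `\r\n`, damaged ones as `\\r\\n`.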
Is there an easy way to link to a webpage in rmarkdown without displaying the link?
For example, putting "https://www.google.com/" in a .rmd file renders the entire URL, but I want a clickable word such as ABC instead.
The HTML method, i.e., <a href= ..., works when I knit to HTML, but it does not work when I knit to a Word document.
Markdown provides two ways to create a link as you mention (and I suppose that is supported on rmarkdown).
Markdown way:
[ABC](http://example.com)
HTML way:
<a href="http://example.com">ABC</a>
The first way is native Markdown; the second works because Markdown allows inline HTML.
I don't understand what parsing HTML actually means.
As I understand it:
- Given an HTML file, parsing gives us access to its contents so that we can read and edit them. Am I right? (That is, parsing simply exposes the contents and structure inside the file.)
I have one more question:
- Suppose I have the contents of an HTML file stored in a stream (an IStream *HTMLContents; how I got the contents doesn't matter for now). Is there a way to use those contents to show a preview in a window, dialog box, or preview pane that looks exactly the way the file looks in a browser? For example, imagine I have downloaded the HTML from some web page and want to render it in a window I created myself. I know some pictures in the file may not display, but that's not a problem for me. How can I do that? (I am using Visual C++.)
Parsing basically means analyzing data to recover its structure. When you parse HTML, you are figuring out where the various elements are located and what they do.
As for displaying HTML, it depends on what you want to do:
If you want to open the file in your browser, use something like this.
As for displaying HTML directly in your form, I don't really know of any other way than parsing the HTML and creating your own web rendering engine. Good luck and have fun with that I guess.
Parsing HTML means building an object model, such as the DOM (https://en.wikipedia.org/wiki/Document_Object_Model), in your program.
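To make "building an object model" concrete, here is a small sketch that turns an HTML string into a tree of nodes. It is written in Python with the standard library's `html.parser` for brevity, even though the question targets Visual C++; the `Node`, `TreeBuilder`, `parse` and `outline` names are made up for illustration, and a real DOM carries far more (attributes, error recovery, and so on).

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects and text strings


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # later content nests inside this element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name, if there is one.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent

    def handle_data(self, data):
        if data.strip():
            self._current.children.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element or text node."""
    lines = []
    for child in node.children:
        if isinstance(child, Node):
            lines.append("  " * depth + child.tag)
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * depth + repr(child))
    return lines
```

Once the tree exists, reading or editing the document is a matter of walking and changing nodes, which is what the question's "we can edit them using parsing" amounts to. Rendering the tree on screen is a separate and much larger job.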
I'm writing a Django app to serve some documentation written in reStructuredText.
I have many documents written in *.rst; each of them is quite long, with many sections, subsections and so on.
Displaying the whole document on a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, with links to a URL where I can display a single section / subsection (which will need some 'previous | up | home | next' links, I guess...). Something similar to the 'multiple HTML page output' of a DocBook / XML to HTML conversion.
Can anyone point me in some direction to build a document tree of a *.rst document and parse a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.
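Loading one page of that JSON output before handing it to a template might look like the sketch below. It assumes the builder wrote one `<page_name>.fjson` file per page and that each file has "title" and "body" keys, with optional "prev" and "next" entries; check a file from your own build before relying on these names. `load_sphinx_page` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_sphinx_page(build_dir, page_name):
    """Load one page written by Sphinx's JSON builder.

    Assumption: the page lives at <build_dir>/<page_name>.fjson and holds
    "title" and "body" keys, plus optional "prev" and "next" entries.
    """
    path = Path(build_dir) / (page_name + ".fjson")
    with path.open(encoding="utf-8") as handle:
        page = json.load(handle)
    return {
        "title": page.get("title", ""),
        "body": page.get("body", ""),  # already-rendered HTML fragment
        "prev": page.get("prev"),      # None when the key is absent
        "next": page.get("next"),
    }
```

A Django view could pass the returned dict as template context and output the body with the `safe` filter, since it is already HTML.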