TinyMCE, Django and python-docx

I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution; however, I see that its output format is HTML (here). The goal is to store user input and then serve it inside a Word document using python-docx (which is not HTML).
Do you know of any solution for this? Either a feature of TinyMCE, an HTML-to-Word converter that keeps styles, or another rich text editor similar to TinyMCE?
UPDATE:
This is another option which I found works fine. I am still trying to convert HTML to Word without losing styles. pywin32 may be a solution, as stated here, but it doesn't help me much, and it's Windows-only.
UPDATE 2:
After quite some digging I found pandoc and pypandoc, which appear to be able to convert to any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to feed such output into python-docx.
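One way to connect the two is to let pandoc write the .docx file and then open that file with python-docx to add anything else. The sketch below assumes pypandoc and the pandoc binary are installed; `--reference-doc` is a real pandoc option that takes styles from an existing .docx, while the function names `pandoc_args`, `html_to_docx` and `append_paragraph` are hypothetical helpers, not part of either library.

```python
from __future__ import annotations

from pathlib import Path


def pandoc_args(reference_doc: str | None = None) -> list[str]:
    # --reference-doc makes pandoc reuse the styles defined in an existing .docx
    return [f"--reference-doc={reference_doc}"] if reference_doc else []


def html_to_docx(html: str, out_path: str, reference_doc: str | None = None) -> Path:
    import pypandoc  # imported lazily; also needs the pandoc binary on PATH

    # For binary output formats such as docx, pypandoc needs an outputfile.
    pypandoc.convert_text(
        html,
        "docx",
        format="html",
        outputfile=out_path,
        extra_args=pandoc_args(reference_doc),
    )
    return Path(out_path)


def append_paragraph(docx_path: str, text: str) -> None:
    from docx import Document  # python-docx can open the file pandoc produced

    doc = Document(docx_path)
    doc.add_paragraph(text)
    doc.save(docx_path)
```

Usage (file names are placeholders): `html_to_docx(stored_html, "report.docx", reference_doc="template.docx")`, then `append_paragraph("report.docx", "Generated by Django")`. Keeping pandoc for the HTML conversion and python-docx for later edits avoids writing an HTML parser yourself, at the cost of an external binary.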

I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work), then parse the HTML tags into a format python-docx can use. Pandoc and pypandoc will convert files for you (e.g. you start with a LaTeX file and need to convert it to Word), but they will not provide the tools you need to convert to and from XML/HTML.
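A minimal sketch of that idea follows. It swaps Beautiful Soup for Python's built-in `html.parser` so it has no extra dependency, and it only handles bold, italic, underline and paragraph breaks; the names `RunCollector`, `html_to_runs` and `runs_to_docx` are hypothetical. The HTML is first reduced to paragraphs of styled runs, and only `runs_to_docx` needs python-docx.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Run:
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False


# Tags that start a new Word paragraph, and inline tags mapped to run properties.
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}
STYLE_TAGS = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic", "u": "underline"}


class RunCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[list[Run]] = [[]]
        self.depth = {"bold": 0, "italic": 0, "underline": 0}  # open-tag counters

    def _break(self) -> None:
        # Start a new paragraph only if the current one already holds text.
        if self.paragraphs[-1]:
            self.paragraphs.append([])

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_TAGS:
            self.depth[STYLE_TAGS[tag]] += 1
        elif tag in BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in STYLE_TAGS:
            prop = STYLE_TAGS[tag]
            self.depth[prop] = max(0, self.depth[prop] - 1)
        elif tag in BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            styles = {prop: n > 0 for prop, n in self.depth.items()}
            self.paragraphs[-1].append(Run(data, **styles))


def html_to_runs(html: str) -> list[list[Run]]:
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if p]


def runs_to_docx(paragraphs: list[list[Run]], path: str) -> None:
    from docx import Document  # python-docx, imported lazily

    doc = Document()
    for runs in paragraphs:
        para = doc.add_paragraph()
        for r in runs:
            run = para.add_run(r.text)
            # None means "inherit from the paragraph style" in python-docx.
            run.bold = r.bold or None
            run.italic = r.italic or None
            run.underline = r.underline or None
    doc.save(path)
```

For example, `runs_to_docx(html_to_runs(editor_html), "out.docx")` (both arguments are placeholders). Lists, tables, links and colours would each need their own handling in the parser.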
Good luck!

Related

Workbook gem - how to write the excel to html in a formatted manner?

I am using the Workbook gem to preview Excel files, without page breaks, on my website. Right now, I can successfully extract the Excel file, write it as HTML and display it as a preview.
The following code extracts the Excel file and writes it to HTML:
excel_file = Workbook::Book.open "#{file_url}"
excel_file.write_to_html(file_name + ".html")
But this gives me an unformatted HTML sheet, without row and column borders or any of the formatting of the original Excel file.
According to the murb/workbook documentation, the format can be passed as a hash in the options:
write_to_html(filename = "#{title}.html", options = {})
So, to achieve the format hash, I tried the following code:
excel_file.template.formats
But this returns a null hash. So how can I get all the formats from the Excel file and write them to HTML? Or at least show the HTML table with borders for all rows and columns?
The author here. The Workbook gem is mainly built to extract and re-represent the data in files, not so much the formatting. In the past I made a few attempts at preserving formatting when converting, but it is far from complete. Some importers don't even set the formatting hash, as you found out; the xlsx importer in particular needs work on this.
The HTML export was built simply to give a basic preview of the data. It returns an HTML page with all tables, unformatted by default, although format names are used as class names. There is one option, style_with_inline_css: true, but it requires the importer to actually set the format hash properly.
I'm happy to guide you here and there if you want to improve the xlsx importer code to suit your needs, and hopefully the workbook gem in general, but it will need serious work if you want more than just some background colours and font properties.
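For the "at least show borders" part, one stop-gap is to post-process the file that `write_to_html` produced and inject a small stylesheet. This Python sketch assumes the generated page is UTF-8 and, if it has no `</head>`, simply prepends the `<style>` element; `add_table_borders` and `add_borders_to_file` are hypothetical names, and this does not recover any Excel formatting, it only draws cell borders.

```python
import re
from pathlib import Path

BORDER_CSS = """<style>
table { border-collapse: collapse; }
th, td { border: 1px solid #999; padding: 2px 6px; }
</style>"""


def add_table_borders(html: str, css: str = BORDER_CSS) -> str:
    # Insert the stylesheet just before </head> (case-insensitive) if there is one.
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match:
        return html[: match.start()] + css + "\n" + html[match.start():]
    # Otherwise prepend it; browsers still apply a leading <style> element.
    return css + "\n" + html


def add_borders_to_file(path: str) -> None:
    p = Path(path)
    p.write_text(add_table_borders(p.read_text(encoding="utf-8")), encoding="utf-8")
```

For example, call `add_borders_to_file("preview.html")` right after the Ruby export (the file name is a placeholder).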

How to disable DOI/URL for bibtex in Rmarkdown

I am using Better BibTeX and Zotero to generate references in R Markdown.
It works very well, except that journal articles and books are printed with an associated URL/DOI.
My adviser is not too happy about it, and I could not figure out how to disable the URL/DOI in the R Markdown config or elsewhere.
As far as I know, you have to edit your *.csl file (asa.csl, apa.csl or whichever style you use). You can do this very easily by uploading it to this online CSL editor. Browse to bibliography/layout/access(macro)/Group/conditional/ and check whether there is a URL entry. I got rid of the DOI by setting an option there so that the variable must be 'url' AND the document type 'webpage'. Then download the new *.csl file, save it to your preferred directory, point the csl: field in your YAML header to it, and knit. (See also here, with screenshots.)
Note: make a backup copy of your *.csl before modifying it.
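If you prefer a script to the online editor, the sketch below removes every `<text variable="DOI">` and `<text variable="URL">` element from a CSL file using Python's standard `xml.etree.ElementTree`. Note that this drops DOI/URL for all item types, unlike the conditional approach above; it assumes the standard CSL namespace `http://purl.org/net/xbiblio/csl`, and XML comments in the file are not preserved. The function names and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"
ET.register_namespace("", CSL_NS)  # write CSL back with a default xmlns, not ns0:


def strip_csl_variables(root: ET.Element, variables=("DOI", "URL")) -> int:
    """Remove <text variable="..."> elements for the given variables; return the count."""
    text_tag = f"{{{CSL_NS}}}text"
    removed = 0
    for parent in list(root.iter()):  # snapshot, since we mutate the tree
        for child in list(parent):
            if child.tag == text_tag and child.get("variable") in variables:
                parent.remove(child)
                removed += 1
    return removed


def strip_csl_string(csl_xml: str, variables=("DOI", "URL")) -> str:
    root = ET.fromstring(csl_xml)
    strip_csl_variables(root, variables)
    return ET.tostring(root, encoding="unicode")


def strip_csl_file(src: str, dst: str, variables=("DOI", "URL")) -> None:
    tree = ET.parse(src)
    strip_csl_variables(tree.getroot(), variables)
    tree.write(dst, encoding="utf-8", xml_declaration=True)
```

For example, `strip_csl_file("apa.csl", "apa-no-doi.csl")`, then set `csl: apa-no-doi.csl` in the YAML header.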

Linking to url with rmarkdown using Knit Word in Rstudio

Is there an easy way to link to a webpage in rmarkdown without displaying the link?
For example, putting "https://www.google.com/" in a .rmd file renders the entire URL as the link text, but I want link text such as ABC instead.
The HTML method, i.e. <a href= ..., works when I knit to HTML, but not when I knit to a Word document.
As you mention, Markdown provides two ways to create a link (and I suppose both are supported in R Markdown).
Markdown way:
[ABC](http://example.com)
HTML way:
<a href="http://example.com">ABC</a>
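The first way is native Markdown; the second works because Markdown allows arbitrary HTML. However, pandoc only passes raw HTML through to HTML-like output formats and drops it for Word, which is why the HTML way fails when knitting to Word. Use the Markdown way there.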

How do I encode html leaving out the safe html

My data coming from the database might contain some html. If I use
string dataFromDb = "Some text<br />some more <br><ul><li>item 1</li></ul>";
string encoded = HttpContext.Current.Server.HtmlEncode(dataFromDb);
Then everything gets encoded, and the HTML tags show up as literal text on the screen.
However, I want the safe HTML, like the tags in dataFromDb above, to be rendered as HTML.
I think I am trying to create a whitelist to check against.
How do I go about doing this?
Is there some Regex already out there that can do this?
Check out this article; the AntiXSS library is also worth a look.
You should use the Microsoft AntiXSS library. I believe the latest version is available here. Specifically, you'll want to use the GetSafeHtmlFragment method.
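To illustrate the whitelist idea itself (not AntiXSS's API), here is a small Python sketch using the standard `html.parser`: allowed tags are re-emitted without attributes, and everything else, including script tags and text, is HTML-encoded. The tag list and the names `WhitelistSanitizer` and `sanitize` are assumptions for illustration. It does not balance unclosed tags, so in production a maintained sanitizer such as the one recommended above is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; extend it to the tags your data may legitimately contain.
ALLOWED_TAGS = {"br", "p", "ul", "ol", "li", "b", "strong", "i", "em", "u"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self, allowed=ALLOWED_TAGS):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Allowed tags are re-emitted without attributes (drops onclick, style, href...).
        if tag in self.allowed:
            self.out.append(f"<{tag}>")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br />.
        if tag in self.allowed:
            self.out.append(f"<{tag} />")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in self.allowed else escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))  # text is always encoded
    # Comments and declarations fall through to HTMLParser's no-op handlers and are dropped.


def sanitize(fragment: str, allowed=ALLOWED_TAGS) -> str:
    parser = WhitelistSanitizer(allowed)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With the question's input, `sanitize("Some text<br />some more <br><ul><li>item 1</li></ul>")` returns the same markup unchanged, while `<script>` tags come back encoded.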

Multiple pages html output from a .rst document in Django

I'm writing a Django app to serve some documentation written in RestructuredText.
I have many documents written in *.rst; each of them is quite long, with many sections, subsections and so on.
Displaying the whole document on a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, with links to a URL where I can display a single section / subsection (which will need some 'previous | up | home | next' links, I guess...). In a way similar to a 'multiple HTML page output' as in a DocBook / XML to HTML conversion.
Can anyone point me in a direction for building a document tree of a *.rst document and parsing a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.
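If you want something between manual splitting (Choice 1) and Sphinx (Choice 2), a rough sketch is to split the RST source at its top-level titles in plain Python and serve each chunk at its own URL. This assumes top-level titles are underlined with `=` and have no overline; real RST allows any adornment order, so docutils' document tree is the robust route. `Section`, `split_rst` and `neighbours` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    slug: str  # hypothetical URL fragment, e.g. /docs/<doc>/<slug>/
    body: str  # raw RST of the section, without its own title lines


def slugify(title: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def split_rst(source: str, adornment: str = "=") -> tuple[str, list[Section]]:
    """Split RST text at titles underlined with `adornment` (no overlines)."""
    lines = source.splitlines()
    preamble: list[str] = []
    sections: list[Section] = []
    title: str | None = None
    body: list[str] = []

    def flush() -> None:
        if title is not None:
            sections.append(Section(title, slugify(title), "\n".join(body)))

    i = 0
    while i < len(lines):
        text = lines[i].strip()
        underline = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if text and underline and set(underline) == {adornment} and len(underline) >= len(text):
            flush()
            title, body = text, []
            i += 2  # skip the title and its underline
            continue
        (body if title is not None else preamble).append(lines[i])
        i += 1
    flush()
    return "\n".join(preamble), sections


def neighbours(sections: list[Section], index: int) -> tuple[str | None, str | None]:
    """Slugs for the 'previous' and 'next' links of section `index`."""
    prev_slug = sections[index - 1].slug if index > 0 else None
    next_slug = sections[index + 1].slug if index + 1 < len(sections) else None
    return prev_slug, next_slug
```

In a Django view you could then look up a section by slug and render its body with docutils, e.g. `publish_parts(section.body, writer_name="html")["html_body"]`; cross-references between sections will not resolve this way, which is one of the things Sphinx handles for you.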