I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution, however i see that the output format is html (here). Goal is to store user input and then serve it inside a word document using python-docx( which is not html).
Do you know of any solution for this? Either a feature of tinyMCE or a html to word-format converter which keeps styles, or maybe another rich text editor similar to tinymce?
UPDATE:
This is another option which i found to be working fine. Still at the point of trying to convert HTML to Word without losing styles. A solution for this may be pywin32 as stated here but it doesn't help me that much + it's Windows only.
Update2
After quite some digging i found pandoc and pypandoc which appear to be able to translate in any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to integrate such an input to python-docx.
I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content in your HTML editor (I use Summernote, but any HTML editor should work) then parse HTML tags into a usable format for python-docx. Pandoc and Pypandoc will convert files for you (e.g. you start with a LateX file and need to convert it to Word), but will not provide the tools to need to convert to and from xml/html.
Good luck!
Is there an easy way to link to a webpage in rmarkdown without displaying the link?
For example, putting "https://www.google.com/" in a .rmd file renders as the entire website, but I want something analogous to ABC instead.
The html method above, i.e., <a href= ... works when I knit to html, but it does not work when I knit to a word document.
Markdown provides two ways to create a link as you mention (and I suppose that is supported on rmarkdown).
Markdown way:
[ABC](http://example.com)
HTML way:
ABC
The first way is native and the second way is supported since Markdown allows every HTML code.
I'm writing an django application which renders docx document based on user input. The main idea is: "docx is django template". So, I need to unzip word file, extract XML with markup, render it as django template and put everything back.
Everything works just fine, except annoying problem with creating initial document (I use standard editors such as MS Word itself and LibreOffice Writer): while setting up markdown, editor can break template blocks ({{}} and {%%}) with XML tags.
For example: <w:t>{{doc_title}}</w:t> can become
<w:t>{{</w:t><w:t>doc_title</w:t>}}
Is that any way I could avoid this, except inserting template blocks directly in XML?
I'm using markdown to format some comments in a Django app.
If I try to combine markdown and urlize, inevitably bad formatting errors happen (links get added where they don't belong or aren't recognized, and of course the errors change depending on which filter I use first).
Basically I'd like a filter that does markdown and automatically turns links into hyperlinks if not done so by markdown.
Otherwise, I suppose I'll have to roll my own filter, which I would so rather not do.
What I do is use the Markdown urlize extension.
Once installed, you can use it in a Django template like this:
{{ value|markdown:"urlize" }}
Or in Python code like this:
import markdown
md = markdown.Markdown(safe_mode=True, extensions=['urlize'])
converted_text = md.convert(text)
Here is the start of the Markdown extension docs in case you need more info.
I'm trying to parse some HTML with C++ to extract all urls from the HTML (the urls can be inside the href and src attributes).
I tried to use Webkit to do the heavy work for me but for some reason when I load a frame with HTML the generated document is all wrong (if I make Webkit get the page from the web the generated document is just fine but Webkit also downloads all images, styles, and scripts and I don't want that)
Here is what I tried to do:
frame->setHtml(HTML);
QWebElement document = frame->documentElement();
QList<QWebElement> imgs = document.findAll("a"); // Doesn't find all links
QList<QWebElement> imgs = document.findAll("img"); // Doesn't find all images
QList<QWebElement> imgs = document.findAll("script");// Doesn't find all scripts
qDebug() << document.toInnerXml(); // Print a completely messed-up document with several missing elements
What am I doing wrong? Is there an easy way to parse HTML with Qt? (Or some other lightweight library)
You can always use XPath expressions to make your parsing life easier, take a look at this for instance.
or you can do something like this
QWebView* view = new QWebView(parent);
view.load(QUrl("http://www.your_site.com"));
QWebElementCollection elements = view.page().mainFrame().findAllElements("a");