I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution; however, I see that its output format is HTML. The goal is to store user input and then serve it inside a Word document using python-docx (which does not take HTML).
Do you know of any solution for this? Either a feature of TinyMCE, an HTML-to-Word converter that keeps styles, or maybe another rich text editor similar to TinyMCE?
UPDATE:
This is another option which I found to be working fine. I'm still at the point of trying to convert HTML to Word without losing styles. A solution for this may be pywin32, as stated here, but it doesn't help me much, and it's Windows only.
Update2
After quite some digging I found pandoc and pypandoc, which appear to be able to convert to any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to integrate such output with python-docx.
I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work), then translate the HTML tags into calls that python-docx understands. Pandoc and pypandoc will convert whole files for you (e.g. you start with a LaTeX file and need to convert it to Word), but they do not give you a way to feed HTML fragments into a document you are building with python-docx.
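As a rough sketch of that idea (not a complete converter): the code below walks simple editor HTML and collects each paragraph as a list of (text, bold, italic) runs, then writes them out with python-docx. To keep the example dependency-free it uses the standard library's html.parser in place of Beautiful Soup; the names RunCollector, html_to_runs and write_docx are made up for illustration, and only p, b/strong and i/em tags are handled.

```python
from html.parser import HTMLParser

BOLD_TAGS = {"b", "strong"}
ITALIC_TAGS = {"i", "em"}


class RunCollector(HTMLParser):
    """Collect paragraphs as lists of (text, bold, italic) runs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs
        self._runs = []       # runs of the paragraph being built
        self._bold = 0        # nesting depth of bold tags
        self._italic = 0      # nesting depth of italic tags

    def handle_starttag(self, tag, attrs):
        if tag in BOLD_TAGS:
            self._bold += 1
        elif tag in ITALIC_TAGS:
            self._italic += 1
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS:
            self._bold = max(0, self._bold - 1)
        elif tag in ITALIC_TAGS:
            self._italic = max(0, self._italic - 1)
        elif tag == "p":
            self.flush()

    def handle_data(self, data):
        # Whitespace-only text between tags is dropped in this sketch.
        if data.strip():
            self._runs.append((data, self._bold > 0, self._italic > 0))

    def flush(self):
        """Close the current paragraph, if it has any content."""
        if self._runs:
            self.paragraphs.append(self._runs)
            self._runs = []


def html_to_runs(html):
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up text that was not wrapped in <p>
    return parser.paragraphs


def write_docx(paragraphs, path):
    # Assumes python-docx is installed (pip install python-docx).
    from docx import Document

    doc = Document()
    for runs in paragraphs:
        para = doc.add_paragraph()
        for text, bold, italic in runs:
            run = para.add_run(text)
            run.bold = bold
            run.italic = italic
    doc.save(path)
```

Something like write_docx(html_to_runs(stored_html), "out.docx") would then produce the file, where stored_html stands for whatever your editor saved. Lists, headings, links and inline styles would each need their own handling.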
Good luck!
I'm new to XSLT. I want to create a hyperlink using XSLT. It should look like this:
Document
"Document" is the link, and clicking it should start the download of a file.
Any ideas? :)
Thanks
There's no such thing as a hyperlink in XSLT or XML. If you're generating HTML with your XSLT, then you just need to output the appropriate elements and attributes, which you can do literally if you want, e.g.
<xsl:template match="somethingthatgeneratesalink">
  <a href="http://example.com">This is a link to example.com</a>
</xsl:template>
I have an XSL-FO XSLT file, and it generates blank pages, along with the standard "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem" error message.
I found that the issue was that the text I was writing into the PDF contained dashes ("–"); when I removed those characters, the PDF rendered successfully.
Does anyone know why this "–" character would cause the XSL-FO PDF process to fail? Everything I have read says this is not a special character and that I shouldn't need to worry about encoding it.
Update: here is a link to the Aspose forum post I created:
http://www.aspose.com/community/forums/permalink/593149/593149/showthread.aspx#593149
Update #2: I found that the issue was not the forward slash but an encoding issue with dashes (I have updated the question's content; sorry for the confusion). At this point I am still looking for an answer; I had mixed up the differences between my working XML data and my non-working XML.
I'm using a Java class (http://pastebin.com/KhSGPmCV) that takes in an HTML document and tries to convert it to a PDF document by the following steps:
Uses Tidy to parse it into an XML document.
Uses an XSLT style sheet (http://pastebin.com/s45gRTKy) to convert the XML into an FO document.
Uses Apache FOP to convert the FO document to a PDF document.
The problem I am facing is that only the first page of my HTML document gets converted to PDF. The warning message that I see is:
Mar 2, 2013 2:53:06 PM org.apache.fop.events.LoggingEventListener processEvent
WARNING: Content overflows the viewport of an fo:block-container in block-progression direction by 350 millipoints. Content will be clipped. (See position 51:261)
I'm pretty certain that the problem is in the XSL-FO style sheet that I'm using. But even after adding and modifying a lot of variables in this style sheet, I'm unable to make the second page appear. Could anyone please help me out?
Link to the HTML that I'm trying to convert to PDF - pastebin.com/iBLw8Pbv
You're using Apache FOP to build a PDF. Read this very important note in the XSL:
Since this stylesheet is originally developed by Antenna House to be
used with XSL Formatter, it may not be compatible with another XSL-FO
processors.
You may be forced to use Antenna House's XSL Formatter if you expect nice output. If you can get a binary, the script below might help (Ubuntu). If you use the XSL anyway: <nobr> is not handled by that XSL, so in your HTML you must replace it with <pre>. Another problem is that tidy doesn't seem to fix missing end quotes and will generate a LOT of warnings about bad ids (some id values end up containing the class attribute).
I have no idea how to fix this. I don't have FOP on my classpath, so I needed this:
javac -cp .:/usr/share/java/fop.jar:/usr/share/java/jtidy.jar Html2PDF.java
java -cp .:/usr/share/java/fop.jar:/usr/share/java/jtidy.jar Html2PDF samplehtml.txt xhtml2fo.xsl
And I wrote this simple script that will help a lot as you debug:
# remove broken IDs
sed "s/id=\"[^\"]* //g" samplehtml.txt > samplehtml.txt.fixedID
# use tidy
tidy -utf8 -w 255 -indent -quiet -asxhtml < samplehtml.txt.fixedID > samplehtml.txt.tidy
# change
# - &nbsp; to &#160;
# - remove xmlns declaration
# - <nobr to <pre ;; </nobr to </pre
sed -e "s/nbsp/#160/g;s/<html [^>]*/<html/;s/<nobr/<pre/g;s/<\/nobr/<\/pre/g" samplehtml.txt.tidy > samplehtml.txt.tidy2
xalan -xsl xhtml2fo.xsl -in samplehtml.txt.tidy2 -out res.fo
fop res.fo res.pdf
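If you would rather do the text clean-up in Python than in sed, the two substitution steps can be sketched roughly like this. The function names fix_broken_ids and patch_tidy_output are made up for illustration, and the patterns only approximate sed's line-by-line behaviour; tidy, xalan and fop still have to be run separately.

```python
import re


def fix_broken_ids(text):
    """Drop id attributes that lost their closing quote (first sed call).

    Matches id=" followed by non-quote characters up to a space, so a
    well-formed id="foo" is left alone.
    """
    return re.sub(r'id="[^"\n]* ', "", text)


def patch_tidy_output(text):
    """Apply the substitutions from the second sed call to tidy's output."""
    # &nbsp; becomes the numeric reference &#160;
    text = text.replace("nbsp", "#160")
    # Strip the attributes (the xmlns declaration) from the <html> tag.
    text = re.sub(r"<html [^>\n]*", "<html", text, count=1)
    # <nobr> is not known to the stylesheet, so swap it for <pre>.
    text = text.replace("<nobr", "<pre").replace("</nobr", "</pre")
    return text
```

You would call fix_broken_ids on the raw HTML before tidy and patch_tidy_output on what tidy produces.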
Edit: I found a neat project that does what you need, and the output looks great. https://code.google.com/p/wkhtmltopdf/
I am facing a really strange problem. ROMAN characters are not displaying at all in Mozilla or Google Chrome; they only display in IE8 (not in IE10 either).
The code was written using XSL transformations, and I am unable to find out what
<var name="ROMAN"> is. This is the exact text I see when I view the HTML source.
The same code is also written in the XSL.
Any help would be greatly appreciated.
<var> is an HTML element that means "This is a variable". It will typically cause the contained text to be rendered in italics. It doesn't mean anything special to XSLT; it's just like any other HTML element name. name="ROMAN" is just like the name attribute of any other HTML element: it can be used in JavaScript to address the relevant element node in the page. It doesn't change the rendering of the element, unless perhaps there is some stylesheet somewhere that recognizes name="ROMAN" and associates it with a display style.
I think you've got an HTML question, not an XSLT question. There's something wrong with your HTML, and we don't know what, because you haven't given enough information.