How to change image urls in a markdown string - regex

I am working on a Node.js CMS where users write blog posts in Markdown locally; after uploading, we process each post into an HTML file. Sometimes users add a picture such as my dog.jpg to a post by copying the image and writing:
![a picture of my dog](my dog.jpg)
I use uslug to convert all filenames, so my dog.jpg becomes my-dog.jpg. However, I also need to update the link in the blog post using uslug, because a) otherwise the link would break, since we just changed the filename, and b) most Markdown parsers for Node skip the image syntax above because of the whitespace (even though many local Markdown editors, like Mou, do preview the image).
Does anybody know how I can achieve this using regex?

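You'll need a lot of backslashes:
// Capture "![alt](", the URL and ")" separately, then rewrite only the URL.
var result = string.replace(/(!\[.*?\]\()(.+?)(\))/g, function(whole, a, b, c) {
  // addDashesOrWhatever is a placeholder for your own conversion (e.g. the uslug call used for the filenames)
  return a + addDashesOrWhatever(b) + c;
});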
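For comparison, here is the same capture-and-rewrite idea as a minimal sketch with Python's `re.sub`. The helper `slugify_filename` is hypothetical: it only turns whitespace runs into dashes and stands in for whatever uslug does in the real pipeline.

```python
import re

# Same three groups as the JavaScript answer:
# group 1 = "![alt](", group 2 = the image URL, group 3 = ")".
IMAGE_RE = re.compile(r"(!\[.*?\]\()(.+?)(\))")


def slugify_filename(name: str) -> str:
    # Hypothetical stand-in for uslug: collapse whitespace runs into single dashes.
    return re.sub(r"\s+", "-", name.strip())


def fix_image_urls(markdown: str) -> str:
    # Rewrite only the URL part of every ![alt](url) occurrence.
    return IMAGE_RE.sub(
        lambda m: m.group(1) + slugify_filename(m.group(2)) + m.group(3),
        markdown,
    )


print(fix_image_urls("![a picture of my dog](my dog.jpg)"))
```

Because group 2 stops at the first `)`, URLs that contain a parenthesis, or image links with a `"title"` part, would need a stricter pattern.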

Related

TinyMCE, Django and python-docx

I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution; however, I see that its output format is HTML (here). The goal is to store user input and then serve it inside a Word document using python-docx (which is not HTML).
Do you know of any solution for this? Either a feature of TinyMCE, an HTML-to-Word converter that keeps styles, or another rich text editor similar to TinyMCE?
UPDATE:
This is another option which I found to work fine. I'm still trying to convert HTML to Word without losing styles. One solution may be pywin32, as stated here, but it doesn't help me much, and it's Windows-only.
UPDATE 2:
After quite some digging, I found pandoc and pypandoc, which appear to be able to convert to any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to integrate such output with python-docx.
I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work), then parse the HTML tags into a format python-docx can use. Pandoc and pypandoc will convert files for you (e.g. you start with a LaTeX file and need to convert it to Word), but they will not provide the tools you need to convert to and from XML/HTML.
Good luck!
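A minimal sketch of the HTML-to-python-docx mapping described above. It swaps Beautiful Soup for the standard library's `html.parser` so the parsing step has no extra dependency, and it only handles paragraphs, bold and italic; the class and function names are illustrative. The python-docx calls (`Document`, `add_paragraph`, `add_run`, `run.bold`, `run.italic`, `save`) are that library's documented API, imported lazily so the parser works without it.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Turn simple editor HTML into paragraphs of (text, bold, italic) runs."""

    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of paragraphs, each a list of runs
        self.current = []     # runs of the paragraph being built
        self.bold = 0         # nesting depth of <b>/<strong>
        self.italic = 0       # nesting depth of <i>/<em>

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.bold += 1
        elif tag in ("i", "em"):
            self.italic += 1
        elif tag in self.BLOCK_TAGS:
            self._flush()  # drop stray whitespace between blocks

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.bold = max(0, self.bold - 1)
        elif tag in ("i", "em"):
            self.italic = max(0, self.italic - 1)
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self.current.append((data, self.bold > 0, self.italic > 0))

    def _flush(self):
        # Keep the paragraph only if it has some non-whitespace text.
        if any(text.strip() for text, _, _ in self.current):
            self.paragraphs.append(self.current)
        self.current = []

    def close(self):
        super().close()
        self._flush()


def html_to_runs(html):
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def write_docx(paragraphs, path):
    # Requires python-docx (pip install python-docx).
    from docx import Document

    doc = Document()
    for runs in paragraphs:
        p = doc.add_paragraph()
        for text, bold, italic in runs:
            run = p.add_run(text)
            run.bold = bold
            run.italic = italic
    doc.save(path)
```

Usage would be `write_docx(html_to_runs(editor_html), "out.docx")`; lists, tables, headings styles and links would each need their own mapping.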

Linking to url with rmarkdown using Knit Word in Rstudio

Is there an easy way to link to a webpage in rmarkdown without displaying the link?
For example, putting "https://www.google.com/" in a .Rmd file renders the entire URL as the link text, but I want the link to display something like ABC instead.
The HTML method, i.e. <a href= ..., works when I knit to HTML, but it does not work when I knit to a Word document.
Markdown provides two ways to create a link, as you mention (and I suppose both are supported in rmarkdown).
Markdown way:
[ABC](http://example.com)
HTML way:
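<a href="http://example.com">ABC</a>
The first way is native Markdown; the second works because Markdown allows raw HTML. Raw HTML is only passed through to HTML output, though, which is why only the Markdown form survives when you knit to Word.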

Rendering docx files as django templates: Word breaks {{}}-blocks

I'm writing a Django application which renders docx documents based on user input. The main idea is: "a docx is a Django template". So I need to unzip the Word file, extract the XML with the markup, render it as a Django template and put everything back.
Everything works fine, except for an annoying problem with creating the initial document (I use standard editors such as MS Word itself and LibreOffice Writer): while applying formatting, the editor can split template blocks ({{ }} and {% %}) with XML tags.
For example: <w:t>{{doc_title}}</w:t> can become
<w:t>{{</w:t><w:t>doc_title</w:t>}}
Is there any way I could avoid this, other than inserting template blocks directly into the XML?
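One workaround, sketched here as an assumption rather than taken from the question, is to preprocess document.xml before rendering: find each {{ ... }} or {% ... %} block, even when its braces are split by markup, and strip the XML tags the editor inserted inside it. The names below are hypothetical, and the regex approach assumes braces and percent signs in ordinary text don't form template-like sequences.

```python
import re

# Opener "{{" or "{%" (possibly split by XML tags), lazily up to the closer
# "}}" or "%}" (also possibly split). DOTALL lets a block span line breaks.
SPLIT_TAG_RE = re.compile(r"\{(?:<[^>]*>)*([{%]).*?([}%])(?:<[^>]*>)*\}", re.DOTALL)
XML_TAG_RE = re.compile(r"<[^>]*>")


def merge_template_tags(xml: str) -> str:
    """Remove XML markup that sits inside {{ }} / {% %} template blocks."""
    return SPLIT_TAG_RE.sub(lambda m: XML_TAG_RE.sub("", m.group(0)), xml)


broken = "<w:r><w:t>{{</w:t></w:r><w:r><w:t>doc_title</w:t></w:r><w:r><w:t>}}</w:t></w:r>"
print(merge_template_tags(broken))
```

Since only the tags between the opener and the closer are removed, the run that contains the opener keeps its `<w:r><w:t>` wrapper and the result stays well-formed, but any formatting applied to the inner runs is lost.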

Getting markdown and urlize template tags to play nice

I'm using markdown to format some comments in a Django app.
If I try to combine markdown and urlize, formatting errors inevitably happen (links get added where they don't belong or aren't recognized, and of course the errors change depending on which filter I apply first).
Basically, I'd like a filter that does Markdown and automatically turns links into hyperlinks if Markdown hasn't already done so.
Otherwise, I suppose I'll have to roll my own filter, which I would much rather not do.
What I do is use the Markdown urlize extension.
Once installed, you can use it in a Django template like this:
{{ value|markdown:"urlize" }}
Or in Python code like this:
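import markdown
# Python-Markdown 2.x API: safe_mode (removed in 3.0) sanitizes raw HTML in the
# input; 'urlize' is a third-party extension that must be installed separately.
md = markdown.Markdown(safe_mode=True, extensions=['urlize'])
converted_text = md.convert(text)  # text: the raw comment body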
Here is the start of the Markdown extension docs in case you need more info.
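If the urlize extension isn't available, a different technique (not the extension above) is to preprocess the text so bare URLs become Markdown autolinks (`<http://...>`) before conversion, skipping URLs that are already link targets. A minimal sketch; the function name is hypothetical and code spans are not treated specially.

```python
import re

# A bare http(s) URL not preceded by "(" (link target), "[" (link text),
# "<" (existing autolink) or a quote (HTML attribute). The final character
# class keeps trailing punctuation such as "." or "," out of the URL.
BARE_URL_RE = re.compile(r"""(?<![(<\["'])\bhttps?://[^\s<>()]*[^\s<>().,;:!?'"]""")


def autolink_bare_urls(text: str) -> str:
    # Wrap each bare URL in <...> so any Markdown parser renders it as a link.
    return BARE_URL_RE.sub(lambda m: "<" + m.group(0) + ">", text)


print(autolink_bare_urls("see http://example.com."))
```

The result can then go through the normal conversion, e.g. `markdown.markdown(autolink_bare_urls(text))`, so Markdown alone decides what becomes a link and the two filters never fight over the same text.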

Parsing HTML with C++ (using Qt preferably)

I'm trying to parse some HTML with C++ to extract all URLs from it (the URLs can be inside href and src attributes).
I tried to use WebKit to do the heavy lifting, but for some reason, when I load a frame with HTML, the generated document is all wrong (if I make WebKit fetch the page from the web, the generated document is fine, but WebKit also downloads all images, styles, and scripts, and I don't want that).
Here is what I tried to do:
frame->setHtml(HTML);
QWebElement document = frame->documentElement();
QWebElementCollection links = document.findAll("a");        // Doesn't find all links
QWebElementCollection imgs = document.findAll("img");       // Doesn't find all images
QWebElementCollection scripts = document.findAll("script"); // Doesn't find all scripts
qDebug() << document.toInnerXml(); // Prints a completely messed-up document with several missing elements
What am I doing wrong? Is there an easy way to parse HTML with Qt? (Or some other lightweight library)
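You can always use XPath expressions to make your parsing life easier; take a look at this, for instance.
Or you can do something like this:
QWebView* view = new QWebView(parent);
view->load(QUrl("http://www.your_site.com"));
// load() is asynchronous: run the query below in a slot connected to
// the view's loadFinished(bool) signal, once the page has finished loading.
QWebElementCollection elements = view->page()->mainFrame()->findAllElements("a");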
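Since the goal is only to collect href and src values, a plain HTML tokenizer avoids a browser engine entirely, so nothing is rendered or downloaded. The sketch below shows that idea with Python's standard-library `html.parser` for brevity (it is not a Qt API); a lightweight C++ HTML parser would be used the same way.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collect href/src attribute values without fetching any resources."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased and
        # value is None for attributes written without a value.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)

    # Self-closing tags such as <img ... /> reach handle_starttag through the
    # default handle_startendtag implementation.


def extract_urls(html):
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


print(extract_urls('<a href="/x">x</a><img src="a.png"/><script src="s.js"></script>'))
```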