custom item labels in markdown - list

In LaTeX, the following produces nice output (more examples here):
\begin{itemize}
\item[$ABC$] Definition and details of $ABC$.
\item[$EFG-PQE$] Definition and details of $EFG$ and Definition and details of $PQR$. Writing this sentence to make it multiline.
\end{itemize}
How can I get similar output in a Markdown (.md) file?

We need to consider the format this Markdown is intended to end up in.
Given that GitLab renders Markdown to HTML, description lists are your best bet:
<dl>
<dt>ABC</dt>
<dd>Definition and details of ABC.</dd>
<dt>EFG-PQE</dt>
<dd>Definition and details of EFG and Definition and details of PQR.</dd>
</dl>
The list itself is defined by a <dl> tag. Terms go into a <dt> and descriptions go into a <dd>.
Since GitLab-Flavored Markdown has no special syntax for description lists, you'll have to use inline HTML. And since Markdown is not normally processed in block-level HTML tags, if you want to format the terms in italics, as your rendered LaTeX example shows, you'll have to do one of two things:
Use inline HTML again:
<dl>
<dt>ABC</dt>
<dd>Definition and details of <em>ABC</em>.</dd>
<!-- ^^^^ ^^^^^ -->
</dl>
This option should work with pretty much any Markdown tool.
Put the Markdown content on its own line, separated from the HTML by whitespace:
<dl>
<dt>ABC</dt>
<dd>
Definition and details of _ABC_.
</dd>
</dl>
This option works in GitLab- and GitHub-Flavored Markdown. It also seems to work in Visual Studio Code's Markdown preview and on Stack Overflow.
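If the terms and definitions come from data rather than being typed by hand, the inline HTML above can be generated. The following is a minimal sketch, not part of the original answer: `description_list` is a hypothetical helper name, and it assumes plain-text terms and descriptions that should be HTML-escaped rather than interpreted as Markdown.

```python
from html import escape


def description_list(items):
    """Build a <dl> block from (term, description) pairs.

    Hypothetical helper for illustration: each term goes into a <dt>
    and each description into a <dd>, with the text HTML-escaped.
    """
    lines = ["<dl>"]
    for term, description in items:
        lines.append(f"<dt>{escape(term)}</dt>")
        lines.append(f"<dd>{escape(description)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(description_list([
        ("ABC", "Definition and details of ABC."),
        ("EFG-PQE", "Definition and details of EFG and Definition and details of PQR."),
    ]))
```

The printed block can be pasted into a .md file as-is, since the renderers mentioned above pass inline HTML through.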
Exactly how this gets rendered depends on the CSS being applied. Here's how Stack Overflow displays description lists:
ABC
Definition and details of ABC.
EFG-PQE
Definition and details of EFG and Definition and details of PQR.
And here is how it looks in a README.md on GitLab:


How can I specify HTML5 output from RMarkdown to get semantic elements like <section>?

I noticed that in HTML produced by knitr from my RMarkdown document, sections are marked up thus:
<div id="chunk_id" class="section level2">
<h2>...</h2>
<p>...</p>
</div>
and so on. I think it's best practice to use a <section> element rather than a <div> here (reference 1, reference 2), so I forked the RMarkdown code to see if I could make a change and a PR. In the code I found the following:
#' @param section_divs Wrap sections in <div> tags (or <section> tags in HTML5),
#'   and attach identifiers to the enclosing <div> (or <section>) rather than the
#'   header itself.
so it seems like there is no need for a change to RMarkdown - it will already use <section> in the way I want, if it is told to output HTML5.
My question is: how do you tell knitr to output HTML5? I have
output:
  html_document:
    section_divs: true
but no idea how to "switch on" HTML5.

Search for property values within specific HTML tags

Using Visual Studio, within a large ASP.NET project, I need to find all images that have an HTML class of "info". These images, however, can be added using any of the following approaches:
Directly in the page, <img alt="..." title="..." class="info" />
As an ASP image, <asp:Image runat="server" ImageUrl="..." CssClass="info" />
As string concatenation, Dim s = "<img .... class=""info"" />" (notice double quotes)
Other hurdles are that images may have multiple classes, e.g. <img ... class="foo info bar" />, so a search for class="info" doesn't work. Also, other HTML elements also use this class, but should be ignored, e.g. <p class="info">Foo</p>.
I need a Regular Expression for searching that provides the following logic:
Must contain img or asp:Image (case-insensitive)
Must contain class or CssClass (case-insensitive)
Must contain info (case-sensitive)
It turns out the problem was that Visual Studio doesn't accept regular expressions pasted verbatim with their delimiters and flags.
The test I did worked fine online (see working example)
/(img|asp:Image)(?=.*class\b)(?=.*\binfo\b).*$/igm
However, this failed to find anything in Visual Studio. I didn't realise that I needed to remove the start and end characters (the leading / and the trailing /igm). Visual Studio required this revision, which works fine:
(img|asp:Image)(?=.*class\b)(?=.*\binfo\b).*$
Credit to this answer which was a lead in the right direction.
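The pattern can be sanity-checked outside Visual Studio. Below is a rough sketch using Python's `re` module, which is an assumption on my part: Visual Studio uses the .NET regex engine, although this particular pattern uses only constructs the two engines share. The sample lines are adapted from the question, `matching_lines` is a hypothetical helper name, and the flags stand in for the `/igm` suffix of the online version. Note that with the case-insensitive flag, `info` is matched case-insensitively too.

```python
import re

# Same pattern as above, without the / delimiters; flags replace the "igm" suffix.
PATTERN = re.compile(
    r"(img|asp:Image)(?=.*class\b)(?=.*\binfo\b).*$",
    re.IGNORECASE | re.MULTILINE,
)

SAMPLE_LINES = [
    '<img alt="..." title="..." class="info" />',
    '<asp:Image runat="server" ImageUrl="..." CssClass="info" />',
    'Dim s = "<img .... class=""info"" />"',
    '<img ... class="foo info bar" />',
    '<p class="info">Foo</p>',              # not an image: should be ignored
    '<img alt="x" class="information" />',  # "info" is not a whole word here
]


def matching_lines(lines):
    """Return the lines in which the pattern finds a match."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE_LINES):
        print(line)
```

One limitation worth keeping in mind: the lookaheads only require `class` and `info` to appear somewhere after `img` on the same line, so a line with several elements could still produce a false positive.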

Regex to match only the first occurrence of an html element

Yes yes, I know, "don't parse HTML with Regex". I'm doing this in notepad++ and it's a one-time thing so please bear with me for a moment.
I'm trying to simplify some HTML code by using some more advanced techniques. Notably, I have "inserts" or "callouts" or whatever you call them, in my documentation, indicating "note", "warning" and "technical" short phrases to grab the attention of the reader on important information:
<div class="note">
<p><strong>Notes</strong>: This icon shows you something that complements
the information around it. Understanding notes is not critical but
may be helpful when using the product.</p>
</div>
<div class="warning">
<p><strong>Warnings</strong>: This icon shows information that may
be critical when using the product.
It is important to pay attention to these warnings.</p>
</div>
<div class="technical">
<p><strong>Technical</strong>: This icon shows technical information
that may require some technical knowledge to understand. </p>
</div>
I want to simplify this HTML into the following:
<div class="box note"><strong>Notes</strong>: This icon shows you something that complements
the information around it. Understanding notes is not critical but
may be helpful when using the product.</div>
<div class="box warning"><strong>Warnings</strong>: This icon shows information that may
be critical when using the product.
It is important to pay attention to these warnings.</div>
<div class="box technical"><strong>Technical</strong>: This icon shows technical information
that may require some technical knowledge to understand.</div>
I almost have the regex necessary to do a nice global search & replace in my project from notepad++, but it's not picking up "only" the first div, it's picking up all of them - if my cursor is at the beginning of my file, the "select" when I click Find is from the first <div class="something"> up until the last </div>, essentially.
Here's my expression: <div class="(.*[^"])">[^<]*<p>(.*?)<\/p>[^<]*<\/div> (notepad++ "automatically" adds the / / around it, kinda).
What am I doing wrong, here?
You have a greedy dot-quantifier while matching the class attribute — that's the evil guy who's causing your problems.
Make it non-greedy: <div class="(.*?[^"])"> or change it to a character class: <div class="([^"]*)">.
Compare: greedy class vs. non-greedy class.
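The difference is easy to reproduce in a script. This is a sketch in Python's `re` module rather than Notepad++ itself, and it assumes the ". matches newline" behaviour (here `re.DOTALL`), since the callouts span several lines. The shortened sample HTML and the replacement string are illustrative assumptions based on the question.

```python
import re

HTML = (
    '<div class="note">\n'
    '<p><strong>Notes</strong>: First.</p>\n'
    '</div>\n'
    '<div class="warning">\n'
    '<p><strong>Warnings</strong>: Second.</p>\n'
    '</div>\n'
)

# Original expression: the greedy .* in the class group runs to the last div.
GREEDY = re.compile(r'<div class="(.*[^"])">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)

# Fixed expression: a character class cannot run past the closing quote.
FIXED = re.compile(r'<div class="([^"]*)">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)

greedy_matches = GREEDY.findall(HTML)  # one match spanning both callouts
fixed_matches = FIXED.findall(HTML)    # one match per callout

# Rewrite each callout into the simplified form from the question.
simplified = FIXED.sub(r'<div class="box \1">\2</div>', HTML)

if __name__ == "__main__":
    print(len(greedy_matches), len(fixed_matches))
    print(simplified)
```

In Notepad++ the equivalent replacement string would use its own backreference syntax, so treat the `\1` and `\2` here as Python-specific.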

Bluecloth: markdown to HTML results in lots of empty tags

For example, the following markdown:
# Game Version
Need For Speed Most Wanted v1.3 English version.
Results in the following HTML:
<h1>Game Version</h1>
<p></p>
<p></p>
<p>Need For Speed Most Wanted v1.3 English version.</p>
<p></p>
<p></p>
This is even more annoying in lists where every <li></li> is <br><li></li><br>, contrary to the markdown spec. I have checked my markdown and there are no extra end-of-line spaces or anything of the sort. The data is stored as a text field on Heroku Postgres.
Is this a problem with Bluecloth, or am I doing something terribly wrong?
I was actually calling simple_format on the returned string in the view:
simple_format BlueCloth.new(@event.description)
Replacing simple_format with raw fixed the problem.
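To see why the extra tags appear, it helps to look at what a paragraph-wrapping helper does to text that is already HTML. The sketch below is a rough Python analogue, not Rails code: `simple_format_like` is a hypothetical function that only imitates the documented rules of simple_format (blank-line-separated chunks are wrapped in <p>, single newlines get a <br />), and the input is what I assume BlueCloth returned for the example above.

```python
import re


def simple_format_like(text):
    """Rough analogue of simple_format's paragraph rules (illustration only).

    Chunks separated by two or more newlines are wrapped in <p> tags,
    and remaining single newlines get a <br /> inserted.
    """
    paragraphs = re.split(r"\n{2,}", text.strip())
    return "\n\n".join(
        "<p>" + p.replace("\n", "<br />\n") + "</p>" for p in paragraphs
    )


# Assumed Markdown output for the example document above.
rendered = (
    "<h1>Game Version</h1>\n\n"
    "<p>Need For Speed Most Wanted v1.3 English version.</p>"
)

wrapped = simple_format_like(rendered)

if __name__ == "__main__":
    print(wrapped)
```

The result nests the heading and the paragraph inside additional <p> elements. Since <p> cannot contain block-level elements, a browser's HTML parser will most likely split these up, which would explain the empty <p></p> pairs seen in the output.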

HTML: sanitize a set of tags but allow all tags in <code> blocks

I'm using Django+Markdown for processing user input. Text produced by the markdown filter needs to be 'safe' and is not protected by Django's auto-escape mechanism, so I have to escape user input myself. This is how I do it now:
{{ text|force_escape|markdown:"codehilite" }}
However, if text contains something that Markdown would mark as <code>, it is escaped as well and the output is pretty ugly (e.g., '<' is displayed as &lt; inside <code>). For example, if
text = u'''
<script>alert("I'm not working 'cause I'll be escaped")</script>
The following would be marked as a code block:
<script>alert("not xss 'cause I'm in <code>")</script>
'''
Using the filter mentioned above, the produced text is:
<p>
&lt;script&gt;alert(&quot;I&#39;m not working &#39;cause I&#39;ll be escaped&quot;)&lt;/script&gt;
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
&amp;lt;script&amp;gt;alert(&amp;quot;not xss &amp;#39;cause I&amp;#39;m in &amp;lt;code&amp;gt;&amp;quot;)&amp;lt;/script&amp;gt;
</code>
</pre>
What I want is:
<p>
&lt;script&gt;alert(&quot;I&#39;m not working &#39;cause I&#39;ll be escaped&quot;)&lt;/script&gt;
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
&lt;script&gt;alert("not xss 'cause I'm in &lt;code&gt;")&lt;/script&gt;
</code>
</pre>
I'm thinking about using BeautifulSoup to get the <code> blocks produced by markdown and reverse-escape their content. But soup.code.text returns only the text, excluding the tags, so I couldn't get my hands on any of the <, >, ', " or & characters in it.
Don't escape the input before passing it to Markdown. As you found, this breaks user input in some cases. And, it doesn't ensure security: consider, e.g., "[clickme](javascript:alert%28%22xss%22%29)".
Instead, the correct approach is to use Markdown in its safe mode. I've written elsewhere about how to do so, but the short version in Django is to use something like {{ text | markdown:"safe" }}. (Alternatively, you can apply an HTML sanitizer, like HTML Purifier, to the output of the Markdown processor.)
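Both problems with escaping first can be shown with the standard library alone. In this sketch, `html.escape` is an assumption standing in for Django's force_escape, and the second `escape` call stands in for the escaping a Markdown processor applies to code-block contents. It does not run Markdown itself.

```python
from html import escape

# 1. Double escaping: the input is escaped once up front, then the code-block
#    content is escaped again, so the browser displays "&lt;" instead of "<".
code_sample = "<script>"
escaped_once = escape(code_sample)    # stand-in for force_escape
escaped_twice = escape(escaped_once)  # stand-in for code-block escaping

# 2. No protection: this link contains none of the characters that escaping
#    touches, so it reaches the Markdown processor completely unchanged.
link = "[clickme](javascript:alert%28%22xss%22%29)"
link_after_escape = escape(link)

if __name__ == "__main__":
    print(escaped_once)
    print(escaped_twice)
    print(link_after_escape == link)
```

This is why the sanitizing step belongs in or after the Markdown processor, where the final HTML is known, rather than before it.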