How can I specify HTML5 output from RMarkdown to get semantic elements like <section>? - r-markdown

I noticed that in HTML produced by knitr from my RMarkdown document, sections are marked up thus:
<div id="chunk_id" class="section level2">
<h2>...</h2>
<p>...</p>
</div>
and so on. I think it's best practice to use a <section> element rather than a <div> here (reference 1, reference 2), so I forked the RMarkdown code to see if I could make a change and a PR. In the code I found the following:
#' @param section_divs Wrap sections in <div> tags (or <section> tags in HTML5),
#'   and attach identifiers to the enclosing <div> (or <section>) rather than the
#'   header itself.
so it seems like there is no need for a change to RMarkdown - it will already use <section> in the way I want, if it is told to output HTML5.
My question is: how do you tell knitr to output HTML5? I have
output:
  html_document:
    section_divs: true
but no idea how to "switch on" HTML5.

Related

Bootstrap group list shows text sprites at runtime

The code is listed below, using the bootstrap class class="list-group-item-text".
As the image here shows, the text seems to have a duplicate version just above it - looks almost like dotted lines.
`
<asp:Panel ID="pnlInstructions" runat="server" Visible="true">
<h4>Instructions:</h4>
<div class="form-inline col-lg-12" id="Instructions" style="padding-top:10px;padding-bottom:20px;padding-left:10px;">
<ol class="list-group-item-text" style="text-emphasis:filled;outline:none;">
<li>First, please use MyEd to get your extract file ready.</li>
<li>Then, fill in the following to Log in.</li>
</ol>
</div>
</asp:Panel>
`
I've researched the problem using the word "sprites", but that seems to have a different meaning than I expected, since I thought it meant "unwanted junk" in a display.
I'm not sure if this appears on all browsers.

How to extend Pandoc

I am using bookdown for documentation that is output with bookdown::gitbook and bookdown::pdf_book.
In my Rmd files, I am using a div to wrap around notes and warnings styled with a css file. For example:
<div class="note">
This is a note.
</div>
Obviously, HTML and CSS are ignored when generating the PDF file. I was wondering if there is a way to "inject" a small script that would replace the div with, for example, a simple prefix text.
Or, is there another way to have it formatted in HTML and in the PDF without littering my file by adding something lengthy every time like:
if (knitr::is_html_output(excludes='epub')) {
cat('
<div class="note">
This is a note.
</div>
')
} else {
cat('Note: This is a note.')
}
I could also style blockquotes as described here but it is not an option as I still need blockquotes.
The appropriate way to do this is to use a fenced div rather than inserting HTML directly into your markdown and trying to parse it later with Lua. Pandoc already lets you attach custom classes to a block and carries them through to both output formats. In other words, it will take care of creating the appropriate HTML and LaTeX for you, and then you just need to style each of them. The Bookdown documentation references this here, but it simply points to further documentation here, and here.
This method will create both your custom classed div in html and apply the same style name in the LaTeX code.
So, for your example, it would look like this:
::: {.note data-latex=""}
This is a note.
:::
The output in HTML will be identical to yours:
<div class="note">
<p>This is a note.</p>
</div>
And you've already got the CSS you want to style that.
The LaTeX code will be as follows:
\begin{note}
This is a note.
\end{note}
To style that you'll need to add some code to your preamble.tex file, which you've already figured out as well. Here's a very simple example of some LaTeX that would simply indent the text from both the left and right sides:
\newenvironment{note}[0]{\par\leftskip=2em\rightskip=2em}{\par\medskip}
I found this answer on tex.stackexchange.com, which put me on the right track to solve my problem.
Here is what I am doing.
Create boxes.lua with the following function:
function Div(element)
  -- function based on https://tex.stackexchange.com/a/526036
  local class = element.classes[1]
  if
    class == "note"
    or class == "side-note"
    or class == "warning"
    or class == "info"
    or class == "reading"
    or class == "exercise"
  then
    -- get latex environment name from class name, e.g. "side-note" -> "DivSideNote"
    local div = class:gsub("-", " ")
    div = div:gsub("(%l)(%w*)", function(a, b) return string.upper(a)..b end)
    div = "Div"..div:gsub(" ", "")
    -- insert element in front
    table.insert(
      element.content, 1,
      pandoc.RawBlock("latex", "\\begin{"..div.."}"))
    -- insert element at the back
    table.insert(
      element.content,
      pandoc.RawBlock("latex", "\\end{"..div.."}"))
  end
  return element
end
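The class-to-environment naming rule in the filter can be hard to read from the gsub calls alone. As a rough illustration only (written in Python rather than Lua, with a made-up function name latex_env_name, and assuming lowercase, hyphen-separated class names like the ones listed in the filter), the mapping amounts to:

```python
def latex_env_name(css_class: str) -> str:
    """Mirror the filter's naming rule, e.g. "side-note" -> "DivSideNote".

    Assumption: class names are lowercase words separated by hyphens,
    as in the filter's own list of classes.
    """
    words = css_class.split("-")
    # Capitalise the first letter of each word and join without separators.
    return "Div" + "".join(word[:1].upper() + word[1:] for word in words)


for name in ["note", "side-note", "warning"]:
    print(name, "->", latex_env_name(name))
```

Each resulting name has to match one of the tcolorbox environments defined in preamble.tex, otherwise LaTeX will report an undefined environment.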
Add pandoc_args to _output.yml:
bookdown::pdf_book:
  includes:
    in_header: latex/preamble.tex
  pandoc_args:
    - --lua-filter=latex/boxes.lua
  extra_dependencies: ["float"]
Create environments in preamble.tex (which is also configured in _output.yml):
I am using tcolorbox instead of mdframed
\usepackage{xcolor}
\usepackage{tcolorbox}
\definecolor{notecolor}{RGB}{253, 196, 0}
\definecolor{warncolor}{RGB}{253, 70, 0}
\definecolor{infocolor}{RGB}{0, 183, 253}
\definecolor{readcolor}{RGB}{106, 50, 253}
\definecolor{taskcolor}{RGB}{128, 252, 219}
\newtcolorbox{DivNote}{colback=notecolor!5!white,colframe=notecolor!75!black}
\newtcolorbox{DivSideNote}{colback=notecolor!5!white,colframe=notecolor!75!black}
\newtcolorbox{DivWarning}{colback=warncolor!5!white,colframe=warncolor!75!black}
\newtcolorbox{DivInfo}{colback=infocolor!5!white,colframe=infocolor!75!black}
\newtcolorbox{DivReading}{colback=readcolor!5!white,colframe=readcolor!75!black}
\newtcolorbox{DivExercise}{colback=taskcolor!5!white,colframe=taskcolor!75!black}
Because I also have images and tables within the boxes, I ran into "LaTeX Error: Not in outer par mode." I was able to solve this issue by adding the following command to my Rmd file:
```{r, echo = F}
knitr::opts_chunk$set(fig.pos = "H", out.extra = "")
```

custom item labels in markdown

In LaTeX, the following produces a nice output (more examples here):
\begin{itemize}
\item[$ABC$] Definition and details of $ABC$.
\item[$EFG-PQE$] Definition and details of $EFG$ and Definition and details of $PQR$. Writing this sentence to make it multiline.
\end{itemize}
How can I get similar output in a Markdown (.md) file?
We need to consider the format this Markdown is intended to end up in.
Given that GitLab renders Markdown to HTML, description lists are your best bet:
<dl>
<dt>ABC</dt>
<dd>Definition and details of ABC.</dd>
<dt>EFG-PQE</dt>
<dd>Definition and details of EFG and Definition and details of PQR.</dd>
</dl>
The list itself is defined by a <dl> tag. Terms go into a <dt> and descriptions go into a <dd>.
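If the terms and definitions live in data rather than being typed by hand, the same markup can be generated. A minimal sketch in Python, where the helper name description_list is hypothetical and the input is assumed to be plain text that should be HTML-escaped:

```python
from html import escape


def description_list(items):
    """Render (term, description) pairs as an HTML description list."""
    lines = ["<dl>"]
    for term, description in items:
        # Escape the text so characters like < or & cannot break the markup.
        lines.append(f"  <dt>{escape(term)}</dt>")
        lines.append(f"  <dd>{escape(description)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


html_block = description_list([
    ("ABC", "Definition and details of ABC."),
    ("EFG-PQE", "Definition and details of EFG and Definition and details of PQR."),
])
print(html_block)
```

The printed block can be pasted into the .md file as inline HTML.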
Since GitLab-Flavored Markdown has no special syntax for description lists, you'll have to use inline HTML. And since Markdown is not normally processed in block-level HTML tags, if you want to format the terms in italics, as your rendered LaTeX example shows, you'll have to do one of two things:
Use inline HTML again:
<dl>
<dt>ABC</dt>
<dd>Definition and details of <em>ABC</em>.</dd>
<!-- ^^^^ ^^^^^ -->
</dl>
This option should work with pretty much any Markdown tool.
Put the Markdown content on its own line, separated from the HTML by whitespace:
<dl>
<dt>ABC</dt>
<dd>
Definition and details of _ABC_.
</dd>
</dl>
This option works in GitLab- and GitHub-Flavored Markdown. It also seems to work in Visual Studio Code's Markdown preview and on Stack Overflow.
Exactly how this gets rendered depends on the CSS being applied. Here's how Stack Overflow displays description lists:
ABC
Definition and details of ABC.
EFG-PQE
Definition and details of EFG and Definition and details of PQR.
And here is how it looks in a README.md on GitLab:

Getting the first author in a pandoc template (Rmarkdown)

I am currently creating my own pandoc template for Rmarkdown (outputting html). I want my report to show a footer containing the title and first author's name.
Reading the pandoc manual, I saw that it is possible to use a pipe to get the first element of an array (var/first). So having the header:
title: "My Report"
author:
- "Jane Doe"
- "John Doe"
I tried to do the following in the template:
<div class="footer">
<div class="footer-content">
<div class="footer-title">
<h3>$title$</h3>
</div>
<div class="footer-author">
<h3>$author/first$</h3>
</div>
</div>
</div>
Then I got the following error on the author line:
"template" (line 96, column 24):
unexpected "/"
expecting "." or "$"
Please note that the pandoc version used by rmarkdown is 2.3.1.
Is there a different way to accomplish that?
You could write a small Lua filter to get just the first author:
function Meta (meta)
  local firstauthor = meta.author.t == 'MetaList'
    and meta.author[1]
    or meta.author
  meta.firstauthor = firstauthor
  return meta
end
You can then use $firstauthor$ in your template. See here for a brief discussion of Lua filters and how to use them.
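For comparison, the same first-author logic can be written against pandoc's JSON representation of the metadata, which is what a JSON filter (passed with --filter) receives. This is only a sketch under stated assumptions: the function name add_first_author is hypothetical, the sample dict is a hand-written approximation of the JSON pandoc would produce for the header above, and the function works on an already-parsed document rather than reading stdin.

```python
def add_first_author(doc: dict) -> dict:
    """Copy the first author into a new 'firstauthor' metadata field."""
    meta = doc.get("meta", {})
    author = meta.get("author")
    if author is None:
        return doc  # no author field, nothing to add
    if author.get("t") == "MetaList":
        if author["c"]:
            meta["firstauthor"] = author["c"][0]  # first list entry
    else:
        meta["firstauthor"] = author  # a single author is used as-is
    return doc


def inlines(*words):
    """Build a MetaInlines value from words (helper for this example only)."""
    content = []
    for i, word in enumerate(words):
        if i:
            content.append({"t": "Space"})
        content.append({"t": "Str", "c": word})
    return {"t": "MetaInlines", "c": content}


doc = {
    "meta": {
        "title": inlines("My", "Report"),
        "author": {"t": "MetaList", "c": [inlines("Jane", "Doe"), inlines("John", "Doe")]},
    },
    "blocks": [],
}
result = add_first_author(doc)
print(result["meta"]["firstauthor"])
```

The Lua filter remains the simpler choice, since it needs no external interpreter.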

HTML: sanitize a set of tags but allow all tags in <code> blocks

I'm using Django+Markdown for processing user input. Text produced by the markdown filter needs to be 'safe' and is not protected by Django's auto-escape mechanism, so I have to escape user input myself. This is how I do it now:
{{ text|force_escape|markdown:"codehilite" }}
However, if text contains something that would be marked as <code> by markdown, it is escaped as well, and the output is pretty ugly (e.g., '<' is displayed as &lt; inside <code>). For example, if
text = u'''
<script>alert("I'm not working 'cause I'll be escaped")</script>
The following would be marked as a code block:
<script>alert("not xss 'cause I'm in <code>")</script>
'''
Using the filter mentioned above, the produced text is:
<p>
&lt;script&gt;alert(&quot;I&#39;m not working &#39;cause I&#39;ll be escaped&quot;)&lt;/script&gt;
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
&amp;lt;script&amp;gt;alert(&amp;quot;not xss &amp;#39;cause I&amp;#39;m in &amp;lt;code&amp;gt;&amp;quot;)&amp;lt;/script&amp;gt;
</code>
</pre>
What I want is:
<p>
&lt;script&gt;alert(&quot;I&#39;m not working &#39;cause I&#39;ll be escaped&quot;)&lt;/script&gt;
The following would be marked as a code block:
</p>
<pre class="codehilite">
<code>
&lt;script&gt;alert(&quot;not xss &#39;cause I&#39;m in &lt;code&gt;&quot;)&lt;/script&gt;
</code>
</pre>
I'm thinking about using BeautifulSoup to get the <code> blocks produced by markdown and reverse-escape their content. But soup.code.text returns only the text, excluding the tags, so I can't get my hands on any of the <, >, ', " or & characters in it.
Don't escape the input before passing it to Markdown. As you found, this breaks user input in some cases. And, it doesn't ensure security: consider, e.g., "[clickme](javascript:alert%28%22xss%22%29)".
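To see why escaping first does not help with that example, note that the payload contains nothing for an HTML escaper to rewrite. The sketch below uses Python's standard html.escape as a stand-in for Django's force_escape (an assumption made for illustration; the two are not guaranteed to behave identically in every case):

```python
from html import escape

payload = "[clickme](javascript:alert%28%22xss%22%29)"

# HTML escaping only rewrites &, <, >, " and '. The payload contains none
# of them, so it reaches the Markdown processor completely unchanged.
escaped = escape(payload)
print(escaped == payload)
```

Markdown would then turn the unchanged text into a link whose target is the javascript: URL, which is why the escaping step gives no protection here.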
Instead, the correct approach is to use Markdown in its safe mode. I've written elsewhere about how to do so, but the short version in Django is to use something like {{ text | markdown:"safe" }}. (Alternatively, you can apply an HTML sanitizer, like HTML Purifier, to the output of the Markdown processor.)