XSS concerns when using OpenCPU and knitr to print user-supplied data - xss

I'm using openCPU and knitr to generate custom feedback after surveys. To this end, I basically let survey developers specify rmd files. In this use case, the survey developers are trusted, but the survey takers may not be.
I'm now thinking about XSS. It's not a big worry as user feedback will of course usually only be displayed to the user who entered the data on display, but of course characters like '<' will be used for non-malicious reasons and I'd like to think ahead and explore some of the trials and tribulations of freely mixing R with web apps.
Knitr and R generally were not made with untrusted users and XSS in my mind. OpenCPU rectifies many security issues with running AppArmored-R as an API, but I wonder whether a maximum-flexibility approach like mine can also be proofed.
Possible points at which one might separate trusted and untrusted markup:
Before knitting, i.e. I pass escaped user data to the rmd-file. Drawback: An oblivious survey dev might unescape it accidentally or because it's annoying in some context.
During knitting. This would be ideal, I guess, but I don't know if it's possible, especially if a survey dev could potentially alter chunk settings.
After knitting. I think it's impossible to separate trusted and untrusted markup post-hoc.
Some code to paste into OpenCPU's knitr app:
```{r}
good_userdata = 'I like brackets [].'
bad_userdata = 'some text should not be
[linked](javascript:location.href=\'http://example.com?secrets\';), <s>struck</s> or __bold__'
escape_html = highr:::escape_html
escape_md <- function(x){
x <- gsub('\\[', '\\\\[', x);
x <- gsub('_', '\\\\_', x);
x
}
good_userdata_escaped = escape_md(escape_html(good_userdata))
bad_userdata_escaped = escape_md(escape_html(bad_userdata))
```
## let's say survey devs wants to print text like this
```{r}
cat(good_userdata_escaped)
cat(bad_userdata_escaped) # doesn't know about text like this
```
## gets annoyed, does
```{r}
good_userdata_escaped <- gsub('\\\\', '', good_userdata_escaped);
bad_userdata_escaped <- gsub('\\\\', '', bad_userdata_escaped);
```
##
so that this looks nice
```{r}
cat(good_userdata_escaped)
```
## later renders the same text inline, so that is evaluated as markdown
`r good_userdata_escaped # doesn't look dangerous`
`r bad_userdata_escaped`
Edit 2
Sorry, I had provided only some HTML tags, thinking possible XSS attacks were obvious. Michel Fortin had some examples on his page.

I'm not 100% sure I understand your concern. If you're worried about XSS, you're worried about users including a javascript tag or so in the markdown right?
```{r}
userdata = '<script>alert("I am evil")</script>'
```
```{r,results='asis'}
cat(userdata)
```
You can prevent this by escaping html characters. I think there's a section on this in the markdown definition. So you would need to escape all user input, either when inserting it in your DB or when extracting it:
escape <- function(x){
x <- gsub("<", "<", x);
x <- gsub(">", ">", x);
x <- gsub("&", "&", x);
x
}
Try running the following:
```{r output}
escape <- function(x){
x <- gsub("&", "&", x);
x <- gsub("<", "<", x);
x <- gsub(">", ">", x);
x
}
```
```{r}
userdata = escape('<script>alert("I am evil")</script>')
```
```{r,results='asis'}
cat(userdata)
```
That should take care of any code injection. I'm not quite sure how the __bold__ example is a concern, because afaics this can not be used for an XSS attack as there is no scripting. But if you want to prevent users from messing with layout too, than you should escape all markdown characters I guess.

Related

Word count in Quarto

I would love a convenient and easy way to print my word count automatically in quarto and stumbled across this nice add-in from Ben Marwick:
https://github.com/benmarwick/wordcountaddin
It is sound for rmarkdown and I presumed it should be no issue with quarto too. However, when I use the add-in, though it can count out the number of words within my RStudio session, it doesn't print it in my final pdf format and just returns [1] NA.
{r, #wordcountdev, message = FALSE, warning = FALSE, echo = FALSE}
wordstats <- wordcountaddin:::text_stats('CMI Write Up.qmd')
words <- substr(wordstats[3], start=19, stop=30)
print(words)
I don't understand what is going on here, it is seemingly simple, would anyone know of a better way to achieve what I'm trying?
You could take a look at the wordcounts pandoc filter, e.g. as a starting point it prints the number of words in the body to the console while rendering:
---
format: html
filters: [wordcounts.lua]
---
Hello there, how many words are in the body?
Or: You can use the development version (devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)) of the above mentioned package:
---
format: html
---
```{r}
#| echo: false
#| label: wordstats
#| warning: false
#| message: false
wordcount <- wordcountaddin::text_stats('wordcount.qmd')
words <- substr(wordcount[3], start=19, stop=30)
```
Hello there, how many words are in the body?
There are `r words` words in the whole document.

Unpredictable figure size in RMD chunk in code loop

I have an rmd chunk of r code with a loop. The structure of the code is like this:
```{r echo=FALSE, results="asis", out.width="100%"}
## out.width="100%"
## fig.width=12
## fig.height=(6+2*ceiling(6/4))
section_number <- 3
i = 1 ## for testing
while (i <= length(target_var_list)) {
target_var <- target_var_list[i]
data_segments <- data_segments(wrangled_devices, target_var)
# Code
exposure_chart_data <- monkeyr::get_exposure_chart_data(wrangled_obs, wrangled_devices, target_var)
exposure_plot <- monkeyr::get_exposure_plot(exposure_chart_data, target_var)
# knitr::opts_chunk$set(fig.height=(6+2*ceiling(data_segments/4)))
print(exposure_plot)
# print(exposure_plot, fig.height=(12+2*ceiling(data_segments/4)))
section_number <- section_number + 1
cat("\n\n\n")
i <- i + 1
}
```
I have commented out a few attempts I made to control the width and height of the plot. And I have commented out 2 attempts I made to control the knitr behaviour on a per plot basis.
The problem is that I can't find a reliable way to control the plot size that accommodates different lengths of the target_var_length.
It is possible to control the height at chunk level, but that is then fixed, and won't respond to each element in the loop. Here are some viz. What I would like is for the actual bar to be the same size in every case. So the case with 3 values would be 75% as wide as the 4. And the case with 7 would look be 2 rows, so twice the height of the 4. Do you see what I mean...
After quite a few hours of messing around with different approaches, here are some insights and an answer.
knitr::opts_chunk$set
I expected this to take effect on execution and change the chunk options for whatever elements follow. To change the plot height based on the number of rows / column in a facetted plot, I tried this:
knitr::opts_chunk$set(fig.height=(6+2*ceiling(data_segments/4)))
However it has no effect. The documentation bears this out. This actually sets the default chunk settings for subsequent chunks, and has no effect whatsoever on the current chunk. I encountered another function:
knitr::opts_current$set(fig.height=(6+2*ceiling(data_segments/4)))
The documentation as much as warns you off using this. And I found that it didn't achieve the expected results either in any case.
Blind Hope
I considered the possibility that I was overthinking this and left it up to blind hope by removing all efforts to control the height. Sometimes things just work out you know! ... They didn't.
Using an rmd child chunk
This is the approach that I finally got to work. It's a slightly horrible hack. My first effort was to create a separate rmd file for each plot:
```{r echo=FALSE, results="asis", out.width="100%", fig.height=(6+2*ceiling(data_segments/4))}
print(myPlot)
```
But that meant creating lots of new rmd plots. I have a major problem with how messy that would get. So I cleaned it up by using a single rmd file for any plot and lumped the code to call it into a fucntion.
resize_plot <- function(resizePlot, resizeHeight) {
resizePlot <- resizePlot
resizeHeight <- resizeHeight
res <- knitr::knit_child('resizePlot.rmd', quiet = TRUE)
cat(res, sep = '\n')
}
Now to insert a custom height plot I just call my new function:
resize_plot(exposure_plot, 3.25*ceiling(data_segments/4))
And the single rmd file just looks like this:
```{r echo=FALSE, results="asis", out.width="100%", fig.height=resizeHeight}
print(resizePlot)
```
And bingo - it looks perfect!

How to hide data parsing code in R Markdown

I am pretty new in R markdown and while creating a report "I like to hide data parsing code"
I have tried "echo=FALSE", but it does not doing the job in this particular case.
Any help would be much appreciated.
**Data Parsing**
```{r echo=FALSE}
library(ggplot2)
library(readr)
SU1 <- read_csv("......./SU.csv")
I like to hide the below chunk in the report
Thanks
Thanks.In fact i was able to solve it by adding Message= FALSE.

Automatically Change RMarkdown Text Styles from One to Another

Due to different journal requirements, I need to frequently change certain text styles within Rmarkdown from one kind to another. For instance, here is an example Rmarkdown document.
---
title: "Changing format of Rmarkdown"
author: "Paul Hargarten"
date: "5/9/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an $\mathcal{R}$ Markdown document. Markdown is a simple formatting syntax for authoring **HTML**, **PDF**, and **MS Word* documents. For more details on using $\mathcal{R}$ Markdown see <http://rmarkdown.rstudio.com>. $\matcal{R}$ is based on $\mathcal{S}$.
When you click the `Knit` button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. Calculate a `summary` as follows:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example: `r plot(pressure)`.
Without searching for the exact phrase, suppose that I would like to find and replace the following items:
1. Change items in bold ** ... ** to italics _ ... _.
2. Change items that look like $\mathcal{ ... }$ to bold ** ... **.
3. Change special font `...`, except those that start with r, to \code{ ... }.
4. Add dollar signs to `r ... ` => $`r ... `$.
Is this possible to use regex to make these formatting style changes in
Rmarkdown? Thanks!
This is something that LaTeX is good at, but it will be harder with Markdown.
If you were entirely in LaTeX, you could define your own macros based on the uses for that markup. For example,
\newcommand{\booktitle}[1]{\textbf #1}
used for book titles as \booktitle{The Book}. If you wanted to switch book titles to italic, you'd just change that definition.
Doing this in R Markdown is possible, but you wouldn't be able to mark book titles using **. You could do it (you can embed LaTeX in R Markdown), but it's ugly. For example,
---
title: Using LaTeX
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
\newcommand{\booktitle}[1]{\textbf{#1}}
This is \booktitle{The Book}.
Once you're doing this, you might as well switch to Sweave-like *.Rnw format, or all the way to LaTeX.

Remove Hashes in R Output from R Markdown and Knitr

I am using RStudio to write my R Markdown files. How can I remove the hashes (##) in the final HTML output file that are displayed before the code output?
As an example:
---
output: html_document
---
```{r}
head(cars)
```
You can include in your chunk options something like
comment=NA # to remove all hashes
or
comment='%' # to use a different character
More help on knitr available from here: http://yihui.name/knitr/options
If you are using R Markdown as you mentioned, your chunk could look like this:
```{r comment=NA}
summary(cars)
```
If you want to change this globally, you can include a chunk in your document:
```{r include=FALSE}
knitr::opts_chunk$set(comment = NA)
```
Just HTML
If your output is just HTML, you can make good use of the PRE or CODE HTML tag.
Example
```{r my_pre_example,echo=FALSE,include=TRUE,results='asis'}
knitr::opts_chunk$set(comment = NA)
cat('<pre>')
print(t.test(mtcars$mpg,mtcars$wt))
cat('</pre>')
```
HTML Result:
Welch Two Sample t-test
data: mtcars$mpg and mtcars$wt
t = 15.633, df = 32.633, p-value < 0.00000000000000022
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
14.67644 19.07031
sample estimates:
mean of x mean of y
20.09062 3.21725
Just PDF
If your output is PDF, then you may need some replace function. Here what I am using:
```r
tidyPrint <- function(data) {
content <- paste0(data,collapse = "\n\n")
content <- str_replace_all(content,"\\t"," ")
content <- str_replace_all(content,"\\ ","\\\\ ")
content <- str_replace_all(content,"\\$","\\\\$")
content <- str_replace_all(content,"\\*","\\\\*")
content <- str_replace_all(content,":",": ")
return(content)
}
```
Example
The code also needs to be a little different:
```{r my_pre_example,echo=FALSE,include=TRUE,results='asis'}
knitr::opts_chunk$set(comment = NA)
resultTTest <- capture.output(t.test(mtcars$mpg,mtcars$wt))
cat(tidyPrint(resultTTest))
```
PDF Result
PDF and HTML
If you really need the page work in both cases PDF and HTML, the tidyPrint should be a little different in the last step.
```r
tidyPrint <- function(data) {
content <- paste0(data,collapse = "\n\n")
content <- str_replace_all(content,"\\t"," ")
content <- str_replace_all(content,"\\ ","\\\\ ")
content <- str_replace_all(content,"\\$","\\\\$")
content <- str_replace_all(content,"\\*","\\\\*")
content <- str_replace_all(content,":",": ")
return(paste("<code>",content,"</code>\n"))
}
```
Result
The PDF result is the same, and the HTML result is close to the previous, but with some extra border.
It is not perfect but maybe is good enough.