How to generate conditional words in the text? (Inline code) - r-markdown

I want to have some conditional words in my R Markdown document. Depending on the outcome of some of the calculations from the tables, different words should show up in the ordinary text. Please see my example below:
The table (a chunk):
testtabell <- matrix(c(32, 33, 45, 67, 21, 56, 76, 33, 22), ncol=3,byrow = TRUE)
colnames(testtabell) <- c("1990", "1991", "1992")
rownames(testtabell) <- c("Region1", "Region2", "Region3")
testtabell <- as.table(testtabell)
testtabell
This should be in the inline code and generate different word options in the regular text flow in the RMD:
`r if testtabell[2,2]-[2,1] < testtabell[3,2]-testtabell[3,1] then type "under" or else "above"`

Although the question is ancient, maybe someone can use this solution.
You can use the following inline code:
`r ifelse(testtabell[2,2]-testtabell[2,1] < testtabell[3,2]-testtabell[3,1],"under","above")`
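Because the comparison yields a single TRUE or FALSE, a plain if/else expression is an alternative to ifelse(). The sketch below rebuilds the table from the question and stores the word in a variable first, which keeps the inline expression short; the names change_region2, change_region3 and direction are illustrative and not part of the original post.

```r
# Rebuild the table from the question
testtabell <- matrix(c(32, 33, 45, 67, 21, 56, 76, 33, 22), ncol = 3, byrow = TRUE)
colnames(testtabell) <- c("1990", "1991", "1992")
rownames(testtabell) <- c("Region1", "Region2", "Region3")

# Change from 1990 to 1991 for the two regions being compared
change_region2 <- testtabell["Region2", "1991"] - testtabell["Region2", "1990"]
change_region3 <- testtabell["Region3", "1991"] - testtabell["Region3", "1990"]

# A scalar condition, so a plain if/else is enough
direction <- if (change_region2 < change_region3) "under" else "above"

# In the prose of the .Rmd you would then write: `r direction`
```

Computing the word in a regular chunk also makes it easy to inspect or reuse the value elsewhere in the document.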

Is there a way to compare formatted value by NumberFormat?

I'm creating unit tests for a currency format using the latest version (0.16.1) of the intl package.
My formatter is: final currencyFormat = NumberFormat.simpleCurrency(locale: 'pt_BR');
My expected result is: var expected = r'R$ 1,00';. And I applied my formatter to value 1.0 in this way: var formatted = currencyFormat.format(1.0) resulting in R$ 1,00.
When I compare the two values using expect(formatted, expected), the test reports that they are not the same.
Expected: 'R$ 1,00'
Actual: 'R$ 1,00'
Which: is different.
Expected: R$ 1,00
Actual: R$ 1,00
^
Differ at offset 2
It took me some time to discover that the runes of the two strings differ by one character.
expected runes: (82, 36, 32, 49, 44, 48, 48)
formatted runes: (82, 36, 160, 49, 44, 48, 48)
My question is: if the formatter uses a non-breaking space, how can I avoid this error when no documentation mentions it?
So far, the only way I have found is to put the Unicode escape in my expected string: var expected = 'R\$\u{00A0}1,00';.
This is not a good solution, because if the app has many formatters and every test involves a non-breaking space, we have to put the Unicode escape into each expected value.

R-markdown reproducibility with set.seed

I wrote an R script to randomly assign participants for an RCT. I used set.seed() to ensure I would have reproducible results.
I now want to document what I have done in an R markdown document and confusingly I don't get the same results, despite using the same seed.
Here is the code chunk:
knitr::opts_chunk$set(cache = T)
set.seed(4321)
Group <- sample(1:3, 5, replace=TRUE)
couple.df <- data.frame(couple.id=1:5,
partner1=paste0("FRS0", c(35, 36, 41, 50, 61)),
partner2=paste0("FRS0", c(38, 37, 42, 51, 62)),
Group)
print(couple.df)
And here is the output I get when running it as a chunk:
couple.id partner1 partner2 Group
    <int>    <chr>    <chr> <int>
        1   FRS035   FRS038     2
        2   FRS036   FRS037     3
        3   FRS041   FRS042     2
        4   FRS050   FRS051     1
        5   FRS061   FRS062     3
(not sure how to get this to format)
This is the same as I had when I wrote the original code as an R script.
However, when I knit the markdown file I get the following output in my html document (sorry again about the formatting - I have just copied and pasted from the html document, adding in the ticks to format it as code, pointers on how to do this properly would also be welcome)
knitr::opts_chunk$set(cache = T)
set.seed(4321)
Group <- sample(1:3, 5, replace=TRUE)
couple.df <- data.frame(couple.id=1:5,
partner1=paste0("FRS0", c(35, 36, 41, 50, 61)),
partner2=paste0("FRS0", c(38, 37, 42, 51, 62)),
Group)
print(couple.df)
## couple.id partner1 partner2 Group
## 1 1 FRS035 FRS038 1
## 2 2 FRS036 FRS037 2
## 3 3 FRS041 FRS042 3
## 4 4 FRS050 FRS051 2
## 5 5 FRS061 FRS062 1
That is, they are different. What is going on here and how can I get the markdown document to give the same results? I am committed to using the allocation I arrived at using the original script.
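The question does not say which R versions or RNG settings were in play, so what follows is only a diagnostic sketch, not a confirmed explanation. It assumes the difference comes from the session (for example a different RNG configuration, or a stale result left by cache = T) rather than from the code itself. Running it both interactively and inside the knitted document, then comparing what is printed, is one way to narrow things down. The names session_info, first_draw and second_draw are illustrative.

```r
# Record the environment details that can affect random draws
session_info <- list(
  r_version = R.version.string,
  rng_kind  = RNGkind()  # generator, normal generator and (R >= 3.6.0) sampling method
)
print(session_info)

# Within a single session, the same seed should always give the same draws
set.seed(4321)
first_draw <- sample(1:3, 5, replace = TRUE)
set.seed(4321)
second_draw <- sample(1:3, 5, replace = TRUE)
print(identical(first_draw, second_draw))
```

If the R version or RNG kind differs between the two runs, that could account for different allocations from the same seed. If they match, the chunk cache may be worth ruling out next, for example by knitting once with cache = FALSE.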

Split parts of string defined by multiple delimiters into multiple variables in R

I have a large list of file names that I need to extract information from using R. The info is delimited by multiple dashes and underscores. I am having trouble figuring out a method that will accommodate the fact that the number of characters between delimiters is not consistent (the order of the information will remain constant, as will the delimiters used (hopefully)).
For example:
f <- data.frame(c("EI-SM4-AMW11_20160614_082800.wav", "PA-RF-A50_20160614_082800.wav"), stringsAsFactors = FALSE)
colnames(f)<-"filename"
f$area <- str_sub(f$filename, 1, 2)
f$rec <- str_sub(f$filename, 4, 6)
f$site <- str_sub(f$filename, 8, 12)
This produces correct results for the first file, but incorrect results for the second.
I've tried using the "stringr" and "stringi" packages, and know that hard coding the values in doesn't work, so I've come up with awkward solutions using both packages such as:
f$site <- str_sub(f$filename,
stri_locate_last(f$filename, fixed="-")[,1]+1,
stri_locate_first(f$filename, fixed="_")[,1]-1)
I feel like there must be a more elegant (and robust) method, perhaps involving regex (which I am painfully new to).
I've looked at other examples (Extract part of string (till the first semicolon) in R, R: Find the last dot in a string, Split string using regular expressions and store it into data frame).
Any suggestions/pointers would be very much appreciated.
Try this, from the tidyr package:
library(tidyr)
f %>% separate(filename, c('area', 'rec', 'site'), sep = '-')
You can also split along multiple different delimiters, like so:
f %>% separate(filename, c('area', 'rec', 'site', 'date', 'don_know_what_this_is', 'file_extension'), sep = '-|_|\\.')
and then keep only the columns you want using dplyr's select function:
library(dplyr)
library(tidyr)
f %>%
separate(filename,
c('area', 'rec', 'site', 'date',
'don_know_what_this_is', 'file_extension'),
sep = '-|_|\\.') %>%
select(area, rec, site)
Something like this:
library(stringr)
library(dplyr)
f$area <- word(f$filename, 1, sep = "-")
f$rec <- word(f$filename, 2, sep = "-")
f$site <- word(f$filename, 3, sep = "-") %>%
word(1,sep = "_")
dplyr is not necessary, but its pipe operator (%>%) makes chaining the calls cleaner.
The function word belongs to stringr.
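Since the question asks about a regex-based approach, here is a minimal base R sketch of the same idea without extra packages. It assumes, as the question states, that the order of the fields and the delimiters (dash, underscore, dot) stay constant, so every file name splits into the same number of pieces. The variable names are illustrative.

```r
filenames <- c("EI-SM4-AMW11_20160614_082800.wav",
               "PA-RF-A50_20160614_082800.wav")

# Split on any one of the three delimiters; inside [...] the dot is literal
pieces <- strsplit(filenames, "[-_.]")

# Each file name yields the same number of pieces, so they stack into a matrix
piece_matrix <- do.call(rbind, pieces)

f <- data.frame(filename = filenames,
                area = piece_matrix[, 1],
                rec  = piece_matrix[, 2],
                site = piece_matrix[, 3],
                stringsAsFactors = FALSE)
print(f)
```

If some file names could have a different number of fields, do.call(rbind, ...) would recycle values silently, so checking lengths(pieces) first would be a sensible safeguard.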

In regex, mystery Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634

Assume 900+ company names pasted together to form a regex pattern using the pipe separator -- "firm.pat".
firm.pat <- str_c(firms$firm, collapse = "|")
With a data frame called "bio" that has a large character variable (250 rows each with 100+ words) named "comment", I would like to replace all the company names with blanks. Both a gsub call and a str_replace_all call return the same mysterious error.
bio$comment <- gsub(pattern = firm.pat, x = bio$comment, replacement = "")
Error in gsub(pattern = firm.pat, x = bio$comment, replacement = "") :
assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
library(stringr)
bio$comment <- str_replace_all(bio$comment, firm.pat, "")
Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
traceback() did not enlighten me.
> traceback()
4: gsub("aaronson rappaport|adams reese|adelson testan|adler pollock|ahlers cooney|ahmuty demers|akerman|akin gump|allen kopet|allen matkins|alston bird|alston hunt|alvarado smith|anderson kill|andrews kurth|archer
# hundreds of lines of company names omitted here
lties in all 50 states and washington, dc. results are compiled through a peer-review survey in which thousands of lawyers in the u.s. confidentially evaluate their professional peers."
), fixed = FALSE, ignore.case = FALSE, perl = FALSE)
3: do.call(f, compact(args))
2: re_call("gsub", string, pattern, replacement)
1: str_replace_all(bio$comment, firm.pat, "")
Three other posts on SO mention this cryptic error: one makes a passing reference to it and cites two other oblique references, but none of them discusses it.
I know this question lacks reproducible code, but even so, how do I find out what the error means? Even better, how do I avoid triggering it? The error does not seem to occur with smaller numbers of companies, but I can't detect a pattern or threshold. I am running Windows 8 and RStudio, with updated versions of every package.
Thank you.
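I had the same problem with a pattern consisting of hundreds of manufacturers' names. My guess is that the pattern is too long, so I split it into two or more patterns, and that works well.
ml <- length(firms$firm)
half <- ceiling(ml / 2)  # explicit integer split point, so odd lengths are handled predictably
# Remove the first half of the names, then the remaining names, matching whole words only
xyz <- gsub(sprintf("(*UCP)\\b(%s)\\b", paste(head(firms$firm, n = half), collapse = "|")), "", bio$comment, perl = TRUE)
xyz <- gsub(sprintf("(*UCP)\\b(%s)\\b", paste(tail(firms$firm, n = ml - half), collapse = "|")), "", xyz, perl = TRUE)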
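The splitting idea above can be generalized to any number of chunks. The sketch below is one way to do that, under the assumption (not confirmed in the thread) that the error is related to pattern length. The function remove_in_chunks, its chunk_size argument and the small sample data are hypothetical stand-ins for firms$firm and bio$comment.

```r
# Hypothetical stand-ins for firms$firm and bio$comment
firm_names <- c("akin gump", "alston bird", "anderson kill",
                "andrews kurth", "akerman")
comments <- c("worked at akin gump and akerman",
              "formerly of alston bird")

# Remove every name from `text`, building one moderate-sized pattern per chunk.
# Note: names containing regex metacharacters would need escaping first.
remove_in_chunks <- function(text, patterns, chunk_size = 2) {
  chunks <- split(patterns, ceiling(seq_along(patterns) / chunk_size))
  for (chunk in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

cleaned <- remove_in_chunks(comments, firm_names)
print(cleaned)
```

With real data, a chunk_size in the low hundreds would be a reasonable starting point to experiment with; the right value is something to find by trial.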
You can use mgsub in the qdap package, which is an extension to gsub that handles vectors of patterns and replacements.
Please refer to this Answer

How to measure the similarity between three vectors?

Suppose I have three students and their subject marks.
Student 1 (12, 23, 43, 35, 21)
Student 2 (23, 34, 45, 25, 17) and
Student 3 (34, 43, 22, 11, 39)
Now I want to measure the similarity between these three students. Can anyone help me with this? Thanks in advance.
You want similarity, not dissimilarity. The latter is available in numerous functions, some noted in the comments. The most commonly used metric for dissimilarity is Euclidean distance.
To measure similarity, you could use the simil(...) function in the proxy package in R, as shown below. Assuming that the scores are in the same order for each student, you would combine the scores into a matrix row-wise, then:
Student.1 <- c(12, 23, 43, 35, 21)
Student.2 <- c(23, 34, 45, 25, 17)
Student.3 <- c(34, 43, 22, 11, 39)
students <- rbind(Student.1,Student.2,Student.3)
library(proxy)
simil(students,method="Euclidean")
#            Student.1  Student.2
# Student.2 0.04993434
# Student.3 0.02075985 0.02593140
This calculates the Euclidean distance for every student vs. every other student, and converts that to a similarity score using
sim = 1 / (1+dist)
So if the scores for two students are identical, their similarity will be 1.
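As a cross-check of the conversion above, the same numbers can be reproduced in base R without the proxy package. This is a sketch that assumes Euclidean distance and the sim = 1 / (1 + dist) conversion just described; the names distances and similarity are illustrative.

```r
Student.1 <- c(12, 23, 43, 35, 21)
Student.2 <- c(23, 34, 45, 25, 17)
Student.3 <- c(34, 43, 22, 11, 39)
students <- rbind(Student.1, Student.2, Student.3)

# Pairwise Euclidean distances as a full square matrix (base R stats::dist)
distances <- as.matrix(dist(students, method = "euclidean"))

# Convert each distance to a similarity; identical rows get similarity 1
similarity <- 1 / (1 + distances)
print(round(similarity, 8))
```

The off-diagonal entries should agree with the simil() output shown earlier, and the diagonal is 1 because each student is at distance 0 from themselves.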
But this is only one way to do it. There are 48 similarity/distance metrics coded in the proxy package, which can be listed using:
pr_DB$get_entries()
You can even code your own metric, using, e.g.,
simil(students,FUN=f)
where f(x,y) is a function that takes two vectors as arguments and returns a similarity score defined as you like. This might be relevant if, for example, some courses were "more important" in the sense that you wanted to weight differences wrt those courses more highly than the others.
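To make the weighting idea concrete, here is a base R sketch of one possible custom metric: a weighted Euclidean distance converted to a similarity with the same 1 / (1 + dist) rule. The weights in course_weights and the function name weighted_sim are hypothetical, and the pairwise loop is plain base R rather than the proxy package's own mechanism.

```r
students <- rbind(Student.1 = c(12, 23, 43, 35, 21),
                  Student.2 = c(23, 34, 45, 25, 17),
                  Student.3 = c(34, 43, 22, 11, 39))

# Hypothetical weights: the first course counts twice as much as the others
course_weights <- c(2, 1, 1, 1, 1)

# Similarity from a weighted Euclidean distance between two score vectors
weighted_sim <- function(x, y, w = course_weights) {
  1 / (1 + sqrt(sum(w * (x - y)^2)))
}

# Fill a square matrix with the similarity of every pair of students
n <- nrow(students)
sim <- matrix(1, n, n, dimnames = list(rownames(students), rownames(students)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- weighted_sim(students[i, ], students[j, ])
  }
}
print(round(sim, 4))
```

With all weights set to 1, weighted_sim() reduces to the unweighted Euclidean similarity used earlier, which gives a simple sanity check before trying other weights.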