Problem with line breaks (\n) in gtsummary functions

I have a problem including line breaks in arguments of gtsummary functions, such as the statistic argument of tbl_summary() or the update argument of modify_header(). It's a bit strange because this has always worked until now, and the package documentation indicates that this is the way to do it.
Here is a reproducible example:
## loading packages ##
library(dplyr)
library(gtsummary)
## gtsummary table ##
trial %>% tbl_summary(include = c("trt","stage","grade"),
by = "trt",
statistic = all_categorical() ~ "{p}% \n ({n})", # \n does not pass "({n})" to next line...
missing = "no") %>%
modify_header(update =list(all_stat_cols() ~ "**{level}** \n ({p}%, \n N = {n})"), # and here as well...
text_interpret = "md")
Does the problem come only from my computer? Could it be due to a recent package update?

Related

Knit a kable to PDF in RMarkdown that includes special characters in the table values

I'm trying to format a kable that includes the permyriad sign "‱". Permyriad means 1 out of every 10,000, so 1‱ = 0.01%.
I can get it to work with the special character σ, as in the screenshot and code below. I'm looking for a way to replace "σ" with "‱".
I am pretty sure that there exists a magical combination of the three variables what_should_this_be, should_escape_or_not, and id_like_to_use_booktabs that will do the trick.
I'm doing this within RStudio using the tinytex package.
Here's what I've attempted so far:
The exact value for the variable what_should_this_be that results in knitting the ‱ sign in the final pdf. The Unicode value for "‱" is U+2031.
Values I've tried:
Combinations of "\textperthousand" with varying numbers of escapes, with and without brackets, with and without opening & closing $
Copy-pasting the ‱ symbol directly with varying numbers of escapes
"\U2031" with varying numbers of escapes
Various combinations with should_escape_or_not set to TRUE or FALSE.
I'd like to use booktabs... but that might be asking a bit much, so I've tried various combinations setting id_like_to_use_booktabs to TRUE or FALSE.
Various combinations of setting the "Typeset LaTeX into PDF using:" option in RStudio > Tools > Sweave
```{r, echo = FALSE}
library(magrittr)
what_should_this_be <- "$\\sigma$"
should_escape_or_not <- FALSE
id_like_to_use_booktabs <- TRUE
knitr::kable(
head(mtcars) %>%
dplyr::select(mpg) %>%
tibble::rownames_to_column("car") %>%
dplyr::mutate(mpg = paste0(mpg, what_should_this_be)),
align = "cc",
escape = should_escape_or_not,
booktabs = id_like_to_use_booktabs,
caption = "Works with character $\\sigma$, but what about permyriad?"
)
```
You could use the textcomp package and \textpertenthousand:
---
output:
  pdf_document:
    latex_engine: xelatex
header-includes:
  - \usepackage{textcomp}
---
```{r, echo = FALSE}
library(magrittr)
what_should_this_be <- "\\textpertenthousand"
knitr::kable(
head(mtcars) %>%
dplyr::select(mpg) %>%
tibble::rownames_to_column("car") %>%
dplyr::mutate(mpg = paste0(mpg, what_should_this_be)),
align = "cc",
escape = FALSE,
booktabs = TRUE,
caption = "Works with character $\\sigma$, but what about permyriad?"
)
```

Extracting text after "?"

I have a string
x <- "Name of the Student? Michael Sneider"
I want to extract "Michael Sneider" out of it.
I have used:
str_extract_all(x,"[a-z]+")
str_extract_all(data,"\\?[a-z]+")
But can't extract the name.
I think this should help
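# "?" is a regex quantifier, so match it literally with fixed(); trimws() drops the leading space
trimws(substr(x, str_locate(x, fixed("?"))[, "start"] + 1, nchar(x)))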
Try this:
sub('.*\\?(.*)','\\1',x)
x <- "Name of the Student? Michael Sneider"
sub(pattern = ".+?\\?" , x , replacement = '' )
To take advantage of the loose wording of the question, we can go WAY overboard and use natural language processing to extract all names from the string:
library(openNLP)
library(NLP)
# you'll also have to install the models with the next line, if you haven't already
# install.packages('openNLPmodels.en', repos = 'http://datacube.wu.ac.at/', type = 'source')
s <- as.String(x) # convert x to NLP package's String object
# make annotators
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
entity_annotator <- Maxent_Entity_Annotator()
# call sentence and word annotators
s_annotated <- annotate(s, list(sent_token_annotator, word_token_annotator))
# call entity annotator (which defaults to "person") and subset the string
s[entity_annotator(s, s_annotated)]
## Michael Sneider
Overkill? Probably. But interesting, and not actually all that hard to implement, really.
str_match is more helpful in this situation
str_match(x, ".*\\?\\s(.*)")[, 2]
#[1] "Michael Sneider"
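For completeness, the same result can be had in base R without any packages. This is a minimal sketch that assumes the name is everything after the first question mark; after_q, parts and after_q_split are illustrative names, not from the posts above. It also hints at why the original attempts fall short: [a-z]+ matches only runs of lowercase letters, so it can never span the capital letters or the space in the name.

```r
x <- "Name of the Student? Michael Sneider"

# Drop everything up to and including the first "?", then trim whitespace.
# Inside a bracket expression "?" is literal, so [^?]* means "any run of non-? characters".
after_q <- trimws(sub("^[^?]*\\?", "", x))

# Alternative: split on a literal "?" and keep the second piece.
parts <- strsplit(x, "?", fixed = TRUE)[[1]]
after_q_split <- trimws(parts[2])

print(after_q)
print(after_q_split)
```

The sub() version keeps any later question marks in the result, while the strsplit() version would cut at each one, so the two only agree when the string contains a single "?".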

In regex, mystery Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634

Assume 900+ company names pasted together to form a regex pattern using the pipe separator -- "firm.pat".
firm.pat <- str_c(firms$firm, collapse = "|")
With a data frame called "bio" that has a large character variable (250 rows each with 100+ words) named "comment", I would like to replace all the company names with blanks. Both a gsub call and a str_replace_all call return the same mysterious error.
bio$comment <- gsub(pattern = firm.pat, x = bio$comment, replacement = "")
Error in gsub(pattern = firm.pat, x = bio$comment, replacement = "") :
assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
library(stringr)
bio$comment <- str_replace_all(bio$comment, firm.pat, "")
Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634
traceback() did not enlighten me.
> traceback()
4: gsub("aaronson rappaport|adams reese|adelson testan|adler pollock|ahlers cooney|ahmuty demers|akerman|akin gump|allen kopet|allen matkins|alston bird|alston hunt|alvarado smith|anderson kill|andrews kurth|archer
# hundreds of lines of company names omitted here
lties in all 50 states and washington, dc. results are compiled through a peer-review survey in which thousands of lawyers in the u.s. confidentially evaluate their professional peers."
), fixed = FALSE, ignore.case = FALSE, perl = FALSE)
3: do.call(f, compact(args))
2: re_call("gsub", string, pattern, replacement)
1: str_replace_all(bio$comment, firm.pat, "")
Three other posts on SO have mentioned this cryptic error (one in passing, which in turn cites two other oblique references), but none of them discusses it.
I know this question lacks reproducible code, but even so, how do I find out what the error is explaining? Even better, how do I avoid throwing the error? The error does not seem to occur with smaller numbers of companies but I can't detect a pattern or threshold. I am running Windows 8, RStudio, updated versions of every package.
Thank you.
I had the same problem with a pattern consisting of hundreds of manufacturers' names. My guess is that the pattern is too long, so I split it into two or more patterns, and that works well.
ml <- length(firms$firm)
half <- ceiling(ml / 2)  # split point; avoids skipping the middle name when ml is odd
xyz <- gsub(sprintf("(*UCP)\\b(%s)\\b", paste(head(firms$firm, n = half), collapse = "|")), "", bio$comment, perl = TRUE)
xyz <- gsub(sprintf("(*UCP)\\b(%s)\\b", paste(tail(firms$firm, n = ml - half), collapse = "|")), "", xyz, perl = TRUE)
You can use mgsub in the qdap package, which is an extension to gsub that handles vectors of patterns and replacements.
Please refer to this Answer
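The splitting idea generalises to any number of chunks. The sketch below is only an illustration under stated assumptions: firm_names and comments are hypothetical stand-ins for firms$firm and bio$comment, escape_regex and remove_in_chunks are made-up helper names, and the chunk size that actually avoids the assertion on a given machine is not known and would need to be found by trial. The error message names tre-compile.c, which belongs to TRE, R's default regex engine; perl = TRUE switches gsub to PCRE instead.

```r
# Hypothetical stand-ins for firms$firm and bio$comment
firm_names <- c("akin gump", "alston bird", "adams reese")
comments <- c("worked at akin gump for years",
              "alston bird and adams reese merged")

# Escape regex metacharacters so that names are matched literally
escape_regex <- function(s) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)
}

remove_in_chunks <- function(text, patterns, chunk_size = 100) {
  escaped <- escape_regex(patterns)
  # Group the patterns into blocks of at most chunk_size entries
  chunks <- split(escaped, ceiling(seq_along(escaped) / chunk_size))
  for (chunk in chunks) {
    pat <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    text <- gsub(pat, "", text, perl = TRUE)  # perl = TRUE uses PCRE, not TRE
  }
  text
}

cleaned <- remove_in_chunks(comments, firm_names, chunk_size = 2)
print(cleaned)
```

Removing a name leaves the surrounding spaces behind, so a follow-up pass such as gsub(" +", " ", cleaned) may be wanted if doubled spaces matter.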

Pattern matching in dataset

I've been struggling with this for a while.
I have a dataset with two columns: a Description column and a Pattern column that I am trying to match against the Description column. If the corresponding pattern exists in the Description column, it needs to be replaced by an asterisk.
For instance, if the Description is ABCDEisthedescription and the Pattern is ABCDE, then the new description should be *isthedescription.
I tried the following
data$NewDescription <- gsub(data$pattern, "\\*", data$Description)
Since there is more than one row in the dataset, it throws an error (a warning, rather):
"argument 'pattern' has length > 1 and only the first element will be used"
Any help will be hugely appreciated.
You can use an mapply here to apply the function to each row.
#sample data
data<-data.frame(
pattern=c("ABCDE","XYZ"),
Description=c("ABCDEisthedescription", "sillyXYZvalue")
)
Now use mapply
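# with fixed = TRUE the replacement is taken literally, so use "*" rather than "\\*"
mapply(function(p, d) gsub(p, "*", d, fixed = TRUE), data$pattern, data$Description, USE.NAMES = FALSE)
# [1] "*isthedescription" "silly*value"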
Additionally,
Patterns <- paste0(
sample(LETTERS[1:4],500,replace=TRUE),
sample(LETTERS[1:4],500,replace=TRUE),
sample(LETTERS[1:4],500,replace=TRUE),
sample(LETTERS[1:4],500,replace=TRUE))
##
Desc <- paste0(Patterns,"isthedescription")
Ptrn <- sample(Patterns,500)
##
Data <- data.frame(
Description=Desc,
Pattern=Ptrn,
stringsAsFactors=FALSE)
##
newDesc <- sapply(1:nrow(Data), function(X){
if(substr(Data$Description[X],1,4)==Data$Pattern[X]){
gsub(Data$Pattern[X],"*",Data$Description[X])
} else {
Data$Description[X]
}
})
#MrFlick's approach seems more concise though.

read table with spaces in one column

I am attempting to extract tables from very large text files (computer logs). Dickoa provided very helpful advice to an earlier question on this topic here: extracting table from text file
I modified his suggestion to fit my specific problem and posted my code at the link above.
Unfortunately I have encountered a complication. One column in the table contains spaces. These spaces are generating an error when I try to run the code at the link above. Is there a way to modify that code, or specifically the read.table function to recognize the second column below as a column?
Here is a dummy table in a dummy log:
> collect.models(, adjust = FALSE)
model npar AICc DeltaAICc weight Deviance
5 AA(~region + state + county + city)BB(~region + state + county + city)CC(~1) 17 11111.11 0.0000000 5.621299e-01 22222.22
4 AA(~region + state + county)BB(~region + state + county)CC(~1) 14 22222.22 0.0000000 5.621299e-01 77777.77
12 AA(~region + state)BB(~region + state)CC(~1) 13 33333.33 0.0000000 5.621299e-01 44444.44
12 AA(~region)BB(~region)CC(~1) 6 44444.44 0.0000000 5.621299e-01 55555.55
>
> # the three lines below count the number of errors in the code above
Here is the R code I am trying to use. This code works if there are no spaces in the second column, the model column:
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
x <- read.table(text=my.data, comment.char = ">")
I believe I must use the variables top and bottom to locate the table in the log because the log is huge, variable and complex. Also, not every table contains the same number of models.
Perhaps a regex expression could be used somehow taking advantage of the AA and the CC(~1) present in every model name, but I do not know how to begin. Thank you for any help and sorry for the follow-up question. I should have used a more realistic example table in my initial question. I have a large number of logs. Otherwise I could just extract and edit the tables by hand. The table itself is an odd object which I have only ever been able to export directly with capture.output, which would probably still leave me with the same problem as above.
EDIT:
All spaces seem to come right before and right after a plus sign. Perhaps that information can be used here to fill the spaces or remove them.
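Try inserting my.data <- gsub(" *\\+ *", "+", my.data) before read.table. Because my.data is a character vector returned by readLines (not a data frame), the substitution is applied to the lines themselves:
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
my.data <- gsub(" *\\+ *", "+", my.data)  # remove the spaces around every plus sign
x <- read.table(text=my.data, comment.char = ">")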
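The idea of collapsing the spaces around each plus sign before parsing can be tried without the log file. The following is a minimal sketch that assumes a shortened copy of the dummy log held in a character vector; log_lines and block are illustrative names, and the row labels are made unique here because read.table uses the first column as row names.

```r
# Shortened, hypothetical copy of the dummy log, one element per line
log_lines <- c(
  "> collect.models(, adjust = FALSE)",
  "model npar AICc DeltaAICc weight Deviance",
  "5 AA(~region + state)BB(~region + state)CC(~1) 13 33333.33 0.0000000 5.621299e-01 44444.44",
  "12 AA(~region)BB(~region)CC(~1) 6 44444.44 0.0000000 5.621299e-01 55555.55",
  ">",
  "> # the three lines below count the number of errors in the code above"
)

top <- "> collect.models\\(, adjust = FALSE\\)"
bottom <- "> # the three lines below"

# Keep only the lines between the two markers
block <- log_lines[grep(top, log_lines):grep(bottom, log_lines)]

# Remove the spaces around every plus sign so each model name is one field
block <- gsub(" *\\+ *", "+", block)

# Lines starting with ">" are treated as comments and skipped
x <- read.table(text = block, comment.char = ">", stringsAsFactors = FALSE)
print(x)
```

If the original spacing in the model names matters later, it can be restored afterwards with something like gsub("+", " + ", x$model, fixed = TRUE).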