Printing a gtsummary table with print() gives the wrong footnote format - r-markdown

In my Rmarkdown report, I have to put a gtsummary table under an if statement, which requires print() to explicitly print it. However, the printout does not have the right format for the footnote. What did I miss, and how do I fix this?
Code:
stats_summary_table <-
  dat_wide %>%
  tbl_summary(by = id,
              missing = "no",
              digits = list(all_continuous() ~ c(0, 0, 1, 1, 3)),
              type = list(all_numeric() ~ "continuous"),
              statistic = list(all_continuous() ~
                                 "{min} ~ {max} {mean} ± {sd} [{cv}]")) %>%
  modify_footnote(starts_with("stat_") ~ "Range and mean±SD [cv]")
if (n_distinct(dat_wide$id) > 1) {
  stats_summary_table <- stats_summary_table %>%
    add_stat(
      fns = everything() ~ cal_cv,
      fmt_fun = NULL,
      header = "**%CV**",
      footnote = "Coefficient of variation") %>%
    add_p()
}
print(stats_summary_table)
[Screenshot: implicit printout by stats_summary_table]
[Screenshot: explicit printout by print(stats_summary_table)]

Related

How to create a crosstab with variable labels for PDF output in R markdown

I would like to make a table in R markdown that prints a crosstabulation of two variables and includes the variable name above it and on the left side. Also, I need to print this to a PDF so I require code that is compatible with kable("latex").
Reproducible example:
library(knitr)
library(kableExtra)
set.seed(143)
x <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
y <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
table(x, y) %>%
  kable("latex") %>%
  pack_rows("X", 1, 2) %>%
  add_header_above(c(" ", "Y" = 2))
This gives the following output:
However, I would like it to look like this (mocked up in Word):
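One possible approach, offered as an untested suggestion rather than a confirmed answer: reshape the table into a data frame whose first column holds the row-variable name, then let kableExtra merge that column vertically with `collapse_rows()` and put a spanning "Y" header over the counts with `add_header_above()`. The column names `var` and `level` are hypothetical, and the merged cell relies on the LaTeX `multirow` package, which kableExtra normally declares itself in an R Markdown PDF document.

```r
set.seed(143)
x <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
y <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
tab <- table(x, y)

# One row per level of x, led by a (hypothetical) column naming the row variable
tab_df <- data.frame(var = "X",
                     level = rownames(tab),
                     as.data.frame.matrix(tab),
                     row.names = NULL, check.names = FALSE,
                     stringsAsFactors = FALSE)

if (requireNamespace("kableExtra", quietly = TRUE)) {
  latex_tab <- knitr::kable(tab_df, format = "latex", booktabs = TRUE,
                            col.names = c("", "", colnames(tab)))
  # Merge the repeated "X" cells into one vertical cell
  latex_tab <- kableExtra::collapse_rows(latex_tab, columns = 1)
  # Span "Y" over the two count columns; the first two header cells stay blank
  latex_tab <- kableExtra::add_header_above(latex_tab, c(" " = 2, "Y" = 2))
  latex_tab
}
```

The reshaping step is plain base R, so only the last three calls depend on kableExtra.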

How to remove specific number of events in a tbl_regression (gtsummary package)

I'm using tbl_regression from the gtsummary package to show the results of Cox proportional hazards models. Because the data contain sensitive personal information, I am not allowed to show strata with fewer than 5 observations. I can still show the estimates, CIs, etc. for those strata, but not how many persons had the event when that number is below 5. For those strata, I would like to show just a line in place of the number of events.
From what I have read, modify_table_body() may be the right function for this, but I cannot work out how to use it correctly. Is there a way to make the regression table hide N_event values below 5 while still showing the HRs, CIs, person-years, etc. for those strata?
Below is my preliminary code, which I thought should perhaps be followed by "%>% modify_table_body()".
Thank you in advance for your help!
Best,
Mathilde
cox_cat_cns2 <- coxph(Surv(TTD_year, Dod_status) ~ Highest_Edu_Household + Diag_year +
                        Age_household_mom_num + Age_household_dad_num +
                        Country_origin_household,
                      data = data_cox_cat_cns)
cox_cat_cns_adj_table <-
  tbl_regression(
    cox_cat_cns2,
    label = c(Highest_Edu_Household ~ "Highest parental education",
              Diag_year ~ "Year of diagnosis",
              Age_household_mom_num ~ "Mother's age at diagnosis",
              Age_household_dad_num ~ "Father's age at diagnosis",
              Country_origin_household ~ "Parents' country of origin"),
    exponentiate = TRUE) %>%
  add_nevent(location = "level") %>%
  bold_labels() %>%
  italicize_levels() %>%
  modify_table_styling(
    columns = estimate,
    rows = reference_row %in% TRUE,
    missing_symbol = "Ref.") %>%
  modify_footnote(everything() ~ NA, abbreviation = TRUE) %>%
  modify_table_styling(
    columns = p.value,
    hide = TRUE) %>%
  modify_header(
    label = "",
    stat_nevent = "**Events (N)**",
    exposure ~ "**Person years**")
You can 1. define a new function to "style" the number of events that collapses any counts less than 5 as "<5", then 2. use that function to style the column in the resulting table. Example below!
library(gtsummary)
library(survival)
library(dplyr)
style_number5 <- function(x, ...) {
  ifelse(
    x < 5,
    paste0("<", style_number(5, ...)),
    style_number(x, ...)
  )
}
style_number5(4:6)
#> [1] "<5" "5" "6"
tbl <-
  trial %>%
  slice(1:45) %>%
  coxph(Surv(ttdeath, death) ~ stage, data = .) %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_nevent(location = "level") %>%
  modify_fmt_fun(stat_nevent ~ style_number5)
Created on 2022-04-21 by the reprex package (v2.0.1)
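The collapsing idea can also be tried outside gtsummary. Below is a minimal base-R sketch under stated assumptions: the helper `mask_small_counts()` is hypothetical, `formatC()` stands in for gtsummary's `style_number()`, and the `min_n` threshold argument is an illustrative addition, not part of the answer above.

```r
# Hypothetical helper: counts below `min_n` become "<min_n", others are formatted
mask_small_counts <- function(x, min_n = 5) {
  formatted <- formatC(x, format = "d", big.mark = ",")  # integer display, e.g. "1,500"
  ifelse(x < min_n, paste0("<", min_n), formatted)
}

mask_small_counts(c(3, 5, 12, 1500))
```

In the gtsummary answer, `style_number5()` plays this role and is attached to the `stat_nevent` column with `modify_fmt_fun()`, so only the displayed counts change while the estimates and CIs stay as they are.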

How to exclude standard errors from stargazer table?

Amazing R gurus,
I am just wondering if there is any way to exclude standard errors from stargazer table.
Here is a quick reproducible example:
---
title: "Test regression"
output: html_document
date: "`r format(Sys.time(), '%d %B, %Y')`"
---
```{r setup, echo=FALSE, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(cache = TRUE)
rm(list=ls())
library(stargazer)
library(ggplot2)
```
```{r, results='asis', echo=FALSE}
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
stargazer(fit, title="Diamonds Regression",
single.row = TRUE, type ="html", header = FALSE, df=FALSE, digits=2, se = NULL)
```
I would like to see the results without standard errors, as shown in the following screenshot.
Your time and help is much appreciated.
I wanted to achieve the same thing and found the report argument in the stargazer documentation, which controls which elements are shown in the output table (and in what order).
If used like this:
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
stargazer(fit, title="Diamonds Regression",
single.row = TRUE,
type ="html",
report = "vc*",
header = FALSE,
df=FALSE,
digits=2,
se = NULL
)
It produces the desired output without the need to capture the output first (or any other additional code).
Here is a simple way:
```{r, results='asis', echo=FALSE}
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
mytab <- capture.output(stargazer(fit, title="Diamonds Regression",
single.row = TRUE, type ="html", header = FALSE, df=FALSE,
digits=2,
apply.se = function(x) { 0 }))
cat(paste(gsub("\\(0.00\\)", "", mytab), collapse = "\n"), "\n")
```
We first capture the output of stargazer with capture.output(), which also suppresses automatic printing. In stargazer, we set all standard errors to 0 via apply.se (this makes the following replacement more fail-safe). Lastly, we remove these "(0.00)" standard errors with gsub() and print the result with cat().
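To see what the final gsub() step does, here is a small sketch on made-up strings; the exact HTML that stargazer emits may differ. In the pattern "\\(0.00\\)" the dot is unescaped, so it matches any single character; escaping it as well, as below, restricts the match to a literal "(0.00)".

```r
# Made-up lines standing in for captured stargazer HTML output
mytab <- c("<tr><td>carat</td><td>1.23<sup>***</sup> (0.00)</td></tr>",
           "<tr><td>table</td><td>-4.56<sup>***</sup> (0.00)</td></tr>")

# Remove only the literal "(0.00)" placeholders left by apply.se
cleaned <- gsub("\\(0\\.00\\)", "", mytab)
cat(paste(cleaned, collapse = "\n"), "\n")
```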

Table of contents on xaringan slides?

I'm wondering whether it is possible to add a slide showing the table of contents to a presentation created with the xaringan package? Thanks.
Update
The following lines generate an outline automatically.
Unfortunately, I do not know how to make RStudio reload a file automatically. If anyone knows how, please feel free to comment or answer in the following question.
The code searches for all level-1 headers and for the "# Outline" header. It then inserts the headers as a list below "# Outline" and overwrites the current file. After reloading the file, you have an outline.
---
```{r, echo=FALSE}
library(magrittr)
# Path of the .Rmd file currently open in the RStudio source editor
file_name <- rstudioapi::getSourceEditorContext()[["path"]]
doc <- toc <- readLines(file_name)
tocc <- character()
# Collect the text of every level-1 header ("# Title")
for (i in seq_along(toc)) {
  if (substr(toc[i], 1, 2) == "# ") {
    toc[i] <- gsub("# ", "", toc[i], fixed = TRUE) %>%
      gsub("#", "", ., fixed = TRUE)
    tocc <- append(tocc, toc[i])
  }
}
# Drop the first header found and format the rest as a Markdown list
tocc <- paste0("- ", tocc[-1])
# Replace everything between "# Outline" and the next slide separator
row_outline <- which(doc == "# Outline")
row_body <- which(doc == "---")
row_body <- row_body[which(row_body > row_outline)][1]
doc <- c(doc[1:row_outline], "\n", tocc, "\n", doc[(row_body):length(doc)])
writeLines(doc, file_name)
```
# Outline
---
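The header-collecting loop in the script above can also be written without a loop. A minimal sketch, assuming the same convention that a level-1 header is any line starting with "# "; the sample lines are made up. Comment lines inside R chunks that start with "# " would match as well (in the loop too), so the `in_chunk` filter below is one illustrative way to skip them.

```r
fence <- strrep("`", 3)  # chunk delimiter, built here to avoid literal backticks
# Made-up lines from an .Rmd file
rmd <- c("---", "# Introduction", "Some text", paste0(fence, "{r}"),
         "# a code comment", "x <- 1", fence, "---", "## Subsection", "# Results")

# Each fence line toggles "inside a chunk"; fence lines themselves are excluded
is_fence <- startsWith(rmd, fence)
in_chunk <- (cumsum(is_fence) %% 2 == 1) & !is_fence

# Keep level-1 headers outside chunks and turn them into Markdown list items
headers <- rmd[startsWith(rmd, "# ") & !in_chunk]
toc_lines <- paste0("- ", sub("^#\\s+", "", headers))
toc_lines
```

Writing `toc_lines` back between "# Outline" and the next "---" would then follow the same steps as the original script.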
Old Post
Since Sébastien Rochette's comment did not work for me, I created a quick-and-dirty solution in R.
I can think of much nicer solutions, but this one was quick to put together. I hope to update my answer in the near future.
---
```{r, echo=FALSE}
library(magrittr)
toc <- readLines("presentation.Rmd")
tocc <- character()
# Collect level-1 headers, stripped of "#" and prefixed with one space
for (i in seq_along(toc)) {
  if (substr(toc[i], 1, 2) == "# ") {
    toc[i] <- gsub("# ", "", toc[i], fixed = TRUE) %>%
      gsub("#", "", ., fixed = TRUE) %>%
      paste0(" ", .)
    tocc <- append(tocc, toc[i])
  }
}
toc_text <- paste(tocc[-1], "\n")
# Evenly spaced vertical positions, from top to bottom
yy <- seq(.9, 0, length.out = length(toc_text))
```
# Outline
```{r, echo = FALSE}
# Draw one bullet point per header, with the header text next to it
plot(x = rep(0.2, length(toc_text)), y = yy * 1.035,
     xlim = c(0, 1), ylim = c(-0.1, 1), xlab = "", ylab = "", axes = FALSE,
     col = "#056EA7", type = "p", pch = 16)
text(x = 0.2, y = yy, labels = toc_text, adj = 0, col = "black")
```
---
---

Splitting string columns FAST in R

I have a data frame with 107 columns and 745,000 rows (much bigger than in my example).
I have character columns that I want to split, because each value seems to end with a type-like suffix.
I want to separate these suffixes into new columns.
I have written my own solution, but it seems far too slow for iterating through all 745,000 rows 53 times.
Here is my solution, embedded in code with some arbitrary data:
set.seed(1)
code_1 <- paste0(round(runif(5000, 100000, 999999)), "_", round(runif(1000, 1, 15)))
code_2 <- sample(c(paste0(round(runif(10, 100000, 999999)), "_", round(runif(10, 1, 15))), NA), 5000, replace = TRUE)
code_3 <- sample(c(paste0(round(runif(3, 100000, 999999)), "_", round(runif(3, 1, 15))), NA), 5000, replace = TRUE)
code_4 <- sample(c(paste0(round(runif(1, 100000, 999999)), "_", round(runif(1, 1, 15))), NA), 5000, replace = TRUE)
code_type_1 <- rep(NA, 5000)
code_type_2 <- rep(NA, 5000)
code_type_3 <- rep(NA, 5000)
code_type_4 <- rep(NA, 5000)
df <- data.frame(cbind(code_1,
code_2,
code_3,
code_4,
code_type_1,
code_type_2,
code_type_3,
code_type_4),
stringsAsFactors = FALSE)
df_new <- data.frame(code_1 = character(),
code_2 = character(),
code_3 = character(),
code_4 = character(),
code_type_1 = character(),
code_type_2 = character(),
code_type_3 = character(),
code_type_4 = character(),
stringsAsFactors = FALSE)
for (i in 1:4) {
  i_t <- i + 4
  temp <- strsplit(df[, c(i)], "[_]")
  for (j in 1:nrow(df)) {
    df_new[c(j), c(i)] <- unlist(temp[j])[1]
    df_new[c(j), c(i_t)] <- ifelse(is.na(unlist(temp[j])[1]), NA, unlist(temp[j])[2])
  }
  print(i)
}
for (i in 1:8) {
  df_new[, c(i)] <- factor(df_new[, c(i)])
}
Does anyone have ideas on how to speed this up?
First, we pre-allocate the results data.frame to its final length. This is very important; see The R Inferno, Circle 2. Then we vectorize the inner loop. We also use fixed = TRUE in strsplit to avoid regular-expression matching.
system.time({
  df_new1 <- data.frame(code_1 = character(nrow(df)),
                        code_2 = character(nrow(df)),
                        code_3 = character(nrow(df)),
                        code_4 = character(nrow(df)),
                        code_type_1 = character(nrow(df)),
                        code_type_2 = character(nrow(df)),
                        code_type_3 = character(nrow(df)),
                        code_type_4 = character(nrow(df)),
                        stringsAsFactors = FALSE)
  for (i in 1:4) {
    i_t <- i + 4
    temp <- do.call(rbind, strsplit(df[, c(i)], "_", fixed = TRUE))
    df_new1[, i] <- temp[, 1]
    df_new1[, i_t] <- ifelse(is.na(temp[, 1]), NA, temp[, 2])
  }
  df_new1[] <- lapply(df_new1, factor)
})
# user system elapsed
# 0.029 0.000 0.029
all.equal(df_new, df_new1)
#[1] TRUE
Of course, there are ways to make this even faster, but this is close to your original approach and should be sufficient.
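One caveat with `do.call(rbind, strsplit(...))`: `rbind()` silently recycles shorter pieces, so a value that splits into a single element fills both columns. For `NA` this is harmless here because the `ifelse()` catches it, but a value such as "123456" with no suffix would also end up in the type column. A minimal sketch of a vectorized alternative based on `sub()`, assuming the suffix is everything after the first "_"; the helper `split_code()` and the small data frame are hypothetical:

```r
# Hypothetical helper: split "prefix_suffix" strings into two character vectors
split_code <- function(x) {
  has_sep <- !is.na(x) & grepl("_", x, fixed = TRUE)
  prefix <- sub("_.*$", "", x)                      # everything before the first "_"
  suffix <- ifelse(has_sep, sub("^[^_]*_", "", x),  # everything after the first "_"
                   NA_character_)                   # NA when there is no "_"
  list(prefix = prefix, suffix = suffix)
}

# Small stand-in for the question's data (made-up values)
df_demo <- data.frame(code_1 = c("123456_3", NA, "654321"),
                      code_2 = c("111111_12", "222222_1", NA),
                      stringsAsFactors = FALSE)

for (col in c("code_1", "code_2")) {
  parts <- split_code(df_demo[[col]])
  df_demo[[col]] <- parts$prefix
  df_demo[[sub("code_", "code_type_", col, fixed = TRUE)]] <- parts$suffix
}
df_demo
```

Each column is processed in a single vectorized call, so the loop only runs once per column rather than once per row.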
Here's another way, using gsub() inside a custom function in combination with purrr::dmap(), which is equivalent to lapply() but returns a data.frame instead of a list. (In later purrr releases, dmap() was moved to the purrrlyr package.)
library(purrr)
# Define function which gets rid of everything after and including "_"
replace01 <- function(df, ptrn = "_.*")
  dmap(df[, 1:4], gsub, pattern = ptrn, replacement = "")
# Because "pattern" is an argument, we can change it to get the 2nd part, then cbind()
test <- cbind(replace01(df),
              replace01(df, ptrn = ".*_"))
Note that the output here consists of character columns; you can always convert them to factors if you like.
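If dmap() is not available, the same idea works in base R: `lapply()` applies `gsub()` to each column and `as.data.frame()` turns the resulting list back into a data frame. The function `replace01_base()` and the small example data are hypothetical.

```r
# Base-R stand-in for the dmap() call: gsub() on each of the first 4 columns
replace01_base <- function(df, ptrn = "_.*") {
  as.data.frame(lapply(df[, 1:4], gsub, pattern = ptrn, replacement = ""),
                stringsAsFactors = FALSE)
}

# Made-up data with the same column layout as the question
df_demo <- data.frame(code_1 = c("123456_3", NA),
                      code_2 = c("111111_12", "222222_1"),
                      code_3 = c(NA, "333333_7"),
                      code_4 = c("444444_2", "555555_14"),
                      stringsAsFactors = FALSE)

prefixes <- replace01_base(df_demo)                # part before "_"
suffixes <- replace01_base(df_demo, ptrn = ".*_")  # part after "_"
cbind(prefixes, suffixes)
```

As in the answer above, the result has character columns, and `NA` values pass through `gsub()` unchanged.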
Another possibility:
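library(stringi)
setNames(do.call(rbind.data.frame, lapply(1:nrow(df), function(i) {
  # Split the 4 code columns of row i at the first "_" into a 4 x 2 matrix
  x <- stri_split_fixed(unlist(df[i, 1:4]), "_", 2, simplify = TRUE)
  y <- c(x[, 1], x[, 2])
  y[y == ""] <- NA
  y
})), colnames(df)) -> df_new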
or, replacing the empty strings with NA after the loop instead of inside it:
setNames(do.call(rbind.data.frame, lapply(1:nrow(df), function(i) {
  x <- stri_split_fixed(unlist(df[i, 1:4]), "_", 2, simplify = TRUE)
  c(x[, 1], x[, 2])
})), colnames(df)) -> df_new
df_new[df_new == ""] <- NA
df_new
The second version (na_after) is marginally faster:
Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval cld
 na_after 669.8357 718.1301 724.8803 723.5521 732.9998 790.1405    10   a
 na_inner 719.3362 738.1569 766.4267 762.1594 791.6198 825.0269    10   b