Knitting Rmarkdown to have AIC round to one's place while rounding regression coefficients and SE to tenth's place - r-markdown

I use the following function in the setup of Rmarkdown to make it so that in knitting everything rounds to two decimal places, but how can I alter the code to create a conditional such that for AIC (x>1000, for instance) it will round to the one's place?
Thanks!
Minimal reproducible example using mtcars data set. Looking at the effect of car weight on mpg with a random factor of cylinder. Make sure to knit everything below in Rmarkdown...if you just use the code in R, it will round the AIC.
---
output:
pdf_document: default
---
```{r echo = FALSE, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(scientific=FALSE)
scientific=FALSE
# knitr::clean_cache()
options(digits=3)
library(tidyverse)
library(lme4)
inline_hook <- function (x) {
if (is.numeric(x)) {
# ifelse does a vectorized comparison
# If integer, print without decimal; otherwise print three decimal places
res <- ifelse(x == round(x),
sprintf("%d", x),
sprintf("%.3f", x)
)
paste(res, collapse = ", ")
}
}
knitr::knit_hooks$set(inline = inline_hook)
mpg <- lmer(mpg~wt + (1|cyl), mtcars, na.action = 'na.exclude', control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE)
AIC(logLik(mpg))
coef(summary(mpg))[2]
```
AIC = `r AIC(logLik(mpg))`
Effect size = `r coef(summary(mpg))[2]`
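One possible way to get the conditional, sketched under assumptions: the hook below prints whole numbers and any value whose magnitude reaches a cutoff with zero decimals, and prints everything else with two decimals. The `big` argument and its default of 1000 are hypothetical, taken from the "x>1000" example in the question. Passing non-numeric inline values through as text is also an assumption about the desired behaviour.

```r
# Sketch of an inline hook with a size-dependent number of decimals.
# `big` is a hypothetical cutoff (not a knitr option).
inline_hook <- function(x, big = 1000) {
  if (!is.numeric(x)) {
    # Assumption: non-numeric inline results are printed unchanged
    return(paste(as.character(x), collapse = ", "))
  }
  # ifelse() is vectorized; "%.0f" is safe for doubles, unlike "%d"
  res <- ifelse(
    x == round(x) | abs(x) >= big,
    sprintf("%.0f", x),  # whole numbers and large values: ones place
    sprintf("%.2f", x)   # everything else: two decimal places
  )
  paste(res, collapse = ", ")
}

inline_hook(1234.567)
inline_hook(c(5, 0.5))
```

It would be registered the same way as in the question, with `knitr::knit_hooks$set(inline = inline_hook)`. The AIC in the mtcars example is well below 1000, so the cutoff would need lowering to see an effect there.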

Related

How to knit out table codes into table in R markdown

I am a basic-level learner of R. I am having a problem knitting out tables with a code my professor designed for the students. The code for table designs is set as below. I put this in my R markdown as below.
```{r, results="hide", message=FALSE, warning = FALSE, error = FALSE}
library(stargazer)
## my style latex summary of regression
jhp_report <- function(...){
output <- capture.output(stargazer(..., omit.stat=c("f", "ser")))
# The first three lines are the ones we want to remove...
output <- output[4:length(output)]
# cat out the results - this is essentially just what stargazer does too
cat(paste(output, collapse = "\n"), "\n")
}
```
After this, I tried printing this out with knitr.
```{r, message=FALSE, warning = FALSE, error = FALSE}
set.seed(1973)
N <- 100
x <- runif(N, 6, 20)
D <- rbinom(N, 1, .5)
y <- 1 + 0.5*x - .4*D + rnorm(N)
df.lm <- data.frame(y = y, x =x, D =D)
df.lm$D <- factor(df.lm$D, labels = c('Male', 'Female'))
##REGRESSION
reg.parallel <- lm(y ~ x + D, data = df.lm)
jhp_report(reg.parallel, title = "Result", label = "tab:D", dep.var.labels = "$y$")
```
As a result, instead of a table, the knitted document shows only the raw code. I would like to know how to set up R Markdown so that it prints the table instead of the code. This is what the result looks like when I knit it.
I expected there to be a setup option that prints the table, but I couldn't find the right one. Also, my assignment for class requires students to use this code. I did find other options like knitr::kable, but I would like to use the given code for this assignment.
Thank you in advance!

How to remove specific number of events in a tbl_regression (gtsummary package)

I'm using tbl_regression from the gtsummary package to show the results of Cox proportional hazards models. Because the data contain sensitive personal information, I am not allowed to show strata with fewer than 5 observations. I can still show the estimates, CIs etc. for those strata, but not how many persons have had the event if the number is less than 5. For strata with fewer than 5 events, I would like to insert just a line in the number-of-events column to indicate this.
From what I have read, the modify_table_body function is perhaps the correct function to achieve this. However, I cannot work out how to use it correctly. Is there any way to specify that the regression table should not show N_event values below 5 but still show HRs, CIs, person years etc. for those strata?
Below is my preliminary code, which I thought should perhaps be followed by "%>% modify_table_body()".
Thank you in advance for your help!
Best,
Mathilde
cox_cat_cns2 <- coxph(Surv(TTD_year, Dod_status) ~ Highest_Edu_Household + Diag_year + Age_household_mom_num + Age_household_dad_num + Country_origin_household, data = data_cox_cat_cns)
cox_cat_cns_adj_table <- tbl_regression(cox_cat_cns2,
label = c(Highest_Edu_Household ~ "Highest parental education",
Diag_year ~ "Year of diagnosis",
Age_household_mom_num ~ "Mother's age at diagnosis",
Age_household_dad_num ~ "Father's age at diagnosis",
Country_origin_household ~ "Parents' country of origin"),
exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
bold_labels() %>%
italicize_levels() %>%
modify_table_styling(
columns = estimate,
rows = reference_row %in% TRUE,
missing_symbol = "Ref.") %>%
modify_footnote(everything() ~ NA, abbreviation = TRUE) %>%
modify_table_styling(
column = p.value,
hide = TRUE) %>%
modify_header(
label = "",
stat_nevent = "**Events (N)**",
exposure ~ "**Person years**")
You can 1. define a new function to "style" the number of events that collapses any counts less than 5 as "<5", then 2. use that function to style the column in the resulting table. Example below!
library(gtsummary)
library(survival)
library(dplyr)
style_number5 <- function(x, ...) {
ifelse(
x < 5,
paste0("<", style_number(5, ...)),
style_number(x, ...)
)
}
style_number5(4:6)
#> [1] "<5" "5" "6"
tbl <-
trial %>%
slice(1:45) %>%
coxph(Surv(ttdeath, death) ~ stage, data = .) %>%
tbl_regression(exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
modify_fmt_fun(stat_nevent ~ style_number5)
Created on 2022-04-21 by the reprex package (v2.0.1)

Functions dplyr with rlang::last_error() in purrr::map loop in r

I'm using a function to calculate the length of linestring per cell by ID and store in a list, convert each element of the list into a RasterLayer and turn that list into a RasterStack, average all layers and get a single raster.
#function
# build_length_raster <- function(one_df) {
intersect_list <- by(
one_df ,
one_df$sub_id,
function(subid_df) sf::st_intersection(grid2, subid_df) %>%
dplyr::mutate(length = as.numeric(sf::st_length(.))) %>%
sf::st_drop_geometry()
)
list_length_grid <- purrr::map(intersect_list, function(x)
x %>% dplyr::left_join(x=grid2, by="cell", copy=T) %>%
dplyr::mutate(length=length) %>%
dplyr::mutate_if(is.numeric,coalesce,0)
)
list_length_raster <- purrr::map(list_length_grid, function(x)
raster::rasterize(x, r, field="length", na.rm=F, background=0)
)
list_length_raster2 <- unlist(list_length_raster, recursive=F)
raster_stack <- raster::stack(list_length_raster2)
raster_mean <- raster::stackApply(
raster_stack,
indices = rep(1,nlayers(raster_stack)),
fun = "mean", na.rm = TRUE)
#}
The function includes a step where, so that the grid returned by st_intersection() has the same number of cells as it had initially, I use left_join() on the "cell" column. Then I use mutate_if() with coalesce() to replace the NAs with 0. When I run the function steps for one dataframe from the list, it works perfectly, but when I put it inside map() to do this over a list, I get this error, which seems to refer to the dplyr functions:
final_list <- purrr::map(mylist, build_length_raster)
> rlang::last_error()
<error/rlang_error>
Join columns must be present in data.
x Problem with `cell`.
Backtrace:
1. purrr::map(mylist, build_length_raster)
15. dplyr:::left_join.data.frame(., x = grid, by = "cell", copy = T)
16. dplyr:::join_mutate(...)
17. dplyr:::join_cols(...)
18. dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
19. dplyr:::check_join_vars(by$y, y_names)
Run `rlang::last_trace()` to see the full context.
Is there a way to solve this problem?
MYDATA example
library(tidyverse)
library(sf)
library(purrr)
library(raster)
#data example
id <- c("844", "844", "844", "844", "844","844", "844", "844", "844", "844",
"844", "844", "845", "845", "845", "845", "845","845", "845", "845",
"845","845", "845", "845")
sub_id <- c("2017_844_1", "2017_844_1", "2017_844_1", "2017_844_1", "2017_844_2",
"2017_844_2", "2017_844_2", "2017_844_2", "2017_844_3", "2017_844_3",
"2017_844_3", "2017_844_3", "2017_845_1", "2017_845_1", "2017_845_1",
"2017_845_1", "2017_845_2","2017_845_2", "2017_845_2", "2017_845_2",
"2017_845_3","2017_845_3", "2017_845_3", "2017_845_3")
lat <- c(-30.6456, -29.5648, -27.6667, -31.5587, -30.6934, -29.3147, -23.0538,
-26.5877, -26.6923, -23.40865, -23.1143, -23.28331, -31.6456, -24.5648,
-27.6867, -31.4587, -30.6784, -28.3447, -23.0466, -27.5877, -26.8524,
-23.8855, -24.1143, -23.5874)
long <- c(-50.4879, -49.8715, -51.8716, -50.4456, -50.9842, -51.9787, -41.2343,
-40.2859, -40.19599, -41.64302, -41.58042, -41.55057, -50.4576, -48.8715,
-51.4566, -51.4456, -50.4477, -50.9937, -41.4789, -41.3859, -40.2536,
-41.6502, -40.5442, -41.4057)
df <- tibble(id = as.factor(id), sub_id = as.factor(sub_id), lat, long)
#converting to sf
df.sf <- df %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
#creating grid
xy <- sf::st_coordinates(df.sf)
grid = sf::st_make_grid(sf::st_bbox(df.sf),
cellsize = .1, square = FALSE) %>%
sf::st_as_sf()
#creating raster
r <- raster::raster(grid, res=0.1)
#return grid because raster function changes number of cells
grid2 <- rasterToPolygons(r, na.rm=F) %>%
st_as_sf() %>% mutate(cell=1:ncell(r))
#creating linestring to each sub_id
df.line <- df.sf %>%
dplyr::group_by(sub_id, id) %>%
dplyr::summarize() %>%
sf::st_cast("LINESTRING")
#creating ID list
mylist<- split(df.line, df.line$id)
#separating one dataframe of list to test function
one_df <- df.line[df.line$id=="844",]
one_df$id <- droplevels(one_df$id)
one_df$sub_id <- droplevels(one_df$sub_id)
The specific error is caused because intersect_list has empty items in the list, which cannot be joined because they are empty, and hence have no columns to join by. If you modified the map function to only use non-empty items of intersect_list you would not get that error.
As you noted in the comments, removing the empty list entries with keep(intersect_list, ~ !is.null(.)) before mapping left_join onto the list items will fix the error.
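As a minimal sketch of that filtering step, assuming a toy stand-in for intersect_list (the data frames and their values below are invented for illustration; only the names mimic the sub_id values in the question):

```r
library(purrr)

# Toy stand-in for intersect_list: one entry is NULL, as happens for a
# sub_id that has no rows in the data frame being processed.
intersect_list <- list(
  "2017_844_1" = data.frame(cell = c(1L, 2L), length = c(10.5, 3.2)),
  "2017_845_1" = NULL,
  "2017_844_2" = data.frame(cell = 2L, length = 7.1)
)

# Keep only the entries that actually hold a data frame
non_empty <- keep(intersect_list, ~ !is.null(.x))
names(non_empty)
```

Only the remaining entries have a `cell` column, so mapping left_join(by = "cell") over `non_empty` no longer hits an element without join columns.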
However, I don't think this is the most elegant way to solve this problem. I might misunderstand what the goal is, but if it's to produce a raster from the total length of lines within each grid cell, I think a simpler approach without using purrr might work.
This is not exactly the same as your product, but I'm keeping it simpler for now to illustrate an alternate approach. Here is a sum of the lengths in each cell as a stars object (similar to raster but plays better with the tidyverse and sf).
I'm starting off from your objects one_df and grid:
# Turn multiple lines into single MULTILINESTRING:
one_df %>%
st_union() ->
union_df
# Intersection of each grid cell with the MULTILINESTRING geometry:
grid %>%
st_intersection(union_df) ->
grid_lines
# Get lengths:
grid_lines %>%
mutate(length = st_length(x)) %>%
st_drop_geometry() ->
grid_lengths
# Join the calculated lengths back with the spatial grid,
# most of which will have NA for length
grid %>%
left_join(grid_lengths, by = "cell") ->
grid_with_lengths
# Rasterize the length field of the grid
grid_with_lengths %>%
dplyr::select(length) %>%
stars::st_rasterize() ->
length_stars
length_stars %>% mapview::mapview()

How can I split a table so that it appears side by side in R markdown?

I'm writing a document with R markdown and I'd like to include a table. The problem is that this table only has two columns and takes a full page, which does not look good. So my question is: is there a way to split this table in two and place the two "sub-tables" side by side with only one caption?
I use the kable command and I tried this solution (How to split kable over multiple columns?) but I could not do the cbind() command.
Here's my code to create the table:
---
title:
author:
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: pdf_document
indent: true
header-includes:
- \usepackage{indentfirst}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, echo = FALSE}
kable(aerop2, format = "markdown")
```
where aerop2 is my data frame with a list of country names in column 1 and the number of airports in each of these countries in column 2.
I have a long two-column table which is a waste of space. I would like to split this table in two sub-tables and put these sub-tables side by side with a caption that includes both of them.
This doesn't give a lot of flexibility in spacing, but here's one way to do it. I'm using the mtcars dataset as an example because I don't have aerop2.
---
output: pdf_document
indent: true
header-includes:
- \usepackage{indentfirst}
- \usepackage{booktabs}
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = TRUE)
```
The data are in Table \ref{tab:tables}, which will float to the top of the page.
```{r echo = FALSE}
rows <- seq_len(nrow(mtcars) %/% 2)
kable(list(mtcars[rows,1:2],
matrix(numeric(), nrow=0, ncol=1),
mtcars[-rows, 1:2]),
caption = "This is the caption.",
label = "tables", format = "latex", booktabs = TRUE)
```
This gives:
Note that without that zero-row matrix, the two parts are closer together. To increase the spacing more, put extra copies of the zero-row matrix into the list.
The solution offered by 'user2554330' was very useful.
As I needed to split in more columns and eventually more sections, I further developed the idea.
I also needed to have the tables after the text, not floating to the top. I found a way using kableExtra::kable_styling(latex_options = "hold_position").
I am writing here to share the development and to ask minor questions.
1 - Why did you add the line - \usepackage{indentfirst}?
2 - What is the effect of label = "tables" as kable() input?
(The questions are related to LaTeX. I probably know too little to understand the explanation in the kable() documentation: "label - The table reference label"!)
---
title: "Test-split.print"
header-includes:
- \usepackage{booktabs}
output:
pdf_document: default
html_document:
df_print: paged
---
```{r setup, include=FALSE}
suppressPackageStartupMessages(library(tidyverse))
library(knitr)
library(kableExtra)
split.print <- function(x, cols = 2, sects = 1, spaces = 1, caption = "", label = ""){
if (cols < 1) stop("cols must be at least 1!")
if (sects < 1) stop("sects must be at least 1!")
rims <- nrow(x) %% sects
nris <- (rep(nrow(x) %/% sects, sects) + c(rep(1, rims), rep(0, sects-rims))) %>%
cumsum() %>%
c(0, .)
for(s in 1:sects){
xs <- x[(nris[s]+1):nris[s+1], ]
rimc <- nrow(xs) %% cols
nric <- (rep(nrow(xs) %/% cols, cols) + c(rep(1, rimc), rep(0, cols-rimc))) %>%
cumsum() %>%
c(0, .)
lst <- NULL
spc <- NULL
for(sp in 1:spaces) spc <- c(spc, list(matrix(numeric(), nrow=0, ncol=1)))
for(c in 1:cols){
lst <- c(lst, list(xs[(nric[c]+1):nric[c+1], ]))
if (cols > 1 & c < cols) lst <- c(lst, spc)
}
kable(lst,
caption = ifelse(sects == 1, caption, paste0(caption, " (", s, "/", sects, ")")),
label = "tables", format = "latex", booktabs = TRUE) %>%
kable_styling(latex_options = "hold_position") %>%
print()
}
}
```
```{r, results='asis'}
airquality %>%
select(1:3) %>%
split.print(cols = 3, sects = 2, caption = "multi page table")
```
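The row-partitioning arithmetic used inside split.print (for both `nris` and `nric`) can be isolated as a small helper. This is only a sketch; `chunk_bounds` is a hypothetical name that does not appear in the code above.

```r
# Cumulative row boundaries for splitting n rows into k nearly equal parts.
chunk_bounds <- function(n, k) {
  extra <- n %% k                      # number of parts that get one extra row
  sizes <- rep(n %/% k, k) + c(rep(1, extra), rep(0, k - extra))
  c(0, cumsum(sizes))
}

bounds <- chunk_bounds(7, 3)
bounds
# Rows of part i run from bounds[i] + 1 to bounds[i + 1]
```

With fewer rows than parts, some parts are empty and an index range such as `(b + 1):b` runs backwards, so this scheme assumes at least as many rows as parts.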

How to exclude standard errors from stargazer table?

Amazing R gurus,
I am just wondering if there is any way to exclude standard errors from stargazer table.
Here is a quick reproducible example:
---
title: "Test regression"
output: html_document
date: "`r format(Sys.time(), '%d %B, %Y')`"
---
```{r setup, echo=FALSE, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(cache = TRUE)
rm(list=ls())
library(stargazer)
library(ggplot2)
```
```{r, results='asis', echo=FALSE}
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
stargazer(fit, title="Diamonds Regression",
single.row = TRUE, type ="html", header = FALSE, df=FALSE, digits=2, se = NULL)
```
I would like to see results without standard errors, as shown in the following screenshot.
Your time and help are much appreciated.
I just wanted to achieve the same thing, and found the report argument in the stargazer documentation, which can be used to control the elements shown (and their order) in the output table.
If used like this:
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
stargazer(fit, title="Diamonds Regression",
single.row = TRUE,
type ="html",
report = "vc*",
header = FALSE,
df=FALSE,
digits=2,
se = NULL
)
It produces the desired output without the need to capture the output first (or any other additional code).
Here is a simple way:
```{r, results='asis', echo=FALSE}
fit <- lm(price ~ carat + table + x + y + z, data = diamonds)
mytab <- capture.output(stargazer(fit, title="Diamonds Regression",
single.row = TRUE, type ="html", header = FALSE, df=FALSE,
digits=2,
apply.se = function(x) { 0 }))
cat(paste(gsub("\\(0.00\\)", "", mytab), collapse = "\n"), "\n")
```
We first capture the output of stargazer and suppress automatic printing. In stargazer we set all standard errors to 0 (which makes the following replacement more fail-safe). Lastly, we strip these zero standard errors from the captured output and print it.
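The replacement step on its own can be sketched on a single line of text. The string below is a hypothetical example of what one captured row might look like, not actual stargazer output; the pattern here also escapes the dot and removes the space in front of the parentheses.

```r
# Hypothetical line resembling one row of the captured HTML table,
# after apply.se has forced every standard error to 0.
row_html <- "<td>carat</td><td>10,686.31<sup>***</sup> (0.00)</td>"

# Escape the parentheses and the dot so only the literal "(0.00)" matches;
# the leading space is removed along with it.
cleaned <- gsub(" \\(0\\.00\\)", "", row_html)
cleaned
```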