How to remove specific number of events in a tbl_regression (gtsummary package) - cox-regression

I'm using tbl_regression from the gtsummary package to show the results of Cox proportional hazards models. Because the data contain sensitive personal information, I am not allowed to show strata with fewer than 5 observations. I can still show the estimates, CIs etc. for those strata, but not how many persons have had the event if the number is less than 5. In the number-of-events cells for strata with fewer than 5 observations, I would like to insert just a line to indicate this.
From what I have read, modify_table_body is perhaps the correct function to achieve this. However, I cannot work out how to use it correctly. Is there any way to specify that the regression table should not show N_event values less than 5 but still show HRs, CIs, person-years etc. for those strata?
Below is my preliminary code, which I thought should maybe be followed by "%>% modify_table_body()".
Thank you in advance for your help!
Best,
Mathilde
cox_cat_cns2 <- coxph(Surv(TTD_year, Dod_status) ~ Highest_Edu_Household + Diag_year + Age_household_mom_num + Age_household_dad_num + Country_origin_household, data = data_cox_cat_cns)
cox_cat_cns_adj_table <- tbl_regression(cox_cat_cns2,
label = c(Highest_Edu_Household ~ "Highest parental education",
Diag_year ~ "Year of diagnosis",
Age_household_mom_num ~ "Mother's age at diagnosis",
Age_household_dad_num ~ "Father's age at diagnosis",
Country_origin_household ~ "Parents' country of origin"),
exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
bold_labels() %>%
italicize_levels() %>%
modify_table_styling(
columns = estimate,
rows = reference_row %in% TRUE,
missing_symbol = "Ref.") %>%
modify_footnote(everything() ~ NA, abbreviation = TRUE) %>%
modify_table_styling(
column = p.value,
hide = TRUE) %>%
modify_header(
label = "",
stat_nevent = "**Events (N)**",
exposure ~ "**Person years**")

You can 1. define a new function to "style" the number of events that collapses any counts less than 5 as "<5", then 2. use that function to style the column in the resulting table. Example below!
library(gtsummary)
#> #Uighur
library(survival)
library(dplyr)
style_number5 <- function(x, ...) {
ifelse(
x < 5,
paste0("<", style_number(5, ...)),
style_number(x, ...)
)
}
style_number5(4:6)
#> [1] "<5" "5" "6"
tbl <-
trial %>%
slice(1:45) %>%
coxph(Surv(ttdeath, death) ~ stage, data = .) %>%
tbl_regression(exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
modify_fmt_fun(stat_nevent ~ style_number5)
Created on 2022-04-21 by the reprex package (v2.0.1)
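Since the question asks for "just a line" rather than "<5", the same idea can be sketched with a plain masking function. This is only an illustration: mask_small_counts and its threshold and mask arguments are hypothetical names, not part of gtsummary, and the function uses base R only.

```r
# Hypothetical helper (not a gtsummary function): replace counts below
# `threshold` with a placeholder string and leave other counts as text.
mask_small_counts <- function(x, threshold = 5, mask = "-") {
  ifelse(x < threshold, mask, as.character(x))
}

masked <- mask_small_counts(c(3, 5, 12, NA))
print(masked)
#> [1] "-"  "5"  "12" NA
```

It would be passed to the table the same way as style_number5 above, i.e. modify_fmt_fun(stat_nevent ~ mask_small_counts).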


Functions dplyr with rlang::last_error() in purrr::map loop in r

I'm using a function to calculate the length of linestring per cell by ID and store in a list, convert each element of the list into a RasterLayer and turn that list into a RasterStack, average all layers and get a single raster.
#function
# build_length_raster <- function(one_df) {
intersect_list <- by(
one_df ,
one_df$sub_id,
function(subid_df) sf::st_intersection(grid2, subid_df) %>%
dplyr::mutate(length = as.numeric(sf::st_length(.))) %>%
sf::st_drop_geometry()
)
list_length_grid <- purrr::map(intersect_list, function(x)
x %>% dplyr::left_join(x=grid2, by="cell", copy=T) %>%
dplyr::mutate(length=length) %>%
dplyr::mutate_if(is.numeric,coalesce,0)
)
list_length_raster <- purrr::map(list_length_grid, function(x)
raster::rasterize(x, r, field="length", na.rm=F, background=0)
)
list_length_raster2 <- unlist(list_length_raster, recursive=F)
raster_stack <- raster::stack(list_length_raster2)
raster_mean <- raster::stackApply(
raster_stack,
indices = rep(1,nlayers(raster_stack)),
fun = "mean", na.rm = TRUE)
#}
The function includes a step where, so that the grid resulting from st_intersection() has the same number of cells as it had initially, I use left_join() by the "cell" column. Then I use mutate() to replace the NAs with 0. When I run the function steps for one dataframe from the list, it works perfectly, but when I put it inside map() to do this for a list, I get this error, which seems to refer to the dplyr functions:
final_list <- purrr::map(mylist, build_length_raster)
> rlang::last_error()
<error/rlang_error>
Join columns must be present in data.
x Problem with `cell`.
Backtrace:
1. purrr::map(mylist, build_length_raster)
15. dplyr:::left_join.data.frame(., x = grid, by = "cell", copy = T)
16. dplyr:::join_mutate(...)
17. dplyr:::join_cols(...)
18. dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
19. dplyr:::check_join_vars(by$y, y_names)
Run `rlang::last_trace()` to see the full context.
Is there a way to solve this problem?
MYDATA example
library(tidyverse)
library(sf)
library(purrr)
library(raster)
#data example
id <- c("844", "844", "844", "844", "844","844", "844", "844", "844", "844",
"844", "844", "845", "845", "845", "845", "845","845", "845", "845",
"845","845", "845", "845")
sub_id <- c("2017_844_1", "2017_844_1", "2017_844_1", "2017_844_1", "2017_844_2",
"2017_844_2", "2017_844_2", "2017_844_2", "2017_844_3", "2017_844_3",
"2017_844_3", "2017_844_3", "2017_845_1", "2017_845_1", "2017_845_1",
"2017_845_1", "2017_845_2","2017_845_2", "2017_845_2", "2017_845_2",
"2017_845_3","2017_845_3", "2017_845_3", "2017_845_3")
lat <- c(-30.6456, -29.5648, -27.6667, -31.5587, -30.6934, -29.3147, -23.0538,
-26.5877, -26.6923, -23.40865, -23.1143, -23.28331, -31.6456, -24.5648,
-27.6867, -31.4587, -30.6784, -28.3447, -23.0466, -27.5877, -26.8524,
-23.8855, -24.1143, -23.5874)
long <- c(-50.4879, -49.8715, -51.8716, -50.4456, -50.9842, -51.9787, -41.2343,
-40.2859, -40.19599, -41.64302, -41.58042, -41.55057, -50.4576, -48.8715,
-51.4566, -51.4456, -50.4477, -50.9937, -41.4789, -41.3859, -40.2536,
-41.6502, -40.5442, -41.4057)
df <- tibble(id = as.factor(id), sub_id = as.factor(sub_id), lat, long)
#converting to sf
df.sf <- df %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
#creating grid
xy <- sf::st_coordinates(df.sf)
grid = sf::st_make_grid(sf::st_bbox(df.sf),
cellsize = .1, square = FALSE) %>%
sf::st_as_sf()
#creating raster
r <- raster::raster(grid, res=0.1)
#return grid because raster function changes number of cells
grid2 <- rasterToPolygons(r, na.rm=F) %>%
st_as_sf() %>% mutate(cell=1:ncell(r))
#creating linestring to each sub_id
df.line <- df.sf %>%
dplyr::group_by(sub_id, id) %>%
dplyr::summarize() %>%
sf::st_cast("LINESTRING")
#creating ID list
mylist<- split(df.line, df.line$id)
#separating one dataframe of list to test function
one_df <- df.line[df.line$id=="844",]
one_df$id <- droplevels(one_df$id)
one_df$sub_id <- droplevels(one_df$sub_id)
The error occurs because intersect_list contains empty (NULL) items, which have no columns to join by, so left_join() cannot find `cell`. If you modify the map call to use only the non-empty items of intersect_list, you will not get that error.
As you noted in the comments, removing the empty list entries with keep(intersect_list, ~ !is.null(.)) before mapping left_join onto the list items will fix the error.
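A minimal base-R sketch of that filtering step, using a small made-up list in place of the real intersect_list (the data frames and element names are illustrative assumptions). Filter(Negate(is.null), ...) does the same job as the purrr::keep() call. One likely reason the single-data-frame test worked is that one_df had droplevels() applied to sub_id, so by() had no empty groups to return as NULL.

```r
# Stand-in for intersect_list: by() leaves NULL for groups with no rows.
intersect_list <- list(
  a = data.frame(cell = 1:2, length = c(10, 20)),
  b = NULL,
  c = data.frame(cell = 3L, length = 5)
)

# Drop the NULL entries before mapping the join over the list.
non_empty <- Filter(Negate(is.null), intersect_list)

print(names(non_empty))
#> [1] "a" "c"
```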
However, I don't think this is the most elegant way to solve this problem. I might misunderstand what the goal is, but if it's to produce a raster from the total length of lines within each grid cell, I think a simpler approach without using purrr might work.
This is not exactly the same as your product, but I'm keeping it simple for now to illustrate an alternative approach. Here is the sum of the lengths in each cell as a stars object (similar to a raster, but it plays better with the tidyverse and sf).
I'm starting off from your objects one_df and grid:
# Turn multiple lines into single MULTILINESTRING:
one_df %>%
st_union() ->
union_df
# Intersection of each grid cell with the MULTILINESTRING geometry:
grid %>%
st_intersection(union_df) ->
grid_lines
# Get lengths:
grid_lines %>%
mutate(length = st_length(x)) %>%
st_drop_geometry() ->
grid_lengths
# Join the calculated lengths back with the spatial grid,
# most of which will have NA for length
grid %>%
left_join(grid_lengths, by = "cell") ->
grid_with_lengths
# Rasterize the length field of the grid
grid_with_lengths %>%
dplyr::select(length) %>%
stars::st_rasterize() ->
length_stars
length_stars %>% mapview::mapview()

Remove_Column from a kable table which will be output as latex/pdf

I am trying to remove two columns from the table below; however, when it is knitted to PDF from R Markdown, I get the error below. I am unsure whether this just isn't possible or whether there is a workaround.
ERROR:
Error in remove_column(., 4:5) :
Removing columns was not implemented for latex kables yet
Calls: ... kable_classic -> kable_light -> kable_styling -> remove_column
Execution halted
OverCortTab <- bind_rows(AlexOverCortTab$table_body,OptOverCortTab$table_body)%>%
mutate(Predictor=str_replace(Predictor,"TAS_Tot","Alexithymia"),
Predictor=str_replace(Predictor,"OPT_Tot","Optimism"))%>%
mutate(Fit=str_replace(Fit,"95%","95\\\\%"),
Fit=str_replace(Fit,"R2","R$2$"),
Fit=lead(Fit,n=2))%>%
filter(Predictor !=""|Fit !="")%>%
kable(caption = "Table 1: Hair Cortisol and Personality",
booktabs=TRUE,
col.names = c("Predictor","$\\beta$","$\\beta$ 95\\% CI","$\\beta$","$\\beta$ 95\\% CI","sr2","sr2 95\\% CI","r","Fit"),
escape=FALSE) %>%
remove_column(4:5)%>%
kable_classic(font_size=12)%>%
footnote(general=c("A Significant $\\beta$-weight indicates the semi-partial correlation is also significant.","$\\beta$ represents unstandardised regression weights.","sr2 represents the semi-partial correlation squared.","Square brackets are used to enclose the lower and upper limits of the confidence interval.","* indicates p < .05", "** indicates p < .01"),escape=FALSE) %>%
group_rows("Alexithymia",1,2,hline_before=TRUE,hline_after=TRUE,underline=TRUE)%>%
group_rows("Optimism",3,4,hline_before=TRUE,hline_after=TRUE,underline=TRUE)
Instead of removing the column inside the kbl function, you could try removing it before.
This could work, by piping select:
OverCortTab <- bind_rows(AlexOverCortTab$table_body,OptOverCortTab$table_body)%>%
mutate(Predictor=str_replace(Predictor,"TAS_Tot","Alexithymia"),
Predictor=str_replace(Predictor,"OPT_Tot","Optimism"))%>%
mutate(Fit=str_replace(Fit,"95%","95\\\\%"),
Fit=str_replace(Fit,"R2","R$2$"),
Fit=lead(Fit,n=2))%>%
filter(Predictor !=""|Fit !="")%>%
select(-c(4:5))%>%
kable(caption = "Table 1: Hair Cortisol and Personality",
booktabs=TRUE,
# col.names lists only the 7 columns that remain after select(-c(4:5))
col.names = c("Predictor","$\\beta$","$\\beta$ 95\\% CI","sr2","sr2 95\\% CI","r","Fit"),
escape=FALSE) %>%
kable_classic(font_size=12)%>%
footnote(general=c("A Significant $\\beta$-weight indicates the semi-partial correlation is also significant.","$\\beta$ represents unstandardised regression weights.","sr2 represents the semi-partial correlation squared.","Square brackets are used to enclose the lower and upper limits of the confidence interval.","* indicates p < .05", "** indicates p < .01"),escape=FALSE) %>%
group_rows("Alexithymia",1,2,hline_before=TRUE,hline_after=TRUE,underline=TRUE)%>%
group_rows("Optimism",3,4,hline_before=TRUE,hline_after=TRUE,underline=TRUE)

print gtsummary table with print() with wrong footnote format

In my Rmarkdown report, I have to put a gtsummary table under an if statement, which requires print() to explicitly print it. However, the printout does not have the right format for the footnote. What did I miss, and how do I fix this?
Code:
stats_summary_table <-
dat_wide %>%
tbl_summary(by = id,
missing = "no",
digits = list(all_continuous() ~ c(0, 0, 1, 1, 3)),
type = list(all_numeric() ~ "continuous"),
statistic = list(all_continuous() ~
"{min} ~ {max} {mean} ± {sd} [{cv}]")) %>%
modify_footnote(starts_with("stat_") ~ "Range and mean±SD [cv]")
if (n_distinct(dat_wide$id) > 1) {
stats_summary_table <- stats_summary_table %>%
add_stat(
fns = everything() ~ cal_cv,
fmt_fun = NULL,
header = "**%CV**",
footnote = "Coefficient of variation") %>%
add_p()
}
print(stats_summary_table)
implicit printout by stats_summary_table.
explicit printout by print(stats_summary_table).

Reactive not displaying appropriate graphs with working data filtering

server code:
silver_state <- fread("./Data/silver_state.csv")
silver <- silver_state %>% arrange(total_drug_cost)
state_cast <- reactive({
if(input$sort == "alphabetical"){
silver <- silver
}
else if(input$sort == "descending"){
silver <- silver_state %>% arrange(desc(total_drug_cost))
silver$nppes_provider_state <- factor(silver$nppes_provider_state,
levels = silver$nppes_provider_state[order(silver$total_drug_cost)])
}
else{
silver <- silver_state %>% arrange(total_drug_cost)
silver$nppes_provider_state <- factor(silver$nppes_provider_state,
levels = silver$nppes_provider_state[order(silver$total_drug_cost)])
}
})
output$compare <- renderPlot({
ggplot(silver) +
geom_bar(aes(x = nppes_provider_state, y = total_drug_cost), position
= position_stack(reverse = TRUE), stat = "identity") +
coord_flip() +
labs(title = "Total Cost of Drugs per State", y = "Total Drug Cost",
x = "State")
})
}
shinyServer(my.server)
The data filtering runs fine on its own; however, the inputs do not seem to be passed through correctly. It must have something to do with how we are structuring the reactive function. Could it have anything to do with using multiple tabs? Thank you.
state_cast is not used anywhere and shouldn't really exist. It looks like it's being abused as a side-effect-only function. Just move its contents into renderPlot().
Additionally, you have a silver <- silver that doesn't seem to do anything.
I also recommend you use the Reindent Lines and Reformat Code buttons, because the indentation in the state_cast makes it a bit difficult to read.
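As a rough sketch of that restructuring, the sorting logic can live in a plain function that renderPlot() calls with input$sort. The helper name order_states, the demo data, and the assumption that the third sort option is called "ascending" are all hypothetical; the sketch also assumes one row per state.

```r
# Hypothetical helper: return the data with the state column as a factor
# whose level order follows the requested sort.
order_states <- function(df, sort_by = c("alphabetical", "descending", "ascending")) {
  sort_by <- match.arg(sort_by)
  states <- as.character(df$nppes_provider_state)
  lvls <- switch(
    sort_by,
    alphabetical = sort(unique(states)),
    descending   = states[order(df$total_drug_cost, decreasing = TRUE)],
    ascending    = states[order(df$total_drug_cost)]
  )
  df$nppes_provider_state <- factor(states, levels = unique(lvls))
  df
}

# Made-up data for illustration only
demo <- data.frame(
  nppes_provider_state = c("WA", "CA", "NV"),
  total_drug_cost = c(20, 30, 10)
)
desc_levels <- levels(order_states(demo, "descending")$nppes_provider_state)
print(desc_levels)
#> [1] "CA" "WA" "NV"

# Inside the server function (not run here), the plot would then depend on
# the input directly:
# output$compare <- renderPlot({
#   ggplot(order_states(silver_state, input$sort)) + ...
# })
```

Because input$sort is read inside renderPlot(), the plot re-renders whenever the selection changes, which is what the unused state_cast reactive never achieved.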

gsub in columns value in dataframe

I have a file with multiple columns. I am showing only the two columns I am interested in:
Probe.Set.ID Entrez.Gene
A01157cds_s_at 50682
A03913cds_s_at 29366
A04674cds_s_at 24860 /// 100909612
A07543cds_s_at 24867
A09811cds_s_at 25662
---- ----
A16585cds_s_at 25616
I need to replace /// with "\t"(tab) and the output should be like
A01157cds_s_at;50682
A03913cds_s_at;29366
A04674cds_s_at;24860 100909612
Also, I need to avoid the ones with "---"
Here is a slightly different approach using dplyr:
data <- data.frame(Probe.Set.ID = c("A01157cds_s_at",
"A03913cds_s_at",
"A04674cds_s_at",
"A07543cds_s_at",
"A09811cds_s_at",
"----",
"A16585cds_s_at"),
Entrez.Gene = c("50682",
"29366",
"24860 /// 100909612",
"24867",
"25662",
"----",
"25616")
)
if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
data %>%
filter(Entrez.Gene != "----") %>%
mutate(new_column = paste(Probe.Set.ID,
gsub("///", "\t", Entrez.Gene),
sep = ";"
)
) %>% select(new_column)
Looks like you will want to subset the data, then paste the two columns together, then use gsub to replace the '///'. Here is what I came up with, with dat being the dataframe containing the two columns.
dat = dat[dat$Probe.Set.ID != "----",] # removes the rows with "----"
dat = paste0(dat$Probe.Set.ID, ";", dat$Entrez.Gene) # pastes the columns together and adds the ";"
dat = gsub("///","\t",dat) # replaces the "///" with a tab
Also, use cat() to see the tab rendered, as opposed to a literal "\t". I got that from here: How to replace specific characters of a string with tab in R. Note that dat is now a character vector rather than a data.frame. You can convert it back with data.frame(), but then you cannot use cat() to view it.
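A self-contained sketch of the same steps on a three-row stand-in for the data (the rows are copied from the question; the variable names keep and out are just for illustration). One small variation, flagged as an assumption about the desired output: the pattern here is " /// " with fixed = TRUE, so the spaces around the slashes are removed along with them.

```r
# Small stand-in for the real data
dat <- data.frame(
  Probe.Set.ID = c("A01157cds_s_at", "A04674cds_s_at", "----"),
  Entrez.Gene  = c("50682", "24860 /// 100909612", "----"),
  stringsAsFactors = FALSE
)

# Drop the "----" rows, swap the separator for a tab, then paste with ";"
keep <- dat[dat$Probe.Set.ID != "----", ]
out <- paste0(
  keep$Probe.Set.ID, ";",
  gsub(" /// ", "\t", keep$Entrez.Gene, fixed = TRUE)
)

# cat() renders the tab; print() would show it as "\t"
cat(out, sep = "\n")
```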
We can use dplyr and tidyr here.
library(dplyr)
library(tidyr)
df <- data.frame(
col1 = c('A01157cds_s_at', 'A03913cds_s_at', 'A04674cds_s_at', 'A07543cds_s_at', '----'),
col2 = c('50682', '29366', '24860 /// 100909612', '24867', '----'))
# store the result so that printing it shows the transformed data
df_new <- df %>% filter(col1 != '----') %>%
separate(col2, c('col2_first', 'col2_second'), '///', remove = T) %>%
unite(col1_new, c(col1, col2_first), sep = ';', remove = T)
df_new
## col1_new col2_second
## 1 A01157cds_s_at;50682 <NA>
## 2 A03913cds_s_at;29366 <NA>
## 3 A04674cds_s_at;24860 100909612
## 4 A07543cds_s_at;24867 <NA>
filter removes the observations with col1 == '----'.
separate splits col2 into two columns, namely col2_first and col2_second
unite concatenates col1 and col2_first with ; as separator.