I am trying to create a table that combines features from formattable with kableExtra. I have found a number of examples which have helped, but none of them quite does everything I'm trying to achieve.
This is what I've tried so far:
library(kableExtra)
library(formattable)
library(dplyr)  # provides mutate() and select() used below
df <- structure(list(Income_source = c("A", "B", "C", "C"), Jul = c(1777.01,
0.13, 9587.39, 11364.53), Aug = c(0, 0.09, 9908.78, 9908.87),
Sep = c(5374.6, 0.03, 9859.87, 15234.5)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
Here is an example of the formattable function I'd like to apply. Note that color_tile is applied to each row separately:
formattable(df, lapply(1:nrow(df), function(row) {
area(row, col = 2:ncol(df)) ~ color_tile("transparent", "pink")  # numeric columns only; column 1 is Income_source
}))
The example I found which lets me combine Formattable with KableExtra looks like this:
df %>%
mutate(Jul = formattable::color_tile("transparent", "pink")(Jul),
Aug = formattable::color_tile("transparent", "pink")(Aug),
Sep = formattable::color_tile("transparent", "pink")(Sep)) %>%
select(Income_source, everything()) %>%
kable("html", escape = F,format.args = list(big.mark = ",",scientific = FALSE)) %>%
kable_classic(full_width = T, html_font = "Cambria") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T)
The problems with this solution are:
1: The color_tile function is applied to columns rather than rows
2: The numeric values drop the commas
The table I'm planning on generating would be updated monthly so that next month the data for October would be presented, followed by November and so forth. As such I'm hoping for a solution that doesn't require me to edit the script each time i.e. mutate the new data. Hopefully that makes sense.
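One possible direction, as a minimal sketch rather than a tested answer: build the shaded cells by hand in base R, scaling the colour within each row and formatting the numbers with commas at the same time. The helper name row_tile is made up for this illustration, it is not part of formattable or kableExtra, and it shades from white to pink rather than from "transparent". Because it picks up every numeric column automatically, a new month's column would not need a new mutate() line.

```r
# Sample data shaped like the question's df (plain data.frame for simplicity)
df <- data.frame(
  Income_source = c("A", "B", "C", "C"),
  Jul = c(1777.01, 0.13, 9587.39, 11364.53),
  Aug = c(0, 0.09, 9908.78, 9908.87),
  Sep = c(5374.6, 0.03, 9859.87, 15234.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): shade numeric cells row by row,
# from white (row minimum) to `high` (row maximum), keeping thousands separators.
row_tile <- function(df, high = c(255, 192, 203)) {
  is_num <- vapply(df, is.numeric, logical(1))
  cols <- names(df)[is_num]
  m <- as.matrix(df[is_num])

  # Position of each value within its own row's range, between 0 and 1
  lo <- apply(m, 1, min)
  hi <- apply(m, 1, max)
  w <- (m - lo) / ifelse(hi > lo, hi - lo, 1)

  # Interpolate each RGB channel between white (255) and the target colour
  shade <- function(ch) round(255 + w * (high[ch] - 255))
  css <- sprintf("rgb(%d, %d, %d)", shade(1), shade(2), shade(3))

  # Format numbers with commas, then wrap them in a styled span
  txt <- formatC(as.vector(m), format = "f", digits = 2, big.mark = ",")
  cells <- sprintf(
    '<span style="display: block; padding: 0 4px; border-radius: 4px; background-color: %s">%s</span>',
    css, txt
  )
  cell_mat <- matrix(cells, nrow = nrow(m))

  out <- df
  for (k in seq_along(cols)) out[[cols[k]]] <- cell_mat[, k]
  out
}

tiled <- row_tile(df)
```

The result holds HTML strings, so it would be passed to kable("html", escape = FALSE) followed by the same kable_classic() and kable_styling() calls as in the pipeline above.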
I would like to make a table in R markdown that prints a crosstabulation of two variables and includes the variable name above it and on the left side. Also, I need to print this to a PDF so I require code that is compatible with kable("latex").
Reproducible example:
set.seed(143)
x <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
y <- sample(x = c("yes", "no"), size = 20, replace = TRUE)
table(x,y) %>%
kable("latex") %>%
pack_rows("X", 1, 2) %>%
add_header_above(c(" ", "Y" = 2))
Which gives the following output:
However I would like it to look like this (created in Word for example):
I'm using tbl_regression from the gtsummary package to show the results of Cox proportional hazards models. Because the data contain sensitive personal information, I am not allowed to show strata with fewer than 5 observations. I can, however, still show the estimates, CIs etc. for those strata, just not how many persons have had the event if the number is less than 5. For strata where the number of events is below 5, I would like to show just a placeholder instead of the count.
From what I have read, the modify_table_body function is perhaps the correct function to achieve this. However, I cannot work out how to use it correctly. Is there any way to specify that the regression table should not show N_event less than 5 but still show HRs, CIs, person years etc. for those strata?
Below is my preliminary code, which I thought might need to be followed by "%>% modify_table_body()".
Thank you in advance for your help!
Best,
Mathilde
cox_cat_cns2 <- coxph(Surv(TTD_year, Dod_status) ~ Highest_Edu_Household + Diag_year + Age_household_mom_num + Age_household_dad_num + Country_origin_household, data = data_cox_cat_cns)
cox_cat_cns_adj_table <- tbl_regression(cox_cat_cns2,
label = c(Highest_Edu_Household ~ "Highest parental education",
Diag_year ~ "Year of diagnosis",
Age_household_mom_num ~ "Mother's age at diagnosis",
Age_household_dad_num ~ "Father's age at diagnosis",
Country_origin_household ~ "Parents' country of origin"),
exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
bold_labels() %>%
italicize_levels() %>%
modify_table_styling(
columns = estimate,
rows = reference_row %in% TRUE,
missing_symbol = "Ref.") %>%
modify_footnote(everything() ~ NA, abbreviation = TRUE) %>%
modify_table_styling(
column = p.value,
hide = TRUE) %>%
modify_header(
label = "",
stat_nevent = "**Events (N)**",
exposure ~ "**Person years**")
You can 1. define a new function to "style" the number of events that collapses any counts less than 5 as "<5", then 2. use that function to style the column in the resulting table. Example below!
library(gtsummary)
#> #Uighur
library(survival)
library(dplyr)
style_number5 <- function(x, ...) {
ifelse(
x < 5,
paste0("<", style_number(5, ...)),
style_number(x, ...)
)
}
style_number5(4:6)
#> [1] "<5" "5" "6"
tbl <-
trial %>%
slice(1:45) %>%
coxph(Surv(ttdeath, death) ~ stage, data = .) %>%
tbl_regression(exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
modify_fmt_fun(stat_nevent ~ style_number5)
Created on 2022-04-21 by the reprex package (v2.0.1)
As you can see in the picture below, there is a gap between the note and the table content.
The code I used is below. Any ideas to get rid of the gap?
kable(summarize(df.sum, group = "Experiment", test = T, digits = 1, show.NAs = F),
row.names = F, caption = 'Summary Statistics for Treated and Control Groups',
booktabs = T) %>% kable_styling(latex_options = c('striped', 'hold_position')) %>%
footnote(general = 'DM8OZ indicates the daily max 8-hour ozone concentration;
Daily_PM2.5 is the daily average of PM2.5; Tavg is the daily average temperature;
Prcp is the daily accumulated precipitation. The last column in the table represents the testing results of null
hypotheses that the treated and control groups are not statistically different. ',
footnote_as_chunk = T, threeparttable = T, fixed_small_size = T)
I have a file with multiple columns. I am showing the two columns I am interested in:
Probe.Set.ID Entrez.Gene
A01157cds_s_at 50682
A03913cds_s_at 29366
A04674cds_s_at 24860 /// 100909612
A07543cds_s_at 24867
A09811cds_s_at 25662
---- ----
A16585cds_s_at 25616
I need to replace /// with "\t"(tab) and the output should be like
A01157cds_s_at;50682
A03913cds_s_at;29366
A04674cds_s_at;24860 100909612
Also, I need to avoid the ones with "---"
Here is a slightly different approach using dplyr:
data <- data.frame(Probe.Set.ID = c("A01157cds_s_at",
"A03913cds_s_at",
"A04674cds_s_at",
"A07543cds_s_at",
"A09811cds_s_at",
"----",
"A16585cds_s_at"),
Entrez.Gene = c("50682",
"29366",
"24860 /// 100909612",
"24867",
"25662",
"----",
"25616")
)
if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
data %>%
filter(Entrez.Gene != "----") %>%
mutate(new_column = paste(Probe.Set.ID,
gsub("///", "\t", Entrez.Gene),
sep = ";"
)
) %>% select(new_column)
Looks like you will want to subset the data, then paste the two columns together, then use gsub to replace the '///'. Here is what I came up with, with dat being the data frame containing the two columns.
dat = dat[dat$Probe.Set.ID != "----",] # removes the rows with "---"
dat = paste0(dat$Probe.Set.ID, ";", dat$Entrez.Gene) # pastes the columns together and adds the ";"
dat = gsub("///","\t",dat) # replaces the "///" with a tab
Also, use cat() to view the tab as an actual tab rather than as "\t". I got that from here: How to replace specific characters of a string with tab in R. Note that this produces a character vector rather than a data.frame. You can convert back with data.frame(), but then you cannot use cat() on it directly.
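To make those steps easy to try, here is a small self-contained sketch using a cut-down copy of the question's data. The variable names keep and out are only for illustration. One assumption differs from the code above: the pattern " */// *" also swallows the spaces around the slashes, so the two IDs end up separated by the tab alone.

```r
# Cut-down copy of the question's two columns
dat <- data.frame(
  Probe.Set.ID = c("A01157cds_s_at", "A04674cds_s_at", "----"),
  Entrez.Gene  = c("50682", "24860 /// 100909612", "----"),
  stringsAsFactors = FALSE
)

# Drop the placeholder rows
keep <- dat[dat$Probe.Set.ID != "----", ]

# Join the columns with ";" and turn " /// " into a single tab
out <- paste0(keep$Probe.Set.ID, ";", gsub(" */// *", "\t", keep$Entrez.Gene))

# writeLines() prints a real tab instead of the escaped "\t"
writeLines(out)
```

writeLines(out, "some_file.txt") would write the same lines to a file; the file name here is just a placeholder.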
We can use dplyr and tidyr here.
library(dplyr)
library(tidyr)
df <- data.frame(
col1 = c('A01157cds_s_at', 'A03913cds_s_at', 'A04674cds_s_at', 'A07543cds_s_at', '----'),
col2 = c('50682', '29366', '24860 /// 100909612', '24867', '----'))
# assign the result so it can be printed; split on ' /// ' so no stray spaces remain
res <- df %>% filter(col1 != '----') %>%
separate(col2, c('col2_first', 'col2_second'), ' /// ', remove = TRUE, fill = 'right') %>%
unite(col1_new, c(col1, col2_first), sep = ';', remove = TRUE)
res
## col1_new col2_second
## 1 A01157cds_s_at;50682 <NA>
## 2 A03913cds_s_at;29366 <NA>
## 3 A04674cds_s_at;24860 100909612
## 4 A07543cds_s_at;24867 <NA>
filter removes the observations with col1 == '----'.
separate splits col2 into two columns, namely col2_first and col2_second.
unite concatenates col1 and col2_first with ; as separator.
I have a CSV file like
Market,CampaignName,Identity
Wells Fargo,Gary IN MetroChicago IL Metro,56
EMC,Los Angeles CA MetroBoston MA Metro,78
Apple,Cupertino CA Metro,68
Desired Output to a CSV file with the first row as the headers
Market,City,State,Identity
Wells Fargo,Gary,IN,56
Wells Fargo,Chicago,IL,56
EMC,Los Angeles,CA,78
EMC,Boston,MA,78
Apple,Cupertino,CA,68
res <-
gsub('(.*) ([A-Z]{2})*Metro (.*) ([A-Z]{2}) .*','\\1,\\2:\\3,\\4',
xx$Market)
How to modify the above regular expressions to get the result in R?
New to R, any help is appreciated.
library(stringr)
xx.to.split <- with(xx, setNames(gsub("Metro", "", as.character(CampaignName)), Market))
do.call(rbind, str_match_all(xx.to.split, "(.+?) ([A-Z]{2}) ?"))[, -1]
Produces:
[,1] [,2]
Wells Fargo "Gary" "IN"
Wells Fargo "Chicago" "IL"
EMC "Los Angeles" "CA"
EMC "Boston" "MA"
Apple "Cupertino" "CA"
This should work even if you have a different number of campaign names in each market. Unfortunately I think base options are annoying to implement because frustratingly there isn't a gregexec, although I'd be curious if someone comes up with something comparably compact in base.
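For comparison, a compact base R version might look like the sketch below. It assumes every campaign chunk has the form "<City> <two-letter state> Metro", as in the sample rows, and it rebuilds the sample data inline instead of reading a file. The name res is only for illustration.

```r
# Sample rows from the question, built inline
xx <- data.frame(
  Market = c("Wells Fargo", "EMC", "Apple"),
  CampaignName = c("Gary IN MetroChicago IL Metro",
                   "Los Angeles CA MetroBoston MA Metro",
                   "Cupertino CA Metro"),
  Identity = c(56, 78, 68),
  stringsAsFactors = FALSE
)

# One "<City> <ST>" chunk per campaign; a trailing " Metro" leaves no empty piece
s <- strsplit(xx$CampaignName, " Metro", fixed = TRUE)
n <- lengths(s)          # number of campaigns per market
chunks <- unlist(s)

res <- data.frame(
  Market   = rep(xx$Market, n),              # repeat each market once per campaign
  City     = sub(" [A-Z]{2}$", "", chunks),  # drop the trailing state code
  State    = sub("^.* ", "", chunks),        # keep only the last word
  Identity = rep(xx$Identity, n),
  stringsAsFactors = FALSE
)
```

write.csv(res, "output.csv", row.names = FALSE, quote = FALSE) would then produce the CSV with the header row; the file name is a placeholder.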
Here is a solution using base R. Split the CampaignName column on the string Metro adding sequential numbers as names. stack turns it into a data frame with columns ind and values which we massage into DF1. Merge that with xx by the sequence numbers of DF1 and the row numbers of xx. Move Market to the front of DF2 and remove ind and CampaignName. Finally write it out.
xx <- read.csv("Campaign.csv", as.is = TRUE)
s <- strsplit(xx$CampaignName, " Metro")
names(s) <- seq_along(s)
ss <- stack(s)
DF1 <- with(ss, data.frame(ind,
City = sub(" ..$", "", values),
State = sub(".* ", "", values)))
DF2 <- merge(DF1, xx, by.x = "ind", by.y = 0)
DF <- DF2[ c("Market", setdiff(names(DF2), c("ind", "Market", "CampaignName"))) ]
write.csv(DF, file = "myfile.csv", row.names = FALSE, quote = FALSE)
REVISED to handle extra columns after poster modified the question to include such. Minor improvements.