I'm using R Markdown to generate a PDF. I want to add a line break (newline) inside a data frame column, so that the value from a second column appears on a new line within the same cell.
C1     C2      ID
-----------------
22.6   a-b     a
23.5   ba-cd   b
24     c-d     c
25.3   d-e     d
I want output like this:
C1      ID
------------
22.6    a
a-b
23.5    b
ba-cd
My code:
df$C1 <- with(df, paste0(C1, '\n', C2))
df$C2 <- NULL
kbl(df, booktabs = T, longtable = T, align = c("c", "c", "c", "c")) %>%
  kable_styling(latex_options = c("repeat_header"), bootstrap_options = "bordered",
                font_size = 7, full_width = F) %>%
  column_spec(1, width = "8cm") %>%
  column_spec(2:4, width = "2.25cm") %>%
  row_spec(0, bold = T, color = "white", background = "#008752")
Use the <br/> tag for the line break:
library(knitr)
library(kableExtra)

df$C1 <- with(df, paste0(C1, '<br/>', C2))
df$C2 <- NULL
kbl(df, booktabs = T, longtable = T, align = c("c", "c"), escape = F) %>%
  kable_styling(latex_options = c("repeat_header"),
                bootstrap_options = "bordered", font_size = 7, full_width = F) %>%
  column_spec(1, width = "8cm")
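If the output format is PDF (LaTeX) rather than HTML, the <br/> tag may come through literally; in that case kableExtra's linebreak() helper, which turns an embedded "\n" into a LaTeX \makecell, does the same job. A minimal sketch of that variant, keeping the same data (the styling calls are trimmed down here):
library(knitr)
library(kableExtra)

df$C1 <- with(df, paste0(C1, '\n', C2))
df$C2 <- NULL
# linebreak() converts the "\n" into \makecell{}, which LaTeX renders as a
# line break inside the cell; this requires escape = FALSE in kbl()
df$C1 <- linebreak(df$C1, align = "c")
kbl(df, booktabs = TRUE, longtable = TRUE, align = c("c", "c"), escape = FALSE) %>%
  kable_styling(latex_options = c("repeat_header"), font_size = 7, full_width = FALSE) %>%
  column_spec(1, width = "8cm")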
I have created a LaTeX table as below using kable in R Markdown:
---
output: pdf_document
header-includes:
- \usepackage{xcolor}
---
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(kableExtra)
library(tidyr)
library(dplyr)
data(iris)
iris %>%
  as_tibble() %>%
  gather(., key = variable, value = value, -Species) %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value)) %>%
  ungroup() %>%
  spread(., key = variable, value = value) %>%
  mutate(`Percentage Change` = `Petal.Length` / `Petal.Width` * 100) %>%
  kable(., format = 'latex',
        align = 'c', linesep = '',
        booktabs = TRUE, escape = FALSE) %>%
  add_header_above(., c(' ' = 1, 'Parts' = 4, ' ' = 1),
                   escape = FALSE) %>%
  kable_styling(latex_options = c('striped', 'HOLD_position', 'scale_down'))
```
I would like the column headers "Species" and "Percentage Change" to be merged with the empty cells above them, so that "Species" sits in the middle of the two header rows and "Percentage Change (Petal Length / Petal Width)" occupies both rows instead of having an empty row above it, while the other columns do not get an empty row below.
I wonder if this can be done in kable, preferably; a LaTeX "hack" suggestion is also welcome.
Thanks!
I think the LaTeX 'hack' is the cleaner solution here. It can also be done in kable, but that would require changing the data frame (converting the column names into a row) so that collapse_rows can be used. Anyway, here's the LaTeX way out:
The code that you gave in your question does not produce the column names shown in the PDF snapshot, so I first edited the code to get that table:
---
output:
  pdf_document:
    keep_tex: true
header-includes:
  - \usepackage{xcolor}
---
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(kableExtra)
library(tidyr)
library(dplyr)
data(iris)
iris %>%
  as_tibble() %>%
  gather(., key = variable, value = value, -Species) %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value)) %>%
  ungroup() %>%
  spread(., key = variable, value = value) %>%
  mutate('Percentage Change\n(Petal length/ Petal width)' = `Petal.Length` / `Petal.Width` * 100) %>%
  kable(format = 'latex', align = 'c', linesep = '', booktabs = TRUE, escape = FALSE,
        col.names = linebreak(colnames(.), align = 'c')) %>%
  add_header_above(., c(' ' = 1, 'Parts' = 4, ' ' = 1), escape = FALSE) %>%
  collapse_rows(columns = c(1, 6), valign = 'middle') %>%
  kable_styling(latex_options = c('striped', 'HOLD_position', 'scale_down'))
```
This produces the table with the full column names.
Note two things in the above code:
- keep_tex: true retains the generated .tex file so that it can be edited.
- linebreak() is used so that the entire column name of the last column does not sit on a single line.
Now we make small changes to the LaTeX output. In the code below, each commented-out line is the original code generated by kable; it is replaced by the new line immediately below it, as indicated.
\begin{table}[H]
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{cccccc}
\toprule
% \multicolumn{1}{c}{ } & \multicolumn{4}{c}{Parts} & \multicolumn{1}{c}{ } \\
\multirow{2}{*}{Species} & \multicolumn{4}{c}{Parts} & \multirow{2}{*}{\makecell[c]{Percentage Change\\(Petal length/ Petal width)}} \\ % replaced line
\cmidrule(l{3pt}r{3pt}){2-5}
% Species & Petal.Length & Petal.Width & Sepal.Length & Sepal.Width & \makecell[c]{Percentage Change\\(Petal length/ Petal width)}\\
& Petal.Length & Petal.Width & Sepal.Length & Sepal.Width &\\ % replaced line
\midrule
\cellcolor{gray!6}{setosa} & \cellcolor{gray!6}{1.462} & \cellcolor{gray!6}{0.246} & \cellcolor{gray!6}{5.006} & \cellcolor{gray!6}{3.428} & \cellcolor{gray!6}{594.3089}\\
\cmidrule{1-6}
versicolor & 4.260 & 1.326 & 5.936 & 2.770 & 321.2670\\
\cmidrule{1-6}
\cellcolor{gray!6}{virginica} & \cellcolor{gray!6}{5.552} & \cellcolor{gray!6}{2.026} & \cellcolor{gray!6}{6.588} & \cellcolor{gray!6}{2.974} & \cellcolor{gray!6}{274.0375}\\
\bottomrule
\end{tabular}}
\end{table}
This gives the following output:
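A note on assumptions: the replacement lines use \multirow and \makecell. The .tex kept via keep_tex: true normally already loads these, because kableExtra injects its LaTeX dependencies into the preamble; only if the edited table is moved into a standalone LaTeX document would the preamble need something like:
% packages the edited table depends on (usually already present in the kept .tex)
\usepackage{booktabs}      % \toprule, \midrule, \cmidrule, \bottomrule
\usepackage{multirow}      % \multirow{2}{*}{...} for the merged header cells
\usepackage{makecell}      % \makecell[c]{...} for the two-line header text
\usepackage[table]{xcolor} % loads colortbl; \cellcolor for the striped rows
\usepackage{graphicx}      % \resizebox, used by the scale_down option
\usepackage{float}         % the [H] placement from HOLD_position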
Sometimes in academic texts one wants to present a table in which every column has units. It is usual for the units to be specified below the column names, like this:
|Object |Volume | area | Price |
| |$cm^3$ |$cm^2$ | euros |
|:------------|:-------|--------:|---------:|
|A |3 | 43.36| 567.40|
|B |15 | 43.47| 1000.80|
|C |1 | 42.18| 8.81|
|D |7 | 37.92| 4.72|
How could I achieve this for my bookdown documents?
Thank you in advance.
Here is a way using kableExtra:
```{r}
library(kableExtra)

df <- data.frame(Object = LETTERS[1:5],
                 Volume = round(runif(5, 1, 20)),
                 area   = rnorm(5, 40, 3),
                 Price  = rnorm(5, 700, 200))

colNames <- names(df)
dfUnits  <- c("", "$cm^3$", "$cm^2$", "€")

kable(df, col.names = dfUnits, escape = F, align = "c") %>%
  add_header_above(header = colNames, line = F, align = "c")
```
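If the target is the PDF/LaTeX side of bookdown, a variant of the same idea is to keep each unit on a second line of the same header cell, using kableExtra's linebreak(). This is my own sketch, not part of the answer above; the unit labels and the explicit latex format are assumptions:
```{r}
library(kableExtra)

df <- data.frame(Object = LETTERS[1:5],
                 Volume = round(runif(5, 1, 20)),
                 area   = rnorm(5, 40, 3),
                 Price  = rnorm(5, 700, 200))

# linebreak() turns the "\n" into \makecell{}, so each header cell spans two lines
hdr <- linebreak(c("Object", "Volume\n($cm^3$)", "area\n($cm^2$)", "Price\n(euros)"),
                 align = "c")

kable(df, format = "latex", booktabs = TRUE,
      col.names = hdr, escape = FALSE, align = "c")
```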
I wish to split strings into non-overlapping segments where the endpoints of a segment are numbers within a field of dots. I can do this using the code below. However, this code seems to be overly complex and involves nested for-loops. Is there a simpler way, ideally using regex in base R?
Here is an example and the desired result:
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE)
desired.result <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1...... 2 B
..1.2.... 2 B
....2.1.. 2 B
......1.1 2 B
12....... 3 C
.23...... 3 C
..34..... 3 C
1...2.... 4 C
....2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
new.data <- data.frame(do.call(rbind, strsplit(my.data$my.string, '')), stringsAsFactors = FALSE)
n.segments <- rowSums(!(new.data[1:ncol(new.data)] == '.')) - 1
my.end.points <- do.call(rbind, gregexpr("[0-9]", my.data$my.string, perl = TRUE))
my.end.point.char <- do.call(rbind, strsplit(my.data$my.string, ""))
my.end.point.char <- t(apply(my.end.point.char, 1, as.numeric))
new.strings <- matrix('.', nrow = sum(n.segments), ncol = max(nchar(my.data$my.string)))
new.cov <- as.data.frame(matrix(NA, nrow = sum(n.segments), ncol = (ncol(my.data) - 1)))
m <- 1
for (i in 1:nrow(new.data)) {
  for (j in 1:n.segments[i]) {
    for (k in 1:ncol(new.strings)) {
      new.strings[m, my.end.points[i, j]] <- my.end.point.char[i, my.end.points[i, j]]
      new.strings[m, my.end.points[i, (j + 1)]] <- my.end.point.char[i, my.end.points[i, (j + 1)]]
      new.cov[m, ] <- my.data[i, c(2:ncol(my.data))]
    }
    m <- m + 1
  }
}
my.result <- data.frame(my.string = apply(new.strings, 1, function(x) paste0(x, collapse = '')),
                        stringsAsFactors = FALSE)
my.result <- data.frame(my.result, new.cov)
colnames(my.result) <- names(my.data)
all.equal(desired.result, my.result)
# [1] TRUE
w <- nchar(my.data$my.string[1L])

# dps[k + 1] holds a string of k dots, precomputed once
dps <- character(w + 1L)
dps[1L] <- ''
for (i in seq_len(w)) dps[i + 1L] <- paste0(dps[i], '.')

# for each string, pad every adjacent-digit segment with dots on both sides
x <- Map(my.data$my.string, gregexpr('[^.]', my.data$my.string), f = function(s, g)
  if (length(g) < 3L) s else sapply(seq_len(length(g) - 1L), function(gi)
    paste0(dps[g[gi]], substr(s, g[gi], g[gi + 1L]), dps[w - g[gi + 1L] + 1L])
  )
)

res <- transform(my.data[rep(seq_len(nrow(my.data)), sapply(x, length)), ], my.string = unlist(x))
res
## my.string cov1 cov2
## 1 11....... 1 A
## 2 1.1...... 2 B
## 2.1 ..1.2.... 2 B
## 2.2 ....2.1.. 2 B
## 2.3 ......1.1 2 B
## 3 12....... 3 C
## 3.1 .23...... 3 C
## 3.2 ..34..... 3 C
## 4 1...2.... 4 C
## 4.1 ....2...3 4 C
## 5 ..3..4... 5 D
Note: You can replace the sapply(x,length) piece with lengths(x) if you have a recent enough version of R.
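For reference, lengths() (added in base R 3.2.0) returns the element-wise lengths of a list directly:
x <- list(1:3, letters[1:5])
sapply(x, length)   # 3 5
lengths(x)          # 3 5, same result without the sapply loop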
Benchmarking
library(microbenchmark);
bgoldst <- function(my.data) { w <- nchar(my.data$my.string[1L]); dps <- character(w+1L); dps[1L] <- ''; for (i in seq_len(w)) dps[i+1L] <- paste0(dps[i],'.'); x <- Map(my.data$my.string,gregexpr('[^.]',my.data$my.string),f=function(s,g) if (length(g)<3L) s else sapply(seq_len(length(g)-1L),function(gi) paste0(dps[g[gi]],substr(s,g[gi],g[gi+1L]),dps[w-g[gi+1L]+1L]))); transform(my.data[rep(seq_len(nrow(my.data)),sapply(x,length)),],my.string=unlist(x)); };
rawr <- function(my.data) { f <- function(x, m) { y <- gsub('.', '\\.', x); cs <- attr(m, "capture.start"); cl <- attr(m, "capture.length"); Vectorize(`substr<-`)(y, cs, cl + cs - 1, Vectorize(substr)(x, cs, cl + cs - 1)); }; m <- gregexpr('(?=([0-9][.]*[0-9]))', my.data$my.string, perl = TRUE); strs <- Map(f, my.data$my.string, m); tmp <- `rownames<-`(my.data[rep(1:nrow(my.data), sapply(strs,length)), ], NULL); tmp$my.string <- unlist(strs); tmp; };
carroll <- function(my.data) { strings <- sapply(my.data$my.string, function(x) { stri_match_all_regex(x, "(?=([0-9]{1}\\.*[0-9]{1}))")[[1]][,2]; }); strpos <- lapply(1:length(strings), function(x) { y <- {nchar(sub(perl=T,'^\\.*\\K.*','',my.data$my.string[x]))+c(0, cumsum(nchar(strings[[x]])-1))}; return(y[-length(y)]); }); w <- nchar(my.data$my.string[1L]); output.result <- data.frame(my.string = cbind(unlist(sapply(1:length(strings), function(y) { cbind(sapply(1:length(strings[[y]]), function(x) { leftstr <- paste0(paste0(rep(".", strpos[[y]][[x]]), collapse=""), strings[[y]][x]); rightstr <- paste0(rep(".", w-nchar(leftstr)), collapse=""); paste0(leftstr, rightstr, collapse=""); })); }))), my.data[unlist(sapply(1:length(strings), function(x) { rep(x, sapply(strings, length)[x]); })), c(2,3)], stringsAsFactors=FALSE); row.names(output.result) <- NULL; output.result; };
## OP's sample input
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE);
ex <- bgoldst(my.data);
all.equal(ex,rawr(my.data),check.attributes=F);
## [1] TRUE
all.equal(ex,carroll(my.data),check.attributes=F);
## [1] TRUE
microbenchmark(bgoldst(my.data),rawr(my.data),carroll(my.data));
## Unit: microseconds
## expr min lq mean median uq max neval
## bgoldst(my.data) 422.094 451.816 483.5305 476.6195 503.775 801.421 100
## rawr(my.data) 1096.502 1160.863 1277.7457 1236.7720 1298.996 3092.785 100
## carroll(my.data) 1130.287 1176.900 1224.6911 1213.2515 1247.249 1525.437 100
## scale test
set.seed(1L);
NR <- 1e4; NS <- 30L; probDot <- 3/4;
x <- c('.',0:9); probs <- c(probDot,rep((1-probDot)/10,10L)); my.data <- data.frame(my.string=do.call(paste0,as.data.frame(replicate(NS,sample(x,NR,T,probs)))),cov1=sample(seq_len(NR)),cov2=sample(make.unique(rep(LETTERS,len=NR))),stringsAsFactors=F);
repeat { w <- which(sapply(gregexpr('[^.]',my.data$my.string),length)==1L); if (length(w)==0L) break; my.data$my.string[w] <- do.call(paste0,as.data.frame(replicate(NS,sample(x,length(w),T,probs)))); }; ## prevent single-digit strings, which rawr and carroll solutions don't support
ex <- bgoldst(my.data);
all.equal(ex,rawr(my.data),check.attributes=F);
## [1] TRUE
all.equal(ex,carroll(my.data),check.attributes=F);
## [1] TRUE
microbenchmark(bgoldst(my.data),rawr(my.data),carroll(my.data),times=1L);
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst(my.data) 904.887 904.887 904.887 904.887 904.887 904.887 1
## rawr(my.data) 2736.462 2736.462 2736.462 2736.462 2736.462 2736.462 1
## carroll(my.data) 108575.001 108575.001 108575.001 108575.001 108575.001 108575.001 1
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE)
f <- function(x, m) {
if (nchar(gsub('.', '', x, fixed = TRUE)) < 2L) return(x)
y <- gsub('.', '\\.', x)
cs <- attr(m, "capture.start")
cl <- attr(m, "capture.length")
Vectorize(`substr<-`)(y, cs, cl + cs - 1, Vectorize(substr)(x, cs, cl + cs - 1))
}
m <- gregexpr('(?=([0-9][.]*[0-9]))', my.data$my.string, perl = TRUE)
strs <- Map(f, my.data$my.string, m)
tmp <- `rownames<-`(my.data[rep(1:nrow(my.data), lengths(strs)), ], NULL)
tmp$my.string <- unlist(strs)
# my.string cov1 cov2
# 1 11....... 1 A
# 2 1.1...... 2 B
# 3 ..1.2.... 2 B
# 4 ....2.1.. 2 B
# 5 ......1.1 2 B
# 6 12....... 3 C
# 7 .23...... 3 C
# 8 ..34..... 3 C
# 9 1...2.... 4 C
# 10 ....2...3 4 C
# 11 ..3..4... 5 D
identical(tmp, desired.result)
# [1] TRUE
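As an aside (my own illustration, not part of the answer above): the zero-width lookahead is what lets overlapping segments be recovered. The matches themselves consume nothing, but with perl = TRUE the capture group inside the lookahead records where each digit...digit segment starts and how long it is, which is exactly what f() reads back out:
x <- "1.1.2.1.1"
m <- gregexpr("(?=([0-9][.]*[0-9]))", x, perl = TRUE)
regmatches(x, m)[[1]]           # "" "" "" ""  -- the lookahead matches are zero-width
attr(m[[1]], "capture.start")   # 1 3 5 7     -- where each captured segment begins
attr(m[[1]], "capture.length")  # 3 3 3 3     -- length of each captured segment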
Here's an option. Not clean, but neither is the problem.
library(stringi)
## isolate the strings, allowing overlap via positive lookaheads
strings <- sapply(my.data$my.string, function(x) {
stri_match_all_regex(x, "(?=([0-9]{1}\\.*[0-9]{1}))")[[1]][,2]
})
Identify the offsets at the start of each group.
## identify the . offsets
strpos <- lapply(1:length(strings), function(x) {
y <- {nchar(sub(perl=T,'^\\.*\\K.*','',my.data$my.string[x]))+c(0, cumsum(nchar(strings[[x]])-1))}
return(y[-length(y)])
})
Build up the data.frame with only 2 sapply loops.
## collate the results using sapply
w <- nchar(my.data$my.string[1L]);
output.result <- data.frame(
  my.string = cbind(unlist(sapply(1:length(strings), function(y) {
    cbind(sapply(1:length(strings[[y]]), function(x) {
      leftstr <- paste0(paste0(rep(".", strpos[[y]][[x]]), collapse = ""), strings[[y]][x])
      rightstr <- paste0(rep(".", w - nchar(leftstr)), collapse = "")
      paste0(leftstr, rightstr, collapse = "")
    }))
  }))),
  my.data[unlist(sapply(1:length(strings), function(x) {
    rep(x, sapply(strings, length)[x])
  })), c(2, 3)], stringsAsFactors = FALSE
)
row.names(output.result) <- NULL
output.result
my.string cov1 cov2
1 11....... 1 A
2 1.1...... 2 B
3 ..1.2.... 2 B
4 ....2.1.. 2 B
5 ......1.1 2 B
6 12....... 3 C
7 .23...... 3 C
8 ..34..... 3 C
9 1...2.... 4 C
10 ....2...3 4 C
11 ..3..4... 5 D
identical(desired.result, output.result)
[1] TRUE
I have
rownames(results.summary)
[1] "2 - 1" "3 - 1" "4 - 1"
What I want is to return a matrix of
2 1
3 1
4 1
The way I've done it is:
for (i in 1:length(rownames(results.summary))) {
  current.split <- unlist(strsplit(rownames(results.summary)[i], "-"))
  matrix.results$comparison.group[i] <- trim(current.split[1])
  matrix.results$control.group[i] <- trim(current.split[2])
}
The trim function basically removes any whitespace on either end.
I've been learning regex and was wondering if there's perhaps a more elegant vectorized solution?
No need to use strsplit, just read it using read.table:
read.table(text = vec, sep = '-', strip.white = TRUE)  ## see #flodel comment
V1 V2
1 2 1
2 3 1
3 4 1
where vec is :
vec <- c("2 - 1", "3 - 1", "4 - 1")
This should work:
vv <- c("2 - 1", "3 - 1", "4 - 1")
matrix(as.numeric(unlist(strsplit(vv, " - "))), ncol = 2, byrow = TRUE)
# [,1] [,2]
# [1,] 2 1
# [2,] 3 1
# [3,] 4 1
You can also try scan
vec <- c("2 - 1", "3 - 1", "4 - 1")
s <- scan(text = vec, what = integer(), sep = "-", quiet = TRUE)
matrix(s, length(s)/2, byrow = TRUE)
# [,1] [,2]
# [1,] 2 1
# [2,] 3 1
# [3,] 4 1
Another option is cSplit.
library(splitstackshape)
cSplit(data.frame(vec), "vec", sep = " - ", fixed=TRUE)
# vec_1 vec_2
# 1: 2 1
# 2: 3 1
# 3: 4 1
You can use str_match from the package stringr for this:
library(stringr)
##
x <- c("2 - 1","3 - 1","4 - 1")
##
cmat <- str_match(x, "(\\d).+(\\d)")[, -1]
apply(cmat, 2, as.numeric)
[,1] [,2]
[1,] 2 1
[2,] 3 1
[3,] 4 1
Using reshape2's colsplit:
library(reshape2)
colsplit(x, " - ", c("A", "B"))
# A B
# 1 2 1
# 2 3 1
# 3 4 1
Or using tidyr's separate:
library(tidyr)
separate(data.frame(x), x, c("A", "B"), sep = " - ")
# A B
# 1 2 1
# 2 3 1
# 3 4 1
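For completeness, a minimal base-R sketch (my own, not one of the answers above) of the vectorised strsplit approach the question was reaching for, assuming the dash is always surrounded by optional whitespace so no separate trim() step is needed:
vec <- c("2 - 1", "3 - 1", "4 - 1")
# split on the dash plus any surrounding whitespace; strsplit is vectorised over vec
parts <- do.call(rbind, strsplit(vec, "\\s*-\\s*"))
matrix(as.numeric(parts), nrow = nrow(parts))
#      [,1] [,2]
# [1,]    2    1
# [2,]    3    1
# [3,]    4    1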