I'm using R Markdown to generate a PDF. I want to add a line break (newline) inside a data frame column, so that the value from a second column appears on a new line within the same cell.
C1     C2      ID
-----------------
22.6   a-b     a
23.5   ba-cd   b
24     c-d     c
25.3   d-e     d
I want output like this:
C1      ID
------------
22.6    a
a-b
23.5    b
ba-cd
My code:
df$C1 <- with(df, paste0(C1, '\n', C2))
df$C2 <- NULL
kbl(df, booktabs = T, longtable = T, align = c("c", "c", "c", "c")) %>%
  kable_styling(latex_options = c("repeat_header"), bootstrap_options = "bordered",
                font_size = 7, full_width = F) %>%
  column_spec(1, width = "8cm") %>%
  column_spec(2:4, width = "2.25cm") %>%
  row_spec(0, bold = T, color = "white", background = "#008752")
Use the <br/> tag for the line break:
library(knitr)
library(kableExtra)

df$C1 <- with(df, paste0(C1, '<br/>', C2))
df$C2 <- NULL
kbl(df, booktabs = T, longtable = T, align = c("c", "c"), escape = F) %>%
  kable_styling(latex_options = c("repeat_header"),
                bootstrap_options = "bordered", font_size = 7, full_width = F) %>%
  column_spec(1, width = "8cm")
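If the output format is PDF (LaTeX) rather than HTML, the <br/> tag may come through literally; in that case kableExtra's linebreak() helper, which turns an embedded "\n" into a LaTeX \makecell, does the same job. A minimal sketch of that variant, keeping the same data (the styling calls are trimmed down here):
library(knitr)
library(kableExtra)

df$C1 <- with(df, paste0(C1, '\n', C2))
df$C2 <- NULL
# linebreak() converts the "\n" into \makecell{}, which LaTeX renders as a
# line break inside the cell; this requires escape = FALSE in kbl()
df$C1 <- linebreak(df$C1, align = "c")
kbl(df, booktabs = TRUE, longtable = TRUE, align = c("c", "c"), escape = FALSE) %>%
  kable_styling(latex_options = c("repeat_header"), font_size = 7, full_width = FALSE) %>%
  column_spec(1, width = "8cm")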
I have created a LaTeX table as below using kable in R Markdown:
---
output: pdf_document
header-includes:
- \usepackage{xcolor}
---
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(kableExtra)
library(tidyr)
library(dplyr)
data(iris)
iris %>%
  as_tibble() %>%
  gather(., key = variable, value = value, -Species) %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value)) %>%
  ungroup() %>%
  spread(., key = variable, value = value) %>%
  mutate(`Percentage Change` = `Petal.Length` / `Petal.Width` * 100) %>%
  kable(., format = 'latex',
        align = 'c', linesep = '',
        booktabs = TRUE, escape = FALSE) %>%
  add_header_above(., c(' ' = 1, 'Parts' = 4, ' ' = 1),
                   escape = FALSE) %>%
  kable_styling(latex_options = c('striped', 'HOLD_position', 'scale_down'))
```
I would like the column headers "Species" and "Percentage Change" to be merged with the empty cells above them, so that "Species" sits in the middle of the two header rows and "Percentage Change (Petal Length / Petal Width)" occupies both rows instead of having an empty row above it, while the other columns do not get an empty row below.
I wonder if this can be done in kable, preferably; a LaTeX "hack" suggestion is also welcome.
Thanks!
I think the LaTeX 'hack' is the cleaner solution here. It can also be done in kable, but that would require changing the data frame (converting the column names into a row) so that collapse_rows can be used. Anyway, here's the LaTeX way out:
The code that you gave in your question does not produce the column names shown in the PDF snapshot, so I first edited the code to get that table:
---
output:
  pdf_document:
    keep_tex: true
header-includes:
  - \usepackage{xcolor}
---
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(kableExtra)
library(tidyr)
library(dplyr)
data(iris)
iris %>%
  as_tibble() %>%
  gather(., key = variable, value = value, -Species) %>%
  group_by(Species, variable) %>%
  summarise(value = mean(value)) %>%
  ungroup() %>%
  spread(., key = variable, value = value) %>%
  mutate('Percentage Change\n(Petal length/ Petal width)' = `Petal.Length` / `Petal.Width` * 100) %>%
  kable(format = 'latex', align = 'c', linesep = '', booktabs = TRUE, escape = FALSE,
        col.names = linebreak(colnames(.), align = 'c')) %>%
  add_header_above(., c(' ' = 1, 'Parts' = 4, ' ' = 1), escape = FALSE) %>%
  collapse_rows(columns = c(1, 6), valign = 'middle') %>%
  kable_styling(latex_options = c('striped', 'HOLD_position', 'scale_down'))
```
This produces the table with the full column names.
Note two things in the above code:
- keep_tex: true retains the generated .tex file so that it can be edited.
- linebreak() is used so that the entire column name of the last column does not sit on a single line.
Now we make small changes to the LaTeX output. In the code below, each commented-out line is the original code generated by kable; it is replaced by the new line immediately below it, as indicated.
\begin{table}[H]
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{cccccc}
\toprule
% \multicolumn{1}{c}{ } & \multicolumn{4}{c}{Parts} & \multicolumn{1}{c}{ } \\
\multirow{2}{*}{Species} & \multicolumn{4}{c}{Parts} & \multirow{2}{*}{\makecell[c]{Percentage Change\\(Petal length/ Petal width)}} \\ % replaced line
\cmidrule(l{3pt}r{3pt}){2-5}
% Species & Petal.Length & Petal.Width & Sepal.Length & Sepal.Width & \makecell[c]{Percentage Change\\(Petal length/ Petal width)}\\
& Petal.Length & Petal.Width & Sepal.Length & Sepal.Width &\\ % replaced line
\midrule
\cellcolor{gray!6}{setosa} & \cellcolor{gray!6}{1.462} & \cellcolor{gray!6}{0.246} & \cellcolor{gray!6}{5.006} & \cellcolor{gray!6}{3.428} & \cellcolor{gray!6}{594.3089}\\
\cmidrule{1-6}
versicolor & 4.260 & 1.326 & 5.936 & 2.770 & 321.2670\\
\cmidrule{1-6}
\cellcolor{gray!6}{virginica} & \cellcolor{gray!6}{5.552} & \cellcolor{gray!6}{2.026} & \cellcolor{gray!6}{6.588} & \cellcolor{gray!6}{2.974} & \cellcolor{gray!6}{274.0375}\\
\bottomrule
\end{tabular}}
\end{table}
This gives the following output:
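A note on assumptions: the replacement lines use \multirow and \makecell. The .tex kept via keep_tex: true normally already loads these, because kableExtra injects its LaTeX dependencies into the preamble; only if the edited table is moved into a standalone LaTeX document would the preamble need something like:
% packages the edited table depends on (usually already present in the kept .tex)
\usepackage{booktabs}      % \toprule, \midrule, \cmidrule, \bottomrule
\usepackage{multirow}      % \multirow{2}{*}{...} for the merged header cells
\usepackage{makecell}      % \makecell[c]{...} for the two-line header text
\usepackage[table]{xcolor} % loads colortbl; \cellcolor for the striped rows
\usepackage{graphicx}      % \resizebox, used by the scale_down option
\usepackage{float}         % the [H] placement from HOLD_position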
Sometimes in academic texts one wants to present a table in which every column has units. It is usual for the units to be specified below the column names, like this:
|Object |Volume | area | Price |
| |$cm^3$ |$cm^2$ | euros |
|:------------|:-------|--------:|---------:|
|A |3 | 43.36| 567.40|
|B |15 | 43.47| 1000.80|
|C |1 | 42.18| 8.81|
|D |7 | 37.92| 4.72|
How could I achieve this for my bookdown documents?
Thank you in advance.
Here is a way using kableExtra:
```{r}
library(kableExtra)

df <- data.frame(Object = LETTERS[1:5],
                 Volume = round(runif(5, 1, 20)),
                 area   = rnorm(5, 40, 3),
                 Price  = rnorm(5, 700, 200))

colNames <- names(df)
dfUnits  <- c("", "$cm^3$", "$cm^2$", "€")

kable(df, col.names = dfUnits, escape = F, align = "c") %>%
  add_header_above(header = colNames, line = F, align = "c")
```
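If the target is the PDF/LaTeX side of bookdown, a variant of the same idea is to keep each unit on a second line of the same header cell, using kableExtra's linebreak(). This is my own sketch, not part of the answer above; the unit labels and the explicit latex format are assumptions:
```{r}
library(kableExtra)

df <- data.frame(Object = LETTERS[1:5],
                 Volume = round(runif(5, 1, 20)),
                 area   = rnorm(5, 40, 3),
                 Price  = rnorm(5, 700, 200))

# linebreak() turns the "\n" into \makecell{}, so each header cell spans two lines
hdr <- linebreak(c("Object", "Volume\n($cm^3$)", "area\n($cm^2$)", "Price\n(euros)"),
                 align = "c")

kable(df, format = "latex", booktabs = TRUE,
      col.names = hdr, escape = FALSE, align = "c")
```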
I wish to split strings into non-overlapping segments where the endpoints of a segment are numbers within a field of dots. I can do this using the code below. However, this code seems to be overly complex and involves nested for-loops. Is there a simpler way, ideally using regex in base R?
Here is an example and the desired result:
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE)
desired.result <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1...... 2 B
..1.2.... 2 B
....2.1.. 2 B
......1.1 2 B
12....... 3 C
.23...... 3 C
..34..... 3 C
1...2.... 4 C
....2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
new.data <- data.frame(do.call(rbind, strsplit(my.data$my.string, '')), stringsAsFactors = FALSE)
n.segments <- rowSums(!(new.data[1:ncol(new.data)] == '.')) - 1
my.end.points <- do.call(rbind, gregexpr("[0-9]", my.data$my.string, perl = TRUE))
my.end.point.char <- do.call(rbind, strsplit(my.data$my.string, ""))
my.end.point.char <- t(apply(my.end.point.char, 1, as.numeric))
new.strings <- matrix('.', nrow = sum(n.segments), ncol = max(nchar(my.data$my.string)))
new.cov <- as.data.frame(matrix(NA, nrow = sum(n.segments), ncol = (ncol(my.data) - 1)))
m <- 1
for (i in 1:nrow(new.data)) {
  for (j in 1:n.segments[i]) {
    for (k in 1:ncol(new.strings)) {
      new.strings[m, my.end.points[i, j]] <- my.end.point.char[i, my.end.points[i, j]]
      new.strings[m, my.end.points[i, (j + 1)]] <- my.end.point.char[i, my.end.points[i, (j + 1)]]
      new.cov[m, ] <- my.data[i, c(2:ncol(my.data))]
    }
    m <- m + 1
  }
}
my.result <- data.frame(my.string = apply(new.strings, 1, function(x) paste0(x, collapse = '')),
                        stringsAsFactors = FALSE)
my.result <- data.frame(my.result, new.cov)
colnames(my.result) <- names(my.data)
all.equal(desired.result, my.result)
# [1] TRUE
w <- nchar(my.data$my.string[1L])

# dps[k + 1] holds a string of k dots, precomputed once
dps <- character(w + 1L)
dps[1L] <- ''
for (i in seq_len(w)) dps[i + 1L] <- paste0(dps[i], '.')

# for each string, pad every adjacent-digit segment with dots on both sides
x <- Map(my.data$my.string, gregexpr('[^.]', my.data$my.string), f = function(s, g)
  if (length(g) < 3L) s else sapply(seq_len(length(g) - 1L), function(gi)
    paste0(dps[g[gi]], substr(s, g[gi], g[gi + 1L]), dps[w - g[gi + 1L] + 1L])
  )
)

res <- transform(my.data[rep(seq_len(nrow(my.data)), sapply(x, length)), ], my.string = unlist(x))
res
## my.string cov1 cov2
## 1 11....... 1 A
## 2 1.1...... 2 B
## 2.1 ..1.2.... 2 B
## 2.2 ....2.1.. 2 B
## 2.3 ......1.1 2 B
## 3 12....... 3 C
## 3.1 .23...... 3 C
## 3.2 ..34..... 3 C
## 4 1...2.... 4 C
## 4.1 ....2...3 4 C
## 5 ..3..4... 5 D
Note: You can replace the sapply(x,length) piece with lengths(x) if you have a recent enough version of R.
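For reference, lengths() (added in base R 3.2.0) returns the element-wise lengths of a list directly:
x <- list(1:3, letters[1:5])
sapply(x, length)   # 3 5
lengths(x)          # 3 5, same result without the sapply loop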
Benchmarking
library(microbenchmark);
bgoldst <- function(my.data) { w <- nchar(my.data$my.string[1L]); dps <- character(w+1L); dps[1L] <- ''; for (i in seq_len(w)) dps[i+1L] <- paste0(dps[i],'.'); x <- Map(my.data$my.string,gregexpr('[^.]',my.data$my.string),f=function(s,g) if (length(g)<3L) s else sapply(seq_len(length(g)-1L),function(gi) paste0(dps[g[gi]],substr(s,g[gi],g[gi+1L]),dps[w-g[gi+1L]+1L]))); transform(my.data[rep(seq_len(nrow(my.data)),sapply(x,length)),],my.string=unlist(x)); };
rawr <- function(my.data) { f <- function(x, m) { y <- gsub('.', '\\.', x); cs <- attr(m, "capture.start"); cl <- attr(m, "capture.length"); Vectorize(`substr<-`)(y, cs, cl + cs - 1, Vectorize(substr)(x, cs, cl + cs - 1)); }; m <- gregexpr('(?=([0-9][.]*[0-9]))', my.data$my.string, perl = TRUE); strs <- Map(f, my.data$my.string, m); tmp <- `rownames<-`(my.data[rep(1:nrow(my.data), sapply(strs,length)), ], NULL); tmp$my.string <- unlist(strs); tmp; };
carroll <- function(my.data) { strings <- sapply(my.data$my.string, function(x) { stri_match_all_regex(x, "(?=([0-9]{1}\\.*[0-9]{1}))")[[1]][,2]; }); strpos <- lapply(1:length(strings), function(x) { y <- {nchar(sub(perl=T,'^\\.*\\K.*','',my.data$my.string[x]))+c(0, cumsum(nchar(strings[[x]])-1))}; return(y[-length(y)]); }); w <- nchar(my.data$my.string[1L]); output.result <- data.frame(my.string = cbind(unlist(sapply(1:length(strings), function(y) { cbind(sapply(1:length(strings[[y]]), function(x) { leftstr <- paste0(paste0(rep(".", strpos[[y]][[x]]), collapse=""), strings[[y]][x]); rightstr <- paste0(rep(".", w-nchar(leftstr)), collapse=""); paste0(leftstr, rightstr, collapse=""); })); }))), my.data[unlist(sapply(1:length(strings), function(x) { rep(x, sapply(strings, length)[x]); })), c(2,3)], stringsAsFactors=FALSE); row.names(output.result) <- NULL; output.result; };
## OP's sample input
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE);
ex <- bgoldst(my.data);
all.equal(ex,rawr(my.data),check.attributes=F);
## [1] TRUE
all.equal(ex,carroll(my.data),check.attributes=F);
## [1] TRUE
microbenchmark(bgoldst(my.data),rawr(my.data),carroll(my.data));
## Unit: microseconds
## expr min lq mean median uq max neval
## bgoldst(my.data) 422.094 451.816 483.5305 476.6195 503.775 801.421 100
## rawr(my.data) 1096.502 1160.863 1277.7457 1236.7720 1298.996 3092.785 100
## carroll(my.data) 1130.287 1176.900 1224.6911 1213.2515 1247.249 1525.437 100
## scale test
set.seed(1L);
NR <- 1e4; NS <- 30L; probDot <- 3/4;
x <- c('.',0:9); probs <- c(probDot,rep((1-probDot)/10,10L)); my.data <- data.frame(my.string=do.call(paste0,as.data.frame(replicate(NS,sample(x,NR,T,probs)))),cov1=sample(seq_len(NR)),cov2=sample(make.unique(rep(LETTERS,len=NR))),stringsAsFactors=F);
repeat { w <- which(sapply(gregexpr('[^.]',my.data$my.string),length)==1L); if (length(w)==0L) break; my.data$my.string[w] <- do.call(paste0,as.data.frame(replicate(NS,sample(x,length(w),T,probs)))); }; ## prevent single-digit strings, which rawr and carroll solutions don't support
ex <- bgoldst(my.data);
all.equal(ex,rawr(my.data),check.attributes=F);
## [1] TRUE
all.equal(ex,carroll(my.data),check.attributes=F);
## [1] TRUE
microbenchmark(bgoldst(my.data),rawr(my.data),carroll(my.data),times=1L);
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst(my.data) 904.887 904.887 904.887 904.887 904.887 904.887 1
## rawr(my.data) 2736.462 2736.462 2736.462 2736.462 2736.462 2736.462 1
## carroll(my.data) 108575.001 108575.001 108575.001 108575.001 108575.001 108575.001 1
my.data <- read.table(text = '
my.string cov1 cov2
11....... 1 A
1.1.2.1.1 2 B
1234..... 3 C
1...2...3 4 C
..3..4... 5 D
', header = TRUE, stringsAsFactors = FALSE)
f <- function(x, m) {
if (nchar(gsub('.', '', x, fixed = TRUE)) < 2L) return(x)
y <- gsub('.', '\\.', x)
cs <- attr(m, "capture.start")
cl <- attr(m, "capture.length")
Vectorize(`substr<-`)(y, cs, cl + cs - 1, Vectorize(substr)(x, cs, cl + cs - 1))
}
m <- gregexpr('(?=([0-9][.]*[0-9]))', my.data$my.string, perl = TRUE)
strs <- Map(f, my.data$my.string, m)
tmp <- `rownames<-`(my.data[rep(1:nrow(my.data), lengths(strs)), ], NULL)
tmp$my.string <- unlist(strs)
# my.string cov1 cov2
# 1 11....... 1 A
# 2 1.1...... 2 B
# 3 ..1.2.... 2 B
# 4 ....2.1.. 2 B
# 5 ......1.1 2 B
# 6 12....... 3 C
# 7 .23...... 3 C
# 8 ..34..... 3 C
# 9 1...2.... 4 C
# 10 ....2...3 4 C
# 11 ..3..4... 5 D
identical(tmp, desired.result)
# [1] TRUE
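As an aside (my own illustration, not part of the answer above): the zero-width lookahead is what lets overlapping segments be recovered. The matches themselves consume nothing, but with perl = TRUE the capture group inside the lookahead records where each digit...digit segment starts and how long it is, which is exactly what f() reads back out:
x <- "1.1.2.1.1"
m <- gregexpr("(?=([0-9][.]*[0-9]))", x, perl = TRUE)
regmatches(x, m)[[1]]           # "" "" "" ""  -- the lookahead matches are zero-width
attr(m[[1]], "capture.start")   # 1 3 5 7     -- where each captured segment begins
attr(m[[1]], "capture.length")  # 3 3 3 3     -- length of each captured segment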
Here's an option. Not clean, but neither is the problem.
library(stringi)
## isolate the strings, allowing overlap via positive lookaheads
strings <- sapply(my.data$my.string, function(x) {
stri_match_all_regex(x, "(?=([0-9]{1}\\.*[0-9]{1}))")[[1]][,2]
})
Identify the offsets at the start of each group.
## identify the . offsets
strpos <- lapply(1:length(strings), function(x) {
y <- {nchar(sub(perl=T,'^\\.*\\K.*','',my.data$my.string[x]))+c(0, cumsum(nchar(strings[[x]])-1))}
return(y[-length(y)])
})
Build up the data.frame with only 2 sapply loops.
## collate the results using sapply
w <- nchar(my.data$my.string[1L]);
output.result <- data.frame(
  my.string = cbind(unlist(sapply(1:length(strings), function(y) {
    cbind(sapply(1:length(strings[[y]]), function(x) {
      leftstr <- paste0(paste0(rep(".", strpos[[y]][[x]]), collapse = ""), strings[[y]][x])
      rightstr <- paste0(rep(".", w - nchar(leftstr)), collapse = "")
      paste0(leftstr, rightstr, collapse = "")
    }))
  }))),
  my.data[unlist(sapply(1:length(strings), function(x) {
    rep(x, sapply(strings, length)[x])
  })), c(2, 3)], stringsAsFactors = FALSE
)
row.names(output.result) <- NULL
output.result
my.string cov1 cov2
1 11....... 1 A
2 1.1...... 2 B
3 ..1.2.... 2 B
4 ....2.1.. 2 B
5 ......1.1 2 B
6 12....... 3 C
7 .23...... 3 C
8 ..34..... 3 C
9 1...2.... 4 C
10 ....2...3 4 C
11 ..3..4... 5 D
identical(desired.result, output.result)
[1] TRUE
I have
rownames(results.summary)
[1] "2 - 1" "3 - 1" "4 - 1"
What I want is to return a matrix of
2 1
3 1
4 1
The way I've done it is:
for (i in 1:length(rownames(results.summary))) {
  current.split <- unlist(strsplit(rownames(results.summary)[i], "-"))
  matrix.results$comparison.group[i] <- trim(current.split[1])
  matrix.results$control.group[i] <- trim(current.split[2])
}
The trim function basically removes any whitespace on either end.
I've been learning regex and was wondering if there's perhaps a more elegant vectorized solution?
No need to use strsplit, just read it using read.table:
read.table(text = vec, sep = '-', strip.white = TRUE)  ## see #flodel comment
V1 V2
1 2 1
2 3 1
3 4 1
where vec is :
vec <- c("2 - 1", "3 - 1", "4 - 1")
This should work:
vv <- c("2 - 1", "3 - 1", "4 - 1")
matrix(as.numeric(unlist(strsplit(vv, " - "))), ncol = 2, byrow = TRUE)
# [,1] [,2]
# [1,] 2 1
# [2,] 3 1
# [3,] 4 1
You can also try scan
vec <- c("2 - 1", "3 - 1", "4 - 1")
s <- scan(text = vec, what = integer(), sep = "-", quiet = TRUE)
matrix(s, length(s)/2, byrow = TRUE)
# [,1] [,2]
# [1,] 2 1
# [2,] 3 1
# [3,] 4 1
Another option is cSplit.
library(splitstackshape)
cSplit(data.frame(vec), "vec", sep = " - ", fixed=TRUE)
# vec_1 vec_2
# 1: 2 1
# 2: 3 1
# 3: 4 1
You can use str_match from the package stringr for this:
library(stringr)
##
x <- c("2 - 1","3 - 1","4 - 1")
##
cmat <- str_match(x, "(\\d).+(\\d)")[, -1]
apply(cmat, 2, as.numeric)
[,1] [,2]
[1,] 2 1
[2,] 3 1
[3,] 4 1
Using reshape2's colsplit:
library(reshape2)
colsplit(x, " - ", c("A", "B"))
# A B
# 1 2 1
# 2 3 1
# 3 4 1
Or using tidyr's separate:
library(tidyr)
separate(data.frame(x), x, c("A", "B"), sep = " - ")
# A B
# 1 2 1
# 2 3 1
# 3 4 1
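For completeness, a minimal base-R sketch (my own, not one of the answers above) of the vectorised strsplit approach the question was reaching for, assuming the dash is always surrounded by optional whitespace so no separate trim() step is needed:
vec <- c("2 - 1", "3 - 1", "4 - 1")
# split on the dash plus any surrounding whitespace; strsplit is vectorised over vec
parts <- do.call(rbind, strsplit(vec, "\\s*-\\s*"))
matrix(as.numeric(parts), nrow = nrow(parts))
#      [,1] [,2]
# [1,]    2    1
# [2,]    3    1
# [3,]    4    1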