Extract numbers from strings including '|' - regex

I have data where some of the items are numbers separated by "|", like:
head(mintimes)
[1] "3121|3151" "1171" "1351|1381" "1050" "" "122"
head(minvalues)
[1] 14 10 11 31 Inf 22
What I would like to do is extract all the times and match them to the minvalues. To end up with something like:
times values
3121 14
3151 14
1171 10
1351 11
1381 11
1050 31
122 22
I've tried strsplit(mintimes, "|") and str_extract(mintimes, "[0-9]+"), but neither seems to work. Any ideas?

| is a regular expression metacharacter. When used literally, these special characters need to be escaped either with [] or with \\ (or you could use fixed = TRUE in some functions). So your call to strsplit() should be
strsplit(mintimes, "[|]")
or
strsplit(mintimes, "\\|")
or
strsplit(mintimes, "|", fixed = TRUE)
Regarding your other try with stringr functions, str_extract_all() seems to do the trick.
library(stringr)
str_extract_all(mintimes, "[0-9]+")
To get your desired result,
> mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
> minvalues <- c(14, 10, 11, 31, Inf, 22)
> s <- strsplit(mintimes, "[|]")
> data.frame(times = as.numeric(unlist(s)),
values = rep(minvalues, sapply(s, length)))
# times values
# 1 3121 14
# 2 3151 14
# 3 1171 10
# 4 1351 11
# 5 1381 11
# 6 1050 31
# 7 122 22
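A note on why the Inf row is missing from that result: strsplit() returns a zero-length vector for the empty string, so that element contributes no rows. The sketch below checks this in base R and also confirms that the three ways of writing the separator agree. It uses lengths() (available from R 3.2.0) in place of sapply(s, length); the names s_bracket, s_escaped, s_fixed and n are only for illustration.

```r
mintimes  <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)

# Three equivalent ways to treat "|" as a literal separator
s_bracket <- strsplit(mintimes, "[|]")
s_escaped <- strsplit(mintimes, "\\|")
s_fixed   <- strsplit(mintimes, "|", fixed = TRUE)

# Number of pieces per element; the empty string yields 0 pieces
n <- lengths(s_fixed)

# Repeat each value once per piece, so the Inf entry is repeated 0 times
result <- data.frame(times  = as.numeric(unlist(s_fixed)),
                     values = rep(minvalues, n))
result
```

If the empty entries should be kept as NA rows instead, they would need to be filled in before unlist() is called.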

By default strsplit splits using a regular expression and "|" is a special character in the regular expression syntax. You can either escape it
strsplit(mintimes,"\\|")
or set fixed = TRUE so that the pattern is not treated as a regular expression at all
strsplit(mintimes, "|", fixed = TRUE)

I have written a function called cSplit that is useful for these types of things. You can get it from my Gist: https://gist.github.com/mrdwab/11380733
Usage would be:
cSplit(data.table(mintimes, minvalues), "mintimes", "|", "long")
# mintimes minvalues
# 1: 3121 14
# 2: 3151 14
# 3: 1171 10
# 4: 1351 11
# 5: 1381 11
# 6: 1050 31
# 7: 122 22
It also has a "wide" setting, in case that would be at all useful to you:
cSplit(data.table(mintimes, minvalues), "mintimes", "|", "wide")
# minvalues mintimes_1 mintimes_2
# 1: 14 3121 3151
# 2: 10 1171 NA
# 3: 11 1351 1381
# 4: 31 1050 NA
# 5: Inf NA NA
# 6: 22 122 NA
Note: The output is a data.table.

As others have mentioned, you need to escape the | to include it literally in a regular expression. As always, we can skin this cat many ways, and here's one way to do it with stringr:
x <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
library(stringr)
unlist(str_extract_all(x, "\\d+"))
# [1] "3121" "3151" "1171" "1351" "1381" "1050" "122"
This won't work as expected if you have any decimal points in a character string of numbers, so the following (which says to match anything but |) might be safer:
unlist(str_extract_all(x, '[^|]+'))
# [1] "3121" "3151" "1171" "1351" "1381" "1050" "122"
Either way, you might want to wrap the result in as.numeric.
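To make the decimal-point caveat concrete, here is a small sketch. It uses base R's gregexpr() and regmatches() rather than stringr so that it runs without extra packages, and the input vector y is made up for illustration.

```r
y <- c("3.5|4", "10")   # made-up values, one with a decimal point

# Digit runs only: the decimal point ends up acting as a separator
digits_only <- unlist(regmatches(y, gregexpr("[0-9]+", y)))
digits_only

# Everything except the pipe: decimals stay intact
not_pipe <- unlist(regmatches(y, gregexpr("[^|]+", y)))
not_pipe

as.numeric(not_pipe)
```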

And here's another solution using stri_split_fixed from the stringi package. As an added value, we also play with mapply and do.call.
Input data:
mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)
Split mintimes w.r.t. | and convert to numeric:
library("stringi")
mintimes <- lapply(stri_split_fixed(mintimes, "|"), as.numeric)
## [[1]]
## [1] 3121 3151
##
## [[2]]
## [1] 1171
##
## [[3]]
## [1] 1351 1381
##
## [[4]]
## [1] 1050
##
## [[5]]
## [1] NA
##
## [[6]]
## [1] 122
Column-bind each minvalues with corresponding mintimes:
tmp <- mapply(cbind, mintimes, minvalues)
## [[1]]
## [,1] [,2]
## [1,] 3121 14
## [2,] 3151 14
##
## [[2]]
## [,1] [,2]
## [1,] 1171 10
##
## [[3]]
## [,1] [,2]
## [1,] 1351 11
## [2,] 1381 11
##
## [[4]]
## [,1] [,2]
## [1,] 1050 31
##
## [[5]]
## [,1] [,2]
## [1,] NA Inf
##
## [[6]]
## [,1] [,2]
## [1,] 122 22
Row-bind all the 6 matrices & remove NA-rows:
res <- do.call(rbind, tmp)
res[!is.na(res[,1]),]
## [,1] [,2]
## [1,] 3121 14
## [2,] 3151 14
## [3,] 1171 10
## [4,] 1351 11
## [5,] 1381 11
## [6,] 1050 31
## [7,] 122 22
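One caveat with the mapply step, sketched here for the case where every element of mintimes holds a single time: mapply() simplifies its result when all pieces have the same length, so tmp would come back as a matrix rather than a list. Passing SIMPLIFY = FALSE keeps it a list in every case. The sketch uses base strsplit(..., fixed = TRUE) instead of stri_split_fixed so that it runs without stringi, and the two-element input is made up.

```r
mintimes  <- c("1171", "1050")   # made-up input: no element contains "|"
minvalues <- c(10, 31)

parts <- lapply(strsplit(mintimes, "|", fixed = TRUE), as.numeric)

# Default behaviour: equal-length results are simplified into one matrix
simplified <- mapply(cbind, parts, minvalues)
is.list(simplified)

# SIMPLIFY = FALSE always returns a list of small matrices
kept <- mapply(cbind, parts, minvalues, SIMPLIFY = FALSE)
is.list(kept)

# Row-binding the list gives one row per time, as intended
res <- do.call(rbind, kept)
res
```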

To get the output you want, try something like this:
library(dplyr)
Split.Times <- function(x) {
mintimes <- as.numeric(unlist(strsplit(as.character(x$mintimes), "\\|")))
return(data.frame(mintimes = mintimes, minvalues = x$minvalues, stringsAsFactors=FALSE))
}
df <- data.frame(mintimes, minvalues, stringsAsFactors=FALSE)
df %>%
filter(mintimes != "") %>%
group_by(mintimes) %>%
do(Split.Times(.))
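This produces the following (the rows come back ordered by the grouping key, compared as strings, rather than in the original order):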
mintimes minvalues
1 1050 31
2 1171 10
3 122 22
4 1351 11
5 1381 11
6 3121 14
7 3151 14
(I borrowed from my answer here - which is pretty much the same question/problem)

Here's a qdap package approach:
mintimes <- c("3121|3151", "1171", "1351|1381", "1050", "", "122")
minvalues <- c(14, 10, 11, 31, Inf, 22)
library(qdap)
list2df(setNames(strsplit(mintimes, "\\|"), minvalues), "times", "values")
## times values
## 1 3121 14
## 2 3151 14
## 3 1171 10
## 4 1351 11
## 5 1381 11
## 6 1050 31
## 7 122 22

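You can use [:punct:], though note that it splits on every punctuation character (including a decimal point), not only on |: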
strsplit(mintimes, "[[:punct:]]")

Boost Dijkstra code causes damaged segment memory?

I'm trying to find the shortest paths in a graph using the boost dijkstra algorithm.
std::pair<c_vertex_iterator_t, c_vertex_iterator_t> vi;
std::pair<c_vertex_iterator_t, c_vertex_iterator_t> vj;
boost::property_map<ConGraph,boost::edge_weight_t>::type weightmap = get(boost::edge_weight, cg);
std::vector<c_vertex_t> p(num_vertices(cg));
std::vector<int> d(num_vertices(cg));
for (vi = vertices(cg); vi.first != vi.second; ++vi.first)
{
boost::dijkstra_shortest_paths(cg, *vi.first,
predecessor_map(boost::make_iterator_property_map(p.begin(), get(boost::vertex_index, cg))).
distance_map(boost::make_iterator_property_map(d.begin(), get(boost::vertex_index, cg))));
for (vj = vertices(cg); vj.first != vj.second; ++vj.first)
{
distMat[*vi.first][*vj.first]= d[*vj.first];
}
}
return boost::num_vertices(cg);
But I have a problem with this code; the application stops running at this line:
distance_map(boost::make_iterator_property_map(d.begin(), get(boost::vertex_index, cg))));
Visual C++ reports a damaged memory segment error caused by this instruction:
retval = HeapFree(_crtheap, 0, pBlock);
What should I do to fix the problem?
I'm with @1201ProgramAlarm: safely allocating memory for distMat shows that there's little real problem with the code:
Live On Coliru
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <vector>
using ConGraph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS,
boost::no_property, boost::property<boost::edge_weight_t, int> >;
using c_vertex_iterator_t = ConGraph::vertex_iterator;
using c_vertex_t = ConGraph::vertex_descriptor;
template <typename Matrix>
int foo(ConGraph& cg, Matrix& distMat) {
std::pair<c_vertex_iterator_t, c_vertex_iterator_t> vi;
std::pair<c_vertex_iterator_t, c_vertex_iterator_t> vj;
boost::property_map<ConGraph,boost::edge_weight_t>::type weightmap = get(boost::edge_weight, cg);
std::vector<c_vertex_t> p(num_vertices(cg));
std::vector<int> d(num_vertices(cg));
for (vi = vertices(cg); vi.first != vi.second; ++vi.first)
{
boost::dijkstra_shortest_paths(cg, *vi.first,
predecessor_map(boost::make_iterator_property_map(p.begin(), get(boost::vertex_index, cg))).
distance_map(boost::make_iterator_property_map(d.begin(), get(boost::vertex_index, cg))).
weight_map(weightmap)
);
for (vj = vertices(cg); vj.first != vj.second; ++vj.first) {
distMat[*vi.first][*vj.first] = d[*vj.first];
}
}
return boost::num_vertices(cg);
}
#include <boost/graph/graph_utility.hpp>
#include <boost/graph/random.hpp>
#include <iomanip>
#include <random>
int main() {
ConGraph g;
{
std::mt19937 rng { std::random_device{}() };
std::uniform_int_distribution<int> wdist(1,10);
generate_random_graph(g, 20, 40, rng);
auto weightmap = get(boost::edge_weight, g);
for (auto ed : boost::make_iterator_range(edges(g)))
put(weightmap, ed, wdist(rng));
}
print_graph(g);
std::vector<std::vector<int> > mat(num_vertices(g), std::vector<int>(num_vertices(g)));
std::cout << "foo(g, mat): " << foo(g, mat) << "\n";
for (auto& row : mat) {
for (auto i : row) {
if (i == std::numeric_limits<int>::max())
std::cout << "## ";
else
std::cout << std::setw(2) << i << " ";
}
std::cout << "\n";
}
}
Prints, for example (the graph is randomly generated, so output varies between runs):
0 --> 2 16
1 --> 11 5
2 --> 8 10 17 10
3 --> 13 16 16
4 -->
5 --> 16
6 -->
7 --> 19 3 9 18
8 --> 10
9 --> 13
10 --> 6
11 --> 4 4 16 19
12 --> 11 3 1 1 11
13 --> 10 5
14 --> 10 1
15 --> 1 13
16 --> 8
17 --> 15 2
18 --> 4
19 --> 0 9
foo(g, mat): 20
0 23 9 ## 27 29 20 ## 12 35 17 26 ## 26 ## 19 2 15 ## 31
13 0 22 ## 4 6 30 ## 21 12 27 3 ## 20 ## 32 11 28 ## 8
27 14 0 ## 18 20 11 ## 10 26 8 17 ## 17 ## 10 25 6 ## 22
## ## ## 0 ## 6 15 ## 16 ## 12 ## ## 2 ## ## 6 ## ## ##
## ## ## ## 0 ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##
## ## ## ## ## 0 24 ## 15 ## 21 ## ## ## ## ## 5 ## ## ##
## ## ## ## ## ## 0 ## ## ## ## ## ## ## ## ## ## ## ## ##
8 31 17 8 5 14 23 0 20 2 20 34 ## 10 ## 27 10 23 1 3
## ## ## ## ## ## 9 ## 0 ## 6 ## ## ## ## ## ## ## ## ##
## ## ## ## ## 12 21 ## 27 0 18 ## ## 8 ## ## 17 ## ## ##
## ## ## ## ## ## 3 ## ## ## 0 ## ## ## ## ## ## ## ## ##
10 33 19 ## 1 21 28 ## 19 9 25 0 ## 17 ## 29 9 25 ## 5
18 5 27 3 9 9 18 ## 19 17 15 8 0 5 ## 37 9 33 ## 13
## ## ## ## ## 4 13 ## 19 ## 10 ## ## 0 ## ## 9 ## ## ##
21 8 30 ## 12 14 7 ## 29 20 4 11 ## 28 0 40 19 36 ## 16
17 4 26 ## 8 10 20 ## 25 16 17 7 ## 7 ## 0 15 32 ## 12
## ## ## ## ## ## 19 ## 10 ## 16 ## ## ## ## ## 0 ## ## ##
21 8 3 ## 12 14 14 ## 13 20 11 11 ## 11 ## 4 19 0 ## 16
## ## ## ## 4 ## ## ## ## ## ## ## ## ## ## ## ## ## 0 ##
5 28 14 ## 32 16 25 ## 17 4 22 31 ## 12 ## 24 7 20 ## 0

regex to split on anything not a digit

I would like to split strings on anything not a digit. In this particular case the strings were dates and times read in from an external .csv file and are not currently in as.POSIXct format.
Ideally I would like to split the strings using regex, but if there is a simpler way to convert them to six columns of numbers using a date / time function that would be of interest as well.
I have already succeeded in creating a regex that splits the strings into six columns, but this regex is not general.
Here are the data:
my.data <- read.csv(text = '
Date_Time
18/05/2011 07:32:40
19/05/2011 13:26:02
19/05/2011 13:32:47
19/05/2011 13:45:24
19/05/2011 14:57:27
19/05/2011 15:03:18
', header=TRUE, stringsAsFactors = FALSE, na.strings = 'NA', strip.white = TRUE)
Here is a regex statement that splits the strings into six columns:
my.date.time <- data.frame(do.call(rbind, strsplit(my.data$Date_Time,"[/|:|[:space:]]+") ))
The above statement is not general. Here is an unsuccessful attempt at making the regex general by specifying a split on anything that is not a digit:
data.frame(do.call(rbind, strsplit(my.data$Date_Time,"[^\\d]+") ))
After I split the strings into six columns I still need what seems like an excessive number of statements to convert the columns into numeric format:
colnames(my.date.time) <- c('my.day', 'my.month', 'my.year', 'my.hour', 'my.minute', 'my.second')
revised.data <- data.frame(my.data, my.date.time, stringsAsFactors = FALSE)
revised.data$my.day <- as.numeric(as.character(revised.data$my.day))
revised.data$my.month <- as.numeric(as.character(revised.data$my.month))
revised.data$my.year <- as.numeric(as.character(revised.data$my.year))
revised.data$my.hour <- as.numeric(as.character(revised.data$my.hour))
revised.data$my.minute <- as.numeric(as.character(revised.data$my.minute))
revised.data$my.second <- as.numeric(as.character(revised.data$my.second))
revised.data
str(revised.data)
Thank you for any assistance in generalizing the above regex (or streamlining the procedure using date / time functions). The apply function probably can eliminate most of the as.numeric(as.character) statements, although that is a relatively minor issue.
Give a try to \\D+
> x <- "18/05/2011 07:32:40"
> strsplit(x, "\\D+")
[[1]]
[1] "18" "05" "2011" "07" "32" "40"
or
> strsplit(x, "[^0-9]+")
[[1]]
[1] "18" "05" "2011" "07" "32" "40"
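Building on the split above, a minimal sketch of going straight to a numeric matrix, which avoids the per-column as.numeric(as.character(...)) calls from the question. The column names are shortened versions of the question's and the two-row input is only a sample. As a side note, the question's "[^\\d]+" pattern does split as intended once perl = TRUE is set, since PCRE understands \d inside a bracket expression.

```r
date_time <- c("18/05/2011 07:32:40", "19/05/2011 13:26:02")

# Split on runs of non-digits, then convert each piece to numeric
parts <- strsplit(date_time, "\\D+")
m <- do.call(rbind, lapply(parts, as.numeric))
colnames(m) <- c("day", "month", "year", "hour", "minute", "second")
m

# The bracket form from the question, with the PCRE engine switched on
parts_perl <- strsplit(date_time, "[^\\d]+", perl = TRUE)
```

This assumes every string yields exactly six pieces; rows with a different number of pieces would make rbind recycle values.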
Maybe I missed something but here is my solution:
lisda <- apply(my.data, 1, strsplit, "[^[:digit:]]")
my.data2 <- t(data.frame(lisda))
my.data2
[,1] [,2] [,3] [,4] [,5] [,6]
Date_Time "18" "05" "2011" "07" "32" "40"
Date_Time.1 "19" "05" "2011" "13" "26" "02"
Date_Time.2 "19" "05" "2011" "13" "32" "47"
Date_Time.3 "19" "05" "2011" "13" "45" "24"
Date_Time.4 "19" "05" "2011" "14" "57" "27"
Date_Time.5 "19" "05" "2011" "15" "03" "18"
Just in case you want to convert them all to numeric.
apply(my.data2, 2, function(x) as.numeric(as.character(x)))
Using cSplit
library(splitstackshape)
tmp = cSplit(my.data, "Date_Time", "/")
out = cSplit(tmp, "Date_Time_3", ":")
If you read your data like this, with the date and time in separate columns:
my.data <- read.csv(text = 'Date Time
18/05/2011 07:32:40
19/05/2011 13:26:02
19/05/2011 13:32:47
19/05/2011 13:45:24
19/05/2011 14:57:27
19/05/2011 15:03:18', header=TRUE, sep =' ' ,stringsAsFactors = FALSE, na.strings = 'NA', strip.white = TRUE)
you could do
library(splitstackshape)
out = cSplit(my.data, splitCols = c("Date", "Time"), sep = c("/", ":"))
#> out
# Date_1 Date_2 Date_3 Time_1 Time_2 Time_3
#1: 18 5 2011 7 32 40
#2: 19 5 2011 13 26 2
#3: 19 5 2011 13 32 47
#4: 19 5 2011 13 45 24
#5: 19 5 2011 14 57 27
#6: 19 5 2011 15 3 18
You might consider using read.pattern from the gsubfn package for this:
library(gsubfn)
read.pattern(text = my.data$Date_Time, pattern = "\\d+")
# V1 V2 V3 V4 V5 V6
# 1 18 5 2011 7 32 40
# 2 19 5 2011 13 26 2
# 3 19 5 2011 13 32 47
# 4 19 5 2011 13 45 24
# 5 19 5 2011 14 57 27
# 6 19 5 2011 15 3 18
Then you can simply assign the column names as you desire.
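Since the question also asks about a date/time function, here is a sketch of that route using base R's as.POSIXlt(). The format string is inferred from the sample data (day/month/year), tz = "UTC" is an assumption made only to sidestep daylight-saving gaps, and the column names follow the question's.

```r
date_time <- c("18/05/2011 07:32:40", "19/05/2011 13:26:02")

# Parse once; POSIXlt stores the components as separate fields
lt <- as.POSIXlt(date_time, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

parts <- data.frame(
  my.day    = lt$mday,
  my.month  = lt$mon + 1,      # mon is 0-based
  my.year   = lt$year + 1900,  # year counts from 1900
  my.hour   = lt$hour,
  my.minute = lt$min,
  my.second = lt$sec
)
parts
```

Strings that do not fit the format come back as NA, which can be a useful validity check that the pure regex split does not give.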

How to split character and numerical separately in R

I have a dataframe which looks like this:
df= data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
And I want to split this into a data frame with 3 columns so that the output looks like:
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
I have tried stringr() and grep() with limited success. The lack of a delimiter makes it a lot more difficult.
You could try
library(tidyr)
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 1 Alex 100.00
#2 2 Rina Faso 92.31
#3 3 john 50.00
str(res)
# 'data.frame': 3 obs. of 3 variables:
#$ name1: int 1 2 3
#$ name2: Factor w/ 3 levels "Alex","john",..: 1 3 2
# $ name3: num 100 92.3 50
Update
Based on 'df' from @DavidArenburg's post
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 121 Réunion 13.76
#2 2 Côte d'Ivoire 22.40
#3 3 john 50.00
Try with str_match from stringr:
str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")
# [,1] [,2] [,3] [,4]
# [1,] "1Alex100.00" "1" "Alex" "100.00"
# [2,] "2Rina Faso92.31" "2" "Rina Faso" "92.31"
# [3,] "3john50.00" "3" "john" "50.00"
So as.data.frame(str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")[,-1]) should give you the desired result.
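The [A-Za-z ]* group above only covers ASCII letters and spaces, so a name with an apostrophe (or any other non-letter) would stop the match early. A variant that treats the middle part as "anything that is not a digit" can be sketched in base R with regexec() and regmatches(); the third input value is made up to show the apostrophe case, and the result is typed rather than left as character.

```r
x <- c("1Alex100.00", "12Rina Faso92.31", "2Cote d'Ivoire22.40")

# Leading digits, then a run of non-digits, then digits and dots to the end
pattern <- "^(\\d+)(\\D+)([0-9.]+)$"
m <- regmatches(x, regexec(pattern, x))

# Each element is c(full match, group 1, group 2, group 3); drop the full match
parts <- do.call(rbind, m)[, -1, drop = FALSE]

out <- data.frame(
  name1 = as.integer(parts[, 1]),
  name2 = parts[, 2],
  name3 = as.numeric(parts[, 3]),
  stringsAsFactors = FALSE
)
out
```

This relies on every string matching the pattern; a non-matching string gives a zero-length element and would need to be filtered out before the rbind.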
You could do like this also.
> df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
> x <- do.call(rbind.data.frame, strsplit(as.character(df$name), "(?<=[A-Za-z])(?=\\d)|(?<=\\d)(?=[A-Za-z])", perl=T))
> colnames(x) <- c("name1", "name2", "name3")
> print(x, row.names=FALSE)
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
With base R it can be done too. It is a bit uglier, though it also works with special characters:
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "1" "Alex" "100.00"
# [2,] "2" "Rina Faso" "92.31"
# [3,] "3" "john" "50.00"
An example on special characters
df = data.frame(name= c("121Réunion13.76","2Côte d'Ivoire22.40","3john50.00"))
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "121" "Réunion" "13.76"
# [2,] "2" "Côte d'Ivoire" "22.40"
# [3,] "3" "john" "50.00"
Base R solutions that are not ugly:
proto=data.frame(name1=numeric(),name2=character(),name3=numeric())
strcapture("(\\d+)(\\D+)(.*)",as.character(df$name),proto)
name1 name2 name3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
read.table(text=gsub("(\\d+)(\\D+)(.*)","\\1|\\2|\\3",df$name),sep="|")
V1 V2 V3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
You could use the package unglue :
df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
library(unglue)
unglue_unnest(df, name, "{name1}{name2=\\D+}{name3}", convert = TRUE)
#> name1 name2 name3
#> 1 1 Alex 100.00
#> 2 12 Rina Faso 92.31
#> 3 113 john 0.00

Positive look ahead in R - passing variables

I got stuck in a regular expression.
I usually use this line of code to find overlapping repetitions in strings:
gregexpr("(?=ATGGGCT)",text,perl=TRUE)
[[1]]
[1] 16 45 52 75 203 210 266 273 327 364 436 443 480 506 534 570 649
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"useBytes")
[1] TRUE
Now I want to give to gregexpr a pattern contained in a variable:
x="GGC"
and of course if I just write x inside the pattern, gregexpr searches for the literal character "x" rather than for what the variable contains:
gregexpr("(?=x)",text,perl=TRUE)
[[1]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
How can I pass my variable to gregexpr in this case of positive look ahead?
I'd play with the sprintf function:
x <- "AGA"
text <- "ACAGAGACTTTAGATAGAGAAGA"
gregexpr(sprintf("(?=%s)", x), text, perl=TRUE)
## [[1]]
## [1] 3 5 12 16 18 21
## attr(,"match.length")
## [1] 0 0 0 0 0 0
## attr(,"useBytes")
## [1] TRUE
sprintf substitutes the occurrence of %s by the value of x.
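If x may contain regex metacharacters (a dot, for instance), the interpolated pattern will treat them as such. One way to match x literally, assuming perl = TRUE is in use anyway and x does not itself contain \E, is to wrap it in PCRE's \Q...\E quoting. The strings below are made up for illustration.

```r
x    <- "A.A"
text <- "AGAA.AA.A"

# Unquoted: "." matches any character, so "AGA" at position 1 also counts
plain <- gregexpr(sprintf("(?=%s)", x), text, perl = TRUE)[[1]]

# Quoted: only the literal "A.A" is found
quoted <- gregexpr(sprintf("(?=\\Q%s\\E)", x), text, perl = TRUE)[[1]]

as.integer(plain)   # 1 4 7
as.integer(quoted)  # 4 7
```

For DNA-style strings made only of letters this makes no difference, so the plain sprintf call is enough there.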
You could use paste0(...), which is shorthand for paste(..., sep = ""):
x <- "GGC"
text <- 'ATGGGCTATGGGCTATGGGCTATGGGCT'
gregexpr(paste0('(?=', x, ')'), text, perl=TRUE)
# [[1]]
# [1] 4 11 18 25
# attr(,"match.length")
# [1] 0 0 0 0
# attr(,"useBytes")
# [1] TRUE
And if you want to access the overlapping matches, take a look at Overlapping matches in R
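Because a lookahead match has length zero, regmatches() would return only empty strings here. When x is a literal string of known length, one simple way to recover the overlapping hits is to take substrings starting at the reported positions. This is a sketch under that assumption; gregexpr reports "no match" as -1, which would need separate handling.

```r
x    <- "AGA"
text <- "ACAGAGACTTTAGATAGAGAAGA"

# Start positions of every occurrence, overlapping ones included
pos <- as.integer(gregexpr(sprintf("(?=%s)", x), text, perl = TRUE)[[1]])

# Cut out nchar(x) characters at each start position
hits <- substring(text, pos, pos + nchar(x) - 1)

# For comparison: without the lookahead, overlapping occurrences are skipped
plain <- as.integer(gregexpr(x, text)[[1]])

pos
hits
plain
```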
The fn$ prefix in gsubfn package supports string interpolation:
library(gsubfn)
# test data
text <- "ATGGGCTAAATGGGCT"
x <- "GGGC"
fn$gregexpr("(?=$x)", text, perl = TRUE)
See ?fn , the gsubfn home page and the gsubfn vignette, vignette("gsubfn") .
OK, I solved it this way:
text="ATGGGCTAAATGGGCT"
x="GGC"
c=paste("(?=",x,")",sep="")
r=gregexpr(c,text,perl=TRUE)

Combining (cbind) vectors of different length

I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and tried to combine them using do.call(cbind, ...):
nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 1
# [7,] 7 3 2
# [8,] 8 4 3
# Warning message:
# In (function (..., deparse.level = 1) :
# number of rows of result is not a multiple of vector length (arg 2)
As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.
Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 NA
# [7,] 7 NA NA
# [8,] 8 NA NA
How can I go about doing this?
You can use indexing: if you index beyond the length of a vector, R returns NA for the missing positions. This works for any number of rows, defined here with foo:
nm <- list(1:8,3:8,1:5)
foo <- 8
sapply(nm, '[', 1:foo)
EDIT:
Or in one line using the largest vector as number of rows:
sapply(nm, '[', seq(max(sapply(nm,length))))
From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):
sapply(nm, '[', seq(max(lengths(nm))))
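The indexing trick can be seen in isolation and then wrapped up for reuse. The helper name pad_cbind is hypothetical, and the sketch assumes a non-empty list of atomic vectors and R 3.2.0 or later for lengths().

```r
nm <- list(1:8, 3:8, 1:5)

# Indexing past the end of a vector yields NA rather than an error
(1:5)[1:8]

# Hypothetical helper: pad every element to the longest length, then bind
pad_cbind <- function(lst) {
  n <- max(lengths(lst))
  sapply(lst, `[`, seq_len(n))
}

m <- pad_cbind(nm)
m
```

seq_len(n) is used instead of seq(n) so that a length of zero gives an empty index rather than c(1, 0).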
You should fill vectors with NA before calling do.call.
nm <- list(1:8,3:8,1:5)
max_length <- max(unlist(lapply(nm,length)))
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)  # start from an all-NA vector
  ans[seq_along(x)] <- x                   # overwrite the leading positions
  ans
})
do.call(cbind,nm_filled)
This is a shorter version of Wojciech's solution.
nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
c(x, rep(NA, max_length - length(x)))
})
Here is an option using stri_list2matrix from stringi
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'
out
# [,1] [,2] [,3]
#[1,] 1 3 1
#[2,] 2 4 2
#[3,] 3 5 3
#[4,] 4 6 4
#[5,] 5 7 5
#[6,] 6 8 NA
#[7,] 7 NA NA
#[8,] 8 NA NA
Late to the party but you could use cbind.fill from rowr package with fill = NA
library(rowr)
do.call(cbind.fill, c(nm, fill = NA))
# object object object
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
If you have a named list instead and want to maintain the headers you could use setNames
nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))
# a b c
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
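If adding a package is not an option, the indexing approach from earlier in this thread also carries list names through to the column names in base R. A minimal sketch, assuming a non-empty named list of atomic vectors:

```r
nm <- list(a = 1:8, b = 3:8, c = 1:5)

# sapply keeps the list names as column names of the resulting matrix
m <- sapply(nm, `[`, seq_len(max(lengths(nm))))
colnames(m)

# Convert if a data frame is wanted, as cbind.fill returns
df <- as.data.frame(m)
df
```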