Subset elements in a list based on a logical condition

How can I subset a list based on a condition (TRUE, FALSE) in another list? Please, see my example below:
l <- list(a=c(1,2,3), b=c(4,5,6,5), c=c(3,4,5,6))
l
$a
[1] 1 2 3
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6
cond <- lapply(l, function(x) length(x) > 3)
cond
$a
[1] FALSE
$b
[1] TRUE
$c
[1] TRUE
> l[cond]
Error in l[cond] : invalid subscript type 'list'

This is what the Filter function was made for:
Filter(function(x) length(x) > 3, l)
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6

Another way is to use sapply instead of lapply.
cond <- sapply(l, function(x) length(x) > 3)
l[cond]
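A type-stable variant of the same idea, shown as a minimal sketch: sapply returns an empty list rather than a logical vector when its input is empty, which would bring back the "invalid subscript type 'list'" error, whereas vapply always returns the declared type. The object names follow the question.

```r
l <- list(a = c(1, 2, 3), b = c(4, 5, 6, 5), c = c(3, 4, 5, 6))

# vapply() guarantees one logical value per element
cond <- vapply(l, function(x) length(x) > 3, logical(1))
res <- l[cond]

# An empty input still yields a logical vector, so `[` keeps working
empty <- list()
empty_cond <- vapply(empty, function(x) length(x) > 3, logical(1))
empty_res <- empty[empty_cond]
```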

[ is expecting a vector, so use unlist on cond:
l[unlist(cond)]
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6

> l[as.logical(cond)]
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6

I recently learned about lengths(), which returns the length of each element of a list. This lets us avoid building a second list of logical values, as the OP tried.
lengths(l)
#a b c
#3 4 4
Using this in a logical condition, we can subset list elements in l.
l[lengths(l) > 3]
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6

Well, I am very new to R, but since it is a functional language, the best solution according to the previous answers is something like:
filter <- function (inputList, selector) sapply(inputList, function (element) selector(element))
Assume you have a complex list like yours:
myList <- list(
a=c(1,2,3),
b=c(4,5,6,5),
c=c(3,4,5,6))
Then you can filter the elements like:
selection <- myList[filter(myList, function (element) length(element) > 3)]
Of course, this also works for lists that contain just a single value at the first level:
anotherList <- list(1, 2, 3, 4)
selection <- anotherList[filter(anotherList, function (element) element == 2)]
Or you can put it all together like:
filter <- function (inputList, selector) inputList[sapply(inputList, function (element) selector(element))]
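For illustration, here is how the combined helper might be used. This is only a sketch: the name keep_if is hypothetical and was chosen because defining a function called filter in the global environment masks stats::filter. Base R's Filter() already covers the same use case.

```r
# Hypothetical helper: keep the elements for which selector() returns TRUE
keep_if <- function(input_list, selector) {
  keep <- vapply(input_list,
                 function(element) isTRUE(selector(element)),
                 logical(1))
  input_list[keep]
}

my_list <- list(a = c(1, 2, 3), b = c(4, 5, 6, 5), c = c(3, 4, 5, 6))
long_ones <- keep_if(my_list, function(element) length(element) > 3)

another_list <- list(1, 2, 3, 4)
twos <- keep_if(another_list, function(element) element == 2)
```

isTRUE() treats NA or non-scalar selector results as "drop", which is an assumption about the desired behaviour rather than something the answer specifies.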

cond <- lapply(l, length) > 3
l[cond]

l <- list(a=c(1,2,3), b=c(4,5,6,5), c=c(3,4,5,6))
l[lengths(l) > 3]
$b
[1] 4 5 6 5
$c
[1] 3 4 5 6
If a condition on value is needed:
cond <- lapply(l, function(i) i > 3)
res <- Map(`[`, l, cond)
res
$a
numeric(0)
$b
[1] 4 5 6 5
$c
[1] 4 5 6
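The same value-based filtering can be written in a single pass, and the elements that end up empty can then be dropped with lengths(). A minimal sketch using the list from the question:

```r
l <- list(a = c(1, 2, 3), b = c(4, 5, 6, 5), c = c(3, 4, 5, 6))

# Keep only the values greater than 3 inside each element
res <- lapply(l, function(x) x[x > 3])

# Optionally drop elements that have no values left
res_nonempty <- res[lengths(res) > 0]
```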

Related

Use lapply to plot data in a list and use names of list elements as plot titles [duplicate]

This question already has an answer here:
Adding lists names as plot titles in lapply call in R
(1 answer)
Closed 7 years ago.
If I have the following list:
comp.surv <- list(a = 1:4, b = c(1, 2, 4, 8), c = c(1, 3, 8, 27))
comp.surv
# $a
# [1] 1 2 3 4
#
# $b
# [1] 1 2 4 8
#
# $c
# [1] 1 3 8 27
I can use lapply to plot each list element:
lapply(comp.surv, function(x) plot(x))
However, I want to include the name of each list element as the plot title (main). For my example data, the titles of the graphs would be a, b and c respectively. First, I have a gsub rule that, given comp.surv$a, returns a:
gsub("comp.surv\\$([a-z]+)", "\\1", deparse(substitute(comp.surv$a)))
# "a"
Which is good. However I cannot embed this result into my lapply statement above. Any ideas?
In the meantime I have tried getting around this by creating a function that includes the main parameter:
splot <- function(x){
plot(x, main = gsub("comp.surv\\$([a-z]+)", "\\1", deparse(substitute(x))))
}
lapply(comp.surv, function(x) splot(x))
This will plot each sub-variable of comp.surv, but all the titles are blank.
Can anyone recommend if I am going down the right track?

One possibility would be to loop over the names of the list:
lapply(names(comp.surv), function(x) plot(comp.surv[[x]], main = x))
Or slightly more verbose, loop over the list indices:
lapply(seq_along(comp.surv), function(x) plot(comp.surv[[x]], main = names(comp.surv)[x]))
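Why the deparse(substitute()) route from the question cannot recover the names: by the time the plotting function is called from inside lapply, the expression it receives is just the anonymous function's argument, not comp.surv$a. A small sketch without plotting (label_of is a hypothetical helper used only for this demonstration):

```r
comp.surv <- list(a = 1:4, b = c(1, 2, 4, 8), c = c(1, 3, 8, 27))

# Returns the text of the expression passed as its argument
label_of <- function(x) deparse(substitute(x))

direct <- label_of(comp.surv$a)

# Inside lapply() the expression seen by label_of() is always the symbol x
via_lapply <- lapply(comp.surv, function(x) label_of(x))

# The names are still available on the list itself
titles <- names(comp.surv)
```

This is why iterating over names(comp.surv) or over the indices, as in the answer above, is the more reliable approach.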

Is that what you want?
ns=names(comp.surv)
lapply(ns, function(x) plot(comp.surv[[x]], main=x,ylab="y"))

How to properly manipulate a string column in a data frame in R?

I have a data.frame with a string column that contains periods e.g "a.b.c.X". I want to split out the string by periods and retain the third segment e.g. "c" in the example given. Here is what I'm doing.
> df = data.frame(v=c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b=seq(1,3))
> df
v b
1 a.b.a.X 1
2 a.b.b.X 2
3 a.b.c.X 3
And what I want is
> df = data.frame(v=c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b=seq(1,3))
> df
v b
1 a 1
2 b 2
3 c 3
I'm attempting to use within, but I'm getting strange results. The value in the first row in the first column is being repeated.
> get = function(x) { unlist(strsplit(x, "\\."))[3] }
> within(df, v <- get(as.character(v)))
v b
1 a 1
2 a 2
3 a 3
What is the best practice for doing this? What am I doing wrong?
Update:
Here is the solution I used from @agstudy's answer:
> df = data.frame(v=c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b=seq(1,3))
> get = function(x) gsub(".*?[.].*?[.](.*?)[.].*", '\\1', x)
> within(df, v <- get(v))
v b
1 a 1
2 b 2
3 c 3

Using some regular expression you can do :
gsub(".*?[.].*?[.](.*?)[.].*", '\\1', df$v)
[1] "a" "b" "c"
Or more concise:
gsub("(.*?[.]){2}(.*?)[.].*", '\\2', df$v)

The problem is not with within but with your get function. It returns a single character ("a") which gets recycled when added to your data.frame. Your code should look like this:
get.third <- function(x) sapply(strsplit(x, "\\."), `[[`, 3)
within(df, v <- get.third(as.character(v)))
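To make the recycling visible, the sketch below computes both the single value that the original get function returns and the per-row result. It assumes the column is read as character (stringsAsFactors = FALSE) and uses vapply so that each row must yield exactly one string.

```r
df <- data.frame(v = c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b = 1:3,
                 stringsAsFactors = FALSE)

parts <- strsplit(df$v, ".", fixed = TRUE)

# What the original function did: flatten everything, then take element 3
flattened_third <- unlist(parts)[3]

# What was intended: the third piece of every row
third <- vapply(parts, function(p) p[3], character(1))

df$v <- third
```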

Here is one possible solution:
df[, "v"] <- do.call(rbind, strsplit(as.character(df[, "v"]), "\\."))[, 3]
## > df
## v b
## 1 a 1
## 2 b 2
## 3 c 3

The answer to "what am I doing wrong" is that the bit of code that you thought was extracting the third element of each split string was actually putting all the elements of all your strings in a single vector, and then returning the third element of that:
get = function(x) {
splits = strsplit(x, "\\.")
print("All the elements: ")
print(unlist(splits))
print("The third element:")
print(unlist(splits)[3])
# What you actually wanted:
third_chars = sapply(splits, function (x) x[3])
}
within(df, v2 <- get(as.character(v)))

Find the location of a character in string

I would like to find the location of a character in a string.
Say: string = "the2quickbrownfoxeswere2tired"
I would like the function to return 4 and 24 -- the character location of the 2s in string.

You can use gregexpr
gregexpr(pattern ='2',"the2quickbrownfoxeswere2tired")
[[1]]
[1] 4 24
attr(,"match.length")
[1] 1 1
attr(,"useBytes")
[1] TRUE
or perhaps str_locate_all from package stringr, which is a wrapper for stringi::stri_locate_all (as of stringr version 1.0)
library(stringr)
str_locate_all(pattern ='2', "the2quickbrownfoxeswere2tired")
[[1]]
start end
[1,] 4 4
[2,] 24 24
note that you could simply use stringi
library(stringi)
stri_locate_all(pattern = '2', "the2quickbrownfoxeswere2tired", fixed = TRUE)
Another option in base R would be something like
lapply(strsplit(x, ''), function(x) which(x == '2'))
should work (given a character vector x)

Here's another straightforward alternative.
> which(strsplit(string, "")[[1]]=="2")
[1] 4 24

You can make the output just 4 and 24 using unlist:
unlist(gregexpr(pattern ='2',"the2quickbrownfoxeswere2tired"))
[1] 4 24
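One caveat worth sketching: gregexpr treats the pattern as a regular expression by default, so characters such as "." need fixed = TRUE, and a string with no match yields -1 rather than an empty result. The wrapper name char_positions is hypothetical.

```r
# Positions of a literal character (or substring) in a single string
char_positions <- function(string, ch) {
  pos <- as.integer(gregexpr(ch, string, fixed = TRUE)[[1]])
  pos[pos > 0]  # gregexpr() reports "no match" as -1
}

twos <- char_positions("the2quickbrownfoxeswere2tired", "2")
dots <- char_positions("a.b.c", ".")
none <- char_positions("abc", "z")
```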

Find the position of the nth occurrence of str2 in str1 (same order of parameters as Oracle SQL INSTR); returns 0 if not found:
instr <- function(str1,str2,startpos=1,n=1){
aa=unlist(strsplit(substring(str1,startpos),str2))
if(length(aa) < n+1 ) return(0);
return(sum(nchar(aa[1:n])) + startpos+(n-1)*nchar(str2) )
}
instr('xxabcdefabdddfabx','ab')
[1] 3
instr('xxabcdefabdddfabx','ab',1,3)
[1] 15
instr('xxabcdefabdddfabx','xx',2,1)
[1] 0
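A possible variation, offered as a sketch rather than a replacement: strsplit drops a trailing empty piece, so the version above returns 0 when the match sits at the very end of the string (for example instr('xxab','ab')), and it also interprets str2 as a regular expression. The hypothetical instr_fixed below uses gregexpr with fixed = TRUE to avoid both.

```r
# Position of the nth literal occurrence of str2 in str1, searching from startpos
instr_fixed <- function(str1, str2, startpos = 1, n = 1) {
  hits <- gregexpr(str2, substring(str1, startpos), fixed = TRUE)[[1]]
  if (hits[1] == -1 || length(hits) < n) return(0)
  as.integer(hits[n]) + startpos - 1
}

first <- instr_fixed("xxabcdefabdddfabx", "ab")
third <- instr_fixed("xxabcdefabdddfabx", "ab", 1, 3)
missing_hit <- instr_fixed("xxabcdefabdddfabx", "xx", 2, 1)
at_end <- instr_fixed("xxab", "ab")
```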

To only find the first locations, use lapply() with min():
my_string <- c("test1", "test1test1", "test1test1test1")
unlist(lapply(gregexpr(pattern = '1', my_string), min))
#> [1] 5 5 5
# or the readable tidyverse form (the %>% pipe comes from magrittr)
library(magrittr)
my_string %>%
gregexpr(pattern = '1') %>%
lapply(min) %>%
unlist()
#> [1] 5 5 5
To only find the last locations, use lapply() with max():
unlist(lapply(gregexpr(pattern = '1', my_string), max))
#> [1] 5 10 15
# or the readable tidyverse form
my_string %>%
gregexpr(pattern = '1') %>%
lapply(max) %>%
unlist()
#> [1] 5 10 15
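For the first location specifically, base R's regexpr already returns one position per string, so no lapply is needed. A minimal sketch with the same sample data, using vapply for the last position so that the result type is fixed:

```r
my_string <- c("test1", "test1test1", "test1test1test1")

# regexpr() gives the first match in each string (-1 if there is none)
first_pos <- as.integer(regexpr("1", my_string, fixed = TRUE))

# The last match still needs all matches from gregexpr()
last_pos <- vapply(gregexpr("1", my_string, fixed = TRUE),
                   function(p) as.integer(max(p)),
                   integer(1))
```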

You could use grep as well:
grep('2', strsplit(string, '')[[1]])
#4 24

Setting vector as a list component in R

I want to create a list which has 1 element called 'a' that holds a vector of doubles.
l<-list('a'=1:1000)
does the trick.
However, what if I want to do it dynamically?
l<-list()
l['a']<-1:1000
does not work!
How can I allocate enough memory when creating the list?
Thanks

Then you do
> l<-list()
> l[['a']]<-1:10
> l
$a
[1] 1 2 3 4 5 6 7 8 9 10
which works fine. With lists, [...] gives you a list with the selected elements, whereas [[...]] gives you the selected element itself. See also the help page ?Extract
EDIT :
or, as Tim said, l$a <- 1:10 does the same. The advantage of [[...]] lies in
> l <- list()
> aname <- 'a'
> l[[aname]] <- 1:10
> l
$a
[1] 1 2 3 4 5 6 7 8 9 10
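Two related points, sketched under the assumption that the element names are known up front: single brackets do work if the right-hand side is itself a list, and a list can be created at a given length with vector("list", n) when preallocation is wanted. The names slots, x, y and z are placeholders.

```r
l <- list()
# With `[`, the replacement must be a list; each of its elements fills one slot.
# Note that 1:1000 is an integer vector; wrap it in as.numeric() for doubles.
l["a"] <- list(1:1000)

# Preallocate three empty (NULL) slots and name them
slots <- vector("list", 3)
names(slots) <- c("x", "y", "z")
slots[["y"]] <- c(1.5, 2.5)
```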

Generating a vector of the number of items in each list item

I have a list containing 98 items. But each item contains 0, 1, 2, 3, 4 or 5 character strings.
I know how to get the length of the list and in fact someone has asked the question before and got voted down for presumably asking such an easy question.
But I want a vector that is 98 elements long with each element being an integer from 0 to 5 telling me how many character strings there are in each list item.
I was expecting the following to work but it did not.
lapply(name.of.list,length())
From my question you will see that I do not really know the nomenclature of lists and items. Feel free to straighten me out.

Farrel, I do not exactly follow, as 'item' is not an R type. Maybe you have a list of length 98 where each element is a vector of character strings?
In that case, consider this:
R> fl <- list(A=c("un", "deux"), B=c("one"), C=c("eins", "zwei", "drei"))
R> lapply(fl, function(x) length(x))
$A
[1] 2
$B
[1] 1
$C
[1] 3
R> do.call(rbind, lapply(fl, function(x) length(x)))
[,1]
A 2
B 1
C 3
R>
So there is your vector of the length of your list, telling you how many strings each list element has. Note the last do.call(rbind, someList) as we got a list back from lapply.
If, on the other hand, you want to count the length of all the strings at each list position, replace the simple length(x) with a new function counting the characters:
R> lapply(fl, function(x) { sapply(x, function(y) nchar(y)) } )
$A
un deux
2 4
$B
one
3
$C
eins zwei drei
4 4 4
R>
If that is not what you want, maybe you could mock up some example input data?
Edit:: In response to your comments, what you wanted is probably:
R> do.call(rbind, lapply(fl, length))
[,1]
A 2
B 1
C 3
R>
Note that I pass in length, the name of a function, and not length(), the (displayed) body of a function. Because that is easy to mix up, I almost always wrap an anonymous function around it, as in my first answer.
And yes, this can also be done with just sapply or even some of the **ply functions:
R> sapply(fl, length)
A B C
2 1 3
R> laply(fl, length)  # laply() is from the plyr package
[1] 2 1 3
R>

All this seems very complicated - there is a function specifically doing what you were asking for:
lengths #note the plural "s"
Using Dirks sample data:
fl <- list(A=c("un", "deux"), B=c("one"), C=c("eins", "zwei", "drei"))
lengths(fl)
will return a named integer vector:
A B C
2 1 3
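Since the question mentions items holding zero strings, here is a short sketch showing that lengths() reports 0 for empty or NULL elements. The sample list items is made up for illustration.

```r
items <- list(character(0), "one", c("un", "deux"), NULL)

# One integer per list element; empty and NULL elements count as 0
n_strings <- lengths(items)

# Equivalent result with an explicit per-element type check
n_strings_vapply <- vapply(items, length, integer(1))
```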

The code below accepts a list and returns a vector of lengths:
x = c("vectors", "matrices", "arrays", "factors", "dataframes", "formulas",
"shingles", "datesandtimes", "connections", "lists")
xl = as.list(x)
fnx = function(s){length(unlist(strsplit(s, "")))}  # number of characters in one string
lv = sapply(xl, fnx)
