R: combine lists of interest

I have a list like df_all (see below).
A = matrix( ceiling(10*runif(8)), nrow=4)
colnames(A) = c("country", "year_var")
dfa = data.frame(A)
df1 = dfa[1,]
df2 = dfa[2,]
df3 = dfa[3,]
df4 = dfa[4,]
df_all = list(df1, df2, df3, df4)
df_all
Now I want to combine the list elements of interest, selected via the variable a.
a <- "2,3,4"
b <- strsplit(a, ",")[[1]]
To combine these list elements, I use the following loop:
for (i in 1:length(b)){
c<-b[i]
aa <- df_all[c:c]
print(aa)
}
Now my question is: how can I combine these results and save them as a variable?
Thanks!

Would this work for you:
basnum<-as.integer(b)
do.call(rbind, df_all[basnum])
Through df_all[basnum], a list with only the relevant data.frames is created.
do.call takes a function and a list as parameters (and some more but not relevant right now). The items of the list are then passed on as parameters to the function.
So in this case, the above is the equivalent to calling:
rbind(df_all[[2]], df_all[[3]], df_all[[4]])
And this produces one data.frame holding all the rows of interest.
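As a minimal, self-contained sketch of the same idea: the fixed values below are an assumption standing in for the question's random matrix, so that the result is reproducible, and idx and combined are just illustrative names.

```r
# Deterministic stand-in for the question's randomly generated data.
dfa <- data.frame(country = 1:4, year_var = 11:14)

# A list of four one-row data.frames, like df_all in the question.
df_all <- lapply(seq_len(nrow(dfa)), function(i) dfa[i, ])

# Turn the string "2,3,4" into integer positions.
a <- "2,3,4"
idx <- as.integer(strsplit(a, ",")[[1]])

# Select the wanted list elements and bind them into one data.frame.
combined <- do.call(rbind, df_all[idx])
print(combined)
```

Because the result is assigned to combined, it can be reused later instead of only being printed inside a loop.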

Related

Apply function to NA matrix in R

I want to use apply() to run a function that I wrote to pull data out via regular expressions, and fill a matrix with the result.
planetdata = function(dline) {
new_line = unlist(strsplit(as.character(dline),"</td><td>"))
new_first_value = substring(new_line[1],9)
new_last_value =substring(new_line[11],1,nchar(new_line[11])-10)
new_line[1] <- new_first_value
new_line[11] <- new_last_value
new_data <- new_line
return(new_data)
}
new.dt = dt[21:1912]
exo.mat = matrix(data = NA, nrow=1892, ncol = 11)
colnames(exo.mat) <- c(exo.col.names)
apply(exo.mat,2,function(new.dt) planetdata(new.dt))
However, my matrix does not change and all the values are still NA.
Why is this happening?

Did you mean this? exo.mat[] <- apply(new.dt, 2, planetdata)
R usually passes by value, not by reference. Modifying a variable inside a function will generally not modify it outside. You need to save the value out explicitly.
Also, you were passing the empty matrix in to apply(); it just didn't look that way because you made an anonymous function with a new.dt parameter, which is different from the new.dt variable you had in your session.
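Both points can be seen on a toy example. This is only a sketch: m, double_it and shout are made-up names, not part of the question's code.

```r
# Toy matrix and function (illustrative names only).
m <- matrix(1:6, nrow = 3)
double_it <- function(col) col * 2

# apply() returns a new object; m itself is not modified by this call.
res <- apply(m, 2, double_it)
unchanged <- identical(m, matrix(1:6, nrow = 3))

# The result only persists if it is assigned back explicitly.
m[] <- res

# A function parameter shadows a session variable of the same name:
new.dt <- c("a", "b")
shout <- function(new.dt) toupper(new.dt)  # this new.dt is the argument
shouted <- shout("x")                      # the session's new.dt is never used
```

Inside shout(), new.dt refers to whatever was passed in, which is why the anonymous function in the question was operating on the columns of the NA matrix.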

Multiple lists of the same length to csv

I have a couple List<string>s, with the format like this:
List 1  List 2  List 3
1       A       One
2       B       Two
3       C       Three
4       D       Four
5       E       Five
So in code form, it's like:
List<string> list1 = {"1","2","3","4","5"};
List<string> list2 = {"A","B","C","D","E"};
List<string> list3 = {"One","Two","Three","Four","Five"};
My questions are:
How do I transform those three lists to a CSV format?
list1,list2,list3
1,A,One
2,B,Two
3,C,Three
4,D,Four
5,E,Five
Should I append , to the end of each index or make the delimiter its own index within the multidimensional list?

If performance is your main concern, I would use an existing csv library for your language, as it's probably been pretty well optimized.
If that's too much overhead, and you just want a simple function, I use the same concept in some of my code. I use the join/implode function of a language to create a list of comma separated strings, then join that list with \n.
I'm used to doing this in a dynamic language, but you can see the concept in the following pseudocode example:
header = {"List1", "List2", "List3"}
list1 = {"1","2","3","4","5"};
list2 = {"A","B","C","D","E"};
list3 = {"One","Two","Three","Four","Five"};
// one comma-joined string per CSV line, header first
lines = {header.join(",")};
for i in 0 .. list1.length - 1
    lines.append({list1[i], list2[i], list3[i]}.join(","));
// join the lines with newlines
csv = lines.join("\n");
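For a runnable version of the join idea, here is a minimal sketch in R. R is used only because it is the language of the other threads on this page, not the asker's language. It assumes the values contain no commas, quotes or newlines, since nothing is escaped.

```r
list1 <- c("1", "2", "3", "4", "5")
list2 <- c("A", "B", "C", "D", "E")
list3 <- c("One", "Two", "Three", "Four", "Five")

# Header line: the column names joined with commas.
header <- paste(c("list1", "list2", "list3"), collapse = ",")

# paste() is vectorised, so this builds one "1,A,One"-style string per row.
rows <- paste(list1, list2, list3, sep = ",")

# Join the header and the rows with newlines to get the CSV text.
csv <- paste(c(header, rows), collapse = "\n")
cat(csv, "\n")
```

The delimiter is added only at join time, so there is no need to append "," to individual elements or to store it as its own entry. For real data, an existing writer such as write.csv() on a data.frame handles quoting for you.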

R Subset Dataset Using Regular Expression

Is there a way to make the R code below run quicker (i.e. vectorized to avoid use of for loops)?
My example contains two data frames. The first has dimension n1*p, and one of its p columns contains names. The second is a single column (n2*1) that also contains names. I want to keep every row of the first data frame whose name contains any of the names in the second data frame. Sorry for the brutal explanation.
Example (Data frame 1):
x y
Doggy 1
Hello 2
Hi Dog 3
Zebra 4
Example (Data frame 2)
z
Hello
Dog
So in the above example I want to keep rows 1,2,3 but NOT 4. Since "Dog" appears in "Doggy" and "Hi Dog". And "Hello" appears in "Hello". Exclude row four since no part of "Hello" or "Dog" appears in "Zebra".
Below is my R code to do this; it runs fine. However, in my real task data frame 1 has 1 million rows and data frame 2 has 50 items to match on, so it runs pretty slowly. Any suggestions on how to speed this up are appreciated.
x <- c("Doggy", "Hello", "Hi Dog", "Zebra")
y <- 1:4
dat <- as.data.frame(cbind(x,y))
names(dat) <- c("x","y")
z <- as.data.frame(c("Hello", "Dog"))
names(z) <- c("z")
dat$flag <- NA
for(j in 1:length(z$z)){
for(i in 1:dim(dat)[1]){
if ( is.na(dat$flag[i])==TRUE ) {
dat$flag[i] <- length(grep(paste(z[j,1]), dat[i,1], perl=TRUE, value=TRUE))
} else {
if (dat$flag[i]==0) {
dat$flag[i] <- length(grep(paste(z[j,1]), dat[i,1], perl=TRUE, value=TRUE))
} else {
if (dat$flag[i]==1) {
dat$flag[i]==1
}
}
}
}
}
dat1 <- subset(dat, flag==1)
dat1

Try this:
dat[grep(paste(z$z, collapse = "|"), dat$x), ]
or
subset(dat, grepl(paste(z$z, collapse = "|"), x))
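Spelled out step by step on the question's example data, the approach looks like this. It is a sketch only: the intermediate names pattern and keep are introduced here for illustration, and stringsAsFactors = FALSE is set so the columns are plain character vectors.

```r
dat <- data.frame(x = c("Doggy", "Hello", "Hi Dog", "Zebra"),
                  y = 1:4,
                  stringsAsFactors = FALSE)
z <- data.frame(z = c("Hello", "Dog"), stringsAsFactors = FALSE)

# Collapse all search terms into one alternation: "Hello|Dog".
pattern <- paste(z$z, collapse = "|")

# grepl() is vectorised over dat$x, so no explicit loops are needed.
keep <- grepl(pattern, dat$x)

dat1 <- dat[keep, ]
print(dat1)
```

One caveat: the terms are treated as regular expressions, so if the real names can contain characters such as . or ( they would need escaping first.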

This question inspired a boolean text search function (%bs%) in the qdap package and thus I thought I'd share the approach to this question:
library(qdap)
dat[dat$x %bs% paste(z$z, collapse = "OR"), ]
In this case it is no less typing, but if multiple OR/AND conditions are involved this may be a useful approach.

Function to subset dataframe using pattern list argument

I have a pattern list
patternlist <- list('one' = paste(c('a','b','c'),collapse="|"), 'two' = paste(1:5,collapse="|"), 'three' = paste(c('k','l','m'),collapse="|"))
that I want to select from to extract rows from a data frame
dataframez <- data.frame('letters' = c('a','b','c'), 'numbers' = 1:3, 'otherletters' = c('k','l','m'))
with this function
pattern.record <- function(x, column="letters", value="one")
{
if (column %in% names(x))
{
result <- x[grep(patternlist$value, x$column, ignore.case=T),]
}
else
{
result <- NA
}
return(result)
}
oddly enough, I get an error when I run it:
> pattern.record(dataframez)
Error in grep(patternlist$value, x$column, ignore.case = T) :
invalid 'pattern' argument

The problem is your use of the `$` operator.
In your function, x$column looks for a column (a named element) that is literally called column, and patternlist$value looks for an element literally called value.
It is far simpler here to use `[[`
Then x[[column]] uses what column is defined as, not column as a name.
The relevant lines in ?`$` are
Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
You are trying to use value and column as computed indices (i.e. computing what value and column are defined as), thus you need `[[`.
The function becomes
pattern.record <- function(x, column="letters", value="one", pattern_list)
{
if (column %in% names(x))
{
result <- x[grep(pattern_list[[value]], x[[column]], ignore.case=T),]
}
else
{
result <- NA
}
return(result)
}
pattern.record(dataframez, pattern_list = patternlist)
## letters numbers otherletters
## 1 a 1 k
## 2 b 2 l
## 3 c 3 m
Note that I've also added an argument pattern_list, so the function does not depend on an object named patternlist existing somewhere in the parent environments (in your case, the global environment).
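The difference between the two operators is easy to check in isolation. This is a small sketch with made-up data: df, value, column and hits are illustrative names, and the patterns are shortened versions of the ones in the question.

```r
patternlist <- list(one = "a|b|c", two = "1|2|3")
df <- data.frame(letters = c("a", "b", "z"),
                 numbers = 1:3,
                 stringsAsFactors = FALSE)

value  <- "one"
column <- "letters"

# `$` uses the literal name, so these look for elements called
# "value" and "column", which do not exist: both give NULL.
by_dollar_list <- patternlist$value
by_dollar_df   <- df$column

# `[[` evaluates its argument, so the strings stored in value/column are used.
by_bracket_list <- patternlist[[value]]
by_bracket_df   <- df[[column]]

# With `[[`, grep() receives a real pattern and a real column.
hits <- df[grep(patternlist[[value]], df[[column]], ignore.case = TRUE), ]
```

Passing NULL as the pattern is what makes grep() fail with the "invalid 'pattern' argument" error in the original function.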

Can one add a data.frame to itself?

I want to append or add a data.frame to itself...
Much in the same way that one adds:
n <- n + t
I have a function that creates a data.frame.
I have been using:
g <- function(compareA,compareB) {
for (i in 1:1000) {
ttr <- t.test(compareA, compareB, var.equal = TRUE)
tt_pvalues[i] <- ttr$p.value
}
name_tag <- paste(nameA, nameB, sep = "_Vs_")
tt_titles <- data.frame(name_tag, tt_titles)
# character vector which I want to add to a list
ALL_pvalues <- data.frame(tt_pvalues, ALL_pvalues)
# adding a numeric vector of values to a larger data.frame
}
Would cbind be better here?

There are two methods that would "add or append" data to a data.frame by columns and one that would append by rows. Assuming tag is the data.frame and tt_titles is a vector with as many elements as tag has rows, either of these would work:
tag <- cbind(tag, tt_titles)
# tt_titles could also be a data.frame with same number of rows
Or:
tag[["tt_titles"]] <- tt_titles
Now let's assume that we instead have two data.frames with the same column names:
bigger.df <- rbind(tag, tag2)
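Here are all three variants on toy data. This is a sketch only: the contents of tag, tt_titles and tag2 are invented for illustration, and by_cbind and by_assign are hypothetical names.

```r
tag <- data.frame(name_tag = c("A_Vs_B", "A_Vs_C"), stringsAsFactors = FALSE)
tt_titles <- c("first", "second")   # one element per row of tag

# Column-wise, variant 1: cbind() returns a new data.frame.
by_cbind <- cbind(tag, tt_titles)

# Column-wise, variant 2: assign a new column on a copy of tag.
by_assign <- tag
by_assign[["tt_titles"]] <- tt_titles

# Row-wise: both data.frames must have the same column names.
tag2 <- data.frame(name_tag = "B_Vs_C", stringsAsFactors = FALSE)
bigger.df <- rbind(tag, tag2)
```

In each case the result has to be assigned back (for example tag <- cbind(tag, tt_titles)) to get the n <- n + t style of accumulation the question asks about.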
