Apply function to NA matrix in R - regex

I want to use apply() to run a function that pulls data out of each line via regular expressions, and fill a matrix with the results.
planetdata = function(dline) {
new_line = unlist(strsplit(as.character(dline),"</td><td>"))
new_first_value = substring(new_line[1],9)
new_last_value =substring(new_line[11],1,nchar(new_line[11])-10)
new_line[1] <- new_first_value
new_line[11] <- new_last_value
new_data <- new_line
return(new_data)
}
new.dt = dt[21:1912]
exo.mat = matrix(data = NA, nrow=1892, ncol = 11)
colnames(exo.mat) <- c(exo.col.names)
apply(exo.mat,2,function(new.dt) planetdata(new.dt))
However, my matrix does not change and all the values are still NA.
Why is this happening?

Did you mean this? exo.mat[] <- apply(new.dt, 2, planetdata)
R usually passes by value, not by reference. Modifying a variable inside a function will generally not modify it outside, so you need to assign the returned value explicitly.
Also, you were passing the empty matrix to apply(). It just didn't look that way because you wrote an anonymous function with a parameter named new.dt, which is a different thing from the new.dt variable in your session.
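As a rough illustration of "assign the result explicitly", here is a minimal sketch. It assumes the input is a plain character vector of HTML table rows (the question does not show what dt actually holds), and parse_row and the two sample lines are made up for the example. Because apply() needs an object with dimensions, the sketch uses sapply() over the vector instead.

```r
# Hypothetical helper, modelled on planetdata() but not tied to 11 fields
parse_row <- function(dline) {
  fields <- unlist(strsplit(as.character(dline), "</td><td>", fixed = TRUE))
  n <- length(fields)
  fields[1] <- substring(fields[1], 9)                         # drop leading "<tr><td>"
  fields[n] <- substring(fields[n], 1, nchar(fields[n]) - 10)  # drop trailing "</td></tr>"
  fields
}

# Made-up input lines standing in for new.dt
lines <- c("<tr><td>a</td><td>1</td><td>x</td></tr>",
           "<tr><td>b</td><td>2</td><td>y</td></tr>")

# sapply() returns one column per input line, so transpose to get one row per line,
# and keep the result by assigning it
exo.mat <- t(sapply(lines, parse_row, USE.NAMES = FALSE))
exo.mat
```

The key point is the assignment on the last-but-one line: without it the parsed values are computed and then thrown away.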

Related

R: replacing values in string all at once

I have a data frame that looks like this:
USequence
# 1 GATCAGATC
# 2 ATCAGAC
I'm trying to create a function that would replace all the G's with C's, A's with T's, C's with G's, and T's with A's:
USequence
# 1 CTAGTCTAG
# 2 TAGTCTG
This is what I have right now. The function accepts k, a data frame with a column named USequence.
conjugator <- function(k) {
k$USequence <- str_replace_all(k$USequence,"A","T")
k$USequence <- str_replace_all(k$USequence,"T","A")
k$USequence <- str_replace_all(k$USequence,"G","C")
k$USequence <- str_replace_all(k$USequence,"C","G")
}
However, the obvious problem is that this doesn't replace the characters all at once, but in steps, which does not return the desired result. Any suggestions? Thanks
You could use chartr
df1$USequence <- chartr('GATC', 'CTAG', df1$USequence)
df1$USequence
#[1] "CTAGTCTAG" "TAGTCTG"
Or
library(gsubfn)
gsubfn('[GATC]', list(G='C', A='T', T='A', C='G'), df1$USequence)
#[1] "CTAGTCTAG" "TAGTCTG"
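To make the difference concrete, the sketch below runs the step-by-step replacements next to chartr() on the two example sequences. It uses base gsub() in place of str_replace_all() so that it needs no packages; the conjugator() rewrite at the end is one possible way to wrap the chartr() answer and assumes k is a data frame with a USequence column, as in the question.

```r
seqs <- c("GATCAGATC", "ATCAGAC")

# Step-by-step replacement: each step also rewrites letters produced by an earlier step
stepwise <- seqs
for (pair in list(c("A", "T"), c("T", "A"), c("G", "C"), c("C", "G"))) {
  stepwise <- gsub(pair[1], pair[2], stepwise, fixed = TRUE)
}

# Character-by-character translation: every letter is mapped exactly once
at_once <- chartr("GATC", "CTAG", seqs)

stepwise  # only A and G survive the four steps
at_once   # the complement asked for in the question

# Possible rewrite of conjugator(); note that it returns the modified data frame
conjugator <- function(k) {
  k$USequence <- chartr("GATC", "CTAG", k$USequence)
  k
}
df1 <- conjugator(data.frame(USequence = seqs, stringsAsFactors = FALSE))
```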

sapply function in R is not giving me the desired result

I'm trying to use sapply instead of a 'for' loop, but I'm not getting the result I expect. I've tested each line separately and the code works, but when I use sapply it does not. I'm looking for some hints on what might be wrong:
event <- c('Astronomical Low Tide', 'Avalanche', 'Blizzard', 'Coastal Flood',
'Cold/Wind Chill', 'Debris Flow', 'Dense Fog', 'Dense Smoke', 'Drought',
'Dust Devil', 'Dust Storm','Excessive Heat', 'Extreme Cold/Wind Chill',
'Flash Flood', 'Flood', 'Frost/Freeze', 'Funnel Cloud', 'Freezing Fog',
'Hail', 'Heat', 'Heavy Rain', 'Heavy Snow', 'High Surf', 'High Wind',
'Hurricane/Typhoon', 'Ice Storm', 'Lake/Effect Snow', 'Lakeshore Flood',
'Lightning', 'Marine Hail', 'Marine High Wind', 'Marine Strong Wind',
'Marine Thunderstorm Wind', 'Rip Current', 'Seiche', 'Sleet',
'Storm Surge/Tide', 'Strong Wind', 'Thunderstorm Wind', 'Tornado',
'Tropical Depression', 'Tropical Storm', 'Tsunami', 'Volcanic Ash',
'Waterspout', 'Wildfire', 'Winter Storm', 'Winter Weather')
replace <- function(dt, x, col) {
idx <- grep(paste('(?i)', event[x], sep = ''), dt[, col])
dt[idx, col] <- event[x]
}
sapply(1:length(event), function(x) replace(stormdata, x, 8))
Basically, I'm trying to use every value in the event variable as a pattern for grep inside the custom replace function. I get the indices of the rows that match the pattern and store them in idx. Then I want to replace those rows in the data frame with the value from the event variable.
I'm trying to loop with sapply over every value in event, so the loop should run 48 times, look for each pattern in the 8th column of the data frame stormdata, and replace the matches. BUT my code does nothing: after running it the data remains the same, with no substitutions. When I run each line separately, without sapply, it works.
I've looked everywhere and can't find why it isn't working. Help.
Try using global assignment, e.g. stormdata[idx, col] <<- event[x], in your function. It's not clean, but it will probably work.
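If you would rather avoid `<<-`, one alternative is to have the function return the modified data frame and assign that result. The sketch below is only an illustration: the storm data frame, its EVTYPE column, and the short events vector are made-up stand-ins for stormdata and event, and normalise_events is a hypothetical name.

```r
# Made-up stand-in for stormdata
storm <- data.frame(id = 1:4,
                    EVTYPE = c("TORNADO", "hail", "Flash flood", "HAIL"),
                    stringsAsFactors = FALSE)
events <- c("Tornado", "Hail", "Flash Flood")

# Hypothetical helper: works on a local copy and hands the copy back
normalise_events <- function(dt, col, events) {
  x <- dt[[col]]
  for (ev in events) {
    # case-insensitive match, like the (?i) prefix in the question
    x[grepl(ev, x, ignore.case = TRUE)] <- ev
  }
  dt[[col]] <- x
  dt
}

# The assignment is what makes the change visible outside the function
storm <- normalise_events(storm, "EVTYPE", events)
storm$EVTYPE
```

Note that the patterns match substrings, so with the full list in the question a short name such as 'Flood' would also match rows containing 'Flash Flood'. The order of the patterns therefore matters.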

Function to subset dataframe using pattern list argument

I have a pattern list
patternlist <- list('one' = paste(c('a','b','c'),collapse="|"), 'two' = paste(1:5,collapse="|"), 'three' = paste(c('k','l','m'),collapse="|"))
that I want to select from to extract rows from a data frame
dataframez <- data.frame('letters' = c('a','b','c'), 'numbers' = 1:3, 'otherletters' = c('k','l','m'))
with this function
pattern.record <- function(x, column="letters", value="one")
{
if (column %in% names(x))
{
result <- x[grep(patternlist$value, x$column, ignore.case=T),]
}
else
{
result <- NA
}
return(result)
}
Oddly enough, I get an error when I run it:
> pattern.record(dataframez)
Error in grep(patternlist$value, x$column, ignore.case = T) :
invalid 'pattern' argument
The problem is your use of the `$` operator.
In your function, x$column looks for a column (a named element) literally called column, and patternlist$value looks for an element literally called value.
It is far simpler here to use `[[`.
Then x[[column]] uses what column is defined as, not column as a name.
The relevant lines in ?`$` are
Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
You are trying to use value and column as computed indices (i.e. computing what value and column are defined as), thus you need `[[`.
The function becomes
pattern.record <- function(x, column="letters", value="one", pattern_list)
{
if (column %in% names(x))
{
result <- x[grep(pattern_list[[value]], x[[column]], ignore.case=T),]
}
else
{
result <- NA
}
return(result)
}
pattern.record(dataframez, pattern_list = patternlist)
## letters numbers otherletters
## 1 a 1 k
## 2 b 2 l
## 3 c 3 m
Note that I've also added an argument pattern_list, so the function does not depend on an object named patternlist existing somewhere in the parent environments (in your case, the global environment).
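To see the difference between `$` and `[[` in isolation, here is a small sketch on a list shaped like patternlist. The object name lst is made up for the example.

```r
lst <- list(one = "a|b|c", two = "1|2|3")
value <- "one"

# `$` takes the name literally: there is no element called "value", so this is NULL
literal <- lst$value

# `[[` evaluates its argument first, so this looks up the element named "one"
computed <- lst[[value]]

is.null(literal)
computed
```

In the original function, the pattern handed to grep() was therefore NULL, which is consistent with the "invalid 'pattern' argument" error shown in the question.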

Can one add a data.frame to itself?

I want to append or add a data.frame to itself...
Much in the same way that one adds:
n <- n + t
I have a function that creates a data.frame.
I have been using:
g <- function(compareA,compareB) {
for (i in 1:1000) {
ttr <- t.test(compareA, compareA, var.equal = TRUE)
tt_pvalues[i] <- ttr$p.value
}
name_tag <- paste(nameA, nameB, sep = "_Vs_")
tt_titles <- data.frame(name_tag, tt_titles)
# character vector which I want to add to a list
ALL_pvalues <- data.frame(tt_pvalues, ALL_pvalues)
# adding a numeric vector of values to a larger data.frame
}
Would cbind be better here?
There are two methods that would "add or append" data to a data.frame by columns and one that would append by rows. Assuming tag is the data.frame, and tt_titles is a vector with as many elements as tag has rows, then either of these would work:
tag <- cbind(tag, tt_titles)
# tt_titles could also be a data.frame with same number of rows
Or:
tag[["tt_titles"]] <- tt_titles
Now let's assume that we instead have two data.frames with the same column names:
bigger.df <- rbind(tag, tag2)
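The three options can be tried on toy data, as in the sketch below. The contents of tag, tt_titles, and tag2 are invented for the example; only the object names come from the answer above.

```r
# Invented example data
tag <- data.frame(name_tag = c("A_Vs_B", "A_Vs_C"), stringsAsFactors = FALSE)
tt_titles <- c("first", "second")

# Column-wise, option 1: cbind() returns a new data.frame with the extra column
by_cbind <- cbind(tag, tt_titles)

# Column-wise, option 2: assign a new column by name
by_assign <- tag
by_assign[["tt_titles"]] <- tt_titles

# Row-wise: rbind() stacks data.frames that share the same column names
tag2 <- data.frame(name_tag = "B_Vs_C", stringsAsFactors = FALSE)
bigger.df <- rbind(tag, tag2)

names(by_cbind)
nrow(bigger.df)
```

In every case the result has to be assigned to a name. Inside a function such as g(), that also means returning the grown object to the caller, since changes made inside the function are not visible outside it.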

R: combine lists of interest

I have a list like df_all (see below).
A = matrix( ceiling(10*runif(8)), nrow=4)
colnames(A) = c("country", "year_var")
dfa = data.frame(A)
df1 = dfa[1,]
df2 = dfa[2,]
df3 = dfa[3,]
df4 = dfa[4,]
df_all = list(df1, df2, df3, df4)
df_all
Now I want to combine the list of interest by using variable a.
a <- "2,3,4"
b <- strsplit(a, ",")[[1]]
To combine these lists, I use the following loop:
for (i in 1:length(b)){
c<-b[i]
aa <- df_all[c:c]
print(aa)
}
Now my question is: how can I combine this result and save it as a variable?
Thanks!
Would this work for you:
basnum<-as.integer(b)
do.call(rbind, df_all[basnum])
Through df_all[basnum], a list with only the relevant data.frames is created.
do.call takes a function and a list as parameters (and some more but not relevant right now). The items of the list are then passed on as parameters to the function.
So in this case, the above is the equivalent to calling:
rbind(df_all[[2]], df_all[[3]], df_all[[4]])
And this produces one data.frame holding all the rows of interest.
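Put together as a self-contained sketch, with fixed numbers in place of runif() so the result is reproducible (the values themselves are arbitrary, and the names idx and combined are made up):

```r
# Arbitrary fixed values in place of the random matrix from the question
dfa <- data.frame(country = c(3, 7, 1, 9), year_var = c(5, 2, 8, 4))

# One single-row data.frame per list element, like df_all in the question
df_all <- lapply(seq_len(nrow(dfa)), function(i) dfa[i, ])

a <- "2,3,4"
idx <- as.integer(strsplit(a, ",")[[1]])

# Subset the list, then hand its elements to rbind() as separate arguments
combined <- do.call(rbind, df_all[idx])
combined
```

Here combined is an ordinary data.frame saved under a name, so it can be used like any other variable afterwards.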