lapply with a list of lists - list

I suspect there are related questions in the community, but I could not find one that addresses my case.
Basically, I am trying to produce three plots with the lapply function. Below is my code.
p_grid <- seq(0,1,length.out=20)
prior_uni <- rep(1,20)
prior_bi <- ifelse( p_grid < 0.5 , 0 , 1)
prior_exp <- exp(-5*abs(p_grid-0.5))
prior_list <- list(prior_uni, prior_bi, prior_exp)
ggs <- lapply(prior_list, function(x) {
  likelihood <- dbinom(6, 9, prob = p_grid)
  unstd.post <- likelihood * x
  std.post <- unstd.post / sum(unstd.post)
  plot_post <- plot(p_grid, std.post, type = "b", ylim = c(0, max(x)))
  mtext(paste0(x))
})
With this I get the plots, but the mtext function does not behave as I expected. Instead of showing the titles prior_uni, prior_bi and prior_exp respectively, it prints every single value of the current list element (e.g., all twenty values of prior_uni), overlapping each other.
This confuses me. Judging by the plots, the function within lapply receives the three elements of prior_list one at a time, not every single value. In other words, x takes on the three elements of prior_list, not the sixty (3*20) individual values, yet mtext seems to do the opposite.
I hope I have expressed myself clearly. I look forward to your responses.
Best regards,
Jilong
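A possible explanation and fix, as a minimal sketch: inside lapply, x is the numeric vector itself, so paste0(x) returns twenty strings and mtext draws all of them on top of each other. The object names (prior_uni, etc.) are not carried along because the list is unnamed. Assuming it is acceptable to name the list elements and to iterate over those names instead (the names nm and posts below are mine, and ylim is dropped for simplicity), something like this should give one title per plot:

```r
p_grid <- seq(0, 1, length.out = 20)

# Naming the list elements is what makes the titles available later.
prior_list <- list(
  prior_uni = rep(1, 20),
  prior_bi  = ifelse(p_grid < 0.5, 0, 1),
  prior_exp = exp(-5 * abs(p_grid - 0.5))
)

# Iterate over the names, then look each prior up by name.
posts <- lapply(names(prior_list), function(nm) {
  prior      <- prior_list[[nm]]
  likelihood <- dbinom(6, 9, prob = p_grid)
  unstd.post <- likelihood * prior
  std.post   <- unstd.post / sum(unstd.post)
  plot(p_grid, std.post, type = "b")
  mtext(nm)   # a single string per plot, e.g. "prior_uni"
  std.post    # return the posterior so it can be reused
})
```

Returning std.post from the function also means the result of lapply holds the three posteriors rather than the return value of mtext.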

Related

Use map inside map?

I have two lists: one with some tidy data and another with models made with the tidymodels package
data_list <- list(train,test)
model_fits <- list(tree,forest,xgb)
I want to make a new list with a confusion matrix for train and test for every model.
The function that calculates confusion matrix:
ConfMat <-
  function(df, data) {
    df <-
      predict(df, new_data = data, type = "class") %>%
      mutate(truth = data$NetInc) %>%
      conf_mat(truth, .pred_class)
  }
I have tried this (x and y are placeholders):
map(data_list,map(model_fits,ConfMat(x,y)))
My problem is that I have no idea how to actually set "x" and "y" right.
PS: A double for loop works. I'm asking specifically for a map solution or an equivalent.
I appreciate all the help I can get. Cheers!
Use an anonymous function at each level. Note that ConfMat() takes the model first and the data second:
library(purrr)
result <- map(data_list, function(data) map(model_fits, function(model) ConfMat(model, data)))
result
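To see the nesting pattern without tidymodels, here is a small sketch in base R, where lapply plays the role of map. Everything in it is a toy stand-in I made up for illustration: the "models" are plain functions, the "data" are numeric vectors, and score() takes the place of ConfMat(). The point is only that the outer function binds the data, the inner one binds the model, and the inner call passes the model first and the data second, as ConfMat(df, data) expects.

```r
# Toy stand-ins (hypothetical): "models" are functions, "data" are vectors.
data_list  <- list(train = c(1, 2, 3), test = c(4, 5))
model_fits <- list(total = sum, average = mean)

# Stand-in for ConfMat(model, data): apply the model to the data.
score <- function(model, data) model(data)

# Outer loop over data sets, inner loop over models.
result <- lapply(data_list, function(data) {
  lapply(model_fits, function(model) score(model, data))
})

result$test$average
```

Because both input lists are named, the result is a named list of named lists, so each entry can be reached as result$<data>$<model>.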

mutate at - count the number of times specified columns exceed some specified number

This is a real-world problem, but I am describing it with the reproducible iris example:
For each row, I want to count how many of the columns whose names contain "Sepal" exceed 5, and store the result in a new column. I want to use dplyr for this task. My attempt is the following:
iris %>% mutate_at(vars(contains("Sepal")),list(greater_than_5=~apply(.,1,function(x) sum(x>5))))
However, I get an error:
dim(X) must have a positive length
Any ideas?
Without seeing an example of the output you want or an explanation, it's hard to determine exactly what you're looking for, so here are two possible solutions (including the one I mentioned in the comments above). I think the first will be unsatisfactory, since the count will be the same for all rows because the tibble is static. The second uses R's bread and butter, a tidy (long-form) data structure, to count only those "columns" that have non-NA values. There we add a row id that we can group by later, pivot to a tidy form, filter out NAs, count the number of column/parameter names that contain your word of interest, and then pivot back to your original wide-form tibble. We need to wrap the value column in list form since the data types differ across the different parameters. You could convert everything to character, of course, but you'll see this lets us recover the original types after unnesting.
library(tidyverse)
min_to_count = 5
title_word = "Sepal"
iris %>%
  mutate(
    name_count = ifelse(
      sum(str_detect(names(.), title_word)) > min_to_count,
      sum(str_detect(names(.), title_word)), NA_real_
    )
  )
iris %>%
  mutate(id = row_number()) %>%
  pivot_longer(
    -id,
    names_to = "parameter", values_to = "value",
    values_ptypes = list(value = list())
  ) %>%
  filter(!is.na(unlist(value))) %>%
  group_by(id) %>%
  mutate(
    name_count = ifelse(
      sum(str_detect(parameter, title_word)) > min_to_count,
      sum(str_detect(parameter, title_word)), NA_real_
    )
  ) %>%
  pivot_wider(
    names_from = "parameter", values_from = "value"
  ) %>%
  unnest_legacy()
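Reading the question literally (per row, how many "Sepal" columns hold a value above 5), a shorter route may be possible. The error in the question arises because inside mutate_at the dot is a single column, i.e. a plain vector without dimensions, so apply(., 1, ...) has no rows to iterate over. A minimal sketch in base R, assuming that a row-wise count over the selected columns is what is wanted (the names threshold, sepal_cols and iris2 are mine):

```r
threshold  <- 5
sepal_cols <- grepl("Sepal", names(iris), fixed = TRUE)

# Comparing a data frame with a number gives a logical matrix;
# rowSums() then counts the TRUE values in each row.
iris2 <- iris
iris2$greater_than_5 <- rowSums(iris[sepal_cols] > threshold)

head(iris2)
```

If dplyr 1.0 or later is available, I believe the equivalent is mutate(greater_than_5 = rowSums(across(contains("Sepal")) > 5)), which keeps the whole thing inside a pipeline.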

Bin new data according to existing intervals given as factor levels

I have a factor with levels which represent intervals (as produced by cut):
> head(data.train$glucose)
[1] [0,126] [0,126] (126,199] [0,126] [0,126] [0,126]
Levels: [0,126] (126,199]
Now I want to generate a new factor with the same levels from a numeric vector so that when the respective number falls into the first interval (e.g. 24) it becomes [0,126] and if it falls into the second interval (e.g. 153) it becomes (126,199].
The number of intervals can differ as can the form of the brackets (depending on whether they are open or closed intervals).
I think I can use sub together with cut for that (as in the last example in the help file of cut), but I am not good enough with regular expressions to make it general. Is there another way that is a little more intuitive? Perhaps I am overcomplicating this.
If you give a solution with sub please explain the expression. Please also do not give solutions with functions from other packages as I am developing a package myself and I want to keep it as lean as possible.
I was looking for an elegant way to do this, but ended up using regex like you suggested:
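ints <- cut(1:10, 5)
set.seed(345)
a <- runif(20, 1, 10)
# get the interval labels
levs <- levels(ints)
# strip the outer brackets, keeping only "lower,upper"
levs.num <- sub("^[\\(\\[]{1}(.+)[\\)\\]]{1}$", "\\1", levs, perl = TRUE)
# keep only the closing bracket of each label
levs.right <- sub("^[\\(\\[]{1}.+([\\)\\]]{1})$", "\\1", levs, perl = TRUE)
# keep only the opening bracket of each label
levs.left <- sub("^([\\(\\[]{1}).+[\\)\\]]{1}$", "\\1", levs, perl = TRUE)
# recover the break points
breaks <- unique(as.numeric(unlist(strsplit(levs.num, ","))))
# all intervals closed on the right -> right = TRUE; all closed on the left -> right = FALSE
if (all(levs.right == "]")) {
  right.arg <- TRUE
} else if (all(levs.left == "[")) {
  right.arg <- FALSE
} else {
  stop("could not determine whether the intervals are closed on the left or on the right")
}
table(cut(a, breaks, right = right.arg))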
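Each regex anchors on an opening [ or ( at the start of the label and a closing ] or ) at the end. The parentheses mark the capture group that sub returns as \\1: the text between the brackets in the first call, the closing bracket in the second, and the opening bracket in the third.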
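One case the approach above does not cover is the labelling from the question, "[0,126]" and "(126,199]", which is what cut produces with include.lowest = TRUE. Below is a sketch of a variant that also infers include.lowest and reuses the original labels so the new factor has identical levels. It uses base R only; the function name rebin is hypothetical, and it assumes the labels come from cut with default-style labels, so the recovered breaks are only as precise as the digits printed in the labels.

```r
# Rebuild the cut() arguments from labels such as "[0,126]" "(126,199]".
rebin <- function(x, levs) {
  # Strip the outer brackets and split "lower,upper" into numbers.
  inner  <- sub("^[\\(\\[](.+)[\\)\\]]$", "\\1", levs, perl = TRUE)
  breaks <- unique(as.numeric(unlist(strsplit(inner, ","))))
  # Any label ending in ")" means the intervals are closed on the left.
  right  <- !any(grepl("\\)$", levs))
  # A label closed on both sides only appears with include.lowest = TRUE.
  lowest <- any(grepl("^\\[.*\\]$", levs))
  cut(x, breaks = breaks, labels = levs, right = right, include.lowest = lowest)
}

levs <- c("[0,126]", "(126,199]")
rebin(c(24, 153, 0), levs)
```

Passing labels = levs sidesteps any reformatting of the labels by cut, and values outside the recovered range become NA, as they would with cut itself.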

grepl() and lapply to fill missing values

I have the following data as an example:
fruit.region <- data.frame(full =c("US red apple","bombay Asia mango","gold kiwi New Zealand"), name = c("apple", "mango", "kiwi"), country = c("US","Asia","New Zealand"), type = c("red","bombay","gold"))
I would like R to be able to look at other items in the "full" (name) column that don't have values for "name", "country" and "type" and see if they match other items. For instance, if full had a 4th row with "bombay US mango" it would be able to identify that the country should read US, bombay should be under type and mango should be under name.
This is what I have so far, which merely identifies (logically) where the items match:
new.entry <- c("bombay US mango")
split.new.entry <- strsplit(new.entry, " ")
lapply(split.new.entry, function(x) {
  check = grepl(x, fruit.region, ignore.case = TRUE)
  print(check)
})
I'm at a bit of a standstill. I've read through a number of regex posts and the R help pages on grepl but have not found a good solution. What I have doesn't fully identify a logical "match" vector, so I'm unable to subset and use an if statement to concatenate on different elements. Ideally, I'd like to be able to replace these elements in data.table form, as my fruit.region will actually be a data table. Does anyone have any suggestions on the best approach?
You can use the str_detect function from the stringr package. This gives a list, ready to rbind:
library(stringr)
addnewrow <- function(newfruit) {
  # for each lookup column, keep the known value that occurs in the new string
  z <- lapply(fruit.region[, 2:4], function(x) x[str_detect(newfruit, x)])
  z$full <- newfruit
  z
}
addnewrow(new.entry)
$name
[1] "mango"
$country
[1] "US"
$type
[1] "bombay"
$full
[1] "bombay US mango"
The next step would depend on your desired outcome - if you only want to add one, try:
rbind(fruit.region, addnewrow(new.entry))
If you have a lot:
z <- do.call(rbind, lapply(c(new.entry, new.entry), addnewrow))
rbind(fruit.region, z)
NB: make sure your columns are character vectors (not factors) first:
fruit.region[] <- lapply(fruit.region, as.character)
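For comparison, here is a sketch of the same lookup in base R that returns a one-row data frame, so it can be bound directly. The helper name fill_row is hypothetical, and the sketch makes two assumptions that go beyond the answer above: matching is literal and case-sensitive (fixed = TRUE), and a field with no known match is filled with NA rather than dropped.

```r
fruit.region <- data.frame(
  full    = c("US red apple", "bombay Asia mango", "gold kiwi New Zealand"),
  name    = c("apple", "mango", "kiwi"),
  country = c("US", "Asia", "New Zealand"),
  type    = c("red", "bombay", "gold"),
  stringsAsFactors = FALSE
)

# For each lookup column, find the known value contained in the new string.
fill_row <- function(new_full, known) {
  fields <- lapply(known[c("name", "country", "type")], function(values) {
    found <- vapply(values, function(v) grepl(v, new_full, fixed = TRUE),
                    logical(1))
    hit <- values[found]
    if (length(hit) == 0) NA_character_ else hit[1]
  })
  data.frame(full = new_full, fields, stringsAsFactors = FALSE)
}

updated <- rbind(fruit.region, fill_row("bombay US mango", fruit.region))
updated
```

Literal matching avoids surprises when a stored value contains regex metacharacters. If the real data are in a data.table, the same helper should work on it after as.data.frame, though I have not tried that.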

aregexec matching with two data frames

I have two data frames. One is the target data frame (targetframe) and the other works as a library (word.library) with some key values. I need the following algorithm: look up an approximate match between word.library$mainword and targetframe$words, and then replace the matched substrings in targetframe$words with word.library$keyID.
Here are the two data frames mentioned above:
targetframe <- data.frame(words = c("This is sentence one with the important word",
                                    "This is sentence two with the inportant woord",
                                    "This is sentence three with crazy sayings"))
word.library <- data.frame(mainword = c("important word",
                                        "crazy sayings"),
                           keyID = c("1001",
                                     "2001"))
Here is my solution, which works:
for (i in 1:nrow(word.library)) {
  positions <- aregexec(word.library[i, 1], targetframe$words, max.distance = 0.1)
  res <- regmatches(targetframe$words, positions)
  res[lengths(res) == 0] <- "XXXX" # deal with 0 length matches somehow
  targetframe$words <- Vectorize(gsub)(unlist(res), word.library[i, 2], targetframe$words)
}
targetframe$words
However, I use a for loop, which is not efficient (imagine I have two huge data frames). Does anyone have an idea how to solve this more efficiently?
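One observation that may help: aregexec is already vectorised over the sentences, so the loop only runs once per library entry, and the loop itself is unlikely to be the main cost. A sketch of a tidier inner step is below. It is not a benchmarked speed-up; it only touches the sentences that actually matched, which removes the "XXXX" placeholder, and it replaces the matched text literally (fixed = TRUE) instead of treating it as a regex. The helper name replace_fuzzy is hypothetical.

```r
targetframe <- data.frame(
  words = c("This is sentence one with the important word",
            "This is sentence two with the inportant woord",
            "This is sentence three with crazy sayings"),
  stringsAsFactors = FALSE
)
word.library <- data.frame(
  mainword = c("important word", "crazy sayings"),
  keyID    = c("1001", "2001"),
  stringsAsFactors = FALSE
)

# Replace the approximate match of one pattern in every sentence that has one.
replace_fuzzy <- function(text, pattern, replacement, max.distance = 0.1) {
  m       <- aregexec(pattern, text, max.distance = max.distance)
  hits    <- regmatches(text, m)
  matched <- lengths(hits) > 0
  if (!any(matched)) return(text)
  # First element of each hit is the full matched substring.
  found <- vapply(hits[matched], `[`, character(1), 1)
  text[matched] <- mapply(sub, found, replacement, text[matched],
                          MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  text
}

words <- targetframe$words
for (i in seq_len(nrow(word.library))) {
  words <- replace_fuzzy(words, word.library$mainword[i], word.library$keyID[i])
}
words
```

If the library is very large, the number of aregexec calls still grows with the number of entries; reducing that would need a different strategy (for example pre-filtering candidate sentences), which this sketch does not attempt.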