I have results from three different tests for each participant:
Vocabulary: scores out of 104
Cloze_Test: scores out of 22
Read_hours: in hours (ranges from 4 to 50 hours)
So the scores are on different scales. I want to create a single variable using the data from these three columns. What is the best way? Should I standardize them and then combine them? If so, how do I combine them? I standardized using
data$Vocabulary <- as.data.frame(scale(data$sVocabulary))
but I can't combine them. I have tried this:
data.Pax %>% mutate(Reading = cbind(sVocab, scloze, sreadhours))
I also tried PCA, but I am not sure whether this is the right way to use it:
scoredata <- data[,c("Vocabulary","Cloze_Test","Read_hours")]
pca_profscore <- prcomp(scoredata, center = T, scale. = T)
data.Pax["Reading"] <- vectorizedscore <- pca_profscore$x[,1]
summary(pca_profscore)
print(pca_profscore)
pca_profscore$rotation[,1]
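A minimal base-R sketch of both options, assuming the three columns are numeric with no missing values. The data values are made up; only the column names come from the question. cbind() only places the three columns side by side, so it cannot give one number per participant; rowMeans() of the z-scores does, giving each test equal weight. The first principal component instead weights each test by its loading. The sign of a principal component is arbitrary, so check pca$rotation[, 1] and flip the score if higher values should mean better reading.

```r
# Hypothetical example data: column names follow the question, values are invented.
scores <- data.frame(
  Vocabulary = c(80, 95, 60, 72, 101),
  Cloze_Test = c(15, 20, 9, 12, 21),
  Read_hours = c(10, 40, 4, 18, 50)
)
cols <- c("Vocabulary", "Cloze_Test", "Read_hours")

# scale() centres each column to mean 0 and rescales it to sd 1,
# so all three tests are on a common z-score scale.
z <- scale(scores[, cols])

# Option 1: equally weighted composite, the mean z-score per participant.
scores$Reading_mean <- rowMeans(z)

# Option 2: first principal component of the standardised scores.
pca <- prcomp(scores[, cols], center = TRUE, scale. = TRUE)
scores$Reading_pc1 <- pca$x[, 1]

# Loadings of PC1 and the share of variance each component explains.
pca$rotation[, 1]
summary(pca)$importance["Proportion of Variance", ]
```

With strongly correlated tests the two composites usually rank participants almost identically; the mean of z-scores is simpler to explain, while PC1 lets the data choose the weights.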
I believe there must be related questions in the community, but I could not find one that is informative for my case:
Basically, I am trying to produce three plots with the lapply function. Below is my code.
p_grid <- seq(0,1,length.out=20)
prior_uni <- rep(1,20)
prior_bi <- ifelse( p_grid < 0.5 , 0 , 1)
prior_exp <- exp(-5*abs(p_grid-0.5))
prior_list <- list(prior_uni, prior_bi, prior_exp)
ggs <- lapply(prior_list, function(x){
likelihood <- dbinom(6,9, prob = p_grid)
unstd.post <- likelihood*x
std.post <- unstd.post/sum(unstd.post)
plot_post <- plot(p_grid,std.post,type="b", ylim = c(0,max(x)))
mtext(paste0(x))
}
)
By doing so, I get the plots, but the mtext function does not work well. Instead of showing the titles prior_uni, prior_bi and prior_exp respectively, it prints every single value of the list element (e.g., prior_uni), all overlapping each other.
This is a bit confusing to me. Judging from the plots, the function inside lapply receives the three elements of prior_list, not every single value. In other words, x is each of the three elements of prior_list, not the sixty (3*20) individual values, yet mtext seems to behave the opposite way.
I hope I have explained this clearly. I look forward to your responses.
Best regards,
Jilong
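A possible explanation and sketch, not taken from an existing answer: lapply passes each element's value (a numeric vector of length 20) as x, never its name, so paste0(x) is a 20-element character vector and mtext draws all twenty strings at the same spot. One fix is to name the list and iterate over values and names together with Map(). Two further changes are assumptions: ylim uses max(std.post) instead of the prior's maximum so the posterior fills the plot, and the unused plot_post assignment is dropped because plot() returns NULL.

```r
p_grid <- seq(0, 1, length.out = 20)

# Naming the list elements gives each prior a label we can print.
prior_list <- list(
  prior_uni = rep(1, 20),
  prior_bi  = ifelse(p_grid < 0.5, 0, 1),
  prior_exp = exp(-5 * abs(p_grid - 0.5))
)

# The likelihood does not depend on the prior, so compute it once.
likelihood <- dbinom(6, 9, prob = p_grid)

# Map() walks over the priors and their names in parallel: each call gets
# one prior vector (x) and one single title string (nm).
posts <- Map(function(x, nm) {
  unstd.post <- likelihood * x
  std.post <- unstd.post / sum(unstd.post)
  plot(p_grid, std.post, type = "b", ylim = c(0, max(std.post)))
  mtext(nm)   # one string, so one label per plot
  std.post    # return the posterior so it can be inspected afterwards
}, prior_list, names(prior_list))
```

An equivalent lapply version iterates over names(prior_list) and looks each prior up with prior_list[[nm]].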
Assume I have a drop-down in a sidebarPanel, location, from which I can select a maximum of 2 options. I want to create an if statement where choosing 'Saddle Joint' and 'Gliding Joint' from the drop-down leads to selection of objects 'x' and 'y' in another sidebarPanel, datasets, basically creating a linkage.
I tried this piece of code, but it doesn't work:
if (input$location== "Saddle Joint" & input$location== "Gliding Joint") {
updateCheckboxGroupInput(session,
"datasets", "Datasets:", choices = c("x","y"),
selected= c("x","y"))
}
Do take a look at the screenshot for a better picture!
Thanks!
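The issue was the Boolean condition in your if statement. input$location holds all selected options as a character vector, so == compares each element with a single string, and no element can equal both strings at once. Use this instead: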
"Saddle Joint" %in% input$location & "Gliding Joint" %in% input$location
Also could use:
all(c("Saddle Joint","Gliding Joint") %in% input$location)
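The difference can be checked outside Shiny with a plain character vector standing in for input$location (a hypothetical value). With the == version, the condition has length 2; since R 4.2.0, if() with such a condition is an error rather than a warning. In the app, the check would sit inside an observer in the server function (for example observeEvent(input$location, { ... })) calling updateCheckboxGroupInput as in the question; that wiring is not shown because it needs a running session.

```r
# Stand-in for input$location with both options chosen; in the app this
# vector comes from the multi-select drop-down (hypothetical value here).
location <- c("Saddle Joint", "Gliding Joint")

# Element-wise ==: position 1 only equals "Saddle Joint", position 2 only
# equals "Gliding Joint", so the combined test is FALSE everywhere.
location == "Saddle Joint" & location == "Gliding Joint"   # FALSE FALSE

# %in% asks whether each wanted value occurs anywhere in the selection,
# and all() reduces that to a single TRUE/FALSE suitable for if().
both_selected <- function(sel) {
  all(c("Saddle Joint", "Gliding Joint") %in% sel)
}

both_selected(location)          # TRUE
both_selected("Saddle Joint")    # FALSE
both_selected(NULL)              # FALSE (nothing selected yet)
```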
I have the following data as an example:
fruit.region <- data.frame(full =c("US red apple","bombay Asia mango","gold kiwi New Zealand"), name = c("apple", "mango", "kiwi"), country = c("US","Asia","New Zealand"), type = c("red","bombay","gold"))
I would like R to look at other entries in the "full" column that don't yet have values for "name", "country" and "type", and see whether their words match existing values. For instance, if full had a 4th row with "bombay US mango", it would be able to identify that country should read US, bombay should go under type and mango should go under name.
This is what I have so far, which merely identifies (logically) where the items match:
new.entry <- c("bombay US mango")
split.new.entry <- strsplit(new.entry, " ")
lapply(split.new.entry, function(x){
check = grepl(x, fruit.region, ignore.case=TRUE)
print(check)
})
I'm at a bit of a standstill. I've read through a number of regex posts and the R help pages on grepl, but I haven't found a good solution. What I have doesn't produce a usable logical "match" vector, so I'm unable to subset and use an if statement to concatenate the different elements. Ideally, I'd like to be able to replace these elements in data.table form, as my fruit.region will actually be a data table. Does anyone have suggestions on the best approach?
Use the str_detect function from the stringr package. The function below returns a list that is ready to rbind:
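library(stringr)
# Find which known name/country/type values occur in a new "full" string
addnewrow <- function(newfruit){
  # for each of columns 2:4, keep the values that str_detect() finds in newfruit
  z <- lapply(fruit.region[,2:4], function(x) x[str_detect(newfruit, x)])
  z$full <- newfruit
  z
}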
addnewrow(new.entry)
$name
[1] "mango"
$country
[1] "US"
$type
[1] "bombay"
$full
[1] "bombay US mango"
The next step would depend on your desired outcome - if you only want to add one, try:
rbind(fruit.region, addnewrow(new.entry))
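If you have a lot, convert each result to a one-row data frame before binding:
new.entries <- c(new.entry, new.entry)
z <- do.call(rbind, lapply(new.entries, function(e) as.data.frame(addnewrow(e))))
rbind(fruit.region, z)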
NB: make sure your columns are character vectors first (before R 4.0.0, data.frame() converted strings to factors by default):
fruit.region[] <- lapply(fruit.region, as.character)
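For comparison, a base-R sketch of the same idea without stringr, swapping in grepl(fixed = TRUE). The helper names match_known and parse_entry are hypothetical. Matching is by case-sensitive substring, so "US" would also match inside "USA" and "red" inside "shredded"; a whole-word pattern such as paste0("\\b", v, "\\b") without fixed = TRUE would be stricter. If you work with data.table, data.table::rbindlist could replace do.call(rbind, ...).

```r
fruit.region <- data.frame(
  full    = c("US red apple", "bombay Asia mango", "gold kiwi New Zealand"),
  name    = c("apple", "mango", "kiwi"),
  country = c("US", "Asia", "New Zealand"),
  type    = c("red", "bombay", "gold"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first known value in `vocab` that occurs in `entry`,
# or NA if none does. fixed = TRUE treats each value as a literal string.
match_known <- function(entry, vocab) {
  vocab <- unique(vocab)
  hits <- vocab[vapply(vocab, grepl, logical(1), x = entry, fixed = TRUE)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Hypothetical helper: build one new row by looking up each column's vocabulary.
parse_entry <- function(entry, ref) {
  data.frame(
    full    = entry,
    name    = match_known(entry, ref$name),
    country = match_known(entry, ref$country),
    type    = match_known(entry, ref$type),
    stringsAsFactors = FALSE
  )
}

new.entries <- c("bombay US mango", "red Asia kiwi")
new.rows <- do.call(rbind, lapply(new.entries, parse_entry, ref = fruit.region))
fruit.region <- rbind(fruit.region, new.rows)
fruit.region
```

Returning NA for an unmatched column keeps every new row the same shape, so rows can always be bound even when a word is unknown.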
Let's say we have the following data frame in R:
df <- data.frame(sample=rnorm(1,0,1),params=I(list(list(mean=0,sd=1,dist="Normal"))))
df <- rbind(df,data.frame(sample=rgamma(1,5,5),params=I(list(list(shape=5,rate=5,dist="Gamma")))))
df <- rbind(df,data.frame(sample=rbinom(1,7,0.7),params=I(list(list(size=7,prob=0.7,dist="Binomial")))))
df <- rbind(df,data.frame(sample=rnorm(1,2,3),params=I(list(list(mean=2,sd=3,dist="Normal")))))
df <- rbind(df,data.frame(sample=rt(1,3),params=I(list(list(df=3,dist="Student-T")))))
The first column contains a random number drawn from a probability distribution, and the second column stores a list with its parameters and name.
The data frame df looks like this:
sample params
1 0.85102972 0, 1, Normal
2 0.67313218 5, 5, Gamma
3 3.00000000 7, 0.7, ....
4 0.08488487 2, 3, Normal
5 0.95025523 3, Student-T
Q1: How can I get the list of distribution names for all records? df$params$dist does not work. For a single record it is easy, for example the third one: df$params[[3]]$dist
Q2: Is there an alternative way of storing data like this, something like a multi-dimensional data frame? I do not want to add a column for each parameter, because that would clutter the data frame with missing values.
It's probably more natural to store information like this in a pure list structure, than in a data frame:
distList <- list(normal = list(sample=rnorm(1,0,1),params=list(mean=0,sd=1,dist="Normal")),
gamma = list(sample=rgamma(1,5,5),params=list(shape=5,rate=5,dist="Gamma")),
binom = list(sample=rbinom(1,7,0.7),params=list(size=7,prob=0.7,dist="Binomial")),
normal2 = list(sample=rnorm(1,2,3),params=list(mean=2,sd=3,dist="Normal")),
tdist = list(sample=rt(1,3),params=list(df=3,dist="Student-T")))
And then if you want to extract just the distribution name from each, we can use sapply to loop over the list and extract just that piece:
sapply(distList,function(x) x[[2]]$dist)
normal gamma binom normal2 tdist
"Normal" "Gamma" "Binomial" "Normal" "Student-T"
If you absolutely must store this information in a data frame, one way of doing so springs to mind. You're currently using a params column in your data frame to store the parameters associated with the distributions. Perhaps a better way of doing this would be to (i) identify the maximum number of parameters that you'll need for any distribution, (ii) store the distribution names in a field called df$distribution, and (iii) store the parameters in dedicated parameter columns, the meaning of which will have to be decided upon based on the type of distribution.
For instance, any row with df$distribution = 'Normal' should have df$param1 = the mean and df$param2 = the standard deviation. A row with df$distribution = 'Student' should have df$param1 = the degrees of freedom and df$param2 = NA. Something like the following:
dg <- data.frame(sample=rnorm(1, 0, 1), distribution='Normal',
param1=0, param2=1)
dg <- rbind(dg, data.frame(sample=rgamma(1, 5, 5),
distribution='Gamma', param1=5, param2=5))
dg <- rbind(dg, data.frame(sample=rt(1, 3), distribution='Student',
param1=3, param2=NA))
It's ugly, but it will give you what you want. And don't worry about the missing values; missing values are a fact of life when dealing with non-trivial data frames. They can be dealt with easily in R by appropriate use of things like na.rm and complete.cases().
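A small sketch of working with that wide layout, building the same three rows as dg in a single data.frame() call (a design choice; repeated rbind() gives the same result). set.seed() is added only so the samples are reproducible.

```r
set.seed(1)  # only for reproducible samples

dg <- data.frame(
  sample       = c(rnorm(1, 0, 1), rgamma(1, 5, 5), rt(1, 3)),
  distribution = c("Normal", "Gamma", "Student"),
  param1       = c(0, 5, 3),
  param2       = c(1, 5, NA),   # Student-t has no second parameter
  stringsAsFactors = FALSE
)

# Rows whose distribution uses both parameter slots.
dg[complete.cases(dg[, c("param1", "param2")]), ]

# Summaries can skip the unused slots.
mean(dg$param2, na.rm = TRUE)   # (1 + 5) / 2 = 3

# Selecting one distribution is a plain logical index.
dg[dg$distribution == "Normal", c("param1", "param2")]
```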
Based on the data frame you have above,
sapply(df$params,"[[","dist")
(or lapply if you prefer) would work.
I would probably put at least the names of the distributions in their own column, even if you want the parameters to be in variable-length lists.
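A sketch of that mixed layout, assuming the names go in an ordinary character column and the variable-length parameter sets stay in a list column. The generators lookup table is hypothetical; it maps each stored name to the matching base-R random generator so do.call() can redraw a sample from the stored parameters.

```r
set.seed(1)  # only for reproducible samples

# Names in a plain column; parameters in a list column kept intact by I().
df2 <- data.frame(
  dist   = c("Normal", "Gamma", "Binomial", "Normal", "Student-T"),
  params = I(list(
    list(mean = 0, sd = 1),
    list(shape = 5, rate = 5),
    list(size = 7, prob = 0.7),
    list(mean = 2, sd = 3),
    list(df = 3)
  )),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from stored name to base-R generator.
generators <- list(Normal = rnorm, Gamma = rgamma, Binomial = rbinom,
                   `Student-T` = rt)

# do.call() passes each stored parameter list as named arguments, plus n = 1.
df2$sample <- mapply(function(d, p) do.call(generators[[d]], c(list(n = 1), p)),
                     df2$dist, df2$params)

df2$dist          # all names at once, no sapply() needed
table(df2$dist)
```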