Trying to append to a list, but something breaks

I'm trying to create an empty list with as many elements as there are num.of.walkers. I then try to assign a new sub-list to each element (the length of each sub-list corresponds to a value in a).
When I fiddle around in R, everything goes smoothly:
list.of.dist[[1]] <- vector("list", a[1])
list.of.dist[[2]] <- vector("list", a[2])
list.of.dist[[3]] <- vector("list", a[3])
list.of.dist[[4]] <- vector("list", a[4])
I then tried to write a function. Here is my feeble attempt, which results in an error. Can someone point out what I am doing wrong?
countNumberOfWalks <- function(walk.df) {
list.of.walkers <- sort(unique(walk.df$label))
num.of.walkers <- length(unique(walk.df$label))
#Pre-allocate objects for further manipulation
list.of.dist <- vector("list", num.of.walkers)
a <- c()
# Count the number of walks per walker.
for (i in list.of.walkers) {
a[i] <- nrow(walk.df[walk.df$label == i,])
}
a <- as.vector(a)
# Add a sublist (length = number of walks) for each walker.
for (i in i:num.of.walkers) {
list.of.dist[[i]] <- vector("list", a[i])
}
return(list.of.dist)
}
> num.of.walks.per.walker <- countNumberOfWalks(walk.df)
Error in vector("list", a[i]) : vector size cannot be NA

Assuming 'walk.df' is something like:
walk.df <- data.frame(label=sample(1:10,100,T),var2=1:100)
then:
countNumberOfWalks <- function(walk.df) {
list.of.walkers <- sort(unique(walk.df$label))
num.of.walkers <- length(unique(walk.df$label))
list.of.dist <- vector("list", num.of.walkers)
for (i in 1:num.of.walkers) {
list.of.dist[[i]] <- vector("list",
nrow(walk.df[walk.df$label == list.of.walkers[i],]))}
return(list.of.dist)
}
Will achieve what you're after.
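A note on why the original loop fails, plus a shorter alternative. In the question's second loop, `for (i in i:num.of.walkers)` starts from whatever value `i` was left with by the first loop rather than from 1, and `a[i]` is indexed by label value rather than by position, so some `a[i]` can end up NA. The sketch below is one way to sidestep both problems by letting table() do the counting. It reuses the sample walk.df assumed above; set.seed() is only there to make the run repeatable.

```r
set.seed(1)
walk.df <- data.frame(label = sample(1:10, 100, TRUE), var2 = 1:100)

# Count the walks per walker in one step; the result is named by label.
counts <- table(walk.df$label)

# One empty sub-list per walker, sized by that walker's number of walks.
list.of.dist <- lapply(as.vector(counts), function(n) vector("list", n))
names(list.of.dist) <- names(counts)
```

Naming the elements by label means you can look a walker up with list.of.dist[["3"]] instead of relying on its position.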

Related

Problem generating wavelet variance/covariance

How do I solve the error?
Here's my script:
library(waveslim)  # assumed source of modwt(), brick.wall() and wave.covariance()
A=x
B=y
wf <- "la8"
J <- 8
N<- length(A)
A.modwt <- modwt(A, wf, J)
B.modwt <- modwt(B, wf, J)
A.modwt.bw <- brick.wall(A.modwt, wf)
B.modwt.bw <- brick.wall(B.modwt, wf)
WCOV = wave.covariance(A.modwt.bw, B.modwt.bw)
Error in x[c((n%/%2):n, 1:(n%/%2 - 1))] :
only 0's may be mixed with negative subscripts

Merge multiple lists with different element lengths into a data frame in a function

I have a function that extracts the rules of a decision tree:
data(iris)
names(iris)[names(iris) == "Sepal.Length"] <- "SL"
names(iris)[names(iris) == "Sepal.Width"] <- "SW"
names(iris)[names(iris) == "Petal.Length"] <- "PL"
names(iris)[names(iris) == "Petal.Width"] <- "PW"
library(rpart)
set.seed(10)
pohon <- rpart(Species ~ ., iris,
method='class',
control=rpart.control(minsplit = 5, cp=0))
library(reshape)
library(dplyr)  # bind_rows() used below comes from dplyr
rules.rpart <- function(model){
if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
frm <- model$frame
names <- row.names(frm)
ylevels <- attr(model, "ylevels")
ds.size <- model$frame[1,]$n
for (i in 1:nrow(frm))
{
if (frm[i,1] == "<leaf>")
{
prediksi=ylevels[frm[i,]$yval]
pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=F)
urutan=unlist(pth)[-1]
ur <- pth[-1]
a=paste(urutan)
a1=t(data.frame(a))
df=data.frame(prediksi,a1)
print(bind_rows(list(df)))
}}}
rules.rpart(pohon)
bb <- rules.rpart(pohon)
bb
My questions are:
How can I convert the output, which is several lists (the number varies) with different element lengths, into a single data frame?
Why can't I assign the output to an object named "bb"? Why is bb NULL when I call it?
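
Not an authoritative answer, but a sketch of one way to get both things at once. bb is NULL because the function only print()s each piece: a for loop itself evaluates to NULL, and that is what the function ends up returning. Collecting the leaf paths in a list and returning a single data frame fixes that. `rules_as_df` is a hypothetical name, and base R padding with NA stands in for bind_rows() here; it assumes the tree has at least one split.

```r
library(rpart)

rules_as_df <- function(model) {
  if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
  frm <- model$frame
  ylevels <- attr(model, "ylevels")
  leaves <- which(frm$var == "<leaf>")

  # One character vector of split conditions per leaf ("root" dropped).
  paths <- lapply(leaves, function(i) {
    node <- as.numeric(row.names(frm)[i])
    unlist(path.rpart(model, nodes = node, print.it = FALSE),
           use.names = FALSE)[-1]
  })

  # Pad shorter paths with NA so every row has the same number of columns.
  width <- max(lengths(paths))
  padded <- do.call(rbind, lapply(paths, function(p) {
    c(p, rep(NA_character_, width - length(p)))
  }))

  # Return (not print) the combined result.
  data.frame(prediksi = ylevels[frm$yval[leaves]], padded,
             stringsAsFactors = FALSE)
}

data(iris)
set.seed(10)
pohon <- rpart(Species ~ ., iris, method = "class",
               control = rpart.control(minsplit = 5, cp = 0))
bb <- rules_as_df(pohon)
```

Because the data frame is the last expression in the function, bb now holds one row per leaf, with the predicted class followed by the conditions leading to it.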

Outputting the results from bife object to Latex in Rmarkdown?

I'm estimating a fixed-effects probit model using the bife package in R. I'm trying to extract the output into something I can use with either stargazer or texreg so I can output them into a paper using Rmarkdown to create a LaTeX object. I'm aware I can manually extract the coefficients and standard errors, etc., but I'm wondering if there isn't a more efficient way to coerce this object into something that'd work with either package.
Here's a reproducible example:
install.packages("bife")
library(bife)
data("iris")
iris$big <- ifelse(iris$Sepal.Length > median(iris$Sepal.Length),1,0)
output <- bife(big ~ Sepal.Width + Petal.Length | Species, data=iris, "logit")

I think I found an alternative solution for this one, even if it is probably too late.
In the repository of the texreg package, I found this function:
extract.bife <- function(model,
include.loglik = TRUE,
include.deviance = TRUE,
include.nobs = TRUE,
...) {
s <- summary(model)
coefficient.names <- rownames(s$cm)
co <- s$cm[, 1]
se <- s$cm[, 2]
pval <- s$cm[, 4]
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.loglik == TRUE) {
lik <- logLik(model)
gof <- c(gof, lik)
gof.names <- c(gof.names, "Log Likelihood")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.deviance == TRUE) {
gof <- c(gof, deviance(model))
gof.names <- c(gof.names, "Deviance")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.nobs == TRUE) {
n <- s$nobs["nobs"]
gof <- c(gof, n)
gof.names <- c(gof.names, "Num. obs.")
gof.decimal <- c(gof.decimal, FALSE)
}
tr <- createTexreg(
coef.names = coefficient.names,
coef = co,
se = se,
pvalues = pval,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
For your example, apply it to your model and pass the result to texreg(), which should give you LaTeX-style output (load texreg first, since it provides createTexreg() and texreg()):
library(texreg)
tr <- extract.bife(output)
texreg(tr)
I hope it will help!
Best

R deleting duplicates when duplicates slightly differ by 1 or 2 letters

I am collecting tweets with the twitteR package and get many duplicates. This code works fine:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL, retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- unique(mydata, incomparables = F, nmax = NA)
The problem is that it doesn't actually delete any duplicates, because it doesn't recognise them as such. The duplicate tweets typically contain shortened URLs that differ by 1 or 2 digits. So I tried to strip the URLs from the tweets with this code:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL,
retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- data.frame(tweetsText, stringsAsFactors = FALSE)
names(mydata) <- c('words')
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
removeURL <- function(x) gsub("https[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
mydata$words <- unique(mydata$words, incomparables = F, nmax = NA)
Now I get the error message:
Error in `$<-.data.frame`(`*tmp*`, "words", value = c("Tripping around #DisneySprings.....) : replacement has 295 rows, data has 300
Advice? Thanks!

Your error is easily reproducible:
mydata <- data.frame(list(w = c(0, 1, 0, 1)))
mydata$words <- c(0, 1, 1)
# Error in `$<-.data.frame`(`*tmp*`, "words", value = c(0, 1, 1)) :
# replacement has 3 rows, data has 4
This just means that the vector you assign must have as many elements as the data frame has rows; here unique() returned fewer values than there are rows.
To filter out duplicate values, change the last line of your code to:
res <- mydata[!duplicated(mydata$words), ]
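To tie the two steps together, here is a small sketch with made-up tweet texts (the strings and the `strip_urls` helper are assumptions for illustration, not part of twitteR). Note that the pattern in the question, "http[[:alnum:]]*", stops at the first ":" and so leaves the rest of the URL behind; matching up to the next whitespace removes the whole link. Since mydata has only one column, drop = FALSE keeps the result a data frame.

```r
# Hypothetical tweet texts standing in for the searchTwitter() results.
tweets_text <- c(
  "Tripping around #DisneySprings https://t.co/abc123",
  "Tripping around #DisneySprings https://t.co/abc124",
  "Something else entirely http://t.co/zzz999"
)
mydata <- data.frame(words = tweets_text, stringsAsFactors = FALSE)

# Remove http:// and https:// links up to the next whitespace, then trim.
strip_urls <- function(x) trimws(gsub("https?://[^[:space:]]+", "", x))
mydata$words <- strip_urls(mydata$words)

# Keep only the first occurrence of each cleaned text.
res <- mydata[!duplicated(mydata$words), , drop = FALSE]
```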

rankall: returning the correct data frame to rank hospitals on performance

This is a (not quite working) solution to a Coursera problem. I'm trying to rank a data frame containing the names of hospitals based on their performance on 3 different conditions. (I found another question related to this one at "How to subset a row from list based on condition".) I think I'm not subsetting correctly, and I don't return the correct data frame at the end. I'm really new to programming and R. Thank you for your help.
rankall <- function(outcome, num = 'best'){
data <- read.csv('outcome-of-care-measures.csv', colClasses = 'character')
data[,11] <- as.numeric(data[,11])
data[,17] <- as.numeric(data[,17])
data[17] <- as.numeric(data[,23])
states <- sort(unique(data$State))
conditions <- data[c(11,17,23)]
if(!state %in% states){stop('invalid state')}
if(!outcome %in% conditions){stop('invalid outcome')}
for (i in 1:length(states)){
statedata <-data[data$State == state[i],]
if(outcome == 'heart attack'){column <- (statedata[,11]}
if(outcome == 'heart failure') {column <-(statedata[,17]}
if(outcome == 'pneumonia') {column <- statedata[,23]}
rankedhospitals <- c()
rankcondition <- rank(column, na.last = NA)
if (num == 'best'){num <- 1}
if(num == 'worst'){num <- nrow(rankcondition)}
rankedhospitals[i] <- statedata$Hospital.Name[order(column, statedata$Hospital.Name)[num]]
rankedhospitals <- cbind(rankedhospitals,states[num,2])
}
return (c('rankedhospitals', 'states'))
}
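
A few things in the code above stand out: `state` is never defined, `data[17]` is overwritten where column 23 was presumably meant, two lines have an unbalanced "(", `rankedhospitals` is reset on every pass through the loop, and the final return() hands back two strings rather than the objects. Below is a hedged sketch of the general approach (split by state, order, pick the num-th row), not the official solution. The `hospitals` data frame and its `heart.attack` column are made-up stand-ins for the CSV, and `rank_all` is a hypothetical name.

```r
# Hypothetical stand-in for the CSV: rates arrive as character strings.
hospitals <- data.frame(
  Hospital.Name = c("A", "B", "C", "D", "E"),
  State = c("TX", "TX", "TX", "NY", "NY"),
  heart.attack = c("14.1", "12.3", "Not Available", "15.0", "13.2"),
  stringsAsFactors = FALSE
)

rank_all <- function(data, column, num = "best") {
  if (!column %in% names(data)) stop("invalid outcome")

  # Convert the rates to numbers; non-numeric entries become NA and are dropped.
  rate <- suppressWarnings(as.numeric(data[[column]]))
  data <- data[!is.na(rate), ]
  data$rate <- rate[!is.na(rate)]

  # Pick the num-th hospital within one state, ties broken by name.
  pick <- function(d) {
    d <- d[order(d$rate, d$Hospital.Name), ]
    k <- if (identical(num, "best")) 1L
         else if (identical(num, "worst")) nrow(d)
         else as.integer(num)
    d$Hospital.Name[k]  # NA when the state has fewer than k hospitals
  }

  by_state <- split(data, data$State)
  hospital <- vapply(by_state, pick, character(1))
  data.frame(hospital = unname(hospital), state = names(by_state),
             stringsAsFactors = FALSE)
}

best <- rank_all(hospitals, "heart.attack")
worst <- rank_all(hospitals, "heart.attack", "worst")
```

One design point: states whose hospitals all lack a valid rate disappear from the result in this sketch, so you would need to add them back with NA if the assignment expects every state to be listed.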