How to fix row_to_name error in R markdown? - r-markdown

I'm trying to put my code, which runs perfectly, into an R Markdown document, but I keep getting the row_to_names error when I knit, though not when I run the chunk. Honestly, I don't know how to fix it.
The code:
homicidios<-read.csv("/Users/julia/Documents/rdoctorado/tasa estatal homicidios.csv",header = TRUE)
#Transform the vector to data frame
thomicidios<-data.frame(homicidios)
# transpose the matrix To compute stats
th<-t(homicidios)
#delete first row and set it as header
thomicidios<-row_to_names(th,1,remove_row = TRUE)
#Transform the vector to data frame
thomicidios2<-data.frame(thomicidios)
#Convert string to integer
thomicidios2[c(1:33)] <- lapply(thomicidios2[c(1:33)], as.integer)
#Summary of the data
summary(thomicidios2)
The error is:
Error in row_to_names(th, 1, remove_row = TRUE) :
could not find function "row_to_names"
Calls: ... handle -> withCallingHandlers -> withVisible -> eval -> eval

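I believe you need to add library(janitor) inside this code chunk (or in an earlier chunk of the same document), since row_to_names comes from the janitor package. Knitting with RStudio's Knit button runs the document in a fresh R session, so packages you loaded only in the console are not available there. That is why the chunk works when run interactively but fails when knitted.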
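As a minimal sketch of that fix, the chunk below loads janitor itself before calling row_to_names. It assumes janitor is installed; the small data frame is a made-up stand-in for the CSV in the question, and the conversion through as.data.frame is my own choice for the example rather than part of the original code.

```r
# Attach janitor inside the document so the knit session can find row_to_names()
library(janitor)

# Made-up stand-in for the CSV read in the question
homicidios <- data.frame(
  estado = c("A", "B"),
  y2019 = c(10L, 20L),
  y2020 = c(12L, 18L)
)

# Transpose, then promote the first row (the state names) to column names
th <- as.data.frame(t(homicidios))
thomicidios <- row_to_names(th, 1, remove_row = TRUE)

# Transposing mixed columns yields strings, so convert every column to integer
thomicidios2 <- thomicidios
thomicidios2[] <- lapply(thomicidios2, as.integer)
summary(thomicidios2)
```

Putting all library() calls in the first chunk of the document is a common way to keep the knit session and the interactive session in step.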

Related

R Markdown "function %>% not found"

I have a problem with R Markdown. In the Markdown file, I start by loading the necessary packages (including tidyverse) in a code chunk. But in the last code chunk, an error pops up which says
"could not find function %>% Calls: ...withVisible ->
eval_with_user_handlers -> eval -> eval Execution halted"
Can you please explain why this occurs and how I can fix it?
Thanks!
As requested:
This is the first code chunk where I load the packages:
library(tidyverse)
library(readxl)
library(psych)
options(scipen = 999)
And, this is the code chunk where the error occurs:
DE_GDP_1965_2020 <- DE_GDP_1965_1969 %>%
full_join(DE_GDP_1970_2020)
model <- lm(GDP ~ Period, data = DE_GDP_1965_2020)
summary(model)
Thanks!

Predict zero-inflated negative binomial to raster with raster::predict

I am trying to predict a ZINB model to a raster. My code looks like this:
m1 <- pscl::zeroinfl(Herbround ~ LandcoverCode | LandcoverCode, data = data, dist = 'negbin')
r <- raster("data\\final.tif") # read in raster
names(r) <- "LandcoverCode" # match terminology in the model
#predict model to raster
predict(r, m1)
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
contrasts apply only to factors
In addition: Warning message:
In model.frame.default(delete.response(object$terms$full), newdata, :
variable 'LandcoverCode' is not a factor
I then get the error above. If I use an lm or glm model it works without issues and I know that my LandcoverCode is in fact a factor with multiple levels (6 to be exact). I have tried adding in:
predict(r, m1, type = "count")
But that doesn't work either. Any help with predicting a ZINB model to a raster would be greatly appreciated, thanks.

Error with CRS argument while reprojecting

I'm trying to iterate over multiple rasters (500+) in a for loop, but I'm running into problems.
First I want to reproject them from EPSG:4326 to EPSG:32614, then resample them using a mask raster that has a smaller resolution and extent, and finally write one result raster per input to the working directory. However, I keep getting the following error message about the CRS argument:
Error in CRS(x) : PROJ4 argument-value pairs must begin with +: E:\Proyecto PM2.5\2_PM_2.5_Processing\Test/AOD_MOD_CDTDB_April_2016.tif
I took a look at multiple posts here, but I couldn't get past this problem. Below is my code. Any help for this R beginner would be really appreciated.
#find all tifs in your directory
dir<-"E:\\Proyecto PM2.5\\2_PM_2.5_Processing\\Test"
#get a list of all files with .tif in the name in the directory
files<-list.files(path=dir, pattern='.tif', full.names = TRUE)
#raster with the expected characteristics: extension, cellsize, number of pixels
r_ref <- raster("E:\\Proyecto PM2.5\\3_PM_2.5_Entrega\\temporal\\Raster_C.tif")
for (file in files){
name <- file
projectRaster(name,crs="+init=epsg:32614")
resample(file,r_ref,method="ngb")
savename<-sub("ZMVM",name,basename(file))
writeRaster(r,file=savename,)
}
You do
for (file in files){
name <- file
projectRaster(name,crs="+init=epsg:32614")
So name is the same as file (why do you make a copy?), that is, a file name.
You are asking projectRaster to project a character string (the file name). What you intended is surely something like this:
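for (file in files){
r <- raster(file) # read the file into a RasterLayer first
r <- projectRaster(r, crs="+init=epsg:32614") # keep the reprojected result
# resample and write r here
}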

How to combine two for loops working with netcdf lists?

I have problems combining multiple for loops. I will give an example with two of them that I would like to combine. If I know how to do it with two, I will also be able to do it with more.
If anyone knows how to write this as an lapply call, that would also be nice.
require(ncdf4)
#### download files from this link to directory: (I just downloaded manually,two files are sufficient to answer the example)
#### ftp://rfdata:forceDATA#ftp.iiasa.ac.at/WFDEI/LWdown_daily_WFDEI/
setwd("C:/place_where_I_have_downloaded_my_files_from_link/")
temp = list.files(pattern="*.nc") #list imported netcdf files
list2env(
lapply(setNames(temp, make.names(gsub("*.nc$", "", temp))),
nc_open), envir = .GlobalEnv) #import all parameters lists to global environment
#### first loop - # select parameter out of netcdf files and combine into a List of 2
list_temp<-list() #create empty list before loop
for (t in temp[1:2]){
list_temp[t]<-list(data.frame(LWdown=ncvar_get(nc_open(t),"LWdown")[428,176,],xcoor=176,ycoor=428))
}
LW_bind<-do.call(rbind,list_temp)
rownames(LWdown_1to2)<-NULL
#### second loop # select parameter out of onenetcdf file per x-coordinate and combine into a List of 2
list_temp<-list() #create empty list before loop
for (x in 176:177){
list_temp[t]<-list(data.frame(LWdown=ncvar_get(nc_open(temp[1]),"LWdown")[428,x,],xcoor=x,ycoor=428))
}
LW_bind<-do.call(rbind,list_temp)
rownames(LWdown_1to2)<-NULL
How I tried to combine but didn't work:
#### combined loops
list_temp<-list()
for (t in temp[1:2]){for (x in 176:177){
#ncin<-list()
ncin<-nc_open(t)
list_temp[x][t]<-list(data.frame(LWdown=ncvar_get(ncin,"LWdown")[428,x,],x=x,y=428))
}}
LWdown_1to2<-do.call(rbind,list_temp)
rownames(LWdown_1to2)<-NULL
I already solved my problem. See below. But I am still curious how one could combine the two for loops as described above, so I will leave the question open and unanswered.
Here is my solution:
require(arrayhelpers);require(stringr);require(plyr);require(ncdf4)
# store all files from ftp://rfdata:forceDATA#ftp.iiasa.ac.at/WFDEI/ in the following folder:
setwd("C:/folder")
temp = list.files(pattern="*.nc") #list all the file names
param<-gsub("_\\S+","",temp,perl=T) #extract parameter from file name
xcoord=seq(176,180,by=1) #The X-coordinates you are interested in
ycoord=seq(428,433,by=1) #The Y-coordinates you are interested in
list_var<-list() # make an empty list
for (t in 1:length(temp)){
temp_year<-str_sub(temp[],-9,-6) #take string number last place minus 9 till last place minus 6 to extract the year from file name
temp_month<-str_sub(temp[],-5,-4) #take string number last place minus 5 till last place minus 4 to extract the month from file name
temp_netcdf<-nc_open(temp[t])
temp_day<-rep(seq_along(ncvar_get(temp_netcdf,"day")),length(xcoord)*length(ycoord)) # make a vector of day numbers the same length as amount of values
dim.order<-sapply(temp_netcdf[["var"]][[param[t]]][["dim"]],function(x) x$name) # gives the name of each level of the array
start <- c(lon = 428, lat = 176, tstep = 1) # indicates the starting value of each variable
count <- c(lon = 6, lat = 5, tstep = length(ncvar_get(temp_netcdf,"day"))) # indicates how many values of each variable have to be present starting from start
tempstore<-ncvar_get(temp_netcdf, param[t], start = start[dim.order], count = count[dim.order]) # array with parameter values
df_temp<-array2df (tempstore, levels = list(lon=ycoord, lat = xcoord, day = NA), label.x = "value") # convert array to dataframe
Add_date<-sort(as.Date(paste(temp_year[t],"-",temp_month[t],"-",temp_day,sep=""),"%Y-%m-%d"),decreasing=FALSE) # make vector with the dates
list_var[t]<-list(data.frame(Add_date,df_temp,parameter=param[t])) #add dates to data frame and store in a list of all output files
### nc_close(temp_netcdf) #close nc file to prevent data loss and errors
}
All_NetCDF_var_in1df<-do.call(rbind,list_var)
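On the original question of combining the two loops: the nested attempt indexes the list with two keys (list_temp[x][t]), which does not address one slot per file/coordinate pair. One possible sketch is to enumerate every pair up front and build one data frame per pair. To keep the example runnable without the NetCDF files, read_lw below is a hypothetical stand-in for ncvar_get(nc_open(file), "LWdown")[428, x, ], and the file names are made up.

```r
# Hypothetical stand-in for ncvar_get(nc_open(file), "LWdown")[428, x, ]
read_lw <- function(file, x) seq_len(3) + x

files <- c("a.nc", "b.nc")   # made-up file names
xs <- 176:177                # x-coordinates of interest

# One row per (file, x) combination replaces the two nested loops
combos <- expand.grid(file = files, x = xs, stringsAsFactors = FALSE)

# Build one data frame per combination
pieces <- Map(function(file, x) {
  data.frame(LWdown = read_lw(file, x), file = file, xcoor = x, ycoor = 428)
}, combos$file, combos$x)

# Stack them into a single data frame and drop the generated row names
LW_bind <- do.call(rbind, pieces)
rownames(LW_bind) <- NULL
```

With real files, read_lw would open each file and extract the slice; opening each file once and looping over x inside would avoid repeated nc_open calls.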

How to rename a column of a data frame with part of the data frame identifier in R?

I've got a number of files that contain gene expression data. In each file, the gene name is kept in a column "Gene_symbol" and the expression measure (a real number) is kept in a column "RPKM". The file name consists of an identifier followed by _ and the rest of the name (ends with "expression.txt"). I would like to load all of these files into R as data frames, for each data frame rename the column "RPKM" with the identifier of the original file and then join the data frames by "Gene_symbol" into one large data frame with one column "Gene_symbol" followed by all the columns with the expression measures from the individual files, each labeled with the original identifier.
I've managed to transfer the identifier of the original files to the names of the individual data frames as follows.
files <- list.files(pattern = "expression.txt$")
for (i in files) {var_name = paste("Data", strsplit(i, "_")[[1]][1], sep = "_"); assign(var_name, read.table(i, header=TRUE)[,c("Gene_symbol", "RPKM")])}
So now I'm at a stage where I have dataframes as follows:
Data_id0001 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(2.43,5.24,6.53))
Data_id0002 <- data.frame(Gene_symbol=c("geneA","geneB","geneC"),RPKM=c(4.53,1.07,2.44))
But then I don't seem to be able to rename the RPKM column with the id000x bit. (That is in a fully automated way of course, looping through all the data frames I will generate in the real scenario.)
I've tried to store the identifier bit as a comment with the data frames but seem to be unable to assign the comment from within a loop.
Any help would be appreciated,
mce
You should avoid working this way in R. Keep all your data frames in a list and operate on them with functions such as lapply. Thus, instead of using assign, create an empty list the same length as your list of files and fill it in the for loop.
For your current situation, we can fix it by combining ls and mget to pull these data frames from the global environment into a list, and then renaming the columns of interest.
temp <- mget(ls(pattern = "Data_id\\d+$"))
lapply(names(temp), function(x) names(temp[[x]])[2] <<- gsub("Data_", "", x))
temp
#$Data_id0001
# Gene_symbol id0001
# 1 geneA 2.43
# 2 geneB 5.24
# 3 geneC 6.53
#
# $Data_id0002
# Gene_symbol id0002
# 1 geneA 4.53
# 2 geneB 1.07
# 3 geneC 2.44
You could eventually use list2env to get them back into the global environment, but use it with caution.
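The list-based approach recommended above can be sketched in base R as follows. This is only an illustration: the two files are made-up examples written to a temporary directory, and merge with all = TRUE is my choice for the full join, not something the answer prescribes.

```r
# Write two made-up example files to a temporary directory
demo_dir <- tempfile("expr")
dir.create(demo_dir)
write.table(
  data.frame(Gene_symbol = c("geneA", "geneB", "geneC"), RPKM = c(2.43, 5.24, 6.53)),
  file.path(demo_dir, "id0001_expression.txt"), row.names = FALSE, quote = FALSE
)
write.table(
  data.frame(Gene_symbol = c("geneA", "geneB", "geneC"), RPKM = c(4.53, 1.07, 2.44)),
  file.path(demo_dir, "id0002_expression.txt"), row.names = FALSE, quote = FALSE
)

files <- list.files(demo_dir, pattern = "expression\\.txt$", full.names = TRUE)
ids <- sub("_.*$", "", basename(files))  # identifier = text before the first "_"

# Read each file into a list element and rename RPKM to the file's identifier
data_list <- lapply(seq_along(files), function(i) {
  df <- read.table(files[i], header = TRUE)[, c("Gene_symbol", "RPKM")]
  names(df)[names(df) == "RPKM"] <- ids[i]
  df
})
names(data_list) <- ids

# Full join of all data frames on Gene_symbol
combined <- Reduce(function(a, b) merge(a, b, by = "Gene_symbol", all = TRUE), data_list)
```

Because the data frames never leave the list, no assign, mget or list2env call is needed.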
Thanks a lot for your suggestions! I think I get the point. The way I'm doing it now (see below) is hopefully a lot more R-like and works fine!
Cheers,
Maik
library(plyr)
files <- list.files(pattern = "expression.txt$")
temp <- list()
for (i in 1:length(files)) {temp[[i]]=read.table(files[i], header=TRUE)[,c("Gene_symbol", "RPKM")]}
for (i in 1:length(temp)) {temp[[i]]=rename(temp[[i]], c("RPKM"=strsplit(files[i], "_")[[1]][1]))}
combined_expression <- join_all(temp, by="Gene_symbol", type="full")