Problem with the "stackApply" function in R - r-raster

I have a problem with the "stackApply" function from the raster package. First I want to stack three raster layers (each layer has one band) - that works. Then I want to create a raster object that shows in which of the three bands/layers the minimum value occurs (each pixel in the raster layers has a different value). But I get various error messages. Does anyone have an idea how I can solve the problem?
Thank you
stacktest<-stack(test,test1,test2)
min_which <- stackApply(stacktest, indices=1, fun=function(x, na.rm=NULL) which.min(x))
Error in setValues(out, v) : values must be a vector
Error in is.infinite(v) : not implemented standard method for type 'list'

Here is a minimal, self-contained, reproducible example:
Example data from ?stackApply
library(raster)
r <- raster(ncol=10, nrow=10)
values(r) <- 1:ncell(r)
s <- stack(r,r,r,r,r,r)
s <- s * 1:6
Now use these data with your function (I removed the na.rm=NULL as it is not used)
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x) )
w
#class : RasterLayer
#dimensions : 10, 10, 100 (nrow, ncol, ncell)
#resolution : 36, 18 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#crs : +proj=longlat +datum=WGS84 +no_defs
#source : memory
#names : index_1
#values : 1, 1 (min, max)
Same for which.max
w <- stackApply(s, indices=1, fun=function(x, na.rm=NULL) which.max(x) )
w
# (...)
#values : 6, 6 (min, max)
This suggests it works fine. When it fails in practice, it usually means that you have cells that are NA:
s[1:10] <- NA
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x) )
# Error in setValues(out, v) : values must be numeric, logical or factor
It is easy to see why this error occurs
which.min(3:1)
#[1] 3
which.min(c(3:1, NA))
#[1] 3
which.min(c(NA, NA, NA))
#integer(0)
If all values are NA, which.min does not return NA, as you might expect; instead it returns an empty vector. That can be fixed like this:
which.min(c(NA, NA, NA))[1]
#[1] NA
And you can do
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x)[1] )
However, using stackApply with indices=1 is not a good approach. You should generally use calc to compute cell values across all layers.
y <- calc(s, function(x) which.min(x)[1])
But in this case you can use the more straightforward
z <- which.min(s)
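As a quick sanity check (a sketch using the w, y and z objects created above on the example stack s, which has NA values in cells 1 to 10), all three approaches should give the same per-cell layer index:
all.equal(values(w), values(y))  # should be TRUE
all.equal(values(y), values(z))  # should be TRUE
values(z)[1:12]                  # NA for the all-NA cells, 1 for the rest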

Related

rasterFromXYZ missing value where TRUE/FALSE needed

I have been having some strange error messages from the rasterFromXYZ function in the R raster package. Here is an example
library(raster)
xyz <- data.frame(x = c(5.463636, 5.481818, 5.5), y = c(51.42727, 51.42727, 51.42727), z = c(1.2,1.3,1.6))
r <- rasterFromXYZ(xyz)
##error
Error in if (nc > (2^31 - 1)) return(FALSE) :
missing value where TRUE/FALSE needed
In addition: Warning message:
In min(dy) : no non-missing arguments to min; returning Inf
##specifying the resolution as 1
r <- rasterFromXYZ(xyz, res = 1)
##different error
Error in rasterFromXYZ(xyz, res = 1) : x cell sizes are not regular
The x coordinates are perfectly regular. What am I doing wrong?
The x-coordinates are OK, but there is only one unique y-coordinate value. So there is no way to guess the vertical resolution.
xyz
# [,1] [,2] [,3]
#[1,] 5.463636 51.42727 1.2
#[2,] 5.481818 51.42727 1.3
#[3,] 5.500000 51.42727 1.6
If you set the resolution to 1, the x-resolution no longer matches the x-coordinates; but you can supply only the y-resolution:
rasterFromXYZ(xyz, res=c(NA, 1))
#class : RasterLayer
#dimensions : 1, 3, 3 (nrow, ncol, ncell)
#resolution : 0.018182, 1 (x, y)
#extent : 5.454545, 5.509091, 50.92727, 51.92727 (xmin, xmax, ymin, ymax)
#crs : NA
#source : memory
#names : layer
#values : 1.2, 1.6 (min, max)
The development version now gives a better error message:
r <- rasterFromXYZ(xyz)
#Error in rasterFromXYZ(xyz) : more than one unique y value needed
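For contrast, a minimal sketch: if the data did contain a second distinct y value (here I simply fabricate one by shifting the existing points), rasterFromXYZ can infer both resolutions on its own.
xyz2 <- rbind(xyz, data.frame(x = xyz$x, y = xyz$y + 0.01, z = c(2.1, 2.2, 2.3)))
r2 <- rasterFromXYZ(xyz2)
res(r2)
# approximately 0.018182 0.010000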

R deleting duplicates when duplicates slightly differ by 1 or 2 letters

I am collecting tweets with the twitteR package and get many duplicates. This code works fine:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL, retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- unique(mydata, incomparables = F, nmax = NA)
The problem is that it doesn't actually delete any duplicates because it doesn't recognise them as such. The duplicate tweets typically contain shortened URLs that differ by 1 or 2 digits. So I tried to clean the tweets of URLs with this code:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL,
retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- data.frame(mydata, stringsAsFactors = FALSE)
names(mydata) <- c('words')
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
removeURL <- function(x) gsub("https[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
mydata$words <- unique(mydata$words, incomparables = F, nmax = NA)
Now I get the error message:
Error in `$<-.data.frame`(`*tmp*`, "words", value = c("Tripping around #DisneySprings.....) : replacement has 295 rows, data has 300
Advice? Thanks!
Your error is easily reproducible:
mydata <- data.frame(list(w = c(0, 1, 0, 1)))
mydata$words <- c(0, 1, 1)
# Error in `$<-.data.frame`(`*tmp*`, "words", value = c(0, 1, 1)) :
# replacement has 3 rows, data has 4
This just means that you need to assign a vector whose length equals the number of rows of the data frame.
To filter out duplicate values you need to change the last line of your code to:
res <- mydata[!duplicated(mydata$words), ]
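Putting the two points together, here is a self-contained sketch with made-up tweet text (the URL regex is only an illustration): replace the column in place, which keeps its length equal to nrow(mydata), and only then drop the duplicate rows.
mydata <- data.frame(words = c("Great view! http://t.co/abc123",
                               "Great view! http://t.co/xyz789",
                               "A completely different tweet"),
                     stringsAsFactors = FALSE)
removeURL <- function(x) gsub("https?://[[:graph:]]*", "", x)
mydata$words <- removeURL(mydata$words)   # same length as before, so assignment is fine
mydata[!duplicated(mydata$words), , drop = FALSE]   # the two URL variants collapse to one row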

Shapefile: XY coordinate and Longitude/Latitude Coordinate

I have the following two shapefiles:
> summary(precincts1)
Object of class SpatialPolygonsDataFrame
Coordinates:
min max
x -74.25545 -73.70002
y 40.49613 40.91540
Precinct Shape_Leng Shape_Area
Min. : 1.00 Min. : 17083 Min. : 15286897
1st Qu.: 31.50 1st Qu.: 29900 1st Qu.: 37593804
Median : 64.50 Median : 46887 Median : 65891025
Mean : 62.57 Mean : 65720 Mean :111231564
3rd Qu.: 95.50 3rd Qu.: 76375 3rd Qu.:133644443
Max. :123.00 Max. :309518 Max. :781725787
and
> summary(bnd_nhd)
Object of class SpatialPolygonsDataFrame
Coordinates:
min max
x 871512.3 912850.5
y 982994.4 1070956.9
SHAPE_area SHAPE_len
Min. : 3173813 Min. : 7879
1st Qu.: 9687122 1st Qu.:13514
Median :14363449 Median :17044
Mean :19674314 Mean :19516
3rd Qu.:27161251 3rd Qu.:23821
Max. :68101106 Max. :49269
Their coordinate systems are different. I can overlay the shapes for "precincts1" on the map with leaflet, but I cannot do the same for "bnd_nhd". I am using shiny, maptools, and leaflet. How can I convert the shapefile or change the setting on the map so that I can overlay the map for "bnd_nhd"?
This should work:
library("rgdal")
library("leaflet")
bnd_nhd <- readOGR("C:/data/BND_Nhd88_cw.shp", layer="BND_Nhd88_cw")
pol_wrd <- readOGR("C:/data/POL_WRD_2010_Prec.shp", layer="POL_WRD_2010_Prec")
bnd_nhd4326 <- spTransform(bnd_nhd, CRS("+init=epsg:4326"))
pol_wrd4326 <- spTransform(pol_wrd, CRS("+init=epsg:4326"))
m <- leaflet() %>%
  addTiles() %>%
  addPolygons(data=bnd_nhd4326, weight=2, color="red", group="bnd_nhd") %>%
  addPolygons(data=pol_wrd4326, weight=2, color="blue", group="pol_wrd") %>%
  addLayersControl(
    overlayGroups = c("bnd_nhd", "pol_wrd"),
    options = layersControlOptions(collapsed = FALSE)
  )
m
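If you prefer the sf package over rgdal/sp, the same reprojection can be sketched like this (same placeholder paths as above; EPSG:4326 is the longitude/latitude CRS that leaflet expects):
library(sf)
library(leaflet)
bnd_nhd <- st_read("C:/data/BND_Nhd88_cw.shp")
bnd_nhd4326 <- st_transform(bnd_nhd, 4326)
leaflet() %>%
  addTiles() %>%
  addPolygons(data = bnd_nhd4326, weight = 2, color = "red")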

Subset all 3-digit numbers and collapse them with a separator in a data frame in R

I'm formatting a data set so each entry has the adegenet format for codominant markers, such as:
Loci1
###/###
208/210
200/204
198/208
where the # represents any digit (the number is an allele size in base pairs). My data has some homozygous entries (3-digit integers with no separator) that have the form of:
Loci1
###
208
198
I intend to paste the 3-digit string to itself with sep='/' to produce the first format. I've tried to use grep to subset these homozygous entries by finding all non-###/### entries and negating the match, such as:
a <- grep('\\b\\d{3}?[/]\\d{3}', score$Loci1, value =T ) # Subset all ###/###/
score[!(a %in% 1:nrow(score$Loci1)), ] # works but only on vectors...
After the subset I could paste. The problem arises when I apply this to a data frame. grep seems to treat the data frame as a list (which in part it is) and returns columns that have a match.
So, in short, how can I go from ### to ###/### in a data frame?
Here is a self-contained example of the data:
score2 <- NULL
set.seed(9)
Loci1 <- NULL
Loci2 <- NULL
Loci3 <- NULL
for (i in 1:5) Loci1 <- append(Loci1, paste(sample(seq(from = 230, to=330, by=3), 2, replace = F), collapse = '/'))
for (i in 1:5) Loci2 <- append(Loci2, paste(sample(seq(from = 230, to=330, by=3), 2, replace = F), collapse = '/'))
for (i in 1:5) Loci3 <- append(Loci3, paste(sample(seq(from = 230, to=330, by=3), 2, replace = F), collapse = '/'))
score2 <- data.frame(Loci1, Loci2, Loci3, stringsAsFactors = F)
score2[2,3] <- strsplit(score2[2,3], split = '/')[1]
score2[5,2] <- strsplit(score2[3,3], split = '/')[1]
score2[1,1] <- strsplit(score2[1,1], split = '/')[1]
score2[c(1, 4),c(2,3)] <- NA
score2
You could just replace the 3 digit items with the separator and a copy:
sub("^(...)$", "\\1/\\1", Loci1)
Use lapply with an anonymous function:
data.frame( lapply(score2, function(x) sub("^(...)$", "\\1/\\1", x) ) )
Loci1 Loci2 Loci3
1 251/251 <NA> <NA>
2 251/329 320/257 260/260
3 275/242 278/329 281/320
4 269/266 <NA> <NA>
5 296/326 281/281 326/314
(Not sure what the "paste-part" was supposed to refer to, but I think this was the intent of your question)
If the numeric values could have a varying number of digits, then use a pattern argument like "^([0-9]{1,9})$".
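For example (a small sketch on a plain character vector):
sub("^([0-9]{1,9})$", "\\1/\\1", c("98", "208", "1001", "208/210"))
# [1] "98/98"     "208/208"   "1001/1001" "208/210"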
An option using grep/paste,
m1 <- as.matrix(score2)
indx <- grep('^...$', m1)
m1[indx] <- paste(m1[indx], m1[indx], sep="/")
as.data.frame(m1)
# Loci1 Loci2 Loci3
#1 251/251 <NA> <NA>
#2 251/329 320/257 260/260
#3 275/242 278/329 281/320
#4 269/266 <NA> <NA>
#5 296/326 281/281 326/314
Or, without converting to a matrix, this can be done using lapply:
score2[] <- lapply(score2, function(x) ifelse(grepl('^...$', x),
                                              paste(x, x, sep="/"), x))
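A quick sanity check (a sketch on the score2 example from above) that the sub()-based and the matrix/grep approaches produce the same values:
a1 <- sapply(score2, function(x) sub("^(...)$", "\\1/\\1", x))
m2 <- as.matrix(score2)
indx2 <- grep("^...$", m2)
m2[indx2] <- paste(m2[indx2], m2[indx2], sep="/")
all.equal(as.vector(a1), as.vector(m2))   # should be TRUE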

R: What's the easiest way to print out pairs of values from a data.frame?

I have a data.frame:
df<-data.frame(a=c("x","x","y","y"),b=c(1,2,3,4))
> df
a b
1 x 1
2 x 2
3 y 3
4 y 4
What's the easiest way to print out each pair of values as a list of strings like this:
"x1", "x2", "y1", "y2"
apply(df, 1, paste, collapse="")
with(df, paste(a, b, sep=""))
And this should be faster than apply.
About timing
For 10000 rows we get:
df <- data.frame(
a = sample(c("x","y"), 10000, replace=TRUE),
b = sample(1L:4L, 10000, replace=TRUE)
)
N = 100
mean(replicate(N, system.time( with(df, paste(a, b, sep="")) )["elapsed"]), trim=0.05)
# 0.005778
mean(replicate(N, system.time( apply(df, 1, paste, collapse="") )["elapsed"]), trim=0.05)
# 0.09611
So the speed increase is already visible at a few thousand rows.
That's because Shane's solution calls paste separately for each row, so there are nrow(df) calls to paste, while my solution makes a single call.
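The same single-call idea generalises to any number of columns; a small sketch (recreating the original four-row df):
df <- data.frame(a = c("x","x","y","y"), b = c(1,2,3,4))
do.call(paste0, df)
# [1] "x1" "x2" "y3" "y4"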
Also, you can use the sqldf library:
library("sqldf")
df<-data.frame(a=c("x","x","y","y"),b=c(1,2,3,4))
result <- sqldf("SELECT a || cast(cast(b as integer) as text) as concat FROM df")
You will get the following result:
concat
1 x1
2 x2
3 y3
4 y4