Qgis or Python: converting a CSV file of simple locations to raster? - python-2.7

I have a CSV file as follows:
Diversity,Longitude,Latitude
7,114.99638889,-33.85333333
6,114.99790583,-33.85214594
10,115,-33.85416667
2,115.0252075,-33.84447519
I would like to convert it to a raster file with a set 'no data' value over most of the area and the values in cells at the long/lat locations.
Is there an easy way to do that in QGIS or Python?
Cheers,
Steve

Not what you asked for, but here is how you can approach it in R.
Get the data:
d <- read.csv('file.csv')
# reorder the columns to x (Longitude), y (Latitude), z (Diversity)
d <- cbind(d[,2:3], d[,1])
Load the raster package:
library(raster)
If your data are regularly spaced:
r <- rasterFromXYZ(d)
writeRaster(r, 'file.tif')
Otherwise, create an empty raster and rasterize the points onto it:
r <- raster(extent(d[,1:2]))
res(r) <- 1 # cell size in map units (degrees here); adjust this and other parameters as you see fit
r <- rasterize(d[,1:2], r, d[,3], fun=mean)
writeRaster(r, 'file.tif')
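To see what rasterizing points actually does, here is a minimal base R sketch using the four rows from the question's CSV and an arbitrary cell size of 0.01 degrees. The helper points_to_grid() is hypothetical, not part of the raster package: it maps each point to a grid cell, averages the values that land in the same cell (like fun=mean above), and leaves every other cell NA, which plays the role of the no-data value.

```r
# Hypothetical helper: bin point values onto a regular grid.
# Cells that receive no point stay NA (the "no data" value).
points_to_grid <- function(x, y, z, res, fun = mean) {
  xmin <- min(x)
  ymax <- max(y)
  ncols <- floor((max(x) - xmin) / res) + 1
  nrows <- floor((ymax - min(y)) / res) + 1
  # Row 1 is the northernmost row, as in a raster
  col <- floor((x - xmin) / res) + 1
  row <- floor((ymax - y) / res) + 1
  grid <- matrix(NA_real_, nrow = nrows, ncol = ncols)
  # Column-major linear index of each point's cell
  cell <- (col - 1) * nrows + row
  # Aggregate all values that fall into the same cell
  agg <- tapply(z, cell, fun)
  grid[as.integer(names(agg))] <- agg
  grid
}

# The sample data from the question
d <- data.frame(Diversity = c(7, 6, 10, 2),
                Longitude = c(114.99638889, 114.99790583, 115, 115.0252075),
                Latitude = c(-33.85333333, -33.85214594, -33.85416667, -33.84447519))
grid <- points_to_grid(d$Longitude, d$Latitude, d$Diversity, res = 0.01)
print(grid)
```

To write it out as a file, the matrix would still need to be wrapped in a georeferenced raster (for example with raster() and an extent matching the grid) before calling writeRaster(), whose NAflag argument sets the no-data value stored in the output.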

Related

How to update fillColor palette to selected input in shiny map?

I am having trouble transitioning my map from static to reactive so a user can select what data they want to look at. Somehow I'm not successfully connecting the input to the dataframe. My data is from a shapefile and looks roughly like this:
NAME Average Rate geometry
1 Alcona 119.7504 0.1421498 MULTIPOLYGON (((-83.88711 4...
2 Alger 120.9212 0.1204398 MULTIPOLYGON (((-87.11602 4...
3 Allegan 128.4523 0.1167062 MULTIPOLYGON (((-85.54342 4...
4 Alpena 114.1528 0.1410852 MULTIPOLYGON (((-83.3434 44...
5 Antrim 124.8554 0.1350004 MULTIPOLYGON (((-84.84877 4...
6 Arenac 127.8809 0.1413534 MULTIPOLYGON (((-83.7555 43...
In the server section below, you can see that I tried to use reactive to get the selected variable and when I write print(select) it does print the correct variable name, but when I try to put it into the colorNumeric() function it's clearly not being recognized. The map I get is all just the same shade of blue instead of different shades based on the value of the variable in that county.
ui <- fluidPage(
  fluidRow(
    selectInput(inputId="var",
                label="Select variable",
                choices=list("Average"="Average",
                             "Rate"="Rate"),
                selected=1)
  ),
  fluidRow(
    leafletOutput("map")
  )
)
server <- function(input, output, session) {
  # Data sources
  counties <- st_read("EITC_counties.shp") %>%
    st_transform(crs="+init=epsg:4326")
  counties_clean <- select(counties, NAME, X2020_Avg., X2020_Takeu)
  counties_clean <- counties_clean %>%
    rename("Average"="X2020_Avg.",
           "Rate"="X2020_Takeu")
  # Map
  variable <- reactive({
    input$var
  })
  output$map <- renderLeaflet({
    select <- variable()
    print(select)
    pal <- colorNumeric(palette = "Blues", domain = counties_clean$select, na.color = "black")
    color_pal <- counties_clean$select
    leaflet()%>%
      setView( -84.51, 44.18, zoom=5) %>%
      addPolygons(data=counties_clean, layerId=~NAME,
                  weight = 1, smoothFactor=.5,
                  fillOpacity=.7,
                  fillColor=~pal(color_pal()),
                  highlightOptions = highlightOptions(color = "white",
                                                      weight = 2,
                                                      bringToFront = TRUE)) %>%
      addProviderTiles(providers$CartoDB.Positron)
  })
}
shinyApp(ui, server)
I've tried making the reaction into an event and also using the observe function using a leaflet proxy but it only produced errors. I also tried to skip the reactive definition and just put input$var directly into the palette (counties_clean$input$var), but it similarly did not show any color variation.
When I previously created a static map setting the palette using counties_clean$Average it came out correctly, but replacing Average with a user input is where I appear to be going wrong. Thanks in advance for any guidance you can provide and please let me know if I can share any additional clarification.
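Unfortunately, your code is not reproducible without the data, but the mistake is most likely in this line:
color_pal <- counties_clean$select
This line extracts a column literally named "select" from your data. No such column exists, so it returns NULL.
What you want instead is the column whose name is stored in the variable select, so try:
color_pal <- counties_clean[[select]]
The same applies to domain = counties_clean$select in the colorNumeric() call, which should be domain = counties_clean[[select]]. Also, color_pal is a plain vector, not a reactive, so pass it as fillColor = ~pal(color_pal) rather than ~pal(color_pal()).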
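The difference between $ and [[ can be checked without Shiny. This minimal sketch uses a plain data frame built from the first three rows printed in the question, standing in for the sf object (an sf object is a data frame underneath, so [[ works the same way on it); select holds what variable() would return for the "Rate" choice.

```r
# Plain data frame standing in for counties_clean (values from the question)
counties_clean <- data.frame(NAME = c("Alcona", "Alger", "Allegan"),
                             Average = c(119.7504, 120.9212, 128.4523),
                             Rate = c(0.1421498, 0.1204398, 0.1167062))
select <- "Rate"  # what variable() would return for the "Rate" choice

# $ takes the name literally: there is no column called "select", so NULL
print(counties_clean$select)
# [[ evaluates select first, so this looks up the "Rate" column
print(counties_clean[[select]])
```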

Rmarkdown plot and image side by side

I'm working on a report in R markdown.
I know there is a way to have different plots side by side, and there is also a way to have different images side by side.
But is it also possible to show a plot and an image side by side?
I have a ggplot bar graph that I would like to present next to an image of a map, it takes up too much space if I put the image below the graph.
Thanks,
Regards,
Freya
This could work for you. Here are the steps, with the name of the corresponding object in parentheses.
First, download the image from its URL (y) and read it into an object (photo).
Second, create a ggplot panel that contains the image (photo_panel).
Finally, after creating your plot (p1), use plot_grid() from the cowplot package to arrange the plot and the image side by side.
.Rmd file:
---
title: "Image + graph"
author: bttomio
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## R Markdown
```{r image_graph}
# Download the image and read it in
y = "http://upload.wikimedia.org/wikipedia/commons/5/5d/AaronEckhart10TIFF.jpg"
download.file(y,'y.jpg', mode = 'wb')
library("jpeg")
photo <- readJPEG("y.jpg",native=TRUE)
library(ggplot2)
library(cowplot)
# Wrap the image in a ggplot object so it can be placed in a grid
photo_panel <- ggdraw() + draw_image(photo, scale = 0.8)
# Example data and plot
df <- data.frame(Years=rep(2016:2017, each=4),
                 Quarters=rep(paste0("Q", 1:4), 2),
                 Series1=seq(100, 800, 100))
p1 <- ggplot(df) +
  geom_point(aes(x=Quarters, y=Series1)) +
  facet_wrap( ~ Years, strip.position="bottom", scales="free_x") +
  theme(panel.spacing=unit(0, "lines"),
        strip.background=element_blank(),
        strip.placement="outside",
        aspect.ratio=1) # set aspect ratio
# Plot and image side by side
plot_grid(p1, photo_panel, ncol = 2)
```

Rpart - accuracy of bigrams

Good evening, everyone!
I am facing a problem in R. I have a dataset containing Amazon reviews of the PlayStation 4, and I would like to build a prediction model with rpart and measure its accuracy.
The reviews have been successfully loaded to R, a corpus has been created and some preprocessing tasks have been applied:
library(RWeka)
library(tm)
library(rpart)
corpus <- Corpus(VectorSource(tr.review.ps4$reviewText))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, PlainTextDocument)
corpus <- tm_map(corpus, removeWords, stopwords('english'))
corpus <- tm_map(corpus, stemDocument)
The bigrams and a term document matrix are created with the following code:
BigramTokenizer <- function(x) {RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))}
txtTdmBi <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer, bounds = list(global=c(10, Inf))))
Then sparse terms are removed and the matrix is converted to a data frame:
dtm <- removeSparseTerms(txtTdmBi, 0.999)
dtmsparse <- as.data.frame(as.matrix(txtTdmBi))
The original dataset consists of 7561 reviews, so the training and test sets are created as follows:
train <- dtmsparse[1:6500,]
test <- dtmsparse[6501:7561,]
Then the star ratings are added to the training set; overall is the star rating from one to five:
train$overall <- tr.review.ps4[1:6500,]$overall
When using unigrams, the prediction model is created as follows:
model <- rpart(overall ~., data = train, method= 'class')
However, this does not work in my case, I guess because the connection to the original review dataset has to be established. But how? I have no idea.
When I run this code I get the following error:
Error in terms.formula(formula, data = data) :
Can anyone help me? Thanks a lot.
Best regards
Paul
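Today I kept searching for a solution to my problem and, luckily, I found the mistake.
The error occurred because the TermDocumentMatrix had the wrong orientation: a TermDocumentMatrix has one row per term and one column per document, whereas rpart needs one row per document (review).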
I had to transpose the matrix with the following code:
txtTdmBi.t=t(txtTdmBi)
Finally it worked.
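To illustrate both the fix and the accuracy part of the question, here is a small base R sketch. A toy count matrix with made-up bigrams stands in for as.matrix(txtTdmBi), and accuracy() is a hypothetical helper, not an rpart function. With the real model, the predicted classes would come from predict(model, newdata = test, type = "class") and the true ratings from tr.review.ps4[6501:7561,]$overall.

```r
# Toy stand-in for as.matrix(txtTdmBi): one row per bigram, one column per review
tdm <- matrix(c(1, 0, 2,
                0, 1, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(Terms = c("great game", "bad control"),
                              Docs = c("1", "2", "3")))

# Transpose so that each review becomes one row, as rpart() expects
features <- as.data.frame(t(tdm))
features$overall <- factor(c(5, 2, 5))  # made-up star ratings
print(dim(features))

# Hypothetical helper: share of predictions that match the true rating
accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}

# Made-up predictions and true ratings; in practice these would be
# predict(model, newdata = test, type = "class") and the test set's ratings
predicted <- factor(c(5, 4, 1))
actual <- factor(c(5, 4, 2))
print(table(predicted, actual))  # confusion matrix
print(accuracy(predicted, actual))
```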
Best regards
Paul

parsing access.log to data.frame

I want to parse an access.log in R. It has the following form and I want to get it into a data.frame:
TIME="2013-07-25T06:28:38+0200" MOBILE_AGENT="0" HTTP_REFERER="-" REQUEST_HOST="www.example.com" APP_ENV="envvar" APP_COUNTRY="US" APP_DEFAULT_LOCATION="New York" REMOTE_ADDR="11.222.33.444" SESSION_ID="rstg35tsdf56tdg3" REQUEST_URI="/get/me/something" HTTP_USER_AGENT="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" REQUEST_METHOD="GET" REWRITTEN_REQUEST_URI="/index.php?url=/get/me/something" STATUS="200" RESPONSE_TIME="155,860ms" PEAK_MEMORY="18965" CPU="99,99"
The logs are 400 MB per file and I currently have about 4 GB of logs, so performance matters.
Another thing: there are two different log structures (with different columns), so you cannot assume the columns are always the same, but you can assume that only one structure is parsed at a time.
What I have up to now is a regex for this structure:
(\\w+)[=][\"](.*?)[\"][ ]{0,1}
I can read the data in and fit it into a data frame using readLines, gsub and read.table, but it is slow and messy.
Any ideas? Tnx!
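You can do this, for example:
## text holds the raw log lines (use readLines("access.log") for a file)
text <- readLines(textConnection(text))
## since we can't use = as a separator (it also occurs in URLs), create a new one
dd <- read.table(text=gsub('="','|"',text),sep=' ')
## use data.table since it is faster to apply operations by column and bind them again
library(data.table)
DT <- as.data.table(dd)
## split every key|value field; the rows of DT.split then alternate key, value
DT.split <- DT[,lapply(.SD,function(x)
  unlist(strsplit(as.character(x) ,"|",fixed=TRUE)))]
## keep every second row, i.e. the values
## (recent data.table versions no longer recycle a short logical i such as c(F,T))
DT.split[seq(2, nrow(DT.split), by = 2)]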
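As an alternative that does not rely on read.table's quote handling, here is a minimal base R sketch built on the regex from the question, tightened to [^"]* so a value cannot run past its closing quote. parse_access_log() is a hypothetical name, and the two input lines are shortened, made-up examples in the question's format. It assumes, as the question allows, that all lines in one call share the same keys in the same order, and that values contain no embedded double quotes. Every value stays character, so conversions such as "155,860ms" are left to you; it has not been benchmarked on multi-GB files.

```r
# Hypothetical parser: one row per log line, one column per KEY="value" pair.
# Assumes every line has the same keys in the same order.
parse_access_log <- function(lines) {
  # Each match is a whole KEY="value" pair; [^"]* stops at the closing quote
  pairs <- regmatches(lines, gregexpr('\\w+="[^"]*"', lines, perl = TRUE))
  # Keys are taken from the first line
  keys <- sub('=.*$', '', pairs[[1]], perl = TRUE)
  stopifnot(all(lengths(pairs) == length(keys)))
  # Strip KEY=" and the trailing quote, leaving only the value
  values <- sub('^\\w+="(.*)"$', '\\1', unlist(pairs), perl = TRUE)
  # unlist() keeps line order, so fill the matrix row by row
  m <- matrix(values, ncol = length(keys), byrow = TRUE,
              dimnames = list(NULL, keys))
  as.data.frame(m, stringsAsFactors = FALSE)
}

lines <- c(
  'TIME="2013-07-25T06:28:38+0200" APP_DEFAULT_LOCATION="New York" REWRITTEN_REQUEST_URI="/index.php?url=/get/me/something" STATUS="200" RESPONSE_TIME="155,860ms"',
  'TIME="2013-07-25T06:28:39+0200" APP_DEFAULT_LOCATION="New York" REWRITTEN_REQUEST_URI="/index.php?url=/get/me/other" STATUS="404" RESPONSE_TIME="12,500ms"'
)
log_df <- parse_access_log(lines)
print(log_df)
```

For a real file this would be called as parse_access_log(readLines("access.log")); with 400 MB files, reading and parsing in chunks of lines may be necessary.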

How do you combine multiple boxplots from a List of data-frames?

This is a repost from the Statistics site of Stack Exchange: I asked the question there and was advised to ask it here, so here it is.
I have a list of data frames, all with a similar structure and only one numeric column each. Because of my data requirements, the data frames necessarily have different lengths. I want to create a boxplot of the numeric values, grouped by the attributes in another column, and the boxplot should include the data from all the data frames.
I hope it is a clear question. I will post sample data soon.
Sam,
I'm assuming this is a follow-up to this question? Your sample data may illustrate the nuances of your needs better (the "categorized over attributes in another column" part), but the same melting approach should work here.
library(ggplot2)
library(reshape2)
#Fake data
a <- data.frame(a = rnorm(10))
b <- data.frame(b = rnorm(100))
c <- data.frame(c = rnorm(1000))
#In a list
myList <- list(a,b,c)
#In a melting pot
df <- melt(myList)
#Separate boxplots for each data.frame
ggplot(df, aes(x = factor(variable), y = value)) + geom_boxplot()
#All values plotted together as one boxplot
ggplot(df, aes(x = factor(1), y = value)) + geom_boxplot()
a<-data.frame(c(1,2),c("x","y"))
b<-data.frame(c(3,4,5),c("a","b","c"))
boxplot(c(a[1],b[1]))
The [1] selects the first column of each data frame.
A data frame cannot have columns of different lengths (every column must have the same number of rows), but you can pass boxplot() several datasets as a list to plot them side by side.
Using the melt() function from reshape2 and the base R boxplot() function:
library(reshape2)
#Fake data
a <- data.frame(a = rnorm(10))
b <- data.frame(b = rnorm(100))
c <- data.frame(c = rnorm(100) + 5)
#In a list
myList <- list(a,b,c)
#In a melting pot
df <- melt(myList)
# plot using base R boxplot function
boxplot(value ~ variable, data = df)
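The answers above group the boxes by data frame. For the other part of the question, grouping by an attribute column across all data frames, one base R option is to stack the list with do.call(rbind, ...) and use a formula. This is a sketch with made-up data: it assumes every data frame has the same column names, here a numeric value column and a category column (both names are hypothetical), and it adds a source column recording which list element each row came from.

```r
# Made-up example: each data frame has a numeric column (value) and an
# attribute column (category), with a different number of rows.
set.seed(42)
make_df <- function(n) {
  data.frame(value = rnorm(n),
             category = sample(c("x", "y", "z"), n, replace = TRUE))
}
myList <- list(make_df(10), make_df(100), make_df(50))

# Stack all data frames into one, tagging each row with its list position
combined <- do.call(rbind,
                    Map(function(d, i) cbind(d, source = i),
                        myList, seq_along(myList)))

# One box per category, pooling the rows of every data frame
boxplot(value ~ category, data = combined)
```

If you also want to see which data frame each box comes from, boxplot(value ~ category + source, data = combined) draws one box per category within each data frame.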