I am a beginner with R. I am having a problem knitting tables with code my professor designed for the students. The table-formatting code is shown below; I put it in my R Markdown document like this.
```{r, results="hide", message=FALSE, warning=FALSE, error=FALSE}
library(stargazer)  # jhp_report() wraps stargazer output

## my style latex summary of regression
jhp_report <- function(...){
  output <- capture.output(stargazer(..., omit.stat = c("f", "ser")))
  # The first three lines are the ones we want to remove...
  output <- output[4:length(output)]
  # cat out the results - this is essentially just what stargazer does too
  cat(paste(output, collapse = "\n"), "\n")
}
```
After this, I tried to print the table with knitr.
```{r, message=FALSE, warning=FALSE, error=FALSE}
set.seed(1973)
N <- 100
x <- runif(N, 6, 20)
D <- rbinom(N, 1, .5)
y <- 1 + 0.5*x - .4*D + rnorm(N)  # was `t <-`, but df.lm below needs `y`
df.lm <- data.frame(y = y, x = x, D = D)
df.lm$D <- factor(df.lm$D, labels = c('Male', 'Female'))

## REGRESSION
reg.parallel <- lm(y ~ x + D, data = df.lm)
jhp_report(reg.parallel, title = "Result", label = "tab:D", dep.var.labels = "$y$")
```
As a result, instead of a table, the knitted document shows only the raw LaTeX code. I would like to know how to set up R Markdown so that it prints the table instead of the code. This is how the result looks when I knit it.
I expect there must be some setup option that prints the table out, but I couldn't find the right one. My class assignment requires students to use this code; I did find alternatives like knitr::kable, but I would like to use the given code for this assignment.
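From the knitr documentation I gather that such options go in the chunk header. A sketch of the kind of thing I suspect is needed, since the function cats out raw LaTeX (`results='asis'` is only my guess, and I'm not sure how to combine it with the setup above):
```{r, results='asis', message=FALSE, warning=FALSE, error=FALSE}
jhp_report(reg.parallel, title = "Result", label = "tab:D", dep.var.labels = "$y$")
```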
Thank you in advance!
I'm using a function that calculates the length of linestring per grid cell by ID and stores the results in a list, converts each element of the list into a RasterLayer, turns that list into a RasterStack, and averages all layers into a single raster.
#function
build_length_raster <- function(one_df) {
  # intersection lengths per grid cell, split by sub_id
  intersect_list <- by(
    one_df,
    one_df$sub_id,
    function(subid_df) sf::st_intersection(grid2, subid_df) %>%
      dplyr::mutate(length = as.numeric(sf::st_length(.))) %>%
      sf::st_drop_geometry()
  )
  # join back onto the full grid so every cell is present, then NA -> 0
  list_length_grid <- purrr::map(intersect_list, function(x)
    x %>% dplyr::left_join(x = grid2, by = "cell", copy = TRUE) %>%
      dplyr::mutate(length = length) %>%
      dplyr::mutate_if(is.numeric, coalesce, 0)
  )
  # rasterize the per-sub_id grids
  list_length_raster <- purrr::map(list_length_grid, function(x)
    raster::rasterize(x, r, field = "length", na.rm = FALSE, background = 0)
  )
  list_length_raster2 <- unlist(list_length_raster, recursive = FALSE)
  raster_stack <- raster::stack(list_length_raster2)
  # mean across all layers -> single raster
  raster_mean <- raster::stackApply(
    raster_stack,
    indices = rep(1, nlayers(raster_stack)),
    fun = "mean", na.rm = TRUE)
  raster_mean
}
The function includes a step where, in order for the grid resulting from st_intersection() to have the same number of cells as the original, I use left_join() (by the "cell" column). Then I use mutate() to replace the NAs with 0. When I run the function's steps on one dataframe from the list, it works perfectly, but when I put it inside map() to apply it to the whole list, I get this error, which seems to point at the dplyr functions:
final_list <- purrr::map(mylist, build_length_raster)
> rlang::last_error()
<error/rlang_error>
Join columns must be present in data.
x Problem with `cell`.
Backtrace:
1. purrr::map(mylist, build_length_raster)
15. dplyr:::left_join.data.frame(., x = grid, by = "cell", copy = T)
16. dplyr:::join_mutate(...)
17. dplyr:::join_cols(...)
18. dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
19. dplyr:::check_join_vars(by$y, y_names)
Run `rlang::last_trace()` to see the full context.
Is there a way to solve this problem?
MYDATA example
library(tidyverse)
library(sf)
library(purrr)
library(raster)
#data example
id <- c("844", "844", "844", "844", "844","844", "844", "844", "844", "844",
"844", "844", "845", "845", "845", "845", "845","845", "845", "845",
"845","845", "845", "845")
sub_id <- c("2017_844_1", "2017_844_1", "2017_844_1", "2017_844_1", "2017_844_2",
"2017_844_2", "2017_844_2", "2017_844_2", "2017_844_3", "2017_844_3",
"2017_844_3", "2017_844_3", "2017_845_1", "2017_845_1", "2017_845_1",
"2017_845_1", "2017_845_2","2017_845_2", "2017_845_2", "2017_845_2",
"2017_845_3","2017_845_3", "2017_845_3", "2017_845_3")
lat <- c(-30.6456, -29.5648, -27.6667, -31.5587, -30.6934, -29.3147, -23.0538,
-26.5877, -26.6923, -23.40865, -23.1143, -23.28331, -31.6456, -24.5648,
-27.6867, -31.4587, -30.6784, -28.3447, -23.0466, -27.5877, -26.8524,
-23.8855, -24.1143, -23.5874)
long <- c(-50.4879, -49.8715, -51.8716, -50.4456, -50.9842, -51.9787, -41.2343,
-40.2859, -40.19599, -41.64302, -41.58042, -41.55057, -50.4576, -48.8715,
-51.4566, -51.4456, -50.4477, -50.9937, -41.4789, -41.3859, -40.2536,
-41.6502, -40.5442, -41.4057)
df <- tibble(id = as.factor(id), sub_id = as.factor(sub_id), lat, long)
#converting to sf
df.sf <- df %>%
sf::st_as_sf(coords = c("long", "lat"), crs = 4326)
#creating grid
xy <- sf::st_coordinates(df.sf)
grid = sf::st_make_grid(sf::st_bbox(df.sf),
cellsize = .1, square = FALSE) %>%
sf::st_as_sf()
#creating raster
r <- raster::raster(grid, res=0.1)
#return grid because raster function changes number of cells
grid2 <- rasterToPolygons(r, na.rm=F) %>%
st_as_sf() %>% mutate(cell=1:ncell(r))
#creating linestring to each sub_id
df.line <- df.sf %>%
dplyr::group_by(sub_id, id) %>%
dplyr::summarize() %>%
sf::st_cast("LINESTRING")
#creating ID list
mylist <- split(df.line, df.line$id)
#separating one dataframe of list to test function
one_df <- df.line[df.line$id=="844",]
one_df$id <- droplevels(one_df$id)
one_df$sub_id <- droplevels(one_df$sub_id)
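For reference, with these objects in place, the single-dataframe test that works is simply this (assuming the function wrapper above is defined rather than run line by line):
one_raster <- build_length_raster(one_df)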
The specific error is caused by intersect_list containing empty items, which cannot be joined: being empty, they have no columns to join by. If you modify the map call to use only the non-empty items of intersect_list, you will not get that error.
As you noted in the comments, removing the empty list entries with keep(intersect_list, ~ !is.null(.)) before mapping left_join onto the list items will fix the error.
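A minimal sketch of that fix inside the function, placed just before the join step (assuming, as here, that the empty entries come back from by() as NULL):
# drop the empty sub_id entries before joining
intersect_list <- purrr::keep(intersect_list, ~ !is.null(.))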
However, I don't think this is the most elegant way to solve this problem. I might misunderstand what the goal is, but if it's to produce a raster from the total length of lines within each grid cell, I think a simpler approach without using purrr might work.
This is not exactly the same as your output, but I'm keeping it simple to illustrate an alternative approach. Here is the sum of the lengths in each cell as a stars object (similar to raster, but it plays better with the tidyverse and sf).
I'm starting off from your objects one_df and grid. One small addition: grid needs a cell index for the join further down (st_make_grid() does not create one):
# Add a cell id so lengths can be joined back onto the grid:
grid <- grid %>% mutate(cell = row_number())
# Turn multiple lines into single MULTILINESTRING:
one_df %>%
  st_union() ->
  union_df
# Intersection of each grid cell with the MULTILINESTRING geometry:
grid %>%
st_intersection(union_df) ->
grid_lines
# Get lengths:
grid_lines %>%
mutate(length = st_length(x)) %>%
st_drop_geometry() ->
grid_lengths
# Join the calculated lengths back with the spatial grid,
# most of which will have NA for length
grid %>%
left_join(grid_lengths, by = "cell") ->
grid_with_lengths
# Rasterize the length field of the grid
grid_with_lengths %>%
dplyr::select(length) %>%
stars::st_rasterize() ->
length_stars
length_stars %>% mapview::mapview()
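If you do need the per-id average rather than a single sum, one way (my own sketch, untested on your full data) is to wrap the steps above in a helper, map it over mylist, and average the resulting layers:
# Sketch only: assumes grid has the cell column added above;
# the helper name length_raster_for_id is mine.
length_raster_for_id <- function(one_df) {
  lengths <- grid %>%
    st_intersection(st_union(one_df)) %>%
    mutate(length = as.numeric(st_length(x))) %>%
    st_drop_geometry()
  grid %>%
    left_join(lengths, by = "cell") %>%
    mutate(length = coalesce(length, 0)) %>%
    dplyr::select(length) %>%
    stars::st_rasterize()
}
stars_list <- purrr::map(mylist, length_raster_for_id)
# coerce each stars object to a RasterLayer and average across ids
raster_mean <- raster::calc(
  raster::stack(purrr::map(stars_list, ~ as(.x, "Raster"))),
  mean
)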
I'm estimating a fixed-effects probit model using the bife package in R. I'm trying to extract the output into something I can use with either stargazer or texreg, so I can include the results in a paper written in R Markdown that generates LaTeX. I'm aware I can manually extract the coefficients, standard errors, etc., but I'm wondering whether there is a more efficient way to coerce this object into something that works with either package.
Here's a reproducible example:
install.packages("bife")
library(bife)
data("iris")
iris$big <- ifelse(iris$Sepal.Length > median(iris$Sepal.Length),1,0)
output <- bife(big ~ Sepal.Width + Petal.Length | Species, data=iris, "logit")
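To be explicit, the manual route I'd like to avoid looks roughly like this (a sketch; I'm assuming the coefficient matrix sits in the cm element of the summary object):
s <- summary(output)
co <- s$cm[, 1]  # coefficients
se <- s$cm[, 2]  # standard errors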
I think I found an alternative solution, even if it probably comes too late.
Basically, I went to the repository of the texreg package and found this function:
extract.bife <- function(model,
                         include.loglik = TRUE,
                         include.deviance = TRUE,
                         include.nobs = TRUE,
                         ...) {
  s <- summary(model)
  coefficient.names <- rownames(s$cm)
  co <- s$cm[, 1]
  se <- s$cm[, 2]
  pval <- s$cm[, 4]
  gof <- numeric()
  gof.names <- character()
  gof.decimal <- logical()
  if (include.loglik == TRUE) {
    lik <- logLik(model)
    gof <- c(gof, lik)
    gof.names <- c(gof.names, "Log Likelihood")
    gof.decimal <- c(gof.decimal, TRUE)
  }
  if (include.deviance == TRUE) {
    gof <- c(gof, deviance(model))
    gof.names <- c(gof.names, "Deviance")
    gof.decimal <- c(gof.decimal, TRUE)
  }
  if (include.nobs == TRUE) {
    n <- s$nobs["nobs"]
    gof <- c(gof, n)
    gof.names <- c(gof.names, "Num. obs.")
    gof.decimal <- c(gof.decimal, FALSE)
  }
  tr <- createTexreg(
    coef.names = coefficient.names,
    coef = co,
    se = se,
    pvalues = pval,
    gof.names = gof.names,
    gof = gof,
    gof.decimal = gof.decimal
  )
  return(tr)
}
So for your example, just apply it to your model and use the texreg() function, and you should get a LaTeX-like output:
library(texreg)
tr <- extract.bife(output)
texreg(tr)
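Optionally, you can also register the extractor with texreg's generic so that texreg(output) dispatches automatically. This follows texreg's documented extension mechanism, though I have not tested it with bife specifically:
# register extract.bife for objects of class "bife" from the bife package
setMethod("extract", signature = className("bife", "bife"),
          definition = extract.bife)
texreg(output)  # now dispatches to extract.bife directly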
I hope it will help!
Best
I am collecting tweets with the twitteR package and get many duplicates. This code works fine:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL, retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- unique(mydata, incomparables = F, nmax = NA)
The problem is that it actually doesn't delete any duplicates, because it doesn't recognise them as such. The duplicate tweets typically contain shortened URLs that differ by 1 or 2 digits. So I tried to clean the tweets of URLs with this code:
tweets <- searchTwitter(keyword, n=500, lang="en", since=NULL, until=NULL,
                        retryOnRateLimit=100)
mydata <- sapply(tweets, function(x) x$getText())
mydata <- data.frame(mydata, stringsAsFactors = FALSE)  # was `tweetsText`, which is undefined here
names(mydata) <- c('words')
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
removeURL <- function(x) gsub("https[[:alnum:]]*", "", x)
mydata$words <- removeURL(mydata$words)
mydata$words <- unique(mydata$words, incomparables = F, nmax = NA)
Now I get the error message:
Error in `$<-.data.frame`(`*tmp*`, "words", value = c("Tripping around #DisneySprings.....) : replacement has 295 rows, data has 300
Advice? Thanks!
Your error is easily reproducible:
mydata <- data.frame(list(w = c(0, 1, 0, 1)))
mydata$words <- c(0, 1, 1)
# Error in `$<-.data.frame`(`*tmp*`, "words", value = c(0, 1, 1)) :
# replacement has 3 rows, data has 4
This just means that you need to assign a vector with as many elements as the data frame has rows.
To filter out duplicate values you need to change the last line of your code to:
res <- mydata[!duplicated(mydata$words), ]
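As an aside (my own suggestion, separate from the fix above): the pattern "http[[:alnum:]]*" only removes the http/https token itself and leaves the ://t.co/... remainder behind, so near-duplicate tweets can still differ. A single pattern that strips whole URLs before deduplicating might look like this:
# remove entire URLs (scheme plus everything up to the next whitespace)
removeURL <- function(x) gsub("https?://\\S+", "", x)
mydata$words <- removeURL(mydata$words)
res <- mydata[!duplicated(mydata$words), ]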
I have a dat file with data like this:
<Avg. Price>$103
<URL>http://www.tripadvisor.com/ShowUserReviews-g60878-d100506-r21086254-
<Author>expressoparking
etc.
Here's the script I've made to identify each row with an <Author> tag, delete that tag, and lastly put the rest of that row into a single column in a data frame.
y <- read.delim("hotel_100506.dat")
author <- grep("<Author>*", y$v1)
data_author <- y[author,]
author2 <- gsub("<Author>", "", author)
dataPractice <- data.frame(data_author)
I know that my data_author variable has the correct data, but the data.frame method doesn't bring any of it over. What's going on here?
Okey doke - Here's the solution to putting the author tag line's information in a dataframe:
x <- read.delim("hotel_100505.dat", header = F)
# rows whose text contains the <Author> tag
Author <- grep("<Author>", x$V1)
data_Author <- x[Author,]
# strip the tag, keeping only the author name
data_Author2 <- gsub("<Author>","",data_Author)
data <- data.frame(data_Author2)
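Looking back, I think my original attempt failed for two reasons: read.delim() was called without header = F, so the first data line was eaten as the header and y$v1 didn't match the actual column name, and gsub() was applied to the row indices returned by grep() rather than to the matched text:
# original (wrong): gsub on grep()'s integer indices, i.e. row numbers
author2 <- gsub("<Author>", "", author)
# fixed: gsub on the matched rows themselves
data_Author2 <- gsub("<Author>", "", data_Author)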