R 3.4 and mclapply strange behavior - is this a bug?


I am not sure whether this is a bug, so I would rather post it here before filing a report.
After upgrading from R 3.3.3 to R 3.4, I get the following message with mclapply:
Assertion failure at kmp_runtime.cpp(6480): __kmp_thread_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6480).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see
Note that this behavior was not present in R 3.3.3 on the same machine, where the whole batch ran without any errors. Also note that I tried all possible values of enableJIT(X) with the same result.
The batch to (hopefully) reproduce it is here:
library(data.table)
library(parallel)  # mclapply() and detectCores()
load(file = "z.RData")
firmnames <- as.list(unique(z[, firm_name]))
f <- function(x, d = z) {
  tmp <- d[dealid %in% unique(d[firm_name %in% x, dealid]),
           .(firm_name, firm_type, dealid, investment_year, investment_yearQ, round_number)][firm_name != x, ]
  tmpY <- tmp[, .N, by = .(firm_type, investment_year, round_number)]
  tmpQ <- tmp[, .N, by = .(firm_type, investment_yearQ, round_number)]
  return(list(
    firm_name = x,
    by_year = tmpY,
    by_quarter = tmpQ,
    allroundsY = tmpY[, sum(N), by = .(firm_type, investment_year)],
    allroundsQ = tmpQ[, sum(N), by = .(firm_type, investment_yearQ)]))
}
r <- mclapply(firmnames, f, mc.cores = detectCores(), mc.preschedule = FALSE)
The data for the reproducible example is here:
https://www.dropbox.com/s/2enoeapu7jgcxwd/z.Rdata?dl=0
The sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel compiler stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.4 numbers_0.6-6 microbenchmark_1.4-2.1 zoo_1.8-0 doParallel_1.0.10 iterators_1.0.8
[7] foreach_1.4.3 RSclient_0.7-3 stringi_1.1.5 stringr_1.2.0 lubridate_1.6.0 plyr_1.8.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 lattice_0.20-35 codetools_0.2-15 grid_3.4.0 gtable_0.2.0 magrittr_1.5 scales_0.4.1 ggplot2_2.2.1
[9] lazyeval_0.2.0 tools_3.4.0 munsell_0.4.3 colorspace_1.3-2 tibble_1.3.0
Thank you in advance for help/hints,
Yan
EDIT: It turns out that this code does not actually reproduce the issue. However, I am leaving it here, on the advice of a data.table developer, in case someone else finds it helpful.
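mclapply() does not stop when a forked worker fails: each failed element comes back as an object of class "try-error", and the call only warns. When chasing an intermittent failure like the one above, it can help to see exactly which elements failed. A minimal base-R sketch, assuming a Unix-alike such as macOS or Linux (forking is not available on Windows) and using a hypothetical worker work() in place of f():

```r
library(parallel)

# Hypothetical worker standing in for f(): it fails on one input on purpose.
work <- function(x) {
  if (x == 3L) stop("bad input: ", x)
  x^2
}

# mc.preschedule = FALSE forks a separate job per element, as in the
# original call; 2 cores assumes a Unix-alike with forking support.
res <- suppressWarnings(
  mclapply(1:4, work, mc.cores = 2L, mc.preschedule = FALSE)
)

# A failed element is a "try-error" object. NULL is counted as a failure too,
# on the assumption that a child killed outright (e.g. by an OpenMP abort)
# delivers no result; this would misflag a worker that legitimately
# returns NULL.
failed <- vapply(res, function(r) is.null(r) || inherits(r, "try-error"),
                 logical(1))
which(failed)
```

Running the real f() this way and inspecting which(failed) would show whether the failures cluster on particular firm names or appear at random.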

Related

AWS_ACCESS_KEY_ID: Missing access token for source AWS

I use sits in R on Windows 10 to retrieve satellite images from Amazon Web Services. It worked before, but now I get this error message:
bfmn <- sits_cube(
source="AWS",
collection="SENTINEL-S2-L2A",
dir="C:/temp/final2",
bands = c("B02","B03","B04","B05","B08","B11","B12","SCL"),
start_date = "2019-03-01",
end_date = "2019-06-04",
roi=AOI,
delim = "_",
multicores = 2,
progress = TRUE)
Error: sits_cube: AWS_ACCESS_KEY_ID: Missing access token for source AWS (nzchar(Sys.getenv(x)) is not TRUE)
What should I set AWS_ACCESS_KEY_ID to? Does this mean I have to log in to be able to use the service?
Thanks
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C LC_TIME=German_Germany.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RPostgres_1.4.4
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 here_1.0.1 png_0.1-8 withr_2.5.0 rprojroot_2.0.3 rappdirs_0.3.3 grid_4.2.2 lifecycle_1.0.3
[10] jsonlite_1.8.4 DBI_1.1.3 rlang_1.0.6 cli_3.4.1 rstudioapi_0.14 blob_1.2.3 Matrix_1.5-1 ellipsis_0.3.2 vctrs_0.5.1
[19] reticulate_1.27 tools_4.2.2 bit64_4.0.5 bit_4.0.5 hms_1.1.2 compiler_4.2.2 pkgconfig_2.0.3

It seems to me that sits_cube() checks the environment variable AWS_ACCESS_KEY_ID and stops because it is empty.
Double-check that the variable is set, e.g. with echo $AWS_ACCESS_KEY_ID on Linux/macOS (or echo %AWS_ACCESS_KEY_ID% in the Windows command prompt).
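The same check can be done from R itself, which also works on Windows, where echo $VAR does not. A minimal sketch; missing_env() is a hypothetical helper, and including AWS_SECRET_ACCESS_KEY is an assumption based on how AWS credentials are usually paired (the error above only names AWS_ACCESS_KEY_ID):

```r
# Hypothetical helper: return the names of variables that are unset or empty,
# i.e. the ones for which nzchar(Sys.getenv(x)) would fail.
missing_env <- function(vars) {
  vars[!nzchar(Sys.getenv(vars))]
}

# AWS_SECRET_ACCESS_KEY is included on the assumption that sits needs it too.
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
missing_env(aws_vars)

# Set the keys for the current R session only. The values here are
# placeholders; use the keys from your own AWS account.
Sys.setenv(
  AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY_ID",
  AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_ACCESS_KEY"
)
missing_env(aws_vars)  # character(0) once both are set
```

Values set with Sys.setenv() last only for the current session; to keep them, the same KEY=value lines can go in a ~/.Renviron file, which R reads at startup.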

Is it possible to save Kable tables to PDF with macOS Catalina?

I use the kableExtra package all the time to create tables in R Markdown and save them to PDF. However, today I upgraded to macOS Catalina, and now when I export to PDF I cannot open the file: Adobe Reader reports "insufficient data for an image". R Markdown shows no error message and the PDF appears in my files, but I cannot open it in Adobe Reader. If I open it with another application (e.g., Preview), the resolution is very poor, which is not typical of a PDF. Has anyone else run into this problem, and is there a solution? Many thanks for your help!
My code to reproduce the error is below:
library(knitr)
library(kableExtra)
mtcars %>%
  kable() %>%
  kable_styling() %>%
  save_kable("test.pdf")
And session info is:
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] metafor_2.1-0 Matrix_1.2-18 MASS_7.3-51.5 RColorBrewer_1.1-2 tidyr_1.0.2 knitr_1.28
[7] sandwich_2.5-1 plotROC_2.2.1 ggROC_1.0 pROC_1.16.1 dplyr_0.8.4 plyr_1.8.5
[13] foreign_0.8-75 gridExtra_2.3 kableExtra_1.1.0 scales_1.1.0 ggplot2_3.2.1 psych_1.9.12.31
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 lattice_0.20-38 ps_1.3.2 zoo_1.8-7 utf8_1.1.4 assertthat_0.2.1
[7] digest_0.6.24 R6_2.4.1 evaluate_0.14 httr_1.4.1 highr_0.8 pillar_1.4.3
[13] rlang_0.4.4 lazyeval_0.2.2 rstudioapi_0.11 callr_3.4.2 magick_2.3 rmarkdown_2.1
[19] labeling_0.3 webshot_0.5.2 readr_1.3.1 stringr_1.4.0 munsell_0.5.0 compiler_3.6.2
[25] xfun_0.12 pkgconfig_2.0.3 mnormt_1.5-6 htmltools_0.4.0 tidyselect_1.0.0 tibble_2.1.3
[31] fansi_0.4.1 viridisLite_0.3.0 crayon_1.3.4 withr_2.1.2 grid_3.6.2 nlme_3.1-144
[37] jsonlite_1.6.1 gtable_0.3.0 lifecycle_0.1.0 magrittr_1.5 cli_2.0.1 stringi_1.4.6
[43] farver_2.0.3 xml2_1.2.2 vctrs_0.2.2 tools_3.6.2 glue_1.3.1 purrr_0.3.3
[49] hms_0.5.3 processx_3.4.2 parallel_3.6.2 yaml_2.2.1 colorspace_1.4-1 rvest_0.3.5

Render to "pdf_document" output format in rmarkdown getting stuck on knitr asis_output function

I'm new to R Markdown (and Markdown in general). I've inherited some code that works well with the html_document output format but not with pdf_document. It seems to get stuck on the knitr asis_output function in the .Rmd file. When I comment out the chunks containing that function, it renders to PDF without any problem. Here's some troubleshooting I've tried:
xfun::session_info('rmarkdown')
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.1, RStudio 1.2.1335
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
Locale: en_CA.UTF-8 / en_CA.UTF-8 / en_CA.UTF-8 / C / en_CA.UTF-8 / en_CA.UTF-8
:Package version:
base64enc_0.1.3 digest_0.6.20 evaluate_0.14 glue_1.3.1 graphics_3.6.1 grDevices_3.6.1 highr_0.8
htmltools_0.4.0 jsonlite_1.6 knitr_1.25 magrittr_1.5 markdown_1.1 methods_3.6.1 mime_0.7
Rcpp_1.0.2 rlang_0.4.0 rmarkdown_1.16 stats_3.6.1 stringi_1.4.3 stringr_1.4.0 tinytex_0.17.1
tools_3.6.1 utils_3.6.1 xfun_0.10 yaml_2.2.0
Pandoc version: 2.7.3
Sys.getenv('PATH')
[1] "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Library/TeX/texbin:/opt/X11/bin"
tinytex::tinytex_root()
[1] "/usr/local/texlive/2019"
(tinytex::tlmgr_path())
tlmgr path add
add_link_dir_dir: /usr/local/share/info/dir exists; not making symlink.
add_link_dir_dir: destination /usr/local/share/man/man5 not writable, no links from /usr/local/texlive/2019/texmf-dist/doc/man/man5.
tlmgr: An error has occurred. See above messages. Exiting.
add of symlinks had 1 error(s), see messages above.
[1] 6
So maybe the problem is a path issue? In that case I have no idea how to fix it. Or should I be using an alternative to the asis_output function? Any help is much appreciated. Here are the relevant bits of my code:
In the R script:
id <- 44
rmarkdown::render('mymarkdown.Rmd',
output_format = "pdf_document",
output_file = paste("report_", id,".pdf", sep=''),
output_dir = '/Users/myname/Documents/test')
In the Rmd file:
---
title: "Monitoring Activity Summary Report"
mode: selfcontained
date: "November 2019"
output:
  pdf_document: default
  html_document: default
self_contained: yes
---
[some code chunks...]
[then these code chunks that get stuck only for "pdf_document"...]
```{r setup_Samp1a, echo=FALSE}
sampling_1 <- !is.na(sampling_unique[1])
```
```{r conditional block, eval = sampling_1}
asis_output("### 3.1 Sampling 1\\n") # Header that is only shown if sampling_1 == TRUE
```
The error message:
! Undefined control sequence.
<argument> 3.1 Sampling 1\n
Error: Failed to compile /Users/myname/Documents/test/report_44.tex.
See https://yihui.name/tinytex/r/#debugging for debugging tips. See
report_44.log for more info.
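One plausible explanation, offered as a guess rather than a confirmed diagnosis: in R source code, "\\n" is not a newline but two characters, a backslash followed by n. If asis_output() passes that text through to Pandoc, Pandoc may treat \n as a raw LaTeX command, which the HTML output drops but the PDF output hands to LaTeX, matching the "Undefined control sequence" error with 3.1 Sampling 1\n shown above. A base-R sketch of the string difference:

```r
# "\\n" in R source is two characters: a backslash and the letter n.
literal <- "### 3.1 Sampling 1\\n"
# "\n" is a single newline character.
newline <- "### 3.1 Sampling 1\n"

nchar(literal) - nchar(newline)     # 1: the literal version is one char longer
grepl("\\", literal, fixed = TRUE)  # TRUE: contains a backslash
grepl("\\", newline, fixed = TRUE)  # FALSE: no backslash at all
```

If that is the cause, writing asis_output("### 3.1 Sampling 1\n") with a single backslash would emit a real newline instead.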

Failed to detect version from SPARK_HOME or SPARK_HOME_VERSION

I'm trying to follow a tutorial for using spark from RStudio on DSX, but I'm running into the following error:
> library(sparklyr)
> sc <- spark_connect(master = "CS-DSX")
Error in spark_version_from_home(spark_home, default = spark_version) :
Failed to detect version from SPARK_HOME or SPARK_HOME_VERSION. Try passing the spark version explicitly.
I took the above code snippet from the Connect to Spark dialog in RStudio.
So I took a look at SPARK_HOME:
> Sys.getenv("SPARK_HOME")
[1] "/opt/spark"
OK, let's check that the directory exists:
> dir("/opt")
[1] "ibm"
I'm guessing this is the cause of the problem?
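The same check can be scripted before calling spark_connect(). A minimal base-R sketch; check_spark_home() is a hypothetical helper, not part of sparklyr:

```r
# Hypothetical helper: report whether SPARK_HOME points at a real directory.
check_spark_home <- function(path = Sys.getenv("SPARK_HOME")) {
  if (!nzchar(path)) {
    return("SPARK_HOME is not set")
  }
  if (!dir.exists(path)) {
    return(paste("SPARK_HOME points to a missing directory:", path))
  }
  "SPARK_HOME looks OK"
}

check_spark_home()
```

With SPARK_HOME set to "/opt/spark" and only "ibm" present under /opt, as above, this would report the missing directory.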
NOTE: there are a few similar questions on stackoverflow, but none of them are about IBM's Data Science Experience (DSX).
Update 1:
I tried the following:
> sc <- spark_connect(config = "CS-DSX")
Error in config$spark.master : $ operator is invalid for atomic vectors
Update 2:
An extract from my config.yml. Note that I have many more Spark services in my config.yml; I've just pasted the first one:
default:
  method: "shell"
CS-DSX:
  method: "bluemix"
  spark.master: "spark.bluemix.net"
  spark.instance.id: "7a4089bf-3594-4fdf-8dd1-7e9fd7607be5"
  tenant.id: "sdd1-7e9fd7607be53e-39ca506ba762"
  tenant.secret: "xxxxxx"
  hsui.url: "https://cdsx.ng.bluemix.net"
Note that my config.yml was generated for me.
Update 3:
My .Rprofile looks like this:
# load sparklyr library
library(sparklyr)
# setup SPARK_HOME
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
Sys.setenv(SPARK_HOME = "/opt/spark")
}
# setup SparkaaS instances
options(rstudio.spark.connections = c("CS-DSX","newspark","cleantest","4jan2017","Apache Spark-4l","Apache Spark-3a","ML SPAAS","Apache Spark-y9","Apache Spark-a8"))
Note that my .Rprofile was generated for me.
Update 4:
I uninstalled sparklyr and restarted the session twice. Next I tried to run:
library(sparklyr)
library(dplyr)
sc <- spark_connect(config = "CS-DSX")
However, the above command hung. I stopped the command and checked the version of sparklyr, which seems to be OK:
> ip <- installed.packages()
> ip[ rownames(ip) == "sparklyr", c(0,1,3) ]
Package Version
"sparklyr" "0.4.36"

If the intent is to connect to the Bluemix Spark service, you cannot use the master parameter: your kernels are defined in the config.yml file, so you should connect with the config parameter instead.
config.yml holds the information for your available kernels (Spark instances), for example:
Apache Spark-ic:
  method: "bluemix"
  spark.master: "spark.bluemix.net"
  spark.instance.id: "41a2e5e9xxxxxx47ef-97b4-b98406426c07"
  tenant.id: "s7b4-b9xxxxxxxx7e8-2c631c8ff999"
  tenant.secret: "XXXXXXXXXX"
  hsui.url: "https://cdsx.ng.bluemix.net"
Please use the config parameter:
sc <- spark_connect(config = "Apache Spark-ic")
as suggested in the tutorial:
http://datascience.ibm.com/blog/access-ibm-analytics-for-apache-spark-from-rstudio/
FYI,
By default, you are connected to the Spark version shown below; I am still working out how to change the version with the config parameter.
> version <- invoke(spark_context(sc), "version")
print(version)
[1] "2.0.2"
Thanks,
Charles.

I had the same issue and fixed it as follows:
Go to C:\Users\USER_NAME\AppData\Local/spark/ and delete everything you find in that directory.
Then, in the R console run:
if (!require(shiny)) install.packages("shiny")
library(shiny)
if (!require(sparklyr)) install.packages("sparklyr")
library(sparklyr)
spark_install()

Using the getwd() function in the output_dir parameter for rmarkdown::render (rmarkdown R package) gives unexpected result

I have a strange scenario in R where the rmarkdown::render() function behaves unexpectedly. Given the following minimal scripts/test.Rmd:
---
title: test
---
```{r test}
plot(1:10)
```
I then have the following R code:
> getwd()
[1] "/projects/test_project"
library('rmarkdown')
rmarkdown::render("scripts/test.Rmd", output_file = 'test.html', output_dir = paste( getwd(), '/', 'reports', sep = '') )
This ends up producing output here:
/projects/test_project/scripts/reports/test.html
Which is unexpected to me because:
> paste(getwd(), '/', 'reports', sep = '')
[1] "/projects/test_project/reports"
I would expect test.html to be generated at /projects/test_project/reports/test.html. Interestingly, when I skip getwd() and pass the path as a character string:
> render("scripts/test.Rmd", output_file = 'test.html', output_dir = "/projects/test_project/reports" )
This will generate the file in the expected location. Any ideas what is happening here?
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin14.1.0 (64-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmarkdown_0.4.2 scales_0.2.4 broom_0.3.5 data.table_1.9.4 gplots_2.15.0 RColorBrewer_1.1-2 reshape2_1.4.1 dplyr_0.3.0.2 ggplot2_1.0.0
[10] stringr_0.6.2 plyr_1.8.1 survival_2.37-7 xtable_1.7-4 fields_7.1 maps_2.3-9 spam_1.0-1 knitr_1.8 argparse_1.0.1
[19] proto_0.3-10 vimcom_1.0-0 setwidth_1.0-3 colorout_1.1-0
loaded via a namespace (and not attached):
[1] assertthat_0.1 bitops_1.0-6 caTools_1.17.1 chron_2.3-45 colorspace_1.2-4 DBI_0.3.1 digest_0.6.8 evaluate_0.5.5 findpython_1.0.1
[10] formatR_1.0 gdata_2.13.3 getopt_1.20.0 gtable_0.1.2 gtools_3.4.1 htmltools_0.2.6 KernSmooth_2.23-13 lazyeval_0.1.10 magrittr_1.5
[19] MASS_7.3-35 munsell_0.4.2 parallel_3.1.2 psych_1.4.8.11 Rcpp_0.11.3 rjson_0.2.15 tidyr_0.2.0 tools_3.1.2 yaml_2.1.13

I am facing the same issue and suspect it is caused by a small bug in the render() function. I have raised an issue on its GitHub page:
https://github.com/rstudio/rmarkdown/issues/416
As a workaround/hack I am using:
require(rmarkdown)
fl = render("awesome.Rmd")
file.copy(fl, getwd())
Hope that helps.
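One plausible explanation, not confirmed against the rmarkdown source: R evaluates function arguments lazily, and render() appears to switch the working directory to the folder of the input file before it first uses output_dir. The getwd() inside the argument would then run in scripts/, which matches the observed /projects/test_project/scripts/reports path. A base-R sketch of the effect, where use_dir_late() is a hypothetical stand-in for render():

```r
# Hypothetical stand-in for a function that, like render() is believed to,
# changes the working directory before it first uses its argument.
use_dir_late <- function(dir) {
  old <- setwd(tempdir())
  on.exit(setwd(old))
  # 'dir' is a promise: the getwd() inside it is evaluated only now,
  # after setwd(), so it sees the new working directory.
  list(arg = dir, wd = getwd())
}

here <- getwd()

# Passing the expression directly: evaluated late, inside the new directory.
late <- use_dir_late(file.path(getwd(), "reports"))

# Computing the value first: evaluated in the caller's working directory.
out_dir <- file.path(getwd(), "reports")
early <- use_dir_late(out_dir)
```

If this is the cause, computing the path before the call, e.g. out_dir <- file.path(getwd(), "reports") followed by render("scripts/test.Rmd", output_file = "test.html", output_dir = out_dir), should put the file in /projects/test_project/reports without the copy step.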
