I'm creating PDF versions of HTML slides generated from R Markdown files, and I've come across some puzzling behaviour. When I run pagedown::chrome_print on an html file whose output is specified as xaringan::moon_reader, the operation fails with a timeout message:
Error in force(expr) : Failed to generate output in 30 seconds (timeout)
Here is an example of a call to convert such a xaringan html file which produces this timeout error on my machine:
pagedown::chrome_print("https://stat540-ubc.github.io/lectures/lectures_2020/lect09_MultipleTesting.html")
The Rmd source for this html is located here. If I increase the timeout argument of chrome_print to something very large (several thousand seconds), the operation appears to take a lot of resources (the computer fans turn on and the machine becomes hot), but the pdf output is eventually produced. However, if I change the output format in the Rmd from xaringan::moon_reader to slidy_presentation, chrome_print runs successfully on the resulting html and produces a pdf in just a few seconds (with no change to the default timeout argument).
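For reference, this is the kind of call I mean when increasing the timeout (the value of 600 is just an arbitrary example; the default is 30):

pagedown::chrome_print(
  "https://stat540-ubc.github.io/lectures/lectures_2020/lect09_MultipleTesting.html",
  timeout = 600
)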
I have the same issue with other slide decks that I have created from a similar template to the one linked above, but it doesn't happen with every xaringan html file. For example, I am able to use chrome_print to successfully convert this xaringan html file to pdf (with no change to the default timeout argument):
pagedown::chrome_print("https://annakrystalli.me/talks/xaringan/xaringan.html")
Other things I tried:
I installed decktape and used the xaringan::decktape() function on the xaringan html file, which also produced a timeout error (a sketch of the call is shown after this list). I'm not sure how to increase the allowed time with this method, though, so I don't know whether it would eventually work if given enough time.
I tried using the latest versions of both Google Chrome and Chromium with the chrome_print function and got the same results as described above. I'm on macOS 10.15.5.
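For completeness, this is roughly the decktape call I tried (a sketch from memory; the local file names are just placeholders for the slide deck linked above, and I'm not aware of an option on this wrapper that increases the allowed time):

xaringan::decktape(
  "lect09_MultipleTesting.html",  # local copy of the slide deck
  "lect09_MultipleTesting.pdf"    # output pdf
)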
I would like to stick with xaringan html slides since they have some features I prefer. However, the current method of converting them to pdf is not sustainable for me, since I will need to convert many similar html files and update them periodically. If anyone has come across this, or can suggest what might be causing this extreme slowdown when converting my xaringan html files to pdf, I'd appreciate your input.
Related
When my colleague and I run the same Rmd file on our respective computers, we get different .tex files. This is a problem because the tex file my computer produces doesn't compile. Apparently there is some invisible local setting that differs between our computers, but what could it be? I updated all the R packages I use, but to no avail.
The Rmd file starts with
output:
bookdown::pdf_document2:
keep_tex: yes
toc: false
And both of us compile it by simply hitting the Knit button in RStudio.
Noticeable differences in the tex-files are:
extra line breaks in different places
a line that is commented out in the Rmd file (<!-- blabla -->) appears in my tex file but not in his, while some other commented-out lines appear in neither (as they should)
at the end of lines in tables there is a \strut inserted in my tex file but not in his
section heads read \hypertarget{blabla} in his file but not in mine
For none of these differences can I find any place in the Rmd file where a choice about them is made; apparently some local settings file I am not aware of is used in the process?
Please let me know if you need more information.
EDIT: we found a partial answer and a full solution, but I am still interested in what the underlying mechanism is. It turned out that I was using an older version of RStudio. (It took me a long time to find that out because the check-for-updates tool in RStudio kept telling me that I was using the newest version, but that is a separate issue.) Using the same version of RStudio, we get the same result.
The translation from Rmd to tex has multiple steps:
All the code chunks are extracted and executed via knitr, resulting in an md file.
The md file is translated to tex via pandoc.
For most people, pandoc comes bundled with RStudio, so when you updated RStudio you also got a more recent pandoc. You can check which pandoc version is being used with rmarkdown::pandoc_version().
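For example, both machines could run a quick comparison like the following (a minimal sketch; add any other packages you suspect):

rmarkdown::pandoc_version()          # the pandoc that rmarkdown will use (bundled with RStudio for most people)
rmarkdown::pandoc_available("2.0")   # TRUE if a pandoc of at least this version was found
packageVersion("rmarkdown")
packageVersion("knitr")
packageVersion("bookdown")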
I need to read a PDF's page count rather cheaply, so my user can select specific pages (and load them in higher detail).
The only way I can see to do this with the Magick++ API is the STL function readImages. This loads all the pages of the PDF as Magick::Images and gets quite expensive for large PDF documents (on the order of 50 pages takes about 15 s on my machine).
I did read a post on ImageMagick's forums about the ReadOptions class (not documented at the time of writing) that you can pass to readImages to read the images at a lower density, but this still takes too long (about 10 s). None of the other ReadOptions settings make a big difference to speed.
Here is the code I have at the moment:
std::vector<Magick::Image> PDFImageList;
Magick::ReadOptions readOptions;
// Ask for the smallest possible rendering to keep the rasterization cheap.
readOptions.density(Magick::Geometry(2,2));
readOptions.size(Magick::Geometry(1,1));
readOptions.depth(8);
// This call takes too long: every page of the PDF is still rasterized.
Magick::readImages(&PDFImageList, m_pathToPDFFile, readOptions);
int numberOfPages = PDFImageList.size();
I have also tried the Magick::Image::ping() method, but I can't find any data it returns that relates to the page count.
Is there any other attribute or undocumented Magick++ feature I can try in order to get the page count cheaply?
Using another question's answer and Qt's process class, the program now runs the following on the command line:
gs -q -dNODISPLAY -c "(input.pdf) (r) file runpdfbegin pdfpagecount = quit"
This returns the page count as the last line of standard output. Since the gs executable is required by ImageMagick's PDF reading functionality anyway, I'm happy with this solution. It is also quite fast (less than a second for the ~50-page PDF).
I need to detect when a web page, along with all of its embedded content (URLs, images, audio, video, etc.), has been completely downloaded. I debugged docshell/base/nsDocShell.cpp --> nsDocShell::LoadURI(). It gives me a clear indication that the page load is complete, but the problem is that I can still see the download in progress after this API has indicated it is over. Can someone help with this? Am I looking/digging in the wrong location? I don't want to write an add-on or extension; I only want to dig into the base source code and extract the information.
Well, there is no single "all downloaded" state. After "everything" is downloaded, the parsers might still be active, causing more downloads as more image and script tags are parsed, and so on. The parser might even be paused, e.g. when it encounters a synchronous script tag, which blocks parsing until the script has been fetched and executed.
The nearest "all complete" state is likely when document.readyState is set to "complete" and the load/error event for the document is dispatched.
This happens (IIRC) in nsDocumentViewer::LoadComplete.
Even then, more external resources might be loaded as a result of scripts inserting new tags, CSS rules being triggered, XMLHttpRequests, and so on. For that reason, modern web sites are never really "fully loaded".
Maybe the title of my question is awful, but I couldn't figure out a better way to frame it. The problem is that I have a Silverlight web app that does some processing and generates an Excel file as output. The Excel generation code uses the OpenXML format to create the various XML parts and packages, and I compress the generated file using System.Packaging.CompressionOptions. Now, when the browser (IE 9) shows the download options box, if I click Open to open the file in Excel and then do a Save As, it saves the file with a further reduced size; whereas if I hit Save directly in the download box, it saves the file with whatever size it was created with.
Any ideas why these 2 ways of saving the same file result in different sizes?
Cheers
Depending on how you used the OpenXML library, there might be some inefficiencies or errors. Resaving the file in Excel will fix any duplicate formatting, update the metadata (possibly reducing it), and fix any validation errors. I encourage you to get the Open XML SDK 2.0 Productivity Tool provided with the OpenXML SDK to check for validation errors and to better understand where other inefficiencies might lie. It is also possible to resave the file automatically by driving Excel through Interop (from C#, anyway).
I've hit a weird snag developing a report using ColdFusion 8. The report prints a number of large HTML tables, and the customer wants them formatted in such a way that when they are printed out the user gets two tables per page.
So it wasn't hard to make a page break by using a
<p style="page-break-before: always"></p>
However, while I got the desired effect in the development ColdFusion environment on my laptop, I get a different result when printing reports generated on the test web server: there, the reports print with a much larger font, so the second table spills over onto a second page.
Has anyone else experienced this, or does anyone have a recommendation for how to tackle it?
When a report needs to be printed, I would recommend using CFDOCUMENT to create a PDF. You get much more control over the final output, including changing the page orientation, which is great for wide tables. It honors a lot of HTML and CSS, including the page-break-before style, so you shouldn't have to do too much conversion to use it beyond wrapping the report area in CFDOCUMENT tags. It has been available since CF7, so it should work for you on CF8.