Thumbnail the first page of a PDF from a stream in GraphicsMagick

I know how to use GraphicsMagick to make a thumbnail of the first page of a PDF if I have a PDF file and am running gm locally. I can just do this:
gm(pdfFileName + "[0]")
  .background("white")
  .flatten()
  .resize(200, 200)
  .write("output.jpg", (err, res) => {
    if (err) console.log(err);
  });
If I have a file called doc.pdf, then passing doc.pdf[0] to gm works beautifully.
But my problem is that I am generating thumbnails in an AWS Lambda function, and the Lambda takes as input data streamed from a source S3 bucket. The relevant slice of my Lambda looks like this:
// Download the image from S3, transform, and upload to a different S3 bucket.
async.waterfall([
  function download(next) {
    s3.getObject({
      Bucket: sourceBucket,
      Key: sourceKey
    },
    next);
  },
  function transform(response, next) {
    gm(response.Body).size(function(err, size) { // <--- gm USED HERE
      // ...
Everything works, but for multipage PDFs, gm generates a thumbnail of the last page of the PDF. How do I get the [0] in there? I did not see a page selector in the gm documentation, as all their examples use filenames, not streams. I believe there should be an API for this, but I have not found one.
(Note: the [0] is really important, not only because the last page of a multipage PDF is sometimes blank, but also because I noticed when running gm on the command line with large PDFs that the [0] returns very quickly, while without the [0] the whole PDF is scanned. On AWS Lambda, it's important to finish quickly to save on resources and avoid timeouts!)

You can use the .selectFrame() method, which is equivalent to specifying [0] directly in the file name.
In your code:
function transform(response, next) {
  gm(response.Body)
    .selectFrame(0) // <--- select the first page
    .size(function(err, size) {
      // ...
Don't be confused by the function's name: it works not only with frames for GIFs, but also just fine with pages for PDFs.
Check out the function's source on GitHub.
Credit to @BenFortune for his answer to a similar question about a GIF's first frame. I took it as inspiration and tested this solution with PDFs; it actually works.
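For completeness, here is one way the whole Lambda could look with .selectFrame(0) wired in. This is a minimal sketch, not the exact code from the question: the target bucket, output key, and toBuffer format are illustrative assumptions.

const AWS = require("aws-sdk");
const async = require("async");
const gm = require("gm");

const s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
  const sourceBucket = event.Records[0].s3.bucket.name;
  const sourceKey = event.Records[0].s3.object.key; // may need URL-decoding for real keys

  async.waterfall([
    function download(next) {
      s3.getObject({ Bucket: sourceBucket, Key: sourceKey }, next);
    },
    function transform(response, next) {
      gm(response.Body)
        .selectFrame(0)         // first page only, like "doc.pdf[0]"
        .background("white")
        .flatten()
        .resize(200, 200)
        .toBuffer("JPG", next); // calls next(err, buffer)
    },
    function upload(buffer, next) {
      s3.putObject({
        Bucket: sourceBucket + "-thumbnails",      // hypothetical target bucket
        Key: sourceKey.replace(/\.pdf$/i, ".jpg"), // hypothetical output key
        Body: buffer,
        ContentType: "image/jpeg"
      }, next);
    }
  ], callback);
};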
Hope it helps.


Does Quarto publish images from web links on quarto-pub (book format)?

I want to publish my book website with the command
quarto::quarto_publish_site()
The book website is already set up on quarto-pub. If I don't add any image as a web link, the website builds and can be uploaded.
Now if I add any image as a web link, for example:
![](https://www.website.com/wp-content/uploads/sites/2/2022/04/picture.jpg)
When I render it locally, it works. When I launch the command to publish, I get:
compilation failed- error Unable to load picture or PDF file 'https://www.website.com/wp-content/uploads/sites/2/2022/04/picture.jpg'.
The publishing process is interrupted after this error. The same thing happens if I launch the command from the Terminal.
Is this intended, to prevent publishing links from other websites on quarto-pub?
Or can I do something to avoid having to download all these pictures?
Including images via URL is not supposed to work for PDF output; this is not a Quarto issue but comes from how Pandoc translates ![]() to LaTeX.
Instead, you could automatically generate a local copy of the file (if not available) and then include the image in an R code chunk like this:
```{r, echo=FALSE, fig.cap='Kid', dpi=100}
if (!file.exists("kid.jpg")) {
  download.file(
    url = "https://edit.co.uk/uploads/2016/12/Image-1-Alternatives-to-stock-photography-Thinkstock.jpg",
    destfile = "kid.jpg",
    mode = "wb") # binary mode; safe for images on all platforms
}
knitr::include_graphics("kid.jpg")
```
(Of course, including the image via ![](kid.jpg) at a different location will work too once the file exists locally.)

Retrieving the progress of getObject (aws-sdk)

I'm using node.js with the aws-sdk (for S3). When I am downloading a huge file from S3, how can I regularly retrieve the progress of the download so that the front-end can show a progress bar? Currently I am using getObject (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getObject-property).
The code to download the file works. Here's a snippet of my code...
return await new Promise((resolve, reject) => {
  this.s3.getObject(params, (error, data) => {
    if (error) {
      reject(error);
    } else {
      resolve(data.Body);
    }
  });
});
I'm just not sure how to hook into the progress as it's downloading. Thanks in advance for any insight!
You can utilize S3 byte-range fetching, which allows fetching small parts of a file in S3. This lets us download large objects by splitting the download into multiple parts, which brings the following advantages:
- A part download failure does not require re-downloading the whole file.
- Download pause/resume capability.
- Download progress tracking.
- Retrying parts that failed or were interrupted by network issues.
- Sniffing headers located in the first few bytes of the file when we only need metadata.
You can split the file download into parts of a size of your choice (I propose 1-4 MB at a time) and download them chunk by chunk; as each getObject promise completes, you can track how many parts have finished. A good starting point is the AWS documentation; a minimal sketch of this approach follows.
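Here is one possible sketch of the ranged-download approach, assuming the aws-sdk v2 client; the 4 MB chunk size and the onProgress callback shape are illustrative choices, not a fixed API:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Download an object in ranged chunks, reporting fractional progress.
async function downloadWithProgress(params, onProgress, chunkSize = 4 * 1024 * 1024) {
  const { ContentLength: total } = await s3.headObject(params).promise();
  const parts = [];
  let done = 0;
  for (let start = 0; start < total; start += chunkSize) {
    const end = Math.min(start + chunkSize, total) - 1;
    // Fetch only bytes [start, end]; a failed part can be retried on its own.
    const { Body } = await s3
      .getObject({ ...params, Range: `bytes=${start}-${end}` })
      .promise();
    parts.push(Body);
    done += Body.length;
    onProgress(done / total);
  }
  return Buffer.concat(parts);
}

Parts are fetched sequentially here for simplicity; they could equally be issued in parallel and reassembled by offset, at the cost of more bookkeeping.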
STREAMING OPTION
Another option is to use a stream and track the number of bytes received:
const { ContentLength: contentLength } = await s3.headObject(params).promise();
const rs = s3.getObject(params).createReadStream();
let progress = 0;
rs.on('data', function (chunk) {
  // Advance your progress by chunk.length
  progress += chunk.length;
  console.log(`Progress: ${(progress / contentLength * 100).toFixed(1)}%`);
});
// ... pipe to write stream
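For example, the stream can be piped to disk while the progress events fire (the output path here is illustrative):

const fs = require('fs');
rs.pipe(fs.createWriteStream('/tmp/large-file.bin'));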

File payload in a bucket-triggered Google Cloud Function

I have a question about a Google Cloud Function triggered by an event on a storage bucket (I'm developing it in Python).
I have to read the data of the just-finalized file (a PDF) in the bucket that triggered the event. I was looking for the file payload on the event object passed to my function (data, context), but it seems there is no payload on that object.
Do I have to use the Cloud Storage library to get the file from the bucket? Is there a way to get the payload directly from the context of the triggered function?
Enrico
From checking the more complete example in the Firebase documentation, it indeed seems that the payload of the file is not included in the parameters. That makes sense, since there's no telling how big the just-finalized file is, and whether it will even fit in the memory of your Functions runtime.
So you'll indeed have to grab the file from the bucket with a separate call, based on the information in the metadata. The full Firebase example grabs the file name and other info from its context/data with:
exports.generateThumbnail = functions.storage.object().onFinalize(async (object) => {
  const fileBucket = object.bucket; // The Storage bucket that contains the file.
  const filePath = object.name; // File path in the bucket.
  const contentType = object.contentType; // File content type.
  const metageneration = object.metageneration; // Number of times metadata has been generated. New objects have a value of 1.
  ...
I'll see if I can find a more complete example. But I'd expect it to work similarly on raw Google Cloud Functions, which Firebase wraps, even when using Python.
Update: from looking at this Storage/Function/PubSub documentation that the Python binding is apparently based on, it looks like the path should be available as data['resource'] or as data['name'].
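As an illustration of that separate call (shown in Node.js to match the example above; the Python flow with the google-cloud-storage client is analogous, and the function name here is hypothetical):

const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

exports.processPdf = async (event) => {
  // The trigger event carries only metadata (bucket, object path), not the bytes.
  const { bucket, name } = event;
  // Fetch the actual file contents with a separate Cloud Storage call.
  const [contents] = await storage.bucket(bucket).file(name).download();
  console.log(`Downloaded ${name}: ${contents.length} bytes`);
};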

Ionic 2 not seeing images

I'm currently working with Ionic 2.0.0-beta.32. After some searching around, I found the following approach to add images to my project and have them automatically copied into the final build at build time.
I added a directory called img in app and placed all my images there; the code in the gulpfile is as follows:
gulp.task('images', function() {
  return gulp.src(['app/img/*'])
    .pipe(gulp.dest('www/build/img'));
});
and I also added it to the runSequence calls:
runSequence(
['images', 'sass', 'html', 'fonts', 'scripts'],
This all works and moves the images to www at build time, but when I run
ionic serve --lab
none of the images show. I've tried the following paths:
../img/imgname
/img/imgname
img/imgname
build/img/imgname
None of the above shows my images.
Any help would be great; I'm pulling my hair out here.
Do you want to use Gulp to modify your images (e.g., reduce their size) or something like that? If the answer is no, then you don't need to add that task to your gulpfile:
gulp.task('images', function() {
  return gulp.src(['app/img/*'])
    .pipe(gulp.dest('www/build/img'));
});
Why? Because if the images won't be changed, it doesn't make sense for them to get copied again and again (without changes) every time you run your tasks.
A simple way to work with images is to put them in a www/images folder and then reference them in your code like this:
<img src="images/myImage.png" />
Since they're in the www folder, they're not going to be deleted (which happens if you place them in the build folder of your app).

Strange behaviour when uploading scaled files using Fine Uploader

I have implemented Fine Uploader 4.4 and it works perfectly when uploading multiple files using ColdFusion.
The endpoint code is very simple and looks like this:
<cffile action="upload"
        destination="#application.Config.imageDir#"
        nameconflict="overwrite"
        filefield="FORM.qqFile" />
<cfif CFFILE.contenttype EQ "image" OR ListFindNoCase("jpg,jpeg,gif,png", CFFILE.serverFileExt)>
  <cfset local.fileName = CFFILE.serverFile />
</cfif>
Whenever I upload single or multiple images, the local.fileName variable is correctly set to the image file name you usually see in qqfilename, for example "image0001.jpg".
The JavaScript code to send this data is simply:
$('#fine-uploader').fineUploader({
  request: {
    endpoint: '<cfoutput>#application.Config.fineUploaderProxy#?cfc=#cfcName#&functionName=#functionName#</cfoutput>'
  }
});
However, as soon as I add scaling, a strange behaviour starts occurring. The scaled version is sent to my upload handler, but the file name being sent is always "blob" for every scaled image, instead of the name I would expect, something like "image0001(small).jpg".
The code I add to activate the scaling is simply:
$('#fine-uploader').fineUploader({
  scaling: {
    sizes: [
      {name: "small", maxSize: 50}
    ]
  },
  request: {
    endpoint: '<cfoutput>#application.Config.fineUploaderProxy#?cfc=#cfcName#&functionName=#functionName#</cfoutput>'
  }
});
Could someone please help me understand why the file name "blob" is being sent with the qqfile instead of the actual file name? I am using the latest version of Chrome.
Thanks
This is due to the fact that Fine Uploader doesn't actually send a File when it generates a scaled version from a reference file. The entity is specifically a Blob. While a File has a name property, a Blob does not. Because of this, the browser, when constructing the multipart segment for a Blob, sets the filename parameter to "blob". There are ways to overcome this, but not reliably cross-browser. So Fine Uploader will always send the actual file name in a "qqfilename" parameter. Your server should look at this value to reliably determine the file's name in all cases.
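As a sketch of the server-side fix (shown in Node.js/Express for illustration; in the ColdFusion endpoint above the same value arrives as FORM.qqfilename, and the route and storage settings here are hypothetical):

const express = require("express");
const multer = require("multer");

const app = express();
const upload = multer({ dest: "uploads/" });

app.post("/upload", upload.single("qqfile"), (req, res) => {
  // For scaled images the multipart filename is literally "blob",
  // so prefer the qqfilename form field that Fine Uploader always sends.
  const realName = req.body.qqfilename || req.file.originalname;
  console.log(`Received ${realName} (${req.file.size} bytes)`);
  res.json({ success: true });
});

app.listen(3000);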