File payload in a bucket-triggered Google Cloud Function - google-cloud-platform

I have a question about a Google Cloud Function triggered by an event on a storage bucket (I'm developing it in Python).
I have to read the data of the file just finalized (a PDF file) on the bucket that triggers the event. I was looking for the file payload on the event object passed to my function (data, context), but it seems there is no payload on that object.
Do I have to use the Cloud Storage library to get the file from the bucket? Is there a way to get the payload directly from the context of the triggered function?
Enrico

From checking the more complete example in the Firebase documentation, it indeed seems that the payload of the file is not included in the parameters. That makes sense, since there's no telling how big the finalized file is, or whether it would even fit in the memory of your Functions runtime.
So you'll indeed have to grab the file from the bucket with a separate call, based on the information in the metadata. The full Firebase example grabs the filename and other info from its context/data with:
exports.generateThumbnail = functions.storage.object().onFinalize(async (object) => {
  const fileBucket = object.bucket; // The Storage bucket that contains the file.
  const filePath = object.name; // File path in the bucket.
  const contentType = object.contentType; // File content type.
  const metageneration = object.metageneration; // Number of times metadata has been generated. New objects have a value of 1.
  ...
I'll see if I can find a more complete example. But I'd expect it to work similarly on raw Google Cloud Functions, which Firebase wraps, even when using Python.
Update: from looking at this Storage/Function/PubSub documentation that the Python binding is apparently based on, it looks like the path should be available as data['resource'] or as data['name'].
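For completeness, here is a minimal sketch of how that could look in a Python background function, assuming the google-cloud-storage client library is available; the function name and variables are made up for illustration:

from google.cloud import storage

def on_pdf_finalized(data, context):
    # The event dict describes the finalized object but carries no payload.
    bucket_name = data['bucket']
    object_name = data['name']

    # Fetch the actual bytes with a separate call to Cloud Storage.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    pdf_bytes = blob.download_as_bytes()  # loads the whole object into memory

    print(f"Downloaded {object_name}: {len(pdf_bytes)} bytes, "
          f"content type {data.get('contentType')}")

Keep in mind that download_as_bytes() pulls the whole object into memory; for large PDFs you may prefer blob.download_to_filename() into /tmp.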

Related

Invoking a binary in AWS Lambda with Rust

So I have the following Rust AWS Lambda function:
use std::io::Read;
use std::process::{Command, Stdio};
use lambda_http::{run, service_fn, Body, Error, Request, RequestExt, Response};
use lambda_http::aws_lambda_events::serde_json::json;

/// This is the main body for the function.
/// Write your code inside it.
/// There are some code examples in the following URLs:
/// - https://github.com/awslabs/aws-lambda-rust-runtime/tree/main/examples
async fn function_handler(_event: Request) -> Result<Response<Body>, Error> {
    // Extract some useful information from the request
    let program = Command::new("./myProggram")
        .stdout(Stdio::piped())
        .output()
        .expect("failed to execute process");
    let data = String::from_utf8(program.stdout).unwrap();
    let parsed = data.split("\n").filter(|x| !x.is_empty()).collect::<Vec<&str>>();

    // Return something that implements IntoResponse.
    // It will be serialized to the right response event automatically by the runtime
    let resp = Response::builder()
        .status(200)
        .header("content-type", "application/json")
        .body(json!(parsed).to_string().into())
        .map_err(Box::new)?;
    Ok(resp)
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        // disable printing the name of the module in every log line.
        .with_target(false)
        // disabling time is handy because CloudWatch will add the ingestion time.
        .without_time()
        .init();

    run(service_fn(function_handler)).await
}
The idea here is that I want to return the response from the binary in JSON format.
I'm compiling the function with cargo lambda, which produces a bootstrap file, and then I'm zipping it manually by including the bootstrap binary and the myProgram binary.
When I test my function in the Lambda panel by sending an event to it, I get the response with the right headers etc., but the response body is empty.
I'm deploying my function through the AWS panel, on the custom runtime on Amazon Linux 2, by uploading the zip file.
When I test locally with cargo lambda watch and cargo lambda invoke, the response body is filled with the myProgram stdout parsed to JSON.
Any ideas or thoughts on what goes wrong in the actual cloud are much appreciated!
My problem was with the dynamically linked libraries in the binary. It is actually a Python binary, and it was missing a specific version of GLIBC.
The easiest solution in my case was to compile myProgram on Amazon Linux 2.

How to fetch a process variable of type File in an external JS worker?

The case:
1. I have a process deployed on a Camunda engine (running on localhost), which is triggered by an API call.
2. In the API call, a process variable of type File is sent, as shown below:
   "file" : {
       "value": base64str,
       "type": "File",
       "valueInfo": {
           "filename": nameOfFile, // the name of the file that the user uploaded in the form on the website, e.g. myAttachment.pdf
           "mimetype": typeOfFile, // the mimetype of the file that the user uploaded in the form on the website, e.g. application/pdf
           "encoding": "UTF-8"
       }
   };
3. Once the process is triggered, a process variable "file" of type "File" is created.
4. Then, I have an external worker written in JavaScript. In this worker, I want to fetch the base64 value of the process variable "file" of type "File".
5. Then, once I have the base64 value of the process variable "file", I will make a REST API call to a third system to pass the attachment there.
In order to complete this logic, I somehow need to complete step 4. However, I can't get the base64 value of the process variable "file". How could I solve this?
I have tried the following, but it does not work.
var file=task.variables.get("file");
var content=file.content;
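One language-agnostic way to get at the raw bytes is Camunda's REST endpoint for binary variable values (GET .../process-instance/{id}/variables/{name}/data). As a rough illustration only, here is a sketch in Python with requests rather than in the JS worker itself, assuming the default engine-rest path on localhost and the variable name "file"; the process instance id is typically available on the fetched external task:

import base64
import requests

ENGINE_REST = "http://localhost:8080/engine-rest"  # assumed default engine-rest path

def fetch_file_base64(process_instance_id):
    # The /data suffix returns the raw binary content of a File/bytes variable.
    url = f"{ENGINE_REST}/process-instance/{process_instance_id}/variables/file/data"
    resp = requests.get(url)
    resp.raise_for_status()
    # Re-encode the raw bytes as base64 for the downstream REST call.
    return base64.b64encode(resp.content).decode("ascii")

The same GET call can of course be issued from the JS worker with any HTTP client; the point is that the binary content comes from the /data endpoint rather than from the plain variable value.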

Can't upload file to AWS S3 (ASP.NET)

fileTransferUtility = new TransferUtility(s3Client);
try
{
    if (file.ContentLength > 0)
    {
        var filePath = Path.Combine(Server.MapPath("~/Files"), Path.GetFileName(file.FileName));
        var fileTransferUtilityRequest = new TransferUtilityUploadRequest
        {
            BucketName = bucketName,
            FilePath = filePath,
            StorageClass = S3StorageClass.StandardInfrequentAccess,
            PartSize = 6291456, // 6 MB.
            Key = keyName,
            CannedACL = S3CannedACL.PublicRead
        };
        fileTransferUtilityRequest.Metadata.Add("param1", "Value1");
        fileTransferUtilityRequest.Metadata.Add("param2", "Value2");
        fileTransferUtility.Upload(fileTransferUtilityRequest);
        fileTransferUtility.Dispose();
    }
I'm getting this error:
The file indicated by the FilePath property does not exist!
I tried changing the path to the actual path of the file, C:\Users\jojo\Downloads, but I'm still getting the same error.
(Based on a comment above indicating that file is an instance of HttpPostedFileBase in a web application...)
I don't know where you got Server.MapPath("~/Files") from, but if file is an HttpPostedFileBase that's been uploaded to this web application, then it's likely in memory and not on your file system, or at best in a temporary system folder somewhere.
Since your source (the file variable contents) is a stream, before you try to interact with the file system you should see if the AWS API you're using can accept a stream. And it looks like it can.
if (file.ContentLength > 0)
{
    var transferUtility = new TransferUtility(/* constructor params here */);
    transferUtility.Upload(file.InputStream, bucketName, keyName);
}
Note that this is entirely free-hand, I'm not really familiar with AWS interactions. And you'll definitely want to take a look at the constructors on TransferUtility to see which one meets your design. But the point is that you're currently looking to upload a stream from the file you've already uploaded to your web application, not looking to upload an actual file from the file system.
As a fallback, if you can't get the stream upload to work (and you really should, that's the ideal approach here), then your next option is likely to save the file first and then upload it using the method you have now. So if you're expecting it to be in Server.MapPath("~/Files") then you'd need to save it to that folder first, for example:
file.SaveAs(Path.Combine(Server.MapPath("~/Files"), Path.GetFileName(file.FileName)));
Of course, over time this folder can become quite full and you'd likely want to clean it out.

Boto3/S3/Django: Avoid blocking the event loop while the file hash is being calculated

From a gunicorn+gevent worker making (multipart) uploads to S3 via django-storages, how can I make sure that the event loop is not blocked while the hash of the file/part contents is being calculated?
Looking in the source for django-storages, there doesn't really seem to be any scope for passing in the MD5 (or another hash), and in the source for botocore there doesn't seem to be any scope for yielding the event loop while the hash is being calculated.
There is now a PR for django-storages that calculates the MD5 hash as data is received by the application.
In summary, this changes S3Boto3StorageFile so that on init and on each new part
self._file_md5 = hashlib.md5()
and then, on receipt of content_bytes,
self._file_md5.update(content_bytes)
and then, when each part is uploaded, the ContentMD5 parameter is passed to the part upload function:
ContentMD5=base64.b64encode(self._file_md5.digest()).decode('utf-8')
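To make the pattern concrete, here is a rough, simplified sketch of the idea (not the PR's actual code; the buffer handling and multipart bookkeeping are made up for illustration): hash each part's bytes as they arrive, then send the digest as ContentMD5 with boto3's upload_part, so no extra hashing pass over the whole part is needed at upload time.

import base64
import hashlib
import boto3

s3 = boto3.client("s3")

part_md5 = hashlib.md5()     # reset this for every new part
part_buffer = bytearray()

def write(content_bytes):
    # Update the hash incrementally as the application receives data,
    # instead of hashing the whole part in one blocking pass later.
    part_md5.update(content_bytes)
    part_buffer.extend(content_bytes)

def upload_part(bucket, key, upload_id, part_number):
    return s3.upload_part(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        PartNumber=part_number,
        Body=bytes(part_buffer),
        ContentMD5=base64.b64encode(part_md5.digest()).decode("utf-8"),
    )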

How to check if MTOM attachment is empty

I am developing a web service based on CXF. One of the requirements is that the client should be able to upload an optional PDF file as part of the message. This was pretty trivial:
I have added this, with getter and setter, to my transfer object:
@XmlMimeType("application/octet-stream")
@XmlElement(name = "InvoicePdf", required = false)
private DataHandler invoicePdf = null;
I have also enabled support for MTOM:
Endpoint endpoint = Endpoint.publish("/myWs", new WsImpl(getServletContext()));
SOAPBinding binding = (SOAPBinding) endpoint.getBinding();
binding.setMTOMEnabled(true);
And the usage:
DataHandler pdf2 = p_invoice.getInvoicePdf();
//pdf2.getInputStream();
//pdf2.writeTo(outputstream);
Everything works great. I am able to receive and process the file. However, there might be a case when the client does not upload the file, since it is optional. The problem is that even when the client does not send the file, I am not able to notice it:
pdf2 is not null
pdf2.getInputStream() is not null
pdf2.getInputStream() contains some data. I would like to avoid parsing the input stream and looking for a PDF signature, since it is a lot easier to just forward the input stream to the desired output stream (write it to a file).
I have not found in the DataHandler or DataSource (pdf2.getDataSource()) API any appropriate method or field for determining whether the file exists. I can see in the debugger that the empty DataHandler contains a DataSource whose length is 9, which is a lot less than a correct PDF file. Unfortunately, the length property is not accessible at all.
Any idea how to determine if the file was sent or not?
The solution is to skip the XML tag for this attachment in the SOAP message. My mistake was sending an empty tag:
<InvoicePdf></InvoicePdf>
Then you get the behavior described in the question. However, if you skip this tag entirely, the DataHandler is null, so I am able to distinguish whether the file was sent or not.