I have a requirement where I need to write a long String log for each execution of my Lambda function; the String is basically a log recording the failed and successful cases.
As this code is for Lambda, it's not possible to create a file in a physical location like a local file directly and upload it to S3, but I need to directly create a file and write the String in one go.
Thanks in advance.
AWS Lambda functions can write data to the /tmp/ directory. There is a 512 MB limit, so delete any files before your Lambda function exits so that the space is free for the next invocation.
You can therefore create a 'local' file and upload it to Amazon S3.
Alternatively, you can use the Amazon S3 API to create an object (PutObject()) while specifying the Body content without creating an actual file. This might be easier for you.
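That PutObject approach can be sketched in Python with boto3; the bucket name, key, and helper names below are hypothetical, and put_object accepts a str/bytes Body directly, so no local file is needed:

```python
def build_log(successes, failures):
    """Assemble the whole run log as one string before a single upload."""
    lines = [f"SUCCESS: {r}" for r in successes] + [f"FAILED: {r}" for r in failures]
    return "\n".join(lines)

def write_log_to_s3(log_text, bucket="my-log-bucket", key="run-log.txt"):
    import boto3  # imported here so build_log stays usable without AWS deps
    s3 = boto3.client("s3")
    # Body can be the in-memory string itself -- no /tmp file required
    s3.put_object(Bucket=bucket, Key=key, Body=log_text.encode("utf-8"))
```

You would call write_log_to_s3(build_log(ok_records, bad_records)) once at the end of the handler.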
By the way, if your Lambda function prints output, it will automatically be captured and stored in CloudWatch Logs. It will be intermixed with other messages from Lambda (eg showing the amount of memory used), but it's an easy way to log information.
An excel file (say my_excel_file.xlsx) will be uploaded to s3://my-bucket/a/b/
A trigger is set in Lambda with the following properties:
bucket-name: my-bucket, prefix: a/b/
I want my lambda to:
Read the excel file uploaded to s3://my-bucket/a/b/ into a pandas dataframe
After processing it, move the excel file to s3://my-bucket/a/b/archive/ with the name: my_excel_file_timestamp.xlsx
In case I am able to achieve the above step, will the lambda get invoked recursively? If yes, is there a workaround?
Since the Amazon S3 event is configured to trigger on prefix a/b/, it will also trigger the AWS Lambda function when an object is placed into a/b/archive/, because that key matches the prefix.
I recommend adding a line of code at the top of the Lambda function that checks the Key, which is passed to the function via the event parameter. It should check if the Key starts with a/b/archive/ (or similar rule) -- if so, it should exit the function immediately. This will not incur a significant cost because it will exit quickly and Lambda is only charged per millisecond.
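The guard described above can be sketched like this; the event shape matches the standard S3 notification, while the handler body and return values are hypothetical:

```python
from urllib.parse import unquote_plus

ARCHIVE_PREFIX = "a/b/archive/"  # must match where processed files are moved

def is_archived_key(key):
    """True when the object was written by our own archive step."""
    return key.startswith(ARCHIVE_PREFIX)

def lambda_handler(event, context):
    # S3 event keys are URL-encoded, so decode before comparing
    key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    if is_archived_key(key):
        return {"skipped": key}  # exit immediately; costs only a few ms
    # ... read the workbook with pandas, process, then copy to the archive prefix ...
    return {"processed": key}
```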
The alternative is to put your archive folder in a different location.
I'm downloading data from an API and writing it to a csv file that I store in an S3 bucket. I'm then copying my file from this input bucket into an output bucket with a Lambda function. From the output bucket I'm ingesting it into a MySQL RDS instance with another Lambda function.
The copy-to-another-bucket and upload-to-RDS lambda functions both get triggered when I create a new object in a bucket. Since I'm appending to my csv file, the upload-to-RDS function gets triggered way more than it should and I end up with ~30 rows in my database instead of 6.
I thought by copying the files between S3 buckets I could avoid this, but it doesn't help. Is there any way to only upload the csv file to the database once it has been written and not while it's being updated? Can I delay the trigger maybe?
The only other solution I can think of is to skip the copy-to-another-bucket function altogether and to schedule the upload-to-RDS function.
You need to realize that S3 doesn't support updating an existing file. If you are appending a row to an existing CSV file in S3, then that operation requires uploading the entire contents of the CSV file to S3 again, which S3 sees as a new object.
If you need to store a temporary version of the CSV file in S3 while you are updating it, then you should store it in a separate path, like s3://your_bucket/tmp. When you have completed your updates, move it to the final path, like s3://your_bucket/complete, and configure the Lambda trigger only on the /complete path.
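Since S3 has no native "move", the promotion step is a copy followed by a delete. A minimal sketch, assuming boto3 and the tmp/ and complete/ prefixes above (function names are hypothetical):

```python
def final_key_for(tmp_key):
    """Map the temporary path to the final one, e.g. tmp/data.csv -> complete/data.csv."""
    return tmp_key.replace("tmp/", "complete/", 1)

def promote(bucket, tmp_key):
    import boto3  # imported here so the path helper stays testable without AWS
    s3 = boto3.client("s3")
    final_key = final_key_for(tmp_key)
    # copy to the final path (this fires the /complete trigger exactly once) ...
    s3.copy_object(Bucket=bucket, Key=final_key,
                   CopySource={"Bucket": bucket, "Key": tmp_key})
    # ... then remove the temporary object
    s3.delete_object(Bucket=bucket, Key=tmp_key)
    return final_key
```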
I'm currently struggling to configure automated input to my AWS StepFunctions state machine. Basically, I am trying to set up a state machine that is notified whenever an object create event takes place in a certain S3 bucket. When that happens, the input is passed to a choice state which checks the file size. If the file is small enough, it invokes a Lambda function to process the file contents. If the file is too large, it invokes a Lambda to split up the file into files of manageable size, and then invokes the other Lambda to process the contents of those files. The problem with this is that I cannot figure out a way to pass the file size in as input to the state machine.
I am generally aware of how input is passed to StepFunctions, and I know that S3 Lambda triggers contain file size as a parameter, but I still haven't been able to figure out a practical way of passing file size as an input parameter to a StepFunctions state machine.
I would greatly appreciate any help on this issue and am happy to clarify or answer any questions that you have to the best of my ability. Thank you!
Currently S3 events can't trigger Step Functions directly, so one option is to create an S3 event that triggers a Lambda function. The Lambda works as a proxy: it passes the file info to the state machine and kicks it off, and you can select only the data you want and pass that selective data to Step Functions.
The other option is to configure a state machine as a target for a CloudWatch Events rule. This will start an execution when files are added to an Amazon S3 bucket.
The first option is more flexible.
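The proxy Lambda from the first option can be sketched as follows. The S3 event record already carries the object size, so the proxy just lifts it into the execution input; the state machine ARN here is hypothetical:

```python
import json

def execution_input(event):
    """Pull bucket, key, and size out of the S3 notification record."""
    rec = event["Records"][0]["s3"]
    return {"bucket": rec["bucket"]["name"],
            "key": rec["object"]["key"],
            "size": rec["object"]["size"]}

def lambda_handler(event, context):
    import boto3
    sfn = boto3.client("stepfunctions")
    # hypothetical ARN -- substitute your own state machine
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:ProcessFile",
        input=json.dumps(execution_input(event)),
    )
```

Inside the state machine, a Choice state can then branch on $.size.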
Currently we have an AWS Lambda function (Java-based runtime) which takes an SNS message as input, performs business logic, generates one XML file, and stores it to S3.
The current implementation creates the XML at the /tmp location, which we know has the AWS Lambda space limitation (512 MB).
Is there any way to still use Lambda but stream the XML file to S3 without using the /tmp folder?
I did some research but still have not found a solution for it.
Thank you.
You can load an object directly to S3 from memory, without having to store it locally, using the PutObject API. However, keep in mind that Lambda also has time and total memory limits; you may run into those too if your object is too big.
If you can split the file into chunks, and don't need to update the beginning of the file while working on its end, you can use a multipart upload: provide a ready chunk, upload it, then free the memory for the next chunk.
Otherwise you still need temporary storage to accumulate all the parts of the XML. You can use DynamoDB or Redis, and once you have collected all the parts there, upload the file part by part, then clean up the database (or set a TTL to automate the cleanup).
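The chunk-by-chunk multipart approach can be sketched like this (Python shown for brevity, though the question uses a Java runtime; function names are hypothetical, and S3 requires each part except the last to be at least 5 MB):

```python
def chunk_bytes(data, size):
    """Split a byte string into fixed-size parts."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def upload_in_parts(bucket, key, chunks):
    import boto3
    s3 = boto3.client("s3")
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    for number, chunk in enumerate(chunks, start=1):
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                              UploadId=mpu["UploadId"], Body=chunk)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})
        # the chunk can be dropped from memory here before building the next one
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})
```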
I have written a Lambda function which gets invoked automatically when a file comes into my S3 bucket.
I perform certain validations on this file, modify particular values, and put the file back at the same location.
Due to this put, my Lambda is invoked again, and the process repeats until my Lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and can apply a check in lambda function, but can there be any other approach where DynamoDB's use can be avoided?
You have a couple options:
You can put the file to a different location in s3 and delete the original
You can add a metadata field to the S3 object when you update it, then check for the presence of that field so you know whether you have already processed it. This might not work perfectly, since S3 does not always provide the most recent data on reads after updates.
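The metadata approach can be sketched as follows; the metadata key name is hypothetical, and the validation step is elided:

```python
PROCESSED_FLAG = "processed"  # hypothetical metadata key

def already_processed(metadata):
    """Metadata dict comes back lower-cased from head_object."""
    return metadata.get(PROCESSED_FLAG) == "true"

def lambda_handler(event, context):
    import boto3
    s3 = boto3.client("s3")
    rec = event["Records"][0]["s3"]
    bucket, key = rec["bucket"]["name"], rec["object"]["key"]
    head = s3.head_object(Bucket=bucket, Key=key)
    if already_processed(head["Metadata"]):
        return  # second invocation caused by our own put -- do nothing
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    modified = body  # ... validate and modify here ...
    # writing the flag means the next trigger for this key exits early
    s3.put_object(Bucket=bucket, Key=key, Body=modified,
                  Metadata={PROCESSED_FLAG: "true"})
```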
AWS allows different types of S3 event triggers. You can try playing with s3:ObjectCreated:Put vs s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated in another folder, say
s3://bucket-name/validated.
Update your S3 event notification to invoke your Lambda function whenever there is an ObjectCreate (All) event under the /notvalidated prefix.
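Setting that prefix filter programmatically can be sketched with boto3 (the bucket, function ARN, and helper names are hypothetical; the same filter can also be set in the console):

```python
def notification_config(function_arn, prefix="notvalidated/"):
    """Build an S3 notification config that fires only for keys under `prefix`."""
    return {"LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [
            {"Name": "prefix", "Value": prefix},
        ]}},
    }]}

def apply_config(bucket, function_arn):
    import boto3
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_config(function_arn))
```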
The second answer does not seem to be correct (Put vs Post): there is not really a concept of an update in S3 in terms of POST or PUT. The request that overwrites an object is the same as the initial upload of the object. See here for details on the available S3 events.
I had this exact problem last year: I was doing an image resize on PUT, and every time a file was overwritten, the function would be triggered again. My recommended solution is to have two folders in your S3 bucket, one for the original file and one for the finalized file. You can then create the Lambda trigger with a prefix filter so it only fires for files in the original folder.
Events are triggered in S3 when an object is Put, Post, or Copied, or when a Multipart Upload completes; all of these operations correspond to ObjectCreated, as per the AWS documentation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object-created event to a particular bucket location (prefix), so that only a change in that location triggers the Lambda function.
You can then write the modified file to some other location that is not configured to trigger the Lambda function.
Hope it helps!