How to trigger CodePipeline from S3 with a dynamic S3 object key?

I have a bucket whose objects are named after commit IDs. I want to pass these commit IDs to my CodePipeline and use them in Slack messages.
I am trying to trigger CodePipeline when a zip file is uploaded to S3; however, as far as I can see in the documentation, the S3 source action can only trigger on a static object key. I want to trigger on any file name.
https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-S3.html
I am dealing with a use case where the uploaded objects in S3 will have dynamic object keys.
How do I deal with this situation?
I have read this question, so I know I can use S3 with Lambda and then trigger the pipeline from Lambda, but this still will not work because I need to pass the zip file to CodeBuild.

TL;DR Have the Lambda record the ID in commit_id.txt and add it to the bundle.
I understand you want to execute a pipeline when an arbitrary object, say a5bf8c1.zip, is added to an S3 path, say MyPipelineBucket/commits/. The pipeline has an S3 source, say MyPipelineBucket/source.zip. Your pipeline executions also require the file name value (a5bf8c1).
Set up S3 Event Notifications on the bucket. Apply object key name filtering on the commits/ prefix.
Set a Lambda function as the destination.
The Lambda receives the Commit ID in the event notification payload as the triggering file name. Write it to a commit_id.txt file. Using the SDK, get the MyPipelineBucket/commits/a5bf8c1.zip bundle from S3. Add commit_id.txt to the bundle. Put the new bundle to MyPipelineBucket/source.zip. This will trigger an execution.
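Here is a minimal sketch of such a Lambda in Python (boto3). The bucket and key names come from the example above; the handler name and the in-memory re-zipping are illustrative, not the only way to do it:
import io
import zipfile
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")

PIPELINE_BUCKET = "MyPipelineBucket"  # bucket from the example above
SOURCE_KEY = "source.zip"             # the pipeline's S3 source key

def handler(event, context):
    # The notification payload carries the triggering key, e.g. commits/a5bf8c1.zip
    record = event["Records"][0]
    key = unquote_plus(record["s3"]["object"]["key"])
    commit_id = key.split("/")[-1].rsplit(".", 1)[0]

    # Download the uploaded bundle into memory
    body = s3.get_object(Bucket=PIPELINE_BUCKET, Key=key)["Body"].read()

    # Append commit_id.txt to the zip bundle
    buffer = io.BytesIO(body)
    with zipfile.ZipFile(buffer, "a") as bundle:
        bundle.writestr("commit_id.txt", commit_id)

    # Upload the new bundle as the pipeline source; this starts an execution.
    # The commits/ prefix filter prevents this write from re-triggering the Lambda.
    s3.put_object(Bucket=PIPELINE_BUCKET, Key=SOURCE_KEY, Body=buffer.getvalue())
    return {"commit_id": commit_id}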
In your pipeline, your CodeBuild commands now have access to the Commit ID. For instance, you can set the Commit ID as an environment variable:
COMMIT_ID=$(cat commit_id.txt)
echo "$COMMIT_ID" # -> a5bf8c1

Related

How to set up directory level triggers in AWS S3 for Lambda?

I have a directory structure as shown below
S3 Bucket
- logs/
  - product1_log.txt
  - product2_log.txt
- images/
- products/
There are a couple of directories in the S3 bucket, as shown above. Whenever a new file gets added to the logs folder, I have a Lambda function that updates the timestamp in my MongoDB.
Requirement
Trigger the Lambda function only when the logs folder gets updated; updates to other folders should not trigger the Lambda.
The exact same use case is described in the link below.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-filtering.html
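As a sketch, the same prefix filtering can also be applied programmatically with boto3; the bucket name and Lambda ARN below are placeholders:
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:update-timestamp",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            # Only objects under logs/ trigger the Lambda
                            {"Name": "prefix", "Value": "logs/"}
                        ]
                    }
                },
            }
        ]
    },
)
Note that S3 must also be granted permission to invoke the Lambda function (for example via lambda add-permission), just as the console does for you when you configure the event there.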

Get file details whenever a file lands in an S3 bucket, then trigger a Glue job for each individual file

I have a general Glue job which executes based on the name of the file that lands in the S3 bucket. Right now we are creating an event-based trigger which will execute a workflow (the event being whenever a file lands in S3). Is there any way to get the file details, like the file name or S3 URI, and pass this info to the workflow as a parameter?
How can I use this file name as a parameter for my Glue job?
The AWS docs explain how to pass arguments to a Glue job and how to read them in the job itself. So basically your Lambda would call start_job_run and pass Arguments (e.g. the S3 object name) to your job. The job, when started, would read the arguments passed and perform actions based on them.
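A minimal sketch of that Lambda handler in Python (boto3); the job name and argument keys are illustrative choices:
import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "my-general-glue-job"  # placeholder job name

def handler(event, context):
    # Pull bucket and key from the S3 event notification
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Pass the object details to the Glue job as job arguments
    glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={
            "--s3_bucket": bucket,
            "--s3_key": key,
        },
    )
Inside the Glue job, the arguments can then be read with awsglue.utils.getResolvedOptions(sys.argv, ["s3_bucket", "s3_key"]).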

AWS Services: Use a CloudWatch S3 put event to trigger a Batch job with the file URL as an environment variable

Objective:
Whenever an object is stored in the bucket, trigger a batch job (AWS Batch) and pass the uploaded file's URL as an environment variable.
Situation:
I currently have everything set up. I've got the S3 bucket with CloudWatch triggering Batch jobs, but I am unable to get the full file URL or to set environment variables.
I have followed the following tutorial: https://docs.aws.amazon.com/batch/latest/userguide/batch-cwe-target.html "To create an AWS Batch target that uses the input transformer".
The job is created and processed in AWS Batch, and under the job details, I can see the parameters received are:
S3bucket: mybucket
S3key: view-0001/custom/2019-08-07T09:40:04.989384.json
But the environment variables have not changed, and the file URL does not contain all the other parameters such as access and expiration tokens.
I have also not found any information about what other variables can be used in the input transformer. If anyone has a link to a manual, it would be welcome.
Also, the AWS CLI documentation shows that it is possible to set environment variables when submitting a job, so I guess it should be possible here as well? https://docs.aws.amazon.com/cli/latest/reference/batch/submit-job.html
So the question is: how do I submit a job with the file URL as an environment variable?
You could accomplish this by triggering a Lambda function off the bucket, generating a pre-signed URL in the Lambda function, and starting a Batch job from the Lambda function (see the sketch at the end of this answer).
However, a better approach would be to simply access the file within the Batch job using the bucket and key. You could use the AWS SDK for your language or simply use the awscli. For example, you could download the file:
aws s3 cp s3://$BUCKET/$KEY /tmp/file.json
On the other hand, if you need a pre-signed URL outside of the Batch job, you could generate one with the AWS SDK or the awscli:
aws s3 presign s3://$BUCKET/$KEY
With either of these approaches to accessing the file within the Batch job, you will need to configure the instance role of your Batch compute environment with IAM access to your S3 bucket.
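A minimal sketch of the first approach in Python (boto3), where a Lambda triggered by the bucket generates a pre-signed URL and submits the Batch job; the job name, queue, and definition are placeholders:
import boto3

s3 = boto3.client("s3")
batch = boto3.client("batch")

def handler(event, context):
    # Bucket and key of the uploaded object, taken from the S3 event notification
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Pre-signed URL, valid for one hour, that includes the access and expiration tokens
    file_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,
    )

    # Placeholders: substitute your own job name, queue, and job definition
    batch.submit_job(
        jobName="process-upload",
        jobQueue="my-job-queue",
        jobDefinition="my-job-definition",
        containerOverrides={
            "environment": [{"name": "FILE_URL", "value": file_url}]
        },
    )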

How to access newly added file in S3 bucket using python

I am new to AWS. I access files in S3 using Python, and I need to access newly added files in the bucket instead of reading every object already present in the bucket. Can anyone help me resolve this issue? Thank you.
Take a look at S3 event notifications. You can have notifications for all new files sent to an SQS queue and have your Python program read from the queue to get the new files; there is no way to query S3 directly for new files. Alternatively, you could create a Lambda function to operate on new files, depending on what you need to do with them.
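A minimal sketch of that polling loop in Python (boto3), assuming the queue URL below is yours and the queue is subscribed to the bucket's ObjectCreated notifications:
import json
import boto3

sqs = boto3.client("sqs")

# Assumption: this queue receives the bucket's s3:ObjectCreated:* notifications
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-files-queue"

while True:
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        # Each S3 notification can carry one or more records
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object: s3://{bucket}/{key}")
        # Delete the message once processed
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])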
You can create a Lambda with your business logic in Python, and then create an event notification in S3:
Enter the S3 bucket
Select the "Properties" tab
Advanced settings section -> Events
New Event
Important: you must have the Lambda created already.

Does AWS S3 have a concept of files being 'updated'?

I'd like to write a Lambda function that is triggered when files are added to or modified in an S3 bucket, processes them, and moves them elsewhere, clobbering older versions of the files.
I'm wondering if AWS Lambda can be configured to trigger when files are updated?
After reviewing the Boto3 documentation for S3, it looks like the only things that can happen in an S3 bucket are creations and deletions.
Additionally, the AWS documentation seems to indicate there is no way to trigger things on 'updates' to S3.
Am I correct in thinking there is no real concept of an 'update' to a file in S3 and that an update would actually be when something was destroyed and recreated? If I'm mistaken, how can I trigger a Lambda function when an S3 file is changed in a bucket?
No, there is no concept of updating a file on S3. A file on S3 is updated the same way it is uploaded in the first place - through a PUT object request. (Relevant answer here.) An S3 bucket notification configured to trigger on a PUT object request can execute a Lambda function.
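A minimal sketch of such a handler in Python, assuming the bucket notification is configured for s3:ObjectCreated:Put events; the destination bucket name is a placeholder, and the copy_object call stands in for whatever processing you need:
import boto3

s3 = boto3.client("s3")

DEST_BUCKET = "my-processed-bucket"  # placeholder destination bucket

def handler(event, context):
    # Fires for both brand-new uploads and overwrites, since both are PUTs
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Move the object elsewhere, clobbering any older copy at the destination
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )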
There is now new functionality for S3 buckets: under Properties you can enable versioning for the bucket. If you then set an object-created trigger on S3 assigned to your Lambda function, it will execute every time you 'update' the same file, since each update creates a new version.