I'm looking to allow multiple clients to upload files to an S3 bucket (or buckets). The S3 create event triggers a notification that adds a message to an SNS topic. This works, but I'm having trouble deciding how to identify which client uploaded the file. I could get this to work by explicitly checking the uploaded file's subfolder/S3 key, but I'd much rather automatically add the client identifier as an attribute to the SNS message.
Is this possible? My other thought is using a Lambda function as a middle man to add the attribute and pass it along to the SNS Topic, but again I'd like to do it without the Lambda function if possible.
The Event Message Structure sent from S3 to SNS includes a field:
"userIdentity":{
"principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
},
However, this also depends upon the credentials that were used when the object was uploaded:
If users have their individual AWS credentials, then the Access Key will be provided
If you are using a pre-signed URL to permit the upload, then the Access Key will belong to the credentials that generated the pre-signed URL, and your application (which generated the pre-signed URL) would be responsible for tracking the user who requested the upload
If you are generating temporary credentials for each client (e.g. by calling AssumeRole), then the Role's ID will be returned
(I didn't test all the above cases, so please do test them to confirm the definition of Amazon-customer-ID-of-the-user-who-caused-the-event.)
If your goal is to put your own client identifier in the message, then the best method would be:
Configure the event notification to trigger a Lambda function
Your Lambda function uses the above identifier to determine which of your application's users triggered the notification (presumably by consulting a database of application user information)
The Lambda function sends the message to SNS or to whichever system you wish to receive the message (SNS might not be required if you send directly); see the sketch below
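As a minimal sketch of those steps in Python, assuming a hypothetical principal-to-client lookup table and an example topic ARN (neither of which comes from your setup):
import json
import boto3

sns = boto3.client('sns')
TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:uploads'  # placeholder ARN

# Hypothetical mapping from the event's principalId to your own client identifier
PRINCIPAL_TO_CLIENT = {'AIDAEXAMPLEPRINCIPAL': 'client-42'}

def handler(event, context):
    for record in event['Records']:
        principal = record['userIdentity']['principalId']
        client_id = PRINCIPAL_TO_CLIENT.get(principal, 'unknown')
        # Forward the S3 record to SNS with the client identifier as a message attribute
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=json.dumps(record),
            MessageAttributes={
                'clientId': {'DataType': 'String', 'StringValue': client_id}
            }
        )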
You can add user-defined metadata to your files before you upload the file like below:
private final static String CLIENT_ID = "client-id";
ObjectMetadata meta = new ObjectMetadata();
meta.addUserMetadata(CLIENT_ID, "testid");
s3Client.putObject(<bucket>, <objectKey>, <inputstream of the file>, meta);
Then when downloading the S3 files:
ObjectMetadata meta = s3Client.getObjectMetadata(<bucket>, <objectKey>);
String clientId = meta.getUserMetaDataOf(CLIENT_ID);
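If you happen to be using Python rather than the Java SDK, the same idea with boto3 would look roughly like this (bucket, key and file names are placeholders):
import boto3

s3 = boto3.client('s3')

# Upload with user-defined metadata (stored by S3 as x-amz-meta-client-id)
with open('file.bin', 'rb') as f:
    s3.put_object(Bucket='my-bucket', Key='my-key', Body=f,
                  Metadata={'client-id': 'testid'})

# Read it back later without downloading the object itself
head = s3.head_object(Bucket='my-bucket', Key='my-key')
client_id = head['Metadata']['client-id']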
Hope this is what you are looking for.
I use the AWS S3 C++ SDK and have the following scenario:
1. I get some information sent from a client to my server (the client wants to download from S3)
2. With the information sent I create an S3 key
3. I want to check if the key exists (has a file) on S3
4. I create a pre-signed URL that allows the client to download the file from S3
5. Send the URL to the client
6. The client downloads the file
Before I execute step 4 I want to check if the key really exists on S3; the client can't download a file that does not exist anyway.
I have an AWS::S3Client object. Do I really need to create a TransferManager for this or is there a simple way to handle this with the client object?
The client itself does not have a relation to S3 so I can't check it there. The server has to do all the work.
I found a working solution:
#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/HeadObjectRequest.h>

auto client = Aws::MakeShared<Aws::S3::S3Client>("client", getCredentials(), getClientConfig());
Aws::S3::Model::HeadObjectRequest request;
request.WithBucket(<bucketname>).WithKey(<s3key>);
const auto response = client->HeadObject(request);
response.IsSuccess(); // true if the key exists on S3
Issue an authenticated HTTP HEAD request against the object. You can use:
HeadObject
HeadObjectAsync
To quote:
The HEAD operation retrieves metadata from an object without returning the object itself. This operation is useful if you're only interested in an object's metadata. To use HEAD, you must have READ access to the object.
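The same HEAD-based existence check is available in the other SDKs as well; for comparison, a minimal boto3 (Python) version might look like this (bucket and key are placeholders):
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def key_exists(bucket, key):
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == '404':
            return False
        raise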
Specifically, in a function triggered on the origin response (e.g. with a 404 status), how can I read an HTML file stored in S3 and use its content for the response body?
(I would like to manually return a custom error page just as CloudFront does, but choose it based on cookies).
NOTE: The HTML file in S3 is stored in the same bucket of my website. OAI Enabled.
Thank you very much!
Lambda@Edge functions don't currently¹ have direct access to any body content from the origin.
You will need to grant your Lambda Execution Role the necessary privileges to read from the bucket, and then use s3.getObject() from the JavaScript SDK to fetch the object from the bucket, then use its body.
The SDK is already in the environment,² so you don't need to bundle it with your code. You can just require it, and create the S3 client globally, outside the handler, which saves time on subsequent invocations.
'use strict';
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-2' }); // use the correct region for your bucket
exports.handler ...
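// A minimal sketch of how the handler might continue; the bucket name, object
// key, and the 404-only handling below are assumptions for illustration, not
// part of the original answer.
exports.handler = async (event) => {
    const response = event.Records[0].cf.response;

    if (response.status === '404') {
        // Fetch the custom error page from the bucket (placeholder names)
        const data = await s3.getObject({
            Bucket: 'my-website-bucket',
            Key: 'errors/404.html'
        }).promise();

        // Replace the body of the origin response with the page content
        response.status = '404';
        response.statusDescription = 'Not Found';
        response.body = data.Body.toString('utf-8');
        response.headers['content-type'] = [{ key: 'Content-Type', value: 'text/html' }];
    }

    return response;
};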
Note that one of the perceived hassles of updating a Lambda@Edge function is that the Lambda console gives the impression that redeploying it is annoyingly complicated... but you don't have to use the Lambda console to do this. The wording of the "enable trigger and replicate" checkbox gives you the impression that it's doing something important, but it turns out... it isn't. Changing the version number in the CloudFront configuration and saving the changes accomplishes the same purpose.
After you create a new version of the function, you can simply go to the Cache Behavior in the CloudFront console and edit the trigger ARN to use the new version number, then save changes.
¹ Currently. I have submitted this as a feature request; it could potentially allow a response trigger to receive a copy of the response body and rewrite it. It would necessarily be limited to the maximum payload size of the Lambda API (or smaller, as generated responses are currently limited), and might not be applicable in this case, since I assume you may be fetching a language-specific response.
² Already in the environment. If I remember right, long ago Lambda@Edge didn't include the SDK, but it is always there now.
I am creating temporary credentials via AWS Security Token Service (AWS STS) and using these credentials to upload a file to S3 with the S3 Java SDK.
I need some way to restrict the size of the file upload.
I was trying to add a policy (with s3:content-length-range) while creating the user, but that doesn't seem to work.
Is there any other way to specify the maximum file size a user can upload?
An alternative method would be to generate a pre-signed URL instead of temporary credentials. It will be valid for one file with a name you specify, and you can also enforce a content-length range when you generate the URL. Your user gets the URL and has to use a specific method (POST/PUT/etc.) for the request. They set the content while you set everything else.
I'm not sure how to do that with Java (it doesn't seem to have support for conditions), but it's simple with Python and boto3:
import boto3

# Get the service client
s3 = boto3.client('s3')

# Keep everything posted private
fields = {"acl": "private"}

# Ensure that the ACL isn't changed and restrict the upload to a length
# between 10 and 100 bytes.
conditions = [
    {"acl": "private"},
    ["content-length-range", 10, 100]
]

# Generate the POST attributes (a dict with 'url' and 'fields')
post = s3.generate_presigned_post(
    Bucket='bucket-name',
    Key='key-name',
    Fields=fields,
    Conditions=conditions
)
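The returned dict contains the target URL and the form fields. As an illustration of how a client could then perform the upload (using the third-party requests library, with a placeholder file name):
import requests

with open('file.txt', 'rb') as f:
    # The POST must include all returned fields plus the file itself;
    # S3 responds with 204 No Content on success by default.
    http_response = requests.post(post['url'], data=post['fields'],
                                  files={'file': ('file.txt', f)})
print(http_response.status_code)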
When testing this, make sure every single field and header matches or you'll get vague Access Denied errors. It can take a while to match it completely.
I believe there is no way to limit the object size before uploading, and reacting to it afterwards is quite hard. A workaround would be to create an S3 event notification that triggers your code, through a Lambda function or an SNS topic. That code could validate or delete the object and notify the user, for example.
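A rough sketch of that workaround as a Lambda function (the size limit and the choice to simply delete the object are assumptions for illustration):
import boto3

s3 = boto3.client('s3')
MAX_SIZE = 100 * 1024 * 1024  # assumed limit: 100 MB

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        size = record['s3']['object']['size']
        if size > MAX_SIZE:
            # Object exceeds the limit: remove it (and notify the uploader if desired)
            s3.delete_object(Bucket=bucket, Key=key)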
AWS Lambda makes it possible to run code in response to events, such as the upload of a file to S3. However, the Lambda callback notifies the event invoker, not the user who initiated the event.
Consider the following scenario:
A user uploads a file to s3
That file is processed
User receives notification that the processing is complete
How would you do this with AWS lambda?
When uploading the file, add the email address or other identifier to the object as Object User-Defined Metadata.
When uploading an object, you can also assign metadata to the object. You provide this optional information as a name-value (key-value) pair when you send a PUT or POST request to create the object. When uploading objects using the REST API, the optional user-defined metadata names must begin with "x-amz-meta-" to distinguish them from other HTTP headers. When you retrieve the object using the REST API, this prefix is returned. When uploading objects using the SOAP API, the prefix is not required. When you retrieve the object using the SOAP API, the prefix is removed, regardless of which API you used to upload the object.
When the Lambda function completes the file processing, it can read that same metadata, and send an appropriate notification to the user.
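For instance, a sketch of such a function in Python; the metadata key ('email'), the SES sender address, and the notification wording are all assumptions for illustration:
import boto3

s3 = boto3.client('s3')
ses = boto3.client('ses')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # ... process the file here ...

        # Read back the user-defined metadata attached at upload time
        head = s3.head_object(Bucket=bucket, Key=key)
        email = head['Metadata'].get('email')
        if email:
            ses.send_email(
                Source='noreply@example.com',  # assumed verified sender
                Destination={'ToAddresses': [email]},
                Message={
                    'Subject': {'Data': 'Processing complete'},
                    'Body': {'Text': {'Data': f'{key} has been processed.'}}
                }
            )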
I am considering moving to Lambdas, and after spending some time reading the docs and various blogs with user experiences I am still struggling with a simple question: is there a proposed/proper way to use Lambda with existing S3 files?
I have an S3 bucket that contains archived data spanning a couple of years. This data is rather large (hundreds of GB). Each file is a simple txt file, and each line in the file represents an event as a comma-separated string.
My endgame is to consume these files, parse each one of them line by line, apply some transformation, create batches of lines, and send them to an external service. From what I've read so far, if I write a proper Lambda, it will be triggered by an S3 event (for example the upload of a new file).
Is there a way to apply the lambda to all the existing contents of my bucket?
Thanks
For existing resources you would need to write a script that gets a listing of all your objects and sends each item to a Lambda function somehow. I'd probably look into sending the location of each of your existing S3 objects to a Kinesis stream and configuring a Lambda function to pull records from that stream and process them.
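A one-off backfill script along those lines might look like this sketch (the bucket name, stream name and record format are assumptions):
import json
import boto3

s3 = boto3.client('s3')
kinesis = boto3.client('kinesis')
STREAM_NAME = 'existing-s3-objects'  # assumed stream name

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-archive-bucket'):
    for obj in page.get('Contents', []):
        # One record per existing object; the Lambda consumer reads bucket/key from it
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps({'bucket': 'my-archive-bucket', 'key': obj['Key']}),
            PartitionKey=obj['Key']
        )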
Try using s3cmd.
s3cmd modify --recursive --add-header="touched:touched" s3://path/to/s3/bucket-or-folder
This will modify the objects' metadata in place and invoke a create event for Lambda.
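If you would rather do the same thing from code than via s3cmd, an in-place copy that replaces the metadata should fire the same ObjectCreated event (assuming your trigger is configured for ObjectCreated:*); a boto3 sketch with a placeholder bucket name:
import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'  # placeholder

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get('Contents', []):
        # Copy each object onto itself with replaced metadata to re-trigger the event
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': BUCKET, 'Key': obj['Key']},
            Metadata={'touched': 'touched'},
            MetadataDirective='REPLACE'
        )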
I had a similar problem and solved it with minimal changes to my existing Lambda function. The solution involves creating an API Gateway trigger (in addition to the S3 trigger): the API Gateway trigger is used to process historical files in S3, while the regular S3 trigger processes my files as new files are uploaded to my S3 bucket.
Initially I built my function to expect an S3 event as the trigger. Recall that S3 events have this structure, so I would look for the S3 bucket name and key to process, like so:
import os
import tempfile
from urllib.parse import unquote_plus
import boto3

s3_client = boto3.client('s3')

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'], encoding='utf-8')
    temp_dir = tempfile.TemporaryDirectory()
    video_filename = os.path.basename(key)
    local_video_filename = os.path.join(temp_dir.name, video_filename)
    s3_client.download_file(bucket, key, local_video_filename)
But when the request comes in via the API Gateway trigger, there is no "Records" object in the request/event. You can use query parameters in the API Gateway trigger, so the modification required to the above snippet of code is:
if 'Records' in event:
    # this means we are working off of an S3 event
    records_to_process = event['Records']
else:
    # this is for ad-hoc posts via the API Gateway trigger for Lambda
    records_to_process = [{
        "s3": {"bucket": {"name": event["queryStringParameters"]["bucket"]},
               "object": {"key": event["queryStringParameters"]["file"]}}
    }]

for record in records_to_process:
    # the lines below are the same as in the earlier snippet
    bucket = record['s3']['bucket']['name']
    key = unquote_plus(record['s3']['object']['key'], encoding='utf-8')
    temp_dir = tempfile.TemporaryDirectory()
    video_filename = os.path.basename(key)
    local_video_filename = os.path.join(temp_dir.name, video_filename)
    s3_client.download_file(bucket, key, local_video_filename)
(Screenshot: Postman result of sending the POST request)
Try copying your bucket content and catching the create events with Lambda.
copy:
s3cmd sync s3://from/this/bucket/ s3://to/this/bucket
for larger buckets:
https://github.com/paultuckey/s3_bucket_to_bucket_copy_py