I have a system where, after a file is uploaded to S3, a Lambda function publishes a queue message, and I use that message to maintain a list of keys in a MySQL table.
I am trying to generate a pre-signed URL based on the records in my table.
I currently have two records:
/41jQnjTkg/thumbnail.jpg
/41jQnjTkg/Artist+-+Song.mp3
I am generating the pre-signed URL using:
var params = {
  Bucket: bucket,                  // bucket that received the upload
  Expires: Settings.UrlGetTimeout, // URL validity in seconds
  Key: record                      // key exactly as stored in the MySQL table
};
S3.getSignedUrl('getObject', params);
The URL with thumbnail.jpg works perfectly fine, but the one with +-+ fails. The original file name on local disk was "Artist - Song.mp3". S3 replaced spaces with '+'. Now when I am generating a URL using the exact same filename that S3 uses, it doesn't work; I get a "Specified Key doesn't exist" error from S3.
What must I do to generate URLs consistently for all filenames?
I solved this after a little experimentation.
Instead of directly storing the key that S3 provides in its event message, I first replace the '+' characters with spaces (as the filenames appear on disk) and then URL-decode the result.
// '+' in the event key represents a space; the rest are normal URL escapes (e.g. %2B)
return decodeURIComponent(str.replace(/\+/g, " "));
Now generating a S3 Pre-Signed URL works as expected.
Before, MySQL had the following records:
/41jQnjTkg/thumbnail.jpg
/41jQnjTkg/Artist+-+Song.mp3
Now:
/41jQnjTkg/thumbnail.jpg
/41jQnjTkg/Artist - Song.mp3
I personally feel there is an inconsistency in S3's API/event messages.
Had I generated a signed URL directly using the key that S3 itself provided in the SQS event message, it wouldn't have worked. One must do this string replacement and URL decoding on the key in order to get a proper working URL.
Not sure if this is by design or a bug.
The second file's name is coming to you form-urlencoded. The + is actually a space, and if you had other characters (like parentheses) they would be percent-escaped. You need to run your data through a URL decoder before working with it further.
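As a minimal sketch of that decoding step, assuming the standard S3 event notification shape and a hypothetical handler that writes to MySQL:

// Sketch: decode the key from each S3 event record before storing it.
// record.s3.object.key arrives URL-encoded, with spaces turned into '+'.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const rawKey = record.s3.object.key;                        // e.g. "41jQnjTkg/Artist+-+Song.mp3"
    const key = decodeURIComponent(rawKey.replace(/\+/g, " ")); // "41jQnjTkg/Artist - Song.mp3"
    // ... store key (not rawKey) in the MySQL table
  }
};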
Side-note: if the only thing your Lambda function does is create an SQS message, you can do that directly from S3 without writing your own function.
Related
I'm using an AWS Lambda function to create a file and save it to my bucket on S3, and it is working fine. After executing the putObject method, I get a data object, but it only contains an ETag of the recently added object.
s3.putObject(params, function(err, data) {
// data only contains Etag
});
I need to know the exact URL that I can use in a browser so the client can see the file. The folder has been already made public and I can see the file if I copy the Link from the S3 console.
I tried using getSignedUrl but the URL it returns is used for other purposes, I believe.
Thanks!
The SDKs do not generally contain a convenience method to create a URL for publicly-readable objects. However, when you called PutObject, you provided the bucket and the object's key and that's all you need. You can simply combine those to make the URL of the object, for example:
https://bucket.s3.amazonaws.com/key
So, for example, if your bucket is pablo and the object key is dogs/toto.png, use:
https://pablo.s3.amazonaws.com/dogs/toto.png
Note that S3 keys do not begin with a / prefix. A key is of the form dogs/toto.png, and not /dogs/toto.png.
For region-specific buckets, see Working with Amazon S3 Buckets and AWS S3 URL Styles. Replace s3.amazonaws.com with s3.<region>.amazonaws.com or s3-<region>.amazonaws.com in the above URLs, for example:
https://seamus.s3.eu-west-1.amazonaws.com/dogs/setter.png (with dot)
https://seamus.s3-eu-west-1.amazonaws.com/dogs/setter.png (with dash)
If you are using IPv6, then the general URL form will be:
https://BUCKET.s3.dualstack.REGION.amazonaws.com
For some buckets, you may use the older path-style URLs. Path-style URLs are deprecated and only work with buckets created on or before September 30, 2020. They are used like this:
https://s3.amazonaws.com/bucket/key
https://s3.amazonaws.com/pablo/dogs/toto.png
https://s3.eu-west-1.amazonaws.com/seamus/dogs/setter.png
https://s3.dualstack.REGION.amazonaws.com/BUCKET
Currently there are TLS and SSL certificate issues that may require some buckets with dots (.) in their name to be accessed via path-style URLs. AWS plans to address this. See the AWS announcement.
Note: there is general guidance on object keys where certain characters need special handling. For example, a space is encoded as + (plus sign) and a plus sign is encoded as %2B.
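For illustration, a small sketch of building such a URL yourself while encoding the key (the bucket and region values are placeholders, and the helper name is my own):

// Build a public virtual-hosted-style URL for an object, assuming the object
// is publicly readable. Encode each key segment but keep the '/' separators.
function publicObjectUrl(bucket, region, key) {
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

// publicObjectUrl('seamus', 'eu-west-1', 'dogs/irish setter.png')
//   => "https://seamus.s3.eu-west-1.amazonaws.com/dogs/irish%20setter.png"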
In case you have the s3Bucket and fileName objects and want to build the URL, here is an option:
// Derive the object URL from an S3 client that was constructed with
// { params: { Bucket: ... }, region: ... }.
function getUrlFromBucket(s3Bucket, fileName) {
  const { config: { params, region } } = s3Bucket;
  // us-east-1 uses the plain s3.amazonaws.com endpoint; other regions get s3-<region>
  const regionString = region.includes('us-east-1') ? '' : '-' + region;
  return `https://${params.Bucket}.s3${regionString}.amazonaws.com/${fileName}`;
}
You can make another call like this:
var params = {Bucket: 'bucket', Key: 'key'};

// Note: 'putObject' returns a pre-signed upload URL; use 'getObject' for a download URL.
s3.getSignedUrl('putObject', params, function (err, url) {
  console.log('The URL is', url);
});
I'm currently building a Lambda function; an IoT trigger passes it an event['key'] value, and based on that value it should update the index.html that is stored in an S3 bucket. For example, if event['key'] = 'Yes', the HTML should display the string 'hi'.
I'm not quite sure how to update the HTML, since I'm fairly new to AWS. I know there's an API with that functionality, but I can't seem to find it. putObject seems fairly close, but it's not quite what I'm looking for, since I need to update a string value inside the HTML. Is there any way to do this?
Details can vary based on the environment/stack you are using to write your Lambda.
If you want to update your index.html file located in S3 based on an (IoT or any other) trigger, your Lambda needs to getObject (read that file from S3), modify the content (by a simple find-and-replace or by more advanced parsing, traversing, and DOM manipulation), and putObject it back to S3. A rough sketch of that flow is below.
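A minimal sketch, assuming AWS SDK v2, a placeholder bucket name, and a marker comment in the HTML to replace (all of which are illustrative assumptions, not part of the question):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const BUCKET = 'my-site-bucket'; // placeholder bucket name

exports.handler = async (event) => {
  // 1. Read the current index.html from S3
  const obj = await s3.getObject({ Bucket: BUCKET, Key: 'index.html' }).promise();
  let html = obj.Body.toString('utf-8');

  // 2. Modify the content based on the trigger payload
  if (event['key'] === 'Yes') {
    html = html.replace('<!--STATUS-->', 'hi'); // assumes the page contains this marker
  }

  // 3. Write the updated file back to S3
  await s3.putObject({
    Bucket: BUCKET,
    Key: 'index.html',
    Body: html,
    ContentType: 'text/html'
  }).promise();
};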
So I have this issue: whenever I try to post a file to the S3 bucket with a pre-signed URL, the metadata key is being forced to lowercase.
I've looked at the pre-signed URL, and the lowercase key is already set when the URL is generated. I'm wondering why, and how do I solve this issue?
I've tried creating a key-value pair manually on a file in the S3 bucket, and there I clearly can set a key with capital letters.
const params = {
  Bucket: 'buckets3',
  Key: 'hoho-fileUpload-' + uuid.v4(),
  Metadata: {"FooBar": "FooBar"},
  Expires: 600
};
Current output in S3:
x-amz-meta-foobar: FooBar
Desired output:
x-amz-meta-FooBar: FooBar
There is nothing you can do; AWS stores S3 metadata keys in lowercase.
User-defined metadata is a set of key-value pairs. Amazon S3 stores user-defined metadata keys in lowercase.
From: Object Meta Data (scroll to the bottom; it's the paragraph just above the Note).
On top of WaltDe's answer, I'd recommend converting your metadata keys to kebab-case when sending to AWS S3 (foo-bar), so you can convert them back to PascalCase in your Lambda, or wherever you consume the metadata.
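A small sketch of that round trip (the helper names are my own, not an AWS API):

// PascalCase -> kebab-case before uploading (S3 will lowercase the key anyway)
function toKebabCase(key) {
  return key.replace(/([a-z0-9])([A-Z])/g, '$1-$2').toLowerCase();
}

// kebab-case -> PascalCase when reading the metadata back
function toPascalCase(key) {
  return key.split('-').map(p => p.charAt(0).toUpperCase() + p.slice(1)).join('');
}

// toKebabCase('FooBar')   => 'foo-bar'  (stored as x-amz-meta-foo-bar)
// toPascalCase('foo-bar') => 'FooBar'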
I'm using S3 to backup large files that are critical to my business. Can I be confident that once uploaded, these files are verified for integrity and are intact?
There is a lot of documentation around scalability and availability but I couldn't find any information talking about integrity and/or checksums.
When uploading to S3, there's an optional request header (which in my opinion should not be optional, but I digress), Content-MD5. If you set this value to the base64 encoding of the MD5 hash of the request body, S3 will outright reject your upload in the event of a mismatch, thus preventing the upload of corrupt data.
The ETag header will be set to the hex-encoded MD5 hash of the object, for single part uploads (with an exception for some types of server-side encryption).
For multipart uploads, the Content-MD5 header works the same way, but it is supplied separately for each part.
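As an illustrative sketch (assuming the Node.js SDK v2, not the answerer's own tooling), here is how Content-MD5 can be supplied for a single-part upload and the returned ETag compared to the local hex MD5:

const crypto = require('crypto');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function putWithMd5(bucket, key, body) {
  const md5 = crypto.createHash('md5').update(body).digest();

  const res = await s3.putObject({
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentMD5: md5.toString('base64')   // S3 rejects the upload if the body doesn't match
  }).promise();

  // For single-part uploads (barring some server-side encryption types),
  // the ETag is the hex MD5 of the object, wrapped in quotes.
  if (res.ETag !== `"${md5.toString('hex')}"`) {
    throw new Error('ETag does not match local MD5');
  }
  return res;
}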
When S3 combines the parts of a multipart upload into the final object, the ETag header is set to the hex-encoded MD5 hash of the concatenated binary-encoded (raw bytes) MD5 hashes of each part, followed by a - (hyphen) and the number of parts.
When you ask S3 to do that final step of combining the parts of a multipart upload, you have to give it back the ETags it gave you during the uploads of the original parts, which is supposed to assure that what S3 is combining is what you think it is combining. Unfortunately, there's an API request you can make to ask S3 about the parts you've uploaded, and some lazy developers will just ask S3 for this list and then send it right back, which the documentation warns against, but hey, it "seems to work," right?
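For reference, a sketch of reproducing that multipart ETag locally from a file (the part size and the synchronous file read are assumptions for brevity):

const crypto = require('crypto');
const fs = require('fs');

// Expected multipart ETag: hex MD5 of the concatenated raw MD5s of each part,
// followed by '-' and the part count.
function expectedMultipartETag(filePath, partSizeBytes) {
  const data = fs.readFileSync(filePath);
  const partDigests = [];

  for (let offset = 0; offset < data.length; offset += partSizeBytes) {
    const part = data.subarray(offset, offset + partSizeBytes);
    partDigests.push(crypto.createHash('md5').update(part).digest()); // raw bytes
  }

  const combined = crypto.createHash('md5')
    .update(Buffer.concat(partDigests))
    .digest('hex');

  return `${combined}-${partDigests.length}`;
}

// e.g. expectedMultipartETag('backup.tar.gz', 8 * 1024 * 1024) for 8 MiB parts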
Multipart uploads are required for objects over 5GB and optional for uploads over 5MB.
Correctly used, these features provide assurance of intact uploads.
If you are using Signature Version 4, which is also optional in older regions, there is an additional integrity mechanism, and this one isn't optional (if you're actually using V4): uploads must have a request header x-amz-content-sha256, set to the hex-encoded SHA-256 hash of the payload, and the request will be denied if there's a mismatch here, too.
My take: Since some of these features are optional, you can't trust that any tools are doing this right unless you audit their code.
I don't trust anybody with my data, so for my own purposes, I wrote my own utility, internally called "pedantic uploader," which uses no SDK and speaks directly to the REST API. It calculates the sha256 of the file and adds it as x-amz-meta-... metadata so it can be fetched with the object for comparison. When I upload compressed files (gzip/bzip2/xz) I store the sha of both compressed and uncompressed in the metadata, and I store the compressed and uncompressed size in octets in the metadata as well.
Note that Content-MD5 and x-amz-content-sha256 are request headers. They are not returned with downloads. If you want that information available later, you have to save it yourself in the object metadata, as I described above.
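A minimal sketch of that approach (the metadata key name is an arbitrary choice, not the answerer's actual tool):

const crypto = require('crypto');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Upload with the payload's SHA-256 stored as user-defined metadata...
async function uploadWithSha256(bucket, key, body) {
  const sha256 = crypto.createHash('sha256').update(body).digest('hex');
  await s3.putObject({
    Bucket: bucket,
    Key: key,
    Body: body,
    Metadata: { 'content-sha256': sha256 }   // stored as x-amz-meta-content-sha256
  }).promise();
}

// ...then verify it after downloading.
async function verifySha256(bucket, key) {
  const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const actual = crypto.createHash('sha256').update(obj.Body).digest('hex');
  return actual === obj.Metadata['content-sha256'];
}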
Within EC2, you can easily download an object without actually saving it to disk, just to verify its integrity. If the EC2 instance is in the same region as the bucket, you won't be billed for data transfer if you use an instance with a public IPv4 or IPv6 address, a NAT instance, an S3 VPC endpoint, or through an IPv6 egress gateway. (You'll be billed for NAT Gateway data throughput if you access S3 over IPv4 through a NAT Gateway). Obviously there are ways to automate this, but manually, if you select the object in the console, choose Download, right-click and copy the resulting URL, then do this:
$ curl -v '<url from console>' | md5sum # or sha256sum etc.
Just wrap the URL from the console in single ' quotes since it will be pre-signed and will include & in the query string, which you don't want the shell to interpret.
You can perform an MD5 checksum locally, and then verify that against the MD5 checksum of the object on S3 to ensure data integrity. Here is a guide
I am attempting to verify an S3 PutObject using a local hash implementation as follows:
// Compute the lowercase hex MD5 of the request body, to compare with the ETag
MD5 md5 = MD5.Create();
byte[] inputBytes = Encoding.ASCII.GetBytes(str);
byte[] hash = md5.ComputeHash(inputBytes);

StringBuilder sb = new StringBuilder();
foreach (byte byt in hash)
{
    sb.Append(byt.ToString("X2"));
}
return sb.ToString().ToLower();
From the AWS documentation, this should match the PutObjectResponse.ETag property, which is based on the content body (not the metadata) of my PUT request. In this case I am posting a JSON document, which is the source of my hash.
All is fine except when I use AWS-managed KMS server-side encryption, at which point my hashes do not match. Is it not possible to verify the content body against what was posted, since it appears the ETag hash is based on the encrypted content body, not the original PUT content?
It is not possible to verify the integrity of the content/body using the ETag, because, as documented, the ETag no longer matches the expected value when using SSE with KMS keys.
However, you can prevent S3 from accepting an incomplete or corrupted upload by sending a Content-MD5 header with the upload. The value you supply will be the binary (not hex) MD5 of the payload, encoded in base64, and not URL-escaped. If the payload doesn't match this hash, S3 won't accept the upload or store the object, and will return an error.
The SDK you are using might do this for you automatically, or you may be able to enable it; I don't know, since I use my own written-from-scratch libraries for all AWS service interactions.