I have an iOS app that calls an AWS Lambda function. I would like the Lambda function to fetch some files from a server and send them back to the iOS app, through the Lambda proxy feature.
I am invoking the Lambda function directly from my app using the generated SDK. I couldn't find any documentation explaining how to exchange data other than JSON-encoded requests.
How should I go about this?
Lambda's only interface to the outside world is JSON.
To return text data from Lambda, you have to return it as a string, and deserialize the string from the JSON response.
To return binary data from Lambda, the data must first be transformed (encoded) using an encoding that can never produce a byte sequence that is not also a valid sequence of UTF-8 characters, because JSON cannot serialize non-character data (and not every possible combination of bytes corresponds to a valid character or characters). The common strategy for this is base-64 encoding. Base-64 losslessly converts a series of bytes (octets) into a different series of bytes that are always valid 7-bit ASCII characters, coding 6 bits of input per output byte (so the encoded form is 4/3 the size of the original). You would then need to decode this data back from base-64 to binary.
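A minimal sketch of that round trip in Python (the file path and field names are illustrative assumptions, not anything the Lambda runtime requires):

import base64

def lambda_handler(event, context):
    # Hypothetical example: read some binary content; the JSON response
    # cannot carry these raw bytes directly.
    with open('/tmp/report.pdf', 'rb') as f:
        raw = f.read()
    # base64-encode so the payload is plain ASCII, safe inside JSON
    return {
        'filename': 'report.pdf',
        'data': base64.b64encode(raw).decode('ascii'),
    }

# On the client, reverse the transformation:
# raw = base64.b64decode(response['data'])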
You can do this decoding of JSON and base-64 on the client, but there are also a couple of other options, if you don't like that idea.
Both API Gateway and CloudFront's Lambda@Edge feature provide a built-in capability to convert the base-64 payload (from the JSON Lambda response) back to binary, if you don't want to do it on the client.
API Gateway supports all Lambda runtimes, and expects this format...
"isBase64Encoded": true,
"body": "the-base-64-encoded-body",
Lambda@Edge supports only Node.js Lambda functions, but is less expensive than API Gateway. It expects a base-64 response to include...
"body": "the-base-64-encoded-body",
"bodyEncoding": "base64",
The viability of either approach depends on your security needs. API Gateway supports authentication via IAM as well as other mechanisms. CloudFront + Lambda@Edge doesn't support IAM auth, but can be used with CloudFront signed URLs, Cognito, or other custom authorization mechanisms.
If the "files" you mentioned are coming from a server, API Gateway can also proxy these files directly from the server, without a Lambda function handling the content (though depending on your security needs, a Lambda Custom Authorizer might be desirable, to authenticate the request but then simply tell API Gateway to allow the request to be forwarded to the backend).
Or, if the files are objects in S3, you can access S3 directly, similar to the way you are accessing Lambda now.
In the scenario of listing all versions of an object using its key as a prefix:
import boto3
bucket = 'bucket name'
key = 'key'
s3 = boto3.resource('s3')
versions = s3.Bucket(bucket).object_versions.filter(Prefix=key)
for version in versions:
    obj = version.get()
    print(obj.get('VersionId'), obj.get('ContentLength'), obj.get('LastModified'))
Do I get charged only for listing the objects that are matching the prefix?
If so, is each object/version listed treated as a separate list request?
No, each object/version listed is not treated as a separate list request. You're only paying for the API requests to S3 (at something like $0.005 per 1000 API requests). A single API request will return many (up to 1000) objects/versions that match the indicated prefix. The prefix filtering itself happens server-side in S3.
The way to get a handle on this is to understand that AWS SDK calls ultimately result in API requests to AWS service endpoints (e.g. the S3 API). What you need to do is work out how your SDK client requests map to the underlying API requests to determine what is likely happening.
If your request is a simple 'list objects in my bucket' case, the boto3 SDK is going to make one or more ListObjectsV2 API calls. I say "or more" because each API request yields a bounded number of results (e.g. up to 1000 objects per ListObjectsV2 response), so the SDK may need to paginate. If there are 2500 objects in the bucket, for example, then three ListObjectsV2 requests would need to be made to the S3 API.
If your request is 'list objects in my bucket with a given prefix', then you need to know what capabilities are present on the ListObjectsV2 API call. Importantly, prefix is one of the parameters. This is how you know that S3 itself is doing the filtering on your supplied prefix (where you have indicated .filter(Prefix=key) in your code). If this were not a feature of the underlying S3 API, then your SDK (boto3 etc.) would be the one doing the filtering on prefix, and that would be a much more expensive and vastly slower operation, because the SDK would have to list all objects, potentially making many more LIST requests, and filter them client-side. Note: the ListObjectVersions API is similar to ListObjectsV2 in this regard, and both support prefix.
Also, note that VersionId, Size, and LastModified are all attributes that appear in the ListObjectVersions response, so no further API requests are needed to fetch this information. Be aware, though, that calling version.get() as your loop does triggers a separate GetObject request (billed as a GET) for each version; if you only need these attributes, you can read them directly off the listed versions.
So, in your case, assuming that there are fewer than 1000 object versions that match your indicated prefix, I believe that this equates to one S3 API request to ListObjectVersions (and this is considered a LIST request rather than a GET request for billing afaik, even though it is a GET HTTP request to https://mybucket.s3.amazonaws.com/?versions under the covers).
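For illustration, a minimal sketch (assuming the same bucket and key placeholders as your snippet) that reads those attributes straight off the listed versions, so only the LIST request is made:

import boto3

bucket = 'bucket name'
key = 'key'
s3 = boto3.resource('s3')

# The collection is populated from the ListObjectVersions response,
# so no per-version GetObject calls are needed.
for version in s3.Bucket(bucket).object_versions.filter(Prefix=key):
    print(version.id, version.size, version.last_modified)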
Is it possible to access the raw URL using AWS API Gateway (and Lambda)?
Alternatively, is it possible to access the original, undecoded query string parameters?
We are integrating against a third-party service that calls our API and encodes the query string params as Windows-1252. (E.g. the Finnish letter Ä is encoded as %C4 instead of %C3%84.) API Gateway seems to automatically decode the query string parameters assuming UTF-8, which means that Ä (and Ö and Å) result in \ufffd.
For reference: https://www.w3schools.com/tags/ref_urlencode.asp
Damn, it really doesn't look possible...
I started off writing how you can use Lambda Proxy Integration with event.queryStringParameters, but that parses the data into a key-value object.
Then I went down the road of Mapping Templates in API Gateway, but again there doesn't seem to be any property that exposes the whole query string.
As much as I didn't want it to be true, I can only conclude that it is not possible...
I think your best option is to have the client base64-encode the parameter and send it as a bare query string key, then decode it in the Lambda function, reading it with Object.keys(event.queryStringParameters)[0].
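A minimal sketch of the Lambda side in Python (the URL-safe alphabet and the padding trick are my assumptions; standard base64's + and = characters don't survive a query string intact):

import base64

def lambda_handler(event, context):
    # Hypothetical: the caller URL-safe-base64-encodes the raw Windows-1252
    # bytes and sends them as a bare key, e.g. GET /path?xNbF (no '=', so
    # the whole payload arrives as the single key).
    params = event.get('queryStringParameters') or {}
    encoded = next(iter(params))              # the one and only key
    encoded += '=' * (-len(encoded) % 4)      # restore stripped padding
    raw = base64.urlsafe_b64decode(encoded)
    text = raw.decode('windows-1252')         # e.g. b'\xc4' -> 'Ä'
    return {'statusCode': 200, 'body': text}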
I'm using S3 to backup large files that are critical to my business. Can I be confident that once uploaded, these files are verified for integrity and are intact?
There is a lot of documentation around scalability and availability but I couldn't find any information talking about integrity and/or checksums.
When uploading to S3, there's an optional request header (which in my opinion should not be optional, but I digress), Content-MD5. If you set this value to the base64 encoding of the MD5 hash of the request body, S3 will outright reject your upload in the event of a mismatch, thus preventing the upload of corrupt data.
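For instance, a sketch with boto3 (the bucket and key names are placeholders):

import base64
import hashlib

import boto3

body = open('backup.tar.gz', 'rb').read()
md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode('ascii')

# S3 rejects the PUT outright if the body doesn't match this MD5.
boto3.client('s3').put_object(
    Bucket='my-bucket',
    Key='backups/backup.tar.gz',
    Body=body,
    ContentMD5=md5_b64,
)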
The ETag header will be set to the hex-encoded MD5 hash of the object, for single part uploads (with an exception for some types of server-side encryption).
For multipart uploads, Content-MD5 works the same way, but applies to each individual part.
When S3 combines the parts of a multipart upload into the final object, the ETag header is set to the hex-encoded MD5 hash of the concatenated binary-encoded (raw bytes) MD5 hashes of each part, followed by a hyphen and the number of parts.
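That algorithm is easy to reproduce locally; a sketch, assuming parts holds the byte content of each part in upload order:

import hashlib

part_digests = [hashlib.md5(part).digest() for part in parts]  # raw 16-byte digests
etag = hashlib.md5(b''.join(part_digests)).hexdigest() + '-' + str(len(part_digests))
# e.g. "9b2cf535f27731c974343645a3985328-3" for a 3-part upload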
When you ask S3 to do that final step of combining the parts of a multipart upload, you have to give it back the ETags it gave you during the uploads of the original parts, which is supposed to assure that what S3 is combining is what you think it is combining. Unfortunately, there's an API request you can make to ask S3 about the parts you've uploaded, and some lazy developers will just ask S3 for this list and then send it right back, which the documentation warns against, but hey, it "seems to work," right?
Multipart uploads are required for objects over 5GB and optional for uploads over 5MB.
Correctly used, these features provide assurance of intact uploads.
If you are using Signature Version 4, which is also optional in older regions, there is an additional integrity mechanism, and this one isn't optional (if you're actually using V4): uploads must have a request header x-amz-content-sha256, set to the hex-encoded SHA-256 hash of the payload, and the request will be denied if there's a mismatch here, too.
My take: Since some of these features are optional, you can't trust that any tools are doing this right unless you audit their code.
I don't trust anybody with my data, so for my own purposes, I wrote my own utility, internally called "pedantic uploader," which uses no SDK and speaks directly to the REST API. It calculates the sha256 of the file and adds it as x-amz-meta-... metadata so it can be fetched with the object for comparison. When I upload compressed files (gzip/bzip2/xz) I store the sha of both compressed and uncompressed in the metadata, and I store the compressed and uncompressed size in octets in the metadata as well.
Note that Content-MD5 and x-amz-content-sha256 are request headers. They are not returned with downloads. If you want to keep this information, you need to save it yourself, e.g. in the object metadata as described above.
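A sketch of that metadata approach (the x-amz-meta key names here are my own invention, in the spirit of the utility described above):

import hashlib

import boto3

body = open('data.tar.gz', 'rb').read()

# Stored as x-amz-meta-sha256 / x-amz-meta-size; returned with every GET/HEAD.
boto3.client('s3').put_object(
    Bucket='my-bucket',
    Key='data.tar.gz',
    Body=body,
    Metadata={
        'sha256': hashlib.sha256(body).hexdigest(),
        'size': str(len(body)),
    },
)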
Within EC2, you can easily download an object without actually saving it to disk, just to verify its integrity. If the EC2 instance is in the same region as the bucket, you won't be billed for data transfer if you access S3 from an instance with a public IPv4 or IPv6 address, through a NAT instance, through an S3 VPC endpoint, or through an IPv6 egress-only gateway. (You'll be billed for NAT Gateway data throughput if you access S3 over IPv4 through a NAT Gateway.) Obviously there are ways to automate this, but manually, if you select the object in the console, choose Download, right-click and copy the resulting URL, then do this:
$ curl -v '<url from console>' | md5sum # or sha256sum etc.
Just wrap the URL from the console in single quotes, since it will be pre-signed and will include & in the query string, which you don't want the shell to interpret.
You can compute an MD5 checksum locally, and then verify it against the MD5 checksum of the object on S3 to ensure data integrity.
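For single-part uploads without certain kinds of server-side encryption, the ETag is that MD5, so a quick check might look like this (bucket and key are placeholders; this comparison is not valid for multipart uploads):

import hashlib

import boto3

local_md5 = hashlib.md5(open('data.tar.gz', 'rb').read()).hexdigest()
etag = boto3.client('s3').head_object(
    Bucket='my-bucket', Key='data.tar.gz'
)['ETag'].strip('"')
print('match' if local_md5 == etag else 'MISMATCH')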
I'm a newbie Rails programmer, and I have even less experience with all the AWS products. I'm trying to use Lambda to subscribe to and consume an RSS feed from YouTube. I am able to send the subscription request just fine with HTTParty from my locally hosted Rails app:
query = {'hub.mode' => 'subscribe', 'hub.verify' => 'sync', 'hub.topic' => 'https://www.youtube.com/feeds/videos.xml?channel_id=CHANNELID', 'hub.callback' => 'API Endpoint for Lambda'}
subscribe = HTTParty.post('https://pubsubhubbub.appspot.com/subscribe', :query => query)
and it will ping the Lambda function with a GET request. I know that I need to echo back a hub.challenge string, but I don't know how. The Lambda event is empty, and I didn't see anything useful in the context. I tried formatting the response in API Gateway, but that didn't work either. So right now when I try to subscribe, I get back a 'Challenge Mismatch' error.
I know this: https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core-0.3.html#subscribing explains what I'm trying to do better than I just did, and section 6.2.1 is where the breakdown is. How do I set up either the AWS Lambda function and/or the API Gateway to reflect back the 'hub.challenge' verification token string?
You need to use the parameter mapping functionality of API Gateway to map the parameters from the incoming query string to a parameter passed to your Lambda function. From the documentation link you provided, it looks like you'll at least need to map the hub.challenge query string parameter, but you may also need the other parameters (hub.mode, hub.topic, and hub.verify_token), depending on what validation logic (if any) you're implementing.
The first step is to declare your query string parameters on the method request page. Once you have declared the parameters, open the integration request page (where you specify which Lambda function API Gateway should call) and use the "+" icon to add a new mapping template. In the template, you specify a content type (application/json) and then the body you want to send to Lambda. You can read both query string and header parameters using the $input.params() function; in that mapping field you are creating the event body that is posted to AWS Lambda. For example: { "challenge": "$input.params('hub.challenge')" }
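On the Lambda side, the handler then just needs to echo that value back. A minimal sketch in Python (the 'challenge' key matches the mapping template above; you may also need an integration response template to return it as plain text rather than JSON):

def lambda_handler(event, context):
    # 'challenge' was populated by the API Gateway request mapping template
    return event['challenge']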
Documentation for mapping query string parameters
I know of limiting the upload size of an object using this method: http://doc.s3.amazonaws.com/proposals/post.html#Limiting_Uploaded_Content
But I would like to know how it can be done while generating a pre-signed URL using the S3 SDK on the server side as an IAM user.
This URL from the SDK has no such option in its parameters: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
Neither does this:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property
Please note: I already know of this answer: AWS S3 Pre-signed URL content-length and it is NOT what I am looking for.
The V4 signing protocol offers the option to include arbitrary headers in the signature. See:
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
So, if you know the exact Content-Length in advance, you can include that header in the signed URL. Based on some experiments with curl, S3 will truncate the file if you send more than specified in the Content-Length header. Here is an example of a V4 signature with multiple headers included in the signature:
http://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html
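If you're on boto3 rather than the JavaScript SDK, a sketch of the idea looks like this (whether ContentLength actually ends up in X-Amz-SignedHeaders is worth verifying on the generated URL before relying on it):

import boto3

url = boto3.client('s3').generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'my-bucket',       # placeholder
        'Key': 'uploads/data.bin',
        'ContentLength': 1024,       # the exact size the uploader must send
    },
    ExpiresIn=900,
)
# The client must then PUT with Content-Length: 1024 for the signature to match.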
You may not be able to limit content upload size ex-ante, especially considering POST and multipart uploads. You could use AWS Lambda to create an ex-post solution: set up a Lambda function to receive notifications from the S3 bucket, have the function check the object size, and have the function delete the object or take some other action.
Here's some documentation on Handling Amazon S3 Events Using AWS Lambda.
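A minimal sketch of such a function (the size limit and the s3:ObjectCreated:* trigger are assumptions you'd configure yourself):

import boto3

MAX_BYTES = 10 * 1024 * 1024  # assumed limit: 10 MiB

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Invoked by s3:ObjectCreated:* notifications; remove anything oversized.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        if record['s3']['object']['size'] > MAX_BYTES:
            s3.delete_object(Bucket=bucket, Key=key)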
For any other wanderers who end up on this thread: if you set the Content-Length attribute when sending the request from your client, there are a few possibilities:
- The Content-Length is calculated automatically, and S3 will store up to 5GB per file.
- The Content-Length is manually set by your client, in which case one of these three scenarios will occur:
  - The Content-Length matches your actual file size, and S3 stores it.
  - The Content-Length is less than your actual file size, so S3 will truncate your file to fit it.
  - The Content-Length is larger than your actual file size, and you will receive a 400 Bad Request.
In any case, a malicious user can bypass your client and manually send an HTTP request with whatever headers they want, including a much larger Content-Length than you may be expecting. Signed URLs do not protect against this! The only way is to set up a POST policy. Official docs here: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTConstructPolicy.html
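With boto3, a POST policy with a content-length-range condition can be generated like this (the bucket, key, and 10 MiB cap are placeholders):

import boto3

post = boto3.client('s3').generate_presigned_post(
    Bucket='my-bucket',
    Key='uploads/example.bin',
    Conditions=[['content-length-range', 1, 10 * 1024 * 1024]],  # 1 B to 10 MiB
    ExpiresIn=3600,
)
# post['url'] and post['fields'] go into the client's multipart/form-data POST;
# S3 rejects any upload outside the declared size range.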
More details here: https://janac.medium.com/sending-files-directly-from-client-to-amazon-s3-signed-urls-4bf2cb81ddc3
Alternatively, you can have a Lambda that automatically deletes files that are larger than expected.