Possible to set Accept-Ranges header on Amazon S3

I have an application with very short-lived (5 s) access tokens and a paranoid client. Some of their users access the S3-stored files over mobile connections, so latency can be quite high.
I've noticed that Amazon always sends the Accept-Ranges header on all requests, and I'd like to disable it for the files in question, so that the client always downloads the entire file the first time around instead of downloading it in chunks.
The main offender I've noticed is Chrome's built-in PDF viewer. It starts viewing the PDF and gets a 200 response. Then it reconnects with a Range request, gets a 206 Partial Content response, and starts downloading the file in two chunks. If Chrome is too slow to start downloading all chunks before the access token expires, it keeps spamming requests at S3 (600+ requests by the time I closed the window).
I've tried setting the header in the S3 console, but although it reports that it saved successfully, the header is cleared instantly. I also tried to set the header through the signed request, as you can for Content-Disposition, but S3 ignored the passed-in header.
Or is there any other way to force a client to download the entire file at once?

It seems this isn't possible. I made the token expire later, in the hope that this takes care of most cases.
If that doesn't make the client happy, I will try to proxy the files locally and remove all the headers I don't like, following this guide: https://coderwall.com/p/rlguog.
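A minimal sketch of that proxy idea in Python (standard library only), assuming the proxy can produce a fresh pre-signed S3 URL for each requested path; `get_presigned_url` is a hypothetical helper you would fill in with your own signing code. The proxy never forwards the client's Range header, so S3 always returns the whole object with a 200, and it drops Accept-Ranges and Content-Range from what it relays, so a client such as Chrome's PDF viewer is not invited to switch to chunked requests.

```python
import shutil
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Upstream response headers that are never relayed to the client:
# range-related ones, plus ones the local server sets itself.
DROPPED_HEADERS = {
    "accept-ranges", "content-range", "connection",
    "transfer-encoding", "date", "server",
}


def filter_headers(headers):
    """Return the (name, value) pairs minus the dropped headers."""
    return [(k, v) for k, v in headers if k.lower() not in DROPPED_HEADERS]


def get_presigned_url(path):
    # Hypothetical helper: replace with real pre-signing for the requested key.
    raise NotImplementedError("map %s to a pre-signed S3 URL" % path)


class NoRangeProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # The client's Range header is deliberately NOT copied upstream,
        # so S3 returns the complete object.
        upstream = urllib.request.Request(get_presigned_url(self.path))
        with urllib.request.urlopen(upstream) as resp:
            self.send_response(200)
            for name, value in filter_headers(resp.getheaders()):
                self.send_header(name, value)
            self.end_headers()
            shutil.copyfileobj(resp, self.wfile)


def serve(host="127.0.0.1", port=8080):
    """Run the proxy until interrupted."""
    ThreadingHTTPServer((host, port), NoRangeProxy).serve_forever()
```

Calling `serve()` listens on 127.0.0.1:8080. Because the proxy signs each request itself, the short token lifetime only has to cover the proxy-to-S3 hop. Error handling (for example an upstream 403, which `urlopen` raises as `HTTPError`) is left out of the sketch.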

Related

PUT Presigned URL expires before download of file completes

Bit of an odd case where my API is sent a presigned URL to write to, and a presigned URL to download the file from.
The problem is that if they send a very large file, the presigned url we need to write to can expire before it gets to that step (some processing happens in between the read/write).
Is it possible to 'open' the connection for writing early to make sure it doesn't expire, and then start writing once the earlier process is done? Or maybe there is a better way of handling this.
The order goes:
Receive API request with a downloadUrl and an uploadUrl
Download the file
Process the file
Upload the file to the uploadUrl
TL;DR: How can I ensure the url for #4 doesn't expire before I get to it?
When generating the pre-signed URL, you have complete control over the time duration. For example, this Java code shows how to set the time when creating a GetObjectPresignRequest object:
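import java.time.Duration;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

// getObjectRequest is an existing GetObjectRequest naming the bucket and key
GetObjectPresignRequest getObjectPresignRequest = GetObjectPresignRequest.builder()
        .signatureDuration(Duration.ofMinutes(10)) // how long the URL stays valid
        .getObjectRequest(getObjectRequest)
        .build();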
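So whoever generates the URL can increase the time limit in situations like this; for an upload URL, PutObjectPresignRequest takes the same signatureDuration setting. Pre-signed URLs using Signature Version 4 can be valid for at most 7 days.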
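When you only receive the URL, as in this question, you cannot change its duration, but you can check how much lifetime it has left before starting the expensive processing, and fail fast or ask the caller for a fresh URL. This is a different technique from changing the duration at signing time. A minimal Python sketch, assuming the URL carries either the Signature Version 4 parameters X-Amz-Date and X-Amz-Expires or the older Signature Version 2 Expires timestamp; the function names are illustrative. A URL signed with temporary credentials can stop working earlier, when those credentials expire, and this check cannot see that.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC datetime at which an S3 pre-signed URL stops working.

    SigV4: X-Amz-Date (e.g. 20240101T000000Z) plus X-Amz-Expires (seconds).
    SigV2: Expires (Unix epoch seconds).
    """
    # Lower-case the parameter names so both spelling styles match.
    params = {k.lower(): v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    if "x-amz-date" in params and "x-amz-expires" in params:
        signed_at = datetime.strptime(params["x-amz-date"], "%Y%m%dT%H%M%SZ")
        return signed_at.replace(tzinfo=timezone.utc) + timedelta(
            seconds=int(params["x-amz-expires"]))
    if "expires" in params:
        return datetime.fromtimestamp(int(params["expires"]), tz=timezone.utc)
    raise ValueError("not a recognisable S3 pre-signed URL")


def seconds_left(url, now=None):
    """Seconds until the URL expires (negative if it already has)."""
    now = now or datetime.now(timezone.utc)
    return (presigned_url_expiry(url) - now).total_seconds()
```

An API handler could compare `seconds_left(upload_url)` against its worst-case processing time and reject the request up front instead of failing at step 4.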

How do I add simple licensing to api when using AWS Cloudfront to cache queries

I have an application deployed on AWS Elastic Beanstalk. I added some simple licensing to stop abuse of the API: the user has to pass a license key as a field, e.g.
search.myapi.com/?license=4ca53b04&query=fred
If this is not valid then the request is rejected.
However, until the monthly update, the above query will always return the same data. Therefore I now point search.myapi.com to an AWS CloudFront distribution, so the request only goes to the actual server when the query is not cached, as
direct.myapi.com/?license=4ca53b04&query=fred
However, the problem is that if two users make the same query, CloudFront won't treat them as the same request, because the license parameter is different. So CloudFront caching only works on a per-user level, which is of no use.
What I want is for CloudFront to ignore the license parameter for caching, but not the other parameters. I don't mind too much if that means a user could access CloudFront with an invalid license, as long as they can't make a successful query to the server (CloudFront calls are cheap, but server calls are expensive, in terms of both CPU and monetary cost).
Perhaps what I need is something in front of CloudFront that does the license check and then strips out the license parameter, but I don't know what that would be?
Two possible solutions come to mind.
The first solution feels like a hack, but would prevent unlicensed users from successfully fetching uncached query responses. If the response is cached, it would leak out, but at no cost in terms of origin server resources.
If the content is not sensitive, and you're only trying to avoid petty theft/annoyance, this might be viable.
For query parameters, CloudFront lets you forward all of them to the origin but cache based only on a whitelist.
So, whitelist query (and any other necessary fields) but not license.
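To illustrate what whitelisting does to the cache key, here is a small Python sketch that keeps only whitelisted query parameters when building the key. It is a model of the idea, not CloudFront's actual cache-key algorithm, and a whitelist containing only `query` is an assumption based on the example URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: "query" is the only parameter that changes the response.
CACHE_KEY_WHITELIST = {"query"}


def cache_key(url):
    """Build a cache key from host, path and whitelisted query parameters.

    Non-whitelisted parameters (such as license) are still forwarded to the
    origin on a miss; they just do not split the cache.
    """
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in CACHE_KEY_WHITELIST)
    return parts.netloc + parts.path + "?" + urlencode(kept)
```

Two users asking for `query=fred` with different license keys now map to the same key, `search.myapi.com/?query=fred`, so the second one is served from cache.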
Results for a given query:
valid license, cache miss: request goes to origin, origin returns response, response stored in cache
valid license, cache hit: response served from cache
invalid license, cache hit: response served from cache
invalid license, cache miss: request goes to origin, origin returns error, error stored in cache.
Oops. The last condition is problematic, because authorized users will receive the cached error if they make the same query.
But we can fix this, as long as the origin returns an HTTP error for an invalid request, such as 403 Forbidden.
As I explained in Amazon CloudFront Latency, CloudFront caches responses with HTTP errors using separate timers (not the Minimum/Default/Maximum TTL), each with its own default. This value can be set to 0 (or another value) individually for each of several HTTP status codes, such as 403. So, for the error code your origin returns, set the Error Caching Minimum TTL to 0 seconds.
At this point, the problematic condition of caching error responses and playing them back to authorized clients has been solved.
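The four cases and the effect of the error-caching TTL can be modelled with a toy cache in Python. This is an illustration only; `EdgeCacheSim` and its parameters are hypothetical, not a CloudFront API. Successful responses are stored for `ttl` seconds, error responses for `error_ttl` seconds, and `error_ttl=0` means an error is never stored.

```python
import time


class EdgeCacheSim:
    """Toy model of an edge cache with a separate TTL for error responses."""

    def __init__(self, origin, ttl=86400, error_ttl=0, clock=time.monotonic):
        self.origin = origin          # callable(key, license_key) -> (status, body)
        self.ttl = ttl                # lifetime of successful responses
        self.error_ttl = error_ttl    # lifetime of responses with status >= 400
        self.clock = clock
        self.store = {}               # cache key -> (expires_at, status, body)

    def get(self, key, license_key):
        hit = self.store.get(key)
        if hit and hit[0] > self.clock():
            return hit[1], hit[2], "hit"      # served without contacting origin
        status, body = self.origin(key, license_key)
        lifetime = self.error_ttl if status >= 400 else self.ttl
        if lifetime > 0:                      # error_ttl=0: errors never stored
            self.store[key] = (self.clock() + lifetime, status, body)
        return status, body, "miss"
```

With `error_ttl=0`, an invalid-license miss returns 403 without poisoning the cache, so a later valid request still reaches the origin; with a non-zero error TTL, that valid user would be handed the cached 403 until it expires. An invalid license on a cache hit still receives the cached 200, which is the leak the first solution accepts.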
The second option seems like a better idea, overall, but would require more sophistication and probably cost slightly more.
CloudFront has a feature that connects it with AWS Lambda, called Lambda@Edge. This lets you analyze and manipulate requests and responses using simple JavaScript scripts that run at specific trigger points in the CloudFront signal flow.
Viewer Request runs for each request, before the cache is checked. It can allow the request to continue into CloudFront, or it can stop processing and generate a response directly back to the viewer. Generated responses here are not stored in the cache.
Origin Request runs after the cache is checked, only for cache misses, before the request goes to the origin. If this trigger generates a response, the response is stored in the cache and the origin is not contacted.
Origin Response runs after the origin response arrives, only for cache misses, and before the response goes into the cache. If this trigger modifies the response, the modified response is stored in the cache.
Viewer Response runs immediately before the response is returned to the viewer, for both cache misses and cache hits. If this trigger modifies the response, the modified response is not cached.
From this, you can see how this might be useful.
A Viewer Request trigger could check each request for a valid license key, and reject those without. For this, it would need access to a way to validate the license keys.
If your client base is very small or rarely changes, the list of keys could be embedded in the trigger code itself.
Otherwise, it needs to validate the key, which could be done by sending a request to the origin server from within the trigger code (the runtime environment allows your code to make outbound requests and receive responses via the Internet) or by doing a lookup in a hosted database such as DynamoDB.
Lambda@Edge triggers run in Lambda containers, and depending on traffic load, observations suggest that it is very likely that subsequent requests reaching the same edge location will be handled by the same container. Each container only handles one request at a time, but the container becomes available for the next request as soon as control is returned to CloudFront. As a consequence, you can cache the results in memory in a global data structure inside each container, significantly reducing the number of times you need to ascertain whether a license key is valid. The function either allows CloudFront to continue processing as normal, or actively rejects the invalid key by generating its own response. A single trigger will cost you a little under $1 per million requests that it handles.
This solution prevents missing or unauthorized license keys from actually checking the cache or making query requests to the origin. As before, you would want to customize the query string whitelist in the CloudFront cache behavior settings to eliminate license from the whitelist, and change the error caching minimum TTL to ensure that errors are not cached, even though these errors should never occur.

Making An HTTP PUT through BrightScript to AWS S3 Bucket with pre-signed url

I've set up an AWS API which obtains a pre-signed URL for uploading to an AWS S3 bucket.
The pre-signed url has a format like
https://s3.amazonaws.com/mahbukkit/background4.png?AWSAccessKeyId=someaccesskeyQ&Expires=1513287500&x-amz-security-token=somereallylongtokenvalue
where background4.png would be the file I'm uploading.
I can successfully use this URL through Postman By:
configuring it as a PUT call,
setting the body to Binary so I can select the file,
setting the header to Content-Type: image/png
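For comparison with the Postman setup above, this is the same request expressed with Python's standard library: a PUT whose body is the raw file bytes, with Content-Type: image/png. It can be handy for confirming that a pre-signed URL works before debugging it on the player. A sketch only; `build_put_request` and `upload` are illustrative names.

```python
import urllib.error
import urllib.request


def build_put_request(presigned_url, body, content_type="image/png"):
    """Build the request Postman sends: PUT with the raw file bytes as body.

    If the URL was signed with a Content-Type, the header sent here must
    match it exactly, otherwise S3 rejects the signature.
    """
    return urllib.request.Request(
        presigned_url, data=body, method="PUT",
        headers={"Content-Type": content_type},
    )


def upload(presigned_url, path, content_type="image/png"):
    """PUT the file and return (status code, response body)."""
    with open(path, "rb") as f:
        req = build_put_request(presigned_url, f.read(), content_type)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # S3's XML error body (e.g. SignatureDoesNotMatch) is readable here.
        return err.code, err.read()
```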
HOWEVER, I'm trying to make this call using BrightScript running on a BrightSign player. I'm pretty sure I'm supposed to be using the roUrlTransfer object and the PutFromFile function described in this documentation:
http://docs.brightsign.biz/display/DOC/roUrlTransfer
Unfortunately, I can't find any good working examples showing how to do this.
Could anyone who has experience with BrightScript help me out? I'd really appreciate it.
You are on the right track.
I would do:
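sub main()
    ' pre-signed PUT URL returned by your API (placeholder)
    presignedUrl = "<pre-signed URL from your API>"
    ' path of the file on the player (placeholder)
    fileName = "<path to the file on the player>"
    tr = CreateObject("roUrlTransfer")
    tr.SetUrl(presignedUrl)
    headers = {}
    headers.AddReplace("Content-Type", "image/png")
    tr.AddHeaders(headers)
    info = {}
    info.method = "PUT"
    info.request_body_file = fileName
    if tr.AsyncMethod(info) then
        print "File put started"
    else
        print "File put did not start"
    end if
    ' keep the script (and tr) alive while the asynchronous PUT runs
    sleep(100000)
end sub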
Note that I have used two different methods to populate the two associative arrays. You need to use the AddReplace method (rather than the . shortcut) when the key contains special characters like '-'.
This script should work, though I don't have a unit on hand to do a syntax check.
You should also set up a message port etc. and listen for the event that is generated, to confirm whether the PUT was successful and/or what the response code is.
Note that when you read responses from URL events, if the response code from the server is anything other than 200, the BrightSign discards the response body and you cannot read it. This is not helpful, as services like Dropbox like to return a 400 response with more information about what was wrong (bad API key etc.) in the body. So in that case you are left in the dark, doing trial and error to figure out what went wrong.
Good luck, and sorry I didn't see this question sooner.

How do i check if cache is working on aws s3

How do I check if cacheControl is working on aws s3 file upload?
'CacheControl' => 'max-age=2592000',
In Google Chrome, open DevTools (Inspect) and keep the Network tab open. Then request the file in the browser using its HTTP URL. The Size column shows whether you got it from the server or from the cache: it shows (from memory cache) or (from disk cache) when the browser cache was used, or the file size if it was received from the server. A 304 Not Modified status means the browser revalidated its cached copy with the server, which did not resend the body.
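The browser check tells you whether the browser reused its copy. To confirm that S3 itself returns the Cache-Control value stored on the object, you can also inspect the response headers directly. A minimal Python sketch; `fetch_cache_headers` and `max_age` are illustrative helpers. For reference, max-age=2592000 seconds is 30 days.

```python
import re
import urllib.request


def max_age(cache_control):
    """Extract max-age (in seconds) from a Cache-Control value, or None."""
    m = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control or "", re.I)
    return int(m.group(1)) if m else None


def fetch_cache_headers(url):
    """HEAD the object and return the caching-related response headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return {h: resp.headers.get(h) for h in ("Cache-Control", "ETag", "Last-Modified")}
```

If `max_age(fetch_cache_headers(url)["Cache-Control"])` returns 2592000, the metadata was stored on upload; if the header is missing, the browser has nothing but heuristics to cache on.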

Sustain an http connection while django processes a big request (20mins+)

I've got a django site that is producing a csv download. The content of the csv is dictated by user defined parameters. It's possible that users will set parameters that require significant thinking time on the server. I need a way of sustaining the http connection so the browser doesn't kick up an error message. I heard that it's possible to send intermittent http headers to do this. Can anyone point me in the right direction to set this up on a django site?
(Unfortunately I'm stuck with the possibility of slow reports; improving my SQL won't mitigate this.)
Don't do it online. Trigger an offline task, use a bit of Javascript to repeatedly call a view that checks if the task has finished, and redirect to the finished file when it's ready.
Instead of blocking the user and their browser for 20 minutes (which is not a good idea), do the time-consuming task in the background. When the task finishes and generates the result, simply notify the user so that they only need to download the ready result.
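A minimal sketch of the background-task-plus-polling pattern in plain Python, assuming a single server process: `start_report` kicks off the slow CSV build in a thread and returns a job id immediately, and `report_status` is what a polling view would return (for example wrapped in Django's JsonResponse). The names are illustrative; with several worker processes or server restarts, a task queue such as Celery, with job status stored in the database, is the usual choice.

```python
import threading
import uuid

# In-process job registry (assumes a single server process).
_jobs = {}
_lock = threading.Lock()


def start_report(build_csv, params):
    """Run build_csv(params) in a background thread; return a job id at once."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"state": "running", "result": None}

    def run():
        try:
            result, state = build_csv(params), "done"
        except Exception as exc:  # record the failure so the poller can show it
            result, state = repr(exc), "failed"
        with _lock:
            _jobs[job_id] = {"state": state, "result": result}

    threading.Thread(target=run, daemon=True).start()
    return job_id


def report_status(job_id):
    """Return a copy of the job's state: running, done, failed or unknown."""
    with _lock:
        return dict(_jobs.get(job_id, {"state": "unknown", "result": None}))
```

The browser then calls the status view every few seconds and redirects to the download once the state is "done"; no single HTTP request has to stay open for 20 minutes.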