AWS S3 presigned URL limit - amazon-web-services

Is there any limit on the number of pre signed URL's per object in AWS S3 presigned URL's. Say If I want to create 1000 presigned url's per object in a 2 minutes. Is that valid scenario ?

You can create as many signed URLs as you wish. Depending on your motivation and strategy, however, there is a practical limitation on the number of unique presigned URLs for the exact same object.
S3 (in S3 regions that were first deployed before 2014) supports two authentication algorithms, V2 and V4, and the signed urls look very different since the algorithms are very different.
In V2, the signed URL for a given expiration time will always look the same, if signed by the same AWS key.
If you sign the url for an object, set to expire one minute in the future... and immediately repeat the process, the two signed URLs will be identical.
Next, exactly one second later, sign a url for the same object to expire 59 seconds in the future, and that new signed URL will also be identical.
Why? Because in V2, the expiration time is an absolute wall clock time in UTC, and the particular time in history when you actually generated the signed URL doesn't change anything.
V4 is different. In the scenario above, the first two would still be identical, but the second one would not, because V4 auth includes the date and time when you created the signed url, or when you say you did. The expiration time is relative to the signing time, instead of absolute.
Note that both forms of signed URL are tamper-resistant -- the expiration time is embedded in the url, but attempting to tweak it after signing will invalidate the signing and make it useless.
If you need to generate a large number of signed urls for the same object, you'll need to increment the expiration time for each individual signing attempt in order to get unique values. (Edit: or not, if you're feeling clever... see below).
It also occurs to me that you may be under the impression that S3 has an active role in the signing process, but it doesn't. That's all done in your local code.
S3 isn't aware, in any sense, of the signed urls you generate unless or until they are used. When a signed request arrives, S3 does exactly the same thing your code will do -- it canonicalizes certain attributes of the request, and generates a signature. Then it compares what it generated with what your code should have generated, given precisely the same parameters. If their generated signature matches your provided signature (and the key you used has permission to perform the requested action) then the request succeeds.
Update: it turns out, there is an unofficial mechanism that allows you to embed additional "entropy" into the signing process, generating unique, per-user (for example) signed URLs for the same object and expiration time.
Under V2 authentication, which doesn't nornally want you to include non-S3-specific parameters in your signing logic, it looks suspiciously like a bug as well as a feature... add &x-amz-meta-{anything-here}={unique-value-here} query string parameters to your URL. These are used as headers in PUT request but are meaningless in a GET request, and yet, if present, S3 still requires them to be included in the signature calculation, even though the parameter keys and values will ultimately be discarded by S3... but the added values are tamper-resistant and can't be maliciously removed or altered without invalidating the signature.
The same mechanism works in V4, even though it's for a different reason.
Credit for this technique: http://www.bennadel.com/blog/2488-generating-pre-signed-query-string-authentication-amazon-s3-urls-with-user-specific-data.htm

The accepted answer is now outdated. For future viewers, there is no need to include anything as extra header as now AWS includes a Signature field in every signed url which is different everytime you make a request.

Yes. In fact, i believe AWS can't even limit that, as there is no such API call on S3. URL signing is done purely by the SDK.
But if creating so many URLs is a good idea or not is completely context dependent though...

Related

Is it considered bad practice to a new generate presigned URL per HTTP request?

I've been looking through the presigned URL documentation, and have not encountered much information about how often I can generate these presigned URLs. Judging by the length of the identifier, I'm thinking it's probably collision-safe to generate a new one for every URL request.
I think a more conventional method is to run a cron job to generate a new one and store it in a record DB with the file key, and this would be perfectly doable, but I was wondering if I could skip this step and just generate it on the fly.
There is no way to have a presigned url without expiry time and the maximum expiry time you can set is 1 week. These URLs are designed to be temporary to allow users to access your S3 bucket, either for READ the object or WRITE an Object (or update an existing object).
So as an answer, it is not a bad practice to generate a new presigned URL per request since its nature to be temporary.

AWS presigned url

I was wondering whether the AWSAccessKeyId in the presigned urls are static? Do these ever change (either over time) or are they unique linked to the user that generated it?
It depends on the credentials used to sign the URL.
If it begins with AKIA, then it's a long-lived user access key. These exist as long as the user chooses to let them exist, although it's a good practice to rotate your keys regularly.
If it begins with ASIA, then it's an assumed role key. These typically expire in an hour (although they can live longer).
What's your real question?

Google Cloud CDN started ignoring query strings for storage buckets

Some months ago activated Cloud CDN for storage buckets. Our storage data is regularly changed via a backend. So to invalidate the cached version we added a query param with the changedDate to the url that is served to the client.
Back then this worked well.
Sometime in the last months (probably weeks) Google seemed to change that and is now ignoring the query string for caching from storage buckets.
First part: Does anyone know why this is changed and why noone was
notified about it?
Second part: How can you invalidate the Cache for a particular object
in a storage bucket without sending a cache-invalidation request
(which you shouldn't) everytime?
I don't like the idea of deleting the old file and uploading a new file with changed filename everytime something is uploaded...
EDIT:
for clarification: the official docu ( cloud.google.com/cdn/docs/caching ) already states that they now ignore query strings for storage buckets:
For backend buckets, the cache key consists of the URI without the query > string. Thus https://example.com/images/cat.jpg, https://example.com/images/cat.jpg?user=user1, and https://example.com/images/cat.jpg?user=user2 are equivalent.
We were affected by this also. After contacting Google Support, they have confirmed this is a permanent change. The recommended work around is to either use versioning in the object name, or use cache invalidation. The latter sounds a bit odd as the cache invalidation documentation states:
Invalidation is intended for use in exceptional circumstances, not as part of your normal workflow.
For backend buckets, the cache key consists of the URI without the query string, as the official documentation states.1 The bucket is not evaluating the query string but the CDN should still do that. I could reproduce this same scenario and currently is still possible to use a query string as cache buster.
Seems like the reason for the change is that the old behavior resulted in lost caching opportunities, higher costs and higher latency. The only recommended workaround for now is to create the new objects by incorporating the version into the object's name (which seems is not valid options for your case), or using cache invalidation.
Invalidating the cache for a particular object will require to use a particular query. Maybe a Cache-Control header allowing such objects to be cached for a certain time may be your workaround. Cloud CDN cache has an expiration time defined by the "Cache-Control: s-maxage", "Cache-Control: max-age", and/or Expires headers 2.
According to the doc, when using backend bucket as origin for Cloud CDN, query strings in the request URL are not included in the cache key:
For backend buckets, the cache key consists of the URI without the protocol, host, or query string.
Maybe using the query string to identify different versions of cached content is not the best practices promoted by GCP. But for some legacy issues, it has to be.
So, one way to workaround this is make backend bucket to be a static website (do NOT enable CDN here), then use custom origins (Cloud CDN backed by Internet network endpoint groups backend service) which points to that static website.
For backend service, query string IS part of cache key.
For backend services, Cloud CDN defaults to using the complete request URI as the cache key
That's it. Yes, It is tedious but works!

Why does Amazon require lexicographically ordering query string parameters when signing requests?

AWS' query parameter ordering code can be seen on their Github repository.
I have thought about why they might require API clients to sign requests:
intermediate proxies might canonicalize URLs and mess up the original query string order
The URI RFC specifies absolutely nothing about the order of the query string parameters, or that it should be preserved
My best guess is that, because of the RFC, Amazon reckoned they'd play it safe and require both sides to sign the ORDERED request.
I do, however, would like the final/official word on this. Surely the implementors had a good reason for this requirement.
The request signature ensures that the sender and receiver can agree on exactly what was sent in the request and that no intermediate parties tampered with it.
Many parts of an HTTP request can change without changing the semantics of the request. For example the HTTP headers can be re-ordered, as can the query parameters as you rightly point out.
So the request must be canonicalized into a form that removes these ambiguities and that both parties will use to sign the request. Otherwise each party could generate different signatures for the same request. Ordering the query parameters is just part of this process. Amazon describes their canonicalization process and their motivation in the docs for the AWS V4 signature format.

Improve my Shared Secret Algorithm/Methodology & suggest a Encryption Protocol

I am looking for protocol/algorithm that will allow me to use a shared secret between my App & a HTML page.
The shared secret is designed to ensure only people who have the app can access the webpage.
My Problem: I do not know what algorithm(my methodology to validate a valid access to the HTML page) & what encryption protocol I should use for this.
People have suggested to me that I use HMAC SHAXXX or DES or AES, I am unsure which I should use - do you have any suggestions?
My algorithm is like so:
I create a shared secret that the App & the HTML page know of(lets call it "MySecret"). To ensure that that shared secret is always unique I will add the current date & minute to the end of the secret then hash it using XXX algorithm/protocol(HMAC/AES/DES). So the unencrypted secret will be "MySecret08/17/2011-11-11" & lets say the hash of that is "xyz"
I then add this hash to the url CGI: http://mysite.com/comp.py?sharedSecret=xyz
The comp.py script then uses the same shared secret & date combination, hashes it, then checks that the resulting hash is the same as the CGI variable sharedSecret("xyz"). If it is then I know a valid user is accessing the webpage.
Can you think of a better methodology to ensure on valid people can access my webpage(the webpage allows the user to enter a competition)?
I think I am on the correct track using a shared secret but my methodology for validating the secret seems flawed especially if the hash algorithm doesn't produce the same result for the same in put all the time.
especially if the hash algorithm doesn't produce the same result for the same in put all the time.
Then the hash is broken. Why wouldn't it?
You want HMAC in the simple case. You are "signing" your request using the shared secret, and the signature is verified by the server. Note that the HMAC should include more data to prevent replay attacks - in fact it should include all query parameters (in a specified order), along with a serial number to prevent the replay of the same message by an eavesdropper. If all you are verifying is the shared secret, anyone overhearing the message can continue to use this shared secret until it expires. By including a serial number, or a short validity range, you can configure the server to flag that.
Note that this is still imperfect. TLS supports client and server side certificate support - why not use that?
The looks like it would work. Clock drift could be a problem, you may need to validate a range of, say, +/- 3 minutes if it fails for the exact time.
flawed especially if the hash algorithm doesn't produce the same result for the same input all the time
Well, that would be a broken hash algorithm then. A hash reliable produces the same output for the same input every time (and almost always a different output for a different input).
Try using some sort of network encryption. Your web server should be able to handle that type of authentication automatically. All that remains is for you to write it into your app (which you have to do anyway). Depending on your app platform, you may be able to do that automatically as well.
Google these: Kerberos, SPNEGO and HTTP 401 Authorization Required. You may be able to get away with simple hard-coded user name and password HTTP headers and run your connections over HTTPS. That way you have less custom code on your server and your server takes care of authenticating your requests for you. Not to mention you are taking advantage of some additional features of HTTP.