Is there a way to configure Amazon Cloudfront to delay the time before my S3 object reaches clients by specifying a release date? [closed] - amazon-web-services

I would like to upload content to S3, but schedule a time at which CloudFront delivers it to clients, rather than immediately vending it to clients upon upload. Is there a configuration option to accomplish this?
EDIT: This time should be able to differ per object in S3.

There is something of a configuration option to allow this, and it does allow you to restrict specific files -- or path prefixes -- from being served up prior to a given date and time... though it's slightly... well, I don't even know what derogatory term to use to describe it. :) But it's the only thing I can come up with that uses entirely built-in functionality.
First, a quick reminder, that public/unauthenticated read access to objects in S3 can be granted at the bucket level with bucket policies, or at the object level, using "make everything public" when uploading the object in the console, or sending x-amz-acl: public-read when uploading via the API. If either or both of these is present, the object is publicly readable, except in the face of any policy denying the same access. Deny always wins over Allow.
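For reference, a minimal boto3 sketch of that second option, uploading an object with a public-read ACL (the bucket and key names are hypothetical):

import boto3

s3 = boto3.client("s3")

# Upload an object and mark it publicly readable via its ACL
# (equivalent to sending x-amz-acl: public-read on the PUT).
s3.put_object(
    Bucket="example-bucket",   # hypothetical bucket name
    Key="hello.txt",
    Body=b"Hello, world!\n",
    ContentType="text/plain",
    ACL="public-read",
)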
So, we can create a bucket policy statement matching a specific file or prefix, denying access prior to a certain date and time.
{
  "Version": "2012-10-17",
  "Id": "Policy1445197123468",
  "Statement": [
    {
      "Sid": "Stmt1445197117172",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/hello.txt",
      "Condition": {
        "DateLessThan": {
          "aws:CurrentTime": "2015-10-18T15:55:00.000-0400"
        }
      }
    }
  ]
}
Using a wildcard would allow everything under a specific path to be subject to the same restriction.
"Resource": "arn:aws:s3:::example-bucket/cant/see/these/yet/*",
This works, even if the object is public.
This example blocks all GET requests for matching objects by anybody, regardless of permissions they may have. Signed URLs, etc., are not sufficient to override this policy.
The policy statement is checked for validity when it is created; however, the object being matched does not have to exist, yet, so if the policy is created before the object, that doesn't make the policy invalid.
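If you want to manage these embargo statements from code rather than the console, a hedged boto3 sketch of applying the policy above might look like this; note that put_bucket_policy replaces the entire bucket policy, so any existing statements would need to be merged in first.

import json
import boto3

s3 = boto3.client("s3")

# Deny GETs on one key until the release time passes (same statement as above).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EmbargoHelloTxt",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/hello.txt",
            "Condition": {
                "DateLessThan": {"aws:CurrentTime": "2015-10-18T15:55:00.000-0400"}
            },
        }
    ],
}

s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))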
Live test:
Before the expiration time: (unrelated request/response headers removed for clarity)
$ curl -v example-bucket.s3.amazonaws.com/hello.txt
> GET /hello.txt HTTP/1.1
> Host: example-bucket.s3.amazonaws.com
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Sun, 18 Oct 2015 19:54:55 GMT
< Server: AmazonS3
<
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host example-bucket.s3.amazonaws.com left intact
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>AAAABBBBCCCCDDDD</RequestId><HostId>g0bbl3dyg00kbunc4Ofl1n3n0iz3h3rehahahasqlbot1337kenqweqwel24234kj41l1ke</HostId></Error>
After the specified date and time:
$ curl -v example-bucket.s3.amazonaws.com/hello.txt
> GET /hello.txt HTTP/1.1
> Host: example-bucket.s3.amazonaws.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sun, 18 Oct 2015 19:55:05 GMT
< Last-Modified: Sun, 18 Oct 2015 19:36:17 GMT
< ETag: "78016cea74c298162366b9f86bfc3b16"
< Accept-Ranges: bytes
< Content-Type: text/plain
< Content-Length: 15
< Server: AmazonS3
<
Hello, world!
These tests were done against the S3 REST endpoint for the bucket, but the website endpoint for the same bucket yields the same results -- only the error message is in HTML rather than XML.
The positive aspect of this policy is that since the object is public, the policy can be removed any time after the date passes, because it is denying access before a certain time, rather than allowing access after a certain time -- logically the same, but implemented differently. (If the policy allowed access after rather than denying access before, the policy would have to stick around indefinitely; this way, it can just be deleted.)
You could use custom error documents in either S3 or CloudFront to present the viewer with a slightly nicer output... probably CloudFront, since you can customize each error code individually, creating a custom 403 page.
The major drawbacks to this approach are, of course, that the policy must be edited for each object or path prefix, and that even though it works per object, it isn't something that's set on the object itself.
And there is a limit to how many policy statements you can include, because of the size restriction on bucket policies:
Note
Bucket policies are limited to 20 KB in size.
http://docs.aws.amazon.com/AmazonS3/latest/dev/access-policy-language-overview.html
The other solution that comes to mind involves deploying a reverse proxy component (such as HAProxy) in EC2 between CloudFront and the bucket, passing the requests through and reading the custom metadata from the object's response headers -- looking for a header such as x-amz-meta-embargo-until: 2015-10-18T19:55:00Z and comparing its value to the system clock. If the current time is before the cutoff time, the proxy would drop the connection to S3 and replace the response headers and body with a locally-generated 403 message, so the client would not be able to fetch the object until the designated time had passed.
This solution seems fairly straightforward to implement, but it requires a non-built-in component, so it doesn't meet the constraint of the question, and I haven't built a proof of concept. However, I already use HAProxy with Lua in front of some buckets to give S3 other capabilities not offered natively -- such as removing sensitive custom metadata from responses, and modifying the XML on S3 error responses to direct the browser to apply an XSL stylesheet -- so there's no obvious reason why this application wouldn't work equally well.

Lambda@Edge can apply your customized access control easily.
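For instance, here is a rough Python sketch of what such a trigger could look like as an origin-response handler, combining the Lambda@Edge idea with the embargo-metadata approach from the previous answer. The x-amz-meta-embargo-until key, its ISO-8601-with-offset format, and the 60-second caching of the 403 are all assumptions for illustration, not anything Lambda@Edge prescribes:

from datetime import datetime, timezone

def handler(event, context):
    # Origin-response trigger: S3's response (including x-amz-meta-* headers)
    # is available here before CloudFront caches and returns it.
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]

    embargo = headers.get("x-amz-meta-embargo-until")
    if embargo:
        # Value assumed to be ISO-8601 with an explicit offset,
        # e.g. 2015-10-18T19:55:00+00:00
        release_at = datetime.fromisoformat(embargo[0]["value"])
        if datetime.now(timezone.utc) < release_at:
            return {
                "status": "403",
                "statusDescription": "Forbidden",
                "headers": {
                    # keep the 403 itself from being cached for long, so the
                    # object shows up soon after the release time passes
                    "cache-control": [{"key": "Cache-Control", "value": "max-age=60"}],
                    "content-type": [{"key": "Content-Type", "value": "text/plain"}],
                },
                "body": "Not released yet\n",
            }
    return response

With something like this, each object's release time travels with the object as custom metadata, which addresses the per-object requirement in the question's edit.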

Related

AWS s3 upload api call returning 411 status

I have been trying to perform an AWS S3 REST API call to upload a document to an S3 bucket. The document is in the form of a byte array.
PUT /Test.pdf HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: **********
Content-Type: application/pdf
Content-Length: 5039151
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-date: 20180301T055442Z
When we perform the API call, it returns response status 411, i.e. Length Required. We have already added the Content-Length header with the byte array length as its value, but the issue persists. Please help to resolve the issue.
x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD is only used with the non-standards-based chunk upload API. This is a custom encoding that allows you to write chunks of data to the wire. This is not the same thing as the Multipart Upload API, and is not the same thing as Transfer-Encoding: chunked (which S3 doesn't support for uploads).
It's not clear why this would result in 411 Length Required but the error suggests that S3 is not happy with the format of the upload.
For a standard PUT upload, x-amz-content-sha256 must be set to the hex-encoded SHA-256 hash of the request body, or the string UNSIGNED-PAYLOAD. The former is recommended, because it provides an integrity check. If for any reason your data were to become corrupted on the wire in a way that TCP failed to detect, S3 would automatically reject the corrupt upload and not create the object.
See also https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html
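For illustration, a minimal Python sketch of how the header value for a standard single-part PUT is derived (the file name is hypothetical, and this covers only the hashing step, not the full SigV4 signing):

import hashlib

# For a standard (non-chunked) PUT, x-amz-content-sha256 is the lowercase
# hex-encoded SHA-256 of the exact bytes sent as the request body.
with open("Test.pdf", "rb") as f:
    body = f.read()

print("x-amz-content-sha256:", hashlib.sha256(body).hexdigest())
print("Content-Length:", len(body))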

Restricting access to AWS S3 bucket based on referer

I'm trying to restrict access to a S3 bucket and only allowing certain domains from a list based on the referer.
The bucket policy is basically:
{
  "Version": "2012-10-17",
  "Id": "http referer domain lock",
  "Statement": [
    {
      "Sid": "Allow get requests originating from specific domains",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example.com/*",
      "Condition": {
        "StringLike": {
          "aws:Referer": [
            "*othersite1.com/*",
            "*othersite2.com/*",
            "*othersite3.com/*"
          ]
        }
      }
    }
  ]
}
Othersite1, 2, and 3 call an object that I have stored in my S3 bucket under the domain example.com.
I also have a CloudFront distribution attached to the bucket. I'm using a * wildcard before and after the string condition. The referer can be othersite1.com/folder/another-folder/page.html, and it may use either http or https.
I don't know why I'm getting a 403 Forbidden error.
I'm doing this basically because I don't want other sites to call that object.
Any help would be greatly appreciated.
As is necessary for correct caching behavior, CloudFront strips almost all of the request headers off of a request before forwarding it to the origin server.
Referer | CloudFront removes the header.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
So, if your bucket is trying to block requests based on the referring page, as is sometimes done to prevent hotlinking, S3 will not -- by default -- be able to see the Referer header, because CloudFront doesn't forward it.
And, this is a very good illustration of why CloudFront doesn't forward it. If CloudFront forwarded the header and then blindly cached the result, whether the bucket policy had the intended effect would depend on whether the first request was from one of the intended sites, or from elsewhere -- and other requesters would get the cached response, which might be the wrong response.
(tl;dr) Whitelisting the Referer header for forwarding to the origin (in the CloudFront Cache Behavior settings) solves this issue.
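If you'd rather do that via the API than the console, a hedged boto3 sketch (the distribution ID is hypothetical, and this assumes the distribution still uses the legacy forwarded-values settings rather than a cache policy):

import boto3

cf = boto3.client("cloudfront")
distribution_id = "EDFDVBD6EXAMPLE"   # hypothetical distribution ID

# Fetch the current config; the returned ETag is required for the update call.
resp = cf.get_distribution_config(Id=distribution_id)
config = resp["DistributionConfig"]
etag = resp["ETag"]

# Whitelist the Referer header on the default cache behavior.
config["DefaultCacheBehavior"]["ForwardedValues"]["Headers"] = {
    "Quantity": 1,
    "Items": ["Referer"],
}

cf.update_distribution(Id=distribution_id, DistributionConfig=config, IfMatch=etag)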
But, there is a bit of a catch.
Now that you are forwarding the Referer header to S3, you've extended the cache key -- the list of things against which CloudFront caches responses -- to include the Referer header.
So, now, for each object, CloudFront will not serve a response from cache unless the incoming request's Referer header exactly matches one from an already-cached request... otherwise the request has to go to S3. And the thing about the Referer header is that it's the referring page, not the referring site, so each page from the authorized sites will have its own cached copy of these assets in CloudFront.
This, itself, is not a problem. There is no charge for these extra copies of objects, and this is how CloudFront is designed to work... the problem is, it reduces the likelihood of a given object being in a given edge cache, since each object will necessarily be referenced less. This becomes less significant -- to the point of insignificance -- if you have a large amount of traffic, and more significant if your traffic is smaller. Fewer cache hits means slower page loads and more requests going to S3.
There is not a correct answer to whether or not this is ideal for you, because it is very specific to exactly how you are using CloudFront and S3.
But, here's the alternative:
You can remove the Referer header from the whitelist of headers to forward to S3, undoing that potential negative impact on cache hits, by configuring CloudFront to fire a Lambda@Edge Viewer Request trigger that will inspect each request as it comes in the front door and block those requests that don't come from referring pages that you want to allow.
A Viewer Request trigger fires after the specific Cache Behavior is matched, but before the actual cache is checked, and with most of the incoming headers still intact. You can allow the request to proceed, optionally with modifications, or you can generate a response and cancel the rest of the CloudFront processing. That's what I'm illustrating, below -- if the host part of the Referer header isn't in the array of acceptable values, we generate a 403 response; otherwise, the request continues, the cache is checked, and the origin consulted only as needed.
Firing this trigger adds a small amount of overhead to every request, but that overhead may amortize out to being more desirable than a reduced cache hit rate. So, the following is not a "better" solution -- just an alternate solution.
This is a Lambda function written in Node.js 6.10.
'use strict';
const allow_empty_referer = true;
const allowed_referers = ['example.com', 'example.net'];
exports.handler = (event, context, callback) => {
// extract the original request, and the headers from the request
const request = event.Records[0].cf.request;
const headers = request.headers;
// find the first referer header if present, and extract its value;
// then take http[s]://<--this-part-->/only/not/the/path.
// the || [])[0]) || {'value' : ''} construct is optimizing away some if(){ if(){ if(){ } } } validation
const referer_host = (((headers.referer || [])[0]) || {'value' : ''})['value'].split('/')[2];
// compare to the list, and immediately allow the request to proceed through CloudFront
// if we find a match
for(var i = allowed_referers.length; i--;)
{
if(referer_host == allowed_referers[i])
{
return callback(null,request);
}
}
// also test for no referer header value if we allowed that, above
// usually, you do want to allow this
if(allow_empty_referer && referer_host === "")
{
return callback(null,request);
}
// we did not find a reason to allow the request, so we deny it.
const response = {
status: '403',
statusDescription: 'Forbidden',
headers: {
'vary': [{ key: 'Vary', value: '*' }], // hint, but not too obvious
'cache-control': [{ key: 'Cache-Control', value: 'max-age=60' }], // browser-caching timer
'content-type': [{ key: 'Content-Type', value: 'text/plain' }], // can't return binary (yet?)
},
body: 'Access Denied\n',
};
callback(null, response);
};

Google IAP Public Keys Expiry?

This page provides public keys to decrypt headers from Google's Identity Aware Proxy. Making a request to the page provides its own set of headers, one of which is Expires (it contains a datetime).
What does the expiration date actually mean? I have noticed it fluctuating occasionally, and have not noticed the public keys changing at the expiry time.
I have read about Securing Your App With Signed Headers, and it goes over how to fetch the keys after every key ID mismatch, but I am looking to make a more efficient cache that can fetch the keys less often based on the expiry time.
Here are all the headers from the public keys page:
Accept-Ranges →bytes
Age →1358
Alt-Svc →quic=":443"; ma=2592000; v="39,38,37,36,35"
Cache-Control →public, max-age=3000
Content-Encoding →gzip
Content-Length →519
Content-Type →text/html
Date →Thu, 29 Jun 2017 14:46:55 GMT
Expires →Thu, 29 Jun 2017 15:36:55 GMT
Last-Modified →Thu, 29 Jun 2017 04:46:21 GMT
Server →sffe
Vary →Accept-Encoding
X-Content-Type-Options →nosniff
X-XSS-Protection →1; mode=block
The Expires header controls how long HTTP caches are supposed to hold onto that page. We didn't bother giving Google's content-serving infrastructure any special instructions for the keyfile, so whatever you're seeing there is the default value.
Is there a reason the "refresh the keyfile on lookup failure" approach isn't a good fit for your application? I'm not sure you'll be able to do any better than that, since:
Unless there's a bug or problem, you should never get a key lookup failure.
Even if you did have some scheduled key fetch, it'd probably still be advisable to refresh the keyfile on lookup failure as a fail-safe.
We don't currently rotate the keys super-frequently, though that could change in the future (which is why we don't publish the rotation interval), so it shouldn't be a significant source of load. Are you observing that refreshing the keys is impacting you?
--Matthew, Google Cloud IAP engineer
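For what it's worth, a minimal Python sketch of that refresh-on-lookup-failure approach (the endpoint URL placeholder and the assumption that the keyfile is a JSON map of key id to key are mine, not Google's):

import requests

# Placeholder: use the public-key URL given in the IAP signed-headers docs.
IAP_KEYS_URL = "https://example.invalid/iap/public_key"

_keys = {}  # in-process cache of key id -> public key

def get_key(kid):
    """Return the public key for a key id, refetching the keyfile on a miss."""
    global _keys
    if kid not in _keys:
        resp = requests.get(IAP_KEYS_URL, timeout=10)
        resp.raise_for_status()
        _keys = resp.json()   # assumed: JSON object mapping key id -> key
    return _keys.get(kid)     # still None => token signed with an unknown key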

Cloudfront TTL not working

I'm having a problem and have tried to follow answers here in the forum, but with no success whatsoever.
In order to generate thumbnails, I have set up the following schema:
S3 Account for original images
Ubuntu Server using NGINX and Thumbor
Cloudfront
The user uploads original images to S3, which are then pulled through the Ubuntu server, with Cloudfront in front of the request:
http://cloudfront.account/thumbor-server/http://s3.aws...
The problem is that we often lose objects in Cloudfront; I want them to stay in cache for 360 days.
I get the following response through the Cloudfront URL:
Cache-Control:max-age=31536000
Connection:keep-alive
Content-Length:4362
Content-Type:image/jpeg
Date:Sun, 26 Oct 2014 09:18:31 GMT
ETag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:31 GMT
Server:nginx/1.4.6 (Ubuntu)
Via:1.1 5e0a3a528dab62c5edfcdd8b8e4af060.cloudfront.net (CloudFront)
X-Amz-Cf-Id:B43x2w80SzQqvH-pDmLAmCZl2CY1AjBtHLjN4kG0_XmEIPk4AdiIOw==
X-Cache:Miss from cloudfront
After a new refresh, I get:
Age:50
Cache-Control:max-age=31536000
Connection:keep-alive
Date:Sun, 26 Oct 2014 09:19:21 GMT
ETag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:31 GMT
Server:nginx/1.4.6 (Ubuntu)
Via:1.1 5e0a3a528dab62c5edfcdd8b8e4af060.cloudfront.net (CloudFront)
X-Amz-Cf-Id:slWyJ95Cw2F5LQr7hQFhgonG6oEsu4jdIo1KBkTjM5fitj-4kCtL3w==
X-Cache:Hit from cloudfront
My NGINX responds as follows:
Cache-Control:max-age=31536000
Content-Length:4362
Content-Type:image/jpeg
Date:Sun, 26 Oct 2014 09:18:11 GMT
Etag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:11 GMT
Server:nginx/1.4.6 (Ubuntu)
Why does Cloudfront not store my objects as indicated, when max-age is set?
Many thanks in advance.
Your second request shows that the object was indeed cached. I assume you see that, but the question doesn't make it clear.
The Cache-Control: max-age only specifies the maximum age of your objects in the Cloudfront Cache at any particular edge location. There is no minimum time interval for which your objects are guaranteed to persist... after all, Cloudfront is a cache, which is volatile by definition.
If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that are more popular.
— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
Additionally, there is no concept of Cloudfront as a whole having a copy of your object. Each edge location's cache appears to operate independently of the others, so it's not uncommon to see multiple requests for relatively popular objects coming from different Cloudfront edge locations.
If you are trying to mediate the load on your back-end server, it might make sense to place some kind of cache that you control, in front of it, like varnish, squid, another nginx or a custom solution, which is how I'm accomplishing this in my systems.
Alternately, you could store every result in S3 after processing, and then configure your existing server to check S3, first, before attempting the work of resizing the object again.
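As a rough illustration of that last idea, a hedged Python sketch (the thumbnail bucket name and the generate_thumbnail helper are hypothetical stand-ins for your existing Thumbor step):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
THUMB_BUCKET = "example-thumbnails"   # hypothetical bucket for processed output

def get_or_create_thumbnail(key):
    """Serve a previously generated thumbnail from S3 if it exists;
    otherwise generate it once and store it for next time."""
    try:
        return s3.get_object(Bucket=THUMB_BUCKET, Key=key)["Body"].read()
    except ClientError as e:
        if e.response["Error"]["Code"] != "NoSuchKey":
            raise
    data = generate_thumbnail(key)    # hypothetical: your existing resize step
    s3.put_object(Bucket=THUMB_BUCKET, Key=key, Body=data,
                  ContentType="image/jpeg")
    return data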
Then why is there a documented "minimum" TTL?
On the same page quoted above, you'll also find this:
For web distributions, if you add Cache-Control or Expires headers to your objects, you can also specify the minimum amount of time that CloudFront keeps an object in the cache before forwarding another request to the origin.
I can see why this, and the tip phrase cited in the comment below...
The minimum amount of time (in seconds) that an object is in a CloudFront cache before CloudFront forwards another request to your origin to determine whether an updated version is available. 
...would seem to contradict my answer. There is no contradiction, however.
The minimum ttl, in simple terms, establishes a lower boundary for the internal interpretation of Cache-Control: max-age, overriding -- within Cloudfront -- any smaller value sent by the origin server. Server says cache it for 1 day, max, but configured minimum ttl is 2 days? Cloudfront forgets about what it saw in the max-age header and may not check the origin again on subsequent requests for the next 2 days, rather than checking again after 1 day.
The nature of a cache dictates the correct interpretation of all of the apparent ambiguity:
Your configuration settings limit how long Cloudfront MAY serve up cached copies of an object, and the point after which it SHOULD NOT continue to return the object from its cache. They do not mandate how long Cloudfront MUST maintain the cached copy, because Cloudfront MAY evict an object at any time.
If you set the Cache-Control: header correctly, Cloudfront will consider the larger of max-age or your Minimum TTL as the longest amount of time you want them to serve up the cached copy without consulting the origin server again.
As your site traffic increases, this should become less of an issue, since your objects will be more "popular," but fundamentally there is no way to mandate that Cloudfront maintain a copy of an object.

S3 PUT Bucket to a location endpoint results in a MalformedXML exception

I'm trying to create an AWS s3 bucket using libCurl thusly:
Location end-point
curl_easy_setopt(curl, CURLOPT_URL, "http://s3-us-west-2.amazonaws.com/");
Assembled RESTful HTTP header:
PUT / HTTP/1.1
Date:Fri, 18 Apr 2014 19:01:15 GMT
x-amz-content-sha256:ce35ff89b32ad0b67e4638f40e1c31838b170bbfee9ed72597d92bda6d8d9620
host:tempviv.s3-us-west-2.amazonaws.com
x-amz-acl:private
content-type:text/plain
Authorization: AWS4-HMAC-SHA256 Credential=AKIAISN2EXAMPLE/20140418/us-west-2/s3/aws4_request, SignedHeaders=date;x-amz-content-sha256;host;x-amz-acl;content-type, Signature=e9868d1a3038d461ff3cfca5aa29fb5e4a4c9aa3764e7ff04d0c689d61e6f164
Content-Length: 163
The body contains the bucket configuration
<CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <LocationConstraint>us-west-2</LocationConstraint>
</CreateBucketConfiguration>
I get the following exception back.
<Error><Code>MalformedXML</Code><Message>The XML you provided was not well-formed or did not validate against our published schema</Message></Error>
I've been able to carry out the same operation through the aws cli.
Things I've also tried.
1) In the xml, used \ to escape the quotes (i.e., xmlns=\"http:.../\").
2) Not providing a CreateBucketConfiguration ("Although s3 documentation suggests this is not allowed when sending the request to a location endpoint").
3) A GET Service call to the same endpoint lists all the provisioned buckets correctly.
Please do let me know if there is anything else I might be missing here.
OK, the problem was that I was not transferring the entire XML body across, as was revealed by a Wireshark trace. Once I fixed that, the problem went away.
Btw... escaping the quotes with a backslash works, but using &quot; does not.
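For anyone hitting the same wall, a hedged boto3 equivalent of the request above, which lets the SDK assemble and sign the CreateBucketConfiguration XML (bucket name taken from the question's Host header):

import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# The SDK builds and signs the CreateBucketConfiguration XML itself, which
# sidesteps hand-rolled XML and signing issues like the one above.
s3.create_bucket(
    Bucket="tempviv",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)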