AWS API Gateway GET response always cached

Update: I figured it out; please see my answer below.
I have an AWS API Gateway API defined with various resources and various GET and POST methods.
Everything works mostly fine.
POSTs are going through.
GETs return a response (a JSON payload), except that the returned value seems to be cached.
My GET API calls a Lambda function that runs a query against RDS.
I can confirm my responses are stale because:
When I manually query the RDS database, I get the updated value.
I have CloudWatch logs enabled and the Lambda function does not get called (I believe I have this set up correctly, because when I test-invoke the Lambda, I do get CloudWatch logs).
It did refresh once, but I think that was because some caching threshold (around 1 hour) had passed.
I understand that API Gateway creates a CloudFront distribution behind the scenes.
I suspect this is what is doing the caching, but that's just a guess and I have no proof. Maybe some kind of default caching TTL?
I obviously have caching turned off on my API Gateway stage.
I even tried enabling it, setting the TTL to 1, flushing the cache, and disabling cache again.
Each stage of that testing still returned the stale values.
I do not know if it is relevant, but here are some additional details:
I have CORS enabled ("*")
I have Cognito authorizers enabled
I pass in the JWT token via the Authorization header (this is all working fine)
Is there some header I'm supposed to pass to request an uncached value?
I went to CloudFront, but there are no configurations there.
All other posts on API Gateway caching seem to be about caching not working or people asking about cache key specificity.
I haven't seen anything about the value ALWAYS being cached no matter what. So I feel like I'm missing something obvious...
Any help or debugging tips would be much appreciated!

OK, so I feel like an idiot for answering my own question, but hopefully it helps someone one day.
This was not an API Gateway caching issue.
The problem was a pymysql connection & lambda session caching issue.
My Lambda was using pymysql to query the MySQL RDS.
Following the recommended performance practice, I reused the connection across Lambda invocations (meaning I did not close the connection each time).
The solution was to call
conn.commit()
after I did my fetchall()
What was happening was that my subsequent calls were returning a cached query result: pymysql leaves autocommit off by default, so the first SELECT opened a transaction, and every later SELECT on the reused connection read from that same snapshot (termed a consistent read; thanks, @Michael - sqlbot!).
I believe I probably had more than one Lambda container or something, so when I was inactive for a while (i.e. busy reading Stack Overflow posts), the container would be unloaded.
My next API Gateway request would then initialize a fresh Lambda handler, and a brand new connection would be created (without a cache).
This is why it seemed to "sometimes work, then stop".
Apologies if I wasted anyone's time.

Thank you for asking and answering your own question. I had the same issue and was pulling my hair out trying to find which API Gateway setting was caching the result.

Related

submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for any documentation/sample-code (in JavaScript) for uploading files to S3 via CloudFront but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but as far as I could find, it is possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 saying that AWS had just added the ability to make POST/PUT requests, but it says not a single thing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. It just assumes that we already know everything about the subject. If I knew, why would I dive into the documentation in the first place?
I believe there is some confusion here about the addition of these methods. This feature was added simply so that CloudFront would forward POST/PUT requests to your origin, allowing functionality in your application such as form submissions or API requests to work through the distribution.
The recommended approach, as you pointed out, is to use S3 Transfer Acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
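The question asks for JavaScript, but to keep one language in this thread, here is a Python/boto3 sketch of the Transfer Acceleration route; the JavaScript SDK exposes a comparable useAccelerateEndpoint client option. It enables acceleration on the bucket and uploads through the s3-accelerate endpoint. The bucket name, key, and local path are hypothetical, and the period check reflects the documented rule that accelerated bucket names must not contain dots.

```python
from urllib.parse import quote

ACCELERATE_HOST = "s3-accelerate.amazonaws.com"


def accelerated_object_url(bucket, key):
    """Build the virtual-hosted accelerate endpoint URL for an object."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration bucket names must not contain periods")
    return f"https://{bucket}.{ACCELERATE_HOST}/{quote(key)}"


def enable_and_upload(bucket, local_path, key):
    """Turn acceleration on for the bucket, then upload via the accelerate endpoint."""
    import boto3  # imported lazily so the URL helper works without boto3
    from botocore.config import Config

    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={"Status": "Enabled"},
    )
    accelerated = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    accelerated.upload_file(local_path, bucket, key)
```

Enabling acceleration is a one-time bucket setting; after that, only the client configuration needs use_accelerate_endpoint.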

Express Gateway policy

I am testing out EG policies for my microservices app. One requirement is that whenever express gateway receives a request, I want to invoke a particular service, parse its result, and based on the result decide to proceed for downstream calls or return an error. It appears to be a very standard requirement. Is there any existing policy (could not find any here) for such scenarios or do I need to write a custom one? Thanks
This is Vincenzo; I am the maintainer of Express Gateway :)
Unfortunately, you have spotted a gap in Express Gateway: "post proxy" policies. Fundamentally, right now the proxy policy is the last one to be executed, and there's nothing else you can do before sending the request to the downstream client.
This is a limitation that we definitely need to fix, although you're the first one to bring up this use case.
This does not mean that you cannot do it now. I think it'd be kind of easy as well, but unfortunately you'd need to fork the Gateway and add some code.
If you could describe your use case in a little more detail, we might evaluate whether there's a way to make it happen in the next release :)

How do I configure an Amazon AWS Lambda function to prevent tailing the log in the response?

Please see this:
http://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html
LogType
You can set this optional parameter to Tail in the request only if you specify the InvocationType parameter with value RequestResponse. In this case, AWS Lambda returns the base64-encoded last 4 KB of log data produced by your Lambda function in the x-amz-log-result header.
Valid Values: None | Tail
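To make the concern concrete, here is a Python sketch, assuming boto3, of what any caller with invoke permission can do: call the function synchronously with LogType="Tail" and decode the LogResult field (boto3's name for the x-amz-log-result header). The function name passed in is hypothetical.

```python
import base64


def decode_log_tail(invoke_response):
    """Decode the base64 LogResult field of a boto3 Lambda invoke() response ('' if absent)."""
    encoded = invoke_response.get("LogResult")
    if not encoded:
        return ""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


def invoke_with_tail(function_name, payload=b"{}"):
    """Invoke synchronously with LogType='Tail', as any caller allowed to invoke the function can."""
    import boto3  # imported lazily so decode_log_tail can be used without boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # Tail only applies to synchronous invocations
        LogType="Tail",
        Payload=payload,
    )
    return decode_log_tail(response)
```

For example, print(invoke_with_tail("my-function")) would print the last 4 KB of that invocation's log output.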
So this means any user with valid credentials for invoking a function can also read the logs this function emits?
If so, this is an obvious vulnerability that can give an attacker useful information about how invalid input is processed.
How do I configure an Amazon AWS Lambda function to prevent tailing the log in the response?
Update 1
1) Regarding the comment: "If a hacker can call your Lambda function, you have more problems than seeing log files."
Not true: Lambda functions are also meant to be called directly from client code, using the SDK.
As an example, see the picture below from the book "AWS Lambda in Action":
2) Regarding the comment: "How is this a vulnerability exactly? Only someone you have provided AWS IAM credentials would be able to invoke the Lambda function."
Of course, clients do have some credentials most of the time (for example, from having signed in to your mobile app with their Facebook account, through Amazon Cognito). Am I supposed to trust all my users?
3) Regarding the comment: "Only if you have put some secure information to be logged."
Logs may contain sensitive information. I'm not talking about secure information like passwords, but simply information that helps the development team debug, or the security team find out about attacks. Applications may log all kinds of information, including why some invalid input failed, which can help an attacker learn what valid input looks like. Also, attackers can see all the information the security team is logging about their attacks. Not good. Even privacy may be at risk, depending on what you log.
Update 2
It would also solve my problem if I could somehow detect the Tail parameter in the Lambda code. Then I would just fail with a "Tail not allowed" message. Unfortunately, the Context object doesn't seem to contain this information.
I think you can't configure AWS Lambda to prevent tailing the log in the response. However, you could use your own logging component instead of the one provided by AWS Lambda, so that your logs cannot be exposed via the LogType parameter.
Otherwise, I see your point about adding complexity, but using API Gateway is the most common way to let client applications that you do not trust invoke Lambdas.
You're right: not only is it a bad practice, it's obviously (as you already understood) introducing security vulnerabilities.
If you look carefully in the book, you will also find this part:
which explains that, to be more secure, client requests should hit Amazon API Gateway, which exposes a clean API interface and calls the relevant Lambda function without exposing it to the outside world.
An example of such an API is demonstrated on a previous page:
By introducing a middle layer between the client and AWS Lambda, we take care of authentication, authorization, access, and all other points of potential vulnerability.
This should really be a comment, but I'm sorry that I don't yet have enough Stack Overflow reputation to post one.
Before getting to the question, please note that a Lambda Invoke may result in more than one execution of your function (per the AWS documentation):
Invocations occur at least once in response to an event and functions must be idempotent to handle this.
As LogType is documented as a valid option, I don't think you can prevent it in your backend. However, you can work around it. I can think of two options:
1- Generate 4 KB of junk at the end of the log (with console.log(), for example), so that an attacker requesting the tail gets only junk. (Because the function cannot tell whether Tail was requested, the junk is written, and billed as log output, on every invocation, not only when an attacker calls.)
2- Use Step Functions. This not only hides the log, but also overcomes the 'invocations occur at least once' problem and gives your backend a predictable execution. It does incur extra cost, though.
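Option 1 can be sketched in Python (the answer mentions console.log(), the Node.js equivalent) as a decorator that writes 4 KB of filler to stdout after the handler finishes, assuming your application logs go to the same captured stream. This is an illustrative workaround, not an AWS feature: Lambda still appends its own END/REPORT lines after the filler, and the handler shown is hypothetical.

```python
import functools
import sys

TAIL_BYTES = 4 * 1024  # LogType=Tail returns at most the last 4 KB of log output
LINE = "." * 63 + "\n"  # 64 bytes per filler line


def pad_log_tail(handler):
    """Decorator: after the handler finishes (or raises), write 4 KB of filler to stdout."""

    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        finally:
            sys.stdout.write(LINE * (TAIL_BYTES // len(LINE)))
            sys.stdout.flush()

    return wrapper


@pad_log_tail
def lambda_handler(event, context):
    print("detailed validation message")  # the kind of detail you would rather not expose
    return {"ok": True}
```

Running the filler in a finally block means error paths, which tend to log the most revealing details, are padded too.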

Pass IAM identity of AWS API-Gateway calls to backend server

We want to offer an existing API as SaaS using AWS.
Our code has been deployed via Elastic Beanstalk, and we created access to the methods via API Gateway to manage permissions.
We're now trying to log the users' activity, for billing purposes.
Currently, the best solution we have found involves full logging of the calls (enabled CloudWatch Logs + log full requests/responses data), which looks quite heavy, and may even end up being expensive.
We reworked the request body in the integration request by adding a mapping template for the body, but this seems heavy and complicated; we hope there is a better solution we missed.
Basically, we replaced the default "passthrough" with generated basic "passthrough" code, and added a value "MyUserArn" : "$context.identity.userArn" to it, which fills the request body with a large mess, but looks like "the most reliable way to avoid breaking something".
We'd like to just add the IAM user identifier in a header, or query string parameter, but failed to find if this is even possible. Several posts mention an "Invoke with caller credentials" option, but we didn't find this either.
Is it something related to Cognito, or something else?
Are we doing something wrong ?
You have a couple of different options for getting this information, both of which have trade-offs:
Your current solution: pulling the value from $context.identity in a mapping template and sending it to your Lambda as part of the body. It seems like you are opposed to this given your "large mess" comment, but ultimately you have control over the content passed to your Lambda.
Enable "Invoke with caller credentials" on your method and then use the identity value inside your Lambda. Currently this only works if you've used credentials vended from a Cognito authentication flow, and it does require that Lambda invocation also be part of your role policy, but it doesn't require any modification of the template.
UPDATE Apologies, I somehow missed that you were using Beanstalk and not Lambda. You can definitely just add a header to your integration request and simply have it pull its value from $context.identity.userArn.
UPDATE 2 Double apologies: when using context variables in headers, you omit the $, so you need to use context.identity.userArn.
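The header approach from the updates can be sketched in Python with boto3's update_integration: one patch operation maps context.identity.userArn (no $) into an integration request header, and the Beanstalk app reads that header for billing. The header name X-User-Arn and the API/resource identifiers are hypothetical; the change only reaches a stage after a new deployment.

```python
HEADER_NAME = "X-User-Arn"  # hypothetical header name for this sketch


def user_arn_header_patch(header_name=HEADER_NAME):
    """Patch operation that maps the caller's IAM ARN into an integration request header."""
    return [
        {
            "op": "add",
            "path": f"/requestParameters/integration.request.header.{header_name}",
            # context variables are referenced without the leading $ in parameter mappings
            "value": "context.identity.userArn",
        }
    ]


def apply_header_mapping(rest_api_id, resource_id, http_method):
    """Apply the mapping with boto3; redeploy the API afterwards for the stage to pick it up."""
    import boto3  # imported lazily so the pure helpers work without boto3

    boto3.client("apigateway").update_integration(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod=http_method,
        patchOperations=user_arn_header_patch(),
    )


def caller_arn_from_headers(headers, header_name=HEADER_NAME):
    """Backend side (e.g. the Beanstalk app): case-insensitive header lookup for billing logs."""
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None
```

The case-insensitive lookup matters because HTTP header names are not case-sensitive and proxies or frameworks may normalize them.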

Returning images through AWS API Gateway

I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through but it gets displayed as a big chunk of ASCII because Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned without quotes and discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
I apologize, in advance, for giving an answer that does not directly answer the question and instead suggests a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside AWS Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront provides fetching of content from a back-end server, caching of content (at over 50 "edge" locations globally), no charge for the storage of cached content, and you can configure up to 100 distinct hostnames pointing to a single CloudFront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it quite successfully for exactly the scenario you describe: "a proxy in front of an image service." What exactly my image service and your image service do may be different (mine is a resizer that can look up the source URL of missing/never-before-requested images, fetch them, and resize them), but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective, but in fact using CloudFront can be almost $0.005 cheaper than not using it per gigabyte downloaded.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working on an alternate question with base64-encoded image data, which you may find helpful in your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question, to get the Content-Type to come through as a hard-coded value, you would first go into the method response screen and add a Content-Type header and whatever Content type you want.
Then you'd go into the Integration Response screen and set the Content-Type header mapping to your desired value (image/png in this example). Wrap 'image/png' in single quotes.
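The two console steps above can be sketched with boto3 in Python, assuming the method and its integration already exist: declare Content-Type on the method response, then map a static, single-quoted value onto it in the integration response. The identifiers are hypothetical. If a 200 method response already exists, put_method_response may be rejected and update_method_response would be needed instead. Returning real binary bodies may additionally need the base64/binary handling discussed in the linked question.

```python
CONTENT_TYPE_PARAM = "method.response.header.Content-Type"


def content_type_mappings(content_type="image/png"):
    """Return (method response params, integration response params) for a static Content-Type."""
    method_params = {CONTENT_TYPE_PARAM: False}  # declare the header; False means "not required"
    integration_params = {CONTENT_TYPE_PARAM: f"'{content_type}'"}  # static values go in single quotes
    return method_params, integration_params


def apply_content_type(rest_api_id, resource_id, http_method="GET", status_code="200", content_type="image/png"):
    """Push both mappings to API Gateway; redeploy the stage afterwards."""
    import boto3  # imported lazily so content_type_mappings works without boto3

    client = boto3.client("apigateway")
    method_params, integration_params = content_type_mappings(content_type)
    client.put_method_response(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod=http_method,
        statusCode=status_code,
        responseParameters=method_params,
    )
    client.put_integration_response(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod=http_method,
        statusCode=status_code,
        responseParameters=integration_params,
    )
```

The single quotes tell API Gateway to treat image/png as a literal rather than as a reference to an integration response field.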