Is there any way for the IP once denied by a WAF rule to be unbarred again passing through the rule? - google-cloud-platform

I have set up Google Cloud Armor security policy referring to https://cloud.google.com/armor/docs/rules-language-reference. It worked fine. My simulated SQL injection attack from my office was detected and subsequent accesses were blocked. Stackdriver log entry shows corresponding enforcedSecurityPolicy outcome of "deny" and applied expression ID was "owasp-crs-v030001-id942421-sqli". The key WAF rule is as follows:
evaluatePreconfiguredExpr('xss-stable') && evaluatePreconfiguredExpr('sqli-stable')
One point I cannot control. After my simulated attack, all accesses from my office are blocked all the way along. Once I detached and re-attached the Cloud Armor security policy from and to LB, the access from my office are still blocked. Deleting that security policy and re-created it again does not help. This implies there is an unseen persistent database of SQLi & XSS attackers and my office IP might be registered in it, causing that 'all-the-time' denial.
Question is : how can I remove my IP from that unseen 'SQLi & XSS blacklist' database to regain backend access without modifying rules? In our Cloud Armor production operation, once-forbidden IP may want to regain access to the target backend service after its attack source is removed.
Certainly, if I add higher priority permission rule than the WAF rule, I can regain access to the target backend, but WAF check will be bypassed, which is not what I want.
Thank you in advance for your time.
R.Kurishima

I had a similar situation and almost concluded the same thing you did -- that there's some kind of hidden blacklist. But after playing around some more, I figured out that, instead, some other non-malicious cookies in my request were triggering owasp-crs-v030001-id942421-sqli ("Restricted SQL Character Anomaly Detection (cookies): # of special characters exceeded (3)" -- and later owasp-crs-v030001-id942420-sqli ("Restricted SQL Character Anomaly Detection (cookies): # of special characters exceeded (8)"). Not a hidden blacklist.
Near as I can tell, these two rules use the number of 'special' characters in the Cookies header, and not the number of special characters in each cookie. Furthermore, the equals sign -- which is used for each cookie -- counts as a special character. Same with the semicolon. Irritating.
So this request will trigger 942420:
curl 'https://example.com/' -H 'cookie: a=a; b=b; c=c; d=d; e=e;'
And this will trigger 942421:
curl 'https://example.com/' -H 'cookie: a=a; b=b;'
So probably best to disable these two rules, something like
evaluatePreconfiguredExpr('sqli-canary', [
'owasp-crs-v030001-id942420-sqli',
'owasp-crs-v030001-id942421-sqli'
])

Related

modsecurity blocking but not logging a violation

So it's not that there are no logs, there are actually many violations logged, its just an issue I'm having with a few people; 10s of violations out of millions of requests. To make it easy to differentiate between modsecurity and backend violations I changed SecDefaultAction to a status of 406, works like a charm.
It's not a performance issue, the modsecurity servers are in an auto-scaling group and hardly taxed. I can see in our Kinesis logs the return code of 406 being sent to the user, as well as actually seeing the 406 in their browser. There is no corresponding modsecurity violation though.
The modsecurity servers are all behind load balancers and dont see the users IPs, I dont have any DOS or IP Reputation on anyway.
The only thing I really have to go on is, while we were in DetectionOnly these particular users would trigger a 930120 when they logged in.
"request": "GET /a/environment_settings.js HTTP/2.0", "id": "930120"
"Matched Data: <omitted> found within REQUEST_COOKIES:access_token: <omitted>
We turned the rule on and I wrote the following in crs-after:
SecRuleUpdateTargetByTag "attack-lfi" "!REQUEST_COOKIES:access_token"
Everybody was fine logging in except for this one user. Unfortunately I have nothing to go on because while they get a 406, nothing is logged for it. At one time 941150 would silently increment the anomaly counter but that rule isn't in play here. I was wondering if there are any other rules that may silently increment. Or any thoughts on how to debug this.
OWASP ModSecurity Core Rule Set dev-on-duty here. To resolve the false positive with the CRS rule 930120 you can do the following:
Put the following tuning rule into crs-after (you're right here).
SecRuleUpdateTargetById 930120 !REQUEST_COOKIES:access_token
I highly recommend the tuning tutorials of CRS co-lead Christian that can be found here: https://www.netnea.com/cms/apache-tutorials/. There you'll also find a tuning cheat sheet.
In the logs you should see the rules that increment the anomaly score. There shouldn't be a rule that increments silently.

Making GCP Load-balancer HTTP logs integrate with Cloud Trace (in GCP Logs Explorer)?

Cloud Trace and Cloud Logging integrate quite nicely in most cases, described in https://cloud.google.com/trace/docs/trace-log-integration
Unfortunately, this doesn't seem to include the HTTP request logs generated by a Load Balancer when request logging is enabled.
The LB logs show the traces icon, and are correctly associated with an overall trace in the Cloud Trace system, but the context menu 'show trace details' is greyed out for those log items.
A similar problem arose with my application level logging/tracing, and was solved by setting the traceSampled attribute on the LogEntry, but this can't work for LB logs, since I'm not in control of their generation.
In this instance I'm tracing 100% of requests since the service is M2M and fairly low volume, but in the general case it makes sense that the LB can't know if something is actually generating traces without being told.
I can't find any good references in the docs, but in theory a response header indicating it was sampled could be observed by the LB and cause it to issue the appropriate log.
Any ideas if such a feature exists, in this form or any other?
(Last-ditch workaround might be to use Logs Router to feed LB logs into a pubsub queue (and exclude them from normal logging sinks), and resubmit them to the normal sink(s) with fields appropriately set by some Cloud Function or other pubsub consumer, but that seems like a lot of work and complexity for this purpose)
There is currently a Feature Request created for this, you can follow the status in the following link 1.
As a workaround, you could implement target proxies along with your Load Balancer, according to the documentation for a Global external HTTP(S) load balancer:
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only) Contains parameters for Cloud Trace.
X-Forwarded-For: [<supplied-value>,]<client-ip>,<load-balancer-ip>
(see X-Forwarded-For header) (requests only)
You can find the complete documentation about external HTTP(S) load balancers and target proxies here 2.
And finally, take a look at the documentation on how to use and configure target proxies here 3.

Should django health-check endpoint /ht/ be accessible from everybody?

From the documentation reported here I read
This project checks for various conditions and provides reports when
anomalous behavior is detected.The following health checks are bundled
with this project: cache, database, storage, disk and memory
utilization (viapsutil), AWS S3 storage, Celery task queue, Celery
ping, RabbitMQ, Migrations
and from use case section
The primary intended use case is to monitor conditions via HTTP(S),
with responses available in HTML and JSONformats. When you get back a
response that includes one or more problems, you can then decide the
appropriate courseof action, which could include generating
notifications and/or automating the replacement of a failing node with
a newone
And then
The /ht/ endpoint will respond aHTTP 200 if all checks passed and a HTTP
500 if any of the tests failed.
From a security point of view: should this url (https://example.com/ht) be reachable from everybody? It seems to give away different information.

Does Cloudwatch not log Lambda functions from Cloudfront?

Does Cloudfront require special settings to trigger a log?
I have the following flow:
Devices -> Cloudfront -> API Gateway -> Lambda Function
which works, but Cloudwatch doesn't seem to create logs for the lambda function (or API Gateway).
However, the following flow creates logs:
Web/Curl -> API Gateway -> Lambda Function
In comments, above, we seem to have arrived at a conclusion that unanticipated client-side caching (or caching somewhere between the client and the AWS infrastructure) may be a more appropriate explanation for the observed behavior, since there is no known mechanism by which an independent CloudFront distribution could access a Lambda function via API Gateway and cause those requests not to be logged by Lambda.
So, I'll answer this with a way to confirm or reject this hypothesis.
CloudFront injects a header into both requests and responses, X-Amz-Cf-Id, containing opaque tokens that uniquely identify the request and the response. Documentation refers to these as "encrypted," but for our purposes, they're simply opaque values with a very high probability of uniqueness.
In spite of having the same name, the request header and the response header are actually two uncorrelated values (they don't match each other on the same request/response).
The origin-side X-Amz-Cf-Id is sent to the origin server in the request is only really useful to AWS engineers, for troubleshooting.
But the viewer-side X-Amz-Cf-Id returned by CloudFront in the response is useful to us, because not only is it unique to each response (even responses from the CloudFront cache have different values each time you fetch the same object) but it also appears in the CloudFront access logs as x-edge-request-id (although the documentation does not appear to unambiguously state this).
Thus, if the client side sees duplicate X-Amz-Cf-Id values across multiple responses, there is something either internal to the client or between the client and CloudFront (in the client's network or ISP) that is causing cached responses to be seen by the client.
Correlating the X-Amz-Cf-Id from the client across multiple responses may be useful (since they should never be the same) and with the CloudFront logs may also be useful, since this confirms the timestamp of the request where CloudFront actually generated this particular response.
tl;dr: observing the same X-Amz-Cf-Id in more than one response means caching is occurring outside the boundaries of AWS.
Note that even though CloudFront allows min/max/default TTLs to impact how long CloudFront will cache the object, these settings don't impact any downstream or client caching behavior. The origin should return correct Cache-Control response headers (e.g. private, no-cache, no-store) to ensure correct caching behavior throughout the chain. If the origin behavior can't be changed, then Lambda#Edge origin response or viewer response triggers can be used to inject appropriate response headers -- see this example on Server Fault.
Note also that CloudFront caches 4xx/5xx error responses for 5 minutes by default. See Amazon CloudFront Latency for an explanation and steps to disable this behavior, if desired. This feature is designed to give the origin server a break, and not bombard it with requests that are presumed to continue to fail, anyway. This behavior may cause various problems in testing as well as production, so there are cases where it should be disabled.

How do I add simple licensing to api when using AWS Cloudfront to cache queries

I have an application deployed on AWS Elastic Beanstalk, I added some simple licensing to stop abuse of the api, the user has to pass a licensekey as a field
i.e
search.myapi.com/?license=4ca53b04&query=fred
If this is not valid then the request is rejected.
However until the monthly updates the above query will always return the same data, therefore I now point search.myapi.com to an AWS CloudFront distribution, then only if query is not cached does it go to actual server as
direct.myapi.com/?license=4ca53b04&query=fred
However the problem is that if two users make the same query they wont be deemed the same by Cloudfront because the license parameter is different. So the Cloudfront caching is only working at a per user level which is of no use.
What I want to do is have CloudFront ignore the license parameters for caching but not the other parameters. I dont mind too much if that means user could access CloudFront with invalid license as long as they cant make successful query to server (since CloudFront calls are cheap but server calls are expensive, both in terms of cpu and monetary cost)
Perhaps what I need is something in front of CloudFront that does the license check and then strips out the license parameter but I don't know what that would be ?
Two possible come to mind.
The first solution feels like a hack, but would prevent unlicensed users from successfully fetching uncached query responses. If the response is cached, it would leak out, but at no cost in terms of origin server resources.
If the content is not sensitive, and you're only trying to avoid petty theft/annoyance, this might be viable.
For query parameters, CloudFront allows you to forward all, cache on whitelist.
So, whitelist query (and any other necessary fields) but not license.
Results for a given query:
valid license, cache miss: request goes to origin, origin returns response, response stored in cache
valid license, cache hit: response served from cache
invalid license, cache hit: response served from cache
invalid license, cache miss: response goes to origin, origin returns error, error stored in cache.
Oops. The last condition is problematic, because authorized users will receive the cached error if the make the same query.
But we can fix this, as long as the origin returns an HTTP error for an invalid request, such as 403 Forbidden.
As I explained in Amazon CloudFront Latency, CloudFront caches responses with HTTP errors using different timers (not min/default/max-ttl), with a default of t minutes. This value can be set to 0 (or other values) for each of several individual HTTP status codes, like 403. So, for the error code your origin returns, set the Error Caching Minimum TTL to 0 seconds.
At this point, the problematic condition of caching error responses and playing them back to authorized clients has been solved.
The second option seems like a better idea, overall, but would require more sophistication and probably cost slightly more.
CloudFront has a feature that connects it with AWS Lambda, called Lambda#Edge. This allows you to analyze and manipulate requests and responses using simple Javascript scripts that are run at specific trigger points in the CloudFront signal flow.
Viewer Request runs for each request, before the cache is checked. It can allow the request to continue into CloudFront, or it can stop processing and generate a reaponse directly back to the viewer. Generated responses here are not stored in the cache.
Origin Request runs after the cache is checked, only for cache misses, before the request goes to the origin. If this trigger generates a response, the response is stored in the cache and the origin is not contacted.
Origin Response runs after the origin response arrives, only for cache misses, and before the response goes onto the cache. If this trigger modifies the response, the modified response stored in the cache.
Viewer Response runs immediately before the response is returned to the viewer, for both cache misses and cache hits. If this trigger modifies the response, the modified response is not cached.
From this, you can see how this might be useful.
A Viewer Request trigger could check each request for a valid license key, and reject those without. For this, it would need access to a way to validate the license keys.
If your client base is very small or rarely changes, the list of keys could be embedded in the trigger code itself.
Otherwise, it needs to validate the key, which could be done by sending a request to the origin server from within the trigger code (the runtime environment allows your code to make outbound requests and receive responses via the Internet) or by doing a lookup in a hosted database such as DynamoDB.
Lambda#Edge triggers run in Lambda containers, and depending on traffic load, observations suggest that it is very likely that subsequent requests reaching the same edge location will be handled by the same container. Each container only handles one request at a time, but the container becomes available for the next request as soon as control is returned to CloudFront. As a consequence of this, you can cache the results in memory in a global data structure inside each container, significantly reducing the number of times you need to ascertain whether a license key is valid. The function either allows CloudFront to continue processing as normal, or actively rejects the invalid key by generating its own response. A single trigger will cost you a little under $1 per million requests that it handles.
This solution prevents missing or unauthorized license keys from actually checking the cache or making query requests to the origin. As before, you would want to customize the query string whitelist in the CloudFront cache behavior settings to eliminate license from the whitelist, and change the error caching minimum TTL to ensure that errors are not cached, even though these errors should never occur.