ModSecurity blocking but not logging a violation

So it's not that there are no logs; there are actually many violations logged. It's just an issue I'm having with a few people: tens of violations out of millions of requests. To make it easy to differentiate between ModSecurity and backend violations I changed SecDefaultAction to a status of 406, which works like a charm.
It's not a performance issue; the ModSecurity servers are in an auto-scaling group and hardly taxed. I can see in our Kinesis logs the return code of 406 being sent to the user, and I can also see the 406 in their browser. There is no corresponding ModSecurity violation logged, though.
The ModSecurity servers are all behind load balancers and don't see the users' IPs; I don't have any DoS or IP reputation rules enabled anyway.
The only thing I really have to go on is that, while we were in DetectionOnly, these particular users would trigger rule 930120 when they logged in:
"request": "GET /a/environment_settings.js HTTP/2.0", "id": "930120"
"Matched Data: <omitted> found within REQUEST_COOKIES:access_token: <omitted>
We turned the rule on and I wrote the following in crs-after:
SecRuleUpdateTargetByTag "attack-lfi" "!REQUEST_COOKIES:access_token"
Everybody was fine logging in except for this one user. Unfortunately I have nothing to go on, because while they get a 406, nothing is logged for it. At one time 941150 would silently increment the anomaly counter, but that rule isn't in play here. I was wondering if there are any other rules that may silently increment the score, or if anyone has thoughts on how to debug this.

OWASP ModSecurity Core Rule Set dev-on-duty here. To resolve the false positive with CRS rule 930120, you can do the following:
Put the following tuning rule into crs-after (you're right to use that file).
SecRuleUpdateTargetById 930120 "!REQUEST_COOKIES:access_token"
I highly recommend the tuning tutorials by CRS co-lead Christian Folini, which can be found here: https://www.netnea.com/cms/apache-tutorials/. There you'll also find a tuning cheat sheet.
In the logs you should see every rule that increments the anomaly score; there shouldn't be any rule that increments it silently.
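If nothing shows up even in the audit log for the affected user, one way to dig deeper is to temporarily raise the debug log level on a single ModSecurity node and reproduce the login. This is only a sketch; the log path is a placeholder:

SecDebugLog /var/log/modsec_debug.log
SecDebugLogLevel 9
# Write blocked transactions to the audit log in full
SecAuditEngine RelevantOnly
SecAuditLogParts ABIJDEFHZ

Level 9 is extremely verbose, so leave it on only for the duration of the test. The debug log lists every rule evaluated for the request, including any rule that only raises the anomaly score.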

Related

Is there any way for an IP, once denied by a WAF rule, to be unblocked again while still passing through the rule?

I have set up a Google Cloud Armor security policy following https://cloud.google.com/armor/docs/rules-language-reference. It worked fine: my simulated SQL injection attack from my office was detected and subsequent accesses were blocked. The Stackdriver log entry shows the corresponding enforcedSecurityPolicy outcome of "deny", and the applied expression ID was "owasp-crs-v030001-id942421-sqli". The key WAF rule is as follows:
evaluatePreconfiguredExpr('xss-stable') && evaluatePreconfiguredExpr('sqli-stable')
One point I cannot figure out: after my simulated attack, all accesses from my office are blocked from then on. Even after I detached the Cloud Armor security policy from the LB and re-attached it, access from my office is still blocked. Deleting the security policy and re-creating it does not help either. This implies there is an unseen persistent database of SQLi & XSS attackers, and my office IP might be registered in it, causing that 'all-the-time' denial.
The question is: how can I remove my IP from that unseen 'SQLi & XSS blacklist' database to regain backend access without modifying the rules? In our Cloud Armor production operation, a once-forbidden IP may need to regain access to the target backend service after its attack source has been removed.
Certainly, if I add an allow rule with a higher priority than the WAF rule, I can regain access to the target backend, but then the WAF check is bypassed, which is not what I want.
Thank you in advance for your time.
R.Kurishima
I had a similar situation and almost concluded the same thing you did -- that there's some kind of hidden blacklist. But after playing around some more, I figured out that, instead, some other non-malicious cookies in my request were triggering owasp-crs-v030001-id942421-sqli ("Restricted SQL Character Anomaly Detection (cookies): # of special characters exceeded (3)") and later owasp-crs-v030001-id942420-sqli ("Restricted SQL Character Anomaly Detection (cookies): # of special characters exceeded (8)"). Not a hidden blacklist.
As near as I can tell, these two rules count the number of 'special' characters in the whole Cookie header, not the number of special characters in each individual cookie. Furthermore, the equals sign -- which appears in every cookie -- counts as a special character, and so does the semicolon. Irritating.
So this request will trigger 942420 (five '=' and five ';' make ten special characters, exceeding the threshold of 8):
curl 'https://example.com/' -H 'cookie: a=a; b=b; c=c; d=d; e=e;'
And this will trigger 942421 (two '=' and two ';' make four, exceeding the threshold of 3):
curl 'https://example.com/' -H 'cookie: a=a; b=b;'
So it's probably best to disable these two rules, with something like:
evaluatePreconfiguredExpr('sqli-canary', [
'owasp-crs-v030001-id942420-sqli',
'owasp-crs-v030001-id942421-sqli'
])
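If you prefer to keep the 'stable' expression from your original rule, the same opt-out list should, as far as I know, work there as well, for example:

evaluatePreconfiguredExpr('xss-stable') && evaluatePreconfiguredExpr('sqli-stable', [
  'owasp-crs-v030001-id942420-sqli',
  'owasp-crs-v030001-id942421-sqli'
])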

Random “upstream connect error or disconnect/reset before headers” between services with Istio 1.3

So, this problem seems to be happening at random and between different services.
For example, we have a service A which needs to talk to a service B, and sometimes we get this error, but after a while it goes away. The error doesn't happen too often.
When it happens, we see the error log in service A with the “upstream connect error” message, but nothing in service B, so we think it might be related to the sidecars.
One thing we noticed is that in service B we get a lot of these error messages in the istio-proxy container:
[src/istio/mixerclient/report_batch.cc:109] Mixer Report failed with: UNAVAILABLE:upstream connect error or disconnect/reset before headers. reset reason: connection failure
According to the documentation, when a request comes in, Envoy asks Mixer whether everything is good (authorization and other things), and if Mixer doesn't reply, the request does not succeed. That's why the policyCheckFailOpen option exists.
We have that set to false, which I guess is a sane default; we don't want requests to go through if Mixer cannot be reached. But why can't it be reached?
disablePolicyChecks: true
policyCheckFailOpen: false
controlPlaneSecurityEnabled: false
NOTE: istio-policy is running with the istio-proxy sidecar. Is that correct?
We don't see that error in some other services, which can also fail.
Another log message that I see a lot, and which appears in all the services that are not running as root and have fsGroup defined in their YAML files, is:
watchFileEvents: "/etc/certs": MODIFY|ATTRIB
watchFileEvents: "/etc/certs/..2020_02_10_09_41_46.891624651": MODIFY|ATTRIB
watchFileEvents: notifying
One of the leads I'm chasing is the default circuitBreakers values. Could that be related to this?
Thanks
The error you are seeing is caused by a failure to establish a connection to istio-policy.
Based on this GitHub issue, community members added two answers there which could help with your issue:
If mTLS is enabled globally, make sure you set controlPlaneSecurityEnabled: true.
I was facing the same issue, then I read about protocol selection. I realised that the name of the port in the service definition should start with, for example, http- (see the sketch below). This fixed the issue for me. If you still face the issue, you might need to look at the tls-check for the pods and resolve it using DestinationRules and Policies.
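For reference, a minimal sketch of that port naming convention in a Service definition; the service name, namespace and port numbers are made up:

apiVersion: v1
kind: Service
metadata:
  name: service-b
  namespace: default
spec:
  selector:
    app: service-b
  ports:
  - name: http-api    # the "http-" prefix tells Istio 1.3 to treat this port as HTTP
    port: 8080
    targetPort: 8080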
istio-policy is running with the istio-proxy sidecar. Is that correct?
Yes, I just checked, and it is running with the sidecar.
Let me know if that helps.
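If you want to confirm from inside the cluster that istio-policy is actually up and reachable, a rough check could look like the following; the label selector and container name are assumptions and can differ between install profiles:

kubectl -n istio-system get pods -l istio-mixer-type=policy
kubectl -n istio-system logs deploy/istio-policy -c mixer --tail=50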

Webhook call failed: URL_REJECTED error in DialogFlow v2 Fulfillments

Error description
Upon calling the DialogFlow v2 detectIntent API, we randomly get an internal error with status code 13:
Webhook call failed. Fetch failure with no HTTP status code. Status: State: URL_REJECTED Reason: 67
This error seems to happen randomly. The same request can succeed or fail.
Interestingly, the service has been deteriorating since Friday 23rd August 2019, to the point of failing on almost every call today.
Our investigation
We didn't find anything at all on the internet about URL_REJECTED in connection with DialogFlow or Google.
But we found the meaning of the status code 13 on this page:
Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors.
We also checked that we aren't banning Google IPs and that our load balancing is not messed up (we thought of that since it would make sense with random failures).
The webhook is up and running, and we can call it ourselves. The problem seems to happen in Google's infrastructure, as error code 13 suggests.
(I am answering immediately because we fixed it before posting the question, but I posted it anyway because it may be useful for others.)
The problem was that the webhook was called using http.
Switching the webhook to https solved the problem.
It seems that Google activated a webhook policy on their servers that rejects insecure calls.
It may have been deployed gradually on their cluster, which would explain the gradual degradation.
We know that we should have migrated to https a long time ago; still, we didn't find any mention of this policy being applied anywhere on the net.
Thank you for posting this. I came across the same issue; changing my webhook to HTTPS seems to have fixed the problem.

Unusual request activity log found in django server

Following is a screenshot of the server activity log. I can see that many requests are automatically being made to the server. How can I avoid this?
It looks like someone is fuzzing your website, scanning for common file names or extensions that often have security vulnerabilities. One way to limit this behaviour is to implement rate limiting, whereby you limit the number of requests from a client that result in HTTP 404 Not Found during some time period before giving them a temporary ban (a rough sketch of this follows below). Note: this doesn't stop the scanning entirely, but it buys you time and may deter the attacker or researcher.
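A minimal sketch of that idea as a custom Django middleware, assuming the default cache backend is configured; the thresholds, cache key names and client IP handling are illustrative only:

from django.core.cache import cache
from django.http import HttpResponseForbidden

MAX_404S = 20          # hypothetical threshold of 404s per window
WINDOW_SECONDS = 300   # hypothetical counting window
BAN_SECONDS = 3600     # hypothetical temporary ban duration

class NotFoundRateLimitMiddleware:
    """Count 404 responses per client IP and temporarily ban noisy clients."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Behind a load balancer you would normally read X-Forwarded-For instead.
        ip = request.META.get("REMOTE_ADDR", "unknown")

        if cache.get(f"404ban:{ip}"):
            return HttpResponseForbidden("Too many not-found requests.")

        response = self.get_response(request)

        if response.status_code == 404:
            key = f"404count:{ip}"
            cache.get_or_set(key, 0, WINDOW_SECONDS)   # ensure the counter exists
            if cache.incr(key) >= MAX_404S:
                cache.set(f"404ban:{ip}", True, BAN_SECONDS)

        return response

The class would then be added to the MIDDLEWARE list in settings.py. A packaged solution such as django-ratelimit, or blocking at the reverse proxy level, achieves much the same with less code.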

Appfabric Cache Perfmon Errors

We have a critical system that is highly dependent on Appfabric Caching. The setup we use is three nodes which serves around 2000 simultaneous connections and 150-200 requests/second.
The configuration is the default one. We receive maybe 5-10 "ErrorCode:SubStatus" errors each day, which is unacceptable.
I have added some performance counters, but I can't see anything unusual, except that we sometimes see values on "Total Failure Exceptions / sec" and that "Total Failure Exceptions" is increasing, but only 2-3 times a day.
I would like to see where these errors come from, but I can't find them in any logs in the Event Viewer (I enabled them all according to the documentation). Does anyone know whether these errors could be logged somewhere, and/or whether it is possible to see them in any other way?
We receive maybe 5-10 "ErrorCode:SubStatus" errors each day, which is unacceptable.
Between 5 and 10 errors per day, with 150 requests/sec? That's quite anecdotal. Your cache client always has to handle caching errors properly; a network failure can always occur.
5-10 "ErrorCode:SubStatus" is quite obsur. There are more than 50 error codes in AppFabric Caching. Try to get exactly these error codes. See full list here.
I would like to see where these errors come from, but I can't find them in any logs in the Event Viewer (I enabled them all according to the documentation). Does anyone know whether these errors could be logged somewhere, and/or whether it is possible to see them in any other way?
The only documentation available is here. The Event Viewer is useful for regularly monitoring the health of the cache cluster. However, when troubleshooting an error, it is possible to get an even more detailed log of the cache cluster's activities. I'm not sure this will help you much, though, because it's sometimes too specific.