I have a Splunk query that summarises errors by frequency:
index="pc_1" LogLevel=ERROR
| eval Message=split(_raw,"|")
| stats count(LogLevel) as Frequency by Message
| sort -Frequency
This produces results in the form
Message                                                                Frequency
No such user                                                           137
unable to deliver mail to example#email.com: Unable to reach server    70
unable to deliver mail to example1#email.com: Unable to reach server   43
unable to authenticate user 3456                                       8
unable to deliver mail to example2#email.com: Unable to reach server   6
unable to authenticate user 2321                                       5
unable to authenticate user 13321                                      3
...
unable to deliver mail to examplen#email.com: Unable to reach server   1
As you can see, similar errors are split into separate rows because they differ only in user IDs, email addresses, or machine IDs.
I am looking for a way to group them based on string similarity. Currently I replace the variable parts with a common placeholder via a regular expression and then compute the frequency:
index="pc_1" LogLevel=ERROR
| eval Message=split(_raw,"|")
| eval Message=replace("unable to deliver mail to (.)* Unable to reach server", "unable to deliver mail to [email]: Unable to reach server")
| eval Message=replace("unable to authenticate user \d+", "unable to authenticate user [userId]")
| stats count(LogLevel) as Frequency by Message
| sort -Frequency
This approach works but is quite cumbersome: there are many different types of errors, so implementing it fully means going through each one and writing a regular expression for it.
Is there a way to improve this with a query that summarizes the errors more effectively?
Answer for posterity:
Perhaps the cluster command will help. It groups like messages together.
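A minimal sketch (untested against your data; t is the similarity threshold, and higher values demand closer matches):
index="pc_1" LogLevel=ERROR
| cluster showcount=t t=0.8
| table cluster_count, _raw
| sort -cluster_count
Each output row is then one cluster of similar events, with cluster_count holding how many raw events it absorbed, so the per-email and per-user variants collapse into single rows without writing a regex per error type.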
I'm using Amazon RDS with a medium instance (db.t2.medium), which has a max connections limit of around 400. The database is almost at that limit even when only two users are using the app, and it is accessed only through mobile APIs (Android); nothing else makes calls to it.
What might be the issue? Where are all these connections coming from?
Could a DDoS cause this? We bought a brand-new server.
You're probably not closing connections when you're done with them.
Log into the database as the root user and execute this query:
select HOST, COMMAND, count(*) from INFORMATION_SCHEMA.PROCESSLIST group by 1, 2;
It will give you output that looks like this:
+-----------+---------+----------+
| HOST | COMMAND | count(*) |
+-----------+---------+----------+
| localhost | Query | 1 |
| localhost | Sleep | 1 |
+-----------+---------+----------+
If you have two users with stable IP addresses, you'll probably see four lines of output: two for each user, with a high count for Sleep. This indicates that you're leaving connections open.
If you're running on mobile, however, the IP addresses may not be stable. You'll need to do a second level of analysis to see if they're all from the same ISP(s).
The only way that a DDOS would fill up your connection pool is if you've leaked the database password. If that's the case, you should consider your database corrupted and start over (with more attention to security).
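If your API layer runs on the JVM (an assumption on my part; the same discipline applies in any language), the usual fix is try-with-resources, which closes the connection even when an exception is thrown. A minimal sketch with a hypothetical endpoint, table, and credentials:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UserLookup {
    // Hypothetical RDS endpoint and credentials; substitute your own.
    private static final String URL =
            "jdbc:mysql://my-instance.rds.amazonaws.com:3306/appdb";

    public static String findName(int userId) throws Exception {
        // try-with-resources closes the ResultSet, PreparedStatement and
        // Connection in all cases, so nothing is left sitting in Sleep.
        try (Connection conn = DriverManager.getConnection(URL, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) {
            ps.setInt(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
In production you would normally put a connection pool (e.g. HikariCP) in front of this, but close-what-you-open is the part that stops the leak.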
Background: I'm running a Plesk CentOS 6.7 server with 30+ domains. I'm getting huge amounts of spam from a specific TLD (.top in this case). I'm running SpamAssassin with an RBL (xbl.spamhaus.org). SpamAssassin flags most of these messages as spam, but enough get through that my server is being rate limited by Google's mail servers (because some of my users' email accounts forward to Gmail). I get ZERO legit email from this TLD, and memory usage is up a few percent recently, so I'm trying to save some overhead and improve my server's reputation by blocking these messages before they even reach Postfix.
I would like to write a filter for fail2ban that would match connections from this TLD, and ban the corresponding IP addresses.
Here are example log entries:
Mar 20 03:12:43 mydomain postfix/smtpd[6557]: connect from whatevermonkey.top[66.199.245.168]
Mar 20 05:07:38 mydomain postfix/smtpd[13299]: connect from someonecat.top[216.169.126.67]
So can anyone help with a regex that I could plug into fail2ban to match every 'connect from' that includes the '.top' TLD?
I've been trying to work this up from my working postfix-sasl filter (below), but my regex chi is not strong enough... Here is the working postfix-sasl filter, which matches failed login attempts:
failregex = ^%(__prefix_line)swarning: [-._\w]+\[<HOST>\]: SASL ((?i)LOGIN|PLAIN|(?:CRAM|DIGEST)-MD5) authentication failed(: [ A-Za-z0-9+/:]*={0,2})?\s*$
Again, I just want to match IP addresses that are preceded by 'somespammyserver.top' Any help greatly appreciated.
Something like this might be of help:
connect from [\w.]+\.top\[([.\d]+)\]
# match "connect from" literally
# followed greedily by one or more word characters (\w = A-Za-z0-9_) or dots
# followed by ".top["
# capture everything that is a digit or a dot into the first group
# (hence the parentheses)
# followed immediately by a closing bracket ]
See a demo on regex101.com.
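Wired into a fail2ban filter, that could look roughly like the sketch below (the filter file name is invented; fail2ban needs the IP captured through its <HOST> tag, and __prefix_line comes from common.conf, just as in your postfix-sasl filter):
# /etc/fail2ban/filter.d/postfix-top-tld.conf (hypothetical name)
[INCLUDES]
before = common.conf

[Definition]
failregex = ^%(__prefix_line)sconnect from [-._\w]+\.top\[<HOST>\]$
ignoreregex =
You would still need a jail entry pointing this filter at your Postfix log; testing it first with fail2ban-regex against the two sample lines above is a good sanity check.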
I am using a Logstash + Elasticsearch stack to aggregate logs from a few interrelated apps.
I am trying to get Monit to alert whenever the word 'ERROR' is returned as part of an Elasticsearch REST query from Monit, but the 'content' regex check does not seem to be working for me. (I am sending email and SMS alerts from Monit via M/Monit.)
I know my Monit and M/Monit instances are configured properly because I can get alerts for server pings and file checksum changes, etc. just fine.
My Monit Elasticsearch HTTP query looks like this:
check host elasticsearch_error with address 12.34.56.789
if failed
url http://12.34.56.789:9200/_search?q=severity%3AERROR%20AND%20timestamp%3A>now-2d
and content = "ERROR"
then alert
BTW, %20 encodes a space and %3A encodes ':'.
My Logstash only has error log entries that are between one and two days old; i.e., when I run
http://12.34.56.789:9200/_search?q=severity%3AERROR%20AND%20timestamp%3A>now-2d
in the browser, I see errors (with the word 'ERROR') in the response body, but when I run
http://12.34.56.789:9200/_search?q=severity%3AERROR%20AND%20timestamp%3A>now-1d
I do not. (Note the one-day difference.) This is expected behavior. Note: my response body is JSON with the "ERROR" string in a child element a few levels down. I don't know if this affects how Monit processes the regex.
When I run the check as above I see
'elasticsearch_error' failed protocol test [HTTP] at
INET[12.34.56.789:9200/_search
q=severity%3AERROR%20AND%20timestamp%3A>now-2d]
via TCP -- HTTP error: Regular expression doesn't match:
regexec() failed to match
in the log. Good: content == "ERROR" is true, and I can alert from this (even though I find the "Connection failed" message in the Monit browser dashboard a little irritating; it should say something like "Regex failure").
The Problem
When I 'monit reload' and run the check with
url http://12.34.56.789:9200/_search?q=severity%3AERROR%20AND%20timestamp%3A>now-1d
I STILL get the regexec() failed to match error as above, even though the response body now contains no "ERROR" string, i.e. content == "ERROR" is false. Why does this check fail? Any light shed on this issue will be appreciated!
The Answer
It turns out this problem is about URL encoding of the Elasticsearch query.
I used url http://12.34.56.789:9200/_search?q=severity:ERROR&timestamp:>now-36d in the check to get Monit to make a request that looks like 12.34.56.789:9200/_search?q=severity:ERROR&timestamp:%3Enow-36d. Note the change in encoding. This seems to work.
The actual URL used by Monit can be seen by starting it in debug mode with monit -vI.
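For reference, the full working check would then look something like this (same host and alert as in the question, with only the URL re-encoded; a sketch, not tested beyond the above):
check host elasticsearch_error with address 12.34.56.789
if failed
url http://12.34.56.789:9200/_search?q=severity:ERROR&timestamp:>now-2d
and content == "ERROR"
then alert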
Side Question
The 'content' object seems to accept '=', '==', and '!='. '=' is referenced in the documentation, but a lot of third-party examples use '=='. Which is the most correct to use?
Side Question Answer
The helpful folks on the M/Monit team advise that "=" is an alias for "==" in the Monit configuration file.
I added the solution I found to my question above.
I've been investigating the mod_security integration with Google Safe Browsing, but I can't generate the local GSB database with the HTTP call. First I generated an API key for my Google user, and calling the URL to retrieve the lists works fine:
http://safebrowsing.clients.google.com/safebrowsing/list?client=api&apikey=<myapikey>&appver=1.0&pver=2.2
After that I generated the new MAC key over an SSL call, as below:
https://sb-ssl.google.com/safebrowsing/newkey?client=api&apikey=<myapikey>&appver=1.0&pver=2
I received a 200 response with a clientkey (24 chars long) and a wrappedkey (100 chars long). Then I tried to download the lists with the call below, but received a 400 response:
http://safebrowsing.clients.google.com/safebrowsing/downloads?client=api&apikey=<myapikey>&appver=1.0&pver=2.2&wrkey=<mywrappedkey>
Has anyone seen the same behavior?
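One thing worth checking (an assumption on my part, based on the v2 protocol spec rather than on testing this exact setup): the downloads endpoint expects an HTTP POST whose body names the lists being requested, and a bodyless GET is a classic way to get a 400 back. Roughly, with illustrative list names:
POST /safebrowsing/downloads?client=api&apikey=<myapikey>&appver=1.0&pver=2.2&wrkey=<mywrappedkey> HTTP/1.1
Host: safebrowsing.clients.google.com

goog-malware-shavar;
googpub-phish-shavar;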
Just in case someone here might know about this: I am implementing this in C++ on Windows.
This is the URL I received in an earlier file listing (I cut the tokens out to keep it shorter). When I send this request to the network, I only get a Bad Request error:
https://www.googleapis.com/drive/v2/files?maxResults=20&pageToken=valid_page_token&access_token=valid_access_token
But when I add an underscore to change pageToken to page_Token, I get the file listing response from Google Drive:
https://www.googleapis.com/drive/v2/files?maxResults=20&page_Token=valid_page_token&access_token=valid_access_token
So I wonder which way this should be, or do I always need to manipulate the string to get the following request to work?