Regular expression for nxlog - regex

I have a set of logs which do not have a fixed line number.
I am trying to get these logs to Elasticsearch through nxlog.
------------------------------------------------------
Timestamp: 2020-03-03 00:05:04Z|User -DEFAULT|Cart -DEFAULT|Source -mscorlib|Message -StartIndex cannot be less than zero
------------------------------------------------------
------------------------------------------------------
Timestamp: 2020-03-03 03:05:53Z|User -Simulate Pricing | Initiate | |Cart -|Level -INFO|Severity -Information|Class -SalesOrderClient.vb|Function - Simulate(testVal As Object) Method :| Cart_ID: 8092662
Timestamp: 2020-03-03 00:00:30Z|User -|Cart -|Source -mscorlib|Message -Thread was being aborted.|StackTrace - at System.Threading.Thread.AbortInternal()
at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.SendRequest(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
------------------------------------------------------
------------------------------------------------------
Timestamp: 2020-03-03 00:18:57Z|User -DEFAULT|Cart -DEFAULT|Source -mscorlib|Message -StartIndex cannot be less than zero.
------------------------------------------------------
This has a few logs with multiline (no fixedlines) and a few with a single line.
When im trying to fetch this data with nxlog, my third log is getting split into 4 different rows for each new line present.
How can I avoid this issue. What Regular expression can I use here? I want one full log to be in a single row, irrespective of how many line there are. The lines are present in the logs.
Thanks in Advance.

Use a regexp for HeaderLine with the xm_multiline module that matches your timestamp.
See the examples at https://nxlog.co/documentation/nxlog-user-guide/xm_multiline.html

Related

How to disable JSON format and send only the log message to Sumologic with Fluentbit?

We are using Fluentbit as as Sidecar container in our ECS fargate Cluster which is running a dotnet application, initially we faced the issue of fluentbit sending the logs in multiline and we solved it using Fluentbit Multilne feature. Now the logs are being sent to Sumologic in Multiple however it is being sent as Json format whereas we just want fluentbit send only the raw log
Logs are currently
{
date:1675120653.269619,
container_id:"xvgbertytyuuyuyu",
container_name:"XXXXXXXXXX",
source:"stdout",
log:"2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance"
}
We want only the line
2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance
You need to modify Fluent Bit configuration to have the following filters and output configuration:
fluent.conf:
## prepare headers for Sumo Logic
[FILTER]
Name record_modifier
Match *
Record headers.content-type text/plain
## Set headers as headers attribute
[FILTER]
Name nest
Match *
Operation nest
Wildcard headers.*
Nest_under headers
Remove_prefix headers.
[OUTPUT]
Name http
...
# use log key as body
body_key $log
# use headers key as headers
headers_key $headers
That way, you are going to craft HTTP request manually. This is going to send request per log, which is not necessary a good idea. In order to mitigate that you can add the following parser and use it (flush_timeout may need an adjustment):
parsers.conf
# merge everything as one big log
[MULTILINE_PARSER]
name multiline-all
type regex
flush_timeout 500
#
# Regex rules for multiline parsing
# ---------------------------------
#
# configuration hints:
#
# - first state always has the name: start_state
# - every field in the rule must be inside double quotes
#
# rules | state name | regex pattern | next state
# ------|---------------|--------------------------------------------
rule "start_state" ".*" "cont"
rule "cont" ".*" "cont"
fluent.conf:
[INPUT]
name tail
...
multiline.parser multiline-all

Create a cakephp filter for fail2ban

i would like to create a filter in fail2ban for searching and blocking bad request like "Controller class * could not be found."
For this problem i was create a cakephp.conf file in the filter.d directory in fail2ban. The Content:
[Definition]
failregex = ^[0-9]{4}\-[0-9]{2}\-[0-9]{2}.*Error:.*\nStack Trace:\n(\-.*|\n)*\n.*\n.*\nClient IP: <HOST>\n$
ignoreregex =
My example error log looks like this:
...
2020-10-08 19:59:46 Error: [Cake\Http\Exception\MissingControllerException] Controller class Webfig could not be found. in /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php on line 158
Stack Trace:
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php:46
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/BaseApplication.php:249
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/authentication/src/Middleware/AuthenticationMiddleware.php:122
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Middleware/CsrfProtectionMiddleware.php:146
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/RoutingMiddleware.php:172
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/AssetMiddleware.php:68
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Error/Middleware/ErrorHandlerMiddleware.php:121
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Server.php:90
- /home/myapplication/htdocs/webroot/index.php:40
Request URL: /webfig/
Referer URL: http://X.X.X.X/webfig/
Client IP: X.X.X.X
...
X.X.X.X are replaced
But i can't match any ip adresses. The fail2ban tester says:
root#test:~# fail2ban-regex /home/myapplication/htdocs/logs/error.log /etc/fail2ban/filter.d/cakephp.conf
Running tests
=============
Use failregex filter file : cakephp, basedir: /etc/fail2ban
Use log file : /home/myapplication/htdocs/logs/error.log
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [719] {^LN-BEG}ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
`-
Lines: 15447 lines, 0 ignored, 0 matched, 15447 missed
[processed in 10.02 sec]
Missed line(s): too many to print. Use --print-all-missed to print all 15447 lines
i can't see any problems. Can you help me? :)
Thanks
The issue is your log is poor suitable to parse - it is a multiline log-file (IP takes place in other line as the failure message).
Let alone the line with IP does not has any ID (common information with line of failure), it can be still worse if several messages are crossing (so Client IP from other message that is not a failure, coming after failure message).
If you can change the log-format better do that (so date, IP and failure sign are in the same line), e.g. if you use nginx, organize a conditional logging for access log from php-location in error case like this.
See Fail2ban :: wiki :: Best practice for more info.
If you cannot do that (well better would be to change it), you can use multi-line buffering and parsing using maxlines parameter and <SKIPLINES> regex.
Your filter would be something like that:
[Definition]
# we ignore stack trace, so don't need to hold buffer window too large,
# 5 would be enough, but to be sure (if some log-messages crossing):
maxlines = 10
ignoreregex = ^(?:Stack |- /)
failregex = ^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>
To test it directly use:
fail2ban-regex --maxlines=5 /path/to/log '^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>' '^(?:Stack |- /)'
But as already said, it is really ugly - better you find the way to log everything in a single line.

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the multilines, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
And, I'm trying to get UUID, DateTime, and Message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I'm getting the last word CS_DESTROY.
I tried fluentular and still got:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
It's missing what's between the datetime and "https".
Try:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg

fail2ban scan for 403 in nginx access logs

I have setup some specific rules on nginx, blocking some urls and some extensions (aspx, sh, jsp, etc..).
I have also enable a custom access log file only for 403|429|410 errors, so that in only 1 place i can have all my access denied log.
My goal is to have fail2ban read this log and for every GET/POST that ends in a 403 error, IP should be banned.
1) nginx.conf will be logging the custom error log file like this:
log_format limit '$time_local - $remote_addr "$request" $status';
and this is a log entry:
03/Jan/2017:15:53:01 +0100 - 1.2.3.4 "GET /aaa.jsp HTTP/1.1" 403
2) i have a fail2ban filter like this (taken from here)
^<HOST> .* "(GET|POST) [^"]+" 403
3) i have tried with fail2ban-regex
fail2ban-regex /var/log/nginx/access-live-limitbot-website.log /etc/fail2ban/filter.d/nginx-403.conf
and this is the output
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [1] Day/MONTH/Year:Hour:Minute:Second
`-
Lines: 2 lines, 0 ignored, 0 matched, 2 missed
|- Missed line(s):
| 217.19.158.242 "POST /wp-login.php HTTP/1.1" 403
| 03/Jan/2017:15:53:01 +0100 - 217.19.158.242 "GET /aaa.jsp HTTP/1.1" 403
`-
and i will never get the entry matching the error code.
Will someone please help me with the regex based on my custom log?
thank you
Fail2ban is picky about the date format. Also, for ease of matching, I suggest reordering the items in the log.
For date format, see documentation here:
https://www.fail2ban.org/wiki/index.php/MANUAL_0_8
In order for a log line to match your failregex, it actually has to match in two parts: the beginning of the line has to match a timestamp pattern or regex, and the remainder of the line has to match your failregex. If the failregex is anchored with a leading ^, then the anchor refers to the start of the remainder of the line, after the timestamp and intervening whitespace.
The pattern or regex to match the time stamp is currently not documented, and not available for users to read or set. See Debian bug #491253. This is a problem if your log has a timestamp format that fail2ban doesn't expect, since it will then fail to match any lines. Because of this, you should test any new failregex against a sample log line, as in the examples below, to be sure that it will match. If fail2ban doesn't recognize your log timestamp, then you have two options: either reconfigure your daemon to log with a timestamp in a more common format, such as in the example log line above; or file a bug report asking to have your timestamp format included.
For the reorder, something like datetime - status - host (- other stuff), would help create a simple pattern such as 403.
Therefore your log should look like:
03-01-2017 15:53:01 403 1.2.3.4 "GET /aaa.jsp HTTP/1.1"
and your pattern can be
403 <HOST>
You can run this from the command line to validate as:
fail2ban-regex '03-01-2017 15:53:01 403 1.2.3.4 "GET /aaa.jsp HTTP/1.1"' '403 <HOST>'
Which produces the output:
Running tests
=============
Use regex line : 403 <HOST>
Use single line: 03-01-2017 15:53:01 403 1.2.3.4 "GET /aaa.jsp HTTP...
Matched time template Day-Month-Year Hour:Minute:Second
Got time using template Day-Month-Year Hour:Minute:Second
Results
=======
Failregex: 1 total
|- #) [# of hits] regular expression
| 1) [1] 403 <HOST>
`-
Ignoreregex: 0 total
Summary
=======
Addresses found:
[1]
1.2.3.4 (Tue Jan 03 15:53:01 2017)
Date template hits:
2 hit(s): Day-Month-Year Hour:Minute:Second
Success, the total number of match is 1

AWS logs agent setup

We have recently setup AWS logs agent on one of our test servers. Our log files usually contain multi-line events. e.g one of our log event is:
[10-Jun-2016 07:30:16 UTC] SQS Post Response: Array
(
[Status] => 200
[ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
)
The log agent reference documentation says to use 'multi_line_start_pattern' option for such logs. Our AWS Log agent config is as follows:
[httpd_info.log]
file = /var/log/httpd/info.log*
log_stream_name = info.log
initial_position = start_of_file
log_group_name = test.server.name
multi_line_start_pattern = '(\[)+\d{2}-[a-zA-Z]{3}+-\d{4}'
However, the logs agent reporting breaks on aforementioned and similar events. The way it is being reported to CloudWatch Logs is as follows:
Event 1:
[10-Jun-2016 11:21:26 UTC] SQS Post Response: Array
Event 2:
( [Status] => 200 [ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
Event 3:
)
Despite of the fact that its only a single event. Any clue whats going on here?
I think all you need to add is the following to your awslogs.conf
datetime_format = %d-%b-%Y %H:%M:%S UTC
time_zone = UTC
multi_line_start_pattern = {datetime_format}
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
multi_line_start_pattern
Specifies the pattern for identifying the start of a log message. A log message is made of a line that matches the pattern and any following lines that don't match the pattern. The valid values are regular expression or {datetime_format}. When using {datetime_format}, the datetime_format option should be specified. The default value is ‘^[^\s]' so any line that begins with non-whitespace character closes the previous log message and starts a new log message.
If that datetime format didn't work, you would need to update your regex to actually match your specific datetime. I don't think the one you have listed above actually works for your given format.
You could try this for instance:
[\d{2}-[\w]{3}-\d{4}\s{1}\d{2}:\d{2}:\d{2}\s{1}\w+]
does match
[10-Jun-2016 11:21:26 UTC]
See here: http://www.regexpal.com/?fam=96811
Once completed, issue a restart of the service and check to see if its parsing correctly.
$ sudo service awslogs restart