match logstring with regex

I hope I can find help here, because I really don't have a clue about regex. I'm trying to create a log-file viewer with the Monaco editor. I started from this sample, but my log strings can span multiple lines and I would like to use a different date format. So assuming I have a log string like this:
[2017-02-03 22:07:56] [info] [Memory] After GC, total memory:737mb, used: 268mb, reclaimed: 293
[2017-02-03 22:10:15] [info] [Memory] After GC, total memory:705mb, used: 247mb, reclaimed: 141
[2017-02-03 22:10:25] [info] [Memory] After GC, total memory:705mb, used: 258mb, reclaimed: 21
[2017-02-03 22:14:34] [warn] [Evaluator] org.mozilla.javascript.EcmaError: Cannot convert null to an object.
Caused by error in Business Rule: 'GlobalHideGlobalUsersFromNonAdmins' at line 5
2:
3: var encodedQueryString = 'sys_domain!=global';
4:
==> 5: var imp = gs.getImpersonatingUserName().toString();
6: if(imp.length > 0) {
7: encodedQueryString = encodedQueryString + '^ORuser_name=' + imp;
8: }
[2017-02-03 22:14:34] [warn] [Evaluator] org.mozilla.javascript.EcmaError: Cannot convert null to an object.
Caused by error in Business Rule: 'GlobalHideGlobalUsersFromNonAdmins' at line 1
==> 1: (function executeRule(current, previous /*null when async*/) {
2:
3: var encodedQueryString = 'sys_domain!=global';
4:
This currently doesn't match my date format, and it only matches the first line of a log message; if there are carriage returns, nothing matches until the next log message starts. Is there a regex guru here who could help me out? :)
monaco.languages.setMonarchTokensProvider('log', {
  tokenizer: {
    root: [
      [/\[error.*/, "custom-error"],
      [/\[warn.*/, "custom-warn"],
      [/\[info.*/, "custom-info"],
      [/\[debug.*/, "custom-debug"],
      [/\[[a-zA-Z 0-9:]+\]/, "custom-date"],
    ]
  }
});
UPDATE:
So here is the solution I've come up with. Apparently I'm still not able to match multiple lines between two [DATE] strings, so for now I will just match e.g. [error] as a workaround. Maybe somebody can push me in the right direction...
monaco.languages.setMonarchTokensProvider('log', {
  tokenizer: {
    root: [
      [/\[error\]/, "custom-error"],
      [/\[warn\]/, "custom-warn"],
      [/\[info\]/, "custom-info"],
      [/\[debug\]/, "custom-debug"],
      [/^\[\d{4}[./-]\d{2}[./-]\d{2} \d{2}[./:]\d{2}[./:]\d{2}]/, "custom-date"],
    ]
  }
});

I think you might be missing an escape at the end of the pattern in your update - it should be "\]" for the closing bracket.
Here's a tighter pattern, based on what the subgroups of digits all share:
\[(\d{2,4}[\:\-\s\]])+
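As a quick sanity check (a minimal Node.js sketch, not part of the original post), that pattern does pick up the full timestamp bracket from one of your sample lines:
// hypothetical sanity check of the suggested pattern
const datePattern = /\[(\d{2,4}[\:\-\s\]])+/;
const line = '[2017-02-03 22:07:56] [info] [Memory] After GC, total memory:737mb, used: 268mb, reclaimed: 293';
console.log(line.match(datePattern)[0]); // "[2017-02-03 22:07:56]"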
Could you give an example of what you want to capture in cases of "multiple lines between two [DATE] strings"?
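In the meantime, if the goal is simply to keep coloring continuation lines until the next timestamped line, Monarch's tokenizer states might be one direction. This is only a sketch (untested in the editor), and the state names are made up:
monaco.languages.setMonarchTokensProvider('log', {
  tokenizer: {
    root: [
      [/^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]/, "custom-date"],
      // switch to a body state so the rest of the message keeps its color
      [/\[error\]/, { token: "custom-error", next: "@errorBody" }],
      [/\[warn\]/, { token: "custom-warn", next: "@warnBody" }],
      [/\[info\]/, "custom-info"],
      [/\[debug\]/, "custom-debug"],
    ],
    errorBody: [
      // the next timestamped line ends the multi-line message
      [/^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]/, { token: "custom-date", next: "@pop" }],
      [/.+/, "custom-error"],
    ],
    warnBody: [
      [/^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]/, { token: "custom-date", next: "@pop" }],
      [/.+/, "custom-warn"],
    ],
  }
});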
Hope this helps!

Related

Create a cakephp filter for fail2ban

I would like to create a filter in fail2ban for finding and blocking bad requests like "Controller class * could not be found."
For this I created a cakephp.conf file in fail2ban's filter.d directory. The content:
[Definition]
failregex = ^[0-9]{4}\-[0-9]{2}\-[0-9]{2}.*Error:.*\nStack Trace:\n(\-.*|\n)*\n.*\n.*\nClient IP: <HOST>\n$
ignoreregex =
My example error log looks like this:
...
2020-10-08 19:59:46 Error: [Cake\Http\Exception\MissingControllerException] Controller class Webfig could not be found. in /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php on line 158
Stack Trace:
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php:46
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/BaseApplication.php:249
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/authentication/src/Middleware/AuthenticationMiddleware.php:122
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Middleware/CsrfProtectionMiddleware.php:146
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/RoutingMiddleware.php:172
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/AssetMiddleware.php:68
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Error/Middleware/ErrorHandlerMiddleware.php:121
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Server.php:90
- /home/myapplication/htdocs/webroot/index.php:40
Request URL: /webfig/
Referer URL: http://X.X.X.X/webfig/
Client IP: X.X.X.X
...
The X.X.X.X values are redacted.
But I can't match any IP addresses. The fail2ban tester says:
root@test:~# fail2ban-regex /home/myapplication/htdocs/logs/error.log /etc/fail2ban/filter.d/cakephp.conf
Running tests
=============
Use failregex filter file : cakephp, basedir: /etc/fail2ban
Use log file : /home/myapplication/htdocs/logs/error.log
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [719] {^LN-BEG}ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
`-
Lines: 15447 lines, 0 ignored, 0 matched, 15447 missed
[processed in 10.02 sec]
Missed line(s): too many to print. Use --print-all-missed to print all 15447 lines
I can't see any problems. Can you help me? :)
Thanks
The issue is that your log is poorly suited to parsing - it is a multi-line log file (the IP ends up on a different line than the failure message).
Moreover, the line with the IP does not carry any ID (information shared with the failure line), and it can get still worse if several messages interleave (so a Client IP belonging to another message that is not a failure could come right after the failure message).
If you can change the log format, better do that (so the date, IP and failure marker are on the same line); e.g. if you use nginx, set up conditional logging of the access log for the php location in the error case, like this.
See Fail2ban :: wiki :: Best practice for more info.
If you cannot do that (though it would be better to change it), you can use multi-line buffering and parsing with the maxlines parameter and the <SKIPLINES> regex.
Your filter would be something like this:
[Definition]
# we ignore the stack trace, so the buffer window doesn't need to be large;
# 5 would be enough, but to be safe (in case some log messages interleave):
maxlines = 10
ignoreregex = ^(?:Stack |- /)
failregex = ^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>
To test it directly use:
fail2ban-regex --maxlines=5 /path/to/log '^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>' '^(?:Stack |- /)'
But as already said, it is really ugly - it is better to find a way to log everything on a single line.
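To illustrate what the multi-line failregex is doing (a rough JavaScript sketch only - the date prefix and anchoring are omitted, <SKIPLINES> is a fail2ban placeholder approximated here by a lazy skip-whole-lines group, and the IP is a made-up example standing in for X.X.X.X):
// Illustration only: this is NOT fail2ban's actual <SKIPLINES> expansion.
const block = [
  'Error: [Cake\\Http\\Exception\\MissingControllerException] Controller class Webfig could not be found. in .../ControllerFactory.php on line 158',
  'Stack Trace:',
  '- /home/myapplication/htdocs/webroot/index.php:40',
  'Request URL: /webfig/',
  'Referer URL: http://X.X.X.X/webfig/',
  'Client IP: 203.0.113.7', // hypothetical address
].join('\n');

// failure line ... any skipped lines ... Client IP line (captured like <HOST>)
const failure = /Error: \[[^\]]+\] Controller class \S+ could not be found\..*\n(?:[^\n]*\n)*?Client IP: (\S+)/;
console.log(block.match(failure)[1]); // "203.0.113.7"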

Grok Filter for Confluence Logs

I am trying to write a Grok expression to parse Confluence logs and I am partially successful.
My current Grok pattern is:
%{TIMESTAMP_ISO8601:conflog_timestamp} %{LOGLEVEL:conflog_severity} \[%{APPNAME:conflog_ModuleName}\] \[%{DATA:conflog_classname}\] (?<conflog_message>(.|\r|\n)*)
APPNAME [a-zA-Z0-9\.\#\-\+_%\:]+
And I am able to parse the log line below:
Log line 1:
2020-06-14 10:44:01,575 INFO [Caesium-1-1] [directory.ldap.cache.AbstractCacheRefresher] synchroniseAllGroupAttributes finished group attribute sync with 0 failures in [ 2030ms ]
However, I do have other log lines such as:
Log line 2:
2020-06-15 09:24:32,068 WARN [https-jsse-nio2-8443-exec-13] [atlassian.confluence.pages.DefaultAttachmentManager] getAttachmentData Could not find data for attachment:
-- referer: https://confluence.jira.com/index.action | url: /download/attachments/393217/global.logo | traceId: 2a0bfc77cad7c107 | userName: abcd
and Log Line 3:
2020-06-12 01:19:03,034 WARN [https-jsse-nio2-8443-exec-6] [atlassian.seraph.auth.DefaultAuthenticator] login login : 'ABC' tried to login but they do not have USE permission or weren't found. Deleting remember me cookie.
-- referer: https://confluence.jira.com/login.action?os_destination=%2Findex.action&permissionViolation=true | url: /dologin.action | traceId: 8744d267e1e6fcc9
Here the params "userName", "referer", "url" and "traceId" may or may not be present in the log line.
I can write a concrete grok expression for each of these. Instead, can we handle all of them in the same grok expression?
In short - match all log lines:
If the log line has a "referer" param, store it in a variable. If not, proceed to match the rest of the params.
If the log line has a "url" param, store it; if not, try to match the rest of the params.
Repeat for 'traceId' and 'userName'.
Thank you..
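One common way to express "may or may not be present" is to wrap each trailing key/value in an optional group; grok patterns (Oniguruma regexes underneath) accept the same (?: ... )? construct. A rough sketch of the idea in plain JavaScript (not grok syntax), applied only to the trailing "--" line:
// sketch only: every field is optional, so lines with or without it still match
const tail = /(?:--\s+referer:\s+(?<referer>\S+))?(?:\s*\|\s*url:\s+(?<url>\S+))?(?:\s*\|\s*traceId:\s+(?<traceId>\S+))?(?:\s*\|\s*userName:\s+(?<userName>\S+))?/;
const line = '-- referer: https://confluence.jira.com/index.action | url: /download/attachments/393217/global.logo | traceId: 2a0bfc77cad7c107 | userName: abcd';
console.log(line.match(tail).groups);
// { referer: 'https://confluence.jira.com/index.action', url: '/download/attachments/393217/global.logo', traceId: '2a0bfc77cad7c107', userName: 'abcd' }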

Add additional field in fluentd

I have a message as below:
{"log":"kubernetes.var.log.dev-2019-12-24.log\u0009{\"msg\":\"[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is Symfony\\\\Component\\\\HttpKernel\\\\Exception\\\\NotFoundHttpException [] []\"}\n","stream":"stdout","time":"2019-12-24T10:34:58.295814385Z"}
Now I want to split it into 4 parts:
file_name: kubernetes.var.log.dev-2019-12-24.log
time: 2019-12-24 10:34:58
message_type: app.ERROR
msg: all of the remaining message
In the fluentd configuration, I set a regex like this:
<parse>
  @type "regexp"
  expression [(?<time>.+)] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
</parse>
but it does not work as expected; it shows a warning:
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] security.INFO: Populated the TokenStorage with an anonymous Token. [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699454425Z\"}"
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] app.INFO: [UserCtrl:Login]: request_data= {\\\\\\\"email\\\\\\\":\\\\\\\"tui#gmail.com\\\\\\\",\\\\\\\"password\\\\\\\":\\\\\\\"asfasfd\\\\\\\"} [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699458964Z\"}"
I think there is something wrong with the regex; please advise me how to fix it.
You need to escape [ and ]:
expression \[(?<time>.+)\] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
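As a quick check outside fluentd (fluentd's parser uses Ruby regexes, but plain JavaScript behaves the same for this pattern; the embedded message is abbreviated here):
// corrected expression applied to the embedded "msg" value only
const expression = /\[(?<time>.+)\] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$/;
const msg = '[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is ...';
console.log(msg.match(expression).groups);
// { time: '2019-12-24 10:34:58', kind: 'app.ERROR', msg: '[ApiExceptionHandler:onKernelException]: default not match= exception is ...' }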

How to parse Apache Catalina.log

I am trying to find a way to parse a Catalina.log and I am really struggling.
This is a piece of the log:
May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)
at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:649)
at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:434)
at org.apache.catalina.connector.Connector.initInternal(Connector.java:978)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardService.initInternal(StandardService.java:559)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:821)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.startup.Catalina.load(Catalina.java:638)
at org.apache.catalina.startup.Catalina.load(Catalina.java:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:253)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:427)
I want to get:
Date = May 12, 2017 2:14:38 PM
class = org.apache.coyote.AbstractProtocol init
Error level = SEVERE
Error Msg = Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.321.224-443"]
Error Msg Body = java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)....
I don't even know where to start :)
Any ideas are very welcome.
I have prepared the following regex for you:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n(?=\s+at.+java:\d+\))
You can use the following backreferences to capture your groups:
DATE -> $1
CLASS -> $4
ERROR_LEVEL -> $6
ERROR_MSG -> $8
ERROR_BODY -> $10
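A quick way to see those group numbers in action (a Node.js sketch; note that the stack-trace line must keep its leading whitespace for the lookahead to succeed):
// checking the suggested pattern and its group numbers against the sample
const log = [
  'May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init',
  'SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]',
  'java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR',
  '\tat org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)',
].join('\n');

const pattern = /((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n(?=\s+at.+java:\d+\))/;

const m = log.match(pattern);
console.log(m[1]);  // "May 12, 2017 2:14:38 PM"                      -> DATE
console.log(m[4]);  // "org.apache.coyote.AbstractProtocol init"      -> CLASS
console.log(m[6]);  // "SEVERE"                                       -> ERROR_LEVEL
console.log(m[8]);  // "Failed to initialize end point ..."           -> ERROR_MSG
console.log(m[10]); // "java.lang.Exception: Connector attribute ..." -> ERROR_BODY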
The regex will only fetch strings that meet the following conditions:
starts with a date in the format specified in your post
after the date, the rest of the first line is the class name
the 2nd line is composed of the error level and the error msg
the 3rd line is your error msg body
followed by a Java stack trace of n lines starting with \s+at and ending with java:\d+)
The regex works in the following way:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))
This part will fetch the date in the format of your post:
a 3-char month followed by space(s), then 1 or 2 digits, ',', then the year in 4 digits,
then space(s), then the time (colon-separated digits), followed by a space and then AM or PM
\s(.+)(\r)?\n
this part of the regex will allow you to get the rest of your first line corresponding to your class
(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n
This part will allow you to get the error level (from this exhaustive list) followed by a colon, and then the following 2 lines corresponding to your error msg/body
(?=\s+at.+java:\d+\))
This last part is a lookahead condition that enforces that your error is followed by a Java stack trace.
You might need to adapt some parts of the regex a bit (like the number of lines of the error body/error message) or the stack trace condition, but I think this is a great starting point for your case.
CHEERS!!!

pig REPLACE gives error

Let's assume that my file is named 'data' and looks like this:
2343234 {23.8375,-2.339921102} {(343.34333,-2.0000022)} 5-23-2013-11-am
I need to convert the 2nd field to a pair of coordinate numbers. So I wrote the following code and called it basic.pig:
A = LOAD 'data' AS (f1:int, f2:chararray, f3:chararray. f4:chararray);
B = foreach A generate STRSPLIT(f2,',').$0 as f5, STRSPLIT(f2,',').$1 as f6;
C = foreach B generate REPLACE(f5,'{',' ') as f7, REPLACE(f6,'}',' ') as f8;
and then used (float) to convert the string to a float. But the command 'REPLACE' fails to work, and I get the following error:
-bash-3.2$ pig -x local basic.pig
2013-06-24 16:38:45,030 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled
Mar 22 2013, 02:13:53 2013-06-24 16:38:45,031 [main] INFO org.apache.pig.Main - Logging error messages to: /home/--/p/--test/pig_1372117125028.log
2013-06-24 16:38:45,321 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/isl/pmahboubi/.pigbootup not found
2013-06-24 16:38:45,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-06-24 16:38:46,069 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
Details at logfile: /home/--/p/--test/pig_1372117125028.log
And this is the details of the pig_137..log
Pig Stack Trace
---------------
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 7, column 0. Encountered: <EOF> after : ""
at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3266)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1134)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:104)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
================================================================================
I've got data like this:
2724 1919 2012-11-18T23:57:56.000Z {(33.80981975),(-118.105289)}
2703 6401 2012-11-18T23:57:56.000Z {(55.83525609),(-4.07733138)}
1200 4015 2012-11-18T23:57:56.000Z {(41.49609152),(13.8411998)}
7104 9227 2012-11-18T23:57:56.000Z {(-24.95351118),(-53.46538723)}
and I can do this:
A = LOAD 'my_tsv_data' USING PigStorage('\t') AS (id1:int, id2:int, date:chararray, loc:chararray);
B = FOREACH A GENERATE REPLACE(loc,'\\{|\\}|\\(|\\)','');
C = LIMIT B 10;
DUMP C;
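For what the REPLACE pattern itself does: Pig's REPLACE takes a Java regex, hence the doubled backslashes in the script above. Here is the same alternation in plain JavaScript, just to show the effect:
// strip braces and parentheses, like REPLACE(loc,'\\{|\\}|\\(|\\)','')
const loc = '{(33.80981975),(-118.105289)}';
console.log(loc.replace(/\{|\}|\(|\)/g, '')); // "33.80981975,-118.105289"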
This error
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
came to me because I had used different types of quotation marks. I started with ' and ended with ´ or `, and it took quite a while to find out what went wrong. So it had nothing to do with line 7 (my script was not that long, and I had shortened the data to four lines, which naturally did not help), nothing to do with column 0, nothing to do with the EOF of the data, and hardly anything to do with " marks, which I didn't use. Quite a misleading error message.
I found the cause by using grunt, the Pig command shell.