custom log grok pattern - regex

I am trying to parse a custom log line with a grok pattern, but I can't parse the whole line.
Custom log line:
site 'TRT' : alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' : serveur 'Test10011' RAS : TRT / TRT serveur 'Test10011' OK
Grok pattern:
%{DATA:site}\:%{DATA:alias}\:%{DATA:server}\:%{DATA:msg}
Result:
{
"site": [
[
"site 'TRT' "
]
],
"alias": [
[
" alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' "
]
],
"server": [
[
" serveur 'Test10011' RAS "
]
],
"msg": [
[
""
]
]
}
I am not able to capture the last part of the line in 'msg'. Could you please help me see where I'm going wrong? msg should contain "TRT / TRT serveur 'Test10011' OK"

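It seems you just need to use GREEDYDATA instead of DATA for the last field. DATA is a lazy .*?, so with nothing after it in the pattern it is satisfied by an empty match, while GREEDYDATA (.*) consumes the rest of the line:
%{DATA:site}\s*:\s*%{DATA:alias}\s*:\s*%{DATA:server}\s*:\s*%{GREEDYDATA:msg}
I also suggest adding \s* around each : to get rid of leading/trailing whitespace.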
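To see why the original pattern leaves msg empty, the two grok patterns can be emulated with plain regular expressions. This is only a sketch in Python's re module; it assumes DATA expands to .*? and GREEDYDATA to .*, and the variable names are made up for illustration.

```python
import re

line = ("site 'TRT' : alias 'TRT,FAK,FAS,ATI,ONE,DVZ,TWO' : "
        "serveur 'Test10011' RAS : TRT / TRT serveur 'Test10011' OK")

# Stand-in for %{DATA:site}\:%{DATA:alias}\:%{DATA:server}\:%{DATA:msg}
# (assumption: DATA behaves like a lazy .*?)
lazy = re.compile(r"(?P<site>.*?):(?P<alias>.*?):(?P<server>.*?):(?P<msg>.*?)")

# Stand-in for the suggested pattern: \s* around each colon,
# and a greedy .* (GREEDYDATA) for the final field.
greedy = re.compile(
    r"(?P<site>.*?)\s*:\s*(?P<alias>.*?)\s*:\s*(?P<server>.*?)\s*:\s*(?P<msg>.*)"
)

lazy_fields = lazy.search(line).groupdict()
greedy_fields = greedy.search(line).groupdict()

print(repr(lazy_fields["msg"]))    # the trailing lazy group matches nothing
print(repr(greedy_fields["msg"]))  # the greedy group takes the rest of the line
```

The trailing lazy group stops as early as it can, which at the end of a pattern means immediately; only a greedy final group (or an end-of-line anchor after the lazy one) forces it to take the remainder.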

Related

Logstash log time and date parsing

Hello, I have the log line below:
12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] org.apache.catalina.realm.LockOutRealm.filterLockedAccounts An attempt was made to authenticate the locked user [user1]
I am trying to build a Logstash grok pattern for lines like this.
So far I have the following:
%{MY_DATE_PATTERN:timestamp}\s%{WORD:severity}\s\[%{DATA:thread}\]\s%{NOTSPACE:type_log}
which produces the result below:
{
"timestamp": [
"12-Apr-2021 16:01:01.505"
],
"severity": [
"FINE"
],
"thread": [
"https-jsse-nio2-8443-exec-8"
],
"type_log": [
"org.apache.catalina.realm.CombinedRealm.authenticate"
]
}
My timestamp uses a custom pattern. It works in the grok debugger but not in the system I am using, so I need a regex that extracts the date and time. Could anyone help me, please?
I need a grok regex for this: 12-Apr-2021 16:11:41.078
Instead of %{MY_DATE_PATTERN:timestamp}, you can use
(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})
Legend:
%{MONTHDAY} - (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
%{MONTH} - \b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b
%{YEAR} - (?>\d\d){1,2}
%{HOUR} - (?:2[0123]|[01]?[0-9])
%{MINUTE} - (?:[0-5][0-9])
%{SECOND} - (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
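If the target system only accepts a plain regex, the same timestamp can be captured without any grok macros. The following is a rough Python sketch, not the grok definitions themselves: the regex is a simplified stand-in written for this one format, and the strptime format string assumes English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# Simplified plain-regex stand-in for
# MONTHDAY-MONTH-YEAR HOUR:MINUTE:SECOND (with fractional seconds).
TIMESTAMP_RE = re.compile(
    r"(?P<timestamp>\d{1,2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d+)"
)

line = ("12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] "
        "org.apache.catalina.realm.LockOutRealm.filterLockedAccounts "
        "An attempt was made to authenticate the locked user [user1]")

match = TIMESTAMP_RE.match(line)
timestamp = match.group("timestamp")

# %b parses the abbreviated month name, %f the fractional seconds.
parsed = datetime.strptime(timestamp, "%d-%b-%Y %H:%M:%S.%f")
print(timestamp, "->", parsed.isoformat())
```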

grok parse optional field pattern doesn't work

I've got a log like this:
ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED
I'm trying to parse it with grok using grok debugger:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage},THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
It works, but sometimes the log comes without the THROTTLED_OUT_REASON field:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}
To handle that case I tried the pattern below, since THROTTLED_OUT_REASON is an optional field:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
I expected this to work for both cases, but the actual output for the log with the optional field is:
{
"errorMassage": [
[
"Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
]
],
"throttledOutReason": [
[
null
]
]
}
The expected output for the log with the optional field is:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
"API_LIMIT_EXCEEDED"
]
]
}
The expected output for the log without the optional field is:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
null
]
]
}
Can anyone suggest a solution that gives the correct output for both types of log?
Since you use GREEDYDATA, it "eats" as much as it can get in order to fill errorMassage, which leaves nothing for the optional group.
I do not know grok well enough to tell you which predefined patterns would be an alternative, but you should be able to use a custom (lazy) pattern:
ERROR_MESSAGE:(?<errorMassage>.*?),THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
I got the answer using @Skeeve's idea.
Here it is for anyone who runs into a similar question:
I used a custom pattern for the errorMassage field that stops at the first comma, so it cannot eat too much the way GREEDYDATA does.
ERROR_MESSAGE:(?<errorMassage>([^,]*)?)(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
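The difference between the greedy attempt and the final pattern can be checked with an ordinary regex engine. This is a sketch in Python that assumes GREEDYDATA is equivalent to .*; the helper name parse is hypothetical.

```python
import re

# Stand-in for the first attempt: a greedy first field followed by an optional group.
greedy = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>.*)"
    r"(?:,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)

# Stand-in for the final pattern: the first field stops at the first comma.
fixed = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>[^,]*)"
    r"(?:,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)

def parse(pattern, line):
    """Return the named groups as a dict (hypothetical helper)."""
    return pattern.search(line).groupdict()

with_reason = "ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
without_reason = "ERROR_MESSAGE:Invalid Credentials"

print(parse(greedy, with_reason))    # the first field swallows the whole line
print(parse(fixed, with_reason))     # both fields are filled
print(parse(fixed, without_reason))  # the optional field is None
```

One caveat of the [^,]* approach: an error message that itself contains a comma would be cut off at that comma.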

Elasticsearch Token filter for removing tokens with a single word

I have what seems to be a very simple problem, but I can't get it to work.
I have a token stream and I want to remove any token that is a single word, e.g. [the quick, brown, fox] should be output as [the quick].
I've tried pattern_capture token filters with many different patterns, but they only generate new tokens and don't remove the old ones.
Here is the analyzer I've built (abbreviated for clarity):
"analyzer": {
"job_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"some_custom_char_filter"
],
"filter": [
other filters....,
"dash_drop",
"trim",
"unique",
"drop_single_word"
]
}
},
"char_filter": {...},
"filter": {
"dash_drop": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"([^-]+)\\s?(?!-.+)",
"- (.+)"
]
},
"drop_single_word": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [**nothing here works**]
}
}
}
I know I'm using a whitespace tokenizer that breaks sentences into words, but not shown here is the use of shingles to create new n-grams.
The purpose of the dash_drop filter is to split sentences containing - into tokens without the -, so for example: my house - my rules would split into [my house, my rules].
Any help is greatly appreciated.
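To pin down the behaviour being asked for, here is the intended input/output written as a small Python sketch. It is not an Elasticsearch filter and says nothing about how to configure one; both function names are hypothetical.

```python
def split_on_dash(phrases):
    """Split each phrase on ' - ' and trim the pieces (what dash_drop is meant to do)."""
    tokens = []
    for phrase in phrases:
        tokens.extend(part.strip() for part in phrase.split(" - ") if part.strip())
    return tokens

def drop_single_words(tokens):
    """Keep only tokens made of more than one word (what drop_single_word should do)."""
    return [token for token in tokens if len(token.split()) > 1]

print(split_on_dash(["my house - my rules"]))
print(drop_single_words(["the quick", "brown", "fox"]))
```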

Regex match if exist or ignore

I'm having trouble with a regex.
Input example:
/aaaa/admin.php?file=xpto.js&version=abcd123
/aaaa/admin.php
Output 1 -
url => /aaaa/admin.php
var => file=xpto.js&version=abcd123
Output 2 -
url => /aaaa/admin.php
I tried %{NOTSPACE:url}(?:/?%{NOTSPACE:var}) and a few others, but none of them worked.
You may use
%{URIPATH:path}(?:%{URIPARAM:param})?
The patterns are provided at https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns.
The %{URIPATH:path} will match the path, while (?:%{URIPARAM:param})? will match 1 or 0 occurrences (due to the optional non-capturing group (?:...)?) of the query string.
If you need to get rid of the ? in the param, you may also use
(?<path>(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_-]*)+)(?:\?(?<param>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\[\]-]*))?
The output of the %{URIPATH:path}(?:%{URIPARAM:param})? pattern for the /aaaa/admin.php?file=xpto.js&version=abcd123 input (param keeps its leading ?):
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
"?file=xpto.js&version=abcd123"
]
]
}
The output for /aaaa/admin.php:
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
null
]
]
}
Try this regex :
(%{NOTSPACE:url})(?:\?(%{NOTSPACE:var}))?
Demo : http://regexr.com/3f6sm
Is this what you're looking for?
([^\s?]+)(?:\?(\S+))?
You can test it here.
Also, you could just split the url string on ?
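Both suggestions from this last answer, the optional-group regex and splitting on ?, can be compared side by side. The sketch below is in Python; the group names url and var follow the question, and the two function names are hypothetical.

```python
import re

# The path is everything up to the first '?'; the query string is an optional group.
URL_RE = re.compile(r"(?P<url>[^\s?]+)(?:\?(?P<var>\S+))?")

def split_with_regex(text):
    """Return (url, var); var is None when there is no query string."""
    match = URL_RE.fullmatch(text)
    return match.group("url"), match.group("var")

def split_with_partition(text):
    """Same result without a regex: split on the first '?'."""
    url, sep, var = text.partition("?")
    return url, (var if sep else None)

for sample in ("/aaaa/admin.php?file=xpto.js&version=abcd123", "/aaaa/admin.php"):
    print(split_with_regex(sample), split_with_partition(sample))
```

Excluding ? from the first character class is what keeps the path from swallowing the query string; a greedy \S+ in that position would take the whole input and leave the optional group empty.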

ValidateThis EqualTo ClientFieldName JavaScript Validation

The title sucks, I'm sorry.
It takes a little while to set up my problem, so I'm going to try to simplify it.
My form uses structure notation.
<input type= "text"
name= "bank[routing_number]"
id= "bank_routing_number"
value= "#rc.bank[ "routing_number" ]#"
autocomplete= "off"
maxlength= "9" />
<input type= "text"
name= "bank[routing_number_confirmation]"
id= "bank_routing_number_confirmation"
value= "#rc.bank[ "routing_number_confirmation" ]#"
autocomplete= "off"
maxlength= "9" />
The ValidateThis rules work fine on the server. I'm running on ColdFusion 9.0.1.
The problem I have is the JavaScript code generated by ValidateThis.
This is the JavaScript rule for EqualTo.
fm['bank[routing_number_confirmation]'].rules('add',{"equalto":":input[name='routing_number']","messages":{"equalto":"Bank ACH Routing Numbers (ABA) must match."}}); fm['bank[routing_number_confirmation]'] = $(":input[name='bank[routing_number_confirmation]']",$form_register_new);
The relevant bit is this:
":input[name='routing_number']"
I'm expecting this code to be:
":input[name='bank[routing_number]']"
Here are the ValidateThis rules for routing_number and routing_number_confirmation.
{ "name": "routing_number" ,
"clientFieldName": "bank[routing_number]" ,
"rules": [
{ "type": "required" ,
"failureMessage": "Bank ACH Routing Number (ABA) is required."
} ,
{ "type": "rangelength" ,
"params": [
{ "name": "minlength" , "value": "9" } ,
{ "name": "maxlength" , "value": "9"} ] ,
"failureMessage": "Bank ACH Routing Number (ABA) is 9 digits."
}
]
} ,
{ "name": "routing_number_confirmation" ,
"clientFieldName": "bank[routing_number_confirmation]" ,
"rules": [
{ "type": "required" ,
"failureMessage": "Confirm Bank ACH Routing Number (ABA) is required."
} ,
{ "type": "equalTo" ,
"params": [
{ "name": "comparePropertyName" ,
"value": "routing_number" }
] ,
"failureMessage": "Bank ACH Routing Numbers (ABA) must match."
} ,
{ "type": "rangelength" ,
"params": [
{ "name": "minlength" , "value": "9" } ,
{ "name": "maxlength" , "value": "9"} ] ,
"failureMessage": "Bank ACH Routing Number (ABA) is 9 digits."
}
]
}
This is the load order for ValidateThis scripts.
// jQuery and jQuery Validate are loaded.
#getColdboxOCM().get( "ValidateThis" ).getInitializationScript(
JSIncludes= false )#
// Other ValidateThis scripts
#getColdboxOCM().get( "ValidateThis" ).getValidationScript(
objectType= "registration/bank-account" ,
formName= rc.form.name )#
The other JavaScript rules for routing_number and routing_number_confirmation work just fine. I've added some custom rules to get around the issue, but is there a way I can fix this using ValidateThis?
I asked the question on the ValidateThis Google Group and got a prompt response.
http://groups.google.com/group/validatethis/browse_thread/thread/2b18af00d3f5ce98
This was a bug in ValidateThis; it has been corrected and the fix is now part of the development branch on GitHub.