Grok debugging - Match first only regex not working as intended

So I have the following log message:
[localhost-startStop-1] SystemPropertiesConfigurer$ExportingPropertyOverrideConfigurer loadProperties > Loading properties file from class path resource [SystemConfiguration.overrides]
I'm trying to match the first thread ( [localhost-startStop-1] ) with the following pattern:
EVENT_THREAD (\[.+?\])
This works on regex101.com, but it does not work on grokdebugger when I write it as
%{(\[.+?\]):EVENT_THREAD}, for reasons unknown to me.
Can someone help me understand this?
Thanks,

See Grok help:
Sometimes logstash doesn’t have a pattern you need. For this, you have a few options.
First, you can use the Oniguruma syntax for named capture which will let you match a piece of text and save it as a field:
(?<field_name>the pattern here)
So, use (?<EVENT_THREAD>\[.+?\]).
Alternately, you can create a custom patterns file.
Create a directory called patterns with a file in it called extra (the file name doesn’t matter, but name it meaningfully for yourself).
In that file, write the pattern you need as the pattern name, a space, then the regexp for that pattern.
# contents of ./patterns/extra:
EVENT_THREAD (?:\[.+?\])
Then use the patterns_dir setting in this plugin to tell logstash where your custom patterns directory is:
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{EVENT_THREAD:evt_thread}" }
  }
}
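The effect of the lazy quantifier can be checked outside Logstash. The following is a minimal sketch, assuming Python's re module as a stand-in for Oniguruma (Python spells the named group (?P<name>...) instead of (?<name>...)); the variable names are illustrative only.

```python
import re

line = ("[localhost-startStop-1] SystemPropertiesConfigurer$ExportingPropertyOverrideConfigurer "
        "loadProperties > Loading properties file from class path resource "
        "[SystemConfiguration.overrides]")

# Lazy quantifier: stops at the first closing bracket.
lazy = re.search(r"(?P<EVENT_THREAD>\[.+?\])", line)

# Greedy quantifier: runs on to the last closing bracket in the line.
greedy = re.search(r"(?P<EVENT_THREAD>\[.+\])", line)

print(lazy.group("EVENT_THREAD"))
print(greedy.group("EVENT_THREAD") == line)
```

With the lazy form only the thread name in the first pair of brackets is captured, while the greedy form swallows the whole line because the line also ends with a closing bracket.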

Related

grok pattern works in debugger but not in logstash

I have a grok pattern that works in grok debugger, but I can't get it to work in logstash. In particular, it's the regex that fails. Basically, I want to match a string in the "source" field
"source" field I want to filter:
/var/log/containers/my-container-190662183-f6wlm_logging_logstash-testing-10897cc1fe13d419f73b8c7929377a1f98366a648e85e71163c4fdaac278e256.log
Grok filter:
filter {
  grok {
    match => { "path" => "%{GREEDYDATA}/%{CONTAINERAPP:container}*" }
    patterns_dir => ["/root/patterns"]
  }
}
Pattern:
CONTAINERAPP .+(?=-\d)
Note that I'm not trying to parse log lines here, just assign a field to a matched pattern. Other regexes produce the "container" field, just not quite the string that I need.
My guess is that the Oniguruma syntax does not support this particular regex, but if that's the case, why does the grok debugger say it's OK?

logstash grok filter regular expression works in debug tool but failed in actual execution

I'm trying to extract a field out of a log line. I use http://grokdebug.herokuapp.com/ to debug my regular expression:
(?<action>(?<=action=).*(?=\&))
with input text like this:
/event?id=123&action={"power":"on"}&package=1
I was able to get a result like this:
{
"action": [
"{"power":"on"}"
]
}
But when I copy this config to my logstash config file:
input { stdin{} }
filter {
  grok {
    match => { "message" => "(?<action>(?<=action=).*(?=\&))"}
  }
}
output { stdout {
  codec => 'json'
}}
the output says matching failed:
{"message":" /event?id=123&action={\"power\":\"on\"}&package=1","@version":"1","@timestamp":"2016-01-05T10:30:04.714Z","host":"xxx","tags":["_grokparsefailure"]}
I'm using logstash-2.1.1 in cygwin.
Any idea why this happens?
You might be running into an issue caused by the greedy dot subpattern .*. Since you are only interested in the text after action= up to the next & or the end of the string, you'd better use a negated character class [^&].
So, use
[?&]action=(?<action>[^&]*)
The [?&] matches either a ? or & and works as a boundary here.
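To see the negated character class at work, here is a minimal sketch, assuming Python's re module in place of Logstash's Oniguruma engine (so the named group is written (?P<action>...)); the second input string is a made-up example.

```python
import re

message = ' /event?id=123&action={"power":"on"}&package=1'

# [?&] ties the match to the start of a query parameter;
# [^&]* then consumes everything up to the next & or the end of the string.
pattern = re.compile(r"[?&]action=(?P<action>[^&]*)")

match = pattern.search(message)
print(match.group("action"))

# The same pattern still works when action is the last parameter (hypothetical input).
last = pattern.search("/event?id=123&action=off")
print(last.group("action"))
```

Because [^&]* cannot cross an ampersand, the capture does not depend on how many parameters follow action.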
It doesn't answer your regexp question, but...
Parse the query string to a separate field and use the kv{} filter on it.
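The idea behind the kv approach, splitting the query string into key/value pairs instead of matching one key with a regex, can be sketched in Python. This only illustrates the principle with the standard urllib.parse module; it is not how the kv{} filter is implemented.

```python
from urllib.parse import urlsplit, parse_qs

request = '/event?id=123&action={"power":"on"}&package=1'

# Isolate the query string first, then split it into key/value pairs.
query = urlsplit(request).query
fields = {key: values[0] for key, values in parse_qs(query).items()}

print(fields["action"])
print(sorted(fields))
```

Every parameter becomes its own field, so no separate pattern is needed per key.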

Regex in Sublime Text tmLanguage file doesn't use multiline

I'm trying to create a custom syntax language file to highlight and help with creating new documents in Sublime Text 2. I have come pretty far, but I'm stuck at a specific problem regarding Regex searches in the tmLanguage file. I simply want to be able to match a regex over multiple lines within a YAML document that I then convert to PList to use in Sublime Text as a package. It won't work.
This is my regex:
/(foo[^.#]*bar)/
And this is how it looks inside the tmLanguage YAML document:
patterns:
- include: '#test'
repository:
  test:
    comment: Tester pattern
    name: constant.numeric.xdoc
    match: (foo[^.#]*bar)
If I build this YAML to a tmLanguage file and use it as a package in Sublime Text, I create a document that uses this custom syntax, try it out and the following happens:
This WILL match:
foo 12345 bar
This WILL NOT match:
foo
12345
bar
In a Regex tester, they should and will both match, but in my tmLanguage file it does not work.
I also already tried to add modifiers to my regex in the tmLanguage file, but the following either don't work or break the document entirely:
match: (/foo[^.#]*bar/gm)
match: /(/foo[^.#]*bar/)/gm
match: /foo[^.#]*bar/gm
match: foo[^.#]*bar
Note: My Regex rule works in the tester, this problem occurs in the tmLanguage file in Sublime Text 2 only.
Any help is greatly appreciated.
EDIT: The reason I use a match instead of begin/end clauses is because I want to use capture groups to give them different names. If someone has a solution with begin and end clauses where you can still name 'foo', '12345' and 'bar' differently, that's fine by me too.
I found that this is impossible to do. The following is taken directly from the TextMate manual; TextMate is the editor whose grammar format Sublime Text uses.
12.2 Language Rules
<...>
Note that the regular expressions are matched against only a single
line of the document at a time. That means it is not possible to use a
pattern that matches multiple lines. The reason for this is technical:
being able to restart the parser at an arbitrary line and having to
re-parse only the minimal number of lines affected by an edit. In most
situations it is possible to use the begin/end model to overcome this
limitation.
My situation is one of the few in which a begin/end model cannot overcome the limitation. Unfortunate.
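The single-line restriction explains why a regex tester and the grammar disagree. The following minimal sketch, assuming Python's re module and a hypothetical helper matches_per_line, compares matching against the whole text with matching one line at a time, which is roughly what a line-based grammar engine does.

```python
import re

pattern = re.compile(r"foo[^.#]*bar")

single_line = "foo 12345 bar"
multi_line = "foo\n12345\nbar"


def matches_per_line(text):
    """Apply the pattern to each line separately, as a line-based tokenizer would."""
    return any(pattern.search(line) for line in text.splitlines())


# A regex tester sees the whole text, and [^.#] happily matches newlines.
print(bool(pattern.search(multi_line)))

# A line-based engine never sees foo and bar in the same string.
print(matches_per_line(single_line))
print(matches_per_line(multi_line))
```

The pattern itself is fine; it simply never receives more than one line of input at a time.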
It has been a long time since this was asked, but are you sure you can't use begin/end? I had similar problems with begin/end until I got a better grasp of the syntax and logic. Here's a rough example from a JSON tmLanguage file I'm working on (I don't know the proper YAML syntax).
"repository": {
  "foobar": {
    "begin": "foo(?=[^.#]*)", // not sure about what's needed for your circumstance. the lookahead probably only covers the foo line
    "end": "bar",
    "beginCaptures": {
      "0": {
        "name": "foo"
      }
    },
    "endCaptures": {
      "0": {
        "name": "bar"
      }
    },
    "patterns": [
      {"include": "#test-after-foobarmet"}
    ]
  },
  "test-after-foobarmet": {
    "comment": "this can apply to many lines before next bar so you may need more testing",
    "comment2": "you could continue to have captures here that go to another deeper level...",
    "name": "constant.numeric.xdoc",
    "match": "anyOtherRegexNeeded?"
  }
}
I didn't follow your "i need to number the different sections between the '#' and '.' characters.", but you should be able to add a rule in test-after-foobarmet with more captures if you need to name different groups between foo and bar.
There is a good explanation of TextMate grammars here. It may still contain some errors, but it explains the topic in a way that helped me when I knew nothing about it.

Case Insensitive Regex expression for getting file

I have a scenario where I pick up files from a folder for data loading, and the files follow the naming convention .Customer_..txt. I would also like to make this expression case insensitive, so that a file named CUSTOMER_1234 is accepted and processed as well.
Try the below regex:
(?i)customer(?-i).*\.txt
in the wildcard section of the "get files" step or any other regex step you are using. This will select files whose names start with "customer" in any letter case, such as "Customer" or "CUSTOMER".
Hope this helps :)
Modifying my previous answer based on the comment below:
If you want to match the pattern "customer_" regardless of case, you can do it with a JavaScript "match" call: convert both the file names and the pattern to upper case and compare them. Check the JS snippet below:
var pattern="customer_"; // pattern is the word pattern you want to match
var match_files= upper(files).match(upper(pattern)); // files is the list of files you are getting from the directory
if(upper(match_files)==upper(pattern)){
  // set one flag as 'match'
}
else{
  // set the flag as 'not match'
}
If you need to use a regex only, you can try the one below:
.*(?i)(customer|CUSTOMER).*(?-i)\.txt
This would work for "_123_Customer_1vasd.txt" patterns too.
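The behaviour of a case-insensitive section followed by a case-sensitive extension can be tried out quickly. This is a minimal sketch, assuming Python's re module rather than the Java regex engine used by the step above; Python spells the scoped flag as (?i:...), and the file names other than the one quoted above are made up.

```python
import re

# Only the "customer" part ignores case; the .txt extension stays case sensitive.
pattern = re.compile(r".*(?i:customer).*\.txt")

names = [
    "_123_Customer_1vasd.txt",
    "CUSTOMER_1234.txt",
    "customer_1.TXT",
    "orders_1.txt",
]

# fullmatch requires the whole file name to fit the pattern.
accepted = [name for name in names if pattern.fullmatch(name)]
print(accepted)
```

Scoping the flag to the word keeps the extension check strict; dropping the scope would also accept ".TXT".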
Hope this helps :)

what is the regexp pattern for multiline (logstash)

Currently I have:
multiline {
  type => "tomcat"
  pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)|(---)"
  what => "previous"
}
and this is part of my log:
TP-xxxxxxxxxxxxxxxxxxxxxxxx: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
--- The error occurred in xxxxxxxxx.
--- The error occurred xxxxxxxxxx.
My pattern doesn't work here, probably because I added the (---) at the end. What is the correct regexp to also match the --- lines?
Thanks
You'll want to account for the other characters on the line as well:
(^---.*$)
I have put your regex and text into these online regex buddies and tried the suggestion of Eric:
http://www.regextester.com/
http://www.regexr.com/
Sometimes these online buddies really help to clear the mind.
If I were stuck on this, I wouldn't focus on the regex itself any further. Rather I'd check these points:
As there are different regex dialects, what dialect is used by logstash? What does it mean to my pattern?
Are there any logstash specific modifiers that are not set and need to be set?
As Ben mentioned, there are further filter tools. Would it help to use grok instead?
If every log event starts with a timestamp or a specific word (for example, if all events in your logs start with TP), then you can use that as the filter pattern.
multiline {
  pattern => "^TP"
  what => "previous"
  negate => true
}
With this filter you can join multiline logs easily, with no need for complex patterns.
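The logic of negate => true with what => "previous" can be sketched in a few lines of Python. This is only an illustration of the grouping rule, not the filter's actual implementation; the function name join_multiline and the sample log lines are hypothetical.

```python
import re


def join_multiline(lines, pattern=r"^TP"):
    """Append every line that does NOT match the pattern to the previous event."""
    start = re.compile(pattern)
    events = []
    for line in lines:
        if start.search(line) or not events:
            # A matching line (or the very first line) opens a new event.
            events.append(line)
        else:
            # Everything else is a continuation of the event before it.
            events[-1] += "\n" + line
    return events


log = [
    "TP-1: first error",
    "    at some.Class",
    "Caused by: something",
    "--- The error occurred in some file.",
    "TP-2: second error",
]

events = join_multiline(log)
print(len(events))
print(events[0])
```

Only the start of an event has to be described, so stack trace lines, "Caused by:" lines and the --- lines are all folded in without their own alternatives.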