I have a grok pattern that works in the grok debugger, but I can't get it to work in Logstash. In particular, it's the regex that fails. Basically, I want to match a string in the "source" field.
The "source" field I want to filter:
/var/log/containers/my-container-190662183-f6wlm_logging_logstash-testing-10897cc1fe13d419f73b8c7929377a1f98366a648e85e71163c4fdaac278e256.log
Grok filter:
filter {
  grok {
    match => { "path" => "%{GREEDYDATA}/%{CONTAINERAPP:container}*" }
    patterns_dir => ["/root/patterns"]
  }
}
Pattern:
CONTAINERAPP .+(?=-\d)
Note that I'm not trying to parse log lines here, just assign a field to a matched pattern. Other regexes produce the "container" field, just not quite the string that I need.
My guess is that the Oniguruma syntax does not support this particular regex, but if that's the case, why does the grok debugger say it's OK?
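One way to narrow this down is to inline the same regex as an Oniguruma named capture, which grok supports directly; if the inline form works, the problem is with loading the pattern file rather than with the regex itself. A rough sketch, keeping the question's regex and field names but dropping the trailing *:
filter {
  grok {
    # the CONTAINERAPP regex, inlined as a named capture
    match => { "path" => "%{GREEDYDATA}/(?<container>.+(?=-\d))" }
  }
}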
I want to find documents that DO NOT match a specific regex pattern, but I don't see any support for that in RE2, the regex library that RethinkDB uses (https://github.com/google/re2/wiki/Syntax). I also tried doing this with server-side JavaScript via r.js(), but I can't seem to get the syntax right to extract the string I want to match against from a nested field. I get undefined object errors on row["key"] as well as row.key in the following:
```
filter(r.js('(function(row){
var re = /(?!(json|JSON))$/;
return re.test(row.student_record.the_test);
})'))
```
r.js is a tool of last resort. Additionally, as far as I understand how JavaScript and RE2 fit together, RethinkDB would have to provide some binding for JavaScript's standard /.../ regex objects inside the r.js context (if it provides one at all), and most likely /.../ can never do what RE2 can do.
What about something like:
r.db('test')
.table('test')
.filter(doc =>
doc.hasFields({'student_record': { 'the_test': true }})
.and(doc('student_record')('the_test').match('(json|JSON)$').not())
) // matches all documents that have $.student_record.the_test not matching the regexp
or
r.db('test')
.table('test')
.filter(doc =>
doc.hasFields({'student_record': { 'the_test': true }})
.and(doc('student_record')('the_test').match('(json|JSON)$'))
.not()
) // matches even documents that do not have $.student_record.the_test
?
doc.hasFields({'student_record': { 'the_test': true }}) - Verifies that the JSON path exists for a given document.
doc('student_record')('the_test').match('(json|JSON)$') - Checks whether $.student_record.the_test matches the regexp. By the way, are you perhaps looking for '\\.(?:json|JSON)$'? (Note the escaped dot \. before the group.)
not() - Inverts the expression result.
I'm using Logstash to get some text out of a string and create a field.
The string of the message is:
"\"07/12/2016 16:21:24.652\",\"13.99\",\"1467351040\""
I can't figure out how to get the three results, the first being:
07/12/2016 16:21:24.652
The second:
13.99
The third:
1467351040
match => {
  "message" => [
    "\\"%{DATESTAMP:a}\\",\\"%{NUMBER:b}\\",\\"%{NUMBER:c}\\""
  ]
}
To help the next time you have to craft a grok pattern:
GrokConstructor, to test your pattern
The main patterns
Grok filter documentation
That's the correct line indeed.
I had to remove one backslash for my own config. Thanks very much. Saves me a lot of time and stuff.
grok{ match => { "message"=> [ "\"%{DATESTAMP:a}\",\"%{NUMBER:b}\",\"%{NUMBER:c}\"" ]} }
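For reference, that corrected line in a complete filter block (just a sketch; the field names a, b and c are kept from the question):
filter {
  grok {
    # literal quotes around each value are matched with \" inside the config string
    match => {
      "message" => [ "\"%{DATESTAMP:a}\",\"%{NUMBER:b}\",\"%{NUMBER:c}\"" ]
    }
  }
}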
So I have the following log message:
[localhost-startStop-1] SystemPropertiesConfigurer$ExportingPropertyOverrideConfigurer loadProperties > Loading properties file from class path resource [SystemConfiguration.overrides]
I'm trying to match the first thread ( [localhost-startStop-1] ) with the following pattern:
EVENT_THREAD (\[.+?\])
This works when I pass it into regex101.com but doesn't work when I represent it as
%{(\[.+?\]):EVENT_THREAD} on grokdebugger for reasons unknown to me...
Can someone help me understand this?
Thanks.
See Grok help:
Sometimes logstash doesn’t have a pattern you need. For this, you have a few options.
First, you can use the Oniguruma syntax for named capture which will let you match a piece of text and save it as a field:
(?<field_name>the pattern here)
So, use (?<EVENT_THREAD>\[.+?\]).
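As a quick sketch, the inline form dropped into a full filter (here the named capture itself supplies the field name, EVENT_THREAD):
filter {
  grok {
    # inline Oniguruma named capture: grabs "[localhost-startStop-1]" into EVENT_THREAD
    match => { "message" => "(?<EVENT_THREAD>\[.+?\])" }
  }
}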
Alternately, you can create a custom patterns file.
Create a directory called patterns with a file in it called extra (the file name doesn’t matter, but name it meaningfully for yourself)
In that file, write the pattern you need as the pattern name, a space, then the regexp for that pattern.
# contents of ./patterns/extra:
EVENT_THREAD (?:\[.+?\])
Then use the patterns_dir setting in this plugin to tell logstash where your custom patterns directory is:
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{EVENT_THREAD:evt_thread}" }
  }
}
I'm trying to extract a field out of a log line. I use http://grokdebug.herokuapp.com/ to debug my regular expression with:
(?<action>(?<=action=).*(?=\&))
with input text like this:
/event?id=123&action={"power":"on"}&package=1
I was able to get a result like this:
{
"action": [
"{"power":"on"}"
]
}
But when I copy this config to my Logstash config file:
input { stdin {} }
filter {
  grok {
    match => { "message" => "(?<action>(?<=action=).*(?=\&))" }
  }
}
output {
  stdout {
    codec => 'json'
  }
}
the output says matching failed:
{"message":" /event?id=123&action={\"power\":\"on\"}&package=1","#version":"1","#timestamp":"2016-01-05T10:30:04.714Z","host":"xxx","tags":["_grokparsefailure"]}
I'm using Logstash 2.1.1 in Cygwin.
Any idea why this happens?
You might be experiencing an issue caused by the greedy dot-matching subpattern .*. Since you are only interested in the text after action= up to the next & (or the end of the string), you are better off using a negated character class, [^&].
So, use
[?&]action=(?<action>[^&]*)
The [?&] matches either a ? or & and works as a boundary here.
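Dropped into a grok filter, that would look something like this (a sketch; the capture still lands in the action field):
filter {
  grok {
    # [?&] anchors on the preceding ? or &; [^&]* stops at the next & or end of string
    match => { "message" => "[?&]action=(?<action>[^&]*)" }
  }
}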
It doesn't answer your regexp question, but...
Parse the query string to a separate field and use the kv{} filter on it.
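Something along these lines, for example (a sketch only; the query field name and the literal /event prefix are taken from the sample message, not from any standard pattern):
filter {
  # isolate everything after the ? into a temporary field
  grok {
    match => { "message" => "/event[?]%{GREEDYDATA:query}" }
  }
  # split the query string into id, action and package fields on & and =
  kv {
    source => "query"
    field_split => "&"
  }
}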
Currently I have:
multiline {
  type => "tomcat"
  pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)|(---)"
  what => "previous"
}
and this is part of my log:
TP-xxxxxxxxxxxxxxxxxxxxxxxx: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
at xxxxxx
Caused by: xxxxxxxxx
--- The error occurred in xxxxxxxxx.
--- The error occurred xxxxxxxxxx.
My pattern doesn't work here, probably because I added the (---) at the end. What is the correct regexp to also match the --- lines?
Thanks
You'll want to account for the other characters on the line as well:
(^---.*$)
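Applied to the filter from the question, that would give something like this (everything else unchanged):
multiline {
  type => "tomcat"
  pattern => "(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)|(^---.*$)"
  what => "previous"
}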
I have put your regex and text into these online regex testers and tried Eric's suggestion:
http://www.regextester.com/
http://www.regexr.com/
Sometimes these online testers really help to clear the mind; they highlight exactly which parts of the text are recognized.
If I were stuck on this, I wouldn't focus on the regex itself any further. Rather I'd check these points:
As there are different regex dialects: which dialect does Logstash use, and what does that mean for my pattern?
Are there any Logstash-specific modifiers that need to be set but aren't?
As Ben mentioned, there are further filter tools. Would it help to use grok instead?
If each log event starts with a timestamp or a specific word, for example if all of your log lines start with TP, then you can use that as the filter pattern.
multiline {
  pattern => "^TP"
  what => "previous"
  negate => true
}
With this filter you can combine your multiline logs easily, with no need for complex patterns.