_grokparsefailure on matching grok debugger pattern - regex

I'm having trouble getting Logstash to recognize my patterns, even though they seem to match on the Grok Debugger (https://grokdebug.herokuapp.com/).
It's a similar problem to the one in this other StackOverflow question (logstash _grokparsefailure issues), but unfortunately the solution there does not seem to work.
These are the logs I'm trying to match:
Mon Jan 25 11:12:12.890 [conn44141] authenticate db: admin { authenticate: 1, user: "person", nonce: "f00000000f", key: "a0000000000e" }
"2015-01-25 14:46:31" id=Admin id=Admin,ou=user,dc=gooogle-wa,dc=com a000000a 100.00.00.01 INFO dc=gooooogle-wa,dc=com "cn=user,ou=AME Users,dc=goooogle,dc=com" BARF-4 aO.access "Not Available" 100.00.00.01
The patterns I'm using to parse these are, respectively:
if [type] == "openam" {
if [file] =~ "access" {
grok{
match => [ 'message', '\"%{TIMESTAMP_ISO8601:timestamp}\"(\s*)(%{QUOTEDSTRING:data_}|%{DATA:data_})(\s*)(%{QUOTEDSTRING:LoginID}|%{DATA:LoginID})(\s*)%{DATA:ContextID}(\s*)(\"%{DATA:IP}\"|%{IP:IP})(\s*)?%{LOGLEVEL:loglevel}(\s*)%{DATA:Domain}(\s*)\"%{DATA:LoggedBy}\"(\s*)(?<messageID>[a-zA-Z0-9._-]+)(\s*)(%{DATA:ModuleName})(\s*)\"%{DATA:NameID}\"(\s*)(%{IP:hostname}|%{GREEDYDATA:hostname}) '
]
add_tag => "openam_access"
}
}
else if [file] =~ "error" {
grok{
match => ['message', '\"%{TIMESTAMP_ISO8601:timestamp}\"(\s*)(%{QUOTEDSTRING:data_}|%{DATA:data_}) (\s*)(%{QUOTEDSTRING:LoginID}|%{DATA:LoginID}) (\s*)%{DATA:ContextID}(\s*)(\"%{DATA:IP}\"|%{IP:IP})(\s*)?%{LOGLEVEL:loglevel}(\s*)%{DATA:Domain}(\s*)\"%{DATA:LoggedBy}\"(\s*)(?<messageID>[a-zA-Z0-9._-]+)(\s*)(%{DATA:ModuleName})(\s*)\"%{DATA:NameID}\"(\s*)(%{IP:hostname}|%{GREEDYDATA:hostname})',
]
add_tag => "openam_error"
}
}
}
if [type] == "mongo" {
grok {
match => [
"message", "(?m)%{GREEDYDATA} \[conn%{NUMBER:mongoConnection}\] %{WORD:mongoCommand} %{WORD:mongoDatabase}.%{NOTSPACE:mongoCollection} %{WORD}: \{ %{GREEDYDATA:mongoStatement} \} %{GREEDYDATA} %{NUMBER:mongoElapsedTime:int}ms",
"message", "%{DATA:DayOfWeek} %{SYSLOGTIMESTAMP:timestamp} %{DATA:Thread} %{GREEDYDATA:msg} %{IP:ip}:%{NUMBER:port} ?#?%{NUMBER:ID}? %{GREEDYDATA:connections} ",
'message', '%{DATA:DayOfWeek} %{SYSLOGTIMESTAMP:timestamp} %{DATA:Thread} %{DATA:msg}: %{WORD:userType} \{ authenticate: %{NUMBER:authenticate}, user: %{QS:user}, nonce: %{QS:nonce}, key: %{QS:key} \}'
]
add_tag => "mongodb"
}
}
As you can see, the patterns work fine on the debugger, but for some reason the events show up on my Kibana dashboard with the _grokparsefailure tag. I suspect it has to do either with how I'm escaping characters or with the use of %{QS}/%{QUOTEDSTRING}.
Thanks

Your patterns appear to be fine, but with
filter {
  grok {
    ...
  }
  grok {
    ...
  }
}
you're applying both grok filters to every input string, and a string that matches the first pattern will never match the second and vice versa, so one of the filters always fails and adds the _grokparsefailure tag.
Do this instead:
filter {
  grok {
    match => ['message', 'pattern1',
              'message', 'pattern2']
  }
}
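For the configuration in the question, that could look roughly like this (a sketch only; pattern_for_access and pattern_for_error stand in for the two long openam patterns above):
filter {
  if [type] == "openam" {
    grok {
      # both patterns in one filter; grok stops at the first one that matches
      match => [
        'message', 'pattern_for_access',
        'message', 'pattern_for_error'
      ]
    }
  }
}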
If you really have to use different grok filters, condition their inclusion with a sneak peek at the message:
filter {
  if [message] =~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) / {
    grok {
      match => ['message', 'pattern1']
    }
  }
  ...
}
This will obviously be slower and means you'll have more regular expressions to maintain.

I've figured it out. It turns out there was another error that was preventing my Logstash config from updating. I highly recommend ./logstash --configtest to anyone in a similar spot.

Parsing out PowerShell CommandLine Data from EventLog

Sending Windows Event Logs with WinLogBeat to Logstash - primarily focused on PowerShell events within the logs.
Example:
<'Data'>NewCommandState=Stopped SequenceNumber=1463 HostName=ConsoleHost HostVersion=5.1.14409.1005 HostId=b99970c6-0f5f-4c76-9fb0-d5f7a8427a2a HostApplication=C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe EngineVersion=5.1.14409.1005 RunspaceId=bd4224a9-ce42-43e3-b8bb-53a302c342c9 PipelineId=167 CommandName=Import-Module CommandType=Cmdlet ScriptName= CommandPath= CommandLine=Import-Module -Verbose.\nishang.psm1<'/Data'>
How can I extract the CommandLine= field using grok to get the following?
Import-Module -Verbose.\nishang.psm1
Grok is a wrapper around regular expressions. If you can parse data with a regex, you can implement it with grok.
Even though your scope is specific to the CommandLine field, parsing each of the fields in most key=value logs is pretty straightforward, and a single regex can be used for every field with some grok filters. If you intend to store, query, and visualize logs - the more data, the better.
Regular Expression:
First we start with the following:
(.*?(?=\s\w+=|\<|$))
.*? - lazily matches any character except line terminators
(?=\s\w+=|\<|$) - a positive lookahead asserting that what follows must match:
\s\w+= - a space, then word characters, then a =
|\<|$ - or, alternatively, a < or the end of the line, so that these are not included in the matching group.
This means that each field can be parsed similar to the following:
CommandLine=(.*?(?=\s\w+=|\<|$))
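If you only care about the CommandLine field, a minimal sketch using an inline named capture (no custom patterns file needed; the CommandLine field name is just a label choice):
filter {
  grok {
    # capture everything after CommandLine= up to the next key=, a '<', or end of line
    match => [ "message", "CommandLine=(?<CommandLine>.*?(?=\s\w+=|\<|$))" ]
  }
}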
Grok:
Now this means we can begin creating grok filters. The power of grok is that reusable components can be given semantic names.
/etc/logstash/patterns/powershell.grok:
# Patterns
PS_KEYVALUE (.*?(?=\s\w+=|\<|$))
# Fields
PS_NEWCOMMANDSTATE NewCommandState=%{PS_KEYVALUE:NewCommandState}
PS_SEQUENCENUMBER SequenceNumber=%{PS_KEYVALUE:SequenceNumber}
PS_HOSTNAME HostName=%{PS_KEYVALUE:HostName}
PS_HOSTVERSION HostVersion=%{PS_KEYVALUE:HostVersion}
PS_HOSTID HostId=%{PS_KEYVALUE:HostId}
PS_HOSTAPPLICATION HostApplication=%{PS_KEYVALUE:HostApplication}
PS_ENGINEVERSION EngineVersion=%{PS_KEYVALUE:EngineVersion}
PS_RUNSPACEID RunspaceId=%{PS_KEYVALUE:RunspaceId}
PS_PIPELINEID PipelineId=%{PS_KEYVALUE:PipelineId}
PS_COMMANDNAME CommandName=%{PS_KEYVALUE:CommandName}
PS_COMMANDTYPE CommandType=%{PS_KEYVALUE:CommandType}
PS_SCRIPTNAME ScriptName=%{PS_KEYVALUE:ScriptName}
PS_COMMANDPATH CommandPath=%{PS_KEYVALUE:CommandPath}
PS_COMMANDLINE CommandLine=%{PS_KEYVALUE:CommandLine}
Where %{PATTERN:label} uses the PS_KEYVALUE regular expression, and the matching group is labeled with that name in the JSON output. This is where you can be flexible in naming the fields you know.
/etc/logstash/conf.d/powershell.conf:
input {
  ...
}
filter {
  grok {
    patterns_dir => "/etc/logstash/patterns"
    break_on_match => false
    match => [
      "message", "%{PS_NEWCOMMANDSTATE}",
      "message", "%{PS_SEQUENCENUMBER}",
      "message", "%{PS_HOSTNAME}",
      "message", "%{PS_HOSTVERSION}",
      "message", "%{PS_HOSTID}",
      "message", "%{PS_HOSTAPPLICATION}",
      "message", "%{PS_ENGINEVERSION}",
      "message", "%{PS_RUNSPACEID}",
      "message", "%{PS_PIPELINEID}",
      "message", "%{PS_COMMANDNAME}",
      "message", "%{PS_COMMANDTYPE}",
      "message", "%{PS_SCRIPTNAME}",
      "message", "%{PS_COMMANDPATH}",
      "message", "%{PS_COMMANDLINE}"
    ]
  }
}
output {
  stdout { codec => "rubydebug" }
}
Result:
{
"HostApplication" => "C:\\WINDOWS\\system32\\WindowsPowerShell\\v1.0\\powershell.exe",
"EngineVersion" => "5.1.14409.1005",
"RunspaceId" => "bd4224a9-ce42-43e3-b8bb-53a302c342c9",
"message" => "<'Data'>NewCommandState=Stopped SequenceNumber=1463 HostName=ConsoleHost HostVersion=5.1.14409.1005 HostId=b99970c6-0f5f-4c76-9fb0-d5f7a8427a2a HostApplication=C:\\WINDOWS\\system32\\WindowsPowerShell\\v1.0\\powershell.exe EngineVersion=5.1.14409.1005 RunspaceId=bd4224a9-ce42-43e3-b8bb-53a302c342c9 PipelineId=167 CommandName=Import-Module CommandType=Cmdlet ScriptName= CommandPath= CommandLine=Import-Module -Verbose.\\nishang.psm1<'/Data'>",
"HostId" => "b99970c6-0f5f-4c76-9fb0-d5f7a8427a2a",
"HostVersion" => "5.1.14409.1005",
"CommandLine" => "Import-Module -Verbose.\\nishang.psm1",
"#timestamp" => 2017-05-12T23:49:24.130Z,
"port" => 65134,
"CommandType" => "Cmdlet",
"#version" => "1",
"host" => "10.0.2.2",
"SequenceNumber" => "1463",
"NewCommandState" => "Stopped",
"PipelineId" => "167",
"CommandName" => "Import-Module",
"HostName" => "ConsoleHost"
}

Logstash can not handle multiple heterogeneous inputs

Let's say you have 2 very different types of logs, such as FORTINET and NetASQ logs, and you want to:
grok FORTINET using one regex, and grok NETASQ using another regex.
I know that with "type" in the input file and a condition in the filter we can resolve this problem.
So I used this config file to do it:
input {
  file {
    type => "FORTINET"
    path => "/fortinet/*.log"
    sincedb_path => "/logstash-autre_version/var/.sincedb"
    start_position => 'beginning'
  }
  file {
    type => "NETASQ"
    path => "/home/netasq/*.log"
  }
}
filter {
  if [type] == "FORTINET" {
    grok {
      patterns_dir => "/logstash-autre_version/patterns"
      match => [
        "message", "%{FORTINET}"
      ]
      tag_on_failure => [ "failure_grok_exemple" ]
      break_on_match => false
    }
  }
  if [type] == "NETASQ" {
    # .......
  }
}
output {
  elasticsearch {
    cluster => "logstash"
  }
}
And I'm getting this error:
Got error to send bulk of actions: no method 'type' for arguments(org.jruby.RubyArray) on Java::OrgElasticsearchActionIndex::IndexRequest {:level=>:error}
But if I don't use "type" and I grok only the FORTINET logs, it works.
What should I do?
I'm not sure about this but maybe it helps:
I have the same error and I think that it is caused by the use of these if statements:
if [type] == "FORTINET"
Your type field is compared to "FORTINET", but this may not be possible because "FORTINET" is a string and your type field isn't. Sometimes, when you set a type on an input and the event already has a type, the type isn't replaced; instead, the new type is added to a list alongside the old one. You should have a look at your data in Kibana (or wherever) and try to find something like this:
\"type\":[\"FORTINET\",\"some-other-type\"]
maybe also without all those \".
If you find something like this, try not setting the type of your input explicitly, and compare the type in your if statement to the some-other-type value you have found.
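For example, something along these lines (a sketch; "some-other-type" is a placeholder for whatever type value you actually find in your data):
filter {
  # compare against the type value actually present in the events
  if [type] == "some-other-type" {
    # ... FORTINET grok here ...
  }
  # or, if the type field turned out to be an array, test membership instead
  if "FORTINET" in [type] {
    # ... FORTINET grok here ...
  }
}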
Hope this works (I'm working with more complex inputs/forwarders and for me it doesn't, but it is worth a try)

Is it possible to use Logstash to remove_field if it does not match a certain value?

We want to filter a log using Logstash by removing fields if the field does not contain "_log". The remove_field syntax is available, but only works for removing a field if it matches a certain condition.
filter {
  grok {
    remove_field => [ "log_" ]
  }
}
# This works for removing the log_ field; we want to remove everything that does NOT match log_.
Is it also possible to remove a field if it does not match a certain condition?
We tried using a regex that did just that, but that did not work (is it documented somewhere that you cannot use a regex?). Removing all other fields is also an option, but way more effort. We hope someone can help us filter all fields that do not contain "log_".
The regexp should work:
filter {
  if [field] !~ /pattern/ {
    mutate {
      remove_field => [ "field" ]
    }
  }
}
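If the goal is really to drop every field whose name does not contain "log_" (rather than one known field), a ruby filter can iterate over the event's fields. This is only a sketch and assumes the ruby filter's event.to_hash and event.remove methods:
filter {
  ruby {
    code => "
      event.to_hash.keys.each do |k|
        # keep Logstash's own @-prefixed fields and anything containing 'log_'
        next if k.start_with?('@') || k.include?('log_')
        event.remove(k)
      end
    "
  }
}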
One of the ways you can do it is:
- tag the events that do not match the condition (using tag_on_failure)
- then remove that tag
Example:
grok {
  match => [ "message", "log_" ]
  tag_on_failure => ["_todelete"]
}
and then
mutate {
  remove_tag => [ "_todelete" ]
}

Logstash grok multiline message

My logs are formatted like this:
2014-06-19 02:26:05,556 INFO ok
2014-06-19 02:27:05,556 ERROR
message:space exception
at line 85
solution:increase space
remove files
There are 2 types of events:
- logs on one line, like the first
- logs on multiple lines, like the second
I am able to process the one-line events, but not the multiline ones, where I would like to store the message in one variable and the solution in another.
This is my config:
input {
  file {
    path => ["logs/*"]
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
filter {
  # parsing of the one-line event
  grok {
    patterns_dir => "./patterns"
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} ok"]
  }
  # the parsing fails, so we assume we are in a multiline event; I process it here
  # and get stuck when I reach the new line.
  if "_grokparsefailure" in [tags] {
    grok {
      patterns_dir => "./patterns"
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level}\r\n"]
    }
  }
}
So this is what I have done, and I would like to see the following in my console output:
{
  "@timestamp" => "2014-06-19 00:00:00,000",
  "path" => "logs/test.log",
  "level" => "INFO"
},
{
  "@timestamp" => "2014-06-19 00:00:00,000",
  "path" => "logs/test.log",
  "level" => "ERROR",
  "message" => "space exception at line 85",
  "solution" => "increase space remove files"
}
Concretely, I would like to capture everything between two words ("message" and "solution" for the message variable, and between "solution" and the end of the event for the solution variable), regardless of whether the text spans one line or several.
Thanks in advance
As for multiline grok, it's best to use the special (?m) flag in the pattern string:
grok {
  match => ["message", "(?m)%{SYSLOG5424LINE}"]
}
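Applied to the log format in the question, that might look something like this (a sketch only; the msg and solution field names are illustrative, not part of any standard pattern):
grok {
  # (?m) lets DATA and GREEDYDATA match across the embedded newlines
  match => ["message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{WORD:level}\nmessage:%{DATA:msg}\nsolution:%{GREEDYDATA:solution}"]
}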
It looks like you have two issues:
You need to correctly combine your multilines:
filter
{
  multiline
  {
    pattern => "^ "
    what => "previous"
  }
}
This will combine any line that begins with a space into the previous line. You may end up having to use a "next" instead of a "previous".
Replace Newlines
I don't believe that grok matches across newlines.
I got around this by doing the following in your filter section. This should go before the grok section:
mutate
{
  gsub => ["message", "\n", "LINE_BREAK"]
}
This allowed me to grok multilines as one big line rather than matching only till the "\n".
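For the log in the question, one way to use this is the following sketch (here the newlines are replaced with a plain space instead of LINE_BREAK, and the msg/solution field names are illustrative):
mutate {
  gsub => ["message", "\n", " "]   # join the multiline event into a single line
}
grok {
  match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} message:%{DATA:msg} solution:%{GREEDYDATA:solution}"]
}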

How to process multiline log entry with logstash filter?

Background:
I have a custom generated log file that has the following pattern :
[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (
'create' =>
array (
'key1' => 'value1',
'key2' => 'value2',
'key3' => 'value3'
),
)
[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line
The second entry, [2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line, is a dummy line, just to let Logstash know that the multiline event is over; this line is dropped later on.
My config file is the following:
input {
  stdin{}
}
filter {
  multiline {
    pattern => "^\["
    what => "previous"
    negate => true
  }
  grok {
    match => ['message', "\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}"]
  }
  if [loglevel] == "DEBUG" { # the event flush line
    drop {}
  } else if [loglevel] == "ERROR" { # the first line of the multiline event
    grok {
      match => ['message', ".+\|.+\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"]
    }
  } else { # it's a new line (from the multiline event)
    mutate {
      replace => ["content", "%{content} %{message}"] # supposing each new line will override the message field
    }
  }
}
output {
  stdout { debug => true }
}
The output for the content field is: The error message goes here ; array (
Problem:
My problem is that I want to store the rest of the multiline event in the content field:
The error message goes here ; array (
'create' =>
array (
'key1' => 'value1',
'key2' => 'value2',
'key3' => 'value3'
),
)
So I can remove the message field later.
The @message field contains the whole multiline event, so I tried the mutate filter with the replace function on it, but I'm just unable to get it working :(.
I don't understand how the multiline filter works; if someone could shed some light on this, it would be really appreciated.
Thanks,
Abdou.
I went through the source code and found out that:
The multiline filter will cancel all the events that are considered to be a follow-up of a pending event, then append each such line to the original message field, meaning any filters that come after the multiline filter won't apply in this case.
The only event that will ever pass the filter is one that is considered to be a new one (something that starts with [ in my case).
Here is the working code :
input {
  stdin{}
}
filter {
  if "|ERROR|" in [message] { # if this is the 1st line of a multiline message
    grok {
      match => ['message', "\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"]
    }
    mutate {
      replace => [ "message", "%{content}" ] # replace the message field with the content field (so later lines get appended to it)
      remove_field => ["content"] # we no longer need this field
    }
  }
  multiline { # nothing will pass this filter unless it is a new event (a new "[2014-03-02 1...." line)
    pattern => "^\["
    what => "previous"
    negate => true
  }
  if "|DEBUG| flush_multi_line" in [message] {
    drop {} # we don't need the dummy line, so drop it
  }
}
output {
  stdout { debug => true }
}
Cheers,
Abdou
grok and multiline handling is mentioned in this issue https://logstash.jira.com/browse/LOGSTASH-509
Simply add "(?m)" in front of your grok regex and you won't need the mutate filter. Example from the issue:
pattern => "(?m)<%{POSINT:syslog_pri}>(?:%{SPACE})%{GREEDYDATA:message_remainder}"
The multiline filter will add the "\n" to the message. For example:
"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\\xampp\\htdocs\\test.php|123|subject|The error message goes here ; array (\n 'create' => \n array (\n 'key1' => 'value1',\n 'key2' => 'value2',\n 'key3' => 'value3'\n ),\n)"
However, the grok filter can't parse across the "\n". Therefore you need to substitute the \n with another character, say, a blank space.
mutate {
  gsub => ['message', "\n", " "]
}
Then, grok pattern can parse the message. For example:
"content" => "The error message goes here ; array ( 'create' => array ( 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3' ), )"
Isn't the issue simply the ordering of the filters? Order is very important to Logstash. You don't need another line to indicate that you've finished outputting the multiline log entry. Just ensure the multiline filter appears before the grok (see below).
P.S. I've managed to parse a multiline log entry fine where XML was appended to the end of the log line and spanned multiple lines, and I still got a nice clean XML object into my content-equivalent variable (named xmlrequest below). Before you say anything about logging XML in logs... I know... it's not ideal... but that's for another debate :)
filter {
  multiline {
    pattern => "^\["
    what => "previous"
    negate => true
  }
  mutate {
    gsub => ['message', "\n", " "]
  }
  mutate {
    gsub => ['message', "\r", " "]
  }
  grok {
    match => ['message', "\[%{WORD:ONE}\] \[%{WORD:TWO}\] \[%{WORD:THREE}\] %{GREEDYDATA:xmlrequest}"]
  }
  xml {
    source => "xmlrequest"
    remove_field => ["xmlrequest"]
    target => "request"
  }
}