Regex in logstash mutate gsub to replace a character in a string

I am trying to replace double quotes (placed at arbitrary positions) in a string with something else.
This is the log line:
msg="AUT30544: User chose to proceed on the sign-in notification page "Sign-In Notification Message""
This is part of KV parsing in Logstash's filter section. Notice that there is a quoted string inside a string that is itself enclosed in double quotes.
However, the string below gets parsed correctly by KV:
msg="AUT23278: User Limit realm restrictions successfully passed for /google_auth "
I created a regex to remove the double quotes in the problematic string:
https://regex101.com/r/o00oot/1/
I applied it in Logstash but nothing changed.
Below is my config file:
input {
  tcp {
    port => 1301
  }
}
filter {
  if "type=vpn" in [message] {
    dissect {
      mapping => { "message" => "%{reserved} id=firewall %{message1}" }
    }
    #mutate { gsub => ["message1",':'," "] }
    #mutate { gsub => ["message1",'"',''] }
    mutate { gsub => ["msg","(.*)\"(.*)\"(\")", "\1 '\2 '\3"] }
    kv { source => "message1" value_split => "=" whitespace => "strict" } #field_split => " " remove_char_value => '"' }
    geoip { source => "src" }
    # \/ end of if vpn type log
  }
  else { drop {} }
}
A similar log line that I captured using tcpdump is:
<134>Oct 2 11:24:45 1xx.xx.43.101 1 2021-10-02T11:24:45+05:30 canopus.domain1.com2 PulseSecure: - - - id=firewall time="2021-10-02 11:24:45" pri=6 fw=172.20.43.101 vpn=ive user=user1 realm="google_auth" roles="" proto=auth src=2xx.176.114.94 dst= dstname= type=vpn op= arg="" result= sent= rcvd= agent="" duration= msg="AUT30544: User chose to proceed on the sign-in notification page "Sign-In Notification Message""
Here is the stdout output for the same kind of message. I can see the double quotes being escaped, but they still cause a problem in parsing.
{
    "type" => "vpn",
    "user" => "user1",
    "fw" => "1xx.xx.43.101",
    "host" => "1xx.xx.4.63",
    "realm" => "google_auth",
    "src" => "1xx.66.50.112",
    "port" => 33003,
    "@version" => "1",
    "message" => "<13>Oct 2 11:54:39 1xx.xx.43.101 396 <134>1 2021-10-02T11:54:39+05:30 canopus.domain1.com2 PulseSecure: - - - id=firewall time=\"2021-10-02 11:54:39\" pri=6 fw=1xx.xx.43.101 vpn=ive user=user1 realm=\"google_auth\" roles=\"\" proto=auth src=1xx.66.50.112 dst= dstname= type=vpn op= arg=\"\" result= sent= rcvd= agent=\"\" duration= msg=\"AUT30544: User chose to proceed on the sign-in notification page \"Sign-In Notification Message\"\"",
    "geoip" => {
        "location" => {
            "lon" => 77.5937,
            "lat" => 12.9719
        },
If someone knows a solution native to the KV plugin for this problem, I don't need to go through the hassle of regex in gsub.

I'm not sure you can use kv on the whole message as you have; try splitting it up so that the key/value part of the message ends up in a separate field, and then run kv on that. That said, I would suggest skipping gsub here entirely, because the kv filter has an option called trim_value.
With that option your configuration would look something like this. Disclaimer: this is not tested, and you may have to play with the regex inside trim_value, but it is an easier way to handle this.
input {
  tcp {
    port => 1301
  }
}
filter {
  if "type=vpn" in [message] {
    dissect {
      mapping => { "message" => "%{reserved} id=firewall %{message1}" }
    }
    kv {
      source => "message1"
      value_split => "="
      whitespace => "strict"
      trim_value => "\\\""
    }
    geoip {
      source => "src"
    }
  }
  else {
    drop { }
  }
}
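For reference, trim_value in the kv filter takes a set of characters (treated as a regex character class) that are stripped from the beginning and end of each parsed value; the escaped "\\\"" above boils down to stripping double quotes. A rough sketch of the intended effect on the problematic field (illustrative values, not actual output):
# Value as captured by kv:
#   msg => "AUT30544: User chose to proceed on the sign-in notification page "Sign-In Notification Message""
# After trim_value strips quotes from both ends:
#   msg => AUT30544: User chose to proceed on the sign-in notification page "Sign-In Notification Message"
# trim_value only touches the ends of the value; the inner quotes remain.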

Related

Logstash stops compiling when given custom pattern inside a filter

So, the problem is this: I have a custom pattern file in the ./patterns directory.
It looks like this:
NODELISTENUM(([A-Za-z0-9]{0,20})(\-)?([A-Za-z0-9]{0,20})(\.[A-Za-z0-9]{0,20})?(\,)*([A-Za-z0-9]{0,20}(\-?[A-Za-z0-9]{0,20})*)(\.[A-Za-z0-9]{0,20})?)+
XCAT_1 ([a-z]{5,5})\s\-([A-Za-z])\s([a-z]{4,4})\s\-([A-Za-z])\s(?:%{XCNODELISTENUM})
XCAT_2 (\-([A-Za-z]\s(?:%{XCNODELISTENUM})\s[a-z]{5,5})\s\-([A-Za-z])\s([a-z]{4,4}))
XCAT (%{XCAT_1}|%{XCAT_2})
XCATCOMMEXEC ([a-z]{5,5})\s\-([A-Za-z])\s([a-z]{4,4})
OPTION (\-([A-Za-z]))
NODESINVOLVED (([A-Za-z0-9]{0,20})(\-)?([A-Za-z0-9]{0,20})(\.[A-Za-z0-9]{0,20})?(\,)*([A-Za-z0-9]{0,20}(\-?[A-Za-z0-9]{0,20})*)(\.[A-Za-z0-9]{0,20})?)+)
The filter in which those patterns are used looks like this:
filter {
  if [type] == "syslog" and !("parsed_by_added_cron_filter" in [tags]) {
    grok {
      patterns_dir => ["./patterns"]
      remove_tag => ["_grokparsefailure"]
      match => {
        "message" => ["%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: xCAT: Allowing %{XCATCOMMEXEC:xCAT_comm_exec} %{OPTION:option} ?%{NODESINVOLVED:nodes_involved} for %{USERNAME:xcat_user} from %{SYSLOGHOST:xcat_user_hostname}"]
      }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
  }
  syslog_pri { }
}
This is the message in the log showing that Logstash stops compiling:
[2017-05-03T12:42:29,507][ERROR][logstash.pipeline ] Error registering plugin {:plugin=>"#<LogStash::FilterDelegator:0x30da3bcb @id=\"d2fe4d8a1b6009020b724f61f22506bdecdfdb3f-6\", @klass=LogStash::Filters::Grok, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x2026f0d4 @metric=#<LogStash::Instrument::Metric:0x719b7df8 @collector=#<LogStash::Instrument::Collector:0x397c0497 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x58197410 @store=#<Concurrent::Map:0x4fae9f97 @default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x65704f27>, @fast_lookup=#<Concurrent::Map:0x3c71a7a2 @default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :filters, :\"d2fe4d8a1b6009020b724f61f22506bdecdfdb3f-6\", :events]>, @logger=#<LogStash::Logging::Logger:0x14329d83 @logger=#<Java::OrgApacheLoggingLog4jCore::Logger:0x3777882e>>, @filter=<LogStash::Filters::Grok patterns_dir=>[\"./patterns\"], remove_tag=>[\"_grokparsefailure\"], match=>{\"message\"=>[\"%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\\\\[%{POSINT:syslog_pid}\\\\])?: xCAT: Allowing %{XCATCOMMEXEC:xCAT_comm_exec} %{OPTION:option} ?%{NODESINVOLVED:nodes_involved} for %{USERNAME:xcat_user} from %{SYSLOGHOST:xcat_user_hostname}\"]}, add_field=>{\"received_at\"=>\"%{@timestamp}\", \"received_from\"=>\"%{host}\"}, id=>\"d2fe4d8a1b6009020b724f61f22506bdecdfdb3f-6\", enable_metric=>true, periodic_flush=>false, patterns_files_glob=>\"*\", break_on_match=>true, named_captures_only=>true, keep_empty_captures=>false, tag_on_failure=>[\"_grokparsefailure\"], timeout_millis=>30000, tag_on_timeout=>\"_groktimeout\">>", :error=>"pattern %{XCATCOMMEXEC:xCAT_comm_exec} not defined"}
I found it:
NODELISTENUM(([A-Za-z0-9]{0,20})(-)?([A-Za-z0-9]{0,20})(.[A-Za-z0-9]{0,20})?(\,)([A-Za-z0-9]{0,20}(-?[A-Za-z0-9]{0,20}))(.[A-Za-z0-9]{0,20})?)+
You should have a space after NODELISTENUM in the first line:
NODELISTENUM (([A-Za-z0-9]{0,20})(\-)?([A-Za-z0-9]{0,20})(\.[A-Za-z0-9]{0,20})?(\,)*([A-Za-z0-9]{0,20}(\-?[A-Za-z0-9]{0,20})*)(\.[A-Za-z0-9]{0,20})?)+
If it still fails, please remove the patterns one by one to debug; it seems one of the custom patterns is wrong.
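For context, grok pattern files are parsed line by line as NAME, a single space, then the regex; a malformed first line can keep the rest of the file from loading, which would explain Logstash reporting %{XCATCOMMEXEC} as undefined even though it appears in the file. A minimal sketch of the expected file format (TESTNUM and TESTWORD are made-up names for illustration):
# ./patterns/custom — one definition per line: NAME<space>regex
TESTNUM [0-9]+
TESTWORD [A-Za-z]+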

Logstash grok filter to tag received and bounced messages

Something is driving me crazy. I would like to parse Postfix logs to know the status of emails. Here is what I have tried so far:
input {
  file { path => "/var/log/mail.log" }
}
filter {
  kv {
    trim => "<>"
  }
  if [message] =~ /[ "status=bounced" ]/ {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => { "message" => "%{SYSLOGBASE} (?<QID>[0-9A-F]{10}): %{GREEDYDATA:message}" }
      add_tag => "bounce"
    }
  }
}
output {
  if "bounce" in [tags] {
    stdout { codec => rubydebug }
  }
}
Example of mail.log:
Jul 26 04:18:34 mx12 postfix/cleanup[20659]: 3mfHGL1r9gzyQP: message-id=<3mfHGL1r9gzyQP@www.mydomain.fr>
Jul 26 04:18:34 mx12 postfix/smtp[20662]: 3mfHGL1r9gzyQP: to=, relay=127.0.0.2[127.0.0.2]:25, delay=0.53, delays=0.13/0/0.23/0.16, dsn=2.0.0, status=sent / bounced
Result 1:
I send an email to an existing email address; the status in mail.log is:
sent (250 ok) : OKAY
But here is what Logstash reports:
... and I see that for every message generated by each Postfix program (qmgr, smtp, qmgr again, ...). In other words, it fires for all messages, even those that do not contain "status=bounced".
Then I also tried:
if [message] =~ /[ "bounced" ]/ {
  mutate { add_tag => [ "bounce" ] }
}
if [message] =~ /[ "message-id", "(.*)\@www\.mydomain\.fr" ]/ {
  mutate { add_tag => [ "send" ] }
}
grok {
  match => { "message" => "%{SYSLOGBASE} (?<QID>[0-9A-F]{10}): %{GREEDYDATA:message}" }
}
Result 2:
Logstash always adds both tags here: bounce + send :(
Expected result:
What I am trying to do is exactly what this config file does, but it was made for an old version of Logstash ("grep", for example, is no longer available); this is exactly what I am trying to get working:
http://tales.itnobody.com/2013/07/using-logstash-to-log-smtp-bounces-like-a-boss.html
In short:
Any entries with a DSN – RECORD: QID, dsn
Any entries matching message-id=< hashRegex > – RECORD: QID, message-id
As follows:
output {
  if "bounce" in [tags] {
    exec {
      command => "php -f /path/LogDSN.php %{QID} %{dsn} &"
    }
  }
  if "send" in [tags] {
    exec {
      command => "php -f /path/LogOutbound.php %{QID} %{message-id} &"
    }
  }
}
But there is a problem in my filter that is driving me crazy. Any idea?
I have found the problem.
It comes from this test:
if [message] =~ /[ "bounced" ]/ {
  mutate { add_tag => [ "bounce" ] }
}
The regex is everything between the slashes, so your regex is evaluated like this:
https://regex101.com/r/eaB5jp/2
[ "bounced" ] is a regex character class matching any single space, double quote, or letter from "bounced", so every line matches and gets the tag.
For it to work, the test should be:
if [message] =~ /bounced/ {
  mutate { add_tag => [ "bounce" ] }
}
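The same character-class problem affects your message-id test; here is a sketch of both conditionals with plain regexes (untested, domain taken from your question):
if [message] =~ /status=bounced/ {
  mutate { add_tag => [ "bounce" ] }
}
if [message] =~ /message-id=<.*@www\.mydomain\.fr>/ {
  mutate { add_tag => [ "send" ] }
}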

Logstash grok chain conditional filters

I'm trying to create a grok pattern for a mixed log. This is my first time creating a conditional chain and I keep getting syntax errors:
/opt/logstash/bin/logstash -f /opt/logstash/conf.d/sip-parser.conf --configtest
Error: Expected one of #, in, not , ==, !=, <=, >=, <, >, =~, !~, and, or, xor, nand, { at line 27, column 14 (byte 580) after filter {
# separate soap calls from responses
grok {
match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}"}
}
if [type]
My config file:
input {
  file {
    path => "/home/steven/sip.log"
    start_position => beginning
    # logstash stores the lastrun => so we trick it
    sincedb_path => "/dev/null"
    # if logentry does not start with date it's part of previous entry
    codec => multiline {
      pattern => "\[^%{TIMESTAMP_ISO8601:logdate}\]"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}" }
  }
  # separate soap calls from responses
  if ([type] ~= /AbstractLoggingInterceptor:\ Inbound Message$/) {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}\n----------------------------\n%{GREEDYDATA:id}\n%{GREEDYDATA:responsecode}\n%{GREEDYDATA:encoding}\n%{GREEDYDATA:contenttype}\n%{GREEDYDATA:headers}\n%{GREEDYDATA:payload}\n--------------------------------------" }
    }
  }
  else if ([type] ~= /AbstractLoggingInterceptor:\ Outbound Message$/) {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}\n---------------------------\n%{GREEDYDATA:id}\n%{GREEDYDATA:responsecode}\n%{GREEDYDATA:encoding}\n%{GREEDYDATA:contenttype}\n%{GREEDYDATA:headers}\n%{GREEDYDATA:payload}\n--------------------------------------" }
    }
  }
  else {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}" }
    }
  }
}
output {
  #elasticsearch {}
  stdout {}
}
The logfile I'm trying to parse can be found here: http://pastebin.com/afbNfmjW
The individual grok patterns for each different type of entry have been tested in http://grokdebug.herokuapp.com/ but I can't chain these together. What am I doing wrong?
Your conditional grok{}s should not be inside the first grok, but peers to it:
grok { ... }
if [myField] == "value" {
  grok { ... }
}
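As an aside, the syntax error at line 27 most likely comes from the comparison operator: Logstash conditionals use =~, not ~= (note that =~ appears in the list of tokens the error message expected). A sketch of the corrected conditional, in the same shorthand as above:
if [type] =~ /AbstractLoggingInterceptor:\ Inbound Message$/ {
  grok { ... }
}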
Also note that you're running a regular expression to see if you should run a regular expression. I would suggest sending multiple patterns to one grok stanza:
grok {
  match => { "myField" => [
    "pattern1",
    "pattern2",
    "pattern3"
  ] }
}
By default, grok will stop processing the patterns when one matches.
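Applied to this config, that could look something like the following sketch (untested), with the most specific pattern first:
grok {
  match => { "message" => [
    "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}\n----------------------------\n%{GREEDYDATA:id}\n%{GREEDYDATA:responsecode}\n%{GREEDYDATA:encoding}\n%{GREEDYDATA:contenttype}\n%{GREEDYDATA:headers}\n%{GREEDYDATA:payload}\n--------------------------------------",
    "\[%{TIMESTAMP_ISO8601:logdate} \] %{LOGLEVEL:level} %{GREEDYDATA:type}"
  ] }
}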

Logstash grok match pattern for message field

My log data is shown below; there are 4 lines in total, starting from the date and time.
My grok pattern is:
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:time} \[%{NUMBER:thread}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} - %{GREEDYDATA:msg} " }
}
The problem is:
I am getting only part of the msg (GREEDYDATA) field.
For example, the data below is missing when the 4th line is parsed.
The log is:
2015-01-31 15:58:57,400 [9] ERROR NCR.AKPOS.Enterprise_Comm.EventSender - EventSender.SendInvoice() Generate Message Error: System.ArgumentNullException: Value cannot be null.
Parameter name: value
at System.Xml.Linq.XAttribute..ctor(XName name, Object value)
at NCR.AKPOS.Enterprise_Comm.MessageHandlerObjecttoXMLHelper.CreateXMLFromInvoice(Invoice invoice, Customer_ID customer_id
Logstash typically parses one line at a time.
For Java exceptions you need to look at the multiline plugin.
See an example here: https://gist.github.com/smougenot/3182192
Your grok format seems OK, but it cannot be tested without an example.
You can use the grok debugger app to test your patterns:
https://grokdebug.herokuapp.com/
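As a rough sketch of that multiline idea (untested; the path is a placeholder), every line that does not start with a timestamp gets joined onto the previous event:
input {
  file {
    # placeholder path for illustration
    path => "/var/log/app/app.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}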
Just remove the trailing white space: change %{GREEDYDATA:msg} " } to
%{GREEDYDATA:msg}"}
So, the complete filter configuration is:
filter {
  multiline {
    pattern => "^%{TIMESTAMP_ISO8601}"
    what => "previous"
    negate => true
  }
  # Delete trailing whitespace
  mutate {
    strip => "message"
  }
  # Delete \n from messages
  mutate {
    gsub => ['message', "\n", " "]
  }
  # Delete \r from messages
  mutate {
    gsub => ['message', "\r", " "]
  }
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:time} \[%{NUMBER:thread}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} - %{GREEDYDATA:msg}" }
  }
  if "Exception" in [msg] {
    mutate {
      add_field => { "msg_error" => "%{msg}" }
    }
  }
}

How to grok this input with logstash?

I'm trying to grok some lines with Logstash, so first I created two patterns which look like this:
AZ_LIST [1-9a-zA-Z,]+
AZ_STRING [a-zA-Z._-]+
and then I configured Logstash to grok this input:
security=0 system=23 CPU=this.adresse_false Pvm=0,0,0,0,0,0,0,0 Vlan2=AZERT,566,2184,798,3312
My filter is :
filter {
grok {
patterns_dir => "/patterns"
match => [
"message" , "security=%{NUMBER:security} system=%{NUMBER:system} CPU=%{AZ_STRING:CPU} Pvm=%{AZ_LIST:Pvm} Vlan2=%{AZ_LIST:Vlan2}"
]
tag_on_failure => [ "failure_grok_exemple" ]
break_on_match => false
}
}
But this doesn't work.
There is an error in your pattern: your AZ_LIST does not include 0, but your logs contain 0, e.g. Pvm=0,0,0,0,0,0,0,0.
This is my config; I can parse your log successfully.
filter {
  grok {
    patterns_dir => "./patterns/"
    match => [
      "message", "security=%{NUMBER:security} system=%{NUMBER:system} CPU=%{AZ_STRING:CPU} Pvm=%{AZ_LIST:Pvm} Vlan2=%{AZ_LIST:Vlan2}"
    ]
  }
}
Patterns:
AZ_LIST [0-9a-zA-Z,]+
AZ_STRING [a-zA-Z._-]+
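For illustration, with the corrected AZ_LIST the sample line should parse into fields roughly like this (illustrative, not actual Logstash output):
# security => "0"
# system   => "23"
# CPU      => "this.adresse_false"
# Pvm      => "0,0,0,0,0,0,0,0"
# Vlan2    => "AZERT,566,2184,798,3312"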