Multiple regex matching in Filebeat for the message field

I want to apply two regular expressions in Filebeat to drop events whose message field matches either of them.
I can make it work with a single regex condition, but I am not sure how to configure multiple regex conditions.
regex list:
message: "(?i)cron"
message: "^now ([0-9]{4})-([0-1][0-9])-([0-3][0-9])\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$"
The following config works for a single regex; it matches the text "cron", case-insensitively, anywhere in the message:
- drop_event:
    when:
      regexp:
        message: "(?i)cron"
Following the Filebeat docs, I tried several configs, but with them Filebeat won't start:
Try 1:
- drop_event:
    or:
      - regexp:
          message: "(?i)cron"
      - regexp:
          message: "^now ([0-9]{4})-([0-1][0-9])-([0-3][0-9])\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$"
Try 2:
- if:
    regexp:
      message: "(?i)cron"
  then:
    drop_event:
- if:
    regexp:
      message: "^now ([0-9]{4})-([0-1][0-9])-([0-3][0-9])\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$"
  then:
    drop_event:

I figured out how to apply multiple filters using the or operator in Filebeat. I was close in my first attempt (Try 1): the when key is required, and under it you can use whichever operator you need (or, and, etc.).
Here's an example of how I am using it:
processors:
  - drop_event.when:
      or:
        - contains:
            container.name: "nginx"
        - contains:
            container.name: "mongo"
        - contains:
            container.name: "mysql"
        - contains:
            container.name: "redis"
        - equals:
            container.name: "tecnativa/tcp-proxy"
  - drop_event.when:
      or:
        - regexp:
            message: "(?i)cron"
        - regexp:
            message: "In On Child added message"
        - regexp:
            message: "In on Child removed message"
        - regexp:
            message: "then Moment"
        - regexp:
            message: "call_duration"
        - regexp:
            message: "now Moment"
        - regexp:
            message: "CHAT NOTIFICATION CODE"
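A quick way to sanity-check the two patterns from the question before putting them into filebeat.yml is to emulate the or condition in Python. This is a minimal sketch, not Filebeat code: it assumes Filebeat's regexp condition behaves like an unanchored search (Filebeat uses Go's RE2 engine; both patterns only use syntax that Python's re also accepts), and should_drop is a hypothetical helper name. The raw strings also point at a YAML pitfall: inside a double-quoted YAML string, \s is not a valid escape sequence, which may be enough to stop the config from loading; single quotes avoid the problem.

```python
import re

# Patterns from the question. Raw strings keep "\s" as a regex escape.
DROP_PATTERNS = [
    re.compile(r"(?i)cron"),
    re.compile(
        r"^now ([0-9]{4})-([0-1][0-9])-([0-3][0-9])\s"
        r"([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$"
    ),
]


def should_drop(message: str) -> bool:
    """Hypothetical emulation of drop_event with when/or/regexp.

    The event is dropped if ANY pattern is found in the message, which is
    what an `or` of regexp conditions expresses. re.search (unanchored) is
    used because "(?i)cron" should match anywhere in the message.
    """
    return any(pattern.search(message) for pattern in DROP_PATTERNS)


if __name__ == "__main__":
    for msg in ["Running CRON job", "now 2020-11-03 13:00:05", "GET /index.html 200"]:
        print(f"{msg!r}: drop={should_drop(msg)}")
```

An and condition would be the same loop with all() instead of any().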

Related

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the continuation lines of multiline entries, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
I'm trying to extract the UUID, the datetime, and the message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I'm getting only the last word, CS_DESTROY, as the message.
I also tried it in Fluentular:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
It's missing what's between the datetime and "https".
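Try this pattern instead. The time group [^\[]* stops at the first [, so everything from the log level onward is captured as the message: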
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
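The difference between the two patterns can be reproduced with Python's re module. This sketch rests on one translation: Python spells named groups (?P<name>...), whereas fluentd's Ruby regexps use (?<name>...), so the groups are rewritten accordingly; parse is a hypothetical helper. With the original pattern, the greedy .* in the time group runs to the last space on the line, leaving only the final word for the message.

```python
import re

# Python uses (?P<name>...) for named groups; fluentd (Ruby) accepts (?<name>...).
UUID = r"(?P<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"

# Original attempt: greedy .* lets "time" swallow everything up to the last space.
ORIGINAL = re.compile(r"^" + UUID + r" (?P<time>.*) (?P<message>[^ ]*)$")

# Answer's pattern: "time" cannot cross "[", so the message starts at the log level.
FIXED = re.compile(r"^" + UUID + r" (?P<time>[^\[]*) (?P<message>\[.*)$")

LOG_1 = ("6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] "
         "switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY")
LOG_2 = ("f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] "
         "mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/"
         "576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg")


def parse(pattern, line):
    """Hypothetical helper: return the named groups, or None if no match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print("original:", parse(ORIGINAL, LOG_1)["message"])
    for line in (LOG_1, LOG_2):
        print(parse(FIXED, line))
```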

JAVAMETHOD grok pattern with optional thread number at the end

I'm trying to parse log4j messages:
2019-12-02 20:48:20.198utc DEBUG UnknownElementContentHandler,streamLock-9-th-11:32 - blabla
2019-11-19 23:40:04.014utc WARN AnnotationBinder,localhost-startStop-1:611 - blabla
2019-11-19 23:40:04.014utc INFO CovImCtl,main:109 - blabla
with the grok pattern
%{TIMESTAMP_ISO8601:timestamp}utc%{SPACE}%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class},%{JAVAMETHOD1:method}:%{POSINT:lineno}%{SPACE}-%{SPACE}%{GREEDYDATA:message}
using a variation on the standard JAVAMETHOD pattern:
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
JAVAMETHOD1 (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_\-0-9]*)
The standard JAVAMETHOD worked for "main" but not for the others (its character class does not include -).
JAVAMETHOD1 works, but I also need to extract the optional trailing integer as a "thread_no" field (11 from streamLock-9-th-11, 1 from localhost-startStop-1).
I'm racking my brain: method names like streamLock-9-th-11 contain an internal "-\d+" ("-9") that belongs to the method name "streamLock-9-th", so only the last "-\d+" should become the thread number.
Any ideas?
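One possible approach, not taken from the original thread: make the method part lazy and follow it with an optional -(\d+) group that must sit directly before :lineno. Because the lazy method group grows one character at a time, the first split that lets :lineno match is the one where the thread group takes only the last -\d+, so "-9" stays inside "streamLock-9-th". The sketch tests the idea with Python's re; the timestamp and class sub-patterns are simplified stand-ins for TIMESTAMP_ISO8601 and JAVACLASS, the <init>/<clinit> alternative is omitted, and parse_line is a hypothetical helper. Porting it back into grok (for example with an inline (?<thread_no>\d+) capture) is untested here.

```python
import re
from typing import Optional

# Simplified stand-ins (assumptions): the timestamp and class parts only
# approximate grok's TIMESTAMP_ISO8601 and JAVACLASS; <init>/<clinit> is omitted.
LOG4J_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)utc\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<cls>[A-Za-z$_][A-Za-z$_0-9.]*),"
    # Lazy method name: grows one character at a time, so the optional
    # -(\d+) group only takes the LAST dash-number before ":lineno".
    r"(?P<method>[A-Za-z$_][A-Za-z$_0-9-]*?)"
    r"(?:-(?P<thread_no>\d+))?"
    r":(?P<lineno>\d+)\s+-\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Hypothetical helper: return the named groups, or None if no match."""
    match = LOG4J_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "2019-12-02 20:48:20.198utc DEBUG UnknownElementContentHandler,streamLock-9-th-11:32 - blabla",
        "2019-11-19 23:40:04.014utc WARN AnnotationBinder,localhost-startStop-1:611 - blabla",
        "2019-11-19 23:40:04.014utc INFO CovImCtl,main:109 - blabla",
    ]
    for sample in samples:
        fields = parse_line(sample)
        print(fields["method"], fields["thread_no"])
```

For "main:109" the optional group simply does not participate, so thread_no comes back as None.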

Ansible until loop with regex in conditional statement

I need to check a command's stdout against a regex pattern in an until loop. When I use regex_search to display a debug message, everything works fine:
- name: Checking if node is ready
  shell: "kubectl --kubeconfig {{kubeconf}} get nodes"
  register: k_output
- debug:
    msg: "{{k_output.stdout | regex_search(node_hostname | string + '\\s+Ready')}}"
The message is:
ok: [any_node_hostname] => {
    "msg": "any_node_hostname Ready"
}
But if I try to use that construction inside the until conditional, the task fails with a syntax error:
- set_fact:
    regexp_pattern: "{{node_hostname | string + '\\s+Ready'}}"
- debug:
    msg: "{{regexp_pattern}}"
- name: Checking if node is ready
  shell: "kubectl --kubeconfig {{kubeconf}} get nodes"
  register: k_output
  until: "{{k_output.stdout | regex_search(regexp_pattern)}}" != ""
  retries: 5
  delay: 10
I get the same behaviour without set_fact, when I copy and paste the full string {{k_output.stdout | regex_search(node_hostname | string + '\\s+Ready')}} into the until conditional.
So, how can I use regex_search or something that fits this case with until?
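You have a syntax error in the until: statement. until is already evaluated as a raw Jinja2 expression, so you must not wrap the variables in {{ }} or quote the expression; see the example in the Ansible docs section "Retrying a task until a condition is met". Also, regex_search returns None rather than an empty string when nothing matches, so comparing its result with != "" would be true even when the node is not ready. The search test returns a plain boolean instead:
until: k_output.stdout is search(regexp_pattern)
I hope this will help.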
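The check itself can be sketched outside Ansible to make the logic explicit. This is a hypothetical Python model, not how Ansible implements until: node_is_ready mirrors regex_search(node_hostname | string + '\\s+Ready') as a boolean (the hostname is passed through re.escape, an addition so that dots in it match literally), and wait_until is a generic retry loop whose attempts and delay only loosely correspond to retries and delay; the exact number of runs Ansible performs is not modeled.

```python
import re
import time
from typing import Callable


def node_is_ready(kubectl_output: str, node_hostname: str) -> bool:
    """Boolean version of regex_search(node_hostname | string + '\\s+Ready').

    re.escape is an addition (not in the question) so that dots in a
    hostname are matched literally rather than as "any character".
    """
    pattern = re.escape(str(node_hostname)) + r"\s+Ready"
    return re.search(pattern, kubectl_output) is not None


def wait_until(check: Callable[[], bool], attempts: int = 5, delay: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical retry loop: call check() up to `attempts` times,
    sleeping `delay` seconds between failed attempts."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Note that \s+Ready does not match a NotReady status, because the whitespace must be followed directly by "Ready".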

Ansible regex_replace filter does not work

I am trying to use Ansible's regex_replace to extract the substring "application_1514971620021_4505" from a status message.
In the shell the message looks like this:
I run this code in Ansible:
---
- hosts: [npif]
  remote_user: root
  tasks:
    - block:
        - name: Admin submit check
          command: chdir=/usr/spring-xd-1.3.1.RELEASE-yarn/ bin/xd-yarn submitted
          register: admininfo
        - debug: msg="{{ admininfo.stdout }}"
        - debug: msg="{{ admininfo.stdout | regex_replace('^.*(application\_\d.*\_\d*)\s.*', '\\1') }}"
      become: yes
      become_user: ingestdev
debug: msg="{{ admininfo.stdout }}" returns the status message in a different format than in the shell:
ok: [npif] => {
    "msg": " APPLICATION ID USER NAME QUEUE TYPE STARTTIME FINISHTIME STATE FINALSTATUS ORIGINAL TRACKING URL\n ------------------------------ --------- --------- -------- ---- -------------- ---------- ------- ----------- ------------------------\n application_1514971620021_4505 ingestdev spring-xd batch_cb XD 1/3/18 2:49 PM N/A RUNNING UNDEFINED http://x.x.x.x:9394"
}
When I run the second debug with regex_replace, I get output identical to the first debug: the regex_replace filter has had no effect. The regex itself is correct (I've tested it externally), and the Ansible side works in general: with the line below I get "test" as expected.
- debug: msg="{{ 'test.home.com' | regex_replace('^([^.]*).*', '\\1') }}"
Do you have any idea what is wrong with my approach?
Your first problem is that . does not match newlines (regex_replace is based on Python's re module, where . matches a newline only with the DOTALL flag), so .* cannot span lines. Consider this:
- debug:
    msg: "{{ admininfo.stdout | regex_replace('.*application', 'foo') }}"
This replaces application (and anything before it on the same line) with foo, but leaves the header lines intact. Since the ^ anchors your regular expression to the beginning of the text (not the beginning of a line), your expression will never match.
You can take advantage of the fact that Ansible has already split the output into individual lines in the stdout_lines key of your registered result. In this case, you would use something like:
- debug:
    msg: >
      {{ admininfo.stdout_lines[2] | regex_replace('^.*(application_\d.*_\d*)\s.*', '\1') }}
Note that I've made a few changes in how things are quoted and escaped. In particular, I'm using the YAML folded block scalar indicator > in place of double quotes; block scalars undergo no YAML escape processing, so you need to use \1 instead of \\1 for the replacement string.
This gives me:
ok: [localhost] => {
    "msg": "application_1514971620021_4505\n"
}
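Since regex_replace is, as far as I know, a thin wrapper around Python's re.sub, the behaviour described in the answer can be reproduced directly in Python. This sketch uses the stdout from the question and the answer's pattern; it shows that the whole-text substitution is a no-op, that the per-line version extracts the ID, and (as an alternative not used in the answer) that re.DOTALL also lets .* cross newlines.

```python
import re

# The registered stdout from the question: three lines joined with "\n".
STDOUT = (
    " APPLICATION ID USER NAME QUEUE TYPE STARTTIME FINISHTIME STATE FINALSTATUS ORIGINAL TRACKING URL\n"
    " ------------------------------ --------- --------- -------- ---- -------------- ---------- ------- ----------- ------------------------\n"
    " application_1514971620021_4505 ingestdev spring-xd batch_cb XD 1/3/18 2:49 PM N/A RUNNING UNDEFINED http://x.x.x.x:9394"
)
PATTERN = r"^.*(application_\d.*_\d*)\s.*"

# 1. Whole stdout, default flags: "^" only matches at position 0 and ".*" stops
#    at the first "\n", so nothing matches and re.sub returns the text unchanged.
whole = re.sub(PATTERN, r"\1", STDOUT)

# 2. Third line only (the equivalent of stdout_lines[2]): the whole line matches
#    and is replaced by the captured group.
third_line = re.sub(PATTERN, r"\1", STDOUT.splitlines()[2])

# 3. Whole stdout with DOTALL: ".*" may now cross newlines, so the entire text
#    matches and is replaced by the group.
dotall = re.sub(PATTERN, r"\1", STDOUT, flags=re.DOTALL)

if __name__ == "__main__":
    print(whole == STDOUT)
    print(third_line)
    print(dotall)
```

The trailing \n in the Ansible output above comes from the folded > scalar, which keeps one final newline by default; the Python results have none.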

grok regex parsing not matching a log when specifying a group as optional, but not the last group

Example:
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #userId=5 #orderId=Y5545
pattern:
%{LOGLEVEL:stream_level}: %{TIMESTAMP_ISO8601:timestamp} - %{LOGLEVEL:log_level}: %{MESSAGE:message} (#userId=%{USER_ID:user_id})? (#orderId=%{ORDER_ID:order_id})?
extra patterns used:
USER_ID (\d+|None)
ORDER_ID .*
ORDER_ID_HASH \s*(#orderId=%{ORDER_ID:order_id})?
USER_ID_HASH \s*(#userId=%{USER_ID:user_id})?
MESSAGE (.*?)
This works fine. Removing the optional trailing orderId also works:
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #userId=5
But if I keep the orderId and remove the userId, I get "no match":
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #orderId=Y5545
The user_id group also ends with ?, so it should be optional as well.
I'm testing with the grok debugger on Heroku.
Is this a bug (Logstash 1.4.2), or am I missing something in the regex? (More probable, but what?)
I looked at the regex library grok uses, and this syntax looks like it is supposed to work. It does work for the last group (orderId) but not for the one before it.
Thanks for the help!
You are forcing a literal space before each optional group, so the pattern only matches when those spaces are actually present. Make each of those spaces optional too by putting ? after it:
%{LOGLEVEL:stream_level}: %{TIMESTAMP_ISO8601:timestamp} - %{LOGLEVEL:log_level}: %{MESSAGE:message} ?(#userId=%{USER_ID:user_id})? ?(#orderId=%{ORDER_ID:order_id})?
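The optional-space idea can be checked with Python's re against all three log variants from the question. In this sketch, LOGLEVEL and TIMESTAMP_ISO8601 are replaced by simplified stand-ins (\w+ and \S+), USER_ID, ORDER_ID and MESSAGE are taken from the question, the capturing (#userId=...) groups become non-capturing, and the log line is joined onto one line. A trailing $ is added as an assumption that is not part of the answer: without an end anchor, the lazy (.*?) message group can match an empty string and every optional group can be skipped, which still counts as a match in Python's re.

```python
import re

# Simplified translation of the answer's grok pattern (assumptions):
#   LOGLEVEL -> \w+, TIMESTAMP_ISO8601 -> \S+,
#   MESSAGE (.*?), USER_ID (\d+|None), ORDER_ID .* as defined in the question.
# The " ?" before each optional group is the fix from the answer; the final "$"
# is an addition so the lazy message group cannot stop at an empty string.
PATTERN = re.compile(
    r"^(?P<stream_level>\w+): (?P<timestamp>\S+) - (?P<log_level>\w+): "
    r"(?P<message>.*?)"
    r" ?(?:#userId=(?P<user_id>\d+|None))?"
    r" ?(?:#orderId=(?P<order_id>.*))?$"
)

# The question's log line, joined onto one line.
PREFIX = ("info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying "
          "to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash")

if __name__ == "__main__":
    for suffix in (" #userId=5 #orderId=Y5545", " #userId=5", " #orderId=Y5545"):
        match = PATTERN.match(PREFIX + suffix)
        print(suffix.strip(), "->", match.group("user_id"), match.group("order_id"))
```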