Multiple custom grok patterns not matching, but they successfully match alone? - regex

Grok matches each custom pattern on its own, but does not match when the custom patterns are combined.
Complete, working, and verifiable example
Sample data:
OK 05/20 20:12:10:067 ABC_02~~DE_02 FGH_IJK jsmith _A0011
Custom patterns:
MMDD [0-1][0-9]/[0-3][0-9]
THREAD _A\w+
They work separately; specifically, this pattern works by itself:
%{MMDD:mmdd}
// Result
{
"mmdd": [
[
"05/20"
]
]
}
... and this pattern works by itself:
%{THREAD:thread}
// Result
{
"thread": [
[
"_A0011"
]
]
}
..but together, they fail:
%{MMDD:mmdd} %{THREAD:keyword}
No Matches
Puzzling. Tyvm Keith :^)
Testing here:
https://grokdebug.herokuapp.com/
Regex Resource:
https://regex101.com/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EDIT based on Jeff Y's comment below
Note change of keyword to thread
// Grok Pattern
%{MMDD:mmdd}%{DATA}%{THREAD:thread}
// Result
{
"mmdd": [
[
"05/20"
]
],
"DATA": [
[
" 20:12:10:067 ABC_02~~DE_02 FGH_IJK jsmith "
]
],
"thread": [
[
"_A0011"
]
]
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EDIT 2 based on Jeff Y's second comment below
// Data - HACKED - Note move of _A0011 to after mm/dd
OK 05/20 _A0011 20:12:10:067 ABC_02~~DE_02 FGH_IJK jsmith
// Grok Pattern
%{MMDD:mmdd} %{THREAD:thread}
// Result
{
"mmdd": [
[
"05/20"
]
],
"thread": [
[
"_A0011"
]
]
}

Grok will test your patterns against the whole message.
If your message is OK 05/20 _A0011 20:12:10:067 ABC_02~~DE_02 FGH_IJK jsmith and you only want the 05/20 and _A0011 parts, your grok expression should also contain patterns that match the rest of the string, without saving them in a field.
For example, the pattern %{WORD}%{SPACE}%{MMDD:mmdd}%{SPACE}%{THREAD:thread}%{SPACE}%{GREEDYDATA} will match your string: it saves the mmdd and thread fields and ignores everything else.
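The same behaviour can be reproduced outside grok. Below is a minimal sketch in Python, assuming the two custom patterns are translated to plain regexes by hand. The names MMDD, THREAD and SAMPLE exist only for this illustration, and Python's re module merely stands in for grok's regex engine. The literal space in %{MMDD:mmdd} %{THREAD:thread} demands that the thread follows the date after exactly one space, which the original sample line does not satisfy.

```python
import re

# Hand-translated equivalents of the custom grok patterns (illustration only).
MMDD = r"[0-1][0-9]/[0-3][0-9]"
THREAD = r"_A\w+"
SAMPLE = "OK 05/20 20:12:10:067 ABC_02~~DE_02 FGH_IJK jsmith _A0011"

# "%{MMDD:mmdd} %{THREAD:thread}": the thread must follow the date after one space.
adjacent = re.search(rf"(?P<mmdd>{MMDD}) (?P<thread>{THREAD})", SAMPLE)

# "%{MMDD:mmdd}%{DATA}%{THREAD:thread}": a lazy filler consumes the text in between.
with_filler = re.search(rf"(?P<mmdd>{MMDD}).*?(?P<thread>{THREAD})", SAMPLE)

print(adjacent)                 # None
print(with_filler.groupdict())  # {'mmdd': '05/20', 'thread': '_A0011'}
```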

Related

Regex function to look for a character just on part of the string

I need help building a regex rule to find some [ characters in a text file.
Here is a sample of the text. It is JSON, but I can't use it as it is because of a limitation of the program I'm using.
{
"event":[
"ONIMBOTMESSAGEADD"
],
"data[BOT][123][BOT_ID]":[
"123"
]
}
I need to find a regex that matches the line "data[BOT][123][BOT_ID]":[ and finds every [ in it. The objective is to replace each one with an underscore, so I would end up with something like this:
{
"event":[
"ONIMBOTMESSAGEADD"
],
"data_BOT_123_BOT_ID":[
"123"
]
}
I can't just remove all special characters because this would destroy the json structure.
I found a way to select each one of the lines that need to be corrected with the rule below, but I was not able to apply another rule over the result. I don't know how to do it.
pattern = (("data\[[a-zA-Z]+]\[[0-9]+]\[([a-zA-Z]+_[a-zA-Z]+)\]":\[)|("data\[[A-Z]+]\[([A-Z]+(_|)[A-Z]+)\]":\[)|("data\[[A-Z]+]\[([A-Z]+(_|)[A-Z]+(_|)[A-Z]+)\]":\[))
Any ideas on how to solve it? Thank you in advance.
Replacing weird data* key by only data:
jq '.["data"] = .[keys[0]] | del(.[keys[1]])' file
{
"event": [
"ONIMBOTMESSAGEADD"
],
"data": [
"123"
]
}
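If the key should keep its structure, as in the desired output from the question, a regex with a replacement callback is another option. Below is a minimal sketch in Python, assuming the text can be pre-processed as a string before the other program reads it. KEY and flatten_key are hypothetical names chosen for this illustration.

```python
import re

# A quoted key that starts with "data", has one or more [..] groups,
# and is followed by a colon.
KEY = re.compile(r'"(data(?:\[[^\[\]]+\])+)"(?=\s*:)')

def flatten_key(match):
    """Turn data[BOT][123][BOT_ID] into data_BOT_123_BOT_ID."""
    name = match.group(1)
    name = name.replace("][", "_").replace("[", "_").replace("]", "")
    return f'"{name}"'

text = '''{
"event":[
"ONIMBOTMESSAGEADD"
],
"data[BOT][123][BOT_ID]":[
"123"
]
}'''

fixed = KEY.sub(flatten_key, text)
print(fixed)
```

Only the brackets inside the matched key are touched, so the [ and ] that delimit the JSON arrays stay intact.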

I want to apply the regular expression used in gitleaks in secretlint

I am now trying to migrate from gitleaks to a tool called secretlint.
Originally, there was a warning in the generic-api-key rule when executing gitleaks, but after moving to secretlint, the warning no longer occurs.
Specifically, I wrote the regular expression of gitleaks.toml provided by gitleaks in the secretlint configuration file .secretlintrc.json according to the format of #secretlint-rule-pattern provided by secretlint.
[[rules]]
id = "generic-api-key"
description = "Generic API Key"
regex = '''(?i)((key|api[^Version]|token|secret|password|auth)[a-z0-9_ .\-,]{0,25})(=|>|:=|\|\|:|<=|=>|:).{0,5}['\"]([0-9a-zA-Z\-_=]{8,64})['\"]'''
entropy = 3.7
secretGroup = 4
keywords = [
"key",
"api",
"token",
"secret",
"password",
"auth",
]
to
{
"rules": [
{
"id": "@secretlint/secretlint-rule-pattern",
"options": {
"patterns": [
{
"name": "Generic API key",
"pattern": "/(?i)((key|api[^Version]|token|secret|password|auth)[a-z0-9_ .\\-,]{0,25})(=|>|:=|\\|\\|:|<=|=>|:).{0,5}['\"]([0-9a-zA-Z\\-_=]{8,64})['\"]/"
}
]
}
}
]
}
I'm thinking that perhaps I'm not migrating the regex correctly, but if anyone can tell me where I'm going wrong, I'd like to know.
The main issue is that the inline (?i) modifier is not supported by the JavaScript regex engine. You must use the normal i flag after the second regex delimiter (/.../i).
Also, api[^Version] is a typical user error: [^Version] is a negated character class that matches any single character other than V, e, r, s, i, o or n. If you meant to say api not followed by Version, you need api(?!Version).
So you can use
"pattern": "/((key|api(?!Version)|token|secret|password|auth)[\\w .,-]{0,25})([=>:]|:=|\\|\\|:|<=|=>).{0,5}['\"]([\\w=-]{8,64})['\"]/i"
Note that I "shrunk" [A-Za-z0-9_] into a single \w; they are equivalent here. Also note that the - char does not need escaping when used at the end (or start) of a character class.
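To sanity-check the rewritten pattern, here is a minimal sketch in Python. Python's re module is only a stand-in for the JavaScript engine that secretlint uses: re.IGNORECASE plays the role of the trailing i flag, and re.ASCII keeps \w limited to [A-Za-z0-9_] as in JavaScript. The two sample lines are invented for this illustration.

```python
import re

# Same pattern as in the answer, split over two raw strings for readability.
GENERIC_API_KEY = re.compile(
    r"""((key|api(?!Version)|token|secret|password|auth)[\w .,-]{0,25})"""
    r"""([=>:]|:=|\|\|:|<=|=>).{0,5}['"]([\w=-]{8,64})['"]""",
    re.IGNORECASE | re.ASCII,
)

hit = GENERIC_API_KEY.search('api_key = "abcd1234efgh5678"')
miss = GENERIC_API_KEY.search('apiVersion: "v1beta1-extensions"')

print(hit.group(4))  # abcd1234efgh5678
print(miss)          # None
```

Group 4 holds the candidate secret, which corresponds to secretGroup = 4 in the gitleaks rule.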

Regex for matching ts files but not test

I need regex to add to the tsconfig file. I'm trying to match only ts source files, but not test files.
I tried something like:
"include": [
"src/**/*",
"../../node_modules/@web/common/src/app/views/**/*.ts"
"../../node_modules/@web/common/src/app/views/**/*.(module|component).ts"
],
"exclude": [
"node_modules",
"**/*.spec.ts",
"../../**/*.spec.ts"
But no luck.
// should match
/main.ts
/hello-world.component.ts
// shouldn't match
/hello-world.component.spec.ts
/app.e2e-spec.ts
The tsconfig file has an exclude section that excludes files by glob pattern (these are globs, not regular expressions). For example:
"include": [
"src/**/*"
],
"exclude": [
"node_modules",
"**/*.spec.ts"
]
See the tsconfig.json section of the TypeScript handbook.
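One detail worth checking against the sample file names from the question: *.spec.ts only covers names that end in .spec.ts, so /app.e2e-spec.ts would presumably still be included. Below is a rough sketch with Python's fnmatch, which only approximates tsconfig's glob matching and is applied to base names here. The extra *-spec.ts pattern is an assumption for illustration, not part of the answer above.

```python
from fnmatch import fnmatchcase

files = [
    "main.ts",
    "hello-world.component.ts",
    "hello-world.component.spec.ts",
    "app.e2e-spec.ts",
]

def kept(names, exclude_patterns):
    """Return the names that match none of the exclude patterns."""
    return [n for n in names
            if not any(fnmatchcase(n, p) for p in exclude_patterns)]

print(kept(files, ["*.spec.ts"]))
# ['main.ts', 'hello-world.component.ts', 'app.e2e-spec.ts']
print(kept(files, ["*.spec.ts", "*-spec.ts"]))
# ['main.ts', 'hello-world.component.ts']
```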

Regex: Match Numbers inside a bracket

Ok here is an example of the text I got
"data": [
{
"post_id": "164902600239452_10202071734744222",
"actor_id": 164902600239452,
"target_id": null,
"likes": {
"href": "https://www.facebook.com/browse/likes/?id=10202071734744222",
"count": 2,
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
"friends": [
],
"user_likes": false,
"can_like": true
},
"comments": {
"can_remove": false,
"can_post": true,
"count": 0,
"comment_list": [
]
},
"message": "Down to the FINAL 3 SEATS for It Factor LIVE 2013... WHO will snag them before we close registration on October 15th???\n\nLearn more now at http://www.ItFactorLIVE.com/"
}, ]
I want to match only the numbers inside the brackets after the "sample":
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
so that I end up with this
678063648
100000551340876
100000805495404
100000905843684
Could somebody please help me with the correct regex to make that happen?
OK - I have looked at the solution that @hwnd had suggested, as well as the link you gave to the "real" data, and came up with the following:
\d+(?=,*\s+(?:\d|\]))
You can see at http://regex101.com/r/pL3gW2 that this matches every string of digits in the sample that is inside square brackets.
The key difference with @hwnd's solution was the addition of a * after the ,, making the comma after the digits optional: this allows the expression to match the last set of numbers before the closing ]. Without it, the match skipped the last number inside the brackets.
It's been said before: there are powerful JSON parsers available in almost any language / platform. Look into them.
See if this works for you:
pattern = (\d+)(?=(?:(?!\[).)*\])
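Another way to keep the match confined to the "sample" array is to work in two steps: first capture the text between the brackets, then pull the digit runs out of that capture. Below is a minimal sketch in Python, assuming the text is available as a string. The trailing commas in the sample make it invalid for a strict JSON parser, so a regex is used here. The excerpt is shortened and the variable names are illustrative.

```python
import re

text = '''"likes": {
"count": 2,
"sample": [
678063648,
100000551340876,
100000805495404,
100000905843684,
],
"friends": [
],
"user_likes": false
},'''

# Step 1: capture everything between the brackets that follow "sample".
block = re.search(r'"sample"\s*:\s*\[(.*?)\]', text, re.DOTALL)

# Step 2: extract the digit runs from that capture only.
numbers = re.findall(r"\d+", block.group(1)) if block else []

print("\n".join(numbers))
```

Numbers outside the array, such as the value of "count", are never seen by the second step.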

Nagiosgraph rrd files not created(maybe because of map file)

I'm having a problem with Nagiosgraph. I have created a nagios check which monitors the traffic on a server/workstation through SNMP and the output of the check is a long string that looks like this:
OK - traffmon eth0:incoming:170KB:outgoing:1606KB eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0|lo;incoming;25;outgoing;25 tunl0;incoming;0;outgoing;0 gre0;incoming;0;outgoing;0 sit0;incoming;0;outgoing;0 eth0;incoming;170KB;outgoing;1606KB eth1;incoming;1576KB;outgoing;170KB eth2;incoming;156;outgoing;0
I'm interested in the first three interfaces, which is why I've separated eth0, eth1 and eth2 from the whole string of interfaces (which I considered performance data). I followed the instructions on http://www.novell.com/coolsolutions/feature/19843.html and I have this in my service.cfg:
define serviceextinfo{
host_name workstation
service_description Throughput Monitor
action_url /nagiosgraph/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&db=eth0,incoming,outgoing,&geom=500x100&rrdopts%3D-l%200%20-u%2010000%20-t%20Traffic
}
and in my map file I wrote this to match the parts that interest me:
/output:.*traffmon ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+), ([0-9]+)/
and push @s, [ 'eth0',
['incoming', 'GAUGE', $2],
['outgoing', 'GAUGE', $3] ],
[ 'eth1',
['incoming', 'GAUGE', $5],
['outgoing', 'GAUGE', $6] ],
[ 'eth2',
['incoming', 'GAUGE', $8],
['outgoing', 'GAUGE', $9] ];
I wanted to create three tables (eth0, eth1, eth2) with two columns (incoming, outgoing) and then try to present them nicely. Usually my rrd files get created automatically, but for this check neither the folder with the workstation's name in the rrd folder nor the .rrd files get created. I have the feeling that it has something to do with the map file; maybe the matching is not working (I'm saying this because I don't know Perl). Any suggestion is appreciated. Thank you
You can try this regex:
/traffmon eth0:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) eth1:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) eth2:incoming:(\d+):outgoing:(\d+)/
You can test it on rubular: http://rubular.com/r/vj7VXwDPPU
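The pattern can also be checked quickly outside Nagios. Below is a minimal sketch in Python, assuming the check output is available as a string (shortened here to the part before the pipe). Note that the captures drop the KB suffix, so the six groups are plain digit strings.

```python
import re

output = ("OK - traffmon eth0:incoming:170KB:outgoing:1606KB "
          "eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0")

pattern = re.compile(
    r"traffmon eth0:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) "
    r"eth1:incoming:(\d+)(?:KB):outgoing:(\d+)(?:KB) "
    r"eth2:incoming:(\d+):outgoing:(\d+)"
)

match = pattern.search(output)
print(match.groups())  # ('170', '1606', '1576', '170', '156', '0')
```

As written, the pattern requires KB for eth0 and eth1 and no unit for eth2, so output where the units differ would not match; writing (?:KB)? would make the suffix optional.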
I'm not familiar with how your Nagios system works, but if there is room for more Perl code, you could also do something like:
use strict;
use warnings;
use Data::Dumper;
# Sample check output, as shown in the question
my $res = 'OK - traffmon eth0:incoming:170KB:outgoing:1606KB eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0|lo;incoming;25;outgoing;25 tunl0;incoming;0;outgoing;0 gre0;incoming;0;outgoing;0 sit0;incoming;0;outgoing;0 eth0;incoming;170KB;outgoing;1606KB eth1;incoming;1576KB;outgoing;170KB eth2;incoming;156;outgoing;0';
my @s;
# Split on space or pipe, keep indexes 3..5 (eth0, eth1, eth2), then split each entry on ':'
push @s, map {
my @f = split /:/;
[ $f[0], [$f[1], 'GAUGE', $f[2] ], [$f[3], 'GAUGE', $f[4]] ]
} (split(/ |\|/, $res))[3..5];
print Dumper @s;
This splits the string at a space or a pipe |, takes the elements at indexes 3 to 5 (which are the first three interfaces) and then loops over them. Each one is split on the colon : and turned into your data structure, which is returned for each interface. The returned data structures are pushed into @s.
Output:
$VAR1 = [
'eth0',
[
'incoming',
'GAUGE',
'170KB'
],
[
'outgoing',
'GAUGE',
'1606KB'
]
];
$VAR2 = [
'eth1',
[
'incoming',
'GAUGE',
'1576KB'
],
[
'outgoing',
'GAUGE',
'170KB'
]
];
$VAR3 = [
'eth2',
[
'incoming',
'GAUGE',
'156'
],
[
'outgoing',
'GAUGE',
'0'
]
];
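For comparison, the same split-and-slice idea can be sketched in Python. This is only an illustration of the technique, not something the Nagiosgraph map file can use directly. The function name parse_interfaces is hypothetical and the sample output is shortened.

```python
import re

output = ("OK - traffmon eth0:incoming:170KB:outgoing:1606KB "
          "eth1:incoming:1576KB:outgoing:170KB eth2:incoming:156:outgoing:0"
          "|lo;incoming;25;outgoing;25 tunl0;incoming;0;outgoing;0")

def parse_interfaces(line):
    """Build [name, [label, 'GAUGE', value], [label, 'GAUGE', value]] per interface."""
    # Split on space or pipe; indexes 3 to 5 hold the eth0, eth1 and eth2 entries.
    fields = re.split(r"[ |]", line)[3:6]
    result = []
    for field in fields:
        name, in_label, in_value, out_label, out_value = field.split(":")
        result.append([name,
                       [in_label, "GAUGE", in_value],
                       [out_label, "GAUGE", out_value]])
    return result

for entry in parse_interfaces(output):
    print(entry)
```

As in the Perl version, the values keep their KB suffix, so they would still need to be reduced to numbers before being stored as GAUGE data.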