Regex Pattern for a Java Log


I am trying to use the regex Parser Plugin in fluentd to index the logs of my application.
Here's a snippet of it.
2020-05-06T22:34:50.860-0700 - WARN [main] o.s.b.GenericTypeAwarePropertyDescriptor: Invalid JavaBean property 'pipeline' being accessed! Ambiguous write methods found next to actually used [public void com.theoaal.module.pipeline.mbean.DynamicPhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theplatform.module.pipeline.DynamicPipeline)]: [public void com.theplatform.module.pipeline.mbean.PhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theoaal.module.pipeline.Pipeline)]
I have used regex101.com to test the regex pattern, but I am not able to get a match.
^(?<date>\d{4}\-\d{2}\-\d{2})(?<timestamp>[A-Z][a-z]{1}\d{2}:\d{2}:\d{2}.\d{3}\-\d{4})\s\-\s(?<loglevel>\[\w\]{6})\s+(?<class>\[[A-Z][a-z]+\])\s(?<message>.*)$
Kindly help.
Thanks

You may use
^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)
See the regex demo
Note that in your pattern, \[\w\]{6} only matches [, a single word char and six ] chars, while the log level WARN has no brackets at all. In the timestamp pattern, [A-Z][a-z]{1} requires two letters (an uppercase one followed by a lowercase one), but there is a single T. Your "class" pattern requires a capitalized word with [A-Z][a-z]+, but main is all lowercase. You also escape - outside of character classes unnecessarily, and you failed to escape a literal dot in the pattern.
Details
^ - start of string
(?<date>\d{4}-\d{2}-\d{2}) - date: 4 digits, -, 2 digits, -, 2 digits
[A-Z] - an uppercase ASCII letter
(?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4}) - 2 digits, :, 2 digits, :, 2 digits, ., 3 digits, - and 4 digits
\s+-\s+ - a - enclosed with 1+ whitespaces on both sides
(?<loglevel>\w+) - 1+ word chars
\s+ - 1+ whitespaces
(?<class>\[\w+\]) - [, 1+ word chars, ]
\s+ - 1+ whitespaces
(?<message>.*) - the rest of the line.
Copy and paste into fluent.conf or td-agent.conf:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)/
</source>
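A quick way to check the suggested pattern outside fluentd is a small Python script. This is only a sketch, not fluentd's own parser: Python's re module writes named groups as (?P<name>...) instead of (?<name>...), so the group syntax is adapted, and the sample line is the one from the question, shortened. The asker's original pattern, adapted the same way, is included to show that it fails.

```python
import re

# The suggested pattern, with named groups in Python's (?P<name>...) syntax.
FIXED = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})[A-Z]"
    r"(?P<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+"
    r"(?P<loglevel>\w+)\s+(?P<class>\[\w+\])\s+(?P<message>.*)"
)

# The pattern from the question, adapted the same way.
ORIGINAL = re.compile(
    r"^(?P<date>\d{4}\-\d{2}\-\d{2})"
    r"(?P<timestamp>[A-Z][a-z]{1}\d{2}:\d{2}:\d{2}.\d{3}\-\d{4})\s\-\s"
    r"(?P<loglevel>\[\w\]{6})\s+(?P<class>\[[A-Z][a-z]+\])\s(?P<message>.*)$"
)

LINE = ("2020-05-06T22:34:50.860-0700 - WARN [main] "
        "o.s.b.GenericTypeAwarePropertyDescriptor: "
        "Invalid JavaBean property 'pipeline' being accessed!")

fixed_match = FIXED.match(LINE)
original_match = ORIGINAL.match(LINE)

print(fixed_match.groupdict() if fixed_match else None)
print(original_match)  # None: [A-Z][a-z] cannot match "T2"
```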

Related

Regex exclude trailing text from company names

CURRENTLY
I am trying to match valid company names from strings with 4 conditions:
the name can ONLY contain alphanumeric characters + spaces + hyphens
the name can contain a hyphen (inside the name)
there are company suffixes that should be excluded from the company name i.e. Pty Ltd, Pty. Ltd., Limited, and Ltd.
If there are additional matches on the same line, these are to be excluded
What I am trying to achieve:
My regex so far:
(?:\s|^)([a-zA-Z0-9]+[a-zA-Z0-9\s-]*?[a-zA-Z0-9]+)(?: Pty Ltd| Ltd(\.){0,1}| Limited){0,1}(?:\s|$)
ISSUES
https://regex101.com/r/Gpbdln/4
It seems I am struggling with:
Excluding the suffixes to be ignored
Making the capture include spaces for the company name (while at the same time excluding the suffixes)
I have been stuck on this for over an hour and would appreciate some help.

You may use
^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)
See the regex demo
If you only need matches that do not span across lines, replace \s with \h or [\p{Zs}\t] (if supported), or with [^\S\r\n], so that only horizontal whitespace is matched.
Details
^ - start of string
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?:[\s-]+[a-zA-Z0-9]+)*? - 0 or more (but as few as possible) occurrences of
[\s-]+ - 1+ whitespaces or hyphens
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$) - immediately to the right, there must be
(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)? - an optional occurrence of a sequence of patterns:
\s+ - 1+ whitespaces
(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]) - any of
(?:Pty\.?\s+)?Ltd\.?| - an optional sequence of Pty, an optional dot and 1+ whitespaces, followed by the Ltd string and an optional . char, or
Limited| - Limited string, or
[a-zA-Z0-9]*[^a-zA-Z0-9\s] - 0 or more ASCII alphanumeric chars followed by a char that is neither whitespace nor an alphanumeric char
.* - the rest of the string
$ - end of string.
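The behaviour can be checked with a short Python sketch. The sample names below are made up for illustration (the question's own samples are not shown here), and the pattern is applied to one line at a time with re.match, which sidesteps the issue of \s spanning lines. company_name is a hypothetical helper.

```python
import re

COMPANY = re.compile(
    r"^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?"
    r"(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)"
)

def company_name(line):
    """Return the company name at the start of a single line, or None."""
    m = COMPANY.match(line)
    return m.group(0) if m else None

# Made-up samples: suffixes are dropped, hyphens kept, trailing text ignored.
for sample in ["Acme Pty Ltd", "Big Company Pty. Ltd.", "Foo-Bar Limited",
               "Hello World", "Acme Ltd, Other Co Ltd"]:
    print(repr(sample), "->", repr(company_name(sample)))
```

The lazy *? is what stops the name as soon as the lookahead sees a suffix (or the end of the line), so the suffix is never part of the match.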

How to get only the first match of a regex Grok filter

Goal
I want to retrieve only the string "14" from this message with a Logstash Grok filter:
3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00
This is my Grok code:
grok{
match => {
field1 => [
"(?<number_extract>\d{0}\s\d{1,3}\s{1})"
]
}
}
I would like to get just the first match, "14", but my Grok filter returns all matches:
14 20 50

If you need to find the first occurrence of a number that consists of 1, 2 or 3 digits only, you may use
^(?:.*?\s)?(?<number_extract>\d{1,3})(?!\S)
Details
^ - start of string
(?:.*?\s)? - an optional sequence of any 0+ chars other than line break chars, as few as possible, followed by a whitespace (the group is optional so that a number at the very start of the string can match too)
(?<number_extract>\d{1,3}) - 1 to 3 digits
(?!\S) - a negative lookahead that makes sure there is a whitespace or end of string immediately to the right (enables a match at the end of the string).
Alternative solution
If you know that the number you are looking for comes after a date-like field and one more field, and you want to enforce this pre-validation, you may use
^\d+/\d+/\d+\s+\S+\s+(?<number_extract>\d+)
See the regex demo
If you do not have to check if the first field is date-like, you may simply use
^\S+\s+\S+\s+(?<number_extract>\d+)
^(?:\S+\s+){2}(?<number_extract>\d+) // Equivalent
See the regex demo here.
Details
^ - start of string
\d+/\d+/\d+ - 1+ digits, /, 1+ digits, /, 1+ digits
\s+ - 1+ whitespaces
\S+ - 1+ chars other than whitespace
\s+ - 1+ whitespaces
(?<number_extract>\d+) - Capturing group "number_extract": 1+ digits.
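To confirm that each of these patterns yields only the first number, here is a Python sketch run against the sample message. It uses Python's re module rather than Logstash, so named groups are written (?P<name>...), and it relies on search() returning only the first match.

```python
import re

MESSAGE = "3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00"

PATTERNS = {
    # First standalone number of 1 to 3 digits.
    "first_short_number": re.compile(r"^(?:.*?\s)?(?P<number_extract>\d{1,3})(?!\S)"),
    # Number right after a date-like field and one more field.
    "after_date": re.compile(r"^\d+/\d+/\d+\s+\S+\s+(?P<number_extract>\d+)"),
    # Third whitespace-separated field, no check on the first field.
    "third_field": re.compile(r"^(?:\S+\s+){2}(?P<number_extract>\d+)"),
}

results = {}
for name, pattern in PATTERNS.items():
    m = pattern.search(MESSAGE)  # search() stops at the first match
    results[name] = m.group("number_extract") if m else None
    print(name, results[name])
```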

Regex to check Optional Group of numbers

I am trying to create a regex that accepts the following strings:
proj_asdasd_000.gz.xml
proj_asdasd.gz.xml
Basically, the second underscore is optional, and if any value follows it, it should only be an integer.
The following is the regex I am trying:
^proj([a-zA-z0-9]?)+_[a-zA-z]+(_[0-9]?)+\.[a-z]+.[a-z]
Any suggestion to make it accept the above mentioned strings?

You may use
^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?\.[a-z]+\.[a-z]+$
^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?(?:\.[a-z]+){2}$
See the regex demo
Details
^ - start of string
proj - a literal substring
[a-zA-Z0-9]* - 0 or more alphanumeric chars
_ - a _ char
[a-zA-Z]+ - 1+ ASCII letters
(?:_[0-9]+)? - an optional sequence of an underscore followed with 1+ digits
\.[a-z]+\.[a-z]+ = (?:\.[a-z]+){2} - two occurrences of . and 1+ lowercase ASCII letters
$ - end of string.
Notes:
[A-z] matches more than just ASCII letters: the range also includes [, \, ], ^, _ and `
([a-zA-z0-9]?)+ matches an optional character 1 or more times, which makes little sense. Either match a char 1 or more times with + or 0 or more times with *; no parentheses are needed.
(_[0-9]?)+ matches 1 or more sequences of _ followed by a single optional digit (so, it matches _9___1_, for example). The quantifiers must be swapped to match an optional sequence of _ and 1+ digits.
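The answer's pattern can be tried against the question's two strings plus two made-up negative cases in a short Python sketch; the last line checks the [A-z] note, since _ falls between Z and a in ASCII.

```python
import re

# The answer's second form: {2} repeats the ".ext" part twice.
FILE_NAME = re.compile(r"^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?(?:\.[a-z]+){2}$")

SAMPLES = [
    "proj_asdasd_000.gz.xml",  # from the question: digits after the 2nd underscore
    "proj_asdasd.gz.xml",      # from the question: no 2nd underscore
    "proj_asdasd_abc.gz.xml",  # made up: letters after the 2nd underscore
    "proj_asdasd_.gz.xml",     # made up: 2nd underscore with nothing after it
]

results = {name: FILE_NAME.match(name) is not None for name in SAMPLES}
for name, ok in results.items():
    print(f"{name}: {ok}")

# [A-z] also covers the ASCII chars between Z and a, such as "_".
print(re.fullmatch(r"[A-z]", "_") is not None)
```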

Greedy regex quantifier not matching password criteria

/(^[a-zA-Z]+-?[a-zA-Z0-9]+){5,15}$/g
regex criteria
match length must be between 6 and 16 characters inclusive
must start with a letter only
must contain letters, numbers and one optional hyphen
must not end with a hyphen
The above regular expression doesn't satisfy all 4 conditions. I tried moving the ^ before the group and omitting the + quantifiers, but it doesn't work.

You are setting the limiting quantifier on a group that already has quantified subpatterns, so {5,15} limits how many times the group repeats, not how many characters are matched, and the length restriction won't work.
To set the length restriction, add the (?=.{6,16}$) lookahead after ^ and then feel free to set your consuming pattern.
You may use
/^(?=.{6,16}$)[a-zA-Z][a-zA-Z0-9]*(?:-[a-zA-Z0-9]+)?$/
See the regex demo. Note that you should not use the g modifier when validating a whole input string against a regex.
Details
^ - start of string
(?=.{6,16}$) - the whole input must be 6 to 16 chars long
[a-zA-Z] - a letter as the first char
[a-zA-Z0-9]* - 0+ alphanumeric chars
(?:-[a-zA-Z0-9]+)? - an optional sequence of - and then 1+ alphanumeric chars
$ - end of string.
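Here is a Python sketch of this answer with a few made-up test strings. It also runs the question's original pattern to show the effect of quantifying the group: because ^ sits inside a group that must repeat at least 5 times, and ^ can only match at the start of the input, the original pattern never matches anything.

```python
import re

# The lookahead checks the total length; the rest checks the shape.
PASSWORD = re.compile(r"^(?=.{6,16}$)[a-zA-Z][a-zA-Z0-9]*(?:-[a-zA-Z0-9]+)?$")

# Question's pattern: {5,15} repeats a group that starts with ^.
ORIGINAL = re.compile(r"(^[a-zA-Z]+-?[a-zA-Z0-9]+){5,15}$")

SAMPLES = ["abc123", "abc-123", "abcdefghijklmnop",   # valid
           "abc12", "abcdefghijklmnopq",               # too short / too long
           "1abc123", "abcde-", "ab-cd-ef"]            # bad start / end / 2 hyphens

results = {s: PASSWORD.match(s) is not None for s in SAMPLES}
for s, ok in results.items():
    print(f"{s}: {ok}")

# The original never matches: ^ cannot match again after the first repetition.
print(any(ORIGINAL.search(s) for s in SAMPLES))  # False
```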

All you need
^(?i)(?=.{6,16}$)(?!.*-.*-)[a-z][a-z\d-]*\d[a-z\d-]*(?<!-)$
Readable
^
(?i)
(?= .{6,16} $ ) # 6 - 16 chars
(?! .* - .* - ) # Not 2 dashes
[a-z] # Start letter
[a-z\d-]* # Optional letters, digits, dashes
\d # Must be digit
[a-z\d-]* # Optional letters, digits, dashes
(?<! - ) # Not end in dash
$
Well, at least my regex forces a number to be present.
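The readable version maps directly onto Python's verbose mode. In this sketch, (?i) is passed as re.IGNORECASE instead, because Python 3.11 and later reject global inline flags that are not at the very start of the pattern; the test strings are made up.

```python
import re

STRICT = re.compile(r"""
    ^
    (?= .{6,16} $ )   # 6 - 16 chars
    (?! .* - .* - )   # not 2 dashes
    [a-z]             # start letter
    [a-z\d-]*         # optional letters, digits, dashes
    \d                # must be digit
    [a-z\d-]*         # optional letters, digits, dashes
    (?<! - )          # not end in dash
    $
""", re.VERBOSE | re.IGNORECASE)

SAMPLES = ["abc123", "ABC-123",   # valid (case-insensitive)
           "abcdef",              # no digit
           "abc-1-2",             # two dashes
           "abc12-",              # ends in a dash
           "abc12"]               # too short

results = {s: STRICT.match(s) is not None for s in SAMPLES}
for s, ok in results.items():
    print(f"{s}: {ok}")
```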

Nested regex replacement

I need to create the Laravel migrations, so I have converted my SQL script to the Laravel migration format using "replacement in files" with regular expressions in Sublime Text.
My problem is that I have to replace the '#' character in the following string with the 'tablename', in about 70 tables:
Schema::table('tablename', function($table) {
$table->dropForeign('#_columnname_foreign');
});
Currently, I can do this using the following expression:
(Schema::table\('([a-z]+)',[\s]*function\(\$table\)[\s]*{[\s]*\$table->dropForeign\(')#(_[a-z_]+'\);)
And in the replace field:
$1$2$3
But I don't know how to do it when the table has more than one foreign key:
Schema::table('tablename1', function($table) {
$table->dropForeign('#_field1_foreign');
$table->dropForeign('#_field2_foreign');
$table->dropForeign('#_field3_foreign');
$table->dropForeign('#_field4_foreign');
$table->dropForeign('#_field5_foreign');
$table->dropForeign('#_field6_foreign');
});
I have been using RegExr to validate my regular expressions.

It is not an easy task for a regex in Sublime Text. The only way to do it with a regex is to capture the function signature together with any number of already-processed $table->dropForeign lines (matched lazily), and replace the # on the next line.
The regex below requires clicking Replace All multiple times: each click replaces one # per Schema::table block, so repeat it until no # is left.
(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*{(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?\s*\$table->dropForeign\(')#(_\w+'\);)
Replacement is $1$2$3. See this regex demo, where you may replace the # in the second block manually with the table name and see how the match goes further.
Details:
(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*{(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?\s*\$table->dropForeign\(') - Group 1 capturing:
Schema::table\(' - literal Schema::table(' substring
([a-z0-9]+) - Group 2 capturing 1+ alphanumerics (do not check Match Case option to also match uppercase ASCII letters)
',\s* - a ' char, a comma and 0+ whitespaces
function\(\$table\) - a literal text function($table)
\s* - 0+ whitespaces
{ - a literal { (in Sublime Text 2, it requires escaping)
(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*? - 0+ sequences, but as few as possible, matching:
\s*\$table->dropForeign\(' - 0+ whitespaces and then a literal text $table->dropForeign('
[a-z0-9]+_\w+ - 1+ alphanumerics, _ and 1+ digits, letters or underscores (\w+)
'\); - a literal substring ');
\s* - 0+ whitespaces
\$table->dropForeign\(' - a literal text $table->dropForeign('
# - a matched # symbol to be replaced
(_\w+'\);) - Group 3 capturing:
_ - an underscore
\w+ - 1 or more letters, digits or underscores
'\); - a literal substring ');
NOTE: The issue I thought I found was related to an unescaped { that causes a regex failure in Sublime Text 2. In Sublime Text 3, the { in the regex does not have to be escaped.
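The repeated Replace All can be simulated in Python with re.subn, looping until a pass makes no replacements. This is a sketch under a few assumptions: fill_table_names is a hypothetical helper, the users block in the sample is made up, Python writes the $1$2$3 backreferences as \g<1>\g<2>\g<3>, { is escaped for clarity, and Python's re is case-sensitive by default.

```python
import re

# Group 1: everything up to the quote before #, Group 2: the table name,
# Group 3: the rest of the dropForeign line after #.
PATTERN = re.compile(
    r"(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*\{"
    r"(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?"
    r"\s*\$table->dropForeign\(')#(_\w+'\);)"
)

def fill_table_names(text):
    """Repeat 'Replace All' until nothing matches; return (text, passes)."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\g<1>\g<2>\g<3>", text)
        if count == 0:
            return text, passes
        passes += 1

SCRIPT = """Schema::table('tablename1', function($table) {
    $table->dropForeign('#_field1_foreign');
    $table->dropForeign('#_field2_foreign');
    $table->dropForeign('#_field3_foreign');
});
Schema::table('users', function($table) {
    $table->dropForeign('#_role_id_foreign');
});"""

result, passes = fill_table_names(SCRIPT)
print(result)
print("passes:", passes)
```

Each pass fixes only the first remaining # in every block, so the block with three foreign keys needs three passes, while the one-key block is done after the first.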
