Need guidance on a delimiter for a regex

I'm trying to send multiline Kafka logs from RSYSLOG to Fluentd.
(?<date>\[.*?\]) (.*?) ((.|\n*)*)
Here is the link:
https://regex101.com/r/iFHyTi/1
But my regex also pulls the next timestamped entry into the same match. The requirement is to stop before the next timestamp starts.

You can match all subsequent lines that do not start with a [YYYY-MM-DD timestamp:
(?<date>\[[^][]*]) ([A-Z]+) (.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*)
See the regex demo.
Details
(?<date>\[[^][]*]) - Group "date": [, zero or more chars other than square brackets, ]
(space) - a literal space character
([A-Z]+) - Group 2: one or more uppercase ASCII letters
(space) - a literal space character
(.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*) - Group 3:
.* - any zero or more chars other than line break chars, as many as possible
(?:\n(?!\[\d{4}-\d\d-\d\d).*)* - zero or more sequences of
\n(?!\[\d{4}-\d\d-\d\d) - a newline (LF) char not followed by [, four digits, -, two digits, -, two digits
.* - any zero or more chars other than line break chars, as many as possible
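The answer's pattern can be tried outside regex101 with a short Python sketch. It assumes a hypothetical Kafka-style log sample (made up for illustration, not the question's demo data) and Python's re module, which spells the named group (?P<date>...); the class [^][] is written in escaped form as [^\]\[], which matches the same characters. Each match is one event: continuation lines are pulled into group 3 until a line starts with [YYYY-MM-DD.

```python
import re

# The answer's pattern in Python syntax: (?P<date>...) names the group and
# [^\]\[] is the escaped form of [^][] (any char except square brackets).
EVENT = re.compile(
    r"(?P<date>\[[^\]\[]*\]) ([A-Z]+) (.*(?:\n(?!\[\d{4}-\d\d-\d\d).*)*)"
)

# Hypothetical Kafka-style log: the ERROR event spans three physical lines.
LOG = (
    "[2021-03-01 10:00:00,001] INFO Starting broker\n"
    "[2021-03-01 10:00:01,002] ERROR Connection failed\n"
    "java.io.IOException: timeout\n"
    "\tat kafka.network.Processor.run(SocketServer.scala:42)\n"
    "[2021-03-01 10:00:02,003] WARN Retrying"
)


def split_events(text):
    """Return one (date, level, message) tuple per log event."""
    return [(m.group("date"), m.group(2), m.group(3)) for m in EVENT.finditer(text)]


for date, level, message in split_events(LOG):
    print(date, level, repr(message))
```

The sample has no trailing newline on purpose: with one, the last message would end in \n, because an empty final line also passes the negative lookahead.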

Related

Issue in removing a special character from regex to format the phone number

I'm working on formatting phone numbers in an .rdl report. I have the regex below in the expression property of the report field.
Possible user inputs: 720) 352-6511, +1 (469) 292-4242, 310.614.1316, (310) 468-0516
Desired output: (303) 233-2345
My regex = System.Text.RegularExpressions.Regex.Replace(IIf(IsNothing(Fields!Contact_Phone.Value), "", Fields!Contact_Phone.Value), "(\d{3})[ -.]*(\d{3})[ -.]*(\d{4})", "($1) $2-$3")
My output: ((303) 233-2345
There is an extra opening parenthesis at the start, and the other options I tried do not work.
Thanks
Here is a very generic approach to fixing this issue: match any non-digit chars between the possible items in the input text and capture the three parts that look obligatory in the input:
^[^\d+]*(?:\+\D*\d+\D+)?(\d{3})\D*(\d{3})\D*(\d{4})\D*$
See the regex demo. Details:
^ - start of string
[^\d+]* - zero or more chars other than digits and +
(?:\+\D*\d+\D+)? - an optional sequence of +, zero or more non-digits, one or more digits, one or more non-digits
(\d{3}) - Group 1: three digits
\D* - zero or more non-digits
(\d{3}) - Group 2: three digits
\D* - zero or more non-digits
(\d{4}) - Group 3: four digits
\D* - zero or more non-digits
$ - end of string.
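A quick Python check makes the difference visible. In the question's pattern, [ -.] is a character range from space to dot, which happens to include ( and ); the match therefore starts at the first digit, the leading ( stays outside it, and the replacement adds a second one. The answer's pattern is anchored with ^ and $ and consumes all surrounding non-digits, so nothing is left over. This sketch assumes the answer's pattern is used with the question's replacement ($1) $2-$3, which Python writes as (\1) \2-\3; format_phone is a hypothetical helper, not part of the report.

```python
import re

# Question's pattern: unanchored, so text before the first digit is left untouched.
ORIGINAL = r"(\d{3})[ -.]*(\d{3})[ -.]*(\d{4})"
# Answer's pattern: anchored, consuming every non-digit around the three groups.
FIXED = r"^[^\d+]*(?:\+\D*\d+\D+)?(\d{3})\D*(\d{3})\D*(\d{4})\D*$"
# Python spells the .NET replacement "($1) $2-$3" with \1, \2, \3.
REPLACEMENT = r"(\1) \2-\3"


def format_phone(raw, pattern=FIXED):
    """Hypothetical helper: reformat a phone number with the given pattern."""
    return re.sub(pattern, REPLACEMENT, raw)


for raw in ["720) 352-6511", "+1 (469) 292-4242", "310.614.1316", "(310) 468-0516"]:
    print(f"{raw!r} -> {format_phone(raw)!r}")

# Reproduces the question's bug: the leading "(" survives and another is added.
print(format_phone("(303) 233-2345", ORIGINAL))
```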

Regex forward slash separator

I am using the regex below to ensure that a string is at most 50 characters long and that each word starts with an uppercase letter:
reMatch("Jet Black","^(?=.{0,50}$)(^|^([A-Z][a-z]* +)*([A-Z][a-z]* *)$)")
This works, but I would also like the option to separate words with a / character, e.g. Jet/Black, or Jet / Black with spaces around the slash.
Your suggestions are highly appreciated! Mike.
If you do not mind multiple spaces, multiple slashes, or intermingled spaces and slashes, you may use
^(?=.{0,50}$)(?:[A-Z][a-z]*(?:[ /]+[A-Z][a-z]*)*)?$
See the regex demo.
To allow only whitespace and an optional single slash (itself optionally followed by whitespace) between words, use
^(?=.{0,50}$)(?:[A-Z][a-z]*(?:\s*(?:/\s*)?[A-Z][a-z]*)*)?$
See this regex demo.
Details
^ - start of string
(?=.{0,50}$) - string should contain only 0 to 50 chars other than linebreak chars (same as (?!.{51}))
(?:[A-Z][a-z]*(?:\s*(?:/\s*)?[A-Z][a-z]*)*)? - an optional sequence of
[A-Z][a-z]* - an uppercase ASCII letter and 0+ lowercase ASCII letters
(?:\s*(?:/\s*)?[A-Z][a-z]*)* - 0 or more sequences of
\s* - 0+ whitespaces
(?:/\s*)? - an optional / and 0+ whitespaces
[A-Z][a-z]* - an uppercase ASCII letter and 0+ lowercase ASCII letters
$ - end of string.
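The two answer patterns can be compared side by side in a small Python sketch. It uses Python's re module rather than ColdFusion's reMatch, which is an assumption about portability; fullmatch() is used so a trailing newline is rejected, since a bare $ in Python also matches before a final \n. The sample strings are the question's examples plus a few illustrative edge cases.

```python
import re

# Pattern 1: any mix of spaces and slashes between capitalized words.
LOOSE = re.compile(r"^(?=.{0,50}$)(?:[A-Z][a-z]*(?:[ /]+[A-Z][a-z]*)*)?$")
# Pattern 2: optional whitespace, at most one slash, optional whitespace.
STRICT = re.compile(r"^(?=.{0,50}$)(?:[A-Z][a-z]*(?:\s*(?:/\s*)?[A-Z][a-z]*)*)?$")

SAMPLES = ["Jet Black", "Jet/Black", "Jet / Black", "Jet // Black", "jet Black", "A" * 51]

for s in SAMPLES:
    loose = LOOSE.fullmatch(s) is not None
    strict = STRICT.fullmatch(s) is not None
    print(f"{s!r}: loose={loose}, strict={strict}")
```

Because \s* may match nothing, the second pattern also accepts words written together, such as JetBlack; the first pattern requires at least one space or slash between words.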

Regex to match if a word starts and ends with a letter and has no more than one consecutive non-letter (. *')

I'm trying to find a regex for a specific use case and haven't found a way to achieve it. As the title says, I would like to match a word that starts and ends with a letter and contains only letters and these characters: space, *, - and '. It should also have no more than one consecutive non-letter.
I currently have this, but it accepts consecutive non-letters and doesn't accept single letters: [a-zA-Z][a-zA-Z \-*']+[a-zA-Z]
I want my regex to accept this string
This is accepted since it contains only spaces and letters and there is no consecutive space
a should be accepted
This is --- not accepted because it contains 5 consecutive non-letter characters (3 dashes and 2 spaces)
" This is not accepted because it starts with a space"
Neither is this one since it ends with a dash -
You may use
^[a-zA-Z]+(?:[ *'-][a-zA-Z]+)*$
See the regex demo.
Details
^ - start of string anchor
[a-zA-Z]+ - 1+ ASCII letters
(?:[ *'-][a-zA-Z]+)* - 0 or more sequences of:
[ *'-] - a space, *, ' or -
[a-zA-Z]+ - 1+ ASCII letters
$ - end of string anchor.
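The answer's pattern can be checked against the question's own examples with a short Python sketch (is_valid is a hypothetical helper name). fullmatch() is used because a bare $ in Python would also accept a string ending in a newline.

```python
import re

# The answer's pattern: letter runs separated by single space, *, ' or - chars.
WORD = re.compile(r"^[a-zA-Z]+(?:[ *'-][a-zA-Z]+)*$")


def is_valid(text):
    """True if the whole string satisfies the pattern (no trailing newline)."""
    return WORD.fullmatch(text) is not None


SAMPLES = {
    "I want my regex to accept this string": True,
    "a": True,
    "This is --- not accepted": False,
    " This is not accepted because it starts with a space": False,
    "Neither is this one since it ends with a dash -": False,
}

for text, expected in SAMPLES.items():
    print(f"{is_valid(text)!s:5} (expected {expected}) {text!r}")
```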

checking if one expression contains the next expression in regex

I want my regex to allow alphanumeric characters, "/_-" and whitespace in between, but it must always have at least one alphanumeric character.
My validation looks like this:
/^([A-Za-z0-9/-]+[A-Za-z0-9/-\s]*[A-Za-z0-9/_-]+)$/
It should accept ABC_1-2-3, but it must not allow 123 or -_/ alone.
Can somebody help me, please?
The regex below captures a run of letters (or *), followed by optional whitespace and an optional run of digits, hyphens or underscores. Try it.
([*A-Za-z]+(\s+)?([\d\-_]+)?)
Your regex is almost right; you need to add two positive lookaheads at the start to require at least one letter and at least one digit:
/^(?=.*[a-z])(?=.*\d)[a-z0-9\/_-][a-z0-9\/_\s-]*[a-z0-9\/_-]$/i
See the regex demo (in the demo, \s is replaced with a space since the demo is multiline).
Details:
^ - start of string
(?=.*[a-z]) - after any 0+ chars other than line break chars, there must be at least 1 letter (replace .* with [^a-z]* for better performance)
(?=.*\d) - after any 0+ chars other than line break chars, there must be at least 1 digit (replace .* with \D* for better performance)
[a-z0-9\/_-] - a letter, digit, /, _ or -
[a-z0-9\/_\s-]* - 0+ letters, digits, /, whitespaces, _ or -
[a-z0-9\/_-] - a letter, digit, /, _ or -
$ - end of string.
The i modifier makes the pattern case insensitive.
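The answer's JavaScript literal translates directly to Python, which gives a quick way to test it. In this sketch (an assumption, not the asker's code) the \/ escapes are dropped because / needs no escaping outside a JS regex literal, the i modifier becomes re.IGNORECASE, and fullmatch() stands in for JavaScript's strict end-of-string $. Note that, because of the two lookaheads, a string with letters but no digit, such as ABC, is rejected as well.

```python
import re

# Answer's pattern: one letter and one digit required (lookaheads), then
# first/last chars from [a-z0-9/_-] and whitespace allowed only in between.
VALIDATOR = re.compile(
    r"^(?=.*[a-z])(?=.*\d)[a-z0-9/_-][a-z0-9/_\s-]*[a-z0-9/_-]$",
    re.IGNORECASE,
)


def is_valid(value):
    """Hypothetical helper: True if the whole value passes the validator."""
    return VALIDATOR.fullmatch(value) is not None


for value in ["ABC_1-2-3", "123", "-_/", "ABC", "Path 1/a"]:
    print(f"{value!r}: {is_valid(value)}")
```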

Regex help for Event Match that are unique, though the pattern is same

Here is my regex: https://regex101.com/r/g56UzY/1
I have this input:
pdlvkw6v INFO 18:25:03.994 pdlvkw6v WARN 18:25:03.994 pdlvkw6v INFO
18:25:03.994 rg9n9bz7 INFO 18:23:52.987 rg9n9bz7 ERROR 19:23:52.987
rg9n9bz7 INFO 21:23:52.987 5y6n9bz7 WARN 18:23:52.987
My current regex is: [\w]{8}\s+(INFO|WARN|ERROR)\s+\d\d:\d\d:\d\d\.\d\d\d
I want the regex to match only the first occurrence of each unique string, i.e. show pdlvkw6v, then rg9n9bz7 and then 5y6n9bz7; it should not match the repeated strings.
What I am trying to do is break a multiline log into events based on this fixed string. Since one event can contain several of these strings, I want to break it at the first matching string and leave the rest in the event.
You need to capture the word you are interested in and add a negative lookahead check:
(?s)\b(\w{8})\b(?!.*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}
The part added to your regex is \b(\w{8})\b(?!.*\b\1\b).
Or, if the (?s) modifier is not supported:
\b(\w{8})\b(?![\s\S]*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}
See the regex demo
Explanation:
(?s) - a DOTALL modifier making . match any char
\b - a word boundary
(\w{8}) - Group 1: 8 word chars
\b - a word boundary
(?!.*\b\1\b) - the negative lookahead that fails the match if immediately to the right of the current location, after 0+ chars, there is a whole word equal to the one stored in the Group 1 buffer
\s+ - 1+ whitespaces
(?:INFO|WARN|ERROR) - one of the three substrings
\s+ - 1+ whitespaces
\d\d - 2 digits
(?::\d\d){2} - 2 sequences of :, digit, digit
\. - a dot
\d{3} - three digits
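Both variants of the answer's pattern can be run on the question's sample in Python, as a sketch assuming Python's re module behaves like the regex101 flavor here. One detail worth seeing: the negative lookahead rejects an ID whenever the same ID appears later, so the match kept for each ID is its last occurrence, not its first. For this sample the IDs still come out in the requested order.

```python
import re

# Answer's pattern with the DOTALL modifier so .* in the lookahead crosses newlines.
UNIQUE_EVENT = re.compile(
    r"(?s)\b(\w{8})\b(?!.*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}"
)
# Variant for engines without (?s): [\s\S] matches any char, newlines included.
NO_DOTALL = re.compile(
    r"\b(\w{8})\b(?![\s\S]*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}"
)

# Sample input copied from the question (three lines).
TEXT = (
    "pdlvkw6v INFO 18:25:03.994 pdlvkw6v WARN 18:25:03.994 pdlvkw6v INFO\n"
    "18:25:03.994 rg9n9bz7 INFO 18:23:52.987 rg9n9bz7 ERROR 19:23:52.987\n"
    "rg9n9bz7 INFO 21:23:52.987 5y6n9bz7 WARN 18:23:52.987"
)

for m in UNIQUE_EVENT.finditer(TEXT):
    # m.start() shows which occurrence of the ID was kept (the last one).
    print(m.group(1), m.start(), repr(m.group(0)))
```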