How to capture a multiline string having a match on each line? - regex

I have a multiline text field and need to test if each line matches a pattern.
The field might look like this:
1xABCD
9xDEFGHIJK
7xAJDKSLD
2xA
The pattern is this: \dx\w.*
The number of lines is from 1 to X.
I was trying ^\d+x\w.*${1,} or \d+x\w.*\r\n{1,}
Thank you

You may use
^\d+x\w+(?:\r?\n\d+x\w+)*$
Details
^ - start of string
\d+x\w+ - 1+ digits, x and then 1+ word chars (letters, digits or _)
(?:\r?\n\d+x\w+)* - a non-capturing group ((?:...)) that matches 0 or more (*) occurrences of:
\r?\n - an optional CR and an LF symbol
\d+x\w+ - 1+ digits, x and then 1+ word chars (letters, digits or _)
$ - end of string.
See the regex demo (note that the text pasted into regex101.com has LF-only line endings).
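As a rough illustration of the pattern above, here is a minimal JavaScript sketch. The helper name everyLineMatches and the sample strings are made up for this example. It assumes the regex is used without the m flag, so ^ and $ anchor the whole string rather than individual lines.

```javascript
// Validate that every line of a multiline field looks like "<digits>x<word chars>".
// No `m` flag: ^ and $ match only at the start and end of the whole string.
const allLinesPattern = /^\d+x\w+(?:\r?\n\d+x\w+)*$/;

// Hypothetical helper name, used only for this illustration.
function everyLineMatches(text) {
  return allLinesPattern.test(text);
}

const good = "1xABCD\n9xDEFGHIJK\n7xAJDKSLD\n2xA";
const goodCrlf = "1xABCD\r\n2xA";
const bad = "1xABCD\nnot a match\n2xA";

console.log(everyLineMatches(good));     // true
console.log(everyLineMatches(goodCrlf)); // true
console.log(everyLineMatches(bad));      // false
```

Because the whole field is checked in one pass, a single bad line makes the entire test fail, which is what the question asks for.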

Related

Regex to match the letter string group between 2 numbers

Is it possible to match only the letter from the following string?
RO41 RNCB 0089 0957 6044 0001 FPS21098343
What I want: FPS
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
You can try with this:
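// Avoid naming a variable `String`: it shadows the built-in String constructor.
var input = "0258 6044 0001 FPS21098343";
// One or more 4-digit groups, then letters (Group 1), then trailing digits.
var re = /^(?:\d{4} )+ *([a-zA-Z]+)(?:\d+)$/;
var match = re.exec(input);
console.log(match);
// exec() returns null when nothing matches, so guard before indexing.
if (match) console.log(match[1]); // "FPS"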
You can match up to the first run of one or more letters in any of the following ways:
^[^a-zA-Z]*([A-Za-z]+)
^.*?([A-Za-z]+)
^[\w\W]*?([A-Za-z]+)
(?s)^.*?([A-Za-z]+)
If the tool treats ^ as the start of a line, replace it with \A that always matches the start of string.
The point is to match
^ / \A - start of string
[^a-zA-Z]* - zero or more chars other than letters
([A-Za-z]+) - capture one or more letters into Group 1.
The .*? part matches any text (as short as possible) before the subsequent pattern(s). (?s) makes . match line break chars.
Replace A-Za-z in all the patterns with \p{L} to match any Unicode letters. Also, note that [^\p{L}] = \P{L}.
To grab every run of consecutive letters anywhere in the string, you can simply use:
([a-zA-Z]+)
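The patterns above could be tried out in JavaScript roughly like this. This is only a sketch: the variable names and the Cyrillic sample string are invented for the example, and \A and the inline (?s) modifier are left out because JavaScript does not support them.

```javascript
const input = "0258 6044 0001 FPS21098343";

// Skip any leading non-letters, then capture the first run of ASCII letters.
const firstLetters = /^[^a-zA-Z]*([A-Za-z]+)/;
const m = firstLetters.exec(input);
console.log(m ? m[1] : null); // "FPS"

// Unicode-aware variant: \p{L} and \P{L} need the `u` flag in JavaScript.
const firstUnicodeLetters = /^\P{L}*(\p{L}+)/u;
const u = firstUnicodeLetters.exec("42 Привет99");
console.log(u ? u[1] : null); // "Привет"

// All runs of ASCII letters anywhere in the string (global flag).
const allRuns = "RO41 RNCB 0089 FPS21098343".match(/[a-zA-Z]+/g);
console.log(allRuns); // ["RO", "RNCB", "FPS"]
```

Note that the start-anchored patterns return the first letters in the string, so on an input that begins with letters (such as "RO41 ...") they return "RO", not "FPS".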
You could use a capture group to get FPS:
\b[0-9]{4}\s+\S+\s+([A-Z]+)
The pattern matches:
\b[0-9]{4} A word boundary to prevent a partial match, followed by 4 digits
\s+\S+\s+ Match 1+ non-whitespace chars between runs of whitespace chars
([A-Z]+) Capture group 1, match 1+ chars A-Z
Regex demo
If the chars have to be followed by digits till the end of the string, you can add \d+$ to the pattern:
\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$
Regex demo
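For illustration, both capture-group patterns can be run against the string from the question. This is a sketch in JavaScript; the variable names are made up for the example.

```javascript
const input = "RO41 RNCB 0089 0957 6044 0001 FPS21098343";

// Four digits, one whitespace-delimited token, then uppercase letters in Group 1.
const lettersAfterDigits = /\b[0-9]{4}\s+\S+\s+([A-Z]+)/;
// Same, but the letters must be followed by digits up to the end of the string.
const anchored = /\b[0-9]{4}\s+\S+\s+([A-Z]+)\d+$/;

const a = lettersAfterDigits.exec(input);
const b = anchored.exec(input);
console.log(a && a[1]); // "FPS"
console.log(b && b[1]); // "FPS"
```

The engine tries "0089" and "0957" as the four-digit start first, fails because the token two fields later is not made of letters, and only succeeds when it starts at "6044".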

Regex exclude whitespaces from a group to select only a number

I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
Update:
I have a problem with this method: as a rule, I only need to consider digits and ',' between '- EUR' and 'Fee'.
You can use
- EUR\W*(.*?)\W*Fee
See the regex demo.
Variations of the regex that might work in different regex engines:
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - literal text.
You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
Regex demo
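The question does not show the actual input, so the sample text below is invented. Under that assumption, a minimal JavaScript sketch of the two capture-group patterns might look like this. It skips the \K variant, which JavaScript does not support.

```javascript
// Invented sample text; the question does not show the real input.
const text = "Payment - EUR  1234,56  Fee";

// Variant 1: capture whatever sits between "- EUR" and "Fee",
// minus the surrounding non-word characters.
const loose = /- EUR\W*(.*?)\W*Fee/;
// Variant 2: capture only a number with an optional ",digits" part.
const strict = /- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b/;

const looseMatch = loose.exec(text);
const strictMatch = strict.exec(text);
console.log(looseMatch && looseMatch[1]);   // "1234,56"
console.log(strictMatch && strictMatch[1]); // "1234,56"

// Convert the comma-decimal string to a JavaScript number.
const amount = strictMatch ? parseFloat(strictMatch[1].replace(",", ".")) : NaN;
console.log(amount); // 1234.56
```

The second variant is stricter: if anything other than a number sits between the two markers, it fails instead of returning arbitrary text.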
This is working; I removed the , from (.) in the test string.
Regex example - working

How to get only the first match of a regex Grok filter

goal
I want to retrieve only the string "14" from this message with a Logstash Grok filter:
3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00
this is my grok code
grok {
  match => {
    field1 => [
      "(?<number_extract>\d{0}\s\d{1,3}\s{1})"
    ]
  }
}
I would like to match just the first match "14" but my Grok filter returns all matches:
14 20 50
If you need to find the first occurrence of a number that consists of 1, 2 or 3 digits only, you may use
^(?:.*?\s)?(?<number_extract>\d{1,3})(?!\S)
Details
^ - start of string
(?:.*?\s)? - an optional substring of any 0+ chars other than line break chars as few as possible, and then a whitespace (this enables a match at the start of the string if it is there)
(?<number_extract>\d{1,3}) - 1 to 3 digits
(?!\S) - a negative lookahead that makes sure there is a whitespace or end of string immediately to the right (enables a match at the end of the string).
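Grok patterns are evaluated by Logstash, but the same regex can be checked quickly in JavaScript, which accepts the (?<name>...) named-group syntax unchanged. The sketch below assumes that the two engines behave the same for this particular pattern. The variable names and the second sample string are invented.

```javascript
const message = "3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00";

// Optional lazy prefix ending in whitespace, then 1 to 3 digits that are
// followed by whitespace or the end of the string.
const firstShortNumber = /^(?:.*?\s)?(?<number_extract>\d{1,3})(?!\S)/;

const match = firstShortNumber.exec(message);
console.log(match ? match.groups.number_extract : null); // "14"

// The optional prefix also lets the number sit at the very start.
const atStart = firstShortNumber.exec("7 apples");
console.log(atStart ? atStart.groups.number_extract : null); // "7"
```

Because the pattern is anchored with ^ and the prefix is lazy, only the first qualifying number is returned, never 20 or 50.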
Alternative solution
If you know that the number you are looking for is after a date-like field and another field, and you want to force this pre-validation, you may use
^\d+/\d+/\d+\s+\S+\s+(?<number_extract>\d+)
See the regex demo
If you do not have to check if the first field is date-like, you may simply use
^\S+\s+\S+\s+(?<number_extract>\d+)
^(?:\S+\s+){2}(?<number_extract>\d+) // Equivalent
See the regex demo here.
Details
^ - start of string
\d+/\d+/\d+ - 1+ digits, /, 1+ digits, /, 1+ digits
\s+ - 1+ whitespaces
\S+ - 1+ chars other than whitespace
\s+ - 1+ whitespaces
(?<number_extract>\d+) - Capturing group "number_extract": 1+ digits.
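The positional variants can be compared side by side in the same way. This is a JavaScript sketch rather than Grok itself, and the second input line is made up to show where the two patterns differ.

```javascript
const line = "3/03/0 EE 14 GFR 20 AAA XXXXX 50 3365.00";

// String.raw keeps the backslashes and avoids escaping "/" in a regex literal.
const withDateCheck = new RegExp(
  String.raw`^\d+/\d+/\d+\s+\S+\s+(?<number_extract>\d+)`
);
const anyTwoFields = /^(?:\S+\s+){2}(?<number_extract>\d+)/;

console.log(withDateCheck.exec(line)?.groups.number_extract); // "14"
console.log(anyTwoFields.exec(line)?.groups.number_extract);  // "14"

// A line whose first field is not date-like only passes the looser pattern.
const other = "host EE 14 GFR";
console.log(withDateCheck.test(other)); // false
console.log(anyTwoFields.test(other));  // true
```

Which variant fits better depends on whether a malformed first field should make the extraction fail or be ignored.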

Regex to check Optional Group of numbers

I am trying to create a regex that accepts the following strings:
proj_asdasd_000.gz.xml
proj_asdasd.gz.xml
Basically, the second underscore is optional, and if a value follows it, that value must be an integer.
This is the regex I am trying:
^proj([a-zA-z0-9]?)+_[a-zA-z]+(_[0-9]?)+\.[a-z]+.[a-z]
Any suggestion to make it accept the above mentioned strings?
You may use
^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?\.[a-z]+\.[a-z]+$
^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?(?:\.[a-z]+){2}$
See the regex demo
Details
^ - start of string
proj - a literal substring
[a-zA-Z0-9]* - 0 or more alphanumeric chars
_ - a _ char
[a-zA-Z]+ - 1+ ASCII letters
(?:_[0-9]+)? - an optional sequence of an underscore followed with 1+ digits
\.[a-z]+\.[a-z]+ = (?:\.[a-z]+){2} - two occurrences of . and 1+ lowercase ASCII letters
$ - end of string.
Notes:
[A-z] matches more than just ASCII letters: the range also covers [, \, ], ^, _ and `, which sit between Z and a in the ASCII table
([a-zA-z0-9]?)+ matches an optional character 1 or more times, which makes little sense. Either match a char 1 or more times with + or 0 or more times with *, no need of parentheses
(_[0-9]?)+ matches 1 or more sequences of _ followed by a single optional digit (so, it matches _9___1_, for example). The quantifiers must be swapped to match an optional sequence of _ and 1+ digits.
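A quick way to check the corrected pattern and the [A-z] pitfall is a small JavaScript sketch like the one below. The two rejected file names are invented test inputs.

```javascript
const fileName = /^proj[a-zA-Z0-9]*_[a-zA-Z]+(?:_[0-9]+)?\.[a-z]+\.[a-z]+$/;

console.log(fileName.test("proj_asdasd_000.gz.xml")); // true
console.log(fileName.test("proj_asdasd.gz.xml"));     // true
console.log(fileName.test("proj_asdasd_abc.gz.xml")); // false: letters after the 2nd underscore
console.log(fileName.test("proj_asdasd_.gz.xml"));    // false: underscore without digits

// [A-z] spans the ASCII range from "A" to "z", which includes "_".
console.log(/[A-z]/.test("_"));    // true
console.log(/[a-zA-Z]/.test("_")); // false
```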

Regex pattern for underscore or hyphen but not both

I have a regular expression that is allowing a string to be standalone, separated by hyphen and underscore.
I need help so the string only takes hyphen or underscore, but not both.
This is what I have so far.
^([a-z][a-z0-9]*)([-_]{1}[a-z0-9]+)*$
foo = passed
foo-bar = passed
foo_bar = passed
foo-bar-baz = passed
foo_bar_baz = passed
foo-bar_baz_qux = passed # but I don't want it to
foo_bar-baz-quz = passed # but I don't want it to
You may expand the pattern a bit and use a backreference to only match the same delimiter:
^[a-z][a-z0-9]*(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)?$
See the regex demo
Details:
^ - start of string
[a-z][a-z0-9]* - a letter followed with 0+ lowercase letters or digits
(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)? - an optional sequence of:
([-_]) - Capture group 1 matching either - or _
[a-z0-9]+ - 1+ lowercase letters or digits
(?:\1[a-z0-9]+)* - 0+ sequences of:
\1 - the same value as in Group 1
[a-z0-9]+ - 1 or more lowercase letters or digits
$ - end of string.
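To see the backreference at work, the test strings from the question can be run through the pattern. This is a minimal JavaScript sketch; the variable names are illustrative.

```javascript
// Group 1 captures the first delimiter; \1 forces every later delimiter to be the same.
const sameDelimiter = /^[a-z][a-z0-9]*(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)?$/;

const cases = [
  "foo", "foo-bar", "foo_bar", "foo-bar-baz", "foo_bar_baz",
  "foo-bar_baz_qux", "foo_bar-baz-quz",
];
const results = cases.map((s) => sameDelimiter.test(s));

cases.forEach((s, i) => console.log(s, "=", results[i] ? "passed" : "failed"));
// The first five strings pass; the two mixed-delimiter strings fail.
```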
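Here's a simpler alternative, assuming the string contains only letters (unlike the pattern in the question, it allows uppercase letters, rejects digits, and does not require the string to start with a letter):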
^([a-zA-Z-]+|[a-zA-Z_]+)$
Break it down!
^ start at the beginning of the text
[a-zA-Z-]+ match one or more characters that are a-z, A-Z or -
| OR operator
[a-zA-Z_]+ match one or more characters that are a-z, A-Z or _
$ end at the end of the text
Here's an example on regexr!