Why my Regex is only giving me ONE group back? - regex

Im currenty having issues with a regex that Im creating. The regex has to extract all the groups that says number #### between Hello and Regards. At this moment my regex only extracts one group and I need all the groups inside, at this case I have 2, but there may be more inside.
Regex Image
I'm using the web page https://regex101.com/
Flavor: PCRE (PHP)
Regex: Hello\s.*(number\s*[\d]*)\s.*Regards
Text:
This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150
Result:
My Result is only the group number 5152 but inside is another group number 1542.

You may use
(?si)(?:\G(?!\A)|\bHello\b)(?:(?!\bHello\b).)*?\K\bnumber\s*\d+(?=.*?\bRegards\b)
See the regex demo.
Details
(?si) - s - DOTALL modifier making . match any chars, and i makes the pattern case insensitive
(?:\G(?!\A)|\bHello\b) - either the end of the previous match (\G(?!\A)) or (|) a whole word Hello (\bHello\b)
(?:(?!\bHello\b).)*? - any char, 0 or more times but as few as possible, that does not start a whole word Hello char sequence
\K - match reset operator that discards all text matched so far
\bnumber - a whole word number
\s* - 0+ whitespaces
\d+ - 1+ digits
(?=.*?\bRegards\b) - there must be a whole word Regards somewhere after any 0+ chars (as few as possible).

Related

RegEx: allow 1-25 letters or spaces but exclude three-letter-value-list

I'm new to this forum and hope someone can support me.
I need to create a RegEx pattern which allows 1 to 25 letters or spaces but does not allow one of the values EMP, NDB, POI or CWR.
I tried the following using negative lookahead:
((?!EMP|NDB|POI|CWR)[A-Za-z\s]{1,25})$
However this does not work properly, the value (like EMP) is still accepted - see https://regex101.com/r/YfflBi/1
This only works fine if I only have letters (no spaces) and limit down to 3:
((?!EMP|NDB|POI|CWR)[A-Za-z]{3})$
(see https://regex101.com/r/SzmuwP/1)
However the challenge here is that I need 1 to 25 letters or spaces to be accepted but not one of the three-letter-values I mentioned.
Many thanks in advance to everyone thinking about a solution!
You can use
^(?:(?!EMP|NDB|POI|CWR)[A-Za-z\s]){1,25}$
See the regex demo. Details:
^ - start of string
(?: - start of a non-capturing group:
(?!EMP|NDB|POI|CWR)[A-Za-z\s] - a letter or whitespace that is not the starting char of the char sequences defined in the negative lookahead
){1,25} - repeat the pattern sequence inside the non-capturing group one to 25 times
$ - end of string.

Regex (PCRE): Match all digits in a line following a line which includes a certain string

Using PCRE, I want to capture only and all digits in a line which follows a line in which a certain string appears. Say the string is "STRING99". Example:
car string99 house 45b
22 dog 1 cat
women 6 man
In this case, the desired result is:
221
As asked a similar question some time ago, however, back then trying to capture the numbers in the SAME line where the string appears ( Regex (PCRE): Match all digits conditional upon presence of a string ). While the question is similar, I don't think the answer, if there is one at all, will be similar. The approach using the newline anchor ^ does not work in this case.
I am looking for a single regular expression without any other programming code. It would be easy to accomplish with two consecutive regex operations, but this not what I'm looking for.
Maybe you could try:
(?:\bstring99\b.*?\n|\G(?!^))[^\d\n]*\K\d
See the online demo
(?: - Open non-capture group:
\bstring99\b - Literally match "string99" between word-boundaries.
.*?\n - Lazy match up to (including) nearest newline character.
| - Or:
\G(?!^) - Asserts position at the end of the previous match but prevent it to be the start of the string for the first match using a negative lookahead.
) - Close non-capture group.
[^\d\n]* - Match 0+ non-digit/newline characters.
\K - Resets the starting point of the reported match.
\d - Match a digit.

Regex to replace up to 4 digits before a word

I am using this extension for chrome (It's called Word Replacer II) and I'm trying to create a Regex find and replace.
Quick backstory, my partner is recovering from an eating disorder and I want to find all mentions of Kilojoules and kJs and replace them with .
I am entirely new to Regex and after a few hours, I'm not much closer to getting a working expression.
I need it to remove up to 4 digits before the letters "kJs". E.g, 400kJs and 1000kJs. I'd like the "400kJs and 1000kJs" to be replaced with "[removed kJs] and [removed kJs]".
The code I have put together so far is;
\s+(a{1,4}<=\d)\s+(?=kJ)
And help would be much appreciated!
You may use the following approach:
\d{1,4}\s*kJs\b
See the regex demo
If you need to keep kJs, you may wrap the right part of the pattern with a lookahead, \d{1,4}(?=\s*kJs\b).
If you do not want to touch 5 or more digit numbers, use
\b\d{1,4}\s*kJs\b
(?<!\d)\d{1,4}\s*kJs\b
That is, add a word boundary, \b, or a left-hand digit boundary, (?<!\d).
Pattern details
\d{1,4} - one to four digits
\s* - 0+ whitespaces
kJs - a string of letters
\b - a word boundary (may not be necessary if there can be no word starting with kJs).

How to find more than one match of numbers utilizing regex

I am attempting to pick apart data from the following string utlizing a regex expression:
Ethane, C2 11.7310 3.1530 13.9982 HV, Dry # Base P,T 1432.00
The ultimate goal is to be able to pull out the middle three data points as individual values 11.7310, 3.153, 13.9982
The code expression I am working with at the moment is as follows:
(?<=C2 )(\d*\.?\d+)
This yields a full match of 11.7310 and a Group 1 match of 11.7310, but I can't figure out how to match the other two data points.
I am using PCRE (PHP) to create my expression.
You may use
(?:\G(?!^)|\bC2)\s+\K\d*\.?\d+
See the regex demo.
Details
(?:\G(?!^)|\bC2) - either the end of the previous successful match or C2 whole word
\s+ - 1+ whitespaces
\K - match reset operator discarding all the text matched so far in the match memory buffer
\d* - 0+ digits
\.? - an optional dot
\d+ - 1+ digits.

Only match unique string occurrences

We are doing some Data Loss Prevention for emails, but the issue is when people reply to emails multiple times sometimes the credit card number or account number will appear multiple times.
How can we get Java Regex to only match strings once each.
So for example, we are using the following regex to catch account numbers that match 2 letters followed by 5 or 6 numbers. it will also omit CR in either case.
\b(?!CR)(?!cr)[A-Za-z]{2}[0-9]{5,6}\b
How can we have it find:
CX12345
CX14584
JB145888
JD748452
CX12345 (Ignore as its already found it above)
LM45855
Unique string occurrence can be matched with
<STRING_PATTERN>(?!.*<STRING_PATTERN>) // Find the last occurrence
(?<!<STRING_PATTERN>.*)<STRING_PATTERN> // Find the first occurrence, only works in regex
// that supports infinite-width lookbehind patterns
where <STRING_PATTERN> is the pattern the unique occurrence of which one searches for. Note that both will work with the .NET regex library, but the second one is not usually supported by the majority of other libraries (only PyPi Python regex library and the JavaScript ECMAScript 2018 regex support it). Note that . does not match line break chars by default, so you need to pass a modifier like DOTALL (in most libraries, you may add (?s) modifier inside the pattern (only in Ruby (?m) does the same), or use specific flags that you pass to the regex compile method. See more about this in How do I match any character across multiple lines in a regular expression?
You seem to need a regex like this:
/\b((?!CR|cr)[A-Za-z]{2}\d{5,6})\b(?![\s\S]*\b\1\b)/
The regex demo is available here
Details:
\b - a leading word boundary
((?!CR|cr)[A-Za-z]{2}\d{5,6}) - Group 1 capturing
(?!CR|cr) - the next two characters cannot be CR or cr, the negative lookahead check
[A-Za-z]{2} - 2 ASCII letters
\d{5,6} - 5 to 6 digits
\b - trailing word boundary
(?![\s\S]*\b\1\b) - a negative lookahead that fails the match if there are any 0+ chars ([\s\S]*) followed with a word boundary (\b), same value captured into Group 1 (with the \1 backreference), and a trailing word boundary.
I would use a Map of some sort here, to keep tally of the strings which you encounter. For example:
String ccNumber = "CX12345";
Map<String, Boolean> ccMap = new HashMap<>();
if (ccNumber.matches("^(?!CR)(?!cr)[A-Za-z]{2}[0-9]{5,6}$")) {
ccMap.put(ccNumber, null);
}
Then just iterate over the keyset of the map to get unique credit card numbers which matched the pattern in your regex:
for (String key : map.keySet()) {
System.out.println("Found a matching credit card: " + key);
}