PCRE Regex Matching Patterns until next pattern is found - regex

I'm struggling to find a solution to this regex problem, which appears to be fairly straightforward. I need to match a pattern that comes after another matching pattern.
I need to capture the "Mean:" that follows "Kerberos-wsfed" in the following:
Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Between 50 and 100 milliseconds: 658
Between 101 and 200 milliseconds: 9406
Between 201 and 500 milliseconds: 6046
Between 501 milliseconds and 1 second: 1646
Between 1 and 5 seconds: 1399
Between 6 and 10 seconds: 13
Between 11 and 30 seconds: 34
Between 31 seconds and 1 minute: 7
Between 1 minute and 2 minutes: 1
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Between 50 and 100 milliseconds: 129
Between 101 and 200 milliseconds: 650
Between 201 and 500 milliseconds: 411
Between 501 milliseconds and 1 second: 171
Between 1 and 5 seconds: 119
Between 6 and 10 seconds: 4
Between 11 and 30 seconds: 6
Between 1 minute and 2 minutes: 1
Mean: 176, Mode: 33, Median: 37
Total: 4642
I can match (?:Kerberos-wsfed:) and I can match Mean:, but I need the value of Mean that comes after Kerberos-wsfed, and that is where I'm having difficulty. Thanks for the assistance.

Use the regex
Kerberos-wsfed[\s\S]*?Mean: *(\d+)
The mean value is contained in the capturing group 1, that is $1 or \1 depending on your programming language.
See demo.
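As a minimal sketch of how that pattern might be applied, here it is in Python (the question does not name a language, so Python is an assumption). `LOG` is a shortened copy of the sample from the question and `mean_after_wsfed` is a hypothetical helper name.

```python
import re

# Shortened version of the log from the question; only the relevant lines are kept.
LOG = """Kerberos:
Historical:
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Mean: 176, Mode: 33, Median: 37
Total: 4642
"""

# [\s\S]*? lazily skips any characters, newlines included, up to the first "Mean:".
PATTERN = re.compile(r"Kerberos-wsfed[\s\S]*?Mean: *(\d+)")


def mean_after_wsfed(text):
    """Return the Mean value that follows 'Kerberos-wsfed' as an int, or None."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None


print(mean_after_wsfed(LOG))  # 176
```

The plain "Kerberos:" block is skipped because the literal `Kerberos-wsfed` does not match it.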

Try this regular expression: #Kerberos-wsfed:.+?Mean:\s+(\d+)#s
Here # is the pattern delimiter, and the trailing s modifier lets . match newlines as well.
You can use a single space or \s instead of \s+ if you're sure of the file format.
The value 176 will be in group 1 of the match.
Demo: https://regex101.com/r/gwkUPJ/1

Using capturing group:
Kerberos-wsfed:[\s\S]*Mean:\s(\d+)
Kerberos-wsfed: matches the literal as-is
[\s\S]* allows any number of characters in between (including line breaks)
Mean:\s matches the literal Mean: followed by a single whitespace character \s
Finally (\d+) which is wrapped in the first capturing group captures the value you are looking for. It essentially allows any number of digits
Regex 101 Demo
The value that you are looking for (176) will be in the first capturing group which is $1 or the first one based on your language. For instance, in PHP:
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo $matches[0][1];
// Output: 176
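One point worth checking with this variant: `[\s\S]*` is greedy, so it runs to the last "Mean:" in the input. In the sample that is fine because Kerberos-wsfed is the final block. The sketch below, in Python, uses a made-up log with a hypothetical "Other:" section after it to show how the greedy and lazy forms can differ.

```python
import re

# Hypothetical log in which another section ("Other:") follows Kerberos-wsfed.
LOG = (
    "Kerberos-wsfed:\n"
    "Mean: 176, Mode: 33, Median: 37\n"
    "Other:\n"
    "Mean: 999, Mode: 1, Median: 2\n"
)

# Greedy: consumes as much as possible, then backtracks to the LAST "Mean:".
greedy = re.search(r"Kerberos-wsfed:[\s\S]*Mean:\s(\d+)", LOG)
# Lazy: stops at the FIRST "Mean:" after the literal.
lazy = re.search(r"Kerberos-wsfed:[\s\S]*?Mean:\s(\d+)", LOG)

print(greedy.group(1))  # 999
print(lazy.group(1))    # 176
```

If more sections can follow the one of interest, the lazy quantifier `*?` is probably the safer choice.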

Related

I need to fetch all the numbers between 2 spaces after my expression

I would like to extract data from the below sample data using regex
I have tried \d{2}/\d{4}, which matches e.g. 39/2021. I need to get the 23 that sits between the two spaces after it, i.e. any number between those two spaces after my expression.
Sample Data
Backlog 25 567 07/2022 120 2510
39/2021 23 590 08/2022 120 2630
40/2021 120 710 09/2022 120 2750
41/2021 120 830 10/2022 120 2870
42/2021 120 950 11/2022 120 2990
45/2021 120 1070 12/2022 120 3110
47/2021 120 1190 13/2022 120 3230
48/2021 120 1310 14/2022 240 3470
49/2021 120 1430 15/2022 120 3590
50/2021 120 1550 16/2022 120 3710
51/2021 120 1670 17/2022 240 3950
52/2021 120 1790 18/2022 120 4070
02/2022 120 1910 19/2022 120 4190
03/2022 120 2030 20/2022 120 4310
04/2022 120 2150 21/2022 240 4550
05/2022 120 2270 22/2022 120 4670
06/2022 120 2390 23/2022 120 4790
I have added a picture reference for the output.
You can use a capture group: match a space before the digits, then either assert a whitespace boundary after them or match the following space.
\b\d{2}/\d{4} (\d+)(?!\S)
The pattern matches:
\b A word boundary
\d{2}/\d{4} Match 2 digits / 4 digits
(\d+) Capture 1+ digits in group 1
(?!\S) Negative lookahead, assert a whitespace boundary to the right
Regex demo
If there should be a literal space on both sides of the number, match the trailing space as well (written as [ ] here so that it stays visible):
\b\d{2}/\d{4} (\d+)[ ]
Regex demo
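A short Python sketch of the lookahead pattern, run against the first three rows of the sample data. Note that each row holds two week/year columns, so the pattern captures a number after both of them. The `FIRST_COLUMN` variant, which anchors at the start of each line, is an assumption about what the asker wants and is not part of the answer.

```python
import re

SAMPLE = """Backlog 25 567 07/2022 120 2510
39/2021 23 590 08/2022 120 2630
40/2021 120 710 09/2022 120 2750
"""

# Pattern from the answer: the number right after any NN/NNNN token.
PATTERN = re.compile(r"\b\d{2}/\d{4} (\d+)(?!\S)")

# Assumed variant: only the NN/NNNN token at the start of a line.
FIRST_COLUMN = re.compile(r"^\d{2}/\d{4} (\d+)(?!\S)", re.MULTILINE)

print(PATTERN.findall(SAMPLE))       # ['120', '23', '120', '120', '120']
print(FIRST_COLUMN.findall(SAMPLE))  # ['23', '120']
```

With a single capture group, `findall` returns just the captured numbers rather than the whole match.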

regExp for HH:mm format including intermediate value

I am trying to compose a regExp that accepts HH:mm time formats, but also accepts all of the intermediate values:
e.g. all of these are accepted:
0
1
12
12:
12:3
12:30
1:
1:3
1:30
For now, I came up with this: ^([\d]{1,2}):?([\d]{1,2})?$
But this accepts any numeric 1/2 digit values for hours and minutes (e.g. 25:66 is acceptable)
So I came relatively close to my goal, but I need to filter out values x>24 from the hours, and x>60 from the minutes?
Try this:
^((?:[01][0-9]?)|(?:2[0-4]?)|(?:[3-9]))(?::((?:[0-5][0-9]?)|(?:60))|:)?$
NOTE:
This accepts 24 for HH and 60 for MM as stated in your question:
but I need to filter out values x>24 from the hours, and x>60 from the minutes?
Thus the following are accepted:
0
1
12
12:
12:3
12:30
1:
1:3
1:30
1:60
24:60
24:00
00:60
and below are not accepted:
25:30
00:61
Regex DEMO 1
If you want to exclude 24 HH and 60 MM, try this instead:
^((?:[01]\d?)|(?:2[0-3]?))(?::|(?::([0-5][0-9]?)))?$
Regex DEMO 2
Groups (applies to both cases):
\1 = HH
\2 = MM
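To illustrate the two groups, here is a small Python sketch using the second (stricter) pattern from this answer. `split_time` is a hypothetical helper name, and Python is an assumption since the question does not name a language.

```python
import re

# Second pattern from the answer above: 24 as hour and 60 as minute are rejected.
STRICT = re.compile(r"^((?:[01]\d?)|(?:2[0-3]?))(?::|(?::([0-5][0-9]?)))?$")


def split_time(text):
    """Return (HH, MM) as strings, or None when the input is rejected.

    MM is None while no minute digit has been typed yet.
    """
    match = STRICT.match(text)
    return match.groups() if match else None


for value in ["1", "12:", "12:3", "12:30", "24:00", "12:60"]:
    print(repr(value), "->", split_time(value))
# '1' -> ('1', None)
# '12:' -> ('12', None)
# '12:3' -> ('12', '3')
# '12:30' -> ('12', '30')
# '24:00' -> None
# '12:60' -> None
```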
You are looking for
^(?:[01]\d?|2[0-3]?)(?::(?:[0-5]\d?)?)?$
See the regex demo.
Details:
^ - start of string
(?:[01]\d?|2[0-3]?) - either a 0 or 1 followed by an optional digit, or a 2 followed by an optional 0, 1, 2 or 3
(?::(?:[0-5]\d?)?)? - an optional sequence of patterns:
: - a colon
(?:[0-5]\d?)? - an optional sequence of patterns:
[0-5] - a digit from 0 to 5
\d? - an optional digit
$ - end of string.
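The breakdown above can be checked against the inputs from the question with a short Python sketch. One thing it surfaces: as written, the hour part only allows a leading 0, 1 or 2, so single-digit hours 3 to 9 (for example 9:30) are rejected. The `WITH_SINGLE_DIGIT` variant is a hypothetical adjustment, not part of the answer, for the case where those should be accepted too.

```python
import re

# Pattern from the answer above.
PARTIAL_TIME = re.compile(r"^(?:[01]\d?|2[0-3]?)(?::(?:[0-5]\d?)?)?$")

# Hypothetical variant that also allows single-digit hours 3 to 9.
WITH_SINGLE_DIGIT = re.compile(r"^(?:[01]\d?|2[0-3]?|[3-9])(?::(?:[0-5]\d?)?)?$")


def is_partial_time(text, pattern=PARTIAL_TIME):
    """True when text is a valid HH:mm value or a prefix typed on the way to one."""
    return pattern.match(text) is not None


accepted = ["0", "1", "12", "12:", "12:3", "12:30", "1:", "1:3", "1:30", "23:59"]
rejected = ["25:66", "24:00", "12:60", "12:6", ":30", ""]

print(all(is_partial_time(v) for v in accepted))      # True
print(any(is_partial_time(v) for v in rejected))      # False
print(is_partial_time("9:30"))                        # False
print(is_partial_time("9:30", WITH_SINGLE_DIGIT))     # True
print(is_partial_time("30", WITH_SINGLE_DIGIT))       # False
```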

Find/Replace regex to rearrange text in Notepad++

I have certain data that I want to rearrange (each record is on a single line). I have tried multiple approaches, but I can't get it to work.
Here is an example of the text:
DATA1="8DE" DATA2="322" DATA3="20" DATA4="19.99" DATA5="0.01"
DATA1="FE4" DATA2="222" DATA4="400" DATA3="400" DATA5="0.00"
DATA1="CE3" DATA2="444" DATA4="60" DATA5="0.00" DATA3="60"
DATA1="MME" DATA3="20" DATA4="20" DATA5="0.00"
DATA2="667" DATA4="30" DATA3="30" DATA5="0.00" DATA1="MH4"
This should be the output:
8DE 322 20 19.99 0.01
FE4 222 400 400 0.00
CE3 444 60 60 0.00
MME 20 20 0.00
MH4 667 30 30 0.00
I have tried the following but to no avail:
FIND: DATA1=\"(.*?)\"|DATA2=\"(.*?)\"|DATA3=\"(.*?)\"|DATA4=\"(.*?)\"|DATA5=\"(.*?)\"
REPLACE: \1 \2 \3 \4 \5
and
FIND: DATA1=\"(?<d1>.*?)\"|DATA2=\"(?<d2>.*?)\"|DATA3=\"(?<d3>.*?)\"|DATA4=\"(?<d4>.*?)\"|DATA5=\"(?<d5>.*?)\"
REPLACE: $+{d1} $+{d2} $+{d3} $+{d4} $+{d5}
I would be happy if someone can help or direct me to the right answer (and sorry for any misunderstanding, as English is not my first language).
The regex
^(?=.*\bDATA1="([^"]+)"\h*)?(?=.*\bDATA2="([^"]+)"\h*)?(?=.*\bDATA3="([^"]+)"\h*)?(?=.*\bDATA4="([^"]+)"\h*)?(?=.*\bDATA5="([^"]+)"\h*)?.*
This regex works by using optional lookaheads to locate DATAx (where x is the number) and capturing the value inside the " into a capture group, then matching the whole line (in order to replace it).
The replacement
$1\t\t$2\t\t$3\t\t$4\t\t$5
This replacement just references the capture groups and adds tab characters between them while reordering them in the order of DATA [1,2,3,4,5].
The result
8DE 322 20 19.99 0.01
FE4 222 400 400 0.00
CE3 444 60 60 0.00
MME 20 20 0.00
MH4 667 30 30 0.00
See it working
See the regex in use here
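Outside Notepad++, the same reordering can be sketched in Python. This is a different technique from the optional lookaheads above: it collects every DATAn="value" pair into a dictionary and then emits the values in numeric order. `rearrange` is a hypothetical helper, and the single-tab separator is an assumption (the replacement above uses two tabs).

```python
import re

LINES = [
    'DATA1="8DE" DATA2="322" DATA3="20" DATA4="19.99" DATA5="0.01"',
    'DATA1="MME" DATA3="20" DATA4="20" DATA5="0.00"',
    'DATA2="667" DATA4="30" DATA3="30" DATA5="0.00" DATA1="MH4"',
]

# Captures the field number and the quoted value of each DATAn="..." pair.
FIELD = re.compile(r'\bDATA([1-5])="([^"]*)"')


def rearrange(line, sep="\t"):
    """Return the DATA1..DATA5 values in numeric order; missing fields become ''."""
    values = {number: value for number, value in FIELD.findall(line)}
    return sep.join(values.get(str(n), "") for n in range(1, 6))


for line in LINES:
    print(rearrange(line))
```

A missing field (DATA2 in the second line) leaves an empty column, so the remaining values stay aligned with their field numbers.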

Regex, replacing for newline with group replace

675185538end432 204 9/9 4709 908 2
343269172end430 3 43 9335 975 7
590144128end89 7 29 3-5-4 420 2
337460105end8Y5 7A 78 2 23
292484648end70 A53 03 9235 93
These are the strings that I am working with. I want to find a regex to replace the above strings as follows
675185538
432 204 9/9 4709 908 2
343269172
430 3 43 9335 975 7
590144128
89 7 29 3-5-4 420 2
337460105
8Y5 7A 78 2 23
292484648
70 A53 03 9235 93
Wherever end occurs, \r\n should be inserted in its place.
The string before end is numeric, and the string after end is alphanumeric with whitespace characters.
I am using Notepad++.
To make the match strict, try this:
Find: ^(\d+)end(\w)
Replace: \1\r\n\2
This captures, then puts back via back references, the preceding number between start of line and "end" and the following digit/letter. This won't match "end" elsewhere.
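For comparison, the same find/replace can be sketched in Python with `re.sub`. The `re.MULTILINE` flag makes `^` match at the start of every line, which Notepad++ does by default. The sketch inserts `\n`; use `\r\n` in the replacement if Windows line endings are needed.

```python
import re

LINES = """675185538end432 204 9/9 4709 908 2
337460105end8Y5 7A 78 2 23
"""

# Group 1: the leading number; group 2: the first character after "end".
result = re.sub(r"^(\d+)end(\w)", r"\1\n\2", LINES, flags=re.MULTILINE)

print(result)
# 675185538
# 432 204 9/9 4709 908 2
# 337460105
# 8Y5 7A 78 2 23
```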
Kludgery:
Find (\d\d\d\d\d\d\d\d\d)end(\d)
Replace \1\r\n\2
Find creates two capture groups:
each group is bounded by an ( and a )
one capture group matches exactly nine numerals
the other capture group matches exactly one numeral.
In the replace:
the first capture group is referenced with \1
and the second group with \2.

Regular expression for matching numbers and ranges of numbers

In an application I have the need to validate a string entered by the user.
One number
OR
a range (two numbers separated by a '-')
OR
a list of comma separated numbers and/or ranges
AND
any number must be between 1 and 999999.
A space is allowed before and after a comma and/or a '-'.
I thought the following regular expression would do it.
(\d{1,6}\040?(,|-)?\040?){1,}
This matches the following (which is excellent). (\040 in the regular expression is the character for space).
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100 - 200
1, 2 - 9, 11 - 12
However, I also get a match on:
!!!12
What am I missing here?
You need to anchor your regex:
^(\d{1,6}\040?(,|-)?\040?){1,}$
Otherwise you will get a partial match on "!!!12": the regex matches only the trailing digits.
See it here on Regexr
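Anchoring fixes the "!!!12" case, but the anchored pattern still appears to accept a few inputs the stated rules would exclude, such as "1234567" (read as 123456 followed by 7), a trailing comma, or 0. The Python sketch below is one possible stricter check and not the answerer's method: a structural regex for "number or range, comma separated", followed by a numeric range test. The names `LIST_RE` and `is_valid` are hypothetical.

```python
import re

# Anchored pattern from the answer above, for comparison.
ANCHORED = re.compile(r"^(\d{1,6}\040?(,|-)?\040?){1,}$")

NUMBER = r"\d{1,6}"
ITEM = rf"{NUMBER}(?: ?- ?{NUMBER})?"           # a number or a range "a - b"
LIST_RE = re.compile(rf"{ITEM}(?: ?, ?{ITEM})*")  # items separated by commas


def is_valid(text):
    """True for a comma-separated list of numbers/ranges, each within 1..999999."""
    if LIST_RE.fullmatch(text) is None:
        return False
    return all(1 <= int(n) <= 999999 for n in re.findall(r"\d+", text))


good = ["00001", "12", "20,21,22", "100-200", "1,2-9,11-12",
        "20, 21, 22", "100 - 200", "1, 2 - 9, 11 - 12"]
bad = ["!!!12", "1234567", "12,", "1-2-3", "0"]

print(all(is_valid(s) for s in good))            # True
print(any(is_valid(s) for s in bad))             # False
print(ANCHORED.match("1234567") is not None)     # True
```

Splitting the check this way keeps the regex readable; the 1 to 999999 limit is easier to express as an integer comparison than as a pattern.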
/\d*[-]?\d*/
I have tested this with Perl:
> cat temp
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100-200
1, 2-9, 11-12
> perl -lne 'push @a,/\d*[-]?\d*/g;END{print "@a"}' temp
00001 12 20 21 22 100-200 1 2-9 11-12 20 21 22 100-200 1 2-9 11-12
As the result above shows, all the regex matches are pushed into an array, and the array elements are printed at the end.