I have some manually entered data (it's an email subject), and I am trying to extract the correct ID to perform a series of actions with RPA on.
RE:'HC=312-822-281' abc2-1234567 7354612
I have a regex query:
(?<!\d)\d{7}(?!\d)
I want to extract 7354612 but not 1234567.
I want to avoid matching any 7-digit number that is preceded by a hyphen, or by a hyphen and a space.
My initial query works 80% of the time, but this hyphen issue is interfering with the other 20%.
You can modify the existing (?<!\d) lookbehind to also exclude the position after a hyphen, i.e. (?<![\d-]), and add another lookbehind to exclude the hyphen + space context ((?<!- ) or (?<!-\s)):
(?<![\d-])(?<!- )\d{7}(?!\d)
(?<![\d-])(?<!-\s)\d{7}(?!\d)
Note \s matches any whitespace. See the regex demo.
Details
(?<![\d-]) - a negative lookbehind that fails the match if there is a digit or a hyphen immediately to the left of the current location
(?<!-\s) - a negative lookbehind that fails the match if there is a - followed by a whitespace char immediately to the left of the current location
\d{7} - any seven digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
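As a quick check of the pattern described above, here is a minimal Python sketch using the built-in re module. The second input string is a made-up variant (not from the question) that puts a space after the hyphen.

```python
import re

# Two fixed-width negative lookbehinds, then seven digits, then a lookahead
# that rejects a trailing digit.
ID_PATTERN = re.compile(r"(?<![\d-])(?<!-\s)\d{7}(?!\d)")

subject = "RE:'HC=312-822-281' abc2-1234567 7354612"
ids = ID_PATTERN.findall(subject)
print(ids)

# Hypothetical variant: hyphen + space before the unwanted number.
spaced_ids = ID_PATTERN.findall("RE: abc2- 1234567 7354612")
print(spaced_ids)
```

Both calls should report only 7354612, since 1234567 is rejected by one of the two lookbehinds in each case.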
Variations
With PCRE regex, you may also use
-\s*\d{7}(?!\d)(*SKIP)(*F)|(?<!\d)\d{7}(?!\d)
See the regex demo, where -\s*\d{7}(?!\d)(*SKIP)(*F)| matches -, 0+ spaces, seven digits after which there are no more digits and skips that match, only returning matches for the (?<!\d)\d{7}(?!\d) pattern.
In .NET, modern JavaScript, and the PyPI regex module in Python, you may use
(?<!\d|-\s*)\d{7}(?!\d)
See this regex demo. Here, (?<!\d|-\s*) negative lookbehind fails the match if there is a digit or - + 0 or more whitespace chars immediately to the left of the current position.
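Python's built-in re module supports neither (*SKIP)(*F) nor variable-width lookbehinds, so the variations above cannot be used there as written. A common workaround, sketched below, is an alternation where the left branch consumes the unwanted context and only the right branch captures. The function name extract_ids is an assumption made for this illustration.

```python
import re

# Left branch: "-", optional whitespace, seven digits (consumed, not captured).
# Right branch: a standalone seven-digit number, captured in group 1.
SKIP_OR_KEEP = re.compile(r"-\s*\d{7}(?!\d)|(?<!\d)(\d{7})(?!\d)")

def extract_ids(text):
    """Return the seven-digit numbers that are not preceded by '-' and spaces."""
    return [m.group(1) for m in SKIP_OR_KEEP.finditer(text) if m.group(1)]

print(extract_ids("RE:'HC=312-822-281' abc2-1234567 7354612"))
print(extract_ids("abc2-   1234567 7354612"))
```

Because the left branch consumes the hyphenated number, the right branch never gets a chance to start inside it, which roughly mirrors what (*SKIP)(*F) does in PCRE.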
Related
I have used an online regex learning site (regexr) and created something that works, but with my very limited experience of writing regexes, I could do with some help/advice.
In IIS10 logs, there is a list for time, date... but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is:
Match one or more digits (scan-2 or scan-02), optionally followed by a letter (scan-2j or scan-02f).
Append the rest of the User Agent: .shadowserver.org to the regex.
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
To write a regex for IIS10 to block requests from a certain user-agent
Expected:
It to work on single numbers as well as double/triple numbers with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org
This is an explanation of your regex pattern. If you only want the solution, go directly to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+) Group 1: matches the word scan followed by a literal -. You escaped the hyphen with a \, but outside a character class an unescaped - is also a literal hyphen, so you don't have to escape it here. The - is followed by \d+, which means one or more digits from 0-9; there must be at least one digit. The text matched inside the parentheses is saved in the first capturing group.
(?:\w)+ A non-capturing group. \w matches one character from [A-Za-z0-9_], and the plus sign + after the group means the whole group is matched one or more times. Since the group contains only \w, it matches one or more word characters. Note that the non-capturing group is redundant here, and \w+ can be used directly.
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan matches the word scan in scan-02, and \- matches the hyphen after it, giving scan-. \d+ matches one or more digits, so at first it matches the 02 after scan-, and the value is scan-02. Then comes (?:\w)+, where the plus + means at least one word character must be matched. It tries to match the period . and fails, because the period is not a word character. Is it over at this point? No: the regex engine backtracks into the previous \d+, which now matches only the 0 in scan-02, so scan-0 is saved in the first capturing group, and (?:\w)+ matches the 2 in scan-02. Why does the engine go back to \d+? Because both \d+ and (?:\w)+ use +, which requires at least one digit and at least one word character respectively, so the engine tries every way to satisfy both.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) matches scan-2, and (?:\w)+ tries to match the period after scan-2 but fails. This is the important point: \d+ matched only one digit, so it cannot give one up to (?:\w)+. The engine then retries from the next position in scan-2.shadowserver.org, starting at the character c. The s in (scan\-\d+) fails to match c, the engine keeps advancing, and in the end the match fails.
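The backtracking in the two examples can be observed directly. The following Python sketch uses the built-in re module and only the first part of the original pattern, as in the walkthrough above.

```python
import re

# First part of the original pattern: \d+ and (?:\w)+ each need one character.
ORIGINAL = re.compile(r"(scan\-\d+)(?:\w)+")

first = ORIGINAL.search("scan-02.shadowserver.org")
print(first.group(0))  # whole match
print(first.group(1))  # group 1 after \d+ gives the 2 back to (?:\w)+

second = ORIGINAL.search("scan-2.shadowserver.org")
print(second)  # no digit to spare, so there is no match at all
```

For the first string the whole match is scan-02 while group 1 holds only scan-0; for the second string search returns None.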
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?) Group 1: captures the word scan, followed by a literal -, followed by \d+ (one or more digits), followed by an optional lowercase letter [a-z]?. The ? makes the [a-z] part optional; without it, [a-z] would require exactly one lowercase letter.
See regex demo
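To compare the original pattern with the simple solution on the input text from the question, a small Python sketch can help. It runs in Python's re module, not in IIS URL Rewrite, so it only illustrates the matching logic.

```python
import re

ORIGINAL = re.compile(r"(scan\-\d+)(?:\w)+\.shadowserver\.org")
FIXED = re.compile(r"(scan-\d+[a-z]?)\.shadowserver\.org")

user_agents = [
    "scan-2.shadowserver.org",
    "scan-02.shadowserver.org",
    "scan-2j.shadowserver.org",
    "scan-02j.shadowserver.org",
    "scan-17w.shadowserver.org",
    "scan-101p.shadowserver.org",
]

# Collect the inputs each pattern fails to match.
missed_by_original = [ua for ua in user_agents if not ORIGINAL.search(ua)]
missed_by_fixed = [ua for ua in user_agents if not FIXED.search(ua)]

print(missed_by_original)
print(missed_by_fixed)
```

The original pattern misses only the single-digit, no-letter case, while the fixed pattern matches all six inputs.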
I need to extract numbers like 2.268 out of strings that contain the word output:
Approxmiate output size of output: 2.268 kilobytes
But ignore it in strings that don't:
some entirely different string: 2.268 kilobytes
This regex:
(?:output.+?)([\d\.]+)
Gives me a match with 1 group, with the group being 2.268 for the target string. But since I'm not using a programming language but rather CloudWatch Log Insights, I need a way to only match the number itself without using groups.
I could use a positive lookbehind ?<= in order to not consume the string at all, but then I don't know how to throw away size of output: without using .+, which positive lookbehind doesn't allow.
With your shown samples, please try following regex.
output:\D+\K\d+(?:\.\d+)?
Online demo for above regex
Explanation: Adding detailed explanation for above.
output:\D+ ##Matching output colon followed by non-digits(1 or more occurrences)
\K ##\K to forget previous matched values to make sure we get only further matched values in this expression.
\d+(?:\.\d+)? ##Matching 1 or more digits followed by optional dot digits.
Since you are using PCRE, you can use
output.*?\K\d[\d.]*
See the regex demo. This matches
output - a fixed string
.*? - any zero or more chars other than line break chars, as few as possible
\K - match reset operator that removes all text matched so far from the overall match memory buffer
\d - a digit
[\d.]* - zero or more digits or periods.
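Python's built-in re module has no \K, so the sketch below uses a capturing group as a stand-in, purely to show which text the second pattern above would return as its overall match in PCRE. The helper name extract_size is an assumption for this illustration.

```python
import re

# Stand-in for output.*?\K\d[\d.]* : the group marks what \K would keep.
AFTER_OUTPUT = re.compile(r"output.*?(\d[\d.]*)")

def extract_size(line):
    """Return the first number that follows the word 'output', or None."""
    m = AFTER_OUTPUT.search(line)
    return m.group(1) if m else None

print(extract_size("Approxmiate output size of output: 2.268 kilobytes"))
print(extract_size("some entirely different string: 2.268 kilobytes"))
```

The first line yields 2.268 and the second yields None, because the word output never appears in it.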
I am attempting to pick apart data from the following string utilizing a regular expression:
Ethane, C2 11.7310 3.1530 13.9982 HV, Dry # Base P,T 1432.00
The ultimate goal is to be able to pull out the middle three data points as individual values: 11.7310, 3.1530, 13.9982
The code expression I am working with at the moment is as follows:
(?<=C2 )(\d*\.?\d+)
This yields a full match of 11.7310 and a Group 1 match of 11.7310, but I can't figure out how to match the other two data points.
I am using PCRE (PHP) to create my expression.
You may use
(?:\G(?!^)|\bC2)\s+\K\d*\.?\d+
See the regex demo.
Details
(?:\G(?!^)|\bC2) - either the end of the previous successful match or C2 whole word
\s+ - 1+ whitespaces
\K - match reset operator discarding all the text matched so far in the match memory buffer
\d* - 0+ digits
\.? - an optional dot
\d+ - 1+ digits.
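For readers outside PCRE: Python's built-in re module supports neither \G nor \K, but the same "continue where the previous match ended" idea can be sketched with Pattern.match and an explicit position. The function name numbers_after and its marker parameter are assumptions made for this example.

```python
import re

NUMBER = re.compile(r"\s+(\d*\.?\d+)")

def numbers_after(line, marker="C2"):
    """Collect consecutive whitespace-separated numbers right after marker."""
    start = re.search(r"\b" + re.escape(marker), line)
    if start is None:
        return []
    values = []
    pos = start.end()
    # match() anchors at pos, playing the role of \G in the PCRE pattern.
    while (m := NUMBER.match(line, pos)) is not None:
        values.append(m.group(1))
        pos = m.end()
    return values

line = "Ethane, C2 11.7310 3.1530 13.9982 HV, Dry # Base P,T 1432.00"
print(numbers_after(line))
```

The loop stops at HV, so the trailing 1432.00 is not collected, which is the same behavior the \G-based pattern gives.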
I'm using regex in powershell 5.1.
I need it to detect groups of numbers, but ignore groups followed or preceded by /, so from this it should detect only 9876.
[regex]::matches('9876 1234/56','(?<!/)([0-9]{1,}(?!(\/[0-9])))').value
As it is now, the result is:
9876
123
6
More examples: "13 17 10/20" should only match 13 and 17.
Tried using something like (?!(\/([0-9]{1,}))), but it did not help.
You may use
\b(?<!/)[0-9]+\b(?!/[0-9])
See the regex demo
Alternatively, if the numbers can be glued to text:
(?<![/0-9])[0-9]+(?!/?[0-9])
See this regex demo.
The first pattern is based on word boundaries \b that make sure there are no letters, digits and _ right before and after an expected match. The second one just makes sure there are no digits and / on both ends of the match.
Details
(?<![/0-9]) - a negative lookbehind making sure there is no digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?!/?[0-9]) - a negative lookahead making sure there is no optional / followed with a digit immediately to the right of the current location.
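The two patterns can be compared side by side. The sketch below uses Python's re module rather than PowerShell; both patterns use only syntax that the two engines share. The third sample string is made up to show the "glued to text" case.

```python
import re

WORD_BOUND = re.compile(r"\b(?<!/)[0-9]+\b(?!/[0-9])")
GLUED = re.compile(r"(?<![/0-9])[0-9]+(?!/?[0-9])")

samples = ["9876 1234/56", "13 17 10/20", "abc123 4/5"]  # last one is hypothetical

for s in samples:
    print(s, WORD_BOUND.findall(s), GLUED.findall(s))
```

On the first two samples both patterns agree; on the third, only the second pattern picks up 123, because the word boundary version refuses a number glued to letters.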
I want to apply regex on a string to get alphanumeric value and the value should not start with the RUN substring followed with any digit, e.g. RUN123456.
Below is the regex I am using to get alphanumeric value
regex='[A-Z]{2,}[_0-9a-zA-Z]*'
Sample Input:
CY0PNI94980 Production AutoSys Job has failed. Call 249-3344. EC=54. RUN130990.
The matches can include CY0PNI94980 and EC, but not RUN130990.
Kindly help me on this.
You may match the strings matching your pattern excluding all those starting with RUN and a digit:
\b(?!RUN[0-9])[A-Z]{2,}[_0-9a-zA-Z]*
See the regex demo
If you do not mind also matching Unicode letters and digits, you may replace [_0-9a-zA-Z] with \w and use
\b(?!RUN[0-9])[A-Z]{2,}\w*
Details
\b - a word boundary
(?!RUN[0-9]) - a negative lookahead that fails the match if there is RUN and any ASCII digit immediately to the right of the current location
[A-Z]{2,} - 2 or more uppercase ASCII letters
[_0-9a-zA-Z]* / \w* - 0 or more word chars (letters/digits/_).
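A short Python sketch of both variants, run against the sample input from the question (the question does not name its regex engine, so using Python's re here is an assumption):

```python
import re

ASCII_ONLY = re.compile(r"\b(?!RUN[0-9])[A-Z]{2,}[_0-9a-zA-Z]*")
WORD_CHARS = re.compile(r"\b(?!RUN[0-9])[A-Z]{2,}\w*")

text = ("CY0PNI94980 Production AutoSys Job has failed. "
        "Call 249-3344. EC=54. RUN130990.")

ascii_matches = ASCII_ONLY.findall(text)
word_matches = WORD_CHARS.findall(text)
print(ascii_matches)
print(word_matches)
```

Both variants should return CY0PNI94980 and EC: the lookahead rejects RUN130990 at its first letter, and the word boundary prevents a match from starting at the U inside it.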