How to match this expression with regex? - regex

I have a text with some lines (200+) in this format:
10684 - The jackpot ? discuss Lev 3 --- ? ---
10755 - Garbage Heap ? discuss Lev 5 --- ? ---
I want to retrieve the first number (10684 or 10755) only if the number after "Lev" is greater than 3.
I'm able to get the first number with this regex: ([0-9]+) - but without the 'level' restrictions.
How can this be done?
Thanks in advance.

(\d+) - .*?Lev (?:[4-9]|[1-9]\d+)
The first \d+ matches the line number, as you have done.
The next .*? is a lazy quantifier, so it consumes as few characters as possible, and the expression that follows guides it to the right place. (A lazy quantifier is usually more efficient here.)
The second group, (?:[4-9]|[1-9]\d+), matches either a single digit greater than 3 or a number of two or more digits without a leading zero.
Stack Overflow doesn't show my image properly, so use this link instead: http://regexr.com?36n5l
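A minimal Python sketch of how the pattern above behaves on the two sample lines from the question. The function name ids_above_level_3 and the use of Python's re module are my own assumptions for illustration; the question does not say which language is in use.

```python
import re

# Pattern from the answer above: leading number, lazy skip, then a level of 4-9 or 10+.
PATTERN = re.compile(r"(\d+) - .*?Lev (?:[4-9]|[1-9]\d+)")

# The two sample lines from the question.
lines = [
    "10684 - The jackpot ? discuss Lev 3 --- ? ---",
    "10755 - Garbage Heap ? discuss Lev 5 --- ? ---",
]


def ids_above_level_3(rows):
    """Return the leading number of every row whose level is greater than 3."""
    found = []
    for row in rows:
        m = PATTERN.match(row)  # match() anchors at the start of the row
        if m:
            found.append(m.group(1))
    return found


print(ids_above_level_3(lines))  # ['10755']
```

Only the second line is returned, because "Lev 3" satisfies neither branch of the alternation.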

Regular expressions don't recognize numbers as numbers (only as strings). You can do this, though:
([0-9]+) - .*Lev (?:[4-9][^0-9]|[1-9][0-9]+)
Basically, we use the alternation operator (|) to accept only a single digit greater than 3 (enforced by checking that the following character is not a digit) or a multi-digit number not beginning with a zero.
If the level number might be at the end of the line, though, you might have to do this:
([0-9]+) - .*Lev (?:[4-9](?:[^0-9]|$)|[1-9][0-9]+)
(I'm assuming whatever regex engine you're using can't handle lookaround assertions. In the future, try to always include what language you're using when you're asking a regex question.)
Ah, I just read your edit that the number is always less than 10. Well, that's much easier then:
([0-9]+) - .*Lev [4-9]

A lookahead is really the best thing because it will leave just the number:
/\d+(?=.*Lev (0*[4-9]|[1-9]\d))/
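As a rough illustration in Python (an assumption, since the question names no language), the lookahead keeps the level check out of the match, so group(0) is just the leading number. The helper name leading_id is hypothetical.

```python
import re

# Lookahead version from the answer above: the match itself is only the number.
LOOKAHEAD = re.compile(r"\d+(?=.*Lev (0*[4-9]|[1-9]\d))")


def leading_id(row):
    """Return the leading number if the level check passes, else None."""
    m = LOOKAHEAD.match(row)  # match() only tries at the start of the row
    return m.group(0) if m else None


print(leading_id("10755 - Garbage Heap ? discuss Lev 5 --- ? ---"))
print(leading_id("10684 - The jackpot ? discuss Lev 3 --- ? ---"))
```

The first call yields the number as a string and the second yields None, since the lookahead fails for level 3.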

A bit of Awk trickery:
awk -F '\? +discuss +Lev' '$2>3 { split($1,a,/ */); print a[1] }' file

In bash use this:
var=">3"
perl -lne '/(\d+) - .*Lev (\d+)/; print $1 if $2'"$var"
This is a convenient way to pass the condition as a parameter.
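The same idea, capturing both numbers and comparing them numerically, can be sketched in Python. This is not a translation of the Perl one-liner: it takes an integer threshold rather than an operator string, and the names ROW and ids_with_level_above are made up for the example.

```python
import re

# Capture the leading number and the level; the comparison happens in code.
ROW = re.compile(r"(\d+) - .*Lev (\d+)")


def ids_with_level_above(rows, threshold):
    """Return leading numbers (as ints) of rows whose level exceeds threshold."""
    result = []
    for row in rows:
        m = ROW.match(row)
        if m and int(m.group(2)) > threshold:
            result.append(int(m.group(1)))
    return result


sample = [
    "10684 - The jackpot ? discuss Lev 3 --- ? ---",
    "10755 - Garbage Heap ? discuss Lev 5 --- ? ---",
]
print(ids_with_level_above(sample, 3))  # [10755]
```

Because the level is converted with int(), the threshold can be any number without rewriting the regex.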

Related

Is it possible to negate a group in a regular expression?

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So I want to get all dates of any year, but not of the 10th month, and it is important that the day is 30.
I tried a lot to do this using negative lookahead assertions, but I did not come up with a working regex.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, hence the \d\d part.
Generally speaking, you cannot (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this, as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern is 10.
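A small Python sketch of the negative lookahead applied to the dates from the question. The ^ and $ anchors are my addition so that each list entry is checked as a whole, and the choice of Python is an assumption.

```python
import re

# Any year, any month except 10, day 30. The anchors are added for this sketch.
DATE = re.compile(r"^\d{4}-(?!10)\d\d-30$")

# The dates listed in the question.
dates = [
    "2020-09-29", "2020-09-30", "2020-10-01", "2020-10-02",
    "2020-10-12", "2020-10-16", "2020-11-12", "2020-11-23",
    "2020-11-15", "2020-12-01", "2020-12-11", "2020-12-30",
]

matches = [d for d in dates if DATE.match(d)]
print(matches)  # ['2020-09-30', '2020-12-30']
```

A date such as 2020-10-30 would be rejected, because the lookahead fails before \d\d gets a chance to consume the month.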
Try:
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation:
20\d{2} matches 20XX
(?:0[1-9]|1[12]) matches 0X or 11, 12
30 matches 30
Demo: https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.

How to create a matching regex pattern for "greater than 10-000-000 and lower than 150-000-000"?

I'm trying to make
09-546-943
fail in the regex pattern below.
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
Passing criteria is
greater than 10-000-000 or 010-000-000 and
less than 150-000-000
The tried example "09-546-943" passes. This should be a fail.
Any idea how to create a regex that makes this example a fail instead of a pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern was partially generated with an online number range regex generator: I set the min number to 10 and the max to 150, merged the branches that match 1-8 and 9 (the tool does a bad job here), added 0? to the two-digit numbers to match an optional leading 0, and appended -[0-9]{3}-[0-9]{3} to the 10-149 part and -000-000 to 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 and then a number from 10 to 99, or 1 followed by a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
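A quick way to check the pattern against boundary values, sketched in Python (the language and the helper name in_range are assumptions). Note that, as written, the pattern accepts both endpoints, 10-000-000 and 150-000-000.

```python
import re

# Pattern from the answer above, unchanged.
RANGE = re.compile(
    r"^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$"
)


def in_range(value):
    """True if value matches the 10-000-000 .. 150-000-000 pattern."""
    return RANGE.match(value) is not None


for candidate in ["09-546-943", "10-000-000", "010-000-000",
                  "149-999-999", "150-000-000", "150-000-001"]:
    print(candidate, in_range(candidate))
```

The failing example from the question, 09-546-943, is rejected because the first group requires a non-zero digit after the optional leading 0.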
This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
It would also fail 10-000-000 and 150-000-000.
In this demo, the expression is explained, if you might be interested.
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately - depending on your need of them to be included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
accepts a space or nothing in place of each -.
Test here.

Tricky regex validation

I need to validate a string with 2 groups separated by one space, with the following rules:
Each group needs to be at least 2 characters long but no more than 15
Both groups together can't be more than 20 chars long (not counting the space)
Groups can only contain letters (that's simple, it's [a-zA-Z])
Following these rules, here are some examples
Firstname Lastname (Valid)
Somename T (Invalid, 2nd one is <2)
Somethingsomettt Here (Invalid, first one is > 15)
Somethingsome Somethingsome (Invalid, total > 20)
It'd be simple [a-zA-Z]{2,15} [a-zA-Z]{2,15} if it wasn't for that 2+2<=total<=20 condition.
Is it even possible to limit it this way? If it is - how?
UPDATE
Just for the sake of it, the resulting regex was supposed to be ^(?=[a-zA-Z ]{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$; @vks was the closest to it. Nevertheless, thanks to @popovitsj and @Avinash Raj too.
^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
Try this. See the demo:
http://regex101.com/r/nA6hN9/30
This can be done with lookahead. Something like this:
^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
You could try the regex below, which uses a negative lookahead:
(?!^.{22,})^[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
DEMO
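To see the lookahead approach against the examples from the question, here is a short Python sketch. Python and the helper name is_valid are assumptions; the pattern is the ^(?=.{5,21}$) variant given above.

```python
import re

# Lookahead caps the whole string at 21 chars (20 letters plus one space);
# the rest enforces 2-15 letters per group.
NAME = re.compile(r"^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$")


def is_valid(name):
    """True if name is two letter-only groups within the length limits."""
    return NAME.match(name) is not None


for example in ["Firstname Lastname", "Somename T",
                "Somethingsomettt Here", "Somethingsome Somethingsome"]:
    print(example, is_valid(example))
```

Only the first example passes: the second has a one-letter group, the third has a 16-letter group, and the fourth totals 26 letters.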

*NIX REGEXP number series

I am playing around with regexps, but this one is giving me a headache. I have a dynamic number which needs a suffix. The suffix is always 0 to 9, 99 or 999.
Example:
I have the number 461200 and now I want to create a regexp that will match 461200 to 461209. From what I've learned, it should be ^46120[0-9]$? Is this correct or somewhere to the left of hell?
Ok, let us assume it is correct and I now want to match 461200 - 461299? This is where I get lost.
^4612[0-9]{2}?
It cannot be. I am yet to figure this out.
Any help appreciated.
For 1 digit at the end you need:
^4612[0-9]$
2 digits at the end:
^4612[0-9]{2}$
3 digits at the end:
^4612[0-9]{3}$
The number in braces {} is the number of times the preceding character or set has to be repeated.
Ok, let us assume it is correct and I now want to match 461200 - 461299?
You can either repeat the desired character class by saying [0-9][0-9] or use quantifiers [0-9]{2}.
It can be either:
^4612[0-9][0-9]$
or
^4612[0-9]{2}$
Both would work.
maybe try this regex:
^4612\d{2}$
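A brute-force check of the two-digit pattern, sketched in Python (an assumption about the host language): it tries every number from 461000 to 461999 and keeps the ones that match.

```python
import re

# Two equivalent spellings of "4612 followed by exactly two digits".
QUANTIFIED = re.compile(r"^4612[0-9]{2}$")
REPEATED = re.compile(r"^4612[0-9][0-9]$")

matched = [n for n in range(461000, 462000) if QUANTIFIED.match(str(n))]
same = all(
    bool(QUANTIFIED.match(str(n))) == bool(REPEATED.match(str(n)))
    for n in range(461000, 462000)
)

print(matched[0], matched[-1], len(matched))  # 461200 461299 100
print(same)  # True
```

Exactly the hundred numbers 461200 through 461299 match, and both spellings agree on every input tested.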

Regular Expression for a 0.25 interval

My aim is to write a regular expression for a decimal number where a valid number is one of
xx.0, xx.125, xx.25, xx.375, xx.5, xx.625, xx.75, xx.875 (i.e. measured in 1/8ths) The xx can be 0, 1 or 2 digits.
I have come up with the following regex:
^\d*\.?((25)|(50)|(5)|(75)|(0)|(00))?$
While this works for 0.25, 0.5 and 0.75, it won't work for 0.225, 0.675, etc.
I assumed that the '?' would work in a case where there is a preceding number as well.
Can someone point out my mistake?
Edit: the number is required to be a decimal!
Edit2: I realized my mistake; I was confused about the '?'. Thank you.
I would add another \d* after the literal-dot check \.:
^\d*\.?\d*((25)|(50)|(5)|(75)|(0)|(00))?$
I think it would probably just be easier to multiply the decimal part by 8, but your regex doesn't account for digits that precede the last two decimals.
^\d{0,2}\.(00?|(1|6)?25|(3|8)?75|50?)$
Your mistake is: \.? makes the literal dot optional; it does not stand for a digit (or anything else, in this case).
About the ? (question mark) operator: Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. (source)
^\d{0,2}\.(0|(1|2|6)?25|(3|6|8)?75|5)$
Regular expressions are for matching patterns, not checking numeric values. Find a likely string with the regex, then check its numeric value in whatever your host language is (PHP, whatever).
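A minimal sketch of that two-step approach in Python (the answer mentions PHP only as an example, so the language is an assumption, as are the names SHAPE and is_eighth). The regex only checks the shape, up to two integer digits, a dot and one to three decimals; the 1/8 check is done by multiplying by 8, as suggested in an earlier answer.

```python
import re
from decimal import Decimal

# Shape only: 0-2 digits, a mandatory dot, 1-3 decimal digits.
SHAPE = re.compile(r"^\d{0,2}\.\d{1,3}$")


def is_eighth(text):
    """True if text looks like xx.ddd and its value is a multiple of 1/8."""
    if not SHAPE.match(text):
        return False
    # Decimal avoids binary floating-point rounding in the multiplication.
    return (Decimal(text) * 8) % 1 == 0


for candidate in ["12.375", "0.25", ".5", "0.225", "123.5", "7"]:
    print(candidate, is_eighth(candidate))
```

Here 12.375, 0.25 and .5 pass, while 0.225 fails the numeric check and 123.5 and 7 fail the shape check.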