How to match a whole string if certain conditions are met

I work a lot with isolating sizes from strings, but I have run into some issues.
Current:
https://regex101.com/r/zbEtOU/1
Current regex
^([a-z]+\d*(?:\s*-\s*[a-z\d]+[/-][a-z\d]+)?|\d+)
Examples:
30/32
Fixed 8 (32-36)
XS/S
m/l
1-2Y
s/m
0-3M
32
Desired result:
I want to isolate the first value, but when the string contains parentheses I want the first value inside them instead.
So actual desired outcome from the examples:
30/32 = 30
Fixed 8 (32-36) = 32
XS/S = XS
m/l = m
1-2Y = 1-2Y (I'm guessing there is no way to output "1Y" in this case? A bare "1" would overlap with 1-2M and cause confusion, as 1 != 1 in this case. When this happens I would prefer to get the original string. Ideal case = 1Y)
s/m = s
1-3M = 1-3M (I'm guessing there is no way to output "1M" in this case? A bare "1" would overlap with 1-2Y and cause confusion, as 1 != 1 in this case. When this happens I would prefer to get the original string. Ideal case = 1M)
32 = 32
I'm really out of my depth on this, as there are a lot of different conditions here!
All regexes are run case-insensitively, so there is no need to worry about capital letters.
Has anyone got a nice and easy way to solve my issue?
Everything needs to be captured in Group 1, otherwise my system can't isolate it.
This runs in Python 3.7.

You can use
(?:^|.*\()(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3})\b
See the regex demo.
Details:
(?:^|.*\() - start of string or any zero or more chars other than line break chars as many as possible, and then a ( char
(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3}) - Group 1:
\d+(?:-\d+[A-Za-z]{1,3})? - one or more digits, followed with an optional occurrence of a -, one or more digits, and then one to three ASCII letters
| - or
[A-Za-z]{1,3} - one, two or three ASCII letters
\b - a word boundary.
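As a quick check, the pattern above can be run against the examples from the question. This is only a minimal sketch: the helper name first_size is made up for illustration, and re.IGNORECASE is assumed to mirror the "run insensitive" note in the question.

```python
import re

# Pattern copied from the answer above; IGNORECASE is an assumption that
# mirrors the question's note that every regex is run case-insensitively.
SIZE_RE = re.compile(
    r'(?:^|.*\()(\d+(?:-\d+[A-Za-z]{1,3})?|[A-Za-z]{1,3})\b',
    re.IGNORECASE,
)

def first_size(text):
    """Return Group 1 of the first match, or None if nothing matches."""
    match = SIZE_RE.search(text)
    return match.group(1) if match else None

samples = ["30/32", "Fixed 8 (32-36)", "XS/S", "m/l", "1-2Y", "s/m", "0-3M", "32"]
for sample in samples:
    print(f"{sample!r} -> {first_size(sample)!r}")
```

Ranges such as 1-2Y come back whole, which is the fallback the question asks for.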

Related

Regex: How to match a range of characters except another range

I'm trying to create a regex filter to satisfy:
1) The 1st character should be a lower-case letter or a number
2) The rest of the characters should be a single character between index 32 and 126
3) However, none of the characters should be upper case letters or _
My current regex is:
^[a-z0-9][ -~]*$
This solves 1) and 2) above - but I struggle to include 3) above in the right way. Any help is appreciated.

A simple way is to add a negative lookahead for what you don't want.
^[a-z0-9](?!.*[A-Z_])[ -~]*$
But it's also possible to just split up the ranges, based on the ASCII table
^[a-z0-9][ -@\[-^`-~]*$
It's just a bit less easy to understand at first glance.
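The two approaches can be compared character by character. The sketch below assumes Python's re module and writes the split-range class with @ (ASCII 64, the character just before A) as the upper bound of its first range; the function name is illustrative only.

```python
import re

# Negative-lookahead version and split-range version of the same filter.
LOOKAHEAD = re.compile(r'^[a-z0-9](?!.*[A-Z_])[ -~]*$')
SPLIT_RANGES = re.compile(r'^[a-z0-9][ -@\[-^`-~]*$')

def agree_on_printable_ascii():
    """Compare both patterns on 'a' followed by each character 32..126."""
    for code in range(32, 127):
        candidate = "a" + chr(code)
        if bool(LOOKAHEAD.match(candidate)) != bool(SPLIT_RANGES.match(candidate)):
            return False
    return True

print(agree_on_printable_ascii())
for text in ["abc 123!", "abC", "a_b", "Abc"]:
    print(repr(text), bool(SPLIT_RANGES.match(text)))
```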

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
It begins with a number (max 4 digits)
It can have letters as well (max 2 characters)
It can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that, because the letters and the second group of digits are both optional, it accepts numbers with up to 7 digits, which is not what I want:
1234 (first group) (no letters in the second group) 567 (third group)
If anyone can give me a tip on how to achieve the expected result, it would be greatly appreciated!

You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty

\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first group matches a number with at most 4 digits
[a-zA-Z]{0,2} The second group matches at most 2 letters
\d{0,3} The third group matches a number with at most 3 digits

You have to keep the last two groups together, so that the last one cannot be present if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
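To see the effect of keeping the letters and the trailing digits together, the question's pattern and the grouped pattern can be run side by side. A minimal sketch, assuming Python's re module; the helper name accepted and the sample values are illustrative.

```python
import re

# Pattern from the question: letters and trailing digits are independently optional.
ORIGINAL = re.compile(r'^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$')
# Grouped pattern: trailing digits are only allowed after 1-2 letters.
FIXED = re.compile(r'^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$')

def accepted(pattern, value):
    """True if the whole value is accepted by the pattern."""
    return pattern.match(value) is not None

for value in ["1234", "1234a", "1234a12", "1234567", "12345", "a12"]:
    print(value, accepted(ORIGINAL, value), accepted(FIXED, value))
```

Only the all-digit inputs longer than four characters differ between the two patterns.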

Thank you so much for your answers! I tried Sebastian's solution:
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the ":" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan

The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: combination makes the () group non-capturing: it still groups the pattern, but it does not create a captured substring.
Here is the regex with tests for it.
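The difference ?: makes is easiest to see by inspecting the captured groups. In this sketch the capturing variant is a hypothetical rewrite made only for comparison; it is not taken from any of the answers.

```python
import re

# Hypothetical capturing variant, written only to contrast with (?:...).
CAPTURING = re.compile(r'^([1-9]\d{0,3})([a-zA-Z]{1,2}\d{0,3})?$')
# Pattern from the answer above, using a non-capturing group.
NON_CAPTURING = re.compile(r'^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$')

print(CAPTURING.match("1234a12").groups())      # captured substrings
print(NON_CAPTURING.match("1234a12").groups())  # nothing is captured
print(NON_CAPTURING.match("1234a12").group(0))  # the full match is still available
print(NON_CAPTURING.match("0123") is None)      # a leading zero is rejected
```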

regex find match within the first n items

I have a string of 8 separated hexadecimal numbers, such as:
3E%12%3%1F%3E%6%1%19
And I need to check whether the number 12 is located within the first 4 numbers.
I'm guessing this shouldn't be all that complex, but my searches turned up empty. Regular expressions are always a trouble for me, but I don't have access to anything else in this scenario. Any help would be appreciated.

^([^%]+%){0,3}12%
See it in action
The idea is:
^ - from the start
[^%]+% - match multiple non % characters, followed by a % character
{0,3} - between 0 and 3 of those
12% - 12% after that
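A small sketch of how this pattern behaves, assuming Python's re module. The function name and the extra test strings are made up for illustration; only the first string comes from the question.

```python
import re

# Up to three complete "item%" chunks from the start, then the item 12.
FIRST_FOUR = re.compile(r'^([^%]+%){0,3}12%')

def twelve_in_first_four(text):
    """True if one of the first four %-separated items is exactly 12."""
    return FIRST_FOUR.search(text) is not None

print(twelve_in_first_four("3E%12%3%1F%3E%6%1%19"))  # 12 is the second item
print(twelve_in_first_four("3E%1%3%1F%12%6%1%19"))   # 12 is the fifth item
print(twelve_in_first_four("312%1%3%1F%3E%6%1%19"))  # 312 is not 12
```

Because every repetition has to consume a whole item up to its %, an item like 312 or 123 cannot be mistaken for 12.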

Here you go
^([^%]*%){4}(?<=.*12.*)
This will match both of the following, if that is what is intended
1%312%..
1%123%..
Check whether matching %123% is acceptable in your case.
If the number 12 should stand on its own, then use
^([^%]*%){4}(?<=.*\b12\b.*)

Regex newbie: How to isolate 'num-num-num' in a string

I'm sure this is a super simple question for many of you, but I've only just started learning regex and at the moment can't for the life of me isolate what I'm after from the following:
June 2015 - Won / Void / Lost = 3-0-1
I need a solution to isolate the 'num-num-num' part at the end of the string that would work for any positive integers.
Thanks for any help
EDIT
So this line of code from a scrapy spider I'm writing produces the line above:
tips_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0]
I've tried to isolate the part I'm after with:
tips_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').re(r'\d+-\d+-\d+$').extract()[0]
No luck though :(

The regex to capture that is:
\d+-\d+-\d+$
It works as follows:
\d+- means: capture 1 or more digits (the numbers [0-9]), and then a "-".
$ means: you should now be at the end of the line.
Translating that into the full regex pattern:
Capture 1 or more digits, then a hyphen, then 1 or more digits, then a hyphen, then 1 or more digits, and we should now be at the end of the string.
EDIT: Addressing your edits and comments:
I'm not so sure what you mean by "isolate". I'll assume that you mean you want tips_str to equal "3-0-1".
I believe the easiest way would be to first use xpath to extract the string for the entire line without doing any regex. Then, when we're simply dealing with a string (instead of xpath stuff), it should be nice and easy to use regex and get the pattern out.
As far as I understand, sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0] (without .re()) is providing you with the string: "June 2015 - Won / Void / Lost = 3-0-1".
So then:
full_str = sel.xpath('//*[@class="recent-picks"]//div[@class="title3"]/text()').extract()[0]
Now that we've got the full string, we can use the standard re module to pluck out the part we want:
import re
tips_str = None  # stays None if the pattern is not found
search = re.search(r'\d+-\d+-\d+$', full_str)
if search:
    tips_str = search.group(0)
Now tips_str will equal "3-0-1". If the pattern wasn't matched at all, it'd instead equal None.
If any of my assumptions are wrong then let me know what's actually happening (like if .extract()[0] isn't giving back a string, then what is it giving back?) and I'll try to adjust this response.
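If the three numbers are needed individually rather than as one string, capturing groups can be added to the same pattern. This is a sketch on a plain string, with no Scrapy involved; the function name parse_record is hypothetical.

```python
import re

# Same pattern as above, with each number in its own capturing group.
RECORD_RE = re.compile(r'(\d+)-(\d+)-(\d+)$')

def parse_record(line):
    """Return the three trailing numbers as a tuple of ints, or None."""
    match = RECORD_RE.search(line.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

print(parse_record("June 2015 - Won / Void / Lost = 3-0-1"))
print(parse_record("June 2015 - no record yet"))
```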

Any and all numbers, so negatives, scientific notation, etc? This will match it.
/(\-?[\.\d]+(e\+|e\-)?[\.\d]*)-(\-?[\.\d]+(e\+|e\-)?[\.\d]*)-(\-?[\.\d]+(e\+|e\-)?[\.\d]*)$/ig
Tested with these:
June 2015 - Won / Void / Lost = -1.1e+3-1.01-0.1e+2
June 2015 - Won / Void / Lost = 1-2-3
June 2015 - Won / Void / Lost = 0.1--5-5.6
If you take the $ out of it, it will match on all lines at the same time.

Create shortest possible regex

I want to create a regex that will match any of these values
7-5
6-6 ((0-99) - (0-99))
6-4
6-3
6-2
6-1
6-0
0-6
1-6
2-6
3-6
4-6
the 6-6 example is a special case, here are some examples of values:
6-6 (23-8)
6-6 (4-25)
6-6 (56-34)
Is it possible to make one regex that can do this?
If so, is it possible to further extend that regex for the 6-6 special case such that the difference between the two numbers within the parentheses is equal to 2 or -2?
I could easily write this with procedural code, but I'm really curious if someone can devise a regex for this.
Lastly, if it could be further extended such that the individual digits were in their own match groups I'd be amazed. An example would be for 7-5, I could have a match group that just had the value 7, and another that had the value 5. However, for 6-6 (24-26) I'd like a match group that had the first 6, a match group for the second 6, a match group for the 24 and a match group for the 26.
This may be impossible, but some of you can probably get this part of the way there.
Good luck, and thanks for the help.

NO. The answer is "We can't," and the reason is because you're trying to use a hammer to dig a hole.
The problem with writing one long "clever" (this word causes a knee-jerk reaction in many people who are far more anti-regex than I) regex is that, six months from now, you'll have forgotten those clever regex features that you used so heavily, and you'll have written six months worth of code related to something else, and you'll get back to your impressive regex and have to tweak one detail, and you'll say, "WTF?"
This is what (I understand) you want, in Perl:
# data is in $_
if (/7-5|6-[0-4]|[0-4]-6|6-6 \((\d{1,2})-(\d{1,2})\)/) {
    # use defined() so that a captured "0" is not treated as false
    if (defined $1 and defined $2 and abs($1 - $2) == 2) {
        # we have the right difference
    }
}
Some might say that the given regex is a bit much, but I don't think it's too bad. If the \d{1,2} bit is a little too obscure you could use \d\d? (which is what I used at first, but didn't like the repetition).

You can do it like this:
7-5|6-[0-4]|[0-4]-6|6-6 \(\d\d?-\d\d?\)
Just add parens to get your match groups.

Off the top of my head (there may be some errors but the principle should be good):
6-6 \(\d+-\d+\)|\d-\d
And like with any regexp, you can surround what you want to extract with parentheses for match groups:
(6)-(6) \((\d+)-(\d+)\)|(\d)-(\d)
In the 6-6 case, the first two groups should get the sixes, and the next two should get the multi-digit values that come afterwards. The 6-6 alternative comes first because otherwise \d-\d would match the leading 6-6 on its own.

Here is one that will match only the numbers you want and let you get each digit by name:
p = r'(?P<a>[0-4]|6|7)-(?P<b>[0-4]|6|5) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?'
To get each digit you could use:
values = re.search(p, string).group('a', 'b', 'c', 'd')
Which will return a four element tuple with the values you are looking for, with None for any group that did not take part in the match. (If nothing matches at all, re.search itself returns None, so check for that first.)
One problem with this pattern is that it will match the stuff in the parentheses whether or not there was a match to '6-6'. This one will only match the final parentheses if 6-6 is matched:
p = r'(?P<a>[0-4]|(?P<tmp_a>6)|7)-(?P<b>(?(tmp_a)(?P<tmp_b>6)|([0-4]|5)))(?(tmp_b) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?)'
I don't know of any way to look for a difference between the numbers in the parentheses; regex only knows about strings, not numerical values.
(I am assuming python syntax here; the perl syntax is slightly different, though perl supports the python way of doing things.)
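Since a regex cannot compare numbers, one option is to let a loose pattern pull out the digits and do the numeric checks in code. The following is a sketch of that split, not any answerer's method: the function name parse_set and the variable names are hypothetical, and the accepted values are taken from the list in the question.

```python
import re

# Loose shape check: "d-d", optionally followed by " (n-n)" with 1-2 digit numbers.
SET_RE = re.compile(
    r'^(?P<a>\d)-(?P<b>\d)(?: \((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?$'
)

def parse_set(text):
    """Return (a, b, c, d) for an accepted value, else None.

    c and d are None when there is no parenthesised pair.
    """
    match = SET_RE.match(text)
    if match is None:
        return None
    a, b = int(match.group('a')), int(match.group('b'))
    has_pair = match.group('c') is not None
    if (a, b) == (6, 6):
        if not has_pair:
            return None
        c, d = int(match.group('c')), int(match.group('d'))
        # Numeric rule from the question: the two values must differ by 2.
        return (a, b, c, d) if abs(c - d) == 2 else None
    if has_pair:
        return None
    plain_ok = (a, b) == (7, 5) or (a == 6 and b <= 4) or (b == 6 and a <= 4)
    return (a, b, None, None) if plain_ok else None

for value in ["7-5", "6-4", "0-6", "6-6 (24-26)", "6-6 (23-8)", "5-6"]:
    print(value, parse_set(value))
```

Dropping the abs(c - d) == 2 check gives the looser behaviour from the first part of the question, where any pair of numbers up to 99 is accepted after 6-6.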
