Regex for checking address house number

I'm using the following expression to validate a house number:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
Now the requirement has changed to the following constraints:
one number (25)
one number w/ one letter (25A)
one number w/ a second one divided by a hyphen (25-32)
one number w/ a second one divided by a hyphen and one letter w/ blank (25-32 A)
How do I validate these w/ changes to the regex above?

If you only want to match those values, you could match 1 or more digits followed by an optional part: either a single char A-Z, or a hyphen and 1+ digits, optionally followed by a space and a char A-Z.
^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$
^ Start of string
\d+ Match 1+ digits
(?: Non capture group
[A-Z] Match a char A-Z
| Or
-\d+ Match a hyphen and 1+ digits
(?: [A-Z])? Optionally match a space and a char A-Z
)? Close group and make it optional
$ End of string
Regex demo
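As a quick check, the pattern can be run against the four required formats and a few inputs that should be rejected. This is a minimal sketch in Python; the helper name is_valid_house_number and the rejected samples are illustrative assumptions, not part of the original requirement.

```python
import re

# Pattern from the answer above: digits, then optionally either one
# uppercase letter or "-digits" with an optional " X" suffix.
HOUSE_NUMBER = re.compile(r"^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$")


def is_valid_house_number(value: str) -> bool:
    """Return True if value matches one of the four allowed formats."""
    return HOUSE_NUMBER.match(value) is not None


if __name__ == "__main__":
    for sample in ["25", "25A", "25-32", "25-32 A", "25a", "25 A", "25-32A"]:
        print(f"{sample!r}: {is_valid_house_number(sample)}")
```

Note that the pattern only accepts uppercase letters; if lowercase should be allowed as well, the character classes would need to become [A-Za-z].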

Related

Regex for combination of alphanumeric letters which has at least 2 uppercase letters or 1 number?

I need a regex for combination of numbers and uppercase letters and maybe lowercase letters and /,- characters, which contains at least 4 characters.
But of course it should contain at least 2 uppercase letters or one number.
I tried this:
barcode_regex = r"(?=(?:.+[A-Z]))(?=(?:.+[0-9]))([a-zA-Z0-9/-]{4,})"
For example match such cases as follows:
ametFXUT0
G197-6STK
adipiscXWWFHH
A654/9023847
HYJ/54GFJ
hgdy67h
You could use a single lookahead to assert at least 4 characters, and then match either a single digit or 2 uppercase chars in the allowed ranges.
^(?=.{4})(?:[A-Za-z/,-]*\d|(?:[a-z\d/,-]*[A-Z]){2})[A-Za-z\d/,-]*$
Explanation
^ Start of string
(?=.{4}) Assert 4 characters to the right
(?: Non capture group
[A-Za-z/,-]*\d Match optional allowed characters without a digit, then match a digit
| Or
(?:[a-z\d/,-]*[A-Z]){2} Match 2 times optional allowed characters without an uppercase char, then match an uppercase char
) Close non capture group
[A-Za-z\d/,-]* Match optional allowed characters
$ End of string
See a regex demo.
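To see how the single-lookahead pattern behaves, here is a small Python sketch that runs it against the sample values from the question plus a few strings that should fail. The helper name is_barcode and the failing samples are assumptions made for illustration.

```python
import re

# Single lookahead for the minimum length, then either "a digit somewhere"
# or "two uppercase chars somewhere", using only the allowed characters.
BARCODE = re.compile(
    r"^(?=.{4})(?:[A-Za-z/,-]*\d|(?:[a-z\d/,-]*[A-Z]){2})[A-Za-z\d/,-]*$"
)


def is_barcode(value: str) -> bool:
    """Return True if value satisfies the length and content rules."""
    return BARCODE.match(value) is not None


if __name__ == "__main__":
    samples = [
        "ametFXUT0", "G197-6STK", "adipiscXWWFHH",
        "A654/9023847", "HYJ/54GFJ", "hgdy67h",
        "abcdE",  # only one uppercase char and no digit
        "ab1",    # fewer than 4 characters
    ]
    for sample in samples:
        print(f"{sample!r}: {is_barcode(sample)}")
```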
You could use two lookaheads combined via an alternation to check for 2 uppercase or 1 number:
^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]+$
Demo
This regex pattern says to:
^
(?:
(?=.*[A-Z].*[A-Z]) assert that 2 or more uppercase are present
| OR
(?=.*\d) assert that at least one digit is present
)
[A-Za-z0-9/-]+ match any alphanumeric content (plus forward slash or dash)
$
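One thing to be aware of: as posted, this pattern uses + and so does not enforce the 4-character minimum from the question. The Python sketch below compares it with a variant that swaps + for {4,}; the variant and the names AS_POSTED and WITH_MIN_LENGTH are assumptions added here for illustration.

```python
import re

# The two-lookahead pattern exactly as posted above.
AS_POSTED = re.compile(r"^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]+$")

# Hypothetical variant: same lookaheads, but at least 4 characters overall.
WITH_MIN_LENGTH = re.compile(
    r"^(?:(?=.*[A-Z].*[A-Z])|(?=.*\d))[A-Za-z0-9/-]{4,}$"
)

if __name__ == "__main__":
    for sample in ["hgdy67h", "adipiscXWWFHH", "A1", "abcdE"]:
        print(
            f"{sample!r}: as posted={bool(AS_POSTED.match(sample))}, "
            f"with minimum length={bool(WITH_MIN_LENGTH.match(sample))}"
        )
```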

Regex (Has between 1 and 4 digits and as many characters as possible)

Hi, I'm trying to create a regex that ensures a string has between 1 and 4 digits and any number of letters.
Here's what I have written so far: ^([A-Za-z]+([0-9]){1,4}$)
This doesn't allow me to have characters after the digits.
You might repeat the whole part 1-4 times and match optional trailing chars a-z
^(?:[A-Za-z]*[0-9]){1,4}[A-Za-z]*$
The pattern matches
^ Start of string
(?: Non capture group
[A-Za-z]*[0-9]){1,4} Repeat 1-4 times matching optional chars A-Za-z and a single digit
[A-Za-z]* Optionally repeat char A-Za-z
$ End of string
Regex demo
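A short Python sketch of how this pattern counts digits regardless of where they appear. The helper name has_one_to_four_digits and the sample strings are illustrative assumptions.

```python
import re

# Each repetition consumes optional letters plus exactly one digit,
# so {1,4} caps the total number of digits at four.
ONE_TO_FOUR_DIGITS = re.compile(r"^(?:[A-Za-z]*[0-9]){1,4}[A-Za-z]*$")


def has_one_to_four_digits(value: str) -> bool:
    """Return True for letters-and-digits strings with 1 to 4 digits."""
    return ONE_TO_FOUR_DIGITS.match(value) is not None


if __name__ == "__main__":
    for sample in ["abc1def", "a1b2c3d4e", "1", "abc", "12345", "a1b2c3d4e5"]:
        print(f"{sample!r}: {has_one_to_four_digits(sample)}")
```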

Regex start with any number and it should end without zeros

I am trying to create a Regex with groups that should group 1234.0500- to 1234.05-.
What I have tried is:
^([0-9]+)(\.)([1-9]*)0*(-?)$
but it does not match 1234.0500-. Here is the example https://regex101.com/r/koSZoB/1. The regex should also group
1234.0000
0.9000
to
1234
0.9
In your pattern, this part ([1-9]*)0*(-?)$ matches optional digits 1-9 followed by optional zeroes and then an optional hyphen at the end of the string. It will succeed until the first zero:
0500
^
But the match will fail as it can not match (-?)$
You could use 3 capturing groups and use those in the replacement.
After group 1, you could either match a dot followed by only zeroes, which should be removed, or capture in group 2 from the dot up to the last digit 1-9 and then match the trailing zeroes so they are removed.
^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$
Explanation
^ Start of string
(\d+) Capture group 1, match 1+ digits
(?: Non capture group, match either
\.0+ Match a . and 1+ zeroes
| Or
(\.\d*[1-9])0+ Capture ., 0+ digits followed by a digit 1-9 and match the following 1+ zeroes to be removed
) Close group
(-?) Capture optional -
$ End of string
Regex demo
There is no language tagged, but for example in JavaScript:
const pattern = /^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$/;
[
"1234.0500-",
"1234.05500-",
"1234.0550588500-",
"1234.0000",
"0.9000",
"12.1222",
"12.1222-",
].forEach(s => console.log(s.replace(pattern, "$1$2$3")));
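Since no language was tagged, the same replacement can be sketched in Python as well. The function name strip_trailing_zeroes is an assumption; the replacement is built with a callback so that group 2, which does not participate when the \.0+ branch matches, is treated as an empty string.

```python
import re

TRAILING_ZEROES = re.compile(r"^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$")


def strip_trailing_zeroes(value: str) -> str:
    """Drop trailing zeroes after the decimal point, keeping an optional '-'."""
    # Group 2 is None when the all-zero branch matched, so fall back to "".
    return TRAILING_ZEROES.sub(
        lambda m: m.group(1) + (m.group(2) or "") + m.group(3), value
    )


if __name__ == "__main__":
    for sample in ["1234.0500-", "1234.0000", "0.9000", "12.1222", "12.1222-"]:
        print(f"{sample} -> {strip_trailing_zeroes(sample)}")
```

Values without trailing zeroes, such as 12.1222, do not match the pattern and are returned unchanged.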
The third capture group ([1-9]*) doesn't allow zeroes, so the 0 in 05 makes the match fail.
I would suggest letting the third capture group match any digit and making it non-greedy by adding a ?: ^([0-9]+)(\.)([0-9]*?)0*(-?)$ The group then matches as few digits as possible, leaving the trailing zeroes to the greedy 0* that follows.
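The groups produced by this non-greedy variant can be inspected with a small Python sketch; the helper name split_number is an assumption. Note that the dot sits in its own group, so for an input like 1234.0000 the third group is empty and the caller would have to drop the dot separately to end up with 1234.

```python
import re

# Non-greedy third group: it takes as few digits as possible, so the
# greedy 0* after it absorbs the trailing zeroes.
LAZY = re.compile(r"^([0-9]+)(\.)([0-9]*?)0*(-?)$")


def split_number(value: str):
    """Return the four captured groups, or None if there is no match."""
    match = LAZY.match(value)
    return match.groups() if match else None


if __name__ == "__main__":
    for sample in ["1234.0500-", "1234.0000", "0.9000"]:
        print(f"{sample}: {split_number(sample)}")
```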

How can I extract non digit characters and digit characters in the end of a string?

I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to extract AG.IZTAPALAPA and 2 in the same capture group to extract like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
You can use a single capturing group. For the example string, it matches a single "word" consisting of uppercase chars A-Z, with optional dots in the middle (not at the start or end), followed by a space and 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+ Match 1+ digits, then a space
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and 1+ chars A-Z, then match a space and 1+ digits
) Close group 1
\b A word boundary
Regex demo
Or you can make the pattern a bit broader matching either a dot or a word character
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
Regex demo
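Both patterns can be compared with a short Python sketch using the plain re module; the same pattern strings could be passed to Series.str.extract in pandas. The helper name extract_town and the second sample string are assumptions for illustration.

```python
import re

# Stricter pattern: one uppercase "word" with optional inner dots.
STRICT = re.compile(r"\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b")
# Broader pattern: several space-separated words of word chars or dots.
BROAD = re.compile(r"\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b")


def extract_town(pattern: re.Pattern, value: str):
    """Return capture group 1 (words plus trailing number), or None."""
    match = pattern.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ["2029 AG.IZTAPALAPA 2", "2029 SAN JUAN 12"]:
        print(sample, "| strict:", extract_town(STRICT, sample),
              "| broad:", extract_town(BROAD, sample))
```

The stricter pattern rejects the multi-word sample because it allows only one word between the two numbers.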
You can use the following simple regex:
[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)
Note:
(?: [0-9]+)* makes the trailing digits optional.

Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma-delimited list where the first value is numeric and the second is alpha.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
Regex demo
To get a bit broader match you could use this pattern to match at least a single char a-zA-Z
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Regex demo
Note that this part in your pattern \d[^,]+ matches not only digits, but 1 digit followed by 1+ times any char except a comma which would for example also match 4a$ .
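Applied to the sample file, the first pattern can be tried out with a Python sketch like the one below. It assumes the file content is held in a single string and that re.MULTILINE is used so ^ matches at the start of every line; the variable names and the shortened header row are illustrative.

```python
import re

# Shortened copy of the sample data from the question (header + 2 rows).
SAMPLE = """Address Number,Name,Employee Master Exist(Y/N)
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
"""

# MULTILINE lets ^ anchor at each line start instead of only the string start.
NAME_FIELD = re.compile(r"^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),", re.MULTILINE)

names = NAME_FIELD.findall(SAMPLE)

if __name__ == "__main__":
    print(names)
```

The header row is skipped because it does not start with digits.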
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
See it at Regex 101
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
See alternative regex
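The difference between the + and * versions shows up when a row has an empty name field. A minimal Python sketch follows, assuming the data is held in one string and re.MULTILINE is enabled; the row with the empty name is made up for illustration.

```python
import re

# Second row has an empty Name field (hypothetical data).
ROWS = """4398,Presley Elvis Aaron,Y,N
10154,,Y,Y
"""

REQUIRED_NAME = re.compile(r"^\d+,([^,]+),", re.MULTILINE)
OPTIONAL_NAME = re.compile(r"^\d+,([^,]*),", re.MULTILINE)

required = REQUIRED_NAME.findall(ROWS)  # rows with an empty name are skipped
optional = OPTIONAL_NAME.findall(ROWS)  # empty names come back as ""

if __name__ == "__main__":
    print("with +:", required)
    print("with *:", optional)
```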