Regex start with any number and it should end without zeros - regex

I am trying to create a Regex with groups that should group 1234.0500- to 1234.05-.
What I have tried is:
^([0-9]+)(\.)([1-9]*)0*(-?)$
but it does not match 1234.0500-. Here is the example https://regex101.com/r/koSZoB/1. The regex should also group
1234.0000
0.9000
to
1234
0.9

In your pattern, this part ([1-9]*)0*(-?)$ matches optional digits 1-9 followed by optional zeroes and then an optional hyphen at the end of the string. It will succeed until the first zero:
0500
^
But the match will fail as it can not match (-?)$
You could use 3 capturing groups and use those in the replacement.
After group 1, you could either match a dot followed by only zeroes which should be removed, or capture in group 2 matching from the dot till the lats digits 1-9 and remove the trailing zeroes.
^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$
Explanation
^ Start of string
(\d+) Capture group 1, match 1+ digits
(?: Non capture group, match either
\.0+ Match a . and 1+ zeroes
| Or
(\.\d*[1-9])0+ Capture ., 0+ digits followed by a digit 1-9 and match the following 1+ zeroes to be removed
) Close group
(-?) Capture optional -
$ End of string
Regex demo
There is no language tagged, but for example in Javascript
const pattern = /^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$/;
[
"1234.0500-",
"1234.05500-",
"1234.0550588500-",
"1234.0000",
"0.9000",
"12.1222",
"12.1222-",
].forEach(s => console.log(s.replace(pattern, "$1$2$3")));

The third capture group doesn't include zeroes meaning that the 0 in 05 is making the match fail.
I would suggest making the third capture group non-greedy by adding a ?: ^([0-9]+)(\.)([0-9]*?)0*(-?)$ This will make it match the minimum amount of zeroes possible instead of the maximum. With the last group being greedy it should work.

Related

Regex match specific strings

I want to capture all the strings from multi lines data. Supposed here the result and here’s my code which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I’m lost with this part anyone can help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars
The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYX/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer on capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match ,and 1+ digits and optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
Regex demo
You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.

Regular Expression, optional character on multiple locations but result must contain at least once

I am performing a string search where I am looking for the following three strings:
XXX-99-X
XXX-99X
XXX99-X
So far I have:
([A-Z]{3}(-?)[0-9]{2}(-?)[A-Z]{1})
How do I enforce that - has to be present at least once in either of the two possible locations?
You might use an alternation, to match either a - and optional - at the left or - at the right part.
Note that you can omit {1} from the pattern.
^[A-Z]{3}(?:-[0-9]{2}-?|[0-9]{2}-)[A-Z]$
^[A-Z]{3}
(?: Non capture group
-[0-9]{2}-?|[0-9]{2}- Match either - 2 digits and optional - Or 2 digits and -
) Close non capture group
$ end of string
regex demo
Or use a positive lookahead to assert a - at the right
^(?=[^-\r\n]*-)[A-Z]{3}-?[0-9]{2}-?[A-Z]$
^ Start of string
(?=[^-\r\n]*-) Positive lookahead, assert a - at the right
[A-Z]{3}-? Match 3 chars A-Z and optional -
[0-9]{2}-? Match 2 digits and optional -
[A-Z] Match a single char A-Z
$ End of string
Regex demo
With your shown samples, please try following.
^[A-Z]{3}(?:-?\d{2}-|-\d{2})[A-Z]+$
online demo for above regex
Explanation: Adding detailed explanation for above.
^[A-Z]{3} ##Matching if value starts with 3 alphabets here.
(?: ##Starting a non capturing group here.
-?\d{2}- ##Matching -(optional) followed by 2 digits followed by -
|
-\d{2} ##Matching dash followed by 2 digits.
) ##Closing very first capturing group.
[A-Z]+$ ##Matching 1 or more occurrences of capital letters at the end of value.

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!
You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Demo
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
You could try:
^([^.]*\.){2}\K
Replace with 0. See an online demo
^ - Start line anchor.
([^.]*\.){2} - Negated character 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
Or/And if \K meta escape isn't supported, than see if the following does work:
^((?:[^.]*\.){2})
Replace with ${1}0. See the online demo
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
`Negated character 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
Using your pattern, you can use 2 capture groups and prepend the second group with a dot in the replacement like for example \g<1>0\g<2> or ${1}0${2} or $10$2 depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times any char except a dot, then match the dot
([^.].*) Capture group 2, match any char except a dot
Regex demo
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
Regex demo
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));
Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.

How can I extract non digit characters and digit characters in the end of a string?

I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to extract AG.IZTAPALAPA and 2 in the same capture group to extract like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
Yo can use a single capturing group for the example string to match a single "word" that consists of uppercase chars A-Z with an optional dot in the middle which can not be at the start or end followed by 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and a char A-Z followed by matching 1+ digits
) Close group 1
\b A word boundary
Regex demo
Or you can make the pattern a bit broader matching either a dot or a word character
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
Regex demo
You can use the following simple regex:
[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)
Note:
(?: [0-9]+)* will make it the last digital optional.

Is it possible to compare two values in a row and fetch the desired one, but both the values matches the regex written

text = "Happy 4/20 from the team! 13/10 congrats..after so many contents"
I want to fetch only 13/10 which is the rating. I have written regex
(\d+\.\d+|\d+)/(((?=10)10)|([1-9]\d+))
but it fetches the first one(4/20).
Is this possible to achieve using regex?
In this part of your pattern (?=10)10 you can omit the positive lookahead because that says if what is on the right is 10, then match 10. This part [1-9]\d+ matches 10 and above so 10 is already in the range.
You could use a capturing group with a quantifier {2} to repeat that group.
Your pattern can also be written as \d+(?:\.\d+)?/[1-9]\d+)
To get the second group, you might use:
^(?:.*?(\d+(?:\.\d+)?/[1-9]\d+)){2}
^ Start of the string
(?: Non capturing group
.*? Match any char non greedy
( Capturing group
\d+(?:\.\d+)? Match 1+ digits, optionally match a dot and 1+ digits
/ Match /
[1-9]\d+ Match 10 and above
) Close capturing group
){2} Close non capturing group and repeat 2 times
Regex demo