I would like to match an exact number in a string, but my regex also matches when that number is repeated back to back (for example, 2121 when I am looking for 21).
I have the following string:
SomePrefix1201-21,4,52
And I have the following regex to find a match for 21:
SomePrefix[\d]+-[,\d]*21[,$]*
It will match this string fine.
However, it also matches:
SomePrefix1201-2121,4,52
But I only want it to match if it is the exact number.
The number may appear at the end too, so it is not always followed by a comma.
I've been racking my brain like anything
Update
Based on the corrected answer below, I managed to find the exact regex I need, with one addition of a lookahead too.
SomePrefix[\d]+-([\d]*,)*21(?!\d)[,$]*
The [,\d]* part matches any number of digits and commas in any order. What you probably wanted was ([\d]*,)* so that any preceding digits and commas must end in a comma (not a digit, which would become a part of the number).
SomePrefix[^-]+-(\d+,)*(21,|21$)
Match the prefix, followed by one or more non-dash characters, then a dash, then zero or more comma-terminated digit fields, followed either by 21, (and possibly more material) or just 21 anchored to the end.
If the comma-terminated fields can be empty, then of course \d* rather than \d+.
Inside a character class, $ is normally treated as a literal dollar sign rather than as the end anchor (perhaps some regex implementations behave differently), so I distributed it out into two alternatives for 21, which reads clearly. The 21 can be factored out of this:
(21,|21$) -> 21(,|$)
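A minimal Python sketch of the factored pattern, run against the strings from the question. The helper name has_exact_21 and the third test string (with 21 at the end) are my own illustrations, not part of the original post.

```python
import re

# Factored form of the pattern above: 21 must be followed by a comma or the end.
PATTERN = re.compile(r"SomePrefix[^-]+-(\d+,)*21(,|$)")

def has_exact_21(text):
    """Return True if 21 appears as a whole comma-separated field (hypothetical helper)."""
    return PATTERN.search(text) is not None

print(has_exact_21("SomePrefix1201-21,4,52"))    # True
print(has_exact_21("SomePrefix1201-2121,4,52"))  # False
print(has_exact_21("SomePrefix1201-4,52,21"))    # True
```

Because every field before 21 has to end in a comma, 2121 can only be read as a single field and never as a field containing 21.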
Related
I have a string that has the following structure:
ABCD123456EFGHIJ78 but sometimes it's missing a number or a character like:
ABC123456EFGHIJ78 or
ABCD123456E or
ABCD12345EFGHIJ78
etc.
That's why I need regular expressions.
What I want to extract is the first letter of the third group, in this case 'E'.
I have the following regex:
(\D+)+(\d+)+(\D{1})\3
but I don't get the letter E.
This seems to work for the example cases you provided.
^(?:[A-Za-z]+)(?:\d+)(.)
It assumes that the first group is only letters and that the second group is only digits.
There's already a nice answer.
But for the record, your initial proposal was very close to working. You just needed to say that the character matched by the 3rd group can repeat zero or more times, by adding a star:
^(\D+)(\d+)(\D{1})\3*
The main weakness is that \D matches any character except digits, so it also matches spaces. Making it more robust means spelling out the range of accepted characters explicitly:
^([A-Za-z]+)(\d+)([A-Za-z]{1})\3*
That's much better, but my favourite uses \w* at the end of the pattern to consume any remaining word characters (letters, digits, underscore):
([A-Za-z]+)(\d+)([A-Za-z]{1})\w*
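As a quick check, here is a small Python sketch that applies the letters-then-digits-then-capture idea to the sample strings from the question. The function name first_letter_of_third_group is made up for this example, and the pattern assumes the first group is only ASCII letters and the second only digits.

```python
import re

# Letters, then digits, then capture the first letter that follows.
PATTERN = re.compile(r"^[A-Za-z]+\d+([A-Za-z])")

def first_letter_of_third_group(text):
    """Return the captured letter, or None when the string has no third group."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

samples = [
    "ABCD123456EFGHIJ78",
    "ABC123456EFGHIJ78",
    "ABCD123456E",
    "ABCD12345EFGHIJ78",
]
for sample in samples:
    print(sample, "->", first_letter_of_third_group(sample))
```

Only the one capturing group is needed here; the rest of the string does not have to be matched to get the letter.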
I'm trying to improve with regex as I'm tired of constantly having to look up existing solutions instead of creating my own. Having a bit of difficulty understanding why this isn't working though:
Trying to extract both phone numbers from the following string (numbers and address are random):
"+1-541-754-3010 156 Alphand_St. <J Steeve>\n 133, Green, Rd. <E Kustur> NY-56423 ;+1-541-914-3010\n"
So I'm using the following expression:
/\+(.+)(?:\s|\b)/
These are the matches I'm getting back:
1-541-754-3010 156 Alphand_St.
1-541-914-3010
So I'm getting the last one correctly, but not the first one. Based on the expression, it should match anything from between a + and a space/boundary. But for some reason it's not stopping at the space after the first number. Am I going about this the wrong way?
Given the format of the search string you provided, and since each number starts with a literal "+", I would just capture the run of digits and separators (such as the hyphen) that follows it:
/\+([0-9\-]+)/
Your ".+" says to match everything up to a \s. However, . also matches whitespace, and + is greedy, so the match runs past the earlier spaces and stops only at the last \s (or \b) it can reach on the line.
Remember that dashes (-) are not word characters, so \b matches on either side of each dash: between 1 and -, between - and 5, and so on. Also, your current regex is greedy: the repeated . tries to match as many characters as it can, which is why it runs all the way to the end of the first line (the position after the last character in the line matches \b). Making it lazy (with .+?) wouldn't fix it, though, because then it would stop right after the 1 in 1-541 (there is a word boundary between the 1 and the -).
Try using a character set of digits and - instead:
\+([\d-]+)
https://regex101.com/r/ktbcHJ/1
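For what it's worth, the same character-set idea can be tried in Python as below. This is only a sketch: the name extract_numbers is hypothetical, and it assumes the \n sequences in the question are real newlines.

```python
import re

TEXT = (
    "+1-541-754-3010 156 Alphand_St. <J Steeve>\n"
    " 133, Green, Rd. <E Kustur> NY-56423 ;+1-541-914-3010\n"
)

def extract_numbers(text):
    """Return the digits-and-dashes run that follows each literal plus sign."""
    return re.findall(r"\+([\d-]+)", text)

print(extract_numbers(TEXT))  # ['1-541-754-3010', '1-541-914-3010']
```

The class [\d-] cannot match a space, so each capture ends at the first character that is neither a digit nor a dash, with no need for \s or \b.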
I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period anywhere in the string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for, is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ Start of Line
\d+ one or more Digits followed by
(,\d{3})* zero, one or more times a , followed by three Digits followed by
(\.\d{1,2})? one or zero . followed by one or two Digits followed by
$ End of Line
This will only match valid Prices. The Comma (,) is not obligatory in this Regex, but it will be matched.
Look here: http://www.regextester.com/?fam=98001
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad the fractional part (element [1]) with zeros using str_pad() with STR_PAD_RIGHT. This makes calculations easier, especially when you work with JavaScript or other languages.
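The validate-then-convert steps could be sketched in Python roughly as follows. The answer mentions str_pad(); this sketch swaps in Python's str.ljust for the padding, and the function name to_cents is my own. It uses fullmatch instead of the ^ and $ anchors so that a trailing newline is not accepted.

```python
import re

# Same shape as the pattern above; fullmatch() plays the role of ^ and $.
PRICE = re.compile(r"\d+(,\d{3})*(\.\d{1,2})?")

def to_cents(text):
    """Return the price as an integer number of cents, or None if it is not valid."""
    if PRICE.fullmatch(text) is None:
        return None
    # Drop the thousands separators, then split into whole and fractional parts.
    whole, _, fraction = text.replace(",", "").partition(".")
    # Right-pad the fraction to two digits: "" -> "00", "5" -> "50".
    return int(whole) * 100 + int(fraction.ljust(2, "0"))

print(to_cents("1,234.56"))  # 123456
print(to_cents("1,234"))     # 123400
print(to_cents("12..50"))    # None
```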
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: the regex you provided does match 12 (without "."), because the decimal group (\.[0-9]+) is followed by ?. The version below drops that ?, so the decimal part becomes mandatory and 12 on its own no longer matches.
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$
I need some assistance constructing a regular expression in a ColdFusion application. I apologize if this has been asked. I have searched, but I may not be asking for the correct thing.
I am using the following to search an email subject line for an issue number:
reMatchNoCase("[0-9]{5}", mailCheck.subject)
The issue number contains only numeric values, and should be exactly 5 digits. This is working except in cases where I have a longer number that appears in the string, such as 34512345. It takes the first 5 digits of that string as a valid issue number as well.
What I want is to retrieve only 5 digit numbers, nothing shorter or longer. I am then placing these into a list to be looped over and processed. Do I perhaps need to include spaces before and after in the regex to get the desired result?
Thank you.
The general way to exclude content from occurring before/after a match is to use negative lookbehind before the match and a negative lookahead afterwards. To do this for numeric digits would be:
(?<!\d)\d{5}(?!\d)
(Where \d is the shorthand for [0-9])
CF's regex supports lookaheads, but unfortunately not lookbehinds, so that wouldn't work directly in rematch - however that probably doesn't matter in this case because it's likely that you don't want, for example, abc12345 to match either - so what you more likely want is:
\b\d{5}\b
Where \b is a "word boundary" - roughly, it checks for a change between a "word character" and a non-word character (or vice versa) - so in this case the first \b will check that there is NOT one of [a-zA-Z0-9_] before the first digit, and the second \b will check that there isn't one after the fifth digit. A \b does not add any characters to the match (i.e. it is a zero-width assertion).
Since you're not dealing with case, you don't need the NoCase variant of the function and can simply write:
rematch( '\b\d{5}\b' , mailCheck.subject )
The benefit of this over simply checking for spaces is that the result is five digits (no need to trim), but the downside is that it would match values such as [12345] or 3.14159^2 which are probably not what you want?
To check for spaces, or the start/end of the string, you can do:
rematch( '(?:^| )\d{5}(?= |$)' , mailCheck.subject )
Then use trim on each result to remove spaces.
If that's not what you're after, go ahead and provide more details.
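To see the difference between the lookaround version and the \b version, here is a small sketch in Python rather than ColdFusion (Python's re module does support lookbehind, so both patterns run there). The sample subject line and the function names are invented for illustration.

```python
import re

SUBJECT = "Re: issue 12345 (see 34512345, abc12345)"

def five_digits_lookaround(text):
    """Five digits with no digit directly before or after."""
    return re.findall(r"(?<!\d)\d{5}(?!\d)", text)

def five_digits_word_boundary(text):
    """Five digits with no word character directly before or after."""
    return re.findall(r"\b\d{5}\b", text)

print(five_digits_lookaround(SUBJECT))     # ['12345', '12345']
print(five_digits_word_boundary(SUBJECT))  # ['12345']
```

Both versions skip the eight-digit number, but only the \b version also skips abc12345, because there is no word boundary between the c and the 1.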
I want a regular expression to match a string that may or may not start with plus symbol and then contain any number of digits.
Those should be matched
+35423452354554
or
3423564564
This should work
\+?\d+
Matches an optional + followed by one or more digits
EDIT:
Per the OP's request for clarification: 3423kk55 is matched because its first part (3423) is matched. To match a whole string only, use this instead:
^\+?\d+$
It'll look something like this:
\+?\d+
The \+ means a literal plus sign, the ? means that the preceding group (the plus sign) can appear 0 or 1 times, \d indicates a digit character, and the final + requires that the preceding group (the digit) appears one or more times.
EDIT: When using regular expressions, bear in mind that there's a difference between find and matches (in Java at least, though most regex implementations have similar methods). find will find the substring somewhere in the owning string, and matches will try to match the entire string against the pattern, failing if there are extra characters before or after. Ensure you're using the right method, and remember that you can add a ^ to force the beginning of the line and a $ to force the end of the line (making the entire thing look like ^\+?\d+$).
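The find-versus-matches distinction above is described for Java; a rough Python equivalent, as a sketch, uses re.search for "find" and re.fullmatch for "matches". The two function names are made up for this example.

```python
import re

PATTERN = re.compile(r"\+?\d+")

def found_somewhere(text):
    """Like find: succeeds if any substring matches the pattern."""
    return PATTERN.search(text) is not None

def matches_entirely(text):
    """Like matches: succeeds only if the whole string matches the pattern."""
    return PATTERN.fullmatch(text) is not None

print(found_somewhere("3423kk55"))          # True
print(matches_entirely("3423kk55"))         # False
print(matches_entirely("+35423452354554"))  # True
```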
Simple ^\+?\d+$
Start of line, then 1 or 0 plus signs, followed by at least 1 digit, then end of line
A Perl regular expression for it could be: \+?\d+