Regex to match numbers, commas, dots, spaces in a given string - regex

I'm writing a regex to match any numbers, commas, dots, except when they are at the end of the number.
Here is an example of what I have so far:
/([0-9]+[., ]*)+/
This is pretty good already because it is matching what I want. The only issue is that it's matching ' ' or ',' '.' at the end of the expression too.
Let's say I have this string:
The cost of the food was 1 999,49 € without drinks.
I want to match the 1 999,49 string. Right now my regexp is matching 1 999,49 . The same should happen if the format of the price is different like:
1,999.49 $ => 1,999.49 (with no whitespace or anything in the end)
How can I do this with regular expressions?

You could first match the digits, then optionally repeat a single space, comma or dot followed by 1+ digits, so that a separator can never be the last character of the match.
\d+(?:[,. ]\d+)*
\d+ Match 1+ digits
(?: Non capture group
[,. ]\d+ Match either a space , or . and 1+ digits
)* Close group and repeat 0+ times
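As a quick check, the pattern can be run against the two sample strings from the question. This is a minimal Python sketch using the standard `re` module; the name `NUMBER` is only illustrative and not part of the original answer.

```python
import re

# Digits, optionally followed by groups of "one separator + more digits",
# so a separator can never be the last matched character.
NUMBER = re.compile(r"\d+(?:[,. ]\d+)*")

samples = [
    "The cost of the food was 1 999,49 € without drinks.",
    "1,999.49 $",
]

# Take the first match in each sample string.
matches = [NUMBER.search(s).group() for s in samples]
print(matches)  # ['1 999,49', '1,999.49']
```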
A bit more precise match could be
\b\d{1,3}(?:[,. ]\d{3})*(?:[.,]\d{2})?\b
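To see what "more precise" could mean in practice, here is a small Python comparison of the two patterns. The names `SIMPLE` and `PRECISE` and the string "room 12 34" are made up for illustration. The stricter pattern only accepts groups of exactly three digits after a separator, so it would not treat two unrelated numbers as one.

```python
import re

SIMPLE = re.compile(r"\d+(?:[,. ]\d+)*")
PRECISE = re.compile(r"\b\d{1,3}(?:[,. ]\d{3})*(?:[.,]\d{2})?\b")

# Both price formats from the question are found by the stricter pattern.
price_matches = PRECISE.findall("1 999,49 € or 1,999.49 $")
print(price_matches)  # ['1 999,49', '1,999.49']

# Two unrelated numbers separated by a space:
simple_result = SIMPLE.findall("room 12 34")
precise_result = PRECISE.findall("room 12 34")
print(simple_result)   # ['12 34']
print(precise_result)  # ['12', '34']
```

Note that the stricter pattern assumes thousands separators are always present, so an ungrouped number such as 12345 would not match at all.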

Related

Regex to list strings with no occurrence of special character '.' and having a space

I'm trying to filter out strings in project code which have the following form
'alphanumeric.alphanumeric.alphanumeric.alphanumeric'
(surrounded by quote and has one or more dots between alphanumeric words)
and another regex to find strings with the form
'this is a regular sentence with space'
I'm new to regex and have the following pattern which doesn't work. Which should mean:
(' + anything + . + anything + ')
/'*[^.]*'
I need multiple words with . connecting them.
The pattern that you tried, /'*[^.]*', matches a /, then optional occurrences of ', followed by optional chars other than a dot, and then a ', so a dot can never be matched.
You could use 2 separate patterns, one with a dot and one with a space at the start of the repeated group, and match alphanumerics with [^\W_]+, which excludes the underscore from the word characters. For the dotted form:
'[^\W_]+(?:\.[^\W_]+)+'
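A minimal Python sketch of the two-pattern approach follows. The names `ALNUM`, `DOTTED` and `SPACED` and the sample line of code are assumptions made for illustration; the spaced variant simply swaps the escaped dot for a space.

```python
import re

ALNUM = r"[^\W_]+"  # word characters with the underscore excluded

# Quoted string of alphanumeric words joined by dots.
DOTTED = re.compile(rf"'{ALNUM}(?:\.{ALNUM})+'")
# Quoted string of alphanumeric words joined by single spaces.
SPACED = re.compile(rf"'{ALNUM}(?: {ALNUM})+'")

code = "x = 'alpha1.beta2.gamma3.delta4'; y = 'this is a regular sentence with space'"

dotted_hits = DOTTED.findall(code)
spaced_hits = SPACED.findall(code)
print(dotted_hits)  # ["'alpha1.beta2.gamma3.delta4'"]
print(spaced_hits)  # ["'this is a regular sentence with space'"]
```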
Another option is to use a capture group matching either a dot or space and use a backreference in the repetition and match any letter or any number:
'[\p{L}\p{N}]+([.\p{Zs}\t])[\p{L}\p{N}]+(?:\1[\p{L}\p{N}]+)*'
' Match literally
[\p{L}\p{N}]+ Match 1+ alphanumerics
([.\p{Zs}\t])[\p{L}\p{N}]+ Capture group 1, match either . or a space and 1+ alphanumerics
(?:\1[\p{L}\p{N}]+)* Optionally match what is captured in group 1 using the backreference \1 followed by 1+ alphanumerics
' Match literally
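Python's standard `re` module does not support `\p{...}` classes, so the sketch below swaps in `[^\W_]` for `[\p{L}\p{N}]` and a plain space for `\p{Zs}`. Those substitutions are assumptions of this example; the backreference logic is unchanged. The helper name `is_consistent` is hypothetical.

```python
import re

# Group 1 captures the first separator; \1 forces every later separator
# to be the same character.
SAME_SEPARATOR = re.compile(r"'[^\W_]+([. \t])[^\W_]+(?:\1[^\W_]+)*'")

def is_consistent(text: str) -> bool:
    """Return True if the whole quoted string uses one separator throughout."""
    return SAME_SEPARATOR.fullmatch(text) is not None

print(is_consistent("'a.b.c'"))               # True
print(is_consistent("'this is a sentence'"))  # True
print(is_consistent("'a.b c'"))               # False (mixed separators)
```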

Regex for extracting digits in a string not in a word and not separated by a symbol?

I want to extract an ID from a search query but I don't know the length of the ID.
From this input I want to get the numbers that are not in the words and the numbers that are not separated by symbols.
12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1
Here I want to return
12 11231390 1391389 12331 83919 1
So far I've tried /\b[^\D]\d*[^\D]\b/gm but I get the numbers in between the symbols and I don't get the 1 at the end.
You could repeatedly match digits between whitespace boundaries. Using a word boundary \b would give you partial matches.
Note that [^\D] is the same as \d and would expect at least a single character.
Your pattern can be written as \b\d\d*\d\b and you can see that you don't get the 1 at the end as your pattern matches at least 2 digits.
(?<!\S)\d+(?:\s+\d+)*(?!\S)
The pattern matches:
(?<!\S) Negative lookbehind, assert a whitespace boundary to the left
\d+(?:\s+\d+)* Match 1+ digits and optionally repeat matching 1+ whitespace chars and 1+ digits.
(?!\S) Negative lookahead, assert a whitespace boundary to the right
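Run against the input from the question, the lookaround pattern could be used as in this Python sketch. The names `WS_BOUNDED`, `runs` and `ids` are illustrative. Because the pattern may span several numbers separated by whitespace, each match is split afterwards to get the individual IDs.

```python
import re

WS_BOUNDED = re.compile(r"(?<!\S)\d+(?:\s+\d+)*(?!\S)")

text = ("12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 "
        "12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1")

# Each match is a run of whitespace-separated, digits-only tokens.
runs = WS_BOUNDED.findall(text)
print(runs)  # ['12 11231390', '1391389', '12331', '83919 1']

# Split the runs to get one entry per number.
ids = [number for run in runs for number in run.split()]
print(ids)  # ['12', '11231390', '1391389', '12331', '83919', '1']
```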
If lookarounds are not supported, you could use a match with a capture group
(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)

Regex match an optional number of digits

I have a list that could look sort of like
("!Goal 27' Edward Nketiah"),
("!Goal 33' 46' Pierre Emerick-Aubameyang"),
("!Sub Nicolas Pepe"),
("Jordan Pickford"),
and I'm looking to match either !Sub or !Goal 33' 46' or !Goal 27'
Right now I'm using the regex (!\w+\s) which will match !Goal and !Sub, but I want to be able to get the timestamps too. Is there an easy way to do that? There is no limit on the number of timestamps there could be.
As I mentioned in my comment, you can use the following regex to accomplish this:
(!\w+(?:\s\d+')*)
Explanation:
(!\w+(?:\s\d+')*) capture the following
! matches this character literally
\w+ matches one or more word characters
(?:\s\d+')* match the following non-capture group zero or more times
\s match a whitespace character
\d+ matches one or more digits
' match this character literally
Additionally, the first capture group isn't necessary - you can remove it to simply match:
!\w+(?:\s\d+')*
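If you need each timestamp, you can use !\w+((?:\s\d+')*) and split capture group 1 on the space character. (With !\w+(\s\d+')*, the repeated capture group keeps only its last repetition in most engines.)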
If your input always follows the format "bang text blank digits apostrophe blank digits apostrophe etc", then it should be as simple as:
!\w+(?:\s\d+')*
Explanation:
! matches an exclamation mark
\w+ matches 1 or more word-characters (letters, digits, underscores)
(?:…) is a non-capturing group
\s matches a single whitespace character
\d+ matches one or more digits
' matches the apostrophe character
* repeatedly matches the group 0 or more times
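For illustration, here is how the pattern might be applied to the sample list in Python. The names `EVENT`, `lines` and `events` are assumptions of this sketch, and the second `findall` is just one possible way to pull out the individual timestamps.

```python
import re

EVENT = re.compile(r"!\w+(?:\s\d+')*")

lines = [
    "!Goal 27' Edward Nketiah",
    "!Goal 33' 46' Pierre Emerick-Aubameyang",
    "!Sub Nicolas Pepe",
    "Jordan Pickford",
]

events = []
for line in lines:
    m = EVENT.match(line)  # anchored at the start of each entry
    if m:
        # Store the matched prefix together with its individual timestamps.
        events.append((m.group(), re.findall(r"\d+'", m.group())))

print(events)
# [("!Goal 27'", ["27'"]), ("!Goal 33' 46'", ["33'", "46'"]), ('!Sub', [])]
```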
This:
(!\w+(?:\s\d+')*)
will capture:
"!Goal 27'"
"!Goal 33' 46'"
"!Sub"

how to match a list of fixed length words separated by space or comma?

Each word's length can be 2 or 6-10, and the words can be separated by a space or a comma. The words contain only letters and are not case sensitive.
Here is the groups of words that should be matched:
RE,re,rereRE
Not matching groups:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
But unfortunately this pattern can also match a string like this: RE,RERE
It looks like the word boundary has not been set.
You could match chars a-z either 2 or 6-10 times using an alternation.
Then repeat that pattern 0+ times, each time preceded by a comma or a space [ ,].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6 -10 or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
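The anchored pattern can be checked against the examples from the question with a short Python sketch. Building the pattern from a `WORD` fragment is a choice made here to avoid repeating the alternation; the resulting regex is the same as the one above.

```python
import re

# One word: either 6-10 letters or exactly 2 letters.
WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"
# A word, then any number of "separator + word" groups, anchored at both ends.
WORD_LIST = re.compile(rf"^{WORD}(?:[, ]{WORD})*$")

results = {
    s: bool(WORD_LIST.match(s))
    for s in ["RE,re,rereRE", "RE,rere,rel", "RE,RERE"]
}
print(results)
# {'RE,re,rereRE': True, 'RE,rere,rel': False, 'RE,RERE': False}
```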
If lookarounds are supported, you might also assert what is directly on the left and on the right is not a non whitespace character \S.
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))
This one will match only the words that have 2 letters, or between 6 and 10.
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b
You can use this
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.

Regex: Detect Phone numbers that are separated by dashes (-) and/or spaces

I am trying to recognize these types of phone number inputs:
0172665476
+6265476393
+62-65476393
+62-654-76393
+62 65476393
While my regex: (?:\d+\s*)+ can recognize the 1st 2 sample values, it recognizes the last 3 sample values as multiple matches in each line, instead of recognizing the number as a whole.
How can I modify this to support multiple dashes and/or spaces and still recognize it as 1 whole number instead of multiple matches?
You may use this regex:
^\+?\d+(?:[\s-]\d+)*\b
RegEx Details:
^\+?: Match optional + at start
\d+: match 1+ digits
(?:[\s-]\d+)*: Match 0 or more groups that start with whitespace or - followed by 1+ digits
\b: Word boundary (used instead of $, because with $ a number followed by trailing spaces would not match.)
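A small Python sketch of this pattern applied to the sample inputs, one string per number. The names `PHONE` and `samples` are illustrative, and the last sample with trailing spaces was added here to show the effect of the word boundary.

```python
import re

PHONE = re.compile(r"^\+?\d+(?:[\s-]\d+)*\b")

samples = [
    "0172665476",
    "+6265476393",
    "+62-65476393",
    "+62-654-76393",
    "+62 65476393",
    "+62 65476393   ",  # trailing spaces, not from the question
]

# Each sample yields a single match covering the whole number.
matched = [PHONE.match(s).group() for s in samples]
print(matched)
# ['0172665476', '+6265476393', '+62-65476393', '+62-654-76393',
#  '+62 65476393', '+62 65476393']
```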
This should work:
(?:[\d +-]+)+
This would work as per your requirement (if there are trailing spaces, this regex will ignore them):
Regex: '^(?:[\d +-]+)\b'
Another option could be to use an alternation to match either 10 digits without a leading plus sign or match the pattern with a +, and optional space or hyphen:
(?:\d{10}|\+\d{2}[- ]?\d{3}-?\d{5})\b
That will match:
(?: Non capturing group
\d{10} Match 10 digits
| Or
\+\d{2}[- ]?\d{3}-?\d{5} Match +, 2 digits, optional space or -, 3 digits, optional -, 5 digits
)\b Close non capturing group and word boundary
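The stricter alternation can be tried in Python as sketched below. The name `STRICT_PHONE` and the rejected input "+62-6547-6393" are assumptions added to show that a different digit grouping is not accepted.

```python
import re

STRICT_PHONE = re.compile(r"(?:\d{10}|\+\d{2}[- ]?\d{3}-?\d{5})\b")

samples = [
    "0172665476",
    "+6265476393",
    "+62-65476393",
    "+62-654-76393",
    "+62 65476393",
]

# Every sample from the question matches in full.
accepted = [s for s in samples if STRICT_PHONE.fullmatch(s)]
print(accepted == samples)  # True

# A number grouped differently is rejected.
rejected = STRICT_PHONE.fullmatch("+62-6547-6393")
print(rejected)  # None
```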
If your language supports negative lookbehinds you could prepend (?<!\S) which checks that what comes before is not a non-whitespace character.