I am trying to write a regular expression in Java to parse financial entities from strings. Numbers ending with "." or "," should be removed, like
15,
15.
whereas values like
15,303 (currency)
15.55 (rate)
should be kept.
This should do it:
/^\d+[,.]$/
You can play with it here.
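As a rough check of that anchored pattern, here is a minimal sketch in Python rather than Java. It assumes each candidate number has already been split out as its own token; the helper name is_dangling is made up for this illustration.

```python
import re

# Anchored pattern from the answer above: digits followed by one trailing "." or ","
DANGLING = re.compile(r"^\d+[,.]$")

def is_dangling(token: str) -> bool:
    """Return True for tokens such as '15,' or '15.' that should be removed."""
    return DANGLING.match(token) is not None

tokens = ["15,", "15.", "15,303", "15.55"]
# Keep only the tokens that do not end in a dangling separator
kept = [t for t in tokens if not is_dangling(t)]
print(kept)  # ['15,303', '15.55']
```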
You might be looking for something like:
(\d+)[\.,][^\d]
Where the group captures digits followed by . or , and not continuing with other digit.
Use \d+(\.|,)\d+ for the values that should be kept.
To remove such numbers (example in Python, but should work in nearly any regex flavor):
>>> import re
>>> regex = re.compile(r"\d+[.,](?!\d)")
>>> regex.sub("", "15 15,0 15, 15. 15.0 15")
'15 15,0   15.0 15'
To find only "correct" numbers:
>>> regex = re.compile(r"\d+(?:[.,]\d+)?(?![\d.,])\b")
>>> regex.findall("15 15,0 15, 15. 15.0 15")
['15', '15,0', '15.0', '15']
I have a list of strings each telling me after how many iterations an algorithm converged.
string_list = [
"Converged after 1 iteration",
"Converged after 20 iterations",
"Converged after 7 iterations"
]
How can I extract the number of iterations? The result would be [1, 20, 7]. I tried with a regex. Apparently (?<=after )(.*)(?= iteration*) will give me anything in between "after" and "iteration", but then this doesn't work:
occursin(string_list[1], r"(?<=after )(.*)(?= iteration*)")
There's a great little Julia package that makes creating regexes easier called ReadableRegex, and as luck would have it the first example in the readme is an example of finding every integer in a string:
julia> using ReadableRegex
julia> reg = @compile look_for(
maybe(char_in("+-")) * one_or_more(DIGIT),
not_after = ".",
not_before = NON_SEPARATOR)
r"(?:(?<!\.)(?:(?:[+\-])?(?:\d)+))(?!\P{Z})"
That regex can now be broadcast over your list of strings:
julia> collect.(eachmatch.(reg, string_list))
3-element Vector{Vector{RegexMatch}}:
[RegexMatch("1")]
[RegexMatch("20")]
[RegexMatch("7")]
To extract information out of a regex, you want to use match and captures:
julia> convergeregex = r"Converged after (\d+) iteration"
r"Converged after (\d+) iteration"
julia> match(convergeregex, string_list[2]).captures[1]
"20"
julia> parse.(Int, [match(convergeregex, s).captures[1] for s in string_list])
3-element Vector{Int64}:
1
20
7
\d+ matches a series of digits (so, the number of iterations here), and the parentheses around it indicate that you want the part of the string matched by it to be placed in the result's captures array.
You don't need the lookbehind and lookahead operators (?<=, ?=) here.
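For comparison, the same capture-group idea can be sketched in Python. This is only an illustration of the technique, not part of the Julia answer; the variable names are assumptions.

```python
import re

string_list = [
    "Converged after 1 iteration",
    "Converged after 20 iterations",
    "Converged after 7 iterations",
]

# One capture group around the digits; no lookbehind or lookahead needed
converge_re = re.compile(r"Converged after (\d+) iteration")

iterations = []
for s in string_list:
    m = converge_re.search(s)
    if m is not None:
        # group(1) is the text matched by the first parenthesized group
        iterations.append(int(m.group(1)))

print(iterations)  # [1, 20, 7]
```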
How to select for example:
0
0,5
2,5
3,5
But not:
3,6
3,52
In python, it's simple:
import re
PATTERN = re.compile(r'(?<![\d.,])([0-2]([.,]\d+)?|3([.,]([0-4]\d*|50*))?)(?![\d,.])')
number_str = input()
if PATTERN.match(number_str) is not None:
    print('Do something')
else:
    print('It\'s not a match')
[0-2] will match a leading digit from 0 to 2.
[.,] will match the separator between the cases.
\d+ will match any natural number.
([.,]\d+)? will match if exists, a separator followed by any natural number.
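To see how the pieces combine, here is a small sketch that runs the same pattern over the sample values from the question. The two lists are just the examples given there.

```python
import re

# Same pattern as in the answer above
PATTERN = re.compile(r'(?<![\d.,])([0-2]([.,]\d+)?|3([.,]([0-4]\d*|50*))?)(?![\d,.])')

should_match = ["0", "0,5", "2,5", "3,5"]
should_not_match = ["3,6", "3,52"]

# Collect every sample the pattern accepts
accepted = [s for s in should_match + should_not_match if PATTERN.match(s)]
print(accepted)  # ['0', '0,5', '2,5', '3,5']
```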
Please take a look at the following links (they may help you with regex):
re module
regex101 helper
While it's possible (demo link):
\b
(?<!-)
(?:
(?:[0-3](?!,))
|
(?:[0-2],\d+)
|
(?:3,(?:5(?!\d)|[0-4]\d*))
)
\b
don't use it: convert the numbers to floats and compare them programmatically.
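A minimal sketch of that float-based approach, assuming the accepted range is 0 to 3.5 inclusive and that the comma is a decimal separator (both inferred from the examples in the question). The function name in_range is hypothetical.

```python
def in_range(number_str: str, low: float = 0.0, high: float = 3.5) -> bool:
    """Parse a string like '2,5' as a float and test it against [low, high]."""
    try:
        # Assumption: ',' is used as the decimal separator
        value = float(number_str.replace(",", "."))
    except ValueError:
        # Not a number at all
        return False
    return low <= value <= high

candidates = ["0", "0,5", "2,5", "3,5", "3,6", "3,52"]
selected = [s for s in candidates if in_range(s)]
print(selected)  # ['0', '0,5', '2,5', '3,5']
```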
It would probably be better to iterate over a list of the numbers and use a list comprehension of some sort to extract what you need:
nums = [0, 0.5, 2.5, 3.5, 2.6, 8.4, 9.1, 7.5]
my_nums = [num for num in nums if num%0.5 == 0]
print(my_nums)
>>> [0, 0.5, 2.5, 3.5, 7.5]
I am trying to extract 10 digit phone numbers from string. In some cases the numbers are separated by space after 2 or 5 digits. How do I merge such numbers to get the final count of 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall(r'\d+', mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know if a regular expression can be used for this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match. A regex findall can only return some number of slices of the input string. It cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
I doubt you can achieve it with a regex pattern alone. Maybe just use a pattern to get runs of 10 or more digits and spaces, and then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
credit goes to commenter jsonharper
\d{2} ?\d{3} ?\d{5}
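As a sketch of how that last pattern could be combined with the clean-up step from the earlier answer, assuming the sample string from the question:

```python
import re

mystr = '(R) 98198 38466 (some Text) 9702977470'

# Ten digits with an optional space after the 2nd and after the 5th digit
PHONE = re.compile(r'\d{2} ?\d{3} ?\d{5}')

# findall returns the raw matches; strip any non-digits afterwards
numbers = [re.sub(r'\D', '', m) for m in PHONE.findall(mystr)]
print(numbers)  # ['9819838466', '9702977470']
```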
I want to remove numbers (integers and floats) from a character vector, preserving dates:
"I'd like to delete numbers like 84 and 0.5 but not dates like 2015"
I would like to get:
"I'd like to delete numbers like and but not dates like 2015"
In English a quick and dirty rule could be: if the number starts with 18, 19, or 20 and has length 4, don't delete.
I asked the same question in Python and the answer was very satisfying (\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?).
However, when I pass the same regex to gsub in R:
gsub("[\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?]"," ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015")
I get:
Error: '\d' is an unrecognized escape in character string starting ""\b(?!(?:18|19|20)\d"
As I mentioned in my comments, the main points here are:
regex pattern should be placed outside the character class to be treated as a sequence of subpatterns and not as separate symbols inside the class
the backslashes must be doubled in R regex patterns (since it uses C strings where \ is used to escape entities like \n, \r, etc)
and also you need to use perl=T with patterns featuring lookarounds (you are using lookaheads in yours)
Use
gsub("\\b(?!(?:18|19|20)\\d{2}\\b(?!\\.\\d))\\d*\\.?\\d+\\b"," ", "I'd like to delete numbers like 84 and 0.5 but not dates like 2015", perl=T)
See IDEONE demo.
To search and replace in R you can use:
gsub("\\b(?!(?:18|19|20)\\p{Nd}{2}\\b(?!\\.\\p{Nd}))\\p{Nd}*\\.?", "replacement_text_here", subject, perl=TRUE);
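For reference, the first corrected pattern can be sketched in Python, where a raw string avoids the doubled backslashes. Collapsing the leftover whitespace at the end is an extra step added here so the result matches the desired output in the question.

```python
import re

# Same pattern as the corrected gsub call above, written as a raw string
NUMBER_NOT_YEAR = re.compile(r"\b(?!(?:18|19|20)\d{2}\b(?!\.\d))\d*\.?\d+\b")

text = "I'd like to delete numbers like 84 and 0.5 but not dates like 2015"
without_numbers = NUMBER_NOT_YEAR.sub("", text)

# Collapse the runs of spaces left behind by the removed numbers
cleaned = " ".join(without_numbers.split())
print(cleaned)  # I'd like to delete numbers like and but not dates like 2015
```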
I want to match a number between 2-16, spanning 1 digit to 2 digits.
Regular-Expressions.info has examples for 1 or 2 digit ranges, but not something that spans both:
The regex [0-9] matches single-digit numbers 0 to 9. [1-9][0-9] matches double-digit numbers 10 to 99.
Something like ^[2-9][1-6]$ matches 21 or even 96! Any help would be appreciated.
^([2-9]|1[0-6])$
will match either a single digit between 2 and 9 inclusive, or a 1 followed by a digit between 0 and 6, inclusive.
With delimiters (out of habit): /^([2-9]|1[0-6])$/
The regex itself is just: ^([2-9]|1[0-6])$
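A quick way to convince yourself of that, sketched in Python: run the pattern over every integer from 0 to 99 and check that exactly 2 through 16 are accepted.

```python
import re

RANGE_2_16 = re.compile(r"^([2-9]|1[0-6])$")

# Collect every integer in 0..99 whose decimal form the pattern accepts
matched = [n for n in range(0, 100) if RANGE_2_16.match(str(n))]
print(matched == list(range(2, 17)))  # True
```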
Use the python package regex_engine for generating regular expressions for numerical ranges
You can install this package with pip.
pip install regex-engine
from regex_engine import generator
generate = generator()
regex = generate.numerical_range(2, 16)
print(regex)
^([2-9]|1[0-6])$
You can also generate regexes for floating point and negative ranges.
from regex_engine import generator
generate = generator()
regex1 = generate.numerical_range(5, 89)
regex2 = generate.numerical_range(81.78, 250.23)
regex3 = generate.numerical_range(-65, 12)
^([2-9]|1[0-6])$
(Edit: Removed quotes for clarification.)
Just replace your input formatters with
inputFormatters: [
  FilteringTextInputFormatter(
    RegExp(
        r'^([0-9]|[2-8][0-9]|1[0-9]|9[0-9]|[2-8][0-9][0-9]|1[1-9][0-9]|10[0-9]|9[0-8][0-9]|99[0-9]|[2-4][0-9][0-9][0-9]|1[1-9][0-9][0-9]|10[1-9][0-9]|100[0-9]|500[0-0])$'),
    allow: true,
  )
],
(^[2-9]$|^1[0-6]$)
By specifying the start and end for each set of numbers you are looking for, your regex won't also return 36, 46, and so on. I tried the above solution and found that this works best for staying within the range of 2-16.