I need to extract all characters or numbers that precede a given word, using regex.
Case 1
Sample Data 1
11.01.2022 Belegdatum
I need to get the date (11.01.2022) before Belegdatum
Case 2
Sample Data 2
532,53 0,00 0,00 532,53 EUR 0034906 38436 DEMMMM
Sample Data 3
532,53 0,00 0,00 4567,00 EUR 0034906 38436 DEMMMM
I need to get the data 532,53 and EUR. That means: search for the currency (EUR) and get the amount just before it, 532,53 here or 4567,00 in the case of sample data 3. The amount can have from 3 to 5 digits before the comma (,).
Thank you in advance for the great support.
For both cases, you need to use a positive lookahead.
case 1:
(\d{2}\.\d{2}\.\d{4})(?=\s?Belegdatum)
You can change the date part of the regex to whatever date format you have.
case 2:
considering there are two digits after the comma, the regex to match samples 2 and 3 will be:
(\d{3,5},\d{2})(?=\s?EUR)
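As a rough illustration, the two lookahead patterns above can be tried out in Python against the sample data from the question. The helper name `find_before` is made up for this sketch and is not part of the answer.

```python
import re

# Patterns copied from the answer above. The lookahead (?=...) only checks
# that the keyword follows; the keyword is not part of the match.
DATE_BEFORE_BELEGDATUM = re.compile(r"(\d{2}\.\d{2}\.\d{4})(?=\s?Belegdatum)")
AMOUNT_BEFORE_EUR = re.compile(r"(\d{3,5},\d{2})(?=\s?EUR)")


def find_before(pattern, text):
    """Return the first match of `pattern` in `text`, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1) if match else None


sample_1 = "11.01.2022 Belegdatum"
sample_2 = "532,53 0,00 0,00 532,53 EUR 0034906 38436 DEMMMM"
sample_3 = "532,53 0,00 0,00 4567,00 EUR 0034906 38436 DEMMMM"

print(find_before(DATE_BEFORE_BELEGDATUM, sample_1))
print(find_before(AMOUNT_BEFORE_EUR, sample_2))
print(find_before(AMOUNT_BEFORE_EUR, sample_3))
```

Only the amount directly in front of EUR is returned; the earlier 532,53 at the start of the line is skipped because it is not followed by the currency.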
We have a "street_number" field that has been filled in freely over the years and that we now want to format. Using regular expressions, we'd like to extract the real "street_number" and the "street_number_suffix".
Ex: for 17 b, "street_number" would be 17 and "street_number_suffix" would be b.
As there are a dozen different patterns, I'm having trouble tuning the regular expression correctly. I'm considering using 2 different regexes, one to extract the "street_number" and another to extract the "street_number_suffix".
Here's an exhaustive set of patterns we'd like to format and the expected output:
# Extract street_number using PCRE
input    street_number    street_number_suffix
19-21    19               null
2 G      2                G
A        null             A
1 bis    1                bis
3 C      3                C
N°10     10               null
17 b     17               b
76 B     76               B
7 ter    7                ter
9/11     9                null
21.3     21               3
42       42               null
I know I could use an expression that matches any digits up to a hyphen: \d+(?=\-).
It could be extended to match up to a hyphen OR a slash using \d+(?=\-|\/), though once I add \s to this pattern, the 21 from 19-21 will match as well. Adding conditions may not be that simple, which is why I'm asking for your help.
Could anyone give me a helping hand with this? If it helps, here's a draft: https://regex101.com/r/jGK5Sa/4
Edit: at the time I'm editing, here's the closest regex I could find:
(?:(N°|(?<!\-|\/|\.|[a-z]|.{1})))\d+
However, the full match for N°10 isn't 10 but N°10 (and our ETL doesn't support capturing groups, so I can't use /......(\d+)/).
To get the street numbers, you could update the pattern to:
(?<![-/.a-z\d])\d+
Explanation
(?<! Negative lookbehind
[-/.a-z\d] Match any of the listed characters using a character class
) Close the negative lookbehind
\d+ Match 1+ digits
Regex demo
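A quick way to check the lookbehind pattern against the inputs from the question is a small Python loop. This is only a sketch: the question targets PCRE inside an ETL tool, and the function name `street_number` is an assumption made for the example.

```python
import re

# Digits that are NOT preceded by '-', '/', '.', a lowercase letter or a digit.
# Only the full match is used, so no capturing group is needed.
STREET_NUMBER = re.compile(r"(?<![-/.a-z\d])\d+")


def street_number(raw):
    """Return the street number found in `raw`, or None if there is none."""
    match = STREET_NUMBER.search(raw)
    return match.group(0) if match else None


samples = ["19-21", "2 G", "A", "1 bis", "3 C", "N°10",
           "17 b", "76 B", "7 ter", "9/11", "21.3", "42"]
for raw in samples:
    print(repr(raw), "->", street_number(raw))
```

The lookbehind character class has a fixed width of one character, so it is also accepted by engines such as Python's `re` that only allow fixed-width lookbehinds.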
I have a requirement to restrict numbers to the range 3.00 to 100.00.
I used the expression below:
^([3-9]|[1-9][0-9]|100)+(\.\d{1,2})?$
The issue with the above expression is that it allows 100.01 to 100.99, which should be rejected. It also allows 310 to 399, which needs to be rejected as well.
I tried another variant of the same expression:
^([3-9]|[1-9][0-9]|100.00)+(\.\d{1,2})?$
This works as expected, but 100.00 has to be entered to pass the regular expression; a plain 100 is rejected.
Is there any way I can achieve the desired result?
When alternating with the final 100, use negative lookahead for \.\d?[1-9], to ensure that the decimal places, if any, have only 0s.
Your first pattern can also match many repeated digits before the optional decimal (like 333 and 101010) due to the + at the end of the group, so best to remove the + if you only want to match between 3 and 100.
^(?:[3-9]|[1-9][0-9]|100(?!\.\d?[1-9]))(?:\.\d{1,2})?$
                        ^^^^^^^^^^^^^^
https://regex101.com/r/tJd3LQ/1
To permit leading zeros, add 0* right after the ^:
^0*(?:[3-9]|[1-9][0-9]|100(?!\.\d?[1-9]))(?:\.\d{1,2})?$
 ^^
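To see the effect of the negative lookahead, the first pattern above (without the leading-zero variant) can be run over a few boundary values in Python. This is a minimal sketch; the function name `in_range` is invented for the example.

```python
import re

# 3-9, 10-99, or 100 as long as it is not followed by a non-zero decimal.
IN_RANGE = re.compile(r"^(?:[3-9]|[1-9][0-9]|100(?!\.\d?[1-9]))(?:\.\d{1,2})?$")


def in_range(text):
    """True if `text` is a number from 3 to 100 with at most two decimals."""
    return IN_RANGE.match(text) is not None


for candidate in ["3", "3.00", "99.99", "100", "100.0", "100.00",
                  "2.99", "100.01", "100.10", "310", "333", "3.001"]:
    print(candidate, in_range(candidate))
```

The values up to and including 100.00 are accepted; 2.99, 100.01, 100.10, 310, 333 and 3.001 are rejected.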
You can try this mate
^(?:100(?:\.0+)?|(?:[3-9]|[1-9][0-9])(?:\.\d{1,2})?)$
Demo
Explanation
^ - Anchor represent start of string.
(?:100(?:\.0+)?) - This will match 100, 100.0, 100.00 (any number of decimal zeros).
| - Alternation; this works the same as a logical OR.
(?:[3-9]|[1-9][0-9])(?:\.\d{1,2})? - This will match any number from 3.00 to 99.99
Suggestion
Always use a non-capturing group if you are not using the group anywhere else in your regex.
The problem with ^([3-9]|[1-9][0-9]|100)+(\.\d{1,2})?$ is that it also allows 100 in the first chunk (the chunk responsible for whole numbers), so 100 can be followed by any decimals.
In your case, you would need to use something like this: ^(?:100\.00|(?:[3-9]|[1-9][0-9])(?:\.\d{1,2})?)$ (Example here). Both alternatives are anchored, and the + after the group is dropped so that repeated digits such as 333 are not accepted.
This expression will either match 100.00 as a whole (which is your upper bound) or else any number from 3.00 up to 99.99.
Maybe this regex will work for you:
^([3-9]|[1-9][0-9])(\.[0-9]+)?$|^100(\.0+)?$
Test:
$ cat numeric.txt
0.0
3
3.0
3.00
3.001
2.99
99.99
99.999
100
100.0
100.00
100.01
100.99
$ egrep '^([3-9]|[1-9][0-9])(\.[0-9]+)?$|^100(\.0+)?$' numeric.txt
3
3.0
3.00
3.001
99.99
99.999
100
100.0
100.00
I need to create a regex which captures all different types of currency such as:
£1
$100,000,000
€25.00
25p (pence)
25c (cents)
25m (million)
25bn (billion)
25 million
25 billion
£0.25
Currently, I've got the following:
^(([^0]{1})([0-9])*|(0{1}))(\.\d{2})?$
This works for:
£100
200 -- don't want this to be included
$200
€400
350 -- don't want this to be included
Any help please?
You could try a pattern like this:
^([£€$]([0-9]([0-9,])*)(\.\d{2})?|([0-9]([0-9,])*)(\.\d{2})?([pcm]|bn| [mb]illion))$
This will match either:
A £, €, or $ followed by a number which may contain commas, followed by an optional . followed by two more digits.
A number which may contain commas, followed by an optional . and two more digits, followed by p, c, m, bn, or a space and then million or billion.
Here's a demonstration
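The two branches described above can be checked in Python against the examples from the question. The sketch assumes that the number part is written as `[0-9][0-9,]*` in both branches, so that amounts of any length are accepted, and it uses non-capturing groups; `is_currency` is a hypothetical helper name.

```python
import re

CURRENCY = re.compile(
    # Branch 1: currency symbol, then the amount with optional 2 decimals.
    r"^(?:[£€$][0-9][0-9,]*(?:\.\d{2})?"
    # Branch 2: amount with optional 2 decimals, then a unit suffix.
    r"|[0-9][0-9,]*(?:\.\d{2})?(?:[pcm]|bn| [mb]illion))$"
)


def is_currency(text):
    """True if `text` looks like one of the currency formats in the question."""
    return CURRENCY.match(text) is not None


wanted = ["£1", "$100,000,000", "€25.00", "25p", "25c", "25m",
          "25bn", "25 million", "25 billion", "£0.25"]
unwanted = ["200", "350"]

for text in wanted + unwanted:
    print(repr(text), is_currency(text))
```

Bare numbers such as 200 and 350 are rejected because neither branch can match without a symbol in front or a unit behind.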
I am searching for a RegEx for prices.
It should allow any number of digits in front, then a "," and at most 2 digits at the end.
Can someone help me and post it, please?
In what language are you going to use it?
It should be something like:
^\d+(,\d{1,2})?$
Explanation:
The X digits in front are ^\d+, where ^ means the start of the string, \d means a digit and + means one or more.
We use a group () followed by a question mark; the ? means: match what is inside the group once or not at all.
Inside the group there is ,\d{1,2}: the , is the comma you wrote, \d is again a digit, and {1,2} means match the preceding digit one or two times.
The final $ matches the end of the string.
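Since the question does not say which language is used, here is a short Python check of the pattern explained above. It is only a sketch; the function name `is_price` is an assumption for the example.

```python
import re

# Digits, optionally followed by a comma and one or two more digits.
PRICE = re.compile(r"^\d+(,\d{1,2})?$")


def is_price(text):
    """True if `text` consists of digits with an optional ',d' or ',dd' part."""
    return PRICE.match(text) is not None


for candidate in ["12", "12,5", "12,50", "12,500", ",50", "12,", "12.50"]:
    print(repr(candidate), is_price(candidate))
```

The first three candidates pass; the others fail because of a third decimal digit, a missing integer part, a dangling comma, or a dot used as the decimal separator.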
I was not satisfied with the previous answers. Here is my take on it:
\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})
|^^^^^^|^^^^^^^^^^^^^|^^^^^^^^^^^|
| 1-3  |  3 digits   | 2 digits  |
|digits| repeat any  |           |
|      |   no. of    |           |
|      |   times     |           |
(get a detailed explanation here: https://regex101.com/r/cG6iO8/1)
Covers all cases below
5.00
1,000
1,000,000.99
5,99 (european price)
5.999,99 (european price)
0.11
0.00
But also weird stuff like
5.000,000.00
In case you want to include 5 and 1000 (I personally would not like to match ALL numbers), then just add a "?" like so:
\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?
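The answer does not say whether the pattern should be anchored. The Python sketch below assumes whole-string validation via `fullmatch`, which is a choice made for this example, and uses the variant with the optional decimal group; `looks_like_price` is a hypothetical name.

```python
import re

# 1-3 digits, any number of separator+3-digit groups, optional 2 decimals.
PRICE = re.compile(r"\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?")


def looks_like_price(text):
    """True if the whole string matches the price pattern."""
    return PRICE.fullmatch(text) is not None


for candidate in ["5", "5.00", "1,000", "1,000,000.99", "5,99", "5.999,99",
                  "0.11", "0.00", "5.000,000.00", "1000"]:
    print(repr(candidate), looks_like_price(candidate))
```

Under this whole-string assumption, an unseparated 1000 is rejected, because the pattern allows at most three digits before the first separator; an unanchored search would only pick up the leading 100.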
I am working on a similar problem. However, I only want to match if a currency symbol or string such as EUR, €, USD or $ is also included in the string. The symbol may be leading or trailing, and I don't care whether there is a space between the number and the currency substring. I based the number matching on the previous discussion and used this price pattern: \d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?
Here is final result:
(USD|EUR|€|\$)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$)
I use (\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$) as a pattern to match against a currency symbol (here with tolerance for a leading space). I think you can easily tweak it for any other currencies
A Gist with the latest Version can be found at https://gist.github.com/wischweh/b6c0ac878913cca8b1ba
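For illustration, the combined pattern from this answer can be wrapped in a small Python function that reports the currency and the amount regardless of which side the symbol is on. The pattern is copied as posted; the function name `extract_price` and the sample strings are assumptions made for this sketch.

```python
import re

PRICE_WITH_CURRENCY = re.compile(
    # Leading currency: groups 1 (currency) and 2 (amount).
    r"(USD|EUR|€|\$)\s?(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))"
    # Trailing currency: groups 3 (amount) and 4 (currency).
    r"|(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})?)\s?(USD|EUR|€|\$)"
)


def extract_price(text):
    """Return (currency, amount) for the first price in `text`, or None."""
    match = PRICE_WITH_CURRENCY.search(text)
    if match is None:
        return None
    if match.group(1) is not None:
        return match.group(1), match.group(2)
    return match.group(4), match.group(3)


for text in ["EUR 1.000,50", "Total: 12,99 €", "1,000.00USD", "$ 5.00", "12,99"]:
    print(repr(text), extract_price(text))
```

A number without any currency marker, such as the last sample, yields None, which is the behaviour the answer asks for.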
So I ran into a similar problem, needing to validate if an arbitrary string is a price, but needed a lot more resilience than the regexes provided in this thread and many other threads.
I needed a regex that would match all of the following:
5
5.00
1,000
1,000,000.99
5,99 (european price)
5.999,99 (european price)
0.11
0.00
And not to match stuff like IP addresses. I couldn't figure out a single regex to deal with the european and non-european stuff in one fell swoop so I wrote a little bit of Ruby code to normalise prices:
if value =~ /^([1-9][0-9]{,2}(,[0-9]{3})*|[0-9]+)(\.[0-9]{1,9})?$/
  Float(value.delete(","))
elsif value =~ /^([1-9][0-9]{,2}(\.[0-9]{3})*|[0-9]+)(,[0-9]{1,9})?$/
  Float(value.delete(".").gsub(",", "."))
else
  false
end
The only difference between the two regexes is the swapped decimal place and comma. I'll try and break down what this is doing:
/^([1-9][0-9]{,2}(,[0-9]{3})*|[0-9]+)(\.[0-9]{1,9})?$/
The first part:
([1-9][0-9]{,2}(,[0-9]{3})*
This is a statement of numbers that follow this form: 1,000 1,000,000 100 12. But it does not allow leading zeroes. It's for the properly formatted numbers that have groups of 3 numerics separated by the thousands separator.
Second part:
[0-9]+
Just match any number 1 or more times. You could make this 0 or more times if you want to match: .11 .34 .00 etc.
The last part:
(\.[0-9]{1,9})?
This is the decimal place bit. Why up to 9 numerics, you ask? I've seen it happen. This regex is supposed to be able to handle any weird and wonderful price it sees and I've seen some retailers use up to 9 decimal places in prices. Usually all 0s, but we wouldn't want to miss out on the data ^_^
Hopefully this helps the next person who comes along needing to process arbitrarily badly formatted price strings in either European or non-European format :)
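For readers who do not use Ruby, the same two-step normalisation can be sketched in Python. This is a loose port, not the author's code: it uses `fullmatch` instead of `^...$`, writes `{0,2}` instead of `{,2}`, and returns None rather than false when the value is not a price. The names `US_STYLE`, `EU_STYLE` and `normalise_price` are made up for the example.

```python
import re

# ',' as thousands separator and '.' as decimal point, e.g. 1,000,000.99
US_STYLE = re.compile(r"(?:[1-9][0-9]{0,2}(?:,[0-9]{3})*|[0-9]+)(?:\.[0-9]{1,9})?")
# '.' as thousands separator and ',' as decimal point, e.g. 5.999,99
EU_STYLE = re.compile(r"(?:[1-9][0-9]{0,2}(?:\.[0-9]{3})*|[0-9]+)(?:,[0-9]{1,9})?")


def normalise_price(value):
    """Return the price as a float, or None if `value` is not a price."""
    if US_STYLE.fullmatch(value):
        # Drop thousands separators; the decimal point is already '.'.
        return float(value.replace(",", ""))
    if EU_STYLE.fullmatch(value):
        # Drop thousands separators, then turn the decimal comma into '.'.
        return float(value.replace(".", "").replace(",", "."))
    return None


for value in ["5", "5.00", "1,000", "1,000,000.99", "5,99", "5.999,99",
              "0.11", "192.168.0.1"]:
    print(repr(value), normalise_price(value))
```

As in the Ruby version, the non-European pattern is tried first, so an ambiguous value such as 1,000 is read as one thousand.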
^\d+,\d{1,2}$
I am currently working on a small function that uses a regex to get the price amount from a String:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String getPrice(String input)
{
    String output = "";
    // 1-3 digits, an optional ',' or '.', then up to 2 more digits
    Pattern pattern = Pattern.compile("\\d{1,3}[,\\.]?(\\d{1,2})?");
    Matcher matcher = pattern.matcher(input);
    if (matcher.find())
    {
        // group(0) is the whole match
        output = matcher.group(0);
    }
    return output;
}
This seems to work with small prices (0,00 to 999,99) and various currencies:
$12.34 -> 12.34
$12,34 -> 12,34
$12.00 -> 12.00
$12 -> 12
12€ -> 12
12,11€ -> 12,11
12.999€ -> 12.99
12.9€ -> 12.9
£999.99€ -> 999.99
...
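The same pattern can be tried in Python to see where the stated 0,00 to 999,99 limit comes from. This is a sketch that mirrors the Java method above; the function name `get_price` is chosen for the example.

```python
import re

# Same pattern as in the Java method, written as a Python raw string.
PRICE = re.compile(r"\d{1,3}[,.]?(\d{1,2})?")


def get_price(text):
    """Return the first price-like substring of `text`, or '' if none."""
    match = PRICE.search(text)
    return match.group(0) if match else ""


for text in ["$12.34", "12,11€", "12.999€", "£999.99€", "$12", "$1234.56"]:
    print(repr(text), "->", repr(get_price(text)))
```

For the last sample the result is 1234: the first three digits are consumed by `\d{1,3}`, the optional group takes one more digit, and the decimal part is lost, so larger amounts need a different pattern.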
A pretty simple pattern for ","-separated numbers (or numbers with no separators) with 2 decimal places. It supports delimiters but does not require them. It needs some improvement but should work.
^((\d{1,3}|\s*){1})((\,\d{3}|\d)*)(\s*|\.(\d{2}))$
matches:
1,123,456,789,134.45
1123456134.45
1234568979
12,345.45
123.45
123
no match:
1,2,3
12.4
This may need some editing to make it function correctly; for example, 1234,456.45 is also matched, because bare digits are allowed in front of a comma group.
Quick explanation: matches 1-3 digits (or nothing), then a comma followed by 3 digits as many times as needed (or just digits), then a decimal point followed by exactly 2 digits (or nothing).
This code worked for me !! (PHP)
preg_match_all('/\d+((,\d+)+)?(.\d+)?(.\d+)?(,\d+)?/',$price[1]->plaintext,$lPrices);
Of everything I have tried so far, this works best:
\d{1,3}[,.]?(\d{1,2})?
https://regex101.com/r/xT8aQ7/1
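r'(^\-?\d*\d+\.?(\d{1,2})?$)'
This allows digits with an optional minus sign, at most one decimal point, and at most two digits after the decimal point. The dot is escaped so that it matches only a literal decimal point.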
This one works reasonably well when the decimal part may or may not be present, i.e. when an amount shows up as 100,000 or as 100,000.00. Tested using Clojure only.
\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2,3})
\d+((,\d+)+)?(.\d+)?(.\d+)?(,\d+)?
This is meant to cover all of the following:
5
5.00
1,000
1,000,000.99
5,99 (european price)
5.999,99 (european price)
0.11
0.00
^((\d+)((,\d+|\d+)*)(\s*|\.(\d{2}))$)
Matches:
1
11
111
1111111
11,2122
1222,21222
122.23
1223,3232.23
Does not match:
11e
x111
111,111.090
1.000
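A pattern like \d+,\d{2} can be too permissive: without anchors it also matches inside a longer string such as 12.34,10, and in Unicode-aware flavors \d matches more than the ASCII digits 0-9. Note that \d never matches a dot.
It should be: [0-9]+,[0-9]{2} (or [0-9]+,[0-9]{1,2} to also allow a single decimal place), anchored with ^ and $ if the whole string must be a price.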