Extract Number from String Field, Including Decimal (Postgres 9.5) - regex

Suppose I have a field:
product_strength
10MG/ML
0.25MG
25MG
0.125MG
How do I extract just the "numeric" part and then cast to numeric? I can get this far: regexp_replace(product_strength, '(\D|!\.)','','g')::numeric AS result_numeric
But the problem with this is that it doesn't actually account for the decimal. In other words, this returns
product_strength result_numeric
10MG/ML 10
0.25MG 25
25MG 25
0.125MG 125
But I would want to return
product_strength result_numeric
10MG/ML 10
0.25MG 0.25
25MG 25
0.125MG 0.125

I would use regexp_matches for this:
select (regexp_matches(product_strength, '[0-9]+\.?[0-9]*'))[1]::numeric
from the_table
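regexp_matches() returns a set of text arrays, one per match; each array holds the captured substrings (or the whole match when the pattern has no capture groups, as here), which is why the [1] is needed.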

Try this regex to match the numbers:
\d+\.?\d*
Edit: as "Boolean_Type" says, if you need negative numbers too, you could add an optional negative sign and use:
\-?\d+\.?\d*
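For readers who want to try the pattern outside the database, here is a minimal Python sketch of the same idea. The function name `extract_number` and the use of `Decimal` are assumptions made for this illustration; the answers above only give the regex itself.

```python
import re
from decimal import Decimal

# Optional sign, digits, optional decimal point, optional fraction digits.
NUMBER = re.compile(r"-?\d+\.?\d*")

def extract_number(text):
    """Return the first number found in text as a Decimal, or None."""
    match = NUMBER.search(text)
    return Decimal(match.group()) if match else None

samples = ["10MG/ML", "0.25MG", "25MG", "0.125MG"]
results = [extract_number(s) for s in samples]
```

Only the first number in each string is used, so `10MG/ML` yields 10 and anything after it is ignored.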

Related

Regex for two numbers and space between

I need to validate a property that must contain two numbers (each can be a float or an integer) separated by a single space.
So this should be valid: 72 72,278
This should be also valid: 72 72
this shouldn't be valid: 45  45 - more than one space between
this shouldn't be valid : 2,2 - just one float number
this shouldn't be valid: 23 # 12 - if it contain any other characters
I've tried like this:
\d+[ ,]\d+
but this does not validate correctly if I have two float numbers, only if I have two integers and the space
I've also tried this
[0-9 .,]+
This validates what I need, but also validates invalid items
You can try something like this: ^\d+(,\d+)?(\s\d+(,\d+)?)+$
This assumes that:
The , is your decimal separator;
Negative numbers aren't allowed.
EDIT
As per Wiktor's comment, this solution accepts two or more numbers separated by whitespace. To accept exactly 2 numbers, use the following instead: ^\d+(,\d+)?(\s\d+(,\d+)?)$. (Notice that the + at the end has been removed, changing the logic from 1 or more repetitions to exactly 1.)
EDIT 2
As per your comment, if you need to accept both , and . as the decimal separator, you would need something like this: ^\d+([,.]\d+)?(\s\d+([,.]\d+)?)$. This accepts either character as the decimal separator; note, however, that as written it also accepts mixed separators such as 12.34 34,56.
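A quick way to check the two-number rule against the examples from the question is sketched below in Python. It assumes a literal single space as the separator (rather than `\s`, which also matches tabs and newlines) and accepts both `,` and `.` as decimal separators; `is_valid` is a name chosen for this illustration.

```python
import re

# One number: digits, optionally followed by a separator and more digits.
NUM = r"\d+(?:[,.]\d+)?"
# Exactly two numbers with exactly one space between them.
TWO_NUMBERS = re.compile(rf"{NUM} {NUM}")

def is_valid(value):
    """True if value is two numbers separated by a single space."""
    return TWO_NUMBERS.fullmatch(value) is not None
```

`fullmatch` plays the role of the `^` and `$` anchors here, so partial matches inside a longer string are rejected.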

regular expression that accepts numbers like 1,000.10? [duplicate]

I need regex to validate a number that could contain thousand separators or decimals using javascript.
Max value being 9,999,999.99
Min value 0.01
Other valid values:
11,111
11.1
1,111.11
INVALID values:
1111
1111,11
,111
111,
I've searched all over with no joy.
/^\d{1,3}(,\d{3})*(\.\d+)?$/
About the minimum and maximum values... Well, I wouldn't do it with a regex, but you can add lookaheads at the beginning:
/^(?!0+\.00)(?=.{1,9}(\.|$))\d{1,3}(,\d{3})*(\.\d+)?$/
Note: this allows 0,999.00, so you may want to change it to:
/^(?!0+\.00)(?=.{1,9}(\.|$))(?!0(?!\.))\d{1,3}(,\d{3})*(\.\d+)?$/
which would not allow a leading 0.
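The lookahead version can be checked against the value lists from the question. The sketch below ports the pattern to Python unchanged (it uses no JavaScript-specific syntax); `is_valid` is a hypothetical helper name. Note that the pattern does not limit the number of decimal places.

```python
import re

PATTERN = re.compile(
    r"^(?!0+\.00)"          # reject 0.00
    r"(?=.{1,9}(\.|$))"     # at most 9 characters before the point or the end
    r"(?!0(?!\.))"          # a leading 0 must be followed by the point
    r"\d{1,3}(,\d{3})*(\.\d+)?$"
)

def is_valid(value):
    return PATTERN.fullmatch(value) is not None

valid = ["9,999,999.99", "0.01", "11,111", "11.1", "1,111.11"]
invalid = ["1111", "1111,11", ",111", "111,", "0.00", "0,999.00", "10,000,000"]
```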
Edit:
Tests: http://jsfiddle.net/pKsYq/2/
((\d){1,3})+([,][\d]{3})*([.](\d)*)?
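It worked on a few of the examples, but I'm still learning regex as well.
The logic is: 1-3 digits one or more times, then a comma followed by 3 digits any number of times, then optionally a single . followed by any number of digits. Note that it is not anchored with ^ and $, and the + on the first group also lets an unseparated run such as 1111 through.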
First, I want to point out that if you own the form the data is coming from, the best way to restrict the input is to use the proper form elements (aka, number field)
<input type="number" name="size" min="0.01" max="9999999.99" step="0.01">
Whether "," can be entered will be based on the browser, but the browser will always give you the value as an actual number. (Remember that all form data must be validated/sanitized server side as well. Never trust the client)
Second, I'd like to expand on the other answers to a more robust (platform independent)/modifiable regex.
You should surround the regex with ^ and $ to make sure you are matching against the whole number, not just a subset of it. ex ^<my_regex>$
The right side of the decimal is optional, so we can put it in an optional group (<regex>)?
Matching a literal period and then any run of digits is simply \.\d+
If you want to insist the last digit after the decimal isn't a 0, you can use [1-9] for "a non-zero digit", so \.\d*[1-9] (\d* rather than \d+, so that a single decimal digit such as 11.1 still matches)
For the left side of the decimal, either the leading digit is non-zero or the whole integer part is zero. So ([1-9]<rest-of-number-regex>|0)
The first group of digits will be 1-3 digits so [1-9]\d{0,2}
After that, we have to add digits in 3s so (,\d{3})*
Remember ? means optional, so to make the , optional is just (,?\d{3})*
Putting it all together
^([1-9]\d{0,2}(,?\d{3})*|0)(\.\d*[1-9])?$
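The step-by-step construction can be mirrored in code by naming each piece and joining them, which makes later modification easier. This is a sketch in Python; the constant names are made up for the illustration, and it assumes the fractional part is written as `\.\d*[1-9]` so that a single non-zero decimal digit is accepted.

```python
import re

FIRST_GROUP = r"[1-9]\d{0,2}"      # leading 1-3 digits, no leading zero
MORE_GROUPS = r"(?:,?\d{3})*"      # further groups of three, comma optional
INTEGER = rf"(?:{FIRST_GROUP}{MORE_GROUPS}|0)"
FRACTION = r"(?:\.\d*[1-9])?"      # assumption: last decimal digit is non-zero
NUMBER = re.compile(INTEGER + FRACTION)

def is_valid(value):
    return NUMBER.fullmatch(value) is not None
```

Because the comma is optional in this variant, `1111` is accepted; switching `,?` back to `,` makes the separators mandatory.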
Tezra's formula fails for '1.' or '1.0'. For my purposes, I allow leading and trailing zeros, as well as a leading + or - sign, like so:
^[-+]?((\d{1,3}(,\d{3})*)|(\d*))(\.|\.\d*)?$
In a recent project we needed to alter this version in order to meet international requirements.
This is what we used: ^-?(\d{1,3}(?<tt>\.|\,| ))((\d{3}\k<tt>)*(\d{3}(?!\k<tt>)[.,]))?\d*$
Creating a named group (?<tt>\.|\,| ) allowed us to use the negative lookahead (?!\k<tt>)[.,] later to ensure the thousands separator and the decimal point are in fact different.
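To see the back-reference trick in action, here is a Python translation. Python spells the named group `(?P<tt>...)` and the back-reference `(?P=tt)` instead of `(?<tt>...)` and `\k<tt>`; the decimal separator is written as the character class `[.,]`. The helper name `is_consistent` is an assumption for this sketch.

```python
import re

PATTERN = re.compile(
    r"^-?(\d{1,3}(?P<tt>\.|,| ))"                     # first group + separator
    r"((\d{3}(?P=tt))*(\d{3}(?!(?P=tt))[.,]))?\d*$"   # same separator, then a different decimal mark
)

def is_consistent(value):
    """True if the thousands separator is used consistently and differs from the decimal mark."""
    return PATTERN.fullmatch(value) is not None
```

For example, `1.234.567,89` and `1,234,567.89` pass, while `1,234,567,89` is rejected because the decimal mark repeats the thousands separator.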
I have used the regex below for the following restrictions:
^(?!0|\.00)[0-9]+(,\d{3})*(\.[0-9]{0,2})$
Does not allow 0 or .00.
',' (thousands separator) followed by 3 digits.
'.' (decimal point, up to 2 decimal places).

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
Try (edited):
^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there are 3 chars or more before the ,
there are 2 digits before the , and the first is between 5 and 9
and then a , and 2 digits
I don't know if it answers your question, as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests whether the number is valid in the continental format and whether it is greater than or equal to 50.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or more times
\d{3}: ddd, exactly 3 digits
These 3 parts combined match any number greater than or equal to 1000, like 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d): a digit between 5 and 9 followed by any digit. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
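The three parts described above can be written as separate named pieces and combined, which keeps the explanation and the pattern side by side. This is a Python sketch; the constant names are chosen for the illustration, and the groups are non-capturing since only a yes/no answer is needed.

```python
import re

THOUSANDS = r"[1-9]\d{0,2}\.(?:\d{3}\.)*\d{3}"   # 1.000 and above
HUNDREDS = r"[1-9]\d{2}"                          # 100 to 999
TENS = r"[5-9]\d"                                 # 50 to 99
AT_LEAST_50 = re.compile(rf"(?:{THOUSANDS}|{HUNDREDS}|{TENS}),\d{{2}}")

def is_at_least_50(value):
    return AT_LEAST_50.fullmatch(value) is not None
```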
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
If the integer part is always under 999.999 and the fractional part always has 2 digits, it gets a bit simpler:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3}),)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
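The `(?P<name>...)` syntax used in these patterns is the one Python's `re` module understands, so the named parts can be inspected directly. The sketch below uses a deliberately shortened, hypothetical pattern (not the full regexes above) to show how `groupdict()` exposes each part, and how the names can be stripped afterwards by turning every `?P<..>` into `?:`.

```python
import re

# Hypothetical, shortened pattern with three named parts.
NAMED = r"^(?P<int_start>[1-9]\d{0,2})(?P<int_end>(?:\.\d{3})*),(?P<frac>\d{2})$"

# Which part of the number did each group match?
parts = re.match(NAMED, "50.000,00").groupdict()

# Once the structure is understood, make every named group non-capturing.
UNNAMED = re.sub(r"\(\?P<\w+>", "(?:", NAMED)
```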
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
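The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d,)
You could make it more compact and efficient by merging the cases ($ is a literal character inside [...], so the end anchor has to go in an alternation instead):
(\d{3}|[5-9]\d)(,|$)
The two end-anchored cases only make sense when the decimal part can be absent; if every value ends in ,dd, drop them, because [5-9]\d$ would also match the cents of a value such as 10,50.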
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However, unless you have a very good reason for doing this with a regex, I recommend against it. Parse the string into a number and compare it with 50.
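A minimal sketch of the parse-and-compare approach, in Python. It assumes the input is already well-formed continental notation, and it uses `Decimal` rather than an integer so the cents are kept; both function names are made up for the illustration.

```python
from decimal import Decimal

def parse_continental(text):
    """Convert text like '1.234,56' to a Decimal (assumes well-formed input)."""
    # Drop the thousands dots, then turn the decimal comma into a point.
    return Decimal(text.replace(".", "").replace(",", "."))

def at_least_50(text):
    return parse_continental(text) >= Decimal("50")
```

Format validation stays a separate concern: run the format regex first, then call `at_least_50` on values that passed.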

Regular expression for price validation

Need regular expression which have:
Maximum 8 digits before decimal(.) point
Maximum 4 digits after decimal point
Decimal point is optional
Maximum valid decimal is 8 digits before decimal and 4 digits after decimal
So 99999999.9999
The regular expression I have tried, ^\d{0,8}[.]?\d{1,4}$, fails for 123456789
and longer inputs: it accepts more than 8 digits when there is no decimal point.
Tested here : http://regexpal.com/
Many many thanks in advance!
^\d{0,8}(\.\d{1,4})?$
You can make the entire decimal optional
You can try this:
^\d{1,8}(?:\.\d{1,4})?$
or
^[1-9]\d{0,7}(?:\.\d{1,4})?$
If you don't want to have a zero as first digit.
You can allow this if you want: (.1234)
^(?:[1-9]\d{0,7}(?:\.\d{1,4})?|\.\d{1,4})$
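One thing to watch with the `.1234` variant is how `|` interacts with the anchors: alternation has the lowest precedence, so without a surrounding group `^` belongs only to the first branch and `$` only to the second. The Python sketch below compares an ungrouped and a grouped form; `accepts` is a hypothetical helper.

```python
import re

# Without a group, ^ applies only to the first branch and $ only to the second.
UNGROUPED = re.compile(r"^[1-9]\d{0,7}(?:\.\d{1,4})?|\.\d{1,4}$")
# With a group, both anchors apply to both branches.
GROUPED = re.compile(r"^(?:[1-9]\d{0,7}(?:\.\d{1,4})?|\.\d{1,4})$")

def accepts(pattern, value):
    return pattern.search(value) is not None
```

With the ungrouped form, a 9-digit input such as `123456789` slips through because the first branch is never required to reach the end of the string.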
None of the above worked for me.
Only this works for me:
^([0-9]{0,2}((\.)[0-9]{0,2}))$
This regex is working for most cases even negative prices,
(\-?\d+\.?\d{0,2})
Tested with the following,
9
9.97
37.97
132.97
-125.55
12.2
1000.00
10000.00
100000.00
1000000.00
401395011
If there is a price of $9.97, £9.97 or €9.97 it will validate 9.97 removing the symbol.
1-(\$+.[1-9])
2-(\£+.[1-9])
You can use this expression for complete price digits.
I'm using this:
^[1-9]\d{0,7}(\.\d{1,4})$
^ = the start of the string
[1-9] = the string has to begin with one digit between 1 and 9
\d{0,7} = optional, at most 7 more digits (\d is a digit between 0 and 9)
() = creates a group, like a substring
\. = requires a literal .
\d{1,4} = a digit repeated 1 to 4 times
$ = the end of the string
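The same part-by-part explanation can live inside the pattern itself. The Python sketch below uses `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the regex; the range quantifier is written `{1,4}`, which is the form regex engines accept. As written, the decimal part is mandatory.

```python
import re

PRICE = re.compile(
    r"""
    ^            # start of the string
    [1-9]        # first digit: 1 to 9, so no leading zero
    \d{0,7}      # up to 7 more digits (8 before the point in total)
    (            # group for the decimal part (mandatory as written)
      \.         # a literal point
      \d{1,4}    # 1 to 4 digits
    )
    $            # end of the string
    """,
    re.VERBOSE,
)
```

Adding `?` after the closing parenthesis of the group would make the decimal part optional.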
For price validation we cannot allow inputs with repeated leading zeros like 0012.
My solution checks for all of these cases. It also allows a maximum of 2 digits after the decimal point.
^(?:0\.[0-9]{1,2}|[1-9]{1}[0-9]*(\.[0-9]{1,2})?|0)$

Regex Numerical Range 1 - 1 million

I'm looking for a expression range for monetary purposes. It needs to be 1 - 1 million and allow commas and periods. I don't need a min/max of (, and .) for correct formatting but I would like the digits after a period to be a min/max of 2 for actual cent values. Thanks
In Range:
640 or 5,000.35 or 999,000
Not in Range:
01 or 1,000,000.01 or 333,567.678
What I would suggest is :
Use something like that to verify that the input has a specific format :
(here's a demo - http://regexr.com?30l28)
(1[\.,])?([0-9]{1,3}[\.,])?([0-9]{1,3})([\.,][0-9]{1,2})
And then test the value range :
is 1 <= value <= 1,000,000?
My regex is by no means 100% complete, but it DOES verify your general number format though.
This should do it:
^(?:(1(\.\d{2})?|[1-9]\d{0,2}(,?\d{3})?(\.\d{2})?)|1((,000){0,2}|(000){0,2})(\.00)?)$
But it would probably be easier if you normalize the value first (e.g. remove any character except digits and the .) and then parse it.
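A sketch of the normalize-then-parse idea in Python. The function name `in_range`, the rule that decimals must be exactly two digits when present, and the rejection of leading zeros are assumptions taken from the examples in the question, not part of the answer above.

```python
import re
from decimal import Decimal

def in_range(text):
    """True if text is an amount from 1 to 1,000,000 with 0 or 2 decimals."""
    # Normalize: drop everything except digits and the decimal point.
    cleaned = re.sub(r"[^\d.]", "", text)
    # Reject leading zeros and anything but exactly two decimals.
    if re.fullmatch(r"[1-9]\d*(?:\.\d{2})?", cleaned) is None:
        return False
    return Decimal("1") <= Decimal(cleaned) <= Decimal("1000000")
```

Note that normalizing this way ignores where the commas sit, so a value such as `5,0,00` is treated as 5000; add a format check first if comma placement matters.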