Regex for Regex validation decimal[19,3] - regex

I want to validate a decimal number (decimal[19,3]). I used this
#"[\d]{1,16}|[\d]{1,16}[\.]\d{1,3}"
but it didn't work.
Below are valid values:
1234567890123456.123
1234567890123456.12
1234567890123456.1
1234567890123456
1234567
0.0
.1

Simplification:
The \d doesn't have to be in []. Use [] only when you want to check whether a character is one of multiple characters or character classes.
. doesn't need to be escaped inside [] - [\.] appears to just allow ., but allowing \ to appear in the string in the place of the . may be a language dependent possibility (?). Or you can just take it out of the [] and keep it escaped.
So we get to:
\d{1,16}|\d{1,16}\.\d{1,3}
(which can be shortened using the optional / "once or not at all" quantifier (?)
to \d{1,16}(\.\d{1,3})?)
Corrections:
You probably want to make the second \d{1,16} optional, or equivalently simply make it \d{0,16}, so something like .1 is allowed:
\d{1,16}|\d{0,16}\.\d{1,3}
If something like 1. should also be allowed, you'll need to add an optional . to the first part:
\d{1,16}\.?|\d{0,16}\.\d{1,3}
Edit: I was under the impression [\d] matches \ or d, but it actually matches the character class \d (corrected above).

This would match your 3 scenarios
^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$
first part: a 0 to 16 digit number
second: a 0 to 16 digit number with 1 to 3 decimals
third: nothing before a dot and then 1 to 3 decimals
the ^ and $ are anchorpoints that match start of line and end of line, so if you need to search for numbers inside lines of text, your should remove those.
Testdata:
Usage in C#
string resultString = null;
try {
resultString = Regex.Match(subjectString, #"\d{1,16}\.?|\d{0,16}\.\d{1,3}").Value;
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Slight optimization
A bit more complicated regex, but a bit more correct would be to have the ?: notation in the "inner" group, if you are not using it, to make that a non-capture group, like this:
^(\d{1,16}|(?:\d{0,16}\.)?\d{1,3})$

Following Regex will help you out -
#"^(\d{1,16}(\.\d{1,3})?|\.\d{1,3})$"

Try something like that
(\d{0,16}\.\d{0,3})|(\d{0,16})
It work with all your examples.
edit. new version ;)

You can try:
^\d{0,16}(?:\.|$)(?:\d{0,3}|)$
match 0 to 16 digits
then match a dot or end of string
and then match 3 more digits

Related

Return dash followed by a single character

This works as expected:
([^\u0000-\u007F])+-हा([^\u0000-\u007F])+
Returns:
ब-हाणपूर
ब-हाणी
बनियन-हाफ
But I am looking for 1 character followed by dash. The expected output is:
ब-हाणपूर
ब-हाणी
I tried to replace + sign with character count like this...
([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
But it returned the same 3 results. How do I return the first 2?
You need anchors:
^([^\u0000-\u007F])-हा([^\u0000-\u007F])+$
Demo
You asked 'What if I need 5 characters to the left of dash?'
The regex portion [^\u0000-\u007F] as written matches a single character that meets that criterion. If you want more or less than one, use a regex quantifier to describe how many you want.
In this case, if you want 5, you would use:
^([^\u0000-\u007F]{5})-हा([^\u0000-\u007F])+$
Probably like this:
^([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
^([^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)
(\b[^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)
Regex demo

Python Regex - How to extract the third portion?

My input is of this format: (xxx)yyyy(zz)(eee)fff where {x,y,z,e,f} are all numbers. But fff is optional though.
Input: x = (123)4567(89)(660)
Expected output: Only the eeepart i.e. the number inside 3rd "()" i.e. 660 in my example.
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could find all those matches that have round braces (), and print the third match with findall
import re
n = "(123)4567(89)(660)999"
r = re.findall("\(\d*\)", n)
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
Otherwise, you might do well to match the other parts of the pattern and not capture them:
pattern = r'[0-9()]{13}\((\d{3})\)'
If you want to get the third group of numbers in brackets, you need to skip the first two groups which you can do with a repeating non-capturing group which looks for a set of digits enclosed in () followed by some number of non ( characters:
x = '(123)4567(89)(660)'
print(re.search("(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
Demo on rextester

Using RegEx how do I remove the trailing zeros from a decimal number

I'm needing to write some regex that takes a number and removes any trailing zeros after a decimal point. The language is Actionscript 3. So I would like to write:
var result:String = theStringOfTheNumber.replace( [ the regex ], "" );
So for example:
3.04000 would be 3.04
0.456000 would be 0.456 etc
I've spent some time looking at various regex websites and I'm finding this harder to resolve than I initially thought.
Regex:
^(\d+\.\d*?[1-9])0+$
OR
(\.\d*?[1-9])0+$
Replacement string:
$1
DEMO
Code:
var result:String = theStringOfTheNumber.replace(/(\.\d*?[1-9])0+$/g, "$1" );
What worked best for me was
^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$
For example,
s.replace(/^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$/, "$1$2$3");
This changes
1.10000 => 1.1
1.100100 => 1.1001
1.000 => 1
1 >= 1
What about stripping the trailing zeros before a \b boundary if there's at least one digit after the .
(\.\d+?)0+\b
And replace with what was captured in the first capture group.
$1
See test at regexr.com
(?=.*?\.)(.*?[1-9])(?!.*?\.)(?=0*$)|^.*$
Try this.Grab the capture.See demo.
http://regex101.com/r/xE6aD0/11
Other answers didn't consider numbers without fraction (like 1.000000 ) or used a lookbehind function (sadly, not supported by implementation I'm using). So I modified existing answers.
Match using ^-?\d+(\.\d*[1-9])? - Demo (see matches). This will not work with numbers in text (like sentences).
Replace(with \1 or $1) using (^-?\d+\.\d*[1-9])(0+$)|(\.0+$) - Demo (see substitution). This one will work with numbers in text (like sentences) if you remove the ^ and $.
Both demos with examples.
Side note: Replace the \. with decimal separator you use (, - no need for slash) if you have to, but I would advise against supporting multiple separator formats within such regex (like (\.|,)). Internal formats normally use one specific separator like . in 1.135644131 (no need to check for other potential separators), while external tend to use both (one for decimals and one for thousands, like 1.123,541,921), which would make your regex unreliable.
Update: I added -? to both regexes to add support for negative numbers, which is not in demo.
If your regular expressions engine doesn't support "lookaround" feature then you can use this simple approach:
fn:replace("12300400", "([^0])0*$", "$1")
Result will be: 123004
I know I am kind of late but I think this can be solved in a far more simple way.
Either I miss something or the other repliers overcomplicate it, but I think there is a far more straightforward yet resilient solution RE:
([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
By backreferencing the first group (\1) you can get the number without trailing zeros.
It also works with .XXXXX... and ...XXXXX. type number strings. For example, it will convert .45600 to .456 and 123. to 123. as well.
More importantly, it leaves integer number strings intact (numbers without decimal point). For example, it will convert 12300 to 12300.
Note that if there is a decimal point and there are only zeroes after that it will leave only one trailing zeroes. For example for the 42.0000 you get 42.0.
If you want to eliminate the leading zeroes too then youse this RE (just put a [0]* at the start of the former):
[0]*([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
I tested few answers from the top:
^(\d+\.\d*?[1-9])0+$
(\.\d*?[1-9])0+$
(\.\d+?)0+\b
All of them not work for case when there are all zeroes after "." like 45.000 or 450.000
modified version to match that case: (\.\d*?[1-9]|)\.?0+$
also need to replace to '$1' like:
preg_replace('/(\.\d*?[1-9]|)\.?0+$/', '$1', $value);
try this
^(?!0*(\.0+)?$)(\d+|\d*\.\d+)$
And read this
http://www.regular-expressions.info/numericranges.html it might be helpful.
I know it's not what the original question is looking for, but anyone who is looking to format money and would only like to remove two consecutive trailing zeros, like so:
£30.00 => £30
£30.10 => £30.10 (and not £30.1)
30.00€ => 30€
30.10€ => 30.10€
Then you should be able to use the following regular expression which will identify two trailing zeros not followed by any other digit or exist at the end of a string.
([^\d]00)(?=[^\d]|$)
I'm a bit late to the party, but here's my solution:
(((?<=(\.|,)\d*?[1-9])0+$)|(\.|,)0+$)
My regular expression will only match the trailing 0s, making it easy to do a .replaceAll(..) type function.
Breaking it down, part one: ((?<=(\.|,)\d*?[1-9])0+$)
(?<=(\.|,): A positive look behind. Decimal must contain a . or a , (commas are used as a decimal point in some countries). But as its a look behind, it is not included in the matched text, but still must be present.
\d*?: Matches any number of digits lazily
[1-9]: Matches a single non-zero character (this will be the last digit before trailing 0s)
0+$: Matches 1 or more 0s that occur between the last non-zero digit and the line end.
This works great for everything except the case where trailing 0s begin immediately, like in 1.0 or 5.000. The second part fixes this (\.|,)0+$:
(\.|,): Matches a . or a , that will be included in matched text.
0+$ matches 1 or more 0s between the decimal point and the line end.
Examples:
1.0 becomes 1
5.0000 becomes 5
5.02394900022000 becomes 5.02394900022
Is it really necessary to use regex? Why not just check the last digits in your numbers? I am not familiar with Actionscript 3, but in python I would do something like this:
decinums = ['1.100', '0.0','1.1','10']
for d in decinums:
if d.find('.'):
while d.endswith('0'):
d = d[:-1]
if d.endswith('.'):
d = d[:-1]
print(d)
The result will be:
1.1
0
1.1
10

Why doesn't this regex pattern work?

I'm trying to select commas without numbers of 4 digits or the word "id" before, I tried with this:
( ? < ! [ \ d { 5 } | id ] ) ,
The problem
for example, if input string is "1999," that comma is not selected, I don't understand why.
Try this pattern:
(?<!\d{5}|id),
Your pattern, (?<![\d{5}|id]), is looking for a comma that is not after a digit, {, }, |, i, or d - They should not be in a charterer class: []. If anything, (?<![\d]{5}|id), will also work, but is redundant.
First of all, unless you're using the /x flag, each space will attempt to match a space. So take those out.
Second, you're using [...] presumably to group an alternation (|) but square brackets actually indicate a character class, i.e. [\d{5}|id] is equivalent to [id5{}|] and matches any one of those characters, but not more. What you mean is this:
(?<!\d{5}|id),
The final problem might be that many implementations of regex (you haven't specified which you're using) don't support variable-width lookbehind assertions. So, you may need to do something like:
(?<!\d{5}|...id),

Regular expression help - comma delimited string

I don't write many regular expressions so I'm going to need some help on the one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 1 or more spaces after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match
123 123 abc (no commas) which might be a problem
This will also match 123, (ends with a comma) which might be a problem.
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x [ "123, $a67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
... print re.match( r, s )
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : capture one or more (but not zero) of the listed ranges, and space.
(?:[...]+,)* : In non-capturing parenthesis, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Capturing zero times allows for no comma.
[...]+ : capture at least one of these. This does not include a comma. This is to ensure that it does not accept a trailing comma. If a trailing comma is acceptable, then the expression is easier: ^[a-zA-Z0-9 ,]+
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9])\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
The match of a word boundary make sure that a comma is necessary if more arguments are stacked in string.
Please use - ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set max word size to 45, as longest word in english is 45 characters, can be changed as per requirement