Regex to remove unnecessary zeros after the decimal point - regex

I need to cut some numbers:
21250000.022000 -> 21250000.022
20.00 -> 20
200 -> 200
20.50 -> 20.5
UPDATE:
This is what I tried:
https://regex101.com/r/Sdfe5D/1
The problem with this regex is the numbers '200' and '20'.

You don't need a regex for this. Just cast your string to double:
$arr = array('21250000.022000', '20.00', '200', '20.50');
foreach ($arr as $n)
echo (double) $n. "\n";
Output:
21250000.022
20
200
20.5
Code Demo
Update: If you're looking for a regex solution then use:
Search regex:
(?:(\.\d*?[1-9]+)|\.)0*$
replacement:
$1
RegEx Demo
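For readers who want to try the search-and-replace pattern outside PHP, here is a minimal Python sketch. The names TRAILING_ZEROS and strip_trailing_zeros are made up for illustration, and it assumes Python 3.5 or later, where an unmatched group in the replacement expands to an empty string.

```python
import re

# Same pattern as the search regex above; group 1 is the part to keep.
TRAILING_ZEROS = re.compile(r"(?:(\.\d*?[1-9]+)|\.)0*$")

def strip_trailing_zeros(number):
    # \1 is empty when only the bare-dot branch matched (e.g. "20.00").
    return TRAILING_ZEROS.sub(r"\1", number)

for n in ["21250000.022000", "20.00", "200", "20.50"]:
    print(n, "->", strip_trailing_zeros(n))
```

Unlike the cast to double, this keeps the value as a string, so very long numbers are not subject to floating-point rounding.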

You can use the following regular expression. It uses a positive look-ahead, (?=...), to select only the relevant part of each number:
^[^\.]+?(?=\.0*$)|^[^\.]+?\..*?(?=0*$)|^[^\.]*$
^[^\.]+?(?=\.0*$): The first part matches one or more characters that are not a dot ([^\.]), provided they are followed by a dot and any number of zeros running to the end of the string ($). It handles numbers like 20.000 -> 20 or 25.0 -> 25. The dot and the zeros sit inside the look-ahead, so they are not part of the match.
^[^\.]+?\..*?(?=0*$): The second part is similar. It matches one or more non-dot characters, a dot, and then as few characters as possible, provided that only zeros follow up to the end of the string. The difference is in what the look-ahead contains: here the dot and the digits after it are part of the match, and only the trailing zeros are left out. On its own, this part would turn 20.0 into 20. (with a dangling dot), which is why the first part comes before it. Examples of matches here are 20.04020000 -> 20.0402 or 24.120300 -> 24.1203
^[^\.]*$: matches whole numbers without any dot in them. This part is necessary for numbers like 200 or 21.
This regular expression is anchored at the beginning (^) and the end ($) of the string, so it won't match multiple numbers in one string. To process several numbers, split the input on line breaks and match each line separately.
You can test the regular expression with the following snippet:
// expected results:
// 21250000.022000 -> 21250000.022
// 20.00 -> 20
// 200 -> 200
// 20.50 -> 20.5
var regex = /^[^\.]+?(?=\.0*$)|^[^\.]+?\..*?(?=0*$)|^[^\.]*$/g;
var texts = [
'21250000.022000',
'20.00',
'200',
'20.50'
];
for(var i = 0; i < texts.length; i++) {
var text = texts[i];
console.log(text, '->', text.match(regex)[0]);
}

This regex has better performance:
\.0+$|(\.\d*[1-9])(0+)$
Replace with $1
Here's an explanation, if you're interested:
First, we deal with numbers whose decimal part is all zeros, since they're easier:
\.0+$
Then, we match the significant digits (the ones we want to keep) after the decimal point in a capturing group:
(\.\d*[1-9])
Finally, we match all trailing zeroes and the end of the string (replace $ with \b if needed):
(0+)$
See it live
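The "replace $ with \b" remark can be illustrated with a short Python sketch for numbers embedded in longer text. The names EMBEDDED and strip_in_text are hypothetical, and the sketch assumes Python 3.5 or later so that an unmatched group 1 is replaced by an empty string.

```python
import re

# The pattern above with each $ swapped for \b, as the answer suggests.
EMBEDDED = re.compile(r"\.0+\b|(\.\d*[1-9])(0+)\b")

def strip_in_text(text):
    # Keep group 1 (the significant decimals); drop the trailing zeros.
    return EMBEDDED.sub(r"\1", text)

print(strip_in_text("a 20.00 b 20.50 c 200"))
```

Numbers such as 20.005 are left alone, because the zeros there are followed by another digit rather than a word boundary.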

If you really need to use regex for this, the following would be one way (the $ anchor makes sure that only zeros at the very end are removed):
preg_replace('/(\.[0-9]*[1-9])0+$/', '$1', "21250000.022000");
However, the trailing zeroes can be removed simply by casting the string to a double:
(double) "21250000.022000";

Related

Regular expression to search for digits after a decimal place

I'm trying to write a regular expression that can match the decimal point (and the digits after it) of a dollar value. For example, I want to match $1.00 and $1,100.89 (including values in the thousands with commas). It cannot match any digits that are not preceded by a $ character. These values are also not the only pieces of text in this file.
So far, I've tried a few things that haven't quite gotten me there:
\.+[\d]+ (highlights the decimal and every digit after the decimal point, but not what we want because it includes non-dollar values like 1.00)
\$+[\d+\.]+ highlights the whole value of the dollar except the 1,250
(\$\d+\.+\d+)|\$\d+\,+\d+\.+\d+ highlights the whole value of anything with a dollar sign
Anyone have an idea?
I looked at your problem and I believe I have a solution.
You could use the regex below to search for the last two decimals.
^\$[\d,]+\.((?:\d){2})
You can see it in action here
Use:
^\$[\d,]+\.(\d\d)$
Explanation:
^ # beginning of string
\$ # $ sign
[\d,]+ # 1 or more digit or comma
\. # a dot
(\d\d) # group 1, 2 digits
$ # end of string
var test = [
'$100.00',
'$1,100.89',
'$123',
'123.45',
];
console.log(test.map(function (a) {
var m = a.match(/^\$[\d,]+\.(\d\d)$/);
if (m)
return a + ' : ' + m[1];
else
return a + ' : no match';
}));
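Since the question says the dollar values are mixed in with other text, an unanchored variant may be closer to what is needed. The following Python sketch is only an illustration under that assumption; the pattern and the names CENTS and cents_in are hypothetical and not taken from the answers.

```python
import re

# Hypothetical pattern: a $ sign, digits with optional commas, a dot,
# and exactly two decimal digits captured in group 1.
CENTS = re.compile(r"\$\d[\d,]*\.(\d{2})\b")

def cents_in(text):
    # findall returns the contents of group 1 for every match.
    return CENTS.findall(text)

print(cents_in("Paid $1.00 and $1,100.89, but 1.00 and 3.14 are not prices."))
```

Values without a leading $ are skipped because the pattern requires the dollar sign before the digits.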
You could use a non-capturing group, (?:...), to isolate only the group you want. I've come up with this regex, and it seems to do what you are looking for:
^(?:\$[,\d]+)(?:\.([\d]{2}))
const regex = /^(?:\$[,\d]+)(?:\.([\d]{2}))/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => regex.exec(item)[1]);
console.log(result);
You could test more cases here
EDIT :
Here is an example of how to replace only the decimal digits.
I'm using the same concept as before, only this time I'm not capturing the decimal digits. Instead, I capture the part before them and use $1 to put that group into the new string.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1.50'));
console.log(result);
Notice that the $1 in the replace function refers to the first capturing group of the regex. This way, we can get it back and "insert" it into our final string.
Here I've chosen .50 as the replacement, but you could use whatever you like.
P.S. I know this might be confusing because we are talking about dollars, so here is an example where we replace the decimal digits with a word.
const regex = /^(\$[,\d]+)\.(?:[\d]{2})/;
const values = [
'$100.00',
'$99.99',
'$1,354.92'
];
const result = values.map(item => item.replace(regex, '$1 this is a word'));
console.log(result);

Python Regex - How to extract the third portion?

My input is of this format: (xxx)yyyy(zz)(eee)fff where {x,y,z,e,f} are all numbers. But fff is optional though.
Input: x = (123)4567(89)(660)
Expected output: Only the eee part, i.e. the number inside the 3rd "()", which is 660 in my example.
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could find all the parenthesized groups with findall and print the third match:
import re
n = "(123)4567(89)(660)999"
r = re.findall(r"\(\d*\)", n)
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
Otherwise, you might do well to match the other parts of the pattern and not capture them:
pattern = r'[0-9()]{13}\((\d{3})\)'
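The "at-end" anchor idea from this answer can be sketched as follows, adjusted for the optional trailing fff digits mentioned in the question. The function name third_group is made up for illustration, and the sketch assumes the (eee) part is always the last parenthesized group in the input.

```python
import re

def third_group(s):
    # Last "(digits)" in the string, optionally followed by the fff digits.
    m = re.search(r"\((\d+)\)\d*$", s)
    return m.group(1) if m else None

print(third_group("(123)4567(89)(660)"))
print(third_group("(123)4567(89)(660)999"))
```

Earlier groups such as (123) cannot match here, because they are followed by another "(" before the end of the string.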
If you want to get the third group of numbers in brackets, you need to skip the first two groups which you can do with a repeating non-capturing group which looks for a set of digits enclosed in () followed by some number of non ( characters:
import re
x = '(123)4567(89)(660)'
print(re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
Demo on rextester

Extracting Number from Log File

I'm trying to extract a number from a log file that outputs lines of text like this:
1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A
The value I want from this line is 45.6 A
However, my Regex code is returning the 12 A from 3:26:12 AM. I need it to completely ignore the time number and just output the 45.6 A.
Here's my Regex code:
$regex = '\d+(?:\.\d+)?(?=\s+A)'
You just forgot to anchor the lookahead at the end of the string:
\d+(?:\.\d+)?(?=\s+A$)
                    ^
See the regex demo
The \d+(?:\.\d+)? will match one or more digits optionally followed with a . followed with one or more digits (a float value), and the (?=\s+A$) lookahead will require one or more whitespace characters with A right at the end of the string to appear after the float value.
$s = '1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A'
$rx = '\d+(?:\.\d+)?(?=\s+A$)'
$result = [regex]::Match($s, $rx, 'RightToLeft')
if ($result.Success) { $result.Value }
You can use word boundary (\b) to match only A, not AM:
\d+(?:\.\d+)?(?=\s+A\b)
DEMO: https://regex101.com/r/pA7jK2/1
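To compare the original pattern with the two fixes above, here is a small Python sketch. The helper name reading is hypothetical; the sample line is the one from the question.

```python
import re

line = "1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A"

def reading(text, pattern):
    # Return the first match of the pattern, or None if there is none.
    m = re.search(pattern, text)
    return m.group() if m else None

print(reading(line, r"\d+(?:\.\d+)?(?=\s+A)"))    # original: stops at "12" before "AM"
print(reading(line, r"\d+(?:\.\d+)?(?=\s+A$)"))   # A must end the string
print(reading(line, r"\d+(?:\.\d+)?(?=\s+A\b)"))  # A must be a whole word
```

Both fixes reject "12" because the A that follows it is the start of "AM" rather than a standalone A.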
If you just need to find the last number that is followed by an A, try this:
(\d+\.\d\sA)
Demo here

Extract number not in brackets from this string using regular expressions [70-(90)]

[15-]
[41-(32)]
[48-(45)]
[70-15]
[40-(64)]
[(128)-42]
[(128)-56]
I have these values, from which I want to extract the numbers that are not in parentheses. If there is more than one, then add them together.
What is the regular expression to do this?
So the solution would look like this:
[15-] -> 15
[41-(32)] -> 41
[48-(45)] -> 48
[70-15] -> 85
[40-(64)] -> 40
[(128)-42] -> 42
[(128)-56] -> 56
You would be overcomplicating things if you went for a regex approach (in this case, at least). Also, regular expressions do not support mathematical operations, as pointed out by @richardtallent.
You can extract a substring that omits the initial and final square brackets, and then use Split to split the string in two at the dash. Lastly, use the InStr function to see whether either of the substrings that the split yielded contains a parenthesis.
Any substring that contains a parenthesis is left out of the addition; the others are added up.
Regular expressions do not support performing math on the terms. You can loop through the groups that are matched and perform the math outside of Regex.
Here's the pattern to extract any number within the square brackets that is not in parentheses:
\[
(?:(?:\d+|\([^\)]*\))-)*
(\d+)
(?:-[^\]]*)*
\]
Each number will be returned in $1.
This works by looking for a number that is prefixed by any number of "words" separated by dashes, where the "words" are either numbers themselves or parenthesized strings, and followed by, optionally, a dash and some other stuff before hitting the end brace.
If VBA's RegEx doesn't support uncaptured groups (?:), remove all of the ?:'s and your captured numbers will be in $3 instead.
A simpler pattern also works:
\[
(?:[^\]]*-)*
(\d+)
(?:-[^\]]*)*
\]
This simply looks for numbers delimited by dashes and allowing for the number to be at the beginning or end.
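The idea of matching with a regex and doing the arithmetic outside it can be sketched in Python. The question targets VBA, so this is only an illustration of the logic; the function name sum_unparenthesized is hypothetical, and the sketch takes a slightly different route from the patterns above by deleting the parenthesized parts first.

```python
import re

def sum_unparenthesized(value):
    # Drop every "(...)" group, then add up whatever digit runs remain.
    without_parens = re.sub(r"\([^)]*\)", "", value)
    return sum(int(n) for n in re.findall(r"\d+", without_parens))

for v in ["[15-]", "[41-(32)]", "[70-15]", "[(128)-42]"]:
    print(v, "->", sum_unparenthesized(v))
```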
Private Sub regEx()
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim RegexObj As New VBScript_RegExp_55.RegExp
RegexObj.Pattern = "\[(\(?[0-9]*?\)?)-(\(?[0-9]*?\)?)\]"
Dim str As String
str = "[15-]"
Dim Match As Object
Set Match = RegexObj.Execute(str)
Dim result As Integer
Dim value1 As Integer
Dim value2 As Integer
Dim part1 As String
Dim part2 As String
part1 = Match.Item(0).submatches.Item(0)
part2 = Match.Item(0).submatches.Item(1)
' InStr returns 0 when "(" is absent; "Not InStr(...)" would always be truthy
If InStr(1, part1, "(", 1) = 0 And part1 <> "" Then
value1 = CInt(part1)
End If
If InStr(1, part2, "(", 1) = 0 And part2 <> "" Then
value2 = CInt(part2)
End If
' Parts in parentheses (or empty parts) stay at 0 and do not affect the sum
result = value1 + value2
MsgBox (result)
End Sub
Replace [15-] with the other strings to test them.
Ok! It's been 6 years and 6 months since the question was posted. Still, for anyone looking for something like that maybe now or in the future...
Step 1:
Trim Leading and Trailing Spaces, if any
Step 2:
Find/Search:
\]|\[|\(.*\)
Replace With:
<Leave this field Empty>
Step 3:
Trim Leading and Trailing Spaces, if any
Step 4:
Find/Search:
^-|-$
Replace With:
<Leave this field Empty>
Step 5:
Find/Search:
-
Replace With:
\+

Regex for validating decimal[19,3]

I want to validate a decimal number (decimal[19,3]). I used this
@"[\d]{1,16}|[\d]{1,16}[\.]\d{1,3}"
but it didn't work.
Below are valid values:
1234567890123456.123
1234567890123456.12
1234567890123456.1
1234567890123456
1234567
0.0
.1
Simplification:
The \d doesn't have to be in []. Use [] only when you want to check whether a character is one of multiple characters or character classes.
. doesn't need to be escaped inside []. [\.] appears to allow just the ., although whether a literal \ could also be accepted in place of the . may depend on the language (?). Alternatively, take it out of the [] and keep it escaped.
So we get to:
\d{1,16}|\d{1,16}\.\d{1,3}
(which can be shortened using the optional / "once or not at all" quantifier (?)
to \d{1,16}(\.\d{1,3})?)
Corrections:
You probably want to make the second \d{1,16} optional, or equivalently simply make it \d{0,16}, so something like .1 is allowed:
\d{1,16}|\d{0,16}\.\d{1,3}
If something like 1. should also be allowed, you'll need to add an optional . to the first part:
\d{1,16}\.?|\d{0,16}\.\d{1,3}
Edit: I was under the impression [\d] matches \ or d, but it actually matches the character class \d (corrected above).
This would match your 3 scenarios:
^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$
first part: a 1 to 16 digit number
second: a 0 to 16 digit number with 1 to 3 decimals
third: nothing before a dot and then 1 to 3 decimals
The ^ and $ are anchors that match the start and the end of the line, so if you need to search for numbers inside lines of text, you should remove them.
Testdata:
Usage in C#
string resultString = null;
try {
resultString = Regex.Match(subjectString, @"\d{1,16}\.?|\d{0,16}\.\d{1,3}").Value;
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Slight optimization
If you are not using the "inner" group, you can make it a non-capturing group with the ?: notation. The regex gets a bit more complicated, but also a bit more correct:
^(\d{1,16}|(?:\d{0,16}\.)?\d{1,3})$
The following regex will help you out:
@"^(\d{1,16}(\.\d{1,3})?|\.\d{1,3})$"
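As a quick check of this last pattern against the values listed in the question, here is a Python sketch. The names DECIMAL_19_3 and is_valid are made up for illustration, and fullmatch stands in for the ^ and $ anchors of the C# version.

```python
import re

# Up to 16 integer digits with optional 1-3 decimals, or just ".d" to ".ddd".
DECIMAL_19_3 = re.compile(r"\d{1,16}(\.\d{1,3})?|\.\d{1,3}")

def is_valid(value):
    # fullmatch requires the whole string to match, like ^...$ in the original.
    return DECIMAL_19_3.fullmatch(value) is not None

valid = ["1234567890123456.123", "1234567890123456.12", "1234567890123456.1",
         "1234567890123456", "1234567", "0.0", ".1"]
invalid = ["12345678901234567", "1.1234", "1.", ""]

print([is_valid(v) for v in valid])
print([is_valid(v) for v in invalid])
```

Note that this particular pattern rejects a trailing dot such as 1., which one of the other answers chose to allow.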
Try something like this:
(\d{0,16}\.\d{0,3})|(\d{0,16})
It works with all your examples.
edit. new version ;)
You can try:
^\d{0,16}(?:\.|$)(?:\d{0,3}|)$
match 0 to 16 digits
then match a dot or end of string
and then match up to 3 more digits