Strip zeroes with regex in csv file


I have a CSV file with decimal and integer numbers that represent amounts of money; the delimiter is a semicolon. An example of the file:
00012,00;002200,21;00000;0000,00;0;
450000,21;056,01;0023,50;000000000000;
-032,23;-21.56;-00005630,05;
I have tried replacing \b0*([1-9][0-9]*\,|0)\b with $1.
However, the result must be:
12,00;2200,21;;0,00;;
450000,21;56,01;23,50;;
-32,23;-21.56;-5630,05;
So, if the number is an integer made up only of 0 digits (one or more), the result needs to be empty (it is inserted into the database as NULL), but if the number is a decimal zero, the result must be 0,00.

You may use
(?<=;|-|^)(?:0+|(0)+(,00?)0*)(?=[1-9]\d*,|;|$)
Replace with $1$2. See the regex demo.
Details
(?<=;|-|^) - start of string, ; or - should be immediately to the left of the current location
(?:0+|(0)+(,00?)0*) - either of the two alternatives:
0+ - one or more 0 digits
| - or
(0)+(,00?)0* - one or more 0 digits with the last one captured in Group 1 ($1) followed with ,, 0 and an optional 0 captured into Group 2, and then zero or more 0 digits
(?=[1-9]\d*,|;|$) - there must be a digit from 1 to 9 followed with any amount of any digits and then , or a ; or end of string immediately to the right of the current location.
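A minimal sketch of how this replacement might be applied in Python, assuming Python 3.5+ (where unmatched groups in the replacement become empty strings). Python's re module only accepts fixed-width lookbehinds, so (?<=;|-|^) is rewritten here as (?:(?<=[;-])|^), which is intended to be equivalent. The name strip_zeroes is made up for this example.

```python
import re

# Python's `re` requires fixed-width lookbehinds, so (?<=;|-|^) is expressed
# as (?:(?<=[;-])|^). The rest follows the pattern explained above.
ZERO_PATTERN = re.compile(
    r"(?:(?<=[;-])|^)(?:0+|(0)+(,00?)0*)(?=[1-9]\d*,|;|$)"
)

def strip_zeroes(line: str) -> str:
    # $1$2 in the answer corresponds to \1\2 in Python's replacement syntax.
    return ZERO_PATTERN.sub(r"\1\2", line)

samples = [
    "00012,00;002200,21;00000;0000,00;0;",
    "450000,21;056,01;0023,50;000000000000;",
    "-032,23;-21.56;-00005630,05;",
]

for sample in samples:
    print(strip_zeroes(sample))
# 12,00;2200,21;;0,00;;
# 450000,21;56,01;23,50;;
# -32,23;-21.56;-5630,05;
```

The function is applied one line at a time, so ^ and $ refer to the start and end of each CSV row.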

Combine several rules with |
rule1 0*([^0]\d*,\d*[^0])0*
rule2 0*(0,0)0*
rule3 0*(0)
rule4 0*([^0]\d*)
rule5 0*(0,\d*[^0])0*
rule6 0*([^0]\d*,0)0*
Pay attention to how you anchor the beginning and end of each number.

Here is an example using Perl and sprintf:
export LC_ALL=en_DK.UTF8 # some locale which uses commas as decimal separator
perl -Mlocale -nle '@fields =
  map { if (/,/) {
          sprintf "%.2f", $_
        } else {
          $_+=0;
          $_ ? $_ : "NULL"
        }
      } split /;/;
  print join(";", @fields)' test.csv
Output:
12,00;2200,21;NULL;0,00;NULL
450000,21;56,01;23,50;NULL
-32,23;-21,56;-5630,05

If you can use lookahead and lookbehind, this one should do the trick:
(?<=(?:^|;)-?)0+(?=\d)
Explanation:
Positive lookbehind group, containing:
a non-capturing group containing either a line start or a semicolon
an optional minus sign.
at least one zero
Positive lookahead group containing at least one decimal character.
This will match all the zeroes and nothing else, so you can just do a regex replace of the matches with an empty string.
Tested:
https://regex101.com/r/2a2q5h/1
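Python's re module does not accept the variable-width lookbehind used above, so the sketch below swaps it for two capture groups that are written back unchanged. This is an adaptation, not the answer's exact pattern, and the name drop_leading_zeroes is hypothetical. Note that an all-zero integer field ends up as a single 0 rather than an empty field, so the NULL requirement from the question would still need a separate step.

```python
import re

# (^|;) and (-?) are captured and re-inserted instead of using a lookbehind.
LEADING_ZEROES = re.compile(r"(^|;)(-?)0+(?=\d)", re.MULTILINE)

def drop_leading_zeroes(text: str) -> str:
    # Keep the separator and the optional sign, drop the zeroes in between.
    return LEADING_ZEROES.sub(r"\1\2", text)

print(drop_leading_zeroes("00012,00;002200,21;00000;0000,00;0;"))
# 12,00;2200,21;0;0,00;0;
print(drop_leading_zeroes("-032,23;-21.56;-00005630,05;"))
# -32,23;-21.56;-5630,05;
```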

Related

Regex expression for numbers and leading zeros just with a dot and decimal

I'm trying to find a regex for numeric inputs. A leading 0 is allowed only when it is followed by a dot and 1 or 2 decimal digits. And of course only numbers should be accepted.
These are the scenarios that we can accept:
0.01
1.1
1.02
120.01
We can't accept these values
0023
0100
.01
.12
Which regex is the best option for these cases?
So far we have tried the following regex for accepting only numbers and dots
[A-Za-z,]
And also we try with the following ones:
^[+-]?[0-9]{1,3}(?:[0-9]*(?:[.,][0-9]{1})?|(?:,[0-9]{3})*(?:\.[0-9]{1,2})?|(?:\.[0-9]{3})*(?:,[0-9]{1,2})?)$
"/^[-]?[$]\d{1,3}(?:,?\d{3})*\.\d{2}$/"
"/(^(\d{1})\.{0,1}([0-9]){0,2}$)|(^([1-9])\d{0,2}(\,\d{0,3})$)/g"
(?:0|[1-9][0-9]*)(?:\.[0-9]{1,2})?
And the next one for deleting the leading zeros, but it didn't work for cases like 0.10
^0+

If a negative lookahead is supported, you can exclude matches that start with a zero and have no decimal part.
^(?!0\d*$)\d+(?:\.\d{1,2})?$
^ Start of string
(?!0\d*$) Negative lookahead, assert that what is to the right is not a zero followed by only optional digits until the end of the string
\d+ Match 1+ digits
(?:\.\d{1,2})? Match an optional decimal part with 1 or 2 digits
$ End of string
Regex demo
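A quick check of this pattern against the values from the question, sketched in Python (the question does not name a language, so this is an assumption; is_valid is a made-up helper name):

```python
import re

PATTERN = re.compile(r"^(?!0\d*$)\d+(?:\.\d{1,2})?$")

def is_valid(value: str) -> bool:
    # True when the whole input matches the pattern.
    return PATTERN.match(value) is not None

accepted = ["0.01", "1.1", "1.02", "120.01"]
rejected = ["0023", "0100", ".01", ".12"]

for value in accepted + rejected:
    print(value, is_valid(value))
```

All four values in accepted should print True and all four in rejected should print False.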

I would go with ^(0|[1-9]\d*|(0|[1-9]\d*)\.\d+)$
You can test here: https://regex101.com/r/oNMgR9/1
Explanation
^ means : match the beginning of the string (or line if the m flag is enabled).
$ means : match the end of the string (or line if the m flag is enabled).
(a|b) means match "a" or match "b" so I'll use this to match either "0" alone or any number not starting with a "0". It's the syntax for a logical or.
. alone is used to match any char. So you have to escape it if you want to match the dot character. This is why I wrote \. instead of . in the pattern.
[ ] is used to list some characters you want to match. It can be a range if you use the - char, so [1-9] means any digit char from "1" to "9".
\d is to match a digit. It's totally equivalent to [0-9].
* means : match the preceding pattern 0 or many times, so \d* means that it will match 0 or many times a digit, so it will match "8" or "465" or "09" but also an empty string "". If you want to match the preceding pattern at least once or many times then you use + instead of *. So \d+ won't match an empty string "" but \d* would match it.
A) Just a number not starting with 0
[1-9]\d* will match any digit from 1 to 9, optionally followed by other digits. This will match numbers without a decimal point.
B) Just 0
0 alone is a possibility. This is because the case above isn't covering it.
C) A number with decimals
(0|[1-9]\d*)\.\d+ will match either a "0" alone or a number not starting by "0" and then followed by a point and some other digits (which have to be present because we don't want to match "45." without the numbers behind the dot).
Better alternative
The solution from @TheFourthBird is a bit cleaner thanks to its negative lookahead, although it is a little harder to follow. He also read the question completely: you wanted 1 or 2 digits after the decimal point. I forgot about that, so \d+ should be replaced by \d{1,2}, as you don't want more than 2 digits.

You can use
^(?![0.]+$)(?:[1-9]\d*|0)(?:\.\d{1,2})?$
See the regex demo.
Details:
^ - start of string
(?![0.]+$) - fail the match if there are just zeros or dots till end of string
(?:[1-9]\d*|0) - either a non-zero digit followed with any zero or more digits or a zero
(?:\.\d{1,2})? - optionally followed with a sequence of a . and one or two digits
$ - end of string.
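The sketch below, in Python (an assumed language; is_valid_amount is a hypothetical name), runs this pattern over the question's values plus a few edge cases. Because of the (?![0.]+$) check, inputs made only of zeros and dots, such as 0 and 0.0, are rejected as well.

```python
import re

PATTERN = re.compile(r"^(?![0.]+$)(?:[1-9]\d*|0)(?:\.\d{1,2})?$")

def is_valid_amount(value: str) -> bool:
    # True when the whole input matches the pattern.
    return PATTERN.match(value) is not None

accepted = ["0.01", "1.1", "1.02", "120.01"]
# The last three entries are extra edge cases, not taken from the question.
rejected = ["0023", "0100", ".01", ".12", "00.5", "0", "0.0"]

for value in accepted + rejected:
    print(value, is_valid_amount(value))
```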

Removing trailing zeros using REPLACE regex

Remove trailing zeros from a number with 4 decimals
Sample expected output:
1.7500 -> 1.75
1.1010 -> 1.101
1.0000 -> 1
I am new to regex, so I tried this one first, but it is not working:
REPLACE ALL OCCURRENCES OF REGEX '^\.[0]\d{0,3}' IN lv_rate WITH space.
Need help for the right regex to use. Thanks!
EDIT: SHIFT lv_rate RIGHT DELETING TRAILING '0' is not an option.

Try replacing on the following regex pattern:
\.?0+$
Use empty string as the replacement. This will match an optional decimal point, followed by trailing zeroes until the end of the string. See the demo below to see this pattern working.
Demo
This answer assumes that all inputs would always have a decimal component. If not, then we would need to add additional logic.
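To make the caveat concrete, here is a small Python sketch (Python rather than ABAP is an assumption made for illustration; strip_trailing_zeros is a made-up name). The last call shows what happens when the input has no decimal component.

```python
import re

def strip_trailing_zeros(rate: str) -> str:
    # Remove an optional decimal point followed by zeros at the end.
    return re.sub(r"\.?0+$", "", rate)

print(strip_trailing_zeros("1.7500"))  # 1.75
print(strip_trailing_zeros("1.1010"))  # 1.101
print(strip_trailing_zeros("1.0000"))  # 1
print(strip_trailing_zeros("100"))     # 1  (integer zeros are stripped too)
```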

If you want to remove trailing zeros from a number with 4 decimals, one option is to use a capturing group and use group 1 in the replacement.
^(\d+(?=\.\d{4}$)(?:\.\d*[1-9])?)\.?0+$
In parts
^ Start of string
( Capture group 1
\d+ Match 1+ digits
(?=\.\d{4}$) Assert what is on the right is a . and 4 digits
(?:\.\d*[1-9])? Optionally match digits until the last digit 1-9
) Close group 1
\.?0+ Match an optional . and 1 or more times a zero
$ End of string
Regex demo
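A hedged Python sketch of the same replacement (the question uses ABAP, so the language and the name trim_four_decimals are assumptions). Inputs that do not have exactly 4 decimals, or that have no trailing zeros, are left untouched because the pattern does not match them.

```python
import re

FOUR_DECIMALS = re.compile(r"^(\d+(?=\.\d{4}$)(?:\.\d*[1-9])?)\.?0+$")

def trim_four_decimals(rate: str) -> str:
    # Replace the whole match with group 1 (the number without trailing zeros).
    return FOUR_DECIMALS.sub(r"\1", rate)

print(trim_four_decimals("1.7500"))  # 1.75
print(trim_four_decimals("1.1010"))  # 1.101
print(trim_four_decimals("1.0000"))  # 1
print(trim_four_decimals("100"))     # 100 (no 4-decimal part, unchanged)
print(trim_four_decimals("1.2345"))  # 1.2345 (no trailing zeros, unchanged)
```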

How to match the decimal digits which equals 0 at the end of a number in Regex?

I want to remove the zeros at the end of a number coming after the decimal point. To give an example:
12.009000 should match "000"
I have the regex pattern below, but it gives the error "A quantifier inside a lookbehind makes it non-fixed width" and I can't find any way to fix that. What is the correct pattern to match successfully?
Pattern: (?<=\.[0-9]*)0+$

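With Java, you can do it like this.
(\\.\\d+?) captures the decimal point and as few of the following digits as possible
0+$ matches the trailing 0's up to the end of the string
the whole match is replaced with the captured text
$1 is the back reference to the capture group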
str = str.replaceAll("(\\.\\d+?)0+$","$1");
System.out.println(str);
Note: It will leave 12.000000 as 12.0.
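For comparison, the same capture-and-replace idea can be sketched in Python, which also restricts lookbehinds to fixed width (trim_decimal_zeros is a hypothetical name):

```python
import re

def trim_decimal_zeros(number_text: str) -> str:
    # Keep the dot and the significant decimals, drop the zeros after them.
    return re.sub(r"(\.\d+?)0+$", r"\1", number_text)

print(trim_decimal_zeros("12.009000"))  # 12.009
print(trim_decimal_zeros("12.000000"))  # 12.0
print(trim_decimal_zeros("12.5"))       # 12.5
print(trim_decimal_zeros("120"))        # 120
```

Because the pattern requires a decimal point, trailing zeros of an integer such as 120 are not touched.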

(\d+[.]?\d*?)0*$
One more step is needed to replace the dot for numbers such as 12.000
Or to deal with numbers such as 12.000 in one step:
(?:(\d+)\.0*$)|(?:(\d+[.]?\d*?)0*$)

Here is my attempt:
(?:[.][0-9]*[1-9])(0+)$|([.]0+$)
This assumes that the input string is actually a number (it won't protect against things like xyz.001). It will not match at all if there are no trailing zeros after decimal point; and if there are, it removes:
sequence of 0s preceded by a [1-9] after [.][0-9]*
or
a [.] followed by a sequence of 0s.
The result will always be in the captured group if the regex matches.

([\d.]+?)(0*)
"Find digits and dots, but not greedily, then find trailing zeros"
Group 1 is the number. Group 2 is the trailing zeros.

Input Commas into regex during the whole number part of 10,4 decimal

I am looking for a regex that will limit a decimal to 10,4 but in the whole number part (10) I would like it to separate with commas.
For example - 1,123,123,123.1234
This gets me close to what I need - \d{0,10}.\d{4}
But I would like to show commas as in the example.
But I am not sure how to tweak this to achieve what I need?

You should be able to use the following :
(?:\d{1,3}(?:,\d{3}){0,2}|\d(?:,\d{3}){3}|\d{1,10})(?:\.\d{1,4})?
I've tested it here.
The whole pattern is an integer part followed by an optional floating part.
The integer part, (?:\d{1,3}(?:,\d{3}){0,2}|\d(?:,\d{3}){3}|\d{1,10}), is an alternative between three sub-patterns :
up to 9 digits with commas, \d{1,3}(?:,\d{3}){0,2}, which is a leading group of digits of one to three digits followed by up to two optional groups of exactly three digits, groups which are separated by commas
the 10 digits case with commas, \d(?:,\d{3}){3}, in which the leading digits group must contain exactly one digit and is followed by three three-digits groups, groups which are separated by commas
the commas-less number you had to begin with, \d{1,10}
The floating part is a dot followed by at least one digit and at most four.
Note that if you can avoid using a regex you absolutely should, this is the kind of regex which will make maintainers cry...
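The pattern above has no anchors, so the Python sketch below uses re.fullmatch to require that the whole input matches. The choice of Python and the helper name is_grouped_decimal are assumptions made for illustration.

```python
import re

GROUPED = re.compile(
    r"(?:\d{1,3}(?:,\d{3}){0,2}|\d(?:,\d{3}){3}|\d{1,10})(?:\.\d{1,4})?"
)

def is_grouped_decimal(text: str) -> bool:
    # fullmatch anchors the pattern at both ends of the input.
    return GROUPED.fullmatch(text) is not None

print(is_grouped_decimal("1,123,123,123.1234"))  # True
print(is_grouped_decimal("999,999,999"))         # True
print(is_grouped_decimal("1234567890.12"))       # True
print(is_grouped_decimal("12,123,123,123.1"))    # False (11 integer digits)
print(is_grouped_decimal("1,23.5"))              # False (bad grouping)
print(is_grouped_decimal("123.12345"))           # False (5 decimals)
```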

I don't think you can do this with a single regex
The algorithm I use is
Take the part of the number before the decimal point
Convert that to a string
Reverse the string
Split the string into chunks of 3 digits allowing the last group to have 1, 2 or 3 digits (this depends on your programming language)
Join the string together inserting , between each group
Reverse the string.
Concatenate a decimal point and the decimal digits if necessary.
You now have a correctly formatted string.
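The steps above could be sketched in Python roughly as follows. The function name group_thousands is made up, and the sketch assumes a non-negative number that is already available as a string with at most one decimal point.

```python
def group_thousands(number_text: str) -> str:
    # Take the part before the decimal point (and keep the rest for later).
    whole, dot, fraction = number_text.partition(".")
    # Reverse the digits so the chunks of 3 start from the units.
    reversed_digits = whole[::-1]
    # Split into chunks of 3; the last chunk may hold 1, 2 or 3 digits.
    chunks = [reversed_digits[i:i + 3] for i in range(0, len(reversed_digits), 3)]
    # Join with commas, then reverse back to the normal order.
    grouped = ",".join(chunks)[::-1]
    # Re-attach the decimal point and decimals, if there were any.
    return grouped + dot + fraction

print(group_thousands("1123123123.1234"))  # 1,123,123,123.1234
print(group_thousands("12345678.01"))      # 12,345,678.01
print(group_thousands("123"))              # 123
```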

This does the job:
^(?:\d,)?\d{0,3}(?:,\d{1,3}){0,2}\.\d{4}$
Explanation:
^ # beginning of string
(?:\d,)? # non capture group, a digit and a comma, optional
\d{0,3} # 0 to 3 digits
(?: # non capture group
, # a comma
\d{1,3} # 1 to 3 digits
){0,2} # end group, may appear 0, 1 or 2 times
\. # a dot
\d{4} # 4 digits
$ # end of string
Demo

The following perl code uses a trick to work from right to left:
$num = 12345678.01;
$rev = reverse($num);
$rev =~ s/(\d{3})(?=\d)(?!\d*\.)/$1,/g;
$res = reverse($rev);
print "$res\n";
results in
12,345,678.01

Decimal number regular expression, where digit after decimal is optional

I need a regular expression that validates a number, but doesn't require a digit after the decimal.
ie.
123
123.
123.4
would all be valid
123..
would be invalid
Any help would be greatly appreciated!

Use the following:
/^\d*\.?\d*$/
^ - Beginning of the line;
\d* - 0 or more digits;
\.? - An optional dot (escaped, because in regex, . is a special character);
\d* - 0 or more digits (the decimal part);
$ - End of the line.
This allows for .5 decimal rather than requiring the leading zero, such as 0.5

/\d+\.?\d*/
One or more digits (\d+), optional period (\.?), zero or more digits (\d*).
Depending on your usage or regex engine you may need to add start/end line anchors:
/^\d+\.?\d*$/
Debuggex Demo

You need a regular expression like the following to do it properly:
/^[+-]?((\d+(\.\d*)?)|(\.\d+))$/
The same expression with whitespace, using the extended modifier (as supported by Perl):
/^ [+-]? ( (\d+ (\.\d*)?) | (\.\d+) ) $/x
or with comments:
/^ # Beginning of string
[+-]? # Optional plus or minus character
( # Followed by either:
( # Start of first option
\d+ # One or more digits
(\.\d*)? # Optionally followed by: one decimal point and zero or more digits
) # End of first option
| # or
(\.\d+) # One decimal point followed by one or more digits
) # End of grouping of the OR options
$ # End of string (i.e. no extra characters remaining)
/x # Extended modifier (allows whitespace & comments in regular expression)
For example, it will match:
123
23.45
34.
.45
-123
-273.15
-42.
-.45
+516
+9.8
+2.
+.5
And will reject these non-numbers:
. (single decimal point)
-. (negative decimal point)
+. (plus decimal point)
(empty string)
The simpler solutions can incorrectly reject valid numbers or match these non-numbers.
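The accept and reject lists above can be checked with a short script. This sketch uses Python instead of Perl (an assumption for illustration), and is_number is a hypothetical helper name.

```python
import re

NUMBER = re.compile(r"^[+-]?((\d+(\.\d*)?)|(\.\d+))$")

def is_number(text: str) -> bool:
    # True when the whole input is an optionally signed decimal number.
    return NUMBER.match(text) is not None

should_match = ["123", "23.45", "34.", ".45", "-123", "-273.15",
                "-42.", "-.45", "+516", "+9.8", "+2.", "+.5"]
should_reject = [".", "-.", "+.", ""]

for text in should_match + should_reject:
    print(repr(text), is_number(text))
```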

this matches all requirements:
^\d+(\.\d+)?$

Try this regex:
\d+\.?\d*
\d+ digits before optional decimal
\.? optional decimal point (optional due to the ? quantifier)
\d* optional digits after decimal

I ended up using the following:
^\d*\.?\d+$
This makes the following invalid:
.
3.

This is what I did. It's more strict than any of the above (and more correct than some):
^0$|^[1-9]\d*$|^\.\d+$|^0\.\d*$|^[1-9]\d*\.\d*$
Strings that pass:
0
0.
1
123
123.
123.4
.0
.0123
.123
0.123
1.234
12.34
Strings that fail:
.
00000
01
.0.
..
00.123
02.134

you can use this:
^\d+(\.\d)?\d*$
matches:
11
11.1
0.2
does not match:
.2
2.
2.6.9

^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$
should reflect what people usually think of as a well formed decimal number.
The digits before the decimal point can be either a single digit, in which case it can be from 0 to 9, or more than one digits, in which case it cannot start with a 0.
If there are any digits present before the decimal sign, then the decimal and the digits following it are optional. Otherwise, a decimal has to be present followed by at least one digit. Note that multiple trailing 0's are allowed after the decimal point.
grep -E '^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$'
correctly matches the following:
9
0
10
10.
0.
0.0
0.100
0.10
0.01
10.0
10.10
.0
.1
.00
.100
.001
as well as their signed equivalents, whereas it rejects the following:
.
00
01
00.0
01.3
and their signed equivalents, as well as the empty string.

What language? In Perl style: ^\d+(\.\d*)?$

What you asked is already answered so this is just an additional info for those who want only 2 decimal digits if optional decimal point is entered:
^\d+(\.\d{2})?$
^ : start of the string
\d : a digit (equal to [0-9])
+ : between one and unlimited times
Capturing Group (\.\d{2})?
? : between zero and one times
\. : the character .
\d : a digit (equal to [0-9])
{2} : exactly 2 times
$ : end of the string
1 : match
123 : match
123.00 : match
123. : no match
123.. : no match
123.0 : no match
123.000 : no match
123.00.00 : no match

try this. ^[0-9]\d{0,9}(\.\d{1,3})?%?$ it is tested and worked for me.

Regular expression:
^\d+((.)|(.\d{0,1})?)$
Use \d+ instead of \d{0,1} if you want to allow more than one digit, or \d{0,2} instead of \d{0,1} if you want to allow up to two digits after the separator. See the variants below for reference:
or
^\d+((.)|(.\d{0,2})?)$
or
^\d+((.)|(.\d+)?)$
Explanation
(These are generated by regex101)
^ asserts position at start of a line
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ((.)|(.\d{0,1})?)
1st Alternative (.)
2nd Capturing Group (.)
. matches any character (except for line terminators)
2nd Alternative (.\d{0,1})?
3rd Capturing Group (.\d{0,1})?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
\d matches a digit (equivalent to [0-9])
{0,1} matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Sandbox
Play with regex here: https://regex101.com/

(?<![^d])\d+(?:\.\d+)?(?![^d])
clean and simple.
This uses Suffix and Prefix, RegEx features.
It directly returns true - false for IsMatch condition

^\d+(()|(\.\d+)?)$
Came up with this. Allows both integer and decimal, but forces a complete decimal (leading and trailing numbers) if you decide to enter a decimal.

In Perl, use Regexp::Common which will allow you to assemble a finely-tuned regular expression for your particular number format. If you are not using Perl, the generated regular expression can still typically be used by other languages.
Printing the result of generating the example regular expressions in Regexp::Common::Number:
$ perl -MRegexp::Common=number -E 'say $RE{num}{int}'
(?:(?:[-+]?)(?:[0123456789]+))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[-+]?)(?:[0123456789]+))|))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}{-base=>16}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789ABCDEF])(?:[0123456789ABCDEF]*)(?:(?:[.])(?:[0123456789ABCDEF]{0,}))?)(?:(?:[G])(?:(?:[-+]?)(?:[0123456789ABCDEF]+))|))

For those who wanna match the same thing as JavaScript does:
[-+]?(\d+\.?\d*|\.\d+)
Matches:
1
+1
-1
0.1
-1.
.1
+.1
Drawing: https://regexper.com/#%5B-%2B%5D%3F%28%5Cd%2B%5C.%3F%5Cd*%7C%5C.%5Cd%2B%29

We Keep Coding
