I've test cases with a series of words like this :
{
input: "Halley's Comet",
expected: "HC",
},
{
input: "First In, First Out",
expected: "FIFO",
},
{
input: "The Road _Not_ Taken",
expected: "TRNT",
},
I want with one regex to match all first letters of these words, avoid char: "_" to be matched as a first letter and count single quote in the word.
Currently, I have this regex working on pcre syntax but not with Go regexp package : (?<![a-zA-Z0-9'])([a-zA-Z0-9'])
I know lookarounds aren't supported by Go but I'm looking for a good way to do that.
I also use this func to get an array of all strings : re.FindAllString(s, -1)
Thanks for helping.
Something that plays with character classes and word boundaries should suffice:
\b_*([a-z])[a-z]*(?:'s)?_*\b\W*
demo
Usage:
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?i)\b_*([a-z])[a-z]*(?:'s)?_*\b\W*`)
fmt.Println(re.ReplaceAllString("O'Brian's dog", "$1"))
}
ftr, regexp less solution
package main
import (
"fmt"
)
func main() {
inputs := []string{"Hallمرحباey's Comet", "First In, First Out", "The Road _Not_ Taken", "O'Brian's Dog"}
c := [][]string{}
w := [][]string{}
for _, input := range inputs {
c = append(c, firstLet(input))
w = append(w, words(input))
}
fmt.Printf("%#v\n", w)
fmt.Printf("%#v\n", c)
}
func firstLet(in string) (out []string) {
var inword bool
for _, r := range in {
if !inword {
if isChar(r) {
inword = true
out = append(out, string(r))
}
} else if r == ' ' {
inword = false
}
}
return out
}
func words(in string) (out []string) {
var inword bool
var w []rune
for _, r := range in {
if !inword {
if isChar(r) {
w = append(w, r)
inword = true
}
} else if r == ' ' {
if len(w) > 0 {
out = append(out, string(w))
w = w[:0]
}
inword = false
} else if r != '_' {
w = append(w, r)
}
}
if len(w) > 0 {
out = append(out, string(w))
}
return out
}
func isChar(r rune) bool {
return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}
outputs
[][]string{[]string{"Hallمرحباey's", "Comet"}, []string{"First", "In,", "First", "Out"}, []string{"The", "Road", "Not", "Taken"}, []string{"O'Brian's", "Dog"}}
[][]string{[]string{"H", "C"}, []string{"F", "I", "F", "O"}, []string{"T", "R", "N", "T"}, []string{"O", "D"}}
Hey there, I'm trying to perform a backwards regular expression search on a string to divide it into groups of 3 digits. As far as i can see from the AS3 documentation, searching backwards is not possible in the reg ex engine.
The point of this exercise is to insert triplet commas into a number like so:
10000000 => 10,000,000
I'm thinking of doing it like so:
string.replace(/(\d{3})/g, ",$1")
But this is not correct due to the search not happening from the back and the replace $1 will only work for the first match.
I'm getting the feeling I would be better off performing this task using a loop.
UPDATE:
Due to AS3 not supporting lookahead this is how I have solved it.
public static function formatNumber(number:Number):String
{
var numString:String = number.toString()
var result:String = ''
while (numString.length > 3)
{
var chunk:String = numString.substr(-3)
numString = numString.substr(0, numString.length - 3)
result = ',' + chunk + result
}
if (numString.length > 0)
{
result = numString + result
}
return result
}
If your language supports postive lookahead assertions, then I think the following regex will work:
(\d)(?=(\d{3})+$)
Demonstrated in Java:
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class CommifyTest {
#Test
public void testCommify() {
String num0 = "1";
String num1 = "123456";
String num2 = "1234567";
String num3 = "12345678";
String num4 = "123456789";
String regex = "(\\d)(?=(\\d{3})+$)";
assertEquals("1", num0.replaceAll(regex, "$1,"));
assertEquals("123,456", num1.replaceAll(regex, "$1,"));
assertEquals("1,234,567", num2.replaceAll(regex, "$1,"));
assertEquals("12,345,678", num3.replaceAll(regex, "$1,"));
assertEquals("123,456,789", num4.replaceAll(regex, "$1,"));
}
}
Found on http://gskinner.com/RegExr/
Community > Thousands separator
Pattern: /\d{1,3}(?=(\d{3})+(?!\d))/g
Replace: $&,
trace ( String("1000000000").replace( /\d{1,3}(?=(\d{3})+(?!\d))/g , "$&,") );
It done the job!
If your regex engine has positive lookaheads, you could do something like this:
string.replace(/(\d)(?=(\d\d\d)+$)/, "$1,")
Where the positive lookahead (?=...) means that the regex only matches when the lookahead expression ... matches.
(Note that lookaround-expressions are not always very efficient.)
While many of these answers work fine with positive integers, many of their argument inputs are cast as Numbers, which implies that they can handle negative values or contain decimals, and here all of the solutions fail. Though the currently selected answer does not assume a Number I was curious to find a solution that could and was also more performant than RegExp (which AS3 does not do well).
I put together many of the answers here in a testing class (and included a solution from this blog and an answer of my own called commaify) and formatted them in a consistent way for easy comparison:
package
{
public class CommaNumberSolutions
{
public static function commaify( input:Number ):String
{
var split:Array = input.toString().split( '.' ),
front:String = split[0],
back:String = ( split.length > 1 ) ? "." + split[1] : null,
n:int = input < 0 ? 2 : 1,
commas:int = Math.floor( (front.length - n) / 3 ),
i:int = 1;
for ( ; i <= commas; i++ )
{
n = front.length - (3 * i + i - 1);
front = front.slice( 0, n ) + "," + front.slice( n );
}
if ( back )
return front + back;
else
return front;
}
public static function getCommaString( input:Number ):String
{
var s:String = input.toString();
if ( s.length <= 3 )
return s;
var i:int = s.length % 3;
if ( i == 0 )
i = 3;
for ( ; i < s.length; i += 4 )
{
var part1:String = s.substr(0, i);
var part2:String = s.substr(i, s.length);
s = part1.concat(",", part2);
}
return s;
}
public static function formatNumber( input:Number ):String
{
var s:String = input.toString()
var result:String = ''
while ( s.length > 3 )
{
var chunk:String = s.substr(-3)
s = s.substr(0, s.length - 3)
result = ',' + chunk + result
}
if ( s.length > 0 )
result = s + result
return result
}
public static function commaCoder( input:Number ):String
{
var s:String = "";
var len:Number = input.toString().length;
for ( var i:int = 0; i < len; i++ )
{
if ( (len-i) % 3 == 0 && i != 0)
s += ",";
s += input.toString().charAt(i);
}
return s;
}
public static function regex1( input:Number ):String
{
return input.toString().replace( /-{0,1}(\d)(?=(\d\d\d)+$)/g, "$1," );
}
public static function regex2( input:Number ):String
{
return input.toString().replace( /-{0,1}\d{1,3}(?=(\d{3})+(?!\d))/g , "$&,")
}
public static function addCommas( input:Number ):String
{
var negative:String = "";
if ( input < 0 )
{
negative = "-";
input = Math.abs(input);
}
var s:String = input.toString();
var results:Array = s.split(/\./);
s = results[0];
if ( s.length > 3 )
{
var mod:Number = s.length % 3;
var output:String = s.substr(0, mod);
for ( var i:Number = mod; i < s.length; i += 3 )
{
output += ((mod == 0 && i == 0) ? "" : ",") + s.substr(i, 3);
}
if ( results.length > 1 )
{
if ( results[1].length == 1 )
return negative + output + "." + results[1] + "0";
else
return negative + output + "." + results[1];
}
else
return negative + output;
}
if ( results.length > 1 )
{
if ( results[1].length == 1 )
return negative + s + "." + results[1] + "0";
else
return negative + s + "." + results[1];
}
else
return negative + s;
}
}
}
Then I tested each for accuracy and performance:
package
{
public class TestCommaNumberSolutions
{
private var functions:Array;
function TestCommaNumberSolutions()
{
functions = [
{ name: "commaify()", f: CommaNumberSolutions.commaify },
{ name: "addCommas()", f: CommaNumberSolutions.addCommas },
{ name: "getCommaString()", f: CommaNumberSolutions.getCommaString },
{ name: "formatNumber()", f: CommaNumberSolutions.formatNumber },
{ name: "regex1()", f: CommaNumberSolutions.regex1 },
{ name: "regex2()", f: CommaNumberSolutions.regex2 },
{ name: "commaCoder()", f: CommaNumberSolutions.commaCoder }
];
verify();
measure();
}
protected function verify():void
{
var assertions:Array = [
{ input: 1, output: "1" },
{ input: 21, output: "21" },
{ input: 321, output: "321" },
{ input: 4321, output: "4,321" },
{ input: 54321, output: "54,321" },
{ input: 654321, output: "654,321" },
{ input: 7654321, output: "7,654,321" },
{ input: 987654321, output: "987,654,321" },
{ input: 1987654321, output: "1,987,654,321" },
{ input: 21987654321, output: "21,987,654,321" },
{ input: 321987654321, output: "321,987,654,321" },
{ input: 4321987654321, output: "4,321,987,654,321" },
{ input: 54321987654321, output: "54,321,987,654,321" },
{ input: 654321987654321, output: "654,321,987,654,321" },
{ input: 7654321987654321, output: "7,654,321,987,654,321" },
{ input: 87654321987654321, output: "87,654,321,987,654,321" },
{ input: -1, output: "-1" },
{ input: -21, output: "-21" },
{ input: -321, output: "-321" },
{ input: -4321, output: "-4,321" },
{ input: -54321, output: "-54,321" },
{ input: -654321, output: "-654,321" },
{ input: -7654321, output: "-7,654,321" },
{ input: -987654321, output: "-987,654,321" },
{ input: -1987654321, output: "-1,987,654,321" },
{ input: -21987654321, output: "-21,987,654,321" },
{ input: -321987654321, output: "-321,987,654,321" },
{ input: -4321987654321, output: "-4,321,987,654,321" },
{ input: -54321987654321, output: "-54,321,987,654,321" },
{ input: -654321987654321, output: "-654,321,987,654,321" },
{ input: -7654321987654321, output: "-7,654,321,987,654,321" },
{ input: -87654321987654321, output: "-87,654,321,987,654,321" },
{ input: .012345, output: "0.012345" },
{ input: 1.012345, output: "1.012345" },
{ input: 21.012345, output: "21.012345" },
{ input: 321.012345, output: "321.012345" },
{ input: 4321.012345, output: "4,321.012345" },
{ input: 54321.012345, output: "54,321.012345" },
{ input: 654321.012345, output: "654,321.012345" },
{ input: 7654321.012345, output: "7,654,321.012345" },
{ input: 987654321.012345, output: "987,654,321.012345" },
{ input: 1987654321.012345, output: "1,987,654,321.012345" },
{ input: 21987654321.012345, output: "21,987,654,321.012345" },
{ input: -.012345, output: "-0.012345" },
{ input: -1.012345, output: "-1.012345" },
{ input: -21.012345, output: "-21.012345" },
{ input: -321.012345, output: "-321.012345" },
{ input: -4321.012345, output: "-4,321.012345" },
{ input: -54321.012345, output: "-54,321.012345" },
{ input: -654321.012345, output: "-654,321.012345" },
{ input: -7654321.012345, output: "-7,654,321.012345" },
{ input: -987654321.012345, output: "-987,654,321.012345" },
{ input: -1987654321.012345, output: "-1,987,654,321.012345" },
{ input: -21987654321.012345, output: "-21,987,654,321.012345" }
];
var i:int;
var len:int = assertions.length;
var assertion:Object;
var f:Function;
var s1:String;
var s2:String;
for each ( var o:Object in functions )
{
i = 0;
f = o.f;
trace( '\rVerify: ' + o.name );
for ( ; i < len; i++ )
{
assertion = assertions[ i ];
s1 = f.apply( null, [ assertion.input ] );
s2 = assertion.output;
if ( s1 !== s2 )
trace( 'Test #' + i + ' Failed: ' + s1 + ' !== ' + s2 );
}
}
}
protected function measure():void
{
// Generate random inputs
var values:Array = [];
for ( var i:int = 0; i < 999999; i++ ) {
values.push( Math.random() * int.MAX_VALUE * ( Math.random() > .5 ? -1 : 1) );
}
var len:int = values.length;
var stopwatch:Stopwatch = new Stopwatch;
var s:String;
var f:Function;
trace( '\rTesting ' + len + ' random values' );
// Test each function
for each ( var o:Object in functions )
{
i = 0;
s = "";
f = o.f;
stopwatch.start();
for ( ; i < len; i++ ) {
s += f.apply( null, [ values[i] ] ) + " ";
}
stopwatch.stop();
trace( o.name + '\t\ttook ' + (stopwatch.elapsed/1000) + 's' ); //(stopwatch.elapsed/len) + 'ms'
}
}
}
}
import flash.utils.getTimer;
class Stopwatch
{
protected var startStamp:int;
protected var stopStamp:int;
protected var _started:Boolean;
protected var _stopped:Boolean;
function Stopwatch( startNow:Boolean = true ):void
{
if ( startNow )
start();
}
public function start():void
{
startStamp = getTimer();
_started = true;
_stopped = false;
}
public function stop():void
{
stopStamp = getTimer();
_stopped = true;
_started = false;
}
public function get elapsed():int
{
return ( _stopped ) ? stopStamp - startStamp : ( _started ) ? getTimer() - startStamp : 0;
}
public function get started():Boolean
{
return _started;
}
public function get stopped():Boolean
{
return _stopped;
}
}
Because of AS3's lack of precision with larger Numbers every class failed these tests:
Test #15 Failed: 87,654,321,987,654,320 !== 87,654,321,987,654,321
Test #31 Failed: -87,654,321,987,654,320 !== -87,654,321,987,654,321
Test #42 Failed: 21,987,654,321.012344 !== 21,987,654,321.012345
Test #53 Failed: -21,987,654,321.012344 !== -21,987,654,321.012345
But only two functions passed all of the other tests: commaify() and addCommas().
The performance tests show that commaify() is the most preformant of all the solutions:
Testing 999999 random values
commaify() took 12.411s
addCommas() took 17.863s
getCommaString() took 18.519s
formatNumber() took 14.409s
regex1() took 40.654s
regex2() took 36.985s
Additionally commaify() can be extended to including arguments for decimal length and zero-padding on the decimal portion — it also outperforms the others at 13.128s:
public static function cappedDecimal( input:Number, decimalPlaces:int = 2 ):Number
{
if ( decimalPlaces == 0 )
return Math.floor( input );
var decimalFactor:Number = Math.pow( 10, decimalPlaces );
return Math.floor( input * decimalFactor ) / decimalFactor;
}
public static function cappedDecimalString( input:Number, decimalPlaces:int = 2, padZeros:Boolean = true ):String
{
if ( padZeros )
return cappedDecimal( input, decimalPlaces ).toFixed( decimalPlaces );
else
return cappedDecimal( input, decimalPlaces ).toString();
}
public static function commaifyExtended( input:Number, decimalPlaces:int = 2, padZeros:Boolean = true ):String
{
var split:Array = cappedDecimalString( input, decimalPlaces, padZeros ).split( '.' ),
front:String = split[0],
back:String = ( split.length > 1 ) ? "." + split[1] : null,
n:int = input < 0 ? 2 : 1,
commas:int = Math.floor( (front.length - n) / 3 ),
i:int = 1;
for ( ; i <= commas; i++ )
{
n = front.length - (3 * i + i - 1);
front = front.slice( 0, n ) + "," + front.slice( n );
}
if ( back )
return front + back;
else
return front;
}
So, I'd offer that commaify() meets the demands of versatility and performance though certainly not the most compact or elegant.
This really isn't the best use of RegEx... I'm not aware of a number formatting function, but this thread seems to provide a solution.
function commaCoder(yourNum):String {
//var yourNum:Number = new Number();
var numtoString:String = new String();
var numLength:Number = yourNum.toString().length;
numtoString = "";
for (i=0; i<numLength; i++) {
if ((numLength-i)%3 == 0 && i != 0) {
numtoString += ",";
}
numtoString += yourNum.toString().charAt(i);
trace(numtoString);
}
return numtoString;
}
If you really are insistent on using RegEx, you could just reverse the string, apply the RegEx replace function, then reverse it back.
A sexeger is good for this. In brief, a sexeger is a reversed regex run against a reversed string that you reverse the output of. It is generally more efficient than the alternative. Here is some pseudocode for what you want to do:
string = reverse string
string.replace(/(\d{3})(?!$)/g, "$1,")
string = reverse string
Here is is a Perl implemntation
#!/usr/bin/perl
use strict;
use warnings;
my $s = 13_456_789;
for my $n (1, 12, 123, 1234, 12345, 123456, 1234567) {
my $s = reverse $n;
$s =~ s/([0-9]{3})(?!$)/$1,/g;
$s = reverse $s;
print "$s\n";
}
You may want to consider NumberFormatter
I'll take the downvotes for not being the requested language, but this non-regex technique should apply (and I arrived here via searching for "C# regex to add commas into number")
var raw = "104241824 15202656 KB 13498560 KB 1612672KB already 1,000,000 or 99.999 or 9999.99";
int i = 0;
bool isnum = false;
var formatted = raw.Reverse().Aggregate(new StringBuilder(), (sb, c) => {
//$"{i}: [{c}] {isnum}".Dump();
if (char.IsDigit(c) && c != ' ' && c!= '.' && c != ',') {
if (isnum) {
if (i == 3) {
//$"ins ,".Dump();
sb.Insert(0, ',');
i = 0;
}
}
else isnum = true;
i++;
}
else {
isnum = false;
i = 0;
}
sb.Insert(0, c);
return sb;
});
results in:
104,241,824 15,202,656 KB 13,498,560 KB 1,612,672KB already 1,000,000 or 99.999 or 9,999.99
// This is a simple code and it works fine...:)
import java.util.Scanner;
public class NumberWithCommas {
public static void main(String a[]) {
Scanner sc = new Scanner(System.in);
String num;
System.out.println("\n enter the number:");
num = sc.next();
printNumber(num);
}
public static void printNumber(String ar) {
int len, i = 0, temp = 0;
len = ar.length();
temp = len / 3;
if (len % 3 == 0)
temp = temp - 1;
len = len + temp;
char[] ch = ar.toCharArray();
char[] ch1 = new char[len];
for (int j = 0, k = (ar.length() - 1); j < len; j++)
{
if (i < 3)
{
ch1[j] = ch[k];
i++;
k--;
}
else
{
ch1[j] = ',';
i = 0;
}
}
for (int j = len - 1; j >= 0; j--)
System.out.print(ch1[j]);
System.out.println("");
}
}
If you can't use lookahead on regular expressions, you can use this:
string.replace(/^(.*?,)?(\d{1,3})((?:\d{3})+)$/, "$1$2,$3")
inside a loop until there's nothing to replace.
For example, a perlish solution would look like this:
my $num = '1234567890';
while ($num =~ s/^(.*?,)?(\d{1,3})((?:\d{3})+)$/$1$2,$3/) {}
Perl RegExp 1 liner:
1 while $VAR{total} =~ s/(.*\d)(\d\d\d)/$1,$2/g;
Try this code. it's simple and best performance.
var reg:RegExp=/\d{1,3}(?=(\d{3})+(?!\d))/g;
var my_num:Number = 48712694;
var my_num_str:String = String(my_num).replace(reg,"$&,");
trace(my_num_str);
::output::
48,712,694