How to replace numbers with numbers separated with numeric separators? - regex

Given a text that contains numbers, which regular expression can I use to add numeric separators?
The separator character is an underscore. Counting from the right end of each number, there is a separator between every group of 3 digits.
Example input:
Key: 100000,
Value: 120000000000
Expected output:
Key: 100_000,
Value: 120_000_000_000
You can use any popular regex flavor (Perl, PCRE, Python, etc.).

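The pattern (?<=\d)(?=(?:\d\d\d)+\b) matches the zero-width positions where an underscore should be inserted: positions that follow a digit and precede one or more complete groups of three digits ending at a word boundary.
Then it is just a matter of replacing each of those empty matches with an underscore using the language's replace function. For instance, in JavaScript: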
let regex = /(?<=\d)(?=(?:\d\d\d)+\b)/g
let inputstr = `Key: 100000,
Value: 120000000000`;
let result = inputstr.replace(regex, "_");
console.log(result);
And in Python:
import re
regex = r"(?<=\d)(?=(?:\d\d\d)+\b)"
inputstr = """Key: 100000,
Value: 120000000000"""
result = re.sub(regex, "_", inputstr)
print(result)
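The (?<=\d) lookbehind is what stops an underscore from being inserted in front of a number whose digit count is a multiple of three. A quick Python comparison with and without it (the sample string is only an illustration):

```python
import re

with_lookbehind = r"(?<=\d)(?=(?:\d\d\d)+\b)"
without_lookbehind = r"(?=(?:\d\d\d)+\b)"

sample = "Total: 123456"

# The lookbehind requires a digit on the left, so no separator goes before the first digit
print(re.sub(with_lookbehind, "_", sample))     # Total: 123_456

# Without it, the position just before "123456" also qualifies
print(re.sub(without_lookbehind, "_", sample))  # Total: _123_456
```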

Regular expressions are used to find patterns in a string. What you do with the matches is language-specific.
Matching three digits is simple (/\d{3}/), but splitting on that pattern consumes the digits themselves and counts from the left. Instead, split at the zero-width positions that are preceded by a digit and followed by whole groups of three digits up to the end of the string, then join the pieces with an underscore. (These snippets assume the string holds only the number, as in the examples.)
Perl, using split and then join:
my $string = "120000000000";
my $new_string = join('_', split(/(?<=\d)(?=(?:\d{3})+$)/, $string));
# value of $new_string is: 120_000_000_000
PHP, using preg_split and then implode:
$string = "120000000000";
$new_string = implode("_", preg_split('/(?<=\d)(?=(?:\d{3})+$)/', $string));
# value of $new_string is: 120_000_000_000
VB, using Regex.Split and then String.Join:
Imports System.Text.RegularExpressions
Dim MyString As String = "120000000000"
Dim new_string As String = String.Join("_", Regex.Split(MyString, "(?<=\d)(?=(?:\d{3})+$)"))
'new_string value is: 120_000_000_000
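A quick way to see the difference between splitting on the three digits themselves and splitting at zero-width positions is to try both in Python (Python 3.7 or later is assumed, since older versions do not split on empty matches):

```python
import re

number = "120000000000"

# Splitting on the digits themselves consumes them, leaving only empty pieces
print(re.split(r"\d{3}", number))  # ['', '', '', '', '']

# Splitting at zero-width positions keeps every digit; groups are counted from the right
zero_width = r"(?<=\d)(?=(?:\d{3})+$)"
print("_".join(re.split(zero_width, number)))  # 120_000_000_000
print("_".join(re.split(zero_width, "1234")))  # 1_234
```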

Related

Extracting Lines of data from a string with RegEx

I have several strings, e.g.
(3)_(9)--(11).(FT-2)
(10)--(20).(10)/test--(99)
I am trying to use Regex.Match (with a pattern I haven't figured out) to get a list like this:
First sample:
3
_
9
--
11
.
FT-2
Second sample:
10
--
20
.
10
/test--
99
So there are several values in parentheses, with arbitrary text between them.
Can anyone help me do this in VB.NET, so that a given string returns this list?
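One option is to use the String.Split method, passing both parentheses as separator characters and removing the empty entries:
Dim parts As String() = "(3)_(9)--(11).(FT-2)".Split(New Char() {"("c, ")"c}, StringSplitOptions.RemoveEmptyEntries)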
Another option is to match every run of characters other than ( and ).
As a regex, that is [^()]+
Breakdown
"[^()]" ' Match any single character NOT present in the list “()”
"+" ' Between one and unlimited times, as many times as possible, giving back as needed (greedy)
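A quick way to check the [^()]+ pattern outside VB.NET is Python's re.findall, which returns every non-overlapping match as a list (this only illustrates the pattern; it is not the VB.NET code the question asks for):

```python
import re

samples = ["(3)_(9)--(11).(FT-2)", "(10)--(20).(10)/test--(99)"]

for s in samples:
    # Each match is a maximal run of characters that are neither "(" nor ")"
    print(re.findall(r"[^()]+", s))
# ['3', '_', '9', '--', '11', '.', 'FT-2']
# ['10', '--', '20', '.', '10', '/test--', '99']
```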
You can use the following block of code to extract all matches:
Try
Dim RegexObj As New Regex("[^()]+", RegexOptions.IgnoreCase)
Dim MatchResults As Match = RegexObj.Match(SubjectString)
While MatchResults.Success
' matched text: MatchResults.Value
' match start: MatchResults.Index
' match length: MatchResults.Length
MatchResults = MatchResults.NextMatch()
End While
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
This should work:
Dim input As String = "(3)_(9)--(11).(FT-2)"
Dim searchPattern As String = "\((?<keep>[^)]+)\)|(?<=\))(?<keep>[^()]+)"
Dim replacementPattern As String = "${keep}" + Environment.NewLine
Dim output As String = RegEx.Replace(input, searchPattern, replacementPattern)
The simplest way is to use Regex.Split (formulated as a little console test):
Dim input = {"(3)_(9)--(11).(FT-2)", "(10)--(20).(10)/test--(99)"}
For Each s As String In input
    Dim parts = Regex.Split(s, "\(|\)")
    Console.WriteLine($"Input = {s}")
    For Each p As String In parts
        ' Skip the empty strings produced by a leading "(" or a trailing ")"
        If p <> "" Then Console.WriteLine(p)
    Next
Next
Console.ReadKey()
So basically we have a one-liner for the regex part.
The regular expression \(|\) means: split at ( or ), where the parentheses are escaped with \ because they have a special meaning in regex.
The slightly shorter character class [()] would produce the same result; inside [], the parentheses need no escaping.
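Note that splitting on [()] also yields an empty string before a leading ( and after a trailing ). A Python sketch of the same split that drops those empty pieces:

```python
import re

s = "(3)_(9)--(11).(FT-2)"
raw = re.split(r"[()]", s)
print(raw)  # ['', '3', '_', '9', '--', '11', '.', 'FT-2', '']

# Drop the empty strings produced at the leading "(" and the trailing ")"
parts = [p for p in raw if p]
print(parts)  # ['3', '_', '9', '--', '11', '.', 'FT-2']
```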

VBScript RegEx - match between words

I'm having a hard time coming up with a working RegEx that works in VBScript. I'm trying to match all text between 2 keywords:
(?<=key)(.*)(?=Id)
This throws a RegEx error in VBScript.
Blob I'm matching against:
\"key\":[\"food\",\"real\",\"versus\",\"giant\",\"giant gummy\",\"diy candy\",\"candy\",\"gummy worm\",\"pizza\",\"fries\",\"spooky diy science\",\"spooky\",\"trapped\"],\"Id\"
Ideally, I'd end up with a comma delimited list like this:
food,real,versus,giant,giant gummy,diy candy,candy,gummy worm,pizza,fries,spooky diy science,spooky,trapped
but I'd settle for all text between 2 keywords working in VBScript.
Thanks in advance!
VBScript's regular expression engine doesn't support lookbehind assertions, so you'll want to do something like this instead:
s = "\""key\"":[\""food\"",\""real\"",\""trapped\""],\""Id\"""
'remove backslashes and double quotes from string
s1 = Replace(s, "\", "")
s1 = Replace(s1, Chr(34), "")
Set re = New RegExp
re.Pattern = "key:\[(.*?)\],Id"
For Each m In re.Execute(s1)
list = m.Submatches(0)
Next
WScript.Echo list
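The same strip-then-capture approach, sketched in Python for comparison (the shortened blob mirrors the VBScript example; Python's re module does support lookbehind, but stripping the quotes first keeps the pattern simple):

```python
import re

# Shortened blob as in the VBScript example; the backslashes are literal characters
blob = r'\"key\":[\"food\",\"real\",\"trapped\"],\"Id\"'

# Remove backslashes and double quotes, as the VBScript answer does
cleaned = blob.replace("\\", "").replace('"', "")
print(cleaned)  # key:[food,real,trapped],Id

# Lazy (.*?) capture between the two keywords
m = re.search(r"key:\[(.*?)\],Id", cleaned)
items = m.group(1) if m else ""
print(items)  # food,real,trapped
```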

How to search and add a string before the pattern in python using regular expressions

I am writing a program to find and replace a string in all files with a given extension. I am using regular expressions for searching. The task is to find all the occurrences and modify them.
If my string is "The number is 1234567890"
The result after searching and replacing should be +911234567890.
I think I can use re.sub() here, like:
s = "The number is 1234567890"
re.sub(r"\d{10}",??,s)
What should the second argument be? I don't know in advance what the number will be; I have to modify the matched string itself by prefixing it with +91.
I could do it using re.findall() and str.replace(), like:
s = "The number is 1234567890 and 2345678901"
matches = re.findall(r'\d{10}',s)
for match in matches:
    s = s.replace(match, "+91" + match)
After this, s is "The number is +911234567890 and +912345678901".
Is this the only way of doing it? Is it not possible with re.sub()?
If it is, please help me. Thank you!
Try this regex:
(?=\d{10})
Explanation:
(?=\d{10}) - positive lookahead to find a zero-length match which is immediately followed by 10 digits
Code
import re
regex = r"(?=\d{10})"
test_str = "The number is 1234567890"
subst = "+91"
result = re.sub(regex, subst, test_str)
if result:
    print(result)
OR
If I use your regex with a capturing group, the code would look like this (in a Python replacement string, the group is referenced as \1, not $1):
import re
regex = r"(\d{10})"
test_str = "The number is 1234567890"
subst = r"+91\1"
result = re.sub(regex, subst, test_str)
if result:
    print(result)
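Both patterns above match any run of 10 digits, including 10 digits inside a longer number. A sketch that adds word boundaries (assuming only standalone 10-digit numbers should be prefixed), applied to the two-number string from the question:

```python
import re

s = "The number is 1234567890 and 2345678901"

# \b...\b restricts the match to standalone 10-digit numbers; \1 re-inserts the captured digits
print(re.sub(r"\b(\d{10})\b", r"+91\1", s))
# The number is +911234567890 and +912345678901

# Without boundaries, the lookahead also fires twice inside an 11-digit number
print(re.sub(r"(?=\d{10})", "+91", "12345678901"))        # +911+912345678901
print(re.sub(r"\b(\d{10})\b", r"+91\1", "12345678901"))   # 12345678901 (unchanged)
```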

Removing commas used as thousand separators in dollar price amounts in longer strings using Scala regex

I am trying to remove the commas from dollar amounts in a string. For example, I have the string: val str = "Hello the cost is $323,999 and it has 3 modes 1,2, and 3"
I basically want to get the output: "Hello the cost is $323999 and it has 3 modes 1,2, and 3"
I used the regex:
val pattern = """\$([0-9]+(?:,[0-9]+)*)""".r
val replacedStr = pattern replaceAllIn (str, m => m.group(1).replace(",", ""))
The issue is that, because of the $3 in the matched text, Scala tries to find group 3 in the regex match and throws java.lang.IndexOutOfBoundsException: No group 3.
How do I get rid of this issue?
Add the dollar symbol back when replacing, but escape it with double backslashes:
val pattern = """\$([0-9]+(?:,[0-9]+)*)""".r
val replacedStr = pattern replaceAllIn (str, m => "\\$" + m.group(1).replace(",", ""))
In the replacement string, $ introduces a group reference, so a literal dollar sign has to be escaped as \$. Because this is an ordinary string literal, two backslashes are needed to get one literal backslash into the string.
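In Python, $ has no special meaning in a re.sub replacement (backreferences are written \1 or \g<1>), and when the replacement is a function its return value is inserted literally, so the same approach needs no escaping. A minimal sketch with the question's pattern and input:

```python
import re

s = "Hello the cost is $323,999 and it has 3 modes 1,2, and 3"
pattern = re.compile(r"\$([0-9]+(?:,[0-9]+)*)")

# Re-add the "$" and strip the commas from the captured amount
result = pattern.sub(lambda m: "$" + m.group(1).replace(",", ""), s)
print(result)  # Hello the cost is $323999 and it has 3 modes 1,2, and 3
```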

Good way to find / replace using regex

I have a value that our OCR program does not read correctly. The misreading is predictable, so I would like to fix it with a regex find/replace (because that is how we already extract the data).
We get the named group like this: (?<Foo>.*?)
I would like to replace 'N1123456' with 'NY123456'. We know that whenever we get N1, it should be NY.
What can I try to get this done in the same regular expression?
Capture the non-digit prefix and the digits that follow the misread 1 in separate groups, then put Y between them in place of the 1:
(\D+)1(\d+)
Enclose it in \b ... \b or ^ ... $ for better precision.
Sample code:
PHP:
$re = '/(\D+)1(\d+)/';
$str = "N1123456";
$subst = '$1Y$2';
$result = preg_replace($re, $subst, $str, 1);
Python:
import re
p = re.compile(r'(\D+)1(\d+)')
test_str = "N1123456"
subst = r"\1Y\2"
result = p.sub(subst, test_str)
Java:
System.out.println("N1123456".replaceAll("(\\D+)1(\\d+)", "$1Y$2"));
If you expect N1 to be always followed by 6 digits, then you can do this:
Replace this: \bN1(\d{6})\b with this: NY$1.
This replaces N1 with NY wherever it is followed by exactly 6 digits.
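A quick Python check of the \bN1(\d{6})\b idea (Python writes the backreference as \1 rather than $1):

```python
import re

# Only an N1 followed by exactly six digits and then a word boundary is rewritten
pattern = r"\bN1(\d{6})\b"
print(re.sub(pattern, r"NY\1", "N1123456"))   # NY123456
print(re.sub(pattern, r"NY\1", "N11234567"))  # N11234567 (seven digits, left unchanged)
```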
This is what I would do:
Dim str = Regex.Replace("N1123456", "\bN1(\d+)", "NY$1")
The expression that finds the text is N1 followed by digits: \bN1(\d+).
The digits are captured in group 1, which is kept and appended to NY in the replacement: NY$1
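The question extracts values through a named group, (?<Foo>.*?). Python's re module writes named groups as (?P<name>...) and refers to them in a replacement as \g<name>, so a named-group version of the N1 fix could look like this (the group name digits is a hypothetical choice, not from the question):

```python
import re

# Named-group variant of \bN1(\d+); "digits" is a hypothetical group name
pattern = re.compile(r"\bN1(?P<digits>\d+)")

value = "N1123456"
fixed = pattern.sub(r"NY\g<digits>", value)
print(fixed)  # NY123456
```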