Regular expression to extract numbers from long string containing lots of punctuation - regex

I am trying to extract numbers from a string that also contains characters such as %, /, etc., e.g. (%2459348?:, or :2434545/%). How can I separate them in VB.NET?

You want only the digits, right?
Then you could do it like this:
Dim theString As String = "/79465*44498%464"
Dim ret = Regex.Replace(theString, "[^0-9]", String.Empty)
Edit: or do you want to split on every non-digit character?
Then it would go like this (note that a string starting with a non-digit yields an empty first element):
Dim ret = Regex.Split(theString, "[^0-9]")

You could loop through each character of the string and call Char.IsNumber() on it (or Char.IsDigit(), which accepts only decimal digits).

This should do:
Dim test As String = "%2459348?:"
Dim match As Match = Regex.Match(test, "\d+")
If match.Success Then
Dim result As String = match.Value
' Do something with result
End If
Result = 2459348

Here's a function which will extract all of the numbers out of a string.
' Requires Imports System.Text for StringBuilder
Public Function GetNumbers(ByVal str As String) As String
    Dim builder As New StringBuilder()
    For Each c In str
        ' Keep only the numeric characters
        If Char.IsNumber(c) Then
            builder.Append(c)
        End If
    Next
    Return builder.ToString()
End Function
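The answers above either glue all digits together or split on non-digits. If each run of digits should come back as its own value, a minimal sketch using Regex.Matches could look like the following. The module and function names (NumberRuns, ExtractNumberRuns) are made up for illustration; the pattern [0-9]+ is used on the assumption that only ASCII digits are wanted.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NumberRuns
    ' Hypothetical helper: returns each run of ASCII digits as a separate string.
    Function ExtractNumberRuns(ByVal input As String) As List(Of String)
        Dim runs As New List(Of String)
        ' [0-9]+ matches one or more consecutive ASCII digits
        For Each m As Match In Regex.Matches(input, "[0-9]+")
            runs.Add(m.Value)
        Next
        Return runs
    End Function

    Sub Main()
        Dim runs = ExtractNumberRuns("/79465*44498%464")
        Console.WriteLine(String.Join(", ", runs)) ' prints 79465, 44498, 464
    End Sub
End Module
```

Unlike Regex.Split, this never produces empty entries when the string starts with, or contains consecutive, non-digit characters.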

Related

Finding the first occurrence of a regex match using match.index

How do I access the Index property of a regex match? In this case it should output the position of the first digit in the string.
Dim sourceString As String = "abcdefg12345"
Dim textboxregex As Regex = New Regex("^(d)$")
If textboxregex.IsMatch(sourceString) Then
Console.WriteLine(Match.Index) 'this should display the location of the first occurrence of the pattern within the sourcestring
End If
In this case you don't need regex:
Dim digits = From chr In sourceString Where Char.IsDigit(chr)
Dim index = -1
If digits.Any() Then index = sourceString.IndexOf(digits.First())
or in one statement with the ugly method syntax:
Dim index As Int32 = "abcdefg12345".
Select(Function(chr, ix) New With {chr, ix}).
Where(Function(x) Char.IsDigit(x.chr)).
Select(Function(x) x.ix).
DefaultIfEmpty(-1).
First()
Try a Lookbehind:
Dim textboxregex As Regex = New Regex("(?<=\D)\d")
If textboxregex.IsMatch(sourceString) Then
Console.WriteLine(textboxregex.Match(sourceString).Index)
End If
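This will match the first occurrence of a digit that follows a non-digit character. Note that it cannot match a digit at the very start of the string, because the lookbehind requires a preceding non-digit.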
Your regex is wrong: you need the '\' before the d, and the ^ and $ anchors mean it can only match a string that consists of a single digit. You also haven't defined Match:
Dim sourceString As String = "abcdefg12345"
Dim textboxregex As Regex = New Regex("\d")
Dim rxMatch As Match = textboxregex.Match(sourceString)
If rxMatch.Success Then
    Console.WriteLine(rxMatch.Index) ' displays the index of the first match within sourceString
End If
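To make the "no digit found" case explicit, the Match-based approach can be wrapped in a small function. This is only a sketch; the names FirstDigit and FirstDigitIndex are hypothetical, and returning -1 for "not found" is an assumption borrowed from String.IndexOf.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstDigit
    ' Hypothetical helper: index of the first digit in input, or -1 if there is none.
    Function FirstDigitIndex(ByVal input As String) As Integer
        Dim m As Match = Regex.Match(input, "\d")
        Return If(m.Success, m.Index, -1)
    End Function

    Sub Main()
        Console.WriteLine(FirstDigitIndex("abcdefg12345")) ' prints 7
        Console.WriteLine(FirstDigitIndex("12345"))        ' prints 0
        Console.WriteLine(FirstDigitIndex("abc"))          ' prints -1
    End Sub
End Module
```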

Replace the second char of a string

I have a string variable.
Dim str As String = "ABBCD"
I want to replace only the second 'B' character of str (I mean the second occurrence)
my code
Dim regex As New Regex("B")
Dim result As String = regex.Replace(str, "x", 2)
'result: AxxCD
'but I want: ABxCD
What's the easiest way to do this with regular expressions?
Thanks.
Dim str As String = "ABBCD"
Dim matches As MatchCollection = Regex.Matches(str, "B")
If matches.Count >= 2 Then
str = str.Remove(matches(1).Index, matches(1).Length)
str = str.Insert(matches(1).Index, "x")
End If
First we declare the string 'str', then find the matches of "B". If we found two results or more, replace the second result with "x".
How about the following? It replaces the second B of every adjacent BB pair:
Dim resultString As String = Regex.Replace(subjectString, "(B)\1", "$+x")
Use a positive lookbehind:
Dim regex As New Regex("(?<=B)B")
If ABCABCABC should produce ABCAxCABC, then the following regex will work:
(?<=^[^B]*B[^B]*)B
Usage:
Dim result As String = Regex.Replace(str, "(?<=^[^B]*B[^B]*)B", "x")
I assume BB was just an example; it could be CC, DD, EE, etc.
Based on that, the regex below will replace the second character of any doubled word character in the string.
Dim resultString As String = Regex.Replace(subjectString, "(\w)\1", "$1x")
'Alternative way to replace the second occurrence
'only of B in the string with X
Dim str As String = "ABBCD"
Dim pattern As String = "B"
Dim reg As Regex = New Regex(pattern)
Dim replacement As String = "X"
'find position of second B
Dim secondBpos As Integer = Regex.Matches(str, pattern)(1).Index
'replace that B with X
Dim result As String = reg.Replace(str, replacement, 1, secondBpos)
MessageBox.Show(result)
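Another way to target a specific occurrence is to count matches inside a MatchEvaluator and substitute only when the counter reaches the wanted position. The sketch below assumes a 1-based occurrence number and a literal replacement string (no $1-style substitutions); ReplaceNth and NthReplace are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NthReplace
    ' Hypothetical helper: replaces only the n-th (1-based) match of pattern.
    ' The replacement is inserted literally, not interpreted as a substitution pattern.
    Function ReplaceNth(ByVal input As String, ByVal pattern As String,
                        ByVal replacement As String, ByVal n As Integer) As String
        Dim seen As Integer = 0
        Return Regex.Replace(input, pattern,
            Function(m As Match) As String
                seen += 1
                ' Return the match unchanged unless it is the n-th one
                Return If(seen = n, replacement, m.Value)
            End Function)
    End Function

    Sub Main()
        Console.WriteLine(ReplaceNth("ABBCD", "B", "x", 2))     ' prints ABxCD
        Console.WriteLine(ReplaceNth("ABCABCABC", "B", "x", 2)) ' prints ABCAxCABC
        Console.WriteLine(ReplaceNth("ABCD", "B", "x", 2))      ' prints ABCD
    End Sub
End Module
```

If there are fewer than n matches the input comes back unchanged, so no separate count check is needed.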

Remove all the String before :

"\:(.*)$"
Hi all, I am using the expression above to remove everything before the : (colon), but it gives me the text before the colon instead. How can I do this? Thanks a lot.
My string is:
This is text: Hi here we go
I am getting: This is text
I want : Hi here we go
Updated code
Sub Main()
Dim input As String = "This is text with : far too much "
Dim pattern As String = "\:(.*)$"
Dim replacement As String = " "
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(input, replacement)
Console.WriteLine("Original String: {0}", input)
' MsgBox("Original String: {0}")
Console.WriteLine("Replacement String: {0}", result)
MsgBox("Original String: {0}")
End Sub
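Try one of these patterns, which capture the text after the colon in group 1. The surrounding slashes are regex-literal delimiters from other languages and should be left out of a VB.NET pattern string. Note that (.) in the first pattern captures only a single character:
/?:(.)/
or
/: (.+)/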
It should be:
Dim pattern As String = "(.*)\:"
' If the pattern above doesn't work in VB, try anchoring it:
' Dim pattern As String = "^(.*)\:"
' The parentheses are not needed here either.
This regex matches everything up to and including the colon (:), whereas your example matched everything after the colon. Because .* is greedy, it removes text up to the last colon if the input contains several.
If you are not dead set on regex, you can also use
Dim result As String
result = Strings.Split(input, ":", 2)(1)
This splits the input into an array of at most two elements: the first is the text before the first ":", the second is the text after it. If the input contains no colon there is no second element, so indexing (1) fails.
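Putting the two approaches side by side, here is a rough sketch that returns the text after the first colon and leaves the input unchanged when there is no colon. The function names are hypothetical, and trimming the leading space after the colon is an assumption based on the desired output in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AfterColon
    ' Hypothetical helper: text after the first colon, trimmed of surrounding
    ' whitespace; returns the input unchanged if it contains no colon.
    Function TextAfterFirstColon(ByVal input As String) As String
        Dim pos As Integer = input.IndexOf(":"c)
        If pos < 0 Then Return input
        Return input.Substring(pos + 1).Trim()
    End Function

    ' Regex variant: ^[^:]*: matches from the start up to and including the
    ' first colon, and \s* also removes whitespace that follows it.
    Function TextAfterFirstColonRegex(ByVal input As String) As String
        Return Regex.Replace(input, "^[^:]*:\s*", String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(TextAfterFirstColon("This is text: Hi here we go"))      ' prints Hi here we go
        Console.WriteLine(TextAfterFirstColonRegex("This is text: Hi here we go")) ' prints Hi here we go
        Console.WriteLine(TextAfterFirstColon("a: b: c"))                          ' prints b: c
    End Sub
End Module
```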

Split comma delimited string to array using regex

I have a string as below, which needs to be split to an array, using VB.NET
10,"Test, t1",10.1,,,"123"
The result array must have 6 rows as below
10
Test, t1
10.1
(empty)
(empty)
123
So:
1. quotes around strings must be removed
2. comma can be inside strings, and will remain there (row 2 in result array)
3. can have empty fields (comma after comma in source string, with nothing in between)
Thanks
Don't use String.Split(): it's slow, and doesn't account for a number of possible edge cases.
Don't use RegEx. RegEx can be shoe-horned to do this accurately, but to correctly account for all the cases the expression tends to be very complicated, hard to maintain, and at this point isn't much faster than the .Split() option.
Do use a dedicated CSV parser. Options include the Microsoft.VisualBasic.FileIO.TextFieldParser type, FastCSV, linq-to-csv, and a parser I wrote for another answer.
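As an illustration of the dedicated-parser route, here is a minimal sketch that feeds a single line to Microsoft.VisualBasic.FileIO.TextFieldParser through a StringReader. The wrapper names (CsvLine, ParseCsvLine) are hypothetical, and the sketch assumes the project can reference the Microsoft.VisualBasic assembly.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvLine
    ' Hypothetical helper: splits one CSV line into fields using TextFieldParser.
    Function ParseCsvLine(ByVal line As String) As String()
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Quoted fields may contain the delimiter; the quotes are stripped
            parser.HasFieldsEnclosedInQuotes = True
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        Dim fields = ParseCsvLine("10,""Test, t1"",10.1,,,""123""")
        Console.WriteLine(fields.Length) ' prints 6
        For Each field In fields
            Console.WriteLine("[" & field & "]")
        Next
    End Sub
End Module
```

Empty fields come back as empty strings, so the caller can decide how to display them.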
You can write a function yourself. This should do the trick:
Dim values As New List(Of String)
Dim currentValueIsString As Boolean = False
Dim valueSeparator As Char = ","c
Dim currentValue As String = String.Empty
For Each c As Char In inputString
    If c = """"c Then
        ' A quote toggles "inside a quoted string" and is not part of the value
        currentValueIsString = Not currentValueIsString
    ElseIf c = valueSeparator AndAlso Not currentValueIsString Then
        ' An unquoted separator ends the current value (empty fields stay empty strings)
        values.Add(currentValue)
        currentValue = String.Empty
    Else
        currentValue &= c
    End If
Next
' Add the last value, which has no trailing separator
values.Add(currentValue)
Here's another simple way that loops by the delimiter instead of by character:
Public Function Parser(ByVal ParseString As String) As List(Of String)
    Dim Trimmer() As Char = {Chr(34), Chr(44)}
    Parser = New List(Of String)
    While ParseString.Length > 1
        Dim TempString As String = ""
        If ParseString.StartsWith(Trimmer(0)) Then
            ParseString = ParseString.TrimStart(Trimmer)
            Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(0))))
            ParseString = ParseString.Substring(Parser.Last.Length)
            ParseString = ParseString.TrimStart(Trimmer)
        ElseIf ParseString.StartsWith(Trimmer(1)) Then
            Parser.Add("")
            ParseString = ParseString.Substring(1)
        Else
            Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(1))))
            ParseString = ParseString.Substring(ParseString.IndexOf(Trimmer(1)) + 1)
        End If
    End While
End Function
This returns a list. If you must have an array just use the ToArray method when you call the function
Why not just use the Split method?
Dim s As String = "10,""Test, t1"",10.1,,,""123"""
s = s.Replace("""", "")
Dim arr As String() = s.Split(","c)
My VB is rusty, so treat this as a sketch. Note that it also splits on the comma inside "Test, t1", so it yields 7 elements rather than 6.

Match "THIS" And Replace with "THAT" RegEx Vb.Net

Trying to find out how to find and replace text with corresponding values.
For Example
1) fedex to FedEx
2) nasa to NASA
3) po box to PO BOX
Public Function FindReplace(ByVal s As String) As String
    Dim MatchEval As New MatchEvaluator(AddressOf RegexReplace)
    Dim Pattern As String = "(?<f1>fedex|nasa|po box)"
    Return Regex.Replace(s, Pattern, MatchEval, RegexOptions.IgnoreCase)
End Function
Public Function RegexReplace(ByVal m As Match) As String
    Select Case LCase(m.Groups("f1").Value)
        Case "fedex"
            Return "FedEx"
        Case "nasa"
            Return "NASA"
        Case "po box"
            Return "PO BOX"
    End Select
End Function
The code above works fine for fixed values, but I don't know how to extend it to match values added at run time, such as db to Db.
I'd guess that the only thing you need Regex for here is the IgnoreCase option. If so, I would suggest not using Regex at all and using String functionality instead (note that this lowercases the whole input, not just the matched word):
Dim input As String = "fEDeX"
Dim pattern As String = "fedex"
Dim replacement As String = "FedEx"
Dim result As String
result = input.ToLowerInvariant().Replace(pattern, replacement)
But if you still need Regex, then this should work:
result = Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase)
Example:
Sub Main()
    Dim replacements As New Dictionary(Of String, String)()
    replacements.Add("fedex", "FedEx")
    replacements.Add("nasa", "NASA")
    replacements.Add("po box", "PO BOX")
    Dim result As String = Replace("fedex, nAsA, po box, etc", replacements)
End Sub
Private Function Replace(ByVal input As String, ByVal replacements As Dictionary(Of String, String)) As String
    For Each item In replacements
        input = Regex.Replace(input, item.Key, item.Value, RegexOptions.IgnoreCase)
    Next
    Return input
End Function
I found a solution using a List and ran a performance test against the Dictionary approach suggested by Anton Kedrov. Both methods take almost the same time to complete, but I don't know whether the Dictionary method will hold up for a longer replacement list, because it loops through the whole list and runs one replacement per entry.
Thank you all for your suggestions and advice.
' Declared at module level so that RegexReplace can see it
Private lst As New List(Of String)
Sub Main()
    lst.Add("NASA")
    lst.Add("FedEx")
    lst.Add("PO BOX")
    MsgBox(FindReplace("this is testing fedex naSa PO box"))
End Sub
Public Function FindReplace(ByVal s As String) As String
    Dim Pattern As String = "(?<f1>fedex|nasa|po box)"
    Dim MatchEval As New MatchEvaluator(AddressOf RegexReplace)
    Return Regex.Replace(s, Pattern, MatchEval, RegexOptions.IgnoreCase)
End Function
Public Function RegexReplace(ByVal m As Match) As String
    Dim Found As String
    ' Look up the list entry that equals the matched text, ignoring case
    Found = lst.Find(Function(value As String) LCase(value) = LCase(m.Groups("f1").Value))
    Return Found
End Function
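For values that are only known at run time, the pattern itself can be built from the replacement table, so that a single Regex.Replace pass handles every entry. The following is a sketch, not a tuned implementation: FixCasing and CasingFixer are hypothetical names, the caller is assumed to pass a dictionary created with a case-insensitive comparer, and the \b word boundaries assume every key starts and ends with a word character.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module CasingFixer
    ' Hypothetical helper: replaces every key of replacements (matched
    ' case-insensitively, as a whole word) with its value in a single pass.
    ' Assumes replacements was created with a case-insensitive comparer.
    Function FixCasing(ByVal input As String,
                       ByVal replacements As Dictionary(Of String, String)) As String
        If replacements.Count = 0 Then Return input
        ' Escape each key so regex metacharacters in it are treated literally
        Dim alternatives = replacements.Keys.Select(Function(k) Regex.Escape(k))
        Dim pattern As String = "\b(?:" & String.Join("|", alternatives) & ")\b"
        Return Regex.Replace(input, pattern,
            Function(m As Match) replacements(m.Value),
            RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Case-insensitive keys, so the match "naSa" finds the "nasa" entry
        Dim replacements As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        replacements.Add("fedex", "FedEx")
        replacements.Add("nasa", "NASA")
        replacements.Add("po box", "PO BOX")
        replacements.Add("db", "Db") ' could equally be added later at run time
        Console.WriteLine(FixCasing("this is testing fedex naSa PO box db", replacements))
        ' prints: this is testing FedEx NASA PO BOX Db
    End Sub
End Module
```

Because the lookup happens inside the MatchEvaluator, adding a new pair to the dictionary is enough; no Select Case branch or hard-coded pattern needs to change.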