Split comma delimited string to array using regex - regex

I have a string as below, which needs to be split to an array, using VB.NET
10,"Test, t1",10.1,,,"123"
The result array must have 6 rows as below
10
Test, t1
10.1
(empty)
(empty)
123
So:
1. quotes around strings must be removed
2. comma can be inside strings, and will remain there (row 2 in result array)
3. can have empty fields (comma after comma in source string, with nothing in between)
Thanks

Don't use String.Split(): it's slow, and doesn't account for a number of possible edge cases.
Don't use RegEx. RegEx can be shoe-horned to do this accurately, but to correctly account for all the cases the expression tends to be very complicated, hard to maintain, and at this point isn't much faster than the .Split() option.
Do use a dedicated CSV parser. Options include the Microsoft.VisualBasic.TextFieldParser type, FastCSV, linq-to-csv, and a parser I wrote for another answer.

You can write a function yourself. This should do the trick:
Dim values as New List(Of String)
Dim currentValueIsString as Boolean
Dim valueSeparator as Char = ","c
Dim currentValue as String = String.Empty
For Each c as Char in inputString
If c = """"c Then
If currentValueIsString Then
currentValueIsString = False
Else
currentValueIsString = True
End If
End If
If c = valueSeparator Andalso not currentValueIsString Then
If String.IsNullOrEmpty(currentValue) Then currentValue = "(empty)"
values.Add(currentValue)
currentValue = String.Empty
End If
currentValue += c
Next

Here's another simple way that loops by the delimiter instead of by character:
Public Function Parser(ByVal ParseString As String) As List(Of String)
Dim Trimmer() As Char = {Chr(34), Chr(44)}
Parser = New List(Of String)
While ParseString.Length > 1
Dim TempString As String = ""
If ParseString.StartsWith(Trimmer(0)) Then
ParseString = ParseString.TrimStart(Trimmer)
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(0))))
ParseString = ParseString.Substring(Parser.Last.Length)
ParseString = ParseString.TrimStart(Trimmer)
ElseIf ParseString.StartsWith(Trimmer(1)) Then
Parser.Add("")
ParseString = ParseString.Substring(1)
Else
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(1))))
ParseString = ParseString.Substring(ParseString.IndexOf(Trimmer(1)) + 1)
End If
End While
End Function
This returns a list. If you must have an array just use the ToArray method when you call the function

Why not just use the split method?
Dim s as String = "10,\"Test, t1\",10.1,,,\"123\""
s = s.Replace("\"","")
Dim arr as String[] = s.Split(',')
My VB is rusty so consider this pseudo-code

Related

Sort a String array

I have as input the string in the below format
"[1_5,3,7,1],[1_2,4,1,9],[],[1_1,,4,,,9,2]"
What I need to obtain is the same string but with the number after the _ sorted:
"[1_1,3,5,7],[1_1,2,4,9],[],[1_1,2,4,9,,,]"
Dim tmprequestedArea_selectionAreaIn As String = "[1_5,3,7,1],[1_2,4,1,9],[],[1_1,,4,,,9,2]"
tmprequestedArea_selectionAreaIn = Regex.Replace(requestedArea_selectionAreaIn,"\],\[","#")
tmprequestedArea_selectionAreaIn = Regex.Replace(tmprequestedArea_selectionAreaIn,"\[|\]","")
bracList.AddRange(tmprequestedArea_selectionAreaIn.Split(New Char() {"#"c}, StringSplitOptions.None ))
If sortNumber Then
'Split braclist by _ and puts the value in strList
'If after _ is only one number put only that number, else split it by char "," and put in strList the join of the split by , array
'Sort the array
'in previous example strList will contain a,b,c in position 0 and _d_f (instead of f,d) in position 1
For i As Integer = 0 To bracList.Count -1
Dim tmp As String()
Dim tmpInt As New System.Collections.Generic.List(Of Integer)
If Not(String.IsNullOrEmpty(bracList(i))) Then
Dim tmpRequested As String = bracList(i).Split(New Char() {"_"c})(0)
Dim tmpSelection As String = bracList(i).Split(New Char() {"_"c})(1)
If tmpSelection.Contains(",") Then
tmp = tmpSelection.Split(New Char() {","c})
For j As Integer = 0 To tmp.Length -1
tmpInt.Add(Convert.toInt32(tmp(j)))
Next
tmpInt.Sort
strList.Add("[" + tmpRequested + "_" + String.Join(",",tmpInt ) + "]")
Else
strList.Add("[" + tmpRequested + "_" + tmpSelection + "]" )
End If
Else
strList.Add("[]")
End If
Next i
I'm looking for a better way to manage it.
Try this, as a possible substitute for what you're doing now.
Given this input string:
Dim input As String = "[1_5,3,7,1],[1_2,4,1,9],[],[1_1,,4,,,9,2]"
Note: this will also deal with decimal values without changes. E.g.,
"[1_5.5,3.5,7,1],[1_2.564,4,2.563,9],[],[1_1,,4.23,,,9.0,2.45]"
You can extract the content of the brackets with this pattern: \[(.*?)\] and use Regex.Matches to return a MatchCollection of all the substrings that match the pattern.
Then use a StringBuilder as a container to rebuild the string while the parts are being treated.
Imports System.Linq
Imports System.Text.RegularExpressions
Dim pattern As String = "\[(.*?)\]"
Dim matches = Regex.Matches(input, pattern, RegexOptions.Singleline)
Dim sb As New StringBuilder()
For Each match As Match In matches
Dim value As String = match.Groups(1).Value
If String.IsNullOrEmpty(value) Then
sb.Append("[],")
Continue For
End If
Dim sepPosition As Integer = value.IndexOf("_"c) + 1
sb.Append("[" & value.Substring(0, sepPosition))
Dim values = value.Substring(sepPosition).Split(","c)
sb.Append(String.Join(",", values.Where(Function(n) n.Length > 0).OrderBy(Function(n) CDec(n))))
sb.Append(","c, values.Count(Function(n) n.Length = 0))
sb.Append("],")
Next
Dim result As String = sb.ToString().TrimEnd(","c)
If you don't know about LINQ, this is what it's doing:
String.Join(",", values.Where(Function(n) n.Length > 0).OrderBy(Function(n) CDec(n)))
values is an array of strings, generated by String.Split().
values.Where(Function(n) n.Length > 0): creates an Enumerable(Of String) from values Where the content, n, is a string of length > 0.
I could have written values.Where(Function(n) Not String.IsNUllOrEmpty(n)).
.OrderBy(Function(n) CDec(n))): Orders the resulting Enumerable(Of String) using the string value converted to Decimal and generates an Enumerable(Of String), which is passed back to String.Join(), to rebuild the string, adding a char (","c) between the parts.
values.Count(Function(n) n.Length = 0): Counts the elements of values that have Length = 0 (empty strings). This is the number of empty elements that are represented by a comma, appended at the end of the partial string.
If you are looking for a "way"
I think it is easier to fetch each char of the string and if it is a number you put it in array (and when the char is ']' you start new array) the sort the arrays and replace each number from the string with it's sorted number (so you will just do allocation without the need to reconstruct with regular expression
I wish that I had Visual Studio to provide you the code (it is joyful to code a riddle) ^_^
ps:for the commas you can use a counter for each blank commas an the put it in the end

VBA function Regex

Do you have an idea of what is wrong in this code please? It should extract all caps and the pattern "1WO" if available.
For example in "User:399595:Account:ETH:balance", i should have "UAETH" and in "User:197755:Account:1WO:balance" i should have "UA1WO"
Thank you
Option Explicit
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim xRegEx As Object
Set xRegEx = CreateObject("VBSCRIPT.REGEXP")
If xRegEx.Pattern = "[^A-Z]" Then
xRegEx.Global = True
xRegEx.MultiLine = False
ExtractCap = xRegEx.Replace(Txt, "")
Set xRegEx = Nothing
Else: xRegEx.Pattern = "1WO"
ExtractCap = xRegEx.Execute(Txt)
End If
End Function
I'm not a "RegEx" expert, so you may want to try an alternative:
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim i As Long
For i = 1 To Len(Txt)
Select Case Asc(Mid(Txt, i, 1))
Case 65 To 90
ExtractCap = ExtractCap & Mid(Txt, i, 1)
End Select
Next
End Function
while, should the pattern of your data strictly be as you showed, you could also consider:
Function ExtractCap(Txt As String) As String
Application.Volatile
ExtractCap = "UA" & Split(Txt, ":")(3)
End Function
Your RegEx works like this:
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim xRegEx As Object
Set xRegEx = CreateObject("VBScript.RegExp")
With xRegEx
.Pattern = "[^A-Z]"
.Global = True
.MultiLine = False
ExtractCap = .Replace(Txt, vbNullString)
End With
If Txt = ExtractCap Then ExtractCap = "1WO"
End Function
Public Sub TestMe()
Debug.Print ExtractCap("User:399595:Account:ETH:balance")
End Sub
In your code, there were 2 errors, which stopped the execution:
xRegEx was set to Nothing and then it was asked to provide a value;
the check If xRegEx.Pattern = "[^A-Z]" does not actually mean a lot to VBA. E.g., you are setting a Pattern and making a condition out of it. If you want to know whether a pattern exists in a RegEx, you should compare the two strings - before and after the execution of the pattern.
Your problem can be easily solved.
Firstly, I assumed that 1WO can appears at most once in your string.
Based on that assumption, logic is as follows:
Define function, which extracts all capital letters from strings.
Now, in the main function, you split your string first using 1WO as delimeter. Now, pass every string (after splitting) to function, get all the caps from those strings and concatenate them again with 1WO in its place.
Option Explicit
Public Function Extract(str As String) As String
Dim s As Variant
For Each s In Split(str, "1WO")
'append extracted caps with 1WO at the end
Extract = Extract & ExtractCaps(s) & "1WO"
Next
'delete lest 1WO from result
Extract = Left(Extract, Len(Extract) - 3)
End Function
Function ExtractCaps(str As Variant) As String
Dim i As Long, char As String
For i = 1 To Len(str)
char = Mid(str, i, 1)
If Asc(char) > 64 And Asc(char) < 91 And char = UCase(char) Then
ExtractCaps = ExtractCaps & char
End If
Next
End Function
If you put this code in inserted Module, you can use it in a worksheet in formula: =Extract(A1).

VB NET regexp matching numerical substrings

I'm trying to make a vb function that takes as input a String and returns, if exist, the string made of numeric digits from the beginning until the first non numerical char, so:
123 -> 123
12f -> 12
12g34 -> 12
f12 -> ""
"" -> ""
I wrote a function that incrementally compares the result matching the regex, but it goes on even on non numeric characters...
This is the function:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^[0-9]+")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(valoreRaw.ElementAt(stringIndex))
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The output always equals the input string, so there's something wrong and I can't get out of it...
Here's a LINQ solution that doesn't need regex and increases readability:
Dim startDigits = valoreRaw.TakeWhile(AddressOf Char.IsDigit)
Dim result As String = String.Concat(startDigits)
Try this instead. You need to use a capture group:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^([0-9]+)")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The expression:
Dim regexp As New Regex("^([0-9]+)")
and the result appending lines have been updated:
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)
You have made your code very complex for a simple task.
Your loop keeps trying to build a longer string and it keeps checking if it is still working with digits, and if so keep appending results.
So and input string of "123x" would, if your code worked, produce a string of "112123" as output. In other words it matches the "1", then "12", then "123"and concatenates each before exiting after it finds the "x".
Here's what you should be doing:
Public Function ParseValoreVelocita(valoreRaw As String) As String
Dim regexp As New Regex("^([0-9]+)")
Dim match = regexp.Match(valoreRaw)
If match.Success Then
Return match.Groups(1).Captures(0).Value
Else
Return ""
End If
End Function
No loop and you let the regex do the work.

Get/split text inside brackets/parentheses

Just have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
just wondering how I would get the words within the brackets for example get the "g" in "gram (g)" and dim it as a new string.
Possibly using regex?
Thanks.
Use split function ..
strArr = str.Split("(") ' splitting 'gram (g)' returns an array ["gram " , "g)"] index 0 and 1
strArr2 = strArr[1].Split(")") ' splitting 'g)' returns an array ["g " ..]
the string is in
strArr2[0]
Edit
you want getAbbrev and getAbbrev2 to be arrays
try
Dim getAbbrev As String() = Str.Split("(")
Dim getAbbrev2 as String() = getAbbrev[1].Split(")")
To do it without declaring arrays you can do
"gram (g)".Split("(")[1].Split(")")[0]
but that's unreadable
Edit
You have some very trivial errors. I would suggest you strengthen your understanding on objects and declarations first. Then you can look into invoking methods. I rather have you understand it than give it to you. Re-read the book you have or look for a basic tutorial.
Dim unit As String = 'make sure this is the actual string you are getting, not sure where you are supposed to get the string value from => ie grams (g)
Dim getAbbrev As String() = unit.Split("(") 'use unit not Str - Str does not exist
Dim getAbbrev2 As String() = getAbbrev[1].Split(")") 'As no as - case sensitive
for the last line reference getAbbrev2 instead of the unknown abbrev2
Fun with Regular Expressions (I'm really not an expert here, but tested and works)
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = { "("c, ")"c }
Dim test as String = "gram (g)" + Environment.NewLine +
"kilogram (kg)" + Environment.NewLine +
"pound (lb)"
Dim pattern as String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While(m.Success)
System.Diagnostics.Debug.WriteLine("Match" + "=" + m.Value.ToString())
Dim tempText as String = m.Value.ToString().Trim(charsToTrim)
System.Diagnostics.Debug.WriteLine("String Trimmed" + "=" + tempText)
m = m.NextMatch()
End While
You can split at the space and remove the parens from the second token (by replacing them with an empty string).
A regex is also an option, and is very simple, its pattern is
\w+\s+\((\w+)\)
Which means, a word, then at least one space, then opening parens, then in real regex parens you search for a word, and, eventually a closing paren. The inner parentheses are capturing parentheses, which make it possible to refer to the unit g, kg, lb.

Regular expression to extract numbers from long string containing lots of punctuation

I am trying to separate numbers from a string which includes %,/,etc for eg (%2459348?:, or :2434545/%). How can I separate it, in VB.net
you want only the numbers right?
then you could do it like this
Dim theString As String = "/79465*44498%464"
Dim ret = Regex.Replace(theString, "[^0-9]", String.Empty)
hth
edit:
or do you want to split by all non number chars?
then it would go like this
Dim ret = Regex.Split(theString, "[^0-9]")
You could loop through each character of the string and check the .IsNumber() on it.
This should do:
Dim test As String = "%2459348?:"
Dim match As Match = Regex.Match(test, "\d+")
If match.Success Then
Dim result As String = match.Value
' Do something with result
End If
Result = 2459348
Here's a function which will extract all of the numbers out of a string.
Public Function GetNumbers(ByVal str as String) As String
Dim builder As New StringBuilder()
For Each c in str
If Char.IsNumber(c) Then
builder.Append(c)
End If
Next
return builder.ToString()
End Function