Get strings from a TextBox and put them in a variable array - regex

I have some dynamic lines of text in a TextBox.
TextBox example:
Nombre : Maria
Nombre : Carlos Manuel
Nombre : Antonio
Nombre : Ana Gabriela
.
.
.
I need to get the names only into an array.
The names are to the right of the " : "
Dim myMatches As MatchCollection
Dim myPattern As New Regex(" : ")
Dim myString As String = TextBox1.Text
myMatches = myPattern.Matches(myString)
Dim successfulMatch As Match
Dim counter As Integer = 0
Dim names(counter) As String
For Each successfulMatch In myMatches
counter = counter + 1
names = TextBox1.Text.Split(" : ").Last
Next
I want to put the names into an array
names(1) = Maria
names(2) = Carlos Manuel
names(3) = Antonio
names(4) = Ana Gabriela
.
.
.

You just need the TextBox.Lines property. It returns an array of strings representing all the lines of text (substrings separated by a line break) contained in a TextBoxBase control.
For each line, find the last : character and take the text from that position to the end of the string:
Dim listOfNames = New List(Of String)
For Each line As String In TextBox1.Lines
If Not String.IsNullOrEmpty(line) Then
listOfNames.Add(line.Substring(line.LastIndexOf(":") + 1).TrimStart())
End If
Next
Using an array of strings, instead of a List(Of String)
Dim lines = TextBox1.Lines
Dim arrayOfNames(lines.Length - 1) As String
For i As Integer = 0 To lines.Length - 1
If Not String.IsNullOrEmpty(lines(i)) Then
arrayOfNames(i) = lines(i).Substring(lines(i).LastIndexOf(":") + 1).TrimStart()
End If
Next
In case you have to use a RegEx for some reason.
Using the List(Of String) seen before to store the results:
Dim regx = New Regex(":", RegexOptions.Multiline).Matches(TextBox1.Text)
For Each match As Match In regx
Dim position = TextBox1.Text.IndexOf(Environment.NewLine, match.Index)
If position = -1 Then position = TextBox1.Text.Length - 1
listOfNames.Add(TextBox1.Text.Substring(match.Index + 1, position - match.Index).Trim())
Next
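If a regex is required, a capture group can pull out the name directly, so no index arithmetic is needed. The following is only a sketch: the ExtractNames helper and its pattern are my own illustration, not part of the answer above, and it takes a String so it can run outside a form (pass TextBox1.Text in your case).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module NameExtractor
    ' Hypothetical helper: returns the text to the right of the first ":" on each line.
    Function ExtractNames(text As String) As List(Of String)
        ' ^[^:\r\n]*:    everything up to the first colon on the line
        ' [ \t]*         spaces or tabs after the colon
        ' (\S[^\r\n]*?)  the name: starts with a non-space and stays on the same line
        ' [ \t]*\r?$     optional trailing blanks and the CR of a CR/LF pair
        Dim pattern As String = "^[^:\r\n]*:[ \t]*(\S[^\r\n]*?)[ \t]*\r?$"
        Return Regex.Matches(text, pattern, RegexOptions.Multiline).
            Cast(Of Match)().
            Select(Function(m) m.Groups(1).Value).
            ToList()
    End Function

    Sub Main()
        Dim sample As String = String.Join(Environment.NewLine,
            {"Nombre : Maria", "Nombre : Carlos Manuel", "", "Nombre : Antonio"})
        ' Lines without a colon (such as the empty one) produce no match and are skipped
        For Each name As String In ExtractNames(sample)
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

Call ToArray() on the result if you need an array rather than a list.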
One-line LINQ version (it returns a List(Of String), as before).
Use ToArray() instead of ToList() to return an array of strings.
Omit both to get an IEnumerable(Of String). Note that this version assumes every line contains a colon:
Dim result = TextBox1.Lines.Select(Function(line) line.Split(":"c)(1).TrimStart()).ToList()

Make a function which returns an IEnumerable(Of String)
Private Function getNombres(text As String) As IEnumerable(Of String)
Return text.
Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries).
Select(Function(l) l.Split(":"c)(1).TrimStart())
End Function
Call it:
Dim nombres = getNombres(TextBox1.Text)
If you really need it in an array, you can convert it into one:
Dim nombres = getNombres(TextBox1.Text).ToArray()
Note: when indexing the data, you will start with index = 0, contrary to what you asked for in your question. VB6 collections started at 1; .NET collections start at 0.

How to replace unwanted value in string with regex

A string containing below values:
Dim abc As String = "'UserId1'|'ValueA1'|'ValueB1'|'ValueC1', 'UserId2'|'ValueA2'|'ValueB2'|'ValueC2'"
Current Function
Dim arrAll As String() = abc.Split(",")
Dim UserIdList As New List(Of String)
Dim ValueAList As New List(Of String)
Dim ValueBList As New List(Of String)
Dim ValueCList As New List(Of String)
For i = 0 To UBound(arrAll)
Dim arrSeparate As String() = arrAll(i).Split("|")
UserIdList.Add(arrSeparate(0))
ValueAList.Add(arrSeparate(1))
ValueBList.Add(arrSeparate(2))
ValueCList.Add(arrSeparate(3))
Next
I'm trying to separate the value above into 4 separate lists without using Split or loops.
With regular expressions, I'm only able to retrieve all the 'UserId' or 'ValueC' values. How can I retrieve 'ValueA' or 'ValueB'?
I'm not familiar with regular expressions. Any help would be greatly appreciated.
Regular expression
\|'([^']*)'
'([^']*)'\|
Result
'UserId1', 'UserId2'
'ValueC1', 'ValueC2'
If you have working code with loops, why the need for regex? Your code only needs a few adjustments. The lowercase c after a string literal (for example ","c) tells the compiler that it is a Char, not a String.
If you don't have Option Strict on, turn it on NOW!
Private Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click
Dim abc As String = "'UserId1'|'ValueA1'|'ValueB1'|'ValueC1', 'UserId2'|'ValueA2'|'ValueB2'|'ValueC2'"
Dim arrAll As String() = abc.Split(","c)
Dim UserIdList As New List(Of String)
Dim ValueAList As New List(Of String)
Dim ValueBList As New List(Of String)
Dim ValueCList As New List(Of String)
'To get rid of single quote and spaces
Dim charsToTrim() As Char = {"'"c, " "c}
For i = 0 To UBound(arrAll)
Dim arrSeparate As String() = arrAll(i).Split("|"c)
UserIdList.Add(arrSeparate(0).Trim(charsToTrim))
ValueAList.Add(arrSeparate(1).Trim(charsToTrim))
ValueBList.Add(arrSeparate(2).Trim(charsToTrim))
ValueCList.Add(arrSeparate(3).Trim(charsToTrim))
Next
'Inspect the values in your lists
Debug.Print("ID List")
For Each s In UserIdList
Debug.Print(s)
Next
Debug.Print("A List")
For Each s In ValueAList
Debug.Print(s)
Next
Debug.Print("B List")
For Each s In ValueBList
Debug.Print(s)
Next
Debug.Print("C List")
For Each s In ValueCList
Debug.Print(s)
Next
End Sub
This should return all character sequences not containing | or ':
[^'\|]*
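To get 'ValueA' or 'ValueB' with a regex, one option is to match a whole record at once and give each quoted field its own capture group. This is a sketch under that assumption (exactly four quoted fields per record, separated by |); the GetField helper and the pattern are illustrative names of mine, not from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module RecordSplitter
    ' One match per record; each quoted field is captured in its own group (1 to 4).
    Private Const RecordPattern As String = "'([^']*)'\|'([^']*)'\|'([^']*)'\|'([^']*)'"

    ' Hypothetical helper: returns one field from every record
    ' (1 = UserId, 2 = ValueA, 3 = ValueB, 4 = ValueC).
    Function GetField(source As String, groupIndex As Integer) As List(Of String)
        Return Regex.Matches(source, RecordPattern).
            Cast(Of Match)().
            Select(Function(m) m.Groups(groupIndex).Value).
            ToList()
    End Function

    Sub Main()
        Dim abc As String = "'UserId1'|'ValueA1'|'ValueB1'|'ValueC1', 'UserId2'|'ValueA2'|'ValueB2'|'ValueC2'"
        Console.WriteLine(String.Join(", ", GetField(abc, 2))) ' ValueA1, ValueA2
        Console.WriteLine(String.Join(", ", GetField(abc, 3))) ' ValueB1, ValueB2
    End Sub
End Module
```

The captured values come back without the surrounding single quotes, so no Trim call is needed.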

VB NET regexp matching numerical substrings

I'm trying to write a VB function that takes a String as input and returns, if it exists, the string of numeric digits from the beginning up to the first non-numeric character, so:
123 -> 123
12f -> 12
12g34 -> 12
f12 -> ""
"" -> ""
I wrote a function that incrementally compares the result matching the regex, but it goes on even on non numeric characters...
This is the function:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^[0-9]+")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(valoreRaw.ElementAt(stringIndex))
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The output always equals the input string, so there's something wrong and I can't get out of it...
Here's a LINQ solution that doesn't need regex and increases readability:
Dim startDigits = valoreRaw.TakeWhile(AddressOf Char.IsDigit)
Dim result As String = String.Concat(startDigits)
Try this instead. You need to use a capture group:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^([0-9]+)")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The expression:
Dim regexp As New Regex("^([0-9]+)")
and the result appending lines have been updated:
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)
You have made your code very complex for a simple task.
Your loop keeps trying to build a longer string, keeps checking whether it is still working with digits, and if so keeps appending results.
So an input string of "123x" would, if your code worked, produce a string of "112123" as output. In other words, it matches "1", then "12", then "123", and concatenates each before exiting when it finds the "x".
Here's what you should be doing:
Public Function ParseValoreVelocita(valoreRaw As String) As String
Dim regexp As New Regex("^([0-9]+)")
Dim match = regexp.Match(valoreRaw)
If match.Success Then
Return match.Groups(1).Captures(0).Value
Else
Return ""
End If
End Function
No loop and you let the regex do the work.
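As for why the original loop never stops at the first letter: "^[0-9]+" only requires the string to start with digits, so "12f" is still a successful match. A small sketch to illustrate this (the module name is arbitrary):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PrefixMatchDemo
    Sub Main()
        Dim digitsAtStart As New Regex("^[0-9]+")

        ' The pattern is anchored at the start only, so trailing letters do not make it fail
        Console.WriteLine(digitsAtStart.IsMatch("12f"))     ' True
        ' The match itself contains just the leading digits
        Console.WriteLine(digitsAtStart.Match("12f").Value) ' 12
        ' No leading digit means no match at all
        Console.WriteLine(digitsAtStart.IsMatch("f12"))     ' False
        ' A failed match has an empty Value, which covers the f12 and "" cases
        Console.WriteLine(digitsAtStart.Match("f12").Value = "") ' True
    End Sub
End Module
```

So testing Success on a growing prefix tells you nothing new after the first character, while reading the match value once gives the answer directly.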

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two groups of words in a Word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time, and assign each bullet to an Excel cell. As shown in the code below, I have pretty much everything done, except I can't figure out the correct RegExp pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As long : i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
1. Find the bullet list items using the keywords before and after the list in the regexp.
2. Group the regexp pattern so that you can extract everything in between the keywords.
3. Store the group containing the list items in a string.
4. Split the string by the newline character into a new array.
5. Output each array item to Excel.
6. Loop again, since there may be more than one list in the document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In myMatches
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3

Split comma delimited string to array using regex

I have a string as below, which needs to be split to an array, using VB.NET
10,"Test, t1",10.1,,,"123"
The result array must have 6 rows as below
10
Test, t1
10.1
(empty)
(empty)
123
So:
1. quotes around strings must be removed
2. comma can be inside strings, and will remain there (row 2 in result array)
3. can have empty fields (comma after comma in source string, with nothing in between)
Thanks
Don't use String.Split(): it's slow, and doesn't account for a number of possible edge cases.
Don't use RegEx. RegEx can be shoe-horned to do this accurately, but to correctly account for all the cases the expression tends to be very complicated, hard to maintain, and at this point isn't much faster than the .Split() option.
Do use a dedicated CSV parser. Options include the Microsoft.VisualBasic.FileIO.TextFieldParser type, FastCSV, linq-to-csv, and a parser I wrote for another answer.
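As a rough illustration of the TextFieldParser option, the sketch below parses the single line from the question through a StringReader. The ParseCsvLine helper is a name I made up for the example; the parser itself is normally pointed at a file or stream.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvLineParser
    ' Hypothetical helper: splits one CSV line into its fields.
    Function ParseCsvLine(line As String) As String()
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Quoted fields may contain the delimiter; the quotes are removed
            parser.HasFieldsEnclosedInQuotes = True
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        Dim fields = ParseCsvLine("10,""Test, t1"",10.1,,,""123""")
        ' Brackets make the two empty fields visible in the output
        For Each field As String In fields
            Console.WriteLine("[" & field & "]")
        Next
    End Sub
End Module
```

Empty fields come back as empty strings, so the array has one entry per field, including the two empty ones.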
You can write a function yourself. This should do the trick:
Dim values As New List(Of String)
Dim currentValueIsString As Boolean = False
Dim valueSeparator As Char = ","c
Dim currentValue As String = String.Empty
For Each c As Char In inputString
    If c = """"c Then
        ' A quote opens or closes a quoted string; the quote itself is not stored
        currentValueIsString = Not currentValueIsString
    ElseIf c = valueSeparator AndAlso Not currentValueIsString Then
        ' A separator outside quotes ends the current value
        If String.IsNullOrEmpty(currentValue) Then currentValue = "(empty)"
        values.Add(currentValue)
        currentValue = String.Empty
    Else
        currentValue &= c
    End If
Next
' The last value has no trailing separator, so add it after the loop
If String.IsNullOrEmpty(currentValue) Then currentValue = "(empty)"
values.Add(currentValue)
Here's another simple way that loops by the delimiter instead of by character:
Public Function Parser(ByVal ParseString As String) As List(Of String)
Dim Trimmer() As Char = {Chr(34), Chr(44)}
Parser = New List(Of String)
While ParseString.Length > 1
Dim TempString As String = ""
If ParseString.StartsWith(Trimmer(0)) Then
ParseString = ParseString.TrimStart(Trimmer)
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(0))))
ParseString = ParseString.Substring(Parser.Last.Length)
ParseString = ParseString.TrimStart(Trimmer)
ElseIf ParseString.StartsWith(Trimmer(1)) Then
Parser.Add("")
ParseString = ParseString.Substring(1)
Else
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(1))))
ParseString = ParseString.Substring(ParseString.IndexOf(Trimmer(1)) + 1)
End If
End While
End Function
This returns a list. If you must have an array, just call ToArray on the result of the function.
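Why not just use the Split method?
Dim s As String = "10,""Test, t1"",10.1,,,""123"""
s = s.Replace("""", "")
Dim arr As String() = s.Split(","c)
My VB is rusty, so treat this as a sketch. Note that it also splits on the comma inside "Test, t1", so it returns 7 items instead of the 6 asked for.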

match date pattern in the string vba excel

Edit:
Since my string has become more and more complicated, it looks like regexp is the only way.
I do not have a lot of experience with it, so your help is much appreciated.
Based on what I read on the web, I constructed the following expression to try to match the occurrences in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and tried this code, but I do not get any match. A link to a good regexp tutorial would also be much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValue for robustness:
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
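For anyone doing the same thing from VB.NET rather than VBA, the same two-step idea (regex to find candidates, then a real date parse to reject impossible ones) could look roughly like this. It is a sketch, not a translation of the code above: the FindDates helper, the pattern, and the format strings are my own choices.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Module DateFinder
    ' Hypothetical helper: returns every day-month-year token that is a real calendar date.
    Function FindDates(text As String) As List(Of Date)
        Dim found As New List(Of Date)
        Dim pattern As String = "\b\d{1,2}(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\d{4}\b"
        ' Accept both one-digit and two-digit days
        Dim formats As String() = {"dMMMyyyy", "ddMMMyyyy"}
        For Each m As Match In Regex.Matches(text, pattern)
            Dim parsed As Date
            ' TryParseExact rejects impossible dates such as 30Feb2002
            If Date.TryParseExact(m.Value, formats, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, parsed) Then
                found.Add(parsed)
            End If
        Next
        Return found
    End Function

    Sub Main()
        Dim strIn As String = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
        For Each d As Date In FindDates(strIn)
            Console.WriteLine(d.ToString("yyyy-MM-dd"))
        Next
    End Sub
End Module
```

The regex only checks the shape of the token; the parse step is what filters out 30Feb2002.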
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I have valid dates in the string, so the regular expression can be simple.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks whether the length of the remaining string is about right (5 or 6 characters: 1 or 2 for the day plus 4 for the year).
You could instead (or also) check that they are all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(List) To UBound(List)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function