How to count occurrences of partly-matching words with VB.NET? - regex

I am using VB 9.0 to split a text file and then count occurrences of the term <sequence>. Suppose I also want to count occurrences of the same term in a different format, e.g. <sequence, and then group them together so that I output the combined result to a text box, i.e.
txtMyTerms.Text=<sequence>+<sequence
How to do it? My current code is as follows:
Dim str As String = txtSource.Text
Dim arr As String() = str.Split(Nothing)
Dim searchTerm As String = "<sequence>"
'create query to search for the term <sequence>
Dim matchQuery = From word In arr Where word.ToLowerInvariant() = searchTerm.ToLowerInvariant() Select word
' Count the matches.
Dim count As Integer = matchQuery.Count()
txtMyTerms.Text = count.ToString()

I would try something like this. Note that String.Compare is more efficient than repeatedly calling ToLowerInvariant().
Dim str As String = txtSource.Text
Dim arr As String() = str.Split(Nothing)
Dim searchTerm1 As String = "<sequence>"
Dim searchTerm2 As String = "<sequence"
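'create query to search for the terms <sequence> and <sequence
'String.Compare is a shared method that takes both strings; VB compares with = and short-circuits with OrElse
Dim matchQuery = From word In arr Where String.Compare(word, searchTerm1, StringComparison.InvariantCultureIgnoreCase) = 0 OrElse String.Compare(word, searchTerm2, StringComparison.InvariantCultureIgnoreCase) = 0 Select word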
' Count the matches.
Dim count As Integer = matchQuery.Count()
txtMyTerms.Text = count.ToString()
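Since the two forms differ only in the optional closing bracket, the two comparisons could also be folded into a single regular expression. The following is a minimal sketch, not the answerer's method: the module name, the helper CountTerm and the sample text are invented for illustration, and it targets a console program rather than the text boxes in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CountDemo
    ' Hypothetical helper: counts whitespace-delimited words equal to
    ' "<sequence" or "<sequence>", ignoring case.
    Function CountTerm(source As String) As Integer
        ' ">?" makes the closing bracket optional.
        ' (?<!\S) and (?!\S) require whitespace (or the string boundary)
        ' on both sides, so longer words such as "<sequences>" are not counted.
        Dim pattern As New Regex("(?<!\S)<sequence>?(?!\S)", RegexOptions.IgnoreCase)
        Return pattern.Matches(source).Count
    End Function

    Sub Main()
        ' Sample input invented for this example.
        Dim sample As String = "<sequence> abc <SEQUENCE xyz <sequences>"
        Console.WriteLine(CountTerm(sample))
    End Sub
End Module
```

With this sample the first two terms should be counted and <sequences> skipped. In the original form, the result of CountTerm(txtSource.Text) would be assigned to txtMyTerms.Text.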

Related

Get Strings from Textbox and put in variable array

I have some dynamic lines of text in a TextBox
TextBox example:
Nombre : Maria
Nombre : Carlos Manuel
Nombre : Antonio
Nombre : Ana Gabriela
.
.
.
I need to get the names only into an array.
The names are to the right of the " : "
Dim myMatches As MatchCollection
Dim myPattern As New Regex(" : ")
Dim myString As String = TextBox1.Text
myMatches = myPattern.Matches(myString)
Dim successfulMatch As Match
Dim counter As Integer = 0
Dim names(counter) As String
For Each successfulMatch In myMatches
counter = counter + 1
names = TextBox1.Text.Split(" : ").Last
Next
I want to put the names into an array
names(1) = Maria
names(2) = Carlos Manuel
names(3) = Antonio
names(4) = Ana Gabriela
.
.
.
You just need the TextBox.Lines collection. This property returns an array of strings representing all the lines of text (substrings of text separated by a line feed) contained in a TextBoxBase control.
Find the last : char and, from this position, take the text to the end of the string:
Dim listOfNames = New List(Of String)
For Each line As String In TextBox1.Lines
If Not String.IsNullOrEmpty(line) Then
listOfNames.Add(line.Substring(line.LastIndexOf(":") + 1).TrimStart())
End If
Next
Using an array of strings instead of a List(Of String):
Dim lines = TextBox1.Lines
Dim arrayOfNames(lines.Length - 1) As String
For i As Integer = 0 To lines.Length - 1
If Not String.IsNullOrEmpty(lines(i)) Then
arrayOfNames(i) = lines(i).Substring(lines(i).LastIndexOf(":") + 1).TrimStart()
End If
Next
In case you have to use a RegEx for some reason.
Using the List(Of String) seen before to store the results:
Dim regx = New Regex(":", RegexOptions.Multiline).Matches(TextBox1.Text)
For Each match As Match In regx
Dim position = TextBox1.Text.IndexOf(Environment.NewLine, match.Index)
If position = -1 Then position = TextBox1.Text.Length - 1
listOfNames.Add(TextBox1.Text.Substring(match.Index + 1, position - match.Index).Trim())
Next
One-line LINQ version (it returns the List(Of String) used above).
Use ToArray() instead of ToList() to return an array of strings.
Omit both to return an IEnumerable(Of String):
Dim result = TextBox1.Lines.Select(Function(line) line.Split(":"c)(1).TrimStart()).ToList()
Make a function which returns an IEnumerable(Of String)
Private Function getNombres(text As String) As IEnumerable(Of String)
Return text.
Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries).
Select(Function(l) l.Split(":"c)(1).TrimStart())
End Function
Call it:
Dim nombres = getNombres(TextBox1.Text)
If you really need the result in an array, you can convert it:
Dim nombres = getNombres(TextBox1.Text).ToArray()
Note: when indexing the data, you will start with index = 0, contrary to what you asked for in your question. VB6 collections started at 1; .NET collections start at 0.
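If a regular expression is preferred, a capture group can return the name directly instead of computing positions by hand. This is a minimal sketch under assumptions: the module name, the helper ExtractNames and the sample text are made up for illustration, and the text is passed in as a string rather than read from TextBox1.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NamesDemo
    ' Hypothetical helper: returns the text that follows the first ":" on each line.
    Function ExtractNames(text As String) As List(Of String)
        Dim names As New List(Of String)
        ' [ \t]* skips blanks after the colon without crossing a line break;
        ' ([^\r\n]+) captures the rest of the line.
        For Each m As Match In Regex.Matches(text, ":[ \t]*([^\r\n]+)")
            names.Add(m.Groups(1).Value.TrimEnd())
        Next
        Return names
    End Function

    Sub Main()
        ' Sample input modelled on the TextBox content in the question.
        Dim sample As String = "Nombre : Maria" & Environment.NewLine &
                               "Nombre : Carlos Manuel" & Environment.NewLine &
                               "Nombre : Antonio"
        For Each name As String In ExtractNames(sample)
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

Lines without a colon are simply skipped, so no empty-line check is needed. Calling ToArray() on the returned list gives a zero-based array.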

How to find words on both sides of (period)

In the following examples I need to get the words on either side of the period.
I am using this regex:
Dim myRegex As New Regex("[^\w]+")
Dim mymatch As String() = myRegex.Split(currentField)
where currentField is one of the following 3 samples:
Contacts.Address2 as `Contact Address2`
Contacts.ContactID
CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`
The results are as follows:
1-- Contacts, Address2, as, Contact and Address2. I do not want the word as.
2-- Contacts and ContactID. This is OK.
3-- CONCAT, Contacts, FirstName, Contacts, LastName, as and Contact.
The third one returns too much: I do not want CONCAT, as or Contact. I only want the four words before and after the periods to be returned: Contacts, FirstName, Contacts and LastName.
How can I write the regex to get only the words before and after the period?
I would consider matching vs. splitting the input:
For Each m As Match In Regex.Matches(input, "(\w+)\.(\w+)")
Console.WriteLine(
String.Join(", ",
m.Groups(1).Value,
m.Groups(2).Value
))
Next
This is only an example; it's not clear what you expect to do with the returned results.
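To see how the matching approach behaves on the question's data, the loop can be run over all three sample fields. This is an illustrative sketch only; the module name and the samples array are assumptions added for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QualifiedNamesDemo
    Sub Main()
        ' The three sample fields from the question.
        Dim samples As String() = {
            "Contacts.Address2 as `Contact Address2`",
            "Contacts.ContactID",
            "CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`"
        }
        For Each field As String In samples
            For Each m As Match In Regex.Matches(field, "(\w+)\.(\w+)")
                ' Groups(1) is the word before the period, Groups(2) the word after it.
                Console.WriteLine(m.Groups(1).Value & ", " & m.Groups(2).Value)
            Next
        Next
    End Sub
End Module
```

Because the pattern requires a period between two words, tokens such as CONCAT, as and the backtick-quoted alias never match, so the third sample yields only the Contacts/FirstName and Contacts/LastName pairs.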
I think you are looking to only split inside round brackets and you are not interested in the word as. Thus, I suggest a 2 step approach:
Get the substring(s) in round brackets (\([^()]+\) regex)
If there are such substrings, split them with the split regex, if not, split the original string with split regex (\W+|\s*\bas\b\s* regex).
Sample code:
'Dim currentField As String = "Contacts.Address2 as `Contact Address2`"
Dim currentField As String = "CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`"
'Dim currentField As String = "Contacts.ContactID"
Dim myRegex As New Regex("\([^()]+\)")
Dim splitRegex As New Regex("\W+|\s*\bas\b\s*")
Dim mymatch As MatchCollection = myRegex.Matches(currentField)
If mymatch.Count > 0 Then
For Each match As Match In mymatch
Dim mysubstrs As String() = splitRegex.Split(match.Value)
For Each substr As String In mysubstrs
If String.IsNullOrEmpty(substr) = False Then
Console.WriteLine(substr)
End If
Next
Next
Else
Dim mysubstrs As String() = splitRegex.Split(currentField)
For Each substr As String In mysubstrs
If String.IsNullOrEmpty(substr) = False Then
Console.WriteLine(substr)
End If
Next
End If
Here is the final working routine, based on the accepted answer above:
Public Sub Load_Field_List(FieldSTR As String, FieldType As String)
Dim t As New FileIO.TextFieldParser(New System.IO.StringReader(FieldSTR))
t.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
t.Delimiters = New String() {","}
Dim currentRow As String()
Dim dr As DataRow
Dim ColListSTR As String = loadeddataview.Tables(0).Rows(0).Item("ColumnList")
Dim ColListSTRArr As String() = ColListSTR.Split(",")
While Not t.EndOfData
Try
currentRow = t.ReadFields()
Dim currentField As String 'field string
For Each currentField In currentRow
Dim startName As Integer
Dim endName As Integer
Dim name As String
dr = fieldDT.NewRow
Dim isValid As Boolean = False
If currentField = "" Then 'make sure current field has data
isValid = False
ElseIf (Regex.IsMatch(currentField, "(\w+)\.(\w+)")) = True Then 'make sure current field has xxxx.yyyy pattern
Dim m As Match = Regex.Match(currentField, "(\w+)\.(\w+)") 'sets m to the first xxxx.yyyy pattern
dr("Table") = m.Groups(1).Value 'sets table column to table name xxxx
dr("Column Name") = "`" & m.Groups(2).Value & "`" 'sets column name to column yyyy enclosed in ` `
If ColListSTRArr.Contains(m.Groups(2).Value) Then 'checks columnlist str to see if column visible
dr("Show") = "True"
Else
dr("Show") = "False"
End If
' this section overrides column name if it was set using AS `zzzzz` statement
startName = currentField.IndexOf("`")
endName = currentField.IndexOf("`", If(startName > 0, startName + 1, 0))
If (endName > startName) Then
Dim mylength As Integer = currentField.Length
name = currentField.Substring(startName, endName - startName + 1)
dr("Column Name") = name 'set override columname
dr("Field") = currentField.Substring(0, startName - 4) 'sets field minus the " as 'ZZZZZ" above
If ColListSTRArr.Contains(currentField.Substring(startName + 1, endName - startName - 1)) Then 'dup may be able to remove
dr("Show") = "True"
Else
dr("Show") = "False"
End If
Else
dr("Field") = currentField 'sets field if there was no " as `ZZZZZZ`" in string
End If
If FieldType = "Field" Then 'sets the column linking field
dr("Linking") = "No Linking"
Else
dr("Linking") = FieldType
End If
End If
' commit changes
fieldDT.Rows.Add(dr)
fieldDT.AcceptChanges()
DataGridView3.DataSource = fieldDT
DataGridView3.ClearSelection()
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message &
" is not valid and will be skipped.")
End Try
End While
End Sub

match date pattern in the string vba excel

Edit:
Since my string has become more and more complicated, it looks like regexp is the only way.
I do not have a lot of experience with it, and your help is much appreciated.
Based on what I read on the web, I constructed the following expression to try to match the dates in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and I am trying the code below, but I do not get any match. Any good link to a regexp tutorial would be much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
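This code validates each date found by the regexp using DateValue for robustness. (The pattern in the question cannot match because the spaces around each + are matched literally, ^ and $ anchor the match to the whole string, and [a-zA-Z] matches only a single letter.)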
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
Thanks for all your help!
I managed to solve my problem using this simple code.
' Early binding: requires a reference to Microsoft VBScript Regular Expressions 5.5
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Note that in my case I am sure the string contains only valid dates, so the regular expression can stay simple.
Thanks,
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks whether the length of the remaining string is about right (5 or 6 characters: 1 or 2 for the day plus 4 for the year).
You could instead (or also) check that the remaining characters are all digits.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
' Loop over the month names (List), not the words (Data)
For Index2 = LBound(List) To UBound(List)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function

Split URL in two parts (strings)

I'm trying to find a way to easily split a path (Raw URL) in two portions:
For example: /search/criteria/newyork/list
I would like to populate a string that would contain everything before third slash, in this case: "/search/criteria"
I also want to get the second portion into a string: "newyork/list"
You can use IndexOf to find the third slash (assuming that the first character is always the first slash, and that there are at least three slashes in the string):
Dim index3 = url.IndexOf("/"c, url.IndexOf("/"c, 1) + 1)
Then you can use Substring to get the parts before and after that slash:
Dim path As String = url.Substring(0, index3)
Dim resource As String = url.Substring(index3 + 1)
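The two assumptions stated above (a leading slash and at least three slashes) can be checked at run time. The sketch below is one possible way to do that, not part of the original answer: the module name and the helper TrySplitAtThirdSlash are hypothetical, and it simply walks to the third slash with IndexOf, returning False if there is none.

```vbnet
Imports System

Module UrlSplitDemo
    ' Hypothetical helper: splits rawUrl at its third slash.
    ' Returns False (and leaves head/tail as Nothing) when the
    ' string contains fewer than three slashes.
    Function TrySplitAtThirdSlash(rawUrl As String, ByRef head As String, ByRef tail As String) As Boolean
        head = Nothing
        tail = Nothing
        Dim index As Integer = -1
        For i As Integer = 1 To 3
            ' Search for the next slash after the previous one.
            index = rawUrl.IndexOf("/"c, index + 1)
            If index = -1 Then Return False
        Next
        head = rawUrl.Substring(0, index)   ' everything before the third slash
        tail = rawUrl.Substring(index + 1)  ' everything after it
        Return True
    End Function

    Sub Main()
        Dim head As String = Nothing
        Dim tail As String = Nothing
        If TrySplitAtThirdSlash("/search/criteria/newyork/list", head, tail) Then
            Console.WriteLine(head)
            Console.WriteLine(tail)
        End If
    End Sub
End Module
```

For the example path this should print /search/criteria followed by newyork/list. Unlike the one-line IndexOf expression, it does not assume that the first character is a slash.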
Try this:
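Dim sAux() As String = sURL.Split("/"c)
Dim sResult As String = ""
If sAux.Length > 3 Then
' sAux(0) is the empty string before the leading slash, so the second portion starts at index 3
For i As Integer = 3 To sAux.Length - 1
sResult &= sAux(i) & "/"
Next
' Drop the trailing slash added by the loop
sResult = sResult.TrimEnd("/"c)
End If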
Or this:
Dim sAux As New List(Of String)(sURL.Split("/"c))
' Remove the empty first element plus "search" and "criteria"
sAux.RemoveRange(0, 3)
sResult = String.Join("/", sAux.ToArray())
Dim ar As String()
Dim str1 As String
Dim str2 As String
Dim a As Integer
Dim splitPosition = 3
Dim urlToSplit = "/search/criteria/newyork/list"
ar = urlToSplit.Split("/"c)
If UBound(ar) < splitPosition Then
' there are 3 or fewer slashes. do what you want here, error or just exit
Else
For a = 0 To splitPosition - 1
If Not String.IsNullOrEmpty(ar(a)) Then str1 += ar(a) + "/"
Next
For a = splitPosition To UBound(ar)
If Not String.IsNullOrEmpty(ar(a)) Then str2 += ar(a) + "/"
Next
End If
str1 will contain search/criteria/ (the leading slash is dropped because the empty first element is skipped)
str2 will contain newyork/list/
This code will handle any number of / combinations and should not error out for a badly formed URL.
If your string is always in the same format and has the same number of elements (in the split array), you could use the String.Format method:
Dim arr() As String = "/search/criteria/newyork/list".Split("/"c)
Dim str1 As String = String.Format("/{1}/{2}", arr) '/search/criteria
Dim str2 As String = String.Format("{3}/{4}", arr) 'newyork/list

Separating strings from numbers with Excel VBA

I need to
a) separate strings from numbers for a selection of cells
and
b) place the separated strings and numbers into different columns.
For example, the Excel sheet is as follows:
A1 B1
100CASH etc.etc.
The result should be:
A1 B1 C1
100 CASH etc.etc.
Using regular expressions would be helpful, as there may be different cell formats, such as 100-CASH, 100/CASH, 100%CASH. Once the procedure is set up, it won't be hard to adapt the regular expression to different variations.
I came across a UDF for extracting numbers from a cell. This can easily be modified to extract string or other types of data from cells simply changing the regular expression.
But what I need is not just a UDF but a sub procedure to split cells using regular expressions and place the separated data into separate columns.
I've also found a similar question in SU, however it isn't VBA.
See if this will work for you:
UPDATED 11/30:
Sub test()
Dim RegEx As Object
Dim strTest As String
Dim ThisCell As Range
Dim Matches As Object
Dim strNumber As String
Dim strText As String
Dim i As Long ' Long, because row numbers can exceed the Integer range
Dim CurrCol As Integer
Set RegEx = CreateObject("VBScript.RegExp")
' may need to be tweaked
RegEx.Pattern = "-?\d+"
' Get the current column
CurrCol = ActiveCell.Column
Dim lngLastRow As Long
lngLastRow = Cells(1, CurrCol).End(xlDown).Row
' add a new column & shift column 2 to the right
Columns(CurrCol + 1).Insert Shift:=xlToRight
For i = 1 To lngLastRow ' change to number of rows to search
Set ThisCell = ActiveSheet.Cells(i, CurrCol)
strTest = ThisCell.Value
If RegEx.test(strTest) Then
Set Matches = RegEx.Execute(strTest)
strNumber = CStr(Matches(0))
strText = Mid(strTest, Len(strNumber) + 1)
' replace original cell with number only portion
ThisCell.Value = strNumber
' replace cell to the right with string portion
ThisCell.Offset(0, 1).Value = strText
End If
Next
Set RegEx = Nothing
End Sub
How about:
Sub UpdateCells()
Dim rng As Range
Dim c As Range
Dim l As Long
Dim s As String, a As String, b As String
''Working with sheet1 and column C
With Sheet1
l = .Range("C" & .Rows.Count).End(xlUp).Row
Set rng = .Range("C1:C" & l)
End With
''Working with selected range from above
For Each c In rng.Cells
If c <> vbNullString Then
s = FirstNonNumeric(c.Value)
''Split the string into numeric and non-numeric, based
''on the position of first non-numeric, obtained above.
a = Mid(c.Value, 1, InStr(c.Value, s) - 1)
b = Mid(c.Value, InStr(c.Value, s))
''Put the two values on the sheet in positions one and two
''columns further along than the test column. The offset
''can be any suitable value.
c.Offset(0, 1) = a
c.Offset(0, 2) = b
End If
Next
End Sub
Function FirstNonNumeric(txt As String) As String
With CreateObject("VBScript.RegExp")
.Pattern = "[^0-9]"
FirstNonNumeric = .Execute(txt)(0)
End With
End Function