Linq with HashTable Matching - regex

I need another pair of eyes. I've been playing around with this LINQ syntax for scanning a Hashtable with a regular expression, and I can't seem to get it quite right. The goal is to match all keys against one regular expression, then match the values of those results against a separate regular expression. In the test case below, I should end up with the first three entries.
Private ReadOnly Property Testhash As Hashtable
Get
Testhash = New Hashtable
Testhash.Add("a1a", "abc")
Testhash.Add("a2a", "aac")
Testhash.Add("a3a", "acc")
Testhash.Add("a4a", "ade")
Testhash.Add("a1b", "abc")
Testhash.Add("a2b", "aac")
Testhash.Add("a3b", "acc")
Testhash.Add("a4b", "ade")
End Get
End Property
Public Sub TestHashSearch()
Dim KeyPattern As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex("a.a")
Dim ValuePattern As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex("a.c")
Try
Dim queryMatchingPairs = (From item In Testhash
Let MatchedKeys = KeyPattern.Matches(item.key)
From key In MatchedKeys
Let MatchedValues = ValuePattern.Matches(key.value)
From val In MatchedValues
Select item).ToList.Distinct
Dim info = queryMatchingPairs
Catch ex As Exception
End Try
End Sub

Can't you match both the key and value at the same time?
Dim queryMatchingPairs = (From item As DictionaryEntry In Testhash
Where KeyPattern.IsMatch(CStr(item.Key)) AndAlso ValuePattern.IsMatch(CStr(item.Value))
Select item).ToList

I should have taken a break sooner, then worked a little more. The correct solution uses the original "from item" and not the lower "from key" in the second regular expression. Also, "distinct" is unnecessary for a hashtable.
Dim queryMatchingPairs = (From item In Testhash
Let MatchedKeys = KeyPattern.Matches(item.key)
From key In MatchedKeys
Let MatchedValues = ValuePattern.Matches(item.value)
From val In MatchedValues
Select item).ToList
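The single-pass filter suggested above (keep an entry only when its key and its value both match) can be checked quickly outside .NET. This is a minimal sketch in Python, assuming the same eight entries and a plain dict in place of the Hashtable. The name matching_pairs is hypothetical, and re.search is used because Regex.IsMatch also matches anywhere in the string.

```python
import re

# Same test data as Testhash in the question.
test_hash = {
    "a1a": "abc", "a2a": "aac", "a3a": "acc", "a4a": "ade",
    "a1b": "abc", "a2b": "aac", "a3b": "acc", "a4b": "ade",
}

def matching_pairs(table, key_pattern, value_pattern):
    """Keep entries whose key and value both match their pattern."""
    key_re = re.compile(key_pattern)
    value_re = re.compile(value_pattern)
    return {k: v for k, v in table.items()
            if key_re.search(k) and value_re.search(v)}

result = matching_pairs(test_hash, "a.a", "a.c")
print(sorted(result))  # ['a1a', 'a2a', 'a3a']
```

Only a1a, a2a and a3a survive: a4a passes the key test but its value "ade" fails "a.c", and the "b" keys fail "a.a".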

Related

Excel VBA - Looking up a string with wildcards

I'm trying to look up a string that contains wildcards. I need to find where in a specific row the string occurs. The strings all take the form "IP##W## XX", where XX are the two letters by which I look up the value and each # is a wildcard for any digit. Hence this is what my lookup string looks like:
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find method to locate the column where this first occurs, but I keep getting errors. Here's what I have so far; it doesn't work. If anyone has an easy way of doing this, I'd appreciate it. I'm quite new to VBA.
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by @SJR, you can do this two ways. Using LIKE, the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W## [a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).
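To make the wildcard semantics in these answers concrete, here is a rough sketch in Python that translates a small subset of Like syntax (#, ? and *) into a regular expression and scans a row for the first cell that matches. The helper names like_to_regex and first_matching_column and the sample row are assumptions for illustration; bracket character lists such as [A-Z] are not handled.

```python
import re

def like_to_regex(like_pattern):
    """Translate a small subset of VBA Like syntax (#, ?, *) into a regex."""
    parts = []
    for ch in like_pattern:
        if ch == "#":
            parts.append(r"\d")      # any single digit
        elif ch == "?":
            parts.append(".")        # any single character
        elif ch == "*":
            parts.append(".*")       # any run of characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def first_matching_column(row_values, lookup_code):
    """Return the 1-based position of the first cell matching IP##W## <code>, or None."""
    regex = like_to_regex("IP##W## " + lookup_code)
    for column, value in enumerate(row_values, start=1):
        if regex.fullmatch(value):
            return column
    return None

row = ["Header", "IP12W03 AB", "IP07W11 XY", "IP99W00 AB"]
print(first_matching_column(row, "XY"))  # 3
print(first_matching_column(row, "ZZ"))  # None
```

The point of the sketch is that the # characters have to be interpreted as digit wildcards by whatever does the comparison; a search that treats the lookup string literally will look for actual # characters.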

Use Regex to update VBA code

I have a VBA source code containing many hard coded references to cells. The code is part of the Worksheet_Change sub, so I guess hard coding the range references was necessary and you will see many assignment statements like the following:
Set cell = Range("B7")
If Not Application.Intersect(cell, Range(Target.Address)) Is Nothing Then
I would like to insert 2 additional rows at the top of the worksheet, so all the row references will shift by 2 rows. For example, the assignment statement above will become Set cell = Range("B9").
Given the large number of hard coded row references in the code, I thought of using Regex to increment all the row references by 2. So I have developed the following code.
Sub UpdateVBACode()
'*********************Read Text File Containing VBA code and assign content to string variable*************************
Dim str As String
Dim strFile As String: strFile = "F:\Preprocessed_code.txt"
Open strFile For Input As #1
str = Input$(LOF(1), 1)
Close #1
'*********************Split string variables to lines******************************************************************
Dim vStr As Variant: vStr = Split(str, vbCrLf)
'*********************Regex work***************************************************************************************
Dim rex As New RegExp
rex.Global = True
Dim i As Long
Dim mtch As Object
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
For Each mtch In rex.Execute(vStr(i))
vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2))
Next
End If
Next i
'********************Reconstruct String*********************************************************************************
str = ""
For i = 0 To UBound(vStr, 1)
str = str & vbCrLf & vStr(i)
Next i
'********************Write string to text file******************************************************************************
Dim myFile As String
myFile = "F:\Processed_code.txt"
Open myFile For Output As #2
Print #2, str
Close #2
'
End Sub
Function IncrementString(rowNum As String) As String '
Dim num As Integer
num = CInt(rowNum) + 2
IncrementString = CStr(num)
End Function
The above VBA code works, except that it fails if there are two row references in the same line. For instance, if we have If Range("B15").Value <> Range("B12").Value Then, after the line gets processed I get If Range("B14").Value <> Range("B14").Value Then instead of If Range("B17").Value <> Range("B14").Value Then. The problem is in the vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)) statement, because it is called more than once if a line has more than one Regex match, and each call replaces every match in the line.
Any ideas? Thanks in advance
I think what you are trying to do is a bad idea, for two reasons:
1) Hard-coded cell references are almost always poor practice. A better solution may be to replace hard-coded cell references with named ranges. You can refer to them in the code by name, and the associated references will update automatically if you insert/delete rows or columns. You have some painful upfront work to do but the result will be a much more maintainable spreadsheet.
2) You are effectively trying to write a VBA parser using regexes. This is pretty much guaranteed not to work in all cases. Your current regex will match lots of things that aren't cell references (e.g. "123", "_12", and "A00") and will also miss lots of hard-coded cell references (e.g. "A1" and Cells(3,7)). That may not matter for your particular code, but the only way to be sure it has worked is to check each reference by hand, which is IMHO not much less effort than refactoring (e.g. replacing with named ranges). In my experience you don't fix a regex, you just make the problems more subtle.
That said, since you asked...
<cthulu>
There are only two choices when using RegExp.Replace() - either replace the first match or replace all matches (corresponding to setting RegExp.Global to False or True respectively). You don't have any finer control than that, so your logic has to change. Instead of using Replace() you could write your own code for the replacements, using the FirstIndex property of the Match object, and VBA's string functions to isolate the relevant parts of the string:
Dim rex As Object
Set rex = CreateObject("VBScript.RegExp")
rex.Global = True
Dim i As Long
Dim mtch As Object
Dim newLineText As String
Dim currMatchIndex As Long, prevPosition As Long
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
currMatchIndex = 0: prevPosition = 1
newLineText = ""
For Each mtch In rex.Execute(vStr(i))
'Note that VBA string functions are indexed from 1 but Match.FirstIndex starts from 0
currMatchIndex = mtch.FirstIndex
newLineText = newLineText & Mid(vStr(i), prevPosition, currMatchIndex - prevPosition + 1) & _
mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)
prevPosition = currMatchIndex + Len(mtch.Value) + 1
Next
vStr(i) = newLineText & Right(vStr(i), Len(vStr(i)) - prevPosition + 1)
End If
Next i
Note that I still haven't fixed the problems with the regex pattern in the first place. I recommend that you just go and use named ranges instead...
Oops, nearly forgot - </cth
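The core idea in this answer is to compute each replacement from its own match rather than applying one replacement string to every match in the line. As an illustration only, here is the same logic sketched in Python, where re.sub accepts a function that is called once per match. The pattern is copied from the question; shift_rows is a hypothetical name, and the sketch inherits the pattern's limitations described above.

```python
import re

# Same shape as the pattern in the question: opening quote plus one word
# character, exactly two digits, closing quote.
CELL_REF = re.compile(r'("\w)([0-9][0-9])(")')

def shift_rows(line, offset=2):
    """Add `offset` to the two-digit row number of every match in `line`."""
    def bump(match):
        prefix, row, suffix = match.groups()
        return prefix + str(int(row) + offset) + suffix
    return CELL_REF.sub(bump, line)

line = 'If Range("B15").Value <> Range("B12").Value Then'
print(shift_rows(line))
# If Range("B17").Value <> Range("B14").Value Then
```

Each match is rewritten independently, so the two references on the same line end up as B17 and B14, which is the result the question was after.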

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two groups of words in a Word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time, and assign each bullet to an Excel cell. As shown in the code below, I have pretty much everything done, except I can't figure out the correct Regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As Long: i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
1) Find bullet list items using keywords before and after list in regexp.
2) (Group) regexp pattern so that you can extract everything in-between words.
3) Store listed items group into a string.
4) Split string by new line character into a new array.
5) Output each array item to excel.
6) Loop again since there may be more than one list in document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In myMatches
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3
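The steps listed in this answer (capture everything between the two marker phrases, split the capture into lines, emit one item per line) can be sketched in a few lines of Python. This is an illustration under assumptions, not a drop-in for the VBA: the sample string uses carriage returns as paragraph separators, the markers are matched literally, and items_between is a hypothetical name. A lazy group is used so that several lists in one document are captured separately.

```python
import re

def items_between(text, start_marker, end_marker):
    """Return the non-empty lines found between two marker phrases."""
    pattern = re.compile(
        re.escape(start_marker) + r"(.*?)" + re.escape(end_marker),
        re.DOTALL | re.IGNORECASE,
    )
    items = []
    for match in pattern.finditer(text):
        # splitlines() handles \r, \n and \r\n separators alike.
        for line in match.group(1).splitlines():
            line = line.strip()
            if line:
                items.append(line)
    return items

document_text = "Risk Assessment:\rTest 1\rTest 2\rTest 3\rInternal Audit\r"
print(items_between(document_text, "Risk Assessment:", "Internal Audit"))
# ['Test 1', 'Test 2', 'Test 3']
```

Each returned item corresponds to one row in the output column; in the VBA version that is the Range("A" & i).Value assignment inside the inner loop.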

Regex as key in Dictionary in VB.NET

Is there a way to use Regex as a key in a Dictionary? Something like Dictionary(Of Regex, String)?
I'm trying to find a Regex in a list (let's say there is no dictionary to begin with) by a string that it matches.
I can do it by manually iterating through the list of Regex expressions. I'm just looking for a way to do that more easily, such as TryGetValue on a Dictionary.
When you use Regex as the type for the key in a Dictionary, it will work, but it compares the key by object instance, not by the expression string. In other words, if you create two separate Regex objects, using the same expression for both, and then add them to the dictionary, they will be treated as two different keys (because they are two different objects).
Dim d As New Dictionary(Of Regex, String)()
Dim r As New Regex(".*")
Dim r2 As New Regex(".*")
d(r) = "1"
d(r2) = "2"
d(r) = "overwrite 1"
Console.WriteLine(d.Count) ' Outputs "2"
If you want to use the expression as the key, rather than the Regex object, then you need to create your dictionary with a key type of String, for instance:
Dim d As New Dictionary(Of String, String)()
d(".*") = "1"
d(".*") = "2"
d(".*") = "3"
Console.WriteLine(d.Count) ' Outputs "1"
Then, when you are using the expression string as the key, you can use TryGetValue, like you described:
Dim d As New Dictionary(Of String, String)()
d(".*") = "1"
Dim value As String = Nothing
' Outputs "1"
If d.TryGetValue(".*", value) Then
Console.WriteLine(value)
Else
Console.WriteLine("Not found")
End If
' Outputs "Not found"
If d.TryGetValue(".+", value) Then
Console.WriteLine(value)
Else
Console.WriteLine("Not found")
End If
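Note that a dictionary keyed by expression string only answers "is this exact expression stored?", not "which stored expression matches this input?". For the second question, some form of iteration over the patterns is still needed; it can be wrapped so the call site looks like TryGetValue. A minimal sketch in Python, where the class name RegexLookup and its methods are hypothetical and a full match is assumed to be the intended test:

```python
import re

class RegexLookup:
    """Maps patterns to values; lookup returns the value of the first pattern that matches."""

    def __init__(self):
        self._entries = []  # list of (compiled pattern, value), in insertion order

    def add(self, pattern, value):
        self._entries.append((re.compile(pattern), value))

    def try_get_value(self, text):
        """Return (True, value) for the first full match, else (False, None)."""
        for regex, value in self._entries:
            if regex.fullmatch(text):
                return True, value
        return False, None

lookup = RegexLookup()
lookup.add(r"\d+", "number")
lookup.add(r"[a-z]+", "word")

print(lookup.try_get_value("12345"))  # (True, 'number')
print(lookup.try_get_value("hello"))  # (True, 'word')
print(lookup.try_get_value("???"))    # (False, None)
```

The lookup is still a linear scan over the patterns, so insertion order decides which value wins when several patterns match the same input.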

Match "THIS" And Replace with "THAT" RegEx Vb.Net

Trying to find out how to find and replace text with corresponding values.
For Example
1) fedex to FedEx
2) nasa to NASA
3) po box to PO BOX
Public Function FindReplace(ByVal s As String) As String
Dim MatchEval As New MatchEvaluator(AddressOf RegexReplace)
Dim Pattern As String = "(?<f1>fedex|nasa|po box)"
Return Regex.Replace(s, Pattern, MatchEval, RegexOptions.IgnoreCase)
End Function
Public Function RegexReplace(ByVal m As Match) As String
Select Case LCase(m.Groups("f1").Value)
Case "fedex"
Return "FedEx"
Case "nasa"
Return "NASA"
Case "po box"
Return "PO BOX"
End Select
End Function
The above code works fine for fixed values, but I don't know how to adapt it to match values added at run time, such as db to Db.
I'd guess that the only thing you need Regex for here is the IgnoreCase option. If so, I would suggest not using Regex at all. Use String functionality instead:
Dim input As String = "fEDeX"
Dim pattern As String = "fedex"
Dim replacement As String = "FedEx"
Dim result As String
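' Caveat: ToLowerInvariant lowercases the whole input, so any other text in it loses its original casing.
result = input.ToLowerInvariant().Replace(pattern, replacement)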
But if you still need Regex, then this should work:
result = Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase)
Example:
Sub Main()
Dim replacements As New Dictionary(Of String, String)()
replacements.Add("fedex", "FedEx")
replacements.Add("nasa", "NASA")
replacements.Add("po box", "PO BOX")
Dim result As String = Replace("fedex, nAsA, po box, etc", replacements)
End Sub
Private Function Replace(ByVal input As String, ByVal replacements As Dictionary(Of String, String)) As String
For Each item In replacements
input = Regex.Replace(input, item.Key, item.Value, RegexOptions.IgnoreCase)
Next
Return input
End Function
I found a solution using a List and ran a performance test against the Dictionary approach suggested by Anton Kedrov. Both methods take almost the same time to complete, but I don't know whether the dictionary method will hold up for a longer replacement list, because it loops through the whole list to find the matching entry for replacement.
Thank you all for your suggestions and advice.
' Declared at module level so that RegexReplace can see it.
Private lst As New List(Of String)
Sub Main()
lst.Add("NASA")
lst.Add("FedEx")
lst.Add("PO BOX")
MsgBox(FindReplace("this is testing fedex naSa PO box"))
End Sub
Public Function FindReplace(ByVal s As String) As String
Dim Pattern As String = "(?<f1>fedex|nasa|po box)"
Dim MatchEval As New MatchEvaluator(AddressOf RegexReplace)
Return Regex.Replace(s, Pattern, MatchEval, RegexOptions.IgnoreCase)
End Function
Public Function RegexReplace(ByVal m As Match) As String
Dim Found As String
Found = lst.Find(Function(value As String) LCase(value) = LCase(m.Groups("f1").Value))
Return Found
End Function
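The pattern in this solution is still hard-coded, so terms added at run time (the "db to Db" case from the question) would not be matched. One way around that is to build the alternation from the list itself and look up the canonical spelling in a callback. A sketch in Python, with build_replacer and the sample terms as assumptions for illustration:

```python
import re

def build_replacer(canonical_forms):
    """Return a function that rewrites each listed term to its canonical casing."""
    # Look up by lowercase spelling, e.g. "fedex" -> "FedEx".
    by_lower = {term.lower(): term for term in canonical_forms}
    if not by_lower:
        return lambda text: text  # nothing to replace
    # Longest terms first so that a longer term wins over a shorter prefix.
    alternatives = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        "|".join(re.escape(term) for term in alternatives),
        re.IGNORECASE,
    )

    def replace(text):
        return pattern.sub(lambda m: by_lower[m.group(0).lower()], text)

    return replace

terms = ["FedEx", "NASA", "PO BOX"]
terms.append("Db")  # added at run time, as in the question
fix_case = build_replacer(terms)
print(fix_case("this is testing fedex naSa PO box and db"))
# this is testing FedEx NASA PO BOX and Db
```

The input is scanned once regardless of how many terms there are, and each term is escaped so that characters such as "." in a term are matched literally. The sketch does not add word boundaries, so a short term like "db" would also be rewritten inside longer words.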