vb6 select whole word in string - regex

I am trying to search a string without regex perhaps a bad idea !
The search is acting on a string of text in a RichText Box
If I search for "is" the first word in the String is "This"
The last two letters of "This" is are highlighted in vbRed
The string to be searched has two other occurrences of "is" and these are found and highlighted as expected
The question can I prevent the "is" in "This" from being found?
Private Sub btnSearch_Click()
Dim pos As Integer
Dim strToFind As String
Dim Y As Integer
Dim Ask As String
pos = 1
strToFind = tbSearch.Text
Do
strToFind = tbSearch.Text
pos = InStr(1, strToSearch, strToFind)
For Y = 1 To Len(strToSearch)
Ask = MsgBox("Yes Next Occurrence or No To Exit ?", vbYesNo, "Question")
If Ask = vbYes Then
lbOne.AddItem pos
tbAns.Text = pos
If pos = 0 Then
Exit Sub
End If
rtbOne.SelStart = pos - 1
rtbOne.SelLength = Len(strToFind)
rtbOne.SelColor = vbRed
pos = InStr(pos + 1, strToSearch, strToFind)
Else
tbAns.Text = "NO"
pos = InStr(pos + 1, strToSearch, strToFind)
tbAns.Text = pos
Exit Sub
End If
Next
Loop Until pos > 0
End Sub
Private Sub Form_Load()
strToSearch = "This is a lot of text that will be loaded in the lbText and we will search it is it a case sensative Search"
rtbOne.Text = strToSearch
tbSearch.Text = "is"
End Sub
If this is not possible a few suggestions on how to use regex
I know this much I need to add the Reference and this might be the
Pattern myRegExp.Pattern = "(.)\strToFind\b(.)"

TOM FindText accepts the tomMatchWord flag. Just use that. Don't screw aroung extracting text out of the control and then munching on it using slow scripting language crutches like RegEx.

I was intrigued by the idea of a "built-in" search, as suggested by Bob77 and Mark, so I put together code to implement this idea. The code uses a WinAPI call but is really pretty simple overall and supports moving forward and backward along with toggles for case sensitivity and whole words:
Option Explicit
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long
Private Const WM_USER = &H400&
Private Const EM_GETOLEINTERFACE = (WM_USER + 60)
Private Enum Direction
Forward = 1
Backward = -1
End Enum
Private Doc As ITextDocument
Private Sub Form_Load()
RichTextBox1.HideSelection = False
RichTextBox1.Text = "is This is a lot of text that will be loaded in the lbText and we will " & _
"search it is it a case sensative Search" & vbCr & vbCr & _
"is and IS and is"
SearchTerm.Text = "is"
Dim Unknown As IUnknown
SendMessage RichTextBox1.hwnd, EM_GETOLEINTERFACE, 0&, Unknown
Set Doc = Unknown
End Sub
Private Sub cmdForward_Click()
Match SearchTerm.Text, chkWhole.Value, chkCase.Value, Forward
End Sub
Private Sub cmdBack_Click()
Match SearchTerm.Text, chkWhole.Value, chkCase.Value, Backward
End Sub
Private Sub Match(ByVal SearchTerm As String, ByVal WholeWords As Integer, ByVal CaseSensitive As Integer, ByVal Direction As Direction)
Dim Flags As Long
Flags = 2 * WholeWords + 4 * CaseSensitive
Doc.Selection.FindText SearchTerm, Direction * Doc.Selection.StoryLength, Flags
End Sub
You will need to add a reference to RICHED20.dll using the "Browse..." button in Project|References.

You might find following InStrAll implementation based on VBScript.RegExp useful.
Option Explicit
Private Sub Form_Load()
Const STR_TEXT As String = "This is a lot of text that will be loaded in the lbText and we will search it is it a case sensative Search"
Dim vElem As Variant
For Each vElem In InStrAll(STR_TEXT, "is")
Debug.Print vElem, Mid$(STR_TEXT, vElem, 2)
Next
End Sub
Public Function InStrAll(sText As String, sSearch As String, Optional ByVal Compare As VbCompareMethod = vbTextCompare) As Variant
Dim lIdx As Long
Dim vRetVal As Variant
With CreateObject("VBScript.RegExp")
.Global = True
.IgnoreCase = (Compare <> vbBinaryCompare)
.Pattern = "[.*+?^${}()/|[\]\\]"
.Pattern = "\b" & .Replace(sSearch, "\$&") & "\b"
With .Execute(sText)
If .Count = 0 Then
vRetVal = Array()
ElseIf .Count = 1 Then
vRetVal = Array(.Item(0).FirstIndex + 1)
Else
ReDim vRetVal(0 To .Count - 1) As Variant
For lIdx = 0 To .Count - 1
vRetVal(lIdx) = .Item(lIdx).FirstIndex + 1
Next
End If
End With
End With
InStrAll = vRetVal
End Function
The point is first you have to escape the searched string (prefix all regex control symbols with backslash) and then wrap this escaped pattern with \bs before performing the "global" search for all matches.
The InStrAll function returns an array of indexes in the original text. It's up to your to impl actual color-coded highlighting in the RichTextBox control of your choice. (I would set the background color, not the foreground of the found snippet, if I had a choice. Notice how most browsers use yellow background for search results highlighting.)

For a non-Regex approach, try the following:
Option Explicit
Private Sub Form_Load()
RichTextBox1.Text = "is This is a lot of text that will be loaded in the lbText and we will " & _
"search it is it a case sensative Search" & vbCr & vbCr & _
"is and is and is"
End Sub
Private Sub Command1_Click()
Dim SearchTerm As String
Dim SearchIndex As Integer
SearchTerm = "is"
SearchIndex = 1
Do
SearchIndex = InStr(SearchIndex, RichTextBox1.Text, SearchTerm)
If isMatch(SearchIndex, SearchTerm) Then
RichTextBox1.SelStart = SearchIndex - 1
RichTextBox1.SelLength = Len(SearchTerm)
RichTextBox1.SelColor = vbRed
End If
If SearchIndex > 0 Then SearchIndex = SearchIndex + Len(SearchTerm)
Loop Until SearchIndex = 0
End Sub
Private Function isMatch(ByVal SearchIndex As Long, ByVal SearchTerm As String) As Boolean
If SearchIndex = 1 Then
If Mid(RichTextBox1.Text, SearchIndex + Len(SearchTerm), 1) = " " Then isMatch = True
ElseIf SearchIndex + Len(SearchTerm) >= Len(RichTextBox1.Text) Then
If Mid(RichTextBox1.Text, SearchIndex - 1, 1) = " " Then isMatch = True
ElseIf SearchIndex > 1 Then
If (Mid(RichTextBox1.Text, SearchIndex - 1, 1) = " " Or Mid(RichTextBox1.Text, SearchIndex - 1, 1) = vbCr) And Mid(RichTextBox1.Text, SearchIndex + Len(SearchTerm), 1) = " " Then isMatch = True
End If
End Function
As noted in the comments, the original code had limitations. The code now supports matches at the beginning and end of the text, along with embedded breaks. You will likely need to add more checks to the isMatch method.

Related

Crawling web page links through Visaul Basic 2008

I have tried to convert one of my Visual Basic 6 programs to Visual Basic.NET. This program basically crawls email and mobile number data from a web page link. This program works great in Visual Basic 6 but after converting to Visual Basic.NET is not providing any function or result in .NET.
Imports VBScript_RegExp_55
Imports System.Collections.Specialized
Imports System.Diagnostics
Public Class autoextratorform
Dim emailsearch As New ListViewItem()
Dim numbersearch As New ListViewItem
Private Declare Function SendMessageByString _
Lib "user32" _
Alias "SendMessageA" _
(ByVal hwnd As Long, _
ByVal wMsg As Long, _
ByVal wParam As Long, _
ByVal lParam As String) _
As Long
Private Const LB_SELECTSTRING = &H18C
Private Declare Function ShellExecute Lib "shell32.dll" Alias "ShellExecuteA" (ByVal hwnd As Long, ByVal lpOperation As String, ByVal lpFile As String, ByVal lpParameters As String, ByVal lpDirectory As String, ByVal nShowCmd As Long) As Long
Private WithEvents cIE As InternetHtml
Private WithEvents cExtLinks As cExtractLinks
Private cGSearch As cGoogleSearch
Dim Excel As Object
Dim Excelsheet As Object
Dim a() As String
Dim b() As String
Dim c() As String
Dim i As Integer
' Needs reference to Microsoft VBscript Regular Expressions I recomend ver 5.5.
'
Dim baseurl As String ' var to store base url so we can build the full path
Dim dVisited ' Dictionary to hold visited urls = I think Missing here
Dim dEmail ' dictionary to hold emails = I think Missing here
Dim dnumber ' = I think Missing here
Dim dweb '= I think Missing here
' We are putting the emails in a list also, for user feed back
'It would be less momery intensive and faster to just keep these in the dictionry object
'which allows to easily tell if the email already exist
Dim regxPage ' var to hold regular expression to extract urls
Dim regxEmail ' var to hold regular expression to extract emails
Dim regnumber
Dim regweb
Dim Match, Matches As String ' we use these to store are regx matches
Dim Match1, Matches1 As String
Dim Match2, Matches2 As String
' Regular expressions are super powerfull and have been a part of unix for a long time
' goto the form load event to see the regex initialization
' to learn more about regular expressions and to download the latest scripting runtime see
' http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/techinfo/vbsdocs.htm
Dim email, pageurl, mnumber, sweb As String
' The above is only because dictionary.exist does not work directly on Match var
Dim stopcrawl As Integer ' Used to exit crawl loop
Private Sub ComboBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ComboBox1.SelectedIndexChanged
Select Case ComboBox1.SelectedIndex
Case 0
ListBox2.Items.Clear()
ListBox2.Items.Add("http://classads.jang.com.pk/default.asp?cityid=1")
ListBox2.Items.Add("http://www.merapakistan.com/classifieds/")
ListBox2.Items.Add("http://olx.com.pk/")
ListBox2.SelectedIndex = 0
txtStartUrl.Text = ListBox2.Text
Case 1
ListBox2.Items.Clear()
ListBox2.Items.Add("http://www.locanto.com.pk/")
ListBox2.Items.Add("http://pakistanclassifiedads.com/")
ListBox2.Items.Add("http://pkr.com.pk/")
txtStartUrl.Text = ListBox2.Text
Case 2
ListBox2.Items.Clear()
ListBox2.Items.Add("http://adpost.pk/")
ListBox2.Items.Add("http://page.com.pk/")
ListBox2.Items.Add("http://www.clasf.pk/")
ListBox2.SelectedIndex = 0
txtStartUrl.Text = ListBox2.Text
Case 3
ListBox2.Items.Clear()
ListBox2.Items.Add("http://www.nelaam.com/")
ListBox2.Items.Add("http://www.asani.com.pk/")
ListBox2.Items.Add("http://freepost.com.pk/")
ListBox2.SelectedIndex = 0
txtStartUrl.Text = ListBox2.Text
End Select
End Sub
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Command1.Click
stopcrawl = 0 ' set stop crawl so we do not exit loop
If txtStartUrl.Text & "" = "" Then
lblresult.Text = ("Please Load your Desired Site List !")
Exit Sub
ElseIf txtStartUrl.Text & "" = "http://" Then
lblresult.Text = ("Please Load your Desired Site List !")
Exit Sub
ElseIf txtStartUrl.Text = "" Then
lblresult.Text = ("Please Load your Desired Site List !")
Exit Sub
End If
' the above should really check for a valid url, but I am a lazy PERL programmer
ListBox1.Items.Add(txtStartUrl.Text) 'add item to list
Label16.Text = (ListBox2.Items.Count - 1)
Label8.Text = (ListBox2.Items.Count)
lblresult.Text = " Start service for searching Email address and Mobile Numbers"
Command1.Enabled = False
ComboBox1.Enabled = False
LaVolpeButton4.Enabled = True
'UPGRADE_ISSUE: (2064) ComboBox property Combo1.Locked was not upgraded. More Information: https://docs.mobilize.net/vbuc/ewis#2064
LaVolpeButton8.Enabled = False
LaVolpeButton9.Enabled = False
txtStartUrl.ReadOnly = True
crawl() ' and were off
End Sub
Private Sub autoextrator_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
With listnumber
.Columns.Clear()
.Columns.Add("Count!", CInt(Width * 0.1))
.Columns.Add("Mobile Number!", CInt(Width * 0.2))
.Columns.Add("Web URL!", CInt(Width * 0.68))
End With
With listemail
.Columns.Clear()
.Columns.Add("Count!", CInt(Width * 0.1))
.Columns.Add("Email Address!", CInt(Width * 0.2))
.Columns.Add("Web URL!", CInt(Width * 0.68))
End With
'initialize dictionary and regx objects
dVisited = CreateObject("Scripting.Dictionary")
dVisited.CompareMode = CompareMethod.Binary
dEmail = CreateObject("Scripting.Dictionary")
dEmail.CompareMode = CompareMethod.Binary
dnumber = CreateObject("Scripting.Dictionary")
dnumber.CompareMode = CompareMethod.Binary
dweb = CreateObject("Scripting.Dictionary")
dweb.CompareMode = CompareMethod.Binary
Dim counting1 As String
Dim counting2 As String
regxPage = New VBScript_RegExp_55.RegExp() ' Create a regular expression.
regxPage.Pattern = "HREF=""[^""#]+[.][^""#]+" ' Set pattern."
regxPage.IgnoreCase = True ' Set case insensitivity.
regxPage.Global = True ' Set global applicability.
regxEmail = New VBScript_RegExp_55.RegExp() ' Create a regular expression.
regxEmail.Pattern = "\b[a-z0-9._%+-]+#[a-z0-9.-]+\.[a-z]{2,4}\b" ' Set pattern."
regxEmail.IgnoreCase = True ' Set case insensitivity.
regxEmail.Global = True ' Set global applicability.
regnumber = New VBScript_RegExp_55.RegExp()
counting1 = "(\+92)-?\d{3}-?\d{7}"
counting2 = "(?:0092|0|\+92)-?\d{3}-?\d{7}"
regnumber.Pattern = counting1
regnumber.Pattern = counting2
regnumber.IgnoreCase = True
regnumber.Global = True
cIE = New InternetHtml()
cGSearch = New cGoogleSearch()
cExtLinks = New cExtractLinks()
End Sub
Sub crawl()
While ListBox1.Items.Count > 0 ' loop while list has data
If stopcrawl = 1 Then GoTo exitcrawl
getpage(ListBox1.Items(0)) ' This is the heart of the prog, except for the regx = I think Missing here
' stuff in the form load event
ListBox1.Items.Remove(0) ' remove item from list
End While
ListBox2.Items.Remove(ListBox2.SelectedIndex)
If ListBox2.SelectedIndex < ListBox2.Items.Count - 1 Then
ListBox2.SelectedIndex += 1
txtStartUrl.Text = ListBox2.Items.ToString
exitcrawl:
If ListBox2.Items.Count = 0 Then
MessageBox.Show("Searching has completed !", "Done ! searching", MessageBoxButtons.OK, MessageBoxIcon.Information)
txtStartUrl.ReadOnly = False
txtStartUrl.Text = ""
Label3.Text = "-----------------------------------------------------------"
LaVolpeButton4.Enabled = False
Command1.Enabled = True
LaVolpeButton8.Enabled = True
LaVolpeButton9.Enabled = True
End If
End If
End Sub
Sub getpage(ByVal page As String)
On Error Resume Next
If dVisited.Contains(page) Then
Exit Sub
Else
dVisited.Add(page, 1) ' add page to dVisited dictionary
Label6.Text = CStr(ListBox1.Items.Count)
Label14.Text = CStr(dVisited.Count)
Label3.Text = page
End If
baseurl = getpath(page) ' build full url - see getpath
TextBox1.Text = ""
If ListBox1.Items.Count > 5000 Then Exit Sub ' sets the maximum cache (so we don't run out of mem)
Using wc As New System.Net.WebClient
TextBox1.Text = wc.DownloadString(page)
End Using
Matches = regxPage.Execute(TextBox1.Text) ' Execute search.
For Each Me.Match In Matches
pageurl = Match
If InStr(1, pageurl, "http://", vbTextCompare) Then
If dVisited.Exists(pageurl) = False Then ListBox1.Items.Add(Mid(pageurl, 7))
Else
If dVisited.Exists(baseurl & Mid(pageurl, 7)) = False Then ListBox1.Items.Add(baseurl & Mid(pageurl, 7))
End If
Next
' search for email
Matches = regxEmail.Execute(TextBox1.Text) ' Execute search.
For Each Me.Match In Matches ' Iterate Matches collection.
' check if email exist
email = Match
Debug.Print(email & dEmail.Exists(email))
If dEmail.Exists(email) = False Then
dEmail.Add((email), 1)
Dim d As Integer
d = listemail.Items.Count
d = d + 1
emailsearch = listemail.Items.Add(d)
emailsearch.SubItems.Add(email.ToString)
emailsearch.SubItems.Add(pageurl.ToString)
ListBox3.Items.Add(email)
Label10.Text = listemail.Items.Count
End If
Next
Matches1 = regnumber.Execute(TextBox1.Text)
For Each Me.Match1 In Matches1
mnumber = Match1
Debug.Print(mnumber & dnumber.Exists(mnumber))
If dnumber.Exists(mnumber) = False Then
dnumber.Add((mnumber), 1)
Dim c As Integer
c = listnumber.Items.Count
c = c + 1
numbersearch = listnumber.Items.Add(c)
numbersearch.SubItems.Add(mnumber.ToString)
numbersearch.SubItems.Add(pageurl.ToString)
Label12.Text = listnumber.Items.Count
End If
Next
End Sub
Function getpath(ByVal URL As String) As String
' look for the last / and get a string up to that location
Dim lastbar As Integer = URL.LastIndexOf("/") + 1
Dim tmppath As String = URL.Substring(0, Math.Min(lastbar, URL.Length))
If tmppath = "http://" Then tmppath = URL ' full path already so return url
Return tmppath
End Function
Public Function StripOut(ByVal From As String, ByVal What As String) As String
Dim result As String = ""
result = From
For i As Integer = 1 To Strings.Len(What)
result = result.Replace(What.Substring(i - 1, Math.Min(1, What.Length - (i - 1))), "")
Next i
Return result
End Function
Private Sub LaVolpeButton4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles LaVolpeButton4.Click
stopcrawl = 1
MsgBox(" Service has Stoped by USER !")
LaVolpeButton4.Enabled = False
Command1.Enabled = True
End Sub
Private Sub LaVolpeButton8_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles LaVolpeButton8.Click
Dim strData As String = ""
Dim strQ() As String
Dim intFile As Integer
' Set up the Common Dialog initial Directory
' to the application path. Filter for Text and All files
' Prompt the user to select a file
'
cmd.InitialDirectory = My.Application.Info.DirectoryPath
cmd.Filter = "Text Files(*.txt)|*.txt|All Files(*.*)|*.*"
cmd.Title = "Select Email Addresses File"
cmd.ShowDialog()
If cmd.FileName <> "" Then
'
' If the user selected a file
' open and read the entire contents
' then split it into records
'
intFile = FileSystem.FreeFile()
FileSystem.FileOpen(intFile, cmd.FileName, OpenMode.Input)
strData = FileSystem.InputString(intFile, FileSystem.LOF(intFile))
FileSystem.FileClose(intFile)
strQ = strData.Split(CChar(Environment.NewLine))
'
' Populate the texbox array with the questions
' (either the number of textboxes in the control array or
' number of questions, which ever is the smaller)
For Each strQ_item As String In strQ
ListBox2.Items.Add(strQ_item)
Next strQ_item
txtStartUrl.Text = ListBox2.Text
End If
End Sub
End Class
2- This is Visual basic 6 Code that works perfect.
Dim baseurl As String ' var to store base url so we can build the full path
Dim dVisited As Dictionary ' Dictionary to hold visited urls
Dim dEmail As Dictionary ' dictionary to hold emails
Dim dnumber As Dictionary
Dim dweb As Dictionary
Sub crawl()
While List1.ListCount > 0 ' loop while list has data
If stopcrawl = 1 Then GoTo exitcrawl
getpage List1.List(0) ' This is the heart of the prog, except for the regx
' stuff in the form load event
List1.RemoveItem (0) ' remove item from list
Wend
List2.RemoveItem (List2.ListIndex)
If List2.ListIndex < List2.ListCount - 1 Then
List2.ListIndex = List2.ListIndex + 1
txtStartUrl = List2.Text
Command1_Click
Else
exitcrawl:
If List2.ListCount = 0 Then
MsgBox "Searching has completed !", vbInformation, "Done ! searching"
WebBrowser1.Stop
txtStartUrl.Locked = False
txtStartUrl.Text = ""
Label3.Caption = "-----------------------------------------------------------"
LaVolpeButton4.Enabled = False
Command1.Enabled = True
LaVolpeButton8.Enabled = True
LaVolpeButton9.Enabled = True
End If
End If
End Sub

Get Full Range Text to String

I'm writing a script that looks through my outgoing emails and searches for frequent stylistic errors I make. It locates them using regex and then highlights them yellow. Code:
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
strText = LCase(oWordDoc.Range.Text)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.firstindex, varMtch.firstindex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub
extractMatches is a private function implementing VBA's RegEx engine. getBadStrs returns the regular expression containing the errors.
It all works unless I've embedded hyperlinks in my email. If so, oWordDoc.Range.Text returns only the anchor text of the links, not the links (and any other characters Word pads the hyperlinks with - I don't know what they might be). As a result, varMtch.firstindex is correct for strText but not oRange so the text it highlights is offset by several characters.
I tried to assemble the full oRange text by looping through the hyperlinks in oRange and adding the link text to the string assuming it would be included in oRange. Something like:
Dim lngEndLnk as Long
Set oRange = oWordDoc.Range
For Each varMtch In oRange.Hyperlinks
strText = strText & oWordDoc.Range(lngEndLnk, varMtch.Range.Start)
strText = strText & varMtch.TextToDisplay & varMtch.Name
lngEndLnk = varMtch.Range.End
Next varMtch
If lngEndLnk = 0 Then
strText = oRange.text
Else
strText = strText & oWordDoc.Range(lngEndLnk, oWordDoc.Range.End)
End If
That reduced the offset, but there still is one. Also, if I were to include a linked image in the email, the .Anchor property of varMtch fails so I'd have to come up with another workaround.
Is there a more straightforward way to get a String containing all the characters of the Range object so the regex indices line up?
You can access the hyperlink address using the hyperlinks collection of a document:
Private Sub CommandButton1_Click()
strtext = ActiveDocument.Range.Text
MsgBox (strtext)
For Each hLink In Documents(1).Hyperlinks
MsgBox (hLink.Address)
Next hLink
End Sub
This first displays all the text in a document, and then loops through each hyperlink displaying its URL.
This can then be used through your RegEx.
For more information and examples, see hyperlinks.
I ended up with a similar solution to #slightly snarky. I don't know that it's better so I won't mark it as the solution. Happy for comments on pros and cons, in case there's a clear winner I'm just not seeing.
Personally, I like looping the character collection and probably should use it in my code, this works. I find using the position array to highlight matches much less intuitive than constructing a string from the range. For my purposes padding the string with # in place of the zero-length characters in oWordDoc.Range works, but I also know it won't work for everybody.
Public Sub highlightBadForm()
Dim oWordDoc As Object
Dim oMatches As Object
Dim oRange As Range
Dim strText As String
Dim lngFindFrom As Long, lngC As Long, lngPrevLen As Long
Dim varMtch As Variant
Set oWordDoc = Application.ActiveInspector.WordEditor
For lngC = 0 To oWordDoc.Range.End - 1
strText = strText & oWordDoc.Range(lngC, lngC + 1)
If Len(strText) = lngPrevLen Then
strText = strText & "#"
End If
lngPrevLen = lngPrevLen + 1
Next lngC
strText = LCase(strText)
lngFindFrom = InStr(strText, "from: ")
If lngFindFrom > 0 Then
strText = Left(strText, lngFindFrom - 1)
End If
Set oMatches = extractMatches(strText, getBadStrs)
If Not oMatches Is Nothing Then
For Each varMtch In oMatches
Set oRange = oWordDoc.Range(varMtch.FirstIndex, varMtch.FirstIndex + varMtch.Length)
oRange.HighlightColorIndex = wdYellow
Next varMtch
End If
Set oRange = Nothing
Set oWordDoc = Nothing
Set oMatches = Nothing
End Sub
The key to this seems to be that when you iterate through a Range looking at each "position" in the range, e.g. via something like
With ActiveDocument.Range
For i = 0 to .End - 1
Debug.Print i,Ascw(.Range(i,i+1).Text)
Next
End With
The Range does contain all the characters in the code of a field such as HYPERLINK field, and all the characters in its result (which might be displayed or it might be hidden text). But in some cases a Range may contain additional characters which are never displayed. For example, if you have a field code such as { SET x 123 } then the Range contains what are in effect the field braces and the code " SET X 123 ", but before the field end brace it also contains a marker followed by the value "123". But the SET field does not display its result.
That makes it difficult to construct a "find" string that's the same length as the Range.
But Range.Text is the same text as the concatenation of all the characters in Range.Characters, and each Character in that Collection is a Range that contains the .Start position
So that lets us get the .Start and .End as the following example shows.
This assumes you are working with the ActiveDocument in Word, and have some text, a HYPERLINK field (say), and possibly other fields, with the text "test1" in various places.
I haven't done much testing so it may still need tweaking.
Sub HighlightFinds()
Dim match As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim rng1 As Word.Range
Dim rng2 As Word.Range
Set rng1 = ActiveDocument.Content
Set rng2 = ActiveDocument.Content ' or rng1.Duplicate
' When you do this, rng1.Text returns the text of the field *codes* but
' not the field *results*, and so does rng1.Characters
'rng1.TextRetrievalMode.IncludeFieldCodes = True
' when you do this, it returns the *results* but not the *codes*
rng1.TextRetrievalMode.IncludeFieldCodes = False
' You could do both, one after the other, to try to get all the matches
' You might also need to set .TextRetrievalMode.IncludeHiddenText
With New VBScript_RegExp_55.RegExp
.Pattern = "test1"
.Global = True
Set matches = .Execute(rng1.Text)
End With
For Each match In matches
rng2.SetRange rng1.Characters(match.FirstIndex + 1).Start, rng1.Characters(match.FirstIndex + 1 + match.Length).End
rng2.HighlightColorIndex = wdYellow
Next
Set matches = Nothing
Set rng2 = Nothing
Set rng1 = Nothing
End Sub

How to extract ad sizes from a string with excel regex

I am trying to extract ad sizes from string. The ad sizes are all set standard sizes. So while I'd prefer to have a regex that looks for a pattern, IE 3 numbers followed by 2 or 3 numbers, hard coding it will also work, since we know what the sizes will be. Here's an example of some of the ad sizes:
300x250
728x90
320x50
I was able to find some VBScript that I modified that almost works, but because my strings that I'm searching are inconsistent, it's pulling too much in some cases. For example:
You see how it's not matching correctly in every instance.
The VB code I found is actually matching everything EXCEPT that ad sizes. I don't know enough about VBScript to reverse it to just look for ad sizes and pull them. So instead it looks for all other text and removes it.
The code is below. Is there a way to fix the Regex so that it just returns the ad sizes?
Function getAdSize(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([^300x250|728x90])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
getAdSize = regEx.Replace(strInput, strReplace)
Else
getAdSize = "Not matched"
End If
End If
End Function
NOTE, THE DATA IS NOT ALWAYS PRECEDED BY AN UNDERSCORE, SOMETIMES IT IS A DASH OR A SPACE BEFORE AND AFTER.
EDIT: Since it's not actually underscore delimited we can't use Split. We can however iterate over the string and extract the "#x#" manually. I have updated the code to reflect this and verified that it works successfully.
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
If IsNumeric(Temp) Then
Ad = Ad & Temp
Else
If Temp = "x" Then
Ad = Ad & Temp
Else
If Ad Like "*#x#*" Then
ExtractAdSize = Ad
Exit Function
Else
Ad = vbNullString
End If
End If
End If
Next i
End If
End Function
Alternate version of the same function using Select Case boolean logic instead of nested If statements:
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
Select Case Abs(IsNumeric(Temp)) + Abs((Temp = "x")) * 2 + Abs((Ad Like "*#x#*")) * 4
Case 0: Ad = vbNullString 'Temp is not a number, not an "x", and Ad is not valid
Case 1, 2, 5: Ad = Ad & Temp 'Temp is a number or an "x"
Case 4, 6: ExtractAdSize = Ad 'Temp is not a number, Ad is valid
Exit Function
End Select
Next i
End If
End Function
I have managed to make about 95% of the required answer - the RegEx below will remove the DDDxDD size and would return the rest.
Option Explicit
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
E.g. you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
From here you can continue trying to find a way to reverse the display.
Edit:
To go from the 95% to the 100%, I have asked a question here and it turns out that the conditional block should be changed to the following:
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
This formula could work if it's always 3 characters, then x, and it's always between underscores - adjust accordingly.
=iferror(mid(A1,search("_???x*_",A1)+1,search("_",A1,search("_???x*_",A1)+1)-(search("_???x*_",A1)+1)),"No match")

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly but trying to get the information behind just returns "" which is strange as when i execute it within Notepad++ it works perfectly fine
So it looks for the letters MIP and one of the 3 letter codes (any of them)
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
Check if MIP is in the string
If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
Since the VBScript regular expression engine does support look-aheads, you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If Not InStr(StringToCheck, "MIP") Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
Case "MIP", "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP":
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function