Matching Regex Patterns with symbols - regex

I'm looking to match a regex pattern. I know that my code will not work if there's any symbol character (except for _ ) at the start of the word. How do I allow it to match words that start with any symbol?
Here's my code so far:
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(\nC_PIN\s)((\b[^\s]+\b\s){3})(\b[^\s]+\b\s)(\b[^\s]+\b\s)(\b\d\b\s)"
.Global = True
Set objFil = objFso.OpenTextFile(infilename)
strAll = objFil.ReadAll
Set objFil1 = objFso.createtextfile(outfilename)
strAll = .Replace(strAll, "$1$2$4 $5 $6 ")
End With
objFil.Close
objFil2.Close

\b[^\s]+\b does not make sense, cause \b is an anchor searching for begin or end of a word. And word is always [a-zA-Z0-9_].
I think you can only use something like:
[^a-zA-Z0-9_!"§$%&...][a-zA-Z0-9_!"§$%&...]
where you should insert all symbol characters you allow for a word (in your interpretation).

I managed to get the right code. The code used to match a word beginning with a symbol is (or a negative number);
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(\nC_PIN\s)(\b[^\s]+\b\s)([\S]+\b\s)([\S]+\b\s)(\b[^\s]+\b\s)(\b[^\s]+\b\s)(\b\d\b\s)"
.Global = True
Set objFil = objFso.OpenTextFile(infilename)
strAll = objFil.ReadAll
Set objFil1 = objFso.createtextfile(outfilename)
strAll = .Replace(strAll, "$1$2$4 $5 $6 ")
End With
objFil.Close
objFil2.Close

Related

How to replace Numbers in Parentheses with some calculations in MS Word

I have a problem to replace some serial number such as [30] [31] [32]... to [31] [32] [33]... in MS word when I insert a new references in the middle of article. I have not found a solution way in GUI so I try to use VBA to do that replacement. I find a similar problem in stack overflow:
MS Word Macro to increment all numbers in word document
However, this way is a bit inconvenient because it have to generate some replacement array in other place. Can I make that replacement with regex and some function in MS Word VBA like code below?
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
End With
'How to do some calculations with $1?
Selection.Text = regExp.Replace(Selection.Text, "[$1]")
End Sub
But I don't know how to do some calculations with $1 in regExp? I have try use "[$1+1]" but it return [31+1] [32+1] [33+1]. Can anyone help? Thanks!
It is impossible to pass a callback function to the RegExp.Replace, so you have the only option: use RegExp.execute and process matches in a loop.
Here is an example code for your case (I took a shortcut since you only have the value to modify inside known delimiters, [ and ].)
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})]"
.Global = True
End With
'How to do some calculations with $1?
' Removing regExp.Replace(Selection.Text, "[$1]")
For Each m In regExp.Execute(Selection.Text)
Selection.Text = Left(Selection.Text, m.FirstIndex+1) _
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) _
& Mid(Selection.Text, m.FirstIndex + Len(m.Value))
Next m
End Sub
Here,
Selection.Text = Left(Selection.Text, m.FirstIndex+1) - Get what is before
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) - Add 10 to the captured number
& Mid(Selection.Text, m.FirstIndex + Len(m.Value)) - Append what is after the capture
That should do the trick :
Sub IncrementWithRegex()
Dim Para As Paragraph
Set Para = ThisDocument.Paragraphs.First
Dim ParaNext As Paragraph
Dim oRange As Range
Set oRange = Para.Range
Dim regEx As New RegExp
Dim regMatch As Variant
Dim ACrO As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[\[]([0-9]{2})[\]]"
End With
Do While Not Para Is Nothing
Set ParaNext = Para.Next
Set oRange = Para.Range
'Debug.Print oRange.Text
If regEx.test(oRange.Text) Then
For Each regMatch In regEx.Execute(oRange.Text)
oRange.Text = _
Left(oRange.Text, _
InStr(1, oRange.Text, CStr(regMatch))) & _
CDbl(regMatch) + 1 & _
Right(oRange.Text, _
Len(CStr(regMatch)) + InStr(1, oRange.Text, CStr(regMatch)))
Next regMatch
Else
End If
Set Para = ParaNext
Loop
End Sub
To use this, remember to add the reference :
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
Here is a simple VBA macro you can use to achieve this :
Sub IncrementNumbers()
Dim regExp As Object
Dim i As Integer
Dim fullMatch As String
Dim subMatch As Integer
Dim replacement As String
Const TMP_PREFIX As String = "$$$"
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
.MultiLine = True
End With
'Ensure selected text match our regex
If regExp.test(Selection.Text) Then
'Find all matches
Set matches = regExp.Execute(Selection.Text)
' Start from last match
For i = 0 To (matches.Count - 1)
fullMatch = matches(i).Value
subMatch = CInt(matches(i).SubMatches(0))
'Increment by 1
subMatch = subMatch + 1
'Create replacement. Add a temporary prefix so we ensure [30] replaced with [31]
'will not be replaced with [32] when [31] will be replaced
replacement = "[" & TMP_PREFIX & subMatch & "]"
'Replace full match with [subMatch]
Selection.Text = Replace(Selection.Text, fullMatch, replacement)
Next
End If
'Now replacements are complete, we have to remove replacement prefix
Selection.Text = Replace(Selection.Text, TMP_PREFIX, "")
End Sub

Filter items with an email body that contains a less than symbol `<`

I'm trying to filter items with an email body that contains a less than symbol <.
Here is a sample email body that contains less than symbol.
Our drive E: is now < 10%.
Sub CodeSubjectForward(Item As Outlook.MailItem)
Dim M1 As MatchCollection
Dim M As Match
Set Reg1 = New RegExp
With Reg1
.Pattern = "([<]\s*(\w*)\s*)"
.Global = True
End With
If Reg1.Test(Item.Body) Then
Set M1 = Reg1.Execute(Item.Body)
For Each M In M1
Next
End If
Item.Save
Set myForward = Item.Forward
myForward.Recipients.Add "alias#domain.com"
myForward.Send
End Sub
Should be something like this
Public Sub FWItem(Item As Outlook.mailitem)
Dim Email As Outlook.mailitem
Dim Matches As Variant
Dim RegExp As Object
Dim Pattern As String
Set RegExp = CreateObject("VbScript.RegExp")
If TypeOf Item Is Outlook.mailitem Then
Pattern = "(\s[<]\s)"
With RegExp
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Item.Body)
End With
If Matches.Count > 0 Then
Debug.Print Item.subject ' Print on Immediate Window
Set Email = Item.Forward
Email.subject = Item.subject
Email.Recipients.Add "0m3r#Email.com"
Email.Save
Email.Send
End If
End If
Set RegExp = Nothing
Set Matches = Nothing
Set Email = Nothing
Set Item = Nothing
End Sub
https://regex101.com/r/KOFM8E/1/

Macro to skip Column A and strips out lower case letters

In column A, I have a list of sentences
In columns B-Z, I have strings contain numbers followed by letters both uppercase and lower case.
such as
45ABc
The following macro strips all lowercase letters in the entire work sheet - do not want it to strip any letters in column A. Please help.
Sub RegExReplace()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
Try this one:
Sub RegExReplace()
Dim objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
If objCell.Column<>1 Then objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
or if you know that values that should be replaced only in columns B:Z, you can use next code as well:
Sub RegExReplace()
Dim rng As Range, objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
With ActiveSheet
Set rng = Application.Intersect(.UsedRange, .Range("B:Z"))
End With
If Not rng Is Nothing Then
For Each objCell In rng
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End If
End Sub
I've added code that:
Fixes your pattern to remove what you want to remove directly - ie a-z - rather than what you want to preserve (currently A-Z-_ but could be much larger).
To use quicker arrays rather than range loops.
Sub objRegexReplace()
Dim rng1 As Range
Dim objRegex As Object
Dim X
Dim lngRow As Long
Dim lngCol As Long
Set rng1 = Application.Intersect(ActiveSheet.UsedRange, ActiveSheet.Range("B:Z"))
X = rng1.Value2
If rng1.Cells.Count > 1 Then
Set objRegex = CreateObject("VBScript.Regexp")
With objRegex
.Global = True
.Pattern = "[a-z]+"
.ignorecase = False
For lngRow = 1 To UBound(X, 1)
For lngCol = 1 To UBound(X, 2)
X(lngRow, lngCol) = .Replace(X(lngRow, lngCol), vbNullString)
Next
Next
rng1.Value2 = X
End With
Else
MsgBox "No range to work on", vbCritical
End If
End Sub

Using Regular expression in VBA

This is my sample record in a Text format with comma delimited
901,BLL,,,BQ,ARCTICA,,,,
i need to replace ,,, to ,,
The Regular expression that i tried
With regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern="^(?=[A-Z]{3})\\,{3,}",",,"))$ -- error
Now i want to pass Line from file to Regex to correct the record, can some body guide me to fix this i am very new to VBA
I want to read the file line by line pass it to Regex
Looking at your original pattern I tried using .Pattern = "^\d{3},\D{3},,," which works on the sample record as with the 3 number characters , 3 letters,,,
In the answer I have used a more generalised pattern .Pattern = "^\w*,\w*,\w*,," This also works on the sample and mathces 3 commas each preceded with 0 or more alphanumeric characters followed directly by a fourth comma. Both patterns require a match to be from the begining of the string.
Pattern .Pattern = "^\d+,[a-zA-Z]+,\w*,," also works on the sample record. It would specify that before the first comma there should be 1 or greater numeric characters (and only numeric characters) and before the second comma ther should be 1 or more letters (and only letters). Before the 3rd comma there could be 0 or more alphanumeric characters.
The left function removes the rightmost character in the match ie. the last comma to generate the string used by the Regex.Replace.
Sub Test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
Debug.Print
Debug.Print str
str = strConv(str)
Debug.Print str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object
Dim oMatches As Object
Dim oMatch As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = "^\w*,\w*,\w*,,"
End With
Set oMatches = objRegEx.Execute(str)
If oMatches.Count > 0 Then
For Each oMatch In oMatches
str = objRegEx.Replace(str, Left(oMatch.Value, oMatch.Length - 1))
Next oMatch
End If
strConv = str
End Function
Try this
Sub test()
Dim str As String
str = "901,BLL,,,BQ,ARCTICA,,,,"
str = strConv(str)
MsgBox str
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object, allMatches As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = ",,,"
End With
strConv = objRegEx.Replace(str, ",,")
End Function

vbscript: replace text in activedocument with hyperlink

Starting out at a new job and I have to go through a whole lot of documents that my predecessor left. They are MS Word-files that contain information on several hundreds of patents. Instead of copy/pasting every single patent-number in an online form, I would like to replace all patent-numbers with a clickable hyperlink. I guess this should be done with vbscript (I'm not used to working with MS Office).
I have so far:
<obsolete>
This is not working for me:
1. I (probably) need to add something to loop through the ActiveDocument
2. The replace-function probably needs a string and not an object for a parameter - is there a __toString() in vbscript?
THX!
UPDATE:
I have this partially working (regex and finding matches) - now if only I could get the anchor for the hyperlink.add-method right...
Sub HyperlinkPatentNumbers()
'
' HyperlinkPatentNumbers Macro
'
Dim objRegExp, Matches, match, myRange
Set myRange = ActiveDocument.Content
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US)([0-9]*)(A1|A2|B1|B2)"
End With
Set Matches = objRegExp.Execute(myRange)
If Matches.Count >= 1 Then
For Each match In Matches
ActiveDocument.Hyperlinks.Add Anchor:=objRegExp.match, Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Next
End If
Set Matches = Nothing
Set objRegExp = Nothing
End Sub
Is this VBA or VBScript? In VBScript you cannot declare types like Dim newText As hyperLink, but every variable is a variant, so: Dim newText and nothing more.
objRegEx.Replace returns the string with replacements and needs two parameters passed into it: The original string and the text you want to replace the pattern with:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = False
objRegEx.Pattern = "^(WO|EP|US)([0-9]*)(A1|A2|B1|B2)$"
' assuming plainText contains the text you want to create the hyperlink for
strName = objRegEx.Replace(plainText, "$1$2$3")
strAddress = objRegex.Replace(plainText, "http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=$1&NR=$2&KC=$3"
Now you can use strName and strAddress to create the hyperlink with.
Pro-tip: You can use objRegEx.Test(plainText) to see if the regexp matches anything for early handling of errors.
Problem solved:
Sub addHyperlinkToNumbers()
Dim objRegExp As Object
Dim matchRange As Range
Dim Matches
Dim match
Set objRegExp = CreateObject("VBScript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = False
.Pattern = "(WO|EP|US|FR|DE|GB|NL)([0-9]+)(A1|A2|A3|A4|B1|B2|B3|B4)"
End With
Set Matches = objRegExp.Execute(ActiveDocument.Content)
For Each match In Matches
'This doesn't work, because of the WYSIWYG-model of MS Word:
'Set matchRange = ActiveDocument.Range(match.FirstIndex, match.FirstIndex + Len(match.Value))
Set matchRange = ActiveDocument.Content
With matchRange.Find
.Text = match.Value
.MatchWholeWord = True
.MatchCase = True
.Wrap = wdFindStop
.Execute
End With
ActiveDocument.Hyperlinks.Add Anchor:=matchRange, _
Address:="http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&adjacent=true&locale=en_EP&CC=" _
& match.Submatches(0) & "&NR=" & match.Submatches(1) & "&KC=" & match.Submatches(2)
Next
MsgBox "Hyperlink added to " & Matches.Count & " patent numbers"
Set objRegExp = Nothing
Set matchRange = Nothing
Set Matches = Nothing
Set match = Nothing
End Sub