find multiple regex patterns using vbscript - regex

Sorry, but I am a bit new to RegEx and hope someone is able to help.
Files in questions:
Apples.A.Tasty.Treat.Author-JoeDirt.doc
Cooking with Apples Publisher-Oscar Publishing.txt
Candied.Treats.Author-JenBloc.Publisher-Event.docx
I currently use this piece of vbscript code to replace spaces or dashes in the filename with a period but I am wondering if there is a more efficient way to accomplish this?
Set colRegExMatches = strRegEx.Execute(objSourceFile.Name)
For Each objRegExMatch in colRegExMatches
strResult = InStr(objSourceFile.Name, objRegExMatch)
objTargetFile = Left(objSourceFile.Name, (strResult -1)) & objRegExMatch.Value
objTargetFile = Replace(objSourceFile.Name, " ", ".", 1, -1, 1)
objTargetFile = Replace(objSourceFile.Name, "-", ".", 1, -1, 1)
objSourceFile.Name = objTargetFile
Next
Once the script above is complete, I have the following list of files:
Apples.A.Tasty.Treat.Author-JoeDirt.doc
Cooking.with.Apples.Publisher-Oscar.Publishing.txt
Candied.Treats.Author-JenBloc.Publisher-Event.docx
Now, I want to find anything beginning with Author or Publisher and simply delete the text until the extension.
myRegEx.Pattern = (?:Author|Publisher)+[\w-]+\.
This works mostly for the files with the exception if there is an additional period to add a second part of the publisher name or year of publication or book number.
Apples.A.Tasty.Treat.doc
Cooking.with.Apples.Publishing.txt
Candied.Treats.docx
I tried this code and it seems to work but I have to specify the file extensions.
myRegEx.Pattern = (?:Author|Publisher)[\w-](\S*\B[^txt|docx|doc][\w-].)
If I try the following, it strips the extension for the Candied.Treats file
myRegEx.Pattern = (?:Author|Publisher)[\w-](\S*\B[^][\w-].)
Apples.A.Tasty.Treat.doc
Cooking.with.Apples.txt
Candied.Treats.
I have been using the RegExr Builder at http://gskinner.com/RegExr to test my patterns but am at a loss right now. Finally once my pattern is working as expected how do I use that in my vbscript? Do I simply add a new line as per below?
objTargetFile = Replace(objSourceFile.Name, "(?:Author|Publisher)[\w-](\S*\B[^txt|docx|pdf|doc][\w-].)", "", 1, -1, 1)
Thanks.
This is the new vbscript code which seems to do nothing.
strFixChars = InputBox("Do you want to replace spaces, dashes and strip tags? (Y/N)", "Confirmation")
Set strRegEx = new RegExp
For Each objSourceFile in colSourceFiles
strFileExt = objFSO.GetExtensionName(objSourceFile)
objLogFile.WriteLine "Input File: " & objSourceFile.Name
strCount = Len(objSourceFile.Name)
strRegEx.Pattern = "(?:Author|Publisher)(.+)\."
strRegEx.IgnoreCase = True
strRegEx.Global = True
Set colRegExMatches = strRegEx.Execute(objSourceFile.Name)
For Each objRegExMatch in colRegExMatches
strResult = InStr(objSourceFile.Name, objRegExMatch)
objTargetFile = Left(objSourceFile.Name, (strResult -1)) & objRegExMatch.Value
If strFixChars = "Y" Then
objTargetFile = Replace(objSourceFile.Name, " ", ".")
objTargetFile = Replace(objSourceFile.Name, "-", ".")
objTargetFile = Replace(objSourceFile.Name, "(?:Author|Publisher)(.+)\.", "")
End If
objLogFile.WriteLine "Output File: " & objTargetFile
strFileList = strFileList & vbCrlf & objTargetFile
Next
Next

A quick fix for your regex would be to use (?:Author|Publisher)(.+)\. You will have to replace the first matching group with an empty string in vbscript.

Related

Find '~XX~' within a string with specific values

I have classic ASP written in VBScript. I have a record pulled from SQL Server and the data is a string. In this string, I need to find text enclosed in ~12345~ and I need to replace with very specific text. Example 1 would be replaced with M, 2 would be replaced with A. I then need to display this on the web page. We don't know how many items will be enclosed with ~.
Example Data:
Group Pref: (To be paid through WIT)
~2.5~ % Quarterly Rebate - Standard Commercial Water Heaters
Display on webpage after:
Group Pref: (To be paid through WIT)
~A.H~ % Quarterly Rebate - Standard Commercial Water Heaters
I tried this following, but there are two many cases and this would be unrealistic to maintain. I does replace the text and display correctly.
dim strSearchThis
strSearchThis =(rsResults("PREF"))
set re = New RegExp
with re
.global = true
.pattern = "~[^>]*~"
strSearchThis = .replace(strSearchThis, "X")
end with
I am also trying this code, I can find the text contained between each ~ ~, but when displayed its the information between the ~ ~ is not changed:
dim strSearchThis
strSearchThis =(rsResults("PREF"))
Set FolioPrefData = New RegExp
FolioPrefData.Pattern = "~[^>]*~"
FolioPrefData.Global = True
FolioPrefData.IgnoreCase = True
'will contain all found instances of ~ ~'
set colmatches = FolioPrefData.Execute(strSearchThis)
Dim itemLength, found
For Each objMatch in colMatches
Select Case found
Case "~"
'ignore - doing nothing'
Case "1"
found = replace(strSearchThis, "M")
End Select
Next
response.write(strSearchThis)
You can do it without using Regular Expressions, just checking the individual characters and writing a function that handles the different cases you have. The following function finds your delimited text and loops through all characters, calling the ReplaceCharacter function defined further down:
Function FixString(p_sSearchString) As String
Dim iStartIndex
Dim iEndIndex
Dim iIndex
Dim sReplaceString
Dim sReturnString
sReturnString = p_sSearchString
' Locate start ~
iStartIndex = InStr(sReturnString, "~")
Do While iStartIndex > 0
' Look for end ~
iEndIndex = InStr(iStartIndex + 1, sReturnString, "~")
If iEndIndex > 0 Then
sReplaceString = ""
' Loop htrough all charatcers
For iIndex = iStartIndex + 1 To iEndIndex - 1
sReplaceString = sReplaceString & ReplaceCharacter(Mid(sReturnString, iIndex, 1))
Next
' Replace string
sReturnString = Left(sReturnString, iStartIndex) & sReplaceString & Mid(sReturnString, iEndIndex)
' Locate next ~
iStartIndex = InStr(iEndIndex + 1, sReturnString, "~")
Else
' End couldn't be found, exit
Exit Do
End If
Loop
FixString = sReturnString
End Function
This is the function where you will enter the different character substitutions you might have:
Function ReplaceCharacter(p_sCharacter) As String
Select Case p_sCharacter
Case "1"
ReplaceCharacter = "M"
Case "2"
ReplaceCharacter = "A"
Case Else
ReplaceCharacter = p_sCharacter
End Select
End Function
You can use this in your existing code:
response.write(FixString(strSearchThis))
You can also use a Split and Join method...
Const SEPARATOR = "~"
Dim deconstructString, myOutputString
Dim arrayPointer
deconstructString = Split(myInputString, SEPARATOR)
For arrayPointer = 0 To UBound(deconstructString)
If IsNumeric(deconstructString(arrayPointer)) Then
'Do whatever you need to with your value...
End If
Next 'arrayPointer
myOutputString = Join(deconstructString, "")
This does rely, obviously, on breaking a string apart and rejoining it, so there is a sleight overhead on string mutability issues.

Need to capture patterns and replace code in file VBScripts

The code in file abc is which needs to captured with Regex.
With TeWindow("tewindow").Tescreen("something").TeField("some")
.set "value"
.setToProperty "V"
.exist(0)
End With
This code should be replaced in abc with
'With TeWindow("tewindow").Tescreen("something").TeField("some")
myset("something_some"), "value"
mysetToProperty("something_some"), ""
myExist("something_some"), (0)
'End With
Following is the trial so far. I'm not able to make it to writing in the file.
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set testfile = objFSO.OpenTextFile("D:\test\testout4.txt", 1, True)
line = testfile.ReadAll
testfile.Close
sString = line
pat = "with[\s]{1,}tewindow\((.*?)\).tescreen\((.*?)\).tefield\((.*?)\)" '12
pat1 = "^\.[a-zA-Z]{1,}"
Call DeclareRegEx(objRE,pat)
If objRE.test(sString) Then
Set Matches = objRE.Execute(sString)
Set match = Matches(0)
intcount = match.SubMatches.Count
If intcount > 0 Then
For I = 1 To intcount-1
'If i = intcount-1 Then
objRef = objRef & match.SubMatches(I)
Next
Else '30
objRef = objRef & match.SubMatches(I) & "_"
End If
End If
call DeclareRegEx(objRE1, pat1)
If objRE1.Test(sString) Then
Set Matches1 = objRE1.Execute(sString)
For Each Match1 in Matches1
RetStr1 = Match1.Value
strplc = Right(RetStr1, Len(RetStr1) - 1)
actual = objRE1.Replace(RetStr1, "my" & strplc & "(" & objRef & ")")
MsgBox actual
Next
End If
Function DeclareRegEx(obj, pattern)
Set obj = New RegExp
obj.Global = True
obj.Multiline = True
obj.Pattern = pattern
obj.IgnoreCase = True
End Function
Suggestion for some other approach or regex is welcome.
well as the approach of finding the block , being captured by verbose regex is not seemed to be a generic in the code i tried something like the following..
take the file content into an array
2.find the line no of with and end with
3.run a loop to iterate the functions from the next line of the with till line before the end with.
it worked for me !

dysfunctional regex based vbscript being used to append multiple lines of text at a specific location in a .c file

I am learning regex and vbscript in order to append text to a .c file on a new line by adding user inputted text on a monthly basis. I removed the positive lookbehind assertion '?<=' from my pattern to void the syntax error from my previous post:
Regular expression syntax error code: 800A1399
This is the modified pattern:
re.Pattern = "(loss_pct_through_([a-zA-Z]{3,5}\d{4})\[([a-zA-Z_]{1,2}\d{1,2})\]\s=\s\d\.\d{14}[;]\n)\n(?=\}\n)"
Now I have a script run, but it does not meet its intended purpose as user input related text to be generated by the following code does not append to the .c file.
path = "<C:\Users\Parth\Desktop\C06S3000.C>"
set re = new regexp
Set objfso = CreateObject("Scripting.FileSystemObject")
If objfso.FileExists(path) Then
Set objFile = objFSO.OpenTextFile(path).ReadAll
End If
inputstr3 = inputbox("enter names of affected groups")`
grpString1 = split(inputstr3, ",")`
inputstr4 = inputbox("enter loss percentage")`
grpString2 = split(inputstr4, ",")`
ublptm = ubound(grpString1)
for i=0 to ublptm 'where lptm = loss_pct_avg_monthyear[group] = percent;'
lptmStr = lptmstr + "loss_pct_through_[" & grpString1(i) & "] = " & grpString2(i) & ";" & vbCrLf
next
re.Pattern = "(loss_pct_through_([a-zA-Z]{3,5}\d{4})\[([a-zA-Z_]{1,2}\d{1,2})\]\s=\s\d\.\d{14}[;]\n)\n(?=\}\n)"
objFile = re.Replace(objFile, vbCrLf & lptmstr & vbCrLf)
For reference, the .c file is supposed to be updated like so:
Original file:
loss_pct_through_nov2015[a4] = 0.13155605112872;
loss_pct_through_nov2015[a5] = 0.23415898757080;
loss_pct_through_dec2015[a2] = 0.00283148378304;
loss_pct_through_dec2015[a3] = 0.39331380134641;
loss_pct_through_dec2015[a4] = 0.56333929692615;
loss_pct_through_dec2015[a5] = 0.04051541794440; <-append content from here
\n <-regex search for this newline character
}
Updated file:
loss_pct_through_nov2015[a4] = 0.13155605112872;
loss_pct_through_nov2015[a5] = 0.23415898757080;
loss_pct_through_dec2015[a2] = 0.00283148378304;
loss_pct_through_dec2015[a3] = 0.39331380134641;
loss_pct_through_dec2015[a4] = 0.56333929692615;
loss_pct_through_dec2015[a5] = 0.04051541794440;
\n <--new newline character replacing the old one to append content below
loss_pct_through_jan2016[a2] = 0.04051541794440;
loss_pct_through_jan2016[a4] = 0.04051541794440;
}
For one thing this code:
If objfso.FileExists(path) Then
Set objFile = objFSO.OpenTextFile(path).ReadAll
End If
should give you an error, because you're reading a string from a file, but try to assign it to a variable using the Set keyword, which is only for assigning objects.
If you don't get an error you most likely have an On Error Resume Next in your code. Remove that.
Change the above code to this so that you a) have a proper assignment, and b) don't use a misleading variable name:
If objfso.FileExists(path) Then
txt = objFSO.OpenTextFile(path).ReadAll
End If
Also, I'd suspect that your regular expression doesn't match what you think it matches. Your input file seems to have linebreaks encoded as CR-LF, since you're adding linebreaks as vbCrLf. In your regular expression, however, you're using \n, which matches only LF. Change that to \r\n (and also remove the pointless groups and assertions):
re.Pattern = "(loss_pct_through_[a-zA-Z]{3,5}\d{4}\[[a-zA-Z_]{1,2}\d{1,2}\]\s=\s\d\.\d{14};\r\n\r\n)(\}\r\n)"
and do the replacement like this:
txt = re.Replace(txt, "$1" & lptmstr & vbCrLf & "$2")
so that the new string is inserted between the last line and the closing curly bracket.
And don't forget to write the modified string back to the file:
objFSO.OpenTextFile(path, 2).Write txt

Split a column in a text file

I have a system which generates 3 text (.txt) files on a daily basis, with 1000's of entries within each.
Once the text files are generated we run a vbscript (below) that modifies the files by entering data at specific column positions.
I now need this vbscript to do an additional task which is to separate a column in one of the text files.
So for example the TR201501554s.txt file looks like this:
6876786786 GFS8978976 I
6786786767 DDF78676 I
4343245443 SBSSK67676 I
8393372263 SBSSK56565 I
6545434347 DDF7878333 I
6757650000 SBSSK453 I
With the additional task of seperating the column, data will now look like this, with the column seperated at a specific position.
6876786786 GFS 8978976 I
6786786767 DDF 78676 I
4343245443 SBSSK 67676 I
8393372263 SBSSK 56565 I
6545434347 DDF 7878333 I
6757650000 SBSSK 453 I
I was thinking maybe I could add another "case" to accomplish this with maybe using a "regex" pattern, since the pattern would be only 3 companies to find
(DDF, GFS and SBSSK).
But after looking at many examples, I am not really sure where to start.
Could someone let me know how to accomplish this additional task in our vbscript (below)?
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Dim objFSO, pFolder, cFile, objWFSO, objFileInput, objFileOutput,strLine
Dim strInputPath, strOutputPath , sName, sExtension
Dim strSourceFileComplete, strTargetFileComplete, objSourceFile, objTargetFile
Dim iPos, rChar
Dim fileMatch
'folder paths
strInputPath = "C:\Scripts\Test"
strOutputPath = "C:\Scripts\Test"
'Create the filesystem object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get a reference to the processing folder
Set pFolder = objFSO.GetFolder(strInputPath)
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
ProcessAFile cFile
Next
Sub ProcessAFile(objFile)
fileMatch = false
Select Case Left(objFile.Name,2)
Case "MV"
iPos = 257
rChar = "YES"
fileMatch = true
Case "CA"
iPos = 45
rChar = "OCCUPIED"
fileMatch = true
Case "TR"
iPos = 162
rChar = "EUR"
fileMatch = true
End Select
If fileMatch = true Then
Set objWFSO = CreateObject("Scripting.FileSystemObject")
Set objFileInput = objWFSO.OpenTextFile(objFile.Path, ForReading)
strSourceFileComplete = objFile.Path
sExtension = objWFSO.GetExtensionName(objFile.Name)
sName = Replace(objFile.Name, "." & sExtension, "")
strTargetFileComplete = strOutputPath & "\" & sName & "_mod." & sExtension
Set objFileOutput = objFSO.OpenTextFile(strTargetFileComplete, ForWriting, True)
Do While Not objFileInput.AtEndOfStream
strLine = objFileInput.ReadLine
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
objFileInput.Close
objFileOutput.Close
Set objFileInput = Nothing
Set objFileOutput = Nothing
Set objSourceFile = objWFSO.GetFile(strSourceFileComplete)
objSourceFile.Delete
Set objSourceFile = Nothing
Set objTargetFile = objWFSO.GetFile(strTargetFileComplete)
objTargetFile.Move strSourceFileComplete
Set objTargetFile = Nothing
Set objWFSO = Nothing
End If
End Sub
You could add a regular expression replacement to your input processing loop. Since you want to re-format the columns I'd do it with a replacement function. Define both the regular expression and the function in the global scope:
...
Set pFolder = objFSO.GetFolder(strInputPath)
Set re = New RegExp
re.Pattern = " ([A-Z]+)(\d+)( +)"
Function ReFormatCol(m, g1, g2, g3, p, s)
ReFormatCol = Left(" " & Left(g1 & " ", 7) & g2 & g3, Len(m)+2)
End Function
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
...
and modify the input processing loop like this:
...
Do While Not objFileInput.AtEndOfStream
strLine = re.Replace(objFileInput.ReadLine, GetRef("ReFormatCol"))
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
...
Note that you may need to change your iPos values, since splitting and re-formatting the columns increases the length of the lines by 2 characters.
The callback function ReFormatCol has the following (required) parameters:
m: the match of the regular expression (used to determine the length of the match)
g1, g2, g3: the three groups from the expression
p: the starting position of the match in the source string (but not used here)
s: the source string (but not used here)
The function constructs the replacement for the match from the 3 groups like this:
Left(g1 & " ", 7) appends 4 spaces to the first group (e.g. GFS) and trims it to 7 characters. This is based on the assumption that the first group will always be 3-5 characters long.→ GFS
" " & ... & g2 & g3 prepends the result of the above operation with 2 spaces and appends the other 2 groups (8978976 & ).→ GFS 8978976
Left(..., Len(m)+2) then trims the result string to the length of the original match plus 2 characters (to account for the additional 2 spaces inserted to separate the new second column from the former second, now third, column).→ GFS 8978976
At first replace by regex pattern (\d+)\s+([A-Z]+)(\d+)\s+(\w+) replace with $1 $2 $3 $4
and split that by +. then ok.
Live demo

Obfuscation VBA Excel - Break Line, Var Names

Im trying to build my own Obfuscation add-in for my VBA projects.
I started with the easier tasks:
Remove Blank Lines
Remove Indentts
Remove Comments
I could figure out how to do this things, maybe not in the best way, but im stuck in:
Insert Random Break Lines (" _")
I would like to have this working for diferent types of delimiter, for now im only working with "=" signal. By the way, i have problems when i have multiple delimiters in the line (Eg: If bla = "abc" or ble = "acd"). The code causes incorrects splits in my line.
Sub VBE_Break_The_Lines()
Dim VBC As VBComponent
Dim a, i, j, lCount As Long
Dim str As String
Dim temp As Variant
lCount = 0
i = 1
Dim blnStringMode, blnLineContinue As Boolean
For Each VBC In VBProjToClean.VBComponents
blnStringMode = False
i = 1
With VBC.CodeModule
Do Until i > .CountOfLines
If Not .ProcOfLine(i, vbext_pk_Proc) = "VBE_Remove_Comments" Then
str = .Lines(i, 1)
End If
If InStr(1, str, " = ", vbTextCompare) > 0 Then
temp = Split(str, " = ")
.InsertLines i, ""
.ReplaceLine i, temp(0) & " _"
.InsertLines i + 1, "= " & temp(1)
.DeleteLines i + 2
lCount = lCount + 1
'a = InStr(1, str, "=", vbTextCompare)
i = i + 1
End If
i = i + 1
Loop
End With
Next
MsgBox lCount & " LINES BREAKED ( = )", , strFileToClean
End Sub
My next step will be change procedure/variable names, but not sure if REGEX should be the best way, i just read a lot, but not sure yet.
Hope you guys can give me a way to follow
Why not start with this Open Source code and make a new build from that instead of reinventing the wheel?
You need to change all the file extensions in the code from .xls to .xlsm, save the IB_test.xls workbook as a macro-enabled workbook and save the addin as .xlam, not .xla to make this work, but even though the code is 9 years old, it still works in Excel 2013.
If you are into VBA obfuscation, you may want to try out VBASH (www.ayedeal.com/vbash). It is pretty straight forward and powerful.