Using vbscript regular expression to conditional replace dates - regex

I would like to use regex to replace the actual dates in the string to YYYYMMDD. However, my string might contain 2 types of dates, it could either be 20160531 or 160531. For these two, I have to replace them with YYYYMMDD and YYMMDD. So the followings are two examples:
Employment_salary_20160531 -> Employment_salary_YYYYMMDD
Employment_salary_160531 -> Employment_salary_YYMMDD
Wondering if it is possible to do this within a single regex without using an IFELSE statement?
Thank you!

This will provide you with accurate validation of the date that's entered. The other regex will work but it's dirty. It will accept 5000 as year.
The short answer: ((19|20)\d{2}|[0-9]{2})(0[1-9]|1[0-2])([012][0-9]|3[0-1])
The Long but thoroughly tested answer...
stringtest1 = "Employment_salary_20160531"
stringtest2 = "Employment_salary_990212"
stringtest3 = "Employment_salary_990242"
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest1 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest1)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest2 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest2)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest3 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest3)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
Function sanitizedate(str)
Set objRE = New RegExp
objRE.Pattern = "((19|20)\d{2}|[0-9]{2})(0[1-9]|1[0-2])([012][0-9]|3[0-1])"
objRE.IgnoreCase = True
objRE.Global = False
objRE.Multiline = true
Set objMatch = objRE.Execute(str)
If objMatch.Count = 1 Then
Select Case Len(objMatch.Item(0))
Case "8"
sanitizedate = Replace(str, objMatch.Item(0), "YYYYMMDD")
Case "6"
sanitizedate = Replace(str, objMatch.Item(0), "YYMMDD")
End Select
Else
sanitizedate = str
End if
End Function
Validation Results
Trying: Employment_salary_20160531
=> Employment_salary_YYYYMMDD
Trying: Employment_salary_990212
=> Employment_salary_YYMMDD
Trying: Employment_salary_990242 failed because 42 is not a valid date
=> Employment_salary_990242

I'm not sure I get you right. But seems there is two different replacement YYYYMMDD and YYMMDD which doing that is impossible by just one single pattern.
You can match those two separated pattern by this:
/(^(\d{4})(\d{2})(\d{2})$)|(^(\d{2})(\d{2})(\d{2})$)/
Online Demo
As you see, pattern above matches both 20160531 and 160531. But you cannot replace them with both YYYYMMDD (for 20160531) and YYMMDD (for 160531). You actually can replace them with either YYYYMMDD or YYMMDD.
Otherwise you need two separated patterns if you want two separated replacements:
/^(\d{4})(\d{2})(\d{2})$/
/* and replace with `YYYYMMDD` */
/^(\d{2})(\d{2})(\d{2})$/
/* and replace with YYMMDD */

Related

Regex to replace word except in comments

How can I modify my regex so that it will ignore the comments in the pattern in a language that doesn't support lookbehind?
My regex pattern is:
\b{Word}\b(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
\b{Word}\b : Whole word, {word} is replaced iteratively for the vocab list
(?=([^""\](\.|""([^""\]\.)[^""\]""))[^""]$) : Don't replace anything inside of quotes
My goal is to lint variables and words so that they always have the same case. However I do not want to lint any words in a comment. (The IDE sucks and there is no other option)
Comments in this language are prefixed by an apostrophe. Sample code follows
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.value = 1234 ' Set value
value = 123
Basically I want the linter to take the above code and say for the word "value" update it to:
' This is a comment
This = "Is not" ' but this is
' This is a comment, what is it's value?
Object.Value = 1234 ' Set value
Value = 123
So that all code based "Value" are updated but not anything in double quotes or in a comment or part of another word such as valueadded wouldn't be touched.
I've tried several solutions but haven't been able to get it to work.
['.*] : Not preceeding an apostrophy
(?<!\s*') : BackSearch not with any spaces with apoostrophy
(?<!\s*') : Second example seemed incorrect but this won't work as the language doesn't support backsearches
Anybody have any ideas how I can alter my pattern so that I don't edit commented variables
VBA
Sub TestSO()
Dim Code As String
Dim Expected As String
Dim Actual As String
Dim Words As Variant
Code = "item = object.value ' Put item in value" & vbNewLine & _
"some.item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.value = 1234 ' Set value" & vbNewLine & _
"value = 123" & vbNewLine
Expected = "Item = object.Value ' Put item in value" & vbNewLine & _
"some.Item <> some.otheritem" & vbNewLine & _
"' This is a comment, what is it's value?" & vbNewLine & _
"Object.Value = 1234 ' Set value" & vbNewLine & _
"Value = 123" & vbNewLine
Words = Array("Item", "Value")
Actual = SOLint(Words, Code)
Debug.Print Actual = Expected
Debug.Print "CODE: " & vbNewLine & Code
Debug.Print "Actual: " & vbNewLine & Actual
Debug.Print "Expected: " & vbNewLine & Expected
End Sub
Public Function SOLint(ByVal Words As Variant, ByVal FileContents As String) As String
Const NotInQuotes As String = "(?=([^""\\]*(\\.|""([^""\\]*\\.)*[^""\\]*""))*[^""]*$)"
Dim RegExp As Object
Dim Regex As String
Dim Index As Variant
Set RegExp = CreateObject("VBScript.RegExp")
With RegExp
.Global = True
.IgnoreCase = True
End With
For Each Index In Words
Regex = "[('*)]\b" & Index & "\b" & NotInQuotes
RegExp.Pattern = Regex
FileContents = RegExp.Replace(FileContents, Index)
Next Index
SOLint = FileContents
End Function
As discussed in the comments above:
((?:\".*\")|(?:'.*))|\b(v)(alue)\b
3 Parts to this regex used with alternation.
A non-capturing group for text within double quotes, as we dont need that.
A non-capturing group for text starting with single quote
Finally the string "value" is split into two parts (v) and (value) because while replacing we can use \U($2) to convert v to V and rest as is so \E$3 where \U - converts to upper case and \E - turns off the case.
\b \b - word boundaries are used to avoid any stand-alone text which is not part of setting a value.
https://regex101.com/r/mD9JeR/8

Structure replacement possible complex RegexReplace solution?

I need to run a VBScript that changes the structure of a CSV file. To keep it simple I'm only using 3 data fields but there is a lot more. In a production environment I will have a CSV file with hundreds of lines.
The problem is everything is in double quotes. The end result can sometimes be no quotes or single quotes or sometimes a mix of all three.
I have absolutely no idea how I should approach this and was looking for some guidance. This looks like a job for RegexReplace but because it's mixed I'm not sure how to start this. After the file has been modified I have to right over top of the original file.
CSV Example:
"apple";"12";"xyz"
"somereallylongword";"7687";"theredfox"
Pattern
"%1";%2;'%3'
Desired Result
"apple";12;'xyz'
"somereallylongword";7687;'theredfox'
What I'm trying to achieve is to be able to make a new pattern type.  In my example:
"%1" - I keep the original double quotes.
%2 - Remove the double quotes.
'%3' - Replace the double quotes with single quotes.
Any insight would be greatly appreciated.
You can read the CSV file using ADODB:
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1
Dim objConnection
Dim objRecordset
Dim sCSVFolder
Dim sCSVFile
Dim sValue
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")
sCSVFolder = "C:\CSV_Folder\"
sCSVFile = "your_csv_file.csv"
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & sCSVFolder & ";" & _
"Extended Properties=""text;HDR=YES;FMT=Delimited"""
objRecordset.Open "SELECT * FROM " & sCSVFile, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Do Until objRecordset.EOF
' Modify and write fields to new text file here
sValue = objRecordset.Fields.Item("FieldName")
objRecordset.MoveNext
Loop
This way you let ADO handle reading the data and removing the double-quotes and you can manipulate the data easily as a Recordset.
Just give a try for this code by replacing the path of your CSV file and tell me how it works on your side ?
Option Explicit
Dim Data
Call ForceCScriptExecution()
Data = ReadFile("C:\Test\Test.csv")
wscript.echo "Before Replacing"
wscript.echo String(50,"-")
wscript.echo Data
wscript.echo String(50,"-")
wscript.echo "After Replacing"
wscript.echo String(50,"-")
wscript.echo Search_Replace(Data)
wscript.echo String(50,"-")
wscript.sleep 20000
'-----------------------------------------------
Function Search_Replace(Data)
Dim oRegExp,strPattern1,strPattern2
Dim strReplace1,strReplace2,strResult1,strResult2
strPattern1 = ";(\x22)(\S+\w+)(\x22);"
strReplace1 = ";$2;"
strPattern2 = "[;]\x22([^\x22]+)\x22"
strReplace2 = ";'$1'"
Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.IgnoreCase = True
oRegExp.Pattern = strPattern1
strResult1 = oRegExp.Replace(Data,strReplace1)
oRegExp.Pattern = strPattern2
strResult2 = oRegExp.Replace(strResult1,strReplace2)
Search_Replace = strResult2
End Function
'-----------------------------------------------
Function ReadFile(path)
Const ForReading = 1
Dim objFSO,objFile
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(path,ForReading)
ReadFile = objFile.ReadAll
objFile.Close
End Function
'----------------------------------------------
Sub ForceCScriptExecution()
Dim Arg, Str, cmd, Title
Title = "Search and Replace using RegExp by Hackoo 2019"
cmd = "CMD /C Title " & Title &" & color 0A & Mode 80,30 & "
If Not LCase( Right( WScript.FullName, 12 ) ) = "\cscript.exe" Then
For Each Arg In WScript.Arguments
If InStr( Arg, " " ) Then Arg = """" & Arg & """"
Str = Str & " " & Arg
Next
CreateObject( "WScript.Shell" ).Run _
cmd & "cscript //nologo """ & _
WScript.ScriptFullName & _
""" " & Str
WScript.Quit
End If
End Sub
'-----------------------------------------------
Edit : Batch Script Code
You can do it easily with a batch script without using Regex :
#echo off
Title Edit CSV File
Set "Input_CSV_File=C:\Test\Test.csv"
Set "OutPut_CSV_File=C:\Test\OutPut_Test.csv"
If Exist "%OutPut_CSV_File%" Del "%OutPut_CSV_File%"
#for /f "tokens=1,2,3 delims=;" %%a in ('Type "%Input_CSV_File%"') Do (
echo "%%~a";%%~b;'%%~c'
echo "%%~a";%%~b;'%%~c'>>"%OutPut_CSV_File%"
)
TimeOut /T 5 /NoBreak>nul
If Exist "%OutPut_CSV_File%" Notepad "%OutPut_CSV_File%" & Exit

Excluding line breaks from regex capture

I realise that a similar question has been asked before and answered, but the problem persists after I've tried the solution proposed in that answer.
I want to write an Excel macro to separate a multi-line string into multiple single lines, trimmed of whitespace including line breaks. This is my code:
Sub testRegexMatch()
Dim r As New VBScript_RegExp_55.regexp
Dim str As String
Dim mc As MatchCollection
r.Pattern = "[\r\n\s]*([^\r\n]+?)[\s\r\n]*$"
r.Global = True
r.MultiLine = True
str = "This is a haiku" & vbCrLf _
& "You may read it if you wish " & vbCrLf _
& " but you don't have to"
Set mc = r.Execute(str)
For Each Line In mc
Debug.Print "^" & Line & "$"
Next Line
End Sub
Expected output:
^This is a haiku$
^You may read it if you wish$
^but you don't have to$
Actual output:
^This is a haiku
$
^
You may read it if you wish
$
^
but you don't have to$
I've tried the same thing on Regex101, but this appears to show the correct captures, so it must be a quirk of VBA's regex engine.
Any ideas?
You just need to access the captured values via SubMatches():
When a regular expression is executed, zero or more submatches can result when subexpressions are enclosed in capturing parentheses. Each item in the SubMatches collection is the string found and captured by the regular expression.
Here is my demo:
Sub DemoFn()
Dim re, targetString, colMatch, objMatch
Set re = New regexp
With re
.pattern = "\s*([^\r\n]+?)\s*$"
.Global = True ' Same as /g at the online tester
.MultiLine = True ' Same as /m at regex101.com
End With
targetString = "This is a haiku " & vbLf & " You may read it if you wish " & vbLf & " but you don't have to"
Set colMatch = re.Execute(targetString)
For Each objMatch In colMatch
Debug.Print objMatch.SubMatches.Item(0) ' <== SEE HERE
Next
End Sub
It prints:
This is a haiku
You may read it if you wish
but you don't have to

classic asp ignore comma within speech marks CSV

I have a CSV File that looks like this
1,HELLO,ENGLISH
2,HELLO1,ENGLISH
3,HELLO2,ENGLISH
4,HELLO3,ENGLISH
5,HELLO4,ENGLISH
6,HELLO5,ENGLISH
7,HELLO6,ENGLISH
8,"HELLO7, HELLO7 ...",ENGLISH
9,HELLO7,ENGLISH
10,HELLO7,ENGLISH
I want to step loop through the lines and write to a table using split classic asp function by comma. When Speech marks are present to ignore the comma within those speech marks and take the string.
<%
dim csv_to_import,counter,line,fso,objFile
csv_to_import="uploads/testLang.csv"
set fso = createobject("scripting.filesystemobject")
set objFile = fso.opentextfile(server.mappath(csv_to_import))
str_imported_data="<table cellpadding='3' cellspacing='1' border='1'>"
Do Until objFile.AtEndOfStream
line = split(objFile.ReadLine,",")
str_imported_data=str_imported_data&"<tr>"
total_records=ubound(line)
for i=0 to total_records
if i>0 then
str_imported_data=str_imported_data&"<td>"&line(i)&"</td>"
else
str_imported_data=str_imported_data&"<th>"&line(i)&"</th>"
end if
next
str_imported_data=str_imported_data&"</tr>" & chr(13)
Loop
str_imported_data=str_imported_data&"<caption>Total Number of Records: "&total_records&"</caption></table>"
objFile.Close
response.Write str_imported_data
%>
Don't write your own CSV parser.
You start with "splitting it on the , is the way to go, now I am finished". Then someone uses a comma in your data and the string with the comma is surrounded by double quotes. You are a smart man, so you count the amount of double quotes and if they are odd, you know you have to escape the comma and if they are even, you don't have to. And then you get a CSV file containing escaped double quote characters...
But wait! There is a solution. Use a Database Connection to your file!
It will be something like this, but you'll have to adapt it to your own situation:
On Error Resume Next
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")
strPathtoTextFile = server.mappath("uploads/")
strFileName = "testLang.csv"
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & strPathtoTextFile & ";" & _
"Extended Properties=""text;HDR=NO;FMT=CSVDelimited"""
objRecordset.Open "SELECT * FROM " & strFileName, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Do Until objRecordset.EOF
Wscript.Echo "Number: " & objRecordset.Fields.Item(1)
Wscript.Echo "Greeting: " & objRecordset.Fields.Item(2)
Wscript.Echo "Language: " & objRecordset.Fields.Item(3)
objRecordset.MoveNext
Loop

Find text between two static strings

I parse message data into a CSV file via Outlook rules.
How can I take the example below and store the text under "Customer Log Update:" into a string variable?
[Header Data]
Description: Problem: A2 - MI ERROR - R8036
Customer Log Update:
I'm having trouble with order #458362. I keep getting Error R8036, can you please assist?
Thanks!
View problem at http://...
[Footer Data]
Desired result to be stored into the string variable (note that the result may contain newlines):
I'm having trouble with order #458362. I keep getting Error R8036, can you please assist?
Thanks!
I haven't attempted to code anything pertaining to my question.
Function RegFind(RegInput, RegPattern)
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches, s
regEx.Pattern = RegPattern
regEx.IgnoreCase = True
regEx.Global = False
s = ""
If regEx.Test(RegInput) Then
Set matches = regEx.Execute(RegInput)
For Each Match In matches
s = Match.Value
Next
RegFind = s
Else
RegFind = ""
End If
End Function
Sub CustomMailMessageRule(Item As Outlook.MailItem)
MsgBox "Mail message arrived: " & Item.Subject
Const FileWrite = file.csv `file destination
Dim FF1 As Integer
Dim subj As String
Dim bod As String
On Error GoTo erh
subj = Item.Subject
'this gets a 15 digit number from the subject line
subj = RegFind(subj, "\d{15}")
bod = Item.Body
'following line helps formatting, lots of double newlines in my source data
bod = Replace(bod, vbCrLf & vbCrLf, vbCrLf)
'WRITE FILE
FF1 = FreeFile
Open FileWrite For Append As #FF1
Print #FF1, subj & "," & bod
Close #FF1
Exit Sub
erh:
MsgBox Err.Description, vbCritical, Err.Number
End Sub
While I would also go the more direct route like Jean-François Corbett did as the parsing is very simple, you could apply the Regexp approach as below
The pattern
Update:([\S\s]+)view
says match all characters between "Update" and "view" and return them as a submatch
This piece [\S\s] says match all non-whitespace or whitespace characters - ie everything.
In vbscript a . matches everything but a newline, hence the need for the [\S\s] workaround for this application
The submatch is then extracted by
objRegM(0).submatches(0)
Function ExtractText(strIn As String)
Dim objRegex As Object
Dim objRegM As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.ignorecase = True
.Pattern = "Update:([\S\s]+)view"
If .test(strIn) Then
Set objRegM = .Execute(strIn)
ExtractText = objRegM(0).submatches(0)
Else
ExtractText = "No match"
End If
End With
End Function
Sub JCFtest()
Dim messageBody As String
Dim result As String
messageBody = "Description: Problem: A2 - MI ERROR - R8036" & vbCrLf & _
"Customer Log Update:" & _
"I 'm having trouble with order #458362. I keep getting Error R8036, can you please assist?" & vbCrLf & _
"Thanks!" & vbCrLf & _
"View problem at http://..."
MsgBox ExtractText(messageBody)
End Sub
Why not something simple like this:
Function GetCustomerLogUpdate(messageBody As String) As String
Const sStart As String = "Customer Log Update:"
Const sEnd As String = "View problem at"
Dim iStart As Long
Dim iEnd As Long
iStart = InStr(messageBody, sStart) + Len(sStart)
iEnd = InStr(messageBody, sEnd)
GetCustomerLogUpdate = Mid(messageBody, iStart, iEnd - iStart)
End Function
I tested it using this code and it worked:
Dim messageBody As String
Dim result As String
messageBody = "Description: Problem: A2 - MI ERROR - R8036" & vbCrLf & _
"Customer Log Update:" & vbCrLf & _
"I 'm having trouble with order #458362. I keep getting Error R8036, can you please assist?" & vbCrLf & _
"Thanks!" & vbCrLf & _
"View problem at http://..."
result = GetCustomerLogUpdate(messageBody)
Debug.Print result