classic asp ignore comma within speech marks CSV - regex

I have a CSV file that looks like this:
1,HELLO,ENGLISH
2,HELLO1,ENGLISH
3,HELLO2,ENGLISH
4,HELLO3,ENGLISH
5,HELLO4,ENGLISH
6,HELLO5,ENGLISH
7,HELLO6,ENGLISH
8,"HELLO7, HELLO7 ...",ENGLISH
9,HELLO7,ENGLISH
10,HELLO7,ENGLISH
I want to loop through the lines and write them to a table, using the classic ASP Split function to split on commas. When speech marks (double quotes) are present, the commas inside them should be ignored and the quoted string taken as a single value.
<%
dim csv_to_import,counter,line,fso,objFile
csv_to_import="uploads/testLang.csv"
set fso = createobject("scripting.filesystemobject")
set objFile = fso.opentextfile(server.mappath(csv_to_import))
str_imported_data="<table cellpadding='3' cellspacing='1' border='1'>"
Do Until objFile.AtEndOfStream
line = split(objFile.ReadLine,",")
str_imported_data=str_imported_data&"<tr>"
total_records=ubound(line)
for i=0 to total_records
if i>0 then
str_imported_data=str_imported_data&"<td>"&line(i)&"</td>"
else
str_imported_data=str_imported_data&"<th>"&line(i)&"</th>"
end if
next
str_imported_data=str_imported_data&"</tr>" & chr(13)
Loop
str_imported_data=str_imported_data&"<caption>Total Number of Records: "&total_records&"</caption></table>"
objFile.Close
response.Write str_imported_data
%>

Don't write your own CSV parser.
You start with "splitting it on the , is the way to go, now I am finished". Then someone uses a comma in your data and the string with the comma is surrounded by double quotes. You are a smart man, so you count the amount of double quotes and if they are odd, you know you have to escape the comma and if they are even, you don't have to. And then you get a CSV file containing escaped double quote characters...
But wait! There is a solution. Use a Database Connection to your file!
It will be something like this, but you'll have to adapt it to your own situation:
On Error Resume Next
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")
strPathtoTextFile = server.mappath("uploads/")
strFileName = "testLang.csv"
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & strPathtoTextFile & ";" & _
"Extended Properties=""text;HDR=NO;FMT=CSVDelimited"""
objRecordset.Open "SELECT * FROM " & strFileName, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Do Until objRecordset.EOF
' The Fields collection is zero-based, so the three columns are at indexes 0, 1 and 2.
' Response.Write is used because WScript.Echo is not available in classic ASP.
Response.Write "Number: " & objRecordset.Fields.Item(0) & "<br>"
Response.Write "Greeting: " & objRecordset.Fields.Item(1) & "<br>"
Response.Write "Language: " & objRecordset.Fields.Item(2) & "<br>"
objRecordset.MoveNext
Loop
objRecordset.Close
objConnection.Close
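For comparison, the quote-counting approach described above can be sketched as a small state machine. This is only an illustration under stated assumptions: SplitCsvLine is a hypothetical helper, it handles commas inside double quotes and doubled quotes ("") as an escaped quote, and it does not handle quoted fields that span several lines. It is written for Windows Script Host; in classic ASP, replace WScript.Echo with Response.Write.

```vbscript
Option Explicit

' Hypothetical helper: splits one CSV line on commas that are outside double quotes.
Function SplitCsvLine(ByVal csvLine)
    Dim fields(), count, i, ch, current, inQuotes
    count = 0
    current = ""
    inQuotes = False
    i = 1
    Do While i <= Len(csvLine)
        ch = Mid(csvLine, i, 1)
        If ch = Chr(34) Then
            If inQuotes And Mid(csvLine, i + 1, 1) = Chr(34) Then
                ' A doubled quote inside a quoted field is a literal quote.
                current = current & Chr(34)
                i = i + 1
            Else
                ' Opening or closing quote: toggle the state, drop the quote itself.
                inQuotes = Not inQuotes
            End If
        ElseIf ch = "," And Not inQuotes Then
            ' A comma outside quotes ends the current field.
            ReDim Preserve fields(count)
            fields(count) = current
            count = count + 1
            current = ""
        Else
            current = current & ch
        End If
        i = i + 1
    Loop
    ' Store the last field.
    ReDim Preserve fields(count)
    fields(count) = current
    SplitCsvLine = fields
End Function

Dim parts
parts = SplitCsvLine("8,""HELLO7, HELLO7 ..."",ENGLISH")
WScript.Echo parts(1)   ' the quoted field, with its comma intact
```

Even this short version shows how quickly the special cases pile up, which is the argument for letting the text driver do the parsing.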

Related

regex for "finding whole word only" plus allowing one character

Hi I'm using this regular expression to find whole word only:
example:
Dim oRE, bMatch
Set oRE = New RegExp
oRE.Pattern = "\bFunction\b"
bMatch = oRE.Test("Functions") 'return false
bMatch = oRE.Test("Function dummy") 'return true
I want to allow one character at the end of the word. The character I want to allow is the double quote ("). So I would like this line of code to return true:
bMatch = oRE.Test("Function"+chr(34)+" dummy") 'chr(34) is the charcode of doublequote (")
Initiate a variable with chr(34) and concatenate it into your pattern.
dq = Chr(34)
oRE.Pattern = "\bFunction" & dq & "+\b"
Then you will be able to match the double quotes as well.
The + matches one or more double quotes after Function (adjust the quantifier to suit your needs).
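One caveat with a trailing \b after the quote: \b only matches between a word character and a non-word character, so it does not match when the quote is followed by a space, as in the question's test string. A minimal sketch of an alternative, assuming the goal is "Function followed by either a double quote or a word boundary":

```vbscript
Option Explicit

Dim oRE, dq
dq = Chr(34)
Set oRE = New RegExp
' "Function", then either a literal double quote or a word boundary.
oRE.Pattern = "\bFunction(?:" & dq & "|\b)"

WScript.Echo "Functions: " & oRE.Test("Functions")                             ' no match
WScript.Echo "Function dummy: " & oRE.Test("Function dummy")                   ' match
WScript.Echo "Function"" dummy: " & oRE.Test("Function" & dq & " dummy")       ' match
```

The non-capturing group (?:...) keeps the alternation local to the end of the word without adding a submatch.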
The double quote can also be written as \x22, which makes it easy to embed in your pattern.
I hope this gives the result you want:
Dim oRE, bMatch
Set oRE = New RegExp
oRE.Pattern = "\bFunction.+?\x22"
aMatch = oRE.Test("Functions""")
bMatch = oRE.Test("Function dummy""")
wscript.echo "Functions " & aMatch
wscript.echo "Functions dummy " & bMatch

Structure replacement possible complex RegexReplace solution?

I need to run a VBScript that changes the structure of a CSV file. To keep it simple I'm only using 3 data fields but there is a lot more. In a production environment I will have a CSV file with hundreds of lines.
The problem is everything is in double quotes. The end result can sometimes be no quotes or single quotes or sometimes a mix of all three.
I have absolutely no idea how I should approach this and was looking for some guidance. This looks like a job for a regex replace, but because the quoting is mixed I'm not sure how to start. After the file has been modified I have to write over the top of the original file.
CSV Example:
"apple";"12";"xyz"
"somereallylongword";"7687";"theredfox"
Pattern
"%1";%2;'%3'
Desired Result
"apple";12;'xyz'
"somereallylongword";7687;'theredfox'
What I'm trying to achieve is to be able to make a new pattern type.  In my example:
"%1" - I keep the original double quotes.
%2 - Remove the double quotes.
'%3' - Replace the double quotes with single quotes.
Any insight would be greatly appreciated.
You can read the CSV file using ADODB:
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1
Dim objConnection
Dim objRecordset
Dim sCSVFolder
Dim sCSVFile
Dim sValue
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")
sCSVFolder = "C:\CSV_Folder\"
sCSVFile = "your_csv_file.csv"
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & sCSVFolder & ";" & _
"Extended Properties=""text;HDR=YES;FMT=Delimited"""
objRecordset.Open "SELECT * FROM " & sCSVFile, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Do Until objRecordset.EOF
' Modify and write fields to new text file here
sValue = objRecordset.Fields.Item("FieldName")
objRecordset.MoveNext
Loop
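This way you let ADO handle reading the data and removing the double quotes, and you can manipulate the data easily as a Recordset. Note that HDR=YES treats the first line as column names, and that for a semicolon-delimited file like the one in the question the text driver generally needs a schema.ini file in the same folder that specifies Format=Delimited(;).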
Try this code, replacing the path with the path of your CSV file, and let me know how it works on your side:
Option Explicit
Dim Data
Call ForceCScriptExecution()
Data = ReadFile("C:\Test\Test.csv")
wscript.echo "Before Replacing"
wscript.echo String(50,"-")
wscript.echo Data
wscript.echo String(50,"-")
wscript.echo "After Replacing"
wscript.echo String(50,"-")
wscript.echo Search_Replace(Data)
wscript.echo String(50,"-")
wscript.sleep 20000
'-----------------------------------------------
Function Search_Replace(Data)
Dim oRegExp,strPattern1,strPattern2
Dim strReplace1,strReplace2,strResult1,strResult2
strPattern1 = ";(\x22)(\S+\w+)(\x22);"
strReplace1 = ";$2;"
strPattern2 = "[;]\x22([^\x22]+)\x22"
strReplace2 = ";'$1'"
Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.IgnoreCase = True
oRegExp.Pattern = strPattern1
strResult1 = oRegExp.Replace(Data,strReplace1)
oRegExp.Pattern = strPattern2
strResult2 = oRegExp.Replace(strResult1,strReplace2)
Search_Replace = strResult2
End Function
'-----------------------------------------------
Function ReadFile(path)
Const ForReading = 1
Dim objFSO,objFile
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(path,ForReading)
ReadFile = objFile.ReadAll
objFile.Close
End Function
'----------------------------------------------
Sub ForceCScriptExecution()
Dim Arg, Str, cmd, Title
Title = "Search and Replace using RegExp by Hackoo 2019"
cmd = "CMD /C Title " & Title &" & color 0A & Mode 80,30 & "
If Not LCase( Right( WScript.FullName, 12 ) ) = "\cscript.exe" Then
For Each Arg In WScript.Arguments
If InStr( Arg, " " ) Then Arg = """" & Arg & """"
Str = Str & " " & Arg
Next
CreateObject( "WScript.Shell" ).Run _
cmd & "cscript //nologo """ & _
WScript.ScriptFullName & _
""" " & Str
WScript.Quit
End If
End Sub
'-----------------------------------------------
Edit : Batch Script Code
You can do it easily with a batch script without using Regex :
@echo off
Title Edit CSV File
Set "Input_CSV_File=C:\Test\Test.csv"
Set "OutPut_CSV_File=C:\Test\OutPut_Test.csv"
If Exist "%OutPut_CSV_File%" Del "%OutPut_CSV_File%"
@for /f "tokens=1,2,3 delims=;" %%a in ('Type "%Input_CSV_File%"') Do (
echo "%%~a";%%~b;'%%~c'
echo "%%~a";%%~b;'%%~c'>>"%OutPut_CSV_File%"
)
TimeOut /T 5 /NoBreak>nul
If Exist "%OutPut_CSV_File%" Notepad "%OutPut_CSV_File%" & Exit
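For the three-field example in the question, the target pattern "%1";%2;'%3' can also be expressed as a single regular expression replacement. The following is a sketch only: ApplyPattern is a hypothetical helper, and it assumes every record has exactly three double-quoted fields with no quotes or semicolons inside them.

```vbscript
Option Explicit

' Hypothetical helper: rewrites records of three double-quoted,
' semicolon-separated fields as "field1";field2;'field3'.
Function ApplyPattern(ByVal data)
    Dim re
    Set re = New RegExp
    re.Global = True   ' rewrite every record when data holds the whole file
    ' \x22 is the double quote; each group captures one field without its quotes.
    re.Pattern = "\x22([^\x22;]*)\x22;\x22([^\x22;]*)\x22;\x22([^\x22;]*)\x22"
    ApplyPattern = re.Replace(data, Chr(34) & "$1" & Chr(34) & ";$2;'$3'")
End Function

WScript.Echo ApplyPattern("""apple"";""12"";""xyz""")
WScript.Echo ApplyPattern("""somereallylongword"";""7687"";""theredfox""")
```

A different output pattern only needs a different replacement string, so with more fields the same idea extends by adding one capture group per field.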

Using vbscript regular expression to conditional replace dates

I would like to use regex to replace the actual dates in the string with YYYYMMDD. However, my string might contain two types of dates: either 20160531 or 160531. I have to replace these with YYYYMMDD and YYMMDD respectively. Here are two examples:
Employment_salary_20160531 -> Employment_salary_YYYYMMDD
Employment_salary_160531 -> Employment_salary_YYMMDD
Is it possible to do this with a single regex, without using an If/Else statement?
Thank you!
This will provide you with accurate validation of the date that's entered. The other regex will work but it's dirty. It will accept 5000 as year.
The short answer: ((19|20)\d{2}|[0-9]{2})(0[1-9]|1[0-2])([012][0-9]|3[0-1])
The Long but thoroughly tested answer...
stringtest1 = "Employment_salary_20160531"
stringtest2 = "Employment_salary_990212"
stringtest3 = "Employment_salary_990242"
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest1 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest1)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest2 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest2)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
wscript.echo "Trying: " & stringtest3 & vbcrlf & vbcrlf & vbtab & " => " & sanitizedate(stringtest3)
wscript.echo : wscript.echo "---------------------------------------------------" : wscript.echo
Function sanitizedate(str)
Set objRE = New RegExp
objRE.Pattern = "((19|20)\d{2}|[0-9]{2})(0[1-9]|1[0-2])([012][0-9]|3[0-1])"
objRE.IgnoreCase = True
objRE.Global = False
objRE.Multiline = true
Set objMatch = objRE.Execute(str)
If objMatch.Count = 1 Then
' Len returns a number, so compare against numeric cases
Select Case Len(objMatch.Item(0))
Case 8
sanitizedate = Replace(str, objMatch.Item(0), "YYYYMMDD")
Case 6
sanitizedate = Replace(str, objMatch.Item(0), "YYMMDD")
End Select
Else
sanitizedate = str
End if
End Function
Validation Results
Trying: Employment_salary_20160531
=> Employment_salary_YYYYMMDD
Trying: Employment_salary_990212
=> Employment_salary_YYMMDD
Trying: Employment_salary_990242 (no replacement, because 42 is not a valid day)
=> Employment_salary_990242
I'm not sure I understand you correctly, but it seems there are two different replacements, YYYYMMDD and YYMMDD, and that cannot be done with one pattern and one replacement string.
You can match both formats with a single pattern like this:
/(^(\d{4})(\d{2})(\d{2})$)|(^(\d{2})(\d{2})(\d{2})$)/
As you can see, the pattern above matches both 20160531 and 160531. But a single replacement string cannot produce YYYYMMDD for 20160531 and YYMMDD for 160531; it can only produce one or the other.
Otherwise, you need two separate patterns if you want two separate replacements:
/^(\d{4})(\d{2})(\d{2})$/
/* and replace with `YYYYMMDD` */
/^(\d{2})(\d{2})(\d{2})$/
/* and replace with YYMMDD */
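The limitation above applies to a fixed replacement string. In VBScript, one pattern can still yield two different replacements if the replacement is computed per match. The sketch below assumes that RegExp.Replace accepts a function reference obtained with GetRef as its second argument; DateMask is a hypothetical callback name, and the pattern only checks the digit count at the end of the string, not whether the digits form a valid date.

```vbscript
Option Explicit

Dim re
Set re = New RegExp
' Eight digits at the end of the string, otherwise six digits at the end.
re.Pattern = "\d{8}$|\d{6}$"

' Callback for a pattern without capture groups:
' m = matched text, p = position of the match, s = source string.
Function DateMask(m, p, s)
    ' Right("YYYYMMDD", 8) is YYYYMMDD and Right("YYYYMMDD", 6) is YYMMDD.
    DateMask = Right("YYYYMMDD", Len(m))
End Function

WScript.Echo re.Replace("Employment_salary_20160531", GetRef("DateMask"))
WScript.Echo re.Replace("Employment_salary_160531", GetRef("DateMask"))
```

Because the mask is cut to the length of the match, no If/Else or Select Case is needed.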

dysfunctional regex based vbscript being used to append multiple lines of text at a specific location in a .c file

I am learning regex and VBScript in order to append text to a .c file on a new line, adding user-entered text on a monthly basis. I removed the positive lookbehind assertion '?<=' from my pattern to avoid the syntax error from my previous post:
Regular expression syntax error code: 800A1399
This is the modified pattern:
re.Pattern = "(loss_pct_through_([a-zA-Z]{3,5}\d{4})\[([a-zA-Z_]{1,2}\d{1,2})\]\s=\s\d\.\d{14}[;]\n)\n(?=\}\n)"
Now the script runs, but it does not do what it is meant to: the text generated from the user input by the following code is not appended to the .c file.
path = "<C:\Users\Parth\Desktop\C06S3000.C>"
set re = new regexp
Set objfso = CreateObject("Scripting.FileSystemObject")
If objfso.FileExists(path) Then
Set objFile = objFSO.OpenTextFile(path).ReadAll
End If
inputstr3 = inputbox("enter names of affected groups")
grpString1 = split(inputstr3, ",")
inputstr4 = inputbox("enter loss percentage")
grpString2 = split(inputstr4, ",")
ublptm = ubound(grpString1)
for i=0 to ublptm 'where lptm = loss_pct_avg_monthyear[group] = percent;'
lptmStr = lptmstr + "loss_pct_through_[" & grpString1(i) & "] = " & grpString2(i) & ";" & vbCrLf
next
re.Pattern = "(loss_pct_through_([a-zA-Z]{3,5}\d{4})\[([a-zA-Z_]{1,2}\d{1,2})\]\s=\s\d\.\d{14}[;]\n)\n(?=\}\n)"
objFile = re.Replace(objFile, vbCrLf & lptmstr & vbCrLf)
For reference, the .c file is supposed to be updated like so:
Original file:
loss_pct_through_nov2015[a4] = 0.13155605112872;
loss_pct_through_nov2015[a5] = 0.23415898757080;
loss_pct_through_dec2015[a2] = 0.00283148378304;
loss_pct_through_dec2015[a3] = 0.39331380134641;
loss_pct_through_dec2015[a4] = 0.56333929692615;
loss_pct_through_dec2015[a5] = 0.04051541794440; <-append content from here
\n <-regex search for this newline character
}
Updated file:
loss_pct_through_nov2015[a4] = 0.13155605112872;
loss_pct_through_nov2015[a5] = 0.23415898757080;
loss_pct_through_dec2015[a2] = 0.00283148378304;
loss_pct_through_dec2015[a3] = 0.39331380134641;
loss_pct_through_dec2015[a4] = 0.56333929692615;
loss_pct_through_dec2015[a5] = 0.04051541794440;
\n <--new newline character replacing the old one to append content below
loss_pct_through_jan2016[a2] = 0.04051541794440;
loss_pct_through_jan2016[a4] = 0.04051541794440;
}
For one thing this code:
If objfso.FileExists(path) Then
Set objFile = objFSO.OpenTextFile(path).ReadAll
End If
should give you an error, because you're reading a string from a file, but try to assign it to a variable using the Set keyword, which is only for assigning objects.
If you don't get an error you most likely have an On Error Resume Next in your code. Remove that.
Change the above code to this so that you a) have a proper assignment, and b) don't use a misleading variable name:
If objfso.FileExists(path) Then
txt = objFSO.OpenTextFile(path).ReadAll
End If
Also, I'd suspect that your regular expression doesn't match what you think it matches. Your input file seems to have linebreaks encoded as CR-LF, since you're adding linebreaks as vbCrLf. In your regular expression, however, you're using \n, which matches only LF. Change that to \r\n (and also remove the pointless groups and assertions):
re.Pattern = "(loss_pct_through_[a-zA-Z]{3,5}\d{4}\[[a-zA-Z_]{1,2}\d{1,2}\]\s=\s\d\.\d{14};\r\n\r\n)(\}\r\n)"
and do the replacement like this:
txt = re.Replace(txt, "$1" & lptmstr & vbCrLf & "$2")
so that the new string is inserted between the last line and the closing curly bracket.
And don't forget to write the modified string back to the file:
objFSO.OpenTextFile(path, 2).Write txt
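The effect of the line-ending mismatch can be checked without touching the real file. The following sketch uses a shortened in-memory sample and simplified patterns (both are assumptions made for illustration, not the patterns from the question) to show that \n alone does not match across CR-LF line breaks, and that the $1 ... $2 replacement inserts the new text before the closing bracket.

```vbscript
Option Explicit

Dim re, txt, newLines
' Shortened sample: last data line, an empty line, then the closing bracket.
txt = "x[a5] = 0.5;" & vbCrLf & vbCrLf & "}" & vbCrLf
newLines = "x[a2] = 0.1;" & vbCrLf

Set re = New RegExp

' LF-only pattern: does not match, because each line break is CR followed by LF.
re.Pattern = ";\n\n\}"
WScript.Echo "LF pattern matches: " & re.Test(txt)

' CR-LF pattern: matches.
re.Pattern = "(;\r\n\r\n)(\}\r\n)"
WScript.Echo "CR-LF pattern matches: " & re.Test(txt)

' Insert the new lines between the empty line and the closing bracket.
txt = re.Replace(txt, "$1" & newLines & vbCrLf & "$2")
WScript.Echo txt
```

If the file turns out to use LF-only line breaks instead, the same replacement works with \n in place of \r\n.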

Split a column in a text file

I have a system which generates 3 text (.txt) files on a daily basis, with thousands of entries in each.
Once the text files are generated we run a vbscript (below) that modifies the files by entering data at specific column positions.
I now need this vbscript to do an additional task which is to separate a column in one of the text files.
So for example the TR201501554s.txt file looks like this:
6876786786 GFS8978976 I
6786786767 DDF78676 I
4343245443 SBSSK67676 I
8393372263 SBSSK56565 I
6545434347 DDF7878333 I
6757650000 SBSSK453 I
With the additional task of separating the column, the data should look like this, with the column separated at a specific position.
6876786786 GFS 8978976 I
6786786767 DDF 78676 I
4343245443 SBSSK 67676 I
8393372263 SBSSK 56565 I
6545434347 DDF 7878333 I
6757650000 SBSSK 453 I
I was thinking maybe I could add another "case" to accomplish this with maybe using a "regex" pattern, since the pattern would be only 3 companies to find
(DDF, GFS and SBSSK).
But after looking at many examples, I am not really sure where to start.
Could someone let me know how to accomplish this additional task in our vbscript (below)?
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Dim objFSO, pFolder, cFile, objWFSO, objFileInput, objFileOutput,strLine
Dim strInputPath, strOutputPath , sName, sExtension
Dim strSourceFileComplete, strTargetFileComplete, objSourceFile, objTargetFile
Dim iPos, rChar
Dim fileMatch
'folder paths
strInputPath = "C:\Scripts\Test"
strOutputPath = "C:\Scripts\Test"
'Create the filesystem object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get a reference to the processing folder
Set pFolder = objFSO.GetFolder(strInputPath)
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
ProcessAFile cFile
Next
Sub ProcessAFile(objFile)
fileMatch = false
Select Case Left(objFile.Name,2)
Case "MV"
iPos = 257
rChar = "YES"
fileMatch = true
Case "CA"
iPos = 45
rChar = "OCCUPIED"
fileMatch = true
Case "TR"
iPos = 162
rChar = "EUR"
fileMatch = true
End Select
If fileMatch = true Then
Set objWFSO = CreateObject("Scripting.FileSystemObject")
Set objFileInput = objWFSO.OpenTextFile(objFile.Path, ForReading)
strSourceFileComplete = objFile.Path
sExtension = objWFSO.GetExtensionName(objFile.Name)
sName = Replace(objFile.Name, "." & sExtension, "")
strTargetFileComplete = strOutputPath & "\" & sName & "_mod." & sExtension
Set objFileOutput = objFSO.OpenTextFile(strTargetFileComplete, ForWriting, True)
Do While Not objFileInput.AtEndOfStream
strLine = objFileInput.ReadLine
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
objFileInput.Close
objFileOutput.Close
Set objFileInput = Nothing
Set objFileOutput = Nothing
Set objSourceFile = objWFSO.GetFile(strSourceFileComplete)
objSourceFile.Delete
Set objSourceFile = Nothing
Set objTargetFile = objWFSO.GetFile(strTargetFileComplete)
objTargetFile.Move strSourceFileComplete
Set objTargetFile = Nothing
Set objWFSO = Nothing
End If
End Sub
You could add a regular expression replacement to your input processing loop. Since you want to re-format the columns I'd do it with a replacement function. Define both the regular expression and the function in the global scope:
...
Set pFolder = objFSO.GetFolder(strInputPath)
Set re = New RegExp
re.Pattern = " ([A-Z]+)(\d+)( +)"
Function ReFormatCol(m, g1, g2, g3, p, s)
ReFormatCol = Left("  " & Left(g1 & "    ", 7) & g2 & g3, Len(m)+2)
End Function
'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
...
and modify the input processing loop like this:
...
Do While Not objFileInput.AtEndOfStream
strLine = re.Replace(objFileInput.ReadLine, GetRef("ReFormatCol"))
If Len(strLine) >= iPos Then
objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
End If
Loop
...
Note that you may need to change your iPos values, since splitting and re-formatting the columns increases the length of the lines by 2 characters.
The callback function ReFormatCol has the following (required) parameters:
m: the match of the regular expression (used to determine the length of the match)
g1, g2, g3: the three groups from the expression
p: the starting position of the match in the source string (but not used here)
s: the source string (but not used here)
The function constructs the replacement for the match from the 3 groups like this:
Left(g1 & "    ", 7) appends 4 spaces to the first group (e.g. GFS) and trims the result to 7 characters. This is based on the assumption that the first group will always be 3-5 characters long. → "GFS    "
"  " & ... & g2 & g3 prepends 2 spaces to the result of the above operation and appends the other 2 groups (8978976 and the spaces that follow it). → "  GFS    8978976" plus the trailing spaces
Left(..., Len(m)+2) then trims the result string to the length of the original match plus 2 characters (to account for the additional 2 spaces inserted to separate the new second column from the former second, now third, column).
First, replace using the regex pattern (\d+)\s+([A-Z]+)(\d+)\s+(\w+) with the replacement $1 $2 $3 $4.
Then split the result on spaces to get the separate columns.
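A minimal sketch of this replacement in VBScript, using a line from the question's sample data. It assumes single-space output is acceptable; it does not preserve any fixed-width padding, so it is not a drop-in replacement if later processing depends on column positions such as iPos.

```vbscript
Option Explicit

Dim re, result, cols
Set re = New RegExp
' number, spaces, company letters, company digits, spaces, trailing flag
re.Pattern = "(\d+)\s+([A-Z]+)(\d+)\s+(\w+)"

result = re.Replace("6876786786 GFS8978976 I", "$1 $2 $3 $4")
WScript.Echo result

' Split the rewritten line into its four columns.
cols = Split(result, " ")
WScript.Echo cols(1) & " / " & cols(2)
```

Because ([A-Z]+) stops at the first digit, the company code and the number that follows it end up in separate groups without listing DDF, GFS and SBSSK explicitly.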