Regular Expression with findstr (ms-dos) - regex

I am trying to use ms-dos command findstr to find a string and eliminate it from the file.
At the moment I can find an explicit string but I am really struggling with regular expressions.
The file looks something like the below:
PLs - TULIP Report
Output_Format, PLS - TULIP REPORT
NUMLINES, 110907
VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N
[[data below]]
The file is an export from some system and annoyingly has that header in it - so I would like to clean it before using SQL Loader to bring it into an Oracle database.
There's more than just the one file and all would have the same type of header but ever so slightly different in every file.
Although I am happy to first remove the first 2 lines using hardcoded values, e.g.:
findstr /v "PLs - TULIP Report" "c:\myfiles\file1.PRO" > "c:\myfiles\file1.csv"
findstr /v "Output_Format, PLS - TULIP REPORT" "c:\myfiles\file1.csv" > "c:\myfiles\file2.csv"
(note how I do that in 2 steps - any suggestions to make this happen in a single step would be massively appreciated)
The third line is more complicated for me; it will always be in this format:
NUMLINES, 110907
except that the number at the end would be different for each file. So how do I get to find this entire line using a regular expression? I have tried:
findstr /v /b /r "\D+ \s+ \d+"
but without any luck.
FYI, the data in [[data below]] looks like
*,"00000161",456823,"017896532","FU",23.95,3.34,20.61
etc ..
Obviously, I do not want to modify the data area.
I hope the above makes sense,
Thanks

You must exclude single lines, findstr cannot match multiple lines. Just separate the different regexes with a space
findstr /r /b /v "NUMLINES PLs Output_Format" *.txt
^regex1 ^2 ^3
Specifying /b allows you to find matches only at the beginning of the lines and /v excludes those lines.
EDIT:
Of course the usage is
findstr /r /b /v "NUMLINES PLs Output_Format" yourfile > yourtarget
And in yourtarget you will find the data of yourfile except the lines excluded by the regex.
EDIT 2:
Based on your comments you need just to add VARIABLE_TYPES to your regex making it
findstr /r /b /v "NUMLINES PLs Output_Format VARIABLE_TYPES" yourfile > yourtarget
This is the way to complete the whole operation in one single instruction.
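To cross-check what the command above should leave behind, here is a minimal Python sketch of the same filter. It is not part of the original answer: the function name drop_header_lines and the sample lines are hypothetical, and it models /b /v with plain literal prefixes rather than reproducing FINDSTR's regex engine.

```python
# Prefixes taken from the findstr search string above.
HEADER_PREFIXES = ("NUMLINES", "PLs", "Output_Format", "VARIABLE_TYPES")


def drop_header_lines(lines, prefixes=HEADER_PREFIXES):
    """Keep only the lines that do not begin with any prefix (like findstr /b /v)."""
    return [line for line in lines if not line.startswith(tuple(prefixes))]


# Sample content copied from the question.
sample = [
    "PLs - TULIP Report",
    "Output_Format, PLS - TULIP REPORT",
    "NUMLINES, 110907",
    "VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N",
    '*,"00000161",456823,"017896532","FU",23.95,3.34,20.61',
]

for line in drop_header_lines(sample):
    print(line)
```

As with the findstr version, a data line that happens to begin with one of the prefixes would also be dropped.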

Here is a one liner using regex that will exclude all four lines. (I used line continuation so that the code looks better.) Each line must match exactly. I allow for each line to end in any number of spaces because I wasn't sure of your format. Note - FINDSTR regex support is very limited and non-standard. There are many other FINDSTR quirks and bugs. See What are the undocumented features and limitations of the Windows FINDSTR command? for more info.
findstr /vrx /c:"PLs - TULIP Report *"^
/c:"Output_Format, PLS - TULIP REPORT *"^
/c:"NUMLINES, *[0-9]* *"^
/c:"VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N *"^
"c:\myfiles\file1.PRO" >"c:\myfiles\file1.csv"
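To illustrate what the four /x patterns above accept, here is a hedged Python sketch that applies the same expressions as whole-line matches. The name is_header is hypothetical, and Python's re module is only a stand-in: it shows the intent of the patterns, not FINDSTR's quirks.

```python
import re

# Same patterns as the /c: arguments above; " *" allows trailing spaces.
HEADER_PATTERNS = [
    re.compile(r"PLs - TULIP Report *"),
    re.compile(r"Output_Format, PLS - TULIP REPORT *"),
    re.compile(r"NUMLINES, *[0-9]* *"),
    re.compile(r"VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N *"),
]


def is_header(line):
    """Return True if the entire line matches one header pattern (like /x)."""
    return any(pattern.fullmatch(line) for pattern in HEADER_PATTERNS)


print(is_header("NUMLINES, 110907"))
print(is_header('*,"00000161",456823,"017896532","FU",23.95,3.34,20.61'))
```

Because the whole line must match, a data row that merely contains the word NUMLINES is kept.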
If all you need to do is skip the first 4 lines, then you normally should be able to use MORE. But there are some circumstances with large files where MORE can hang, but I can't remember the specifics. Also MORE will convert tabs into a series of spaces.
more +4 "c:\myfiles\file1.PRO" >"c:\myfiles\file1.csv"
Another option is to use a FOR /F loop. The FOR /F skips empty lines, but I don't think that is a concern for you.
>"c:\myfiles\file1.csv" (
for /f "usebackq skip=4 delims=" %%A in ("c:\myfiles\file1.PRO") do echo(%%A
)
If any of your data can begin with a ; then the code gets a bit uglier. You would then want to disable the EOL option by setting it to a line feed character.
set LF=^


::above 2 blank lines are critical - do not remove
>"c:\myfiles\file1.csv" (
for /f usebackq^ skip^=4^ eol^=^%LF%%LF%^ delims^= %%A in ("c:\myfiles\file1.PRO") do echo(%%A
)

Related

Extract value between two strings in batch using a CSV file

I have a CSV file which contains multiple columns. One of these columns is HTML content. My first step is to search for &lt;&lt;&lt; and replace it with <<<; second, I search for &gt;&gt;&gt; and replace it with >>>.
My goal is to create an array in batch. To do this, I would like to search for all elements that follow the scheme above, <<<VALUE>>>, and collect them in an array.
I found the following code but it doesn't work for me...
for /F "tokens=1-2 delims=<<<>>>-" %%a in (temp.csv) do (@echo %%a %%b)
Any suggestions?
UPDATE:
I would like to use regular expressions now, but this doesn't work either...
for /f %%x in ("temp.csv") do (
echo %%x | findstr /r "^<^<^<^(\.\?\*^)^>^>^>"
)
...any help? :)
kind regards,
markus
I'm now using the simple tool batchRex.exe from administrator.de with a regular expression. I use the pattern <<<(.*?)>>> to get my values and save them to a .txt file. Afterwards I read this file line by line into an array to work on further - just in case someone has the same problem ;-)
kind regards
markus

FINDSTR and RegEx issue

I have a batch file that asks for input, stores this input in a var and then uses the var in a ping. I need to make sure that the input matches one of several naming conventions
Naming conventions:
PCX1 can be as high as 100
GENPRT1 can be as high as 100
NETPRT1 can be as high as 100
FAXPRT1 can be as high as 100
So if I enter 12 it will not work, but if I enter PCX12 it will.
Everything in the script works except the regex. How can I get this to work?
if "%sta%" == "findstr %sta% ^PCX[0-9]*[0-9]*[0-9]$ \i" (
echo The syntax is correct
goto PING
) else (
set errmsg=The syntax is wrong
goto START
)
This should help:
^(PCX|GENPRT|NETPRT|FAXPRT)([\d]|0[\d]|00[\d]|0[\d][\d]|[\d][\d]|100)$
FINDSTR's regex flavor is extremely limited. It supports neither alternation (|) nor the ? quantifier, so even very simple problems are going to have very messy solutions. Here's the most compact expression I can come up with:
FINDSTR /R /I "^PCX[1-9]$ ^PCX[1-9][0-9]$ ^PCX100$ ^GENPRT[1-9]$ ^GENPRT[1-9][0-9]$ ^GENPRT100$ ^NETPRT[1-9]$ ^NETPRT[1-9][0-9]$ ^NETPRT100$ ^FAXPRT[1-9]$ ^FAXPRT[1-9][0-9]$ ^FAXPRT100$"
Each space-separated sequence is treated as a separate regex, so this tries to perform up to twelve matches on each string it tests. That's not to say it's slow, but it's a pain in the butt to use when you're used to real regexes.
For reference, here's how I would have written that in a serious regex flavor:
^(PCX|((GEN|NET|FAX)PRT))([1-9][0-9]?|100)$
If you have the option of using a different tool (like PowerShell, which uses .NET's very powerful and feature-rich regex flavor), I strongly recommend you do so.
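As an illustration of the "serious flavor" expression above, here is a minimal Python sketch that validates the four naming conventions. The name is_valid_name is hypothetical, and the re.IGNORECASE flag is an assumption that mirrors the /I switch used in the FINDSTR version.

```python
import re

# The expression from above; fullmatch() supplies the ^ and $ anchors.
NAME_RE = re.compile(r"(PCX|((GEN|NET|FAX)PRT))([1-9][0-9]?|100)", re.IGNORECASE)


def is_valid_name(name):
    """Return True for PCX/GENPRT/NETPRT/FAXPRT followed by a number from 1 to 100."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ("PCX12", "12", "GENPRT100", "PCX101", "PCX0"):
    print(candidate, is_valid_name(candidate))
```

Using fullmatch means trailing spaces, such as those an echo before a pipe can add, make the name invalid, which is the same concern the batch answers have to work around.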
@echo off
setlocal disabledelayedexpansion
:start
set /p "sta=What ? "
cmd /v /d /q /c "(echo(!sta!)" ^
| findstr /i /r /b /e "PCX[0-9]* GENPRT[0-9]* NETPRT[0-9]* FAXPRT[0-9]*" ^
| findstr /r /e "[^0-9][1-9] [^0-9][1-9][0-9] [^0-9]100" > nul
if errorlevel 1 (
echo The syntax is wrong
goto :start
)
echo The syntax is correct
A new cmd instance is used to ensure the tested string will not include any parser added space at the end. The output of the echo command is tested to see if it matches any of the starting strings followed by numbers up to the end. Then it is tested again for a valid number range.
If errorlevel is set, the value does not match the condition and a new value is requested.
If errorlevel is not set, the value is correct.

FindStr with regex

I have a system log file like following:
<\t>Processed 8 rows.<LF>
<\t>Success: 8<LF>
<\t>Skip: 0<LF>
<\t>Error: 0<LF>
<\t>Exceptions: 0<LF>
// other log details
<\t>Processed 8 rows.<LF>
<\t>Success: 6<LF>
<\t>Skip: 1<LF>
<\t>Error: 1<LF>
<\t>Exceptions: 0<LF>
<\t> is the tab character, <LF> is the line feed character.
I need to create a DOS batch file that examines these files and takes action if any Skip, Error or Exceptions count is non-zero.
My idea is to use findstr with a regular expression to locate any line that reports a failure. I have tested this regex:
// Should be one line here
\t+Skip\:\s+([1-9]|[1-9][0-9])\n|
\s+Error\:\s+([1-9]|[1-9][0-9])\n|
\s+Exceptions\:\s+([1-9]|[1-9][0-9])\n
However, findstr does not accept normal regular expression syntax (\t, \s, \n, ...), so I split it into 6 regexes:
findstr /rc:"Skip\:[ ]*[1-9]" %file%
findstr /rc:"Skip\:[ ]*[1-9][0-9]" %file%
findstr /rc:"Error\:[ ]*[1-9]" %file%
findstr /rc:"Error\:[ ]*[1-9][0-9]" %file%
findstr /rc:"Exceptions\:[ ]*[1-9]" %file%
findstr /rc:"Exceptions\:[ ]*[1-9][0-9]" %file%
This job has to be done with a DOS batch file only (sad, but I can't change that). Is there any way to simplify the findstr syntax? Thanks
Only one findstr with all the matching cases
findstr /r /c:"Skip: *[1-9]" /c:"Error: *[1-9]" /c:"Exceptions: *[1-9]" input.txt
Two findstr commands piped. The first one extracts the required lines and the second one searches for problem conditions
findstr /l "Skip: Error: Exceptions:" input.txt | findstr /r /c:": *[1-9].*"
First option is faster as it involves only one command. Second option is less redundant and less prone to typing errors.
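The idea behind both options (any of the three counters followed by a number starting with 1-9) can be sketched in Python as follows. This is only an illustration of the matching logic, not a replacement for the batch-only requirement; the name problem_lines is hypothetical and the sample text is the second block from the question.

```python
import re

# A counter name, a colon, optional whitespace, then a leading non-zero digit.
PROBLEM_RE = re.compile(r"(Skip|Error|Exceptions):\s*[1-9]")


def problem_lines(log_text):
    """Return the stripped lines whose Skip/Error/Exceptions count is non-zero."""
    return [line.strip() for line in log_text.splitlines() if PROBLEM_RE.search(line)]


sample_log = "\tProcessed 8 rows.\n\tSuccess: 6\n\tSkip: 1\n\tError: 1\n\tExceptions: 0\n"

for line in problem_lines(sample_log):
    print(line)
```

Testing only the first digit is enough: any count from 1 upward starts with a digit in 1-9, so the separate two-digit patterns in the question are not needed.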

Writing .txt files from lines 1 to i

I am very close to my answer, but I can't seem to find it.
I am using the findstr command in a batch file to narrow down an entire directory to just one file.
cd ...
findstr /s /m "Desktop" *class.asasm >results1.txt
findstr /m /f:results1.txt "Production" *class.asasm >results2.txt
findstr /n /f:results2.txt "Capabilities" *class.asasm >results3.txt
TASK 1: I need to figure out a way to make findstr search backwards for a fourth string, starting from the line number on which the third string was found.
TASK 2: I need to write the lines of the file named in results2.txt, from line 1 up to the line we arrive at, then insert a .txt file, then write the rest of the original lines.
I am writing an application in VB.Net with Visual Studio and I am having a difficult time figuring out how to complete this process. I am currently having better luck with having the application run batch files that are written within the application.
The correct solution is to find a tool that does this properly. batch/CMD does not.
Here's a script that tells you the line numbers of the 3rd and 4th match. It's probably not exactly what you want, but it is a demonstration of how one can effectively work with line numbers.
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET FILE=TestFile.txt
SET _LINENO=1
SET _MATCHNO=0
SET _THIRDLINENUM=
SET _FOURTHLINENUM=
REM "delims=" keeps the whole line in %%l; note that FOR /F skips empty lines
FOR /F "delims=" %%l IN (%FILE%) DO (
ECHO %%l | FINDSTR "Target" >NUL
IF NOT ERRORLEVEL 1 (
SET /A _MATCHNO=!_MATCHNO!+1
IF !_MATCHNO!==3 SET _THIRDLINENUM=!_LINENO!
IF !_MATCHNO!==4 SET _FOURTHLINENUM=!_LINENO!
)
SET /A _LINENO=!_LINENO!+1
)
@ECHO %_THIRDLINENUM% : %_FOURTHLINENUM%
Here's what's in TestFile.txt
abcdefg
bcdefgh
Target 1
cdefghi
defghij
fghijkl
Target 2
ghijklm
hijklmn
ijklmno
jklmnop
klmnopq
lmnopqr
mnopqrs
Target 3
nopqrst
Target 4
opqrstu
pqrstuv
qrstuvw
rstuvwx
stuvwxy
tuvwxyz
If you insist on using batch/CMD (and I sometimes do when nothing else is available), and you need to get the text on line #n (otherwise, head and tail would do just fine), you could produce a similar loop but replace the code from FINDSTR down to the end of the IF statement with something that compares _LINENO with some other variable, ECHO'ing the line if it is between the two values. I don't know if IF supports logical operators, so you may have to nest the IF statements, like
IF !_LINENO! GEQ %START_LINE% IF !_LINENO! LEQ %END_LINE% @ECHO %%l
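For comparison, the same line-number bookkeeping can be sketched in Python. This is an illustration in a different tool, not the batch script itself; the function names match_line_numbers and line_range are hypothetical, and the data is the TestFile.txt content shown above.

```python
def match_line_numbers(lines, needle):
    """Return the 1-based numbers of all lines containing needle."""
    return [number for number, line in enumerate(lines, start=1) if needle in line]


def line_range(lines, start, end):
    """Return lines start..end, both 1-based and inclusive."""
    return lines[start - 1:end]


TEST_FILE = """abcdefg
bcdefgh
Target 1
cdefghi
defghij
fghijkl
Target 2
ghijklm
hijklmn
ijklmno
jklmnop
klmnopq
lmnopqr
mnopqrs
Target 3
nopqrst
Target 4
opqrstu
pqrstuv
qrstuvw
rstuvwx
stuvwxy
tuvwxyz""".splitlines()

numbers = match_line_numbers(TEST_FILE, "Target")
third, fourth = numbers[2], numbers[3]
print(third, ":", fourth)
print(line_range(TEST_FILE, third, fourth))
```

Unlike FOR /F, this counts empty lines too, so the reported numbers always agree with what a text editor shows.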
assuming you need this (from your first comment):
I still have not found a way to search starting at line xx rather than 1 or to search in reverse order
you can try this (from the command line):
for /r %i in ("file pattern") do @more "%~i" +starting_line |findstr "search string"
for /r = recursively (if you mean really reverse, please explain)
"file pattern" = files to find, eg. "*class.asasm"
starting_line = search starting line, eg. 7 (more +6)
"search string" = your search pattern, eg. "Desktop"
OR search "Desktop Production Capabilities"
AND search |findstr "Desktop"|findstr "Production"|findstr "Capabilities"

Piped Variable Into FINDSTR w/ Regular Expressions and Escaped Double Quotes

I am trying to understand a batch file that was sent to me in order to work around a bug in a third party program while they resolve the issue. Basically they are running a findstr regular expression command in order to determine whether or not the string matches. If it does, then the special characters that should not be stripped out are being added back in manually before it is passed off to the original commandline program.
As best I can tell though, what has been provided does not work or I do not understand it. I am pasting the relevant section of code below.
@echo off
setlocal
set username=%1
shift
echo %username% | findstr /r "^\"[0-9][0-9]*\"" >nul
if not errorlevel 1 (set username=";%username:~0,9%=%username:~10,4%?")
echo %username%
The three pieces I really have questions about are as follows:
I believe the unescaped interpretation of the regular expression above is ^"[0-9][0-9]*" which I think means that the string must begin with a numeric character and then must consist of zero or more additional numeric-only characters in order for a match to be found. Well, FINDSTR seems to be doing something weird with the escaped quotes and I cannot get it to match anything I have tried. If I remove the \" around [0-9][0-9]* then I can get it to work, but it does not properly reject non-numeric characters such as an input string of 123456789O1234 (there is a letter O instead of a zero in that sample string).
What is the point of the >nul
Wouldn't it be better to check for an errorlevel equal to 0 instead of "not errorlevel 1" since it could possibly return an error level of 2?
Anyway, the following code works, but it is not as precise as I would like. I am just looking to understand why the quotes in the regex string are not working. Perhaps this is a limitation of FINDSTR, but I have not come across anything definitive yet.
@echo off
setlocal
set username=%1
shift
echo %username% | findstr /r "^[0-9][0-9]*" >nul
if not errorlevel 1 (set username=";%username:~0,9%=%username:~10,4%?")
echo %username%
I can workaround the problem by repeating the class 14 times since that is the number of characters in my situation (more than 15 classes will cause it to crash - scroll to the bottom). I am still curious as to how this could be achieved more simply, and of course the remaining 2 questions.
EDIT / WORKING SOLUTION
@echo off
setlocal enableDelayedExpansion
set username=%~1
shift
echo !username!|findstr /r /c:"^[0-9][0-9]*$" >nul
if not errorlevel 1 (set username=";!username:~0,9!=!username:~10,4!?")
echo !username!
NOTES:
When I first ran it after modifying my existing code to more closely resemble dbenham's, enableDelayedExpansion gave an error, as did the quotes around setting the username (see below). I can't replicate what I did wrong, but it is all working now (this is in case someone else comes across the same issue).
I had tried the $ for the EOL marker (which is the key to forcing it match numeric content only), but I think that the other problems were getting in the way which made me think it was not the solution. Also, to ensure the $ works don't miss this part of dbenham's answer "...you must also make sure there are no spaces between your echoed value and the pipe symbol."
In short it pretty much seems that trying to put double quotes inside a regex for findstr is wrong syntax/does not work/etc... unless you are actually looking to match " in the string/files you are parsing through. See dbenham's answer for clarity here. As he noted, you can use %~1 to strip the quotes from the argument instead of adding it to your regex (and programmatically add them back in if needed).
Error Message
C:\>sample.bat 123456789
'enableDelayedExpansion' is not recognized as an internal or external command,
operable program or batch file.
'"' is not recognized as an internal or external command,
operable program or batch file.
!username!
Reference Links:
Undocumented features and limitations of the Windows FINDSTR command
Case sensitive anomalies with findstr (not handling case properly in some circumstances)
http://ss64.com/nt/findstr.html
http://www.robvanderwoude.com/findstr.php
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/findstr.mspx
Answering your questions in reverse order:
3) if not errorlevel 1 is probably the same as if %errorlevel%==0 because IF ERRORLEVEL 1 means if ERRORLEVEL is greater than or equal to 1. So putting a NOT in front means if ERRORLEVEL is less than 1. I believe FINDSTR never returns a negative ERRORLEVEL, so the syntax should be OK.
2) The >nul redirects the stdout output of FINDSTR to the nul device, meaning it disables the output. Normally any matching line would be printed. You are only interested in the return code - you don't want to see the output.
1) The original regex will match any input string that starts with a quote, followed by at least one digit, followed by another quote. It ignores any characters that may appear after the 2nd quote.
So the following strings (quotes included) will match:
"0"
"01234"
"0"a
"01234"a
The following strings will not match:
0
01234
""
"0a"
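The two lists above can be reproduced with a small Python sketch. Python's re module stands in for FINDSTR here purely to show what the pattern means once the \" escapes are resolved; it says nothing about FINDSTR's own quote handling, and the helper name matches_original is hypothetical.

```python
import re

# The original regex after unescaping: start of line, quote, one or more digits, quote.
ORIGINAL = re.compile(r'^"[0-9][0-9]*"')


def matches_original(text):
    """Return True if text starts with a quoted run of at least one digit."""
    return ORIGINAL.search(text) is not None


should_match = ['"0"', '"01234"', '"0"a', '"01234"a']
should_not_match = ['0', '01234', '""', '"0a"']

for text in should_match + should_not_match:
    print(text, matches_original(text))
```

Because the pattern has no $ anchor, anything after the closing quote is ignored, which is why "0"a still matches.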
The original code has problems if the number of digits in the matching string reaches a certain length because the ending quote gets stripped causing the closing ) to be quoted and so the rest of the script fails.
I don't understand your requirements so I don't know how to fix the code.
It sounds like you don't want to match strings that have non digits. That means you need to include the end of line marker $ at the end of the regex. But you must also make sure there are no spaces between your echoed value and the pipe symbol.
I believe you probably don't want quotes in your value (or else you should programmatically add them at the very end). You can use %~1 to strip any enclosing quotes from the supplied argument.
If you are looking to check if argument 1 consists of nothing but numeric digits, then you can use:
setlocal enableDelayedExpansion
set "username=%~1"
echo !username!|findstr /r "^[0-9][0-9]*$" >nul
I used delayed expansion because you have no control over what characters are in %1, and if it contains special characters like & or | it will cause problems if you use normal expansion. The syntax I have given is not bullet proof, but it handles most "normal" situations.
It is not necessary in your case, but I prefer to use the /c option, just in case your search string contains spaces. So the above could be written as
echo !username!|findstr /r /c:"^[0-9][0-9]*$" >nul
It seems odd to me that both the original and your modified code simply pass through the username if it does not match your regex. Maybe that is your intent, maybe not.
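Putting the pieces together, the overall logic (strip enclosing quotes, accept digits only, then rebuild the string from fixed positions) can be sketched in Python. This is an illustration under assumptions, not the third party's actual fix: the name reformat_username is hypothetical, the quote stripping only approximates %~1, and the returned value includes the surrounding quotes because set username="..." stores them in batch.

```python
import re


def reformat_username(arg):
    """Mimic the batch logic: digits-only values are rebuilt, others pass through."""
    # Approximation of %~1: drop one pair of enclosing double quotes.
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        value = arg[1:-1]
    else:
        value = arg
    # Same test as findstr /r /c:"^[0-9][0-9]*$"
    if re.fullmatch(r"[0-9][0-9]*", value):
        # %username:~0,9% is value[0:9]; %username:~10,4% is value[10:14].
        return '";{}={}?"'.format(value[0:9], value[10:14])
    return value


print(reformat_username("12345678901234"))
print(reformat_username("123456789O1234"))
```

The second call passes the value through unchanged because it contains the letter O, which is the pass-through behaviour the last paragraph above questions.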