How to use Findstr regex to match a dash (-)? - regex

Trying to use findstr to match text that follows the format below:
PTB-14
AIR-217
The problem I'm having is that I just can't seem to get findstr to match on the dash, -. I've created the script below in a batch file:
set dash=-
echo.%dash%
echo !dash! | findstr /i /r /C:- >nul
if errorlevel 1 (
echo "ERROR!" >&2
)
I've tried the regex with /C:-, /C:"-", /C:"\-" and /C:"\\-". I just can't seem to get it matched. Anyone know what I am doing wrong?

Actually, there is no need for a regular expression (/R) here; a literal search string - is enough. Case-insensitivity (/I) also makes little sense with non-letter characters.
Anyway, I think the problem in your code is that you do not have delayed expansion enabled, although you are trying to use it, so echo !dash! actually echoes !dash! literally.
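The difference is easy to see in isolation. The following is a minimal sketch (the setlocal toggling is only there for demonstration and is not part of the original script) that echoes the same expression with delayed expansion disabled and then enabled:

```batch
@echo off
set "dash=-"

rem With delayed expansion disabled, !dash! is not treated as a variable reference.
setlocal DisableDelayedExpansion
echo(!dash!
endlocal

rem With delayed expansion enabled, !dash! expands to the variable's value.
setlocal EnableDelayedExpansion
echo(!dash!
endlocal
```

The first echo should print !dash! literally, the second a single dash.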
To solve that, there are a few options:
Place setlocal EnableDelayedExpansion before your code and (optionally) place endlocal after it in order to enable delayed expansion in the parent cmd instance that executes your batch file, like this:
setlocal EnableDelayedExpansion
set "dash=-"
echo(%dash%
echo(!dash!| > nul findstr /C:"-" || >&2 echo ERROR^^!
endlocal
A pipe | initiates two new cmd instances for either side, which do not have delayed expansion enabled, even if the parent cmd instance does. However, you can explicitly initiate another cmd instance on the left side of the pipe with delayed expansion enabled (/V):
set "dash=-"
echo(%dash%
cmd /V /C echo(^!dash^!| > nul findstr /C:"-" || >&2 echo ERROR!
The exclamation marks are escaped (^!) so that they are not consumed by the parent cmd instance in case delayed expansion is enabled there; if it is not, the escaping does no harm.
In the above code fragments, I additionally changed the following:
set dash=- has become set "dash=-", as this is the most secure syntax that prevents unintended trailing white-spaces and accepts even special characters like ^, &, (, ), <, > and |;
echo. has become echo(, which is the only reliable syntax, although it looks odd;
the SPACE in front of the pipe | has disappeared, because it would also have been echoed;
if ErrorLevel 1 has been replaced by the conditional command concatenation operator ||, which lets the following command execute only in case an error occurred, or, technically speaking, in case the exit code of the previous command was non-zero;
echo "ERROR!" >&2 has been changed to >&2 echo ERROR^^! or >&2 echo ERROR! so that neither the quotation marks "" nor the SPACE before >&2 are echoed; the double-escaping of the ! is needed to display it in case delayed expansion is enabled.
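Since the question is ultimately about strings like PTB-14 and AIR-217, here is a minimal sketch of how the whole format could be checked once the dash matches. It assumes the format is exactly three letters, a dash, and one or more digits (an assumption drawn from the two samples only); the variable names ticket, L and D are hypothetical helpers:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value and helper variables (not from the question).
set "ticket=PTB-14"
set "L=[ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
set "D=[0123456789]"
rem Three letters, a literal dash, then one or more digits, anchored at both ends.
rem There is no space before the pipe, so no trailing space ends up in the tested line.
echo(!ticket!| findstr /R /I /C:"^%L%%L%%L%-%D%%D%*$" > nul && (echo match) || (echo no match)
endlocal
```

Outside a character class the dash needs no escaping in a findstr regex, which is why a plain - is used here.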

Related

Which regex method is best for validating user input? (for /f with delims vs. echo %var%|Findstr /ri)

I would like to validate a user's input and limit the input to alphanumeric characters only (underscores may be allowed as well), but i'm not sure which method is best for this.
I've seen various examples on SA and the first one that raises some questions for me is the following one:
:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" >nul || (
goto input
)
I see a second case that's identical to the first one (with the exception of the leading ^ and the trailing *$).
Why is the extra case and ^ *$ needed when the following also works?:
:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul || (
goto input
)
Finally, there is the FOR /F loop method I've noticed on here as well:
for /f "delims=1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%a in ("%in%") do goto :input
Is there any (dis)advantage in using this over the aforementioned FINDSTR regex one?
For safely validating user input, both methods are reliable, but you must improve them:
findstr method
At first, let us focus on the search string like ^[...][...]*$ (where ... stands for a character class, meaning a set of characters): A character class [...] matches any one character from set ...; * means repetition, so matching zero or more occurrences, hence [...]* matches zero or more occurrences of characters from set ...; therefore, [...][...]* matches one or more occurrences of characters from set .... The leading ^ anchors the match to the beginning of the line, the trailing $ anchors it to the end; therefore, when both anchors are specified, the entire line must match the search string.
Concerning character classes [...]: According to the thread What are the undocumented features and limitations of the Windows FINDSTR command?, classes are buggy; for instance, the class [A-Z] matches small letters b to z, and [a-z] matches capital letters A to Y (this does of course not matter in case a case-insensitive search is done, so when /I is given); the class [0-9] may match ² or ³, depending on the current code page; [A-Z] and [a-z] may match special letters like Á or á, for example, also depending on current code page. Hence to safely match certain characters only, do not use ranges, but specify each character individually, like [0123456789], [ABCDEFGHIJKLMNOPQRSTUVWXYZ] or [abcdefghijklmnopqrstuvwxyz].
All this leads us to the following findstr command line:
findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
Nevertheless, the whole approach with the piped echo might still fail, because special characters like ", &, ^, %, !, (, ), <, >, | could lead to syntax errors or other unintended behaviour. To avoid that, we need to establish delayed expansion, so the special characters become hidden from the command parser. However, since pipes (|) initialise new cmd instances for either side (which inherit the current environment), we need to ensure to do the actual variable expansion in the left child cmd instance rather than in the parent one, like this:
:INPUT
set "IN="
set /P IN="Please enter your username: "
cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul || goto :INPUT
The extra explicit cmd instance is needed to enable delayed expansion (/V), because the instances initiated by the pipe have delayed expansion disabled.
The doubled escaping of the exclamation marks ^^! is only needed in case delayed expansion is also enabled in the parent cmd instance; if not, single escaping ^! would be sufficient, but doubled escaping does no harm.
for /F method
This approach makes life easier, because there is no pipe involved and so, you do not have to deal with multiple cmd instances, but there is still room for improvement. Again, special characters may cause trouble, so delayed expansion needs to be enabled.
The for /F loop ignores empty lines and those beginning with the default eol character, the semicolon ;. To effectively disable the eol option, simply set it to one of the delimiter characters, so eol becomes hidden behind delims. Empty lines are not iterated, so the goto command in your approach would never execute in case of empty user input. Therefore, we must capture empty user input explicitly, using an if statement. All this leads to the following code:
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "
if not defined IN goto :INPUT
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do goto :INPUT
endlocal
This approach detects capital letters only; to include small letters as well, you have to add them to the delims option: delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
Note that variable IN is no longer available beyond endlocal, but this should be the very last command of your script anyway.
To detect whether or not a for /F loop iterated, there is an undocumented feature we can make use of: for /F returns a non-zero exit code if it does not iterate, hence the conditional execution operators && or || can be used. Here the loop body is a no-op (rem/): when the input contains an invalid character, the loop iterates and the command after && (goto :INPUT) runs; when the input consists solely of delimiter characters (that is, valid input), the loop does not iterate and the command after && is skipped. For this to work, the for /F loop must be enclosed within parentheses:
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "
if not defined IN goto :INPUT
(for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && goto :INPUT
endlocal
First, you have to reference the environment variable in using delayed expansion to avoid batch file execution aborting with a syntax error when the user enters a string with critical characters like ><|&". Always take into account that a variable referenced as %variable% is expanded before the command line is executed, which can easily break batch execution when the variable holds user input.
Second, it is strongly recommended to immediately verify if the user has input anything at all after the prompt, i.e. use if not defined in goto input after the prompt command line.
Third, I think the FOR method is better because of being faster.
FINDSTR is not an internal command of cmd.exe like FOR. So when FINDSTR is specified in a batch file without path and without file extension, the Windows command interpreter must first search for this executable and hopefully finds %SystemRoot%\System32\findstr.exe via PATHEXT and PATH.
Next, with an anti-virus process running in the background, the execution of findstr.exe triggers an anti-virus scan, which delays execution.
Executing an application like FINDSTR always takes the Windows command interpreter a bit longer than executing an internal command of cmd.exe, even with no anti-virus scan process running. So the FOR loop approach is most likely (not verified by me) faster than the FINDSTR approach.
When using FINDSTR, the regular expression characters ^ and *$ are needed because the search string [0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ] alone results in a positive match if the processed line contains at least one digit or letter anywhere. So it is not checked whether the line (= the string value of the variable) consists of only digits and letters. The shorter character class definitions [0-9A-Z] (combined with option /I) or [0-9A-Za-z] can't be used in this case, as explained by aschipfl in his comment below.
The ^ specifies that the match must start at the beginning of the line, the * that zero or more further digits or letters must follow, and the $ that the match must extend to the end of the line. In other words, the entire line (the user input, already checked to be non-empty) must consist of only digits and letters for a positive match.
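The effect of the anchors can be tried out with a fixed test string. This is a minimal sketch; the sample input abc#123 and the helper variable CLASS are made up for illustration:

```batch
@echo off
rem Hypothetical helper variable holding the explicit character class.
set "CLASS=[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]"

rem Unanchored: succeeds as soon as one character of the line is in the class.
echo(abc#123| findstr /R /I "%CLASS%" > nul && echo unanchored: accepted

rem Anchored: fails because # is not in the class and the whole line must match.
echo(abc#123| findstr /R /I "^%CLASS%%CLASS%*$" > nul || echo anchored: rejected
```

Both echo lines should appear, showing that only the anchored form rejects input containing a character outside the class.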
For every internal or external command, help can be obtained by running the command with /? as parameter from within a command prompt window. Try it out by opening a command prompt window and running findstr /? and for /? and set /?.

FINDSTR and RegEx issue

I have a batch file that asks for input, stores this input in a var and then uses the var in a ping. I need to make sure that the input matches one of several naming conventions
Naming conventions:
PCX1 can be as high as 100
GENPRT1 can be as high as 100
NETPRT1 can be as high as 100
FAXPRT1 can be as high as 100
So if i enter 12 it will not work but if I enter PCX12 it will.
Everything in the script works except the regex. How can i get this to work?
if "%sta%" == "findstr %sta% ^PCX[0-9]*[0-9]*[0-9]$ \i" (
echo The syntax is correct
goto PING
) else (
set errmsg=The syntax is wrong
goto START
)
This should help:
^(PCX|GENPRT|NETPRT|FAXPRT)([\d]|0[\d]|00[\d]|0[\d][\d]|[\d][\d]|100)$
FINDSTR's regex flavor is extremely limited. It supports neither alternation (|) nor the optional quantifier (?), so even very simple problems are going to have very messy solutions. Here's the most compact expression I can come up with:
FINDSTR /R /I "^PCX[1-9]$ ^PCX[1-9][0-9]$ ^PCX100$ ^GENPRT[1-9]$ ^GENPRT[1-9][0-9]$ ^GENPRT100$ ^NETPRT[1-9]$ ^NETPRT[1-9][0-9]$ ^NETPRT100$ ^FAXPRT[1-9]$ ^FAXPRT[1-9][0-9]$ ^FAXPRT100$"
Each space-separated sequence is treated as a separate regex, so this tries to perform up to twelve matches on each string it tests. That's not to say it's slow, but it's a pain in the butt to use when you're used to real regexes.
For reference, here's how I would have written that in a serious regex flavor:
^(PCX|((GEN|NET|FAX)PRT))([1-9][0-9]?|100)$
If you have the option of using a different tool (like PowerShell, which uses .NET's very powerful and feature-rich regex flavor), I strongly recommend you do so.
@echo off
setlocal disabledelayedexpansion
:start
set /p "sta=What ? "
cmd /v /d /q /c "(echo(!sta!)" ^
| findstr /i /r /b /e "PCX[0-9]* GENPRT[0-9]* NETPRT[0-9]* FAXPRT[0-9]*" ^
| findstr /r /e "[^0-9][1-9] [^0-9][1-9][0-9] [^0-9]100" > nul
if errorlevel 1 (
echo The syntax is wrong
goto :start
)
echo The syntax is correct
A new cmd instance is used to ensure the tested string will not include any parser added space at the end. The output of the echo command is tested to see if it matches any of the starting strings followed by numbers up to the end. Then it is tested again for a valid number range.
If errorlevel is set, the value does not match the condition and a new value is requested.
If errorlevel is not set, the value is correct.

Windows Batch - check if string starts with ... in a loop

this grabs the output from a remote branch list with git::
for /f "delims=\" %r in ('git branch -r') do (
::then in here I want to get rid of the origin/HEAD -> line of the output
::and do a few git ops on the other lines, which are the names of branches
)
anyhow, I'm finding this frustrating as apparently batch doesn't have regex
here's the code I'm using to do this in bash
for remote in `git branch -r | grep -v '\->'`;
do echo $remote; #...git stuff
done;
the grep removes the "origin/HEAD -> origin/master" line of the output of git branch -r
So I'm hoping to ask how to implement the 'contains' verb
for /f "delims=\" %r in ('git branch -r') do (
if not %r contains HEAD (
::...git stuff
)
)
on a loop variable
this stackoverflow question appears to answer a similar question, although in my attempts to implement as such, I became confused by % symbols and no permutation of them yielded function
EDIT FOR FUTURE READERS: there is some regex with findstr /r piped onto git branch -r
for /f "delims=\" %%r in ('git branch -r^|findstr "HEAD"') do (
echo ...git stuff %%r
)
should give you a start.
Note: %%r, not %r within a batch file - %r would work directly from the prompt.
Your delims=\ filter will produce that portion up to the first \ of any line from git branch -r which contains HEAD - sorry, I don't talk bash-ish; you'd need to say precisely what the HEAD string you want to locate is.
Use "delims=" for the entire line - omitting the delims option will set delimiters to the default set (space, comma, semicolon, etc.)
Don't use ::-comments within a block (parenthesised statement-sequence) as it's actually a broken label and cmd doesn't appreciate labels within a block. Use REM comments here instead.
The resultant strings output from the findstr (which acts on a brain-dead version of regex) will be processed through to the echo (or whatever statement you may substitute here) - if there are none, the for will appear to be skipped.
Quite what your target string would be for findstr I can't tell. From the prompt, findstr /? may reveal. You may also be able to use find (find /?) - but if you are using cygwin the *nix version of find overrides windows-native.
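If the goal is the same as the bash grep -v '\->' in the question, findstr's /V switch (print only lines that do not contain a match) gets close. The following is a minimal sketch for a batch file, assuming that the only line to drop is the one containing the literal text ->:

```batch
@echo off
rem Run git, drop every line containing the literal "->", and loop over the rest.
rem "tokens=*" keeps the whole line but strips the leading spaces git prints.
for /f "tokens=*" %%r in ('git branch -r ^| findstr /V /C:"->"') do (
    rem ...git stuff with %%r goes here
    echo %%r
)
```

The pipe is escaped as ^| so that it is passed on to the command run by for /f rather than being interpreted on the for line itself; /C: makes findstr treat -> as a literal string.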
I don't know what the git branch output looks like, but with a test case of
test 1
HEAD test \-> 2
test 3
test 4
the following prints all the text lines except the one containing \->
@setlocal enableextensions enabledelayedexpansion
@echo off
for /f "tokens=*" %%r in (d:\test2.txt) do (
set str1=%%r
if "!str1:\->=!"=="!str1!" (
echo %%r
)
)
The if test is fundamentally doing this: string1.replace("\->", "") == string1.
Your loop variable needs to be %r if used directly in the command prompt, but %%r if in a batch file.
String replacement works on environment variables, not loop variables, so the value needs to be copied into a holding variable (str1) first. This requires command extensions to be enabled (enableextensions).
And because %variables% inside a parenthesised block are expanded when the block is parsed rather than when each command runs, you need to override that with enabledelayedexpansion and use !str1! instead of %str1%; otherwise the value of str1 won't change from one loop iteration to the next.
(PS. Use PowerShell instead. Get-Content D:\test2.txt | Select-String "\->" -NotMatch ).

FindStr with regex

I have a system log file like following:
</t>Processed 8 rows.<LF>
</t>Success: 8<LF>
</t>Skip: 0<LF>
</t>Error: 0<LF>
</t>Exceptions: 0<LF>
// other log details
</t>Processed 8 rows.<LF>
</t>Success: 6<LF>
</t>Skip: 1<LF>
</t>Error: 1<LF>
</t>Exceptions: 0<LF>
<\t> is tab character, <LF> is line feed character.
My job is to create a DOS batch file that examines these files and takes action if any Skip, Error or Exceptions are found.
What I have in mind is using findstr with a regular expression to locate any line that reports a failure. I have tested this regex:
// Should be one line here
\t+Skip\:\s+([1-9]|[1-9][0-9])\n|
\s+Error\:\s+([1-9]|[1-9][0-9])\n|
\s+Exceptions\:\s+([1-9]|[1-9][0-9])\n
However, findstr does not accept the usual regular expression escapes (\t, \s, \n, ...), so I split it into 6 regexes:
findstr /rc:"Skip\:[ ]*[1-9]" %file%
findstr /rc:"Skip\:[ ]*[1-9][0-9]" %file%
findstr /rc:"Error\:[ ]*[1-9]" %file%
findstr /rc:"Error\:[ ]*[1-9][0-9]" %file%
findstr /rc:"Exceptions\:[ ]*[1-9]" %file%
findstr /rc:"Exceptions\:[ ]*[1-9][0-9]" %file%
This job is required to use DOS batch only (it's sad, but I can't change that). Is there any way to simplify the findstr syntax? Thanks
Only one findstr with all the matching cases
findstr /r /c:"Skip: *[1-9]" /c:"Error: *[1-9]" /c:"Exceptions: *[1-9]" input.txt
Two findstr commands piped. First one extracts the required lines and second one search for problem conditions
findstr /l "Skip: Error: Exceptions:" input.txt | findstr /r /c:": *[1-9].*"
First option is faster as it involves only one command. Second option is less redundant and less prone to typing errors.
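To actually take action on the result, the exit code of findstr can be tested. This is a minimal sketch built on the first option; the variable file and the echoed messages are placeholders, not part of the original job:

```batch
@echo off
rem Hypothetical log file name, taken from the example command above.
set "file=input.txt"

rem Output is discarded; only the exit code matters here.
findstr /r /c:"Skip: *[1-9]" /c:"Error: *[1-9]" /c:"Exceptions: *[1-9]" "%file%" > nul

rem "if not errorlevel 1" is true only for exit code 0, i.e. at least one matching line.
if not errorlevel 1 (
    echo Problems found in %file%
) else (
    echo No problems found in %file%
)
```

Note that the else branch is also taken when findstr itself fails, for example because the file does not exist, so a separate existence check may be worth adding.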

Piped Variable Into FINDSTR w/ Regular Expressions and Escaped Double Quotes

I am trying to understand a batch file that was sent to me in order to work around a bug in a third party program while they resolve the issue. Basically they are running a findstr regular expression command in order to determine whether or not the string matches. If it does, then the special characters that should not be stripped out are being added back in manually before it is passed off to the original commandline program.
As best I can tell though, what has been provided does not work or I do not understand it. I am pasting the relevant section of code below.
@echo off
setlocal
set username=%1
shift
echo %username% | findstr /r "^\"[0-9][0-9]*\"" >nul
if not errorlevel 1 (set username=";%username:~0,9%=%username:~10,4%?")
echo %username%
The three pieces I really have questions about are as follows:
I believe the unescaped interpretation of the regular expression above is ^"[0-9][0-9]*" which I think means that the string must begin with a numeric character and then must consist of zero or more additional numeric-only characters in order for a match to be found. Well, FINDSTR seems to be doing something weird with the escaped quotes and I cannot get it to match anything I have tried. If I remove the \" around [0-9][0-9]* then I can get it to work, but it does not properly reject non-numeric characters such as an input string of 123456789O1234 (there is a letter O instead of a zero in that sample string).
What is the point of the >nul
Wouldn't it be better to check for an errorlevel equal to 0 instead of "not errorlevel 1" since it could possibly return an error level of 2?
Anyway, the following code works, but it is not as precise as I would like. I am just looking to understand why the quotes in the regex string are not working. Perhaps this is a limitation of FINDSTR, but I have not come across anything definitive yet.
@echo off
setlocal
set username=%1
shift
echo %username% | findstr /r "^[0-9][0-9]*" >nul
if not errorlevel 1 (set username=";%username:~0,9%=%username:~10,4%?")
echo %username%
I can workaround the problem by repeating the class 14 times since that is the number of characters in my situation (more than 15 classes will cause it to crash - scroll to the bottom). I am still curious as to how this could be achieved more simply, and of course the remaining 2 questions.
EDIT / WORKING SOLUTION
@echo off
setlocal enableDelayedExpansion
set username=%~1
shift
echo !username!|findstr /r /c:"^[0-9][0-9]*$" >nul
if not errorlevel 1 (set username=";!username:~0,9!=!username:~10,4!?")
echo !username!
NOTES:
When I first ran it after modifying my existing code to more closely resemble dbenham's, enableDelayedExpansion gave an error as did the quotes around setting the username (see below). I can't replicate what I did wrong, but it is all working now (this is in case someone else comes across the same issue).
I had tried the $ for the EOL marker (which is the key to forcing it match numeric content only), but I think that the other problems were getting in the way which made me think it was not the solution. Also, to ensure the $ works don't miss this part of dbenham's answer "...you must also make sure there are no spaces between your echoed value and the pipe symbol."
In short it pretty much seems that trying to put double quotes inside a regex for findstr is wrong syntax/does not work/etc... unless you are actually looking to match " in the string/files you are parsing through. See dbenham's answer for clarity here. As he noted, you can use %~1 to strip the quotes from the argument instead of adding it to your regex (and programmatically add them back in if needed).
Error Message
C:>sample.bat 123456789
'enableDelayedExpansion' is not recognized as an internal or external command,
operable program or batch file.
'"' is not recognized as an internal or external command,
operable program or batch file.
!username!
Reference Links:
Undocumented features and limitations of the Windows FINDSTR command
Case sensitive anomalies with findstr (not handling case properly in some circumstances)
http://ss64.com/nt/findstr.html
http://www.robvanderwoude.com/findstr.php
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/findstr.mspx
Answering your questions in reverse order:
3) if not errorlevel 1 is probably the same as if %errorlevel%==0 because IF ERRORLEVEL 1 means if ERRORLEVEL is greater than or equal to 1. So putting a NOT in front means if ERRORLEVEL is less than 1. I believe FINDSTR never returns a negative ERRORLEVEL, so the syntax should be OK.
2) The >nul redirects the stdout output of FINDSTR to the nul device, meaning it disables the output. Normally any matching line would be printed. You are only interested in the return code - you don't want to see the output.
1) The original regex will match any input string that starts with a quote, followed by at least one digit, followed by another quote. It ignores any characters that may appear after the 2nd quote.
So the following strings (quotes included) will match:
"0"
"01234"
"0"a
"01234"a
The following strings will not match:
0
01234
""
"0a"
The original code has problems if the number of digits in the matching string reaches a certain length because the ending quote gets stripped causing the closing ) to be quoted and so the rest of the script fails.
I don't understand your requirements so I don't know how to fix the code.
It sounds like you don't want to match strings that have non digits. That means you need to include the end of line marker $ at the end of the regex. But you must also make sure there are no spaces between your echoed value and the pipe symbol.
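The effect of that space can be seen with a small test. This is a minimal sketch with a made-up sample value; the variable name value is hypothetical:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value consisting of digits only.
set "value=12345"

rem The space before the pipe is echoed too, so the line ends in a space and $ cannot match after a digit.
echo !value! | findstr /r /c:"^[0-9][0-9]*$" > nul || echo with space: no match

rem Without the space, the line consists of digits only and the anchored regex matches.
echo !value!| findstr /r /c:"^[0-9][0-9]*$" > nul && echo without space: match
endlocal
```

Both messages should be printed, which shows that the same value passes or fails depending only on the space before the pipe.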
I believe you probably don't want quotes in your value (or else you should programmatically add them at the very end). You can use %~1 to strip any enclosing quotes from the supplied argument.
If you are looking to check if argument 1 consists of nothing but numeric digits, then you can use:
setlocal enableDelayedExpansion
set "username=%~1"
echo !username!|findstr /r "^[0-9][0-9]*$" >nul
I used delayed expansion because you have no control over what characters are in %1, and if it contains special characters like & or | it will cause problems if you use normal expansion. The syntax I have given is not bullet proof, but it handles most "normal" situations.
It is not necessary in your case, but I prefer to use the /c option, just in case your search string contains spaces. So the above could be written as
echo !username!|findstr /r /c:"^[0-9][0-9]*$" >nul
It seems odd to me that both the original and your modified code simply pass through the username if it does not match your regex. Maybe that is your intent, maybe not.