Syntax for specific RegEx in command line FINDSTR call

I am writing a batch script that takes in various arguments before starting another process. In the example below I am checking the case where the first argument is 1 and the second argument has the form "one or more digits 0-9, followed by the letter k, m, or g" (I am specifying the amount of memory the process should start with, e.g. 10g = 10 GB of memory).
If I just want a number this will suffice:
IF [%1] EQU [1] ECHO %2|findstr /r "[^0-9]" > nul
IF [%1] EQU [1] IF errorlevel 1 echo starting test number %1 with %2 of memory
What I thought would be an obvious segue to adding the letters k, m, or g led me to this (I've tried with and without the '*'):
IF [%1] EQU [1] ECHO %2|findstr /r "[^0-9]*[kmg]" > nul
IF [%1] EQU [1] IF errorlevel 1 echo starting test number %1 with %2 of memory
However, I have been unable to match any string with this FINDSTR pattern. Basically, I am looking for a FINDSTR pattern that matches [0-9][0-9]*[kmg]. I am fairly certain I am close but am having trouble working out the correct syntax.

Even the first code you posted does not work. [^0-9] looks for any non-digit. I think you wanted ^[0-9], which means any string that starts with a digit. Your logic is also wrong: FINDSTR sets errorlevel to 0 if found, and 1 if not found. I prefer to use the conditional && and || operators to test the result instead of IF.
I recommend the following for what you are attempting. I've thrown in the /I switch to make it case insensitive, and the /X switch to prevent a match if there are extra characters before or after the number with suffix.
@echo off
if "%~1" equ "1" echo(%~2|findstr /rix "[0-9][0-9]*[kgm]" >nul && (
echo starting test number %~1 with %~2 of memory
)
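To make the /R /I /X combination concrete, here is a small Python model of the check. It is only a sketch: findstr_rix is a hypothetical helper, and Python's re module stands in for FINDSTR, which is reasonable here only because the pattern uses nothing beyond character classes and *. The helper returns the ERRORLEVEL FINDSTR would set: 0 for a match, 1 otherwise.

```python
import re


# Hypothetical helper (not part of any library): returns the ERRORLEVEL that
# findstr /R /I /X would set for a single input line. Python's re stands in
# for FINDSTR's much smaller regex dialect.
def findstr_rix(pattern: str, line: str) -> int:
    # /X -> the whole line must match (fullmatch); /I -> ignore case
    return 0 if re.fullmatch(pattern, line, re.IGNORECASE) else 1


PATTERN = "[0-9][0-9]*[kgm]"  # one or more digits, then k, g or m

for arg in ["10g", "512M", "g", "10", "10gb"]:
    # && in the batch file runs its block only when this value is 0
    print(f"{arg!r}: errorlevel {findstr_rix(PATTERN, arg)}")
```

Because a match gives 0, the && block runs for 10g and 512M and is skipped for the other inputs; an IF errorlevel 1 test, as in the question, reacts to the non-matching inputs instead.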
Unfortunately, FINDSTR does not support the ? meta-character. So the solution is slightly more complicated if the suffix is optional (if you want to support bytes, kilobytes, megabytes, and gigabytes). You would need to search for either of 2 strings, one with the suffix, and one without. FINDSTR breaks the search string into multiple search strings at spaces.
@echo off
if "%~1" equ "1" echo(%~2|findstr /rix "[0-9][0-9]*[kgm] [0-9][0-9]*" >nul && (
echo starting test number %~1 with %~2 of memory
)
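The space-splitting behaviour can be modelled the same way. This Python sketch (findstr_rix_multi is a hypothetical helper, and re only approximates FINDSTR) treats each space-separated piece as its own pattern and reports a match if any piece matches the whole line. For the sample inputs it also checks that the two alternatives accept the same strings as the single pattern [0-9]+[kgm]? would in an engine that supports + and ?.

```python
import re


# Hypothetical helper modelling findstr /R /I /X "pat1 pat2": without /C:,
# FINDSTR splits the search string at spaces, and a line matches if ANY of
# the resulting patterns matches it.
def findstr_rix_multi(search: str, line: str) -> int:
    patterns = search.split(" ")
    hit = any(re.fullmatch(p, line, re.IGNORECASE) for p in patterns)
    return 0 if hit else 1


SEARCH = "[0-9][0-9]*[kgm] [0-9][0-9]*"  # with a suffix, or digits only

for arg in ["4096", "64k", "2G", "k", "1.5g"]:
    status = findstr_rix_multi(SEARCH, arg)
    # A regex engine that supports ? expresses the same rule in one pattern
    same = re.fullmatch("[0-9]+[kgm]?", arg, re.IGNORECASE) is not None
    assert (status == 0) == same
    print(f"{arg!r}: errorlevel {status}")
```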

Related

How can I see if a string is four letters long? – Windows Batch

So I'm working on a Windows Batch script and I want to know if an input string (the name of a file) is exactly four letters long. I want to do it with regular expressions or string matching.
I tried the following but it didn't work...
for /R "%windir%\system32" %%f in (*) do (
set filename=%%~nf
if not "!filename!"=="!filename:[a-z][a-z][a-z][a-z]=!" (
echo %%~nf
)
)
So my code loops through all the files in \system32. Files like mode.com should be echoed, but they are not.
This works:
dir /B "%windir%\system32" | findstr "^[a-z][a-z][a-z][a-z]\."
Tested on Windows 10
Aacini's answer is the best when no recursion is required.
Just in case you need something more flexible (but way slower):
@echo off
for /R "%windir%\system32" %%f in (*) do (
echo %%~nf|findstr /rix "[a-z][a-z][a-z][a-z]" >nul && (
echo %%~ff has a 4 letter filename: %%~nf and a size of %%~zf Bytes
)
)
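Here is a Python approximation of that filter, assuming os.path.splitext is a fair stand-in for the %%~n modifier (both drop only the last extension). has_four_letter_basename is a hypothetical name, and the sample names are made up for illustration.

```python
import os
import re


# Models:  echo %%~nf|findstr /rix "[a-z][a-z][a-z][a-z]"
# %%~nf is the file name without its last extension; os.path.splitext is
# used here as an approximation of that modifier.
def has_four_letter_basename(filename: str) -> bool:
    base, _ext = os.path.splitext(filename)
    # /X -> whole base name must match; /I -> case-insensitive
    return re.fullmatch("[a-z][a-z][a-z][a-z]", base, re.IGNORECASE) is not None


# Sample names only; not a listing of any real directory.
samples = ["mode.com", "tree.com", "notepad.exe", "xcopy.exe", "at.exe", "data.tar.gz", "HOST"]
print([n for n in samples if has_four_letter_basename(n)])
```

data.tar.gz is rejected because its base name is data.tar, and HOST is accepted because /I makes the comparison case-insensitive.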
As implied in my comment, and assuming four characters, not four alphabetic characters:
@For /R "%__AppDir__%" %%A In (*)Do @(Set "FN=%%~nA"
SetLocal EnableDelayedExpansion
If Not "%%~nA"=="!FN:~,3!" If "%%~nA"=="!FN:~,4!" Echo %%~nA
EndLocal)
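The two IF comparisons above test the length without counting anything: a name that differs from its first 3 characters is longer than 3, and a name equal to its first 4 characters is no longer than 4. A quick Python check of that reasoning (is_exactly_four_chars is a hypothetical name):

```python
def is_exactly_four_chars(name: str) -> bool:
    """Hypothetical helper mirroring the two IF tests on %%~nA."""
    # If Not "%%~nA"=="!FN:~,3!"  -> the name is longer than 3 characters
    longer_than_3 = name != name[:3]
    # If "%%~nA"=="!FN:~,4!"      -> the name is at most 4 characters
    at_most_4 = name == name[:4]
    return longer_than_3 and at_most_4


for n in ["", "abc", "abcd", "ab1d", "abcde"]:
    assert is_exactly_four_chars(n) == (len(n) == 4)
    print(repr(n), is_exactly_four_chars(n))
```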
And here's a possible alternative, for four alphabetic characters. Run it 'As administrator' if you're really trying to parse all files inside \Windows\System32\ (not essential, but it may pick up more files):
@Dir /B/S/A-D "%__AppDir__%" 2>NUL|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"
You could put that inside a for-loop if, for some inexplicable reason, you only want the basenames:
@For /F "EOL=?Tokens=*" %%A In ('Dir /B/S/A-D "%__AppDir__%" 2^>NUL^|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"')Do @Echo(%%~nA
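The two space-separated patterns work on the full paths from Dir /S: the first accepts \abcd.ext where nothing after the dot contains another dot, the second accepts \abcd with no extension at all. Below is a Python model, assuming [a-Z] behaves as "any ASCII letter" and [^\.] as "any character except a dot"; matches is a hypothetical helper and the paths are hypothetical examples.

```python
import re

# Assumptions for this model: [a-Z] is read as "any ASCII letter" and
# [^\.] as "any character except a dot".
L4 = "[a-zA-Z]" * 4  # FINDSTR has no {4} quantifier, hence the repetition
PATTERNS = [
    rf"\\{L4}\.[^.]*$",  # ...\abcd.ext, with no further dot after it
    rf"\\{L4}$",         # ...\abcd with no extension at all
]


def matches(path: str) -> bool:
    # Two space-separated patterns: a path is printed if either is found
    # anywhere in the line (FINDSTR searches, like re.search).
    return any(re.search(p, path) for p in PATTERNS)


paths = [
    r"C:\Windows\System32\mode.com",
    r"C:\Windows\System32\HOST",
    r"C:\Windows\System32\notepad.exe",
    r"C:\Windows\System32\data.tar.gz",
]
for p in paths:
    print(p, matches(p))
```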
Try this:
dir /b C:\Windows\system32 | findstr /r "[a-z][a-z][a-z][a-z]"
The problem in your code is that the %variable:search=replace% substitution does not understand regular expressions; [a-z] is taken literally. You need to use findstr for regular expressions.
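Note that without the ^ anchor and the \. after the fourth letter, a pattern like this one matches any name that merely contains four letters in a row, so longer names are listed too. A Python comparison of the two patterns on made-up lowercase names (lowercase so that FINDSTR's case handling does not matter):

```python
import re

# Made-up lowercase names, so FINDSTR's case handling does not matter here.
names = ["mode.com", "notepad.exe", "cmd.exe", "xcopy.exe", "a1b2.dll"]

# "[a-z][a-z][a-z][a-z]" with no anchors: four letters anywhere in the name
unanchored = [n for n in names if re.search("[a-z][a-z][a-z][a-z]", n)]
# "^[a-z][a-z][a-z][a-z]\." : the name must start with four letters and a dot
anchored = [n for n in names if re.search(r"^[a-z][a-z][a-z][a-z]\.", n)]

print("unanchored:", unanchored)
print("anchored:  ", anchored)
```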

Trying to extract a GUID from a text, using batch (findstr + regexp)

I want to isolate a specific string from a text provided in a variable, using batch, but it doesn't seem to work as intended. I may be doing the regexp wrong, or maybe I misunderstood the way "findstr" works.
The specific string that I need to isolate is a GUID (which has a standard format of alphanumeric characters, arranged in groups separated by a "-", like this: 8-4-4-4-12).
@echo off
setlocal enabledelayedexpansion
SET str="This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET rx=[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}
FOR %%u IN ('FINDSTR /r "!rx!" "!str!"') DO ECHO %%u
endlocal
Basically, what I need is to store the GUID in a separate variable, so I can use it later on. If that can be achieved in a different manner, I'm happy to learn!
Thanks!
@ECHO Off
SETLOCAL
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
:: Theoretical
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
SET "wrx=%hn8%-%hn4%-%hn4%-%hn4%-%hn8%%hn4%"
:again
IF NOT DEFINED str ECHO notfound&GOTO done
ECHO %str%|FINDSTR /b /r /i "%wrx%">NUL
IF ERRORLEVEL 1 (
REM did not find string
SET "str=%str:~1%"
GOTO again
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:done
:: BFI method
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
:bfiagain
IF NOT DEFINED str ECHO notfound&GOTO donebfi
:: "regex" using brute-force and ignorance
ECHO %str:~0,9%|FINDSTR /b /i /r "%hn8%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~9,5%|FINDSTR /b /i /r "%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~14,10%|FINDSTR /b /i /r "%hn4%-%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~24,12%|FINDSTR /b /i /r "%hn4%%hn8%">NUL
:bfino
IF ERRORLEVEL 1 (
SET "str=%str:~1%"
GOTO bfiagain
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:donebfi
GOTO :EOF
Well, not so squeezy...
Fundamentally, findstr implements a very small subset of regex. It's intended to locate a character-string in a file.
Theoretically, you could string [a-f0-9] together the requisite number of times and add in the - separators for use as the "regex", then see whether the subject string /b (begins) with such a pattern; lop off the start character if not and repeat until found or subject-string is empty.
Notes here: I believe GUID uses HEX digits only, not alphanumerics. findstr supports /i to have the comparison made case-insensitively (which shortens the individual "character-match" string). Yes - I know ^ can be used in a regex (even one from Uncle Bill's little programmers' toolset) but I prefer /b.
The only small problem with this is that it yielded an out of memory error...
So, feed it small chunks at a time, and it appears happy...
I've done no further testing, and predict stormy weather if your text-string contains characters which cmd regards as specials - the usual suspects like redirectors, % and rabbit's ears.
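The "theoretical" loop can be sketched in Python to show the idea: build the pattern from a repeated hex-digit class (FINDSTR has no {n} quantifier), test whether the remaining text begins with it (like /b), and drop the first character until it does or the text runs out. find_guid is a hypothetical name, and re.match only stands in for FINDSTR /b /r /i.

```python
import re

# Build the pattern the same way as the batch file: the hex-digit class
# is simply repeated instead of using a {n} quantifier.
HN = "[a-f0-9]"
HN4 = HN * 4
HN8 = HN4 * 2
GUID = f"{HN8}-{HN4}-{HN4}-{HN4}-{HN8}{HN4}"

TEXT = "This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"


def find_guid(text: str):
    """Hypothetical helper: return the first GUID-shaped run, or None."""
    s = text
    while s:  # like: IF NOT DEFINED str ECHO notfound&GOTO done
        # re.match anchors at the start, like FINDSTR /b; IGNORECASE ~ /i
        if re.match(GUID, s, re.IGNORECASE):
            return s[:36]  # like: SET "str=%str:~0,36%"
        s = s[1:]  # like: SET "str=%str:~1%"
    return None


print(find_guid(TEXT))
```

In a full regex engine the loop collapses to a single re.search call; the loop is needed in batch because FINDSTR only reports whether a line matches, not where the match starts.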

Ridiculous caret escape sequence when mixing FOR and FINDSTR

I've got a string verification batch that handles rudimentary regex with FINDSTR, and I almost called it quits today when I thought I was unable to properly escape the caret character, until I added over a dozen ^.
Fail-verification Command: stringVerifier.bat "Derpy QFail" "^^^^^^^^^^^^^^^^QFail" /R
Pass-verification Command: stringVerifier.bat "QFail Derpy" "^^^^^^^^^^^^^^^^QFail" /R
stringVerifier.bat
@echo off
REM ===== Verify Sub-String In String Script =====
REM It uses Windows' native findstr.exe command-line application with simplified options and scope for checking strings instead of file strings.
REM Call this script by preceding the command-line call with the word `call` instead of running it directly.
REM
REM Parameter 1 is the string to search through. This must be wrapped in double-quotes.
REM Parameter 2 is the search pattern, e.g., "^QWARN". This must be wrapped in double-quotes.
REM Parameter 3 should be either /R for Regular Expression search or /L for a string-literal search.
REM Parameter 4 is optional. It should be true/false or t/f for case-sensitive searches.
REM Parameter 4 behavior will default to false for case-sensitivity if left out of the commandline parameters when called.
REM
REM You can check success by exit code or if the value of %match_was_found% is true/false or if %match_found% isn't empty.
REM A false value for %match_was_found% means there's no result to check due to no match occurring.
REM A non-empty value for %match_found% always indicates success, and vice-versa.
REM These values reset every time you run this script.
REM Extract between 1 from the front and 1 from the end to strip commandline quotes
Set string_to_search=%1
Set string_to_search=%string_to_search:~1,-1%
Set search_pattern=%2
Set search_pattern=%search_pattern:~1,-1%
Set search_type=%3
Set case_sensitive=%4
IF /i "%case_sensitive%"=="t" Set case_sensitive=true
IF /i "%case_sensitive%"=="f" Set case_sensitive=false
IF /i "%case_sensitive%"=="" Set case_sensitive=false
IF "%string_to_search%"=="" echo You forgot to provide parameter one, %string_to_search%, to specify your string to search, e.g., "Start of line of this string"&&exit /b 1
IF "%search_pattern%"=="" echo You forgot to provide parameter two, %search_pattern%, to specify your search pattern, e.g., "^Start of.*string$"&&exit /b 1
IF "%search_type%"=="" echo You forgot to provide parameter three, %search_type%, to specify your search type, e.g., /R or /L&&exit /b 1
IF /i NOT "%search_type%"=="/R" IF NOT "%search_type%"=="/L" echo You didn't provide the correct third parameter, %search_type%, for /R or /L&&exit /b 1
IF /i NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" IF NOT "%case_sensitive%"=="false" echo You didn't provide the correct fourth, %case_sensitive%, parameter for true or false&&exit /b 1
Set match_was_found=
Set match_found=
Set Command_To_Run=
Set Command_Ender=
Set Command_To_Run=echo.%string_to_search%
IF NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" Set Command_Ender=/I
IF "%search_type%"=="/R" (Set Command_Ender=%Command_Ender% /R %search_pattern%) ELSE (Set Command_Ender=%Command_Ender% /C:%search_pattern%)
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %Command_Ender% ') DO Set match_found=%%i
REM Deleting all variables we don't want retained as temporary env vars.
IF "%match_found%"=="" Set match_was_found=false&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=&&exit /b 1
IF NOT "%match_found%"=="" Set match_was_found=true&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=
REM Comment out this line or add more script logic if you want to disable console output of the matching line
echo %match_found%
exit /b 0
Is there any way to circumvent this ridiculous escape sequence in the batch itself without generated temp files and other such annoyances for escaping regex metacharacters?
The SET syntax you used is problematic.
Without quotes, the carets in any SET command are used to escape the next character.
This line will consume half of your carets
Set search_pattern=%search_pattern:~1,-1%
You should use the extended set syntax:
set "variable=content"
You only need to change a few of your lines to reduce the total number of carets to two.
Set "search_pattern=%~2"
This takes the argument and also removes the quotes.
...
IF "%search_type%"=="/R" (Set "Command_Ender=%Command_Ender% /R %search_pattern%") ELSE (Set "Command_Ender=%Command_Ender% /C:%search_pattern%")
...
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %%Command_Ender%% ') DO Set match_found=%%i
Now you only need to use
stringVerifier.bat "QFail Derpy" "^^QFail" /R
That's because the findstr command on the last line still consumes the carets once.
This could also be avoided with quotes, but then you would have to change your Command_Ender variable to hold only the options, not the search string.
To inspect the code, put a set Command_Ender or set search_pattern at a few points to show the real content of your variables.
You should also have a look at delayed expansion, as delayed expansion never changes the variable content.
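The caret arithmetic can be illustrated with a deliberately simplified Python model of one cmd.exe parsing pass: outside double quotes, ^x becomes x; inside quotes, carets are left alone. It ignores % and ! expansion and every other special character, so treat it as an illustration of the halving, not as an emulation of cmd.exe. caret_pass is a hypothetical name.

```python
def caret_pass(s: str) -> str:
    """Hypothetical, simplified model of one cmd.exe parsing pass:
    outside double quotes ^x becomes x; inside quotes carets stay."""
    out = []
    quoted = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == "^" and not quoted:
            if i + 1 < len(s):  # keep the escaped character literally
                out.append(s[i + 1])
            i += 2  # a lone trailing caret is simply dropped in this model
            continue
        if c == '"':
            quoted = not quoted
        out.append(c)
        i += 1
    return "".join(out)


arg = "^" * 16 + "QFail"
for n in range(1, 5):
    arg = caret_pass(arg)
    print(f"after pass {n}: {arg}")

print(caret_pass("^^QFail"))    # the reduced argument needs only one pass
print(caret_pass('"^^QFail"'))  # quoted text keeps its carets
```

In this model sixteen carets are reduced to a single ^ after four unquoted passes, whereas ^^ needs only one; one pass too many strips the anchor and leaves a plain QFail search.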

Use subpatterns in FINDSTR

I must check the validity of a string stored in a variable. I cannot use external CLI utilities (grep, awk, etc.), so I chose FINDSTR.
The string has this format (in regexp):
([1-9][0-9]*:".*"(|".*")*)
I do not know how to check the subpattern (|".*")*.
Currently my code is:
((ECHO.) | (SET /P "=(11:"a"|"b"|"c")") | (FINDSTR /R /C:"^([1-9][0-9]*:".*")$"))
Regards.
Mat M is correct about the limitation of FINDSTR. The FINDSTR regex support is very primitive and non-standard. Type HELP FINDSTR or FINDSTR /? from the command line to get a brief synopsis of what is supported. For an in depth explanation, refer to What are the undocumented features and limitations of the Windows FINDSTR command?
I like Harry Johnston's comment: it would be quite easy to create a solution using VBScript or JavaScript. I think that would be a much better choice.
But, here is a native batch solution. I've incorporated the extra rule about the number of subpatterns that the OP stated in the comment to Mat M's answer.
The solution is surprisingly tricky. Special characters can cause problems when piping the ECHO output to FINDSTR because of the way pipes work. Each side of the pipe is executed in its own CMD session. The special characters must either be quoted, escaped twice, or only exposed via delayed expansion. I chose to use delayed expansion, but the ! characters must be escaped twice to make sure the delayed expansion occurs at the correct time.
The easiest way to parse a variable number of subpatterns is to replace the delimiter with a newline and use FOR /F to iterate each subpattern.
The top half of my code is a brittle coding harness to conveniently iterate and test a set of strings. It will not work properly with any of <space> ; , = <tab> * or ? in the string. Also, the quotes must be balanced in each string.
But the more important validate routine can handle any string in the var variable.
@echo off
setlocal
set LF=^


::Above 2 blank lines are critical for creating a linefeed variable. Do not remove
set test=a
for %%S in (
"(3:"a"|"c"|"c")"
"(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(4:"a"|"b"|"c")"
"(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(3:"a"|"b"|"c""
"(3:"a"|"b^|c")"
"(3:"a"|"b"|c)"
"(3:"a"|"b"||"c")"
"(3:"a"|"b"|;|"c")"
) do (
set "var=%%~S"
call :validate
)
exit /b
:validate
setlocal enableDelayedExpansion
cmd /v:on /c echo ^^^!var^^^!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
if "!var:||=!" neq "!var!" (call :invalid double pipe fail& exit /b)
for /f "delims=(:" %%N in ("!var!") do set "expectedCount=%%N"
set "str=!var:*:=!"
set "str=!str:~0,-1!"
set foundCount=0
for %%A in ("!LF!") do for /f eol^=^%LF%%LF%^ delims^= %%B in ("!str:|=%%~A!") do (
if %%B neq "%%~B" (call :invalid sub-pattern fail& exit /b)
set /a foundCount+=1
)
if %foundCount% neq %expectedCount% (call :invalid count fail& exit /b)
echo Valid: !var!
exit /b
:invalid
echo Invalid - %*: !var!
exit /b
Here are the results after running the batch script
Valid: (3:"a"|"c"|"c")
Valid: (11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - count fail: (4:"a"|"b"|"c")
Invalid - count fail: (10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - FINDSTR fail: (3:"a"|"b"|"c"
Invalid - sub-pattern fail: (3:"a"|"b|c")
Invalid - sub-pattern fail: (3:"a"|"b"|c)
Invalid - double pipe fail: (3:"a"|"b"||"c")
Invalid - sub-pattern fail: (3:"a"|"b"|;|"c")
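For readers who want to check the rules of :validate outside of cmd.exe, here is a Python sketch of the same sequence of checks (overall shape, double pipe, sub-pattern quoting, count). It is an approximation, not a port: the sub-pattern test mirrors if %%B neq "%%~B" by requiring each piece to start and end with a quote, and empty pieces are skipped because FOR /F skips empty lines. Run against the test strings above, it reproduces the listed results.

```python
import re


def validate(var: str) -> str:
    """Python sketch of the :validate checks; returns the message label."""
    # 1. Overall shape, like: findstr /r /c:"^([1-9][0-9]*:.*)$"
    if not re.fullmatch(r"\([1-9][0-9]*:.*\)", var):
        return "Invalid - FINDSTR fail"
    # 2. An empty sub-pattern shows up as "||"
    if "||" in var:
        return "Invalid - double pipe fail"
    colon = var.index(":")
    # 3. Expected count: the digits between "(" and the first ":"
    expected = int(var[1:colon])
    # 4. Body: everything after the first ":" minus the closing ")"
    body = var[colon + 1:-1]
    found = 0
    # Splitting on "|" plays the role of the LF substitution + FOR /F,
    # which skips empty lines, so empty pieces are skipped as well.
    for piece in (p for p in body.split("|") if p):
        # Each piece must be quoted, mirroring: if %%B neq "%%~B"
        if not (len(piece) >= 2 and piece.startswith('"') and piece.endswith('"')):
            return "Invalid - sub-pattern fail"
        found += 1
    if found != expected:
        return "Invalid - count fail"
    return "Valid"


CASES = [
    '(3:"a"|"c"|"c")',
    '(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")',
    '(4:"a"|"b"|"c")',
    '(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")',
    '(3:"a"|"b"|"c"',
    '(3:"a"|"b|c")',
    '(3:"a"|"b"|c)',
    '(3:"a"|"b"||"c")',
    '(3:"a"|"b"|;|"c")',
]

for case in CASES:
    print(f"{validate(case)}: {case}")
```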
Update
The :validate routine can be simplified a bit by postponing the enablement of delayed expansion until after the CMD /V:ON pipe. This means I no longer have to worry about double escaping the ! on the left side of the pipe.
:validate
cmd /v:on /c echo !var!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
setlocal enableDelayedExpansion
... remainder unchanged
As far as I know, findstr is not able to group regexps, so (|".*")* is a no-no. If you know how many blocks you have, you can duplicate the block in the pattern like this:
FINDSTR /R /C:"^([1-9][0-9]*:\"..*\"|\"..*\"|\"..*\")$"
If you are sure the number of blocks is constant, having empty ones "" if required, then you can check for it this way.
The double quotes inside the expression are ignored unless you prefix them with \.
The ..* construct is meant to replace .+ : one or more characters.
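Translated to Python's re syntax (literal ( ) and | escaped, \" becoming a plain quote, ..* kept as "one or more characters"), the three-block pattern can be tried directly; THREE_BLOCKS is a hypothetical name. One consequence worth knowing: because . also matches quotes and pipes, the pattern checks for at least three blocks rather than exactly three, and ..* rejects a block that is literally "" with nothing inside.

```python
import re

# FINDSTR: ^([1-9][0-9]*:\"..*\"|\"..*\"|\"..*\")$
# In Python re, ( ) and | must be escaped to be literal, \" is just ",
# and ..* (one character, then any number) keeps its meaning.
THREE_BLOCKS = r'^\([1-9][0-9]*:"..*"\|"..*"\|"..*"\)$'

for s in ['(3:"a"|"b"|"c")', '(2:"a"|"b")', '(4:"a"|"b"|"c"|"d")', '(3:"a"|"b"|"")']:
    print(s, bool(re.match(THREE_BLOCKS, s)))
```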

Check a string for a substring in a batch file (Windows)?

Let's say I have some text in a variable called $1. Now I want to check if that $1 contains a certain string. If it contains a certain string I want to print a message. The printing is not the problem, the problem is the check. Any ideas how to do that?
The easiest way in my opinion is this:
set YourString=This is a test
If NOT "%YourString%"=="%YourString:test=%" (
echo Yes
) else (
echo No
)
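Basically, the text between ':' and '=' is the string you are looking for. %YourString:test=% removes every occurrence of "test" from the variable, so if the substring is present the two sides of the comparison are no longer equal; that is why NOT is used in front of the comparison.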
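The same trick in Python, as a sketch: the removal is done case-insensitively because cmd.exe's %var:old=new% substitution ignores case, which is also why this approach cannot do a case-sensitive search. contains_via_replace is a hypothetical name.

```python
import re


def contains_via_replace(s: str, sub: str) -> bool:
    """Hypothetical helper mirroring: If NOT "%s%"=="%s:sub=%" ..."""
    # Remove every occurrence of sub, ignoring case as cmd.exe does
    removed = re.sub(re.escape(sub), "", s, flags=re.IGNORECASE)
    # Any difference means at least one occurrence was removed
    return s != removed


print(contains_via_replace("This is a test", "test"))
print(contains_via_replace("This is a pony", "test"))
```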
The SET search and replace trick works in many cases, but it does not support case sensitive or regular expression searches.
If you need a case sensitive search or limited regular expression support, you can use FINDSTR.
To avoid complications of escaping special characters, it is best if the search string is in a variable and both search and target are accessed via delayed expansion.
You can pipe $1 into the FINDSTR command with the ECHO command. Use ECHO( in case $1 is undefined, and be careful not to add extra spaces. ECHO !$1! will echo ECHO is off. (or on) if $1 is undefined, whereas ECHO(!$1! will echo a blank line if undefined.
FINDSTR will echo $1 if it finds the search string; you don't want that, so you redirect output to nul. FINDSTR sets ERRORLEVEL to 0 if the search string is found, and 1 if it is not found. That is what is used to check if the string was found. The && and || operators are a convenient way to test for a match (ERRORLEVEL 0) or no match (ERRORLEVEL not 0).
The regular expression support is rudimentary, but still useful.
See FINDSTR /? for more info.
This regular expression example will search $1 for "BEGIN" at start of string, "MID" anywhere in middle, and "END" at end. The search is case sensitive by default.
set "search=^BEGIN.*MID.*END$"
setlocal enableDelayedExpansion
echo(!$1!|findstr /r /c:"!search!" >nul && (
echo FOUND
rem any commands can go here
) || (
echo NOT FOUND
rem any commands can go here
)
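For comparison, the BEGIN/MID/END pattern behaves like this in a Python model (findstr_rc is a hypothetical helper, and re only approximates FINDSTR): /C: keeps the search string as one pattern instead of splitting it at spaces, and the match is case-sensitive because /I is not given.

```python
import re

SEARCH = "^BEGIN.*MID.*END$"


def findstr_rc(search: str, line: str) -> int:
    """Hypothetical helper: ERRORLEVEL of  echo(line|findstr /r /c:"search"."""
    # /C: -> one pattern, no splitting at spaces; no /I -> case-sensitive.
    # ^ and $ anchor to the line, and FINDSTR searches, like re.search.
    return 0 if re.search(search, line) else 1


for line in ["BEGIN then MID then END", "begin MID END", "BEGIN END"]:
    print("FOUND" if findstr_rc(SEARCH, line) == 0 else "NOT FOUND", ":", line)
```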
As far as I know cmd.exe has no built-in function which answers your question directly. But it does support replace operation. So the trick is: in your $1 replace the substring you need to test the presence of with an empty string, then check if $1 has changed. If it has then it did contain the substring (otherwise the replace operation would have had nothing to replace in the first place!). See the code below:
set longString=the variable containing (or not containing) some text
@rem replace xxxxxx with the string you are looking for
set tempStr=%longString:xxxxxx=%
if "%longString%"=="%tempStr%" goto notFound
echo Substring found!
goto end
:notFound
echo Substring not found
:end