Use subpatterns in FINDSTR - regex

I must check the validity of a string stored in a variable. I cannot use external CLI utilities (grep, awk, etc.), so I chose FINDSTR.
The string has this format (in regexp):
([1-9][0-9]*:".*"(|".*")*)
I do not know how to check the repeated subpattern (|".*").
Currently my code is:
((ECHO.) | (SET /P "=(11:"a"|"b"|"c")") | (FINDSTR /R /C:"^([1-9][0-9]*:".*")$"))
Regards.

Mat M is correct about the limitations of FINDSTR. FINDSTR's regex support is very primitive and non-standard. Type HELP FINDSTR or FINDSTR /? at the command line for a brief synopsis of what is supported. For an in-depth explanation, refer to "What are the undocumented features and limitations of the Windows FINDSTR command?"
I like Harry Johnston's comment: it would be quite easy to create a solution using VBScript or JavaScript, and I think that would be a much better choice.
But, here is a native batch solution. I've incorporated the extra rule about the number of subpatterns that the OP stated in the comment to Mat M's answer.
The solution is surprisingly tricky. Special characters can cause problems when piping ECHO output to FINDSTR, because each side of a pipe is executed in its own CMD session. The special characters must either be quoted, escaped twice, or exposed only via delayed expansion. I chose delayed expansion, but the ! characters must be escaped twice so that the expansion occurs at the correct time.
The easiest way to parse a variable number of subpatterns is to replace the delimiter with a newline and use FOR /F to iterate each subpattern.
The top half of my code is a brittle coding harness to conveniently iterate and test a set of strings. It will not work properly with any of <space> ; , = <tab> * or ? in the string. Also, the quotes must be balanced in each string.
But the more important :validate routine can handle any string in the var variable.
@echo off
setlocal
set LF=^


::Above 2 blank lines are critical for creating a linefeed variable. Do not remove
set test=a
for %%S in (
"(3:"a"|"c"|"c")"
"(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(4:"a"|"b"|"c")"
"(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(3:"a"|"b"|"c""
"(3:"a"|"b^|c")"
"(3:"a"|"b"|c)"
"(3:"a"|"b"||"c")"
"(3:"a"|"b"|;|"c")"
) do (
set "var=%%~S"
call :validate
)
exit /b
:validate
setlocal enableDelayedExpansion
cmd /v:on /c echo ^^^!var^^^!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
if "!var:||=!" neq "!var!" (call :invalid double pipe fail& exit /b)
for /f "delims=(:" %%N in ("!var!") do set "expectedCount=%%N"
set "str=!var:*:=!"
set "str=!str:~0,-1!"
set foundCount=0
for %%A in ("!LF!") do for /f eol^=^%LF%%LF%^ delims^= %%B in ("!str:|=%%~A!") do (
if %%B neq "%%~B" (call :invalid sub-pattern fail& exit /b)
set /a foundCount+=1
)
if %foundCount% neq %expectedCount% (call :invalid count fail& exit /b)
echo Valid: !var!
exit /b
:invalid
echo Invalid - %*: !var!
exit /b
Here are the results after running the batch script
Valid: (3:"a"|"c"|"c")
Valid: (11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - count fail: (4:"a"|"b"|"c")
Invalid - count fail: (10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - FINDSTR fail: (3:"a"|"b"|"c"
Invalid - sub-pattern fail: (3:"a"|"b|c")
Invalid - sub-pattern fail: (3:"a"|"b"|c)
Invalid - double pipe fail: (3:"a"|"b"||"c")
Invalid - sub-pattern fail: (3:"a"|"b"|;|"c")
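For checking the expected results above outside of cmd, here is a minimal Python model of the same :validate steps: the frame check done by FINDSTR, the double-pipe check, splitting on `|`, the per-item quote check, and the count check. It is only a test oracle under the assumption that it mirrors the batch logic; it is not a replacement, since the question rules out external tools. The function name `validate` and its return strings are chosen here to match the messages the script prints.

```python
import re

# Mirrors the FINDSTR step: "(", a count with no leading zero, ":", anything, ")".
FRAME = re.compile(r"\([1-9][0-9]*:.*\)")


def validate(var: str) -> str:
    """Return "valid" or the failure reason printed by the batch :validate routine."""
    if not FRAME.fullmatch(var):
        return "FINDSTR fail"
    if "||" in var:
        return "double pipe fail"
    colon = var.index(":")
    expected = int(var[1:colon])   # the number between "(" and ":"
    body = var[colon + 1:-1]       # drop "(N:" and the closing ")"
    # FOR /F skips empty lines, so empty pieces are ignored here as well.
    items = [piece for piece in body.split("|") if piece]
    for item in items:
        # Batch test: %%B must equal "%%~B", i.e. the item is wrapped in quotes.
        if len(item) < 2 or item[0] != '"' or item[-1] != '"':
            return "sub-pattern fail"
    if len(items) != expected:
        return "count fail"
    return "valid"


if __name__ == "__main__":
    for s in ['(3:"a"|"c"|"c")', '(4:"a"|"b"|"c")', '(3:"a"|"b"|;|"c")']:
        print(validate(s), s)
```

As in the batch version, the `||` test has to be explicit because empty pieces would otherwise be silently skipped.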
Update
The :validate routine can be simplified a bit by postponing the enablement of delayed expansion until after the CMD /V:ON pipe. This means I no longer have to worry about double escaping the ! on the left side of the pipe.
:validate
cmd /v:on /c echo !var!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
setlocal enableDelayedExpansion
... remainder unchanged

As far as I know, findstr is not able to group regexps, so (|".*")* is a no-no. If you know how many blocks you have, you can duplicate the block in the pattern, like this:
FINDSTR /R /C:"^([1-9][0-9]*:\"..*\"|\"..*\"|\"..*\")$"
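This way, provided the number of blocks is constant (padded with empty "" blocks where required), you can check for it. Note that \"..*\" needs at least one character between the quotes, so blocks that may be empty need \".*\" instead.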
The double quotes inside the expression are ignored unless you prefix them with \.
The ..* construct is meant to replace .+ : one or more characters.
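As a sketch of the fixed-count idea, the Python helper below (a hypothetical `findstr_fixed_blocks`, not part of FINDSTR) builds the search string for a given number of blocks, and `to_python_regex` translates it into Python `re` syntax so the pattern can be tried outside cmd. The translation assumes the FINDSTR pattern contains only `\"`, the literal characters `(`, `)` and `|`, and the metacharacters `^ $ . * [ ]`.

```python
import re

BLOCK = r'\"..*\"'  # FINDSTR: a quote, one or more characters, a quote


def findstr_fixed_blocks(n: int) -> str:
    """Build the FINDSTR /R search string for exactly n quoted blocks."""
    return "^([1-9][0-9]*:" + "|".join([BLOCK] * n) + ")$"


def to_python_regex(pattern: str) -> str:
    """Translate the FINDSTR pattern to Python re syntax.

    In FINDSTR, ( ) and | are literal characters, so they are escaped for re;
    \\" is the escaped quote inside /C:"...", which is a plain quote in re.
    """
    pattern = pattern.replace('\\"', '"')
    for ch in "()|":
        pattern = pattern.replace(ch, "\\" + ch)
    return pattern


if __name__ == "__main__":
    print(findstr_fixed_blocks(3))
```

For three blocks this reproduces the search string shown above; a different block count simply needs a different pattern, which is the limitation the answer describes.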

Related

How can I see if a string is four letters long? – Windows Batch

So I'm working on a Windows Batch script and I want to know if an input string (the name of a file) is exactly four letters long. I want to do it with regular expressions or string matching.
I tried the following but it didn't work...
for /R "%windir%\system32" %%f in (*) do (
set filename=%%~nf
if not "!filename!"=="!filename:[a-z][a-z][a-z][a-z]=!" (
echo %%~nf
)
)
So my code loops through all the files in \system32. Files like mode.com should be echoed, but they are not.
This works:
dir /B "%windir%\system32" | findstr "^[a-z][a-z][a-z][a-z]\."
Tested on Windows 10
Aacini's answer is the best when no recursion is required.
Just in case you need something more flexible (but way slower):
@echo off
for /R "%windir%\system32" %%f in (*) do (
echo %%~nf|findstr /rix "[a-z][a-z][a-z][a-z]" >nul && (
echo %%~ff has a 4 letter filename: %%~nf and a size of %%~zf Bytes
)
)
As implied in my comment, and assuming four characters, not four alphabetic characters:
@For /R "%__AppDir__%" %%A In (*)Do @(Set "FN=%%~nA"
SetLocal EnableDelayedExpansion
If Not "%%~nA"=="!FN:~,3!" If "%%~nA"=="!FN:~,4!" Echo %%~nA
EndLocal)
And here's a possible alternative for four alphabetic characters. Run it 'As administrator' if you're really trying to parse all files inside \Windows\System32\ (not essential, but it may pick up more files):
@Dir /B/S/A-D "%__AppDir__%" 2>NUL|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"
You could put that inside a for-loop if, for some inexplicable reason, you want only the basenames:
@For /F "EOL=?Tokens=*" %%A In ('Dir /B/S/A-D "%__AppDir__%" 2^>NUL^|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"')Do @Echo(%%~nA
Try this:
dir /b C:\Windows\system32 | findstr /r "[a-z][a-z][a-z][a-z]"
The problem in your code is the regular expression syntax: SET string substitution does not understand regular expressions, so you need to use findstr for them.
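To make the anchoring difference between these answers concrete, here is a small Python model (an illustration, not something cmd can run): `findstr /rix "[a-z][a-z][a-z][a-z]"` requires the whole line to be exactly four letters, while an unanchored `findstr /r "[a-z][a-z][a-z][a-z]"` accepts any line that merely contains four consecutive letters. Both functions are case-insensitive here so the comparison is about anchoring only; that is an assumption, because FINDSTR ranges such as `[a-z]` follow a collation order rather than plain ASCII.

```python
import re

FOUR = "[a-z][a-z][a-z][a-z]"


def exact_four_letters(name: str) -> bool:
    """Like findstr /r /i /x: the whole line must match."""
    return re.fullmatch(FOUR, name, re.IGNORECASE) is not None


def contains_four_letters(name: str) -> bool:
    """Like findstr /r with no /x and no anchors: a match anywhere is enough."""
    return re.search(FOUR, name, re.IGNORECASE) is not None


for base in ["mode", "MODE", "notepad", "cmd", "ab12"]:
    print(base, exact_four_letters(base), contains_four_letters(base))
```

A name such as `notepad` passes only the unanchored test, which is why /X (or `^...$` anchors) matters here.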

Trying to extract a GUID from a text, using batch (findstr + regexp)

I want to isolate a specific string from a text provided in a variable, using batch, but it doesn't seem to work as intended. I may do the regexp wrong, or maybe I misunderstood the way "findstr" works.
The specific string that I need to isolate is a GUID (which has a standard format of alphanumeric characters, arranged in groups separated by "-", like this: 8-4-4-4-12).
@echo off
setlocal enabledelayedexpansion
SET str="This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET rx=[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}
FOR %%u IN ('FINDSTR /r "!rx!" "!str!"') DO ECHO %%u
endlocal
Basically, what I need is to store the GUID in a separate variable, so I can use it later on. If that can be achieved in a different manner, I'm happy to learn!
Thanks!
@ECHO Off
SETLOCAL
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
:: Theoretical
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
SET "wrx=%hn8%-%hn4%-%hn4%-%hn4%-%hn8%%hn4%"
:again
IF NOT DEFINED str ECHO notfound&GOTO done
ECHO %str%|FINDSTR /b /r /i "%wrx%">NUL
IF ERRORLEVEL 1 (
REM did not find string
SET "str=%str:~1%"
GOTO again
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:done
:: BFI method
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
:bfiagain
IF NOT DEFINED str ECHO notfound&GOTO donebfi
:: "regex" using brute-force and ignorance
ECHO %str:~0,9%|FINDSTR /b /i /r "%hn8%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~9,5%|FINDSTR /b /i /r "%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~14,10%|FINDSTR /b /i /r "%hn4%-%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~24,12%|FINDSTR /b /i /r "%hn4%%hn8%">NUL
:bfino
IF ERRORLEVEL 1 (
SET "str=%str:~1%"
GOTO bfiagain
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:donebfi
GOTO :EOF
Well, not so squeezy...
Fundamentally, findstr implements a very small subset of regex. It's intended to locate a character-string in a file.
Theoretically, you could string [a-f0-9] together the requisite number of times and add in the - separators for use as the "regex", then see whether the subject string /b (begins) with such a pattern; lop off the start character if not and repeat until found or subject-string is empty.
Notes: I believe a GUID uses hex digits only, not alphanumerics. findstr supports /i to make the comparison case-insensitive (which shortens the individual "character-match" string). Yes, I know ^ can be used in a regex (even one from Uncle Bill's little programmers' toolset), but I prefer /b.
The only small problem with this is that it yielded an out of memory error...
So, feed it small chunks at a time, and it appears happy...
I've done no further testing, and I predict stormy weather if your text string contains characters that cmd regards as special: the usual suspects like redirectors, % and rabbit's ears (double quotes).
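The "lop off the first character and retry" loop of the theoretical method can be sketched in Python as below. It builds the same pattern from `[a-f0-9]` pieces, tests whether the remaining text begins with it (the counterpart of FINDSTR /B with /I), and drops one character per iteration. This illustrates the technique only; in Python itself you would simply call `re.search` on the whole string.

```python
import re

HN = "[a-f0-9]"
HN4 = HN * 4
HN8 = HN4 * 2
# 8-4-4-4-12 hex digits, built the same way as %wrx% in the batch script.
WRX = f"{HN8}-{HN4}-{HN4}-{HN4}-{HN8}{HN4}"


def find_guid(text: str):
    """Return the first GUID-shaped substring, or None ("notfound")."""
    while text:
        # re.match anchors at the start, like FINDSTR /B; IGNORECASE mirrors /I.
        if re.match(WRX, text, re.IGNORECASE):
            return text[:36]  # 8+1+4+1+4+1+4+1+12 characters
        text = text[1:]  # lop off the first character and try again
    return None


sample = "This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
print(find_guid(sample))
```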

Ridiculous caret escape sequence when mixing FOR and FINDSTR

I've got a string verification batch that handles rudimentary regex with FINDSTR, and I almost called it quits today when I thought I was unable to properly escape the caret character, until I added over a dozen ^ characters.
Fail-verification Command: stringVerifier.bat "Derpy QFail" "^^^^^^^^^^^^^^^^QFail" /R
Pass-verification Command: stringVerifier.bat "QFail Derpy" "^^^^^^^^^^^^^^^^QFail" /R
stringVerifier.bat
@echo off
REM ===== Verify Sub-String In String Script =====
REM It uses Windows' native findstr.exe command-line application with simplified options and scope for checking strings instead of file strings.
REM Call this script by preceding the command-line call with the word `call` instead of directly running it.
REM
REM Parameter 1 is the string to search through. This must be wrapped in double-quotes.
REM Parameter 2 is the search pattern, e.g., "^QWARN". This must be wrapped in double-quotes.
REM Parameter 3 should be either /R for Regular Expression search or /L for a string-literal search.
REM Parameter 4 is optional. It should be true/false or t/f for case-sensitive searches.
REM Parameter 4 behavior will default to false for case-sensitivity if left out of the commandline parameters when called.
REM
REM You can check success by exit code or if the value of %match_was_found% is true/false or if %match_found% isn't empty.
REM A false value for %match_was_found% means there's no result to check due to no match occurring.
REM A non-empty value for %match_found% always indicates success, and vice-versa.
REM These values reset every time you run this script.
REM Extract between 1 from the front and 1 from the end to strip commandline quotes
Set string_to_search=%1
Set string_to_search=%string_to_search:~1,-1%
Set search_pattern=%2
Set search_pattern=%search_pattern:~1,-1%
Set search_type=%3
Set case_sensitive=%4
IF /i "%case_sensitive%"=="t" Set case_sensitive=true
IF /i "%case_sensitive%"=="f" Set case_sensitive=false
IF /i "%case_sensitive%"=="" Set case_sensitive=false
IF "%string_to_search%"=="" echo You forgot to provide parameter one, %string_to_search%, to specify your string to search, e.g., "Start of line of this string"&&exit /b 1
IF "%search_pattern%"=="" echo You forgot to provide parameter two, %search_pattern%, to specify your search pattern, e.g., "^Start of.*string$"&&exit /b 1
IF "%search_type%"=="" echo You forgot to provide parameter three, %search_type%, to specify your search type, e.g., /R or /L&&exit /b 1
IF /i NOT "%search_type%"=="/R" IF NOT "%search_type%"=="/L" echo You didn't provide the correct third parameter, %search_type%, for /R or /L&&exit /b 1
IF /i NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" IF NOT "%case_sensitive%"=="false" echo You didn't provide the correct fourth, %case_sensitive%, parameter for true or false&&exit /b 1
Set match_was_found=
Set match_found=
Set Command_To_Run=
Set Command_Ender=
Set Command_To_Run=echo.%string_to_search%
IF NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" Set Command_Ender=/I
IF "%search_type%"=="/R" (Set Command_Ender=%Command_Ender% /R %search_pattern%) ELSE (Set Command_Ender=%Command_Ender% /C:%search_pattern%)
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %Command_Ender% ') DO Set match_found=%%i
REM Deleting all variables we don't want retained as temporary env vars.
IF "%match_found%"=="" Set match_was_found=false&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=&&exit /b 1
IF NOT "%match_found%"=="" Set match_was_found=true&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=
REM Comment out this line or add more script logic if you want to disable console output of the matching line
echo %match_found%
exit /b 0
Is there any way to circumvent this ridiculous escape sequence in the batch itself without generated temp files and other such annoyances for escaping regex metacharacters?
The SET syntax you used is problematic.
Without quotes, every SET command treats carets as escape characters for the next character, so the carets are consumed.
This line alone consumes half of your carets:
Set search_pattern=%search_pattern:~1,-1%
You should use the extended set syntax:
set "variable=content"
You only need to change a few of your lines to reduce the number of carets to two.
REM %~2 takes the argument and also removes the surrounding quotes
Set "search_pattern=%~2"
...
IF "%search_type%"=="/R" (Set "Command_Ender=%Command_Ender% /R %search_pattern%") ELSE (Set "Command_Ender=%Command_Ender% /C:%search_pattern%")
...
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %%Command_Ender%% ') DO Set match_found=%%i
Now you only need to use
stringVerifier.bat "QFail Derpy" "^^QFail" /R
That's because the final findstr call still consumes one level of carets.
This could also be avoided with quotes, but then your Command_Ender variable must hold only the options, not the search string.
To debug, insert set Command_Ender or set search_pattern at a few points to display the actual contents of your variables.
You should also have a look at delayed expansion, as delayed expansion never changes the variable content.
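The caret arithmetic can be checked with a simplified Python model of one cmd parsing pass: outside double quotes, `^` escapes the next character and is itself removed; inside double quotes it is kept. The model ignores every other cmd rule (percent expansion, delayed expansion, other special characters). If each unquoted pass halves the carets, then 16 carets reaching findstr as one corresponds to four unquoted passes (16, 8, 4, 2, 1), and two carets are enough once only the final findstr call consumes them.

```python
def caret_pass(line: str) -> str:
    """One simplified cmd parsing pass over `line` (caret and quote rules only)."""
    out = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "^" and not in_quotes:
            i += 1  # drop the caret and keep the next character literally
            if i < len(line):
                out.append(line[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def apply_passes(text: str, n: int) -> str:
    """Apply n unquoted parsing passes in a row."""
    for _ in range(n):
        text = caret_pass(text)
    return text


print(apply_passes("^" * 16 + "QFail", 4))  # four unquoted passes
print(apply_passes("^^QFail", 1))           # only findstr's pass remains
print(caret_pass('"^^QFail"'))              # quoted: carets are kept
```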

Linefeed in batch regex

I want to match all lines of the following text with FINDSTR /R
LABO_A =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = LABO)
)
)
I already tried the advice in "What are the undocumented features and limitations of the Windows FINDSTR command?", especially the "Searching across line breaks" part, but unfortunately it didn't work.
My approach is the following:
SETLOCAL
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R "LABO_A.=.!CR!*!LF!.*(DESCRIPTION.=.!CR!*!LF!.*(ADDRESS.=.(PROTOCOL.=.TCP)(HOST.=.host01)(PORT.=.1521))!CR!*!LF!.*(CONNECT_DATA.=!CR!*!LF!.*(SERVICE_NAME.=.LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH%
Am I missing something? Or is the batch regex simply not powerful enough to realize this?
SOLUTION:
The approach of @dbenham made me reconsider my regex string, so I edited it to
FINDSTR /R /C:"LABO_A =!CR!*!LF!.*(DESCRIPTION =!CR!*!LF!.*(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF!.*(CONNECT_DATA =!CR!*!LF!.*(SERVICE_NAME = LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH% > NUL
I removed some unnecessary whitespace and adapted the FINDSTR parameters.
Now it works.
Your regex is wrong. Your source lines end immediately after the =, but the extra . in your regex is looking for an additional character after the =.
It looks to me like you are using . to represent whitespace. I think you would be better off using actual spaces, but then you need the /C option.
The following matches the lines successfully.
@echo off
SETLOCAL
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R /C:"LABO_A =!CR!*!LF! *(DESCRIPTION =!CR!*!LF! *(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF! *(CONNECT_DATA =!CR!*!LF! *(SERVICE_NAME = LABO)!CR!*!LF! *)!CR!*!LF! *)" test.txt
Note that even though all lines in the regex are matched, only the first line of the matching set is printed.
I suspect that the line breaks are not required in your configuration file. Here is another variation that allows more flexibility in the whitespace.
@echo off
setlocal enableDelayedExpansion
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
set "ws=[ !cr!!lf!]*"
FINDSTR /RX /C:"LABO_A =!ws!(DESCRIPTION =!ws!(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!ws!(CONNECT_DATA =!ws!(SERVICE_NAME = LABO)!ws!)!ws!)!ws!" test.txt
I also attempted to allow white space in every place I thought possible, but that exceeded FINDSTR's maximum REGEX string length.
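For comparison, the same "fixed fragments joined by optional whitespace" idea looks like this in Python, where a regex can span line breaks directly. The fragments come from the question and `[ \r\n]*` plays the role of `!ws!`; the indentation and CRLF line endings in the sample text are assumptions.

```python
import re

FRAGMENTS = [
    "LABO_A =",
    "(DESCRIPTION =",
    "(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))",
    "(CONNECT_DATA =",
    "(SERVICE_NAME = LABO)",
    ")",
    ")",
]
WS = r"[ \r\n]*"  # same class as !ws!: spaces, carriage returns, linefeeds
PATTERN = re.compile(WS.join(re.escape(f) for f in FRAGMENTS))

sample = (
    "LABO_A =\r\n"
    "  (DESCRIPTION =\r\n"
    "    (ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))\r\n"
    "    (CONNECT_DATA =\r\n"
    "      (SERVICE_NAME = LABO)\r\n"
    "    )\r\n"
    "  )\r\n"
)
print(bool(PATTERN.search(sample)))
```

Because the separator is `[ \r\n]*`, the same pattern also matches the entry written on a single line, which is the flexibility the answer aims for.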
Essentially, batch regex isn't powerful enough; SED would no doubt be better.
Nonetheless, here's a way to detect that a sequence of lines appears in a file. It's a little restricted, but should suffice for the sequence you've nominated. It assumes that leading spaces are not significant.
@ECHO OFF
SETLOCAL enabledelayedexpansion
FOR /f "delims==" %%a IN ('set l_ 2^>nul') DO SET "%%a="
SET /a lines=0
FOR /f "tokens=*" %%a IN (q19859936.txt) DO SET /a lines+=1&SET l_!lines!=%%a
SET hits=0
SET "stop="
FOR /f "tokens=*" %%a IN (q19859936.test) DO (
SET l_0=%%~a
CALL :test
IF DEFINED stop GOTO done
)
:done
IF DEFINED stop (ECHO FOUND ) ELSE (ECHO NOT FOUND)
GOTO :EOF
:test
SET /a hits+=1
ECHO IF NOT "!l_%hits%!"=="%l_0%"
IF NOT "!l_%hits%!"=="%l_0%" SET hits=0&IF %hits%==1 (GOTO :eof) ELSE (GOTO test)
IF %hits%==%lines% SET stop=Y
GOTO :eof
[edited code 20131111T1408Z - first FOR had tokens=2]
The initial FOR ensures that variables L_* are cleared.
The file q19859936.txt is read as the line-sequence-to-be-detected data.
q19859936.test is then examined. Each line is assigned to L_0 in turn and the internal subroutine :test will check to see whether it matches the next-line-expected.
The IF NOT statement is significant, and seemingly illogical (you'd need to add the /i switch to make it case-insensitive, if you want that). When batch parses the line, %hits% is replaced by the then-current value of hits and THEN the line is executed, so hits is reset to 0 whenever a mismatch is found. If the HITS count WAS not 1, the test is repeated. This takes care of the case
matches line 1
matches line 2
matches line 3
matches line 1
matches line 2
matches line 3
matches line 4
matches line 5
matches line 6
where the second "line 1" is encountered when "line 4" was expected. HITS is thus changed to 0, but it WAS 4, so execution passes back to :test and the test is repeated with HITS=1.
Another approach would have been to read lines into another array (say L#*) and test that L_* matched L#* for %LINES% entries; on no match, ripple up and assign the next line read to L#!lines!. I only thought of that later, though. It would probably be easier and better, too; I'll leave it as an exercise for whoever may be interested.
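The "second array and ripple up" alternative amounts to a sliding window over the most recent N lines. A minimal Python sketch of that idea follows, assuming leading spaces are insignificant and empty lines are skipped (as FOR /F does); the function name `contains_sequence` is hypothetical.

```python
from collections import deque


def contains_sequence(wanted, lines):
    """Return True if the lines in `wanted` occur consecutively in `lines`.

    Leading spaces are ignored and empty lines are skipped, mirroring
    FOR /F "tokens=*".
    """
    wanted = [w.rstrip("\r\n").lstrip() for w in wanted]
    if not wanted:
        return True
    window = deque(maxlen=len(wanted))  # the most recent len(wanted) lines
    for raw in lines:
        line = raw.rstrip("\r\n").lstrip()
        if not line:
            continue  # FOR /F skips empty lines
        window.append(line)  # the oldest line drops out: the "ripple up"
        if list(window) == wanted:
            return True
    return False
```

Because the window always holds the latest lines, overlapping cases such as A, A, B inside A, A, A, B are found without any restart logic.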
This will work if you are after the LABO_A reference.
It uses a helper batch file called findrepl.bat from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
type "file.txt" | findrepl "^LABO_A =" /e:"^ \)"
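Assuming findrepl prints each block that starts at a line matching the first regex and ends at the next line matching the /E: regex (check the script's own help for its exact semantics), the idea can be modeled in Python as follows. The function name and the OTHER/NEXT entries are hypothetical, and the end pattern `^ \)` only works if the final closing parenthesis is indented by exactly one space, as assumed in the sample.

```python
import re


def extract_blocks(lines, start, end):
    """Yield each line from a `start` match through the next `end` match."""
    start_rx, end_rx = re.compile(start), re.compile(end)
    inside = False
    for line in lines:
        if not inside and start_rx.search(line):
            inside = True
        if inside:
            yield line
            if end_rx.search(line):
                inside = False


tns = [
    "OTHER =",
    " (DESCRIPTION = (SERVICE_NAME = OTHER))",
    "LABO_A =",
    " (DESCRIPTION =",
    "   (ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))",
    "   (CONNECT_DATA =",
    "     (SERVICE_NAME = LABO)",
    "   )",
    " )",
    "NEXT =",
]
for line in extract_blocks(tns, r"^LABO_A =", r"^ \)"):
    print(line)
```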

Syntax for specific RegEx in command line FINDSTR call

I am writing a batch script that takes in various arguments before starting another process. In the example below I am checking the case where the first argument is 1 and the second argument is in the form "any number of digits 0-9, followed by the letter k, m, or g" (I am specifying the amount of memory the process should start with, i.e., 10g = 10 GB of memory).
If I just want a number this will suffice:
IF [%1] EQU [1] ECHO %2|findstr /r "[^0-9]" > nul
IF [%1] EQU [1] IF errorlevel 1 echo starting test number %1 with %2 of memory
What I thought would be an obvious segue to adding the letters k, m, or g led me to this (I've tried with and without the '*'):
IF [%1] EQU [1] ECHO %2|findstr /r "[^0-9]*[kmg]" > nul
IF [%1] EQU [1] IF errorlevel 1 echo starting test number %1 with %2 of memory
However I have been unable to match any string to this FINDSTR pattern. Basically I am looking for a FINDSTR that matches [0-9][0-9]*[kmg]. I am fairly certain I am close but am having trouble working out the correct syntax.
Even the first code you posted does not work. [^0-9] looks for any non-digit. I think you wanted ^[0-9], which means any string that starts with a digit. Your logic is also wrong: FINDSTR sets errorlevel to 0 if found, and 1 if not found. I prefer to use the conditional && and || operators to test the result instead of IF.
I recommend the following for what you are attempting. I've thrown in the /I switch to make it case-insensitive, and the /X switch to prevent a match when there are extra characters before or after the number and suffix.
@echo off
if "%~1" equ "1" echo(%~2|findstr /rix "[0-9][0-9]*[kgm]" >nul && (
echo starting test number %~1 with %~2 of memory
)
Unfortunately, FINDSTR does not support the ? meta-character. So the solution is slightly more complicated if the suffix is optional (if you want to support bytes, kilobytes, megabytes, and gigabytes). You would need to search for either of 2 strings, one with the suffix, and one without. FINDSTR breaks the search string into multiple search strings at spaces.
@echo off
if "%~1" equ "1" echo(%~2|findstr /rix "[0-9][0-9]*[kgm] [0-9][0-9]*" >nul && (
echo starting test number %~1 with %~2 of memory
)
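The behaviour of the two final patterns can be modeled with Python's `re.fullmatch` (the counterpart of /X) plus `re.IGNORECASE` (for /I). FINDSTR treats a space-separated search string as alternatives, modeled here as a list of patterns of which any one may match. The helper name `findstr_rix_model` and the sample values are illustrative assumptions.

```python
import re


def findstr_rix_model(value: str, search: str) -> bool:
    """Model of findstr /r /i /x "<search>": any space-separated pattern may match the whole line."""
    return any(re.fullmatch(p, value, re.IGNORECASE) for p in search.split(" "))


SUFFIX_REQUIRED = "[0-9][0-9]*[kgm]"
SUFFIX_OPTIONAL = "[0-9][0-9]*[kgm] [0-9][0-9]*"

for value in ["10g", "512M", "1024", "10gb", "g"]:
    print(value,
          findstr_rix_model(value, SUFFIX_REQUIRED),
          findstr_rix_model(value, SUFFIX_OPTIONAL))
```

The second search string accepts a bare number because its second alternative has no suffix, which is how the answer works around the missing `?` metacharacter.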