Trying to extract a GUID from a text, using batch (findstr + regexp) - regex

I want to isolate a specific string from a text provided in a variable, using batch, but it doesn't seem to work as intended. I may do the regexp wrong, or maybe I misunderstood the way "findstr" works.
The specific string that I need to isolate is a GUID (which has a standard format of alphanumeric characters, arranged in groups separated by a "-", like this: 8-4-4-4-12).
@echo off
setlocal enabledelayedexpansion
SET str="This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET rx=[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}
FOR %%u IN ('FINDSTR /r "!rx!" "!str!"') DO ECHO %%u
endlocal
Basically, what I need is to store the GUID in a separate variable, so I can use it later on. If that can be achieved in a different manner, I'm happy to learn!
Thanks!

@ECHO Off
SETLOCAL
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
:: Theoretical
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
SET "wrx=%hn8%-%hn4%-%hn4%-%hn4%-%hn8%%hn4%"
:again
IF NOT DEFINED str ECHO notfound&GOTO done
ECHO %str%|FINDSTR /b /r /i "%wrx%">NUL
IF ERRORLEVEL 1 (
REM did not find string
SET "str=%str:~1%"
GOTO again
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:done
:: BFI method
SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET "hn=[a-f0-9]"
SET "hn4=%hn%%hn%%hn%%hn%"
SET "hn8=%hn4%%hn4%"
:bfiagain
IF NOT DEFINED str ECHO notfound&GOTO donebfi
:: "regex" using brute-force and ignorance
ECHO %str:~0,9%|FINDSTR /b /i /r "%hn8%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~9,5%|FINDSTR /b /i /r "%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~14,10%|FINDSTR /b /i /r "%hn4%-%hn4%-">NUL
IF ERRORLEVEL 1 GOTO bfino
ECHO %str:~24,12%|FINDSTR /b /i /r "%hn4%%hn8%">NUL
:bfino
IF ERRORLEVEL 1 (
SET "str=%str:~1%"
GOTO bfiagain
)
SET "str=%str:~0,36%"
ECHO found "%str%"
:donebfi
GOTO :EOF
Well, not so squeezy...
Fundamentally, findstr implements a very small subset of regex. It's intended to locate a character-string in a file.
Theoretically, you could string [a-f0-9] together the requisite number of times and add in the - separators for use as the "regex", then see whether the subject string /b (begins) with such a pattern; lop off the start character if not and repeat until found or subject-string is empty.
Notes here: I believe GUID uses HEX digits only, not alphanumerics. findstr supports /i to have the comparison made case-insensitively (which shortens the individual "character-match" string). Yes - I know ^ can be used in a regex (even one from Uncle Bill's little programmers' toolset) but I prefer /b.
The only small problem with this is that it yielded an out of memory error...
So, feed it small chunks at a time, and it appears happy...
I've done no further testing, and predict stormy weather if your text-string contains characters which cmd regards as specials - the usual suspects like redirectors, % and rabbit's ears.
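As a small illustration of why the character class has to be written out once per character, here is a sketch assuming only the sample text from the question. findstr has no {n} repetition count, so the pattern in the question cannot work as written. The echoed messages are my own wording.

```batch
@echo off
setlocal
rem Build "eight hex digits" by repetition, because findstr has no {8} quantifier.
set "hn=[0-9a-f]"
set "hn4=%hn%%hn%%hn%%hn%"
set "hn8=%hn4%%hn4%"

rem Matches: eight hex digits followed by a dash at the beginning of the line.
echo 359f975d-|findstr /b /i /r "%hn8%-" >nul && echo repeated classes: match

rem Does not match: {8} is taken as the literal characters {, 8 and }.
echo 359f975d-|findstr /b /i /r "%hn%{8}-" >nul || echo {8} quantifier: no match
endlocal
```

Both messages should be printed: the first because the repeated classes match, the second because the {8} form finds nothing.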

Related

How can I see if a string is four letters long? – Windows Batch

So I'm working on a Windows Batch script and I want to know if an input string (the name of a file) is exactly four letters long. I want to do it with regular expressions or string matching.
I tried the following but it didn't work...
for /R "%windir%\system32" %%f in (*) do (
set filename=%%~nf
if not "!filename!"=="!filename:[a-z][a-z][a-z][a-z]=!" (
echo %%~nf
)
)
So my code loops through all the files in \system32. The files like mode.com should be echoed, but it's not the case.
This works:
dir /B "%windir%\system32" | findstr "^[a-z][a-z][a-z][a-z]\."
Tested on Windows 10
Aacini's answer is the best when no recursion is required.
Just in case you need something more flexible (but way slower):
@echo off
for /R "%windir%\system32" %%f in (*) do (
echo %%~nf|findstr /rix "[a-z][a-z][a-z][a-z]" >nul && (
echo %%~ff has a 4 letter filename: %%~nf and a size of %%~zf Bytes
)
)
As implied in my comment, and assuming four characters, not four alphabetic characters:
@For /R "%__AppDir__%" %%A In (*)Do @(Set "FN=%%~nA"
SetLocal EnableDelayedExpansion
If Not "%%~nA"=="!FN:~,3!" If "%%~nA"=="!FN:~,4!" Echo %%~nA
EndLocal)
And here's a possible alternative, for four alphabetic characters. Run it 'As administrator' if you're really trying to parse all files inside \Windows\System32\, (not essential but may pick up more files):
@Dir /B/S/A-D "%__AppDir__%" 2>NUL|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"
You could put that inside a for-loop if, for some inexplicable reason, you only want only the basenames:
@For /F "EOL=?Tokens=*" %%A In ('Dir /B/S/A-D "%__AppDir__%" 2^>NUL^|"%__AppDir__%findstr.exe" "\\[a-Z][a-Z][a-Z][a-Z]\.[^\.]*$ \\[a-Z][a-Z][a-Z][a-Z]$"')Do @Echo(%%~nA
Try this:
dir /b C:\Windows\system32 | findstr /r "[a-z][a-z][a-z][a-z]"
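The problem in your code is that variable substring substitution (!filename:...=!) treats its search text literally; it does not understand regular expressions. You need to use findstr for regular expressions.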
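To make the difference concrete, here is a minimal sketch using the file name mode from the question. It compares the substitution from the question with a findstr test; the echoed messages are only illustrative.

```batch
@echo off
setlocal EnableDelayedExpansion
set "filename=mode"

rem Substitution searches for the literal text [a-z][a-z][a-z][a-z],
rem which "mode" does not contain, so the value comes back unchanged.
if "!filename!"=="!filename:[a-z][a-z][a-z][a-z]=!" echo substitution changed nothing

rem findstr does understand character classes; /x requires the whole line to match.
echo %filename%|findstr /r /i /x "[a-z][a-z][a-z][a-z]" >nul && echo %filename% is four letters
endlocal
```

Note there is no space before the pipe, so echo does not add a trailing space that would make the /x whole-line match fail.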

Ridiculous caret escape sequence when mixing FOR and FINDSTR

I've got a string verification batch that handles rudimentary regex with FINDSTR and almost called it quits today when I thought I was unable to properly escape the caret character until I added over a dozen ^.
Fail-verification Command: stringVerifier.bat "Derpy QFail" "^^^^^^^^^^^^^^^^QFail" /R
Pass-verification Command: stringVerifier.bat "QFail Derpy" "^^^^^^^^^^^^^^^^QFail" /R
stringVerifier.bat
@echo off
REM ===== Verify Sub-String In String Script =====
REM It uses Windows' native findstr.exe commandline application with simplified options and scope for checking strings instead of file strings.
REM Call this script by preceding the commandline call with the word `call` instead of directly running it.
REM
REM Parameter 1 is the string to search through. This must be wrapped in double-quotes.
REM Parameter 2 is the search pattern, e.g., "^QWARN". This must be wrapped in double-quotes.
REM Parameter 3 should be either /R for Regular Expression search or /L for a string-literal search.
REM Parameter 4 is optional. It should be true/false or t/f for case-sensitive searches.
REM Parameter 4 behavior will default to false for case-sensitivity if left out of the commandline parameters when called.
REM
REM You can check success by exit code or if the value of %match_was_found% is true/false or if %match_found% isn't empty.
REM A false value for %match_was_found% means there's no result to check due to no match occurring.
REM A non-empty value for %match_found% always indicates success, and vice-versa.
REM These values reset every time you run this script.
REM Extract between 1 from the front and 1 from the end to strip commandline quotes
Set string_to_search=%1
Set string_to_search=%string_to_search:~1,-1%
Set search_pattern=%2
Set search_pattern=%search_pattern:~1,-1%
Set search_type=%3
Set case_sensitive=%4
IF /i "%case_sensitive%"=="t" Set case_sensitive=true
IF /i "%case_sensitive%"=="f" Set case_sensitive=false
IF /i "%case_sensitive%"=="" Set case_sensitive=false
IF "%string_to_search%"=="" echo You forgot to provide parameter one, %string_to_search%, to specify your string to search, e.g., "Start of line of this string"&&exit /b 1
IF "%search_pattern%"=="" echo You forgot to provide parameter two, %search_pattern%, to specify your search pattern, e.g., "^Start of.*string$"&&exit /b 1
IF "%search_type%"=="" echo You forgot to provide parameter three, %search_type%, to specify your search type, e.g., /R or /L&&exit /b 1
IF /i NOT "%search_type%"=="/R" IF NOT "%search_type%"=="/L" echo You didn't provide the correct third parameter, %search_type%, for /R or /L&&exit /b 1
IF /i NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" IF NOT "%case_sensitive%"=="false" echo You didn't provide the correct fourth, %case_sensitive%, parameter for true or false&&exit /b 1
Set match_was_found=
Set match_found=
Set Command_To_Run=
Set Command_Ender=
Set Command_To_Run=echo.%string_to_search%
IF NOT "%case_sensitive%"=="" IF NOT "%case_sensitive%"=="true" Set Command_Ender=/I
IF "%search_type%"=="/R" (Set Command_Ender=%Command_Ender% /R %search_pattern%) ELSE (Set Command_Ender=%Command_Ender% /C:%search_pattern%)
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %Command_Ender% ') DO Set match_found=%%i
REM Deleting all variables we don't want retained as temporary env vars.
IF "%match_found%"=="" Set match_was_found=false&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=&&exit /b 1
IF NOT "%match_found%"=="" Set match_was_found=true&&Set string_to_search=&&Set search_pattern=&&Set search_type=&&Set Command_To_Run=&&Set Command_Ender=&&Set case_sensitive=
REM Comment out this line or add more script logic if you want to disable console output of the matching line
echo %match_found%
exit /b 0
Is there any way to circumvent this ridiculous escape sequence in the batch itself without generated temp files and other such annoyances for escaping regex metacharacters?
The SET syntax you used is the problem.
Without quotes, the carets in any SET command are consumed as escape characters for the next character.
This line will consume half of your carets:
Set search_pattern=%search_pattern:~1,-1%
You should use the extended set syntax:
set "variable=content"
But you need only to change some of your lines to reduce the total amount of carets to two.
Set "search_pattern=%~2"
This takes the argument and also removes the quotes.
...
IF "%search_type%"=="/R" (Set "Command_Ender=%Command_Ender% /R %search_pattern%") ELSE (Set "Command_Ender=%Command_Ender% /C:%search_pattern%")
...
FOR /F "tokens=*" %%i IN (' %Command_To_Run% ^| findstr.exe %%Command_Ender%% ') DO Set match_found=%%i
Now you only need to use
stringVerifier.bat "QFail Derpy" "^^QFail" /R
That's because the final findstr call still consumes the carets once.
This could also be avoided with quotes, but then your Command_Ender variable would have to hold only the options, not the search string.
To inspect the code, add a set Command_Ender or set search_pattern at a few points to show the real content of your variables.
You should also have a look at delayed expansion, as delayed expansion never changes the variable content.
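The caret halving can be seen in isolation with a short sketch. The variable names unquoted and quoted are hypothetical, chosen only for this illustration.

```batch
@echo off
setlocal
rem Unquoted: the parser treats ^ as an escape character, so ^^ collapses to ^.
set unquoted=^^QFail
rem Quoted: carets inside the quotes are kept exactly as typed.
set "quoted=^^QFail"

rem SET with only a name prefix prints the stored value without re-parsing it.
set unquoted
set quoted
endlocal
```

This should print unquoted=^QFail and quoted=^^QFail. Every additional unquoted pass through the parser halves the carets again, which is how the count on the command line grew so large.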

Split String with Random Length using DOS/Batch

I have a log file I need to process and extract data from. Each line contains a string of an event log output. Unfortunately, the parts of the string are NOT uniformly formatted. Here are a few example lines:
"Some random length string. 0x8dda46 0x1 0x384 C:\Program Files (x86)\some\path\foo0.exe "
"Some random leeeength string. 0xa95ac2 0x8cc C:\Program Files (x86)\some\path\foo1.exe %%1936 0xcc0 "
"Some random leength string. 0xbcd668 0x330 C:\Program Files (x86)\some\path\foo2.exe %%1936 0xf38 "
"Some random leeeeeeeength string. 0xbcd668 0x1 0x330 C:\Program Files (x86)\some\path\foo2.exe "
"Some random leeength string. 0x352c44 0xfc0 C:\Program Files (x86)\some\path\foo3.exe %%1936 0x92c "
"Some random leeeeength string. 0xa95ac2 0x0 0x8cc C:\Program Files (x86)\some\path\foo1.exe "
"Some random leength string. 0x352c44 0x0 0xfc0 C:\Program Files (x86)\some\path\foo3.exe "
I need to extract the "foo.exe" file name without the full path and the HEX value just before the "C:\Progra..." (it's the process ID)
so I want the output be:
0x384 foo0.exe
0x8cc foo1.exe
0x330 foo2.exe
0x330 foo2.exe
0xfc0 foo3.exe
0x8cc foo1.exe
0xfc0 foo3.exe
I'm trying to achieve the goal with as little "hard coded" search/replace as possible, since many parts of the string are not going to have the same content or the same length. I tried to use FOR /F to split the string, but I have no way to locate the two columns as they are always changing. The only constant is the "C:\Program Files (x86)" part. (Plus FOR has a 52 variable limit.)
I have written some tricky batch files, but I'm starting to think I'm asking too much of DOS ;-)
Thanks in advance for any help!
@ECHO OFF
SETLOCAL
FOR /f "tokens=1*delims=." %%a IN (q28333414.txt) DO (
FOR /f "tokens=1*delims=:" %%c IN ("%%~b") DO CALL :process %%c&CALL :report "%%d
)
GOTO :EOF
:process
SET hexval=%~3
IF DEFINED hexval shift&GOTO process
SET "hexval=%~1"
SET "drive=%~2:"
GOTO :eof
:report
SET "line=%drive%%~1"
SET "line="%line:.exe=.exe"%"
FOR %%r IN (%line%) DO ECHO %hexval% %%~nxr&GOTO :eof
I used a file named q28333414.txt containing your data for my testing.
The first process simply throws away each (space-delimited) parameter between the . and : until there are exactly two left - the required hexval and the drive letter.
The report process re-attaches the drive letter and encloses it and the .exe name in quotes. The for %%r picks the first string, shucks off the quotes, spits out the result and all's done.
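The last step relies on the ~nx modifier of a FOR variable. A minimal sketch, using the first path and process ID from the sample data as fixed text:

```batch
@echo off
rem %%~nxr expands to the name and extension of the item in %%r, without the quotes.
rem The file does not need to exist for the modifier to work.
for %%r in ("C:\Program Files (x86)\some\path\foo0.exe") do echo 0x384 %%~nxr
```

This should print 0x384 foo0.exe. Quoting the path is what keeps the spaces and parentheses together as one item.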
Edit : fixed report to show name and extension of file only as required and dbenham comment
Breaking news: (literally!)
@ECHO OFF
SETLOCAL enabledelayedexpansion
FOR /f "delims=" %%a IN (q28333414.txt) DO SET "line=%%~a"&CALL :process "!line::=" "!"
)
GOTO :EOF
:process
SET "hexval=%~3"
IF DEFINED hexval shift&GOTO process
CALL :lastbar1 %%~1
SET "filename=%~2"
SET filename="c:%filename:.exe =.exe" %
FOR %%r IN (%filename%) DO ECHO %hexval% %%~nxr&GOTO :eof
GOTO :eof
:lastbar1
SET "hexval=%~3"
IF DEFINED hexval shift&GOTO lastbar1
SET "hexval=%~1"
GOTO :eof
OK - let's try this, then.
For each line, replace all evil colons with " " and pass resultant quoted-string sequence to a subroutine.
Shift the parameters until there are but 2, which will be the string before and after the final countdown - er, colon.
Repeat the process for the first parameter. The penultimate value is the required hexval.
with the second parameter, add "c: before and " after any .exe, so the result is a quoted full-filename and dross; spit out the hexval and filename and done...
small revision in the rather dim light of the "&" comment - the famous set "var=whatever" formula fails with & included in this case (as in subdirectory "Documents & Settings") so the enclosing quotes can be removed as trailing spaces are not relevant. Would have been useful to know what the test data triggering the problem was though - reduces guesswork.
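The colon replacement is easier to follow on a single line. This is a sketch with shortened sample text; the :show label and the echoed wording are hypothetical and not part of the answer's code.

```batch
@echo off
setlocal EnableDelayedExpansion
set "line=Some text. 0x1 0x384 C:\some\path\foo0.exe"
rem Replace each colon with " " so the text splits into quoted arguments at the colon.
call :show "!line::=" "!"
goto :eof

:show
rem %~1 and %~2 are the quoted pieces with their surrounding quotes removed.
echo before the colon: %~1
echo after the colon:  %~2
goto :eof
```

The first argument ends with the drive letter C and the second starts with the backslash, which is why the answer's code puts c: back in front of the file name.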
Any good regex utility you can lay your hands on should be able to solve your problem. I like to use my JREPL.BAT hybrid JScript/batch utility. It is pure script that runs natively on any Windows machine from XP onward.
Assuming your file is test.log, then I would use:
jrepl ".* (0x[0-9A-F]+) C:\\Program Files \(x86\)\\(?:.*\\)?([^\\]+\.exe) .*" "$1 $2" /i /f test.log
On each line it looks for the last occurrence of a hex string sandwiched by spaces that precedes a file path that begins with "C:\Program Files (x86)\" and ends with ".exe". I made the search ignore case.
This solution assumes that there are no backslashes in the random string.
@echo off
setlocal EnableDelayedExpansion
for /F "tokens=1-5 delims=\" %%a in (logFile.txt) do (
rem Extract the HEX value
for %%A in (%%~a) do (
set "value=!lastButOne!"
set "lastButOne=%%A"
)
rem Extract the file name
for /F %%A in ("%%e") do set "name=%%A"
echo !value! !name!
)
Here's a hybrid batch + JScript script (but still a .bat file) that will perform a regexp replace similar to NextInLine's PowerShell solution.
@if (@CodeSection == @Batch) @then
@echo off
setlocal
set "logfile=test.log"
rem // Ask JScript to parse log. On each line, %%I = hex. %%J = exe.
for /f "tokens=1*" %%I in ('cscript /nologo /e:JScript "%~f0" "%logfile%"') do (
echo %%I %%J
)
rem // End main runtime.
goto :EOF
@end
// JScript chimera portion
var fso = WSH.CreateObject('Scripting.FileSystemObject'),
log = fso.OpenTextFile(WSH.Arguments(0), 1);
while (!log.AtEndOfStream) {
var line = log.ReadLine();
WSH.Echo(line.replace(/^.+(0x[0-9a-f]+) \w:\\.+?\\(\w+\.exe).+$/i, "$1 $2"));
}
log.Close();
Course if I were in your boat I'd probably use GnuWin32 sed.
sed -r -e "s/^.*(0x[a-f0-9]+) \w:.+\\(.+\.exe).*$/\1 \2/i" test.log
Just for giggles, I ran some time tests of each fully-working solution against the O.P.'s test log file above, running each several times and getting the mode duration (the result occurring most often).
Aacini's solution: 0.013s (Excellent, but depends on narrow matches)
sed: 0.015s (simplest)
Magoo's solution: 0.034s (clever!)
my JScript hybrid: 0.034s (the best, of course)
dbenham's jrepl.bat: 0.051s (powerful Swiss army knife solution)
NextInLine's PowerShell: hung my timer script, but felt like about half a second after the initial painful priming of PowerShell
This is really a task that calls for regular expressions, and for regular expressions at the windows command-line you want powershell. Fortunately, you can run powershell from a batch file or the DOS command-prompt:
powershell -Command "(Get-Content 'c:\full_path_here\input.log') -replace '.+?(0x[0-9a-f]{3}) .+?\\([^\\]+\.exe).*', '$1 $2'"
This has a few parts
powershell -Command runs the entire expression in quotation marks as though it were run from the powershell command line
Get-Content is like the linux cat command - it reads the entirety of the file contents
-replace uses regular expressions to replace the content on each line of the file with the two matched expressions in parentheses

Linefeed in batch regex

I want to match all lines of the following text with FINDSTR /R
LABO_A =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = LABO)
)
)
I already tried What are the undocumented features and limitations of the Windows FINDSTR command?
Especially the "Searching across line breaks" part. But unfortunately it didn't work.
My approach is the following:
SETLOCAL
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R "LABO_A.=.!CR!*!LF!.*(DESCRIPTION.=.!CR!*!LF!.*(ADDRESS.=.(PROTOCOL.=.TCP)(HOST.=.host01)(PORT.=.1521))!CR!*!LF!.*(CONNECT_DATA.=!CR!*!LF!.*(SERVICE_NAME.=.LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH%
Am I missing something? Or is the batch regex simply not powerful enough to realize this?
SOLUTION:
The approach of @dbenham let me reconsider my regex-string. So I edited it to
FINDSTR /R /C:"LABO_A =!CR!*!LF!.*(DESCRIPTION =!CR!*!LF!.*(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF!.*(CONNECT_DATA =!CR!*!LF!.*(SERVICE_NAME = LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH% > NUL
I removed some unnecessary white spaces and adapted the parameters of FINDSTR.
Now it works.
Your regex is wrong. Your source lines end immediately after the =, but the extra . in your regex is looking for an additional character after the =.
It looks to me you are using . to represent white space. I think you would be better off using actual spaces, but then you need the /C option.
The following matches the lines successfully.
@echo off
SETLOCAL
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R /C:"LABO_A =!CR!*!LF! *(DESCRIPTION =!CR!*!LF! *(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF! *(CONNECT_DATA =!CR!*!LF! *(SERVICE_NAME = LABO)!CR!*!LF! *)!CR!*!LF! *)" test.txt
Note that even though all lines in the regex are matched, only the first line of the matching set is printed.
I suspect that the line breaks are not required in your configuration file. Here is another variation that allows for more variation in the white space.
@echo off
setlocal enableDelayedExpansion
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
set "ws=[ !cr!!lf!]*"
FINDSTR /RX /C:"LABO_A =!ws!(DESCRIPTION =!ws!(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!ws!(CONNECT_DATA =!ws!(SERVICE_NAME = LABO)!ws!)!ws!)!ws!" test.txt
I also attempted to allow white space in every place I thought possible, but that exceeded FINDSTR's maximum REGEX string length.
Essentially, batch regex isn't powerful enough. SED would be better no doubt.
Nonetheless, here's a way to detect that a sequence of lines appears in a file. It's a little restricted, but should suffice for the sequence you've nominated. It assumes that leading spaces are not significant.
@ECHO OFF
SETLOCAL enabledelayedexpansion
FOR /f "delims==" %%a IN ('set l_ 2^>nul') DO SET "%%a="
SET /a lines=0
FOR /f "tokens=*" %%a IN (q19859936.txt) DO SET /a lines+=1&SET l_!lines!=%%a
SET hits=0
SET "stop="
FOR /f "tokens=*" %%a IN (q19859936.test) DO (
SET l_0=%%~a
CALL :test
IF DEFINED stop GOTO done
)
:done
IF DEFINED stop (ECHO FOUND ) ELSE (ECHO NOT FOUND)
GOTO :EOF
:test
SET /a hits+=1
ECHO IF NOT "!l_%hits%!"=="%l_0%"
IF NOT "!l_%hits%!"=="%l_0%" SET hits=0&IF %hits%==1 (GOTO :eof) ELSE (GOTO test)
IF %hits%==%lines% SET stop=Y
GOTO :eof
[edited code 20131111T1408Z - first FOR had tokens=2]
The initial FOR ensures that variables L_* are cleared.
The file q19859936.txt is read as the line-sequence-to-be-detected data.
q19859936.test is then examined. Each line is assigned to L_0 in turn and the internal subroutine :test will check to see whether it matches the next-line-expected.
The IF NOT statement is significant - and seemingly illogical (you'd need to add the /i switch to make it case-insensitive if you so want...) When batch parses the line, %hits% is replaced by the then-current value of hits and THEN the line is executed, so hits will be reset to 0 if ever a mismatch is found. If the HITS count WAS not 1, then the test is repeated. This takes care of the case
matches line 1
matches line 2
matches line 3
matches line 1
matches line 2
matches line 3
matches line 4
matches line 5
matches line 6
where the second "line 1" is encountered when "line 4" was expected. HITS is thus changed to 0, but it WAS 4 so execution passes back to :test and the test repeated with HITS=1.
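The parse-time replacement of %hits% that the IF NOT line depends on can be shown on its own. A minimal sketch with a fixed starting value (the messages are illustrative):

```batch
@echo off
setlocal
set hits=4
rem %hits% on the next line is replaced by 4 when the line is parsed,
rem before SET runs, so the IF compares the old value.
set hits=0&if %hits%==4 echo IF saw the old value, 4
echo hits is now %hits%
endlocal
```

The first message should appear even though hits has already been set to 0 by the time the IF executes, and the last line should report 0.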
Another approach could have been to read lines into another array (say L#*) and test that L_* matched L#*, for %LINES% entries. On no match, ripple-up and assign the next line read to L#!lines! ... but I thought of that later. Probably be easier and better, too - I'll leave it as an exercise for whoever may be interested.
This will work if you are after the LABO_A reference.
It uses a helper batch file called findrepl.bat from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
type "file.txt" | findrepl "^LABO_A =" /e:"^ \)"

Use subpatterns in FINDSTR

I must check the validity of a string stored in a variable, I can not use external CLI utilities (grep, awk, etc.) so I chose FINDSTR.
The string has this format (in regexp):
([1-9][0-9]*:".*"(|".*")*)
I do not know how to check the subpattern (|".*").
Currently my code is:
((ECHO.) | (SET /P "=(11:"a"|"b"|"c")") | (FINDSTR /R /C:"^([1-9][0-9]*:".*")$"))
Regards.
Mat M is correct about the limitation of FINDSTR. The FINDSTR regex support is very primitive and non-standard. Type HELP FINDSTR or FINDSTR /? from the command line to get a brief synopsis of what is supported. For an in depth explanation, refer to What are the undocumented features and limitations of the Windows FINDSTR command?
I like Harry Johnston's comment - It would be quite easy to create a solution using VBScript or JavaScript. I think that would be a much better choice.
But, here is a native batch solution. I've incorporated the extra rule about the number of subpatterns that the OP stated in the comment to Mat M's answer.
The solution is surprisingly tricky. Special characters can cause problems when piping the ECHO output to FINDSTR because of the way pipes work. Each side of the pipe is executed in its own CMD session. The special characters must either be quoted, escaped twice, or only exposed via delayed expansion. I chose to use delayed expansion, but the ! characters must be escaped twice to make sure the delayed expansion occurs at the correct time.
The easiest way to parse a variable number of subpatterns is to replace the delimiter with a newline and use FOR /F to iterate each subpattern.
The top half of my code is a brittle coding harness to conveniently iterate and test a set of strings. It will not work properly with any of <space> ; , = <tab> * or ? in the string. Also, the quotes must be balanced in each string.
But the more important validate routine can handle any string in the var variable.
@echo off
setlocal
set LF=^


::Above 2 blank lines are critical for creating a linefeed variable. Do not remove
set test=a
for %%S in (
"(3:"a"|"c"|"c")"
"(11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(4:"a"|"b"|"c")"
"(10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")"
"(3:"a"|"b"|"c""
"(3:"a"|"b^|c")"
"(3:"a"|"b"|c)"
"(3:"a"|"b"||"c")"
"(3:"a"|"b"|;|"c")"
) do (
set "var=%%~S"
call :validate
)
exit /b
:validate
setlocal enableDelayedExpansion
cmd /v:on /c echo ^^^!var^^^!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
if "!var:||=!" neq "!var!" (call :invalid double pipe fail& exit /b)
for /f "delims=(:" %%N in ("!var!") do set "expectedCount=%%N"
set "str=!var:*:=!"
set "str=!str:~0,-1!"
set foundCount=0
for %%A in ("!LF!") do for /f eol^=^%LF%%LF%^ delims^= %%B in ("!str:|=%%~A!") do (
if %%B neq "%%~B" (call :invalid sub-pattern fail& exit /b)
set /a foundCount+=1
)
if %foundCount% neq %expectedCount% (call :invalid count fail& exit /b)
echo Valid: !var!
exit /b
:invalid
echo Invalid - %*: !var!
exit /b
Here are the results after running the batch script
Valid: (3:"a"|"c"|"c")
Valid: (11:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - count fail: (4:"a"|"b"|"c")
Invalid - count fail: (10:"a"|"b"|"c"|"d"|"esdf"|"f"|"g"|"h"|"i"|"j"|"k")
Invalid - FINDSTR fail: (3:"a"|"b"|"c"
Invalid - sub-pattern fail: (3:"a"|"b|c")
Invalid - sub-pattern fail: (3:"a"|"b"|c)
Invalid - double pipe fail: (3:"a"|"b"||"c")
Invalid - sub-pattern fail: (3:"a"|"b"|;|"c")
Update
The :validate routine can be simplified a bit by postponing the enablement of delayed expansion until after the CMD /V:ON pipe. This means I no longer have to worry about double escaping the ! on the left side of the pipe.
:validate
cmd /v:on /c echo !var!|findstr /r /c:"^([1-9][0-9]*:.*)$" >nul || (call :invalid FINDSTR fail& exit /b)
setlocal enableDelayedExpansion
... remainder unchanged
As far as I know, findstr is not able to group regexps, so (|".*")* is a no-no. If you know how many blocks you have, you can duplicate the pattern like this:
FINDSTR /R /C:"^([1-9][0-9]*:\"..*\"|\"..*\"|\"..*\")$"
This way, if you are sure the number of blocks is constant, having empty ones "" if required, then you can check for it.
The double quotes inside the expression are ignored unless you prefix them with \.
The ..* construct is meant to replace .+ : one or more characters.
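A short sketch of the two points above, the \" escape and ..* standing in for .+, tested against a single quoted block rather than the full format (the messages are illustrative):

```batch
@echo off
rem \" passes a literal quote to findstr; ..* stands in for .+ (one or more characters).
echo "a"|findstr /r /c:"^\"..*\"$" >nul && echo "a" accepted
echo ""|findstr /r /c:"^\"..*\"$" >nul || echo empty quotes rejected
```

With .* in place of ..*, the empty pair "" would be accepted as well, so pick whichever matches the rule you need for empty blocks.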