Regex in batch script - regex

I have file.txt where I have values in format:
text1
text2
text3
Does anyone know how to open such a file in a script and convert it to
'text1','text2','text3'
So far I have done it manually in Notepad++, but I'd rather put it in a script; I just don't know how:
first step:
find what: ^|$
replace with: '
and
second step:
find what: [\r\n]+
replace with: ,

You want to convert a file with lines into a single-line file, where the single (original) lines are single-quoted and separated by a comma.
Not very efficient, but straightforward:
@echo off
set "infile=input.txt"
set "outfile=output.txt"
del "%outfile%" 2>nul
for /f "usebackq delims=" %%a in ("%infile%") do (
if not exist "%outfile%" (
<nul set /p "='%%a'" >"%outfile%"
) else (
<nul set /p "=,'%%a'" >>"%outfile%"
)
)
>>"%outfile%" echo(
type "%outfile%"
Take line by line, if it's the first line, write the quoted value, else write a comma plus the quoted value, both with the <nul set /p trick to write without a linefeed.
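As a variation on the script above, the joined line can be built in a variable and written once, which avoids reopening the output file for every input line. This is only a sketch under assumptions: the file names sample_input.txt and sample_output.txt are made up for the demonstration, the joined line must stay below the cmd variable length limit (roughly 8 KB), and input lines containing ! would be altered because delayed expansion is enabled.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical file names, used for this demonstration only.
set "infile=sample_input.txt"
set "outfile=sample_output.txt"
rem Create a small sample input file.
(echo text1&echo text2&echo text3)>"%infile%"
set "line="
rem Append ,'value' to the variable for every line of the input file.
for /f "usebackq delims=" %%a in ("%infile%") do set "line=!line!,'%%a'"
rem Drop the leading comma, then write the result with a single redirection.
if defined line set "line=!line:~1!"
>"%outfile%" echo(!line!
type "%outfile%"
```

For large files or lines with special characters, the original file-based approach remains the safer choice.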

Related

Why does this regular expression in cmd findstr work?

I need to create a cmd script (and somehow I did) that extracts some lines of text from a series of files and puts them in a new txt file.
The source files are like this:
%
!
! AAA
!
! ------------------------ SOME TEXT ABCDEFGHIJKLMN --------------------------
!
! BBB
! ----------------------------------------------------------------------------
! T5 PUNTA ø 6.5/9.5~ $ 63~
! ----------------------------------------------------------------------------
! T12 PUNTA ø 2.5~ $ 39~
! ----------------------------------------------------------------------------
!
! SOME OTHER TEXT
!
! 1] ABC
! 2] DEF
! 3] ...
OTHER LINE 1
OTHER LINE 2
ETC
%
And the lines I need to extract are the ones between two "! ----------------------------------------------------------------------------" so, in this case, T5 PUNTA ø 6.5/9.5~ $ 63~ and T12 PUNTA ø 2.5~ $ 39~.
I was trying some regular expressions with findstr to match a line with ! only after the relevant lines, which indicates the end of the search, until I came up (by pure chance) with an instruction that matches all and only the lines that I need (luck, I guess).
The snippet is this:
@echo off
setlocal enabledelayedexpansion
if exist output.txt ( break > output.txt )
for /r <path> %%g in (<filename>) do (
...
for /f "tokens=* delims= " %%a in (%%g) do (
echo %%a | findstr /r /c:^\!$ >nul
if errorlevel 1 (...)
) else ( echo %%a >> srcoutput.txt
...
)
)
)
Please focus on the instruction echo %%a | findstr /r /c:^\!$ >nul.
This, for a reason I don't know, matches only the lines T5 PUNTA ø 6.5/9.5~ $ 63~ and T12 PUNTA ø 2.5~ $ 39~. Which is exactly what I want, but I don't know why it works!
Can someone help me understand why this simple expression ^\!$ works?
In my (wrong) understanding, it should match only a line with a single ! (which I had escaped, because otherwise it didn't work) at the beginning and at the end.
Thank you in advance
Actually the command line:
echo %%a | findstr /r /c:^\!$ >nul
just returns lines that contain a $-character.
This is what happens, step by step:
the command line becomes parsed to (assuming %%a holds <expanded text>):
echo <expanded text> | findstr /r /c:\!$ >nul
so the (unquoted) caret (^) disappears as it is the escape character for cmd; since \ has no special meaning, you could just omit the ^ after all;
since delayed expansion is enabled (actually unnecessarily), the !-sign disappears, because there is only one, so the command line becomes:
echo <expanded text> | findstr /r /c:\$ >nul
the \-symbol acts as an escape character (though particularly for findstr!), so the $-sign loses its special meaning in regular expression (/R) mode (namely to anchor a match to the end of a line) and is therefore treated as a literal character;
the left side of the pipe passes on the text <expanded text> (with a trailing SPACE since there is one before the |), and the right side eventually searches for literal $-characters in that text;
You would achieve exactly the same result using the following command line instead:
echo %%a | findstr /C:$ > nul
though I would rather write it as:
echo(%%a| findstr /C:"$" > nul
to avoid the trailing SPACE and to safely echo any text.
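The effect of the escaped $ can be checked in isolation. The following is a minimal sketch, not part of the original script; the two sample lines are taken from the question's input (with the ø character left out to avoid code page issues). With delayed expansion disabled, the first command should report a match and the second should not.

```batch
@echo off
rem In regex mode, \$ is a literal dollar sign, not the end-of-line anchor.
echo(! T5 PUNTA 6.5/9.5~ $ 63~| findstr /R /C:"\$" >nul && echo line 1: match
echo(! SOME OTHER TEXT| findstr /R /C:"\$" >nul || echo line 2: no match
```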
For this task I would probably go for another approach (see all the explanatory rem remarks):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_ROOT=D:\Target\Path" & rem // (path to root directory)
set "_MASK=*.txt" & rem // (name or mask of files to process)
set "_SAVE=D:\Path\To\output.txt" & rem // (location of output file)
rem // Gather line-feed character:
(set ^"_LF=^
%= blank line =%
^")
rem // Gather carriage-return character:
for /F %%C in ('copy /Z "%~f0" nul') do set "_CR=%%C"
rem // Open output file only once and write to it:
> "%_SAVE%" (
rem // Find matching files and loop through them:
for /R "%_ROOT%" %%F in ("%_MASK%") do (
rem // Check for file existence (only necessary when a dedicated name is given):
if exist "%%~F" (
rem // Store path of current file:
set "FILE=%%~F"
rem // Toggle delayed expansion to avoid troubles with `!`:
setlocal EnableDelayedExpansion
rem // Remove remaining quotes (only necessary when a dedicated name is given):
set "FILE=!FILE:"=!"
rem /* Do a multi-line search by `findstr`, which only returns the first line;
rem the searched string is:
rem # anchored to the beginning of a line,
rem # an `!`, a space and a `T`, then
rem # some arbitrary text (without line-breaks), then
rem # a line-break, then another `!` and a space, then
rem # a sequence of one or more `-`,
rem # anchored to the end of a line;
rem only the portion before the explicit line-break is then returned: */
findstr /R /C:"^^^! T.*~!_CR!!_LF!^! --*$" "!FILE!"
endlocal
)
)
)
endlocal
exit /B
This does not exactly search for lines between ! --- etc., but it searches for two adjacent lines where the first one begins with ! + SPACE + T and ends with ~, and the second one consists of ! + SPACE + a sequence of one or more -.
If the input file contains Unix-/Linux-style line-breaks rather than DOS-/Windows-style ones, replace !_CR!!_LF! in the findstr search string in the script by !_LF!.
I have decided to post this as a potential method of achieving your intended goal. It uses a different methodology from the currently accepted answer: the idea is to retrieve the line numbers of the ! ----etc. lines, then determine whether the lines between any two of them have the required content. This means that it isn't looking to match specific content between those lines and should therefore work whichever characters your strings contain.
@Echo Off
SetLocal EnableExtensions
Set "InFile=somefile.ext"
Set "OutFile=someoutfile.ext"
Set "$#="&For /F "Delims=:" %%G In (
'"%__AppDir__%findstr.exe /RNC:"^! --*$" "%InFile%""')Do (
Set /A _2=%%G-2&Call Set "$#= %%G %%$#%%"&Call Set "$2= %%_2%% %%$2%%")
If Not Defined $# Echo No Matches&%__AppDir__%timeout.exe -3&Exit /B
SetLocal EnableDelayedExpansion
For %%G In (%$2%)Do If "!$#: %%G =!"=="%$#%" Set "$2=!$2: %%G =!"
For %%G In (%$2%)Do Set /A _1=%%G+1&Set "$1= !_1! !$1!"
EndLocal&(For %%G In (%$1%)Do For /F "Tokens=1*Delims=]" %%H In (
'%__AppDir__%find.exe /V /N "" "%InFile%"^
^|%__AppDir__%findstr.exe "^\[%%G\]"')Do Echo %%I)>"%OutFile%"
GoTo :EOF
Just change your input file and output file names on lines 3 and 4, as required.
Please note that I'm unable to test this, so it may not work, or could possibly work in the wrong way. Please test it on files with various similar formats before using it for real!

FOR /F is not working for Comma separated double quoted string in batch file

I am having trouble splitting a line which contains double quoted strings separated by comma. String looks something like:
"DevLoc","/Root/Docs/srvr/temp test","171.118.108.22","/Results/data/Procesos Batch","C:\DataExport\ExportTool\Winsock Folder DB","C:\Export\ExportTool\Temp Folder","22"
Some strings values contain spaces. I want to store each double quoted string into a variable. Can anyone please help
Below is my batch script. variable 'EnvDetails' contains above line which need to be parsed.
FOR /F "tokens=1,2,3,4,5,6,7 delims=," %%i in ("%EnvDetails%") do (
SET TEMPS=%%i
SET Path=%%j
SET host=%%k
SET scriptPath=%%l
SET WINSP_HOME=%%m
SET PUTTY_HOME=%%n
SET portNum=%%o
@echo %TEMPS% > temp.txt
@echo %MPath% >> temp.txt
@echo %host% >> temp.txt
@echo %scriptPath% >> temp.txt
@echo %WINSCP_HOME% >> temp.txt
@echo %PUTTY_HOME% >> temp.txt
@echo %portNum% >> temp.txt
)
Part of the problem is that you're attempting to retrieve your variable values within the same parenthetical code block as they're set. Because the cmd interpreter replaces variables with their values before the commands are executed, you're basically echoing empty values to temp.txt. To wait until the variables have been defined before expanding them, you'd need delayed expansion.
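The difference between the two expansion styles can be seen in a minimal sketch (the variable name value is made up for this illustration). The %value% reference is replaced when the whole parenthesized block is parsed, while !value! is replaced when its line actually runs, so the first echo should print "before" and the second "after".

```batch
@echo off
setlocal EnableDelayedExpansion
set "value=before"
rem The whole parenthesized block below is parsed before any of it runs.
for %%i in (1) do (
    set "value=after"
    echo percent expansion: %value%
    echo delayed expansion: !value!
)
```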
But you're really making this more complicated than it needs to be. What else are you doing with the variables, besides echoing them out to a text file?
What you should do instead is use a basic for loop rather than for /f. for without any switches evaluates lines similar to CSV parsers anyway, splitting on commas, semicolons, unquoted spaces and tabs, and so forth.
Given that you're basically splitting a line on commas and echoing each token, in order, to a text file, one token per line, you can simplify your code quite a bit like this:
@echo off
>temp.txt (
for %%I in (%EnvDetails%) do echo %%~I
)
If I'm mistaken and you do indeed intend to perform further processing on the data; if you do actually need the variables, then this example demonstrates delayed expansion:
@echo off
setlocal
set EnvDetails="DevLoc","/Root/Docs/srvr/temp test","171.118.108.22","/Results/data/Procesos Batch","C:\DataExport\ExportTool\Winsock Folder DB","C:\Export\ExportTool\Temp Folder","22"
>temp.txt (
FOR /F "tokens=1-7 delims=," %%i in ("%EnvDetails%") do (
SET "TEMPS=%%~i"
SET "MPath=%%~j"
SET "host=%%~k"
SET "scriptPath=%%~l"
SET "WINSCP_HOME=%%~m"
SET "PUTTY_HOME=%%~n"
SET "portNum=%%~o"
setlocal enabledelayedexpansion
echo !TEMPS!
echo !MPath!
echo !host!
echo !scriptPath!
echo !WINSCP_HOME!
echo !PUTTY_HOME!
echo !portNum!
endlocal
)
)
Final note: The tilde notation of %%~i, %%~j, etc, strips surrounding quotation marks from each token. If you intentionally wish to preserve the quotation marks as part of the variable values, remove the tildes.
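A short sketch of the tilde modifier on its own, using one of the quoted values from the question as sample data: the first echo should show the token with its quotation marks, the second without them.

```batch
@echo off
rem %%I keeps the surrounding quotes, %%~I removes them.
for %%I in ("/Root/Docs/srvr/temp test") do (
    echo with quotes: %%I
    echo without quotes: %%~I
)
```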

Linefeed in batch regex

I want to match all lines of the following text with FINDSTR /R
LABO_A =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = LABO)
)
)
I already tried the approach from "What are the undocumented features and limitations of the Windows FINDSTR command?", especially the "Searching across line breaks" part, but unfortunately it didn't work.
My approach is the following:
SETLOCAL
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R "LABO_A.=.!CR!*!LF!.*(DESCRIPTION.=.!CR!*!LF!.*(ADDRESS.=.(PROTOCOL.=.TCP)(HOST.=.host01)(PORT.=.1521))!CR!*!LF!.*(CONNECT_DATA.=!CR!*!LF!.*(SERVICE_NAME.=.LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH%
Am I missing something? Or is the batch regex simply not powerful enough to realize this?
SOLUTION:
The approach of @dbenham made me reconsider my regex string. So I edited it to
FINDSTR /R /C:"LABO_A =!CR!*!LF!.*(DESCRIPTION =!CR!*!LF!.*(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF!.*(CONNECT_DATA =!CR!*!LF!.*(SERVICE_NAME = LABO)!CR!*!LF!.*)!CR!*!LF!.*)" %FINDPATH% > NUL
I removed some unnecessary white spaces and adapted the parameters of FINDSTR.
Now it works.
Your regex is wrong. Your source lines end immediately after the =, but the extra . in your regex is looking for an additional character after the =.
It looks to me you are using . to represent white space. I think you would be better off using actual spaces, but then you need the /C option.
The following matches the lines successfully.
@echo off
SETLOCAL
rem The two empty lines after the set LF line are required to store a line feed.
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
SETLOCAL enableDelayedExpansion
FINDSTR /R /C:"LABO_A =!CR!*!LF! *(DESCRIPTION =!CR!*!LF! *(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!CR!*!LF! *(CONNECT_DATA =!CR!*!LF! *(SERVICE_NAME = LABO)!CR!*!LF! *)!CR!*!LF! *)" test.txt
Note that even though all lines in the regex are matched, only the first line of the matching set is printed.
I suspect that the line breaks are not required in your configuration file. Here is another variation that allows for more variation in the white space.
@echo off
setlocal enableDelayedExpansion
set LF=^


FOR /F %%A IN ('COPY /Z "%~dpf0" NUL') DO SET "CR=%%A"
set "ws=[ !cr!!lf!]*"
FINDSTR /RX /C:"LABO_A =!ws!(DESCRIPTION =!ws!(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1521))!ws!(CONNECT_DATA =!ws!(SERVICE_NAME = LABO)!ws!)!ws!)!ws!" test.txt
I also attempted to allow white space in every place I thought possible, but that exceeded FINDSTR's maximum REGEX string length.
Essentially, batch regex isn't powerful enough. SED would be better no doubt.
Nonetheless, here's a way to detect that a sequence of lines appears in a file. It's a little restricted, but should suffice for the sequence you've nominated. It assumes that leading spaces are not significant.
@ECHO OFF
SETLOCAL enabledelayedexpansion
FOR /f "delims==" %%a IN ('set l_ 2^>nul') DO SET "%%a="
SET /a lines=0
FOR /f "tokens=*" %%a IN (q19859936.txt) DO SET /a lines+=1&SET l_!lines!=%%a
SET hits=0
SET "stop="
FOR /f "tokens=*" %%a IN (q19859936.test) DO (
SET l_0=%%~a
CALL :test
IF DEFINED stop GOTO done
)
:done
IF DEFINED stop (ECHO FOUND ) ELSE (ECHO NOT FOUND)
GOTO :EOF
:test
SET /a hits+=1
ECHO IF NOT "!l_%hits%!"=="%l_0%"
IF NOT "!l_%hits%!"=="%l_0%" SET hits=0&IF %hits%==1 (GOTO :eof) ELSE (GOTO test)
IF %hits%==%lines% SET stop=Y
GOTO :eof
[edited code 20131111T1408Z - first FOR had tokens=2]
The initial FOR ensures that variables L_* are cleared.
The file q19859936.txt is read as the line-sequence-to-be-detected data.
q19859936.test is then examined. Each line is assigned to L_0 in turn and the internal subroutine :test will check to see whether it matches the next-line-expected.
The IF NOT statement is significant - and seemingly illogical (you'd need to add the /i switch to make it case-insensitive if you so want...) When batch parses the line, %hits% is replaced by the then-current value of hits and THEN the line is executed, so hits will be reset to 0 if ever a mismatch is found. If the HITS count WAS not 1, then the test is repeated. This takes care of the case
matches line 1
matches line 2
matches line 3
matches line 1
matches line 2
matches line 3
matches line 4
matches line 5
matches line 6
where the second "line 1" is encountered when "line 4" was expected. HITS is thus changed to 0, but it WAS 4 so execution passes back to :test and the test repeated with HITS=1.
Another approach could have been to read lines into another array (say L#*) and test that L_* matched L#*, for %LINES% entries. On no match, ripple-up and assign the next line read to L#!lines! ... but I thought of that later. Probably be easier and better, too - I'll leave it as an exercise for whoever may be interested.
This will work if you are after the LABO_A reference.
It uses a helper batch file called findrepl.bat from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
Place findrepl.bat in the same folder as the batch file or on the path.
type "file.txt" | findrepl "^LABO_A =" /e:"^ \)"

Extract number from string in batch file

From a batch file I want to extract the number 643456 from the following string:
C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM
The number will change, however it will always be just digits.
My current theory is to search for something that fits \alldigits\, then replace the two \s with white space, but I can’t quite get it.
Assuming the number is always the parent folder (the folder before the end):
@echo off
set "str=C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM"
for %%F in ("%str%\..") do set "number=%%~nxF"
EDIT - Code sample adapted to correct errors shown in comments
set d=C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM
for %%f in ("%d:\=" "%") do for /f %%n in ('echo %%f^|findstr /b /e /r "\"[0-9]*\""') do (
echo %%~n
)
Just precede the path with a quote, replace each backslash with a quote, a space and a quote, and append a closing quote, so that we have a list of quoted elements to iterate over; then check, for each element, whether it consists only of digits.
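To see what the substitution produces before any filtering, the elements can simply be echoed. This is a sketch using the same sample path; each path component should appear on its own line, wrapped in quotes.

```batch
@echo off
set "d=C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM"
rem "%d:\=" "%" turns every backslash into quote-space-quote,
rem so the FOR set becomes a list of quoted path components.
for %%f in ("%d:\=" "%") do echo %%f
```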
@echo off
setlocal EnableDelayedExpansion
set "string=C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM"
for /L %%d in (0,1,9) do set "string=!string:\%%d=\ %%d!"
for /F "tokens=2" %%a in ("%string%") do for /F "delims=\" %%b in ("%%a") do echo Number: [%%b]
This uses a helper batch file called repl.bat from - https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat
@echo off
set "string=C:\Users\testing\AppData\Local\Test\abc123\643456\VSALBT81_COM"
echo "%string%"|repl ".*\\([0-9]*)\\.*" "$1"
Here is how I stripped numbers from a string in batch (not a file path; it should work generically for any string):
@ECHO OFF
::set mystring=Microsoft Office 64-bit Components 2013
set mystring=Microsoft 365 Apps for enterprise - en-us
echo mystring = %mystring%
for /f "tokens=1-20 delims=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$&*()-= " %%a in ("%mystring%") do (
IF %%a == 64 (
set ONum=%%b
GoTo varset
)
IF %%a == 32 (
set ONum=%%b
GoTo varset
)
set ONum=%%a
)
:varset
echo numfromalphanumstr = %ONum%
pause
https://www.dostips.com/forum/viewtopic.php?t=3499
https://superuser.com/questions/1065531/filter-only-numbers-0-9-in-output-in-classic-windows-cmd
Extract number from string in batch file
How to extract number from string in BATCH

Problem parsing a list in batch

I am trying to extract tokens from a list of strings using a batch script, but for some reason it ignores my string if it contains an asterisk.
An example to illustrate this problem is as follows:
@echo off
set mylist="test1a,test1b"
set mylist="test2a,test2b*" %mylist%
set mylist="test3a,test3b" %mylist%
echo %mylist%
for %%a in ( %mylist% ) do (
for /F "tokens=1,2 delims=," %%i in ( %%a ) do (
echo %%i
echo %%j
)
)
I would expect this to print out all six tokens but instead it only prints test3a, test3b, test1a, and test1b, like it is ignoring the second string completely.
The placement of the asterisk within the second string doesn't seem to matter, but if I remove it everything works as I expect.
Does anyone know what is going on here?
Got it. The interpreter is trying to match a file name. If you replace "test2a,test2b*" with pp.* and create a file named pp.txt in the same directory, your script will process the contents of pp.txt.
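One possible way around the file-name matching, offered as a sketch rather than as the answerer's method, is to avoid the plain for loop altogether: for /F reads a quoted string literally and does not expand wildcards. The example below assumes the list is stored with semicolons between the pairs (a format chosen for this illustration, in a hypothetical variable named rest) and peels off one pair per pass.

```batch
@echo off
setlocal
rem Hypothetical list format: pairs separated by semicolons.
set "rest=test1a,test1b;test2a,test2b*;test3a,test3b"
:next
rem %%p receives the first pair, %%q everything after the first semicolon.
for /F "tokens=1* delims=;" %%p in ("%rest%") do (
    for /F "tokens=1,2 delims=," %%i in ("%%p") do (
        echo %%i
        echo %%j
    )
    rem An empty remainder undefines the variable and ends the loop.
    set "rest=%%q"
)
if defined rest goto next
```

This should print all six tokens, including test2b*, in the order in which they are stored.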