Extract value between two strings in batch using a CSV file - regex

I have a CSV file which contains multiple columns. One of these columns is HTML content. My first step is to search for &lt;&lt;&lt; and replace it with <<<; secondly, I search for &gt;&gt;&gt; and replace it with >>>.
My goal is to create an array in batch. For this procedure I would like to search for all elements which look like the scheme above <<<VALUE>>> and create an array.
I found the following code but it doesn't work for me...
for /F "tokens=1-2 delims=<<<>>>-" %%a in (temp.csv) do (@echo %%a %%b)
Any suggestions?
UPDATE:
I would like to use regular expressions now, but this doesn't work either...
for /f %%x in ("temp.csv") do (
echo %%x | findstr /r "^<^<^<^(\.\?\*^)^>^>^>"
)
...any help? :)
kind regards,
markus

I'm now using the simple tool batchRex.exe from administrator.de with a regular expression. I use the pattern <<<(.*?)>>> to get my values and save them to a .txt file. Afterwards I read this file line by line into an array to work on further. Just in case someone has the same problem ;-)
kind regards
markus
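The last step described above (reading the saved values line by line into an array) could look roughly like this. It is a minimal sketch, not the poster's actual script: the file name values.txt and the variable names count and value[...] are assumptions made for illustration. Batch has no real arrays, so each element is simply an environment variable whose name contains an index.

```batch
@echo off
setlocal EnableDelayedExpansion

rem values.txt is a hypothetical name for the file written by the regex tool,
rem with one extracted value per line.
set "count=0"
for /f "usebackq delims=" %%V in ("values.txt") do (
    set /a count+=1
    set "value[!count!]=%%V"
)

rem Walk the pseudo-array by index.
for /l %%N in (1,1,%count%) do echo value[%%N]=!value[%%N]!
```

Note that FOR /F skips empty lines, and that with delayed expansion enabled any ! characters inside a value are lost.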

Related

Batch Script: Extract number from a string

I'm writing a batch script where I need to extract numbers from a string (which indicates the version of a file), so that I can compare it with another number.
Below is my script written so far
:: Over here I'm trying to extract number from the string , this isn't working
for %%F in ("!name!\.." ) do (
::set "number=%%~nxF" |findstr /b /e /r "\"[0-9]*\""
set res=%name:findstr /r "^[1-9][0-9]*$"
echo !res!
)
In the last two for loop I have implemented the extract number logic, but it just prints The system cannot find the drive specified.
If anybody could help me with this issue that would be a great help.

Using findstr to pass to a variable

I've got some files I'm processing with a batch file that loops through every one in a directory and dumps certain data into a SQL table. I'm adding a timestamp that I pass into a variable and add to the SQL table using sqlcmd. The only problem is that, to fill in all relevant columns for that entry, I need to pass the names of the files that are being added to the SQL table.
Okay here's the catch... the names being added to the sql table aren't the actual file names but database names that can be found in each of these xml files (close enough to xml). So I know where that is and every single one looks something like this abcdir (rest of the name) where the abcdir is a string that starts every single database.
So I thought I could use the findstr function to get the database name but I have very little experience with regex and I'd like to be able to parse out the tags and be left with just name=abcdir (rest of the name)
I didn't think any of my code would really be necessary since I'm just asking about a particular command, but if that's not the case, let me know and I'll post it.
EDIT: Okay so each file will have something like this if opened in notepad.
<Name>ABCDir Sample Name</Name>
or
<Name>ABCDir Sample Name2</Name>
and I'd like ABCDir Sample Name to be passed to a batch variable. So I thought to use findstr.
I have very little grasp of regex but I've tried using findstr >ABCDir[A-Za-z] \path\filename.ext
As I commented above, findstr (or find) will let you scrape lines containing <Name> from a text file, and for /f "delims=<>" will let you split those lines into substrings. With findstr /n, you're looking for "tokens=3 delims=<>" to get the string between <Name> and </Name>.
Try this:
@echo off
setlocal
set "file=temp.txt"
for /f "tokens=3 delims=<>" %%I in ('findstr /n /i "<Name>" "%file%"') do (
@echo %%I
)
I'm using /n with findstr to insert line numbers. The numbers aren't needed, but the switch ensures there's always a token before <Name>. Therefore, the string you want is always tokens=3 regardless of whether the line is indented or not. Otherwise, your string could be token 3 if indented, or token 2 if not. This is easier than trying to determine whether the tags are indented or not.
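To end up with the name in a batch variable rather than echoed, the loop body can assign instead. This is a minimal sketch assuming the same temp.txt; the variable name dbname is hypothetical, and only the first matching line is kept.

```batch
@echo off
setlocal
set "file=temp.txt"
set "dbname="

rem findstr /n prefixes each hit with "N:", so the text between
rem <Name> and </Name> is always the third token when splitting on < and >.
for /f "tokens=3 delims=<>" %%I in ('findstr /n /i "<Name>" "%file%"') do (
    if not defined dbname set "dbname=%%I"
)

echo name=%dbname%
```

Dropping the `if not defined` guard would keep the last match instead of the first.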

Writing .txt files from lines 1 to i

I am very close to my answer, but I can't seem to find it.
I am using the findstr command in a batch file to narrow down an entire directory to just one file.
cd ...
findstr /s /m "Desktop" *class.asasm >results1.txt
findstr /m /f:results1.txt "Production" *class.asasm >results2.txt
findstr /n /f:results2.txt "Capabilities" *class.asasm >results3.txt
TASK 1: I need to figure out a way to make findstr search backwards for a fourth string, starting from the line number the third string was found on.
TASK 2: I need to write lines 1 through the one we arrive at from the file in results2.txt, then insert a .txt file, then write the rest of the original lines.
I am writing an application in VB.Net with Visual Studio and I am having a difficult time figuring out how to complete this process. I am currently having better luck with having the application run batch files that are written within the application.
The correct solution is to find a tool that does this properly. batch/CMD does not.
Here's a script that tells you the line numbers of the 3rd and 4th match. It's probably not exactly what you want, but it is a demonstration of how one can effectively work with line numbers.
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET FILE=TestFile.txt
SET _LINENO=1
SET _MATCHNO=0
SET _THIRDLINENUM=
SET _FOURTHLINENUM=
FOR /F %%l IN (%FILE%) DO (
ECHO %%l | FINDSTR "Target" >NUL
IF NOT ERRORLEVEL 1 (
SET /A _MATCHNO=!_MATCHNO!+1
IF !_MATCHNO!==3 SET _THIRDLINENUM=!_LINENO!
IF !_MATCHNO!==4 SET _FOURTHLINENUM=!_LINENO!
)
SET /A _LINENO=!_LINENO!+1
)
@ECHO %_THIRDLINENUM% : %_FOURTHLINENUM%
Here's what's in TestFile.txt
abcdefg
bcdefgh
Target 1
cdefghi
defghij
fghijkl
Target 2
ghijklm
hijklmn
ijklmno
jklmnop
klmnopq
lmnopqr
mnopqrs
Target 3
nopqrst
Target 4
opqrstu
pqrstuv
qrstuvw
rstuvwx
stuvwxy
tuvwxyz
If you insist on using batch/CMD (and I sometimes do when nothing else is available), and you need to get the text on line #n (otherwise, head and tail would do just fine), you could produce a similar loop but replace the code from FINDSTR down to the end of the IF statement with something that compares _LINENO with some other variable, ECHO'ing the line if it is between the two values. IF has no AND/OR operators, so you have to nest the IF statements, like
IF !_LINENO! GEQ %START_LINE% IF !_LINENO! LEQ %END_LINE% @ECHO %%l
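Put together, a loop that prints only a range of lines might look like the sketch below. It swaps the manual _LINENO counter for findstr /n "^", which numbers every line including blank ones (a plain FOR /F over the file skips blank lines, so a manual counter drifts). The values 3 and 7 for START_LINE and END_LINE are assumptions chosen for illustration.

```batch
@echo off
setlocal
set "FILE=TestFile.txt"
set "START_LINE=3"
set "END_LINE=7"

rem findstr /n "^" emits every line as "N:text"; split once on the colon
rem so %%A is the line number and %%B is the rest of the line.
for /f "tokens=1* delims=:" %%A in ('findstr /n "^" "%FILE%"') do (
    if %%A GEQ %START_LINE% if %%A LEQ %END_LINE% echo(%%B
)
```

With the TestFile.txt shown above this prints the lines from Target 1 through Target 2. One caveat: because consecutive delimiters collapse, a line that itself starts with a colon loses its leading colons.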
assuming you need this (from your first comment):
I still have not found a way to search starting at line xx rather than 1 or to search in reverse order
you can try this (from the command line):
for /r %i in ("file pattern") do @more "%~i" +starting_line |findstr "search string"
for /r = recursively (if you mean really reverse, please explain)
"file pattern" = files to find, eg. "*class.asasm"
starting_line = search starting line, eg. 7 (more +6)
"search string" = your search pattern, eg. "Desktop"
OR search "Desktop Production Capabilities"
AND search |findstr "Desktop"|findstr "Production"|findstr "Capabilities"

RegEx in Batch File

Hey, I'm trying to create a function that parses a string passed via a browser protocol. It's a "callto://" protocol and it is in this format: "callto://5551234567/" with the person's phone number inside. I need to extract the number and pass it to another program that dials the number. The syntax for that other program is like this: "CallClerk.exe dial=5551234567=".
I'm a beginner to batch however, and can't figure out exactly what to do. Here's my current code:
@echo off
set var=%1
set number=theirphone
FindStr /R "callto://(..........)/" %var% > %number%
start C:\Program Files (x86)\CallClerk\CallClerk.exe dial=%number%=
Exit /B
Thanks for the help!
@echo off
FOR /f "tokens=2 delims=/" %%i IN ('echo %~1') DO start "" "C:\Program Files (x86)\CallClerk\CallClerk.exe" dial=%%i=
Exit /B
should work for you (untested) - assuming your input parameter is callto://5551234567/
Note the use of quoting: the .exe needs to be quoted since its path contains a space. The extra pair of empty quotes is the window title. If you like, you could replace that pair with "Calling %%i". This parameter is optional, but inserting it ensures that START doesn't get confused between window-title, executable-name and parameter-to-executable.
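To see why tokens=2 picks out the number, it helps to print the tokens that delims=/ produces. A small sketch, using the example string from the question (the variable name url is hypothetical):

```batch
@echo off
setlocal
set "url=callto://5551234567/"

rem Consecutive delimiters count as one, so "//" does not create an empty token.
for /f "tokens=1,2 delims=/" %%A in ("%url%") do (
    echo token 1: %%A
    echo token 2: %%B
)
```

This prints callto: as token 1 and 5551234567 as token 2.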
This works to extract numbers from a string.
It uses two for loops, the first one gathers all the non-numeric characters and they are used as delimiters in the second for loop to gather the numerics and dial the number.
Strings of variable lengths can be handled, as long as all numbers are used in the desired telephone number.
If you want to keep the + as a valid telephone character then include it in the first for command in the delims with the numbers.
@echo off
set "var=callto://5551234567/"
for /f "delims=0123456789" %%a in ("%var%") do set "delims=%%a"
for /f "delims=%delims%" %%a in ("%var%") do (
start "" "C:\Program Files (x86)\CallClerk\CallClerk.exe" dial=%%a=
)
You should be able to use a regex along the lines of (?<=callto:\/\/)[\d]+(?=\/) to grab the number itself. This uses a positive look ahead and look behind to make sure you are matching at least one number that is preceded by the callto:// and followed by a /.
If you left it as something like callto:\/\/[\d]+\/, then it is matching the entire string and will return back with the callto text included. If you are intending to pass just the numbers along to the next part of your code, extract them using the look ahead to guarantee the before and after conditions are met.
I did a quick test using the strings you used in your example. You can see the regex in action here.

Regular Expression with findstr (ms-dos)

I am trying to use ms-dos command findstr to find a string and eliminate it from the file.
At the moment I can find an explicit string but I am really struggling with regular expressions.
The file looks something like the below:
PLs - TULIP Report
Output_Format, PLS - TULIP REPORT
NUMLINES, 110907
VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N
[[data below]]
The file is an export from some system and annoyingly has that header in it - so I would like to clean it before using SQL Loader to bring it into an Oracle database.
There's more than just the one file and all would have the same type of header but ever so slightly different in every file.
Although I am happy to first remove the first 2 lines using hardcoded values, e.g.:
findstr /v "PLs - TULIP Report" "c:\myfiles\file1.PRO" > "c:\myfiles\file1.csv"
findstr /v "Output_Format, PLS - TULIP REPORT" "c:\myfiles\file1.csv" > "c:\myfiles\file2.csv"
(note how I do that in 2 steps - any suggestions to make this happen in a single step would be massively appreciated)
The third line is more complicated for me; it will always be in this format:
NUMLINES, 110907
except that the number at the end would be different for each file. So how do I get to find this entire line using a regular expression? I have tried:
findstr /v /b /r "\D+ \s+ \d+"
but without any luck.
FYI, the data in [[data below]] looks like
*,"00000161",456823,"017896532","FU",23.95,3.34,20.61
etc ..
Obviously, I do not want to modify the data area.
I hope the above makes sense,
Thanks
You must exclude single lines, findstr cannot match multiple lines. Just separate the different regexes with a space
findstr /r /b /v "NUMLINES PLs Output_Format" *.txt
                  ^regex1  ^2  ^3
Specifying /b allows you to find matches only at the beginning of the lines and /v excludes those lines.
EDIT:
Of course the usage is
findstr /r /b /v "NUMLINES PLs Output_Format" yourfile > yourtarget
And in yourtarget you will find the data of yourfile except the lines excluded by the regex.
EDIT 2:
Based on your comments you need just to add VARIABLE_TYPES to your regex making it
findstr /r /b /v "NUMLINES PLs Output_Format VARIABLE_TYPES" yourfile > yourtarget
This is the way to complete the whole operation in one single instruction.
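A quick way to check this on a throwaway file is sketched below. The path %TEMP%\sample.PRO is an assumption made for illustration, and the sample lines are copied from the question.

```batch
@echo off
setlocal
set "src=%TEMP%\sample.PRO"

rem Build a small sample file: four header lines followed by one data line.
> "%src%" (
    echo PLs - TULIP Report
    echo Output_Format, PLS - TULIP REPORT
    echo NUMLINES, 110907
    echo VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N
    echo *,"00000161",456823,"017896532","FU",23.95,3.34,20.61
)

rem Each space-separated word is its own pattern; /b anchors it at the
rem start of the line and /v prints only the lines that match none of them.
findstr /r /b /v "NUMLINES PLs Output_Format VARIABLE_TYPES" "%src%"

del "%src%"
```

Only the data line should be printed. Keep in mind that any data line that happened to start with one of these words would be dropped as well.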
Here is a one liner using regex that will exclude all four lines. (I used line continuation so that the code looks better.) Each line must match exactly. I allow for each line to end in any number of spaces because I wasn't sure of your format. Note - FINDSTR regex support is very limited and non-standard. There are many other FINDSTR quirks and bugs. See What are the undocumented features and limitations of the Windows FINDSTR command? for more info.
findstr /vrx /c:"PLs - TULIP Report *"^
/c:"Output_Format, PLS - TULIP REPORT *"^
/c:"NUMLINES, *[0-9]* *"^
/c:"VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N *"^
"c:\myfiles\file1.PRO" >"c:\myfiles\file1.csv"
If all you need to do is skip the first 4 lines, then you normally should be able to use MORE. But there are some circumstances with large files where MORE can hang, but I can't remember the specifics. Also MORE will convert tabs into a series of spaces.
more +4 "c:\myfiles\file1.PRO" >"c:\myfiles\file1.csv"
Another option is to use a FOR /F loop. The FOR /F skips empty lines, but I don't think that is a concern for you.
>"c:\myfiles\file1.csv" (
for /f "usebackq skip=4 delims=" %%A in ("c:\myfiles\file1.PRO") do echo(%%A
)
If any of your data can begin with a ; then the code gets a bit uglier. You would then want to disable the EOL option by setting it to a line feed character.
set LF=^


::above 2 blank lines are critical - do not remove
>"c:\myfiles\file1.csv" (
for /f usebackq^ skip^=4^ eol^=^%LF%%LF%^ delims^= %%A in ("c:\myfiles\file1.PRO") do echo(%%A
)