FINDSTR command in a batch file to display variable output - regex

I was wondering if anyone could help me write a relatively simple batch file command that I can use as a base for the rest of my batch file. I work in a support group that supports many products, and for one in particular I am the only person who understands the XML config files. What I am trying to do is the following:
Here is an excerpt from the config file:
<!-- FILEDROP SETTINGS -->
<!-- metadataType = X - XML; F - Flat file; E - embedded in filename; B - embedded PDF with bookmarks -->
<add key="metadataType" value="E" />
What I am trying to do is to create some GUI (batch file) that a user can run. Upon running the batch file, a user would be prompted to enter the name of the file to search for. In this example, the file name is importer.config. I want the batch file to search for the string
<add key="metadataType" value="E" />
I would like it to take the value between the quotation marks ("E" in this case) and print something to the DOS window to let the user know that this component uses metadata embedded in the file name. Of course, if the value is F, this component uses metadata from a flat file. I am just trying to spell it out for the user in layman's terms instead of having them search through this very large config file, because they never seem to know where to look.
Anyone who can help would be a huge help, as this would be the basis for the rest of my code that displays values to users. I thought that regular expressions and FINDSTR might be the best approach, but I have tried so many things and can't get it working.
Something like: (?<=<add key="metadataType" value=")\w
This would look for the string I need and then take the value that follows (E in this case). I just don't know how to store the result or how to output it as something different. Any help would be appreciated!

The regex support in FINDSTR is severely limited, and what is there does not work the way you are used to from traditional implementations. Read the documentation by typing help findstr or findstr /? in the command window. I also recommend reading What are the undocumented features and limitations of the Windows FINDSTR command?. The description of the regex oddities is toward the bottom of the answer.
You could download and use a Windows version of something like awk, grep, sed or perl. Or you could use VBScript or JScript.
Parsing XML with native batch is a nightmare. You could try something like the following. It is not very robust, but it will work in most cases:
@echo off
setlocal enableDelayedExpansion
for /f "delims=" %%A in ('findstr /rc:"\<add key=[\"\"]metadataType[\"\"] value=[\"\"]" "fileName.txt"') do set "ln=%%A"
set ^"ln=!ln:*"metadataType" value=!"
for /f delims^=^=^" %%A in ("!ln!") do set value=%%A
echo value=!value!
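If installing a scripting language is an option, as suggested above, the same lookup can be sketched in Python. This is only an illustration, not part of the batch answer: the function name and the description table are hypothetical, with the descriptions copied from the comment in the config excerpt, and the pattern assumes the key and value attributes appear in that order on one line.

```python
import re

# Descriptions copied from the comment in the config excerpt (assumed complete).
METADATA_TYPES = {
    "X": "XML",
    "F": "flat file",
    "E": "embedded in filename",
    "B": "embedded PDF with bookmarks",
}

# Matches <add key="metadataType" value="..." and captures the value.
PATTERN = re.compile(r'<add\s+key="metadataType"\s+value="([^"]*)"')


def describe_metadata_type(config_text):
    """Return (value, description) for the first metadataType entry, or None."""
    match = PATTERN.search(config_text)
    if match is None:
        return None
    value = match.group(1)
    return value, METADATA_TYPES.get(value, "unknown")


sample = '''<!-- FILEDROP SETTINGS -->
<add key="metadataType" value="E" />'''
print(describe_metadata_type(sample))
```

In a real script the text would come from the file the user names, for example by reading importer.config into a string first.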

Related

Find string in file with regex in CMD

Hi, I have an XML file and need to find a specific string in it.
The string I am searching for is the value of an XML tag, and I need to store it in a variable. How do I do that in CMD?
We can assume that the file looks something like this:
<rootElement>
    <childElement.version>1.0.3</childElement.version>
</rootElement>
I need to extract "1.0.3" and set it to a variable.
@echo off
for /f "tokens=3 delims=<>" %%a in ('find "childElement.version" file.xml') do set "var=%%a"
echo %var%
Note: this works with your example (it relies on the tag line being indented, so that the leading whitespace becomes token 1), but surely not for every XML file. Batch is not the right tool for XML.
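Since batch is a poor fit for XML, here is a minimal sketch of the same extraction with a real XML parser, assuming Python is available. The helper name first_text is hypothetical; the sample is the one from the question.

```python
import xml.etree.ElementTree as ET


def first_text(xml_text, tag):
    """Return the text of the first element named `tag`, or None if absent."""
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        return element.text
    return None


sample = """<rootElement>
    <childElement.version>1.0.3</childElement.version>
</rootElement>"""
version = first_text(sample, "childElement.version")
print(version)
```

Because the parser works on elements rather than on text tokens, the result does not depend on indentation.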

Using findstr to pass to a variable

I've got a batch file that loops through every file in a directory and dumps certain data into a SQL table. I'm adding a timestamp that I pass into a variable and add to the SQL table using sqlcmd. The only problem is that, to fill in all the relevant columns for each entry, I need to pass the names of the files that are being added to the SQL table.
Okay, here's the catch: the names being added to the SQL table aren't the actual file names but database names that can be found in each of these XML files (close enough to XML). I know where the name is, and every single one looks something like abcdir (rest of the name), where abcdir is a string that starts every single database name.
So I thought I could use findstr to get the database name, but I have very little experience with regex. I'd like to be able to strip out the tags and be left with just name=abcdir (rest of the name).
I didn't think any of my code would really be necessary since I'm just asking about a particular command, but if that's not the case, let me know and I'll post it.
EDIT: Okay so each file will have something like this if opened in notepad.
<Name>ABCDir Sample Name</Name>
or
<Name>ABCDir Sample Name2</Name>
and I'd like ABCDir Sample Name to be passed to a batch variable. So I thought to use findstr.
I have very little grasp of regex but I've tried using findstr >ABCDir[A-Za-z] \path\filename.ext
As I commented above, findstr (or find) will let you scrape lines containing <Name> from a text file, and for /f "delims=<>" will let you split those lines into substrings. With findstr /n, you're looking for "tokens=3 delims=<>" to get the string between <Name> and </Name>.
Try this:
@echo off
setlocal
set "file=temp.txt"
for /f "tokens=3 delims=<>" %%I in ('findstr /n /i "<Name>" "%file%"') do (
    echo %%I
)
I'm using /n with findstr to insert line numbers. The numbers aren't needed, but the switch ensures there's always a token before <Name>. Therefore, the string you want is always tokens=3 regardless of whether the line is indented or not. Otherwise, your string could be token 3 if indented, or token 2 if not. This is easier than trying to determine whether the tags are indented or not.
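For comparison, a regular expression with a capture group sidesteps the token-counting issue entirely. The following is a sketch in Python rather than batch, under the assumption that each name sits on a single line between <Name> and </Name>; extract_names is a hypothetical helper.

```python
import re

# Capture whatever sits between <Name> and </Name>, ignoring tag case.
NAME_TAG = re.compile(r"<Name>(.*?)</Name>", re.IGNORECASE)


def extract_names(text):
    """Return every <Name> value found in the text, in order of appearance."""
    return [value.strip() for value in NAME_TAG.findall(text)]


sample = "  <Name>ABCDir Sample Name</Name>\n<Name>ABCDir Sample Name2</Name>\n"
print(extract_names(sample))
```

The first sample line is indented and the second is not; both are handled the same way.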

Vim how to add key binding that accepts input

I'm pretty new to Vim, but I'm trying to set it up as a C++ IDE.
I'm used to Ctrl-F (or Ctrl-Shift-F) for finding in files, and I saw a plugin I liked called pss.
I'd like to replace Ctrl-F with something that accepts input but still adds parameters of its own (*.cpp, for example).
I was thinking of something like:
noremap <C-f>:Pss $1 *.cpp
How can I do it correctly?
Since you have editing capabilities in the command-line, a commonly used approach just builds an incomplete mapping. You can position the cursor in the edit location, like this:
:noremap <C-f> :Pss  *.cpp<Left><Left><Left><Left><Left><Left>
After triggering the mapping (via <C-f>), you can insert the search pattern, and then trigger the search via <CR>.
Alternative
You can query for input via the input() function; its result can be inserted into the command-line via :execute:
:noremap <C-f> :execute 'Pss' input('Pattern: ') '*.cpp'<CR>
The default search key is faster than Ctrl-F: type "/" in normal mode and continue with your keywords. If you need to search across many files, grep is your friend.

Applescript to extract the Digital Object Identifier (DOI) from a PDF file

I looked for an applescript to extract the DOI from a PDF file, but could not find it. There is enough information available on the actual format of the DOI (i.e. the regular expression), but how could I use this to get the identifier from the PDF file?
(It would be no problem if some external program were used, such as Hazel.)
If you're ok with using an app, I'd recommend Skim. Good AppleScript support. I'd probably structure it like this (especially if the document might be large):
set DOIFound to false
tell application "Skim"
    set pp to pages of document 1
    repeat with p in pp
        set t to text of p
        -- look for DOI and set DOIFound to true
        if DOIFound then exit repeat -- if it's not found then use url?
    end repeat
end tell
I'm assuming a DOI always sits on one page (not split across two). It looks like they are invariably (?) on the first page of an article, which would of course make this quick, even with a large document.
[edit]
Another way would be to get the Xpdf OSX binaries from http://www.foolabs.com/xpdf/download.html and use pdftotext in the command line (just tested this; it works well) and parse the text using AppleScript. If you want to stay in AppleScript, you can do something like:
do shell script "path/to/pdftotext 'path/to/pdf/file.pdf'"
which would output a file in the same directory with a txt file extension -- you parse that for DOI.
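Parsing the extracted text for a DOI could look roughly like the sketch below. It is written in Python rather than AppleScript, and the pattern is an assumption: a commonly used approximation for modern DOIs ("10.", a 4 to 9 digit prefix, a slash, then a suffix), not an official grammar. find_doi is a hypothetical helper.

```python
import re

# Assumed approximation of a DOI: "10.", 4-9 digits, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop punctuation that usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


print(find_doi("Available at doi:10.1000/xyz123. All rights reserved."))
```

Stripping trailing punctuation is a heuristic; a DOI that genuinely ends in one of those characters would be cut short.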
Have you tried it with pdfgrep? It works really well on the command line:
pdfgrep -n --max-count 1 --include "*.pdf" "DOI"
I have no idea how to build an AppleScript for this, though I would be interested in one too, so that dropping a PDF into a folder automatically extracts the DOI and renames the file to include the DOI in the filename.

Can Notepad++ save out search results to a text file?

I need to do quite a few regular expression search/replaces throughout hundreds and hundreds of static files. I'm looking to build an audit trail so I at least know what files were touched by what searches/replaces.
I can do my regular expression searches in Notepad++ and it gives me file names/paths and number of hits in each file. It also gives me the line #s which I don't really care that much about.
What I really want is a separate text file of the file names/paths. The # of hits in each file would be a nice addition, but really it's just a list of file names/paths that I'm after.
In Notepad++'s search results pane, I can do a right click and copy, but that includes all the line #s and code which is just too much noise, especially when you're getting hundreds of matches.
Anyone know how I can get these results to just the file name/paths? I'm after something like:
/about/foo.html
/about/bar.html
/faq/2012/awesome.html
/faq/2013/awesomer.html
/foo/bar/baz/wee.html
etc.
Then I can name that file regex_whatever_search.txt and at the top of it include the regex used for the search and replace. Below that, I've got my list of files it touched.
UPDATE What looks like the easiest thing to do (at least that I've found) is to just copy all the search results into a new text file and run the following regex:
^\tLine.+$
And replace that with an empty string. That'll give you just the file path and hit counts with a lot of empty space between each entry. Then run the following regex:
\s+\n
And replace with:
\n
That'll strip out all the unwanted empty space and you'll be left with a nice list.
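The two regex passes above can also be done in one step by a small script. This is a sketch in Python under an assumed layout for the pasted results: per-hit lines start with a tab followed by "Line", and file lines are indented with spaces. file_lines is a hypothetical helper.

```python
def file_lines(results_text):
    """Keep only the non-hit, non-blank lines from pasted search results."""
    kept = []
    for line in results_text.splitlines():
        if line.startswith("\tLine"):  # per-hit line, same as ^\tLine.+$
            continue
        if not line.strip():  # blank line left between entries
            continue
        kept.append(line.strip())
    return kept


sample = (
    'Search "foo" (3 hits in 2 files)\n'
    "  /about/foo.html (2 hits)\n"
    "\tLine 3: foo\n"
    "\tLine 9: foo\n"
    "  /about/bar.html (1 hit)\n"
    "\tLine 1: foo\n"
)
for entry in file_lines(sample):
    print(entry)
```

As with the regex approach, the search summary line and the hit counts are kept; only the per-hit lines and the blank lines are dropped.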
Maybe you need the power of Unix tools.
Assume you have GNUWin32 installed in c:\tools\gnuwin32.
Then, if you have a replace.bat file with this content:
@echo off
set BIN=c:\tools\gnuwin32\bin
set WHAT=%1
set TOWHAT=%2
set MASK=%3
rem Removing quotes
SET WHAT=###%WHAT%###
SET WHAT=%WHAT:"###=%
SET WHAT=%WHAT:###"=%
SET WHAT=%WHAT:###=%
SET TOWHAT=###%TOWHAT%###
SET TOWHAT=%TOWHAT:"###=%
SET TOWHAT=%TOWHAT:###"=%
SET TOWHAT=%TOWHAT:###=%
SET MASK=###%MASK%###
SET MASK=%MASK:"###=%
SET MASK=%MASK:###"=%
SET MASK=%MASK:###=%
echo %WHAT% replaces to %TOWHAT%
rem printing matching files
%BIN%\grep -r -c "%WHAT%" %MASK%
rem actual replace
%BIN%\find %MASK% -type f -exec %BIN%\sed -i "s/%WHAT%/%TOWHAT%/g" {} +
you can do a recursive regex replace in the files matching the mask and get the output you asked for:
replace "using System.Windows" "using Nothing" *.cs
The regular expression I use for this kind of problem is
^\tLine.[0-9]*:.
And it works for me
This works well if you have Excel available and want to avoid using regular expressions:
Ctrl+A to select all the results
drag & drop the selected results to Excel
Create a Filter on the 1st row
Filter out the lines that have "(Blank)" on the 1st column
Select the remaining lines (i.e. the lines with the filenames) and copy/paste them to another sheet or any wanted destination
You could also Ctrl+A, Ctrl+C the search results, then use the Paste Option "Use Text Import Wizard" in Excel, say that the data is "Fixed width" and put one single break line after the 2nd character (to remove the two leading spaces in the filename during import), and use a filter to filter out the unwanted rows.