Command Line findstr with a regular expression - regex

I need to search through all the files in a directory and its subdirectories for lines that match any of the numbers in a regular expression. In our code we have blocks based on certain project numbers, and I need to find these blocks by project number. This regular expression does what I need, but I cannot get it to work at the command line:
([^0-9]|^)(56|14|2)([^0-9]|$)
I tested this on https://www.freeformatter.com/regex-tester.html against this string: "If session.projid = 56 and then again 14 or something else".
I am trying this at the command line:
findstr /s /R /C:"([^0-9]|^)(56|14|2)([^0-9]|$)" *.*
But I get no results, and I know there should be some. Thanks in advance for any help on this.

See these docs:
FINDSTR does not support alternation with the pipe character (|) multiple Regular Expressions can be separated with spaces, just the same as separating multiple words (assuming you have not specified a literal search with /C) but this might not be useful if the regex itself contains spaces.
In your case, you can wrap each number in the \< and \> word-boundary anchors and list the alternatives separated by spaces:
findstr /s /r "\<56\> \<14\> \<2\>" *.*
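To check what the original expression is expected to match, a standard regex engine can serve as a cross-check. The following is a minimal Python sketch, not a findstr replacement; the function name line_matches is made up for illustration, and the pattern is the one from the question.

```python
import re

# The pattern from the question, unchanged. Python's re module supports
# alternation (|) and grouping, which findstr does not.
PATTERN = re.compile(r"([^0-9]|^)(56|14|2)([^0-9]|$)")

def line_matches(line):
    """Return True if 56, 14 or 2 occurs with no digit directly before or after it."""
    return PATTERN.search(line) is not None

sample = "If session.projid = 56 and then again 14 or something else"
print(line_matches(sample))          # True
print(line_matches("projid = 256"))  # False
```

The \< and \> anchors in the findstr command may be slightly stricter than this pattern, because they require a word boundary rather than just the absence of an adjacent digit.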

Related

Windows Batch File Regular Expression

I have the following requirement that needs to be achieved in a .bat file. Can someone please help?
There is a string, ABCD-1234 TEST SENTENCE, in a variable, say str. I want to check whether the string starts with the format [A-Z]*-[0-9] * or not.
How can I achieve this? I tried various regular expressions with FINDSTR, but couldn't get the desired result.
Example:
set str=ABCD-1234 TEST SENTENCE
echo %str% | findstr /r "^[A-Z]*-[0-9] *"
I'm assuming you are looking for strings that begin with 1 or more upper case letters, followed by a dash, followed by 1 or more digits, followed by a space.
If the string might contain poison characters like &, <, > etc., then you really should use delayed expansion.
FINDSTR regex is totally non-standard. For example, [A-Z] does not properly represent uppercase letters to FINDSTR; it also includes most of the lowercase letters, as well as some non-English characters. You must explicitly list all uppercase letters. The same is true for the digits.
A space is interpreted as a search string delimiter unless the /C:"search" option is used.
setlocal enableDelayedExpansion
echo(!str!|findstr /rc:"^[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]*-[0123456789][0123456789]* "
You should have a look at What are the undocumented features and limitations of the Windows FINDSTR command?
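Where Python happens to be available, the intended check can be compared against a standard regex engine. This is a minimal sketch, assuming the rule stated above (one or more uppercase letters, a dash, one or more digits, a space); starts_with_key is an illustrative name, not part of any tool.

```python
import re

# In Python's re module, [A-Z] and [0-9] are plain code point ranges, so the
# explicit character lists that FINDSTR needs are not required here.
KEY_PREFIX = re.compile(r"[A-Z]+-[0-9]+ ")

def starts_with_key(text):
    """Return True if text begins with something like 'ABCD-1234 '."""
    return KEY_PREFIX.match(text) is not None

print(starts_with_key("ABCD-1234 TEST SENTENCE"))  # True
print(starts_with_key("abcd-1234 test sentence"))  # False
```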

findstr query including tab character

I'm trying to use findstr in place of grep on a barebones vanilla Windows box (which is sadly a requirement). I have some relatively large files (1 GB+), and I would like to extract those lines which don't include MX, MXnn, BR, or BRnn delimited by tabs. If I were writing a 'real' regex, then
\t(MX|BR)(..)?\t
would cover it. I don't mind doing it in two stages, but I can't for the life of me seem to include the delimiter tabs.
So far I have:
findstr /V MX source.txt >> temp.txt
findstr /V BR temp.txt >> dest.txt
which due to the nature of the data does an ok-ish job, but I would really rather use something like:
findstr /R /V "\t(MX|BR)(..)?\t" source.txt >> dest.txt
I've tried double slashes, escape sequences etc. but seem to be running around in circles.
I'm loath to resort to VBScript if I can help it.
Any ideas, given the limitations of vanilla Windows?
EDIT
I've looked into generating an exclusion file using the /G option, but generating might start to become problematic, once the users cotton on to the possibilities - a regex would just be a lot easier.
A possible solution from the command line or in a batch file is using:
%SystemRoot%\System32\findstr.exe /V /R /C:"\<BR[0-9]*\>" /C:"\<MX[0-9]*\>" "source.txt"
Because of /R, the two search terms \<BR[0-9]*\> and \<MX[0-9]*\> are interpreted as regular expressions, and FINDSTR combines them with a logical OR. The search is case-sensitive. Each term matches BR or MX followed by 0 or more digits as an entire word, because of \< and \>. Because of /V, FINDSTR outputs only the lines of source.txt that contain neither.
This might already be enough to filter source.txt correctly. However, it also filters out lines in which BR[0-9]* or MX[0-9]* is surrounded by word-delimiting characters other than horizontal tab characters.
It is possible to use in a batch file:
%SystemRoot%\System32\findstr.exe /V /R /C:"[ ]BR[0-9]*[ ]" /C:"[ ]MX[0-9]*[ ]" "source.txt"
ATTENTION: In the batch file there must be exactly 1 horizontal tab character inside each of the 4 pairs of square brackets. Browsers display those 4 tab characters as 1 or more spaces, according to the HTML specification.
Open a command prompt window and run findstr /? for more information about FINDSTR.
And perhaps read also the Stack Overflow article
What are the undocumented features and limitations of the Windows FINDSTR command?
As far as I can see, there is no syntax to specify a horizontal tab directly.
Findstr regex seems pretty basic; it doesn't have \s, \t, \d and the like :-).
However you can use an input file to specify your search pattern. Inside this file you can use tabs literally.
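Findstr still does not support alternation with |, grouping, or ?, so the example from your original post, "\t(MX|BR)(..)?\t", has to be expanded into one pattern per line in the file:
<TAB>MX<TAB>
<TAB>MX..<TAB>
<TAB>BR<TAB>
<TAB>BR..<TAB>
Here each <TAB> stands for a literal tab character typed and saved in the file.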
Then you would use findstr with something like:
findstr /R /G:patternFileWithTabs.txt sourceFile.txt
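Typing literal tabs into a pattern file by hand is easy to get wrong, so the file could instead be generated on any machine where Python is available. This is a minimal sketch under assumptions: build_patterns and write_pattern_file are illustrative names, and the alternatives are expanded into one pattern per line because findstr has no | or ? operators.

```python
TAB = "\t"  # written to the file as a real tab character

def build_patterns(prefixes=("MX", "BR")):
    # findstr has no | or ?, so each alternative becomes its own line:
    # the bare code, and the code followed by any two characters (..).
    patterns = []
    for prefix in prefixes:
        patterns.append(TAB + prefix + TAB)
        patterns.append(TAB + prefix + ".." + TAB)
    return patterns

def write_pattern_file(path="patternFileWithTabs.txt"):
    # One search pattern per line, as findstr /G expects.
    with open(path, "w") as handle:
        handle.write("\n".join(build_patterns()) + "\n")

for pattern in build_patterns():
    print(repr(pattern))
```

After calling write_pattern_file(), the file could be passed to findstr with /R /V /G:patternFileWithTabs.txt to drop the matching lines.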
You can also get the job done most of the time by specifying a negated character class.
If you exclude all alphanumeric characters, common separators, and other whitespace characters, the only thing likely left is a tab.
For example, I was searching for a sequence that in a standard regex would be:
"\t\tUnknown\t\t\t\t0\t"
In my use case I could grep it with findstr like:
findstr /R "[^ a-z0-9][^ a-z0-9]Unknown[^ a-z0-9]*0[^ a-z0-9]" logfile.txt
Of course it depends on the actual data you have. In theory the pattern would match also other strings, but these other strings don't occur in my source file, so it works.
Most of the time you don't need a 100% bullet proof pattern.

regex expression or in batch

How can I write a regular expression with OR in a batch file?
I have a file and I want to find "aa" or "bb".
The file contains these lines:
aa
bb
cc
These are the commands I have tried:
findstr /I /R /C:"aa\|bb" temp.txt
and
findstr /I /R /C:"aa|bb" temp.txt
Can anyone help me with the OR syntax for regular expressions in batch?
Thanks.
Quoted from the doc:
FINDSTR does not support alternation with the pipe character (|) multiple Regular Expressions can be separated with spaces, just the same as separating multiple words (assuming you have not specified a literal search with /C) but this may not be useful if the regex itself contains spaces.
Reference: http://ss64.com/nt/findstr.html
The default behaviour of findstr is to include all lines that match at least one of the conditions. So, your findstr line should be
findstr /i /c:"aa" /c:"bb" temp.txt
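The OR behaviour of multiple /c: terms can be pictured with a short Python sketch. This only mimics the matching rule described above, under the assumption of plain substring terms; matches_any is an illustrative name and has nothing to do with how findstr is implemented.

```python
def matches_any(line, terms=("aa", "bb")):
    """Mimic findstr /i /c:"aa" /c:"bb": case-insensitive, any one term is enough."""
    lowered = line.lower()
    return any(term.lower() in lowered for term in terms)

lines = ["aa", "bb", "cc"]
print([line for line in lines if matches_any(line)])  # ['aa', 'bb']
```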

Find text string or part of text with dot in grepWin

I am using grepWin on Windows 7 64. http://tools.tortoisesvn.net/grepWin.html
I have a folder with files and their duplicate copies.
The original files are named "FILENAME DOT FILETYPE" (without spaces), for example "cartonbox.shelf".
The copies of these file are named "FILENAME DOT 1 DOT FILETYPE" (without spaces), for example "cartonbox.1.shelf".
I am trying to find all files that contain the exact string:
"DOT 1 DOT FILETYPE" (without spaces), so all files that have for example ".1.shelf" in them.
How can I do that in grepWin please?
If I try "\.1\shelf" or "\.1\.shelf", for example, I do not get any results.
What is my mistake, please? I have been reading http://www.regular-expressions.info/ but cannot come up with the correct pattern.
How can I generally search for an exact part of the filename regardless of symbols?
Basically if the file I want to find has for example "garden_1.1.4-JE50.tree" in it how do I tell grepWin to find this exact string of text including underscore, dots or other characters?
Grep stands for g/re/p (global / regular expression / print)
It searches IN files, not file names. That text would need to be text-readable in the file for which you are searching.
In the directory you want to search you could do something like:
dir *.* /b/s > my_file.txt
Then you can perform your regular expressions checks with grepWin on my_file.txt
In Unix and Linux you normally pipe the commands via the command line:
ls -a | grep '\.tree$'
In Windows you would use
dir * /b | findstr \.tree$
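For matching an exact fragment of a file name regardless of the symbols in it, the fragment has to be escaped before it is used as a regex. This is a minimal Python sketch of that idea, assuming Python is an option; find_by_name is a made-up helper name, and the fragment ".1.shelf" is taken from the question.

```python
import re
from pathlib import Path

def find_by_name(root, fragment):
    """Yield files under root whose name contains fragment literally."""
    # re.escape neutralises dots, dashes and other regex metacharacters,
    # so the fragment is matched exactly as typed.
    pattern = re.compile(re.escape(fragment))
    for path in Path(root).rglob("*"):
        if path.is_file() and pattern.search(path.name):
            yield path

if __name__ == "__main__":
    for hit in find_by_name(".", ".1.shelf"):
        print(hit)
```

The same call with "garden_1.1.4-JE50.tree" as the fragment would look for that name verbatim, since every dot is treated as a literal dot.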
I learned that grepWin is for searching IN files. I am looking to search for parts of the names of files, not their contents. Hence I am now reading this: https://superuser.com/questions/209231/what-search-utilities-can-search-by-file-name-in-windows-7
Thanks for explaining this crucial misunderstanding to me, cpattersonv1.

Using regular expressions in findstr

I'm trying to implement a hook script in Subversion, using findstr with a regular expression. The intent is to enforce the inclusion of an entry in the log message that matches the format used by our issue tracking tool (Atlassian JIRA). Our issues each consist of 4 to 6 capital letters and 2 to 4 numerals, separated by a hyphen (e.g., "TEST-554" or "CMMGT-392"). Per instructions in the Subversion documentation, I've created a batch file to check the log message for a correctly-formatted entry, using the regex
findstr ([A-Z]{3,6}\-[0-9]{2,4}) > nul
I've tested the regex in a number of testing tools and it seems to work, but when I run it as part of the hook script, it fails to return a match. As a sort of "control", I tried using the regex
findstr ...... > nul
and was able to find a match. Anyone see where I'm going wrong?
findstr requires the /R option to use regular expressions, but it doesn't support extended regular expressions, so things like counts ({3,6}) don't work. Also, zero-or-one matches (?) don't work, so doing what you want will get pretty verbose. Also, English Windows collation means that [A-Z] matches 'A', 'b', 'B', 'z', and 'Z', but not 'a'. Here's something that might work:
findstr /R "[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9]"
This incredibly verbose command may exceed the maximum command length of the shell (haven't checked), but it basically does what you want by containing a separate match expression for each combination of letter and number counts. That's another odd thing about findstr: unless you use the /C option, spaces in your match string are used to separate it into individual match expressions.
If you have any option besides findstr such as PowerShell, Python, or even VBScript, I would suggest you use it. Good luck!
EDIT: Here's the Perl one-liner I used to generate the above command:
perl -le 'BEGIN{$\=" "}for $x (3..6){for $y (2..4){print join("","[",A..Z,"]") x $x, "-", "[0-9]" x $y}}'
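If Python is an option, as suggested above, the check shrinks to a single pattern. This is a minimal sketch, not a complete hook script: has_issue_key is an illustrative name, the counts (3 to 6 letters, 2 to 4 digits) are the ones from the regex in the question, and the lookarounds are an added assumption so that an over-long key is not accepted in part.

```python
import re

# 3 to 6 capital letters, a hyphen, 2 to 4 digits (the counts used in the
# regex from the question). The lookarounds are an added assumption: they
# stop a longer run of letters or digits from matching in part.
ISSUE_KEY = re.compile(r"(?<![A-Z])[A-Z]{3,6}-[0-9]{2,4}(?![0-9])")

def has_issue_key(log_message):
    """Return True if the log message contains something like TEST-554."""
    return ISSUE_KEY.search(log_message) is not None

print(has_issue_key("TEST-554 fix null check"))  # True
print(has_issue_key("fixed a typo"))             # False
```

In a hook script, the result would then be turned into the exit code, with a non-zero value rejecting the commit.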