Regex OR expression in batch

How can I write a regular expression with OR in batch?
I have a file and I want to find "aa" or "bb".
The file contains these lines:
aa
bb
cc
These are the commands I have tried:
findstr /I /R /C:"aa\|bb" temp.txt
and
findstr /I /R /C:"aa|bb" temp.txt
Can anyone help me with the OR syntax for regular expressions in batch?
Thanks.

Quoted from the doc:
FINDSTR does not support alternation with the pipe character (|) multiple Regular Expressions can be separated with spaces, just the same as separating multiple words (assuming you have not specified a literal search with /C) but this may not be useful if the regex itself contains spaces.
Reference: http://ss64.com/nt/findstr.html

The default behaviour of findstr is to include all lines that match at least one of the conditions. So, your findstr line should be
findstr /i /c:"aa" /c:"bb" temp.txt
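To check which lines such a command should select, the "match at least one of the conditions" rule can be modelled outside of batch. The following Python sketch only models the behaviour described above (literal, case-insensitive, any-of); it is not FINDSTR itself, and the function name is made up for illustration.

```python
def lines_matching_any(lines, needles):
    """Return the lines that contain at least one needle, ignoring case."""
    lowered = [needle.lower() for needle in needles]
    return [
        line for line in lines
        if any(needle in line.lower() for needle in lowered)
    ]


# The three lines from the question's temp.txt.
sample = ["aa", "bb", "cc"]
print(lines_matching_any(sample, ["aa", "bb"]))  # prints ['aa', 'bb']
```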

Related

Command Line findstr with a regular expression

I need to search through all the files in a directory and sub directories to match any of the numbers in the reg exp. Basically in our code we have blocks of code based on certain project numbers. I need to find these blocks by project number. This regular expression does what I need but I cannot get it to work at the command line
([^0-9]|^)(56|14|2)([^0-9]|$)
I tested this on https://www.freeformatter.com/regex-tester.html against this string "If session.projid = 56 and then again 14 or something else"
I am trying this at the command line
findstr /s /R /C:"([^0-9]|^)(56|14|2)([^0-9]|$)" *.*
But I get no results, and I know there should be some. Thanks in advance for any help on this.
See these docs:
FINDSTR does not support alternation with the pipe character (|) multiple Regular Expressions can be separated with spaces, just the same as separating multiple words (assuming you have not specified a literal search with /C) but this might not be useful if the regex itself contains spaces.
In your case, you may use \< / \> word boundaries with each number and you may specify all your alternatives after a space:
findstr /s /r "\<56\> \<14\> \<2\>" *.*
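If a full regex engine is available, the pattern from the question can be used unchanged. The sketch below, in Python, is one possible way to do that and not a replacement for the findstr answer above; the names PROJECT_RE and mentions_project are invented for this example.

```python
import re

# The asker's original pattern: one of the project numbers, not embedded
# in a longer run of digits.
PROJECT_RE = re.compile(r"([^0-9]|^)(56|14|2)([^0-9]|$)")


def mentions_project(line):
    """Return True if the line contains 56, 14 or 2 as a standalone number."""
    return PROJECT_RE.search(line) is not None


print(mentions_project("If session.projid = 56 and then again 14 or something else"))
```

A number such as 156 or 256 is not reported, because the digits next to the project number break the match.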

How can I do a negative regex match in batch?

This time, I am unable to create an if statement for my previous post
I'm trying to check if the value is in this form: ^\d\d\.\d$
Most of the time, it is. However, the value is sometimes unavailable.
In other scripting languages, I can manage it, but I cannot, for some unknown reasons, figure out how to do it in batch.
So, it should be something like:
if not findstr /R "^\d\d\.\d$" %newvalue%
set newvalue=
If it's not "^\d\d\.\d$", then set newvalue blank. This is only to blank newvalue when it is not found because it will give random results.
Can somebody help me out? What's the best way to do this if statement?
Thanks in advance.
The very good working solution posted by LotPings is:
Echo:%newvalue%|findstr "^[0-9][0-9]\.[0-9]$" >Nul 2>&1 &&(echo matched pattern)||(echo didn't match pattern)
I suggest a slightly different single-line solution:
echo:%newvalue%|%SystemRoot%\System32\findstr.exe /R "^[0123456789][0123456789]\.[0123456789]$" >nul && (echo matched pattern) || (echo didn't match pattern)
Or, more readable and working for any string value assigned to the environment variable newvalue:
setlocal EnableExtensions EnableDelayedExpansion
echo:!newvalue!| %SystemRoot%\System32\findstr.exe /R "^[0123456789][0123456789]\.[0123456789]$" >nul
if errorlevel 1 (
echo Value does not match the regular expression pattern.
) else (
echo Value matches the regular expression pattern.
)
endlocal
A colon is used between the command echo and the string assigned to the environment variable newvalue instead of a space, so that ECHO does not output the current status of command echoing when newvalue is not defined at all.
The last solution uses delayed expansion so that the command line with ECHO and FINDSTR still works as written when the string assigned to newvalue contains operators like &<>|.
It is important that there is no space to the left of the redirection operator |, which pipes the output of ECHO to FINDSTR as input. Such a space would also be output by ECHO, and the regular expression would then never match. A space to the right of | does not matter, as the last example demonstrates.
FINDSTR does not support \d as representation for any digit. It is necessary to specify the characters to match in a self-defined character class in square brackets. FINDSTR matches with [0-9] the characters 0123456789¹²³. It is necessary to use [0123456789] to match just the characters 0123456789 without ¹²³.
. means any character, unless the dot is escaped with a backslash, in which case it is interpreted as a literal dot.
A search string in "..." is interpreted by FINDSTR by default as regular expression string. But I think, it is always good to make use of /L or /R to make it 100% clear for FINDSTR and for every reader how the search string should be interpreted, as literal or as regular expression string.
FINDSTR exits with value 1 if there is no match in the searched input and with 0 on at least one match. The exit code of FINDSTR can be evaluated as shown above and as described in detail in "Single line with multiple commands using Windows batch file".
The output of FINDSTR on a positive match is of no interest and therefore redirected to device NUL to suppress it.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
echo /?
endlocal /?
findstr /?
if /?
setlocal /?
See also the Microsoft article "Using command redirection operators" and the DosTips forum topic "ECHO. FAILS to give text or blank line - Instead use ECHO/" for the reason why using : after echo is often better than a space when outputting a string read from a file or entered by a user.
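For comparison with the "other scripting languages" mentioned in the question, the same check can be sketched in Python. This is only an illustration of the validation rule (two digits, a dot, one digit), not part of the batch solution; clean_value is a hypothetical helper name.

```python
import re

# [0-9] instead of \d, because \d in Python also accepts non-ASCII digits.
VALUE_RE = re.compile(r"[0-9][0-9]\.[0-9]")


def clean_value(newvalue):
    """Return newvalue if it has the form NN.N, otherwise an empty string."""
    # fullmatch() requires the whole string to match, so no ^ and $ are needed.
    return newvalue if VALUE_RE.fullmatch(newvalue) else ""


print(repr(clean_value("12.5")))
print(repr(clean_value("n/a")))
```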

findstr query including tab character

I'm trying to use findstr in place of grep on a barebones vanilla Windows box (which is sadly a requirement). I have some relatively large files (1 GB+), and I would like to extract those lines which don't include MX, MXnn, BR, and BRnn delimited by tabs. If I were writing a 'real' regex, then
\t(MX|BR)(..)?\t
would cover it. I don't mind doing it in two stages, but I can't for the life of me seem to include the delimiter tabs.
So far I have:
findstr /V MX source.txt >> temp.txt
findstr /V BR temp.txt >> dest.txt
which due to the nature of the data does an ok-ish job, but I would really rather use something like:
findstr /R /V "\t(MX|BR)(..)?\t" source.txt >> dest.txt
I've tried double slashes, escape sequences etc. but seem to be running around in circles.
I'm loath to resort to VBScript if I can help it.
Any ideas, given limitations of vanilla windows?
EDIT
I've looked into generating an exclusion file using the /G option, but generating might start to become problematic, once the users cotton on to the possibilities - a regex would just be a lot easier.
A possible solution from the command line or in a batch file is using:
%SystemRoot%\System32\findstr.exe /V /R /C:"\<BR[0-9]*\>" /C:"\<MX[0-9]*\>" "source.txt"
Because of /V, this outputs the lines of source.txt that do not contain a match. The search is case-sensitive and, because of /R, uses the two regular expression search terms \<BR[0-9]*\> and \<MX[0-9]*\>, which FINDSTR combines with a logical OR. Each term matches BR or MX followed by zero or more digits as an entire word because of \< and \>.
This might already be enough to filter source.txt correctly. But it also filters out lines containing BR[0-9]* or MX[0-9]* surrounded by word-delimiting characters other than horizontal tab characters.
It is possible to use in a batch file:
%SystemRoot%\System32\findstr.exe /V /R /C:"[ ]BR[0-9]*[ ]" /C:"[ ]MX[0-9]*[ ]" "source.txt"
ATTENTION: There must be exactly 1 horizontal tab character in the batch file inside each of the 4 pairs of square brackets. Browsers display those 4 tab characters as 1 or more spaces according to the HTML specification.
Open a command prompt window and run findstr /? for more information about FINDSTR.
And perhaps also read the Stack Overflow article "What are the undocumented features and limitations of the Windows FINDSTR command?"
As far as I can see, there is no syntax to specify a horizontal tab directly.
Findstr regex seems pretty basic; it doesn't have \s, \t, \d and the like :-).
However you can use an input file to specify your search pattern. Inside this file you can use tabs literally.
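The example from your original post, "\t(MX|BR)(..)?\t", cannot be used as-is, because FINDSTR supports neither alternation with | nor the ? quantifier. Instead, put one pattern per line into the file, where <TAB> stands for a literal tab character:
<TAB>MX<TAB>
<TAB>MX..<TAB>
<TAB>BR<TAB>
<TAB>BR..<TAB>
Each <TAB> placeholder is a real tab typed and saved in the file.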
Then you would use findstr with something like:
findstr /R /G:patternFileWithTabs.txt sourceFile.txt
You can also get the job done most of the time by specifying a negated character class.
If you exclude all alphanumeric characters, common separators, and other whitespace characters, the only thing likely left is a tab.
For example, I was searching for a sequence that in a conventional regex would be:
"\t\tUnknown\t\t\t\t0\t"
In my use case I could grep it with findstr like:
findstr /R "[^ a-z0-9][^ a-z0-9]Unknown[^ a-z0-9]*0[^ a-z0-9]" logfile.txt
Of course it depends on the actual data you have. In theory the pattern would match also other strings, but these other strings don't occur in my source file, so it works.
Most of the time you don't need a 100% bullet proof pattern.
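Where a scripting language is allowed after all, the 'real' regex from the question can be applied directly, because \t is then understood as a tab. The following Python sketch is one such approach under that assumption; the function names are invented here, and bytes are used so that no file encoding has to be guessed.

```python
import re

# The regex from the question, applied to raw bytes.
TAB_CODE_RE = re.compile(rb"\t(MX|BR)(..)?\t")


def keep_line(line):
    """Return True if the line has no tab-delimited MX, MXnn, BR or BRnn field."""
    return TAB_CODE_RE.search(line) is None


def filter_lines(lines):
    """Yield only the lines that should be kept (the equivalent of findstr /V)."""
    for line in lines:
        if keep_line(line):
            yield line


# Streaming use on large files (file names taken from the question):
#   with open("source.txt", "rb") as src, open("dest.txt", "wb") as dst:
#       dst.writelines(filter_lines(src))
```

Iterating over the file object reads one line at a time, so the 1 GB+ files are never loaded into memory as a whole.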

How to use FINDSTR to get lines with single OR double quotes

I'm trying to get lines from files in a directory tree with a single or double quote in them. As an example, I want to get these lines with a single findstr command:
You should be able to get a "Hello world" program really quick.
Enter a value for params 'name' and 'age', please.
If no value is entered for 'name' param, the "Hello world" program will throw an exception.
I can get lines with only single quotes (findstr /srn \' *.txt), only double quotes (findstr /srn \^" *.txt), or both single and double quotes (findstr /srn \'\^" *.txt), but I need lines with single or double quotes with only a single command.
Any idea about how to achieve it?
Explosion Pills had the correct idea, but not the correct syntax.
It can get tricky when trying to escape the quote for both FINDSTR and for the CMD parser. The regex you want is ['\"], but then you need to escape the " for the CMD parser.
This will work:
findstr /srn ['\^"] *.txt
and so will this
findstr /srn "['\"]^" *.txt
and so will this
findstr /srn ^"['\^"]^" *.txt
NOTE
You are at risk of failing to find files that contain the string because of a nasty FINDSTR bug. The /S option may fail to find files if 8.3 short names are enabled and a folder contains a file with an extension longer than 3 chars that starts with .txt. (name.txt2 for example). For more information, see the section labeled BUG - Short 8.3 filenames can break the /D and /S options at What are the undocumented features and limitations of the Windows FINDSTR command?.
Use a character class
findstr /srn ['\^"] *.txt
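The character class itself is easier to see without the CMD escaping. As a cross-check, here is the same class ['"] in a short Python sketch; it only demonstrates the regex, not the findstr quoting rules, and has_quote is a made-up name.

```python
import re

# Character class containing a single quote and a double quote.
QUOTE_RE = re.compile(r"""['"]""")


def has_quote(line):
    """Return True if the line contains a single or a double quote."""
    return QUOTE_RE.search(line) is not None


lines = [
    'You should be able to get a "Hello world" program really quick.',
    "Enter a value for params 'name' and 'age', please.",
    "This line has no quotes at all.",
]
for line in lines:
    print(has_quote(line), line)
```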

Using regular expressions in findstr

I'm trying to implement a hook script in Subversion, using findstr with a regular expression. The intent is to enforce the inclusion of an entry in the log message that matches the format used by our issue tracking tool (Atlassian JIRA). Our issues each consist of 4 to 6 capital letters and 2 to 4 numerals, separated by a hyphen (e.g., "TEST-554" or "CMMGT-392"). Per instructions in the Subversion documentation, I've created a batch file to check the log message for a correctly-formatted entry, using the regex
findstr ([A-Z]{3,6}\-[0-9]{2,4}) > nul
I've tested the regex in a number of testing tools and it seems to work, but when I run it as part of the hook script, it fails to return a match. As a sort of "control", I tried using the regex
findstr ...... > nul
and was able to find a match. Anyone see where I'm going wrong?
findstr requires the /R option to use regular expressions, but it doesn't support extended regular expressions, so things like counts ({3,6}) don't work. Also, zero-or-one matches (?) don't work, so doing what you want will get pretty verbose. Also, English Windows collation means that [A-Z] matches 'A', 'b', 'B', 'z', and 'Z', but not 'a'. Here's something that might work:
findstr /R "[ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9] [ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ][ABCDEFGHIJKLMNOPQRSTUVWXYZ]-[0-9][0-9][0-9][0-9]"
This incredibly verbose command may exceed the maximum command length of the shell (haven't checked), but basically does what you want by containing a separate match for each of the permutations of letter and number counts. That's another odd thing about findstr: unless you use the /C option, spaces in your match string will be used to separate it into individual match expressions.
If you have any option besides findstr such as PowerShell, Python, or even VBScript, I would suggest you use it. Good luck!
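As a rough idea of what the Python route could look like, here is a minimal sketch that applies the asker's original regex (without the unneeded backslash before the hyphen) to a log message. How the message reaches the script inside the hook is not shown, and the names ISSUE_KEY_RE and has_issue_key are assumptions made for this example.

```python
import re

# 3 to 6 capital letters, a hyphen, 2 to 4 digits, as in the asker's regex.
ISSUE_KEY_RE = re.compile(r"[A-Z]{3,6}-[0-9]{2,4}")


def has_issue_key(log_message):
    """Return True if the log message contains something like TEST-554."""
    return ISSUE_KEY_RE.search(log_message) is not None


print(has_issue_key("TEST-554: fix the login form"))
print(has_issue_key("fixed a bug"))
```

Like the findstr command above, the search is not anchored, so a key is also found in the middle of a longer run of letters or digits.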
EDIT: Here's the Perl one-liner I used to generate the above command:
perl -le 'BEGIN{$\=" "}for $x (3..6){for $y (2..4){print join("","[",A..Z,"]") x $x, "-", "[0-9]" x $y}}'
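The same generator can be written in Python for machines without Perl. This is a sketch that should produce the same space-separated patterns, except that it emits no trailing space after the last one; build_findstr_patterns is a hypothetical name.

```python
import string


def build_findstr_patterns(letter_counts=range(3, 7), digit_counts=range(2, 5)):
    """Build one pattern per combination of letter count and digit count."""
    upper = "[" + string.ascii_uppercase + "]"
    return " ".join(
        upper * letters + "-" + "[0-9]" * digits
        for letters in letter_counts
        for digits in digit_counts
    )


# 4 letter counts times 3 digit counts gives 12 patterns.
print(build_findstr_patterns())
```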