REGEX: DOS FindStr Command Has No Inverse Class Capability?

The documentation says that FindStr handles inverse class syntax, such as finding any character that does not match 'X':
[^X]
But a few simple tests show this is not the case - at least not on my Windows 7 x64 setup. Findstr treats the inverse class notation '^' as if it is not there at all. So it sees the above regex as this:
[X]
I'm wondering if anyone knows a way to trick FindStr into recognizing the '^' inverse class notation?
Yes, I could use a different tool besides FindStr, but FindStr is often the tool already installed on a Windows setup.

The ^ is the escape character in cmd.exe. Outside double quotes you must escape it (analogous to \ in other environments). Try:
[^^X]

Are you enclosing the search expression in double quotes? IIRC an unquoted ^ will simply quote the next character literally, which sounds like what you are reporting.

Here is an example that works:
echo 123|findstr /R "[^0-9]"
It finds nothing, as expected.
The same command without the quotes does not work, because cmd.exe consumes the unquoted ^ before findstr sees the pattern.
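The effect of quoting can be checked with a small batch sketch. The scratch file name below is hypothetical and only used for this demonstration; the comments describe what cmd.exe passes to findstr, not what findstr prints.

```batch
@echo off
setlocal
rem Hypothetical scratch file, created only for this demonstration.
set "demo=%TEMP%\findstr_caret_demo.txt"
>"%demo%" (
  echo X
  echo Y
)
rem Quoted: cmd.exe leaves the caret alone, so findstr receives [^X].
findstr /R "[^X]" "%demo%"
rem Unquoted: cmd.exe consumes the caret, so findstr receives [X].
findstr /R [^X] "%demo%"
rem Unquoted with a doubled caret: cmd.exe reduces ^^ to ^, so findstr receives [^X].
findstr /R [^^X] "%demo%"
del "%demo%"
endlocal
```

The first and third commands hand findstr the same pattern and should therefore behave identically, while the second one searches for lines that contain X.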

Run some tests to make the behaviour of findstr clearer. These tests don't cover all possibilities, but they are a start:
echo.123|findstr.exe /x /r "[^0]"
echo.123|findstr.exe /x /r "[^0][0-9]"
echo.123|findstr.exe /b /r "[^0][0-9]*"
echo. 123|findstr.exe /b /r "[^0][0-9]*"
echo.123a|findstr.exe /b /r "[^0][0-9]*"
echo.123a|findstr.exe /r "^[^0][0-9]*"
echo.123a|findstr.exe /r "^[0][0-9]*$"
echo.123a|findstr.exe /r "[^0]$"
echo.123a|findstr.exe /r "[^a]$"
echo.123a|findstr.exe /e /r "[^a]"
echo.123a|findstr.exe /b /r "[^a]"
More than one regular expression can be placed between the quotes, separated by spaces. Findstr then searches for all of them and reports a line if any one of them matches (an OR relationship).

Related

Using JREPL with Special Characters

I am trying to replace the below
UninstallPassword="1"
with
UninstallPassword="0"
I am using JREPL.bat and so far can only replace content that does not have special characters.
Reading the documentation under /x says I must use /q but I am not sure how to format the line of code for it. I have tried:
jrepl.bat "\qUninstallPassword="1"" "\qUninstallPassword="0"" /f "%userprofile%\pol.txt" /o -
and
jrepl.bat "UninstallPassword\q=\q"1\q"" "UninstallPassword\q=\q"0\q"" /f "%userprofile%\pol.txt" /o -
but both make no change to the text.
Any help appreciated, and alternatively if Windows CMD has a builtin function to achieve the same as JREPL then that would be ideal and keep the script as a standalone.
To use \q, you have to enable it with /XSEQ. \q is then used as a placeholder for ", so replace each " with \q within the patterns. Don't replace the outer quotes surrounding the patterns:
jrepl.bat "UninstallPassword=\q1\q" "UninstallPassword=\q0\q" /XSEQ /f "%userprofile%\pol.txt" /o -

How can I do a negative regex match in batch?

This time, I am unable to create an if statement for my previous post
I'm trying to check if the value is in this form: ^\d\d\.\d$
Most of the time, it will. However, it is sometimes unavailable.
In other scripting languages, I can manage it, but I cannot, for some unknown reasons, figure out how to do it in batch.
So, it should be something like:
if not findstr /R "^\d\d\.\d$" %newvalue%
set newvalue=
If it's not "^\d\d\.\d$", then set newvalue blank. This is only to blank newvalue when it is not found because it will give random results.
Can somebody help me out? What's the best way to do this if statement?
Thanks in advance.
The very good working solution posted by LotPings is:
Echo:%newvalue%|findstr "^[0-9][0-9]\.[0-9]$" >Nul 2>&1 &&(echo matched pattern)||(echo didn't match pattern)
I suggest a little bit different single line solution:
echo:%newvalue%|%SystemRoot%\System32\findstr.exe /R "^[0123456789][0123456789]\.[0123456789]$" >nul && (echo matched pattern) || (echo didn't match pattern)
Or, easier to read and working for really any string value assigned to environment variable newvalue:
setlocal EnableExtensions EnableDelayedExpansion
echo:!newvalue!| %SystemRoot%\System32\findstr.exe /R "^[0123456789][0123456789]\.[0123456789]$" >nul
if errorlevel 1 (
echo Value does not match the regular expression pattern.
) else (
echo Value matches the regular expression pattern.
)
endlocal
A colon is used between command ECHO and the string assigned to environment variable newvalue instead of a space, so that ECHO does not output the current status of command echoing when newvalue is not defined at all.
The last solution uses delayed expansion, so that the command line with ECHO and FINDSTR neither fails nor does something completely different from what it was written for when the string assigned to variable newvalue contains operators like &<>|.
It is important that there is no space to the left of the redirection operator |, which pipes the output of command ECHO to command FINDSTR as input, because this space character would also be output by ECHO and the regular expression would never match. A space to the right of | does not matter, as the last example demonstrates.
FINDSTR does not support \d as representation for any digit. It is necessary to specify the characters to match in a self-defined character class in square brackets. FINDSTR matches with [0-9] the characters 0123456789¹²³. It is necessary to use [0123456789] to match just the characters 0123456789 without ¹²³.
. means any character, unless the dot is escaped with a backslash, in which case it is interpreted as a literal character.
A search string in "..." is interpreted by FINDSTR by default as a regular expression string. But I think it is always good to use /L or /R to make it 100% clear for FINDSTR and for every reader how the search string should be interpreted, as a literal or as a regular expression string.
FINDSTR exits with value 1 if there is no positive match in the searched input character stream and with 0 on at least one positive match. The exit code of FINDSTR can be evaluated as shown above and as described in detail in "Single line with multiple commands using Windows batch file".
The output of FINDSTR on a positive match is of no interest and therefore redirected to device NUL to suppress it.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
echo /?
endlocal /?
findstr /?
if /?
setlocal /?
See also the Microsoft article about Using command redirection operators and DosTips forum topic ECHO. FAILS to give text or blank line - Instead use ECHO/ for the reason why using : after echo is often better than a space on output of a string read from a file or entered by a user.
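The question asked for newvalue to be blanked when it does not match. A minimal sketch of that, built on the same FINDSTR exit code, could look like this. The sample value is hypothetical; in the real script newvalue is set elsewhere.

```batch
@echo off
setlocal EnableExtensions
rem Hypothetical sample value, used only for illustration.
set "newvalue=12.x"
rem FINDSTR exits with 1 when nothing matches, so the command after || runs only on a non-match.
echo:%newvalue%|%SystemRoot%\System32\findstr.exe /R "^[0123456789][0123456789]\.[0123456789]$" >nul || set "newvalue="
if defined newvalue (echo newvalue kept: %newvalue%) else (echo newvalue was cleared)
endlocal
```

This variant expands %newvalue% directly, so it assumes the value contains no operators like &<>|. For arbitrary input, the delayed expansion form shown above is the safer starting point.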

findstr query including tab character

I'm trying to use findstr in place of grep on a barebones vanilla windows box (which is sadly a requirement). I have some relatively large files (1Gb+), and I would like to extract those lines which don't include MX, MXnn, BR, and BRnn delimited by tabs. If I were writing a 'real' regex, then
\t(MX|BR)(..)?\t
would cover it. I don't mind doing it in two stages, but I can't for the life of me seem to include the delimiter tabs.
So far I have:
findstr /V MX source.txt >> temp.txt
findstr /V BR temp.txt >> dest.txt
which due to the nature of the data does an ok-ish job, but I would really rather use something like:
findstr /R /V "\t(MX|BR)(..)?\t" source.txt >> dest.txt
I've tried double slashes, escape sequences etc. but seem to be running around in circles.
I'm loath to resort to VBScript if I can help it.
Any ideas, given limitations of vanilla windows?
EDIT
I've looked into generating an exclusion file using the /G option, but generating might start to become problematic, once the users cotton on to the possibilities - a regex would just be a lot easier.
A possible solution from the command line or in a batch file is using:
%SystemRoot%\System32\findstr.exe /V /R /C:"\<BR[0-9]*\>" /C:"\<MX[0-9]*\>" "source.txt"
Because of /V, the file source.txt is searched for lines that do not contain BR or MX followed by zero or more digits as an entire word. /R makes FINDSTR interpret the two search terms \<BR[0-9]*\> and \<MX[0-9]*\> as regular expressions, \< and \> anchor each match at the beginning and end of a word, and FINDSTR combines the two terms with a logical OR. The search is case-sensitive.
This might already be enough to filter source.txt correctly. But it also filters out lines containing BR[0-9]* or MX[0-9]* surrounded by word-delimiting characters other than horizontal tab characters.
It is possible to use in a batch file:
%SystemRoot%\System32\findstr.exe /V /R /C:"[ ]BR[0-9]*[ ]" /C:"[ ]MX[0-9]*[ ]" "source.txt"
ATTENTION: There must be 1 horizontal tab character in the batch file between each of the 4 pairs of square brackets. The browsers display those 4 tab characters as 1 or more spaces according to HTML specification.
Open a command prompt window and run findstr /? for more information about FINDSTR.
And perhaps read also the Stack Overflow article
What are the undocumented features and limitations of the Windows FINDSTR command?
As far as I can see, there is no syntax to specify a horizontal tab directly.
Findstr regex seems pretty basic, it doesn't have \s, \t, \d and the like :-).
However you can use an input file to specify your search pattern. Inside this file you can use tabs literally.
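The example from your original post "\t(MX|BR)(..)?\t" cannot be used unchanged, because findstr supports neither grouping with ( ), nor alternation with |, nor the ? quantifier. Instead, put one pattern per line in the file:
<TAB>MX<TAB>
<TAB>MX..<TAB>
<TAB>BR<TAB>
<TAB>BR..<TAB>
Here <TAB> stands for a literal tab character typed and saved in the file.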
Then you would use findstr with something like:
findstr /R /G:patternFileWithTabs.txt sourceFile.txt
You can also get the job done most of the time by specifying an excluding pattern, that is, a negated character class.
If you exclude all alphanumeric characters, common separators and other whitespace characters, the only thing likely to be left is a tab.
For example, I was searching for a sequence that a standard regex would describe as:
"\t\tUnknown\t\t\t\t0\t"
In my use case I could find it with findstr like this:
findstr /R "[^ a-z0-9][^ a-z0-9]Unknown[^ a-z0-9]*0[^ a-z0-9]" logfile.txt
Of course it depends on the actual data you have. In theory the pattern would match also other strings, but these other strings don't occur in my source file, so it works.
Most of the time you don't need a 100% bullet proof pattern.

regex expression or in batch

How can I write a regex expression with OR in batch.
I have a file and I want to find "aa" or "bb".
The file contains these lines:
aa
bb
cc
This is the command I have tried:
findstr /I /R /C:"aa\|bb" temp.txt
and
findstr /I /R /C:"aa|bb" temp.txt
Can anyone help me with the OR syntax in batch for writing regular expressions?
Thanks.
Quoted from the doc:
FINDSTR does not support alternation with the pipe character (|). Multiple regular expressions can be separated with spaces, just the same as separating multiple words (assuming you have not specified a literal search with /C), but this may not be useful if the regex itself contains spaces.
Reference: http://ss64.com/nt/findstr.html
The default behaviour of findstr is to include all lines that match at least one of the conditions. So, your findstr line should be
findstr /i /c:"aa" /c:"bb" temp.txt
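The space-separated form mentioned in the quoted documentation can be sketched as follows, using the temp.txt from the question. It is an alternative way to write the same search and only works when the individual patterns contain no spaces.

```batch
@echo off
rem Without /C, findstr treats each space-separated token as its own search string,
rem so this looks for lines containing "aa" or "bb" (case-insensitive because of /I).
findstr /I /R "aa bb" temp.txt
```

Repeating /C: once per pattern, as in the line above, is the form to prefer as soon as a pattern has to contain a space.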

How to use FINDSTR to get lines with single OR double quotes

I'm trying to get lines from files in a directory tree with a single or double quote in them. As an example, I want to get these lines with a single findstr command:
You should be able to get a "Hello world" program really quick.
Enter a value for params 'name' and 'age', please.
If no value is entered for 'name' param, the "Hello world" program will throw an exception.
I can get lines with only single quotes (findstr /srn \' *.txt), only double quotes (findstr /srn \^" *.txt), or both single and double quotes (findstr /srn \'\^" *.txt), but I need lines with single or double quotes with only a single command.
Any idea about how to achieve it?
Explosion Pills had the correct idea, but not the correct syntax.
It can get tricky when trying to escape the quote for both FINDSTR and for the CMD parser. The regex you want is ['\"], but then you need to escape the " for the CMD parser.
This will work:
findstr /srn ['\^"] *.txt
and so will this
findstr /srn "['\"]^" *.txt
and so will this
findstr /srn ^"['\^"]^" *.txt
NOTE
You are at risk of failing to find files that contain the string because of a nasty FINDSTR bug. The /S option may fail to find files if 8.3 short names are enabled and a folder contains a file with an extension longer than 3 chars that starts with .txt. (name.txt2 for example). For more information, see the section labeled BUG - Short 8.3 filenames can break the /D and /S options at What are the undocumented features and limitations of the Windows FINDSTR command?.
Use a character class
findstr /srn ['\^"] *.txt