How should I find all the files that contain two strings? - regex

I need to create a batch script for Windows that iterates through a lot of files and finds every file that has a line containing two specified strings. It is not enough for the file as a whole to contain both strings; they must be on the same line.
For example, I have 5 files that contain the following:
1st: apple:green
2nd: apple
green
3rd: green
apple
4th: apple: yellowgreen
5th: apple: green
It should return the file names of the first, fourth and fifth files.
Here is what I have:
FINDSTR /s /i /m "apple green" *.txt | FINDSTR "\MyDirectory" >> results.txt
How should I modify this to make it work?

findstr /i /s /m /r /c:"apple.*green" /c:"green.*apple" *.txt
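To make the same-line requirement concrete, here is a rough Python sketch of what the two /c: patterns do together. A file qualifies if any single line matches apple.*green or green.*apple, ignoring case (like /i). Subdirectories are searched (like /s) and each file is reported once (like /m). The function name files_with_both and the *.txt glob are illustrative assumptions, not part of findstr.

```python
import re
from pathlib import Path

# Same idea as: findstr /i /s /m /r /c:"apple.*green" /c:"green.*apple" *.txt
# Either order on ONE line counts; re.IGNORECASE plays the role of /i.
PATTERN = re.compile(r"apple.*green|green.*apple", re.IGNORECASE)


def files_with_both(root, glob="*.txt"):
    """Return paths (like /m) of files under root (like /s) with a matching line."""
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            # Search each line separately, so a match can never span two lines
            if any(PATTERN.search(line) for line in fh):
                hits.append(path)
    return hits
```

Run against the five example files, this should report the first, fourth and fifth. The second and third files are skipped because their two words sit on different lines.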

EDITED TO WORK WITH FINDSTR
This regex worked for me:
"apple.*green green.*apple"
Also, your write to file command with the pipe did not work for me (perhaps I'm missing something). If it doesn't work for you, perhaps this will:
FINDSTR /s /i /m "apple.*green green.*apple" *.txt >> results.txt

Related

Batch File Command - Remove Lines With Phrases From File

I am trying to cut out unnecessary lines from a list of installed programs on devices.
Currently using:
type "original.txt" | findstr /v "Click-to-Run" | findstr /v "Visual C++" | findstr /v "Windows*SDK*" > "example_new.txt"
I need it to remove lines such as "Windows Desktop SDK Tools" but KEEP lines such as ".Net Framework 4.0.0 SDK".
How can I get this to only remove the lines that contain the entire phrases specified?
Is it possible to do that, while also using wildcard in the phrases?
Thanks so much!
You can make your life easier (especially if your list is long) by using the /g switch (see findstr /? for details).
type "original.txt" | findstr /vrg:"exclude.txt" > "example_new.txt"
with exclude.txt containing your "to-ignore" list (REGEX allowed):
Click-to-Run
Visual C++
Windows.*SDK
(/g treats each line of the file as a complete search string, as /c: does, so spaces are no problem)

Equivalent of `grep -o` for findstr command in Windows

In Unix, the command grep -o prints out only the matched string. This is very helpful when you are searching for a pattern using regular expression and only interested in what matched exactly and not the entire line.
However, I'm not able to find something similar for the Windows command findstr. Is there any substitute for printing only the matched string in Windows?
For example:
grep -oE "10\.[0-9]+\.[0-9]+\.[0-9]+" myfile.txt
The above command prints only the IP addresses of the form 10.*.*.* in myfile.txt, not the entire lines that contain them.
PowerShell:
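select-string '10\.[0-9]+\.[0-9]+\.[0-9]+' myfile.txt -AllMatches | foreach-object {
# -AllMatches returns every match on a line, as grep -o does; without it only the first is kept
$_.Matches | foreach-object { $_.Value }
}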
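For comparison, here is a Python sketch of the grep -o behaviour using re.finditer: it prints every match, one per line, rather than whole lines. This illustrates the semantics only and is not a findstr feature. The function name only_matches is made up for the example.

```python
import re

# Same pattern as the grep example; in grep itself "+" is a quantifier only with -E
IP_10 = re.compile(r"10\.[0-9]+\.[0-9]+\.[0-9]+")


def only_matches(lines, pattern=IP_10):
    """Yield every match on every line, like grep -o (not just the first per line)."""
    for line in lines:
        for m in pattern.finditer(line):
            yield m.group(0)
```

Passing an open file object, such as one opened on myfile.txt, works directly, because iterating a file yields its lines.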
Alternatively, just use your familiar grep and other Linux commands by downloading UnxUtils (ready-made .exe binaries). Add its folder to your PATH environment variable for convenience.

Windows scripting: list files not matching a pattern

In the Windows 7 command prompt, I'd like to list all files in a folder whose names do not start with abc. I have tried:
forfiles /P C:\myFolder\ /M ^[abc]* /S /C "CMD /C echo #file"
Where is my error?
Many thanks.
Looking at forfiles /?:
/M searchmask Searches files according to a searchmask.
The default searchmask is '*' .
which strongly suggests forfiles doesn't support regular expressions, just normal Cmd/Windows wildcards.
On Windows 7 this can easily be achieved in PowerShell:
dir c:\myFolder | ?{ -not($_.Name -match '^abc') } | select Name
(That performs a case-insensitive regular expression match, which doesn't matter in the case of Windows filenames.)
NB: This assumes you want files whose names do not start with abc. Your attempted regular expression says something different: [abc] is a character class that matches a single a, b or c, not the string abc.
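The same filter can be sketched in Python for comparison. Python's str.startswith is case-sensitive, so both sides are lowercased to mimic the case-insensitive PowerShell -match. The listing is not recursive, like the dir command above. It assumes "files" means regular files only, so directories are skipped. The helper name files_not_starting_with is hypothetical.

```python
from pathlib import Path


def files_not_starting_with(folder, prefix="abc"):
    """List names of files in folder that do not start with prefix (case-insensitive)."""
    prefix = prefix.lower()
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        # is_file() skips directories; lower() mirrors -match ignoring case
        if p.is_file() and not p.name.lower().startswith(prefix)
    )
```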
Where is my error?
Your error is assuming that the forfiles command supports regular expressions.
It does not. It supports file name matching with * and ?.
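An alternative, if you use xcopy instead of echo, is xcopy's /exclude option. Note that /exclude takes the name of a text file listing strings (one per line), not a wildcard. A file is skipped when any of those strings matches part of its absolute path. Also, forfiles variables start with @, not #. For instance, with a hypothetical C:\exclude.txt containing the single line \abc:
forfiles /P C:\myFolder\ /M * /S /C "CMD /C xcopy @path %myDestinationFolder% /exclude:C:\exclude.txt"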
Also, if you're using PowerShell, another option is the -match operator.

How can I use regex to chop apart xcopy statements embedded in .csproj files?

I'm working with a bunch (~2000) of .csproj files, and among this development staff there's a historical precedent for embedding xcopy in the post-build events to move things around during the build process. To get build knowledge into one place, I'm working towards eradicating these xcopy calls in favor of declarative build actions in our automated build process.
With that in mind, I'm trying to come up with a regex I can use to chop out the path arguments supplied to xcopy. The statements come in a couple of flavors:
xcopy /F /I /R /E /Y "..\..\..\Microsoft\Enterprise Library\3.1\bin"
xcopy /F /I /R /E /Y ..\Crm\*.* .\
xcopy ..\NUnit ..\..\..\output\debug /I /Y
Specifically:
unpredictable placement of switches
destination path argument not always supplied
path arguments sometimes wrapped in quotes
I'm no regex wizard, but this is what I've got so far (the excessive use of parentheses is for capturing matches in PowerShell):
(.*x?copy.* '"?)([^ /'"]+)('"/.* '"?)([^ /'"]+)('"?.*)
The ([^ /'"]+) sections are the parts I intend to capture the path arguments, defined as strings containing no quotes, spaces, or forward slashes. I have a feeling I'll have to apply two regexes, one for quote-wrapped paths with spaces and one for unquoted paths.
Unfortunately, when I run this regex it seems to give me the same match for both the first and second path arguments. Most frustrating.
How would I change this to correct it?
In cases like this, I like to leverage PowerShell's argument parsing system. Use a simple regex to grab the whole xcopy line and then run it through a function.
$samples = 'xcopy /F /I /R /E /Y "..\..\..\Microsoft\Enterprise Library\3.1\bin"',
'xcopy /F /I /R /E /Y ..\Crm\*.* .\',
'xcopy ..\NUnit ..\..\..\output\debug /I /Y'
function argumentgrinder {
$args | Where-Object {($_ -notlike "/*") -and ($_ -ne "xcopy")}
}
$samples | foreach { Invoke-Expression "argumentgrinder $_"}
You do have to be careful with anything in the paths that PowerShell treats specially, though ($, # and parentheses).
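The same "tokenize, then drop the command name and the switches" idea can be sketched without Invoke-Expression, which also sidesteps the $ / # / parentheses caveat. This Python version uses its own simplified tokenizer: a regex matching either a double-quoted string or a run of non-space characters. That tokenizer is an assumption and does not reproduce every cmd.exe quoting rule. The name xcopy_paths is hypothetical.

```python
import re

# A token is either a double-quoted string or a run of non-whitespace.
# Simplified compared with cmd.exe's own parsing, but enough for these samples.
TOKEN = re.compile(r'"[^"]*"|\S+')


def xcopy_paths(line):
    """Return the path arguments of an xcopy command line, quotes removed."""
    tokens = [t.strip('"') for t in TOKEN.findall(line)]
    # Drop the command name and every /switch, keeping positional arguments
    return [t for t in tokens if t.lower() != "xcopy" and not t.startswith("/")]


SAMPLES = [
    'xcopy /F /I /R /E /Y "..\\..\\..\\Microsoft\\Enterprise Library\\3.1\\bin"',
    "xcopy /F /I /R /E /Y ..\\Crm\\*.* .\\",
    "xcopy ..\\NUnit ..\\..\\..\\output\\debug /I /Y",
]
```

Like the PowerShell function, this returns one path for the first sample and two for the others. Switch placement does not matter, because switches are filtered out wherever they appear.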
I don't think you need two different patterns to match the paths.
The following pattern should match each single statement in all three cases you have provided:
\A(xcopy)\s+([\/A-Z\s]*)\s*((".*?")|([^\s]*))\s*((".*?")|([^\s]*))\s*([\/A-Z\s]*)
I've used alternation (|) to match both quoted and unquoted paths in the various combinations.
NOTE: Because I don't have Windows at the moment, I've been testing this pattern with Ruby on Linux, but the syntax should not be different, or at least it should give you an idea.
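Because the pattern was only tested with Ruby, a quick cross-check with another engine may help. Python's re module accepts the pattern unchanged. The helper split_xcopy is hypothetical. It returns group 3 (first path) and group 6 (second path, empty when no destination is given). Quoted paths keep their quotes in the captured group.

```python
import re

# The single pattern from the answer above, unchanged
XCOPY = re.compile(
    r'\A(xcopy)\s+([\/A-Z\s]*)\s*((".*?")|([^\s]*))\s*((".*?")|([^\s]*))\s*([\/A-Z\s]*)'
)


def split_xcopy(line):
    """Return (source, destination) from groups 3 and 6, or None if no match."""
    m = XCOPY.match(line)
    if m is None:
        return None
    return m.group(3), m.group(6)
```

PowerShell's -match is case-insensitive by default. To handle lowercase switches such as /y the same way, compile with re.IGNORECASE.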

Windows Batch: How remove all blank (or empty) lines

I am trying to remove all blank lines from a text file using a Windows batch program.
I know the simplest way to achieve this in bash is via a regular expression and the sed command:
sed -i "/^$/d" test.txt
Question: Does Windows batch have a similar simple method for removing all blank lines from a text file? Otherwise, what is the simplest method to achieve this?
Note: I'm running this batch script to set up new Windows computers for customers, so preferably no additional programs need to be installed (and then uninstalled) to achieve this; ideally, I'll just be using the "standard" batch library.
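FOR /F does not process empty lines, so echoing every line it does process drops the blank ones. Note that with the default eol=; it also skips lines starting with a semicolon: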
for /f "usebackq tokens=* delims=" %%a in ("test.txt") do (echo(%%a)>>~.txt
move /y ~.txt "test.txt"
You could also use FINDSTR:
findstr /v "^$" C:\text_with_blank_lines.txt > C:\text_without_blank_lines.txt
/V -- Print only lines that do NOT contain a match.
^ --- Line position: beginning of line
$ --- Line position: end of line
I usually pipe command output to it:
dir | findstr /v "^$"
You also might find these answers to a similar question helpful, since some 'blank lines' may include spaces or tabs.
https://stackoverflow.com/a/45021815/5651418
https://stackoverflow.com/a/16062125/5651418
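For reference, the same clean-up can be sketched in Python. Lines whose content is empty once the line terminator is removed are dropped, like findstr /v "^$". Optionally, lines containing only spaces or tabs are dropped too, which is the case the linked answers cover. The function names and the strip_whitespace flag are illustrative assumptions. The in-place variant writes to a temporary file and then replaces the original, mirroring the for /f plus move approach.

```python
import os
import tempfile


def remove_blank_lines(lines, strip_whitespace=False):
    """Return the lines that are not blank (like findstr /v "^$")."""
    kept = []
    for line in lines:
        body = line.rstrip("\r\n")      # ignore the line terminator itself
        if strip_whitespace:
            body = body.strip(" \t")    # also treat space/tab-only lines as blank
        if body:
            kept.append(line)
    return kept


def remove_blank_lines_in_place(path, strip_whitespace=False):
    """Rewrite path without blank lines, keeping its original line endings."""
    with open(path, encoding="utf-8", newline="") as fh:
        kept = remove_blank_lines(fh, strip_whitespace)
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, text=True)
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
        out.writelines(kept)
    os.replace(tmp, path)  # like: move /y ~.txt "test.txt"
```

Opening with newline="" keeps CRLF endings intact, so the rewritten file keeps Windows line endings.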