Windows Batch: How to remove all blank (or empty) lines - regex

I am trying to remove all blank lines from a text file using a Windows batch program.
I know the simplest way of achieving this in bash is via regular expressions and the sed command:
sed -i "/^$/d" test.txt
Question: Does Windows batch have a similarly simple method for removing all blank lines from a text file? Otherwise, what is the simplest way of achieving this?
Note: I'm running this batch script to set up new Windows computers for customers to use, so preferably no additional programs need to be installed (and then uninstalled) to achieve this; ideally, I'll just be using the "standard" batch library.

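FOR /F does not process empty lines, so copying every line it does see drops the blank ones (note that it also skips lines beginning with ";", its default EOL character):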
(for /f "usebackq tokens=* delims=" %%a in ("test.txt") do echo(%%a)>~.txt
move /y ~.txt "test.txt"
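For comparison, here is the same "write the non-empty lines to a temporary file, then move it over the original" idea as a Python sketch. It is only an illustration: it assumes Python 3 is available (which the question would rather avoid) and that the file is UTF-8 text, and the function name `remove_empty_lines` is hypothetical.

```python
import os
import tempfile


def remove_empty_lines(path: str, encoding: str = "utf-8") -> None:
    """Rewrite `path` in place without its empty lines, like sed -i "/^$/d"."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the kept lines to a temporary file next to the original...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps each line's original ending (CRLF or LF) untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(path, encoding=encoding, newline="") as src:
            for line in src:
                # "Empty" means nothing is left once the line ending is removed;
                # lines holding only spaces or tabs are kept, as with ^$.
                if line.rstrip("\r\n"):
                    dst.write(line)
        # ...then replace the original, much like `move /y ~.txt "test.txt"`.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Usage: `remove_empty_lines("test.txt")`. Writing to a temporary file in the same directory keeps the final `os.replace` a simple rename.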

You could also use FINDSTR:
findstr /v "^$" C:\text_with_blank_lines.txt > C:\text_without_blank_lines.txt
/V -- Print only lines that do NOT contain a match.
^ --- Line position: beginning of line
$ --- Line position: end of line
I usually pipe command output to it:
dir | findstr /v "^$"
You might also find these answers to a similar question helpful, since some 'blank lines' may include spaces or tabs.
https://stackoverflow.com/a/45021815/5651418
https://stackoverflow.com/a/16062125/5651418
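If "blank" should also cover lines holding only spaces or tabs, the test changes from `^$` to "nothing but whitespace". A Python sketch of such a filter, for illustration only (the name `drop_blank_lines` is hypothetical, and Python's `re` syntax is richer than FINDSTR's):

```python
import re
from typing import Iterable, Iterator

# A line that is empty or holds only spaces/tabs, plus an optional line ending.
BLANK = re.compile(r"[ \t]*\r?\n?")


def drop_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain something other than spaces and tabs."""
    for line in lines:
        if not BLANK.fullmatch(line):
            yield line
```

To use it on piped output (as in `dir | findstr /v "^$"`), a script ending in `sys.stdout.writelines(drop_blank_lines(sys.stdin))` could be run as `dir | python script.py`; the script name is a placeholder.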

Related

Batch File Command - Remove Lines With Phrases From File

I am trying to cut out unnecessary lines from a list of installed programs on devices.
Currently using:
type "original.txt" | findstr /v "Click-to-Run" | findstr /v "Visual C++" | findstr /v "Windows*SDK*" > "example_new.txt"
I need it to remove lines such as "Windows Desktop SDK Tools" but KEEP lines such as ".Net Framework 4.0.0 SDK".
How can I get this to only remove the lines that contain the entire phrases specified?
Is it possible to do that, while also using wildcard in the phrases?
Thanks so much!
You can make your life easier (especially if your list is long) by using the /g switch (see findstr /? for details).
type "original.txt" | findstr /vrg:"exclude.txt" > "example_new.txt"
with exclude.txt containing your "to-ignore" list (REGEX allowed):
Click-to-Run
Visual C++
Windows.*SDK
(/G reads each line of the file as one complete search string, as /C: does, so spaces are no problem)
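The "drop any line that matches one of the patterns in a list" logic can be sketched in Python to make the behaviour explicit. The regex dialects differ, though: Python's `re` treats `+` as a quantifier, so `Visual C++` has to be escaped here, whereas FINDSTR's limited regex syntax has no `+` operator and matches it literally. The names `filter_excluded` and `EXCLUDE` are hypothetical; `EXCLUDE` mirrors the exclude.txt entries above.

```python
import re
from typing import Iterable, List


def filter_excluded(lines: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the lines that match none of the patterns (like findstr /v /r /g:file)."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if not any(rx.search(line) for rx in compiled)]


# Python equivalents of the exclude.txt entries; "+" is escaped because it is
# a quantifier in Python's re (FINDSTR treats it literally).
EXCLUDE = ["Click-to-Run", r"Visual C\+\+", r"Windows.*SDK"]
```

Like FINDSTR without /I, the matching is case-sensitive.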

How should I find all the files that contain two strings?

My problem is to create a batch script file for Windows that iterates through a lot of files and finds every file that has a line containing two specified strings. It is not enough for the file as a whole to contain both strings; they must be on the same line.
For example, I have 5 files which contain the following:
1st: apple:green
2nd: apple
green
3rd: green
apple
4th: apple: yellowgreen
5th: apple: green
It should return the filenames of the first, fourth and fifth files.
Here is what I have:
FINDSTR /s /i /m "apple green" *.txt | FINDSTR "\MyDirectory" >> results.txt
How should I modify this to make it work?
findstr /i /s /m /r /c:"apple.*green" /c:"green.*apple" *.txt
EDITED TO WORK WITH FINDSTR
This regex worked for me:
"apple.*green green.*apple"
Also, your write to file command with the pipe did not work for me (perhaps I'm missing something). If it doesn't work for you, perhaps this will:
FINDSTR /s /i /m "apple.*green green.*apple" *.txt >> results.txt
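To double-check which files should be reported, here is the same-line requirement as a Python sketch: walk the tree recursively (like /s), compare case-insensitively (like /i) and report a file as soon as one line contains both strings (like /m). The function name and the UTF-8 decoding with ignored errors are assumptions.

```python
from pathlib import Path
from typing import List


def files_with_both_on_one_line(root: str, first: str, second: str,
                                pattern: str = "*.txt") -> List[str]:
    """Return files under `root` in which a single line contains both strings."""
    first, second = first.lower(), second.lower()
    hits = []
    # rglob descends into subdirectories, like FINDSTR /s.
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="ignore") as f:
            # Lower-casing both sides mimics /i; any() stops at the first hit, like /m.
            if any(first in line.lower() and second in line.lower() for line in f):
                hits.append(str(path))
    return hits
```

Run against the five example files, it reports the first, fourth and fifth, matching the expected result.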

How to find specific text string in nested tar.gz archives?

How can I find a specific text string in source code files packed into nested .tar.gz archives, which are themselves packed inside another RAR archive (48 MB), on Windows 7? I tried LookDisk, but it hangs and crashes. Is it possible to use the built-in findstr utility, and what regular expression would I need? Or is there another search utility that doesn't need installation (a portable one)?
Based on a SuperUser answer this example batch file searches multiple .tar.gz archive files (specified on the command line) and outputs the filename of the .tar.gz containing specified string.
It does this without outputting any files to disk.
It depends on 7-Zip. You can use a portable version, which doesn't need to be "installed", but it does need to be available (on the PATH or in the current directory).
Change the value of the variable SEARCHSTR (currently hell) to the string you want to search for.
I can't see any obvious or easy way of returning the filename of the file containing the text inside the archive.
@echo off
setlocal enabledelayedexpansion
set SEARCHSTR=hell
rem Ensure 7z.exe is in your path or in current directory... ie. set PATH=%PATH%;C:\Program Files\7-Zip
rem Loop through all commandline args - the tar.gz files
for %%i in (%*) do (
rem Extract without an intermediate .tar
7z x "%%i" -so | 7z x -si -ttar -so | findstr /C:"%SEARCHSTR%"
if "!ERRORLEVEL!" == "0" (
set FOUNDIN=%%i
rem Exit after we find the first occurrence.
goto found
)
)
:notfound
echo Unable to locate search string "%SEARCHSTR%" in specified files.
goto end
:found
echo Found search string "%SEARCHSTR%" in "%FOUNDIN%".
:end
Edit 1 - Using self contained / portable 7-Zip
Download the official 7-Zip Command Line Version listed on the official 7-Zip download page, extract it, and use 7za.exe. It's a self-contained command line version of 7-Zip, meaning you won't need any extra files, just 7za.exe.
You will need to change the two occurrences of 7z to 7za to use this version.
So the line:
7z x "%%i" -so | 7z x -si -ttar -so | findstr /C:"%SEARCHSTR%"
Changes to:
7za x "%%i" -so | 7za x -si -ttar -so | findstr /C:"%SEARCHSTR%"
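If a portable Python is available instead of (or as well as) 7-Zip, the standard `tarfile` module can search .tar.gz archives in memory and, unlike the batch file, report which member contains the string; it can also recurse into .tar.gz files nested inside the archive. This is a sketch under stated assumptions: the function name `search_targz` is hypothetical, every member is read fully into memory, and the outer RAR archive is not handled (`tarfile` cannot open RAR files).

```python
import io
import tarfile
from typing import BinaryIO, Iterator


def search_targz(fileobj: BinaryIO, needle: bytes, label: str) -> Iterator[str]:
    """Yield 'label/member' paths whose bytes contain `needle`.

    Nested .tar.gz/.tgz members are opened from memory and searched
    recursively, so nothing is extracted to disk.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            data = tar.extractfile(member).read()
            path = f"{label}/{member.name}"
            if member.name.endswith((".tar.gz", ".tgz")):
                yield from search_targz(io.BytesIO(data), needle, path)
            elif needle in data:
                yield path
```

Usage: `with open("sources.tar.gz", "rb") as f: print(list(search_targz(f, b"hell", "sources.tar.gz")))`, where the archive name is a placeholder.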

How can I use regex to chop apart xcopy statements embedded in .csproj files?

I'm working with a bunch (~2000) of .csproj files, and among this development staff there's a historical precedent for embedding xcopy in the post-build events to move things around during the build process. In order to get build knowledge into one place, I'm working towards eradicating these xcopy calls in favor of declarative build actions in our automated build process.
With that in mind, I'm trying to come up with a regex I can use to chop out the path arguments supplied to xcopy. The statements come in a couple of flavors:
xcopy /F /I /R /E /Y "..\..\..\Microsoft\Enterprise Library\3.1\bin"
xcopy /F /I /R /E /Y ..\Crm\*.* .\
xcopy ..\NUnit ..\..\..\output\debug /I /Y
specifically:
unpredictable placement of switches
destination path argument not always supplied
path arguments sometimes wrapped in quotes
I'm no regex wizard, but this is what I've got so far (the excessive use of parentheses is for match saving in PowerShell):
(.*x?copy.* '"?)([^ /'"]+)('"/.* '"?)([^ /'"]+)('"?.*)
The ([^ /'"]+) sections are the parts I intend to be the path arguments, defined as strings containing no quotes, spaces, or forward slashes. However, I have a feeling I'll have to apply two regexes (one for quote-wrapped paths with spaces and one for unquoted paths).
Unfortunately, when I run this regex it seems to give me the same match for both the first and second path arguments. Most frustrating.
How would I change this to correct it?
In cases like this, I like to leverage PowerShell's argument parsing system. Use a simple regex to grab the whole xcopy line and then run it through a function.
$samples = 'xcopy /F /I /R /E /Y "..\..\..\Microsoft\Enterprise Library\3.1\bin"',
'xcopy /F /I /R /E /Y ..\Crm\*.* .\',
'xcopy ..\NUnit ..\..\..\output\debug /I /Y'
function argumentgrinder {
# Keep only the arguments that are neither switches (/F, /I, ...) nor the word xcopy itself
$args | Where-Object {($_ -notlike "/*") -and ($_ -ne "xcopy")}
}
$samples | foreach { Invoke-Expression "argumentgrinder $_"}
You do have to be careful with anything in the paths that PowerShell would interpret specially, though ($, # and parentheses).
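The same "tokenize, then drop the command name and the switches" idea can be sketched outside PowerShell, which also sidesteps the $/# caveat. This Python version uses a simple tokenizer regex (a double-quoted run or a run of non-whitespace) as a stand-in for a real command-line parser; it is an assumption that this is good enough for your .csproj lines, since it does not handle escaped quotes or cmd's ^ escapes. The names `TOKEN` and `xcopy_paths` are hypothetical.

```python
import re
from typing import List

# Either a double-quoted run (which may contain spaces) or a run of non-whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')


def xcopy_paths(line: str) -> List[str]:
    """Return the path arguments of an xcopy statement, in order."""
    tokens = TOKEN.findall(line)
    # Drop the command itself and anything that looks like a switch, then unquote.
    return [t.strip('"') for t in tokens
            if t.lower() != "xcopy" and not t.startswith("/")]
```

Because the switches are filtered by their leading slash rather than by position, their placement in the statement does not matter, and a missing destination simply yields a one-element list.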
I don't think you need two different patterns to match the paths.
The following pattern should match each single statement in all three cases you have provided:
\A(xcopy)\s+([\/A-Z\s]*)\s*((".*?")|([^\s]*))\s*((".*?")|([^\s]*))\s*([\/A-Z\s]*)
I've used or (|) to match paths in the various combinations.
NOTE: Because I don't have Windows at the moment, I've been testing this pattern with Ruby on Linux; the syntax should not be different, or at least it should give you an idea.
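One quick way to see what that pattern captures is to run it, unchanged, against the three sample lines in Python, whose `re` module accepts the constructs used here (`\A`, lazy `.*?`, character classes). Group 3 holds the first path and group 6 the second (empty when no destination is given). The helper name `xcopy_groups` is hypothetical. Note that the pattern is case-sensitive and that `[\/A-Z\s]*` would also swallow a leading drive letter, such as the `C` in `C:\...`, so treat it as a starting point.

```python
import re

# The pattern from the answer above, unchanged.
XCOPY = re.compile(r'\A(xcopy)\s+([\/A-Z\s]*)\s*((".*?")|([^\s]*))\s*((".*?")|([^\s]*))\s*([\/A-Z\s]*)')


def xcopy_groups(line):
    """Return (first path, second path) as captured by groups 3 and 6, or None."""
    m = XCOPY.match(line)
    return (m.group(3), m.group(6)) if m else None
```

Quoted paths keep their quotes in group 3 or 6, so strip them afterwards if needed.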

Batch file - matching file extensions

Probably a far too easy question, but how do I match a file extension such as .jpg while not matching .jpg~ (i.e. a jpg that a program has made a local copy of)? My current line is:
for /f %%a in ('dir /b *.jpg') do echo %%~na
But if any program has a copy of one of the files open (and thus has made a .jpg~ file) this regexp will match those too. I found a reference to $ being the 'end of line', but doing this doesn't work at all:
for /f %%a in ('dir /b *.jpg$') do echo %%~na
I don't think it is possible to filter this with just a FOR command (unless you pipe the output of dir to findstr) but in this case, adding a simple if test is all that is needed:
for %%A IN (*.jpg) DO if "%%~xA"==".jpg" @echo %%~A
I think the problem arises from the short-name representation: use dir /X and you can see that xxx.jpg and xxx.jpg~ both have an 8.3 file name that ends with .jpg.
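The same "check the real extension" test, sketched in Python for comparison. `os.listdir` returns long file names and Python matches them itself rather than through cmd's wildcard expansion, so the 8.3 effect should not arise; the anchored regex `\.jpg$` is what the question was reaching for with `$`. This sketch matches case-insensitively, whereas the batch `if` comparison above is case-sensitive unless `/i` is added. The function name `jpg_basenames` is hypothetical.

```python
import os
import re
from typing import List

# Anchored at the end, so "photo.jpg~" does not match.
JPG = re.compile(r"\.jpg$", re.IGNORECASE)


def jpg_basenames(directory: str) -> List[str]:
    """Return the names (without extension) of .jpg files, skipping e.g. .jpg~ copies."""
    names = sorted(os.listdir(directory))
    return [os.path.splitext(n)[0] for n in names
            if JPG.search(n) and os.path.isfile(os.path.join(directory, n))]
```

This mirrors `echo %%~na` from the question: only the base name of each genuine .jpg file is returned.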