How to find specific text string in nested tar.gz archives? - regex

How can I find a specific text string in source code files packed into nested .tar.gz archives, which are themselves packed inside another rar archive (48 MB)? (On Windows 7.) I tried to use LookDisk, but it hangs and crashes. Is it possible to do this with the system findstr utility, and what regular expression would I need? Or is there another search utility that does not need installation (portable)?

Based on a SuperUser answer this example batch file searches multiple .tar.gz archive files (specified on the command line) and outputs the filename of the .tar.gz containing specified string.
It does this without outputting any files to disk.
It depends on 7-Zip. You can use a portable version of it: it doesn't need to be "installed", but it must be available in your path or in the current directory.
Change the value of the variable SEARCHSTR (currently hell) to the string you want to search for.
I can't see any obvious or easy way of returning the filename of the file containing the text inside the archive.
@echo off
setlocal enabledelayedexpansion
set SEARCHSTR=hell
rem Ensure 7z.exe is in your path or in current directory... ie. set PATH=%PATH%;C:\Program Files\7-Zip
rem Loop through all commandline args - the tar.gz files
for %%i in (%*) do (
rem Extract without an intermediate .tar
7z x "%%i" -so | 7z x -si -ttar -so | findstr /C:"%SEARCHSTR%"
if "!ERRORLEVEL!" == "0" (
set FOUNDIN=%%i
rem Exit after we find the first occurrence.
goto found
)
)
:notfound
echo Unable to locate search string "%SEARCHSTR%" in specified files.
goto end
:found
echo Found search string "%SEARCHSTR%" in "%FOUNDIN%".
:end
Edit 1 - Using self contained / portable 7-Zip
Download the official 7-Zip command line version listed on the official 7-Zip download page, extract it, and use 7za.exe. It is a self-contained command line version of 7-Zip, meaning you won't need any extra files, just 7za.exe.
You will need to change the two occurrences of 7z to 7za to use this version.
So the line:
7z x "%%i" -so | 7z x -si -ttar -so | findstr /C:"%SEARCHSTR%"
Changes to:
7za x "%%i" -so | 7za x -si -ttar -so | findstr /C:"%SEARCHSTR%"

Related

Compare Folders, Create Archive of Differences

I have folder A and folder B
Folder A contains approx 100 files all text, js, php, bash etc. They are stored in the root of the folder and sub folders and further sub folders within folder A.
Folder B is a copy of Folder A, but some of the files have been updated.
Is there any way I can compare A to B and create a tar.gz file containing only the files that have changed in Folder B
I would need to keep the folder structure intact when the tar.gz is created.
Currently I use WinMerge to check for differences, but I'm happy to look at any windows or Linux application/commands that will help with this.
Thanks
This line excludes files that are only in one or the other, but creates the tar.gz file that you want.
diff -rq folderA folderB | grep -v "^Only in" | sed "s/^.*and folderB/folderB/g" | sed "s/ differ//g" | tar czf output.tar.gz -T -
Broken down it goes:
diff -rq folderA folderB
Do a recursive diff between these folders, be quiet about it - only output the file names.
| grep -v "^Only in"
Exclude output lines that indicate one file is only in one of the folders. I'm assuming from your description this isn't an issue for you, but the two folders I was playing with were a bit dirty.
| sed "s/^.*and folderB/folderB/g"
Discard the first bit of the output up until it says " and " and then the name of the second folder. This actually takes away the second folder name as well, but then replaces it back in
| sed "s/ differ//g"
Discard the end bit of the diff output.
| tar czf output.tar.gz -T -
Tell tar to do the thing: c means create a tar file, z means compress it (gzip), and f means the filename is coming next. output.tar.gz is your output file. -T means "get the filenames from the file I'm about to tell you", and the final - means "use stdin instead".
I suggest you build this up yourself in the individual steps so you can see how it is constructed, and what the output of each step is like.
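To see what each stage produces, the pipeline can be tried on throwaway data first. The following is a minimal sketch, assuming bash with diff, grep, sed and tar available (GNU or BSD). The scratch directory, the sample file names (keep.txt, sub/app.js, extra.txt) and the stepN.txt files are made up for illustration; only folderA, folderB and output.tar.gz come from the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep diff's messages in English so the patterns match

# Hypothetical scratch area with two small folders to compare.
work=$(mktemp -d)
cd "$work"
mkdir -p folderA/sub folderB/sub
echo "same"   > folderA/keep.txt;   echo "same" > folderB/keep.txt
echo "old"    > folderA/sub/app.js; echo "new"  > folderB/sub/app.js
echo "only B" > folderB/extra.txt

# Stage 1: recursive, quiet diff. diff exits with status 1 when files
# differ, so that status is ignored here.
diff -rq folderA folderB > step1.txt || true

# Stage 2: drop the "Only in ..." lines.
grep -v "^Only in" step1.txt > step2.txt || true

# Stages 3 and 4: reduce each remaining line to the folderB path.
sed -e 's/^.*and folderB/folderB/' -e 's/ differ$//' step2.txt > step3.txt

# Stage 5: archive exactly the files named in the list.
tar czf output.tar.gz -T step3.txt

# List the archive contents to check the result.
tar tzf output.tar.gz
```

With this sample data the archive should list only folderB/sub/app.js: keep.txt is identical in both folders, and extra.txt is dropped by the "Only in" filter. Looking at step1.txt, step2.txt and step3.txt in turn shows what each stage contributes.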

cd into directories with a specific pattern, read file content and print their content to a text file in linux

I am trying to automate a report generation process.
I need to enter directories whose names match a specific pattern and then read files from them. The directory names follow the pattern PYYYYMMDD001, PYYYYMMDD002 and so on. I need to enter each directory matching the pattern and read data from each file within it. But I am unable to do so, as I am making a mistake when defining the pattern. Here is the command I am using:
TODAY=$(date +"%m%d%Y")
cd /home/user/allFiles
for d in "P"$TODAY*
do
(cd $d && grep -o '-NEW' *_$TODAY*_GOOD* | uniq -c| sed 's/\|/ /'|awk '{print $1}' > /home/user/new/$TODAY"Report.txt" )
done
When I try to execute it, I get the error P02192017* [No such file or directory]
The list of directories are - P02192017001, P02192017002, P02192017003 , P02192017004 , P02192017005, P02192017006 , P02192017007, P02192017008
Any kind of help towards this would be highly appreciated.

Use [msys] bash to remove all files whose name matches a pattern, regardless of file-name letter-case

I need a way to clean up a directory, which is populated with C/C++ built-files (.o, .a, .EXE, .OBJ, .LIB, etc.) produced by (1) some tools which always create files having UPPER-CASE names, and (2) other tools which always create lower-case file names. (I have no control over the tools.)
I need to do this from a MinGW 'msys' bash.exe shell script (or bash command prompt). I understand piping (|), but haven't come up with the right combination of exec's yet. I have successfully filtered the file names, using commands like this example:
ls | grep '.\.[eE][xX][eE]'
to list all files having any case-combination of letters in the file-extension--this example gets all the executable (e.g. ".EXE") files.
(I'll be doing similar for .o, .a, .OBJ, .LIB, .lib, .MAP, etc., which all share the same directory as the C/C++ source files. I don't want to delete the source files, only the built-files. And yes, I probably should rework the directory structure, to use a separate directory for the built-files [only], but that will take time, and I need a quick solution now.)
How can I merge the above command with "something" else (e.g., like the 'rm -f' command???), to carry this the one step further, to actually delete [only] those filtered-out files from the current directory? (I'm hopeful for a solution which does not require a temporary file to hold the filtered file names.)
Adding this answer because the accepted answer is suggesting practices which are not-recommended in actual scripts. (Please don't feel bad, I was also on that track once..)
Parsing ls output is a NO-NO! See http://mywiki.wooledge.org/ParsingLs for more detailed explanation on why.
In short, ls separates the filenames with newline; which can be present in the filename itself. (Plus, ls does not handle other special characters properly. ls prints the output in human readable form.) In unix/linux, it's perfectly valid to have a newline in the filename.
A unix filename cannot have a NULL character though. Hence below command should work.
find /path/to/some/directory -iname '*.exe' -print0 | xargs -0 rm -f
find: is a tool used to, well, find files matching the required pattern/criterion.
-iname: search using particular names, case insensitive. Note that the argument to -iname is wildcard, not regex.
-print0: Print the file names separated by NULL character.
xargs: Takes the input from stdin & runs the commands supplied (rm -f in this case) on them. The input is separated by white-space by default.
-0 specifies that the input is separated by null character.
Or even better approach,
find /path/to/some/directory -iname '*.exe' -delete
-delete is a built-in feature of find, which deletes the files found with the pattern.
Note that if you want to do some other operation, like move them to particular directory, you'd need to use first option with xargs.
Finally, this command find /path/to/some/directory -iname '*.exe' -delete would recursively find the *.exe files/directories. You can restrict the search to current directory with -maxdepth 1 & filetype to simple file (not directory, pipe etc.) using -type f. Check the manual link I provided for more details.
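As a rough sketch of how those restrictions combine with several extensions at once, the following runs against a throwaway directory. It assumes a find that supports -maxdepth, -iname and -delete (GNU and BSD find both do); the scratch directory and the sample file names are hypothetical, and the extension list is only an example based on the ones named in the question.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C

# Hypothetical scratch directory standing in for the real build directory.
build=$(mktemp -d)
mkdir "$build/nested"
touch "$build/main.c" "$build/MAIN.OBJ" "$build/main.o" "$build/APP.EXE" \
      "$build/libfoo.a" "$build/nested/deep.exe"

# -maxdepth 1 : do not descend into subdirectories
# -type f     : regular files only
# \( ... \)   : group the -o alternatives so -delete applies to all of them
find "$build" -maxdepth 1 -type f \
     \( -iname '*.exe' -o -iname '*.obj' -o -iname '*.o' \
        -o -iname '*.a' -o -iname '*.lib' \) \
     -delete

# Show what is left.
remaining=$(cd "$build" && find . -type f | sort)
printf '%s\n' "$remaining"
```

Here only ./main.c and ./nested/deep.exe should remain: the source file matches none of the patterns, and the nested executable is protected by -maxdepth 1. Replacing -delete with -print first is a cheap way to preview what would be removed.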
Is this what you mean?
rm -f `ls | grep '.\.[eE][xX][eE]'`
but usually your "ls | grep ..." output will have some other fields that you have to strip out such as date etc., so you might just want to output the file name itself.
try something like:
rm -f `ls | grep '.\.[eE][xX][eE]' | awk '{print $9}'`
where your file name is in the 9th field, like:
-rwxr-xr-x 1 Administrators None 283 Jul 2 2014 search.exe
You can use following command:
ls | grep '.\.[eE][xX][eE]' | xargs rm -f
Use of "xargs" would turn standard input ( in this case output of the previous command) as arguments for "rm -f" command.

Move files based on filename with regex and delete files with lower number in a substring in the filename

I have files with the following syntax:
LWD_???_??????_???_??_??_LP?_??_?_*.PDF
Example:
LWD_ARC_G10000_102_UE_XX_LP5_08_E_Uebersicht_Bodenplatten
I want to extract substrings out of the file name and put the file into a folder with the path based on that file name like this:
C:\Lp5\ARC\G10\
First folder is the 7th part of the file name, the 2nd part is the second folder and the first 3 chars of the 3rd part is the last folder.
Then in addition to that I need an extra delete: When the files are copied to the specific folder there is a consecutively numbered part in the file name. I need the "older" files deleted so that only the "last" file is in this folder. The numbers/index is always the 30th and the 31st char.
LWD_FEU_L20000_005_IZ_00_LP8_XX_F.pdf Index 00
LWD_FEU_L20000_005_IZ_00_LP8_01_F.pdf Index 01
For now I only have a batch with static folders:
FOR /R "E:\Downloads" %%i in (LWD_ELT_A10?00_???_??_??_LP5*) do move "%%i" "C:\Lp5\ELT\A10"
FOR /R "E:\Downloads" %%i in (LWD_???_A10000_???_??_??_LP5*) do del "%%i"
...
Does anyone have an idea how to do that without VBS or sth. like that - only Windows Batch or PowerShell?
My batch strategy is as follows:
1) get list of PDF files using DIR /B
2) parse each file name into a string consisting of (pipe delimited)
file mask that matches name with ?? wildcard for positions 30,31
destination path
full file name
3) sort the strings descending, resulting in the most recent version at the top of each file name grouping.
4) process the output with FOR /F, parsing out the file mask, destination path, and full name
5) for each iteration, create the destination folder if it does not already exist, and then conditionally copy the file to the destination if there does not yet exist a file that matches the file mask.
The above strategy is non-destructive, as the original files all remain in place. You could modify step 5 to be destructive - move the newest files instead of copy, and delete the rest.
You could implement the above strategy fairly easily with pure batch. But I would use my JREPL.BAT utility - a hybrid JScript/batch script that can efficiently perform sophisticated regular expression replacements. JREPL.BAT is pure script that runs natively on any Windows machine from XP onward.
The following are untested
Non-destructive version
@echo off
for /f "tokens=1,2,3 delims=|" %%A in (
'dir /b /a-d LWD_*.PDF ^| jrepl "^(LWD_(...)_(...)..._..._.._.._(LP.)_)..(_._.*\.PDF)$" "$1??$5|c:\$4\$2\$3|$&" /i /a ^| sort /r'
) do (
md "%%B" >nul 2>nul
if not exist "%%B\%%A" copy "%%C" "%%B" >nul
)
Destructive version
@echo off
for /f "tokens=1,2,3 delims=|" %%A in (
'dir /b /a-d LWD_*.PDF ^| jrepl "^(LWD_(...)_(...)..._..._.._.._(LP.)_)..(_._.*\.PDF)$" "$1??$5|c:\$4\$2\$3|$&" /i /a ^| sort /r'
) do (
md "%%B" >nul 2>nul
if not exist "%%B\%%A" (move "%%C" "%%B" >nul) else del "%%C"
)

Windows Batch: How remove all blank (or empty) lines

I am trying to remove all blank lines from a text file using a Windows batch program.
I know the simplest way of achieving this in bash is via regular expressions and the sed command:
sed -i "/^$/d" test.txt
Question: Does Windows batch have a similarly simple method for removing all blank lines from a text file? Otherwise, what is the simplest method of achieving this?
Note: I'm running this batch script to set up new Windows computers for customers to use, so preferably no additional programs need to be installed (and then uninstalled) to achieve this - ideally, I'll just be using the "standard" batch library.
For /f does not process empty lines:
for /f "usebackq tokens=* delims=" %%a in ("test.txt") do (echo(%%a)>>~.txt
move /y ~.txt "test.txt"
You could also use FINDSTR:
findstr /v "^$" C:\text_with_blank_lines.txt > C:\text_without_blank_lines.txt
/V -- Print only lines that do NOT contain a match.
^ --- Line position: beginning of line
$ --- Line position: end of line
I usually pipe command output to it:
dir | findstr /v "^$"
You also might find these answers to a similar question helpful, since some 'blank lines' may include spaces or tabs.
https://stackoverflow.com/a/45021815/5651418
https://stackoverflow.com/a/16062125/5651418
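On the bash side of the comparison, the difference between empty lines and whitespace-only lines can be sketched as follows. This is an illustrative example rather than part of the answers above: the sample file is a hypothetical temp file, and it assumes a grep that supports the POSIX character class [[:space:]].

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample: an empty line, a line of spaces, and a line with a tab.
sample=$(mktemp)
printf 'first\n\n   \n\t\nsecond\n' > "$sample"

# The sed pattern from the question removes only truly empty lines,
# so the lines holding spaces or a tab survive.
empty_only=$(sed '/^$/d' "$sample")

# Matching optional whitespace between ^ and $ removes those lines as well.
cleaned=$(grep -v '^[[:space:]]*$' "$sample")

printf '%s\n' "$cleaned"
rm -f "$sample"
```

The second form leaves just the two text lines, while the first still keeps the two whitespace-only lines. The same idea (allowing spaces and tabs between the line anchors) is what the linked answers apply on the Windows side.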