Compare Folders, Create Archive of Differences

I have folder A and folder B.
Folder A contains approx. 100 files, all text: js, php, bash, etc. They are stored in the root of the folder and in sub-folders, and in further sub-folders within folder A.
Folder B is a copy of Folder A, but some of the files have been updated.
Is there any way I can compare A to B and create a tar.gz file containing only the files that have changed in Folder B?
I would need to keep the folder structure intact when the tar.gz is created.
Currently I use WinMerge to check for differences, but I'm happy to look at any Windows or Linux application/command that will help with this.
Thanks

This line excludes files that exist in only one folder or the other, but it creates the tar.gz file that you want:
diff -rq folderA folderB | grep -v "^Only in" | sed "s/^.*and folderB/folderB/g" | sed "s/ differ//g" | tar czf output.tar.gz -T -
Broken down it goes:
diff -rq folderA folderB
Do a recursive diff between these folders and be quiet about it: only report which files differ, not the actual differences.
| grep -v "^Only in"
Exclude output lines that indicate a file exists in only one of the folders. I'm assuming from your description this isn't an issue for you, but the two folders I was playing with were a bit dirty.
| sed "s/^.*and folderB/folderB/g"
Discard the first part of each line, up to and including " and " followed by the name of the second folder. This strips the second folder name as well, but the replacement puts it straight back in.
| sed "s/ differ//g"
Discard the end bit of the diff output.
| tar czf output.tar.gz -T -
Tell tar to do the thing. c means create an archive, z means compress it with gzip, and f means the output filename comes next; output.tar.gz is your output file. -T means "read the names of the files to archive from the given file", and the final - means "read them from stdin instead".
I suggest you build this up yourself in the individual steps so you can see how it is constructed, and what the output of each step is like.
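To see how the pipeline is built up, here is roughly what each stage prints. The file names below are invented for the example; your own paths will differ.
diff -rq folderA folderB
Files folderA/config.php and folderB/config.php differ
Files folderA/js/app.js and folderB/js/app.js differ
Only in folderB: notes.txt
diff -rq folderA folderB | grep -v "^Only in"
Files folderA/config.php and folderB/config.php differ
Files folderA/js/app.js and folderB/js/app.js differ
diff -rq folderA folderB | grep -v "^Only in" | sed "s/^.*and folderB/folderB/g" | sed "s/ differ//g"
folderB/config.php
folderB/js/app.js
Feeding that last list into tar via -T - produces an archive whose members keep their folderB/... paths, so the folder structure is preserved when the archive is extracted.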

Related

cd into directories with a specific pattern, read file contents and print them to a text file in Linux

I am trying to automate a report generation process.
I need to enter directories having a specific pattern and then read files from them. The directory names follow the pattern PYYYYMMDD001, PYYYYMMDD002 and so on. I need to enter each directory matching the pattern and read data from each file within it, but I am unable to do so because I am making a mistake while defining the pattern. Here is the command I am using:
TODAY=$(date +"%m%d%Y")
cd /home/user/allFiles
for d in "P"$TODAY*
do
(cd $d && grep -o '-NEW' *_$TODAY*_GOOD* | uniq -c| sed 's/\|/ /'|awk '{print $1}' > /home/user/new/$TODAY"Report.txt" )
done
When I try to execute it, I get the error P02192017* [No such file or directory].
The list of directories is: P02192017001, P02192017002, P02192017003, P02192017004, P02192017005, P02192017006, P02192017007, P02192017008
Any kind of help towards this would be highly appreciated.
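For what it's worth, the error P02192017* [No such file or directory] is what you typically get when a glob such as "P"$TODAY* matches nothing: the unexpanded pattern is then passed to cd literally. Below is a minimal, more defensive sketch that keeps the inner pipeline from the question; the additions are shopt -s nullglob, quoting, and a -- so grep does not read -NEW as an option.
#!/bin/bash
TODAY=$(date +"%m%d%Y")
cd /home/user/allFiles || exit 1
shopt -s nullglob          #an unmatched glob now expands to nothing instead of being passed along literally
for d in "P$TODAY"*/ ; do
    (
        cd "$d" || exit
        grep -o -- '-NEW' *_"$TODAY"*_GOOD* | uniq -c | sed 's/\|/ /' | awk '{print $1}'
    ) >> "/home/user/new/${TODAY}Report.txt"
done
Unlike the original, this appends (>>) to the report, since a plain > inside the loop would leave only the last directory's output.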

Grep for multiple patterns in a folder containing n number of files, and if a match is found for a pattern, create a directory with mkdir

Can we grep for multiple patterns in a folder containing n number of files? And if a match is found for each pattern, create a directory and push the files matching that pattern into the same directory, and likewise for the other patterns.
For example: I have a folder named X. X can have multiple sub-folders and multiple files inside them.
I want to search for a pattern like This code is from. If a match for this string is found in multiple files in folder X, create a directory named dir1 and push all the matched files into dir1.
Do the same for the other patterns: whenever matches are found, create directories and push the files into their respective directories.
I tried searching with grep and can find all the files matching a pattern, but I can't create the directories at the same time. In this way, for n patterns that match in X, n directories should be created. The searching is fine, but I'm having issues with creating the directories alongside it.
One way to get the same folder structure is, unfortunately, not to use xargs cp -t dir, but instead to copy the files one by one with rsync, e.g.,
grep -irl "Version" | xargs -I{} rsync -a "{}" "dir/{}"
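As a variation on the same idea (not from the original answer, just a sketch): rsync can read the whole file list at once and recreate the relative paths itself, which avoids launching one rsync per file. Assuming the matched files live under the current directory and dir is the destination:
grep -irlZ "Version" . | rsync -a --from0 --files-from=- . dir/
-Z makes grep separate the matched file names with NUL characters, and --from0 tells rsync that the list it reads from stdin (--files-from=-) is NUL-separated, so names containing spaces or newlines survive the trip.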
I mean, it's not elegant, but you could use nested for loops with an array of search strings.
EDIT: Missed the part about separate folders for different match strings. Changes are below.
#!/bin/bash
#Assuming:
#patarr is an array containing all patterns
#test/files is the location of the files to be searched
#temp/found is the location of the matching files
for file in test/files/*
#This loop runs for every file in test/files/. $file holds the filename of the current file
do
  for ((i=0;i<${#patarr[@]};i++))
  #This loop runs once for every index in the patarr array. $i holds the current loop number
  do
    if grep -q "${patarr[$i]}" "$file"
    #if grep finds at least one match using the pattern in patarr with index "i"
    then
      #cp "$file" temp/found/ #Old code, see edit above
      mkdir -p "temp/found/${patarr[$i]}"
      #Makes a folder named after our search string. -p means no error if the folder already exists.
      cp "$file" "temp/found/${patarr[$i]}/"
      #Copies the file into said folder
    fi
  done
done
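For reference, a hypothetical setup before running the script above; the first pattern string comes from the question, the second is just a placeholder:
#Define the patterns to search for, then run the loop above
patarr=("This code is from" "some other pattern")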

Copy all Files in a List to a Unique Directory

I am trying to take a text file that contains a list of files and copy them all to a directory. Within this directory, they will have unique directory names. An example of the text file structure can be seen below:
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s01_2011_11_01/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s01_2011_11_01/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s02_2011_11_11/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000003/s02_2011_11_11/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s01_2009_02_13/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s02_2010_10_02/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s03_2010_10_02/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_1.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_2.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_3.edf
/data/isip/data/tuh_eeg/v0.6.0/edf/001/00000005/s04_2010_10_03/a_4.edf
I need a shell command or an EMACS macro to go through this list and copy them all to unique directories within the current working directory. The unique directory will depend on the file; for example, for the first two files, the directory would be
/001/00000003/s01_2011_11_01/
I have tried doing this using an EMACS macro, but I was not able to get it to work. A shell command or EMACS macro would work.
Something as simple as:
cat list | sed "s/^.*edf\/\(.*\)\/\(.*\)$/mkdir -p root_dir\/\1 \&\& cp \0 root_dir\/\1\/\2/" | sh
If on OS X, install gnu-sed and use gsed instead of sed. Run the command without | sh first to see what it will do. Make sure to tweak root_dir, of course.
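If the generated-command approach feels too opaque, a plain bash loop does the same job and is easier to inspect before running. This is only a sketch under the same assumptions (list.txt holds the paths and root_dir is the destination root; both names are placeholders):
#!/bin/bash
#For each source path, take the part after ".../edf/" and recreate it under root_dir
while IFS= read -r src; do
    rel=${src#*/edf/}                        #e.g. 001/00000003/s01_2011_11_01/a_.edf
    mkdir -p "root_dir/$(dirname "$rel")"    #create 001/00000003/s01_2011_11_01/ if needed
    cp "$src" "root_dir/$rel"
done < list.txt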

Use [msys] bash to remove all files whose name matches a pattern, regardless of file-name letter-case

I need a way to clean up a directory, which is populated with C/C++ built-files (.o, .a, .EXE, .OBJ, .LIB, etc.) produced by (1) some tools which always create files having UPPER-CASE names, and (2) other tools which always create lower-case file names. (I have no control over the tools.)
I need to do this from a MinGW 'msys' bash.exe shell script (or bash command prompt). I understand piping (|), but haven't come up with the right combination of exec's yet. I have successfully filtered the file names, using commands like this example:
ls | grep '.\.[eE][xX][eE]'
to list all files having any case-combination of letters in the file-extension--this example gets all the executable (e.g. ".EXE") files.
(I'll be doing similar for .o, .a, .OBJ, .LIB, .lib, .MAP, etc., which all share the same directory as the C/C++ source files. I don't want to delete the source files, only the built-files. And yes, I probably should rework the directory structure, to use a separate directory for the built-files [only], but that will take time, and I need a quick solution now.)
How can I merge the above command with "something" else (e.g., the 'rm -f' command?), to carry this one step further and actually delete [only] those filtered files from the current directory? (I'm hopeful for a solution which does not require a temporary file to hold the filtered file names.)
Adding this answer because the accepted answer suggests practices which are not recommended in actual scripts. (Please don't feel bad, I was also on that track once.)
Parsing ls output is a NO-NO! See http://mywiki.wooledge.org/ParsingLs for a more detailed explanation of why.
In short, ls separates filenames with newlines, and a newline can appear in a filename itself. (Plus, ls does not handle other special characters properly; it prints its output in human-readable form.) In Unix/Linux, it is perfectly valid to have a newline in a filename.
A Unix filename cannot contain a NUL character, though. Hence the command below should work.
find /path/to/some/directory -iname '*.exe' -print0 | xargs -0 rm -f
find: is a tool used to, well, find files matching the required pattern/criterion.
-iname: search using particular names, case insensitive. Note that the argument to -iname is wildcard, not regex.
-print0: Print the file names separated by NULL character.
xargs: Takes the input from stdin and runs the command supplied (rm -f in this case) on it. The input is separated by whitespace by default.
-0 specifies that the input is separated by null character.
Or, an even better approach:
find /path/to/some/directory -iname '*.exe' -delete
-delete is a built-in feature of find, which deletes the files found with the pattern.
Note that if you want to do some other operation, like moving the files to a particular directory, you'd need to use the first option with xargs.
Finally, the command find /path/to/some/directory -iname '*.exe' -delete finds the *.exe files/directories recursively. You can restrict the search to the current directory with -maxdepth 1, and restrict it to regular files (not directories, pipes, etc.) with -type f. Check the find manual for more details.
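Since the question also covers .o, .a, .OBJ, .LIB, .MAP and so on, several case-insensitive patterns can be combined in a single find call. Here is a sketch restricted to regular files in the current directory only; the extension list is just an example, so adjust it to your own build outputs:
find . -maxdepth 1 -type f \( -iname '*.exe' -o -iname '*.o' -o -iname '*.obj' -o -iname '*.a' -o -iname '*.lib' -o -iname '*.map' \) -delete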
Is this what you mean?
rm -f `ls | grep '.\.[eE][xX][eE]'`
But if your "ls | grep ..." output (e.g., from ls -l) has other fields that you have to strip out, such as the date etc., you might want to output just the file name itself.
try something like:
rm -f `ls | grep '.\.[eE][xX][eE]' | awk '{print $9}'`
where your file name is in the 9th field, like:
-rwxr-xr-x 1 Administrators None 283 Jul 2 2014 search.exe
You can use the following command:
ls | grep '.\.[eE][xX][eE]' | xargs rm -f
Use of "xargs" would turn standard input ( in this case output of the previous command) as arguments for "rm -f" command.

Shell script to create directories and files from a list of file names

I'm (still) not a shell-wizard, but I'm trying to find a way to create directories and files from a list of file names.
Let's take this source file (source.txt) as an example:
README.md
foo/index.html
foo/bar/README.md
foo/bar/index.html
foo/baz/README.md
I'll use this command to remove empty lines and trim useless spaces:
$ more source.txt | sed '/^$/d;s/^ *//;s/ *$//'
It will give me this list:
README.md
foo/index.html
foo/bar/README.md
foo/bar/index.html
foo/baz/README.md
Now I'm trying to loop over every line and create the related file (if it doesn't already exist), along with its parent directories.
How could I do this?
Ideally, I would put this script in an alias to quickly use it.
As always, posting a question brings me to the end of the problem...
I came to a satisfying solution, using dirname and basename in a for .. in loop:
for i in $(sed '/^$/d;s/^ *//;s/ *$//' source.txt);
do mkdir -p "$(dirname "$i")";
touch "$i";
done
This one-line command will:
read the file names list
create directories
create empty files in their own directory
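If the file names might ever contain spaces, the word-splitting for loop above will break them apart. A small whitespace-safe variant, sketched with the same source.txt and the same trimming:
while IFS= read -r path; do
    [ -n "$path" ] || continue          #skip empty lines
    mkdir -p "$(dirname "$path")"       #create the parent directories
    touch "$path"                       #create the empty file itself
done < <(sed 's/^ *//;s/ *$//' source.txt)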