The idea is to calculate SHA256 hashes for all files in a directory (including all subdirectories), but exclude some files specified in another text file.
The problem is that if I specify the following files (listed below the code) to exclude, only one of them is excluded, not both.
Here is my code:
line_count=0
while read line
do
if [ $line_count -eq 0 ]
then
exclude_files=".*/$line$"
else
exclude_files="${exclude_files}\|.*/$line$"
fi
line_count=$(( $line_count + 1 ))
done < exclude-files.txt
find . -type f -print0 | xargs -0 shasum -a 256 | grep -v -P "${exclude_files}" > ~/out.txt
Contents of the file exclude-files.txt:
Icon\r
.DS_Store
--- empty line ---
The file Icon\r is a special file for changing a folder's icon; its name contains a CR. (I'm on Mac OS X 10.7.4.)
This is because the \ in your variable acts as an escape for the |. To grep -P, \| means a literal |, so the alternation is lost:
exclude_files="${exclude_files}\|.*/$line$"
There are two ways to make this work. With plain grep (basic regular expressions), \| is the alternation operator, so the pattern built above is fine as it is. With the -P option you are currently passing to grep, | must not be escaped at all, so build the pattern without the backslash:
exclude_files="${exclude_files}|.*/$line$"
Choose one approach: the escaped \| with basic grep, or the unescaped | with grep -P. Mixing the two won't work.
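Putting it together, here is a minimal corrected sketch of the loop from the question, built for the unescaped | (the -P flag is kept as in the question; grep -E would accept the same pattern if your grep lacks -P):
#!/bin/bash
# Builds a pattern like ".*/Icon<CR>$|.*/.DS_Store$" and filters it out of the shasum output.
exclude_files=""
while IFS= read -r line; do
    [ -z "$line" ] && continue                  # skip empty lines in exclude-files.txt
    if [ -z "$exclude_files" ]; then
        exclude_files=".*/$line$"
    else
        exclude_files="${exclude_files}|.*/$line$"
    fi
done < exclude-files.txt
find . -type f -print0 | xargs -0 shasum -a 256 | grep -v -P "${exclude_files}" > ~/out.txt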
grep would not be safe if filenames contain characters with a special regex meaning; maybe this can help:
cmd=(find . -type f \( )
while read -r line; do cmd=("${cmd[@]}" \! -name "$line" -a); done < exclude-files.txt
cmd[${#cmd[*]}-1]=\)               # replace the trailing -a with the closing )
echo "${cmd[@]}" | cat -v          # sanity check: print the command (cat -v shows the CR in Icon\r as ^M)
"${cmd[@]}" -print0 | xargs -0 shasum -a 256
Related
I have a bunch of directories like 001/ 002/ 003/ mixed in with others that have letters in their names. I just want to grab all the directories with numeric names and move them into another directory.
I try this:
file */ | grep ^[0-9]*/ | xargs -I{} mv {} newdir
The matching part works, but it ends up moving everything to the newdir...
I am not sure I understood correctly but here is at least something to help.
Use a combination of find and xargs to manipulate lists of files.
find . -maxdepth 1 -type d -regex './[0-9]*' -print0 | xargs -0 -I'{}' mv "{}" "newdir/{}"
Using -print0 and -0, and quoting the replacement symbol {}, makes your script more robust. It will handle most situations where non-printable characters are present. Basically, this passes the list using a \0 delimiter instead of \n.
mv is not powerful enough by itself; it cannot work on patterns.
Try this approach: Rename multiple files by replacing a particular pattern in the filenames using a shell script
Either use a loop or a rename command.
With a loop and an array, your script would be something like this:
#!/bin/bash
DIR=( $(file */ | grep '^[0-9]*/' | awk -F/ '{print $1}') )
for dir in "${DIR[@]}"; do
    mv "$dir" /path/to/DIRECTORY
done
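If you would rather not parse the output of file or ls at all, here is a sketch using plain globbing instead (assuming bash, and newdir as the destination from the question):
#!/bin/bash
shopt -s nullglob                     # a non-matching glob expands to nothing
for dir in [0-9]*/; do                # every top-level directory starting with a digit
    name=${dir%/}                     # strip the trailing slash
    if [[ $name =~ ^[0-9]+$ ]]; then  # keep only purely numeric names
        mv -- "$name" newdir/
    fi
done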
Is there a good regex to find all of the files that do not contain a certain character? I know there are lots to find lines containing matches, but I want something that will find all files that do not contain my match.
Using ls and sed to replace all filenames with no extension (i.e. not containing a .) with NoExtension:
ls | sed -e 's/^[^.]*$/NoExtension/g'
replacing filenames that have an extension with their extension:
ls | sed -e 's/^[^.]*$/NoExtension/g' -e 's/.*\.\(.*\)/\1/'
For bash, to list all the files in a directory whose names do not contain a dot:
shopt -s extglob
ls !(*.*)
The extglob setting is required to enable the !() pattern, which negates the *.* glob passed to ls.
You should discard all the answers that parse the output of ls (the reasons why parsing ls is fragile are well documented). The tool find is perfect for this.
# Show files in cwd
$ ls
file file.txt
# Find the files with an extension
$ find -type f -regex '.*/.*\..*$'
./file.txt
# Invert the match using the -not option
$ find -type f -not -regex '.*/.*\..*$'
./file
And an awk solution that counts the files without an extension, for good measure.
ls | awk '$0 !~ /\..+$/{a++}END{print a}'
This might work for you (find, GNU sed & wc):
find . -type f | sed -rn '\|.*/\.?[^.]+$|w NoExtensions' && wc -l NoExtensions
This gives you a count and a list.
N.B. dot files without extensions are included.
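For completeness, a find-only sketch that lists the regular files in the current directory whose names contain no dot at all (note that, unlike the sed version above, this treats dot files such as .profile as having an extension):
find . -maxdepth 1 -type f ! -name '*.*'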
I have a folder containing many folders, each with subfolders, with the following structure:
_30_photos/combined
_30_photos/singles
_47_foo.bar
_47_foo.bar/combined
_47_foo.bar/singles
_50_foobar
With the command find . -type d -print | grep '_[0-9]*_', all folders matching that pattern are shown. But I need a regex which captures only the */combined folders. I came up with
_[0-9]*_[a-z.]+/combined, but when I insert that into the find command, nothing is printed.
The next step would be to create, for each combined folder, a new folder somewhere on my hard disk and copy the contents of the combined folder into it. The new folder's name should be the same as the parent of the combined subfolder, e.g. _47_foo.bar. Could that be achieved with an xargs command after the search?
You do not need grep:
find . -type d -regex ".*_[0-9]*_.*/combined"
For the rest:
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 cp -r
With basic grep you will need to escape the +:
... | grep '_[0-9]*_[a-z.]\+/combined'
Or you can use the "extended regexp" version (egrep or grep -E [thanks chepner]) in which the + does not have to be escaped.
xargs may not be the most flexible way of doing the copying you describe above, as it is tricky to use with multiple commands. You may find more flexibility with a while loop:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read -r combined_dir; do
    mkdir some_new_dir
    cp -r "${combined_dir}" some_new_dir/
done
Have a look at bash string manipulation if you want a way to automate the name of some_new_dir.
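For example, here is a sketch of the loop with the target name derived from the parent of each combined folder via parameter expansion (/somewhere is a placeholder for your target directory):
... | grep '_[0-9]*_[a-z.]\+/combined' | while read -r combined_dir; do
    parent=$(dirname "$combined_dir")        # e.g. ./_47_foo.bar
    new_dir="/somewhere/${parent##*/}"       # e.g. /somewhere/_47_foo.bar
    mkdir -p "$new_dir"
    cp -r "$combined_dir"/. "$new_dir"/      # copy the contents of the combined folder
done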
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" | \
(while read s; do
n=$(dirname "$s")
cp -pr "$s" "$target_dir/${n#./}"
done
)
NOTE:
this fails if you have linebreaks "\n" in your directory names; see the NUL-safe sketch after these notes for a way around that
this uses a subshell to not clutter your env - inside a script you don't need that
changed the regex slightly: [0-9]* to [0-9]+
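A NUL-safe sketch of the same loop, assuming GNU find and bash (read -d '' consumes the -print0 output, so embedded newlines in directory names are handled):
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" -print0 |
    while IFS= read -r -d '' s; do
        n=$(dirname "$s")
        cp -pr "$s" "$target_dir/${n#./}"
    done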
You can use this command:
find . -type d | grep -P "_[0-9]*_[a-z.]+/combined"
I want to use regexes with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files.
I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find output through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT the ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
The "event not found" error is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
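A quick illustration in an interactive bash session (the pattern is just an example):
echo "(?!generated)"     # with history expansion enabled, bash may abort with an "event not found" error
echo '(?!generated)'     # single quotes pass the string through untouched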
grep with -Z (sometimes known as --null) makes grep output terminated with a null character instead of newline. What you wanted was -z (sometimes known as --null-data) which causes grep to interpret a null character in its input as end-of-line instead of a newline character. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
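If you do want to keep the grep filters, a slightly safer sketch (assuming GNU grep for -z) matches the directory names as whole path components, so a file like deployment.cpp is not swept up:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
    grep -vz '/generated/' | grep -vz '/deploy/' | xargs -0 ls -1Ld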
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
fails because you are trying to use POSIX extended regular expressions, which don't support lookahead/lookbehind assertions: https://superuser.com/a/596499/658319
If your find supports PCRE, converting the pattern to PCRE should work.
I wrote a little script which prints the names of files containing problematic character sequences.
#!/bin/bash
# Finds all files in the repository that contain
# undesired characters or sequences of characters
pushd .. >/dev/null
# Find Windows newlines
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l $'\r'
# Find tabs (should be spaces)
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l $'\t'
# Find trailing spaces
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l " $"
popd >/dev/null
I'd like to combine this into one line, i.e. by having grep look for \r OR \t OR trailing spaces. How would I construct a regex to do this? It seems that for escape characters a special sequence needs to be used ($'\X') and I'm not sure how to combine these...
I'm running OS X, and am looking for a solution that works on both BSD and GNU based systems.
find . -type f | grep -E -v '\.git/|\.gitmodules|^\./lib' | xargs grep -E -l $'\r|\t| $'
I'm not certain the $'\r|\t| $' quoting will work in every shell, but a simple test on my system seemed to work.
I'm using the -E (extended regexp) option to grep, which allows 'OR'ing together multiple search targets.
Older Unixes may or may not support the -E option, so if you get an error message flagging that, replace every grep -E with egrep.
I hope this helps.
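For what it's worth, here is a single-pipeline sketch of the same idea that also survives filenames with spaces, assuming the intent of the original filters is to skip the top-level .git and lib directories and the .gitmodules file:
find . -type f ! -path './.git/*' ! -name '.gitmodules' ! -path './lib/*' -print0 |
    xargs -0 grep -lE $'\r|\t| $'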