I have a huge directory that contains a lot of subdirectories, but some of the subdirectories' names are numbers or strange characters like β or other odd things.
The directory looks like this:
/
/a,/b,/1,/0,/$,/β
/a/c,/1/a,/b/β
The depth of the directory tree is 3, and I want to remove all directories whose names are not made up of the 26 letters (a-z): remove ./1, ./$, ./β ... and /a/1, /b/β, /a/b/2.
I tried combining find, grep and parallel (a GNU xargs).
grep behaves oddly: if I use grep [a-z], the output also includes strange letters, for example an a with a circle on top.
So I wrote this:
find . -type d -maxdepth 2|grep -v '\/[a|b|c|d|e|f|g|h|i|j|K|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z]+/[a|b|c|d|e|f|g|h|i|j|K|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z]+'|parallel -X -r rm -r
But it just removes all the files! The grep output contains the . directory and subdirectories like /p, which I do not want to delete, even though according to the regex I think they should not match.
Why does that happen?
And how can I remove those directories?
find itself can use regexes, so why not use that:
find . -maxdepth 2 -type d ! -iregex '.*/[a-z]+' -exec rm -r {} \;
Notes:
-iregex : case insensitive regex
-exec : executes a command
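To preview what would be removed before running the destructive command, a minimal sketch (assuming GNU find) replaces -exec with -print:
find . -maxdepth 2 -type d ! -iregex '.*/[a-z]+' -print
Note that this also lists . itself; rm refuses to remove . in any case.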
You can use Bash's special globbing features:
$ cd -- "$(mktemp --directory)"
$ mkdir a b 1
$ touch 0 '$' β a/c 1/a b/β
$ ls -R .
.:
'$' 0 1 a b β
./1:
a
./a:
c
./b:
β
$ shopt -s extglob globstar
$ rm -r **/!([a-z])/
$ ls -R .
.:
a b
./a:
c
./b:
You can use ls -d instead of rm -r to check which files will be deleted before going through with it.
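For example, with extglob and globstar still set, this sketch lists exactly what the pattern would delete:
$ ls -d **/!([a-z])/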
Related
I need to match files with one specific extension under all nested directories, including the PWD, with BASH using "globbing".
I do not need "Match all files under all nested directories with shell globbing, but not in the PWD" (a related question) — I want the PWD included.
I need this to work with commands other than grep, so "grep: search all directories with filename extension" does not apply.
I also do not need to only "grep recursively, but only in files with certain extensions (plural)".
shopt -s globstar; ls **/*.* is great for all files (but that is not my question).
ls **/*.php does not match in the PWD.
set -o globstar; **/*.php returns duplicate files.
grep -r --include=\*.php "find me" ./ is specifically for grep, not globbing (consider this Question). It seems grep has --include=GLOB because this is not possible using globbing.
From this Answer (here), I believe there may not be a way to do this using globbing.
tl;dr
I need:
A glob expression
That works with any command where simple globs can be used (ls, sed, cp, cat, chown, rm, et cetera)
Mainly in BASH, but other shells would be interesting
Both in the PWD and all subdirectories recursively
For files with a specific extension
I'm using grep & ls only as examples, but I need a glob expression that applies to other commands also.
grep -r --include=GLOB is not a glob expression for, say, cp; it is a workaround specific to grep and is not a solution.
find is not a glob, but it may be a workaround for non-grep commands if there is no such glob expression. It would need | or a while ... do loop, et cetera (see the sketch just below).
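For illustration, a sketch of the kind of find-based work-around I mean (cp and ~/destination/ are just placeholders):
find . -type f -name '*.php' -print0 |
while IFS= read -r -d '' f; do
    cp -- "$f" ~/destination/
done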
Examples
Suppose I have these files, all containing "find me":
./file1.js
./file2.php
./inc/file3.js
./inc/file4.php
./inc.php/file5.js
./inc.php/file6.php
I need to match only .php files, all of them, each exactly once:
./file2.php
./inc/file4.php
./inc.php/file6.php
Duplicates returned: shopt -s globstar; ... **/*.php
This changes the problem; it does not solve it.
Dup: ls
Before entering shopt -s globstar as a single command...
ls **/*.php returns:
inc/file4.php
inc.php/file5.js
inc.php/file6.php
file2.php is not returned.
After entering shopt -s globstar as a single command...
ls **/*.php returns:
file2.php
inc/file4.php
inc.php/file6.php
inc.php:
file5.js
file6.php
inc.php/file6.php is returned twice.
Dup: grep
Before entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
inc/file4.php: find me
inc.php/file6.php: find me
file2.php is not returned.
After entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
file2.php: find me
inc/file4.php: find me
inc.php/file5.js: find me
inc.php/file6.php: find me
inc.php/file6.php: find me
inc.php/file6.php is returned twice.
After seeing the ls output above, we know why: inc.php itself matches **/*.php, and grep -R then descends into that directory as well.
Current solution: faulty misuse of && logic
grep -r "find me" *.php && grep -r "find me" */*.php
ls -l *.php && ls -l */*.php
Please no! If the command before && fails (finds nothing), the command after it never runs.
Desired solution: single command via globbing
grep -r "find me" [GLOB]
ls -l [GLOB]
Insight from grep
grep does have the --include flag, which achieves the same result but using a flag specific to grep. ls does not have an --include option. This leads me to believe that there is no such glob expression, which is why grep has this flag.
With bash, you can first do a shopt -s globstar to enable recursive matching, and then the pattern **/*.php will expand to all the files in the current directory tree that have a .php extension.
zsh and ksh93 also support this syntax. Other commands that take a glob pattern as an argument and do their own expansion of it (like your grep --include) likely won't.
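A minimal interactive sketch (bash 4.0 or later):
$ shopt -s globstar
$ printf '%s\n' **/*.php
Keep in mind (as the question's ls output shows) that the pattern can also match directories whose names end in .php.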
With shell globbing it is possible to get only directories by adding a / at the end of the glob, but there's no way to get only files (zsh being an exception)
Illustration:
With the given tree:
file.php
inc.php/include.php
lib/lib.php
Supposing that the shell supports the non-standard ** glob:
**/*.php/ expands to inc.php/
**/*.php expands to file.php inc.php inc.php/include.php lib/lib.php
For getting file.php inc.php/include.php lib/lib.php, you cannot use a glob.
=> with zsh it would be **/*.php(.)
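For example, a zsh sketch that lists only the regular files:
print -rl -- **/*.php(.)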
Standard work-around (any shell, any OS)
The POSIX way to recursively get the files that match a given standard glob and then apply a command to them is to use find -type f -name ... -exec ...:
ls -l <all .php files> would be:
find . -type f -name '*.php' -exec ls -l {} +
grep "finde me" <all .php files> would be:
find . -type f -name '*.php' -exec grep "finde me" {} +
cp <all .php files> ~/destination/ would be:
find . -type f -name '*.php' -exec sh -c 'cp "$@" ~/destination/' _ {} +
Remark: this one is a little more tricky because you need ~/destination/ to come after the file arguments, and find's syntax doesn't allow find -exec cp {} ~/destination/ +
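If your cp is GNU cp, another sketch is its -t (--target-directory) option, which takes the destination first and therefore fits find's {} + syntax directly:
find . -type f -name '*.php' -exec cp -t ~/destination/ {} +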
Suggesting a different strategy:
Use an explicit find command to build bash command(s) on the selected files with the -printf option.
Inspect the commands for correctness, then run them.
1. Prepare bash commands on the selected files
find . -type f -name "*.php" -printf "cp %p ~/destination/ \n"
2. Inspect the output; correct the command or the filter as needed and test
cp ./file2.php ~/destination/
cp ./inc/file4.php ~/destination/
cp ./inc.php/file6.php ~/destination/
3. Execute the prepared find output
bash <<< $(find . -type f -name "*.php" -printf "cp %p ~/destination/ \n")
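This assumes the matched paths contain no spaces or shell metacharacters; a slightly more defensive sketch quotes each path in the generated commands (still assuming no single quotes in the names):
bash <<< "$(find . -type f -name '*.php' -printf "cp '%p' ~/destination/ \n")"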
I need to delete directories on a Linux shell whose names are longer than 4 characters.
But don't count the length of sub-directory names.
For example:
/12345/.. <= Should be deleted
/123456/.. <= Should be deleted
/1234/12345 <= Should NOT be deleted
/1234/123456 <= Should NOT be deleted
UPDATE:
Got it:
find -maxdepth 1 -regextype posix-egrep -type d -regex '.*[^/]{5,}' -exec rm -rf {} +
To delete all directories with 5 or more characters in bash, you could do:
rm -rf ?????*/
The expression is not a regular expression, but a glob pattern that uses a set of wildcard characters to specify a filename or path.
Basically, if you want to keep directories with 4 characters or fewer, you want to remove everything with 5 or more, hence the five ?s and the single *. The trailing / restricts the match to directories.
man bash
* :: Matches any string, including the null string. When the globstar shell option is enabled, and * is used in a pathname expansion context, two adjacent *s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a /, two adjacent *s will match only directories and subdirectories.
? :: Matches any single character.
$ find .
.
$ mkdir -p {1,12,123,1234,12345,123456}/{123,12345}
$ touch foobar
$ rm -rf ?????*/
$ find .
.
./123
./123/12345 <= subdirectory with 5 or more not deleted
./123/123
./foobar <= the file is still here
./1234
./1234/12345
./1234/123
./12
./12/123
./12/12345
./1
./1/12345
./1/123
For legibility, pipe to grep then to xargs:
find . -maxdepth 1 -type d | grep '^\./.....' | xargs rm -rf
You can use -regextype for find.
$ pdirpath="/path/to/search"
$ find "$pdirpath" -type d -regextype posix-extended \
-regex "$pdirpath/[a-z0-9]{5,}" -exec rm -rf {} \;
The above will remove directories like:
/path/to/search/fooba
/path/to/search/12345
/path/to/search/foobar
NOT
/path/to/search/foobar/12345
/path/to/search/1234
find -maxdepth 1 -type d -name '?????*' -delete
It works for deeper nested dirs too, if you don't want to restrict yourself to -maxdepth 1.
Else
rm -rf ?????*/
is of course briefer and more elegant. My first idea, rmdir, of course only works with empty dirs; I didn't think of that.
Use
find -maxdepth 1 -type d -name '?????*' -ls
before issuing the delete option, since it is irreversible. And backup often, if using find ... -delete on a regular basis. :)
I have a series of files that I would like to clean up using command-line tools available on a *nix system. The existing files are named like so:
filecopy2.txt?filename=3
filecopy4.txt?filename=33
filecopy6.txt?filename=198
filecopy8.txt?filename=188
filecopy3.txt?filename=19
filecopy5.txt?filename=1
filecopy7.txt?filename=5555
I would like them to be renamed, removing all characters after and including the "?":
filecopy2.txt
filecopy4.txt
filecopy6.txt
filecopy8.txt
filecopy3.txt
filecopy5.txt
filecopy7.txt
I believe the following regex will grab the bit I want to remove from the name:
\?(.*)
I just can't figure out how to accomplish this task beyond this.
A bash command:
for file in *; do
    mv "$file" "${file%%\?filename=*}"
done
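To preview the renames before committing to them, a sketch is to put echo in front of mv and inspect the output first:
for file in *; do
    echo mv "$file" "${file%%\?filename=*}"
done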
find . -depth -name '*[?]*' -exec sh -c 'for i do
mv "$i" "${i%[?]*}"; done' sh {} +
With zsh:
autoload zmv
zmv '(**/)(*)\?*' '$1$2'
Change it to:
zmv -Q '(**/)(*)\?*(D)' '$1$2'
if you want to rename dot files as well.
Note that if filenames may contain more than one ? character, both will only trim from the rightmost one.
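zmv also accepts -n for a dry run, so you can preview the renames first:
zmv -n '(**/)(*)\?*' '$1$2'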
If all files are in the same directory (ignoring .dotfiles):
$ rename -n 's/\?filename=\d+$//' -- *
If you want to rename files recursively in a directory hierarchy:
$ find . -type f -exec rename -n 's/\?filename=\d+$//' {} +
Remove the -n option to actually do the renaming.
In this case you can use the cut command:
echo 'filecopy2.txt?filename=3' | cut -d? -f1
Example:
find . -type f -name "*\?*" -exec sh -c 'mv "$1" "$(echo "$1" | cut -d\? -f1)"' mv {} \;
You can use rename if you have it:
rename 's/\?.*$//' *
I use this after downloading a bunch of files where the URL included parameters and those parameters ended up in the file name.
This is a Bash script:
for file in *; do
    mv "$file" "${file%%\?*}"
done
I want to use regex's with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
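An alternative sketch uses -prune, which stops find from descending into the excluded directories at all (a small efficiency win on large trees):
find . -regextype posix-egrep \( -type d \( -name generated -o -name deploy \) \) -prune \
    -o -regex '.+\.(c|cpp|h)$' -print0 | xargs -0 ls -1d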
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
grep with -Z (sometimes known as --null) makes grep output terminated with a null character instead of newline. What you wanted was -z (sometimes known as --null-data) which causes grep to interpret a null character in its input as end-of-line instead of a newline character. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
fails because you are trying to use POSIX extended regular expressions, which don't support lookahead/lookbehind, etc. https://superuser.com/a/596499/658319
If your find offers a PCRE regex type via -regextype, converting the pattern to PCRE should work; note, though, that GNU find's -regextype only offers emacs- and POSIX-style dialects, so there you will need a different approach (such as the -not -path exclusions above).
I am working on a bash script.
grep -R -l "image17" *
image17 will change to some other number when I go through my loop. When I execute the grep above, I get back the following:
slides/_rels/slide33.xml.rels
I need to put slide33 in a variable because I want to use it to rename the file image17.jpeg to slide33.jpeg. I need something that checks for the above format and parses out the part starting with "slide" and ending with the numbers.
Another problem is that the grep could come up with multiple results rather than one. I need a way to check how many results there are: if there is one, do one thing; if more than one, do another.
Here is what I have so far. Now I just need to capture the grep output in a variable, check how many matches there are, and if there is exactly one, apply the regular expression to get the filename.
#!/bin/sh
IFS=$'\n'
where="/Users/mike/Desktop/test"
cd "${where}"
for file in $(find * -maxdepth 0 -type d)
do
    cd "${where}/${file}/images"
    # delete everything in images/ that is not a .png
    ls -1 | grep -v ".png" | xargs -I {} rm -r "{}"
    cd "${where}/${file}/ppt"
    for images in $(find * -maxdepth 0 -type f)
    do
        if [ "$(grep -R -l "${images}" * | wc -l)" -eq 1 ]
        then
            new_name=$(grep -R -l "slide[0-9]" *)   # still need to extract just "slideNN" from this path
        fi
    done
done
i=0
while [ $i -lt 50 ]
do
    grep -R -l "image${i}"
    i=$((i + 1))
done
Something like this might help.
Or, to detect similarly structured words, you can do
grep -R -l "slide[0-9][0-9]"
or you can do
grep -R -l -E "slide[0-9]+"
to match at least one digit followed by any number of further digits
Check man grep for more in the "REGULAR EXPRESSION" section.
The two-digit version above will match words starting with "slide" and ending with exactly two digits.
grep -c counts the number of matching lines, but does not print the matches. I think you should count the lines that grep outputs to detect how many files matched, and then run your conditional on that count.
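A rough sketch of that idea, assuming the slides/_rels/slideNN.xml.rels layout shown in the question (the paths here are placeholders to adapt):
matches=$(grep -R -l "image${i}" .)
count=$(printf '%s\n' "$matches" | grep -c .)
if [ "$count" -eq 1 ]; then
    new_name=$(basename "$matches" .xml.rels)    # e.g. slide33
    mv "image${i}.jpeg" "${new_name}.jpeg"       # adjust source/target paths to your tree
fi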