Delete directories with name longer than x - regex

I need to delete directories in a Linux shell whose names are longer than 4 characters.
The length of sub-directory names should not count.
For example:
/12345/.. <= Should be deleted
/123456/.. <= Should be deleted
/1234/12345 <= Should NOT be deleted
/1234/123456 <= Should NOT be deleted
UPDATE:
Got it:
find -maxdepth 1 -regextype posix-egrep -type d -regex '.*/[^/]{5,}' -exec rm -rf {} +

To delete all directories with 5 or more characters in bash, you could do:
rm -rf ?????*/
The expression is not a regular expression but a glob pattern, which uses a set of wildcard characters to specify a filename or path.
Basically, if you want to keep your directories with 4 characters or less, you want to remove everything with 5 or more, hence the 5 ? and single *. The trailing / restricts the match to directories.
man bash
* :: Matches any string, including the null string. When the globstar shell option is enabled, and * is used in a pathname
expansion context, two adjacent *s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a /, two adjacent *s will match only directories and
subdirectories.
? :: Matches any single character.
$ find .
.
$ mkdir -p {1,12,123,1234,12345,123456}/{123,12345}
$ touch foobar
$ rm -rf ?????*/
$ find .
.
./123
./123/12345 <= subdirectory with 5 or more not deleted
./123/123
./foobar <= the file is still here
./1234
./1234/12345
./1234/123
./12
./12/123
./12/12345
./1
./1/12345
./1/123
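Before running rm -rf with a glob, it can help to see what the glob expands to. The following is only a sketch: list_long_dirs is a hypothetical helper name, and the minimum length parameter is an assumption added for illustration. Like the glob above, it skips hidden directories.

```shell
# Hypothetical helper (not from the answer): list the directories directly
# under "$1" whose names have at least "$2" characters (default 5),
# without deleting anything.
list_long_dirs() {
    local root=${1:-.} min=${2:-5} pattern dir
    # Build a glob of $min question marks, e.g. "?????" when min=5.
    pattern=$(printf '%*s' "$min" '' | tr ' ' '?')
    (
        cd -- "$root" || exit 1
        for dir in $pattern*/; do        # unquoted on purpose so the glob expands
            [ -d "$dir" ] || continue    # skips the literal pattern when nothing matches
            printf '%s\n' "${dir%/}"
        done
    )
}
```

Calling list_long_dirs . prints the names that rm -rf ?????*/ would remove from the current directory.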

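For legibility, pipe to grep then to xargs. Note that find prefixes every name with ./, so the pattern has to anchor after that prefix (a bare '.....' would also match ./123):
find . -maxdepth 1 -type d | grep -E '^\./[^/]{5,}$' | xargs rm -rf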
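A sketch of the same find, grep, xargs pipeline that anchors on the ./ prefix printed by find and passes names NUL-delimited, so directory names with spaces survive. It assumes GNU find, grep, and xargs; delete_long_dirs is a hypothetical name.

```shell
# Hypothetical wrapper: delete directories directly under "$1" whose names
# have 5 or more characters. Assumes GNU find, grep and xargs.
delete_long_dirs() {
    (
        cd -- "${1:-.}" || exit 1
        # find prints "./name", so the regex anchors right after "./".
        # -print0, grep -z and xargs -0 keep every name NUL-terminated.
        find . -maxdepth 1 -type d -print0 |
            grep -zE '^\./[^/]{5,}$' |
            xargs -0 -r rm -rf --
    )
}
```

Replacing rm -rf -- with echo in the last stage turns this into a dry run.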

You can use -regextype for find.
$ pdirpath="/path/to/search"
$ find "$pdirpath" -type d -regextype posix-extended \
-regex "$pdirpath/[a-z0-9]{5,}" -exec rm -rf {} \;
Above will remove directories that are,
/path/to/search/fooba
/path/to/search/12345
/path/to/search/foobar
NOT
/path/to/search/foobar/12345
/path/to/search/1234

find -maxdepth 1 -type d -name '?????*' -delete
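It works for deeper nested dirs too, if you don't want to restrict yourself to -maxdepth 1. Note that -delete only removes empty directories; for non-empty ones, replace it with -exec rm -rf {} +.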
Else
rm -rf ?????*/
is of course more brief and elegant. My first idea, rmdir, of course only works with empty dirs. I didn't think about that.
Use
find -maxdepth 1 -type d -name '?????*' -ls
before issuing the delete option, since it is irreversible. And backup often, if using find ... -delete on a regular basis. :)
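The preview-then-delete habit can be wrapped in one function. This is a sketch under assumptions: remove_long_dirs and its preview/delete mode argument are hypothetical, and it uses -exec rm -rf so that non-empty directories are removed as well.

```shell
# Hypothetical helper: preview (default) or delete the directories directly
# under "$1" whose names have 5 or more characters.
remove_long_dirs() {
    local root=$1 mode=${2:-preview}
    # -mindepth 1 keeps the starting directory itself out of the match,
    # even if its own name happens to be 5 or more characters long.
    if [ "$mode" = delete ]; then
        find "$root" -mindepth 1 -maxdepth 1 -type d -name '?????*' \
            -exec rm -rf -- {} +
    else
        find "$root" -mindepth 1 -maxdepth 1 -type d -name '?????*' -print
    fi
}
```

Run remove_long_dirs DIR first, check the list, then run remove_long_dirs DIR delete.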

Related

Deleting files not containing double digit number and pattern in grep

The pattern below is supposed to delete all files that don't start with 1_ but instead it matches all files that don't contain 1.
For example, it'll not match 11_xxx.sql.bz2 and 1_xxx.sql.bz2 but will match all the others correctly.
How can I ensure the pattern only matches the exact number and not any number which contains the number?
For example, I would like the script below only to not match 1_xxx.sql.bz2
ls | grep -P "^[^1]+_([^_]+).+$" | xargs -d"\n" rm
I will need to keep items without a number at the start
I suggest using find like this to match all files in current directory excluding those that start with 1_:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -delete
If your find doesn't support -delete then use:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -exec rm {} +
Use grep -v to invert the match, so you exclude files that match the pattern:
grep -v '^1_'
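To also keep names that have no leading number, the inverted match can be combined with a positive match. A minimal sketch (filter_deletable is a hypothetical name; it reads one file name per line, so it assumes names without newlines):

```shell
# Hypothetical filter: from a list of names on stdin, keep only those that
# start with a number followed by "_", then drop the ones starting with "1_".
filter_deletable() {
    # ^[0-9]+_ requires the whole leading number, so 11_ and 1_ stay distinct.
    grep -E '^[0-9]+_' | grep -v '^1_'
}
```

For example, printf '%s\n' * | filter_deletable lists the candidates in the current directory; the result could then be fed to xargs -d '\n' rm as in the question.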

append epoch date at the beginning of a file in bash

I have a list of 20 files, 10 of them already have 1970-01-01- at the beginning of the name and 10 do not (the remaining ones all start with a small letter).
So my task was to rename those files that do not have the epoch date in the beginning with the epoch date too. Using bash, the below code works, but I could not solve it using a regular expression for example using rename. I had to extract the basename and then further mv. An elegant solution would be just use one pipe instead of two.
Works
find ./ -regex './[a-z].*' | xargs -I {} basename {} | xargs -I {} mv {} 1970-01-01-{}
Hence looking for a solution with just one xargs or -exec?
You can just use a single rename command:
rename -n 's/^([a-z])/1970-01-01-$1/' *
Assuming you're operating on all the files present in current directory.
Note that -n flag (dry run) will only show intended actions by rename command but won't really rename any files.
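If you want to combine with find then use the following. With -execdir, GNU find passes each name as ./name, so the substitution targets the last path component rather than the start of the string:
find . -maxdepth 1 -type f -name '[a-z]*.txt' -execdir rename -n 's|([^/]+)$|1970-01-01-$1|' {} +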
I always prefer readable code over short code.
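r() {
    local base dir
    base=$(basename "$1")
    dir=$(dirname "$1")
    if [[ "$base" =~ ^1970-01-01- ]]
    then
        : "ignore, already has correct prefix"
    else
        echo mv "$1" "$dir/1970-01-01-$base"
    fi
}
export -f r
# Pass the file name as an argument instead of embedding {} in the script
# text, so names with spaces or quotes are handled safely.
find . -type f -exec bash -c 'r "$1"' _ {} \;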
This also just prints out what would have been done (for testing). Remove the echo before the mv to have the real thing.
Mind that the mv will overwrite existing files (if there is a ./a/b/c and an ./a/b/1970-01-01-c already). Use option -i to mv to be safe from this.
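For the single-directory case from the question, a plain glob loop avoids find, xargs, and rename altogether. This is a sketch, not a drop-in replacement: prefix_epoch is a hypothetical name, and it assumes the [a-z] glob range matches only lowercase ASCII letters in the current locale.

```shell
# Hypothetical helper: prefix every regular file in "$1" whose name starts
# with a lowercase letter with "1970-01-01-".
prefix_epoch() {
    local dir=${1:-.} f
    for f in "$dir"/[a-z]*; do
        [ -f "$f" ] || continue    # also skips the literal pattern if nothing matches
        # -n refuses to overwrite an existing target.
        mv -n -- "$f" "$dir/1970-01-01-${f##*/}"
    done
}
```

Files that already carry the prefix start with a digit, so the glob never selects them.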

How to delete certain directory in linux

I have a huge directory that contains a lot of subdirectories. But some subdirectory names are numbers or unusual characters like β.
The directory looks like this:
/
/a,/b,/1,/0,/$,/β
/a/c,/1/a,/b/β
The directory tree is 3 levels deep, and I want to remove all directories whose names do not consist of the 26 letters (a-z). Remove ./1, ./$, ./β ... and /a/1, /b/β, /a/b/2.
I tried combining find, grep, and parallel (a GNU alternative to xargs).
grep behaves strangely: if I use grep [a-z], it also matches unusual letters, for example an a with a circle on top.
So I wrote this:
find . -type d -maxdepth 2|grep -v '\/[a|b|c|d|e|f|g|h|i|j|K|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z]+/[a|b|c|d|e|f|g|h|i|j|K|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z]+'|parallel -X -r rm -r
But it just removes all the files! The grep output contains the . directory and subdirectories like /p, which I do not want to delete; according to the regex, I think they should not be included.
Why does that happen?
And how can I remove those directories?
find itself can use regex, why not use that:
find . -mindepth 1 -maxdepth 2 -type d ! -iregex '.*/[a-z]+' -exec rm -r {} \;
Notes:
-iregex : case insensitive regex
-exec : executes a command
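The question also mentions [a-z] matching accented letters, which is typically a locale effect. A sketch that forces the C locale and processes children before parents is shown below. It assumes GNU find, uses -regex rather than -iregex so that only lowercase names are kept, and remove_non_az_dirs is a hypothetical name.

```shell
# Hypothetical helper: remove every directory below "$1" whose name is not
# made up solely of the ASCII letters a-z. Assumes GNU find.
remove_non_az_dirs() {
    # LC_ALL=C makes [a-z] a plain byte range, so letters such as β or å
    # fall outside it. -depth lists children before their parents, so find
    # never tries to descend into a directory that rm already removed.
    LC_ALL=C find "${1:-.}" -mindepth 1 -depth -type d \
        -regextype posix-extended ! -regex '.*/[a-z]+' \
        -exec rm -r -- {} +
}
```

Swapping -exec rm -r -- {} + for -print shows what would be removed.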
You can use Bash's special globbing features:
$ cd -- "$(mktemp --directory)"
$ mkdir a b 1
$ touch 0 '$' β a/c 1/a b/β
$ ls -R .
.:
'$' 0 1 a b β
./1:
a
./a:
c
./b:
β
$ shopt -s extglob globstar
$ rm -r **/!([a-z])/
$ ls -R .
.:
a b
./a:
c
./b:
You can use ls -d instead of rm -r to check which files will be deleted before going through with it.

Recursively go through directories and files in bash + use wc

I need to go recursively through directories. First argument must be directory in which I need to start from, second argument is regex which describes name of the file.
ex. ./myscript.sh directory "regex"
While script recursively goes through directories and files, it must use wc -l to count lines in the files which are described by regex.
How can I use find with -exec to do that? Or there is maybe some other way to do it? Please help.
Thanks
Yes, you can use find (note that -iname takes a shell glob pattern such as "*.txt", not a regular expression):
$ find DIR -iname "regex" -type f -exec wc -l '{}' \;
Or, if you want to count the total number of lines, in all files:
$ find DIR -iname "regex" -type f -exec wc -l '{}' \; | awk '{ SUM += $1 } END { print SUM }'
Your script would then look like:
#!/bin/bash
# $1 - name of the directory - first argument
# $2 - regex - second argument
if [ $# -lt 2 ]; then
    # Print the usage message to stderr and signal failure to the caller.
    echo 'Usage: ./myscript.sh DIR "REGEX"' >&2
    exit 1
fi
find "$1" -iname "$2" -type f -exec wc -l '{}' \;
Edit: if you need more fancy regular expressions, use -regextype posix-extended and -regex instead of -iname as noted by @sudo_O in his answer.
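A sketch of the -regex variant that prints one total. Because -regex is matched against the whole path, the file name regex is wrapped as .*/(REGEX) here; that wrapping, the function name count_lines, and the use of cat piped to wc -l are assumptions for illustration, and GNU find is assumed.

```shell
# Hypothetical helper: total number of lines in all files below "$1" whose
# base name matches the extended regex "$2". Assumes GNU find.
count_lines() {
    local dir=$1 regex=$2
    # -regex must match the entire path, hence the leading ".*/".
    find "$dir" -type f -regextype posix-extended \
        -regex ".*/($regex)" -exec cat -- {} + | wc -l
}
```

For example, count_lines src '[a-z]+\.txt' counts the lines of every lowercase-named .txt file under src.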

How can I exclude directories matching certain patterns from the output of the Linux 'find' command?

I want to use regex's with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
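An alternative to filtering paths after the fact is to stop find from descending into the unwanted directories at all with -prune. The sketch below is one way to write that; list_sources is a hypothetical name, and the directory and extension lists are taken from the question.

```shell
# Hypothetical helper: print, NUL-delimited, every .c, .cpp and .h file below
# "$1", never descending into directories named "generated" or "deploy".
list_sources() {
    find "${1:-.}" \
        \( -type d \( -name generated -o -name deploy \) -prune \) -o \
        \( -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -print0 \)
}
```

Usage: list_sources . | xargs -0 ls -1Ld. On a gargantuan tree, pruning also saves the time spent walking the excluded directories.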
grep's -Z (also known as --null) only changes the character that grep prints after a file name in its own output (for example with -l) to a null character. What you wanted was -z (also known as --null-data), which makes grep treat both its input and its output as lines terminated by a null character instead of a newline. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited (-z alone is enough for that; the extra -Z is harmless) and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
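One way to sidestep the deployment.cpp surprise while staying with grep is to match whole path components, slashes included. A small sketch, assuming GNU grep (exclude_dirs is a hypothetical name):

```shell
# Hypothetical filter: drop NUL-delimited paths that pass through a directory
# named exactly "generated" or "deploy". Assumes GNU grep for -z.
exclude_dirs() {
    # The surrounding slashes make the pattern match complete directory
    # names only, so ./src/deployment.cpp is kept.
    grep -zvE '/(generated|deploy)/'
}
```

It slots into the pipeline as find ... -print0 | exclude_dirs | xargs -0 ls -1Ld.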
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
    mkdir -p "${file%/*}"
    touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
fails because you are trying to use POSIX extended regular expressions, which don't support lookahead or lookbehind. https://superuser.com/a/596499/658319
GNU find does not offer a PCRE -regextype either, so the exclusion has to be expressed another way, for example with separate -not -path tests.