Regex doesn't match the file path in bash

I've spent quite some time trying to figure out why this regex doesn't match files with names like:
/var/tmp/app.0.attachments
....
/var/tmp/app.11.attachments
sudo rm -rf /var/tmp/app/\.([0-9]{1}|1[0-1]{1})/\.attachments
$: -bash: syntax error near unexpected token `('
I've tried escaping [, ], | and {}.
Please help.

Try
sudo rm -rf /var/tmp/app.{0..11}.attachments
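That is brace expansion, not a regex: bash expands {0..11} to the numbers 0 through 11 before rm ever runs. Since the command is destructive, you can preview the expansion harmlessly with echo first:
echo /var/tmp/app.{0..11}.attachments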

Regexes do not work at the shell. Shells do globbing, which is simpler and not as powerful. With the default globbing, the best you can do is something like:
sudo rm -rf /var/tmp/app.[0-9]*.attachments
If you enable extended globbing, you can add pipes and grouping to the toolset.
shopt -s extglob
sudo rm -rf /var/tmp/app.@([0-9]|1[0-1]).attachments
Note the different syntax. It's not regex, but it's similar. From the bash(1) man page:
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
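Again, because rm -rf is unforgiving, it may be worth previewing what the extglob pattern expands to before deleting anything; a minimal sketch using echo:
shopt -s extglob
echo /var/tmp/app.@([0-9]|1[0-1]).attachments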
Another alternative would be to use find, which can do both globbing and regexes.
sudo find /var/tmp -regex '/var/tmp/app\.\([0-9]\|1[0-1]\)\.attachments' -delete
sudo find /var/tmp -regex '/var/tmp/app\.\([0-9]\|1[0-1]\)\.attachments' -exec rm -rf {} +
Note that -regex matches against the entire path, not just the file name. With find's default regex syntax you also have to write the grouping and alternation operators as \(, \), and \|.
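If the backslash escapes are hard to read, GNU find also accepts -regextype posix-extended (the same option used in other answers on this page), which lets you write the grouping and alternation unescaped; a sketch assuming GNU find:
sudo find /var/tmp -regextype posix-extended -regex '/var/tmp/app\.([0-9]|1[0-1])\.attachments' -delete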

Related

How to find multiple files with different endings in Linux using regex?

Let's say that I have multiple files such as:
root.file991
root.file81
root.file77
root.file989
If I want to delete all of them, I would need to use a regex first, so I have tried:
find ./ -regex '\.\/root'
...which would find everything in root file, but how do I filter all these specific files?
You can use
find ./ -regextype posix-extended -regex '\./root\.file[0-9]+'
The regex matches the path piece by piece:
\. - the leading dot
/root\.file - the literal text /root.file
[0-9]+ - one or more digits at the end.
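Once the match list looks right, the same command can delete the files directly; as a cautious sketch, run it without -delete first to verify what matches:
find ./ -regextype posix-extended -regex '\./root\.file[0-9]+' -delete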
I'm not quite sure what you mean by "files in root file", but if I understand correctly, regular POSIX glob(7) pattern matching should be sufficient:
rm root.file[0-9]*
Depending on how complex the other file names are, you may have to build up the regex more; man find has useful help as well. Try the following:
$ find ./ -regex '\.\/root\.file[0-9].*'
# if that works to find what you are looking for, add the -delete
$ find ./ -regex '\.\/root\.file[0-9].*' -delete

How to use mksquashfs -regex?

I want to mksquashfs a chroot, and include the /cdrom dir, but exclude everything inside it. I already know how to do this with -wildcards, but I want to see if -regex has a bug. Test case:
cd $(mktemp -d)
mkdir -p cdrom cdrom2/why
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom/.*$'
The problem is that cdrom2/why was omitted! It seems to me like "/" is actually ignored there. Is this a mksquashfs bug?
This is because you don't fully understand how regexes work in Mksquashfs exclude files.
An exclude file, when wildcards are used, is basically treated as a series of wildcarded names separated by slashes (/), i.e. wildcard1/wildcard2/wildcard3 will match wildcard1 against the top-level directory, wildcard2 against the subdirectory, and so on.
Specifying -regex simply replaces wildcard matching with regex matching. It is still evaluated as regexes separated by slashes (/), i.e. regex1/regex2/regex3.
In your example the regex "^cdrom" is evaluated against the files in the top level directory, and matches both "cdrom" and "cdrom2".
If you wanted the regex to only match "cdrom" you should use
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom$/.*'
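For comparison, the -wildcards form the asker already knows would presumably be the following; wildcard components match a whole name, so cdrom does not also match cdrom2:
mksquashfs . /tmp/chroot.squashfs -info -noappend -wildcards -e 'cdrom/*'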

Expand command line exclude pattern with zsh

I'm trying to pass a complicated regex as an ignore pattern. I want to ignore all subfolders of locales/ except locales/US/en/*. I may need to fallback to using a .agignore file, but I'm trying to avoid that.
I'm using silver searcher (similar to Ack, Grep). I use zsh in my terminal.
This works really well and ignores all locale subfolders except locales/US:
ag -g "" --ignore locales/^US/ | fzf
I also want to ignore all locales/US/* except for locales/US/en
What I want is this, but it does not work:
ag -g "" --ignore locales/^US/^en | fzf
Thoughts?
Add multiple --ignore commands. For instance:
ag -g "" --ignore locales/^US/ --ignore locales/US/^en
The following can work as well:
find locales/* -maxdepth 0 -name 'US' -prune -o -exec rm -rf '{}' ';'
Man Pages Documentation
-prune
True; if the file is a directory, do not descend into it. If -depth is given, false; no effect. Because -delete implies -depth, you cannot usefully use -prune and -delete together.
-prune lets you filter directories out of your results.
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ';' is encountered. The string '{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a '\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead.
-exec lets you execute a command on any results find returns.
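Because that find command removes everything except US, it is prudent to preview the matches first; replacing the -exec action with -print (a harmless variant of the same expression) shows what would be deleted:
find locales/* -maxdepth 0 -name 'US' -prune -o -print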

ls and regular expression linux

I have two directories:
run.2016-02-25_01.
run.2016-02-25_01.47.04
Both these directories are present under a common directory called gte.
I want to match only the directory whose name does not end with a dot character (.).
I am using the following command, however, I am not able to make it work:
ls run* | grep '.*\d+'
The command is not able to find anything.
The negated character set in shell globbing uses ! not ^:
ls -d run*[!.]
(The ^ was at one time an archaic synonym for |.) The -d option lists directory names, not the contents of those directories.
Your attempt using:
ls run* | grep '.*\d+'
requires a PCRE-enabled grep and the PCRE regex option (-P), and you are looking for zero or more of any character followed by one or more digits, which isn't what you said you wanted. You could use:
ls -d run* | grep '[^.]$'
which doesn't require the PCRE regexes, but simply having the shell glob the right names is probably best.
If you're worried that there might not be a name starting with run and ending with something other than a dot, you should consider shopt -s nullglob, as mentioned in Anubhava's answer. However, note the discussion below between hek2mgl and myself about the potentially confusing behaviour of, in particular, the ls command in conjunction with shopt -s nullglob. If you were using:
for name in run*[!.]
do
…
done
then shopt -s nullglob is perfect; the loop iterates zero times when there's no match for the glob expression. It isn't so good when the glob expression is an argument to commands such as ls that provide a default behaviour in the absence of command line arguments.
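A minimal sketch of that pitfall, assuming no name matches the glob: with nullglob set, an unmatched pattern is removed from the command line entirely, so ls runs with no file arguments and lists the current directory instead of reporting an error.
shopt -s nullglob
ls -d run*[!.]   # no matches: this runs plain "ls -d", which just prints "."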
You don't need grep. Just use:
shopt -s nullglob
ls -d run*[0-9]
If your directories are not always ending with digits then use extglob:
shopt -s nullglob extglob
ls -d run*+([^.])
or, to list all entries inside the run* directories whose names end without a dot:
printf "%s\n" run*+([^.])/*
This works...
ls | grep '.*[^.]$'
That says: match any amount of anything, but require the last character before the end of the line to be anything except a period.
To list the directories that don't end with a dot:
ls -d run* |grep "[^.]$"
I would use find
find -regextype posix-awk -maxdepth 1 -type d -regex '.*[[:digit:]]+$'

How to Recursively Remove Files of a Certain Type

I misread the gzip documentation, and now I have to remove a ton of ".gz" files from many directories inside one another. I tried using 'find' to locate all .gz files. However, whenever there's a file with a space in the name, rm interprets that as another file. And whenever there's a dash, rm interprets that as a new flag. I decided to use 'sed' to replace the spaces with "\ " and the space-dashes with "\ -", and here's what I came up with.
find . -type f -name '*.gz' | sed -r 's/\ /\\ /g' | sed -r 's/\ -/ \\-/g'
When I run the find/sed query on a file that, for example, has a name of "Test - File - for - show.gz", I get the output
./Test\ \-\ File\ \-\ for\ \-\ show.gz
Which appears to be acceptable for rm, but when I run
rm $(find . -type f -name '*.gz'...)
I get
rm: cannot remove './Test\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
rm: cannot remove 'File\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
...
I haven't made extensive use of sed, so I have to assume I'm doing something wrong with the regular expressions. If you know what I'm doing wrong, or if you have a better solution, please tell me.
Adding backslashes before spaces protects the spaces against expansion in shell source code. But the output of a command in a command substitution does not undergo shell parsing, it only undergoes wildcard expansion and field splitting. Adding backslashes before spaces doesn't protect them against field splitting.
Adding backslashes before dashes is completely useless since it's rm that interprets dashes as special, and it doesn't interpret backslashes as special.
The output of find is ambiguous in general — file names can contain newlines, so you can't use a newline as a file name separator. Parsing the output of find is usually broken unless you're dealing with file names in a known, restricted character set, and it's often not the simplest method anyway.
find has a built-in way to execute external programs: the -exec action. There's no parsing going on, so this isn't subject to any problem with special characters in file names. (A path beginning with - could still be interpreted as an option, but all paths begin with . since that's the directory being traversed.)
find . -type f -name '*.gz' -exec rm {} +
Many find implementations (Linux, Cygwin, BSD) can delete files without invoking an external utility:
find . -type f -name '*.gz' -delete
See Why does my shell script choke on whitespace or other special characters? for more information on writing robust shell scripts.
There is no need to pipe to sed, etc. Instead, you can make use of the -exec flag on find, which allows you to execute a command on each of the results.
For example, for your case this would work:
find . -type f -name '*.gz' -exec rm {} \;
which is approximately the same as:
find . -type f -name '*.gz' -exec rm {} +
The second form does not run a separate rm process for each result; it passes many file names to a single rm invocation, which makes it faster.
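You can see the difference by substituting echo for rm, which prints the commands instead of running them (nothing is deleted):
find . -type f -name '*.gz' -exec echo rm {} \;   # one rm line per file
find . -type f -name '*.gz' -exec echo rm {} +    # one rm line with many files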
From man find:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until an
argument consisting of `;' is encountered. The string `{}' is
replaced by the current file name being processed everywhere it occurs
in the arguments to the command, not just in arguments where it is
alone, as in some versions of find. Both of these constructions
might need to be escaped (with a `\') or quoted to protect them from
expansion by the shell. See the EXAMPLES section for examples of the
use of the -exec option. The specified command is run once for
each matched file. The command is executed in the starting directory.
There are unavoidable security problems surrounding use of the -exec
action; you should use the -execdir option instead.