ls and regular expression linux - regex

I have two directories:
run.2016-02-25_01.
run.2016-02-25_01.47.04
Both these directories are present under a common directory called gte.
I want to list only the directory whose name does not end with a dot character (.).
I am using the following command, however, I am not able to make it work:
ls run* | grep '.*\d+'
The command does not find anything.

The negated character set in shell globbing uses ! not ^:
ls -d run*[!.]
(The ^ was at one time an archaic synonym for |.) The -d option lists directory names, not the contents of those directories.
Your attempt using:
ls run* | grep '.*\d+'
requires a PCRE-enabled grep and the PCRE regex option (-P) for \d to mean a digit. Even then, it looks for zero or more of any character followed by one or more digits, which isn't what you said you wanted. You could use:
ls -d run* | grep '[^.]$'
which doesn't require the PCRE regexes, but simply having the shell glob the right names is probably best.
If you're worried that there might not be a name starting run and ending with something other than a dot, you should consider shopt -s nullglob, as mentioned in Anubhava's answer. However, note the discussion below between hek2mgl and myself about the potentially confusing behaviour of, in particular, the ls command in conjunction with shopt -s nullglob. If you were using:
for name in run*[!.]
do
…
done
then shopt -s nullglob is perfect; the loop iterates zero times when there's no match for the glob expression. It isn't so good when the glob expression is an argument to commands such as ls that provide a default behaviour in the absence of command line arguments.
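A minimal sketch of the loop above, assuming bash. The scratch directory and the nope* pattern are made up for the demonstration; the two directory names are the ones from the question.

```bash
# Scratch directory holding the two names from the question.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir run.2016-02-25_01. run.2016-02-25_01.47.04

shopt -s nullglob          # an unmatched glob now expands to nothing

# [!.] matches any single character except a dot, so only the
# name that ends in a digit is selected.
matched=()
for name in run*[!.]; do
    matched+=("$name")
done
printf 'matched: %s\n' "${matched[@]}"

# Nothing starts with "nope" (hypothetical pattern), so with nullglob
# the loop body never runs.
count=0
for name in nope*[!.]; do
    count=$((count + 1))
done
printf 'iterations for nope*[!.]: %d\n' "$count"

cd / && rm -rf "$demo"
```

Without nullglob, the second loop would run once with name set to the literal string nope*[!.], which is the case the option is meant to avoid.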

You don't need grep. Just use:
shopt -s nullglob
ls -d run*[0-9]
If your directories are not always ending with digits then use extglob:
shopt -s nullglob extglob
ls -d run*+([^.])
or, to list all entries inside the run* directories whose names do not end with a dot:
printf "%s\n" run*+([^.])/*

This works...
ls|grep '.*[^.]$'
That is saying I want any amount of anything but I want the last character before the line ending to be anything except for a period.

To list the directories that don't end with a dot (.):
ls -d run* |grep "[^.]$"

I would use find
find . -maxdepth 1 -type d -regextype posix-awk -regex '.*[[:digit:]]+$'

Related

why the ls -R (recursing down) doesn't work with regular expression

In my case, the directory tree is following
[peter@CentOS6 a]$ tree
.
├── 2.txt
└── b
    └── 1.txt
1 directory, 2 files
Why do the following two commands return only 2.txt?
[peter@CentOS6 a]$ ls -R *.txt
2.txt
[peter@CentOS6 a]$ ls -R | grep *.txt
2.txt
In both cases, your shell is expanding *.txt into 2.txt before the argument hits the command. So, you are in effect running
ls -R 2.txt
ls -R | grep 2.txt
You can't tell ls to look for a file pattern - that's what find is for. In the second case, you should quote your expression and use a proper regex:
ls -R | grep '\.txt'
You can use find as follows to list all matching files in current and sub directories
find . -name "*.txt"
It isn't clear whether you are asking "why" in the sense of "explain the output" or "how should it be done". Steephen has already answered the latter; this is an answer to the former.
The reason for that is called "shell expansion". When you type *.txt in the command line, the program doesn't get it as a parameter, but rather the shell expands it and then passes the results.
*.txt expands to "all files in the current directory whose names end with '.txt', preceded by any number of characters, and do not start with '.'".
This means that when you type "ls -R *.txt" the command that actually executes is "ls -R 2.txt"; and when you do "ls -R | grep *.txt" it actually executes "ls -R | grep 2.txt".
This is the exact reason why Steephen has put quotation marks around the wildcard in the answer provided. It is necessary to stop this expansion. In fact you could also do so with single quotes or by placing a backslash before any special character. Thus any of the following will work:
find . -name "*.txt"
or
find . -name '*.txt'
or
find . -name \*.txt
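To see the expansion happen, here is a small sketch that rebuilds the tree from the question in a scratch directory (the directory itself is an assumption of the demo) and compares the unquoted and quoted forms.

```bash
# Rebuild the layout from the question: 2.txt at the top, 1.txt inside b/.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir b
touch 2.txt b/1.txt

# Unquoted: the shell replaces *.txt with the matching names in the
# current directory before any command starts.
set -- *.txt
unquoted="$*"
echo "unquoted *.txt becomes: $unquoted"

# Quoted: the pattern reaches find untouched, and find applies it to
# every file name it visits, including those inside b/.
found=$(find . -name '*.txt' | LC_ALL=C sort)
echo "$found"

cd / && rm -rf "$demo"
```

The first echo shows only 2.txt, which is all that ls -R or grep ever received; the find output lists both files.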
The other problem that nobody has mentioned yet is that, beyond the fact that the shell intercepts the * before grep sees it, the shell treats * differently from grep.
The shell uses file globbing, and * means "any number of characters".
grep uses regular expressions, and * means "any number of the preceding item".
What you need to do is
ls -R | grep .\*\\.txt
which will
escape the * so your shell does not intercept it
properly format the regular expression the way grep expects
properly escape the . in .txt to ensure that you have file extensions
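A short sketch of the difference, using a made-up list of names on standard input so no files are needed. The trailing $ anchor is an addition that is not in the command above; it keeps names such as notes.txt.bak from matching.

```bash
# Hypothetical names, one per line.
names='2.txt
1.txt
notes.text
txt'

# Regex: ".*" is "any characters", "\." is a literal dot, and "$"
# anchors the match at the end of the line.
regex_hits=$(printf '%s\n' "$names" | grep '.*\.txt$')
echo "$regex_hits"

# Glob, for comparison: case patterns follow the shell's rules, where
# * on its own means "any run of characters".
glob_hits=''
for n in $names; do
    case $n in
        *.txt) glob_hits="${glob_hits:+$glob_hits }$n" ;;
    esac
done
echo "$glob_hits"
```

Both approaches pick out 2.txt and 1.txt here, but they get there with different pattern languages.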

How to call grep on pattern files?

I'm trying to grep over files whose names match a regexp. But the following:
#!/bin/bash
grep -o -e "[a-zA-Z]\{1,\}" $1 -h --include=$2 -R
works only in some cases. When I call the script like this:
./script.sh dir1/ [A-La-l]+
it doesn't work. But the following:
./script.sh dir1/ \*.txt
works fine. I have also tried passing arguments within double-quotes and quotes but neither worked for me.
Do you have any ideas how to solve this problem?
grep's --include option does not accept a regex but a glob (such as *.txt), which is considerably less powerful. You will have to decide whether you want to match regexes or globs: *.txt does not mean the same thing when read as a regex (the equivalent regex is .*\.txt), while [A-La-l]+ does not mean the same thing when read as a glob.
If you want to do it with regexes, you will not be able to do it with grep alone. One thing you could do is to leave the file selection to a different tool such as find:
find "$1" -type f -regex "$2" -exec grep -o -e '[a-zA-Z]\{1,\}' -h '{}' +
This will construct and run a command of the form grep -o -e '[a-zA-Z]\{1,\}' -h followed by the list of files under $1 that match the regex $2. Note that find matches -regex against the whole path, not just the file name. If you replace the + with \;, it will run the command for each file individually, which should yield the same results (very) slightly more slowly.
If you want to do it with globs, you can carry on as before; your code already does that. You should put double quotes around $1 and $2, though.
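Here is a sketch of the find-based variant on throwaway files. It assumes GNU find: -regextype is a GNU extension, used here so that + can be written unescaped. The file names abc and xyz and their contents are invented for the demo, and the leading .*/ is there because -regex has to match the whole path.

```bash
# Scratch tree standing in for dir1/ from the question.
demo=$(mktemp -d)
mkdir -p "$demo/dir1"
printf 'hello 123 world\n' > "$demo/dir1/abc"   # name matches [A-La-l]+
printf 'skipped\n'         > "$demo/dir1/xyz"   # name does not match

# find selects the files by regex; grep then extracts the words.
words=$(find "$demo/dir1" -type f -regextype posix-extended \
            -regex '.*/[A-La-l]+' \
            -exec grep -o -h -e '[a-zA-Z]\{1,\}' '{}' +)
echo "$words"

rm -rf "$demo"
```

Only the words from abc are printed, since xyz is never handed to grep.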

Regex doesn't match the file path in bash

I've spent quite some time trying to figure out why this regex doesn't match files with names like:
/var/tmp/app.0.attachments
....
/var/tmp/app.11.attachments
sudo rm -rf /var/tmp/app/\.([0-9]{1}|1[0-1]{1})/\.attachments
$: -bash: syntax error near unexpected token `('
I've tried escaping [, ], | and {}
Please help.
Try
sudo rm -rf /var/tmp/app.{0..11}.attachments
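Brace expansion is a bash feature that simply generates text; it does not look at the filesystem. A quick sketch that shows what the command line above turns into, without deleting anything:

```bash
# Let the shell expand the braces and capture the resulting words.
set -- /var/tmp/app.{0..11}.attachments
total=$#
first=$1
last=${12}

echo "generated $total names"
echo "first: $first"
echo "last:  $last"
```

Because the names are generated rather than matched, some of them may not exist; with -f, rm does not complain about those.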
Regexes do not work at the shell. Shells do globbing, which is simpler and not as powerful. With the default globbing, the best you can do is something like:
sudo rm -rf /var/tmp/app.[0-9]*.attachments
If you enable extended globbing, you can add pipes and grouping to the toolset.
shopt -s extglob
sudo rm -rf /var/tmp/app.@([0-9]|1[0-1]).attachments
Note the different syntax. It's not regex, but it's similar. From the bash(1) man page:
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
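A small sketch of the "exactly one of the given patterns" operator, @(pattern-list), run against throwaway files in a scratch directory. The extra names app.12 and app.x are assumptions added to show what gets excluded. Note that shopt -s extglob has to be on its own line before the pattern is parsed.

```bash
shopt -s extglob nullglob

demo=$(mktemp -d)
cd "$demo" || exit 1
touch app.0.attachments app.7.attachments app.11.attachments \
      app.12.attachments app.x.attachments

# @(a|b) matches exactly one alternative: a single digit, or 10 / 11.
set -- app.@([0-9]|1[0-1]).attachments
selected="$*"
echo "$selected"

cd / && rm -rf "$demo"
```

Only the names numbered 0, 7 and 11 are selected; app.12.attachments and app.x.attachments are left alone.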
Another alternative would be to use find, which can do both globbing and regexes.
sudo find /var/tmp -regex '/var/tmp/app\.\([0-9]\|1[0-1]\)\.attachments' -delete
sudo find /var/tmp -regex '/var/tmp/app\.\([0-9]\|1[0-1]\)\.attachments' -exec rm -rf {} +
Note that it performs a match on the entire path, not just the file name. With find's default regex syntax you also have to write the grouping and alternation operators as \(, \), and \|.
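Before deleting anything, it may be worth doing a dry run with -print in place of -delete. A sketch, assuming GNU find (whose default regex syntax accepts \| for alternation) and a scratch directory instead of /var/tmp:

```bash
demo=$(mktemp -d)
mkdir "$demo/app.0.attachments" "$demo/app.11.attachments" "$demo/app.12.attachments"

# The pattern must cover the whole path, hence the leading ".*/".
# -print only lists the matches; nothing is removed by find here.
hits=$(find "$demo" -regex '.*/app\.\([0-9]\|1[0-1]\)\.attachments' -print | LC_ALL=C sort)
echo "$hits"
hit_count=$(printf '%s\n' "$hits" | grep -c .)

rm -rf "$demo"
```

The listing should show the 0 and 11 directories but not app.12.attachments.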

grep filename[asterisk] returns unexpected result

I have a basic question about ls command.
Suppose in a directory I have 4 files named
run
run1
running
run.sh
So, if I do ls -l|grep run* then I get no result.
But if I do ls -l|grep run.* then I get run.sh as a result.
However, I expected grep to list all of the files in both cases.
Could you help me understand what is going on behind the scenes?
This is because the asterisk is special to the shell and gets expanded. To avoid this, you have to quote the regex for grep to see it unexpanded:
ls -l|grep 'run*'
And note that this is not what you want, because 'run*' as a regexp means 'ru followed by any number of n'. This will also list files named rubber and so on. To list files that match a shell glob pattern (which is different from a regexp), why not simply use
ls -l run*
ls -l run.*
and avoid the useless grep process entirely?
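The difference between the two readings of run* can be checked without any files. In this sketch the names rubber and ruin are added to the four from the question purely for illustration.

```bash
# Six sample names; word splitting turns them into one name per line.
names='run run1 running run.sh rubber ruin'

# As a regex, run* is "ru" followed by zero or more "n", and it may
# match anywhere in the line.
regex_hits=$(printf '%s\n' $names | grep -c 'run*')
echo "regex run* matches $regex_hits of 6 names"

# As a glob, run* is "run" followed by anything, and it must match
# the whole name.
glob_hits=0
for n in $names; do
    case $n in
        run*) glob_hits=$((glob_hits + 1)) ;;
    esac
done
echo "glob run* matches $glob_hits of 6 names"
```

The regex accepts all six names, including rubber and ruin; the glob accepts only the four that start with run.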
As far as I understand, the "*" is expanded by the shell before the command itself is executed, so your grep receives the list of matching file names instead of the pattern! On the other hand, grep expects a regular expression, so the "*" is not interpreted as you expect.
The direct solution would be:
$ ls -l run*
Or, if you want to use grep, then escape the "*" and provide a regular expression:
$ ls -l|grep run.\*
$ ls -l|grep 'run.*'
Before the shell even runs grep, it searches through your command for any unquoted file globbing characters, and performs filename expansion on those arguments.
So when you enter this command:
ls -l | grep run*
the shell uses the pattern run* to search for files in the current directory, and finds run, run1, running and run.sh. It then rewrites the grep command with those arguments:
ls -l | grep run run1 running run.sh
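which causes grep to search run1, running and run.sh for the string run. Because grep has been given files to read, it ignores the output of ls on standard input, so nothing is printed unless one of those files happens to contain the string run.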
As noted, the solution is to quote the argument to grep so the shell does not try to perform filename expansion on it:
ls -l | grep 'run.*'
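To check what grep actually receives in each case, the expansion can be captured with set --. A sketch using the four file names from the question in a scratch directory:

```bash
demo=$(mktemp -d)
cd "$demo" || exit 1
touch run run1 running run.sh

# Unquoted run*: four words, so grep would get "run" as its pattern
# and the other three names as files to search.
set -- run*
star_count=$#
star_first=$1
echo "run* expands to $star_count words, first is: $star_first"

# Unquoted run.*: only run.sh matches, so grep would get "run.sh" as
# its regex, which is why that one line showed up in the question.
set -- run.*
dot_star="$*"
echo "run.* expands to: $dot_star"

# Quoted: grep gets the regex itself and filters the output of ls.
quoted_count=$(ls | grep -c 'run.*')
echo "quoted 'run.*' matches $quoted_count lines"

cd / && rm -rf "$demo"
```

The order of the expanded names after the first one depends on the locale, so the sketch only reports the count and the first word.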

What's the most compact version of "match everything but these strings" in the shell or regex?

Linux: I want to list all the files in a directory and within its subdirectories, except some strings. For that, I've been using a combination of find/grep/shell globbing. For instance, I want to list all files except those in the directories
./bin
./lib
./resources
I understand this can be done as shown in this question and this other one. But neither version solves the case "everything but this pattern" in general terms.
It seems that it is much easier to use a conditional for filtering the results, but I wonder if there is any compact and elegant way of describing this in regexp or in the shell extended globbing.
Thanks.
yourcommand | grep -E -v "pattern1|pattern2|pattern3"
Use the -prune option of find.
find . -path './bin' -prune -o -path './lib' -prune -o -path './resources' -prune -o «rest of your find params»
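A sketch of the -prune form on a made-up tree. The src directory, the file names, and the trailing -type f -print are all assumptions standing in for the rest of the expression.

```bash
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir bin lib resources src
touch bin/tool lib/libx.so resources/icon.png src/main.c README

# Each "-path X -prune" stops find from descending into X; whatever
# follows the last -o applies to everything that was not pruned.
kept=$(find . -path ./bin -prune -o -path ./lib -prune \
              -o -path ./resources -prune -o -type f -print | LC_ALL=C sort)
echo "$kept"

cd / && rm -rf "$demo"
```

Only ./README and ./src/main.c are listed. The explicit -print matters: without it, find would also print the pruned directory names themselves.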
With bash's extglob shopt setting enabled, you can exclude files with ! in your wildcard pattern. Some examples:
Everything except bin, lib, and resources
shopt -s extglob
ls -Rl !(bin|lib|resources)
Everything with an i in it, except bin and lib
ls -Rl !(bin|lib|!(*i*))
(Everything that has an i in it is the same as everything except the things that don't have i's in them.)
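Both patterns can be tried safely in a scratch directory. In this sketch the src and init directories are invented to give the patterns something to keep; shopt -s extglob sits on its own line so that it takes effect before the patterns are parsed.

```bash
shopt -s extglob

demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir bin lib resources src init

# Everything except the three excluded names.
set -- !(bin|lib|resources)
others="$*"
echo "not bin/lib/resources: $others"

# Names containing an i, except bin and lib: the inner !(*i*) adds
# every name without an i to the exclusion list.
set -- !(bin|lib|!(*i*))
with_i="$*"
echo "has an i, not bin/lib: $with_i"

cd / && rm -rf "$demo"
```

The first pattern keeps init and src; the second keeps only init, because resources and src contain no i.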