find that excludes directories that are number dash number - regex

I'm trying to write a find command that excludes directories named number-dash-number, but allows other directories.
Sample directories:
./135888897-135954433/
./135888897-135954434/
./135888897-135954435/
./BLAG-DEF-JOB1/
./TOM-DEPLOYDEV-JOB1/
./FRANK-RELEASE-JOB1/
./STEVE-RELEASE-JOB1/
Here's part of my find command. I can't seem to get it to skip the number directories.
find . -type f ! -regex '\./[0-9]+\-[0-9]+/*'
Any help would be great. Thanks!

You should use .* instead of *.
In a regex, * means 'match the preceding token 0 or more times', so /* only matches a run of slashes, while /.* matches the slash and everything after it.
This results in the following command:
find . -type f ! -regex '\./[0-9]+\-[0-9]+/.*'
Update: I also thought you forgot to escape the / in your command, but after doing a little bit of research it seems escaping / is not necessary when using the find command.

You can use:
find . -type f ! -regex '\./[0-9]+-[0-9]+/.*'
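To see why the trailing .* matters, here is a small sketch that rebuilds two of the sample directories in a scratch directory and runs both variants. It assumes GNU find with its default regex flavour; the file name build.log is made up for illustration.

```shell
# Scratch layout: one numbered directory, one job directory (file names are illustrative).
tmp=$(mktemp -d)
mkdir "$tmp/135888897-135954433" "$tmp/BLAG-DEF-JOB1"
touch "$tmp/135888897-135954433/build.log" "$tmp/BLAG-DEF-JOB1/build.log"

# '/*' matches only slashes, so the regex never covers a whole file path
# and nothing is excluded.
broken=$(cd "$tmp" && find . -type f ! -regex '\./[0-9]+-[0-9]+/*' | LC_ALL=C sort)

# '/.*' covers the rest of the path, so files under the numbered directory drop out.
fixed=$(cd "$tmp" && find . -type f ! -regex '\./[0-9]+-[0-9]+/.*' | LC_ALL=C sort)

printf 'with /*:\n%s\nwith /.*:\n%s\n' "$broken" "$fixed"
```

The key point is that -regex has to match the entire path, not just a prefix of it.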

Related

Linux find-command with regex and write in a file working strange

So I have a folder with .wav files that look like this:
afr_(4 digits)_(random digits).wav
I want to create a list with all files that start with for example afr_0184_ and afr_1919_ and I use this linux command line
find /directory/ -name "arf_0184_*.wav" -o -name "afr_1919_*.wav" > train.list
For some reason the list only has afr_1919_ files in it as if it overwrites the afr_0184_ that are found before them.
I also tried this
find /directory/ -name 'afr_(0184|1919)_*.wav' > train.list
but the list is empty in this case.
What am I doing wrong here?
You can use
find /directory/ -regextype posix-egrep -regex '.*/afr_(0184|1919)_[0-9]*\.wav' > train.list
To match any 4 digit variations instead of the two 'hardcoded' values, use
'.*/afr_[0-9]{4}_[0-9]*\.wav'
-name only accepts a glob (wildcard) pattern, so for alternation you need -regex.
With -regextype posix-egrep, you can use alternation and unescaped capturing parentheses. [0-9]* matches zero or more digits (replace * with + to match one or more).
The .*/ is added because the regex must match the whole path string.
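The difference between the glob and the regex can be checked with a rough sketch like the one below. It assumes GNU find (for -regextype) and uses made-up file names in a scratch directory; the grouped -o form is shown only as a glob-based alternative.

```shell
# Scratch directory with three illustrative .wav files.
tmp=$(mktemp -d)
touch "$tmp/afr_0184_1.wav" "$tmp/afr_1919_22.wav" "$tmp/afr_2000_3.wav"

# -name takes a glob: '(', '|' and ')' are literal characters here, so nothing matches.
glob_hits=$(find "$tmp" -name 'afr_(0184|1919)_*.wav' | LC_ALL=C sort)

# posix-egrep understands alternation; the leading '.*/' lets the regex cover the whole path.
regex_hits=$(find "$tmp" -regextype posix-egrep -regex '.*/afr_(0184|1919)_[0-9]+\.wav' | LC_ALL=C sort)

# Glob-only alternative: two -name tests joined with -o and grouped with \( \).
or_hits=$(find "$tmp" \( -name 'afr_0184_*.wav' -o -name 'afr_1919_*.wav' \) | LC_ALL=C sort)

printf '%s\n' "$regex_hits" > "$tmp/train.list"
```

Grouping the -o branches with \( \) keeps the alternative intact if further tests such as -type f are added later.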

Why can't `find` actually find all of the directories matching a pattern?

I have directories matching the pattern foo[0-9]+ and foo-bar. I want to remove all directories matching the former pattern. I'd like to do this with find, but when I try to find directories matching the former pattern, nothing comes back:
$ mkdir foo{1..15} foo-bar
$ # yields nothing
$ find . -name "foo[0-9]+"
When I try to find everything that matches foo[^-], only some of the directories appear:
$ find . -name "foo[^-]"
./foo9
./foo7
./foo6
./foo1
./foo8
./foo4
./foo3
./foo2
./foo5
I've played with the -regex flag and all available -regextypes, but can't seem to get the magic right.
How can I list all of these directories?
This should work:
find -E . -regex '.*/foo[0-9]+'
You might want to limit the type: find -E . -type d -regex '.*/foo[0-9]+'
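Note that -E is the BSD/macOS spelling. On GNU find, the closest equivalent is -regextype posix-extended; a minimal sketch, assuming GNU find and a scratch directory with a few of the foo directories:

```shell
# Scratch directory with a subset of the directories from the question.
tmp=$(mktemp -d)
(cd "$tmp" && mkdir foo1 foo2 foo10 foo15 foo-bar)

# GNU find: -regextype posix-extended plays the role of BSD find's -E,
# so '+' means "one or more" without a backslash.
numbered=$(cd "$tmp" && find . -type d -regextype posix-extended -regex '.*/foo[0-9]+' | LC_ALL=C sort)

printf '%s\n' "$numbered"
```

foo-bar is left out because the regex must match the whole path and "-bar" is not a run of digits.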
This works:
$ ls -F
foo-bar/ foo10/ foo12/ foo14/ foo2/ foo4/ foo6/ foo8/
foo1/ foo11/ foo13/ foo15/ foo3/ foo5/ foo7/ foo9/
$ find . -name "foo[^-]*"
./foo1
./foo2
./foo3
./foo4
./foo5
./foo6
./foo7
./foo8
./foo9
./foo10
./foo11
./foo12
./foo13
./foo14
./foo15
Alternatively, if your goal is to list all directories that don't match foo-bar then you can simply use the -not operator:
$ find . -not -name foo-bar
.
./foo1
./foo2
./foo3
./foo4
./foo5
./foo6
./foo7
./foo8
./foo9
./foo10
./foo11
./foo12
./foo13
./foo14
./foo15
By the way, you were using file globbing and not regexes when you weren't using the -regex flag.
To find the files using globbing:
find . -name "foo[1-9]" -o -name "foo1[0-5]" -o -name "foo-bar"
There we match any files with name "foo" followed by "single digit between 1 and 9", or files named foo1 followed by "single digit between 0 and 5", or files named exactly "foo-bar".
Or if you know the directory won't have any numbered files aside from the ones you created:
find . -name "foo[1-9]*" -o -name "foo-bar"
Here we find all files named "foo" followed by one digit, followed by any number of any characters, or the file named exactly foo-bar. Globbing is not as precise as regexes, but it's often sufficient and it's pretty quick.
The * and ? in globbing are different from those in regexes. In globbing, they themselves stand for unknown characters in the string being matched, as well as how many of them there are. In regexes, they modify the previous atom in the regex and express the quantity of that previous atom.
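A quick way to see the two meanings of * side by side is to run the same names through a glob (via case) and a regex (via grep -E). This is only an illustration; the helper names glob_match and regex_match are made up.

```shell
# Glob: 'foo[0-9]*' means "foo, exactly one digit, then any characters".
glob_match() {
    case "$1" in
        foo[0-9]*) echo yes ;;
        *)         echo no ;;
    esac
}

# Regex: '^foo[0-9]*$' means "foo, then zero or more digits, and nothing else".
regex_match() {
    if printf '%s\n' "$1" | grep -Eq '^foo[0-9]*$'; then
        echo yes
    else
        echo no
    fi
}

for name in foo foo1 foo12 foo1x; do
    printf '%-6s glob=%s regex=%s\n' "$name" "$(glob_match "$name")" "$(regex_match "$name")"
done
```

The names foo and foo1x are the interesting cases: the glob rejects the first and accepts the second, while the regex does the opposite.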

How to ignore digits

I have a file location
/appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_25918.log
I know the variables for the year, month, and day. How do I ignore the rest of the string?
For example,
/appl/bcm_prod/u/scratch/markit/markitdownloader_%Y%m%d_25918.log
What do I put for the 25918 ID, which can change every day?
What is the regex flavour?
There are examples below:
find /appl/bcm_prod/u/scratch/markit/ -regex ".*/markitdownloader_20160420_[0-9][0-9]*\.log" -type f
find /appl/bcm_prod/u/scratch/markit/ -type f | grep "markitdownloader_20160420_[0-9][0-9]*\.log"
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_*.log
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_+([[:digit:]]).log
Depending on the pattern syntax, you would use either * or .*. Put the date in as needed, then follow it with *.log (shell glob) or .*\.log (regex). In a glob, * is itself the wildcard; in a regex, * means zero or more of the preceding item, and . is the wildcard for a single character. In bash, logname_date_*.log is what you are looking for.
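Since the date part is known and only the numeric ID varies, the date can be passed in and the ID left to a glob. A minimal sketch follows; the function name find_log is hypothetical, and the glob [0-9]* only requires that the ID starts with a digit.

```shell
# find_log DIR YYYYMMDD
# Hypothetical helper: prints log files for the given date, whatever the trailing ID is.
# '[0-9]*' is a glob here: one digit followed by any characters.
find_log() {
    find "$1" -type f -name "markitdownloader_${2}_[0-9]*.log"
}

# Example call for today's date (path taken from the question):
#   find_log /appl/bcm_prod/u/scratch/markit "$(date +%Y%m%d)"
```

Using -name rather than -regex avoids having to match the directory part of the path.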

How can I use regular expression to search files in Unix?

I have following files from 2 different categories :
Category 1 :
MAA
MAB
MAC
MAD
MAE
MAF
MAG
MAH
MAJ
MBA
MBB
MBC
MBD
MBE
MDA
MDD
and Category 2 :
MCA
MCB
MCC
MCD
MCE
MCF
MCG
MDB
So my question is: how can I write a regular expression so that I find files from category 1 only?
I don't want a hard-coded script; I'm expecting some logic from brilliant people.
I am trying this :
find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
It's quite simple :
ls -l | grep "MAA\|MAB\|MAC\|MAD\|MAE\|MAF\|MAG\|MAH\|MAJ\|MBA\|MBB\|MBC\|MBD\|MBE\|MDA\|MDD"
OK, so you don't want it hardcoded. In that case, state the patterns that should NOT match and exclude them with grep -v:
ls -l | grep -v "MC." | grep -v "pattern2" | ....
Your question is not very precise, but from your attempt I conclude that you are looking for files with names ending in ....MAA.txt, ...MAB.txt and so on, located either in your working directory or somewhere below it.
You also didn't mention which shell you are using. Here is an example using zsh; there is no need to write a regular expression here:
ls ./**/*M{AA,AB,AC,AD,AE,AF,AG,AH,AJ,BA,BB,BC,BD,BE,DA,DD}.txt
I am trying this : find . -regex "*[M][A,B,D][A,B,C,D,E,F,J].txt"
The errors in this are:
The wildcard for any characters in a regex is .*, unlike just * in a normal filename pattern.
You forgot G and H in the third bracket expression.
You didn't exclude the category 2 name MDB.
Besides:
The characters of a bracket expression are not to be separated by ,.
A bracket expression with a single item ([M]) can be replaced by just the item (M).
This leads to:
find . -regex ".*M[ABD].*" -not -name "MDB*"
or, without regex:
find . -name "M[ABD]*" -not -name "MDB*"
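The glob version can be tried on a handful of names from both categories. This sketch assumes the files carry a .txt suffix, as in the question's attempt, and uses the portable ! in place of -not.

```shell
# Scratch directory with a few names from each category (the .txt suffix is an assumption).
tmp=$(mktemp -d)
touch "$tmp/MAA.txt" "$tmp/MBE.txt" "$tmp/MDA.txt" "$tmp/MDB.txt" "$tmp/MCA.txt"

# Keep names starting with MA, MB or MD, then drop the one category 2 name, MDB.
category1=$(cd "$tmp" && find . -type f -name 'M[ABD]*' ! -name 'MDB*' | LC_ALL=C sort)

printf '%s\n' "$category1"
```

MCA.txt is rejected by the bracket expression, and MDB.txt by the negated second test.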

find using regex, in terminal

Using the command below:
find ./stylesheets/sass/ -maxdepth 1 -type f -regextype posix-egrep -regex '[^_].+\.scss'
I get this:
./stylesheets/sass/_hero.scss
./stylesheets/sass/_article-type.scss
./stylesheets/sass/app.scss
./stylesheets/sass/_footer.scss
./stylesheets/sass/_grid-items.scss
and I wanted just:
./stylesheets/sass/app.scss
so my negation [^_] of the underscore does not work because it tries to match at
the very beginning of the path, I suppose, and not the filename itself.
How to solve this, keeping it flexible enough for any depth of dirs before
the actual filename?
Just do it with -name:
find ... -name '[^_]*.scss'
Alternatively, if you insist on using -regex, make sure your not-underscore is only in the last component of the pathname:
find ... -regex '.*/[^/_][^/]*\.scss$'
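Both forms can be compared on a small tree with partials at two depths. This is a sketch assuming GNU find with its default regex flavour; the directory and file names are illustrative, and the glob uses [!_], the POSIX spelling of the negated bracket.

```shell
# Scratch tree with underscore-prefixed partials at two depths (names are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/stylesheets/sass/partials"
touch "$tmp/stylesheets/sass/app.scss" \
      "$tmp/stylesheets/sass/_hero.scss" \
      "$tmp/stylesheets/sass/partials/_grid.scss" \
      "$tmp/stylesheets/sass/partials/print.scss"

# -name only looks at the last path component, so depth does not matter.
by_name=$(cd "$tmp" && find ./stylesheets/sass -type f -name '[!_]*.scss' | LC_ALL=C sort)

# -regex sees the whole path: '.*/' consumes the directories, and '[^/_][^/]*'
# pins the non-underscore character to the start of the file name.
by_regex=$(cd "$tmp" && find ./stylesheets/sass -type f -regex '.*/[^/_][^/]*\.scss$' | LC_ALL=C sort)

printf '%s\n' "$by_name"
```

Excluding / inside the brackets is what stops the regex from matching an underscore-prefixed file through one of its parent directories.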