find using regex, in terminal - regex

Using the command below:
find ./stylesheets/sass/ -maxdepth 1 -type f -regextype posix-egrep -regex '[^_].+\.scss'
I get this:
./stylesheets/sass/_hero.scss
./stylesheets/sass/_article-type.scss
./stylesheets/sass/app.scss
./stylesheets/sass/_footer.scss
./stylesheets/sass/_grid-items.scss
and I wanted just:
./stylesheets/sass/app.scss
so my negation [^_] of the underscore does not work because it tries to match at
the very beginning of the path, I suppose, and not the filename itself.
How to solve this, keeping it flexible enough for any depth of dirs before
the actual filename?

Just do it with -name:
find ... -name '[^_]*.scss'
Alternatively, if you insist on using -regex, make sure your not-underscore is only in the last component of the pathname:
find ... -regex '.*/[^/_][^/]*\.scss$'
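Both variants can be checked against a throwaway tree. The sketch below assumes GNU find (for -regextype) and a scratch directory created with mktemp; the variable names (tmp, by_name, by_regex) are illustrative, and the file names mirror the ones in the question.

```shell
#!/bin/sh
# Build a scratch tree resembling the one in the question (paths are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/stylesheets/sass"
touch "$tmp/stylesheets/sass/_hero.scss" \
      "$tmp/stylesheets/sass/_footer.scss" \
      "$tmp/stylesheets/sass/app.scss"

# -name tests only the last path component, so the bracket applies to the file name.
# POSIX spells glob negation [!_]; GNU find also accepts [^_].
by_name=$(find "$tmp/stylesheets/sass" -maxdepth 1 -type f -name '[!_]*.scss')

# -regex must match the whole path, so the class is placed right after the last slash.
by_regex=$(find "$tmp/stylesheets/sass" -maxdepth 1 -type f \
    -regextype posix-egrep -regex '.*/[^/_][^/]*\.scss')

echo "$by_name"
echo "$by_regex"
rm -rf "$tmp"
```

In both cases only app.scss should be listed, regardless of how many directories precede the file name.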

Related

Linux find-command with regex and write in a file working strange

So I have a folder with .wav files that look like this:
afr_(4 digits)_(random digits).wav
I want to create a list with all files that start with for example afr_0184_ and afr_1919_ and I use this linux command line
find /directory/ -name "arf_0184_*.wav" -o -name "afr_1919_*.wav" > train.list
For some reason the list only has afr_1919_ files in it as if it overwrites the afr_0184_ that are found before them.
I also tried this
find /directory/ -name 'afr_(0184|1919)_*.wav' > train.list
but the list is empty in this case.
What am I doing wrong here?
You can use
find /directory/ -regextype posix-egrep -regex '.*/afr_(0184|1919)_[0-9]*\.wav' > train.list
To match any 4 digit variations instead of the two 'hardcoded' values, use
'.*/afr_[0-9]{4}_[0-9]*\.wav'
With -name you can only search with a glob (wildcard) pattern; for alternation you need -regex.
With -regextype posix-egrep, you can use alternation and unescaped capturing parentheses. [0-9]* matches zero or more digits (replace * with + to match one or more).
The .*/ is added because the regex should match the whole path string.
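The difference between the glob and regex forms can be seen on a few dummy files. This is a minimal sketch assuming GNU find; the scratch directory, the file names and the variable names are made up for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
touch "$tmp/afr_0184_001.wav" "$tmp/afr_1919_002.wav" "$tmp/afr_2222_003.wav"

# -name takes a glob, so (0184|1919) is matched literally and nothing is found.
glob_hits=$(find "$tmp" -name 'afr_(0184|1919)_*.wav' | wc -l)

# -regex with posix-egrep understands alternation; .*/ absorbs the directory part.
regex_hits=$(find "$tmp" -regextype posix-egrep \
    -regex '.*/afr_(0184|1919)_[0-9]*\.wav' | sort)

# The glob-only equivalent needs one -name per prefix, joined with -o.
# The escaped parentheses keep the grouping explicit if more tests are added.
glob_or_hits=$(find "$tmp" \( -name 'afr_0184_*.wav' -o -name 'afr_1919_*.wav' \) | sort)

echo "$regex_hits"
rm -rf "$tmp"
```

The regex form and the two-glob form should list the same two files, while the single glob with parentheses finds none.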

Why can't `find` actually find all of the directories matching a pattern?

I have directories matching the pattern foo[0-9]+ and foo-bar. I want to remove all directories matching the former pattern. I want to do this using find, but when I try to find directories matching the former pattern, I can't get them to show up:
$ mkdir foo{1..15} foo-bar
$ # yields nothing
$ find . -name "foo[0-9]+"
When I try to find everything that matches foo[^-], only some of the directories appear:
$ find . -name "foo[^-]"
./foo9
./foo7
./foo6
./foo1
./foo8
./foo4
./foo3
./foo2
./foo5
I've played with the -regex flag and all available -regextypes, but can't seem to get the magic right.
How can I list all of these directories?
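This should work with BSD/macOS find, where -E enables extended regular expressions (GNU find uses -regextype posix-extended instead):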
find -E . -regex '.*/foo[0-9]+'
You might want to limit the type: find -E . -type d -regex '.*/foo[0-9]+'
This works:
$ ls -F
foo-bar/ foo10/ foo12/ foo14/ foo2/ foo4/ foo6/ foo8/
foo1/ foo11/ foo13/ foo15/ foo3/ foo5/ foo7/ foo9/
$ find . -name "foo[^-]*"
./foo1
./foo2
./foo3
./foo4
./foo5
./foo6
./foo7
./foo8
./foo9
./foo10
./foo11
./foo12
./foo13
./foo14
./foo15
Alternatively, if your goal is to list all directories that don't match foo-bar then you can simply use the -not operator:
$ find . -not -name foo-bar
.
./foo1
./foo2
./foo3
./foo4
./foo5
./foo6
./foo7
./foo8
./foo9
./foo10
./foo11
./foo12
./foo13
./foo14
./foo15
By the way, you were using file globbing and not regexes when you weren't using the -regex flag.
To find the files using globbing:
find . -name "foo[1-9]" -o -name "foo1[0-5]" -o -name "foo-bar"
There we match any files with name "foo" followed by "single digit between 1 and 9", or files named foo1 followed by "single digit between 0 and 5", or files named exactly "foo-bar".
Or if you know the directory won't have any numbered files aside from the ones you created:
find . -name "foo[1-9]*" -o -name "foo-bar"
Here we find all files named "foo" followed by one digit, followed by any number of any characters, or the file named exactly foo-bar. Globbing is not as precise as regexes, but it's often sufficient and it's pretty quick.
The * and ? in globbing are different from those in regexes. In globbing, they themselves stand for unknown characters in the string being matched, as well as how many of them there are. In regexes, they modify the previous atom and express how many times that atom may occur.
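To make the glob versus regex distinction concrete, the sketch below recreates the directories from the question in a scratch location and counts the matches for each pattern. It assumes GNU find (for -regextype posix-extended); the variable names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)
i=1
while [ "$i" -le 15 ]; do mkdir "$tmp/foo$i"; i=$((i + 1)); done
mkdir "$tmp/foo-bar"

# Glob: [0-9] is one digit and + is a literal plus sign, so nothing matches.
glob_plus=$(find "$tmp" -type d -name 'foo[0-9]+' | wc -l)

# Glob: * on its own stands for any run of characters.
glob_star=$(find "$tmp" -type d -name 'foo[0-9]*' | wc -l)

# Regex: + repeats the preceding [0-9]; the pattern must cover the whole path.
regex_plus=$(find "$tmp" -type d -regextype posix-extended \
    -regex '.*/foo[0-9]+' | wc -l)

echo "$glob_plus $glob_star $regex_plus"
rm -rf "$tmp"
```

With this layout the first count should be 0 and the other two should both be 15, since foo-bar is excluded by the digit class in each case.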

Use find to identify filename same as the parent directory name

I would like to use find in order to search for files in different subdirectories that have to match the same pattern as their parent directory.
example:
ls
Random1_fa Random2_fa Random3_fa
Inside these dirs there are different files, and I want to find only one in each:
cd Random1_fa
Random1.fa
Random1.fastq
Random1_match_genome.fa
Random1_unmatch_genome.fa
...
I want to "find" only the files with "filename".fa e.g:
/foo/bar/1_Random1/Random1_fa/Random1.fa
/foo/bar/2_Random2/Random2_fa/Random2.fa
/foo/bar/3_Random5/Random5_fa/Random5.fa
/foo/bar/10_Random99/Random99_fa/Random99.fa
I did:
ls | sed 's/_fa//' |find -name "*.fa"
but not what I was looking for.
I want to redirect the result of sed as a regex pattern in find.
Something "awk-like" this:
ls| sed 's/_fa//' |find -name "$1.fa"
or
ls| sed 's/_fa/.fa/' |find -name "$1"
Why pipe the output of sed into find at all? find does not read patterns from standard input, and you can express the condition directly. First use a shell glob to expand all directories ending in _fa, then derive the file name to look for from each directory name and pass it to find. All you need to do is:
for dir in ./*_fa; do
    # If nothing matches, the glob stays unexpanded as the literal './*_fa'.
    # That string fails the directory test (-d), so we skip the iteration.
    [ -d "$dir" ] || continue
    # The glob expansion yields './name_fa'. Using the built-in parameter
    # expansion we strip the leading './' and the trailing '_fa', leaving 'name'.
    str="${dir##./}"
    regex="${str%%_fa}"
    # We then use 'find' to locate the file 'name.fa' inside that directory
    find "$dir" -type f -name "${regex}.fa"
done
Run this loop from the top-level directory that contains your *_fa directories to match all the files.
To copy each file elsewhere, replace the find line with the following:
find "$dir" -type f -name "${regex}.fa" -exec cp -t /home/destinationPath {} +

find that excludes directories that are number dash number

I'm trying to write a find command that excludes directories that are numbers dash number, but allow other directories.
Sample directories
./135888897-135954433/
./135888897-135954434/
./135888897-135954435/
./BLAG-DEF-JOB1/
./TOM-DEPLOYDEV-JOB1/
./FRANK-RELEASE-JOB1/
./STEVE-RELEASE-JOB1/
Here's part of my find command. I can't seem to get it to skip the number directories.
find . -type f ! -regex '\./[0-9]+\-[0-9]+/*'
Any help would be great. Thanks!
You should use .* instead of *.
In a regex, * means 'match the preceding token 0 or more times', so /* only matches zero or more slashes and not the rest of the path.
This will result in the following command:
find . -type f ! -regex '\./[0-9]+\-[0-9]+/.*'
Update: I also thought you forgot to escape the / in your command, but after doing a little bit of research it seems escaping / is not necessary when using the find command.
You can use:
find . -type f ! -regex '\./[0-9]+-[0-9]+/.*'
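The filter can be tried on a small scratch tree. The sketch below assumes GNU find with its default regex syntax and uses made-up file names. The second command, with -prune, is an alternative that does not appear in the answers above: it stops find from descending into the numeric directories at all instead of rejecting their files one by one.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/135888897-135954433" "$tmp/BLAG-DEF-JOB1"
touch "$tmp/135888897-135954433/a.log" "$tmp/BLAG-DEF-JOB1/b.log"

# Filter only: find still walks the numeric directories but rejects their files.
filtered=$(cd "$tmp" && find . -type f ! -regex '\./[0-9]+-[0-9]+/.*')

# Prune: skip the numeric directories entirely, then print the remaining files.
pruned=$(cd "$tmp" && find . -type d -regex '\./[0-9]+-[0-9]+' -prune -o -type f -print)

echo "$filtered"
echo "$pruned"
rm -rf "$tmp"
```

Both commands should list only ./BLAG-DEF-JOB1/b.log. Note that the patterns start with \./ because find is run from inside the directory with . as the starting point.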

How to ignore digits

I have a file location
/appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_25918.log
I know the variable for the year, month, and day; how do I ignore the rest of the string?
For example,
/appl/bcm_prod/u/scratch/markit/markitdownloader_%Y%m%d_25918.log
What do I put for the 25918 id, which can change every day?
What is the regex flavour?
There are examples below:
find /appl/bcm_prod/u/scratch/markit/ -regex ".*/markitdownloader_20160420_[0-9][0-9]*\.log" -type f
find /appl/bcm_prod/u/scratch/markit/ -type f | grep "markitdownloader_20160420_[0-9][0-9]*\.log"
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_*.log
ls /appl/bcm_prod/u/scratch/markit/markitdownloader_20160420_+([[:digit:]]).log
Whether you write * or .* depends on the pattern language. In a shell glob, * on its own is the wildcard and matches any run of characters, so you put the date where needed and then *.log. In a regex, * only means "zero or more of the preceding atom", so the wildcard is . and you write .* followed by \.log. In bash, logname_date_*.log is what you are looking for.
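Since the date part is known and only the trailing id varies, the pattern can be built from the output of date. The sketch below is one way to do that, assuming GNU find; it uses a scratch directory and dummy log files in place of the real path, and the variable names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)
today=$(date +%Y%m%d)
touch "$tmp/markitdownloader_${today}_25918.log" \
      "$tmp/markitdownloader_19990101_11111.log"

# Glob form: * covers the changing numeric id.
glob_hit=$(find "$tmp" -type f -name "markitdownloader_${today}_*.log")

# Regex form: [0-9]+ requires at least one digit, and \. is a literal dot.
regex_hit=$(find "$tmp" -type f -regextype posix-extended \
    -regex ".*/markitdownloader_${today}_[0-9]+\.log")

echo "$glob_hit"
echo "$regex_hit"
rm -rf "$tmp"
```

Both commands should print only the file carrying today's date. The regex form is stricter, because the glob would also accept non-numeric text between the date and .log.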