find all files except e.g. *.xml files in shell - regex

Using bash, how can I find files in a directory structure except for *.xml files?
I'm trying to use
find . -regex ....
with the regex:
'.*^((?!xml).)*$'
but it doesn't give the expected results.
Is there another way to achieve this, i.e. without regexp matching?

find . ! -name "*.xml" -type f

find . -not -name '*.xml'
Should do the trick.
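A quick way to check the negated -name test is to run it against a throwaway tree. The sketch below assumes a temporary directory and made-up file names (a.xml, b.txt, sub/c.xml, sub/d.sh); only the find expression itself comes from the answers above.

```shell
# Build a scratch tree; every file name here is illustrative.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.xml" "$tmp/b.txt" "$tmp/sub/c.xml" "$tmp/sub/d.sh"

# -type f keeps regular files only; ! -name '*.xml' drops the XML ones.
non_xml=$(cd "$tmp" && find . -type f ! -name '*.xml' | sort)
printf '%s\n' "$non_xml"
```

Without -type f, the same expression also lists directories such as . and ./sub, since their names do not end in .xml either.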

Sloppier than the find solutions above, and it does more work than it needs to, but you could do
find . | grep -v '\.xml$'
Also, is this a tree of source code? Maybe you have all your source code and some XML in a tree, but you want to only get the source code? If you were using ack, you could do:
ack -f --noxml

with bash:
shopt -s extglob globstar nullglob
for f in **/!(*.xml); do
[[ -d $f ]] && continue
# do stuff with $f
done
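The extglob negation can be tried on plain strings before pointing it at a real tree. This is a small sketch with hypothetical names (notes.txt, build.xml, archive.xml.bak), using [[ ... == pattern ]] instead of a glob so that nothing on disk is touched.

```shell
shopt -s extglob

kept=()
for name in notes.txt build.xml archive.xml.bak; do
# !(*.xml) matches any name that does not, as a whole, match *.xml
if [[ $name == !(*.xml) ]]; then
kept+=("$name")
fi
done
printf '%s\n' "${kept[@]}"
```

The negation has to wrap the whole pattern: something like *!(.xml) would match every name, because * can consume the entire name and !(.xml) then matches the empty remainder.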

You can also do it with or-ring as follows:
find . -type f -name "*.xml" -o -type f -print

Try something like this for a regex solution:
find . -regextype posix-extended -not -regex '^.*\.xml$'

Related

Using regex OR with find to list and delete files

I have a folder with these files:
sample.jpg
sample.ods
sample.txt
sample.xlsx
Now, I need to find and remove files that end with either .ods or .xlsx.
To fish them out I initially use:
ls | grep -E "*.ods|*.xlsx"
This gives me:
sample.ods
sample.xlsx
Now, I don't want to parse ls so I use find:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | wc -l
But that gives me the output of 1 while I expect to have 2 files before I extend the command to:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | xargs -d"\n" rm
Which works but removes only the .ods file but not the .xlsx one.
What am I missing here?
I'm on ubuntu 18.04 and my find version is find (GNU findutils) 4.7.0-git.
You don't need to use regex here, just use -name and -o, grouping the two tests so that -type f and -delete apply to both:
find . -type f \( -name "*.ods" -o -name "*.xlsx" \) -delete
This finds files ending in either .ods or .xlsx and deletes them.
If you really wanted to use regex, you could use the following:
find . -maxdepth 1 -regextype posix-extended -regex "(.*\.ods)|(.*\.xlsx)" -delete
Note that each alternative is wrapped in parentheses and that the regex has to match the whole path.
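In find, the implicit "and" between tests binds tighter than -o, so an action placed after an ungrouped -o chain applies only to the last alternative. A minimal sketch of a preview-then-delete run, assuming a scratch directory holding the four sample files from the question (-delete is a GNU/BSD extension, not POSIX):

```shell
tmp=$(mktemp -d)
touch "$tmp/sample.jpg" "$tmp/sample.ods" "$tmp/sample.txt" "$tmp/sample.xlsx"

# Preview first: the parentheses make -type f and -print apply to both names.
matches=$(find "$tmp" -type f \( -name '*.ods' -o -name '*.xlsx' \) -print | sort)
printf '%s\n' "$matches"

# Same expression with -delete in place of -print.
find "$tmp" -type f \( -name '*.ods' -o -name '*.xlsx' \) -delete
remaining=$(ls "$tmp")
printf '%s\n' "$remaining"
```

Running the -print form before swapping in -delete is a cheap way to confirm that the expression selects exactly the files you expect.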

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?
With any POSIX-compliant find
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after paths to make this work.
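The three -name tests can be checked against a scratch tree. The sketch below assumes a temporary directory and a handful of made-up names, some taken from the question's examples plus a nine-character name and a non-hex name as negative cases.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/x"
touch "$tmp/A1234567" "$tmp/12345678" "$tmp/x/ABCDABCD" "$tmp/x/09876543" \
"$tmp/ABC12DEFF" "$tmp/GHIJKLMN"

# 1) exactly eight characters, 2) no non-hex character, 3) at least one non-decimal character
hex_names=$(cd "$tmp" && find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*' | sort)
printf '%s\n' "$hex_names"
```

Note that [:xdigit:] also accepts lowercase a-f; if only uppercase names should match, [![:xdigit:]] can be swapped for [!0-9A-F].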
It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

Find and rename files/folders using regex

I am trying to find the right regex for filenames that start with I0[0-9][0-9]-, e.g. "I097-". I am not familiar with regex, but using online tools I came up with [I][0][\d][\d][-]. I am sure this is not the best pattern for the string I have, but I tested it with online regex tools and it works. Now I want to use Linux 'find' to find all the files that match this regex and rename the resulting files by replacing the matching string with nothing.
From:
I071-PTEN-7
./I071-PTEN-7/I071-PTEN-7.txt
To:
PTEN-7
./PTEN-7/PTEN-7.txt
command used:
find . -name "I0*" -type f -o -name "I0*" -type d -exec rename -n "s/[I][0][\d][\d][-]/''/" {} \;
But it doesn't seem to do anything, not sure what is going on. Any help to find the issue or solution would be greatly appreciated. Thanks.
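Use the -execdir option so that the command sees only the file name rather than the full path; there is also no need to wrap every character of your regex in a character class. GNU find passes the name to -execdir as ./name, so the pattern below anchors on either the start of the string or a slash:
find . -name 'I0*' -execdir rename -n 's/(^|\/)I0\d\d-/$1/' {} \;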
If rename isn't working then you may try this:
find . -type f -name 'I0*' -execdir bash -c 'mv "$1" "${1/I0[0-9][0-9]-/}"' - {} \; &&
find . -name 'I0*' -execdir bash -c 'mv "$1" "${1/I0[0-9][0-9]-/}"' - {} \;
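When a directory and the files inside it both carry the prefix, the order of the renames matters. Below is a minimal sketch of a single-pass variant, assuming a scratch tree shaped like the question's example; it uses -depth so that a directory's contents are renamed before the directory itself, and plain -exec with the path split by parameter expansion instead of -execdir.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/I071-PTEN-7"
touch "$tmp/I071-PTEN-7/I071-PTEN-7.txt"

# -depth: visit the contents of a directory before the directory itself
find "$tmp" -depth -name 'I0[0-9][0-9]-*' -exec bash -c '
path=$1
dir=${path%/*}    # everything up to the last slash
base=${path##*/}  # the final path component
# ${base/#pattern/} removes the pattern only at the start of the name
mv -- "$path" "$dir/${base/#I0[0-9][0-9]-/}"
' _ {} \;

find "$tmp" | sort
```

Replacing mv with echo mv gives a dry run, similar in spirit to rename -n.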

Regexp for matching filenames

I have these files:
first.error.log
second1.log
second2.log
FFFpc.log
TR.den.log
bla.error.log
and I would like to write a pattern that matches all files with error in their names, plus a few additional ones, but nothing more.
For error alone it would be
$FILE_PATTERN="*.error*"
But what if I want to match not only the error files but also all the second and FFFpc files, etc.?
This does not work:
$FILE_PATTERN="*.error*|^second.*\log$|.*FFPC\.log$"
Thanks in advance for your help
EDIT:
$FILE_PATTERN is later used by:
find /somefolder -type f -name $FILE_PATTERN
EDIT: THIS FILE_PATTERN is in property file that is later used by bash script.
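You need to use find with the -regex option. The regex is matched against the whole path, so it has to start with .*/ rather than \./ when searching /somefolder:
find -E /somefolder -type f -regex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
PS: Use -iregex for case-insensitive matching:
find -E /somefolder -type f -iregex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'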
$ ls | grep -i '\(.*error.*\)\|\(^second.*\.log$\)\|\(.*FFPC\.log$\)'
bla.error.log
FFFpc.log
first.error.log
second1.log
second2.log
If you wanted to use with find
find /somefolder -type f | grep -i '\(.*error.*\)\|\(/second[^/]*\.log$\)\|\(.*FFPC\.log$\)'
If you're in bash, I'm assuming you have grep available. Using grep -E or egrep allows you to use alternation (ORing your searches).
$ stat * | egrep "(error|second)"
File: `first.error.log'
File: `second1.log'
File: `second2.log'
You could use ls instead of stat, but sometimes ls will not give you what you expect. Considering you're only searching for filenames, though, ls should suffice.
$ ls | egrep "(error|second)"
first.error.log
second1.log
second2.log
You can use command substitution to store the output into a bash variable:
FILE_PATTERN=$(ls | egrep "(error|second)")
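# -name takes shell globs, not regexes, so each pattern is written as a glob
FILE_PATTERN=("*.error*" "second*.log" "*FFPC.log")
# Start with the first pattern, then append the rest joined by -o
ARGS=(-name "${FILE_PATTERN[0]}")
for F in "${FILE_PATTERN[@]:1}"; do
ARGS+=(-o -name "$F")
done
# Group the alternatives so that -type f applies to all of them
find /somefolder -type f '(' "${ARGS[@]}" ')'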
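The same idea can be wrapped in a small helper and tried on the file names from the question. This is only a sketch: find_any_name is a hypothetical function name, it assumes at least one glob is passed, and it uses -iname (a GNU/BSD extension, not POSIX) so that FFFpc.log matches *FFPC.log regardless of case.

```shell
# find_any_name DIR GLOB... : list regular files under DIR whose name matches any GLOB
find_any_name() {
local dir=$1; shift
local args=() pat
for pat in "$@"; do
args+=(-o -iname "$pat")
done
# "${args[@]:1}" drops the leading -o; the parentheses group the alternatives
find "$dir" -type f '(' "${args[@]:1}" ')'
}

tmp=$(mktemp -d)
touch "$tmp/first.error.log" "$tmp/second1.log" "$tmp/second2.log" \
"$tmp/FFFpc.log" "$tmp/TR.den.log" "$tmp/bla.error.log"

found=$(find_any_name "$tmp" '*.error*' 'second*.log' '*FFPC.log' | sort)
printf '%s\n' "$found"
```

Keeping the globs in an array (or in "$@") avoids the word-splitting and premature glob expansion that an unquoted $FILE_PATTERN would be subject to.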
You were close; there are just a few misplaced symbols.
Here's what I came up with:
.*\.error\..*|^second.*\.log$|.*FF[Pp][Cc]\.log$
Here's a demo of a working modification of your regex:
http://regex101.com/r/rL3rM1/1

'find' using regex with variables

Please, help me with the following:
Let it be three files:
file-aaa.sh
file-bbb.sh
file-xxx.sh
In a bash script I have variables
a=aaa
b=bbb
Now, I want to execute something like:
find . -name "file-[$a|$b].sh"
and expect to get two files in output.
What am I doing wrong?
You can use this find:
find . -name "file-$a.sh" -o -name "file-$b.sh"
To combine it into one using the -regex option:
On OSX:
find -E . -regex ".*file-($a|$b)\.sh"
On Linux:
find . -regextype posix-extended -regex ".*file-($a|$b)\.sh"
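The original attempt fails because -name takes a shell glob, and in a glob [...] is a bracket expression that matches a single character, so "file-[$a|$b].sh" looks for one character out of a, b and | rather than for aaa or bbb. A minimal sketch of the -name/-o form on a scratch directory (the temporary directory is an assumption, the file names are the ones from the question):

```shell
tmp=$(mktemp -d)
touch "$tmp/file-aaa.sh" "$tmp/file-bbb.sh" "$tmp/file-xxx.sh"
a=aaa
b=bbb

# One -name test per variable; the parentheses keep the two alternatives
# together in case further tests or actions are added later.
found=$(find "$tmp" \( -name "file-$a.sh" -o -name "file-$b.sh" \) | sort)
printf '%s\n' "$found"
```

The double quotes let the shell expand $a and $b while still stopping it from globbing the pattern before find sees it.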