Finding file names without a specified character - regex

Is there a good regex to find all of the files that do not contain a certain character? I know there are plenty for finding lines that contain a match, but I want something that will find all files that do not contain my match.

Using ls and sed to replace all filenames with no extension (i.e. not containing a .) with NoExtension:
ls | sed -e 's/^[^.]*$/NoExtension/g'
replacing filenames that have an extension with their extension:
ls | sed -e 's/^[^.]*$/NoExtension/g' -e 's/.*\.\(.*\)/\1/'

For bash, to list all files in a directory that do not have an extension:
shopt -s extglob
ls !(*.*)
The extglob setting is required to enable the !(pattern) syntax, which negates the *.* glob passed to ls.
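If you would rather not involve ls at all (see the next answer for why), the same extglob pattern works with printf; a minimal sketch, with nullglob added so the literal pattern is not printed when nothing matches:
shopt -s extglob nullglob
printf '%s\n' !(*.*)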

You should discard all the answers that parse the output of ls; read here for why. The find tool is perfect for this.
# Show files in cwd
$ ls
file file.txt
# Find the files with an extension
$ find -type f -regex '.*/.*\..*$'
./file.txt
# Invert the match using the -not option
$ find -type f -not -regex '.*/.*\..*$'
./file

And an awk solution, for good measure; note that it prints a count of the files without an extension rather than listing them.
ls | awk '$0 !~ /\..+$/{a++}END{print a}'
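If you want the names instead of the count, drop the counter and let awk's default print action do the work (the same caveat about parsing ls applies):
ls | awk '$0 !~ /\..+$/'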

This might work for you (find, GNU sed & wc):
find . -type f | sed -rn '\|.*/\.?[^.]+$|w NoExtensions' && wc -l NoExtensions
This gives you a count and a list.
N.B. dot files without extensions are included.


Is there a globbing pattern to match by file extension, both PWD and recursively?

I need to match files with only one specific extension under all nested directories, including the PWD, with BASH using "globbing".
I do not need to "Match all files under all nested directories with shell globbing, but not in the PWD".
I need to match files using commands other than grep (cf. "grep search all directories with filename extension").
I do not need to "grep recursively, but only in files with certain extensions" (plural).
shopt -s globstar; ls **/*.* is great for all files (not my question).
ls **/*.php does not match in the PWD.
shopt -s globstar; ls **/*.php returns duplicate files.
grep -r --include=\*.php "find me" ./ is specifically for grep, not globbing (consider this Question). It seems grep has --include=GLOB because this is not possible using globbing.
From this Answer (here), I believe there may not be a way to do this using globbing.
tl;dr
I need:
A glob expression
To work with any command where simple globs can be used (ls, sed, cp, cat, chown, rm, et cetera)
Mainly in BASH, but other shells would be interesting
Both in the PWD and all subdirectories recursively
For files with a specific extension
I'm using grep & ls only as examples, but I need a glob expression that applies to other commands also.
grep -r --include=GLOB is not a glob expression for, say, cp; it is a workaround specific to grep and is not a solution.
find is not a glob, but it may be a workaround for non-grep commands if there is no such glob expression. It would need a pipe or a while loop, et cetera.
Examples
Suppose I have these files, all containing "find me":
./file1.js
./file2.php
./inc/file3.js
./inc/file4.php
./inc.php/file5.js
./inc.php/file6.php
I need to match only (and all of) the .php files, each exactly once:
./file2.php
./inc/file4.php
./inc.php/file6.php
Duplicates returned: shopt -s globstar; ... **/*.php
This changes the problem; it does not solve it.
Dup: ls
Before entering shopt -s globstar as a single command...
ls **/*.php returns:
inc/file4.php
inc.php/file5.js
inc.php/file6.php
file2.php is not returned.
After entering shopt -s globstar as a single command...
ls **/*.php returns:
file2.php
inc/file4.php
inc.php/file6.php
inc.php:
file5.js
file6.php
inc.php/file6.php is returned twice.
Dup: grep
Before entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
inc/file4.php: find me
inc.php/file6.php: find me
file2.php is not returned.
After entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
file2.php: find me
inc/file4.php: find me
inc.php/file5.js: find me
inc.php/file6.php: find me
inc.php/file6.php: find me
inc.php/file6.php is returned twice.
Having seen the duplicate in the ls output above, we know why.
Current solution: faulty misuse of && logic
grep -r "find me" *.php && grep -r "find me" */*.php
ls -l *.php && ls -l */*.php
Please no! I fail here && so I never happen
Desired solution: single command via globbing
grep -r "find me" [GLOB]
ls -l [GLOB]
Insight from grep
grep does have the --include flag, which achieves the same result but using a flag specific to grep. ls does not have an --include option. This leads me to believe that there is no such glob expression, which is why grep has this flag.
With bash, you can first do a shopt -s globstar to enable recursive matching, and then the pattern **/*.php will expand to all the files in the current directory tree that have a .php extension.
zsh and ksh93 also support this syntax. Other commands that take a glob pattern as an argument and do their own expansion of it (like your grep --include) likely won't.
With shell globbing it is possible to get only directories by adding a / at the end of the glob, but there is no way to exclusively get files (zsh being an exception).
Illustration:
With the given tree:
file.php
inc.php/include.php
lib/lib.php
Supposing that the shell supports the non-standard ** glob:
**/*.php/ expands to inc.php/
**/*.php expands to file.php inc.php inc.php/include.php lib/lib.php
For getting file.php inc.php/include.php lib/lib.php, you cannot use a glob.
=> with zsh it would be **/*.php(.)
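For completeness, against the tree above that zsh invocation would be:
ls -ld **/*.php(.)
where the (.) glob qualifier restricts the match to plain files, so it expands to file.php inc.php/include.php lib/lib.php and skips the inc.php directory.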
Standard work-around (any shell, any OS)
The POSIX way to recursively get the files that match a given standard glob and then apply a command to them is to use find -type f -name ... -exec ...:
ls -l <all .php files> would be:
find . -type f -name '*.php' -exec ls -l {} +
grep "finde me" <all .php files> would be:
find . -type f -name '*.php' -exec grep "finde me" {} +
cp <all .php files> ~/destination/ would be:
find . -type f -name '*.php' -exec sh -c 'cp "$@" ~/destination/' _ {} +
remark: This one is a little more tricky because you need ~/destination/ to come after the file arguments, and find's syntax doesn't allow find -exec ... {} ~/destination/ +
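If GNU cp is available, its -t (--target-directory) option takes the destination first, which sidesteps the ordering problem entirely; a sketch under that assumption:
find . -type f -name '*.php' -exec cp -t ~/destination/ {} +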
Suggesting a different strategy:
Use an explicit find command to build bash command(s) on the selected files using the -printf option.
Inspect the commands for correctness, then run them.
1. prepare bash commands on the selected files
find . -type f -name "*.php" -printf "cp %p ~/destination/ \n"
2. inspect the output, correct the command or the filter, test
cp ./file2.php ~/destination/
cp ./inc/file4.php ~/destination/
cp ./inc.php/file6.php ~/destination/
3. execute the prepared find output
bash <<< $(find . -type f -name "*.php" -printf "cp %p ~/destination/ \n")
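Filenames containing spaces would split these generated commands apart; quoting the path inside the printf format is a cheap guard (still assuming no single quotes in the names):
find . -type f -name "*.php" -printf "cp '%p' ~/destination/ \n" | bash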

Hiding all directories that begin with a capitalized letter using `ls` in zsh

I'm trying to use ls in zsh (running on macOS) to show all files and directories except directories that begin with a capital letter.
For example, my directory contains
Archive/ data/ README.md test.txt
and I would like to run an ls command that returns only
data/ README.md test.txt
I can use ls -d [A-Z]*/ (note the trailing slash to indicate directories) to show the directories I want to hide (e.g. it returns only Archive/),
and referencing this helpful answer on using the inverted expansion in zsh with the ls *~ syntax,
I tried (what I think is) the negation of the above using ls -d *~[A-Z]*/ but this doesn't work (it hides nothing).
Moreover, using ls -d *~[A-Z]* (without the trailing slash) returns data/ test.txt, but this is not my desired result since I also want to show the file README.md, which begins with a capital letter.
Note that I have enabled the extended glob option in zsh, using setopt extendedglob.
Any help on the correct regex/glob syntax for ls in zsh to obtain my desired output would be very much appreciated. Thank you! :)
Edit: There are two very useful answers that work, but any concise answers using ls in zsh (using the extended glob option) would still be awesome!
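For what it's worth, one concise candidate (a sketch, untested: it assumes setopt extendedglob for the ^ negation and relies on zsh's glob qualifiers, (.) for plain files and (/) for directories):
ls -dF *(.) ^[A-Z]*(/)
The first glob picks up every plain file, including README.md; the second picks up only the directories that do not begin with a capital letter; -F appends the trailing / to directories.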
Maybe a bit verbose, but you might use ls -l, remove the total line with grep -v "^total", and then pipe the output to awk.
In awk, print the last field followed by / if the first field starts with d and the last field does not start with an uppercase character A-Z.
Otherwise, print the last field if the first field does not start with d:
ls -l | grep -v "^total" | awk '{
if ($1 ~ /^d/ && $NF ~ /^[^A-Z]/){
print $NF"/"
} else if ($1 ~ /^[^d]/){
print $NF
}
}'
In a single line:
ls -l | grep -v "^total" | awk '{if($1~/^d/ && $NF~/^[^A-Z]/){print $NF"/"} else if($1~/^[^d]/){print $NF}}'
My version of ls does not directly support regular expressions; I think you are depending upon shell globbing. Perhaps try piping to grep, something like:
ls | grep -v '^[A-Z].*'
Notice that the regular expression is in quotes. The ^ anchors the match at the beginning of the string. The -v switch inverts the match.
Sorry, now I understand your requirements better. You could string together a find command to do this. Try the following:
find -E . -regex "\./[^A-Z].*" -type d -exec echo Directory: {} ';' -exec find {} -type f -maxdepth 1 \;
I tested this on a Mac with the following hierarchy:
find .
.
./Test
./Test/Plain File
./aatest
./aatest/BigTest
./aatest/BigTest/Big File
./aatest/BigTest/small File
./aatest/bb File
./aatest/aaFile
And got the following output:
Directory: ./aatest
./aatest/bb File
./aatest/aaFile
Directory: ./aatest/BigTest
./aatest/BigTest/Big File
./aatest/BigTest/small File
Please note that I added the text "Directory" to the echo command to differentiate the two types of output.

Regexp for matching filenames

I have these files:
first.error.log
second1.log
second2.log
FFFpc.log
TR.den.log
bla.error.log
and I would like to make a pattern that will match all files with error in the filename, plus a few additional ones, but no more:
For a sole error it would be
$FILE_PATTERN="*.error*"
But what if I want to match not only those errors but also all second and FFpc etc?
This does not work:
$FILE_PATTERN="*.error*|^second.*\log$|.*FFPC\.log$"
Thanks in advance for your help
EDIT:
$FILE_PATTERN is later used by:
find /somefolder -type f -name $FILE_PATTERN
EDIT: This FILE_PATTERN lives in a property file that is later used by a bash script.
You need to use find with the -regex option. Note that -regex matches against the whole path, hence the leading .*/:
find -E /somefolder -type f -regex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
PS: Use -iregex for case-insensitive matching:
find -E /somefolder -type f -iregex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
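The -E flag here is the BSD/macOS spelling for extended regular expressions; on GNU find the equivalent is the -regextype option (shown as a sketch, verify on your system):
find /somefolder -regextype posix-extended -type f -iregex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'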
$ ls | grep -i '\(.*error.*\)\|\(^second.*\.log$\)\|\(.*FFPC\.log$\)'
bla.error.log
FFFpc.log
first.error.log
second1.log
second2.log
If you wanted to use it with find, note that find prints full paths, so the ^ anchor before second will no longer match; anchor on the directory separator instead:
find /somefolder -type f | grep -i '\(.*error.*\)\|\(/second[^/]*\.log$\)\|\(.*FFPC\.log$\)'
If you're in bash, I'm assuming you have grep available. Using grep -E or egrep will allow you to use alternation (ORing your searches):
$ stat * | egrep "(error|second)"
File: `first.error.log'
File: `second1.log'
File: `second2.log'
You could use ls instead of stat, but sometimes ls will not give you what you predicted. Since you're only searching for filenames, though, ls should suffice.
$ ls | egrep "(error|second)"
first.error.log
second1.log
second2.log
You can use command substitution to store the output into a bash variable:
FILE_PATTERN=$(ls | egrep "(error|second)")
FILE_PATTERN=("*.error*" "second.*log" ".*FFPC.log")
ARGS=(-name "$FILE_PATTERN")
for F in "${FILE_PATTERN[#]:2}"; do
ARGS+=(-o -name "$F")
done
find /somefolder -type f '(' "${ARGS[#]}" ')'
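For illustration, with the patterns above the generated command is equivalent to:
find /somefolder -type f '(' -name "*.error*" -o -name "second.*log" -o -name ".*FFPC.log" ')'
Bear in mind that -name takes globs, not regexes, so the dots here are matched literally and the *s do the real work.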
You were close, there are just a few misplaced symbols.
Here's what I came up with:
.*\.error\..*|^second.*\.log$|.*FF[Pp][Cc]\.log$
here's a demo of a working modification of your regex:
http://regex101.com/r/rL3rM1/1

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(the text I want to use as filename is between html title tags)
Where I wrote $sed would be the output of sed.
hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
  mv "$file" `sed -n 's/<title>\([^<]*\)<\/title>/\1/p;' "$file" | sed -e 's/[ ][ ]*/_/g'`.txt
done
So, if you have files 1.txt, 2.txt and 3.txt, each with cat, dog and my hippo in their TITLE tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted initial $file in case there are spaces in filenames; and added a second sed to convert any spaces in the <title> tag to _'s in resulting filenames. NOTE the whitespace inside the []'s in the second sed command is a literal space and tab character.
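If two files happen to share a title, the second rename would silently overwrite the first; mv's -n (no-clobber) flag, supported by GNU and BSD mv alike, is a cheap guard. The same loop with that tweak, $(...) instead of backticks, and a POSIX character class for the whitespace:
for file in *.txt; do
  mv -n "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed 's/[[:space:]][[:space:]]*/_/g')".txt
done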
You can enclose expression in grave accent characters (`) to make it insert its output to the place you want. Try:
mv *.txt `sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt`.txt
It is rather not flexible, but should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
  sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
done
The sed substitution searches for the pattern in each one of your .txt files. For each file it creates the string mv 'file_name' 'found_pattern'.
With the e command at the end of sed commands, this resulting string is directly executed in terminal, thus it renames your files.
Some hints:
Note the use of = instead of / as the delimiter for the sed substitution: it's more readable since you already have /s in your pattern (you could use many other symbols if you don't like =), and this way you don't have to escape the /s in your pattern.
The e command for sed executes the created string.
(I'm speaking of the e at the very end of:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you were to add the e.
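The dry run would look like this, identical apart from the dropped e flag:
for i in *.txt ; do
  sed "s=<title>\(.*\)</title>=mv '$i' '\1'=" $i
done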
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So, then you need to loop through the files, e.g. with xargs or find
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always write the xargs substitute as -i\{\} because the resulting command stays compatible when I use it with find and its substitute {}.
Next, the -maxdepth option keeps find from diving deeper into subdirectories; if there are no subdirs, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if the opening and closing tags are on the same text line) would be grep
A more solid solution is to use an HTML parser and navigate by DOM operations
The simple solution is based on:
get the title line
remove everything before and after the title content
Putting it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
Hopefully it is what you expected.

BASH - find specific folder with find and filter with regex

I have a folder containing many folders with subfolders, with the following structure:
_30_photos/combined
_30_photos/singles
_47_foo.bar
_47_foo.bar/combined
_47_foo.bar/singles
_50_foobar
With the command find . -type d -print | grep '_[0-9]*_' all folders matching _[0-9]*_ will be shown. But I have written a regex which should capture only the */combined folders:
_[0-9]*_[a-z.]+/combined, but when I insert it into the find command, nothing is printed.
The next step would be to create, for each combined folder, a new folder (somewhere on my hdd) and copy the content of the combined folder into it. The new folder name should be the same as the parent name of the subfolder, e.g. _47_foo.bar. Could that be achieved with an xargs command after the search?
You do not need grep:
find . -type d -regex ".*_[0-9]*_.*/combined"
For the rest:
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 cp -r
With basic grep you will need to escape the +:
... | grep '_[0-9]*_[a-z.]\+/combined'
Or you can use the "extended regexp" version (egrep or grep -E [thanks chepner]) in which the + does not have to be escaped.
xargs may not be the most flexible way of doing the copying you describe above, as it is tricky to use with multiple commands. You may find more flexibility with a while loop:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read combined_dir; do
  mkdir some_new_dir
  cp -r "${combined_dir}" some_new_dir/
done
Have a look at bash string manipulation if you want a way to automate the name of some_new_dir.
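For instance, parameter expansion can strip the /combined suffix and take the basename of what is left (a sketch; /somewhere stands in for your real target directory):
... | grep '_[0-9]*_[a-z.]\+/combined' | while read -r combined_dir; do
  parent="${combined_dir%/combined}"   # e.g. ./_47_foo.bar
  new_dir="/somewhere/${parent##*/}"   # e.g. /somewhere/_47_foo.bar
  mkdir -p "$new_dir"
  cp -r "$combined_dir"/. "$new_dir"/
done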
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" | \
(while read s; do
n=$(dirname "$s")
cp -pr "$s" "$target_dir/${n#./}"
done
)
NOTE:
this fails if you have linebreaks "\n" in your directory names
this uses a subshell to not clutter your env - inside a script you don't need that
changed the regex slightly: [0-9]* to [0-9]+
You can use this command:
find . -type d | grep -P "_[0-9]*_[a-z.]+/combined"