Filename match and export - regex

I have several files in a folder. I have to match the filenames against a regex pattern in which one word is a variable. I want all the files matching the pattern to be copied to a separate directory under an alternate filename, with the matched string replaced.
Eg,
I have many files with filenames having the word foo in the directory like,
gadgeagfooafsa
fsafsaffooarwf
fasfsfoofsafff
I have to list these files and copy them to another directory, replacing the word foo in each name. I have specified the new string to be "kuh", so the above files should be copied to the new folder as
gadgeagkuhafsa
fsafsafkuharwf
fasfskuhfsafff
Finally, can I pipe different commands together to execute these in one line? :)
I tried this command, but it didn't work; somehow the copy is failing.
ls | grep ".*foo[} ].*" | xargs cp -t work/

find + bash solution:
find . -type f -name "*foo*" -exec bash -c 'fn=${0##*/}; cp "$0" "new_dest/${fn//foo/kuh}"' {} \;
fn=${0##*/} - extracting file basename
${fn//foo/kuh} - substituting foo with kuh in filename
Replace/adjust new_dest with your actual destination directory name.
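The two expansions used above can be tried in isolation (the path here is just an illustration):

```shell
# ${path##*/} removes the longest prefix ending in '/', i.e. the dirname part.
path="./some/dir/gadgeagfooafsa"
fn=${path##*/}
echo "$fn"                # gadgeagfooafsa

# ${fn//foo/kuh} replaces every occurrence of 'foo' with 'kuh'.
echo "${fn//foo/kuh}"     # gadgeagkuhafsa
```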

I chose /tmp as the new destination and only used two of the example files:
newdest="/tmp"; fp="foo"; np="kuh"; for f in $(find . -type f -name "*$fp*"); do new=$(echo "$f" | sed "s/$fp/$np/g"); cp -f "$f" "$newdest/$new"; done
which copies and renames the files:
ls /tmp/*kuh*
/tmp/fsafsafkuharwf /tmp/gadgeagkuhafsa

If all the files are in the same folder, with bash:
for i in *foo*; do mv "$i" /tmp/"${i/foo/kuh}"; done
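A slightly more defensive sketch of the same loop (keeping /tmp as the destination from the answer): with nullglob set, the loop simply does nothing when no file matches, instead of running once on the literal string *foo*.

```shell
#!/usr/bin/env bash
shopt -s nullglob            # unmatched globs expand to nothing
for i in *foo*; do
    mv -- "$i" /tmp/"${i/foo/kuh}"
done
```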

Related

Is there a globbing pattern to match by file extension, both PWD and recursively?

I need to match files with only one specific extension under all nested directories, including the PWD, with BASH using "globbing".
I do not need to match all files under all nested directories while excluding the PWD (a separate question).
I do not need to match files using commands other than globbing, such as a grep search across all directories by filename extension.
I do not need to grep recursively only in files with certain extensions (plural).
set -o globstar; ls **/*.* is great for all files (not my question).
ls **/*.php does not match in the PWD.
set -o globstar; **/*.php returns duplicate files.
grep -r --include=\*.php "find me" ./ is specifically for grep, not globbing (consider this Question). It seems grep has --include=GLOB precisely because this is not possible with plain globbing.
From this Answer, I believe there may not be a way to do this using globbing.
tl;dr
I need:
A glob expression
To match any command where simple globs can be used (ls, sed, cp, cat, chown, rm, et cetera)
Mainly in BASH, but other shells would be interesting
Both in the PWD and all subdirectories recursively
For files with a specific extension
I'm using grep & ls only as examples, but I need a glob expression that applies to other commands also.
grep -r --include=GLOB is not a glob expression for, say, cp; it is a workaround specific to grep and is not a solution.
find is not a glob, but it may be a workaround for non-grep commands if there is no such glob expression. It would need | or while do;, et cetera.
Examples
Suppose I have these files, all containing "find me":
./file1.js
./file2.php
./inc/file3.js
./inc/file4.php
./inc.php/file5.js
./inc.php/file6.php
I need to match only/all .php one time:
./file2.php
./inc/file4.php
./inc.php/file6.php
Duplicates returned: shopt -s globstar; ... **/*.php
This changes the problem; it does not solve it.
Dup: ls
Before entering shopt -s globstar as a single command...
ls **/*.php returns:
inc/file4.php
inc.php/file5.js
inc.php/file6.php
file2.php does not return.
After entering shopt -s globstar as a single command...
ls **/*.php returns:
file2.php
inc/file4.php
inc.php/file6.php
inc.php:
file5.js
file6.php
inc.php/file6.php returns twice.
Dup: grep
Before entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
inc/file4.php: find me
inc.php/file6.php: find me
file2.php does not return.
After entering shopt -s globstar as a single command...
grep -R "find me" **/*.php returns:
file2.php: find me
inc/file4.php: find me
inc.php/file5.js: find me
inc.php/file6.php: find me
inc.php/file6.php: find me
inc.php/file6.php returns twice.
Having seen the duplicate in the ls output, we know why.
Current solution: faulty misuse of && logic
grep -r "find me" *.php && grep -r "find me" */*.php
ls -l *.php && ls -l */*.php
Please no! I fail here && so I never happen
Desired solution: single command via globbing
grep -r "find me" [GLOB]
ls -l [GLOB]
Insight from grep
grep does have the --include flag, which achieves the same result, but via a flag specific to grep; ls has no --include option. This leads me to believe that no such glob expression exists, which is why grep provides the flag.
With bash, you can first do a shopt -s globstar to enable recursive matching, and then the pattern **/*.php will expand to all the files in the current directory tree that have a .php extension.
zsh and ksh93 also support this syntax. Other commands that take a glob pattern as an argument and do their own expansion of it (like your grep --include) likely won't.
With shell globbing it is possible to match only directories by adding a / at the end of the glob, but there is no way to match only regular files (zsh being an exception).
Illustration:
With the given tree:
file.php
inc.php/include.php
lib/lib.php
Supposing that the shell supports the non-standard ** glob:
**/*.php/ expands to inc.php/
**/*.php expands to file.php inc.php inc.php/include.php lib/lib.php
For getting file.php inc.php/include.php lib/lib.php, you cannot use a glob.
=> with zsh it would be **/*.php(.)
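bash has no glob qualifiers, but the zsh (.) behaviour can be approximated by filtering inside the loop (a sketch using the tree above):

```shell
#!/usr/bin/env bash
shopt -s globstar
for f in **/*.php; do
    [ -f "$f" ] || continue    # skip directories such as inc.php/
    printf '%s\n' "$f"
done
```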
Standard work-around (any shell, any OS)
The POSIX way to recursively get the files that match a given standard glob and then apply a command to them is to use find -type f -name ... -exec ...:
ls -l <all .php files> would be:
find . -type f -name '*.php' -exec ls -l {} +
grep "finde me" <all .php files> would be:
find . -type f -name '*.php' -exec grep "finde me" {} +
cp <all .php files> ~/destination/ would be:
find . -type f -name '*.php' -exec sh -c 'cp "$@" ~/destination/' sh {} +
remark: This one is a little more tricky because ~/destination/ must come after the file arguments, and find's syntax doesn't allow find -exec ... {} ~/destination/ +; wrapping cp in sh -c is what reorders the arguments.
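With GNU cp there is also -t, which names the destination first and so sidesteps the ordering problem entirely (GNU-specific, not POSIX):

```shell
# GNU cp: -t DIR gives the target directory up front,
# so find can append all matches at the end via {} +.
find . -type f -name '*.php' -exec cp -t ~/destination/ {} +
```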
Suggesting a different strategy:
Use explicit find command to build bash command(s) on the selected files using -printf option.
Inspect the command for correctness and run.
1. preparing bash commands on selected files
find . -type f -name "*.php" -printf "cp %p ~/destination/ \n"
2. inspect the output, correct command, correct filter, test
cp ./file2.php ~/destination/
cp ./inc/file4.php ~/destination/
cp ./inc.php/file6.php ~/destination/
3. execute prepared find output
bash <<< $(find . -type f -name "*.php" -printf "cp %p ~/destination/ \n")
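Note that the generated command lines break on filenames containing spaces or shell metacharacters. A null-delimited pipeline avoids generating shell code altogether (GNU find/xargs assumed; ~/destination/ is the hypothetical target from above):

```shell
# -print0 / -0 keep each filename intact, spaces and all.
find . -type f -name '*.php' -print0 |
    xargs -0 -I{} cp {} ~/destination/
```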

append epoch date at the beginning of a file in bash

I have a list of 20 files; 10 of them already have 1970-01-01- at the beginning of the name and 10 do not (the remaining ones all start with a lowercase letter).
My task was to rename the files that do not yet have the epoch date at the beginning so that they do. Using bash, the code below works, but I could not solve it with a regular expression, for example using rename. I had to extract the basename and then mv. A more elegant solution would use just one pipe instead of two.
Works
find ./ -regex './[a-z].*' | xargs -I {} basename {} | xargs -I {} mv {} 1970-01-01-{}
Hence looking for a solution with just one xargs or -exec?
You can just use a single rename command:
rename -n 's/^([a-z])/1970-01-01-$1/' *
Assuming you're operating on all the files present in the current directory.
Note that -n flag (dry run) will only show intended actions by rename command but won't really rename any files.
If you want to combine with find then use:
find . -maxdepth 1 -type f -name '[a-z]*.txt' -execdir rename -n 's/^/1970-01-01-/' {} +
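If the perl rename is not available, a similar dry run can be sketched in plain bash (echo prints the intended command; drop it to actually rename):

```shell
#!/usr/bin/env bash
shopt -s nullglob
for f in [a-z]*; do          # names starting with a lowercase letter
    echo mv -- "$f" "1970-01-01-$f"
done
```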
I always prefer readable code over short code.
r() {
    base=$(basename "$1")
    dir=$(dirname "$1")
    if [[ "$base" =~ ^1970-01-01- ]]
    then
        : "ignore, already has correct prefix"
    else
        echo mv "$1" "$dir/1970-01-01-$base"
    fi
}
export -f r
find . -type f -exec bash -c 'r "$1"' bash {} \;
This also just prints what would be done (for testing). Remove the echo before the mv for the real thing.
Mind that the mv will overwrite existing files (if there is a ./a/b/c and an ./a/b/1970-01-01-c already). Use mv's -i option to be safe from this.

Regex to rename all files recursively removing everything after the character "?" commandline

I have a series of files that I would like to clean up using commandline tools available on a *nix system. The existing files are named like so.
filecopy2.txt?filename=3
filecopy4.txt?filename=33
filecopy6.txt?filename=198
filecopy8.txt?filename=188
filecopy3.txt?filename=19
filecopy5.txt?filename=1
filecopy7.txt?filename=5555
I would like them to be renamed removing all characters after and including the "?".
filecopy2.txt
filecopy4.txt
filecopy6.txt
filecopy8.txt
filecopy3.txt
filecopy5.txt
filecopy7.txt
I believe the following regex will grab the bit I want to remove from the name,
\?(.*)
I just can't figure out how to accomplish this task beyond this.
A bash command:
for file in *; do
    mv "$file" "${file%%\?filename=*}"
done
find . -depth -name '*[?]*' -exec sh -c 'for i do
mv "$i" "${i%[?]*}"; done' sh {} +
With zsh:
autoload zmv
zmv '(**/)(*)\?*' '$1$2'
Change it to:
zmv -Q '(**/)(*)\?*(D)' '$1$2'
if you want to rename dot files as well.
Note that if filenames may contain more than one ? character, both will only trim from the rightmost one.
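The shortest/longest suffix distinction is easy to check directly (a made-up name with two ? characters):

```shell
name='file.txt?a=1?b=2'
echo "${name%\?*}"     # file.txt?a=1  - shortest suffix, trims from the rightmost ?
echo "${name%%\?*}"    # file.txt      - longest suffix, trims from the leftmost ?
```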
If all files are in the same directory (ignoring .dotfiles):
$ rename -n 's/\?filename=\d+$//' -- *
If you want to rename files recursively in a directory hierarchy:
$ find . -type f -exec rename -n 's/\?filename=\d+$//' {} +
Remove -n option, to do the renaming.
In this case you can use the cut command:
echo 'filecopy2.txt?filename=3' | cut -d? -f1
example:
find . -type f -name "*\?*" -exec sh -c 'mv "$1" "$(echo "$1" | cut -d\? -f1)"' mv {} \;
You can use rename if you have it:
rename 's/\?.*$//' *
I use this after downloading a bunch of files where the URL included parameters and those parameters ended up in the file name.
This is a Bash script.
for file in *; do
    mv "$file" "${file%%\?*}"
done

BASH - find specific folder with find and filter with regex

I have a folder containing many folders with subfolders, with the following structure:
_30_photos/combined
_30_photos/singles
_47_foo.bar
_47_foo.bar/combined
_47_foo.bar/singles
_50_foobar
With the command find . -type d -print | grep '_[0-9]*_' all folders matching _[0-9]*_ will be shown. But I have to generate a regex which captures only the */combined folders:
_[0-9]*_[a-z.]+/combined, but when I insert that into the find command, nothing is printed.
The next step would be to create, for each combined folder, a folder somewhere else on my hdd and copy the content of the combined folder into it. The new folder's name should be the parent name of the subfolder, e.g. _47_foo.bar. Could that be achieved with an xargs command after the search?
You do not need grep:
find . -type d -regex ".*_[0-9]*_.*/combined"
For the rest:
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 cp -r
With basic grep you will need to escape the +:
... | grep '_[0-9]*_[a-z.]\+/combined'
Or you can use the "extended regexp" version (egrep or grep -E [thanks chepner]) in which the + does not have to be escaped.
xargs may not be the most flexible way of doing the copying you describe above, as it is tricky to use with multiple commands. You may find more flexibility with a while loop:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read -r combined_dir; do
    mkdir some_new_dir
    cp -r "${combined_dir}" some_new_dir/
done
Have a look at bash string manipulation if you want a way to automate the name of some_new_dir.
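One way to derive that name with bash parameter expansion alone (the sample path follows the question's layout):

```shell
combined_dir='./_47_foo.bar/combined'
parent=${combined_dir%/combined}   # strip the trailing /combined
parent=${parent##*/}               # strip everything up to the last /
echo "$parent"                     # _47_foo.bar
```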
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" | \
(while read s; do
n=$(dirname "$s")
cp -pr "$s" "$target_dir/${n#./}"
done
)
NOTE:
this fails if you have linebreaks "\n" in your directory names
this uses a subshell to not clutter your env - inside a script you don't need that
changed the regex slightly: [0-9]* to [0-9]+
You can use this command:
find . -type d | grep -P "_[0-9]*_[a-z.]+/combined"

How can I exclude directories matching certain patterns from the output of the Linux 'find' command?

I want to use regex's with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistant) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
grep with -Z (sometimes known as --null) makes grep output terminated with a null character instead of newline. What you wanted was -z (sometimes known as --null-data) which causes grep to interpret a null character in its input as end-of-line instead of a newline character. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
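The -z/-Z pairing can be seen on a tiny null-delimited stream (GNU grep; the names are made up):

```shell
# Two null-terminated names go in; -z reads null-delimited records,
# -v drops the match, -Z writes the survivors null-terminated again.
printf 'src/a.h\0generated/b.h\0' |
    grep -vzZ generated |
    xargs -0 -n1 echo
# prints: src/a.h
```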
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
fails because you are trying to use POSIX extended regular expressions, which don't support lookahead/lookbehind. https://superuser.com/a/596499/658319
If your find supports PCRE (not all builds do), converting the pattern to PCRE should work.
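For completeness, the traditional find idiom for excluding whole directories is -prune, which keeps find from descending into them at all; no regex or lookaround is needed (a sketch against the question's tree):

```shell
find . \( -name generated -o -name deploy \) -prune -o \
    -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -print0 |
    xargs -0 -n 1 ls
```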