Find directories with names matching pattern and move them - regex

I have a bunch of directories like 001/ 002/ 003/ mixed in with others that have letters in their names. I just want to grab all the directories with numeric names and move them into another directory.
I tried this:
file */ | grep ^[0-9]*/ | xargs -I{} mv {} newdir
The matching part works, but it ends up moving everything to the newdir...

I am not sure I understood correctly, but here is something that should help.
Use a combination of find and xargs to manipulate lists of files.
find . -maxdepth 1 -type d -regex './[0-9]*' -print0 | xargs -0 -I'{}' mv "{}" newdir/
Using -print0 and -0 and quoting the replacement symbol {} makes your script more robust. It will handle most situations where non-printable characters are present. This basically passes the list using a \0 delimiter instead of \n. Note the added -type d, so only directories (not files) with numeric names are moved.
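As a hypothetical illustration of that robustness, even a directory name containing a space or a newline arrives as a single argument:
mkdir -p 'has space' $'has\nnewline'
find . -maxdepth 1 -type d -print0 | xargs -0 -n1 printf '<%s>\n'
Each name prints inside one pair of angle brackets, however odd it looks.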

mv is not powerful enough by itself; it cannot work on patterns.
Try this approach: Rename multiple files by replacing a particular pattern in the filenames using a shell script
Either use a loop or a rename command.

With a loop and an array,
your script would be something like this:
#!/bin/bash
DIR=( $(file */ | grep '^[0-9]*/' | awk -F/ '{print $1}') )
for dir in "${DIR[@]}"; do
mv "$dir" /path/to/DIRECTORY
done
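Alternatively, a bash glob can avoid parsing the output of file entirely. A minimal sketch, assuming bash with extglob available and that /path/to/DIRECTORY already exists:
#!/bin/bash
shopt -s extglob nullglob
# +([0-9]) matches names made of digits only; the trailing / limits the glob to directories
for dir in +([0-9])/; do
mv "${dir%/}" /path/to/DIRECTORY/
done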

Related

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(the text I want to use as a filename is between HTML title tags)
Where I wrote $sed would be the output of sed.
Hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, i.e. there is only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
mv "$file" `sed -n 's/<title>\([^<]*\)<\/title>/\1/p;' $file| sed -e 's/[ ][ ]*/_/g'`.txt
done
So, if you have files 1.txt, 2.txt and 3.txt, each with cat, dog and my hippo in their TITLE tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted initial $file in case there are spaces in filenames; and added a second sed to convert any spaces in the <title> tag to _'s in resulting filenames. NOTE the whitespace inside the []'s in the second sed command is a literal space and tab character.
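If typing a literal tab is awkward, a POSIX character class should do the same job in the second sed (a sketch, equivalent under the same assumptions):
sed -e 's/[[:space:]][[:space:]]*/_/g'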
You can enclose an expression in grave accent characters (`) to have its output inserted where you want. Try:
mv *.txt `sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt`.txt
It is rather inflexible, but should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
done
The sed substitution searches for the pattern in each of your .txt files. For each file it creates the string mv 'file_name' 'found_pattern'.
With the e flag at the end of the sed command, this resulting string is executed directly by the shell, thus renaming your files.
Some hints:
Note the use of =s instead of /s as delimiters for the sed substitution: it is more readable since you already have /s in your pattern (you could use many other symbols if you don't like =), and this way you don't have to escape the /s in your pattern.
The e command for sed executes the created string.
(I'm speaking of this one below:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
^
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you added the e.
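For instance, the dry run would look like this, printing only the mv commands without running them (p replaces e; a sketch mirroring the command above):
for i in *.txt ; do
sed -n "s=<title>\(.*\)</title>=mv '$i' '\1'=p" "$i"
done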
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (the element's text) and use it as the filename
finally, you want to rename the file to that extracted filename
Is this correct?
So, then you need to loop through the files, e.g. with xargs or find
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always spell the xargs substitute as -i\{\}, because the resulting command stays compatible when I use it with find and its {} substitute.
Next, the -maxdepth option keeps find from diving deeper into the directory tree; if there are no subdirectories, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if the opening and closing tags are on the same line) would be grep
A more solid solution is to use an HTML parser and navigate by DOM operations (sketched just below)
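For the parser route, if xmllint from libxml2 happens to be installed (an assumption), it can extract the title through a real DOM instead of a regex:
xmllint --html --xpath 'string(//title)' somefile.txt
Here somefile.txt is a placeholder for one of your files; --html makes the parser tolerant of non-XML input.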
The simple solution is based on:
get the title line
remove everything before and after the title content
Putting it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
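One caveat with both one-liners: embedding {} inside the bash -c string breaks on filenames containing quotes. Passing the filename as a positional parameter is more robust; a sketch of the same logic:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(sed -n "s#.*<title>\([^<]*\)</title>.*#\1#p" "$1"); [ -n "$TITLE" ] && mv -v "$1" "$TITLE.txt"' _ {} \;
The _ fills $0, so find's {} arrives safely as $1 inside the script.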
Hopefully it is what you expected.

Regex grep file contents and invoke command

I have a file that has been generated containing MD5 info along with filenames. I'm wanting to remove the files from the directory they are in. I'm not sure how to go about doing this exactly.
filelist (file) contains:
MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38
my command (which I would like to combine with rm) looks like this:
grep -o "\((.*)\)" filelist
returns this:
(dupe)
(somefile)
Almost good, although the parentheses need to be eliminated (not sure how). I tried grep -Po "(?<=\().*(?=\))" filelist, using a lookbehind/lookahead, but the command didn't work.
The next thing I would like to do is take the output filenames and delete them from the directory they are in. I'm not sure how to script it, but it would essentially do:
<returned results from grep>
rm dupe $target
rm somefile $target
If I understand correctly, you want to take lines like these
MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38
extract the second column without the parentheses to get the filenames
dupe
somefile
and then delete the files?
Assuming the filenames don't have spaces, try this:
# this is where your duplicate files are.
dupe_directory='/some/path'
# Check that you found the right files:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} ls -l "$dupe_directory/{}"
# Looks ok, delete:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} rm -v "$dupe_directory/{}"
xargs -I{} means to replace the argument (dupe filename) with {} so it can be used in a more complex command.
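A minimal illustration of -I substitution, using echo so nothing is actually deleted (hypothetical names):
printf '%s\n' dupe somefile | xargs -I{} echo rm "/some/path/{}"
This prints one rm command per input line, with {} replaced by the whole line.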
The tool you're looking for is xargs: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
It's pretty standard on *nix systems.
UPDATE: Given that target equals the directory where the files live...
I believe the syntax would look something like:
yourgrepcmd | xargs -I{} rm "$target/{}"
The -I creates a placeholder string, and each line from your grep command gets inserted there.
UPDATE:
The step you need to remove the parens is a small application of sed's substitution command (http://unixhelp.ed.ac.uk/CGI/man-cgi?sed)
Something like this:
cat filelist | sed "s/MD5 (\([^)]*\)) .*$/\1/" | xargs -I{} rm "$target/{}"
The moral of the story here is: if you learn to use sed and xargs (or awk, if you want something a little more advanced), you'll be a more capable Linux user.
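As an example of the awk route, the extraction and removal can even collapse into awk alone. A sketch, assuming the same filelist format, no spaces or quotes in the filenames, and $target set as above:
awk -F'[()]' -v dir="$target" '{ system("rm -v \"" dir "/" $2 "\"") }' filelist
Splitting on parentheses makes $2 the bare filename, and system() hands the assembled rm command to the shell.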

Finding soft links with grep

I'm writing a quick program that lists all of the soft/symbolic links in the working directory to a file which is given in argument 1. I'm aware that I need to use grep in order to do so, but in general I have difficulty figuring out how to write the regular expression. In this case, it is especially difficult due to the fact that a variable ($argv[1]) is involved.
The (poorly-written) line of code in question is as follows:
ls -l | xargs grep '-> $argv[1]'
My intention with this was to catch all of the lines that contained the -> and the specified file, such as
link1 -> file
link2 -> file
and so on. Is there any way that I can use grep to accomplish this?
What kind of language is $argv[1]? The (POSIX) Bourne Shell doesn't support arrays. Arguments to scripts and functions are referenced by $1, $2 and so on.
In order for grep to not treat the first hyphen in the pattern as an option, use -- to signal the end of options. Next, there is no parameter substitution in single quotes, only in double quotes. Putting it all together, this might work:
set somename # Sets $1 to somename
ls -l | grep -- "-> $1"
If your grep doesn't understand --, try
ls -l | xargs grep ".*-> $1"
The script below finds only the soft/symbolic link files and lists them only if the argument is found in those files:
# cat sygrep.sh
#!/bin/bash
if [ $# -eq 0 ]
then
echo "No arguments supplied"
else
for a in `find . -type l` ; do grep -irl "$1" "$a" ; done
fi
Output:
# ./sygrep.sh
No arguments supplied
# ./sygrep.sh root
./mytest.sh

Using grep to find dynamic text

Need help with a bash script. We are modifying our database structure, the problem is we have many live sites that have pre-written queries referencing the current database structure. I need to find all of our scripts with references to MySQL tables. Here is what I started:
grep -ir 'from' /var/www/sites/inspection.certifymyshop.com/ > resultsList.txt
I am trying to grep through our scripts recursively and export ALL table names found to a text file; we can use the "->from" and "->join" prefixes to help us:
->from('databaseName.table_name dtn') // dtn = table alias
OR
->join('databaseName.table_name dtn') // dtn = table alias
I need to find the database and table name within the single quotes (i.e. databaseName.table_name). I also need to list the filename this was found in underneath or next to the match like so:
someDatabaseName.someTableName | /var/www/sites/blah.com/index.php | line 36
Try doing this:
grep -oPriHn -- "->(?:from|join)\('\K[^']+" . |
awk -F'[ :]' '{print $3, "|", $1, "| line " $2}'
If this fits your needs, I can explain the snippet more as well.
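Briefly, since the snippet is dense: -o prints only the matched part, -P enables Perl regexes so \K can discard the ->from(' or ->join(' prefix from the match, -r recurses, -i ignores case, and -H with -n prefix each hit with filename and line number; the awk stage then merely reorders those fields into the requested name | file | line layout.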
The one problem you have with using only grep is removing the from, join, or whatever identifying prefix from the result. To fix this, we can also use sed:
grep -EHroi -- '->(from|join)\('\''[^'\'' ]*' /path/to/files | sed -re 's/:.*(from|join)\('\''/:/g'
You could also use sed alone in a for loop
for i in `find /path/to/files -type f -print`
do
echo $i
sed -nre 's/^.*->(from|join)\('\''([^'\'' ]*)['\'' ].*$/\2/gp' $i
done
Edit: The above for loop breaks with filenames with spaces so here's the previous sed statement using find
find ./ -type f -exec sh -c "echo {} ; sed -nre 's/^.*->(from|join)\('\''([^'\'' ]*)['\'' ].*$/\2/gp' \"{}\" ;" \;

BASH - find specific folder with find and filter with regex

I have a folder containing many folders with subfolders, with the following structure:
_30_photos/combined
_30_photos/singles
_47_foo.bar
_47_foo.bar/combined
_47_foo.bar/singles
_50_foobar
With the command find . -type d -print | grep '_[0-9]*_' all folders with that structure will be shown. But I generated a regex which captures only the */combined folders, _[0-9]*_[a-z.]+/combined, and when I insert it into the find command, nothing is printed.
The next step would be to create, for each combined folder, a new folder somewhere on my HDD and copy the content of the combined folder into it. The new folder's name should be the same as the parent of the combined subfolder, e.g. _47_foo.bar. Could that be achieved with an xargs command after the search?
You do not need grep:
find . -type d -regex ".*_[0-9]*_.*/combined"
For the rest:
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 cp -r
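To preview the source/destination pairs before copying anything, substitute echo for cp in the last stage (same pipeline, nothing copied):
find . -type d -regex "^\./.*_[0-9]*_.*/combined" | \
sed 's!\./\(.*\)/combined$!& /somewhere/\1!' | \
xargs -n2 echo cp -r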
With basic grep you will need to escape the +:
... | grep '_[0-9]*_[a-z.]\+/combined'
Or you can use the "extended regexp" version (egrep or grep -E [thanks chepner]) in which the + does not have to be escaped.
xargs may not be the most flexible way of doing the copying you describe above, as it is tricky to use with multiple commands. You may find more flexibility with a while loop:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read combined_dir; do
mkdir some_new_dir
cp -r "${combined_dir}" some_new_dir/
done
Have a look at bash string manipulation if you want a way to automate the name of some_new_dir.
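For example, parameter expansion can derive some_new_dir from the parent folder's name. A minimal sketch, with /somewhere standing in for your target location:
... | grep '_[0-9]*_[a-z.]\+/combined' | while read combined_dir; do
parent=${combined_dir%/combined}          # e.g. ./_47_foo.bar
mkdir -p "/somewhere/${parent##*/}"       # basename of the parent
cp -r "${combined_dir}/." "/somewhere/${parent##*/}/"
done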
target_dir="your target dir"
find . -type d -regex ".*_[0-9]+_.*/combined" | \
(while read s; do
n=$(dirname "$s")
cp -pr "$s" "$target_dir/${n#./}"
done
)
NOTE:
this fails if you have linebreaks "\n" in your directory names
this uses a subshell to not clutter your env - inside a script you don't need that
changed the regex slightly: [0-9]* to [0-9]+
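If the linebreak caveat matters, bash's read -d '' pairs with find's -print0 to make the same loop NUL-delimited; a sketch:
find . -type d -regex ".*_[0-9]+_.*/combined" -print0 | \
(while IFS= read -r -d '' s; do
n=$(dirname "$s")
cp -pr "$s" "$target_dir/${n#./}"
done
)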
You can use this command:
find . -type d | grep -P "_[0-9]*_[a-z.]+/combined"