How can I move all folders in a directory to subdirectories based on their value? - regex

A legacy web application I've inherited control over seems to have hit the maximum number of subdirectories in a folder on my web server. Whenever an article is created in the system, its static content is placed in a subdirectory of the document root matching the pattern /uploads/story/{STORY_ID}/. But now the system is unable to create any new directories in the /uploads/story/ folder.
I'd like to address this in two steps, but I'm not sure how to run the necessary Linux commands to achieve this. Would you be able to help?
As a temporary fix to buy me more time to implement a better directory structure, I'd like to archive the static content of all stories with a STORY_ID of less than 1000. These should be moved from /uploads/story/ to /uploads/story_archive/.
I'll change the upload path to be /uploads/story/{THOUSANDS}/{STORY_ID}/ in the code, but will need to be able to move all folders within /uploads/story/ into this format. e.g. /uploads/story/65312/ would become /uploads/story/65/65312/. How can I do this?
Edit
Fixing (1) was as simple as running:
$ cd /path/to/uploads/
$ mkdir story_archive
$ for i in {1..999}; do mv story/$i story_archive/; done

Given that you are sure /uploads/story/* will give you a number in the * part, you can do the following (Note: back up the whole thing just in case):
# update this based on your actual directory
path_to_fix=/uploads/story
# move the directories out of the way so they don't get mixed up
mv "$path_to_fix" "$path_to_fix/../temp"
mkdir "$path_to_fix" > /dev/null 2>&1
# get all directories to be moved
dirs=$(ls "$path_to_fix/../temp")
# for each of them
for d in $dirs; do
  # get the basename, which is the story_id
  id=$(basename "$d")
  # divide by 1000
  sub=$((id / 1000))
  # create the appropriate directory
  mkdir "$path_to_fix/$sub" > /dev/null 2>&1
  # move the original directory to that sub-directory
  mv "$path_to_fix/../temp/$d" "$path_to_fix/$sub"
done
# cleanup
rm -Rf "$path_to_fix/../temp"
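If you want to sanity-check the mapping before running the loop for real, one option (an illustrative sketch, not part of the original answer) is to echo the planned moves after the initial mv to temp:
# preview what would happen, without moving anything
for d in "$path_to_fix"/../temp/*; do
  id=$(basename "$d")
  echo mv "$d" "$path_to_fix/$((id / 1000))"
done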

For the temporary fix you can use a for loop and a glob:
for path in /uploads/story/*; do
  storyid=${path##*/}
  if [[ ${storyid} -lt 1000 ]]; then
    mv "${path}" /uploads/story_archive/${storyid}
  fi
done
That will iterate over all the directories in /uploads/story/, with the full path in the $path variable.
The bash construct ${variable##pattern} removes the longest substring matching pattern from the left-hand side of variable; in this case all the leading directories, leaving just the storyid stored in the var.
We then check whether the story ID is less than 1000 and, if so, move it to the story archive.
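For example (a quick illustration, not from the original answer):
path=/uploads/story/65312
echo "${path##*/}"    # prints 65312: the longest match of */ is removed from the left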
The next bit.
for path in /uploads/story/*; do
  storyid=${path##*/}
  if [[ $storyid -ge 1000 ]]; then
    thous=${storyid%%???}
    [[ -d /uploads/story/$thous/ ]] || mkdir /uploads/story/$thous/
    mv "$path" /uploads/story/$thous/
  fi
done
OK, here again we iterate over all the directories and pluck off the story ID. This time, though, we make sure that the storyid is greater than or equal to 1000 and use %%??? to remove the last three characters from storyid (the same as the ## trick, but from the right-hand side of the variable).
Then we check whether the thousands dir exists, make it if it doesn't, and move the dir over.
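Again, a quick illustration (not from the original answer):
storyid=65312
echo "${storyid%%???}"    # prints 65: the last three characters are removed from the right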
You could even do one sweep and do both tasks at once:
for path in /uploads/story/*; do
  storyid=${path##*/}
  if [[ $storyid -lt 1000 ]]; then
    mv "${path}" /uploads/story_archive/${storyid}
  else
    thous=${storyid%%???}
    [[ -d /uploads/story/$thous/ ]] || mkdir /uploads/story/$thous/
    mv "$path" /uploads/story/$thous/
  fi
done
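As an aside, stripping the last three characters is equivalent to integer division by 1000 for IDs of 1000 and above, so the arithmetic form from the first answer would be a drop-in alternative:
thous=$((storyid / 1000))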

Related

Can git rm take a regex or can I pipe the contents of a file to git rm?

I'm trying to remove all of the folder meta files from a Unity project in the git repo my team is using. Other members don't delete the meta file associated with the folder they deleted/emptied, and it's propagating to everyone else. It's a minor annoyance that shouldn't need to be seen, so I've added this to the .gitignore:
*.meta
!*.*.meta
and now need to remove only the folder metas. I'd rather remove the metas now than wait for them to appear and have git remove them later. I'm using git bash on Windows and have tried the following commands to find just the folder metas:
find . -name '*.meta' > test.txt #returns folders and files
find . -regex '.*\.meta' > test.txt #again folders and files
find . -regex '\.[^\.]{0,}\.meta' > test.txt #nothing
find . -regex '\.[^.]{0,}\.meta' > test.txt #nothing
find . -regex '\.{2}' > test.txt #nothing
find . -regex '(\..*){2}' > test.txt #nothing
I know regex is interpreted differently per program/language but the following will produce the results I want in Notepad++ and I'm not sure how to translate it for git or git bash:
^.*/[^.]{0,}\.meta$
by capturing the lines (file paths from root of repo) that end with a /<foldername>.meta since I realized some folders contained a '.' in their name.
Once this is figured out I need to go line by line and git rm the files.
NOTE
I can also run:
^.*/.*?\..*?\.meta$\n
and replace with nothing to delete all of the file metas from the folders and files result, and use that result to get all of the folder metas, but I'd also like to know how to avoid needing Notepad++ as an extra step.
To confine the results only to indexed files, use git ls-files, the swiss-army knife of index-aware file listing. git update-index is the core-command index munger,
git ls-files -i -c -x '*.meta' -x '!*.*.meta' | git update-index --force-remove --stdin
which will remove the files from your index but leave them in the work tree.
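Before piping into the destructive half, you can run the listing on its own to preview exactly which files will be dropped from the index (the -c restricts the listing to tracked files; newer versions of git require it, or -o, alongside -i):
git ls-files -i -c -x '*.meta' -x '!*.*.meta'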
It's easier to express with two conditions just like in .gitignore. Match *.meta but exclude *.*.meta:
find . -name '*.meta' ! -name '*.*.meta'
Use -exec to run the command of your choice on the matched files. {} is a placeholder for the file names and ';' signifies the end of the -exec command (weird syntax but it's useful if you append other things after the -exec ... ';').
find . -name '*.meta' ! -name '*.*.meta' -exec git rm {} ';'
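If there are many matches, ending the -exec clause with + instead of ';' passes the file names to git rm in batches instead of spawning one process per file (standard find behaviour, not specific to this answer):
find . -name '*.meta' ! -name '*.*.meta' -exec git rm {} +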

Script to place files in folder by extracting date from filename

I know this has been asked many times; I am terrible with bash and I do not understand the regex format for it. Figured I'd ask for help.
I have a security camera which writes files to a folder in this format:
MDalarm_20170320_084514.mkv
so it goes -- MDalarm_yearmonthday_hourminutesecond.mkv
I want to create a cronjob that will run a script to clean this up, by doing the following:
Taking the files and placing them in a folder for year/month/day, then renaming the file to the time only, i.e. 08_26_15.mkv (even 082615.mkv would be fine if that's too much of a hassle).
So in the example of MDalarm_20170320_084514.mkv
it should produce
/2017/03/20/08_45_14.mkv
or similar.
The files will be placed in the root folder as they come and the script will run once/twice a day on the folder for cleanup.
I'm decent with regex in php/js/etc., but the bash one I completely do not understand well enough to get this done. I sincerely appreciate the help.
Cheers!
Use this to make the desired file name
$ echo MDalarm_20170320_084514.mkv | sed -E "s/^MDalarm_[[:digit:]]{8}_//"
084514.mkv
and this to make the desired folder name
$ echo MDalarm_20170320_084514.mkv | sed -E "s/^MDalarm_([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})_.*$/\/\1\/\2\/\3/"
/2017/03/20
Use them in shell commands to make folder (if needed) and copy/rename/move file.
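Putting the two together, a minimal sketch (assuming GNU sed, the filename layout above, and the underscore-separated time format) might be:
f=MDalarm_20170320_084514.mkv
dir=$(echo "$f" | sed -E "s/^MDalarm_([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})_.*$/\1\/\2\/\3/")
mkdir -p "$dir"    # creates 2017/03/20 relative to the current directory
mv "$f" "$dir/$(echo "$f" | sed -E 's/^MDalarm_[[:digit:]]{8}_(..)(..)(..)\.mkv$/\1_\2_\3.mkv/')"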
This is what I ended up with, and it works. Thank you Yunnosch for the regex.
#!/bin/bash
if [[ $(ls | grep -c mkv) == 0 ]]
then
  echo "NO MKV FILES"
else
  for f in *.mkv; do
    name=$(echo "$f" | sed -E "s/^MDalarm_[[:digit:]]{8}_([[:digit:]]{2})([[:digit:]]{2})([[:digit:]]{2})(.*)$/\1h-\2m-\3s\4/")
    dir=$(echo "$f" | sed -E "s/^MDalarm_([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})_.*$/\1\/\2\/\3/")
    mkdir -p "$dir"
    mv "$f" "$name"
    mv "$name" "$dir"
  done
fi
Once someone wrote the regex out, I figured out the format; different yet similar.

Grep for multiple patterns in a folder containing n number of files, and if a match is found for a pattern, create a directory with mkdir

Can we grep for multiple patterns in a folder containing n number of files? And if a match is found for each pattern, create a directory and push the files of the same pattern type into the same directory, and likewise for the others.
For example: I have a folder named X. X can have multiple sub-folders and multiple files inside them.
I want to search for a pattern like This code is from. If a match for this string is found in multiple files in the X folder, create a directory named dir1 and push all the matched files into dir1.
The same applies to the other pattern matches: if matches are found, create directories and push the files into the respective directories.
I tried searching with grep and can find all the pattern-matched files, but I can't do the mkdir in the same pass. This way, for n pattern matches in X, n directories should be created. The searching is fine; I'm having trouble creating the directories alongside it.
One way to get the same folder structure is, unfortunately, not to use xargs cp -t dir, but instead to copy one-by-one with rsync, e.g.,
grep -irl "Version" . | xargs -I{} rsync -aR "{}" "dir/"
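The -R (--relative) flag makes rsync recreate the source path under the destination, creating any intermediate directories as it goes; a plain rsync -a "{}" "dir/{}" would fail wherever the parent directory doesn't yet exist under dir. For example:
rsync -aR ./sub/dir/file.txt dest/    # creates dest/sub/dir/file.txt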
I mean, it's not elegant, but you could use embedded for loops with an array of search strings.
EDIT: Missed the part about separate folders for different match strings. Changes are below.
#!/bin/bash
#Assuming:
#patarr is an array containing all patterns
#test/files is the location of files to be searched
#test/found is the location of matching files
for file in test/files/*
#This loop runs for every file in test/files/. $file holds the filename of the current file
do
  for ((i=0;i<${#patarr[@]};i++))
  #This loop runs once for every index in the patarr array. $i holds the current loop number
  do
    if grep -q "${patarr[$i]}" "$file"
    #if grep finds at least one match using the pattern in patarr with index "i"
    then
      #cp $file test/found/ #Old code, see edit above
      mkdir -p "test/found/${patarr[$i]}"
      #Makes a folder with the name as our search string. -p means no error if the folder already exists.
      cp "$file" "test/found/${patarr[$i]}/"
      #Copies the file into said folder
    fi
  done
done
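For completeness, a hypothetical invocation would define the array before the loops, e.g.:
patarr=("This code is from" "Another pattern" "Yet another pattern")
Since the patterns contain spaces, the resulting directory names will too, which is why the expansions above are quoted.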

Bash: Moving Files with Regex that Matches Pattern of Script Argument and Fixed Text

I'm trying to write a backup script that takes a directory and directory/file name as arguments to the script. The problem is curating the target directory of the backups. For safety, I'm currently moving the files into MacOS ~/.Trash/. The problem is that I want to support having spaces in the target directory's file name, but escaping the path in mv prevents shell expansion of *.
The script in question:
# Usage: backup-to-dropbox.sh "path/containing/target" "target dir"
cd "$1"
DATE=`date "+%Y%m%dT%H%M%S"`
SOURCE="$2"
mv "~/Dropbox/Backups/$SOURCE*.tgz" ~/.Trash/ ## Problem line here
tar -czf "$SOURCE $DATE.tgz" "$SOURCE/"
mv "$SOURCE $DATE.tgz" ~/Dropbox/Backups/
How can I match all the files with this known, arbitrary prefix and fixed extension?
Words can be partially quoted. Just be sure not to quote anything you want the shell to expand:
mv ~/"Dropbox/Backups/$SOURCE"*.tgz ~/.Trash/

How can I remove everything except listed files with the Find-command?

I have a list of hidden files in a file "list_files" that should not be removed from the current directory. How can I remove everything except them with a find command? I tried, but it clearly does not work:
find . -iname ".*" \! -iname 'list_files'
Removing files should always be done safely.
I presume you have a directory tree with hidden files and a sub-set list of these hidden files which you want to retain. You want to delete all other hidden files.
Let's start with the list of hidden files.
find "$(pwd)" -iname ".*" -type f > all-hidden-files.txt
Now, assume that you have some filter which will reduce the list to all files that you want to retain (creating your list_files). Here SomeFilter could be you manually editing the files list to retain the ones you do not want to delete.
SomeFilter all-hidden-files.txt > list_files
The next command will identify lines in all-hidden-files.txt that are missing from list_files, which gives you the files that can be removed. (Since list_files is a subset of all-hidden-files.txt, comm -3 only prints lines unique to the first file; comm -23 would make that explicit.)
comm -3 all-hidden-files.txt list_files > removable-files.txt
Edit: Just realized that the input files to comm should be sorted. So use this as,
comm -3 <(sort all-hidden-files.txt | uniq) <(sort list_files | uniq) \
> removable-files.txt
You can confirm this works well for you and then delete the listed files with something like the following (reading line by line so that names containing spaces survive):
while IFS= read -r i; do rm "$i"; done < removable-files.txt
You can do this by invoking bash from -exec, so something like this:
find . -iname ".*" -exec bash -c 'fgrep -qx "$1" /tmp/list_files || rm -i "$1"' _ {} \;
Be very careful how you construct your list of files. The list of files to exclude must be identical to the output produced by find, or you will delete all files matching your pattern.
I have put the interactive option on rm, and you may wish to keep it for testing. If you wish to remove directories with this technique, then you will need to modify the rm options.
You may wish to construct your list of files using find from the same folder you will later run the find from, to ensure the exclusions are honoured, although an absolute rather than a relative path would be the better, i.e. safer, option, so your find would become
find /some/folder/name -name "some pattern" -exec ....
Create a temporary directory in your source directory, move everything into the temp directory, move the files you want to save back to their original place, and then recursively remove the temp directory. Since the moves are all on one filesystem, it should be almost instantaneous with any decent filesystem, and this is pretty safe.
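A minimal sketch of that approach (assuming list_files holds one hidden-file name per line, relative to the current directory):
mkdir temp_keep
mv .[!.]* temp_keep/    # move all hidden entries out of the way
while IFS= read -r f; do
  mv "temp_keep/$f" .    # restore the files you want to keep
done < list_files
rm -r temp_keep    # whatever is still in temp_keep gets deleted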