How to move many files in multiple different directories (on Linux) - regex

My problem is that I have too many files in a single directory. I cannot even "ls" the directory because it is too large. I need to move all the files into a better directory structure.
I'm using the last 3 digits of the ID as folders, in reverse order.
For example, ID 2018972 will go in /2/7/9/img_2018972.jpg.
I've created the directories, but now I need help with a bash script. I know the IDs; they are in the range 1,300,000 - 2,000,000. But I can't handle regular expressions.
I want to move all files like this:
/images/folder/img_2018972.jpg -> /images/2/7/9/img_2018972.jpg
I will appreciate any help on this subject. Thanks!

EDIT: after the explanations in the comments, the following assumptions hold:
filenames are in the form of img_<id>.jpg or img_<id>_<size>.jpg
the new dir is the reverse order of the three last digits of the id
Using Bash:
for file in /images/folder/*.jpg; do
fname="${file%.*}" # remove extension and _<size>
[[ "$fname" =~ img_[0-9]+_[0-9]+$ ]] && fname="${fname%_*}"
last0="${fname: -1:1}" # last letter/digit
last1="${fname: -2:1}" # last but one letter/digit
last2="${fname: -3:1}" # last but two letter/digit
newdir="/images/$last0/$last1/$last2"
# optionally check if the new dir exists, if not create it
[[ -d "$newdir" ]] || mkdir -p "$newdir"
mv "$file" "$newdir"
done
If * can't handle it (although I think * in a for loop has no limits),
use find as suggested by @Michał Kosmulski in the comments:
while read -r; do
fname="${REPLY%.*}" # remove extension and _<size>
[[ "$fname" =~ img_[0-9]+_[0-9]+$ ]] && fname="${fname%_*}"
last0="${fname: -1:1}" # last letter/digit
last1="${fname: -2:1}" # last but one letter/digit
last2="${fname: -3:1}" # last but two letter/digit
newdir="/images/$last0/$last1/$last2"
# optionally check if the new dir exists, if not create it
[[ -d "$newdir" ]] || mkdir -p "$newdir"
mv "$REPLY" "$newdir"
done < <(find /images/folder/ -maxdepth 1 -type f -name "*.jpg")
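
A minimal sketch of the mapping rule on its own, which can be handy for a dry run before moving anything. The function name target_dir is hypothetical, and the sketch assumes names of the form img_<id>.jpg or img_<id>_<size>.jpg, as listed in the assumptions above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the target directory for one image file name.
# Assumes names look like img_<id>.jpg or img_<id>_<size>.jpg.
target_dir() {
  local name=${1##*/}                       # drop any leading path
  local re='^img_([0-9]+)(_[0-9]+)?\.jpg$'  # capture the ID, ignore an optional _<size>
  [[ $name =~ $re ]] || return 1
  local id=${BASH_REMATCH[1]}
  (( ${#id} >= 3 )) || return 1             # need at least three digits
  # last digit first, then the second-to-last, then the third-to-last
  printf '/images/%s/%s/%s\n' "${id: -1:1}" "${id: -2:1}" "${id: -3:1}"
}

target_dir /images/folder/img_2018972.jpg      # prints /images/2/7/9
target_dir /images/folder/img_2018972_640.jpg  # prints /images/2/7/9
```

Printing the directory instead of moving the file makes it easy to spot-check a few IDs first.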

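# Assumes every name ends in <id>.jpg (no _<size> suffix), so the last three
# digits of the ID sit 5, 6 and 7 characters from the end of the path.
find /images/folder -maxdepth 1 -type f -name '*.jpg' | while IFS= read -r file
do
filelen=${#file}
((rootn=filelen-5))
((midn=filelen-6))
((topn=filelen-7))
root=${file:$rootn:1}
mid=${file:$midn:1}
top=${file:$topn:1}
mkdir -p "/images/${root}/${mid}/${top}"
mv "$file" "/images/${root}/${mid}/${top}"
done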

Related

Change directory to a folder whose name is in regex notation

I'm in a directory that contains many folders like these:
At_5.2000_displacement
At_-6.4000_displacement
At_2nd_-4.3000_displacement
At_2nd_2.2000_displacement
I am writing a bash script that is cd-ing to each of the folders of the type:
At_X.XXXX_displacement
i.e., I would like to
cd At_5.2000_displacement
and
cd At_-6.4000_displacement folders. This leaves out the folders of the type:
At_2nd_X.XXXX_displacement
My attempts (Edited):
I found the reg-ex that yields the name of the target folders, which is the following:
At_-\?[0-9][.].*displacement
which yields:
At_5.2000_displacement
At_-6.4000_displacement
Now, in the bash script, every time I do:
cd /path_to_the_folders/At_-\?[0-9][.].*displacement
There is no way of accessing each folder, since the error received is:
trial_cding.sh: line 11: cd: /path_to_the_folders/At_-?[0-9][.].*displacement: No such file or directory
How can I make the line cd /path_to_the_folders/At_-\?[0-9][.].*displacement to work ?
The implementation of @dawg's answer via the if statement (not using the shortcut), as far as I understood, would be something like:
for pn in *; do
if [[ ! -d "$pn" && $pn =~ At_[0-9.-]*_displacement ]]
then
echo No desired directory found
else
continue
cd $pn
pwd
cd -
fi
done
However, this returns no results. What am I getting wrong here?
Also, I am not quite sure how to use parentheses in cd $pn in order to avoid going back.
Regarding the [0-9.-] match...
In the end we are looking for positive or negative decimal numbers.
In other words, we are looking for any number from (-)0 to (-)9, and followed by a .
This makes me think that [-0-9.] is the most intuitive instruction.
Surprisingly, it happens that the following three also do match:
[0-9.-]
[0-9-.]
[-.0-9]
Please check https://regex101.com/r/G3fXo5/1, where I show the matches. This makes me think that the matching criteria inside [ ] is quite broad.
So, if I try [.-0-9] I get no matching results. Why is this happening? What is the rule behind the matching inside [ ]?
Assuming X can be only a number, you can use extended globbing:
shopt -s extglob nullglob
for folder in At_?(-)[0-9].[0-9][0-9][0-9][0-9]_displacement/; do
#do stuff here
echo "$folder"
done
shopt -s nullglob is to ensure you won't loop if there are no files matching your pattern
shopt -s extglob enables extended globs
?(-) matches zero or one occurrence of -. In extended globs, the modifier comes before (pattern). See: Pattern matching
Advantage of this approach is looping only over the folders you really want. No additional checks needed.
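
The same extended pattern can also be used as a test on a single name. This is only a sketch: the helper name is_target is hypothetical, and it assumes X is always a digit, as stated above:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enable ?(...) patterns

# Hypothetical helper: succeed only for names of the form At_X.XXXX_displacement.
is_target() {
  [[ $1 == At_?(-)[0-9].[0-9][0-9][0-9][0-9]_displacement ]]
}

for name in At_5.2000_displacement At_-6.4000_displacement At_2nd_2.2000_displacement; do
  if is_target "$name"; then
    echo "match:   $name"
  else
    echo "skipped: $name"
  fi
done
```

In a glob the . is a literal dot, so no escaping is needed, unlike in a regex.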
Given:
$ ls -l
total 0
drwxr-xr-x 2 dawg wheel 68 Jan 4 10:42 At_-6.4000_displacement
drwxr-xr-x 2 dawg wheel 68 Jan 4 10:42 At_2nd_-4.3000_displacement
drwxr-xr-x 2 dawg wheel 68 Jan 4 10:42 At_2nd_2.2000_displacement
drwxr-xr-x 2 dawg wheel 68 Jan 4 10:42 At_5.2000_displacement
-rw-r--r-- 1 dawg wheel 0 Jan 4 10:51 file
You can use a regular Bash glob *, test whether each match is a directory with -d (vs some other OS object), and then test the name with a Bash regex:
for pn in *; do # You could use the At_*_displacement glob to narrow this if desired...
# Keep $pn only if it is a directory (-d) AND its name matches the regex;
# otherwise (||) continue, i.e. skip to the next name in the loop.
[[ -d "$pn" && $pn =~ At_[0-9.-]*_displacement ]] || continue
# do your Bash thing on this directory...
# you can cd "$pn" or operate on the directory directly
# ( use parentheses for a subshell and you don't need to cd back )
echo "$pn"
done
Prints:
At_-6.4000_displacement
At_5.2000_displacement
You can also use find with appropriate depth qualifiers and a regex:
$ find . -maxdepth 1 -type d -regex '\./At_[0-9.-]*_displacement'
./At_-6.4000_displacement
./At_5.2000_displacement
And then either use -exec with {}, or xargs, or feed that output to a Bash while loop.
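
As a rough sketch of the while-loop option: the function name each_displacement_dir is hypothetical, and the example assumes a find that supports -maxdepth, -regex and -print0 (GNU and BSD find both do). The regex is written with a leading .*/ so that it works for any starting directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command inside every matching directory under $1.
each_displacement_dir() {
  local root=$1
  shift
  find "$root" -maxdepth 1 -type d -regex '.*/At_[0-9.-]*_displacement' -print0 |
    while IFS= read -r -d '' dir; do
      # subshell: the cd does not affect the loop or the caller;
      # stdin is redirected so the command cannot swallow find's output
      ( cd "$dir" && "$@" ) </dev/null
    done
}

# Example: print the path of each matching directory under the current one.
each_displacement_dir . pwd
```

Using -print0 with read -d '' keeps directory names intact even if they contain spaces or newlines.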
For your last edit, something like this:
for pn in *; do
if [[ -d "$pn" && $pn =~ At_[0-9.-]*_displacement ]]
then
( # the ( creates a subshell so no need to cd back...
echo "Found \"$pn\"! Touching it!"
cd "$pn" # USE QUOTES!
# you are now in that sub directory
touch "dawg was here!" # create a file in the directory...
)
# exit sub shell -- back to the original directory
else
echo "\"$pn\" is not what we are looking for..."
fi
done
Be sure to use "$quotes" around expansions in Bash.
cd cannot change to more than one directory at a time. You need a for loop.
How about this:
for i in $(ls | grep "^At_-\?[0-9][.].*displacement$"); do
cd "$i" || continue
# do what needs to be done
cd .. # go back, otherwise the next relative cd fails
done

BASH find regex for arbitrary range of numbers in a large number of files

I am writing a BASH script that, among other things, copies files from one directory to another based on input arguments for the start and end dates. The filenames are of the format YYYYMMDDhhmmss.jpg, e.g. 20161230143922.jpg. I am using find ... -exec cp {} ... because there are tens of thousands of files in the source directory. The input arguments are the start and end date in the format YYYYMMDD.
I know that I can't do a simple range in the regex like ($startdate..$enddate), but I am unable to figure out how to programmatically generate a regex that would work. If I had fewer files I could simply do cp {$startdate..$enddate} destination, but alas I don't think that is feasible.
I would like to copy all files between $startdate and $enddate that fall between the hours of 0500 and 1700. This would include images like 20170102060635.jpg and 20170104131255.jpg, but not 20170103010022.jpg.
This is what I have so far:
#!/bin/bash
STARTDATE=$1
ENDDATE=$2
FILE_NAME="review-${STARTDATE}-${ENDDATE}.mp4"
if [[ -n "$STARTDATE" ]]; then
echo "STARTDATE: $STARTDATE"
else
echo "Invalid start date: '$STARTDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
if [[ -n "$ENDDATE" ]]; then
echo "ENDDATE: $ENDDATE"
else
echo "Invalid end date: '$ENDDATE'"
echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
exit
fi
cd ~/Desktop/test\ timelapse
# Copy relevant files to local directory
find ~/Desktop/originals -regex "???????????????" -exec cp {} ~/Desktop/test\ timelapse/ \;
# Rename files to be sequential serial numbers
find ~/Desktop/test\ timelapse -name "*.jpg" | awk 'BEGIN{ a=0 }{ printf "mv \"%s\" ~/Desktop/\"test\ timelapse/%06d.jpg\"\n", $0, a++ }' | bash
# Generate timelapse video
ffmpeg -framerate 25 -i %06d.jpg -c:v libx264 -r 25 ${FILE_NAME}
Regex isn't the best tool for dealing with numerical ranges, so you may need to consider a solution that incorporates some logic outside the regex itself. Something like this:
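# date (8 digits), hour and minute (4 digits), seconds, then .jpg at the end
regex='([0-9]{8})([0-9]{4})[0-9]{2}\.jpg$'
# hours 05 through 17; kept in a variable because a quoted regex is matched literally
time_regex='^(0[5-9]|1[0-7])'
for f in ~/Desktop/originals/*.jpg
do
if [[ $f =~ $regex ]]
then
datepart=${BASH_REMATCH[1]}
timepart=${BASH_REMATCH[2]}
#if the DATE part matches
if (( STARTDATE <= datepart && datepart <= ENDDATE ))
then
#if the TIME part matches
if [[ $timepart =~ $time_regex ]]
then
cp "$f" ~/Desktop/test\ timelapse/
fi
fi
fi
done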
Pure Regex Solution
If you really want a pure regex solution, this will help demonstrate the complexity. Here's a regex to find all the files in the 0500 to 1700 timeframe, for dates in January 2017: ^201701\d{2}(0[5-9]|1[0-7])\d{4}\.jpg$
Notice the regex pattern needed to match times from 0500 to 1700:
(0[5-9]|1[0-7])
It's not pretty, and that's with a hardcoded range. To deal with dynamic start and end dates, you would be building a similar pattern dynamically. It could be done, but why use regex for it?
Here's an example, showing what you would need to generate for a date range from 20161225 to 20170114:
^(201612(2[5-9]|3\d)|201701(0\d|1[0-4]))(0[5-9]|1[0-7])\d{4}\.jpg$
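
For comparison, the non-regex route recommended above can be wrapped in a small test function. This is a sketch only: in_window is a hypothetical name, and like the (0[5-9]|1[0-7]) pattern it treats every minute of hour 17 as inside the window:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if file $1 (YYYYMMDDhhmmss.jpg) falls between
# dates $2 and $3 (YYYYMMDD, inclusive) and between hours 05 and 17.
in_window() {
  local name=${1##*/} start=$2 end=$3
  local re='^([0-9]{8})([0-9]{2})[0-9]{4}\.jpg$'
  [[ $name =~ $re ]] || return 1
  local day=${BASH_REMATCH[1]} hour=${BASH_REMATCH[2]}
  (( day >= start && day <= end )) || return 1
  # 10# forces base 10 so that hours such as 08 and 09 are not read as octal
  (( 10#$hour >= 5 && 10#$hour <= 17 ))
}

for f in 20170102060635.jpg 20170103010022.jpg 20170104131255.jpg; do
  if in_window "$f" 20170101 20170105; then
    echo "copy $f"
  else
    echo "skip $f"
  fi
done
```

Because the dates are plain numbers, changing the range only means passing different arguments, with no pattern to regenerate.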

List down all sub-Directories in Bash based on some criteria

I'm writing a script which navigates all subdirs named something like 12, 98, etc., and checks that each one contains a runs subdir. The check is needed for subsequent operations in the script. How can I do that? I managed to write this:
# check that I am in a multi-grid directory, with "runs" subdirectories
for grid in ??; do
cd $grid
cd ..
done
However, ?? also matches stuff like LS, which is not correct. Any ideas on how to fix it?
Next step: in each directory named dd (digit/digit), I need to check that there is a subdirectory named runs, or exit with error. Any idea on how to do that? I thought of using find -type d -name "runs", but it looks recursively inside subdirs, which is wrong, and anyway if find doesn't find a match, I have no idea on how to catch that inside the script.
Loop over the directories, report the missing subdir:
for dir in [0-9][0-9]/ ; do
[[ -d $dir/runs ]] || { echo $dir ; exit 1 ; }
done
You can use character classes in glob patterns. The / (not \) after the pattern makes it match only directories, i.e. a file named 42 will be skipped.
The next line reads "$dir/runs is a directory, or report it". [[ ... ]] introduces a condition, see man bash for details. -d tests whether a directory exists. || is "or", you can rephrase the line as
if [[ ! -d $dir/runs ]] ; then
echo $dir
exit 1
fi
where ! stands for "not".
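
If you would rather see every missing runs subdirectory before giving up, the same test can be collected into a function. This is a sketch under that assumption; check_runs is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check every two-digit directory under $1 for a runs
# subdirectory. The body runs in a subshell, so the cd and shopt stay local.
check_runs() (
  shopt -s nullglob               # no two-digit directories: loop zero times
  cd "$1" || exit 2
  status=0
  for dir in [0-9][0-9]/; do      # trailing / keeps only directories
    [[ -d ${dir}runs ]] || { echo "missing: ${dir}runs" >&2; status=1; }
  done
  exit "$status"
)

check_runs . || echo "not a complete multi-grid directory"
```

The return status is 0 only when every two-digit directory has its runs subdirectory, so the caller can decide whether to exit.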
First, find all directories named runs:
find . -type d -name runs
Note: to restrict the search to one level, use find with -maxdepth.
From this output, extract the parent directory by removing the last path component after the final /.
Try:
sed 's,/*[^/]\+/*$,,'

BASH: How to rename lots of files, inserting the folder name in the middle of the filename

(I'm in a Bash environment, Cygwin on a Windows machine, with awk, sed, grep, perl, etc...)
I want to add the last folder name to the filename, just before the last underscore (_) followed by numbers or at the end if no numbers are in the filename.
Here is an example of what I have (hundreds of files needed to be reorganized) :
./aaa/A/C_17x17.p
./aaa/A/C_32x32.p
./aaa/A/C.p
./aaa/B/C_12x12.p
./aaa/B/C_4x4.p
./aaa/B/C_A_3x3.p
./aaa/B/C_X_91x91.p
./aaa/G/C_6x6.p
./aaa/G/C_7x7.p
./aaa/G/C_A_113x113.p
./aaa/G/C_A_8x8.p
./aaa/G/C_B.p
./aab/...
I would like to rename all these files like this:
./aaa/C_A_17x17.p
./aaa/C_A_32x32.p
./aaa/C_A.p
./aaa/C_B_12x12.p
./aaa/C_B_4x4.p
./aaa/C_A_B_3x3.p
./aaa/C_X_B_91x91.p
./aaa/C_G_6x6.p
./aaa/C_G_7x7.p
./aaa/C_A_G_113x113.p
./aaa/C_A_G_8x8.p
./aaa/C_B_G.p
./aab/...
I tried many bash for loops with sed and the last one was the following :
IFS=$'\n'
for ofic in `find * -type d -name 'A'`; do
fic=`echo $ofic|sed -e 's/\/A$//'`
for ftr in `ls -b $ofic | grep -E '.png$'`; do
nfi=`echo $ftr|sed -e 's/(_\d+[x]\d+)?/_A\1/'`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
But yet with no success... This \1 does not get inserted in the $nfi...
This is the last one I tried, only working on 1 folder (which is a subfolder of a huge folder collection) and after over 60 minutes of unsuccessful trials, I'm here with you guys.
I modified your script so that it works for all your examples.
IFS=$'\n'
for ofic in ???/?; do
IFS=/ read fic fia <<<$ofic
for ftr in `ls -b $ofic | grep -E '\.p.*$'`; do
nfi=`echo $ftr|sed -e "s/_[0-9]*x[0-9]*/_$fia&/;t;s/\./_$fia./"`
echo mv \"$ofic/$ftr\" \"$fic/$nfi\"
done
done
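
To see what the sed expression does in isolation, here is a small sketch. The function name insert_folder is hypothetical, and it assumes the folder name contains no characters that are special to sed (such as / or &):

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert folder name $2 into file name $1.
# 1st command: put _<folder> in front of the first _<digits>x<digits> group.
# t:           if that substitution happened, stop processing this line.
# 3rd command: otherwise put _<folder> in front of the first dot.
insert_folder() {
  local name=$1 folder=$2
  printf '%s\n' "$name" | sed \
    -e "s/_[0-9]*x[0-9]*/_${folder}&/" \
    -e t \
    -e "s/\./_${folder}./"
}

insert_folder C_17x17.p A    # prints C_A_17x17.p
insert_folder C.p A          # prints C_A.p
insert_folder C_A_3x3.p B    # prints C_A_B_3x3.p
```

The & in the replacement stands for the matched text, which is how the size suffix is kept after the inserted folder name.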
# it's easier to change to here first
cd aaa
# matches names ending in _<digits>x<digits>.p
re='^(.*)(_[0-9]+x[0-9]+\.p)$'
# process every file one folder below
for f in */*.p; do
# strips everything from the first / onwards, so this is our foldername
foldername=${f%%/*}
base=${f#*/}
# insert the foldername before the size suffix, or before .p if there is none
if [[ $base =~ $re ]]; then
newfilename="./${BASH_REMATCH[1]}_${foldername}${BASH_REMATCH[2]}"
else
newfilename="./${base%.p}_${foldername}.p"
fi
# if you are satisfied with the output, just leave out the `echo`
# from below
echo mv "$f" "$newfilename"
done
Might work for you.
See here in action. (slightly modified, as ideone.com handles STDIN/find differently...)

Using a script to compare directories with modified file names?

I want to write a script that compares two directories.
However, the file names are modified in one of them.
So directory A contains files like HouseFile.txt, CouchFile.txt, ChairFile.txt
Directory B contains House.txt, Couch.txt, Chair.txt (which should be seen as 'equivalent' to the above)
Both may also contain new, completely different files.
Could someone point me in the right direction here? It's been a while since I've done scripting.
I have tried using diff, and I know I need to use some form of regex to compare the file names, but I am not sure where to start.
Thank you!
Added for clarification:
Of course diff, however, just compares the actual file names. I would like to know how to specify that I regard file names such as, in the example, "HouseFile.txt" and "House.txt" as equivalent in this case.
If I understand correctly, this is a possible solution to compare a to b:
mkdir a b ; touch a/HouseFile.txt a/ChairFile.txt a/CouchFile.txt a/SomeFile.txt b/House.txt b/Chair.txt b/Couch.txt b/Sofa.txt
for file in a/*(.); do [[ ! -f b/${${file##*/}:fs#File#} ]] && echo $file ; done
Outputs:
a/SomeFile.txt
What is not clear to me: Is the difference pattern strictly 'File' or any arbitrary string?
EDIT: The previous was for zsh. Here is one for bash:
find a -maxdepth 1 -type f | while IFS= read -r file; do
check=$(echo "$file" | sed -r -e 's#(.*)/(.*)#\2#' -e "s#File##")
[[ ! -f b/${check} ]] && echo "$file"
done
Using parameter expansion instead of sed:
find a -maxdepth 1 -type f | while IFS= read -r file; do
check=${file/%File.txt/.txt} # replace File.txt at the end of the name with .txt
check=${check##*/} # delete the path up to and including the last slash
[[ ! -f b/${check} ]] && echo "$file"
done
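
The same check can be written without find, using a plain glob. This is a sketch: report_unmatched is a hypothetical name, and it assumes the difference between the two directories is strictly the File suffix before .txt:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every regular file in directory $1 that has no
# counterpart in directory $2 once a trailing File.txt is shortened to .txt.
report_unmatched() {
  local a=$1 b=$2 file name
  for file in "$a"/*; do
    [[ -f $file ]] || continue        # skip subdirectories and an unmatched glob
    name=${file##*/}                  # strip the directory part
    name=${name/%File.txt/.txt}       # HouseFile.txt -> House.txt
    [[ -f $b/$name ]] || echo "$file"
  done
}

report_unmatched a b
```

With the sample files created earlier, only a/SomeFile.txt has no counterpart in b.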