Regex and shell - multiple recursive rename - regex

I have a folder with several hundreds of folders inside it. These folders contain another folder each, called images, and in this folder there is sometimes a strictly numerically named .jpg file. Sometimes there are other JPG files in the folder as well, but these need to be ignored if they aren't strictly numeric.
I would like to learn how to write a script which would, when run in a given folder, traverse every single subfolder and look for this numeric file. It would then add the "_n" suffix to a copy of each, if such a file does not already exist.
Can this be done through the unix terminal easily?
To be more specific, this is the structure I'm dealing with:
master folder
18556
images
2234.jpg
47772
images
2234.jpg
2234_n.jpg
some_pic.jpg
77377
images
88723
images
22.jpg
some_pic.jpg
After the script is run, the situation would look like this:
master folder
18556
images
2234.jpg
2234_n.jpg
47772
images
2234.jpg
2234_n.jpg
some_pic.jpg
77377
images
88723
images
22.jpg
22_n.jpg
some_pic.jpg
Update: Sorry about the typo, I accidentally put 2235 into 47772.
Update 2: Regarding the 2nd comment on the mathematical.coffee's answer, the OS I am currently on (at work) is MacOS, but my main machines are running CentOS and Ubuntu at home, so I just assumed my situation applies to all unix based systems.

You can use the -regex switch to find to match /somefolder/images/numeric.jpg:
find -type f -regex './[^/]+/images/[0-9]+\.jpg$'
Edit: refinement from #JonathanLeffler: add -type f to find so it only finds files (ie don't match a directory called '12345.jpg').
The ./[^/]+/ is for the first folder (if that first folder is always numeric too you can change it to [0-9]+).
The [0-9]+\.jpg$ means a jpg file with file name only being numeric.
You might want to change the jpg to jpe?g to allow .jpeg, but that's up to you.
Then it's a matter of copying these to xxx_n.jpg.
for f in $(find -type f -regex './[^/]+/images/[0-9]+\.jpg$')
do
# replace '.jpg' in $f (filename) with '_n.jpg'
newf=${f/\.jpg/_n\.jpg}
# see if this new file exists
if [ ! -f $newf ];
then
# if not exists, copy it.
cp "$f" "$newf"
fi
done

What should be the logic behind the renames in Folder 47772? If we assume you want to rename all the files just consisting of numbers to numbers + _n
With mmv you could write it like:
mmv "[0-9][0-9]*.jpg" "#1#2#3_n.jpg"
Note: mmv is for moving; mcp is for copying, and so is more appropriate to this question.
Question of Vader:
Well I checked the man page and the problem is that it's a bit strange.
I was thinking [0-9]* would match zero or more numbers. I turns out that this assumption was wrong.
The problem is that I could not tell I want two or more numbers at the start of the name.
So [0-9][0-9]* matches a name starting with at least two numbers (after that it takes all the rest up to the .. Now every [0-9] is one pattern and so I had to make the to pattern into:
"#1#2#3_n.jpg" With e.g 1234.jpg I have #1 = 1; #2 = 2, #3 = 34 So
#1#2#3 -> 1234; _n appends the _n and .jpg the extension
However it would rename also files with 12some_other_stuff.jpg sot 12some_other_stuff_n.jpg. It's not ideal but achieves in this context what was intended.

Related

Batch rename files with regex not working

I've got many files on a linux server which have this format
text_text_mixturelettersnumbers.filefor example Hesperocyparis_goveniana_E00196073A.bam.baior Hesperocyparis_forbesii_RBGEH19_bwa_out.txt. I would like to change the first underscore to a hyphen and leave everything else so it looks like this text-text_mixturelettersnumbers.file.
I have tried rename -n 's/(\w+)_(\w+_.)/$1-$2/' * and many different versions thereof but nothing is happening. Could someone please point out what I've got wrong?
Thanks
Markus
The util-linux rename does not have an option to display the results only. It is very basic.
If you want to list the files that contain two underscores before an extension, use
for f in *_*_*.*; do
echo "$f => ${f/_/-}";
done
To actually rename, use mv:
for f in *_*_*.*; do
mv -- "$f" "${f/_/-}";
done
The "${f/_/-}" replaces the first _ with - in variable f.

Rename multiple files with different names to same name and different numbers

I have multiple pictures of trucks with random messy names and different formats (jpeg, jpg, png etc.) and I want to rename them to "truck1.jpeg", "truck2.jpg", "truck3.png" and so on. How do I do it using the rename command?
It's probably easier to use bash and mv, since AFAIK you need something like bash to generate the number sequence. In bash
i=1
for x in *; do
echo $x '->' truck$i.${x##*.}
mv "$x" truck$i.${x##*.} && i=$((i+1))
done
The for x in * operates on all files whose names do not begin with a dot and are in the current directory. You can adjust the glob to be more exclusive, but this script will need modification if the files are in other directories. Again, probably easier to collect the files in one directory, or maybe put it in a script file and execute it in multiple directories using find ... -exec.
This uses i as a counter to generate the digits. The trick is the ${x##*.} expression which takes the file name and deletes everything up to the final dot. This allows you to preserve and reattach the file extension to the new name. You have to be careful to set i correctly or you will overwrite old truck1 files with new ones.

Finding files using regular expressions/wildcards

Within a particular directory, I have a series of files that are labelled sequentially:image0000.png, image0001.png, image0002.png, etc.. They are labelled by number, but I don't necessarily know how many preceding zeroes there are in the filename, i.e. whether it will be image0001.png or image00001.png.
Within a bash script, I wish to find a single file at a time (over a for loop), and then apply some processing to the file. This search could start at zero and end before I've reached the end, or could be of varying steps. To expand, I could want to find image0000.png, image0001.png, image0002.png and so forth, or I could start at image0010.png and find every other file, i.e. the next two would be image0012.png and image0014.png.
To try and find the first file (image0000.png), I've tried using find and ls, with the following outputs:
$ find video/figs/ -name 'image*[0]0.png'
video/figs/image00100.png
video/figs/image00000.png
$ ls video/figs/image*[0]0.png
-rw-r--r-- 1 user machine 165K Feb 19 09:06 video/figs/image00000.png
-rw-r--r-- 1 user machine 207K Feb 19 09:06 video/figs/image00100.png
Similar results occur for finding the second (i.e., find video/figs/ -name 'image*[0]0.png' finds image00101.png and image00001.png. So it's finding the file I want (image00001.png), but is also finding one that I don't (image00101.jpg). Can anyone help me understand why, and fix it?
I would use ls and grep for that:
ls | grep -oP 0*[1-9]+.png
Example:
$:/tmp/test$ ls
00001.png 00002.png 00010.png 00013.png 00201.png
$:/tmp/test$ ls | grep -oP 0*[1-9]+.png
00001.png
00002.png
00013.png
01.png
I suspect you don't want to dive into subdirectories, and find files, sorted by number, spread over subdirs.
So find isn't necessary.
ls image*{08..10}.png
image00010.png image0008.png image0009.png image0010.png image008.png image009.png
Part 2 of your question, only find every other file:
ls image*{08..10..2}.png
image00010.png image0008.png image0010.png image008.png
Maybe you know for-loops. It's like that,
for (i in 8 to 10 by 2)
or
for (int i=8; i <= 10; i+=2)
Restricting the search to find image image00010.png but not imageAB010.png wouldn't work.
The reason to exclude 101 is still unclear. Maybe it's only a sorting thing.
With directories, which aren't the PWD, there is no big difference:
ls video/figs/image*{08..10..2}.png
Note, that instead of ls, you use just the program, you want to process on the files, if the program is able to handle more than one file at a time, like ls.
Sincere thanks to everyone who contributed an answer - perhaps I explained it poorly, or I was too wedded to the code I'd already written to use any of the provided answers. However, I've found the following solutions:
1) Why did I find more answers than I expected?
find video/figs/ -name 'image*[0]0.png' uses very limited comprehension of wildcards, and thus the above was interpreted as finding a file with name image<wildcard>00.png. There is no way, using the -name option, to restrict the application of * to match only a given character (in this case, only find zero or more matches to 0.
2) How do I find the image files with an unknown number of padding zeroes?
The following is a MWE from my final code. It demonstrates how to search within a given directory SEARCH_DIR (not necessarily including subdirectories, but I haven't checked)
f1=0 # Starting number
f2=10 # End number
df=2 # number to skip between images
for ((f=$f1; f<=$f2; f=$f+$df)); do
export iFile=$(find $SEARCH_DIR -regex '.*/image0*'$f'.png')
done
The export ensures the variable is available to sub-processes, with the iFile=$() syntax allowing me to export the result of the command to the variable iFile. The bit within the parentheses is the bit I was looking for: find $SEARCH_DIR -regex '.*/image[0]*'$f'.png'
a) find $SEARCH_DIR specifies the location for the search
b) -regex specifies to use regular expressions, which are more powerful than standard bash scripting and allow me to limit wildcards as required
c) '.*/image0*'$f'.png': The regular expression search looks over the entire string, so apparently I need the initial .*/ to perform the match. The 0* now performs as I originally wanted - the * wildcard is now searching for zero or more matches of the preceding term, which here is 0 (so if I wanted to search for zero or more matches of any digit, I would use [0-9]*). The $f term is to search for the numbered file in the for loop.

rsync --exclude-from 'list' file not working

I am trying to use rsync to complete an unfinished transfer from a remote server to a local machine using
rsync -a user#domain.com:~/source/ /dest/
where /dest/ is the location of the partially completed transfer. However, due to bandwidth concerns I need to run rsync to a /tmp_dest/ on a different machine that does not have a copy of /dest/, from where I can then later move /tmp_dest/ to /dest/
The solution I have come up with thus far is to use rync's --exclude-from option, using a file containing a complete list of files from /dest/.
The command would look something like this
rsync -a --exclude-from 'list.txt' user#domain.com:~/source/ /tmp_dest/
At this point I feel as though I have scoured everywhere for a solution and tried every variant I came across.
This included relative and absolute paths for the 'list.txt'
relative:
path 1/file 1
path 2/file 2
--or--
absolute:
/absolute/source/path 1/file 1
/absolute/source/path 2/file 2
I have tried the above with combinations of including - to explicitly exclude that line (where I have seen examples of people wanting to also + other files)
- /absolute/source/path 1/file 1
- /absolute/source/path 2/file 2
I have tried putting leading **/ in front of the file paths to rectify the relative path problem
**/path 1/file 1
**/path 2/file 2
I have also tried navigating to the directory containing 'list' and executing rsync from there, to avoid the issue where rsync looks for
/path/to/the/list/something1/to.exclude
/path/to/the/list/something2/to.exclude
/path/to/the/list/something3/to.exclude
and undoubtedly finding nothing
I have also ensued that the correct line breaks are being used in the 'list' file. i.e. LF (Unix) line breaks.
I have tried to create the 'list' with the following command
find . -type f | tee list.txt
this initially created a file looking something like this
./yyyy-mm-dd folder 1/sub folder [foo]/file.a
./(yyyy) folder 2 {foo2}/file.b
./folder, 3/sub-folder 3/file.c
as you can see, there are spaces and other characters in the file paths, but from my current understanding, this shouldn't affect. But perhaps I am mistaken and will need to escape any characters with special meaning, which I may then need help with
which I then perform a replace on ./ in notepad++ or some other text editor that preserves the LF (Unix) line breaks to get the desired result.
(e.g. as above, I've tried replacing ./ with nothing, with /absolute/path/for/source/ noting the leading slash, or even double wildcards to match any parent tree structure containing the files.
The only thing I feel that I haven't tried is escaping the spaces in the file names and paths, but I have read that this shouldn't be an issue.
Perhaps I am overlooking something and any help would be appreciated.
Here is from rsync man page how to use "--exclude-from":
--exclude-from=FILE read exclude patterns from FILE
Use the following command:
rsync -a --exclude-from=list.txt user#domain.com:~/source/ /tmp_dest/
And also it is better to use full path name of list.txt file

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on windows, but I run this script in gitbash
For some reason, my gitbash doesn't recognize the "rename" command
Some regex patterns (like [0-9]{4} don't seem to work)
I'm really at a lost. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the from: *.log.2016.01.18. They all need to keep their original names. All that should change is the removal of the date.
You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"