Batch rename files with regex not working - regex

I've got many files on a linux server which have this format
text_text_mixturelettersnumbers.filefor example Hesperocyparis_goveniana_E00196073A.bam.baior Hesperocyparis_forbesii_RBGEH19_bwa_out.txt. I would like to change the first underscore to a hyphen and leave everything else so it looks like this text-text_mixturelettersnumbers.file.
I have tried rename -n 's/(\w+)_(\w+_.)/$1-$2/' * and many different versions thereof but nothing is happening. Could someone please point out what I've got wrong?
Thanks
Markus

The util-linux rename does not have an option to display the results only. It is very basic.
If you want to list the files that contain two underscores before an extension, use
for f in *_*_*.*; do
echo "$f => ${f/_/-}";
done
To actually rename, use mv:
for f in *_*_*.*; do
mv -- "$f" "${f/_/-}";
done
The "${f/_/-}" replaces the first _ with - in variable f.

Related

Match the string using regex

I have few files like 123.iso, 234.isoaa, 456.isoab, sajdhsjf.isoaf.
I want to extract all the files except those that end with exactly .iso.
For example, I should have 234.isoaa, 456.isoab, sajdhsjf.isoaf.
Assuming you meant "all the files with suffix beginning with .iso except those...", this works:
ls -1 | egrep "\.iso.+"
Try this :
\b[\w](?!.iso\b).[\w]\b
As Tim Pietzker noted, you didn't say for which shell you need a solution, but in zsh you could do
setopt local_options extended_glob
echo *^*.iso(N)
If you are happy to get only files which have .isoX at the end (with any X), this should work in bash, zsh and ksh:
echo *.iso?*
Note that this second solution - different to the first one - would not list files such as abc.txt.
Of course you can do a ls -1 instead of the echo. This depends on how you what you want to do with the result.

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on windows, but I run this script in gitbash
For some reason, my gitbash doesn't recognize the "rename" command
Some regex patterns (like [0-9]{4} don't seem to work)
I'm really at a lost. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the from: *.log.2016.01.18. They all need to keep their original names. All that should change is the removal of the date.
You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me in this? Each time I use PDF to split large pdf file (say its name is X.pdf) into separate pages, where each page is one pdf file, it creates files with this pattern
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and up to how many pages. There is no option in adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
Not being good in pattern matching, not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i$" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed..
for i in *.pdf; do mv "$i" $(sed 's/.*[[:blank:]]//' <<< "$i"); done
And it would be simple through rename
rename 's/.*\s//' *.pdf
You can remove everything up to (including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part before (and with) the last white space and replaces it with empty string.
Note:
It doesn't overwrite any existing files (throws warning if it finds an existing file with the target name).
And this operation is failsafe. You can get back the changes made by last rnm command with rnm -u.
Here's a list of documents for rnm.

Regex and shell - multiple recursive rename

I have a folder with several hundreds of folders inside it. These folders contain another folder each, called images, and in this folder there is sometimes a strictly numerically named .jpg file. Sometimes there are other JPG files in the folder as well, but these need to be ignored if they aren't strictly numeric.
I would like to learn how to write a script which would, when run in a given folder, traverse every single subfolder and look for this numeric file. It would then add the "_n" suffix to a copy of each, if such a file does not already exist.
Can this be done through the unix terminal easily?
To be more specific, this is the structure I'm dealing with:
master folder
18556
images
2234.jpg
47772
images
2234.jpg
2234_n.jpg
some_pic.jpg
77377
images
88723
images
22.jpg
some_pic.jpg
After the script is run, the situation would look like this:
master folder
18556
images
2234.jpg
2234_n.jpg
47772
images
2234.jpg
2234_n.jpg
some_pic.jpg
77377
images
88723
images
22.jpg
22_n.jpg
some_pic.jpg
Update: Sorry about the typo, I accidentally put 2235 into 47772.
Update 2: Regarding the 2nd comment on the mathematical.coffee's answer, the OS I am currently on (at work) is MacOS, but my main machines are running CentOS and Ubuntu at home, so I just assumed my situation applies to all unix based systems.
You can use the -regex switch to find to match /somefolder/images/numeric.jpg:
find -type f -regex './[^/]+/images/[0-9]+\.jpg$'
Edit: refinement from #JonathanLeffler: add -type f to find so it only finds files (ie don't match a directory called '12345.jpg').
The ./[^/]+/ is for the first folder (if that first folder is always numeric too you can change it to [0-9]+).
The [0-9]+\.jpg$ means a jpg file with file name only being numeric.
You might want to change the jpg to jpe?g to allow .jpeg, but that's up to you.
Then it's a matter of copying these to xxx_n.jpg.
for f in $(find -type f -regex './[^/]+/images/[0-9]+\.jpg$')
do
# replace '.jpg' in $f (filename) with '_n.jpg'
newf=${f/\.jpg/_n\.jpg}
# see if this new file exists
if [ ! -f $newf ];
then
# if not exists, copy it.
cp "$f" "$newf"
fi
done
What should be the logic behind the renames in Folder 47772? If we assume you want to rename all the files just consisting of numbers to numbers + _n
With mmv you could write it like:
mmv "[0-9][0-9]*.jpg" "#1#2#3_n.jpg"
Note: mmv is for moving; mcp is for copying, and so is more appropriate to this question.
Question of Vader:
Well I checked the man page and the problem is that it's a bit strange.
I was thinking [0-9]* would match zero or more numbers. I turns out that this assumption was wrong.
The problem is that I could not tell I want two or more numbers at the start of the name.
So [0-9][0-9]* matches a name starting with at least two numbers (after that it takes all the rest up to the .. Now every [0-9] is one pattern and so I had to make the to pattern into:
"#1#2#3_n.jpg" With e.g 1234.jpg I have #1 = 1; #2 = 2, #3 = 34 So
#1#2#3 -> 1234; _n appends the _n and .jpg the extension
However it would rename also files with 12some_other_stuff.jpg sot 12some_other_stuff_n.jpg. It's not ideal but achieves in this context what was intended.

Apply regular expression substitution globally to many files with a script

I want to apply a certain regular expression substitution globally to about 40 Javascript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]" in order to satisfy Internet Explorer, which alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?
You asked for a script, but you mentioned that you are vim user. I tend to do project-wide find and replace inside of vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it will use your args list rather than your buflist (and thus doesn't require a brand new vim session nor for you to be careful about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. it will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set filencoding=utf8), and retab my files.
1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply substitute command to all files:
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:q
I think you'll need to recheck your search pattern, it doesn't look right. I think where you have \_\s* you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).
You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
find /path/to/dir -name '*.js' | xargs ex +'%s/,\(\_\s*[\]})]\)/\1/g' +'wq!'
you can use a combination of the find command and sed
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/' "{}" +;
If you are on windows, Notepad++ allows you to run simple regexes on all opened files.
Search for ,\s*\] and replace with ]
should work for the type of lists you describe.