Convert last hyphen in filename using BASH - regex

I've been tasked with a major file rename project. Some of these files that I'll be renaming contain multiple hyphens. I need to swap the last hyphen in the name to an underscore in order for the files to be renamed to our new naming convention.
Can anyone explain to me why the last hyphen is not being replaced with an underscore in the test code below?
#!/bin/bash
image_name="i-need-the-last-hyphen-removed.psd"
echo -e "Normal: ${image_name}"
echo "Changed: ${image_name/%-/_}"
The output I am looking for should mimic the following:
Normal: i-need-the-last-hyphen-removed.psd
Changed: i-need-the-last-hyphen_removed.psd
The script logic was created by following documentation found here: http://tldp.org/LDP/abs/html/string-manipulation.html
I've tried escaping the hyphen, but that was not fruitful. I was hoping this would prove to be a more elegant solution than the sed and/or BASH_REMATCH solutions I was working with in the past.
Any help would be great. Thank you in advance.

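I'd suggest using the rename tool for this kind of task. Its expression syntax is similar to a sed pattern. (This is the Perl-based rename; the util-linux rename found on some distributions takes different arguments.)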
rename 's/(.*)-/$1_/' *.psd
Since .* is greedy, (.*) captures everything up to the last '-', so only that last hyphen is replaced. The part after it is left unchanged.
The *.psd glob matches all .psd files in the current folder.
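If the Perl rename is not available, or you want to preview the result first, a plain shell loop with the same greedy pattern can print the mv commands without running them. This is only a sketch: preview_renames is a made-up helper name, and it assumes file names without newlines.

```shell
# preview_renames (hypothetical helper): print the mv command for each file
# whose last hyphen would become an underscore, without renaming anything.
preview_renames() {
  local f new
  for f in "$@"; do
    # Greedy \(.*\) runs up to the last '-', so only that hyphen changes.
    new=$(printf '%s\n' "$f" | sed 's/\(.*\)-/\1_/')
    # Skip names that contain no hyphen at all.
    [ "$f" = "$new" ] || printf 'mv -- %s %s\n' "$f" "$new"
  done
}

preview_renames i-need-the-last-hyphen-removed.psd plain.psd
# prints: mv -- i-need-the-last-hyphen-removed.psd i-need-the-last-hyphen_removed.psd
```

Once the preview looks right, the printf line can be swapped for a real mv -- "$f" "$new".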

Huge thanks to #alex-p for the following suggestion. As I originally stated I didn't want to use SED or BASH_REMATCH or any other complex REGEX. This worked flawlessly.
echo "${image_name%-*}_${image_name##*-}"
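For anyone reusing the parameter-expansion approach, here is a small sketch that wraps it in a function and guards against names with no hyphen at all. The function name replace_last_hyphen is my own and not part of the original suggestion.

```shell
# replace_last_hyphen (hypothetical helper): swap only the final '-' for '_'.
replace_last_hyphen() {
  local name=$1
  case $name in
    *-*)
      # ${name%-*}  drops the shortest suffix matching '-*' (last hyphen onward)
      # ${name##*-} drops the longest prefix matching '*-' (through last hyphen)
      printf '%s\n' "${name%-*}_${name##*-}"
      ;;
    *)
      # No hyphen: without this guard the name would be printed twice, joined by '_'.
      printf '%s\n' "$name"
      ;;
  esac
}

replace_last_hyphen "i-need-the-last-hyphen-removed.psd"
# prints: i-need-the-last-hyphen_removed.psd
```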

You can do it using sed as:
sed -r "s/(.*)-(.*)/\1_\2/"
This uses two captured groups (1. everything before the last -, 2. everything after it), which are joined with _.
Or
sed -r "s/-([^-]*$)/_\1/"
This uses one captured group (everything after the last -): the - is replaced with _, and the captured group is appended after it.

${image_name/%-/_} would only work if the - were the very last character (suffix) of $image_name (e.g. mystring-).
Try using sed:
$> echo i-need-the-last-hyphen-removed.psd | sed -r 's/-([^-]*$)/_\1/'
i-need-the-last-hyphen_removed.psd
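A quick way to see the anchoring behaviour of /% in bash is to try it on two values, one ending in a hyphen and one not. The variable names below are made up for the demonstration.

```shell
with_suffix="mystring-"
without_suffix="my-string"

# The pattern '-' is anchored to the end of the value, so this one matches...
trailing_result=${with_suffix/%-/_}
# ...while this value ends in 'g', so nothing is replaced.
inner_result=${without_suffix/%-/_}

echo "$trailing_result"   # prints: mystring_
echo "$inner_result"      # prints: my-string
```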

Related

sed with capturing group

I have strings like below
VIN_oFDCAN8_8d836e25_In_data;
IPC_FD_1_oFDCAN8_8d836e25_In_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_data
I want to insert _Moto in between as below
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data
But when I used sed with capturing group as below
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_*\(_data\)/_Moto_\1/'
I get output as:
VIN_oFDCAN8_8d836e25_Moto__data
Can you please point me to right direction?
You could use a simple substitution of the In string (assuming it occurs only once per line in your Input_file), but since you asked specifically for the capturing style in sed, you could try the following.
sed 's/\(.*_In\)\(.*\)/\1_Moto\2/g' Input_file
The above inserts the string _Moto (with a single leading underscore), which avoids ending up with two underscores after Moto. Thanks to #Bodo for mentioning this in the comments.
Issue with OP's attempt: _In_* is matched but not captured, so only \(_data\) is held as the first group. The replacement therefore drops _In, which is why it does not work. The fix above captures everything up to _In as well.
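To see the difference on the sample input, here is a small sketch that runs the original attempt next to a version that captures _In as well. The variable names are just for illustration.

```shell
line='VIN_oFDCAN8_8d836e25_In_data'

# Original attempt: '_In' is matched but not captured, so it is dropped, and
# the literal '_' after Moto plus the captured '_data' gives a double underscore.
broken=$(printf '%s\n' "$line" | sed 's/_In_*\(_data\)/_Moto_\1/')

# Capturing both sides keeps '_In' and inserts '_Moto' between the groups.
fixed=$(printf '%s\n' "$line" | sed 's/\(_In\)\(_data\)/\1_Moto\2/')

echo "$broken"   # prints: VIN_oFDCAN8_8d836e25_Moto__data
echo "$fixed"    # prints: VIN_oFDCAN8_8d836e25_In_Moto_data
```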
$ sed 's/_[^_]*$/_Moto&/' file
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data
In your case, you can directly replace the matching string with below command
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_data/_In_Moto_data/'

Using sed with regex to replace text on OSX and Linux

I am trying to replace some strings inside a file with sed using Regular Expressions. To complicate the matter, this is being done inside a Makefile script that needs to work on both osx and linux.
Specifically, within file.tex I want to replace
\subimport{chapters/}{xxx}
with
\subimport{chapters/}{xxx-yyy}
(xxx and yyy are just example text.)
Note, xxx could contain any letters, numbers, and _ (underscore) but really the regex can simply match anything inside the brackets. Sometimes there is some whitespace at the beginning of the line before \subimport....
The design of the string being searched for requires a lot of escaping (when searched for with regex) and I am guessing somewhere therein lies my error.
Here's what I've tried so far:
sed -i'.bak' -e 's/\\subimport\{chapters\/\}\{xxx\}/\\subimport\{chapters\/\}\{xxx-yyy\}/g' file.tex
# the -i'.bak' is required so SED works on OSX and Linux
rm -f file.tex.bak # because of this, we have to delete the .bak files after
This results in an error of RE error: invalid repetition count(s) when I build my Makefile that contains this script.
I thought part of my problem was that the -E option for sed was not available in the osx version of sed. It turns out, when using the -E option, fewer things should be escaped (see comments on my question).
With basic regular expressions (note that \+ is a GNU extension; a strictly POSIX sed needs \{1,\} instead):
sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#'
# is used as the delimiter for sed's s (substitute) command
\(\\subimport{chapters/}{[[:alnum:]_]\+\) is the captured group, containing everything up to the last }, which must be preceded by one or more letters, digits, or underscores
In the replacement, the first captured group is followed by the required string, closed by a }
Example:
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{foobar9}'
\subimport{chapters/}{foobar9-yyy}
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{spamegg923}'
\subimport{chapters/}{spamegg923-yyy}
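Since \+ in a basic regular expression is a GNU extension, the sed shipped with OSX may not accept it. Below is a sketch of the same substitution using the \{1,\} interval instead, which should behave the same on both platforms. add_suffix is a made-up wrapper name, and allowing leading blanks is my addition based on the question.

```shell
# add_suffix (hypothetical helper): append -yyy inside the second brace pair.
add_suffix() {
  # In a basic regex, bare { and } are literal; \{1,\} means "one or more".
  sed 's#^\([[:blank:]]*\\subimport{chapters/}{[[:alnum:]_]\{1,\}\)}$#\1-yyy}#'
}

printf '%s\n' '\subimport{chapters/}{foobar9}' | add_suffix
# prints: \subimport{chapters/}{foobar9-yyy}
```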
Here is the version that ended up working for me.
sed -i.bak -E 's#^([[:blank:]]*\\subimport{chapters/}{[[:alnum:]_]+)}$#\1-yyy}#' file.tex
rm -f file.tex.bak
Much thanks go to #heemayl. Their answer is the better written one, it simply required some tweaking to get a version that worked for me.

Using multicase regex with sed on jenkins

OK, so I'm making a choice parameterized Jenkins job. The choices for the parameters are DEV STAGING QA and PROD and they are stored in ${ENV}
I need to change the variable ${ENV} to match a string in a URL. I'm trying to do this with a sed command using regex. Is it possible?
I tested PROD|ING|(?<!Q)A as the regex in Expresso, and it finds the necessary portions, (A,ING,PROD) which would leave me with either DEV QA STG or `` as my variable value if I replaced them with '', then I'll add something onto the end of it.
When I try to run echo "DEVSTAGINGQAPROD" | sed "s/PROD|ING|(?<!Q)A//g" to remove those chars on CentOS it returns -bash: !Q: event not found. I want it to return DEVSTGQA
echo "DEVSTAGINGQAPROD" | sed "s/PROD|ING//g" returns DEVSTAGQA as it should. The problem I seem to be having is the lookbehind: removing the A only if it doesn't have a Q before it.
Any ideas how to make this work?
One problem here is that sed doesn't understand negative lookbehind. Another is your choice of quotes. History expansion is enabled by default in the shell, so ! has a special meaning and must be escaped inside double quotes.
To deal with the first problem, I'd suggest using Perl instead of sed, as it has a much more advanced regular expression engine. For the second, just use single quotes, within which the ! will be interpreted literally:
$ echo "DEVSTAGINGQAPROD" | perl -pe 's/PROD|ING|(?<!Q)A//g'
DEVSTGQA
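If Perl is not available on the Jenkins agent, the lookbehind can be approximated in sed by capturing the character before the A and putting it back. This is a rough sketch with extended regular expressions (-E); strip_env is a hypothetical helper name. It is not a true lookbehind: in a run of two A's only the first would be removed, because the match consumes the preceding character.

```shell
# strip_env (hypothetical helper): shorten an environment name for use in a URL.
strip_env() {
  # First drop PROD and ING, then drop any A that is not preceded by Q by
  # capturing the preceding character (or start of line) and restoring it.
  printf '%s\n' "$1" | sed -E 's/PROD|ING//g; s/(^|[^Q])A/\1/g'
}

strip_env DEVSTAGINGQAPROD
# prints: DEVSTGQA
```

Single quotes around the sed script also keep the shell from treating ! or $ specially.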

Bash, find and replace - re-use with variable?

I'm building a script in bash that goes and finds references to other files (such as a reference in an html file to an img source (image.jpg)).
The problem is that I'm using sed to replace all instances that contain (in this example) "/some/random/directory/image.jpg"
The "some/random/directory/image.jpg" is going to be different every single time, so when it comes to my sed line I need to use regex, but in order to find the line to replace I need to include image.jpg.
so for example my sed line would be something like
sed 's/\/some\/random\/directory\/image.jpg/images\/image.jpg/g'
But how do I get the end of what's in the find and put it into the replace? (In this example it would be image.jpg.) Is there some way to make that a variable?
Here's my script as it stands now:
#!/bin/bash
cd /home/username/www/immrqbe/
for file in $(grep -rlI ".jpg" *)
do
sed -e "s/\".*\/.*.jpg//ig" $file > /tmp/tempfile.tmp
mv /tmp/tempfile.tmp /home/username/www/immrqbe/$file
done
This obviously isn't functionally complete, as I need help with it, but you get the idea of how I'd like it to work.
What you're looking for is called a Backreference in the world of regular expressions. You want to refer back to a previously matched string.
There are a couple of ways to do this with sed, but what you want to use is the grouping mechanism: \( and \). Anything sed finds between \( and \) will be put into a group and you can refer back to that group using \n where n is the number of the group that you want to use, from left to right.
So, in your example, you want:
sed 's/".*\/\(.\+\.jpg\)"/\1/ig' file
Your filename will be in the \(.\+\.jpg\) group and you can then refer to it using \1 in the replacement section.
As a side note, notice that, as long as you don't want the shell to expand a variable in your quoted string, you can use single quotes and avoid escaping the double quotes in your pattern.
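Putting the pieces together for the question's case, here is a sketch that keeps the surrounding quotes and rewrites the path to images/ plus the file name. It uses | as the s delimiter so the slashes need no escaping. rewrite_img_paths is a made-up name, and the pattern assumes the path is wrapped in double quotes and contains at least one slash. It leaves out the case-insensitive i flag, which is a GNU extension.

```shell
# rewrite_img_paths (hypothetical helper): point quoted .jpg paths at images/.
rewrite_img_paths() {
  # "[^"]*/          a quote, then any directory part up to the last slash
  # \([^"/]*\.jpg\)  group 1: the bare file name
  sed 's|"[^"]*/\([^"/]*\.jpg\)"|"images/\1"|g'
}

printf '%s\n' '<img src="/some/random/directory/image.jpg">' | rewrite_img_paths
# prints: <img src="images/image.jpg">
```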
Use parentheses to capture the match and then refer to it using a backslash followed by the group number.
sed -e 's/".*\/\(.*\.jpg\)/\1/ig'

Repeating a regex pattern

First, I don't know if this is actually possible but what I want to do is repeat a regex pattern.
The pattern I'm using is:
sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt
An input of
250. 7.9 Shutter Island (2010) 110,675
Will return:
Shutter Island (2010)
I'm matching all non-tabs (250.), then a tab, then all non-tabs (7.9), then a tab. Next I backreference the film title, then match all remaining chars (110,675).
It works fine, but I'm learning regex and this looks ugly: the regex [^-\t]*\t is repeated just after itself. Is there any way to repeat this like you can a character, like a{2,2}?
I've tried ([^-\t]*\t){2,2} (and variations) but I'm guessing that is trying to match [^-\t]*\t\t?
Also if there is any way to make my above code shorter and cleaner any help would be greatly appreciated.
This works for me:
sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt
If your sed supports -r you can get rid of most of the escaping:
sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt
Change the first 2 to select different fields (0-3).
This will also work:
sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt
Change the 3 to select different fields (1-4).
To use repetition braces and grouping parentheses with sed properly, you may have to escape them with backslashes, like
sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt
Yes, this command will work properly with your example.
If the escaping annoys you, you can add the -r option, which enables extended regex mode, and drop the backslash escapes on the brackets.
sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt
I found that this is almost the same as Dennis Williamson's answer, but I'm leaving it because it's a shorter expression that does the same.
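One portability caveat: \t inside a bracket expression is understood by GNU sed but not necessarily by other implementations. Here is a sketch that sidesteps this by putting a literal tab into a shell variable. third_field and tab are names chosen here for illustration, and the sample line assumes the fields are tab-separated as the question describes.

```shell
# Hold a literal tab character so the regex does not depend on \t support.
tab=$(printf '\t')

# third_field (hypothetical helper): skip two tab-terminated fields, keep the third.
third_field() {
  sed -E "s/^([^$tab]*$tab){2}([^$tab]*).*/\2/"
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | third_field
# prints: Shutter Island (2010)
```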
I think you might be going about this the wrong way. If you simply want to extract the name of the film and its release year, then you could try this regex:
(?:\t)[\w ()]+(?:\t)
As seen in place here:
http://regexr.com?2sd3a
Note that it matches a tab character at the beginning and end of the actual desired string, but doesn't include them in the matching group.
You can repeat things by putting them in parentheses, like this:
([^-\t]*\t){2,2}
And the full pattern to match the title would be this:
([^-\t]*\t){2,2}([^-\t]+).*
You said you tried it. I'm not sure what is different, but the above worked for me on your sample data.
why are you doing things the hard way??
$ awk '{$1=$2=$NF=""}1' file
Shutter Island (2010)
If this is a tab-separated file with a regular format, I'd use cut instead of sed
cut -d$'\t' -f3 films.txt
Note that the delimiter must be a single tab: in bash, $'\t' produces one, or you can type a literal tab between quotes at the shell prompt by typing ctrl+v first, i.e. ctrl+v ctrl+i. Tab is also cut's default delimiter, so cut -f3 films.txt works too.
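For completeness, the same field extraction can be done in awk by splitting on tabs explicitly, which avoids the stray blanks left behind when fields are emptied. This is a minimal sketch, assuming the file is tab-separated as in the question; film_title is a hypothetical name.

```shell
# film_title (hypothetical helper): print the third tab-separated column.
film_title() {
  # -F'\t' splits each line on tabs; $3 is the title column.
  awk -F'\t' '{ print $3 }'
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | film_title
# prints: Shutter Island (2010)
```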