sed with capturing group - regex

I have strings like below
VIN_oFDCAN8_8d836e25_In_data
IPC_FD_1_oFDCAN8_8d836e25_In_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_data
I want to insert _Moto in between as below
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data
But when I used sed with capturing group as below
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_*\(_data\)/_Moto_\1/'
I get output as:
VIN_oFDCAN8_8d836e25_Moto__data
Can you please point me to right direction?

Although a simple substitution of the In string would do (provided it occurs only once per line in your Input_file), since you asked specifically for a capture-group approach in sed, you could try the following.
sed 's/\(.*_In\)\(.*\)/\1_Moto\2/g' Input_file
The command above inserts just _Moto (with no trailing underscore), which avoids a doubled _ after Moto. Thanks to #Bodo for mentioning this in the comments.
Issue with the OP's attempt: _In_* matches _In followed by zero or more underscores, and that part of the match is not captured, so only \(_data\) is kept in sed's memory as \1. The replacement _Moto_\1 therefore drops _In and also puts a second underscore in front of _data. The fix above captures everything up to and including _In as \1 and the rest as \2, then inserts _Moto between them.
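To make the difference concrete, here is a minimal sketch that runs the original attempt next to the two-group version. The function name insert_moto is only an illustrative helper, not something from the question.

```shell
# Hypothetical helper: keep everything up to and including _In as \1,
# the rest of the line as \2, and put _Moto between them.
insert_moto() {
    sed 's/\(.*_In\)\(.*\)/\1_Moto\2/'
}

# Original attempt: _In is matched but not captured, so it is lost.
printf '%s\n' VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_*\(_data\)/_Moto_\1/'
# prints: VIN_oFDCAN8_8d836e25_Moto__data

# Two-group version on all three sample strings.
printf '%s\n' \
    VIN_oFDCAN8_8d836e25_In_data \
    IPC_FD_1_oFDCAN8_8d836e25_In_data \
    BRAKE_FD_2_oFDCAN8_8d836e25_In_data | insert_moto
```

Because .* is greedy, \1 extends to the last _In on the line, which matters only if _In can occur more than once.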

$ sed 's/_[^_]*$/_Moto&/' file
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data

In your case, you can directly replace the matching string with below command
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_data/_In_Moto_data/'

Related

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. The same thing happens if I use X[a-z09] (but there may be "_" in my strings, and I want to include those as well). I had a look at several previous similar topics, but I do not seem to be able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix line endings. I tried dos2unix, but for some reason it did not work (maybe I forgot to write the output over the old file?). I later did it using sed -i 's/\r$//' test.txt; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of a lot of trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input file; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the source of the misunderstanding with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
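For anyone hitting the same DOS line-ending problem, here is a small sketch of why the carriage return matters. It swaps in tr -d '\r' instead of the GNU-specific sed 's/\r$//' so that it does not depend on sed understanding \r; note that tr removes every carriage return, not only those at the end of a line. The name strip_cr is just an illustrative helper.

```shell
# Hypothetical helper: delete all carriage returns from the input stream.
strip_cr() {
    tr -d '\r'
}

# In a DOS-formatted line the \r sits before the newline, so X.* matches it
# as well and Z ends up after the \r instead of after the visible text.
# Stripping the \r first puts Z where it is expected.
printf 'Xabc\r\nXdef\r\n' | strip_cr | sed 's/X.*/&Z/'
# prints:
# XabcZ
# XdefZ
```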
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
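As a quick illustration of the & marker, here is a sketch that wraps the substitution in a function and feeds it a regex. The name append_z and the sample lines are assumptions made for the example; -E is used here as the portable spelling of the -r option shown above.

```shell
# Hypothetical helper: append Z to whatever the regex in $1 matched.
# The regex is interpolated into the sed script, so it must not contain
# an unescaped / character.
append_z() {
    sed -E "s/$1/&Z/g"
}

# Lines that do not match the pattern pass through unchanged.
printf '%s\n' Xabc Xdef Yghi | append_z 'X...'
# prints:
# XabcZ
# XdefZ
# Yghi
```

Quoting the regex on the command line ('X...') keeps the shell from treating characters such as * or [ as glob patterns before sed ever sees them.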
EDIT: In case you need to pass a regex to the script, the following may help; my previous solution only appended a string to the end of each line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with either a lowercase or an uppercase letter, run the script in the following style.
./script.ksh Input_file '[A-Za-z].*'
Could you please try following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running script I am getting following output on terminal too.
./script.ksh Input_file Z
XabcZ
XdefZ
You could use sed's -i option in the above code if you want to save the output back into Input_file itself.

Replace last occurrence of space with sed

I need to replace the last occurrence of space in an input file, using sed.
What I came up with is
sed "s/([ ])[0-9]*$/,/g"
However, it does not seem to want to remember the space which it's supposed to replace. Running the command without round brackets works fine (for what it's supposed to do - replace the space and the chain of numbers). When I add the brackets, it does nothing.
Yes, I am aware of this solution, however when trying to pass \1 to sed, it screams that "\1 not defined in the RE".
Anyone care to help? It seems to be a simple issue, I'd be glad to know the solution.
This seemed to work "the first time" (yay) ...
$ sed -e 's/ \([^ ][^ ]*\)$/,\1/' /etc/hosts
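The "\1 not defined in the RE" error comes from the grouping syntax: in sed's default (basic) regex mode a bare ( is a literal parenthesis, and groups are written \( ... \). A minimal sketch of the same idea in extended mode, where plain parentheses do group; the function name and sample line are made up for the example.

```shell
# Hypothetical helper: replace the last space on each line with a comma.
# With -E, ( ) create a capture group without backslashes.
last_space_to_comma() {
    sed -E 's/ ([^ ]*)$/,\1/'
}

printf '%s\n' 'alpha beta 123' | last_space_to_comma
# prints: alpha beta,123
```

The group [^ ]* cannot contain a space, so the only space that can be followed by it and the end of the line is the last one.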

Convert last hyphen in filename using BASH

I've been tasked with a major file rename project. Some of these files that I'll be renaming contain multiple hyphens. I need to swap the last hyphen in the name to an underscore in order for the files to be renamed to our new naming convention.
Can anyone explain to me why the last hyphen is not being replaced with an underscore in the test code below?
#!/bin/bash
image_name="i-need-the-last-hyphen-removed.psd"
echo -e "Normal: ${image_name}"
echo "Changed: ${image_name/%-/_}"
The output I am looking for should mimic the following:
Normal: i-need-the-last-hyphen-removed.psd
Changed: i-need-the-last-hyphen_removed.psd
The script logic was created by following documentation found here: http://tldp.org/LDP/abs/html/string-manipulation.html
I've tried escaping the hyphen but that was not fruitful. I've given up for now, although this would prove to be the most elegant solution compared with the SED and/or BASH_REMATCH solutions I was working with in the past.
Any help would be great. Thank you in advance.
I'd suggest using the rename tool for this kind of task. Its pattern syntax is similar to sed's.
rename 's/(.*)-/$1_/' *.psd
Since .* is greedy, the last '-' will be the one matched, while everything before it is captured in the group (.*). The part to the right of it is not changed.
With *.psd you will catch all psd files in current folder
Huge thanks to #alex-p for the following suggestion. As I originally stated I didn't want to use SED or BASH_REMATCH or any other complex REGEX. This worked flawlessly.
echo "${image_name%-*}_${image_name##*-}"
You can do it using sed as:
sed -r "s/(.*)-(.*)/\1_\2/"
This will have two captured groups (1. before the last -, 2. after the last -), which will be concatenated with _
Or
sed -r "s/-([^-]*$)/_\1/"
This will have one captured group: the - is replaced with _ and the captured group is then appended after it.
${image_name/%-/_} would only work if the - were the very end (suffix) of $image_name (as in, e.g., mystring-).
Try using sed:
$> echo i-need-the-last-hyphen-removed.psd | sed -r 's/-([^-]*$)/_\1/'
i-need-the-last-hyphen_removed.psd
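For completeness, here is a sketch of the pure parameter-expansion route: ${name%-*} strips the shortest suffix starting at a hyphen, and ${name##*-} strips the longest prefix ending in a hyphen. The function name and the guard for names without any hyphen are additions for the example, not part of the answers above.

```shell
# Hypothetical helper: swap the last hyphen in $1 for an underscore.
replace_last_hyphen() {
    case $1 in
        # Only split when there is at least one hyphen; otherwise both
        # expansions would return the whole name and it would be doubled.
        *-*) printf '%s\n' "${1%-*}_${1##*-}" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

replace_last_hyphen "i-need-the-last-hyphen-removed.psd"
# prints: i-need-the-last-hyphen_removed.psd
```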

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For what I would like to keep using sed. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question would be the following: can I use a backreference in sed to create a replacement that is different for each of the matches of sed? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further cases of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
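while IFS= read -r line; do
# Rebuild the suffix on every pass so that it picks up the current counter value.
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$((__images_counter + 1))
done < input_file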
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
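Here is a rough, self-contained sketch of the line-by-line idea, simplified to the \includegraphics{...}\label{img-N} form from the question. It assumes at most one image per line, starts numbering at 1, and only advances the counter on lines that actually contain an image. The function name number_images is made up for the example.

```shell
# Hypothetical helper: number each [[Image(...)]] line in the input.
number_images() {
    counter=1
    while IFS= read -r line; do
        case $line in
            *'[[Image('*')]]'*)
                # Inside double quotes \\\\ reaches sed as \\, which sed
                # emits as a single literal backslash.
                printf '%s\n' "$line" |
                    sed "s/\[\[Image(\([^)]*\))]]/\\\\includegraphics{\1}\\\\label{img-$counter}/"
                counter=$((counter + 1))
                ;;
            *)
                # Lines without an image are passed through untouched.
                printf '%s\n' "$line"
                ;;
        esac
    done
}

printf '%s\n' '[[Image(mypicture.png)]]' 'some text' '[[Image(other.png)]]' | number_images
# prints:
# \includegraphics{mypicture.png}\label{img-1}
# some text
# \includegraphics{other.png}\label{img-2}
```

Starting one sed process per matching line is slow on large files, so this is only meant to show where the counter has to live: in the shell, outside the sed expression.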

Repeating a regex pattern

First, I don't know if this is actually possible but what I want to do is repeat a regex pattern.
The pattern I'm using is:
sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt
An input of
250. 7.9 Shutter Island (2010) 110,675
Will return:
Shutter Island (2010)
I'm matching all non-tabs (250.), then a tab, then all non-tabs (7.9), then a tab. Next I backreference the film title, then match all remaining characters (110,675).
It works fine, but I'm learning regex and this looks ugly: the regex [^-\t]*\t is repeated just after itself. Is there any way to repeat this like you can with a character, as in a{2,2}?
I've tried ([^-\t]*\t){2,2} (and variations) but I'm guessing that is trying to match [^-\t]*\t\t?
Also if there is any way to make my above code shorter and cleaner any help would be greatly appreciated.
This works for me:
sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt
If your sed supports -r you can get rid of most of the escaping:
sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt
Change the first 2 to select different fields (0-3).
This will also work:
sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt
Change the 3 to select different fields (1-4).
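The repeated-group form can be checked against the sample line with a short sketch. It builds the tab with printf rather than relying on \t inside a bracket expression, which GNU sed accepts but which is not guaranteed elsewhere; the function name third_field is made up for the example.

```shell
# Hypothetical helper: print the third tab-separated field of each line.
third_field() {
    tab=$(printf '\t')
    # ([^TAB]*TAB){2} consumes the first two fields; \2 is the third.
    sed -E "s/^([^$tab]*$tab){2}([^$tab]*).*/\2/"
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | third_field
# prints: Shutter Island (2010)
```

The {2} applies to the whole parenthesised group, so it means "non-tabs then a tab, twice", not "a tab twice".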
To use repeating curly brackets and grouping brackets with sed properly, you may have to escape it with backslashes like
sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt
Yes, this command will work properly with your example.
If you find the escaping annoying, you can add the -r option, which enables extended regex mode and lets you drop the backslash escapes on the brackets.
sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt
I found that this is almost the same as Dennis Williamson's answer, but I'm leaving it because it's a shorter expression that does the same thing.
I think you might be going about this the wrong way. If you simply want to extract the name of the film and its release year, then you could try this regex:
(?:\t)[\w ()]+(?:\t)
As seen in place here:
http://regexr.com?2sd3a
Note that it matches a tab character at the beginning and end of the actual desired string, but doesn't include them in the matching group.
You can repeat things by putting them in parentheses, like this:
([^-\t]*\t){2,2}
And the full pattern to match the title would be this:
([^-\t]*\t){2,2}([^-\t]+).*
You said you tried it. I'm not sure what is different, but the above worked for me on your sample data.
why are you doing things the hard way??
$ awk '{$1=$2=$NF=""}1' file
Shutter Island (2010)
If this is a tab separated file with a regular format I'd use cut instead of sed
cut -d' ' -f3 films.txt
Note that there is a single literal tab between the quotes after the -d, which can be typed at the shell prompt by pressing ctrl+v first, i.e. ctrl+v ctrl+i. Since tab is already cut's default delimiter, cut -f3 films.txt is equivalent.