SED: remove pattern from two specific lines [duplicate] - regex

This question already has answers here:
How do I match multiple addresses in sed?
(4 answers)
Closed 7 years ago.
The following command will remove PATTERN from all the lines 13,14,15...29 containing it:
sed -i 13,29s/PATTERN// file
However, I want to remove PATTERN from only the 13th and 29th line. Obviously, I can use
sed -i '13s/PATTERN//;29s/PATTERN//' file
but my pattern is long enough to make this inconvenient so I would like to specify the PATTERN only once. Any ideas? I've tried to search for an answer but found nothing.
Also, is there a valid reason why sed uses a comma instead of dash to match a range of lines? I find it illogical.
Thanks in advance.

Use awk:
awk 'NR == 13 || NR == 29 { sub(/PATTERN/, "") } { print }' file
Of course, you have to use an awk-compatible regular expression here: https://www.gnu.org/software/gawk/manual/html_node/Regexp.html#Regexp
The first block does what you asked for, and the second block just prints every line. You can redirect the output to another file and then move that file over the original.
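The redirect-and-move step can be sketched as follows. The sample file and its contents are made up for illustration; only the awk command itself comes from the answer.

```shell
# Build a hypothetical 30-line sample file; every line contains the literal text PATTERN.
file=$(mktemp)
i=1
while [ "$i" -le 30 ]; do
    echo "line $i PATTERN"
    i=$((i + 1))
done > "$file"

# Write the edited copy to a temporary file, then move it over the original.
tmp=$(mktemp)
awk 'NR == 13 || NR == 29 { sub(/PATTERN/, "") } { print }' "$file" > "$tmp" && mv "$tmp" "$file"

# Inspect an edited line and one of its untouched neighbours.
line13=$(sed -n '13p' "$file")
line14=$(sed -n '14p' "$file")
printf '%s\n%s\n' "$line13" "$line14"
```

Chaining the mv with && means the original is only replaced if awk finished without an error.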

OK, the following is probably the closest thing to an answer to my question:
sed -i '13,29{14,28!s/PATTERN//}' file
A little longer answer with logical-or:
sed -i '13bA;29bA;b;:A;s/PATTERN//' file
Also, using a variable:
VAR=PATTERN
sed -i "13s/$VAR//;29s/$VAR//" file
Thanks to #Jeff Bowman for redirection, and #HuStmpHrrr for the advice to use a variable.
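A quick way to convince yourself that the three variants agree is to run them on the same input without -i and compare the results. This is only a sketch: the 30-line sample is invented, and the one-line syntax of the first two commands assumes GNU sed, as the answer's own commands do.

```shell
# Hypothetical 30-line input; every line contains the literal text PATTERN.
sample=$(i=1; while [ "$i" -le 30 ]; do echo "line $i PATTERN"; i=$((i + 1)); done)

# Variant 1: select lines 13-29, then skip lines 14-28 inside that range.
out_range=$(printf '%s\n' "$sample" | sed '13,29{14,28!s/PATTERN//}')

# Variant 2: lines 13 and 29 jump to label A; every other line ends its cycle at the bare "b".
out_branch=$(printf '%s\n' "$sample" | sed '13bA;29bA;b;:A;s/PATTERN//')

# Variant 3: keep the pattern in a shell variable (double quotes so $VAR expands).
VAR=PATTERN
out_var=$(printf '%s\n' "$sample" | sed "13s/$VAR//;29s/$VAR//")

# Count the lines that still contain PATTERN after the edit.
remaining=$(printf '%s\n' "$out_range" | grep -c 'PATTERN')
echo "$remaining"
```

With the variable approach, any character in $VAR that is special to sed (such as / or &) would still need escaping.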


Regex to extract string between two sets of underscores [duplicate]

This question already has answers here:
How to get String between last two underscore
(5 answers)
Closed 2 years ago.
I have been trying to extract a string for a directory path that contains multiple underscores as delimiters.
I'm trying on regex101 to extract foobar but can only get _pdf-documents_
regex
_([^_]+)_
directory path
/data/documents/2020/05/07/2020-05-07-12_pdf-documents_foobar_hour.abc.defg.log
If you only ever work with this exact string, you can use _([^_p]+)_ (excluding p keeps it from matching _pdf-documents_).
If awk is OK, then:
echo '/data/documents/2020/05/07/2020-05-07-12_pdf-documents_foobar_hour.abc.defg.log' |
awk -F'_' '{print $3}'
Output
foobar
Or, as Wiktor Stribiżew said in the comments, split the string on underscores, as I would in another language; this is the simplest, most maintainable, readable and reliable solution.
This worked for me.
.*_([^_]+)_.*
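For comparison, here are the splitting approach and the last regex side by side on the path from the question. Taking the second-to-last field with $(NF - 1) is my own variation on the awk answer, and it assumes the wanted piece always sits between the last two underscores.

```shell
path='/data/documents/2020/05/07/2020-05-07-12_pdf-documents_foobar_hour.abc.defg.log'

# Split on underscores and take the second-to-last field.
by_awk=$(printf '%s\n' "$path" | awk -F'_' '{ print $(NF - 1) }')

# The greedy leading .* pushes the capture group to the last pair of underscores.
by_sed=$(printf '%s\n' "$path" | sed -E 's/.*_([^_]+)_.*/\1/')

echo "$by_awk $by_sed"
```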

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. The same thing happens if I use X[a-z09] (but there may be "_" in my strings, and I want to include those as well). I had a look at several previous similar topics, but I do not seem to be able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix line endings. I tried dos2unix, but for some reason it did not work (maybe I forgot to write the output over the old file?). I later did it using sed -i 's/\r$//' test.txt ; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of much trouble, for me at least.
2) I probably did not make it clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the source of the misunderstanding with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
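To illustrate the line-ending problem from point 1, here is a small sketch on made-up DOS-style input. It assumes GNU sed, since \r inside a sed regex is a GNU extension, and the pattern X.* is just an example.

```shell
# With CR LF line endings, .* swallows the carriage return, so Z lands after it.
broken=$(printf 'Xabc\r\n' | sed -E 's/X.*/&Z/')

# Strip the trailing CR first, then append; the Z now follows the visible text.
fixed=$(printf 'Xabc\r\nXdef\r\n' | sed 's/\r$//' | sed -E 's/X.*/&Z/')
printf '%s\n' "$fixed"
```

On a terminal the broken result tends to look like the Z overwrote the start of the line, because the carriage return moves the cursor back to column one.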
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
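A small self-contained check of the & behaviour, using invented input that mixes target lines with one line that should stay untouched. I use -E here, which GNU sed accepts as another spelling of -r.

```shell
# Hypothetical input: two target lines and one line that must not change.
input=$(printf 'Xabc\nXdef\nkeep me\n')
instring='X...'

# & stands for whatever the pattern matched, so each match keeps its own text.
result=$(printf '%s\n' "$input" | sed -E "s/${instring}/&Z/g")
printf '%s\n' "$result"
```

Lines that do not match the pattern pass through unchanged, which covers the case where only some lines should be edited.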
EDIT: In case you need to pass a regex to the script, the following may help; my previous solution only appended a character to the end of each line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script (quote the regex so the shell does not expand it as a glob):
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with either a lowercase or an uppercase letter, run the script in the following style:
./script.ksh Input_file '[A-Za-z]+*'
Could you please try the following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running the script, I get the following output on the terminal:
./script.ksh Input_file Z
XabcZ
XdefZ
You could add sed's -i option to the above code in case you want to save the output into Input_file itself.

How to use sed and/or regex to trim a line in a file using bash? [duplicate]

This question already has answers here:
Regex to extract first 3 words from a string
(3 answers)
Closed 5 years ago.
This seems like it should be simple, but I've spent far too much time searching. How can I use sed and regex to trim off all words in a line after the fourth word?
For instance from:
19900101, This is a title
19091110, This is a really long title
I would like to have
19900101, This is a
19091110, This is a
I've tried answers like this one Regex to extract first 3 words from a string, but I'm using Mac OSX, so I get context address errors.
This is easily done using cut:
cut -d ' ' -f 1-4 file
19900101, This is a
19091110, This is a
Or using awk:
awk '{NF=4} 1' file
19900101, This is a
19091110, This is a
This might work for you (GNU sed):
sed 's/\s*\S*//5g' file
This removes the fifth and all later words from each line.
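Since the sed answer above is labelled GNU-specific and the question is about a Mac, here is a sketch that sticks to extended regular expressions via -E, which I would expect both GNU and BSD sed to accept. It assumes words are separated by plain spaces.

```shell
input=$(printf '19900101, This is a title\n19091110, This is a really long title\n')

# Keep three "word plus spaces" groups and a fourth word; drop the rest of the line.
trimmed=$(printf '%s\n' "$input" | sed -E 's/^(([^ ]+ +){3}[^ ]+).*/\1/')
printf '%s\n' "$trimmed"
```

Lines with fewer than four words do not match the pattern and are printed unchanged.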

shell sed - substitute an unknown string between a known string and a generic delimiter

Ok, so I know there is a similar post that I referred to already but does not fit the exact issue I am having.
For reference: replace a unknown string between two known strings with sed
I have a file with software=setting:value,setting2:value,setting3:value, etc...
My first attempt was to use the reference above with sed -i "/software/ s/setting:.*,/setting:,/" $fileName
However the wildcard references the last comma for that line, not the comma immediately following the match.
My current work around is:
sed -i "/software/ s/setting:[^,]*,/setting:,/" $fileName but this limits the ability to have a potential value that contains a comma wrapped inside quotations, etc... I know this is an unlikely scenario but I would like to have an ideal solution where the value can contain any character it would like and just to do a string replacement between the "setting:" and the comma immediately following the value of that particular setting.
Any help is appreciated. Thanks in advance!
This will work when the value contains at most one quoted part (with optional unquoted parts before and after it):
sed -E -i '' '/software/ s/setting:[^,"]*("[^"]*")?[^,"]*,/setting:,/' "$fileName"
echo 'software=setting:"value1,value2",setting2:value,setting3:value'|sed -E '/software/ s/setting:("[^"]*"|[^,"]*),/setting:,/'
This is with sed on OS X; GNU sed has slightly different options (for example, it takes -i without the empty '' argument).
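To check the alternation form against both a quoted and an unquoted value, something like the following can be used. The function name clear_setting and the second sample line are made up for this sketch.

```shell
quoted='software=setting:"value1,value2",setting2:value,setting3:value'
plain='software=setting:value,setting2:value,setting3:value'

# Hypothetical helper: blank the value of "setting", whether or not it is quoted.
clear_setting() {
    sed -E '/software/ s/setting:("[^"]*"|[^,"]*),/setting:,/'
}

out_quoted=$(printf '%s\n' "$quoted" | clear_setting)
out_plain=$(printf '%s\n' "$plain" | clear_setting)
printf '%s\n%s\n' "$out_quoted" "$out_plain"
```

Both inputs should come out with an empty value for setting and the other settings intact.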

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question would be the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line unchanged. If the pattern is present, it increments (or primes) the counter in the hold space and then appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
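exec 0<input_file
while IFS= read -r line; do
# Rebuild the suffix on every pass; otherwise sed keeps seeing the counter value from before the loop.
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$((__images_counter + 1))
done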
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
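Since the question itself mentions AWK as a fallback, here is a minimal awk sketch of a per-match counter. It targets the simple output shown in the question (\includegraphics{...}\label{img-N}) rather than the full figure environment, and the function name number_images and the sample input are invented.

```shell
# Hypothetical helper: rewrite every [[Image(name)]] on stdin with a running label number.
number_images() {
    awk '{
        line = $0; out = ""
        # Plain substring search avoids regex escaping of the brackets.
        while ((start = index(line, "[[Image(")) > 0) {
            rest = substr(line, start + 8)      # text after "[[Image("
            stop = index(rest, ")]]")
            if (stop == 0) break                # unterminated tag: leave as is
            n++                                 # counter persists across lines
            name = substr(rest, 1, stop - 1)
            out = out substr(line, 1, start - 1) "\\includegraphics{" name "}\\label{img-" n "}"
            line = substr(rest, stop + 3)
        }
        print out line
    }'
}

result=$(printf '%s\n' '[[Image(a.png)]] and [[Image(b.png)]]' 'plain text' '[[Image(c.png)]]' | number_images)
printf '%s\n' "$result"
```

Because the counter lives in an awk variable, it keeps increasing across lines and across several matches on one line, which is the case the line-by-line loop does not handle.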