sed conditional replace of a variable - regex

I have a file containing a bunch of codenumbers that are generally of the form integer.integer. The first integer is required; the second may be empty. For example, 123.45, 12.345 and 12 are all valid codenumbers.
I want to use sed to change each of these lines into
job{123}subjob{45}
job{12}subjob{345}
job{12}
So far I have
sed -e 's/codenumber{\([0-9]*\)\.*\([0-9]*\)}/job{\1}subjob{\2}/g'
which results in
job{123}subjob{45}
job{12}subjob{345}
job{12}subjob{}
Is there a way for sed to print a default value, say 0, when the group \2 is empty? The last line of the example would then read
job{12}subjob{0}
I suppose this could be done with two sed runs, but I am interested in whether it is possible with one.

You could simply extend your sed command to patch up empty subjob numbers:
sed -e 's/codenumber{\([0-9]*\)\.*\([0-9]*\)}/job{\1}subjob{\2}/g' \
-e 's/subjob{}/subjob{0}/g'
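As a quick check, here is a minimal sketch that wraps the two expressions above in a function and feeds it the three sample codenumbers from the question. The function name convert_codes is only an illustration, and the input is assumed to use the codenumber{...} wrapper that the question's regex expects.

```shell
# convert_codes is a hypothetical helper name; the two -e expressions
# are exactly the ones shown in the answer above.
convert_codes() {
  sed -e 's/codenumber{\([0-9]*\)\.*\([0-9]*\)}/job{\1}subjob{\2}/g' \
      -e 's/subjob{}/subjob{0}/g'
}

printf '%s\n' 'codenumber{123.45}' 'codenumber{12.345}' 'codenumber{12}' | convert_codes
# job{123}subjob{45}
# job{12}subjob{345}
# job{12}subjob{0}
```

Both expressions run in a single sed process, so this stays a one-pass solution even though it uses two substitutions.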

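I don't think this is possible in a single sed substitution. But you can indeed do two sed runs (they're really fast, so it shouldn't be a problem), the second one removing the empty subjob. Note that in sed's basic regular expressions the braces are literal and must not be escaped:
sed -e 's/subjob{}//g'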

This might work for you (GNU sed):
sed 's/codenumber/job/;s/\./}subjob{/;/subjob/!s/$/subjob{0}/' file

I know this is a bit late and does not strictly answer the question, but if someone lands here as I did and does not mind using Perl instead, the expression is:
s/codenumber{([0-9]*)\.*([0-9]*)}/job{$1}subjob{${\($2?$2:"0")}}/g
e.g. in:
echo -e 'codenumber{123.45}\ncodenumber{12.345}\ncodenumber{12}' | perl -e 'while(<STDIN>) { print s/codenumber{([0-9]*)\.*([0-9]*)}/job{$1}subjob{${\($2?$2:"0")}}/gr;}'
The point is that Perl lets you use the interpolation ${\(EXPRESSION)} in the replacement part of a substitution, so the replacement can be computed from the matched value with a Perl expression.

Related

Extracting a match from a string with sed and a regular expression in bash

In bash, I want to get the name of the last folder in a folder path.
For instance, given ../parent/child/, I want "child" as the output.
In a language other than bash, this regex works: .*\/(.*)\/$
Here's one of my attempts in bash:
echo "../parent/child/" | sed "s_.*/\(.*?\)/$_\1_p"
This gives me the error:
sed: -e expression #1, char 17: unterminated `s' command
What have I failed to understand?
One problem with your script is that inside the "s_.*/\(.*?\)/$_\1_p" the $_ is interpreted by the shell as a variable name.
You could either replace the double-quotes with single-quotes or escape the $.
Once that's fixed, the .*? may or may not work with your implementation of sed. It will be more robust to write something roughly equivalent that's more widely supported, for example:
sed -e 's_.*/\([^/]*\)/$_\1_'
Note that I dropped the p flag of sed to avoid printing the result twice.
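To see the fixed command in action, here is a small sketch using single quotes so that the shell leaves $_ alone. The function name last_component is hypothetical and only there to make the example easy to reuse.

```shell
# last_component is a hypothetical helper name.
# Single quotes stop the shell from expanding $_ inside the sed script.
last_component() {
  sed -e 's_.*/\([^/]*\)/$_\1_'
}

echo "../parent/child/" | last_component
# child
```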
A much simpler solution is to use the basename command.
$ basename ../parent/child/
child
Finally, a native Bash solution is also possible using parameter expansion:
path=../parent/child/
path=${path%/}
path=${path##*/}
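The two expansions above can be wrapped in a small function, sketched here under the assumption that the path has at most one trailing slash. The name last_dir is made up for this example.

```shell
# last_dir is a hypothetical helper name.
last_dir() {
  p=${1%/}                 # drop one trailing slash, if present
  printf '%s\n' "${p##*/}" # drop everything up to and including the last slash
}

last_dir ../parent/child/
# child
```

No external process is started, which can matter when this runs in a tight loop.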
You can use cut too, provided the folder you want is always the third /-separated field:
echo '../parent/child/' | cut -d/ -f3

shell sed - substitute an unknown string between a known string and a generic delimiter

Ok, so I know there is a similar post that I referred to already but does not fit the exact issue I am having.
For reference: replace a unknown string between two known strings with sed
I have a file with software=setting:value,setting2:value,setting3:value, etc...
My first attempt was to use the reference above with sed -i "/software/ s/setting:.*,/setting:,/" $fileName
However the wildcard references the last comma for that line, not the comma immediately following the match.
My current work around is:
sed -i "/software/ s/setting:[^,]*,/setting:,/" $fileName but this limits the ability to have a potential value that contains a comma wrapped inside quotations, etc... I know this is an unlikely scenario but I would like to have an ideal solution where the value can contain any character it would like and just to do a string replacement between the "setting:" and the comma immediately following the value of that particular setting.
Any help is appreciated. Thanks in advance!
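This will work for values with at most one quoted part (with optional unquoted parts before and after it). The optional group needs extended regular expressions, hence -E:
sed -E -i '/software/ s/setting:[^,"]*("[^"]*")?[^,"]*,/setting:,/' "$fileName"
echo 'software=setting:"value1,value2",setting2:value,setting3:value'|sed -E '/software/ s/setting:("[^"]*"|[^,"]*),/setting:,/'
This is with sed on OS X; GNU sed has slightly different options. In particular, the OS X sed expects a backup suffix argument after -i (an empty string, -i '', for no backup), while GNU sed takes -i without one.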
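Here is a minimal sketch of the second variant applied to both a quoted and an unquoted value. The function name clear_setting and the second sample line are assumptions made for illustration; it reads from standard input instead of editing a file in place, which sidesteps the -i differences.

```shell
# clear_setting is a hypothetical helper name; it filters stdin to stdout.
# The value is either one double-quoted string or a run without commas/quotes.
clear_setting() {
  sed -E '/software/ s/setting:("[^"]*"|[^,"]*),/setting:,/'
}

echo 'software=setting:"value1,value2",setting2:value,setting3:value' | clear_setting
# software=setting:,setting2:value,setting3:value
echo 'software=setting:plain,setting2:value' | clear_setting
# software=setting:,setting2:value
```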

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The regex and bash code I am currently using are the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get the counter to increase. Any ideas?
From a more general perspective, my question is: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of that function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate the help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line unchanged. If the pattern is present, it increments (or initialises) the counter in the hold space and then appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
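while IFS= read -r line; do
  case $line in
    *'[[Image('*)
      # rebuild the suffix so that it picks up the current counter value
      __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
      __images_counter=$((__images_counter + 1)) ;;
  esac
  printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
done < input_file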
(This won't work if there are multiple matches in a line, though.)
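For a version that can be run on its own, here is a simplified sketch of the same line-by-line idea. It assumes one image per line, uses a bare \includegraphics{...}\label{img-N} replacement instead of the question's figure variables, and the function name label_images is made up for the example.

```shell
# label_images is a hypothetical helper name; it filters stdin to stdout.
# Only lines that contain an image tag advance the counter.
label_images() {
  counter=0
  while IFS= read -r line; do
    case $line in
      *'[[Image('*)
        counter=$((counter + 1))
        printf '%s\n' "$line" |
          sed -e "s/\[\[Image(\([^)]*\))\]\].*/\\\\includegraphics{\1}\\\\label{img-$counter}/"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

printf '%s\n' '[[Image(a.png)]]' 'some text' '[[Image(b.png)]]' | label_images
# \includegraphics{a.png}\label{img-1}
# some text
# \includegraphics{b.png}\label{img-2}
```

Starting one sed process per matching line is slow on large files, so this is a sketch of the idea, not something tuned for throughput.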
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.

How to print only matches with sed?

Okay, this is an easy one, but I can't figure it out.
Basically I want to extract all links ([^<>]*) from a big html file.
I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:
sed 's_[^<>]*_TEST_g'
If I run that on something like
<div>A google link</div>
<div>A google link</div>
I get
<div>TEST</div>
<div>TEST</div>
How can I get rid of everything else and just print the matches instead? My preferred end result would be:
A google link
A google link
PS. I know that my regexp is not the most flexible one, but it's enough for my intentions.
Match the whole line, put the interesting part in a group, replace by the content of the group. Use the -n option to suppress non-matching lines, and add the p modifier to print the result of the s command.
sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
Note that if there are multiple links on the line, this only prints the last link. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline after every closing </a> tag so that each link ends up on its own line, then extract the links. (Writing \n in the replacement works in GNU sed.)
sed -n -e 's!</[Aa]>!&\n!gp' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
This still doesn't handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
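A minimal sketch of the single-link case follows. The sample HTML lines and the example.com URL are invented for illustration, as is the function name extract_link; the sed expression is the one from this answer.

```shell
# extract_link is a hypothetical helper name; it filters stdin to stdout.
# -n suppresses default output, and the p flag prints only substituted lines.
extract_link() {
  sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
}

printf '%s\n' \
  '<div><a href="http://example.com">A link</a></div>' \
  '<p>no link here</p>' | extract_link
# <a href="http://example.com">A link</a>
```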
If you don't mind using perl like sed, it can cope with very diverse input:
perl -n -e 's+(<a href=.*?</a>)+ print $1, "\n" +eg;'
Assuming that there is only one hyperlink per line the following may work...
sed -e 's_.*<a href=_<a href=_' -e 's_>.*_>_'
This might work for you (GNU sed):
sed '/<a href\>/!d;s//\n&/;s/[^\n]*\n//;:a;$!{/>/!{N;ba}};y/\n/ /;s//&\n/;P;D' file

Sed substitution not doing what I want and think it should do

I am trying to use sed to get some info that is encoded within the path of a file which is passed as a parameter to my script (Bourne sh, if it matters).
From this example path, I'd like the result to be 8
PATH=/foo/bar/baz/1-1.8/sing/song
I first got the regex close by using sed as grep:
echo $PATH | sed -n -e "/^.*\/1-1\.\([0-9][0-9]*\).*/p"
This properly recognized the string, so I edited it to make a substitution out of it:
echo $PATH | sed -n -e "s/^.*\/1-1\.\([0-9][0-9]*\).*/\1/"
But this doesn't produce any output. I know I'm just not seeing something simple, but would really appreciate any ideas about what I'm doing wrong or about other ways to debug sed regular expressions.
(edit)
In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to match exactly the component of the form 1-1.some-number and see what some-number is.
It is also possible to have an input string that the regular expression should not match, which should produce no output.
The -n option to sed suppresses the normal output, and since the s command in your second attempt has no p flag, nothing is output. Either get rid of the -n or put a p back on the end.
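Since the question also wants no output for input that does not match, keeping -n and adding the p flag is probably the better fit of the two options. A minimal sketch, with the hypothetical function name extract_number and the second sample path invented for illustration:

```shell
# extract_number is a hypothetical helper name; it filters stdin to stdout.
# -n plus the p flag: print only lines where the substitution happened.
extract_number() {
  sed -n -e 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p'
}

echo /foo/bar/baz/1-1.8/sing/song | extract_number
# 8
echo /foo/bar/baz/2-2.8/sing/song | extract_number
# (prints nothing)
```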
It looks like you're trying to get the 8 from the 1-1.8 (where 8 is any sequence of numerics), yes? If so, I would just use:
echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
No doubt you could get it working with one sed "instruction" (-e) but sometimes it's easier just to break it down.
The first strips out everything from the start up to and including 1-1., the second strips from the first non-numeric after that to the end.
$ echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
8
$ echo /foo/bar/baz/1-1.752/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
752
And, as an aside, this is actually how I debug sed regular expressions. I put simple ones in independent instructions (or independent part of a pipeline for other filtering commands) so I can see what each does.
Following your edit, this also works:
$ echo /foo/bar/baz/1-1.962/sing/song | sed -e "s/.*\/1-1\.\([0-9][0-9]*\).*/\1/"
962
As to your comment:
In the example path, the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to match exactly the component of the form 1-1.some-number and see what some-number is.
The two-part sed command I gave you should work with numerics anywhere in the string (as long as there's no 1-1. after the one you're interested in). That's because it deletes everything up to and including the specific 1-1. string, and then everything from the first non-numeric character onward. If you have some examples that don't work as expected, toss them into the question as an update and I'll adjust the answer.
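You can shorten your command by using \+ (one or more; a GNU extension in basic regular expressions) instead of repeating the character class with * (zero or more). If you keep -n, keep the p flag as well:
sed -n -e "s/^.*\/1-1\.\([0-9]\+\).*/\1/p"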
Don't use PATH as your variable name. It clashes with the PATH environment variable, which the shell uses to find commands.
echo $path|sed -e's/.*1-1\.//;s/\/.*//'
You needn't delimit your patterns with / (s/a/b/g); you may choose almost any other character. If you're dealing with paths, # is more convenient than /:
echo /foo/1-1.962/sing | sed -e "s#.*/1-1\.\([0-9]\+\).*#\1#"