Sed regex to find-replace version numbers - regex

I'm new to sed, trying to write a script to find/replace text in a file. The file (test.txt) looks like this;
hello_world (1.2.0.123)
and I'm finding that this script (which I inherited):
sed -i 's/\(^\s*hello_world \)(.*)/\1hello_world (1.2.0.456)/' test.txt
is leading to;
hello_world hello_world (1.2.0.456)
when I need it to be
hello_world (1.2.0.456)
I'm not sure how to make the first part match only the parentheses, any assistance would be appreciated.
EDIT
The whitespace before the hello_world is important
The sed line is being auto-generated using variables etc. I'm looking for a way to make this regex work without changing that. The variables I have to play with are
variable1: hello_world
variable2: hello_world (1.2.0.456)
(hopefully it's obvious where these variables sat within the sed expression)
EDIT
I got this sorted in the end, answer below if anyone else is interested.

Got it
sed -i 's/\(^\s*\)phoenix_utils (.*)/\1phoenix_utils (1.0.0.28583)/' test.txt

sed -i -e 's/^\([[:blank:]]*hello_world \).*/\1(1.0.0.28583)/' YourFile
\1 is the content of first ( ) so \1Helloworld write it twice in your sample
be carefull with escape content depending of -e or not (behavior change and non GNU sed often need to escape (for grouping pattern)

Related

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
See an online demo.
You may see how this regex works here (just a demo, not proof).
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs, and the + quantifier, else they denote literal symbols. In POSIX ERE, you do not need to escape the special chars to make them special, this POSIX flavor is closer to the modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.
A simple sed will do it:
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Using sed with regex to replace text on OSX and Linux

I am trying to replace some strings inside a file with sed using Regular Expressions. To complicate the matter, this is being done inside a Makefile script that needs to work on both osx and linux.
Specifically, within file.tex I want to replace
\subimport{chapters/}{xxx}
with
\subimport{chapters/}{xxx-yyy}
(xxx and yyy are just example text.)
Note, xxx could contain any letters, numbers, and _ (underscore) but really the regex can simply match anything inside the brackets. Sometimes there is some whitespace at the beginning of the line before \subimport....
The design of the string being searched for requires a lot of escaping (when searched for with regex) and I am guessing somewhere therein lies my error.
Here's what I've tried so far:
sed -i'.bak' -e 's/\\subimport\{chapters\/\}\{xxx\}/\\subimport\{chapters\/\}\{xxx-yyy\}/g' file.tex
# the -i'.bak' is required so SED works on OSX and Linux
rm -f file.tex.bak # because of this, we have to delete the .bak files after
This results in an error of RE error: invalid repetition count(s) when I build my Makefile that contains this script.
I thought part of my problem was that the -E option for sed was not available in the osx version of sed. It turns out, when using the -E option, fewer things should be escaped (see comments on my question).
POSIX-ly:
sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#'
# is used as the parameter separator for sed's s (Substitution)
\(\\subimport{chapters/}{[[:alnum:]_]\+\) is the captured group, containing everything required upto last }, preceeded by one or more alphabetics, digits, and underscore
In the replacement, the first captured group is followed by the required string, closed by a }
Example:
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{foobar9}'
\subimport{chapters/}{foobar9-yyy}
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{spamegg923}'
\subimport{chapters/}{spamegg923-yyy}
Here's is the version that ended up working for me.
sed -i.bak -E 's#^([[:blank:]]*\\subimport{chapters/}{[[:alnum:]_]+)}$#\1-yyy}#' file.tex
rm -f file.tex.bak
Much thanks go to #heemayl. Their answer is the better written one, it simply required some tweaking to get a version that worked for me.

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For what I would like to keep using sed. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot make that counter to be increased. Any ideas?
Within a more general perspective, my question would be the following: can I use a backreference in sed for creating a replacement that is different for each of the matches of sed? This is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and if not presents prints the current line. If the pattern is present it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and counter. Lastly the current line is checked for further cases of the pattern and the procedure repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
exec 0<input_file
while read line; do
echo $line | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$(expr $__images_counter + 1)
done
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.

Print RegEx matches using SED in bash

I have an XML file, the file is made up of one line.
What I am trying to do is extract the "finalNumber" attribute value from the file via Putty. Rather than having to download a copy and search using notepad++.
I've built up a regular expression that I've tested on an On-line Tool, and tried using it within a sed command to duplicate grep functionality. The command runs but doesn't return anything.
RegEx:
(?<=finalNumber=")(.*?)(?=")
sed Command (returns nothing, expected 28, see file extract):
sed -n '/(?<=finalNumber=")(.*?)(?=")/p' file.xml
File Extract:
...argo:finalizedDate="2012-02-09T00:00:00.000Z" argo:finalNumber="28" argo:revenueMonth=""...
I feel like I am close (i could be wrong), am I on the right lines or is there better way to achieve the output?
Nothing wrong with good old grep here.
grep -E -o 'finalNumber="[0-9]+"' file.xml | grep -E -o '[0-9]+'
Use -E for extended regular expressions, and -o to print only the matching part.
Though you already select an answer, here is a way you can do in pure sed:
sed -n 's/^.*finalNumber="\([[:digit:]]\+\)".*$/\1/p' <test
Output:
28
This replaces the entire line by the match number and print (because p will print the entire line so you have to replace the entire line)
This might work for you (GNU sed):
sed -r 's/.*finalNumber="([^"]*)".*/\1/' file
sed does not support look-ahead assertions. Perl does, though:
perl -ne 'print $1 if /(?<=finalNumber=")(.*?)(?=")/'
As I understand, there is no need to use look-aheads here.
Try this one
sed -n '/finalNumber="[[:digit:]]\+"/p'

How to print only matches with sed?

Okay, this is an easy one, but I can't figure it out.
Basically I want to extract all links ([^<>]*) from a big html file.
I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:
sed 's_[^<>]*_TEST_g'
If I run that on something like
<div>A google link</div>
<div>A google link</div>
I get
<div>TEST</div>
<div>TEST</div>
How can I get rid of everything else and just print the matches instead? My preferred end result would be:
A google link
A google link
PS. I know that my regexp is not the most flexible one, but it's enough for my intentions.
Match the whole line, put the interesting part in a group, replace by the content of the group. Use the -n option to suppress non-matching lines, and add the p modifier to print the result of the s command.
sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
Note that if there are multiple links on the line, this only prints the last link. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline before any two links, then extract the links.
sed -n -e 's!</a>!&\n!p' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
This still doesn't handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
If you don't mind using perl like sed it can copy with very diverse input:
perl -n -e 's+(<a href=.*?</a>)+ print $1, "\n" +eg;'
Assuming that there is only one hyperlink per line the following may work...
sed -e 's_.*&lta href=_&lta href=_' -e 's_>.*_>ed &lt&lt'EOF'
-e 's_.*&lta href=_&lta href=_' -e 's_>.*_>_'
This might work for you (GNU sed):
sed '/<a href\>/!d;s//\n&/;s/[^\n]*\n//;:a;$!{/>/!{N;ba}};y/\n/ /;s//&\n/;P;D' file