Alternatives to sed extended regular expressions


I need to run a regular expression to match part of a string. On OS X I would do:
echo "$string" | sed -E 's/blah(.*)blah/\1/g'
However, this use of sed isn't compatible with other platforms, many of which would invoke the same command using sed -r.
So what I'm looking for is either a good way to detect which option to use, or a widely available (and compatible) alternative to sed that I can try to do the same thing (retrieve part of a string using a pattern).

There are alternatives like awk, perl, tr or even pure bash. It depends upon what you want to do.
However, for your case you don't really need sed's -E flag. To make the command compatible with sed on other platforms, use a basic regular expression instead:
sed 's/blah\(.*\)blah/\1/g'
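For the pure-shell route mentioned above, here is a minimal sketch that uses only POSIX parameter expansion. It assumes the goal is to print just the text between the first and the last "blah". The helper name between_blahs is made up for illustration. Unlike sed, it prints only the captured part and drops any text outside the match.

```shell
# Hypothetical helper: print the text between the first and the last "blah"
# using only POSIX shell pattern matching (no sed, no regex engine).
between_blahs() {
    case $1 in
        *blah*blah*)
            rest=${1#*blah}               # drop everything up to and including the first "blah"
            printf '%s\n' "${rest%blah*}" # drop the last "blah" and everything after it
            ;;
        *)
            printf '%s\n' "$1"            # fewer than two "blah"s: unchanged, as with sed
            ;;
    esac
}

between_blahs 'blahXYZblah'   # prints XYZ
```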

This is indeed incredibly annoying. I do something like:
SED_EXTENDED_REGEXP_FLAG=-r
case $(uname) in
*BSD) SED_EXTENDED_REGEXP_FLAG=-E ;;
Darwin) SED_EXTENDED_REGEXP_FLAG=-E ;;
esac
echo "$string" | sed $SED_EXTENDED_REGEXP_FLAG 's/blah(.*)blah/\1/g'
That's off the top of my head, so apologies if the shell script syntax is a bit off.
This assumes that any platform which is not a BSD or OS X has GNU sed (or another sed where -r is the flag for extended regular expressions, if there is such a thing).
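Instead of guessing from uname, you could probe the installed sed directly. This is a sketch that assumes a POSIX shell, and sed_ere_flag is a hypothetical helper name. It tries -E first, since current GNU sed accepts -E as well as -r. It keeps whichever flag makes an ERE-only construct (a+) actually work.

```shell
# Hypothetical helper: print the flag this sed uses for extended regexes
# (-E or -r), or fail if neither works.
sed_ere_flag() {
    for flag in -E -r; do
        # "a+" is only a quantifier in ERE mode, so "aaa" collapses to "b"
        # only if the flag really enabled extended regexes.
        if [ "$(printf 'aaa\n' | sed "$flag" 's/a+/b/' 2>/dev/null)" = b ]; then
            printf '%s\n' "$flag"
            return 0
        fi
    done
    return 1
}

string='blahXYZblah'
if SED_EXTENDED_REGEXP_FLAG=$(sed_ere_flag); then
    echo "$string" | sed "$SED_EXTENDED_REGEXP_FLAG" 's/blah(.*)blah/\1/g'
else
    echo 'no extended-regex flag found; fall back to a basic regex' >&2
fi
```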

By far the best solution using sed is to use the portable (POSIX) basic regular expression equivalent, which will work on all platforms:
echo "$string" | sed -e 's/blah\(.*\)blah/\1/g'
The -e indicates that the sed script follows; it could be omitted.
Failing that, Perl was in part a sed substitute (there's still a program s2p that converts sed scripts into Perl scripts).
perl -e 'foreach (@ARGV) { s/blah(.*)blah/$1/; print "$_\n"; }' "$string"

Related

sed regexp, number reformatting: how to escape for bash

I have a RegExp that works in the macOS app Patterns and reformats GeoJSON MultiPolygon coordinates, but I don't know how to escape it for sed.
The file I'm working on is over 90 MB, so the bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
Every attempt fails with a complaint that something is unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find it even after extensive searching.
Edit 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS points. One example value picked from the GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. That is why my original regex checks for the brackets.

That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend giving the -i option a suffix argument (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson

If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE, enabled with -E (GNU sed also accepts -r), like this:
sed -i.bak -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i.bak 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\{1,\}\)/\1.\2\3/g' large_file.geojson
Note that in POSIX BRE you need to escape { and } in limiting/range quantifiers and ( and ) in grouping constructs, otherwise they denote literal characters. POSIX BRE has no + quantifier at all: GNU sed accepts \+ as an extension, but BSD sed does not, so \{1,\} is the portable spelling. In POSIX ERE you do not need to escape these characters to make them special; this flavor is closer to modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.
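The commands above drop the bracket check from the question's edit. If that check matters, for example so that values like "35.642.1.001_001" are left alone, the sketch below keeps it while staying in portable POSIX BRE. reformat_coords is a hypothetical name. It assumes every coordinate has at least three digits before the point, as in the sample data.

```shell
# Hypothetical helper: inside each "[x.x,y.y]" pair, move the decimal point so
# only the first two digits stay before it. Portable POSIX BRE (GNU and BSD sed).
# Text without the surrounding brackets, such as "35.642.1.001_001", is not touched.
reformat_coords() {
    sed 's/\[\([0-9][0-9]\)\([0-9][0-9]*\)\.\([0-9][0-9]*\),\([0-9][0-9]\)\([0-9][0-9]*\)\.\([0-9][0-9]*\)]/[\1.\2\3,\4.\5\6]/g' "$@"
}

printf '%s\n' '[[[379017.735,6940036.7955],[378917.21,6940158.7053],' | reformat_coords
# prints [[[37.9017735,69.400367955],[37.891721,69.401587053],
printf '%s\n' '"35.642.1.001_001"' | reformat_coords
# prints "35.642.1.001_001" unchanged
```

Since the function passes its arguments on to sed, reformat_coords large_file.geojson > reformatted.geojson should also work on the real file.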

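A simple sed moves each decimal point three places to the left. Note that this does not quite give the desired outcome, which moves the point four places in the first coordinate and five in the second: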
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Sed doesn't replace my text properly

My sed regex below doesn't strip the #70 suffix from the file name.
Could you please help pointing out what I am missing here?
[machine]# echo "//dir1/dir2/dir3/component/file.rb#70" | sed 's/\(.*rb\)#\d+$/\1/g'
Output: //dir1/dir2/dir3/component/file.rb#70
What I want is simply: //dir1/dir2/dir3/component/file.rb without #70 substring.
Thanks in advance
PL

The flavor of regular expression understood by sed by default doesn't include either \d for digits or + for "1 or more".
This will work:
sed 's/\(.*\.rb\)#[0-9][0-9]*$/\1/g'
Or you could turn on "extended" regular expression syntax with -E, which makes the + work (though still not \d), and swaps the meaning of backslashed vs non-backslashed parentheses:
sed -E 's/(.*\.rb)#[0-9]+$/\1/g'
Both of the above commands will work even on non-GNU sed, as you get by default on BSD and Mac OS X systems. In normal mode (without the -E), GNU sed also understands \+ to mean the same as bare + in extended mode, but BSD sed does not.
If all you're trying to do is get rid of the #digits, though, you can do it more simply. Sed regexes aren't anchored to the start of the line, so you don't have to include the filename - just replace the part you don't want with nothing at all:
sed 's/#[0-9][0-9]*$//'
or
sed -E 's/#[0-9]+$//'
If your real problem does require the fancy version, though, you could also use Perl, which has the advantage that there's relatively few (almost no) changes in regex syntax across versions. It also understands that \d syntax you tried to use:
perl -pe 's/(.*\.rb)#\d+$/$1/g'

With GNU sed, your command works if you use -E and change \d to [0-9] or [[:digit:]]:
echo "//dir1/dir2/dir3/component/file.rb#70" | sed -E 's/(.*rb)#[0-9]+$/\1/g'
//dir1/dir2/dir3/component/file.rb
Depending on the context, you may be able to use a simpler command, such as
sed 's/#[0-9]\+//g'

You already have an answer, but have you considered simply:
$ echo "//dir1/dir2/dir3/component/file.rb#70" | cut -d'#' -f1
//dir1/dir2/dir3/component/file.rb
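Along the same lines, the suffix can be stripped without starting any external process by using POSIX parameter expansion. This is a sketch, and strip_line_suffix is a hypothetical name. Like the #[0-9][0-9]*$ regex, it only removes a trailing # followed by digits.

```shell
# Hypothetical helper: drop a trailing "#<digits>" from a path using only
# POSIX parameter expansion (no sed or cut process is started).
strip_line_suffix() {
    s=$1
    case ${s##*#} in
        ''|*[!0-9]*) ;;     # nothing after the last '#', or not all digits: keep as-is
        *) s=${s%#*} ;;     # all digits: remove the last '#' and everything after it
    esac
    printf '%s\n' "$s"
}

strip_line_suffix '//dir1/dir2/dir3/component/file.rb#70'
# prints //dir1/dir2/dir3/component/file.rb
```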

Regex not working with sed

I have this text in a file:
"0000000441244"
"0000000127769"
I want to replace all zeros with 'L'
I tried the following, but nothing gets changed:
sed -e 's/0+/L/g' regex.txt
sed -e 's/(0+)/L/g' regex.txt
Where am I going wrong?

A POSIX-compliant version should use 00* instead of 0+:
sed -e 's/00*/L/g' regex.txt
As a side note, you only need the g flag if you want to convert "000000012700009" or even "000000012709" into "L127L9". Otherwise, the * in 's/00*/L/' will include all zeros at the beginning anyway.
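The effect of the g flag described above can be seen in the following small sketch, which uses the portable 00* pattern. The two function names are hypothetical.

```shell
# Hypothetical helpers: replace runs of zeros with "L" using the portable
# BRE 00* (one zero followed by any number of further zeros).
zeros_to_L_all()   { sed 's/00*/L/g'; }  # every run of zeros
zeros_to_L_first() { sed 's/00*/L/'; }   # only the leftmost run

printf '%s\n' '"0000000441244"' '"0000000127769"' | zeros_to_L_all
# prints "L441244" and "L127769"
printf '%s\n' 000000012700009 | zeros_to_L_all     # prints L127L9
printf '%s\n' 000000012700009 | zeros_to_L_first   # prints L12700009
```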

On Linux (GNU sed), both sed -e 's/0\+/L/g' regex.txt and sed -r 's/0+/L/g' regex.txt will work,
but on a Mac (BSD sed) neither of them does; instead you have to use this:
sed -E 's/0+/L/g' regex.txt
Actually, the last one works on Linux too, so it's more portable.
For this particular problem, @perreal's suggestion is also portable. But when you do need + or another metacharacter in a regex, it's worth knowing how to work around the differences.

Try this
sed -e 's/0\+/L/g' regex.txt

If you are using a Unix flavor that doesn't come with GNU sed, you can either install GNU sed yourself or just switch to awk, ruby or perl.
For example:
ruby -e 'ARGF.each{|l|puts l.gsub(/0+/, "L")}' regex.txt
Using awk:
awk '{gsub("0+", "L"); print $0}' regex.txt
Extended regular expressions are available on Mac OS/X via -E rather than -e.
From the "BSD General Commands Manual":
-E Interpret regular expressions as extended (modern) regular
expressions rather than basic regular expressions (BRE's).
The re_format(7) manual page fully describes both formats.

This might work for you (GNU sed):
sed 'y/0/L/' file
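Note that y is a standard sed command that transliterates one character at a time, like tr. Each zero therefore becomes its own L and runs are not collapsed. That matches a literal reading of "replace all zeros". The s commands in the other answers instead turn each run into a single L. A sketch with hypothetical function names:

```shell
# y/0/L/ transliterates character by character, exactly like tr 0 L,
# so each zero becomes its own L (runs are not collapsed).
zeros_per_char_sed() { sed 'y/0/L/'; }
zeros_per_char_tr()  { tr 0 L; }

printf '%s\n' '"0000000441244"' | zeros_per_char_sed   # prints "LLLLLLL441244"
printf '%s\n' '"0000000441244"' | zeros_per_char_tr    # same output
```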

Sed expression doesn't allow optional grouped string

I'm trying to use the following regex in a sed script but it doesn't work:
sed -n '/\(www\.\)\?teste/p'
The regex above doesn't seem to work. sed doesn't seem to apply the ? to the grouped www\..
It works if you use the -E parameter that switches sed to use the Extended Regex, so the syntax becomes:
sed -En '/(www\.)?teste/p'
This works fine, but I want to run this script on a machine whose sed doesn't support the -E option. I'm pretty sure this is possible and I'm just doing something very stupid.

Standard sed only understands POSIX Basic Regular Expressions (BRE), not Extended Regular Expressions (ERE), and the ? is a metacharacter in EREs, but not in BREs.
Your version of sed might support EREs if you turn them on. With GNU sed, the relevant options are -r and --regexp-extended, described as "use extended regular expressions in the script".
However, if your sed does not support it - quite plausible - then you are stuck. Either import a version of sed that does support them, or redesign your processing. Maybe you should use awk instead.
2014-02-21
I don't know why I didn't mention that even though sed does not support the shorthand ? or \? notation, it does support counted ranges with \{n,m\}, so you can simulate ? with \{0,1\}:
sed -n '/\(www\.\)\{0,1\}teste/p' << EOF
http://www.tested.com/
http://tested.com/
http://www.teased.com/
EOF
which produces:
http://www.tested.com/
http://tested.com/
Tested on Mac OS X 10.9.1 Mavericks with the standard BSD sed and with GNU sed 4.2.2.
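The same counted-range notation also stands in for +, so most ERE one-liners have a POSIX BRE spelling that needs neither -E nor -r. The following sketch applies this to two patterns from this page. The function names are hypothetical.

```shell
# Portable POSIX BRE spellings of ERE operators (no -E or -r needed):
#   ERE  x?   ->  BRE  x\{0,1\}
#   ERE  x+   ->  BRE  x\{1,\}
#   ERE  (x)  ->  BRE  \(x\)
match_teste()  { sed -n '/\(www\.\)\{0,1\}teste/p'; }  # ERE: (www\.)?teste
strip_digits() { sed 's/#[0-9]\{1,\}$//'; }            # ERE: #[0-9]+$

printf '%s\n' 'http://www.tested.com/' 'http://tested.com/' 'http://www.teased.com/' | match_teste
printf '%s\n' '//dir1/dir2/dir3/component/file.rb#70' | strip_digits
```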

What GNU/Linux command-line tool would I use for performing a search and replace on a file?

What GNU/Linux command-line tool would I use for performing a search and replace on a file?
Can the search text, and replacement, be specified in a regex format?

sed 's/a.*b/xyz/g;' old_file > new_file
GNU sed (which you probably have) is even more versatile:
sed -r --in-place 's/a(.*)b/x\1y/g;' your_file
Here is a brief explanation of those options:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
-r, --regexp-extended
use extended regular expressions in the script.
The FreeBSD, NetBSD and OpenBSD versions also support -r and -i, although the way -i takes its backup suffix is not the same everywhere.
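Because -i behaves differently between GNU and BSD sed, a portable alternative is to write to a temporary file and rename it. This is a minimal sketch, and sed_inplace is a hypothetical name. Unlike -i, it does not attempt to preserve the original file's permissions, ownership or hard links.

```shell
# Hypothetical helper: "in-place" edit without relying on sed -i,
# whose syntax is not the same in GNU and BSD sed.
# Usage: sed_inplace 'SCRIPT' FILE
sed_inplace() {
    script=$1
    file=$2
    tmp="$file.tmp.$$"                # temporary file next to the original
    if sed "$script" "$file" > "$tmp"; then
        mv "$tmp" "$file"             # replace the original only if sed succeeded
    else
        rm -f "$tmp"                  # clean up and report failure
        return 1
    fi
}
```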

Perl was invented for this:
perl -pi -e 's/foo/bar/g;' *.txt
Any normal s/// expression can go inside those single quotes. You can keep a backup with something like this:
perl -pi.bak -e 's/foo/bar/g;' *.txt
Or in a pipeline:
perl -pe 's/foo/bar/g;' file.txt | less
But that's really more sed's job.

Consider Ruby as an alternative to Perl. It stole most of Perl's one-liner commandline args (-i, -p, -l, -e, -n) and auto-sets $_ for you like Perl does and has plenty of regex goodness. Additionally Ruby's syntax may be more comfortable and easier to read or write than Perl's or sed's. (Or not, depending on your tastes.)
ruby -pi.bak -e '$_.gsub!(/foo|bar/){|x| x.upcase}' *.txt
vs.
perl -pi.bak -e 's/(foo|bar)/\U$1/g' *.txt
In many cases when dealing with one-liners, performance isn't enough of an issue to care whether you use lightweight sed, heavyweight Perl or heavier-weight Ruby. Use whatever is easiest to write.

sed, the stream editor, and yes, it uses regex.
