How to get sed to take extended regular expressions? - regex

I want to do string replacement using regular expressions in sed. Now, I'm aware that the behavior of sed is funky on a Mac. I've often seen workarounds using egrep when I want to just examine a certain pattern in a line. But, in this case I want to do string replacement.
I want to replace cp an and cp <tab or newline> an with gggg. I tried the following, which would work under extended regular expressions:
sed -i'_backup' 's/cp\s+an/gggg/g'
But of course this does nothing. I tried egrepping, and of course it picks out the lines with cp <one or more space characters> an.
How do I get sed to do replacement using extended regular expressions? Or what is a better way to do replacement using regular expressions?
I'm on Mac OS X.

On OS X, the following command works with extended regex support:
sed -i.backup -E 's/cp[[:blank:]]+an/gggg/g'
POSIX Character Class Reference
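Here is a minimal sketch of that in-place edit on a throwaway file. The file name demo.txt is hypothetical. Note that [[:blank:]] matches only spaces and tabs, not newlines. The backup suffix is attached directly to -i, and both BSD (macOS) sed and GNU sed accept that form.

```shell
# Hypothetical sample file, used only for illustration.
printf 'cp an\ncp\tan\ncopy an\n' > demo.txt

# -E enables EREs, so + needs no backslash; [[:blank:]] is space or tab.
# -i.backup edits demo.txt in place and keeps the original as demo.txt.backup.
sed -i.backup -E 's/cp[[:blank:]]+an/gggg/g' demo.txt

cat demo.txt         # gggg, gggg, copy an
cat demo.txt.backup  # the untouched original
```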

Since you mentioned you want <newline> to be handled, you'll need to coax sed a bit: sed reads its input one line at a time, so a newline only appears in the pattern space after you append the next line with N. Your exact requirements aren't too clear to me, but the following example illustrates that sed can easily handle certain cases in which a newline is in the "target" regex:
$ echo $'cp\nancp an' | sed -E '/cp/{N; s/cp(\n|[[:blank:]])an/gggg/g;}'
gggggggg
(Note to non-Mac readers: if your sed does not support -E, try -r instead.)
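If the whitespace between cp and an can span any number of lines, one option is to collect the whole input into the pattern space first and then substitute once. The sketch below does that. The function name join_cp_an is hypothetical. Because it holds the entire input in memory, it suits modest files. Each script piece is passed as a separate -e, so the label works in both BSD and GNU sed.

```shell
# Hypothetical helper: replace "cp", one or more spaces/tabs/newlines, then "an".
join_cp_an() {
  # :a     label to loop back to
  # $!N    unless on the last line, append the next line to the pattern space
  # $!ba   unless on the last line, branch back to :a
  # once everything is in the pattern space, substitute; [[:space:]] includes \n
  sed -E \
    -e ':a' \
    -e '$!N' \
    -e '$!ba' \
    -e 's/cp[[:space:]]+an/gggg/g'
}

printf 'cp\nan x cp\tan\ncp  an\n' | join_cp_an
# prints:
# gggg x gggg
# gggg
```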

Related

How can I translate a regex within vim to work with sed?

I have a string that exists within a text file that I am trying to modify with regex.
"configuration_file_for_wks_33-40"
and I want to modify it so that it looks like this
"configuration_file_for_wks_33-40_6ks"
Within Vim, I can accomplish this with the following substitute command:
%s/33-\(\d\d\)/33-\1_6ks/
But if I try to pass that regex command to sed such as
sed 's/33-\(\d\d\)/33-\1_6ks/' input_file.json
The string is not changed, even if I include the -e parameter.
I have also tried to do this using ex as
echo '%s/33-\(\d\d\)/33-\1_6ks/' | ex input_file.json
If I use
sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
then I get
configuration_file_for_wks_33-_6ks40
For that, I've tried various different escaping patterns without any luck.
Can someone help me understand why these changes are not working?
Vim has a different syntax for regular expressions (which is even configurable). Unfortunately, sed doesn't understand \d (see https://unix.stackexchange.com/a/414230/304256). Match digits with [0-9] or [[:digit:]] instead; the example below uses -E, although these bracket expressions work in basic regexes too:
$ sed -E 's/33-[0-9][0-9]/&_6ks/' input_file.json
configuration_file_for_wks_33-40_6ks
Note that you can use & in the replacement to insert the entire matched string.
As for why this happens:
$ sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
configuration_file_for_wks_33-_6ks40
Here, \(\d\d\)* simply matches zero times, so wks_33- is replaced by wks_33-_6ks (\1 is an empty string) and 40 remains where it was.
Translation from one language to another is best done with some reference material on hand:
sed BRE syntax
sed ERE syntax
sed classes
sed RE extensions
Even a quick read of these shows that sed doesn't support \d.
Possible alternatives to \d\d:
[[:digit:]]\{2\}
[0-9]\{2\}
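Putting one of those alternatives into practice, here is a sketch of the question's Vim command translated to plain POSIX BRE sed. The function name to_6ks is hypothetical. \d becomes [0-9], and \{2\} is the BRE interval quantifier.

```shell
# Hypothetical helper: Vim's %s/33-\(\d\d\)/33-\1_6ks/ rewritten for BRE sed.
to_6ks() {
  sed 's/33-\([0-9]\{2\}\)/33-\1_6ks/'
}

echo '"configuration_file_for_wks_33-40"' | to_6ks
# prints: "configuration_file_for_wks_33-40_6ks"
```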
How can I translate a regex within vim to work with sed?
Since you write "a regex", I assume you mean any regex.
Translating a Vim regex to a sed regex is not always possible, because a Vim regex can contain lookarounds, whereas sed regexes have no such constructs.
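In simple cases, a fixed-text lookbehind can often be emulated by capturing the context and writing it back unchanged. The sketch below assumes a hypothetical Vim command %s/\(wks_\)\@<=33/NN/, which replaces 33 only when it follows wks_. Because sed consumes the captured context as part of the match, this trick does not cover every case. Negative and variable-width lookarounds in particular generally cannot be rewritten this simply.

```shell
# Hypothetical helper: emulate a "preceded by wks_" lookbehind by capturing
# the context in \(wks_\) and putting it back with \1.
replace_after_wks() {
  sed 's/\(wks_\)33/\1NN/'
}

echo 'wks_33-40 and id_33' | replace_after_wks
# prints: wks_NN-40 and id_33
```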

Sed doesn't work in command line however regular expression in online test regex101 works

I have a string like
July 20th 2017, 11:03:37.620 fc384c3d-9a75-459d-ba92-99069db0e7bf
I need to remove everything from the beginning of the line up to the UUID substring (there is a tab, \t, just before the UUID).
My regex looks like this:
^\s*July(.*)\t
When I test it in regex101, it all works beautifully: https://regex101.com/r/eZ1gT7/1077
However, when I plonk that into a sed command it doesn't do any substitution:
less pensionQuery.txt | sed -e 's/^\s*July(.*)\t//'
where pensionQuery.txt is a file full of lines similar to the one above. The command simply prints the file contents unmodified.
Is my sed command wrong?
Any ideas?
The regex is right; you just aren't running sed with --regexp-extended (-E). Note that \s and \t are GNU sed extensions, so this relies on GNU sed:
'-E'
'--regexp-extended'
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that egrep accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the -E extension has
since been added to the POSIX standard.
echo -e $'July 20th 2017, 11:03:37.620\tfc384c3d-9a75-459d-ba92-99069db0e7bf' |
sed -E 's/^\s*July(.*)\t//'
fc384c3d-9a75-459d-ba92-99069db0e7bf
Also, here is a brief read-up on basic (BRE) and extended (ERE) regular expressions:
Basic and extended regular expressions are two variations on the syntax of the specified pattern. Basic Regular Expression (BRE) is the default in sed (and similarly in grep). Extended Regular Expression syntax (ERE) is activated by using the -r or -E options (and similarly, grep -E).
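If you may have to run on a sed without the GNU \s and \t extensions (for example, the stock macOS sed), here is a sketch of a more portable variant. It uses [[:space:]] and a literal tab character built with printf. The helper name strip_to_uuid is hypothetical. The capturing group is dropped because nothing refers to it. .* is greedy, so everything up to the last tab on the line is removed.

```shell
# A literal tab, since \t inside the regex is not portable.
tab=$(printf '\t')

# Hypothetical helper: delete from line start through the last tab.
strip_to_uuid() {
  sed -E "s/^[[:space:]]*July.*${tab}//"
}

printf 'July 20th 2017, 11:03:37.620\tfc384c3d-9a75-459d-ba92-99069db0e7bf\n' | strip_to_uuid
# prints: fc384c3d-9a75-459d-ba92-99069db0e7bf
```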

sed regexp, number reformatting: how to escape for bash

I have a RegExp that works in the macOS app Patterns and reformats GeoJSON MultiPolygon coordinates, but I don't know how to escape it for sed.
The file I'm working on is over 90 MB in size, so the bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always fail, with sed complaining that something is unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE, enabled with -E (or -r in some other sed versions), like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
See an online demo.
You may see how this regex works here (just a demo, not proof).
Note that in POSIX BRE you need to escape { and } in interval (range) quantifiers and ( and ) in grouping constructs, otherwise they denote literal characters; \+ is a GNU extension to BRE. In POSIX ERE, you do not need to escape these special characters to make them special, so this flavor is closer to modern regex syntax.
Also, use \1, \2, etc. for backreferences inside the replacement, not $1, $2.
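Here is a quick sketch of the ERE command above run on a shortened sample of the question's coordinates, plus the non-coordinate property value from the question's edit. The helper name shift_coords is hypothetical.

```shell
# Hypothetical helper wrapping the ERE substitution from this answer:
# keep the first two digits, insert a dot, then the rest without the old dot.
shift_coords() {
  sed -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g'
}

echo '[[[379017.735,6940036.7955],[379009.8431,6940042.5761],' | shift_coords
# prints: [[[37.9017735,69.400367955],[37.90098431,69.400425761],

# The property value comes through unchanged here, but only because its first
# match, 35.642, has an empty second group and is rewritten to itself.
echo '"35.642.1.001_001"' | shift_coords
# prints: "35.642.1.001_001"
```

For values that are not guaranteed to behave this way, the bracket checks in the perl version are the safer choice.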
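A simple sed can shift the decimal point, although this command moves it exactly three places to the left, so its output differs from the desired outcome in the question: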
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Regular expression in sed command

I want to use a sed command to change one parameter (width, to be exact) in a file. In this case, it is motion.conf, the configuration file of the motion detection program motion.
I have written a command for line replacement, but it does not work the way I want. This is the command:
sudo sed 's/width [0-9]/change/g' /etc/motion/motion.conf
The line is matched only when I write [0-9]. If I use [0-9]+, the command no longer works.
Can you please tell me how to make it work?
Thanks
All you need to do is escape the +:
$ cat a
xxx width 0 hallo
xxx width 12 hallo
$ sed 's/width [0-9]\+/change/g' a
xxx change hallo
xxx change hallo
You can find the regex syntax for GNU sed here.
Note that sed uses the POSIX BRE (Basic Regular Expression) syntax by default. POSIX BRE has no + quantifier, but some implementations, GNU sed among them, accept \+ as an extension. To obtain a more portable pattern, you can write:
sed 's/width [0-9][0-9]*/change/g'
or
sed 's/width [0-9]\{1,\}/change/g'
However, interval quantifiers (curly braces) are not supported by some older versions.
You can also switch the regular expression syntax to ERE (Extended Regular Expression) that supports the + quantifier and doesn't need to be escaped:
sed -r 's/width [0-9]+/change/g'
(The ERE option may differ depending on your sed version: BSD and macOS sed use -E, and recent GNU sed accepts both -r and -E.)
You can read more about these syntaxes here.
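As a quick check that the portable spellings of "one or more digits" behave the same, here is a sketch that runs all three on the sample lines from this answer. The variable name input is just for illustration. The first two forms are plain POSIX BRE, and the third needs ERE via -E.

```shell
input='xxx width 0 hallo
xxx width 12 hallo'

printf '%s\n' "$input" | sed 's/width [0-9][0-9]*/change/g'   # BRE: one digit, then zero or more
printf '%s\n' "$input" | sed 's/width [0-9]\{1,\}/change/g'   # BRE interval: at least one
printf '%s\n' "$input" | sed -E 's/width [0-9]+/change/g'     # ERE: + needs no backslash
# each prints:
# xxx change hallo
# xxx change hallo
```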

Sed expression doesn't allow optional grouped string

I'm trying to use the following regex in a sed script but it doesn't work:
sed -n '/\(www\.\)\?teste/p'
The regex above doesn't seem to work. sed doesn't seem to apply the ? to the grouped www\..
It works if you use the -E option, which switches sed to extended regular expressions, so the syntax becomes:
sed -En '/(www\.)?teste/p'
This works fine, but I want to run this script on a machine that doesn't support the -E option. I'm pretty sure that this is possible and I'm doing something very stupid.
Standard sed only understands POSIX Basic Regular Expressions (BRE), not Extended Regular Expressions (ERE), and the ? is a metacharacter in EREs, but not in BREs.
Your version of sed might support EREs if you turn them on. With GNU sed, the relevant options are -r and --regexp-extended, described as "use extended regular expressions in the script".
However, if your sed does not support it - quite plausible - then you are stuck. Either import a version of sed that does support them, or redesign your processing. Maybe you should use awk instead.
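The awk route works because POSIX awk patterns are EREs, so ? is available without any sed option. Here is a minimal sketch using the sample URLs from this answer. The helper name filter_teste is hypothetical.

```shell
# Hypothetical helper: print lines matching the ERE (www\.)?teste.
filter_teste() {
  awk '/(www\.)?teste/'
}

printf '%s\n' 'http://www.tested.com/' 'http://tested.com/' 'http://www.teased.com/' | filter_teste
# prints:
# http://www.tested.com/
# http://tested.com/
```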
2014-02-21
I don't know why I didn't mention that even though sed does not support the shorthand ? or \? notation, it does support counted ranges with \{n,m\}, so you can simulate ? with \{0,1\}:
sed -n '/\(www\.\)\{0,1\}teste/p' << EOF
http://www.tested.com/
http://tested.com/
http://www.teased.com/
EOF
which produces:
http://www.tested.com/
http://tested.com/
Tested on Mac OS X 10.9.1 Mavericks with the standard BSD sed and with GNU sed 4.2.2.