sed is failing when regex pattern is a group


My sed command works when I try to replace a single pattern.
sed -i '' 's/^\(feature\)/replacement/g' change.md
It does nothing when the group contains several alternatives.
sed -i '' 's/^\(feature\|bug\|security\)/replacement/g' change.md
Does anyone know what I'm doing wrong? I tried a number of variations, but none seem to work.
My file:
## Features
feature:foo
feature: baz
## Bugs
bug: bar
bug: KK

On macOS, the sed man page refers to the re_format(7) man page for its description of regular expressions. Without -E, sed uses "basic regular expressions", which the re_format man page calls "obsolete REs".
This sentence seems to answer the question:
Obsolete (“basic”) regular expressions differ in several respects. ‘|’ is an ordinary character and there is no equivalent for its functionality.
(emphasis mine)
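It appears sed -E 's/^(feature|bug|security)/replacement/g' is your easiest option. (GNU sed accepts \| in basic regular expressions as an extension, which is why the original command works on Linux but not on macOS.)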
Or perl -i -pe 's/^(feature|bug|security)/replacement/g' file
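As a quick check, here is a minimal sketch that runs the extended-regex form against the sample lines from the question. It reads from a pipe rather than editing change.md in place, and the variable names input and out are chosen only for this illustration. The trailing g flag is dropped because a pattern anchored with ^ can match at most once per line.

```shell
# Sample lines from the question, fed on stdin instead of editing change.md.
input='## Features
feature:foo
feature: baz
## Bugs
bug: bar
bug: KK'

# -E enables extended regular expressions, where | means alternation
# and ( ) group without backslashes.
out=$(printf '%s\n' "$input" | sed -E 's/^(feature|bug|security)/replacement/')
printf '%s\n' "$out"
```

The headings are left alone because they start with # and not with one of the three keywords.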

Related

Sed doesn't work in command line however regular expression in online test regex101 works

I have a string like
July 20th 2017, 11:03:37.620 fc384c3d-9a75-459d-ba92-99069db0e7bf
I need to remove everything from the beginning of the line till the UUID substring (it's a tab, \t just before the UUID).
My regex looks like that:
^\s*July(.*)\t
When I test it in regex101 it all works beautifully: https://regex101.com/r/eZ1gT7/1077
However, when I plonk that into a sed command it doesn't do any substitution:
less pensionQuery.txt | sed -e 's/^\s*July(.*)\t//'
where pensionQuery.txt is a file full of lines similar to the one above. So the command above simply spits out the unmodified file content.
Is my sed command wrong?
Any ideas?

The regex is fine, but you are not running sed with -E (--regexp-extended):
'-E'
'--regexp-extended'
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that egrep accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the -E extension has
since been added to the POSIX standard
echo -e $'July 20th 2017, 11:03:37.620\tfc384c3d-9a75-459d-ba92-99069db0e7bf' |
sed -E 's/^\s*July(.*)\t//'
fc384c3d-9a75-459d-ba92-99069db0e7bf
Also, here is a short read-up on basic (BRE) and extended (ERE) regular expressions:
Basic and extended regular expressions are two variations on the syntax of the specified pattern. Basic Regular Expression (BRE) is the default in sed (and similarly in grep). Extended Regular Expression syntax (ERE) is activated by using the -r or -E options (and similarly, grep -E).
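To make the BRE/ERE difference concrete, the following sketch feeds the same pattern to sed with and without -E. It also shows one possible portable form of the substitution from the question, on the assumption that \s and \t (which are not part of POSIX regular expressions) may not be understood by every sed. The variable names bre, ere, tab, line and uuid are illustrative only.

```shell
# Without -E (BRE), ( and ) are literal characters: the text "(b)" is replaced.
bre=$(printf '%s\n' 'a(b)c' | sed 's/(b)/X/')
# With -E (ERE), ( and ) form a group: only the "b" inside the parentheses is replaced.
ere=$(printf '%s\n' 'a(b)c' | sed -E 's/(b)/X/')
printf '%s\n' "$bre" "$ere"

# Portable variant of the question's command: a literal tab held in a shell
# variable and a POSIX character class instead of \t and \s.
tab=$(printf '\t')
line="July 20th 2017, 11:03:37.620${tab}fc384c3d-9a75-459d-ba92-99069db0e7bf"
uuid=$(printf '%s\n' "$line" | sed -E "s/^[[:space:]]*July.*${tab}//")
printf '%s\n' "$uuid"
```

The capture group from the original regex is omitted here because the replacement never refers to it.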

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find an answer even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.

That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson

If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a BRE (the \+ quantifier used here is a GNU extension):
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs; otherwise they denote literal symbols. POSIX BRE has no + quantifier: \+ is a GNU extension, and the portable equivalent is \{1,\}. In POSIX ERE, you do not need to escape these characters to make them special, so this POSIX flavor is closer to modern regexes.
Also, you need to use \n notation (\1, \2, ...) inside the replacement pattern, not $n.
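The sketch below runs both flavors on a shortened sample of the coordinates. The sample is cut to two points and its brackets are closed for illustration only. The BRE version uses \{1,\} instead of \+ so that it does not depend on GNU extensions. The variable names sample, ere and bre are hypothetical.

```shell
# Two coordinate pairs from the question, brackets closed for the example.
sample='[[[379017.735,6940036.7955],[379009.8431,6940042.5761]]]'

# ERE: groups and intervals need no backslashes.
ere=$(printf '%s\n' "$sample" | sed -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g')

# BRE: groups and intervals are escaped; \{1,\} stands in for +.
bre=$(printf '%s\n' "$sample" | sed 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\{1,\}\)/\1.\2\3/g')

printf '%s\n' "$ere" "$bre"
```

Both commands keep the first two digits, insert the decimal point after them, and append the remaining digits with the original point removed.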

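A simple sed comes close, but note that it moves the decimal point three places to the left, whereas the desired output in the question keeps only two digits before the point: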
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

How to get sed to take extended regular expressions?

I want to do string replacement using regular expressions in sed. Now, I'm aware that the behavior of sed is funky on a Mac. I've often seen workarounds using egrep when I want to just examine a certain pattern in a line. But, in this case I want to do string replacement.
I want to replace cp an and cp <tab or newline> an with gggg. I tried the following, which would work under extended regular expressions:
sed -i'_backup' 's/cp\s+an/gggg/g'
But of course this does nothing. I tried egrepping, and of course it picks out the lines with cp <one or more space characters> an.
How do I get sed to do replacement using extended regular expressions? Or what is a better way to do replacement using regular expressions?
I'm on Mac OS X.

On OS X, the following command works, using -E for extended regex support:
sed -i.backup -E 's/cp[[:blank:]]+an/gggg/g'
POSIX Character Class Reference
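As a small illustration of the [[:blank:]] class, this sketch pipes three test lines through the same substitution: one with a space, one with a tab, and one with no separator at all. The test lines and the variable names tab and out are made up for the example.

```shell
# A literal tab, so the example does not rely on \t inside the regex.
tab=$(printf '\t')

# [[:blank:]] matches a space or a tab; + (available with -E) means one or more.
out=$(printf 'cp an\ncp%san\ncpan\n' "$tab" | sed -E 's/cp[[:blank:]]+an/gggg/g')
printf '%s\n' "$out"
```

The third line stays unchanged because + requires at least one blank between cp and an.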

Since you mentioned you want <newline> to be handled, you'll need to coax sed a bit. Your exact requirements aren't too clear to me but the following example illustrates that sed can easily handle certain cases in which a newline is in the "target" regex:
$ echo $'cp\nancp an' | sed -E '/cp/{N; s/cp(\n|[[:blank:]])an/gggg/g;}'
gggggggg
(Note to non-Mac readers: If your sed does not support -E, try -r instead.)

difference between 'i' and 'I' in sed

I thought i and I both mean ignorecase in sed, e.g.
$ echo "abcABC"|sed -e 's/a/j/gi'
jbcjBC
$ echo "abcABC"|sed -e 's/a/j/gI'
jbcjBC
However, looks like it's only for substitution:
$ echo "abcABC"|sed -e '/a/id' # <--
d
abcABC
$ echo "abcABC"|sed -e '/a/Id'
$
It's really confusing.
Where can I find full reference of the meaning of regular expression for sed?

i and I are indeed flags to the s command; they are not generally applicable to all uses of regular expressions in sed. The GNU man page is oddly silent on which flags s accepts (or even the fact that s accepts flags), so you'll have to look in the info page (run info sed).
Other uses of regular expressions are governed by the function in which they are used.
In your other examples, i and I are not flags of s. In /a/id, i is the sed function that inserts text before lines matching the regular expression a, and the text it inserts is d. In /a/Id, GNU sed treats the I that follows the address as a modifier that makes the address match case-insensitively, which leaves d as the function, so the line is deleted.
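The three roles can be compared side by side. This sketch assumes GNU sed: both the one-line form of the i command and the I address modifier are GNU features and may behave differently elsewhere. The variable names sub, ins and del are illustrative.

```shell
# Assumes GNU sed.

# I as a flag of s: case-insensitive substitution.
sub=$(printf '%s\n' 'abcABC' | sed 's/a/j/gI')

# i as a command: insert the text "d" before each line matching /a/.
ins=$(printf '%s\n' 'abcABC' | sed '/a/id')

# I as an address modifier: delete lines matching /a/ case-insensitively.
# "ABC" contains no lowercase a, yet it is deleted; "xyz" survives.
del=$(printf '%s\n' 'xyz' 'ABC' | sed '/a/Id')

printf '%s\n' "$sub" "$ins" "$del"
```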

The sed man page in FreeBSD in the section describing options to the s (substitute) command, says only:
i or I    Match the regular expression in a case-insensitive way
Thus, the following are identical:
s/a/j/gi
s/a/j/gI
But that's only using i as a modifier to the s command. In your second example, you're using i as a command. The man page in this case states:
[1addr]i\
text Write text to the standard output.
and at least in FreeBSD's sed, there is no I (capital-I) command. So your sed script /a/id would (1) match lines containing an a, and if found (2) print the text "d". Which is what you saw.
And since I is not a command, I would have expected an error, but my results match yours -- /a/Id appears to eliminate output.
Note that commands and completeness of documentation may differ depending on the variant of sed you are using.

What's wrong with my lookahead regex in GNU sed?

This is what I'm doing (simplified example):
gsed -i -E 's/^(?!foo)(.*)$/bar\1/' file.txt
I'm trying to put bar in front of every line that doesn't start with foo. This is the error:
gsed: -e expression #1, char 22: Invalid preceding regular expression
What's wrong?

sed -i '/^foo/! s/^/bar/' file.txt
-i changes the file in place
/^foo/! runs the following command only on lines that do not (!) start with foo (^foo)
s/^/bar/ replaces the start of the line with bar, which prepends bar to the line
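Here is a minimal sketch of the negated address at work, run on piped input instead of file.txt so that nothing is modified on disk. The two test lines and the variable name out are made up for the example.

```shell
# Lines starting with foo are skipped; every other line gets bar prepended.
out=$(printf '%s\n' 'foo1' 'baz' | sed '/^foo/!s/^/bar/')
printf '%s\n' "$out"
```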

As far as I know, sed has neither look-ahead nor look-behind. Switch to a more powerful language with similar syntax, like perl.

You are using Perl-compatible regular expression (PCRE) syntax, which is not supported by GNU sed.
You should rewrite your regex according to the sed regular expression syntax, or use perl instead.
Note that sed doesn't have lookahead and therefore doesn't support the regex feature you were trying to use. It can be done in sed using other features, as others have mentioned.
