-E flag for sed - regex

I seem to remember a -E flag for sed that enables extended regex, but when I looked in the man page today it was absent. However:
echo development.properties | sed -E 's/(development.|staging.|qa.|production.)//'
properties
echo development.properties | sed 's/(development.|staging.|qa.|production.)//'
development.properties
echo development.properties | sed -r 's/(development.|staging.|qa.|production.)//'
properties
So it looks to me like -E is doing something. Furthermore, I'd venture to say it's now an alias for -r; at least, that's what I thought -E was for (extended regex). Did it change at some point, or is it still supported for backwards compatibility?
Also, it sounds like extended regex in sed has nothing to do with alternation, like the pipe character inside parentheses, so why doesn't the substitution work without either of those flags (-E, -r)?

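-E is an alias for -r in your sed version, i.e. it turns on extended regex support. Older GNU sed releases accepted -E without documenting it, which would explain why your man page doesn't list it. BSD and macOS sed use -E for the same purpose.
Without extended regex support, sed uses basic regular expressions (BRE), in which parentheses and | are matched as literal characters rather than used for grouping and alternation. In a BRE, grouping is written \( \), and alternation (\|) is only available as a GNU extension.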
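To make the BRE/ERE difference concrete, here is a minimal sketch assuming GNU sed (the \| in the second command is a GNU-only extension; POSIX BRE has no alternation at all). The variable names are illustrative, and the dot has been moved out of the group and escaped as \. so it matches only a literal dot, unlike the unescaped . in the question.

```shell
#!/bin/sh
# ERE (-E, or -r in GNU sed): ( ) group and | separates alternatives
ere=$(echo development.properties | sed -E 's/(development|staging|qa|production)\.//')
# BRE: grouping must be written \( \); \| is a GNU extension, not POSIX BRE
bre=$(echo development.properties | sed 's/\(development\|staging\|qa\|production\)\.//')
# BRE without backslashes: ( | ) are literal characters, so nothing matches
lit=$(echo development.properties | sed 's/(development|staging|qa|production)\.//')
printf '%s\n' "$ere" "$bre" "$lit"
```

The first two commands print properties; the third leaves development.properties unchanged, which is the behavior seen in the question.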

Related

sed - print translated HEX using capture group

I would like sed to translate hex values directly, by isolating them in capture groups and printing the decoded characters. This works:
echo bbb3Accc | sed -n 's/3A/\x3A/p'
bbb:ccc
...but this doesn't work:
echo bbb3Accc | sed 's/\(3A\)/\x\1/'
bbbx3Accc
...or an actual capture group REGEX matching based on URL encoded strings:
echo bbb%3Accc | sed 's/%\([A-Za-z0-9]\)/\x\1/'
bbbx3Accc
Apparently sed does not interpret and translate the hex value when it is built from a regex capture group together with the \x escape.
But I am wondering if there's a workaround I'm not aware of that makes this work with sed alone. I know I can use a bash command substitution and wrap the sed call in an echo -e, but I would like to avoid that.
Your question isn't clear, but maybe this is what you're trying to do, using GNU awk for multi-char RS, RT, and strtonum():
$ echo 'bbb%3Accc%21ddd' |
gawk -v RS='%[[:xdigit:]]{2}' 'sub(/%/,"0x",RT){RT=sprintf("%c",strtonum(RT))} {ORS=RT} 1'
bbb:ccc!ddd
As mentioned in the comments, \xAB is interpreted by sed's parser when it reads the script, not when the replacement is applied, so \x followed by a backreference such as \1 can't produce a character the way you were trying.
sed is pretty primitive and your example goes beyond what it is intended for, so you'd be better off using something more general-purpose. For example, in Perl:
$ echo bbb3Accc | perl -ple 's/([0-9A-F]{2})/chr(hex($1))/ge'
bbb:ccc
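The gawk answer relies on gawk-only features (RT and strtonum()). If only a POSIX awk is available, a sketch like the following could decode %XX escapes by looking each hex digit up in a string. The function name urldecode is hypothetical, and it is only intended for ASCII results: how awk's %c handles values above 127 varies between implementations and locales.

```shell
#!/bin/sh
# Hypothetical helper: decode %XX escapes on stdin using only POSIX awk
# features (match, substr, index, toupper, sprintf), no RT or strtonum().
urldecode() {
  awk '
    BEGIN { digits = "0123456789ABCDEF" }
    {
      out = ""
      rest = $0
      # Repeatedly find the next %XX, copy the text before it, then the decoded char
      while (match(rest, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        hi = index(digits, toupper(substr(rest, RSTART + 1, 1))) - 1
        lo = index(digits, toupper(substr(rest, RSTART + 2, 1))) - 1
        out = out substr(rest, 1, RSTART - 1) sprintf("%c", hi * 16 + lo)
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

echo 'bbb%3Accc%21ddd' | urldecode
```

This prints bbb:ccc!ddd, the same result as the gawk version.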

sed - exchange words with delimiter

I'm trying to swap words around with sed, not replace them; replacing is all I keep finding on Google.
I don't know if it's the regex that I'm getting wrong. I did a search for everything before a char and everything after a char, so that's how I got the regex.
echo xxx,aaa | sed -r 's/[^,]*/[^,]*$/'
or
echo xxx/aaa | sed -r 's/[^\/]*/[^\/]*$/'
I am getting this output:
[^,]*$,aaa
or this:
[^/]*$/aaa
What am I doing wrong?
For the first sample, you should use:
echo xxx,aaa | sed 's/\([^,]*\),\([^,]*\)/\2,\1/'
For the second sample, simply use a character other than slash as the delimiter:
echo xxx/aaa | sed 's%\([^/]*\)/\([^/]*\)%\2/\1%'
You can also use \{1,\} to formally require one or more:
echo xxx,aaa | sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'
echo xxx/aaa | sed 's%\([^/]\{1,\}\)/\([^/]\{1,\}\)%\2/\1%'
This uses the most portable sed notation; it should work anywhere. With modern versions that support extended regular expressions (-r or -E with GNU sed, -E with macOS or BSD sed), you can drop some of the backslashes and use + in place of *, which is more precisely what you're after (and expresses \{1,\} much more succinctly):
echo xxx,aaa | sed -E 's/([^,]+),([^,]+)/\2,\1/'
echo xxx/aaa | sed -E 's%([^/]+)/([^/]+)%\2/\1%'
With GNU sed (where \+ means "one or more" in a basic regex, as an extension) it would be:
sed 's#\([[:alpha:]]\+\)/\([[:alpha:]]\+\)#\2/\1#' <<< 'xxx/aaa'
which is simpler to read if you use extended posix regexes with -r:
sed -r 's#([[:alpha:]]+)/([[:alpha:]]+)#\2/\1#' <<< 'xxx/aaa'
I'm using two subpatterns ([[:alpha:]]+), each containing one or more letters, separated by a /. In the replacement part I reassemble them in reverse order: \2/\1. Please also note that I'm using # instead of / as the delimiter for the s command, since / is already the field delimiter in the input data. This saves us from having to escape the / in the regex.
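To illustrate the delimiter point, the sketch below (assuming GNU sed; the variable names are illustrative) swaps the two fields three ways. Any character can follow s as the delimiter; only when it is the default / does every literal / in the pattern and replacement have to be written \/.

```shell
#!/bin/sh
in='xxx/aaa'
# "#" as the s-command delimiter: the / in the pattern needs no escaping
a=$(printf '%s\n' "$in" | sed 's#\([^/]*\)/\([^/]*\)#\2/\1#')
# Any other character works too, for example "|"
b=$(printf '%s\n' "$in" | sed 's|\([^/]*\)/\([^/]*\)|\2/\1|')
# With the default / delimiter, every literal / must be written \/
c=$(printf '%s\n' "$in" | sed 's/\([^\/]*\)\/\([^\/]*\)/\2\/\1/')
printf '%s\n' "$a" "$b" "$c"
```

All three print aaa/xxx; the third is the hardest to read, which is why picking another delimiter is usually preferred.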
Btw, you can also use awk for that, which is pretty easy to read:
awk -F'/' '{print $2,$1}' OFS='/' <<< 'xxx/aaa'

sed regular expression failed on solaris

Under Solaris 5.10, why doesn't this regexp match a line like tag="12447"?
sed "s/tag=\"[0-9]+\"/emptytag/" test.xml
(I noticed that -r is not implemented in the sed version)
In a POSIX basic regular expression, the + sign is an ordinary character and cannot be used to mean "one or more" of something. Use the interval {1,} instead, with the braces escaped as \{1,\}:
echo 'tag="12447"' | sed --posix "s/tag=\"[0-9]\{1,\}\"/emptytag/"
emptytag
Note that you don't actually need --posix; I was just using it to disable all GNU extensions in my version of sed:
echo 'tag="12447"' | sed "s/tag=\"[0-9]\{1,\}\"/emptytag/"
emptytag
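A quick way to see why the original pattern fails is to feed it input that contains a literal plus sign. This minimal sketch (variable names are illustrative) uses only POSIX BRE features, so it should behave the same on GNU, BSD and Solaris sed; single quotes around the script avoid escaping the double quotes.

```shell
#!/bin/sh
# In a POSIX BRE, + is an ordinary character: this pattern matches
# exactly one digit followed by a literal plus sign.
lit=$(echo 'tag="1+"' | sed 's/tag="[0-9]+"/emptytag/')
miss=$(echo 'tag="12447"' | sed 's/tag="[0-9]+"/emptytag/')
# \{1,\} is the portable BRE spelling of "one or more"
hit=$(echo 'tag="12447"' | sed 's/tag="[0-9]\{1,\}"/emptytag/')
# Equivalent without an interval: one digit, then zero or more digits
hit2=$(echo 'tag="12447"' | sed 's/tag="[0-9][0-9]*"/emptytag/')
printf '%s\n' "$lit" "$miss" "$hit" "$hit2"
```

Only the line containing a literal + is matched by the first pattern; the two portable forms both turn tag="12447" into emptytag.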

Bash (grep) regex performing unexpectedly

I have a text file which contains a date in the form dd/mm/yyyy (e.g. 20/12/2012).
I am trying to use grep to parse the date and show it in the terminal, and it works until I hit a certain case.
These are my test cases:
grep -E "\d*" returns 20/12/2012
grep -E "\d*/" returns 20/12/2012
grep -E "\d*/\d*" returns 20/12/2012
grep -E "\d*/\d*/" returns nothing
grep -E "\d+" also returns nothing
Could someone explain to me why I get this unexpected behavior?
EDIT: I get the same behavior if I use ' (strong quotes) instead of " (weak quotes).
The syntax you used (\d) is not recognised by grep's extended regular expressions (-E); the regex is interpreted by grep, not by Bash.
Use grep -P instead which uses Perl regex (PCRE). For example:
grep -P "\d+/\d+/\d+" input.txt
grep -P "\d{2}/\d{2}/\d{4}" input.txt # more restrictive
Or, to stick with extended regex, use [0-9] in place of \d:
grep -E "[0-9]+/[0-9]+/[0-9]+" input.txt
grep -E "[0-9]{2}/[0-9]{2}/[0-9]{4}" input.txt # more restrictive
You could also use -P instead of -E, which makes grep use PCRE syntax:
grep -P "\d+/\d+" file
works too.
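grep and egrep/grep -E don't recognize \d; GNU grep treats it as a plain d. Your first three patterns only appear to work because the asterisk allows zero occurrences, so they match the empty string or the slashes without matching any digits. The fourth needs two slashes with nothing but d characters between them, and \d+ needs at least one literal d, so neither matches.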
Use [0-9] or [[:digit:]].
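As a sketch of the bracket-expression approach, the commands below run over a small made-up sample (the sample lines and variable names are illustrative). The -x flag, which requires the whole line to match, is an addition here for stricter validation; it is not part of the original answers.

```shell
#!/bin/sh
sample=$(printf '%s\n' '20/12/2012' '2/3/45' 'no date here')
# -x: the whole line must match; [0-9] works in both BRE and ERE
strict=$(printf '%s\n' "$sample" | grep -Ex '[0-9]{2}/[0-9]{2}/[0-9]{4}')
# [[:digit:]] is the POSIX character-class spelling of the same set
loose=$(printf '%s\n' "$sample" | grep -E '[[:digit:]]+/[[:digit:]]+/[[:digit:]]+')
printf '%s\n' "$strict" '---' "$loose"
```

The strict pattern keeps only 20/12/2012, while the looser one also accepts 2/3/45.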
To help troubleshoot cases like this, the -o flag can be helpful as it shows only the matched portion of the line. With your original expressions:
grep -Eo "\d*" returns nothing - a clue that \d isn't doing what you thought it was.
grep -Eo "\d*/" returns / (twice) - confirmation that \d isn't matching while the slashes are.
As noted by others, the -P flag solves the issue by recognizing "\d", but to clarify Explosion Pills' answer, you could also use -E as follows:
grep -Eo "[[:digit:]]*/[[:digit:]]*/" returns 20/12/
EDIT: Per a comment by @shawn-chin (thanks!), --color can be used similarly to highlight the matched portions of the line while still showing the entire line:
grep -E --color "[[:digit:]]*/[[:digit:]]*/" returns 20/12/2012 (color can't be shown here, but the "20/12/" portion would be highlighted)
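The -o troubleshooting technique can be sketched like this, assuming a grep that supports -o (GNU and BSD grep do, though it is not in POSIX); the variable names are illustrative. Each matched piece is printed on its own line, which makes it obvious what the pattern actually consumed.

```shell
#!/bin/sh
line='20/12/2012'
# -o prints each matched part on its own line instead of the whole line
parts=$(printf '%s\n' "$line" | grep -Eo '[[:digit:]]*/[[:digit:]]*/')
digits=$(printf '%s\n' "$line" | grep -Eo '[[:digit:]]+')
printf '%s\n' "$parts" '---' "$digits"
```

The first command prints 20/12/, and the second prints 20, 12 and 2012 on separate lines.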

Java regex and sed aren't the same...?

Given these strings:
00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528
Apply this expression in Java: ^(06700|067|00)([0-9]*)
My intention is to remove a leading "06700", "067" or "00" from the beginning of the string.
It all works in Java, and group 2 always has the number I intend, but in sed it isn't the same:
$ cat strings|sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528
What the heck am I missing?
Cheers,
f.
When using extended regular expressions, you also need to omit the \ before ( and ). This works for me:
sed -r 's/^(06700|067|00)([0-9]*)/\2/g' strings
Note also that there's no need for a separate call to cat.
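A minimal sketch comparing the forms, assuming GNU sed (the \| alternation in the BRE line is a GNU extension) and using a few of the question's strings as input. The /g flag is dropped because the ^ anchor can only match once per line, and the third command shows that the second group is not needed just to strip the prefix.

```shell
#!/bin/sh
input=$(printf '%s\n' 00543515703528 0902345623 067913187056 005217721320739)
# ERE: ( ) group and | alternates without backslashes
ere=$(printf '%s\n' "$input" | sed -E 's/^(06700|067|00)([0-9]*)/\2/')
# GNU BRE equivalent: \( \) for grouping, \| for alternation
bre=$(printf '%s\n' "$input" | sed 's/^\(06700\|067\|00\)\([0-9]*\)/\2/')
# Deleting the prefix alone gives the same result
short=$(printf '%s\n' "$input" | sed -E 's/^(06700|067|00)//')
printf '%s\n' "$ere"
```

All three produce 543515703528, 0902345623 (unchanged, since it has none of the prefixes), 913187056 and 5217721320739.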
I believe your problem is this:
"sed defaults to BRE: The default behaviour of sed is to support Basic Regular Expressions (BRE). To use all the features described on this page set the -r (Linux) or -E (BSD) flag to use Extended Regular Expressions." (Source)
Without this flag, the | character is interpreted literally. Try this example, where the literal text 06700|067|00 is what gets removed:
echo "06700|067|0055555" | sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
55555