Under Solaris 5.10, why doesn't this regexp match a line like tag="12447"?
sed "s/tag=\"[0-9]+\"/emptytag/" test.xml
(I noticed that -r is not implemented in the sed version)
In strict posix mode, the + sign cannot be used to represent "one or more" of something. You can use a range of {1,} instead (escaped of course):
echo 'tag="12447"' | sed --posix "s/tag=\"[0-9]\{1,\}\"/emptytag/"
emptytag
Note that you don't actually need the --posix, I was just using it to disable all GNU extensions in my version of sed:
echo 'tag="12447"' | sed "s/tag=\"[0-9]\{1,\}\"/emptytag/"
emptytag
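As a quick check of the interval form, here is a minimal sketch that should behave the same on any POSIX sed. The function name replace_tag is made up for illustration, and the second input line is only there to show that an empty tag is left alone because \{1,\} requires at least one digit.

```shell
# Hypothetical helper: replace tag="<digits>" using only POSIX basic regex syntax.
replace_tag() {
    # \{1,\} is the BRE interval meaning "one or more of the preceding item".
    sed 's/tag="[0-9]\{1,\}"/emptytag/'
}

# First line is rewritten; second line has no digits, so it is left unchanged.
printf '%s\n' 'tag="12447"' 'tag=""' | replace_tag
```

The single quotes around the sed script avoid having to escape the double quotes inside the pattern.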
Related
To my understanding, * matches zero or more and + matches one or more.
So when I did this on macOS:
echo "1" | sed 's/[0-9]*//g'
The number was deleted.
But if I do this:
echo "1" | sed 's/[0-9]+//g'
The number will still be there.
But shouldn't [0-9]+ match "1" as well?
This is (probably) about whether sed is interpreting the pattern as a basic or an extended regular expression.
In a POSIX basic regular expression (the default), a + in a sed regex is not a meta-character.
In an extended regular expression, a + means "one or more repetitions". Extended regular expressions are enabled using the -E option.
For more information about sed regexes:
https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
+ in sed is considered part of the extended-regular expressions, and so, by default, + is not recognized as a special character. Use the -E flag to enable extended regular expressions like so:
echo "1" | sed -E 's/[0-9]+//g'
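To see the difference directly, the sketch below feeds the same two lines through the same pattern with and without -E. The function names are only for illustration. Without -E the + is a literal plus sign, so only the input that actually contains "1+" is touched.

```shell
# Same pattern, two interpretations (function names are illustrative only).
bre_plus() { sed 's/[0-9]+/X/'; }      # basic regex: + is an ordinary character
ere_plus() { sed -E 's/[0-9]+/X/'; }   # extended regex: + means "one or more"

printf '%s\n' '1' '1+' | bre_plus   # prints "1" then "X"
printf '%s\n' '1' '1+' | ere_plus   # prints "X" then "X+"
```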
I have an XML file that I am finding and replacing emails and usernames in.
It's all good, but to avoid some duplicate user emails etc., I want to skip XML elements of specific types.
I can do this if I want to skip ONE specific type, i.e.
/ApplicationUser/!s/"user.name"/"user.name#abc.com"/g
But not if I try multiple on the one sed command
/(OtherElement|ApplicationUser)/!s/"user.name"/"user.name#abc.com"/g
OR
/\(OtherElement\|ApplicationUser\)/!s/"user.name"/"user.name#abc.com"/g
OR
/\(OtherElement|ApplicationUser\)/!s/"user.name"/"user.name#abc.com"/g
I am loading in the commands from a file if that is relevant. I'm assuming it has something to do with my pattern at the start trying to match 1 or more words but not sure.
So, the regular expression syntax depends on the version of sed you're using.
First off, according to the POSIX specification, basic regular expressions (BRE) do not support alternation. However, tools do not necessarily follow the specification and, in particular, different versions of sed have different behavior.
The examples below are all processing this file:
$ cat sed-re-test.txt
OtherElement "user.name"
OnlyReplaceMe "user.name"
ApplicationUser "user.name"
GNU sed
The GNU sed BRE variant supports alternation, but the | metacharacter (along with ( and )) must be escaped with a \. If you use the -E flag to enable Extended Regular Expressions (ERE), then the metacharacters must not be escaped.
$ sed --version
sed (GNU sed) 4.4
<...SNIP...>
GNU sed BRE variant (with escaped metacharacters): WORKS
$ cat sed-re-test.txt | sed '/\(OtherElement\|ApplicationUser\)/!s/"user.name"/"user.name#abc.com"/g'
OtherElement "user.name"
OnlyReplaceMe "user.name#abc.com"
ApplicationUser "user.name"
GNU sed ERE (with unescaped metacharacters): WORKS
$ cat sed-re-test.txt | sed -E '/(OtherElement|ApplicationUser)/!s/"user.name"/"user.name#abc.com"/g'
OtherElement "user.name"
OnlyReplaceMe "user.name#abc.com"
ApplicationUser "user.name"
BSD/MacOS sed
BSD sed does not support alternation in BRE mode. You must use -E to enable alternation support.
No --version flag, so identifying the OS will have to do:
$ uname -s
OpenBSD
BSD sed BRE (with escaped and unescaped metacharacters): DOES NOT WORK
$ cat sed-re-test.txt | sed '/\(OtherElement\|ApplicationUser\)/! s/"user.name"/"user.name#abc.com"/'
OtherElement "user.name#abc.com"
OnlyReplaceMe "user.name#abc.com"
ApplicationUser "user.name#abc.com"
$ cat sed-re-test.txt | sed '/(OtherElement|ApplicationUser)/! s/"user.name"/"user.name#abc.com"/'
OtherElement "user.name#abc.com"
OnlyReplaceMe "user.name#abc.com"
ApplicationUser "user.name#abc.com"
BSD sed ERE (with unescaped metacharacters): WORKS
$ cat sed-re-test.txt | sed -E '/(OtherElement|ApplicationUser)/! s/"user.name"/"user.name#abc.com"/'
OtherElement "user.name"
OnlyReplaceMe "user.name#abc.com"
ApplicationUser "user.name"
This might work for you (GNU sed):
sed '/OtherElement\|ApplicationUser/b;s/"user.name"/"user.name#abc.com"/g' file
On encountering a line which you do not want to process, break out, fetch the next and repeat.
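If \| is not available, the same branch idea can be spelled with one address per element, which needs no alternation at all. This is only a sketch: the function name skip_and_replace is made up, and the sample lines are the ones from sed-re-test.txt shown earlier.

```shell
# Hypothetical helper: skip lines mentioning either element, rewrite the rest.
skip_and_replace() {
    # A bare "b" branches to the end of the script, so the s command is
    # never reached for lines matching either address.
    sed -e '/OtherElement/b' \
        -e '/ApplicationUser/b' \
        -e 's/"user.name"/"user.name#abc.com"/g'
}

printf '%s\n' \
    'OtherElement "user.name"' \
    'OnlyReplaceMe "user.name"' \
    'ApplicationUser "user.name"' | skip_and_replace
```

Only the OnlyReplaceMe line should come out changed. Adding another element to skip means adding one more -e line.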
Just use awk and avoid the convoluted, backwards logic (if X do NOT do Y but do Y for everything else vs the simple if NOT X do Y) and the version-specific constructs that you get with sed.
awk '!/OtherElement|ApplicationUser/{ gsub(/"user.name"/,"\"user.name#abc.com\"") } 1' file
That is clear, simple, extensible and will work with any awk in any shell on any UNIX box.
I'm trying to swap words around with sed, not replace them, which is all I keep finding on Google search.
I don't know if it's the regex that I'm getting wrong. I did a search for everything before a char and everything after a char, so that's how I got the regex.
echo xxx,aaa | sed -r 's/[^,]*/[^,]*$/'
or
echo xxx/aaa | sed -r 's/[^\/]*/[^\/]*$/'
I am getting this output:
[^,]*$,aaa
or this:
[^,/]*$/aaa
What am I doing wrong?
For the first sample, you should use:
echo xxx,aaa | sed 's/\([^,]*\),\([^,]*\)/\2,\1/'
For the second sample, simply use a character other than slash as the delimiter:
echo xxx/aaa | sed 's%\([^/]*\)/\([^/]*\)%\2/\1%'
You can also use \{1,\} to formally require one or more:
echo xxx,aaa | sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'
echo xxx/aaa | sed 's%\([^/]\{1,\}\)/\([^/]\{1,\}\)%\2/\1%'
This uses the most portable sed notation; it should work anywhere. With modern versions that support extended regular expressions (-r with GNU sed, -E with Mac OS X or BSD sed), you can lose some of the backslashes and use + in place of * which is more precisely what you're after (and parallels \{1,\} much more succinctly):
echo xxx,aaa | sed -E 's/([^,]+),([^,]+)/\2,\1/'
echo xxx/aaa | sed -E 's%([^/]+)/([^/]+)%\2/\1%'
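A small sketch to confirm that the BRE and ERE spellings agree, including on input with more than one comma or none at all. The function names swap_bre and swap_ere and the extra test strings are assumptions added for illustration.

```shell
# The same swap written twice (names are illustrative only).
swap_bre() { sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'; }   # portable BRE
swap_ere() { sed -E 's/([^,]+),([^,]+)/\2,\1/'; }              # ERE via -E

# Print each input next to the result of both variants.
for s in 'xxx,aaa' 'one,two,three' 'nocomma'; do
    printf '%s -> %s | %s\n' "$s" \
        "$(printf '%s\n' "$s" | swap_bre)" \
        "$(printf '%s\n' "$s" | swap_ere)"
done
```

Without the g flag only the first pair is swapped, so one,two,three becomes two,one,three, and a line without a comma passes through unchanged.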
With sed it would be:
sed 's#\([[:alpha:]]\+\)/\([[:alpha:]]\+\)#\2/\1#' <<< 'xxx/aaa'
which is simpler to read if you use extended posix regexes with -r:
sed -r 's#([[:alpha:]]+)/([[:alpha:]]+)#\2/\1#' <<< 'xxx/aaa'
I'm using two sub patterns ([[:alpha:]]+) which can contain one or more letters and are separated by a /. In the replacement part I reassemble them in reverse order \2/\1. Please also note that I'm using # instead of / as the delimiter for the s command since / is already the field delimiter in the input data. This saves us from having to escape the / in the regex.
Btw, you can also use awk for that, which is pretty easy to read:
awk -F'/' '{print $2,$1}' OFS='/' <<< 'xxx/aaa'
I would like to replace all terms that start with a hashtag with a new term
I'm using sed but there seems to be a syntax error
sed 's/#[a-zA-Z0-9]+/replacement/g' terms
How can I correct my syntax?
sed supports a "basic regular expression" (BRE) which does not offer the + as a special operator.
A correct replacement for + would be
sed 's/#[[:alnum:]]\{1,\}/replacement/g'
or
sed 's/#[[:alnum:]][[:alnum:]]*/replacement/g'
GNU sed and recent BSD sed offer "extended regular expression" (ERE) matching:
sed -E 's/#[[:alnum:]]+/replacement/g'
(although with older GNU sed versions you should probably use -r, since -E was long undocumented there)
and they also offer \+ as an extension to BRE,
sed 's/#[[:alnum:]]\+/replacement/g'
If you require portability you should stick with the BRE of regular sed.
@user784637 I used [[:alnum:]] instead of [a-zA-Z0-9]. Depending on the locale, this also matches letters with diacritics, for example.
$ printf "%s\n" ë è é | grep '[a-zA-Z0-9]'
$
vs.
$ printf "%s\n" ë è é | grep '[[:alnum:]]'
ë
è
é
$
Use whichever suits your needs.
On my version of sed, + doesn't do anything useful. You should use * instead.
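Note that * and "one or more" are not interchangeable here: * also accepts zero characters, so a lone # gets replaced as well. A minimal sketch of the difference, with made-up function names and sample input:

```shell
# Illustrative comparison of "one or more" versus "zero or more" in portable BRE.
one_or_more()  { sed 's/#[[:alnum:]][[:alnum:]]*/replacement/g'; }
zero_or_more() { sed 's/#[[:alnum:]]*/replacement/g'; }

printf '%s\n' 'see #foo and # alone' | one_or_more    # see replacement and # alone
printf '%s\n' 'see #foo and # alone' | zero_or_more   # see replacement and replacement alone
```

Whether the bare # should be replaced depends on the data, so pick the form that matches the intent.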
This is a really simple RegEx that isn't working, and I can't figure out why. According to this, it should work.
I'm on a Mac (OS X 10.8.2).
script.sh
#!/bin/bash
ZIP="software-1.3-licensetypeone.zip"
VERSION=$(sed 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g' <<< $ZIP)
echo $VERSION
terminal
$ sh script.sh
1.3-licensetypeone.zip
Looking at the regex documentation for OS X 10.7.4 (but should apply to OP's 10.8.2), it is mentioned in the last paragraph that
Obsolete (basic) regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality...
... The parentheses for nested subexpressions are `\(' and `\)'...
sed, without any options, uses basic regular expression (BRE).
To use | in OS X or BSD's sed, you need to enable extended regular expression (ERE) via -E option, i.e.
sed -E 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g'
P.S. \| in BRE is a GNU extension.
Alternative ways to extract version number
chop-chop (parameter expansion)
VERSION=${ZIP#software-}
VERSION=${VERSION%-license*.zip}
sed
VERSION=$(sed 's/software-\(.*\)-license.*/\1/' <<< "$ZIP")
You don't necessarily have to match strings word-by-word with shell patterns or regex.
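sed works with basic regular expressions by default. With GNU sed, you have to backslash the parentheses and the vertical bar to make it work (\| is a GNU extension, so this does not help with the BSD sed that ships with OS X).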
sed 's/software-//g;s/-\(licensetypeone\|licensetypetwo\)\.zip//g'
Note that I backslashed the dot, too. Otherwise, it would have matched any character.
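Where \| is not available and -E is not wanted, one substitution per alternative is a portable fallback. The sketch below is an assumption-laden variant rather than the original command: the function name extract_version is made up, and the patterns are anchored with ^ and $ instead of using the g flag.

```shell
# Hypothetical helper: strip the prefix and either license suffix using plain BRE.
extract_version() {
    printf '%s\n' "$1" | sed -e 's/^software-//' \
        -e 's/-licensetypeone\.zip$//' \
        -e 's/-licensetypetwo\.zip$//'
}

extract_version 'software-1.3-licensetypeone.zip'     # 1.3
extract_version 'software-2.0.1-licensetypetwo.zip'   # 2.0.1
```

Each -e handles one alternative, so a name with neither suffix simply keeps its tail.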
You can do this in the shell, don't need sed, parameter expansion suffices:
shopt -s extglob
ZIP="software-1.3-licensetypeone.zip"
tmp=${ZIP#software-}
VERSION=${tmp%-licensetype@(one|two).zip}
With a recent version of bash (may not ship with OSX) you can use regular expressions
if [[ $ZIP =~ software-([0-9.]+)-licensetype(one|two).zip ]]; then
VERSION=${BASH_REMATCH[1]}
fi
or, if you just want the 2nd word in a hyphen-separated string
VERSION=$(IFS=-; set -- $ZIP; echo $2)
$ man sed | grep "regexp-extended" -A2
-r, --regexp-extended
use extended regular expressions in the script.