sed: What's wrong with this replace command?

I'm trying to replace the package line in a Java file using this sed command:
sed -i "s/package org\.objectweb\.asm[.\w]*;/package $package;/" FILE
but it doesn't work as expected. What's wrong with this replace command?

I assume you want to match what \w would match, plus a dot. \w doesn't work like that inside a [] character set. Try [.a-zA-Z0-9_] instead:
sed -i "s/package org\.objectweb\.asm[.a-zA-Z0-9_]*;/package $package;/" FILE

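Actually, POSIX sed does not define the Perl-like \w shorthand character class (digits, letters, or underscore). GNU sed accepts \w as an extension, but not inside a bracket expression, where a backslash is just a literal character, so [.\w] matches a dot, a backslash, or the letter w.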
In the POSIX pattern, instead of \w, you would use [_[:alnum:]] where [:alnum:] is a POSIX character class matching any alphanumeric chars. And [._[:alnum:]] seems to be what you meant to use here:
sed -i "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $package;/" FILE
However, judging by your string structure, you may just match any chars but ; with a negated bracket expression [^;]:
sed -i "s/package org\.objectweb\.asm[^;]*;/package $package;/" FILE
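A quick way to check both suggestions without touching a file is to pipe a sample line through sed and drop -i. This is only a sketch: the replacement package name and the sample input line are made up for illustration.

```shell
# Hypothetical replacement package name and sample input, for illustration only.
package='com.example.asm'
input='package org.objectweb.asm.tree.analysis;'

# Portable bracket expression: a dot, an underscore, or any alphanumeric character.
posix_out=$(printf '%s\n' "$input" |
  sed "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $package;/")

# Simpler alternative: everything up to the next semicolon.
negated_out=$(printf '%s\n' "$input" |
  sed "s/package org\.objectweb\.asm[^;]*;/package $package;/")

# Both print: package com.example.asm;
printf '%s\n' "$posix_out" "$negated_out"
```

Because $package is expanded by the shell into the sed script, a value containing / or & would need escaping first.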

Related

Regex matches but sed fails replace

I am having a tricky regex issue
I have the string like below
some_Name _ _Bday Date Comm.txt
And here is my regex to match the spaces and underscore
\_?\s\_?
Now when I try to replace the string using sed and the above regex
echo "some_Name _ _Bday Date Comm.txt" | sed 's/\_?\s\_?/\_/g'
The output i want is
some_Name_Bday_Date_Comm.txt
Any ideas on how I should go about this?
You are using a POSIX BRE regex engine, where the \_?\s\_? pattern matches a literal _?, a whitespace character (if your sed supports the \s shorthand), and another literal _? substring; that is, the ? characters are treated as literal question marks rather than quantifiers.
You may use
sed -E 's/[[:space:]_]+/_/g'
sed 's/[[:space:]_]\{1,\}/_/g'
The [[:space:]_]+ POSIX ERE pattern (enabled with -E option) will match one or more whitespace or underscore characters.
The POSIX ERE + quantifier can be written as \{1,\} in POSIX BRE. Also, if you use a GNU sed, you may use \+ in the second sed command.
This might work for you (GNU sed):
sed -E 's/\s(\s*_)*/_/g' file
This will replace a space followed by zero or more of the following: zero or more spaces followed by an underscore.
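To see these points in action, the sketch below runs the ERE and BRE forms on the string from the question and adds a small made-up input ('a_? b') to show that an unescaped ? is literal in BRE.

```shell
input='some_Name _ _Bday Date Comm.txt'

# ERE form: one or more whitespace or underscore characters become one underscore.
ere_out=$(printf '%s\n' "$input" | sed -E 's/[[:space:]_]+/_/g')

# BRE form of the same pattern, using \{1,\} instead of +.
bre_out=$(printf '%s\n' "$input" | sed 's/[[:space:]_]\{1,\}/_/g')

# In BRE an unescaped ? is a literal question mark, so only "_?" is replaced here.
literal_out=$(printf '%s\n' 'a_? b' | sed 's/_?/X/')

printf '%s\n' "$ere_out" "$bre_out" "$literal_out"
# some_Name_Bday_Date_Comm.txt
# some_Name_Bday_Date_Comm.txt
# aX b
```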

How does backslash affect curly braces in regex?

So I started to learn regex using grep and sed in linux, and I don't understand why I have to save curly braces? So saving means escaping characters to match them literally, but when I type in grep 'test{2}' it will only match test{2} and when I type 'test\{2\}' it will match testtest. It's okay, but why backslash has another usage with other modifiers? For example in the case of . (dot), when I type test. it will match any text with test followed by any characters. In this case we need backslash to interpret it as a character. So when I use it like that: test\. it will only match test.
So summarized: why in the case of { backslash saves the curly braces to be interpreted as a character, and in the case of other modifiers, like . backslash saves the character to be interpreted as a special one...
I know it sounds hilarious but I don't understand it...
When grep is used with no -E you need to escape ("save") braces that are quantifiers because the regex flavor used is POSIX BRE:
grep 'test\{2\}' file # => Matches testt (the quantifier applies only to the last t), not the group test repeated twice
and
grep '\(test\)\{2\}' file # => Finds lines having testtest
The equivalent POSIX ERE variants are
grep -E 'test{2}' file
grep -E '(test){2}' file
Another example is to match curly braces:
grep '{2}' file # => matches lines having {2} in them
grep -E '\{2}' file # => same; note the } is not special
See more about BRE and ERE POSIX regex standard.
The differences between BRE and ERE POSIX syntax are largely historical; there seems to be no specific design idea behind them.
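The BRE/ERE contrast is easier to see side by side. The sketch below uses three made-up sample lines and adds -x (match the whole line), because a line such as testtest also contains the substring testt and would otherwise be reported by the first pattern too.

```shell
# Sample lines, for illustration only.
lines=$(printf '%s\n' 'testt' 'testtest' 'test{2}')

# BRE (plain grep): escaped braces are quantifiers, bare braces are literal.
bre_quant=$(printf '%s\n' "$lines" | grep -x 'test\{2\}')       # testt
bre_group=$(printf '%s\n' "$lines" | grep -x '\(test\)\{2\}')   # testtest
bre_literal=$(printf '%s\n' "$lines" | grep -x 'test{2}')       # test{2}

# ERE (grep -E): the roles are swapped.
ere_quant=$(printf '%s\n' "$lines" | grep -xE 'test{2}')        # testt
ere_group=$(printf '%s\n' "$lines" | grep -xE '(test){2}')      # testtest
ere_literal=$(printf '%s\n' "$lines" | grep -xE 'test\{2}')     # test{2}

printf '%s\n' "$bre_quant" "$bre_group" "$bre_literal" \
              "$ere_quant" "$ere_group" "$ere_literal"
```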

Using sed to match regex

I don't know much about sed, nor regex. I want to replace every line that contains only tabs by the string '0'. There are also lines in my file that contain only '\n'.
Basically I want to use the regular expression ^\h+$ and replace the matches with 0.
I tried:
sed -i 's/^\h+$/0/' file.txt
But it doesn't work
You can use:
sed -i.bak -E 's/^[[:blank:]]+$/0/' file
The POSIX character class [[:blank:]] matches a space or a tab, which is roughly what \h (horizontal whitespace) matches in PCRE.
-i.bak is to keep original file in file.bak, in case you want to restore.
In GNU sed, a tab can be written as \t. Without -E (that is, in BRE), the one-or-more quantifier needs a backslash, \+, which is also a GNU extension:
sed -i -e 's/^\t\+$/0/' file.txt
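As a rough check of the difference between the two answers, the sketch below feeds four made-up lines (tabs only, space plus tab, text, empty) through sed without -i. Note that [[:blank:]] also accepts spaces; the second command embeds a literal tab via a shell variable to match tabs only, which avoids relying on the GNU-specific \t.

```shell
# A literal tab character, so the tabs-only pattern does not depend on \t support.
tab=$(printf '\t')

# [[:blank:]] version: lines made only of spaces and/or tabs become 0.
blank_out=$(printf '\t\t\n \t\nkeep\n\n' | sed -E 's/^[[:blank:]]+$/0/')

# Tabs-only version: the " <tab>" line is left untouched.
tabs_out=$(printf '\t\t\n \t\nkeep\n\n' | sed -E "s/^${tab}+\$/0/")

printf '%s\n' "$blank_out" '---' "$tabs_out"
```

In both cases the empty line stays empty, because + requires at least one character.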

Match string plus any non-whitespace character and insert whitespace

I'm trying to match and replace a string in a lot of files.
String to search for:
</ANON>[any non-whitespace char], e.g. "</ANON>." or "</ANON>)"
I want to stick a whitespace in between the tag and the non-whitespace char.
I have tried to do it with sed using something like:
sed -i -e 's/<\/ANON>/S/<\/ANON> /S/g'
but alas, that doesn't work.
Any help much appreciated.
Try the following:
sed -i -e 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g' file
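sed is not Perl: \S for non-whitespace characters is not part of POSIX regex syntax (GNU sed accepts it as an extension), so the portable form is [^[:space:]]. You should also capture groups and reuse them in the replacement part. And you can't write /S, because 1) the escape character is a backslash, not a slash, and 2) the slash is already the delimiter sed uses to separate the pattern, replacement, and flags, which is why the command above uses | as the delimiter instead.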
P.S. Or you can use Perl if you like:
perl -p -i -e 's|(</ANON>)(\S)|$1 $2|g' file
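The sed command can be tried on a sample line before running it with -i on real files. The input below is invented for illustration; it covers the two cases from the question plus a tag that is already followed by a space.

```shell
# Made-up sample line: two tags needing a space, one already followed by a space.
input='foo</ANON>. bar</ANON>) baz</ANON> ok'

# \1 is the closing tag, \2 the non-whitespace character that follows it.
spaced_out=$(printf '%s\n' "$input" |
  sed 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g')

printf '%s\n' "$spaced_out"
# foo</ANON> . bar</ANON> ) baz</ANON> ok
```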

Pattern matching digits does not work in egrep?

Why can't I match the string
"1234567-1234567890"
with the given regular expression
\d{7}-\d{10}
with egrep from the shell like this:
egrep \d{7}-\d{10} file
?
egrep doesn't recognize \d shorthand for digit character class, so you need to use e.g. [0-9].
Moreover, while it's not absolutely necessary in this case, it's a good habit to quote the regex to prevent misinterpretation by the shell. Thus, something like this should work:
egrep '[0-9]{7}-[0-9]{10}' file
For completeness:
egrep does in fact support POSIX character classes. The classes are:
[:alnum:]
[:alpha:]
[:cntrl:]
[:digit:]
[:graph:]
[:lower:]
[:print:]
[:punct:]
[:space:]
[:upper:]
[:xdigit:]
Example (note the double brackets):
egrep '[[:digit:]]{7}-[[:digit:]]{10}' file
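Both spellings can be checked against the string from the question by piping it in instead of reading a file. This sketch uses grep -E, the equivalent of egrep (recent GNU grep releases warn that egrep is obsolescent), and also shows what an unquoted \d{7}-\d{10} looks like once the shell has processed it.

```shell
input='1234567-1234567890'

# Explicit range and POSIX class; both print the line because it matches.
range_out=$(printf '%s\n' "$input" | grep -E '[0-9]{7}-[0-9]{10}')
class_out=$(printf '%s\n' "$input" | grep -E '[[:digit:]]{7}-[[:digit:]]{10}')

# Unquoted, the shell strips the backslashes before grep ever sees the pattern.
unquoted=$(printf '%s\n' \d{7}-\d{10})

printf '%s\n' "$range_out" "$class_out" "$unquoted"
# 1234567-1234567890
# 1234567-1234567890
# d{7}-d{10}
```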
You can use \d if you pass grep the "Perl regex" option -P (available in GNU grep when it is built with PCRE support), e.g.:
grep -P "\d{9}"
Use [0-9] instead of \d. egrep doesn't know \d.
Try this one (quoted, and with [0-9] in place of \d, which egrep does not recognize):
egrep '([0-9]{7}-[0-9]{10})' file