I'm trying to replace the package line in a java file using this sed command:
sed -i "s/package org\.objectweb\.asm[.\w]*;/package $package;/" FILE
but it doesn't work as expected. What's wrong with this replace command?
I assume you want to match what \w would match, plus a dot. \w doesn't work like that inside a [] character set. Try [.a-zA-Z0-9_] instead:
sed -i "s/package org\.objectweb\.asm[.a-zA-Z0-9_]*;/package $package;/" FILE
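Actually, POSIX sed does not define \w, the Perl-like shorthand character class that matches digits, letters or underscores. GNU sed accepts \w as an extension, but only outside a bracket expression: inside [...], the backslash and the w are treated as literal characters.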
In the POSIX pattern, instead of \w, you would use [_[:alnum:]] where [:alnum:] is a POSIX character class matching any alphanumeric chars. And [._[:alnum:]] seems to be what you meant to use here:
sed -i "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $package;/" FILE
                                     ^^^^^^^^^^^^^^
However, judging by your string structure, you may just match any chars but ; with a negated bracket expression [^;]:
sed -i "s/package org\.objectweb\.asm[^;]*;/package $package;/" FILE
                                     ^^^^^
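To compare the two suggested patterns without touching a real file, here is a minimal sketch that pipes one sample line through sed (no -i). The package value and the sample line are made up for illustration. Note that the value of $package is pasted into the replacement, so it is assumed not to contain /, & or backslashes.

```shell
# Hypothetical replacement value and input line, for illustration only.
package='com.example.shaded.asm'
line='package org.objectweb.asm.tree.analysis;'

# POSIX class inside a bracket expression: dots, underscores, letters, digits.
out_class=$(printf '%s\n' "$line" |
  sed "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $package;/")

# Negated bracket expression: everything up to the first semicolon.
out_negated=$(printf '%s\n' "$line" |
  sed "s/package org\.objectweb\.asm[^;]*;/package $package;/")

printf '%s\n%s\n' "$out_class" "$out_negated"
```

Both commands should print the same rewritten package line for this input.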
Related
I am having a tricky regex issue
I have the string like below
some_Name _ _Bday Date Comm.txt
And here is my regex to match the spaces and underscore
\_?\s\_?
Now when i try to replace the string using sed and the above regex
echo "some_Name _ _Bday Date Comm.txt" | sed 's/\_?\s\_?/\_/g'
The output i want is
some_Name_Bday_Date_Comm.txt
Any ideas on how do i go about this ?
You are using a POSIX BRE regex engine, where the \_?\s\_? pattern matches the literal text _?, a whitespace character (if your sed supports the \s shorthand), and another literal _?. In other words, each ? is treated as a literal question mark, not as a quantifier.
You may use
sed -E 's/[[:space:]_]+/_/g'
sed 's/[[:space:]_]\{1,\}/_/g'
The [[:space:]_]+ POSIX ERE pattern (enabled with -E option) will match one or more whitespace or underscore characters.
The POSIX ERE + quantifier can be written as \{1,\} in POSIX BRE. With GNU sed, you may also write it as \+ in the second sed command.
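A quick way to check both forms against the string from the question is sketched below. The variable names are only for illustration.

```shell
input='some_Name _ _Bday Date Comm.txt'

# ERE form: + means "one or more".
out_ere=$(printf '%s\n' "$input" | sed -E 's/[[:space:]_]+/_/g')

# BRE form: the same quantifier spelled as an interval.
out_bre=$(printf '%s\n' "$input" | sed 's/[[:space:]_]\{1,\}/_/g')

printf '%s\n%s\n' "$out_ere" "$out_bre"
```

Each run of spaces and underscores collapses into a single underscore, so both lines should read some_Name_Bday_Date_Comm.txt.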
This might work for you (GNU sed):
sed -E 's/\s(\s*_)*/_/g' file
This will replace a space followed by zero or more of the following: zero or more spaces followed by an underscore.
So I started to learn regex using grep and sed in linux, and I don't understand why I have to save curly braces? So saving means escaping characters to match them literally, but when I type in grep 'test{2}' it will only match test{2} and when I type 'test\{2\}' it will match testtest. It's okay, but why backslash has another usage with other modifiers? For example in the case of . (dot), when I type test. it will match any text with test followed by any characters. In this case we need backslash to interpret it as a character. So when I use it like that: test\. it will only match test.
So summarized: why in the case of { backslash saves the curly braces to be interpreted as a character, and in the case of other modifiers, like . backslash saves the character to be interpreted as a special one...
I know it sounds hilarious but I don't understand it...
When grep is used with no -E you need to escape ("save") braces that are quantifiers because the regex flavor used is POSIX BRE:
grep 'test\{2\}' file # => Finds lines having testt, not testtest
and
grep '\(test\)\{2\}' file # => Finds lines having testtest
The equivalent POSIX ERE variants are
grep -E 'test{2}' file
grep -E '(test){2}' file
Another example is to match curly braces:
grep '{2}' file # => matches lines having {2} in them
grep -E '\{2}' file # => same, note the } is not special
See more about BRE and ERE POSIX regex standard.
The differences between BRE and ERE POSIX syntax are historical; there seems to be no deeper rationale behind them.
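The brace rules above can be tried on a small throwaway file. The sketch below is only an illustration: the temporary file and its three sample lines are made up, and -x (match the whole line) is added so that each pattern selects exactly one line.

```shell
# Throwaway sample file with three candidate lines (hypothetical data).
tmp=$(mktemp)
printf '%s\n' 'testt' 'testtest' 'test{2}' > "$tmp"

# BRE (plain grep): \{2\} is a quantifier, bare {2} is literal text.
bre_quant=$(grep -x 'test\{2\}' "$tmp")        # "tes" followed by "t" twice
bre_group=$(grep -x '\(test\)\{2\}' "$tmp")    # the group "test" twice
bre_literal=$(grep -x 'test{2}' "$tmp")        # literal braces

# ERE (grep -E): bare {2} is a quantifier, \{ is a literal brace.
ere_quant=$(grep -xE 'test{2}' "$tmp")
ere_group=$(grep -xE '(test){2}' "$tmp")
ere_literal=$(grep -xE 'test\{2}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$bre_quant" "$bre_group" "$bre_literal" \
              "$ere_quant" "$ere_group" "$ere_literal"
```

The BRE and ERE commands come in pairs that select the same line, which shows that the two flavors only swap which spelling is the quantifier and which is the literal.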
I don't know much about sed, nor regex. I want to replace every line that contains only tabs by the string '0'. There are also lines in my file that contain only '\n'.
Basically I want to use the regular expression ^\h+$ and replace the matches with 0.
I tried:
sed -i 's/^\h+$/0/' file.txt
But it doesn't work
You can use:
sed -i.bak -E 's/^[[:blank:]]+$/0/' file
The POSIX character class [[:blank:]] matches a space or a tab, which is roughly what \h (horizontal whitespace) matches in PCRE.
-i.bak is to keep original file in file.bak, in case you want to restore.
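To see the effect of the [[:blank:]] substitution without editing a file in place, here is a small sketch on made-up input: a text line, a line holding only tabs, an empty line, and another text line.

```shell
# Sample input (hypothetical): "a", a tabs-only line, an empty line, "b".
out=$(printf 'a\n\t\t\n\nb\n' | sed -E 's/^[[:blank:]]+$/0/')

printf '%s\n' "$out"
```

Only the tabs-only line becomes 0. The empty line is left alone because + requires at least one blank character.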
In GNU sed, a tab can be written as \t. In a basic regex, the one-or-more quantifier needs a backslash: \+
sed -i -e 's/^\t\+$/0/' file.txt
I'm trying to match and replace a string in a lot of files.
String to search for:
</ANON>[any non-whitespace char], e.g. "</ANON>." or "</ANON>)"
I want to stick a whitespace in between the tag and the non-whitespace char.
I have tried to do it with sed using something like:
sed -i -e 's/<\/ANON>/S/<\/ANON> /S/g'
but alas, that doesn't work.
Any help much appreciated.
Try the following:
sed -i -e 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g' file
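sed is not Perl: \S for non-whitespace characters is not part of POSIX regular expressions (GNU sed accepts it as an extension), so the portable form is [^[:space:]]. You should also capture groups and reuse them in the replacement part. Finally, /S cannot work in your command because the slash is the delimiter sed uses to separate the pattern, the replacement and the flags; that is why the command above uses | as the delimiter instead.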
P.S. Or you can use Perl if you like:
perl -p -i -e 's|(</ANON>)(\S)|$1 $2|g' file
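The sed variant can be checked on a single line before running it over many files with -i. The input below is made up for illustration.

```shell
# Hypothetical sample line: two tags followed by punctuation, one by a space.
input='<ANON>John</ANON>. (<ANON>Jane</ANON>) and <ANON>Bob</ANON> left'

# \1 is the closing tag, \2 the non-whitespace character that follows it.
out=$(printf '%s\n' "$input" |
  sed 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g')

printf '%s\n' "$out"
```

A space is inserted after the first two closing tags only; the third one is already followed by whitespace, so it is left unchanged.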
Why can't I match the string
"1234567-1234567890"
with the given regular expression
\d{7}-\d{10}
with egrep from the shell like this:
egrep \d{7}-\d{10} file
?
egrep doesn't recognize \d shorthand for digit character class, so you need to use e.g. [0-9].
Moreover, it's a good habit to quote the regex to prevent misinterpretation by the shell: unquoted, the shell strips the backslashes before egrep ever sees them. Thus, something like this should work:
egrep '[0-9]{7}-[0-9]{10}' file
For completeness:
Egrep does in fact have support for character classes. The classes are:
[:alnum:]
[:alpha:]
[:cntrl:]
[:digit:]
[:graph:]
[:lower:]
[:print:]
[:punct:]
[:space:]
[:upper:]
[:xdigit:]
Example (note the double brackets):
egrep '[[:digit:]]{7}-[[:digit:]]{10}' file
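Both spellings can be checked against the string from the question. This sketch uses grep -E, the POSIX spelling of egrep (recent GNU grep releases print an obsolescence warning for egrep); the variable names are only for illustration.

```shell
line='1234567-1234567890'

# Explicit range in a bracket expression.
out_range=$(printf '%s\n' "$line" | grep -E '[0-9]{7}-[0-9]{10}')

# POSIX character class; note the double brackets.
out_class=$(printf '%s\n' "$line" | grep -E '[[:digit:]]{7}-[[:digit:]]{10}')

printf '%s\n%s\n' "$out_range" "$out_class"
```

Both commands should print the input line, since it consists of seven digits, a hyphen and ten digits.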
You can use \d if you pass grep the -P (Perl-compatible regex) option, which GNU grep provides, e.g.:
grep -P "\d{9}"
Use [0-9] instead of \d. egrep doesn't know \d.
try this one:
egrep '([0-9]{7}-[0-9]{10})' file