Sed works only without grouping - regex

I have a file of the following format:
foo: ...
bar: ...
baz: ...
I want to delete all lines that start with bar, baz or goo. I could do a couple of seds in the following format:
sed '/^bar:.*$/d'
But I'd rather put all the possibilities in one command. I thought this would work, but it's not deleting the lines:
sed '/^(?:bar|baz|goo):.*$/d'
I also noticed that not even this works:
sed '/^(bar):.*$/d'
Which was surprising because I thought the capture group wouldn't actually change any behavior in the pattern matching.

You need to use extended regular expressions:
sed -r '/^(bar):.*$/d'
Also, sed doesn't support Perl regular expressions, so drop the non-capturing group syntax (?:) and use:
sed -r '/^(bar|baz|goo):.*$/d'

You can use the following (the \| alternation in a basic regex is a GNU extension):
sed '/^\(bar\|baz\|goo\):/d' file
Or with extended regex:
sed -r '/^(bar|baz|goo):/d' file
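To see the two dialects side by side, here is a small sketch. The sample input and the function names (sample, delete_literal_parens, delete_ere) are made up for illustration, and -E is used instead of -r on the assumption that your sed accepts it (both GNU and BSD sed do).

```shell
# Sample input in the question's format (the values are made up).
sample() {
  printf '%s\n' 'foo: 1' 'bar: 2' 'baz: 3' 'goo: 4'
}

# Without -E the characters ( | ) are literal, so no line matches
# and nothing is deleted.
delete_literal_parens() {
  sed '/^(bar|baz|goo):/d'
}

# With -E the same pattern groups and alternates as intended.
delete_ere() {
  sed -E '/^(bar|baz|goo):/d'
}

sample | delete_literal_parens   # prints all four lines
sample | delete_ere              # prints only "foo: 1"
```

The first call shows why the original attempt silently did nothing: the pattern was valid, it just looked for literal parentheses.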

Related

Replace a string between regular expressions

I have a csv file with the following contents:
INTERB-MNT,2008-09-10T21:05:38Z,2008-09-10T21:05:38Z,MARIA
How can I use sed to replace the characters 'T' and 'Z', so that the contents of the file are changed to the following?
INTERB-MNT,2008-09-10,21:05:38,UTC,2008-09-10,21:05:38,UTC,MARIA
I tried the following, but obviously I'm missing something because that does not produce the desired results:
sed -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}.T.[0-9]{2}:[0-9]{2}:[0-9]{2}Z/[0-9]{4}-[0-9]{2}-[0-9]{2},[0-9]{2}:[0-9]{2}:[0-9]{2}UTC/g'
To keep parts of the matched text, capture them with parentheses and refer to them as \1 through \9 in the replacement. Backreferences work in both regex dialects; the -E or -r option just lets you write ( ) and { } without backslashes.
The command will look like this:
sed -r 's/(.+)T(.+)Z/\1,\2,UTC/g'
But this doesn't work here: (.+) is greedy, so the first group swallows everything up to the T of the last timestamp, and only that one timestamp is converted. So your idea of matching the 2008-09-10 and 21:05:38 shapes explicitly is good. With capture groups added, it becomes:
sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})Z/\1,\2,UTC/g'
This works. You could also use this simpler command:
sed -r 's/(....-..-..)T(..:..:..)Z/\1,\2,UTC/g'
It is easier to read and write, and a false positive is very unlikely. It depends on your needs.
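The difference between the greedy and the explicit pattern can be checked on the sample line from the question. This is only a sketch; the function names split_greedy and split_explicit are invented here, and -E is assumed to be accepted by your sed.

```shell
line='INTERB-MNT,2008-09-10T21:05:38Z,2008-09-10T21:05:38Z,MARIA'

# Greedy groups: the first (.+) extends to the T of the last timestamp,
# so only that timestamp is rewritten.
split_greedy() {
  sed -E 's/(.+)T(.+)Z/\1,\2,UTC/g'
}

# Explicit groups: each group only accepts the date or time shape,
# so every timestamp on the line is rewritten.
split_explicit() {
  sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})Z/\1,\2,UTC/g'
}

printf '%s\n' "$line" | split_greedy
# INTERB-MNT,2008-09-10T21:05:38Z,2008-09-10,21:05:38,UTC,MARIA
printf '%s\n' "$line" | split_explicit
# INTERB-MNT,2008-09-10,21:05:38,UTC,2008-09-10,21:05:38,UTC,MARIA
```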

Bash: sed regex pattern won't match strings

I have tested this particular regex in RegExr.com:
/(\*)*((\s)?(\w)*)/g
to match the following:
* Global Links contained...etc
* Change User, contact list...etc
(everything from ... on is just extra words in the sentence, not a literal ...etc)
I tried to use this regex in a sed command as part of a bash script like so:
sed "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
But these two lines still remain in stripped.txt. Is there something I'm not accounting for in the regex or in the file? Before these two lines is the start of a block comment (/**), and the end of the block comment (*/) comes after them, each on its own line. Am I missing something obscure about newlines, or is the sed command/regex wrong?
You aren't accounting for the regex dialect that sed uses by default, BRE (basic regular expressions). In a BRE, unescaped ( ) and ? are literal characters, so the pattern does not mean what it means on RegExr.
You need to tell sed to use EREs (extended regular expressions).
For GNU sed that is the -r flag and for BSD sed that is the -E flag (though -r is often available as a compat flag).
sed -r "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
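One caveat worth checking before relying on that command: every part of the pattern is optional, so as an ERE it can match the empty string, and /d then deletes every line. The sketch below shows this on a made-up sample and compares it with an anchored pattern; the function names and the anchored pattern are illustrative assumptions, not something from the original thread.

```shell
# Made-up sample: a block comment with two bullet lines, then code.
sample_comment() {
  printf '%s\n' '/**' ' * Global Links contained' ' * Change User, contact list' ' */' 'int x;'
}

# The original pattern as an ERE: it can match the empty string,
# so every line matches and the output is empty.
delete_with_original() {
  sed -E '/(\*)*((\s)?(\w)*)/d'
}

# Anchored alternative: optional leading whitespace, a literal *,
# then at least one whitespace character.
delete_comment_bullets() {
  sed -E '/^[[:space:]]*\*[[:space:]]/d'
}

sample_comment | delete_with_original    # prints nothing
sample_comment | delete_comment_bullets  # keeps "/**", " */" and "int x;"
```

Anchoring with ^ and requiring the literal * is what stops the pattern from matching lines it was never meant for.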

Selective find/replace with sed

I need to do some find and replace in C++ source code: replace all occurrences of _uvw with xyz except when _uvw is part of abc_uvw or def_uvw. For example:
abc_uvw ghi_uvw;
jkl_uvw def_uvw;
should become:
abc_uvw ghixyz;
jklxyz def_uvw;
So far I came up with the following:
find . -type f -print0 | xargs -0 sed -i '/abc_uvw/\!s/_uvw/xyz/g'
This will replace all _uvw with xyz only in the lines that don't contain abc_uvw, which (1) doesn't handle such a case: abc_uvw ghi_uvw; and (2) doesn't take into account the second exception, that is def_uvw.
So how would one do that sort of selective find and replace with sed?
This might work for you (GNU sed):
sed -r 's/(abc|def)_uvw/\1\n_uvw/g;s/([^\n])_uvw/\1xyz/g;s/\n//g' file
Insert a newline in front of the strings you do not want to change. Change those strings which do not have a newline in front of them. Delete any newlines.
N.B. Newline is chosen because it cannot occur in an unmodified sed pattern space.
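Here is the newline-marker technique as a runnable sketch, assuming GNU sed (it relies on \n in the replacement and inside a bracket expression). The function names are invented for illustration. The second function is a hypothetical variant that also handles a _uvw at the very start of a line, which the ([^\n]) group cannot match because it needs a preceding character.

```shell
# The three-step approach: mark protected names with a newline,
# replace unmarked _uvw, then remove the markers.
replace_original() {
  sed -E 's/(abc|def)_uvw/\1\n_uvw/g; s/([^\n])_uvw/\1xyz/g; s/\n//g'
}

# Variant (an assumption, not from the thread): (^|[^\n]) also accepts
# the start of the line in place of a preceding character.
replace_with_line_start() {
  sed -E 's/(abc|def)_uvw/\1\n_uvw/g; s/(^|[^\n])_uvw/\1xyz/g; s/\n//g'
}

printf '%s\n' 'abc_uvw ghi_uvw;' 'jkl_uvw def_uvw;' | replace_original
# abc_uvw ghixyz;
# jklxyz def_uvw;
printf '%s\n' '_uvw x' | replace_original         # _uvw x
printf '%s\n' '_uvw x' | replace_with_line_start  # xyz x
```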
How about this?
$ cat file
abc_uvw ghi_uvw;
jkl_uvw def_uvw;
$ sed 's/abc_uvw/foo/g;s/def_uvw/bar/g;s/_uvw/xyz/g;s/foo/abc_uvw/g;s/bar/def_uvw/g' file
abc_uvw ghixyz;
jklxyz def_uvw;
You should use negative lookbehind. For example, in Perl:
perl -pe 's/(?<!(abc|def))_uvw/xyz/g' file.c
This performs a global substitution of any instances of _uvw that are not immediately preceded by abc or def.
Output:
abc_uvw ghixyz;
jklxyz def_uvw;
Sed is a useful tool and certainly has its place but Perl is a lot more powerful in terms of regular expressions. Using Perl, you get to specify exactly what you mean, rather than solving the problem in a more roundabout way.
This will work:
sed -e 's/abc_uvw/AAA_AAA/g; # shadow abc_uvw
s/def_uvw/DDD_DDD/g; # shadow def_uvw
s/_uvw/xyz/g; # substitute
s/AAA_AAA/abc_uvw/g; # recover abc_uvw
s/DDD_DDD/def_uvw/g # recover def_uvw
' input.cpp > output.cpp
cat output.cpp
sed 's/µ/µm/g;s/abc_uvw/µa/g;s/def_uvw/µd/g
s/_uvw/xyz/g
s/µd/def_uvw/g;s/µa/abc_uvw/g;s/µm/µ/g' YourFile
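This is the same idea as the other placeholder answers, but it first escapes any existing occurrence of the temporary marker, so the placeholders cannot collide with text already in the file. I use µ, but any other character works; just avoid characters that are special to sed, such as /, \, &, ...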
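Why the extra escaping step matters can be shown with an input that already contains the placeholder text. In this sketch the marker is % rather than µ, purely to keep the example ASCII; the function names and the sample line are made up.

```shell
# Placeholders without escaping: a "%a" already present in the input
# is mistaken for a placeholder and turned into abc_uvw.
replace_naive() {
  sed 's/abc_uvw/%a/g; s/def_uvw/%d/g
s/_uvw/xyz/g
s/%d/def_uvw/g; s/%a/abc_uvw/g'
}

# Escape existing % as %m first and undo that last, so "%a" and "%d"
# can only ever come from the protecting substitutions.
replace_escaped() {
  sed 's/%/%m/g; s/abc_uvw/%a/g; s/def_uvw/%d/g
s/_uvw/xyz/g
s/%d/def_uvw/g; s/%a/abc_uvw/g; s/%m/%/g'
}

printf '%s\n' 'abc_uvw 100%a ghi_uvw' | replace_naive
# abc_uvw 100abc_uvw ghixyz   (the literal %a was corrupted)
printf '%s\n' 'abc_uvw 100%a ghi_uvw' | replace_escaped
# abc_uvw 100%a ghixyz
```

The same concern applies to the foo/bar and AAA_AAA placeholders: they are safe only if those strings never occur in the source files.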

Replace certain strings from text with SED and REGEX

I have the following strings in a text file (big one, more like these and different):
79A18D7F-1517-5981-8446-3A0452727B06
7842A72D-1517-5281-84E4-EAEF09B743F7
6040BEE7-1517-5982-84C1-419B224E647E
615F2747-1517-5981-84AF-787C34967FB2
7468A3E3-1517-5931-84B3-3FC3F701C269
I can find them using grep and regex:
'[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
What's the sed regex syntax to delete them? This command:
sed "s/[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}//g"
doesn't seem to work.
Thanks!
Use sed -r. Your pattern relies on extended regular expression syntax (the {n} intervals) without escaping it; with sed -r you don't have to escape. If you want to delete the whole lines instead of just removing the matches, you can use:
sed -r "/regex/d"
In addition, for regular sed (BRE) you would need to escape the curly braces:
sed 's/[0-9A-F]\{8\}-[0-9]\{4\}-[0-9]\{4\}-[0-9A-F]\{4\}-[0-9A-F]\{12\}//g' file
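The three behaviours discussed here (removing the match, deleting the line, and the unescaped pattern failing in BRE) can be compared with a short sketch. The variable id_re, the function names, and the sample lines are assumptions made for illustration; -E is used on the assumption that your sed accepts it.

```shell
# The pattern from the question, written once as an ERE.
id_re='[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}'

sample_ids() {
  printf '%s\n' 'keep this line' 'id=79A18D7F-1517-5981-8446-3A0452727B06 tail'
}

# Remove only the matched ID, keeping the rest of the line.
strip_ids() {
  sed -E "s/$id_re//g"
}

# Delete every line that contains an ID.
drop_id_lines() {
  sed -E "/$id_re/d"
}

# Same pattern without -E: { } are literal in a BRE, so nothing matches.
strip_ids_unescaped_bre() {
  sed "s/$id_re//g"
}

sample_ids | strip_ids                # second line becomes "id= tail"
sample_ids | drop_id_lines            # only "keep this line" remains
sample_ids | strip_ids_unescaped_bre  # input is printed unchanged
```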

sed remove digits at end of the line

I need to find out how to delete up to 10 digits that are at the end of the line in my text file using sed.
For example if I have this:
ajsdlfkjasldf1234567890
asdlkjfalskdjf123456
adsf;lkjasldfkjas123
it should become:
ajsdlfkjasldf
asdlkjfalskdjf
adsf;lkjasldfkjas
Can anyone help?
I have this, but it's not working:
sed 's/[0-9]{10}$//g'
Have you tried this? It removes all trailing digits, with no limit of 10 (note that + needs -r or -E):
sed -r 's/[0-9]+$//'
Your command matches exactly 10 digits at the end of the line, and only if you enable extended regular expressions (-E or -r, depending on your version of sed).
You should try
sed -r 's/[0-9]{1,10}$//'
The following should work:
sed 's/[0-9]\{1,10\}$//' file
Regex syntax in sed requires backslashes before the braces to use them for repetition, unless you use an extended regex option.
A quick look here suggests you should try this:
$ sed 's/[0-9]\{0,10\}$//g'
{ } should be escaped, unless you switch to extended regex syntax:
$ sed -r 's/[0-9]{0,10}$//g'
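A quick way to check the "up to 10" behaviour is to run the sample lines plus one line with more than 10 trailing digits. This is a sketch; the function name strip_trailing_digits and the fourth input line are made up, and -E is assumed to be accepted by your sed.

```shell
# Remove between 1 and 10 digits at the end of each line.
strip_trailing_digits() {
  sed -E 's/[0-9]{1,10}$//'
}

printf '%s\n' \
  'ajsdlfkjasldf1234567890' \
  'asdlkjfalskdjf123456' \
  'adsf;lkjasldfkjas123' \
  'abc123456789012' | strip_trailing_digits
# ajsdlfkjasldf
# asdlkjfalskdjf
# adsf;lkjasldfkjas
# abc12
```

The last line has 12 trailing digits, so only the final 10 are removed and "12" stays.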