I am trying to write a RegEx for replacing a character in a string, given that a condition is met. In particular, if the string ends in y, I would like to replace all instances of a with o and delete the final y. To illustrate what I am trying to do with examples:
Katy --> Kot
cat --> cat
KaKaty --> KoKot
avidly --> ovidl
I was using the RegEx s/\(\w*\)a\(\w*\)y$/\1o\2/g but it does not work. I was wondering how would one be able to capture the "conditional" nature of this task with a RegEx.
Your help is always most appreciated.
With GNU sed:
If a line ends with y (/y$/), replace every a with o and replace trailing y with nothing (s/y$//).
sed '/y$/{y/a/o/;s/y$//}' file
Output:
Kot
cat
KoKot
ovidl
You may use awk:
Input:
cat file
Katy
cat
KaKaty
avidly
Command:
awk '/y$/{gsub(/a/, "o"); sub(/.$/, "")} 1' file
Output:
Kot
cat
KoKot
ovidl
You could use some sed spaghetti code, but please don't:
sed '
s/y$// ; # try to replace trailing y
ta ; # if successful, goto a
bb ; # otherwise, goto b
:a
y/a/o/ ; # replace a with o
:b
'
Suppose I have 'abbc' string and I want to replace:
ab -> bc
bc -> ab
If I try two replaces the result is not what I want:
echo 'abbc' | sed 's/ab/bc/g;s/bc/ab/g'
abab
So what sed command can I use to replace like below?
echo abbc | sed SED_COMMAND
bcab
EDIT:
Actually the text could have more than 2 patterns, and I don't know how many replacements I will need. Since there was an answer saying that sed is a stream editor and its replacements are greedy, I think I will need to use a scripting language for that.
Maybe something like this:
sed 's/ab/~~/g; s/bc/ab/g; s/~~/bc/g'
Replace ~ with a character that you know won't be in the string.
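A minimal sketch of the same placeholder swap with a guard added, in case you are not sure the placeholder is absent from your data. The function name swap_ab_bc and the choice of ~~ are assumptions made for illustration. It runs one sed per line, so it is meant for small inputs only.

```shell
# Hypothetical helper: swap "ab" and "bc" through a placeholder, refusing
# any input line that already contains the placeholder.
swap_ab_bc() {
    placeholder='~~'   # assumption: should never occur in real input
    while IFS= read -r line; do
        case $line in
            *"$placeholder"*)
                echo "input already contains $placeholder" >&2
                return 1 ;;
        esac
        # ab -> placeholder, bc -> ab, placeholder -> bc
        printf '%s\n' "$line" |
            sed "s/ab/$placeholder/g; s/bc/ab/g; s/$placeholder/bc/g"
    done
}

echo 'abbc' | swap_ab_bc   # prints: bcab
```

If a line already contains ~~, the function stops with a non-zero status instead of silently producing a wrong result.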
I always use multiple statements with "-e"
$ sed -e 's:AND:\n&:g' -e 's:GROUP BY:\n&:g' -e 's:UNION:\n&:g' -e 's:FROM:\n&:g' file > readable.sql
This inserts a '\n' before every AND, GROUP BY, UNION and FROM. '&' stands for the matched string, so '\n&' replaces each match with a newline followed by the match itself.
sed is a stream editor. It searches and replaces greedily. The only way to do what you asked for is using an intermediate substitution pattern and changing it back in the end.
echo 'abcd' | sed -e 's/ab/xy/;s/cd/ab/;s/xy/cd/'
Here is a variation on ooga's answer that works for multiple search and replace pairs without having to check how values might be reused:
sed -i '
s/\bAB\b/________BC________/g
s/\bBC\b/________CD________/g
s/________//g
' path_to_your_files/*.txt
Here is an example:
before:
some text AB some more text "BC" and more text.
after:
some text BC some more text "CD" and more text.
Note that \b denotes word boundaries, which is what prevents the ________ from interfering with the search (I'm using GNU sed 4.2.2 on Ubuntu). If you are not using a word boundary search, then this technique may not work.
Also note that this gives the same results as removing the s/________//g and appending && sed -i 's/________//g' path_to_your_files/*.txt to the end of the command, but doesn't require specifying the path twice.
A general variation on this would be to use \x0 or _\x0_ in place of ________ if you know that no nulls appear in your files, as jthill suggested.
Here is an excerpt from the SED manual:
-e script
--expression=script
Add the commands in script to the set of commands to be run while processing the input.
Prepend each substitution with -e option and collect them together. The example that works for me follows:
sed < ../.env-turret.dist \
-e "s/{{ name }}/turret$TURRETS_COUNT_INIT/g" \
-e "s/{{ account }}/$CFW_ACCOUNT_ID/g" > ./.env.dist
This example also shows how to use environment variables in your substitutions.
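A small self-contained sketch of the same idea, with made-up values standing in for the environment variables. The names name and home_dir, the function render, and the {{ home }} placeholder are all hypothetical. It also shows switching the s delimiter to | when the value contains slashes.

```shell
# Hypothetical values; in the answer above they come from the environment.
name='turret1'
home_dir='/opt/turret'

render() {
    # Double quotes let the shell expand the variables before sed runs.
    # The second expression uses | as the delimiter because the value
    # contains slashes.
    sed -e "s/{{ name }}/$name/g" \
        -e "s|{{ home }}|$home_dir|g"
}

printf '%s\n' 'NAME={{ name }}' 'HOME={{ home }}' | render
# prints:
# NAME=turret1
# HOME=/opt/turret
```

Note that a value containing & or a backslash would still be interpreted by sed, so this only works as-is for plain values.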
This might work for you (GNU sed):
sed -r '1{x;s/^/:abbc:bcab/;x};G;s/^/\n/;:a;/\n\n/{P;d};s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/;ta;s/\n(.)/\1\n/;ta' file
This uses a lookup table which is prepared and held in the hold space (HS) and then appended to each line. A unique marker (in this case \n) is prepended to the start of the line and used to bump the search along the length of the line. Once the marker reaches the end of the line the process is finished and the line is printed, with the lookup table and markers discarded.
N.B. The lookup table is prepped at the very start and a second unique marker (in this case :) chosen so as not to clash with the substitution strings.
With some comments:
sed -r '
# initialize hold with :abbc:bcab
1 {
x
s/^/:abbc:bcab/
x
}
G # append hold to patt (after a \n)
s/^/\n/ # prepend a \n
:a
/\n\n/ {
P # print patt up to first \n
d # delete patt & start next cycle
}
s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/
ta # goto a if sub occurred
s/\n(.)/\1\n/ # move one char past the first \n
ta # goto a if sub occurred
'
The table works like this:
   **   **  replacement
:abbc:bcab
 **   **    pattern
Tcl has a builtin for this
$ tclsh
% string map {ab bc bc ab} abbc
bcab
This works by walking the string a character at a time doing string comparisons starting at the current position.
In perl:
perl -E '
sub string_map {
    my ($str, %map) = @_;
    my $i = 0;
    while ($i < length $str) {
      KEYS:
        for my $key (keys %map) {
            if (substr($str, $i, length $key) eq $key) {
                substr($str, $i, length $key) = $map{$key};
                $i += length($map{$key}) - 1;
                last KEYS;
            }
        }
        $i++;
    }
    return $str;
}
say string_map("abbc", "ab"=>"bc", "bc"=>"ab");
'
bcab
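The same single-pass walk could also be sketched in awk, for anyone without Tcl or Perl at hand. The function name string_map and its argument order (string first, then key/value pairs) are my own choices here, not an existing tool. Keys are tried in the order given, and empty keys are skipped so the loop always advances.

```shell
# Hypothetical helper: string_map STRING KEY VALUE [KEY VALUE ...]
string_map() {
    awk 'BEGIN {
        str = ARGV[1]
        n = 0
        # Collect key/value pairs from the remaining arguments.
        for (i = 2; i + 1 < ARGC; i += 2) {
            key[++n] = ARGV[i]
            val[n] = ARGV[i + 1]
        }
        out = ""
        pos = 1
        while (pos <= length(str)) {
            matched = 0
            for (k = 1; k <= n; k++) {
                # Compare the key against the text at the current position.
                if (length(key[k]) > 0 && substr(str, pos, length(key[k])) == key[k]) {
                    out = out val[k]
                    pos += length(key[k])   # skip past the matched key
                    matched = 1
                    break
                }
            }
            if (!matched) {
                out = out substr(str, pos, 1)   # copy one character unchanged
                pos++
            }
        }
        print out
    }' "$@"
}

string_map abbc ab bc bc ab   # prints: bcab
```

Because replaced text is appended to the output and never rescanned, one substitution cannot feed into another.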
For a single occurrence of each pattern, a simpler approach may be the following:
echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
My output:
~# echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
bcab
For multiple occurrences of pattern:
sed 's/\(ab\)\(bc\)/\2\1/g'
Example
~# cat try.txt
abbc abbc abbc
bcab abbc bcab
abbc abbc bcab
~# sed 's/\(ab\)\(bc\)/\2\1/g' try.txt
bcab bcab bcab
bcab bcab bcab
bcab bcab bcab
Hope this helps !!
echo "C:\Users\San.Tan\My Folder\project1" | sed -e 's/C:\\/mnt\/c\//;s/\\/\//g'
converts
C:\Users\San.Tan\My Folder\project1
to
mnt/c/Users/San.Tan/My Folder/project1
in case someone needs to convert Windows paths to Windows Subsystem for Linux (WSL) paths.
If the replacement comes from a variable, the solution above doesn't work as written.
The sed command needs to be in double quotes instead of single quotes so that the shell expands the variable.
#sed -e "s/#replacevarServiceName#/$varServiceName/g" -e "s/#replacevarImageTag#/$varImageTag/g" deployment.yaml
Here is an awk version based on ooga's sed answer:
echo 'abbc' | awk '{gsub(/ab/,"xy");gsub(/bc/,"ab");gsub(/xy/,"bc")}1'
bcab
I believe this should solve your problem. I may be missing a few edge cases, please comment if you notice one.
You need a way to exclude previous substitutions from future patterns, which really means making outputs distinguishable, as well as excluding these outputs from your searches, and finally making outputs indistinguishable again. This is very similar to the quoting/escaping process, so I'll draw from it.
s/\\/\\\\/g escapes all existing backslashes
s/ab/\\b\\c/g substitutes raw ab for escaped bc
s/bc/\\a\\b/g substitutes raw bc for escaped ab
s/\\\(.\)/\1/g substitutes all escaped X for raw X
I have not accounted for backslashes in ab or bc, but intuitively, I would escape the search and replace terms the same way - \ now matches \\, and substituted \\ will appear as \.
Until now I have been using backslashes as the escape character, but it's not necessarily the best choice. Almost any character should work, but be careful with the characters that need escaping in your environment, sed, etc. depending on how you intend to use the results.
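Put together, the four substitutions might look like the sketch below. The function name swap_escaped is hypothetical, and this only covers the ab/bc pair from the question. As said above, other edge cases may remain.

```shell
# Hypothetical wrapper around the four steps described above.
swap_escaped() {
    # 1. escape existing backslashes
    # 2. raw ab -> escaped bc
    # 3. raw bc -> escaped ab
    # 4. unescape everything
    sed -e 's/\\/\\\\/g' \
        -e 's/ab/\\b\\c/g' \
        -e 's/bc/\\a\\b/g' \
        -e 's/\\\(.\)/\1/g'
}

echo 'abbc' | swap_escaped              # prints: bcab
printf '%s\n' 'a\bab' | swap_escaped    # prints: a\bbc
```

The second call shows that a backslash already present in the input survives the round trip.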
Every answer posted thus far seems to agree with the statement by kuriouscoder made in his above post:
The only way to do what you asked for is using an intermediate
substitution pattern and changing it back in the end
If you are going to do this, however, and your usage might involve more than some trivial string (maybe you are filtering data, etc.), the best character to use with sed is a newline. Since sed is 100% line-based, a newline is the one-and-only character you are guaranteed to never receive when a new line is fetched (forget about GNU multi-line extensions for this discussion).
To start with, here is a very simple approach to solving your problem using newlines as an intermediate delimiter:
echo "abbc" | sed -E $'s/ab|bc/\\\n&/g; s/\\nab/bc/g; s/\\nbc/ab/g'
With simplicity comes some trade-offs... if you had more than a couple variables, like in your original post, you have to type them all twice. Performance might be able to be improved a little bit, too.
It gets pretty nasty to do much beyond this using sed. Even with some of the more advanced features like branching control and the hold buffer (which is really weak IMO), your options are pretty limited.
Just for fun, I came up with this one alternative, but I don't think I would have any particular reason to recommend it over the one from earlier in this post... You have to essentially make your own "convention" for delimiters if you really want to do anything fancy in sed. This is way-overkill for your original post, but it might spark some ideas for people who come across this post and have more complicated situations.
My convention below was: use multiple newlines to "protect" or "unprotect" the part of the line you're working on. One newline denotes a word boundary. Two newlines denote alternatives for a candidate replacement. I don't replace right away, but rather list the candidate replacement on the next line. Three newlines mean that a value is "locked in", like your original post was trying to do with ab and bc. After that point, further replacements will be undone, because they are protected by the newlines. A little complicated, if I do say so myself! sed isn't really meant for much more than the basics.
# Newlines
NL=$'\\\n'
NOT_NL=$'[\x01-\x09\x0B-\x7F]'
# Delimiters
PRE="${NL}${NL}&${NL}"
POST="${NL}${NL}"
# Un-doer (if a request was made to modify a locked-in value)
tidy="s/(\\n\\n\\n${NOT_NL}*)\\n\\n(${NOT_NL}*)\\n(${NOT_NL}*)\\n\\n/\\1\\2/g; "
# Locker-inner (three newlines means "do not touch")
tidy+="s/(\\n\\n)${NOT_NL}*\\n(${NOT_NL}*\\n\\n)/\\1${NL}\\2/g;"
# Finalizer (remove newlines)
final="s/\\n//g"
# Input/Commands
input="abbc"
cmd1="s/(ab)/${PRE}bc${POST}/g"
cmd2="s/(bc)/${PRE}ab${POST}/g"
# Execute
echo ${input} | sed -E "${cmd1}; ${tidy}; ${cmd2}; ${tidy}; ${final}"
Can someone explain to me why my sed command isn't working? I'm sure I'm doing something stupid. Here's a small text file that demonstrates my issue:
#!/usr/bin/env python
class A:
    def candy(self):
        print "cane"
Put that in a file and call it test.py
My goal is to add #profile before the def line with the same indentation as the function declaration. I try with this:
$ sed -i '/\( *\)def /i \
\1#profile' test.py
Note that the capture group should be the set of spaces before the def and I'm referencing the group with \1.
Here's my result:
#!/usr/bin/env python
class A:
1#profile
    def candy(self):
        print "cane"
Why is that 1 being placed in there literally instead of being replaced by my capture group (four spaces)?
Thanks!
I don't know this to be true but I'm going to assume that sed doesn't maintain captures from address selectors and into manually inserted text and in fact may not be evaluating references inside "literal" text at all.
Try sed -e 's/\( *\)def /\1#profile\n&/' test.py instead.
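A quick way to check that idea without touching test.py is sketched below. \n in the replacement is a GNU extension, so this version uses a backslash followed by a real newline instead, which should also work in other seds. The ^ anchor and the function name add_profile are my additions.

```shell
# Hypothetical helper: insert "#profile" above each def, keeping its indent.
add_profile() {
    # \1 is the captured indentation, & is the whole match ("    def ").
    # The backslash-newline puts a line break into the replacement.
    sed 's/^\( *\)def /\1#profile\
&/'
}

printf '%s\n' 'class A:' '    def candy(self):' '        print "cane"' | add_profile
# prints:
# class A:
#     #profile
#     def candy(self):
#         print "cane"
```

This works because the capture and the reference to it are both inside the same s command.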
What about this:
sed -i -e 's/^\(.*\)\(def.*\)/\1#profile\n\2/' test.py
Just use awk:
$ awk '{orig=$0} sub(/def.*/,"#profile"); {print orig}' file
#!/usr/bin/env python
class A:
    #profile
    def candy(self):
        print "cane"
simple, portable, easily extendable, debuggable, etc., etc....
How do I remove my matching pattern from the file?
Every time the pattern [my_id= occurs, it should be removed without replacement.
For example, the field [my_id=AB_123456789.1] should be AB_123456789.1.
I already tried, with no result
sed '/\[my\_id\=/d'
awk '$(NF-1) /^[protein\_id\=/d'
Also, is it possible to remove the first n characters from the last-but-one field ($(NF-1)) as an alternative?
Thanks for any help
You can use:
sed 's/\[my_id=\([^]]*\)\]/\1/g' file
\[my_id=\([^]]*\)\] matches [my_id=, then a string not containing ], then the closing ]. The string in the middle is captured with the \(...\) syntax and printed back with \1, so the whole match is replaced by the text inside the brackets.
Test
$ cat a
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
$ sed 's/\[my_id=\([^]]*\)\]/\1/g' a
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
You can try something like this in awk
$ cat <<test | awk 'gsub(/\[my_id=|\]/,"")'
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
test
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
How can I do a sed regex swap on all text that precedes a line matching a regex?
E.g. how can I do a swap like this
s/foo/bar/g
for all text that precedes the first point this regex matches:
m/baz/
I don't want to use positive/negative look ahead/behind in my sed regex, because those are really expensive operations on big files.
If you mean that you want to do the substitution on every line preceding the given match, this is your answer:
The substitution takes an optional address range; you can use both numbers and patterns. In this case, start from line 1, go until your pattern:
sed '1,/baz/s/foo/bar/g'
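A tiny sketch with made-up input shows what the range covers. The function name replace_up_to_baz is only for illustration. Note that the line containing baz is itself part of the range.

```shell
# Hypothetical wrapper around the command above.
replace_up_to_baz() {
    sed '1,/baz/s/foo/bar/g'
}

printf '%s\n' 'foo 1' 'foo baz' 'foo 2' | replace_up_to_baz
# prints:
# bar 1
# bar baz
# foo 2
```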
In awk:
awk '
/baz/ { done = 1 }
{
    if (!done) {
        gsub(/foo/, "bar")
    }
    print
}'
(It's really short enough to leave out the line breaks, but they make it readable)
This variation on Jefromi's answer should do the trick of not touching the line that "baz" appears on as mentioned in Jonathan's comment.
sed '1,/baz/{/baz/!s/foo/bar/g}'
$ cat file
123 abc 01
456 foo 02 bar
789 ghi
baz
blah1
blah2
foo bar
$ awk -vRS="baz" 'NR==1{gsub("foo","bar")}1' ORS="baz" file
123 abc 01
456 bar 02 bar
789 ghi
baz
blah1
blah2
foo bar
baz
Use "baz" as the record separator; the first record is then the text in which you want to change "foo" to "bar".
With GNU sed, a variation of Denni's solution that takes care of "baz" on the first line:
sed '0,/baz/{/baz/!s/foo/bar/g}' file
This might work for you:
awk '/baz/{p=1};!p{gsub(/foo/,"bar")};1' file
or this:
sed '/baz/,$!s/foo/bar/g' file
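To see why the first-line case matters, here is a small comparison on made-up input where baz is already on line 1. The function name sample is hypothetical. With 1,/baz/ the end pattern is only looked for from line 2 onwards, so the range never closes, while the /baz/,$! form leaves everything from the first baz line on untouched.

```shell
# Hypothetical two-line input with "baz" on the very first line.
sample() {
    printf '%s\n' 'baz foo' 'foo'
}

sample | sed '1,/baz/s/foo/bar/g'
# prints:
# baz bar
# bar

sample | sed '/baz/,$!s/foo/bar/g'
# prints:
# baz foo
# foo
```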