How to remove matching pattern? - regex

How do I remove a matching pattern from a file?
Every time the pattern [my_id= occurs, it should be removed without replacement.
For example, the field [my_id=AB_123456789.1] should become AB_123456789.1.
I already tried the following, with no result:
sed '/\[my\_id\=/d'
awk '$(NF-1) /^[protein\_id\=/d'
Alternatively, is it possible to remove the first n characters from the second-to-last field ($(NF-1))?
Thanks for any help

You can use:
sed 's/\[my_id=\([^]]*\)\]/\1/g' file
The pattern \[my_id=\([^]]*\)\] matches [my_id=, then any run of characters other than ], then the closing ]. The \(...\) group captures that run, and \1 in the replacement prints it back, so only the brackets and the my_id= prefix are removed.
Test
$ cat a
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
$ sed 's/\[my_id=\([^]]*\)\]/\1/g' a
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
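The attempt in the question most likely fails because d deletes the whole matching line, whereas s/// rewrites only the matched text. A minimal sketch of the difference, using the sample line from the test above (the function names delete_lines and strip_my_id are made up for illustration):

```shell
# Hypothetical helpers for illustration; the names are not from the original post.

# /pattern/d deletes every line that contains the pattern.
delete_lines() {
  sed '/\[my_id=/d'
}

# s/// keeps the line and rewrites only the matched field.
strip_my_id() {
  sed 's/\[my_id=\([^]]*\)\]/\1/g'
}

printf 'hello [my_id=AB_123456789.1] bye\nplain line\n' | delete_lines
# prints: plain line

printf 'hello [my_id=AB_123456789.1] bye\nplain line\n' | strip_my_id
# prints: hello AB_123456789.1 bye
#         plain line
```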

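You can try something like this in awk. Note that it also removes every other ] on the line; the trailing 1 prints every line, including lines where nothing was replaced:
$ cat <<test | awk '{gsub(/\[my_id=|\]/,"")} 1'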
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
test
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
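For the alternative asked about in the question (dropping the first n characters of the second-to-last field), something along these lines might work. It is only a sketch: the helper name trim_penultimate is hypothetical, it assumes whitespace-separated fields, and awk rebuilds the line with single spaces when a field is assigned. The closing ] is not removed, since only a prefix is cut.

```shell
# Hypothetical helper: drop the first n characters of the second-to-last field.
# Assumes whitespace-separated fields; awk rejoins the fields with single spaces.
trim_penultimate() {
  awk -v n="$1" 'NF >= 2 { $(NF-1) = substr($(NF-1), n + 1) } 1'
}

# "[my_id=" is 7 characters long.
printf 'gene1 [my_id=AB_123456789.1] end\n' | trim_penultimate 7
# prints: gene1 AB_123456789.1] end
```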

Related

Regular expression with conditional replacement

I am trying to write a RegEx for replacing a character in a string, given that a condition is met. In particular, if the string ends in y, I would like to replace all instances of a with o and delete the final y. To illustrate what I am trying to do with examples:
Katy --> Kot
cat --> cat
Kakaty --> Kokot
avidly --> ovidl
I was using the RegEx s/\(\w*\)a\(\w*\)y$/\1o\2/g but it does not work. I was wondering how would one be able to capture the "conditional" nature of this task with a RegEx.
Your help is always most appreciated.
With GNU sed:
If a line ends with y (/y$/), replace every a with o and replace trailing y with nothing (s/y$//).
sed '/y$/{y/a/o/;s/y$//}' file
Output:
Kot
cat
Kokot
ovidl
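As far as I know, putting } directly after a command without a semicolon is a GNU extension. A sketch of the same logic split into separate -e fragments, which should be more portable (the wrapper name a_to_o_if_y is made up for illustration):

```shell
# Hypothetical wrapper; reads from standard input.
# Each -e fragment is treated as its own line of the sed script.
a_to_o_if_y() {
  sed -e '/y$/{' -e 'y/a/o/' -e 's/y$//' -e '}'
}

printf 'Katy\ncat\nKakaty\navidly\n' | a_to_o_if_y
# prints: Kot
#         cat
#         Kokot
#         ovidl
```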
You may use awk:
Input:
cat file
Katy
cat
KaKaty
avidly
Command:
awk '/y$/{gsub(/a/, "o"); sub(/.$/, "")} 1' file
Kot
cat
KoKot
ovidl
You could use some sed spaghetti code, but please don't
sed '
s/y$// ; # try to replace trailing y
ta ; # if successful, goto a
bb ; # otherwise, goto b
:a
y/a/o/ ; # replace a with o
:b
'

how to use preg_replace for the following pattern

Input: "abbbcdaa" Output: "abcd"
With the following regex the output is abcda:
preg_replace('/(.)\\1*/', '$1', "abbbcdaa");
How do I get abcd using preg_replace?
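This removes every character that occurs again later in the string, so the last occurrence of each character is kept:
$string="abbbcdaa";
echo preg_replace('/(.)(?=.*?\1)/','',$string);
The above outputs the unique characters, although not in first-occurrence order:
bcda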
Alternatively, you can use:
echo count_chars($string,3);
That returns the unique characters in byte-value order, which is abcd here.
Good luck!

Unix egrep command how to create a pattern to match the following?

I want to ask about back reference in egrep.
I have a file, it contains:
aa aa someothertext
and there are something like 77 77
How do I use back reference to match the pattern 'aa aa' and '77 77'?
I tried:
egrep '(aa )\1' file.txt
and it matches 'aa aa'. Then I tried to replace 'aa' with '([a-zA-Z0-9])\1', which yields:
egrep '(([a-zA-Z0-9])\1 )\1' file.txt
It doesn't work.
I'd appreciate it if you could help!
Remember that capturing groups are numbered by the position of their opening parenthesis. In your pattern, \1 refers to the outer group and is used inside that group, before it has been closed.
For example, in ((a)b), \1 refers to (a)b and \2 to a.
To fix this, use the index of the inner group:
(([a-zA-Z0-9])\2 )\1
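A quick way to check this is shown below. Back-references in grep -E are, as far as I know, an extension rather than standard ERE, so this sketch assumes GNU grep or a compatible implementation. The sample helper is hypothetical, and the third input line was added only as a negative case. Because the space sits inside group 1, the corrected pattern needs a space after the second copy as well; the second command is a variant that assumes you also want a pair at the end of a line.

```shell
# Sample lines from the question, plus one line that should not match.
sample() {
  printf '%s\n' 'aa aa someothertext' 'and there are something like 77 77' 'ab ab nothing'
}

# Corrected pattern from the answer: the repeated group includes a trailing space.
sample | grep -E '(([a-zA-Z0-9])\2 )\1'
# prints: aa aa someothertext

# Variant (an assumption about the intent): keep the space outside group 1
# so that a pair at the end of a line matches too.
sample | grep -E '(([a-zA-Z0-9])\2) \1'
# prints: aa aa someothertext
#         and there are something like 77 77
```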

Getting list of commands using regex

I have a list of commands, some of which have parameters that I need to process before executing them.
show abc(h2) xyz
show abc(h2) xyz opq(h1)
show abc(h2) xyz <32>
show abc(a,l) xyz [<32>] opq
show abc
In short, the list contains different combinations of ( ), < > and [ ] mixed with plain-text commands.
I want to separate the commands that contain such parameters from plain commands like "show abc".
Processing needed on the commands:
(h1), (h2), (a,l) are to be discarded
<32> is to be replaced with any IP address
[<32>] is to be replaced with any integer
I tried the following, but the resulting file was empty:
cat show-cmd.txt | grep "<|(|[" > hard-cmd.txt
How can I use a regex to get a result file that contains no plain commands?
Desired output file:
show abc xyz
show abc xyz opq
show abc xyz 1.1.1.1
show abc xyz 2 opq
Try using grep followed by sed
grep '[(<\[]' file | sed -e 's/\[<32>\]/2/g' -e 's/<32>/1.1.1.1/g' -e 's/([^)]*)//g'
Output:
show abc xyz
show abc xyz opq
show abc xyz 1.1.1.1
show abc xyz 2 opq
Please note that the order of the s///g commands matters in your case: [<32>] must be handled before <32>.
Also, try to avoid the redundant use of cat; grep can read the file directly.
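If you prefer a single process, the filter and the substitutions can probably be folded into one sed call. This is only a sketch: the helper name expand_cmds is made up, and the replacement values 2 and 1.1.1.1 are the ones used above.

```shell
# Hypothetical helper: select and rewrite in one sed call, without cat or grep.
# -n suppresses default output; p prints only lines that contain (, < or [.
expand_cmds() {
  sed -n '/[(<[]/{
    s/\[<32>\]/2/g
    s/<32>/1.1.1.1/g
    s/([^)]*)//g
    p
  }' "$@"
}

printf '%s\n' 'show abc(h2) xyz' 'show abc(a,l) xyz [<32>] opq' 'show abc' | expand_cmds
# prints: show abc xyz
#         show abc xyz 2 opq
```

Called with a file name, for example expand_cmds show-cmd.txt > hard-cmd.txt, it reads the file directly.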
cat show-cmd.txt | grep "[\[\(\<]" > hard-cmd.txt
This should work. The square brackets [] form a bracket expression, which matches any one of the characters listed inside it. The characters you want to search for are listed inside and escaped with a backslash (\).
Hope this helps.
Pulkit

sed circular replacements

In my file input.txt, I want to replace A->B, B->C, and C->A
i.e. I want to run
s/A/B/g;
s/B/C/g;
s/C/A/g;
However, I don't want the patterns to operate on the new text,
i.e. if I run the above sed script, A ends up unchanged (A->B->C->A), and B turns into A (B->C->A).
Is there a way to do the replacements I want?
Thanks
There might be a better way, but if there are suitable X and Y not in your input.txt you could try:
s/A/X/g;
s/B/Y/g;
s/C/A/g;
s/X/B/g;
s/Y/C/g;
I'd use tr : tr 'ABC' 'BCA'
Use the y operator instead of the s operator:
sed y/ABC/BCA/
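A small sketch of why the single-pass approaches behave differently from the chained s commands in the question, using a one-line sample input:

```shell
# Chained s commands feed each result into the next substitution.
printf 'ABC\n' | sed 's/A/B/g;s/B/C/g;s/C/A/g'
# prints: AAA

# y (like tr) maps every input character exactly once, in a single pass.
printf 'ABC\n' | sed 'y/ABC/BCA/'
# prints: BCA

printf 'ABC\n' | tr 'ABC' 'BCA'
# prints: BCA
```

Both y and tr work on single characters only, so they do not help when the things to rotate are longer strings.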
We can build on bgg's answer this way: instead of looking for placeholders X and Y that do not occur in the input, append a newline to the strings being replaced and use that as the marker. If we have this file:
$ cat teste.in
ABC DEF GHI
GHI ABC DEE
DEF ABB ABCd
We execute this sed command:
$ sed 's/ABC/ABC\n/g
s/DEF/DEF\n/g
s/GHI/ABC/g
s/ABC\n/DEF/g
s/DEF\n/GHI/g' teste.in
DEF GHI ABC
ABC DEF DEE
GHI ABB DEFd
This works because sed reads its input one line at a time, so the text being edited does not contain a newline until the script adds one. (Note that I used GNU sed. Some seds do not accept the \n notation in the replacement. In that case, just use a backslash followed by an actual newline:
$ sed 's/ABC/ABC\
/g
s/DEF/DEF\
/g
s/GHI/ABC/g
s/ABC\
/DEF/g
s/DEF\
/GHI/g' teste.in
DEF GHI ABC
ABC DEF DEE
GHI ABB DEFd
This should work.)
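Another option that needs no placeholder at all is to scan each line once from left to right, so replaced text is never looked at again. A rough sketch in awk, assuming the same three strings as in the example above (the helper name rotate_words is made up):

```shell
# Hypothetical helper: one left-to-right pass per line, so text that has
# already been replaced is never rescanned.
rotate_words() {
  awk '
    BEGIN { map["ABC"] = "DEF"; map["DEF"] = "GHI"; map["GHI"] = "ABC" }
    {
      rest = $0; out = ""
      # Find the next occurrence, emit the text before it plus its replacement,
      # then continue with whatever follows the match.
      while (match(rest, /ABC|DEF|GHI/)) {
        out = out substr(rest, 1, RSTART - 1) map[substr(rest, RSTART, RLENGTH)]
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }' "$@"
}

printf 'ABC DEF GHI\nGHI ABC DEE\nDEF ABB ABCd\n' | rotate_words
# prints: DEF GHI ABC
#         ABC DEF DEE
#         GHI ABB DEFd
```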