sed replace exact string that includes brackets - regex

I'm trying to replace an exact string that includes brackets. For example,
replace a[aa] with bbb.
I had used the following regex:
sed 's|\<a\[aa]\>|bbb|g' testfile
But it doesn't seem to work. This could be something really basic, but I have not been able to make it work, so I would appreciate any help.

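You need to remove the trailing \> word boundary. \> only matches at the end of a word, i.e. right after a word char (letter, digit or _) that is not followed by another word char. Since ] is not a word char, \> can never match immediately after it.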
sed 's|\<a\[aa]|bbb|g' file
A quick demo:
s="say: a[aa] to bbb, not ba[aa]"
sed 's|\<a\[aa]|bbb|g' <<< "$s"
# => say: bbb to bbb, not ba[aa]
You may also require a non-word char (or the start/end of the line) on each side with capturing groups, and restore it with backreferences:
sed -E 's~([^_[:alnum:]]|^)a\[aa]([^_[:alnum:]]|$)~\1bbb\2~g' file
Here, ([^_[:alnum:]]|^) captures any non-word char or the start of the string into Group 1, ([^_[:alnum:]]|$) matches and captures into Group 2 any char other than _, a digit or a letter (or the end of the string), and the \1 and \2 placeholders restore these values in the result. This, however, does not allow consecutive matches that share a single separator, such as a[aa] a[aa]: the space is consumed as Group 2 of the first match, so it cannot also serve as Group 1 of the second one. To play it safe, you may still use \< before a: sed -E 's~\<a\[aa]([^_[:alnum:]]|$)~bbb\1~g' file
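The consecutive-match limitation can be checked with a small sketch. It assumes GNU sed (for -E and the GNU-specific \< anchor); the function names replace_groups and replace_wordstart are made up for this illustration.

```shell
# Assumes GNU sed: -E enables ERE, \< is the GNU start-of-word anchor.
# Hypothetical helper names, used only for this comparison.

# The left boundary char is captured, i.e. consumed by the match.
replace_groups() {
  sed -E 's~([^_[:alnum:]]|^)a\[aa]([^_[:alnum:]]|$)~\1bbb\2~g'
}

# \< is zero-width: it only checks the char before "a", even if that
# char was already consumed by the previous match.
replace_wordstart() {
  sed -E 's~\<a\[aa]([^_[:alnum:]]|$)~bbb\1~g'
}

printf '%s\n' 'a[aa] a[aa]' | replace_groups     # bbb a[aa]
printf '%s\n' 'a[aa] a[aa]' | replace_wordstart  # bbb bbb
```

With the capturing-group version, the second a[aa] is left alone because its only possible left boundary (the space) was eaten by the first match.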
To enforce whitespace boundaries you may use
sed -E 's~([[:space:]]|^)a\[aa]([[:space:]]|$)~\1bbb\2~g' file
Or, in your case, just a trailing whitespace boundary seems to be enough:
sed -E 's~\<a\[aa]([[:space:]]|$)~bbb\1~g' file
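To see how the whitespace-boundary variant differs from the non-word-boundary one, here is a sketch on a string where a[aa] is wrapped in parentheses. It assumes GNU sed (-E); nonword_bounds and space_bounds are hypothetical names.

```shell
# Assumes GNU sed (-E). Hypothetical helper names.

# Non-word boundaries: punctuation such as ( and ) counts as a delimiter.
nonword_bounds() {
  sed -E 's~([^_[:alnum:]]|^)a\[aa]([^_[:alnum:]]|$)~\1bbb\2~g'
}

# Whitespace boundaries: only whitespace or the line start/end counts.
space_bounds() {
  sed -E 's~([[:space:]]|^)a\[aa]([[:space:]]|$)~\1bbb\2~g'
}

printf '%s\n' '(a[aa]) a[aa]' | nonword_bounds  # (bbb) bbb
printf '%s\n' '(a[aa]) a[aa]' | space_bounds    # (a[aa]) bbb
```

Which one is right depends on whether a[aa] inside punctuation should count as a standalone "word" in your data.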

Related

Extract capture group only from string

I have the following rule:
https://regex101.com/r/noX9lj/4
I want to make this work in a script so I'm using grep like this:
echo "\$this->table('test')" | grep -Po "qr/\$this->table\(\'(test)\'\);/"
The output should be "test"
It's not working, and I'm not sure why.
You may use
echo "\$this->table('test');" | grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)"
Or, if you feed a file path to grep:
grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)" file
To match a literal $, you need to escape it in the regex with a backslash. Inside a double-quoted string, you also need to escape $ itself with one backslash to stop variable expansion, and then add two more backslashes (which the shell turns into one) for the regex escape; hence the "\\\$" in the pattern.
To match any text between two single quotes, you may use [^']+ - 1 or more chars other than '.
Pattern details
\$this->table\(' - $this->table(' string
\K - match reset operator that discards the text matched so far from the overall match buffer
[^']+ - one or more chars other than '
(?='\);) - a positive lookahead that requires '); string to be present immediately to the right of the current position.
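If the escaping feels fragile, one alternative is to put the whole pattern in single quotes, so the shell leaves $ alone and only the regex escape \$ is needed. A single quote cannot appear inside a single-quoted shell string, so this sketch uses \x27, the PCRE hex escape for '. It assumes GNU grep built with PCRE support (-P); extract_table is a hypothetical name.

```shell
# Assumes GNU grep with -P (PCRE). extract_table is a hypothetical name.
# Single quotes: no shell expansion, so \$ is just the regex escape for $.
# \x27 is the PCRE hex escape for a single quote.
extract_table() {
  grep -oP '\$this->table\(\x27\K[^\x27]+(?=\x27\);)'
}

printf '%s\n' "\$this->table('test');" | extract_table   # test
```

Because of -o, several matches on one line are printed one per line.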
There were multiple issues:
I had to use cat instead of echo for some reason.
I used this rule instead:
grep -oP "this->table\('\K\w+(?='\);)"

Sed removing after ip

I have a simple sed question.
I have data like this:
boo:moo:127.0.0.1--贵州省电信
foo:joo:127.0.0.1 辽宁省沈阳市彩虹网吧
How do I make it like this:
boo:moo:127.0.0.1
foo:joo:127.0.0.1
My sed code
sed -e 's/\.[^\.]*$//' test.txt
Thanks!
For the given sample, you could capture everything from the start of the line up to the last digit:
$ sed 's/\(.*[0-9]\).*/\1/' ip.txt
boo:moo:127.0.0.1
foo:joo:127.0.0.1
$ grep -o '.*[0-9]' ip.txt
boo:moo:127.0.0.1
foo:joo:127.0.0.1
Or, you could delete all non-digit characters at the end of the line:
$ sed 's/[^0-9]*$//' ip.txt
boo:moo:127.0.0.1
foo:joo:127.0.0.1
You may find an IP-like substring and remove everything after it:
sed -E 's/([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\1/' # POSIX ERE version
sed 's/\([0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}\).*/\1/' # BRE POSIX version
The ([0-9]{1,3}(\.[0-9]{1,3}){3}) pattern is a simplified IP address regex pattern that matches and captures 1 to 3 digits and then 3 occurrences of a dot and again 1 to 3 digits, and then .* matches and consumes the rest of the line. The \1 placeholder in the replacement pattern inserts the captured value back into the result.
Note that in the BRE POSIX pattern, you have to escape ( and ) to make them a capturing group construct and you need to escape {...} to make it a range/interval/limiting quantifier (it has lots of names in the regex literature).
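A small sketch that checks the ERE and BRE spellings behave the same on sample lines (ASCII junk stands in for the trailing text). The function names are hypothetical; only POSIX sed features plus -E are used.

```shell
# Hypothetical helper names. Both keep everything up to the end of the
# first IP-like substring on each line and drop the rest.
strip_after_ip_ere() {
  sed -E 's/([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\1/'
}
strip_after_ip_bre() {
  # Same pattern in BRE: \( \) for groups, \{ \} for interval quantifiers
  sed 's/\([0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}\).*/\1/'
}

printf '%s\n' 'boo:moo:127.0.0.1--junk' 'user42:pw:10.0.0.1 junk' | strip_after_ip_ere
printf '%s\n' 'boo:moo:127.0.0.1--junk' 'user42:pw:10.0.0.1 junk' | strip_after_ip_bre
```

The second sample shows that earlier digits (42) do not confuse the pattern, since they are not followed by three dot-separated groups.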

add dot before first integer in all lines

I have lines like
Input:
abcd1234
bdfghks4506
agfdch6985
I would like to add "." before the first integer in line, how do I do it?
Output:
abcd.1234
bdfghks.4506
agfdch.6985
This might work for you (GNU sed):
sed -i 's/[[:digit:]]/.&/' file
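If there is a digit in a line, put a . before the first one: without the g flag, s/// replaces only the first match, and & stands for the matched text.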
N.B. To put a . before every digit in a file use:
sed -i 's/[[:digit:]]/.&/g' file
$ cat > input.txt
abcd1234
bdfghks4506
agfdch6985
$ sed -e 's/^\([^0-9]*\)\([0-9]\)\(.*\)$/\1.\2\3/' input.txt
abcd.1234
bdfghks.4506
agfdch.6985
Use sed string replacement with regular expression capture groups.
Match the beginning of the line.
Start a capture group that matches any number of non-numeric characters.
Start a second capture group that matches a single digit.
Start a third capture group that matches the remainder of the line.
Match the end of the line.
Replace the entire line with the contents of the first capture group, ".", the second capture group and, finally, the third capture group.
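The three-group command and the shorter & form should give the same result; this sketch checks that, including a line where the first digit is not the only one and a line with no digits at all. The function names are hypothetical.

```shell
# Hypothetical helper names.
dot_groups() {
  # \1 = leading non-digits, \2 = first digit, \3 = rest of the line
  sed -e 's/^\([^0-9]*\)\([0-9]\)\(.*\)$/\1.\2\3/'
}
dot_amp() {
  # No g flag: only the first digit is matched; & is that digit
  sed -e 's/[0-9]/.&/'
}

# Both print abcd.1234, ab.12cd34 and nodigits (unchanged)
printf '%s\n' abcd1234 ab12cd34 nodigits | dot_groups
printf '%s\n' abcd1234 ab12cd34 nodigits | dot_amp
```

The [^0-9]* in the first group is what guarantees that \2 is the first digit: it cannot skip over any digit.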
awk's sub function may help:
$ awk 'sub(/[0-9]/,".&",$0)1' file
abcd.1234
bdfghks.4506
agfdch.6985
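Brief explanation:
sub: replace only the first matching substring in each line
&: is replaced with the text that was actually matched (i.e. the digit matched by [0-9])
Appended 1: it is concatenated with sub's return value (1 or 0), so the pattern is the non-empty string "11" or "01", which is always true, and every line is printed.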
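A sketch comparing the one-liner with a more explicit spelling that runs sub() as an action and then uses the bare pattern 1 to print. Both should leave lines without digits unchanged. The function names are hypothetical; any POSIX awk should do.

```shell
# Hypothetical helper names.
# Pattern form: sub(...)1 concatenates sub's result with 1, always non-empty.
dot_awk_oneliner() { awk 'sub(/[0-9]/,".&",$0)1'; }
# Action form: substitute in $0, then the pattern 1 prints every line.
dot_awk_explicit() { awk '{ sub(/[0-9]/, ".&") } 1'; }

printf '%s\n' abcd1234 nodigits | dot_awk_oneliner
printf '%s\n' abcd1234 nodigits | dot_awk_explicit
```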
A stricter command for your case, requiring a lowercase letter right before the digit, would be:
sed -E -i 's/([a-z])([0-9])/\1.\2/' file.txt
-E Use extended regex
-i Edit the file in place instead of writing to standard output (BSD/macOS sed needs -i '')
This matches all the examples you provided.
I'm not familiar with awk or sed specifically, but a regex replace using this pattern should be all you need:
Search: (\d+.*?)$ (match everything from the first found number to the end of the line)
Replace by: .$1 (captured group #1 prefixed by a literal .)
The notation of the capture group in the replace command may differ depending on the implementation. I used $1 here, but some implementations use \1. Note that \d and the lazy .*? are Perl/PCRE-style syntax that POSIX sed and awk do not support.

Inserting underscores in strings using sed

I'm trying to use sed to insert _ before every uppercase letter of a string of non-whitespace characters, unless it's at its beginning. (I want to convert strings that are in camel case and occasionally contain several adjacent uppercase letters or even punctuation marks.)
Desired behavior:
Input:
AaAaAa AAA AAA
Output:
Aa_Aa_Aa A_A_A A_A_A
I tried to use the following command:
sed -e "s/\(\S\)\([[:upper:]]\)/\1_\2/g"
But it fails on the last two strings in the above input, yielding this:
Aa_Aa_Aa A_AA A_AA
And I don't really understand why.
I'm using GNU sed 4.2.2.
I am assuming your example is mistyped because Aa Aa Aa given to the substitution you gave does nothing. And it's also not a camel case identifier. It should be AaAaAa, correct?
If so, then you can get sed to do what you need by causing it to loop until no more substitutions occur:
echo "AaAaAa AAA AAA" | sed -e ':x;s/\([^[:space:]_]\)\([[:upper:]]\)/\1_\2/g;tx'
produces
Aa_Aa_Aa A_A_A A_A_A
This might work for you (GNU sed):
sed -r 'y/_/\n/;s/[[:upper:]]/_&/g;s/\b_//g;y/\n/_/' file
Convert all existing _'s to a unique alternative (a newline). Insert _ in front of uppercase characters. Remove any _ at the start of a word. Convert the original _'s back.
If your input never has a _ directly before an uppercase letter, this suffices:
sed -r 's/[[:upper:]]/_&/g;s/\b_//g' file
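A sketch showing why the y/// round trip matters: with an existing underscore before an uppercase letter, the short version produces a double underscore. It assumes GNU sed (-r, \b, and \n inside y///); the function names are hypothetical.

```shell
# Assumes GNU sed (-r, \b, \n in y///). Hypothetical helper names.
camel_simple()   { sed -r 's/[[:upper:]]/_&/g;s/\b_//g'; }
camel_preserve() { sed -r 'y/_/\n/;s/[[:upper:]]/_&/g;s/\b_//g;y/\n/_/'; }

printf '%s\n' 'foo_Bar BazQux' | camel_simple     # foo__Bar Baz_Qux
printf '%s\n' 'foo_Bar BazQux' | camel_preserve   # foo_Bar Baz_Qux
printf '%s\n' 'AaAaAa AAA AAA' | camel_preserve   # Aa_Aa_Aa A_A_A A_A_A
```

In the short version, the original _ is a word char, so there is no \b between it and the inserted _, and neither gets removed.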
The problem is that with a single s///g, regex matches can't overlap (and results of an earlier substitution aren't considered for further matches).
With AAA, the first match is
AAA
^^
| \
\1 \2
After replacement, we have A_AA, with the "current position" between the two rightmost A's:
A _ A A
     ^
next match attempt starts here
Then we try to match again, but we've run out of characters. \S matches the last A, but that's it: There's no uppercase character after that.
To make this work, we'd have to somehow match the middle A as both \2 of the first substitution and \1 of the second substitution, and I don't know how to do that with sed.
(It would be easy with perl because then you could use look-behind/look-ahead, which don't include the surrounding text in the match: perl -pe 's/(?<=\S)(?=[[:upper:]])/_/g')
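The non-overlap behaviour can be reproduced side by side with a loop that repeats the substitution until nothing changes. This sketch assumes GNU sed (for \S); the function names are hypothetical.

```shell
# Assumes GNU sed (\S). Hypothetical helper names.
# Single pass: matches cannot overlap, so "AAA" only gets one underscore.
camel_one_pass() { sed -e 's/\(\S\)\([[:upper:]]\)/\1_\2/g'; }
# Loop: t jumps back to :x as long as the s/// replaced something.
# _ is excluded from \1 so an inserted underscore never counts as a letter.
camel_loop() { sed -e ':x' -e 's/\([^[:space:]_]\)\([[:upper:]]\)/\1_\2/g' -e 'tx'; }

printf '%s\n' 'AaAaAa AAA AAA' | camel_one_pass  # Aa_Aa_Aa A_AA A_AA
printf '%s\n' 'AaAaAa AAA AAA' | camel_loop      # Aa_Aa_Aa A_A_A A_A_A
```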

sed replace between two strings wildcard

I am trying to flag everything inside a color tag and replace it with something else, such as:
I have a [color=blue]dog[/color] and a [color=blue]cat[/color] in my house.
to
I have a [color=blue][b]foobar[/b][/color] and a [color=blue][b]foobar[/b][/color] in my house.
Here is what I've tried:
sample='I have a [color=blue]dog[/color] and a [color=blue]cat[/color] in my house.'
replace='foobar'
sample=$(echo $sample| sed "s/\[color=blue\].*\[\/color\]/\[color=blue\]\[b\]$replace\[\/b\]\[\/color\]/g")
Which gets me:
I have a [color=blue][b]foobar[/b][/color] in my house.
Any idea on how to make sed non-greedy in this case?
Just replace your .* with [^[]* (any character other than left bracket). That is:
"s/\[color=blue\][^[]*\[\/color\]/\[color=blue\]\[b\]$replace\[\/b\]\[\/color\]/g"
sed is always greedy. You can work around it by selecting the regex carefully. The example below is identical to yours except that .* has been replaced with [^[]* (which means everything except [):
$ echo $sample| sed "s/\[color=blue\][^[]*\[\/color\]/\[color=blue\]\[b\]$replace\[\/b\]\[\/color\]/g"
I have a [color=blue][b]foobar[/b][/color] and a [color=blue][b]foobar[/b][/color] in my house.
For truly non-greedy regular expressions, try perl or python.
sed 's#\(\[color=[[:alpha:]]*\]\)[[:alnum:]]*\(\[/color\)#\1[b]foobar[/b]\2#g'
example
echo 'I have a [color=blue]dog[/color] and a [color=blue]cat[/color] in my house.'|sed 's#\(\[color=[[:alpha:]]*\]\)[[:alnum:]]*\(\[/color\)#\1[b]foobar[/b]\2#g'
output
I have a [color=blue][b]foobar[/b][/color] and a [color=blue][b]foobar[/b][/color] in my house.
sed -r 's/(\[color=[a-z]*\])[a-z]*(\[\/color\])/\1[b]foobar[\/b]\2/g' File
or
sed 's/\(\[color=[a-z]*\]\)[a-z]*\(\[\/color\]\)/\1[b]foobar[\/b]\2/g' File
Explanation:
Here, we look for 1. [color=<lowercase letters>], followed by 2. a sequence of lowercase letters, followed by 3. [/color], and group patterns 1 and 3 using ( and ). In the replacement we keep those two groups (\1 and \2) but replace the text between them with [b]foobar[/b].
sed will always be greedy. You can use perl if you strictly want non-greedy variant:
$ echo $test
I have a [color=blue]dog[/color] and a [color=blue]cat[/color] in my house.
$ perl -pne 's/(\[color=[a-zA-Z]*\])(.*?)(\[\/color\])/$1\[b\]foobar\[\/b\]$3/g' <<< "$test"
I have a [color=blue][b]foobar[/b][/color] and a [color=blue][b]foobar[/b][/color] in my house.
You can probably interpret most of the regex here, except for one small syntax change:
(.*?) in place of (.*) dictates that the match is supposed to be non-greedy.
If you omit the ? after .*, you get the output you are currently seeing:
$ perl -pne 's/(\[color=[a-zA-Z]*\])(.*)(\[\/color\])/$1\[b\]foobar\[\/b\]$3/g' <<< "$test"
I have a [color=blue][b]foobar[/b][/color] in my house.
As others have stated, you need to emulate non-greedy matching by matching only characters that cannot end the match.
Using a caret inside brackets, as in [^ABC], means any character except the listed ones.
So combining this with the asterisk * matches only up to the next occurrence of that character.
For example
[^[]*
Will match everything up to the next [ bracket
Also, there is no need to backslash-escape [ and ] in the replacement, since the replacement is not a regex (only the / delimiter, & and \ need escaping there).
Anyway, here is a command that should work:
sed 's/\(\[color[^]]*\]\)[^[]*\(\[\/color\]\)/\1[b]foobar[\/b]\2/g'
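A sketch contrasting the greedy .* with the negated class [^[]* on the sample from the question, plus one case showing the limitation of the workaround. The function names are hypothetical; only POSIX BRE features are used.

```shell
# Hypothetical helper names. In the replacement, [ and ] are literal;
# only the / delimiter needs escaping.
tag_greedy()  { sed 's/\[color=blue\].*\[\/color\]/[color=blue][b]foobar[\/b][\/color]/g'; }
tag_negated() { sed 's/\[color=blue\][^[]*\[\/color\]/[color=blue][b]foobar[\/b][\/color]/g'; }

sample='I have a [color=blue]dog[/color] and a [color=blue]cat[/color] in my house.'
printf '%s\n' "$sample" | tag_greedy    # one match spanning both tags
printf '%s\n' "$sample" | tag_negated   # each tag replaced separately
# Limitation: [^[]* stops at any [, so content with nested tags is not matched.
printf '%s\n' '[color=blue]a[i]b[/i][/color]' | tag_negated   # printed unchanged
```

If nested tags can occur, a real non-greedy engine such as perl's .*? is the more robust option.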