Regex sed issue - regex

My sed expression looks as below:
sed -i "s/-D CONSOLELOG /-D CONSOLELOG -fPIC /g" makefile.init
makefile.init
CFLAGS = -std=c99 -rdynamic -g -Wall -Wno-write-strings -D CONSOLELOG
Output after 1st run (as expected)
CFLAGS = -std=c99 -rdynamic -g -Wall -Wno-write-strings -D CONSOLELOG -fPIC
Output after 2nd run (notice the extra -fPIC at the end)
CFLAGS = -std=c99 -rdynamic -g -Wall -Wno-write-strings -D CONSOLELOG -fPIC -fPIC
I need to modify my sed expression so that the output stays as it is after the 1st run, irrespective of the number of times it is executed.

This might work for you (GNU sed):
sed -ri 's/-D CONSOLELOG (-fPIC )?$/&-fPIC /' file
Note that repeated runs would insert at most two -fPIC options following a -D CONSOLELOG option at the end of a line; after that the line no longer changes.
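A quick way to check that behaviour is to feed the line through the command three times. This is a minimal sketch, assuming a sed that accepts -E (used here in place of -r) and an input line that ends with a space after CONSOLELOG; the step function and the r1, r2, r3 variables are names made up for the illustration.

```shell
# Assumed input: the CFLAGS line from the question, with a trailing space.
line='CFLAGS = -std=c99 -rdynamic -g -Wall -Wno-write-strings -D CONSOLELOG '

# One application of the substitution (no -i, so nothing on disk is touched).
step() { sed -E 's/-D CONSOLELOG (-fPIC )?$/&-fPIC /'; }

r1=$(printf '%s\n' "$line" | step)
r2=$(printf '%s\n' "$r1" | step)
r3=$(printf '%s\n' "$r2" | step)

# Brackets make the trailing space visible.
printf '[%s]\n' "$r1" "$r2" "$r3"
```

The second run still appends a flag; only from the third run on does the line stay the same, so this variant does not give the "exactly once" behaviour the question asks for.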

Sample changed for illustration purposes (the first line of ip.txt ends with a space, as the pattern baz $ requires):
$ cat ip.txt
42 foo baz
ijk baz xyz
$ sed -i 's/baz $/&123/' ip.txt
$ cat ip.txt
42 foo baz 123
ijk baz xyz
$ # further runs won't change input
$ sed -i 's/baz $/&123/' ip.txt
$ cat ip.txt
42 foo baz 123
ijk baz xyz
$ is a metacharacter that anchors the match to the end of the line.
So matches elsewhere in the line are not changed, and applying the command again does not result in duplication.
& in the replacement section is a backreference to the entire string matched in the search section.
Since there can be only one match at the end of a line, the g modifier is not needed.
To replace anywhere in the line (assuming only a single match per line):
$ cat ip.txt
42 foo baz
ijk baz xyz
$ sed -i '/baz 123/! s/baz /&123/' ip.txt
$ cat ip.txt
42 foo baz 123
ijk baz 123xyz
$ # further runs won't change input
$ sed -i '/baz 123/! s/baz /&123/' ip.txt
$ cat ip.txt
42 foo baz 123
ijk baz 123xyz
sed commands can be qualified with an address.
Here, /baz 123/! restricts the substitution to lines that do not match baz 123.
Further reading: Difference between single and double quotes in Bash
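Applied to the makefile line from the question, the same address-negation idea might look like the following. It is a sketch only, assuming the flag is wanted directly after -D CONSOLELOG and that the line has a trailing space as in the question's pattern; add_fpic, once and twice are illustrative names, and the command reads from a pipe instead of using -i so that no file is modified.

```shell
# Assumed input: the CFLAGS line from the question, with a trailing space.
line='CFLAGS = -std=c99 -rdynamic -g -Wall -Wno-write-strings -D CONSOLELOG '

# Skip lines that already carry the flag; otherwise append it after the match.
add_fpic() { sed '/-D CONSOLELOG -fPIC/! s/-D CONSOLELOG /&-fPIC /'; }

once=$(printf '%s\n' "$line" | add_fpic)
twice=$(printf '%s\n' "$once" | add_fpic)

printf '[%s]\n' "$once" "$twice"
```

Both bracketed lines should be identical, since the second pass finds -D CONSOLELOG -fPIC already present and leaves the line alone.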

Related

How do I replace the second occurrence of a whitespace in each line with 'sed' or 'awk'?

I have a file hashes which has many lines that look like this:
wget https://ipfs.io/ipfs/QmbKi6XiMmf4YfvKXhqVPymD1HDwJ3WqukjyLuEvnrZrCz The_Supremes_-_My_World_Is_Empty_Without_You_(lyrics).mkv
All the lines in hashes will follow the pattern:
wget https://ipfs.io/ipfs/hashthatis46characterlong nameOfAfileWithoutSpaces
as they are written by my script with the following lines of code:
find ~/pCloudDrive/VisualArts/Films/Fiction_Movies -maxdepth 1 -type f -size +200M -exec ipfs add --nocopy {} \;>>~/CS/ipfs/hashes && \
sed -i 's;added ;wget https://ipfs.io/ipfs/;g' ~/CS/ipfs/hashes
All hashes are going to be 46 characters long and they typically start with 'Qm', but this may not necessarily be the case in the future.
I want to replace the second space of each line of this file with ' -O ' so that it looks like:
wget https://ipfs.io/ipfs/hashthatis46characterlong -O nameOfAfileWithoutSpaces
I tried sed 's/[0-9A-z]{46,46}\s/& -O /g' hashes but to no avail - I get the following output:
sed: -e expression #1, char 27: Invalid range end
How do I do this? Would awk present a better solution for this problem than sed?
Using GNU awk and gensub() to change the second occurrence on each record:
$ awk '{print gensub(/ /," -O ","2")}' file
For example:
$ echo 1 2 3 4 5 | awk '{print gensub(/ /," -O ","2")}'
1 2 -O 3 4 5
As simple as this
sed 's/ / -O /2' input
where the trailing 2 in the sed command means "the second occurrence".
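To see the occurrence number at work on the line format from the question, here is a small sketch; the second variable is just an illustrative name.

```shell
line='wget https://ipfs.io/ipfs/hashthatis46characterlong nameOfAfileWithoutSpaces'

# The trailing 2 makes s/// act only on the second space of the line.
second=$(printf '%s\n' "$line" | sed 's/ / -O /2')

printf '%s\n' "$second"
```

Keep in mind that the count is taken on whatever the line currently looks like, so running the same command on already-converted output would insert another -O.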
Since the file name contains no spaces, it is possible to get the desired result another way using GNU sed, namely:
s/\([^[:space:]]*\)$/-O \1/
This captures the non-whitespace characters that run up to the end of the line ($) and replaces them with -O followed by those characters. I tested it using sed.js.org, and for the input
wget https://ipfs.io/ipfs/hashthatis46characterlong nameOfAfileWithoutSpaces
wget https://ipfs.io/ipfs/hashthatis46characterlong anotherName
output is
wget https://ipfs.io/ipfs/hashthatis46characterlong -O nameOfAfileWithoutSpaces
wget https://ipfs.io/ipfs/hashthatis46characterlong -O anotherName
Another awk:
$ awk '{$3="-O" OFS $3}1' file
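The original attempt from the question can probably be repaired as well. The sketch below rests on two assumptions that the answers above do not confirm: that {46,46} was being read as literal text because sed was in basic-regex mode without -E, and that the A-z range is what triggered "Invalid range end" in the asker's locale. It uses the sample line from the question; fixed is an illustrative variable name.

```shell
line='wget https://ipfs.io/ipfs/QmbKi6XiMmf4YfvKXhqVPymD1HDwJ3WqukjyLuEvnrZrCz The_Supremes_-_My_World_Is_Empty_Without_You_(lyrics).mkv'

# -E enables the {46} interval; explicit A-Z and a-z ranges replace A-z;
# LC_ALL=C keeps the bracket ranges byte-ordered regardless of locale.
fixed=$(printf '%s\n' "$line" | LC_ALL=C sed -E 's/[0-9A-Za-z]{46} /&-O /')

printf '%s\n' "$fixed"
```

Unlike the occurrence-count approach, this one depends on the hash staying 46 alphanumeric characters long.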

Escaping plus in sed regular expression

There is a file with following text:
CXX_FLAGS = -fPIC -Wall -Wextra -Wno-missing-braces -ffloat-store -pthread -std=gnu++17
To replace the string "-std=gnu++17" with "-std=c++17 -std=gnu++17", I tried:
sed -i -e 's/\-std\=gnu\+\+17/\-std=c\+\+17 \-std=gnu\+\+17/g' filename
That, however, does not work until I remove the \ escape from the first + sign in the search expression. So these seem to be OK:
sed -i -e 's/\-std\=gnu++17/\-std=c\+\+17 \-std=gnu\+\+17/g' filename
sed -i -e 's/\-std\=gnu+\+17/\-std=c\+\+17 \-std=gnu\+\+17/g' filename
sed -i -e 's/\-std\=gnu..17/\-std=c\+\+17 \-std=gnu\+\+17/g' filename
I understand the + must be escaped when not in character class, but I thought one can prefix any character with backslash in regex. Why does escaping the + sign here cause the search-replace to fail?
The OS is Ubuntu 20.04 LTS.
You have used neither the -r nor the -E option, so sed parses the regex pattern as a POSIX BRE expression. In GNU sed, \+ in a BRE is a quantifier (a GNU extension) matching 1 or more occurrences of the quantified pattern. Run sed -e 's/\-std\=gnu\+\+17/\-std=c\+\+17 \-std=gnu\+\+17/g' <<< '-std=gnuuuu17' and the result will be -std=c++17 -std=gnu++17. To match a literal +, you just need to use an unescaped +.
Note you overescaped a lot of chars and your command is unnecessarily long because you repeated the pattern in both the LHS and RHS.
You may use the following POSIX BRE sed command with GNU sed:
sed -i 's/-std=gnu++17/-std=c++17 &/' filename
See the sed online demo:
s='CXX_FLAGS = -fPIC -Wall -Wextra -Wno-missing-braces -ffloat-store -pthread -std=gnu++17'
sed 's/-std=gnu++17/-std=c++17 &/' <<< "$s"
# => CXX_FLAGS = -fPIC -Wall -Wextra -Wno-missing-braces -ffloat-store -pthread -std=c++17 -std=gnu++17
Details
-std=gnu++17 - the string pattern matches -std=gnu++17 string exactly
-std=c++17 & - the replacement pattern is -std=c++17, space and & stands for the whole match, -std=gnu++17.
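The difference between + and \+ can be checked side by side. This is a minimal sketch assuming GNU sed, since \+ as a quantifier in basic regular expressions is a GNU extension; the literal and quantified variable names are illustrative.

```shell
# Unescaped + is an ordinary character in a basic regular expression.
literal=$(printf '%s\n' '-std=gnu++17' | sed 's/-std=gnu++17/-std=c++17 &/')

# In GNU sed, \+ means "one or more of the previous item", so u\+ matches "uuu".
quantified=$(printf '%s\n' '-std=gnuuu17' | sed 's/gnu\+17/MATCH/')

printf '%s\n' "$literal" "$quantified"
```

The first command rewrites the real flag, while the second shows the escaped form matching a run of u characters rather than a plus sign.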

Multiline regex search with ag

I'd like to do an "AND" search for text within a specific multiline range of a file, using a regex with ag (the_silver_searcher), but my regex pattern does not work.
The following regex pattern works well:
ag --multiline -G "^.*\.(md|txt)$" -C 1 -S "foo(\n|.)*baz" ./dev_note.md
(output)
40-
41:foo
42:bar
43:baz
44-
But the following regex pattern outputs nothing (no match):
ag --multiline -G "^.*\.(md|txt)$" -C 1 -S "(?=(.|\n)*(foo))(?=(.|\n)*(baz))" ./dev_note.md
Also I tried: ag --multiline -G "^.*\.(md|txt)$" -C 1 -S "(?=(.|\n)*(foo))(.|\n)*(?=(.|\n)*(baz))" ./dev_note.md

grep/egrep the star operator not matching all occurrences

Let's take the string AaAa. I want to match the as:
$ echo AaAa | grep -o a
a
a
So it is printing every match and not just the first one. When I add a star after the a I get the following
$ echo AaAa | grep -o 'a*'
$
Why did grep not output every match this time? I know it matched because if we remove the -o option it prints the whole line:
$ echo AaAa | grep 'a*'
AaAa
To see how many matches it should have matched I used sed:
$ echo AaAa | sed 's/a*/x/g'
xAxAx
The strings that were substituted for x should have been what grep -o printed. So the matches are as follows:
The null string in the beginning for matching a zero times
The first a
The second a
Why didn't it print the following?
$ echo AaAa | grep -o 'a*'
a
a
$
EDIT
The above was done with GNU grep 2.5.1
The following was done with GNU grep 2.6.3
$ echo AaAa | grep -o 'a*'
a
a
$
Notice that it still didn't print the first null string on its own line. It seems the bug was partially fixed in this newer release. Shouldn't there be a null string matched as well, like the sed example above?
Let's start with this:
$ echo AaAa | grep -o 'a*'
$
You mentioned this was run on version 2.5.1. This appears to be a bug in grep that seems to have been fixed in 2.5.3.
Here's a quote from GNU grep development:
2.5.3
=====
Fix the combinations:
* -i -o
* --colour -i
* -o -b
* -o and zero-width matches
Go through the bug list im my mailbox and fix fixable.
Fix bugs reported with 2.5.2.
-o and zero-width matches is the bug we seem to be dealing with here. A zero-width match consumes no characters from the string, but it is still a match. In this case, a* matches the character a zero times, which gives a zero-width match.
On to the next part:
$ echo AaAa | grep -o 'a*'
a
a
$
I think the reason you don't get a blank line here is simply that the -o flag doesn't print anything for zero-width matches.
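If the empty matches are not wanted in the first place, one option is a quantifier that requires at least one character. A small sketch, assuming a grep and a sed that accept -E (and -o for grep); matches and replaced are illustrative variable names.

```shell
# a+ (extended regex) needs at least one a, so no zero-width match can occur.
matches=$(printf 'AaAa\n' | grep -oE 'a+')
replaced=$(printf 'AaAa\n' | sed -E 's/a+/x/g')

printf '%s\n' "$matches" "$replaced"
```

With a+ the grep and sed results line up: two matches in grep and two substitutions in sed, with nothing inserted before the leading A.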
You can eliminate the duplicates using awk:
$ echo AaAa | grep -o a|awk '!x[$0]++'
a

Grep match a word but not another word

I want to match a line with foo not followed by bar, e.g.
foo 123 <-- match
foo bar <-- not match
Using the following regex does not work:
echo "foo 123" | grep -E 'foo.*(?!bar).*'
Any idea?
On systems that don't have grep -P, such as OSX, you can use this awk command:
awk -F 'foo' 'NF>1{s=$0; $1=""; if (!index($0,"bar")) print s}' file
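A short run of that awk command on a few made-up lines shows what it keeps. The sample lines and the result variable are assumptions for illustration only; the input is piped in rather than read from a file.

```shell
# Splitting on "foo" means NF>1 only for lines containing foo; blanking $1
# rebuilds $0 from the remaining fields, i.e. the text after the first foo.
result=$(printf '%s\n' 'foo 123' 'foo bar' 'bar foo 1' 'no match here' |
  awk -F 'foo' 'NF>1{s=$0; $1=""; if (!index($0,"bar")) print s}')

printf '%s\n' "$result"
```

The line bar foo 1 is kept because its bar comes before foo, which is the "not followed by" behaviour the question asks for.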
You could try the below grep command which uses -P(perl-regexp) parameter,
grep -P 'foo(?:(?!bar).)*$' file
Example:
$ cat file
foo 123
foo bar
$ grep -P 'foo(?:(?!bar).)*$' file
foo 123
Or use only a negative lookahead to check that the string bar does not appear anywhere after foo, without matching any of the following characters:
$ grep -P 'foo(?!.*bar)' file
foo 123
You can use -v to invert the match:
grep -v 'foo.*bar' file
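One caveat with plain inversion is that it also prints lines that contain no foo at all. A hedged sketch of one way to handle that, by selecting the foo lines first; the sample lines and the variable names are made up for the illustration.

```shell
input=$(printf '%s\n' 'foo 123' 'foo bar' 'baz only')

# Inversion alone keeps every line that lacks "foo ... bar", including "baz only".
inverted=$(printf '%s\n' "$input" | grep -v 'foo.*bar')

# Selecting lines with foo first, then inverting, keeps only the wanted line.
filtered=$(printf '%s\n' "$input" | grep 'foo' | grep -v 'foo.*bar')

printf 'inverted:\n%s\nfiltered:\n%s\n' "$inverted" "$filtered"
```

Whether the extra grep is needed depends on whether the input can contain lines without foo.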