Modifying a pattern-matched line as well as next line in a file - regex

I'm trying to write a script that, among other things, automatically enable multilib. Meaning in my /etc/pacman.conf file, I have to turn this
#[multilib]
#Include = /etc/pacman.d/mirrorlist
into this
[multilib]
Include = /etc/pacman.d/mirrorlist
without accidentally removing # from lines like these
#[community-testing]
#Include = /etc/pacman.d/mirrorlist
I already accomplished this by using this code
linenum=$(rg -n '\[multilib\]' /etc/pacman.conf | cut -f1 -d:)
sed -i "$((linenum))s/#//" /etc/pacman.conf
sed -i "$((linenum+1))s/#//" /etc/pacman.conf
but I'm wondering, whether this can be solved in a single line of code without any math expressions.

With GNU sed. Find row starting with #[multilib], append next line (N) to pattern space and then remove all # from pattern space (s/#//g).
sed -i '/^#\[multilib\]/{N;s/#//g}' /etc/pacman.conf
If the two lines contain further #, then these are also removed.

Could you please try following, written with shown samples only. Considering that multilib and it's very next line only you want to deal with.
awk '
/multilib/ || found{
found=$0~/multilib/?1:""
sub(/^#+/,"")
print
}
' Input_file
Explanation:
First checking if a line contains multilib or variable found is SET then following instructions inside it's block.
Inside block checking if line has multilib then set it to 1 or nullify it. So that only next line after multilib gets processed only.
Using sub function of awk to substitute starting hash one or more occurences with NULL here.
Then printing current line.

This will work using any awk in any shell on every UNIX box:
$ awk '$0 == "#[multilib]"{c=2} c&&c--{sub(/^#/,"")} 1' file
[multilib]
Include = /etc/pacman.d/mirrorlist
and if you had to uncomment 500 lines instead of 2 lines then you'd just change c=2 to c=500 (as opposed to typing N 500 times as with the currently accepted solution). Note that you also don't have to escape any characters in the string you're matching on. So in addition to being robust and portable this is a much more generally useful idiom to remember than the other solutions you have so far. See printing-with-sed-or-awk-a-line-following-a-matching-pattern/17914105#17914105 for more.

A perl one-liner:
perl -0777 -api.back -e 's/#(\[multilib]\R)#/$1/' /etc/pacman.conf
modify in place with a backup of original in /etc/pacman.conf.back

If there is only one [multilib] entry, with ed and the shell's printf
printf '/^#\[multilib\]$/;+1s/^#//\n,p\nQ\n' | ed -s /etc/pacman.conf
Change Q to w to edit pacman.conf
Match #[multilib]
; include the next address
+1 the next line (plus one line below)
s/^#// remove the leading #
,p prints everything to stdout
Q exit/quit ed without error message.
-s means do not print any message.

Ed can do this.
cat >> edjoin.txt << EOF
/multilib/;+j
s/#//
s/#/\
/
wq
EOF
ed -s pacman.conf < edjoin.txt
rm -v ./edjoin.txt
This will only work on the first match. If you have more matches, repeat as necessary.

This might work for you (GNU sed):
sed '/^#\[multilib\]/,+1 s/^#//' file
Focus on a range of lines (in this case, two) where the first line begins #[multilib] and remove the first character in those lines if it is a #.
N.B. The [ and ] must be escaped in the regexp otherwise they will match a single character that is m,u,l,t,i or b. The range can be extended by changing the integer +1 to +n if you were to want to uncomment n lines plus the matching line.
To remove all comments in a [multilib] section, perhaps:
sed '/^#\?\[[^]]*\]$/h;G;/^#\[multilib\]/M s/^#//;P;d' file

Related

Replace several lines by one using sed

I have an input like this:
This_is(A)
Goto(B,condition_1)
Goto(C,condition_2)
This_is(B)
Goto(A,condition_3)
This_is(C)
Goto(B,condition_1)
I want it to become like this
(A,B,condition_1)
(A,C,condition_2)
(B,A,condition_3)
(C,B,condition_1)
Anyone knows how to do this with sed?
Assuming you don't really need to do this with sed, this will work using any awk in any shell on every UNIX box:
$ awk -F'[()]' '/^[^[:space:]]/{s=$2; next} {sub(/[^[:space:]]*\(/,"("s",")} 1' file
(A,B,condition_1)
(A,C,condition_2)
(B,A,condition_3)
(C,B,condition_1)
This is a possible sed solution, where I have hardcoded a few bits, like This_is and Goto because the OP did not clarify if those strings change along the file in the actual file:
sed '/^This_is/{:a;N;s/\(^This_is(\(.\)).*\)\(\n *\)Goto(\([^)]*)\)$/\1\3(\2,\4/;$!ta;s/[^\n]*\n//}' input_file
(Unfortunately, with all these parenthesis, using the -E does not shorten the command much.)
The code is slightly more readable if split on more lines:
sed '/^This_is/{
:a
N
s/\(^This_is(\(.\)).*\)\(\n *\)Goto(\([^)]*)\)$/\1\3(\2,\4/
$!ta
s/[^\n]*\n//
}' os
Here you can see that the code takes action only on the lines starting with This_is; when the program hits those lines, it does the following.
It uses the N command to append the next line to the pattern space (interspersing \ns),
and it attempts a substitution with s/…/…/, which essentially tries to pick the x in This_is(x) and to put it just after the last Goto( on the multiline,
and it keeps doing this as long as the latter action is successful (ta branches to :a if s was successful) and the last line has not been read ($! matches all line but the last);
Indeed, this is a do-while loop, where :a marks the entry point, where the control jumps back if the while-condition is true, and ta is the command that evaluates the logical condition.
When the above while loop terminates, the shorter s/…/…/ command removes the leading line from the multiline pattern space, which is the This_is line.
This might work for you (GNU sed):
sed -E '/^\S.*\(.*\)/{h;d};G;s/\S+\((.*\))\n.*(\(.*)\).*/\2,\1/;P;d' file
If a line starts with a non-white space character and contains parens, copy it to the hold space (HS) and then delete it.
Otherwise, append the HS, remove non-white characters upto the opening paren, insert the value between parens from the stored value, add a comma and print the first line and then delete the whole of the pattern space.
N.B. Lines that do not meet the substitution criteria will be unchanged.
An alternative solution using GNU parallel and sed:
parallel --pipe --recstart T -kqN1 sed -E '1{h;d};G;s/\S+\((.*)\n.*(\(.*)\).*/\2,\1/;P;d' <file

How can I merge multiple blocks/lines with sed or regex?

Is it possible to merge multiple blocks/lines into a "single" line?
So basically if the next line starts with the same "#Msg" tag then append it to the previous line. (Hard to explain, but my example speaks for itself) (The blocks are separated by a new/blank line)
My input file looks like this:
#Msg,00000
#Msg,00001
#Msg,00002
#Msg,00003
#Msg,00004
#Msg,00005
#Msg,00006
#Msg,00007
#Msg,00008
#Msg,00009
#Msg,00010
#Msg,00011
Output should be like this:
#Msg,00000
#Msg,00001 #Msg,00002
#Msg,00003 #Msg,00004
#Msg,00005
#Msg,00006 #Msg,00007 #Msg,00008
#Msg,00009
#Msg,00010 #Msg,00011
Any advice is very welcome.
This would be pretty easy to do in Perl:
perl -00 -ple 'tr/\n/ /'
-e CODE specifies the program.
-p wraps a read/write line loop around it (by default it reads from STDIN, but you can also specify one or more filenames on the command line).
-00 specifies that the input "lines" are actually paragraphs.
-l has two effects: Incoming line terminators are automatically stripped from lines, and outgoing lines get line terminators added to them (and because we used -00 (paragraph mode), our line terminator is actually \n\n).
To recap:
We read the input one paragraph at a time. For each paragraph, we remove any trailing newlines. We then translate every newline to a space. Finally we output the transformed paragraph, followed by \n\n.
No point in trying to produce a shorter code than is possible with Perl!
Collect lines from the input file in list group until a blank line appears. Then output the contents of group, empty it and start again. When end-of-file is encountered output whatever is in group, if it is non-empty.
group = []
with open('vollschauer.txt') as vollschauer:
for line in vollschauer:
line = line.rstrip()
if line:
group.append(line)
else:
if group:
print (' '.join(group))
print()
group = []
if group:
print (' '.join(group))
group = []
$ awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
#Msg,00000
#Msg,00001 #Msg,00002
#Msg,00003 #Msg,00004
#Msg,00005
#Msg,00006 #Msg,00007 #Msg,00008
#Msg,00009
#Msg,00010 #Msg,00011
If you insist on using sed, this should do the trick:
sed -r ':a; N; /^(#[^,]+,).*\n\1/! { P; D }; s/\n/ /; ba' file
It takes different tags into account. Such tags won't be grouped together (that's what I understood is the desired behavior):
$ cat file
#Msg,00000
#Msg,00001
#Hello,00002
#Hello,00003
#What,00004
#What,00005
$ sed -r ':a; N; /^(#[^,]+,).*\n\1/! { P; D }; s/\n/ /; ba' file
#Msg,00000 #Msg,00001
#Hello,00002
#Hello,00003
#What,00004 #What,00005
Note that this solution uses GNU sed.
This might work for you (GNU sed):
sed ':a;N;/^$/M!s/\n/ /;ta' file
Gather up lines, replacing each newline by a space until an empty line.
N.B. The use of the M flag on the repexp /^$/ which matches an empty line on a pattern space containing multiple lines.

process a delimited text file with sed

I have a ";" delimited file:
aa;;;;aa
rgg;;;;fdg
aff;sfg;;;fasg
sfaf;sdfas;;;
ASFGF;;;;fasg
QFA;DSGS;;DSFAG;fagf
I'd like to process it replacing the missing value with a \N .
The result should be:
aa;\N;\N;\N;aa
rgg;\N;\N;\N;fdg
aff;sfg;\N;\N;fasg
sfaf;sdfas;\N;\N;\N
ASFGF;\N;\N;\N;fasg
QFA;DSGS;\N;DSFAG;fagf
I'm trying to do it with a sed script:
sed "s/;\(;\)/;\\N\1/g" file1.txt >file2.txt
But what I get is
aa;\N;;\N;aa
rgg;\N;;\N;fdg
aff;sfg;\N;;fasg
sfaf;sdfas;\N;;
ASFGF;\N;;\N;fasg
QFA;DSGS;\N;DSFAG;fagf
You don't need to enclose the second semicolon in parentheses just to use it as \1 in the replacement string. You can use ; in the replacement string:
sed 's/;;/;\\N;/g'
As you noticed, when it finds a pair of semicolons it replaces it with the desired string then skips over it, not reading the second semicolon again and this makes it insert \N after every two semicolons.
A solution is to use positive lookaheads; the regex is /;(?=;)/ but sed doesn't support them.
But it's possible to solve the problem using sed in a simple manner: duplicate the search command; the first command replaces the odd appearances of ;; with ;\N, the second one takes care of the even appearances. The final result is the one you need.
The command is as simple as:
sed 's/;;/;\\N;/g;s/;;/;\\N;/g'
It duplicates the previous command and uses the ; between g and s to separe them. Alternatively you can use the -e command line option once for each search expression:
sed -e 's/;;/;\\N;/g' -e 's/;;/;\\N;/g'
Update:
The OP asks in a comment "What if my file have 100 columns?"
Let's try and see if it works:
$ echo "0;1;;2;;;3;;;;4;;;;;5;;;;;;6;;;;;;;" | sed 's/;;/;\\N;/g;s/;;/;\\N;/g'
0;1;\N;2;\N;\N;3;\N;\N;\N;4;\N;\N;\N;\N;5;\N;\N;\N;\N;\N;6;\N;\N;\N;\N;\N;\N;
Look, ma! It works!
:-)
Update #2
I ignored the fact that the question doesn't ask to replace ;; with something else but to replace the empty/missing values in a file that uses ; to separate the columns. Accordingly, my expression doesn't fix the missing value when it occurs at the beginning or at the end of the line.
As the OP kindly added in a comment, the complete sed command is:
sed 's/;;/;\\N;/g;s/;;/;\\N;/g;s/^;/\\N;/g;s/;$/;\\N/g'
or (for readability):
sed -e 's/;;/;\\N;/g;' -e 's/;;/;\\N;/g;' -e 's/^;/\\N;/g' -e 's/;$/;\\N/g'
The two additional steps replace ';' when they found it at beginning or at the end of line.
You can use this sed command with 2 s (substitute) commands:
sed 's/;;/;\\N;/g; s/;;/;\\N;/g;' file
aa;\N;\N;\N;aa
rgg;\N;\N;\N;fdg
aff;sfg;\N;\N;fasg
sfaf;sdfas;\N;\N;
ASFGF;\N;\N;\N;fasg
QFA;DSGS;\N;DSFAG;fagf
Or using lookarounds regex in a perl command:
perl -pe 's/(?<=;)(?=;)/\\N/g' file
aa;\N;\N;\N;aa
rgg;\N;\N;\N;fdg
aff;sfg;\N;\N;fasg
sfaf;sdfas;\N;\N;
ASFGF;\N;\N;\N;fasg
QFA;DSGS;\N;DSFAG;fagf
The main problem is that you can't use several times the same characters for a single replacement:
s/;;/..../g: The second ; can't be reused for the next match in a string like ;;;
If you want to do it with sed without to use a Perl-like regex mode, you can use a loop with the conditional command t:
sed ':a;s/;;/;\\N;/g;ta;' file
:a defines a label "a", ta go to this label only if something has been replaced.
For the ; at the end of the line (and to deal with eventual trailing whitespaces):
sed ':a;s/;;/;\\N;/g;ta; s/;[ \t\r]*$/;\\N/1' file
this awk one-liner will give you what you want:
awk -F';' -v OFS=';' '{for(i=1;i<=NF;i++)if($i=="")$i="\\N"}7' file
if you really want the line: sfaf;sdfas;\N;\N;\N , this line works for you:
awk -F';' -v OFS=';' '{for(i=1;i<=NF;i++)if($i=="")$i="\\N";sub(/;$/,";\\N")}7' file
sed 's/;/;\\N/g;s/;\\N\([^;]\)/;\1/g;s/;[[:blank:]]*$/;\\N/' YourFile
non recursive, onliner, posix compliant
Concept:
change all ;
put back unmatched one
add the special case of last ; with eventually space before the end of line
This might work for you (GNU sed):
sed -r ':;s/^(;)|(;);|(;)$/\2\3\\N\1\2/g;t' file
There are 4 senarios in which an empty field may occur: at the start of a record, between 2 field delimiters, an empty field following an empty field and at the end of a record. Alternation can be employed to cater for senarios 1,2 and 4 and senario 3 can be catered for by a second pass using a loop (:;...;t). Multiple senarios can be replaced in both passes using the g flag.

Print commands in history consisting in just one word

I want to print lines that contains single word only.
For example:
this is a line
another line
one
more
line
last one
I want to get the ones with single word only
one
more
line
EDIT: Guys, thank you for answers. Almost all of the answers work for my test file. However I wanted to list single lines in bash history. When I try your answers like
history | your posted commands
all of them below fails. Some only prints some numbers (might line numbers?)
You want to get all those commands in history that contain just one word. Considering that history prints the number of the command as a first column, you need to match those lines consisting in two words.
For this, you can say:
history | awk 'NF==2'
If you just want to print the command itself, say:
history | awk 'NF==2 {print $2}'
To rehash your problem, any line containing a space or nothing should be removed.
grep -Ev '^$| ' file
Your problem statement is unspecific on whether lines containing only punctuation might also occur. Maybe try
grep -Ex '[A-Za-z]+' file
to only match lines containing only one or more alphabetics. (The -x option implicitly anchors the pattern -- it requires the entire line to match.)
In Bash, the output from history is decorated with line numbers; maybe try
history | grep -E '^ *[0-9]+ [A-Za-z]+$'
to match lines where the line number is followed by a single alphanumeric token. Notice that there will be two spaces between the line number and the command.
In all cases above, the -E selects extended regular expression matching, aka egrep (basic RE aka traditional grep does not support e.g. the + operator, though it's available as \+).
Try this:
grep -E '^\s*\S+\s*$' file
With the above input, it will output:
one
more
line
If your test strings are in a file called in.txt, you can try the following:
grep -E "^\w+$" in.txt
What it means is:
^ starting the line with
\w any word character [a-zA-Z0-9]
+ there should be at least 1 of those characters or more
$ line end
And output would be
one
more
line
Assuming your file as texts.txt and if grep is not the only criteria; then
awk '{ if ( NF == 1 ) print }' texts.txt
If your single worded lines don't have a space at the end you can also search for lines without an empty space :
grep -v " "
I think that what you're looking for could be best described as a newline followed by a word with a negative lookahead for a space,
/\n\w+\b(?! )/g
example

how to grep exact string match across 2 files

I've UTF-8 plain text lists of usernames, 1 per line, in list1.txt and list2.txt. Note, in case pertinent, that usernames may contain regex characters e.g. ! ^ . ( and such as well as spaces.
I want to get and save to matches.txt a list of all unique values occurring in both lists. I've little command line expertise but this almost gets me there:
grep -Ff list1.txt list2.txt > matches.txt
...but that is treating "jdoe" and "jdoe III" as a match, returning "jdoe III" as the matched value. This is incorrect for the task. I need the per-line pattern match to be the whole line, i.e. from ^ to $. I've tried adding the -x flag but that gets no matches at all (edit: see comment to accepted answer - I got the flag order wrong).
I'm on OS X 10.9.5 and I don't have to use grep - another command line (tool) solving the problem will do.
All you need to do is add the -x flag to your grep query:
grep -Fxf list1.txt list2.txt > matches.txt
The -x flag will restrict matches to full line matches (each PATTERN becomes ^PATTERN$). I'm not sure why your attempt at -x failed. Maybe you put it after the -f, which must be immediately followed by the first file?
This awk will be handy than grep here:
awk 'FNR==NR{a[$0]; next} $0 in a' list1.txt list2.txt > matches.txt
$0 is the line, FNR is the current line number of the current file, NR is the overall line number (they are only the same when you are on the first file). a[$0] is a associative array (hash) whose key is the line. next will ensure that further clauses (the $0 in a) will not run if the current clause (the fact that this is the first file) did. $0 in a will be true when the current line has a value in the array a, thus only lines present in both will be displayed. The order will be their order of occurence in the second file.
A very simple and straightforward way to do it that doesn't require one to do all sorts of crazy things with grep is as follows
cat list1.txt list2.txt|grep match > matches.txt
Not only that, but it's also easier to remember, (especially if you regularly use cat).
grep -Fwf file1 file2 would match word to word !!