Could anyone help me get the desired output shown below?
❯ cat example.txt
hostname a-b-c
services Apache
hostname d-e-f
services Python
hostname g-h-i
vmhostname u-v-w
vmhostname x-y-z
I would like to get the following output:
a-b-c Apache
d-e-f Python
g-h-i No services
u-v-w No services
x-y-z No services
I have used awk, but I am getting different output.
❯ cat example.txt | awk 'NR%2{printf $0" ";next;}1'
hostname a-b-c services Apache
hostname d-e-f services Python
hostname g-h-i vmhostname u-v-w
vmhostname x-y-z %
Please try the following; written and tested with the shown samples in GNU awk.
awk '
/hostname/{
  if(val){
    print val,"No services"
  }
  val=$2
  next
}
/services/{
  print val,$2
  val=""
}
END{
  if(val){
    print val,"No services"
  }
}' Input_file
Explanation: a detailed explanation of the above.
awk '                          ##Start of the awk program.
/hostname/{                    ##If the line contains hostname (this also matches vmhostname).
  if(val){                     ##If val is not empty, the previous host had no services line.
    print val,"No services"    ##Print val and the string No services.
  }
  val=$2                       ##Set val to the 2nd field (the host name).
  next                         ##Skip the remaining rules for this line.
}
/services/{                    ##If the line contains services.
  print val,$2                 ##Print val and the 2nd field (the service).
  val=""                       ##Reset val.
}
END{                           ##END block, run after the last line.
  if(val){                     ##If val is not empty, the last host had no services line.
    print val,"No services"    ##Print val and the string No services.
  }
}' Input_file                  ##Input file name.
Output will be as follows.
a-b-c Apache
d-e-f Python
g-h-i No services
u-v-w No services
x-y-z No services
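The pairing logic explained above can also be wrapped in a small shell function so that the placeholder text is passed in as an awk variable. This is only a sketch: the function name pair_services and the variable names host and missing are made up for illustration and do not come from the answer.

```shell
# Hypothetical helper: print "host service" pairs, or "host No services"
# when a (vm)hostname line is not followed by a services line.
pair_services() {
  awk -v missing="No services" '
    /hostname/ {                       # also matches vmhostname
      if (host != "") print host, missing
      host = $2
      next
    }
    /services/ { print host, $2; host = "" }
    END        { if (host != "") print host, missing }
  ' "$1"
}
# Usage: pair_services example.txt
```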
How can I determine whether a pattern exists over multiple lines with grep? Below is the multiline pattern that I need to check is present in the file:
Status: True
Type: Master
I tried the command below. It can check for multiple strings on a single line, but it fails when the pattern spans multiple lines:
if cat file.txt | grep -P '^(?=.*Status:.*True)(?=.*Type:.*Master)'; then echo "Present"; else echo "NOT FOUND"; fi
file.txt
Interface: vlan
Status: True
Type: Master
ID: 104
Using GNU grep you can do this:
grep -zoP '(?m)^\s*Status:\s+True\s+Type:\s+Master\s*' file
Status: True
Type: Master
Explanation:
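-P: Enables PCRE regex mode
-z: Treats the input as NUL-separated records instead of newline-separated lines, so a file without NUL bytes is read as a single record and the pattern can span lines
-o: Prints only the matched text
(?m): Enables MULTILINE mode so that ^ matches at the start of every line
^: Matches the start of a line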
With your shown samples, please try the following awk program. Written and tested in GNU awk.
awk -v RS='(^|\n)Status:[[:space:]]+True\nType:[[:space:]]+Master' '
RT{
sub(/^\n/,"",RT)
print RT
}
' Input_file
Explanation: RS (awk's record separator) is set to the regex (^|\n)Status:[[:space:]]+True\nType:[[:space:]]+Master. In the main program, if RT (the text that matched RS, a GNU awk variable) is not empty, the leading newline is removed from RT and RT is printed, which gives the expected output shown by the OP.
I did it as follows:
grep -A 1 "^.*Status:.*True" test.txt | grep -B 1 "^Type:.*Master"
The -A x means "also show the x lines After the found one".
The -B y means "also show the y lines Before the found one".
So: show the "Status" line together with the next one (the "Type" one), and then show the "Type" line together with the previous one (the "Status" one).
You could also keep track of the previous line by setting prev = $0 at the end of every line, and use a pattern that tests both the previous and the current line.
awk '
prev ~ /^[[:space:]]*Status:[[:space:]]*True$/ && $0 ~ /^[[:space:]]*Type:[[:space:]]*Master$/{
printf "%s\n%s\n", prev, $0
}
{prev = $0}
' file.txt
Output
Status: True
Type: Master
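If only a yes/no answer is needed, as in the question's if ... then echo "Present" test, the previous-line idea can be wrapped so that awk's exit status carries the result. A minimal sketch, assuming a made-up helper name has_status_master and allowing trailing whitespace on both lines:

```shell
# Hypothetical helper: exit status 0 if a "Status: True" line is
# immediately followed by a "Type: Master" line, non-zero otherwise.
has_status_master() {
  awk '
    prev ~ /^[[:space:]]*Status:[[:space:]]*True[[:space:]]*$/ &&
    $0   ~ /^[[:space:]]*Type:[[:space:]]*Master[[:space:]]*$/ { found = 1; exit }
    { prev = $0 }
    END { exit !found }   # exit in a main rule still runs END
  ' "$1"
}
# Usage:
#   if has_status_master file.txt; then echo "Present"; else echo "NOT FOUND"; fi
```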
I'm creating a shell script, which reads the following list.log
1.15.2.119
1.15.86.33
1.15.251.60
1.20.178.145/31
1.37.33.24
1.54.202.216
1.58.10.126/28
1.80.225.84
1.116.240.174/30
I would like to append /32 to every IP, except those that already have a prefix such as /31 or /28.
Example:
1.14.191.227/32
1.15.2.119/32
1.15.86.33/32
1.15.251.60/32
1.20.178.145/31
1.37.33.24/32
1.54.202.216/32
1.58.10.126/28
1.80.225.84/32
1.116.240.174/30
My attempt appends /32 to every line, including the lines that already have a prefix:
cat list.log | sed 's/$/\/32/'
1.14.191.227/32
1.15.2.119/32
1.15.86.33/32
1.15.251.60/32
1.20.178.145/31/32
1.37.33.24/32
1.54.202.216/32
1.58.10.126/28/32
1.80.225.84/32
1.116.240.174/30/32
This can easily be done in awk; please try the following awk program. Written and tested with the shown samples.
awk '!/\//{$0=$0"/32"} 1' Input_file
Explanation: if the line does not contain a /, append /32 to it; the trailing 1 prints the current line whether or not it was edited.
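The condition matters here: a test for lines ending in /32 would still append to lines ending in /31 or /28, whereas a test for any slash leaves every existing prefix alone. A small sketch of the second test, with a made-up function name add_default_prefix and an extra NF guard (my own addition) so that empty lines are passed through unchanged:

```shell
# Hypothetical helper: append /32 only to non-empty lines without a slash.
add_default_prefix() {
  awk 'NF && index($0, "/") == 0 { $0 = $0 "/32" } 1' "$1"
}
# Usage: add_default_prefix list.log > newlist.log
```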
Using sed
$ sed 's|\.[0-9]\+$|&/32|' list.log
1.15.2.119/32
1.15.86.33/32
1.15.251.60/32
1.20.178.145/31
1.37.33.24/32
1.54.202.216/32
1.58.10.126/28
1.80.225.84/32
1.116.240.174/30
You can add /32 to the end of lines that do not contain /
sed '\,/,!s,$,/32,' list.log > newlist.log
Details:
\,/,! - select lines that do not contain / (the leading \, makes , the address delimiter)
s,$,/32, - append /32 at the end of those lines.
See the online demo:
#!/bin/bash
s='1.15.2.119
1.15.86.33
1.15.251.60
1.20.178.145/31
1.37.33.24
1.54.202.216
1.58.10.126/28
1.80.225.84
1.116.240.174/30'
sed '\,/,!s,$,/32,' <<< "$s"
Output:
1.15.2.119/32
1.15.86.33/32
1.15.251.60/32
1.20.178.145/31
1.37.33.24/32
1.54.202.216/32
1.58.10.126/28
1.80.225.84/32
1.116.240.174/30
I hope this is an easy fix.
I originally wrote a clean and simple script that used gawk, because that was what I found when I was solving the original problem. I now need to adapt it to use only awk.
sample file.fasta:
>gene1
>gene235
ATGCTTAGATTTACAATTCAGAAATTCCTGGTCTATTAACCCTCCTTCACTTTTCACTTTTCCCTAACCCTTCAAAATTTTATATCCAATCTTCTCACCCTCTACAATAATACATTTATTATCCTCTTACTTCAAAATTTTT
>gene335
ATGCTCCTTCTTAATCTAAACCTTCAAAATTTTCCCCCTCACATTTATCCATTATCACCTTCATTTCGGAATCCTTAACTAAATACAATCATCAACCATCTTTTAACATAACTTCTTCAAAATTTTACCAACTTACTATTGCTTCAAAATTTTTCAT
>gene406
ATGTACCACACACCCCCATCTTCCATTTTCCCTTTATTCTCCTCACCTCTACAATCCCCTTAATTCCTCTTCAAAATTTTTGGAGCCCTTAACTTTCAATAACTTCAAAATTTTTCACCATACCAATAATATCCCTCTTCAAAATTTTCCACACTCACCAAC
gawk '/[ACTG]{21,}GG/{print a; print}{a=$0}' file.fasta >"species_precrispr".fasta
What I know works in awk is the following:
awk '/[ACTG]GG/{print a; print}{a=$0}' file.fasta >"species_precrispr".fasta
The culprit, therefore, is the interval expression {21,}.
What I want is to match each line that contains at least 21 nucleotides to the left of my "GG" match.
Can anyone help?
Edit:
Thanks for all the help:
Various solutions worked. To reply to some of the comments, here is a more basic example of the initial input and the desired output.
Prior to awk command:
cat file1.fasta
>gene1
ATGCCTTAACTTTCAATAACTGG
>gene2
ATGGGTGCCTTAACTTTCAATAACTG
>gene3
ATGTCAAAATTTTTCATTTCAAT
>gene4
ATCCTTTTTTTTGGGTCAAAATTAAA
>gene5
ATGCCTTAACTTTCAATAACTTTTTAAAATTTTTGG
The following commands all produced the same desired output:
original code
gawk '/[ACTG]{21,}GG/{print a; print}{a=$0}' file1.fasta
slight modification that enables interval expressions in GNU awk 3.x.x
awk --re-interval '/[ACTG]{21,}GG/{print a; print}{a=$0}' file1.fasta
Allows the count to be set through a variable and gives the correct output; untested, but should work with older versions of awk
awk -v usr_count="21" '/gene/{id=$0;next} match($0,/.*GG/){val=substr($0,RSTART,RLENGTH-2);if(gsub(/[ACTG]/,"&",val)>= usr_count){print id ORS $0};id=""}' file1.fasta
awk --re-interval '/^>/ && seq { if (match(seq,"[ACTG]{21,}GG")) print name ORS seq } /^>/{name=$0; seq=""; next} {seq = seq $0 } END { if (match(seq,"[ACTG]{21,}GG")) print name ORS seq }' file1.fasta
Desired output: only the gene names and sequences of those sequences that have at least 21 nucleotides before a matching GG
>gene1
ATGCCTTAACTTTCAATAACTGG
>gene5
ATGCCTTAACTTTCAATAACTTTTTAAAATTTTTGG
Lastly just to show the discarded lines
>gene2
ATG-GG-TGCCTTAACTTTCAATAACTG # only 3 nt prior to any GG combo
>gene3
ATGTCAAAATTTTTCATTTCAAT # No GG match found
>gene4
ATCCTTTTTTTTGGGTCAAAATTAAA # only 14 nt prior to any GG combo
Hope this helps others!
EDIT: As per the OP's comment, to print the gene IDs too, try the following.
awk '
/gene/{
id=$0
next
}
match($0,/.*GG/){
val=substr($0,RSTART,RLENGTH-2)
if(gsub(/[ACTG]/,"&",val)>=21){
print id ORS $0
}
id=""
}
' Input_file
OR one-liner form of above solution as per OP's request:
awk '/gene/{id=$0;next} match($0,/.*GG/){val=substr($0,RSTART,RLENGTH-2);if(gsub(/[ACTG]/,"&",val)>=21){print id ORS $0};id=""}' Input_file
Please try the following; written and tested with the shown samples only.
awk '
match($0,/.*GG/){
val=substr($0,RSTART,RLENGTH-2)
if(gsub(/[ACTG]/,"&",val)>=21){
print
}
}
' Input_file
OR a more generic approach, with a variable in which the user can set the number of nucleotides that must be present before GG.
awk -v usr_count="21" '
match($0,/.*GG/){
val=substr($0,RSTART,RLENGTH-2)
if(gsub(/[ACTG]/,"&",val)>=usr_count){
print
}
}
' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
match($0,/.*GG/){ ##Using Match function to match everything till GG in current line.
val=substr($0,RSTART,RLENGTH-2) ##Storing the matched text without its final GG (RLENGTH-2 characters starting at RSTART) in variable val.
if(gsub(/[ACTG]/,"&",val)>=21){ ##gsub replaces each A, C, T or G with itself and returns the number of replacements; checking that this count is at least 21.
print ##Printing current line then.
}
}
' Input_file ##Mentioning Input_file name here.
GNU awk has accepted interval expressions in regular expressions since version 3.0. However, they only became enabled by default in version 4.0. If you have GNU awk 3.x.x, you have to use the --re-interval flag to enable them.
awk --re-interval '/a{3,6}/{print}' file
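Where interval expressions are not available at all, the same pattern can be assembled without them by repeating the bracket expression in a loop. The following is a sketch under that assumption; the function name find_gg_sites and the variables n, re and prev are made up for illustration.

```shell
# Hypothetical helper: find_gg_sites COUNT FILE
# Prints each line that has at least COUNT nucleotides directly before a GG,
# preceded by the line above it (the header in single-line FASTA).
find_gg_sites() {
  awk -v n="$1" '
    BEGIN {
      # "[ACTG]" repeated n times, then any further nucleotides, then GG:
      # equivalent to [ACTG]{n,}GG but written without an interval expression.
      for (i = 0; i < n; i++) re = re "[ACTG]"
      re = re "[ACTG]*GG"
    }
    $0 ~ re { print prev; print }
    { prev = $0 }
  ' "$2"
}
# Usage: find_gg_sites 21 file1.fasta
```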
There is an issue that people often overlook when using awk on FASTA files. When you have multi-line sequences, your match may span multiple lines. To handle this, you need to join the lines of each sequence first.
The easiest way to process FASTA files with awk is to build up a variable called name and a variable called seq. Every time you have read a full sequence, you can process it. Note that, for best results, the sequence should be stored as one continuous string without any newlines or whitespace. A generic awk program for processing FASTA looks like this:
awk '/^>/ && seq { **process_sequence_here** }
/^>/{name=$0; seq=""; next}
{seq = seq $0 }
END { **process_sequence_here** }' file.fasta
In the presented case, your sequence processing looks like:
awk '/^>/ && seq { if (match(seq,"[ACTG]{21,}GG")) print name ORS seq }
     /^>/{name=$0; seq=""; next}
     {seq = seq $0 }
     END { if (match(seq,"[ACTG]{21,}GG")) print name ORS seq }' file.fasta
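To see why joining matters, here is a sketch that applies the accumulate-then-process template to wrapped sequences, using a match() and RLENGTH test so that no interval expression is needed. It assumes the sequences contain only A, C, T and G, so the match always starts at the first character and runs to the last GG. The names filter_fasta and emit are made up for illustration.

```shell
# Hypothetical helper: print header and joined sequence for every record
# with at least 21 nucleotides before some GG, even when the sequence
# (or the GG itself) is wrapped across several lines.
filter_fasta() {
  awk '
    function emit() {
      # 21 nucleotides plus GG means a match of at least 23 characters.
      if (seq != "" && match(seq, /[ACTG]+GG/) && RLENGTH > 22)
        print name ORS seq
    }
    /^>/ { emit(); name = $0; seq = ""; next }
    { seq = seq $0 }
    END { emit() }
  ' "$1"
}
# Usage: filter_fasta file.fasta
```

A line-by-line test would miss a record whose GG is split across two lines; the joined version finds it.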
Sounds like what you want is:
awk 'match($0,/[ACTG]+GG/) && RLENGTH>22{print a; print} {a=$0}' file
but this is probably all you need given the sample input you provided:
awk 'match($0,/.*GG/) && RLENGTH>22{print a; print} {a=$0}' file
They'll both work in any awk.
Using your updated sample input:
$ awk 'match($0,/.*GG/) && RLENGTH>22{print a; print} {a=$0}' file
>gene1
ATGCCTTAACTTTCAATAACTGG
>gene5
ATGCCTTAACTTTCAATAACTTTTTAAAATTTTTGG
I need to print from the word 'update' to ';'.
file.txt:
-- Host (first) kkk (queen1)
-- prince princess#/king 1/1
update schema.table_name t set "A=123","B=234" where "C=222" and "D=333"
and "F=2342";
-- Host (first) ddd (queen2)
-- prince princess#/king 2/2
update schema2.table_name2 t set "A=123","B=234" where "C=222" and "D=333"
and "F=2342";
With the awk script below I can select the block to parse, but I'm not sure how to print the statement from update up to the semicolon (;).
file.awk:
BEGIN {
}
/-- Host/,/;/ {
if (/-- Host/) printf "%s#%s#",$3,$5;
if (/update /) printf ??????????????;
}
END {
}
This is how I execute it:
awk -f file.awk -F'[ ()]+' file.txt
Can you let me know of any idea?
I guess your problem is that the update... statement is broken across multiple lines. This one-liner may help, though you may have to adjust it a little to fit your whole script.
awk 'p||/^update/{p=1;printf "%s",$0}/;$/&&p{p=0;print ""}' file
with your file as input, it outputs:
update schema.table_name t set "A=123","B=234" where "C=222" and "D=333"and "F=2342";
update schema2.table_name2 t set "A=123","B=234" where "C=222" and "D=333"and "F=2342";
The idiomatic awk way of doing this is a range pattern:
awk '/update/,/;/' file.txt
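The range pattern prints the statement lines as they are, one per input line. To fill in the placeholder in the question's file.awk, the host fields and the joined statement could be combined as in the sketch below. It reuses the question's field separator and its $3 and $5 fields; the function name join_updates and the variables host, stmt and inside are made up, and joining the lines with a single space is my own choice.

```shell
# Hypothetical helper: print "field3#field5#" from each "-- Host" line,
# followed by the update statement joined onto one line.
join_updates() {
  awk -F'[ ()]+' '
    /^-- Host/     { host = $3 "#" $5 "#" }
    /^update /     { stmt = ""; inside = 1 }
    inside         { stmt = stmt (stmt == "" ? "" : " ") $0 }
    inside && /;$/ { print host stmt; inside = 0 }
  ' "$1"
}
# Usage: join_updates file.txt
```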
Consider the format of a bind dns zone file:
zone "mydomain.com" {
type slave;
file "db.mydomain";
masters {
192.168.5.15;
};
};
...
repeated several more times for other zones in the conf file.
I need to discover in a script some details about the zone.conf file.
I know the domain I am looking for so I can regex for something like '^zone "mydomain.com"'
But I need to discover the file line that occurs first after the zone name I am looking at.
I also want to discover the ip address in the masters list.
Our configuration only has one master ip so I don't have to worry about multiple ip's.
Ideas appreciated.
sed can be used to isolate the right section of the dns file, then print the next line after a pattern matched:
# sed -n '/"mydomain.com"/,/^};$/{/^zone "mydomain.com"/{n;p}}' dnsfile
type slave;
# sed -n '/"mydomain.com"/,/^};$/{/masters/{n;p}}' dnsfile
192.168.5.15;
One approach here would be to use sed to first output the zone block you are interested in, and then grab just the lines you want. This might look something like the following:
sed -n '/^zone "mydomain.com"/,/^};/p' zone.conf | sed -n -e '2p' -e '/[0-9]/p'
2p will print only the second line (first line after the zone name), and /[0-9]/p will print only lines that contain digits (ip address).
To get the line after masters, with the IP trimmed:
awk -F';' '/^ *masters/ { getline; sub(/^ */, "", $1); print $1 }' file
OUTPUT
192.168.5.15
To get the line after the zone line:
awk -F';' '/^zone "mydomain.com"/ { getline; sub(/^ */, "", $1); print $1 }' file
OUTPUT
type slave
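Both details can also be collected in one pass by tracking brace depth, which avoids relying on how the closing }; lines are indented. This is only a sketch: the function name zone_info, the file= and master= output labels, and the variable names are made up for illustration, and it assumes one statement per line as in the sample.

```shell
# Hypothetical helper: zone_info ZONE FILE
# Prints the file name and the first master IP of the given zone.
zone_info() {
  awk -v zone="$1" '
    $1 == "zone" && $2 == ("\"" zone "\"") { depth = 1; next }
    depth == 0 { next }                      # outside the wanted zone
    $1 == "file"    { gsub(/[";]/, "", $2); print "file=" $2 }
    $1 == "masters" { in_masters = 1 }
    in_masters && $1 ~ /^[0-9]/ { sub(/;$/, "", $1); print "master=" $1; in_masters = 0 }
    index($0, "{") { depth++ }
    index($0, "}") { if (--depth == 0) exit } # end of the zone block
  ' "$2"
}
# Usage: zone_info mydomain.com zone.conf
```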