The file I'm searching (fruit.txt) looks something like the snippet below; the data appears in random order that I cannot control.
....fruit=apple,...qty=3,...condition=bad,....
...qty=4,...condition=great,...fruit=orange,...
...condition=ok,...qty=2,...fruit=banana,...
My grep command is: grep -Eo 'fruit.[^,]*|qty.[^,]*|condition.[^,]*' fruit.txt
This results in output like:
fruit=apple
qty=3
condition=bad
qty=4
condition=great
fruit=orange
condition=ok
qty=2
fruit=banana
That output is correct; however, I want the output ordered as I specified in the grep command, i.e. exactly like the below:
fruit=apple
qty=3
condition=bad
fruit=orange
qty=4
condition=great
fruit=banana
qty=2
condition=ok
A solution with gawk:
First I added some extra ',' separators to the input:
....,fruit=apple,...,qty=3,...,condition=bad,....
...,qty=4,...,condition=great,...,fruit=orange,...
...,condition=ok,...,qty=2,...,fruit=banana,...
Then I wrote this awk script (fruit.awk):
{
    fruit = ""
    qty = ""
    condition = ""
    for (i = 1; i <= NF; i++) {
        delete a
        split($i, a, "=")
        if (a[1] == "fruit")     { fruit = a[2] }
        if (a[1] == "qty")       { qty = a[2] }
        if (a[1] == "condition") { condition = a[2] }
    }
}
{
    print "fruit=" fruit
    print "qty=" qty
    print "condition=" condition
}
Output of gawk -F, -f fruit.awk fruit.txt:
fruit=apple
qty=3
condition=bad
fruit=orange
qty=4
condition=great
fruit=banana
qty=2
condition=ok
Using sed in several steps:
sed -E 's/^/,/;
s/(.*),(condition[^,]*)/\2\r,\1/;
s/(.*),(qty=[^,]*)/\2,\1/;
s/(.*),(fruit=[^,]*)/\2,\1/;
s/\r.*//;
s/,/\n/g' input.txt
I start by inserting a leading , so the expressions also work when the interesting data starts in the first field. After the condition value I add a \r marker, so that once fruit has been moved to the front I can strip the garbage that follows the marker.
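The same reordering can also be done in a single awk pass over the unmodified input, using match() to pull out each key in a fixed order (a sketch; it assumes every line contains all three keys and that the key names are exactly fruit, qty and condition):

```shell
# Sample input from the question (unmodified, keys in random order)
cat > fruit.txt <<'EOF'
....fruit=apple,...qty=3,...condition=bad,....
...qty=4,...condition=great,...fruit=orange,...
...condition=ok,...qty=2,...fruit=banana,...
EOF

# For every line, print the three keys in a fixed order
awk '{
  match($0, /fruit=[^,]*/);     print substr($0, RSTART, RLENGTH)
  match($0, /qty=[^,]*/);       print substr($0, RSTART, RLENGTH)
  match($0, /condition=[^,]*/); print substr($0, RSTART, RLENGTH)
}' fruit.txt
```

If a line is missing one of the keys, match() fails and an empty line is printed for it, so this is only suitable when all three keys are guaranteed to be present.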
Related
I have a file:
Mds:tfy Fg:567895435
Mds:hgf Fg:567896553
Mds:tfy Fg:561245746
But I want: Fg:56789 Mds:tfy Mds:hgf
I tried:
awk '{$2,$1'} /file | grep -o '^(fg:56789)*{f:56789}
but it didn't work, and it isn't a good approach anyway.
Thanks.
Could you please try the following.
awk '
match($0, /[0-9][0-9][0-9][0-9][0-9]/) {
    val = substr($0, RSTART, RLENGTH)
    a[val]++
    b[val] = $0
}
END {
    for (i in a) {
        if (a[i] > 1) {
            print b[i]
        }
    }
}
' Input_file
I am trying to write an AWK script to parse a file of the form
> field1 - field2 field3 ...
lineoftext
anotherlineoftext
anotherlineoftext
and I am checking with a regex whether the first line is valid (begins with a > and has something after it), and then printing all the other lines. This is the script I wrote, but it only verifies that the file is in the correct format and doesn't print anything afterwards.
#!/bin/bash
# FASTA parser
awk ' BEGIN { x = 0; }
{ if ($1 !~ />.*/ && x == 0)
{ print "Not a FASTA file"; exit; }
else { x = 1; next; }
print $0 }
END { print " - DONE - "; }'
Basically you can use the following awk command:
awk 'NR==1 && /^>./ {p=1} p' file
For the first line (NR==1) it checks whether the line starts with a > followed by something (/^>./). If that condition is true, the variable p is set to one. The trailing p checks whether p evaluates to true and prints the line in that case.
If you want to print the error message, you need to invert the logic a bit:
awk 'NR==1 && !/^>./ {print "Not a FASTA file"; exit 1} 1' file
In this case the program prints the error message and exits if the first line does not start with a >. Otherwise all lines get printed, because 1 always evaluates to true.
Taking the OP's requirement literally:
awk 'NR==1{p=$0~/^>/}p' YourFile
# shorter version with input from @EdMorton
awk 'NR==1{p=/^>/}p' YourFile
For lines from the first > onward (inclusive):
awk '!p{p=$0~/^>/}p' YourFile
# shorter version with input from @EdMorton
awk '!p{p=/^>/}p' YourFile
Since all you care about is the first line, you can just check that, then exit.
awk 'NR > 1 { exit (0) }
! /^>/ { print "Not a FASTA file" >"/dev/stderr"; exit (1) }' file
As noted in comments, the >"/dev/stderr" is a nonportable hack which may not work for you. Regard it as a placeholder for something slightly more sophisticated if you want a tool which behaves as one would expect from a standard Unix tool (run silently if no problems; report problems to standard error).
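If the /dev/stderr hack is a concern, one sketch of a more portable variant is to let awk only set the exit status and have the shell report the error:

```shell
# Sample non-FASTA input (first line does not start with ">")
printf 'plain text\nmore text\n' > file

# awk exits 1 if the first line lacks ">", 0 otherwise;
# the shell prints the message to standard error on failure
awk 'NR > 1 { exit 0 }
     ! /^>/ { exit 1 }' file || echo "Not a FASTA file" >&2
```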
I have the following content in a file:
NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"
I just want to extract the NAME, SYSPORT and ALM fields using sed.
Try the sed command below to extract the NAME, SYSPORT and ALM fields:
$ sed 's/.*\(NAME=[^ ]*\).*\(SYSPORT=[^ ]*\).*\(ALM:[^;]*\).*/\1 \2 \3/g' file
NAME=ALARMCARDSLOT137 SYSPORT=2629 ALM:20063
Why not use grep?
grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
test with your text:
kent$ echo 'NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"'|grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063
Here is another awk approach:
awk -F" |;" -v RS=" " '/NAME|SYSPORT|ALM/ {print $1}'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063
Whenever there are name=value pairs in input files, I find it best to first create an array mapping the names to the values and then operating on the array using the names of the fields you care about. For example:
$ cat tst.awk
function bldN2Varrs(   i, fldarr, fldnr, subarr, subnr, tmp ) {
    for (i=2; i<=NF; i+=2) { gsub(/ /, RS, $i) }
    split($0, fldarr, /[[:blank:]]+/)
    for (fldnr in fldarr) {
        split(fldarr[fldnr], tmp, /=/)
        gsub(RS, " ", tmp[2])
        gsub(/^"|"$/, "", tmp[2])
        name2value[tmp[1]] = tmp[2]
        split(tmp[2], subarr, / /)
        for (subnr in subarr) {
            split(subarr[subnr], tmp, /:/)
            subName2value[tmp[1]] = tmp[2]
        }
    }
}

function prt( fld, subfld ) {
    if (subfld) print fld "/" subfld "=" subName2value[subfld]
    else        print fld "=" name2value[fld]
}

BEGIN { FS=OFS="\"" }
{
    bldN2Varrs()
    prt("NAME")
    prt("SYSPORT")
    prt("DYNDATA", "ALM")
}
$ awk -f tst.awk file
NAME=ALARMCARDSLOT137
SYSPORT=2629
DYNDATA/ALM=20063;1406718801,
And if 20063;1406718801, isn't the desired value for the ALM field and you just want some subsection of it, simply tweak the array construction function to suit whatever your criteria are.
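For example, if only the 20063 part is wanted, one hypothetical tweak is to strip everything from the ; onward after the value has been extracted (sketched here as a standalone substitution rather than a change to the script itself):

```shell
# Hypothetical post-step: trim the ";" suffix from an extracted value
printf 'DYNDATA/ALM=20063;1406718801,\n' |
awk '{ sub(/;.*/, ""); print }'
```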
Let's suppose I have this sample
foo/bar/123-465.txt
foo/bar/456-781.txt
foo/bar/102-445.txt
foo/bar/123-721.txt
I want to remove every line where the /[0-9]*- match also appears on another line. In other words: I want to remove every line whose file prefix appears more than once in the file.
Therefore only keeping :
foo/bar/456-781.txt
foo/bar/102-445.txt
I bet sed can do this, but how?
OK, I misunderstood your problem; here's how to do it:
grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file
In action:
cat file
foo/bar/123-465.txt
foo/bar/456-781.txt
foo/bar/102-445.txt
foo/bar/123-721.txt
grep -vf <(grep -o '/[0-9]*-' file | sort | uniq -d) file
foo/bar/456-781.txt
foo/bar/102-445.txt
awk '
match($0, "[0-9]*-") {
    id = substr($0, RSTART, RLENGTH)
    if (store[id])
        dup[id] = 1
    store[id] = $0
}
END {
    for (id in store) {
        if (!dup[id]) {
            print store[id]
        }
    }
}
'
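Another option is a two-pass awk that reads the file twice: the first pass counts each numeric prefix, the second prints only the lines whose prefix occurred once (a sketch; it assumes the prefix is the third field when splitting on / and -, as in the sample paths):

```shell
# Sample input from the question
cat > file <<'EOF'
foo/bar/123-465.txt
foo/bar/456-781.txt
foo/bar/102-445.txt
foo/bar/123-721.txt
EOF

# Pass 1 (NR==FNR) counts each prefix; pass 2 prints unique-prefix lines
awk -F'[/-]' 'NR==FNR { c[$3]++; next } c[$3] == 1' file file
```

Unlike the END-based versions, this preserves the original line order of the survivors.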
You can use the following awk script:
example.awk:
{
    # Get the value of interest (the part before the -)
    prefix = substr($3, 1, match($3, /-/) - 1)
    # Increment the counter for this value
    counter[prefix]++
    # Buffer the current line
    buffer[prefix] = $0
}
# At the end, print every line whose value of interest appeared just once
# (note: "index" is a built-in awk function, so it can't be used as the
# loop variable)
END {
    for (p in counter)
        if (counter[p] == 1)
            print buffer[p]
}
Execute it like this, using / as the field separator so that $3 is the filename:
awk -F/ -f example.awk input.file
I would like to understand awk a little better: I often search for regular expressions and many times I am interested only in the Nth occurrence. I always did this task using pipes say:
awk '/regex/' file | awk 'NR%N==0'
How can I do the same task with awk (or perl) without piping?
Are there some instances in which using pipes is the most computationally efficient solution?
Every third:
awk '/line/ && !(++c%3)' infile
For example:
zsh-4.3.12[t]% cat infile
1line
2line
3line
4line
5line
6line
7line
8line
9line
10line
zsh-4.3.12[t]% awk '/line/ && !(++c%3)' infile
3line
6line
9line
zsh-4.3.12[t]% awk '/line/ && !(++c%2)' infile
2line
4line
6line
8line
10line
Just count the occurrences and print every Nth:
BEGIN { n=0 }
/myregex/ { n++; if(n==3) { n=0; print } }
You can use multiple conditions, e.g.:
awk -v N=10 '/regex/ { count++ } count == N { count=0; print $0 }'
awk '/regex/ { c=(c+1)%N; if(c==0) print}' N=3
Try this:
awk -v N=3 '/yourRegex/{i++} i==N{print; exit;}' yourFile
This will print only the Nth match (set N with -v).
If you need every Nth match, how about:
awk -v N=3 '/yourRegex/{i++} (!(i%N) && i){print; i=0}' yourFile