How to have a good output in bash when printing? - regex

I have this command here, and I have a problem achieving a good output format.
Given these lines,
DATE*2014*09*23
VAL*0001*ABC
N3*Sample
VAL*0002*XYZ
My desired output here is like this:
["ABC", "XYC"]
I tried this code:
perl -nle 'print $& if /VAL\*[0-9]*\*\K.*/' file | awk '{ printf "\"%s\",", $0 }'
which results in only:
"ABC","XYZ",
Another thing is the case where there is only one value to print.
If it happens that a file is like this:
DATE*2014*09*23
VAL*0001*ABC
N3*Sample
my desired output would be just this (without the surrounding []):
"ABC"

You can do it all with awk:
#!/usr/bin/awk -f
BEGIN {FS="*"; i=0; ORS=""}
$1=="VAL" {a[i++]=$3}
END {
    if (i>1) {
        print "[\"" a[0]
        for (j = 1; j < i; j++)
            print "\",\"" a[j]
        print "\"]"
    }
    if (i==1)
        print "\"" a[0] "\""
}
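Assuming the script is saved as vals.awk (both file names below are mine, not from the post), a quick self-contained check:

```shell
# Sanity check of the awk answer above; "file" and "vals.awk" are
# illustrative names.
cat > file <<'EOF'
DATE*2014*09*23
VAL*0001*ABC
N3*Sample
VAL*0002*XYZ
EOF

cat > vals.awk <<'EOF'
BEGIN {FS="*"; i=0; ORS=""}
$1=="VAL" {a[i++]=$3}
END {
    if (i>1) {
        print "[\"" a[0]
        for (j = 1; j < i; j++)
            print "\",\"" a[j]
        print "\"]"
    }
    if (i==1)
        print "\"" a[0] "\""
}
EOF

awk -f vals.awk file > out.txt   # -> ["ABC","XYZ"]
```

(Note the script prints no space after the comma; if you need `["ABC", "XYZ"]` exactly, print `"\", \""` instead.)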

How do I detect embedded field names and reorder fields using awk?

I have the following data:
"b":1.14105,"a":1.14106,"x":48,"t":1594771200000
"a":1.141,"b":1.14099,"x":48,"t":1594771206000
...
I am trying to display the data in a given order and only for three fields. As the field order is not guaranteed, I need to read the "tag" in each comma-separated column on each line.
I have tried to solve this task using awk:
awk -F',' '
{
    for(i=1; i<=$NF; i++) {
        if(index($i,"\"a\":")!=0) a=$i;
        if(index($i,"\"b\":")!=0) b=$i;
        if(index($i,"\"t\":")!=0) t=$i;
    }
    printf("%s,%s,%s\n",a,b,t);
}
'
But I get:
,,
,,
...
In the above data sample, I would expect:
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
...
Note: I am using the awk shipped with FreeBSD
$ cat tst.awk
BEGIN {
    FS = "[,:]"
    OFS = ","
}
{
    for (i=1; i<NF; i+=2) {
        f[$i] = $(i+1)
    }
    print p("a"), p("b"), p("t")
}
function p(tag, t) {
    t = "\"" tag "\""
    return t ":" f[t]
}
$ awk -f tst.awk file
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
With awk and an array:
awk -F '[:,]' '{for(i=1; i<=NF; i=i+2){a[$i]=$(i+1)}; print "\"a\":" a["\"a\""] ",\"b\":" a["\"b\""] ",\"t\":" a["\"t\""]}' file
or
awk -F '[":,]' '{for(i=2; i<=NF; i=i+4){a[$i]=$(i+2)}; print "\"a\":" a["a"] ",\"b\":" a["b"] ",\"t\":" a["t"]}' file
Output:
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000
A similar awk where you can specify the fields and their order:
$ awk -F[:,] -v fields='"a","b","t"' 'BEGIN{n=split(fields,f)}
{for(i=1;i<NF;i+=2) map[$i]=$(i+1);
for(i=1;i<=n;i++) printf "%s", f[i]":"map[f[i]] (i==n?ORS:",")}' file
"a":1.14106,"b":1.14105,"t":1594771200000
"a":1.141,"b":1.14099,"t":1594771206000

How to remove identical columns in a csv file using Bash

There are already a lot of questions like this, but none of them helped me. I want to keep this simple:
I have a file (more than 90 columns) like:
Class,Gene,col3,Class,Gene,col6,Class
A,FF,23,A,FF,16,A
B,GG,45,B,GG,808,B
C,BB,43,C,BB,76,C
I want to keep unique columns so the desired output should be:
Class,Gene,col3,col6
A,FF,23,16
B,GG,45,808
C,BB,43,76
I used awk '!a[$0]++' but it did not remove the repeated columns of the file.
As a side note: I have repeated columns because I used the paste command to join different files column-wise.
You may use this awk to print unique columns based on their names in the first (header) row:
awk 'BEGIN {
    FS=OFS=","                # set input/output field separators to comma
}
NR == 1 {                     # for the first (header) row
    for (i=1; i<=NF; i++)     # loop through all columns
        if (!ucol[$i]++)      # if the col name is not yet in the ucol array
            hdr[i]            # then record the column number in array hdr
}
{
    for (i=1; i<=NF; i++)     # loop through all columns
        if (i in hdr)         # if the col number is found in array hdr
            printf "%s", (i==1?"":OFS) $i   # then print the col with OFS
    print ""                  # print a line break
}' file
Class,Gene,col3,col6
A,FF,23,16
B,GG,45,808
C,BB,43,76
For your specific case, where you're just trying to remove the 2 columns added by paste per original file, all you need is:
$ awk '
BEGIN { FS=OFS="," }
{ r=$1 OFS $2; for (i=3; i<=NF; i+=3) r=r OFS $i; print r }
' file
Class,Gene,col3,col6
A,FF,23,16
B,GG,45,808
C,BB,43,76
but in other situations where it's not as simple: create an array (f[] below) that maps output field numbers to input field numbers, where the output fields are chosen by the uniqueness of the first line's field (column) names. Then loop through just the output field numbers, printing the value of the corresponding input field. (Note: you don't have to loop through all of the input fields, just the ones you're going to output.)
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==1 {
for (i=1; i<=NF; i++) {
if ( !seen[$i]++ ) {
f[++nf] = i
}
}
}
{
for (i=1; i<=nf; i++) {
printf "%s%s", $(f[i]), (i<nf ? OFS : ORS)
}
}
$ awk -f tst.awk file
Class,Gene,col3,col6
A,FF,23,16
B,GG,45,808
C,BB,43,76
Here's a version with more meaningful variable names and a couple of intermediate variables to clarify what's going on:
BEGIN { FS=OFS="," }
NR==1 {
    numInFlds = NF
    for (inFldNr=1; inFldNr<=numInFlds; inFldNr++) {
        fldName = $inFldNr
        if ( !seen[fldName]++ ) {
            out2in[++numOutFlds] = inFldNr
        }
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2in[outFldNr]
        fldValue = $inFldNr
        printf "%s%s", fldValue, (outFldNr<numOutFlds ? OFS : ORS)
    }
}
Print the first two columns, then iterate in strides of 3 to skip the Class and Gene columns in the rest of the row.
awk -F, '{printf("%s,%s", $1, $2); for (i=3; i<=NF; i+=3) printf(",%s", $i); printf("\n")}'
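A quick check of this stride-of-3 approach, with the sample input inlined (file names are mine):

```shell
# Check of the stride-of-3 answer above; merged.csv / dedup.csv are
# illustrative names, not from the post.
cat > merged.csv <<'EOF'
Class,Gene,col3,Class,Gene,col6,Class
A,FF,23,A,FF,16,A
B,GG,45,B,GG,808,B
EOF

awk -F, '{ printf("%s,%s", $1, $2)        # always keep the first two columns
           for (i = 3; i <= NF; i += 3)   # then every 3rd column (the data cols)
               printf(",%s", $i)
           printf("\n") }' merged.csv > dedup.csv
```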

How to use sed to extract numbers from a comma separated string?

I managed to extract the following response and comma-separate it. It's a comma-separated string and I'm only interested in the values of the ACCOUNT_IDs. How do you pattern match using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input
I tried the below, but it joins all the numbers together and I would like to preserve the commas:
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{for(i=1;i<=NF;i++)if($i~/[0-9]/){printf sep $i;sep=","}}'
If you really want sed, you can go for
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
(note that the \| alternation is a GNU sed extension)
$ awk '
BEGIN {
    FS = OFS = ","
}
{
    c = 0
    for (i = 1; i <= NF; i++) {
        if ($i == "ACCOUNT_ID") {
            printf "%s%s", (c++ ? OFS : ""), $(i + 1)
        }
    }
    print ""
}' file
711111111119,111111111115
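A self-contained check of the ACCOUNT_ID-driven awk above, fed via stdin (the output file name is mine):

```shell
# Check of the ACCOUNT_ID-based awk answer; accounts.txt is an
# illustrative name, not from the post.
printf '%s\n' 'ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev' |
awk 'BEGIN { FS = OFS = "," }
     {
         c = 0
         for (i = 1; i <= NF; i++)
             if ($i == "ACCOUNT_ID")          # the value follows the tag field
                 printf "%s%s", (c++ ? OFS : ""), $(i + 1)
         print ""
     }' > accounts.txt

cat accounts.txt   # -> 711111111119,111111111115
```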

awk regex doesn't work when match ip address

I want to extract the IP addresses in a file;
each line of the file looks like this:
T 218.241.107.98 167.232.255.245 7 2719 1378473670 N 0 0 0 G 0 I 218.241.107.97,0.146,1 218.241.98.45,0.239,1 192.168.1.253,0.182,1 159.226.253.77,0.210,1 159.226.253.54,0.676,1 159.226.254.254,39.287,1 203.192.137.173,39.335,1 203.192.134.69,50.128,1 61.14.157.141,42.917,1 202.147.61.193,188.165,1 38.104.84.41,201.100,1 154.54.30.193,194.939,1 154.54.41.221,194.915,1 154.54.5.65,237.396,1 154.54.2.81,251.547,1 154.54.24.153,260.946,1 154.54.26.126,256.046,1 154.54.10.14,245.145,1 193.251.240.113,241.663,1 q q q 57.69.31.22,283.784,1;57.69.31.22,284.763,1
But my awk script doesn't work
#!/usr/bin/awk -f
BEGIN {
    FS = "[, \t;]"
}
{
    for(i = 4; i <= NF; i++)
    {
        if ($1 == "#")
            continue
        if ($i ~ /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}/)
            printf $i"\t"
        if (i == NF)
            printf "\n"
    }
}
Can anyone figure out what's wrong?
Any help would be really appreciated; thanks in advance.
PS: there is no output except a newline character.
Try this awk
awk -F"[, \t;]+" '!/^#/ {for (i=1;i<NF;i++) if ($i ~ /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}/) printf "%s\t",$i;print ""}' file
218.241.107.98 167.232.255.245 218.241.107.97 218.241.98.45 192.168.1.253 159.226.253.77 159.226.253.54 159.226.254.254 203.192.137.173 203.192.134.69 61.14.157.141 202.147.61.193 38.104.84.41 154.54.30.193 154.54.41.221 154.54.5.65 154.54.2.81 154.54.24.153 154.54.26.126 154.54.10.14 193.251.240.113 57.69.31.22 57.69.31.22
The !/^#/ makes it print only lines that do not start with #.

awk remove unwanted records and consolidate multiline fields to one line record in specific order

I have an output file that I am trying to process into a formatted csv for our audit team.
I thought I had this mastered until I stumbled across bad data within the output. As such, I want to be able to handle this using awk.
MY OUTPUT FILE EXAMPLE
Enter password ==>
o=hoster
ou=people,o=hoster
ou=components,o=hoster
ou=websphere,ou=components,o=hoster
cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj
cn=john-blain,ou=appserver,ou=components,o=hoster
cn=john
uid=blainj
sn=blain
cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543
THE OUTPUT I WANT AFTER PROCESSING
joe,bloggs,s01234565;uid=bloggsj,cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=E09876543;cn=andy-peters,ou=appserver,ou=components,o=hoster
As you can see:
we always have a cn= variable that contains o=hoster
uid can have any value
we may have multiple cn= variables without o=hoster
I have achieved the following:
cat output | awk '!/^o.*/ && !/^Enter.*/{print}' | awk '{getline a; getline b; getline c; getline d; print $0,a,b,c,d}' | awk -v srch1="cn=" -v repl1="" -v srch2="sn=" -v repl2="" '{ sub(srch1,repl1,$2); sub(srch2,repl2,$3); print $4";"$2" "$3";"$1 }'
Any pointers or guidance using awk would be greatly appreciated. Or should I give up and just use the age-old, long-winded method: a large looping script to process the file?
You may try the following awk code:
awk '
function out(){
    print s, u, last
    i = 0; s = ""
}
/^cn/,!NF{
    ++i
    last = i == 1 ? $0 : last
    s = i>1 && !/uid/ && NF ? s ? s "," $NF : $NF : s
    u = /uid/ ? $0 : u
}
i && !NF{
    out()
}
END{
    out()
}
' FS="=" OFS=";" file
Resulting output:
joe,bloggs,S01234565;uid=bloggsj;cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=petersa;cn=andy-peters,ou=appserver,ou=components,o=hoster
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
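The sample input as pasted in the question appears to have lost the blank lines separating records; the /^cn/,!NF range in the answer above (and the RS= paragraph mode in another answer below) both rely on them. A self-contained check with the separators restored (file names are mine, and the input is abbreviated to two records):

```shell
# Check of the range-based awk answer above; ldap.out / consolidated.txt
# are illustrative names, and blank record separators are assumed.
cat > ldap.out <<'EOF'
Enter password ==>
o=hoster
ou=components,o=hoster

cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj

cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543
EOF

awk '
function out(){
    print s, u, last
    i = 0; s = ""
}
/^cn/,!NF{
    ++i
    last = i == 1 ? $0 : last          # first line of the record is the DN
    s = i>1 && !/uid/ && NF ? s ? s "," $NF : $NF : s
    u = /uid/ ? $0 : u
}
i && !NF{ out() }                      # blank line ends a record
END{ out() }                           # flush the last record at EOF
' FS="=" OFS=";" ldap.out > consolidated.txt
```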
This awk script works for your sample and produces the sample output:
BEGIN { delete cn[0]; OFS = ";" }
function print_info() {
    if (length(cn)) {
        names = cn[1] "," sn
        for (i=2; i <= length(cn); ++i) names = names "," cn[i]
        print names, uid, dn
        delete cn
    }
}
/^cn=/ {
    if ($0 ~ /o=hoster/) dn = $0
    else {
        cn[length(cn)+1] = substr($0, index($0, "=") + 1)
        uid = $0; sub("cn", "uid", uid)
    }
}
/^sn=/ { sn = substr($0, index($0, "=") + 1) }
/^uid=/ { uid = $0 }
/^$/ { print_info() }
END { print_info() }
This should help you get started.
awk '$1 ~ /^cn/ {
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^uid/) {
            u = $i
            continue
        }
        sub(/^[^=]*=/, x, $i)
        r = length(r) ? r OFS $i : $i
    }
    print r, u, $1
    r = u = x
}' OFS=, RS= infile
I assume that there is an error in your sample output: in the 3rd record the uid should be petersa and not E09876543.
You might want look at some of the "already been there and done that" solutions to accomplish the task.
Apache Directory Studio for example, will do the LDAP query and save the file in CSV or XLS format.
-jim