awk/Perl, select different fields and combine them [closed]

I have stream output in the format below, which I'd like to filter so that only specific fields are printed:
customer $2 $3
Address $2 $3 $4 $5 $6
For example, print field #2 from line 1 and field #6 from line 2, then print them together separated by a space.
Can someone share how this can be done in Perl, awk, sed, etc.?

In awk you can hold data in variables (and use the line number, NR, in patterns). For example, for your sample requirement
print field #2 from line 1 and field #6 from line 2, then print them together separated by a space
the command would be:
awk 'NR==1 {x=$2} NR==2 {print x " " $6}'
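For example, with made-up field values that follow the two-line layout in the question:
$ printf 'customer ACME 42\nAddress 12 Main St Springfield IL\n' | awk 'NR==1 {x=$2} NR==2 {print x " " $6}'
ACME IL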

awk 'NR==1{s=$2;next} {print s ORS $6 ORS s, $6}' file

This might work for you (GNU sed):
sed -rn '/^customer/{N;s/^((\S+)\s*){2}.*\n((\S+)\s*){6}.*/\2 \4/p}' file
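For example, with the same made-up two-line input (GNU sed, since \s and \S are extensions):
$ printf 'customer ACME 42\nAddress 12 Main St Springfield IL\n' | sed -rn '/^customer/{N;s/^((\S+)\s*){2}.*\n((\S+)\s*){6}.*/\2 \4/p}'
ACME IL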

Related

AWK/SED Split a list if the row starts with OR ends with [closed]

I'm trying to split my list by redirecting all the lines that match a condition (the row starts with OR ends with a given string) into one file and the rest into another one.
I'm trying to use this awk command, but it doesn't seem to work:
awk '{print >out}; /^abc|abc.com$/{out="file2"}' out=file1 MyLargeList.lst
Any help would be appreciated...
The problem is that you assign out on the first match and then never change it again. In short, you seem to assume that out=file1 on the command line will be re-evaluated on each iteration of the script, but this is not true.
Also, you print before you reassign, so the first match goes to the wrong file.
awk '{ if (/^abc|abc.com$/) out="file2"
       else out="file1"
       print > out }' MyLargeList.lst
As already suggested (without any explanation) in a comment, this can be rearticulated, elegantly but somewhat obscurely, to use the ternary conditional operator.
awk '{ print > (/^abc|abc.com$/ ? "file2" : "file1") }' MyLargeList.lst
In brief, x ? y : z returns y if x is true, otherwise z.
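A throwaway example (not taken from the question) to see the operator in action:
$ awk 'BEGIN{x=0; print (x ? "match" : "no match")}'
no match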
This might work for you (GNU sed):
sed -ne '/^abc\|abc\.com$/w file2' -e '//!w file1' file
Turn off implicit printing with -n.
If a line begins abc or ends abc.com, write to file2, otherwise write to file1.
Or, if you only want to apply the first condition (matching lines are written to file2, while every line, matches included, still goes to file1 via stdout), use:
sed '/^abc\|abc\.com$/w file2' file > file1
Of course, grep would probably be quicker:
grep '^abc\|abc\.com$' file > file2
and
grep -v '^abc\|abc\.com$' file > file1
A simple solution for this task, in one awk pass, is:
awk '/^abc|abc\.com$/{print > "file1"; next} {print > "file2"}' file
In general, if you want to print lines to many files based on which pattern matches, this scales to:
awk '/pattern1/{out=f1} /pattern2/{out=f2} ... {print > out}' file
where you will probably want a default output (for lines matching no pattern), assigned before the pattern rules so that any match can override it:
awk '{out=f} /pattern1/{out=f1} ... /patternN/{out=fN} {print > out}' file
and, with many output files, to avoid the too many open files error you may need a close statement at the beginning (switching to >>, since a file that is closed and then reopened with > would be truncated):
awk '{close(out)} {out=f} /pattern1/{out=f1} ... {print >> out}' file
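A concrete sketch of that template, with hypothetical patterns and output file names (not taken from the question):
awk '{out="file_other"}             # default destination
     /^abc/   {out="file_abc"}      # lines starting with abc
     /\.com$/ {out="file_com"}      # lines ending in .com override the default
     {print > out}' MyLargeList.lst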
Testing
Here is an example file:
> cat file
abc
test
abc.com
test
test
abc
end
And the result:
> cat file1
abc
abc.com
abc
> cat file2
test
test
test
end

Separate multiple subdomains into all possible subdomain combinations using bash and awk [closed]

I'm trying to separate multiple subdomains into all possible subdomain combinations using bash.
For example if subdomains.txt has:
www.ir.example.com
www.it.api4.qa.example.com
www.api.example2.com
The expected output has to be:
example.com
ir.example.com
www.ir.example.com
qa.example.com
api4.qa.example.com
it.api4.qa.example.com
example2.com
api.example2.com
www.api.example2.com
I think the best idea is to use the . to separate the subdomains without breaking the base domain, but I'm not sure how to achieve this. Any help would be great.
Using awk:
awk 'BEGIN{FS=OFS="."}            # Set the input and output field separator to a dot
{
    for(i=1;i<NF;i++) {           # Number of domains to print
        for(j=i;j<NF;j++)         # For each domain element
            d=d $j OFS;           # d is the domain
        a[d $NF]                  # store it in the array a
        d=""                      # Reset the domain
    }
}
END{
    for(i in a)                   # Loop through each element of the array a
        print i                   # and print it
}' file
Note that the array a is used to keep the domain names unique (so example.com is not printed twice).
Note also that the domains are not sorted; you may pipe the command through sort if needed.
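For example, condensed to one line and piped through sort (a hypothetical invocation on the sample file):
awk 'BEGIN{FS=OFS="."}{for(i=1;i<NF;i++){for(j=i;j<NF;j++)d=d $j OFS;a[d $NF];d=""}}END{for(i in a)print i}' subdomains.txt | sort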
Perl comes with just about any Linux distro as far as I know (and some UNIXes), so here is an alternative in Perl:
perl -e 'while(<>){while(s/^([^.]+\.)(.+)/$2/){$x{$1.$2}=1}}print "$_\n" foreach(keys %x)' subdomains.txt
The code, 'unfolded':
while(<>){                              # read file line by line. Store line at $_
    # Match first subdomain to group $1 and the rest to group $2
    # replace by $2, so we will remove the first subdomain part
    while(s/^([^.]+\.)(.+)/$2/){
        # Store it on a hash (that will avoid printing duplicates)
        $x{$1.$2}=1
    }
}
# print the keys of the hash
print "$_\n" foreach(keys %x)
You can try this awk:
awk -F'.' '{b=$NF;for(i=NF-1;i>0;i--){b=$i FS b;print b}}' infile
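For the sample subdomains.txt this prints the chains line by line (duplicates such as example.com are not removed, so pipe through sort -u if you need unique entries):
$ awk -F'.' '{b=$NF;for(i=NF-1;i>0;i--){b=$i FS b;print b}}' subdomains.txt
example.com
ir.example.com
www.ir.example.com
example.com
qa.example.com
api4.qa.example.com
it.api4.qa.example.com
www.it.api4.qa.example.com
example2.com
api.example2.com
www.api.example2.com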
Here is a solution using GNU sed:
sed -nr 's/\./#/g;:a;/#/!{p;bb};s/#([^#]+)$/.\1/;h;s/.*#//p;g;ta;:b' subdomains.txt

Regular Expression to extract multiple values from a delimited string [closed]

I want to extract both the i-name and the IP address from the string below (where ; is the delimiter).
INPUT:
i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
I was able to retrieve only the IP address, using ([0-9]{1,3}[\.]){3}[0-9]{1,3}, but I need both strings on one line.
OUTPUT:
i-03ghijklmn345;192.186.40.255
No need for AWK. Use grep:
# Partial Bash script
I_NAME=$(grep -Po 'i-\w+' your_file)
IP_ADDR=$(grep -Po '\d{1,3}(?:\.\d{1,3}){3}' your_file)
The RegEx is between the single quotes in the commands above.
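To get the single semicolon-separated line asked for, the two variables can then be combined, for example:
printf '%s;%s\n' "$I_NAME" "$IP_ADDR"
# i-03ghijklmn345;192.186.40.255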
If you want an awk solution, and for a bit of diversity, you can use the following commands:
iName=$(awk 'BEGIN{RS=";"}/^i-\w+/{print $1; exit}' inputFile)
ipAddress=$(awk 'BEGIN{RS=";"}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit}' inputFile)
echo $iName
echo $ipAddress
output:
i-03ghijklmn345
192.186.40.255
explanations:
BEGIN{RS=";"} defines ; as the record separator
/^i-\w+/{print $1; exit} prints the i-name when it is reached; the process then stops and does not continue analyzing the input string
/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit} works the same way to extract the IP address
Finally, you assign the results to the two variables and display their contents, or do whatever you want with them.
Change inputFile to whatever fits your needs.
If you want to put it all in one variable, use the following awk command:
$ awk 'BEGIN{RS=";"}/^i-\w+/{printf $1;}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print ";"$1;exit}' inputFile;
i-03ghijklmn345;192.186.40.255
TESTED:
Considering your pattern, the first field is some sort of an ID, and an ID should not contain an asterisk (*). Also, the IP address is always enclosed between asterisks (**). In that case, the awk below would also help.
$ cat 48437686
i03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
$ awk -v RS=";" 'BEGIN{oldORS=ORS}NR==1 || /^\*\*.*\*\*$/{gsub(/\*/,"");ORS=NR==1?";":oldORS;print}' 48437686
i03ghijklmn345;192.186.40.255
With awk: set the input and output field separator to ; and print columns 1 and 17:
awk 'BEGIN{FS=OFS=";"} {print $1,$17}' file
Output:
i-03ghijklmn345;192.186.40.255

cURL and Bash, capture variable and value from string [closed]

Using bash, grep, split, awk, or sed, I would like to capture
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
from
Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/
'ASPSESSIONID' always remains the same, followed by 8 random characters (SUSTQBQS).
Also, this variable may not always be located in the second column or right after 'Set-Cookie: '.
Can anyone please help?
With awk:
awk -F"[ ;]" '{print $2}' FileName
Set the field separator to space and ;, then print the 2nd field.
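For example, feeding in the sample header line:
$ echo 'Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/' | awk -F"[ ;]" '{print $2}'
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ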
The basic regex structure is the same in various programs.
It may be explained in words as: the text between the colon-space (": ") and the semicolon (;), which in regex parlance is:
: ([^;]*);
And could be assigned to a var:
RE=': ([^;]*);'
Then, we could use it in
bash
while read l; do
    [[ $l =~ $RE ]] && echo "${BASH_REMATCH[1]}";
done <file
gawk
gawk -v RE="$RE" '$0 ~ RE { print gensub(".*"RE".*","\\1",1); }' file
sed
sed -rn 's/^.*'"$RE"'.*$/\1/p' file # using -r avoids the several `\`
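For example, with the sample header piped straight into the sed variant (a hypothetical run):
$ RE=': ([^;]*);'
$ echo 'Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/' | sed -rn 's/^.*'"$RE"'.*$/\1/p'
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ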
Try this sed command:
sed 's/[^:]\+..\([^;]\+\).*/\1/' FileName
Explanation:
[^:]\+ -- Remove the characters until the :
.. -- Remove the next two characters (the colon and the space)
\([^;]\+\) -- Capture the group until a ; is found
.* -- Remove all the characters after the captured group
\1 -- Finally, print the captured group
Output:
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ

Reorder a great number of columns in a text file [closed]

I need a method (awk/perl/sed/shell) to modify the contents of a file as below:
Before:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
After:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013
I need to move the last 2 columns after the 9th column, remove columns 11, 13, 15, and also insert NULL|NULL|NULL between the 14th and 15th columns of the result (between C1 and 2013). Any tips appreciated. The cut command cannot change the order of the fields, so I will need to go another way. The input file has 10 million such rows, and I'm looking for the best way to do this.
Ugly question calls for ugly solution:
awk -F"|" '{
    for(i=1;i<=9;i++) { printf "%s|", $i }
    printf "%s|%s|", $(NF-1), $NF
    for(i=10;i<16;i+=2) { printf "%s|", $i }
    printf "%s|%s|%s|", "NULL", "NULL", "NULL"
    for(i=16;i<(NF-2);i++) { printf "%s|", $i }
    print $(NF-2)
}' inputFile
Code for GNU sed:
sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
$ cat file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
$ sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013
You can use awk for this:
awk 'BEGIN{FS=OFS="|"} {print $1,$2,...,"9999|Y",...,"NULL|NULL|NULL",...}' file
$1 is the first field, $2 the second, etc.
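Spelled out for the exact mapping worked out from the before/after lines in the question (assuming that 19-column layout is fixed), that would be:
awk 'BEGIN{FS=OFS="|"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$18,$19,$10,$12,$14,"NULL","NULL","NULL",$16,$17}' inputFile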
I don't want to count your columns, but you can get the idea from the next Perl script:
perl -F'/\|/' -lanE 'say join("|", $F[2], "NULL", "NULL", $F[0], $F[3], $F[1])'
for the input
123456|ABCDEF|123|011
produces
123|NULL|NULL|123456|011|ABCDEF
The autosplit mode splits each line on the | character, and you can reorder the fields however you need. The join puts the fields back together with |.
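Applied to the actual 19-column line from the question, the same idea might look like this (a sketch using an array slice; the index mapping is taken from the desired output above):
perl -F'/\|/' -lanE 'say join("|", @F[0..8,17,18,9,11,13], "NULL","NULL","NULL", @F[15,16])' inputFile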
For fun - pure bash - and slow :)
while IFS='|' read -r a b c d
do
    echo "$a|NULL|$d|$c|NULL|$b"
done << EOF
123456|ABCDEF|123|011
EOF
prints
123456|NULL|011|123|NULL|ABCDEF