Regular Expression to extract multiple values from a delimited string [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I want to extract both the i-name and the IP address from the string below (where ; is the delimiter):
INPUT:
i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
I was able to retrieve only the IP address, using ([0-9]{1,3}[\.]){3}[0-9]{1,3}, but I need both strings on one line:
OUTPUT:
i-03ghijklmn345;192.186.40.255

No need for AWK. Use grep:
# Partial Bash script
I_NAME=$(grep -Po 'i-\w+' your_file)
IP_ADDR=$(grep -Po '\d{1,3}(?:\.\d{1,3}){3}' your_file)
echo "$I_NAME;$IP_ADDR"
The regexes are between the single quotes in the commands above. -P (Perl-compatible regular expressions) is a GNU grep option.

If you want an awk solution, and for a bit of diversity, you can use the following commands:
iName=$(awk 'BEGIN{RS=";"}/^i-\w+/{print $1; exit}' inputFile)
ipAddress=$(awk 'BEGIN{RS=";"}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit}' inputFile)
echo "$iName"
echo "$ipAddress"
output:
i-03ghijklmn345
192.186.40.255
Explanation:
BEGIN{RS=";"} defines ; as the record separator.
/^i-\w+/{print $1; exit} prints the i-name as soon as that record is reached and then exits, so the rest of the input string is not processed.
/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit} works the same way to extract the IP address.
Finally, the results are assigned to the two variables, which you can display or use however you want.
Replace inputFile with the name of your file. Note that \w is a GNU awk extension, so these commands assume gawk.
If you want both values on one line (for example, to capture them in one variable), use the following awk command:
$ awk 'BEGIN{RS=";"}/^i-\w+/{printf "%s", $1}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print ";" $1; exit}' inputFile
i-03ghijklmn345;192.186.40.255
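The commands above rely on \w, which GNU awk supports but POSIX awk does not. Below is a minimal sketch that uses only POSIX awk and does not depend on field positions. It assumes three things: the fields are separated by ;, the i-name starts with i-, and the IP may still carry the literal ** shown in the question. The name extract_name_ip is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<i-name>;<ip>" for each input line, wherever those fields sit.
# Assumptions: ';'-separated fields, i-name starts with "i-",
# and the IP may be wrapped in literal asterisks ("**").
extract_name_ip() {
    awk -F';' '{
        name = ""; ip = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/\*/, "", f)    # drop literal asterisks around the value
            if (name == "" && f ~ /^i-[A-Za-z0-9]+$/) name = f
            if (ip == "" && f ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ip = f
        }
        print name ";" ip
    }' "$@"
}

line='i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z'
printf '%s\n' "$line" | extract_name_ip    # i-03ghijklmn345;192.186.40.255
```

Because every field is checked, this still works if the IP moves to another column. The trade-off is that it only recognizes the dotted-quad shape, not valid octet ranges.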

Judging by your sample, the first field is some sort of ID, and an ID is unlikely to contain an asterisk (*). The IP address is also always enclosed in asterisks (**). In that case, the awk command below also works:
$ cat 48437686
i03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
$ awk -v RS=";" 'BEGIN{oldORS=ORS}NR==1 || /^\*\*.*\*\*$/{gsub(/\*/,"");ORS=NR==1?";":oldORS;print}' 48437686
i03ghijklmn345;192.186.40.255

With awk: set the input and output field separators to ; and print fields 1 and 17:
awk 'BEGIN{FS=OFS=";"} {print $1,$17}' file
Output:
i-03ghijklmn345;192.186.40.255

Related

AWK/SED Split a list if the row starts with OR ends with [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm trying to split my list: all lines that match a condition (starts with OR ends with) should go into one file, and all other lines into another.
I'm trying to use this awk command, but it doesn't seem to work:
awk '{print >out}; /^abc|abc.com$/{out="file2"}' out=file1 MyLargeList.lst
Any help would be appreciated...

The problem is that you assign out on the first match and then never change it again. In short, you seem to assume that out=file1 on the command line will be re-evaluated on each iteration of the script, but this is not true.
Also, you print before you reassign, so the first match goes to the wrong file.
awk '{ if (/^abc|abc.com$/) out="file2"
else out="file1"
print >out }' MyLargeList.lst
As already suggested (without any explanation) in a comment, this can be rewritten elegantly, if somewhat obscurely, with the ternary (conditional) operator:
awk '{ print > (/^abc|abc.com$/ ? "file2" : "file1") }' MyLargeList.lst
In brief, x ? y : z returns y if x is true, otherwise z.
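The ternary form can be wrapped in a small shell function. This is a sketch: split_list and its arguments are hypothetical names, and the dot in abc.com is escaped so that it only matches a literal dot.

```shell
#!/bin/sh
# Sketch: one awk pass; the ternary operator picks the output file per line.
split_list() {
    # $1 = input file, $2 = file for matching lines, $3 = file for the rest
    awk -v hit="$2" -v miss="$3" '
        { print > ((/^abc/ || /abc\.com$/) ? hit : miss) }
    ' "$1"
}

# Usage demo in a scratch directory
dir=$(mktemp -d)
printf '%s\n' abc test abc.com test end > "$dir/list"
split_list "$dir/list" "$dir/file2" "$dir/file1"
cat "$dir/file2"    # abc, then abc.com
cat "$dir/file1"    # test, test, then end
rm -r "$dir"
```

The parentheses around the whole ternary expression are needed so that awk treats it as the redirection target.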

This might work for you (GNU sed):
sed -ne '/^abc\|abc\.com$/w file2' -e '//!w file1' file
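-n turns off implicit printing.
If a line begins with abc or ends with abc.com, it is written to file2. Otherwise it is written to file1 (//! negates the empty regex, which reuses the last regex).
Or, to write matching lines to file2 and send all other lines to standard output (redirected to file1), use:
sed -e '/^abc\|abc\.com$/{w file2' -e 'd;}' file > file1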
Of course, grep would probably be quicker:
grep '^abc\|abc\.com$' file > file2
and
grep -v '^abc\|abc\.com$' file > file1

A simple solution for this task, in one awk pass, is:
awk '/^abc|abc\.com$/{print > "file1"; next} {print > "file2"}' file
In general, if you want to print lines to many files, based on many patterns matching, this scales to:
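awk '/pattern1/{print > f1; next} /pattern2/{print > f2; next} ... {print > f}' file
Here the last block is the default output for lines that match none of the patterns. Each next stops processing a line once it has been printed, so every line goes to exactly one file.
With many outputs, you may need to close the previous output before switching, to avoid the "too many open files" error. A closed file that is reopened with > is truncated, so append with >> instead (and remove old output files first). If several patterns match, the last one wins:
awk '{close(out); out=f} /pattern1/{out=f1} /pattern2/{out=f2} ... {print >> out}' file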
Testing
Here is an example file:
> cat file
abc
test
abc.com
test
test
abc
end
And the result:
> cat file1
abc
abc.com
abc
> cat file2
test
test
test
end
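Below is a sketch of the close()-based variant with a default file. It closes an output only when the destination changes, rather than on every line. The patterns ERROR and WARN, the name route_lines and the output file names are placeholders. It appends with >> because reopening a closed file with > would truncate it.

```shell
#!/bin/sh
# Sketch: route each line to one of several files by pattern.
# ERROR/WARN and the file names are placeholder choices.
route_lines() {
    # $1 = input file, $2 = output directory
    rm -f "$2/errors.txt" "$2/warnings.txt" "$2/other.txt"   # >> appends, so start clean
    awk -v dir="$2" '
        { dest = dir "/other.txt" }              # default for unmatched lines
        /ERROR/ { dest = dir "/errors.txt" }
        /WARN/  { dest = dir "/warnings.txt" }   # a later match overrides an earlier one
        {
            if (dest != out) {                   # destination changed:
                if (out != "") close(out)        # keep at most one file open
                out = dest
            }
            print >> out
        }
    ' "$1"
}

# Usage demo in a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'ERROR disk' ok 'WARN cpu' 'ERROR net' > "$dir/log"
route_lines "$dir/log" "$dir"
cat "$dir/errors.txt"    # ERROR disk, then ERROR net
rm -r "$dir"
```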

More elegant way to extract substring in shell [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I wrote a regex to get the chart name (auth-token-service), but it seems very crude. Can someone suggest a more precise way?
chartname=`echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | cut -d= -f1 | sed -e "s/^.*-//"`

Get the text between '=' and '/':
sed "s/.*=\(.*\)\/.*/\1/"    # -> xxx.azurecr.io
Get the text between '/' and ':':
sed "s/.*\/\(.*\):.*/\1/"    # -> auth-token-service
Get the text after ':':
sed "s/.*:\(.*\)/\1/"        # -> latest

I'm not familiar with the format of the token, but if I understood correctly, you just want the part after the slash and before the colon:
echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | sed -e 's/^.\+\/\([^\/]\+\):[^:]\+$/\1/'

Since you asked for a regex solution:
string=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
[[ $string =~ /([^:]*) ]] && chartname=${BASH_REMATCH[1]}
This assumes that the chart name is always between the / and the :. Note that chartname stays unassigned if the regex does not match.
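A plain POSIX sh does not provide [[ =~ ]] and BASH_REMATCH, but expr's : operator can do the same capture there. This is a sketch, assuming (as in this answer) that the chart name sits after the last / and before the next :.

```shell
#!/bin/sh
# Sketch: the "between / and :" capture without Bash's [[ =~ ]].
# POSIX expr ":" matches a basic regex anchored at the start of the
# string and prints the \( \) group.
string='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
if chartname=$(expr "$string" : '.*/\([^:]*\)'); then
    echo "$chartname"          # auth-token-service
else
    echo "no match" >&2        # expr exits non-zero when the capture is empty
fi
```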

The Unix shell has parameter expansion built in. You can't nest these, so it takes multiple steps, but you avoid the overhead of starting multiple external processes.
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
chartname=${var%%=*}
chartname=${chartname#*-}
The suffix operator ${var%pattern} returns the value of $var with any suffix matching pattern removed; the ${var#pattern} operator does the same for a prefix match. Doubling the operator changes it to trim the longest possible pattern match instead of the shortest. (These are shell glob patterns, not regular expressions, though.)
If you require a one-liner, you can refactor the cut into the sed script.
chartname=$(sed 's/[^-]*-\([^=]*\)=.*/\1/' <<< 'my-auth-token-service=xxx.azurecr.io/auth-token-service:latest')
Notice the modernized syntax $(cmd ...) over the obsolescent `cmd ...` and the Bash "here string" with <<< (not POSIX-compatible though).
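The same parameter-expansion technique can take the whole string apart without starting any external process. This is a sketch, assuming the layout <release>=<registry>/<chart>:<tag>, which is inferred from the single example string. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: split the string with POSIX parameter expansion only.
# Assumed layout: <release>=<registry>/<chart>:<tag>
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
release=${var%%=*}     # remove longest "=*" suffix  -> my-auth-token-service
image=${var#*=}        # remove shortest "*=" prefix -> xxx.azurecr.io/auth-token-service:latest
registry=${image%%/*}  # -> xxx.azurecr.io
chart=${image#*/}      # -> auth-token-service:latest
chart=${chart%:*}      # remove shortest ":*" suffix -> auth-token-service
tag=${image##*:}       # remove longest "*:" prefix  -> latest
printf '%s\n' "$release" "$registry" "$chart" "$tag"
```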

With awk (only tested on the GNU variant):
var=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
echo "$var" | awk -F'[=:/]' -v OFS='\n' '{print $1, $2, $3, $NF}'
Output
my-auth-token-service
xxx.azurecr.io
auth-token-service
latest

Regex group match using shell [duplicate]

This question already has answers here:
How do I use grep to extract a specific field value from lines
(2 answers)
Closed 3 years ago.
I am trying to match a pattern and set that as a variable.
I have a file with many "key=value" lines. I want to find the value for the key "fizz".
In the file I have this string
fizz="something_cool"
I try to parse it as:
cat file | grep fizz="(.*)"
I was thinking it would give me the group output, and then I would be able to use $1 to select it.
I also played with escaping characters, and with sed and awk, but I could not get it working.

You need to enable extended regex (-E) to use unescaped ( and ), and you need to quote the pattern properly:
grep -E 'fizz="(.*)"' file
Note that grep prints the whole matching line, not just the captured group. awk is probably a better choice here, since it can do both the search and the extraction in the same command.
You may just use:
awk -F= '$1 == "fizz" {gsub(/"/, "", $2); print $2}' file
something_cool
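Splitting on = with -F= truncates a value that itself contains =. Below is a sketch of a hypothetical get_value helper that instead takes everything after the first key=. It assumes the key starts at the beginning of the line.

```shell
#!/bin/sh
# Sketch: print the value for one key from lines of the form key="value".
# Keeps any "=" inside the value; assumes the key starts the line.
get_value() {
    # $1 = key, $2 = file
    awk -v key="$1" '
        index($0, key "=") == 1 {
            v = substr($0, length(key) + 2)   # everything after "key="
            gsub(/"/, "", v)                 # strip the double quotes
            print v
            exit
        }
    ' "$2"
}

f=$(mktemp)
printf '%s\n' 'buzz="x"' 'fizz="something_cool"' 'url="a=b"' > "$f"
get_value fizz "$f"    # something_cool
get_value url "$f"     # a=b
rm -f "$f"
```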

cURL and Bash, capture variable and value from string [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
Using bash, grep, split, awk, or sed, I would like to capture
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
from
Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/
'ASPSESSIONID' is always the same and is followed by 8 random characters (SUSTQBQS here).
Also, this variable may not always be in the second column or right after 'Set-Cookie: '.
Can anyone please help?

With awk:
awk -F"[ ;]" '{print $2}' FileName
Set the field separator to both space and ;, then print the 2nd field.

The basic regex structure is the same in various programs.
In words, it is the text between the colon and space (: ) and the semicolon (;), which in regex parlance is:
: ([^;]*);
And could be assigned to a var:
RE=': ([^;]*);'
Then, we could use it in
bash
while IFS= read -r l; do
[[ $l =~ $RE ]] && echo "${BASH_REMATCH[1]}";
done <file
gawk
gawk -v RE="$RE" '$0 ~ RE { print gensub(".*"RE".*","\\1",1); }' file
sed
sed -rn 's/^.*'"$RE"'.*$/\1/p' file # using -r avoids the several `\`

Try this sed command
sed 's/[^:]\+..\([^;]\+\).*/\1/' FileName
Explanation:
[^:]\+ -- match the characters up to the :
.. -- match the next two characters (the colon and the space)
\([^;]\+\) -- capture everything up to the ;
.* -- match all the remaining characters
\1 -- replace the whole line with the captured group
Output :
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
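The answers above depend on where the cookie appears, but the question says it may not come right after Set-Cookie:. The sketch below anchors on the cookie name instead. It assumes ASPSESSIONID is followed by exactly 8 letters or digits and that the value ends at ; or at the end of the line. aspsession is a hypothetical function name, and only POSIX BRE is used (\{8\} rather than GNU \+).

```shell
#!/bin/sh
# Sketch: anchor on the cookie name, not on its column.
# Assumes ASPSESSIONID + exactly 8 letters/digits, value ending at ';' or EOL.
aspsession() {
    sed -n 's/.*\(ASPSESSIONID[A-Za-z0-9]\{8\}=[^;]*\).*/\1/p' "$@"
}

printf '%s\n' 'Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/' | aspsession
# ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
```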

reorder great number of columns in a text file [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I need a method (awk/perl/sed/shell) to modify the contents of a file as below:
Before:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
After:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013
I need to move the last 2 columns after the 9th column, remove columns 11, 13 and 15, and insert NULL|NULL|NULL between C1 and the first 2013. Any tips are appreciated. cut cannot change the order of the fields, so I will need to go another way. The input file has 10 million such rows, and I'm looking for the best way to do this.

Ugly question calls for ugly solution:
awk -F"|" '{
for(i=1;i<=9;i++) { printf "%s|" ,$i }
printf "%s|%s|",$(NF-1),$NF
for(i=10;i<16;i+=2) { printf "%s|" ,$i }
printf "%s|%s|%s|","NULL","NULL","NULL"
for(i=16;i<(NF-2);i++) { printf "%s|" ,$i }
print $(NF-2)
}' inputFile
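With 10 million rows, it can help to keep the whole layout in one place. The sketch below reads the output order from a single string and uses only POSIX awk. order and reorder are hypothetical names. In the string, a number selects that input field and N emits NULL.

```shell
#!/bin/sh
# Sketch: drive the reordering from one list of field numbers.
# In "order", a number selects that input field; N emits NULL.
reorder() {
    awk -F'|' -v order='1 2 3 4 5 6 7 8 9 18 19 10 12 14 N N N 16 17' '
        BEGIN { n = split(order, idx, " ") }
        {
            line = ""
            for (i = 1; i <= n; i++) {
                val = (idx[i] == "N") ? "NULL" : $(idx[i])
                line = (i == 1) ? val : line "|" val
            }
            print line
        }
    ' "$@"
}

echo '123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y' | reorder
```

Changing the layout then means editing only the order string, not the loop.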

Code for GNU sed:
sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
$ cat file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
$ sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013

You can use awk for this:
awk 'BEGIN{FS=OFS="|"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$18,$19,$10,$12,$14,"NULL","NULL","NULL",$16,$17}' file
$1 is the first field, $2 the second, etc.

I don't want to count your columns, but you can get the idea from the following Perl script:
perl -F'/\|/' -lanE 'say join("|", $F[2], "NULL", "NULL", $F[0], $F[3], $F[1])'
for the input
123456|ABCDEF|123|011
produces
123|NULL|NULL|123456|011|ABCDEF
Autosplit mode (-a) splits each line on the | character into @F, so you can reorder the fields however you need. join glues them back together with |.
For fun, pure Bash (and slow :)):
while IFS='|' read -r a b c d
do
echo "$a|NULL|$d|$c|NULL|$b"
done << EOF
123456|ABCDEF|123|011
EOF
prints
123456|NULL|011|123|NULL|ABCDEF
