Regular expression that uses the previous pattern with awk

This is the content of my log file:
INFO consume_end_processor: user:bbbb callee_num:+23455539764806 sid:I374uribbbbb151101030212130 duration:0 result:ok provider:sipouthh.ym.ms
INFO consume_processor: user:bbbb callee_num:+23455539764806 sid:<<"A28udestaniephillips52x151031185754827">> duration:0 result:ok provider:sipouthh.ym.ms
I need to extract the value of the sid field, whether it appears as:
sid:<<"A28udestaniephillips52x151031185754827">>
or as:
sid:A28udestaniephillips52x151031185754827
so that in both cases I get just A28udestaniephillips52x151031185754827
My attempt is awk '/(?<=sid)^[A-Z]+\/{print $8 }', but this is wrong and I am not sure how to fix it.
How can I write a regular expression in awk that extracts just this part of the line?
Thank you for any help.

$ awk '{ sub(/^sid:(<<")?/,"",$5); sub(/">>$/, "", $5); print $5}' log.txt
I374uribbbbb151101030212130
A28udestaniephillips52x151031185754827
Here we are simply using sub to remove (by replacing with an empty string) the parts of the 5th field that we don't want.
The first sub removes the leading sid:, that may optionally be followed by <<".
The second sub removes a trailing ">>. Note that if there is no trailing ">>, then the sub does nothing and is harmless.
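The command above relies on sid: always being the 5th field. If that position cannot be guaranteed, a minimal sketch along the same lines could search for the field first and then apply the same two sub calls. The function name extract_sid is made up for illustration, and the log format is assumed to be the one shown in the question.

```shell
# Hypothetical helper: read log lines on stdin and print the bare sid value.
# Assumes whitespace-separated fields, one of which starts with "sid:".
extract_sid() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^sid:/) {
                sub(/^sid:(<<")?/, "", $i)   # drop the leading sid: and optional <<"
                sub(/">>$/, "", $i)          # drop the trailing ">> if present
                print $i
                next                         # one sid per line is enough
            }
        }
    }'
}
```

Usage would be extract_sid < log.txt; lines without a sid: field produce no output.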

$ awk '{gsub(/sid:(<<")?|">>/,"",$5); print $5}' file
I374uribbbbb151101030212130
A28udestaniephillips52x151031185754827

Related

Concatenate urls based on result of two columns

I would like to first extract the string inside the parentheses in the first column, which I can do with:
awk -F"[()]" '{print $2}'
Then, concatenate it with the second column to create a URL with the following format:
"https://ftp.drupal.org/files/projects/"[firstcolumn stripped out of parenthesis]-[secondcolumn].tar.gz
With input like:
Admin Toolbar (admin_toolbar) 8.x-2.5
Entity Embed (entity_embed) 8.x-1.2
Views Reference Field (viewsreference) 8.x-2.0-beta2
Webform (webform) 8.x-5.28
Data from the first line would create this URL:
https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz
Something like
sed 's!^[^(]*(\([^)]*\))[[:space:]]*\(.*\)!https://ftp.drupal.org/files/projects/\1-\2.tar.gz!' input.txt
If a file named a holds your input, you can try this:
$ awk -F'[()]' '
{
split($3,parts," *")
printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", $2, parts[2]
}' a
https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz
https://ftp.drupal.org/files/projects/entity_embed-8.x-1.2.tar.gz
https://ftp.drupal.org/files/projects/viewsreference-8.x-2.0-beta2.tar.gz
https://ftp.drupal.org/files/projects/webform-8.x-5.28.tar.gz
The trick is to split the third field ($3). Based on your field separator ( -F'[()]'), the third field contains everything after the right paren. So, split can be used to get rid of all the spaces. I probably should have searched for an awk "trim" equivalent.
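As a sketch of the "trim" idea mentioned above, the blanks around the third field can be removed with gsub rather than split. The function name build_urls is hypothetical, and the input is assumed to follow the "Name (machine_name) version" layout from the question.

```shell
# Hypothetical helper: read "Name (machine_name) version" lines on stdin
# and print one download URL per line.
build_urls() {
    awk -F'[()]' '{
        version = $3
        # Trim leading and trailing blanks instead of splitting on them.
        gsub(/^[ \t]+|[ \t]+$/, "", version)
        printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", $2, version
    }'
}
```

Trimming a copy of $3 avoids relying on how a given awk treats a separator pattern such as " *" that can match the empty string.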
In the example data, the second-to-last column holds the parenthesized name you are interested in, and the last column holds the version.
If that is always the case, you can remove the parentheses from the second-to-last column and join it to the last column with a hyphen.
awk '{
gsub(/[()]/, "", $(NF-1))
printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz%s", $(NF-1), $NF, ORS
}' file
Output
https://ftp.drupal.org/files/projects/admin_toolbar-8.x-2.5.tar.gz
https://ftp.drupal.org/files/projects/entity_embed-8.x-1.2.tar.gz
https://ftp.drupal.org/files/projects/viewsreference-8.x-2.0-beta2.tar.gz
https://ftp.drupal.org/files/projects/webform-8.x-5.28.tar.gz
Another option uses a regex with GNU awk: match with two capture groups captures what is between the parentheses and the field that follows.
awk 'match($0, /^[^()]*\(([^()]+)\)\s+(\S+)/, ary) {
printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz%s", ary[1], ary[2], ORS
}' file
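The three-argument form of match is a GNU awk extension. Where only a POSIX awk is available, a rough equivalent can be sketched with the two-argument match and the RSTART and RLENGTH variables it sets. The function name build_urls_posix is an assumption made for this example.

```shell
# Hypothetical helper: read "Name (machine_name) version" lines on stdin
# and print one download URL per line, using only POSIX awk features.
build_urls_posix() {
    awk 'match($0, /\([^()]+\)/) {
        # RSTART and RLENGTH delimit "(machine_name)"; strip both parentheses.
        name = substr($0, RSTART + 1, RLENGTH - 2)
        # The text after ")" holds the version; split() with the default
        # separator discards the leading whitespace.
        n = split(substr($0, RSTART + RLENGTH), rest)
        if (n > 0)
            printf "https://ftp.drupal.org/files/projects/%s-%s.tar.gz\n", name, rest[1]
    }'
}
```

Lines without a parenthesized name, or with nothing after it, are skipped silently in this sketch.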
This might work for you (GNU sed):
sed 's#.*(#https://ftp.drupal.org/files/projects/#;s/)\s*/-/;s/\s*$/.tar.gz/' file
This pattern-matches the unwanted parts and replaces them with the required strings.
N.B. The # is used as the delimiter of the substitution command to avoid having to backslash-escape the slashes in the literal replacement.
The above solution could be condensed into a single substitution:
sed -E 's#.*\((.*)\)\s*(\S*).*#https://ftp.drupal.org/files/projects/\1-\2.tar.gz#' file

awk Regular Expression (REGEX) get phone number from file

The following is what I have written that would allow me to display only the phone numbers
in the file. I have posted the sample data below as well.
As I understand it (reading from left to right):
Using awk with "," as the delimiter: if the line starts with an integer, followed by an integer preceded by one of [-,:*], optionally followed by further integers preceded by one of [-,:*], show the 3rd column.
I used www.regexpal.com to validate my expression. I want to learn more, so an explanation would be great, not just the answer.
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
awk -F "," '/^(\d)+([-,:*]\d+)+([-,:*]\d+)*$/ {print $3}' bashuser.csv
bashuser.csv
Jordon,New York,630-150,7234
Jaremy,New York,630-250-7768
Jordon,New York,630*150*7745
Jaremy,New York,630-150-7432
Jordon,New York,630-230,7790
Expected Output:
6301507234
6302507768
....
You could just remove every non-digit character:
awk '{gsub(/[^[:digit:]]/, "")}1' file.csv
gsub replaces every match in the line.
[^[:digit:]] matches any character that is not a digit: the ^ at the start of the bracket expression negates the class [[:digit:]]. Without the ^, the digits would be removed instead.
"" is the replacement; an empty string means the matches are deleted.
1 is a pattern that is always true, so the default action (print) runs for every line.
In sed
sed 's/[^[:digit:]]*//g' file.csv
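Both commands above strip non-digits from the whole line, which works for the sample data because the name and city columns contain no digits. A sketch that limits the cleanup to the phone part, assuming the number always starts at the 3rd comma-separated field, might look like this. The function name phone_digits is hypothetical.

```shell
# Hypothetical helper: read CSV lines on stdin and print only the digits
# of the phone number, assumed to start at the 3rd comma-separated field.
phone_digits() {
    awk -F, '{
        phone = $3
        # A comma inside the number splits it into extra fields; rejoin them.
        for (i = 4; i <= NF; i++)
            phone = phone $i
        gsub(/[^0-9]/, "", phone)   # delete every non-digit character
        print phone
    }'
}
```

Run as phone_digits < bashuser.csv; digits in the first two columns would then be left out of the result.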
Since your desired output always appears to start at field 3, you can simplify your regex considerably using the following:
awk -F '[*,-]' '{print $3$4$5}'
Proof of concept
$ awk -F '[*,-]' '{print $3$4$5}' < ./bashuser.csv
6301507234
6302507768
6301507745
6301507432
6302307790
Explanation
-F '[*,-]': Use a character class to set the field separators to * OR , OR -.
print $3$4$5: Concatenate the 3rd through 5th fields.
awk is not a great fit here, because the comma is not only the field separator but also appears inside the phone number. sed gives better results:
sed 's/[^,]\+,[^,]\+,//;s/[^0-9]//g;' bashuser.csv
The first part, s/[^,]\+,[^,]\+,//, removes the first two fields.
The second part, s/[^0-9]//g, removes all remaining non-numeric characters.

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5", I want to transform it to string "A-1,A-2,A-3,A-4,A-5" - which means I want to add prefix of first integer i.e. "A-" to each number of the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to @potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
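The substr($0,1,2) call assumes the prefix is exactly two characters long. As a sketch for prefixes of other lengths, the prefix can be taken as everything before the first digit. The function name add_prefix is made up, and the prefix is assumed to contain no digits, & or backslash characters (the latter two are special in a gsub replacement).

```shell
# Hypothetical helper: read lines like "A-1,2,3" on stdin and repeat the
# leading non-digit prefix after every comma.
add_prefix() {
    awk '{
        prefix = $0
        sub(/[0-9].*$/, "", prefix)   # keep only the text before the first digit
        gsub(/,/, "&" prefix)         # & stands for the matched comma
        print
    }'
}
```

For example, echo 'AB-1,2' | add_prefix would give AB-1,AB-2.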
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting (csh syntax is shown here), you can do so with two sed calls:
set string = "A-1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A-1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
You could try the following, if awk is acceptable (note that it hard-codes the A- prefix):
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i !~ /^A/&&$i !~ /\"A/){
$i="A-"$i
}
}
}
1' Input_file
If your data is in a file named d, this works with GNU sed:
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

Gawk regexp to select sequence

Sorry for the nth simple question on regexps, but I cannot get what I need without what seems to me an overly complicated solution. I'm parsing a file containing sequences of only 3 letters (A, E, D), as in
AADDEEDDA
EEEEEEEE
AEEEDEEA
AEEEDDAAA
and I'd like to identify only those that start with E and end in D, with only one change in the sequence, as for example in
EDDDDDDDD
EEEDDDDDD
EEEEEEEED
I'm struggling to find the proper regexp to do that. Here is my last attempt:
echo "1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED" | gawk -F, '{if($2 ~ /^E[(ED){1,1}]*D$/ && $2 !~ /^E[(ED){2,}]*D$/) print $0}'
which does not work. Any help?
Thanks in advance.
If I understand your request correctly, a simple
awk '/^E+D+$/' file.input
will do the trick.
UPDATE: if the line format contains leading and trailing numbers (the trailing one optional), as shown in the question's example, this is a possible pure-regex adaptation (an alternative to using the field separator switch -F,):
awk '/^[0-9]+,E+D+(,[0-9]+)?$/' input.test
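For comparison, here is a sketch of the field-based alternative mentioned above: with -F, the sequence is field 2, so the regex only has to describe the E-then-D run and not the surrounding numbers. The function name select_ed is an assumption for illustration.

```shell
# Hypothetical helper: read "id,SEQUENCE[,id]" lines on stdin and print
# those whose sequence is one or more E followed by one or more D.
select_ed() {
    awk -F, '$2 ~ /^E+D+$/'
}
```

Because no action is given, awk prints each matching line unchanged.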
First of all, you need the regular expression:
^E+[^ED]*D+$
This matches one or more Es at the beginning, zero or more characters that are neither E nor D in the middle, and one or more Ds at the end.
Then your AWK program will look like
$2 ~ /^E+[^ED]*D+$/
$2 refers to the 2nd field of the current record, ~ is the regex matching operator, and /s delimit a regular expression. Together, these components form what is known in AWK jargon as a "pattern", which amounts to a boolean filter for input records. Note that there is no "action" (a series of statements in {s) specified here. That's because when no action is specified, AWK assumes that the action should be { print $0 }, which prints the entire line.
If I understand you correctly, you want to match patterns that start with at least one E and then continue with at least one D until the end.
printf '1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED\n' | gawk -F, '{if($2 ~ /^E+D+$/) print $0}'

AWK to match strings beginning with a number

I want to print all the lines of a file where the first element of each line begins with a number using awk. Below are the details on the data contained in the file and command used:
filename contents:
12.44.4444goad ABCDEF/END
LMNOP/START joker
98.0 kites
command used:
awk '{ $1 ~ /^\d[a-zA-Z0-9]*/ }' filename
After running the above command, no results are displayed on the prompt.
Please let me know if there is any correction that needs to be made to the above command.
To print the lines starting with a digit, you can try the following:
awk '/^[[:digit:]]+/' file
As pointed out by @HenkLangeveld, your syntax is incorrect. Also, the regex \d is not available in awk.
If you only need to match at least one digit at the start of the line, all you need is ^ to match the start of a line and [0-9] to match a digit.
You can use curly brackets with an if statement:
awk '{if($1 ~ /^[0-9]/) print $0}' filename
But that would just be longhand for this:
awk '$1 ~ /^[0-9]/' filename
From your attempted solution, it looks like you want:
awk 'NF>1 && $1 ~ /^[0-9.]*$/' filename
You need to explicitly match the . if you want to include the decimal point, and you need the $ anchor to make the * meaningful. This will miss lines in which the first column looks like 5e39 or -2.3. You can try to catch those cases with:
awk 'NF>1 && $1 ~ /^-?[0-9.]*(e[0-9]+)?$/' filename
but at this point I would tell you to use perl and stop trying to be more robust with awk.
Perhaps (this will print blank lines...not sure which behavior you want):
perl -lane 'use POSIX qw(strtod); my ($num, $end) = strtod($F[0]);
print unless $end;' filename
This uses strtod to parse the number and tells you the number of characters at the end of the string that are not part of it.
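If staying in awk is preferred, a somewhat stricter pattern for the first field can be sketched as follows. It is only an approximation of what strtod accepts (no hexadecimal, inf or nan, and no leading +), and the function name numeric_first_field is hypothetical.

```shell
# Hypothetical helper: print lines whose first field looks like a decimal
# number: optional minus sign, digits with an optional fraction (or a
# fraction alone), and an optional exponent.
numeric_first_field() {
    awk '$1 ~ /^-?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/'
}
```

Unlike /^[0-9.]*$/, this requires at least one digit, so a lone "." or an empty first field does not match.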
Drop the braces and the \d, like this:
awk ' $1 ~ /^[0-9]/ ' filename
Awk programs come in chunks. A chunk is a pattern-block pair, where the block defaults to { print }. (An empty pattern defaults to true.)
The /\d/ is a Perl-ism and might work in some versions of awk, though not in those that I tried (GNU and AST). You need either the traditional /^[0-9]/ or the POSIX /^[[:digit:]]/ notation.
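The difference between a pattern and a block can be sketched with two tiny wrappers around the same expression. The function names in_action and as_pattern are made up for this illustration, and the sample lines are the ones from the question.

```shell
# Expression inside braces: it is evaluated for every line, but the result
# is discarded and nothing is printed.
in_action() {
    awk '{ $1 ~ /^[0-9]/ }'
}

# Same expression used as a pattern: the default block { print } runs
# for every line where it is true.
as_pattern() {
    awk '$1 ~ /^[0-9]/'
}
```

Feeding the sample file to in_action yields no output, which is what the question observed, while as_pattern prints the lines whose first field starts with a digit.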