Print matched pattern with AWK - regex

For example i have this data:
/home/test/dat1.txt
/home/test/dat2.txt
/home/test/test1/dat3.txt
/home/test/test2/dat4.txt
/home/test/test3/test4/dat5.txt
I need to print only the name and extension, that output should be:
dat1.txt
dat2.txt
dat3.txt
dat4.txt
dat5.txt
I need to use the awk command... anyone can help?
I use this regular expression: '/\/*\.txt/{print ???}

If you are going to use awk, you do not need a regex for this purpose.
You can just tell awk to print the last field, using a field separator of /.
awk -F'/' '{print $NF}' Input.txt
As hd1's comment already noted, NF is the number of fields on the current input record (in this case line). Since awk starts indexing fields at $1, $NF gives you the last field.

You could use this short awk
awk -F/ '$0=$NF' Input.txt
If you need empty line use
awk -F/ '{$0=$NF}1' Input.txt

Related

sed regex cut string after match

I tested a regex on http://regexr.com/ and it works like expected.
How can I run this by using sed?
/^.*?OU=([^,]*)/g
The test string looks like:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local
And the output is:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
So it should cut the string before the second OU= starts.
Thanks
sed is not the best tool for this case when you have to deal with text that contains "columns" and can be split. Here are two possibilities, one with sed and the other with awk:
s="mario.test;Mario Test;Mario;Test;123;+001122334455,CN=Mario Test,OU=AT-Linz,OU=Tese Sites,DC=Test,DC=local;test.local"
echo $s | sed 's/OU=/й/' | sed 's/\([^й]*\)й\([^,]*\).*/\1OU=\2/'
echo $s | awk -F",OU=" '{print $1 ",OU=" $2}'
See the online demo
The awk solution splits with ,OU= substring and then joins the first and second column with the separator (since it is hardcoded, it is easy to put it back).
sed uses 2 passes: 1) add a non-used char (must be a control char, here, a Cyrillic letter is used for better "visibility") to mark the border of our match, 2) match all we do not need and match and capture what we need to keep with the help of capturing groups and backreferences.
Your question isn't clear but from reading your comments, are either of these what you're looking for?
$ awk -F, '{print $1 FS $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
$ awk -F'CN=[^,]+,OU=|,' '{print $1 $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;AT-Test

Extract substring using regex shell

I have a string that contains multiple ocurrences in the way:
element 1 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
element 2 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
I want to extract using shell all the fields1, of the tag1 of all the elements
my try:
sed -n "s/.*\"tag1\":{\"fiel1\":\"\(.*\),\"fiel2\".*/\1/gp"
I am obtaining just the final one not all of them.
EDIT: The problem is that the whole text is in one single string and the regex just get me one cocurrence.
Thanks
You can try this,
sed 's/\(.*tag1{field1:"\)\([^"]*\)\(".*\)/\2/g' yourfile
perl -pe 's/tag1\{field1:\"([^\"]*)".*/$1/g' your_file
Or
awk -F":|," '{print $2}'
sed -n 's/.*[[:space:]]\{1,\}tag1{field1:"\([^"]*\)".*/\1/gp' YourFile
based on text sample
element 1 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
element 2 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
Using awk
awk -F\" '{print $2}'
or to make sure its only extracted for lines with that field1
awk -F\" '/field1/ {print $2}'

Regex, get what's after the second occurence of a string

I have a string of the following format:
TEXT####TEXT####SPECIALTEXT
I need to get the SPECIALTEXT, basically what is after the second occurrence of the ####. I can't get it done. Thanks
The regex (?:.*?####){2}(.*) contains what you're looking for in its first group.
If you are using shell and can use awk for it:
From a file:
awk 'BEGIN{FS="####"} {print $3}' input_file
From a variable:
awk 'BEGIN{FS="####"} {print $3}' <<< "$input_variable"

how to get sub-expression value of regExp in awk?

I was analyzing logs contains information like the following:
y1e","email":"","money":"100","coi
I want to fetch the value of money, i used 'awk' like :
grep pay action.log | awk '/"money":"([0-9]+)"/' ,
then how can i get the sub-expression value in ([0-9]+) ?
If you have GNU AWK (gawk):
awk '/pay/ {match($0, /"money":"([0-9]+)"/, a); print substr($0, a[1, "start"], a[1, "length"])}' action.log
If not:
awk '/pay/ {match($0, /"money":"([0-9]+)"/); split(substr($0, RSTART, RLENGTH), a, /[":]/); print a[5]}' action.log
The result of either is 100. And there's no need for grep.
Offered as an alternative, assuming the data format stays the same once the lines are grep'ed, this will extract the money field, not using a regular expression:
awk -v FS=\" '{print $9}' data.txt
assuming data.txt contains
y1e","email":"","money":"100","coin.log
yielding:
100
I.e., your field separator is set to " and you print out field 9
You need to reference group 1 of the regex
I'm not fluent in awk but here are some other relevant questions
awk extract multiple groups from each line
GNU awk: accessing captured groups in replacement text
Hope this helps
If you have money coming in at different places then may be it would not be a good idea to hard code the positional parameter.
You can try something like this -
$ awk -v FS=[,:\"] '{ for (i=1;i<=NF;i++) if($i~/money/) print $(i+3)}' inputfile
grep pay action.log | awk -F "\n" 'm=gensub(/.*money":"([0-9]+)".*/, "\\1", "g", $1) {print m}'

awk, a field doesn't match but it should match

I have a file structured as record list, where field separator is \t.
I want to extract only records where the second field is a number from 1 to 9, but my awk script doesn't work.
The awk script is
cat file |awk -v FS="\t" '$2 ~ /[0-9]{1}/ {print $0;}'
or this
cat file |awk -v FS="\t" '$2 ~ /.{1}/ {print $0;}' #because the second fields of my file have all second fields as number
Why these sscript don't work? Isn't regex a good regex?
Update
Even with the interval {1}, you are still going to match a field like 23 because the 2 matches a single number. What you really want to use are anchors and forget about intervals:
awk '$2 ~ /^[0-9]$/{print}' FS="\t" file
The problem is the use of intervals {1}. awk less than version 4 doesn't support intervals. gawk on the other hand will if you add the following flag: --re-interval
Try this:
awk --re-interval '$2 ~ /[0-9]{1}/{print}' FS="\t" file
Some other things to note:
Built in vars such as FS can be assigned at the end without the need for -v
You can use just print rather than print $0 as that is its default behavior
Useless use of cat. awk can take a file as an argument, use that instead
If you want to ensure the 2nd field is a single-digit number, you don't really need a regex:
awk '1 <= $2 && $2 <= 9 {print}'