Extract substring using regex in shell

I have a string that contains multiple occurrences of the form:
element 1 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
element 2 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
Using the shell, I want to extract the field1 value of tag1 from every element.
My try:
sed -n "s/.*\"tag1\":{\"fiel1\":\"\(.*\),\"fiel2\".*/\1/gp"
I only get the last occurrence, not all of them.
EDIT: The problem is that the whole text is a single string, and the regex only gets me one occurrence.
Thanks

You can try this:
sed 's/\(.*tag1{field1:"\)\([^"]*\)\(".*\)/\2/g' yourfile

perl -pe 's/.*?tag1\{field1:"([^"]*)".*/$1/' your_file
Or
awk -F":|," '{print $2}'

sed -n 's/.*[[:space:]]\{1,\}tag1{field1:"\([^"]*\)".*/\1/gp' YourFile
Based on the text sample:
element 1 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..
element 2 tag1{field1:"text",field2:"text"...},tag2{field1:"text",field2:"text"...},..

Using awk
awk -F\" '{print $2}'
or, to make sure it is only extracted from lines containing field1:
awk -F\" '/field1/ {print $2}'

Related

How to use sed or awk to extract substring

I have a file that contains the following:
[class:ABC_DEF_GHI]
[class:ABC_DEF_GHI:app:ABC_DEF_GHI]
My goal is to extract ABC_DEF_GHI
Here is the script I'm trying to write so far.
eval sed -n 's/.*app://p' file.txt >> $file
You can get this value by using multiple delimiters in awk:
awk -F':|]' '{print $2}' $file
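To see why $2 is the wanted value on both lines, this sketch prints every field awk produces when FS is the regex ':|]' (a colon or a closing bracket). The function name show_fields is hypothetical:

```shell
# Print NF and every field for each input line, splitting on ':' or ']'.
show_fields() {
    awk -F':|]' '{
        printf "NF=%d:", NF
        for (i = 1; i <= NF; i++) printf " $%d=\"%s\"", i, $i
        printf "\n"
    }'
}

printf '%s\n' '[class:ABC_DEF_GHI]' '[class:ABC_DEF_GHI:app:ABC_DEF_GHI]' | show_fields
```

For the first sample line this prints NF=3: $1="[class" $2="ABC_DEF_GHI" $3="". The opening [ stays attached to $1, and the trailing ] produces an empty last field.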
With sed:
$ sed -E 's/.*:(.+)]/\1/' file
ABC_DEF_GHI
ABC_DEF_GHI
This extracts the content between a colon and the closing square bracket; because .* is greedy, the colon matched is the last one on the line.
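A quick way to see the greedy effect is to compare the leading .* with [^:]*, which stops at the first colon instead. A sketch on the second sample line (the function names are hypothetical):

```shell
# Greedy: .* runs to the LAST colon before the captured text.
greedy() { sed -E 's/.*:(.+)]/\1/'; }
# [^:]* cannot cross a colon, so the match starts after the FIRST colon.
first_colon() { sed -E 's/[^:]*:(.+)]/\1/'; }

line='[class:ABC_DEF_GHI:app:ABC_DEF_GHI]'
printf '%s\n' "$line" | greedy       # ABC_DEF_GHI
printf '%s\n' "$line" | first_colon  # ABC_DEF_GHI:app:ABC_DEF_GHI
```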

sed regex cut string after match

I tested a regex on http://regexr.com/ and it works as expected.
How can I run it using sed?
/^.*?OU=([^,]*)/g
The test string looks like:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local
And the output is:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
So it should cut the string before the second OU= starts.
Thanks
sed is not the best tool for this case when you have to deal with text that contains "columns" and can be split. Here are two possibilities, one with sed and the other with awk:
s="mario.test;Mario Test;Mario;Test;123;+001122334455,CN=Mario Test,OU=AT-Linz,OU=Tese Sites,DC=Test,DC=local;test.local"
echo "$s" | sed 's/OU=/й/' | sed 's/\([^й]*\)й\([^,]*\).*/\1OU=\2/'
echo "$s" | awk -F",OU=" '{print $1 ",OU=" $2}'
The awk solution splits on the ,OU= substring and then joins the first and second fields with the separator (since it is hardcoded, it is easy to put it back).
The sed solution uses two passes: 1) replace the first OU= with a character that does not otherwise occur in the input, to mark the border of the match (typically a control character; here a Cyrillic letter is used for better "visibility"); 2) match everything we do not need, and capture what we need to keep with capturing groups and backreferences.
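The two-pass idea above can also run in a single sed call, with the two commands separated by ;. This is a sketch, assuming the input never contains Ctrl-A (\001), which is used here as the marker instead of the Cyrillic letter. It uses the test string from the question, and the function name cut_after_first_ou is hypothetical:

```shell
# Keep everything up to the value of the first OU=, drop the rest.
# Assumes the input never contains the control character \001.
cut_after_first_ou() {
    marker=$(printf '\001')
    # 1) replace only the first OU= with the marker
    # 2) keep the text before the marker, re-add OU=, keep up to the next comma
    sed "s/OU=/$marker/; s/\([^$marker]*\)$marker\([^,]*\).*/\1OU=\2/"
}

s='mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local'
printf '%s\n' "$s" | cut_after_first_ou
```

A control character avoids the locale issues a multibyte letter such as й can cause inside a bracket expression.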
Your question isn't clear, but from reading your comments, is either of these what you're looking for?
$ awk -F, '{print $1 FS $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
$ awk -F'CN=[^,]+,OU=|,' '{print $1 $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;AT-Test

Print matched pattern with AWK

For example, I have this data:
/home/test/dat1.txt
/home/test/dat2.txt
/home/test/test1/dat3.txt
/home/test/test2/dat4.txt
/home/test/test3/test4/dat5.txt
I need to print only the file name with its extension; the output should be:
dat1.txt
dat2.txt
dat3.txt
dat4.txt
dat5.txt
I need to use awk. Can anyone help?
I tried this regular expression: '/\/*\.txt/{print ???}'
If you are going to use awk, you do not need a regex for this purpose.
You can just tell awk to print the last field, using a field separator of /.
awk -F'/' '{print $NF}' Input.txt
As hd1's comment already noted, NF is the number of fields on the current input record (in this case line). Since awk starts indexing fields at $1, $NF gives you the last field.
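A small sketch to make NF concrete: it prints the field count next to the last field for two of the sample paths. The leading / produces an empty $1, which is still counted in NF. The function name show_last_field is hypothetical:

```shell
# Print NF and $NF for each line, splitting on "/".
show_last_field() { awk -F'/' '{ print NF, $NF }'; }

printf '%s\n' /home/test/dat1.txt /home/test/test3/test4/dat5.txt | show_last_field
# 4 dat1.txt
# 6 dat5.txt
```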
You could use this short awk:
awk -F/ '$0=$NF' Input.txt
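The pattern above drops lines whose last field is empty (for example, a path ending in /). If you need those empty lines too, use: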
awk -F/ '{$0=$NF}1' Input.txt
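To see the difference between the two one-liners above, here is a sketch with a hypothetical input whose middle path ends in / (so its last field is empty). The function names are only illustrative:

```shell
# Hypothetical input: the middle path ends in "/", so its last field is empty.
paths='/home/test/dat1.txt
/home/test/
/home/test/test1/dat3.txt'

# $0=$NF used as a pattern: a line is printed only if the new $0 is
# non-empty (and not numeric zero), so the empty result is dropped.
last_as_pattern() { awk -F/ '$0=$NF'; }

# {$0=$NF} as an action followed by the always-true pattern 1:
# every line is printed, including the empty one.
last_as_action() { awk -F/ '{$0=$NF}1'; }

printf '%s\n' "$paths" | last_as_pattern   # dat1.txt, dat3.txt
printf '%s\n' "$paths" | last_as_action    # dat1.txt, (empty line), dat3.txt
```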

sed or awk to capture part of url

I am not very experienced with regular expressions and sed/awk scripting.
I have urls that are similar to the following torrent url:
http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
I would like a sed or awk script that extracts the text after title=, so that from the example above I just get:
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
A simple approach with awk: use the = as the field separator:
awk -F"=" '{print $2}'
Thus:
echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | awk -F"=" '{print $2}'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Just remove everything before the title=: sed 's/.*title=//'
$ echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | sed 's/.*title=//'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Let's say:
s='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'
Pure Bash solution:
echo "${s/*title=}"
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Or, using grep -P:
echo "$s"|grep -oP 'title=\K.*'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
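${s/*title=} is Bash pattern substitution with an empty replacement. A sketch of a POSIX alternative that also works in sh and dash, using ${var#pattern} to remove the shortest prefix matching *title= (the function name after_title is hypothetical):

```shell
# ${1#*title=} removes the shortest leading part of $1 matching *title=,
# i.e. everything up to and including the first "title=".
after_title() {
    printf '%s\n' "${1#*title=}"
}

s='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'
after_title "$s"
```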
Using sed (in your example there is no need to mention title in the regex):
sed 's/.*=//'
Another solution uses cut, another standard Unix tool:
cut -d= -f2
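The field-splitting answers (awk -F"=", cut -d= -f2, sed 's/.*=//') rely on the URL containing a single =. A sketch with a hypothetical URL that has a second query parameter shows the difference, together with a sed expression (an alternative, not taken from the answers above) that keeps only the title value up to the next &. The function name get_title_param is hypothetical, and the value is not URL-decoded:

```shell
# Print only the value of the title query parameter: it starts after
# "?title=" or "&title=" and stops at the next "&" (or end of line).
get_title_param() {
    sed -n 's/.*[?&]title=\([^&]*\).*/\1/p'
}

url='http://example.com/file.torrent?title=some.name&tr=udp'  # hypothetical URL
printf '%s\n' "$url" | cut -d= -f2        # some.name&tr
printf '%s\n' "$url" | sed 's/.*=//'      # udp
printf '%s\n' "$url" | get_title_param    # some.name
```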

Regex: get what's after the second occurrence of a string

I have a string of the following format:
TEXT####TEXT####SPECIALTEXT
I need to get the SPECIALTEXT, basically whatever comes after the second occurrence of ####. I can't get it to work. Thanks
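The regex (?:.*?####){2}(.*) captures what you're looking for in its first group. It uses Perl-compatible syntax (a non-capturing group and a lazy quantifier), so it needs a PCRE tool such as grep -P or perl rather than sed or awk.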
If you are in a shell and can use awk:
From a file:
awk 'BEGIN{FS="####"} {print $3}' input_file
From a variable:
awk 'BEGIN{FS="####"} {print $3}' <<< "$input_variable"
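Without PCRE, a pure POSIX shell sketch of the same idea strips the shortest prefix ending in #### twice. Unlike awk's $3, it keeps any later #### in the result, but if a separator is missing it returns the string unchanged instead of failing. The function name after_second_sep is hypothetical:

```shell
# Print what follows the second "####", assuming at least two separators.
# ${var#pattern} removes the shortest matching prefix, which mirrors
# the lazy (?:.*?####){2} part of the regex.
after_second_sep() {
    sep='####'
    rest=${1#*"$sep"}      # drop up to and including the first ####
    rest=${rest#*"$sep"}   # drop up to and including the second ####
    printf '%s\n' "$rest"
}

after_second_sep 'TEXT####TEXT####SPECIALTEXT'   # SPECIALTEXT
after_second_sep 'A####B####SPECIAL####MORE'     # SPECIAL####MORE
```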