How can I extract the content between two brackets?

My input:
1:FAILED + *1 0 (8328832,AR,UNDECLARED)
This is what I expect:
8328832,AR,UNDECLARED
I am trying to find a general regular expression that extracts whatever appears between two brackets.
My attempt is
grep -o '\[(.*?)\]' test.txt > output.txt
but it doesn't match anything.

Still using grep and a regex:
grep -oP '\(\K[^\)]+' file
\K discards everything matched so far from the reported match, so only the text after the opening bracket is printed. It works much like a positive look-behind assertion, which you can also write explicitly:
grep -oP '(?<=\()[^\)]+' file
If your grep lacks the -P option, you can do the same with perl:
perl -lne '/\(\K[^\)]+/ and print $&' file
Another, simpler approach using awk:
awk -F'[()]' '{print $2}' file
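As a quick check against the sample line from the question, here is a minimal sketch. The variable names (line, with_awk, with_sed) are only for illustration, and sed is swapped in for grep -P so the example also runs where PCRE support is missing. Note that the original attempt escaped square brackets (\[ and \]), which the input does not contain.

```shell
# Sample line from the question.
line='1:FAILED + *1 0 (8328832,AR,UNDECLARED)'

# awk: split on either parenthesis; the text between them is field 2.
with_awk=$(printf '%s\n' "$line" | awk -F'[()]' '{print $2}')

# sed (basic regex): "(" and ")" are literal, \( \) is the capture group.
with_sed=$(printf '%s\n' "$line" | sed -n 's/^[^(]*(\([^)]*\)).*/\1/p')

printf '%s\n%s\n' "$with_awk" "$with_sed"
```

Both commands should print 8328832,AR,UNDECLARED for this input.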

Related

How to grep file to find lines like <version>1.1.9-beta</version>?

I am looking for a way to use cat file | grep REGEX to get the lines containing <version>anything</version>.

grep -F '<version>1.1.9-beta</version>' file
-F matches your pattern as literal text rather than as a regular expression.
You don't need the cat; grep can read the file directly.
If you really mean anything between the tags, try grep '<version>.*</version>' file or grep -P '<version>.*?</version>' file. However, searching XML with a regex is a bad idea.

grep treats the pattern as a basic regular expression by default; use the -E option to enable extended regular expressions:
grep -E "<version>.*</version>" file
Refer to these rules for the regular expression: https://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html#Regular-Expressions
For example, to match the typical version format (3.14, or 13.14, or 0.1458) you can type:
grep -E "<version>[0-9]+\.[0-9]+</version>" file

You can do:
grep '<version>[^<]*</version>' file.xml
[^<]* matches zero or more characters up to the next <.
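To see how the two patterns differ, here is a small sketch run against an invented sample file (its contents are an assumption made up for illustration, not taken from the question):

```shell
# Hypothetical sample file, invented for this example.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<project>
  <version>1.1.9-beta</version>
  <name>demo</name>
  <version>3.14</version>
</project>
EOF

# Count lines holding any <version>...</version> element on a single line.
any_version=$(grep -c '<version>[^<]*</version>' "$tmp")

# Only purely numeric major.minor versions (extended regex, hence -E).
numeric_only=$(grep -E -o '<version>[0-9]+\.[0-9]+</version>' "$tmp")

echo "$any_version"
echo "$numeric_only"
rm -f "$tmp"
```

The first pattern accepts both version lines, while the stricter numeric pattern should keep only the 3.14 one.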

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count on each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'

It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'" file
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.

For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
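The question asks for both fields on one output line. A possible sketch, assuming current_count always appears before total_count on the line (the variable names line and counts are only illustrative), uses two capture groups in a single substitution:

```shell
# Sample line from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -E enables extended regexes (so + and unescaped groups work); -n plus the
# trailing p prints only lines where the substitution succeeded.
counts=$(printf '%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")

echo "$counts"
```

If the two keys can appear in either order, two separate substitutions or a real parser for the serialized data would be safer.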

Getting defined substring with help of sed or egrep

Hi everyone!
I want to get a specific substring from the stdout of a command.
stdout:
{"response":
{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}
I need to get the hex string after token, without the quotation marks; the hex string is 32 characters long. I suppose it can be done with sed or egrep. I don't want to use awk here, because the stdout changes very often.

This is an alternate gnu-awk solution when grep -P isn't available:
awk -F: '{gsub(/"/, "")} NF==2&&$1=="token"{print $2}' RS='[{},]' <<< "$string"
09ad7cc7da1db13334281b84f2a8fa54

grep's nature is extracting things:
grep -Po '"token":"\K[^"]+'
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
Or an option using sed...
sed 's/.*"token":"\([^"]*\)".*/\1/'

With sed:
your-command | sed 's/.*"token":"\([^"]*\)".*/\1/'

YourStreamOrFile | sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'
Because of -n and the p flag, this prints nothing when the input has no matching 32-character hex token, instead of returning the whole line.
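A short sketch of that behaviour, using the response from the question plus an invented response without a token (the second input and the variable names are assumptions for illustration):

```shell
# The two-line response from the question, as a shell string.
response='{"response":
{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}'

# Print only when a 32-character lowercase hex token is present.
token=$(printf '%s\n' "$response" |
  sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p')

# A response without a token yields an empty result rather than the raw line.
missing=$(printf '%s\n' '{"response":{"success":"false"}}' |
  sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p')

echo "token=$token missing=${missing:-<none>}"
```

An empty result makes it easy for a calling script to detect a failed request with a simple [ -z "$token" ] check.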

I have a file and I need to extract the string that follows 'LN:' on the second line

Please refer to the file contents below.
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
How can I extract just the number 30427680 using awk or any other Unix command?

Using sed
sed -n 's/.*LN://p' < input.txt
This will erase everything up until LN:, and print what's left, and only if a substitution did take place.
Using awk
awk -v FS=: '/LN:/ { print $3; }' < input.txt
This will match lines that contain LN:, use : as field separator, and print the 3rd column.
Using grep
grep -o '[0-9]\{3,\}' < input.txt
This will match sequences of 3 or more digits, and print only the matched pattern thanks to the -o.
Depending on other cases not included in your question, you might have to make the patterns more strict.

Using grep:
grep -oP 'LN:\K.*' filename

Just use grep:
grep -o 30427680 file
-o, --only-matching
Prints only the matching part of the lines.

Using perl (the -l switch adds the trailing newline that print would otherwise omit):
perl -lne 'print $& if /LN:\K.*/' filename
or
perl -lne 'print $1 if /LN:(.*)/' filename

Another awk
awk -F"LN:" 'NF>1 {print $2}' file
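Since the question specifically mentions the second line, the commands above can also be restricted to it. A minimal sketch, assuming the file layout shown in the question (the temporary file and variable names are only for illustration):

```shell
# Recreate the file from the question in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
EOF

# sed: the leading 2 is an address, so the substitution runs on line 2 only.
from_sed=$(sed -n '2s/.*LN://p' "$tmp")

# awk: NR==2 selects the second line, and "LN:" is the field separator.
from_awk=$(awk -F'LN:' 'NR==2 && NF>1 {print $2}' "$tmp")

echo "$from_sed $from_awk"
rm -f "$tmp"
```

Limiting the match to one line avoids surprises if another header line ever gains an LN: field.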

Issue in regex pattern matching using gawk, grep

I have a file "Input_file" with content like this
%name=ABC
%value=123
sample text in file
sample text in file
%name=XYZ
%value=789
sample text in file
I need to extract the lines of this file matching this pattern.
str="%name=*\n%value=*"
I was working this way
gawk -v st=$str '/"$st"/ {print}' $Input_file
I'm getting the error
gawk: ^ backslash not last character on line
Even with grep as in
grep -e "$str" $Input_file
it says there is no such matching pattern. Where am I going wrong?

Try this:
grep -A1 "^%name=" "$Input_file" | grep -B1 "^%value=" | grep -v "^--"

You cannot use your pattern (str) directly in awk, because by default awk reads its input one line at a time, so a pattern containing \n never matches. However, you could do this with awk:
awk '/^%name=/{n=$0;next}/^%value=/&&n{print n"\n"$0}{n=""}' file
With your example, the above one-liner outputs:
%name=ABC
%value=123
%name=XYZ
%value=789
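To make the pairing logic easier to follow, here is the same one-liner run on a slightly extended sample. The "%name=orphan" line is an invented addition to show that a %name= line without a following %value= line is dropped:

```shell
# Sample input based on the question, plus one unpaired %name= line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
%name=ABC
%value=123
sample text in file
%name=orphan
sample text in file
%name=XYZ
%value=789
EOF

# Remember a %name= line in n; print the pair only when the very next
# line is %value=. Any other line clears n.
pairs=$(awk '/^%name=/{n=$0;next} /^%value=/&&n{print n"\n"$0} {n=""}' "$tmp")

echo "$pairs"
rm -f "$tmp"
```

Only the ABC and XYZ pairs should appear in the output.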

You can use a different syntax in your $str variable. The '*' is not needed, because grep searches for the pattern anywhere in the line rather than matching a whole literal value. I can't help with gawk, sorry, but try this:
str='^%name=|^%value='
grep -E "$str" "$Input_file"
This matches every line that meets either of your two criteria.
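On the awk side, text between /.../ is a regex literal, so a shell or awk variable placed inside it is never expanded. A possible sketch, assuming the pattern is passed in with -v and matched dynamically with the ~ operator (the names pattern, pat and matches are illustrative):

```shell
# One extended regex covering both prefixes.
pattern='^%(name|value)='

# $0 ~ pat treats the contents of the awk variable pat as a regex.
matches=$(printf '%s\n' '%name=ABC' '%value=123' 'sample text in file' |
  awk -v pat="$pattern" '$0 ~ pat')

echo "$matches"
```

Like the grep -E command, this selects each matching line independently; it does not check that a %value= line directly follows a %name= line.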

We Keep Coding