Grep and sed returning only first match

I am trying to extract the title and description from an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less
How can I also extract the description?

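You can use grep -P (Perl-compatible regexes, GNU grep). \K discards the already-matched <title> from the output, *? makes the match lazy so it stops at the first closing tag, and the lookahead (?=</title>) requires </title> without printing it: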
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
grep -oP "<title>\K[\s\S]*?(?=</title>)"
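To see what the command does without fetching the live feed, here is a minimal sketch that runs the same grep -oP pattern on a small, made-up one-line RSS fragment. sample_feed is a hypothetical stand-in for the real feed and extract_tag is a hypothetical helper name. It assumes GNU grep built with PCRE support. Passing description instead of title also covers the second part of the question.

```shell
#!/usr/bin/env bash
# Hypothetical one-line RSS sample standing in for the real feed.
sample_feed='<rss><channel><title>Feed</title><item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item></channel></rss>'

# Hypothetical helper: print the text of every <TAG>...</TAG> pair.
# \K drops "<TAG>" from the reported match, *? stops at the first
# closing tag, and the lookahead requires "</TAG>" without printing it.
# $1 must be a plain tag name (it is inserted into the regex as-is).
extract_tag() {
  grep -oP "<$1>\K[\s\S]*?(?=</$1>)"
}

printf '%s\n' "$sample_feed" | extract_tag title        # Feed / First / Second, one per line
printf '%s\n' "$sample_feed" | extract_tag description  # One / Two, one per line
```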

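First put each title and description on its own line, then keep only the text between the tags. The first sed inserts a newline before every <title> and <description> tag (the \n in the replacement and the \| alternation are GNU sed extensions); [^<]* assumes the text contains no raw < characters, such as CDATA sections. For the titles:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed -n 's,^<title>\([^<]*\)</title>.*,\1,p'
For the titles and descriptions together, do not filter with grep -E -o "<title>(.*)</title>" first: it cuts off everything after the last </title>, including the last description.
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed 's,^<title>\([^<]*\)</title>.*,T:\1,' | \
sed 's,^<description>\([^<]*\)</description>.*,D:\1,' | \
sed -n 's/^[DT]://p'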
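The split-then-extract idea can be tried offline on a made-up sample. This is a minimal sketch assuming GNU sed; sample_feed and titles_and_descriptions are hypothetical names, and [^<]* assumes the titles and descriptions contain no raw < characters.

```shell
#!/usr/bin/env bash
# Hypothetical one-line RSS sample standing in for the real feed.
sample_feed='<rss><channel><title>Feed</title><item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item></channel></rss>'

# Step 1: start a new line before every <title> and <description> tag.
# Step 2: print only lines that begin with one of those tags, keeping
#         just the text up to the closing tag (the trailing .* drops the rest).
titles_and_descriptions() {
  sed -e 's,<\(title\|description\)>,\n<\1>,g' |
    sed -n -e 's,^<title>\([^<]*\)</title>.*,\1,p' \
           -e 's,^<description>\([^<]*\)</description>.*,\1,p'
}

printf '%s\n' "$sample_feed" | titles_and_descriptions
# Feed / First / One / Second / Two, one per line
```

Once the title substitution has succeeded and printed, the pattern space no longer starts with a tag, so the second substitution cannot print the same line twice.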

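You need a non-greedy match instead of the greedy (.*) to get all the titles. However, POSIX extended regexes (grep -E) and sed have no lazy quantifier, so .*? does not make the match non-greedy there (in sed's basic syntax the ? is even a literal character). A negated character class such as [^<]* has the same effect, because it cannot run past the next tag:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>[^<]*</title>" | sed -e 's,.*<title>\([^<]*\)</title>.*,\1,' | less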
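The difference between the greedy .* and the negated class shows up on a single line that holds several titles. Here is a minimal sketch on a made-up line; greedy_matches and all_titles are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Made-up line with three titles, like a feed delivered without newlines.
line='<title>A</title><title>B</title><title>C</title>'

# Greedy: a single match running from the first <title> to the last </title>.
greedy_matches() { grep -Eo '<title>.*</title>'; }

# Negated class: [^<]* cannot cross the next tag, so each title is its
# own match; sed then strips the surrounding tags.
all_titles() {
  grep -Eo '<title>[^<]*</title>' | sed -e 's,<title>\([^<]*\)</title>,\1,'
}

printf '%s\n' "$line" | greedy_matches | wc -l   # 1 match
printf '%s\n' "$line" | all_titles               # A / B / C, one per line
```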

Related

Need to get substring from string in bash

I'm trying to get the Atom version in bash. This regex works, but I need only the version substring of the line that grep returns. How can I get the version from this string?
<span class="version">1.34.0</span>
curl https://atom.io/ | grep 'class="version"' | grep '[0-9]\+.[0-9]\+.[0-9]\+'

With awk:
$ curl ... | awk -F'[<>]' '/class="version"/{print $3; exit}'
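To check the field numbering offline, here is a sketch that runs the same awk command on the sample line from the question instead of the live page. page_line and extract_version are hypothetical names.

```shell
#!/usr/bin/env bash
# Sample line from the question, standing in for the downloaded page.
page_line='<span class="version">1.34.0</span>'

# With -F'[<>]' the line splits into:
#   $1 = "" (text before the first "<"), $2 = 'span class="version"',
#   $3 = "1.34.0", $4 = "/span", $5 = ""
extract_version() {
  awk -F'[<>]' '/class="version"/{print $3; exit}'
}

printf '%s\n' "$page_line" | extract_version   # 1.34.0
```

Leading spaces only end up in $1, but if other tags precede the span on the same line, the version moves to a later field.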

You can achieve this with the cut command and your own delimiters; in your case these are the > and < characters surrounding the version.
Input:
curl -s https://atom.io/ \
| grep 'class="version"' \
| grep '[0-9]\+.[0-9]\+.[0-9]\+' \
| cut -d '>' -f2 \
| cut -d '<' -f1
Output:
1.34.0
* I added the curl -s flag to silence the progress output; that is a personal choice.

Linux delete egrepped lines

I pass a file (a tcpdump log) to my egrep expression, and then I want to delete all matched lines.
Code example:
cat file | tr -d '\000' |egrep -i 'user: | usr: ' --color=auto --line-buffered -B20
How can I delete all matched lines now?

Use the -v flag:
-v, --invert-match
Selected lines are those not matching any of the specified patterns.
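Drop -B20 as well: with -v, the context lines printed before each non-matching line would bring the matched lines back. The --color and --line-buffered options are not needed when writing to a file.
tr -d '\000' < file | grep -Eiv 'user: | usr: ' > newfile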
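The effect of -B on an inverted match can be checked on a few made-up lines. log is a hypothetical sample and drop_matches a hypothetical helper name; the sketch assumes GNU grep.

```shell
#!/usr/bin/env bash
# Made-up log lines; the 2nd and 4th match the question's pattern.
log=$(printf 'keep 1\nlogin USER: alice\nkeep 2\nlogin usr: bob\nkeep 3')

# -v keeps only the lines that do NOT match; -i ignores case.
drop_matches() { grep -Eiv 'user: | usr: '; }

printf '%s\n' "$log" | drop_matches
# keep 1 / keep 2 / keep 3

# With -B2 the lines before each kept line are printed as context,
# so here the matched lines reappear and the whole log comes back.
printf '%s\n' "$log" | grep -Eiv -B2 'user: | usr: '
```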

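You can do all that with GNU sed, editing the file in place. Write -E -i rather than -iE: -i takes an optional backup suffix, so -iE means "edit in place and keep a backup named fileE" and does not turn on extended regexes. The I flag makes the address case-insensitive like egrep -i, and removing the NUL bytes first mirrors the tr step:
sed -E -i 's/\x00//g; /use?r: /Id' file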
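To try the in-place edit safely, this sketch works on a temporary file holding a few made-up log lines, one of them containing a NUL byte. It assumes GNU sed (it relies on -E, the I address flag and the \x00 escape).

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Made-up log: a stray NUL byte in "keep 2" and two lines to delete.
printf 'keep 1\nlogin user: alice\nkeep\000 2\nlogin USR: bob\n' > "$tmp"

# Strip NUL bytes first (like tr -d '\000'), then delete every line
# containing "user: " or "usr: " in any letter case.
sed -E -i 's/\x00//g; /use?r: /Id' "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"   # keep 1 / keep 2
```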

Grep in bash with regex

I am getting the following output from a bash script:
INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist
and I would like to get only the path (MajorDomo/MajorDomo-Info.plist) using grep. In other words, everything after the equals sign. Any ideas on how to do this?

This job is better suited to awk:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
awk -F' *= *' '{print $2}' <<< "$s"
MajorDomo/MajorDomo-Info.plist
If you really want grep, use grep -P:
grep -oP ' = \K.+' <<< "$s"
MajorDomo/MajorDomo-Info.plist

Not exactly what you were asking, but
echo "INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist" | sed 's/.*= \(.*\)$/\1/'
will do what you want.

You could use cut as well:
your_script | cut -d = -f 2-
(where your_script does something equivalent to echo INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist)
If you need to trim the space at the beginning:
your_script | cut -d = -f 2- | cut -d ' ' -f 2-
If you have multiple spaces at the beginning and you want to trim them all, you'll have to fall back to sed:
your_script | cut -d = -f 2- | sed 's/^ *//'
or, simpler:
your_script | sed 's/^[^=]*= *//'
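A quick way to compare the cut and sed variants is to replace your_script with a stub. In this sketch your_script is a hypothetical function that just echoes the sample line from the question; the last line shows that -f 2- keeps any later = signs in the value.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real script.
your_script() { echo 'INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'; }

your_script | cut -d = -f 2-                     # " MajorDomo/MajorDomo-Info.plist" (leading space)
your_script | cut -d = -f 2- | cut -d ' ' -f 2-  # "MajorDomo/MajorDomo-Info.plist"
your_script | sed 's/^[^=]*= *//'                # "MajorDomo/MajorDomo-Info.plist"

# Field range 2- means "field 2 to the end", so later "=" survive.
echo 'KEY = a=b' | cut -d = -f 2-                # " a=b"
```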

Assuming your script outputs a single line, there is a shell-only solution:
line="$(your_script)"
echo "${line#*= }"
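Here is a sketch of the parameter expansion with a stub in place of the real script (your_script is a hypothetical function that echoes the sample line). The # operator removes the shortest prefix matching the pattern, so only the text up to the first "= " is dropped.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real script.
your_script() { echo 'INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'; }

line="$(your_script)"
# "*= " matches everything up to and including the first "= ".
printf '%s\n' "${line#*= }"   # MajorDomo/MajorDomo-Info.plist
```

printf is used instead of echo so that a value starting with - or containing backslashes is printed verbatim.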

Bash
IFS=' =' read -r _ x <<<"INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist"
printf "%s\n" "$x"
MajorDomo/MajorDomo-Info.plist

Regular Expression for "D210" for Linux?

The title says it all. Right now I'm using:
grep "^D[\d][\d][\d]" file.txt
to no avail.

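\d is not recognized unless the -P (--perl-regexp) option is given (assuming GNU grep). Inside a bracket expression it is not special either: in basic and extended regexes, [\d] matches a literal backslash or the letter d.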
$ echo D210 | grep '^D\d\d\d'
$ echo D210 | grep -P '^D\d\d\d'
D210
$ echo D210 | grep -P '^D\d{3}'
D210
If your grep does not accept -P, use [0-9] or [[:digit:]]:
$ echo D210 | grep '^D[0-9][0-9][0-9]'
D210
$ echo D210 | grep '^D[[:digit:]][[:digit:]][[:digit:]]'
D210

Can not extract the capture group with either sed or grep

I want to extract the value from a key-value pair, but I cannot.
Example I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this gives employee_id=1234, not 1234, which is the capture group.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.

1. Using grep -Eo (egrep is deprecated):
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. Using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234

To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo "$some_string" | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"
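The general pattern can be wrapped in a small shell function. extract_between is a hypothetical helper name, not a grep feature. It assumes GNU grep with PCRE support, and its arguments are inserted into the regex unescaped, so they must be valid regex fragments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the part matched by $2 that is
# preceded by $1 and followed by $3. \K discards what $1 matched;
# the lookahead checks $3 without including it in the output.
extract_between() {
  grep -oP "$1\K($2)(?=$3)"
}

echo 'abc' | extract_between a b c                                    # b
echo 'employee_id=1234' | extract_between 'employee_id=' '[0-9]+' '$' # 1234
```

Passing $ as the third argument turns the lookahead into an end-of-line check.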

Using awk
echo 'employee_id=1234' | awk -F= '{print $2}'
1234

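Use sed -E for extended regular expressions. In sed's default basic syntax, + is an ordinary character, so \([0-9]+\) looks for a digit followed by a literal + and the line is printed unchanged. (The egrep attempt fails for a different reason: the unquoted parentheses are a shell syntax error, so the pattern has to be quoted.)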
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
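The difference between basic and extended syntax is easy to see side by side. Here is a minimal sketch using only the sample string from the question. It includes a basic-regex spelling of "one or more digits" for seds without -E.

```shell
#!/usr/bin/env bash
s='employee_id=1234'

# Basic regex: "+" is literal, so there is no match and s is unchanged.
echo "$s" | sed 's/employee_id=\([0-9]+\)/\1/'       # employee_id=1234
# Basic regex, portable "one or more": [0-9][0-9]*
echo "$s" | sed 's/employee_id=\([0-9][0-9]*\)/\1/'  # 1234
# Extended regex: "+" is a quantifier and ( ) group without backslashes.
echo "$s" | sed -E 's/employee_id=([0-9]+)/\1/'      # 1234
```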

You are specifically asking for sed, but in case you can use something else: any POSIX-compliant shell can do parameter expansion, which doesn't require a fork or subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
 
$ echo "var=${var} value=${value}"
var=employee_id value=1234

We Keep Coding
