Need to get substring from string in bash - regex

I'm trying to get the Atom version in bash. This regex works, but I need to extract a substring from the line that grep returns. How can I get the version out of this string?
<span class="version">1.34.0</span>
curl https://atom.io/ | grep 'class="version"' | grep '[0-9]\+.[0-9]\+.[0-9]\+'

With awk, using < and > as field separators so the version is the third field on the matching line:
$ curl ... | awk -F'[<>]' '/class="version"/{print $3; exit}'
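For example, to capture the version in a shell variable (a sketch, assuming the page keeps the same <span class="version"> markup):
version=$(curl -s https://atom.io/ | awk -F'[<>]' '/class="version"/{print $3; exit}')
echo "$version"    # e.g. 1.34.0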

You can achieve this with the cut command by adding the respective delimiters; in your case these are the > and < characters surrounding the version.
Input:
curl -s https://atom.io/ \
| grep 'class="version"' \
| grep '[0-9]\+.[0-9]\+.[0-9]\+' \
| cut -d '>' -f2 \
| cut -d '<' -f1
Output:
1.34.0
* added the curl -s flag to silence curl's progress output; personal choice
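If your grep supports PCRE (GNU grep -P), the whole extraction can also be done in a single step; a sketch, assuming the version directly follows the class="version" attribute:
curl -s https://atom.io/ | grep -oP 'class="version">\K[0-9]+\.[0-9]+\.[0-9]+'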

Related

Grep next word after pattern match

I'm trying to use grep/sed to get the following output: "name":"test_backup_1" from the response below
{"backups":[{"name":"test_backup_1","status":"CORRUPTED","creationTime":"2019-11-08T15:03:49.460","id":"test_backup_1"}]}
I have been trying variations of grep -Eo 'name:"\w+\"' but no joy.
I'm not sure if it would be easier to achieve this using grep or sed.
I am curling a response from the server, saving it to a local variable, then echoing the variable and piping it to grep/sed. Example of what I am running:
echo ${view_backup} | grep -Eo '"name":"\w+\"'
Referencing @sundeep's answer,
grep -Eo '"name":"[^"]+"'
resulted in the expected output
Make sure to flatten the response to one line before grepping, and pipe straight from your curl:
echo `curl --silent https://someurl | tr -d '\n' | grep -oP "(?<=name\":\")[^\"]+"`
will return
test_backup_1
If you want more fields, you can chain alternations in the -oP grep, as in this example where I pull some data for a Danish license plate (bt41932):
curl --silent https://www.tjekbil.dk/api/v2/nummerplade/bt41932 | grep -oP -m 1 "(?<=\"RegNr\":\")[^\"]+|(?<=\"MaerkeTypeNavn\":\")[^\"]+|(?<=\"MaksimumHastighed\":)[^,]+"| tr '\n' ' '
returns
BT41932 SKODA 218
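For completeness, a small sketch that captures only the name into a shell variable, assuming the JSON response is stored in view_backup as in the question:
name=$(echo "${view_backup}" | grep -oP '(?<="name":")[^"]+' | head -n 1)
echo "$name"    # test_backup_1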

grep within nested brackets

How do I grep strings in between nested brackets using bash? Is it possible without the use of loops? For example, if I have a string like:
[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]
I wish to grep only the two target strings inside the [[]]:
TargetString1
TargetString2
I tried the following command, which cannot get TargetString2:
grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1
With GNU grep's -P option:
grep -oP "(?<=\[\[)[\w\s]+"
The regex matches a sequence of word and whitespace characters ([\w\s]+) that is preceded by two opening brackets ([[). This works for your sample string, but will not work for more complicated constructs like:
[[[[TargetString1]]TargetString2:SomethingIDontWantAfterColon[[TargetString3]]]]
where only TargetString1 and TargetString3 are matched.
To extract from the nested [[ ]] brackets, you can use sed:
#!/bin/bash
str="[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]"
echo "$str" | grep -o -P '(?<=\[\[).*(?=\]\])' | cut -d ':' -f1
echo "$str" | sed 's/.*\[\([^]]*\)\].*/\1/g'   # works only if a string exists between [ and ]
Output:
TargetString1
TargetString2
You can also combine grep -E with sed to strip the leading brackets: grep -Eo '\[\[\w+' | sed 's/\[\[//g'
[root@localhost ~]# echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -Eo '\[\[\w+' | sed 's/\[\[//g'
TargetString1
TargetString2
[root@localhost ~]#
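If GNU grep with -P is available, the same result can be had in a single command; a sketch that ends each match at either a colon or a closing bracket:
echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -oP '\[\[\K[^\]\[:]+'
which prints:
TargetString1
TargetString2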

Grep and sed returning only first match

I am trying to extract the titles and descriptions from an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less
How can I also get the descriptions?
You can use grep -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
grep -oP "<title>\K[\s\S]*?(?=</title>)"
First put each title and description on its own line. Here is an example:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed -n 's,.*<title>\(.*\)</title>.*,\1,gp'
For the description:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed 's,<title>\([^<]*\)</title>,T:\1,' | \
sed 's,<description>\([^<]*\)</description>,D:\1,' | \
sed -n 's/[DT]://p'
To get all the titles you need a non-greedy match (.*?) instead of a greedy one (.*). POSIX ERE (grep -E) has no lazy quantifiers, so switch to PCRE with -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -oP "<title>(.*?)</title>" | sed -e 's,<title>\(.*\)</title>,\1,' | less

Grep in bash with regex

I am getting the following output from a bash script:
INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist
and I would like to get only the path (MajorDomo/MajorDomo-Info.plist) using grep; in other words, everything after the equals sign. Any ideas how to do this?
This job suits awk better:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
awk -F' *= *' '{print $2}' <<< "$s"
MajorDomo/MajorDomo-Info.plist
If you really want grep then use grep -P:
grep -oP ' = \K.+' <<< "$s"
MajorDomo/MajorDomo-Info.plist
Not exactly what you were asking, but
echo "INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist" | sed 's/.*= \(.*\)$/\1/'
will do what you want.
You could use cut as well:
your_script | cut -d = -f 2-
(where your_script does something equivalent to echo INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist)
If you need to trim the space at the beginning:
your_script | cut -d = -f 2- | cut -d ' ' -f 2-
If you have multiple spaces at the beginning and you want to trim them all, you'll have to fall back to sed: your_script | cut -d = -f 2- | sed 's/^ *//' (or, simpler, your_script | sed 's/^[^=]*= *//')
Assuming your script outputs a single line, there is a shell-only solution:
line="$(your_script)"
echo "${line#*= }"
Bash
IFS=' =' read -r _ x <<<"INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist"
printf "%s\n" "$x"
MajorDomo/MajorDomo-Info.plist
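Another pure-shell option is bash's built-in regex operator; a small sketch, no external tools assumed:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
[[ $s =~ =[[:space:]]*(.+)$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
MajorDomo/MajorDomo-Info.plist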

how to grep part of the content from a string in bash

For example, when filtering an HTML file, if every line follows this kind of pattern:
<a href="some_url"><i>some text</i></a>
how can I get the content of href, and how can I get the text between <i> and </i>?
cat file | cut -f2 -d\"
FYI: just about every other HTML/regexp post on Stack Overflow explains why extracting values from HTML with anything other than an HTML parser is a bad idea. You may want to read some of those.
If href is always the second space-separated token on the line, then you can try:
grep "href" file | cut -d' ' -f2 | cut -d'=' -f2
Here's how to do it using xmlstarlet (optionally with tidy):
# extract content of href and <i>...</i>
echo '<a href="some_url"><i>some text</i></a>' |
xmlstarlet sel -T -t -m "//a" -v "@href" -n -v "i" -n
# using tidy & xmlstarlet
echo '<a href="some_url"><i>some text</i></a>' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:a" -v "@href" -n -v "." -n