Need to get substring from string in bash - regex

I'm trying to get the Atom version in bash. This regex works, but I need to extract a substring from the line that grep returns. How can I get the version out of this string?
<span class="version">1.34.0</span>
curl https://atom.io/ | grep 'class="version"' | grep '[0-9]\+.[0-9]\+.[0-9]\+'

With awk, using < and > as field separators so the version is the third field on the matching line:
$ curl ... | awk -F'[<>]' '/class="version"/{print $3; exit}'
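For example, to capture the version in a shell variable (a sketch, assuming the page keeps the same <span class="version"> markup):
version=$(curl -s https://atom.io/ | awk -F'[<>]' '/class="version"/{print $3; exit}')
echo "$version"    # e.g. 1.34.0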

You can achieve this with the cut command by adding the respective delimiters; in your case these are the > and < characters surrounding the version.
Input:
curl -s https://atom.io/ \
| grep 'class="version"' \
| grep '[0-9]\+.[0-9]\+.[0-9]\+' \
| cut -d '>' -f2 \
| cut -d '<' -f1
Output:
1.34.0
* added the curl -s flag to silence curl's progress output; personal choice
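If your grep supports PCRE (GNU grep -P), the whole extraction can also be done in a single step; a sketch, assuming the version directly follows the class="version" attribute:
curl -s https://atom.io/ | grep -oP 'class="version">\K[0-9]+\.[0-9]+\.[0-9]+'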

Related

Grep next word after pattern match

I'm trying to use grep/sed to get the following output: "name":"test_backup_1" from the response below
{"backups":[{"name":"test_backup_1","status":"CORRUPTED","creationTime":"2019-11-08T15:03:49.460","id":"test_backup_1"}]}
I have been trying variations of grep -Eo 'name:"\w+\"' but no joy.
I'm not sure if it would be easier to achieve this using grep or sed.
I am curling a response from the server, saving it to a local variable, then echoing the variable and piping it to grep/sed. Example of what I am running:
echo ${view_backup} | grep -Eo '"name":"\w+\"'
Referencing @sundeep's answer,
grep -Eo '"name":"[^"]+"'
resulted in the expected output
Make sure to flatten the response to one line before grepping, and pipe straight from your curl:
echo `curl --silent https://someurl | tr -d '\n' | grep -oP "(?<=name\":\")[^\"]+"`
will return
test_backup_1
If you want more fields, you can chain alternations in the -oP grep, as in this example where I pull some data for a Danish license plate (bt41932):
curl --silent https://www.tjekbil.dk/api/v2/nummerplade/bt41932 | grep -oP -m 1 "(?<=\"RegNr\":\")[^\"]+|(?<=\"MaerkeTypeNavn\":\")[^\"]+|(?<=\"MaksimumHastighed\":)[^,]+"| tr '\n' ' '
returns
BT41932 SKODA 218
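For completeness, a small sketch that captures only the name into a shell variable, assuming the JSON response is stored in view_backup as in the question:
name=$(echo "${view_backup}" | grep -oP '(?<="name":")[^"]+' | head -n 1)
echo "$name"    # test_backup_1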

grep within nested brackets

How do I grep strings in between nested brackets using bash? Is it possible without the use of loops? For example, if I have a string like:
[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]
I wish to grep only the two target strings inside the [[]]:
TargetString1
TargetString2
I tried the following command, which cannot get TargetString2:
grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1
With GNU grep's -P option:
grep -oP "(?<=\[\[)[\w\s]+"
The regex matches a sequence of word and whitespace characters ([\w\s]+) that is preceded by two opening brackets ([[). This works for your sample string, but will not work for more complicated constructs like:
[[[[TargetString1]]TargetString2:SomethingIDontWantAfterColon[[TargetString3]]]]
where only TargetString1 and TargetString3 are matched.
To extract from the nested [[ ]] brackets, you can use sed:
#!/bin/bash
str="[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]"
echo "$str" | grep -o -P '(?<=\[\[).*(?=\]\])' | cut -d ':' -f1
echo "$str" | sed 's/.*\[\([^]]*\)\].*/\1/g'   # works only if a string exists between [ and ]
Output:
TargetString1
TargetString2
You can also combine grep -E with sed to strip the leading brackets: grep -Eo '\[\[\w+' | sed 's/\[\[//g'
[root@localhost ~]# echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -Eo '\[\[\w+' | sed 's/\[\[//g'
TargetString1
TargetString2
[root@localhost ~]#
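If GNU grep with -P is available, the same result can be had in a single command; a sketch that ends each match at either a colon or a closing bracket:
echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -oP '\[\[\K[^\]\[:]+'
which prints:
TargetString1
TargetString2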

Grep and sed returning only first match

I am trying to extract the titles and descriptions from an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less
How can I also get the descriptions?
You can use grep -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
grep -oP "<title>\K[\s\S]*?(?=</title>)"
First put each title and description on its own line. Here is an example:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed -n 's,.*<title>\(.*\)</title>.*,\1,gp'
For the description:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed 's,<title>\([^<]*\)</title>,T:\1,' | \
sed 's,<description>\([^<]*\)</description>,D:\1,' | \
sed -n 's/[DT]://p'
To get all the titles you need a non-greedy match (.*?) instead of a greedy one (.*). POSIX ERE (grep -E) has no lazy quantifiers, so switch to PCRE with -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -oP "<title>(.*?)</title>" | sed -e 's,<title>\(.*\)</title>,\1,' | less

Grep in bash with regex

I am getting the following output from a bash script:
INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist
and I would like to get only the path (MajorDomo/MajorDomo-Info.plist) using grep; in other words, everything after the equals sign. Any ideas how to do this?
This job suits awk better:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
awk -F' *= *' '{print $2}' <<< "$s"
MajorDomo/MajorDomo-Info.plist
If you really want grep then use grep -P:
grep -oP ' = \K.+' <<< "$s"
MajorDomo/MajorDomo-Info.plist
Not exactly what you were asking, but
echo "INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist" | sed 's/.*= \(.*\)$/\1/'
will do what you want.
You could use cut as well:
your_script | cut -d = -f 2-
(where your_script does something equivalent to echo INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist)
If you need to trim the space at the beginning:
your_script | cut -d = -f 2- | cut -d ' ' -f 2-
If you have multiple spaces at the beginning and you want to trim them all, you'll have to fall back to sed: your_script | cut -d = -f 2- | sed 's/^ *//' (or, simpler, your_script | sed 's/^[^=]*= *//')
Assuming your script outputs a single line, there is a shell-only solution:
line="$(your_script)"
echo "${line#*= }"
Bash
IFS=' =' read -r _ x <<<"INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist"
printf "%s\n" "$x"
MajorDomo/MajorDomo-Info.plist
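Another pure-shell option is bash's built-in regex operator; a small sketch, no external tools assumed:
s='INFOPLIST_FILE = MajorDomo/MajorDomo-Info.plist'
[[ $s =~ =[[:space:]]*(.+)$ ]] && printf '%s\n' "${BASH_REMATCH[1]}"
MajorDomo/MajorDomo-Info.plist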

how to grep part of the content from a string in bash

For example, when filtering an HTML file, if every line follows this kind of pattern:
<a href="some_url"><i>some text</i></a>
how can I get the content of href, and how can I get the text between <i> and </i>?
cat file | cut -f2 -d\"
FYI: just about every other HTML/regexp post on Stack Overflow explains why extracting values from HTML with anything other than an HTML parser is a bad idea. You may want to read some of those.
If href is always the second space-separated token on the line, then you can try:
grep "href" file | cut -d' ' -f2 | cut -d'=' -f2
Here's how to do it using xmlstarlet (optionally with tidy):
# extract content of href and <i>...</i>
echo '<a href="some_url"><i>some text</i></a>' |
xmlstarlet sel -T -t -m "//a" -v "@href" -n -v "i" -n
# using tidy & xmlstarlet
echo '<a href="some_url"><i>some text</i></a>' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:a" -v "@href" -n -v "." -n