how to grep part of the content from a string in bash - regex

For example, when filtering an HTML file where every line follows this kind of pattern:
<i>some text</i>
how can I get the content of href, and how can I get the text between <i> and </i>?

cat file | cut -f2 -d\"
FYI: Just about every other HTML/regexp post on Stack Overflow explains why extracting values from HTML with anything other than an HTML parser is a bad idea. You may want to read some of those.

If href is always the second space-separated token in a line, then you can try:
grep "href" file | cut -d' ' -f2 | cut -d'=' -f2

Here's how to do it using xmlstarlet (optionally with tidy):
# extract content of href and <i>...</i>
echo '<i>some text</i>' |
xmlstarlet sel -T -t -m "//a" -v @href -n -v i -n
# using tidy & xmlstarlet
echo '<i>some text</i>' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:a" -v @href -n -v . -n
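For lines as regular as the one described in the question, bash's own regex matching may be enough for a quick one-off. The following is a minimal sketch, not a replacement for a real HTML parser. The sample line, including the href value, is hypothetical, since the question does not show the full markup.

```shell
#!/usr/bin/env bash
# Hypothetical sample line; the real href value is not given in the question.
line='<a href="http://example.com/page"><i>some text</i></a>'

# Group 1 captures the href value, group 2 the text between <i> and </i>.
re='href="([^"]*)".*<i>([^<]*)</i>'

if [[ $line =~ $re ]]; then
    href=${BASH_REMATCH[1]}
    text=${BASH_REMATCH[2]}
    printf 'href: %s\ntext: %s\n' "$href" "$text"
fi
```

This only holds while every line keeps exactly this shape. Attributes in single quotes, nested tags, or markup split across lines would need a parser such as xmlstarlet.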

Related

Grep next word after pattern match

I'm trying to use grep or sed to extract the following output: "name":"test_backup_1" from the response below
{"backups":[{"name":"test_backup_1","status":"CORRUPTED","creationTime":"2019-11-08T15:03:49.460","id":"test_backup_1"}]}
I have been trying variations of grep -Eo 'name:"\w+\"' but no joy.
I'm not sure whether this is easier to achieve with grep or with sed.
The way I am running this is by curling a response from the server, saving it to a local variable, then echoing the variable and piping it to grep/sed.
Example of what I am running:
echo ${view_backup} | grep -Eo '"name":"\w+\"'
Following @sundeep's answer,
grep -Eo '"name":"[^"]+"'
resulted in the expected output
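As a small sketch of that pattern applied to the response from the question, the following also strips the key and quotes with parameter expansion in case only the value is wanted. The variable names pair and value are illustrative only.

```shell
#!/usr/bin/env bash
# Response copied from the question.
view_backup='{"backups":[{"name":"test_backup_1","status":"CORRUPTED","creationTime":"2019-11-08T15:03:49.460","id":"test_backup_1"}]}'

# -o prints only the matched part; [^"]+ stops at the closing quote.
pair=$(printf '%s\n' "$view_backup" | grep -Eo '"name":"[^"]+"')

# Remove the leading "name":" and the trailing quote to keep the bare value.
value=${pair#\"name\":\"}
value=${value%\"}

printf '%s\n%s\n' "$pair" "$value"
```

Quoting the variable in printf avoids the word splitting that an unquoted echo ${view_backup} would apply to the response.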
Make sure to collapse the response to a single line before grep, and pipe directly from your curl:
echo `curl --silent https://someurl | tr -d '\n' | grep -oP "(?<=name\":\")[^\"]+"`
will return
test_backup_1
If you want more variables you can combine several -oP patterns with |, as in this example where I get some data on a Danish license plate (bt41932):
curl --silent https://www.tjekbil.dk/api/v2/nummerplade/bt41932 | grep -oP -m 1 "(?<=\"RegNr\":\")[^\"]+|(?<=\"MaerkeTypeNavn\":\")[^\"]+|(?<=\"MaksimumHastighed\":)[^,]+"| tr '\n' ' '
returns
BT41932 SKODA 218

Need to get substring from string in bash

I'm trying to get the Atom version in bash. This regex works, but I need a substring of the string that grep returns. How can I get the version from this string?
<span class="version">1.34.0</span>
curl https://atom.io/ | grep 'class="version"' | grep '[0-9]\+.[0-9]\+.[0-9]\+'
With awk:
$ curl ... | awk -F'[<>]' '/class="version"/{print $3; exit}'
You can achieve this by using the cut command with the appropriate delimiters; in your case these are the > and < characters enclosing the version.
Input:
curl -s https://atom.io/ \
| grep 'class="version"' \
| grep '[0-9]\+.[0-9]\+.[0-9]\+' \
| cut -d '>' -f2 \
| cut -d '<' -f1
Output:
1.34.0
*Added the curl -s flag to silence the progress output; personal choice.
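The same extraction can also be done in a single sed call. This is a minimal sketch run against the sample line from the question rather than the live page, so it makes no claim about what atom.io currently serves.

```shell
#!/usr/bin/env bash
# Sample line copied from the question.
html='<span class="version">1.34.0</span>'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, leaving just the captured version number.
version=$(printf '%s\n' "$html" | sed -n 's/.*class="version">\([0-9][0-9.]*\)<.*/\1/p')

printf '%s\n' "$version"
```

In a real pipeline the printf would be replaced by the curl -s call.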

Linux delete egrepped lines

I pass file to my egrep expression (tcpdump log), then I want to delete all matched lines
Code example:
cat file | tr -d '\000' |egrep -i 'user: | usr: ' --color=auto --line-buffered -B20
How can I delete all matched lines now?
Use the -v flag:
-v, --invert-match
Selected lines are those not matching any of the specified patterns.
Drop -B20 and --color here: with -v, the context lines would bring the matching lines back into the output.
cat file | tr -d '\000' | egrep -iv 'user: | usr: ' > newfile
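You can do all that using sed (GNU sed assumed). Note that -i and -E must be given separately, since -iE is read as -i with the backup suffix E, and the I flag makes the match case-insensitive:
sed -i -E '/use?r: /Id; s/\x00//g' file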
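To see the inverted match at work without touching a real capture, here is a minimal sketch on a throwaway file. The log contents are invented for illustration, including one embedded NUL byte.

```shell
#!/usr/bin/env bash
# Hypothetical log: two lines that should be dropped, one line with a NUL byte.
log=$(mktemp)
printf 'GET /index\nUser: alice\nho\000st: example\n usr: bob\n' > "$log"

# Strip NUL bytes first so grep treats the input as text, then keep only
# the lines that do NOT match either pattern (-v), ignoring case (-i).
kept=$(tr -d '\000' < "$log" | grep -Eiv 'user: | usr: ')

rm -f "$log"
printf '%s\n' "$kept"
```

Only the two non-matching lines remain in kept. Redirecting that output to a new file gives the cleaned log.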

Regexp is not working as expected in unix

I am trying the code below and it's not working as expected. I am new to regex. Please share your ideas. Thanks in advance.
test.xml
<?xml version="1.0"?>
<audit>
<interfaces>
<interface_dtls>ABCD,ABCD 123</interface_dtls>
<interface_dtls>TESTING,123 TEST</interface_dtls>
</interfaces>
</audit>
Trying with below unix commands
#!/bin/bash
for line in `cat test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
echo $line --Displaying line only for debugging purpose
interface_code=`echo $line | awk -F ',' '{print $1}'`
prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
Actual "ECHO" output:
ABCD,ABCD
TESTING,123
Expected "ECHO" output:
ABCD,ABCD 123
TESTING,123 TEST
Because of the missing info (the text after the space), my query is not working as expected.
Use xml_grep, the recommended option for parsing, as grep is not an XML-aware tool.
$ xml_grep 'interface_dtls' file --text_only
ABCD,ABCD 123
TESTING,123 TEST
One could also use grep, as pointed out by anubhava in the comments. It is probably not the best way to do it, but it can be done for a one-time debug. For proper functionality use an XML-aware command (e.g. xmllint or xml_grep).
$ grep -oP "(?<=<interface_dtls>)[^<]+" xml_file
ABCD,ABCD 123
TESTING,123 TEST
The skeletal code for extracting the individual fields from the command output can be written as below. I will leave it up to you to tweak it as you need. Do not use the outdated `` style command substitution; use $(...) wherever applicable.
#!/bin/bash
# IFS=, splits each line at the first comma only, so the text after the
# space stays in prcdr_cd instead of being dropped.
while IFS=, read -r interface_code prcdr_cd
do
echo "$interface_code $prcdr_cd"
done < <(xml_grep 'interface_dtls' file --text_only)
The xml_grep utility was mentioned in another answer. This uses XMLStarlet, which is also able to validate and modify XML files on the command line:
$ xml sel -t -v '//interface_dtls' -nl data.xml
ABCD,ABCD 123
TESTING,123 TEST
After a little bit of research I was able to resolve the issue. Thanks to https://stackoverflow.com/users/5291015/inian , https://stackoverflow.com/users/4941495/kusalananda and https://stackoverflow.com/users/548225/anubhava for the helpful insights.
test.xml
<?xml version="1.0"?>
<audit>
<interfaces>
<interface_dtls>ABCD,ABCD 123</interface_dtls>
<interface_dtls>TESTING,123 TEST</interface_dtls>
</interfaces>
</audit>
Before:
#!/bin/bash
for line in `cat test.xml | grep -oP "(?<=interface_dtls>)[^<]+"`; do
echo $line --Displaying line only for debugging purpose
interface_code=`echo $line | awk -F ',' '{print $1}'`
prcdr_cd=`echo $line | awk -F ',' '{print $2}'`
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
After:
#!/bin/bash
IFS=$'\n'
for line in `cat test.xml | grep -oP "(?<=interface_dtls>)[^<]+" | cut -d '>' -f 2 | cut -d '<' -f 1`; do
echo $line --Displaying line only for debugging purpose
interface_code=$(echo $line | awk -F ',' '{print $1}')
prcdr_cd=$(echo $line | awk -F ',' '{print $2}')
hive -e "select * from table \
where sub_sys_cd='$interface_code' and data_prcdr_desc='$prcdr_cd';"
done
"ECHO" output:
ABCD,ABCD 123
TESTING,123 TEST
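The root cause is that for ... in `...` splits the command output on every character in IFS, which includes the space by default. A sketch of an alternative that avoids changing IFS globally is shown below. It uses sed instead of grep -P or xml_grep purely so that it runs with standard tools, and it stores the fields in an array named rows, which is an illustrative name.

```shell
#!/usr/bin/env bash
# Recreate test.xml from the question in a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0"?>
<audit>
<interfaces>
<interface_dtls>ABCD,ABCD 123</interface_dtls>
<interface_dtls>TESTING,123 TEST</interface_dtls>
</interfaces>
</audit>
EOF

rows=()
# read consumes whole lines; IFS=, applies to this read only and splits at
# the first comma, so the second field keeps its embedded space.
while IFS=, read -r interface_code prcdr_cd; do
    rows+=("$interface_code|$prcdr_cd")
done < <(sed -n 's:.*<interface_dtls>\([^<]*\)</interface_dtls>.*:\1:p' "$xml")

rm -f "$xml"
printf '%s\n' "${rows[@]}"
```

Inside the loop, the two variables can be passed to the hive query directly, with no awk calls needed.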

Insert a variable at line #1 of txt file using sed

I have the following bash:
#!/bin/bash
if [ "$#" -ne 1 ]; then
echo "Usage: `basename $0` <HOSTNAME>"
exit 1
fi
IPADDR=`ifconfig | head -2 | tail -1 | cut -d: -f2 | rev | cut -c8-23 | rev`
sed -i -e '1i$IPADDR $1\' /etc/hosts
But when I cat /etc/hosts:
$IPADDR
How can I deal with such issues?
Your problem is that variables inside single quotes ' aren't expanded by the shell, but left unchanged. To have variables expanded, put them in double quotes " or leave them outside the quotes when quoting is not needed, like here, e.g.
sed -i -e '1i'$IPADDR' '$1'\' /etc/hosts
In the line above, $IPADDR and $1 are outside the quotes and will be expanded by the shell before the arguments are fed to sed.
The single quotes mean the string isn't interpolated as a variable.
#!/bin/bash
IPADDR=$(/sbin/ifconfig | head -2 | tail -1 | cut -d: -f2 | rev | cut -c8-23 | rev)
sed -i -e "1i${IPADDR} ${1}" /etc/hosts
I also did the command in $(...) out of habit!
Refer to the sed manual: https://www.gnu.org/software/sed/manual/sed.html
As a GNU extension, the i command and text can be separated into two
-e parameters, enabling easier scripting:
The formal usage should be:
sed -i -e "1i$IPADDR\\" -e "$1" /etc/hosts
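The difference between the two quoting styles can be checked safely on a scratch file instead of /etc/hosts. The sketch below assumes GNU sed (for the one-line form of the i command). The address and host name are placeholder values, and the output is captured rather than written back with -i.

```shell
#!/usr/bin/env bash
# Scratch stand-in for /etc/hosts with a single existing entry.
hosts=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts"

# Placeholder values for illustration only.
ipaddr='192.0.2.10'
name='myhost'

# Single quotes: the shell passes $ipaddr and $name to sed literally.
literal=$(sed -e '1i $ipaddr $name' "$hosts" | head -n 1)

# Double quotes: the shell expands both variables before sed runs.
expanded=$(sed -e "1i $ipaddr $name" "$hosts" | head -n 1)

rm -f "$hosts"
printf '%s\n%s\n' "$literal" "$expanded"
```

The first captured line reproduces the symptom from the question, and the second contains the expanded address and host name.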