Grep a result from Hive output log - regex

I have an output from Hive. I stored that output in a variable called match.
I am isolating the line I need from the log using the command below.
echo $(echo $match | grep "COUNT_TOTAL_MATCH")
0: jdbc:hive2://hiveaddress> . . . . . . . . . . . . . . . . . . . . . . .> +--------------------+-------+--+ | stats | _c1 | +--------------------+-------+--+ | COUNT_TOTAL_MATCH | 1000 | +--------------------+-------+--+ 0: jdbc:hive2://hiveaddress> 0: jdbc:hive2://hiveaddress>
How do I grab the 1000 value knowing it could be any other number?

Given the single-line output shown above, you can treat " | " (space, pipe, space) as the field delimiter and print the sixth field, like this:
awk -F ' \\| ' '{ print $6 }'
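Notice that the pipe has to be escaped twice: awk first processes escape sequences in the -F string (turning \\| into \|), and the resulting regex then needs \| to match a literal pipe.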
Side note:
echo $(echo $match | grep "COUNT_TOTAL_MATCH")
can be rewritten as
grep 'COUNT_TOTAL_MATCH' <<< "$match"
No echo, no pipes, and no word splitting in $match. echo "$(command)" is always the same as just command. (Notice that quoting makes a difference, though.)
This means that you can combine your grep and awk commands into this:
awk -F ' \\| ' '/COUNT_TOTAL_MATCH/ { print $6 }' <<< "$match"
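One caveat: quoting "$match" preserves newlines, so if the variable still holds the original multi-line table, the row of interest stands on its own line and $6 is empty. A minimal sketch for that case, assuming the table layout below (reconstructed from the question, not confirmed Hive output) and using a hypothetical variable name count:

```shell
# Sample data reconstructed from the question; assumed to be one table row per line.
match='+--------------------+-------+--+
|       stats        |  _c1  |
+--------------------+-------+--+
| COUNT_TOTAL_MATCH  | 1000  |
+--------------------+-------+--+'

# Split on "|" alone: field 1 is the empty text before the first pipe,
# field 2 is the label and field 3 is the value. gsub strips the padding spaces.
count=$(printf '%s\n' "$match" |
  awk -F'|' '/COUNT_TOTAL_MATCH/ { gsub(/ /, "", $3); print $3 }')
echo "$count"   # prints 1000
```

Splitting on the bare pipe and trimming afterwards avoids depending on exactly how many spaces pad each column.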

try
grep -oP 'COUNT_TOTAL_MATCH\h*\|\h*\K\d+'
\h*\|\h* optional space/tab followed by | followed by optional space/tab
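\K discards everything matched so far from the reported match, so it acts like a variable-length lookbehind: the digits are printed only if COUNT_TOTAL_MATCH\h*\|\h* matched before them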
\d+ get digits
From man grep
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output
line.
-P, --perl-regexp
Interpret the pattern as a Perl-compatible regular expression (PCRE). This is highly experimental and
grep -P may warn of unimplemented features.
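Where grep has no -P option (it is a GNU extension), the same extraction can be sketched with sed. This assumes the label and the number are separated by a single pipe with optional whitespace around it; the sample line and the variable name count are illustrative only:

```shell
# One table row as it appears in the question's output.
line='| COUNT_TOTAL_MATCH | 1000 |'

# -n plus the p flag prints only lines where the substitution succeeded.
# In a basic regex a bare | is a literal pipe, so no escaping is needed.
count=$(printf '%s\n' "$line" |
  sed -n 's/.*COUNT_TOTAL_MATCH[[:space:]]*|[[:space:]]*\([0-9][0-9]*\).*/\1/p')
echo "$count"   # prints 1000
```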

Related

Filtering matched content

I want to filter the matched content and keep only the part up to the first value after the ".".
I have an output something like this:
Output:
product: 13.6.0.35_0
More specifically, I need only the first two digits and the first digit after the dot. The point is the method of filtering the content, not the specific values in this example.
Expected:
13.6
I tried something like:
echo "product: 13.6.0.35_0" | grep -ow '\w*13\w*'
If you need to use grep with the current logic, you can use
echo "product: 13.6.0.35_0" | grep -ow '13\.[0-9]*' | head -1
where 13\.[0-9]* matches 13, . and zero or more digits (as whole word due to w option) and head -1 gets the first match.
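Since the question asks for a method that does not depend on the value 13, a variant that matches any leading "number.number" pair may be closer to the intent. A small sketch, assuming the version is the first such pair on the line (the second input is a made-up example):

```shell
# -o prints each match on its own line; head keeps only the first one.
version=$(printf '%s\n' 'product: 13.6.0.35_0' | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
echo "$version"   # prints 13.6

other=$(printf '%s\n' 'product: 7.12.3' | grep -oE '[0-9]+\.[0-9]+' | head -n 1)
echo "$other"     # prints 7.12
```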
You may also use sed or awk:
sed -En 's/.* ([0-9]+\.[0-9]+).*/\1/p' <<< "product: 13.6.0.35_0"
awk -F'[[:space:].]' '{print $2"."$3}' <<< "product: 13.6.0.35_0"
The sed command matches any text up to space, then matches the space and captures the two subsequent dot-separated numbers into Group 1 (\1) and then the rest of the line is matched and replaced with Group 1 value that is printed (as the default line output is suppressed with -n).
In the awk command, the field separator is set to whitespace and . with -F'[[:space:].]' and the {print $2"."$3} part prints the second and third field values joined with a dot (.).
A pure shell solution using the read builtin, parameter expansion and curly braces for command grouping:
echo "product: 13.6.0.35_0" | { read -r _ value; echo "${value%.*.*}" ; }
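Note that ${value%.*.*} strips exactly two trailing dot-separated components, so it relies on the version having four parts. A sketch that keeps the first two components however many follow, using only parameter expansion (the variable names are illustrative):

```shell
line='product: 13.6.0.35_0'

version=${line#*: }     # drop everything through ": "   -> 13.6.0.35_0
major=${version%%.*}    # keep text before the first dot -> 13
rest=${version#*.}      # drop through the first dot     -> 6.0.35_0
minor=${rest%%.*}       # keep text before the next dot  -> 6
echo "$major.$minor"    # prints 13.6
```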
You can also use cut:
echo 'product: 13.6.0.35_0' | cut -d ' ' -f2 | cut -d '.' -f1-2
13.6
I reached the expected output, it's simple but it works:
var=$(echo "product: 13.6.0.35_0" | grep -Eo '[[:digit:]]+' | sed -n 1,2p)
echo ${var} | sed 's/ /./g'

Extract sub-string from strings based on condition with shell command line

I have lines in myfile like this:
mount -t cifs //hostname/path/ /mount/path/ -o username='xxxx',password='xxxxx'
I need to extract sub-strings from this based on condition "start with // till next white-space including //".
I can't parse with the position as it won't be the same in all matched lines.
So far I have extracted the sub-string using grep's perl assertion, but the result does not return the //.
The piece of code I've used is
cat myfile | grep " cifs " | grep -oP "(?<=/)[^\s]*" | grep -v ^/
Output:
hostname/path/
Expected Output:
//hostname/path/
Is there a way to get the desired output by modifying the perl regex, perhaps some other method?
Simple bash one line solution
grep " cifs " myfile | sed -e "s/ /\n/g" | grep '^\/\/'
You may consider using some non-PCRE based solutions like
sed -En '/ cifs /{s,.*(//[^[:space:]]+).*,\1,p}' file
grep -oE '//[^[:space:]]+' file
The grep solution simply extracts all occurrences of // and 1+ non-whitespace chars after from the file.
The sed solution finds lines containing cifs and then extracts the last occurrence of // and 1+ non-whitespace chars after on those lines.
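If only the first //-prefixed word on each cifs line is wanted, wherever it appears, an awk loop over the fields is another option. A sketch under the assumption that the share never contains whitespace; the second and third input lines are invented to show a different field position and a non-cifs line:

```shell
share=$(printf '%s\n' \
  "mount -t cifs //hostname/path/ /mount/path/ -o username='xxxx',password='xxxxx'" \
  "mount -o ro -t cifs //otherhost/share /mnt" \
  "mount -t nfs host:/export /mnt" |
  awk '/ cifs / {
         # Scan the whitespace-separated fields and print the first one starting with //
         for (i = 1; i <= NF; i++)
           if ($i ~ /^\/\//) { print $i; break }
       }')
printf '%s\n' "$share"
```

This prints //hostname/path/ and //otherhost/share, one per line, and skips the nfs line.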
Following command should do what you ask for
grep cifs myfile | cut -d ' ' -f 4
or
grep cifs myfile | nawk '{print $4}'
or
awk '/cifs/ { print $4 }' myfile
or
perl -ne 'print "$1\n" if /cifs\s+(\S+)/' myfile

Regex w/grep against tnsnames.ora

I am trying to print out the contents of a TNS entry from the tnsnames.ora file to make sure it is correct from an Oracle RAC environment.
So if I do something like:
grep -A 4 "mydb.mydomain.com" $ORACLE_HOME/network/admin/tnsnames.ora
I will get back:
mydb.mydomain.com =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = myhost.mydomain.com)(PORT = 1521))
  (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME=mydb)))
Which is what I want. Now I have an environment variable being set for the JDBC connection string by an external program when the shell script gets called like:
export DB_URL=#myhost.mydomain.com:1521/mydb
So I need to get TNS alias mydb.mydomain.com out of the above string. I'm not sure how to do multiple matches and reorder the matches with regex and need some help.
grep #.+: $DB_URL
I assume will get the
#myhost.mydomain.com:
but I'm looking for
mydb.mydomain.com
So I'm stuck at this part. How do I get the TNS alias and then pipe/combine it with the initial grep to display the text for the TNS entry?
Thanks
update:
@mklement0 @Walter A - I tried your approaches, but they are not exactly what I was looking for.
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
All these methods get me back: myhost.mydomain.com
What I am looking for is actually: mydb.mydomain.com
Note:
- For brevity, the commands below use bash/ksh/zsh here-string syntax to send strings to stdin (<<<"$var"). If your shell doesn't support this, use printf %s "$var" | ... instead.
The following awk command will extract the desired string (mydb.mydomain.com) from $DB_URL (#myhost.mydomain.com:1521/mydb):
awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL"
-F '[#:/]' tells awk to split the input into fields on #, : or /. With your input, this means that the parts of interest are in the second field ($2) and the fourth field ($4). The sub() call removes the first .-separated component from $2, and the print call pieces together the result.
To put it all together:
domain=$(awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL")
grep -F -A 4 "$domain" "$ORACLE_HOME/network/admin/tnsnames.ora"
You don't strictly need intermediate variable $domain, but I've added it for clarity.
Note how -F was added to grep to specify that the search term should be treated as a literal, so that characters such as . aren't treated as regex metacharacters.
Alternatively, for more robust matching, use a regex that is anchored to the start of the line with ^, and \-escape the . chars (using shell parameter expansion) to ensure their treatment as literals:
grep -A 4 "^${domain//./\.}" "$ORACLE_HOME/network/admin/tnsnames.ora"
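Both grep variants rely on -A 4, that is, on every entry spanning exactly five lines. A sketch that instead prints from the alias line until the parentheses balance, assuming well-formed entries with no parentheses in comments. The temporary file, its second entry (other.mydomain.com, host h2) and the variable names are made up for illustration:

```shell
# Build a throwaway sample file; a real script would read tnsnames.ora instead.
tns=$(mktemp)
cat > "$tns" <<'EOF'
mydb.mydomain.com =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = myhost.mydomain.com)(PORT = 1521))
  (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME=mydb)))
other.mydomain.com =
(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = h2)(PORT = 1521)))
EOF

entry=$(awk -v name="mydb.mydomain.com" '
  # Start at the first line that begins with the alias (literal comparison).
  !found && index($0, name) == 1 { found = 1 }
  found {
    print
    # gsub with "&" leaves the line unchanged and returns the number of matches.
    opened += gsub(/\(/, "&")
    closed += gsub(/\)/, "&")
    if (opened > 0 && opened == closed) exit
  }' "$tns")
rm -f "$tns"
printf '%s\n' "$entry"
```

With the sample data this prints the five lines of the mydb.mydomain.com entry and stops before other.mydomain.com.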
You can get a part of a string with
# Only GNU-grep
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
# or
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
# or
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
# or, when the string already is in a var
echo "${DB_URL#*#}" | cut -d":" -f1
# or using a temp var
tmpvar="${DB_URL#*#}"
echo "${tmpvar%:*}"
I had skipped the alternative awk command that @mklement0 already gave:
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
The awk solution is straightforward. If you want the same approach without awk, you can do something like
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
or the ugly
echo "#myhost.mydomain.com:1521/mydb" | (IFS='#:' read -r _ url _; echo "$url")
What is happening here?
After setting the new IFS, I take the second word of the input. The first and third words are caught in the dummy variable _ (you could have named them dummyvar1 and dummyvar2). The pipe | creates a subprocess, so you need ( ) to keep reading and displaying the variable url in the same process.
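The snippets in this answer return the host name, while the question asks for mydb.mydomain.com. A sketch that builds that alias with parameter expansion only, assuming the URL always has the form #host.domain:port/service (the variable names are illustrative):

```shell
DB_URL='#myhost.mydomain.com:1521/mydb'

hostport=${DB_URL#*#}            # drop through the "#"  -> myhost.mydomain.com:1521/mydb
host=${hostport%%:*}             # keep text before ":"  -> myhost.mydomain.com
service=${DB_URL##*/}            # keep text after "/"   -> mydb
tns_alias="$service.${host#*.}"  # replace the first host label with the service name
echo "$tns_alias"                # prints mydb.mydomain.com
```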

Which characters to escape to match these in find regex expression in Bourne shell?

I am writing a little Bourne shell script which loads a string from a conf file. After some awk tricks, this string is used in find, as in the following example:
original string:
rx='~ #'
find command:
find -regex "^.*~$\|^.*#$"
EDIT: the original string is in a conf file, so the problem arises when the string contains special characters such as "*" or ".". Example:
original string (with characters to escape):
rx='~ # $*'
EDIT2: I am trying to match any file whose name ends with one of the words in rx (separated by spaces). If rx="st ar", I want to match "test" and "bar". But if a word contains characters such as * or $, my regex doesn't work properly. So I want to know which characters I have to escape to make it work.
Thanks! :)
As I understand it, you want to split your string on spaces, and match any substring from that split.
The irc.freenode.org #bash channel has a factoid providing a function for performing quoting, used below with some minor tweaks for POSIX compatibility:
requote() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
input_string='hello# cruel*world how~are~you'
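output_string=$(printf '%s\n' "$input_string" | tr ' ' '\n' | {
  out_s=''
  while read -r line; do
    if [ -n "$out_s" ]; then
      out_s="${out_s}|$(requote "$line")"
    else
      out_s="$(requote "$line")"
    fi
  done
  printf '%s\n' "$out_s"
})
# With GNU find, ( ) and | act as operators only in an extended regex type
# (BSD find: use -E instead of -regextype).
find . -regextype posix-extended -regex ".*(${output_string}).*"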
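To see what the requote function produces, here is a short demonstration. The input 'a*b^' and the variable names are arbitrary test values, and grep -E stands in for find only to show that the quoted pattern matches the literal text and nothing else:

```shell
requote() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }

quoted=$(requote 'a*b^')
echo "$quoted"   # prints [a][*][b]\^

# Every character is now a one-character bracket expression (or \^),
# so * no longer acts as a repetition operator.
hits=$(printf '%s\n' 'a*b^' 'aab^' | grep -E "^${quoted}\$")
echo "$hits"     # prints a*b^
```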
OK, thanks to Charles Duffy, I understand that a good method is to enclose every character in "[]" to make it safe in a regex, except for '^', which is escaped as '\^'. Here's what I did, based on Mr. Duffy's answer.
So, I have an init string and I want to match with any words in this string.
Init string (emacs tmp and example for this trick)
rx=' ~ # oo ^ '
First, I trim the string like this (using $(...) rather than backticks, because backticks would turn \\ into \ before sed and awk see it):
rx=$(printf '%s\n' "$rx" | awk '{$1=$1};1')
==> rx='~ # oo ^'
Second, I apply Duffy's sed trick, with some changes for my case:
rx=$(printf '%s\n' "$rx" | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g')
==> rx='[~] [#] [o][o] \^'
Third, I apply a little awk command to build the regex:
rx=$(printf '%s\n' "$rx" | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')
==> rx='^.*[~]$\|^.*[#]$\|^.*[o][o]$\|^.*\^$'
Finally, I just exec my find command like this:
find -regex "$rx"
et voilà !
BTW, in practice I do it all in a single pipeline:
rx=$(printf '%s\n' "$rx" | awk '{$1=$1};1' | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g' | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')
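The same steps can also be sketched as a single function that loops over the words in the shell instead of using awk. The function name build_suffix_regex is hypothetical, and the output assumes find's default (emacs-style) regex syntax, where \| means alternation:

```shell
# The subshell body ( ... ) keeps "set -f" and the helper variables local to the call.
build_suffix_regex() (
  set -f                 # disable globbing so words like * are not expanded
  regex=''
  for word in $1; do     # unquoted on purpose: split the list on whitespace
    quoted=$(printf '%s\n' "$word" | sed 's/[^^]/[&]/g; s/\^/\\^/g')
    # Join the alternatives with \| and anchor each one at the end of the path.
    regex="${regex:+$regex\\|}^.*$quoted\$"
  done
  printf '%s\n' "$regex"
)

rx=$(build_suffix_regex ' ~ # oo ^ ')
echo "$rx"   # prints ^.*[~]$\|^.*[#]$\|^.*[o][o]$\|^.*\^$
```

The result can then be passed to find as before: find . -regex "$rx".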

grep to select strings that contains certain words

I have a list:
/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
How can I grep so that only the strings containing "Field" followed by digits, or simply NRepeatLeft, at the end of the string are returned (in my example, the last three strings)?
Expected output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Try doing this :
grep -E "(Field[0-9]*|NRepeatLeft$)" file.txt
      |  |           |           ||
      |  |           OR  end_line |
      | opening_choice   closing_choice
extended_grep
If you don't have the -E switch (which stands for ERE, Extended Regular Expression):
grep "\(Field[0-9]*\|NRepeatLeft$\)" file.txt
OUTPUT
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
That will match lines containing Field followed by zero or more digits, or lines ending in NRepeatLeft. Is that what you expect?
I am not quite sure how to use grep for your purpose. You would probably like perl for this:
perl -lne 'if(/Field[\d]+/ or /NRepeatLeft/){print}' your_file
$ grep -E '(Field[0-9]*|NRepeatLeft)$' file.txt
Output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Explanation:
Field # Match the literal word
[0-9]* # Followed by any number of digits
| # Or
NRepeatLeft # Match the literal word
$ # Match the end of the string
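Because [0-9]* also accepts zero digits, a path ending in plain "Field" would match as well. A slightly stricter sketch that requires at least one digit and a preceding slash; the two extra paths (.../Field and .../Field3/extra) are invented decoys, not part of the question's list:

```shell
paths='/device1/element1/CmdDiscovery
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field
/device1/element1/MS-E2E003-COM14/Field3/extra
/device1/element1/MS-E2E003-COM14/NRepeatLeft'

# + requires one or more digits; $ anchors both alternatives at the end of the line.
matches=$(printf '%s\n' "$paths" | grep -E '/(Field[0-9]+|NRepeatLeft)$')
printf '%s\n' "$matches"
```

Only the Field2 and NRepeatLeft lines are printed.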