Get the character following a matched string - regex

I'm trying to retrieve a specific piece of data returned by a command line. Here is my command line:
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0
Which gives me this result:
IF-MIB::ifDescr.4 = STRING: tun0
From this result I want to retrieve the 4. I thought of using a regex, but maybe there is an easier way to fetch it.
Regex I tried :
\ifDescr.\s+\K\S+ https://regex101.com/r/9X04MD/1
[\n\r].*ifDescr.\s*([^\n\r]*) https://regex101.com/r/9X04MD/2
I would like to fetch it in a single command line like
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0 | ?

There are so many options that don't involve using GNU grep's experimental -P option. For example given just your sample input to work off, here's one way with any sed:
$ echo "$out" | sed 's/.*\.\([0-9][0-9]*\).*tun0/\1/'
4
or any awk:
$ echo "$out" | awk -F'[. ]' '/tun0/{print $2}'
4
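Since an interface index can have more than one digit, the awk approach can be wrapped in a small helper that compares the interface name exactly. This is only a sketch: the function name if_index and the second sample line (eth0 with index 12) are made up for illustration, and real input would come from the snmpwalk command in the question.

```shell
# First line is the sample from the question; the second is a hypothetical extra interface.
out='IF-MIB::ifDescr.4 = STRING: tun0
IF-MIB::ifDescr.12 = STRING: eth0'

# if_index NAME: print the index of the interface whose description equals NAME.
# Fields are split on dots and spaces, so $2 is the index and $NF is the name.
if_index() {
    awk -v name="$1" -F'[. ]' '$NF == name { print $2 }'
}

idx=$(printf '%s\n' "$out" | if_index tun0)
echo "$idx"
```

Note that an interface name containing a dot would be split into several fields, so this sketch would need a different separator in that case.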

I'd recommend pattern (?<=ifDescr\.)[^ =]+
Explanation:
(?<=ifDescr\.) - positive lookbehind, asserts that what precedes the match is ifDescr.
[^ =]+ - matches one or more characters other than a space or an equals sign =
Demo

Related

extract string with grep -oP

I get an empty string in the variable IMAGE_TAG when trying to extract the substring :R8A144 from the string:
Loaded image: rcsmw-ee:R8A144
with grep -oP in a Jenkins "Execute shell" step.
Here is the code:
ssh -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null eccd@${DIRECTOR_IP_NUM} '
LOADED_IMAGE=$(sudo su root -c "docker load -i rcsmw-ee-5940688.4.tar")
IMAGE_TAG=$(echo $LOADED_IMAGE | grep -oP '\(:[A-ZA]\)\w+')
echo $IMAGE_TAG
'
Here is the output:
bash: command substitution: line 5: syntax error near unexpected token `('
bash: command substitution: line 5: `echo $LOADED_IMAGE | grep -oP (:[A-ZA])w+)'
Error parsing reference: "rcsmw-ee:" is not a valid repository/tag: invalid reference format
You have a whole set of commands inside single quotes, so you cannot use single quotes around the grep pattern.
Also, $LOADED_IMAGE is better used inside double quotes, since the unquoted expansion may cause trouble if the value contains whitespace.
Besides, the A after A-Z is redundant, as is the capturing group, so you may remove the parentheses from the pattern and use
IMAGE_TAG=$(echo "$LOADED_IMAGE" | grep -oP -m 1 ":[A-Z]\w*")
Or, using an equivalent POSIX BRE regex:
IMAGE_TAG=$(echo "$LOADED_IMAGE" | grep -o -m 1 ":[[:upper:]][[:alnum:]_]*")
Note that -m 1 makes grep stop after the first matching line, which seems to be what you are after here.
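The POSIX-class pattern can be tried locally without ssh or docker. A minimal sketch, using the "Loaded image" line from the question as sample input; the lowercase variable names are illustrative only:

```shell
# Sample text from the question; in the real job it comes from `docker load`.
loaded_image='Loaded image: rcsmw-ee:R8A144'

# -o prints only the matched text; -m 1 stops reading after the first matching line.
image_tag=$(printf '%s\n' "$loaded_image" | grep -o -m 1 ':[[:upper:]][[:alnum:]_]*')

# Optionally strip the leading colon with shell parameter expansion.
bare_tag=${image_tag#:}

echo "$image_tag"
echo "$bare_tag"
```

The first colon in the sample is followed by a space, so only the colon in front of the tag satisfies the pattern.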
Another solution is to pass the commands to ssh through a here-document with a quoted delimiter, as below:
ssh -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null eccd@${DIRECTOR_IP_NUM} <<'SSHTAG'
.
IMAGE_TAG=$(echo $LOADED_IMAGE | grep -oP '(:[A-ZA])\w+')
.
.
SSHTAG
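The effect of the quoted here-document delimiter can be checked locally by feeding the script to sh instead of ssh. This is a sketch under that assumption: sh stands in for the remote shell, and the delimiter name SKETCH is arbitrary. Because the delimiter is quoted, the body is passed through verbatim, so single quotes inside it need no escaping and $LOADED_IMAGE is expanded by the inner shell rather than the outer one.

```shell
# Run a small script through a quoted here-document and capture its output.
result=$(sh <<'SKETCH'
LOADED_IMAGE='Loaded image: rcsmw-ee:R8A144'
echo "$LOADED_IMAGE" | grep -o ':[[:upper:]][[:alnum:]_]*'
SKETCH
)

echo "$result"
```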

Parsing log file

I am trying to parse a text like this from a log file:
[2016-01-29 11:31:33,809: WARNING/Worker-1283]
1030140:::DEAL_OF_DAY:::29:::1:::11 [2016-01-29 11:31:34,103:
WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11 [2016-01-29
11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11
I want to extract these numbers 1030140, 1025311, 1025158 and so on.
I have tried the following
cat deals29.txt | egrep -o '[0-9]+'
But this gives other digits as well
I tried
cat deals29.txt | egrep -o ' [0-9]+:::'
but now it gives the colons in the output as well and there is no way to capture the group in the command line version of grep.
Any suggestions? grep solution would be preferred but I can go with sed/awk as well if grep cannot do the job.
Using grep -oP and match reset \K:
grep -oP '^\[.*?\] \K\d+' file.log
1030140
1025311
1025158
If your grep doesn't support -P (PCRE) then use awk:
awk -F '\\] |:::' '{print $2}' file.log
1030140
1025311
1025158
You can practice regexes here: https://regex101.com/
I came up with
] [0-9]*
and then you have to delete the first 2 characters of each match.
You could use a solution like:
(\d{3,})::
# looks for at least 3 digits (or more) followed by two colons
# puts the matched numbers in group 1
See a demo for this approach here.
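grep cannot print a capture group on its own, so a pattern with a group is usually paired with sed. A rough sketch, assuming each log record sits on a single line (the sample in the question appears to be line-wrapped), with the three records retyped that way:

```shell
# The three records from the question, one per line (an assumption about the real file).
log='[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11'

# Skip the bracketed prefix, capture the digits before ":::" and print only the capture.
ids=$(printf '%s\n' "$log" | sed -n 's/^\[[^]]*] \([0-9][0-9]*\):::.*/\1/p')

echo "$ids"
```

With -n and the p flag, lines that do not match the pattern are dropped instead of being echoed unchanged.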

How to extract a complex version number using sed?

I use sed on CentOS to extract a version number, and it works fine:
echo "var/opt/test/war/test-webapp-4.1.56.war" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'
But my problem is that I am not able to extract it when the version looks like this:
var/opt/test/war/test-webapp-4.1.56-RC1.war
I want to extract 4.1.56-RC1 if the suffix is present.
Any ideas?
EDIT 2
OK, to be clear, take this example with a path:
Sometimes the path contains only a series of numbers, like var/opt/test/war/test-webapp-4.1.56.war, and sometimes it contains a series of numbers and letters, like var/opt/test/war/test-webapp-4.1.56-RC1.war
The need is to recover either 4.1.56 or 4.1.56-RC1, depending on the version present in the path. With sed or grep, no preference.
This seems to work, but the .war is included at the end:
echo "var/opt/test/war/test-webapp-4.1.56.war" | egrep -o '[0-9]\S*'
It's a little unclear what you are after, but this seems to be in the general direction.
Given:
$ echo "$e"
/var/opt/test/war/test-webapp-4.1.56-RC1.war
/var/opt/test/war/test-webapp-RC1.war
Version 4.2.4 (test version)
Try:
$ echo "$e" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+-?\w*'
4.1.56-RC1
4.2.4
The following matches a first number of up to 2 digits ({1,2} means one or two repetitions), a dot, a second number of up to 2 digits, a dot, and a last number of up to 4 digits. It does not pick up a suffix such as -RC1.
grep -o '[0-9]\{1,2\}\.[0-9]\{1,2\}\.[0-9]\{1,4\}'
Just add (-[a-zA-Z]+[0-9]+)? to your regex; the trailing ? keeps the suffix optional, so plain versions still match:
echo "Version 4.2.4 (test version)" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+(-[a-zA-Z]+[0-9]+)?).*/\1/p'
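To check that an optional suffix group handles both forms of the path, the substitution can be wrapped in a helper and run against the two examples from the question. A sketch only: the function name extract_version is made up, and -E is used as the spelling of extended regexes accepted by both GNU and BSD sed.

```shell
# extract_version: read lines on stdin and print the first dotted version in each,
# including an optional suffix such as -RC1.
extract_version() {
    sed -nE 's/^[^0-9]*(([0-9]+\.)*[0-9]+(-[A-Za-z]+[0-9]+)?).*/\1/p'
}

plain=$(echo "var/opt/test/war/test-webapp-4.1.56.war" | extract_version)
rc=$(echo "var/opt/test/war/test-webapp-4.1.56-RC1.war" | extract_version)

echo "$plain"
echo "$rc"
```

The leading [^0-9]* assumes that no digit appears in the path before the version, which holds for the sample paths but not necessarily for others.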
What about just using whitespace as the delimiter, like
echo "Version 4.2.4-RC1 (test version)" | grep -Po "Version\s+\K\S+"
Here -P tells grep to use Perl-style regexes, -o shows only the matching part, and the \K in the pattern says not to include anything before it in the match.
This passes both tests
egrep -o '[0-9]\S*'
Unfortunately, not all greps support -o, but grep in Linux does.
echo "Version 4.2.4 (test version)" | sed 's/Version[[:space:]]*\([^[:space:](]*\).*/\1/'
But as with every extraction, you first need to define exactly what you want to extract, rather than what might be present (or change your request).

simulating tail -1 command by using grep command

I want to simulate the tail -1 command using grep, i.e. I want to print the last line of a file using grep. It can be done easily using sed or awk, but I couldn't find any option for it in grep.
Why on earth would you want to do that? There are better tools for this task, as everyone is suggesting.
This is the solution you wanted:
grep "^" -n filename | grep -Po "(?<=^$(grep -c "^" filename):)(.*)"
The trick is to display all lines with line numbers (-n option).
Then match the text that follows the file's line count and a colon at the start of a line, which is the last line.
The grep -c "^" filename part gives the line count.
The -P option enables PCRE, which is needed for the positive lookbehind.
If you don't have access to -P (which I doubt), use another filter as follows, although it won't work for lines containing a : character:
grep "^" -n filename | grep "^$(grep -c "^" filename):" | grep -o "[^:]*$"
The reason behind this post is to show that this can be done only using grep.
Moral : ! ( It's highly recommended )
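For lines that contain a colon, a variant of the second pipeline can leave the prefix removal to the shell instead of a third grep. A sketch only: the function name last_line and the temporary sample file are invented for illustration.

```shell
# last_line FILE: print the last line of FILE using grep plus shell parameter expansion.
last_line() {
    count=$(grep -c '^' "$1")                        # number of lines in the file
    numbered=$(grep -n '^' "$1" | grep "^$count:")   # e.g. "3:third: with colon"
    printf '%s\n' "${numbered#*:}"                   # drop the shortest prefix ending in ":"
}

# Hypothetical sample file whose last line itself contains a colon.
sample=$(mktemp)
printf 'first\nsecond\nthird: with colon\n' > "$sample"
last=$(last_line "$sample")
rm -f "$sample"

echo "$last"
```

Only the line-number prefix added by grep -n is stripped, so colons inside the line itself are preserved.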

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do this? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the prefix isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regex101 demonstrates that the reset match start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
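Both PCRE forms depend on grep -P being available. A hedged sketch of the \K variant that first probes whether the local grep accepts -P and otherwise falls back to a plain sed substitution; the probe and the fallback are additions for illustration, not part of the answers here.

```shell
# Sample document from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # PCRE available: \K discards the key from the reported match.
    version=$(printf '%s\n' "$json" | grep -oP '"scheme_version":\K[0-9]+')
else
    # No PCRE: capture the digits after the key with a basic regex instead.
    version=$(printf '%s\n' "$json" | sed -n 's/.*"scheme_version":\([0-9][0-9]*\).*/\1/p')
fi

echo "$version"
```

Including the closing quote and the colon in the pattern keeps it from matching the "scheme_version" that appears as the value of "_id".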
To avoid using grep's PCRE feature, which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
The -r (--replace) option accepts capture group indices (e.g., $5) and names (e.g., $foo) in the replacement string.
Another example, with Python's json.tool module, which can validate and pretty-print JSON:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Improving on @potong's answer, which only works for "scheme_version", you can use this expression to get any field:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
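The three commands differ only in the key name, so the pattern can be folded into a small function. This is a sketch with a made-up name (json_field) and a slightly simplified pattern; it assumes flat, single-line JSON with simple keys and is not a substitute for a real JSON parser such as jq.

```shell
# json_field KEY: print the value of KEY from flat, single-line JSON on stdin.
# KEY is inserted into the regex as-is, so it must not contain regex metacharacters.
json_field() {
    key=$1
    # Skip to "KEY":, allow an optional opening quote, capture up to the next quote, comma or brace.
    sed -n "s/.*\"$key\":\"\{0,1\}\([^\",}]*\).*/\1/p"
}

json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

id=$(printf '%s\n' "$json" | json_field _id)
rev=$(printf '%s\n' "$json" | json_field _rev)
ver=$(printf '%s\n' "$json" | json_field scheme_version)

echo "$id"
echo "$rev"
echo "$ver"
```

Values that contain a comma, an escaped quote, or nested objects would be cut short by this pattern.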