extract string with grep -oP - regex

I get an empty string in the variable IMAGE_TAG when trying to extract the substring :R8A144 from the string:
Loaded image: rcsmw-ee:R8A144
using grep -oP in a Jenkins "Execute shell" step.
Here is the code:
ssh -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null eccd@${DIRECTOR_IP_NUM} '
LOADED_IMAGE=$(sudo su root -c "docker load -i rcsmw-ee-5940688.4.tar")
IMAGE_TAG=$(echo $LOADED_IMAGE | grep -oP '\(:[A-ZA]\)\w+')
echo $IMAGE_TAG
'
Here is the output:
bash: command substitution: line 5: syntax error near unexpected token `('
bash: command substitution: line 5: `echo $LOADED_IMAGE | grep -oP (:[A-ZA])w+)'
Error parsing reference: "rcsmw-ee:" is not a valid repository/tag: invalid reference format

The whole set of commands is already wrapped in single quotes, so you cannot use single quotes around the grep pattern: the first inner quote ends the outer string. Use double quotes for the pattern instead.
Also put "$LOADED_IMAGE" in double quotes. Unquoted, it undergoes word splitting and globbing, which can cause trouble if it contains whitespace or wildcard characters.
Besides, the A after A-Z is redundant, and so is the capturing group, so you can remove the parentheses from the pattern and use
IMAGE_TAG=$(echo "$LOADED_IMAGE" | grep -oP -m 1 ":[A-Z]\w*")
Or, using an equivalent POSIX BRE regex:
IMAGE_TAG=$(echo "$LOADED_IMAGE" | grep -o -m 1 ":[[:upper:]][[:alnum:]_]*")
Note that -m 1 makes grep stop after the first matching line. With -o, every match on that line is still printed, which is fine here because the sample line contains only one match.
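As a minimal sketch that can be tried locally without ssh or Docker, the POSIX variant can be run against the sample line from the question. The hard-coded LOADED_IMAGE value and the head -n 1 step are assumptions added for illustration; head is used because -m 1 limits matching lines, not individual matches.

```shell
# Sample output line copied from the question (hard-coded here for illustration).
LOADED_IMAGE='Loaded image: rcsmw-ee:R8A144'

# -o prints only the matched text, one match per line; head -n 1 keeps the first.
IMAGE_TAG=$(printf '%s\n' "$LOADED_IMAGE" | grep -o ':[[:upper:]][[:alnum:]_]*' | head -n 1)
echo "$IMAGE_TAG"

# Strip the leading colon with parameter expansion if only the tag is needed.
echo "${IMAGE_TAG#:}"
```

The first echo prints :R8A144 and the second prints R8A144.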

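Another solution is to pass the commands to ssh in a here-document with a quoted delimiter. Because the delimiter is quoted, nothing is expanded locally and single quotes can be used inside:
ssh -o "StrictHostKeyChecking=no" -o UserKnownHostsFile=/dev/null eccd@${DIRECTOR_IP_NUM} <<'SSHTAG'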
.
IMAGE_TAG=$(echo $LOADED_IMAGE | grep -oP '(:[A-ZA])\w+')
.
.
SSHTAG
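To see the effect of the quoted delimiter without a remote host, here is a rough sketch. It assumes sh as a local stand-in for the ssh command, and the LOADED_IMAGE value is hard-coded from the question's sample; neither is part of the original answer.

```shell
# sh reads the here-document as its script, much as the remote shell started by ssh would.
# The quoted delimiter ('SSHTAG') stops the local shell from expanding $LOADED_IMAGE.
RESULT=$(sh <<'SSHTAG'
LOADED_IMAGE='Loaded image: rcsmw-ee:R8A144'
echo "$LOADED_IMAGE" | grep -o ':[[:upper:]][[:alnum:]_]*'
SSHTAG
)
echo "$RESULT"
```

With an unquoted delimiter, $LOADED_IMAGE would be expanded by the local shell before the script is sent, which is usually not what you want here.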

Related

Get the following character which match a string

I'm trying to retrieve a specific value from the output of a command. Here is my command line:
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0
Which gives me this result:
IF-MIB::ifDescr.4 = STRING: tun0
From this result I want to retrieve 4. I thought of using a regex, but maybe there is an easier way to fetch it.
Regexes I tried:
\ifDescr.\s+\K\S+ https://regex101.com/r/9X04MD/1
[\n\r].*ifDescr.\s*([^\n\r]*) https://regex101.com/r/9X04MD/2
I would like to fetch it in a single command line like
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0 | ?
There are many options that don't involve GNU grep's experimental -P option. For example, given just your sample input to work from, here's one way with any sed:
$ echo "$out" | sed 's/.*\.\([0-9]\).*tun0/\1/'
4
or any awk:
$ echo "$out" | awk -F'[. ]' '/tun0/{print $2}'
4
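The sed command above captures a single digit. If the interface index can have several digits, a variant along the following lines may be more robust. This is only a sketch: the two-digit sample line is made up for illustration, and the variable names are assumptions.

```shell
# Sample line modelled on the question, with a two-digit index to exercise the pattern.
out='IF-MIB::ifDescr.12 = STRING: tun0'

# sed: capture the run of digits between "ifDescr." and the following space.
idx_sed=$(printf '%s\n' "$out" | sed -n 's/.*ifDescr\.\([0-9][0-9]*\) .*tun0$/\1/p')

# awk: split on dots and spaces; the index is the second field.
idx_awk=$(printf '%s\n' "$out" | awk -F'[. ]' '/tun0$/ {print $2}')

echo "$idx_sed $idx_awk"
```

Both variables end up holding 12.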
I'd recommend the pattern (?<=ifDescr\.)[^ =]+
Explanation:
(?<=ifDescr\.) - positive lookbehind; asserts that what precedes the match is ifDescr.
[^ =]+ - matches one or more characters other than a space or an equals sign =
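For a quick check on the command line, the lookbehind pattern can be tried like this. The sketch assumes GNU grep built with PCRE support (the -P option), and the sample line is hard-coded from the question.

```shell
# Sample line from the question (hard-coded for illustration).
out='IF-MIB::ifDescr.4 = STRING: tun0'

# -P enables PCRE (GNU grep only); -o prints just the text matched after the lookbehind.
idx=$(printf '%s\n' "$out" | grep -oP '(?<=ifDescr\.)[^ =]+')
echo "$idx"
```

In the original pipeline this would replace the trailing "?" after grep tun0.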

Print matching regex group in grep

I have this text https://bitbucket.com/user/repo.git and I want to print repo, the content between / and .git, without including delimiters. I have this:
echo https://bitbucket.com/user/repo.git | grep -E -o '\/(.*?)\.git'
But it prints /repo.git. How can I print just repo?
Use the [^/]+(?=\.git$) pattern with the -P option:
echo https://bitbucket.com/user/repo.git | grep -P -o '[^/]+(?=\.git$)'
The [^/]+(?=\.git$) pattern matches one or more characters other than / that are followed by .git at the end of the string.
You can also use sed to do that:
echo https://bitbucket.com/user/repo.git | sed -e 's/^.*\/\(.*\)\.git$/\1/'
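If a regex is not strictly required, the same result can be had with basename or with shell parameter expansion. This is a different technique from the grep and sed answers above, shown here only as a sketch; the variable names are assumptions.

```shell
url='https://bitbucket.com/user/repo.git'

# basename drops everything up to the last slash and the given suffix.
repo=$(basename "$url" .git)
echo "$repo"

# The same with parameter expansion only, no external command.
tmp=${url##*/}      # remove the longest prefix ending in "/"
echo "${tmp%.git}"  # remove the ".git" suffix
```

Both echo commands print repo.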

grep with extended regex over multiple lines

I'm trying to get a pattern over multiple lines. I would like to ensure the line I'm looking for ends in \r\n and that there is specific text that comes after it at some point. The two problems I've had are I often get unmatched parenthesis in groupings or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks
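Instead of -E, you can use -P for PCRE support in GNU grep, which lets you write advanced regexes such as the following (ggrep here is GNU grep under the name it commonly has on macOS; on Linux it is plain grep):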
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*cd'
1
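With -E, the newline that $'...' puts into the pattern is the problem: grep treats a pattern containing a newline as a list of separate patterns, one per line. $'(\r\n)+.*TEST' therefore becomes two patterns, an opening parenthesis followed by a carriage return, and )+.*TEST, hence the unmatched parenthesis. Likewise $'\r\n.*TEST' becomes a lone carriage return and .*TEST; the carriage return matches, which gives the count of 1. With -P and plain single quotes, \r\n reaches PCRE as escape sequences, so the pattern stays in one piece.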
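The false positive in the question comes from how grep handles a newline inside a pattern: it splits the pattern into a list of alternatives, one per line. The following sketch shows that effect with simpler, made-up strings (foo, bar and TEST are placeholders, not the question's data).

```shell
# Build a pattern containing a literal newline: "foo", newline, ".*TEST".
pattern=$(printf 'foo\n.*TEST')

# grep treats this as two patterns, "foo" and ".*TEST"; a line matching either one counts.
count=$(printf 'foo\nbar\n' | grep -c -E "$pattern")
echo "$count"
```

The count is 1 even though TEST appears nowhere in the input, because the first line matches the pattern foo on its own. The same splitting turns $'(\r\n)+.*TEST' into two patterns, the first of which has an unbalanced parenthesis.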

Extract IPv4 and IPv6 Address Ranges in Bash?

I'm writing a bash script in which I need to extract IPv4 and IPv6 Address Ranges from multiple strings and then format it as per the requirements before saving to the file.
I've got the regex working fine: http://regexr.com?38jsb (Not optimized, roughly added)
However, in bash it throws an error when I use it with egrep: egrep: repetition-operator operand invalid
Here's my bash script:
#!/bin/bash
regex="(?>(?>([a-f\d]{1,4})(?>:(?1)){3}|(?!(?:.*[a-f\d](?>:|$)){})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f\d]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?4)){3}))\/\d{1,2}"
echo "v=abc ip4:127.0.0.1/19 ip4:192.168.1.1/32 ip4:192.168.2.50/20 ip6:2001:4860:4000::/36 ip6:2404:6800:4000::/36 ip6:2607:f8b0:4000::/36 ip6:2800:3f0:4000::/36 ip6:2a00:1450:4000::/36 ip6:2c0f:fb50:4000::/36 ~all" | egrep -o $regex
How can I extract both types of IP ranges in bash? What's a better solution?
Note: I'm using sample data for testing purposes.
First, single-quote the regex variable assignment (regex='...').
Then use grep -Po (and double-quote $regex), as @BroSlow suggests. -P activates support for PCREs (Perl-Compatible Regular Expressions), which your regex requires. Note that -P is not available on all platforms (e.g., OSX).
To put it all together:
regex='(?>(?>([a-f\d]{1,4})(?>:(?1)){3}|(?!(?:.*[a-f\d](?>:|$)){})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f\d]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?4)){3}))\/\d{1,2}'
txt="v=abc ip4:127.0.0.1/19 ip4:192.168.1.1/32 ip4:192.168.2.50/20 ip6:2001:4860:4000::/36 ip6:2404:6800:4000::/36 ip6:2607:f8b0:4000::/36 ip6:2800:3f0:4000::/36 ip6:2a00:1450:4000::/36 ip6:2c0f:fb50:4000::/36 ~all"
echo "$txt" | grep -Po "$regex"
Alternative: Following @l'L'l's example, here's a greatly simplified solution that works with the sample data (again relies on -P):
echo "$txt" | grep -Po '\bip[46]:\K[^ ]+'
Variant for OSX, where grep doesn't support -P:
echo "$txt" | egrep -o '\<ip[46]:[^ ]+' | cut -c 5-
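Where neither -P nor -o is available, the same extraction can be sketched in plain awk. The shortened sample record and the variable names below are assumptions for illustration; the logic relies on every range being a whitespace-separated field that starts with ip4: or ip6:, as in the sample data.

```shell
# Shortened sample record from the question (hard-coded for illustration).
txt='v=abc ip4:127.0.0.1/19 ip4:192.168.1.1/32 ip6:2001:4860:4000::/36 ~all'

# Walk the whitespace-separated fields; for ip4:/ip6: fields, print what follows the 4-character prefix.
ranges=$(printf '%s\n' "$txt" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^ip[46]:/) print substr($i, 5) }')
echo "$ranges"
```

This prints one range per line, which can then be redirected to a file.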
This pattern should work in combination with sed:
str="v=abc ip4:127.0.0.1/19 ip4:192.168.1.1/32 ip4:192.168.2.50/20 ip6:2001:4860:4000::/36 ip6:2404:6800:4000::/36 ip6:2607:f8b0:4000::/36 ip6:2800:3f0:4000::/36 ip6:2a00:1450:4000::/36 ip6:2c0f:fb50:4000::/36 ~all"
echo $str | grep -s -i -o "ip[0-9]\:[a-z0-9\.:/]*" --color=always | sed 's/ip[0-9]\://g'
Output:
127.0.0.1/19
192.168.1.1/32
192.168.2.50/20
2001:4860:4000::/36
2404:6800:4000::/36
2607:f8b0:4000::/36
2800:3f0:4000::/36
2a00:1450:4000::/36
2c0f:fb50:4000::/36
Omit --color=always if you do not want color escape codes in the output, for example when saving it to a file.

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do that? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the prefix isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry, it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
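A variant of the grep-plus-cut idea that avoids \w (a GNU extension) could look like the sketch below. The json variable is an assumption holding the sample string from the question. Including the closing quote and the colon in the pattern keeps it from matching the "scheme_version" value of _id.

```shell
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Match the key (closing quote and colon included) plus its digits, then keep the part after the colon.
value=$(printf '%s\n' "$json" | grep -o '"scheme_version":[0-9]*' | cut -d: -f2)
echo "$value"
```

This prints 1234 for the sample string.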
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the reset-match-start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
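To confirm that the two approaches return the same text, both can be run side by side. This sketch assumes GNU grep with -P support; the json variable holding the question's sample string is an assumption.

```shell
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Lookbehind: assert the prefix without making it part of the match.
with_lookbehind=$(printf '%s\n' "$json" | grep -Po '(?<=scheme_version":)[0-9]+')

# \K: match the prefix, then discard it from the reported match.
with_reset=$(printf '%s\n' "$json" | grep -Po 'scheme_version":\K[0-9]+')

echo "$with_lookbehind $with_reset"
```

Both variables hold 1234; only the way the prefix is excluded differs.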
To avoid grep's PCRE feature, which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
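The -r (--replace) option replaces each match with the given text; capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.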
Another example, with Python's json.tool module, which can validate and pretty-print JSON:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
To improve on @potong's answer, which only works for "scheme_version", you can use this expression:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
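Since the three commands differ only in the key name, the expression can be wrapped in a small function. json_field is a hypothetical helper name, not part of the original answer, and the sketch assumes flat, single-line JSON and a key without regex metacharacters or slashes.

```shell
# json_field is a hypothetical helper; it reads JSON on stdin and prints the value of key $1.
# It only handles flat, single-line JSON with simple string or number values.
json_field() {
    sed -n 's/.*"'"$1"'":["]*\([^(",})]*\)[",}].*/\1/p'
}

json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

id=$(printf '%s\n' "$json" | json_field _id)
rev=$(printf '%s\n' "$json" | json_field _rev)
version=$(printf '%s\n' "$json" | json_field scheme_version)
echo "$id $rev $version"
```

For anything beyond such simple input, a real JSON parser such as jq is the safer choice.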