Regex using sed and/or grep

How can I display the arch and version of a queried rpm package using sed or grep?
[root@kitchen-vm-centos6-box boot]# rpm -qa | grep kernel-devel
kernel-devel-2.6.32-642.11.1.el6.x86_64
kernel-devel-2.6.32-696.10.2.el6.x86_64
What I need is only the version and arch, e.g.:
2.6.32-642.11.1.el6.x86_64
What is missing in my sed? => sed 's/[^\.]\+\.//'
Thanks in advance!

You can also use cut:
rpm -qa | grep kernel-devel | cut -d \- -f 3-4

You can use sed like this and avoid an extra grep (-n and the p flag make sed print only the lines it changed):
rpm -qa | sed -n '/kernel-devel/s/^[^0-9]*//p'
2.6.32-642.11.1.el6.x86_64
2.6.32-696.10.2.el6.x86_64

Your sed removes everything up to and including the first dot (kernel-devel-2.), because [^\.]\+ matches the whole run of non-dot characters and \. then matches the dot itself.
You can fix this easily by making the regex more explicit.
Other answers already suggested solutions, here's another one using grep:
$ rpm -qa | grep -oP "devel-\K(.*)"
2.6.32-642.11.1.el6.x86_64
2.6.32-696.10.2.el6.x86_64
\K tells the engine to pretend that the match started at this position, so everything matched before it is left out of the output (this is the alternative Perl suggests for lookbehind).
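To see the difference concretely, here is a minimal sketch that uses made-up sample input in place of rpm -qa. The function names (sample_packages, original_sed, explicit_sed) are hypothetical, and the question's expression is written with * instead of the GNU-only \+, which behaves the same on this input.

```shell
# Hypothetical stand-in for `rpm -qa` output.
sample_packages() {
  printf '%s\n' \
    'kernel-2.6.32-696.10.2.el6.x86_64' \
    'kernel-devel-2.6.32-642.11.1.el6.x86_64' \
    'kernel-devel-2.6.32-696.10.2.el6.x86_64'
}

# The question's approach: delete everything up to and including the first dot.
original_sed() {
  sample_packages | grep 'kernel-devel' | sed 's/[^.]*\.//'
}

# More explicit: delete only the literal "kernel-devel-" prefix,
# and print only the lines where that substitution happened.
explicit_sed() {
  sample_packages | sed -n 's/^kernel-devel-//p'
}

original_sed   # the leading "2." is lost as well
explicit_sed   # full version and arch are kept
```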

You can do it with grep only:
rpm -qa | grep -P -o '(?<=kernel-devel-).*'
Explanation:
-o prints only the matched part of each line.
-P enables Perl-compatible regexes, which support lookarounds.
(?<=...) is a lookbehind: it requires the given text before the match but is not part of the match, so -o does not print it.
Of course, sed can help too:
rpm -qa | grep 'kernel-devel' | sed 's/^[^.0-9]*-//g'
Explanation:
^ matches the start of the line.
[^.0-9]*- matches the run of non-dot, non-digit characters at the start of the line, up to and including a hyphen. This is the part that we don't need.
The empty replacement (//) deletes the matched part; the g flag is redundant here because the pattern is anchored with ^.

One in awk:
$ rpm -qa | awk 'match($0,/^kernel-devel-./){print substr($0,RLENGTH)}'
2.6.32-642.11.1.el6.x86_64
2.6.32-696.10.2.el6.x86_64
Explained:
match($0,/^kernel-devel-./) { # if the record starts with kernel-devel-[ANYTHING]
print substr($0,RLENGTH) # print starting from the [ANYTHING]
}
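The same idea can be sketched without hard-coding the package name. This is only an illustration: strip_prefix and its argument are hypothetical, and it uses index() rather than match() so that the prefix is compared literally instead of as a regex.

```shell
# Print what follows "<name>-" for every input line that starts with it.
# $1 is the package name; the package list is read from stdin.
strip_prefix() {
  awk -v prefix="$1-" '
    index($0, prefix) == 1 {                # line starts with the prefix
      print substr($0, length(prefix) + 1)  # print everything after it
    }'
}

printf '%s\n' \
  'kernel-2.6.32-696.10.2.el6.x86_64' \
  'kernel-devel-2.6.32-642.11.1.el6.x86_64' | strip_prefix kernel-devel
```

With rpm installed, this would be called as rpm -qa | strip_prefix kernel-devel.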

Related

Extract version using grep/regex in bash

I have a file that has a line stating
version = "12.0.08-SNAPSHOT"
The word version and quoted strings can occur on multiple lines in that file.
I am looking for a single line bash statement that can output the following string:
12.0.08-SNAPSHOT
The version can have RELEASE tag too instead of SNAPSHOT.
So to summarize, given
version = "12.0.08-SNAPSHOT"
expected output: 12.0.08-SNAPSHOT
And given
version = "12.0.08-RELEASE"
expected output: 12.0.08-RELEASE

The following command prints the strings quoted in version = "...":
grep -Po '\bversion\s*=\s*"\K.*?(?=")' yourFile
-P enables perl regexes, which allow us to use features like \K and so on.
-o only prints matched parts instead of the whole lines.
\b ensures that version starts at a word boundary and we do not match things like abcversion.
\s stands for any kind of whitespace.
\K lets grep forget that it matched the part before \K. The forgotten part will not be printed.
.*? matches as few characters as possible (the matching part will be printed) ...
(?=") ... until we see a ", which won't be included in the match either (this is called a lookahead).
Not all grep implementations support the -P option. Alternatively, you can use perl:
perl -nle 'print $& if m{\bversion\s*=\s*"\K.*?(?=")}' yourFile
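Where neither grep -P nor perl is available, a plain sed substitution can serve as a rough equivalent. The sketch below is not a drop-in replacement: the function name extract_version is made up, and instead of \b it requires version to be the first word on the line.

```shell
# Print the quoted value of every `version = "..."` line.
# Reads the files given as arguments, or stdin when there are none.
extract_version() {
  sed -n 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$@"
}

printf '%s\n' 'name = "demo"' 'version = "12.0.08-SNAPSHOT"' | extract_version
```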

Seems like a job for cut:
$ echo 'version = "12.0.08-SNAPSHOT"' | cut -d'"' -f2
12.0.08-SNAPSHOT
$ echo 'version = "12.0.08-RELEASE"' | cut -d'"' -f2
12.0.08-RELEASE

Portable solution:
$ echo 'version = "12.0.08-RELEASE"' |sed -E 's/.*"(.*)"/\1/g'
12.0.08-RELEASE
or even:
$ perl -pe 's/.*"(.*)"/$1/'
$ awk -F"\"" '{print $2}'

Print matching regex group in grep

I have this text https://bitbucket.com/user/repo.git and I want to print repo, the content between / and .git, without including delimiters. I have this:
echo https://bitbucket.com/user/repo.git | grep -E -o '\/(.*?)\.git'
But it prints /repo.git. How can I print just repo?

Use the [^/]+(?=\.git$) pattern with -P option:
echo https://bitbucket.com/user/repo.git | grep -P -o '[^/]+(?=\.git$)'
The [^/]+(?=\.git$) pattern matches 1+ chars other than / that are followed with .git at the end of the string.
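If the URL is already in a shell variable, the same result can be had without grep at all. This swaps the regex for POSIX parameter expansion, which is a different technique from the lookahead above; repo_name is a hypothetical helper name.

```shell
# Strip everything up to the last "/" and then a trailing ".git".
repo_name() {
  name=${1##*/}                 # "repo.git"
  printf '%s\n' "${name%.git}"  # "repo"
}

repo_name https://bitbucket.com/user/repo.git
```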

You can use sed to do that:
echo https://bitbucket.com/user/repo.git | sed -e 's/^.*\/\(.*\)\.git$/\1/'

Parsing log file

I am trying to parse a text like this from a log file:
[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11
[2016-01-29 11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11
I want to extract these numbers 1030140, 1025311, 1025158 and so on.
I have tried the following
cat deals29.txt | egrep -o '[0-9]+'
But this gives other digits as well
I tried
cat deals29.txt | egrep -o ' [0-9]+:::'
but now it gives the colons in the output as well and there is no way to capture the group in the command line version of grep.
Any suggestions? grep solution would be preferred but I can go with sed/awk as well if grep cannot do the job.

Using grep -oP and match reset \K:
grep -oP '^\[.*?\] \K\d+' file.log
1030140
1025311
1025158
If your grep doesn't support -P (PCRE) then use awk:
awk -F '\\] |:::' '{print $2}' file.log
1030140
1025311
1025158
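A sed-only variant is also possible where PCRE is unavailable. This is a sketch under the assumption that every record sits on its own line in the form [timestamp: LEVEL/Worker-N] ID:::...; first_id is a hypothetical name.

```shell
# Print the number that follows the bracketed log prefix on each line.
# Reads the files given as arguments, or stdin when there are none.
first_id() {
  sed -n 's/^\[[^]]*] \([0-9][0-9]*\):::.*/\1/p' "$@"
}

printf '%s\n' \
  '[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11' \
  '[2016-01-29 11:31:34,103: WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11' |
  first_id
```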

You can practice regexes here: https://regex101.com/
I came up with
] [0-9]*
and then you have to delete the first 2 characters of each match.

You could use a solution like:
(\d{3,})::
# looks for at least 3 digits (or more) followed by two colons
# puts the matched numbers in group 1
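On the command line, grep -o cannot print just group 1, so one way to apply this pattern is to match the digits together with the colons and then strip the colons. The sketch assumes a grep that supports -o and -E (GNU and BSD grep do); ids_before_colons is a hypothetical name.

```shell
# Match 3 or more digits followed by "::", then delete the colons from each match.
ids_before_colons() {
  grep -oE '[0-9]{3,}::' "$@" | tr -d ':'
}

printf '%s\n' \
  '[2016-01-29 11:31:33,809: WARNING/Worker-1283] 1030140:::DEAL_OF_DAY:::29:::1:::11' |
  ids_before_colons
```

The timestamp digits are skipped because none of them is followed by two colons, and 29 is skipped because it has fewer than three digits.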

grep with extended regex over multiple lines

I'm trying to get a pattern over multiple lines. I would like to ensure the line I'm looking for ends in \r\n and that there is specific text that comes after it at some point. The two problems I've had are I often get unmatched parenthesis in groupings or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks

Instead of -E you can use -P for PCRE support in GNU grep (invoked as ggrep here; on most Linux systems it is plain grep) and pass \r\n as regex escapes inside ordinary single quotes:
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*cd'
1
grep -E matches patterns within a single line only; a literal newline in the pattern does not match a line break.
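One detail helps explain both results in the question: bash expands $'\r\n' to real CR and LF characters before grep runs, and grep treats a newline inside a pattern as a separator between alternative patterns. The first command therefore hands grep two patterns, "(" followed by CR and then ")+.*TEST", hence the unmatched parenthesis. The second command matches on the lone CR. A minimal sketch of the splitting behaviour, with made-up input and a hypothetical function name:

```shell
# The pattern below contains a literal newline, so grep reads it as
# two patterns, "ab" OR "ef", rather than as "ab<newline>ef".
count_split_pattern() {
  pattern='ab
ef'
  printf 'ab\ncd\nef\n' | grep -c "$pattern"
}

count_split_pattern   # counts the lines matching either pattern
```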

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do this? I know I can add a sed call, but I would prefer to do it with a single grep.

You'll need to use a lookbehind assertion so that the key isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'

This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2

I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234

As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regex101 shows that the reset match start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.

To avoid using grep's PCRE feature, which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
-r replaces each match with the given text; capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement.
Another example with Python and json.tool module which can validate and pretty-print:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?

You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'

Improving on @potong's answer, which only extracts "scheme_version", you can use this expression for any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
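The three commands differ only in the key name, so they can be folded into one helper. This is a sketch, not a JSON parser: json_value is a hypothetical name, and it assumes flat, single-line JSON whose keys and values contain no escaped quotes. For anything more complex, jq is the safer tool.

```shell
# Print the value of key $1 from one-line flat JSON on stdin.
# An optional opening quote is skipped; the value ends at the next
# quote, comma or closing brace.
json_value() {
  sed -n 's/.*"'"$1"'":"\{0,1\}\([^",}]*\).*/\1/p'
}

doc='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'
printf '%s\n' "$doc" | json_value _id
printf '%s\n' "$doc" | json_value _rev
printf '%s\n' "$doc" | json_value scheme_version
```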
