I have a file that has a line stating
version = "12.0.08-SNAPSHOT"
The word version and quoted strings can occur on multiple lines in that file.
I am looking for a single line bash statement that can output the following string:
12.0.08-SNAPSHOT
The version can have RELEASE tag too instead of SNAPSHOT.
So to summarize, given
version = "12.0.08-SNAPSHOT"
expected output: 12.0.08-SNAPSHOT
And given
version = "12.0.08-RELEASE"
expected output: 12.0.08-RELEASE
The following command prints strings enquoted in version = "...":
grep -Po '\bversion\s*=\s*"\K.*?(?=")' yourFile
-P enables perl regexes, which allow us to use features like \K and so on.
-o only prints matched parts instead of the whole lines.
\b ensures that version starts at a word boundary and we do not match things like abcversion.
\s stands for any kind of whitespace.
\K lets grep forget, that it matched the part before \K. The forgotten part will not be printed.
.*? matches as few chararacters as possible (the matching part will be printed) ...
(?=") ... until we see a ", which won't be included in the match either (this is called a lookahead).
Not all grep implementations support the -P option. Alternatively, you can use perl, as described in this answer:
perl -nle 'print $& if m{\bversion\s*=\s*"\K.*?(?=")}' yourFile
Seems like a job for cut:
$ echo 'version = "12.0.08-SNAPSHOT"' | cut -d'"' -f2
12.0.08-SNAPSHOT
$ echo 'version = "12.0.08-RELEASE"' | cut -d'"' -f2
12.0.08-RELEASE
Portable solution:
$ echo 'version = "12.0.08-RELEASE"' |sed -E 's/.*"(.*)"/\1/g'
12.0.08-RELEASE
or even:
$ perl -pe 's/.*"(.*)"/\1/g'.
$ awk -F"\"" '{print $2}'
I have this text https://bitbucket.com/user/repo.git and I want to print repo, the content between / and .git, without including delimiters. I have this:
echo https://bitbucket.com/user/repo.git | grep -E -o '\/(.*?)\.git'
But it prints /repo.git. How can I print just repo?
Use the [^/]+(?=\.git$) pattern with -P option:
echo https://bitbucket.com/user/repo.git | grep -P -o '[^/]+(?=\.git$)'
See the online demo
The [^/]+(?=\.git$) pattern matches 1+ chars other than / that are followed with .git at the end of the string.
You can use sed to do that
echo https://bitbucket.com/user/repo.git | sed -e 's/^.\*\\/\\(.\*\\).git$/\1/g'
I am trying to parse a text like this from a log file:
[2016-01-29 11:31:33,809: WARNING/Worker-1283]
1030140:::DEAL_OF_DAY:::29:::1:::11 [2016-01-29 11:31:34,103:
WARNING/Worker-1197] 1025311:::DEAL_OF_DAY:::29:::1:::11 [2016-01-29
11:31:34,291: WARNING/Worker-1197] 1025158:::DEAL_OF_DAY:::29:::1:::11
I want to extract these numbers 1030140, 1025311, 1025158 and so on.
I have tried the following
cat deals29.txt | egrep -o '[0-9]+'
But this gives other digits as well
I tried
cat deals29.txt | egrep -o ' [0-9]+:::'
but now it gives the colons in the output as well and there is no way to capture the group in the command line version of grep.
Any suggestions? grep solution would be preferred but I can go with sed/awk as well if grep cannot do the job.
Using grep -oP and match reset \K:
grep -oP '^\[.*?\] \K\d+' file.log
1030140
1025311
1025158
If your grep doesn't support -P (PCRE) then use awk:
awk -F '\\] |:::' '{print $2}' file.log
1030140
1025311
1025158
You can train regex here : https://regex101.com/
I get
] [0-9]*
and you have to delete the first 2 chars
You could use a solution like:
(\d{3,})::
# looks for at least 3 digits (or more) followed by two colons
# puts the matched numbers in group 1
See a demo for this approach here.
I'm trying to get a pattern over multiple lines. I would like to ensure the line I'm looking for ends in \r\n and that there is specific text that comes after it at some point. The two problems I've had are I often get unmatched parenthesis in groupings or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks
Instead of -E you can use -P for PCRE support in gnu grep to use advanced regex like this:
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*cd'
1
grep -E matches only in single line input.
I have following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get value of "scheme version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I make it? I know I can add sed call, but I would prefer to do it with single grep.
You'll need to use a look behind assertion so that it isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regexp101 demonstrates that the reset match start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
To avoid using greps PCRE feature which is available in GNU grep, but not in BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
-r Capture group indices (e.g., $5) and names (e.g., $foo).
Another example with Python and json.tool module which can validate and pretty-print:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Improving #potong's answer that works only to get "scheme_version", you can use this expression :
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234