I get a blank echo from regex

I get a blank echo from regex - regex

If I run this file, it works fine and outputs the lines I expect:
workspaceFile=`cat tensorflow/workspace.bzl`
echo $workspaceFile | grep -oP '\/[a-z0-9]{12}.tar.gz'
However, if I run this, all I get is blank output in the terminal:
workspaceFile=`cat tensorflow/workspace.bzl`
TAR_FILE_WITH_SLASH=$workspaceFile | grep -oP '\/[a-z0-9]{12}.tar.gz'
echo $TAR_FILE_WITH_SLASH
The file is quite long so I'll add a shortened version here for simplicity's sake:
tf_http_archive(
name = "eigen_archive",
urls = [
"https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/6913f0cf7d06.tar.gz",
"https://bitbucket.org/eigen/eigen/get/6913f0cf7d06.tar.gz",
],

You need to use $() syntax, echo the contents of workspaceFile and then pipe the grep command:
TAR_FILE_WITH_SLASH="$(echo $workspaceFile | grep -oE '/[a-z0-9]{12}\.tar\.gz')"
Also, note you need no PCRE regex here, you can use a POSIX ERE regex (that is, replace P with E). You may even use a POSIX BRE pattern here, like grep -o '/[a-z0-9]\{12\}\.tar\.gz'. The dot must be escaped to match a literal dot and the / is not special here and needs no escaping.
See the online demo.

What's about the path?
workspaceFile=`cat ~/tensorflow/workspace.bzl`

Related

Grep regex not working with square brackets

So I was trying to write a regex in grep to match square brackets, i.e [ad] should match [ and ]. But I was getting different results on using capturing groups and character classes. Also the result is different on putting ' in the beginning and end of regex string.
So these are the different result that I am getting.
Using capturing groups works fine
echo "[ad]" | grep -E '(\[|\])'
[ad]
Using capturing groups without ' gives syntax error
echo "[ad]" | grep -E (\[|\])
bash: syntax error near unexpected token `('
using character class with [ followed by ] gives no output
echo "[ad]" | grep -E [\[\]]
Using character class with ] followed by [ works correctly
echo "[ad]" | grep -E [\]\[]
[ad]
Using character class with ] followed by [ and using ' does not work
echo "[ad]" | grep -E '[\]\[]'
It'd be great if someone could explain the difference between them.

You should know about:
BRE ( = Basic Regular Expression )
ERE ( = Extended Regular Expression )
BRE metacharacters require a backslash to give them their special meaning and grep is based on
The ERE flavor standardizes a flavor similar to the one used by the UNIX egrep command.
Pay attention to -E and -G
grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c
Regexp selection and interpretation:
-E, --extended-regexp PATTERN is an extended regular expression (ERE)
-F, --fixed-strings PATTERN is a set of newline-separated strings
-G, --basic-regexp PATTERN is a basic regular expression (BRE)
-P, --perl-regexp PATTERN is a Perl regular expression
...
...
POSIX Basic Regular Expressions
POSIX Extended Regular Expressions
POSIX Bracket Expressions
And you should also know about bash, since some of your input is related to bash interpreter not grep or anything else
echo "[ad]" | grep -E (\[|\])
Here bash assumes you try to use () something like:
echo $(( 10 * 10 ))
and by using single quote ' you tell the bash that you do not want it treats as a special operator for it. So
echo "[ad]" | grep -E '(\[|\])'
is correct.

Firstly, always quote Regex pattern to prevent shell interpretation beforehand:
$ echo "[ad]" | grep -E '(\[|\])'
[ad]
Secondly, within [] surrounded by quotes, you don't need to escape the [] inside, just write them as is within the outer []:
$ echo "[ad]" | grep -E '[][]'
[ad]

Maybe you provided such a simple example on purpose (after all, it is minimal), but in case all you really want is to check for existence of square brackets (a fixed string, not regex pattern), you can use grep with -F/--fixed-strings and multiple -e options:
$ echo "[ad]" | grep -F -e '[' -e ']'
[ad]
Or, a little bit shorter with fgrep:
$ echo "[ad]" | fgrep -e '[' -e ']'
[ad]
Or, even:
$ echo "[ad]" | fgrep -e[ -e]
[ad]

Extract version using grep/regex in bash

I have a file that has a line stating
version = "12.0.08-SNAPSHOT"
The word version and quoted strings can occur on multiple lines in that file.
I am looking for a single line bash statement that can output the following string:
12.0.08-SNAPSHOT
The version can have RELEASE tag too instead of SNAPSHOT.
So to summarize, given
version = "12.0.08-SNAPSHOT"
expected output: 12.0.08-SNAPSHOT
And given
version = "12.0.08-RELEASE"
expected output: 12.0.08-RELEASE

The following command prints strings enquoted in version = "...":
grep -Po '\bversion\s*=\s*"\K.*?(?=")' yourFile
-P enables perl regexes, which allow us to use features like \K and so on.
-o only prints matched parts instead of the whole lines.
\b ensures that version starts at a word boundary and we do not match things like abcversion.
\s stands for any kind of whitespace.
\K lets grep forget, that it matched the part before \K. The forgotten part will not be printed.
.*? matches as few chararacters as possible (the matching part will be printed) ...
(?=") ... until we see a ", which won't be included in the match either (this is called a lookahead).
Not all grep implementations support the -P option. Alternatively, you can use perl, as described in this answer:
perl -nle 'print $& if m{\bversion\s*=\s*"\K.*?(?=")}' yourFile

Seems like a job for cut:
$ echo 'version = "12.0.08-SNAPSHOT"' | cut -d'"' -f2
12.0.08-SNAPSHOT
$ echo 'version = "12.0.08-RELEASE"' | cut -d'"' -f2
12.0.08-RELEASE

Portable solution:
$ echo 'version = "12.0.08-RELEASE"' |sed -E 's/.*"(.*)"/\1/g'
12.0.08-RELEASE
or even:
$ perl -pe 's/.*"(.*)"/\1/g'.
$ awk -F"\"" '{print $2}'

grep regex with backtick matches all lines

$ cat file
anna
amma
kklks
ksklaii
$ grep '\`' file
anna
amma
kklks
ksklaii
Why? How is that match working ?

This appears to be a GNU extension for regular expressions. The backtick ('\`') anchor matches the very start of a subject string, which explains why it is matching all lines. OS X apparently doesn't implement the GNU extensions, which would explain why your example doesn't match any lines there. See http://www.regular-expressions.info/gnu.html
If you want to match an actual backtick when the GNU extensions are in effect, this works for me:
grep '[`]' file

twm's answer provides the crucial pointer, but note that it is the sequence \`, not ` by itself that acts as the start-of-input anchor in GNU regexes.
Thus, to match a literal backtick in a regex specified as a single-quoted shell string, you don't need any escaping at all, neither with GNU grep nor with BSD/macOS grep:
$ { echo 'ab'; echo 'c`d'; } | grep '`'
c`d
When using double-quoted shell strings - which you should avoid for regexes, for reasons that will become obvious - things get more complicated, because you then must escape the ` for the shell's sake in order to pass it through as a literal to grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\`"
c`d
Note that, after the shell has parsed the "..." string, grep still only sees `.
To recreate the original command with a double-quoted string with GNU grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\\\`" # !! BOTH \ and ` need \-escaping
ab
c`d
Again, after the shell's string parsing, grep sees just \`, which to GNU grep is the start-of-the-input anchor, so all input lines match.
Also note that since grep processes input line by line, \` has the same effect as ^ the start-of-a-line anchor; with multi-line input, however - such as if you used grep -z to read all lines at once - \` only matches the very start of the whole string.
To BSD/macOS grep, \` simply escapes a literal `, so it only matches input lines that contain that character.

grep part of text from ps output with regex

From ps -ef command output -Dorg.xxx.yyy=/home/user/aaa/server.log.
I'd like to extract the file path /home/user/aaa/server.log (can be any name.file).
Now, I'm using command:
ps -ef | grep -Po '(?<=-Dorg.xxx.yyy=)[^\s]*'
It will display two matched results:
/home/user/aaa/server.log
)[^\s]*
It looks like it counts the command as well for the 2nd matched result. How can I remove it? Or is there other suggestions? (I can not use -m1).

If you just need the file name, use \K operator:
org\.xxx\.yyy=\K[^\s]*
ps -ef | grep -Po 'org\.xxx\.yyy=\K[^\s]*'
It will match the whole string, but will only print the file name matched with [^\s]*.
From perlre:
There is a special form of this construct, called \K (available since
Perl 5.10.0), which causes the regex engine to "keep" everything it
had matched prior to the \K and not include it in $& . This
effectively provides variable-length look-behind.

Use that:
grep -Po '(?<=-[D]org.xxx.yyy=)[^\s]*'
Just put one of the characters in square brackets ([D]). The meaning of the regex hasn't changed and the pattern doesn't match itself anymore.

Bash (grep) regex performing unexpectedly

I have a text file, which contains a date in the form of dd/mm/yyyy (e.g 20/12/2012).
I am trying to use grep to parse the date and show it in the terminal, and it is successful,
until I meet a certain case:
These are my test cases:
grep -E "\d*" returns 20/12/2012
grep -E "\d*/" returns 20/12/2012
grep -E "\d*/\d*" returns 20/12/2012
grep -E "\d*/\d*/" returns nothing
grep -E "\d+" also returns nothing
Could someone explain to me why I get this unexpected behavior?
EDIT: I get the same behavior if I substitute the " (weak quotes) for ' (strong quotes).

The syntax you used (\d) is not recognised by Bash's Extended regex.
Use grep -P instead which uses Perl regex (PCRE). For example:
grep -P "\d+/\d+/\d+" input.txt
grep -P "\d{2}/\d{2}/\d{4}" input.txt # more restrictive
Or, to stick with extended regex, use [0-9] in place of \d:
grep -E "[0-9]+/[0-9]+/[0-9]" input.txt
grep -E "[0-9]{2}/[0-9]{2}/[0-9]{4}" input.txt # more restrictive

You could also use -P instead of -E which allows grep to use the PCRE syntax
grep -P "\d+/\d+" file
does work too.

grep and egrep/grep -E don't recognize \d. The reason your first three patterns work is because of the asterisk that makes \d optional. It is actually not found.
Use [0-9] or [[:digit:]].

To help troubleshoot cases like this, the -o flag can be helpful as it shows only the matched portion of the line. With your original expressions:
grep -Eo "\d*" returns nothing - a clue that \d isn't doing what you thought it was.
grep -Eo "\d*/" returns / (twice) - confirmation that \d isn't matching while the slashes are.
As noted by others, the -P flag solves the issue by recognizing "\d", but to clarify Explosion Pills' answer, you could also use -E as follows:
grep -Eo "[[:digit:]]*/[[:digit:]]*/" returns 20/12/
EDIT: Per a comment by #shawn-chin (thanks!), --color can be used similarly to highlight the portions of the line that are matched while still showing the entire line:
grep -E --color "[[:digit:]]*/[[:digit:]]*/" returns 20/12/2012 (can't do color here, but the bold "20/12/" portion would be in color)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

I get a blank echo from regex - regex

What's about the path? workspaceFile=`cat ~/tensorflow/workspace.bzl`

Related

Grep regex not working with square brackets

Extract version using grep/regex in bash

grep regex with backtick matches all lines

grep part of text from ps output with regex

Bash (grep) regex performing unexpectedly

Categories

Resources