Why do my results appear to differ between ag and grep?

I'm having trouble correctly (and safely) executing the right regex searches with grep. I seem to be able to do what I want using ag.
What I want to do in plain English:
Search my current directory (recursively?) for files that have lines containing both the words "nested" and "merge"
Successful attempt with ag:
$ ag --depth=2 -l "nested.*merge|merge.*nested" .
scratch.md
scratch.rb
Unsuccessful attempt with grep:
$ grep -elr 'nested.*merge|merge.*nested' .
grep: nested.*merge|merge.*nested: No such file or directory
grep: .: Is a directory
What am I missing? Also, could either approach be improved?
Thanks!

You probably want -E not -e, or just egrep.
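man grep explains the error: -e takes the text that follows it as the pattern, so -elr was read as -e with the pattern lr, and your regex and . were then treated as file names (with no -r in effect, hence "Is a directory").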

You can use grep -lr 'nested.*merge\|merge.*nested' or grep -Elr 'nested.*merge|merge.*nested' for your case.
In the latter, -E selects ERE (extended regular expression) syntax. By default grep uses BRE (basic regular expressions), where | matches a literal | character and \| means alternation (a GNU extension).
For more detail about ERE and BRE, you can read this article
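To see the two spellings side by side, here is a minimal sketch, assuming GNU grep. The temporary directory and the file contents are made up for illustration; only the grep options come from the answers above.

```shell
# Throwaway directory with hypothetical files (not the asker's real data).
tmp=$(mktemp -d)
printf 'a nested hash merge\n' > "$tmp/scratch.md"
printf 'merge only\n' > "$tmp/other.txt"

# ERE (-E): a bare | is alternation.
ere=$(grep -Elr 'nested.*merge|merge.*nested' "$tmp")

# BRE (default): alternation has to be written \| (GNU extension).
bre=$(grep -lr 'nested.*merge\|merge.*nested' "$tmp")

printf 'ERE: %s\nBRE: %s\n' "$ere" "$bre"
rm -rf "$tmp"
```

Both commands should list only scratch.md, since other.txt has "merge" but not "nested" on the same line. With -l, grep stops reading a file at its first match, so this stays cheap on larger trees.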

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
I think you can match the run of non-/ characters at the start of each line, like:
grep -Eo '^[^/]+'
On another SO site there is another similar question with solution.
You don't need grep for this at all.
cut -d / -f 1
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning of line anchor, and the omission of the -P option (which you were not really using anyway) as well as the silly beginner error {1}.
(I should add that plain grep doesn't support parentheses or repetitions; grep -E would support these constructs just fine, or you could switch to the POSIX BRE variation, which requires a backslash to use round or curly parentheses as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
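For comparison, here is a small sketch of the anchored approach next to cut, assuming GNU grep. The optional /? at the end of the pattern is my addition so that the trailing slash from the desired output is kept; the answers above drop it.

```shell
# Sample input from the question, held in a variable for the demo.
input='.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md'

# ^ pins the match to the start of the line, so -o prints at most one
# match per line; /? keeps a trailing slash when one is present.
first=$(printf '%s\n' "$input" | grep -Eo '^[^/]*/?')

# cut keeps only the first /-separated field (no trailing slash).
fields=$(printf '%s\n' "$input" | cut -d / -f 1)

printf '%s\n' "$first"
printf '%s\n' "$fields"
```

The grep variant gives .config/ and .oh-my-zsh/, while cut gives .config and .oh-my-zsh, so pick whichever matches the output you need.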

Grep or in part of a string

Good day All,
A filename can either be
abc_source_201501.csv Or,
abc_source2_201501.csv
Is it possible to do something like grep abc_source|source2_201501.csv without fully listing out filename as the filenames I'm working with are much longer than examples given to get both options?
Thanks for assistance here.
Use extended regex flag in grep.
For example:
grep -E 'abc_source.?_201501\.csv'
would match both names in your example (quote the pattern so the shell does not treat ? as a glob character). You can think of other regex patterns that would suit your data more.
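If the file names are themselves the text being searched, a quick sketch like the following shows how loose .? is compared with a tighter pattern. The two extra names (abc_sourceX and abc_other) are invented for the test, and the anchored 2? pattern is my assumption about what "either source or source2" should mean.

```shell
# Candidate names; the last two are hypothetical non-matches.
names='abc_source_201501.csv
abc_source2_201501.csv
abc_sourceX_201501.csv
abc_other_201501.csv'

# .? accepts any single optional character, so sourceX slips through.
loose=$(printf '%s\n' "$names" | grep -Ec 'abc_source.?_201501\.csv')

# 2? accepts only an optional literal 2; ^ and $ pin the whole name.
strict=$(printf '%s\n' "$names" | grep -Ec '^abc_source2?_201501\.csv$')

echo "loose=$loose strict=$strict"
```

Here -c prints the number of matching lines, which makes the difference between the two patterns easy to see.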
You can use Bash globbing to grep in several files at once.
For example, to grep for the string "hello" in all files with a filename that starts with abc_source and ends with 201501.csv, issue this command:
grep hello abc_source*201501.csv
You can also use the -r flag, to recursively grep in all files below a given folder - for example the current folder (.).
grep -r hello .
If you are asking about patterns for file name matching in the shell, the extended globbing facility in Bash lets you say
shopt -s extglob
grep stuff abc_source@(|2)_201501.csv
to search through both files with a single glob expression.
The simplest possibility is to use brace expansion:
grep pattern abc_{source,source2}_201501.csv
That's exactly the same as:
grep pattern abc_source{,2}_201501.csv
You can use several brace patterns in a single word:
grep pattern abc_source{,2}_2015{01..04}.csv
expands to
grep pattern abc_source_201501.csv abc_source_201502.csv \
abc_source_201503.csv abc_source_201504.csv \
abc_source2_201501.csv abc_source2_201502.csv \
abc_source2_201503.csv abc_source2_201504.csv
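You can preview what grep would receive without touching any files. This is a small sketch that calls bash explicitly, on the assumption that your script might otherwise run under a plain POSIX sh where brace expansion is not available.

```shell
# Brace expansion is a Bash feature, so run the expansion inside bash.
# printf prints one resulting word per line; no file has to exist.
expanded=$(bash -c 'printf "%s\n" abc_source{,2}_201501.csv')
printf '%s\n' "$expanded"
```

Unlike a glob, a brace pattern expands to every listed name whether or not the file exists, so grep will complain about any that are missing.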

Extract number embedded in string

So I run a curl command and grep for a keyword.
Here is the (sanitized) result:
...Dir');">Town / Village</a></th><th>Phone Number</th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...
I want to get the number 42 - a command line one-liner would be great.
search for the string helloThereId=
extract the number right beside it (42 in the above case)
Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.
You could use grep with -P (Perl-Regexp) parameter enabled.
$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42
\K here actually does the job of positive lookbehind. \K keeps the text matched so far out of the overall regex match.
References:
http://www.regular-expressions.info/keep.html
http://www.regular-expressions.info/lookaround.html
If your grep version supports -P, (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.
For the potential benefit of future readers, here are alternatives:
If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:
grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
Finally, if your grep supports neither -P nor -o, use sed. Here's a POSIX-compliant command with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):
sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:
$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42
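The two fallbacks that avoid -P can be checked against a shortened copy of the sample input. This is only a sketch; the line variable is an abbreviated stand-in for the real curl output.

```shell
# Abbreviated stand-in for the curl output from the question.
line="<a href=\"javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);\">"

# grep -o isolates key=value, then cut keeps the part after the = sign.
via_cut=$(printf '%s\n' "$line" | grep -Eo 'helloThereId=[0-9]+' | cut -d= -f2)

# sed captures the digits in \( \) and prints only lines where it substituted.
via_sed=$(printf '%s\n' "$line" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')

echo "cut: $via_cut"
echo "sed: $via_sed"
```

In a real pipeline you would replace the printf with the curl command itself.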

Regex to get delimited content with egrep

I would like to get the parameter (without parentheses) of a function call with a regular expression.
I am using egrep in a bash script with cygwin.
This is what I got so far (with parentheses):
$ echo "require(catch.me)" | egrep -o '\((.*?)\)'
(catch.me)
What would be the right regex here?
http://www.greenend.org.uk/rjk/2002/06/regexp.html
What you are looking for is lookbehind and lookahead assertions.
egrep cannot do that, but grep with Perl regex support (-P) can.
from man grep:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly experimental and grep -P may warn of unimplemented features.
So
$> echo "require(catch.me)" | grep -o -P '(?<=\().*?(?=\))'
catch.me
If you can use sed then the following would work -
echo "require(catch.me)" | sed 's/.*[^(](\(.*\))/\1/'
You can modify your existing regex to this
echo "require(catch.me)" | egrep -o 'c.*e'
Even though egrep offers this (from the man page)
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
It isn't really the correct utility. SED and AWK are masters at this. You will have much more control using either SED or AWK. :)
From the manual :
grep, egrep, fgrep - print lines matching a pattern
Basically, grep is meant to print complete matching lines, so it is not the right tool for extracting part of one.
What you should do is use another tool, maybe perl, for such operations.
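If -P is not available, two portable routes are sketched below. Neither is taken verbatim from the answers above: the [^)]* pattern and the tr step are my own assumptions about one reasonable way to do it.

```shell
call='require(catch.me)'

# sed (BRE): plain ( and ) are literals, \( \) is the capture group.
with_sed=$(printf '%s\n' "$call" | sed -n 's/.*(\([^)]*\)).*/\1/p')

# grep -o keeps "(catch.me)"; tr then deletes the two parentheses.
with_tr=$(printf '%s\n' "$call" | grep -o '([^)]*)' | tr -d '()')

echo "$with_sed"
echo "$with_tr"
```

Using [^)]* instead of .*? sidesteps the fact that POSIX regular expressions have no lazy quantifiers.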

How to use regex OR in grep in Cygwin?

I need to return results for two different matches from a single file.
grep "string1" my.file
correctly returns the single instance of string1 in my.file
grep "string2" my.file
correctly returns the single instance of string2 in my.file
but
grep "string1|string2" my.file
returns nothing
In regex test apps that syntax is correct, so why does it not work for grep in Cygwin?
Using the | character without escaping it in a basic regular expression will only match the | literal. For instance, if you have a file with contents
string1
string2
string1|string2
Using grep "string1|string2" my.file will only match the last line
$ grep "string1|string2" my.file
string1|string2
In order to use the alternation operator |, you could:
Use a basic regular expression (just grep) and escape the | character in the regular expression
grep "string1\|string2" my.file
Use an extended regular expression with egrep or grep -E, as Julian already pointed out in his answer
grep -E "string1|string2" my.file
If it is two different patterns that you want to match, you could also specify them separately in -e options:
grep -e "string1" -e "string2" my.file
You might find the following sections of the grep reference useful:
Basic vs Extended Regular Expressions
Matching Control, where it explains -e
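The variants can be compared by counting matching lines in a scratch file. This is a rough sketch assuming GNU grep (as shipped with Cygwin); the fourth line of the sample file is an invented non-match.

```shell
# Scratch file: the three lines from the answer plus one line that matches nothing.
f=$(mktemp)
printf 'string1\nstring2\nstring1|string2\nother\n' > "$f"

literal=$(grep -c 'string1|string2' "$f")        # BRE: | is a literal character
escaped=$(grep -c 'string1\|string2' "$f")       # BRE: \| alternates (GNU extension)
extended=$(grep -Ec 'string1|string2' "$f")      # ERE: | alternates
multi=$(grep -c -e 'string1' -e 'string2' "$f")  # two separate patterns

echo "literal=$literal escaped=$escaped extended=$extended multi=$multi"
rm -f "$f"
```

Only the unescaped BRE form behaves differently: it finds just the line that contains the | character itself.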
You may need to use either egrep or grep -E. The pipe (OR) symbol is part of 'extended' regular expression syntax and is not treated as alternation by plain grep, on Cygwin or elsewhere.
Alternatively, with plain grep you need to escape the pipe symbol (\|).
The best and most clear way I've found is:
grep -e REG1 -e REG2 -e REG3 _FILETOGREP_
I never use pipe as it's less evident and very awkward to get working.
You can find this information by reading the fine manual: grep(1), which you can find by running 'man grep'. It describes the difference between grep and egrep, and between basic and extended regular expressions, along with a lot of other useful information about grep.