zshell grep negative lookbehind - regex

I'm trying to play around w/ a negative lookbehind regex, but I can't seem to get it to work in my zshell. Am I doing this wrong?
echo "Nate or nate" | grep "(\?<!N)a"
This should match the a in nate but NOT the a in Nate...right?

When I think of lookahead or lookbehind assertions, I think of Perl. You will need to use grep's -P (--perl-regexp) option and single quotes to find the a in nate:
echo "Nate or nate" | grep -P '(?<!N)a'

It should. However, grep will print out any line with a match.
If you'd like grep to print out only the parts of the line it matches, you should give it the -o option.
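To see the difference, here is a minimal sketch, assuming GNU grep built with PCRE support (the variable names are only for illustration). -o prints just the matched text, and adding -b prints the byte offset of each match, which shows that only the a in "nate" is found:

```shell
# Assumes GNU grep with PCRE (-P) support.
# -o prints only the matched text instead of the whole line.
matches=$(echo "Nate or nate" | grep -oP '(?<!N)a')
echo "$matches"

# -b adds the byte offset of each match, showing which "a" was found.
offsets=$(echo "Nate or nate" | grep -obP '(?<!N)a')
echo "$offsets"
```

Only one a is reported, at offset 9, which is the one preceded by a lowercase n.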

There are a number of different regex flavours, but the regex for grep should probably look like this: "(?<!N)a".

First off, you want to use single quotes (double quotes in zsh will try to expand the !N as a history reference). Also note that lookbehind is a zero-width assertion, which neither basic nor extended regexen (grep -E) provide, so depending on your version of grep it may not be supported at all. Check your man 7 re_format and whether your grep has -P.

Related

Regex whitespace before character [duplicate]

I am attempting to grep for all instances of Ui\. not followed by Line or even just the letter L
What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?
Using lookaheads
grep "Ui\.(?!L)" *
bash: !L: event not found
grep "Ui\.(?!(Line))" *
nothing
Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.
If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.
If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.
The answer to part of your problem is here, and ack would behave the same way:
Ack & negative lookahead giving errors
You are using double-quotes for grep, which permits bash to "interpret ! as history expand command."
You need to wrap your pattern in SINGLE-QUOTES:
grep 'Ui\.(?!L)' *
However, see @JonathanLeffler's answer to address the issues with negative lookaheads in standard grep!
You probably can't perform standard negative lookaheads using grep, but usually you should be able to get equivalent behaviour using the "inverse" switch '-v'. Using that, you can construct a regex for the complement of what you want to match and then pipe the output through two greps.
For the regex in question you might do something like
grep 'Ui\.' * | grep -v 'Ui\.L'
(Edit: this is not as strong as a true lookahead, but can often be used to work around the problem.)
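To make the "not as strong" caveat concrete, here is a small sketch on made-up input (the sample lines and variable names are hypothetical, and the second command assumes GNU grep with -P). A line that contains both Ui.Line and some other Ui. member is dropped by the two-grep pipeline but kept by a true lookahead:

```shell
# Hypothetical sample input; the last line contains both forms.
sample=$(printf '%s\n' 'Ui.Line only' 'Ui.Button only' 'Ui.Line and Ui.Button')

# Two-step workaround: keep lines with "Ui.", then drop any line containing "Ui.L".
two_step=$(printf '%s\n' "$sample" | grep 'Ui\.' | grep -v 'Ui\.L')
echo "$two_step"

# True negative lookahead (assumes GNU grep with PCRE support).
lookahead=$(printf '%s\n' "$sample" | grep -P 'Ui\.(?!L)')
echo "$lookahead"
```

The pipeline reports only the second line, while the lookahead also reports the third.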
If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.
In your case grep 'Ui\.\([^L]\|$\)' * does the job.
Ui\. matches the string you're interested in
\([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.
If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:
grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *
This matches either a followed by something other than b or by the end of the line (a, then [^b] or $), or a followed by b which is in turn followed by something other than c or by the end of the line (a, then b, then [^c] or $).
This kind of expression gets to be pretty unwieldy and error prone with even a short string. You could write something to generate the expressions for you, but it'd probably be easier to just use a regex implementation that supports negative lookaheads.
*If your implementation supports non-capturing groups then you can keep the extra character out of a capture group, although the overall match still consumes it.
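As a sanity check on the "a not followed by bc" expression, here is a sketch that runs it against a handful of made-up test strings and compares the result with a PCRE lookahead. It assumes GNU grep, both for \| in a basic regex (a GNU extension) and for -P. The word list and variable names are hypothetical:

```shell
# Hypothetical test strings, one per line.
words=$(printf '%s\n' abc abd ab a ac xyz)

# Emulation with negated classes, alternation and $ (GNU BRE syntax).
bre=$(printf '%s\n' "$words" | grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)')
echo "$bre"

# The same test written as a negative lookahead (GNU grep with PCRE).
pcre=$(printf '%s\n' "$words" | grep -P 'a(?!bc)')
echo "$pcre"
```

Both commands select abd, ab, a and ac, and both reject abc and xyz.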
If your grep doesn't support -P or --perl-regexp and you can install a PCRE-enabled grep such as pcregrep, then, unlike GNU grep, it doesn't need any command-line option to accept Perl-compatible regular expressions. Use single quotes so that bash leaves the ! alone, and just run
pcregrep 'Ui\.(?!Line)'
You don't need another nested group for "Line" as in your example "Ui.(?!(Line))" -- the outer group is sufficient, like I've shown above.
Let me give you another example of a negative lookahead assertion: when you have a list of lines returned by "ipset", each line showing a packet count in the middle of the line, and you don't want the lines with zero packets, you just run:
ipset list | pcregrep 'packets(?! 0 )'
If you like Perl-compatible regular expressions and have perl, but don't have pcregrep or your grep doesn't support --perl-regexp, you can use one-line perl scripts that work the same way as grep:
perl -ne 'print if /Ui\.(?!Line)/'
Perl reads stdin the same way grep does, e.g.
ipset list | perl -ne 'print if /packets(?! 0 )/'
At least for the case of not wanting an 'L' character after the "Ui." you don't really need PCRE.
grep -E 'Ui\.($|[^L])' *
Here I've made sure to match the special case of the "Ui." at the end of the line.
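One visible difference between this and a real lookahead is what -o prints: the character class consumes the following character, while the lookahead does not. A minimal sketch on hypothetical input (the second command assumes GNU grep with -P):

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'Ui.Line' 'Ui.Foo' 'x = Ui.')

# ERE emulation: the match includes the character after "Ui." when there is one.
ere=$(printf '%s\n' "$sample" | grep -oE 'Ui\.($|[^L])')
echo "$ere"

# PCRE lookahead: the match is always exactly "Ui.".
pcre=$(printf '%s\n' "$sample" | grep -oP 'Ui\.(?!L)')
echo "$pcre"
```

Both select the same two lines, so the difference only matters if you use -o or otherwise rely on the matched text.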

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
I think you can grep for the leading run of non-/ characters, like:
grep -Eo '^[^/]+'
There is a similar question, with a solution, on another Stack Exchange site.
You don't need grep for this at all.
cut -d / -f 1
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning of line anchor, and the omission of the -P option (which you were not really using anyway) as well as the redundant {1} (repeating something exactly once is a no-op).
(I should add that plain grep doesn't treat unescaped parentheses or braces as metacharacters; grep -E supports these constructs just fine, or you could switch to the POSIX BRE variation, which requires a backslash to use round or curly brackets as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
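The patterns in the answers above stop before the slash, whereas the question's expected output keeps it. A possible variant, sketched here with GNU grep assumed, anchors at the start of the line and makes one trailing slash optional. The variable names are illustrative only:

```shell
# The input from the question.
paths=$(printf '%s\n' '.bash_profile' '.config/ranger/bookmarks' '.oh-my-zsh/README.md')

# ^ restricts -o to a single match per line; /? keeps one trailing slash if present.
first=$(printf '%s\n' "$paths" | grep -oE '^[^/]*/?')
echo "$first"
```

Because of the ^ anchor, later path components can never match, so each line yields at most one result.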

bash regex for word with some suffixes but not one specific

I need (case-insensitive) all matches of several variations on a word--except one--including unknowns.
I want
accept
acceptance
acceptable
accepting
...but not "acception." A coworker used it when he meant "exception." A lot.
Since I can't anticipate the variations (or typos), I need to allow things like "acceptjunk" and "acceptMacarena"
I thought I could accomplish this with a negative lookahead, but this didn't give the results I needed
grep -iE '(?!acception)(accept[a-zA-Z]*)[[:space:]]' file
The trick is that I can accept (har) lines that contain "acception," provided that the other words match. For example this line is okay to match:
The acceptance of the inevitable is the acception.
...otherwise by now I'd have piped grep through grep -v and been done with it:
grep -iE '(accept)[a-zA-Z]*[[:space:]]' | grep -vi 'acception'
I've found some questions that are similar and many that are not quite so. Using a-zA-Z is likely unnecessary in grep -i but I'm flailing. I'm probably missing something small or basic...but I'm missing it nonetheless. What is it?
Thanks for reading.
PS: I'm not married to grep--but I am operating in bash--so if there's a magic awk command that would do this I'm all ears (eyes).
PPS: forgot to mention that on https://regex101.com/ the above lookahead seemed to work, but it doesn't with my full grep command.
To use lookarounds, you need GNU grep with PCRE available
grep -iP '(?!acception)(accept[a-z]*)[[:space:]]'
With awk, this might work
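awk '{ip=$0; $0=tolower($0); gsub(/acception/, ""); if(/accept[a-z]*[[:space:]]/) print ip}'
ip=$0 saves the input line
$0=tolower($0) lowercases the working copy so that the match is case-insensitive
gsub(/acception/, "") removes every occurrence of the unwanted word (sub would remove only the first); other unwanted words can be added with alternation
if(/accept[a-z]*[[:space:]]/) print ip then prints the original line if it still contains one of the words being searched for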
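Here is a quick way to try the grep -iP version on a few lines, including the "okay to match" sentence from the question. This is only a sketch: the other two sample lines are invented, and it assumes GNU grep with PCRE support.

```shell
# First line is from the question; the other two are hypothetical.
sample=$(printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'That was an acception to the rule.' \
  'We acceptMacarena as valid input.')

# The lookahead rejects a match that starts at "acception",
# but other accept* words on the same line can still match.
kept=$(printf '%s\n' "$sample" | grep -iP '(?!acception)(accept[a-z]*)[[:space:]]')
echo "$kept"
```

The first and third lines are kept; the second, whose only candidate word is "acception", is dropped.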

How to find number of multiple line block comments using grep?

I'm trying to find number of block comments that span across multiple lines in /usr/include/stdio.h
I managed to do it using 2 grep commands:
egrep '/\*' /usr/include/stdio.h | egrep -cv '\*/'
Can this be done in a more elegant way, using only one regex expression?
The easiest way is with negative lookahead, if PCRE is supported in your version of grep (e.g. GNU grep).
grep -P '/\*(?!.*\*/)' filename
Doing negative lookahead in general is difficult with just extended RE. The following comes close, but doesn't work if the single-line comment ends with **/.
grep -E '/\*[^*]*((\*($|[^/]))?[^*]*)*$'
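To compare the lookahead with the two-grep pipeline from the question, here is a sketch that counts comment openers in a small made-up file instead of stdio.h. It assumes GNU grep with -P and a mktemp command; the file contents and variable names are hypothetical. The third line closes one comment and opens another, which is where the two approaches disagree:

```shell
# Hypothetical C fragment written to a temporary file.
src=$(mktemp)
printf '%s\n' \
  '/* single line */' \
  '/* multi' \
  '   line */ int y; /* second' \
  '   block */' > "$src"

# Lookahead: count lines with a "/*" that has no "*/" after it on the same line.
pcre_count=$(grep -cP '/\*(?!.*\*/)' "$src")
echo "$pcre_count"

# Pipeline from the question: lines with "/*", minus lines containing "*/" anywhere.
pipe_count=$(grep -E '/\*' "$src" | grep -cEv '\*/')
echo "$pipe_count"

rm -f "$src"
```

The lookahead counts both multi-line comments, while the pipeline misses the one that opens on a line that also contains a closing */. Neither approach understands C syntax, so a /* inside a string literal would still be counted.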

Grep regex to unscramble a word

I want to unscramble a word using the grep command.
I am using below code. I know there are other ways to do it, but I think I'm missing something here:
grep "^[yxusonlia]\{9\}$" /usr/share/dict/words
should produce one output:
anxiously
but it produces:
annulosan
innoxious
and many more. Basically I can't find how I should specify that characters can only be matched once, so that I get only one output.
I apologise if it seems very simple but I tried a lot and can't find anything.
You can use grep -P (PCRE regex) with negative lookahead
grep -P '^(?:([yxusonlia])(?!.*?\1)){9}$' /usr/share/dict/words
anxiously
Explanation:
This grep regex uses the negative lookahead (?!.*?\1) for each character matched by group #1, i.e. \1. Each character is matched if and only if the same character does not appear again anywhere later in the string.
You can use lookaheads to make sure that each letter is matched exactly one time. It is verbose and requires a version of grep that supports lookaheads (e.g. via -P). It may be better to build the search string programmatically.
grep -P "^(?=.*y)(?=.*x)(?=.*u)(?=.*s)(?=.*o)(?=.*n)(?=.*l)(?=.*i)(?=.*a)[yxusonlia]{9}$" /usr/share/dict/words
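One way to build that search string programmatically is sketched below. It assumes the letters are distinct plain letters (no regex metacharacters), uses sed to wrap each one in a lookahead, and tests against a tiny hypothetical word list rather than /usr/share/dict/words. GNU grep with -P is assumed, and all variable names are illustrative:

```shell
# Scrambled letters; assumed to be distinct plain letters.
letters='yxusonlia'

# Turn every letter c into the lookahead (?=.*c).
lookaheads=$(printf '%s' "$letters" | sed 's/./(?=.*&)/g')

# Require every letter to appear, and the word to consist of exactly
# as many characters from the set as there are letters.
pattern="^${lookaheads}[${letters}]{${#letters}}\$"
echo "$pattern"

# Hypothetical stand-in for the dictionary file.
words=$(printf '%s\n' anxiously annulosan innoxious)
found=$(printf '%s\n' "$words" | grep -P "$pattern")
echo "$found"
```

With nine distinct letters and a length of exactly nine, requiring each letter to appear at least once is the same as requiring it to appear exactly once. A scrambled word with repeated letters would need a different approach.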