Mac OSX, Bash, awk, and negative look behind


I want to find a particular process using awk:
ps aux|awk '/plugin-container.*Flash.*/'
It finds the process, but it also matches itself, because the ps output includes the awk command line too. To prevent that, I am trying to use a negative lookbehind as follows:
ps aux|awk '/(\?<!awk).*plugin-container.*Flash.*/'
But it does not work. Does awk support look behind? What am I doing wrong? Thanks

The common trick is to use
ps aux | grep '[p]lugin-container.*Flash.*'
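The bracket expression [p] still matches a literal p, but the grep command line that ps reports contains the text [p]lugin-container, which the pattern does not match, so grep never lists itself.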
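The same bracket trick carries over to the awk command from the question. A minimal sketch, assuming a two-line stand-in for real ps aux output (the process lines and variable names are made up for illustration):

```shell
# Hypothetical stand-in for `ps aux` output: line 1 is the browser plugin,
# line 2 is how the awk command itself would appear in the process list.
sample='user 101 /usr/lib/firefox/plugin-container libflashplayer.so Flash
user 202 awk /[p]lugin-container.*Flash/'

# [p] matches a literal "p", so line 1 matches; line 2 contains "[p]lugin",
# not "plugin", so the awk command does not match its own pattern.
matches=$(printf '%s\n' "$sample" | awk '/[p]lugin-container.*Flash/')
printf '%s\n' "$matches"
```

Only the plugin-container line should be printed.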

I don't know whether awk supports lookbehind, but I usually solve this problem with grep -v:
aix#aix:~$ ps aux | awk '/plugin-container.*Flash/' | grep -v awk
(Also, I'd normally use grep for the awk command above.)
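POSIX awk uses extended regular expressions, which have no lookaround operators, so the exclusion has to be expressed some other way. One possible sketch folds the grep -v step into awk itself by negating a second pattern; the sample lines are invented stand-ins for ps aux output:

```shell
# Hypothetical process list: the second line represents the awk command itself.
sample='user 101 /usr/lib/firefox/plugin-container libflashplayer.so Flash
user 202 awk /plugin-container.*Flash/'

# Keep lines that match the process pattern AND do not mention awk.
filtered=$(printf '%s\n' "$sample" | awk '/plugin-container.*Flash/ && !/awk/')
printf '%s\n' "$filtered"
```

Like grep -v awk, this also drops any unrelated process whose command line happens to contain the string awk.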

I don't know if awk supports lookbehinds, but if it does, the question mark at the start should not be escaped: a negative lookbehind starts with (?<!
A question mark right after the opening parenthesis signals that the group is not a plain capturing group, i.e. that it has some special meaning.

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?

I think you can match the leading run of non-slash characters like this:
grep -Eo '^[^/]+'
There is a similar question, with a solution, on another Stack Exchange site.

You don't need grep for this at all.
cut -d / -f 1

The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning-of-line anchor, and the omission of the -P option (which you were not really using anyway) as well as the redundant {1}, a common beginner error.
(I should add that plain grep does not treat bare parentheses or braces as metacharacters; grep -E supports these constructs just fine, or you could switch to the POSIX BRE variation, which requires a backslash to use round or curly brackets as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
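The anchored patterns above print the first path component without its slash, while the expected output in the question keeps it. A possible variation, assuming a grep that supports both -E and -o, adds an optional trailing slash to the anchored match:

```shell
# Input lines taken from the question.
first=$(printf '%s\n' '.bash_profile' '.config/ranger/bookmarks' '.oh-my-zsh/README.md' |
  grep -Eo '^[^/]*/?')   # leading non-slash run, plus one slash if present
printf '%s\n' "$first"
```

Because the pattern is anchored with ^, only one match per line is expected even though -o is in effect.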

bash regex for word with some suffixes but not one specific

I need (case-insensitive) all matches of several variations on a word--except one--including unknowns.
I want
accept
acceptance
acceptable
accepting
...but not "acception." A coworker used it when he meant "exception." A lot.
Since I can't anticipate the variations (or typos), I need to allow things like "acceptjunk" and "acceptMacarena"
I thought I could accomplish this with a negative lookahead, but this didn't give the results I needed
grep -iE '(?!acception)(accept[a-zA-Z]*)[[:space:]]' file
The trick is that I can accept (har) lines that contain "acception," provided that the other words match. For example this line is okay to match:
The acceptance of the inevitable is the acception.
...otherwise by now I'd have piped grep through grep -v and been done with it:
grep -iE '(accept)[a-zA-Z]*[[:space:]]' | grep -vi 'acception'
I've found some questions that are similar and many that are not quite so. Using a-zA-Z is likely unnecessary in grep -i but I'm flailing. I'm probably missing something small or basic...but I'm missing it nonetheless. What is it?
Thanks for reading.
PS: I'm not married to grep--but I am operating in bash--so if there's a magic awk command that would do this I'm all ears (eyes).
PPS: forgot to mention that on https://regex101.com/ the above lookahead seemed to work, but it doesn't with my full grep command.

To use lookarounds, you need GNU grep with PCRE available
grep -iP '(?!acception)(accept[a-z]*)[[:space:]]'
With awk, this might work
awk '{ip=$0; sub(/acception/, ""); if(/accept[a-zA-Z]*[[:space:]]/) print ip}'
ip=$0 saves the input line
sub(/acception/, "") removes the first occurrence of the unwanted word; other unwanted words can be added with alternation
if(/accept[a-zA-Z]*[[:space:]]/) print ip then prints the saved line if it still contains one of the words being searched for
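The question asks for case-insensitive matching, and a line may contain the unwanted word more than once. A hedged variant of the awk approach above, assuming it is acceptable to test a lowercased copy of each line (the variable name line and the second and third sample sentences are made up):

```shell
kept=$(printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'That was an Acception to the rule.' \
  'We are Accepting offers now.' |
  awk '{
    line = tolower($0)              # test a lowercased copy, keep $0 intact
    gsub(/acception/, "", line)     # drop every occurrence of the unwanted word
    if (line ~ /accept[a-z]*[[:space:]]/) print   # print the original line
  }')
printf '%s\n' "$kept"
```

The first and third sentences should be kept; the second contains no accept word other than the unwanted one. Note that some older awk implementations may not understand [[:space:]].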

Extract number embedded in string

So I run a curl command and grep for a keyword.
Here is the (sanitized) result:
...Dir');">Town / Village</a></th><th>Phone Number</th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...
I want to get the number 42 - a command line one-liner would be great.
search for the string helloThereId=
extract the number right beside it (42 in the above case)
Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.

You could use grep with -P (Perl-Regexp) parameter enabled.
$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42
\K here actually does the job of positive lookbehind. \K keeps the text matched so far out of the overall regex match.
References:
http://www.regular-expressions.info/keep.html
http://www.regular-expressions.info/lookaround.html

If your grep version supports -P, (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.
For the potential benefit of future readers, here are alternatives:
If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:
grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
Finally, if your grep supports neither -P nor -o, use sed:
Here's a POSIX-compliant alternative, using sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):
sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
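To compare the two portable fallbacks side by side, here is a small sketch that runs them against a shortened copy of the sample line from the question (the variable names are illustrative only):

```shell
# Shortened copy of the sanitized HTML from the question.
html=$(cat <<'EOF'
<td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);">
EOF
)

# grep -o isolates "helloThereId=42"; cut then keeps the part after "=".
via_cut=$(printf '%s\n' "$html" | grep -Eo 'helloThereId=[0-9]+' | cut -d= -f2)

# sed captures the digits directly with a basic regular expression.
via_sed=$(printf '%s\n' "$html" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')

printf '%s %s\n' "$via_cut" "$via_sed"
```

Both variables should end up holding 42.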

This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:
$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42
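One difference between the two sed commands is worth noting: [0-9]* also matches zero digits, so a line where helloThereId= is not followed by a number still produces an (empty) output line, whereas \{1,\} requires at least one digit. A quick sketch with a made-up input line:

```shell
# Hypothetical input where the id is not numeric.
line='&Mode=view&helloThereId=none'

# Count the lines each variant prints (tr strips padding some wc versions add).
loose=$(printf '%s\n' "$line" | sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' | wc -l | tr -d ' ')
strict=$(printf '%s\n' "$line" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' | wc -l | tr -d ' ')

echo "lines printed: loose=$loose strict=$strict"
```

Whether the empty line matters depends on how the result is consumed downstream.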

Sed doesn't replace my text properly

The following sed regex doesn't give me the file path without the #70 substring.
Could you please help pointing out what I am missing here?
[machine]# echo "//dir1/dir2/dir3/component/file.rb#70" | sed 's/\(.*rb\)#\d+$/\1/g'
Output: //dir1/dir2/dir3/component/file.rb#70
What I want is simply: //dir1/dir2/dir3/component/file.rb without #70 substring.
Thanks in advance
PL

The flavor of regular expression understood by sed by default doesn't include either \d for digits or + for "1 or more".
This will work:
sed 's/\(.*\.rb\)#[0-9][0-9]*$/\1/g'
Or you could turn on "extended" regular expression syntax with -E, which makes the + work (though still not \d), and swaps the meaning of backslashed vs non-backslashed parentheses:
sed -E 's/(.*\.rb)#[0-9]+$/\1/g'
Both of the above commands will work even on non-GNU sed, as you get by default on BSD and Mac OS X systems. In normal mode (without the -E), GNU sed also understands \+ to mean the same as bare + in extended mode, but BSD sed does not.
If all you're trying to do is get rid of the #digits, though, you can do it more simply. Sed regexes aren't anchored to the start of the line, so you don't have to include the filename - just replace the part you don't want with nothing at all:
sed 's/#[0-9][0-9]*$//'
or
sed -E 's/#[0-9]+$//'
If your real problem does require the fancy version, though, you could also use Perl, which has the advantage that there are relatively few (almost no) changes in regex syntax across versions. It also understands the \d syntax you tried to use:
perl -pe 's/(.*\.rb)#\d+$/$1/g'
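As a quick check that the basic and extended forms above agree, the sketch below runs both on the path from the question. The last assignment is an extra that is not taken from the answers: when the value is already in a shell variable, POSIX parameter expansion can strip the suffix without calling sed at all.

```shell
path='//dir1/dir2/dir3/component/file.rb#70'

# Basic regular expression: "one or more digits" spelled as [0-9][0-9]*.
bre=$(printf '%s\n' "$path" | sed 's/#[0-9][0-9]*$//')

# Extended regular expression: + is available with -E.
ere=$(printf '%s\n' "$path" | sed -E 's/#[0-9]+$//')

# Shell-only alternative: remove the shortest suffix matching "#*".
expanded=${path%#*}

printf '%s\n' "$bre" "$ere" "$expanded"
```

Unlike the sed versions, the parameter expansion removes anything after the last #, not just digits.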

With GNU sed, your command works if you use -E and change \d to [0-9] or [[:digit:]]:
echo "//dir1/dir2/dir3/component/file.rb#70" | sed -E 's/(.*rb)#[0-9]+$/\1/g'
//dir1/dir2/dir3/component/file.rb
Depending on the context, you may be able to use a simpler command, such as
sed 's/#[0-9]\+//g'

You got the answer but have you considered simply:
$ echo "//dir1/dir2/dir3/component/file.rb#70" | cut -d'#' -f1
//dir1/dir2/dir3/component/file.rb

zshell grep negative lookbehind

I'm trying to play around w/ a negative lookbehind regex, but I can't seem to get it to work in my zshell. Am I doing this wrong?
echo "Nate or nate" | grep "(\?<!N)a"
This should match the a in nate but NOT the a in Nate...right?

When I think of lookahead or lookbehind assertions, I think of Perl. You will need to use perl-regexp and single quotes to find the a in nate:
echo "Nate or nate" | grep -P '(?<!N)a'

It should. However, grep will print out any line with a match.
If you'd like grep to print out only the parts of the line it matches, you should give it the -o option.
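Combining the two answers above, -P enables the lookbehind and -o shows which part of the line actually matched. A small sketch, assuming a grep built with PCRE support (the probe and the fallback message are illustrative additions):

```shell
# Probe for PCRE support first, since -P is a GNU grep extension.
if printf 'a\n' | grep -qP 'a' 2>/dev/null; then
  # Single quotes keep the shell from touching the ! in the pattern.
  hits=$(printf '%s\n' 'Nate or nate' | grep -oP '(?<!N)a')
else
  hits='grep -P unavailable'
fi
printf '%s\n' "$hits"
```

With PCRE available, a single a is printed: the one in nate, since the a in Nate is preceded by N.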

There are a number of different regex flavours, but the regex for grep should probably look like this: "(?<!N)a".

First off, you want to use single quotes (in zsh, double quotes will try to expand the !N). You probably also want extended regexes (grep -E). Also, depending on your version of grep, it may not support zero-width assertions at all; check man 7 re_format.

We Keep Coding
