How to find the number of multi-line block comments using grep?

I'm trying to find the number of block comments that span multiple lines in /usr/include/stdio.h.
I managed to do it using 2 grep commands:
egrep '/\*' /usr/include/stdio.h | egrep -cv '\*/'
Can this be done more elegantly, with a single regex?

The easiest way is with a negative lookahead, if your version of grep supports PCRE (e.g. GNU grep). Adding -c makes grep print the number of matching lines instead of the lines themselves:
grep -cP '/\*(?!.*\*/)' filename
Emulating a negative lookahead in general is difficult with plain extended regular expressions (ERE). The following comes close, but it wrongly counts a single-line comment that ends with **/:
grep -cE '/\*[^*]*((\*($|[^/]))?[^*]*)*$' filename
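A self-contained check of the approaches above on a small, made-up sample file (the contents are illustrative, not taken from stdio.h). The PCRE variant only runs when the local grep accepts -P; probing for that by grepping a dummy line is an assumption about how to detect support, not an official feature test.

```shell
#!/bin/sh
# Hypothetical sample file: two single-line comments and two multi-line ones.
tmp=$(mktemp)
printf '%s\n' \
  '/* single line */' \
  'int x; /* trailing */' \
  '/* starts here' \
  '   continues */' \
  '/**' \
  ' * doc' \
  ' */' > "$tmp"

# Two-step approach from the question: lines that open a comment
# but do not also close one on the same line.
two_greps=$(grep '/\*' "$tmp" | grep -cv '\*/')

# Single ERE (miscounts a one-line comment ending in **/).
ere_count=$(grep -cE '/\*[^*]*((\*($|[^/]))?[^*]*)*$' "$tmp")

# PCRE negative lookahead, run only if this grep accepts -P.
pcre_count=unsupported
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  pcre_count=$(grep -cP '/\*(?!.*\*/)' "$tmp")
fi

printf 'two greps: %s, ERE: %s, PCRE: %s\n' "$two_greps" "$ere_count" "$pcre_count"
rm -f "$tmp"
```

On this sample every approach that runs should report 2 (the "/* starts here" and "/**" lines). Like the original two-grep version, all of them count lines that open a comment without closing it, which is only an approximation of "multi-line comments".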

Related

How to extract a value using shell and regex

I have a string "12G 39G 24% /dev". I have to extract the value '24'. I have used the regex below:
grep '[0-9][0-9]%' -o
But I am getting 24% as output. I want only 24, without the '%' character. How can I modify the regex to extract only the value 24?
One option would be to just grep again for the digits:
grep -o '[0-9][0-9]%' | grep -o '[0-9][0-9]'
However, if you want to accomplish this with a single regex, you can use the following:
grep -Po '[0-9]{2}(?=%)'
Note the -P option in this case; plain grep (BRE or ERE syntax) does not support the (?=%) lookahead.
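A runnable comparison of the two-step pipeline and the single-regex lookahead on the string from the question. printf is used instead of a here-string so it runs under plain sh; the -P branch is skipped unless a dummy -P grep succeeds, which is an assumed way of probing for PCRE support.

```shell
#!/bin/sh
line='12G 39G 24% /dev'

# Two-step: keep the "NN%" match, then grep again for just the digits.
two_step=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]%' | grep -o '[0-9][0-9]')

# Single regex: the lookahead requires "%" to follow but leaves it out of the match.
lookahead=unsupported
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  lookahead=$(printf '%s\n' "$line" | grep -oP '[0-9]{2}(?=%)')
fi

printf 'two-step: %s, lookahead: %s\n' "$two_step" "$lookahead"
```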
The most common way to match something without including it in the output is a look-around assertion.
Use it like this:
grep -oP '[0-9][0-9](?=%)'
It's worth noting that GNU grep supports the -P option to enable Perl-compatible regex syntax, but GNU grep is not included with OS X. On Linux, it is normally available by default. A workaround would be to use ack instead.
But I'd still recommend using GNU grep on OS X. It can be installed with Homebrew using the command brew install grep (by default, Homebrew installs it as ggrep).
Also, see How to match, but not capture, part of a regex?
You can use sed as an alternative:
sed -rn 's/(^.*)([[:digit:]]{2})(%.*$)/\2/p' <<< "12G 39G 24% /dev"
Enable extended regular expressions with -r (or -E), then split the line into three groups using parentheses. The substitution replaces the whole line with the second group only, and -n together with the p flag prints just that result.
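The same substitution as a self-contained script, with two small portability assumptions: the input is piped with printf instead of a bash/zsh here-string, and -E is used in place of the GNU-only -r (both GNU and BSD sed accept -E).

```shell
#!/bin/sh
# Greedy (^.*) backs off just far enough to leave two digits followed by "%".
result=$(printf '%s\n' '12G 39G 24% /dev' |
  sed -En 's/(^.*)([[:digit:]]{2})(%.*$)/\2/p')
printf '%s\n' "$result"
```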
Use awk:
awk '{print $3+0}'
The value you seek is in the third field, and adding a zero coerces the string to a number, so % is removed.
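A minimal run of the awk approach on the sample string, piped in with printf (the pipe is the only thing added here):

```shell
#!/bin/sh
# awk splits on runs of blanks by default, so $3 is "24%".
# Adding 0 converts it to a number from its leading numeric part, dropping "%".
value=$(printf '%s\n' '12G 39G 24% /dev' | awk '{print $3+0}')
printf '%s\n' "$value"
```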

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, but I'm asking here how to do this using grep.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
You can grep for the run of non-/ characters at the start of the line, plus an optional trailing slash:
grep -Eo '^[^/]+/?'
There is a similar question with a solution on another Stack Exchange site.
You don't need grep for this at all, if you can do without the trailing slash:
cut -d / -f 1
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -Eo '^[^/]*/?' file
Notice the addition of the ^ beginning-of-line anchor, and the omission of the -P option (which you were not really using anyway) as well as of the redundant {1}, a common beginner mistake.
(I should add that plain grep uses POSIX basic regular expressions (BRE), where unescaped round and curly parentheses are literal characters. grep -E supports grouping and {m,n} repetition as written, or you could stay with BRE and put a backslash before round or curly parentheses to make them metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, but be aware that -P is not portable.)
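A quick check of the anchored pattern against the three sample paths from the question, assuming GNU grep (-o is not POSIX, so other implementations may differ). The cut line is included only to show the difference in output.

```shell
#!/bin/sh
# The three sample paths from the question.
input='.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md'

# ^ anchors the match at the start of the line, so -o reports at most one
# match per line: a run of non-slash characters plus an optional "/".
first=$(printf '%s\n' "$input" | grep -Eo '^[^/]*/?')
printf '%s\n' "$first"

# cut yields the same components, but without the trailing slash.
cut_first=$(printf '%s\n' "$input" | cut -d / -f 1)
```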

Why do my results appear to differ between ag and grep?

I'm having trouble correctly (and safely) executing the right regex searches with grep. I seem to be able to do what I want using ag.
What I want to do, in plain English:
Search my current directory (recursively?) for files that have lines containing both the words "nested" and "merge"
Successful attempt with ag:
$ ag --depth=2 -l "nested.*merge|merge.*nested" .
scratch.md
scratch.rb
Unsuccessful attempt with grep:
$ grep -elr 'nested.*merge|merge.*nested' .
grep: nested.*merge|merge.*nested: No such file or directory
grep: .: Is a directory
What am I missing? Also, could either approach be improved?
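You probably want -E, not -e (or just egrep, which is obsolescent; GNU grep 3.8 and later print a warning when it is used).
man grep explains the error: -e takes the next argument as the pattern, so in -elr the letters lr become the pattern and -l and -r are never parsed as options. grep then treats 'nested.*merge|merge.*nested' and . as file names, which produces both error messages.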
You can use grep -lr 'nested.*merge\|merge.*nested' or grep -Elr 'nested.*merge|merge.*nested' for your case.
In the latter, -E selects extended regular expression (ERE) syntax. By default grep uses basic regular expressions (BRE), where | matches a literal | and \| means alternation (a GNU extension).
For more detail about ERE and BRE, you can read this article
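A sketch that runs both corrected commands on a throwaway directory; the file names and contents are made up for illustration. It assumes GNU grep, since \| in a BRE is a GNU extension. Like the ag command, it only finds files where both words occur on the same line.

```shell
#!/bin/sh
# Hypothetical tree: one file has both words on one line, the other on separate lines.
dir=$(mktemp -d)
printf 'a nested hash merge\n' > "$dir/both.md"
printf 'merge\nnested\n' > "$dir/split.rb"

# ERE: | is alternation.
ere=$(grep -Elr 'nested.*merge|merge.*nested' "$dir")
# BRE with GNU's \| alternation extension.
bre=$(grep -lr 'nested.*merge\|merge.*nested' "$dir")

printf 'ERE: %s\nBRE: %s\n' "$ere" "$bre"
rm -rf "$dir"
```

Both commands should list only both.md.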

Extract number embedded in string

So I run a curl command and grep for a keyword.
Here is the (sanitized) result:
...Dir');">Town / Village</a></th><th>Phone Number</th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...
I want to get the number 42 - a command line one-liner would be great.
search for the string helloThereId=
extract the number right beside it (42 in the above case)
Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.
You could use grep with the -P (Perl-compatible regex) option enabled.
$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42
\K here actually does the job of positive lookbehind. \K keeps the text matched so far out of the overall regex match.
References:
http://www.regular-expressions.info/keep.html
http://www.regular-expressions.info/lookaround.html
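A self-contained run of the two -P commands above on a shortened stand-in for the curl output (the string is trimmed from the question's sample). Both commands are skipped unless a dummy -P grep succeeds, which is an assumed way to detect PCRE support.

```shell
#!/bin/sh
# Shortened stand-in for the HTML returned by curl.
html="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"

keep_out=unsupported
lookbehind=unsupported
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  # \K drops everything matched before it from the reported match.
  keep_out=$(printf '%s\n' "$html" | grep -oP 'helloThereId=\K\d+')
  # The same result with a fixed-width positive lookbehind.
  lookbehind=$(printf '%s\n' "$html" | grep -oP '(?<=helloThereId=)\d+')
fi

printf 'keep-out: %s, lookbehind: %s\n' "$keep_out" "$lookbehind"
```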
If your grep supports -P (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.
For the potential benefit of future readers, here are alternatives:
If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:
grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
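The grep-plus-cut pipeline as a self-contained script, using a shortened stand-in for the question's HTML instead of a file:

```shell
#!/bin/sh
html="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"
# grep -o keeps only "helloThereId=42"; cut then takes the field after "=".
id=$(printf '%s\n' "$html" | grep -Eo 'helloThereId=[0-9]+' | cut -d= -f2)
printf '%s\n' "$id"
```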
Finally, if your grep supports neither -P nor -o, use sed:
Here's a POSIX-compliant alternative, using sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):
sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:
$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42
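The POSIX BRE sed command as a self-contained script, again on a shortened stand-in for the HTML (the string is illustrative):

```shell
#!/bin/sh
html="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"
# \( \) capture the digits and \{1,\} stands in for ERE's +;
# -n with the p flag prints only lines where the substitution succeeded.
id=$(printf '%s\n' "$html" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')
printf '%s\n' "$id"
```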

zshell grep negative lookbehind

I'm trying to play around with a negative lookbehind regex, but I can't seem to get it to work in zsh. Am I doing this wrong?
echo "Nate or nate" | grep "(\?<!N)a"
This should match the a in nate but NOT the a in Nate...right?
When I think of lookahead or lookbehind assertions, I think of Perl. You need Perl-compatible regexes (-P) and single quotes to find the a in nate:
echo "Nate or nate" | grep -P '(?<!N)a'
It should. However, grep will print out any line with a match.
If you'd like grep to print out only the parts of the line it matches, you should give it the -o option.
There are a number of different regex flavours; in PCRE the lookbehind is written (?<!N)a, without the backslash before the ?, and it needs a grep that supports -P.
First off, use single quotes (inside double quotes, zsh will try to expand the !N as a history reference). Note that POSIX extended regexes (grep -E) have no lookaround at all, and depending on your version of grep, it may not support zero-width assertions at all; check your man 7 re_format.
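A small script that counts the matches of the negative lookbehind with -o, so only the a in "nate" is reported. It assumes a grep with PCRE support and falls back to printing "unsupported" otherwise; the dummy -P probe is an assumption, not an official feature test.

```shell
#!/bin/sh
matches=unsupported
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  # Single quotes keep an interactive shell from history-expanding "!N".
  # -o prints each match on its own line, so wc -l counts the matches.
  matches=$(printf '%s\n' 'Nate or nate' | grep -oP '(?<!N)a' | wc -l | tr -d ' ')
fi
printf 'matches: %s\n' "$matches"
```

Without -o, grep would print the whole line, which is why the question's test looked like it matched both words.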