Regex to get delimited content with egrep - regex

I would like to get the parameter (without parentheses) of a function call with a regular expression.
I am using egrep in a bash script with cygwin.
This is what I got so far (with parentheses):
$ echo "require(catch.me)" | egrep -o '\((.*?)\)'
(catch.me)
What would be the right regex here?

http://www.greenend.org.uk/rjk/2002/06/regexp.html
What you are looking for are lookbehind and lookahead assertions.
egrep cannot do that. grep with Perl regex support can.
from man grep:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression. This is highly experimental and grep -P may warn of unimplemented features.
So
$> echo "require(catch.me)" | grep -o -P '(?<=\().*?(?=\))'
catch.me
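If your grep lacks -P, a two-step pipeline gives the same result without lookaround. This is only a sketch: the variable names are made up for illustration, and it assumes the parameter itself contains no parentheses.

```shell
# Input line taken from the question.
line='require(catch.me)'

# Step 1: match the parentheses and everything between them (ERE, no lookaround).
# Step 2: delete the parentheses with tr.
param=$(printf '%s\n' "$line" | grep -Eo '\([^)]*\)' | tr -d '()')

printf '%s\n' "$param"   # prints: catch.me
```

Using [^)]* instead of .*? also avoids relying on lazy quantifiers, which ERE does not define.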

If you can use sed then the following would work -
echo "require(catch.me)" | sed 's/.*[^(](\(.*\))/\1/'
You can modify your existing regex to this
echo "require(catch.me)" | egrep -o 'c.*e'
Even though egrep offers this (from the man page)
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
It isn't really the correct utility. SED and AWK are masters at this. You will have much more control using either SED or AWK. :)
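As a rough illustration of the awk route, one option is to treat both parentheses as field separators and print the second field. The variable names are hypothetical, and this assumes a single pair of parentheses per line.

```shell
# Input line taken from the question.
line='require(catch.me)'

# Split on "(" or ")", which yields the fields "require", "catch.me" and "".
param=$(printf '%s\n' "$line" | awk -F'[()]' '{ print $2 }')

printf '%s\n' "$param"   # prints: catch.me
```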

From the manual :
grep, egrep, fgrep - print lines matching a pattern
Basically, grep is designed to print complete matching lines, so it offers little beyond that.
You should use another tool, perhaps perl, for such operations.

Related

Print lines containing an exact word, not regex [duplicate]

I'm after a grep-type tool to search for purely literal strings. I'm looking for the occurrence of a line of a log file, as part of a line in a separate log file. The search text can contain all sorts of regex special characters, e.g., []().*^$-\.
Is there a Unix search utility which would not use regex, but just search for literal occurrences of a string?
You can use grep for that, with the -F option.
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings
That's either fgrep or grep -F which will not do regular expressions. fgrep is identical to grep -F but I prefer to not have to worry about the arguments, being intrinsically lazy :-)
grep -> grep
fgrep -> grep -F (fixed)
egrep -> grep -E (extended)
rgrep -> grep -r (recursive, on platforms that support it).
Pass -F to grep.
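To see the difference, here is a small sketch with a throwaway file (the file contents and variable names are invented for the demo). The same search text matches different lines depending on whether it is read as a regex or as a fixed string.

```shell
# Two sample lines: one contains the literal text, one only matches it as a regex.
tmp=$(mktemp)
printf '%s\n' 'value[0].*' 'value0' > "$tmp"

# As a regex, [0] is a bracket expression and .* means "anything".
regex_match=$(grep 'value[0].*' "$tmp")

# With -F every character is taken literally.
fixed_match=$(grep -F 'value[0].*' "$tmp")

printf 'regex: %s\n' "$regex_match"   # prints: regex: value0
printf 'fixed: %s\n' "$fixed_match"   # prints: fixed: value[0].*
rm -f "$tmp"
```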
You can also use awk, which can compare against fixed strings and also gives you programming capabilities, e.g.:
awk '{for(i=1;i<=NF;i++) if($i == "mystring") {print "do data manipulation here"} }' file
cat list.txt
one:hello:world
two:2:nothello
three:3:kudos
grep --color=always -wF "hello
three" list.txt
output
one:hello:world
three:3:kudos
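Note that -F on its own still matches substrings. A quick sketch with the same sample data (the temp file and variable names are just for the demo) shows how adding -w restricts the match to whole words:

```shell
# Recreate the sample list in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'one:hello:world' 'two:2:nothello' 'three:3:kudos' > "$tmp"

# -F alone: "hello" is also found inside "nothello".
substring_count=$(grep -cF 'hello' "$tmp")

# -w added: only lines where "hello" stands as a whole word.
word_count=$(grep -cwF 'hello' "$tmp")

printf 'substring: %s, whole word: %s\n' "$substring_count" "$word_count"
# prints: substring: 2, whole word: 1
rm -f "$tmp"
```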
I really like the -P flag available in GNU grep for selective ignoring of special characters.
It makes grep -P "^some_prefix\Q[literal]\E$" possible
from grep manual
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.

Why do my results appear to differ between ag and grep?

I'm having trouble correctly (and safely) executing the right regex searches with grep. I seem to be able to do what I want using ag.
What I want to do in plain English:
Search my current directory (recursively?) for files that have lines containing both the words "nested" and "merge"
Successful attempt with ag:
$ ag --depth=2 -l "nested.*merge|merge.*nested" .
scratch.md
scratch.rb
Unsuccessful attempt with grep:
$ grep -elr 'nested.*merge|merge.*nested' .
grep: nested.*merge|merge.*nested: No such file or directory
grep: .: Is a directory
What am I missing? Also, could either approach be improved?
Thanks!
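You probably want -E, not -e, or just egrep.
The -e option takes whatever follows it as the pattern, so with -elr grep searches for the pattern "lr" and treats your regex and . as file names. That is where both error messages come from; man grep has the details.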
You can use grep -lr 'nested.*merge\|merge.*nested' or grep -Elr 'nested.*merge|merge.*nested' for your case.
In the latter, -E selects ERE (extended regular expression) syntax. By default grep uses BRE (basic regular expressions), where | matches a literal | character and \| means "or" (a GNU extension).
For more detail about ERE and BRE, you can read this article
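A self-contained way to try the -E form is sketched below. The directory layout and file contents are made up for the test, and only scratch.md is meant to match.

```shell
# Build a throwaway directory with one matching and one non-matching file.
dir=$(mktemp -d)
printf '%s\n' 'deep merge of nested hashes' > "$dir/scratch.md"
printf '%s\n' 'nothing relevant here' > "$dir/other.txt"

# -E: extended regex, -l: list file names only, -r: recurse into directories.
matches=$(cd "$dir" && grep -Elr 'nested.*merge|merge.*nested' . | sort)

printf '%s\n' "$matches"   # prints: ./scratch.md
rm -rf "$dir"
```

The sort is only there to make the order predictable when more than one file matches.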

Extract number embedded in string

So I run a curl command and grep for a keyword.
Here is the (sanitized) result:
...Dir');">Town / Village</a></th><th>Phone Number</th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...
I want to get the number 42 - a command line one-liner would be great.
search for the string helloThereId=
extract the number right beside it (42 in the above case)
Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.
You could use grep with -P (Perl-Regexp) parameter enabled.
$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42
\K here actually does the job of positive lookbehind. \K keeps the text matched so far out of the overall regex match.
References:
http://www.regular-expressions.info/keep.html
http://www.regular-expressions.info/lookaround.html
If your grep version supports -P, (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.
For the potential benefit of future readers, here are alternatives:
If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:
grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
Finally, if your grep supports neither -P nor -o, use sed:
Here's a POSIX-compliant alternative, using sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):
sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:
$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42
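In a script you would usually capture the number in a variable and check that something was found. A minimal sketch, assuming a shortened stand-in for the curl output (the sample string and variable names are hypothetical):

```shell
# Shortened stand-in for the curl output shown in the question.
sample="calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"

# -n plus the p flag prints only when the substitution succeeded,
# so id stays empty if helloThereId= is not followed by digits.
id=$(printf '%s\n' "$sample" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')

if [ -n "$id" ]; then
    printf 'id=%s\n' "$id"   # prints: id=42
else
    printf 'no id found\n' >&2
fi
```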

Print RegEx matches using SED in bash

I have an XML file, the file is made up of one line.
What I am trying to do is extract the "finalNumber" attribute value from the file via PuTTY, rather than having to download a copy and search using Notepad++.
I've built up a regular expression that I've tested on an On-line Tool, and tried using it within a sed command to duplicate grep functionality. The command runs but doesn't return anything.
RegEx:
(?<=finalNumber=")(.*?)(?=")
sed Command (returns nothing, expected 28, see file extract):
sed -n '/(?<=finalNumber=")(.*?)(?=")/p' file.xml
File Extract:
...argo:finalizedDate="2012-02-09T00:00:00.000Z" argo:finalNumber="28" argo:revenueMonth=""...
I feel like I am close (I could be wrong). Am I on the right lines, or is there a better way to achieve the output?
Nothing wrong with good old grep here.
grep -E -o 'finalNumber="[0-9]+"' file.xml | grep -E -o '[0-9]+'
Use -E for extended regular expressions, and -o to print only the matching part.
Though you have already selected an answer, here is a way to do it in pure sed:
sed -n 's/^.*finalNumber="\([[:digit:]]\+\)".*$/\1/p' <test
Output:
28
This replaces the entire line with the matched number and prints it. Because p prints the whole pattern space, the substitution has to replace the entire line so that only the number is left.
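The difference between printing a matching line and printing only the captured value can be sketched like this, using the file extract from the question as input (the variable names are made up):

```shell
# One-line input, taken from the file extract in the question.
xml='argo:finalizedDate="2012-02-09T00:00:00.000Z" argo:finalNumber="28" argo:revenueMonth=""'

# /regex/p prints the whole line whenever it matches.
whole=$(printf '%s\n' "$xml" | sed -n '/finalNumber="[0-9][0-9]*"/p')

# s/.../\1/p first rewrites the line to the captured group, then prints it.
value=$(printf '%s\n' "$xml" | sed -n 's/.*finalNumber="\([0-9][0-9]*\)".*/\1/p')

printf '%s\n' "$value"   # prints: 28
```

Here [0-9][0-9]* stands in for [[:digit:]]\+ so the expression does not depend on the GNU \+ extension.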
This might work for you (GNU sed):
sed -r 's/.*finalNumber="([^"]*)".*/\1/' file
sed does not support look-ahead assertions. Perl does, though:
perl -ne 'print $1 if /(?<=finalNumber=")(.*?)(?=")/'
As I understand, there is no need to use look-aheads here.
Try this one
sed -n '/finalNumber="[[:digit:]]\+"/p'

How to use regex OR in grep in Cygwin?

I need to return results for two different matches from a single file.
grep "string1" my.file
correctly returns the single instance of string1 in my.file
grep "string2" my.file
correctly returns the single instance of string2 in my.file
but
grep "string1|string2" my.file
returns nothing
In regex test apps that syntax is correct, so why does it not work for grep in Cygwin?
Using the | character without escaping it in a basic regular expression will only match the | literal. For instance, if you have a file with contents
string1
string2
string1|string2
Using grep "string1|string2" my.file will only match the last line
$ grep "string1|string2" my.file
string1|string2
In order to use the alternation operator |, you could:
Use a basic regular expression (just grep) and escape the | character in the regular expression
grep "string1\|string2" my.file
Use an extended regular expression with egrep or grep -E, as Julian already pointed out in his answer
grep -E "string1|string2" my.file
If it is two different patterns that you want to match, you could also specify them separately in -e options:
grep -e "string1" -e "string2" my.file
You might find the following sections of the grep reference useful:
Basic vs Extended Regular Expressions
Matching Control, where it explains -e
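The options above can be compared side by side with a small sketch (the temp file and variable names are invented for the demo):

```shell
# Sample file with both strings, the literal "string1|string2", and a non-matching line.
tmp=$(mktemp)
printf '%s\n' 'string1' 'string2' 'string1|string2' 'other' > "$tmp"

# BRE: the unescaped | is a literal character, so only the third line matches.
literal_count=$(grep -c 'string1|string2' "$tmp")

# ERE: | is alternation, so the first three lines match.
ere_count=$(grep -cE 'string1|string2' "$tmp")

# Separate -e patterns: same result as the ERE alternation.
multi_count=$(grep -c -e 'string1' -e 'string2' "$tmp")

printf '%s %s %s\n' "$literal_count" "$ere_count" "$multi_count"   # prints: 1 3 3
rm -f "$tmp"
```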
You may need to either use egrep or grep -E. The pipe OR symbol is part of 'extended' grep and may not be supported by the basic Cygwin grep.
Also, you probably need to escape the pipe symbol.
The best and most clear way I've found is:
grep -e REG1 -e REG2 -e REG3 _FILETOGREP_
I never use pipe as it's less evident and very awkward to get working.
You can find this information by reading the fine manual: grep(1), which you can find by running 'man grep'. It describes the difference between grep and egrep, and between basic and extended regular expressions, along with a lot of other useful information about grep.