How to use grep with two patterns?

I have followed many links and read the reference documentation, but I am still getting errors.
I want to list all the words starting or ending with 'sx'.
I already found a solution that works fine:
awk '/^sx|sx$/' usr/dict/words
However, I tried grep and egrep, and neither worked.
Then I made it simpler: any word that contains 'ax' or 'sx'. Again, it did not work!
grep 'ax|sx' words
grep -E '^ax\|sx$' words
egrep '^ax\|sx$' words
Error:
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
Any suggestion appreciated.

This should work for you. egrep uses extended regular expressions, where | means alternation and must not be escaped:
egrep '(^ax|sx$)' usr/dict/words
NOTE:
I couldn't find any words ending in sx in the dictionary I used.
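A minimal sketch of the difference, using a small made-up word list (the file words.txt and its contents are assumptions for the demo, not the asker's dictionary). It uses grep -E; on a system whose grep rejects -E, as in the error above, egrep is the equivalent spelling.

```shell
# Hypothetical word list for the demo.
tmp=$(mktemp -d)
printf '%s\n' axe box sx relax pixel > "$tmp/words.txt"

# ERE: an unescaped | means "or". Lines starting with ax, or ending with sx.
anchored=$(grep -E '^ax|sx$' "$tmp/words.txt")

# ERE: lines containing ax or sx anywhere.
anywhere=$(grep -E 'ax|sx' "$tmp/words.txt")

# BRE (plain grep): | is a literal pipe character, so nothing matches here.
literal=$(grep 'ax|sx' "$tmp/words.txt" || true)

printf 'anchored:\n%s\nanywhere:\n%s\nliteral: [%s]\n' "$anchored" "$anywhere" "$literal"
rm -rf "$tmp"
```

The third command is the asker's first attempt: it fails silently because plain grep looks for the five-character string ax|sx.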

Related

grep strings between "{{_(" and ")}}"

I want to parse HTML files to extract the strings between "{{_(" and ")}}" using grep. I tried something like this:
grep '"[^{{_(|)}}$]"' *.html
but it didn't work.
Can someone help me please?
Thanks!
You may use
grep -oP '(?<={{_\().+?(?=\)}})' file
Details
-o - output only matched substrings
-P - enable the PCRE regex engine
(?<={{_\().+?(?=\)}}) match:
(?<={{_\() - a location that is immediately preceded with {{_(
.+? - any 1 or more chars other than line break chars, as few as possible
(?=\)}}) - a location that is immediately followed with )}} .
@Wiktor Stribiżew's answer works really well. However, if you have multiple files, the output also shows the file name for each match, like this:
foo.html: content abc
foo.html: test 123
bar.html: first match
bar.html: second match
So, if you are only interested in the matching string as output, you can try sed instead:
sed -n 's/.*{{_(\(.*\))}}.*/\1/p' *.html
You can then pipe the output into other tools, for example to count the unique matches.
Update:
Or just add the -h (--no-filename) option to the grep command that @Wiktor Stribiżew provided:
grep -h -oP '(?<={{_\().+?(?=\)}})' *.html
Or use the -c flag to display the number of matching lines (not individual matches) in each file:
grep -c -oP '(?<={{_\().+?(?=\)}})' *.html
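A sketch comparing the commands above on a made-up file (page.html and its contents are assumptions for the demo). It assumes a grep built with -P support, such as GNU grep. Two things show up when a line holds more than one placeholder: the sed command keeps only the last match on the line, because its leading .* is greedy, and -c counts lines rather than matches.

```shell
# Hypothetical sample: two placeholders on the first line, one on the second.
tmp=$(mktemp -d)
printf '%s\n' '<p>{{_(first)}} and {{_(second)}}</p>' '<p>{{_(third)}}</p>' > "$tmp/page.html"

# grep -o prints every match, one per output line.
all=$(grep -h -oP '(?<={{_\().+?(?=\)}})' "$tmp/page.html")

# sed with a greedy leading .* keeps only the last match on each line.
last=$(sed -n 's/.*{{_(\(.*\))}}.*/\1/p' "$tmp/page.html")

# -c counts matching lines; counting the lines of the -o output counts matches.
lines=$(grep -c -P '(?<={{_\().+?(?=\)}})' "$tmp/page.html")
matches=$(printf '%s\n' "$all" | wc -l | tr -d ' ')

printf 'grep -o:\n%s\nsed:\n%s\nlines=%s matches=%s\n' "$all" "$last" "$lines" "$matches"
rm -rf "$tmp"
```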
As in the previous answers, the same approach makes it possible to grep the value of an HTML attribute.
placeholder="SOME TEXT_HERE" -> grep -> "SOME TEXT_HERE"
grep -oP '(?<=placeholder=").+?(?=")' *html

Basic grep regex

I'm trying to use grep to find all files whose contents contain ".ple". I tried:
grep -ilr /.\.ple/ *
But it did not work.
Would someone be able to tell me what I did wrong?
Thanks.
Many programming and scripting languages let you write a regex between slashes (/.../), but grep does not. You should wrap the regex in quotes instead. I think that is the problem with your grep line. The correct one:
grep -ilr '.\.ple' *
With your command, grep -ilr /.\.ple/ *, the shell also strips the unquoted backslash, so grep receives /..ple/ and looks for lines like this:
foo/a.ple/bar
That is, the slashes are treated as literal characters.
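A small sketch that prints what grep actually receives as its pattern in each case. The sample input lines are made up for the demo.

```shell
# Show the pattern argument after the shell has processed it.
unquoted=$(printf '%s' /.\.ple/)    # the shell removes the backslash
quoted=$(printf '%s' '.\.ple')      # single quotes keep it

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# Consequence: the unquoted form matches "/", any two characters, "ple", "/".
hit=$(printf '%s\n' 'foo/a.ple/bar' 'a.ple' | grep /.\.ple/)
printf 'matched by the unquoted pattern: %s\n' "$hit"
```

Only the first sample line is matched, because the second one has no slashes around it.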
What do you mean by it did not work?
Try this:
grep -rlE '\.ple$'
The above should work.
Edit: To search all files whose contents contain ".ple", try this:
grep -rl '\.ple'

regex for one character or another?

I'm searching for a regex which lists all the lines that contain an a OR an i.
I tried this:
grep -E '[(a|i)]{1}' testFile.txt
but this gives me the words containing a or i and also the words that contain both a and i.
What's wrong?
To match only the lines that contain exactly one a or i, you can use:
grep -E "^[^ai]*(a|i){1}[^ai]*$" testFile.txt
If you want an exclusive OR (a without any i, or i without any a), then try this:
grep -E '^[^i]*a[^i]*$|^[^a]*i[^a]*$' testFile.txt
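A sketch that runs the three variants side by side on a made-up testFile.txt (the word list is an assumption for the demo). Note that the asker's [(a|i)] is a bracket expression, so it also matches the literal characters (, | and ); [ai] is the plain inclusive form.

```shell
tmp=$(mktemp -d)
# Hypothetical test file: one word per line.
printf '%s\n' cat dig rain sky banana > "$tmp/testFile.txt"

# Inclusive OR: any line containing an a or an i (or both).
either=$(grep '[ai]' "$tmp/testFile.txt")

# Exactly one character that is an a or an i on the whole line.
exactly_one=$(grep -E '^[^ai]*(a|i){1}[^ai]*$' "$tmp/testFile.txt")

# Exclusive OR: at least one a and no i, or at least one i and no a.
xor=$(grep -E '^[^i]*a[^i]*$|^[^a]*i[^a]*$' "$tmp/testFile.txt")

printf 'either:\n%s\nexactly one:\n%s\nxor:\n%s\n' "$either" "$exactly_one" "$xor"
rm -rf "$tmp"
```

The word banana separates the last two patterns: it has three a characters and no i, so the exclusive OR accepts it while the exactly-one pattern rejects it.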

Having trouble with GREP and REGEX

I have a text file that stores combinations of four numbers in the following format:
Num1,Num2,Num3,Num4
Num5,Num6,Num7,Num8
.............
I have a whole bunch of such files, and what I need is to grep for the names of all files that contain the pattern described above.
I constructed my grep as follows:
grep -l "{d+},{d+},{d+},{d+}" /some/path/to/file/name
The grep terminates without returning anything.
Can somebody point out what I might be doing wrong with my grep statement?
Thanks
This should do what you want:
egrep -l '[[:digit:]]+,[[:digit:]]+,[[:digit:]]+,[[:digit:]]+' /some/path/to/file/name
One way is using a perl regexp:
grep -Pl "\d+,\d+,\d+,\d+" /some/path/to/file/name
In your pattern, d is a literal character, and so are the braces. The digit shorthand needs a backslash (\d), and grep only accepts that with -P, not in its basic or extended regular expressions.
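A sketch of the extended-regex variant on two made-up files (numbers.txt and letters.txt are assumptions for the demo). It adds ^ and $ anchors, which the answers above do not use, so that the whole line must consist of four comma-separated numbers.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1,2,3,4' '55,66,77,88' > "$tmp/numbers.txt"
printf '%s\n' 'a,b,c,d' 'no numbers here' > "$tmp/letters.txt"

# -l prints only the names of files with at least one matching line.
found=$(cd "$tmp" && grep -lE '^[0-9]+,[0-9]+,[0-9]+,[0-9]+$' *.txt)

# Same idea, written with a repeated group.
short=$(cd "$tmp" && grep -lE '^([0-9]+,){3}[0-9]+$' *.txt)

# The asker's pattern looks for the literal text {d+},{d+},... and finds nothing.
orig=$(cd "$tmp" && grep -l "{d+},{d+},{d+},{d+}" *.txt || true)

printf 'found=%s short=%s orig=[%s]\n' "$found" "$short" "$orig"
rm -rf "$tmp"
```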

Regex: Match only if string A is found and string B is not

I'm trying to write a regular expression that will essentially return true if string A is found and string B is not found.
To be more specific, I'm looking for any file on my server which has the text 'base64_decode' in it, but not 'copyright'.
Thanks!
I'm not sure your real task can be solved purely within the regex passed into grep, since grep processes files line-by-line. I would use the -l (--files-with-matches) and -L (--files-without-match) options along with command substitution backticks, like so:
grep -L copyright `grep -l base64_decode *`
grep -l base64_decode * lists the names of all the files with "base64_decode" in them, and the backticks put that list on the command line after grep -L copyright, which searches those files and lists the subset of them that doesn't contain "copyright".
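A sketch of the same two-step approach with $(...) in place of backticks, run on made-up files (the file names and contents are assumptions for the demo). The guard is an addition: if the first grep finds nothing, the second one would get no file arguments and wait on standard input.

```shell
tmp=$(mktemp -d)
# Hypothetical files for the demo.
printf '%s\n' '<?php eval(base64_decode($x));' > "$tmp/suspicious.php"
printf '%s\n' '// copyright Example Corp' 'base64_decode($license);' > "$tmp/licensed.php"
printf '%s\n' 'echo "hello";' > "$tmp/clean.php"

# Step 1: files containing base64_decode.
candidates=$(grep -l base64_decode "$tmp"/*.php || true)

# Step 2: of those, the files that do NOT contain copyright.
result=
if [ -n "$candidates" ]; then
    # $candidates is unquoted on purpose so it splits into one word per file.
    # This breaks on file names containing whitespace.
    result=$(grep -L copyright $candidates || true)
fi

printf '%s\n' "$result"
rm -rf "$tmp"
```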
It's not recommended to do this in one regex, but if you must, you can use lookaheads:
^(?=.*must-have)(?!.*must-not-have)
You may want to do this in single-line/dot-all mode, and the beginning anchor may be \A instead of ^.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
References
regular-expressions.info/Lookarounds, Anchors, Dot
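A sketch of how the lookahead pattern could be applied to whole files with grep. It assumes GNU grep built with PCRE support (-P) and uses -z so that a text file is read as a single record; the file names and contents are made up for the demo.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'x = base64_decode($y);' > "$tmp/a.php"
printf '%s\n' '// copyright 2010' 'x = base64_decode($y);' > "$tmp/b.php"

# Per line: b.php is still listed, because its second line has no "copyright".
per_line=$(grep -lP '^(?=.*base64_decode)(?!.*copyright)' "$tmp"/*.php || true)

# Whole file: -z makes records NUL-separated, (?s) lets . cross newlines,
# and \A anchors at the start of the record.
whole=$(grep -lPz '(?s)\A(?=.*base64_decode)(?!.*copyright)' "$tmp"/*.php || true)

printf 'per line:\n%s\nwhole file:\n%s\n' "$per_line" "$whole"
rm -rf "$tmp"
```

The per-line run shows why dot-all mode matters here: without it, each line is judged on its own.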
Piped greps should be able to achieve that easily:
find -type f -print | xargs grep -l "base64_decode" | xargs grep -L "copyright"
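The xargs pipeline above splits file names on whitespace, and some find implementations require an explicit starting directory. A sketch of one alternative that avoids both issues by testing each file inside find itself; the file names and contents are made up for the demo.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'base64_decode($payload);' > "$tmp/bad file.php"
printf '%s\n' '# copyright' 'base64_decode($key);' > "$tmp/ok.php"

# For each regular file, the inline script succeeds only if the file contains
# base64_decode and does not contain copyright. -exec ... \; acts as a test,
# so -print fires only for files that pass it.
found=$(find "$tmp" -type f \
    -exec sh -c 'grep -q base64_decode "$1" && ! grep -q copyright "$1"' sh {} \; \
    -print)

printf '%s\n' "$found"
rm -rf "$tmp"
```

This starts two grep processes per file, so it is slower than the pipeline on large trees; the trade-off is that names with spaces survive intact.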
Use negative lookahead and lookbehind:
^(?<!.*copyright.*)(base64_decode)(?!.*copyright.*)$
Note that Perl doesn't support the variable-length lookbehind used here yet :-P.
I think it is
^.*[^copyright].*base64_decode.*[^copyright].*$
However, this will catch the string copyright anywhere, even when it is only part of a larger word.
I was wrong ([^copyright] is a character class, not a negated word), but beware that my example below still applies to most of the other answers here.
E.g. it will match:
This text is non-copyrightable because I said so! but it is not encoded in base64_decode unfortunatly :(