Regular expressions: [a]bc vs abc - regex

I am not a regular expressions expert, but I thought I understood the basics. I was reading a tutorial that mentioned using this syntax:
$ ps -ewwo pid,args | grep [s]sh
to determine if SSHD is running or not.
I do not understand why the first s is in brackets. I would think that ssh and [s]sh would yield the same results, but I actually get different results.
$ ps -ewwo pid,args | grep [s]sh
1258 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
2988 /usr/sbin/sshd -D
$ ps -ewwo pid,args | grep ssh
1258 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
2988 /usr/sbin/sshd -D
3082 grep --color=auto ssh
So why does it find the 3rd result in the second example?
Thanks!

The regular expressions [a]bc and abc match exactly the same set of strings, but they're being applied to different data, because the command-line arguments to grep appear in the output of the ps command.
Using [a]bc causes the literal string "[a]bc" to appear in the output of ps -- and this isn't matched by the regular expression [a]bc.
The idea is to avoid matching the line for the grep command itself.
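Here is a small sketch of the effect, assuming a POSIX shell and grep. It does not depend on what is actually running: the two strings below are made-up stand-ins for ps output (the PIDs and lines are illustrative). Each one includes the line the grep process itself would add. Quoting the pattern also stops the shell from glob-expanding [s]sh if a file named ssh happens to exist in the current directory.

```shell
#!/bin/sh
# Simulated `ps -ewwo pid,args` output (illustrative, not captured from a real system).
# The last line of each is what the grep process itself would look like.
ps_with_plain='2988 /usr/sbin/sshd -D
3082 grep ssh'
ps_with_class='2988 /usr/sbin/sshd -D
3083 grep [s]sh'

# `ssh` also matches the literal text "ssh" in grep's own command line.
plain=$(printf '%s\n' "$ps_with_plain" | grep 'ssh')

# `[s]sh` still matches "sshd", but the text "[s]sh" contains no "ssh" substring.
class=$(printf '%s\n' "$ps_with_class" | grep '[s]sh')

echo "--- grep ssh ---"
echo "$plain"
echo "--- grep [s]sh ---"
echo "$class"
```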

The brackets are a character class, but a class containing a single character and no repeat matches exactly what that character alone would, so as a regex it adds nothing over a plain s.
The reason you get different results is that ssh matches itself in the grep arguments shown in the process list, but [s]sh does not match itself.

When you pipe ps into grep, you'll often find the running grep process itself, because the search term appears in grep's own command line and will therefore usually match.

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed; I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
I think you can grep for non-/ characters like:
grep -Eo '^[^/]+'
There is a similar question with a solution on another SO site.
You don't need grep for this at all.
cut -d / -f 1
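Here is a quick check of the cut approach on the question's sample input, assuming a POSIX cut. cut prints only the first field, so the separating slash is dropped (.config rather than .config/). If the trailing slash matters, cut alone won't reproduce the requested output.

```shell
#!/bin/sh
# Sample input from the question.
input='.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md'

# Print field 1, using "/" as the delimiter.
# Lines without any "/" are printed whole (cut's default when -s is not given).
firsts=$(printf '%s\n' "$input" | cut -d / -f 1)
echo "$firsts"
```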
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning-of-line anchor, and the omission of the -P option (which you were not really using anyway) as well as the redundant {1} quantifier, which only means "exactly once".
(I should add that plain grep uses POSIX basic regular expressions (BRE), where unescaped round and curly parentheses are ordinary characters. grep -E supports these constructs directly; in BRE you can instead write them with a backslash, as \( \) and \{ \}, to use them as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
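Running the anchored pattern on the sample input shows that ^[^/]* stops before the slash. The sketch below assumes GNU grep; -o is not required by POSIX, and some older BSD greps handled ^ with -o differently. It also shows a variant that keeps the trailing slash, as in the requested output, by adding an optional / in ERE syntax. That tweak is an adjustment to the answer, not part of it.

```shell
#!/bin/sh
input='.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md'

# Anchored at the start of the line: only the first run of non-slash characters is printed.
no_slash=$(printf '%s\n' "$input" | grep -o '^[^/]*')

# ERE with an optional "/" so that ".config/" keeps its trailing slash.
with_slash=$(printf '%s\n' "$input" | grep -Eo '^[^/]*/?')

echo "$no_slash"
echo "$with_slash"
```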

Why do my results appear to differ between ag and grep?

I'm having trouble correctly (and safely) executing the right regex searches with grep. I seem to be able to do what I want using ag.
What I want to do in plain English:
Search my current directory (recursively?) for files that have lines containing both the words "nested" and "merge"
Successful attempt with ag:
$ ag --depth=2 -l "nested.*merge|merge.*nested" .
scratch.md
scratch.rb
Unsuccessful attempt with grep:
$ grep -elr 'nested.*merge|merge.*nested' .
grep: nested.*merge|merge.*nested: No such file or directory
grep: .: Is a directory
What am I missing? Also, could either approach be improved?
Thanks!
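You probably want -E, not -e (or just egrep).
A man grep will show why -e gave you that error: -e takes the next argument as the pattern, so in -elr the letters lr become the pattern. 'nested.*merge|merge.*nested' is then treated as a file name, and . is reported as a directory because the r was never parsed as -r.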
You can use grep -lr 'nested.*merge\|merge.*nested' or grep -Elr 'nested.*merge|merge.*nested' for your case.
In the latter, -E means grep uses ERE (extended) regular expression syntax. By default grep uses BRE, where | matches a literal | and \| means alternation (a GNU extension).
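Here is a minimal check that the BRE and ERE forms find the same file, assuming GNU grep (for \| in BRE) and mktemp. The directory and file names are hypothetical stand-ins for the question's scratch files.

```shell
#!/bin/sh
# Throwaway directory with one matching and one non-matching file (hypothetical names).
dir=$(mktemp -d)
printf 'a nested hash merge\n' > "$dir/scratch.md"
printf 'nothing relevant here\n' > "$dir/other.txt"

# BRE (default): \| is alternation in GNU grep; a bare | would be a literal pipe.
bre=$(grep -lr 'nested.*merge\|merge.*nested' "$dir")

# ERE (-E): | is alternation directly.
ere=$(grep -Elr 'nested.*merge|merge.*nested' "$dir")

echo "BRE: $bre"
echo "ERE: $ere"
rm -r "$dir"
```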
For more detail about ERE and BRE, you can read this article

OS X groups seems to allow character classes to repeat many times by default

I am trying to grep a text file for some things, and I noticed some odd behavior on OS X. I feel that I have a pretty solid grasp on regular expressions, but maybe I don't know as much as I think. So, I apologize if the answer is obvious.
Each line of my text file has this format:
<number> <number> <text>
So just to start, I want to see if I could match lines starting with a 1:
grep "^1" dataset.txt
However, it seems grep matches any line starting with 1, 11, 111, etc. This is just incorrect I think. EDIT: grep is matching 1, 11, 111, etc. This was causing some confusion. My problem is that grep is matching too many 1's, not that it is returning lines starting with 11.
Next, I wanted to see what would happen if I searched for any line starting with any digit:
grep "^[0-9]" dataset.txt
This matched the whole number at the start of each line, such as 130380, which is also incorrect. I tried this to see if I could only match the first digit in the line:
grep "^[0-9]?" dataset.txt
This pattern returns nothing. I also tried specifying -P to use perl style regular expressions and got this:
grep -P "^[0-9]" dataset.txt
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
Clearly P is in the list of arguments, although I read the man page on my system, and -P was not listed. Does anyone know why grep is acting like this?
Thanks
grep "^1" dataset.txt
However, it seems grep matches any line starting with 1, 11, 111, etc. This is just incorrect I think.
This is expected behavior: you're asking for lines whose first char is 1, without further constraining what comes after.
If, by contrast, you don't want to constrain matching, but instead want to constrain the output by only printing the matching part of the line, you must use grep's -o option.
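Here is a sketch of the difference between matching lines and printing only the match, assuming GNU grep. The data lines are made up in the question's <number> <number> <text> format. As the update below notes, the old BSD grep on OS X mishandled anchored matches, so its --color output, and possibly -o, may differ from this.

```shell
#!/bin/sh
# Made-up lines in the question's "<number> <number> <text>" format.
data='1 20 alpha
11 3 beta
111 7 gamma
2 5 delta'

# ^1 only constrains the first character; every line starting with 1 matches.
lines=$(printf '%s\n' "$data" | grep '^1')

# -o prints just the matched part: a single "1" per matching line.
parts=$(printf '%s\n' "$data" | grep -o '^1')

echo "$lines"
echo "$parts"
```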
Update: Turns out that the OP was referring to the --color option's behavior: --color is supposed to color (highlight) the matching part of every matching line, but does so incorrectly due to a bug - as of grep (BSD grep) 2.5.1-FreeBSD (OS X 10.9.2).
Clearly P is in the list of arguments, although I read the man page on my system, and -P was not listed. Does anyone know why grep is acting like this?
-P (Perl-style regexes) are indeed NOT supported on OSX - what you see is a typo in the error message (it should be -p (lowercase!), an entirely different option - see man grep).
grep "^[0-9]?" dataset.txt
This pattern returns nothing.
This is expected behavior: OSX grep defaults to basic (aka obsolete) regular expressions, which require escaping ? as \?.
If you want to use extended (aka modern) regular expressions - where such escaping is not needed - invoke grep either as egrep or with the -E option.
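Here is a sketch of why ^[0-9]? returned nothing, assuming GNU grep and made-up data. In BRE the ? is a literal character. In ERE it means "zero or one", so the pattern can match the empty string at the start of any line. That makes it match every line, which also shows it is not a useful filter even with -E.

```shell
#!/bin/sh
data='130380 42 foo
7 9 bar
x 1 baz'

# BRE: "?" is an ordinary character, so this looks for a digit followed by a literal "?".
bre=$(printf '%s\n' "$data" | grep -c '^[0-9]?')

# ERE: "?" means zero or one, so the pattern can match the empty string at ^
# and therefore matches every line, even "x 1 baz".
ere=$(printf '%s\n' "$data" | grep -Ec '^[0-9]?')

echo "BRE count: $bre"
echo "ERE count: $ere"
```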

Strange regex behavior with grep

grep '[:digit:]{1,}-{1,}' *.txt| wc -l
This command outputs: 0
grep '1-' *.txt| wc -l
However, this command outputs: 10598
Both commands are being run from the same directory. The first command should have returned greater than or equal to the output of the second command. Can anyone shed some insight about what is going on here?
echo 1 | grep '[:digit:]'
#nothing....
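On its own, [:digit:] is an ordinary bracket expression matching one of the characters :, d, i, g or t (the named class only works inside another pair of brackets), so it never matches a digit. grep uses a different syntax: you need [[:digit:]] or [0-9].
In basic grep, an unescaped {1,} is matched literally; write \{1,\} instead, or use another mode such as the extended one with -E. Note: normally one would use + for matching one or more characters.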
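Here is a minimal sketch of the corrected pattern in both BRE and ERE, using made-up sample lines. It sticks to [[:digit:]] because the bare [:digit:] form behaves differently across versions: older greps silently treat it as a set of letters, while recent GNU grep releases reject it with an error.

```shell
#!/bin/sh
data='id 1-2
no digits here
x9--y'

# BRE: [[:digit:]] is the class; interval braces must be escaped as \{1,\}.
bre=$(printf '%s\n' "$data" | grep -c '[[:digit:]]\{1,\}-\{1,\}')

# ERE: unescaped braces work, and + is the usual shorthand for {1,}.
ere=$(printf '%s\n' "$data" | grep -Ec '[[:digit:]]+-+')

echo "BRE: $bre  ERE: $ere"
```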
General note: always test regexes in small parts to see that each part really does what you thought it does. Once the expression gets complicated, it's really hard to tell what went wrong.

How to use regex OR in grep in Cygwin?

I need to return results for two different matches from a single file.
grep "string1" my.file
correctly returns the single instance of string1 in my.file
grep "string2" my.file
correctly returns the single instance of string2 in my.file
but
grep "string1|string2" my.file
returns nothing
In regex test apps that syntax is correct, so why does it not work for grep in Cygwin?
Using the | character without escaping it in a basic regular expression will only match the | literal. For instance, if you have a file with contents
string1
string2
string1|string2
Using grep "string1|string2" my.file will only match the last line
$ grep "string1|string2" my.file
string1|string2
In order to use the alternation operator |, you could:
Use a basic regular expression (just grep) and escape the | character in the regular expression
grep "string1\|string2" my.file
Use an extended regular expression with egrep or grep -E, as Julian already pointed out in his answer
grep -E "string1|string2" my.file
If it is two different patterns that you want to match, you could also specify them separately in -e options:
grep -e "string1" -e "string2" my.file
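Here is a sketch that runs all four variants against the three-line example file from this answer, assuming GNU grep (which Cygwin ships) for \| in BRE. The temporary file stands in for my.file.

```shell
#!/bin/sh
# Recreate the answer's three-line example file in a temporary file.
f=$(mktemp)
printf 'string1\nstring2\nstring1|string2\n' > "$f"

literal=$(grep -c 'string1|string2' "$f")          # BRE: | is literal, only line 3 matches
bre_alt=$(grep -c 'string1\|string2' "$f")         # GNU BRE: \| is alternation
ere_alt=$(grep -Ec 'string1|string2' "$f")         # ERE: | is alternation
multi_e=$(grep -c -e 'string1' -e 'string2' "$f")  # one pattern per -e

echo "literal=$literal bre=$bre_alt ere=$ere_alt multi_e=$multi_e"
rm -f "$f"
```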
You might find the following sections of the grep reference useful:
Basic vs Extended Regular Expressions
Matching Control, where it explains -e
You may need to use either egrep or grep -E. The pipe OR symbol is part of 'extended' grep and may not be supported by the basic Cygwin grep.
Alternatively, with plain (basic) grep you need to escape the pipe symbol as \|.
The best and most clear way I've found is:
grep -e REG1 -e REG2 -e REG3 _FILETOGREP_
I never use pipe as it's less evident and very awkward to get working.
You can find this information by reading the fine manual: grep(1), which you can find by running 'man grep'. It describes the difference between grep and egrep, and between basic and extended regular expressions, along with a lot of other useful information about grep.