UNIX grep with $ - regex

I have a quick question.
Suppose I have a file that contains:
abc$
$
$abc
and then I run grep "c\$" filename, I get $abc only. But if I run grep "c\\$" filename, I get abc$.
I am pretty confused: doesn't the backslash already turn off the special meaning of $? So shouldn't grep "c\$" filename return the line abc$?
I'd really appreciate any suggestions.
Many thanks in advance.

The double quotes are throwing you off. Inside double quotes the shell still processes backslashes and $ before grep ever sees the pattern. On my Linux box, using single quotes only:
$ grep 'abc$' <<<'abc$'
$ grep 'abc\$' <<<'abc$'
abc$
$ grep 'abc\$' <<<"abc$"
abc$
$ grep 'abc$' <<<'abc$'
$ grep 'abc\\$' <<<'abc$'
$
Note that the only pattern among the five commands above that matched (and printed the line) was abc\$. When I didn't escape the $, grep assumed I was looking for the string abc anchored to the end of the line. When I put a single backslash before the $, it treated the $ as a literal character and not as an end-of-line anchor.
Note that $ as an end-of-line anchor has some intelligence. In a basic regular expression (grep's default), a $ in the middle of the pattern is a regular character:
$ grep 'a$bc' <<<'a$bc'
a$bc
$ grep 'a\$bc' <<<'a$bc'
a$bc
Here, grep found the literal string a$bc whether or not I escaped the $.
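As a quick check of that claim, here is a small sketch (count is a helper name made up for this example). It assumes a POSIX-style grep; with -E the same pattern is parsed as an extended regular expression, where $ is an anchor wherever it appears, so the results should differ:

```shell
# count PATTERN [OPTIONS...]: print how many of the sample lines match.
# "count" is a hypothetical helper, not part of grep.
count() {
  pattern=$1
  shift
  # "|| true" keeps the exit status zero when nothing matches.
  printf '%s\n' 'a$bc' | grep -c "$@" -e "$pattern" || true
}

bre_plain=$(count 'a$bc')        # BRE: $ in the middle is an ordinary character
bre_escaped=$(count 'a\$bc')     # BRE: an escaped $ is ordinary as well
ere_plain=$(count 'a$bc' -E)     # ERE: $ is an anchor even in the middle
ere_escaped=$(count 'a\$bc' -E)  # ERE: the backslash makes it literal again

echo "BRE plain=$bre_plain escaped=$bre_escaped"
echo "ERE plain=$ere_plain escaped=$ere_escaped"
```

So if you ever switch the same pattern to grep -E, escape the $ explicitly rather than relying on its position.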
I then tried the same thing with double quotes:
$ grep "abc\$" <<<'abc$'
$ grep "abc\\$" <<<'abc$'
abc$
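With a single \, the shell consumed the backslash (it only protected the $ from the shell), so grep received abc$ and treated the $ as an end-of-line anchor. With two backslashes, the shell collapsed \\ into a single \, so grep received abc\$ and treated the $ as a literal character.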
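One way to see what is going on is to print each pattern exactly as the shell hands it to grep, then run it against the three lines from the question. A minimal sketch (the variable names and the matches helper are made up for illustration):

```shell
# What grep actually receives after the shell has processed the quotes.
single='c\$'     # single quotes: nothing is touched
double1="c\$"    # double quotes: the shell consumes the backslash
double2="c\\$"   # double quotes: \\ collapses to one backslash

printf '%s\n' "$single" "$double1" "$double2"

# matches PATTERN: print the lines of the question's sample file that match.
matches() {
  printf '%s\n' 'abc$' '$' '$abc' | grep -e "$1" || true
}

matches "$double1"   # regex c$  : lines ending in c
matches "$double2"   # regex c\$ : lines containing a literal c followed by $
```

Printing the argument first is a cheap habit whenever a quoted regex behaves unexpectedly.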

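Don't assume that every $ needs to be escaped: in a basic regular expression, $ is an anchor only at the end of the pattern (or of a subexpression); anywhere else it is an ordinary character.
The GNU grep manual points out a related quirk of basic regular expressions:
The meta-characters ?, +, {, |, (, and ) lose their special meaning in basic regular expressions; to get the special meaning, use the backslashed versions \?, \+, \{, \|, \(, and \).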
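To make the basic-versus-extended difference concrete, here is a short sketch (the hits helper is hypothetical). It only uses ( ) and { }, whose backslashed forms are part of POSIX basic regular expressions; \?, \+ and \| in a basic regular expression are GNU extensions and are left out:

```shell
# hits INPUT PATTERN [OPTIONS...]: print 1 if INPUT matches, otherwise 0.
# "hits" is a hypothetical helper, not part of grep.
hits() {
  input=$1
  pattern=$2
  shift 2
  printf '%s\n' "$input" | grep -c "$@" -e "$pattern" || true
}

hits 'f(x)' 'f(x)'         # BRE: ( and ) are literal
hits 'aa'   'a\{2\}'       # BRE: \{ \} form an interval
hits 'a{2}' 'a{2}'         # BRE: { and } are literal
hits 'aa'   'a{2}' -E      # ERE: { } form an interval without backslashes
hits 'f(x)' 'f\(x\)' -E    # ERE: backslashes make ( and ) literal
```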

I would suggest using fgrep if you want to search for a literal $ without having to escape it (an unescaped $ at the end of a pattern means end of line):
fgrep 'abc$' <<< 'abc$'
gives this output:
abc$
PS: fgrep is the same as grep -F (recent GNU grep releases warn that fgrep is obsolescent, so grep -F is the safer spelling). From man grep:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
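Applied to the file from the question, a fixed-string search might look like this (a sketch; sample is a made-up variable standing in for the file):

```shell
# The three lines from the question, one per line.
sample=$(printf '%s\n' 'abc$' '$' '$abc')

# -F: the pattern is plain text, so $ needs no escaping for grep.
# Single quotes keep the shell from touching it either.
printf '%s\n' "$sample" | grep -F 'c$'

# The same search as a basic regular expression needs the backslash.
printf '%s\n' "$sample" | grep 'c\$'
```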

The $ sign has a special meaning in regexp patterns: end of line. So when you use double quotes
grep "c\$"
the shell expands the string to the two characters c and $, and grep takes that as the regexp "find all lines with c at the end".
With single quotes, every character is passed through as is, i.e. the
grep 'c\$'
command passes three characters: c, \ and $. grep receives all of them, so it sees the escaped \$, treats the $ as a literal character, and does what you expected.

Related

How to match sequence ../../../ in sed?

How can I produce the following output with sed (regex)?
I need to extract the leading '../' sequences.
input
$ ../some/path
$ ../../some/path
$ ../../../any/path/to/folder
$ ../../../../path/to/some/folder
output
$ ../
$ ../../
$ ../../../
$ ../../../../
I would just do:
sed 's#/[^.].*#/#'
or (depending on the input and desired behavior):
sed -E 's#/[^.]+#/#'
Match all repeated ../ prefixes from the beginning and replace the rest with nothing:
s#^((\.\./)*).*$#\1#g
with extended regular expressions, or with basic regular expressions:
s#^\(\(\.\./\)*\).*$#\1#g
^ matches beginning of line
(\.\./) matches ../
* repeats 0 or more times
.*$ matches the rest until the end of line
\1 references the first capturing group match
However, this is probably easier to achieve with grep than with sed, since no replacement is needed (and it is likely a bit faster too):
egrep -o '^(\.\./)*'
-o prints only the matching portion of the line.
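For what it's worth, here is a small sketch that runs both sed forms over the sample paths (without the leading $ characters), assuming a sed that accepts -E for extended regular expressions, as GNU and BSD sed do. The variable names are made up for the example, and the trailing g flag is dropped because an anchored pattern can match only once per line:

```shell
paths=$(printf '%s\n' \
  '../some/path' \
  '../../some/path' \
  '../../../any/path/to/folder' \
  '../../../../path/to/some/folder')

# Extended regular expressions: plain ( ) are grouping operators.
ere_out=$(printf '%s\n' "$paths" | sed -E 's#^((\.\./)*).*$#\1#')

# Basic regular expressions: groups are written \( \).
bre_out=$(printf '%s\n' "$paths" | sed 's#^\(\(\.\./\)*\).*$#\1#')

printf '%s\n' "$ere_out"
[ "$ere_out" = "$bre_out" ] && echo 'both forms agree'
```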

find | sed - regex matching apostrophe instead of full stop in search phrase

Got this really odd thing happening with a search and replace script I am trying to implement. The following command...
find './files' -type f \( -iname "*.js" \) -exec sed -i '' s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./ {} +
It matches $stateProvider.state('app'
when it should only match $stateProvider.state('app. (when the full stop is missing it should not match, since the full stop is part of the pattern).
The confusion arises from the fact that you have not quoted the sed code. As a result, all the escapes are interpreted by the shell. So the \. that you have included is reduced by the shell to a plain ., and that is what sed sees. This is why sed matches it against any character. You can see what sed sees by typing the following in your shell:
$ echo s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./
s/$stateProvider.state('app./$stateProvider.state('app.myap./
or just try this:
$ echo \.
.
So you need to escape the escape character, i.e.
$ echo \\.
\.
Edit: To find the complete command, you have to think backwards. The command that we want to send to sed is the following:
s/\$stateProvider\.state('app\./$stateProvider.state('app.myap./
Notice that I have escaped the characters $ and . because they have special meaning in sed when used in the pattern. Now I have to escape the above string again, this time for bash (or whatever shell you use):
s/\\\$stateProvider\\.state\(\'app\\./\$stateProvider.state\(\'app.myap\./
Notice that I have escaped the characters \, $, (, and ' because they have special meaning in bash.
So the complete command would be
sed -i '' s/\\\$stateProvider\\.state\(\'app\\./\$stateProvider.state\(\'app.myap\./
Alternatively, for the last step, I could have simply used quoting:
's/\$stateProvider\.state('\''app\./$stateProvider.state('\''app.myap./'
Notice that I only had to take special care for ', which has to be written as '\'' inside single quotes.
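To check the difference without touching any files, the two scripts can be stored in variables and run over a couple of made-up lines. This is only a sketch: the sample input and the variable names are hypothetical, and sed is run without -i so nothing is modified:

```shell
# Unquoted: the shell strips one level of backslashes before sed runs.
unquoted=s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./
# Single-quoted, with each apostrophe written as '\'': sed gets the backslashes.
quoted='s/\$stateProvider\.state('\''app\./$stateProvider.state('\''app.myap./'

printf '%s\n' "$unquoted" "$quoted"

# Hypothetical sample input: only the second line contains "app.".
sample=$(cat <<'EOF'
$stateProvider.state('app', {})
$stateProvider.state('app.home', {})
EOF
)

printf '%s\n' "$sample" | sed "$unquoted"   # the bare . matches any character
printf '%s\n' "$sample" | sed "$quoted"     # \. matches only a full stop
```

With the unquoted script the first sample line is rewritten as well, which is the unwanted match described in the question.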

What is the meaning of the -F option in grep manual

-F is an option of grep. From the manual:
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
My question is:
How do I separate multiple fixed strings? What is the newline character, \n or \?
It seems grep -F a\nh file is not valid if I want to find lines which start with the character a or h.
Thanks in advance!
In grep, -F causes patterns to be matched literally, i.e. no regex interpretation is done on the pattern(s).
Multiple patterns are separated by actual newline characters.
The shell does not turn an unquoted \n into a newline (a\nh reaches grep as anh); in bash, ksh or zsh you can write $'a\nh' to get a real newline, or pass each string with its own -e option.
Example:
$ echo $'foo\nf.o\nba.r\nbaar\n'
foo
f.o
ba.r
baar
$ grep -F $'f.o\nba.r' <<<$'foo\nf.o\nba.r\nbaar\n'
f.o
ba.r
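If $'...' is not available in your shell, the same effect can be had in other ways. A sketch under that assumption (the variable names are made up): it first shows what the unquoted a\nh turns into, then passes two fixed strings, once as a single newline-separated argument and once with separate -e options:

```shell
# Unquoted, the shell removes the backslash, so grep would be handed "anh".
unquoted=a\nh
printf '%s\n' "$unquoted"

# A real newline, built portably with a quoted line break.
nl='
'
patterns="f.o${nl}ba.r"
input=$(printf '%s\n' foo f.o ba.r baar)

# One argument containing two newline-separated fixed strings.
printf '%s\n' "$input" | grep -F "$patterns"

# The same search with one -e option per fixed string.
printf '%s\n' "$input" | grep -F -e 'f.o' -e 'ba.r'
```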
By default the pattern is a Basic Regular Expression (BRE), but with -F it is interpreted as a literal string with no metacharacters.
You can also use -E which will enable Extended Regular Expressions (ERE).
% grep -F '..' <<< $'hello\nworld\n...'
...
% grep '..' <<< $'hello\nworld\n...'
hello
world
...

grep regex with backtick matches all lines

$ cat file
anna
amma
kklks
ksklaii
$ grep '\`' file
anna
amma
kklks
ksklaii
Why? How does that match work?
This appears to be a GNU extension for regular expressions. The backtick ('\`') anchor matches the very start of a subject string, which explains why it is matching all lines. OS X apparently doesn't implement the GNU extensions, which would explain why your example doesn't match any lines there. See http://www.regular-expressions.info/gnu.html
If you want to match an actual backtick when the GNU extensions are in effect, this works for me:
grep '[`]' file
twm's answer provides the crucial pointer, but note that it is the sequence \`, not ` by itself that acts as the start-of-input anchor in GNU regexes.
Thus, to match a literal backtick in a regex specified as a single-quoted shell string, you don't need any escaping at all, neither with GNU grep nor with BSD/macOS grep:
$ { echo 'ab'; echo 'c`d'; } | grep '`'
c`d
When using double-quoted shell strings - which you should avoid for regexes, for reasons that will become obvious - things get more complicated, because you then must escape the ` for the shell's sake in order to pass it through as a literal to grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\`"
c`d
Note that, after the shell has parsed the "..." string, grep still only sees `.
To recreate the original command with a double-quoted string with GNU grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\\\`" # !! BOTH \ and ` need \-escaping
ab
c`d
Again, after the shell's string parsing, grep sees just \`, which to GNU grep is the start-of-the-input anchor, so all input lines match.
Also note that since grep processes input line by line, \` has the same effect as ^ the start-of-a-line anchor; with multi-line input, however - such as if you used grep -z to read all lines at once - \` only matches the very start of the whole string.
To BSD/macOS grep, \` simply escapes a literal `, so it only matches input lines that contain that character.
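As a recap, this sketch prints what each quoting style hands to grep, without running the GNU-specific anchor itself (the variable names are made up). The final command uses a bracket expression, which should match a literal backtick with either GNU or BSD grep:

```shell
# What each quoting style hands to grep (printed, not executed).
sq='`'        # single quotes: a bare backtick
dq="\`"       # double quotes: the shell consumes the backslash
dq2="\\\`"    # double quotes: grep gets a backslash followed by a backtick
printf '%s\n' "$sq" "$dq" "$dq2"

# A bracket expression sidesteps the escaping question entirely.
printf '%s\n' 'ab' 'c`d' | grep '[`]'
```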

AWK regex for gsubs pattern

I am trying to define a gsub awk statement to find all non-escaped $ chars and escape them.
So the following input -> result pairs should be handled:
$ -> \$
$a -> \$a
\$ -> \$
$$$ -> \$\$\$
So basically I am looking for the correct pattern to put in this statement:
gsub(pattern,"\\\$", input_string);
Using $ as the field separator, awk splits the input into fields; a backslash is then appended to every field except the last, but only if that field is empty or ends with a character other than a backslash.
$ cat file
$
$a$$$
\$
$$$\$$
$ awk -F$ -v OFS="$" '{for(i=1;i<NF;i++){if($i == "" || $i ~/[^\\]$/) $i=$i"\\"}}1' file
\$
\$a\$\$\$
\$
\$\$\$\$\$
You could also try the below perl solution.
perl -pe 's/(?<!\\)\$/\\\$/g' file
The substitution matches all $ that are not preceded by backslash and adds a backslash before them. The backslashes themselves all need escaping, as does the $, as it has a special meaning in regular expressions.
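If a gsub-only version is wanted, one possible approach (a sketch; escape_dollars is a hypothetical name) is to let the pattern swallow an optional leading backslash and replace both $ and \$ with \$, which sidesteps the lack of lookbehind in awk regular expressions:

```shell
# escape_dollars: hypothetical helper; reads lines on stdin and
# backslash-escapes every $ that is not already escaped.
escape_dollars() {
  awk '{
    # Match a $ together with an optional backslash in front of it and
    # rewrite both forms to \$ (the awk string "\\$" is backslash, dollar).
    gsub(/\\?\$/, "\\$")
    print
  }'
}

printf '%s\n' '$' '$a' '\$' '$$$' | escape_dollars
```

Because the match is leftmost-longest, an existing \$ is consumed as one unit and written back unchanged, so already-escaped dollars are not escaped twice.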