Correct (?) regex not understood by sed


According to https://regex101.com/r/NLSymf/3, the following regex:
\[\[(foo)([^\]]+)\]\]
(full) matches the string [[foo>test1|test2]], but this seems to not be understood by sed, since:
echo "[[foo>test1|test2]]" | sed -E -e '/\[\[(foo)([^\]]+)\]\]/d'
(which should return an empty string) returns:
[[foo>test1|test2]]
What is the regex that matches [[foo>test1|test2]] from sed's point of view?

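The backslash character loses its escaping capability within a bracket expression, so [^\] means "any character except a backslash". A closing bracket outside a bracket expression is an ordinary character and need not be escaped, which is why grep doesn't fail on the first pipeline below. To put a literal ] inside a bracket expression, place it first (right after ^, if present). See RE Bracket Expression for reference.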
$ echo 'a]' | grep -Eo '[^\]]'
a]
$ echo 'a]' | grep -Eo '[^]]'
a
The correct regex would be:
\[\[(foo)([^]]+)]]
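To check this against the sample line, both patterns can be run side by side. This is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do); the variable names are only for illustration.

```shell
input='[[foo>test1|test2]]'

# Original pattern: '[^\]' is "one character that is not a backslash",
# followed by ']+' (one or more literal ']'), so the line is not matched.
kept=$(printf '%s\n' "$input" | sed -E -e '/\[\[(foo)([^\]]+)\]\]/d')

# Corrected pattern: ']' comes first inside the bracket expression, unescaped.
deleted=$(printf '%s\n' "$input" | sed -E -e '/\[\[(foo)([^]]+)]]/d')

printf 'original pattern left:  "%s"\n' "$kept"
printf 'corrected pattern left: "%s"\n' "$deleted"
```

The first command leaves the line in place; the second deletes it, so its output is empty.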

Related

Printing only text from group

I have working example of substitution in online regex tester https://regex101.com/r/3FKdLL/1 and I want to use it as a substitution in sed editor.
echo "repo-2019-12-31-14-30-11.gz" | sed -r 's/^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$.*/\1/p'
It always prints whole string: repo-2019-12-31-14-30-11.gz, but not matched group [\w-]+.
I expect to get only text from group which is repo string in this example.

Try this:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([A-Za-z]+)-[[:alnum:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}.gz.*$/\1/p'
Explanations:
\w will work (not [\w], which matches either a backslash or w), but you should prefer [[:alnum:]], which is POSIX
For sed, \d is not a digit class; GNU sed reads \dNNN as the character with decimal code NNN, so use [[:digit:]] or [0-9] instead
Add -n to suppress sed's automatic printing, and the p flag to print only the lines where the substitution matched
Additionally, you could refactor your regex to remove the duplication:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}.gz.*$/\1/p'
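The same idea can be wrapped in a small helper. This is only a sketch: the function name extract_name is hypothetical, the dot before gz is escaped so that it matches a literal dot, and -E is used in place of -r (both are accepted by GNU sed).

```shell
# Print the leading name of a "<name>-YYYY-MM-DD-hh-mm-ss.gz" file name.
# Prints nothing when the input does not have that shape.
extract_name() {
  printf '%s\n' "$1" |
    sed -n -E 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}\.gz$/\1/p'
}

name=$(extract_name 'repo-2019-12-31-14-30-11.gz')
printf '%s\n' "$name"
```

Because of -n and the p flag, a line that does not match produces no output instead of being echoed back unchanged.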

Looks like a job for GNU grep:
echo "repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays:
repo-
On this example:
echo "repo-repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays:
repo-repo-
Which I think is what you want, because you tried [\w-]+ in your regex.
If I'm wrong, just replace the grep command with: grep -oP '^\K\w+'

Why can't I use ^\s with grep?

Both of the regexes below work in my case.
grep \s
grep ^[[:space:]]
However, all of those below fail. I tried both in Git Bash and PuTTY.
grep ^\s
grep ^\s*
grep -E ^\s
grep -P ^\s
grep ^[\s]
grep ^(\s)
The last one even produces a syntax error.
If I try ^\s in debuggex it works.
Debuggex Demo
How do I find lines starting with whitespace characters with grep ? Do I have to use [[:space:]] ?

grep \s appears to work because your input contains the letter s. With the pattern unquoted, the shell removes the backslash before grep runs, so grep receives a plain s and matches that letter; it is never parsed as the whitespace escape. If you use grep ^\\s, the shell turns \\ into a single literal \, grep receives ^\s, and lines starting with whitespace match.
A better idea is to quote the pattern so the shell leaves it alone (here also enabling POSIX ERE syntax with -E):
grep -E '^\s' <<< "$s"
See the online demo:
s=' word'
grep ^\\s <<< "$s"
# => word
grep -E '^\s' <<< "$s"
# => word
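The effect of quoting can be made visible by printing what the shell actually hands to grep. A small sketch, assuming a POSIX-style shell such as bash; the variable names are illustrative, and [[:space:]] is used for the final match because it is the portable POSIX class (\s is a GNU extension).

```shell
s=' word'

# What the shell passes on for the unquoted word: the backslash is removed.
passed=$(printf '%s' ^\s)

# Unquoted: grep receives '^s', and the line does not start with the letter s.
unquoted=$(printf '%s\n' "$s" | grep ^\s || true)

# Quoted POSIX class: matches lines starting with whitespace.
posix=$(printf '%s\n' "$s" | grep '^[[:space:]]' || true)

printf 'shell passes: %s\n' "$passed"
printf 'unquoted match: "%s"\n' "$unquoted"
printf 'quoted POSIX match: "%s"\n' "$posix"
```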

Grep regex not working with square brackets

So I was trying to write a regex in grep to match square brackets, i.e [ad] should match [ and ]. But I was getting different results on using capturing groups and character classes. Also the result is different on putting ' in the beginning and end of regex string.
So these are the different result that I am getting.
Using capturing groups works fine
echo "[ad]" | grep -E '(\[|\])'
[ad]
Using capturing groups without ' gives syntax error
echo "[ad]" | grep -E (\[|\])
bash: syntax error near unexpected token `('
using character class with [ followed by ] gives no output
echo "[ad]" | grep -E [\[\]]
Using character class with ] followed by [ works correctly
echo "[ad]" | grep -E [\]\[]
[ad]
Using character class with ] followed by [ and using ' does not work
echo "[ad]" | grep -E '[\]\[]'
It'd be great if someone could explain the difference between them.

You should know about:
BRE ( = Basic Regular Expression )
ERE ( = Extended Regular Expression )
In BRE, some metacharacters (such as ( ) and { }) require a backslash to take on their special meaning, and grep uses BRE by default.
The ERE flavor standardizes a flavor similar to the one used by the UNIX egrep command.
Pay attention to -E and -G
grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c
Regexp selection and interpretation:
-E, --extended-regexp PATTERN is an extended regular expression (ERE)
-F, --fixed-strings PATTERN is a set of newline-separated strings
-G, --basic-regexp PATTERN is a basic regular expression (BRE)
-P, --perl-regexp PATTERN is a Perl regular expression
...
...
POSIX Basic Regular Expressions
POSIX Extended Regular Expressions
POSIX Bracket Expressions
And you should also know about bash, since some of your input is related to bash interpreter not grep or anything else
echo "[ad]" | grep -E (\[|\])
Here bash parses the unquoted ( and ) as its own syntax, as it does in something like:
echo $(( 10 * 10 ))
By wrapping the pattern in single quotes ' you tell bash not to treat those characters as special operators. So
echo "[ad]" | grep -E '(\[|\])'
is correct.

First, always quote the regex pattern so that the shell does not interpret it before grep sees it:
$ echo "[ad]" | grep -E '(\[|\])'
[ad]
Second, inside a quoted bracket expression you don't need to escape [ or ]; just write them as they are, with ] placed first so that it is taken literally:
$ echo "[ad]" | grep -E '[][]'
[ad]
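The differences reported in the question come down to which string grep finally receives. The sketch below feeds grep those strings directly, already quoted, so the shell plays no part; the variable names are illustrative.

```shell
input='[ad]'

# '[][]': ']' placed first is literal, so the set contains ']' and '['.
both=$(printf '%s\n' "$input" | grep -E -o '[][]' | tr -d '\n')

# '[[]]' is what the unquoted [\[\]] becomes after the shell strips the
# backslashes: the set '[[]' followed by a literal ']', i.e. the text '[]'.
first=$(printf '%s\n' "$input" | grep -E '[[]]' || true)

# '[\]\[]' quoted: the set '[\]' (a backslash), then a literal '[', then a
# literal ']'. The input contains no backslash, so nothing matches.
quoted=$(printf '%s\n' "$input" | grep -E '[\]\[]' || true)

printf 'matches of [][]: %s\n' "$both"
printf 'match of [[]]: "%s"\n' "$first"
printf 'match of quoted [\\]\\[]: "%s"\n' "$quoted"
```

Only the first pattern matches the individual brackets in [ad]; the other two look for multi-character sequences that the input does not contain.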

Maybe you provided such a simple example on purpose (after all, it is minimal), but in case all you really want is to check for existence of square brackets (a fixed string, not regex pattern), you can use grep with -F/--fixed-strings and multiple -e options:
$ echo "[ad]" | grep -F -e '[' -e ']'
[ad]
Or, a little bit shorter with fgrep:
$ echo "[ad]" | fgrep -e '[' -e ']'
[ad]
Or, even:
$ echo "[ad]" | fgrep -e[ -e]
[ad]

Simple replacement with sed inside bash not working

Why is this simple replacement with sed inside bash not working?
echo '[a](!)' | sed 's/[a](!)/[a]/'
It returns [a](!) instead of [a]. But why, given that only three characters need to be escaped in a sed replacement string?
If I account for the case that additional characters need to be replaced in the regex string and try
echo '[a](!)' | sed 's/\[a\]\(!\)/[a]/'
it is still not working.

The point is that [a] in the regex pattern is a bracket expression that matches a single a; it does not match literal square brackets. Escape the first [ so that it is parsed as a literal [ symbol, and your replacement will work:
echo '[a](!)' | sed 's/\[a](!)/[a]/'
                       ^^
See this demo

sed uses BREs by default; EREs can be enabled by escaping individual ERE metacharacters or by using the -E argument. [ and ] are BRE metacharacters, ( and ) are ERE metacharacters. When you wrote:
echo '[a](!)' | sed 's/\[a\]\(!\)/[a]/'
you were turning the [ and ] BRE metacharacters into literals, which is good, but you were turning the literal ( and ) into ERE metacharacters, which is bad. This is what you were trying to do:
echo '[a](!)' | sed 's/\[a\](!)/[a]/'
which you'd probably really want to write using a capture group:
echo '[a](!)' | sed 's/\(\[a\]\)(!)/\1/'
to avoid duplicating [a] on both sides of the substitution. With EREs enabled using the -E argument that last would be:
echo '[a](!)' | sed -E 's/(\[a\])\(!\)/\1/'
Read the sed man page and a regexp tutorial.
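The variants discussed above can be compared directly. A minimal sketch, assuming a sed that accepts -E; the variable names are illustrative.

```shell
input='[a](!)'

# BRE with \( \): the parentheses become a group, so the regex looks for
# the text '[a]!' and the line is left unchanged.
unchanged=$(printf '%s\n' "$input" | sed 's/\[a\]\(!\)/[a]/')

# BRE with literal parentheses.
bre=$(printf '%s\n' "$input" | sed 's/\[a\](!)/[a]/')

# BRE with a capture group around [a].
bre_group=$(printf '%s\n' "$input" | sed 's/\(\[a\]\)(!)/\1/')

# ERE: the roles of ( ) and \( \) are swapped.
ere_group=$(printf '%s\n' "$input" | sed -E 's/(\[a\])\(!\)/\1/')

printf '%s\n' "$unchanged" "$bre" "$bre_group" "$ere_group"
```

The first line of output is the untouched input; the other three are [a].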

man echo says that the echo command displays a line of text. So [ and ( with their closing brackets are just text.
If you open man grep and search with /^\ *Character Classes and Bracket Expressions and /^\ *Basic vs Extended Regular Expressions, you can read about the difference. sed and other tools that use regexes interpret [a] as a bracket expression, as described under Character Classes and Bracket Expressions.
You can try this
$ echo '[a](!)' | sed 's/(!)//'

regex: return characters inside parentheses

I can't seem to get the values inside a parenthesis using grep.
echo "(this is a string)" | grep -Eo '[a-z ]*'
Ideally that should return the value inside the parentheses, "this is a string"; instead it is not returning anything. Does anyone know the explanation?

This grep with -P (perl regex) works:
echo "foo (this is a string) bar" | grep -Po '\(\K[^)]*'
this is a string
OR using awk:
echo "foo (this is a string) bar" | awk -F '[()]+' '{print $2}'
this is a string
OR using sed:
echo "foo (this is a string) bar" | sed 's/^.*(\([^)]*\)).*$/\1/'
this is a string
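For systems without grep -P, the sed approach can be wrapped in a helper. This is a sketch only: the function name inside_parens is hypothetical, it returns the first parenthesised group on the line, and it prints nothing when there is none.

```shell
# Print the text inside the first pair of parentheses, if any.
# In a BRE, plain ( and ) are literals while \( \) form the capture group.
inside_parens() {
  printf '%s\n' "$1" | sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p'
}

result=$(inside_parens 'foo (this is a string) bar')
printf '%s\n' "$result"
```

Using [^)]* instead of .* inside the group keeps the match from running past the first closing parenthesis.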

If you're trying to match everything enclosed by the parentheses, not including the parentheses, you should use this grep:
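grep -Po '(?<=\()[^)]*'
The (?<=\() is a positive look-behind assertion that tells the regex engine to start the match at a position preceded by an opening parenthesis. [^)]* then matches all characters up to, but not including, the next closing parenthesis. The quantifier is deliberately greedy: a lazy [^)]*? with nothing after it would match the empty string, and grep -o prints nothing for empty matches. The -P tells grep to use Perl regex syntax.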
