regex: return characters inside parentheses

I can't seem to get the values inside parentheses using grep.
echo "(this is a string)" | grep -Eo '[a-z ]*'
Ideally that should return the value inside the parentheses, "this is a string", but instead it returns nothing. Does anyone know the explanation?

This grep with -P (perl regex) works:
echo "foo (this is a string) bar" | grep -Po '\(\K[^)]*'
this is a string
OR using awk:
echo "foo (this is a string) bar" | awk -F '[()]+' '{print $2}'
this is a string
OR using sed:
echo "foo (this is a string) bar" | sed 's/^.*(\(.*\)).*$/\1/'
this is a string

If you're trying to match everything enclosed by the parentheses, not including the parentheses, you should use this grep:
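grep -Po '(?<=\()[^)]*'
The (?<=\() is a positive look-behind assertion that tells the regex engine to start from a character preceded by an opening parenthesis. [^)]* tells it to match all characters until it encounters a closing parenthesis. The quantifier must stay greedy here: a lazy *? at the very end of a pattern is satisfied by the empty string, and grep -o prints nothing for empty matches. The -P tells it to use Perl regex syntax.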
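A small sketch for comparing the look-behind form with the \K form from the other answer. It assumes GNU grep built with PCRE support (-P); the function names are made up for this illustration.

```shell
# Both helpers print the text after each "(" up to the next ")".
# Assumes GNU grep with -P (PCRE) support; the names are illustrative only.
inside_parens_lookbehind() {
    # (?<=\() asserts that the match is preceded by "(" without consuming it.
    printf '%s\n' "$1" | grep -Po '(?<=\()[^)]*'
}

inside_parens_keep() {
    # \K discards the already matched "(" from the reported match.
    printf '%s\n' "$1" | grep -Po '\(\K[^)]*'
}

inside_parens_lookbehind 'foo (this is a string) bar'
inside_parens_keep 'foo (this is a string) bar'
```

Both calls should print the same text, so which form to use is mostly a matter of taste.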

Related

Bash Regex extract all text from 2nd occurrence of specific character until end of line

I have the following strings:
text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1
with a regular expression, I want to extract:
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
I have tried with regex101.com the following expression: ([^:]+)(?::[^:]+){1}$
and it worked (only for the first string)
But if I try in bash, it does not
echo "text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1" | sed -n "/([^:]+)(?::[^:]+){1}$/p"

It would be much easier with cut without any regex:
cut -d: -f3- file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1

Non-capturing groups (?:...) are not supported in sed, and in basic regex syntax you have to escape \( \) \{ \} and \+.
You can match two repetitions of "text followed by :" from the start of the string and replace them with an empty string.
sed 's/^\([^:]\+:\)\{2\}//' file
Or using sed -E for extended regexp:
sed -E 's/^([^:]+:){2}//' file
Output
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1

Using sed
$ sed s'|\([^:]*:\)\{2\}\(.*\)$|\2|' input_file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
or
$ sed s'|\([^:]*:\)\{2\}||' input_file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1

There's no reason to drag sed or other external programs into this; just use bash's built in regular expression matching:
#!/usr/bin/env bash
strings=(text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1)
for s in "${strings[@]}"; do
[[ $s =~ ^([^:]*:){2}(.*) ]] && printf "%s\n" "${BASH_REMATCH[2]}"
done
Heck, you don't need regular expressions in bash:
printf "%s\n" "${s#*:*:}"
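As a sketch, the parameter expansion can be wrapped in a small helper and run against both sample strings. The function name after_second_colon is an assumption for illustration; the expansion itself is plain POSIX shell.

```shell
# Strip everything up to and including the second ":" using only
# parameter expansion; after_second_colon is an illustrative name.
after_second_colon() {
    # "#" removes the shortest prefix matching the glob *:*:
    printf '%s\n' "${1#*:*:}"
}

after_second_colon 'text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1'
after_second_colon 'text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'
```

Because # picks the shortest matching prefix, any later colons (including the doubled one) are left untouched.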

awk
string='ext/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'
awk -vFS=: -vOFS=: '{$1=$2="";gsub(/^::/,"")}1' <<<"$string"
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1

There is no need for anything that requires regex back-references, since the regex is anchored at the start of the line anyway:
mawk ++NF OFS= FS='^[^:]*:[^:]*:'
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1

Correct (?) regex not understood by sed

According to https://regex101.com/r/NLSymf/3, the following regex:
\[\[(foo)([^\]]+)\]\]
(full) matches the string [[foo>test1|test2]], but this seems to not be understood by sed, since:
echo "[[foo>test1|test2]]" | sed -E -e '/\[\[(foo)([^\]]+)\]\]/d'
(which should return an empty string) returns:
[[foo>test1|test2]]
What is the regex that matches [[foo>test1|test2]] from sed's point of view?

The backslash character loses its escaping capability within a bracket expression. And stray closing brackets in a RE need not be escaped, that's why grep doesn't fail the first pipeline below. See RE Bracket Expression for reference.
$ echo 'a]' | grep -Eo '[^\]]'
a]
$ echo 'a]' | grep -Eo '[^]]'
a
The correct regex would be:
\[\[(foo)([^]]+)]]
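A minimal sketch of that regex plugged back into the sed command from the question. The helper name delete_foo_links and the extra "keep me" line are assumptions added for illustration.

```shell
# delete_foo_links drops every input line that contains a [[foo...]] link.
# Inside the bracket expression, "]" comes right after "^" so it is literal;
# no backslash is used because it would be taken literally there.
delete_foo_links() {
    sed -E '/\[\[(foo)([^]]+)]]/d'
}

printf '%s\n' '[[foo>test1|test2]]' 'keep me' | delete_foo_links
```

Only the second line should survive, which shows that sed now sees the first line as a match.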

Printing only text from group

I have working example of substitution in online regex tester https://regex101.com/r/3FKdLL/1 and I want to use it as a substitution in sed editor.
echo "repo-2019-12-31-14-30-11.gz" | sed -r 's/^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$.*/\1/p'
It always prints whole string: repo-2019-12-31-14-30-11.gz, but not matched group [\w-]+.
I expect to get only text from group which is repo string in this example.

Try this:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([A-Za-z]+)-[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}\.gz.*$/\1/p'
Explanations:
\w will work (not [\w], which matches either a backslash or w), but you should use [[:alnum:]], which is POSIX
For sed, \d isn't a digit class; GNU sed reads \dNNN as the character whose decimal code is NNN, so use [[:digit:]] or [0-9] instead
Add -n to suppress sed's automatic printing, and the p flag to print only the lines where the substitution succeeded
Additionally, you could refactor your regex by removing duplication:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}.gz.*$/\1/p'
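The same idea as a reusable sketch. The function name repo_name is hypothetical, -E is used as the portable spelling of -r, and the dot before gz is escaped so it only matches a literal dot.

```shell
# repo_name prints the leading name of a "<name>-YYYY-MM-DD-hh-mm-ss.gz" file
# and prints nothing when the input does not have that shape.
repo_name() {
    printf '%s\n' "$1" |
        sed -En 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}\.gz$/\1/p'
}

repo_name 'repo-2019-12-31-14-30-11.gz'
```

With -n and the p flag together, a non-matching file name yields empty output instead of being echoed back unchanged.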

Looks like a job for GNU grep :
echo "repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays :
repo-
On this example :
echo "repo-repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays :
repo-repo-
Which I think is what you want because you tried with [\w-]+ on your regex.
If I'm wrong, just replace the grep command with : grep -oP '^\K\w+'

Why can't I use ^\s with grep?

Both of the regexes below work in my case.
grep \s
grep ^[[:space:]]
However all those below fail. I tried both in git bash and putty.
grep ^\s
grep ^\s*
grep -E ^\s
grep -P ^\s
grep ^[\s]
grep ^(\s)
The last one even produces a syntax error.
If I try ^\s in debuggex it works.
Debuggex Demo
How do I find lines starting with whitespace characters with grep ? Do I have to use [[:space:]] ?

grep \s works for you only because your input contains the letter s. The unquoted backslash is consumed by the shell, so grep receives a plain s and matches that letter rather than whitespace. If you use grep ^\\s, the shell turns \\ into a single literal \, grep receives ^\s, and it matches a line starting with whitespace.
A better idea is to quote the pattern so the shell leaves it alone, optionally enabling POSIX ERE syntax with -E:
grep -E '^\s' <<< "$s"
See the online demo:
s=' word'
grep ^\\s <<< "$s"
# => word
grep -E '^\s' <<< "$s"
# => word
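To answer the last part of the question: [[:space:]] is not required, but it is the portable spelling. A minimal sketch, with a made-up function name and made-up sample input:

```shell
# leading_ws_lines prints only the input lines that start with whitespace.
# The pattern is single-quoted so the shell passes it to grep unchanged,
# and [[:space:]] is a POSIX class, so no GNU-specific \s is needed.
leading_ws_lines() {
    grep '^[[:space:]]'
}

printf ' indented\nflush\n\ttabbed\n' | leading_ws_lines
```

The space-indented and tab-indented lines should be printed, and the line starting at column one should be dropped.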

bash sed/grep extract text between 2 words

My problem is the same as in the question below, except that I only want the first occurrence and want to ignore all the rest:
How to use sed/grep to extract text between two words?
Suppose the input in that example were:
input: "Here is a String Here is a String"
But I only care about the first "is"
echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
output: "is a String Here is a"
Is this even possible with grep? I could use sed as well for the job.
Thanks

Your regexp matches the longest string that sits between "Here " and " String", which is "is a String Here is a". This is the default (greedy) behaviour of the * quantifier.
$ echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a String Here is a
If you want to match the shortest, you may put a ? (greediness modifier) just after the * quantifier:
$ echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*?(?= String)'
is a
is a

To get the first word you can use grep -o '^[^ ]*':
echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)' | grep -o '^[^ ]*'
And you can pipe grep to grep multiple times to compose simple commands into complex ones.

sed 's/ String.*//;s/.*Here //'
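The same two-step trimming can also be sketched with plain parameter expansion, without grep or sed. The function name first_between is an assumption, and the delimiters "Here " and " String" are hard-coded from the example.

```shell
# first_between prints the text between the first "Here " and the
# first " String" that follows it.
first_between() {
    s=$1
    s=${s#*Here }      # drop the shortest prefix ending in "Here "
    s=${s%% String*}   # drop the longest suffix starting at the first " String"
    printf '%s\n' "$s"
}

first_between 'Here is a String Here is a String'
```

Using # (shortest prefix) and %% (longest suffix) is what restricts the result to the first occurrence.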
