I can't seem to get the value inside parentheses using grep:
echo "(this is a string)" | grep -Eo '[a-z ]*'
Ideally that should return the value inside the parentheses, "this is a string", but instead it returns nothing. Does anyone know why?
This grep with -P (Perl-compatible regex) works; \K drops the ( matched before it from the printed output:
echo "foo (this is a string) bar" | grep -Po '\(\K[^)]*'
this is a string
OR using awk:
echo "foo (this is a string) bar" | awk -F '[()]+' '{print $2}'
this is a string
OR using sed:
echo "foo (this is a string) bar" | sed 's/^.*(\(.*\)).*$/\1/'
this is a string
If you're trying to match everything enclosed by the parentheses, not including the parentheses, you should use this grep:
grep -Po '(?<=\()[^)]*'
The (?<=\() is a positive look-behind assertion: the match must start right after an opening parenthesis, and that parenthesis is not part of the output. [^)]* then matches every character up to, but not including, the next closing parenthesis. The -P tells grep to use Perl-compatible regex syntax.
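A minimal sketch of a variant with a look-ahead added, assuming GNU grep built with PCRE support (-P); the function name inside_parens is hypothetical. The look-ahead means only text that is actually followed by a closing parenthesis is printed, and grep -o puts each parenthesised group on its own line:

```shell
#!/usr/bin/env bash
# inside_parens (hypothetical helper): print the text inside every (...) pair
# on each input line, one result per line. Requires grep -P (PCRE).
inside_parens() {
  # (?<=\()  positive look-behind: the match must start right after '('
  # [^)]*    run of non-')' characters, so it stops at the first ')'
  # (?=\))   positive look-ahead: the match must end right before ')'
  grep -oP '(?<=\()[^)]*(?=\))'
}

# prints "this is a string" and "and another" on separate lines
echo "foo (this is a string) bar (and another)" | inside_parens
```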
Related
I have the following strings:
text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1
With a regular expression, I want to extract:
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
On regex101.com I tried the following expression: ([^:]+)(?::[^:]+){1}$
and it worked (but only for the first string).
But in bash it does not work:
echo "text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1" | sed -n "/([^:]+)(?::[^:]+){1}$/p"
It would be much easier with cut without any regex:
cut -d: -f3- file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
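Non-capturing groups (?:...) are not supported in sed, and in basic regular expressions you have to escape \( \) \{ \} and \+ (\+ is a GNU extension).
Instead, you can match two repetitions of "non-colon characters followed by a colon" at the start of the string and replace them with an empty string.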
sed 's/^\([^:]\+:\)\{2\}//' file
Or using sed -E for extended regexp:
sed -E 's/^([^:]+:){2}//' file
Output
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
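A small sketch comparing the sed -E answer with the cut answer, assuming GNU sed (for -E) and a POSIX cut; the helper names strip_sed and strip_cut are hypothetical. Both read the question's lines on standard input and should print the same result:

```shell
#!/usr/bin/env bash
# strip_sed (hypothetical): ERE version; ([^:]+:){2} means "one or more
# non-colon characters followed by a colon", exactly twice, anchored at ^.
strip_sed() { sed -E 's/^([^:]+:){2}//'; }

# strip_cut (hypothetical): no regex at all; keep fields 3..end with ':' as
# the delimiter. cut re-joins the kept fields with ':', so '::' survives.
strip_cut() { cut -d: -f3-; }

input='text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'

printf '%s\n' "$input" | strip_sed
printf '%s\n' "$input" | strip_cut
```

One difference worth knowing: [^:]+ requires each of the first two fields to be non-empty, while cut does not care, so a line such as ::abc is left unchanged by strip_sed but reduced to abc by strip_cut.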
Using sed
$ sed 's|\([^:]*:\)\{2\}\(.*\)$|\2|' input_file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
or
$ sed 's|\([^:]*:\)\{2\}||' input_file
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
There's no reason to drag sed or other external programs into this; just use bash's built-in regular expression matching:
#!/usr/bin/env bash
strings=(text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1)
for s in "${strings[@]}"; do
[[ $s =~ ^([^:]*:){2}(.*) ]] && printf "%s\n" "${BASH_REMATCH[2]}"
done
Heck, you don't even need regular expressions in bash; inside the same loop, plain parameter expansion does it:
printf "%s\n" "${s#*:*:}"
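A sketch of the same idea as a reusable function (the name strip_two_fields is hypothetical). In ${s#pattern} the pattern is a shell glob, not a regex, and a single # removes the shortest matching prefix, which is why *:*: stops at the second colon:

```shell
#!/usr/bin/env bash
# strip_two_fields (hypothetical): drop everything up to and including the
# second ':' using parameter expansion only (no regex, no external process).
strip_two_fields() {
  local s=$1
  # '#' removes the SHORTEST prefix matching the glob *:*:
  printf '%s\n' "${s#*:*:}"
}

strip_two_fields 'text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'

# For contrast: '##' removes the LONGEST matching prefix, i.e. everything up
# to the last colon that has another colon somewhere before it.
s='text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'
printf '%s\n' "${s##*:*:}"   # prints: k33p.until_th3_end_1
```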
awk
string='text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1
text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1'
awk -vFS=: -vOFS=: '{$1=$2="";gsub(/^::/,"")}1' <<<"$string"
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
There's absolutely no need for anything that requires regex back-references, since the regex is anchored right at the start of the line anyway:
mawk ++NF OFS= FS='^[^:]*:[^:]*:'
text_i_w4nt_to:k33p.until_th3_end_1
text_i_w4nt_to::k33p.until_th3_end_1
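A portable variant of the same anchored-prefix idea, using sub() instead of a regex FS, should work in any POSIX awk; a minimal sketch (the helper name strip_prefix_awk is hypothetical):

```shell
#!/usr/bin/env bash
# strip_prefix_awk (hypothetical): delete the leading "field:field:" from each
# line. The regex is anchored with ^, so sub() can only touch the start of the
# record, and no back-references are needed.
strip_prefix_awk() {
  awk '{ sub(/^[^:]*:[^:]*:/, ""); print }'
}

printf '%s\n' \
  'text/:some_random_text:text_i_w4nt_to:k33p.until_th3_end_1' \
  'text/:some_random_text:text_i_w4nt_to::k33p.until_th3_end_1' | strip_prefix_awk
```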
According to https://regex101.com/r/NLSymf/3, the following regex:
\[\[(foo)([^\]]+)\]\]
fully matches the string [[foo>test1|test2]], but sed does not seem to understand it:
echo "[[foo>test1|test2]]" | sed -E -e '/\[\[(foo)([^\]]+)\]\]/d'
(which should return an empty string) returns:
[[foo>test1|test2]]
What is the regex that matches [[foo>test1|test2]] from sed's point of view?
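In POSIX regular expressions the backslash loses its escaping capability inside a bracket expression, so [^\]] is the bracket expression [^\] (any character except a backslash) followed by a literal ]. A stray closing bracket outside a bracket expression needs no escaping, which is why grep does not report an error for the first pipeline below. To include ] in a bracket expression, put it first, as in [^]]. See "RE Bracket Expression" in the POSIX regular expression specification for reference.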
$ echo 'a]' | grep -Eo '[^\]]'
a]
$ echo 'a]' | grep -Eo '[^]]'
a
The correct regex would be:
\[\[(foo)([^]]+)]]
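A quick check of the corrected expression, assuming GNU sed for -E; the helper name foo_payload is hypothetical. Because ] is the first character after ^, it is a literal member of the negated set, and the trailing ]] outside any bracket expression needs no escaping:

```shell
#!/usr/bin/env bash
line='[[foo>test1|test2]]'

# With the corrected regex the /d command now deletes the line (no output).
printf '%s\n' "$line" | sed -E '/\[\[(foo)([^]]+)]]/d'

# foo_payload (hypothetical): print what the second group captured.
foo_payload() { sed -E 's/\[\[(foo)([^]]+)]]/\2/'; }
printf '%s\n' "$line" | foo_payload   # prints: >test1|test2
```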
I have a working substitution in the online regex tester https://regex101.com/r/3FKdLL/1, and I want to use it as a substitution in sed.
echo "repo-2019-12-31-14-30-11.gz" | sed -r 's/^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$.*/\1/p'
It always prints the whole string, repo-2019-12-31-14-30-11.gz, instead of the matched group [\w-]+.
I expect to get only the text captured by the group, which is repo in this example.
Try this:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([A-Za-z]+)-[[:alnum:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}\.gz.*$/\1/p'
Explanations:
\w works in GNU sed, but not inside brackets ([\w] matches either a backslash or w); still, you should prefer [[:alnum:]], which is POSIX
For sed, \d isn't a digit class: in GNU sed, \dNNN produces the character whose decimal value is NNN
Add -n to suppress sed's default output, and use the p flag to print only the lines where the substitution succeeded
Additionally, you could refactor your regex by removing duplication:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}\.gz.*$/\1/p'
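The question's [\w-]+ suggests names that may themselves contain hyphens. A sketch that keeps that intent with POSIX classes only, assuming GNU sed for -E; the function name repo_name is hypothetical, and [[:alnum:]_-] stands in for \w plus the hyphen:

```shell
#!/usr/bin/env bash
# repo_name (hypothetical): print the prefix before the -YYYY-MM-DD-hh-mm-ss.gz
# timestamp. Greedy matching plus backtracking lets the prefix keep hyphens.
repo_name() {
  printf '%s\n' "$1" |
    sed -En 's/^([[:alnum:]_-]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}\.gz$/\1/p'
}

repo_name repo-2019-12-31-14-30-11.gz        # prints: repo
repo_name repo-repo-2019-12-31-14-30-11.gz   # prints: repo-repo
```

Names that do not end in the timestamp pattern print nothing, because -n suppresses output unless the substitution succeeds.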
Looks like a job for GNU grep:
echo "repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays:
repo-
With this example:
echo "repo-repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays:
repo-repo-
I think that is what you want, since you tried [\w-]+ in your regex.
If I'm wrong, just replace the grep command with grep -oP '^\K\w+'.
Both of the regexes below work in my case:
grep \s
grep ^[[:space:]]
However, all of those below fail. I tried both in Git Bash and PuTTY.
grep ^\s
grep ^\s*
grep -E ^\s
grep -P ^\s
grep ^[\s]
grep ^(\s)
The last one even produces a syntax error.
If I try ^\s in Debuggex, it works.
How do I find lines starting with whitespace characters with grep? Do I have to use [[:space:]]?
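grep \s works for you only because the shell removes the unquoted backslash, so grep actually receives the pattern s and matches any line containing the letter s. The same happens with grep ^\s (grep sees ^s), and grep ^(\s) is a syntax error because an unquoted ( is shell syntax. If you use grep ^\\s, the shell turns \\ into a single literal \, so grep receives ^\s and matches lines starting with whitespace.
A better idea is to quote the pattern so the shell passes the backslash through unchanged; here -E selects extended syntax (\s itself is a GNU grep extension rather than POSIX):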
grep -E '^\s' <<< "$s"
For example:
s=' word'
grep ^\\s <<< "$s"
# => word
grep -E '^\s' <<< "$s"
# => word
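A way to see the quoting issue directly: printf '%s\n' shows exactly which argument the shell hands to a command. The last line uses the POSIX bracket class instead of the GNU \s extension; the sample input is made up for illustration:

```shell
#!/usr/bin/env bash
# What grep would actually receive for each spelling of the pattern:
printf '%s\n' ^\s      # ^s   : the shell drops the unquoted backslash
printf '%s\n' ^\\s     # ^\s  : \\ becomes one literal backslash
printf '%s\n' '^\s'    # ^\s  : single quotes pass the text through unchanged

# Portable way to list lines that start with a space or tab (POSIX class):
printf ' indented\nflush\n\ttabbed\n' | grep '^[[:space:]]'
```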
My problem is the same as the one here, except that I only want the first occurrence and want to ignore all the rest:
How to use sed/grep to extract text between two words?
Suppose his example were:
input: "Here is a String Here is a String"
But I only care about the first "is"
echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
output: "is a String Here is a"
Is this even possible with grep? I could use sed as well for the job.
Your regexp matches the longest string that sits between "Here " and " String", which is "is a String Here is a". This greedy behaviour is the default for the * quantifier.
$ echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a String Here is a
If you want the shortest match instead, put a ? just after the * quantifier to make it lazy (non-greedy):
$ echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*?(?= String)'
is a
is a
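grep -o prints every non-overlapping match, which is why "is a" appears twice above. Two ways to keep only the first, both assuming GNU grep with -P: cut the output down with head, or anchor the pattern at ^ and use \K so it can only match once per line:

```shell
#!/usr/bin/env bash
s='Here is a String Here is a String'

# 1) Lazy match, then keep only the first line of output.
printf '%s\n' "$s" | grep -oP '(?<=Here ).*?(?= String)' | head -n 1   # prints: is a

# 2) ^ ties the match to the start of the line; .*? skips lazily to the first
#    "Here ", and \K drops everything matched so far from the printed result.
printf '%s\n' "$s" | grep -oP '^.*?Here \K.*?(?= String)'              # prints: is a
```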
To get the first word you can use grep -o '^[^ ]*':
echo "Here is a String Here is a String" | grep -Po '(?<=(Here )).*(?= String)' | grep -o '^[^ ]*'
And you can pipe grep to grep multiple times to compose simple commands into complex ones.
sed 's/ String.*//;s/.*Here //'