grep regular expression returns full line - regex

I'm trying to print everything after a keyword using grep, but the command returns the whole line. I'm using the following:
grep -P (\skeyword\s)(.*)
an example line is:
abcdefg keyword hello, how are you.
The result should be "hello, how are you." but instead I get the full line. Am I doing something wrong here?

You need the -o (only matching) option together with either \K (which discards the previously matched characters) or a positive lookbehind.
grep -oP '\skeyword\s+\K.*' file
\K keeps the text matched so far out of the overall regex match. \s+ matches one or more whitespace characters.
Example:
$ echo 'abcdefg keyword hello, how are you.' | grep -oP '\skeyword\s+\K.*'
hello, how are you.
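The positive lookbehind mentioned above can be sketched as follows. This is a minimal example, assuming GNU grep built with PCRE support (-P); the variable names and the sed fallback are illustrative additions, not part of the original answer.

```shell
line='abcdefg keyword hello, how are you.'

# Lookbehind variant: require " keyword " immediately before the match,
# without making it part of the match (assumes GNU grep with -P support).
lookbehind=$(printf '%s\n' "$line" | grep -oP '(?<=\skeyword\s).*')

# Fallback without PCRE: delete everything up to and including the keyword.
portable=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]keyword[[:space:]]\{1,\}//p')

printf '%s\n' "$lookbehind" "$portable"
```

Note that a lookbehind must have a fixed length in PCRE, which is why it uses \s rather than \s+; \K has no such restriction.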

By default, grep prints whole lines that match. To print only the matching part, try the -o option.

Related

Parsing only first regex match in a line with several matches

Is it possible to have a regex that parses only a1bcdea1 from this line a1bcdea1ABCa1DEFa1 ?
This grep command does not work:
$ cat txtfile
a1bcdea1ABCa1DEFa1
$ grep -oE "[A-Z,a-z]1.*?[A-Z,a-z]1" txtfile
a1bcdea1ABCa1DEFa1
I want the output of grep to be only a1bcdea1.
EDIT:
It is obvious that I can just use grep -o "a1bcdea1" for the above line, but consider if one has several thousands of lines and the goal is to match FIRST [A-Z,a-z]1.*?[A-Z,a-z]1 for each single line.
How about using a ^ start anchor and restricting character set used:
grep -o '^[A-Za-z]1[A-Za-z]*1'
If you expect more digits or other characters in between, go with this
grep -oP '^[A-Za-z]1.*?[A-Za-z]1'
Lazy matching requires Perl-compatible mode (-P). If the match does not start at the beginning of the line, go with this
grep -oP '^.*?\K[A-Za-z]1.*?[A-Za-z]1'
\K resets beginning of the reported match and is a PCRE feature as well.
Here is a gnu awk solution using split function:
awk '(n = split($0, a, /[a-zA-Z]1/, b)) > 1 {print b[1] a[2] b[2]}' file
a1bcdea1
This awk command splits each line on regex /[a-zA-Z]1/ and stores split tokens in array a and delimiters in array b.
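The four-argument split used above is a gawk extension. For an awk without it, the same "first pair of delimiters" idea can be sketched with the POSIX match function. The function name first_pair is hypothetical, and this is only a rough equivalent of the lazy pattern, not a drop-in replacement.

```shell
# Print, for each input line, the text from the first [A-Za-z]1 delimiter
# through the next one. first_pair is a hypothetical helper name.
first_pair() {
  awk '{
    if (match($0, /[A-Za-z]1/)) {
      start = RSTART                      # position of the first delimiter
      rest = substr($0, start + 2)        # text after the first delimiter
      if (match(rest, /[A-Za-z]1/))       # nearest following delimiter
        print substr($0, start, 2 + RSTART + 1)
    }
  }'
}

printf 'a1bcdea1ABCa1DEFa1\n' | first_pair
```

Lines with fewer than two delimiters produce no output.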

Unable to parse text from a file using sed?

Here is the line in a file I want to extract information from:
backup-initiation-time="00:00" backup-directory-path="/store/backup" backup-retention-period-days="2"
My command is:
grep "backup-directory-path" test.txt | sed 's/.*backup-directory-path="\(.*?\)" /\1/'
I just want /store/backup that's it. I don't know what I'm doing wrong.
You can use \K (keep) and simply:
grep -oP '.*backup-directory-path=\K([^ ])+'
This will display only the captured part after the "keep".
In order to remove the quotes you have there, just modify it:
grep -oP '.*backup-directory-path="\K([^"])+'
sed's regular expressions (basic and extended) have no non-greedy quantifiers; in the basic syntax your command uses, the ? in .*? is just a literal question mark, which is why nothing matches.
Instead of a non-greedy .*? you can use [^"]* when you know the character that ends the text you want, in your case ".
This command produces the expected output:
grep "backup-directory-path" test.txt | sed 's|.* backup-directory-path="\([^"]*\)".*|\1|'
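To see why the negated character class matters, here is a small sketch comparing a greedy capture with the bounded one on the sample line from the question. The variable names are illustrative.

```shell
line='backup-initiation-time="00:00" backup-directory-path="/store/backup" backup-retention-period-days="2"'

# Greedy capture: \(.*\) runs on to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*backup-directory-path="\(.*\)".*/\1/')

# Bounded capture: [^"]* stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | sed 's/.*backup-directory-path="\([^"]*\)".*/\1/')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The greedy form swallows the following attribute as well, while the bounded form yields just /store/backup.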

regex command line linux - select all lines between two strings

I have a text file with contents like this:
here is some super text:
this is text that should
be selected with a cool match
And this is how it all ends
blah blah...
I am trying to get the two lines (but could be more or less lines) between:
some super text:
and
And this is how
I am using grep on an ubuntu machine and a lot of the patterns I've found seem to be specific to different kinds of regex engines.
So I should end up with something like this:
grep "my regex goes here" myFileNameHere
Not sure if egrep is needed, but could use that just as easy.
You can use addresses in sed:
sed -e '/some super text/,/And this is how/!d' file
!d deletes every line that is not in the range, so only the range is printed.
To exclude the border lines, you must be more clever:
sed -n -e '/some super text/ {n;b c}; d;:c {/And this is how/ {d};p;n;b c}' file
Or, similarly, in Perl:
perl -ne 'print if /some super text/ .. /And this is how/' file
To exclude the border lines again, change it to
perl -ne '$in = /some super text/ .. /And this is how/; print if $in > 1 and $in !~ /E/' file
I don't see how it could be done in grep. Using awk:
awk '/^And this is how/ {p=0}; p; /some super text:$/ {p=1}' file
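The range approaches above can be tried end to end with a small script. This is a sketch that recreates the sample text from the question in a temporary file; it uses sed -n with p, which is equivalent to the !d form, and an awk flag variable for the variant that excludes the border lines.

```shell
# Recreate the sample text from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
here is some super text:
this is text that should
be selected with a cool match
And this is how it all ends
blah blah...
EOF

# Inclusive: print from the start marker through the end marker.
inclusive=$(sed -n '/some super text/,/And this is how/p' "$sample")

# Exclusive: clear the flag on the end marker before printing,
# and set it on the start marker only after printing.
exclusive=$(awk '/And this is how/ {p=0} p; /some super text/ {p=1}' "$sample")

rm -f "$sample"
printf '%s\n' "$exclusive"
```

The order of the three awk rules is what keeps both border lines out of the output.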
Try pcregrep instead of plain grep, since by default plain grep matches line by line and will not return a match that spans several lines.
$ pcregrep -M -o '(?s)some super text:[^\n]*\n\K.*?(?=\n[^\n]*And this is how)' file
this is text that should
be selected with a cool match
(?s) is the DOTALL modifier; it lets the dot match newline characters as well.
\K discards the previously matched characters from the reported match.
From pcregrep --help
-M, --multiline run in multiline mode
-o, --only-matching=n show only the part of the line that matched
TL;DR
With your corpus, another way to solve the problem is by matching lines with leading whitespace, rather than using a flip-flop operator of some sort to match start and end lines. The following solutions work with your posted example.
GNU Grep with PCRE Compiled In
$ grep -Po '^\s+\K.*' /tmp/corpus
this is text that should
be selected with a cool match
Alternative: Use pcregrep Instead
$ pcregrep -o '^\s+\K.*' /tmp/corpus
this is text that should
be selected with a cool match

grep not returning expected result with regex on xml

I'm running a grep command on some xml, and it appears to be misinterpreting the regular expression I'm trying to use.
Here's the command
grep '<ernm:NewReleaseMessage.*?>' ./075679942012_ORIGNAL.xml
What appears to be happening is that the ?> part of the regex causes no match at all, rather than matching up to the first occurrence of >
Any ideas?
If you want to get the text up to the first occurrence of the > character, then try the command below:
grep -o '<ernm:NewReleaseMessage[^>]*>' file
If you want the whole line then remove -o parameter.
Example:
$ cat aa1.txt
<ernm:NewReleaseMessage blah> foo bar>
$ grep -o '<ernm:NewReleaseMessage[^>]*>' aa1.txt
<ernm:NewReleaseMessage blah>
grep with -o prints only the matched text.
[^>]* - Any character other than >, zero or more times. So it matches up to the first occurrence of the > character.
By default, grep uses basic regular expressions and treats ? as a literal question mark. For it to act as a quantifier, you need to escape it (\? is a GNU extension).
grep '<ernm:NewReleaseMessage.*\?>' ./075679942012_ORIGNAL.xml
You can use the -E option which interprets the pattern as an extended regular expression.
grep -E '<ernm:NewReleaseMessage.*?>' ./075679942012_ORIGNAL.xml
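Note: The commands above return the whole line that matches your pattern. If you only want the matched text, use the -o option, which prints only the matched parts of matching lines. Keep in mind that neither basic nor extended regular expressions have lazy quantifiers: here the ? only makes the preceding .* optional, so the match still extends to the last > on the line.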
grep -o '<ernm:NewReleaseMessage.*\?>' ./075679942012_ORIGNAL.xml
OR
grep -Eo '<ernm:NewReleaseMessage.*?>' ./075679942012_ORIGNAL.xml
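The difference between the two answers shows up when a line contains more than one >. The following sketch assumes GNU grep and reuses the sample line from the first answer; the variable names are illustrative.

```shell
xml='<ernm:NewReleaseMessage blah> foo bar>'

# ERE has no lazy quantifier: .*? is still greedy, so with GNU grep
# this is expected to run on to the last > on the line.
ere=$(printf '%s\n' "$xml" | grep -Eo '<ernm:NewReleaseMessage.*?>')

# The negated class cannot cross a >, so it stops at the first one.
negated=$(printf '%s\n' "$xml" | grep -o '<ernm:NewReleaseMessage[^>]*>')

printf 'ere:     %s\n' "$ere"
printf 'negated: %s\n' "$negated"
```

If only the opening tag is wanted, the negated character class is the safer choice, and it works without -E or -P.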

How to use grep to get anything just after `name=`?

I’m stuck trying to grep everything just after name=, including only spaces and alphanumeric characters.
e.g.:
name=some value here
I want to get
some value here
I’m a total newbie at this; the following grep matches everything, including the name=.
grep 'name=.*' filename
Any help is much appreciated.
You want a positive lookbehind clause, such as:
grep -P '(?<=name=)[ A-Za-z0-9]*' filename
The -P makes grep use the Perl dialect, otherwise you'd probably need to escape the parentheses. You can also, as noted elsewhere, append the -o parameter to print out only what is matched. The part in brackets specifies that you want alphanumerics and spaces.
The advantage of using a positive lookbehind clause is that the "name=" text is not part of the match. If grep highlights matched text, it will only highlight the alphanumeric (and spaces) part. The -o parameter will also not display the "name=" part. And, if you transition this to another program like sed that might capture the text and do something with it, you won't be capturing the "name=" part, although you can also do that by using capturing parentheses.
Try this:
sed -n 's/^name=//p' filename
It tells sed to print nothing (-n) by default, substitute your prefix with nothing, and print if the substitution occurs.
Bonus: if you really need it to only match entries with only spaces and alphanumerics, you can do that too:
sed -n 's/^name=\([ 0-9a-zA-Z]*$\)/\1/p' filename
Here we've added a pattern to match spaces and alphanumerics only until the end of the line ($), and if we match we substitute the group in parentheses and print.
gawk
echo "name=some value here" | awk -F"=" '/name=/ { print $2}'
or with bash
str="name=some value here"
IFS="="
set -- $str
echo "$2"
unset IFS
or
str="name=some value here"
str=${str/name=/}
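The parameter-expansion idea can be extended to a whole file without calling grep or sed. Here is a minimal sketch; the function name extract_names is hypothetical, and it keeps only values made of spaces and alphanumerics, as the question asks.

```shell
# Read lines on stdin and print the value of each "name=" line whose
# value contains only spaces and alphanumerics.
# extract_names is a hypothetical helper name.
extract_names() {
  while IFS= read -r line; do
    case $line in
      name=*)
        value=${line#name=}               # strip the leading "name="
        case $value in
          *[!A-Za-z0-9\ ]*) ;;            # contains another character: skip
          *) printf '%s\n' "$value" ;;
        esac
        ;;
    esac
  done
}

printf 'name=some value here\nother=1\nname=bad/value\n' | extract_names
```

${line#name=} removes the prefix only at the start of the string, which mirrors the ^name= anchor in the sed answers.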
grep on its own does not extract part of a line the way you expect. What you need is
grep "name=" file.txt | cut -d'=' -f2-
grep prints the entire line on which the pattern matches. To print only the matched text, use the -o option. You'll also need sed to remove the name= part of the match.
grep -o 'name=[0-9a-zA-Z ]*' myfile | sed 's/^name=//'