Unable to parse text from a file using sed? - regex

Here is the line in a file I want to extract information from:
backup-initiation-time="00:00" backup-directory-path="/store/backup" backup-retention-period-days="2"
My command is:
grep "backup-directory-path" test.txt | sed 's/.*backup-directory-path="\(.*?\)" /\1/'
I just want /store/backup that's it. I don't know what I'm doing wrong.

You can use \K (keep) and simply:
grep -oP '.*backup-directory-path=\K([^ ])+'
This will display only the captured part after the "keep".
In order to remove the quotes you have there, just modify it:
grep -oP '.*backup-directory-path="\K([^"])+'

sed does not support non-greedy matching: neither POSIX basic nor extended regular expressions define a lazy .*? quantifier.
Instead of .*? you can use a negated character class such as [^"]* when you know the character that ends the part you want to match, in your case ".
This command produces the expected output:
grep "backup-directory-path" test.txt | sed 's|.* backup-directory-path="\([^"]*\)".*|\1|'
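The same idea can be wrapped in a small function that does not need grep at all, since sed -n with the p flag prints only lines where the substitution succeeded. This is just a sketch: get_attr is a made-up helper name, and it assumes the value contains no double quotes and that the key contains no regex metacharacters.

```shell
# Hypothetical helper: read lines on stdin and print the value of key="value".
# Assumes the value has no embedded double quotes and that the key ($1)
# contains no regex metacharacters, because it is pasted into the pattern as-is.
get_attr() {
    sed -n 's/.*'"$1"'="\([^"]*\)".*/\1/p'
}

line='backup-initiation-time="00:00" backup-directory-path="/store/backup" backup-retention-period-days="2"'
printf '%s\n' "$line" | get_attr backup-directory-path
```

With the sample line from the question this prints /store/backup.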

Related

Back-reference when prepending using the sed i command on Linux

I'm trying to prepend the first character of "monkey" using this command:
echo monkey | sed -E '/(.)onkey/i \1'
But when I use it like this, the output shows
1
monkey
I actually hope to see:
m
monkey
But the back-reference doesn't work. Could someone tell me whether it is possible to use a back-reference such as \1 here? Thanks in advance.
You may use this sed:
echo 'monkey' | sed -E 's/(.)onkey/\1\n&/'
m
monkey
Here:
\1: is back-reference for group #1
\n: inserts a line break
&: is back-reference for full match
With any version of awk, you can try the following solution, written and tested with the shown samples. It searches for the regex ^.onkey, uses the sub function to replace the first letter with itself, a newline, and itself again, and then prints the line.
echo monkey | awk '/^.onkey/{sub(/^./,"&\n&")} 1'
This might work for you (GNU sed):
sed -E '/monkey/i m' file
Insert the line containing m only above a line containing monkey.
Perhaps a more generic solution would be to insert the first character of a word above that word:
sed -E 'h;s/\B.*//;G' file
Make copy of the word.
Remove all but the first character of the word.
Append the original word delimited by a newline.
Print the result.
N.B. \B starts a match between characters of a word. \b represents the start or end of a word (as does \< and \> separately).
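If \B or \n in the replacement is not available in your sed, the hold-space steps above can also be sketched with plain BRE. The function name prepend_first_char is made up for illustration, and this version assumes each input line is non-empty.

```shell
# Hypothetical helper: print the first character of each line above that line.
# h   copies the line into the hold space,
# s   keeps only the first character of the pattern space,
# G   appends a newline plus the saved original line.
# Assumes every input line has at least one character.
prepend_first_char() {
    sed 'h;s/^\(.\).*/\1/;G'
}

echo monkey | prepend_first_char
```

For the input monkey this prints m on the first line and monkey on the second.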

How to use grep/sed/awk, to remove a pattern from beginning of a text file

I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I've tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"
This identifies correctly the lines I want. Now, to delete the pattern I've tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//
but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.
What am I doing wrong?
OS: CentOS 7
With your shown samples, please try following grep command. Written and tested with GNU grep.
grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file
Explanation: Adding detailed explanation for above code.
^TIME\[ ##Matching string TIME from starting of value here.
\d+\.\d+ms\] ##Matching digits(1 or more occurrences) followed by dot digits(1 or more occurrences) followed by ms ] here.
\s+-\(\d+\)-\.+ ##Matching spaces (1 or more occurrences) followed by - digits(1 or more occurrences) - and 1 or more dots.
\K ##Using \K option of GNU grep to make sure previous match is found in line but don't consider it in printing, print next matched regex part only.
.* ##to match till end of the value.
2nd solution: Adding awk program here.
awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file
Explanation: using match function of awk, to match regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+ which will catch text which we actually want to remove from lines. Then printing rest of the text apart from matched one which is actually required by OP.
This awk using its sub() function:
awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"
sub() returns the number of substitutions made (0 or 1), so awk prints only the lines where a replacement happened.
$ cut -d'"' -f2 file
TEXT I WANT TO KEEP
You may use:
s='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//' <<< "$s"
"TEXT I WANT TO KEEP"
The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.
sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt
should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had .* from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified exactly in this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
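To make the BRE/ERE difference concrete, here is the same substitution written both ways. It is only a sketch: the function names are made up, and I generalized the pattern slightly (any amount of whitespace, any number in the parentheses), which goes beyond what the question strictly asked for.

```shell
# BRE (default sed): ( and ) are literal, + is not a quantifier,
# so "one or more digits" is spelled [0-9][0-9]*.
strip_prefix_bre() {
    sed 's/^TIME\[[^][]*][[:space:]]*-([0-9][0-9]*)-\.*//'
}

# ERE (sed -E): ( and ) group unless backslashed, and + means "one or more".
strip_prefix_ere() {
    sed -E 's/^TIME\[[^][]*][[:space:]]+-\([0-9]+\)-\.+//'
}

line='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'
printf '%s\n' "$line" | strip_prefix_bre
printf '%s\n' "$line" | strip_prefix_ere
```

Both commands print "TEXT I WANT TO KEEP" (with the quotes) for the sample line.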
A simpler one for sed:
sed 's/^[^"]*//' myfile.txt
If the "text you want to keep" is always surrounded by quotes like this, and it is the only quoted text on the lines starting with "TIME...", then:
sed -n '/^TIME/p' file | awk -F'"' '{print $2}'
should get the line starting with "TIME..." and print the text within the quotes.
Thanks all, for your help.
By the end, I've found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
Cheers,
Pedro

How to extract jira ticket number with sed?

I want to extract Jira ticket number from the branch name with sed.
This is what I have
echo "PTW-123-branch-name" | sed 's/.*\([A-Z]+-[0-9]+[^-]\).*/\1/'
expected result: PTW-123
What is wrong with the regexp?
You may use this sed:
echo "PTW-123-branch-name" | sed 's/\([0-9]\)-.*$/\1/'
PTW-123
Details:
\([0-9]\)-: Matches a digit and captures it in group #1 followed by hyphen
.*$: Match remaining string until end
\1: Is replacement that puts captured digit back in output
Alternatively you can use cut also:
echo "PTW-123-branch-name" | cut -d- -f1,2
PTW-123
If you are OK with GNU grep, you can try the following. The output of echo is piped to grep as standard input. The -o option prints only the matched portion and -P enables PCRE. The regex uses a non-greedy match up to a run of digits that must be followed by -, and prints that match if one is found.
echo "PTW-123-branch-name" | grep -oP '^.*?\d+(?=-)'
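As for what is wrong with the original regexp: sed uses BRE by default, where an unescaped + is a literal plus sign, so [A-Z]+ never matches a branch name like this, and the leading .* is greedy on top of that. A minimal sketch with sed -E, anchored at the start of the line, could look like this. The function name jira_key is made up, and it assumes the ticket key is always at the very beginning of the branch name.

```shell
# Hypothetical helper: print the leading Jira key (e.g. PTW-123) of each
# branch name on stdin. Names that do not start with LETTERS-DIGITS print nothing.
jira_key() {
    sed -n -E 's/^([A-Z]+-[0-9]+).*/\1/p'
}

echo "PTW-123-branch-name" | jira_key
```

For the sample branch name this prints PTW-123.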

grep part of text from ps output with regex

From ps -ef command output -Dorg.xxx.yyy=/home/user/aaa/server.log.
I'd like to extract the file path /home/user/aaa/server.log (can be any name.file).
Now, I'm using command:
ps -ef | grep -Po '(?<=-Dorg.xxx.yyy=)[^\s]*'
It will display two matched results:
/home/user/aaa/server.log
)[^\s]*
It looks like it counts the command as well for the 2nd matched result. How can I remove it? Or is there other suggestions? (I can not use -m1).
If you just need the file name, use \K operator:
org\.xxx\.yyy=\K[^\s]*
ps -ef | grep -Po 'org\.xxx\.yyy=\K[^\s]*'
It will match the whole string, but will only print the file name matched with [^\s]*.
From perlre:
There is a special form of this construct, called \K (available since
Perl 5.10.0), which causes the regex engine to "keep" everything it
had matched prior to the \K and not include it in $& . This
effectively provides variable-length look-behind.
Use this:
grep -Po '(?<=-[D]org.xxx.yyy=)[^\s]*'
Just put one of the characters in square brackets ([D]). The meaning of the regex hasn't changed and the pattern doesn't match itself anymore.
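The same self-match problem can be avoided with sed, because a pattern with backslash-escaped dots does not match its own command line. The sketch below uses made-up sample lines instead of real ps output, and log_path is a hypothetical helper name; in practice you would feed it ps -ef.

```shell
# Hypothetical helper: print the value of -Dorg.xxx.yyy=... from ps-style
# lines on stdin. The escaped dots (\.) match only literal dots, so a line
# showing this very sed command (which contains "\." instead of ".") is skipped.
log_path() {
    sed -n 's/.*-Dorg\.xxx\.yyy=\([^[:space:]]*\).*/\1/p'
}

# Simulated ps output: one java process and the filtering command itself.
printf '%s\n' \
    'user 101 1 java -Dorg.xxx.yyy=/home/user/aaa/server.log -jar app.jar' \
    'user 202 1 sed -n s/.*-Dorg\.xxx\.yyy=\([^[:space:]]*\).*/\1/p' | log_path
```

Only /home/user/aaa/server.log is printed for the two sample lines.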

How to use grep to get anything just after `name=`?

I’m stuck in trying to grep anything just after name=, include only spaces and alphanumeric.
e.g.:
name=some value here
I get
some value here
I’m totally newb in this, the following grep match everything including the name=.
grep 'name=.*' filename
Any help is much appreciated.
As detailed here, you want a positive lookbehind clause, such as:
grep -P '(?<=name=)[ A-Za-z0-9]*' filename
The -P makes grep use the Perl dialect, which is required here because lookbehind is not available in grep's basic or extended regex syntax. You can also, as noted elsewhere, append the -o parameter to print out only what is matched. The part in brackets specifies that you want alphanumerics and spaces.
The advantage of using a positive lookbehind clause is that the "name=" text is not part of the match. If grep highlights matched text, it will only highlight the alphanumeric (and spaces) part. The -o parameter will also not display the "name=" part. And, if you transition this to another program like sed that might capture the text and do something with it, you won't be capturing the "name=" part, although you can also do that by using capturing parentheses.
Try this:
sed -n 's/^name=//p' filename
It tells sed to print nothing (-n) by default, substitute your prefix with nothing, and print if the substitution occurs.
Bonus: if you really need it to only match entries with only spaces and alphanumerics, you can do that too:
sed -n 's/^name=\([ 0-9a-zA-Z]*$\)/\1/p' filename
Here we've added a pattern to match spaces and alphanumerics only until the end of the line ($), and if we match we substitute the group in parentheses and print.
gawk
echo "name=some value here" | awk -F"=" '/name=/ { print $2}'
or with bash
str="name=some value here"
IFS="="
set -- $str
echo "$2"
unset IFS
or
str="name=some value here"
str=${str/name=/}
grep does not extract like you expect. What you need is
grep "name=" file.txt | cut -d'=' -f2-
grep will print the entire line where it matches the pattern. To print only the pattern matched, use the grep -o option. You'll probably also need to use sed to remove the name= part of the pattern.
grep -o 'name=[0-9a-zA-Z ]*' myfile | sed 's/^name=//'
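If you would rather not spawn grep or sed at all, the prefix can also be stripped with plain shell parameter expansion. This is only a sketch: print_names is a made-up function name, and it assumes name= appears at the very start of the line and does not check that the value is limited to spaces and alphanumerics.

```shell
# Hypothetical helper: for every stdin line that starts with name=,
# print the rest of the line. ${line#name=} removes the shortest
# leading match of the pattern "name=".
print_names() {
    while IFS= read -r line; do
        case $line in
            name=*) printf '%s\n' "${line#name=}" ;;
        esac
    done
}

printf '%s\n' 'name=some value here' 'other=ignored' | print_names
```

For the two sample lines this prints some value here.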