Regexp or Grep in Bash - regex

Can you please tell me how to get the token value correctly? At the moment I am getting: "1jdq_dnkjKJNdo829n4-xnkwe",258],["FbtResult
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":\([^}]*\)\}/\1/p'

You need to match the full string, and to get rid of double quotes, you need to match a " before the token and use a negated bracket expression [^"] instead of [^}]:
sed -n 's/.*"token":"\([^"]*\).*/\1/p'
Details:
.* - any zero or more chars
"token":" - a literal "token":" string
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
.* - any zero or more chars.
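As a quick check, the command above can be wrapped in a small function and run against the sample line from the question. This is only a sketch: the function name extract_token is an illustrative choice, not something from the original post.

```shell
# Hypothetical helper: prints the value that follows the last "token":" on each input line.
extract_token() {
  sed -n 's/.*"token":"\([^"]*\).*/\1/p'
}

line='{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult'
printf '%s\n' "$line" | extract_token   # prints: aaaaaaa
```

Because the capture group is [^"]*, the same function also handles tokens that contain digits, underscores or hyphens.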

This replacement works:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":"\([a-z]*\)"\}.*/\1/p'
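The value after "token": is captured between the quotes by \([a-z]*\), followed by a closing brace \} and then .* for the rest of the line. Your version was missing that trailing .*, so the text after the token was left in the output. Note that [a-z]* only fits all-lowercase tokens like the sample one; for a token such as 1jdq_dnkjKJNdo829n4-xnkwe, use [^"]* instead.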
Output:
aaaaaaa

A grep solution:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | grep -Po '(?<="token":")[^"]+'
yields
aaaaaaa
The -P option to grep enables the Perl-compatible regex (PCRE).
The -o option tells grep to print only the matched substring, not the entire line.
The regex (?<="token":") is a PCRE-specific feature called a zero-width positive lookbehind assertion. The expression (?<=pattern) matches a pattern without including it in the matched result.
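Not every grep supports -P (the grep shipped with macOS, for example, does not). Where that is a concern, a rough equivalent can be sketched with -o plus cut. The assumptions here are that grep supports -o and that the token itself contains no double quote; extract_token_portable is a made-up name.

```shell
# Hypothetical fallback for greps without -P.
# grep -o keeps only the "token":"..." fragment; cut then splits it on
# double quotes, which puts the value in field 4.
extract_token_portable() {
  grep -o '"token":"[^"]*"' | cut -d '"' -f 4
}

printf '%s\n' '["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | extract_token_portable   # prints: aaaaaaa
```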

Related

Regex to match exact version phrase

I have versions like:
v1.0.3-preview2
v1.0.3-sometext
v1.0.3
v1.0.2
v1.0.1
I am trying to get the latest version that is not a preview (has no text after the version number), so the result should be:
v1.0.3
I used this grep: grep -m1 "[v\d+\.\d+.\d+$]"
but it still outputs: v1.0.3-preview2
What could I be missing here?
To return first match for pattern v<num>.<num>.<num>, use:
grep -m1 -E '^v[0-9]+(\.[0-9]+){2}$' file
v1.0.3
If your input file is unsorted, pipe the grep output through sort -rV and head:
grep -E '^v[0-9]+(\.[0-9]+){2}$' file | sort -rV | head -1
When ^ or $ appears inside [...], it is not an anchor: $ is a literal character, and ^ is either literal or, as the first character, negates the bracket expression.
RegEx Details:
^: Start
v: Match v
[0-9]+: Match 1+ digits
(\.[0-9]+){2}: Match a dot followed by 1+ digits. Repeat this group 2 times
$: End
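Putting the pattern and the version sort together, a minimal sketch might look like the following. It assumes a sort that supports -V (GNU coreutils does); the function name latest_stable and the extra v1.0.10 entry are illustrative additions, included to show why a plain lexical sort would not be enough.

```shell
# Hypothetical helper: keep only plain vX.Y.Z lines, then pick the highest version.
latest_stable() {
  grep -E '^v[0-9]+(\.[0-9]+){2}$' | sort -rV | head -n 1
}

printf '%s\n' v1.0.2 v1.0.3-preview2 v1.0.10 v1.0.3-sometext v1.0.3 v1.0.1 | latest_stable   # prints: v1.0.10
```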
grep's default syntax has no \d shorthand. To match digits, you can use [[:digit:]] (or [0-9]):
grep -m1 "v[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" file
Note that you don't need the [ and ] around your pattern, and that you need to escape the dot to match it literally.
With awk you could try following awk code.
awk 'match($0,/^v[0-9]+(\.[0-9]+){2}$/){print;exit}' Input_file
Explanation of the awk code: match() tests each line against the version regex; on the first line that matches, awk prints that line and exits.
Regular expressions match substrings, not whole strings. You need to explicitly match the start (^) and end ($) of the pattern.
Keep in mind that $ has special meaning in double quoted strings in shell scripts and needs to be escaped.
The anchors (^ and $) need to be outside of any bracket expression ([]).
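The difference is easy to see by counting matches. The sketch below uses a made-up three-line sample (the variable names are illustrative only) and compares a bracket expression with the anchored pattern.

```shell
versions='v1.0.3-preview2
v1.0.3
release-notes'

# Bracket expression: matches any line containing v, a digit, a dot, or a literal $.
loose=$(printf '%s\n' "$versions" | grep -c '[v0-9.$]')

# Anchored pattern in single quotes, so the shell leaves $ alone.
strict=$(printf '%s\n' "$versions" | grep -cE '^v[0-9]+(\.[0-9]+){2}$')

echo "loose=$loose strict=$strict"   # prints: loose=2 strict=1
```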

Extract capture group only from string

I have the following rule:
https://regex101.com/r/noX9lj/4
I want to make this work in a script so I'm using grep like this:
echo "\$this->table('test')" | grep -Po "qr/\$this->table\(\'(test)\'\);/"
The output should be "test"
It's not working, not sure why..
You may use
echo "\$this->table('test');" | grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)"
Or, if you feed a file path to grep:
grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)" file
To match a literal $, the regex needs \$. Inside a double-quoted shell string, the $ itself must be escaped with one backslash to stop variable expansion, and the regex backslash must be written as \\ so that the shell passes a single backslash through. Hence the "\\\$" in the pattern.
To match any text between two single quotes, you may use [^']+ - 1 or more chars other than '.
Pattern details
\$this->table\(' - $this->table(' string
\K - match reset operator that discards the text matched so far from the overall match buffer
[^']+ - one or more chars other than '
(?='\);) - a positive lookahead that requires '); string to be present immediately to the right of the current position.
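If grep -P is not available, roughly the same extraction can be sketched with sed and a capture group instead of \K and a lookahead. This is an alternative technique, not the answer's method, and extract_table_name is a hypothetical name.

```shell
# Hypothetical sed-based variant. Inside double quotes, \\ becomes a single
# backslash and \$ becomes a literal $, so sed receives the BRE:
#   s/.*\$this->table('\([^']*\)');.*/\1/p
extract_table_name() {
  sed -n "s/.*\\\$this->table('\\([^']*\\)');.*/\\1/p"
}

printf '%s\n' "\$this->table('test');" | extract_table_name   # prints: test
```

In a basic regular expression an unescaped ( is a literal parenthesis, which is why only the capture group is written with \( and \).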
There were multiple issues:
I had to use cat instead of echo for some reason.
I used this rule instead:
grep -oP "this->table\('\K\w+(?='\);)"

sed -> replace fixed text and parenthesis from string

How to bring this expression
echo "ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()"
to this
5e257e424ed10b0015e3e780,'qwe',5e257e424ed10b0015e3e780,()
using sed?
I use this:
echo "ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()" | \
sed 's/ObjectId(\([a-z0-9]\)/\1/'
You may use
sed 's/ObjectId(\([[:alnum:]]*\))/\1/g'
The POSIX BRE pattern means:
ObjectId( - matches a literal string
\([[:alnum:]]*\) - Group 1: zero or more alphanumeric chars
) - a literal ).
The \1 replacement will keep the Group 1 value only.
The g flag will replace all occurrences.
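For reference, here is the command applied to the string from the question, wrapped in a function. The name strip_object_ids is an illustrative assumption.

```shell
# Hypothetical helper: replace every ObjectId(<alnum>) with just the id.
strip_object_ids() {
  sed 's/ObjectId(\([[:alnum:]]*\))/\1/g'
}

input="ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()"
printf '%s\n' "$input" | strip_object_ids
# prints: 5e257e424ed10b0015e3e780,'qwe',5e257e424ed10b0015e3e780,()
```

The trailing () is left alone because it is not preceded by ObjectId.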

regular expression to extract number from string

I want to extract number from string. This is the string
#all/30
All I want is 30. How can I extract it?
I tried:
echo "#all/30" | sed 's/.*\/([^0-9])\..*//'
But nothing happens.
What should the regular expression look like?
Sorry for bad english.
You may consider using grep to extract the numbers from a simple string like this.
echo "#all/30" | grep -o '[0-9]\+'
The -o option prints only the part of the line that matches the pattern.
You could try the below sed command,
$ echo "#all/30" | sed 's/[^0-9]*\([0-9]\+\)[^0-9]*/\1/'
30
[^0-9]* - [^...] is a negated character class: it matches any character except those listed inside it. So [^0-9]* matches zero or more non-digit characters.
\([0-9]\+\) Captures one or more digit characters.
[^0-9]* Matches zero or more non-digit characters.
Replacing the matched characters with the chars inside group 1 will give you the number 30
echo "all/30" | sed 's/[^0-9]*\/\([0-9][0-9]*\)/\1/'
Be careful with '.*': matches are greedy by default, so it consumes as much of the string as it can.
echo "all/30" | sed 's/[^0-9]*//g'
# OR
echo "all/30" | sed 's#.*/##'
# OR
echo "all/30" | sed 's#[^0-9]*\([0-9]*\).*#\1#'
Without more information about the possible input strings, we can only assume that the structure is #all/ followed by the number and nothing else.
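Under that same assumption (the number is everything after the last slash), the shell can do the job without sed or grep at all. A minimal sketch using POSIX parameter expansion follows; the variable names are illustrative.

```shell
s='#all/30'

# ${s##*/} removes the longest prefix that ends in "/".
number=${s##*/}

# Sanity check: reject an empty result or anything containing a non-digit.
case $number in
  ''|*[!0-9]*) echo "not a number: $number" >&2 ;;
  *)           echo "$number" ;;   # prints: 30
esac
```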

Regex to match unique substrings

Here's a basic regex technique that I've never managed to remember. Let's say I'm using a fairly generic regex implementation (e.g., grep or grep -E). If I were to do a list of files and match any that end in either .sty or .cls, how would I do that?
ls | grep -E "\.(sty|cls)$"
\. matches literally a "." - an unescaped . matches any character
(sty|cls) - match "sty" or "cls" - the | means "or", and the parentheses limit the alternation to those two words.
$ forces the match to be at the end of the line
Note that you want grep -E (or egrep), not grep -e: lowercase -e is a different option, used to specify a pattern explicitly.
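To see the effect of the escaped dot and the $ anchor, the pattern can be tried on a few made-up file names. Both the names and the function name tex_files are hypothetical.

```shell
# Hypothetical helper: keep only names ending in .sty or .cls.
tex_files() {
  grep -E '\.(sty|cls)$'
}

printf '%s\n' article.cls mysty notes.sty.bak beamer.sty readme.txt | tex_files
# prints:
# article.cls
# beamer.sty
```

mysty is rejected because \. requires a literal dot, and notes.sty.bak is rejected because $ ties the match to the end of the line.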
egrep "\.sty$|\.cls$"
This regex:
\.(sty|cls)\z
will match any string that ends with .sty or .cls
EDIT:
For grep, \z should be replaced with $, i.e.
\.(sty|cls)$
as jelovirt suggested.