sed -> replace fixed text and parenthesis from string - regex

How to bring this expression
echo "ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()"
to this
5e257e424ed10b0015e3e780,'qwe',5e257e424ed10b0015e3e780,()
using sed?
I use this:
echo "ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()" | \
sed 's/ObjectId(\([a-z0-9]\)/\1/'

You may use
sed 's/ObjectId(\([[:alnum:]]*\))/\1/g'
The POSIX BRE pattern means:
ObjectId( - matches a literal string
\([[:alnum:]]*\) - Group 1: zero or more alphanumeric chars
) - a literal ).
The \1 replacement will keep the Group 1 value only.
The g flag will replace all occurrences.
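As a quick check, the command can be wrapped in a small shell function and run on the string from the question. The function name strip_objectid is only an illustrative assumption, not part of the original answer.

```shell
# strip_objectid is a hypothetical helper name used for illustration.
# It unwraps every ObjectId(...) on each input line, keeping only the inner value.
strip_objectid() {
  sed 's/ObjectId(\([[:alnum:]]*\))/\1/g'
}

echo "ObjectId(5e257e424ed10b0015e3e780),'qwe',ObjectId(5e257e424ed10b0015e3e780),()" | strip_objectid
```

The trailing () is left untouched because it is not preceded by ObjectId.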

Related

Regex to match exact version phrase

I have versions like:
v1.0.3-preview2
v1.0.3-sometext
v1.0.3
v1.0.2
v1.0.1
I am trying to get the latest version that is not a preview (i.e., has no text after the version number), so the result should be:
v1.0.3
I used this grep: grep -m1 "[v\d+\.\d+.\d+$]"
but it still outputs: v1.0.3-preview2
What could I be missing here?
To return first match for pattern v<num>.<num>.<num>, use:
grep -m1 -E '^v[0-9]+(\.[0-9]+){2}$' file
v1.0.3
If your input file is unsorted, then use grep | sort -rV | head as:
grep -E '^v[0-9]+(\.[0-9]+){2}$' file | sort -rV | head -1
When you use ^ or $ inside [...], they are treated as literal characters, not as anchors (a leading ^ negates the set instead).
RegEx Details:
^: Start
v: Match v
[0-9]+: Match 1+ digits
(\.[0-9]+){2}: Match a dot followed by 1+ digits. Repeat this group 2 times
$: End
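The filter-then-sort idea can be sketched as a small function that reads version strings on standard input. The name latest_release and the extra sample value v1.0.10 are assumptions added for illustration, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# latest_release is a hypothetical helper name.
# Keep only lines of the exact form v<num>.<num>.<num>, then pick the highest
# one using version sort (sort -V is assumed to be available).
latest_release() {
  grep -E '^v[0-9]+(\.[0-9]+){2}$' | sort -rV | head -n 1
}

# Unsorted sample input; v1.0.10 is an invented value that shows why plain
# lexical sorting would not be enough.
printf '%s\n' v1.0.2 v1.0.3-preview2 v1.0.10 v1.0.3 v1.0.3-sometext | latest_release
```

With version sort, v1.0.10 ranks above v1.0.3, whereas a plain lexical sort would rank it below.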
To match the digits with grep, you can use
grep -m1 "v[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" file
Note that you don't need the [ and ] around your pattern, and that you need to escape the dot to match it literally.
With awk you could try following awk code.
awk 'match($0,/^v[0-9]+(\.[0-9]+){2}$/){print;exit}' Input_file
Explanation of awk code: the match function tests each line against the version regex; on the first line that matches, awk prints that line and exits.
Regular expressions match substrings, not whole strings. You need to explicitly match the start (^) and end ($) of the pattern.
Keep in mind that $ can have a special meaning in double-quoted strings in shell scripts, so either escape it or put the pattern in single quotes.
The anchors need to be outside of any bracket expression ([]).
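A minimal sketch of the difference, using two of the version strings from the question. The function names are hypothetical, and the bracketed pattern is a simplified stand-in for the one in the question.

```shell
# Sample input from the question, reduced to two lines.
versions() { printf '%s\n' 'v1.0.3-preview2' 'v1.0.3'; }

# Inside [...] the $ is an ordinary character: this matches any line
# that contains a "v" or a "$", so both lines are counted.
count_bracketed() { versions | grep -c '[v$]'; }

# Outside [...] the ^ and $ anchor the pattern to the whole line,
# so only the plain release is counted.
count_anchored() { versions | grep -c -E '^v[0-9]+(\.[0-9]+){2}$'; }

count_bracketed
count_anchored
```

The first count covers both lines and the second only v1.0.3, which is why the bracketed pattern still returned the preview version.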

Change text in first brackets that are preceded by third/square brackets

I want to append some text (.html) at the end of line in format [some_text](some/other/text); basically links in markdown syntax.
Example
[Test](test/link) would be [Test](test/link.html)
[Test](test/link1/link) would be [Test](test/link1/link.html)
[Test] would be [Test]
(test) would be (test)
So I was trying out unix sed with syntax: sed -i 's/\[*\](*)/.html)/g' filename.md. This sed syntax is wrong and does not work. Can someone help? I'm open to using other tools like awk or perl, if they are appropriate for this scenario.
Solution: sed -i 's/\(\[[^][]*]([^()]*\))/\1.html)/g' filename.md
Suggested by #WiktorStribiżew
Based on given samples:
$ cat ip.txt
[Test](test/link)
[Test](test/link1/link)
[Test]
(test)
# if closing ) should also be matched: sed -E 's/(\[[^]]+]\([^)]+)\)/\1.html)/'
$ sed -E 's/\[[^]]+]\([^)]+/&.html/' ip.txt
[Test](test/link.html)
[Test](test/link1/link.html)
[Test]
(test)
\[ match [
[^]]+ match one or more non ] characters
]\( match ](
[^)]+ match one or more non ) characters
& backreferences entire matched portion
\1 backreferences portion matched by first capture group
With perl:
perl -pe 's/\[[^]]+]\([^)]+\K(?=\))/.html/'
\K drops the text matched up to that point from the match, so that text is left in place and does not need to be captured
(?=\)) is a lookahead assertion to match ) character, this also is not part of the matched portion
Add the -i option to either solution once it is working as expected.
You can use
sed -i 's/\(\[[^][]*]([^()]*\))/\1.html)/g' filename.md
The regex is a POSIX BRE expression that matches
\(\[[^][]*]([^()]*\) - Group 1:
\[ - a [ char
[^][]* - zero or more chars other than [ and ]
] - a ] char
( - a ( char
[^()]* - zero or more chars other than ( and )
) - a ) char.
The -i option makes GNU sed write the replacements back to the input file. The g flag replaces all matches on each line.
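Before adding -i, the expression can be tried on standard input. The sketch below assumes a hypothetical helper name, add_html_ext, and an invented sample line with two links to show the effect of the g flag.

```shell
# add_html_ext is a hypothetical helper name; it reads Markdown on stdin
# and appends .html to the target of every [text](target) link.
add_html_ext() {
  sed 's/\(\[[^][]*]([^()]*\))/\1.html)/g'
}

# The first line is an invented example with two links on one line;
# the last two lines come from the question and must stay unchanged.
printf '%s\n' '[A](a/b) and [B](c/d)' '[Test]' '(test)' | add_html_ext
```

Lines without a complete [text](target) pair pass through unmodified, so the command is safe to run over a whole file.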

Regexp or Grep in Bash

Can you please tell me how to get the token value correctly? At the moment I am getting: "1jdq_dnkjKJNdo829n4-xnkwe",258],["FbtResult
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":\([^}]*\)\}/\1/p'
You need to match the full string, and to get rid of double quotes, you need to match a " before the token and use a negated bracket expression [^"] instead of [^}]:
sed -n 's/.*"token":"\([^"]*\).*/\1/p'
Details:
.* - any zero or more chars
"token":" - a literal "token":" string
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
.* - any zero or more chars.
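A short sketch of the command in use, wrapped in a function with the hypothetical name extract_token. The input is a shortened version of the string from the question, with the token value the asker reported.

```shell
# extract_token is a hypothetical helper name.
# It prints the value of the first-from-the-right "token":"..." pair on each line
# (the leading .* is greedy), and prints nothing for lines without a token.
extract_token() {
  sed -n 's/.*"token":"\([^"]*\).*/\1/p'
}

echo '["DTSGInitialData",[],{"token":"1jdq_dnkjKJNdo829n4-xnkwe"},258],["FbtResult' | extract_token
```

Because the capture group stops at the next double quote, the token may contain digits, uppercase letters, underscores and hyphens.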
This replacement works:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | \
sed -n 's/.*"token":"\([a-z]*\)"\}.*/\1/p'
The value between the quotes after "token": is captured by \([a-z]*\), followed by a closing brace \} and then .* to consume the remaining characters (this part was missing from your command, which is why the text after the token ended up in the output). Note that [a-z]* matches lowercase letters only, so for a token containing digits, uppercase letters, _ or - (such as the one in the question), use [^"]* instead.
Output:
aaaaaaa
A grep solution:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | grep -Po '(?<="token":")[^"]+'
yields
aaaaaaa
The -P option to grep enables the Perl-compatible regex (PCRE).
The -o option tells grep to print only the matched substring, not the entire line.
The regex (?<="token":") is a PCRE-specific feature called a zero-width positive lookbehind assertion. The expression (?<=pattern) matches a pattern without including it in the matched result.
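Where grep lacks -P, a similar result can be approximated with -o plus cut. This is a swapped-in alternative and not part of the answer above; the helper name token_without_pcre is hypothetical.

```shell
# token_without_pcre is a hypothetical helper name.
# Step 1: grep -o prints only the "token":"..." fragment.
# Step 2: cut splits that fragment on double quotes; the value is field 4.
token_without_pcre() {
  grep -o '"token":"[^"]*"' | cut -d '"' -f 4
}

echo '["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | token_without_pcre
```

The trade-off is an extra process in the pipeline in exchange for not depending on PCRE support.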

Extract capture group only from string

I have the following rule:
https://regex101.com/r/noX9lj/4
I want to make this work in a script so I'm using grep like this:
echo "\$this->table('test')" | grep -Po "qr/\$this->table\(\'(test)\'\);/"
The output should be "test"
It's not working, not sure why..
You may use
echo "\$this->table('test');" | grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)"
Or, if you feed a file path to grep:
grep -oP "\\\$this->table\\('\\K[^']+(?='\\);)" file
To match a literal $, the regex needs \$. Inside a double-quoted shell string, $ must itself be escaped with a backslash to stop variable expansion (\$), and the regex backslash must be written as \\ so that the shell passes a single literal backslash to grep; hence the "\\\$" in the pattern.
To match any text between two single quotes, you may use [^']+ - 1 or more chars other than '.
Pattern details
\$this->table\(' - $this->table(' string
\K - match reset operator that discards the text matched so far from the overall match buffer
[^']+ - one or more chars other than '
(?='\);) - a positive lookahead that requires '); string to be present immediately to the right of the current position.
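One way to sidestep the layered backslashes is to single-quote the pattern and write the single quote itself as the PCRE hex escape \x27. This is a sketch of an alternative spelling of the same regex, not the form used in the answer; the helper name table_name is hypothetical, and grep -P support is assumed.

```shell
# table_name is a hypothetical helper name.
# The pattern is single-quoted, so the shell leaves $ and \ alone;
# \x27 is the hex escape for the single-quote character.
table_name() {
  grep -oP '\$this->table\(\x27\K[^\x27]+(?=\x27\);)'
}

echo "\$this->table('test');" | table_name
```

With single quotes, the pattern that grep receives is exactly what is typed, which makes it easier to compare against a regex tester.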
There were multiple issues:
had to use "cat" instead of echo for some reason
used this rule instead:
grep -oP "this->table\('\K\w+(?='\);)"

How to extract with sed a pattern starting with '('

The output of xdpyinfo | grep dimensions is
dimensions: 2560x1600 pixels (676x423 millimeters)
Piping it through sed -r 's/^[^0-9]*([0-9]+x[0-9]+).*$/\1/' does extract the dimensions in pixels (2560x1600), but I can't get it to work with the opening parenthesis.
How to get the dimensions in millimeters (i.e. 676x423) with sed?
You may use
sed -r 's/.*\(([0-9]+x[0-9]+).*/\1/'
Details
.* - any 0+ chars as many as possible
\( - a literal ( (in POSIX ERE flavor you are using with -r)
([0-9]+x[0-9]+) - Group 1 (later referred to with \1 backreference): 1+ digits, x, 1+ digits
.* - any 0+ chars as many as possible
Note you actually can omit both ^ and $ here since there is a single whole line match with sed.
And here is an equivalent solution using a POSIX BRE regex:
sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/'
Note that a ( denotes a literal ( char in POSIX BRE, and \(...\) defines a capturing group here. Since + quantifier is not supported by POSIX BRE, you may just use [0-9][0-9]* instead (1 digit and 0+ digits).
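The two flavors can be compared side by side on the sample line from the question. The function names below are hypothetical, and -E is used in place of -r as the ERE switch (GNU sed accepts both).

```shell
# Sample xdpyinfo line from the question, stored so xdpyinfo itself is not needed.
line='dimensions:    2560x1600 pixels (676x423 millimeters)'

# ERE: \( is a literal parenthesis, ( ... ) is the capture group.
mm_ere() { sed -E 's/.*\(([0-9]+x[0-9]+).*/\1/'; }

# BRE: ( is a literal parenthesis, \( ... \) is the capture group,
# and [0-9][0-9]* stands in for the unsupported + quantifier.
mm_bre() { sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/'; }

printf '%s\n' "$line" | mm_ere
printf '%s\n' "$line" | mm_bre
```

Both commands print the millimeter dimensions; only the escaping of the parentheses and the quantifier differ.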
If you want to extract it in a single command, you can replace your existing grep command with this GNU grep command that uses the match reset operator \K:
xdpyinfo | grep -oP 'dimensions:.*\(\K\d+x\d+'
676x423
The above requires GNU grep. If that is not available, you can pipe one grep into another:
xdpyinfo | grep -oE 'dimensions:.*\([0-9]+x[0-9]+' | grep -oE '[0-9]+x[0-9]+$'
If you have to use sed, then use a single sed command like this and drop grep:
xdpyinfo | sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}'
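The effect of the /dimensions/ address and the q command can be seen on canned input. The sketch below assumes a hypothetical helper name, first_mm, and an invented second screen in the sample data.

```shell
# first_mm is a hypothetical helper name.
# On the first line containing "dimensions", print the millimeter size and quit,
# so later lines are never read.
first_mm() {
  sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}'
}

# Canned stand-in for xdpyinfo output; the second screen is invented sample data.
printf '%s\n' \
  'screen #0:' \
  '  dimensions:    2560x1600 pixels (676x423 millimeters)' \
  'screen #1:' \
  '  dimensions:    1920x1080 pixels (508x286 millimeters)' | first_mm
```

Only the first screen's size is printed; dropping the q would print one line per screen instead.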