How to extract with sed a pattern starting with '('

The output of xdpyinfo | grep dimensions is
dimensions: 2560x1600 pixels (676x423 millimeters)
Piping it through sed -r 's/^[^0-9]*([0-9]+x[0-9]+).*$/\1/' extracts the dimensions in pixels (2560x1600), but I cannot get a similar pattern to work with the opening parenthesis.
How can I get the dimensions in millimeters (i.e. 676x423) with sed?

You may use
sed -r 's/.*\(([0-9]+x[0-9]+).*/\1/'
Details
.* - any 0+ chars as many as possible
\( - a literal ( (in POSIX ERE flavor you are using with -r)
([0-9]+x[0-9]+) - Group 1 (later referred to with \1 backreference): 1+ digits, x, 1+digits
.* - any 0+ chars as many as possible
Note that you can omit both ^ and $ here, since the leading and trailing .* already make the pattern match the whole line.
And here is an equivalent solution using a POSIX BRE regex:
sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/'
Note that a ( denotes a literal ( char in POSIX BRE, and \(...\) defines a capturing group here. Since + quantifier is not supported by POSIX BRE, you may just use [0-9][0-9]* instead (1 digit and 0+ digits).
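As a quick check, both forms can be run against the sample line from the question. This is only a minimal sketch: the line variable is an assumption that stands in for the output of xdpyinfo | grep dimensions, and -E is used as the more portable spelling of -r.

```shell
# Sample line copied from the question (stands in for: xdpyinfo | grep dimensions)
line='dimensions: 2560x1600 pixels (676x423 millimeters)'

# POSIX ERE: \( is a literal parenthesis, ( ... ) is the capturing group
mm_ere=$(printf '%s\n' "$line" | sed -E 's/.*\(([0-9]+x[0-9]+).*/\1/')

# POSIX BRE: ( is a literal parenthesis, \( ... \) is the capturing group
mm_bre=$(printf '%s\n' "$line" | sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/')

echo "$mm_ere"   # 676x423
echo "$mm_bre"   # 676x423
```

The two commands differ only in which form of the parenthesis is escaped, which is the whole point of the BRE/ERE distinction here.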

If you want to extract it in a single command, you can replace your existing grep command with this GNU grep command, which uses the match-reset operator \K:
xdpyinfo | grep -oP 'dimensions:.*\(\K\d+x\d+'
676x423
The above requires GNU grep. If that is not available to you, you can pipe one grep into another:
xdpyinfo | grep -oE 'dimensions:.*\([0-9]+x[0-9]+' | grep -oE '[0-9]+x[0-9]+$'
If you have to use sed, use a single sed command like this and drop grep altogether:
xdpyinfo | sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}'
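To try the grep pipeline and the single sed command without an X server, they can be fed a canned input. This is a sketch under assumptions: the sample variable is a hypothetical excerpt of xdpyinfo output, and only its dimensions line comes from the question.

```shell
# Hypothetical excerpt of xdpyinfo output; only the dimensions line is from the question
sample='name of display: :0
dimensions: 2560x1600 pixels (676x423 millimeters)
resolution: 96x96 dots per inch'

# Two greps: isolate everything up to the millimeter pair, then keep only the trailing pair
mm_grep=$(printf '%s\n' "$sample" |
  grep -oE 'dimensions:.*\([0-9]+x[0-9]+' |
  grep -oE '[0-9]+x[0-9]+$')

# Single sed: act only on the dimensions line, print the capture, then quit
mm_sed=$(printf '%s\n' "$sample" |
  sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}')

echo "$mm_grep"   # 676x423
echo "$mm_sed"    # 676x423
```

The /dimensions/ address keeps the resolution line, which also contains a digits-x-digits pair, from being touched.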

Related

sed replaces word boundary with two periods/fullstops

Why would sed replace the word boundary with three periods/full stops instead of one?
echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | sed 's/:[0-9,]\+\b/\./g'
returns 0/1... 0/0... 0/0...
This happens even when I use \> instead of \b for word boundary.
I'm running on
Operating System: Ubuntu 16.04.7 LTS
Kernel: Linux 4.15.0-128-generic

You get multiple dots because each field contains several : separators. [0-9,]\+ stops at every :, so each :-delimited chunk is matched and replaced separately. Do this:
$ echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | sed 's/:[0-9,:]\+/./g'
0/1. 0/0. 0/0.
In other words scan over [0-9,:]\+ instead of [0-9,]\+. Also there is no need to escape the dot in the replacement part.

You want to remove all non-space chars after each :, so use [^ ]* POSIX BRE pattern instead of [0-9,]\+:
echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | \
sed 's/:[^ ]*/./g'
# => 0/1. 0/0. 0/0.
If there can be any whitespace, use sed 's/:[^[:space:]]*/./g'.
Note you do not need to escape the dot in the replacement pattern, it is a literal . char there.
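To see where the three dots come from, it may help to list what the original bracket expression actually matches. The sketch below uses grep -oE for that (an assumption: your grep supports -o, as GNU and BSD grep do). The \b from the question is left out because it is a GNU extension and does not change which chunks are matched for this input.

```shell
# Input string copied from the question
input=' 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1'

# List every chunk the original class [0-9,] matches after a colon:
# each of the three fields contributes three separate matches.
matches=$(printf '%s\n' "$input" | grep -oE ':[0-9,]+')
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

# Consuming everything up to the next space gives one replacement per field
fixed=$(printf '%s\n' "$input" | sed 's/:[^ ]*/./g')

echo "$match_count"   # 9
echo "$fixed"         # " 0/1. 0/0. 0/0." (leading space kept)
```

Nine matches across three fields means three dots per field with the original pattern, and one dot per field once the colon is allowed inside the match.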

Regexp or Grep in Bash

Can you please tell me how to get the token value correctly? At the moment I am getting: "1jdq_dnkjKJNdo829n4-xnkwe",258],["FbtResult
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":\([^}]*\)\}/\1/p'

You need to match the full string, and to get rid of double quotes, you need to match a " before the token and use a negated bracket expression [^"] instead of [^}]:
sed -n 's/.*"token":"\([^"]*\).*/\1/p'
Details:
.* - any zero or more chars
"token":" - a literal "token":" string
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
.* - any zero or more chars.
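The effect of the trailing .* can be seen by running the substitution with and without it. This is a minimal sketch; the variable names are mine and the input is the string from the question.

```shell
# Input string copied from the question
data='{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult'

# Without a trailing .* the text after the token is not part of the match, so it survives
partial=$(printf '%s\n' "$data" | sed -n 's/.*"token":"\([^"]*\)/\1/p')

# With the trailing .* the whole line is replaced by the captured group
token=$(printf '%s\n' "$data" | sed -n 's/.*"token":"\([^"]*\).*/\1/p')

echo "$partial"   # aaaaaaa"},258],["FbtResult
echo "$token"     # aaaaaaa
```

sed only replaces the part of the line that the pattern matched, which is why anything left unmatched shows up in the output.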

This replacement works:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' |
sed -n 's/.*"token":"\([a-z]*\)"\}.*/\1/p'
The value after "token": is captured between the quotes by \([a-z]*\), followed by a closing brace \} and then .* for the remaining characters (your pattern was missing this last part, which is why the text after the token stayed in the output).
Output:
aaaaaaa

A grep solution:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | grep -Po '(?<="token":")[^"]+'
yields
aaaaaaa
The -P option to grep enables the Perl-compatible regex (PCRE).
The -o option tells grep to print only the matched substring, not the entire line.
The regex (?<="token":") is a PCRE-specific feature called a zero-width positive lookbehind assertion. The expression (?<=pattern) requires that pattern occur immediately before the current position, without including it in the matched result.

And condition usage on grep command

I have two regular expressions and am trying to combine them with an AND condition.
What I have so far:
- grep -E "/[1-9]{4,}/" file
- grep -E '([0-9])(.*\1){3}' file
I tried taking the regular expression from each command and chaining multiple greps with a pipe:
cat file | grep pattern1 | grep pattern2
but it didn't work.
Can anyone show me how to use an AND condition in grep with these two patterns?
"/[1-9]{4,}/" '([0-9])(.*\1){3}'
sample input
Q4HXD/7100525/+wg4C54V2I4mh4Xh
aaaa/123/422444qjem,,qewriiafa
!##AVADFQWERASDFASDFQervzxcilh
expected output
Q4HXD/7100525/+wg4C54V2I4mh4Xh
which satisfies both conditions

You need to use [0-9] or [[:digit:]] to match any digit in a POSIX pattern ([1-9] excludes 0, so /7100525/ would not match), and make sure both patterns are handled as POSIX ERE by passing the -E option:
cat file | grep -E '/[0-9]{4,}/' | grep -E '([0-9])(.*\1){3}'
Else, you may use a PCRE pattern like
grep -P '^(?=.*/[0-9]{4,}/).*([0-9])(.*\1){3}' file
The latter pattern matches
^ - start of a string
(?=.*/[0-9]{4,}/) - a positive lookahead that makes sure there is /, 4 or more digits, / after any 0+ chars other than line break chars
.* - any 0+ chars other than line break chars, as many as possible
([0-9]) - Group 1: any digit
(.*\1){3} - three occurrences of any 0+ chars other than line break chars, as many as possible, and then the Group 1 value.
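The piped form can be checked against the sample input from the question. This is a sketch that assumes a grep accepting back-references under -E (GNU grep does). The PCRE variant is not exercised here.

```shell
# Sample input copied from the question
input='Q4HXD/7100525/+wg4C54V2I4mh4Xh
aaaa/123/422444qjem,,qewriiafa
!##AVADFQWERASDFASDFQervzxcilh'

# AND condition: a line must survive both filters
both=$(printf '%s\n' "$input" |
  grep -E '/[0-9]{4,}/' |
  grep -E '([0-9])(.*\1){3}')

# For comparison: how many lines pass the repeated-digit filter alone
repeat_count=$(printf '%s\n' "$input" | grep -E '([0-9])(.*\1){3}' | wc -l | tr -d ' ')

echo "$both"           # Q4HXD/7100525/+wg4C54V2I4mh4Xh
echo "$repeat_count"   # 2
```

The second sample line contains the digit 4 four times, so it passes the repeated-digit filter, but /123/ has only three digits between the slashes, so the first filter drops it.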

Regex matches but sed fails replace

I am having a tricky regex issue
I have the string like below
some_Name _ _Bday Date Comm.txt
And here is my regex to match the spaces and underscore
\_?\s\_?
Now when I try to replace the string using sed and the above regex
echo "some_Name _ _Bday Date Comm.txt" | sed 's/\_?\s\_?/\_/g'
The output i want is
some_Name_Bday_Date_Comm.txt
Any ideas on how I can go about this?

Your sed uses a POSIX BRE regex engine, where the \_?\s\_? pattern matches the literal text _?, a whitespace character (if your sed supports the \s shorthand), and another literal _?; that is, each ? is treated as a literal question mark.
You may use
sed -E 's/[[:space:]_]+/_/g'
sed 's/[[:space:]_]\{1,\}/_/g'
The [[:space:]_]+ POSIX ERE pattern (enabled with -E option) will match one or more whitespace or underscore characters.
The POSIX ERE + quantifier can be written as \{1,\} in POSIX BRE. Also, if you use a GNU sed, you may use \+ in the second sed command.
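A small sketch to compare the behaviours on the string from the question. The first command is a simplified stand-in for the original pattern (an assumption on my part: \_ and \s are written as _ and [[:space:]] so that it does not depend on GNU extensions), and it shows the literal ? preventing any match.

```shell
# File name copied from the question
name='some_Name _ _Bday Date Comm.txt'

# BRE with a bare ?: the pattern needs a literal "_?", so nothing is replaced
unchanged=$(printf '%s\n' "$name" | sed 's/_?[[:space:]]_?/_/g')

# ERE: one or more whitespace/underscore characters collapse into a single _
ere=$(printf '%s\n' "$name" | sed -E 's/[[:space:]_]+/_/g')

# BRE spelling of the same quantifier
bre=$(printf '%s\n' "$name" | sed 's/[[:space:]_]\{1,\}/_/g')

echo "$unchanged"   # some_Name _ _Bday Date Comm.txt
echo "$ere"         # some_Name_Bday_Date_Comm.txt
echo "$bre"         # some_Name_Bday_Date_Comm.txt
```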

This might work for you (GNU sed):
sed -E 's/\s(\s*_)*/_/g' file
This will replace a space followed by zero or more of the following: zero or more spaces followed by an underscore.

Regex to match unique substrings

Here's a basic regex technique that I've never managed to remember. Let's say I'm using a fairly generic regex implementation (e.g., grep or grep -E). If I were to do a list of files and match any that end in either .sty or .cls, how would I do that?

ls | grep -E "\.(sty|cls)$"
\. matches literally a "." - an unescaped . matches any character
(sty|cls) - match "sty" or "cls" - the | means "or" and the parentheses limit its scope.
$ forces the match to be at the end of the line
Note, you want grep -E or egrep, not grep -e as that's a different option for lists of patterns.
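To see the anchoring and the escaped dot at work, the pattern can be tried on a few made-up names. This is a sketch: the file names are hypothetical and stand in for the output of ls.

```shell
# Hypothetical file names standing in for the output of ls
names=$(printf '%s\n' article.cls notes.txt beamer.sty beamer.sty.bak mycls)

# Keep only names that end in a literal ".sty" or ".cls"
matched=$(printf '%s\n' "$names" | grep -E '\.(sty|cls)$')

echo "$matched"
# article.cls
# beamer.sty
```

beamer.sty.bak is rejected because of the $ anchor, and mycls is rejected because the escaped dot must be present.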

egrep "\.sty$|\.cls$"

This regex:
\.(sty|cls)\z
will match any string that ends with .sty or .cls
EDIT:
For grep, \z should be replaced with $, i.e.
\.(sty|cls)$
as jelovirt suggested.
