And condition usage on grep command - regex

I have two regular expressions and am trying to combine them with an AND condition.
What I have:
-grep -E "/[1-9]{4,}/" file
-grep -E '([0-9])(.*\1){3}' file
I tried taking the regular expression from each command and chaining them through multiple greps with a pipe:
cat file | grep pattern1 | grep pattern2
but it didn't work.
Can anyone show me how to use an AND condition in grep with these two patterns?
"/[1-9]{4,}/" '([0-9])(.*\1){3}'
sample input
Q4HXD/7100525/+wg4C54V2I4mh4Xh
aaaa/123/422444qjem,,qewriiafa
!##AVADFQWERASDFASDFQervzxcilh
expected output
Q4HXD/7100525/+wg4C54V2I4mh4Xh
which satisfies both conditions

You need to use [0-9] or [[:digit:]] to match any digit in a POSIX pattern ([1-9] excludes 0, so it cannot match /7100525/), and make sure both patterns are handled as POSIX ERE by passing the -E option:
cat file | grep -E '/[0-9]{4,}/' | grep -E '([0-9])(.*\1){3}'
Else, you may use a PCRE pattern like
grep -P '^(?=.*/[0-9]{4,}/).*([0-9])(.*\1){3}' file
The latter pattern matches
^ - start of a string
(?=.*/[0-9]{4,}/) - a positive lookahead that makes sure there is /, 4 or more digits, / after any 0+ chars other than line break chars
.* - any 0+ chars other than line break chars, as many as possible
([0-9]) - Group 1: any digit
(.*\1){3} - three occurrences of any 0+ chars other than line break chars, as many as possible, and then the Group 1 value.
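As a quick check of the piped approach, here is a minimal sketch that runs the three sample lines from the question through both filters. The variable names sample and both are only for illustration, and it assumes a grep whose -E mode accepts back-references such as \1 (GNU grep does; POSIX does not require it).

```shell
# Sample lines from the question, one per line.
sample='Q4HXD/7100525/+wg4C54V2I4mh4Xh
aaaa/123/422444qjem,,qewriiafa
!##AVADFQWERASDFASDFQervzxcilh'

# AND condition: a line must survive both filters.
# 1st grep: a slash, 4 or more digits, a slash.
# 2nd grep: some digit that occurs at least 4 times on the line.
both=$(printf '%s\n' "$sample" |
  grep -E '/[0-9]{4,}/' |
  grep -E '([0-9])(.*\1){3}')

printf '%s\n' "$both"
```

Only the first sample line passes both filters. The second line has a digit repeated four times, but /123/ holds only three digits, so the first grep drops it.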

Related

Regex to match exact version phrase

I have versions like:
v1.0.3-preview2
v1.0.3-sometext
v1.0.3
v1.0.2
v1.0.1
I am trying to get the latest version that is not a preview (i.e. has no text after the version number), so the result should be:
v1.0.3
I used this grep: grep -m1 "[v\d+\.\d+.\d+$]"
but it still outputs: v1.0.3-preview2
What could I be missing here?
To return first match for pattern v<num>.<num>.<num>, use:
grep -m1 -E '^v[0-9]+(\.[0-9]+){2}$' file
v1.0.3
If your input file is unsorted, then use grep | sort -V | head as:
grep -E '^v[0-9]+(\.[0-9]+){2}$' f | sort -rV | head -1
When you use ^ or $ inside [...], they are treated as literal characters, not as anchors.
RegEx Details:
^: Start
v: Match v
[0-9]+: Match 1+ digits
(\.[0-9]+){2}: Match a dot followed by 1+ digits. Repeat this group 2 times
$: End
To match the digits with grep, you can use
grep -m1 "v[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" file
Note that you don't need the [ and ] in your pattern, and that you need to escape the dot to match it literally.
With awk you could try following awk code.
awk 'match($0,/^v[0-9]+(\.[0-9]+){2}$/){print;exit}' Input_file
Explanation of the awk code: the match function tests each line against the version regex; on the first line that matches, awk prints that line and exits.
Regular expressions match substrings, not whole strings. You need to explicitly match the start (^) and end ($) of the pattern.
Keep in mind that $ has special meaning in double-quoted strings in shell scripts, so either escape it or put the pattern in single quotes.
The anchors need to be outside any bracket expression ([...]).
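To make the difference concrete, the following sketch runs the question's bracketed pattern and an anchored ERE pattern over the same version list. The variable names are illustrative only, and -m1 (stop after the first match) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Version list from the question, newest first.
versions='v1.0.3-preview2
v1.0.3-sometext
v1.0.3
v1.0.2
v1.0.1'

# Inside [...], the characters v \ d + . $ are just members of a set, so
# this matches any line containing one of them: the very first line.
loose=$(printf '%s\n' "$versions" | grep -m1 '[v\d+\.\d+.\d+$]')

# With ^ and $ outside any bracket expression, the whole line must be
# v<num>.<num>.<num>.
strict=$(printf '%s\n' "$versions" | grep -m1 -E '^v[0-9]+(\.[0-9]+){2}$')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
```

The bracketed pattern returns v1.0.3-preview2 because that line contains a v, while the anchored pattern skips both suffixed lines and returns v1.0.3.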

Regexp or Grep in Bash

Can you please tell me how to get the token value correctly? At the moment I am getting: "1jdq_dnkjKJNdo829n4-xnkwe",258],["FbtResult
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":\([^}]*\)\}/\1/p'
You need to match the full string, and to get rid of double quotes, you need to match a " before the token and use a negated bracket expression [^"] instead of [^}]:
sed -n 's/.*"token":"\([^"]*\).*/\1/p'
Details:
.* - any zero or more chars
"token":" - a literal "token":" string
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
.* - any zero or more chars.
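A small sketch of this substitution in use is shown here. The helper name extract_token is hypothetical, and the second input line is an assumed reconstruction that carries the mixed-character token from the question's output; it is not taken from real data.

```shell
# Hypothetical helper: print the value that follows the last "token":" on
# each input line, up to (not including) the next double quote.
extract_token() {
  sed -n 's/.*"token":"\([^"]*\).*/\1/p'
}

# Fragment of the input from the question.
t1=$(printf '%s\n' '["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | extract_token)

# Assumed input with a token made of digits, capitals, _ and -.
t2=$(printf '%s\n' '{"token":"1jdq_dnkjKJNdo829n4-xnkwe"},258],["FbtResult' | extract_token)

printf '%s\n%s\n' "$t1" "$t2"
```

Because the group is defined by what it must not contain (a double quote) rather than by an allowed alphabet, it captures both tokens without any assumption about their characters.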
This replacement works:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":"\([a-z]*\)"\}.*/\1/p'
The key is to capture the value after "token" between the quotes via \([a-z]*\), followed by a closing brace \} and then the remaining characters as .* (you were missing this last part, which is why the output also included the text after the token).
Output:
aaaaaaa
A grep solution:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | grep -Po '(?<="token":")[^"]+'
yields
aaaaaaa
The -P option to grep enables the Perl-compatible regex (PCRE).
The -o option tells grep to print only the matched substring, not the entire line.
The regex (?<="token":") is a PCRE-specific feature called a zero-width positive lookbehind assertion. The expression (?<=pattern) matches a pattern without including it in the matched result.

Search with grep only lines that start with #

Before I get my ass kicked, I want you to know that I checked several documents on "grep" and I couldn't find what I'm looking for, or maybe my English is too limited to get the idea.
I have a lot of markdown documents. Each document contain a first level heading (#) which is always on line 1.
I can search for ^# and that works, but how can I tell grep to look for certain words on the line that starts with #?
I want something like this:
grep 'some words' file.markdown
But also specify that the line starts with a #.
You may use
grep '^# \([^ ].*\)\{0,1\}some words' file.markdown
Or, using ERE syntax
grep -E '^# ([^ ].*)?some words' file.markdown
Details
^ - start of a line
# - a # char
\([^ ].*\)\{0,1\} - an optional sequence of patterns (a \(...\) is a capturing group in BRE syntax, in ERE, it is (...)) (\{0,1\} is an interval quantifier that repeats the pattern it modifies 1 or 0 times):
[^ ] - any char but a space
.* - any 0+ chars
some words - some words text.
See an online grep demo:
s="# Get me some words here
#some words here I don't want
# some words here I need"
grep '^# \([^ ].*\)\{0,1\}some words' <<< "$s"
# => # Get me some words here
# # some words here I need

How to extract with sed a pattern starting with '('

The output of xdpyinfo | grep dimensions is
dimensions: 2560x1600 pixels (676x423 millimeters)
Piping it through sed -r 's/^[^0-9]*([0-9]+x[0-9]+).*$/\1/' does extract the dimensions in pixels (2560x1600), but I can't get it to work with the opening parenthesis.
How do I get the dimensions in millimeters (i.e. 676x423) with sed?
You may use
sed -r 's/.*\(([0-9]+x[0-9]+).*/\1/'
Details
.* - any 0+ chars as many as possible
\( - a literal ( (in POSIX ERE flavor you are using with -r)
([0-9]+x[0-9]+) - Group 1 (later referred to with \1 backreference): 1+ digits, x, 1+digits
.* - any 0+ chars as many as possible
Note that you can actually omit both ^ and $ here, since the leading and trailing .* already make the pattern consume the whole line.
And here is an equivalent solution using a POSIX BRE regex:
sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/'
Note that a ( denotes a literal ( char in POSIX BRE, and \(...\) defines a capturing group here. Since + quantifier is not supported by POSIX BRE, you may just use [0-9][0-9]* instead (1 digit and 0+ digits).
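The two flavors can be compared side by side on a fixed copy of the xdpyinfo line, as in this sketch. The variable names are illustrative, and -E is used in place of -r on the assumption that the sed at hand accepts it (GNU and BSD sed both do).

```shell
# The line from the question, stored so the example runs without X11.
line='  dimensions:    2560x1600 pixels (676x423 millimeters)'

# ERE (-E): \( is a literal parenthesis, ( ... ) is the capturing group.
mm_ere=$(printf '%s\n' "$line" | sed -E 's/.*\(([0-9]+x[0-9]+).*/\1/')

# BRE (default): ( is a literal parenthesis, \( ... \) is the capturing group.
mm_bre=$(printf '%s\n' "$line" | sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/')

printf '%s\n%s\n' "$mm_ere" "$mm_bre"
```

Both commands print 676x423; the only difference is which form of the parenthesis needs the backslash.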
If you want to extract it in a single command, you can replace your existing grep command with this GNU grep command, which uses the match-reset escape \K:
xdpyinfo | grep -oP 'dimensions:.*\(\K\d+x\d+'
676x423
The above requires GNU grep. If that is not available, you can pipe one grep into another:
xdpyinfo | grep -oE 'dimensions:.*\([0-9]+x[0-9]+' | grep -oE '[0-9]+x[0-9]+$'
If you have to use sed, use a single sed command like this and drop the grep:
xdpyinfo | sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}'
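The two commands that do not need -P can be tried without an X server by feeding them a stored excerpt, as sketched here. The three-line excerpt is made up around the line from the question, and the variable names are illustrative.

```shell
# Made-up xdpyinfo-style excerpt; only the dimensions line is from the question.
out='screen #0:
  dimensions:    2560x1600 pixels (676x423 millimeters)
  resolution:    96x96 dots per inch'

# Two greps: the first keeps everything up to the size after "(",
# the second keeps only the trailing <digits>x<digits>.
mm_grep=$(printf '%s\n' "$out" |
  grep -oE 'dimensions:.*\([0-9]+x[0-9]+' |
  grep -oE '[0-9]+x[0-9]+$')

# One sed: on the dimensions line, substitute, print, then quit.
mm_sed=$(printf '%s\n' "$out" |
  sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}')

printf '%s\n%s\n' "$mm_grep" "$mm_sed"
```

Both variables end up holding 676x423. The resolution line also contains a <digits>x<digits> pair, but neither command reaches it, because each one first selects the dimensions line.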

Regex to match unique substrings

Here's a basic regex technique that I've never managed to remember. Let's say I'm using a fairly generic regex implementation (e.g., grep or grep -E). If I were to do a list of files and match any that end in either .sty or .cls, how would I do that?
ls | grep -E "\.(sty|cls)$"
\. matches literally a "." - an unescaped . matches any character
(sty|cls) - match "sty" or "cls" - the | is an or and the brackets limit the expression.
$ forces the match to be at the end of the line
Note, you want grep -E or egrep, not grep -e, which is a different option that specifies a pattern explicitly (and can be repeated to give several patterns).
egrep "\.sty$|\.cls$"
This regex:
\.(sty|cls)\z
will match any string that ends with .sty or .cls
EDIT:
for grep \z should be replaced with $ i.e.
\.(sty|cls)$
as jelovirt suggested.
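To see what the escaped dot buys you, this sketch filters a made-up list of file names with and without the backslash. The names and variable names are assumptions chosen for illustration, and printf stands in for ls so the result does not depend on the current directory.

```shell
# Made-up file names, one per line.
names='article.cls
thesis.sty
notes.tex
foosty
thesis.sty.bak'

# Escaped dot: the name must end in a literal ".sty" or ".cls".
escaped=$(printf '%s\n' "$names" | grep -E '\.(sty|cls)$')

# Unescaped dot: any character may precede "sty" or "cls".
unescaped=$(printf '%s\n' "$names" | grep -E '.(sty|cls)$')

printf 'escaped:\n%s\nunescaped:\n%s\n' "$escaped" "$unescaped"
```

The escaped pattern keeps article.cls and thesis.sty; the unescaped one additionally lets foosty through. thesis.sty.bak is rejected by both because of the $ anchor.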