Regex / grep match on: "not this" and "that" - regex

From a Linux command line, I would like to find every place, across multiple files, where a figure reference is not prefixed with "Fig. ".
So on each line I'm looking for a \ref{fig that is not immediately preceded by exactly "Fig. ".
1. Fig. \ref{fig:myFigure}
2. A sentence with Fig. \ref{fig:myFigure} there.
3. \ref{fig:myFigure}
4. A sentence with \ref{fig:myFigure} there.
The regex should ignore cases (1) and (2), but find cases (3) and (4).

You can use Negative Lookahead like:
^((?!Fig\. {0,1}\\ref\{fig).)*$
https://regex101.com/r/wSw9iI/2
Negative Lookahead (?!Fig\.\s*\\ref\{fig) (this breakdown uses \s* where the pattern above uses a space followed by {0,1})
Assert that the Regex below does not match
Fig matches the characters Fig literally (case sensitive)
\. matches the character . literally (case sensitive)
\s* matches any whitespace character (equal to [\r\n\t\f\v ])
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\\ matches the character \ literally (case sensitive)
ref matches the characters ref literally (case sensitive)
\{ matches the character { literally (case sensitive)
fig matches the characters fig literally (case sensitive)
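To see what the pattern above actually selects, here is a minimal Python sketch. The sample lines are the four cases from the question plus one hypothetical line with no reference at all. The second pattern is not from the answer: it is an assumed alternative using a negative lookbehind, in case only lines that really contain \ref{fig are wanted. With GNU grep, either pattern would need the -P option.

```python
import re

# The four cases from the question, plus one extra line (an assumption)
# that contains no \ref at all.
lines = [
    r"Fig. \ref{fig:myFigure}",
    r"A sentence with Fig. \ref{fig:myFigure} there.",
    r"\ref{fig:myFigure}",
    r"A sentence with \ref{fig:myFigure} there.",
    "A sentence with no reference at all.",
]

# Pattern from the answer: matches a whole line only if "Fig. \ref{fig"
# (with at most one space) starts nowhere on that line.
answer_pattern = re.compile(r"^((?!Fig\. {0,1}\\ref\{fig).)*$")

# Assumed alternative: match a \ref{fig that is not directly preceded by "Fig. ".
lookbehind_pattern = re.compile(r"(?<!Fig\. )\\ref\{fig")

by_answer = [line for line in lines if answer_pattern.search(line)]
by_lookbehind = [line for line in lines if lookbehind_pattern.search(line)]

print(by_answer)
print(by_lookbehind)
```

In this sketch the answer's pattern also keeps the line without any reference, because it only tests for the absence of "Fig. \ref{fig"; the lookbehind variant keeps only cases 3 and 4.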

Related

Google Sheets Regex Match all URL between two strings (In a paragraph)

I am trying to build a regex in Google Sheets that extracts the domain from any paragraph:
Website: https://www.interprism.co.jp/ => interprism.co.jp
Website: https://growupwork.com => growupwork.com
Email: contact#interprism.com website: None => interprism.com
HP: onetech.jp => onetech.jp
Web:interprism.jp/index.html => interprism.jp
I have tried the following, which looks OK but does not match every case: =iferror(regexextract(A11,".+?[#|www.](.*\n?)( )")). Can anyone help me with this?
Best Regards
Nim
You can try:
=iferror(regexextract(A11,"(?:https?:\/\/)?(?:[^#]+#)?(?:www\.)?([^:\/?]+)"))
Non-capturing group (?:https?:\/\/)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
http matches the characters http literally (case sensitive)
s? matches the character s literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
Non-capturing group (?:[^#]+#)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^#]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
# matches the character # literally (case sensitive)
Non-capturing group (?:www\.)?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
www matches the characters www literally (case sensitive)
\. matches the character . literally (case sensitive)
1st Capturing Group ([^:\/?]+)
Match a single character not present in the list below [^:\/?]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
? matches the character ? literally (case sensitive)
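The pattern can be tried outside Google Sheets as well. The sketch below uses Python's re module as a stand-in for REGEXEXTRACT (an assumption: Sheets uses RE2, but this pattern only relies on constructs both engines share). The helper name extract_domain is hypothetical, and the inputs are the bare values from the question with the "Website:" style labels removed.

```python
import re

# Same pattern as in the formula above.
DOMAIN = re.compile(r"(?:https?:\/\/)?(?:[^#]+#)?(?:www\.)?([^:\/?]+)")

def extract_domain(text):
    """Return the first capture group of the first match, or None."""
    match = DOMAIN.search(text)
    return match.group(1) if match else None

samples = [
    "https://www.interprism.co.jp/",
    "https://growupwork.com",
    "contact#interprism.com",
    "onetech.jp",
    "interprism.jp/index.html",
]

for sample in samples:
    print(sample, "=>", extract_domain(sample))

# A label in front of the value changes the result: the first match starts
# at the leftmost position, so the capture group picks up the label itself.
print(extract_domain("Website: https://www.interprism.co.jp/"))
```

In this Python sketch the last call returns "Website" rather than the domain, so a cell that still carries its label would probably need the label stripped first or the pattern extended.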

How to match string starting from multiple characters (can be nested) via regex?

I'm trying to write a regex that gets the domain from different kinds of URLs.
I'm using a regex that works properly with links that have no # in the domain part, e.g.:
https://stackoverflow.com/questions/ask
https://regexr.com/
/(?<=(\/\/))[^\n|\/|:]+/g
For links with # (e.g. http://regex#regex.com), it works after replacing \/\/ with \#: /(?<=(#))[^\n|\/|:]+/g
But when I try to make one regex match both cases, using
/(?<=((\/\/)|(\#)))[^\n|\/|:]+/g
it doesn't work.
You should look for the string :// with a positive lookbehind. If it appears in the string, what follows is the domain, and you need to capture everything after it, whether or not it contains #.
Case 1
Capture the whole string after ://
Regex:
(?<=\:\/\/).*
Explanation:
Positive Lookbehind (?<=\:\/\/)
Assert that the Regex below matches
\: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Example
https://regex101.com/r/jsqqw8/1/
Case 2
Capture just the domain after ://
Regex:
(?<=:\/\/)[^\n|\/|:]+
Explanation:
Positive Lookbehind (?<=:\/\/)
Assert that the Regex below matches
: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
Match a single character not present in the list below [^\n|\/|:]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\n matches a line-feed (newline) character (ASCII 10)
| matches the character | literally (case sensitive)
\/ matches the character / literally (case sensitive)
|: matches a single character in the list |: (case sensitive)
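As a quick check of Case 2, here is a small Python sketch that runs the pattern (without the JavaScript /.../g delimiters) against the URLs from the question. Python is used only for illustration; the lookbehind is fixed-width, so its re module accepts it.

```python
import re

# Case 2 pattern: everything after "://" up to the next newline, "|", "/" or ":".
domain_after_scheme = re.compile(r"(?<=:\/\/)[^\n|\/|:]+")

urls = [
    "https://stackoverflow.com/questions/ask",
    "https://regexr.com/",
    "http://regex#regex.com",
]

case2 = {}
for url in urls:
    match = domain_after_scheme.search(url)
    case2[url] = match.group(0) if match else None
    print(url, "->", case2[url])
```

For the URL containing #, this pattern returns the whole "regex#regex.com" part, which is why a separate Case 3 is considered.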
Case 3:
Capture the domain after :// if there is no # in the text; if # is present, capture the text after it.
Regex:
(?!:\/\/)(?:[A-z]+\.)*[A-z][A-z]+\.[A-z]{2,}
Explanation:
Negative Lookahead (?!:\/\/)
Assert that the Regex below does not match
: matches the character : literally (case sensitive)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
Non-capturing group (?:[A-z]+\.)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [A-z]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
A-z a single character in the range between A (index 65) and z (index 122) (case sensitive)
\. matches the character . literally (case sensitive)
Match a single character present in the list below [A-z]
A-z a single character in the range between A (index 65) and z (index 122) (case sensitive)
Match a single character present in the list below [A-z]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
A-z a single character in the range between A (index 65) and z (index 122) (case sensitive)
\. matches the character . literally (case sensitive)
Match a single character present in the list below [A-z]{2,}
{2,} Quantifier — Matches between 2 and unlimited times, as many times as possible, giving back as needed (greedy)
A-z a single character in the range between A (index 65) and z (index 122) (case sensitive)
Example:
https://regex101.com/r/jsqqw8/4
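The Case 3 pattern can be exercised the same way. This is a minimal Python sketch against the URLs from the question, not a general-purpose domain parser. One thing it makes visible: the range [A-z] covers more than letters, since the characters between Z and a in ASCII (such as _ and ^) fall inside it, and it does not cover digits.

```python
import re

# Case 3 pattern from the answer, unchanged.
domain_like = re.compile(r"(?!:\/\/)(?:[A-z]+\.)*[A-z][A-z]+\.[A-z]{2,}")

urls = [
    "https://stackoverflow.com/questions/ask",
    "https://regexr.com/",
    "http://regex#regex.com",
]

case3 = {}
for url in urls:
    match = domain_like.search(url)
    case3[url] = match.group(0) if match else None
    print(url, "->", case3[url])

# [A-z] spans ASCII 65..122, which includes "_" (95) but not digits.
underscore_in_range = re.fullmatch(r"[A-z]", "_") is not None
digit_in_range = re.fullmatch(r"[A-z]", "7") is not None
print(underscore_in_range, digit_in_range)
```

If only letters are intended, [A-Za-z] would be the narrower class; whether digits and hyphens should be allowed depends on the domains being matched.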

Validating emails in file with batch

I have a file with emails and I need to validate them.
The sequence is:
First name.
Dot.
Last name.
Number (optional - for same names).
static string domain(#utp.ac.pa).
I wrote this:
egrep -E [a-z]\.+[a-z][0-9]*#["utp.ac.pa"] test.txt
It should match this email: "anell.zheng#utp.ac.pa"
But it is also matching:
test4#utp.ac.pa
2anell#utp.ac.pa
Although they don't follow the sequence. What am I doing wrong?
Your regex doesn't even match the first email. If I understand your requirements correctly, this should work:
[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa
Note that to match a dot, it needs to be escaped (i.e., \.) because . matches any character.
You can get rid of A-Z if you don't want to match upper-case letters.
Try it online.
Let me know if this isn't what you want.
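A short Python sketch of how this pattern behaves on the addresses from the question. It uses re.fullmatch so that the whole address has to fit the sequence (an assumption about the intent; with grep the pattern would need ^ and $ anchors or the -x option for the same effect). The address with a trailing number is a made-up example.

```python
import re

# Pattern from the answer: first name, dot, last name, optional digits, fixed domain.
email = re.compile(r"[A-Za-z]+\.[A-Za-z]+[0-9]*#utp\.ac\.pa")

candidates = [
    "anell.zheng#utp.ac.pa",    # from the question, should be accepted
    "anell.zheng2#utp.ac.pa",   # hypothetical address with a number
    "test4#utp.ac.pa",          # from the question, should be rejected
    "2anell#utp.ac.pa",         # from the question, should be rejected
]

valid = [c for c in candidates if email.fullmatch(c)]
print(valid)
```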
Regex: ^[A-Za-z]+\.[A-Za-z]+(?:_\d+)*#utp\.ac\.pa$
Demo
Regex Details:
^ asserts position at start of a line
Match a single character present in the list below [A-Za-z]+
\. matches the character . literally (case sensitive)
Match a single character present in the list below [A-Za-z]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:_\d+)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
_ matches the character _ literally (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
#utp matches the characters #utp literally (case sensitive)
\. matches the character . literally (case sensitive)
ac matches the characters ac literally (case sensitive)
\. matches the character . literally (case sensitive)
pa matches the characters pa literally (case sensitive)
$ asserts position at the end of a line

BASH -- grep'ing for perl-regex [duplicate]

Beginner here and I'm trying to understand this. Can someone please break down the part in between the single quotes and describe what it does?
grep -oP '(?<=\S\/1\.\d.\s)[345]\d+'
Many thanks in advance!
Positive Lookbehind (?<=\S\/1\.\d.\s)
Assert that the Regex below matches
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\/ matches the character / literally (case sensitive)
1 matches the character 1 literally (case sensitive)
\. matches the character . literally (case sensitive)
\d matches a digit (equal to [0-9])
. matches any character (except for line terminators)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
Match a single character present in the list below [345]
345 matches a single character in the list 345 (case sensitive)
\d+ matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Output simply copied from https://regex101.com/r/HfJSNm/1, which is very handy for testing and sharing regexes and for getting automatic explanations of them.
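To make the breakdown concrete, here is a small Python sketch of the same pattern. The input lines are hypothetical: the question does not show the data, and the sketch assumes something like a web server access log, where the lookbehind would line up with `HTTP/1.1" ` and the match would be a status code starting with 3, 4 or 5.

```python
import re

# Same pattern as in the grep -oP command; re.findall plays the role of -o.
status = re.compile(r"(?<=\S\/1\.\d.\s)[345]\d+")

# Hypothetical log lines, for illustration only.
log_lines = [
    '"GET /index.html HTTP/1.1" 404 512',
    '"GET /home HTTP/1.1" 200 1043',
    '"POST /login HTTP/1.0" 302 0',
]

matches = [status.findall(line) for line in log_lines]
print(matches)
```

The line with status 200 yields nothing, because the character class [345] only accepts a first digit of 3, 4 or 5.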

regex: extract text blocks, defined beginning, undefined end

I have text like this:
Date: 01.02.2015 //<-stable format
something
something more
some random more
Date: 02.02.2015
something random
i dont know
So I have many such blocks. Each starts with Date... and ends where the next Date... starts.
The lines inside a block can contain anything except the Date... format.
I need an array at the end, with such blocks:
array[0] = "Date: 01.02.2015
something
something more
some random more"
array[1] = "Date: 02.02.2015
something random
i dont know"
For now I add a unique splitter before each Date... and then split on that splitter.
Question: is it possible to get such blocks using only a regex?
(I use VBA to parse the text, with the RegExp object.)
Instead of split just match using
\bDate:\s\d{1,2}\.\d{1,2}\.\d{4}[\s\S]*?(?=\nDate:|$)
See demo.
https://regex101.com/r/uF4oY4/77
Syntax explanation (from the linked site):
\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
Date: matches the characters Date: literally (case sensitive)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d{1,2} matches a digit (equal to [0-9]) between 1 and 2 times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{1,2} matches a digit (equal to [0-9]) between 1 and 2 times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\d{4} matches a digit (equal to [0-9]) exactly 4 times
Match a single character present in the list below [\s\S]*?
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy); it applies to the preceding [\s\S]
Positive Lookahead (?=\nDate:|$)
Assert that the Regex below matches
1st Alternative \nDate:
\n matches a line-feed (newline) character (ASCII 10)
Date: matches the characters Date: literally (case sensitive)
2nd Alternative $
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
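The "match instead of split" idea can be sketched in Python as follows. This is only an illustration of the pattern, not VBA code: the question uses the VBA RegExp object, where the Global property would presumably need to be set to True to collect all matches. The sample text is the one from the question, without the "//<-stable format" annotation.

```python
import re

# Same pattern as in the answer. Without a multiline flag, $ only matches
# at the very end of the text, so each block runs up to the next "\nDate:".
block = re.compile(r"\bDate:\s\d{1,2}\.\d{1,2}\.\d{4}[\s\S]*?(?=\nDate:|$)")

text = (
    "Date: 01.02.2015\n"
    "something\n"
    "something more\n"
    "some random more\n"
    "Date: 02.02.2015\n"
    "something random\n"
    "i dont know"
)

blocks = block.findall(text)
for i, b in enumerate(blocks):
    print(f"array[{i}] = {b!r}")
```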