RegExp match lines NOT starting with at-symbol [duplicate] - regex

If I write this regexp (?<=^[\t ]*#).+ I can match only the lines starting with optional spaces (but not newlines) and at-symbol, without matching the at-symbol.
Example:
#test Matches "test", but not the " #".
I'm trying to match lines whose first non-space character is not the at-symbol. For that purpose I negated the look-behind, resulting in this: (?<!^[\t ]*#).+.
But it matches lines even if their first non-space character is the at-symbol.
I've tried regexps like these:
^[\t ]*[^#].*,
(?<=^[\t ]*[^#]).+,
(?<=^[\t ]*)(.(?!#)).*.
All of them match lines even if their first non-space character is the at-symbol.
How can I match only the lines that do not start with optional spaces (not newlines) followed by the at-symbol?
matches
matches
m#tches
m#tches
#Doesn't match
#Doesn't match
Thanks!

Your pattern was good except for two things:
you need a lookahead (not followed by), not a lookbehind
you need to anchor your pattern at the start of the line
So, if you read the text line by line:
^(?![ \t]*#).+
If you read the whole text at once, you need to use the multiline modifier to make ^ match the start of each line (and not only the start of the string, which is the default):
(?m)^(?![ \t]*#).+
(or another way to switch on the m modifier)
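A minimal Python sketch of the multiline variant, assuming Python's re module, where the re.MULTILINE flag plays the role of (?m). The sample text is modeled on the question, and its exact indentation is an assumption. The second pattern is one of the attempts from the question, included to show why it leaks: [\t ]* can backtrack and hand a space over to [^#].

```python
import re

# Sample text modeled on the question; the indentation is an assumption.
text = "\n".join([
    "matches",
    "  matches",
    "m#tches",
    " \tm#tches",
    "#Doesn't match",
    "  #Doesn't match",
])

# re.MULTILINE is the flag form of (?m): ^ matches at the start of every line.
not_comment = re.compile(r"^(?![ \t]*#).+", re.MULTILINE)
kept = not_comment.findall(text)
print(kept)

# One of the attempts from the question, for comparison. [\t ]* can backtrack
# and give a space back to [^#], so an indented "#" line still matches.
attempt = re.compile(r"^[\t ]*[^#].*", re.MULTILINE)
leaked = attempt.findall(text)
print(leaked)
```

The lookahead version keeps only the four lines whose first non-space character is not #, while the attempt also returns the indented # line.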


How to fix regex to match the whole word, and not a substring? [duplicate]

I haven't found any success in fixing this regular expression:
B..y
I am currently searching a text file, and the output is the following:
Baby
Babylon
Babyland
eBaby
What should I change in the expression to only output 'Baby' and exclude the other three?
EDIT: What if I have another entry - 'Blay'? I need to get 'Baby' and 'Blay'.
The regex:
\bBaby\b
To find both 'Baby' and 'Blay', you need to update the regex to:
\b(Baby|Blay)\b
Explanations:
From here about \b:
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
From here about (Baby|Blay) :
If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish.
The alternation operator has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you need to use parentheses for grouping. If we want to improve the first example to match whole words only, we would need to use \b(cat|dog)\b. This tells the regex engine to find a word boundary, then either cat or dog, and then another word boundary. If we had omitted the parentheses then the regex engine would have searched for a word boundary followed by cat, or, dog followed by a word boundary.
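The quoted rules can be checked with a short Python sketch, assuming Python's re module and the word list from the question plus 'Blay'. The \bB..y\b variant is not from the answer; it is an assumption showing how the original B..y expression could be bounded directly.

```python
import re

words = ["Baby", "Babylon", "Babyland", "eBaby", "Blay"]

def hits(pattern):
    """Return the words in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [w for w in words if rx.search(w)]

# Grouped alternation: a boundary, then Baby or Blay, then a boundary.
grouped = hits(r"\b(Baby|Blay)\b")

# Ungrouped: "\bBaby" OR "Blay\b", so words that merely start with Baby match.
ungrouped = hits(r"\bBaby|Blay\b")

# Assumed variant: the original B..y wrapped in word boundaries.
wildcard = hits(r"\bB..y\b")

print(grouped, ungrouped, wildcard)
```

Only the grouped and bounded patterns restrict the result to 'Baby' and 'Blay'; the ungrouped one also returns 'Babylon' and 'Babyland'.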
In Addition to the Answer of virolino:
The Regex Metacharacter \b matches word boundaries, i.e. between two characters where one is a word character and the other is not a word character, plus the start and the end of the string, if the first character (or the last, respectively) is a word character.
A word character is a match for the \w character class. There seems to be no real consensus about what a word character actually is, but [A-Za-z0-9_] seems to be the minimum, hence your example should work with virolino's pattern (\bBaby\b) in any case.
Furthermore, the pattern also matches Baby in the following strings:
Baby-Boomer
Baby.Feed();
See my fork of virolino's regex test.
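A small Python sketch of that caveat, assuming Python's re module, where \w covers letters, digits and the underscore. The last two sample strings are assumptions added for contrast.

```python
import re

whole_word = re.compile(r"\bBaby\b")

samples = ["Baby-Boomer", "Baby.Feed();", "Baby_Boomer", "Baby2"]

# "-" and "." are non-word characters, so a boundary follows "Baby".
# "_" and "2" are word characters, so no boundary follows "Baby".
found = {s: bool(whole_word.search(s)) for s in samples}
print(found)
```

So \bBaby\b treats punctuation as a word separator, which may or may not be what a "whole word" search is meant to do.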

RegEx for combining "match everything" and "negative lookahead" [duplicate]

I'm trying to match the string "this" followed by anything (any number of characters) except "notthis".
Regex: ^this.*(?!notthis)$
Matches: thisnotthis
Why?
Even its explanation in a regex calculator seems to say it should work. The explanation section says
Negative Lookahead (?!notthis)
Assert that the Regex below does not match
notthis matches the characters notthis literally (case sensitive)
The negative lookahead has no impact in ^this.*(?!notthis)$ because .* first matches all the way to the end of the string, and at that position nothing follows, so notthis can never be found there and the lookahead always succeeds.
I think you meant ^this(?!notthis).*$, which matches this at the start of the string and then checks that what is directly to the right is not notthis.
If that check passes, .* then matches any character except a newline until the end of the string.
^this(?!notthis).*$
Details of the pattern
^ Assert start of the string
this Match this literally
(?!notthis) Assert that what is directly on the right is not notthis
.* Match 0+ times any char except a newline
$ Assert end of the string
If notthis must not appear anywhere in the string, rather than only directly after this, you could add .* to the negative lookahead:
^this(?!.*notthis).*$
        ^^
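The three placements of the lookahead can be compared side by side. This is a minimal Python sketch assuming Python's re module; the sample strings other than thisnotthis are assumptions.

```python
import re

patterns = {
    "at_end": r"^this.*(?!notthis)$",         # lookahead runs after .* ate everything
    "directly_after": r"^this(?!notthis).*$",  # rejects notthis right after this
    "anywhere": r"^this(?!.*notthis).*$",      # rejects notthis anywhere after this
}

samples = ["thisnotthis", "thisisfine", "this is notthis"]

# For each pattern, list the samples it accepts.
accepted = {
    name: [s for s in samples if re.search(rx, s)]
    for name, rx in patterns.items()
}
print(accepted)
```

The first pattern accepts every sample, the second rejects only thisnotthis, and the third also rejects "this is notthis".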
Because of the order of your rules. By the time your expression gets to the negative lookahead, the prior rule (.*) has already consumed the rest of the string, so there is nothing left for the lookahead to reject.
If you wish to capture everything after this, except for a trailing notthis, this RegEx might also help you to do so:
^this([\s\S]*?)(notthis|())$
It creates an empty group () for nothing, with an OR so that a trailing notthis is kept out of the first capture group.
You might remove (), ^ and $, and it may still match, although without the $ anchor the lazy first group has no reason to expand and will capture nothing:
this([\s\S]*?)(notthis|)
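A short Python sketch of what the anchored capture-group pattern returns, assuming Python's re module. The helper name captured and the sample strings are hypothetical. Note that the pattern still matches thisnotthis as a whole; only the content of group 1 changes.

```python
import re

tail = re.compile(r"^this([\s\S]*?)(notthis|())$")

def captured(s):
    """Return group 1 (text after 'this', minus a trailing 'notthis'), or None."""
    m = tail.search(s)
    return m.group(1) if m else None

results = {s: captured(s) for s in
           ["thisnotthis", "this and that", "this and notthis", "other"]}
print(results)
```

This approach filters by inspecting the capture group rather than by making the match fail, so it suits cases where the text before notthis is what is needed.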

Regex for string containing one string, but not another [duplicate]

Have regex in our project that matches any url that contains the string
"/pdf/":
(.+)/pdf/.+
Need to modify it so that it won't match urls that also contain "help"
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
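A minimal Python sketch of the lookahead approach, assuming Python's re module. The helper name is_wanted is hypothetical. One assumption is worth stating loudly: the pattern has no ^ anchor, so this sketch uses re.match, which only tries position 0. With an unanchored search the engine can start later in the string, past the point where help is still ahead.

```python
import re

pdf_not_help = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def is_wanted(url: str) -> bool:
    # re.match only tries position 0, so both lookaheads inspect the whole URL.
    return pdf_not_help.match(url) is not None

good = "/dealer/us/en/pdf/simple.pdf"
bad = "/dealer/help/us/en/pdf/simple.pdf"
print(is_wanted(good), is_wanted(bad))

# Unanchored search for comparison: it skips ahead until "help" is no longer
# in front of the current position and then matches the remainder.
leaky = pdf_not_help.search(bad)
print(leaky.group(0) if leaky else None)
```

Prefixing the pattern with ^ would have the same effect as re.match here.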
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
The first step is to match either whitespace or the start of a line:
(?:^|\s)
Then we match anything that is not a space or an h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/, and then match non-space characters (\S) any number of times (*):
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to detect help after the /pdf/, we can reuse the same sub-pattern from the start:
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match whitespace or the end of the line/string ($):
(?:$|\s)
The full match will include any leading/trailing whitespace, which should be stripped. If you use capture group 1, you don't need to strip the ends.
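A Python sketch of this pattern pulling URLs out of running text, assuming Python's re module. The sample sentence and the /home/ URL are assumptions. findall returns capture group 1, so no stripping is needed.

```python
import re

url_without_help = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)"
)

text = "see /dealer/us/en/pdf/simple.pdf and /dealer/help/us/en/pdf/simple.pdf"
found = url_without_help.findall(text)
print(found)

# An "h" that does not start "help" is still allowed before /pdf/.
home = url_without_help.search("/home/pdf/a.pdf")
print(home.group(1) if home else None)
```

Because the trailing (?:$|\s) consumes the separator, two wanted URLs separated by a single space would need a lookahead there instead; that refinement is left out of this sketch.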

Ignore specific lines when matching with a regex

I'm trying to make a regex that matches a specific pattern, but I want to ignore lines starting with a #. How do I do it?
Let's say I have the pattern (?i)(^|\W)[a-z]($|\W)
It matches all lines containing a letter that stands on its own. It matches these lines, for instance:
asdf e asdf
j
kke o
Now I want to override this so that it does not match lines starting with a #
EDIT:
I was not specific enough. My real pattern is more complicated. It looks a bit like this: (?i)(^|\W)([a-hj-z]|lala|bwaaa|foo)($|\W)
The idea is something like blocking offensive language, except when a line starts with a hash, in which case the hash should override the filter.
This is what you are looking for
^(?!#).+$
^ marks the beginning of a line and $ marks the end of a line (in multiline mode)
.+ would match 1 to many characters
(?!#) is a lookahead which would match further only if the line doesn't start with #
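The same (?!#) guard can be placed in front of the pattern from the question. The following Python sketch is one possible combination, not the answerer's pattern: the .*? bridge and the non-capturing groups are assumptions, and each line is tested on its own.

```python
import re

# Hypothetical combination: the (?!#) guard from the answer in front of the
# single-letter pattern from the question.
guarded = re.compile(r"^(?!#).*?(?:^|\W)[a-z](?:$|\W)", re.IGNORECASE)

lines = ["asdf e asdf", "j", "kke o", "# asdf e asdf", "#j", "asdf"]

# Keep the lines that contain a stand-alone letter and do not start with "#".
flagged = [line for line in lines if guarded.search(line)]
print(flagged)
```

Since ^ can only match at position 0 of each line, a failed guard rejects the whole line rather than letting the search resume further along.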
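This regex will match a line made up only of word characters (\w), which therefore cannot start with a #:
^(?<!#)\w+$
It performs a negative lookbehind at the start of the string and then follows it with 1 or more word characters. Note that nothing precedes the start of the string, so the lookbehind always succeeds there; a line starting with # is rejected because # is not a word character.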
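A quick Python check of how this lookbehind behaves, assuming Python's re module; the sample lines are assumptions. It compares the pattern with the same pattern minus the lookbehind.

```python
import re

with_lookbehind = re.compile(r"^(?<!#)\w+$")
without_lookbehind = re.compile(r"^\w+$")

lines = ["abc", "#abc", "abc def", "j"]

# Record which lines each pattern accepts.
a = [line for line in lines if with_lookbehind.search(line)]
b = [line for line in lines if without_lookbehind.search(line)]
print(a, b)
```

Both patterns accept the same lines here, and a line such as "abc def" is rejected as well because a space is not a word character.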

Positive Lookahead Regex

I have the following regex:
^(?=.{8}$).+
The way I understand this is it will accept 8 of any type of character, followed by 1 or more of any character. I feel I am not grasping how a Positive Lookahead works. Because both sections of the Regex are looking for '.' wouldn't any series of characters fit this?
My question is, how does the positive lookahead affect this regex and what is an example of a matching string?
The following did not match when supplied in the following regex tool:
123456781
(12345678)1
(12345678)
(abcdefgh)a
(abcdefgh)
abc
123
EDIT: Removed first two data entries as I clearly wasn't using the regex tool correctly as they now match with exactly 8 characters.
^(?=.{8}$).+
will match the string
aaaaaaaa
Reasoning:
The content inside of the brackets is a lookahead, since it starts with ?=.
The content inside of a lookahead is parsed - it is not interpreted literally.
Thus, the lookahead only allows the regex to match if .{8}$ would match (at the start of the string, in this case). So the string has to be exactly eight characters then it has to end, as evidenced by $.
Then .+ will match those eight characters.
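The reasoning above can be sketched in Python, assuming Python's re module and testing each string on its own. The samples are taken from the question and this answer.

```python
import re

exactly_eight = re.compile(r"^(?=.{8}$).+")

samples = ["aaaaaaaa", "12345678", "123456781", "(12345678)", "abc", "123"]

# The lookahead only lets the match start if exactly 8 characters remain
# before the end; .+ then consumes those same 8 characters.
matched = [s for s in samples if exactly_eight.search(s)]
print(matched)
```

The lookahead consumes nothing, which is why .+ can still match the same eight characters afterwards.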
It is trying to match:
^ # start of line, but...
(?=.{8}$) # only if it precedes exactly 8 characters and the end of line
.+ # this one matches those 8 characters
and from your input, it should also match these (try this engine with match at line breaks checked):
12345678
abcdefgh
Matching 12345678 works in ruby:
'12345678' =~ /^(?=.{8}$).+/
=> 0
Maybe your test site doesn't support lookahead in regexps?
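For comparison, a Python sketch of the same check, assuming Python's re module. In Ruby, ^ and $ always match at line boundaries, whereas Python needs re.MULTILINE for that, which corresponds to the "match at line breaks" option mentioned above. The multi-line sample text is an assumption built from the question's inputs.

```python
import re

text = "123456781\n12345678\nabcdefgh\nabc"

# With re.MULTILINE, ^ and $ apply to every line, as they do in Ruby.
eight = re.compile(r"^(?=.{8}$).+", re.MULTILINE)
matches = eight.findall(text)

# Single-string check mirroring the Ruby example; start() is the match offset.
first = re.search(r"^(?=.{8}$).+", "12345678")
print(matches, first.start())
```

Without the flag, the pattern would be tested against the whole text as one string and find nothing in this sample.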