Ignore specific lines when matching with a regex - regex

I'm trying to write a regex that matches a specific pattern, but I want it to ignore lines starting with a #. How do I do it?
Let's say I have the pattern (?i)(^|\W)[a-z]($|\W)
It matches all lines containing a single letter standing on its own. It matches these lines, for instance:
asdf e asdf
j
kke o
Now I want to override this so that it does not match lines starting with a #
EDIT:
I was not specific enough. My real pattern is more complicated. It looks a bit like this: (?i)(^|\W)([a-hj-z]|lala|bwaaa|foo)($|\W)
The idea is to block offensive language, except when a line starts with a hash, in which case the line should be let through.

This is what you are looking for
^(?!#).+$
^ marks the beginning of a line and $ marks the end of a line (in multiline mode)
.+ matches one or more characters
(?!#) is a negative lookahead that lets the match continue only if the line doesn't start with #
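For what it's worth, here is a minimal Python sketch of that lookahead, first on its own and then combined with the single-letter pattern from the question. The combined pattern and the two extra sample lines (the # line and hello world) are my own assumptions, not part of the answer above.

```python
import re

# The three lines from the question plus two hypothetical extra lines.
text = "asdf e asdf\nj\n# e is ignored\nhello world\nkke o"

# The answer's pattern; re.MULTILINE makes ^ and $ match at every line.
not_comment = re.compile(r"^(?!#).+$", re.MULTILINE)

# One possible way (an assumption) to put the lookahead in front of the
# question's single-letter pattern.
single_letter = re.compile(r"(?im)^(?!#).*?(?:^|\W)[a-z](?:$|\W)")

non_comment_lines = not_comment.findall(text)
flagged_lines = [line for line in text.splitlines() if single_letter.search(line)]

print(non_comment_lines)
print(flagged_lines)
```

The # line is skipped by both patterns, and hello world is skipped by the second one because it has no stand-alone letter.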

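This regex will match lines made up entirely of word characters (\w) and not starting with a #:
^(?<!#)\w+$
It performs a negative lookbehind at the start of the line and then follows it with 1 or more word characters. Note that nothing on the same line precedes ^, so the lookbehind always succeeds there; a line starting with # is rejected simply because # is not a word character.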
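A quick way to see what this pattern accepts is to run it over a few lines in Python. The sample lines are made up for illustration, and the comparison with plain ^\w+$ is only there to check how much work the lookbehind is doing.

```python
import re

# Hypothetical sample lines.
lines = ["foo", "#foo", "foo bar", "bar_1"]

with_lookbehind = re.compile(r"^(?<!#)\w+$", re.MULTILINE)
without_lookbehind = re.compile(r"^\w+$", re.MULTILINE)

accepted = [line for line in lines if with_lookbehind.search(line)]
accepted_plain = [line for line in lines if without_lookbehind.search(line)]

print(accepted)
print(accepted_plain)
```

On these inputs both patterns accept the same lines, and a line containing a space, such as foo bar, is rejected as well.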

Related

Notepad++ remove linebreak in between two specific strings

I have something like this
\text
This is a sentence.
This is another sentence.
\endtext
I want to remove the line break in between the two sentences in all instances of \text and \endtext. In order to look like this
\text
This is a sentence. This is another sentence.
\endtext
But of course, where it gets complicated is that there are also line breaks after \text and before \endtext, and these I want to keep. So, in plain English, what I am looking to do is something like
(after "\text\n") (remove all instances of \n) (before "\n\endtext")
but since I'm not very good at regex, I'm not sure how that would be written out in that language. Could someone help?
Notepad++ supports PCRE.
You can use this regex to search:
(?:\\text\R|(?<!\A)\G).*\K\R(?!\\endtext)
Replace this with a space.
RegEx Demo
RegEx Breakdown:
(?:: Start non-capture group
\\text: Match \text
\R: Match a line break
|: OR
(?<!\A)\G: Start from the end of the previous match. (?<!\A) makes sure that we are not at the start position
): End non-capture group
.*: Match 0 or more any character except line break
\K: Reset match info
\R: Match a line break
(?!\\endtext): Negative lookahead to make sure that we don't have \endtext right next to the current position
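If you need the same clean-up outside Notepad++, note that Python's standard re module has no \G, \K or \R, so the search pattern above cannot be reused there as-is. Below is a rough sketch with a different technique (match each whole block, then join its body in a callback). It assumes \n line endings and a non-empty body, and join_text_blocks is a made-up helper name.

```python
import re

# Group 1: the \text line, group 2: the body, group 3: the line break
# plus \endtext. re.DOTALL lets . span the line breaks inside the body.
BLOCK = re.compile(r"(\\text\n)(.*?)(\n\\endtext)", re.DOTALL)


def join_text_blocks(source: str) -> str:
    """Replace line breaks inside each \\text ... \\endtext body with spaces."""

    def join_body(match: re.Match) -> str:
        body = match.group(2).replace("\n", " ")
        return match.group(1) + body + match.group(3)

    return BLOCK.sub(join_body, source)


sample = "\\text\nThis is a sentence.\nThis is another sentence.\n\\endtext\n"
joined = join_text_blocks(sample)
print(joined)
```

The line breaks right after \text and right before \endtext sit in groups 1 and 3, so they are left untouched.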

Regular expression to filter lines based on a word combination

I want to create a regular expression to filter lines based on a word combination.
In the following example I want to match any lines that have wheel and ignore any lines that have steering in them. In the example below there are lines with both. I want to skip the line with steeringWheel but select all the rest.
chrysler::plastic::steeringWheel
chrysler::chrome::L_rearWheelCentre
chrysler::chrome::R_rearWheelCentre
If I do the following
.*(Wheel|^steering).*
It would find lines including steeringWheel.
You need to use a negative lookahead anchored at the start:
(?i)^(?!.*steering).*(wheel|tyre).*
     ^^^^^^^^^^^^^^
See the regex demo.
The pattern matches:
(?i) - make the pattern case insensitive
^ - start of string
(?!.*steering) - a negative lookahead that fails the match if there is steering substring after any 0+ chars
.* - any 0+ chars as many as possible up to the last occurrence of
(wheel|tyre) - either wheel or tyre
.* - any 0+ chars up to the end of line.
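The same pattern can be tried in Python, whose re module accepts the inline (?i) flag and lookaheads. A small sketch using the three lines from the question:

```python
import re

lines = [
    "chrysler::plastic::steeringWheel",
    "chrysler::chrome::L_rearWheelCentre",
    "chrysler::chrome::R_rearWheelCentre",
]

# The lookahead is checked once, at the start of the line, and scans
# the whole line for "steering" before anything else is matched.
pattern = re.compile(r"(?i)^(?!.*steering).*(wheel|tyre).*")

selected = [line for line in lines if pattern.match(line)]
print(selected)
```

Only the two rearWheelCentre lines are selected; the steeringWheel line is rejected by the lookahead.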
This regex should work. It uses a negative lookbehind, assuming that the word steering will be immediately followed by the word 'wheel'.
.*(?<!steering)Wheel.*
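To see where the lookbehind's assumption matters, here is a small Python comparison. The steering_Wheel line is a hypothetical input I added, and the lookahead pattern is only there for contrast.

```python
import re

lookbehind = re.compile(r".*(?<!steering)Wheel.*")
lookahead = re.compile(r"^(?!.*steering).*Wheel.*")

lines = [
    "chrysler::plastic::steeringWheel",
    "chrysler::plastic::steering_Wheel",  # hypothetical: not immediately adjacent
    "chrysler::chrome::L_rearWheelCentre",
]

kept_by_lookbehind = [line for line in lines if lookbehind.match(line)]
kept_by_lookahead = [line for line in lines if lookahead.match(line)]

print(kept_by_lookbehind)
print(kept_by_lookahead)
```

The lookbehind only inspects the eight characters directly before Wheel, so the steering_Wheel line gets through it, while the lookahead version rejects any line containing steering.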
I don't think you'll be able to write it all as one regex. My understanding is that regex doesn't truly support not matching words. The negative lookarounds are good, but the word has to be right next to the match position, not just somewhere on the line. What you are trying to do with ^ only works as negation inside character classes, like:
[^abc0-9] #not a character a,b,c,0..9
If possible something like this should work:
import re

thelist = [
    "chrysler::plastic::steeringWheel",
    "chrysler::chrome::L_rearWheelCentre",
    "chrysler::chrome::R_rearWheelCentre",
]
# Compile both checks once; IGNORECASE lets "wheel" match "Wheel".
theregex_wheel = re.compile("wheel", re.IGNORECASE)
theregex_steering = re.compile("steering", re.IGNORECASE)
for thestring in thelist:
    # Keep lines that mention wheel but not steering.
    if theregex_wheel.search(thestring) and not theregex_steering.search(thestring):
        print("yep, want this")
    else:
        print("skip this guy")

Perl: Matching string not containing PATTERN

While using Perl regex to chop a string down into usable pieces I had the need to match everything except a certain pattern. I solved it after I found this hint on Perl Monks:
/^(?:(?!PATTERN).)*$/; # Matches strings not containing PATTERN
Although I solved my initial problem, I have little clue about how it actually works. I checked perlre, but it is a bit too formal to grasp.
The question "Regular expression to match a line that doesn't contain a word?" helps a lot in understanding, but why are the . and the ?: in my example, and how do the outer parentheses work?
Can someone break up the regex and explain in simple words how it works?
Building it up piece by piece (and throughout assuming no newlines in the string or PATTERN):
This matches any string:
/^.*$/
But we don't want . to match a character that starts PATTERN, so replace
.
with
(?!PATTERN).
This uses a negative look-ahead that tests a given pattern without actually consuming any of the string and only succeeds if the pattern does not match at the given point in the string. So it's like saying:
if PATTERN doesn't match at this point,
match the next character
This needs to be done for every character in the string, so * is used to match zero or more times, from the beginning to the end of the string.
To make the * apply to the combination of the negative look-ahead and ., not just the ., it needs to be surrounded by parentheses, and since there's no reason to capture, they should be non-capturing parentheses (?: ):
(?:(?!PATTERN).)*
And putting back the anchors to make sure we test at every position in the string:
/^(?:(?!PATTERN).)*$/
Note that this solution is particularly useful as part of a larger match; e.g. to match any string with foo and later baz but no bar in between:
/foo(?:(?!bar).)*baz/
If there aren't such considerations, you can simply do:
/^(?!.*PATTERN)/
to check that PATTERN does not match anywhere in the string.
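For comparison, both forms behave the same way in Python's re module. A small sketch with made-up sample strings:

```python
import re

# foo, later baz, with no bar anywhere in between.
between = re.compile(r"foo(?:(?!bar).)*baz")

# Succeeds only if PATTERN occurs nowhere in the string (no newlines assumed).
absent = re.compile(r"^(?!.*PATTERN)")

between_ok = bool(between.search("foo and then baz"))
between_blocked = bool(between.search("foo bar baz"))
absent_ok = bool(absent.search("nothing to see"))
absent_blocked = bool(absent.search("has PATTERN inside"))

print(between_ok, between_blocked, absent_ok, absent_blocked)
```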
About newlines: there are two problems with your regex and newlines. First, . doesn't match newlines, so "foo\nbar" =~ /^(?:(?!baz).)*$/ doesn't match, even though the string does not contain baz. You need to add the /s flag to make . match any character; "foo\nbar" =~ /^(?:(?!baz).)*$/s correctly matches. Second, $ doesn't match just at the end of the string, it also can match before a newline at the end of the string. So "foo\n" =~ /^(?:(?!\s).)*$/s does match, even though the string contains whitespace and you are attempting to only match strings with no whitespace; \z always only matches at the end, so "foo\n" =~ /^(?:(?!\s).)*\z/s correctly fails to match the string that does in fact contain a \s. So the correct general purpose regex is:
/^(?:(?!PATTERN).)*\z/s
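The two newline pitfalls can be reproduced in Python as well, with the caveat that the spelling differs: re.DOTALL plays the role of Perl's /s, and \Z is Python's counterpart to Perl's \z. A sketch using the same test strings:

```python
import re

# Without DOTALL the dot stops at the newline, so this fails to match
# even though "baz" is absent.
no_flag = re.search(r"^(?:(?!baz).)*$", "foo\nbar")

# With DOTALL the dot also matches the newline.
with_dotall = re.search(r"^(?:(?!baz).)*$", "foo\nbar", re.DOTALL)

# $ may match just before a trailing newline, so the whitespace slips through.
dollar = re.search(r"^(?:(?!\s).)*$", "foo\n", re.DOTALL)

# \Z matches only at the very end of the string.
strict = re.search(r"^(?:(?!\s).)*\Z", "foo\n", re.DOTALL)

print(no_flag, with_dotall, dollar, strict)
```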
jippie, first, here's a tip. If you see a regex that is not immediately obvious to you, you can dump it in a tool that explains every token.
For instance, here is the RegexBuddy output:
"
^ # Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed)
(?: # Match the regular expression below
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
PATTERN # Match the character string “PATTERN” literally (case insensitive)
)
. # Match any single character that is NOT a line break character (line feed)
)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of a line (at the end of the string or before a line break character) (line feed)
# Perl 5.18 allows a zero-length match at the position where the previous match ends.
# Perl 5.18 attempts the next match at the same position as the previous match if it was zero-length and may find a non-zero-length match at the same position.
"
Some people also use regex101.
A Human Explanation
Now if I had to explain the regex, I would not be so linear. I would start by saying that it is fully anchored by the ^ and the $, implying that the only possible match is the whole string, not a substring of that string.
Then we come to the meat: a non-capturing group introduced by (?: and repeated any number of times by the *
What does this group do? It contains
a negative lookahead (you may want to read up on lookarounds) asserting that at this exact position in the string, we cannot match the word PATTERN,
then a dot to match the next character
This means that at each position in the string, we assert that we cannot match PATTERN, then we match the next character.
If PATTERN can be matched anywhere, the negative lookahead fails, and so does the entire regex.
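The position-by-position reading above can be written out as a plain loop. This is only an illustration for a literal PATTERN and strings without newlines; lacks_literal and lacks_literal_regex are hypothetical helper names, and the loop is not how a regex engine is actually implemented.

```python
import re


def lacks_literal(text: str, literal: str) -> bool:
    """Walk the string the way the explanation above describes."""
    for position in range(len(text)):
        # The negative lookahead: the literal must not start at this position.
        if text.startswith(literal, position):
            return False
        # The dot: consume one character, i.e. move on to the next position.
    return True


def lacks_literal_regex(text: str, literal: str) -> bool:
    """The same check using the regex from the question."""
    pattern = r"^(?:(?!" + re.escape(literal) + r").)*$"
    return re.search(pattern, text) is not None


samples = ["no match here", "has PATTERN inside", "PATTERN", ""]
results = [
    (lacks_literal(s, "PATTERN"), lacks_literal_regex(s, "PATTERN")) for s in samples
]
print(results)
```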

Matching WORD pattern through regex

Assume I have a big paragraph containing words like found, field, failed, fired, killed (so many negative words, I know!)
Now, I want to fetch the lines that have words starting with fi, hi or k and ending with eld or ed.
How would I go about searching for this word pattern in a string?
Keep in mind that I am asking about a word pattern within a string, not a pattern for the whole string.
These two surely didn't work:
egrep "^(f[ai]|k)+(eld|ed)$"
and
egrep "\<(f|k)+(eld|ed)$\>"
I'll admit I am no regex hulk; I'm working from a basic understanding, so anyone willing to suggest a better way (with some description) is most welcome too! :)
The regex you are probably looking for would be
"\b([fh]i|k)\w*(eld|ed)\b"
The \w* is equivalent to [a-zA-Z0-9_]* (for ASCII input), so it allows any word characters between the requested prefix and suffix.
The \b ensures that the word really starts and ends with the letters you want. Otherwise you might, for example, match part of a longer word such as unfired.
You also need to remove $ and ^ from the regex, because $ means end of line and ^ means beginning of line.
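A small Python check of the word-boundary pattern against the words from the question. The groups are written as non-capturing (?:...) here so that findall returns whole words; that change, and the extra word unfired, are mine.

```python
import re

words = "found field failed fired killed unfired"

# \b on both sides keeps the match to whole words.
bounded = re.compile(r"\b(?:[fh]i|k)\w*(?:eld|ed)\b")

# The same pattern without \b, for contrast.
unbounded = re.compile(r"(?:[fh]i|k)\w*(?:eld|ed)")

whole_words = bounded.findall(words)
partial_hits = unbounded.findall(words)

print(whole_words)
print(partial_hits)
```

Without \b the last hit is fired taken from inside unfired, which is exactly what the boundaries prevent.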
I'd use
\<(fi|hi|k)[a-zA-Z]*?(eld|ed)\>
to match the words you want.
demo # regex101
(when you take a look at the demo: \b is the same as \<)
Explanation:
\< #beginning of word
(fi|hi|k) #either fi or hi or k
[a-zA-Z]*? #zero to unlimited of a-z and A-Z, as few as possible (lazy)
(eld|ed) #either eld or ed
\> #end of word
If you want to allow numbers, dashes, underscores, ... within your words, simply add them to the character-class, for example: [a-zA-Z$_] if you want to allow $ and _, too.
You can use word boundary \b.
^.*\b(fi|hi|k)\w*(eld|ed)\b.*$
   ------------------------
This pattern selects whole lines that contain those words.
NOTE: You need to use the multiline modifier m and the global modifier g.
Try it here
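In Python the m modifier corresponds to re.MULTILINE, and findall takes the place of the g modifier. A sketch on a made-up four-line text, again with non-capturing groups so that findall returns the full lines:

```python
import re

# Hypothetical sample text.
text = "the field was empty\nnothing to report\nhe was killed\nan unfired shot"

line_pattern = re.compile(r"^.*\b(?:fi|hi|k)\w*(?:eld|ed)\b.*$", re.MULTILINE)

matching_lines = line_pattern.findall(text)
print(matching_lines)
```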

Positive Lookahead Regex

I have the following regex:
^(?=.{8}$).+
The way I understand this is it will accept 8 of any type of character, followed by 1 or more of any character. I feel I am not grasping how a Positive Lookahead works. Because both sections of the Regex are looking for '.' wouldn't any series of characters fit this?
My question is, how does the positive lookahead affect this regex and what is an example of a matching string?
The following did not match when supplied to a regex tool:
123456781
(12345678)1
(12345678)
(abcdefgh)a
(abcdefgh)
abc
123
EDIT: Removed first two data entries as I clearly wasn't using the regex tool correctly as they now match with exactly 8 characters.
^(?=.{8}$).+
will match the string
aaaaaaaa
Reasoning:
The content inside of the brackets is a lookahead, since it starts with ?=.
The content inside of a lookahead is parsed - it is not interpreted literally.
Thus, the lookahead only allows the regex to match if .{8}$ would match (at the start of the string, in this case). So the string has to be exactly eight characters then it has to end, as evidenced by $.
Then .+ will match those eight characters.
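The same reasoning can be checked in Python, which supports lookaheads in its re module. A minimal sketch using strings from the question:

```python
import re

exactly_eight = re.compile(r"^(?=.{8}$).+")

candidates = ["12345678", "abcdefgh", "123456781", "(12345678)", "abc"]

# The lookahead consumes nothing, so .+ starts matching from position 0.
matching = [s for s in candidates if exactly_eight.match(s)]
print(matching)
```

Only the strings that are exactly eight characters long match; (12345678) is ten characters because of the parentheses.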
It is trying to match:
^ # start of line, but...
(?=.{8}$) # only if it precedes exactly 8 characters and the end of line
.+ # this one matches those 8 characters
and from your input, it should also match these (try this engine with match at line breaks checked):
12345678
abcdefgh
Matching 12345678 works in ruby:
'12345678' =~ /^(?=.{8}$).+/
=> 0
Maybe your test site doesn't support lookahead in regexps?