What regular expression matches the string word only when it is not followed by the # symbol?
For example:
mywordLLD OK
myword.dff OK
myword#ld Exclude
The (?!#) negative look-ahead will make word match only if # does not appear immediately after word:
word(?!#)
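As a quick check against the sample strings from the question, here is a minimal sketch in Python. The choice of Python's re module is an assumption (the question does not name a regex flavor), and the names pattern, samples and results are only illustrative.

```python
import re

# Match "word" only when the very next character is not "#".
pattern = re.compile(r"word(?!#)")

samples = ["mywordLLD", "myword.dff", "myword#ld"]

# True means the string contains an acceptable "word"; False means it is excluded.
results = {s: bool(pattern.search(s)) for s in samples}

for s, ok in results.items():
    print(f"{s}: {'OK' if ok else 'Exclude'}")
```

The lookahead consumes nothing, so the match itself is still just word.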
If you need to fail a match when word is followed by a given character or string anywhere to its right, you can use any of the three patterns below:
word(?!.*#) # Note this only checks for # on the same line as word
(?s)word(?!.*#) # (except Ruby, where you need (?m)): This will check for # anywhere...
word(?![\s\S]*#) # ... after word even if it is on the next line(s)
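The difference between the three variants only shows up when the # sits on a later line. A small sketch, assuming Python's re module (where the inline (?s) flag makes the dot match newlines) and a made-up two-line input:

```python
import re

# Hypothetical input: "#" appears only on the line after "word".
text = "word here\nand a # on the next line"

# "." stops at the newline, so the "#" on line 2 is never seen.
same_line = re.search(r"word(?!.*#)", text)

# With (?s) the dot also matches newlines, so the "#" is found and the match fails.
dotall = re.search(r"(?s)word(?!.*#)", text)

# [\s\S] matches any character, newlines included, without needing a flag.
any_char = re.search(r"word(?![\s\S]*#)", text)

print(bool(same_line), bool(dotall), bool(any_char))  # True False False
```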
The word(?!#) regex matches the substring word, and (?!#) makes sure there is no # right after it; if there is one, word is not returned as a match (i.e. the match fails).
From Regular-expressions.info:
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u). The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point.
And from its Character classes page:
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
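The quoted distinction can be checked directly. This is a minimal Python sketch (the helper name matched is hypothetical) that runs both patterns on the two strings from the quotation:

```python
import re

texts = ["Iraq", "Iraq is a country"]

def matched(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The negated class must consume a character after the q.
negated_class = {t: matched(r"q[^u]", t) for t in texts}

# The lookahead only inspects what follows and consumes nothing.
lookahead = {t: matched(r"q(?!u)", t) for t in texts}

print(negated_class)  # {'Iraq': None, 'Iraq is a country': 'q '}
print(lookahead)      # {'Iraq': 'q', 'Iraq is a country': 'q'}
```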
I am trying to find method definitions, excluding constructors.
To simplify, I'm looking for abc::def and foo::bar, but not foo::foo.
I already know how to write an expression like so:
\w[\w\d_]+::\w[\w\d_]+
But how can I make sure the left part of the :: does not match the right part?
By the way, I cannot check if there is a type definition left of the qualified method name. I have a very old project where it was fine to not specify a type if it was int.
Note that \w already matches \d and _, so \w[\w\d_]+ is equivalent to \w{2,}.
You can capture the first "word" (before ::) and check with a negative lookahead that the "word" after :: is not equal to it:
\b(\w+)::(?!\b\1\b)\w+\b
Explanation:
\b - leading word boundary
(\w+) - Group 1: one or more alphanumeric and underscore characters
:: - 2 consecutive colons
(?!\b\1\b) - the next "word" cannot be the same as the value in Group 1
\w+\b - one or more alphanumeric and underscore characters followed with a trailing word boundary.
If you are not looking to match 1-character "words", you can use
\b(\w{2,})::(?!\b\1\b)\w{2,}\b
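To see the backreference inside the lookahead at work, here is a short sketch in Python. The re module and the one-line sample input are assumptions; the three names are the ones from the question.

```python
import re

# Group 1 captures the class name; the lookahead rejects an identical method name.
method_re = re.compile(r"\b(\w+)::(?!\b\1\b)\w+\b")

source = "abc::def foo::bar foo::foo"

# Collect every qualified name that is not a constructor.
found = [m.group(0) for m in method_re.finditer(source)]

print(found)  # ['abc::def', 'foo::bar']
```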
You can capture the first part and check whether it is repeated using a backreference, like this.
Regex: \b(\w[\w\d_]+)::(?!\1)\w[\w\d_]+
Explanation:
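\b(\w[\w\d_]+) matches the first part.
(?!\1) is a negative lookahead for the first part. If it is repeated, the whole match is discarded. Because there is no word boundary after \1, this also rejects names that merely begin with the first part, such as foo::foobar.
\w[\w\d_]+ matches the second part if it is not a repeat.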
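The two answers differ in how they treat a method whose name only starts with the class name. A small Python sketch makes this visible; foo::foobar is a made-up test case and the variable names are illustrative.

```python
import re

# First answer: backreference wrapped in word boundaries.
with_boundaries = re.compile(r"\b(\w+)::(?!\b\1\b)\w+\b")

# Second answer: bare backreference in the lookahead.
without_boundaries = re.compile(r"\b(\w[\w\d_]+)::(?!\1)\w[\w\d_]+")

names = ["foo::bar", "foo::foo", "foo::foobar"]

# For each name: (matched by first pattern, matched by second pattern)
table = {
    n: (bool(with_boundaries.search(n)), bool(without_boundaries.search(n)))
    for n in names
}

for name, row in table.items():
    print(name, row)
```

Both patterns reject foo::foo, but only the version with \b around \1 still accepts foo::foobar.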
Is there a way to write a regular expression that matches anything but certain characters? Say, for example, that the only character that isn't allowed is the * character. Rather than listing every possible allowed character in the regular expression, is there anything that will say "everything not equal to * is allowed"?
You can use a negated character class, written [^...]. So, for your case you can use:
^[^*]+$
You can read more about the theory behind negated character classes. The quotation below explains it.
Negated Character Classes
Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
[^*] Any single character except: *
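A minimal Python sketch of the anchored pattern, assuming the goal is to validate a whole string; the function name is_allowed is hypothetical. It also shows the point from the quotation that a negated class matches line breaks.

```python
import re

# One or more characters, none of which may be "*", from start to end.
no_star = re.compile(r"^[^*]+$")

def is_allowed(s: str) -> bool:
    """Return True when s is non-empty and contains no asterisk."""
    return no_star.search(s) is not None

print(is_allowed("hello world"))   # True
print(is_allowed("a*b"))           # False
print(is_allowed(""))              # False, because + needs at least one character
print(is_allowed("line1\nline2"))  # True, [^*] also matches the newline
```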
Whenever I have to work with regular expressions, I usually go to rubular.com and test my attempts. It also has some examples, which are pretty useful.
This is explained in the manual.
The solution is:
"[^*]*"
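This answer does not say how the pattern is applied, and that matters because [^*]* can match zero characters. A hedged Python sketch, assuming whole-string matching via re.fullmatch is what is wanted:

```python
import re

star_free = re.compile(r"[^*]*")

# Whole-string matching: only strings with no "*" pass, including the empty string.
empty_ok = star_free.fullmatch("") is not None
plain_ok = star_free.fullmatch("abc") is not None
starred_ok = star_free.fullmatch("a*b") is not None

# Unanchored search always succeeds; here it just stops before the "*".
partial = star_free.search("a*b").group(0)

print(empty_ok, plain_ok, starred_ok, repr(partial))  # True True False 'a'
```

If the empty string should be rejected, the + quantifier is the one to use instead of *.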
I need a regex that matches an expression ending with a word boundary, but which does not consider the hyphen as a boundary.
i.e. get all expressions matched by
type ([a-z])\b
but do not match e.g.
type a-1
To rephrase: I want an equivalent of the word boundary operator \b which, instead of using the word character class [A-Za-z0-9_], uses the extended class [A-Za-z0-9_-].
You can use a lookahead for this, the shortest would be to use a negative lookahead:
type ([a-z])(?![\w-])
(?![\w-]) would mean "fail the match if the next character is in \w or is a -".
Here is an option that uses a normal lookahead:
type ([a-z])(?=[^\w-]|$)
You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".
See it working: http://www.rubular.com/r/NHYhv72znm
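For comparison, here is a small sketch that runs the original \b pattern and both lookahead variants side by side. It assumes Python's re module (the linked demo uses Ruby), and the sample strings other than type a-1 are made up.

```python
import re

plain_boundary = re.compile(r"type ([a-z])\b")
negative = re.compile(r"type ([a-z])(?![\w-])")
positive = re.compile(r"type ([a-z])(?=[^\w-]|$)")

samples = ["type a", "type a-1", "type a, b", "type ab"]

def hits(pattern):
    """Return the samples in which the pattern finds a match."""
    return [s for s in samples if pattern.search(s)]

print(hits(plain_boundary))  # ['type a', 'type a-1', 'type a, b']
print(hits(negative))        # ['type a', 'type a, b']
print(hits(positive))        # ['type a', 'type a, b']
```

Only the plain \b version accepts type a-1, since it treats the hyphen as a boundary.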
I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:
\b(?<!\*)([^\s\*]+)\b(?!\*)
Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:
*word
wo*rd
word*
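A short Python sketch of this approach, with the asterisk in the final lookahead escaped as \* (an unescaped * there is a quantifier with nothing to repeat). It assumes that "worked" means none of the three strings above should yield a word; the extra input "plain word" is a made-up control case.

```python
import re

# Word boundaries that refuse to treat "*" as a boundary on either side.
star_aware = re.compile(r"\b(?<!\*)([^\s\*]+)\b(?!\*)")

# findall returns the contents of group 1 for every match.
results = {s: star_aware.findall(s) for s in ["*word", "wo*rd", "word*", "plain word"]}

for text, words in results.items():
    print(repr(text), words)
```

In this sketch the three starred inputs produce no matches, while the control string yields both of its words.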