Is there a way to have a regular expression match anything but certain characters? Say, for example, the only character that isn't allowed is the * character. Rather than listing out all possible characters allowed in the regular expression, is there anything that will say "everything not equal to * is allowed"?
You can use a negated character class, written [^...]. So, for your case you can use:
^[^*]+$
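A quick way to check the pattern is a short sketch with Python's standard re module (the language is my choice; the pattern itself is engine-agnostic, and the sample strings are made up for illustration):

```python
import re

# ^ and $ anchor the class to the whole string, so every character must be non-*.
NO_STAR = re.compile(r"^[^*]+$")

def allowed(text: str) -> bool:
    """Return True if text is non-empty and contains no '*'."""
    return NO_STAR.match(text) is not None

for sample in ["hello world", "a*b", "", "line1\nline2"]:
    print(repr(sample), allowed(sample))
```

Note that the multi-line sample is accepted, because a negated class also matches line breaks. Swap + for * if the empty string should be accepted too.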
You can read more about the theory of negated character classes. Below is a quotation explaining them.
Negated Character Classes
Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
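The Iraq example from the quotation can be reproduced with a small Python sketch (the helper name first_match is hypothetical, introduced only for this illustration):

```python
import re

def first_match(pattern: str, text: str):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else None

for text in ["Iraq", "Iraq is a country", "quit"]:
    # q[^u] must consume a second character; q(?!u) only peeks at it.
    print(repr(text), first_match(r"q[^u]", text), first_match(r"q(?!u)", text))
```

With "Iraq" only the lookahead finds the q; with "Iraq is a country" the negated class drags the trailing space into the match.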
[^*] matches any single character except *.
Whenever I have to work with regular expressions, I usually go to rubular.com to test my attempts. It also has some examples, which are pretty useful.
This is explained in the manual.
The solution is:
"[^*]*"
Related
What is the regular expression to search for the string word when it is not followed by the # symbol?
For example:
mywordLLD OK
myword.dff OK
myword#ld Exclude
The (?!#) negative look-ahead will make word match only if # does not appear immediately after word:
word(?!#)
If you need to fail a match when word is followed by a character or string somewhere to the right, you may use any of the three patterns below:
word(?!.*#) # Note this will require # to be on the same line as word
(?s)word(?!.*#) # (except Ruby, where you need (?m)): This will check for # anywhere...
word(?![\s\S]*#) # ... after word even if it is on the next line(s)
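A small Python sketch of these patterns, using the three sample strings from the question plus a made-up two-line string to show how the "anywhere to the right" variants differ (the helper matches is hypothetical):

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# Immediate check: '#' must not come right after 'word'.
for s in ["mywordLLD", "myword.dff", "myword#ld"]:
    print(s, matches(r"word(?!#)", s))

# Anywhere-to-the-right checks; here the '#' sits on the next line.
text = "word ok\nbut # later"
print(matches(r"word(?!.*#)", text))       # . stops at the newline, so # is not seen
print(matches(r"(?s)word(?!.*#)", text))   # (?s) lets . cross the newline
print(matches(r"word(?![\s\S]*#)", text))  # [\s\S] matches every character
```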
This regex matches the word substring, and (?!#) makes sure there is no # right after it; if there is one, word is not returned as a match (i.e. the match fails).
From Regular-expressions.info:
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u). The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point.
And on the Character Classes page:
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
I am trying to match any character or a newline, an arbitrary number of times.
I tried [\n.]* but that did not seem to work. Can anybody explain why?
As was stated previously, the dot is a literal dot inside square brackets. Try this instead, which alternates the dot with a newline outside a class:
(?:.|\n)*
https://regex101.com/r/DL6yuF/1
What you're trying to do is match any character, and you are being thrown
off by the dot metacharacter, which means match any
character except newlines.
The idea of "any character" can be built from a character class
and its negation.
For instance, with a hypothetical escape \a standing for some class A,
and its uppercase form standing for the negation:
And [\a] = [A]
Not [\A] = [^A]
Replacing a/A with s/S (whitespace and non-whitespace),
any character is either in [\s] or in [\S].
Combining them into one class you get this
[\S\s]
which matches any character, newlines included, and is not tied
to how the dot is defined as you go to and from a Unicode
environment.
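A minimal Python comparison of the dot and [\S\s] on a made-up two-line string; re.DOTALL is Python's own flag for giving the dot the same reach:

```python
import re

text = "first line\nsecond line"

# The dot stops at the newline (default mode).
dot_run = re.match(r".*", text).group()
# [\S\s] contains every character, so it runs through the newline.
class_run = re.match(r"[\S\s]*", text).group()
# Python's re.DOTALL flag makes the dot match newlines as well.
dotall_run = re.match(r".*", text, re.DOTALL).group()

print(repr(dot_run), repr(class_run), repr(dotall_run))
```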
The dot is a literal dot inside a character class (square brackets), i.e. it is not considered a metacharacter.
Most of the usual metacharacters are ordinary characters inside a character class and do not need to be escaped with a backslash; the exceptions are ], \, ^ and -, depending on their position.
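To see why [\n.]* fails, here is a small Python sketch: the class contains only a newline and a literal dot. The alternation (?:.|\n)* is shown as one possible outside-the-class alternative (my addition, not from the answer above):

```python
import re

# Inside [...] the dot is literal, so this class only contains '\n' and '.'.
literal_class = re.compile(r"[\n.]*")
# Outside a class, alternating the dot with \n covers every character.
any_or_newline = re.compile(r"(?:.|\n)*")

print(repr(literal_class.match("..\n.abc").group()))  # stops at 'a'
print(repr(literal_class.match("abc").group()))       # empty: 'a' is not in the class
print(repr(any_or_newline.match("ab\ncd").group()))
```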
Yet another question about a regex.
I'm trying to match all special characters, except '*'.
So if I match my regex against:
John%%%* dadidou
I should get:
John* dadidou
Here: How to match with regex all special chars except "-" in PHP?
The accepted answer advises using (if I want to exclude '-'):
[^\w-]
But doesn't that mean "NOT a special character, NOT -", which is a bit redundant?
What you really want is this regex for matching:
[^\w\s*]+
Replace matches with an empty string.
Which means match 1 or more of any character that is:
Not a word character [AND]
Not a whitespace [AND]
Not a literal *
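The replacement can be sketched in Python with re.sub on the question's sample input (the constant name is hypothetical):

```python
import re

# One or more characters that are not \w, not whitespace, and not a literal '*'.
SPECIAL_EXCEPT_STAR = re.compile(r"[^\w\s*]+")

cleaned = SPECIAL_EXCEPT_STAR.sub("", "John%%%* dadidou")
print(cleaned)
```

In Python 3, \w in a str pattern is Unicode-aware, so accented letters are kept as word characters; other engines may define \w more narrowly.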
When you define a negative character class, you are really inverting it.
What does that mean ?
A positive character class implicitly ORs its contents.
When you negate a class, you implicitly AND its contents.
So, [\w-] means word OR dash,
the inverse, [^\w-] means not word AND not dash.
A negated word class, for instance [^\w], would match a dash -.
So, to not match it, you have to add a not dash as well.
A C analogy would be
existing (varA || varB)
inverted (!varA && !varB)
where inverting changes the Boolean of each of the components.
Basically, a negated class flips the Boolean of each of its components,
so the implicit OR becomes an implicit AND and the component characters
(or expressions) are negated.
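The OR/AND inversion can be checked mechanically. This Python sketch (helper name in_class is hypothetical) tests every printable ASCII character against De Morgan's law for [^\w-]:

```python
import re
import string

def in_class(pattern: str, ch: str) -> bool:
    """True if the single character ch is matched by pattern."""
    return re.fullmatch(pattern, ch) is not None

print(in_class(r"[^\w]", "-"))    # a plain negated word class lets '-' through
print(in_class(r"[^\w-]", "-"))   # adding '-' to the negated class excludes it

# De Morgan check over printable ASCII:
# NOT (word OR dash) == (NOT word) AND (NOT dash)
for ch in string.printable:
    positive = in_class(r"[\w-]", ch)
    negated = in_class(r"[^\w-]", ch)
    both_nots = not in_class(r"\w", ch) and ch != "-"
    assert negated == (not positive) == both_nots, repr(ch)
```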
What will really bake your noodle later on is when you see something like
[^\S\r\n]
This translates to NOT-NOT-Whitespace and NOT-cr and NOT-lf
which reduces to matching all whitespace except CR and LF
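A short Python sketch of [^\S\r\n] on a made-up string with a space, a tab, and a CRLF line ending:

```python
import re

# [^\S\r\n]: not non-whitespace, not CR, not LF -> whitespace other than CR/LF.
HORIZONTAL_WS = re.compile(r"[^\S\r\n]")

text = "a b\tc\r\nd"
print(HORIZONTAL_WS.findall(text))

# Spot check against the plain-language reading for the ASCII whitespace chars.
for ch in " \t\r\n\f\v":
    assert (HORIZONTAL_WS.fullmatch(ch) is not None) == (ch not in "\r\n")
```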
I want to match a string which starts with a number, followed by any characters, and ends with .html.
I have tried the following:
/([0-9]*[^\.html]*.html)/g
But RegExr, for an example like "21212dfsd.htmlfdf.html", reports 2 matches. Why is that?
You get two matches because of the * quantifier following the [0-9] character class. * means match the preceding token "zero or more" times, so a match does not have to start with a digit at all. Use + instead, meaning "one or more".
You also can't place whole words inside a character class. A character class matches any one character from a set of characters, and the dot . needs to be escaped (it is a character with special meaning).
You can use the below regular expression:
/\d+.*?\.html/g
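The two results can be reproduced in Python with re.findall (Python has no /g flag; findall returns all matches). The fullmatch line is an extra, assuming the whole string must start with digits and end with .html:

```python
import re

sample = "21212dfsd.htmlfdf.html"

# The asker's pattern: [0-9]* may match zero digits, so "fdf.html" also qualifies.
original = re.findall(r"([0-9]*[^\.html]*.html)", sample)
# The suggested pattern: at least one digit, then lazily up to the first ".html".
suggested = re.findall(r"\d+.*?\.html", sample)
# Assumption about intent: the whole string must match, so anchor it and let .* be greedy.
whole = re.fullmatch(r"\d+.*\.html", sample)

print(original)
print(suggested)
print(whole.group() if whole else None)
```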
I need a regex that matches an expression ending with a word boundary, but which does not consider the hyphen a boundary.
I.e., get all expressions matched by
type ([a-z])\b
but do not match e.g.
type a-1
To rephrase: I want an equivalent of the word boundary operator \b which, instead of using the word character class [A-Za-z0-9_], uses the extended class [A-Za-z0-9_-].
You can use a lookahead for this, the shortest would be to use a negative lookahead:
type ([a-z])(?![\w-])
(?![\w-]) would mean "fail the match if the next character is in \w or is a -".
Here is an option that uses a normal lookahead:
type ([a-z])(?=[^\w-]|$)
You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".
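A Python sketch comparing the original \b pattern with both lookahead versions on a few made-up inputs (the helper captured is hypothetical):

```python
import re

PLAIN_B = re.compile(r"type ([a-z])\b")              # original: '-' counts as a boundary
NEG_LOOKAHEAD = re.compile(r"type ([a-z])(?![\w-])")
POS_LOOKAHEAD = re.compile(r"type ([a-z])(?=[^\w-]|$)")

def captured(pattern, text):
    """Return the captured letter, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

for text in ["type a", "type a b", "type a-1", "type ab"]:
    print(repr(text), captured(PLAIN_B, text),
          captured(NEG_LOOKAHEAD, text), captured(POS_LOOKAHEAD, text))
```

Only the plain \b version accepts "type a-1"; both lookahead versions reject it while still accepting "type a" at the end of the string.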
See it working: http://www.rubular.com/r/NHYhv72znm
I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:
\b(?<!\*)([^\s\*]+)\b(?!\*)
Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:
*word
wo*rd
word*
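A Python sketch of this approach on the three cases above plus an ordinary two-word string. The trailing lookahead is written as (?!\*), since a bare * inside (?!...) has nothing to repeat and most engines reject it:

```python
import re

# Lookbehind rejects a leading '*', the class excludes spaces and '*',
# and the trailing lookahead rejects a '*' right after the boundary.
NOT_STAR_BOUNDED = re.compile(r"\b(?<!\*)([^\s\*]+)\b(?!\*)")

for text in ["*word", "wo*rd", "word*", "plain word"]:
    print(repr(text), NOT_STAR_BOUNDED.findall(text))
```

None of the three asterisk cases produces a match, while words not touching an asterisk are still found.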