Regular expression [\n.]* does not seem to work to match anything - regex

I am trying to match any character or a newline, arbitrarily often.
I tried [\n.]* but that did not seem to work. Can anybody explain why?

As was stated previously, the dot is an actual dot in the square brackets. Try this instead
\n*|.*
https://regex101.com/r/DL6yuF/1

What you're trying to do is match any character, and you are being
thrown off by the dot metacharacter, which means "match any
character except a newline".
A set of characters and its complement together cover every
character, and a character class can express that.
For instance, with a made-up escape \a standing for the letter A:
And [\a] = [A]
Not [\A] = [^A]
Replacing Aa with Ss gives the real escapes: \s is any whitespace
character and \S is any non-whitespace character, so
any character matches either [\s] or [\S].
Combining them into one class you'd get this
[\S\s]
which matches any character, newlines included, and does not depend
on what the dot means as you go to and from a Unicode
environment.

The dot is a real dot inside a character class (square brackets), i.e. is not considered a metacharacter.
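Most of the usual metacharacters are normal characters inside a character class and do not need to be escaped by a backslash; the exceptions are ], \, ^ (when it comes first) and - (when it sits between two characters).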
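A minimal sketch of the point made in these answers, assuming Python's re module (the question does not name a regex flavor, and the sample strings are made up for illustration). It shows that [\n.]* only accepts literal dots and newlines, while [\s\S]* or a dot with the DOTALL flag accepts anything:

```python
import re

text = "first line\nsecond line"

# Inside brackets the dot is a literal dot, so this class
# only allows newline characters and "." characters.
dot_in_class = re.fullmatch(r"[\n.]*", text)

# The same class does accept a string made only of dots and newlines.
only_dots = re.fullmatch(r"[\n.]*", "..\n.")

# \s and \S together cover every character, newlines included.
any_char = re.fullmatch(r"[\s\S]*", text)

# Alternative: keep the bare dot and let it match newlines via a flag.
dot_all = re.fullmatch(r".*", text, flags=re.DOTALL)

print(dot_in_class is None)   # True: letters are not in the class
print(only_dots is not None)  # True
print(any_char is not None)   # True
print(dot_all is not None)    # True
```

Which of the two working forms to prefer is mostly a matter of taste; the flag is specific to flavors that offer it, while [\s\S] is portable across most engines.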

Related

Purpose of this dash character in Regex capture

I am trying to understand the purpose of - in this regex capture clause
(?P<slug>[\w-]+)
This is what I came up with when searching for a dash:
A dash (-) can be used to specify a range. So the dash is a
metacharacter, but only within a character class. If you want to use a
literal dash within a character class, you should escape it with a
backslash, except when the dash is the first or last character of the
character class. So, the regexp [a\-z] is equal to [az-] and [-az],
they will match any of those three characters.
My question is: what is the - after \w?
You are looking at what my former CS professor would refer to as a rabbit (out of a hat):
(?P<slug>[\w-]+)
The reason it is a rabbit is because normally your research is correct and dash is used as a part of a range of characters. But in this case, the dash is a literal dash, since it appears at the end of the character class.
So here [\w-]+ means to match one or more word characters or literal dashes.
If you want to include a literal dash in a character class, a safer way is to escape it:
[\w\-]+
Then, the dash may be placed anywhere in the class.
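To make the difference concrete, here is a small sketch, assuming Python's re module (which accepts the same (?P<slug>...) named-group syntax). The input strings are invented for illustration:

```python
import re

# The dash is last in the class, so it is a literal dash.
slug_pattern = re.compile(r"(?P<slug>[\w-]+)")
slug = slug_pattern.match("my-first_post/comments").group("slug")

# Dash between two characters: a range from "a" to "z".
as_range = re.findall(r"[a-z]", "a-z m")

# Escaped dash, or dash at the end: three literal characters a, -, z.
escaped = re.findall(r"[a\-z]", "a-z m")
trailing = re.findall(r"[az-]", "a-z m")

print(slug)      # my-first_post
print(as_range)  # ['a', 'z', 'm']
print(escaped)   # ['a', '-', 'z']
print(trailing)  # ['a', '-', 'z']
```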

Regular expression to match anything but certain characters

Is there a way to have a regular expression match anything but certain characters? Say, for example, the only character that isn't allowed is the * character. Rather than listing all possible allowed characters in the regular expression, is there anything that will say "everything not equal to * is allowed"?
You can use a negated character class, written [^...]. So, for your case you can use:
^[^*]+$
You can read more about the theory of negated character classes. The quotation below explains them.
Negated Character Classes
Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.
It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
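The quoted behaviour of q[^u] versus q(?!u) can be checked with a short sketch, assuming Python's re module; the two test strings are the ones from the quotation:

```python
import re

# A negated class must consume a character, so "Iraq" (q at the very end) fails.
no_match = re.search(r"q[^u]", "Iraq")

# Here the space after the q is consumed and becomes part of the match.
with_space = re.search(r"q[^u]", "Iraq is a country").group()

# Negative lookahead checks what follows without consuming it.
end_of_string = re.search(r"q(?!u)", "Iraq").group()
before_space = re.search(r"q(?!u)", "Iraq is a country").group()

print(no_match)             # None
print(repr(with_space))     # 'q '
print(repr(end_of_string))  # 'q'
print(repr(before_space))   # 'q'
```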
[^*] Any single character except: *
Whenever I have to work with regular expressions I usually go to rubular.com and test my attempts. It also has some examples, which are pretty useful.
This is explained in the manual.
The solution is:
"[^*]*"
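As a rough illustration of using the negated class for validation, here is a sketch assuming Python's re module. The helper name has_no_asterisk is hypothetical, and re.fullmatch stands in for the ^ and $ anchors of the pattern above:

```python
import re

def has_no_asterisk(value: str) -> bool:
    """Return True if value is non-empty and contains no '*' character."""
    # fullmatch requires the whole string to match, like ^[^*]+$.
    return re.fullmatch(r"[^*]+", value) is not None

print(has_no_asterisk("hello world"))   # True
print(has_no_asterisk("a*b"))           # False
print(has_no_asterisk(""))              # False: + needs at least one character
print(has_no_asterisk("line1\nline2"))  # True: negated classes match newlines
```

Swapping + for * in the pattern, as in the last answer, would also accept the empty string.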

Regular expression for alphabet, underscore, hyphen, apostrophe only

I want a regular expression that accepts only letters, hyphens, apostrophes and underscores.
I tried
/^[ A-Za-z-_']*$/
but it's not working. Please help.
Your regex is wrong. Try this:
/^[0-9A-Za-z_#'-]+$/
OR
/^[\w#'-]+$/
A hyphen needs to be in the first or last position inside a character class to avoid escaping. Also, if an empty string isn't allowed, use + (1 or more) instead of * (0 or more).
Explanation:
^ assert position at start of the string
[\w#'-]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible
\w match any word character [a-zA-Z0-9_]
#'- a single character in the list #'- literally
$ assert position at end of the string
Move the hyphen to the end or the beginning of the character class, or escape it:
^[ A-Za-z_'-]*$
or
^[- A-Za-z_']*$
or
^[ A-Za-z\-_']*$
If you want all letters:
^[ \pL_'-]*$
or
When using a hyphen in a character class, be sure to place it at the end of the character class as a best practice.
The reason is that the hyphen signifies a range of characters in the character class, and when it is at the end of the class it cannot create a range.
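The kind of accidental range this advice guards against can be sketched as follows, assuming Python's re module. The misplaced-hyphen pattern is a made-up example, not one taken from the question:

```python
import re

# Hyphen between ' and _ : this is a range from ' (0x27) to _ (0x5F),
# which silently includes digits, "+", and other punctuation.
accidental_range = re.compile(r"[A-Za-z'-_]+")

# Hyphen at the end of the class: a literal hyphen, no range created.
hyphen_last = re.compile(r"[A-Za-z_'-]+")

print(accidental_range.fullmatch("abc123") is not None)       # True (unintended)
print(accidental_range.fullmatch("a+b") is not None)          # True (unintended)
print(hyphen_last.fullmatch("abc123") is not None)            # False
print(hyphen_last.fullmatch("O'Neil-Smith_jr") is not None)   # True
```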
My best bet would be:
/[A-Za-z-\'_#0-9]+/g
You can use the following (in Java):
String acceptHyphenApostropheUnderscoreRegEx = "^(\\p{Alpha}*+((['_-]+)\\p{Alpha})?)*+$";
If you want to have spaces and # also (as some have given above) try:
String acceptHyphenApostropheUnderscoreRegEx = "^(\\p{Alpha}*+((\\s|['#_-]+)\\p{Alpha})?)*+$";

regex why aren't these two the same?

[\w+\.]{3}
and
\w+\.\w+\.\w+\.
The former matches "dra";
the latter matches "dragon.is.awesome".
What am I not understanding right about them?
Input text looks like
i know dragon.is.awesome but
i know dragon.is.awesome.because, he is awesome
i know dragon.sucks.because, he is not awesome
i know dragon.is.dead, someone killed him
so i need to match any combination of groupings that are of the pattern \w+.
Because the first one is a character class.
[\w+\.]
matches either one \w, or one +, or one literal dot. If you want to shorten the latter, use normal parentheses:
(\w+\.){3}
Note that within character classes, most meta-characters lose their meaning. So + and . and * (for example) can all be contained and matched without being escaped.
[...] is a character class. It matches one character. [\w+\.] matches one character which is either a "word" character (letter, number, or underscore), or a plus, or a dot. [\w+\.]{3} matches three such characters in a row.
[] is a character class, not a subpattern. [abc] Matches a single a, b or c.
You probably meant (\w+\.){3}, which does match the same as your second regex.
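A short sketch of the difference, assuming Python's re module and a sample string adapted from the question's input:

```python
import re

text = "dragon.is.awesome.because"

# Character class: any three characters that are word chars, "+", or ".".
class_match = re.search(r"[\w+\.]{3}", text).group()

# Group repeated three times: three "word followed by a dot" units.
group_match = re.search(r"(?:\w+\.){3}", text).group()

# The spelled-out version from the question behaves like the group.
long_match = re.search(r"\w+\.\w+\.\w+\.", text).group()

print(class_match)  # dra
print(group_match)  # dragon.is.awesome.
print(long_match)   # dragon.is.awesome.
```

The sketch uses a non-capturing group (?:...) only to keep the result free of capture groups; plain parentheses, as in the answers, match the same text.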

Regexp question mark (in emacs)

I'd like to ask what the following emacs regular expression means (if anyone wonders, this is the regexp that erlang-mode uses for matching a single-quoted atom):
'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'
specifically I'm having trouble finding explanations for three things.
First, the question mark, which supposedly should either make the preceding item optional or make the preceding quantifier lazy. But there is no item or quantifier here, only the start of a new group, so what effect does it have?
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Third, the quadruple escape \\\\. Wouldn't this leave you with an escaped backslash and a \., which would make it an invalid regexp?
Thanks
"[^\\']"
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Firstly, note that in Emacs regexp syntax, \` matches the start of the string, and \' matches the end of the string. In multi-line strings this is different from the more familiar ^ and $, which match the beginning of a line and the end of a line.
However that is not relevant within a character alternative (square brackets), so this sequence is actually matching any character other than a backslash or an apostrophe.
Edit:
So from the comments, this is still causing confusion, so let's break it down:
"'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'"
That code evaluates to this string/regexp:
'\(?:[^\']\|\(?:\\.\)\)*'
' matches an apostrophe
\(?:foo\)* matches zero or more foo
foo\|bar matches either of foo or bar
[^\'] matches any character other than a backslash or an apostrophe
\(?:\\.\) could (in this case, being a non-capturing group which occurs exactly once) be rewritten as simply \\., and matches a backslash followed by any character other than a newline.
' matches an apostrophe
So the whole thing matches a single-quoted string in which:
any other single-quotes must each be preceded by a backslash
any backslash must be paired with another non-newline character (which could also be a backslash)
Which of course sounds like a typical string syntax in which backslashes can be used to escape special characters, including backslashes themselves and any instances of the delimiting quote character.
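The breakdown above can be tried out with a rough Python translation of the pattern. This is an assumption-laden sketch: Python's re syntax writes groups and alternation without backslashes, so the regexp is re-spelled here, and the test strings are invented:

```python
import re

# Emacs:  '\(?:[^\']\|\(?:\\.\)\)*'
# Python: groups and | are unescaped; \\ still means one literal backslash.
quoted_atom = re.compile(r"'(?:[^\\']|\\.)*'")

def is_quoted_atom(text: str) -> bool:
    """True if the whole text is a single-quoted string per the rules above."""
    return quoted_atom.fullmatch(text) is not None

print(is_quoted_atom(r"'abc'"))    # True: plain contents
print(is_quoted_atom(r"'it\'s'"))  # True: inner quote preceded by a backslash
print(is_quoted_atom(r"'a\\'"))    # True: backslash paired with a backslash
print(is_quoted_atom(r"'it's'"))   # False: unescaped inner quote
print(is_quoted_atom(r"'a\'"))     # False: the backslash consumes the closing quote
```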
First: (?: groups multiple tokens together without creating a capturing group. This allows you to apply quantifiers to the full group.
Second and third, I think those are escaped backslashes. Each pair means \, and the quadruple means \\. So it's not escaping the apostrophe at all.