Comparing two regex expressions for efficiency [duplicate] - regex

This question already has answers here:
Why is a character class faster than alternation?
(2 answers)
Using alternation or character class for single character matching?
(3 answers)
Closed 3 years ago.
Why is:
[\s\S]+?
much more efficient than:
(?:.|\n)+?
What are the differences between the two in terms of how they work behind the scenes?
Note: this is with DOTALL turned off. Also, from https://www.regular-expressions.info/dot.html:
JavaScript and VBScript do not have an option to make the dot match line break characters. In those languages, you can use a character class such as [\s\S] to match any character. This character matches a character that is either a whitespace character (including line break characters), or a character that is not a whitespace character. Since all characters are either whitespace or non-whitespace, this character class matches any character.
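A minimal sketch for checking the premise, written in Python as an assumption since the question does not name an engine. It confirms that the two patterns match the same text and times both. The sample text, the END marker and the helper names are illustrative. In a backtracking engine the alternation typically has to enter a group and try its branches for each character, while the character class is a single set test, but the actual numbers depend on the engine and the input, so none are shown here.

```python
import re
import timeit

# Illustrative input: many short lines followed by a marker that the lazy
# quantifier has to reach one character at a time.
TEXT = "line one\nline two\n" * 200 + "END"

# DOTALL is off, so "." does not match "\n" in either pattern.
CHAR_CLASS = re.compile(r"[\s\S]+?END")
ALTERNATION = re.compile(r"(?:.|\n)+?END")


def same_match(text):
    """Return True if both patterns match and consume the same text."""
    a = CHAR_CLASS.match(text)
    b = ALTERNATION.match(text)
    return a is not None and b is not None and a.group(0) == b.group(0)


def time_patterns(text, repeats=20):
    """Time `repeats` match attempts per pattern; values are in seconds."""
    return {
        "char_class": timeit.timeit(lambda: CHAR_CLASS.match(text), number=repeats),
        "alternation": timeit.timeit(lambda: ALTERNATION.match(text), number=repeats),
    }


if __name__ == "__main__":
    print(same_match(TEXT))
    print(time_patterns(TEXT))
```

Compare the two timings on your own engine and data before drawing conclusions.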

Related

Regular expressions and characters [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 2 years ago.
Some characters, such as question marks and plus signs, have special meanings in regular expressions and must be preceded by a backslash if they are meant to represent the character itself.
What is the complete list of characters that must be preceded by a backslash?
Is it correct to say that all non-alphanumeric characters must be escaped?
And how do I add a backslash to a PHP string? addslashes() only adds a backslash in these few cases:
single quote (')
double quote (")
backslash (\)
NUL (the NUL byte)
Actually, it depends. There are many flavors of regular expressions; the most common are:
BRE
ERE
PCRE (which itself varies between programming languages)
To match a metacharacter literally, escape it with \ as described in the references above; that's all.
Or surround it with [], but this is kind of overkill.
Also, you can embed any Unicode character in PCRE (and some other flavors) via the \x{FFFF} syntax, where
FFFF is the hexadecimal code point of the character
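As a sketch of the escaping rule, here in Python rather than PHP (an assumption made only to keep the example self-contained): re.escape backslash-escapes the metacharacters in a string so that it matches literally, which is the role preg_quote() plays in PHP. The sample strings and the helper name are made up for illustration.

```python
import re


def matches_literally(needle, haystack):
    """Search for `needle` as plain text by escaping its metacharacters."""
    return re.search(re.escape(needle), haystack) is not None


if __name__ == "__main__":
    # Escaped: "+" is a literal plus sign.
    print(matches_literally("1+1", "1+1=2"))
    # Unescaped: "1+1" means "one or more 1s, then a 1", so it finds "11"
    # and fails on the text "1+1=2".
    print(re.search("1+1", "11") is not None)
    print(re.search("1+1", "1+1=2") is not None)
    # Brackets also make a single metacharacter literal.
    print(re.fullmatch(r"1[+]1", "1+1") is not None)
```

Letting a library function do the escaping avoids having to memorize the metacharacter list for each flavor.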

What is the purpose of [.] in regular expressions? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I found [.] in a regular expression in the notmuch man page:
notmuch search 'from:"/bob#.*[.]example[.]com/"'
It seemed useless at first, because brackets denote a list of characters and this one holds only a single character, but I eventually learned that it matches a literal dot.
Then why do they use it rather than \.? Are there any advantages to this expression?
At first I thought this was to avoid double escaping, but on further consideration I think it is because a dot in a character set ([]) is treated differently than normal. It makes sense that in a character set a dot only matches a literal dot: the whole point is to match a specific set of characters, so having a wildcard in the set doesn't make sense.
So [.,;:] may be used to match punctuation marks.
Once you take that into account it's obvious that [.] just matches dot.
Whether to use \. or [.] is left as an aesthetic decision.
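A small sketch of the equivalence, assuming Python's re module (the notmuch query syntax itself is not reproduced). The host names and the helper name are illustrative.

```python
import re

PATTERNS = {
    "bracketed": re.compile(r"[.]example[.]com"),
    "backslashed": re.compile(r"\.example\.com"),
    "unescaped": re.compile(r".example.com"),  # "." is a wildcard here
}


def which_match(text):
    """Return the names of the patterns that are found in `text`."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


if __name__ == "__main__":
    print(which_match("mail.example.com"))
    print(which_match("mailXexampleYcom"))
    # A set of punctuation marks, with the dot taken literally.
    print(re.findall(r"[.,;:]", "a.b,c;d:e"))
```

One practical difference: [.] contains no backslash, so it survives layers of shell or string quoting unchanged, whereas \. may need to be doubled.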

Regex to capture single backslash with single space after [duplicate]

This question already has answers here:
Check if string contains single backslashes with regex
(3 answers)
Closed 3 years ago.
I have trouble with figuring out this regex:
https://regex101.com/r/WtAYVa/2
It captures the first single backslash (\), but I want it to ignore double backslashes (\\), especially when there's a space after \\.
To reject the double backslash and accept only the single one, add more boundaries to the expression, for example start and end anchors:
^\\\s$
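A sketch of the anchored pattern in Python (an assumption; the regex101 link does not say which flavor was used). The second pattern is not from the answer: it is a hypothetical variant that uses a negative lookbehind to find a lone backslash followed by whitespace inside a longer string.

```python
import re

# Pattern from the answer: the whole string must be one backslash
# followed by one whitespace character.
ANCHORED = re.compile(r"^\\\s$")

# Assumption, not from the answer: a backslash that is not preceded by
# another backslash and is followed by whitespace, anywhere in the text.
LONE_BACKSLASH = re.compile(r"(?<!\\)\\\s")

if __name__ == "__main__":
    print(ANCHORED.search("\\ ") is not None)          # one backslash + space
    print(ANCHORED.search("\\\\ ") is not None)        # two backslashes + space
    print(LONE_BACKSLASH.search("a\\ b") is not None)
    print(LONE_BACKSLASH.search("a\\\\ b") is not None)
```

The anchored form only works when the backslash and the space are the entire input; the lookbehind form is one option when they appear inside other text.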

Ruby regex for extracting email addresses not detecting hyphens [duplicate]

This question already has answers here:
Get final special character with a regular expression
(2 answers)
Closed 8 years ago.
I tried the regexes that others are using, but for some reason they're not working for me.
I basically have a string such as "testing-user#example.com", but the regex only extracts user#example.com and not the whole thing.
Here's what I have:
regex = Regexp.new(/\b[a-zA-Z0-9._%+-,]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/)
email = line.scan(regex)
Any help would be greatly appreciated.
The hyphen needs to be escaped because of where it sits inside the character class.
[a-zA-Z0-9._%+\-,]+
              ^
Unescaped, +-, is read as a range from + to , so the class matches + and , but not a literal hyphen.
Inside a character class the hyphen has a special meaning. You can place it as the first or last character of the class, and in some regex implementations directly after a range. Anywhere else, you need to precede it with a backslash to add it to your class.
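The effect can be reproduced with a short sketch. It is written in Python rather than Ruby (an assumption, to keep the example self-contained); the character-class behaviour shown is the same in both. The sample address is the one from the question, and the helper name is illustrative.

```python
import re

# Original class: "+-," is parsed as a range from "+" to ",".
BROKEN = re.compile(r"\b[a-zA-Z0-9._%+-,]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b")

# Escaped hyphen: "-" is now a member of the class.
FIXED = re.compile(r"\b[a-zA-Z0-9._%+\-,]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b")


def extract(pattern, line):
    """Return every substring of `line` that matches `pattern`."""
    return pattern.findall(line)


if __name__ == "__main__":
    line = "testing-user#example.com"
    print(extract(BROKEN, line))
    print(extract(FIXED, line))
    # The range itself covers "+" and "," but not "-".
    print([ch for ch in "+,-" if re.fullmatch(r"[+-,]", ch)])
```

Moving the hyphen to the end of the class, as in [a-zA-Z0-9._%+,-], gives the same result without a backslash.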

Regex to allow just letters and special characters [duplicate]

This question already has answers here:
Regex only allow letters and some characters
(4 answers)
Closed 9 years ago.
I'm currently using the following regex to allow only letters:
"^[a-zA-Z]+$"
I would like to change it so that it allows letters and special characters like '-', as well as letters that are found in non-English alphabets.
How can I do it?
If you need to allow specific special characters, simply include them in the character class:
"^[a-zA-Z\-]+$"
Some special characters need to be escaped, some don't.
But if you want to accept every character except numeric characters, it might be simpler to use:
"^\D+$"
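Try "\D": it matches any single character that is not a digit, so it allows letters and signs alike. It is equivalent to [^\d].
Because \D already covers signs such as -, +, # and $, there is no need to add them to the class; "^\D+$" is enough. To allow only letters plus specific signs, list them in the class with the hyphen last so it is not read as a range: "^[a-zA-Z+#$-]+$"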
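A sketch comparing the patterns from the answers, in Python (an assumption; the question does not name a language). The third pattern is not from the answers: [^\W\d_] is a common idiom that roughly means "any Unicode letter" in Python 3, shown here as one possible way to accept non-English letters. The sample names are made up.

```python
import re

LETTERS_AND_HYPHEN = re.compile(r"^[a-zA-Z\-]+$")  # ASCII letters and "-"
NO_DIGITS = re.compile(r"^\D+$")                   # anything except digits

# Assumption, not from the answers: a word character that is neither a
# digit nor "_" (roughly a letter in any alphabet), or a hyphen.
UNICODE_LETTERS_AND_HYPHEN = re.compile(r"^(?:[^\W\d_]|-)+$")

if __name__ == "__main__":
    for name in ["Anne-Marie", "Zo\u00eb-M\u00fcller", "a+b#c", "abc1"]:
        print(
            name,
            bool(LETTERS_AND_HYPHEN.match(name)),
            bool(NO_DIGITS.match(name)),
            bool(UNICODE_LETTERS_AND_HYPHEN.match(name)),
        )
```

The \D version is the most permissive of the three, since it also accepts spaces and punctuation; pick the narrowest pattern that covers the input you expect.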