Is "#" a special character in regular expressions? - regex

I am working on an email filter and I have come across a list of regular expressions that are used to block all emails coming from senders that match a record in that list. While browsing through the list, I have discovered that all occurrences of the # character are escaped with a \.
Does the # mean anything special in regular expressions, and does it need to be escaped as \#?

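It's normally not a special character, but it doesn't hurt to escape it, which is probably why many people do: they just want to be safe (or they think it's a special character). The main exception is free-spacing (extended) mode, such as Perl's /x flag, where an unescaped # starts a comment.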
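As a quick check of when the escape matters, here is a minimal Python sketch. The sample pattern and the string "item#42" are made up for illustration. It assumes Python's re module, where re.VERBOSE is the free-spacing mode in which an unescaped # starts a comment.

```python
import re

# Default mode: "#" is an ordinary character, escaped or not.
plain = re.compile(r"item#\d+")
escaped = re.compile(r"item\#\d+")

# Verbose (free-spacing) mode: an unescaped "#" starts a comment,
# so the rest of that line is dropped from the pattern.
verbose_unescaped = re.compile(r"item#\d+", re.VERBOSE)  # effectively just "item"
verbose_escaped = re.compile(r"item\#\d+", re.VERBOSE)   # "#" stays literal

sample = "item#42"
print(plain.fullmatch(sample) is not None)              # True
print(escaped.fullmatch(sample) is not None)            # True
print(verbose_unescaped.fullmatch(sample) is not None)  # False
print(verbose_escaped.fullmatch(sample) is not None)    # True
```

So for a filter list whose patterns might be compiled in an extended mode, escaping # is the safer habit.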

No, # is not a special character in regex.
The \ can also be used as part of the following quoting construct:
Pattern:
\Q...\E
Definition:
Matches the characters between \Q and \E literally, suppressing the meaning of special characters.
Example:
\Q+-/\E matches +-/
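Not every engine supports \Q...\E (Perl, PCRE and Java do; Python's re module does not). As a rough equivalent, the sketch below uses Python's re.escape to get the same "treat this text literally" effect. The test strings are invented for illustration.

```python
import re

literal = "+-/"

# re.escape backslash-escapes the characters the engine could treat as
# special, playing roughly the role of \Q...\E in other flavors.
pattern = re.compile(re.escape(literal))

print(pattern.search("a +-/ b") is not None)  # True
print(pattern.search("a ++/ b") is not None)  # False
```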

Related

Regular expressions \w without underscore in a character set []

This question is identical to Regular Expressions: How to Express \w Without Underscore, except that the goal is to match characters in the letter (L) general category plus a specified set of additional characters.
For example, [-$\a\d]+ would match identifiers like $gâteau-Noël-19 but not $gâteau_Noël-19, if a hypothetical \a letter class existed. But for some bizarre and incomprehensible reason it does not.
So the clumsy substitute suggested in the previous question, [^\W_], works fine as a replacement for \a by itself. But how can it be combined with additional characters to form the above regular expression?
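One possible approach, offered only as a sketch and not as the accepted answer to that question, is to put [^\W_] and the extra characters into an alternation inside a non-capturing group. The example below assumes Python 3, where \w is Unicode-aware for str patterns; the name identifier is made up.

```python
import re

# [^\W_] is "a word character that is not an underscore" (letters and digits);
# the alternation adds "-" and "$" to the allowed set.
identifier = re.compile(r"(?:[^\W_]|[-$])+")

print(identifier.fullmatch("$gâteau-Noël-19") is not None)  # True
print(identifier.fullmatch("$gâteau_Noël-19") is not None)  # False
```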

Regular Expressions for some alphanumeric sentences

Can anyone provide a regular expression for these strings:
annotations/16/16366.eng
annotations/29/21345.eng
annotations/10/20132.eng
And strings of this type. I have tried 'a(\w+).eng', but it did not work.
To match alphanumeric separated by slashes, ending with .eng you can do:
(\w+\/\w+\/\w+\.eng)
Remember that [ and ] are used for sets. You can specify a word verbatim as a match, without any flags. If you wanted to match anything in the same format with annotations you can do:
annotations\/\w+\/\w+\.eng
Where \/ escapes a / and \. escapes a period.
And to simplify it:
[\w/]*\.eng
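Meaning: match any number of repetitions of the set containing word characters (\w) and /, followed by ".eng". Note that this form is looser than the previous one, since it no longer requires the annotations prefix or exactly two slashes.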
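To compare the patterns from this answer, here is a small Python sketch that runs the strict and the loose form against the three sample paths. The variable names are made up, and in Python the / needs no escaping, so it is written bare.

```python
import re

paths = [
    "annotations/16/16366.eng",
    "annotations/29/21345.eng",
    "annotations/10/20132.eng",
]

strict = re.compile(r"annotations/\w+/\w+\.eng")
loose = re.compile(r"[\w/]*\.eng")

for path in paths:
    # Both patterns accept all three sample paths.
    print(path, strict.fullmatch(path) is not None, loose.fullmatch(path) is not None)

# The loose form also accepts input that the strict form rejects.
print(strict.fullmatch("/.eng") is not None)  # False
print(loose.fullmatch("/.eng") is not None)   # True
```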

Regular Expressions pattern with Special characters

I'm working on a regular expressions pattern, but it contains a number of special characters. I'm not really sure how to incorporate them in a normal regex pattern string. Specifically, I need to test to see if a string contains '+/-'...
I've tried using quotes etc but have no luck (I'm extremely new to regex). I am coding this in C# 4.0.
One string example is "3Z1Z +/- 5.5"
Any help is much appreciated - Thanks a lot!
Create a simple regex :
foundMatch = Regex.IsMatch(SubjectString, @"\+/-");
Will return true if this sequence of characters is found anywhere in your string. The explanation is left as an exercise to you.
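For readers who want the explanation spelled out, here is the same check as a Python sketch (the C# call above is the actual answer; the function name contains_plus_minus is made up). Only the + needs escaping, because it is a quantifier; / and - are ordinary characters outside a character class.

```python
import re

def contains_plus_minus(text: str) -> bool:
    # "\+" is a literal plus sign; "/" and "-" need no escaping here.
    return re.search(r"\+/-", text) is not None

print(contains_plus_minus("3Z1Z +/- 5.5"))  # True
print(contains_plus_minus("3Z1Z + 5.5"))    # False
```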
Some of these are regex metacharacters. Add them to the pattern by prefixing them with a backslash (\), e.g. + becomes \+
^\+|\-$ # + or -
The same would go for anything else with special meaning, such as ., {, }, (, ), ^, $, |, [, ], etc.
There are some exceptions though. For instance, when creating a class such as: [a-z] the hyphen (-) would have special meaning (all letters from a through z). So if you wanted a literal hyphen you'd have to escape it (unless it falls as the last character of the class). e.g.
[a-z-A-Z] # hyphen should be escaped if you wanted a literal hyphen
[a-z\-A-Z] # the "correct" counter-part
[a-zA-Z-] # actually legal because it's inserted as the last character
# and therefore treated as a literal hyphen despite not being
# escaped.
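The hyphen rule is easy to trip over with the +/- characters themselves. The Python sketch below (variable names are made up) shows that an unescaped hyphen between + and / creates a range that also matches , and ., while the escaped and trailing forms match only the three intended characters.

```python
import re

accidental_range = re.compile(r"[+-/]")  # range "+" through "/": + , - . /
escaped_hyphen = re.compile(r"[+\-/]")   # exactly +, - and /
trailing_hyphen = re.compile(r"[+/-]")   # exactly +, / and -

for ch in "+-/,.":
    print(
        ch,
        accidental_range.fullmatch(ch) is not None,
        escaped_hyphen.fullmatch(ch) is not None,
        trailing_hyphen.fullmatch(ch) is not None,
    )
```

The first column is True for all five characters; the other two are True only for +, - and /.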

Regex to match all of a set except certain ones

I'm sure this has been asked before, but I can't seem to find it (or know the proper wording to search for).
Basically, I want a regex that matches all non-alphanumeric characters except hyphens: match \W+ but exclude '-'. I'm not sure how to exclude specific characters from a predefined set.
\W is a shorthand for [^\w]. So:
[^\w-]+
A bit of background:
[…] defines a set
[^…] negates a set
Generally, every \v (lowercase) set is negated by \V (uppercase), where v is any letter that defines a set, e.g. \d and \D, or \s and \S.
For international characters, you may want to look into [[:alpha:]] and [[:alnum:]].
[^\w-]+
will do just that: match any run of characters that are neither in the \w set nor a hyphen.
You can use:
[^a-zA-Z0-9_-]
or
[^\w-]
to match a single non-hyphen, non-alphanumeric character. To match one or more of them, append a +.
In Java 7 or above, you need to prepend (?U) so that \w matches Unicode word characters rather than ASCII only, e.g.
(?U)[^\w-]
In a Java string (you need to escape the \ character with another one):
(?U)[^\\w-]
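As an illustration of the negated set, here is a short Python sketch; the sample text and the variable name are made up. In Python 3, \w is already Unicode-aware for str patterns, so no counterpart to Java's (?U) is needed.

```python
import re

non_word_except_hyphen = re.compile(r"[^\w-]+")

text = "foo-bar, baz_qux! (x-1)"

# Hyphens and word characters (including "_") are left alone;
# every other run of characters is reported.
print(non_word_except_hyphen.findall(text))  # [', ', '! (', ')']
```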

What is the proper regular expression for an unescaped backslash before a character?

Let's say I want to represent \q (or any other particular "backslash-escaped character"). That is, I want to match \q but not \\q, since the latter is a backslash-escaped backslash followed by a q. Yet \\\q would match, since it's a backslash-escaped backslash followed by a backslash-escaped q. (Well, it would match the \q at the end, not the \\ at the beginning.)
I know I need a negative lookbehind, but they always tie my head up in knots, especially since the backslashes themselves have to be escaped in the regexp.
Updated:
My new and improved Perl regex, supporting more than 3 backslashes:
/(?<!\\) # Not preceded by a single backslash
(?>\\\\)* # an even number of backslashes
\\q # Followed by a \q
/x;
or, if your regex library doesn't support extended syntax:
/(?<!\\)(?>\\\\)*\\q/
Output of my test program:
q does not match
\q does match
\\q does not match
\\\q does match
\\\\q does not match
\\\\\q does match
Older version
/(?:(?<!\\)|(?<=\\\\))\\q/
Leon Timmermans got exactly what I was looking for. I would add one small improvement for those who come here later:
/(?<!\\)(?:\\\\)*\\q/
The additional ?: at the beginning of the (\\\\) group makes it not saved into any match-data. I can't imagine a scenario where I'd want the text of that saved.
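The non-capturing variant can be checked against the same six inputs as the test program above. This is a Python sketch rather than the original Perl test; the names unescaped_q and subject are made up.

```python
import re

# Not preceded by a backslash, then any number of backslash pairs,
# then the backslash-escaped q itself.
unescaped_q = re.compile(r"(?<!\\)(?:\\\\)*\\q")

results = []
for n in range(6):
    subject = "\\" * n + "q"  # n backslashes followed by q
    matched = unescaped_q.search(subject) is not None
    results.append(matched)
    print(subject, "does match" if matched else "does not match")
```

The printed verdicts alternate the same way as in the Perl output: an odd number of backslashes matches, an even number does not.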
Now You Have Two Problems.
Just write a simple parser. If the regex ties your head up in knots now, just wait a month.
The best solution to this is to do your own string parsing, as regular expressions don't really support what you are trying to do. (Rep @Frank Krueger if you go this way; I'm just repeating his advice.)
I did, however, take a shot at an exclusionary regex. This will match all strings that do not fit your criteria of a "\" followed by a character.
(?:[\\][\\])(?!(([\\](?![\\])[a-zA-Z])))