REGEX: Special characters detected for [[a-zA-Z0-9]]

I built a filter with the rule: [[a-zA-Z0-9]]
The intention was to keep content that has at least one number or letter and to remove content made of special characters, for example "?" and ":)" or any emoji.
So far it has worked well. However, I noticed that the rule does not work for symbols whose code point starts with "U+1": they seem to be recognized as a letter or a number. Other special letters/symbols starting with "U+0", for example ◌́, seem to work as intended.
Can somebody please explain the reason for this?
Thanks

Related

Character Classes in Vim

I have a question about using regexes in Vim.
When using character classes, if I search using the pattern [a-y], the search is case-insensitive.
But the pattern [a-z] seems to make the search case-sensitive.
I think it's because of the z, but I don't know why.
I am using gVim 7.4 on Windows 8.1.
The character z in [a-z] is a lowercase z.
Changing the pattern to [a-Z] makes the search case-sensitive too.
With the pattern [a-Y], strangely, a 'wrong scope' error occurs.
The following images show my encoding and config.
Thanks, everyone. :)
The most straightforward answer is that in one search, case-sensitivity is on, and in the other, it is not. See :help 'ignorecase'.
If that is not the case, then the only way I can reproduce this is to use a character that looks like an ASCII z but is in reality a completely different character. Among characters that resemble an ASCII z, the only one I can find that reproduces this behavior is U+0396 Greek Capital Letter Zeta: Ζ.
Even this theory is a little shaky, as this character looks like an uppercase Z, not a lowercase z - at least on my screen.
It is difficult to be certain that this is the issue given only the screenshots above and your description. More information in your question about exactly how you are entering the search characters, what encoding you are using, what your keyboard layout is, etc., might help someone write a better answer than this one.
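One way to check the lookalike theory is to print the code point of every character in the pattern. The snippet below is only a sketch in JavaScript, not something Vim provides; `describeChars` is a hypothetical helper name made up for illustration.

```javascript
// Hypothetical helper: list each character of a pattern with its code point,
// so a lookalike such as U+0396 stands out from ASCII z (U+007A).
function describeChars(text) {
  return Array.from(text, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `${ch} U+${hex}`;
  });
}

// An ASCII pattern next to one that ends in Greek Capital Letter Zeta.
console.log(describeChars("[a-z]"));
console.log(describeChars("[a-\u0396]"));
```

If the last character of the range reports anything other than U+007A, the pattern does not contain an ASCII z.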

Regex match cannot get to work

I've spent far too long trying to get this to work. I'm trying to validate that the input contains only the following:
alpha numerics, hyphens, full stops, exclamation marks, open/closing brackets (normal not curly), forward slashes and question marks.
I thought it was the following regex
/([a-zA-Z0-9\!\(\)\-\/\.\?\s])+/
This kind of works: if I put #~ in the box, it shows that the input is invalid. However, if I put #~Paul, it reports the match as valid. It seems to return true as soon as it finds a valid character. The example #~Paul should be false, as it contains invalid characters. The result should only be true if all characters are valid ones.
Examples of strings that should match:
Paul!!
Paul (Stack-Overflow.)!
I'm sure some whizz can help me out there. Please help.
^([a-zA-Z0-9\!\(\)\-\/\.\?\s])+$
What you need are anchors (^ and $) to make the validation strict, so that the whole string has to match rather than any part of it.
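To see the difference, here is a small sketch that runs the question's pattern with and without anchors. It assumes a JavaScript engine (the question does not say which language is used); the variable names are only for illustration.

```javascript
// The pattern from the question: succeeds if ANY run of valid characters exists.
const unanchored = /([a-zA-Z0-9\!\(\)\-\/\.\?\s])+/;
// The anchored version: every character from start to end must be valid.
const anchored = /^([a-zA-Z0-9\!\(\)\-\/\.\?\s])+$/;

for (const input of ["#~", "#~Paul", "Paul!!", "Paul (Stack-Overflow.)!"]) {
  console.log(JSON.stringify(input), unanchored.test(input), anchored.test(input));
}
```

Without the anchors, #~Paul passes because the engine is free to match just the "Paul" part.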

Blogger weird behavior with Japanese brackets

I'm experiencing weird behavior from Blogger. The code works fine when I test it locally, but Blogger seems to skip Japanese brackets: （） in my code.
I need to remove them with a simple regex:
.replace(/\（/g,'').replace(/\）/g,'')
(I tried without the backslash as well; it works locally, and Blogger omits the brackets in both cases.)
It seems to work well with other Japanese characters; the only problem I've encountered so far is with brackets. I'm looking for a solution/cheat/workaround for this specific case, but I'm also interested in more detailed information about why it happens.
Instead of the brackets themselves, you need to put their Unicode escapes.
In most regex engines, this is written in the format:
\uFFFF
Where FFFF is the hex value of the Unicode code point.
In this case, a Japanese (full-width) opening bracket is U+FF08 and a Japanese closing bracket is U+FF09.
So replace:
\（ and \）
With:
\uFF08 and \uFF09
In the regexes of your replace calls.
Good Luck!
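As a rough sketch of that suggestion in JavaScript: because the escapes are plain ASCII, the source no longer contains the full-width characters that Blogger appears to drop. The function name `stripFullWidthBrackets` is made up for this example, and both brackets are handled in one character class instead of two chained calls.

```javascript
// Hypothetical helper: remove full-width brackets using escapes only,
// so the source file itself contains no full-width characters.
function stripFullWidthBrackets(text) {
  // U+FF08 is the full-width opening bracket, U+FF09 the closing one.
  return text.replace(/[\uFF08\uFF09]/g, "");
}

// The sample string is also written with escapes: full-width brackets around two kanji.
console.log(stripFullWidthBrackets("\uFF08\u65E5\u672C\uFF09"));
```

ASCII brackets are left untouched, since they are different code points (U+0028 and U+0029).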

Trouble rejecting a specific character in my RegEx

I'm running the following regular expression to check a username:
^(?=.*[a-zA-Z0-9])\w{2,25}\s*$
It works fine, but now I need to amend it to reject any underscores (_). I've tried wedging ^(?!_)$ in there, but it doesn't work for me: it only seems to check at the beginning or the end.
I know a little about regular expressions but I'm hazy on all the classes. I've found a great resource for it at http://www.regular-expressions.info/reference.html
Thanks for the help, folks.
This should work for you:
^[a-zA-Z][a-zA-Z0-9.\-]{1,24}\s*$
What this regex will validate:
The first character is a letter
The input contains only alphanumeric characters (I also allowed . and -)
If you don't want . or -, just remove them from the character class
The input is 2-25 characters long
Well, you could always replace the \w with its character class, excluding _.
^(?=.*[a-zA-Z0-9])[A-Za-z0-9]{2,25}\s*$
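A quick way to compare the two patterns, sketched in JavaScript (the question does not name a language, so that part is an assumption, and the variable names are illustrative only):

```javascript
// Pattern from the question: \w also accepts the underscore.
const original = /^(?=.*[a-zA-Z0-9])\w{2,25}\s*$/;
// Same pattern with \w spelled out as letters and digits only.
const noUnderscore = /^(?=.*[a-zA-Z0-9])[A-Za-z0-9]{2,25}\s*$/;

for (const name of ["username", "user_name", "a", "abc  "]) {
  console.log(JSON.stringify(name), original.test(name), noUnderscore.test(name));
}
```

Only user_name should differ between the two columns, since it is the one input that contains an underscore.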

What is wrong with my simple regex that accepts empty strings and apartment numbers?

I want to restrict a textbox that holds an optional apartment number.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously, if I'm here, the invalid ones are being accepted by the regex. The goal of this regex is to accept either 1 to 4 digits with an optional letter, a single letter, or an empty string.
I just can't figure out what's not working; it seems like a simple enough regex. I'm probably missing something, as I'm not very good with regexes, but the syntax looks OK to me. Hopefully someone here can point out my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. You can also fold the second option into the first one (which then covers the third variant, the empty string, as well):
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
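Here is a short sketch that checks both patterns against the test cases from the question. JavaScript is assumed only for the sake of the example, and the variable names are illustrative.

```javascript
// Pattern from the question: only the empty-string alternative is anchored.
const unanchored = /([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)/;
// Anchored, merged pattern from this answer.
const anchored = /^[0-9]{0,4}[A-Z]?$/;

const cases = ["1234A", "Z", "", "A1234", "fhfdsahds527523832dvhsfdg"];
for (const value of cases) {
  console.log(JSON.stringify(value), unanchored.test(value), anchored.test(value));
}
```

The unanchored pattern accepts A1234 because it can match just the 1234 part; the anchored one has to account for the whole string.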
Depending on the language, you can also use a negative lookahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = This looks for a digit, zero to four times, at the beginning of the string
[A-Za-z] = This looks for a single letter (either case)
(?!.*[0-9]) = This only allows the letter if there are no digits anywhere after it.
I haven't quite figured out how to allow the empty string, but that might be easier to do with tools from whatever language you are using. Something along this logic:
if String doesn't equal $null, then check the regex
Something along those lines, just adjusted for however you would do it in your language.
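For instance, in JavaScript (an assumption, since the question does not name a language) that logic might look like the sketch below. `isValidApartment` is a hypothetical name, and an empty string stands in for $null.

```javascript
// Lookahead pattern from this answer.
const lookahead = /^[0-9]{0,4}[A-Za-z](?!.*[0-9])/;

// Hypothetical helper: handle the optional (empty) case in code,
// and only run the regex when there is something to check.
function isValidApartment(value) {
  if (value === "") {
    return true; // the field is optional
  }
  return lookahead.test(value);
}

for (const value of ["", "1234A", "Z", "A1234", "1234"]) {
  console.log(JSON.stringify(value), isValidApartment(value));
}
```

Note that this pattern requires a letter, so a digits-only value such as 1234 is rejected, and because there is no $ anchor, extra letters after the first one are still accepted.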
I used RegEx Skinner to validate the answers.
Edit: Fixed error from comments