Regex should allow German Umlauts in C#

I am using the following regular expression:
[RegularExpression(@"^[A-Za-z0-9äöüÄÖÜß]+(?:[\._-äöüÄÖÜß][A-Za-z0-9]+)*$", ErrorMessageResourceName = "Error_User_UsernameFormat", ErrorMessageResourceType = typeof(Properties.Resources))]
Now I want to improve it so that it allows German umlauts (äöüÄÖÜß).

The way you added the German letters to your regex, they can only be used in the first word.
You need to put the letters into the last character class (and keep the separator class to just ., _ and -):
@"^[A-Za-z0-9äöüÄÖÜß]+(?:[._-][A-Za-z0-9äöüÄÖÜß]+)*$"
Note the äöüÄÖÜß added to the last [A-Za-z0-9] class.
See the regex demo
Also, note that _-ä inside a character class creates a range from _ (U+005F) to ä (U+00E4). That range matches a lot more than just _, - and ä (for example every lowercase ASCII letter, ` and {), and it does not even match -, since - (U+002D) lies outside it.
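To see the difference concretely, the two patterns can be compared with Go's regexp package. This is only a sketch: it assumes RE2 parses these plain character classes the same way .NET does (both read _-ä as a range), and the test strings are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// The question's pattern: umlauts only in the first class, and "_-ä" in the
// separator class is parsed as the range U+005F..U+00E4.
var original = regexp.MustCompile(`^[A-Za-z0-9äöüÄÖÜß]+(?:[\._-äöüÄÖÜß][A-Za-z0-9]+)*$`)

// The fixed pattern: a plain separator class and umlauts in both word classes.
var fixed = regexp.MustCompile(`^[A-Za-z0-9äöüÄÖÜß]+(?:[._-][A-Za-z0-9äöüÄÖÜß]+)*$`)

func main() {
	for _, s := range []string{"Max.Jö", "a-b", "a{b"} {
		fmt.Printf("%-10q original=%v fixed=%v\n", s, original.MatchString(s), fixed.MatchString(s))
	}
}
```

With these inputs, "Max.Jö" and "a-b" are rejected only by the original pattern, while "a{b" slips through it because { lies inside the _-ä range.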
Note that if you validate on the server side only, and want to match any Unicode letters, you may also consider using
@"^[\p{L}0-9]+(?:[._-][\p{L}0-9]+)*$"
Where \p{L} matches any Unicode letter. Another way to write [\p{L}0-9] would be [^\W_], but in .NET, it would also match all Unicode digits while 0-9 will only match ASCII digits.
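For comparison, here is a small sketch of the \p{L} variant using Go's regexp package (an assumption for illustration; the question itself is about .NET). In Go, \w is ASCII-only, so the [^\W_] shortcut would not cover umlauts there, which is why the sketch sticks to \p{L}. The test strings are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// \p{L} matches any Unicode letter; 0-9 keeps the digits ASCII-only.
var anyLetter = regexp.MustCompile(`^[\p{L}0-9]+(?:[._-][\p{L}0-9]+)*$`)

func main() {
	// "Jörg٣" ends in an Arabic-Indic digit, which is neither a letter nor 0-9.
	for _, s := range []string{"Jörg-Müller", "Łukasz.Nowak", "Jörg٣", "a..b"} {
		fmt.Printf("%q -> %v\n", s, anyLetter.MatchString(s))
	}
}
```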

Replace [A-Za-z0-9äöüÄÖÜß] with \w. In .NET, \w already matches the umlauts; note that it also matches _ and any other Unicode letter or decimal digit.

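This works better. I just modified someone else's code that was posted on Stack Overflow, and it works well for German text.
I added the check (c >= 'Ä' && c <= 'ä'), and now it is closer to what I need. That single range does not cover every German letter (ö and ü fall outside it), so you need to add your own checks like (c >= 'Ö' && c <= 'ö') for the letters you have an issue with. Keep in mind that these are code-point ranges, not single letters: together, the three ranges in the code accept everything from U+00C4 to U+00FC, including × and ÷.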
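using System.Text;

// Extension methods must be declared in a static class (the class name is arbitrary).
public static class StringExtensions
{
    public static string RemoveSpecialCharacters(this string str)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in str)
        {
            // Keep ASCII digits and letters, '.', ' ' and the code-point ranges Ä..ä, Ö..ö, Ü..ü.
            if ((c >= '0' && c <= '9') || (c >= 'Ö' && c <= 'ö') || (c >= 'Ü' && c <= 'ü') || (c >= 'Ä' && c <= 'ä') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == ' ')
            {
                sb.Append(c);
            }
        }
        // Return the kept characters as a string.
        return sb.ToString();
    }
}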
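If the goal is to keep exactly the German letters rather than whole code-point ranges, listing them explicitly avoids letting characters like × through. A minimal sketch in Go; the function and constant names are hypothetical, and the kept set (ASCII letters and digits, '.', ' ') follows the C# version.

```go
package main

import (
	"fmt"
	"strings"
)

// germanLetters lists the seven German letters explicitly instead of using
// code-point ranges such as 'Ä'..'ä', which also span characters like × (U+00D7).
const germanLetters = "äöüÄÖÜß"

// removeSpecialCharacters keeps ASCII letters and digits, '.', ' ' and the
// German letters above, and drops every other character.
func removeSpecialCharacters(s string) string {
	var b strings.Builder
	for _, r := range s {
		if (r >= '0' && r <= '9') || (r >= 'A' && r <= 'Z') || (r >= 'a' && r <= 'z') ||
			r == '.' || r == ' ' || strings.ContainsRune(germanLetters, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", removeSpecialCharacters("Grüße, Jörg × 2!"))
}
```

Here the comma, × and ! are dropped while the umlauts and ß survive; the spaces around × remain, so two spaces end up between "Jörg" and "2".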

Related

GitHub username convention using regex

I've been trying to express the GitHub username convention as a regex in Go for a while now, and I couldn't do it. Also, the username length shouldn't exceed 39 characters.
Below is the username convention from GitHub:
Username may only contain alphanumeric characters or single hyphens, and cannot begin or end with a hyphen.
and for the length:
Username is too long (maximum is 39 characters).
Here is the code I've written. You can check it in the Go Playground:
package main
import (
"fmt"
"regexp"
)
func main() {
usernameConvention := "^[a-zA-Z0-9]*[-]?[a-zA-Z0-9]*$"
if re, _ := regexp.Compile(usernameConvention); !re.MatchString("abc-abc") {
fmt.Println("false")
} else {
fmt.Println("true")
}
}
Currently, these cases work:
a-b // true - Working!
-ab // false - Working!
ab- // false - Working!
0-0 // true - Working!
But I couldn't find a regex pattern that works for this scenario:
a-b-c // false - Should be true
Also, the username has to be at most 39 characters long. I've found that we could use {1,38}, but I don't know where exactly I should add that in the regex pattern.
Go's RE2-based regexp package does not support lookarounds, so the length limit can only be checked either with another regex or with a plain string length check.
A fully non-regex approach (demo):
package main
import (
"fmt"
"strings"
)
func IsAlnumOrHyphen(s string) bool {
for _, r := range s {
if (r < 'a' || r > 'z') && (r < 'A' || r > 'Z') && (r < '0' || r > '9') && r != '-' {
return false
}
}
return true
}
func main() {
s := "abc-abc-abc"
if len(s) < 40 && len(s) > 0 && !strings.HasPrefix(s, "-") && !strings.Contains(s, "--") && !strings.HasSuffix(s, "-") && IsAlnumOrHyphen(s) {
fmt.Println("true")
} else {
fmt.Println("false")
}
}
Details
len(s) < 40 && len(s) > 0 - Length restriction, from 1 to 39 chars are allowed
!strings.HasPrefix(s, "-") - should not start with -
!strings.Contains(s, "--") - should not contain --
!strings.HasSuffix(s, "-") - should not end with -
IsAlnumOrHyphen(s) - can only contain ASCII alphanumeric and hyphens.
For a partially regex approach, see this Go demo:
package main
import (
"fmt"
"regexp"
)
func main() {
usernameConvention := "^[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$"
re := regexp.MustCompile(usernameConvention)
s := "abc-abc-abc"
if len(s) < 40 && len(s) > 0 && re.MatchString(s) {
fmt.Println("true")
} else {
fmt.Println("false")
}
}
Here, the ^[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$ regex matches
^ - start of string
[a-zA-Z0-9]+ - 1 or more ASCII alphanumeric chars
(?:-[a-zA-Z0-9]+)* - 0 or more repetitions of - and then 1 or more ASCII alphanumeric chars
$ - end of string.

Regex: Any letters, digit, and 0 up to 3 special chars

It seems I'm stuck with a simple regex for a password check.
What I'd like:
8 up to 30 symbols (Total)
With any of these: [A-Za-z\d]
And 0 up to 3 of these: [ -/:-@[-`{-~À-ÿ] (Special list)
I took a look here and then I wrote something like:
(?=.{8,15}$)(?=.*[A-Za-z\d])(?!([ -\/:-@[-`{-~À-ÿ])\1{4}).*
But it doesn't work: one can still use more than 3 characters from the special list.
Any tips?
After shuffling your regex around a bit, it works for the examples you provided (I think you made a mistake with the example "A#~` C:": it should not match, as it has more than 3 special chars):
(?!.*(?:[ -\/:-@[-`{-~À-ÿ].*){4})^[A-Za-z\d -\/:-@[-`{-~À-ÿ]{8,30}$
It only needs one lookahead instead of two, because the length and character set checks can be done without a lookahead: ^[A-Za-z\d -/:-@[-`{-~À-ÿ]{8,30}$
I corrected the negative lookahead. Your version only checked for consecutive special chars (the same char repeated, via \1), and the .* wildcards were placed in a way that kept the lookahead from ever matching (because the wildcard allowed everything).
Will this work?
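using System;
using System.Linq;

// Expand a range such as 'A'..'Z' into a string containing every character in it.
static string Range(char first, char last) =>
    new string(Enumerable.Range(first, last - first + 1).Select(i => (char)i).ToArray());

// The regex notation " -/:-@[-`{-~À-ÿ" has to be expanded into real characters;
// string.Contains would otherwise treat each '-' literally instead of as a range.
string characters = Range(' ', '/') + Range(':', '@') + Range('[', '`') + Range('{', '~') + Range('À', 'ÿ');
string letters = Range('A', 'Z') + Range('a', 'z') + Range('0', '9');
string[] inputs = {
    "AABBCCDD",
    "aaaaaaaa",
    "11111111",
    "a1a1a1a1",
    "AA####AA",
    "A1C EKFE",
    "AADE F"
};
foreach (string input in inputs)
{
    // ch = 1 for a special char; notmatch = 1 for a char outside both sets.
    var counts = input.Select(x => new { ch = characters.Contains(x) ? 1 : 0, notmatch = (characters + letters).Contains(x) ? 0 : 1 }).ToArray();
    bool isMatch = (input.Length >= 8) && (input.Length <= 30) && (counts.Sum(x => x.notmatch) == 0) && (counts.Sum(x => x.ch) <= 3);
    Console.WriteLine("Input : '{0}', Matches : '{1}'", input, isMatch ? "Match" : "No Match");
}
Console.ReadLine();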
I would use this (if you want to stick to regex):
using System.Text.RegularExpressions;

var specialChars = @" -\/:-@[-`{-~À-ÿ";
var regularChars = @"A-Za-z\d";
// {{8,30}} is escaped so that the interpolated string contains a literal {8,30} quantifier.
if (Regex.Match(password, $"^[{regularChars}{specialChars}]{{8,30}}$").Success && Regex.Matches(password, $"[{specialChars}]").Count <= 3)
{
    //Password OK
}
It consists of:
Checking the length and whether the password contains illegal characters
Checking that it contains at most 3 special chars
A little faster:
var specialChars = @" -\/:-@[-`{-~À-ÿ";
var regularChars = @"A-Za-z\d";
var minChars = 8;
var maxChars = 30;
if (password.Length >= minChars && password.Length <= maxChars && Regex.Match(password, $"^[{regularChars}{specialChars}]+$").Success && Regex.Matches(password, $"[{specialChars}]").Count <= 3)
{
    //Password OK
}
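The same two checks (alphabet plus length, then a count of special chars) translate to Go, whose RE2 engine has no lookahead anyway. A sketch, assuming the special list is [ -/:-@[-`{-~À-ÿ], i.e. space, all printable ASCII punctuation and À-ÿ; the names are hypothetical. The length is measured in characters with utf8.RuneCountInString, because len counts bytes and À-ÿ take two bytes each in UTF-8.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// specialClass is the special list written as the body of a character class;
// "\\[" escapes the '[' that starts the [-` range.
const specialClass = " -/:-@\\[-`{-~À-ÿ"

var (
	allowed  = regexp.MustCompile(`^[A-Za-z0-9` + specialClass + `]+$`)
	specials = regexp.MustCompile(`[` + specialClass + `]`)
)

// passwordOK checks length and alphabet first, then counts special chars.
func passwordOK(p string) bool {
	n := utf8.RuneCountInString(p) // characters, not bytes
	return n >= 8 && n <= 30 &&
		allowed.MatchString(p) &&
		len(specials.FindAllStringIndex(p, -1)) <= 3
}

func main() {
	for _, p := range []string{"AABBCCDD", "A1C EKFE", "AA####AA", "AADE F", "ab#cd#ef#"} {
		fmt.Printf("%q -> %v\n", p, passwordOK(p))
	}
}
```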
Newbie here. I think I've managed to get what you need, but one of the test cases you shared was kind of weird:
A#~` C:
OK -- Match (3 specials, it's okay)
Shouldn't this be failed because it has more than 3 specials?
Could you perhaps try this? If it works I'll type out the explanations for the regex.
https://regex101.com/r/KCL6R1/2
(?=^[A-Za-z\d -\/:-@[-`{-~À-ÿ]{8,30}$)^(?:[A-Za-z\d]*[ -\/:-@[-`{-~À-ÿ]){0,3}[A-Za-z\d]*$
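The structural part of this regex, ^(?:[A-Za-z\d]*[special]){0,3}[A-Za-z\d]*$, needs no lookahead, so it also works in RE2-based engines such as Go's once the 8 to 30 length check moves into code. A hedged sketch with hypothetical names, assuming the same special list [ -/:-@[-`{-~À-ÿ]:

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// specialClass is the special list as a character class body ("\\[" is a literal '[').
const specialClass = " -/:-@\\[-`{-~À-ÿ"

// At most three "alphanumerics followed by one special" chunks, then alphanumerics only.
var atMostThreeSpecials = regexp.MustCompile(`^(?:[A-Za-z\d]*[` + specialClass + `]){0,3}[A-Za-z\d]*$`)

func validPassword(p string) bool {
	n := utf8.RuneCountInString(p) // the lookahead's {8,30}, counted in characters
	return n >= 8 && n <= 30 && atMostThreeSpecials.MatchString(p)
}

func main() {
	for _, p := range []string{"AABBCCDD", "AA####AA", "a1#b2#c3#", "ab#cd#ef#g#"} {
		fmt.Printf("%q -> %v\n", p, validPassword(p))
	}
}
```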

Looking for a regex for a password with at least one lowercase letter, at least one upper case letter, at least one digit & length between 6 and 14

I am looking for a regular expression for validating a password. The password rules are:
at least one lowercase letter
at least one upper case letter
at least one digit
length between 6 and 14
I created the following regular expression, but it's not working:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,14}$
It accepts
qwerty1
QWERTY1
but not qwERTy,
i.e. it enforces only 2 conditions:
at least one digit
length between 6 and 14
I'm not sure that's possible, but I'm sure that if it is, and it turns out to be a long, complicated regex string, it's the wrong design decision. It will be unmaintainable, unclear and very error-prone.
At the same time, this is easy to do, understand and maintain:
function isValid(password)
{
    if (password.length < 6 || password.length > 14)
        return false;
    var valid = { hasLower: false, hasUpper: false, hasDigit: false };
    for (var i = 0; i < password.length; i++) {
        var c = password[i];
        // A char is lowercase if upper-casing changes it, and uppercase if lower-casing
        // changes it; digits and symbols are unchanged by both, so they count as neither.
        valid.hasLower = valid.hasLower || c !== c.toUpperCase();
        valid.hasUpper = valid.hasUpper || c !== c.toLowerCase();
        valid.hasDigit = valid.hasDigit || (c >= '0' && c <= '9');
    }
    return valid.hasLower && valid.hasUpper && valid.hasDigit;
}
alert('"123abcDEF" valid = ' + isValid('123abcDEF'));
alert('"123 DEF" valid = ' + isValid('123 DEF'));
You can use \S instead of . to disallow spaces:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])\S{6,14}$
(only the final . has been replaced by \S)
See the demo.
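In engines without lookahead support (Go's RE2, for example), each lookahead can become its own regex. A sketch in Go, where the rules slice and isValid are hypothetical names; note that Go's \S is the ASCII-only complement of [\t\n\f\r ].

```go
package main

import (
	"fmt"
	"regexp"
)

// Each lookahead from the \S version becomes a separate regex; the last one
// enforces the 6 to 14 length and the absence of whitespace.
var rules = []*regexp.Regexp{
	regexp.MustCompile(`\d`),
	regexp.MustCompile(`[a-z]`),
	regexp.MustCompile(`[A-Z]`),
	regexp.MustCompile(`^\S{6,14}$`),
}

func isValid(password string) bool {
	for _, re := range rules {
		if !re.MatchString(password) {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{"qwerty1", "QWERTY1", "qwERTy", "qwERTy1", "qw ERTy1"} {
		fmt.Printf("%q -> %v\n", p, isValid(p))
	}
}
```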

Simplified regular expression matching in Scala

I am trying to write a simple regular expression matcher in Scala as an exercise. For simplicity, I assume the strings to match are ASCII and the regexps consist of ASCII characters and only two metacharacters: . and *. (Obviously, I don't use any regexp library.)
This is my simple and slow (exponential) solution.
def doMatch(r: String, s: String): Boolean = {
if (r.isEmpty) s.isEmpty
else if (r.length > 1 && r(1) == '*') star(r(0), r.tail.tail, s)
else if (!s.isEmpty && (r(0) == '.' || r(0) == s(0))) doMatch(r.tail, s.tail)
else false
}
def star(c: Char, r: String, s: String): Boolean = {
if (doMatch(r, s)) true
else if (!s.isEmpty && (c == '.' || c == s(0))) star(c, r, s.tail)
else false
}
Now I would like to improve it. Could you suggest a simple polynomial solution in ~10-15 lines of "pure" Scala code?

Parsing non-alphanumeric characters in QueryParser

I am working on a former team mate's code using Lucene++ 3.0.3.
There is a comment that claims QueryParser cannot handle "special characters" and one way this has been handled is to replace "special characters" with a space:
if (((*pos) >= L'A' && (*pos) <= L'Z') ||
((*pos) >= L'a' && (*pos) <= L'z') ||
... ||
(*pos == L'-'))
{
// do nothing, these are OK
} else {
// remaining characters are []{}*
(*pos) = L' ';
}
StandardAnalyzer is the Analyzer being used. (Thanks Mark)
I assume the "special characters" are for combining queries or some sort of wildcard processing, for want of a better term.
Is there a better function that can account for these characters within a query string?
You need to look at what Analyzer is used, as the Analyzer determines the Tokenizer used (and the Tokenizer determines which characters are special).
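If the goal is to keep such characters in the query rather than blanking them out, Lucene's classic query syntax allows escaping special characters with a backslash (Java Lucene's QueryParser offers a static escape method for this; whether Lucene++ exposes the same is worth checking). A minimal Go sketch of the idea, where escapeQuery and the character list are assumptions based on the classic query-syntax documentation; whether the escaped text survives tokenization still depends on the Analyzer.

```go
package main

import (
	"fmt"
	"strings"
)

// luceneSpecials lists the characters the classic Lucene query syntax treats
// specially (&& and || are covered by escaping each '&' and '|').
const luceneSpecials = `\+-!():^[]"{}~*?|&`

// escapeQuery prefixes every special character with a backslash instead of
// replacing it with a space.
func escapeQuery(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(luceneSpecials, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeQuery("c++ [draft]"))
}
```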