Regular expression for comma separated name search with wild card - regex

Right now I am using multiple if conditions to validate the input for a search by name with a wildcard (*). Because I have multiple 'if' statements with nested 'if' statements, I am trying to use a regular expression to validate my input instead. I want to use this expression in both the front end and the back end.
I would appreciate it if anyone could help.
The validation rules are as follows:
Input is last name, first name, i.e. separated by a comma.
A name must have at least two characters when a wildcard is used.
The only valid wildcard character is '*'.
At most two wildcard characters can be used.
No consecutive wildcards.
If no wildcard is used, there is no constraint on the length of either the last or the first name.
Some of the valid inputs are:
- hopkins, johns
- h, j
- ho*, jp*
- *ins, johns
- *op*, john*
Some of the invalid inputs are:
- hopkins johns
- h*, johns
- hop**, joh*
- h*pk*n*
If the regular expression is not going to be complex, we can consider this as valid; otherwise it is OK to consider it invalid:
- ho*in*, jo*
In short general name format is
[*]XX[*], [*]XX[*]
where [] ==> Optional
X ==> A-Z, a-z
XX ==> length 2 or more if wild card used

You can use this regex
\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?
Before validating with the regex above, count the * characters by matching the input against /\*/g, and make sure the number of matches is between 0 and 2.

With the help of @Amit_Joki's answer I wrote the following code and it's working fine.
var nameArray = [...];
// ^ and $ anchor the pattern so that the whole input has to match, not just a part of it
var re = /^\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?$/;
for (var i = 0; i < nameArray.length; i++) {
    if (nameArray[i].indexOf(',') < 0 ||
        (nameArray[i].indexOf('*') >= 0 && !re.test(nameArray[i]))) {
        console.log(nameArray[i] + ": Invalid");
    } else {
        console.log(nameArray[i] + ": Valid");
    }
}
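For comparison, here is a minimal Python sketch of the same check. It assumes the separator is exactly a comma followed by one space and that names without wildcards consist of letters only; neither is stated outright in the rules. The names `WILDCARD_RE`, `PLAIN_RE`, and `is_valid_name_query` are made up for this illustration.

```python
import re

# One name with wildcards: optional leading '*', two or more letters, optional trailing '*'.
WILDCARD_NAME = r"\*?[A-Za-z]{2,}\*?"
WILDCARD_RE = re.compile(rf"{WILDCARD_NAME}, {WILDCARD_NAME}")
# Assumption: without wildcards, each name is one or more letters.
PLAIN_RE = re.compile(r"[A-Za-z]+, [A-Za-z]+")


def is_valid_name_query(query: str) -> bool:
    """Return True if query looks like 'last, first' under the rules above."""
    if "*" in query:
        # fullmatch anchors the pattern at both ends of the input.
        return WILDCARD_RE.fullmatch(query) is not None
    return PLAIN_RE.fullmatch(query) is not None
```

With this sketch the optional case `ho*in*, jo*` is treated as invalid, because a wildcard is only accepted at the start or end of a name.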


Check string has a date in it and extract part of the string

I have thousands of lines of text to work through, and the lines I am interested in look like the following:
01/04/2019 09:35:41 - Test user (Additional Comments)
I am currently using this code to filter out all the other rows:
If InStr(FullCell(i), " - ") <> 0 And InStr(FullCell(i), ":") <> 0 And InStr(FullCell(i), "(") <> 0 Then
FullCell is the array that I am working through.
I know this is not the best way to do it. Is there a way to check that there is a date at the beginning of the string in the format dd/mm/yyyy, and then extract the user name in between the '-' and the '(' symbols?
I had a play with regex to see if that could help, but my skills are too limited to pull off both VBA and regex in the same code.
What's the best way to do this?
Assuming FullCell(i) contains the string, the condition in
If Left(FullCell(i), 10) Like "##/##/####" Then
is True when the string starts with a date (note that it will not differentiate between dd/mm/yyyy and mm/dd/yyyy).
And
Mid(FullCell(i), InStr(FullCell(i), " - ") + 3, InStr(FullCell(i), " (") - InStr(FullCell(i), " - ") - 3)
will return the username. The offset is 3 because " - " is three characters long; with 2 the result would start with a leading space.
I'm sure there is a more efficient way to do this, but I've used the following solution quite a few times (here FullCell stands for a single line of text, e.g. FullCell(i)):
This will select the date:
x = 1
Do While Mid(FullCell, x, 1) <> " "
    x = x + 1
Loop
strDate = Left(FullCell, x - 1)
This will find the character number of the hyphen; the username starts 2 characters after it.
x = 1
Do While Mid(FullCell, x, 1) <> "-"
    x = x + 1
Loop
Then we find the opening parenthesis that follows the username (the username itself may contain spaces, so a space cannot be used as the end marker):
y = x + 2
Do While Mid(FullCell, y, 1) <> "("
    y = y + 1
Loop
The username should now be characters x + 2 to y - 2:
strUsername = Mid(FullCell, x + 2, y - (x + 2) - 1)
Here's how I would do it.
Dim your variables
Dim rng As Range
Dim dat As Variant
Dim FullCell() As String
Dim User As String
Dim i As Long
Dim pos As Long
Set your range
Set rng = ' any way you choose
dat = rng.Value2
Loop dat
For i = 1 To UBound(dat, 1)
Split the data
    FullCell = Split(dat(i, 1), "-")
Test if it split
    If UBound(FullCell) > 0 Then
Test if it matches
        If IsDate(Trim$(FullCell(0))) Then
            pos = InStr(FullCell(1), "(") - 1
            If pos > 0 Then
                User = Trim$(Left$(FullCell(1), pos))
                ' Found a user
            End If
        End If
    End If
Next
Abstraction is your friend; it's always helpful to break code like this into its own private function whenever you can. You could put your code in a function and call it something like ExtractUsername.
Below I did an example of this, and I decided to go with the RegExp approach (late binding), but you could use string functions like the examples above as well.
This function returns the username if it finds the pattern you mentioned above, otherwise, it returns an empty string.
Private Function ExtractUsername(ByVal SourceString As String) As String
    Dim RegEx As Object
    Set RegEx = CreateObject("vbscript.regexp")
    '(FIRST GROUP FINDS THE DATE FORMATTED AS DD/MM/YYYY, AS WELL AS EVERYTHING UP TO THE HYPHEN)
    '(SECOND GROUP FINDS THE USERNAME) THIS WILL BE SUBMATCH 1
    With RegEx
        .Pattern = "(^\d{2}\/\d{2}\/\d{4}.*-)(.+)(\()"
        .Global = True
    End With
    Dim Match As Object
    Set Match = RegEx.Execute(SourceString)
    'ONLY RETURN IF A MATCH WAS FOUND
    If Match.Count > 0 Then
        ExtractUsername = Trim(Match(0).SubMatches(1))
    End If
    Set RegEx = Nothing
End Function
The regex pattern is grouped into three parts: the date (and everything up to the hyphen), the username, and the opening parenthesis. What you are interested in is the username, which is at index 1 in SubMatches.
Regexr is a helpful site for practicing regular expressions and can show you a bit more of what the pattern I went with is doing.
Please note that using regular expressions might give you performance issues and you should test it against regular string functions to see what works best for your situation.
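The same idea can be sketched outside VBA. The Python snippet below is only an illustration: it assumes every line of interest has the shape `dd/mm/yyyy hh:mm:ss - user (comment)` as in the sample, and the names `LINE_RE` and `extract_username` are invented here.

```python
import re

# Date, time, " - ", then the username up to the space before "(".
LINE_RE = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} - (.+?) \(")


def extract_username(line: str) -> str:
    """Return the username, or an empty string if the line does not match."""
    match = LINE_RE.match(line)
    return match.group(1) if match else ""
```

Like the VBA function, this returns an empty string for lines that do not start with a date, so it can double as the filter.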

regular expression to check ends with a digit

EDIT: Hi, I want to parse this log line:
String log1 = "Yellow A Yellow Flow Meter -4363.00 ---> -4194.00 pulse" ;
I used this pattern:
String maxPattern11 = "([\\w.*-?\\d.$]+)([\\s]+['--->'|'-->']+[\\s]+)([-?][\\d.]+\\s[\\w]+)";
I want to parse the string as a series of words separated by white space, ending with a positive or negative number.
Please tell me what's wrong with the pattern.
Instead of a difficult regular expression, here is another idea:
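String[] words = logLine.split("\\s+");
int n = words.length;
if (n > 3 && words[n - 3].equals("--->")) {
    // words[n - 4] is the value before the arrow, words[n - 2] the value after it,
    // and words[n - 1] the unit; everything before words[n - 4] is the name
}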
It may be more code than the regular expression, but it is much easier to understand.
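To make the split-based idea concrete, here is a small Python sketch. It assumes the line always ends with `old ---> new unit` (it also accepts `-->`, which the asker's pattern mentions), and the function name `parse_log` and the dictionary keys are hypothetical.

```python
def parse_log(line: str):
    """Split a log line into name, old value, new value and unit, or return None."""
    words = line.split()
    n = len(words)
    if n > 3 and words[n - 3] in ("--->", "-->"):
        return {
            "name": " ".join(words[: n - 4]),
            "old": float(words[n - 4]),   # raises ValueError if not a number
            "new": float(words[n - 2]),
            "unit": words[n - 1],
        }
    return None
```

For the sample line this yields the name "Yellow A Yellow Flow Meter", the two readings as floats, and the unit "pulse".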

Multiple capture groups - all optional

I know this (or similar) has been asked a hundred times - but I really need help now :D
The strings the regex should match.
Note: n is in the range of INTEGER_MIN - INTEGER_MAX
{number}
{number(1-n)}
{number(1-n,-n-n)}
{number(1-n,-n-n,0-n)}
If the pattern matches, it should result in 3 separate capture groups with these results.
All groups should be optional, so that when they are requested in, for example, Java, they return null.
1: 1-n
2: -n-n
3: 0-n
What I've tried:
\{number(?:\(([1-9])(?:(?:,)([0-9])){0,2}\))?\}
This obviously isn't right and only contains 2 groups (m.groupCount()).
Okay, from what I deduced, I would do this:
\{number(?:\((\-?\d+)(?:\,(\-?\d+))?(?:\,(\-?\d+))?\))?\}
Then carry out operations on the captured groups to validate the range of the integers, such as...
[Pseudo code since I don't know what language you are using]
captured integers = "capture1", "capture2", "capture3"
if (("capture1" < "capture2" && "capture1" > "capture3") ||
    ("capture1" > "capture2" && "capture1" < "capture3")) {
    Do something
} else {
    Do something else; like reject or throw error
}
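As a rough check of how the three optional groups behave, the pattern from this answer can be tried in Python, where an unmatched optional group comes back as `None` (much like `null` from Java's `Matcher.group`). The helper name `parse_number_token` is invented for this sketch, and it treats each argument as a plain integer, which is this answer's reading of the question.

```python
import re

NUMBER_RE = re.compile(r"\{number(?:\((-?\d+)(?:,(-?\d+))?(?:,(-?\d+))?\))?\}")


def parse_number_token(token: str):
    """Return a 3-tuple of ints (None for missing arguments), or None if no match."""
    match = NUMBER_RE.fullmatch(token)
    if match is None:
        return None
    return tuple(int(g) if g is not None else None for g in match.groups())
```

Range validation can then be done on the returned integers rather than inside the regex.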

use regular expression to find and replace but only every 3 characters for DNA sequence

Is it possible to do a find/replace using regular expressions on a string of DNA such that it only considers 3 characters (a codon of DNA) at a time?
For example, I would like the regular expression to see this:
dna="AAACCCTTTGGG"
as this:
AAA CCC TTT GGG
If I use the regular expressions right now and the expression was
Regex.Replace(dna,"ACC","AAA") it would find a match, but in this case of looking at 3 characters at a time there would be no match.
Is this possible?
Why use a regex? Try this instead, which is probably more efficient to boot:
public string DnaReplaceCodon(string input, string match, string replace) {
    if (match.Length != 3 || replace.Length != 3)
        throw new ArgumentOutOfRangeException();
    var output = new StringBuilder(input.Length);
    int i = 0;
    while (i + 2 < input.Length) {
        if (input[i] == match[0] && input[i+1] == match[1] && input[i+2] == match[2]) {
            output.Append(replace);
        } else {
            // copy the three characters of the current codon unchanged
            output.Append(input[i]);
            output.Append(input[i+1]);
            output.Append(input[i+2]);
        }
        i += 3;
    }
    // pick up trailing letters.
    while (i < input.Length) output.Append(input[i++]);
    return output.ToString();
}
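The same codon-by-codon idea can be sketched in a few lines of Python by slicing the string into chunks of three. This is only an illustration under the same assumption as the code above (both codons are exactly three characters); `replace_codon` is a made-up name.

```python
def replace_codon(dna: str, codon: str, replacement: str) -> str:
    """Replace codon with replacement, looking only at positions 0, 3, 6, ..."""
    if len(codon) != 3 or len(replacement) != 3:
        raise ValueError("codon and replacement must be 3 characters long")
    # Slice into codons; a trailing chunk of 1 or 2 letters is kept as-is.
    chunks = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(replacement if chunk == codon else chunk for chunk in chunks)
```

With the question's example, `replace_codon("AAACCCTTTGGG", "ACC", "AAA")` leaves the string unchanged, because ACC only occurs across a codon boundary.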
Solution
It is possible to do this with regex. Assuming the input is valid (contains only A, T, G, C):
Regex.Replace(input, @"\G((?:.{3})*?)" + codon, "$1" + replacement);
If the input is not guaranteed to be valid, you can just do a check with the regex ^[ATCG]*$ (allow non-multiple of 3) or ^([ATCG]{3})*$ (sequence must be multiple of 3). It doesn't make sense to operate on invalid input anyway.
Explanation
The construction above works for any codon. For the sake of explanation, let the codon be AAA. The regex will be \G((?:.{3})*?)AAA.
The whole regex actually matches the shortest substring that ends with the codon to be replaced.
\G # Must be at beginning of the string, or where last match left off
((?:.{3})*?) # Match any number of codon, lazily. The text is also captured.
AAA # The codon we want to replace
We make sure the matches only starts from positions whose index is multiple of 3 with:
\G which asserts that the match starts from where the previous match left off (or the beginning of the string)
And the fact that the pattern ((?:.{3})*?)AAA can only match a sequence whose length is multiple of 3.
Due to the lazy quantifier, we can be sure that in each match, the part before the codon to be replaced (matched by ((?:.{3})*?) part) does not contain the codon.
In the replacement, we put back the part before the codon (which is captured in capturing group 1 and can be referred to with $1), followed by the replacement codon.
NOTE
As explained in the comment, the following is not a good solution! I leave it in so that others will not fall for the same mistake:
You can usually find out where a match starts and ends via m.start() and m.end(). If m.start() % 3 == 0 you found a relevant match.
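The comment referred to is not reproduced here, so the following is only a guess at the problem: a regex search returns non-overlapping matches, and a match that starts off-frame can swallow the letters of an in-frame codon. A small Python sketch (function names are hypothetical) shows the effect on `CAAAAA`, whose codons are `CAA AAA`:

```python
import re


def aligned_starts_naive(dna: str, codon: str):
    """Filter ordinary (non-overlapping) matches by start % 3 == 0."""
    return [m.start() for m in re.finditer(re.escape(codon), dna) if m.start() % 3 == 0]


def aligned_starts_overlapping(dna: str, codon: str):
    """Same filter, but a lookahead lets matches overlap, so none are skipped."""
    pattern = "(?=" + re.escape(codon) + ")"
    return [m.start() for m in re.finditer(pattern, dna) if m.start() % 3 == 0]
```

On `CAAAAA` with codon `AAA`, the naive version finds only the match starting at index 1 and therefore reports no in-frame codon, while the lookahead version reports index 3.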

Regular expression needed for specific string in PHP

I need a regular expression to validate a string with the following conditions:
The string may contain any of: digits, space, + - ( ) / .
If the string contains anything else, it is invalid.
If there is a + in the string, it must be at the beginning and there must be at most one +; otherwise the string is invalid.
The string should be 7 to 20 characters long.
It is not compulsory to have all of the characters digits, space, + - ( ) / .
But it is compulsory to contain at least 7 digits.
I think you are validating phone numbers in E.164 format. Phone numbers can come in many other formats. They can contain . too, and multiple spaces in a number are not uncommon. So it's better to convert all numbers to a common format and store that format in the db. If the common format is wrong, you can throw an error.
I validate phone numbers like this:
function validate_phone($phone){
    // strip everything that is not a digit and add + at the beginning
    $e164 = "+". preg_replace('/\D+/', '', $phone);
    // check validity by length
    return (strlen($e164)>6 && strlen($e164)<21);
}
Here I store $e164 in the db if it's valid.
Even after that, you have not really validated the phone number. A valid phone number format does not mean it's a valid number. For that, an SMS or call is made to the number and an activation code is sent. Once the user enters the code, the phone number is fully validated.
You can do this in one regex:
/^(?=(?:.*\d){7})[0-9 ()\/+-][0-9 ()\/-]{6,19}$/
However I would personally do something like:
/^[0-9 ()\/+-][0-9 ()\/-]{6,19}$/
And then strip any non-digit and see if the remaining string is 7 or longer.
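The two-step approach (check the overall shape, then count digits) might look like this in Python. This sketch assumes that '.' is one of the allowed characters, as the rule list in the question suggests, and the names `SHAPE_RE` and `is_valid_phone` are made up for the example.

```python
import re

# First character may be '+'; the remaining 6 to 19 characters may not.
SHAPE_RE = re.compile(r"[0-9 ()/.+-][0-9 ()/.-]{6,19}")


def is_valid_phone(text: str) -> bool:
    """Check allowed characters and length first, then require at least 7 digits."""
    if SHAPE_RE.fullmatch(text) is None:
        return False
    digits = re.sub(r"\D", "", text)  # strip everything that is not a digit
    return len(digits) >= 7
```

Keeping the digit count out of the regex avoids the look-ahead and makes each rule easy to read on its own.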
Let's try ...
preg_match('/^(?=(?:.*\d){7})[+\d\s()\/\-\.][\d\s()\/\-\.]{6,19}$/', $text);
Breaking this down:
We start with a positive look-ahead that requires at least 7 digits somewhere in the string.
Then we match one character from the full set of valid characters, including the plus.
This is followed by between 6 and 19 of the valid characters without the plus, which gives a total length of 7 to 20.
A little more concise:
^\+?(?=(.*\d){7})[()/\d-]{7,19}$
'Course, why would you even use regular expressions?
function is_valid($string) {
    $digits = 0;
    $length = strlen($string);
    if ($length < 7 || $length > 20) {
        return false;
    }
    for ($i = 0; $i < $length; $i++) {
        if (ctype_digit($string[$i])) {
            $digits++;
        } elseif ($string[$i] === '+') {
            // a plus sign is only allowed as the first character
            if ($i !== 0) {
                return false;
            }
        } elseif (strpos('-() /.', $string[$i]) === false) {
            // anything outside the allowed punctuation is invalid
            return false;
        }
    }
    return $digits >= 7;
}