Regex: change the order of the controls

I would like to check if a string contains:
at least 1 number
at least 2 characters (uppercase or lowercase)
This is the regex I thought I might use:
(?=(?:.*?\d))(?=(?:.*?[A-Za-z]){2})
With aa1 the test returns false, while with a1a or 1aa it returns true.
The strange thing is that if I change the order of the controls in the regex:
(?=(?:.*?[A-Za-z]){2})(?=(?:.*?\d))
all 3 of the test strings I used return true.
How is this possible?
Thanks

You wouldn't happen to be writing this in JavaScript and testing in Internet Explorer, would you? That configuration has a known bug that causes this kind of error.

It is not strange. Your first regex checks if there is one number followed somewhere by 2 chars.
The second one checks it the other way around. You need to take both cases into account.
Something like this should work (not tested):
/(\d(?:.*?)[a-z]{2})|([a-z]{2}(?:.*?)\d)/i

Try this:
(?=\D*\d)(?=[^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z])
Or a little more compact:
(?=\D*\d)(?=(?:[^A-Za-z]*[A-Za-z]){2})
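As a quick cross-check, here is a minimal sketch in Python. The question does not name an engine, so using Python's re module is an assumption. It runs both orderings from the question and the compact pattern above against the three test strings. Lookaheads are zero-width, so both assertions are evaluated from the same position; in an engine with the usual semantics the order should not matter, which is consistent with the Internet Explorer comment above.

```python
import re

# The two orderings from the question, plus the compact rewrite from the answer.
PATTERNS = {
    "digit first": r"(?=(?:.*?\d))(?=(?:.*?[A-Za-z]){2})",
    "letters first": r"(?=(?:.*?[A-Za-z]){2})(?=(?:.*?\d))",
    "compact": r"(?=\D*\d)(?=(?:[^A-Za-z]*[A-Za-z]){2})",
}


def check(pattern, text):
    """Return True if both lookaheads succeed at the start of text."""
    return re.match(pattern, text) is not None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        # aa1, a1a and 1aa are the question's inputs; a1 and ab are extra negatives.
        results = {s: check(pattern, s) for s in ("aa1", "a1a", "1aa", "a1", "ab")}
        print(name, results)
```

If an engine gives different answers for the two orderings, that points at the engine rather than at the pattern.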

Related

Is there a simple way in Hive to replace non-numeric characters, excluding -, so that both negative and positive numbers are allowed?

The following gives me 9090, but I want to get -9090:
regexp_replace('abcd-9090','[^0-9]','')
If I use regexp_replace('abcd-9090','[^0-9-]','')
then it gives -9090,
but when the string is abcd9090- it gives me 9090-.
There could be many more such cases, I guess; for example, abc-abcd-9090 would give me --9090. But it is safe to assume that this will not happen and that there will be only a single - before the numeric value.
Since there could be many cases, I am just supposed to assume the best and replace the flawed code with a more robust pattern that almost always produces an integer.
For example, it is okay to assume that only a single - can appear, directly before the digits in the string.
Any help is appreciated.
I guess you can try to use regexp_extract instead:
regexp_extract('abcd-9090','.*(-[0-9]+)',1)
Update from the comments: the author needs to handle one more corner case:
regexp_extract(regexp_replace('-ab2cd9090','[^\\d-]+',''),'(-?\\d+)',1)
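The two-step idea in the last expression (strip everything that is not a digit or a minus sign, then extract an optionally signed run of digits) can be sketched outside Hive. The Python below is only an illustration: the function name extract_signed_int is hypothetical, Python's re module stands in for Hive's regex functions, and returning an empty string when nothing matches is an assumption of this sketch.

```python
import re


def extract_signed_int(text):
    """Strip non-digit, non-minus characters, then pull out -?digits."""
    # Step 1: drop everything that is neither a digit nor a minus sign.
    cleaned = re.sub(r"[^\d-]+", "", text)
    # Step 2: take the first run of digits, with an optional leading minus.
    match = re.search(r"(-?\d+)", cleaned)
    # Assumption for this sketch: return "" when there are no digits at all.
    return match.group(1) if match else ""


if __name__ == "__main__":
    for sample in ("abcd-9090", "abcd9090-", "-ab2cd9090"):
        print(sample, "->", extract_signed_int(sample))
```

Note that a trailing minus (abcd9090-) is dropped by the second step, because the minus sign is only accepted directly before the digits.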

How to programmatically learn regexes?

My question is a continuation of this one. Basically, I have a table of words like so:
HAT18178_890909.098070313.1
HAT18178_890909.098070313.2
HAT18178_890909.143412462.1
HAT18178_890909.143412462.2
For my purposes, I do not need the terminal .1 or .2 for this set of names. I can manually write the following regex (using Python syntax):
r = re.compile(r'(.*\.\d+)\.\d+')
However, I cannot guarantee that my next set of names will have a similar structure where the final 2 characters will be discardable - it could be 3 characters (e.g. .12) and the separator could change as well (e.g. . to _).
What is the appropriate way to either explicitly learn a regex or to determine which characters are unnecessary?
It's an interesting problem.
X                            y
HAT18178_890909.098070313.1  HAT18178_890909.098070313
HAT18178_890909.098070313.2  HAT18178_890909.098070313
HAT18178_890909.143412462.1  HAT18178_890909.143412462
HAT18178_890909.143412462.2  HAT18178_890909.143412462
The problem is that there is not a single solution but many.
Even for a human it is not clear which regex you want.
Based on this data, I would think the possibilities to learn are:
Just match a fixed width of 25: .{25}
Fixed first part: HAT18178_890909.
Then:
There are only 2 different digits at each position (as you show 2 cases).
So e.g. [01] (either 0 or 1), [94] for the next position, and so on would be a good solution.
The obvious one would be \d+
But it could also be \d{9}
You see, there are multiple correct answers.
These regexes would still work if the second dot were an underscore instead.
My conclusion:
The problem is that it is much more work to prepare the data for machine learning than it is to create a regex. If you want to be sure you cover everything, you need to have complete data, so then a regex is probably less effort.
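To make the "multiple correct answers" point concrete, here is a small Python sketch that checks several of the candidate patterns listed above against the four example pairs. The names PAIRS, CANDIDATES and is_consistent are hypothetical helpers introduced for illustration; the patterns themselves are assembled from the possibilities in this answer.

```python
import re

# Training pairs (X, y) copied from the table in this answer.
PAIRS = [
    ("HAT18178_890909.098070313.1", "HAT18178_890909.098070313"),
    ("HAT18178_890909.098070313.2", "HAT18178_890909.098070313"),
    ("HAT18178_890909.143412462.1", "HAT18178_890909.143412462"),
    ("HAT18178_890909.143412462.2", "HAT18178_890909.143412462"),
]

# Candidate patterns built from the possibilities listed above.
CANDIDATES = [
    r".{25}",                                                 # fixed width
    r"HAT18178_890909\.[01][94][83][04][71][02][34][16][32]",  # per-position classes
    r"HAT18178_890909\.\d+",                                  # any run of digits
    r"HAT18178_890909\.\d{9}",                                # exactly nine digits
]


def is_consistent(pattern):
    """True if the match at the start of every X equals the expected y."""
    for x, y in PAIRS:
        m = re.match(pattern, x)
        if m is None or m.group(0) != y:
            return False
    return True


if __name__ == "__main__":
    for pattern in CANDIDATES:
        print(pattern, is_consistent(pattern))
```

All four candidates reproduce the examples, so the data alone cannot tell a learner which one was intended.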
You could split on non-alphanumeric characters:
[^a-zA-Z0-9']+
That would get you, in this case, a few strings like this:
HAT18178
890909
098070313
1
From there on you can simply discard the last one if it is never needed, and continue processing the first sequences.
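A minimal Python sketch of this splitting approach, under the assumption that the last token is always the discardable one. The helper names split_name and drop_last_token are hypothetical; the second helper removes the final separator together with the token after it, so the original separators in the rest of the name are preserved.

```python
import re

# Separator class from the answer: anything that is not a letter, digit, or apostrophe.
SEPARATOR = r"[^a-zA-Z0-9']+"


def split_name(name):
    """Split a name into its alphanumeric tokens."""
    return re.split(SEPARATOR, name)


def drop_last_token(name):
    """Remove the final separator and the token after it; keep the rest intact."""
    return re.sub(SEPARATOR + r"[a-zA-Z0-9']+$", "", name)


if __name__ == "__main__":
    print(split_name("HAT18178_890909.098070313.1"))
    print(drop_last_token("HAT18178_890909.098070313.1"))
```

Because the separator is a character class rather than a literal dot, the same helper also handles a longer suffix or a different separator such as _12.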

Using Regex to validate the number of words in a text area

I am attempting to write an MVC model validation that verifies that there are 10 or more words in a string. The string is being populated correctly, so I did not include the HTML. I have done a fair bit of research, and it seems that something along the lines of what I have tried should work but, for whatever reason, my attempts always seem to fail. Any ideas as to what I am doing wrong here?
(Using System.ComponentModel.DataAnnotations in an MVC 4 VB.NET environment.)
I have tried ([\w]+){10,}, ((\\S+)\s?){10,}, [\b]{20,}, [\w+\w?]{10,}, (\b(\w+?)\b){10,}, ([\w]+?\s){10}, ([\w]+?\s){9}[\w], ([\S]+\s){9}[\S], ([a-zA-Z0-9,.'":;$-]+\s+){10,} and several more variations on the same basic idea.
<Required(ErrorMessage:="The Description of Operations field is required"), RegularExpression("([\w]+){20,}", ErrorMessage:="ERROZ")>
Public Property DescOfOperations As String = String.Empty
Correct Solution was ([\S]+\s+){9}[\S\s]+
EDIT: Moved the accepted version to the top and removed the unused versions. Unless I am wrong, the whole sequence needs to match, so use something like this (also accounting for double spaces):
([\S]+\s+){9}[\S\s]+
Or:
([\w]+?\s+){9}[\w]+
Give this a try:
([a-zA-Z0-9,.'":;$-]+\s){10,}
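The difference between the patterns becomes clearer once the whole value has to match, as the edit above suspects. The Python sketch below is an illustration only: re.fullmatch stands in for a validator that requires the entire string to match, and the function name has_at_least_ten_words is hypothetical.

```python
import re

# Pattern reported above as the correct solution.
TEN_WORDS = re.compile(r"([\S]+\s+){9}[\S\s]+")


def has_at_least_ten_words(text):
    # fullmatch stands in for a validator that requires the entire value to match.
    return TEN_WORDS.fullmatch(text) is not None


if __name__ == "__main__":
    print(has_at_least_ten_words("one two three four five six seven eight nine ten"))
    print(has_at_least_ten_words("one two three four five six seven eight nine"))
```

Nine word-plus-whitespace groups followed by at least one more character covers ten or more words; a pattern such as ([\w]+){20,} has no whitespace in it at all, so it cannot span a multi-word value under a whole-string match.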

How to create a regex to check whether a set of words exists in a given string?

How can I write a regex to check if a set of words exist in a given string?
For example, I would like to check if a domain name contains "yahoo.com" at the end of it.
'answers.yahoo.com', would be valid.
'yahoo.com.answers', would be wrong. 'yahoo.com' must come at the end.
I got a hint from somewhere that it might be something like this.
"/^[^yahoo.com]$/"
But I am totally new to regex. So please help with this one, then I can learn further.
When asking regex questions, always specify the language or application, too!
From your history it looks like JavaScript / jQuery is most likely.
Anyway, to test that a string ends in "yahoo.com" use /.*yahoo\.com$/i
In JS code:
if (/.*yahoo\.com$/i.test(YOUR_STR)) {
    //-- It's good.
}
To test whether a set of words has at least one match, use:
/word_one|word_two|word_three/
To limit matches to just the most-common, legal sub-domains, ending with "yahoo.com", use:
/^(\w+\.)+yahoo\.com$/
(As a crude, first pass)
For other permutations, please clarify the question.
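For comparison, here is a minimal sketch of the same two checks in Python (an assumption, since the answer targets JavaScript). The function names ends_with_yahoo and is_yahoo_subdomain are hypothetical. It also shows why the stricter sub-domain pattern can be preferable: the plain suffix test accepts any host that merely ends in those characters.

```python
import re


def ends_with_yahoo(host):
    """Case-insensitive check that the string ends in yahoo.com."""
    return re.search(r"yahoo\.com$", host, re.IGNORECASE) is not None


def is_yahoo_subdomain(host):
    """Stricter check: one or more dot-terminated labels, then yahoo.com."""
    return re.fullmatch(r"(\w+\.)+yahoo\.com", host) is not None


if __name__ == "__main__":
    for host in ("answers.yahoo.com", "yahoo.com.answers", "notyahoo.com"):
        print(host, ends_with_yahoo(host), is_yahoo_subdomain(host))
```

The hinted pattern /^[^yahoo.com]$/ does something different: the square brackets form a negated character class, so it matches a single character that is not one of those letters or a dot.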

How to validate with regex that a string is OK as long as it contains 10 digits?

I'm processing input from a Web form. Basically, all I care about is that the value provided includes 10 digits, no more, no less.
These would be valid inputs:
1234567890
123 456 789 0 Hello!
My number is: 123456-7890 thanks
These would be invalid inputs:
123456789033 (too long)
123 Hello! (too short)
My number is one five zero nine thanks (no digits)
I've tried many different things with Regextester but it never matches correctly. I'm using the 'preg' setting (which is what I figured my CMS Typo3 uses) and my closest attempt is:
([0-9][^0-9]*){10}
which is kinda lame but is the closest I got.
Cheers!
EDIT: I cannot use any programming language to implement this. Imagine that I have an admin console field in front of me, in which I must enter a regular expression that will be used to validate the value. That's all the latitude I have. Cheers.
I think you've got the right idea. Maybe you can simplify it as (\d\D*){10}
If the regex has to match the complete string, you would want \D*(\d\D*){10}
UPDATE: It looks like you need ^\D*(\d\D*){10}$ to make sure you match the complete string.
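A quick way to convince yourself that the anchored pattern behaves as intended is to run it against the sample inputs from the question. The sketch below uses Python's re module purely as a test bench (an assumption; the actual validator is a preg field in the CMS), and the function name has_exactly_ten_digits is hypothetical.

```python
import re

# Anchored pattern from the update above: ten digits, each optionally
# followed by non-digits, with optional non-digits at the start.
TEN_DIGITS = re.compile(r"^\D*(\d\D*){10}$")


def has_exactly_ten_digits(value):
    return TEN_DIGITS.match(value) is not None


if __name__ == "__main__":
    samples = [
        "1234567890",
        "123 456 789 0 Hello!",
        "My number is: 123456-7890 thanks",
        "123456789033",
        "123 Hello!",
        "My number is one five zero nine thanks",
    ]
    for sample in samples:
        print(repr(sample), has_exactly_ten_digits(sample))
```

Without the anchors, the unanchored (\d\D*){10} would also succeed inside a longer run of digits, which is why 123456789033 needs the ^ and $.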
A regular expression is not always the best tool for this kind of job. In this case it's probably easier and simpler to write a function to count the number of digits in a string. (Since you didn't mention a programming language, I'll use Python in my example.)
def count_digits(s):
    # Count the characters in s that are digits.
    return len([x for x in s if x.isdigit()])
Then, you can use it like this:
s = "My number is: 123456-7890 thanks"
if count_digits(s) == 10:
    print("looks okay")
else:
    print("doesn't contain 10 digits")