Regex - disallow combinations like "u12345"

I need to disallow combinations with this structure:
starts with a lowercase "u"
the (up to) 5 characters that follow cannot all be digits (this applies only if the string starts with "u")
apart from this disallowed combination, allow only [a-zA-Z0-9]+
The only regex I managed is ^[^u][^0-9][^0-9][^0-9][^0-9][^0-9]$, because I have no idea how to apply the restriction only when the string starts with "u".
List of some allowed combinations:
u12adfw3
u1a234
ud1235
And list of disallowed combinations:
u12345
u91
u1
I need this for aliases, because the system generates names like "u20". I am creating a system where a user can be identified by name, alias, or e-mail (the string is simply looked up in the database), and because users are not required to set their own alias, I want to impose some limits here. The regex will be used in the "pattern" attribute of an HTML input tag, or in a PHP check after submission.
If you know any good tutorials or topics about a similar problem, or you just want to help me, thank you in advance :)
Greetings

If you're checking that in PHP, you could use preg_match and check with this regex:
^(?!u\d{1,5}\b)
preg_match will return 0 (no match) if the string begins with a u followed by 1 to 5 digits that end the word.
^ matches at the beginning of the string.
(?! ... ) is a negative lookahead. If what's inside matches, the whole regex will fail.
u\d{1,5} is to match u followed by 1 to 5 digits.
\b is a word boundary, so the digits cannot be followed by another word character (a letter, digit, or underscore).
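A minimal sketch in Python rather than PHP (Python's re module treats this lookahead the same way as preg_match). Combining the answer's lookahead with the asker's own [a-zA-Z0-9]+ rule is an assumption about how the two requirements fit together, and is_allowed_alias is a hypothetical helper name. fullmatch anchors both ends, playing the role of ^...$.

```python
import re

# Negative lookahead from the answer (reject "u" + 1-5 digits ending the word),
# combined (assumption) with the asker's "letters and digits only" rule.
ALIAS_RE = re.compile(r"(?!u\d{1,5}\b)[a-zA-Z0-9]+")

def is_allowed_alias(alias: str) -> bool:
    """Hypothetical helper: True if alias is alphanumeric and not 'u' + 1-5 digits."""
    return ALIAS_RE.fullmatch(alias) is not None

for s in ["u12adfw3", "u1a234", "ud1235", "u12345", "u91", "u1"]:
    print(s, is_allowed_alias(s))
```

The first three examples from the question are accepted and the last three are rejected.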


Regex to match (extract) string between dots (.)

I want to select some dotted string combinations from a very long SQL string. The full string could be a single line or multiple lines separated by newlines, and a combination could appear at the start (on the first line), on a later line, or in both places.
I need help in writing a regex for it.
Examples:
String s = I am testing something like test.test.test in sentence.
Expected output: test.test.test
Example 2 (real use case):
UPDATE test.table
SET access = 01
WHERE access IN (
SELECT name FROM project.dataset.tablename WHERE name = 'test' GROUP BY 1 )
Expected output: test.table and project.dataset.tablename
Also, can I require certain prefix or suffix words (or spaces) to be present wherever this logic is checked? In the case above, for an UPDATE statement the regex should pick test.table, but if the statement is like select test.table, the regex should not pick up this combination; the same applies to suffixes.
Example 3: this illustrates the above.
INS INTO test.table
SEL 'abcscsc', wu_id.Item_Nbr ,1
FROM test.table as_t
WHERE as_t.old <> 0 AND as_t.date = 11
AND (as_t.numb IN ('11') )
Expected Output: test.table, test.table (Key words are INTO and FROM)
Not needed in the selection: as_t.numb, as_t.old, as_t.date
Once I have the regex, I can use it in my program to extract these words.
Note: the words before and after the combination could be anything, like update, select, { or (, so we have to find every occurrence of words joined together with a . (dot), and the number of such occurrences.
I tried something like this:
(?<=.)(.?)(?=.)(.?) : this only selected the word between two dots, not the whole combination.
.(?<=.)(.?)(?=.)(.?). : this selected everything before and after.
To solve your initial problem, we can just use some negation. Here's the pattern I came up with:
[^\s]+\.[^\s]+
[^ ... ] creates a character class that includes everything except what is between the brackets. In this case, I put \s in there, which matches any whitespace. So [^\s] matches anything that isn't whitespace.
+ is a quantifier. It means to match one or more of the preceding construct, as many as possible without breaking the match. This would happily match everything that's not whitespace, but I follow it with \., which matches a literal dot. The \ is necessary because . means "match any character" in regex, so we need to escape it so it only has its literal meaning. This means there has to be a . in this group of non-whitespace characters.
I end the pattern with another [^\s]+, which matches everything after the . until the next whitespace.
Now, to solve your secondary problem, you want to make this match only work if it is preceded by a given keyword. Luckily, regex has a construct almost specifically for this case. It's called a lookbehind. The syntax is (?<= ... ) where the ... is the pattern you want to look for. Using your example, this will only match after the keywords INTO and FROM:
(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+
Here (?:INTO|FROM) means to match either the text INTO or the text FROM. I then specify that it should be followed by a whitespace character with \s. One possible problem here is that it will only match if the keywords are written in all upper case. You can change this behavior by specifying the case insensitive flag i to your regex parser. If your regex parser doesn't have a way to specify flags, you can usually still specify it inline by putting (?i) in front of the pattern, like so:
(?i)(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+
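A quick Python check of the lookbehind pattern on Example 3, compared with the plain pattern that has no keyword requirement. One flavor-specific caveat: Python's re only accepts fixed-width lookbehinds, which (?:INTO|FROM)\s is (both keywords are 4 characters plus one whitespace), but a keyword list with different lengths, such as UPDATE|FROM, would be rejected by Python's re.

```python
import re

SQL = """INS INTO test.table
SEL 'abcscsc', wu_id.Item_Nbr ,1
FROM test.table as_t
WHERE as_t.old <> 0 AND as_t.date = 11
AND (as_t.numb IN ('11') )"""

# Only dotted names that directly follow INTO or FROM, case-insensitive
KEYWORD_NAME = re.compile(r"(?i)(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+")
print(KEYWORD_NAME.findall(SQL))  # ['test.table', 'test.table']

# Without the lookbehind, every dotted token is picked up
print(re.findall(r"[^\s]+\.[^\s]+", SQL))
# ['test.table', 'wu_id.Item_Nbr', 'test.table', 'as_t.old', 'as_t.date', '(as_t.numb']
```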
If you are new to regex, I highly recommend the www.regex101.com website for testing regexes and learning how they work. Don't forget to check out its code generator, which produces the regex code for the programming language you are using; that's a cool feature.
For your question, you need a regex that matches a word character \w zero or more times, followed by a dot, followed by another run of zero or more word characters, with the whole group repeated.
So here is my solution to your question:
Your regex in JavaScript:
const regex = /([\w]*[.][\w]*)+/gm;
in Java:
final String regex = "([\\w]*[.][\\w]*)+";
in Python:
regex = r"([\w]*[.][\w]*)+"
in PHP:
$re = '/([\w]*[.][\w]*)+/m';
Note that this solution is written for your use case (SQL strings): it will also match something like '.word' or 'word..word', which I assume does not occur in your strings.
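A Python sketch of the pattern as the prose describes it (zero or more \w, a dot, zero or more \w, with the group repeated), run on Example 2. One detail worth knowing: because the pattern contains a capture group, re.findall would return only the last repetition of that group (for example ".tablename"), so the sketch reads the whole match (group 0) from re.finditer instead.

```python
import re

SQL = """UPDATE test.table
SET access = 01
WHERE access IN (
SELECT name FROM project.dataset.tablename WHERE name = 'test' GROUP BY 1 )"""

# Zero or more word chars, a dot, zero or more word chars; the group repeats
DOTTED = re.compile(r"([\w]*[.][\w]*)+")

# Take the whole match from finditer(); findall() would give only the
# last repetition of the capture group.
names = [m.group(0) for m in DOTTED.finditer(SQL)]
print(names)  # ['test.table', 'project.dataset.tablename']
```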

Regular expression starts with a string but not contain a special character after that

I am trying to find a regular expression that matches the start of a string but does not allow a specific character after it. This should give me only the same-level routes.
Example: let's say I have the following strings and I need to get the routes starting from LAX with no stops.
LAX-LAS-JFK
LAX-PHX-JFK
LAX-JFK
LAX-PHX
The regex should match only route 3 and 4.
I have tried ^LAX-([^-])*, but it didn't work when I checked it on https://www.regextester.com/15.
You can try this:
^LAX(-[A-Z]+){1}$
This matches
LAX-JFK
LAX-PHX
but not
LAX-LAS-JFK
LAX-PHX-JFK
Explanation:
^ start
$ end
{1} exact number of repetitions of a pattern, in this case 1
Fun fact: you can replace the 1 with (number of stops + 1), and it will select only the routes with that number of stops.
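The "number of stops + 1" idea can be sketched in Python by building the quantifier from a parameter; routes_with_stops is a hypothetical helper name, and re.escape guards the origin code in case it ever contains regex metacharacters.

```python
import re

ROUTES = ["LAX-LAS-JFK", "LAX-PHX-JFK", "LAX-JFK", "LAX-PHX"]

def routes_with_stops(routes, origin, stops):
    """Hypothetical helper: routes from `origin` with exactly `stops` intermediate stops."""
    # Same shape as ^LAX(-[A-Z]+){1}$, with the count set to stops + 1
    # ({{ and }} produce literal braces inside the f-string)
    pattern = re.compile(rf"^{re.escape(origin)}(-[A-Z]+){{{stops + 1}}}$")
    return [r for r in routes if pattern.match(r)]

print(routes_with_stops(ROUTES, "LAX", 0))  # ['LAX-JFK', 'LAX-PHX']
print(routes_with_stops(ROUTES, "LAX", 1))  # ['LAX-LAS-JFK', 'LAX-PHX-JFK']
```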
So it sounds like you want to match strings that contain only one dash. Perhaps something like ^(LAX)(-{1})[a-zA-Z]+$ would work? It checks that the string LAX is at the beginning, followed by one dash and then only alphabetical characters until the end.

Regex to MATCH number string (with optional text) in a sentence

I am trying to write a regex that matches only strings like this:
89-72
10-123
109-12
122-311(a)
22-311(a)(1)(d)(4)
These strings are embedded in sentences and sometimes there are 2 potential matches in the sentence like this:
In section 10-123 which references section 122-311(a) there is a phone number 456-234-2222
I do not want to match the phone number. Here is my current working regex:
\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*
I've been searching Stack Overflow and have not found anything yet. Any help would be appreciated. I will be using this in a Google Sheet and potentially in PostgreSQL.
Based on the regex suggested by @Wiktor Stribiżew:
=REGEXEXTRACT(A1,REPT("\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)(?:.*)",LEN(REGEXREPLACE(REGEXREPLACE(A1,"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)", char (9)),"[^"&char(9)&"]",""))))
The formula will return all matches.
Input (cell A1):
In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222
Output (cells B1, C1, D1):
22-311(a)(1)(d)(4) | 10-123 | 122-311(a)
Solution
To extract all matches from a string, use this pattern:
=REGEXEXTRACT(A1,
REPT(basic_regex & "(?:.*)",
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))))
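The tail of the formula:
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]","")))
only counts how many times the pattern occurs in the string (3 in this example): every match is replaced by a tab character, all other characters are removed, and LEN counts the remaining tabs.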
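To see why the REPT trick returns every match, here is a Python emulation of the formula (an illustration only, not how Google Sheets is implemented): count the occurrences, then repeat basic_regex & "(?:.*)" that many times so the combined pattern has one capture group per occurrence. extract_all_like_sheets is a hypothetical name; the basic regex is the one from the formula above, which Python's re also accepts.

```python
import re

# basic_regex from the formula above
BASIC = r"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)"

def extract_all_like_sheets(text: str, basic: str = BASIC):
    """Hypothetical emulation of the REGEXEXTRACT/REPT formula."""
    # Step 1: count occurrences (the LEN(REGEXREPLACE(...)) tail)
    count = len(re.findall(basic, text))
    if count == 0:
        return ()
    # Step 2: repeat basic + "(?:.*)" count times (the REPT(...) part),
    # giving one capture group per occurrence
    big = (basic + "(?:.*)") * count
    m = re.search(big, text)
    return m.groups() if m else ()

TEXT = "In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222"
print(extract_all_like_sheets(TEXT))
```

The phone number is skipped because (?:[^-]|$) refuses a dash right after the second digit group.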
To avoid matching the phone number, you have to specify that the match must be neither preceded nor followed by \d or -. Google Sheets uses RE2, which does not support lookaround assertions (see RE2's list of supported syntax), so as far as I can tell the only solution is to match a character before and after the match, or the string boundary:
(?:^|[^-\d])\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*(?:$|[^-\d])
(?:^|[^-\d]) means either the start of a line (^) or a character that is neither - nor \d (you might want to change that and forbid all letters as well). $ is the end of a line. Note that ^ and $ match at line boundaries only with the multiline flag (m); without it they match only at the start and end of the whole text.
This finds the correct strings, but some matches include an extra space (the surrounding character) on one or both sides.
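One way to drop the extra characters, sketched in Python: wrap the wanted part in a capture group (an adjustment, not part of the answer) and keep only that group; the parenthesis group is made non-capturing so findall returns one string per match. A limitation worth knowing: since the boundary characters are consumed, two numbers separated by a single character (e.g. "10-123 12-34") yield only the first.

```python
import re

# The answer's pattern, with the wanted part wrapped in a capture group
# (adjustment) and the (a)-style group made non-capturing.
SECTION = re.compile(r"(?:^|[^-\d])(\d{2,3}\-\d{2,3}(?:\([a-zA-Z0-9]\))*)(?:$|[^-\d])")

text = "In section 10-123 which references section 122-311(a) there is a phone number 456-234-2222"
print(SECTION.findall(text))  # ['10-123', '122-311(a)']
```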

Interesting easy looking Regex

I am rephrasing my question to clear up confusion!
I want to match a string if it contains certain letters; for this I use the character class:
[ACD]
and it works perfectly!
But I want to match only if the string contains those letters 2 or more times, either the same letter repeated or 2 different letters.
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those letters appear only once!
That's why I was trying to use:
[ACD]{2,}
But it's not working; it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it in MySQL. A different approach is also welcome, but I'd like to use regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of a set of letters (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
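Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier, which is not supported at all. (This applies to MySQL versions before 8.0.4; from 8.0.4 on, MySQL uses the ICU regex library, which does support lazy quantifiers.)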
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
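The same check can be tried outside MySQL, here in Python's re, against the examples from the question (the pattern is identical; Python's search plays the role of REGEXP, which also matches anywhere in the string).

```python
import re

# At least two characters from the set, with anything in between
TWO_OF_SET = re.compile(r"[AKL].*[AKL]")

should_match = ["ABCVL", "AAGHF", "KKUI", "AKL"]
should_not = ["ABCD", "KHID", "LOVE"]

for s in should_match + should_not:
    print(s, bool(TWO_OF_SET.search(s)))
```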
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for 2 or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (...UVWXZ]), so Z cannot be matched, since it has no neighboring character from your set; the same applies to A, because B is also excluded ([ACD...). For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA.
If those letters were not excluded on purpose, it would probably be better to use a range like [A-Z].
If you want 2 or more matches of [AKL], you can use just [AKL] and check that the number of matches is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in simple English: use [AKL] as your regex, and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
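import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts matches of [AKL] and reports whether there are at least two
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[AKL]").matcher(string);
    int counter = 0;
    // each call to find() moves on to the next match
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}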
You can't use [ACD]{2,} because it only matches 2 or more matching characters in a row, so it fails when the matching characters are separated by other characters.
Your question is not very clear, but here is my attempt:
\b(\S*[AKL]\S*[AKL]\S*)\b
Pretty sure this should work in any case:
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replace AKL with the letters you need; this can easily be done dynamically, tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters surrounded by anything.
It is a .NET regex, but it should be the same in other flavors.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
For example, if you needed a minimum of 7 matches and were using your previous character class [ACDF-KM-XZ]:
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
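The "change the number" idea can be parameterized. This is a Python sketch that mirrors the MySQL pattern with Python's re (assumption: both engines treat this simple pattern the same way); has_at_least is a hypothetical helper name.

```python
import re

def has_at_least(text, char_class, n):
    """Hypothetical helper: True if text contains at least n characters from char_class."""
    # Same idea as ([ACD].*){n}: one set character followed by anything,
    # repeated n times ({{ and }} are literal braces in the f-string)
    pattern = re.compile(rf"({char_class}.*){{{n}}}")
    return pattern.search(text) is not None

print(has_at_least("ABCD", "[ACD]", 3))  # True: A, C and D
print(has_at_least("ABCD", "[ACD]", 4))  # False: only three set characters
```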
Response before edit:
Your regex is trying to find at least two consecutive characters from the set [ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is that you are looking for two or more adjacent characters that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters below that do not match your set are B, E, L and Y:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you were to do a global search in the string, only these runs would be matched: CD, FGHIJK and MNOPQRSTUVWX.
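The explanation can be checked with a global search in Python's re (findall returns every non-overlapping match of the original {2,} pattern).

```python
import re

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
OLD_CLASS = "[ACDFGHIJKMNOPQRSTUVWXZ]"

# Global search for runs of 2+ adjacent characters from the set
runs = re.findall(OLD_CLASS + "{2,}", ALPHABET)
print(runs)  # ['CD', 'FGHIJK', 'MNOPQRSTUVWX']

# Letters of the alphabet that are not in the set at all
missing = [c for c in ALPHABET if not re.fullmatch(OLD_CLASS, c)]
print(missing)  # ['B', 'E', 'L', 'Y']
```

A and Z each sit next to an excluded letter (B and Y), so neither can be part of a run of two.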

Regex validation to disallow submission of numbers which start with a given sequence

We have an issue with spammers, and I'd like to add a validation regex to the phone field in my form so that it does not allow input that starts with a particular sequence of numbers.
I am using a wordpress plugin to build up the form, and I can add custom regex validation to each field.
At the moment, my phone field is a text field, and I use this regex to allow only numbers: /^\d+$/
The prefixes I'd like to block are these:
+44704, +44714, 0704, 0714, 0044704, 0044714
Is it possible to create a regex that checks whether the input starts with one of these sequences and, if so, blocks it?
If possible, I need it to keep allowing only numbers, and additionally to allow the input only if it does not start with one of those sequences.
I hope someone will be able to help me, as I really don't understand regex at all. :(
Thank You!
You can make use of optional groups, like this:
^\+?(?:(?:00)?44|0)7[01]4
This regex matches only strings that begin with the patterns you described. To negate it, you could use a negative lookahead with the pattern above:
^(?!\+?(?:(?:00)?44|0)7[01]4)
^ matches the beginning of the line
\+? matches an optional + sign.
(?:(?:00)?44|0) matches either of: 0044, or 44, or 0
7[01]4 matches either 704 or 714.
To validate the whole input and reject those prefixes, add the part you already had, with an optional + sign:
/^(?!\+?((00)?44|0)7[01]4)\+?\d+$/
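A quick way to try the final pattern against the listed prefixes, sketched in Python (assumption: the form plugin's PHP/JavaScript regex flavor behaves the same for this simple pattern). The /.../ delimiters are dropped, fullmatch stands in for ^...$, and is_allowed_phone is a hypothetical helper name.

```python
import re

# The answer's final pattern without delimiters; fullmatch() acts as ^...$
PHONE = re.compile(r"(?!\+?((00)?44|0)7[01]4)\+?\d+")

def is_allowed_phone(value: str) -> bool:
    """Hypothetical helper: digits (optional leading +) not starting with a blocked prefix."""
    return PHONE.fullmatch(value) is not None

for number in ["+447041234567", "07141234567", "00447141234", "+441234567890", "07891234567", "12ab"]:
    print(number, is_allowed_phone(number))
```

The first three are blocked by the lookahead, the next two pass, and "12ab" fails the digits-only part.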