Using Regex to only accept x amount of a certain character - regex

I have this Regex pattern:
\b(?:[A-Z\d]+[\/\-])+[A-Z\d]+\b
It collects everything I need, but it also grabs some things I don't want. How can I express that "-" is accepted, but no more than 5 at a time, and the same for "/" but with no more than 1? Here's an example of what it's grabbing that I do want vs. what it's grabbing that I don't want:
Yes:
AIR-CT2504-50-K9
1000BASE-T
ISR4451-X-SEC/K9
No:
0/1/10/5/50
2B108-250A-2B-2B-2B-250A-2B-2B-2B-250A-2B-2B
2022/10/28

If you don't want partial matches, you might use anchors and exclude a certain number of hyphens or forward slashes.
As the strings do not seem to contain spaces, and - and / can be mixed:
^(?!(?:[^\s-]*-){5})(?!(?:[^\s\/]*\/){2})(?:[A-Z\d]+[\/-])+[A-Z\d]+$
The pattern matches:
^ Start of string
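(?!(?:[^\s-]*-){5}) Assert that the string does not contain 5 or more hyphens, where [^\s-] matches a non-whitespace char other than - (use {6} instead to allow up to 5)
(?!(?:[^\s\/]*\/){2}) Assert that the string does not contain 2 or more forward slashes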
(?:[A-Z\d]+[\/-])+ Repeat 1+ times matching 1+ chars A-Z or digits followed by either / or -
[A-Z\d]+ match 1+ chars A-Z or a digit
$ End of string
Regex demo
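As a quick check, the pattern can be run against the sample strings from the question. This is a minimal Python sketch using the standard re module; build_pattern and its max_hyphens and max_slashes parameters are hypothetical names introduced here, not part of the answer. Because a negative lookahead with {n} rejects n or more occurrences, the helper uses the limit plus one as the repeat count.

```python
import re

def build_pattern(max_hyphens: int, max_slashes: int) -> re.Pattern:
    """Hypothetical helper: allow at most the given number of '-' and '/'."""
    return re.compile(
        # Reject the string if it contains max_hyphens + 1 (or more) hyphens.
        rf"^(?!(?:[^\s-]*-){{{max_hyphens + 1}}})"
        # Reject the string if it contains max_slashes + 1 (or more) slashes.
        rf"(?!(?:[^\s/]*/){{{max_slashes + 1}}})"
        # One or more groups of A-Z/digits ending in '/' or '-', then a final group.
        r"(?:[A-Z\d]+[/-])+[A-Z\d]+$"
    )

# Same limits as the pattern in the answer: {5} and {2} mean
# at most 4 hyphens and at most 1 slash.
pattern = build_pattern(max_hyphens=4, max_slashes=1)

wanted = ["AIR-CT2504-50-K9", "1000BASE-T", "ISR4451-X-SEC/K9"]
unwanted = [
    "0/1/10/5/50",
    "2B108-250A-2B-2B-2B-250A-2B-2B-2B-250A-2B-2B",
    "2022/10/28",
]

for s in wanted + unwanted:
    print(s, "->", bool(pattern.match(s)))
```

Calling build_pattern(5, 1) instead would accept up to 5 hyphens, which is closer to the limit stated in the question.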

Related

RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that are overriden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert that the first 5 chars are digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
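The lookahead version can be checked against the examples from the question. The following is a small Python sketch with the standard re module; region_pattern and its parameter n are hypothetical names used here to show how the two quantifiers depend on the region length N (assuming N is at least 2).

```python
import re

def region_pattern(n: int = 5) -> re.Pattern:
    """Hypothetical helper: build the lookahead pattern for a region of n chars."""
    return re.compile(
        # No '-' followed by a digit inside the first n chars:
        # the '-' can sit at index n - 2 at most, hence {0, n - 2}.
        rf"^(?![\d-]{{0,{n - 2}}}-\d)"
        # The first n chars are digits or '-'.
        rf"(?=[\d-]{{{n}}})"
        r"[A-Z\d-]+$"
    )

REGION = region_pattern(5)

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_fail:
    print(f"{s!r}: {bool(REGION.match(s))}")
```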
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert that the first 5 chars are - or a digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word boundary succeeds at a position between two word characters (from \w, i.e. [A-Za-z0-9_]) or between two non-word characters (from \W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the start or end of the string.
With it, each \B\d always follows a digit and can never follow a dash.
demo
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo
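Both variants can be compared side by side on the examples from the question. This is a sketch in Python, assuming the standard re module; Python accepts the lookbehind here because ^.{5} has a fixed width of 5. The names NON_BOUNDARY and LOOKBEHIND are only labels chosen for this illustration.

```python
import re

# Variant 1: after the first char, a digit is only allowed directly after a digit.
NON_BOUNDARY = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$")

# Variant 2: digits, then dashes, and the lookbehind checks that exactly
# 5 chars have been consumed since the start of the string.
LOOKBEHIND = re.compile(r"^\d*-*(?<=^.{5})[A-Z\d-]*$")

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_fail:
    print(f"{s!r}: \\B={bool(NON_BOUNDARY.match(s))} "
          f"lookbehind={bool(LOOKBEHIND.match(s))}")
```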

I wrote url validation regex but the regex is very slow

I know this is slow because of ([\.\-][a-z0-9])*. But I don't know how to optimize it.
^https:\/\/([a-z0-9]+([\.\-][a-z0-9])*)+(\.([a-z]{2,11}|[0-9]{1,5}))(:[0-9]{1,5})?(\/.*)?$
You don't need the nested quantifiers )*)+ in your pattern. A quantified group inside another quantified group can lead to catastrophic backtracking.
Note that you only have to escape the forward slash if the delimiters for the regex are also /, and you don't have to escape the dot and the hyphen inside the character class [\.\-].
If you don't need the capture groups afterwards, you can omit them or use non-capturing groups (?:...) instead.
^https:\/\/[a-z0-9]+(?:[.-][a-z0-9]+)*\.(?:[a-z]{2,11}|[0-9]{1,5})(?::[0-9]{1,5})?(\/.*)?$
The pattern matches:
^ Start of string
https:\/\/ Match https:// (as you only want to match https)
[a-z0-9]+ Match 1+ times any of the listed
(?:[.-][a-z0-9]+)* Optionally repeat matching . or - and 1+ times any of the listed
\.(?:[a-z]{2,11}|[0-9]{1,5}) Match a dot followed by either 2-11 chars a-z or 1-5 digits
(?::[0-9]{1,5})? Optionally match : and 1-5 digits
(\/.*)? Optionally match / and the rest of the line
$ End of string
Regex demo
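A short Python sketch of the rewritten pattern, assuming the standard re module. The sample URLs are made up for illustration, and the forward slashes are left unescaped because Python has no regex delimiters. The last input is a long run of letters followed by an invalid character, the kind of string that tends to trigger heavy backtracking with nested quantifiers; here the match simply fails.

```python
import re

URL = re.compile(
    r"^https://[a-z0-9]+(?:[.-][a-z0-9]+)*"
    r"\.(?:[a-z]{2,11}|[0-9]{1,5})(?::[0-9]{1,5})?(/.*)?$"
)

# Hypothetical sample inputs, not taken from the question.
valid = [
    "https://example.com",
    "https://sub.example-site.com:8080/path?x=1",
    "https://192.168.0.1/",
]
invalid = [
    "http://example.com",      # not https
    "https://example",         # no dot-separated last part
    "https://-example.com",    # host starts with a hyphen
    "https://example..com",    # two consecutive dots
]
adversarial = "https://" + "a" * 1000 + "!"

for url in valid + invalid:
    print(url, "->", bool(URL.match(url)))
print("adversarial ->", bool(URL.match(adversarial)))
```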

Regex match string 3-6 characters long, at least one letter, no duplicate "-"

I have to match a string that is 3-6 characters long, contains at least one letter, but can have letters, numbers and only 1 "-".
The "-" must not be at the start or at the beginning.
Match:
string
str-ng
st-ng
s1-1g
st-1g
Do not match:
strings
-string
string-
st--ng
s-tn-g
1111
st
The closest I've gotten is this:
^((?!-.*-)[0-9A-Z]{3,6})$
But this divides the regex match at the -, so it matches s-tri but not st-ri because there aren't 3 chars at each end.
Maybe you can use:
^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$
See the online demo
^ - Start string anchor.
(?=.*[a-z]) - Positive lookahead to make sure there is at least one letter.
(?!-|.*-$|.*-.*-) - Negative lookahead to prevent a hyphen at the beginning or at the end or multiple.
[a-z\d-]{3,6} - Three to six characters from the given class.
$ - End string anchor.
Note that I used the case-insensitive flag.
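The pattern can be tried on the strings from the question. Here is a minimal Python sketch, assuming the standard re module, with re.IGNORECASE standing in for the case-insensitive flag; CODE is just a label chosen for this example.

```python
import re

# re.IGNORECASE mirrors the case-insensitive flag mentioned above.
CODE = re.compile(r"^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$", re.IGNORECASE)

should_match = ["string", "str-ng", "st-ng", "s1-1g", "st-1g"]
should_fail = ["strings", "-string", "string-", "st--ng", "s-tn-g", "1111", "st"]

for s in should_match + should_fail:
    print(f"{s!r}: {bool(CODE.match(s))}")
```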
You can use
^(?=.{3,6}$)(?=[^a-zA-Z]*[A-Za-z])[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?$
See the regex demo. Details:
^ - start of string
(?=.{3,6}$) - string must contain three to six chars other than line break chars
(?=[^a-zA-Z]*[A-Za-z]) - there must be at least one ASCII letter in the string
[0-9a-zA-Z]+ - one or more alphanumeric ASCII chars
(?:-[0-9a-zA-Z]+)? - an optional sequence of - and then one or more alphanumeric ASCII chars
$ - end of string.
Looking at the pattern that you tried, you meant to exclude the match when there are 2 hyphens present using the negative lookahead.
Also this part [0-9A-Z]{3,6} does not match a hyphen.
Reading
The "-" must not be at the start or at the beginning.
You might do that using
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$
Regex demo
If you meant also no - at the end:
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$
Explanation
^ Start of string
(?![^\n-]*-[^\n-]*-) Assert not 2 times -
(?=[^a-zA-Z\n]*[a-zA-Z]) Assert a char a-zA-Z
[a-zA-Z0-9] Match One of the listed without -
[a-zA-Z0-9-]{1,4} Repeat 1-4 times any of the listed including -
[a-zA-Z0-9] Match One of the listed without -
$ End of string
Regex demo
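The difference between these variants shows up with a trailing hyphen. The sketch below, in Python with the standard re module, compares three of the patterns above on a few inputs; the dictionary keys are labels invented for this comparison, and "str-" is an extra test string that does not come from the question.

```python
import re

PATTERNS = {
    # Optional single "-alnum+" group, overall length checked by a lookahead.
    "optional group": re.compile(
        r"^(?=.{3,6}$)(?=[^a-zA-Z]*[A-Za-z])[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?$"
    ),
    # Only forbids a leading hyphen.
    "no leading -": re.compile(
        r"^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$"
    ),
    # Forbids a leading and a trailing hyphen.
    "no leading/trailing -": re.compile(
        r"^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])"
        r"[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$"
    ),
}

for s in ["str-ng", "str-", "st--ng", "1111"]:
    results = {name: bool(p.match(s)) for name, p in PATTERNS.items()}
    print(f"{s!r}: {results}")
```

Only the variant that forbids just the leading hyphen accepts "str-"; the other two reject it.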

Regex - Allow dash character in body text but not at start or end

How can I allow one subsequent dash character in the body part but not at start or end?
https://regex101.com/r/D8MAXP/8/
One subsequent example: https://regex101.com/r/D8MAXP/9/
Regex
^((https?):\/\/)?(www.)?([a-z0-9-])+\.[a-z]+(\/[a-zA-Z0-9#]+\/?)*$
Allow:
http://www.b-c.de
https://www.b-c.de
www.b-c.de
b-c.de
Don't allow:
https://www.foufos-.gr
http://www.foufos-.gr
https://-foufos.gr
http://foufos-.gr
www.-foufos.gr
www.foufos-.gr
www.-foufos.gr
foufos-.gr
-foufos.gr
Instead of matching the - in the character class, you could take it out and use a repeating group prepending the hyphen before the character class
Use a * to repeat it 0+ times or a ? to match it zero or 1 times.
For the example data in the question, you might use
^((https?):\/\/)?(www\.)?[a-z0-9]+(?:-[a-zA-Z]+)*(?:\.[a-z]+)+(\/[a-zA-Z0-9#]+\/?)*$
Regex demo
For all the links in the regex101 example, you might use for example 2 negative lookaheads:
^(?!ww?\.)(?:https?:\/\/)?(?:www\.)?(?!.*\.www\b)[a-z0-9]+(?:-[a-zA-Z]+)*(?:\.[a-z]+)+(?:\/[a-zA-Z0-9#]+\/?)*$
In parts
^ Start of string
(?!ww?\.) Assert not starting with 1 or 2 times a w char followed by a .
(?:https?:\/\/)? Optionally match the protocol part
(?:www\.)? Optionally match www.
(?!.*\.www\b) Assert that .www does not occur again anywhere to the right
[a-z0-9]+(?:-[a-zA-Z]+)* Match 1+ chars a-z0-9, optionally repeated by a - and 1+ chars a-zA-Z
(?:\.[a-z]+)+ Repeat 1+ times a dot and 1+ chars a-z
(?:\/[a-zA-Z0-9#]+\/?)* Repeat 0+ times matching / and 1+ times any of the listed followed by an optional forward slash
$ End of string
Regex demo
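The version with the two negative lookaheads can be checked against the allow and don't-allow lists from the question. This is a small Python sketch, assuming the standard re module; HOST is a label chosen for this example, and the slashes are left unescaped because Python has no regex delimiters.

```python
import re

HOST = re.compile(
    r"^(?!ww?\.)(?:https?://)?(?:www\.)?(?!.*\.www\b)"
    r"[a-z0-9]+(?:-[a-zA-Z]+)*(?:\.[a-z]+)+(?:/[a-zA-Z0-9#]+/?)*$"
)

allowed = ["http://www.b-c.de", "https://www.b-c.de", "www.b-c.de", "b-c.de"]
rejected = [
    "https://www.foufos-.gr",
    "http://www.foufos-.gr",
    "https://-foufos.gr",
    "http://foufos-.gr",
    "www.-foufos.gr",
    "www.foufos-.gr",
    "foufos-.gr",
    "-foufos.gr",
]

for url in allowed + rejected:
    print(url, "->", bool(HOST.match(url)))
```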

Regex: Detect Phone numbers that are separated by dashes (-) and/or spaces

I am trying to recognize these types of phone number inputs:
0172665476
+6265476393
+62-65476393
+62-654-76393
+62 65476393
While my regex: (?:\d+\s*)+ can recognize the 1st 2 sample values, it recognizes the last 3 sample values as multiple matches in each line, instead of recognizing the number as a whole.
How can I modify this to support multiple dashes and/or spaces and still recognize it as 1 whole number instead of multiple matches?
You may use this regex:
^\+?\d+(?:[\s-]\d+)*\b
RegEx Details:
^\+?: Match optional + at start
\d+: match 1+ digits
(?:[\s-]\d+)*: Match 0 or more groups that start with whitespace or - followed by 1+ digits
\b: Word boundary (used instead of the end anchor $ because with $ a number followed by trailing spaces would be missed)
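A quick way to confirm that each sample is recognized as one whole number is to compare the matched text with the input. The following Python sketch assumes the standard re module; PHONE is a label chosen here, and the input with trailing spaces is an extra case added to show the effect of the word boundary.

```python
import re

PHONE = re.compile(r"^\+?\d+(?:[\s-]\d+)*\b")

samples = [
    "0172665476",
    "+6265476393",
    "+62-65476393",
    "+62-654-76393",
    "+62 65476393",
]

for s in samples:
    m = PHONE.search(s)
    print(repr(s), "->", m.group(0) if m else None)

# Trailing spaces are not part of the match.
trailing = PHONE.search("+62 65476393   ")
print(repr(trailing.group(0)))
```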
This should work:
(?:[\d +-]+)+
This would work as per your requirement (if there are trailing spaces, this regex will ignore them):
Regex: '^(?:[\d +-]+)\b'
Another option could be to use an alternation to match either 10 digits without a leading plus sign or match the pattern with a +, and optional space or hyphen:
(?:\d{10}|\+\d{2}[- ]?\d{3}-?\d{5})\b
That will match:
(?: Non capturing group
\d{10} Match 10 digits
| Or
\+\d{2}[- ]?\d{3}-?\d{5} Match +, 2 digits, optional space or -, 3 digits, optional -, 5 digits
)\b Close non capturing group and word boundary
Regex demo
If your language supports negative lookbehinds you could prepend (?<!\S) which checks that what comes before is not a non-whitespace character.
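Python's re module supports negative lookbehinds, so the combination can be sketched as follows. The sample text and the name PHONE_IN_TEXT are made up for illustration; the third number is glued to a preceding "#" and is therefore skipped by the (?<!\S) check.

```python
import re

# (?<!\S) requires that the match starts at the beginning of the text
# or right after a whitespace character.
PHONE_IN_TEXT = re.compile(r"(?<!\S)(?:\d{10}|\+\d{2}[- ]?\d{3}-?\d{5})\b")

text = "Call 0172665476 or +62-654-76393, not ref#0172665476."
found = PHONE_IN_TEXT.findall(text)
print(found)
```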