In Regular Expression, disable double dashes

I have this RegExp:
^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)([a-zA-Z]|)(\:|)[^\x00-\x1f\'\?\-\*\:\"\;\|\/]+$
This does not allow filenames that contain a single dash. I would like to disallow only double dashes (anywhere in the file or folder name); a single dash should be OK.
Thanks for any info.

Change the [^\x00-\x1f\'\?\-\*\:\"\;\|\/]+ at the end into an expression that matches a run of characters from this class, followed by any number of repetitions of a single dash plus another such run. Add an optional leading and trailing dash as well if you like. (I have added them here because that is easier than explaining it :-)
^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)([a-zA-Z]|)(\:|)-?[^\x00-\x1f\'\?\-\*\:\"\;\|\/]+(-[^\x00-\x1f\'\?\-\*\:\"\;\|\/]+)*-?$
I have required at least one non-dash character; if you want to allow a single dash, the first non-optional group could include that instead, but then the trailing context will have to look different.
I would use non-capturing groups, but you don't say which regex flavor you are using, so maybe you don't have them.
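As a rough check of the pattern above, here is a minimal Python sketch. Python is an assumption (the question does not name a flavor), and the names SEGMENT, NAME_RE and is_valid_name are made up for illustration. The character class is written once and reused, with the unnecessary escapes dropped; the set of excluded characters is meant to be the same.

```python
import re

# Characters allowed between dashes: anything except control characters
# and ' ? - * : " ; | /  (the same set as the class in the answer).
SEGMENT = r"""[^\x00-\x1f'?\-*:";|/]+"""

# Hypothetical helper pattern: the answer's regex, assembled from SEGMENT.
NAME_RE = re.compile(
    r"^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)"  # reserved names
    r"([a-zA-Z]|)(\:|)"                                          # optional letter and colon
    r"-?" + SEGMENT + r"(-" + SEGMENT + r")*-?$"                 # single dashes only
)

def is_valid_name(name):
    """Return True if the name passes the pattern (no double dashes)."""
    return NAME_RE.match(name) is not None

for candidate in ["my-file.txt", "a-b-c", "-lead-and-trail-", "bad--name", "CON", "--x"]:
    print(candidate, is_valid_name(candidate))
```

Only the names with two adjacent dashes and the reserved name CON should be rejected here.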

Related

Regular Expression for Password strength with one special characters except Underscore

I have the following regular expression:
^.*(?=^.{8,}$)(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z]).*$
I am using it to validate that a password has:
At least one letter
At least one capital letter
At least one number
At least one special character
At least 8 characters
But along with this I need to restrict the underscore (_).
If I enter the password Pa$sw0rd, it validates, which is correct.
If I enter Pa$_sw0rd, it also validates, which is wrong.
The thing is, the regex passes whenever all the rules are satisfied. I want a rule that rejects the underscore in addition to the rules above.
Any help would be much appreciated.

I think you can use a negated character class, [^_], to add this restriction. Also, remove the initial .*, which is redundant. The first look-ahead already sits at the beginning of the pattern, so there is no need to repeat ^ inside it, and that look-ahead is itself unnecessary because the minimum length can be checked by the quantifier at the end:
^(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z])[^_]{8,}$
See demo

^(?=.*?\d)(?=.*?[!##$%^&*-])(?=.*?[A-Z])(?=.*?[a-z])(?!.*_).{8,}$
You can try this. The .* at the start is of no use. See demo:
https://regex101.com/r/pG1kU1/34
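
The two patterns from the answers above can be compared side by side. This is a small Python sketch; Python is an assumption (the question does not say which engine is used), and the names BY_CLASS and BY_LOOKAHEAD are made up for illustration.

```python
import re

# First answer: the underscore is excluded by the final [^_]{8,}.
BY_CLASS = re.compile(r"^(?=.*\d)(?=.*[!##$%^&*-])(?=.*[A-Z])(?=.*[a-z])[^_]{8,}$")

# Second answer: the underscore is excluded by the (?!.*_) negative look-ahead.
BY_LOOKAHEAD = re.compile(
    r"^(?=.*?\d)(?=.*?[!##$%^&*-])(?=.*?[A-Z])(?=.*?[a-z])(?!.*_).{8,}$"
)

for password in ["Pa$sw0rd", "Pa$_sw0rd", "password", "Pa$w0rd"]:
    print(password, bool(BY_CLASS.match(password)), bool(BY_LOOKAHEAD.match(password)))
```

Both patterns should accept Pa$sw0rd and reject the other three candidates (underscore present, required character types missing, and fewer than 8 characters).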

Regex for Alphanumeric characters, .#&,’()+/: and one hyphen only

I have a regex for matching letters, numbers and some special characters as follows: ^[A-za-z0-9 .#&,’()+/:]*$
I need to add a single hyphen to this list, not allowing multiple hyphens, but I'm not quite sure how to do it. I saw something along the lines of -{1} but I don't know how to add that to the existing regex.
I'm using C++ and Qt5.

How about:
^[A-za-z0-9 .#&,’()+/:]*-?[A-za-z0-9 .#&,’()+/:]*$
that could be reduced to:
^[\w .#&,’()+/:]*-?[\w .#&,’()+/:]*$
I don't know if C++ supports it, but it could be further reduced to:
^([\w .#&,’()+/:])*-?(?1)*$

^[A-za-z0-9.#&,’()+/:]*-[A-za-z0-9.#&,’()+/:]*$ allows a single hyphen anywhere in the string.
Note that the hyphen may appear anywhere (including at the beginning or end of the string) and that it is mandatory.
To make the hyphen optional, use ^[A-za-z0-9.#&,’()+/:]*-?[A-za-z0-9.#&,’()+/:]*$
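
A quick way to try the optional-hyphen idea is the Python sketch below. Python is used only for illustration (the question is about C++ and Qt5), and the names ALLOWED and AT_MOST_ONE_HYPHEN are hypothetical. It also assumes that A-Za-z is what the question intended: the range A-z additionally covers the characters [ \ ] ^ _ and the backtick.

```python
import re

# Assumption: A-Za-z is intended; the original A-z range would also
# admit [ \ ] ^ _ and the backtick.
ALLOWED = r"[A-Za-z0-9 .#&,’()+/:]*"

# Zero or one hyphen: allowed run, optional hyphen, allowed run.
AT_MOST_ONE_HYPHEN = re.compile("^" + ALLOWED + "-?" + ALLOWED + "$")

for text in ["Smith & Co.", "Smith-Jones", "Smith--Jones", "a-b-c"]:
    print(repr(text), bool(AT_MOST_ONE_HYPHEN.match(text)))
```

The first two strings should match; the two strings with more than one hyphen should not.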

regex for links - help to understand it

How do you read this regex?
#(http|https|ftp)://([A-Z0-9][A-Z0-9_-]*(?:.[A-Z0-9][A-Z0-9_-]*)+):?(d+)?/?#i
This is a regex for links, but I'm having trouble understanding it.
Thanks

Depending on what language you're in, regexes need a delimiter. It seems the # (pound sign or hash) is used here, so:
#...actual regex goes here...#
In JavaScript you need forward slashes (/..../).
Some regex engines allow you to pass flags that influence matching process. These appear after the closing delimiter:
#...actual regex goes here...#..flags go here..
In your example, there is one flag, the i, and I am guessing that it means "case insensitive" (i for insensitive). Depending on the regex engine, there can be flags that influence the syntax of the actual regex (for example, whether the dot matches any character or any character except newlines), flags that influence how the matching is done (for example, in JavaScript g is the global flag, which means that all matches in the string are found rather than just the first, and state is preserved between calls), and flags that determine whether whitespace is allowed as indentation inside the regex. Some engines have an m flag that controls whether the regex is applied line by line or to the entire text. There is AFAIK no standard set of flags; check your regex engine's documentation.
If you have multiple flags, you just concatenate them together to a string of flags and put them after the closing delimiter.
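Not every language uses delimiters and flag letters. As a small illustration, here is how the same ideas look in Python (chosen only as an example language; the question does not mention it), where the pattern is an ordinary string and flags are passed as a separate argument. The variable names are made up.

```python
import re

# No flag: matching is case-sensitive, so lowercase "http" is not found.
plain = re.search(r"http", "HTTP://EXAMPLE.COM")

# re.IGNORECASE plays the role of the trailing "i" in #...#i.
insensitive = re.search(r"http", "HTTP://EXAMPLE.COM", re.IGNORECASE)

# Several flags are combined with | instead of being concatenated as letters.
# re.DOTALL lets the dot match a newline as well.
combined = re.search(r"a.b", "A\nB", re.IGNORECASE | re.DOTALL)

print(plain is None, insensitive is not None, combined is not None)
```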
Now for the actual regex. First, you start with a parenthesized expression:
(...group...)
This is also called a group. In many regex engines, these groups have special meaning, because when a match is found you can access the bits of text that matched the expression inside the group using a special variable (or sometimes, the match is returned as an array, where each element represents a group). If you can access the bits inside groups, it is called a "capturing group".
In this particular case the group uses "alternation" or "choice" and this is indicated by the | (pipe). The pipe is part of the regex syntax and means "or". So,
(http|https|ftp)
means: match "http", or if that doesn't match, "https", or if that doesn't match, "ftp". This also brings up another reason for using parentheses: of all the special regex operators, the pipe has the lowest precedence, so if the parentheses had not been there, it would have meant: match "http" or "https" or "ftp://...etc".
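The precedence point can be checked with a short Python sketch (Python and the variable names are assumptions made for illustration). fullmatch is used so that each test string must be consumed entirely by one alternative.

```python
import re

grouped = re.compile(r"(http|https|ftp)://")
ungrouped = re.compile(r"http|https|ftp://")

# With the group, "://" is required after every scheme.
print(bool(grouped.fullmatch("https://")))
print(bool(grouped.fullmatch("http")))

# Without it, the three alternatives are "http", "https" and "ftp://".
print(bool(ungrouped.fullmatch("http")))
print(bool(ungrouped.fullmatch("http://")))
print(bool(ungrouped.fullmatch("ftp://")))
```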
So far, we've seen these "special characters": | (pipe) and ( and ). After that we get
://
These are not special characters, and any non-special characters simply match themselves.
We then get another group, which makes up almost the rest of the regex:
([A-Z0-9][A-Z0-9_-]*(?:.[A-Z0-9][A-Z0-9_-]*)+)
Inside it, we see a bracketed expression:
[A-Z0-9]
The brackets [ and ] are special, and indicate a "character class". There are other ways to denote character classes, but in all cases a character class matches a single character. Which character depends on the nature of the class. In this case, the class is defined using two ranges:
A-Z
means characters A thru Z (and anything in between) and
0-9
means characters 0 thru 9 (and anything in between).
Basically, [A-Z0-9] matches any single alphanumeric character (lowercase letters included, because of the i flag).
Note that the dash between the boundaries of the range is only a special character inside these bracketed expressions. Paradoxically, a dash inside the brackets can also simply mean a dash if it cannot be interpreted as a range.
This is followed by yet another character class:
[A-Z0-9_-]
This is almost the same as the previous one; it just adds the underscore and the dash. This last dash cannot be interpreted as a range separator, so it simply means a dash. This character class will match any alphanumeric character as well as the underscore and the dash.
This class is followed by a * (asterisk) and this is a special character indicating a cardinality. Cardinalities specify how often the immediately preceding element may occur. These are the common cardinalities:
* (asterisk) means zero or more times.
? (question mark) means zero or once.
+ (plus) means one or more times.
Now the entire bit starts to make sense:
[A-Z0-9][A-Z0-9_-]*
means: a sequence starting with one alphanumeric character, optionally followed by a string of "word" characters (that is, alphanumerics, dash and underscore).
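To see the class and the quantifier working together, here is a minimal Python sketch (Python and the name LABEL are assumptions for illustration; re.IGNORECASE stands in for the i flag).

```python
import re

# One alphanumeric character, then zero or more alphanumerics, underscores or dashes.
LABEL = re.compile(r"[A-Z0-9][A-Z0-9_-]*", re.IGNORECASE)

for text in ["mail", "my-host_1", "x", "-bad", "_bad"]:
    print(text, bool(LABEL.fullmatch(text)))
```

The single character x matches because * also allows zero repetitions; the last two strings fail because the first character must come from [A-Z0-9].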
The following bit of the regex is this:
(?:.[A-Z0-9][A-Z0-9_-]*)+
I think this is trying to match the domain parts. So that if you have say:
https://mail.google.com
The .google and .com bits would be matched by this part. The initial (?: bit tells the regex engine not to create a "backreference" for this group, that is, the group is only used for grouping and its match is not captured. This is not really my strong suit, so maybe someone else can explain it better. But the rest of that group is quite clear and resembles what we saw before. I think there is a mistake though: the dot (.) that appears immediately before the bracketed character class usually means "match any character" or "match any non-newline character", not "match a literal dot". Typically, if you want a literal dot, you need to escape it. This would be the syntax in JavaScript and, I think, Perl:
(\.[A-Z0-9][A-Z0-9_-]*)+
(note the backslash immediately before the dot to indicate a literal dot)
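The difference between the unescaped and the escaped dot is easy to demonstrate. The Python sketch below is only an illustration (Python and the names loose and strict are assumptions); it applies the host part of the regex with and without the backslash.

```python
import re

# Host part as written in the question (unescaped dot) and with the dot escaped.
loose = re.compile(r"[A-Z0-9][A-Z0-9_-]*(?:.[A-Z0-9][A-Z0-9_-]*)+", re.IGNORECASE)
strict = re.compile(r"[A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+", re.IGNORECASE)

for host in ["mail.google.com", "localhost", "a!b"]:
    print(host, bool(loose.fullmatch(host)), bool(strict.fullmatch(host)))
```

With the unescaped dot, strings without any dot at all (localhost, a!b) still match, because the dot is allowed to consume an arbitrary character.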
The final bits of the regex seem an attempt to match a port number:
:?(d+)?
However, the d+ bit is probably wrong: right now it matches "one or more d's". It should probably be:
:?(\d+)?
meaning: optionally match a colon (:), optionally followed by a bunch of digits. The \d is also a character class, but a predefined one. I think most regex engines use \d to denote a digit, but you should check the documentation of your engine to see the exact convention. So in say:
http://domain.server.extension:8080/
this part of the regex would match :8080 (provided you fix the d+ thing).
Finally, we see
/?
Meaning the entire thing can be followed optionally by a forward slash.
So, all in all, I don't think this matches a "link"; rather, it matches the initial part of a URL. To match an entire URL, you would need a bit more: at least, I don't see any expression that could match the path, resource, hash and query bits that may occur in a proper URL.
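Putting the pieces together, here is a sketch of the regex with the two suspected fixes applied (\. and \d). It is written in Python purely for illustration; the name URL_START and the sample URLs are made up, and this is not claimed to be what the original author intended.

```python
import re

URL_START = re.compile(
    r"(http|https|ftp)://"                              # scheme (group 1)
    r"([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+)"  # host: dot-separated labels (group 2)
    r":?(\d+)?"                                         # optional colon and port (group 3)
    r"/?",                                              # optional trailing slash
    re.IGNORECASE,
)

with_port = URL_START.match("http://domain.server.extension:8080/")
print(with_port.groups())

# Only the leading part of a longer URL is consumed; path and query are left over.
longer = URL_START.match("https://mail.google.com/search?q=regex#top")
print(longer.group(0), longer.groups())
```

The second example shows the limitation described above: the match stops after the first slash, so the path, query and hash are not part of it.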

When you say you have trouble understanding it, does that mean you tried something and got stuck somewhere?
Please ask more specific questions.
I can give you some keywords so that you can look things up more easily; a good place for that is regular-expressions.info
(http|https|ftp) is an alternation
[A-Z0-9] is a character class
*, + and ? are quantifiers
(...) is a (capturing) group, (?:...) is a non capturing group
The # at the start and end are regex delimiters, the i at the very end is a modifier/option (match case independent).
The (d+)? at the end would match one or more (optional) letters "d". This is quite strange. I assume it should be (\d+)? that would be one or more (optional) digits.

Regex matching terminating quote only if quote at the beginning

I want to match the following element with regex
target="#MOBILE"
and all valid variants.
I've written the regex
target[\s\S]*#MOBILE[^>^\s]*
which matches the following
target="#MOBILE"
target = "#MOBILE"
target=#MOBILE
target="#MOBILE" (followed directly by >)
but it doesn't match
target=" #MOBILE "
properly (note the extra space). It only matches
target=" #MOBILE
missing out the final quote
What I need is the terminating expression [^>^\s]* to match a quote only if it matches a quote at the beginning. It also needs to work with single quotes. The terminating expression also needs to end with a whitespace or > char as it does currently.
I'm sure there is a way to do this - but I'm not sure how. It's probably standard stuff - I just don't know it
Incidentally, I'm not sure that [^>^\s]* is the best way to terminate when the regex hits a space or > char, but it's the only way that I can get it to work.

You can use a backreference, similar to jensgram's suggestion:
target\s*=\s*(?:(")\s*)?#Mobile\s*\1
(?:(")\s*)? - Optional non-capturing group that contains a quote (which is captured), and additional optional spaces. If it matched, \1 will contain a quote.
Working example: http://regexr.com?2vkkq
A better alternative for .Net (mainly because you want single quotes, and \1 behaves differently for uncaptured groups):
target\s*=\s*(["']?)\s*?#Mobile\s*\1
Working example: Regex Storm
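
The backreference idea can be tried in Python as well. This is a sketch under a few assumptions: Python is not mentioned in the question, the names QUOTED and OPTIONAL_GROUP are made up, and the patterns are written with the question's #MOBILE spelling. fullmatch is used so that a stray trailing quote is not silently ignored.

```python
import re

# The (["']?) variant: group 1 is always set, possibly to the empty string.
QUOTED = re.compile(r"""target\s*=\s*(["']?)\s*#MOBILE\s*\1""")

samples = [
    'target="#MOBILE"',
    'target = "#MOBILE"',
    "target=#MOBILE",
    'target=" #MOBILE "',
    "target='#MOBILE'",
    'target="#MOBILE\'',  # mismatched quotes
]
for sample in samples:
    print(sample, bool(QUOTED.fullmatch(sample)))

# The (?:(")\s*)? variant: in Python, a backreference to a group that did not
# take part in the match fails, so the unquoted form is rejected here.
OPTIONAL_GROUP = re.compile(r'target\s*=\s*(?:(")\s*)?#MOBILE\s*\1')
print(bool(OPTIONAL_GROUP.fullmatch("target=#MOBILE")))
print(bool(OPTIONAL_GROUP.fullmatch('target="#MOBILE"')))
```

The last two lines illustrate why the second form is more portable: engines disagree on what \1 means when the group was skipped, whereas a group that matched the empty string behaves the same way everywhere.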

Try the following if you need to check that your quotes are in pairs:
target\s*=\s*(['"])\s*#MOBILE\s*\1
This requires a quote on both sides, so the unquoted form target=#MOBILE is not matched, and it depends on your regex engine supporting back-references.

Without quotes target\s*=\s*#MOBILE
With double quotes target\s*=\s*"\s*#MOBILE\s*"
With single quotes target\s*=\s*'\s*#MOBILE\s*'
All together
(target\s*=\s*#MOBILE)|(target\s*=\s*"\s*#MOBILE\s*")|(target\s*=\s*'\s*#MOBILE\s*')
Or someone can make it neater.

Regex to match name1.name2[.name3]

I am trying to validate user id's matching the example:
smith.jack or smith.jack.s
In other words, any number of non-whitespace characters (except dot), followed by exactly one dot, followed by any number of non-whitespace characters (except dot), optionally followed by exactly one dot followed by any number of non-whitespace characters (except dot). I have come up with several variations that work fine except for allowing consecutive dots! For example, the following Regex
^([\S][^.]*[.]{1}[\S][^.]*|[\S][^.]*[.]{1}[\S][^.]*[.]{1}[\S][^.]*)$
matches "smith.jack" and "smith.jack.s" but also matches "smith..jack" "smith..jack.s" ! My gosh, it even likes a dot as a first character. It seems like it would be so simple to code, but it isn't. I am using .NET, btw.
Frustrating.

Does this help?
/^[^\s\.]+(?:\.[^\s\.]+)*$/
or, in extended format, with comments (ruby-style)
/
^ # start of line
[^\s\.]+ # one or more non-space non-dot
(?: # non-capturing group
\. # dot something
[^\s\.]+ # one or more non-space non-dot
)* # zero or more times
$ # end of line
/x
You're not clear on how many times you can have dot-something, but you can replace the * with {1,3} or something similar to specify how many repetitions are allowed.
I should probably make it clear that the slashes are the literal regex delimiters in Ruby (and Perl, JS, etc.).
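
The extended format is also available in Python through re.VERBOSE. The sketch below is an illustration only: Python is an assumption (the question is about .NET), and the helper user_id_pattern with its repeat parameter is made up to show the effect of swapping the quantifier.

```python
import re

def user_id_pattern(repeat="*"):
    """Build the answer's pattern; repeat is the quantifier on the dot-part group."""
    return re.compile(
        r"""
        ^            # start of string
        [^\s.]+      # one or more non-space, non-dot characters
        (?:          # non-capturing group
            \.       #   a literal dot
            [^\s.]+  #   one or more non-space, non-dot characters
        )"""
        + repeat
        + r"""       # how often the dot-part may repeat
        $            # end of string
        """,
        re.VERBOSE,
    )

any_parts = user_id_pattern()            # the answer's version: zero or more dot-parts
two_or_three = user_id_pattern("{1,2}")  # name1.name2 with an optional .name3

for user_id in ["smith.jack", "smith.jack.s", "smith..jack", ".smith.jack", "smith", "a.b.c.d"]:
    print(user_id, bool(any_parts.fullmatch(user_id)), bool(two_or_three.fullmatch(user_id)))
```

With the * quantifier a single name and four-part names also pass; {1,2} restricts the match to two or three parts, which seems closer to what the question describes.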

^([^.\s]+)\.([^.\s]+)(?:\.([^.\s]+))?$

I'm not familiar with .NET's regexes. This will do what you want in Perl.
/^\w+\.\w+(?:\.\w+)?$/
If .NET doesn't support the non-capturing (?:xxx) syntax, use this instead:
/^\w+\.\w+(\.\w+)?$/
Note: I'm assuming that when you say "non-whitespace, non-dot" you really mean "word characters."

You are using the * quantifier, which allows zero occurrences of the given component.
You should be using plus, and putting the final .[^.]+ into a group followed by ? to represent the possibility of an extra set.
Might not have the perfect syntax, but something similar to the following should work.
^[^.\s]+[.][^.\s]+([.][^.\s]+)?$
Or in simple terms, any non-zero number of non-whitespace non-dot characters, followed by a dot, followed by any non-zero number of non-whitespace non-dot characters, optionally followed by a dot, followed by any non-zero number of non-whitespace non-dot characters.
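
To make the difference concrete, the following Python sketch runs the question's pattern next to the one from this answer. Python is an assumption (the question uses .NET, though these particular constructs should behave the same way), and the names ORIGINAL and FIXED are made up.

```python
import re

# The question's pattern: [\S] also accepts a dot, and [^.]* accepts
# whitespace as well as an empty run.
ORIGINAL = re.compile(
    r"^([\S][^.]*[.]{1}[\S][^.]*|[\S][^.]*[.]{1}[\S][^.]*[.]{1}[\S][^.]*)$"
)

# This answer's pattern: every part is one or more non-dot, non-space characters.
FIXED = re.compile(r"^[^.\s]+[.][^.\s]+([.][^.\s]+)?$")

for user_id in ["smith.jack", "smith.jack.s", "smith..jack", ".smith.jack", "smith jack.s"]:
    print(repr(user_id), bool(ORIGINAL.match(user_id)), bool(FIXED.match(user_id)))
```

The original pattern accepts all five strings, while the fixed one accepts only the first two.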

I realise this has already been solved, but I find Regexpal extremely helpful for prototyping regex's. The site has a load of simple explanations of the basics and lets you see what matches as you adjust the expression.

[^\s.]+\.[^\s.]+(\.[^\s.]+)?
BTW what you asked for allows "." and ".."

I think you'd benefit from using + which means "1 or more", instead of * meaning "any number including zero".

(^.)+|(([^.]+)[.]([^.]+))+
But this would match x.y.z.a.b.c and from your description, I am not sure if this is sufficiently restrictive.
BTW: feel free to modify this if I made a silly mistake (I haven't used .NET, but I have done plenty of regexes).

[^.\s]+\.[^.\s]+(\.([^\s.]+?)?
has unmatched paren. If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?))?
is still too liberal. Matches a.b. as well as a.b.c.d. and .a.b
If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?)?)
doesn't match a.b

^([^.\W]+)\.?([^.\W]+)\.?([^.\W]+)$
This should capture as described, group the parts of the id and stop duplicate periods

I took a slightly different approach. I figured you really just wanted a string of non-space characters followed by only one dot, but that dot is optional (for the last entry). Then you wanted this repeated.
^([^\s\.]+\.?)+$
Right now, this means you have to have at least one string of characters, e.g. 'smith' to match. You, of course could limit it to only allow one to three repetitions with
^([^\s\.]+\.?){1,3}$
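
A few sample inputs show what this variant accepts. The Python sketch below is only an illustration (Python and the name UP_TO_THREE are assumptions).

```python
import re

UP_TO_THREE = re.compile(r"^([^\s\.]+\.?){1,3}$")

for user_id in ["smith.jack", "smith.jack.s", "smith..jack", "a.b.c.d", "smith", "smith.jack."]:
    print(user_id, bool(UP_TO_THREE.fullmatch(user_id)))
```

Because the dot is optional in every repetition, a single name without any dot and a name with a trailing dot also pass; whether that is acceptable depends on the requirements.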
I hope that helps.

RegexBuddy is a good (non-free) tool for working with regexes.
