Regex to check URLs


I need to test for URLs with the below patterns:
https://cloudhusethelp.zendesk.com
https://cloudhusethelp.zendesk.com/
https://cloudhusethelp.zendesk.com/en-us
https://cloudhusethelp.zendesk.com/da
https://cloudhusethelp.zendesk.com/fr
https://cloudhusethelp.zendesk.com/aa
The regex used is https\:\/\/cloudhusethelp\.zendesk\.com\/[A-z][A-z]
So this matches the URL followed by two letters at the end. The URL can end with any language code or with no language code at all.
Should I write multiple regular expressions to cover the above cases, or can a single expression do it?
Any help is appreciated.

You can definitely do it with a single expression:
https\:\/\/cloudhusethelp\.zendesk\.com(\/[A-Za-z]{2}(-[A-Za-z]{2})?)?
The part that differs from your expression is at the end:
(\/[A-Za-z]{2}(-[A-Za-z]{2})?)?
It is a nested optional expression that matches nothing, a pair of letters, or a pair of letters followed by dash and another pair of letters.
Demo.

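The slash at the end is also optional, since the first example you provided doesn't have it. Use [A-Za-z] rather than [A-z], because [A-z] also matches the punctuation characters that sit between Z and a in ASCII ([, \, ], ^, _ and `):
https\:\/\/cloudhusethelp\.zendesk\.com(\/([A-Za-z]{2}(\-[A-Za-z]{2})?)?)?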
demo
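A minimal Python sketch of how these expressions could be checked against the sample URLs. The names LOCALE_URL, LOCALE_URL_SLASH and is_help_url are assumptions made for illustration; LOCALE_URL is the first answer's expression, and LOCALE_URL_SLASH is a variant that treats the trailing slash as optional on its own. re.fullmatch is used so that the whole URL has to match, not just a prefix.

```python
import re

# First answer's expression; "/" and ":" need no escaping in Python.
LOCALE_URL = re.compile(
    r"https://cloudhusethelp\.zendesk\.com(/[A-Za-z]{2}(-[A-Za-z]{2})?)?"
)

# Variant with the slash optional on its own, so a bare trailing "/"
# is accepted as well as "/xx" and "/xx-yy".
LOCALE_URL_SLASH = re.compile(
    r"https://cloudhusethelp\.zendesk\.com(/([A-Za-z]{2}(-[A-Za-z]{2})?)?)?"
)


def is_help_url(url: str, pattern: re.Pattern = LOCALE_URL_SLASH) -> bool:
    """Return True if the entire URL matches the given pattern."""
    return pattern.fullmatch(url) is not None


if __name__ == "__main__":
    for url in (
        "https://cloudhusethelp.zendesk.com",
        "https://cloudhusethelp.zendesk.com/",
        "https://cloudhusethelp.zendesk.com/en-us",
        "https://cloudhusethelp.zendesk.com/da",
    ):
        print(url, is_help_url(url))
```

With full-string matching, the first expression rejects the URL that ends in a bare slash, which is the case the variant is meant to cover.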

Related

What is the best way to tag these strings using Regex Expressions?

#{config_name=Scene&sn={2}&field=name}
#{pos_x={3}&y={4}&z={5}&stage={6}&stageId={7}&content=Go Now&http=true&underline=true}
{{rescue_{1}_{2}_Rescue Now}}
#+\{.*?\}
I tried to use this expression but it didn't help

My solution matches either strings that start with '#' and have multiple params inside the brackets, or anything inside double brackets (if this is what you want):
(#\{[a-zA-Z_]+=.*(&[a-zA-Z_]+=.*)*\})|(\{\{.*\}\})

For the #{....} strings, you could repeat matching the key=value pairs with an ampersand in between. For the {{...}} strings, you could match the curly braces at the start and end, and in between allow only curly braces that contain digits.
^(?:#{[^=&]+=[^=&]+(?:&[^=&\n]+=[^=&\n]+)*}|{{[^{}]*(?:{\d+}[^{}]*)*}})$
See a regex101 demo.
A slightly broader alternative is to match {{...}} from the start to the end of the string, allowing any character in between, or to match #{...} where the only curly braces allowed inside are those that contain digits:
^(?:{{.*}}|#{[^{}\n]*(?:{\d+}[^{}\n]*)*})
See another regex101 demo.
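As a rough illustration, the stricter pattern can be tried against the three sample strings in Python. The names TAG, samples and is_tag are assumptions for this sketch, and the literal braces are escaped for readability, which does not change what the pattern matches.

```python
import re

# Stricter pattern from the answer above, with literal braces escaped.
TAG = re.compile(
    r"^(?:#\{[^=&]+=[^=&]+(?:&[^=&\n]+=[^=&\n]+)*\}"
    r"|\{\{[^{}]*(?:\{\d+\}[^{}]*)*\}\})$"
)

samples = [
    "#{config_name=Scene&sn={2}&field=name}",
    "#{pos_x={3}&y={4}&z={5}&stage={6}&stageId={7}"
    "&content=Go Now&http=true&underline=true}",
    "{{rescue_{1}_{2}_Rescue Now}}",
    "plain text without a tag",
]


def is_tag(text: str) -> bool:
    """Return True if the whole string is one #{...} or {{...}} tag."""
    return TAG.search(text) is not None


if __name__ == "__main__":
    for s in samples:
        print(is_tag(s), s)
```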

Allowing words picked up in regex in certain cases only

I have a regex expression to look for people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks

You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
See the regex demo
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
See another regex demo
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a location in string that is not immediately followed with n/?a (na, n/a), yes or no that are followed with the end of string.
In human words, only the start of string is matched if the whole string is not equal to the alternatives inside the alternation group.
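A small Python sketch of the case-insensitive lookahead pattern, assuming the field value arrives as a plain string. FILLER and is_acceptable are hypothetical names used only for illustration.

```python
import re

# Matches at the start of the string unless the whole string is one of
# the filler answers (case-insensitive).
FILLER = re.compile(r"^(?!(?:n/?a|yes|no)$)", re.IGNORECASE)


def is_acceptable(answer: str) -> bool:
    """Return True when the answer is more than a bare filler word."""
    return FILLER.search(answer) is not None


if __name__ == "__main__":
    for text in ("yes", "N/A", "yes, I have blah blah etc etc", "Nobody"):
        print(repr(text), is_acceptable(text))
```

An empty string also passes this check, so a separate required-field check would still be needed.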

The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
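The match-and-invert approach could look like this in Python. FORBIDDEN and is_real_answer are hypothetical names; re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# The forbidden answers, matched exactly and case-insensitively.
FORBIDDEN = re.compile(r"n/?a|yes|no", re.IGNORECASE)


def is_real_answer(text: str) -> bool:
    """Invert the match: valid when the text is NOT exactly a forbidden word."""
    return FORBIDDEN.fullmatch(text) is None


if __name__ == "__main__":
    for text in ("no", "n/a", "no, thanks"):
        print(repr(text), is_real_answer(text))
```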

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
So for capturing the URL I formulated these two rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I ended up with this regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it is legal to mix both rules like I did.
Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.

The pattern you wrote is really not the one you need as it matches something very different from what you expect: text window.location.redirect("=") in text window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? from before \" (as lookbehinds should be fixed-width in PCRE). It will work with ? in .NET regex.
If it is JS, note that lookbehind is only supported by engines that implement ES2018 or later; older JavaScript engines do not support it at all.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
See the regex demo
Omitting the /g modifier means only the first occurrence is matched. Access the value you need inside Group 1.
The ([^"]*) captures 0+ characters other than a double quote (URLs you need should not have it). If these URLs you have contain a ", you should use the second approach as (.*?) will match any 0+ characters other than a newline up to the first ").
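The same capturing-group idea, sketched in Python rather than JavaScript. REDIRECT and first_redirect_url are hypothetical names; re.search stops at the first occurrence, which corresponds to leaving out the /g modifier.

```python
import re
from typing import Optional

# Group 1 captures everything between the opening and closing quotes.
REDIRECT = re.compile(r'window\.location\.redirect\("([^"]*)"\)')


def first_redirect_url(html: str) -> Optional[str]:
    """Return the URL of the first redirect call, or None if there is none."""
    match = REDIRECT.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = (
        '<script>window.location.redirect("http://stackoverflow.com");</script>'
        '<script>window.location.redirect("http://example.com");</script>'
    )
    print(first_redirect_url(page))
```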

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?

Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
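For example, in Python an unanchored search finds the pattern anywhere in the string. CODE and contains_code are hypothetical names for this sketch.

```python
import re

# An E, four digits, then "49"; no anchors, so it may occur anywhere.
CODE = re.compile(r"E[0-9]{4}49")


def contains_code(text: str) -> bool:
    """Return True if the Exxxx49 pattern occurs anywhere in the text."""
    return CODE.search(text) is not None


if __name__ == "__main__":
    for text in ("E123449abcdefgh", "abcdefE123449987654321", "E12349"):
        print(text, contains_code(text))
```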

I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all requirements stated by the OP:
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from the beginning of the line up to the pattern
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, up to the end of the line

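Your original regex is faulty at the first *: it directly follows the ^ anchor, so it has nothing meaningful to repeat, and some engines reject it as a syntax error. Fix it by changing the pattern to this: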
.*E\d{4}49.*
This pattern is for APIs that require the whole string to match, such as Java's String.matches(); you did not specify a language.
.* matches any sequence of characters (other than line breaks, by default). Because it surrounds the pattern, the expression matches the entire string as long as the pattern occurs somewhere in it.
Here is a regex demo!
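Python's re.fullmatch behaves like such a whole-string API, so it can illustrate why the surrounding .* is needed there. WHOLE, BARE and whole_string_matches are hypothetical names for this sketch.

```python
import re

WHOLE = re.compile(r".*E\d{4}49.*")  # pattern padded with .* on both sides
BARE = re.compile(r"E\d{4}49")       # the pattern on its own


def whole_string_matches(text: str) -> bool:
    """Return True if the entire string matches the padded pattern."""
    return WHOLE.fullmatch(text) is not None


if __name__ == "__main__":
    text = "abcdefE123449987654321"
    print(whole_string_matches(text))         # padded pattern, whole string
    print(BARE.fullmatch(text) is not None)   # bare pattern, whole string
    print(BARE.search(text) is not None)      # bare pattern, anywhere
```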

Just simply use this:
E[0-9]{4}49

How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use the global flag, /E\d{4}49/g, if the language supports it
OR
Try a capturing group, (E\d{4}49)+, formed by enclosing the pattern in parentheses (...)
Here is online demo

Regular expression to prevent adjacent repeating dashes

In my ASP.NET application I am restricting the allowed URL formats with regular expressions. I need to create a regular expression that will not allow adjacent dashes in URLs.
01) allow URLs like
text1-text2.htm
text1-text2-textn.htm
02) prevent URLS like
text1--text2.htm
text1--text2-textn.htm

Try this regex:
/--/
If you find a match, the URL contains two adjacent dashes.

url.Contains("--") will work for you, where the url variable is the url entered. Nice and concise, and you don't have to fuss with a RegEx.

The negative answer posted by Aziz is best, but just for completeness' sake, here is a regex that matches the kinds of strings you wish to accept (as opposed to reject).
You want a string made up of zero or more of the following:
a non-dash character, or
a dash followed by a non-dash
A regex for this is
/^(?:[^-]|-(?!-))*$/
Now you can adjust the [^-] part to accept not just any character at all, but only those characters permitted in a URL (that is, if you wish to match all possible urls except those with two consecutive dashes). To do this you will have to find the RFC that gives the URI syntax. Will be somewhat tedious, which is why the negative solution with /--/ combined with other checks is your best bet.
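A short Python sketch comparing the accepting regex with the plain substring test. NO_DOUBLE_DASH and has_no_adjacent_dashes are hypothetical names; the pattern itself is the one given above, without the JavaScript-style slashes.

```python
import re

# Each step consumes either a non-dash, or a dash not followed by a dash.
NO_DOUBLE_DASH = re.compile(r"^(?:[^-]|-(?!-))*$")


def has_no_adjacent_dashes(url: str) -> bool:
    """Return True if the string never contains two dashes in a row."""
    return NO_DOUBLE_DASH.search(url) is not None


if __name__ == "__main__":
    for url in (
        "text1-text2.htm",
        "text1-text2-textn.htm",
        "text1--text2.htm",
        "text1--text2-textn.htm",
    ):
        # The regex result should agree with the simple substring check.
        print(url, has_no_adjacent_dashes(url), "--" not in url)
```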

This will match a filename with zero or more occurrences of a single dash followed by some word characters.
^\w+(-\w+)*\.\w+

It should be enough to search for the problem, -{2,}, and then negate the result. That is, as long as this regex (two or more dashes in a row) does not match, the URL is valid.
Or positive regex matching only urls you do want: ^([A-Za-z0-9]+-?)+\.htm$
