A regex that validates a web address and matches an empty string? - regex

The current expression validates a web address (HTTP). How do I change it so that an empty string also matches?
(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?

If you want to modify the expression to match either an entirely empty string or a full URL, you will need to use the anchor metacharacters ^ and $ (which match the beginning and end of a line respectively).
^(|https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)$
As dirkgently pointed out, you can simplify your match for the protocol a little, so I've included that for you too.
Though, if you are using this expression from within a program or script, it may be simpler to use the language's own means of checking whether the input is empty.
// in no particular language...
if input.length > 0 then
    if input matches <regex> then
        input is a URL
    else
        input is invalid
else
    input is empty
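As a rough illustration of that idea, here is a minimal Python sketch. The names `URL_PATTERN` and `classify` are made up for this example, and it uses the question's original expression, checked against the whole input with `fullmatch`.

```python
import re

# The question's original URL expression, unchanged (https? shortens http|https).
URL_PATTERN = re.compile(
    r"https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?"
)

def classify(text: str) -> str:
    """Return 'empty', 'url' or 'invalid' (hypothetical labels for this sketch)."""
    if len(text) == 0:
        # Handle the empty case in code instead of inside the regex.
        return "empty"
    if URL_PATTERN.fullmatch(text):
        # fullmatch requires the whole input to match, like wrapping in ^...$.
        return "url"
    return "invalid"

print(classify(""), classify("http://example.com"), classify("example"))
```

This keeps the regex focused on URLs only, so the "empty is fine" rule lives in one obvious place in the code.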

Put the whole expression in parentheses and mark it as optional (the “?” quantifier: zero or one repetition).
((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)?
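One caveat with an optional group and no anchors: an unanchored search can succeed on any input by matching zero characters. The Python sketch below (the names `OPTIONAL_URL` and `accepts` are invented for this example) shows the difference between searching and requiring a whole-string match.

```python
import re

# The optional-group expression from this answer, unchanged.
OPTIONAL_URL = re.compile(
    r"((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)?"
)

# Unanchored search: the optional group matches nothing at position 0,
# so even a non-URL "matches" with an empty result.
unanchored = OPTIONAL_URL.search("not a url")

def accepts(text: str) -> bool:
    """True only when the whole input is empty or a URL."""
    return OPTIONAL_URL.fullmatch(text) is not None

print(repr(unanchored.group(0)))
print(accepts(""), accepts("http://example.com"), accepts("not a url"))
```

So this variant is probably only useful when the engine or API already forces a whole-string match, or when ^ and $ are added around it.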

Use expression markers ^$ around your expression and add |^$ to the end. This way you're using the | or operator with two expressions showing that you have two different match cases.
^(https?:\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)$|^$
The key here is that |^$ means "or match blank".
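Also, in JavaScript this expression works as written only in a regex literal (/.../) or a String.raw template string; in an ordinary string literal every backslash must be doubled.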

Use (Expr)? where Expr is your URL matcher, just like I would write https? for http and https. The ? is known as a quantifier; you can look it up. From Wikipedia:
? The question mark indicates there is zero or one of the preceding element.

Related

Regex - Matching a part of a URL

I'm trying to use regular expression to match a part of the following url:
http://www.example.com/store/store.html?ptype=lst&id=370&3434323&root=nav_3&dir=desc&order=popularity
I want the Regex to find:
&3434323
Basically, it's meant to find any part of the query string that doesn't follow the variable=value formula. So I need it to find sections of the URL that don't have an equals sign in them, and match just that part.
I tried using:
&\w*+[^=_-]
But it returns: &3434323&. I need it to not return the next ampersand.
And it must be done in regex. Thanks in advance!
You can use this regex:
[?&][^=]+(&|$)
It looks for any string that doesn't contain the equals sign [^=]+, starts with the question mark or the ampersand [?&], and ends with an ampersand or the end of the URL (&|$).
Please note that this will return &3434323&, so you'll have to strip the ampersands on both sides in your code. I assume that you're fine with that. If you really don't want the second ampersand, you can use a lookahead:
[?&][^=]+(?=&|$)
If you don't want even the first ampersand, you can use this regex, but not all regex engines support lookbehind:
(?<=\?|&)[^=]+(?=&|$)
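To compare the three variants side by side, here is a small Python sketch run against the URL from the question. Python's `re` module happens to accept this lookbehind because both alternatives have the same width; other engines may differ.

```python
import re

url = ("http://www.example.com/store/store.html"
       "?ptype=lst&id=370&3434323&root=nav_3&dir=desc&order=popularity")

# Variant 1: consumes the delimiter on both sides.
with_both = re.search(r"[?&][^=]+(&|$)", url).group(0)

# Variant 2: the lookahead checks for the trailing & without consuming it.
with_leading = re.search(r"[?&][^=]+(?=&|$)", url).group(0)

# Variant 3: lookbehind and lookahead, so neither delimiter is part of the match.
bare = re.findall(r"(?<=\?|&)[^=]+(?=&|$)", url)

print(with_both)     # &3434323&
print(with_leading)  # &3434323
print(bare)          # ['3434323']
```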
Parsing query parameters can be tricky, but this may do the job:
((?:[?&])[^=&]+)(?=&|$)
It will not catch the ampersand at the end of the parameter, but it will include either the question mark or the ampersand at the beginning. It will match any parameter not in the form of a key-value pair.

specify pattern at the beginning of string in regular expression

I have some string with multiple possible values:
e
(space)Exact
Exact
exact
phase
I want to get only the first four values, the regular expression I came up with is:
^\s*e
It means that at the beginning of the string there is zero or more whitespace followed by e (or E, case insensitive). However, it always filters out the case
(space)Exact
My guess is that it takes ^ as "not" instead of beginning of string. How can I correct that? I use Perl Compatible Regular Expressions (PCRE) as the matching engine.
Try using the mode modifiers in your regex to make ^ and $ match at line breaks and, if necessary, to make the match case insensitive:
(?mi)^\s*e
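A quick way to see what the modifiers change is to run the pattern with and without the multi-line flag. The sketch below uses Python's `re` rather than PCRE, and assumes the five values arrive as one newline-separated string; the trailing `\w*` is added here only so the whole word shows up in the output.

```python
import re

# Assumption for this sketch: all values are in a single string, one per line.
text = "e\n Exact\nExact\nexact\nphase"

# Case-insensitive only: ^ matches just at the very start of the string.
without_flag = re.findall(r"(?i)^\s*e\w*", text)

# Multi-line plus case-insensitive: ^ also matches after every line break.
with_flag = re.findall(r"(?mi)^\s*e\w*", text)

print(without_flag)  # ['e']
print(with_flag)     # ['e', ' Exact', 'Exact', 'exact']
```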
The ^ character means only the beginning of a string. The beginning of a new line does not count as the beginning of a string, so this would not work if more than one value is inside the same "string" object. I'm not sure how PCRE handles this, but if you also want to match the beginning of a line, you have to enable the multi-line flag.
Edit: If you want to pick up the beginning of a new line, go this route instead: put \r\n at the beginning of the expression and remove the "^".
Edit #2 (because I feel like doing regex): here's what you're looking for:
(\b)[eE]+\w*

how to define a regular expression in boost?

I have a section in file:
[Source]
[Source.Ia32]
[Source.Ia64]
I have created the expression as:
const boost::regex source_line_pattern ("(Sources)(.*?)");
Now, I am trying to match the string, but I am not able to match; it is always returning 0.
if (boost::regex_match(sToken, source_line_pattern))
    return TRUE;
Please note that sToken value is [Source]. [Source.Ia32]... and so on.
Thanks,
There are at least two problems with your code. First, the
regular expression you give contains the literal string
"Sources", and not "Source", which is what you seem to be
trying to match. The second is that boost::regex_match is
bound: it must match the entire string. What you seem to want
is boost::regex_search. Depending on what you are doing,
however, it might be better to try to match the entire string:
"\\[Source(?:\\.(\\w+))?\\]\\s*". Which provides for capture of
the trailing part, if present (but not the leading
"Source"---no point, in general, in capturing something that is
a constant).
Note too that the trailing ".*?" is very dubious. In Boost's
default Perl-style syntax it is a non-greedy quantifier, so at
the end of a pattern it contributes nothing to a search: it
simply matches the empty string.
The issue is that boost::regex_match only returns true if the entire input string is matched by the regex. So the '[' and ']' are not matched by your current regex, and it will fail.
Your options are either to use boost::regex_search, which will search for a substring of the input that matches your regex, or modify your regex to accept the entire string being passed in.
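The difference between the two calls can be sketched in Python, whose `re.fullmatch` and `re.search` behave analogously to the whole-string and substring matching described above. This is not Boost code, only an illustration of the same distinction; the variable names are invented for the example.

```python
import re

tokens = ["[Source]", "[Source.Ia32]", "[Source.Ia64]"]

# Close to the question's pattern (with "Source" spelled correctly).
loose = re.compile(r"(Source)(.*?)")
# The whole-string pattern suggested in the first answer.
full = re.compile(r"\[Source(?:\.(\w+))?\]\s*")

# Whole-string matching fails: the brackets are not covered by the pattern.
whole_with_loose = [loose.fullmatch(t) is not None for t in tokens]
# Substring searching succeeds on every token.
search_with_loose = [loose.search(t) is not None for t in tokens]
# The full pattern matches whole tokens and captures the optional suffix.
suffixes = [full.fullmatch(t).group(1) for t in tokens]

print(whole_with_loose)   # [False, False, False]
print(search_with_loose)  # [True, True, True]
print(suffixes)           # [None, 'Ia32', 'Ia64']
```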

URL Rewrite Pattern to exclude application name from path

I'm trying to use the IIS 7 URL Rewrite feature for the first time, and I'm having trouble getting my regular expression working. It seems like it should be simple enough. All I need to do is rewrite a URL like this:
http://localhost/myApplication/MySpecialFolder
To:
http://localhost/MySpecialFolder
Is this possible? I want the regular expression to ignore everything before "myApplication" in the original URL, so that I could use "http://localhost" OR "http://mysite", etc.
Here's what I've got so far:
^myApplication/MySpecialFolder$
But using the "Test Pattern..." feature in IIS, it says my patterns don't match unless I supply "myApplication/MySpecialFolder" exactly. Does anyone know how I can update my regular expression so that everything prior to "myApplication" is ignored and the following URLs will be seen as a match?
http://localhost/myApplication/MySpecialFolder
http://mysite/myApplication/MySpecialFolder
Many thanks in advance!
SOLUTION:
I needed to change my regex to:
myApplication/MySpecialFolder
Without the ^ at the beginning and without the $ at the end.
Your regular expression is correct, the pattern will be matched against path starting after the first slash after the domain.
So only the part after the host, myApplication/MySpecialFolder, will be used for matching in http://localhost/myApplication/MySpecialFolder
To limit the rewriting to a specific domain, you have to use the Conditions section with Condition input = {HTTP_HOST}
Unless there is something radically different with regexes in IIS, you would want to take out the anchor (^) at the beginning to match.
myApplication/MySpecialFolder$
The caret ^ tells it to match at the beginning of the string and the dollar sign $ tells it to match at the end. A regex like abc finds "abc" anywhere in the string, ^abc matches strings that start with "abc", abc$ matches strings that end with "abc", and ^abc$ only matches when the whole string is "abc".
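The four cases can be checked quickly with Python's `re.search`. This is only a sketch of how the anchors behave in general, not of the IIS engine itself, and the helper name `found` is made up for the example.

```python
import re

def found(pattern: str, text: str) -> bool:
    """True if pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

path = "myApplication/MySpecialFolder"
url = "http://localhost/myApplication/MySpecialFolder"

# Fully anchored: matches the bare path but not the complete URL.
anchored_path = found(r"^myApplication/MySpecialFolder$", path)
anchored_url = found(r"^myApplication/MySpecialFolder$", url)
# No anchors, or only the trailing $: both match inside the complete URL.
plain_url = found(r"myApplication/MySpecialFolder", url)
end_only_url = found(r"myApplication/MySpecialFolder$", url)

print(anchored_path, anchored_url, plain_url, end_only_url)
print(found("abc", "xabcx"), found("^abc", "xabc"),
      found("abc$", "xabc"), found("^abc$", "abc"))
```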

How do I get the following regular expression to not allow blank e-mails?

I am using the following regular expression to validate e-mails, but it allows empty strings as well, how can I change it to prevent it:
^[\w\.\-]+@[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]{1,})*(\.[a-zA-Z]{2,3}){1,2}$
I am using an asp:RegularExpressionValidator. My other option is to add an asp:RequiredFieldValidator, but I am curious whether it is possible to check for blanks in my RegularExpressionValidator, so I don't have to have two validators.
see http://www.regular-expressions.info/email.html
That expression does not match empty strings. The expression starts with ^[\w\.\-]+, which translates to "The string must start with a word character, period or hyphen, and there must be at least one of these." There must be something else wrong, or you copied the expression incorrectly.
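That claim is easy to check in isolation. The Python sketch below tests only the leading part of the expression (the name `LEADING` is invented for this example); since the + quantifier needs at least one character, an empty string cannot get past it.

```python
import re

# Only the first piece of the e-mail expression: one or more of \w, "." or "-".
LEADING = re.compile(r"^[\w\.\-]+")

empty_matches = LEADING.match("") is not None       # + needs at least one character
hyphen_matches = LEADING.match("-") is not None     # \- inside the class is a hyphen
slash_matches = LEADING.match("/") is not None      # a slash is not in the class

print(empty_matches, hyphen_matches, slash_matches)  # False True False
```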
This RegEx validates if a given string is in a valid email-format or not:
/^[a-zA-Z0-9\_\-\.]+\@([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}$/