Regex to test if string begins with http:// or https://

I'm trying to set a regexp which will check the start of a string, and if it contains either http:// or https:// it should match it.
How can I do that? I'm trying the following which isn't working:
^[(http)(https)]://

Your use of [] is incorrect -- note that [] denotes a character class and will therefore only ever match one character. The expression [(http)(https)] translates to "match a (, an h, a t, a t, a p, a ), or an s." (Duplicate characters are ignored.)
Try this:
^https?://
If you really want to use alternation, use this syntax instead:
^(http|https)://
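To see the difference concretely, here is a small JavaScript sketch comparing the character-class attempt with the two suggested patterns. The sample strings and variable names are made up for illustration.

```javascript
// The character class matches exactly ONE character from the set ( ) h t p s,
// so it accepts "p://" but rejects a real "http://" prefix.
const charClass = /^[(http)(https)]:\/\//;
const optionalS = /^https?:\/\//;
const alternation = /^(http|https):\/\//;

const samples = [
  "http://example.com",
  "https://example.com",
  "p://example.com",
  "ftp://example.com",
];

const results = samples.map((s) => ({
  input: s,
  charClass: charClass.test(s),
  optionalS: optionalS.test(s),
  alternation: alternation.test(s),
}));

console.log(results);
```

The character class only succeeds on the made-up "p://" input, while both corrected patterns accept the http and https inputs and reject the others.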

Case insensitive:
var re = new RegExp("^(http|https)://", "i");
var str = "My String";
var match = re.test(str);

^https?://
You might have to escape the forward slashes though, depending on context.
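As an illustration of what "depending on context" can mean, this sketch shows the same pattern written as a JavaScript regex literal and as a string passed to the RegExp constructor. The sample URL is an assumption.

```javascript
// In a regex literal, "/" delimits the pattern, so it must be escaped.
const literal = /^https?:\/\//;

// In a string passed to the RegExp constructor, "/" is an ordinary character.
const fromString = new RegExp("^https?://");

const url = "https://example.com/page";
console.log(literal.test(url), fromString.test(url));
```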

^https?:\/\/(.*) where (.*) captures everything after http:// or https://

This should work
^(http|https)://

^ anchors the pattern to the start of the string.
? allows zero or one repetition, i.e. in s? the s may appear once or not at all.
/ is the delimiter of a JavaScript regex literal, so inside the literal it needs to be escaped with a backslash: \/
/^https?:\/\//.test('https://www.bbc.co.uk/sport/cricket'); // true
/^https?:\/\//.test('http://www.bbc.co.uk/sport/cricket'); // true
/^https?:\/\//.test('ftp://www.bbc.co.uk/sport/cricket'); // false

(http|https)?:\/\/(\S+)
This works for me.
I'm not a regex specialist, but I will try to explain the answer.
(http|https) : Parentheses indicate a capture group; "|" is an OR.
\/\/ : "\" escapes special characters, such as "/"
(\S+) : One or more non-whitespace characters, up to the next whitespace

This will work for URL encoded strings too.
^(https?)(:\/\/|(\%3A%2F%2F))
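A quick JavaScript check of the URL-encoded variant might look like this. The helper name startsWithHttpScheme and the sample inputs are hypothetical, and the i flag is an addition so that lowercase escapes such as %3a%2f%2f are accepted as well.

```javascript
// Accepts either a literal "://" or its percent-encoded form "%3A%2F%2F".
// The "i" flag is an addition here so that "%3a%2f%2f" is accepted too.
const schemePrefix = /^(https?)(:\/\/|%3A%2F%2F)/i;

// Hypothetical helper wrapping the test.
function startsWithHttpScheme(value) {
  return schemePrefix.test(value);
}

console.log(startsWithHttpScheme("https://example.com"));
console.log(startsWithHttpScheme("http%3A%2F%2Fexample.com"));
console.log(startsWithHttpScheme("ftp%3A%2F%2Fexample.com"));
```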

Making this case insensitive wasn't working in asp.net so I just specified each of the letters.
Here's what I had to do to get it working in an asp.net RegularExpressionValidator:
[Hh][Tt][Tt][Pp][Ss]?://(.*)
Notes:
(?i) and /whatever/i didn't work, probably because the validator also runs the expression in JavaScript, which doesn't support the inline (?i) modifier.
I originally had ^ at the beginning, but it made no difference; the (.*) did (the expression didn't work without (.*) but did work without ^).
I didn't need to escape the //, though it might be a good idea.
Here's the full RegularExpressionValidator if you need it:
<asp:RegularExpressionValidator ID="revURLHeaderEdit" runat="server"
ControlToValidate="txtURLHeaderEdit"
ValidationExpression="[Hh][Tt][Tt][Pp][Ss]?://(.*)"
ErrorMessage="URL should begin with http:// or https://" >
</asp:RegularExpressionValidator>
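The notes above are consistent with the validator accepting input only when the expression matches the whole value rather than any part of it. That is an assumption about the validator's behavior, not something the answer states; the helper matchesWholeValue below is hypothetical and simply emulates that rule in JavaScript.

```javascript
// Hypothetical helper: succeed only if the first match covers the entire value.
function matchesWholeValue(pattern, value) {
  const m = new RegExp(pattern).exec(value);
  return m !== null && m[0] === value;
}

const url = "HTTPS://example.com/page";

// Without (.*) the match stops after "://", so the whole value is not covered.
const withoutTail = matchesWholeValue("[Hh][Tt][Tt][Pp][Ss]?://", url);
// With (.*) the match extends to the end of the value.
const withTail = matchesWholeValue("[Hh][Tt][Tt][Pp][Ss]?://(.*)", url);

console.log(withoutTail, withTail);
```

Under that whole-value rule a leading ^ changes nothing, which would explain why it "didn't matter".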

Related

Regex to extract second word from URL

I want to extract a second word from my url.
Examples:
/search/acid/all - extract acid
/filter/ion/all/sss - extract ion
I tried a few approaches, such as
/.*/(.*?)/
but no luck.
A couple things:
The forward slashes / have to be escaped like this \/
The (.*?) will match the least amount of any character, including zero characters. In this case it will always match with an empty string.
The .* will take as many characters as it can, including forward slashes
A simple solution will be:
/.+?\/(.*?)\//
Update:
Since you are using JavaScript, try the following code:
var url = "/search/acid/all";
var regex = /.+?\/(.*?)\//g;
var match = regex.exec(url);
console.log(match[1]);
The variable match is an array. Its first element is the full match (everything that was matched); you can ignore it, since you are interested in the specific group we captured (the part we put in parentheses in the regex).
This regex will do the trick:
(?:[^\/]*.)\/([^\/]*)\/
I had difficulties with the above answers for URLs without a trailing forward slash:
/search/acid/all/ /* works */
/search/acid /* doesn't work */
To extract the second word from both URLs, this is what worked for me:
var url = "/search/acid";
var regex = /(?:[^\/]*.)\/([^\/]*)/g;
var match = regex.exec(url);
console.log(match[1]);
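An alternative sketch, not taken from the answers above: anchoring the pattern at the start of the path and using negated character classes handles both the trailing-slash and the no-trailing-slash case. The helper name secondSegment is hypothetical.

```javascript
// Hypothetical helper: return the second path segment, or null if there is none.
function secondSegment(path) {
  // ^\/        leading slash
  // [^\/]+     first segment
  // \/         separator
  // ([^\/]+)   second segment (captured)
  const match = /^\/[^\/]+\/([^\/]+)/.exec(path);
  return match ? match[1] : null;
}

console.log(secondSegment("/search/acid/all"));    // "acid"
console.log(secondSegment("/filter/ion/all/sss")); // "ion"
console.log(secondSegment("/search/acid"));        // "acid"
console.log(secondSegment("/search"));             // null
```

Because nothing is required after the captured segment, the pattern does not care whether more of the path follows.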

Regex matching only a portion of string

I would like to match a portion of a URL in this order.
First, the domain name will remain static, so there is nothing to check there with a regex.
$domain_name = "http://foo.com/";
What I would like to validate is what comes after the last /.
So, my aim is to create something like:
$strings_only = "[\w+]";
$number_only = "[\d+]";
$numbers_and_strings = "[0-9][a-z][A-Z]";
Now, I would like to use the above variables to check whether a URL conforms to the patterns mentioned.
$example_url = "http://foo.com/some-title-with-id-1";
var_dump(preg_match({$domain_name}{$strings_only}, $example_url));
The above should return false, because the title is NOT $strings_only.
$example_url = "http://foo.com/foobartar";
var_dump(preg_match({$domain_name}{$strings_only}, $example_url));
The above should return true, because the title is $strings_only.
Update:
~^http://foo\.com/[a-z]+/?$~i
~^http://foo\.com/[0-9]+/?$~
~^http://foo\.com/[a-z0-9]+/?$~i
These would be your three expressions to match alphabetical URLs, numeric URLs, and alphanumeric URLs. A couple of notes: \w matches [a-zA-Z0-9_], so I don't think it is what you expected. The + inside your character class ([]) does not have any special meaning, as you may have expected. \w and \d are "shorthand character classes" and do not need to be within the [] syntax (however, they can be, e.g. [\w.,]). Notice the i modifier: it makes the expressions case-insensitive, so we do not need to use [a-zA-Z].
$strings_only = '~^http://foo\.com/[a-z]+/?$~i';
$url = 'http://foo.com/some-title-with-id-1';
var_dump(preg_match($strings_only, $url)); // int(0)
$url = 'http://foo.com/foobartar';
var_dump(preg_match($strings_only, $url)); // int(1)
Test/tweak all of my above expressions with Regex101.
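The two points about [\w+] and \w can be checked quickly. This sketch uses JavaScript rather than PHP, but both behaviors are the same in PCRE; the variable names and sample strings are made up.

```javascript
// "+" inside a character class is a literal plus sign, not a quantifier.
const plusIsMember = /^[\w+]$/.test("+");        // true: "+" is in the class
const classIsOneChar = /^[\w+]$/.test("ab");     // false: a class matches one character

// \w also accepts digits and underscores, so it is wider than "letters only".
const wordAcceptsId = /^\w+$/.test("id_1");          // true
const lettersRejectId = /^[a-z]+$/i.test("id_1");    // false
const lettersAcceptMixed = /^[a-z]+$/i.test("FooBar"); // true, thanks to the i flag

console.log(plusIsMember, classIsOneChar, wordAcceptsId, lettersRejectId, lettersAcceptMixed);
```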
. matches any character, but only once. Use .* for zero or more or .+ for one or more. However, these are greedy: they will match as much of the string as they can, which can cause problems. You can make them lazy by appending ? (so that matching stops at the first / that follows). Or, you can match anything but a / using a negated character class, [^/].
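Here is a small sketch of the three options side by side, written in JavaScript with a made-up path string.

```javascript
const path = "cars/sedan/b1";

// Greedy: .* runs as far as it can, so the capture ends at the LAST slash.
const greedy = /^(.*)\//.exec(path)[1];

// Lazy: .*? takes as little as possible, so the capture ends at the FIRST slash.
const lazy = /^(.*?)\//.exec(path)[1];

// Negated class: [^\/]* cannot cross a slash at all.
const negated = /^([^\/]*)\//.exec(path)[1];

console.log(greedy);  // "cars/sedan"
console.log(lazy);    // "cars"
console.log(negated); // "cars"
```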
My final regex of choice would be:
~^https://stolak\.ru/([^/]+)/?$~
Notice the ~ delimiters, so that you don't need to escape every /. Also, you need to escape the . with \ since it has a special meaning. I threw the [^/]+ URI parameter into a capture group and made the trailing slash optional by using /?. Finally, I anchored this to the beginning and the end of the strings (^ and $, respectively).
Your question was somewhat vague, so I tried to interpret what you wanted to match. If I was wrong, let me know and I can update it. However, I tried to explain it all so that you could learn and tweak it to your needs. Also, play with my Regex101 link -- it will make testing easier.
Implementation:
$pattern = '~^https://stolak\.ru/([^/]+)/?$~';
$url = 'https://stolak.ru/car-type-b1';
preg_match($pattern, $url, $matches);
var_dump($matches);
// array(2) {
// [0]=>
// string(29) "https://stolak.ru/car-type-b1"
// [1]=>
// string(11) "car-type-b1"
// }

Regex to match a string not followed by anything

I am trying to figure out a regex sequence that will match the first item in the list below but not the others; {Some-Folder} is variable.
http://www.url.com/{Some-Folder}/
http://www.url.com/{Some-Folder}/thing/key/
http://www.url.com/{Some-Folder}/thing/119487302/
http://www.url.com/{Some-Folder}/{something-else}
Essentially I want to be able to detect anything that is of the form:
http://www.url.com/{Some-Folder}/
or
http://www.url.com/{Some-Folder}
but not
http://www.url.com/{Some-Folder}/{something-else}
So far I have
http://www.url.com/[A-Z,-]*\/^.
but this doesn't match anything
http://www.url.com/[^/]+/?$
Or, in the few parsers that use \Z as end of text,
http://www.url.com/[^/]+/?\Z
I customized a regex I've used for URL parsing before. It's not perfect, and it will need even more work once gTLDs become more widely used. Anyway, here it is:
\bhttps?:\/\/[a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)\/[^\/\s]+(?:\/\b)?
You may want to add the case-insensitive flag for whichever language you're using.
Demo: http://rubular.com/r/HyVXU30Hvp
You may use the following regex:
(?m)http:\/\/www\.example\.com\/[^\/]+\/?$
Explanation:
(?m) : Set the m modifier which makes ^ and $ match start and end of line respectively
http:\/\/www\.example\.com\/ : match http://www.example.com/
[^\/]+ : match anything except / one or more times
\/? : optionally match /
$ : declare end of line
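As a sketch of the same idea in JavaScript: the inline (?m) is written as the m flag there, and the g flag is added to collect every matching line. The sample lines are modeled on the question's list with a made-up folder name.

```javascript
const text = [
  "http://www.example.com/some-folder/",
  "http://www.example.com/some-folder/thing/key/",
  "http://www.example.com/some-folder",
  "http://www.example.com/some-folder/something-else",
].join("\n");

// "m" makes $ match at the end of each line; "g" collects every match.
const pattern = /http:\/\/www\.example\.com\/[^\/]+\/?$/gm;
const matches = text.match(pattern);

console.log(matches);
```

Only the first and third lines are returned, since the other two have further path content after the folder.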
I've been looking for an answer to this exact problem. aaaaaa123456789's answer almost worked for me. But the $ and \Z didn't work. My solution is:
http://www.url.com/[^/]+/?.{0}
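It may be worth checking this variant against a URL that should be rejected, since .{0} matches the empty string and therefore does not anchor anything. A minimal JavaScript comparison, with made-up sample URLs:

```javascript
const anchored = /http:\/\/www\.url\.com\/[^\/]+\/?$/;
const zeroWidth = /http:\/\/www\.url\.com\/[^\/]+\/?.{0}/;

const folderOnly = "http://www.url.com/some-folder/";
const deeper = "http://www.url.com/some-folder/something-else";

// Both patterns accept the folder-only URL.
console.log(anchored.test(folderOnly), zeroWidth.test(folderOnly));

// Only the $-anchored pattern rejects the deeper URL;
// .{0} succeeds as soon as the prefix has matched.
console.log(anchored.test(deeper), zeroWidth.test(deeper));
```

If $ appears not to work, a trailing newline or whitespace in the input is a common cause, so trimming the input first may be a safer fix than dropping the anchor.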

Regex to match alphanumerics, URL operators except forward slashes

I've been trying for the past couple of hours to get this regex right but unfortunately, I still can't get it. Tried searching through existing threads too but no dice. :(
I'd like a regex to match the following possible strings:
userprofile?id=123
profile
search?type=player&gender=male
someotherpage.htm
but not
userprofile/
helloworld/123
Basically, I'd like the regex to match alphanumerics, URL operators such as ?, = and & but not forward slashes. (i.e. As long as the string contains a forward slash, the regex should just return 0 matches.)
I've tried the following regexes but none seem to work:
([0-9a-z?=.]+)
(^[^\/]*$[0-9a-z?=.]+)
([0-9a-z?=.][^\/]+)
([0-9a-z?=.][\/$]+)
Any help will be greatly appreciated. Thank you so much!
The reason they all match is that your regexp matches part of the string and you've not told it that it needs to match the entire string. You need to make sure that it doesn't allow any other characters anywhere in the string, e.g.
^[0-9a-z&?=.]+$
Here's a small perl script to test it:
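#!/usr/bin/perl
use strict;
use warnings;
# The first four inputs should match; the last two contain "/" and should not.
my @testlines = (
    "userprofile?id=123",
    "userprofile",
    "userprofile?type=player&gender=male",
    "userprofile.htm",
    "userprofile/",
    "userprofile/123",
);
foreach my $testline (@testlines) {
    # Anchored at both ends, so every character must come from the allowed set.
    if ($testline =~ /^[0-9a-z&?=.]+$/) {
        print "$testline matches\n";
    } else {
        print "$testline doesn't match - bad regexp, no cookie\n";
    }
}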
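The same anchoring point can be checked in JavaScript. This sketch contrasts an unanchored character class with the anchored version on two of the question's inputs; the variable names are made up.

```javascript
const unanchored = /[0-9a-z&?=.]+/;
const anchored = /^[0-9a-z&?=.]+$/;

const inputs = ["userprofile?id=123", "userprofile/"];

const report = inputs.map((s) => ({
  input: s,
  unanchored: unanchored.test(s), // true if ANY part of the string fits
  anchored: anchored.test(s),     // true only if the WHOLE string fits
}));

console.log(report);
```

The unanchored pattern accepts "userprofile/" because the "userprofile" part alone satisfies it.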
This should do the trick:
/^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i
To break this down:
^ // Anchor to start of string
\w+ // Match [a-z0-9_] (1 or more), to specify resource
( // Alternation group (i.e., a OR b)
\.htm // Match ".htm"
| // OR
\? // Match "?"
\w+=\w* // Match first term of query string (e.g., something=foo)
(&\w+=\w*)* // Match remaining terms of query string (zero or more)
)
? // Make alternation group optional
$ // Anchor to end of string
The i flag is for case-insensitivity.
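A quick check of the pattern in its fully anchored form (^ at the start, $ at the end) against the question's examples, as a JavaScript sketch. The variable names are made up.

```javascript
const pagePattern = /^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i;

const shouldMatch = [
  "userprofile?id=123",
  "profile",
  "search?type=player&gender=male",
  "someotherpage.htm",
];
const shouldNotMatch = ["userprofile/", "helloworld/123"];

const accepted = shouldMatch.filter((s) => pagePattern.test(s));
const rejected = shouldNotMatch.filter((s) => !pagePattern.test(s));

console.log(accepted.length, rejected.length);
```

The leading ^ matters for inputs like "helloworld/123": without it, the trailing "123" alone would satisfy \w+ followed by $.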

ColdFusion - How to get only the URL's in this block of text?

How can I extract only the URL's from the given block of text?
background(http://w1.sndcdn.com/f15ikDS9X_m.png)
background-image(http://w1.sndcdn.com/5ikDIlS9X_m.png)
background('http://w1.sndcdn.com/m1kDIl9X_m.png')
background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')
background("http://w1.sndcdn.com/fm15iklS9X_m.png")
background-image("http://w1.sndcdn.com/m5iIlS9X_m.png")
Perhaps Regex would work, but I'm not advanced enough to work it out!
Many thanks!
Mikey
You're over-thinking the problem - all you need to do is match the URLs, which is a simple match:
rematch('\bhttps?:[^)''"]+',input)
That'll work for the input provided; it might need tweaking for different input.
(e.g. You can optionally add a \s into the char class if that might be a factor.)
The regex itself is simple:
\bhttps?: ## look for http: or https: with no alphanumeric chars beforehand.
[^)'"]+ ## match characters that are NOT ) or ' or "
## match as many as possible, at least one required.
If this produces false positives, you can of course look for a more refined URL regex.
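For comparison, the same pattern can be tried in JavaScript, where a global match returns all URLs at once. This is a sketch with three of the question's lines as input, not ColdFusion code.

```javascript
const css = [
  "background(http://w1.sndcdn.com/f15ikDS9X_m.png)",
  "background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')",
  'background("http://w1.sndcdn.com/fm15iklS9X_m.png")',
].join("\n");

// \bhttps?:   "http:" or "https:" at a word boundary
// [^)'"]+     everything up to the next ), ' or "
const urls = css.match(/\bhttps?:[^)'"]+/g) || [];

console.log(urls);
```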
background(?:-image)?\((["']?)(?<url>http.*)\1\)
Explanation:
background(?:-image)? -> matches background or background-image (non-capturing group)
\( -> matches a literal opening parenthesis
(["']?) -> captures a ' or " before the URL, or nothing if there is no quote
(?<url>http.*) -> matches the URL
\1\) -> matches whatever the first group captured (the optional quote), then a literal closing parenthesis
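JavaScript supports the same named-group and backreference syntax, so the pattern can be sketched there with matchAll. The input below uses three of the question's lines, and the variable names are made up.

```javascript
const css = [
  "background(http://w1.sndcdn.com/f15ikDS9X_m.png)",
  "background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')",
  'background-image("http://w1.sndcdn.com/m5iIlS9X_m.png")',
].join("\n");

// Group 1 captures the optional quote; \1 requires the same quote to close.
// The named group "url" holds the URL itself.
const declaration = /background(?:-image)?\((["']?)(?<url>http.*)\1\)/g;

const urls = Array.from(css.matchAll(declaration), (m) => m.groups.url);

console.log(urls);
```

Since . does not match a newline by default, the greedy http.* stays within one line and backtracks to the closing quote and parenthesis.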
If you want an answer without regular expressions, something like this will work.
YourString = "background(http://w1.sndcdn.com/f15ikDS9X_m.png)";
YourString = ListLast(YourString, "("); // yields http://w1.sndcdn.com/f15ikDS9X_m.png)
YourString = replace(YourString, ")", ""); // http://w1.sndcdn.com/f15ikDS9X_m.png
Since you are doing it more than once, you can make it a function. Also, you might need some other replace commands to handle the quotes that are in some of your strings.
Having said all that, getting a regex to work would be better.
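For illustration, the function idea could look like this in JavaScript, using only plain string operations. The name extractUrl is hypothetical, and the sketch assumes each input holds a single declaration.

```javascript
// Hypothetical helper: take the text after the last "(" and strip ), ' and ".
function extractUrl(declaration) {
  let result = declaration.slice(declaration.lastIndexOf("(") + 1);
  for (const ch of [")", "'", '"']) {
    // split/join removes every occurrence of ch without using a regex.
    result = result.split(ch).join("");
  }
  return result;
}

console.log(extractUrl("background(http://w1.sndcdn.com/f15ikDS9X_m.png)"));
console.log(extractUrl("background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')"));
```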