I'm very much a newbie with regex and am having a hard time figuring this one out. I have an HTML document and I want to clear out a ton of URLs inside it. All of the URLs begin with https:// and end with a pound sign (#).
Any help would be greatly appreciated.

A basic way to do it:
\b //word start
[^\s#]+ //followed by anything but whitespace and '#'

If you truly want to clear everything in between the url from https:// [...] # then you can use:
But you may want to be more specific about what you are filtering out. If this comes from a database query you should be OK, since you can assume the URL will be the content of the field(s) returned and you will be running the regex in a code loop of some kind.
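As a rough illustration of the "https:// up to #" idea, here is a minimal Python sketch. The pattern, the function name strip_urls, and the sample markup are assumptions made for this example; they are not taken from the original answer.

```python
import re

# Assumed pattern: "https://", then any run of characters that are neither
# whitespace nor '#', then the closing '#'.
URL_TO_HASH = re.compile(r"https://[^\s#]*#")


def strip_urls(html: str) -> str:
    """Remove every https://...# span from the given text."""
    return URL_TO_HASH.sub("", html)


# Hypothetical sample input.
sample = "<p>See https://example.com/page# for details.</p>"
print(strip_urls(sample))
```

Because the character class excludes whitespace, a stray https:// that is never closed by a # on the same token is left alone instead of swallowing the rest of the document.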
Use Regex to match beginning and end part of URL in Google Analytics

I'm looking for a regex function to implement in a goal for Google Analytics.
Consider this URL: /dagje-uit/....variable part..../contact/vpv/bedankt
Regex should work when beginning of URL matches /dagje-uit/ and end part contains /contact/vpv/bedankt Everything in the middle can be variable.
Without result, I've tried

Forgive me if Google Analytics has some regex conventions that I am overlooking, but is it possible that your regex is failing because it does not account for the start of the whole URL? Adding .* to either end of your regex may help.
It also looks like your regex is overly complex for the conditions you have described. Could a simpler match be:
if you want to be a little more confident that it is a valid URL.
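To make the "fixed start, variable middle, fixed end" idea concrete, here is a small sketch tested with Python's re module. It assumes that Google Analytics treats ^, .* and literal slashes the same way Python does, and the sample paths are made up.

```python
import re

# Anchored at the start, anything in the middle, then the fixed tail.
GOAL = re.compile(r"^/dagje-uit/.*/contact/vpv/bedankt")


def is_goal(path: str) -> bool:
    """Return True when the path starts with /dagje-uit/ and later
    contains /contact/vpv/bedankt."""
    return GOAL.search(path) is not None


# Hypothetical request paths.
print(is_goal("/dagje-uit/amsterdam/museum/contact/vpv/bedankt"))
print(is_goal("/uitje/amsterdam/contact/vpv/bedankt"))
```

Append $ to the pattern if nothing may follow /bedankt.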

Regex for simple urls

I am looking for regex for simple URLs as
No subdirectories allowed. For example in this cases it must not validate,

I'm still a noob, but try this:
This one should do:
This should work for URLs starting with http:// or https:// or without the protocol name.
The regex should also be used as case-insensitive. In that case, it can be shortened a bit:
If you don't care whether it is a valid url, you can use:
All the examples contain www. followed by a nonspace character, but that is unlikely to occur in a normal word.
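Since the example URLs are missing here, the following Python sketch only illustrates the general shape described above: an optional http:// or https:// prefix, a www. host, an optional trailing slash, and no path. The pattern and the test URLs are assumptions, not the answerers' original regexes.

```python
import re

# Optional protocol, "www.", one or more dot-separated labels,
# an optional trailing slash, and nothing after it.
SIMPLE_URL = re.compile(
    r"^(?:https?://)?www\.[a-z0-9-]+(?:\.[a-z0-9-]+)+/?$",
    re.IGNORECASE,  # lets the class stay lowercase-only
)


def is_simple_url(url: str) -> bool:
    """True for www. URLs that have no subdirectory or file part."""
    return SIMPLE_URL.match(url) is not None


# Hypothetical inputs.
for candidate in ("www.example.com", "http://www.example.com/sub/page"):
    print(candidate, is_simple_url(candidate))
```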

Regex for excluding URL

I'm working with an email company that has a feature where they spider your site in order to provide custom content. I have the ability to have the spider ignore URLs based on the regex patterns I provide.
For this system a pattern starts and ends with a "/".
What I'm trying to do is ignore BUT allow
I would have thought the pattern below would work since it does not have a trailing slash but no luck.

Your regex matches a part of the URL, so you need to tell it not to allow a slash to follow it:
If you want to also avoid other partial matches like in, then an additional word boundary might help:
It depends on the regexp engine but you can probably either use $ (if the URL is tokenised beforehand) or a match for whitespace and delimiters
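The two suggestions above (no slash may follow, plus a word boundary) can be sketched in Python as follows. The path /products is a hypothetical stand-in for the real path, and whether the spider's regex engine supports lookahead is an assumption you would need to verify.

```python
import re

# "/products" followed by a word boundary and NOT followed by a slash.
# The path name is hypothetical; substitute the real one.
IGNORE = re.compile(r"/products\b(?!/)")


def should_ignore(url: str) -> bool:
    """True when the URL ends the path segment at /products."""
    return IGNORE.search(url) is not None


# Hypothetical URLs.
print(should_ignore("http://example.com/products"))
print(should_ignore("http://example.com/products/shoes"))
```

The word boundary rejects longer segments such as /productsale, and the negative lookahead rejects deeper paths such as /products/shoes.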

transforming URLS to active links with REGEX

I have this code in PHP that transforms URLs inside a text into active HTML links.
For example in a string
Hey check this cool link
this transforms to:
Hey check this cool link
As you can see, it just adds the correct <a> HTML tag.
The code is this:
$active_links_text = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]","\\0", $original_text);
My question is: how can I make this work EXCEPT when the URL is a YouTube URL?
So I want this result. In a string
Wow have you checked its even better than !!!
I want it to be transformed to
Wow have you checked its even better than
As you can see, the <a> HTML tag was added to the other URL but NOT to the YouTube URL.

Last note: I am using this code in PHP 5.2.14.

[EDIT : Wow, I had gotten your question completely wrong! Below's a better attempt at helping you.]
I gave it a go in JS, since I'm not a PHP coder; here is the original regex: /(http:\/\/(?![^<>\s]+)\b/g. The negative lookahead prevents a literal match (the lookahead content can be adapted if you need a more complex pattern).
There's nothing JS-specific here to my knowledge, but I don't know the ereg regex syntax. With the preg functions, you would just need not to escape the slashes; the word boundaries \b and the negative lookahead (?!*pattern*) are the same. The /g flag is for a global replacement, that is, not stopping at the first match. I'm not sure about the global flag in PHP, but I suppose you have some kind of replaceAll function in your toolbox.
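Here is the same negative-lookahead idea as a small sketch, written in Python rather than PHP. The host check (youtube.com with an optional www.), the function name linkify, and the example URLs are assumptions for illustration only.

```python
import re

# Match http(s) URLs unless the host starts with youtube.com or
# www.youtube.com. As in the original ereg pattern, the URL must end
# in an alphanumeric character or a slash, so trailing "!!!" is excluded.
LINK = re.compile(
    r"\bhttps?://(?!(?:www\.)?youtube\.com)[^<>\s]*[A-Za-z0-9/]",
    re.IGNORECASE,
)


def linkify(text: str) -> str:
    """Wrap every non-YouTube URL in an <a> tag."""
    return LINK.sub(r'<a href="\g<0>">\g<0></a>', text)


# Hypothetical input text.
text = ("Check http://example.com/page its even better than "
        "http://www.youtube.com/watch?v=abc123 !!!")
print(linkify(text))
```

re.sub replaces all matches by default, so no separate global flag is needed in Python.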
You've made several mistakes about valid URI components. The scheme is defined as ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), not [[:alpha:]]+.
The part after the : of the scheme need not start with //; that is particular to http: and a few other file-oriented schemes. But the [[:alpha:]]+: start of your regex shows you weren't aiming to restrict yourself to http:. In that case, all printable ASCII characters are valid, i.e. everything from ! to ~, or [\x21-\x7E]* as a regex.
To summarize: [[:alpha:]][A-Za-z0-9+.-]*:[\x21-\x7E]* (the hyphen goes last in the class so that it is taken literally instead of forming a range).
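A quick way to try the summarized pattern is the Python sketch below. It spells the scheme class with the hyphen last and both hex escapes written out in full; the sample URIs are made up for this example.

```python
import re

# Scheme: a letter, then letters, digits, "+", "." or "-".
# After the colon: any run of printable ASCII (0x21 to 0x7E).
URI = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:[\x21-\x7E]*")


def looks_like_uri(s: str) -> bool:
    """True when the whole string fits the scheme:rest shape."""
    return URI.fullmatch(s) is not None


# Hypothetical inputs.
for s in ("mailto:someone@example.com", "git+ssh://host/repo", "1http://x"):
    print(s, looks_like_uri(s))

# Pitfall: "+-." inside a class is a RANGE from "+" to ".", which
# also admits the comma.
print(re.fullmatch(r"[+-.]", ",") is not None)
```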

Regex Problem (newbie)

I'm writing a little app for spam checking and I'm having problems with a regex.
Let's say I have this spam URL:
So I want to check its URL for having 2 full stops (subdomain + ending), a slash, a word, a full stop and "html".
Here's what I've got so far:
It might look like rubbish but it works. The problem: it's really slow and freezes my app.
Any hints on how to optimize it?
The reason it's slow is that the non-greedy quantifier (.*?), used this way, is prone to catastrophic backtracking.
Instead of saying "any amount of anything, but only to an extent where it doesn't conflict with later requirements", which is effectively what .*? is saying, try asking for "as much as possible that isn't a double quote, which would terminate the href":
I also added a back-reference (\1) to your first capturing group, inside the <a>...</a>, so that you don't have to do the exact same matching all over again.
Note that this regex will be broken if, say, the a has a class name, an id, or anything else in its body. I left it like this because I wanted to give you what you asked for with as few changes as possible, and as to-the-point as possible.
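The original pattern is not shown here, so the following Python sketch only illustrates the technique: negated character classes in place of .*?, plus a \1 back-reference for the link text. The exact shape of the spam link and the sample host names are assumptions.

```python
import re

# Host: exactly three dot-separated labels (two full stops), then a
# slash, a word, and ".html". [^"/.] cannot run past a quote, slash
# or dot, so the engine has very little to backtrack over.
SPAM_LINK = re.compile(
    r'<a href="(http://[^"/.]+(?:\.[^"/.]+){2}/\w+\.html)">\1</a>'
)

# Hypothetical spam markup.
html = ('<a href="http://sub.example.com/offer.html">'
        'http://sub.example.com/offer.html</a>')

match = SPAM_LINK.search(html)
print(match.group(1) if match else None)
```

As noted above, this still breaks as soon as the tag carries extra attributes such as a class or an id.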
(http://[\w.-]+/.+?\.html) may work, though perhaps only for your case.
Or maybe a faster one:
Since you say you are a regexp newbie, I will offer some more general advice on creating and debugging regular expressions. When they get complicated, I find Regexp Coach a must.
It's freeware and really saves a lot of headaches. Not to mention that you don't have to build and run your application every minute just to see if the regexp works the way you wanted.
In Python, a simple way to match URLs ending in .html or .htm is to use
import re

url_re = re.compile(
    r'https?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'\S+\.html?',  # path ending in .html or .htm (dot escaped, no nested quantifier)
    re.IGNORECASE)  # the character classes above are uppercase-only
which is a modified version of Django's UrlField regex.
This will match any site ending with .html or .htm. (either localhost, ip, domain).