How do I map this URL - regex

I am using Google App Engine and can't map this URL: "user/test#example.com"
application = webapp.WSGIApplication( [('/user/(\w+)',UsersSubPath)],debug=True)
I don't know why this expression doesn't work. Any ideas?

You'll have to widen the scope of your regex. \w only matches [A-Za-z0-9_], which excludes the special characters # and . (dot). For this example you could use:
'/user/([A-Za-z0-9#.]*)'
or
'/user/(\S*)'
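To see the difference outside App Engine, here is a minimal sketch using plain Python's re module. It assumes the framework anchors the route pattern at both ends, which re.fullmatch imitates; the variable names are just for illustration.

```python
import re

# Assumption: the route pattern must match the whole path,
# so re.fullmatch is used to imitate that behaviour.
path = "/user/test#example.com"

original = re.fullmatch(r"/user/(\w+)", path)
explicit = re.fullmatch(r"/user/([A-Za-z0-9#.]*)", path)
permissive = re.fullmatch(r"/user/(\S*)", path)

print(original)             # None: \w cannot match '#' or '.'
print(explicit.group(1))    # test#example.com
print(permissive.group(1))  # test#example.com
```

Keep in mind that browsers normally treat # as the start of the fragment and do not send that part to the server, so a literal # in the path would probably have to arrive percent-encoded as %23.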

Related

A pattern to match [characters]:[characters] inside an URL

I have a URL like the one below and want to use a regex to extract segments like: Id:Reference, Title:dfgdfg, Status.Title:Current Status, CreationDate:Logged...
This is the closest pattern I got: [=,][^,]*:[^,]*[,&] but obviously the result is not as expected. Any better ideas?
P.S. I'm using [^,] to match any character except , because , will not appear inside a segment.
This is the site I'm using for regex pattern matching:
http://regexpal.com/
The URL:
http://localhost/site/=powerManagement.power&query=_Allpowers&attributes=Id:Reference,Title:dfgdfg,Status.Title:Current Status,CreationDate:Logged,RaiseUser.Title:标题,_MinutesToBreach&sort_by=CreationDate"
Thanks,
You haven't specified which programming language you use, but most regex engines with Unicode support will accept this:
([\p{L}\.]+):([\p{L}\.]+)
\p{L} matches a letter from any language (a Unicode letter), provided that your regex engine supports Unicode properties. RegEx 101.
You can extract the matches via capturing groups if you want.
In Python:
import re
# url holds the URL string from the question
matchobj = re.match(r"^.*Id:(.*?),Title:(.*?),.*$", url)
Id = matchobj.group(1)     # "Reference"
Title = matchobj.group(2)  # "dfgdfg"
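If you want every key:value segment rather than just Id and Title, one possible approach (a sketch, not the only way) is to isolate the attributes parameter first and then split it. Python's built-in re module does not understand \p{L}, so this avoids it altogether. The names attrs and pairs are made up for the example.

```python
import re

url = ("http://localhost/site/=powerManagement.power&query=_Allpowers"
       "&attributes=Id:Reference,Title:dfgdfg,Status.Title:Current Status,"
       "CreationDate:Logged,RaiseUser.Title:标题,_MinutesToBreach"
       "&sort_by=CreationDate")

# Grab everything between "attributes=" and the next "&".
attrs = re.search(r"attributes=([^&]*)", url).group(1)

# Split on commas and keep only the segments that contain a colon.
pairs = [tuple(seg.split(":", 1)) for seg in attrs.split(",") if ":" in seg]

for key, value in pairs:
    print(key, "->", value)
```

Restricting the search to the attributes parameter also keeps the "http:" at the start of the URL from being picked up as a key.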

Matching URLs with other characters around

I need a regex pattern to match URLs in a complicated environment.
A URL would appear in this position:
[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]
(That's just a sample URL)
I need to match the URL up to the colon; the colon and the code after it should be ignored. There are so many URLs out there, and I'm not experienced enough to create a pattern that matches everything from http:// to the :
As I said, everything else should be ignored and left out, except the URL, which I need to store in a variable.
Could someone help me create such a pattern? My attempts matched the URL above, but when I put in more complicated URLs, they wouldn't match.
This is the pattern I've created. It works with simple URLs, but not with the complicated ones:
http(s)?://[A-Za-z0-9.,/_-]+
I'm not very good at regex; I'm still learning.
Thank you.
This regex should do it for you.
\[url=(.*?):[a-zA-Z0-9]*\]
Run against your test data:
[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]
This will return the URL in capture group 1.
Assuming PHP (since your test URL is for the PHP manual), you'd use this with preg_match like this:
$value = "[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]";
$pattern = "/\[url=(.*?):[a-zA-Z0-9]*\]/";
preg_match($pattern, $value, $matches);
echo $matches[1];
Output:
http://www.php.net/manual/en/function.preg-replace.php
This will also work against URLs which contain colons in them, such as:
http://www.php.net:8080/manual/en/function.preg-replace.php
http://www.php.net/manual/us:en/function.preg-replace.php
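For anyone not on PHP, the same pattern can be tried in Python. This is only a sketch of the idea above; extract_url and BBCODE_URL are names made up for the example.

```python
import re

# Same pattern as above: lazily capture everything after "[url="
# up to the last colon that is followed by the code and "]".
BBCODE_URL = re.compile(r"\[url=(.*?):[a-zA-Z0-9]*\]")

def extract_url(text):
    """Return the URL inside the first [url=...:code] tag, or None."""
    m = BBCODE_URL.search(text)
    return m.group(1) if m else None

samples = [
    "[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]",
    "[url=http://www.php.net:8080/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]",
    "[url=http://www.php.net/manual/us:en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]",
]

for s in samples:
    print(extract_url(s))
```

The earlier colons survive because the part after the colon must be letters or digits followed directly by ], which only holds for the trailing code.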
How about this:
^(http(s)?:\/\/)?[^]^(^)^ ]+
The regex below will give you the URL part before the colon:
\[url=((http|https)?://)?[^\:]+

Exclude part of the string with regex

I'm quite bad with regex, and I'm trying to match a set of criteria.
This is a regex that should be embedded into the URL rule for a firewall, so that it blocks any URL that is not like the ones in the list at the end.
This is what I'm currently using, but it's not working:
http://www.youtube.com/(*.*)list=UUFwtOm4N5djdcuTAlNIWJaQ
This is the example url (to be blocked):
http://www.youtube.com/watch?NR=1&feature=fvwp&v=P1b5VY_Bp_o&list=UUFwtOm4N5djdcuTAlNIWJaQ
I'm trying to make a regex that will successfully match when NR=1 or feature=fvwp are NOT present. I assume I can do it like this: (?!^feature=fvwp$), but the v= and list=UUFwtOm4N5djdcuTAlNIWJaQ parameters are allowed.
Also, the v= value should be limited to letters (uppercase and lowercase) and digits, with a length of 11. I assume that's: /^[a-z0-9]{11}$/
How can I put all of that together so that it allows and matches only these URLs, while excluding the criteria I explained above:
http://www.youtube.com/watch?v=4eK_RWpTgcc&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
http://www.youtube.com/watch?v=TLRl85TJwZM&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
http://www.youtube.com/watch?v=QEV9yqrpxkc&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ
Can you block based on matching by regex? If so, just use
(.*)www\.youtube\.com/watch\?NR=1&feature=fvwp and block whatever matches that.
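If the firewall can only allow on a match rather than block on one, a lookahead-based allow pattern might work. Below is a rough Python sketch; whether your firewall's regex engine supports lookaheads is an assumption you would need to check, and the character set for v= (letters, digits, _ and -) is guessed from the example URLs. ALLOWED and is_allowed are hypothetical names.

```python
import re

ALLOWED = re.compile(
    r"^http://www\.youtube\.com/watch\?"
    r"(?!.*(?:NR=1|feature=fvwp))"            # neither parameter may appear
    r"(?=.*\bv=[A-Za-z0-9_-]{11}(?:&|$))"     # v= must be exactly 11 characters
    r".*list=UUFwtOm4N5djdcuTAlNIWJaQ$"       # must end with the expected list
)

def is_allowed(url):
    """True when the URL satisfies all three conditions above."""
    return ALLOWED.match(url) is not None

blocked = ("http://www.youtube.com/watch?NR=1&feature=fvwp"
           "&v=P1b5VY_Bp_o&list=UUFwtOm4N5djdcuTAlNIWJaQ")
allowed = ("http://www.youtube.com/watch?v=4eK_RWpTgcc"
           "&feature=BFa&list=UUFwtOm4N5djdcuTAlNIWJaQ")

print(is_allowed(blocked))  # False
print(is_allowed(allowed))  # True
```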

How do I decipher a dynamic URL magic in Django

This is my current regex:
url(r'^([a-zA-Z0-9/_-]+):p:(?P<sku>[a-zA-Z0-9_-]+)/$', 'product_display', name='product_display'),
url(r'^(?P<path>[a-zA-Z0-9/_-]+)$', 'collection_display', name='collection_display'),
My problem is this: I want to be able to match the product_display regex without using :p: in it. I could put .html at the end to set it apart from the collection_display regex, but that doesn't fix the real problem: without the ":p:" delimiter, the URI "some-collection/other/other/sku.html" would match the path group all the way up to ".html", swallowing the sku. How can I do this without using ":p:" to end the collection part? Anything will help.
Thanks
It looks like your sku can't contain slashes, so I would recommend using "/" as your delimiter. Then the ".html" trick can be used; it turns out that your collection_display regex doesn't match the dot, but to make absolutely sure, you can use a negative look-behind:
url(r'^([a-zA-Z0-9/_-]+)/(?P<sku>[a-zA-Z0-9_-]+)\.html$', 'product_display', name='product_display'),
url(r'^(?P<path>[a-zA-Z0-9/_-]+)(?<!\.html)$', 'collection_display', name='collection_display'),
Alternatively, always end your collection_display urls with a slash and product_display with ".html" (or vice versa).
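You can check how the two patterns divide up the URIs without starting Django, using plain re. This is just a sketch; resolve is a made-up helper that tries the patterns in the same order as the urlconf, and it is not part of Django.

```python
import re

PRODUCT = re.compile(r"^([a-zA-Z0-9/_-]+)/(?P<sku>[a-zA-Z0-9_-]+)\.html$")
COLLECTION = re.compile(r"^(?P<path>[a-zA-Z0-9/_-]+)(?<!\.html)$")

def resolve(uri):
    """Return (view_name, captured_value) for the first matching pattern."""
    m = PRODUCT.match(uri)
    if m:
        return ("product_display", m.group("sku"))
    m = COLLECTION.match(uri)
    if m:
        return ("collection_display", m.group("path"))
    return None

print(resolve("some-collection/other/other/sku.html"))
print(resolve("some-collection/other/other"))
```

Because the last "/" before ".html" acts as the delimiter, the greedy first group keeps the whole collection path and the sku group gets only the final segment.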

Match URL that doesn't contain asp, aspx, css, htm, html, jpg

Q-1. Match a URL that doesn't contain asp, aspx, css, htm, html, jpg.
Q-2. Match a URL that doesn't end with asp, aspx, css, htm, html, jpg.
You want to use the 'match count' quantifier, and make it match 0 times,
e.g. (matches all characters, then a dot, then anything that isn't aspx or css):
^.*\.((aspx) | (css)){0}.*$
Edit: added ^ (start) and $ (end of line) anchors.
Q-1. This is better done using a normal string search, but if you insist on regex: ^((?!asp|aspx|css|htm|html|jpg).)*$
Q-2. This is better done using a normal string search, but if you insist on regex: .*(?<!asp|css|htm|jpg)(?<!aspx|html)$
If your regular expression implementation does allow lookaround assertions, try these:
(?:(?!aspx?|css|html?|jpg).)*
.*$(?<!aspx?|css|html?|jpg)
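Here is a small Python sketch of the lookaround approach for both questions. Two caveats: a {0} quantifier only matches zero occurrences (that is, the empty string), so it cannot exclude anything and a lookaround is needed; and Python's re requires fixed-width lookbehinds, which is why the endings are split into a three-letter group and a four-letter group. The function names are invented for the example.

```python
import re

# Q-1: no position in the string may start one of the tokens.
NOT_CONTAINING = re.compile(r"^(?:(?!aspx?|css|html?|jpg).)*$")

# Q-2: the string must not end with one of the tokens.
NOT_ENDING = re.compile(r"^.*(?<!asp|css|htm|jpg)(?<!aspx|html)$")

def lacks_token(url):
    """True when none of the tokens appears anywhere in url."""
    return NOT_CONTAINING.match(url) is not None

def lacks_suffix(url):
    """True when url does not end with any of the tokens."""
    return NOT_ENDING.match(url) is not None

for url in ["/about", "/page.aspx?id=1", "/index.html"]:
    print(url, lacks_token(url), lacks_suffix(url))
```

Note that the first pattern rejects any URL that merely contains the letters "asp" or "css" somewhere, which is what Q-1 literally asks for but may be stricter than intended.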