Help with capturing URL fragment in Django - django

I am working on a django project, and trying to match URLs in the following form:
/cards/series/SERIES NAME/
where I am trying to capture SERIES NAME
My base urls.py includes:
(r'^cards/', include('cards.urls')),
and then my cards/urls.py includes
(r'^series/(\w+)/$',
However, the regex is not matching (404). If I hard code the path like so:
(r'^series/foo/$',
Then I can get it to match on /cards/series/foo/
So, does anyone have an idea of what I am doing wrong, and why my regex isn't catching /cards/series/SERIES NAME/ ?
Update : The regex will match single words, but not multiple. So:
/cards/series/FOO/
matches, but:
/cards/series/FOO BAR/
or anything with a space does not match.
Update : Found the solution here:
how to unescape special characters in django urlpatterns
which is:
(r'^series/([\w ]+)/$',

You are not allowed to have spaces in URLs. They must be escaped.
http://en.wikipedia.org/wiki/Query_string#URL_encoding
So essentially, don't do this. If spaces are important they should wind up in your URL as things like /cards/series/SERIES%20NAME/. It is considered better practice to use hyphens these days, although underscores are okay too. Use of capital letters is legal, but will strike some people (like me) as poor form.
As for why the pattern doesn't work, Django URL patterns are regular expressions:
http://en.wikipedia.org/wiki/Regular_expression
You are capturing the pattern \w+. \w stands for alphanumeric characters plus "_", and the + that follows says to match one or more of the preceding element. A space is not an alphanumeric character, so if you want to match spaces as well you have to add the space to a set, as you have found. Brackets indicate a set of acceptable characters, so [\w ]+ means one or more characters that are alphanumeric, "_", or " ".
Again, don't do it.
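To see why the original pattern ends in a 404, the two patterns from the question can be tried directly with Python's re module, without Django. This is only a sketch: the helper name capture_series is made up for illustration, and the paths are written without the leading cards/ prefix on the assumption that the base urls.py has already consumed it.

```python
import re
from urllib.parse import quote

# The two patterns from the question, as they would appear in cards/urls.py.
WORD_ONLY = re.compile(r'^series/(\w+)/$')
WORD_OR_SPACE = re.compile(r'^series/([\w ]+)/$')


def capture_series(pattern, path):
    """Return the captured series name, or None if the path does not match.

    Hypothetical helper, used here only to compare the two patterns.
    """
    match = pattern.match(path)
    return match.group(1) if match else None


print(capture_series(WORD_ONLY, 'series/FOO/'))          # FOO
print(capture_series(WORD_ONLY, 'series/FOO BAR/'))      # None
print(capture_series(WORD_OR_SPACE, 'series/FOO BAR/'))  # FOO BAR

# What a space looks like once it is percent-encoded for use in a URL.
print(quote('SERIES NAME'))  # SERIES%20NAME
```

If the series names are replaced by hyphenated slugs, as suggested above, a character class such as [\w-]+ would be enough and the space would not be needed at all.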

Related

regular expression which can treat a string containing '#' as illegal input

I wrote a regular expression (https?:\/\/)+([a-x]*)?.[a-z]*.(com|io|cn|net) that can achieve:
Must start with http or https
Must end with com,cn,io or net
Domain names can only consist of numbers, letters, and underscores
Subdomain can be empty
A valid input would be 'http://123.cn' or 'https://www.123.cn',
but it also accepts 'http://ww#.123.com' as valid.
What is wrong with my expression, and how can I reject input containing '#'?
If you use a RegEx tester online (like regex101.com) it will tell you that it's matching because the . is not escaped as \. so it will match the # character.
Try: ^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$ and you may get what you're looking for.
Note your original RegEx did not include digits or the underscore in the domain names.
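The point about the unescaped dot, and the suggested pattern, can be checked in Python. This is a minimal sketch; the names URL_RE and is_valid are made up for illustration, and the three test strings are the ones from the question.

```python
import re

# An unescaped "." matches any character, including "#".
print(bool(re.fullmatch(r'ww.', 'ww#')))   # True
# An escaped "\." matches only a literal dot.
print(bool(re.fullmatch(r'ww\.', 'ww#')))  # False

# The pattern suggested in the answer, anchored at both ends.
URL_RE = re.compile(r'^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$')


def is_valid(url):
    """Return True if the whole string matches the suggested pattern."""
    return URL_RE.match(url) is not None


for url in ('http://123.cn', 'https://www.123.cn', 'http://ww#.123.com'):
    print(url, is_valid(url))
```

The first two URLs are accepted and the one containing # is rejected, because every character between the scheme and the top-level domain now has to be a letter, digit, underscore, or a literal dot.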

What's the regex to match a string except it has a $ dollar sign in it?

So basically, I have a text like this:
secret(mapOf("path" to "config/info"))
secret(mapOf("path" to "config/data/${rootProject.name}"))
prefix(mapOf("path" to "config/${rootProject.name}", "format" to "${rootProject.name.replace('-', '_')}"))
I want to match a path like config/info and not to match paths which contain variables (have dollar signs in them). I came up with this ((?:'|\")config/.+(?:'|\")) but it also matches the others.
How can I exclude strings with dollar signs?
What you are looking for (if I fully understand your question) is this one:
(['\"]config/[^$]+['\"])
Explanation:
I used ['"] instead of (?:'|\") because I believe it makes your regex more readable and easier to understand in the future.
Using [^$]+ instead of .+ is the key to solving your issue: .+ will match anything, whereas [^$]+ will match any character as long as it is not $.
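Here is a small Python sketch of the pattern above, run against the three lines from the question. Because [^$] also matches a newline, the sketch applies the answer's pattern one line at a time. STRICT_RE is a variant of my own, not from the answer: it also excludes quotes and newlines from the path, on the assumption that a path never contains them, so it can be run over the whole text at once.

```python
import re

TEXT = '''secret(mapOf("path" to "config/info"))
secret(mapOf("path" to "config/data/${rootProject.name}"))
prefix(mapOf("path" to "config/${rootProject.name}", "format" to "${rootProject.name.replace('-', '_')}"))'''

# The pattern from the answer.
ANSWER_RE = re.compile(r"""(['"]config/[^$]+['"])""")

# Assumed stricter variant: the path may not contain "$", quotes, or newlines.
STRICT_RE = re.compile(r"""['"]config/[^$'"\n]+['"]""")


def first_match(line):
    """Return the quoted path found in one line, or None."""
    m = ANSWER_RE.search(line)
    return m.group(1) if m else None


per_line = [first_match(line) for line in TEXT.splitlines()]
print(per_line)                 # ['"config/info"', None, None]
print(STRICT_RE.findall(TEXT))  # ['"config/info"']
```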

REGEX: Match all instances of text, digits, + , _ and -, Between colons, which are NOT part of an URL

I'd like to find and replace (with nothing) all instances of text between colons, like such:
:smile:
:thumbs_up:
:+1:
:-1:
but NOT if the colons are part of the url, like this URL for example:
http://pdf.reuters.com/htmlnews/htmlnews.asp?i=43059c3bf0e37541&u=urn:newsml:reuters.com:20190417:nPn5XHnXBa
As you can see, this URL has several colons and any such matches should be ignored.
The complete text can have some text before and after as well. In addition, these can also show up in succession, without any spaces in between. For example:
I was browsing and found this url :smile: http://pdf.reuters.com/htmlnews/htmlnews.asp?i=43059c3bf0e37541&u=urn:newsml:reuters.com:20190417:nPn5XHnXBa it's fantastic :smile::+1: Remember: don't forget to upvote!
I would expect the result to be:
I was browsing and found this url http://pdf.reuters.com/htmlnews/htmlnews.asp?i=43059c3bf0e37541&u=urn:newsml:reuters.com:20190417:nPn5XHnXBa it's fantastic Remember: don't forget to upvote!
I am using python regex module for my replacements.
My thinking is:
"Ok, I should find any URL and tell the regex to IGNORE any matches that are part of the URL"
So I have the regex to successfully match any URL as such:
(http[^\s]+)
This will find http and anything else up to the next whitespace character (including a newline), which would indicate the end of the URL.
I also have regex to match the text between (including) colons:
(:[\w+-]+:)
SO... I was hoping to use negative lookahead and combine these 2 like this:
(?!http[^\s]+)(:[\w+-]+:)
This is ALMOST perfect but it ends up matching these 2 parts of the URL:
:newsml:
and
:20190417:
How can I build this regex so that it matches everywhere in the text, EXCEPT if the colons are part of the URL?
Thanks a million!
PS. I've been using this awesome site to test my patterns...
https://regexr.com/
One option is to have your regex match a URL pattern (captured in a group), or match something enclosed in :s, and then you can replace with the first captured group:
(https?://\S+)|:[\w+-]+:
replace with
\1
This ensures that URLs will stay where they are in the text (being matched and replaced with themselves), but the colon sections that you want to remove will be matched and replaced with nothing.
https://regex101.com/r/d7mM1s/2
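In Python this approach might look like the sketch below. The function name strip_colon_codes is made up for illustration. A replacement function is used instead of the string \1 so that the group is handled explicitly when the URL alternative did not take part in the match.

```python
import re

# Alternative 1 captures a URL; alternative 2 matches a :code: and captures nothing.
PATTERN = re.compile(r'(https?://\S+)|:[\w+-]+:')


def strip_colon_codes(text):
    """Remove :codes: from text while leaving URLs untouched."""
    # Keep group 1 (the URL) when it matched; otherwise replace with nothing.
    return PATTERN.sub(lambda m: m.group(1) or '', text)


sample = ("I was browsing and found this url :smile: "
          "http://pdf.reuters.com/htmlnews/htmlnews.asp?i=43059c3bf0e37541"
          "&u=urn:newsml:reuters.com:20190417:nPn5XHnXBa "
          "it's fantastic :smile::+1: Remember: don't forget to upvote!")

cleaned = strip_colon_codes(sample)
print(cleaned)
```

The spaces on either side of a removed code are left in place, so the result contains double spaces where a code used to be. Collapsing them is a separate step if it matters.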

Regex for URL matching

I want to match the two URLs below.
1. /,a=e[o],e[o]=function(){s=arguments},i.always(function(){e[o]=a,n[o]&&(n.jsonpcallback=r.jsonpcallback,fn.push(o)),s&&x.isfunction(a)&&a(s[0]),s=a=t}),
2. /,a[f]=function(){h=arguments},e.always(function(){a[f]=g,c[f]&&(c.jsonpcallback=d.jsonpcallback,ce.push(f)),h&&p.isfunction(g)&&g(h[0]),h=g=b}),
The regex I am using for that is:
^[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
But the regex above also matches:
https://www.test.com/test/test/test.php
I want to write a regex that requires every one of the special characters [ ] { } ( ) & , . found in the two URLs above. If any of these characters is missing, the regex should not match.
Short Answer
^(?=.*\[)(?=.*])(?=.*\{)(?=.*})(?=.*\()(?=.*\))(?=.*&)(?=.*,)(?=.*\.)[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
Longer Answer
You can use a positive lookahead to ensure the result contains a set of characters.
For example, add this to the start:
(?=.*\[)
And it will only match results that contain an opening square bracket [
You can do this for each of the special characters that you need to ensure are present.
For example, if you want to ensure it contains all of the characters []{}()&,. then you would add this at the start:
(?=.*\[)(?=.*\])(?=.*\{)(?=.*\})(?=.*\()(?=.*\))(?=.*\&)(?=.*\,)(?=.*\.)
Just be sure to escape the relevant characters depending on your programming language and type of regex
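As a sketch of how this could be assembled in Python, the lookaheads can be generated from the list of required characters rather than typed by hand. The names REQUIRED, LOOKAHEADS, ALLOWED and PATTERN are made up for illustration; the allowed character class is the one from the question with the redundant escapes dropped.

```python
import re

# Every character in this string must appear somewhere in the input.
REQUIRED = '[]{}()&,.'

# One positive lookahead per required character, e.g. (?=.*\[)
LOOKAHEADS = ''.join('(?=.*{})'.format(re.escape(ch)) for ch in REQUIRED)

# The characters the input may consist of, up to 500 of them.
ALLOWED = r'[a-zA-Z0-9:/.,\[\]=() {}&]{0,500}'

PATTERN = re.compile('^' + LOOKAHEADS + ALLOWED + '$')

js_fragment = ('/,a=e[o],e[o]=function(){s=arguments},i.always(function()'
               '{e[o]=a,n[o]&&(n.jsonpcallback=r.jsonpcallback,fn.push(o)),'
               's&&x.isfunction(a)&&a(s[0]),s=a=t}),')
plain_url = 'https://www.test.com/test/test/test.php'

print(bool(PATTERN.match(js_fragment)))  # True
print(bool(PATTERN.match(plain_url)))    # False
```

The plain URL is rejected because it contains no opening square bracket, so the first lookahead already fails.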

Regular expression to prevent adjacent repeating dashes

In my ASP.NET application I am restricting allowed URL formats with regular expressions. I need to create a regular expression which will not allow adjacent dashes in URLs.
01) allow URLs like
text1-text2.htm
text1-text2-textn.htm
02) prevent URLs like
text1--text2.htm
text1--text2-textn.htm
Try this regex:
/--/
If you found a match then it means the URL had two dashes.
url.Contains("--") will work for you, where the url variable is the url entered. Nice and concise, and you don't have to fuss with a RegEx.
The negative answer posted by Aziz is best, but just for completeness' sake here is a regex that matches the kinds of strings you wish to accept (as opposed to reject):
You want a string made up of zero or more of the following:
a non-dash character, or
a dash followed by a non-dash
A regex for this is
/^(?:[^-]|-(?!-))*$/
Now you can adjust the [^-] part to accept not just any character, but only those characters permitted in a URL (that is, if you wish to match all possible URLs except those with two consecutive dashes). To do this you will have to consult the RFC that gives the URI syntax (RFC 3986). This will be somewhat tedious, which is why the negative solution with /--/ combined with other checks is your best bet.
This will match a filename with 0 or more occurrences of a single dash followed by some word characters.
^\w+(-\w+)*\.\w+
It should be enough to search for the problem, -{2,}, and then negate the result. That is, as long as this regex (two or more dashes in a row) does not match, the URL is valid.
Or positive regex matching only urls you do want: ^([A-Za-z0-9]+-?)+\.htm$
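The question is about ASP.NET, but the two main ideas above, a plain substring check and the "dash not followed by a dash" regex, can be compared in a short Python sketch. The names has_adjacent_dashes and NO_DOUBLE_DASH are made up for illustration, and the sample file names are the ones from the question.

```python
import re

# Accepts strings made only of non-dash characters and single dashes.
NO_DOUBLE_DASH = re.compile(r'^(?:[^-]|-(?!-))*$')


def has_adjacent_dashes(url):
    """Negative check: True when the URL contains two dashes in a row."""
    return '--' in url


samples = (
    'text1-text2.htm',
    'text1-text2-textn.htm',
    'text1--text2.htm',
    'text1--text2-textn.htm',
)

for name in samples:
    allowed_by_regex = NO_DOUBLE_DASH.match(name) is not None
    print(name, has_adjacent_dashes(name), allowed_by_regex)
```

For every sample the two checks agree: the regex accepts a name exactly when the substring check finds no double dash.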