I am using django-localeurl to change the language of a project based on a suffix after the domain (example.com/en, example.com/hu, etc.). However, I also have subdomains for the countries which are exactly the same as the suffixes.
How can I modify localeurl or add another filter to the links so that I can change the suffix and subdomain at the same time? E.g.:
example.com -> hu.example.com/hu -> es.example.com/es etc.
Here is the localeurl chlocale function:
def chlocale(url, locale):
    """
    Changes the URL's locale prefix if the path is not locale-independent.
    Otherwise removes locale prefix.
    """
    _, path = utils.strip_script_prefix(url)
    _, path = utils.strip_path(path)
    return utils.locale_url(path, locale)
chlocale = stringfilter(chlocale)
register.filter('chlocale', chlocale)
This is my call as a URL href:
<a href="{{ request.path|chlocale:"hu" }}">Hungary</a>
This actually returns only the relative path, not the full http address of the page, so it's OK to attach the http://sitename.domain prefix at the beginning, before the {{ request.path... }} call:
domain = Site.objects.get_current().domain
<a href="http://hu.{{ domain }}{{ request.path|chlocale:"hu" }}">Hungary</a>
A little hacky but perhaps what you are looking for.
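If you'd rather keep the subdomain logic out of every template, another option is a small custom filter built on the same localeurl utilities. A minimal sketch, assuming the utils helpers shown above and django.contrib.sites (the filter name chlocale_full is made up):

from django import template
from django.contrib.sites.models import Site
from django.template.defaultfilters import stringfilter
from localeurl import utils

register = template.Library()

def chlocale_full(url, locale):
    """Like chlocale, but returns an absolute URL on the locale's subdomain."""
    _, path = utils.strip_script_prefix(url)
    _, path = utils.strip_path(path)
    # Assumes the Site domain is the bare domain, e.g. "example.com".
    domain = Site.objects.get_current().domain
    return 'http://%s.%s%s' % (locale, domain, utils.locale_url(path, locale))
chlocale_full = stringfilter(chlocale_full)
register.filter('chlocale_full', chlocale_full)

The template call then stays a one-liner: <a href="{{ request.path|chlocale_full:"hu" }}">Hungary</a>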
I have a website building tool created in Django and I'd like to add easy user defined 301 redirects to it.
Webflow has a very easy to understand tool for 301 redirects. You add a path (not just a slug) and then define where that path should lead the user.
I'd like to do the same for the Django project I'm working on. I currently allow users to set a slug, so /<slug:redirect_slug>/ redirects to any URL they choose. But I'd like them to be able to add, for example, the path of an old blog post: '/2018/04/12/my-favorite-thing/'.
What's the best URL conf to use in Django to safely accept any path the user wants?
You can use the path converters, which convert path parameters into appropriate types, and which include a converter suitable for URLs.
An example would be like the following:
path('api/<path:encoded_url>/', YourView.as_view()),
As per the docs:
Matches any non-empty string, including the path separator, '/'. This allows you to match against a complete URL path rather than just a segment of a URL path as with str.
In your view, you can get your URL like this:
encoded_url = self.kwargs.get('encoded_url')
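For instance, a redirecting view on top of that pattern might look like the sketch below; the Redirect model with old_path/new_url fields is an assumption, not something Django provides:

from django.http import Http404, HttpResponseRedirect
from django.views import View

from .models import Redirect  # hypothetical model storing old_path -> new_url

class YourView(View):
    def get(self, request, encoded_url):
        try:
            target = Redirect.objects.get(old_path=encoded_url)
        except Redirect.DoesNotExist:
            raise Http404
        return HttpResponseRedirect(target.new_url)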
Add a RerouteMiddleware which first checks whether the request can be served by the existing URLs from your urls.py. If it cannot, check whether the requested path is in the old -> new URL mapping; if a match is found, redirect to the new URL.
A sample piece of code to try out (lookup_new_url is a placeholder for your own mapping lookup):
from django.shortcuts import redirect
from django.urls import Resolver404, resolve

class RerouteMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        try:
            resolve(request.path_info)
        except Resolver404:
            # Check if the URL exists in your database/constants
            # where you might have stored the old -> new URL mapping.
            new_url = lookup_new_url(request.path)
            if new_url:
                return redirect(new_url)
        return self.get_response(request)
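For the middleware to run, it also needs to be registered in settings; something like this, where the module path is an assumption:

# settings.py
MIDDLEWARE = [
    # ... Django's default middleware ...
    'yourapp.middleware.RerouteMiddleware',
]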
In my Django apps, I have many urls including /(?P<project_name>[_\w]+)/.
The project_name is defined by users and it is an attribute of the Project model.
I've added a validator on the project_name to check if it's lowercase.
So new names are all lowercase but some old names include uppercase characters.
I would like to make all the stored names lowercase, but at the same time I don't want users to get an error when they try to access one of the URLs with an old project name that includes uppercase characters.
As I have many urls and many views, I don't want to update each to manually .lower() the project_name.
Is there a way to redirect all urls including /<project_NAME>/ to /<project_name>/?
Hacky way with decorators
You could create a decorator for all your views that use a project_name:
import functools

from django.http import HttpResponseRedirect
from django.urls import reverse

def project_lowercase(func):
    @functools.wraps(func)
    def wrapper(request, project_name, *args, **kwargs):
        if not project_name.islower():
            return HttpResponseRedirect(
                reverse('your_url_name',
                        kwargs={'project_name': project_name.lower()}))
        return func(request, project_name, *args, **kwargs)
    return wrapper
Replace your_url_name with whatever you named your url route, then import the function and add the decorator above each view function:
@project_lowercase
def view_project(request, project_name):
    # ...
    return HttpResponse("Hello World!")
Slugs would be better
Alternatively, as suggested in the comments (and a better solution), you should use a slug to store URLs for your projects; have a look here to see how to add them to your models and generate slugs.
To move your existing data to use slugs or just to update project names you can create a data migration that will alter existing data in your database.
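Such a data migration could look roughly like this sketch, where the app label, field names, and the migration it depends on are all assumptions:

from django.db import migrations
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Project = apps.get_model('yourapp', 'Project')
    for project in Project.objects.all():
        project.slug = slugify(project.project_name)
        project.save(update_fields=['slug'])

class Migration(migrations.Migration):
    dependencies = [('yourapp', '0002_project_slug')]
    operations = [migrations.RunPython(populate_slugs, migrations.RunPython.noop)]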
The first option I would suggest is to avoid regular expressions: switch to path() for your urlpatterns and use a custom URL converter for your project names.
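A converter along those lines might look like this sketch (the converter and route names are assumptions); to_url lowercases the value so reverse() always emits the canonical form:

# urls.py
from django.urls import path, register_converter

from . import views  # assumption: views module with view_project

class ProjectNameConverter:
    regex = '[_a-z0-9]+'

    def to_python(self, value):
        return value

    def to_url(self, value):
        return value.lower()

register_converter(ProjectNameConverter, 'project')

urlpatterns = [
    path('<project:project_name>/', views.view_project, name='project-detail'),
]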
In case you want to keep using regular expressions, you could restrict your pattern to accept only lowercase project names: /(?P<project_name>[_a-z0-9]+)/. After that, add a URL pattern which is (effectively) the same as your current one, /(?P<project_name>\w+)/ (note the missing _ ; \w already includes it), to match all project names, including the legacy ones. The view for that pattern would redirect to your first view with the lowercased project_name, as in the sketch below.
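Sketched out, that two-pattern approach could look like the following (view and route names are assumptions):

from django.urls import re_path, reverse
from django.views.generic import RedirectView

from . import views  # assumption: views module with view_project

class LegacyProjectRedirect(RedirectView):
    permanent = True

    def get_redirect_url(self, *args, **kwargs):
        return reverse('project-detail',
                       kwargs={'project_name': kwargs['project_name'].lower()})

urlpatterns = [
    # Lowercase names resolve normally.
    re_path(r'^(?P<project_name>[_a-z0-9]+)/$', views.view_project,
            name='project-detail'),
    # Legacy mixed-case names fall through to this pattern and get redirected.
    re_path(r'^(?P<project_name>\w+)/$', LegacyProjectRedirect.as_view()),
]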
Suppose my site's domain is mysite.com. Now, whenever a request comes in this form: mysite.com/https://stackoverflow.com, I want to extract the URL "https://stackoverflow.com" and send it to the corresponding view.
I have tried this pattern:
url(r'^(?P<preurl>http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)$',prepend_view)
The regex matches the incoming appended URL and assigns the variable preurl the value "https://stackoverflow.com", which I access in the corresponding view function.
This works fine for the above example, but my URL pattern fails for some exceptional URLs.
Please suggest a robust URL pattern that takes all exceptional URLs into consideration too, like the following:
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
That is, if a request comes like:
mysite.com/ldap://[2001:db8::7]/c=GB?objectClass?one
I should be able to get the value "ldap://[2001:db8::7]/c=GB?objectClass?one" in the variable preurl.
You don't have to make this type of complex URL pattern. First, make a URL pattern that matches everything:
url(r'^.*/$', views.fast_track_service, name='fast_track'),
and append it to the end of urlpatterns in your urls.py. Then, in your view, use the request object; you can get the full path of the request like this:
fast_track_url = request.get_full_path()[1:]
Once you have the URL, try validating it with URLValidator, like this:
from django.core.exceptions import ValidationError
from django.core.validators import URLValidator
from django.http import Http404

if 'http://' not in fast_track_url and 'https://' not in fast_track_url:
    fast_track_url = 'http://' + fast_track_url

url_validate = URLValidator()
try:
    url_validate(fast_track_url)
except ValidationError:
    raise Http404
If you want to validate other complicated URLs, like mailto: etc., you can write your own validator.
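A scheme whitelist is one simple way to do that; a minimal sketch, where the allowed set is an assumption based on the URLs listed in the question:

from urllib.parse import urlparse

from django.core.exceptions import ValidationError

ALLOWED_SCHEMES = {'http', 'https', 'ftp', 'ldap', 'mailto',
                   'news', 'tel', 'telnet', 'urn'}

def validate_uri(uri):
    scheme = urlparse(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValidationError('Unsupported URI scheme: %r' % scheme)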
I'm new to Python, and extremely impressed by the number of libraries at my disposal. I already have a function that uses Beautiful Soup to extract URLs from a site, but not all of them are relevant: I only want webpages (no media) on the same website (domain or subdomain, but no other domains). I'm trying to manually program around examples I run into, but I feel like I'm reinventing the wheel; surely this is a common problem in internet applications.
Here's an example list of URLs that I might retrieve from a website, say http://example.com, with markings for whether or not I want them and why. Hopefully this illustrates the issue.
Good:
example.com/page - it links to another page on the same domain
example.com/page.html - has a filetype ending, but it's an HTML page
subdomain.example.com/page.html - it's on the same site, though on a subdomain
/about/us - it's a relative link, so it doesn't have the domain in it, but it's implied
Bad:
otherexample.com/page - bad, the domain doesn't match
example.com/image.jpg - bad, it's an image, not a page
/ - bad - sometimes there's just a slash in the "a" tag, but that's a reference to the page I'm already on
#anchor - this is also a relative link, but it's on the same page, so there's no need for it
I've been writing cases in if statements for each of these...but there has to be a better way!
Edit: Here's my current code, which returns nothing:
ignore_values = {"", "/"}
def desired_links(href):
# ignore if href is not set
if not href:
return False
# ignore if it is just a link to the same page
if href.startswith("#"):
return False
# skip ignored values
if href in ignore_values:
return False
def explorePage(pageURL):
#Get web page
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
response = opener.open(pageURL)
html = response.read()
#Parse web page for links
soup = BeautifulSoup(html, 'html.parser')
links = [a["href"] for a in soup.find_all("a", href=desired_links)]
for link in links:
print(link)
return
def main():
explorePage("http://xkcd.com")
BeautifulSoup is quite flexible in helping you to create and apply the rules to attribute values. You can create a filtering function and use it as a value for the href argument to find_all().
For example, something for you to start with:
ignore_values = {"", "/"}
def desired_links(href):
# ignore if href is not set
if not href:
return False
# ignore if it is just a link to the same page
if href.startswith("#"):
return False
# skip ignored values
if href in ignore_values:
return False
# TODO: more rules
# you would probably need "urlparse" package for a proper url analysis
return True
Usage:
links = [a["href"] for a in soup.find_all("a", href=desired_links)]
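For the "more rules" TODO, the urlparse hint could be fleshed out like this sketch; BASE_DOMAIN and the extension list are assumptions (on Python 3 the module is urllib.parse):

from urllib.parse import urlparse

BASE_DOMAIN = 'example.com'  # the site being crawled (assumption)
MEDIA_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.pdf', '.mp4')

def same_site_page(href):
    parsed = urlparse(href)
    if parsed.path.lower().endswith(MEDIA_EXTENSIONS):
        return False  # media file, not a page
    if not parsed.netloc:
        return True   # relative link like "/about/us" stays on-site
    host = parsed.netloc.split(':')[0]
    return host == BASE_DOMAIN or host.endswith('.' + BASE_DOMAIN)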
You should take a look at Scrapy and its Link Extractors.
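With Scrapy, that filtering is mostly configuration; a minimal sketch (domain and extension values are placeholders):

from scrapy.linkextractors import LinkExtractor

# allow_domains also covers subdomains; deny_extensions skips media files.
extractor = LinkExtractor(
    allow_domains=['example.com'],
    deny_extensions=['jpg', 'png', 'gif', 'mp4'],
)
# Inside a Scrapy spider callback:
# links = extractor.extract_links(response)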
I am looking for a list of domain suffixes (it doesn't matter if it's not all of them; it just needs to be big, as it's for generating dummy data).
I'm looking for a list like:
.net.nz
.co.nz
.edu.nz
.govt.nz
.com.au
.govt.au
.com
.net
Any ideas where I can locate a list?
There are answers here. Most of them relate to the use of http://publicsuffix.org/, and some implementations to use it were given in several languages, like Ruby.
To get all the ICANN domains, this python code should work for you:
import requests

url = 'https://publicsuffix.org/list/public_suffix_list.dat'
page = requests.get(url)

icann_domains = []

for line in page.text.splitlines():
    if 'END ICANN DOMAINS' in line:
        break
    elif line.startswith('//'):
        continue
    else:
        domain = line.strip()
        if domain:
            icann_domains.append(domain)

print(len(icann_domains))  # 7334 as of Nov 2018
Remove the break statement to get private domains as well.
Be careful as you will get some domains like this: *.kh (e.g. http://www.mptc.gov.kh/dns_registration.htm). The * is a wildcard.
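When generating dummy data from the list, wildcard entries need a stand-in label; a small sketch reusing the icann_domains list from above:

import random

def random_suffix(domains):
    """Pick a random suffix, expanding wildcards like '*.kh'."""
    domain = random.choice(domains)
    if domain.startswith('*.'):
        domain = 'example' + domain[1:]  # '*.kh' -> 'example.kh'
    return '.' + domain

print(random_suffix(icann_domains))  # e.g. '.co.nz'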