Changing robots.txt in Django

I've created a website using Django and added robots.txt using this code:
path('robots.txt', lambda r: HttpResponse("User-agent: *\nDisallow: /", content_type="text/plain")),
in my main urls.py. It works great, but now I need to add some rules to it. How do I do that?

robots.txt is not just an HttpResponse. It is an actual file.
You can either keep fabricating the whole response manually in the lambda function, in which case you have to keep building up the response string as you add rules,
or you can write a file to the server's disk, put the rules in it, and serve that file whenever /robots.txt is requested.
Further reading on robots.txt (not related to Django)
Related SO question: django serving robots.txt efficiently
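If you stay with the lambda approach, the string building can at least move into a helper, so that adding a rule means editing a list rather than one long literal. A minimal sketch, assuming a hypothetical `ROBOTS_RULES` dict that maps each user agent to the path prefixes it should not crawl:

```python
# Hypothetical rule set: user agent -> list of disallowed path prefixes.
ROBOTS_RULES = {
    "*": ["/private/", "/junk/"],
}


def build_robots_txt(rules):
    """Return a robots.txt body with one User-agent group per entry in `rules`."""
    groups = []
    for agent, disallowed in rules.items():
        lines = ["User-agent: " + agent]
        lines.extend("Disallow: " + path for path in disallowed)
        groups.append("\n".join(lines))
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt(ROBOTS_RULES))
```

In urls.py the lambda could then become `lambda r: HttpResponse(build_robots_txt(ROBOTS_RULES), content_type="text/plain")`, where `build_robots_txt` and `ROBOTS_RULES` are the illustrative names above.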

You can put robots.txt in your templates directory and serve it through Django as follows:
from django.urls import re_path
from django.views.generic import TemplateView
urlpatterns = [
    # Render templates/robots.txt as plain text. re_path() replaces url(),
    # which was removed in Django 4.0; the dot is escaped so it matches literally.
    re_path(r'^robots\.txt$', TemplateView.as_view(template_name="robots.txt", content_type="text/plain"), name="robots_file"),
]
However, the recommended way is to serve it through your web server's directives.
Nginx:
location /robots.txt {
alias /path/to/static/robots.txt;
}
Apache:
<Location "/robots.txt">
SetHandler None
Require all granted
</Location>
Alias /robots.txt /var/www/html/project/robots.txt
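With the web-server route, the file only has to exist at the path the nginx `alias` or Apache `Alias` directive points to. One option is a small deploy-time script that writes it; a sketch, assuming a hypothetical target directory passed in by the caller (for example the directory behind /path/to/static/):

```python
from pathlib import Path

# Hypothetical rules; edit this text and rerun the script on deploy.
ROBOTS_BODY = "User-agent: *\nDisallow: /private/\nDisallow: /junk/\n"


def write_robots_txt(target_dir, body=ROBOTS_BODY):
    """Write robots.txt into target_dir (created if missing) and return its path."""
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "robots.txt"
    path.write_text(body, encoding="utf-8")
    return path
```

The web server then serves the file on its own, so requests for /robots.txt never reach Django.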

In your main app's urls.py:
from django.contrib import admin
from django.urls import path
from django.views.generic.base import TemplateView
urlpatterns = [
    # If you are using admin
    path('admin/', admin.site.urls),
    # Both files are rendered from your templates directory
    path(
        "robots.txt",
        TemplateView.as_view(template_name="robots.txt", content_type="text/plain"),
    ),
    path(
        "sitemap.xml",
        TemplateView.as_view(template_name="sitemap.xml", content_type="text/xml"),
    ),
]
Then go to your templates root folder, create a robots.txt file, and add something like this:
User-Agent: *
Disallow: /private/
Disallow: /junk/
Go to your templates root folder again and create another file, sitemap.xml, and add something like the following (or do it properly with a sitemap generator). Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://examplemysite.com</loc>
    <lastmod>2020-02-01T15:19:02+00:00</lastmod>
    <priority>1.00</priority>
  </url>
</urlset>
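Writing the XML by hand gets tedious as the site grows. Django ships its own sitemap framework (django.contrib.sitemaps), which is the more idiomatic option; as a framework-free sketch, the standard library can produce the same structure from a list of hypothetical (loc, lastmod, priority) entries:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap.xml text for an iterable of (loc, lastmod, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([("https://examplemysite.com", "2020-02-01T15:19:02+00:00", "1.00")]))
```

The output could be saved as templates/sitemap.xml, so the TemplateView route keeps working unchanged.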
Now if you run python manage.py runserver, you can test it at 127.0.0.1:8000/sitemap.xml and 127.0.0.1:8000/robots.txt, and it will work. On your production server, however, this may not work unless nginx knows about these files, so you need to give nginx their paths.
To do that, SSH into your server. With nginx you should have a configuration file that you named when you set the site up: cd into /etc/nginx/sites-available, where you will find the default file (leave it alone) and another file, usually named after your project or website. Take a backup of that file first, then open it with nano and add the paths for both files.
Be careful with the paths: the existing file should show you the paths to your static or media files, which gives you an idea of what to use. Something like this:
location /robots.txt {
root /home/myap-admin/projects/mywebsitename/templates;
}
location /sitemap.xml {
root /home/myap-admin/projects/mywebsitename/templates;
}
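/home/myap-admin/projects/mywebsitename/templates is just an example path leading to the templates folder; replace it with the actual path to your project. Because root is used, nginx appends the request URI, so it looks for robots.txt and sitemap.xml directly inside that folder.
Then restart nginx with service nginx restart.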

Related

Wagtail {{document.url}} is returning a 404 for user-uploaded files, in production

I've inherited a Wagtail CMS project but have been unable to solve an issue relating to document uploads.
Having uploaded a file through the CMS, it arrives in the documents directory /var/www/example.com/wagtail/media/documents/test_pdf.pdf which maps to the /usr/src/app/media/documents/test_pdf.pdf directory inside the docker container.
In the front end (and within the Wagtail dashboard) the document.url resolves to https://example.com/documents/9/test_pdf.pdf/ which returns a 404. Obviously the model number segment is missing from the file path above, but I read on a forum that
In Wagtail, documents are always served through a Django view (wagtail.wagtaildocs.views.serve.serve) so that we can perform additional processing on document downloads
so perhaps this, in itself, is not an issue.
There are a couple of lines in urls.py file which look correct:
urlpatterns = [
url(r'^django-admin/', admin.site.urls),
url(r'^admin/', include(wagtailadmin_urls)),
url(r'^documents/', include(wagtaildocs_urls)),
url(r'^search/$', search_views.search, name='search'),
url(r'^sitemap\.xml$', sitemap),
url(r'', include(wagtail_urls)),
# url(r'^pages/', include(wagtail_urls)),
]
if settings.DEBUG:
...
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
and in base.py
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')
MEDIA_URL = '/media/'
So, my hunch is one of either:
Uploads being stored incorrectly, in a single folder rather than in subdirectories by model
The routing to this “virtual” directory is broken, so it’s breaking at the "check permissions" stage (but I couldn't figure out how routing works in Django) and returning the 404
The web server is incorrectly configured, so whilst the “virtual” URL is fine it’s actually the file URL which is broken and THIS causes the 404 (my nginx contains a /media/ location but not a /documents/ location, as I would have expected)
Something else entirely (my next step is to pull a copy down to my own machine and see if the issue still occurs)
I appreciate there isn't much to go on here but I'm hoping that someone might be able to give me some pointers as to what else I should check, as I've been banging my head against this for most of the day.
My background is with Ruby on Rails so, as with that framework, I've a feeling that there is a lot of "magic" happening behind-the-scenes that is making it very tricky to figure out what's going on.
Thanks!
You are able to see documents while developing because of
if settings.DEBUG:
...
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
Like static files, media files should be handled at the web-server level. For example, if you are using GCP (App Engine), you need to update your app.yaml and add media/ the same way as static/:
...
handlers:
- url: /static
static_dir: static/
- url: /media
static_dir: media/
...
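When debugging a 404 like this, it can help to work out exactly which file the server is expected to read for a given URL. Below is a simplified model of what a web-server alias (or Django's static() helper under DEBUG) does, assuming the MEDIA_URL from the question's base.py, the container path from the question as MEDIA_ROOT, and a hypothetical `resolve_media_path` helper:

```python
import posixpath
from pathlib import PurePosixPath

MEDIA_URL = "/media/"
MEDIA_ROOT = "/usr/src/app/media"  # path inside the container, from the question


def resolve_media_path(url_path, media_url=MEDIA_URL, media_root=MEDIA_ROOT):
    """Map a request path under media_url to a file path under media_root.

    Returns None when the URL is outside media_url or tries to escape media_root.
    """
    if not url_path.startswith(media_url):
        return None
    relative = posixpath.normpath(url_path[len(media_url):])
    if relative == ".." or relative.startswith("../") or relative.startswith("/"):
        return None
    return str(PurePosixPath(media_root) / relative)
```

Note that a /documents/... URL falls outside MEDIA_URL, so a /media/ rule alone will not match it; such requests have to reach Django (or a dedicated rule) instead.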

Redirect for a single file with Django and Nginx

I have a Django project serving static files correctly at /static. I'd like to serve a single .txt file in the root, though, without static in the URL.
This is my Nginx section:
location /ms11973759.txt {
try_files /home/myhome/nms/static/ms11973759.txt;
}
I get a 404, although I can access the file via mysite/static/ms11973759.txt. What am I missing?
The following also does not work:
location /ms11973759.txt {
root /home/myhome/nms/static;
}
For some reason this is the only approach that finally worked for me.
I added a new view:
from django.http import HttpResponse
from django.views.decorators.http import require_GET
@require_GET  # respond with 405 to anything other than GET
def ms11973759_txt(request):
    # Build the plain-text body from a list of lines
    lines = [
        "my",
        "many",
        "lines",
    ]
    return HttpResponse("\n".join(lines), content_type="text/plain")
...and then added
path("ms11973759.txt", ms11973759_txt),
to the urlpatterns in my urls.py.

Django is redirecting from HTTPS to HTTP

I have a Django e-commerce site running, and have purchased and installed an SSL cert for it.
I have added a VirtualHost entry:
<VirtualHost *:443>
#Basic setup
ServerAdmin blah@test.com
ServerName test.com
ServerAlias www.test.com
Alias /media/admin/ /home/test/public_html/test/release/env/lib/python2.6/dist-packages/django/contrib/admin/media/
Alias /static/ /home/test/public_html/test/release/static/
Alias /media/ /home/test/public_html/test/release/media/
<Directory /home/test/public_html/test/release/>
Order deny,allow
Allow from all
</Directory>
RewriteEngine On
LogLevel warn
ErrorLog /home/test/public_html/test/logs/error.log
CustomLog /home/test/public_html/test/logs/access.log combined
WSGIDaemonProcess test user=www-data group=www-data threads=20 processes=2
WSGIProcessGroup test_ssl
WSGIScriptAlias / /home/test/public_html/test/release/apache/test.wsgi
SSLEngine On
SSLCertificateFile /etc/apache2/ssl/test.com.crt
SSLCertificateChainFile /etc/apache2/ssl/gs_root.pem
SSLCertificateKeyFile /etc/apache2/ssl/www.test.com.key
</VirtualHost>
Here is the urls.py file:
from django.conf.urls.defaults import patterns, include, url
from django.contrib import admin
from django.conf import settings
from gallery.models import LOCATIONS, Photo
admin.autodiscover()
from satchmo_store.urls import urlpatterns as satchmo_urls
from satchmo_store.shop.views.sitemaps import sitemaps
from cms.sitemaps import CMSSitemap
sitemaps['pages'] = CMSSitemap
urlpatterns = patterns('',
url(r'^admin/', include(admin.site.urls)),
url(r'^search/', include('haystack.urls')),
# Include satchmo urls. Unfortunately, this also includes its own
# /admin/ and everything else.
url(r'^shop/', include(satchmo_urls)),
url(r'^sitemap\.xml/?$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),
url(r'events/gallery/(.*)/(.*)/$', 'gallery.views.events_image'),
url(r'locations/view-all/(.*)/$', 'gallery.views.locations_image'),
url(r'locations/view-all/$', 'gallery.views.locations_view_all',{
'queryset':Photo.objects.filter(gallery__category=LOCATIONS).distinct()}),
url(r'^contact-us/', include('contact_form.urls')),
url(r'^', include('cms.urls')),
)
if settings.DEBUG:
urlpatterns = patterns('',
(r'^media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
(r'^static/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.STATIC_ROOT}),
(r'^404/$', 'django.views.defaults.page_not_found'),
(r'^500/$', 'django.views.defaults.server_error'),
) + urlpatterns
There is also a config for non-SSL, which is working fine.
Whenever I request the HTTPS version of the site, I get a 302 header response which redirects to the HTTP version.
There are no redirects in the apache conf that explicitly state go to port 80.
I've been banging my head against this for a while, any help would be great!
Thanks
You have probably already fixed it, and it could be an entirely different problem, but I just came across something similar, and since I did not find an answer that addressed your issue, I thought it might be worth posting a reply (even though I was getting a 301 and you a 302).
I am running a Django site (Django 1.6.1) with gunicorn behind nginx. So nginx does the SSL. The environment variable HTTPS is set to on.
When I set up a test server without an http-to-https redirect, I noticed that some requests end up being redirected to an http address - similar to what you describe, but in my case it was just for one particular link. After looking into the request and response headers, I found out:
The initial request https://example.org/test got redirected by Django/gunicorn with 301 MOVED PERMANENTLY to http://example.org/test/. nginx then responded with 400 Bad Request - The plain HTTP request was sent to HTTPS port.
Quickly I came across a setting I had not paid much attention to before: APPEND_SLASH (https://docs.djangoproject.com/en/1.6/ref/settings/#std:setting-APPEND_SLASH) with the default value True.
After adding APPEND_SLASH = False to my settings.py file, a request to https://example.org/test resulted in a 404 NOT FOUND response, without a redirect to http. So it seems that the APPEND_SLASH redirect does not respect the HTTPS environment variable. I guess configuring SECURE_PROXY_SSL_HEADER (https://docs.djangoproject.com/en/1.6/ref/settings/#std:setting-SECURE_PROXY_SSL_HEADER) would solve this, but I have not tested it yet.
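For reference, SECURE_PROXY_SSL_HEADER is a two-item tuple in settings.py, and the proxy has to send the matching header (with nginx, typically proxy_set_header X-Forwarded-Proto $scheme;). The function below is a hypothetical, simplified model of the check Django performs when that setting is present, not Django's own code:

```python
# settings.py: trust X-Forwarded-Proto from the proxy (only safe if the proxy always sets it).
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")


def request_scheme(meta, proxy_header=SECURE_PROXY_SSL_HEADER):
    """Simplified model: decide http/https from the WSGI environ (request.META)."""
    if proxy_header:
        header, secure_value = proxy_header
        value = meta.get(header)
        if value is not None:
            # Only the first value counts if the proxy chain appended several.
            first = value.split(",", 1)[0].strip()
            return "https" if first == secure_value else "http"
    # Without the header, fall back to what the WSGI server reports.
    return meta.get("wsgi.url_scheme", "http")
```

Once the scheme is reported as https, redirects built from the request (such as the APPEND_SLASH one) should point at https:// as well.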
By the way, the reason for that "faulty" link in my case was a hard-coded link in a template. The easy way to avoid links like that is to use the built-in {% url ... %} template tag (https://docs.djangoproject.com/en/1.6/ref/templates/builtins/#url).
Perhaps this helps you or anyone else who wonders why Django sometimes redirects from https to http.
I know this is an old question but I have just spent hours searching for a solution to an identical problem so I thought I would post what I eventually worked out here.
I was using Satchmo, as the original poster was. It has a middleware class, satchmo_store.shop.SSLMiddleware.SSLRedirect, which by default sends a redirect from https to http with a 302 response, exactly as described in the original question. Commenting out its line in MIDDLEWARE_CLASSES fixes the problem, and may be OK if you want to run completely over https, but the documentation at http://satchmo.readthedocs.org/en/latest/configuration.html#ssl explains how to use it properly, which is what I am going to try.
The only thing I can think of is the Site setting in your database. Did you put an explicit port number in your Site object? Could you take a look in your admin?

Where to put static files that should be served directly under the server root?

I just migrated an old Django project to use the staticfiles app. Before that, I had all needed files in a directory called static that was served directly under the server root. This directory is now served under STATIC_URL, which is fine, except for the files that should be served directly under the server root.
I know how to serve files directly from the root (like /favicon.ico or /robots.txt), but where should I put them? If I put them anywhere beneath STATIC_ROOT, they will be served by two URLs (e.g. /file.txt and /static/foobar/file.txt), which is not good practice.
Any ideas?
I've solved both problems (favicon.ico, robots.txt) in urls.py, with slightly different approaches.
I don't like the solution that first comes to mind, writing a view that calls render_to_response.
EDIT: As of Django 1.5, direct_to_template and redirect_to have been removed (they were deprecated in 1.3), so now you can use class-based views.
For Django 1.5:
For robots.txt, add the following line to your urlpatterns:
from django.views.generic.base import RedirectView, TemplateView
(r'^robots\.txt$', TemplateView.as_view(template_name="robots.txt",
content_type='text/plain')),
I use the generic class-based view TemplateView and specify the template to use (robots.txt, which should be in your template directory) and the content type.
For favicon.ico, add the following line to your urlpatterns:
(r'^favicon\.ico$', RedirectView.as_view(
url=settings.STATIC_URL + 'img/favicon.ico')),
This redirects /favicon.ico to STATIC_URL + img/favicon.ico (e.g. /static/img/favicon.ico).
favicon.ico should be in your static directory.
These approaches could be used for any media or html content.
For previous versions of Django you could use:
(r'^robots\.txt$', direct_to_template, {'template': 'robots.txt',
'mimetype': 'text/plain'}),
(r'^favicon\.ico$', redirect_to,
{'url': settings.STATIC_URL + 'img/favicon.ico'}),
Keep them in static and have your webserver redirect /static/favicon.ico to /favicon.ico.
To answer more completely:
If you have the file favicon.ico, this is a static file and as such should exist inside of STATIC_ROOT. However this file is an exception to the normal rule and you do not want it to exist at /static/favicon.ico, you want it to exist at /favicon.ico. Since this is an exception to the rule, you add in a special rule just for this file to your webserver configuration so that it is also served at /favicon.ico.
Now you have the same resource served by two different URLs, which is a bad thing. Since you went out of your way to add the rule to serve your file at /, we'll assume that this is the canonical URL and tell the webserver to redirect /static/favicon.ico to /favicon.ico. Now the same resource is served from one location.
Other files in the root of /static/ will not be affected by this, because the webserver rules you set up name favicon.ico specifically, given the exceptional nature of this file (and of any other file you want to serve from /).
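The rule set described in this answer (serve the file at /, redirect its /static/ copy there, leave every other static file alone) can be modelled in a few lines. This is a hypothetical sketch of the decision logic only, with an assumed `ROOT_FILES` set; in practice the logic lives in the web-server config:

```python
# Files that are exceptions to the normal /static/ rule and live at the root.
ROOT_FILES = {"favicon.ico", "robots.txt"}
STATIC_URL = "/static/"


def route(path):
    """Return ("serve", path) or ("redirect", canonical_path) for a request path."""
    name = path.lstrip("/")
    if name in ROOT_FILES:
        # Canonical URL: serve it directly from the root.
        return ("serve", path)
    if path.startswith(STATIC_URL) and path[len(STATIC_URL):] in ROOT_FILES:
        # Duplicate URL under /static/: redirect to the root copy.
        return ("redirect", "/" + path[len(STATIC_URL):])
    # Every other file under /static/ is unaffected.
    return ("serve", path)
```

Keeping the exceptions in one explicit set mirrors the answer's point that only the named files get special treatment.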

Django set up with WSGI in directory below the domain root: Url problems in templates

Django has been up and running on my mod_wsgi implementation of Apache (on Windows 7 x64, btw) for a bit. This was a huge pain, complete with having to actually hand-modify registry values to get things to install correctly and to get Django to use the same MySQL install as all my other apps. I also had to modify the PATH variable by double-escaping parentheses (Program Files (x86)) because they were screwing with batch files. But this is mostly Microsoft's fault and I'm rambling.
Here is my issue:
All of the URLs used in the URLCONF and in views work correctly. The only thing which doesn't is in templates when I try to work off the site root URL. I am running a development server and I have two Django sites, this particular one running off of www.example.com/testing.
In the templates, if I just put "/" in an <a> tag, it points to www.example.com/, NOT www.example.com/testing/. I have read a topic on this, but the issue wasn't resolved because the poster was being stubborn. Here is my WSGI configuration from httpd.conf:
Listen 127.0.0.1:8001
<VirtualHost *:8001>
    ServerName ****.net/testing
    ServerAdmin *******@gmail.com
    ErrorLog ********.log
    WSGIScriptAlias /testing ****/htdocs/testing/apache/django.wsgi
    <Directory ****/htdocs/testing/apache>
        Order deny,allow
        Allow from all
    </Directory>
    <Location "/media">
        SetHandler None
    </Location>
</VirtualHost>
Here is the file django.wsgi in the above directory:
import sys
import os
sys.path.insert(0, '*****/Django/')
sys.path.insert(0, '*****/htdocs/')
sys.path.insert(0, '*****/htdocs/testing')
sys.path.insert(0, '*****/htdocs/testing/apache')
os.environ['DJANGO_SETTINGS_MODULE'] = 'testing.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
import testing.monitor
testing.monitor.start(interval=1.0)
I have an Nginx frontend which passes any non-static file from testing/ to :8001 for Apache to catch with this virtualhost.
If I omit the root "htdocs/" line from the django.wsgi file, I just get 500 errors on the site. Also, if I use a URL relative to the current URL (i.e. "example/" instead of "/example/"), that works fine. But if I add the initial "/" to make the URL relative to the root, it resolves against "www.example.com" instead of "www.example.com/testing" as I wanted.
Sorry for the very long post, but I wanted to be as clear as possible. Thank you for your time.
This is why you should not hard-code URLs in your templates. Of course / will take you to the root of the site, not the root of your app - that's what it's supposed to do.
Instead, give your root view a name in your urlconf:
urlpatterns = patterns('',
url('^$', 'myapp.views.index', name='home')
)
Now in your template you can do:
<a href="{% url 'home' %}">Home</a>
and this will correctly resolve to /testing/.
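The reason {% url %} gets this right is that mod_wsgi's WSGIScriptAlias /testing passes SCRIPT_NAME=/testing to Django, which uses it (or FORCE_SCRIPT_NAME, if set) as the script prefix for reverse() and {% url %}. A hypothetical, simplified model of that prefixing, not Django's implementation:

```python
def url_with_script_prefix(environ, route):
    """Prefix an app-relative route with the mount point from SCRIPT_NAME."""
    prefix = environ.get("SCRIPT_NAME", "").rstrip("/") + "/"
    return prefix + route.lstrip("/")


# Apache's WSGIScriptAlias /testing sets SCRIPT_NAME to "/testing".
environ = {"SCRIPT_NAME": "/testing"}
print(url_with_script_prefix(environ, ""))  # /testing/
```

A hard-coded "/" skips this step entirely, which is why it lands on the site root instead of the app root.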