HTTP headers list - django

I am studying Django and have created a page that shows all HTTP headers in a request using request.META dictionary. I'm running it locally and it the page shows me a weird amount of headers like 'TEMP' containing the path to my Windows temp folder, or 'PATH' with my full path parameters and much more information that I don't really find necessary to share in my browser requests (like installed applications).
Is it normal? What do I do about it?

So, let's jump quickly into Django's source code:
django/core/handlers/wsgi.py
class WSGIRequest(http.HttpRequest):
def __init__(self, environ):
...
self.META = environ
self.META['PATH_INFO'] = path_info
self.META['SCRIPT_NAME'] = script_name
...
This handler is used by default in runserver command and every other wsgi server. The environ dictionary comes from the underlying web server. And it is filled with lots of data. You can read more about environ dictionary here in the official wsgi docs:
https://www.python.org/dev/peps/pep-0333/#environ-variables
Also note that any web server is free to add its own variables to environ. I assume that's why you see things like TEMP. They are probably used internally by the web server.
If you wish to get headers only then wsgi mandates that headers have to start with HTTP_ prefix with the exception of CONTENT_TYPE and CONTENT_LENGTH headers.
So Django's docs are misleading. The META field contains more then headers only. It is neither correct nor incorrect, it's just how it is. Special care has to be taken when dealing with META. Leaking some of the data might be a serious security issue.

Related

Flask default http headers - Caching and more

I'm inquiring on Flask and was wondering what are the default values for things such as cache-control. I can't find information on any HTTP headers. Maybe I'm mistaken and it's the server software who takes care of this part.
Thanks
For your first question about Caching:
As Flask docs state
Flask itself does not provide caching for you, but Flask-Caching, an extension for Flask does.
So you can use Flask-Caching if you want to do caching for your website.
For your second question about http headers:
You can use request object to get different header values How to get http headers in flask? and use make_response to set custom headers.

Get HTTP Request like transfered over the wire (Django)

Is it possible to get the http request as bytestring like it gets transferred over the wire if you have a django request object?
Of course the plain text (not encrypted if https gets used).
I would like to store the bytestring to analyze it later.
At best I would like to access the real bytestring. Creating a bytestring from request.META, request.GET and friends will likely not be the same like the original.
Update: it seems that it is impossible to get to the original bytes. Then the question is: how to construct a bytestring which roughly looks like the original?
As others pointed out it is not possible because Django doesn't interact with raw requests.
You could just try reconstructing the request like this.
def reconstruct_request(request):
headers = ''
for header, value in request.META.items():
if not header.startswith('HTTP'):
continue
header = '-'.join([h.capitalize() for h in header[5:].lower().split('_')])
headers += '{}: {}\n'.format(header, value)
return (
'{method} HTTP/1.1\n'
'Content-Length: {content_length}\n'
'Content-Type: {content_type}\n'
'{headers}\n\n'
'{body}'
).format(
method=request.method,
content_length=request.META['CONTENT_LENGTH'],
content_type=request.META['CONTENT_TYPE'],
headers=headers,
body=request.body,
)
NOTE this is not a complete example only proof of concept
The basic answer is no, Django doesn't have access to the raw request, in fact it doesn't even have code to parse raw HTTP request.
This is because Django's (like many other Python web frameworks) HTTP request/response handling is, in it's core, a WSGI application (WSGI specification).
It's the job of the frontend/proxy server (like Apache or nginx) and application server (like uWSGI or gunicorn) to "massage" the request (like transforming and stripping headers) and convert it into an object that can be handled by Django.
As an experiment you can actually wrap Django's WSGI application yourself and see what Django gets to work with when a request comes in.
Edit your project's wsgi.py and add some extremely basic WSGI "middleware":
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')
class MyMiddleware:
def __init__(self, app):
self._app = app
def __call__(self, environ, start_response):
import pdb; pdb.set_trace()
return self._app(environ, start_response)
# Wrap Django's WSGI application
application = MyMiddleware(get_wsgi_application())
Now if you start your devserver (./manage.py runserver) and send a request to your Django application. You'll drop into a debugger.
The only thing of interest here is the environ dict. Poke around it and you'll see that it's pretty much the same as what you'll find in Django's request.META. (The contents of the environ dict is detailed in this section of the WSGI spec.)
Knowing this, the best thing you can get is piecing together items form the environ dict to something that remotely resembles an HTTP request.
But why? If you have an environ dict, you have all the information you need to replicate a Django request. There's no actual need to translate this back to a HTTP request.
In fact, as you now known, you don't need a HTTP request at all to call Django's WSGI application. All you need is a environ dict with the required keys and a callable so that Django can relay the response.
So, to analyze requests (and even be able to replay them) you only need to be able to recreate a valid environ dict.
To do so in Django the easiest option would be to serialize request.META and request.body to a JSON dict.
If you really need something that resembles an HTTP request (and you are unable to go a level up to e.g. the webserver to log this information) you'll just have to piece this together from the information available in request.META and request.body, with the caveats that this is not a realistic representation of the original HTTP request.

Django REMOTE_USER does not exist but HTTP_REMOTE_USER does

All,
I have what should be a very simple problem. I am trying to use Django authentication using the REMOTE_USER variable following these instructions: https://docs.djangoproject.com/en/1.8/howto/auth-remote-user/.
Then, to test that this is working, I am using the postman chrome extension. There I am setting a header variable with the name "REMOTE-USER" and then text for a superuser, and then I'm hitting the django admin page. I don't automatically login.
I set a break point in the process_request function in the RemoteUserMiddleware class. When I make the request, I see that request.META["HTTP_REMOTE_USER"] exists but request.META["REMOTE_USER"] does not exist. The default RemoteUserMiddleware variable uses header="REMOTE_USER". It seems that HTTP Header variables gets a HTTP_ prefix, so I don't understand how this would ever work.
I feel like I must be missing something obvious. Thanks!
The REMOTE_USER is meant to be an environment variable set by your web server (e.g. Apache), not an HTTP header. If it was an HTTP header, then users would be able to spoof the header, and log in as any user they wanted.
All http headers are prefixed HTTP_ so that you can distinguish between them and environment variables.
You can set the environment variable with the development server as follows.
REMOTE_USER=admin ./manage.py runserver

Why am I getting different mime types on my flask app on localhost and openshift?

I have a dynamic css file which loads fonts using font-face which is generated by a request and I am setting the content headers explicitly. It's all working well as far as mime types are concerned at localhost(text/css in network tab) except that the fonts are not loaded in chrome but works in firefox. But that's a different issue, so now I put the code on openshift and by magic response has a text/html header. What am I missing here ?
resp = make_response(render_template('webfonts.css', fonts=fonts))
resp.headers.add('content-type', 'text/css')
return resp
heres the flask code.
and heres the url
http://flaskexample-diadara.rhcloud.com/api/webfonts?font=LohitGujarati
I had the same issue (with the Flask built-in server) and also came across your question, I found the following:
While adding the header can be found elsewhere as the recommended solution it actually doesn't set one of the properties in the Response object (which actually makes sense if you think about it) making the server still send out a default text/html header.
The way that I found it to work is this:
response = Response(render_template('css/' + filename), mimetype='text/css')
return response
You should also do
from flask import Response

Microsoft Azure appending extra query string to urls with query strings

In deploying a version of the Django website I'm working on to Microsoft's Azure service, I added a page which takes a query string like
http://<my_site_name>.azurewebsites.net/security/user/?username=<some_username>&password=<some_password>
However, I was getting 404 responses to this URL. So I turned on Django's Debug flag and the page I get returned said:
Page not found (404)
Request Method: GET
Request URL: http://<my_site_name>.azurewebsites.net/security/user/?username=<some_username>&password=<some_password>?username=<some_username>&password=<some_password>
Using the `URLconf` defined in `<my_project_name>.urls`, Django tried these URL patterns, in this order:
^$
^security/ ^user/$
^account/
^admin/
^api/
The current URL, `security/user/?username=<some_username>&password=<some_password>`, didn't match any of these.
So it seems to be appending the query string onto the end of the url that already has the same query string. I have the site running on my local machine and on an iis server on my internal network which I'm using for staging before pushing to Azure. Neither of these site deployments do this, so this seems to be something specific to Azure.
Is there something I need to set in the Azure website management interface to prevent it from modifying URLs with query strings? Is there something I'm doing wrong with regards to using query strings with Azure?
In speaking to the providers of wfastcgi.py they told me it may be an issue with wfastcgi.py that is causing this problem. While they look into it they gave me a work around that fixes the issue.
Download the latest copy of wfastcgi.py from http://pytools.codeplex.com/releases
In that file find this part of the code:
if 'HTTP_X_ORIGINAL_URL' in record.params:
# We've been re-written for shared FastCGI hosting, send the original URL as the PATH_INFO.
record.params['PATH_INFO'] = record.params['HTTP_X_ORIGINAL_URL']
And add right below it (still part of the if block):
# PATH_INFO is not supposed to include the query parameters, so remove them
record.params['PATH_INFO'] = record.params['PATH_INFO'].split('?')[0]
Then, upload/deploy this modified file to the Azure site (either use the ftp to put it somewhere or add it to your site deployment. I'm deploying it so that if I need to modify it further its versioned and backed up.
In the Azure management page for the site, go to the site's configure page and change the handler mapping to point to the modified wfastcgi.py file and save the configuration.
i.e. my handler used to be the default D:\python27\scripts\wfastcgi.py. Since I deployed my modified file, the handler path is now: D:\home\site\wwwroot\wfastcgi.py
I also restarted the site, but you may not have to.
This modified script should now strip the query string from PATH_INFO, and urls with query strings should work. I'll be using this until I hear from the wfastcgi.py devs that the default wfastcgi.py file in the Python27 install has been fixed/replaced.