Get HTTP request as transferred over the wire (Django) - django

Is it possible to get the HTTP request as a bytestring, exactly as it was transferred over the wire, if you have a Django request object?
Plain text, of course (not the encrypted form if HTTPS is used).
I would like to store the bytestring to analyze it later.
Ideally I would like to access the real bytestring. Creating a bytestring from request.META, request.GET and friends will likely not be the same as the original.
Update: it seems that it is impossible to get at the original bytes. Then the question is: how to construct a bytestring that roughly looks like the original?

As others pointed out, it is not possible because Django doesn't interact with raw requests.
You could try reconstructing the request like this:
    def reconstruct_request(request):
        headers = ''
        for header, value in request.META.items():
            # Only the HTTP_* entries of META are request headers
            if not header.startswith('HTTP_'):
                continue
            header = '-'.join([h.capitalize() for h in header[5:].lower().split('_')])
            headers += '{}: {}\r\n'.format(header, value)
        return (
            '{method} {path} HTTP/1.1\r\n'
            'Content-Length: {content_length}\r\n'
            'Content-Type: {content_type}\r\n'
            '{headers}\r\n'
            '{body}'
        ).format(
            method=request.method,
            path=request.get_full_path(),
            content_length=request.META.get('CONTENT_LENGTH', ''),
            content_type=request.META.get('CONTENT_TYPE', ''),
            headers=headers,
            # request.body is bytes; decode it so it can be embedded in a str
            body=request.body.decode('utf-8', errors='replace'),
        )
NOTE: this is not a complete example, only a proof of concept.

The basic answer is no. Django doesn't have access to the raw request; in fact, it doesn't even have code to parse a raw HTTP request.
This is because Django's HTTP request/response handling (like that of many other Python web frameworks) is, at its core, a WSGI application (see the WSGI specification).
It's the job of the frontend/proxy server (like Apache or nginx) and application server (like uWSGI or gunicorn) to "massage" the request (like transforming and stripping headers) and convert it into an object that can be handled by Django.
As an experiment you can actually wrap Django's WSGI application yourself and see what Django gets to work with when a request comes in.
Edit your project's wsgi.py and add some extremely basic WSGI "middleware":
    import os

    from django.core.wsgi import get_wsgi_application

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')

    class MyMiddleware:
        def __init__(self, app):
            self._app = app

        def __call__(self, environ, start_response):
            import pdb; pdb.set_trace()
            return self._app(environ, start_response)

    # Wrap Django's WSGI application
    application = MyMiddleware(get_wsgi_application())
Now if you start your devserver (./manage.py runserver) and send a request to your Django application, you'll drop into a debugger.
The only thing of interest here is the environ dict. Poke around it and you'll see that it's pretty much the same as what you'll find in Django's request.META. (The contents of the environ dict are detailed in this section of the WSGI spec.)
Knowing this, the best you can do is piece together items from the environ dict into something that remotely resembles an HTTP request.
But why? If you have an environ dict, you have all the information you need to replicate a Django request. There's no actual need to translate this back to an HTTP request.
In fact, as you now know, you don't need an HTTP request at all to call Django's WSGI application. All you need is an environ dict with the required keys and a callable so that Django can relay the response.
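To illustrate that point, here is a minimal sketch of calling a WSGI application directly with a hand-built environ dict, using only the standard library. The names demo_app and call_wsgi_app are hypothetical; demo_app merely stands in for the callable that get_wsgi_application() would return in a real project.

```python
import io
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Stand-in WSGI application (hypothetical); Django's would go here."""
    body = "{} {}".format(
        environ["REQUEST_METHOD"], environ["PATH_INFO"]
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_wsgi_app(app, method="GET", path="/", body=b""):
    """Call a WSGI app with a synthetic environ; no socket is involved."""
    environ = {
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    # Fill in the remaining keys (SERVER_NAME, wsgi.version, ...) without
    # overwriting the ones set above.
    setup_testing_defaults(environ)

    captured = {}

    def start_response(status, headers, exc_info=None):
        # This is the callable through which the app relays status and headers.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        content = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], captured["headers"], content
```

Calling call_wsgi_app(demo_app, "POST", "/hello", b"x") returns the status, the header list and the response bytes without any HTTP request ever being parsed.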
So, to analyze requests (and even be able to replay them) you only need to be able to recreate a valid environ dict.
To do so in Django the easiest option would be to serialize request.META and request.body to a JSON dict.
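A rough sketch of such a serialization follows. It assumes you pass in request.META and request.body yourself; dump_request and load_request are hypothetical helper names. Entries that JSON cannot represent (for example the file-like wsgi.input or the wsgi.version tuple) are simply dropped here, which is one possible choice rather than the only one.

```python
import base64
import json


def dump_request(meta, body):
    """Serialize a request.META-like dict and a bytes body to a JSON string."""
    # Keep only JSON-friendly values; objects such as wsgi.input or
    # wsgi.errors are file-like and cannot be serialized.
    safe_meta = {
        key: value
        for key, value in meta.items()
        if value is None or isinstance(value, (str, int, float, bool))
    }
    return json.dumps({
        "meta": safe_meta,
        # The body is arbitrary bytes, so it is base64-encoded for JSON.
        "body": base64.b64encode(body).decode("ascii"),
    }, sort_keys=True)


def load_request(serialized):
    """Inverse of dump_request: return (meta dict, body bytes)."""
    data = json.loads(serialized)
    return data["meta"], base64.b64decode(data["body"])
```

In a view or middleware this would be called as dump_request(request.META, request.body); to replay a stored request you would still need to re-add the dropped wsgi.* keys.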
If you really need something that resembles an HTTP request (and you are unable to go a level up, e.g. to the web server, to log this information), you'll just have to piece it together from the information available in request.META and request.body, with the caveat that this is not a realistic representation of the original HTTP request.

Related

How to send context request in DRF Management Commands without request Object

For caching reasons I store data in memcache. I collect the data with Django management commands, and when I do, all the URLs are relative. To get absolute paths I need to pass a context to the serializer, but I don't have one. How can I solve this problem to get the full path?
For the public view it works well, but if I develop with localhost then I have trouble with the URLs.
I tried faking a request object with RequestFactory or the test client, but it did not work :D

How to insert Django data on Nginx logs?

I'm thinking about how to retrieve Django user data in the user authentication class and pass it to Nginx session variables, then use that data in the Nginx logging settings to create an access log entry that contains the Django user who made the request.
I have found these ideas:
Get current request by Django's or Python threading
https://gist.github.com/vparitskiy/71bb97b4fd2c3fbd6d6db81546622346
https://nedbatchelder.com/blog/201008/global_django_requests.html
Set a session variable:
How can I set and get session variable in django?
And then log the cookie variable via a Nginx configuration like:
https://serverfault.com/questions/223584/how-to-add-recently-set-cookies-to-nginxs-access-log
https://serverfault.com/questions/872375/what-is-the-difference-between-http-cookie-and-cookie-name-in-nginx
Any better idea? Am I reinventing the wheel?
Finally I have done this: I placed a middleware in Django that inserts into the cookies the logging data that I want nginx to log.
Then I used $upstream_cookie_NAME to retrieve COOKIES['NAME'], if any.
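The poster's middleware is not shown, so the following is only a guess at what it might look like. The class name and the cookie name log_user are hypothetical. It follows the shape of a Django middleware (a callable taking get_response) and uses only set_cookie on the response and is_authenticated / get_username() on the user, so it has to be listed after the authentication middleware.

```python
class LogUserCookieMiddleware:
    """Django-style middleware (hypothetical) that exposes the user in a cookie."""

    COOKIE_NAME = "log_user"  # hypothetical name; pick whatever nginx should log

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # request.user is only present when the auth middleware ran earlier.
        user = getattr(request, "user", None)
        if user is not None and user.is_authenticated:
            response.set_cookie(self.COOKIE_NAME, user.get_username())
        return response
```

Keep in mind that a cookie set this way also reaches the browser unless the proxy strips it, so it should not carry anything sensitive.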
Have you read Django's documentation on logging?
I haven't worked with nginx yet, but with Apache, Django's default logger also outputs to the Apache log, meaning that you can do this:
    from logging import getLogger

    logger = getLogger('django')

    def my_view(request):
        logger.info(f'my view: {request.user}')
which will output the user to the server log.

usage of self.client.get() vs self.browser.get()

I'm working through this book about TDD with Django.
I get different behaviour from self.client.get('/') than from self.browser.get('/localhost:8000'). They seem to do the same thing, but they behave differently.
    class FirstTest(unittest.TestCase):
        def setUp(self):
            self.browser = webdriver.Chrome(os.path.join(os.getcwd(), 'chromedriver'))

        def test_home_page_returns_correct_html(self):
            response = self.client.get('/')
            self.assertTemplateUsed(response, 'home.html')
Can anybody explain what's happening here?
These are two different things.
self.client is the built-in Django test client. This isn't a real browser, and doesn't even make real requests. It just constructs a Django HttpRequest object and passes it through the request/response process (middleware, URL resolver, view, template) and returns whatever Django produces. It won't parse that response at all, or render it, and won't make other requests driven by the HTML for assets etc.
But webdriver.Chrome is an actual real browser, i.e. Chrome. Webdriver fires up an instance of Chrome and drives it to request your web pages. They go through actual HTTP requests and the browser then renders the response; just like any real browser, if the HTML includes links to JS or CSS it will request them and then render them as well.

HTTP headers list

I am studying Django and have created a page that shows all HTTP headers in a request using the request.META dictionary. I'm running it locally and the page shows me a weird number of headers, like 'TEMP' containing the path to my Windows temp folder, or 'PATH' with my full path parameters, and much more information that I don't really find necessary to share in my browser requests (like installed applications).
Is it normal? What do I do about it?
So, let's jump quickly into Django's source code:
django/core/handlers/wsgi.py
    class WSGIRequest(http.HttpRequest):
        def __init__(self, environ):
            ...
            self.META = environ
            self.META['PATH_INFO'] = path_info
            self.META['SCRIPT_NAME'] = script_name
            ...
This handler is used by default by the runserver command and by every other WSGI server. The environ dictionary comes from the underlying web server, and it is filled with lots of data. You can read more about the environ dictionary in the official WSGI docs:
https://www.python.org/dev/peps/pep-0333/#environ-variables
Also note that any web server is free to add its own variables to environ. I assume that's why you see things like TEMP. They are probably used internally by the web server.
If you want the headers only, WSGI mandates that header keys start with the HTTP_ prefix, with the exception of the CONTENT_TYPE and CONTENT_LENGTH headers.
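As a small sketch of that rule, the function below (headers_from_environ is a hypothetical name) filters a request.META or environ dict down to the actual request headers and restores the usual header spelling. Recent Django versions also offer request.headers for the same purpose.

```python
def headers_from_environ(environ):
    """Return only the HTTP request headers found in a WSGI environ dict."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            name = key[5:]  # strip the HTTP_ prefix
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            name = key  # the two headers that carry no prefix
        else:
            continue  # server or OS variables such as PATH or TEMP
        # USER_AGENT -> User-Agent
        headers[name.replace("_", "-").title()] = value
    return headers
```

Rendering the result of headers_from_environ(request.META) instead of request.META itself keeps entries like PATH and TEMP out of the page.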
So Django's docs are misleading. The META field contains more than just headers. It is neither correct nor incorrect, it's just how it is. Special care has to be taken when dealing with META. Leaking some of the data might be a serious security issue.

Django as reverse proxy

My client-server application is mainly based on a special-purpose HTTP server that communicates with the client in an Ajax-like fashion, i.e. the client GUI is refreshed upon asynchronous HTTP request/response cycles.
Evolvability of the special purpose http server is limited and as the application grows, more and more standard features are needed which are provided by Django for instance.
Hence, I would like to add a Django application as a facade/reverse-proxy in order to hide the non-standard special purpose server and be able to gain from Django. I would like to have the Django app as a gateway and not use http-redirect for security reasons and to hide complexity.
However, my concern is that tunneling the traffic through Django on the server might spoil performance. Is this a valid concern?
Would there be an alternative solution to the problem?
Usually in production, you are hosting Django behind a web container like Apache httpd or nginx. These have modules designed for proxying requests (e.g. proxy_pass for a location in nginx). They give you some extras out of the box like caching if you need it. Compared with proxying through a Django application's request pipeline this may save you development time while delivering better performance. However, you sacrifice the power to completely manipulate the request or proxied response when you use a solution like this.
For local testing with ./manage.py runserver, I add a url pattern via urls.py in an if settings.DEBUG: ... section. Here's the view function code I use, which supports GET, PUT, and POST using the requests Python library: https://gist.github.com/JustinTArthur/5710254
I went ahead and built a simple prototype. It was relatively simple, I just had to set up a view that maps all URLs I want to redirect. The view function looks something like this:
    from urllib.request import Request, urlopen

    import django.http

    def redirect(request):
        # `server` (host[:port] of the backend) is assumed to be defined elsewhere
        url = "http://%s%s" % (server, request.path)
        # add GET parameters (QueryDict.urlencode keeps multi-valued keys)
        if request.GET:
            url += '?' + request.GET.urlencode()
        # add headers of the incoming request
        # see https://docs.djangoproject.com/en/1.7/ref/request-response/#django.http.HttpRequest.META for details about the request.META dict
        def convert(s):
            s = s.replace('HTTP_', '', 1)
            s = s.replace('_', '-')
            return s
        request_headers = dict((convert(k), v) for k, v in request.META.items() if k.startswith('HTTP_'))
        # add content-type and content-length, but only when they are set
        for key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            if request.META.get(key):
                request_headers[convert(key)] = request.META[key]
        # get original request payload
        if request.method == "GET":
            data = None
        else:
            data = request.body
        # pass the method explicitly so that PUT is not turned into POST
        downstream_request = Request(url, data, headers=request_headers, method=request.method)
        page = urlopen(downstream_request)
        response = django.http.HttpResponse(page.read())
        return response
So it is actually quite simple and the performance is good enough, in particular if the redirect goes to the loopback interface on the same host.