A year ago, I used Django's StreamingHttpResponse to stream a text file, and Chrome displayed each chunk of text as soon as it arrived. Now, with the same code, Chrome displays the text only after the whole file has loaded, which risks a server timeout. This does not happen in Firefox.
I created a simple test:
# views.py
import time

from django.http import StreamingHttpResponse
from django.views import generic


class TestEditView(generic.TemplateView):
    def generator(self):
        # Emit one line per second so browser-side buffering is easy to spot.
        for i in range(15):
            time.sleep(1)
            yield 'THIS IS {}\n'.format(i)
            print('LOG: THIS IS {}\n'.format(i))

    def get(self, request, *args, **kwargs):
        return StreamingHttpResponse(self.generator(),
                                     content_type="text/plain; charset=utf-8")
If I access that view in Firefox, the browser prints 'THIS IS ....' once per second for 15 seconds. In Chrome, the browser waits 15 seconds and then prints all of the 'THIS IS...' lines at once, even though the development server logs 'LOG: THIS IS...' once per second.
I wonder if there is any subtlety in this problem that I missed. Thank you.
Python: 3.6.2.
Django: 1.10.5
Changing the content_type from "text/plain" to "text/html", or removing the content_type altogether, solves the problem: Chrome then renders each chunk of text as soon as it receives it.
I'm looking for a way to execute code in Django after the response has been sent to the client. I know the usual way is to implement a task queue (e.g., Celery). However, the PaaS service I'm using (PythonAnywhere) doesn't support task queues as of May 2019. It also seems overly complex for a few simple use cases. I found the following solution on SO: Execute code in Django after response has been sent to the client. The accepted answer works great when run locally. However, in production on PythonAnywhere, it still blocks the response from being sent to the client. What is causing that?
Here's my implementation:
from time import sleep
from datetime import datetime

from django.http import HttpResponse


class HttpResponseThen(HttpResponse):
    """
    WARNING: THIS IS STILL BLOCKING THE PAGE LOAD ON PA
    Implements HttpResponse with a callback i.e.,
    The callback function runs after the http response.
    """
    def __init__(self, data, then_callback=lambda: 'hello world', **kwargs):
        super().__init__(data, **kwargs)
        self.then_callback = then_callback

    def close(self):
        super().close()
        return_value = self.then_callback()
        print(f"Callback return value: {return_value}")


def my_callback_function():
    sleep(20)
    print('This should print 20 seconds AFTER the page loads.')
    print('On PA, the page actually takes 20 seconds to load')


def test_view(request):
    return HttpResponseThen("Timestamp: "+str(datetime.now()),
                            then_callback=my_callback_function)  # This is still blocking on PA
I'm expecting the response to be sent to the client immediately, but it actually takes a full 20 seconds for the page to load. (On my laptop, the code works great. The response is sent immediately and the print statements execute 20 seconds later.)
I'm trying to scrape a website that uses Ajax to load its pages.
My Selenium browser navigates through all the pages, but the Scrapy response stays the same, so the spider ends up scraping the same response once per page.
Proposed solution:
I read in some answers that by using
hxs = HtmlXPathSelector(self.driver.page_source)
you can change the page source and then scrape. But it does not work, and after adding this line the browser also stopped navigating.
Code:
def parse(self, response):
    self.driver.get(response.url)
    pages = int(response.xpath('//p[@class="pageingP"]/a/text()')[-2].extract())
    for i in range(pages):
        next = self.driver.find_element_by_xpath('//a[text()="Next"]')
        print(response.xpath('//div[@id="searchResultDiv"]/h3/text()').extract()[0])
        try:
            next.click()
            time.sleep(3)
            #hxs = HtmlXPathSelector(self.driver.page_source)
            for sel in response.xpath("//tr/td/a"):
                item = WarnerbrosItem()
                item['url'] = response.urljoin(sel.xpath('@href').extract()[0])
                request = scrapy.Request(item['url'], callback=self.parse_job_contents, meta={'item': item}, dont_filter=True)
                yield request
        except:
            break
    self.driver.close()
Please Help.
When using Selenium and Scrapy together, I read the page back for Scrapy after Selenium has performed the click, using
resp = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
That would go where your HtmlXPathSelector selector line went. All the scrapy code from that point to the end of the routine would then need to refer to resp (page rendered after the click) rather than response (page rendered before the click).
The time.sleep(3) may give you issues: it is just an unconditional wait and doesn't guarantee the page has actually loaded. It might be better to use something like
WebDriverWait(self.driver, 30).until(test page has changed)
which waits until the page you are waiting for passes a specific test, such as finding the expected page number or manufacturer's part number.
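The difference between an unconditional sleep and a conditional wait can be sketched in plain Python. This is only an illustration of the idea, not Selenium's implementation: wait_until and the checks iterator are hypothetical names made up for this example. With Selenium itself, you would pass WebDriverWait(...).until() a callable that receives the driver and returns a truthy value once the page has changed.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "the page has changed": the third check succeeds.
checks = iter([False, False, "page 2"])
print(wait_until(lambda: next(checks), timeout=5, poll_interval=0.01))
```

Unlike time.sleep(3), this returns as soon as the test passes and fails loudly if it never does.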
I'm not sure what the impact of closing the driver at the end of every pass through parse() is. I've used the following snippet in my spider to close the driver when the spider is closed.
def __init__(self, filename=None):
    # wire us up to selenium
    self.driver = webdriver.Firefox()
    dispatcher.connect(self.spider_closed, signals.spider_closed)

def spider_closed(self, spider):
    self.driver.close()
Selenium is not connected to Scrapy or to its response object in any way, and I don't see your code changing the response object.
You'll have to work with them independently.
I'd like to add a 'Last seen' URL list to a project, so that the last 5 articles requested by users can be displayed in the list to all users.
I've read the middleware docs but could not figure out how to use it in my case.
What I need is a simple working example of a middleware that captures the requests so that they can be saved and reused.
Hmm, I don't know if I would do it with middleware or write a decorator. But as your question is about middleware, here is my example:
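from django.utils import timezone

# ViewLogger is your own model with user_id, view_url and timestamp fields.
class ViewLoggerMiddleware(object):
    def process_response(self, request, response):
        # We only want to save successful responses
        if response.status_code not in [200, 302]:
            return response
        ViewLogger.objects.create(user_id=request.user.id,
                                  view_url=request.get_full_path(),
                                  timestamp=timezone.now())
        # process_response must always return the response
        return response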
Showing the last 5 would be something like:
ViewLogger.objects.filter(user_id=request.user.id).order_by("-timestamp")[:5]
Note: the code is not tested, and I'm not sure whether status_code is a real attribute of response. You could also change the list of valid status codes.
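For a list that is shown to all users rather than per user, a rough alternative is to keep only the five most recent URLs in a bounded deque. The sketch below uses the newer middleware style (a callable wrapping get_response, as used with the MIDDLEWARE setting in Django 1.10 and later). LAST_SEEN, LastSeenMiddleware and the fake request and response objects are hypothetical names for illustration. The in-memory store is an assumption made to keep the example short: it is per process and lost on restart, so a model or the cache would be more realistic.

```python
from collections import deque
from types import SimpleNamespace

# Hypothetical in-memory store of the five most recent URLs.
LAST_SEEN = deque(maxlen=5)


class LastSeenMiddleware:
    """New-style middleware: a callable that wraps get_response."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code == 200:
            url = request.get_full_path()
            # Move a repeated URL to the front instead of listing it twice.
            if url in LAST_SEEN:
                LAST_SEEN.remove(url)
            LAST_SEEN.appendleft(url)
        return response


# Stand-ins for Django's request and response, just to exercise the class.
def fake_view(request):
    return SimpleNamespace(status_code=200)


middleware = LastSeenMiddleware(fake_view)
for n in range(1, 8):
    middleware(SimpleNamespace(get_full_path=lambda n=n: "/articles/%d/" % n))
print(list(LAST_SEEN))
```

A template context processor or a view could then read LAST_SEEN to render the list.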
I have been writing tests for one of my django applications and have been looking to get around this problem for quite some time now. I have a view that sends messages using django.contrib.messages for different cases. The view looks something like the following.
from django.contrib import messages
from django.shortcuts import redirect

import custom_messages


def some_view(request):
    """ This is a sample view for testing purposes.
    """
    some_condition = models.SomeModel.objects.get_or_none(
        condition=some_condition)
    if some_condition:
        messages.success(request, custom_messages.SUCCESS)
    else:
        messages.error(request, custom_messages.ERROR)
    return redirect(some_other_view)
Now, while testing this view, the response from client.get does not contain the context dictionary that holds the messages, because this view uses a redirect. For views that render templates, we can access the messages list using messages = response.context.get('messages'). How can we access the messages for a view that redirects?
Use the follow=True option in the client.get() call, and the client will follow the redirect. You can then test that the message is in the context of the view you redirected to.
def test_some_view(self):
    # use follow=True to follow redirect
    response = self.client.get('/some-url/', follow=True)
    # don't really need to check status code because assertRedirects will check it
    self.assertEqual(response.status_code, 200)
    self.assertRedirects(response, '/some-other-url/')
    # get message from context and check that expected text is there
    message = list(response.context.get('messages'))[0]
    self.assertEqual(message.tags, "success")
    self.assertTrue("success text" in message.message)
You can use get_messages() with response.wsgi_request like this (tested in Django 1.10):
from django.contrib.messages import get_messages
...
def test_view(self):
    response = self.client.get('/some-url/')  # you don't need follow=True
    self.assertRedirects(response, '/some-other-url/')
    # each element is an instance of django.contrib.messages.storage.base.Message
    all_messages = [msg for msg in get_messages(response.wsgi_request)]
    # here's how you test the first message
    self.assertEqual(all_messages[0].tags, "success")
    self.assertEqual(all_messages[0].message, "you have done well")
If your views are redirecting and you use follow=True in your request to the test client, the above doesn't work. I ended up writing a helper function to get the first (and in my case, only) message sent with the response.
@classmethod
def getmessage(cls, response):
    """Helper method to return message from response """
    for c in response.context:
        message = [m for m in c.get('messages')][0]
        if message:
            return message
You include this within your test class and use it like this:
message = self.getmessage(response)
Where response is what you get back from a get or post to a Client.
This is a little fragile but hopefully it saves someone else some time.
I had the same problem when using a 3rd party app.
If you want to get the messages from a view that returns an HttpResponseRedirect (from which you can't access the context) from within another view, you can use get_messages(request):
from django.contrib.messages import get_messages
storage = get_messages(request)
for message in storage:
    do_something_with_the_message(message)
This clears the message storage though, so if you want to access the messages from a template later on, add:
storage.used = False
Alternative method mocking messages (doesn't need to follow redirect):
from mock import ANY, patch
from django.contrib import messages

@patch('myapp.views.messages.add_message')
def test_some_view(self, mock_add_message):
    r = self.client.get('/some-url/')
    mock_add_message.assert_called_once_with(ANY, messages.ERROR, 'Expected message.')  # or assert_called_with, assert_has_calls...
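The same pattern can be tried without a Django project, using unittest.mock from the standard library (the mock package is its backport). This is a minimal sketch: the messages namespace and some_view here are hypothetical stand-ins for django.contrib.messages and a real view, and patch.object is used instead of a dotted path because there is no importable myapp module in this example.

```python
from types import SimpleNamespace
from unittest.mock import ANY, patch

# Hypothetical stand-in for django.contrib.messages, so the sketch runs without Django.
messages = SimpleNamespace(ERROR=40, add_message=lambda request, level, text: None)


def some_view(request):
    messages.add_message(request, messages.ERROR, 'Expected message.')
    return 'redirect'


def check_view_adds_message():
    # Replace add_message only for the duration of the call to the view.
    with patch.object(messages, 'add_message') as mock_add_message:
        some_view(object())
    # ANY matches the request argument, whatever object it is.
    mock_add_message.assert_called_once_with(ANY, messages.ERROR, 'Expected message.')
    return mock_add_message.call_count


print(check_view_adds_message())
```

Because the assertion is made on the call itself, the test does not depend on where the messages end up after the redirect.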
In my Django application I want to keep track of whether a response has been sent to the client successfully. I am well aware that there is no "watertight" way in a connectionless protocol like HTTP to ensure the client has received (and displayed) a response, so this will not be mission-critical functionality, but still I want to do this at the latest possible time. The response will not be HTML so any callbacks from the client (using Javascript or IMG tags etc.) are not possible.
The "latest" hook I can find would be adding a custom middleware implementing process_response at the first position of the middleware list, but to my understanding this is executed before the actual response is constructed and sent to the client. Are there any hooks/events in Django to execute code after the response has been sent successfully?
The method I am going for at the moment uses a subclass of HttpResponse:
from django.template import loader
from django.http import HttpResponse


# use custom response class to override HttpResponse.close()
class LogSuccessResponse(HttpResponse):
    def close(self):
        super(LogSuccessResponse, self).close()
        # do whatever you want, this is the last codepoint in request handling
        if self.status_code == 200:
            print('HttpResponse successful: %s' % self.status_code)


# this would be the view definition
def logging_view(request):
    response = LogSuccessResponse('Hello World', content_type='text/plain')
    return response
By reading the Django code I am very much convinced that HttpResponse.close() is the latest point to inject code into the request handling. I am not sure if there really are error cases that are handled better by this method compared to the ones mentioned above, so I am leaving the question open for now.
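The ordering this relies on comes from the WSGI contract (PEP 3333): once the server has finished with the response iterable, it calls the iterable's close() method if there is one. The sketch below shows that ordering using the standard library's wsgiref handler rather than Django or a production server. BodyThen, app, events and client_side are hypothetical names for illustration. Note that it only shows when close() runs relative to the body being written; whether the client has actually received those bytes before close() returns depends on how the server and any proxy in front of it buffer output.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults

events = []
client_side = io.BytesIO()  # stands in for the socket to the client


class BodyThen:
    """Response body whose close() runs a callback after the body is served."""

    def __init__(self, chunks, then_callback):
        self.chunks = chunks
        self.then_callback = then_callback

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        # Record whether the body had already been written when close() ran.
        events.append(("close", client_side.getvalue().endswith(b"Hello World")))
        self.then_callback()


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return BodyThen([b"Hello World"], lambda: events.append(("callback", True)))


environ = {}
setup_testing_defaults(environ)
SimpleHandler(io.BytesIO(), client_side, io.StringIO(), environ).run(app)
print(events)
```

A slow callback in close() therefore still runs inside the request cycle, which is why it can hold up a server that does not flush the response before calling close().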
The reasons I prefer this approach to the others mentioned in lazerscience's answer are that it can be set up in the view alone and does not require middleware to be installed. Using the request_finished signal, on the other hand, wouldn't allow me to access the response object.
If you need to do this a lot, a useful trick is to have a special response class like:
# Assumption: Response and status come from Django REST framework.
from rest_framework import status
from rest_framework.response import Response


class ResponseThen(Response):
    def __init__(self, data, then_callback, **kwargs):
        super().__init__(data, **kwargs)
        self.then_callback = then_callback

    def close(self):
        super().close()
        self.then_callback()


def some_view(request):
    # ...code to run before response is returned to client

    def do_after():
        # ...code to run *after* response is returned to client
        pass

    return ResponseThen(some_data, do_after, status=status.HTTP_200_OK)
...helps if you want a quick/hacky "fire and forget" solution without bothering to integrate a proper task queue or split off a separate microservice from your app.
I suppose when talking about middleware you are thinking about the middleware's process_request method, but there's also a process_response method that is called when the HttpResponse object is returned. I guess that will be the latest moment where you can find a hook that you can use.
Furthermore there's also a request_finished signal being fired.
I modified Florian Ledermann's idea a little so that you can keep using HttpResponse normally, but can also define a function and bind it to a specific HttpResponse instance.
old_response_close = HttpResponse.close
HttpResponse.func = None

def new_response_close(self):
    old_response_close(self)
    if self.func is not None:
        self.func()

HttpResponse.close = new_response_close
It can be used via:
def myview(request):
    def myfunc():
        print("stuff to do")
    resp = HttpResponse(status=200)
    resp.func = myfunc
    return resp
I was looking for a way to send a response, then execute some time consuming code after... but if I can get a background (most likely a celery) task to run, then it will have rendered this useless to me. I will just kick off the background task before the return statement. It should be asynchronous, so the response will be returned before the code is finished executing.
---EDIT---
I finally got celery to work with aws sqs. I basically posted a "how to". Check out my answer on this post:
Cannot start Celery Worker (Kombu.asynchronous.timer)
I found a filthy trick to do this by accessing a protected member in HttpResponse.
from django.http import HttpResponse

def some_view(request):
    # ...code to run before response is returned to client

    def do_after():
        # ...code to run *after* response is returned to client
        pass

    response = HttpResponse()
    response._resource_closers.append(do_after)
    return response
It works in Django 3.0.6; check the close() method in the definition of HttpResponse:
def close(self):
    for closer in self._resource_closers:
        try:
            closer()
        except Exception:
            pass
    # Free resources that were still referenced.
    self._resource_closers.clear()
    self.closed = True
    signals.request_finished.send(sender=self._handler_class)