Does a local Python 'long time task' function stop running when the Flask app URL is refreshed or changed, without using a task queue? - flask

Let me use the following to explain my question:
In my Flask app, the URL '/long_time_task' produces a result based on the output of foo(). foo() takes a long time to run.
I'd like to know what happens when foo() is running (i.e. the URL was clicked) but has not finished:
1> If the user refreshes the link, it will start another foo(). Will both results eventually be saved in the db and be visible at the URL /show_db_result (if there is no error)?
2> If the user goes to other links, or logs out, will the result eventually be saved in the db and be visible at the URL /show_db_result (if there is no error)?
I've tested the application myself, and everything seems to work fine (i.e. yes to both questions) without Celery or any other MQ support.
from flask import Flask, session

app = Flask(__name__)

@app.route('/long_time_task')
def long_time_task():
    if session.get('logged_in'):
        foo()
        return 'some results'
    return 'please log in'

def foo():
    # also require log in
    # time consuming task, print('calculating')
    # when finished, save result to a database
    return 'msg'

@app.route('/page1')
def page1():
    return 'page1'

@app.route('/page2')
def page2():
    return 'page2'

@app.route('/show_db_result')
def show_db_result():
    # require log in to see user's result
    return 'all foo()\'s result row by row'
I understand that there are many articles about Flask long-running tasks and Celery. I'd like to know more about the basic mechanisms.
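As a rough illustration of the behaviour observed above, the sketch below simulates two overlapping requests with plain threads. It assumes the server handles each request in its own thread; this version of foo(), the user name, and the in-memory list standing in for the database are hypothetical stand-ins, not the app's real code.

```python
import threading
import time


def foo(user, db_rows, db_lock, delay=0.05):
    """Hypothetical long task: wait, then 'save' one row for the user."""
    time.sleep(delay)              # stands in for the time-consuming work
    with db_lock:                  # guard the shared list while writing
        db_rows.append((user, "msg"))
    return "msg"


def simulate_refresh(user="alice"):
    """Start a second foo() before the first has finished, as a refresh would."""
    db_rows, db_lock = [], threading.Lock()
    first = threading.Thread(target=foo, args=(user, db_rows, db_lock))
    second = threading.Thread(target=foo, args=(user, db_rows, db_lock))
    first.start()
    second.start()                 # the "refresh": the first run is still sleeping
    first.join()
    second.join()
    return db_rows
```

In this sketch both runs finish and both rows are stored. Whether a real deployment behaves the same way depends on the server and its worker and timeout settings.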

Related

How do I check if a user has entered the URL from another website in Django?

I want an effect to be applied when a user enters my website. So I want to detect when a user is coming from outside my website, so that the effect isn't applied while the user is browsing between URLs inside the website, but only when the user arrives from outside.
You can't really check for where a user has come from specifically. You can check if the user has just arrived on your site by setting a session variable when they load one of your pages. You can check for it before you set it, and if they don't have it, then they have just arrived and you can apply your effect. There are some good examples of how sessions work here: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Sessions
There are a couple of ways to handle this. If you are using function-based views, you can just create a separate util function and call it at the top of every view, e.g.,
utils.py
def first_visit(request):
    """Returns the answer to the question 'first visit for session?'

    Make sure SESSION_EXPIRE_AT_BROWSER_CLOSE is set to False in settings for persistence.
    """
    # .get() avoids a KeyError when the key has not been set yet
    if request.session.get('first_visit'):
        # This is not the first visit because the session variable is already set.
        return False
    else:
        # This is the first visit
        ...  # do something
        # Set the session variable so you only do the above once
        request.session['first_visit'] = True
        return True
views.py
from .utils import first_visit  # assuming utils.py sits next to views.py
def show_page(request):
    # use a different name so the imported function is not shadowed
    is_first_visit = first_visit(request)
This approach gives you some control. For example, you may not want to run it on pages that require login, because you will already have run it on the login page.
Otherwise, the best approach depends on what will happen on the first visit. If you just want to update a template (e.g., to show a message or run a script on the page), you can use a context processor, which gives you extra context for your templates. If you want to interrupt the request, perhaps to redirect it to a separate page, you can create a simple piece of middleware.
docs for middleware
docs for context processors
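The middleware option could look roughly like the sketch below. It uses Django's function-style middleware shape (a factory that takes get_response) and assumes it is listed after the session middleware so that request.session exists. The names first_visit_middleware, is_first_visit and the 'visited' session key are hypothetical, and the demo uses a fake request object rather than a real Django request.

```python
from types import SimpleNamespace


def first_visit_middleware(get_response):
    """Function-style middleware: flag the first request of each session."""
    def middleware(request):
        # 'visited' is a hypothetical session key; any unused name would do.
        request.is_first_visit = not request.session.get("visited", False)
        request.session["visited"] = True
        return get_response(request)
    return middleware


def demo():
    """Exercise the middleware with a fake request instead of a real Django one."""
    seen = []
    handler = first_visit_middleware(lambda req: seen.append(req.is_first_visit))
    session = {}                                  # shared across both "requests"
    handler(SimpleNamespace(session=session))     # first page load
    handler(SimpleNamespace(session=session))     # second page load
    return seen
```

A view or template could then read request.is_first_visit to decide whether to apply the effect or redirect.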
You may also be able to handle this entirely by javascript. This uses localStorage to store whether or not this is the user's first visit to the site and displays the loading area for 5 seconds if there is nothing in localStorage. You can include this in your base template so it runs on every page.
function showMain() {
  document.getElementById("loading").style.display = "none";
  document.getElementById("main").style.display = "block";
}
const secondVisit = localStorage.getItem("secondVisit");
if (!secondVisit) {
  // show loading screen
  document.getElementById("loading").style.display = "block";
  document.getElementById("main").style.display = "none";
  // setTimeout takes the callback first, then the delay in milliseconds
  setTimeout(showMain, 5000);
  localStorage.setItem("secondVisit", "true");
} else {
  showMain();
}

Django: Script that executes many queries runs massively slower when executed from Admin view than when executed from shell

I have a script that loops through the rows of an external csv file (about 12,000 rows) and executes a single Model.objects.get() query to retrieve each item from the database (final product will be much more complicated but right now it's stripped down to the barest functionality possible to try to figure this out).
For right now the path to the local csv file is hardcoded into the script. When I run the script through the shell using py manage.py runscript update_products_from_csv it runs in about 6 seconds.
The ultimate goal is to be able to upload the csv through the admin and then have the script run from there. I've already been able to accomplish that, but the runtime when I do it that way takes more like 160 seconds. The view for that in the admin looks like...
from django import forms
from django.contrib import admin, messages
from django.http import HttpResponseRedirect
from django.urls import path, reverse

from apps.products.models import Product
from .scripts import update_products_from_csv

class CsvUploadForm(forms.Form):
    csv_file = forms.FileField(label='Upload CSV')

@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
    # list_display, list_filter, fieldsets, etc
    def changelist_view(self, request, extra_context=None):
        extra_context = extra_context or {}
        extra_context['csv_upload_form'] = CsvUploadForm()
        return super().changelist_view(request, extra_context=extra_context)

    def get_urls(self):
        urls = super().get_urls()
        new_urls = [path('upload-csv/', self.upload_csv),]
        return new_urls + urls

    def upload_csv(self, request):
        if request.method == 'POST':
            # csv_file = request.FILES['csv_file'].file
            # result_string = update_products_from_csv.run(csv_file)
            # I commented out the above two lines and added the below line to rule out
            # the possibility that the csv upload itself was the problem. Whether I execute
            # the script using the uploaded file or let it use the hardcoded local path,
            # the results are the same. It works, but takes more than 20 times longer
            # than executing the same script from the shell.
            result_string = update_products_from_csv.run()
            print(result_string)
            messages.success(request, result_string)
        return HttpResponseRedirect(reverse('admin:products_product_changelist'))
Right now the actual running parts of the script are about as simple as this...
import csv
from time import time
from apps.products.models import Product

CSV_PATH = 'path/to/local/csv_file.csv'

def run():
    csv_data = get_csv_data()
    update_data = build_update_data(csv_data)
    update_handler(update_data)
    return 'Finished'

def get_csv_data():
    with open(CSV_PATH, 'r') as f:
        return [d for d in csv.DictReader(f)]

def build_update_data(csv_data):
    update_data = []
    # Code that loops through csv data, applies some custom logic, and builds a list of
    # dicts with the data cleaned and formatted as needed
    return update_data

def update_handler(update_data):
    query_times = []
    for upd in update_data:
        iter_start = time()
        product_obj = Product.objects.get(external_id=upd['external_id'])
        # external_id is not the primary key but is an indexed field in the Product model
        query_times.append(time() - iter_start)
    # Code to export query_times to an external file for analysis
update_handler() has a bunch of other code checking field values to see if anything needs to be changed, and building the objects when a match does not exist, but that's all commented out right now. As you can see, I'm also timing each query and logging those values. (I've been dropping time() calls in various places all day and have determined that the query is the only part that's noticeably different.)
When I run it from the shell, the average query time is 0.0005 seconds and the total of all query times comes out to about 6.8 seconds every single time.
When I run it through the admin view and then check the queries in Django Debug Toolbar it shows the 12,000+ queries as expected, and shows a total query time of only about 3900ms. But when I look at the log of query times gathered by the time() calls, the average query time is 0.013 seconds (26 times longer than when I run it through the shell), and the total of all query times always comes out at 156-157 seconds.
The queries in Django Debug Toolbar when I run it through the admin all look like SELECT ••• FROM "products_product" WHERE "products_product"."external_id" = 10 LIMIT 21, and according to the toolbar they are mostly all 0-1ms. I'm not sure how I would check what the queries look like when running it from the shell, but I can't imagine they'd be different? I couldn't find anything in django-extensions runscript docs about it doing query optimizations or anything like that.
One additional interesting facet is that when running it from the admin, from the time I see result_string print in the terminal, it's another solid 1-3 minutes before the success message appears in the browser window.
I don't know what else to check. I'm obviously missing something fundamental, but I don't know what.
Somebody on Reddit suggested that running the script from the shell might be automatically spinning up a new thread where the logic can run unencumbered by the other Django server processes, and this seems to be the answer. If I run the script in a new thread from the admin view, it runs just as fast as it does when I run it from the shell.
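The post does not show the threaded version, so the following is only a minimal sketch of one way to do it with the standard library. run_in_new_thread is a hypothetical helper; it still waits for the script to finish so the view can show the result string.

```python
import threading


def run_in_new_thread(task, *args, **kwargs):
    """Run task(*args, **kwargs) in a fresh thread and return its result."""
    outcome = {}

    def worker():
        try:
            outcome["result"] = task(*args, **kwargs)
        except Exception as exc:      # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()                     # block until the task has finished
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]
```

In the admin view above, the call would then read result_string = run_in_new_thread(update_products_from_csv.run).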

Additional arguments in Flask grequests hook

I am having an issue passing an additional parameter to grequests using a hook. It works in a standalone (non-Flask) app, but not with Flask (the integrated Flask server). Here is my code snippet.
self.async_list = []
for url in self.urls:
    self.action_item = grequests.get(url, hooks={'response': [self.hook_factory(test='new_folder')]}, proxies={'http': 'proxy url'}, timeout=20)
    self.async_list.append(self.action_item)
grequests.map(self.async_list)

def hook_factory(self, test, *factory_args, **factory_kwargs):
    print(test + "In start of hook factory")  # this worked and I see test value is printing as new_folder
    def do_something(response, *args, **kwargs):
        print(test + "In do something")  # This is not working hence I was not able to save this response to a newly created folder.
        self.file_name = str(test) + "/"
        print("file name is " + self.file_name)
        with open(REL_PATH + self.file_name, 'wb') as f:
            f.write(response.content)
        return None
    return do_something
Am I missing anything here?
Trying to answer my own question: after further analysis, there was nothing wrong with the above code. For some reason I was not getting my session data, which is in _request_ctx_stack.top, but the same session data was available in h_request_ctx_stack._local. I don't know the reason. I was able to get my data from h_request_ctx_stack._local instead of _request_ctx_stack.top for this hook alone. After I made that change, I was able to execute the same hook without any issues.

@vary_on_cookie fails due to non-Django cookies

I am stumped on a caching issue in my Django 1.5.6 application:
@vary_on_cookie
@cache_page(24 * 60 * 60, key_prefix=':1:community')
@rendered_with("general/community.html")
@allow_http("GET")
def community(request):
    ...
    return { ... }
Locally the caching is working correctly, but when I test this in staging, @vary_on_cookie isn't working -- I can see by the queries being executed that community() is being executed on subsequent calls to this page.
I updated my settings in my local environment to use the same Redis cache as staging to eliminate that difference, but the local environment continued to behave correctly.
Looking at the keys Redis has in its cache, I can see what the problem is -- in staging every time this page gets called, new keys are added to the cache. Compare the output from cache.keys('*community*'):
LOCAL:
First call to community page:
[u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.3b7d4c38ec8d92512a4a0847f4738298.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York']
Second call (same user):
[u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.3b7d4c38ec8d92512a4a0847f4738298.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York']
Notice there are the same number of keys in both cases.
STAGING:
First call to community page:
[u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.559380b85dc0cdcf0ff25051df78987d.en-us.America/New_York']
Second call (same user):
[u'community:1:views.decorators.cache.cache_header.:1:community.b528759dd79cf1c6b405290c0bc05e39.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.559380b85dc0cdcf0ff25051df78987d.en-us.America/New_York',
u'community:1:views.decorators.cache.cache_page.:1:community.GET.b528759dd79cf1c6b405290c0bc05e39.6ec85abcc8a14d66800228bdccc537f0.en-us.America/New_York']
Notice that an additional entry has been added to the cache though it's the same user!
I'm stumped about where to go from here. Both environments are using SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'. The staging environment clearly recognizes that this is the same user in every other way. What is happening in @vary_on_cookie that is creating a difference in staging, but not locally?
I've inspected all of my staging vs. local differences, scrutinized my custom middleware, but I don't have any ideas of what to look at. Any ideas even of what to look at next would be greatly appreciated. Thanks!
UPDATE
I inspected django.utils.cache._generate_cache_key() to see how it generates that last hex section of the cache key. I naively assumed it just looked at Django's own cookies (like sessionid), but I see that it uses all of the cookies passed into HTTP_COOKIE -- that means, Django and non-Django. For me, that means cookies from Google Analytics and New Relic, neither of which I have running locally.
for header in headerlist:  # headerlist = [u'HTTP_COOKIE']
    # value is the string of all cookies, for ex:
    # '__atuvc=39%7C17%2C8%7C18; csrftoken=dPqaXS6XVGp2UUvfhEW9kS6R6WPHQlE4; sessionid=j6a83wbsq1sez9bz75n0tzl4n884umg2'
    value = request.META.get(header, None)
    if value is not None:
        ctx.update(force_bytes(value))
Can this really be true?! All of the world's Django sites using @vary_on_cookie are being thwarted by their third-party cookies?!
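The effect can be reproduced outside Django with a simplified stand-in for that hashing step. This is a sketch under the assumption that the digest is an MD5 over the raw Cookie header, as the excerpt above suggests. cookie_digest and same_cache_entry are hypothetical helpers, not Django functions.

```python
import hashlib


def cookie_digest(cookie_header):
    """Hash the complete Cookie header, third-party cookies included."""
    ctx = hashlib.md5()
    ctx.update(cookie_header.encode("utf-8"))
    return ctx.hexdigest()


def same_cache_entry(first_header, second_header):
    """True when two requests would share a digest, and so a cache key."""
    return cookie_digest(first_header) == cookie_digest(second_header)
```

With an unchanged sessionid, same_cache_entry() still returns False as soon as an analytics cookie such as __atuvc changes between requests, which matches the extra keys seen in staging.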
I created a custom decorator which hacks the HTTP headers to isolate the user's ID. (Although it sets Vary: DJANGO_USERID, Cookie in the response sent back to the browser, it doesn't include the actual ID.)
I would appreciate any feedback on this solution, since it's a bit beyond my Django comfort zone. Thanks!
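from functools import wraps

from django.utils.cache import patch_vary_headers
from django.utils.decorators import available_attrs

def vary_on_user(view):
    """
    Adapted from django.views.decorators.vary_on_cookie
    """
    @wraps(view, assigned=available_attrs(view))
    def inner_func(request, *args, **kwargs):
        # expose the user's ID as a pseudo-header so the cache key varies on it
        request.META['HTTP_DJANGO_USERID'] = request.user.id
        response = view(request, *args, **kwargs)
        patch_vary_headers(response, ('DJANGO_USERID',))
        return response
    return inner_func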

Django - show loading message during long processing

How can I show a please wait loading message from a django view?
I have a Django view that takes significant time to perform calculations on a large dataset.
While the process loads, I would like to present the user with a feedback message e.g.: spinning loading animated gif or similar.
After trying the two different approaches suggested by Brandon and Murat, Brandon's suggestion proved the most successful.
1. Create a wrapper template that includes the javascript from http://djangosnippets.org/snippets/679/. The javascript has been modified: (i) to work without a form (ii) to hide the progress bar / display results when a 'done' flag is returned (iii) with the JSON update url pointing to the view described below
2. Move the slow loading function to a thread. This thread will be passed a cache key and will be responsible for updating the cache with progress status and then its results. The thread renders the original template as a string and saves it to the cache.
3. Create a view based on upload_progress from http://djangosnippets.org/snippets/678/ modified to (i) instead render the original wrapper template if progress_id='' (ii) generate the cache_key, check if a cache already exists and if not start a new thread (iii) monitor the progress of the thread and when done, pass the results to the wrapper template
4. The wrapper template displays the results via document.getElementById('main').innerHTML=data.result
(* looking at whether step 4 might be better implemented via a redirect as the rendered template contains javascript that is not currently run by document.getElementById('main').innerHTML=data.result)
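The thread-plus-cache part of these steps can be sketched with the standard library alone. This is an illustration, not the code from the snippets linked above: the module-level dict stands in for Django's cache backend, and slow_job, start_or_poll and the result string are hypothetical names and values.

```python
import threading
import time

cache = {}                      # stands in for Django's cache backend
cache_lock = threading.Lock()


def slow_job(cache_key, steps=3):
    """Worker thread: publish progress, then the final rendered result."""
    for step in range(1, steps + 1):
        time.sleep(0.01)        # placeholder for one chunk of real work
        with cache_lock:
            cache[cache_key] = {"done": False, "progress": step / steps}
    with cache_lock:
        cache[cache_key] = {"done": True, "progress": 1.0,
                            "result": "<p>rendered results</p>"}


def start_or_poll(cache_key):
    """Start the job on the first call; afterwards just report its status."""
    with cache_lock:
        status = cache.get(cache_key)
        if status is None:
            status = cache[cache_key] = {"done": False, "progress": 0.0}
            threading.Thread(target=slow_job, args=(cache_key,)).start()
        return dict(status)     # copy, so callers cannot mutate the cache
```

The progress view would call start_or_poll() on every AJAX request and return the dict as JSON; the wrapper template keeps polling until 'done' is true.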
Another thing you could do is add a javascript function that displays a loading image before it actually calls the Django View.
function showLoaderOnClick(url) {
  showLoader();
  window.location = url;
}
function showLoader() {
  $('body').append('<div style="" id="loadingDiv"><div class="loader">Loading...</div></div>');
}
And then in your template you can do:
This will take some time...
Here's a quick default loadingDiv : https://stackoverflow.com/a/41730965/13476073
Note that this requires jQuery.
A more straightforward approach is to generate a wait page with your gif etc. and then use the JavaScript
window.location.href = 'insert results view here';
to switch to the results view, which starts your lengthy calculation. The page won't change until the calculation is finished. When it finishes, the results page will be rendered.
Here's an oldie, but might get you going in the right direction: http://djangosnippets.org/snippets/679/
A workaround that I chose was to use the beforeunload and unload events to show the loading image. This can be used with or without window.load. In my case, it's the view that is taking a great amount of time and not the page loading, hence I am not using window.load (because a lot of time has already passed by the time window.load comes into the picture, and at that point I do not need the loading icon to be shown anymore).
The downside is that a false message goes out to the user that the page is loading even when the request has not yet reached the server or is taking a long time. Also, it doesn't work for requests coming from outside my website. But I'm living with this for now.
Update: Sorry for not adding a code snippet earlier, thanks @blockhead. The following is a quick and dirty mix of normal JS and jQuery that I have in the master template.
Update 2: I later moved to making my view(s) lightweight, so that they send the crucial part of the page quickly, and then using ajax to get the remaining content while showing the loading icon. It needed quite some work, but the end result is worth it.
window.onload = function() {
  $("#load-icon").hide(); // I needed the loading icon to hide once the page loads
};
var onBeforeUnLoadEvent = false;
window.onunload = window.onbeforeunload = function() {
  if (!onBeforeUnLoadEvent) { // for avoiding dual calls in browsers that support both events
    onBeforeUnLoadEvent = true;
    $("#load-icon").show();
    setTimeout(function() {
      $("#load-icon").hide();
    }, 5000); // hiding the loading icon in any case after
              // 5 seconds (remove if you do not want it)
  }
};
P.S. I cannot comment yet hence posted this as an answer.
Iterating HttpResponse
https://stackoverflow.com/a/1371061/198062
Edit:
I found an example of sending big files with Django: http://djangosnippets.org/snippets/365/ Then I looked at the FileWrapper class (django.core.servers.basehttp):
class FileWrapper(object):
    """Wrapper to convert file-like objects to iterables"""
    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, 'close'):
            self.close = filelike.close

    def __getitem__(self, key):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise IndexError

    def __iter__(self):
        return self

    def next(self):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise StopIteration
I think we can make an iterable class like this
class FlushContent(object):
    def __init__(self):
        # some initialization code
        self.finished = False

    def __getitem__(self, key):
        # send a part of html
        pass

    def __iter__(self):
        return self

    def next(self):
        # do some work
        # return some html code
        if self.finished:
            raise StopIteration

then in views.py
def long_work(request):
    flushcontent = FlushContent()
    return HttpResponse(flushcontent)
Edit:
Example code, still not working:
class FlushContent(object):
    def __init__(self):
        self.stop_index = 2
        self.index = 0

    def __getitem__(self, key):
        pass

    def __iter__(self):
        return self

    def next(self):
        if self.index == 0:
            html = "loading"
        elif self.index == 1:
            import time
            time.sleep(5)
            html = "finished loading"
        self.index += 1
        if self.index > self.stop_index:
            raise StopIteration
        return html
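For comparison, the same two-chunk idea can be written as a generator, which sidesteps the hand-written iterator methods (Python 3 looks for __next__ rather than next). This is only a sketch: flush_content and its work_seconds parameter are hypothetical, and whether the browser actually shows the first chunk before the second arrives depends on the response class, the server and any buffering in between.

```python
import time


def flush_content(work_seconds=0.01):
    """Yield the response body piece by piece instead of building it at once."""
    yield "loading"
    time.sleep(work_seconds)      # placeholder for the long-running work
    yield "finished loading"
```

In a view, the generator could be handed to Django's StreamingHttpResponse, which accepts an iterator of chunks.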
Here is another explanation on how to get a loading message for long loading Django views
Views that do a lot of processing (e.g. complex queries with many objects, accessing 3rd party APIs) can take quite some time before the page is loaded and shown to the user in the browser. What happens is that all that processing is done on the server and Django is not able to serve the page before it is completed.
The only way to show a loading message (e.g. a spinner gif) during the processing is to break up the current view into two views:
The first view renders the page with no processing and with the loading message.
The page includes an AJAX call to the second view, which does the actual processing. The result of the processing is displayed on the page with AJAX / JavaScript once it's done.
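The split into two views can be sketched framework-free as two functions that return the response bodies. The function names, the /process/ URL, the spinner path and the summing "work" are all hypothetical placeholders; in Django each function would be a view returning an HttpResponse.

```python
import json


def loading_page(process_url="/process/"):
    """First view: returns immediately with a spinner and the AJAX call."""
    return (
        '<div id="main"><img src="/static/spinner.gif" alt="Loading..."></div>\n'
        "<script>\n"
        "fetch(%s)\n"
        "  .then(function (response) { return response.json(); })\n"
        "  .then(function (data) {\n"
        "    document.getElementById('main').innerHTML = data.result;\n"
        "  });\n"
        "</script>" % json.dumps(process_url)
    )


def process(numbers=(1, 2, 3)):
    """Second view: does the slow work and returns the result as JSON."""
    total = sum(numbers)          # placeholder for the real processing
    return json.dumps({"result": "<p>Total: %d</p>" % total})
```

The browser renders the first response right away, and the spinner is replaced once the second request completes.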