Disable caching for a view or url in django - django

In django, I wrote a view that simply returns a file, and now I am having problems because memcache is trying to cache that view, and in it's words, "TypeError: can't pickle file objects".
Since I actually do need to return files with this view (I've essentially made a file-based cache for this view), what I need to do is somehow make it so memcache can't or won't try to cache the view.
I figure this can be done in two ways. First, block the view from being cached (a decorator would make sense here), and second, block the URL from being cached.
Neither seems to be possible, and nobody else seems to have run into this problem, at least not on the public interwebs. Help?
Update: I've tried the #never_cache decorator, and even thought it was working, but while that sets the headers so other people won't cache things, my local machine still does.

from django.views.decorators.cache import never_cache
#never_cache
def myview(request):
# ...
Documentation is here...

Returning a real, actual file object from a view sounds like something is wrong. I can see returning the contents of a file, feeding those contents into an HttpResponse object. If I understand you correctly, you're caching the results of this view into a file. Something like this:
def myview(request):
file = open('somefile.txt','r')
return file # This isn't gonna work. You need to return an HttpRequest object.
I'm guessing that if you turned caching off entirely in settings.py, your "can't pickle a file object" would turn into a "view must return an http response object."
If I'm on the right track with what's going on, then here are a couple of ideas.
You mentioned you're making a file-based cache for this one view. You sure you want to do that instead of just using memcached?
If you really do want a file, then do something like:
def myview(request):
file = open('somefile.txt','r')
contents = file.read()
resp = HttpRespnse()
resp.write(contents)
file.close()
return resp
That will solve your "cannot pickle a file" problem.

You probably did a per site cache, but what you want to do now is a per view cache. The first one is easier to implement, but is only meant for the case of 'just caching everything'. Because you want to choose for every view now, just switch to the fine grained approach. It is also very easy to use, but remember that sometimes you need to create a second view with the same contents, if you want to have the result sometimes cached and sometimes not, depending on the url.
So far to the answer to your question. But is that an answer to your problem? Why do you return files in a view? Normally static files like videos, pictures, css, flash games or whatever should be handled by the server itself (or even by a different server). And I guess, that is what you want to do in that view. Is that correct? The reason for not letting django do this is, because starting django and letting django do its thing also eats a lot of resoruces and time. You don't feel that, when you are the only user in your test environment. But when you want to scale to some thousand users or more, then this kind of stuff becomes very nasty. Also from a logical point of view it does not seem smart, to let a program handle files without changing them, when the normal job of the program is to generate or change HTML according to a state of your data and a user-request. It's like letting your accountant do the programming work. While he might be able to do it, you probably want somebody else do it and let the accountant take care of your books.

Related

How to create separate python script for uploading data into ndb

Can anyone guide me towards the right direction as to where I should place a script solely for loading data into ndb. As I wish to upload all data into the gae ndb so that the application could perform query on it.
Right now, the loading of data is in my application. I wish to placed it separately from the main application.
Should it be edited in the yaml file?
EDITED
This is a snippet of the entity and the handler to upload the data into GAE ndb.
I wish to placed this chunk of code separately from my main application .py. Reason being the uploading of this data won't be done frequently and to keep the codes in the main application "cleaner".
class TagTrend_refine(ndb.Model):
tag = ndb.StringProperty()
trendData = ndb.BlobProperty(compressed=True)
class MigrateData(webapp2.RequestHandler):
def get(self):
listOfEntities = []
f = open("tagTrend_refine.txt")
lines = f.readlines()
f.close()
for line in lines:
temp = line.strip().split("\t")
data = TagTrend_refine(
tag = temp[0],
trendData = temp[1]
)
listOfEntities.append(data)
ndb.put_multi(listOfEntities)
For example if I placed the above code in a file called dataLoader.py, where should I call this script to invoke?
In app.yaml alongside my main application(knowledgeGraph.application)?
- url: /.*
script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledge.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
So the class TagTrend_refine should be in a separate file (usually called models.py) so that both your dataloader.py, and your knowledge.py, can import models to access it (and models.py will need its own import of ndb of course). (Then of course access to the entity class will be as models.TagTrend_refine -- very basic Python).
Next, you'll complete dataloader.py by defining a WSGI app, e.g, at end of file,
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
In app.yaml, as the first URL, before that /.*, you'll have:
url: /migratedata
script: dataloader.app
Given all this, when you visit '/migratedata', your handler will read the "tagTrend_refine.txt" file that you uploaded together with your .py, .yaml, and so on, files in your overall GAE application, and unconditionally create one entity per line of that file (assuming you fix the multiple indentation problems in your code as displayed above, but, again, this is just super-elementary Python -- presumably you've used both tabs and spaces and they show up OK in your editor, but not here on SO... I recommend you use strictly, only spaces, never tabs, in Python code).
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do and why you'd like to do something different. For POST vs GET in particular, just study When should I use GET or POST method? What's the difference between them? ...
Alex's solution will work, as long as all you data can be loaded in under 1 minute, as that's the timeout for an app engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.

Flask/Python calling def in approute

This might be a stupid question, but I am new to coding with Python.
Using flask and calling #app.route(), i need to create several HTML files.
Rather then coding everything inside of #app.route(), can i call different def inside the app.route before returning the render_template?
Edit:
So i am creating HTML files. Rather then opening 3-4 different documents inside the app.route and printing lines to them, can I create 3-4 functions in the main code to handle each document.
so rather then:
#app.route('/')
Print all html files
have:
def html1():
write html files
#app.route('/')
html1()
render_template
Yes, you can do whatever processing you need to inside a route before finally returning the response to the client. Be aware of load times, however - people don't like waiting long for a page to load.
That being said, you should really take a look at the template tutorial of the official docs. Your approach sounds like a perfect place to make use of templates as Jinja is doing what you describe, but in real-time: taking the HTML file, replacing certain places with your data, and presenting that to the user.

Removing query string from url in django while keeping GET information

I am working on a Django setup where I can receive a url containining a query string as part of a GET. I would like to be able to process the data provided in the query string and return a page that is adjusted for that data but does not contain the query string in the URL.
Ordinarily I would just use reverse(), but I am not sure how to apply it in this case. Here are the details of the situation:
Example URL: .../test/123/?list_options=1&list_options=2&list_options=3
urls.py
urlpatterns = patterns('',
url(r'test/(P<testrun_id>\d+)/'), views.testrun, name='testrun')
)
views.py
def testrun(request, testrun_id):
if 'list_options' in request.GET.keys():
lopt = request.GET.getlist('list_options')
:
:
[process lopt list]
:
:
:
:
[other processing]
:
:
context = { ...stuff... }
return render(request, 'test_tracker/testview.html', context)
When the example URL is processed, Django will return the page I want but with the URL still containing the query string on the end. The standard way of stripping off the unwanted query string would be to return the testrun function with return HttpResponseRedirect(reverse('testrun', args=(testrun_id,))). However, if I do that here then I'm going to get an infinite loop through the testrun function. Furthermore, I am unsure if the list_options data that was on the original request will still be available after the redirect given that it has been removed from the URL.
How should I work around this? I can see that it might make sense to move the parsing of the list_options variable out into a separate function to avoid the infinite recursion, but I'm afraid that it will lose me the list_options data from the request if I do it that way. Is there a neat way of simultaneously lopping the query string off the end of the URL and returning the page I want in one place so I can avoid having separate things out into multiple functions?
EDIT: A little bit of extra background, since there have been a couple of "Why would you want to do this?" queries.
The website I'm designing is to report on the results of various tests of the software I'm working on. This particular page is for reporting on the results of a single test, and often I will link to it from a bigger list of tests.
The list_options array is a way of specifying the other tests in the list I have just come from. This allows me to populate a drop-down menu with other relevant tests to allow me to easily switch between them.
As such, I could easily end up passing in 15-20 different values and creating huge URLs, which I'd like to avoid. The page is designed to have a default set of other tests to fill in the menu in question if I don't suggest any others in the URL, so it's not a big deal if I remove the list_options. If the user wishes to come back to the page directly he won't care about the other tests in the list, so it's not a problem if that information is not available.
First a word of caution. This is probably not a good idea to do for various reasons:
Bookmarking. Imagine that .../link?q=bar&order=foo will filter some search results and also sort the results in particular order. If you will automatically strip out the querystring, then you will effectively disallow users to bookmark specific search queries.
Tests. Any time you add any automation, things can and will probably go wrong in ways you never imagined. It is always better to stick with simple yet effective approaches since they are widely used thus are less error-prone. Ill give an example for this below.
Maintenance. This is not a standard behavior model therefore this will make maintenance harder for future developers since first they will have to understand first what is going on.
If you still want to achieve this, one of the simplest methods is to use sessions. The idea is that when there is a querystring, you save its contents into a session and then you retrieve it later on when there is no querystring. For example:
def testrun(request, testrun_id):
# save the get data
if request.META['QUERY_STRING']:
request.session['testrun_get'] = request.GET
# the following will not have querystring hence no infinite loop
return HttpResponseRedirect(reverse('testrun', args=(testrun_id,)))
# there is no querystring so retreive it from session
# however someone could visit the url without the querystring
# without visiting the querystring version first hence
# you have to test for it
get_data = request.session.get('testrun_get', None)
if get_data:
if 'list_options' in get_data.keys():
...
else:
# do some default option
...
context = { ...stuff... }
return render(request, 'test_tracker/testview.html', context)
That should work however it can break rather easily and there is no way to easily fix it. This should illustrate the second bullet from above. For example, imagine a user wants to compare two search queries side-by-side. So he will try to visit .../link?q=bar&order=foo and `.../link?q=cat&order=dog in different tabs of the same browser. So far so good because each page will open correct results however as soon as the user will try to refresh the first opened tab, he will get results from the second tab since that is what is currently stored in the session and because browser will have a single session token for both tabs.
Even if you will find some other method to achieve what you want without using sessions, I imagine that you will encounter similar issues because HTTP is stateless hence you will have to store state on the server.
There is actually a way to do this without breaking much of the functionality - store state on client instead of server-side. So you will have a url without a querystring and then let javascript query some API for whatever you will need to display on that page. That however will force you to make some sort of API and use some javascript which does not exactly fall into the scope of your question. So it is possible to do cleanly however that will involve more than just using Django.

django cache conditional view processing latest_entry

I have a model called aps
class aps(model.Model):
u=models.ForeignKey(User, related_name='u')
when=models.DateTimeField() #datetime when row was inserted
a=models.ForeignKey(User, related_name='a')
a_read=models.BooleanField()
a_last_read=models.DateTimeField()
The records from aps are retrieved like this:
def displayAps(request, name)
apz=aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
return render_to_response(template.html, {'apz':apz})
And further they are shown in template.html ...
What I am trying to achieve is actually what django docz mention here about conditional view processing
I do something like:
def latest_entry(request, name):
return aps.objects.filter().latest('when').when
#cache_page(60*15)
#last_modified(latest_entry)
def displayAps(request, name)
apz=aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
return render_to_response(template.html, {'apz':apz})
If new rows are added the content is not modified, but retrieved from cache. I need to delete the cache files and 'shift refresh' the browser to see the new rows.
Anyone sees what I am doing wrong here?
If you take out the cache_page decorator, does it work?
The cache_page decorator is the first piece of code that actually gets called in your view function. It is checking the timestamp, and returning the cached data, as it is supposed to. If the cache hasn't expired, the last_modified decorator won't ever be called.
You probably want to be more careful about mixing conditional response processing and static caching, anyway. They accomplish similar things, but using very different mechanisms.
cache_page tells django to only use the view to render an actual response every n seconds. If another request comes in before that, the same rendered content will be returned to the client -- whether it is actually stale or not. This reduces your server load, but doesn't do anything to reduce your bandwidth.
last_modified handles the case where the client says "I have a version of this page that is this old; is it still good?" In that case, your server can check the database, and return a very short "It's still good" response if the database hasn't changed. This cuts down your bandwidth needs significantly for those cases, but you still need to go to the database to determine whether the client's cache is stale or not, so your server load may be almost the same.
Like I mentioned above, you can't just apply cache_page before last_modfied -- if the database has changed, cache_page won't know about it. Worse, if the cache timeout has expired, but the database hasn't changed, then you might end up caching the "304 Not modified" message, and sending that for all subsequent visitors for the next fifteen minutes.
You could apply the decorators in the other order, but you have to make a request from the database for each request, and you could still get into a situation where the database has changed, but the cache hasn't expired -- in that case, the client could still be getting the old version of the page, even though the server has already hit the database to determine that it has been updated.
The filter depends on some logged user or something, you should id this user in cookies and use django's vary_on_cookie.

DJANGO persistant site wide memory

I am new to Django, and probably using it in a way thats not normal.
That said, I would like to find a way to have site wide memory.
To Explain.
I have a very simple setup where one compter will make posts to the site every few seconds.
I want this data to be saved off somewhere.
I want everyone who is viewing the webpage to see updates based on this data in near real time via some javascript.
So using the sample code below.
Computer A would do a post to set_data and set data to "data set"
Computer B,C,D,etc.... would then do a get to get_data and see "data set"
Unfortunatly B,C,D just see ""
I have a feeling what i need is memcached, but I am on a hostgator shared server and cannot install that. In the meantime I am just writing them to files. This works but is really inneficient, and I am hopeing to serve a large user base.
Thanks for any help.
#view.py
data=""
def set_data(request):
data = request.POST['data']
return HttpResponse("");
def get_data(request):
return HttpResponse(data);
memcached is lossy, hence doesn't fulfil "persistent".
Files are fine, but switch to accessing them via mmap.
Persistent storage is also called database (although for some cases Django's cache backend might work as well). Don't ever try to use global variables in web development.
Whether you should use a Django model or the cache backend really depends on your use case, but you just described a contrived example (or does your web app consist of a getter and a setter?).