How to load data outside a Django view?

Suppose I have some Django views like 'first', 'second', and so on, and I want to load some data outside these views but use it inside them.
Here is an example to illustrate my idea.
from django.http import HttpResponse

# The function's execution time is long, which is why I want to load the data
# only once, when the program starts.
# load_data_from_postgres() stands for some function which loads data from
# Postgres and returns a list with the Postgres data.
fetched_data_from_postgres = load_data_from_postgres()

def first(request):
    # ...and then I want to use that previously loaded value in all my views
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("first VIEW")

def second(request):
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("second VIEW")

def third(request):
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("third VIEW")
This approach works well when I run my Django project with python manage.py runserver, but when I run it with Gunicorn or another WSGI server where I can specify a worker count, the variable is lost whenever a request lands on a different worker, and I have to keep refreshing the page until I hit the worker that already has the data. It's ridiculous.
Or maybe there is some other approach to do this job?

This approach has a problem even when you start the project locally with python manage.py runserver: the data is loaded only once, when the project starts. Everything at module level in views.py is executed a single time at startup, except the bodies of your functions, so you would have to restart the project every time you want to refresh your fetched_data_from_postgres variable.
A better approach is to create a script fetch_script.py, move your data-loading function (the one that loads data from Postgres and returns a list) into it, and call it inside your views in views.py, not at module level.
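A minimal sketch of that layout (the SQL and the table name are placeholders, not from the original question):

# fetch_script.py
from django.db import connection

def load_data_from_postgres():
    # the slow query lives here; it runs only when a view actually calls it
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_table")  # hypothetical table
        return cursor.fetchall()

# views.py
from django.http import HttpResponse
from fetch_script import load_data_from_postgres

def first(request):
    fetched_data_from_postgres = load_data_from_postgres()
    return HttpResponse("first VIEW")

This keeps the data fresh on every request, at the cost of re-running the query each time.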

The best solution to avoid loading your data several times, and to prevent the function behind fetched_data_from_postgres from running more than once, is to use a cache framework alongside Django. Check the documentation for more details:
https://docs.djangoproject.com/en/stable/topics/cache/
It's relatively easy to set up and it should address your problem perfectly. If it feels like overkill, then the question is: do you really need the speed? Are you sure you're not trying to optimize prematurely?
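A rough sketch of what that could look like with Django's low-level cache API (the cache key and the loader function are placeholders carried over from the question's pseudocode):

from django.core.cache import cache

def get_postgres_data():
    data = cache.get('postgres_data')  # 'postgres_data' is an arbitrary cache key
    if data is None:
        data = load_data_from_postgres()  # your slow loading function
        cache.set('postgres_data', data, timeout=3600)  # keep it for an hour
    return data

With a shared cache backend such as Memcached or Redis, every Gunicorn worker sees the same cached value, so it no longer matters which worker serves the request.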

Related

How to create separate python script for uploading data into ndb

Can anyone guide me in the right direction as to where I should place a script solely for loading data into ndb? I wish to upload all the data into the GAE ndb so that the application can perform queries on it.
Right now, the loading of data is done in my application. I wish to place it separately from the main application.
Should it be edited in the yaml file?
EDITED
This is a snippet of the entity and the handler to upload the data into GAE ndb.
I wish to place this chunk of code separately from my main application .py, because the uploading of this data won't be done frequently and I want to keep the code in the main application "cleaner".
class TagTrend_refine(ndb.Model):
    tag = ndb.StringProperty()
    trendData = ndb.BlobProperty(compressed=True)

class MigrateData(webapp2.RequestHandler):
    def get(self):
        listOfEntities = []
        f = open("tagTrend_refine.txt")
        lines = f.readlines()
        f.close()
        for line in lines:
            temp = line.strip().split("\t")
            data = TagTrend_refine(
                tag=temp[0],
                trendData=temp[1]
            )
            listOfEntities.append(data)
        ndb.put_multi(listOfEntities)
For example, if I placed the above code in a file called dataLoader.py, from where should I invoke this script?
In app.yaml, alongside my main application (knowledgeGraph.application)?
- url: /.*
  script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledgeGraph.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
The class TagTrend_refine should live in a separate file (usually called models.py) so that both your dataloader.py and your knowledgeGraph.py can import models to access it (and models.py will need its own import of ndb, of course). Then access to the entity class will be as models.TagTrend_refine -- very basic Python.
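For instance, a minimal models.py along those lines (reusing the entity definition from the question) would be:

# models.py
from google.appengine.ext import ndb

class TagTrend_refine(ndb.Model):
    tag = ndb.StringProperty()
    trendData = ndb.BlobProperty(compressed=True)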
Next, you'll complete dataloader.py by defining a WSGI app, e.g., at the end of the file:
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
In app.yaml, as the first URL, before that /.*, you'll have:
- url: /migratedata
  script: dataloader.app
Given all this, when you visit /migratedata, your handler will read the tagTrend_refine.txt file that you uploaded together with your .py, .yaml, and other files in your overall GAE application, and unconditionally create one entity per line of that file. (One aside: the code as originally posted had multiple indentation problems, presumably from mixing tabs and spaces, which lined up in your editor but not here on SO. I recommend using strictly spaces, never tabs, in Python code.)
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do and why you'd like to do something different. For POST vs GET in particular, just study When should I use GET or POST method? What's the difference between them? ...
Alex's solution will work as long as all your data can be loaded in under 1 minute, as that's the timeout for an App Engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.

Delete row from database when date passes

In a database, I have a field called date. Is there a way to delete a row when the date passes, so that it doesn't show up anymore? I've tried comparing it to today's date in the view, but this wouldn't happen every day, and people would still see it on the first page load. Any ideas?
Removing something from your database is not safe, for many reasons, ranging from permissions to on_delete logic. If you are not sure that deleting is absolutely required, just mark the row with active=False instead.
I would not recommend cron, since it is hard to maintain: you have to set up tasks on each environment manually, keep the crontab files somewhere in your VCS, and work with bash instead of Python.
Also, when it comes to database events, I would not recommend storing logic like this in your database, since it is not controlled by your VCS and is hard to maintain.
If your app is pretty simple, schedule is an option.
But if you are looking for some extra info like:
What rows were deleted?
Were there any exceptions?
You can move to the more complex Celery with Beat turned on. Extra dependencies (like Redis or RabbitMQ) are the main disadvantage.
Docs:
celery beat
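For illustration, here is a minimal Celery beat sketch of such a cleanup task (the module path, the schedule key, and the Events model with an event_date field are all assumptions, not from the original question):

# myapp/tasks.py -- assumes a configured Celery app that auto-discovers tasks
from datetime import date
from celery import shared_task
from myapp.models import Events  # hypothetical app and model

@shared_task
def delete_past_events():
    # remove every row whose date has already passed
    Events.objects.filter(event_date__lt=date.today()).delete()

# settings.py -- assumes the Django/Celery integration with the CELERY_ namespace
CELERY_BEAT_SCHEDULE = {
    'delete-past-events-daily': {
        'task': 'myapp.tasks.delete_past_events',
        'schedule': 60 * 60 * 24,  # once a day, in seconds
    },
}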
Related:
How do I get a Cron like scheduler in Python?
I believe the best way would be to use a cron job, or to use an additional condition in the view to show only rows whose date has not yet passed.
I would recommend you use a MySQL event, since this runs on a schedule, unlike triggers, which fire only on database operations. You want this to occur outside of anything happening in the application, based purely on time, so a MySQL event will work for this scenario. See the full tutorial here: http://www.sitepoint.com/working-with-mysql-events/
I had an easier approach; I guess you could call it "hard-coded". I made a function called deleteevent with the following code:
from datetime import date, timedelta

def deleteevent():
    yesterday = date.today() - timedelta(1)
    if Events.objects.filter(event_date=yesterday).count():
        Events.objects.filter(event_date=yesterday).delete()
Then, in every other view I had, I called this at the beginning, so the event would be deleted before the page loaded.

Placement of standalone script and using django_rq

I need to create a standalone script which accesses the database, fetches data from a table, processes it, and stores it in another table. I am also using django-rq to run this script.
Where should I place this script in the Django project structure?
Without using views.py, how should I run this script using django-rq?
If I understand your case correctly, I would use a custom management command: https://docs.djangoproject.com/en/1.7/howto/custom-management-commands/
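A minimal sketch of such a command (the file path, model names, and processing step are all illustrative, not from the original question):

# myapp/management/commands/process_table.py -- hypothetical path and names
from django.core.management.base import BaseCommand
from myapp.models import Source, Destination  # hypothetical models

class Command(BaseCommand):
    help = "Fetch rows from one table, process them, and store them in another"

    def handle(self, *args, **options):
        for row in Source.objects.all():
            # stand-in for the real processing step
            Destination.objects.create(value=row.value.upper())
        self.stdout.write("done")

You can then run it with python manage.py process_table, or enqueue the same logic as a django-rq job.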
In one of your views, import the script's functions and django-rq, and continue with your processing in the view.
I just worked through this same issue. One added wrinkle was that I wanted to run a job every hour, so in addition to Django-RQ I am also using RQ Scheduler.
In this approach, you schedule a function call in the job you create:
scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # Time for first execution, in UTC timezone
    func=func,                         # Function to be queued
    args=[arg1, arg2],                 # Arguments passed into function when executed
    kwargs={'foo': 'bar'},             # Keyword arguments passed into function when executed
    interval=60,                       # Time before the function is called again, in seconds
    repeat=10                          # Repeat this number of times (None means repeat forever)
)
I created a module my_helpers.py in the root of my Django project which has functions that do the work I want and schedule the task as needed. Then, in a separate shell (python manage.py shell), I import the helpers and run my function to schedule the task.
I hope that helps; it's working for me.
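As a sketch of that setup (the module layout follows the description above; the job body and queue name are placeholders):

# my_helpers.py -- at the root of the Django project, per the description above
from datetime import datetime
import django_rq

def my_job():
    print("doing the hourly work")  # the actual work goes here

def schedule_hourly_job():
    scheduler = django_rq.get_scheduler('default')  # requires rq-scheduler to be installed
    scheduler.schedule(
        scheduled_time=datetime.utcnow(),  # first run: now
        func=my_job,
        interval=3600,  # run every hour, in seconds
        repeat=None,    # repeat forever
    )

Then, inside python manage.py shell: from my_helpers import schedule_hourly_job; schedule_hourly_job().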

Django doesn't read from database – no error

I just set up the environment for an existing Django project, on a new Mac. I know for certain there is nothing wrong with the code itself (just cloned the repo), but for some reason, Django can't seem to retrieve data from the database.
I know the correct tables and data are in the db.
I know the codebase is as it should be.
I can make queries using the Django shell.
Django doesn't throw any errors despite the data missing on the web page.
I realize that it's hard to debug this without further information, but I would really appreciate a finger pointing me in the right direction. I can't seem to find any useful logs.
EDIT:
I just realized the problem lies elsewhere. Unfortunately I can't delete this post with the bounty still open.
Without seeing any code, I can only offer some general advice that might help you debug your problem. Please add a link to your repository if you can, or some snippets of your database settings, the view which makes the database queries, etc.
Debugging the view
The first thing I would recommend is using the Python debugger inside the view which queries the database. If you've not used pdb before, it's a life saver: it allows you to set breakpoints in your Python script and then interactively execute code inside the interpreter.
import pdb; pdb.set_trace()  # place this line inside the view that queries the database
# execution pauses here; inspect the results of your queries at the (Pdb) prompt
If you are using the Django ORM, the QuerySet returned from the query should have all the data you expect.
If it doesn't then you need to look into your database configuration in settings.py.
If it does, then you might not be returning that object to the template. Unlikely, as you said the code is the same, but double-check the objects you pass to your HttpResponse object.
Debugging the database settings
If you can query the database from the Django shell using the project settings in settings.py, it sounds unlikely that there is a problem here - but, like everything, double-check.
You said that you've set up the project on a new Mac. Was it on a different operating system before? Maybe there is a problem with the paths now - to make your project platform-independent, remember to use the os.path.join() method when working with file paths.
And what about the username and password details....
Debugging the template
Maybe your template is referencing the wrong variable name or object attribute. You mentioned that
Django doesn't throw any errors despite the data missing on the web page.
This doesn't really tell us much - to quote the Django docs:
If you use a variable that doesn’t exist, the template system will insert the value of the TEMPLATE_STRING_IF_INVALID setting, which is set to '' (the empty string) by default.
So to check all the variables available to your template, you could use the debug template tag:
{% debug %}
Probably even better, though, is to use django-debug-toolbar - this will also let you examine the SQL queries your view is making.
Missing Modules
I would expect this to raise an exception if it were the problem, but have you checked that you have the psycopg2 module on your new machine?

Storing important singular values in Django

So I'm working on a website where there are a couple important values that get used in various places throughout the site. For example, certain important dates, like the start and end dates for registration.
One way I can do this is by making a model that stores these values, but that seems like overkill (since I'd only have one instance). Another way is to store these values in the settings.py file, but if I wanted to change them, it seems like I would need to restart the web server for them to take effect. I was wondering what the best practice in Django would be to handle this kind of thing.
You can store them in settings.py. While there is nothing wrong with this (you can even organize your settings into multiple different files if you have too many custom settings), you're right that you cannot change them at runtime.
We were solving the same problem where I work and came up with a simple app called django-constance (you can get it from GitHub at https://github.com/comoga/django-constance). It lets you store your settings in settings.py, and once you need them to be configurable at runtime, you can switch to a Redis data store with a Django admin frontend. You can even use the value from settings as the default. I suggest you try this app out.
The changes to your code are pretty minimal. As pasted from the docs, you initialize your dynamic settings like this:
CONSTANCE_CONFIG = {
    'MY_SETTINGS_KEY': (42, 'the answer to everything'),
}
And then, instead of importing settings from django.conf, you do this:
from constance import config

if config.MY_SETTINGS_KEY == 42:
    answer_the_question()
If you want a specific set of variables available to all of your templates, what you are looking for is context processors.
http://docs.djangoproject.com/en/dev/ref/templates/api/#writing-your-own-context-processors
More links
http://www.b-list.org/weblog/2006/jun/14/django-tips-template-context-processors/
http://blog.madpython.com/2010/04/07/django-context-processors-best-practice/
The code for your context processors can live anywhere in your project. You just have to register it in your settings.py under:
TEMPLATE_CONTEXT_PROCESSORS =
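For example, a minimal context processor exposing the kind of registration dates mentioned in the question (module, function, and variable names are illustrative) could look like this:

# myapp/context_processors.py -- hypothetical module
from datetime import date

def important_dates(request):
    # every template rendered with a RequestContext will see these variables
    return {
        'registration_start': date(2024, 1, 1),
        'registration_end': date(2024, 2, 1),
    }

Then add 'myapp.context_processors.important_dates' to TEMPLATE_CONTEXT_PROCESSORS.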
You could keep your constants defined in settings.py, or even in a separate constants.py, and just
from constants import *
However, as you mentioned, you would need to reload your server each time the settings are updated. I think you first need to figure out how often you will be changing these settings. Is it worth the extra effort to be able to reload the settings automatically?
If you wanted the settings to take effect automatically each time they are updated, you could do the following:
Store the settings in the DB
Upon save/change, write the output to a file
Have settings.py / constants.py read that file
Reload the server
In addition, have a look at the Mezzanine project, which allows you to update settings from the Django admin interface and will reload them as well.
See: http://mezzanine.jupo.org/docs/configuration.html
If the variables you need will be updated infrequently, I suggest just storing them in settings.py and adding a custom context processor.
If you are using source control such as Git, updating will be quite easy: you can just update the file and push to your server. For really simple reloading of the server, you could also create a post-receive hook for Git that automatically reloads the server when new code is pushed.
I would only suggest the other option if you are updating settings fairly regularly.