Placement of standalone script and using django_rq - django

I need to create a standalone script that accesses the database, fetches data from a table, processes it, and stores it into another table. I am also using django-rq to run this script.
Where should I place this script in the Django project structure?
Without using views.py, how should I run this script using django-rq?

If I understand your case correctly, I would use custom management commands: https://docs.djangoproject.com/en/1.7/howto/custom-management-commands/

In one of your views, import the script's functions and django-rq, and continue with your processing in the view.

I just worked through this same issue. One added wrinkle was that I wanted to run a job every hour, so in addition to django-rq I am also using rq-scheduler.
In this approach, you schedule a function call in the job you create:
scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # Time of first execution, in UTC
    func=func,                         # Function to be queued
    args=[arg1, arg2],                 # Positional arguments passed to the function
    kwargs={'foo': 'bar'},             # Keyword arguments passed to the function
    interval=60,                       # Seconds between invocations
    repeat=10,                         # Number of repeats (None means repeat forever)
)
I created a module my_helpers.py in the root of my Django project with functions that do the work I want and that schedule the task as needed. Then, in a separate shell (python manage.py shell), I import the helpers and run my function to schedule the task.
I hope that helps; it's working for me.

Related

How to load data outside Django view?

I have some Django views like 'first', 'second', and so on, and I want to load some data outside these views but use it inside them.
Here is an example to illustrate my idea.
# This function takes a long time to run, which is why I want to load
# the data only once, when the program starts.
fetched_data_from_postgres = load_data_from_postgres()  # returns a list of rows

def first(request):
    # ...and then I want to use that value in all my views
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("first VIEW")

def second(request):
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("second VIEW")

def third(request):
    do_something_with(fetched_data_from_postgres)
    return HttpResponse("third VIEW")
This approach works well when I run my Django project with python manage.py runserver, but when I run it with gunicorn or another WSGI server where I can specify a worker count, the variable is lost whenever the serving worker changes, and I have to refresh the page to land on a worker that has the data. It's ridiculous.
Or is there some other approach to do this job?
This approach has a problem even when you start the project locally with python manage.py runserver: it works only once, at startup. Everything at module level in any_views.py, outside your functions, is loaded a single time when the project starts, so you would have to restart the project every time you want to refresh the fetched_data_from_postgres variable.
A better approach is to create a script fetch_script.py, move your data-loading function into it, and call it inside views.py rather than at module level.
The best solution to avoid loading your data several times, and to prevent the fetched_data_from_postgres function from running more than once, is to use a cache framework alongside Django. Check the documentation for more details:
https://docs.djangoproject.com/en/stable/topics/cache/
It's relatively easy to set up, and it should address your problem perfectly. If it feels like overkill, then the question is: do you really need the speed? Are you sure you're not optimizing prematurely?
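Django's cache framework exposes cache.get_or_set(key, default, timeout), which implements exactly this compute-once pattern. The sketch below shows the pattern with a plain dict standing in for the cache backend (all names are illustrative), so the shape of the fix is visible without configuring a real backend:

```python
# Sketch of the get-or-set pattern behind Django's cache framework.
# In a real project, use django.core.cache.cache.get_or_set instead of
# this dict-backed stand-in.
_cache = {}

def get_or_set(key, compute):
    """Return the cached value for key, computing and storing it only once."""
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]

calls = []

def fetch_data_from_postgres():
    calls.append(1)           # track how often the expensive call actually runs
    return ["row1", "row2"]   # placeholder for the real query result

# Every view calls get_or_set; the expensive function runs only the first time.
first = get_or_set("postgres_data", fetch_data_from_postgres)
second = get_or_set("postgres_data", fetch_data_from_postgres)
```

With a shared backend such as Memcached or Redis configured in CACHES, the computed value also survives across gunicorn workers, which is precisely what a module-level variable cannot do.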

Delete row from database when date passes

In a database, I have a field called date. Is there a way to delete a row once the date passes, so that it doesn't show up anymore? I've tried comparing it to today's date in the view, but that wouldn't run every day, and people would still see the row on the first page load. Any ideas?
Removing something from your database is not safe for many reasons, from permissions to on_delete logic. If you are not sure that deletion is strictly required, just mark the row with active=False.
I would not recommend cron, since it is hard to maintain: you have to set up tasks on each environment manually, keep the files somewhere under your VCS, and work with bash instead of Python.
Also, when talking about events, I would not recommend storing scheduling logic like this in your database, since it is not controlled by VCS and is hard to maintain.
If your app is pretty simple, schedule is an option.
But if you are looking for some extra info like:
What rows were deleted?
Were there any exceptions?
you can move to the more complex Celery with Beat turned on. Extra dependencies (like Redis or RabbitMQ) are the main disadvantage.
Docs:
celery beat
Related:
How do I get a Cron like scheduler in Python?
I believe the best way would be to use a cron job, or to add a condition in the view so that only rows whose date has not yet passed are shown.
I would recommend using a MySQL event, since events run on a schedule, unlike triggers, which only fire on database operations. You want this to happen outside of anything in the application, purely based on time, so a MySQL event fits this scenario. See the full tutorial here: http://www.sitepoint.com/working-with-mysql-events/
I had an easier approach; I guess you could call it "hard-coded". I made a function called deleteevent with the following code:

from datetime import date, timedelta

def deleteevent():
    yesterday = date.today() - timedelta(days=1)
    # No need to check .count() first: .delete() on an empty queryset is a no-op.
    Events.objects.filter(event_date=yesterday).delete()

Then, in every other view I had, I called this at the beginning, so the event would be deleted before the page loaded.

How to detect when my Django object's DateTimeField reaches the current time

I'm using Django 1.5.5.
Say I have an object as such:
class Encounter(models.Model):
    date = models.DateTimeField(blank=True, null=True)
How can I detect when a given Encounter has reached the current time? I don't see how signals can help me here.
You can't detect it using Django alone.
You need a scheduler that checks every Encounter's date (for example, using a corresponding filter query) and performs the needed actions.
It can be a simple cron script. You can write it as a Django custom management command and have cron call it every 5 minutes, for example.
Or you can use Celery. With it, you can see worker status from the admin and do some other things.
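For example, assuming you wrote the check as a management command named check_encounters (a hypothetical name; the paths are placeholders for your own), the crontab entry running it every 5 minutes could look like this:

```
*/5 * * * * /path/to/venv/bin/python /path/to/project/manage.py check_encounters
```

Inside the command, a filter such as Encounter.objects.filter(date__lte=timezone.now()) would select the objects whose time has arrived.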
What you could do is use Celery. When you save an Encounter object, enqueue a task with an eta of the object's date, and it will execute only once the current time is reached.
There is one caveat, though: it might execute a bit later, depending on how busy the Celery workers are.

How to time Django queries

I've always used Python's timeit library to time my little Python programs.
Now I'm developing a Django app and I was wondering how to time my Django functions, especially queries.
For example, I have a def index(request) in my views.py which does a bunch of stuff when I load the index page.
How can I use timeit to time this particular function without altering too much my existing functions?
If your Django project is in debug mode, you can see your database queries (and their times) using:
>>> from django.db import connection
>>> connection.queries
I know this won't satisfy your need to profile functions, but I hope it helps with the queries part!
The Django Debug Toolbar is what you want; it helps you time each of your queries.
Alternatively, this snippet works too:
http://djangosnippets.org/snippets/93/
The best tool you can get is the Debug Toolbar; you also get some additional functionality for query optimization, which will help you optimize your DB queries.
Here is another solution: you can use connection.queries. It returns the SQL statements executed on the current connection so far, along with the time each one took. Afterwards, you can clear the list with reset_queries(); using reset_queries() is not mandatory.
Suppose you have a model named Device. You can measure the query time like this:
>>> from django.db import connection, reset_queries
>>> from appname.models import Device
>>> devices = list(Device.objects.all())  # list() forces the lazy queryset to execute
>>> connection.queries
>>> reset_queries()
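connection.queries is a list of dicts with 'sql' and 'time' keys, where 'time' is a string number of seconds, so totalling it is easy. A stdlib-only helper, with made-up sample data standing in for the real list:

```python
# connection.queries yields dicts shaped like the sample below;
# 'time' is reported as a string number of seconds.
def total_query_time(queries):
    """Sum the reported durations of a connection.queries-style list."""
    return sum(float(q["time"]) for q in queries)

# Sample data in the shape Django produces (the values are invented):
sample = [
    {"sql": "SELECT ... FROM appname_device", "time": "0.002"},
    {"sql": "SELECT COUNT(*) FROM appname_device", "time": "0.001"},
]

total = total_query_time(sample)  # about 0.003 seconds
```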
Anyone stumbling onto this, check out Sentry's approach:
https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/django/__init__.py#L476
You can replace a cursor's execute and executemany with your own functions that track how long execute takes to return.
A simple approach is to create a custom context manager that starts a timer and, on exit, appends the timer's final value to a list you pass in.
Then you can just check the list.
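A stdlib-only sketch of that idea (the names are mine, not from any library):

```python
import time
from contextlib import contextmanager

@contextmanager
def record_time(into):
    """Time the with-block and append the elapsed seconds to `into`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        into.append(time.perf_counter() - start)

timings = []
with record_time(timings):
    sum(range(100000))  # stand-in for the view code or queryset evaluation
```

The same wrapper can surround cursor.execute if you patch it, which is essentially what the Sentry integration above does.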

How to unittest Session timeout in Django

I have a requirement something like this:
As soon as the user signs up (and sits in a waiting state until he confirms his email address), a session variable is set, something like FIRST_TIME_FREE_SESSION_EXPIRY_AGE_KEY (sorry if the name sounds confusing!), which holds a datetime object eight hours ahead of the current time.
The effect on the user is that he gets 8 hours to use all the features of our site without confirming the email address he signed up with. After 8 hours, every view/page shows a big banner telling the user to confirm. (All of this is achieved with a single ensure_confirmed_user decorator applied to every view.)
I want to test this functionality using Django's unittest support (the TestCase class). How do I do it?
Update: do I need to manually change the session variable's value (from 8 hours down to a few seconds) to get this done, or is there a better way?
Update: this may sound insane, but I want to simulate a request from the future.
Generally, if unit testing is difficult because the product code depends on external resources that won't cooperate, you can abstract away those resources and replace them with dummies that do what you want.
In this case, the external resource is time. Instead of calling datetime.now() directly, refactor the code to accept an external time function, defaulting to datetime.now. Then in your unit tests you can change the time as the test progresses.
This is better than changing the session timeout to a few seconds, because even then, you have to sleep for a few seconds in the test to get the effect you want. Unit tests should run as fast as you can get them to, so that they will be run more often.
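A minimal sketch of that refactor (the function and variable names are illustrative, not from the original code): the expiry check takes a now function, so a test can hand it a "future" clock instead of sleeping.

```python
from datetime import datetime, timedelta

TRIAL_PERIOD = timedelta(hours=8)

def trial_expired(signup_time, now=datetime.now):
    """Return True once the 8-hour unconfirmed trial has elapsed.

    `now` defaults to the real clock; tests inject a fake one.
    """
    return now() - signup_time >= TRIAL_PERIOD

# In a test, simulate "a request from the future" without sleeping:
signup = datetime(2024, 1, 1, 12, 0)
future = lambda: datetime(2024, 1, 1, 21, 0)        # 9 hours after signup
still_inside = lambda: datetime(2024, 1, 1, 13, 0)  # only 1 hour after signup
```

The decorator mentioned in the question would call trial_expired with the session's stored datetime, and the test would pass its fake clock through.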
I can think of a couple of possibilities. During your test run, override the FIRST_TIME_FREE_SESSION_EXPIRY_AGE_KEY variable and set it to a smaller time limit. You can then wait until that limit is over and verify that the feature works as expected.
Alternatively, replace your own datetime functions (assuming your feature relies on datetime).
You can accomplish either by overriding the setup_test_environment and teardown_test_environment methods.
My settings.py differs slightly depending on whether Django runs in a production or a development environment. I have two settings modules: settings.py and settings_dev.py. The development version looks as follows:

from settings import *

DEBUG = True

INSTALLED_APPS = tuple(list(INSTALLED_APPS) + [
    'dev_app',
])
Now you can solve your problem in different ways:
add the variable with different values to both settings modules;
where you set the variable, choose between the two values according to the DEBUG setting. You can also use DEBUG to skip the unit test on the production server, because the test would probably take too long there anyway.
You can use the active settings module like this:

from django.conf import settings

if settings.DEBUG:
    ...