Django: Getting Django-cron Running

I am trying to get Django-cron running, but it seems to only run after hitting the site once. I am using Virtualenv.
Any ideas why it only runs once?
On my PATH, I added the location of django_cron: '/Users/emilepetrone/Workspace/zapgeo/zapgeo/django_cron'
My cron.py file within my Django app:
from django_cron import cronScheduler, Job
from products.views import index

class GetProducts(Job):
    run_every = 60

    def job(self):
        index()

cronScheduler.register(GetProducts)

class GetLocation(Job):
    run_every = 60

    def job(self):
        index()

cronScheduler.register(GetLocation)

The first possible reason
There is a variable in django_cron/base.py:
# how often to check if jobs are ready to be run (in seconds)
# in reality if you have a multithreaded server, it may get checked
# more often than this number suggests, so keep an eye on it...
# default value: 300 seconds == 5 min
polling_frequency = getattr(settings, "CRON_POLLING_FREQUENCY", 300)
So the minimal interval at which django_cron checks whether it is time to start your task is polling_frequency. You can change it by adding the following to your project's settings.py:
CRON_POLLING_FREQUENCY = 100 # use your custom value in seconds here
To start a job, hit your server at least once after starting the Django web server.
The second possible reason
Your job raised an error and is no longer queued: when a job raises an exception, the 'queued' flag is set to 'f' in the 'django_cron_job' table. You can check this with the query:
select queued from django_cron_job;
Even if you change the code of your job, the field may stay 'f'. So once you have corrected the error in your job, you should manually set the 'queued' field back to 't'. Alternatively, the 'executing' flag in the 'django_cron_cron' table may be stuck at 't', which means your application server was stopped while the task was in progress; in that case you should manually reset it to 'f'.
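A quick way to reset those flags is a couple of raw UPDATE statements run from a Django shell; this is only a sketch, with the table and column names taken from the description above:
from django.db import connection

with connection.cursor() as cursor:
    # re-queue jobs that were switched off after raising an exception
    cursor.execute("UPDATE django_cron_job SET queued = 't' WHERE queued = 'f';")
    # clear the 'executing' flag left behind by a server stopped mid-run
    cursor.execute("UPDATE django_cron_cron SET executing = 'f' WHERE executing = 't';")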

Related

Django: Script that executes many queries runs massively slower when executed from Admin view than when executed from shell

I have a script that loops through the rows of an external csv file (about 12,000 rows) and executes a single Model.objects.get() query to retrieve each item from the database (final product will be much more complicated but right now it's stripped down to the barest functionality possible to try to figure this out).
For right now the path to the local csv file is hardcoded into the script. When I run the script through the shell using py manage.py runscript update_products_from_csv it runs in about 6 seconds.
The ultimate goal is to be able to upload the csv through the admin and then have the script run from there. I've already been able to accomplish that, but the runtime when I do it that way takes more like 160 seconds. The view for that in the admin looks like...
from .scripts import update_products_from_csv

class CsvUploadForm(forms.Form):
    csv_file = forms.FileField(label='Upload CSV')

@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
    # list_display, list_filter, fieldsets, etc

    def changelist_view(self, request, extra_context=None):
        extra_context = extra_context or {}
        extra_context['csv_upload_form'] = CsvUploadForm()
        return super().changelist_view(request, extra_context=extra_context)

    def get_urls(self):
        urls = super().get_urls()
        new_urls = [path('upload-csv/', self.upload_csv),]
        return new_urls + urls

    def upload_csv(self, request):
        if request.method == 'POST':
            # csv_file = request.FILES['csv_file'].file
            # result_string = update_products_from_csv.run(csv_file)
            # I commented out the above two lines and added the below line to rule out
            # the possibility that the csv upload itself was the problem. Whether I execute
            # the script using the uploaded file or let it use the hardcoded local path,
            # the results are the same. It works, but takes more than 20 times longer
            # than executing the same script from the shell.
            result_string = update_products_from_csv.run()
            print(result_string)
            messages.success(request, result_string)
        return HttpResponseRedirect(reverse('admin:products_product_changelist'))
Right now the actual running parts of the script are about as simple as this...
import csv
from time import time

from apps.products.models import Product

CSV_PATH = 'path/to/local/csv_file.csv'

def run():
    csv_data = get_csv_data()
    update_data = build_update_data(csv_data)
    update_handler(update_data)
    return 'Finished'

def get_csv_data():
    with open(CSV_PATH, 'r') as f:
        return [d for d in csv.DictReader(f)]

def build_update_data(csv_data):
    update_data = []
    # Code that loops through csv data, applies some custom logic, and builds a list of
    # dicts with the data cleaned and formatted as needed
    return update_data

def update_handler(update_data):
    query_times = []
    for upd in update_data:
        iter_start = time()
        product_obj = Product.objects.get(external_id=upd['external_id'])
        # external_id is not the primary key but is an indexed field in the Product model
        query_times.append(time() - iter_start)
    # Code to export query_times to an external file for analysis
update_handler() has a bunch of other code checking field values to see if anything needs to be changed, and building the objects when a match does not exist, but that's all commented out right now. As you can see, I'm also timing each query and logging those values. (I've been dropping time() calls in various places all day and have determined that the query is the only part that's noticeably different.)
When I run it from the shell, the average query time is 0.0005 seconds and the total of all query times comes out to about 6.8 seconds every single time.
When I run it through the admin view and then check the queries in Django Debug Toolbar it shows the 12,000+ queries as expected, and shows a total query time of only about 3900ms. But when I look at the log of query times gathered by the time() calls, the average query time is 0.013 seconds (26 times longer than when I run it through the shell), and the total of all query times always comes out at 156-157 seconds.
The queries in Django Debug Toolbar when I run it through the admin all look like SELECT ••• FROM "products_product" WHERE "products_product"."external_id" = 10 LIMIT 21, and according to the toolbar they are mostly all 0-1ms. I'm not sure how I would check what the queries look like when running it from the shell, but I can't imagine they'd be different? I couldn't find anything in django-extensions runscript docs about it doing query optimizations or anything like that.
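One way to inspect the shell-side queries (a hedged sketch: connection.queries only records SQL when DEBUG=True, and the external_id value here is just an example) is:
from django.db import connection, reset_queries

reset_queries()
product = Product.objects.get(external_id=10)  # any external_id present in the CSV
print(connection.queries[-1]['sql'])   # the raw SQL string that was executed
print(connection.queries[-1]['time'])  # execution time reported by the backend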
One additional interesting facet is that when running it from the admin, from the time I see result_string print in the terminal, it's another solid 1-3 minutes before the success message appears in the browser window.
I don't know what else to check. I'm obviously missing something fundamental, but I don't know what.
Somebody on Reddit suggested that running the script from the shell might be automatically spinning up a new thread where the logic can run unencumbered by the other Django server processes, and this seems to be the answer. If I run the script in a new thread from the admin view, it runs just as fast as it does when I run it from the shell.
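A minimal sketch of that approach, assuming the same upload_csv view and imports as above (the thread handling here is illustrative, not the asker's exact code):
import threading

def upload_csv(self, request):
    if request.method == 'POST':
        # run the import in its own thread so it does not compete with the
        # request/response cycle that is serving the admin view
        thread = threading.Thread(target=update_products_from_csv.run, daemon=True)
        thread.start()
        messages.success(request, 'CSV update started in the background.')
    return HttpResponseRedirect(reverse('admin:products_product_changelist'))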

Graphene Django with Redis does not work?

I have a resolver and I gave it a key to save into django-redis; I can see the key and value inside Redis, but somehow the loading time is still the same.
If I make a regular REST request it works fine, but somehow GraphQL doesn't seem to be working properly?
def resolve_search(self, info, **kwargs):
    redis_key = 'graphl_search'
    if cache.keys(redis_key):  # check if the key exists; if so, return the cached value
        print('using redis')  # this runs the second time, since the key exists for 3600 seconds
        return cache.get(redis_key)
    # set redis
    print('set redis')  # this runs the first time, since the key does not exist yet
    my_model = Model.objects.filter(slug=kwargs.get('slug')).first()
    cache.set(redis_key, my_model, timeout=3600)
    return my_model
It works properly and doesn't go into the rest of the block if the key exists, but when I checked the response time it was the same:
1st time (5xx ms)
2nd time (5xx ms)
Am I doing something wrong, or is that not how I should use Redis with GraphQL?
Thanks in advance for any help or suggestions
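One way to narrow down where the time goes (a diagnostic sketch only, not a fix; the cache calls mirror the resolver above) is to time just the resolver body and compare that against the total response time, which also includes GraphQL parsing and serialization:
from time import perf_counter

def resolve_search(self, info, **kwargs):
    start = perf_counter()
    redis_key = 'graphl_search'
    result = cache.get(redis_key)  # returns None on a cache miss
    if result is None:
        result = Model.objects.filter(slug=kwargs.get('slug')).first()
        cache.set(redis_key, result, timeout=3600)
    print(f'resolver body took {perf_counter() - start:.4f}s')
    return result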

Flask: how to keep database query references up to date

I am creating a Flask app with two panels, one for the admin and the other for users. In the app structure I have a utilities file where I keep most of the redundant variables, among other functions (by redundant I mean I use them in many different parts of the application).
utilities.py
# ...
opening_hour = db_session.query(Table.column).one()[0] # 10:00 AM
# ...
The Table.column, or let's say the opening_hour variable's value above, is entered into the database by the admin through his/her web panel. This value keeps users from accessing certain functionality of the application before the specified hour.
The problem is:
If the admin changes that value through his/her web panel, let's say to 11:00 AM, the change is not shown directly in the users' panel, even though it was entered into the database.
If I want the new opening_hour value to take effect, I have to manually shut down the app and restart it, and sometimes even this doesn't work.
I have tried adding gc.collect(), which did nothing. There must be a way around this other than shutting down and restarting the app manually: first, I doubt the admin will be able to do that; second, even if he/she can, that would be really frustrating.
If someone can relate to this please explain why is this occurring and how to get around it. Thanks in advance :)
You are trying to add advanced logic to a simple variable: you want to query the DB only once, and periodically force the variable to update by re-loading the module. That's not how modules and the import mechanism are supposed to be used.
If you want to access a possibly changing value from the database, you have to read it over and over again.
The solution is, instead of a variable, to define a function opening_hours that executes the DB query every time you check the value:
def opening_hours():
    return (
        db_session.query(Table.column).one()[0],  # 10:00 AM
        db_session.query(Table.column).one()[1]   # 5:00 PM
    )
Now you may not want to have to query the Database every time you check the value, but maybe cache it for a few minutes. The easiest would be to use cachetools for that:
import cachetools

cache = cachetools.TTLCache(maxsize=10, ttl=60)  # Cache for 60 seconds

@cachetools.cached(cache)
def opening_hours():
    return (
        db_session.query(Table.column).one()[0],  # 10:00 AM
        db_session.query(Table.column).one()[1]   # 5:00 PM
    )
Also, since you are using Flask, you can create a route decorator that controls access to your views depending on the time of day:
from datetime import datetime, time
from functools import wraps

from flask import g, request, render_template

def only_within_office_hours(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        start_time, stop_time = opening_hours()
        # reject the request when the current time falls outside office hours
        if not (start_time <= datetime.now().time() <= stop_time):
            return render_template('office_hours_error.html')
        return f(*args, **kwargs)
    return decorated_function
that you can use like
@app.route('/secret_page')
@login_required
@only_within_office_hours
def secret_page():
    pass

Multi-DB Transactions

Django Version 1.10.5 with Postgres 9.6.1
For the last year I've been working in a multi-schema default database environment. However things are beginning to grow to the point I've decided to split the single database into 3 databases.
I've got things working with a master/slave router for all 3 databases.
I am not using the 'default' database key. Instead I have 'db1', 'db2', and 'db3'
The part I am confused about is with transactions in this multi-database environment.
In this example it fails as expected, caused of course by not using @transaction.atomic(using='db1'), which is clear to me.
@transaction.atomic()
def edit(self, context):
    """Edit

    :param dict context: Context
    :return: None
    """
    # Check if employee exists
    try:
        result = Passport.objects.get(pk=self.user.employee_id)
    except Passport.DoesNotExist:
        return False

    result.name = context.get('name')
    result.save()
However I have this strange example, simply because I'm trying to understand... I would have expected this to fail but it does not:
@transaction.atomic(using='db1')
def edit(self, context):
    """Edit

    :param dict context: Context
    :return: None
    """
    # Check if employee exists
    try:
        result = Passport.objects.get(pk=self.user.employee_id)
    except Passport.DoesNotExist:
        return False

    result.name = context.get('name')
    with transaction.atomic(using='db2'):
        result.save()
The model Passport does not exist in DB2 models at all.
My router is set up so that all writes go to their respective DBs.
So what is the purpose of setting using='db1' on the atomic transaction? I've looked at the source and I see it defaults to 'default' when 'using' is not given.
In the above example I even opened another transaction inside the initial one, but this time with using='db2', where the model doesn't even exist. I figured that would have failed, but it didn't, and the data was written to the proper database.
I bring this up because there will be situations where I need to interact with all 3 databases, and if a single problem occurs when writing to any of them, all 3 need to be rolled back; if everything succeeds, they should of course all be committed.
Perhaps someone can help break this down for me so I can understand?
You're interpreting transaction.atomic(using='X') to mean: run the following database commands on X, inside a transaction.
In fact, it just means: open a transaction on database X, and then either commit it or roll it back at the end of the block.
Or, as the documentation puts it:
Under the hood, Django’s transaction management code:
opens a transaction when entering the outermost atomic block;
commits or rolls back the transaction when exiting the outermost block.
The question of which database to use for a given command is determined by your router, not the using clause. So your transaction.atomic(using='db2') block is pointless (it will simply open a transaction on db2 and then close it), but not an error.
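If the goal is all-or-nothing behaviour across the three databases, a rough sketch (my assumption, not true two-phase commit; the function and field names are illustrative) is to nest one atomic block per database, so an exception raised inside the innermost block propagates outward and rolls all three back. A failure during the final commits themselves can still leave an already-committed database as it is:
from django.db import transaction

def edit_everywhere(context):
    with transaction.atomic(using='db1'):
        with transaction.atomic(using='db2'):
            with transaction.atomic(using='db3'):
                # each write is sent to the right database by the router;
                # any exception raised here rolls back all three transactions
                result = Passport.objects.get(pk=context['employee_id'])
                result.name = context.get('name')
                result.save()
                # ... writes against models that live in db2 and db3 ...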

Django Celery reduce time, 5 hours to complete 1000 tasks

I'm running in a development environment, so this may be different in production, but when I run a task from Django Celery, it seems to only fetch tasks from the broker every 10-20 seconds. I'm only testing at this point, but let's say I'm sending around 1000 tasks; this means it will take over 5 hours to complete.
Is this normal? Should it be quicker? Or am I doing something wrong?
This is my task
class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        logging.debug("About to send a message.")
        # Because we don't always have control over transactions
        # in our calling code, we will retry up to 10 times, every 3
        # seconds, in order to try to allow for the commit to the database
        # to finish. That gives the server 30 seconds to write all of
        # the data to the database, and finish the view.
        try:
            message = Message.objects.get(pk=message_id)
        except Exception as exc:
            raise SendMessage.retry(exc=exc)

        if not gateway_id:
            if hasattr(message.billee, 'sms_gateway'):
                gateway = message.billee.sms_gateway
            else:
                gateway = Gateway.objects.all()[0]
        else:
            gateway = Gateway.objects.get(pk=gateway_id)

        # response = gateway._send(message)
        print(message_id)
        logging.debug("Done sending message.")
which gets run from my view
for e in Contact.objects.filter(contact_owner=request.user etc etc):
    SendMessage.delay(e.id, message)
Yes, this is normal. This is the default worker configuration being used; the defaults are set so that task processing will not affect the performance of the app.
There is another way to change it. The task decorator can take a number of options that change the way the task behaves. Any keyword argument passed to the task decorator will actually be set as an attribute of the resulting task class.
You can set the rate limit which limits the number of tasks that can be run in a given time frame.
# "100/m" means one hundred tasks per minute; /s (second) and /h (hour) also work
CELERY_DEFAULT_RATE_LIMIT = "100/m"  # set this in settings.py
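Since the question uses a class-based task, the per-task equivalent (a sketch; rate_limit is a standard Task attribute, and the value here is just the one from the setting above) would look like:
class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3
    rate_limit = "100/m"  # at most one hundred of these tasks per minute, per worker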