django cron job cannot get model query - django

I am using django-crontab, and the following cron job works well. It is added with:
python manage.py crontab add
settings.py
CRONTAB_COMMAND_SUFFIX = '2>&1'
CRONJOBS = [
('*/1 * * * *', 'my_app.cron.test','>> ~/cron_job.log'),
]
my_app/cron.py
from datetime import datetime
def test():
    print('HELLO : {}'.format(datetime.now()))
and once the server runs, it prints out to the log file:
~/cron_job.log
>...
>HELLO : 2018-01-04 23:52:02.983604
>...
The same works if I list all my models:
my_app/cron.py
from datetime import datetime
from django.apps import apps
def test():
    print('HELLO : {}'.format(datetime.now()))
    print(apps.get_models())
~/cron_job.log
>...
>HELLO : 2018-01-05 10:00:02.283938
[<class 'django.contrib.admin.models.LogEntry'>, <class 'django.contrib.auth.models.Permission'>, <class 'django.contrib.auth.models.Group'>, <class 'django.contrib.auth.models.User'>, <class 'django.contrib.contenttypes.models.ContentType'>, <class 'django.contrib.sessions.models.Session'>, <class 'my_app.models.UserProfile'>, <class 'my_app.models.Post'>, <class 'my_app.models.Comment'>, ...]
>...
But when I start to query my model entries:
my_app/cron.py
from datetime import datetime
import my_app.models
def test():
    print('HELLO : {}'.format(datetime.now()))
    for post in my_app.models.Post.objects.all():
        print(post.title)
nothing is printed out, even though the model entries exist. Any idea? The log shows:
>...
>django.db.utils.OperationalError: no such table: blog_app_post

I struggled with this all day, and for me the problem was that Django was not connecting to the intended database when run from cron, so it fell back to the SQLite database, which had not been migrated. The reason is that cron runs jobs in a minimal environment and does not read the environment variables you may already have set in your shell. Many Django projects configure their database conditionally, based on environment variables.
I solved the issue by giving cron access to the DATABASE_URL environment variable that I needed. I found two ways of doing so. One way is to export the variables you need to the global /etc/environment file:
env | grep DATABASE_URL >> /etc/environment
You can do this from your entrypoint.sh script. I tried this myself and it worked for me.
The other solution that I found seems a bit more complex. It is described here: https://roboslang.blog/post/2017-12-06-cron-docker/. I haven't tried it, but it looks like it should work.
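The fallback described above typically comes from a settings pattern like the one sketched here. This is a minimal illustration, not the asker's actual settings: the function name `database_settings`, the SQLite file name and the simplified URL parsing are all assumptions (real projects often use a dedicated URL-parsing helper instead).

```python
from urllib.parse import urlparse


def database_settings(environ):
    """Build a Django-style DATABASES dict from a mapping of environment variables.

    Hypothetical helper, for illustration only.
    """
    url = environ.get("DATABASE_URL")
    if not url:
        # This is the branch a cron job ends up in when DATABASE_URL is not
        # passed through: an unmigrated SQLite file, hence "no such table".
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "db.sqlite3",
            }
        }
    parts = urlparse(url)
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",  # assumes a postgres:// URL
            "NAME": parts.path.lstrip("/"),
            "USER": parts.username or "",
            "PASSWORD": parts.password or "",
            "HOST": parts.hostname or "",
            "PORT": str(parts.port or ""),
        }
    }
```

In settings.py this would be used as `DATABASES = database_settings(os.environ)`. Printing `os.environ.get('DATABASE_URL')` from inside the cron function is a quick way to confirm which branch the job actually takes.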

Related

How to pass custom parameters(such as -o) to scrapy crawler

I'm currently working on a Python 2.7 / Scrapy 1.8 project.
I work within a Docker container and use a launchable.py:
import scrapy
from scrapy.crawler import CrawlerProcess
from spiders import addonsimilartechSpider, similartechSpider
process = CrawlerProcess()
process.crawl(similartechSpider.SimilarTechSpider)
process.crawl(addonsimilartechSpider.AddonSimilarSpider)
process.start()
I used to start my spider like this:
scrapy crawl <nameofmyspider> -o output.xlsx
I installed scrapy-xlsx and have used it until now. Now that I have my launchable.py, I don't know how to pass 'custom' arguments to the Scrapy crawler (not the spider).
I understand the difference between Scrapy settings and spider settings, so:
process.crawl(similartechSpider.SimilarTechSpider, input='-o', first='test1.xlsx')
will likely not work, right?
Thanks for any time you take to answer this.
Use the corresponding Scrapy settings instead (FEED_*).
You can pass them to CrawlerProcess as a dict.
CrawlerProcess(settings={
    'FEED_URI': 'output_file_name.xlsx',
    'FEED_EXPORTERS': {'xlsx': 'scrapy_xlsx.XlsxItemExporter'},
})
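For the launchable.py in the question, the settings approach could look roughly like the sketch below. The helper names `build_feed_settings` and `run_spiders` are made up for illustration. `FEED_FORMAT` is set explicitly on the assumption that, in Scrapy 1.8, only the `-o` command-line option infers the format from the file extension; the exporter path is the one given in the answer above.

```python
def build_feed_settings(output_path, feed_format="xlsx"):
    """Return Scrapy settings roughly equivalent to `-o <output_path>` (hypothetical helper)."""
    return {
        "FEED_URI": output_path,
        # Assumption: without -o, Scrapy 1.8 does not guess the format
        # from the file extension, so name it explicitly.
        "FEED_FORMAT": feed_format,
        "FEED_EXPORTERS": {"xlsx": "scrapy_xlsx.XlsxItemExporter"},
    }


def run_spiders(spider_classes, output_path):
    """Run the given spider classes in one process, exporting to output_path."""
    # Imported lazily so the settings helper can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=build_feed_settings(output_path))
    for spider_cls in spider_classes:
        process.crawl(spider_cls)
    process.start()  # blocks until all spiders have finished
```

Since the question runs two spiders in one process, both would write to the same file with a fixed name. Scrapy's `%(name)s` placeholder in `FEED_URI` (for example `'%(name)s.xlsx'`) should give each spider its own output file.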

Add method imports to shell_plus

In shell_plus, is there a way to automatically import selected helper methods, like the models are?
I often open the shell to type:
proj = Project.objects.get(project_id="asdf")
I want to replace that with:
proj = getproj("asdf")
Found it in the docs. Quoted from there:
Additional Imports
In addition to importing the models you can specify other items to
import by default. These are specified in SHELL_PLUS_PRE_IMPORTS and
SHELL_PLUS_POST_IMPORTS. The former is imported before any other
imports (such as the default models import) and the latter is imported
after any other imports. Both have similar syntax. So in your
settings.py file:
SHELL_PLUS_PRE_IMPORTS = (
    ('module.submodule1', ('class1', 'function2')),
    ('module.submodule2', 'function3'),
    ('module.submodule3', '*'),
    'module.submodule4'
)
The above example would directly translate to the following python
code which would be executed before the automatic imports:
from module.submodule1 import class1, function2
from module.submodule2 import function3
from module.submodule3 import *
import module.submodule4
These symbols will be available as soon as the shell starts.
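To check what a given setting would import, the translation rule quoted above can be mimicked in a few lines. This is only an illustrative re-implementation of the rule as the docs describe it, not django-extensions' own code; `render_imports` and the `myproject.shell_helpers` module are hypothetical names.

```python
def render_imports(spec):
    """Turn a SHELL_PLUS_PRE_IMPORTS-style sequence into import statements."""
    lines = []
    for entry in spec:
        if isinstance(entry, str):
            # A bare module name becomes a plain import.
            lines.append("import {}".format(entry))
            continue
        module, names = entry
        if isinstance(names, str):
            names = (names,)  # a single name, or '*'
        lines.append("from {} import {}".format(module, ", ".join(names)))
    return lines


# The asker's case: make getproj available in every shell_plus session.
SHELL_PLUS_PRE_IMPORTS = [
    ("myproject.shell_helpers", "getproj"),  # hypothetical helper module
]

for line in render_imports(SHELL_PLUS_PRE_IMPORTS):
    print(line)  # prints: from myproject.shell_helpers import getproj
```

The helper module itself would define `getproj` and import `Project` from the app's models, so the function works regardless of what else the shell has imported.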
OK, two ways:
1) Using the PYTHONSTARTUP variable (see the Python docs):
# In some file (here I'll call it "~/path/to/foo.py")
def getproj(project_id):
    # Imported here because this script runs in every Python shell session
    from some_app.models import Project
    return Project.objects.get(project_id=project_id)
# In your .bashrc ($HOME instead of "~", which is not expanded inside quotes)
export PYTHONSTARTUP="$HOME/path/to/foo.py"
2) Using an IPython startup file (my favourite; see the IPython docs):
$ pip install ipython
$ ipython profile create
# Put the foo.py script in your profile_default/startup directory.
# shell_plus runs IPython if it is installed.
$ django-admin.py shell_plus

Importing CSV to Django and settings not recognised

So I'm getting to grips with Django, or trying to. I have some code that doesn't depend on being called by a web page; it's designed to populate the database with information. Eventually it will be set up as a cron job to run overnight. This is the first crack at it, which does an initial population (once that works, I'll move to an incremental structure where only new records are pushed). I'm using Python 2.7, Django 1.5 and SQLite 3. When I run this code, I get
Requested setting DATABASES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
That seems fairly obvious, but I've spent a couple of hours now trying to work out how to adjust that setting. How do I call / open a connection / whatever the right terminology is here? I have a number of functions like this that will be scheduled jobs, and this has been frustrating me all afternoon.
import urllib2
import csv
import requests
from django.db import models
from gmbl.models import Match
master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv", "GET")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
for row in data:
    current_match = Match(matchdate=row[1],
                          hometeam=row[2],
                          awayteam=row[3],
                          homegoals=row[4],
                          awaygoals=row[5],
                          homeshots=row[10],
                          awayshots=row[11],
                          homeshotsontarget=row[12],
                          awayshotsontarget=row[13],
                          homecorners=row[16],
                          awaycorners=row[17])
    current_match.save()
I had originally started out with http://django-csv-importer.readthedocs.org/en/latest/ but I had the same error, and the documentation doesn't make much sense trying to debug it. When I tried calling settings.configure in the function, it said it didn't exist; presumably I had to import it, but couldn't make that work.
Make sure Django and your project are on PYTHONPATH; then you can do:
import urllib2
import csv
import requests
# Note: setup_environ was deprecated in Django 1.4 and removed in 1.6
from django.core.management import setup_environ
from yoursite import settings
setup_environ(settings)
# Import anything that touches the settings only after setup_environ
from django.db import models
from gmbl.models import Match
# urlopen's second positional argument is POST data, so pass only the URL for a GET
master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
# ... your code ...
Reference: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
Hope it helps!
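The error message in the question also names a second route: setting `DJANGO_SETTINGS_MODULE` before any model is imported. A minimal sketch, assuming the settings module is `yoursite.settings` as in the answer above; `select_settings` and `configure_django` are hypothetical helpers, and the `django.setup()` call only exists (and is only needed) on Django 1.7 and later.

```python
import os


def select_settings(settings_module="yoursite.settings", environ=os.environ):
    """Set DJANGO_SETTINGS_MODULE unless the environment already provides it."""
    return environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)


def configure_django(settings_module="yoursite.settings"):
    """Call this once, before importing any models."""
    select_settings(settings_module)
    import django  # imported lazily, after the environment variable is set

    if hasattr(django, "setup"):
        django.setup()  # Django 1.7+ only: populates the app registry


# Usage in the import script (project and app names as in the question):
#   configure_django("yoursite.settings")
#   from gmbl.models import Match
```

Using `setdefault` means a value exported by the shell or by the cron entry still wins, which keeps the same script usable against different settings modules.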

How can I run django shell commands from a bash script

Instead of repeatedly deleting my tables, recreating them and populating them with data in my dev environment, I decided to create a bash script called reset_db that does this for me. I got it to whack the tables and recreate them, but it is not able to populate the tables with data through the Django ORM.
I try to do this by calling the Django shell from the script and then running ORM commands to populate my tables, but it seems the Django shell commands are not running.
When I run the same ORM commands manually in the shell they work fine, just not from within the bash script.
The errors I get are:
NameError: name 'User' is not defined
NameError: name 'u1' is not defined
NameError: name 'm' is not defined
Here is my script:
#!/bin/bash
set +e
RUN_ON_MYDB="psql -X -U user --set ON_ERROR_STOP=on --set AUTOCOMMIT=off rcamp1"
$RUN_ON_MYDB <<SQL # Whack tables
DROP TABLE rcamp_merchant CASCADE;
DROP TABLE rcamp_customer CASCADE;
DROP TABLE rcamp_point CASCADE;
DROP TABLE rcamp_order CASCADE;
DROP TABLE rcamp_custmetric CASCADE;
DROP TABLE rcamp_ordermetric CASCADE;
commit;
SQL
python manage.py syncdb # Recreate tables
python manage.py shell <<ORM # Start django shell. Problem starts here.
from rcamp.models import Customer, Merchant, Order, Point, CustMetric, OrderMetric
u1 = User.objects.filter(pk=5)
m = Merchant(u1, full_name="Bill Gates")
m
ORM
I'm new to both django and shell scripting. Thanks for your help.
You should look at creating a fixture to populate your database: https://docs.djangoproject.com/en/dev/howto/initial-data/
You need to import User explicitly. The django package and a few other things are imported automatically, but not everything you might want.
Also, to avoid having to work out what to import, you can write a management command. That builds on the Django and Python you already know; you can learn shell scripting later.
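The three NameErrors in the question all trace back to the one missing import: the lines appear to be evaluated one at a time, so when the first assignment fails, the names it should have created are missing for the lines after it. The stdlib-only sketch below reproduces that cascade without Django; `run_like_a_shell` and the stand-in `Merchant` class are illustrative assumptions, not how `manage.py shell` is implemented.

```python
def run_like_a_shell(lines, namespace=None):
    """Execute each line separately and collect errors instead of stopping."""
    namespace = {} if namespace is None else namespace
    errors = []
    for line in lines:
        try:
            exec(line, namespace)
        except Exception as exc:
            errors.append("{}: {}".format(type(exc).__name__, exc))
    return errors


class Merchant(object):
    """Stand-in for the real model so the example runs without Django."""

    def __init__(self, user, full_name=""):
        self.user, self.full_name = user, full_name


script = [
    "u1 = User.objects.filter(pk=5)",            # User was never imported
    'm = Merchant(u1, full_name="Bill Gates")',  # so u1 does not exist
    "m",                                         # and neither does m
]
for error in run_like_a_shell(script, {"Merchant": Merchant}):
    print(error)
# NameError: name 'User' is not defined
# NameError: name 'u1' is not defined
# NameError: name 'm' is not defined
```

Adding `from django.contrib.auth.models import User` as the first line of the heredoc removes the first error and, with it, the other two.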
Your errors show that User is not recognized as a model class, so you are probably missing an import:
from django.db import models
from django.contrib.auth.models import User
By the way, on the line
u1 = User.objects.filter(pk=5)
I think you should add .first() at the end:
u1 = User.objects.filter(pk=5).first()
Anyway, here are some links that may help:
https://docs.djangoproject.com/en/dev/ref/django-admin/
http://www.stackoverflow.com/questions/6197256/django-user-model-fields-at-adminmodel
https://groups.google.com/forum/?fromgroups=#!topic/django-users/WrVp1DDFrX8
Hope this helps.

Django timer thread

I would like to compute some information in my Django application on a regular basis.
I need to select and insert data every second and want to use the Django ORM.
How can I do this?
In a shell script, set the DJANGO_SETTINGS_MODULE variable and call a Python script:
export DJANGO_SETTINGS_MODULE=yourapp.settings
python compute_some_info.py
In compute_some_info.py, set up Django and import your modules (look at how the manage.py script sets itself up to run Django):
#!/usr/bin/env python
import sys
try:
    import settings  # Assumed to be in the same directory.
except ImportError:
    sys.stderr.write("Error: Can't find the file 'settings.py'")
    sys.exit(1)
sys.path = sys.path + ['/yourapphome']
from yourapp.models import YourModel
YourModel.compute_some_info()
Then call your shell script in a cron job.
Alternatively, you can keep the process running and sleep between runs (a better fit if the job runs every second). You would still want this to live outside the web server, in its own process that is set up this way.
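The keep-running-and-sleeping variant could be sketched as a loop that schedules each run against a monotonic clock, so the work's own duration does not make the one-second interval drift. `run_every`, its parameters and the injectable `clock`/`sleep` arguments are assumptions made for this example; `task` would be whatever ORM work you need, called after Django has been set up as described above.

```python
import time


def run_every(interval, task, iterations=None, clock=time.monotonic, sleep=time.sleep):
    """Call task() every `interval` seconds; run forever if iterations is None."""
    next_run = clock()
    done = 0
    while iterations is None or done < iterations:
        task()
        done += 1
        next_run += interval
        # Sleep only for the time left until the next slot; if the task
        # overran, skip the sleep rather than passing a negative value.
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)
    return done


# Usage (hypothetical task name, taken from the script above):
#   run_every(1.0, YourModel.compute_some_info)
```

Passing the clock and sleep functions in as arguments keeps the loop easy to test without actually waiting. A task that regularly takes longer than the interval will simply run back to back.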
One way to do it would be to create a custom command and invoke python manage.py your_custom_command from cron or the Windows Task Scheduler.
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
For example, create myapp/management/commands/myapp_task.py which reads:
from django.core.management.base import BaseCommand
# BaseCommand is used here because NoArgsCommand was removed in Django 1.10
class Command(BaseCommand):
    def handle(self, *args, **options):
        print('Doing task...')
        # invoke the functions you need to run on your project here
        print('Done')
Then you can run it from cron like this:
export DJANGO_SETTINGS_MODULE=myproject.settings; export PYTHONPATH=/path/to/project_parent; python manage.py myapp_task