I put Celery into my Django app so that two other Python programs can process the input from my Django app; the tasks invoke them via subprocess.
My question is: how do I access the output of those subprocesses? Previously, when this was just a plain Python program, I accessed the log files (the output of the two apps) via stdout and stderr. Is it the same when I use Celery in Django? Does the value of CELERY_RESULT_BACKEND (should I point it at my Django app's database?) have anything to do with the log files?
So far what I've done is:
Access the two apps via subprocess in my tasks.py
For now I assigned my broker's database, Redis, as the CELERY_RESULT_BACKEND. My plan is to capture the log files and save them to my Django app's database, so that I can simply query that database.
Can you offer some help?
Typically, you only care about the task result, which is the return value of the celery task, and that is stored in your result_backend for at least result_expires time (usually 1 day). So, to the extent that you want to access any particular task's result, you can just do so using the task ID.
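For example, here is a minimal sketch of fetching a stored result by its task ID (assuming app is your Celery application instance and task_id was saved when you called .delay()):

from celery.result import AsyncResult

# task_id is assumed to have been stored when the task was submitted.
result = AsyncResult(task_id, app=app)
if result.ready():
    output = result.get(timeout=5)  # the task's return value, read from the result backend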
Related
I want to run a django query from a string and put the output into a variable
In my DRF project, the client sends a Django query:
{'query': 'model.objects.all()'}
and I need to return the result of this query.
I tried using exec('model.objects.all()') but I can't assign the output to a variable. I also tried subprocess.run([sys.executable, "-c", 'model.objects.all()'], capture_output=True, text=True), but the subprocess doesn't find the model.
There's a huge amount of setting up needed before a process using Django models can work correctly. That's why manage.py shell exists.
If you want to perform Django operations outside the context of a Django server, write a management command. You can then invoke it from the command line, from cron, from other Python scripts ... wherever.
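As a rough sketch (the app and model names below are placeholders, not anything from your project), a management command lives under yourapp/management/commands/ and has the full ORM available:

# yourapp/management/commands/dump_models.py
from django.core.management.base import BaseCommand
from yourapp.models import MyModel  # placeholder model

class Command(BaseCommand):
    help = "Print every MyModel row"

    def handle(self, *args, **options):
        for obj in MyModel.objects.all():
            self.stdout.write(str(obj))

You would then invoke it with python manage.py dump_models from a shell, cron, or another script.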
Normally we deploy Celery for asynchronous work. Can Celery be used for asynchronous file uploading, so that the client can continue working on the website while a large file is being uploaded? I passed the form to the Celery task and got an error like 'Object of type module is not JSON serializable'. Is there any way to do async file uploading?
I'm pretty sure it's not possible; what you need to do is more like opening a popup page and doing the upload there.
'Object of type module is not JSON serializable'
One of the best practices with Celery is to store the data in the database (especially for large payloads) and create the Celery task with IDs only. Any object you pass to Celery needs to be JSON serializable.
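A minimal sketch of that pattern, assuming you save the upload to a model in the view (UploadedFile here is a placeholder) and pass only its primary key to the task:

from celery import shared_task
from yourapp.models import UploadedFile  # placeholder model

@shared_task
def process_upload(upload_id):
    # Fetch the row by ID inside the worker; only the integer ID crosses the broker.
    upload = UploadedFile.objects.get(pk=upload_id)
    # ... process upload.file here ...

# In the view, after saving the model instance:
# process_upload.delay(upload.pk)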
We are a small team of developers working on an application using the PostgreSQL database backend. Each of us has a separate working directory and virtualenv, but we share the same PostgreSQL database server, even Jenkins is on the same machine.
So I am trying to figure out a way to let us run tests on the same project in parallel without running into database name collisions. Furthermore, sometimes a Jenkins build fails midway and the test database doesn't get dropped at the end, so subsequent Jenkins builds can get confused by the existing database and fail automatically.
What I've decided to try is this:
import os
from datetime import datetime

DATABASES = {
    'default': {
        # the usual lines ...
        'TEST_NAME': '{user}_{epoch_ts}_awesome_app'.format(
            user=os.environ.get('USER', os.environ['LOGNAME']),
            # This gives the number of seconds since the UNIX epoch
            epoch_ts=int((datetime.utcnow() - datetime.utcfromtimestamp(0)).total_seconds())
        ),
        # etc
    }
}
So the test database name at each test run will most likely be unique, combining the username and the timestamp. This way Jenkins can even run builds in parallel, I think.
It seems to work so far. But could it be dangerous? I'm guessing we're safe as long as we don't try to import the project settings module directly and only use django.conf.settings because that should be singleton-like and evaluated only once, right?
I'm doing something similar and have not run into any issue. The usual precautions should be followed:
Don't access settings directly.
Don't cause the values in settings to be evaluated in a module's top level. See the doc for details. For instance, don't do this:
from django.conf import settings

# This is bad because settings might still be in the process of being
# configured at this stage.
blah = settings.BLAH

def some_view():
    # This is okay because by the time views are called by Django,
    # the settings are supposed to be all configured.
    blah = settings.BLAH
Don't do anything that accesses the database in a module's top level. The doc warns:
If your code attempts to access the database when its modules are compiled, this will occur before the test database is set up, with potentially unexpected results. For example, if you have a database query in module-level code and a real database exists, production data could pollute your tests. It is a bad idea to have such import-time database queries in your code anyway - rewrite your code so that it doesn’t do this.
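The same caveat in code form, as a sketch (MyModel is a placeholder):

from myapp.models import MyModel  # placeholder app/model

# Bad: this query would run at import time, possibly before the test
# database has been created.
# total = MyModel.objects.count()

def some_view():
    # Fine: this runs only when the view is called, against whatever
    # database is configured at that point.
    total = MyModel.objects.count()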
Instead of the time, you could use the Jenkins executor number (available in the environment); that would be unique enough and you wouldn't have to worry about it changing.
As a bonus, you could then use --keepdb to avoid rebuilding the database from scratch each time... On the downside, failed and corrupted databases would have to be dropped separately (perhaps the settings.py can print out the database name that it's using, to facilitate manual dropping).
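A sketch of the executor-number idea, assuming the EXECUTOR_NUMBER variable Jenkins sets, with a fallback to the username for local runs:

import os

executor = os.environ.get('EXECUTOR_NUMBER', os.environ.get('USER', 'local'))

DATABASES = {
    'default': {
        # the usual lines ...
        'TEST_NAME': 'test_{0}_awesome_app'.format(executor),
    }
}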
I have a Django 1.5.1 webapp using Celery 3.0.23 with RabbitMQ 3.1.5 and sqlite3.
I can submit jobs using a simple result = status.tasks.mymethod.delay(parameter), and all tasks execute correctly:
[2013-09-30 17:04:11,369: INFO/MainProcess] Got task from broker: status.tasks.prova[a22bf0b9-0d5b-4ce5-967a-750f679f40be]
[2013-09-30 17:04:11,566: INFO/MainProcess] Task status.tasks.mymethod[a22bf0b9-0d5b-4ce5-967a-750f679f40be] succeeded in 0.194540023804s: u'Done'
I want to display on a page the latest 10 jobs submitted and their status. Is there a way in Django to get such objects? I see a couple of tables in the database (celery_taskmeta and celery_taskmeta_2ff6b945) and tried some accesses to the objects, but Django always displays an AttributeError page.
What is the correct way to access Celery results from Django?
Doing
cel = celery.status.tasks.get(None)
cel = status.tasks.all()
does not work, resulting in the aforementioned AttributeError. (status is the name of my app)
EDIT: I am sure tasks are saved, as this small tutorial says:
By default django-celery stores this state in the Django database. You may consider choosing an alternate result backend or disabling states altogether (see Result Backends).
Following the links there are only references on how to setup the DB connection and not how to retrieve the results.
Try this:
from djcelery.models import TaskMeta
TaskMeta.objects.all()
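For the "latest 10 jobs" page, for instance, you could then order by date_done (assuming the default django-celery database result backend is populating these tables):

latest = TaskMeta.objects.order_by('-date_done')[:10]
for t in latest:
    # each row carries task_id, status, result and date_done fields
    print("%s %s" % (t.task_id, t.status))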
I want to develop an application which uses Django as the frontend and Celery to do background work.
Now, sometimes Celery workers on different machines need database access to my django frontend machine (two different servers).
They need to know some realtime stuff and to run the django-app with
python manage.py celeryd
they need access to a database with all models available.
Do I have to access my MySQL database through a direct connection? In that case I would have to allow the user "my-django-app" access not only from localhost on my frontend machine but also from the IPs of my other worker servers.
Is this the "right" way, or am I missing something? It just doesn't seem very safe (without SSL), but maybe that's just the way it has to be.
Thanks for your responses!
They will need access to the database. That access will be through a database backend, which can be one that ships with Django or one from a third party.
One thing I've done in my Django site's settings.py is load database access info from a file in /etc. This way the access setup (database host, port, username, password) can be different for each machine, and sensitive info like the password isn't in my project's repository. You might want to restrict access to the workers in a similar manner, by making them connect with a different username.
You could also pass in the database connection information, or even just a key or path to a configuration file, via environment variables, and handle it in settings.py.
For example, here's how I pull in my database configuration file:
import os  # usually already imported at the top of settings.py

g = {}
dbSetup = {}
# Execute the config file; any 'databases' variable it defines ends up in dbSetup.
execfile(os.environ['DB_CONFIG'], g, dbSetup)
if 'databases' in dbSetup:
    DATABASES = dbSetup['databases']
else:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            # ...
        }
    }
Needless to say, you need to make sure that the file in DB_CONFIG is not accessible to any user besides the db admins and Django itself. The default case should refer Django to a developer's own test database. There may also be a better solution using the ast module instead of execfile, but I haven't researched it yet.
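For what it's worth, a sketch of the ast idea, assuming the config file is changed to hold a plain Python dict literal rather than executable code:

import ast
import os

# literal_eval parses the dict without executing arbitrary code.
with open(os.environ['DB_CONFIG']) as f:
    dbSetup = ast.literal_eval(f.read())

DATABASES = dbSetup.get('databases', {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        # ...
    },
})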
Another thing I do is use separate users for DB admin tasks vs. everything else. In my manage.py, I added the following preamble:
# Find a database configuration, if there is one, and set it in the environment.
adminDBConfFile = '/etc/django/db_admin.py'
dbConfFile = '/etc/django/db_regular.py'
import sys
import os

def goodFile(path):
    return os.path.isfile(path) and os.access(path, os.R_OK)

if len(sys.argv) >= 2 and sys.argv[1] in ["syncdb", "dbshell", "migrate"] \
        and goodFile(adminDBConfFile):
    os.environ['DB_CONFIG'] = adminDBConfFile
elif goodFile(dbConfFile):
    os.environ['DB_CONFIG'] = dbConfFile
Where the config in /etc/django/db_regular.py is for a user with access to only the Django database with SELECT, INSERT, UPDATE, and DELETE, and /etc/django/db_admin.py is for a user with these permissions plus CREATE, DROP, INDEX, ALTER, and LOCK TABLES. (The migrate command is from South.) This gives me some protection from Django code messing with my schema at runtime, and it limits the damage an SQL injection attack can cause (though you should still check and filter all user input).
This isn't a solution to your exact problem, but it might give you some ideas for ways to smarten up Django's database access setup for your purposes.