I am trying to create an API using Gunicorn and Flask for calling a gensim model.
One of the endpoints allows users to add new products to the model without any downtime; we do this with model.add and model.save.
As I understand it, Gunicorn loads the current model and distributes a copy of it to each worker.
from flask import Flask

# Load both the item2vec and word2vec models once at startup
model = load_model_item2vec(modelpath)
word2vec_model = load_model_item2vec(word2vec_modelpath)

app = Flask(__name__)

@app.route("/update_model", methods=["POST"])
def update_model():
    ...
    # add the new product vectors to this worker's in-memory copy, then persist it
    model.add(entities=..., weights=vectors)
    model.save(modelpath)
    ...
After saving the updated model, the change only affects the copy held by the worker that handled the request, unless I restart the Gunicorn master process itself.
Is there any way to propagate the updated model file to all the workers without any downtime or needing to restart Gunicorn? Thanks.
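One possible pattern (my own sketch, not part of the original post) is to have each worker lazily reload its copy when it notices the saved model file has changed, keyed off the file's modification time; it reuses the post's load_model_item2vec and modelpath names:

import os
import threading

_reload_lock = threading.Lock()
_loaded_mtime = None

def get_current_model():
    """Reload this worker's copy if another worker has saved a newer model file."""
    global model, _loaded_mtime
    mtime = os.path.getmtime(modelpath)
    with _reload_lock:
        if _loaded_mtime is None or mtime > _loaded_mtime:
            model = load_model_item2vec(modelpath)
            _loaded_mtime = mtime
    return model

Each endpoint would then call get_current_model() instead of touching the module-level model directly, so every worker picks up updates on its next request without restarting Gunicorn.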
Related
I'll be running a script on a server that automatically creates model instances in a database. The idea is to use an infinite loop (e.g. while True:) that keeps creating instances until I somehow stop it.
I want to use Django to conveniently check from my website how big my database is, and from there stop or restart the script.
What could be a good approach here?
I was thinking about Celery, but I'm not clear on how I would stop it, and it looks like overkill. Any suggestions?
A simple solution is to have a model that stores in the db the name of the script and whether it should keep running:
from django.db import models

class ScriptTracker(models.Model):
    name = models.CharField(max_length=100)
    keep_running = models.BooleanField()
Then your script would just check the db every loop to see if it should stop:
def my_script():
    while True:
        if not ScriptTracker.objects.get(name="my_script").keep_running:
            # stop running
            return
        # creating an instance in the db
        MyObject.objects.create(name="helloworld")
Create the ScriptTracker object
ScriptTracker.objects.create(name="my_script", keep_running=True)
Start your script running; this can be done simply if the script is built as a management command:
python manage.py my_script
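As a rough sketch (the myapp module path and the Command wiring are my additions, not from the answer), the management command could live in myapp/management/commands/my_script.py:

from django.core.management.base import BaseCommand

from myapp.models import MyObject, ScriptTracker

class Command(BaseCommand):
    help = "Create MyObject rows until the ScriptTracker flag tells us to stop."

    def handle(self, *args, **options):
        # same loop as my_script above, just wrapped in a command
        while ScriptTracker.objects.get(name="my_script").keep_running:
            MyObject.objects.create(name="helloworld")

Your website can then stop the loop by setting keep_running to False on the ScriptTracker row, e.g. from a view or the admin.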
I have a very simple and silly problem, but I don't know what I'm missing. Basically, the way I've currently written my manage app, it seems Flask-Migrate always creates an absolute migration rather than just a change-set to migrate from the previous schema to the current one.
For example, if I delete my migrations and spin up a brand new DB, then do manage db migrate followed by manage db upgrade, everything works. If I then make a change to a db.Model table and run manage db migrate again, I don't get an error.
However, the new migration script, although it points to the previous revision, isn't just the diff needed to get the database from the previous schema state to the new one; it is a full (absolute) migration starting from an empty schema. In other words, it tries to create all the tables from scratch again (including the change) rather than just applying the change to the already-created schema. Even though the migration is linked to the previous one, it doesn't take into account that the previous migration has already been applied. As a result, the migrations cannot be chained: the second migration attempts to create the tables again, so manage db upgrade fails the second time it is called.
My manage app looks like this:
from flask_migrate import Migrate, MigrateCommand
from src.common.db import db
from src.common.flaskery import global_flask_app, global_flask_manager
app = global_flask_app(__name__)
migrate = Migrate(app, db)
manager = global_flask_manager(__name__)
manager.add_command('db', MigrateCommand)
from src.db.models import *
def main():
    manager.run()

if __name__ == '__main__':
    main()
Similar: Flask Migrate using different postgres schemas (__table_args__ = {'schema': 'test_schema'})
So in your migrations/env.py, you need to add include_schemas=True to the config as below. This tells Alembic's autogenerate to also reflect tables outside the default schema, so it diffs against the tables that already exist instead of treating the database as empty:
context.configure(connection=connection,
                  target_metadata=target_metadata,
                  process_revision_directives=process_revision_directives,
                  include_schemas=True,
                  **current_app.extensions['migrate'].configure_args)
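For reference, the non-default-schema setup the linked question describes looks roughly like this (the model and schema names here are illustrative, assuming the db object imported in the manage app above):

class Widget(db.Model):
    __tablename__ = 'widget'
    __table_args__ = {'schema': 'test_schema'}

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80))

Without include_schemas=True, autogenerate does not see tables that live in such schemas, which matches the symptom of every new migration starting from scratch.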
I put Celery into my Django app so that two other Python programs can process the input from my Django app; the tasks call them via the subprocess module.
My question is: how do I access the output from those subprocesses? Back when this was just a plain Python program, I accessed the log files (the output of the two apps) via stdout and stderr. Is it the same when I use Celery in Django? And does the choice of CELERY_RESULT_BACKEND (should I point it at my Django app's db?) have anything to do with the log files?
So far what I've done is:
Access the two apps via subprocess in my tasks.py
I assigned my broker's database, Redis, as the CELERY_RESULT_BACKEND for now. My plan is to get the log files and then save them to my Django app's db so that I can just access that db.
Can you offer some help?
Typically, you only care about the task result, which is the return value of the Celery task; that is stored in your result_backend for at least result_expires time (usually one day). So, to the extent that you want to access a particular task's result, you can do so using its task ID.
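As an illustration (my sketch, not from the answer; the script path and task name are placeholders), the task can capture the subprocess's stdout/stderr itself and return them, so they land in the result backend and can be fetched later by task ID:

# tasks.py
import subprocess

from celery import shared_task

@shared_task
def run_external_app():
    # run the external program and capture its output as the task result
    proc = subprocess.run(
        ["python", "/path/to/other_app.py"],  # placeholder command
        capture_output=True,
        text=True,
    )
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}

Fetching it later:

result = run_external_app.delay()
output = result.get(timeout=60)  # or celery.result.AsyncResult(task_id).get() from another process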
I have a Django 1.5.1 webapp using Celery 3.0.23 with RabbitMQ 3.1.5 and sqlite3.
I can submit jobs with a simple result = status.tasks.mymethod.delay(parameter), and all tasks execute correctly:
[2013-09-30 17:04:11,369: INFO/MainProcess] Got task from broker: status.tasks.prova[a22bf0b9-0d5b-4ce5-967a-750f679f40be]
[2013-09-30 17:04:11,566: INFO/MainProcess] Task status.tasks.mymethod[a22bf0b9-0d5b-4ce5-967a-750f679f40be] succeeded in 0.194540023804s: u'Done'
I want to display on a page the latest 10 jobs submitted and their status. Is there a way in Django to get these objects? I see a couple of tables in the database (celery_taskmeta and celery_taskmeta_2ff6b945) and tried a few queries against them, but Django always displays an AttributeError page.
What is the correct way to access Celery results from Django?
Doing
cel = celery.status.tasks.get(None)
cel = status.tasks.all()
does not work, resulting in the aforementioned AttributeError. (status is the name of my app)
EDIT: I am sure tasks are saved, as this small tutorial says:
By default django-celery stores this state in the Django database. You may consider choosing an alternate result backend or disabling states altogether (see Result Backends).
Following the links, there are only references on how to set up the DB connection, not on how to retrieve the results.
Try this:
from djcelery.models import TaskMeta
TaskMeta.objects.all()
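To build the "latest 10 jobs and their status" page from the question, you can order by TaskMeta's date_done column; a minimal sketch (the view and template names are mine):

from django.shortcuts import render

from djcelery.models import TaskMeta

def latest_tasks(request):
    # date_done is the completion time; TaskMeta does not record submission time
    tasks = TaskMeta.objects.order_by('-date_done')[:10]
    return render(request, 'status/latest_tasks.html', {'tasks': tasks})

Each TaskMeta row exposes task_id, status, result and date_done, which is enough for a simple status table.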
I want to develop an application which uses Django as the frontend and Celery to do background work.
Now, sometimes Celery workers on different machines need database access to my django frontend machine (two different servers).
They need to know some realtime stuff and to run the django-app with
python manage.py celeryd
they need access to a database with all models available.
Do I have to access my MySQL database through a direct connection? That is, do I have to allow the user "my-django-app" access not only from localhost on my frontend machine but also from my other worker servers' IPs?
Is this the "right" way, or am I missing something? It just doesn't seem very safe (without SSL), but maybe that's just how it has to be.
Thanks for your responses!
They will need access to the database. That access will be through a database backend, which can be one that ships with Django or one from a third party.
One thing I've done in my Django site's settings.py is load database access info from a file in /etc. This way the access setup (database host, port, username, password) can be different for each machine, and sensitive info like the password isn't in my project's repository. You might want to restrict access to the workers in a similar manner, by making them connect with a different username.
You could also pass in the database connection information, or even just a key or path to a configuration file, via environment variables, and handle it in settings.py.
For example, here's how I pull in my database configuration file:
import os

g = {}
dbSetup = {}
execfile(os.environ['DB_CONFIG'], g, dbSetup)
if 'databases' in dbSetup:
    DATABASES = dbSetup['databases']
else:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            # ...
        }
    }
Needless to say, you need to make sure that the file in DB_CONFIG is not accessible to any user besides the db admins and Django itself. The default case should refer Django to a developer's own test database. There may also be a better solution using the ast module instead of execfile, but I haven't researched it yet.
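For illustration (the file contents and values below are hypothetical, not from the answer), the file pointed to by DB_CONFIG just defines a databases dict at module level, which execfile() drops into dbSetup:

# /etc/django/db_regular.py (hypothetical example)
databases = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'myproject',
        'USER': 'django_regular',
        'PASSWORD': 'change-me',
        'HOST': 'db.internal.example',
        'PORT': '3306',
    }
}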
Another thing I do is use separate users for DB admin tasks vs. everything else. In my manage.py, I added the following preamble:
# Find a database configuration, if there is one, and set it in the environment.
adminDBConfFile = '/etc/django/db_admin.py'
dbConfFile = '/etc/django/db_regular.py'
import sys
import os
def goodFile(path):
    return os.path.isfile(path) and os.access(path, os.R_OK)

if len(sys.argv) >= 2 and sys.argv[1] in ["syncdb", "dbshell", "migrate"] \
        and goodFile(adminDBConfFile):
    os.environ['DB_CONFIG'] = adminDBConfFile
elif goodFile(dbConfFile):
    os.environ['DB_CONFIG'] = dbConfFile
Where the config in /etc/django/db_regular.py is for a user with access to only the Django database with SELECT, INSERT, UPDATE, and DELETE, and /etc/django/db_admin.py is for a user with these permissions plus CREATE, DROP, INDEX, ALTER, and LOCK TABLES. (The migrate command is from South.) This gives me some protection from Django code messing with my schema at runtime, and it limits the damage an SQL injection attack can cause (though you should still check and filter all user input).
This isn't a solution to your exact problem, but it might give you some ideas for ways to smarten up Django's database access setup for your purposes.