Django Celery - Creating duplicate users due to multiple workers - django

I am facing a weird issue with Celery in production. Currently, the production server has 4 Celery workers which handle all the tasks registered by my Django app. No custom queues are defined; the workers are basically 4 separate supervisor conf files.
Now, in my app I am handling Facebook webhook data, and I want a user with a specific FacebookID to be created only once on my backend. But recently I checked and found users with the same FacebookID, which should not have happened.
Here is what I think happened: a user with FacebookID 666 sends me webhook data, and a task is created which will create a new user instance with FacebookID 666 in my database. Before that user is created, the same user hits me with more data, which creates another task under a different worker, and so I end up with two users with the same FacebookID.
Is there any way I can configure celery to handle a user with a specific FacebookID to create tasks only in ONE worker? Or have I completely misjudged the situation over here?

Essentially, you need a user-level distributed lock to prevent multiple workers from working on the same user. There are several ways to accomplish this, the most straightforward being a database such as MySQL or Redis. In MySQL, the first process would transactionally (1) check for an existing row in a database table with the user ID (e.g., email or other unique identifier), (2) if no row exists, create that row, and (3) if a row exists, return early without doing anything. You can also do this in Redis using Redlock, or for smaller systems just using SETNX.
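A minimal sketch of the database approach, using SQLite (from Python's stdlib) to stand in for MySQL: a PRIMARY KEY/UNIQUE constraint on the FacebookID makes "create once" atomic at the database level, so whichever worker inserts second is ignored. The table and function names are hypothetical; in Django the equivalent would be `unique=True` on the model field plus `get_or_create`.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/Postgres; the UNIQUE constraint
# (here, the PRIMARY KEY on facebook_id) is what makes this race-safe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (facebook_id TEXT PRIMARY KEY, name TEXT)")

def create_user_once(facebook_id, name):
    """Insert the user only if no row with this facebook_id exists yet.

    Returns True if this call created the row, False if another
    worker (or an earlier task) got there first.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO users (facebook_id, name) VALUES (?, ?)",
        (facebook_id, name),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this in place it no longer matters which worker picks up the task: the second insert for FacebookID 666 is simply a no-op.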

Related

Redshift WLM: If super user is added to automatic WLM query group then how many slots would be given to that query group?

I have created a WLM query group named ETL and have added 2 superusers to that group. I am using automatic WLM, so I am not changing any concurrency-related settings.
One thing I am not clear about: how many slots would be assigned to this "ETL" group, given that it contains 2 superusers?
As per the AWS documentation, superusers have a separate queue, but what if the superusers are also part of another WLM group (ETL in this case)?
When I run select * from stl_wlm_query for the ETL WLM group, the result shows slot_count as 1; ideally it should be 5, yet the same WLM group is also running 5 queries in parallel.
Two different concepts are being confused here: users that have superuser ability, and the superuser queue. The only way a query runs in the superuser queue is if the user is a superuser AND they have set the property "query_group" to 'superuser'. See - https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-queue-assignment-rules.html
So your superusers' queries won't run in the superuser queue unless they run "set query_group to 'superuser';" in their session. If this is done, their queries will run in the superuser queue.

How to consolidate Django (different models) database and store at centralized place

I have created a Django-based webpage where employees of different vendor companies can log in and change their shift timing. (We currently control this job with a Linux script, but with a large user base of ~8k, doing it for all requests is difficult.)
To resolve this I created a Django webpage (6 separate models/DBs) and used the default SQLite DB.
Requirement:
The user is working on some application which needs to be controlled by updated shift timing on the portal.
Question:
How can I consolidate or store the DB data in a centralized place, so that if tomorrow I have to reset the timing for all users in the portal to a default (say, the general shift), I can do it in one operation?
I have the below ideas for doing this, but I'm not sure this is the best way to complete the work:
using a REST API to get the JSON data, or
manage.py dumpdata apple.CompanyName(Model) --indent 5
Any help/suggestions on this would be appreciated.
For the database you could use a hosted DB like the Heroku Postgres database if you're new to databases; otherwise you can run your own Postgres database on the server. As you mention there are ~8k users, it's not good to use the SQLite DB, since it is file-system based. To update the shift timing you can use the default Django admin. I am not sure about your model structure, but as long as you have the necessary validation logic, it can be updated from the admin at any time.
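Switching Django from SQLite to a centralized Postgres database is mostly a change in settings.py. A sketch of the DATABASES setting, where the name, user, password, and host are placeholders you would replace with your own:

```python
# settings.py fragment: point Django at a single shared Postgres instance
# instead of the default file-based SQLite database. All values below are
# placeholders for illustration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "shifts",                 # central database name
        "USER": "shifts_app",             # application role
        "PASSWORD": "change-me",          # keep real secrets out of settings.py
        "HOST": "db.example.internal",    # hosted DB (e.g. Heroku Postgres URL)
        "PORT": "5432",
    }
}
```

Once all app instances point at this one database, a bulk reset to the general shift becomes a single update there (e.g., via the Django admin or a management command) rather than a per-server script.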

Flask-Login user status monitoring

I'm developing a small website with Flask & Flask-Login. I need an admin view to monitor all users' online status. I added an is_online column to the user DB collection and try to update it, but I didn't find any callback for handling session expiry. How can I solve this, or is there a better idea?
Thanks!
FYI, Counting Online Users with Redis | Flask (A Python Microframework) - http://flask.pocoo.org/snippets/71/.
You could get away with checking whether the user's last activity time is older than the session lifetime. If that's the case, you go on and update their is_online status.
The way I handled the problem in my application: since I had a last_activity field in the DB for each user (to log when they did what), I could check that value against the session lifetime.
One very crude method is to maintain a separate stack that pushes and pops as users log in and out. Assuming that the session ID and user ID in your DB are not tied together (i.e., you have a separate session ID and user ID), you can maintain a ledger of sorts, pushing and popping as sessions are created and destroyed.
You will have to put special emphasis on users running multiple sessions on multiple devices, which is why I call this a rather crude method.
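The last-activity check from the first answer can be sketched as a small helper, assuming a last_activity datetime column and a fixed session lifetime (both names are assumptions; Flask's PERMANENT_SESSION_LIFETIME would be the natural source for the lifetime):

```python
from datetime import datetime, timedelta

# Assumed to match the app's session lifetime (e.g. PERMANENT_SESSION_LIFETIME).
SESSION_LIFETIME = timedelta(minutes=30)

def is_online(last_activity, now=None):
    """A user counts as online if their last activity falls within the
    session lifetime; otherwise the session has expired and the admin
    view should flip their is_online flag to False in the database."""
    if now is None:
        now = datetime.utcnow()
    return (now - last_activity) <= SESSION_LIFETIME
```

The admin view would run this check over all users on render (or in a periodic job), rather than waiting for a session-expiry callback that Flask-Login doesn't provide.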

Celery won't traverse Django foreign key relationships

I'm running two instances on Heroku:
1. Gunicorn
2. Celeryd
In my email templates I have something to the effect of:
orders.orderitem_set.all
When I render these emails via the web instance (ie. without Celery) I get a list of order items (as expected).
However, when I render these templates with Celery, the list is empty.
Why won't Celery traverse foreign key relationships in templates and how do I know what is in and out of scope?
Most likely the problem is that the Django database objects are stale at the moment the Celery task executes.
This problem was noted by Deni Bertovic in https://denibertovic.com/posts/celery-best-practices/
You shouldn't pass Database objects (for instance your User model) to a background task because the serialized object might contain stale data. What you want to do is feed the task the User id and have the task ask the database for a fresh User object.
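The fix is to pass the primary key to the task and re-fetch inside it. A minimal sketch of the pattern, with an in-memory dict standing in for the ORM; in the real task body you would call Order.objects.get(pk=order_id) and then order.orderitem_set.all() (the names here are stand-ins):

```python
# Stand-in for the database; in Django this would be the Order table.
ORDERS = {42: {"items": ["widget", "gadget"]}}

def fetch_order(order_id):
    """Stand-in for Order.objects.get(pk=order_id)."""
    return ORDERS[order_id]

def render_order_email(order_id):
    # The task receives only the id, never a serialized model instance.
    # Re-querying here means related rows (orderitem_set and friends)
    # reflect the database state at execution time, not at enqueue time.
    order = fetch_order(order_id)
    return "You ordered: " + ", ".join(order["items"])
```

In your case the serialized order was enqueued before its order items were committed, so the related set rendered empty; passing the id and re-fetching avoids that entirely.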

How can I limit database query time during web requests?

We've got a pretty typical django app running on postgresql 9.0. We've recently discovered some db queries that have run for over 4 hours, due to inefficient searches in the admin interface. While we plan to fix these queries, as a safeguard we'd like to artificially constrain database query time to 15 seconds--but only in the context of a web request; batch jobs and celery tasks should not be bounded by this constraint.
How can we do that? Or is it a terrible idea?
The best way to do this would be to set up a role/user that is only used to run the web requests, then set the statement_timeout on that role.
ALTER ROLE role_name SET statement_timeout = 15000
All other roles will use the global setting of statement_timeout (which is disabled in a stock install).
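If you'd rather scope the timeout in Django itself, the web app can set statement_timeout on just its own connections by passing a libpq startup option through psycopg2; batch jobs and Celery workers using different settings files (or the separate role) are unaffected. A sketch, where the engine path, database name, and credentials are placeholders (very old Django versions use the postgresql_psycopg2 engine name instead):

```python
# settings.py used ONLY by the web processes. The "options" string is
# passed to libpq at connection time, so every connection Django opens
# here starts with a 15-second statement timeout.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "web_user",  # web-only role, per the answer above
        "OPTIONS": {"options": "-c statement_timeout=15000"},
    }
}
```

Celery and batch jobs simply point at a settings module (or role) without this OPTIONS entry and keep the global, unlimited default.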
You will need to handle this manually. That is checking for the 15 second rule and killing the queries that violate it.
Query pg_stat_activity and find the violators and issue calls to pg_terminate_backend(procpid) to kill the offenders.
Something like this in a loop:
SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'TARGET_DB'
AND usename = 'WEBUSERNAME'
AND (now()-query_start) > '00:00:15';
As far as the timing goes, you could pass all of your queries through a class which, on instantiation, spawns two threads: one for the query, and one for a timer. If the timer reaches 15 seconds, kill the thread running the query.
As far as figuring out whether the query originated from a web request, I don't know enough about Django to help. Simplistically, the class that handles your database calls could take an optional constructor parameter such as context, which would be "http" in the event of a web request and "" for anything else.
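A rough sketch of the two-thread idea, assuming the query can be wrapped in a callable. Note that Python threads cannot actually be killed, so a real implementation would still need to terminate the query server-side (e.g., with pg_terminate_backend as in the other answer); this wrapper only stops the caller from waiting:

```python
import threading

def run_with_timeout(fn, timeout_s=15.0):
    """Run fn in a worker thread and wait at most timeout_s seconds.

    Raises TimeoutError if fn is still running when the deadline passes.
    The worker thread is a daemon, so an overrunning query does not keep
    the process alive, but the backend query itself is NOT cancelled.
    """
    result = {}

    def worker():
        result["value"] = fn()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        raise TimeoutError("query exceeded %s seconds" % timeout_s)
    return result["value"]
```

Given the caveat about orphaned backend queries, the role-level statement_timeout approach above is almost certainly the better safeguard; this is shown only to make the threading suggestion concrete.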