Improve performance of writing Django queryset to many-to-many field

I have a Django 1.11 app. There is a model Campaign where I can specify parameters to select Users. When I run a Campaign, I create a CampaignRun instance with an FK campaign_id and an M2M field users. Each time a Campaign is run, a different set of users can end up in the resulting queryset, so I'd like to keep a record of it. I do it as shown below:
run = CampaignRun.objects.create(campaign=self, ...)
(...)
filtered_users = User.objects.filter(email__in=used_emails)
run.users.add(*filtered_users) # I also tried run.users.set(filtered_users)
run.save()
However, it turns out that if the campaign is run from django-admin and the resulting number of users exceeds approximately 150, the process takes more than 30 seconds, which results in Error 502: Bad Gateway.
It seems to me that 150 is a ridiculously low number to hit a timeout, so I believe there must be plenty of room for optimizing the process. What can I do to improve it? What are my options in Django? Would you suggest a completely different approach (e.g. NoSQL)?
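One optimization that is often suggested for this kind of M2M write (a sketch only, with the through-model field names guessed, not a verified fix for this exact setup) is to fetch just the user primary keys and bulk-insert the join-table rows via the auto-generated through model, so the whole write becomes a single INSERT:

run = CampaignRun.objects.create(campaign=self)  # plus whatever other fields the run needs

# Fetch only primary keys instead of full User objects.
user_ids = User.objects.filter(email__in=used_emails).values_list('pk', flat=True)

# Insert the join-table rows in one query. The field names
# (campaignrun_id, user_id) are assumed from Django's default
# auto-generated through table and may differ in the real project.
ThroughModel = CampaignRun.users.through
ThroughModel.objects.bulk_create(
    [ThroughModel(campaignrun_id=run.pk, user_id=uid) for uid in user_ids]
)
# No run.save() is needed afterwards; the M2M rows live in the join table.

Note that writing through the through model bypasses m2m_changed signals, so this only helps if nothing in the project relies on them. It is also worth confirming that the 30 seconds are actually spent in add() and not in building used_emails.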

Related

Django: multiple admins modifying the same database

I'm a total noob in Django and just wondering whether it's possible for two admins to do the same thing at the same time. The only thing I got from the Django documentation is that it is possible to have two admins, but can those admins work on the same database at the same time?
Thanks for any help.
You didn't make it entirely clear what you actually want, but:
If by admin you mean a superuser, then yes, you can have as many admins as you want.
Admins can change anything in the database at the same time, but if you mean changing a specific row of a specific table at the same time, that doesn't really work, for these reasons:
Two saves can't truly happen at the same moment: when both admins save, the last write wins (the first save goes through too, but it is immediately overwritten by the later one).
And if a row holds important data, you should block any other access to that row until the first user has finished and saved their changes (imagine a ticket reservation website that has to stop other users from ordering the same ticket number until the current user completes or cancels the order), as sketched below.
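A minimal sketch of that kind of row locking in Django (assuming a hypothetical Ticket model with a reserved_by field) uses select_for_update inside a transaction:

from django.db import transaction

def reserve_ticket(ticket_id, user):
    with transaction.atomic():
        # The row stays locked until the transaction ends; any other
        # request calling select_for_update() on the same row waits here.
        ticket = Ticket.objects.select_for_update().get(pk=ticket_id)
        if ticket.reserved_by is None:
            ticket.reserved_by = user
            ticket.save()
            return True
        return False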
Also, if you mean two different Django projects using a single database, then that's another yes. They are basically like two different admins, and all of the above applies to them too.

Django, each user having their own table of a model

A little background: I've been developing the core code of an application in Python, and now I want to turn it into a website, so I've been learning Django and have run into a problem I'm not sure how to approach. I also have little experience with databases.
Each user would be able to populate their own list, each with the same attributes. The obvious solution seems to be to create a single model defining the attributes and so on, have users save records to it, and update the attribute values of those records very frequently (maybe every 5-10 seconds or so), using filters to narrow down to their user ID. Each user would add on average 4,000 records to this model, so for just 1,000 users the table would have 4 million rows, and for 10,000 users 40 million rows. It seems to me this would hurt the speed of content delivery a lot?
A faster solution, to my mind, would be to define the model and then give each user their own instance of this table of roughly 4,000 records. From what I'm learning this would use more memory and disk space, but a fast user experience is my primary goal.
Is this just my inexperience with databases talking? Or are my concerns warranted, and should I look for a way to do the latter?
This post asked the same question, I believe, but offers no solution on how to achieve it: How to create one Model (table) for each user on django?
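For reference, the shared-table approach described in the question usually looks something like this (a sketch with made-up field names); foreign key columns are indexed by default in Django, which is what keeps per-user filtering fast even at tens of millions of rows:

from django.conf import settings
from django.db import models


class Record(models.Model):
    # One table shared by all users; the ForeignKey column gets an index
    # by default, so per-user lookups stay fast as the table grows.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    value = models.FloatField()
    updated_at = models.DateTimeField(auto_now=True)


# Each user only ever sees their own rows:
# Record.objects.filter(user=request.user)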

Django Rest Framework metrics per user requests

I'm currently working on an API built with Django REST Framework, uWSGI, nginx and memcached.
I would like to know the best way to get per-user statistics, such as the number of requests each user makes, taking into consideration that the infrastructure will probably scale to multiple servers.
Also, is there a way to determine whether a response was served from the cache or from the application?
What I'm thinking of is processing the nginx logs to separate the requests by user and apply all the calculations.
First: You might find drf-tracking to be a useful project, but it stores the response to every request in the database, which we found kind of crazy.
The solution we developed instead is a mixin that borrows heavily from drf-tracking, but that only logs statistics. This solution uses our cache server (Redis), so it's very fast.
It's very simple to drop in if you're already using Redis:
from datetime import date

import redis
from django.conf import settings


class LoggingMixin(object):
    """Log requests to Redis.

    This draws inspiration from the code that can be found at:
    https://github.com/aschn/drf-tracking/blob/master/rest_framework_tracking/mixins.py

    The big distinctions, however, are that this code uses Redis for greater
    speed, and that it logs significantly less information.

    We want to know:
     - How many queries in last X days, total?
     - How many queries ever, total?
     - How many queries total made by user X?
     - How many queries per day made by user X?
    """
    def initial(self, request, *args, **kwargs):
        super(LoggingMixin, self).initial(request, *args, **kwargs)

        d = date.today().isoformat()
        user = request.user
        endpoint = request.resolver_match.url_name

        r = redis.StrictRedis(
            host=settings.REDIS_HOST,
            port=settings.REDIS_PORT,
            db=settings.REDIS_DATABASES['STATS'],
        )
        pipe = r.pipeline()

        # Global and daily tallies for all URLs.
        pipe.incr('api:v3.count')
        pipe.incr('api:v3.d:%s.count' % d)

        # Use a sorted set to store the user stats, with the score representing
        # the number of queries the user made total or on a given day.
        # (redis-py 2.x signature: zincrby(name, value, amount=1); on
        # redis-py 3+ the argument order is zincrby(name, amount, value).)
        pipe.zincrby('api:v3.user.counts', user.pk)
        pipe.zincrby('api:v3.user.d:%s.counts' % d, user.pk)

        # Use a sorted set to store all the endpoints with score representing
        # the number of queries the endpoint received total or on a given day.
        pipe.zincrby('api:v3.endpoint.counts', endpoint)
        pipe.zincrby('api:v3.endpoint.d:%s.counts' % d, endpoint)

        pipe.execute()
Put that somewhere in your project, and then add the mixin to your various views, like so:
class ThingViewSet(LoggingMixin, viewsets.ModelViewSet):
    # More stuff here.
    ...
Some notes about the class:
It uses Redis pipelines to make all of the Redis queries hit the server with one request instead of six.
It uses Sorted Sets to keep track of which endpoints in your API are being used the most and which users are using the API the most.
It creates a few new keys in your cache per day -- there might be better ways to do this, but I can't find any.
This should be a fairly flexible starting point for logging the API.
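For completeness, reading the tallies back out of Redis (for a dashboard view, say) is just a matter of querying the same keys. This sketch assumes the same settings names as above and the redis-py 2.x client used in the mixin:

import redis
from django.conf import settings

r = redis.StrictRedis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['STATS'],
)

# Total requests ever, across all users and endpoints.
total_requests = int(r.get('api:v3.count') or 0)

# Top 10 users by request count, returned as (member, score) pairs.
top_users = r.zrevrange('api:v3.user.counts', 0, 9, withscores=True)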
What I usually do is keep a centralized cache server (Redis) and log all requests there, with whatever custom calculations or fields I need. Then you can build your own dashboard.
OR
Go with Logstash from the Elasticsearch company. It's very well done, saves you time and scales very well. I'd say give it a shot: http://michael.bouvy.net/blog/en/2013/11/19/collect-visualize-your-logs-logstash-elasticsearch-redis-kibana/

Track values of model over time in Django

I'm building a Django application, and I would like to track certain model statistics over time (such as the number of registered users or the number of times a page has been edited). Is there an existing app that would do this for me, or would it be easier to roll one from scratch?
At the end of the day, I'm looking for something that can track unique values across different models over time.
The number of registered users is already available:
active_users = User.objects.filter(is_active=True).count()
For all users, active and inactive, it's just
total_users = User.objects.count()
To get the number of times a page has been edited, there are a couple of approaches: you could track and record each individual change (and count them) or just override the save method for the model to provide a counter of sorts.
def save(self, *args, **kwargs):
    self.counter += 1
    # Remember to call the parent save(), or nothing is written to the database.
    super().save(*args, **kwargs)
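If several people can edit at once, a variant of the same idea (a sketch; the Page model and counter field are assumed, as in the snippet above) pushes the increment into the database with an F() expression so concurrent saves don't lose counts:

from django.db.models import F

# Let the database do the increment atomically instead of
# read-modify-write in Python.
Page.objects.filter(pk=page.pk).update(counter=F('counter') + 1)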
If you want to record each individual change, use a versioning tool like django-reversion (it is under active development and has seen a ton of deployments).
You could use django-reversion for an audit trail.
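If you go the django-reversion route, registering the model and wrapping edits in a revision block is all it takes to start counting changes. This is a sketch based on the library's documented register/create_revision API; the exact calls may differ between versions:

import reversion
from django.db import models


@reversion.register()
class Page(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()


# In the view that edits a page, wrap the save so a version is recorded:
#     with reversion.create_revision():
#         page.save()
#
# The number of edits is then the number of stored versions:
#     from reversion.models import Version
#     Version.objects.get_for_object(page).count()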

Django: Skipping model validation

I'm using the development server (runserver) in Django and it's started to annoy me that Django is validating the models every time I save a file and the server restarts. I have ~8000 entries in my SQLite3 database and it takes up to five seconds to validate the models. I'm not familiar with how Django validates the models, but I'm guessing it's proportional to the size of the database somehow.
So is there any way to tell Django not to validate the models? The ideal thing would be being able to tell Django to validate the models only on the first start and not on any automatic restarts due to changes in the Python files.
Django doesn't do any model-level validation at all, and it certainly doesn't scan your database on startup.
The only validation it does on startup is to check the syntax of your models code, and that's not at all proportional to your database size.
I have over 10 million rows in my database, and runserver takes less than a second.
Try hitting Ctrl+C while it is in those five seconds, and see where the code is when the KeyboardInterrupt is raised.