Getting random object of a model with django-rest-framework - django

In my Django project I need to provide a view to get a random object from a model using django-rest-framework. I had this ListAPIView:
class RandomObject(generics.ListAPIView):
    queryset = MyModel.objects.all().order_by('?')[:1]
    serializer_class = MyModelSerializer
    ...
It worked fine, but order_by('?') takes a lot of time when run on a big database. So I decided to use plain Python random instead.
import random

def pick_random_object():
    return random.randrange(1, MyModel.objects.all().count() + 1)

class RandomObject(generics.ListAPIView):
    queryset = MyModel.objects.all().filter(id=pick_random_object())
    ...
I found a strange thing when I tried to use this. I launched the Django development server and sent some GET requests, but I got absolutely the same object for all of the requests. When the dev server was restarted and another set of requests was sent, I got another object, but still absolutely the same one for all requests, even when random.seed() was used first of all. Meanwhile, when I tried to get a random object not via REST but via python manage.py shell, I got random objects every time I called pick_random_object().
So everything looks good when using the shell, and the behavior is strange when using REST, and I have no clue what's wrong.
Everything was executed on the Django development server (python manage.py runserver).

As @CarltonGibson noticed, queryset is an attribute of the RandomObject class. Hence it is cached and cannot be changed later. So if you want a queryset that changes on every request (like getting a random object each time) in some APIView, you must override the get_queryset() method. So instead of
class RandomObject(generics.ListAPIView):
    queryset = MyModel.objects.all().filter(id=pick_random_object())
    ...
you should write something like this:
class RandomObject(generics.ListAPIView):
    # queryset = MyModel.objects.all().filter(id=pick_random_object())
    def get_queryset(self):
        return MyModel.objects.all().filter(id=pick_random_object())
Here pick_random_object() is a method that gets a random id from the model.

Since it's an attribute of the class, your queryset is getting evaluated and cached when the class is loaded, i.e. when you start the dev server.
I'd try pulling a list of primary keys using values_list(); the flat=True example does exactly what you need. Ideally, cache that list. Pick a primary key at random and then use that to get() the actual object when you need it.
So, how would that go?
I'd define a method on the view. If you forget the caching, the implementation might go like this:
# Let's use this...
from random import choice

def random_MyModel(self):
    """Method of RandomObject to pick a random MyModel."""
    pks = MyModel.objects.values_list('pk', flat=True).order_by('id')
    random_pk = choice(pks)
    return MyModel.objects.get(pk=random_pk)
You might then want to cache the first lookup here. The caching docs linked above explain how to do that. If you do cache the result, look into the db.models signals to know when to invalidate: I guess you'd use post_save, checking the created flag, and post_delete.
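A minimal sketch of that caching-and-invalidation idea, using Django's cache framework (the cache key name here is illustrative):

from random import choice
from django.core.cache import cache
from django.db.models.signals import post_save, post_delete

PK_CACHE_KEY = 'mymodel_pk_list'  # illustrative key name

def random_MyModel(self):
    """Method of RandomObject; caches the pk list between requests."""
    pks = cache.get(PK_CACHE_KEY)
    if pks is None:
        pks = list(MyModel.objects.values_list('pk', flat=True))
        cache.set(PK_CACHE_KEY, pks)
    return MyModel.objects.get(pk=choice(pks))

def invalidate_pk_cache(sender, **kwargs):
    # a new row (post_save with created=True) or any deletion makes
    # the cached pk list stale; plain updates don't change the pks
    if kwargs.get('created', True):
        cache.delete(PK_CACHE_KEY)

post_save.connect(invalidate_pk_cache, sender=MyModel)
post_delete.connect(invalidate_pk_cache, sender=MyModel)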
I hope that helps.

Related

How to disable Django REST Framework caching?

I'm just starting to work with Django and DRF, and I've run into a problem: it looks like DRF caches responses. I mean, I can change an object, create a new one, or delete it, and DRF keeps responding as if nothing had changed. For example, I create an object, but the ModelViewSet still returns data in which this object is not present. Yet if I request that object directly, it shows that it was created. And so on with any other action. I can't find a topic about caching in DRF, and it doesn't look like I have any Django caching middleware, so I have no idea what is going on.
Only one thing helps: restarting the server (I'm using the default dev server).
One more thing: all the data is fine when it's rendered by Django views, not DRF views.
Here is one of the serializers/ModelViewSets that I'm using. It is as simple as possible. Also, I'm not using any Django cache backends; at least, I don't have any in my settings.
class WorkOperationSerializer(serializers.ModelSerializer):
    class Meta:
        model = WorkOperation

class WorkOperationAPIView(viewsets.ModelViewSet):
    serializer_class = WorkOperationSerializer
    queryset = WorkOperation.objects.all()

    def get_queryset(self):
        return self.queryset
You can read here about django queryset caching. The best advice seems to be: re-run the .all() method to get fresh results. Using object.property may give you cached results.
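Following that advice, the fix for the view above is to build the queryset anew on each request instead of returning the class attribute that was evaluated once at load time:

class WorkOperationAPIView(viewsets.ModelViewSet):
    serializer_class = WorkOperationSerializer

    def get_queryset(self):
        # re-run .all() on every request to get fresh results
        return WorkOperation.objects.all()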

best practice to override saving method in django model to be async using celery

I am building a cloud system. I have two apps: a server app which includes the full functionality, and a client app which includes only the input method. I am installing the client app at the customer's branch as a local app.
I want to override the models in the apps so that after a model is saved locally, I call a Celery task to add this model to a queue, to make sure it arrives even if the internet is down; I will retry until the internet is back up.
Now I want the best practice for a generic way to do this for any model.
I have two options:
1. Overriding the save method, like this:
def save(self, *args, **kwargs):
    super(Model, self).save(*args, **kwargs)
    save_task.delay(self)
2. Or using signals, like this:
post_save.connect(save_task.delay, sender=Model)
Which one is the best practice, and can I make it generic for all the models of this project?
.save() is just a sequence of steps, with signals emitted along the way. Here's a shortened version of the process from the documentation:
1. Emit a pre-save signal. [...]
2. Pre-process the data. [...] Most fields do no pre-processing [...] only used on fields that have special behavior [...] the documentation doesn't yet include a list of all the fields with this "special behavior."
3. Prepare the data for the database. Each field is asked to provide its current value in a data type that can be written to the database. Most fields require no data preparation [...] integers and strings are 'ready to write' as a Python object [...] complex data types often require some modification. [...]
4. Insert the data into the database. [...]
5. Emit a post-save signal. [...]
In your case, you're not doing anything in the middle of that process. You only need to do it after the model has already been saved. So there's no need to use signals.
Now what you're actually asking is how to make sure a task will eventually be executed. Well:
I'm pretty sure you can solve this using celery
You should hook the applications up to a single db if you can; don't save things locally and then update a server, as that could turn ugly.
But if you truly think there's a fair chance of the internet going down or anything like that, and you're sure there's no better way to link your apps, I would suggest you add a new model that keeps track of what's been updated. Something like this:
from django.apps import apps

class Track(models.Model):
    modelname = models.CharField(max_length=20)
    f_pk = models.IntegerField()
    sent = models.BooleanField()

    def get_obj(self):
        try:
            # we want to do modelname.objects.get(pk=self.f_pk); self.modelname
            # is a string, so resolve it to the model class first
            # (assumes the tracked models live in one app, here called 'myapp')
            model = apps.get_model('myapp', self.modelname)
            return model.objects.get(pk=self.f_pk)
        except Exception:
            return False
Notice how I'm not linking it to a certain model but rather giving it the tools to fetch any model you damn well please. Then, for each model you want to keep track of, you add this:
class MyModel(models.Model):
    ...
    def save(self, *args, **kwargs):
        super(MyModel, self).save(*args, **kwargs)
        t = Track(modelname=self.__class__.__name__, f_pk=self.pk, sent=False)
        t.save()
Then schedule a task that fetches the Track objects with sent=False and tries to send them:
unsent = Track.objects.filter(sent=False)
for t in unsent:
    obj = t.get_obj()
    # check if this object exists on the server too
    # if so:
    t.sent = True
    t.save()
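To tie this back to Celery, that loop could live in a task that retries itself while the connection is down; a rough sketch, where send_to_server() is a placeholder for however you push data to the server app:

from celery import shared_task

@shared_task(bind=True, default_retry_delay=60)
def push_unsent(self):
    """Try to push every unsent tracked object to the server."""
    for t in Track.objects.filter(sent=False):
        obj = t.get_obj()
        if obj is False:
            continue  # the local row has vanished; nothing to send
        try:
            send_to_server(obj)  # placeholder for your actual transport
        except Exception as exc:
            raise self.retry(exc=exc)  # connection down, try again later
        t.sent = True
        t.save()

You would schedule push_unsent with celery beat, or call push_unsent.delay() from the save() override instead of save_task.delay(self).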
P.S.
Remember how I mentioned things could get ugly? It's been moments since I posted this, and I already see how. Notice how I use a pk and modelname to figure out whether a model is saved in both places? But pks are (by default in Django) an auto-incremented field. If the application runs in two places, or even if you run it locally and something goes wrong once, the pks can quickly get out of sync.
Say I save one object; it gets a pk of 1 on both local and server.
local            server
name  pk     ++  name  pk
obj1  1      ++  obj1  1
Then I save another one, but the internet goes down.
local            server
name  pk     ++  name  pk
obj1  1      ++  obj1  1
obj2  2      ++
Next time it's up, I add a new object, but this happens before the scheduled task runs. So now my local db has 3 objects and my server has 2, and they have different pks, get it?
local            server
name  pk     ++  name  pk
obj1  1      ++  obj1  1
obj2  2      ++  obj3  2
obj3  3      ++
And after the scheduled task runs, we'll have this:
local            server
name  pk     ++  name  pk
obj1  1      ++  obj1  1
obj2  2      ++  obj3  2
obj3  3      ++  obj2  3
See how easily this can get out of hand? To fix this, each tracked model will have to have some sort of unique identifier, and you'll need to somehow tell the Track model how to follow it. It's a headache. Better not to save things locally, but to link everything together.
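If you do go down this road anyway, the usual fix for the pk drift shown above is to give every tracked model a globally unique identifier and match on that instead of the pk; a sketch (the field name guid is illustrative):

import uuid
from django.db import models

class MyModel(models.Model):
    # a uuid survives pk renumbering and is identical on both sides
    guid = models.CharField(max_length=36, unique=True, editable=False)

    def save(self, *args, **kwargs):
        if not self.guid:
            self.guid = str(uuid.uuid4())
        super(MyModel, self).save(*args, **kwargs)

Track would then store the guid instead of f_pk, and both databases would agree on which object is which.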

Setting the authenticated user on a Django model

I have a number of models that need to refer back to the user that created/updated them. Generally this just involves passing request.user to the relevant attribute; however, I'd like to make this automatic if possible.
There's an extension for Doctrine (a PHP ORM) called Blameable that will set a reference to the currently authenticated user when persisting a model instance, e.g.:
class Post
{
    /**
     * Will set this to the authenticated User on the first persist($model)
     * @ORM\ManyToOne(targetEntity="User", inversedBy="posts")
     * @Gedmo\Blameable(on="create")
     */
    private $createdBy;

    /**
     * Sets this to the authenticated User on the first and subsequent persists
     * @ORM\ManyToOne(targetEntity="User")
     * @Gedmo\Blameable(on="update")
     */
    private $updatedBy;
}
To get the same functionality in Django, my first thought was to try and use pre_save signal hooks to emulate this; however, I'd need to access the request outside of a view function (which looks possible with some middleware, but a bit hacky).
Is there something similar already available for Django? Am I better off explicitly passing the authenticated user?
The level of decoupling Django has makes it impossible to automatically set the user in a model instance.
The middleware solution is the way to go. When I need to do this, I just add to the save() method, like so:
class MyObject(models.Model):
    def save(self, *args, **kwargs):
        if not self.created_by:
            self.created_by = get_requests().user  # get_requests() comes from the middleware
        super(MyObject, self).save(*args, **kwargs)
as for the "hackyness" of storing the requests in a global dictionary, I think you'll get over it. Someone once said of this pattern, "It's the worst one, except for all the others".
P.S. You'll also find it really useful if you want to use django.contrib.messages from deep within your code.

Django: Editable BINARY field which displays as HEX in Admin

Update: I have now figured out that there is a reason to define get_prep_value() and that doing so improves Django's use of the field. I have also been able to get rid of the wrapper class. All this has, finally, enabled me to also eliminate the __getattribute__ implementation in the data model, which was annoying. So, apart from Django calling to_python() very often, I'm now fine as far as I can see. /update
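For the curious, a field along the lines the update describes might look like the sketch below (Django 1.4 / Python 2); the length-based test in to_python() for telling raw bytes from hex strings is an assumption of this sketch, not necessarily the exact solution:

import binascii
from django.db import models

class MyHexField(models.CharField):
    """Accepts hex strings of max_length; stores BINARY(max_length / 2)."""
    __metaclass__ = models.SubfieldBase

    def db_type(self, connection):
        return 'binary(%s)' % (self.max_length // 2)

    def to_python(self, value):
        # raw bytes from the DB are half the hex length; hex strings
        # (from the admin, or repeated to_python() calls) are full length
        if isinstance(value, str) and len(value) == self.max_length // 2:
            return binascii.b2a_hex(value)
        return value

    def get_prep_value(self, value):
        # hex string -> raw bytes on the way to the DB
        return binascii.a2b_hex(value)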
One morning, you wake up and find yourself using Django 1.4.2 along with DjangoRESTFramework 2.1.2 on Python 2.6.8. And hey, things could definitely be worse. This Django admin magic provides you with forms for your easily specified relational data model, making it a pleasure to maintain the editorial part of your database. Your business logic behind the RESTful URLs accesses both the editorial data and specific database tables for their needs, and even those are displayed in the Django admin, partially because it's easily done and nice to have, partially because some automatically generated records require a mini workflow.
But wait. You still haven't implemented those binary fields as BINARY. They're VARCHARS. You had put that on your ToDo list for later. And later is now.
Okay, there are those write-once-read-many-times cases with small table sizes where an optimization would not necessarily pay off. But in another case, you're wasting both storage and performance due to frequent INSERTs and DELETEs in a table which will get large.
So what would you want to have? A clear mapping between the DB and Django, where the DB stores BINARY and Django deals with hex strings of twice the length. Can't be that hard to achieve, can it?
You search the Web and find folks who want CHAR instead of VARCHAR, others who want BLOBs, and everybody seems to do it a bit differently. Finally, you end up at Writing custom model fields, where the VARCHAR -> CHAR case is officially dealt with. So you decide to go with this information.
Starting with __init__(), db_type() and to_python(), you notice that to_python() rarely gets called and add __metaclass__ = models.SubfieldBase, only to find that Django now calls to_python() even if it has already done so. The other suggestions on the page suddenly start to make more sense, so you're going to wrap your data in a class, such that you can protect it from repeated calls to to_python(). You also follow the suggestion to put a __str__() or __unicode__() method on the class you're wrapping up as a field, and implement get_prep_value().
While the resulting code does not do what you expect, one thing you notice is that get_prep_value() never gets called so far, so you remove it for now. What you do figure out is that Django consistently appears to get a str from the DB and a unicode from the admin, which is cool, and you end up with something like this (boiled down to essentials, really):
import binascii

class MyHexWrapper(object):
    def __init__(self, hexstr):
        self.hexstr = hexstr

    def __len__(self):
        return len(self.hexstr)

    def __str__(self):
        return self.hexstr

class MyHexField(models.CharField):
    __metaclass__ = models.SubfieldBase

    def __init__(self, max_length, *args, **kwargs):
        assert max_length % 2 == 0
        self.max_length = max_length
        super(MyHexField, self).__init__(max_length=max_length, *args, **kwargs)

    def db_type(self, connection):
        return 'binary(%s)' % (self.max_length // 2)

    def to_python(self, data):
        if isinstance(data, MyHexWrapper):  # protect object
            return data
        if isinstance(data, str):  # binary string from DB side
            return MyHexWrapper(binascii.b2a_hex(data))
        if isinstance(data, unicode):  # unicode hex string from admin
            return MyHexWrapper(data)
And... it won't work. The reason, of course, is that while you have found a reliable way to create MyHexWrapper objects from all sources, including Django itself, the path backwards is clearly missing. From the remark above, you were thinking that Django calls str() or unicode() for the admin and get_prep_value() in the direction of the DB. But if you add get_prep_value() above, it never gets called, and there you are, stuck.
That can't be, right? So you're not willing to give up easily. And suddenly you get this one nasty thought, and you're making a test, and it works. And you don't know whether you should laugh or cry.
So now you try this modification, and, believe it or not, it just works.
class MyHexWrapper(object):
    def __init__(self, hexstr):
        self.hexstr = hexstr

    def __len__(self):
        return len(self.hexstr)

    def __str__(self):  # called on its way to the DB
        return binascii.a2b_hex(self.hexstr)

    def __unicode__(self):  # called on its way to the admin
        return self.hexstr
It just works? Well, if you use such a field in code, like for a RESTful URL, then you'll have to make sure you have the right kind of string; that's a matter of discipline.
But then, it still only works most of the time. Because when you make such a field your primary key, Django will call quote(getattr()), and while I found a source claiming that getattr() "nowadays" will use unicode(), I can't confirm that. But that's not a serious obstacle once you've got this far, eh?
class MyModel(models.Model):
    myhex = MyHexField(max_length=32, primary_key=True, editable=False)
    # other fields

    def __getattribute__(self, name):
        if name == 'myhex':
            return unicode(super(MyModel, self).__getattribute__(name))
        return super(MyModel, self).__getattribute__(name)
Works like a charm. However, now you lean back and look at your solution as a whole. And you can't help noticing that it is a diversion from the documentation you referred to, that it relies on undocumented or internal behavioural characteristics you did not intend to rely on, and that it is error-prone and shows poor usability for the developer due to the somewhat distributed nature of what you have to implement and obey.
So how can the objective be achieved in a cleaner way? Is there another level with hooks and magic in Django where this mapping should be located?
Thank you for your time.

Multiple Databases in Django 1.0.2 with custom manager

I asked this in the users group with no response, so I thought I would try here.
I am trying to set up a custom manager to connect to another database on the same server as my default MySQL connection. I have tried following the examples here and here, but have had no luck. I get an empty tuple when returning MyCustomModel.objects.all().
Here is what I have in manager.py
from django.db import models
from django.db.backends.mysql.base import DatabaseWrapper
from django.conf import settings

class CustomManager(models.Manager):
    """
    This Manager lets you set the DATABASE_NAME on a per-model basis.
    """
    def __init__(self, database_name, *args, **kwargs):
        models.Manager.__init__(self, *args, **kwargs)
        self.database_name = database_name

    def get_query_set(self):
        qs = models.Manager.get_query_set(self)
        qs.query.connection = self.get_db_wrapper()
        return qs

    def get_db_wrapper(self):
        # Monkeypatch the settings file. This is not thread-safe!
        old_db_name = settings.DATABASE_NAME
        settings.DATABASE_NAME = self.database_name
        wrapper = DatabaseWrapper()
        wrapper._cursor(settings)
        settings.DATABASE_NAME = old_db_name
        return wrapper
and here is what I have in models.py:
from django.db import models
from myproject.myapp.manager import CustomManager

class MyCustomModel(models.Model):
    field1 = models.CharField(max_length=765)
    attribute = models.CharField(max_length=765)
    objects = CustomManager('custom_database_name')

    class Meta:
        abstract = True
But if I run MyCustomModel.objects.all() I get an empty list. I am pretty new at this stuff, so I am not sure if this works with 1.0.2. I am going to look into the Manager code to see if I can figure it out, but I am just wondering if I am doing something wrong here.
UPDATE:
This is now in Django trunk and will be part of the 1.2 release:
http://docs.djangoproject.com/en/dev/topics/db/multi-db/
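For reference, with that official support, the settings grow a second database alias and queries can be routed to it explicitly (the alias 'custom' here is illustrative):

# settings.py (Django 1.2+)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'default_database_name',
    },
    'custom': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'custom_database_name',
    },
}

# in code: route a queryset to the second database
MyCustomModel.objects.using('custom').all()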
You may want to speak to Alex Gaynor, as he is adding MultiDB support, and it's pegged for possible release in Django 1.2. I'm sure he would appreciate feedback and input from those that are going to be using MultiDB. There are discussions about it on the django-developers mailing list. His MultiDB branch may even be usable, I'm not sure.
Since I guess you probably can't wait, and if the MultiDB branch isn't usable, here are your options.
Follow Eric Florenzano's method (linked below), bearing in mind that it's not supported and new releases of Django may break it. Also, some comments suggest it has already been broken. This is going to be hacky.
Your other option would be to use a totally different database access method for one of your databases. Perhaps SQLAlchemy for one and the Django ORM for the other. I'm going by the guess that one is likely to be more Django-centric and the other is a legacy database.
To summarise: I think hacking MultiDB into Django is probably the wrong way to go unless you're prepared to keep up with maintaining your hacks later on. Therefore I think another ORM or database access method would give you the cleanest route, as then you are not going outside supported features, and at the end of the day, it's all just Python.
My company has had success using multiple databases by closely following this blog post: http://www.eflorenzano.com/blog/post/easy-multi-database-support-django/
This probably isn't the answer you're looking for, but it's probably best if you move everything you need into the one database.