For some reason there needs to be a database table whose fields get updated: attempt, success and failure. I thought it would be better to do this differently with the Django ORM, but it needs to be this way.
The problem, however, is that I get an array of arrays with data in it, which I will need to parse. How do I go about this when there is not yet an entry in the database for these fields?
models.py
class SomeData(models.Model):
    attempt = models.PositiveIntegerField(null=True)
    success = models.PositiveIntegerField(null=True)
    failure = models.PositiveIntegerField(null=True)
views.py
class PutSomeData(CreateOrUpdateAPIView):
    model = OtherModel

    def post(self, *args, **kwargs):
        data = self.request.data
        for k in data:
            entry = OtherModel(
                field1=k[0],
                field2=k[1],
                field3=k[2]
            )
            entry.save()
            count = SomeData.objects.all()
            if not count:
                attempt, success, failure = 0, 0, 0
                data = SomeData(
                    attempt=attempt,
                    success=success,
                    failure=failure
                )
                data.save()
            else:
                data = SomeData.objects.last()
                if 'attempt' in k[2]:
                    data.attempt += 1
                elif 'success' in k[2]:
                    data.success += 1
                else:
                    data.failure += 1
                data.save()
I was thinking of something like this for now, but it is of course a stupid thing to do; besides that, it always skips the first element of the array and is thus inaccurate. It is just what I have for now, and I do not know how to make it better and more elegant. Any thoughts?
Edit: to be a bit more clear: the problem is that there are no initial values yet, otherwise I could just increment the fields like I already do. As it is, I first have to check whether an entry exists (otherwise it will complain about NoneType not having 'attempt', of course).
You could create a migration with initial data, so that you'll be sure the entry exists.
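A minimal sketch of what that data migration could look like (the app label and the migration dependency below are placeholders for your own):
from django.db import migrations

def create_initial_row(apps, schema_editor):
    # Use the historical model, as recommended for data migrations.
    SomeData = apps.get_model('myapp', 'SomeData')  # 'myapp' is a placeholder
    if not SomeData.objects.exists():
        SomeData.objects.create(attempt=0, success=0, failure=0)

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),  # placeholder: your latest migration
    ]
    operations = [
        migrations.RunPython(create_initial_row),
    ]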
Also, for a table of global values that will only ever have one row, I would suggest a key/value table (or similar), so that when you need to add new global values you won't need a schema migration.
I have a simple Task model:
class Task(models.Model):
    name = models.CharField(max_length=255)
    order = models.IntegerField(db_index=True)
And a simple task_create view:
def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    Task.objects.filter(order__gte=order).update(order=F('order') + 1)
    new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
The view shifts existing tasks that come after the newly created one by +1, then creates the new task.
There are lots of users of this method, and I suppose the ordering will go wrong one day, because the update and the create should definitely be performed together.
So I just want to be sure: will this be enough to avoid any data corruption?
from django.db import transaction

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
1) Probably something more should be done in the task creation line, like a select_for_update before the filter on the existing Task.objects?
2) Does it matter where the return HttpResponse() is located? Inside the transaction block or outside?
Big thx
1) Probably something more should be done in the task creation line, like a select_for_update before the filter on the existing Task.objects?
No - what you have currently looks fine and should work the way you want it to.
2) Does it matter where the return HttpResponse() is located? Inside the transaction block or outside?
Yes, it does matter. You need to return a response to the client regardless of whether your transaction was successful or not - so it definitely needs to be outside of the transaction block. If you did it inside the transaction, the client would get a 500 Server Error if the transaction failed.
However, if the transaction fails, you will not have a new task ID and cannot return it in your response. So you probably need to return different responses depending on whether the transaction succeeds, e.g.:
from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(
            order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
except IntegrityError:
    # Transaction failed - return a response notifying the client
    return HttpResponse('Failed to create task, please try again!')

# If it succeeded, then return a normal response
return HttpResponse(new_task.id)
You could also try to change your model so you don't need to update so many other rows when inserting a new one.
For example, you could try something resembling a double-linked list.
(I used long explicit names for fields and variables here).
# models.py
class Task(models.Model):
    name = models.CharField(max_length=255)
    task_before_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        related_name='task_before_this_one_set')
    task_after_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        related_name='tasks_after_this_one_set')
Your task at the top of the queue would be the one that has the field task_before_this_one set to null. So to get the first task of the queue:
# these will throw exceptions if there are many instances
first_task = Task.objects.get(task_before_this_one=None)
last_task = Task.objects.get(task_after_this_one=None)
When inserting a new instance, you just need to know after which task it should be placed (or, alternatively, before which task). This code should do that:
def task_create(request):
    new_task = Task.objects.create(
        name=request.POST.get('name'))
    task_before = get_object_or_404(
        Task,
        pk=request.POST.get('task_before_the_new_one'))
    task_after = task_before.task_after_this_one

    # modify the 2 other tasks
    task_before.task_after_this_one = new_task
    task_before.save()
    if task_after is not None:
        # 'task_after' will be None if 'task_before' is the last one in the queue
        task_after.task_before_this_one = new_task
        task_after.save()

    # update the newly created task
    new_task.task_before_this_one = task_before
    new_task.task_after_this_one = task_after  # this could be None
    new_task.save()
    return HttpResponse(new_task.pk)
This method only updates 2 other rows when inserting a new row. You might still want to wrap the whole method in a transaction if there is really high concurrency in your app, but that transaction will only lock up to 3 rows, not all the others as well; a sketch of that variant follows below.
This approach might be of use to you if you have a very long list of tasks.
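A minimal sketch of that transactional variant (same names as the view above; select_for_update() is just one way to take the neighbour's row lock up front):
from django.db import transaction
from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def task_create(request):
    # The three writes commit or roll back together; only the new task
    # and its two neighbours are ever touched.
    with transaction.atomic():
        new_task = Task.objects.create(name=request.POST.get('name'))
        task_before = get_object_or_404(
            Task.objects.select_for_update(),
            pk=request.POST.get('task_before_the_new_one'))
        task_after = task_before.task_after_this_one
        task_before.task_after_this_one = new_task
        task_before.save()
        if task_after is not None:
            task_after.task_before_this_one = new_task
            task_after.save()
        new_task.task_before_this_one = task_before
        new_task.task_after_this_one = task_after
        new_task.save()
    return HttpResponse(new_task.pk)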
EDIT: how to get an ordered list of tasks
This cannot be done at the database level in a single query (as far as I know), but you could try this function:
def get_ordered_task_list():
    # get the first task
    aux_task = Task.objects.get(task_before_this_one=None)
    task_list = []
    while aux_task is not None:
        task_list.append(aux_task)
        aux_task = aux_task.task_after_this_one
    return task_list
As long as you only have a few hundred tasks, this operation should not take so much time that it impacts the response time. But you will have to try that out for yourself, in your environment, with your database and your hardware.
While creating a front end for a Django module I faced the following problem inside Django core:
In order to display a link to the next/previous object from a model query, we can use the extra instance methods of a model instance, get_next_by_FIELD() and get_previous_by_FIELD(), where FIELD is a model field of type DateField or DateTimeField.
Let's explain it with an example:
from django.db import models

class Shoe(models.Model):
    created = models.DateTimeField(auto_now_add=True, null=False)
    size = models.IntegerField()
A view to display a list of shoes, excluding those where size equals 4:
def list_shoes(request):
    shoes = Shoe.objects.exclude(size=4)
    return render_to_response('list_shoes.html', {  # hypothetical template name
        'shoes': shoes
    })
And let the following be a view to display one shoe and the corresponding links to the previous and next shoe.
def show_shoe(request, shoe_id):
    shoe = Shoe.objects.get(pk=shoe_id)
    prev_shoe = shoe.get_previous_by_created()
    next_shoe = shoe.get_next_by_created()
    return render_to_response('show_shoe.html', {
        'shoe': shoe,
        'prev_shoe': prev_shoe,
        'next_shoe': next_shoe
    })
Now I have the situation that the show_shoe view displays the links to the previous/next shoe regardless of the shoe's size. But I actually want just shoes whose size is not 4.
Therefore I tried to use the **kwargs argument of the get_(previous|next)_by_created() methods to filter out the unwanted shoes, as stated by the documentation:
Both of these methods will perform their queries using the default manager for the model. If you need to emulate filtering used by a custom manager, or want to perform one-off custom filtering, both methods also accept optional keyword arguments, which should be in the format described in Field lookups.
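Standard lookups do indeed work like that; for example, this (with the Shoe model above) returns the previous shoe whose size is below 4:
prev_shoe = shoe.get_previous_by_created(size__lt=4)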
Edit: Keep an eye on the word "should", because by that reading (size__ne=4) should also work, but it doesn't.
The actual problem
Filtering using the lookup size__ne ...
def show_shoe(request, shoe_id):
    ...
    prev_shoe = shoe.get_previous_by_created(size__ne=4)
    next_shoe = shoe.get_next_by_created(size__ne=4)
    ...
... didn't work; it throws FieldError: Cannot resolve keyword 'size_ne' into field.
Then I tried to use a negated complex lookup using Q objects:
from django.db.models import Q

def show_shoe(request, shoe_id):
    ...
    prev_shoe = shoe.get_previous_by_created(~Q(size=4))
    next_shoe = shoe.get_next_by_created(~Q(size=4))
    ...
... didn't work either; it throws TypeError: _get_next_or_previous_by_FIELD() got multiple values for argument 'field', because the get_(previous|next)_by_created methods only accept **kwargs.
The actual solution
Since these instance methods use _get_next_or_previous_by_FIELD(self, field, is_next, **kwargs) under the hood, I changed it to also accept positional arguments using *args and passed them to the filter, like the **kwargs.
from django.db.models import Q
from django.utils.encoding import force_text

def my_get_next_or_previous_by_FIELD(self, field, is_next, *args, **kwargs):
    """
    Workaround to call get_next_or_previous_by_FIELD with complex lookups
    using Django's Q class. The only difference from the original version
    is that positional arguments are also passed to the filter function.
    """
    if not self.pk:
        raise ValueError("get_next/get_previous cannot be used on unsaved objects.")
    op = 'gt' if is_next else 'lt'
    order = '' if is_next else '-'
    param = force_text(getattr(self, field.attname))
    q = Q(**{'%s__%s' % (field.name, op): param})
    q = q | Q(**{field.name: param, 'pk__%s' % op: self.pk})
    qs = self.__class__._default_manager.using(self._state.db).filter(*args, **kwargs).filter(q).order_by('%s%s' % (order, field.name), '%spk' % order)
    try:
        return qs[0]
    except IndexError:
        raise self.DoesNotExist("%s matching query does not exist." % self.__class__._meta.object_name)
And calling it like:
...
prev_shoe = shoe.my_get_next_or_previous_by_FIELD(Shoe._meta.get_field('created'), False, ~Q(size=4))
next_shoe = shoe.my_get_next_or_previous_by_FIELD(Shoe._meta.get_field('created'), True, ~Q(size=4))
...
finally did it.
Now the question to you
Is there an easier way to handle this? Should shoe.get_previous_by_created(size__ne=4) work as expected or should I report this issue to the Django guys, in the hope they'll accept my _get_next_or_previous_by_FIELD() fix?
Environment: Django 1.7, haven't tested it on 1.9 yet, but the code for _get_next_or_previous_by_FIELD() stayed the same.
Edit: It is true that complex lookups using Q objects are not part of "field lookups"; they belong to the filter() and exclude() functions instead. So I am probably wrong to suppose that get_next_by_FIELD should accept Q objects too. But since the changes involved are minimal and the advantage of accepting Q objects is high, I think these changes should go upstream.
You can create a custom lookup ne and use it:
.get_next_by_created(size__ne=4)
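A minimal sketch of such a lookup, using the custom-lookup API added in Django 1.7 (registering it on Field makes __ne available on every field type):
from django.db.models import Field, Lookup

class NotEqual(Lookup):
    lookup_name = 'ne'

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return '%s <> %s' % (lhs, rhs), params

Field.register_lookup(NotEqual)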
I suspect the method you tried first only takes lookup arguments for the field you're basing the get_next on, meaning you won't be able to access the size field from the get_next_by_created() method, for example.
Edit: your method is by far the more efficient one, but to answer your question about the Django issue, I think everything is working the way it is supposed to. You could offer an additional method such as yours, but the existing get_next_by_FIELD works as described in the docs.
You've managed to work around this with a working method, which is OK I guess, but if you wanted to reduce the overhead, you could try a simple recursive walk:
def get_next_by_field_filtered(obj, field=None, **kwargs):
    # kwargs are treated as equality conditions the next object must satisfy;
    # raises DoesNotExist once the end of the table is reached
    next_obj = getattr(obj, 'get_next_by_{}'.format(field))()
    for key in kwargs:
        if not getattr(next_obj, str(key)) == kwargs[str(key)]:
            return get_next_by_field_filtered(next_obj, field=field, **kwargs)
    return next_obj
This isn't very efficient but it's one way to do what you want.
Hope this helps!
I have a very large database (6 GB) that I would like to use Django-REST-Framework with. In particular, I have a model that has a ForeignKey relationship to the django.contrib.auth.models.User table (not so big) and a ForeignKey to a BIG table (let's call it Products). The model can be seen below:
class ShoppingBag(models.Model):
    user = models.ForeignKey('auth.User', related_name='+')
    product = models.ForeignKey('myapp.Product', related_name='+')
    quantity = models.SmallIntegerField(default=1)
Again, there are 6GB of Products.
The serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.RelatedField(many=False)
    user = serializers.RelatedField(many=False)

    class Meta:
        model = ShoppingBag
        fields = ('product', 'user', 'quantity')
So far this is great: I can do a GET on the list and on individual shopping bags, and everything is fine. For reference, the queries (using a query logger) look something like this:
SELECT * FROM myapp_product WHERE product_id=1254
SELECT * FROM auth_user WHERE user_id=12
SELECT * FROM myapp_product WHERE product_id=1404
SELECT * FROM auth_user WHERE user_id=12
...
And so on, for as many shopping bags as are returned.
But I would also like to be able to POST to create new shopping bags, and serializers.RelatedField is read-only. Let's make it read-write:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.PrimaryKeyRelatedField(many=False)
    user = serializers.PrimaryKeyRelatedField(many=False)
    ...
Now things get bad... GET requests to the list action take > 5 minutes and I noticed that my server's memory jumps up to ~6GB; why?! Well, back to the SQL queries and now I see:
SELECT * FROM myapp_product;
SELECT * FROM auth_user;
Ok, so that's not good. Clearly we're doing "prefetch related" or "select_related" or something like that in order to get access to all the products; but this table is HUGE.
Further inspection reveals where this happens on Line 68 of relations.py in DRF:
def initialize(self, parent, field_name):
    super(RelatedField, self).initialize(parent, field_name)
    if self.queryset is None and not self.read_only:
        manager = getattr(self.parent.opts.model, self.source or field_name)
        if hasattr(manager, 'related'):  # Forward
            self.queryset = manager.related.model._default_manager.all()
        else:  # Reverse
            self.queryset = manager.field.rel.to._default_manager.all()
If not readonly, self.queryset = ALL!!
So I'm pretty sure this is where my problem is, and I need some way to say "don't fetch everything here", but I'm not 100% sure whether this is the issue or where to deal with it. It seems like everything should be memory-safe with pagination, but that is simply not the case. I'd appreciate any advice.
In the end, we had to create our own PrimaryKeyRelatedField subclass to override the default behavior in Django-Rest-Framework. Basically we ensured that the queryset was empty until we wanted to look up the object, then we performed the lookup. This was extremely annoying, and I hope the Django-Rest-Framework maintainers take note of it!
Our final solution:
class ProductField(serializers.PrimaryKeyRelatedField):
    many = False

    def __init__(self, *args, **kwargs):
        kwargs['queryset'] = Product.objects.none()  # Hack to ensure ALL products are not loaded
        super(ProductField, self).__init__(*args, **kwargs)

    def field_to_native(self, obj, field_name):
        return unicode(obj)

    def from_native(self, data):
        """
        Perform the query lookup here.
        """
        try:
            return Product.objects.get(pk=data)
        except Product.DoesNotExist:
            msg = self.error_messages['does_not_exist'] % smart_text(data)
            raise ValidationError(msg)
        except (TypeError, ValueError):
            msg = self.error_messages['incorrect_type'] % type(data)
            raise ValidationError(msg)
And then our serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = ProductField()
    ...
This hack ensures the entire table isn't loaded into memory, and instead performs one-off selects based on the data. It's not as computationally efficient, but it also doesn't hammer our server with multi-second database queries whose results get loaded into memory!
I want a primary key for my model to be unsigned. Therefore I do something like this:
class MyModel(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
This gets me an UNSIGNED column in the resulting MySQL table, which is what I want. However, I believe I will not get the automatic assignment of id each time I create a new object, will I? That seems to require AutoField instead. Problem is, AutoField is signed. Is there a way to create an unsigned AutoField?
The actual type of the field is specified in the backend. In the case of MySQL, the backend is django.db.backends.mysql. This extract from django/db/backends/mysql/creation.py shows this translation:
class DatabaseCreation(BaseDatabaseCreation):
    # This dictionary maps Field objects to their associated MySQL column
    # types, as strings. Column-type strings can contain format strings; they'll
    # be interpolated against the values of Field.__dict__ before being output.
    # If a column type is set to None, it won't be included in the output.
    data_types = {
        'AutoField': 'integer AUTO_INCREMENT',
        'BooleanField': 'bool',
        'CharField': 'varchar(%(max_length)s)',
        ...
To change that, you can either monkey-patch this dict:
from django.db.backends.mysql.creation import DatabaseCreation
DatabaseCreation.data_types['AutoField'] = 'integer UNSIGNED AUTO_INCREMENT'
Or create your own field class, so you don't mess with the other AutoFields:
from django.db.models.fields import AutoField

class UnsignedAutoField(AutoField):
    def get_internal_type(self):
        return 'UnsignedAutoField'

from django.db.backends.mysql.creation import DatabaseCreation
DatabaseCreation.data_types['UnsignedAutoField'] = 'integer UNSIGNED AUTO_INCREMENT'
And then create your own PKs:
id = UnsignedAutoField(primary_key=True)
As it descends from AutoField, it will inherit all of its behavior.
Edit: Just to be clear, neither of the solutions written by me or Simanas should be used in real-world projects. I wrote this as an example of which direction to go if you decided to avoid the DBMS's built-in way, not as a completed model ready to be used.
I am sorry for writing an answer instead of a comment on the post made by Simanas, but I do not have enough reputation to post one, and I feel it's needed, as this question ranks pretty high for 'django autofield unsigned integer' keywords.
Using his method is not reliable, as it can produce a previously used integer for a new row if earlier objects get deleted. Here's a modified version:
from django.db import IntegrityError, models
import re

def next_id():
    try:
        # Find the ID of the last object
        last_row = MyModel.objects.order_by('-id')[0]
        return last_row.id + 1
    except IndexError:
        # No objects exist in the database so far
        return 1

class MyModel(models.Model):
    id = models.PositiveIntegerField(primary_key=True, default=next_id)

    def save(self, *args, **kwargs):
        while True:
            try:
                super(MyModel, self).save(*args, **kwargs)
                break
            except IntegrityError as e:
                # 1062 is MySQL's "duplicate entry" error code
                if e.args[0] == 1062 and re.match(
                        "^Duplicate entry '.*' for key '%s'$"
                        % re.escape(self._meta.pk.name), e.args[1]):
                    self.id = next_id()
                else:
                    raise
While this would work, it wouldn't know whether a newly assigned ID was previously used for another object (in case the newest objects get deleted) and may lead to ID reuse in such cases. It is also more portable across databases than Augusto's answer, which is MySQL-specific, although the error-code check for 1062 in save() is still MySQL-specific and would need adapting for other backends.
Another caveat of this method is that if you have another application hooking into the same database, it will have to provide the ID on INSERTs, as auto-increment is not done at the database level.
You almost certainly don't want to do it this way.
I'm trying to write an internal API in my application without necessarily coupling it with the database.
class Product(models.Model):
    name = models.CharField(max_length=4000)
    price = models.IntegerField(default=-1)
    currency = models.CharField(max_length=3, default='INR')

class Image(models.Model):
    # NOTE -- Have changed the table name to products_images
    width = models.IntegerField(default=-1)
    height = models.IntegerField(default=-1)
    url = models.URLField(max_length=1000, verify_exists=False)
    product = models.ForeignKey(Product)

def create_product():
    p = Product()
    i = Image(height=100, width=100, url='http://something/something')
    p.image_set.add(i)
    return p
Now, when I call create_product() Django throws up an error:
IntegrityError: products_images.product_id may not be NULL
However, if I call p.save() and i.save() before calling p.image_set.add(i), it works. Is there any way to add objects to a related object set without saving both to the DB first?
def create_product():
    product_obj = Product.objects.create(name='Foobar')
    image_obj = Image.objects.create(height=100, width=100, url='http://something/something', product=product_obj)
    return product_obj
Explanation:
The Product object has to be created first and then assigned to the Image object, because the product field on Image is required.
I am wondering why you would not want to make the product entry in the DB in the first place. If there is a specific reason, I may be able to suggest a workaround.
EDIT: Okay, I think I get you: you don't want to assign a product to an image object initially. How about making the product field nullable?
product = models.ForeignKey(Product, null=True)
Now, your function becomes something like this:
def create_product():
    image_obj = Image.objects.create(height=100, width=100, url='http://something/something')
    return image_obj
Hope it helps!
I ran into the same issue as @Saurabh Nanda.
I am using Django 1.4.2. Reading the Django source, I see:
# file django/db/models/fields/related.py
def get_query_set(self):
    try:
        return self.instance._prefetched_objects_cache[rel_field.related_query_name()]
    except (AttributeError, KeyError):
        db = self._db or router.db_for_read(self.model, instance=self.instance)
        return super(RelatedManager, self).get_query_set().using(db).filter(**self.core_filters)

# file django/db/models/query.py
qs = getattr(obj, attname).all()
qs._result_cache = vals
# We don't want the individual qs doing prefetch_related now, since we
# have merged this into the current work.
qs._prefetch_done = True
obj._prefetched_objects_cache[cache_name] = qs
That makes sense: we only need to set the _prefetched_objects_cache property on the object.
p = Product()
image_cached = []
for i in xrange(100):
    image = Image(height=100, width=100, url='http://something/something')
    image_cached.append(image)
qs = p.image_set.all()
qs._result_cache = image_cached
qs._prefetch_done = True
# the cache key is the related_query_name, 'image' for the model above
p._prefetched_objects_cache = {'image': qs}
Your problem is that the id isn't set by Django, but by the database (it's represented in the database by an auto-incremented field), so until the object is saved there is no id. More about this in the documentation.
I can think of three possible solutions:
Set a different field of your Image model as the primary key (documented here).
Set a different field of your Product model as the foreign key (documented here).
Use Django's database transactions API (documented here).
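For illustration, option 3 could look roughly like this (a minimal sketch; transaction.atomic() exists from Django 1.6 onwards, older versions would use commit_on_success instead):
from django.db import transaction

def create_product():
    # Save both objects atomically: if the Image insert fails,
    # the Product insert is rolled back as well.
    with transaction.atomic():
        p = Product.objects.create(name='Foobar')
        i = Image.objects.create(height=100, width=100,
                                 url='http://something/something', product=p)
    return p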