Django custom model Manager database error

I'm trying to build a custom model manager, but have run into an error. The code looks like this:
class LookupManager(models.Manager):
    def get_options(self, *args, **kwargs):
        return [(t.key, t.value)
                for t in Lookup.objects.filter(group=args[0].upper())]

class Lookup(models.Model):
    group = models.CharField(max_length=1)
    key = models.CharField(max_length=1)
    value = models.CharField(max_length=128)

    objects = LookupManager()
(I have played around with get_options quite a lot using super() and other ways to filter the results)
When I run syncdb, I get the following error (ops_lookup being the corresponding table):
django.db.utils.DatabaseError: no such table: ops_lookup
I noticed that if I change the manager to return [] instead of a filter, then syncdb works. Also, if syncdb has already been run and all the tables exist, then changing the code back to the above works as well.
How can I get Django to not expect this table to exist when running syncdb for the first time?
Update
After looking through the traceback I realised what was happening. The lookup table is meant to contain values which populate the choices of some columns in other tables. I think what happens is that the manager gets called when the other tables are created which, it seems, happens before the lookup table is created.
Is there any way to force Django to create the lookup table first (short of renaming it)?

What's happening is that you're trying to access the database during module load time. For example:
class MyModel(models.Model):
    name = models.CharField(max_length=255)

class OtherModel(models.Model):
    some_field = models.CharField(
        max_length=255,
        # The next line fails on syncdb because the database table hasn't been
        # created yet, but the model is queried at module load time (during
        # class definition).
        choices=[(o.pk, o.name) for o in MyModel.objects.all()]
    )
This is equivalent to what you're doing because, as you've stated, you're using the manager method (transitively) to generate choices for other models.
Replacing the list comprehension with a generator expression will return an iterable, but will not evaluate the filtered queryset until the first iteration. So, this would fix the above example:
choices=((o.pk, o.name) for o in MyModel.objects.all())
Using your example, it would be:
class LookupManager(models.Manager):
    def get_options(self, *args, **kwargs):
        return ((t.key, t.value) for t in Lookup.objects.filter(group=args[0].upper()))
(Note the use of ( and ) instead of [ and ] for the outer brackets; that is the syntax for creating a generator expression.)
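For illustration, a quick sketch of how another model could consume those choices without triggering a query at import time (the Widget model and the 'W' group value are assumptions, not from the question):

class Widget(models.Model):
    status = models.CharField(
        max_length=1,
        # No database access happens at class-definition time; the generator
        # is only consumed when the choices are actually iterated.
        choices=Lookup.objects.get_options('W'),
    )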

Related

The ModelResource Meta field import_id_fields doesn't seem to work as expected

I'm building a web app using the Django framework and the django-import-export package.
I would like to import data from files and want to prevent importing it twice to the DB.
For this, I used import_id_fields when declaring the resource class, but it doesn't seem to work as expected.
1- The first time I import the file, everything works fine and rows are created in the DB.
2- The second time I import the same file, new rows are also created in the DB (here is the problem; this is not supposed to happen).
3- The third time I import the same file, I get errors and no rows are added to the DB.
So I would like to know whether this is normal behavior, and if it is, how I can stop the import at point 2 and show the errors.
You can find below portions from the code and the error messages.
# resources.py
class OfferingResource(ModelResource):
    ACCESSIONNUMBER = Field(attribute='company',
                            widget=ForeignKeyWidget(Company, 'accession_number'))
    quarter = Field(attribute='quarter')
    # other fields ...

    class Meta:
        model = Offering
        import_id_fields = ('ACCESSIONNUMBER', 'quarter')

    def before_import_row(self, row, row_number=None, **kwargs):
        Company.objects.get_or_create(accession_number=row.get('ACCESSIONNUMBER'))

# models.py
class Company(models.Model):
    accession_number = models.CharField(max_length=25, primary_key=True)

    def __str__(self):
        return self.accession_number

class Offering(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE, to_field='accession_number')
    quarter = models.CharField(max_length=7)
    # other fields ...

    def __str__(self):
        return f'{self.company}'
<- error messages ->
1 get() returned more than one Offering -- it returned 2!
2 get() returned more than one Offering -- it returned 2!
...
22 get() returned more than one Offering -- it returned 2!
You are on the right track but your implementation is not quite right.
You are right to use import_id_fields. The fields you declare here must uniquely identify a single instance of a row object.
The object instance is identified using:
return self.get_queryset().get(**params)
where each param has a key of your attribute field name, and the value is the cleaned value taken from the import file.
Therefore, this line won't work:
import_id_fields = ('ACCESSIONNUMBER', 'quarter')
because you don't have an attribute on Offering called ACCESSIONNUMBER. To fix this, declare your field using this syntax:
accession_number = Field(attribute='company__accession_number', column_name='ACCESSIONNUMBER')
(You need to follow the model relation to refer to accession_number. I assume your CSV import column name is ACCESSIONNUMBER.)
If this combination of fields does not uniquely identify a single instance, then you will see the error:
get() returned more than one Offering -- it returned 2!
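Putting the pieces together, a rough sketch of the resource might look like this (the accession_number field name, the extra FK-setting field, and the adjusted import_id_fields are assumptions for illustration rather than a verified solution; check them against the django-import-export docs):

class OfferingResource(ModelResource):
    # Identifies existing rows by following the relation to the company key.
    accession_number = Field(attribute='company__accession_number',
                             column_name='ACCESSIONNUMBER', readonly=True)
    # Still sets the foreign key itself when a row is imported.
    company = Field(attribute='company', column_name='ACCESSIONNUMBER',
                    widget=ForeignKeyWidget(Company, 'accession_number'))
    quarter = Field(attribute='quarter')

    class Meta:
        model = Offering
        import_id_fields = ('accession_number', 'quarter')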

How to update a model field based on another field of the same model

class PurchaseOrder(models.Model):
    purchase_order_id = models.AutoField(primary_key=True)
    purchase_order_number = models.CharField(unique=True)
    vendor = models.ForeignKey(Vendor)
I am creating a purchase order (PO) table. When a PO is created, I have to update purchase_order_number to "PO0" + purchase_order_id, e.g. PO0123 (123 is the primary key). So I am using def save() in the model to accomplish this:
def save(self):
    if self.purchase_order_id is not None:
        self.purchase_order_number = "PO" + str(self.purchase_order_id)
    return super(PurchaseOrder, self).save()
It works fine for single creation, but when I try to create data in bulk using Locust (a testing tool), it gives a duplicate entry error for PurchaseOrderNumber. Can we modify the field value in the model itself, something like this?
purchase_order_number = models.CharField(unique=True, default=("PO" + self.purchase_order_id))
To be honest, I don't think it should work when you create multiple instances. Because as I can see from the code:
if self.purchase_order_id is not None:
    self.purchase_order_number = "PO" + str(self.purchase_order_id)
Here purchase_order_id will be None when you are creating a new instance. Also, until you call super(PurchaseOrder, self).save(), purchase_order_id will not be generated, meaning purchase_order_number will be empty.
So, what I would recommend is to not store this information in DB. Its basically the same as purchase_order_id with PO in front of it. Instead you can use a property method to get the same value. Like this:
class PurchaseOrder(models.Model):
    purchase_order_id = models.AutoField(primary_key=True)
    # need to remove `purchase_order_number = models.CharField(unique=True)`
    ...

    @property
    def purchase_order_number(self):
        return "PO{}".format(self.purchase_order_id)
Then you can access purchase_order_number like this:
p = PurchaseOrder.objects.first()
p.purchase_order_number
The downside of this solution is that you can't query on the property field. But I don't think that would be necessary anyway, because you can run the same query against purchase_order_id, i.e. PurchaseOrder.objects.filter(purchase_order_id=1).
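For example, since the number is just "PO" plus the primary key, a lookup by number can be rewritten against purchase_order_id (a small sketch; stripping the prefix like this is an assumption based on the format above):

number = "PO123"
# Strip the "PO" prefix and query the real column instead of the property.
order = PurchaseOrder.objects.get(purchase_order_id=int(number[2:]))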

Getting bigint id fields when using Django and Postgres?

Django's ORM uses an integer datatype for the automatically created ID column, but I need it to be a bigint (using the Postgres backend). Is there a way of doing this?
Django 1.10
Use the newly added BigAutoField
A 64-bit integer, much like an AutoField except that it is guaranteed
to fit numbers from 1 to 9223372036854775807.
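A minimal sketch of using it as the primary key (the model and field names are illustrative):

class MyModel(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=100)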
Older versions of Django
You need to create your model like this:
class MyModel(models.Model):
    id = models.BigIntegerField(primary_key=True)
    name = models.CharField(max_length=100)
Then, after ./manage.py makemigrations has been run, open the generated migration and add the following to the operations list:
migrations.RunSQL("CREATE SEQUENCE myapp_seq"),
migrations.RunSQL("ALTER TABLE myapp_mymodel ALTER COLUMN id SET DEFAULT NEXTVAL('myapp_seq')");
Update
A valid point was raised by Daniel Roseman in the comments. In PostgreSQL, the following query works:
INSERT INTO myapp_mymodel(name) values('some name');
but the following doesn't, because the explicit NULL is rejected by the NOT NULL primary key column (the sequence default only applies when the column is omitted):
INSERT INTO myapp_mymodel(id, name) values(null,'some name');
Unfortunately, it's the second form of the query that Django generates. This can still be solved with a bit of work:
from django.db import connection

def save(self, *args, **kwargs):
    if not self.id:
        # Pull the next value from the sequence ourselves, so Django never
        # sends an explicit NULL for the primary key.
        cursor = connection.cursor()
        cursor.execute("SELECT NEXTVAL('myapp_seq')")
        self.id = cursor.fetchone()[0]
    super(MyModel, self).save(*args, **kwargs)

Is it possible to write a QuerySet method that modifies the dataset but delays evaluation (similar to prefetch_related)?

I'm working on a QuerySet class that does something similar to prefetch_related, but allows the query to link data that's in an unconnected database (basically, linking records from the Django app's database to records in a legacy system using a shared unique key), something along the lines of:
class UserFoo(models.Model):
    ''' Uses the django database & can link to User model '''
    user = models.OneToOneField(User, related_name='userfoo')
    foo_record = models.CharField(
        max_length=32,
        db_column="foo",
        unique=True
    )  # uuid pointing to legacy db table

    @property
    def foo(self):
        if not hasattr(self, '_foo'):
            self._foo = Foo.objects.get(uuid=self.foo_record)
        return self._foo

    @foo.setter
    def foo(self, foo_obj):
        self._foo = foo_obj
and then
class Foo(models.Model):
    '''Uses legacy database'''
    id = models.AutoField(primary_key=True)
    uuid = models.CharField(max_length=32)  # uuid for Foo legacy db table
    ...

    @property
    def user(self):
        if not hasattr(self, '_user'):
            self._user = User.objects.get(userfoo__foo_record=self.uuid)
        return self._user

    @user.setter
    def user(self, user_obj):
        self._user = user_obj
Run normally, a query that matches 100 foos (each with, say, 1 user record) will end up requiring 101 queries: one to get the foos, and a hundred more for the user records (one lookup per foo, triggered by calling the user property on each one).
To get around this, I am making something similar to prefetch_related which pulls all of the matching records for a query by the key, which means I just need one additional query to get the remaining records.
My code looks something like this:
class FooWithUserQuerySet(models.query.QuerySet):
    def with_foo(self):
        qs = self._clone()
        foo_idx = {}
        for record in self.all():
            foo_idx.setdefault(record.uuid, []).append(record)
        users = User.objects.filter(
            userfoo__foo_record__in=foo_idx.keys()
        ).select_related('django', 'relations', 'here')
        user_idx = {}
        for user in users:
            user_idx[user.userfoo.foo_record] = user
        for fid, frecords in foo_idx.items():
            user = user_idx.get(fid)
            for frecord in frecords:
                if user:
                    setattr(frecord, 'user', user)
        return qs
This works, but any extra data saved to a foo is lost if the query is later modified — that is, if the queryset is re-ordered or filtered in any way.
I would like a way to create a method that does exactly what I am doing now, but defers the work until the queryset is evaluated (and re-applies it whenever it is re-evaluated), so that foo records always have a User record.
Some notes:
the example has been highly simplified. There are actually a lot of tables that link up to the legacy data, and so for example although there is a one-to-one relationship between Foo and User, there will be some cases where a queryset will have multiple Foo records with the same key.
the legacy database is on a different server and server platform, so I can't link the two tables using a database server itself
ideally I'd like the User data to be cached, so that even if the records are sorted or sliced I don't have to re-run the foo query a second time.
Basically, I don't know enough about the internals of how the lazy evaluation of querysets works in order to do the necessary coding. I have jumped back and forth on the source code for django.db.models.query but it really is a fairly dense read and I'm hoping someone out there who's worked with this already can offer some pointers.
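One possible direction, sketched under the assumption of a Django version where QuerySet._fetch_all and _clone are the evaluation and cloning hooks (this mirrors how prefetch_related defers its extra query; the flag name and hook usage here are illustrative, not a tested solution):

class FooWithUserQuerySet(models.query.QuerySet):

    def with_foo(self):
        # Only record the intent; the extra query runs at evaluation time.
        qs = self._clone()
        qs._attach_users = True
        return qs

    def _clone(self, **kwargs):
        # Keep the flag alive through filter()/order_by()/slicing clones.
        qs = super(FooWithUserQuerySet, self)._clone(**kwargs)
        qs._attach_users = getattr(self, '_attach_users', False)
        return qs

    def _fetch_all(self):
        newly_fetched = self._result_cache is None
        super(FooWithUserQuerySet, self)._fetch_all()
        if newly_fetched and getattr(self, '_attach_users', False):
            foo_idx = {}
            for record in self._result_cache:
                foo_idx.setdefault(record.uuid, []).append(record)
            users = User.objects.filter(
                userfoo__foo_record__in=foo_idx.keys()
            ).select_related('userfoo')
            for user in users:
                for frecord in foo_idx.get(user.userfoo.foo_record, []):
                    frecord.user = user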

Django aggregate multiple columns after arithmetic operation

I have a really strange problem with Django 1.4.4.
I have this model :
class LogQuarter(models.Model):
    timestamp = models.DateTimeField()
    domain = models.CharField(max_length=253)
    attempts = models.IntegerField()
    success = models.IntegerField()
    queue = models.IntegerField()
    ...
I need to gather the first 20 domains with the highest sent value, where sent is attempts - queue.
This is my request:
obj = LogQuarter.objects\
    .aggregate(Sum(F('attempts')-F('queue')))\
    .values('domain')\
    .filter(**kwargs)\
    .order_by('-sent')[:20]
I tried with extra too and it isn't working.
It's really basic SQL, I am surprised that Django can't do this.
Does someone have a solution?
You can actually do this via subclassing some of the aggregation functionality. This requires digging into the code to really understand, but here's what I coded up to do something similar for MAX and MIN. (Note: this code is based on Django 1.4 / MySQL.)
Start by subclassing the underlying aggregation class and overriding the as_sql method. This method writes the actual SQL to the database query. We have to make sure to quote the field that gets passed in correctly and associate it with the proper table name.
from django.db.models.sql import aggregates

class SqlCalculatedSum(aggregates.Aggregate):
    sql_function = 'SUM'
    sql_template = '%(function)s(%(field)s - %(other_field)s)'

    def as_sql(self, qn, connection):
        # self.col is currently a tuple, where the first item is the table name
        # and the second item is the primary column name. Assuming our
        # calculation is on two fields in the same table, we can use that to our
        # advantage. qn is the underlying DB quoting object and quotes things
        # appropriately. The column entry in the self.extra var is the actual
        # database column name for the secondary column.
        self.extra['other_field'] = '.'.join(
            [qn(c) for c in (self.col[0], self.extra['column'])])
        return super(SqlCalculatedSum, self).as_sql(qn, connection)
Next, subclass the general model aggregation class and override the add_to_query method. This method is what determines how the aggregate gets added to the underlying query object. We want to be able to pass in the field name (e.g. queue) but get the corresponding DB column name (in case it is something different).
from django.db import models

class CalculatedSum(models.Aggregate):
    name = SqlCalculatedSum

    def add_to_query(self, query, alias, col, source, is_summary):
        # Utilize the fact that self.extra is set to all of the extra kwargs
        # passed in on initialization. We want to get the corresponding database
        # column name for whatever field we pass in to the "variable" kwarg.
        self.extra['column'] = query.model._meta.get_field(
            self.extra['variable']).db_column
        query.aggregates[alias] = self.name(
            col, source=source, is_summary=is_summary, **self.extra)
You can then use your new class in an annotation like this:
queryset.annotate(calc_attempts=CalculatedSum('attempts', variable='queue'))
Assuming your attempts and queue fields have those same db column names, this should generate SQL similar to the following:
SELECT SUM(`LogQuarter`.`attempts` - `LogQuarter`.`queue`) AS calc_attempts
And there you go.
I am not sure you can do Sum(F('attempts') - F('queue')); it should throw an error in the first place. I guess an easier approach would be to use extra:
result = LogQuarter.objects.extra(select={'sent':'(attempts-queue)'}, order_by=['-sent'])[:20]
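For what it's worth, on newer Django versions (1.8+, beyond the Django 1.4 the question targets) the original idea does work with annotate and F expressions; a hedged sketch:

from django.db.models import F, Sum

top_domains = (LogQuarter.objects
               .filter(**kwargs)
               .values('domain')
               .annotate(sent=Sum(F('attempts') - F('queue')))
               .order_by('-sent')[:20])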