Maintain uniqueness on django-localized-fields - django

I'm trying to avoid storing duplicate localized items in a Django REST Framework app that uses the django-localized-fields package with a PostgreSQL database, but I can't find any way to make this work.
(https://pypi.org/project/django-localized-fields/)
I've tried writing custom duplicate detection logic in the serializer, which works for create, but on update the localized fields become null (they are required fields, so I receive a not-null constraint error). It seems to be a django-localized-fields utility that is causing this problem.
The serializer runs correctly for both create and update when I'm not overriding create/update by defining them separately in the serializer.
I've also tried adding unique options to the model, using both the standard unique approaches and the method in the django-localized-fields documentation (uniqueness=['en', 'ro']), but duplicates are still created.
I've also tried Django's UniqueTogetherValidator, which doesn't seem to support HStore/localized fields either.
I'd appreciate some help in tracking down how to either fix the update in the serializer or place a unique constraint in the database. Since django-localized-fields uses hstore in PostgreSQL, maintaining uniqueness must be a common enough problem for applications using hstore.
For those who aren't familiar, Hstore stores items as key/value pairs within a database. Here's an example of how django-localized-fields stores language data within the database:
"en"=>"english word!", "es"=>"", "fr"=>"", "frqc"=>"", "fr-ca"=>""

django-localized-fields only constrains uniqueness within the same language. If you want to ensure that values in one row don't collide with values in any other row, you have to validate them at both the Django and the database level.
Validation in Django
In Django you can create a custom function, validate_hstore_uniqueness, which is called every time the model is validated:
from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _

def validate_hstore_uniqueness(obj, field_name):
    value_dict = getattr(obj, field_name)
    cls = obj.__class__
    values = list(value_dict.values())

    # Find all existing objects whose values overlap the new ones.
    duplicate_objs = cls.objects.filter(**{field_name + '__values__overlap': values})
    if obj.pk:
        duplicate_objs = duplicate_objs.exclude(pk=obj.pk)

    if len(duplicate_objs):
        # Extract the duplicated values.
        existing_values = []
        for obj2 in duplicate_objs:
            existing_values.extend(getattr(obj2, field_name).values())
        duplicate_values = list(set(values) & set(existing_values))

        # Raise an error for the field.
        raise ValidationError({
            field_name: ValidationError(
                _('Values %(values)s already exist.'),
                code='unique',
                params={'values': duplicate_values},
            ),
        })
class Test(models.Model):
    slug = LocalizedField(blank=True, null=True, required=False)

    def validate_unique(self, exclude=None):
        super().validate_unique(exclude)
        validate_hstore_uniqueness(self, 'slug')
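Note that DRF serializers don't call Model.full_clean(), so the model-level check above won't run automatically on API writes. A minimal sketch of one way to wire it in (the serializer name and field handling are illustrative assumptions, not part of django-localized-fields):
from django.core.exceptions import ValidationError as DjangoValidationError
from rest_framework import serializers

class TestSerializer(serializers.ModelSerializer):
    class Meta:
        model = Test
        fields = '__all__'

    def validate(self, attrs):
        # Apply the incoming values to an unsaved instance (reusing the
        # current instance on update) so validate_unique() sees the final
        # state, then translate the Django error into a DRF one.
        instance = self.instance if self.instance is not None else Test()
        for field, value in attrs.items():
            setattr(instance, field, value)
        try:
            instance.validate_unique()
        except DjangoValidationError as exc:
            raise serializers.ValidationError(exc.message_dict)
        return attrs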
Constraint in DB
For a DB-level constraint you have to use a constraint trigger:
from django.db import connection

def slug_uniqueness_constraint(apps, schema_editor):
    print('Recreating trigger quotes.slug_uniqueness_constraint')
    # Define the trigger.
    trigger_sql = """
    -- slug_hstore_unique
    CREATE OR REPLACE FUNCTION slug_uniqueness_constraint() RETURNS TRIGGER
    AS $$
    DECLARE
        duplicate_count INT;
    BEGIN
        EXECUTE format('SELECT count(*) FROM %I.%I ' ||
                       'WHERE id != $1 AND avals("slug") && avals($2)',
                       TG_TABLE_SCHEMA, TG_TABLE_NAME)
        INTO duplicate_count
        USING NEW.id, NEW.slug;

        IF duplicate_count > 0 THEN
            RAISE EXCEPTION 'Duplicate slug value %', avals(NEW.slug);
        END IF;

        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS slug_uniqueness_constraint ON quotes_author;
    CREATE CONSTRAINT TRIGGER slug_uniqueness_constraint
        AFTER INSERT OR UPDATE OF slug ON quotes_author
        FOR EACH ROW EXECUTE PROCEDURE slug_uniqueness_constraint();
    """
    cursor = connection.cursor()
    cursor.execute(trigger_sql)
And enable it in migrations:
class Migration(migrations.Migration):
    dependencies = [
        ('quotes', '0031_auto_20200109_1432'),
    ]

    operations = [
        migrations.RunPython(slug_uniqueness_constraint),
    ]
It's probably a good idea to also create a GIN index to speed up the lookups:
CREATE INDEX ON test_table using GIN (avals("slug"));
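If you want that index managed by migrations as well, here is a small sketch in the style of the trigger migration above (the index name and the quotes_author table are illustrative):
from django.db import connection

def slug_gin_index(apps, schema_editor):
    # Companion data-migration step; CREATE INDEX IF NOT EXISTS
    # requires PostgreSQL 9.5+.
    with connection.cursor() as cursor:
        cursor.execute(
            'CREATE INDEX IF NOT EXISTS quotes_author_slug_avals_gin '
            'ON quotes_author USING GIN (avals("slug"))'
        )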

Related

django DateField model -- unable to find the date differences

I'm unable to find the difference between two dates in my form.
models.py:
class Testing(models.Model):
    Planned_Start_Date = models.DateField()
    Planned_End_Date = models.DateField()
    Planned_Duration = models.IntegerField(default=Planned_Start_Date - Planned_End_Date)
The difference between the two dates has to be calculated and stored in the database, but it doesn't work.
default is evaluated once, at class level, so you can't use it to compute a per-instance value. You should override the model's save() method (or better, implement a pre_save signal handler) to populate the field just before the object is saved:
def save(self, **kwargs):
    # Subtracting two dates yields a timedelta; .days converts it to the
    # int that the IntegerField expects.
    self.Planned_Duration = (self.Planned_End_Date - self.Planned_Start_Date).days
    super().save(**kwargs)
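For the pre_save alternative mentioned above, a minimal sketch (the receiver function name is illustrative):
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=Testing)
def set_planned_duration(sender, instance, **kwargs):
    # Populate the computed column just before the row is written.
    delta = instance.Planned_End_Date - instance.Planned_Start_Date
    instance.Planned_Duration = delta.days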
But why save a computed property to the database at all? The column is unnecessary, both for querying (you can easily express the computation over the start and end dates in a query) and for retrieving; you're wasting DB space.
import datetime

from django.db.models import F

# If you need the duration, just define a property.
@property
def planned_duration(self):
    return self.Planned_End_Date - self.Planned_Start_Date

# If you need to query tasks which last more than 2 days:
Testing.objects.filter(Planned_End_Date__gt=F('Planned_Start_Date') + datetime.timedelta(days=2))
Note: Python conventions would recommend you name your fields using snake_case (planned_duration, planned_end_date, planned_start_date). Use CamelCase for classes (TestingTask). Don't mix the two.

Django with Postgres and advanced constraints

I have the following model:
class AppHistory(models.Model):
    campaign = models.ForeignKey('campaigns.Campaign')
    context_tenant = models.ForeignKey('campaigns.FacebookContextTenant')
    start_date = models.DateField()
    end_date = models.DateField(blank=True, null=True)
The backend is a Postgres database. For the above model I need the following constraints to be checked before a dataset is inserted:
A row can only be inserted if it does not overlap in the date (start_date - end_date) with an existing one with the same campaign and context_tenant
A row can only be inserted if there's none with the same campaign and context_tenant where end_date is NULL
I know there's the option to do this in Django by performing a validation.
But I'd like to make sure that even manual insertions into the database are verified.
So far I've come up with two options: database constraints and triggers. I'm not too familiar with Postgres, so I'm uncertain how far constraints can go. Is it possible to implement the above restrictions with constraints only, or should I use triggers (or something else entirely)?
I solved the problem by using an exclusion constraint, such as:
EXCLUDE USING gist (campaign WITH =, daterange(start_date, end_date) WITH &&)
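For reference, a sketch of how such a constraint might be added in a migration (the table, constraint, and dependency names are illustrative; equality on plain columns inside a GiST index requires the btree_gist extension). Note that daterange(start_date, NULL) is unbounded above, so an open-ended row overlaps every later range, which also covers the second rule:
from django.contrib.postgres.operations import BtreeGistExtension
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('campaigns', '0001_initial'),  # hypothetical previous migration
    ]

    operations = [
        BtreeGistExtension(),
        migrations.RunSQL(
            "ALTER TABLE campaigns_apphistory "
            "ADD CONSTRAINT no_overlapping_app_history "
            "EXCLUDE USING gist (campaign_id WITH =, context_tenant_id WITH =, "
            "daterange(start_date, end_date) WITH &&)",
            reverse_sql="ALTER TABLE campaigns_apphistory "
                        "DROP CONSTRAINT no_overlapping_app_history",
        ),
    ]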
You can use a trigger in PostgreSQL and RAISE EXCEPTION when the data is invalid to roll back the transaction. Once the trigger is written, you can use a custom migration (with migrations.RunPython) to create it in the database.
You should override the save() method in your model:
def save(self, *args, **kwargs):
    # Add your conditions here (see the sketch below).
    return super(AppHistory, self).save(*args, **kwargs)
For your first condition, where you want to check uniqueness, you should use:
class Meta:
    unique_together = ("campaign", "context_tenant", "start_date")
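A sketch of what the conditions in that save() override might look like, assuming the model from the question (the error messages are illustrative):
from django.core.exceptions import ValidationError

def save(self, *args, **kwargs):
    others = AppHistory.objects.filter(
        campaign=self.campaign,
        context_tenant=self.context_tenant,
    ).exclude(pk=self.pk)

    # Second rule: refuse to insert while an open-ended row exists.
    if others.filter(end_date__isnull=True).exists():
        raise ValidationError('An open-ended row already exists.')

    # First rule: refuse overlapping date ranges. A NULL end_date on the
    # new row means "unbounded", so only the lower bound is checked then.
    overlapping = others.filter(end_date__gte=self.start_date)
    if self.end_date is not None:
        overlapping = overlapping.filter(start_date__lte=self.end_date)
    if overlapping.exists():
        raise ValidationError('The date range overlaps an existing row.')

    return super(AppHistory, self).save(*args, **kwargs)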

Getting bigint id fields when using Django and Postgres?

Django's ORM uses an integer datatype for the automatically created ID column, but I need it to be bigint (using the Postgres backend). Is there a way of doing this?
Django 1.10
Use the newly added BigAutoField
A 64-bit integer, much like an AutoField except that it is guaranteed to fit numbers from 1 to 9223372036854775807.
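For example, a minimal sketch:
class MyModel(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField(max_length=100)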
Older versions of Django
You need to create your model like this:
class MyModel(models.Model):
    id = models.BigIntegerField(primary_key=True)
    name = models.CharField(max_length=100)
Then, after ./manage.py makemigrations has been run, open the generated migration and add the following to the operations list:
migrations.RunSQL("CREATE SEQUENCE myapp_seq"),
migrations.RunSQL("ALTER TABLE myapp_mymodel ALTER COLUMN id SET DEFAULT NEXTVAL('myapp_seq')"),
Update
A valid point was raised by Daniel Roseman in the comments. In PostgreSQL the following query works:
INSERT INTO myapp_mymodel(name) VALUES ('some name');
but the following doesn't, because the primary key column is NOT NULL:
INSERT INTO myapp_mymodel(id, name) VALUES (null, 'some name');
Unfortunately, it's the second form of the query that Django generates. This can still be solved with a bit of work:
from django.db import connection

def save(self, *args, **kwargs):
    if not self.id:
        # Fetch the next sequence value manually so the INSERT carries
        # an explicit id instead of NULL.
        cursor = connection.cursor()
        cursor.execute("SELECT NEXTVAL('myapp_seq')")
        self.id = cursor.fetchone()[0]
    super(MyModel, self).save(*args, **kwargs)

Django aggregate multiple columns after arithmetic operation

I have a really strange problem with Django 1.4.4.
I have this model:
class LogQuarter(models.Model):
    timestamp = models.DateTimeField()
    domain = models.CharField(max_length=253)
    attempts = models.IntegerField()
    success = models.IntegerField()
    queue = models.IntegerField()
    ...
I need to gather the top 20 domains with the highest sent value. The sent value is attempts - queue.
This is my query:
obj = LogQuarter.objects\
    .aggregate(Sum(F('attempts')-F('queue')))\
    .values('domain')\
    .filter(**kwargs)\
    .order_by('-sent')[:20]
I tried with extra too and it isn't working. This is really basic SQL, so I am surprised that Django can't do it. Does anyone have a solution?
You can actually do this by subclassing some of the aggregation functionality. It requires digging into the code to really understand, but here's what I coded up to do something similar for MAX and MIN. (Note: this code is based on Django 1.4 / MySQL.)
Start by subclassing the underlying aggregation class and overriding the as_sql method. This method writes the actual SQL to the database query. We have to make sure to quote the field that gets passed in correctly and associate it with the proper table name.
from django.db.models.sql import aggregates

class SqlCalculatedSum(aggregates.Aggregate):
    sql_function = 'SUM'
    sql_template = '%(function)s(%(field)s - %(other_field)s)'

    def as_sql(self, qn, connection):
        # self.col is currently a tuple, where the first item is the table
        # name and the second item is the primary column name. Assuming our
        # calculation is on two fields in the same table, we can use that to
        # our advantage. qn is the underlying DB quoting object and quotes
        # things appropriately. The column entry in the self.extra var is the
        # actual database column name for the secondary column.
        self.extra['other_field'] = '.'.join(
            [qn(c) for c in (self.col[0], self.extra['column'])])
        return super(SqlCalculatedSum, self).as_sql(qn, connection)
Next, subclass the general model aggregation class and override the add_to_query method. This method is what determines how the aggregate gets added to the underlying query object. We want to be able to pass in the field name (e.g. queue) but get the corresponding DB column name (in case it is something different).
from django.db import models

class CalculatedSum(models.Aggregate):
    name = SqlCalculatedSum

    def add_to_query(self, query, alias, col, source, is_summary):
        # Utilize the fact that self.extra is set to all of the extra kwargs
        # passed in on initialization. We want to get the corresponding
        # database column name for whatever field we pass in to the
        # "variable" kwarg.
        self.extra['column'] = query.model._meta.get_field(
            self.extra['variable']).db_column
        query.aggregates[alias] = self.name(
            col, source=source, is_summary=is_summary, **self.extra)
You can then use your new class in an annotation like this:
queryset.annotate(calc_attempts=CalculatedSum('attempts', variable='queue'))
Assuming your attempts and queue fields have those same db column names, this should generate SQL similar to the following:
SELECT SUM(`LogQuarter`.`attempts` - `LogQuarter`.`queue`) AS calc_attempts
And there you go.
I am not sure you can do Sum(F('attempts') - F('queue')) in this version; it should throw an error in the first place. I guess an easier approach would be to use extra:
result = LogQuarter.objects.extra(select={'sent':'(attempts-queue)'}, order_by=['-sent'])[:20]
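For reference, Django 1.8+ supports expressions inside aggregate functions, so on modern versions the original idea works directly, with no subclassing or extra() needed. A sketch:
from django.db.models import F, Sum

top_domains = (LogQuarter.objects
               .values('domain')
               .annotate(sent=Sum(F('attempts') - F('queue')))
               .order_by('-sent')[:20])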

Django custom model Manager database error

I'm trying to build a custom model manager, but have run into an error. The code looks like this:
class LookupManager(models.Manager):
    def get_options(self, *args, **kwargs):
        return [(t.key, t.value)
                for t in Lookup.objects.filter(group=args[0].upper())]

class Lookup(models.Model):
    group = models.CharField(max_length=1)
    key = models.CharField(max_length=1)
    value = models.CharField(max_length=128)

    objects = LookupManager()
(I have played around with get_options quite a lot using super() and other ways to filter the results)
When I run syncdb, I get the following error (ops_lookup being the corresponding table):
django.db.utils.DatabaseError: no such table: ops_lookup
I noticed that if I change the manager to return [] instead of a filter, then syncdb works. Also, if I've run syncdb and all the tables exist, then change the code to the above, it works as well.
How can I get Django to not expect this table to exist when running syncdb for the first time?
Update
After looking through the traceback I realised what was happening. The lookup table is meant to contain values which populate the choices of some columns in other tables. I think what happens is that the manager gets called when the other tables are created which, it seems, happens before the lookup table is created.
Is there any way to force Django to create the lookup table first (short of renaming it)?
What's happening is that you're trying to access the database during module load time. For example:
class MyModel(models.Model):
    name = models.CharField(max_length=255)

class OtherModel(models.Model):
    some_field = models.CharField(
        max_length=255,
        # The next line fails on syncdb because the database table hasn't been
        # created yet, but the model is being queried at module load time
        # (during class definition).
        choices=[(o.pk, o.name) for o in MyModel.objects.all()]
    )
This is equivalent to what you're doing because, as you've stated, you're using the manager method (transitively) to generate choices for other models.
Replacing the list comprehension with a generator expression will return an iterable, but will not evaluate the filtered queryset until the first iteration. So, this would fix the above example:
choices=((o.pk, o.name) for o in MyModel.objects.all())
Using your example, it would be:
class LookupManager(models.Manager):
    def get_options(self, *args, **kwargs):
        return ((t.key, t.value) for t in Lookup.objects.filter(group=args[0].upper()))
(Note the use of ( and ) instead of [ and ] for the outer brackets; that is the syntax for creating a generator expression.)
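To make the laziness concrete, a quick sketch (the group 'A' is illustrative):
# List comprehension: the query runs immediately (at class-definition time
# when used for "choices"), which is what broke syncdb.
options = [(t.key, t.value) for t in Lookup.objects.filter(group='A')]

# Generator expression: builds a lazy iterable; the query only runs when
# something first iterates over it, by which time the table exists.
options = ((t.key, t.value) for t in Lookup.objects.filter(group='A'))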