Django with Postgres and advanced constraints

I have the following model:
class AppHistory(models.Model):
    campaign = models.ForeignKey('campaigns.Campaign')
    context_tenant = models.ForeignKey('campaigns.FacebookContextTenant')
    start_date = models.DateField()
    end_date = models.DateField(blank=True, null=True)
The backend is a Postgres database. For the above model I need the following constraints to be checked before a dataset is inserted:
A row can only be inserted if its date range (start_date to end_date) does not overlap with an existing row with the same campaign and context_tenant.
A row can only be inserted if there is no row with the same campaign and context_tenant where end_date is NULL.
I know there's the option to do this in Django by performing validation.
But I'd like to make sure that even manual insertions into the database are verified.
So far I have come up with two options: database constraints and triggers. I'm not too familiar with Postgres, so I'm uncertain how far constraints can go. Is it possible to implement the above restrictions with constraints only, or should I use triggers (or even something else)?

I solved the problem by using constraints, such as
EXCLUDE USING gist (campaign WITH =, daterange(start_date, end_date) WITH &&)
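For reference, a constraint like that can be installed from a Django migration with RunSQL. Below is a minimal sketch under a few assumptions: the table is named campaigns_apphistory, the FK columns are campaign_id and context_tenant_id, and the btree_gist extension can be created (it is needed to mix plain equality columns with a range type in a GiST exclusion constraint). Note that daterange(start_date, NULL) is unbounded on the upper side, so an existing open-ended row conflicts with any new range that isn't entirely before it; the second condition from the question, taken literally, would still need a trigger.

# hypothetical migration, e.g. campaigns/migrations/0002_apphistory_no_overlap.py
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('campaigns', '0001_initial'),  # assumed previous migration
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                CREATE EXTENSION IF NOT EXISTS btree_gist;
                ALTER TABLE campaigns_apphistory
                    ADD CONSTRAINT apphistory_no_overlap
                    EXCLUDE USING gist (
                        campaign_id WITH =,
                        context_tenant_id WITH =,
                        daterange(start_date, end_date) WITH &&
                    );
            """,
            reverse_sql="""
                ALTER TABLE campaigns_apphistory
                    DROP CONSTRAINT apphistory_no_overlap;
            """,
        ),
    ]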

You can use a trigger in PostgreSQL and RAISE EXCEPTION when the data is invalid to roll back the transaction.
When you create your trigger, you can install it in the database from a custom migration with migrations.RunPython.
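A rough sketch of that approach, with assumed app, table, and trigger names (campaigns_apphistory again); the trigger raises an exception whenever a new row overlaps an existing one for the same campaign and context tenant. Note that, unlike an exclusion constraint, a check like this is not race-free under concurrent inserts.

from django.db import migrations

TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION apphistory_check_overlap() RETURNS TRIGGER AS $$
BEGIN
    IF EXISTS (
        SELECT 1 FROM campaigns_apphistory
        WHERE campaign_id = NEW.campaign_id
          AND context_tenant_id = NEW.context_tenant_id
          AND id IS DISTINCT FROM NEW.id
          AND daterange(start_date, end_date) && daterange(NEW.start_date, NEW.end_date)
    ) THEN
        RAISE EXCEPTION 'AppHistory rows may not overlap for the same campaign and context tenant';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS apphistory_check_overlap ON campaigns_apphistory;
CREATE TRIGGER apphistory_check_overlap
    BEFORE INSERT OR UPDATE ON campaigns_apphistory
    FOR EACH ROW EXECUTE PROCEDURE apphistory_check_overlap();
"""


def create_trigger(apps, schema_editor):
    # run the raw DDL on the connection the migration is applied to
    with schema_editor.connection.cursor() as cursor:
        cursor.execute(TRIGGER_SQL)


class Migration(migrations.Migration):

    dependencies = [
        ('campaigns', '0001_initial'),  # assumed previous migration
    ]

    operations = [
        migrations.RunPython(create_trigger, migrations.RunPython.noop),
    ]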

You should override the save() method in your model:
def save(self, *args, **kwargs):
    # <add your conditions here>
    return super(AppHistory, self).save(*args, **kwargs)
For your first condition, where you want to check uniqueness, you can use:
class Meta:
    unique_together = ("campaign", "context_tenant", "start_date")
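For the save() override above, here is a rough sketch of what those conditions could look like, assuming the model and field names from the question (note this only runs through Django, so it does not protect against manual inserts into the database):

from django.core.exceptions import ValidationError
from django.db import models
from django.db.models import Q


class AppHistory(models.Model):
    # ... fields as in the question ...

    def save(self, *args, **kwargs):
        others = AppHistory.objects.filter(
            campaign=self.campaign, context_tenant=self.context_tenant)
        if self.pk:
            others = others.exclude(pk=self.pk)

        # Condition 2: refuse to save while an open-ended row already exists.
        if others.filter(end_date__isnull=True).exists():
            raise ValidationError('An open-ended AppHistory row already exists.')

        # Condition 1: the new date range must not overlap an existing one.
        overlapping = others.filter(
            Q(end_date__isnull=True) | Q(end_date__gte=self.start_date))
        if self.end_date is not None:
            overlapping = overlapping.filter(start_date__lte=self.end_date)
        if overlapping.exists():
            raise ValidationError('Date range overlaps an existing AppHistory row.')

        return super(AppHistory, self).save(*args, **kwargs)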

Related

Maintain uniqueness on django-localized-fields

I'm trying to avoid having duplicate localized items stored in a Django REST framework app that uses the django-localized-fields package with a PostgreSQL database, but I can't find any way to make this work.
(https://pypi.org/project/django-localized-fields/)
I've tried writing custom duplicate detection logic in the serializer, which works for create, but for update the localized fields become null (they are required fields, so I receive a not-null constraint error). It seems to be the django-localized-fields utility that is causing this problem.
The serializer runs correctly for both create and update when I'm not overriding create/update in the serializer by defining them separately.
I've also tried adding unique options to the model, which does not work: duplicates are still created, whether I use the standard unique options or the method in the django-localized-fields documentation (uniqueness=['en', 'ro']).
I've also tried Django's UniqueTogetherValidator, which also doesn't seem to support HStore/localized fields.
I'd appreciate some help in tracking down either how to fix the update in the serializer or place a unique constraint in the database. Since django-localized-fields uses hstore in PostgreSQL it must be a common enough problem for applications using hstore to maintain uniqueness.
For those who aren't familiar, Hstore stores items as key/value pairs within a database. Here's an example of how django-localized-fields stores language data within the database:
"en"=>"english word!", "es"=>"", "fr"=>"", "frqc"=>"", "fr-ca"=>""
django-localized-fields constrains unique values only within the same language. If you want to ensure that values in one row don't collide with values in another row, you have to validate them at both the Django and the database level.
Validation in Django
In Django you can create a custom function validate_hstore_uniqueness, which is called every time the model is validated.
from django.core.exceptions import ValidationError
from django.db import models
from django.utils.translation import gettext_lazy as _
from localized_fields.fields import LocalizedField


def validate_hstore_uniqueness(obj, field_name):
    value_dict = getattr(obj, field_name)
    cls = obj.__class__
    values = list(value_dict.values())

    # find all existing objects with duplicate values
    duplicite_objs = cls.objects.filter(**{field_name + '__values__overlap': values})
    if obj.pk:
        duplicite_objs = duplicite_objs.exclude(pk=obj.pk)

    if len(duplicite_objs):
        # extract the duplicate values
        existing_values = []
        for obj2 in duplicite_objs:
            existing_values.extend(getattr(obj2, field_name).values())
        duplicate_values = list(set(values) & set(existing_values))
        # raise error for the field
        raise ValidationError({
            field_name: ValidationError(
                _('Values %(values)s already exist.'),
                code='unique',
                params={'values': duplicate_values},
            ),
        })


class Test(models.Model):
    slug = LocalizedField(blank=True, null=True, required=False)

    def validate_unique(self, exclude=None):
        super().validate_unique(exclude)
        validate_hstore_uniqueness(self, 'slug')
Constraint in DB
For the DB constraint you have to use a constraint trigger.
from django.db import connection


def slug_uniqueness_constraint(apps, schema_editor):
    print('Recreating trigger quotes.slug_uniqueness_constraint')

    # define trigger
    trigger_sql = """
    -- slug_hstore_unique
    CREATE OR REPLACE FUNCTION slug_uniqueness_constraint() RETURNS TRIGGER
    AS $$
    DECLARE
        duplicite_count INT;
    BEGIN
        EXECUTE format('SELECT count(*) FROM %I.%I ' ||
                       'WHERE id != $1 AND avals("slug") && avals($2)',
                       TG_TABLE_SCHEMA, TG_TABLE_NAME)
            INTO duplicite_count
            USING NEW.id, NEW.slug;
        IF duplicite_count > 0 THEN
            RAISE EXCEPTION 'Duplicate slug value %', avals(NEW.slug);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS slug_uniqueness_constraint ON quotes_author;
    CREATE CONSTRAINT TRIGGER slug_uniqueness_constraint
        AFTER INSERT OR UPDATE OF slug ON quotes_author
        FOR EACH ROW EXECUTE PROCEDURE slug_uniqueness_constraint();
    """

    cursor = connection.cursor()
    cursor.execute(trigger_sql)
And enable it in migrations:
class Migration(migrations.Migration):

    dependencies = [
        ('quotes', '0031_auto_20200109_1432'),
    ]

    operations = [
        migrations.RunPython(slug_uniqueness_constraint),
    ]
It is probably a good idea to also create a GIN index in the DB to speed up lookups:
CREATE INDEX ON test_table using GIN (avals("slug"));

Django JOIN query without foreign key

Is there a way in Django to write a query using the ORM, not raw SQL, that allows you to JOIN on another table without there being a foreign key? Looking through the documentation, it appears that for a one-to-one relationship to work, there must be a foreign key present.
In the models below I want to run a query with a JOIN on UserActivity.request_url to UserActivityLink.url.
class UserActivity(models.Model):
    id = models.IntegerField(primary_key=True)
    last_activity_ip = models.CharField(max_length=45, blank=True)
    last_activity_browser = models.CharField(max_length=255, blank=True)
    last_activity_date = models.DateTimeField(auto_now_add=True)
    request_url = models.CharField(max_length=255, blank=True)
    session_id = models.CharField(max_length=255)
    users_id = models.IntegerField()

    class Meta:
        db_table = 'user_activity'


class UserActivityLink(models.Model):
    id = models.IntegerField(primary_key=True)
    url = models.CharField(max_length=255, blank=True)
    url_description = models.CharField(max_length=255, blank=True)
    type = models.CharField(max_length=45, blank=True)

    class Meta:
        db_table = 'user_activity_link'
The link table has a more descriptive translation of given URLs in the system, this is needed for some reporting the system will generate.
I've tried creating the foreign key from UserActivity.request_url to UserActivityLink.url but it fails with the following error: ERROR 1452: Cannot add or update a child row: a foreign key constraint fails
No, unfortunately there isn't an effective way. The .raw() method exists for exactly this situation; even if the ORM could do such a join, it would probably be a lot slower than raw SQL.
There is a blog post here detailing how to do it with query.join(), but as the authors themselves point out, it's not best practice.
Just reposting a related answer, so everyone can see it.
Taken from here: Most efficient way to use the django ORM when comparing elements from two lists
First problem: joining unrelated models
I'm assuming that your Model1 and Model2 are not related, otherwise you'd be able to use Django's related objects interface. Here are two approaches you could take:
Use extra and a SQL subquery:
Model1.objects.extra(where=['field IN (SELECT field FROM myapp_model2 WHERE ...)'])
Subqueries are not handled very efficiently in some databases (notably MySQL), so this is probably not as good as #2 below.
Use a raw SQL query:
Model1.objects.raw('''SELECT * from myapp_model1
INNER JOIN myapp_model2
ON myapp_model1.field = myapp_model2.field
AND ...''')
Second problem: enumerating the result
Two approaches:
You can enumerate a query set in Python using the built-in enumerate function:
enumerate(Model1.objects.all())
You can use the technique described in this answer to do the enumeration in MySQL. Something like this:
Model1.objects.raw('''SELECT *, #row := #row + 1 AS row
FROM myapp_model1
JOIN (SELECT #row := 0) rowtable
INNER JOIN myapp_model2
ON myapp_model1.field = myapp_model2.field
AND ...''')
The Django ForeignKey is different from a SQL foreign key. A Django ForeignKey just represents a relation; you can specify whether it should also create a database constraint.
Try this:
request_url = models.ForeignKey(UserActivityLink, to_field='url_description', null=True, on_delete=models.SET_NULL, db_constraint=False)
Note that db_constraint=False is required; without it Django will generate SQL like:
ALTER TABLE `user_activity` ADD CONSTRAINT `xxx` FOREIGN KEY (`request_url`) REFERENCES `user_activity_link` (`url_description`);
I ran into the same problem, and after a lot of research I found the method above.
Hope it helps.
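With the field declared like that, the join can then be done through the ORM. A small usage sketch (the 'report' value and the select_related call are only for illustration; also note that if the existing column is literally named request_url, you may additionally need db_column='request_url' on the field, since Django otherwise expects a request_url_id column):

# JOIN user_activity -> user_activity_link via the constraint-less ForeignKey
activities = (
    UserActivity.objects
    .select_related('request_url')         # performs the SQL JOIN
    .filter(request_url__type='report')    # filter on the linked table
)

for activity in activities:
    link = activity.request_url  # the related UserActivityLink row, or None
    print(activity.last_activity_date, link.url_description if link else None)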

django prefetch_related with filter

models.py:
class Ingredient(models.Model):
    _est_param = None
    param = models.ManyToManyField(Establishment, blank=True, null=True,
                                   related_name='+', through='IngredientParam')

    def est_param(self, establishment):
        if not self._est_param:
            self._est_param, created = self.ingredientparam_set\
                .get_or_create(establishment=establishment)
        return self._est_param


class IngredientParam(models.Model):
    # ingredient params
    active = models.BooleanField(default=False)
    ingredient = models.ForeignKey(Ingredient)
    establishment = models.ForeignKey(Establishment)
I need to fetch all Ingredients with their parameters for an Establishment. First I fetch Ingredient.objects.all() and then use the params like Ingredient.objects.all()[0].est_param(establishment).active. How can I use Django 1.4 prefetch_related to make fewer SQL queries? Maybe I can use another way to store individual Establishment properties for an Ingredient?
Django 1.7 adds the Prefetch object you can put into prefetch_related. It allows you to specify a queryset which should provide the filtering. I'm having some problems with it at the moment for getting a singular (latest) entry from a list, but it seems to work very well when trying to get all the related entries.
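A rough sketch of that approach for the models above (Django 1.7+; est_params is an arbitrary attribute name chosen for this example):

from django.db.models import Prefetch

ingredients = Ingredient.objects.prefetch_related(
    Prefetch(
        'ingredientparam_set',
        queryset=IngredientParam.objects.filter(establishment=establishment),
        to_attr='est_params',
    )
)

for ingredient in ingredients:
    # est_params holds only this establishment's params, fetched in one
    # extra query for the whole queryset instead of one query per ingredient.
    params = ingredient.est_params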
You could also checkout django-prefetch which is part of this question which does not seem a duplicate of this question because of the vastly different wording.
The following code would fetch all the ingredients and their parameters in 2 queries:
ingredients = Ingredient.objects.all().prefetch_related('ingredientparam_set')
You could then access the parameters you're interested in without further database queries.

UPDATE in raw sql does not hit all records although they meet the criteria

I am trying to update several records when I hit the save button in the admin, with raw SQL that lives in models.py (in def save(self, *args, **kwargs)).
The raw SQL looks like this as a prototype:
cursor=connection.cursor()
cursor.execute("UPDATE sales_ordered_item SET oi_delivery = %s WHERE oi_order_id = %s", ['2011-05-29', '1105212105'])
Unfortunately it does not update all records that meet the criteria: only one, sometimes more, but never all.
With the SQLite Manager and the following SQL everything works great and all the records get updated:
UPDATE sales_ordered_item
SET oi_delivery = '2011-05-29'
WHERE oi_order_id = '1105212105'
I was thinking of using a manager to update the table but I have no idea how this would work when not using static data like '2011-05-29'. Anyways, it would be great to understand in the first place how to hit all records with the raw sql.
Any recommendations how to solve the problems in a different way are also highly appreciated
Here is the code, which I stripped down a little to keep it short:
# Orders of the customers
class Order(models.Model):
    """
    Defines the order data incl. payment, shipping and delivery
    """
    # Main Data
    o_customer = models.ForeignKey(Customer, related_name='customer',
        verbose_name=_(u'Customer'), help_text=_(u'Please select the related Customer'))
    o_id = models.CharField(_(u'Order ID'), max_length=10, primary_key=True,
        help_text=_(u'ID has the format YYMMDDHHMM'))
    o_date = models.DateField(_(u'created'))
    # and more...


# Order Item
class Ordered_item(models.Model):
    """
    Defines the ordered item to which order it belongs, pricing is decoupled from the
    catalogue to be free of any changes in the pricing. Pricing and description is copied
    from the item catalogue as a proposal and can be altered
    """
    oi_order = models.ForeignKey(Order, related_name='Order', verbose_name=_(u'Order ID'))
    oi_pos = models.CharField(_('Position'), max_length=2, default='01')
    oi_quantity = models.PositiveIntegerField(_('Quantity'), default=1)
    # Date of the delivery to determine the status of the item: ordered or already delivered
    oi_delivery = models.DateField(_(u'Delivery'), null=True, blank=True)
    # and more ...

    def save(self, *args, **kwargs):
        # does not hit all records; static values used for test purposes
        cursor = connection.cursor()
        cursor.execute("UPDATE sales_ordered_item SET oi_delivery = %s WHERE oi_order_id = %s",
                       ['2011-05-29', '1105212105'])
        super(Ordered_item, self).save(*args, **kwargs)
This is probably happening because you are not committing the transaction (see https://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly).
Add these lines after your cursor.execute:
from django.db import transaction
transaction.commit_unless_managed()
You asked for a manager method.
SalesOrderedItem.objects.filter(oi_order='1105212105').update(oi_delivery='2011-05-29')
should do the job for you!
Edit:
I assume that you have two models (I am guessing this from your raw SQL):
class OiOrder(models.Model):
    pass


class SalesOrderedItem(models.Model):
    oi_order = models.ForeignKey(OiOrder)
    oi_delivery = models.DateField()
So:
SalesOrderedItem.objects.filter(oi_order='1105212105')
gives you all SalesOrderedItem objects which have an oi_order of 1105212105.
... update(oi_delivery='2011-05-29')
The update method updates all oi_delivery attributes.
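Applied to the save() override from the question, the same manager call can use the row's own values instead of the hard-coded test data (a sketch using the field names from the question):

def save(self, *args, **kwargs):
    super(Ordered_item, self).save(*args, **kwargs)
    # A single UPDATE statement that hits every item of the same order;
    # no raw SQL or manual transaction handling needed.
    Ordered_item.objects.filter(oi_order=self.oi_order).update(
        oi_delivery=self.oi_delivery)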

Django model manager live-object issues

I am using Django. I am having a few issues with caching of QuerySets for news/category models:
class Category(models.Model):
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique=True)


class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .filter(published__lte=datetime.datetime.now())


class Article(models.Model):
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=60)
    slug = models.SlugField(unique=True)
    story = models.TextField()
    author = models.CharField(max_length=60, blank=True)
    published = models.DateTimeField(
        help_text=_('Set to a date in the future to publish later.'))
    created = models.DateTimeField(auto_now_add=True, editable=False)
    updated = models.DateTimeField(auto_now=True, editable=False)

    live = PublishedArticlesManager()
    objects = models.Manager()
Note - I have removed some fields to save on complexity...
There are a few (related) issues with the above.
Firstly, when I query for LIVE objects in my view via Article.live.all() if I refresh the page repeatedly I can see (in MYSQL logs) the same database query being made with exactly the same date in the where clause - ie - the datetime.datetime.now() is being evaluated at compile time rather than runtime. I need the date to be evaluated at runtime.
Secondly, when I use the articles_set method on the Category object this appears to work correctly - the datetime used in the query changes each time the query is run - again I can see this in the logs. However, I am not quite sure why this works, since I don't have anything in my code to say that the articles_set query should return LIVE entries only!?
Finally, why is none of this being cached?
Any ideas how to make the correct time be used consistently? Can someone please explain why the latter setup appears to work?
Thanks
Jay
P.S - database queries below, note the date variations.
SELECT LIVE ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
SELECT LIVE ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE `news_article`.`published` <= '2011-05-17 21:55:41' ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #1:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:21:33' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
CATEGORY SELECT ARTICLES, query #2:
SELECT `news_article`.`id`, `news_article`.`category_id`, `news_article`.`title`, `news_article`.`slug`, `news_article`.`teaser`, `news_article`.`summary`, `news_article`.`story`, `news_article`.`author`, `news_article`.`published`, `news_article`.`created`, `news_article`.`updated` FROM `news_article` WHERE (`news_article`.`published` <= '2011-05-18 21:26:06' AND `news_article`.`category_id` = 1 ) ORDER BY `news_article`.`published` DESC, `news_article`.`slug` ASC;
You should check out conditional view processing.
from django.views.decorators.http import condition

def latest_entry(request, article_id):
    return Article.objects.latest("updated").updated

@condition(last_modified_func=latest_entry)
def view_article(request, article_id):
    # your view code here
    ...
This should cache the page rather than reloading a new version every time.
I suspect that if you want now() to be evaluated at query time, you could use raw SQL. I think this will solve the compile/runtime issue.
class PublishedArticlesManager(models.Manager):
    def get_query_set(self):
        return super(PublishedArticlesManager, self).get_query_set() \
            .raw("SELECT * FROM news_article WHERE published <= CURRENT_TIMESTAMP")
Note that this returns a RawQuerySet which may differ a bit from a normal QuerySet
I have now fixed this issue. It appears the problem was that the queryset returned by Article.live.all() was being cached in my urls.py! I was using function-based generic-views:
url(r'^all/$', object_list, {
    'queryset': Article.live.all(),
}, 'news_all'),
I have now changed this to use the class-based approach, as advised in the latest Django documentation:
url(r'^all/$', ListView.as_view(
    model=Article,
), name="news_all"),
This now works as expected: by specifying the model attribute rather than the queryset attribute, the QuerySet is created at runtime, when the view runs, rather than at import time when urls.py is loaded.