Doctrine - QueryBuilder, select query with where parameters: entities or ids? - doctrine-orm

I have a Product entity and a Shop entity.
A shop can have 0 to n products and a product can be in only one shop.
The product entity table is thus refering to the Shop entity through a shop_id table field.
When querying for the products of a given shop using doctrine query builder, we can do this:
$products = $this->getDoctrine()->getRepository('MyBundle:Product')
->createQueryBuilder('p')
->where('p.shop = :shop')
->setParameter('shop', $shop) // here we pass a shop object
->getQuery()->getResult();
or this:
$products = $this->getDoctrine()->getRepository('MyBundle:Product')
->createQueryBuilder('p')
->where('p.shop = :shopId')
->setParameter('shopId', $shopId) // we pass directly the shop id
->getQuery()->getResult();
And the both seem to work... I'm thus wondering: can we always pass directly entity ids instead of entity instances in such cases (ie: on a doctrine entity field that refers to another entity)?
I initially thought that only the first example would work...

According to the Doctrine documentation, you can pass the object to setParameter() if the object is managed.
extract from the documentation:
Calling setParameter() automatically infers which type you are setting as value. This works for integers, arrays of strings/integers, DateTime instances and for managed entities.
for more information, please see:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/query-builder.html#binding-parameters-to-your-query

I might be wrong, but I think that if you pass object instance instead of id number, Doctrine will automatically call $instance->getId() making your two queries the same when translated into SQL (even DQL).

Related

How to populate table based on the condition of another table's column value in Django

I am kinda new in django and I am stuck.
I am building and app to store equipment registry. I have done a model for equipment list and i have a status values as "available", "booked", "maintenance"
I also have a model for all equipment that are not available. not in my html in the "not available registry" i want to show only details of equipment in the list that are marked as "booked" and "maintenance"
There are three ways of doing this.
To filter out objects of the model that are marked as "booked" or "maintenance" you can use complex lookups with Q objects. It allows you to filter objects with OR statement. Here you need to find objects that have status set to "booked" OR to "maintenance". The query should look like this:
from django.db.models import Q
Equipment.objects.filter(Q(status='booked') | Q(status='maintetnance'))
Second way of doing this is by using __in statement to filter objects that you need:
not_available_status = ['booked', 'maintenance']
Equipment.objects.filter(status__in=not_available_status)
And final way is to exclude objects that you don't need:
Equipment.objects.exclude(status='available')

Django Effecient way to Perform Query on M2M

class A(models.Model)
results = models.TextField()
class B(models.Model)
name = models.CharField(max_length=20)
res = models.ManyToManyField(A)
Let's suppose we have above 2 models. A model has millions of objects.
I would like to know what would be the best efficient/fastest way to get all the results objects of a particular B object.
Let's suppose we have to retrieve all results for object number 5 of B
Option 1 : A.objects.filter(b__id=5)
(OR)
Option 2 : B.objects.get(id=5).res.all()
Option 1: My Question is filtering by id on A model objects would take lot of time? since there are millions of A model objects.
Option 2: Question: does res field on B model stores the id value of A model objects?
The reason why I'm assuming the option 2 would be a faster way since it stores the reference of A model objects & directly getting those object values first and making the second query to fetch the results. whereas in the first option filtering by id or any other field would take up a lot of time
The first expression will result in one database query. Indeed, it will query with:
SELECT a.*
FROM a
INNER JOIN a_b ON a_b.a_id = a.id
WHERE a_b.b_id = 5
The second expression will result in two queries. Indeed, first Django will query to fetch that specific B object with a query like:
SELECT b.*
FROM b
WHERE b.id = 5
then it will make exactly the same query to retrieve the related A objects.
But retrieving the A object is here not necessary (unless you of course need it somewhere else). You thus make a useless database query.
My Question is filtering by id on A model objects would take lot of time? since there are millions of A model objects.
A database normally stores an index on foreign key fields. This thus means that it will filter effectively. The total number of A objects is usually not (that) relevant (since it uses a datastructure to accelerate search like a B-tree [wiki]). The wiki page has a section named An index speeds the search that explains how this works.

ndb verify entity uniqueness in transaction

I've been trying to create entities with a property which should be unique or None something similar to:
class Thing(ndb.Model):
something = ndb.StringProperty()
unique_value = ndb.StringProperty()
Since ndb has no way to specify that a property should be unique it is only natural that I do this manually like this:
def validate_unique(the_thing):
if the_thing.unique_value and Thing.query(Thing.unique_value == the_thing.unique_value).get():
raise NotUniqueException
This works like a charm until I want to do this in an ndb transaction which I use for creating/updating entities. Like:
#ndb.transactional
def create(the_thing):
validate_unique(the_thing)
the_thing.put()
However ndb seems to only allow ancestor queries, the problem is my model does not have an ancestor/parent. I could do the following to prevent this error from popping up:
#ndb.non_transactional
def validate_unique(the_thing):
...
This feels a bit out of place, declaring something to be a transaction and then having one (important) part being done outside of the transaction. I'd like to know if this is the way to go or if there is a (better) alternative.
Also some explanation as to why ndb only allows ancestor queries would be nice.
Since your uniqueness check involves a (global) query it means it's subject to the datastore's eventual consistency, meaning it won't work as the query might not detect freshly created entities.
One option would be to switch to an ancestor query, if your expected usage allows you to use such data architecture, (or some other strongly consistent method) - more details in the same article.
Another option is to use an additional piece of data as a temporary cache, in which you'd store a list of all newly created entities for "a while" (giving them ample time to become visible in the global query) which you'd check in validate_unique() in addition to those from the query result. This would allow you to make the query outside the transaction and only enter the transaction if uniqueness is still possible, but the ultimate result is the manual check of the cache, inside the transaction (i.e. no query inside the transaction).
A 3rd option exists (with some extra storage consumption as the price), based on the datastore's enforcement of unique entity IDs for a certain entity model with the same parent (or no parent at all). You could have a model like this:
class Unique(ndb.Model): # will use the unique values as specified entity IDs!
something = ndb.BooleanProperty(default=False)
which you'd use like this (the example uses a Unique parent key, which allows re-using the model for multiple properties with unique values, you can drop the parent altogether if you don't need it):
#ndb.transactional
def create(the_thing):
if the_thing.unique_value:
parent_key = get_unique_parent_key()
exists = Unique.get_by_id(the_thing.unique_value, parent=parent_key)
if exists:
raise NotUniqueException
Unique(id=the_thing.unique_value, parent=parent_key).put()
the_thing.put()
def get_unique_parent_key():
parent_id = 'the_thing_unique_value'
parent_key = memcache.get(parent_id)
if not parent_key:
parent = Unique.get_by_id(parent_id)
if not parent:
parent = Unique(id=parent_id)
parent.put()
parent_key = parent.key
memcache.set(parent_id, parent_key)
return parent_key

How do I use django's Q with django taggit?

I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print len(q)
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?
The way django-taggit implements tagging is essentially through a ManytoMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model which is your model and you have two models Tag and TaggedItem provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")) it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has the name="one".
Trying to match for two tag names would translate to looking up up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". You obviously never have that as you only have one value in a row, it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManytoMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag evaluating the results each time, as it is suggested in the answers from others. This might be okay for two tags, but will not be good when you need to look for objects that have 10 tags set on them. Here would be one way to do this that would result in two queries and get you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
# create django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using a subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering on sub-queries requires django 3.0).
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table instead directly and aggregate the results in a way that give you the object IDs.
# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
.values('object_id')
.annotate(occurence=Count('object_id'))
.filter(occurence=len(tag_names))
.values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless if you need 2 tags or 200.
Please review this and let me know if anything needs clarification.
first of all, this three are same:
Result.objects.filter(tags__name="one", tags__name="two")
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
Result.objects.filter(tags__name_in=["one"]).filter(tags__name_in=["two"])
i think the name field is CharField and no record could be equal to "one" and "two" at same time.
in python code the query looks like this(always false, and why you are geting no result):
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
we use Q object for implement OR or complex queries
Into the example that works you do an end on two python objects (query sets). That gets applied to any record not necessarily to the same record that has one AND two as tag.
ps: Why do you use the in filter ?
q = Result.objects.filter(tags_name_in=["one"]).filter(tags_name_in=["two"])
add .distinct() to remove duplicates if expecting more than one unique object

JPQL 2.0 - query entities based on superclass entity field

I have Entity (not MappedSuperclass) Person (with id, name, surname).
I also have Entity Employee extends Person (with other attributes, unimportant).
Inheritance Strategy is single table.
Now I want to create a namedQuery like this:
SELECT emp FROM Employee emp WHERE emp.name = ?1
In the IDE I get:
the state field path emp.name cannot be resolved to a valid type
I think the problem is that the attribute belongs to the superclass entity.
So far, I haven't found any solution other than using the TYPE operator to perform a selective query on Employee instances.
I'd like to perform the query above. Is that possible?
I'm on EclipseLink/JPA 2.0
Your JPQL seems valid. Did you try it at runtime? It could just be an issue with your IDE.
(include your code)
Person has to be #MappedSuperclass.
http://www.objectdb.com/api/java/jpa/MappedSuperclass
Furthermore, you should use named parameters, e.g. :name instead of ?...