Efficient way of joining two query sets without foreign key - django

I know django doesn't allow joining without a foreign key relation and I can't specify a foreign key because there are entries in one table that are not in the other (populated using pyspark). I need an efficient way to query the following:
Let's say I have the following tables:
Company | Product | Total # Users | Total # Unique Users
and
Company | Product | # Licenses | # Estimated Users
I would like to join such that I can display a table like this on the frontend
Company View
Product|Total # Users|Total # Unique Users|#Licenses|# Estimated Users|
P1 | Num | Num | Num | Num |
P2 | Num | Num | Num | Num |
Currently loop through each product and perform a query (way too slow and inefficient) to populate a dictionary of lists
Way too inefficient

I'm not quite getting why you can't do a Foreign key in this situation, but if you can implement your query in a sql statement I would look at Q objects. See "Complex Lookups with Q Objects" in the documentation.
https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects

Related

Power BI data model design relationships - direct active relationship would introduce ambiguity between the tables

Can anyone offer some modeling & relationship advise in Power BI?
I have two Customer tables at different grains that I am trying to relate to a Customer rollup group ('dimCustomers').
The two customers tables ('dimBillTierCustomer' and 'dimCustomerMeter') are individually related to my fact table ('factSummaryTicket'). These two relationships work individually, but I want them to be aware of the relationship they each have to 'dimCustomers', so I can use Customers to filter both tables in the report.
When I relate each of them, I get an error message on the second relationship.
You can’t create a direct active relationship between 'dimCustomerMeter' and 'dimCustomers' because that would introduce ambiguity between the tables 'dimCustomers' and 'factSummaryTicket'. To make this relationship active, deactivate or delete one of the relationships between 'dimCustomers' and 'factSummaryTicket' first.
Screenshots below show sample data, table relations, and the error message.
Bill Tier is for Customer pricing rules. Customer Meter is customer locations hierarchy. Customer should filter both of these tables.
Table Relations
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
| Relation (From : To) | CrossFiltering | FromCardinality | ToCardinality | IsActive |
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
| [gopherMeterId].[factSummaryTicket] ==> : <== [bisonMeterId].[dimCustomerMeter] | OneDirection | Many | One | TRUE |
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
| [CustBillTierKey].[factSummaryTicket] ==> : <== [CustTierKey].[dimBillTierCustomer] | OneDirection | Many | One | TRUE |
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
| [eticketOperatorId].[dimCustomerMeter] ==> : <== [eticketOperatorId].[dimCustomers] | OneDirection | Many | One | **FALSE** |
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
| [CustKey].[dimBillTierCustomer] ==> : <== [eticketOperatorId].[dimCustomers] | OneDirection | Many | One | TRUE |
+------------------------------------------------------------------------------------------+----------------+-----------------+---------------+-----------+
Sample Data
Table Diagram
One simple change gave me the functionality I needed. Without having to relate ‘dimCustomerMeter’ and ‘dimBillTierCustomer’. The solution was to enable ‘Bi-Directional’ instead of ‘Single’.
Bi Directional Relationship

How to design a DynamoDB table schema

I am doing my best to understand DynamoDB data modeling but I am struggling. I am looking for some help to build off what I have now. I feel like I have fairly simple data but it's not coming to me on what I should do to fit into DynamoDB.
I have two different types of data. I have a game object and a team stats object. A Game represents all of the data about the game that week and team stats represents all of the stats about a given team per week.
A timeId is in the format of year-week (ex. 2020-9)
My Access patterns are
1) Retrieve all games per timeId
2) Retrieve all games per timeId and by TeamName
3) Retrieve all games per timeId and if value = true
4) Retrieve all teamStats per timeId
5) Retrieve all teamStats by timeId and TeamName
My attempt at modeling so far is:
PK: TeamName
SK: TimeId
This is leading me to have 2 copies of games since there is a copy for each team. It is also only allowing me to scan for all teamStats by TimeId. Would something like a GSI help here? Ive thought maybe changing the PK to something like
PK: GA-${gameId} / TS-${teamId}
SK: TimeId
Im just very confused and the docs aren't helping me much.
Looking at your access patterns, this is a possible table design. I'm not sure if it's going to really work with your TimeId, especially for the Local Secondary Index (see note below), but I hope it's a good starting point for you.
# Table
-----------------------------------------------------------
pk | sk | value | other attributes
-----------------------------------------------------------
TimeId | GAME#TEAM{teamname} | true | ...
TimeId | STATS#TEAM{teamname} | | ...
GameId | GAME | | general game data (*)
TeamName | TEAM | | general team data (*)
# Local Secondary Index
-------------------------------------------------------------------------------
pk from Table as pk | value from Table as sk | sk from Table + other attributes
-------------------------------------------------------------------------------
TimeId | true | GAME#Team{teamname} | ...
With this Table and Local Secondary Index you can satisfy all access patterns with the following queries:
Retrieve all games by timeId:
Query Table with pk: {timeId}
Retrieve all games per timeId and by TeamName
Query table with pk: {timeId}, sk: GAME#TEAM{teamname}
Retrieve all games per timeId and if value = true
Query LSI with pk: {timeId}, sk: true
Retrieve all teamStats per timeId
Query table with pk: {timeId}, sk: begins with 'STATS'
Retrieve all teamStats by timeId and TeamName
Query table with pk: {timeId}, sk: STATS#TEAM{teamname}
*: I've also added the following two items, as I assume that there are cases where you want to retrieve general information about a specific game or team as well. This is just an assumption based on my experience and might be unnecessary in your case:
Retrieve general game information
Query table with pk: {GameId}
Retrieve general team information
Query table with pk: {TeamName}
Note: I don't know what value = true stands for, but for the secondary index to work in my model, you need to make sure that each combination of pk = TimeId and value = true is unique.
To learn more about single-table design on DynamoDB, please read Alex DeBrie's excellent article The What, Why, and When of Single-Table Design with DynamoDB.

Django: stop foreign key column on ManyToMany table from auto-ordering

I have a ManyToMany relationship between a Group model and a Source model:
class Group(models.Model):
source = models.ManyToManyField('Source', null=True)
class Source(models.Model):
content = models.CharField(max_length=8)
This creates an intermediate table with the columns : id (PK), group_id(FK) and source_id (FK)
Source could look like this:
+----+----------+
| id | content |
+----+----------+
| 1 | A |
| 2 | B |
| 3 | C |
+----+----------+
Each group can have different source member in different orders. For example, group 1 could have sources with 'content' C, A and B with keys of 3,1,2 respectively, and in that specific order.
Group 2 could have sources with 'content' B, C, A with keys of 2,3,1 respectively, and also in that specific order
the table should look like
+----+----------+---------------+
| id | group_id | source_id |
+----+----------+---------------+
| 1 | 1 | 3 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 2 | 3 |
| 6 | 2 | 1 |
+----+----------+---------------+
The trouble is when I associate these sources in the order I want in a code for loop
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
group = models.Group()
group.save()
for letter in seq:
source = models.Source.objects.get(content=letter)
source.group_set.add(group)
It ends up in the table as i.e. re-ordered sequentially in order which is definitely what I do not want as in this case the order of the Sources is essential.
+----+----------+---------------+
| id | group_id | source_id |
+----+----------+---------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 2 | 3 |
+----+----------+---------------+
How can I avoid this column re-ordering in Django?
It's important to understand that in SQL there isn't an inherent ordering to the table; the way the information is stored is opaque to you. Rather, the results of each query are ordered according to some specification that you provide at query time.
It sounds like you want the primary key of the M2M table to do double-duty as the field that defines the ordering. In most use cases that is a bad idea. What if you decide later to switch the order of A and B in group 1? What if you need to insert a new Source in between them? You can't do it, because primary keys are not that flexible.
The usual way to do this is to provide a specific column just for ordering. Unlike the primary key field you can change this at will, allowing you to adjust the order, insert new items, etc. In Django you would do this by explicitly declaring the M2M table (using the through field) and adding an ordering column to it. Something like:
class Group(models.Model):
source = models.ManyToManyField('Source', through='GroupSource')
class Source(models.Model):
content = models.CharField(max_length=8)
class GroupSource(models.Model):
# Also look into using unique_together for this model
group = models.ForeignKey(Group)
source = models.ForeignKey(Source)
position = models.IntegerField()
And your code would change to:
sequences = [['C', 'A', 'B'], ['B', 'C', 'A']]
for seq in sequences:
group = models.Group()
group.save()
for position, letter in enumerate(seq):
source = models.Source.objects.get(content=letter)
GroupSource.objects.create(group=group, source=source, position=position)
Thanks for taking the time and effort, and I probably would have gone down the route of doing much the same by adding another field to represent the ordering. But if you can safely get the same thing for free, why bother? These were individual inserts whose order of insertion is important. What puzzled me most later was some tests I have just concluded.
I managed to get the foreign keys still ordered the way I put them in by using sql-connector on a test db with the same schema relationships between the tables. There the keys in the intermediary table holding keys to each of the ManyToMany partners do not re-organise from lowest to highest. However, the exact same code unfortunately still did on the problematic database. Hence it was not a Django thing as such.
The only real difference between the functioning and non-functioning tables was the UNIQUE attribute pointing to the ManyToMany parters i.e foreign keys to Group and Source. After removing them, the problem went away.
However, to be honest, I am not sure why. Or why Django put those UNIQUE attributes there in the first place. Not sure either whether removing them will badly affect the application going forward.

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
class Book(models.Model):
name = models.CharField(max_length=100)
keywords = models.ManyToManyField('Keyword')
class Keyword(models.Model)
name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
books = Book.objects.all()
keywords = keyword_slug.split('-')
for k in keywords:
books = books.filter(keywords__name__icontains=k)
This works for the most part, however whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), then I get the same book appear more than once in the resulting QuerySet.
I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?
The 2 models in your example are represented with 3 tables: book, keyword and book_keyword relation table to manage M2M field.
When you use keywords__name in filter call Django is using SQL JOIN to merge all 3 tables. This allows you to filter objects in 1st table by values from another table.
The SQL will be like this:
SELECT `book`.`id`,
`book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE %fiction%)
After JOIN your data looks like
| Book Table | Relation table | Keyword table |
|---------------------|------------------------------------|------------------------------|
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1 | Book 1 | 1 | 1 | 1 | Science-fiction |
| 1 | Book 1 | 1 | 2 | 2 | Fiction |
| 2 | Book 2 | 2 | 2 | 2 | Fiction |
Then when data is loaded from DB into Python you only receive data from book table. As you can see the Book 1 is duplicated there
This is how Many-to-many relation and JOIN works
Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.
In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.

Ordering entries via comment count with django

I need to get entries from database with counts of comments. Can i do it with django's comment framework? I am also using a voting application which is not using GenericForeignKeys i get entries with scores like this:
class EntryManager(models.ModelManager):
def get_queryset(self):
return super(EntryManager,self).get_queryset(self).all().annotate(\
score=Sum("linkvote__value"))
But when there is foreignkeys i am being stuck. Do you have any ideas about that?
extra explaination: i need to fetch entries like this:
id | body | vote_score | comment_score |
1 | foo | 13 | 4 |
2 | bar | 4 | 1 |
after doing that, i can order them via comment_score. :)
Thans for all replies.
Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
return super(EntryManager,self).get_queryset(self).all().annotate(\
vote_score=Sum("linkvote__value")).extra(select={
'comment_score': """SELECT COUNT(*) FROM comments_comment
WHERE comments_comment.object_pk = yourapp_entry.id
AND comments_comment.content_type = %s"""
}, select_params=(entry_type,))
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.