I must implement the following hierarchical data:
Category (id, name, url)
SubCategory (id, name, url)
SubSubCategory (id, name, url)
NOTE that this is a many-to-many relationship, i.e. each node can have multiple parents or children. There will be no circular relationships (thank GOD). Only some SubSubCategory nodes may belong to multiple SubCategory nodes.
My implementation: I use a single table for this
Cat (id, type(category, subcategory, subsubcategory), name, url)
CatRelation (id, parent_id, child_id, pre_calculated_index for tree retrieval)
pre_calculated_index can be the left/right values of a modified preorder tree traversal [1, 2], or a path as in my implementation. It is calculated when a child is added to a node, so that when you retrieve a tree you only need to sort by this field and can avoid a recursive query.
Anyway my boss argued that this implementation is not ideal. He suggests having each table for each type of category, and then have a pivot tables to link them:
Category (id, name, url)
SubCategory (id, name, url)
SubSubCategory (id, name, url)
Category_SubCategory(category_id, sub_category_id)
SubCategory_SubSubCategory(sub_category_id, sub_sub_category_id)
When you retrieve a tree, you only need to join all tables. His argument is that later, when you add an attribute to one category type, you don't end up with a null field for the other types, as you would in the single-table implementation. He also argues that pre_calculated_index may become wrong, since it is calculated in code.
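To make "join all tables" concrete, here is a minimal sketch of the pivot-table layout. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; that choice, the sample rows and the name TREE_QUERY are assumptions for illustration, and the question's real stack is Django on PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption, to keep the sketch runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE SubCategory (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE SubSubCategory (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE Category_SubCategory (
    category_id INTEGER REFERENCES Category(id),
    sub_category_id INTEGER REFERENCES SubCategory(id),
    PRIMARY KEY (category_id, sub_category_id)
);
CREATE TABLE SubCategory_SubSubCategory (
    sub_category_id INTEGER REFERENCES SubCategory(id),
    sub_sub_category_id INTEGER REFERENCES SubSubCategory(id),
    PRIMARY KEY (sub_category_id, sub_sub_category_id)
);
-- Hypothetical sample data.
INSERT INTO Category VALUES (1, 'Cat1', '');
INSERT INTO SubCategory VALUES (1, 'Subcat1', ''), (2, 'Subcat2', '');
INSERT INTO SubSubCategory VALUES (1, 'Subsubcat1', '');
INSERT INTO Category_SubCategory VALUES (1, 1), (1, 2);
-- Subsubcat1 belongs to two subcategories (the many-to-many case).
INSERT INTO SubCategory_SubSubCategory VALUES (1, 1), (2, 1);
""")

# LEFT JOINs keep categories and subcategories that have no children yet.
TREE_QUERY = """
SELECT c.name, s.name, ss.name
FROM Category AS c
LEFT JOIN Category_SubCategory AS cs ON cs.category_id = c.id
LEFT JOIN SubCategory AS s ON s.id = cs.sub_category_id
LEFT JOIN SubCategory_SubSubCategory AS sss ON sss.sub_category_id = s.id
LEFT JOIN SubSubCategory AS ss ON ss.id = sss.sub_sub_category_id
ORDER BY c.id, s.id, ss.id
"""
rows = conn.execute(TREE_QUERY).fetchall()
for row in rows:
    print(row)
```

Each result row is one root-to-leaf branch, so a SubSubCategory with two parents simply appears twice, once under each SubCategory.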
Which one should I follow? Which has better performance?
I use django and postgreSQL.
PS: More detail on my pre_calculated_index implementation:
Instead of left and right for each node, I add a path (string, unique, indexed) value to CatRelation: the root node will have path = root_id + '.'
A child node, when added to CatRelation, will have path = parent_path + child_id + '.' So when you sort by this path, you get everything in tree order. Examples:
Cat
| id | name | url |
|----|------------|-----|
| 1 | Cat1 | |
| 2 | Subcat1 | |
| 3 | Subcat2 | |
| 4 | Subcat3 | |
| 5 | Subsubcat1 | |
| 6 | Subsubcat2 | |
| 7 | Subsubcat3 | |
CatRelationship (the last two columns show the left/right equivalent)
| id | parent_id | child_id | path   | lft | rght |
|----|-----------|----------|--------|-----|------|
| 1  | null      | 1        | 1.     | 1   | 14   |
| 2  | 1         | 2        | 1.2.   | 2   | 3    |
| 3  | 1         | 3        | 1.3.   | 4   | 11   |
| 4  | 1         | 4        | 1.4.   | 12  | 13   |
| 5  | 3         | 5        | 1.3.5. | 5   | 6    |
| 6  | 3         | 6        | 1.3.6. | 7   | 8    |
| 7  | 3         | 7        | 1.3.7. | 9   | 10   |
So when you sort by path (or order by left in the modified preorder tree), you get this nice tree structure without recursion:
| id | parent_id | child_id | path |
|---- |----------- |---------- |-------- |
| 1 | null | 1 | 1. |
| 2 | 1 | 2 | 1.2. |
| 3 | 1 | 3 | 1.3. |
| 5 | 3 | 5 | 1.3.5. |
| 6 | 3 | 6 | 1.3.6. |
| 7 | 3 | 7 | 1.3.7. |
| 4 | 1 | 4 | 1.4. |
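The path scheme above can be sketched in a few lines of Python. The helper names child_path and path_sort_key are made up for this illustration, and the dictionary keyed by child id assumes each node has a single parent, as in the sample rows; with multiple parents you would keep one path per CatRelation row instead.

```python
def child_path(parent_path: str, child_id: int) -> str:
    """Path of a child: the parent's path followed by the child's id and a dot."""
    return f"{parent_path}{child_id}."


def path_sort_key(path: str) -> tuple:
    """Compare paths segment by segment as integers rather than as raw strings."""
    return tuple(int(part) for part in path.split(".") if part)


# (parent_id, child_id) pairs taken from the example table.
relations = [(None, 1), (1, 2), (1, 3), (1, 4), (3, 5), (3, 6), (3, 7)]

paths = {}
for parent_id, child_id in relations:
    parent_path = "" if parent_id is None else paths[parent_id]
    paths[child_id] = child_path(parent_path, child_id)

ordered = sorted(paths.values(), key=path_sort_key)
print(ordered)
```

One thing to watch: a plain string sort places '1.10.' before '1.2.', so once ids reach two digits the path either needs zero-padded segments or a numeric comparison like path_sort_key.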
And I can always build path dynamically using recursion:
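WITH RECURSIVE CTE AS (
    -- anchor: the requested node (string literals need single quotes in PostgreSQL)
    SELECT R1.*, CONCAT(R1.child_id, '.') AS dynamic_path
    FROM CatRelation AS R1
    WHERE R1.child_id = request_id
    UNION ALL
    -- recursive step: append each child's id to its parent's path
    SELECT R2.*, CONCAT(CTE.dynamic_path, R2.child_id, '.') AS dynamic_path
    FROM CTE
    INNER JOIN CatRelation AS R2 ON (CTE.child_id = R2.parent_id)
)
SELECT * FROM CTE;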
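For anyone who wants to try the recursive query without a PostgreSQL server, here is a rough equivalent using Python's built-in sqlite3 module. SQLite is an assumption made only to keep the sketch self-contained; it uses the || operator instead of CONCAT, and the function name subtree_paths is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CatRelation (id INTEGER PRIMARY KEY, parent_id INTEGER, child_id INTEGER);
INSERT INTO CatRelation (id, parent_id, child_id) VALUES
    (1, NULL, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4),
    (5, 3, 5), (6, 3, 6), (7, 3, 7);
""")

QUERY = """
WITH RECURSIVE cte AS (
    -- anchor: the requested node starts its own path
    SELECT r1.parent_id, r1.child_id, r1.child_id || '.' AS dynamic_path
    FROM CatRelation AS r1
    WHERE r1.child_id = :request_id
    UNION ALL
    -- recursive step: extend the parent's path with the child's id
    SELECT r2.parent_id, r2.child_id, cte.dynamic_path || r2.child_id || '.'
    FROM cte
    INNER JOIN CatRelation AS r2 ON cte.child_id = r2.parent_id
)
SELECT child_id, dynamic_path FROM cte
"""


def subtree_paths(request_id):
    """Return the dynamically built paths of a node and all of its descendants."""
    rows = conn.execute(QUERY, {"request_id": request_id}).fetchall()
    return sorted(path for _child_id, path in rows)


print(subtree_paths(3))
```

The paths are relative to the requested node, so asking for node 3 yields paths that start with '3.' rather than '1.3.'.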
This is not inheritance as someone suggested
Your question is somewhat opinionated because you ask for a comparison of two different approaches. I'll try to provide an answer although I'm afraid there is no unique true answer to it. In the rest of the answer I'll refer to your approach as solution A and to the approach suggested by your boss as solution B.
I would strongly suggest following the approach proposed by your boss:
because he's your boss! If something goes wrong later, nobody can blame you. You have followed the instructions.
because it follows the "The Zen of Python".
In particular the following rules of The Zen of Python apply:
Explicit is better than implicit.
The solution B is very explicit. The solution A is implicit.
Simple is better than complex.
The solution B is very simple and straightforward. The solution A is complex.
Sparse is better than dense.
The solution B is sparse. The solution A is dense and hides the obvious from the user.
Readability counts.
The solution B is very verbose, yet easy to read. The solution A requires more time and effort to understand.
You might measure performance in ms; your boss probably thinks about performance in $. Getting a junior developer on board would require far less time with solution B. Time is expensive for enterprises.
Future changes to the models can be implemented more easily. What if you'd like to add another field to Category which shouldn't (or doesn't need to) be present in SubCategory and SubSubCategory?
Testing (unit and functional) is much easier with solution B. It may require more lines of code and be more verbose, but it would be easier to read and understand.
The performance will vary and depend on the use case. How many records will you have in the database? What's more critical: retrieving or inserting/updating? What makes the former more performant might degrade the latter, and vice versa.
I hope you have heard the sentence:
Premature optimization is the root of all evil.
given by Donald Knuth.
You can take care of performance when there are concrete issues with it. That doesn't mean you shouldn't invest any forethought in performance when designing your application.
You can cache queries; one option would be to use Redis. Since you use PostgreSQL you could also use materialized views. But as I said, I'd cross that bridge when I come to it.
EDIT:
You didn't mention anything else about any further models. I'd assume that when you have categories you'll have some entities, let's say products classified in those categories i.e. categorized. Here I'd give an example:
Category: Men
SubCategory: Sportswear
SubSubCategory: Running Shoes
Product: ACME speeedVX13 (fictive brand and model)
If you strictly follow this hierarchy and only ever put a product in a SubSubCategory, then solution B is better.
But if you have a fictive product Sportskit ACME (running shoes, shorts and sleeveless shirt) that you can't put in SubSubCategory and need to put in SubCategory, skipping one level, then you might end up using something like generic relations.
In that case solution A is better.
I have 2 tables in Power BI. One contains all transactions to and from people (each client is identified with an id, and "I" can be either the receiver or the sender of $); the other holds the details for each client.
Table 1 would look something like
| $ | sender id | receiver id |
|---|-----------| ------------|
| 10| 1 | 2 |
| 15| 1 | 3 |
| 20| 1 | 2 |
| 15| 3 | 1 |
| 10| 3 | 1 |
| 25| 2 | 1 |
| 10| 1 | 2 |
The second table contains sender id and name:
| id | name |
|----|-------|
| 1 | "me" |
| 2 | John |
| 3 | Susan |
The expected result is something like this (not necessarily in a table, just to show):
| $ sent | $ received | Balance|
|--------|------------|--------|
| 55 | 50 | +5 |
And I want a filter with "John" and "Susan", so that when I select one of them I can see $ sent, $ received and the balance for each of them.
The problem, of course, is that I end up with one active and one inactive relationship, so if I apply such a filter I get 0 in sender/receiver and the whole value in the other (depending on which relationship is active). And if I make another table with "id sender" + "name sender", then I can't filter everything at once.
Is it possible to do this?
I hope this is kinda understandable
You will need to add 2 calculated columns to your user table:
received = CALCULATE(SUM(T1[$]), FILTER(T1, UserTable[id] = T1[receiver id]))
You can do the same for sent, filtering on T1[sender id] instead. Now use the new columns in your visual.
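If it helps to see what those two calculated columns compute, here is the same logic in plain Python, outside Power BI. It is only an illustration of the aggregation: the function name totals_per_user and the dictionary layout are made up, and the rows are the sample transactions from the question.

```python
from collections import defaultdict

# (amount, sender id, receiver id) rows from the question's Table 1.
transactions = [
    (10, 1, 2), (15, 1, 3), (20, 1, 2), (15, 3, 1),
    (10, 3, 1), (25, 2, 1), (10, 1, 2),
]
users = {1: "me", 2: "John", 3: "Susan"}


def totals_per_user(rows):
    """Sum the amounts per user twice: once as sender, once as receiver."""
    sent = defaultdict(int)
    received = defaultdict(int)
    for amount, sender_id, receiver_id in rows:
        sent[sender_id] += amount
        received[receiver_id] += amount
    return {
        user_id: {"name": name, "sent": sent[user_id], "received": received[user_id]}
        for user_id, name in users.items()
    }


totals = totals_per_user(transactions)
for user_id, row in totals.items():
    print(user_id, row)
```

Because each total is stored on the user row itself, filtering by name no longer depends on which of the two relationships is active.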
Enjoy!
After going around a bit I found a way to solve this. It is probably not the most orthodox way to do it, but it works.
What I did was add 2 columns to my sales table. One was labeled "movement"; in SQL it is just a 'case' that yields "Charged" when the receiver is 'me' and "Payment" when the receiver is 'not-me'. Then I added a column with a case that always returns the 'not-me' id, and I used that for my relationship with my users table.
Then I just added filters to my cards, making one a "Payment" card and the other a "Charged" card.
This all follows the previous example. It was actually a bit more tricky, as I could have a payment from me to myself, but that's just another "case" for when it was 'me-me'.
I hope this is understandable. English is not my first language, and the information I actually used is partially confidential, so I had to make up the above example.
thanks all and have a nice day.
I have two tables (subject & category) that are both related to the same parent table (main). Because of the foreign key constraints, it looks like Power BI automatically created the links.
Simple mock-up of table links
I need to count the subjects by type for each possible distance range. I tried a simple calculation shown below for each distance category.
less than 2m =
CALCULATE(
COUNTA('Category'[Descr]),
'Subject'[Distance] IN { "less than 2m" }
)
However, the filter doesn't seem to apply properly.
I want...
+------+--------------+--------------+--+
| Descr| less than 2m | more than 2m | |
+------+--------------+--------------+--+
| Car | 2 | 1 | |
| Sign | 4 | 2 | |
+------+--------------+--------------+--+
but I'm getting...
+------+--------------+--------------+--+
| Descr| less than 2m | more than 2m | |
+------+--------------+--------------+--+
| Car | 3 | 3 | |
| Sign | 6 | 6 | |
+------+--------------+--------------+--+
It's just giving me the total count by type which is correct but isn't applying the filter by distance so I can break it down.
I'm sure this is probably really simple but I'm pretty new with DAX and I can't figure this one out.
I wish I could mark Kosuke's comment as an answer. The issue was indeed with having to enable cross-filtering. This can either be done clicking on the link on your model or using a function to temporarily enable the cross filter.
I want to create a new variable, say cheese2, that takes cheese and divides every observation by the last observation (2921333).
+----------+
| cheese |
|----------|
1. | 3060000 |
2. | 840333.3 |
3. | 1839667 |
4. | 1.17e+07 |
5. | 1374000 |
|----------|
6. | 2092333 |
7. | 341000 |
8. | 3149000 |
9. | 3557667 |
10. | 590666.7 |
|----------|
11. | 8937000 |
12. | 4142000 |
13. | 2624000 |
14. | 1973667 |
15. | 2921333 |
I would also like to do this for multiple columns at once i.e. divide multiple columns by the last row of my data set.
In Stata terminology,
create a new variable by dividing a column by the observation in the last row
becomes
create a new variable by dividing a variable by the value in the last observation.
Such a question suggests that you are storing totals in your last observation, spreadsheet style. Such a practice is undoubtedly convenient for what you are asking, but it creates obligations to exclude the last observation from almost every other manipulation and to maintain precisely the same sort order, and would generally be considered a bad idea therefore.
All that said,
gen cheese2 = cheese/cheese[_N]
is what you ask and a loop over several variables could be
foreach v of var frog newt toad lizard dragon {
gen `v'2 = `v'/`v'[_N]
}
See also the help for foreach.
INTRODUCTION AND RELEVANT INFORMATION:
I have MS ACCESS 2007 database that I edit using ADO and C++.
PROBLEM:
My problem is that primary key also represents an ordinal number of the record, and after deletion, it should be properly updated. Primary key is of autonumber type.
Here is an example of what I am talking about:
| #PK | Other data ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
| 5 | ... |
Now if I delete the 3rd record I get the following problem:
| #PK | Other data ... |
| 1 | ... |
| 2 | ... |
| 4 | ... |
| 5 | ... |
but I should get the following result:
| #PK | Other data ... |
| 1 | ... |
| 2 | ... |
| 3 | ... | // edited to reflect the change ( previous value was 4 )
| 4 | ... | // edited to reflect the change ( previous value was 5 )
If I delete last record and then insert new one I get this result:
| #PK | Other data ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
| 6 | ... | // this should be 5
QUESTIONS:
Is there a way for me to programmatically update the autonumber field after I perform the DELETE query ?
EDIT:
Since I am aware this is a bad practice, I would prefer adding new field that should be ordinal number so my table can look like this:
| #PK | Ordinal | Other data ... |
| 1 | 1 | ... |
| 2 | 2 | ... |
| 4 | 3 | ... |
| 5 | 4 | ... |
but I would prefer it to update itself automatically. If this is not possible, I would prefer to update the field with SQL query after I perform the deletion.
Thank you.
Best regards.
It is possible, but not the right way. Primary keys are used for relationships, so if you change the values, you need to update all related tables. Even if you currently don't have any related tables, you still should consider adding a separate field for the order, otherwise you may face the same problem in the future when you want to add related tables.
EDIT To answer your question:
Is there a way to add another field that would represent ordinal number and will automatically increment after inserting new record?
If you set it to autonumber, it will automatically increment, but you will not be able to modify it. You can set it to number instead, and when you insert, use SELECT MAX(ordinal) + 1 FROM mytable to increment it.
For MS Access use
ALTER TABLE Customer ALTER COLUMN CustomerID COUNTER(1,1)
For Sql Server
DBCC CHECKIDENT (orders, RESEED, 0)
This will set the value of the next ID to 1.
Ref URL# http://www.howtogeek.com/howto/database/reset-identity-column-value-in-sql-server/
I have decided to add a new field in my table that will hold the ordinal number of the record.
If we assume the field is named OrdinalNumber then the following solution worked for me:
// when inserting record, I just had to add COUNT( PK ) + 1
INSERT INTO MyTable ( OrdinalNumber , ... ) SELECT COUNT( PK ) + 1 , ...
from MyTable ;
// when deleting, I had to perform following two queries :
DELETE from MyTable where PK = ? ;
// decrement all the successors ordinal number by one
UPDATE MyTable set OrdinalNumber = ( OrdinalNumber - 1 ) where ( PK > ? );
Everything seems to work well. I wish there was an easier way though...
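For reference, the insert and delete steps above can be tried out with a small Python sketch. It uses the built-in sqlite3 module rather than MS Access with ADO (an assumption made so that it runs anywhere); the Data column and the function names insert_record and delete_record are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (PK INTEGER PRIMARY KEY AUTOINCREMENT, "
    "OrdinalNumber INTEGER, Data TEXT)"
)


def insert_record(data):
    """Insert a row whose ordinal is the current row count plus one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO MyTable (OrdinalNumber, Data) "
            "SELECT COUNT(PK) + 1, ? FROM MyTable",
            (data,),
        )


def delete_record(pk):
    """Delete a row and shift the ordinals of its successors down by one."""
    with conn:  # both statements succeed or fail together
        cursor = conn.execute("DELETE FROM MyTable WHERE PK = ?", (pk,))
        if cursor.rowcount:  # only renumber if a row was actually deleted
            conn.execute(
                "UPDATE MyTable SET OrdinalNumber = OrdinalNumber - 1 WHERE PK > ?",
                (pk,),
            )


for label in "abcde":
    insert_record(label)
delete_record(3)
insert_record("f")
rows = conn.execute(
    "SELECT PK, OrdinalNumber, Data FROM MyTable ORDER BY PK"
).fetchall()
print(rows)
```

Running the delete and the renumbering in one transaction keeps the ordinals consistent if the second statement fails, and the rowcount check avoids shifting ordinals when the given key does not exist.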
Thanks everyone for helping. I have upvoted all the answers.
I am parsing the USDA's food database and storing it in SQLite for query purposes. Each food has associated with it the quantities of the same 162 nutrients. It appears that the list of nutrients (name and units) has not changed in quite a while, and since this is a hobby project I don't expect to follow any sudden changes anyway. But each food does have a unique quantity associated with each nutrient.
So, how does one go about storing this kind of information sanely? My priorities are multi-programming language friendliness (Python and C++ having preference), sanity for me as coder, and ease of retrieving nutrient sets to sum or plot over time.
The two things that I had thought of so far were 162 columns (which I'm not particularly fond of, but it does make the queries simpler), or a food table that has a link to a nutrient_list table that then links to a static table with the nutrient name and units. The second seems more flexible in case my expectations are wrong, but I wouldn't even know where to begin on writing the queries for sums and time series.
Thanks
You should read up a bit on database normalization. Most of the normalization stuff is quite intuitive, but really going through the definition of the steps and seeing an example helps understanding the concepts and will help you greatly if you want to design a database in the future.
As for this problem, I would suggest you use 3 tables: one for the foods (let's call it foods), one for the nutrients (nutrients), and one for the specific nutrients of each food (foods_nutrients).
The foods table should have a unique index for referencing and the food's name. If the food has other data associated to it (maybe a link to a picture or a description), this data should also go here. Each separate food will get a row in this table.
The nutrients table should also have a unique index for referencing and the nutrient's name. Each of your 162 nutrients will get a row in this table.
Then you have the crossover table containing the nutrient values for each food. This table has three columns: food_id, nutrient_id and value. Each food gets 162 rows inside this table, one for each nutrient.
This way, you can add or delete nutrients and foods as you like and query everything independent of programming language (well, using SQL, but you'll have to use that anyway :) ).
Let's try an example. We have 2 foods in the foods table and 3 nutrients in the nutrients table:
+------------------+
| foods |
+---------+--------+
| food_id | name |
+---------+--------+
| 1 | Banana |
| 2 | Apple |
+---------+--------+
+-------------------------+
| nutrients |
+-------------+-----------+
| nutrient_id | name |
+-------------+-----------+
| 1 | Potassium |
| 2 | Vitamin C |
| 3 | Sugar |
+-------------+-----------+
+-------------------------------+
| foods_nutrients |
+---------+-------------+-------+
| food_id | nutrient_id | value |
+---------+-------------+-------+
| 1 | 1 | 1000 |
| 1 | 2 | 12 |
| 1 | 3 | 1 |
| 2 | 1 | 3 |
| 2 | 2 | 7 |
| 2 | 3 | 98 |
+---------+-------------+-------+
Now, to get the potassium content of a banana, you'd query:
SELECT foods_nutrients.value
FROM foods_nutrients, foods, nutrients
WHERE foods_nutrients.food_id = foods.food_id
AND foods_nutrients.nutrient_id = nutrients.nutrient_id
AND foods.name = 'Banana'
AND nutrients.name = 'Potassium';
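Since the question mentions Python and SQLite and asks where to begin with sums, here is a minimal sketch of the three-table layout using Python's built-in sqlite3 module. The function names nutrient_value and nutrient_totals are made up for this example, and the data is the small sample from the tables above, not real USDA values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foods (food_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nutrients (nutrient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE foods_nutrients (
    food_id INTEGER NOT NULL REFERENCES foods(food_id),
    nutrient_id INTEGER NOT NULL REFERENCES nutrients(nutrient_id),
    value REAL NOT NULL,
    PRIMARY KEY (food_id, nutrient_id)
);
INSERT INTO foods VALUES (1, 'Banana'), (2, 'Apple');
INSERT INTO nutrients VALUES (1, 'Potassium'), (2, 'Vitamin C'), (3, 'Sugar');
INSERT INTO foods_nutrients VALUES
    (1, 1, 1000), (1, 2, 12), (1, 3, 1),
    (2, 1, 3), (2, 2, 7), (2, 3, 98);
""")

# Shared join of the crossover table with its two lookup tables.
JOINS = """
FROM foods_nutrients AS fn
JOIN foods AS f ON f.food_id = fn.food_id
JOIN nutrients AS n ON n.nutrient_id = fn.nutrient_id
"""


def nutrient_value(food, nutrient):
    """Value of one nutrient for one food, or None if there is no such row."""
    row = conn.execute(
        f"SELECT fn.value {JOINS} WHERE f.name = ? AND n.name = ?",
        (food, nutrient),
    ).fetchone()
    return None if row is None else row[0]


def nutrient_totals(food_names):
    """Sum every nutrient over the given foods, e.g. for one meal."""
    placeholders = ", ".join("?" for _ in food_names)
    rows = conn.execute(
        f"SELECT n.name, SUM(fn.value) {JOINS} "
        f"WHERE f.name IN ({placeholders}) "
        "GROUP BY n.nutrient_id, n.name ORDER BY n.nutrient_id",
        list(food_names),
    ).fetchall()
    return dict(rows)


print(nutrient_value("Banana", "Potassium"))
print(nutrient_totals(["Banana", "Apple"]))
```

Summing is a single GROUP BY on the crossover table, so the normalized layout does not make the sum queries much harder than 162 columns would.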
Use the second (more normalized) approach.
You could even get away with fewer tables than you mentioned:
tblNutrients
-- NutrientID
-- NutrientName
-- NutrientUOM (unit of measure)
-- Otherstuff
tblFood
-- FoodId
-- FoodName
-- Otherstuff
tblFoodNutrients
-- FoodID (FK)
-- NutrientID (FK)
-- UOMCount
It will be a nightmare to maintain a 160+ field database.
If there is a time element involved too (can measurements change?) then you could add a date field to the nutrient and/or the foodnutrient table depending on what could change.