How to deal with a numeric attribute in a dimension? - powerbi

I have a model where, in my Project dimension, I have an attribute (AgreedReturn) which is a fixed number.
I included it in my model, all good.
My problem now is that I have a requirement where users want a SUM of that attribute.
This is confusing to me. To me it clearly belongs in the dimension, but the fact that they want 'math' applied to it muddies things.
What is the right modeling approach to this? Am I forced to move this attribute to the fact somehow?

Numbers in dimensions are not problematic, and including them in calculations is not problematic. For example, a Product may have a weight, which you could multiply by a quantity from a fact table to produce a shipment weight.
So your problem is just one of understanding what the users are trying to measure, and figuring out how to calculate it.
And it may well be that it's more convenient to push this attribute to the fact table during ETL, or with a calculated column. But that's a secondary issue.
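If you do push the attribute onto the fact table during ETL, the idea is just a lookup from the dimension followed by row-level math. Here is a minimal sketch in Python/pandas purely to illustrate the modeling; the table and column names are made up, and in Power BI itself this would typically be a calculated column or a measure instead:

```python
import pandas as pd

# Hypothetical dimension and fact tables (names and values are assumptions).
dim_project = pd.DataFrame({
    "ProjectKey": [1, 2, 3],
    "AgreedReturn": [0.05, 0.07, 0.04],
})
fact_sales = pd.DataFrame({
    "ProjectKey": [1, 1, 2, 3],
    "Amount": [100.0, 250.0, 80.0, 40.0],
})

# Push the dimension attribute onto each fact row during ETL...
fact_sales = fact_sales.merge(dim_project, on="ProjectKey", how="left")

# ...so that row-level math, and a plain SUM over it, becomes straightforward.
fact_sales["ExpectedReturn"] = fact_sales["Amount"] * fact_sales["AgreedReturn"]
print(fact_sales["ExpectedReturn"].sum())
```

Whether a sum like this is what the users actually mean (as opposed to, say, counting each project's AgreedReturn only once) is exactly the question to clarify with them first.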

Related

POWERBI: takes the whole table when there is no available relation

I have a slight issue with my tables in Power BI. In short, I have a missing link in one of my relations. As a result, instead of returning NOTHING, which is logical and actually what I would like, it returns EVERYTHING.
A bit more detail: I have multiple tables with relations between them. The problem is that I have a few task_group rows pointing to shipments that do not exist. In my visualization, I am trying to access data that is linked to a shipment (a count of the number of Packages linked to that shipment). The logical behaviour for me would be: "if there is no shipment matching the number given in the shipment table, then you cannot count the number of packages linked to that shipment".
But Power BI begs to differ. Its idea is: "if I cannot find a shipment to link to the package, I'm going to take every single package regardless of shipment". As a result, a task group that does not have any packages ends up showing as having all the packages instead. How can I tell Power BI to return nothing when it doesn't find anything, instead of returning everything?
Image of my relationships
I think Power BI behaves slightly unintuitively where there are nulls on one side of a join.
Have you tried filtering to only include where shipment_id is not blank?
If the problem is that you have NULLs on one side of the relationship, the best way to tackle this would be to replace the NULLs with something else. You can do that in two ways:
Edit the NULL shipment numbers to something else in Power Query while importing (some number which is not likely to be an actual shipment, maybe 0)
Create a calculated column in DAX replacing the blanks/NULLs and use that in the relationship instead
But I think you may have NULLs on both sides of the relationship; that is the only explanation I can think of for why Power BI is behaving this way. Either way, the above solutions should fix it.

Using column versions for time series

In the official documentation there is a passage whose reasoning I can't fully understand:
When working with time series, do not leverage the transactional behavior of rows. Changes to data in an existing row should be stored as a new, separate row, not changed in the existing row. This is an easier model to construct, and it enables you to maintain a history of activity without relying upon column versions.
The last sentence is not obvious or concrete, so it doesn't convince me. For now, using versioning to update a cell's data still looks to me like a good fit for the update task. At least versions are managed by BigTable, so it's a simpler solution.
Can anybody please provide a more concrete explanation of why versioning shouldn't be used in that use case?
Earlier on that page, under Patterns for row key design, a bit more detail is given. The high-level view is that using row keys instead of column versions will:
Make it easier to run queries across your data, allowing for scanning of less data.
Avoid going over the recommended maximum row size.
The one caveat being:
It is acceptable to use versions of a column where the use case is actually amending a value, and the value's history is important. For example, suppose you did a set of calculations based on the closing price of ZXZZT, and initially the data was mistakenly entered as 559.40 for the closing price instead of 558.40. In this case, it might be important to know the value's history in case the incorrect value had caused other miscalculations.
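To make the row-key pattern concrete, here is a minimal sketch using the Python client (google-cloud-bigtable); the project, instance, table, and column-family names are assumptions for illustration. Each observation is written as a new row whose key embeds the timestamp, rather than as a new version of a cell in one existing row:

```python
from datetime import datetime, timezone

from google.cloud import bigtable

# Assumed project/instance/table/column-family names, purely for illustration.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("market_data")

def write_price(symbol: str, price: float) -> None:
    # Encode the time into the row key so each observation is its own row.
    # A zero-padded reversed timestamp keeps the most recent rows first in a scan.
    ts = int(datetime.now(timezone.utc).timestamp())
    reversed_ts = 10_000_000_000 - ts
    row_key = f"{symbol}#{reversed_ts:011d}".encode()

    row = table.direct_row(row_key)
    row.set_cell("prices", b"close", str(price).encode())
    row.commit()

write_price("ZXZZT", 558.40)
```

With this layout a scan over the "ZXZZT#" prefix returns the history directly, without relying on cell versions or risking oversized rows.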

Missing values in PowerBI slicer

I have a problem in my cube with a dimension used as a slicer or a filter. The dimension has a number (no) and a name, and I want to slice on name, but some of the values are missing. It is the same problem if I use the number. If I use the number as a filter and use the advanced filter to search for the specific number, the data works, but the slicer for name is still empty. The strange thing is that I can see the number and name fine in the data area, just not in the filter or when I use a slicer.
I have the same problem in PowerBI desktop and in the online version.
Edited:
I have several facts using the same dimensions. I have found that disabling a specific relationship from a fact to one of the dimensions makes the problem disappear. There are now a few too many values in the slicer, but at least it no longer excludes values I can see. The only problem is that I need the relationship. I have checked whether there could be a problem with the values used by the relationship, such as missing or null values, but all the values are there.
My colleague found the solution for my dashboard: when you go to the options of the X axis there is a Start and End box. In my End box there was a number, which caused the last value to disappear. When I removed this number (and Auto appeared), my graph was complete again.
Removing the problematic fact table and adding it back again solved the problem.
It was quite hard to find out what was causing the problem to begin with. I went about it by first creating a minimal cube to make sure that the core data was correct. After verifying that, I opened the problematic cube and started removing facts and dimensions until the problem disappeared.

Add Indexes (db_index=True)

I'm reading a book about coding style in Django and one thing they discuss is db_index=True. Ever since I started using Django, I've never used this function because I'm not really sure what it does.
So my question is, when to consider adding indexes?
This is not really Django-specific; it has more to do with databases in general. You add an index on a column when you want to speed up searches on that column.
Typically, only the primary key is indexed by the database. This means lookups using the primary key are already optimized.
If you do a lot of lookups on a secondary column, consider adding an index to that column to speed things up.
Keep in mind, like most problems of scale, these only apply if you have a statistically large number of rows (10,000 is not large).
Additionally, every time you do an insert, the indexes need to be updated. So be careful about which columns you add indexes to.
As always, you can only optimize what you can measure - so use the EXPLAIN statement and your database logs (especially any slow query logs) to find out where indexes can be useful.
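As an illustration, here is a minimal sketch of both ways to declare indexes on a Django model; the model and field names are made up. Both `db_index=True` on a field and `Meta.indexes` end up as database indexes created by the migration you generate afterwards:

```python
from django.db import models

class Customer(models.Model):
    # Frequently filtered on, so worth a single-column index.
    email = models.CharField(max_length=254, db_index=True)
    city = models.CharField(max_length=100)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # Composite index for queries that filter on city and created_at together.
            models.Index(fields=["city", "created_at"], name="customer_city_created_idx"),
        ]
```

A query such as `Customer.objects.filter(email="a@example.com")` can then use the index instead of scanning the whole table; on recent Django versions you can check the plan with `QuerySet.explain()`, or with your database's own EXPLAIN.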
The above answer is correct, but in some cases the search is done on columns with a varchar datatype, such as an email address, and there you also need to add an index explicitly.
Following is one way of doing that, by declaring an index in the model's Meta.indexes:
Index(name='covering_index', fields=['headline'], include=['pub_date'])  # 'include' (covering indexes) requires Django 3.2+
Reference: https://docs.djangoproject.com/en/3.2/ref/models/indexes/

Sorting Records using Foreign Keys

From my question at Get Foreign Key Value, I managed to get the desired output...only one last bit remains. I want to sort my records by the year, make, then model in that order. I thought it'd be as simple as Vehicle.objects.all().order_by('common_vehicle') but this doesn't sort anything.
You have to order by specific fields in the related class. You do this by using the double-underscore format. So, for example:
Vehicle.objects.order_by('common_vehicle__year', 'common_vehicle__series__model__model')
to sort by the year value of the CommonVehicle class, then the model value of the Model class which is related via the Series class.
Note that this is a lot of joins, and could make your query performance quite slow. It may be fine for your needs, but just a heads-up that this is a potential source of slowness down the line.
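For reference, a sketch of what the related models might look like for that double-underscore path to resolve; the field names here are inferred from the query above and may differ from the original models:

```python
from django.db import models

class Model(models.Model):
    model = models.CharField(max_length=50)   # e.g. the model name itself

class Series(models.Model):
    model = models.ForeignKey(Model, on_delete=models.CASCADE)

class CommonVehicle(models.Model):
    year = models.IntegerField()
    series = models.ForeignKey(Series, on_delete=models.CASCADE)

class Vehicle(models.Model):
    common_vehicle = models.ForeignKey(CommonVehicle, on_delete=models.CASCADE)

# 'common_vehicle__year' follows Vehicle -> CommonVehicle.year;
# 'common_vehicle__series__model__model' follows
# Vehicle -> CommonVehicle -> Series -> Model.model.
ordered = Vehicle.objects.order_by(
    "common_vehicle__year", "common_vehicle__series__model__model"
)
```

Each double underscore in the lookup path is one join, which is where the performance caution above comes from.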