PowerBI (DAX): Dealing with Company Hierarchy and Values

I have 3 tables: Agents, Leaders and Super-Leaders.
These tables have NAME, ID and RESERVE_VALUE columns, as well as the ID of the superior hierarchy.
I am trying to simply make a Matrix visual showing the total RESERVE_VALUE under each hierarchy context, and I'd love to be able to drill down this table to show the Agents' RESERVE_VALUE below the Leaders' and Super-Leaders' values in the same column, if possible.
I linked the tables and simply put the 3 reserve columns side by side on the matrix, but that is kind of an ugly solution.
I am currently making a flat table using the Leaders as categories, and the hierarchy works fine, but I can't figure out how to deal with the values.
I researched a lot, watched some videos, and tried using ISFILTERED, HASONEVALUE, HASONEFILTER, etc., but the problem is that even when I can get the lower level right, the upper levels' values go blank.
The similar problems I've seen online are normally related to products and categories, but the main difference is that categories simply aggregate the products' values. In my case the Leaders do aggregate the values, but they also have their own values that I want to display when drilling down.
Has anyone been through something similar in PowerBI? Any suggestions on how to solve it or how to research this problem better?
Here are the sample data:
FLAT TABLE:
AGENT  LEADER  SP_LEADER  AGENT_VALUE  LEADER_VALUE  SP_VALUE
A      L1      X          100          300           0
B      L1      X          150          300           0
C      L2      Z          200          0             700
D      L2      Z          370          0             700
E      L3      Z          0            340           700
DESIRED RESULT:
The original files are like these:
AGENT Table:
NAME_AGENT LEADER AGENT_VALUE
A L1 100
B L1 150
C L2 200
D L2 370
E L3 0
LEADER Table:
NAME_LEADER SUPER_LEADER LEADER_VALUE
L1 X 300
L2 Z 0
L3 Z 340
SUPER LEADER Table:
NAME_SP_LEADER VALUE
X 0
Z 700

In PBI, you can control what the total says at each level using ISINSCOPE(), but what you cannot do is change what the total says depending on whether the node is expanded or collapsed.
What you want to do isn't possible, nor is it a good user experience for the value of a row to change depending on whether the level is expanded or collapsed. For example, you are trying to treat L1 as a leaf node with an individual value when expanded, but as a parent node when collapsed.
If you turn off stepped layout, you will see why this makes no sense as L1 is only ever a parent.
My suggestion is you rework your data. If L1 is the team name and L1 is also an individual contributor, you should have L1 form part of the hierarchy, e.g. you have A = 100, B = 150 and L1 (individual member) = 300, and then your L1 (team) will always show 550 at the total level.
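To make the rework concrete, here is a minimal pandas sketch (outside Power BI, purely illustrative; column names come from the sample tables above) that gives every leader and super-leader a leaf row of their own, so totals roll up naturally:

# A pandas sketch of the suggested rework (illustrative only): every
# leader and super-leader also gets a leaf row of their own, so the
# hierarchy rolls their individual values up naturally.
import pandas as pd

agents = pd.DataFrame({
    "NAME_AGENT": ["A", "B", "C", "D", "E"],
    "LEADER": ["L1", "L1", "L2", "L2", "L3"],
    "AGENT_VALUE": [100, 150, 200, 370, 0],
})
leaders = pd.DataFrame({
    "NAME_LEADER": ["L1", "L2", "L3"],
    "SUPER_LEADER": ["X", "Z", "Z"],
    "LEADER_VALUE": [300, 0, 340],
})
sp_leaders = pd.DataFrame({"NAME_SP_LEADER": ["X", "Z"], "VALUE": [0, 700]})

cols = ["MEMBER", "LEADER", "VALUE"]

# Agents are already leaf rows.
agent_rows = agents.rename(columns={"NAME_AGENT": "MEMBER",
                                    "AGENT_VALUE": "VALUE"})

# Each leader becomes a leaf member of their own team ...
leader_rows = leaders.rename(columns={"NAME_LEADER": "MEMBER",
                                      "LEADER_VALUE": "VALUE"})
leader_rows["LEADER"] = leader_rows["MEMBER"]

# ... and each super-leader a leaf member of their own branch.
sp_rows = sp_leaders.rename(columns={"NAME_SP_LEADER": "MEMBER"})
sp_rows["LEADER"] = sp_rows["MEMBER"]

rows = pd.concat([agent_rows[cols], leader_rows[cols], sp_rows[cols]],
                 ignore_index=True)

# Attach the super-leader level and check the roll-up: L1 totals 550.
flat = rows.merge(leaders[["NAME_LEADER", "SUPER_LEADER"]],
                  left_on="LEADER", right_on="NAME_LEADER", how="left")
flat["SUPER_LEADER"] = flat["SUPER_LEADER"].fillna(flat["LEADER"])
print(flat.groupby(["SUPER_LEADER", "LEADER"])["VALUE"].sum())

In Power BI itself the same append could be done in Power Query; the point is that each person appears exactly once as a leaf, and the team totals fall out of the hierarchy on their own.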

Related

Can't get my IF statement to work correctly

I'm trying to make it so that when X is selected, it then calls another set of code. For example:
The main selection is Type of Settlement (4 choices): City, Village, Town, Hommlet. This is in a drop-down in a named range; however, this really isn't linked to anything yet, though it should affect all of the other outcomes.
Part 2:
=SORTN('Data Sheet'!D5:E17,1,0,'Data Sheet'!E5:E17,FALSE)
Here I have it select from a list of races, i.e. Human, Dwarf, Elf, etc., along with a % chance that each race is selected. There are 3 types that will pop up, and some other formulas to calculate the % of each.
What Type of Settlement?   Hommlet   TRUE

Population   54

Races and Percentage Within the Settlement

                  %       Race          # of
Majority Race     91      Half-elf 99   49
Secondary Race    6.03    Genasi        3
Tertiary Race     2.97    Firbolg 97    1.6
Part 3, and here is where the issue is. If the Race is "Human", then it needs to pick from a list of Human names, a database list of names I have.
=If(N7="Human",INDEX('Name Lists'!B2:B15,RANDBETWEEN(1,COUNTA('Name Lists'!B2:B15))),"None")
Here is the test I did to ensure it would work before I embedded more into the IF formula. I also did a logic test to see if, when "Human" appeared, it would come back TRUE, but it always returns FALSE. If I type the word Human in another cell and direct the above code to that cell, it works.
Check out the above.

Linear Programming - Resetting a variable based on its cumulative count

Detailed business problem:
I'm trying to solve a production scheduling business problem as below:
I have two plants producing FG A and B respectively.
Both the products consume the same Raw Material x
I need to create a 30 day production schedule looking at the Raw Material availability.
FG A and B can be produced if there is sufficient raw material available on the day.
After every 6 days of production the plant has to undergo maintenance and the production on that day will be zero.
The objective is to maximize the margin, looking at the day-level raw material available, while adhering to the production constraint (i.e. shutdown after every 6th day).
I need to build a linear programming to address the below problem:
Variable y (binary): 1 if production happens on the day
Variable z: cumulative sum of y
When z > 6, then y = 0. I also need to reset the accumulation of z after this point.
Desired output:
How can I express this as a MILP constraint? Are there any techniques for solving this problem? Thank you.
I think you can model your maintenance differently. Just forbid any sequences of 7 ones for y. I.e.
y[t-6]+y[t-5]+y[t-4]+y[t-3]+y[t-2]+y[t-1]+y[t] <= 6 for t=1,..,T
This is easier than using your accumulator. Note that the beginning needs some attention: you can use historic data for this. I.e., at t=1, the values for t=0,-1,-2,.. are known.
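For concreteness, here is a minimal sketch of that rolling-window constraint in PuLP; the variable names, horizon, and history values are illustrative, and the objective and raw-material constraints from the question are omitted:

# A minimal PuLP sketch of the rolling-window maintenance constraint.
import pulp

T = 30  # planning horizon in days
model = pulp.LpProblem("production_schedule", pulp.LpMaximize)

# y[t] = 1 if the plant produces on day t, 0 if it is down for maintenance.
y = pulp.LpVariable.dicts("y", range(T), cat="Binary")

# Known history for days -1, -2, ...: assumed values here so the first
# windows are well defined (use your actual production history).
hist = {t: 1 for t in range(-6, 0)}

def y_or_hist(t):
    # Decision variable inside the horizon, known constant before it.
    return y[t] if t >= 0 else hist[t]

# No 7 consecutive production days: every 7-day window sums to <= 6.
for t in range(T):
    model += pulp.lpSum(y_or_hist(t - k) for k in range(7)) <= 6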
Your accumulator approach is not inherently wrong. We often use it to model inventory. An inventory capacity is a restriction on how large the accumulated inventory can be.

Sorting query by distance requires reading entire data set?

To perform geoqueries in DynamoDB, there are libraries in AWS (https://aws.amazon.com/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/). But to sort the results of a geoquery by distance, the entire dataset must be read, correct? If a geoquery produces a large number of results, there is no way to paginate that (on the backend, not to the user) if you're sorting by distance, is there?
You are correct. To sort all of the data points by distance from some arbitrary location, you must read all the data from your DynamoDB table.
In DynamoDB, you can only sort results using a pre-computed value that has been stored in the DynamoDB table and is being used as the sort key of the table or one of its indexes. If you need to sort by distance from a fixed location, then you can do this with DynamoDB by pre-computing each item's distance to that location and storing it as a sort key.
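For example, a minimal boto3 sketch of that idea, assuming a hypothetical table whose GSI "by-distance" uses a precomputed distance_from_hq attribute as its sort key:

import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: GSI "by-distance" keyed on region (partition key)
# and a precomputed numeric attribute distance_from_hq (sort key).
table = boto3.resource("dynamodb").Table("Locations")

resp = table.query(
    IndexName="by-distance",
    KeyConditionExpression=Key("region").eq("us-west"),
    ScanIndexForward=True,  # ascending by the stored sort key
    Limit=25,               # pagination works because the order is stored
)
nearest = resp["Items"]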
Possible Workaround (with limitations)
TLDR; it's not such a bad problem if you can get away with only sorting the items that are within X kms from an arbitrary point.
This still involves sorting the data points in memory, but it makes the problem easier by producing incomplete results (by limiting the maximum range of the results.)
To do this, you need the Geohash of your point P (from which you are measuring the distance of all other points). Suppose it is A234311. Then you need to pick what range of results is appropriate. Let's put some numbers on this to make it concrete. (I'm totally making these numbers up because the actual numbers are irrelevant for understanding the concepts.)
A - represents a 6400km by 6400km area
2 - represents a 3200km by 3200km area within A
3 - represents a 1600km by 1600km area within A2
4 - represents a 800km by 800km area within A23
3 - represents a 400km by 400km area within A234
1 - represents a 200km by 200km area within A2343
1 - represents a 100km by 100km area within A23431
Graphically, it might look like this:
View of A View of A23
|----------|-----------| |----------|-----------|
| | A21 | A22 | | | |
| A1 |-----|-----| | A231 | A232 |
| | A23 | A24 | | | |
|----------|-----------| |----------|-----------|
| | | | |A2341|A2342|
| A3 | A4 | | A233 |-----|-----|
| | | | |A2343|A2344|
|----------|-----------| |----------|-----------| ... and so on.
In this case, our point P is in A234311. Suppose also that we want to get the sorted points within 400km. A2343 is 400km by 400km, so we need to load the results from A2343 and all of its 8-connected neighbors (A2341, A2342, A2344, A2334, A2332, A4112, A4121, A4122). Once we've loaded only those into memory, you calculate the distances, sort them, and discard any results that are more than 400km away.
(You could keep the results that are more than 400km away as long as the users/clients know that beyond 400km, the data could be incomplete.)
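Sketching that in Python (hedged: load_items stands in for your per-cell DynamoDB query, neighbors for a geohash neighbor helper such as the neighbors() function in the python-geohash package, and items are assumed to carry lat/lon attributes):

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def sorted_within(p_lat, p_lon, cell, load_items, neighbors, max_km=400):
    # Gather the target cell plus its 8-connected neighbors.
    items = []
    for c in [cell] + list(neighbors(cell)):
        items.extend(load_items(c))          # one DynamoDB query per cell
    # Compute each item's distance once, sort, then cut at the radius
    # the cell ring can actually guarantee complete results for.
    tagged = [(haversine_km(p_lat, p_lon, it["lat"], it["lon"]), it)
              for it in items]
    tagged.sort(key=lambda pair: pair[0])
    return [it for d, it in tagged if d <= max_km]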
The hashing method that DynamoDB Geo library uses is very similar to a Z-Order Curve—you may find it helpful to familiarize yourself with that method as well as Part 1 and Part 2 of the AWS Database Blog on Z-Order Indexing for Multifaceted Queries in DynamoDB.
Not exactly. When querying by location you can query by a fixed query value (the partition key value) and by sort key, so you can limit your query's result set and also apply a little filtering.
I have been racking my brain while designing a DynamoDB geohash proximity locator service. For this example, customer_A wants to find all service providers_X in their area. All customers and providers have a 'g8' key that stores their precise geohash location (to 8 levels).
The accepted way to accomplish this search is to generate a secondary index from the main table with a less accurate geohash 'g4', which gives a broader area for the main query key. I am applying key overloading and composite key structures for a single-table design. The goal in this design is to return all the data required in a single query; secondary indexes can duplicate data by design (storage is cheap but CPU and bandwidth are not).
GSI1PK GSI1SK providerId Projected keys and attributes
---------------------------------------------
g4_9q5c provider pr_providerId1 name rating
g4_9q5c provider pr_providerId2 name rating
g4_9q5h provider pr_providerId3 name rating
Scenario 1: customer_A.g8_9q5cfmtk. So you issue a query where GSI1PK = g4_9q5c, and a list of two providers is returned, not the three I desire.
But using geoHash.neighbor() will return the eight surrounding neighbors like 9q5h (see reference below). That's great because there's a provider in 9q5h, but this means I have to run nine queries, one on the center and eight on the neighbors, or run 1-N until I have the minimum results I require.
But which direction to query second, NW, SW, E? This would require another level of hinting toward which neighbor has more results, without knowing first, unless you run a pre-query for weighted results. But then you run the risk of only returning favorable neighbors, as there could be new providers in previously unfavored neighbors. You could apply some ML and randomized queries into neighbors to check current counts.
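As a rough sketch of that 1-N fan-out (query_gsi1 is a stand-in for a boto3 query on GSI1PK, neighbors for a geohash neighbor helper; both hypothetical here):

# Query the customer's g4 cell first, then neighbors until enough
# providers are found.
def find_providers(customer_g8, query_gsi1, neighbors, min_results=3):
    g4 = customer_g8[:4]                    # truncate g8 hash to g4 cell
    results = list(query_gsi1("g4_" + g4))  # center cell first
    for n in neighbors(g4):                 # then the 8 surrounding cells
        if len(results) >= min_results:
            break                           # stop as soon as satisfied
        results.extend(query_gsi1("g4_" + n))
    return results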
Before the above approach I tried this design.
GSI1PK GSI1SK providerId Projected keys and attributes
---------------------------------------------
loc g8_9q5cfmtk pr_provider1
loc g8_9q5cfjgq pr_provider2
loc g8_9q5fe954 pr_provider3
Scenario 2: customer_A.g8_9q5cfmtk. So you issue a query where GSI1PK = loc and GSI1SK is between g8_9q5ca and g8_9q5fz, and a list of three providers is returned, but a ton of data was pulled and discarded.
To achieve the above query, the between-X-and-Y sort criteria is composed of 9q5c.neighbors().sorted() = 9q59, 9q5c, 9q5d, 9q5e, 9q5f, 9q5g, 9qh1, 9qh4, 9qh5. So we can just use X = 9q59 and Y = 9qh5, but there are over 50 (I really didn't count after 50) matching quadrants in such a UTF between function.
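For illustration, here is roughly what that Scenario 2 range query looks like with boto3 (table and index names are made up; the bounds come from the sorted neighbor list above):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Providers")  # hypothetical name

# Bounds from 9q5c.neighbors().sorted(): lowest 9q59, highest 9qh5.
resp = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("loc")
        & Key("GSI1SK").between("g8_9q59", "g8_9qh5"),
)
# Every quadrant sorting between the bounds comes back, so the
# out-of-range cells still have to be discarded client-side.
providers = resp["Items"]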
Regarding the hash/size table above, I would recommend using this: https://www.movable-type.co.uk/scripts/geohash.html
Geohash length Cell width Cell height
1 ≤ 5,000km × 5,000km
2 ≤ 1,250km × 625km
3 ≤ 156km × 156km
4 ≤ 39.1km × 19.5km
5 ≤ 4.89km × 4.89km
...

Increasing speed of a for loop through a List and dataframe

Currently I am dealing with a massive amount of data, in the original form of a list, through combination. I am running conditions on each set of the list through a for loop. The problem is that this small for loop is taking hours with the data. I'm looking to optimize the speed by changing some functions or vectorizing it.
I know one of the biggest no-nos is doing pandas or DataFrame operations in for loops, but I need to sum up the columns and organize it a little to get what I want. It seems unavoidable.
So you have a better understanding, each list looks something like this when it's thrown into a DataFrame:
Name Role Cost Value
0 Johnny Tsunami Driver 1000 39
1 Michael B. Jackson Pistol 2500 46
2 Bobby Zuko Pistol 3000 50
3 Greg Ritcher Lookout 200 25
Name Role Cost Value
4 Johnny Tsunami Driver 1000 39
5 Michael B. Jackson Pistol 2500 46
6 Bobby Zuko Pistol 3000 50
7 Appa Derren Lookout 250 30
This is the current loop, any ideas?
import itertools
import numpy as np
import pandas as pd

df2 = pd.DataFrame()  # accumulates the qualifying combinations

for element in itertools.product(*combine_list):
    combo = list(element)
    df = pd.DataFrame(np.array(combo).reshape(-1, 11))
    df[[2, 3]] = df[[2, 3]].apply(pd.to_numeric)
    if df[2].sum() <= 5000 and df[3].sum() > 190:
        df2 = pd.concat([df2, df], ignore_index=True)
A couple of things I've done have sliced off some time, but not enough:
* Changed df[2].sum() to df[2].values.sum() -- it's faster.
* Where the concat is in the if statement, I've tried using append and also adding the DataFrames together as a list; concat is actually 2 secs faster normally, or it ends up being about the same speed.
* Changed .apply(pd.to_numeric) to .astype(np.int64) -- it's faster as well.
I'm currently looking at PyPy and Cython as well, but I want to start here first before I go through that headache.
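One hedged direction, before reaching for PyPy/Cython: if the bottleneck is building a DataFrame per iteration, do the filtering on raw NumPy arrays (assuming columns 2 and 3 hold integer-like values, as in the loop above) and construct a single DataFrame at the end:

import itertools
import numpy as np
import pandas as pd

def filter_combos(combine_list, max_cost=5000, min_value=190):
    # Keep each candidate as a raw NumPy array; no DataFrame per iteration.
    kept = []
    for element in itertools.product(*combine_list):
        arr = np.asarray(element).reshape(-1, 11)
        cost = arr[:, 2].astype(np.int64).sum()   # column 2 = Cost
        value = arr[:, 3].astype(np.int64).sum()  # column 3 = Value
        if cost <= max_cost and value > min_value:
            kept.append(arr)
    # One DataFrame construction at the very end instead of one per hit.
    return pd.DataFrame(np.vstack(kept)) if kept else pd.DataFrame()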

Issue with MS Access 2000, repeating display of same field in query

I was having an issue with MS Access 2000 in which I try to enter the same field in a query multiple times and it only displays the field once. That is, if I entered the field with the number being (for example) 8150 multiple times, it would only display it once.
This image shows the query.
I've already checked everything on ms access 2000 to try to resolve this issue but I've come up with nothing suitable.
I know your data set is simplified, but looking at your data, inputs, etc, it appears your query is pulling from a single table and repeating results -- so there is no join consideration.
I think the issue is your DISTINCTROW in the query, which is removing all duplicate values.
If you remove the "DISTINCTROW," I believe it may give you what you are expecting. In other words, change this:
SELECT DISTINCTROW Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
To this:
SELECT Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
For what it's worth, there may also be some strategies for simplifying how this query is run in the future (less dependence on dialog box prompts), but I know you probably want to address the issue at hand first, so let me know if this doesn't do it.
-- EDIT --
The removal of distinct still applies, but I suddenly see the problem. The query is depicting the logic as "OR" of multiple values. Therefore, repeating the value does not mean multiple rows, it just means you've repeated a true condition.
For example, if I have:
Fruit Count
------ ------
Apple 1
Pear 1
Kiwi 3
and I say select where Fruit is Apple or Apple or Apple or Apple, the query is still only going to list the first row. Once the "Or" condition matches true, short-circuiting kicks in, and no other conditions matter.
That does not sound like what you want.
Here's what I think you need to do:
Get rid of the prompts within the query
Load your options into a separate table -- the repetition can occur here
Change your query to perform an inner join on the new table
New table (named "Selection" for the sake of example):
Entry Ring Number Mounting Weight
----- ----------- ----------------
1 8105 you get the idea...
2 8110
3 8110
4 8110
5 8115
6 8130
7 8130
8 8130
9 8130
10 8150
New Query:
SELECT Ring.[Ring Number], Ring.[Mounting Weight]
FROM Ring
INNER JOIN Selection ON Ring.[Ring Number] = Selection.[Ring Number]
This has the added advantage of allowing more (or fewer) than 10 records.