How to pick which column to unstack a dataframe on

I have a data set that looks like:
UniqueID CategoryType Value
A Cat1 apple
A Cat2 banana
B Cat1 orange
C Cat2 news
D Cat1 orange
D Cat2 blue
I'd like it to look like:
UniqueID  Cat1    Cat2
A         apple   banana
B         orange
C                 news
D         orange  blue
I've tried using unstack, but I can't get the index set up correctly.
Thanks

The bulk of the work is done with
df.set_index(['UniqueID', 'CategoryType']).Value.unstack(fill_value='')
CategoryType    Cat1    Cat2
UniqueID
A              apple  banana
B             orange
C                       news
D             orange    blue
We can get the rest of the formatting with
df.set_index(['UniqueID', 'CategoryType']).Value.unstack(fill_value='') \
    .rename_axis(None, axis=1).reset_index()
  UniqueID    Cat1    Cat2
0        A   apple  banana
1        B  orange
2        C            news
3        D  orange    blue
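For readers who want to run this end to end, here is a minimal sketch that rebuilds the question's sample data and applies the same set_index / unstack chain. The variable name wide is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "UniqueID": ["A", "A", "B", "C", "D", "D"],
    "CategoryType": ["Cat1", "Cat2", "Cat1", "Cat2", "Cat1", "Cat2"],
    "Value": ["apple", "banana", "orange", "news", "orange", "blue"],
})

# unstack() moves the innermost index level (CategoryType) into the columns,
# so the order of the keys passed to set_index decides what gets unstacked.
wide = (
    df.set_index(["UniqueID", "CategoryType"])["Value"]
      .unstack(fill_value="")        # missing combinations become ""
      .rename_axis(None, axis=1)     # drop the "CategoryType" label over the columns
      .reset_index()                 # turn UniqueID back into a regular column
)
print(wide)
```

To unstack on a different column, put that column last in the set_index list, or pass its name to unstack(level=...).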

You can use pivot
Edit: With some more editing and inspiration from @piRsquared's answer,
df.pivot(index='UniqueID', columns='CategoryType', values='Value').fillna('').rename_axis(None, axis=1).reset_index()
  UniqueID    Cat1    Cat2
0        A   apple  banana
1        B  orange
2        C            news
3        D  orange    blue

You can use pivot_table with fill_value
df.pivot_table(index='UniqueID', columns='CategoryType', values='Value',
               aggfunc='sum', fill_value='')
CategoryType    Cat1    Cat2
UniqueID
A              apple  banana
B             orange
C                       news
D             orange    blue

pivot works just fine:
df = df.pivot(index = "UniqueID", columns = "CategoryType", values = "Value")
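One caveat when choosing between pivot and pivot_table: pivot only reshapes, so it raises a ValueError if an (index, column) pair occurs more than once, while pivot_table aggregates the duplicates. The sketch below illustrates this on a shortened copy of the sample data. The extra "pear" row is hypothetical, and aggfunc="first" is just one possible rule for resolving duplicates.

```python
import pandas as pd

df = pd.DataFrame({
    "UniqueID": ["A", "A", "B"],
    "CategoryType": ["Cat1", "Cat2", "Cat1"],
    "Value": ["apple", "banana", "orange"],
})

# Every (UniqueID, CategoryType) pair is unique, so pivot reshapes directly.
wide = df.pivot(index="UniqueID", columns="CategoryType", values="Value").fillna("")

# Hypothetical extra row that repeats the pair (A, Cat1).
extra = pd.DataFrame({"UniqueID": ["A"], "CategoryType": ["Cat1"], "Value": ["pear"]})
dup = pd.concat([df, extra], ignore_index=True)

try:
    dup.pivot(index="UniqueID", columns="CategoryType", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # pivot cannot decide which duplicate to keep

# pivot_table needs an explicit rule; "first" keeps the first value seen.
first_seen = dup.pivot_table(index="UniqueID", columns="CategoryType",
                             values="Value", aggfunc="first", fill_value="")
print(first_seen)
```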

It took me a long time to think outside the box :)
index = df.UniqueID.unique()
columns = df.CategoryType.unique()
df1= pd.DataFrame(index=index, columns=columns)
df['match']=df.UniqueID.astype(str)+df.CategoryType
A=dict( zip( df.match, df.Value))
df1.apply(lambda x : (x.index+x.name)).applymap(A.get).replace({None:''})
Out[406]:
     Cat1    Cat2
A   apple  banana
B  orange
C            news
D  orange    blue

Related

Reporting Multiple Response questions

I have a csv dataset like this where we asked favorite colors:
id q1 q2 q3
1 red blue green
2 blue green .
3 green . .
4 blue . .
5 . . .
Is Power BI able to handle this type of reporting? I've seen recommendations to unpivot the data, which I could do, BUT I would like to keep the percentages based on respondents, NOT on mentions. That means the percentage should be calculated by dividing by 4 (the people who named a favorite color), so for example the result for green should be:
Green = 3/4 = 75% (based on 4 respondents)
Instead of
Green = 3/7 = 43% (based on 7 colors mentioned)
Thanks!

After unpivoting your sample data table looks like this:
ID Attribute Value
1 q1 red
1 q2 blue
1 q3 green
2 q1 blue
2 q2 green
3 q1 green
4 q1 blue
Now you can use this calculated table
% Colors =
VAR numIDs =
    DISTINCTCOUNT('Table'[ID])
RETURN
    SUMMARIZE(
        'Table',
        'Table'[Value],
        "Pct", DIVIDE(COUNT('Table'[Value]), numIDs)
    )
to get this result:
Value Pct
red 0.25
blue 0.75
green 0.75
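As a cross-check outside Power BI, the same respondent-based percentages can be sketched in pandas. This is only an illustration of the arithmetic, not a replacement for the DAX table. The names raw, long, and pct are made up for the example, and it counts distinct respondents per color rather than rows, which gives the same numbers as long as nobody names the same color twice.

```python
import pandas as pd

# Sample survey from the question; "." marks a missing answer.
raw = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "q1": ["red", "blue", "green", "blue", "."],
    "q2": ["blue", "green", ".", ".", "."],
    "q3": ["green", ".", ".", ".", "."],
})

# Unpivot to one row per (respondent, question), then drop missing answers.
long = raw.melt(id_vars="id", var_name="Attribute", value_name="Value")
long = long[long["Value"] != "."]

# Denominator: respondents who gave at least one answer (4 here, not 7 mentions).
respondents = long["id"].nunique()

# Numerator: distinct respondents who mentioned each color.
pct = long.groupby("Value")["id"].nunique() / respondents
print(pct)
```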

PBI | Ranking dynamically based in different levels of product granularity

I hope everyone reading this message is having a wonderful day.
I'm running into a problem and would really appreciate your help :)
I have a dataset in PBI where I have to rank by 3 different levels of granularity (Cat1, Cat2, Cat3).
WHAT I WANT?
Sales Rank per Category 1 and Brand:
I want to rank by the total Sales (Brand in Category 1) / Total Sales (Cat1)
Sales Rank per Category 2 and Brand:
I want to rank by the total Sales (Brand in Category 2) / Total Sales (Cat2)
Sales Rank per Category 3 and Brand:
I want to rank by the total Sales (Brand in Category 3) / Total Sales (Cat3)
Explaining further:
Example 1:
Having Category 1 filtered, Category 2 filtered and Category 3 filtered I want to:
Rank by the Sales of each Brand in each Category 3 Item / Total Sales(Cat3 item)
Columns Cat1, Cat2 and Cat3 are filtered in the Matrix
Company ABC has 1000$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body and segment from Cat3 Body Milk
Company XYZ has 800$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body and segment from Cat3 Body Milk
Company DZA has 200$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body and segment from Cat3 Body Milk
Find in BOLD the expected OUTPUT
Expected Output:
Brand  Cat1    Cat2  Cat3       SalesQuantity  Sales MS %  Rank by MS %
ABC    Beauty  Body  Body Milk  1000           50%         1
XYZ    Beauty  Body  Body Milk  800            40%         2
DZA    Beauty  Body  Body Milk  200            10%         3
Example 2:
Having Category 1 filtered, Category 2 filtered and Category 3 NOT filtered I want to:
Rank by the Sales of each Brand for each Category 2 Item / Total Sales(Cat2 item)
Example:
Columns Cat1, Cat2 are filtered in the Matrix
Company ABC has 1000$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body
Company ABC has 300$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Hair
Company XYZ has 800$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body
Company XYZ has 500$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Hair
Company DZA has 200$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Body
Company DZA has 200$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Hair
Company DZA has 250$ in Sales in the segment from Cat1 Beauty and
segment from Cat2 Hands
Find in BOLD the expected OUTPUT
Expected Output:
Brand  Cat1    Cat2   Sales Quantity  Sales MS %  Rank by MS %
ABC    Beauty  Body   1000            50%         1
ABC    Beauty  Hair   300             30%         2
XYZ    Beauty  Body   800             40%         2
XYZ    Beauty  Hair   500             50%         1
DZA    Beauty  Body   200             10%         3
DZA    Beauty  Hair   200             20%         3
DZA    Beauty  Hands  250             100%        1
Example 3:
Having Category 1 filtered, Category 2 NOT FILTERED and Category 3 NOT FILTERED I want to:
Rank by the Sales of each Brand for each Category 1 Item / Total Sales(Cat1 item)
Example:
Company ABC has 1000$ in Sales in the segment from Cat1 Beauty
Company ABC has 300$ in Sales in the segment from Cat1 Home
Company ABC has 300$ in Sales in the segment from Cat1 Men's Clothing
Company XYZ has 800$ in Sales in the segment from Cat1 Beauty
Company XYZ has 500$ in Sales in the segment from Cat1 Home
Company XYZ has 500$ in Sales in the segment from Cat1 Men's Clothing
Company DZA has 200$ in Sales in the segment from Cat1 Beauty
Company DZA has 200$ in Sales in the segment from Cat1 Home
Company DZA has 200$ in Sales in the segment from Cat1 Men's Clothing
Company DZA has 200$ in Sales in the segment from Cat1 Women's Clothing
Find in BOLD the expected OUTPUT
Expected Output:
Brand  Cat1              Sales Quantity  Sales MS %  Rank by MS %
ABC    Beauty            1000            50%         1
ABC    Home              300             30%         2
ABC    Men's Clothing    300             30%         2
XYZ    Beauty            800             40%         2
XYZ    Home              500             50%         1
XYZ    Men's Clothing    500             50%         1
DZA    Beauty            200             10%         3
DZA    Home              200             20%         3
DZA    Men's Clothing    200             20%         3
DZA    Women's Clothing  200             100%        1
What do I need from you?
Is the measure MS Volume % built correctly? I want the measure to calculate based on the Cat levels that are filtered.
MS Volume % =
DIVIDE (
    CALCULATE ( SUM ( FACT_SALES[SALES_QTY] ) ),
    CALCULATE (
        SUM ( FACT_SALES[SALES_QTY] ),
        ALLEXCEPT ( DIM_PRODUCT, DIM_PRODUCT[Cat1], DIM_PRODUCT[Cat2], DIM_PRODUCT[Cat3] )
    )
)
What could be an example formula to give me what I want? I've tried RANKX several times, but I'm not seeing the expected results (you can find the Rank MS % measure that I'm trying to build in the linked file).
Is there any other suggestion to produce a similar output with a different structure?
DUMMY DATA:
Please find the PBI file and dataset in the link: [PBI + Dataset]
Thank you very much
Diego
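To make the expected numbers easy to verify, the share-and-rank logic described above can be sketched in pandas using the Example 2 data. This is not DAX and not a Power BI measure. The helper share_and_rank and its levels parameter are hypothetical names, and the choice of dense ranking is an assumption about how ties should be handled.

```python
import pandas as pd

# Example 2 data from the question (Cat1 and Cat2 filtered).
sales = pd.DataFrame({
    "Brand": ["ABC", "ABC", "XYZ", "XYZ", "DZA", "DZA", "DZA"],
    "Cat1": ["Beauty"] * 7,
    "Cat2": ["Body", "Hair", "Body", "Hair", "Body", "Hair", "Hands"],
    "SalesQuantity": [1000, 300, 800, 500, 200, 200, 250],
})


def share_and_rank(frame, levels):
    """Market share and rank of each brand within the given category levels."""
    by_brand = frame.groupby(levels + ["Brand"], as_index=False)["SalesQuantity"].sum()
    # Denominator: total sales of the category item the brand sits in.
    totals = by_brand.groupby(levels)["SalesQuantity"].transform("sum")
    by_brand["MS"] = by_brand["SalesQuantity"] / totals
    # Rank brands inside each category item, highest share first.
    by_brand["Rank"] = (
        by_brand.groupby(levels)["MS"].rank(method="dense", ascending=False).astype(int)
    )
    return by_brand


result = share_and_rank(sales, ["Cat1", "Cat2"])
print(result)
```

Passing ["Cat1"] or ["Cat1", "Cat2", "Cat3"] as levels (with a Cat3 column present) would switch the granularity in the same way the three examples do.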

google sheets - list (or numbering) depending on content in Columns and rows

In the following sheet I am trying to number rows depending on their content:
https://docs.google.com/spreadsheets/d/1Nu8il1f-5sR32hEPGbdHUSRODRbGhFRIV2sMzPv1GvY/edit?usp=sharing
I have 2 sheets: one with the content and one with a query sorted by column A (Kategory), B (SUB Kategory), and C (SUB SUB Kategory).
This is the content sheet:
Kategory SUB Kategory SUB SUB Kategory
Fruits Orange Italia
Fruits Apple New Zealand
Fruits Cherry Australia
Vegetables Tomato France
Fish Salmon Canada
Meat Pork Ireland
Fruits Orange Israel
Fruits Apple Germany
Fish Salmon New Zealand
Fish Makrele Germany
Vegetables Tomato Spain
Vegetables Cucumber Germany
The goal is to number in the sheet with the query depending on the content as follows:
Kategory SUB Kategory SUB SUB Kategory Numbering #
Fish Makrele Germany 1.1.1
Fish Salmon Canada 1.2.1
Fish Salmon New Zealand 1.2.2
Fruits Apple Germany 2.1.1
Fruits Apple New Zealand 2.1.2
Fruits Cherry Australia 2.2.1
Fruits Orange Israel 2.3.1
Fruits Orange Italia 2.3.2
Meat Pork Ireland 3.1.1
Vegetables Cucumber Germany 4.1.1
Vegetables Tomato France 4.2.1
Vegetables Tomato Spain 4.2.2
The content will grow, and the query updates the numbering sheet every time, so the numbering should update as well, similar to a table of contents.
Maybe there is a way to achieve this.
Thanks, jenar
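The numbering rule itself amounts to three nested counters over the sorted rows: one per Kategory, one per SUB Kategory inside its Kategory, and one per row inside its SUB Kategory. A minimal Python sketch of that rule, using the sample rows from the content sheet, is shown here purely to pin down the logic. The function name number_rows is hypothetical, and this is not a Google Sheets formula.

```python
from itertools import groupby

# Rows from the content sheet: (Kategory, SUB Kategory, SUB SUB Kategory).
rows = [
    ("Fruits", "Orange", "Italia"), ("Fruits", "Apple", "New Zealand"),
    ("Fruits", "Cherry", "Australia"), ("Vegetables", "Tomato", "France"),
    ("Fish", "Salmon", "Canada"), ("Meat", "Pork", "Ireland"),
    ("Fruits", "Orange", "Israel"), ("Fruits", "Apple", "Germany"),
    ("Fish", "Salmon", "New Zealand"), ("Fish", "Makrele", "Germany"),
    ("Vegetables", "Tomato", "Spain"), ("Vegetables", "Cucumber", "Germany"),
]


def number_rows(rows):
    """Return (row, "a.b.c") pairs for the rows sorted by all three columns."""
    numbered = []
    ordered = sorted(rows)
    # Level 1: position of the Kategory among all categories.
    for i, (_, cat_rows) in enumerate(groupby(ordered, key=lambda r: r[0]), start=1):
        # Level 2: position of the SUB Kategory within its Kategory.
        for j, (_, sub_rows) in enumerate(groupby(cat_rows, key=lambda r: r[1]), start=1):
            # Level 3: position of the row within its SUB Kategory.
            for k, row in enumerate(sub_rows, start=1):
                numbered.append((row, f"{i}.{j}.{k}"))
    return numbered


for row, number in number_rows(rows):
    print(*row, number)
```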

You can do it by repeated use of match, first on unique values in column A, then on unique values in A&B combined, then on values in A&B&C combined (because we know that these are already unique):
=ArrayFormula(if (A2:A="",,match(A2:A,UNIQUE(A2:A))&
"."&match(A2:A&B2:B,unique(A2:A&B2:B),0)-match(A2:A,index(unique({A2:A,B2:B}),0,1),0)+1&
"."&match(A2:A&B2:B&C2:C,A2:A&B2:B&C2:C,0)-match(A2:A&B2:B,A2:A&B2:B,0)+1))
or slightly shorter:
=ArrayFormula(if (A2:A="",,match(A2:A,UNIQUE(A2:A))&
"."&match(A2:A&B2:B,unique(A2:A&B2:B),0)-match(A2:A&"*",unique(A2:A&B2:B),0)+1&
"."&match(A2:A&B2:B&C2:C,A2:A&B2:B&C2:C,0)-match(A2:A&B2:B,A2:A&B2:B,0)+1))

How to get subtotals on particular category in Power BI

I am trying to understand DAX. I want to get subtotals for category Cat1 in a Matrix visual. This is my table
Cat1 Cat2 Sales
Fruits Apples 30
Fruits Apples 50
Fruits Apples 100
Fruits Cherries 50
Fruits Cherries 80
Vegetables Tomatoes 20
Vegetables Tomatoes 30
Vegetables Tomatoes 10
After creating the measure below,
Sales running total in Cat1 =
CALCULATE(
    SUM('Table1'[Sales]),
    FILTER(
        ALLSELECTED('Table1'[Cat1]),
        ISONORAFTER('Table1'[Cat1], MAX('Table1'[Cat1]), DESC)
    )
)
I would expect subtotals to appear for Cat1 only, but they appear also for Cat2. And the result in Matrix does not change if I change Cat1 to Cat2. What is then the purpose of having Cat1 or Cat2 in the DAX formula?
Example of expected result:
Cat1 Cat2 Sales
Fruits Apples 30
Fruits Apples 50
Fruits Apples 100
Fruits Cherries 50
Fruits Cherries 80
Total: 310
Vegetables Tomatoes 20
Vegetables Tomatoes 30
Vegetables Tomatoes 10
Total: 60
And this is what I get:
Cat1 Cat2 Sales
Fruits Apples 30
Fruits Apples 50
Fruits Apples 100
Total: 180
Fruits Cherries 50
Fruits Cherries 80
Total: 110
Total: 310
Vegetables Tomatoes 20
Vegetables Tomatoes 30
Vegetables Tomatoes 10
Total: 60
Total: 60
Total: 370
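For reference, the subtotals in the expected result are plain sums grouped by Cat1 only. The pandas sketch below reproduces those numbers from the sample table. It is not DAX and does not explain the Matrix behaviour, and the names table1 and cat1_totals are only illustrative.

```python
import pandas as pd

# Sample table from the question.
table1 = pd.DataFrame({
    "Cat1": ["Fruits"] * 5 + ["Vegetables"] * 3,
    "Cat2": ["Apples"] * 3 + ["Cherries"] * 2 + ["Tomatoes"] * 3,
    "Sales": [30, 50, 100, 50, 80, 20, 30, 10],
})

# One subtotal per Cat1 value; Cat2 is deliberately left out of the grouping.
cat1_totals = table1.groupby("Cat1")["Sales"].sum()
grand_total = table1["Sales"].sum()

print(cat1_totals)
print("Total:", grand_total)
```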

Create a table with all permutations of some column values in SAS

I am working in SAS Enterprise Guide and want to create a table that contains all possible combinations of the values in some columns. Here is an example:
Let's say I have three columns
apple  pear    plum
0      good    blue
1      middle  violet
       bad
I would want my output table to look as follows:
apple pear plum
0 good blue
0 good violet
0 middle blue
0 middle violet
0 bad blue
0 bad violet
1 good blue
1 good violet
1 middle blue
1 middle violet
1 bad blue
1 bad violet
My actual data has more columns with more distinct values, so hard coding is definitely not an option. How can I create such a table in SAS?
Thanks in advance for the help!

You can use PROC SQL to create the full cross product (Cartesian product) of the distinct non-missing values of each column.
proc sql ;
create table want as
select *
from (select distinct apple from have where not missing(apple))
, (select distinct pear from have where not missing(pear))
, (select distinct plum from have where not missing(plum))
;
quit;
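The same idea, every combination of one distinct value per column, can be illustrated outside SAS with Python's itertools.product. This is only a sketch of the cross product, with the distinct values typed in by hand from the question's example. The names columns and combos are made up for the illustration.

```python
from itertools import product

# Distinct non-missing values per column, as in the question's example.
columns = {
    "apple": [0, 1],
    "pear": ["good", "middle", "bad"],
    "plum": ["blue", "violet"],
}

# product() yields one tuple per combination, varying the last column fastest.
combos = [dict(zip(columns, values)) for values in product(*columns.values())]

for combo in combos:
    print(combo["apple"], combo["pear"], combo["plum"])
```

The number of output rows is the product of the distinct counts (2 x 3 x 2 = 12 here), so the result grows quickly as columns are added.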

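Alternatively, use PROC SUMMARY with the COMPLETETYPES option, which outputs every combination of the class variable values: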
data testx;
input apple pear $ plum $;
cards;
0 good blue
1 middle violet
1 bad blue
;;;;
run;
proc summary nway completetypes chartype;
class _all_;
output out=testb(drop=_:);
run;
proc print;
run;
Obs apple pear plum
1 0 bad blue
2 0 bad violet
3 0 good blue
4 0 good violet
5 0 middle blue
6 0 middle violet
7 1 bad blue
8 1 bad violet
9 1 good blue
10 1 good violet
11 1 middle blue
12 1 middle violet

We Keep Coding
