Which datasets-merging operation would do this in Pandas? - python-2.7

Let's say I have two pandas DataFrames, X and Y:
X =
+---+----------+---------+
| | Value1 | Value2 |
+---+----------+---------+
| A | 1 | NaN |
| B | 0 | 0 |
+---+----------+---------+
Y =
+---+----------+---------+
| | Value1 | Value2 |
+---+----------+---------+
| A | 2 | NaN |
| C | 30 | NaN |
+---+----------+---------+
I want to merge / join them based on the index (row name) resulting in this:
+---+----------+---------+
| | Value1 | Value2 |
+---+----------+---------+
| A | 1 | 2 |
| B | 0 | 0 |
| C | 30 | NaN |
+---+----------+---------+
Using merge with how='outer', the resulting table has a separate set of columns per input table instead of combining them. I need something that appends new rows to the end, but also fills the next free column when the index matches.
This is the result of an 'outer' merge:
+---+----------+---------+----------+---------+
| | Value1_X | Value2_X| Value1_Y | Value2_Y|
+---+----------+---------+----------+---------+
| A | 1 | NaN | 2 | NaN |
| B | 0 | 0 | NaN | NaN |
| C | NaN | NaN | 30 | NaN |
+---+----------+---------+----------+---------+
Which is almost what I want, but ignoring the original column labels...

On the result of the 'outer' merge:
X =
+---+----------+---------+----------+---------+
| | Value1_X | Value2_X| Value1_Y | Value2_Y|
+---+----------+---------+----------+---------+
| A | 1 | NaN | 2 | NaN |
| B | 0 | 0 | NaN | NaN |
| C | NaN | NaN | 30 | NaN |
+---+----------+---------+----------+---------+
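apply a row-wise function that drops the NaN values and packs what remains to the left:
X = X.apply(lambda x: pd.Series(x.dropna().values), axis=1)
which gives (note that the original column labels are replaced by 0 and 1):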
0 1
A 1.0 2.0
B 0.0 0.0
C 30.0 NaN
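The steps above can be put together as a minimal, self-contained sketch. It rebuilds X and Y from the question, uses `join` with suffixes for the outer merge (an assumption; the question does not show the exact merge call), and restores the original column labels at the end, which only works when the packed result has no more columns than X.

```python
import numpy as np
import pandas as pd

# The two frames from the question, indexed by row name.
X = pd.DataFrame({"Value1": [1, 0], "Value2": [np.nan, 0]}, index=["A", "B"])
Y = pd.DataFrame({"Value1": [2, 30], "Value2": [np.nan, np.nan]}, index=["A", "C"])

# Outer join on the index; the suffixes keep the overlapping column names apart.
merged = X.join(Y, how="outer", lsuffix="_X", rsuffix="_Y")

# Drop the NaN values in each row and pack what is left to the left.
packed = merged.apply(lambda row: pd.Series(row.dropna().values), axis=1)

# Restore the original labels (assumes packed is no wider than X).
packed.columns = X.columns[: packed.shape[1]]
print(packed)
```

Because NaN is present, the values come back as floats, as in the output shown above.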

Related

How to Sum all working days for each month but restart from 0 for every month in power Bi Dax

I would like to know how I can get a running sum of working days in my table that restarts from 0 at the start of each month.
This is my DateTable at the moment, with this DAX expression for Work Days Sum:
Work Days Sum =
CALCULATE (
SUM ( 'DateTable'[Is working Day] ),
ALL ( 'DateTable' ),
'DateTable'[Date] <= EARLIER ( 'DateTable'[Date] )
)
Date | Month Order | Is working day | Work Days Sum |
January - 21 331
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 890
2022/02/01 | 2 | 1 | 22 |
2022/02/02 | 2 | 1 | 23 |
2022/02/03 | 2 | 1 | 24 |
2022/02/04 | 2 | 1 | 25 |
|
|
V
Date | Month Order | Is working day | Work Days Sum |
January - 21 21
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 41
2022/02/01 | 2 | 1 | 1 |
2022/02/02 | 2 | 1 | 2 |
2022/02/03 | 2 | 1 | 3 |
2022/02/04 | 2 | 1 | 4 |
2022/02/05 | 2 | 0 | 4 |
.....
Any idea on how I can change my DAX query to produce the second table (the one below the down arrow) would be much appreciated.
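The desired behaviour can be sketched outside DAX to make the logic explicit. The Python below is only an illustration, not a DAX answer: `running_workdays` is a hypothetical helper, and it assumes every Monday to Friday is a working day (no holidays), which happens to match the January and February 2022 counts in the tables. The key point is that the running total is reset whenever the month changes. In the calculated column, the same idea would probably amount to also restricting the filter to rows whose month matches the current row's month.

```python
from datetime import date, timedelta


def running_workdays(start, end):
    """Running count of working days per date, restarting each month."""
    totals = {}
    running = 0
    current_month = None
    day = start
    while day <= end:
        month_key = (day.year, day.month)
        if month_key != current_month:
            # A new month has started: reset the running total.
            current_month = month_key
            running = 0
        # Assumption: Monday (0) to Friday (4) are working days.
        if day.weekday() < 5:
            running += 1
        totals[day] = running
        day += timedelta(days=1)
    return totals


totals = running_workdays(date(2022, 1, 1), date(2022, 2, 28))
print(totals[date(2022, 1, 31)], totals[date(2022, 2, 1)])
```

Unlike the tables above, this sketch reports 0 rather than a blank for days before the first working day of a month.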

Yearly conditional sum in SAS

I have the table below:
+------+------+------+------+------+-----+
| Yr | col1 | col2 | col3 | col4 | PQR |
+------+------+------+------+------+-----+
| 2012 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 4 |
| 2013 | 1 | 1 | 1 | 1 | 6 |
| 2014 | 0 | 0 | 0 | 0 | 8 |
| 2012 | 1 | 0 | 1 | 1 | 7 |
| 2013 | 0 | 1 | 0 | 0 | 3 |
| 2014 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 10 |
| 2014 | 0 | 0 | 1 | 0 | 12 |
| 2014 | 0 | 0 | 0 | 0 | 5 |
+------+------+------+------+------+-----+
The output I want is as below
+------+-------+------+------+------+
| | Total | 2012 | 2013 | 2014 |
+------+-------+------+------+------+
| col1 | 17 | 9 | 6 | 2 |
| col2 | 23 | 14 | 9 | 0 |
| col3 | 29 | 9 | 6 | 14 |
| col4 | 17 | 9 | 6 | 2 |
+------+-------+------+------+------+
For row col1 in my output table:
The column `Total` is `SUM(PQR)` over the input rows where `col1` is 1, which gives `17`.
The column `2012` is `SUM(PQR)` over the input rows where `col1` is 1 and `Yr=2012`, which gives `9`.
Similarly, `6` in column `2013` is `SUM(PQR)` where `col1` is 1 and `Yr=2013`.
Hope the process to get output table is understood
I want to achieve the above result with SAS.
Any help will be really appreciated
Transpose the data into a categorical form and use PQR as a weight in your aggregating sum. Proc TABULATE is very adept at creating such tabulations.
data have;
infile datalines dlm='|';
input Yr col1 col2 col3 col4 PQR;
datalines;
| 2012 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 4 |
| 2013 | 1 | 1 | 1 | 1 | 6 |
| 2014 | 0 | 0 | 0 | 0 | 8 |
| 2012 | 1 | 0 | 1 | 1 | 7 |
| 2013 | 0 | 1 | 0 | 0 | 3 |
| 2014 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 10 |
| 2014 | 0 | 0 | 1 | 0 | 12 |
| 2014 | 0 | 0 | 0 | 0 | 5 |
run;
data have_row_id / view=have_row_id;
set have;
rowid+1;
run;
proc transpose data=have_row_id out=have_categorical;
by rowid yr pqr;
run;
proc tabulate data=have_categorical;
class yr _name_;
var col1;
weight pqr;
table _name_='', col1='' * sum=''*f=8. * (all='Total' yr='') / nocellmerge;
run;
The ='' syntax suppresses the label cells and makes the output more compact.
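As a cross-check of the weighting idea, here is a small pandas sketch (Python rather than SAS, purely for illustration). Multiplying each 0/1 flag column by PQR and summing per year is the same arithmetic as a PQR-weighted sum of the flags. The names `have`, `weighted`, and `by_year` are chosen here and do not come from the original post.

```python
import pandas as pd

# The input table from the question.
have = pd.DataFrame({
    "Yr":   [2012, 2012, 2013, 2014, 2012, 2013, 2014, 2012, 2014, 2014],
    "col1": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    "col2": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    "col3": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "col4": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    "PQR":  [2, 4, 6, 8, 7, 3, 2, 10, 12, 5],
})
flags = ["col1", "col2", "col3", "col4"]

# Each flag becomes PQR where the flag is 1 and 0 elsewhere.
weighted = have[flags].mul(have["PQR"], axis=0)

# Sum per year, then flip so the flags are rows and the years are columns.
by_year = weighted.groupby(have["Yr"]).sum().T

# Add the overall total as the first column.
by_year.insert(0, "Total", by_year.sum(axis=1))
print(by_year)
```

The result reproduces the desired output table from the question.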

Create column to classify rows based on related tables DAX PowerBI

I have simplified the problem I need to solve. Let's suppose I have three tables. One contains data and specific codes that identify objects, let's say apples.
+-------------+------------+-----------+
| Data picked | Color code | Size code |
+-------------+------------+-----------+
| 1-8-2018 | 1 | 1 |
| 1-8-2018 | 1 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 2 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 3 | 3 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 5 | 3 |
| 1-8-2018 | 6 | 1 |
| 1-8-2018 | 6 | 2 |
| 1-8-2018 | 6 | 2 |
+-------------+------------+-----------+
I also have two related helper tables that explain the codes (their relationships are inactive in the model due to ambiguity with other tables in the real case).
+-----------+--------+
| Size code | Size |
+-----------+--------+
| 1 | Small |
| 2 | Medium |
| 3 | Large |
+-----------+--------+
and
+------------+----------------+-------+
| Color code | Color specific | Color |
+------------+----------------+-------+
| 1 | Light green | Green |
| 2 | Green | Green |
| 3 | Semi green | Green |
| 4 | Red | Red |
| 5 | Dark | Red |
| 6 | Pink | Red |
+------------+----------------+-------+
Let's say that I want to create an extra column in the original table to determine which apples are class A and which are class B, given that medium green apples are class A and large red apples are class B; the others remain blank, as in the example below.
+-------------+------------+-----------+-------+
| Data picked | Color code | Size code | Class |
+-------------+------------+-----------+-------+
| 1-8-2018 | 1 | 1 | |
| 1-8-2018 | 1 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 2 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 3 | 3 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 5 | 3 | B |
| 1-8-2018 | 6 | 1 | |
| 1-8-2018 | 6 | 2 | |
| 1-8-2018 | 6 | 2 | |
+-------------+------------+-----------+-------+
What's the proper DAX to use, given that the relationships are initially inactive? Preferably it should be solvable without creating any additional columns in any table. I already tried code like:
CALCULATE (
"A" ;
FILTER ( 'Size Table' ; 'Size Table'[Size] = "Medium");
FILTER ( 'Color Table' ; 'Color Table'[Color] = "Green")
)
and many variations on the same principle.
Given that the relationships are inactive, I'd suggest using LOOKUPVALUE to match ID values on the other tables. You should be able to create a calculated column as follows:
Class =
VAR Size = LOOKUPVALUE('Size Table'[Size],
'Size Table'[Size code], 'Data Table'[Size code])
VAR Color = LOOKUPVALUE('Color Table'[Color],
'Color Table'[Color code], 'Data Table'[Color code])
RETURN SWITCH(TRUE(),
(Size = "Medium") && (Color = "Green"), "A",
(Size = "Large") && (Color = "Red"), "B", BLANK())
If your relationships are active, then you don't need the lookups:
Class = SWITCH(TRUE(),
(RELATED('Size Table'[Size]) = "Medium") &&
(RELATED('Color Table'[Color]) = "Green"),
"A",
(RELATED('Size Table'[Size]) = "Large") &&
(RELATED('Color Table'[Color]) = "Red"),
"B",
BLANK())
Or a bit more elegantly written (especially for more classes):
Class =
VAR SizeColor = RELATED('Size Table'[Size]) & " " & RELATED('Color Table'[Color])
RETURN SWITCH(TRUE(),
SizeColor = "Medium Green", "A",
SizeColor = "Large Red", "B",
BLANK())
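For readers who want to check the lookup-then-classify logic against the sample data, here is a rough Python sketch (not DAX). The two dictionaries stand in for the LOOKUPVALUE calls, and the rules table stands in for SWITCH(TRUE(), ...). All names are illustrative, and the colour lookup uses the grouped Color column rather than Color specific, as the DAX above does.

```python
# Helper tables keyed by code, mirroring the Size and Color tables.
size_by_code = {1: "Small", 2: "Medium", 3: "Large"}
color_by_code = {1: "Green", 2: "Green", 3: "Green", 4: "Red", 5: "Red", 6: "Red"}

# Classification rules; any other combination stays blank.
class_by_size_color = {("Medium", "Green"): "A", ("Large", "Red"): "B"}

# (Color code, Size code) pairs from the data table, in row order.
apples = [(1, 1), (1, 3), (2, 2), (2, 3), (2, 2), (3, 3),
          (4, 1), (4, 1), (5, 3), (6, 1), (6, 2), (6, 2)]


def classify(color_code, size_code):
    """Look up size and colour by code, then apply the class rules."""
    key = (size_by_code.get(size_code), color_by_code.get(color_code))
    return class_by_size_color.get(key, "")


classes = [classify(color, size) for color, size in apples]
print(classes)
```

Keeping the rules in one table is the same design choice as the concatenated SizeColor variant: adding a class means adding one entry rather than another nested condition.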

IF MATCH multiple criteria

Table 1
| | 1 Jan 2018 | 2 Jan 2018 | 3 Jan 2018 | 4 Jan 2018 | 5 Jan 2018 |
|----|------------|------------|------------|------------|------------|
| A1 | | | | | |
| A2 | | | | | |
| A3 | | | | | |
| A4 | | | | | |
| A5 | | | | | |
| A6 | | | | | |
Table 2
|----|----------|-----------|-----------|-----------|-----------|
| A1 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A2 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A3 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A4 | 3-Jan-18 | 6-Jan-18 | 10-Jan-18 | 13-Jan-18 | 17-Jan-18 |
| A5 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
| A6 | 3-Jan-18 | 10-Jan-18 | 17-Jan-18 | 24-Jan-18 | 31-Jan-18 |
IF MATCH
=IF(MATCH(A2,$A$27:$A$54,0) & MATCH(C1,$B$27:$S$54,0),"1","")
I am getting an #N/A error out of it.
I am trying to apply a formula to the cells in Table 1 that looks up values in Table 2.
If it matches, the output should be 1, else 0.
Table & Image above to clearly illustrate & experiment out.
Thanks in advance (:
Applied formula & Output
Firstly, MATCH() returns a number that represents the position of a found match (and #N/A when nothing is found, which is the error you are seeing), so your formula evaluates to IF(1 & 1,"1","") for your first potential match; there is no logical test here.
The first amendment would be to force a True / False output: =IF(AND(ISNUMBER(MATCH()),ISNUMBER(MATCH())),"1","")
You still have the issue that the second match references the entire range of results, though. You really want it to look only through the row that meets the first criterion, so we will use an array formula to build the array you want to search:
EDIT: You can't build an array from MATCH as it returns a single integer, so use nested IF() statements instead:
=LARGE(IF(B$1=IF($A2=$A$27:$A$54,$B$27:$S$54),1,0),1)
This is an array formula; while still in the formula bar, hit Ctrl+Shift+Enter.
The inner IF() statement builds an array from each row, providing the row's values where column A matches and FALSE where it doesn't. The outer IF() statement then evaluates to 1 or 0 depending on whether it finds the date in that new array.
I have wrapped this in LARGE() to return the largest value, so if a single match is found it will return that 1. If you want a blank instead of 0, you can wrap the whole thing in another IF() statement: IF([formula]=0,"",1)
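The two-criteria lookup that the array formula performs can be sketched in Python to make the logic explicit: first pick the row whose label matches, then check whether the header date occurs in that row only. The data is copied from Table 2 (only two of the rows, to keep it short); `schedule`, `is_scheduled`, and `grid` are illustrative names, not anything from the workbook.

```python
from datetime import date

# Two rows of Table 2: label -> dates listed in that row.
schedule = {
    "A1": [date(2018, 1, 3), date(2018, 1, 10), date(2018, 1, 17),
           date(2018, 1, 24), date(2018, 1, 31)],
    "A4": [date(2018, 1, 3), date(2018, 1, 6), date(2018, 1, 10),
           date(2018, 1, 13), date(2018, 1, 17)],
}

# Header dates of Table 1: 1 Jan 2018 to 5 Jan 2018.
header_dates = [date(2018, 1, d) for d in range(1, 6)]


def is_scheduled(label, day):
    """True only if `day` appears in the row that belongs to `label`."""
    return day in schedule.get(label, [])


# 1 where the date is found in the matching row, 0 otherwise.
grid = {label: [1 if is_scheduled(label, d) else 0 for d in header_dates]
        for label in schedule}
print(grid)
```

Checking the date against the whole of Table 2 instead of one row is what makes the original formula report matches for the wrong label: 6 Jan appears only in the A4 row, yet a whole-range search would find it for A1 as well.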

SAS - how to 'sum up' based on consecutive occurrences

This is my first post, so hopefully someone can kindly assist with a problem I'm facing in SAS EG (I'm still learning SAS coding, so please be kind!).
In the snippet of the dataset below, I'm trying to tally up the scores (pts) by Ref based on the consecutive occurrences of each flag for that Ref.
For Example:
If you take Ref 505 for A_Flag, there are 2 separate runs of consecutive occurrences of that flag, so scoring will be as follows:
1st ID > 1st instance = 25 points
2nd ID > 2nd instance but 1st consecutive instance = double to 50 points
3rd ID > 0 instance = 0 points
4th ID > 1st instance = 25 points
5th ID > 2nd instance but 1st consecutive instance = double to 50 points
6th ID > 0 instance = 0 points
Therefore for this Ref A_Pts will be 150 points.
Another example:
If you take Ref 527 for B_Flag, there are 4 consecutive occurrences of that flag, so scoring per ID is:
1st ID > 0 instance = 0 points
2nd ID > 1st instance = 10 points
3rd ID > 2nd instance but 1st consecutive instance = double to 20 points
4th ID > 3rd instance but 2nd consecutive instance = double to 40 points
5th ID > 4th instance but 3rd consecutive instance = double to 80 points
Therefore for this Ref B_Pts will be 150 points
I should add that the data is already in the order required for what I'm trying to achieve.
I tried using the LAG function, but that only works for the 1st consecutive instance.
I also tried calculating a count (an enumeration variable based on cats(Ref,A_Flag)), but it then orders the data incorrectly and doesn't count up accordingly.
Hopefully this makes sense to someone out there!
The dataset in question:
+-----------+-----+--------+--------+--------+-------+-------+
| date | Ref | FormID | A_Flag | B_Flag | A_Pts | B_Pts |
+-----------+-----+--------+--------+--------+-------+-------+
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 25 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 25 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 10 |
| 04-Jan-17 | 527 | 76153 | A | B | 25 | 10 |
+-----------+-----+--------+--------+--------+-------+-------+
Desired output (unless there is a better suggestion):
+-----------+-----+--------+--------+--------+-----------+-----------+
| date | Ref | FormID | A_Flag | B_Flag | A_Pts_Agg | B_Pts_Agg |
+-----------+-----+--------+--------+--------+-----------+-----------+
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 50 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 50 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 20 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 40 |
| 04-Jan-17 | 527 | 76153 | A | B | 50 | 80 |
+-----------+-----+--------+--------+--------+-----------+-----------+
So when totalled up it'll be
+-----+-----------+-----------+
| Ref | A_Pts_Agg | B_Pts_Agg |
+-----+-----------+-----------+
| 505 | 150 | 20 |
| 527 | 100 | 150 |
+-----+-----------+-----------+
Try this:
data have;
infile cards dlm='|';
input date :date7. Ref :8. FormID :8. A_Flag :$1. B_Flag :$1. A_Pts :8. B_Pts :8.;
format date date7.;
cards;
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 25 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 25 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 10 |
| 04-Jan-17 | 527 | 76153 | A | B | 25 | 10 |
;
run;
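data want;
set have;
by Ref;
retain A_pts_agg B_pts_agg;
/* Call LAG on every row so its queue stays in step with the data. */
prev_A = lag(A_flag);
prev_B = lag(B_flag);
/* Restart at a new Ref or when the flag changes; otherwise double the running value. */
if first.Ref or prev_A ne A_flag then A_pts_agg = A_pts;
else if A_flag = 'A' then A_pts_agg = A_pts_agg * 2;
if first.Ref or prev_B ne B_flag then B_pts_agg = B_pts;
else if B_flag = 'B' then B_pts_agg = B_pts_agg * 2;
drop prev_A prev_B;
run;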
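The doubling rule can also be checked independently of SAS. The Python sketch below is only an illustration of the scoring logic described in the question: within each Ref, a flagged row scores its base points times 2 for every immediately preceding flagged row, and an unflagged row scores 0 and breaks the run. `run_scores` and the tuple layout are hypothetical names chosen here.

```python
from itertools import groupby

# (Ref, A_Flag, B_Flag, A_Pts, B_Pts) in the order given in the question.
rows = [
    (505, "A", "", 25, 0), (505, "A", "", 25, 0), (505, "", "B", 0, 10),
    (505, "A", "", 25, 0), (505, "A", "", 25, 0), (505, "", "B", 0, 10),
    (527, "A", "", 25, 0), (527, "", "B", 0, 10), (527, "", "B", 0, 10),
    (527, "A", "B", 25, 10), (527, "A", "B", 25, 10),
]


def run_scores(flags, base_points):
    """Score each row, doubling for every consecutive flagged row before it."""
    scores, streak = [], 0
    for flag, pts in zip(flags, base_points):
        if flag:
            scores.append(pts * 2 ** streak)
            streak += 1
        else:
            scores.append(0)
            streak = 0  # an unflagged row breaks the run
    return scores


totals = {}
# groupby relies on the rows already being ordered by Ref.
for ref, group in groupby(rows, key=lambda r: r[0]):
    group = list(group)
    a = run_scores([r[1] for r in group], [r[3] for r in group])
    b = run_scores([r[2] for r in group], [r[4] for r in group])
    totals[ref] = (sum(a), sum(b))
print(totals)
```

Processing each Ref as its own group means a run can never carry over from the last row of one Ref into the first row of the next.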