Max difference in a group - powerbi

For every group, I'd like to find the maximum VarB and subtract the first VarA value in that group: Max(VarB for group 1) - FirstObs(VarA for group 1). Hopefully this makes sense. Below are the desired results in table form and my attempt at the code:
VarA  VarB  Group  Result       Index
10    11    1      (10-11=-1)   1
11    4     1      (10-11=-1)   2
...
12    7     1      (10-11=-1)   5
9     11    2      (9-11=-2)    6
13    4     2      (9-11=-2)    7
...
11    7     2      (9-11=-2)    11
Maxdiff =
VAR CurrGroup = Table1[Group]
VAR MaxVal = CALCULATE(MAX(Table1[VarB]), ALL(Table1), Table1[Group] = CurrGroup)
VAR MinIndex = CALCULATE(MIN(Table1[Index]), ALL(Table1), Table1[Group] = CurrGroup)
RETURN LOOKUPVALUE(Table1[VarB], Table1[Group], MaxVal) -
LOOKUPVALUE(Table1[VarA], Table1[Index], MinIndex)
I get the error "a table of multiple values was supplied where a single value was expected"

The problem is that you are trying to look up the VarB value for a Group that matches your MaxVal. This doesn't make sense since you probably don't want to match a Group number to a VarB value. It returns multiple values since each group has multiple VarB values associated with it.
I think the following is what you are after:
MaxDiff =
VAR CurrGroup = Table1[Group]
VAR MaxVal = CALCULATE(MAX(Table1[VarB]), ALL(Table1), Table1[Group] = CurrGroup)
VAR MinIndex = CALCULATE(MIN(Table1[Index]), ALL(Table1), Table1[Group] = CurrGroup)
RETURN MaxVal - LOOKUPVALUE(Table1[VarA], Table1[Index], MinIndex)
This returns 1 and 2 for [Group] = 1 and 2.
(Your subtraction looks backward in your question.)
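The same per-group logic can be sketched outside Power BI. Below is a minimal Python sketch that uses only the rows visible in the question's table (the rows elided with "..." are left out). The function name max_diff_by_group is hypothetical, and unlike the calculated column, which repeats the group's value on every row, it returns one value per group.

```python
# Only the rows visible in the question; rows elided with "..." are omitted.
rows = [
    # (VarA, VarB, Group, Index)
    (10, 11, 1, 1),
    (11, 4, 1, 2),
    (12, 7, 1, 5),
    (9, 11, 2, 6),
    (13, 4, 2, 7),
    (11, 7, 2, 11),
]


def max_diff_by_group(rows):
    """Return {group: max(VarB in group) - VarA on the group's smallest Index}."""
    max_b = {}
    first = {}  # group -> (smallest Index seen so far, VarA on that row)
    for var_a, var_b, group, index in rows:
        # Like CALCULATE(MAX(Table1[VarB]), ALL(Table1), Table1[Group] = CurrGroup)
        max_b[group] = max(var_b, max_b.get(group, var_b))
        # Like MIN(Table1[Index]) followed by LOOKUPVALUE(Table1[VarA], Table1[Index], MinIndex)
        if group not in first or index < first[group][0]:
            first[group] = (index, var_a)
    return {g: max_b[g] - first[g][1] for g in max_b}


print(max_diff_by_group(rows))  # {1: 1, 2: 2}
```

This matches the 1 and 2 reported for groups 1 and 2.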

Related

Running multiple arrays in SAS

I am trying to loop over 3 arrays in my SAS code and write each result into the matching variable of the last array. However, every time I run this code, it only populates the CREVASC_age column. Any thoughts on how to populate each age variable in the third array using the matching variables in the other arrays?
data Outc_adjust3;
    set Outc_adjust2;
    ARRAY outcvars{3} CABG MI CREVASC;
    ARRAY outcdys{3} CABGDY MIDY CREVASCDY;
    ARRAY outc_age{3} CABG_age MI_age CREVASC_age;
    Do I = 1 to 3;
        if outcvars{3} = 1 then outc_age{3} = ageatenroll + (outcdys(3)/365.25);
        else if outcvars{3} = 0 then do;
            if EXTFLAG = 0 AND EXT2FLAG = 0 then outc_age{3} = AGE_WHIENDFU;
            if EXTFLAG = 1 AND EXT2FLAG = 0 then outc_age{3} = AGE_EXT1ENDFU;
            if EXT2FLAG = 1 AND EXT2MRC = 1 then outc_age{3} = age_endfu;
            if EXT2FLAG = 1 AND EXT2SRC = 1 then outc_age{3} = age_ext1endfu;
        end;
    end;
run;
You hard-coded {3} where you needed {I}, which is why only the third variable of each array was processed.
Change {3} to {I} in every array reference inside the loop, including outcdys(3).
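The fix amounts to pairing position I of each array with position I of the others. As an analogy only (not SAS), here is a minimal Python sketch with a hypothetical record dict. SAS variable names are case-insensitive (AGE_EXT1ENDFU and age_ext1endfu in the code above are the same variable), so the sketch uses lowercase keys throughout.

```python
# Hypothetical Python analogue of the three SAS arrays (lowercase keys because
# SAS variable names are case-insensitive).
OUTC_VARS = ["cabg", "mi", "crevasc"]
OUTC_DAYS = ["cabgdy", "midy", "crevascdy"]
OUTC_AGE = ["cabg_age", "mi_age", "crevasc_age"]


def fill_outcome_ages(rec):
    """Fill each *_age key from the outcome and day keys at the same position."""
    # zip walks the three lists in lockstep: position I of one list is always
    # paired with position I of the others, like writing {I} in every reference.
    for outc, days, age in zip(OUTC_VARS, OUTC_DAYS, OUTC_AGE):
        if rec[outc] == 1:
            rec[age] = rec["ageatenroll"] + rec[days] / 365.25
        elif rec[outc] == 0:
            # Independent ifs, as in the SAS code: a later match overwrites an earlier one.
            if rec["extflag"] == 0 and rec["ext2flag"] == 0:
                rec[age] = rec["age_whiendfu"]
            if rec["extflag"] == 1 and rec["ext2flag"] == 0:
                rec[age] = rec["age_ext1endfu"]
            if rec["ext2flag"] == 1 and rec["ext2mrc"] == 1:
                rec[age] = rec["age_endfu"]
            if rec["ext2flag"] == 1 and rec["ext2src"] == 1:
                rec[age] = rec["age_ext1endfu"]
    return rec


record = {
    "cabg": 1, "cabgdy": 365.25,
    "mi": 0, "midy": None,
    "crevasc": 1, "crevascdy": 730.5,
    "ageatenroll": 60, "extflag": 0, "ext2flag": 0, "ext2mrc": 0, "ext2src": 0,
    "age_whiendfu": 70, "age_ext1endfu": 72, "age_endfu": 75,
}
print(fill_outcome_ages(record)["mi_age"])  # 70
```

All three age keys get filled, one per position, rather than only the last one.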

Parsing periods in a DataFrame column

I have a CSV file in which one of the columns contains periods:
timespan (string): PnYnMnD, where P is a literal value that starts the expression, nY is the number of years followed by a literal Y, nM is the number of months followed by a literal M, nD is the number of days followed by a literal D, where any of these numbers and corresponding designators may be absent if they are equal to 0, and a minus sign may appear before the P to specify a negative duration.
I want to return a DataFrame that contains all the data in the CSV, with the timespan column parsed.
So far I have code that parses periods:
import re
timespan_regex = re.compile(r'P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')
def parse_timespan(timespan):
    # check if the input is a valid timespan
    if not timespan or 'P' not in timespan:
        return None
    # check if timespan is negative; if so, skip the leading '-'
    curr_idx = 0
    is_negative = timespan.startswith('-')
    if is_negative:
        curr_idx = 1
    # extract years, months and days with the regex
    match = timespan_regex.match(timespan[curr_idx:])
    if match is None:  # e.g. extra text before the 'P'
        return None
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    days = int(match.group(3) or 0)
    timespan_days = years * 365 + months * 30 + days
    return timespan_days if not is_negative else -timespan_days
print(parse_timespan(''))
print(parse_timespan('P2Y11M20D'))
print(parse_timespan('-P2Y11M20D'))
print(parse_timespan('P2Y'))
print(parse_timespan('P0Y'))
print(parse_timespan('P2Y4M'))
print(parse_timespan('P16D'))
Output:
None
1080
-1080
730
0
850
16
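As a worked check of the outputs above: parse_timespan counts every year as 365 days and every month as 30 days, so its results approximate calendar durations rather than match them exactly.

$$
\begin{aligned}
\text{P2Y11M20D} &\mapsto 2 \cdot 365 + 11 \cdot 30 + 20 = 730 + 330 + 20 = 1080 \\
\text{P2Y4M} &\mapsto 2 \cdot 365 + 4 \cdot 30 = 730 + 120 = 850
\end{aligned}
$$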
How do I apply this function to the whole CSV column inside the function that processes the CSV?
import pandas as pd
def do_process_citation_data(f_path):
    global my_ocan
    my_ocan = pd.read_csv(f_path, names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)
    my_ocan['timespan'] = parse_timespan(my_ocan['timespan'])  # I tried this, but it's surely not working :)
    return my_ocan
Thank you and have a lovely day :)
Like Python's built-in map, pandas has a map method: Series.map applies a function to each element of a Series (see the pandas documentation for Series.map). Since your function already takes a single value and returns a value, you just need this:
my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan) #This will take each value in the column "timespan", pass it to your function 'parse_timespan', and update the specific row with the returned value
And here is a generic demo:
import pandas as pd
def demo_func(x):
#Takes an int or string, prefixes with 'A' and returns a string.
return "A" + str(x)
df = pd.DataFrame({"Column_1": [1, 2, 3, 4], "Column_2": [10, 9, 8, 7]})
print(df)
df['Column_1'] = df['Column_1'].map(demo_func)
print("After mapping:\n{}".format(df))
Output:
Column_1 Column_2
0 1 10
1 2 9
2 3 8
3 4 7
After mapping:
Column_1 Column_2
0 A1 10
1 A2 9
2 A3 8
3 A4 7
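One thing to watch when mapping over a column read from a CSV: empty cells come back as float NaN, and parse_timespan would then raise a TypeError at `'P' not in timespan`. Below is a minimal sketch of a more defensive parser, assuming you want None for missing or malformed values. The name parse_timespan_safe and the stricter pattern (minus sign folded into the regex, whole string checked with re.fullmatch) are illustrative choices, not part of the original answer.

```python
import re

# Hypothetical stricter pattern: optional leading minus, literal P, then optional
# nY, nM and nD parts; fullmatch rejects anything left over.
TIMESPAN_RE = re.compile(r"(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?")


def parse_timespan_safe(value):
    """Return the approximate length in days (365-day years, 30-day months), or None."""
    # Empty CSV cells arrive as float NaN, which `'P' in value` cannot handle.
    if not isinstance(value, str):
        return None
    match = TIMESPAN_RE.fullmatch(value.strip())
    if match is None:
        return None
    sign, years, months, days = match.groups()
    total = int(years or 0) * 365 + int(months or 0) * 30 + int(days or 0)
    return -total if sign else total


for s in ["P2Y11M20D", "-P2Y11M20D", "P16D", float("nan"), "2Y"]:
    print(s, parse_timespan_safe(s))
```

It would be applied the same way: `my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan_safe)`. Alternatively, `Series.map(parse_timespan, na_action='ignore')` leaves NaN cells untouched without calling the function. Since timespan holds durations rather than dates, it is probably also worth dropping it from parse_dates in read_csv.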

Pandas calculating column based on inter-dependent lagged values

I have a dataframe that looks like the following. The rightmost two columns are my desired columns:
Open  Close  open_to_close  close_to_next_open  open_desired  close_desired
0     0      0              3                   0             0
0     0      4              8                   3             7
0     0      1              1                   15            16
The calculations are as follows:
open_desired = close_desired(prior row) + close_to_next_open(prior row)
close_desired = open_desired + open_to_close
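Substituting the first formula into the second removes the row-to-row coupling between the two columns. Writing $o_i$ for open_desired, $c_i$ for close_desired, $a_i$ for open_to_close and $b_i$ for close_to_next_open, and assuming $o_0 = 0$ as in the example:

$$
c_i = o_i + a_i = c_{i-1} + b_{i-1} + a_i
\quad\Longrightarrow\quad
c_i = \sum_{k=0}^{i} a_k + \sum_{k=0}^{i-1} b_k, \qquad o_i = c_i - a_i
$$

For the last row: $c_2 = (0 + 4 + 1) + (3 + 8) = 16$ and $o_2 = 16 - 1 = 15$, matching the table.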
How do I implement this with a loop that runs until the last row?
import pandas as pd
df = pd.DataFrame({'open': [0,0,0], 'close': [0,0,0], 'open_to_close': [0,4,1], 'close_to_next_open': [3,8,1]})
df['close_desired'] = 0
df['open_desired'] = 0
##First step is to create open_desired in current row which is dependent on close_desired in previous row
df['open_desired'] = df['close_desired'].shift() + df['close_to_next_open'].shift()
##second step is to create close_desired in current row which is dependent on open_desired in current row
df['close_desired'] = df['open_desired'] + df['open_to_close']
df.fillna(0,inplace=True)
The only way I can think of doing this is with iterrows():
for row, v in df.iterrows():
    if row > 0:
        df.loc[row, 'open_desired'] = df.shift(1).loc[row, 'close_desired'] + df.shift(1).loc[row, 'close_to_next_open']
        df.loc[row, 'close_desired'] = df.loc[row, 'open_desired'] + df.loc[row, 'open_to_close']
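Here is a minimal Python sketch of two routes, assuming open_desired is 0 on the first row as in the example: a plain loop that applies the two formulas row by row, and a running-sum version based on the closed form c_i = sum(a[:i+1]) + sum(b[:i]). The function names are hypothetical, and plain lists stand in for the DataFrame columns.

```python
from itertools import accumulate


def desired_columns_loop(open_to_close, close_to_next_open):
    """Apply the two formulas row by row; row 0 starts with open_desired = 0."""
    open_desired, close_desired = [], []
    for i, otc in enumerate(open_to_close):
        if i == 0:
            o = 0
        else:
            # open_desired = close_desired(prior row) + close_to_next_open(prior row)
            o = close_desired[i - 1] + close_to_next_open[i - 1]
        open_desired.append(o)
        # close_desired = open_desired + open_to_close
        close_desired.append(o + otc)
    return open_desired, close_desired


def desired_columns_cumsum(open_to_close, close_to_next_open):
    """Same result from a running sum of open_to_close plus the prior row's close_to_next_open."""
    shifted_b = [0] + list(close_to_next_open[:-1])  # prior row's close_to_next_open
    close_desired = list(accumulate(a + b for a, b in zip(open_to_close, shifted_b)))
    open_desired = [c - a for c, a in zip(close_desired, open_to_close)]
    return open_desired, close_desired


otc = [0, 4, 1]
ctno = [3, 8, 1]
print(desired_columns_loop(otc, ctno))    # ([0, 3, 15], [0, 7, 16])
print(desired_columns_cumsum(otc, ctno))  # ([0, 3, 15], [0, 7, 16])
```

In pandas, the running-sum version would presumably translate to `df['close_desired'] = (df['open_to_close'] + df['close_to_next_open'].shift(fill_value=0)).cumsum()` followed by `df['open_desired'] = df['close_desired'] - df['open_to_close']`, which avoids iterrows() entirely.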

vlookup function rounding up in vba

Is there a way to avoid getting a rounded result when using the VLOOKUP worksheet function in VBA? Here is my code:
Sub test()
    Dim lastrow, pressurecolumn, lastcolumn As String, total As Integer, x, row, irow, column As Double
    lastrow = Range("B8").End(xlDown).Value + 7
    Range("Pump_design[Total Pipe losses from plantroom]").ClearContents
    For Each row In Columns("EZ")
        For irow = 8 To lastrow
            total = 0
            For column = 4 To 153
                x = Cells(irow, column).Value
                If Not IsEmpty(x) Then
                    total = total + Application.WorksheetFunction.VLookup(x, Sheets("Pump Design").Range("Pump_design"), 154, False)
                End If
            Next column
            Cells(irow, "EZ") = Round(total, 4)
            If Cells(irow, "EZ") = 0 Then Cells(irow, "EZ").ClearContents
        Next irow
        row = irow + 1
    Next row
End Sub
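Just found the trick: declare total as Double instead of Integer. An Integer variable can only hold whole numbers, so every time a VLookup result is added to total, the sum is converted back to a whole number, and Round(total, 4) has no decimals left to keep.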
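To illustrate why the declared type matters, here is a Python analogy (not VBA) of the two declarations. It assumes, as we understand VBA's implicit conversion to Integer, that each `total = total + ...` assignment rounds the sum to a whole number using round-half-to-even, which is also the rule Python's round() follows. The function names and lookup values are made up for illustration.

```python
def accumulate_like_vba_integer(values):
    """Mimic `total = total + v` with total declared As Integer: every assignment
    converts the running sum back to a whole number (round half to even)."""
    total = 0
    for v in values:
        total = round(total + v)
    return total


def accumulate_like_vba_double(values):
    """Mimic total declared As Double: fractions survive until the final Round(total, 4)."""
    total = 0.0
    for v in values:
        total = total + v
    return round(total, 4)


lookups = [0.25, 0.5, 1.75]  # hypothetical VLookup results
print(accumulate_like_vba_integer(lookups))  # 2
print(accumulate_like_vba_double(lookups))   # 2.5
```

Because the rounding happens on every addition, the error can grow with the number of lookups, not just at the end.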

Pre-increment assignment as Row Number to List

I'm trying to assign a row number and a set number to each item in a List, but the sets end up with the wrong number of rows.
var objx = new List<x>();
var i = 0;
var r = 1;
objY.ForEach(x => objx.Add(new x
{
    RowNumber = ++i,
    DatabaseID = x.QuestionID,
    SetID = i == 5 ? r++ : i % 5 == 0 ? r += 1 : r
}));
For example, objY contains 23 rows, and I want to break them into sets of 5.
The code above gives sets like this (considering only RowNumber):
[1 2 3 4 5][6 7 8 9][10 11 12 13 14].......
which is valid according to the logic.
If I change the SetID logic to
SetID = i % 5 == 0 ? r += 1 : r
the result looks like this:
[1 2 3 4][5 6 7 8 9][10 11 12 13 14].
Again, that is the correct output for that code,
but I expect sets of 5:
[1 2 3 4 5][6 7 8 9 10].........
What am I missing?
I should have taken my math classes more seriously.
I think you want something like this:
var objX = objY.Select((x, i) => new { ObjX = x, Index = i })
.GroupBy(x => x.Index / 5)
.Select((g, i) =>
g.Select(x => new objx
{
RowNumber = x.Index + 1,
DatabaseID = x.ObjX.QuestionID,
SetID = i + 1
}).ToList())
.ToList();
Note that I'm grouping by x.Index / 5 so that every group has at most 5 items.
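The original pre-increment approach can also be fixed without LINQ. Once i holds the 1-based row number, shifting it to 0-based before dividing puts rows 1-5 in set 1, rows 6-10 in set 2, and so on. Here is a minimal Python sketch of that arithmetic; the function name row_and_set_numbers is hypothetical, and Python's // behaves like C#'s integer / for non-negative values.

```python
def row_and_set_numbers(items, set_size=5):
    """Return (RowNumber, item, SetID) tuples with set_size rows per set."""
    result = []
    i = 0
    for item in items:
        i += 1  # like ++i: i is now the 1-based row number
        # (i - 1) is 0-based, so rows 1..5 -> 0, rows 6..10 -> 1, ...; +1 makes SetID 1-based
        set_id = (i - 1) // set_size + 1
        result.append((i, item, set_id))
    return result


rows = row_and_set_numbers(range(23))
print([set_id for _, _, set_id in rows])
```

In the question's C# this would presumably become `SetID = (i - 1) / 5 + 1`, which relies on `RowNumber = ++i` being evaluated first; object initializer members are assigned in the order they are written.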
Update
It would be very helpful if you could explain your logic.
Where should I start? I'm using LINQ methods to select and group the original list to create a new List<List<ObjX>> where every inner list has at most 5 elements (fewer in the last one if the total count is not divisible by 5).
Enumerable.Select lets you project each element of the input sequence into something new, much like computing a new value for each element in the body of a loop. In this case I project an anonymous type containing the original object and its index in the list (Select has an overload that provides the index). I create this anonymous type to simplify the query and because I need the index later in the GroupBy.
Enumerable.GroupBy lets you group the elements of a sequence by a specified key. The key can be anything derivable from the element. Here I'm using the index to build groups with a maximum size of 5:
.GroupBy(x => x.Index / 5)
That works because dividing two integers in C# (or C) always yields an integer: the fractional part is truncated (unlike VB.NET, where / returns a Double), so 3/4 is 0. You can use this to build groups of a given size.
Then I use Select on the groups to create the inner lists, again with the index overload so that I can set the SetID of each group:
.Select((g, i) =>
g.Select(x => new objx
{
RowNumber = x.Index + 1,
DatabaseID = x.ObjX.QuestionID,
SetID = i + 1
}).ToList())
The last step is calling ToList on the IEnumerable<List<ObjX>> to create the final List<List<ObjX>>. That also "materializes" the query. Have a look at deferred execution, and especially Jon Skeet's blog, to learn more.
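For comparison, the same pipeline can be sketched in Python: enumerate plays the role of the indexed Select, grouping on index // 5 mirrors GroupBy(x => x.Index / 5), and the outer enumerate supplies SetID. This is a hypothetical analogue, not the answer's code. Note that itertools.groupby only groups consecutive items, which is safe here because index // 5 never decreases, whereas LINQ's GroupBy does not need sorted input. Building lists also materializes the result, much like ToList.

```python
from itertools import groupby


def chunk_with_ids(items, size=5):
    """Split items into lists of at most `size` entries, tagging RowNumber and SetID."""
    # Like Select((x, i) => new { ObjX = x, Index = i })
    indexed = enumerate(items)
    chunks = []
    # Like GroupBy(x => x.Index / 5) followed by Select((g, i) => ...)
    for set_no, (_, group) in enumerate(groupby(indexed, key=lambda pair: pair[0] // size)):
        chunks.append([
            {"RowNumber": index + 1, "Item": item, "SetID": set_no + 1}
            for index, item in group
        ])
    return chunks


sets = chunk_with_ids(list(range(100, 123)))  # 23 hypothetical items
print([len(s) for s in sets])  # [5, 5, 5, 5, 3]
```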