PROC RANK by score: minimum count of each target value per group

I have used SAS PROC RANK to rank a population by score and create groups of equal size. I would like to create groups such that each bin contains a minimum number of each target value (Goods and Bads). Is there a way to do that using PROC RANK? I understand that the bins would then differ in size.
For example, in the table below I have created 10 groups based on a certain score. As you can see, the Non cures in the lower deciles are sparse. I would like to create groups such that there are at least 10 Non cures in each group.
Cures and Non cures are based on the same variable: Cure = 1 and Cure = 0.
Decile  Cures  Non cures
0       262    94
1       314    44
2       340    19
3       340    13
4       353    10
5       373    5
6       308    3
7       342    3
8       440    4
9       305    3
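
One way to think about the requirement, outside of PROC RANK, is a greedy pass over the score-sorted population that closes a bin only once it holds enough of both outcomes. The following is a minimal sketch in Python rather than SAS; the function name `bins_with_minimum`, the `(score, cure)` record layout and the rule of merging leftovers into the last bin are all assumptions made for illustration, not something taken from the question.

```python
def bins_with_minimum(records, min_count):
    """Group (score, cure) records into score-ordered bins.

    A bin is closed as soon as it holds at least `min_count` records
    with cure == 1 and at least `min_count` with cure == 0.
    Leftover records at the end are merged into the last closed bin.
    (Hypothetical helper: names and merge rule are illustrative.)
    """
    bins, current = [], []
    goods = bads = 0
    for score, cure in sorted(records):
        current.append((score, cure))
        if cure == 1:
            goods += 1
        else:
            bads += 1
        if goods >= min_count and bads >= min_count:
            bins.append(current)
            current, goods, bads = [], 0, 0
    if current:
        if bins:
            bins[-1].extend(current)   # merge the incomplete tail
        else:
            bins.append(current)       # not enough data for even one full bin
    return bins


# Toy data: scores 1..13 with alternating outcomes.
records = [(score, score % 2) for score in range(1, 14)]
bins = bins_with_minimum(records, min_count=2)
sizes = [len(b) for b in bins]
```

With this toy data the bins come out with sizes 4, 4 and 5, so the bins are unequal in size, as the question anticipates. If the whole population has fewer than `min_count` records of one outcome, the single returned bin cannot meet the minimum.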

Related

KDB moving percentile using Swin function

I am trying to create a list of the 99th and 1st percentiles. Rather than a single percentile for today, I want a percentile for each of 500 days, each computed over the prior 500 days. The functions I was using for this are the following:
swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
percentile:{[x;y] y (100 xrank y:asc y) bin x}
swin[percentile[99;];500;List]
The issue I come across is that the 99th percentile calculates perfectly, but the 1st percentile makes the entire list equal to 0. I'm a bit lost as to why it would do that. Suggestions appreciated!

The zeros have two causes:
1. What behaviour do you want for the earliest 500 days, when there aren't 500 days of history to work with? On day 1 there is only 1 data point, on day 2 only 2, and so on. Only on the 500th day are there 500 days of actual data to work with. By default, that swin function fills the gaps with a seed value.
2. You're using zero as that seed value, i.e. w#0.
For example, a 5-day lookback on each date looks something like:
q)swin[::;5;1 2 3 4 5]
0 0 0 0 1
0 0 0 1 2
0 0 1 2 3
0 1 2 3 4
1 2 3 4 5
You have zeros until you have data, so naturally the 1st percentile will pick up the zeros for the first roughly 500 dates.
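The same effect can be reproduced outside q. Below is a rough Python analogue, written as a sketch under assumptions: `seeded_windows` imitates the seeded sliding window, and `percentile` imitates the bucket-and-lookup logic by assigning each sorted position the bucket `i * 100 // n` and taking the last value whose bucket does not exceed the requested percentile. Both function names are hypothetical, and the bucket formula is my reading of `100 xrank`, not a statement of how q implements it.

```python
from bisect import bisect_right


def seeded_windows(values, width, seed=0):
    """Start from `width` copies of `seed` and slide one value in at a time."""
    window = [seed] * width
    result = []
    for value in values:
        window = window[1:] + [value]   # drop the oldest entry, append the newest
        result.append(window)
    return result


def percentile(p, values):
    """Value at the last sorted position whose 0..99 bucket is <= p."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = [i * 100 // n for i in range(n)]
    return ordered[bisect_right(buckets, p) - 1]


windows = seeded_windows([1, 2, 3, 4, 5], 5)
low = [percentile(1, w) for w in windows]     # [0, 0, 0, 0, 1]
high = [percentile(99, w) for w in windows]   # [1, 2, 3, 4, 5]
```

The high percentile never touches the seed values because they sort to the bottom of each window, while the low percentile returns the seed until the window is completely filled with real data.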
So then you can decide to seed with a different value, or else possibly exclude zeros from your percentile function:
q)List:1000?1000
q)percentile:{[x;y] y (100 xrank y:asc y except 0) bin x}
q)swin[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90...
If zeros are a legitimate value in your list and can't be excluded then maybe seed the swin with some other value that you know won't be in the list (negatives? infinity? null?) and then exclude that seed from the percentile function.
EDIT: A final alternative is to use a different sliding window function which doesn't fill gaps with a seed value, e.g.
q)swin2:{[f;w;s] f each(),/:{neg[x]sublist y,z}[w]\[s]}
q)swin2[::;5;1 2 3 4 5]
,1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
q)swin2[percentile[99;];500;List]
908 908 908 908 908 908 908 908 908 908 908 959 959..
q)swin2[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90 90..
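
For comparison, the unseeded window can be sketched in Python as a plain slice that simply grows until it reaches the full width. The name `growing_windows` is hypothetical and this is only an illustration of the windowing idea, not a translation of swin2.

```python
def growing_windows(values, width):
    """Return, for each position, the last `width` values seen so far.

    Early windows are shorter than `width` instead of being padded
    with a seed value.
    """
    return [values[max(0, i + 1 - width): i + 1] for i in range(len(values))]


full = growing_windows([1, 2, 3, 4, 5], 5)
short = growing_windows([1, 2, 3, 4, 5], 3)
```

Because no artificial values enter the window, any statistic computed over it only ever sees real data, at the cost of the earliest results being based on very few points.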

Power BI - get the graph out of the data set

It seems very simple, but I cannot get the graph to show the data I want.
So, I have a lot of IDs with end and start dates (LENGTH) and open items (OPEN). Each day has an availability (AVAIL), and nothing is used (USED) on day 1.
ID LENGTH OPEN USED AVAIL
1A 6 100 0 2400
I need to create a NEW_DAY column that counts from 1 up to LENGTH. In this case the result would be
ID LENGTH NEW_DAY OPEN USED AVAIL
1A 6 1 100 0 2400
1A 6 2 100 0 2400
1A 6 3 100 0 2400
1A 6 4 100 0 2400
1A 6 5 100 0 2400
1A 6 6 100 0 2400
Note that I have hundreds of IDs, so I cannot hard-code this for 1A; it needs to be dynamic.

I am not sure, but maybe this might help you.
If you add a blank query and add this expression:
= List.Repeat({1, 2}, 3)
you will get the first argument {1, 2} repeated three times.
If you put your ID in a separate column and pass that column as the first argument (and do the same with LENGTH for the second argument), it might work.
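
To make the target shape concrete, here is a small sketch of the row expansion in Python rather than Power Query. The function name `expand_by_length` and the dictionary-per-row layout are assumptions for illustration; only the column names and the sample row come from the question.

```python
def expand_by_length(rows):
    """Repeat each row LENGTH times, numbering the copies in NEW_DAY from 1."""
    expanded = []
    for row in rows:
        for day in range(1, row["LENGTH"] + 1):
            expanded.append({**row, "NEW_DAY": day})
    return expanded


rows = [{"ID": "1A", "LENGTH": 6, "OPEN": 100, "USED": 0, "AVAIL": 2400}]
expanded = expand_by_length(rows)
new_days = [row["NEW_DAY"] for row in expanded]
```

Nothing here depends on the ID value, so the same loop handles any number of IDs with different lengths.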

replace multiple column values at the same time

I would like to replace multiple column values at the same time in a data frame. I would like to change 2 to 1 and 1 to 2.
data = data.frame(store = c(122, 323, 254, 435, 654, 342, 234, 344),
                  cluster = c(2, 2, 2, 1, 1, 3, 3, 3))
The problem with my code is that after it changes 2 to 1, it then changes those 1s back to 2.
Can I do it in dplyr or something similar? Thank you.
Desired data set below:
store cluster
122 1
323 1
254 1
435 2
654 2
342 3
234 3
344 3
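
The problem described above comes from applying the two replacements one after the other. A language-neutral way around it is to look every value up in a single mapping, so each original value is translated exactly once. The sketch below shows the idea in Python, not R or dplyr; the `swap` mapping name is an assumption, and the data is copied from the question.

```python
store = [122, 323, 254, 435, 654, 342, 234, 344]
cluster = [2, 2, 2, 1, 1, 3, 3, 3]

# One lookup per original value: 1 -> 2 and 2 -> 1, everything else unchanged.
swap = {1: 2, 2: 1}
new_cluster = [swap.get(value, value) for value in cluster]

result = list(zip(store, new_cluster))
```

Because every lookup reads the original value, a 2 that became 1 is never revisited and turned back into 2.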

Duplicate each row as many times as is given in a variable

I have a set of individuals with characteristics. Each individual belongs to one or more groups. I need to merge individuals with group characteristics, by first duplicating each row of the individual data set as many times as is given by n_groups.
The data looks like
id age n_groups
1 50 2
2 46 1
3 51 3
4 44 2
I need to have
id age n_groups group_index
1 50 2 1
1 50 2 2
2 46 1 1
3 51 3 1
3 51 3 2
3 51 3 3
4 44 2 1
4 44 2 2
It seems like a very easy task, and I need some variation of expand with a variable number of duplicates. Any ideas if there is a simple command for this?
Thanks!

It appears the solution is very standard. The expand command does allow expanding by a variable: expand n_groups solved the problem.
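
The duplication only repeats rows; the group_index column in the desired output still has to count from 1 within each id. As an illustration of the full result, written in Python rather than Stata and using the question's data as plain tuples, the two steps can be combined like this (the variable names are assumptions):

```python
# (id, age, n_groups) rows copied from the question
people = [(1, 50, 2), (2, 46, 1), (3, 51, 3), (4, 44, 2)]

# Repeat each row n_groups times and number the copies 1..n_groups.
expanded = [
    (pid, age, n_groups, group_index)
    for pid, age, n_groups in people
    for group_index in range(1, n_groups + 1)
]
```

Each tuple in `expanded` is one row of the target table, with the last element playing the role of group_index.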

GLPK Mathprog group of sets

I'm trying to code a model that can solve the Multiple Choice Knapsack Problem (MCKP) as described in Knapsack Problems involving dimensions, demands and multiple choice constraints: generalization and transformations between formulations (found here, see figures 8 and 9). You can find an example GMPL model of the basic knapsack problem here. For anyone looking for a quick explanation of the knapsack problem, read the following illustration:
You are an adventurer and have stumbled upon a treasure trove. There are hundreds of wonderful items 'i' that each have a weight 'w' and a profit 'p'. Say you have a knapsack with weight capacity 'c' and you want to make the most profit without overfilling your knapsack. What is the best combination of items such that you make the most profit?
In code:
maximize obj :
sum{(i,w,p) in I} p*x[i];
Where 'I' is the basket of items, and x[i] is the binary variable (0 = not chosen, 1 = chosen)
The problem that I am having trouble with is the addition of multiple groups. MCKP requires exactly one item to be selected from each group. So, for example, let's say we have three groups from which to choose. They could be represented as follows (ignore actual values):
# Items: index, weight, profit
set ONE :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set TWO :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set THREE :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
I am confused about how to iterate over each group and how to define the variable x. I assume it would look something like:
var x{i,j} binary;
Where i is the index of an item within group j. This assumes I define a set of sets:
set Groups{ONE,TWO,THREE}
Then I'd iterate over the groups of items:
sum{j in Groups, (i,w,p) in Groups[j]} p*x[i,j];
But I am concerned because I believe GMPL does not support ordered sets. I have seen this related question where the answer suggests defining a set within a set. However, I am not sure how it would apply in this particular scenario.
My main question, to be clear: In GMPL, how can I iterate over sets of sets (in this case a set of groups where each group has a set of items)?

Unlike AMPL, GMPL doesn't support sets of sets. Here's how to do it in AMPL:
set Groups;
set Items{Groups} dimen 3;
# define x and additional constraints
# ...
maximize obj: sum{g in Groups, (i,w,p) in Items[g]} p*x[i];
data;
set Groups := ONE TWO THREE;
# Items: index, weight, profit
set Items[ONE] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set Items[TWO] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
# Items: index, weight, profit
set Items[THREE] :=
1 10 10
2 10 10
3 15 15
4 20 20
5 20 20
6 24 24
7 24 24
8 50 50;
If you have no more than 300 variables, you can use a free student version of AMPL and solvers (e.g. CPLEX or Gurobi).

Based on this GNU mailing list thread, I believe GMPL/MathProg has support for what you want to do. Here's their example:
set WORKERS;
param number_of_shifts, integer, >= 1;
set WORKER_CLIQUE{1..number_of_shifts}, within WORKERS;
data;
set WORKERS := Jack Kate Sawyer Sun Juliet Richard Desmond Hugo;
param number_of_shifts := 2;
set WORKER_CLIQUE[1] := Sawyer, Juliet;
set WORKER_CLIQUE[2] := Jack, Kate, Hugo;
In your example, I assume you'd use something like set Items{1..3}, within Groups; with the data block from @vitaut's answer.
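
Independently of which modelling language is used, the structure of the MCKP (exactly one item per group, total weight within capacity, maximum profit) can be checked on small data by brute force. The sketch below is in Python and is not a GMPL or AMPL model; the capacity of 60 and the function name `solve_mckp` are assumptions made for illustration, while the item data is copied from the question.

```python
from itertools import product

# index -> (weight, profit), copied from the question's sample data
ITEMS = {1: (10, 10), 2: (10, 10), 3: (15, 15), 4: (20, 20),
         5: (20, 20), 6: (24, 24), 7: (24, 24), 8: (50, 50)}
GROUPS = {"ONE": ITEMS, "TWO": ITEMS, "THREE": ITEMS}
CAPACITY = 60  # hypothetical value, not given in the question


def solve_mckp(groups, capacity):
    """Try every way of picking exactly one item per group; keep the best feasible one."""
    names = list(groups)
    best_profit, best_choice = None, None
    for combo in product(*(groups[name].items() for name in names)):
        weight = sum(w for _, (w, _p) in combo)
        profit = sum(p for _, (_w, p) in combo)
        if weight <= capacity and (best_profit is None or profit > best_profit):
            best_profit = profit
            best_choice = {name: index for name, (index, _) in zip(names, combo)}
    return best_profit, best_choice


best_profit, best_choice = solve_mckp(GROUPS, CAPACITY)
```

The outer loop over `names` and the per-group item lookup correspond to the "set of groups, each with its own set of items" indexing that the question asks about. Enumeration grows exponentially with the number of groups, so this is only useful for sanity-checking a model on toy instances.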
