I want to display a top-10 list of associates for each district.
If the district number is the same for two associates, they should be ranked by another variable, sales count.
If the district and the sales count are also the same for two associates, the ranking should consider a further variable.
For this scenario, will PROC RANK work, or is there another way to achieve this output?
Example:
data input;
input assoc_nm $ dist_num sales_cnt sales_amt;
datalines;
raju 1 59 1000
kumar 1 59 1600
ramya 3 54 6900
lakshmi 2 65 9000
;
run;
The output should be as follows. For dist_num=1:
assoc_nm sales_cnt sales_amt rank
kumar 59 1600 1
raju 59 1000 2
For dist_num=2:
lakshmi 65 9000 1
For dist_num=3:
ramya 54 6900 1
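So for each district, you want the top 10, ordered by sales_cnt (highest first) and, within the same sales_cnt, by sales_amt (highest first). Sort descending so the best associate in each district comes first, then number the rows within each district.
PROC SORT DATA=input ;
BY dist_num DESCENDING sales_cnt DESCENDING sales_amt ;
RUN ;
DATA input2 ;
SET input ;
BY dist_num ;
IF FIRST.dist_num THEN rank=0 ; /* restart the counter for each district */
rank+1 ; /* sum statement: rank is retained across rows */
IF rank LE 10 ; /* keep only the top 10 per district */
RUN ;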
Simply filter, using a WHERE statement, for whichever district value you want.
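As a minimal sketch of that filtering step, assuming the ranked dataset is called input2 and the district variable is named dist_num as in the question's sample data, something like this would print the ranked list for one district:

```sas
/* Print only district 1 from the ranked dataset.                    */
/* Assumption: dist_num is the district variable and input2 holds    */
/* the rank column created in the previous data step.                */
proc print data=input2 noobs;
  where dist_num = 1;
  var assoc_nm sales_cnt sales_amt rank;
run;
```

Changing the value in the WHERE statement selects a different district without re-running the ranking step.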
I have the following problem: I would like to sum up a column and, on every line, divide the running sum by the sum of the whole column, until a specific value is reached. In pseudocode it would look like this:
data;
set auto;
sum_of_whole_column = sum(price);
subtotal[i] = 0;
i =1;
do until (subtotal[i] = 70000)
subtotal[i] = (subtotal[i] + subtotal[i+1])/sum_of_whole_column
i = i+1
end;
run;
I get the error that I haven't defined an array, so can I use something else instead of subtotal[i]? And how can I put a column into an array? I tried the following, but it doesn't work (auto is the dataset and price is the column I want to put into an array):
data invent_array;
set auto;
array price_array {1} price;
run;
EDIT: maybe the dataset I used is helpful :)
DATA auto ;
LENGTH make $ 20 ;
INPUT make $ 1-17 price mpg rep78 ;
CARDS;
AMC Concord 4099 22 3
AMC Pacer 4749 17 3
Audi 5000 9690 17 5
Audi Fox 6295 23 3
BMW 320i 9735 25 4
Buick Century 4816 20 3
Buick Electra 7827 15 4
Buick LeSabre 5788 18 3
Cad. Eldorado 14500 14 2
Olds Starfire 4195 24 1
Olds Toronado 10371 16 3
Plym. Volare 4060 18 2
Pont. Catalina 5798 18 4
Pont. Firebird 4934 18 1
Pont. Grand Prix 5222 19 3
Pont. Le Mans 4723 19 3
;
RUN;
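Perhaps I am missing your point, but your subtotal will never equal 70,000 if you divide by the sum of its column: the maximum value will be 1. Your running sum, however, can reach or exceed 70,000. The code below keeps rows while the running sum of price is below 70,000, then divides each running sum by the total price of the kept rows.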
data stage1;
retain _sum 0;
set auto;
_sum = sum(_sum, price);
if _sum < 70000 then output;
run;
proc sql;
create table want as
select t1.*, t1._sum/sum(price) as subtotal
from stage1 as t1;
quit;
subtotal
0.0607268256
0.1310834235
0.2746411058
0.3679017467
0.5121261056
0.5834753107
0.6994325842
0.7851820027
1
I am trying to compute the frequency of observation in a group.
My dataset looks like:
Date Account C_group Age ...
1 152627 A 28
2 152627 B 28
1 163718 B 32
3 163628 D 12
4 163717 C 41
.
.
I would like to determine the percentage of accounts in the different groups.
Do you know how I could do that?
Thanks
The following should get you close to what you are looking for:
data dset ;
input
freqgroup $
subgroup ;
datalines ;
A 12
B 12
C 12
C 21
C 23
A 12
A 21
B 12
B 21
B 21
;
run;
proc sort data=dset;
by freqgroup;
run;
proc freq data=dset ;
table freqgroup ;
run ;
proc freq data=dset ;
by freqgroup ;
table subgroup ;
run ;
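Applied to the question's own columns, a hedged sketch might look like the following. The dataset name have is hypothetical, and C_group is taken from the question's sample layout. The OUT= dataset from PROC FREQ contains a COUNT and a PERCENT column for each group.

```sas
/* Assumption: the question's data is in a dataset called HAVE       */
/* (hypothetical name) with the grouping column C_group.             */
proc freq data=have noprint;
  tables C_group / out=group_pct;  /* OUT= holds COUNT and PERCENT per group */
run;

proc print data=group_pct noobs;
run;
```

Note that this counts rows. If an account can appear more than once in the same group, the data would need to be de-duplicated first for the percentages to reflect distinct accounts.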
Say I have a dataset like this
day product sales
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
On day "b" there were no sales for product 1, so there is no row where day = b and product = 1. Is there an easy way to add a row with day = b, product = 1 and sales = 0, and similar "missing" rows to get a dataset like this?
day product sales
1 a 1 48
2 a 2 55
3 a 3 88
4 b 1 0
5 b 2 33
6 b 3 87
7 c 1 97
8 c 2 95
9 c 3 0
In R you can do complete(df, day, product, fill = list(sales = 0)). I realize you can accomplish this with a self-join in proc sql, but I'm wondering if there is a procedure for this.
In this particular example you can also use the SPARSE option in PROC FREQ. It tells SAS to output every combination of the DAY and PRODUCT values, similar to a cross join between them. It cannot add a value that does not already appear somewhere in the table; you would need a different method in that case.
data have;
input n day $ product sales;
datalines;
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
;;;;
run;
proc freq data=have noprint;
table day*product / out=want sparse;
weight sales;
run;
proc print data=want;run;
There are, as usual in SAS, about a dozen ways to do this. Here's my favorite.
data have;
input n day $ product sales;
datalines;
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
;;;;
run;
proc means data=have completetypes;
class day product;
types day*product;
var sales;
output out=want sum=;
run;
completetypes tells SAS to put out rows for every class combination, including missing ones. You could then use proc stdize to get them to be 0's (if you need them to be 0). It's possible you might be able to do this in the first place with proc stdize, I'm not as familiar unfortunately with that proc.
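A minimal sketch of that PROC STDIZE step, assuming the PROC MEANS output is in want and using want_zero as a hypothetical name for the result: the REPONLY option replaces missing values without standardizing anything, and MISSING=0 sets the replacement value.

```sas
/* Replace missing SALES with 0 in the PROC MEANS output.            */
/* REPONLY: only replace missing values, do not standardize.         */
/* want_zero is a hypothetical output dataset name.                  */
proc stdize data=want out=want_zero reponly missing=0;
  var sales;
run;
```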
You can do this with proc freq using the sparse option.
Code:
proc freq data=have noprint;
table day*product /sparse out=freq (drop=percent);
run;
Output:
day=a product=1 COUNT=1
day=a product=2 COUNT=1
day=a product=3 COUNT=1
day=b product=1 COUNT=0
day=b product=2 COUNT=1
day=b product=3 COUNT=1
day=c product=1 COUNT=1
day=c product=2 COUNT=1
day=c product=3 COUNT=0
Below is an example that I found for reshaping data from long to wide. But I am not able to understand the code, especially the way they are replacing blanks and why. Can someone help me understand the code?
Example 1: Reshaping one variable
We will begin with a small data set with only one variable to be reshaped. We will use the variables year and faminc (for family income) to create three new variables: faminc96, faminc97 and faminc98. First, let's look at the data set and use proc print to display it.
DATA long ;
INPUT famid year faminc ;
CARDS ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
RUN ;
PROC PRINT DATA=long ;
RUN ;
Obs famid year faminc
1 1 96 40000
2 1 97 40500
3 1 98 41000
4 2 96 45000
5 2 97 45400
6 2 98 45800
7 3 96 75000
8 3 97 76000
9 3 98 77000
Now let's look at the program. The first step in the reshaping process is sorting the data (using proc sort) on an identification variable (famid) and saving the sorted data set (longsort). Next we write a data step to do the actual reshaping. We will explain each of the statements in the data step in order.
PROC SORT DATA=long OUT=longsort ;
BY famid ;
RUN ;
DATA wide1 ;
SET longsort ;
BY famid ;
KEEP famid faminc96 -faminc98 ;
RETAIN faminc96 - faminc98 ;
ARRAY afaminc(96:98) faminc96 - faminc98 ;
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
afaminc( year ) = faminc ;
IF last.famid THEN OUTPUT ;
RUN;
This is a good example to compare and contrast with a DO UNTIL(LAST.famid) loop. The loop does away with the RETAIN, the initialization to missing on FIRST.famid, and the LAST.famid test for when to OUTPUT. Those operations are still done, just using the built-in features of the data step loop.
DATA long;
INPUT famid year faminc;
CARDS;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;;;;
RUN;
proc print;
run;
data wide;
do until(last.famid);
set long;
by famid;
ARRAY afaminc[96:98] faminc96-faminc98;
afaminc[year]=faminc;
end;
drop year faminc;
run;
proc print;
run;
The main element here is the SAS RETAIN statement.
The data step is executed once for every observation in the dataset. At the start of each iteration, variables created in the data step are set to missing, and then the next observation is loaded from the dataset.
If a variable is RETAINed it will not be reset, but will keep its value from the previous iteration.
BY famid ;
Your dataset is ordered and the data step uses a BY statement. This creates first.famid and last.famid. These are just 0/1 flags that are set to 1 on the first/last observation of each id-group.
RETAIN faminc96 - faminc98 ;
As already explained faminc96 - faminc98 will keep their value from one datastep iteration to the next.
ARRAY afaminc(96:98) faminc96 - faminc98 ;
Just an array, so you can call the variables by number instead of name.
IF first.famid THEN
DO;
DO i = 96 to 98 ;
afaminc( i ) = . ;
END;
END;
For every first observation in an id-group the retained variables are reset. Otherwise you would carry values over from one id-group to the next. The same could be done with IF first.famid THEN call missing(of afaminc(*));
afaminc( year ) = faminc ;
Writing the information to your transposed variables, according to the year.
IF last.famid THEN OUTPUT ;
After you have written all the values to your new variables, you only OUTPUT one observation (the last) in every id-group to the new dataset. As the variables were retained, they are all filled at this point.
This data step is fast and purpose-built. But generally you could just use proc transpose.
I highly recommend proc transpose. It'll make your life easier.
http://support.sas.com/resources/papers/proceedings09/060-2009.pdf
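For comparison, here is a sketch of how the same reshape might look with proc transpose, assuming the long dataset from the example is already sorted by famid. The output name wide_t is hypothetical. ID year combined with PREFIX=faminc produces the columns faminc96, faminc97 and faminc98.

```sas
/* Long-to-wide reshape with PROC TRANSPOSE.                         */
/* Assumption: LONG is sorted by famid (it is in the example data).  */
/* wide_t is a hypothetical output dataset name.                     */
proc transpose data=long out=wide_t(drop=_name_) prefix=faminc;
  by famid;      /* one output row per family                        */
  id year;       /* values of year become the column suffixes        */
  var faminc;    /* the variable whose values fill the new columns   */
run;

proc print data=wide_t noobs;
run;
```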
I have a SAS Table like:
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
PROC PRINT; RUN;
proc sort data=have;
by county;
run;
I want to compare rows and, if sex and age are equal, build the sum over Zaehler. For example:
1 F 35 17 7 2 1 w
and
33 F 35 6 7 2 1 w
sex=F and age=35 are equal, so I want to merge them like:
id sex age inc r1 r2 Zaehler work
1 F 35 17 7 2 2 w
I thought I could do it with proc sql, but I can't use sum in proc sql. Can someone help me out?
PROC SUMMARY is the normal way to compute statistics.
proc summary data=test nway ;
class sex age ;
var Zaehler;
output out=want sum= ;
run;
Why would you want to include variables other than SEX, AGE and Zaehler in the output?
Your requirement is not difficult to understand or to satisfy; however, I am not sure what your underlying reason for doing this is. Explaining more about your purpose may lead to better answers that address the root of your project. Although I have a feeling that PROC MEANS may give you better metrics, here is a one-step PROC SQL solution that gets you the summary while retaining "the value of first row":
proc sql;
create table want as
select id, sex , age, inc, r1, r2, sum(Zaehler) as Zaehler, work
from test
group by sex, age
having id = min(id) /* tells SAS to keep only the row with the smallest id within each sex, age group */
;
quit;
You can use proc sql to sum over sex and age
proc sql;
create table sum as
select
sex
,age
,sum(Zaehler) as Zaehler_sum
from test
group by
sex
,age;
quit;
You can then join it back to the main table if you want to include all the variables:
proc sql;
create table test_With_Sum as
select
t.*
,s.Zaehler_sum
from test t
inner join sum s on t.sex = s.sex
and t.age = s.age
order by
t.sex
,t.age
;
quit;
You can write it all as one proc sql query if you wish. The order by is not needed; it is only added to make the summarised results easier to read.
Not a good solution. But it should give you some ideas.
DATA test;
INPUT id sex $ age inc r1 r2 Zaehler work $;
DATALINES;
1 F 35 17 7 2 1 w
17 M 40 14 5 5 1 w
33 F 35 6 7 2 1 w
49 M 24 14 7 5 1 w
65 F 52 9 4 7 1 w
81 M 44 11 7 7 1 w
2 F 35 17 6 5 1 n
18 M 40 14 7 5 1 n
34 F 47 6 6 5 1 n
50 M 35 17 5 7 1 w
;
run;
data t2;
set test;
nobs = _n_;
run;
proc sort data=t2;by descending sex descending age descending nobs;run;
data t3;
set t2;
by descending sex descending age;
if first.age then count = 0;
count + 1;
zaehler = count;
if last.age then output;
run;
proc sort data=t3 out=want(drop=nobs count);by nobs sex age;run;
Thanks for your help. Here is my final code.
proc sql;
create table sum as
select distinct
sex
,age
,sum(Zaehler) as Zaehler
from test
WHERE work = 'w'
group by
sex
,age
;
quit;
PROC PRINT; RUN;
I just modified the code a little bit: I filtered on work = 'w' and merged the rows that have the same sex and age.
It was just an example; the real data is much bigger and has more columns and rows.