SAS does not sum all observations

I have a dataset named censusdata with 11346 observations; the last few observations are blank. I need to find the total population, which is stored in the variable t_p.
I am using this code:
data q1(keep=t_p count);
set censusdata;
array num(*) t_p;
retain count;
do i=1 to dim(num);
if t_p = i then count=t_p;
else count+t_p;
end;
run;
The problem is that SAS sums the first 3236 observations, then sums observations 3237 to 4683 separately, and so on. It does not produce the sum of all observations, which is what we need.
We need the sum of the total population (t_p), and an output dataset like this:
totalpopulation=number

Sum the variable in a proc sql step:
proc sql;
create table q1 as select
sum(t_p) as total_pop
from censusdata;
quit;
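If a DATA step is preferred, the same total can be sketched with the sum statement and the end= option. The names total_pop and last are chosen here for illustration; censusdata and t_p come from the question.

```sas
data q1(keep=total_pop);
    set censusdata end=last;   /* last is 1 on the final observation */
    total_pop + t_p;           /* sum statement: retains total_pop, skips missing t_p */
    if last then output;       /* write a single row holding the grand total */
run;
```

Because the sum statement skips missing values, the blank observations at the end of the dataset do not affect the result.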

Related

Cumulative sum of a variable by a condition and ID in SAS

I am trying to sum one variable as long as another remains constant. I want a cumulative sum of dur as long as a is constant. When a changes, the sum restarts. When a new id starts, the sum also restarts.
[image of the input data]
I would like to get this:
[image of the desired output]
Thanks
You can use a BY statement to specify the variables whose distinct value combinations organize the rows into groups. You reset an accumulated value at the start of each group and add to the accumulator on each row in the group. Use retain to keep a new variable's value between iterations of the DATA step's implicit loop. The SUM statement is a SAS feature that accumulates and retains in one step.
Example:
data want;
set have;
by id a;
if first.a then mysum = 0;
mysum + dur;
run;
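The accumulator line mysum + dur in the step above is a sum statement. As a small sketch of why it is convenient, the step below compares it with an ordinary retained assignment when a value is missing. The dataset name demo and the variables stmt_total and plain_total are made up for this illustration.

```sas
data demo;
    input dur;
    retain plain_total 0;
    stmt_total + dur;                  /* sum statement: a missing dur is skipped */
    plain_total = plain_total + dur;   /* plain addition: a missing dur makes the total missing */
    datalines;
1
.
2
;
run;
```

Here stmt_total ends at 3, while plain_total becomes missing on the second row and stays missing, because the + operator propagates missing values.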
The SUM statement is different than the SUM function.
<variable> + <expression>; * SUM statement, unique to SAS (not found in other languages);
can be thought of as
retain <variable>;
<variable> = sum (<variable>, <expression>);
One option is to self-join your table on a ranked column.
The rank should be computed within the id and a columns.
WORK.QUERY_FOR_STCKOVRFLW is the table you provided in the screenshot:
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_STCKOVRFLW_0001 AS
SELECT t1.id,
t1.a,
t1.dur,
/* mono */
(monotonic()) AS mono
FROM WORK.QUERY_FOR_STCKOVRFLW t1;
QUIT;
PROC SORT
DATA=WORK.QUERY_FOR_STCKOVRFLW_0001
OUT=WORK.SORTTempTableSorted
;
BY id a;
RUN;
PROC RANK DATA = WORK.SORTTempTableSorted
TIES=MEAN
OUT=WORK.RANKRanked(LABEL="Rank Analysis for WORK.QUERY_FOR_STCKOVRFLW_0001");
BY id a;
VAR mono;
RANKS rank_mono ;
RUN; QUIT;
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_RANKRANKED AS
SELECT t1.id,
t1.a,
t1.dur,
/* SUM_of_dur */
(SUM(t2.dur)) FORMAT=BEST12. AS SUM_of_dur
FROM WORK.RANKRANKED t1
LEFT JOIN WORK.RANKRANKED t2 ON (t1.id = t2.id) AND (t1.a = t2.a AND (t1.rank_mono >= t2.rank_mono ))
GROUP BY t1.id,
t1.a,
t1.dur;
QUIT;

Need to compute column total in SAS and use it as input to calculate another column

Data IV_SAS;
set IV;
Total_Loans=Goods+Bads;
Dist_Loans=Total_Loans/sum(Total_Loans);
Dist_Goods=Goods/Sum(Goods);
Dist_Bads=Bads/Sum(Bads);
Difference=Dist_Goods-Dist_Bads;
WOE=log10(Dist_goods/Dist_Bads);
IV=WOE*Difference;
run;
I am having trouble calculating the sum of Total_Loans: the code computes the row total instead of the column total.
That's how Base SAS works - it operates on row level in the data step.
You would want to use PROC MEANS or PROC TABULATE or similar proc and find the column total there, then merge that on (or combine in another method).
For example:
proc means data=sashelp.class;
var age height weight;
output out=class_means sum(age)=age_sum sum(height)=height_sum sum(weight)=weight_Sum;
run;
data class;
if _n_=1 then set class_means;
set sashelp.class;
age_prop = age/age_sum;
height_prop = height/height_sum;
weight_prop = weight/weight_Sum;
run;
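Applied to the dataset from the question, the same pattern might look like the sketch below. It assumes IV contains the numeric variables Goods and Bads; the names IV_totals, Goods_sum and Bads_sum are invented here for illustration.

```sas
/* Step 1: compute the column totals once */
proc means data=IV noprint;
    var Goods Bads;
    output out=IV_totals(drop=_type_ _freq_)
           sum(Goods)=Goods_sum sum(Bads)=Bads_sum;
run;

/* Step 2: attach the one-row totals to every row, then compute the ratios */
data IV_SAS;
    if _n_ = 1 then set IV_totals;   /* values read by SET are retained across rows */
    set IV;
    Total_Loans = Goods + Bads;
    Dist_Loans  = Total_Loans / (Goods_sum + Bads_sum);
    Dist_Goods  = Goods / Goods_sum;
    Dist_Bads   = Bads / Bads_sum;
    Difference  = Dist_Goods - Dist_Bads;
    WOE         = log10(Dist_Goods / Dist_Bads);
    IV          = WOE * Difference;
    drop Goods_sum Bads_sum;
run;
```

The log10 call is kept from the question's code; swap in log if a natural logarithm is intended.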
Alternately, use SAS/IML or PROC SQL, both of which will operate on the column level when asked inline (though I think the above solution is likely superior in speed to both due to lower overhead).
data a;
input goods bads;
datalines;
36945 33337
23820 21761
26990 24647
33195 30299
43755 39014
46100 41100
89765 79978
25940 23508
35940 32506
31840 28846
33430 30366
34480 31388
36640 33129
39640 35992
42490 38325
44240 40075
42840 38840
49690 44936
69190 64740
;
run;
proc sql;
create table b as
select goods,bads,
sum(goods,bads) as Total_Loans format=dollar10.,
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar10. ,
(calculated Total_Loans/calculated Column_Total_Loans) as Dist_Loans
/*add more code to calculate Dist_Goods, Dist_Bads, etc..*/
from a;
quit;
/*Column totals only*/
proc sql;
create table c as
select
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar12.
from a;
quit;

SAS: Using Do/Loop in a Proc Transpose

I'm not very familiar with Do Loops in SAS and was hoping to get some help. I have data that looks like this:
Product A: 1
Product A: 2
Product A: 4
I'd like to transpose (easy) and flag that Product A: 3 is missing, but I need to do this iteratively to the i-th degree since the number of products is large.
If I run the transpose part in SAS, my first column will be 1, second column will be 2, and third column will be 4 - but I'd really like the third column to be missing and the fourth column to be 4.
Any thoughts? Thanks.
Get some sample data:
proc sort data=sashelp.iris out=sorted;
by species;
run;
Determine the largest column we will need to transpose to. Depending on your situation you may just want to hardcode this value using a %let max=somevalue; statement:
proc sql noprint;
select cats(max(sepallength)) into :max from sorted;
quit;
%put &=max;
Transpose the data using a data step:
data want;
set sorted;
by species;
retain _1-_&max;
array a[1:&max] _1-_&max;
if first.species then do;
do cnt = lbound(a) to hbound(a);
a[cnt] = .;
end;
end;
a[sepallength] = sepallength;
if last.species then do;
output;
end;
keep species _1-_&max;
run;
Notice we are defining an array of columns: _1,_2,_3,..._max. This happens in our array statement.
We then use by-group processing to populate these newly created columns for a single species at a time. For each species, on the first record, we clear the array. For each record of the species, we populate the appropriate element of the array. On the final record for the species output the array contents.
You need a way to tell SAS that you have 4 products and that the values are 1-4. In this example I create a dummy ID carrying the needed information, then transpose using an ID statement so that the new variables are named after the value of product.
data product;
input id product @@;
cards;
1 1 1 2 1 4
2 2 2 3
;;;;
run;
proc print;
run;
data productspace;
if 0 then set product;
do product = 1 to 4;
output;
end;
stop;
run;
data productV / view=productV;
set productspace product;
run;
proc transpose data=productV out=wide(where=(not missing(id))) prefix=P;
by id;
var product;
id product;
run;
proc print;
run;

How to sum a variable and record the total in the last row using SAS

I have a dataset that looks like the following:
Name Number
a 1
b 2
c 9
d 6
e 5.5
Total ???
I want to calculate the sum of the variable Number and record the sum in the last row (corresponding to Name = 'Total'). I know I can do this using proc means and then merging the output back to this file, but that does not seem very efficient. Can anyone tell me whether there is a better way, please?
You can do the following in a data step:
data test2;
drop sum;
set test end = last;
retain sum;
if _n_ = 1 then sum = 0;
sum = sum + number;
output;
if last then do;
NAME = 'TOTAL';
number = sum;
output;
end;
run;
It takes just one pass through the dataset.
This is easy to do with PROC REPORT.
data have;
input Name $ Number ;
cards;
a 1
b 2
c 9
d 6
e 5.5
;
proc report data=have out=want(drop=_:);
rbreak after/ summarize ;
compute after;
name='Total';
endcomp;
run;
The following code uses the DOW-Loop (DO-Whitlock) to achieve the result by reading through the observations once, outputting each one, then lastly outputting the total:
data want(drop=tot);
do until(lastrec);
set have end=lastrec;
tot+number;
output;
end;
name='Total';
number=tot;
output;
run;
For all of the data step solutions offered, keep the length of the name variable in mind. Make sure it is long enough to hold both 'Total' and the original values.
proc sql;
select max(5,length) into :len trimmed from dictionary.columns WHERE LIBNAME='WORK' AND MEMNAME='TEST' AND UPCASE(NAME)='NAME';
QUIT;
data test2;
length name $ &len;
set test end=last;
...
run;

In SAS, summing binary variables produces binary results

I am trying to convert a categorical variable (Product) to binary indicators and then count how many products each customer has.
The data is in the following format:
ID Product
C1 A
C1 B
C2 A
C3 B
C4 A
The code I am using to convert the category to binary:
IF PRODUCT="A" THEN PROD_A =1 ; ELSE PROD_A=0;
IF PRODUCT="B" THEN PROD_B =1 ; ELSE PROD_B=0;
TOT_PROD = SUM(PROD_A, PROD_B);
But when I count the number of products, it gives me 1 for every customer, and I am expecting 1 or 2.
I have tried
TOT_PROD = PROD_A + PROD_B;
but I get the same result.
This is all inside one datastep, correct? If so you're processing only one line at a time. For each individual line the only possible values for PROD_A and PROD_B are one or zero. You need an aggregate function. For example, if your dataset is named PRODUCTS:
DATA X;
SET PRODUCTS;
IF PRODUCT="A" THEN PROD_A = 1 ; ELSE PROD_A=0;
IF PRODUCT="B" THEN PROD_B = 1 ; ELSE PROD_B=0;
TOT_PROD = SUM(PROD_A, PROD_B);
RUN;
(TOT_PROD will always be equal to 1 in X, but never mind for now).
Now sum them up:
proc sql;
create table prod_totals as
select product, sum(tot_prod) as total_products
from x
group by product;
quit;
More simply just skip the data step:
proc sql;
create table prod_totals as
select product, count(*) as total_products
from products
group by product;
quit;
Or use PROC SUMMARY or PROC MEANS instead of PROC SQL.
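Since the question asks for the number of products per customer, a variant that groups by ID rather than by product may be closer to what is wanted. This is only a sketch: it assumes the input dataset is named PRODUCTS as above, and the output names prod_per_customer and tot_prod are chosen here for illustration.

```sas
proc sql;
    create table prod_per_customer as
    select id,
           count(distinct product) as tot_prod   /* distinct products held by each customer */
    from products
    group by id;
quit;
```

Using count(distinct product) means a customer who appears twice with the same product is still counted once for that product.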
I have assumed you only want 1 record output per id.
In the solutions below I have employed the DOW-Loop (DO-Whitlock).
If you wanted prod_a and prod_b just to help with the totals and if they're not required in the output, then you could use something like:
data want;
do until(last.id);
set have;
by id;
tot_prod=sum(tot_prod,product='A',product='B');
end;
run;
If you need prod_a and prod_b in the output, then you could use:
data want;
do until(last.id);
set have;
by id;
prod_a=(product='A');
prod_b=(product='B');
tot_prod=sum(tot_prod,prod_a,prod_b);
end;
run;
In both data steps the last product per id will be output along with the other variables and in the case of the 2nd data step example the last prod_a & prod_b per id will also be output.
To do this in the data step, you need retain. Make sure you've sorted the dataset by id first.
data prod_totals;
set products;
by ID;
retain prod_a prod_b;
if first.id then do; *initialize to zero for each new ID;
prod_a=0; prod_b=0;
end;
if product='A' then prod_a=1; *set to 1 for each one found;
else if product='B' then prod_b=1;
if last.id then do; *for last record in each ID, output and sum total;
total_products=sum(prod_a,prod_b);
output;
end;
keep id prod_a prod_b total_products;
run;