Why does the PROC SQL sum function return a count instead of the total of the values? - SAS

I am learning PROC SQL in SAS. When I add a comparison operator inside the SQL sum function, the output is the count of rows instead of the vertical sum. How can I get the vertical sum, and what is the mechanism behind this behavior?
data apple;
input target;
cards;
0
1
3
5
;
run;
proc sql;
select sum(target ge 3)
from apple;
quit;
The expected result is 3+5=8; the actual result is 2.

proc sql;
select sum(target)
from apple
where target ge 3;
quit;
I believe your code evaluates (target ge 3) as a boolean expression. In SAS, TRUE=1 and FALSE=0, so the sum function was adding 0, 0, 1, 1, which gives 2.
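To see the mechanism row by row, here is a minimal sketch that reuses the apple data set from the question. The column names flag, kept, n_rows and total are made up for illustration.

```sas
proc sql;
  /* flag is 1 where the comparison is true and 0 where it is false */
  select target,
         (target ge 3) as flag,
         target*(target ge 3) as kept
  from apple;

  /* summing the flag counts the matching rows;
     summing target*flag adds up their values */
  select sum(target ge 3) as n_rows,
         sum(target*(target ge 3)) as total
  from apple;
quit;
```

For the four rows in the question, n_rows is 2 and total is 8.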

The solution from Craig is actually better, but you could also do what you tried with a CASE ... WHEN ... ELSE ... END expression:
proc sql;
select sum(case when target ge 3 then target else 0 end)
from apple;
quit;

Related

Alternate of Proc SQL case statements in SAS

I have been able to get the desired result with the following code, but I have a large data set and I want to do the same with SAS data step code rather than PROC SQL.
Here is the code:
proc sql;
create table RTA_NDP_Red_2 as
select TRFFIC_NO as TRAFFIC_NO,
sum( case when ticket_date_v1 between '01OCT2019'd and '30SEP2020'd then 1
else 0 end) as NDP_vio_cnt_t1,
sum( case when ticket_date_v1 between '01OCT2018'd and '30SEP2019'd then 1
else 0 end) as NDP_vio_cnt_t2,
sum( case when ticket_date_v1 between '01OCT2017'd and '30SEP2018'd then 1
else 0 end) as NDP_vio_cnt_t3,
sum( case when ticket_date_v1 LT '01OCT2017'd then 1
else 0 end) as NDP_vio_cnt_t4
from public.RTA_NDP_Red_1
group by TRFFIC_NO;
quit;
BY-group processing in the data step creates two temporary variables for each BY variable: FIRST.variable and LAST.variable.
A comparison evaluates to 1 when it is true and to 0 when it is false, so it can be added directly to a counter with a sum statement.
Once you grasp both points, what you want is a piece of cake.
proc sort data=public.RTA_NDP_Red_1;by TRFFIC_NO;run;
data RTA_NDP_Red_2;
set public.RTA_NDP_Red_1;
by TRFFIC_NO;
if first.TRFFIC_NO then call missing(of NDP_vio_cnt_t1-NDP_vio_cnt_t4);
NDP_vio_cnt_t1+('01OCT2019'd<=ticket_date_v1<='30SEP2020'd);
NDP_vio_cnt_t2+('01OCT2018'd<=ticket_date_v1<='30SEP2019'd);
NDP_vio_cnt_t3+('01OCT2017'd<=ticket_date_v1<='30SEP2018'd);
NDP_vio_cnt_t4+(ticket_date_v1<'01OCT2017'd);
if last.TRFFIC_NO then output;
run;
Hope it helps
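The same pattern can be tried on a small data set first. This is a minimal sketch, assuming the sashelp.class sample table that ships with SAS. The names class_sorted, teen_counts and n_teen are hypothetical and not part of the original answer.

```sas
/* BY-group processing needs the input sorted by the BY variable */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data teen_counts;
  set class_sorted;
  by sex;
  if first.sex then n_teen = 0;   /* reset the counter at the start of each group */
  n_teen + (age ge 13);           /* sum statement: adds 1 when true, 0 when false */
  if last.sex then output;        /* write one row per group */
  keep sex n_teen;
run;
```

Because a sum statement retains its variable across observations, no separate RETAIN statement is needed.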

How to create an output row if a proc sql “group by” group has no observations

I am working in SAS Enterprise Guide and am running a proc sql query as follows:
proc sql;
CREATE TABLE average_apples AS
SELECT farm, size, type, mean(apples) as average_apples
FROM input_table
GROUP BY farm, size, type
;
quit;
For some of the data sets I am running this query on there are groups which have no observations assigned to them, so there is no entry for them in the query output.
How can I force this query to return a row for each of my groups (for example, with a value of 0 in the apples column)?
Thanks up front for the help!
I'd do this:
/* sample input table */
data input_table;
length farm size type $3 apples 8;
stop; /* try also with this statement commented out
to check the result for non-empty input table */
run;
proc sql;
CREATE TABLE average_apples AS
SELECT farm, size, type, mean(apples) as average_apples
FROM input_table
GROUP BY farm, size, type
;
quit;
%let group_rows = &SQLOBS;
%put &group_rows;
data average_apples_blank;
if &group_rows ne 0 then set average_apples(obs=0);
else do;
array zeros {*} _numeric_ /* or your list of variables */;
do i=1 to dim(zeros);
zeros[i] = 0;
end;
output; /* empty row */
end;
drop i;
run;
proc append base=average_apples data=average_apples_blank force;
run;
Try this
proc sql;
select f.farm, s.size, t.type, coalesce(mean(i.apples), 0) as average_apples
from (select distinct farm from input_table) as f
cross join (select distinct size from input_table) as s
cross join (select distinct type from input_table) as t
left join input_table as i
on i.farm = f.farm and i.size = s.size and i.type = t.type
group by f.farm, s.size, t.type;
quit;
I did not test it, though. If it does not work, put this in a comment and I will debug it.

Need to compute column total in SAS and use it as input to calculate another column

Data IV_SAS;
set IV;
Total_Loans=Goods+Bads;
Dist_Loans=Total_Loans/sum(Total_Loans);
Dist_Goods=Goods/Sum(Goods);
Dist_Bads=Bads/Sum(Bads);
Difference=Dist_Goods-Dist_Bads;
WOE=log10(Dist_goods/Dist_Bads);
IV=WOE*Difference;
run;
I am having trouble calculating the sum of Total_Loans: it calculates the row total instead of the column total.
That's how Base SAS works - it operates on row level in the data step.
You would want to use PROC MEANS or PROC TABULATE or similar proc and find the column total there, then merge that on (or combine in another method).
For example:
proc means data=sashelp.class;
var age height weight;
output out=class_means sum(age)=age_sum sum(height)=height_sum sum(weight)=weight_Sum;
run;
data class;
if _n_=1 then set class_means;
set sashelp.class;
age_prop = age/age_sum;
height_prop = height/height_sum;
weight_prop = weight/weight_Sum;
run;
Alternately, use SAS/IML or PROC SQL, both of which will operate on the column level when asked inline (though I think the above solution is likely superior in speed to both due to lower overhead).
data a;
input goods bads;
datalines;
36945 33337
23820 21761
26990 24647
33195 30299
43755 39014
46100 41100
89765 79978
25940 23508
35940 32506
31840 28846
33430 30366
34480 31388
36640 33129
39640 35992
42490 38325
44240 40075
42840 38840
49690 44936
69190 64740
;
run;
proc sql;
create table b as
select goods,bads,
sum(goods,bads) as Total_Loans format=dollar10.,
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar10. ,
(calculated Total_Loans/calculated Column_Total_Loans) as Dist_Loans
/*add more code to calculate Dist_Goods, Dist_Bads, etc..*/
from a;
quit;
/*Column totals only*/
proc sql;
create table c as
select
sum(goods)as Column_goods_tot format=dollar10. ,
sum(bads) as Column_bads_tot format=dollar10. ,
sum(calculated Column_goods_tot, calculated Column_bads_tot) as Column_Total_Loans format=dollar12.
from a;
quit;
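The comment in the first query leaves Dist_Goods, Dist_Bads and the later columns open. One possible way to fill them in is sketched below. It assumes the table a created above and keeps the column names and the log10 from the question. The output table name iv_sas is hypothetical.

```sas
proc sql;
  create table iv_sas as
  select goods, bads,
         /* sum() with one argument is the column total, remerged onto every row */
         goods / sum(goods) as Dist_Goods,
         bads / sum(bads) as Dist_Bads,
         calculated Dist_Goods - calculated Dist_Bads as Difference,
         log10(calculated Dist_Goods / calculated Dist_Bads) as WOE,
         calculated WOE * calculated Difference as IV
  from a;
quit;
```

The log should show a note that the query requires remerging summary statistics back with the original data. That is expected here, since each row is divided by a column total.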

SAS: Using Do/Loop in a Proc Transpose

I'm not very familiar with Do Loops in SAS and was hoping to get some help. I have data that looks like this:
Product A: 1
Product A: 2
Product A: 4
I'd like to transpose (easy) and flag that Product A: 3 is missing, but I need to do this iteratively to the i-th degree since the number of products is large.
If I run the transpose part in SAS, my first column will be 1, second column will be 2, and third column will be 4 - but I'd really like the third column to be missing and the fourth column to be 4.
Any thoughts? Thanks.
Get some sample data:
proc sort data=sashelp.iris out=sorted;
by species;
run;
Determine the largest column we will need to transpose to. Depending on your situation you may just want to hardcode this value using a %let max=somevalue; statement:
proc sql noprint;
select cats(max(sepallength)) into :max from sorted;
quit;
%put &=max;
Transpose the data using a data step:
data want;
set sorted;
by species;
retain _1-_&max;
array a[1:&max] _1-_&max;
if first.species then do;
do cnt = lbound(a) to hbound(a);
a[cnt] = .;
end;
end;
a[sepallength] = sepallength;
if last.species then do;
output;
end;
keep species _1-_&max;
run;
Notice we are defining an array of columns: _1,_2,_3,..._max. This happens in our array statement.
We then use by-group processing to populate these newly created columns for a single species at a time. For each species, on the first record, we clear the array. For each record of the species, we populate the appropriate element of the array. On the final record for the species, we output the array contents.
You need a way to tell SAS that you have 4 products and the values are 1-4. In this example I create a dummy ID with the needed information, then transpose using an ID statement to name the new variables from the value of product.
data product;
input id product @@;
cards;
1 1 1 2 1 4
2 2 2 3
;;;;
run;
proc print;
run;
data productspace;
if 0 then set product;
do product = 1 to 4;
output;
end;
stop;
run;
data productV / view=productV;
set productspace product;
run;
proc transpose data=productV out=wide(where=(not missing(id))) prefix=P;
by id;
var product;
id product;
run;
proc print;
run;

How to sum a variable and record the total in the last row using SAS

I have a dataset that looks like the following:
Name Number
a 1
b 2
c 9
d 6
e 5.5
Total ???
I want to calculate the sum of the variable Number and record the sum in the last row (corresponding to Name = 'Total'). I know I can do this using proc means and then merge the output back to this file, but this does not seem very efficient. Can anyone tell me whether there is a better way, please?
You can do the following in a data step:
data test2;
drop sum;
set test end = last;
retain sum;
if _n_ = 1 then sum = 0;
sum = sum + number;
output;
if last then do;
NAME = 'TOTAL';
number = sum;
output;
end;
run;
It takes just one pass through the dataset.
It is easy to do with the report procedure.
data have;
input Name $ Number ;
cards;
a 1
b 2
c 9
d 6
e 5.5
;
proc report data=have out=want(drop=_:);
rbreak after/ summarize ;
compute after;
name='Total';
endcomp;
run;
The following code uses the DOW-Loop (DO-Whitlock) to achieve the result by reading through the observations once, outputting each one, then lastly outputting the total:
data want(drop=tot);
do until(lastrec);
set have end=lastrec;
tot+number;
output;
end;
name='Total';
number=tot;
output;
run;
For all of the data step solutions offered, keep the length of Name in mind: it must be long enough to hold both 'Total' and the original values.
proc sql;
select max(5,length) into :len trimmed from dictionary.columns WHERE LIBNAME='WORK' AND MEMNAME='TEST' AND UPCASE(NAME)='NAME';
QUIT;
data test2;
length name $ &len;
set test end=last;
...
run;