SAS Array <array-elements> to jump by 10

I want to achieve the same output, but instead of hardcoding each array element I would like to use something like var1 - var10, except that the suffix should jump by 10, like decades.
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
/* here and below something like:
array pop_diff_amount {10} pop_diff_amount_1920 -- pop_diff_amount_2010;*/
array pop_diff_amount {10} pop_diff_amount_1920 pop_diff_amount_1930
pop_diff_amount_1940 pop_diff_amount_1950
pop_diff_amount_1960 pop_diff_amount_1970
pop_diff_amount_1980 pop_diff_amount_1990
pop_diff_amount_2000 pop_diff_amount_2010;
array pop_diff_prcnt {10} pop_diff_prcnt_1920 pop_diff_prcnt_1930
pop_diff_prcnt_1940 pop_diff_prcnt_1950
pop_diff_prcnt_1960 pop_diff_prcnt_1970
pop_diff_prcnt_1980 pop_diff_prcnt_1990
pop_diff_prcnt_2000 pop_diff_prcnt_2010;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
I am still a beginner in SAS, so I am not sure whether this is possible or easy to achieve.
Thanks!

Not automatic, but not all that difficult either. First create a data set of the names, then transpose it, use an unexecuted SET to bring the names into the data step, and then define the arrays. Note how the arrays are defined using [*] and name:, as you did with population_array.
data names;
do type = 'Amount','Prcnt';
do year=1920 to 2010 by 10;
length _name_ $32;
_name_ = catx('_','pop_diff',type,year);
output;
end;
end;
run;
proc print;
run;
proc transpose data=names out=pop_diff(drop=_name_);
var;
run;
proc contents varnum;
run;
data pop;
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
if 0 then set pop_diff;
array pop_diff_amount[*] pop_diff_amount:;
array pop_diff_prcnt[*] pop_diff_prcnt:;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
run;
proc print data=pop;
run;

SAS is automatically going to increment the array elements by 1. Here is an alternative solution that creates the variables using one extra step to create a set of macro variables that hold the desired variable names. Since you are basing them off of the variable POPULATION_<year>, we will simply grab the years from those variable names, create the variable names for the arrays that we want, and store them into a few macro variables.
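proc sql noprint;
select cats('pop_diff_amount_', scan(name, -1, '_') )
, cats('pop_diff_prcnt_', scan(name, -1, '_') )
into :pop_diff_amount_vars separated by ' '
, :pop_diff_prcnt_vars separated by ' '
from dictionary.columns
where libname = 'SASHELP'
AND memname = 'US_DATA'
AND upcase(name) LIKE 'POPULATION_%'
/* skip the first year: it has no earlier decade to diff against */
AND scan(name, -1, '_') ne '1910'
/* keep the names in the same order as the population variables */
order by varnum
;
quit;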
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
array pop_diff_amount {*} &pop_diff_amount_vars.;
array pop_diff_prcnt {*} &pop_diff_prcnt_vars.;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
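Another way to get names that jump by 10 is a small helper macro. The sketch below is only an illustration: the macro name %decade_names and its parameters are hypothetical (not part of SAS), and it assumes the years run from 1920 to 2010 as in the question. The macro expands to a space-separated list of names by looping with %BY 10, and that list is dropped straight into the ARRAY statements.

```sas
/* Hypothetical helper: expands to prefix1920 prefix1930 ... prefix2010 */
%macro decade_names(prefix, from, to, step=10);
  %local y;
  %do y = &from %to &to %by &step;
    &prefix.&y
  %end;
%mend decade_names;

data work.test(keep=statename pop_diff:);
  set sashelp.us_data(keep=statename population:);
  array population_array {*} population_1910 -- population_2010;
  /* the macro call resolves to the ten variable names per array */
  array pop_diff_amount {*} %decade_names(pop_diff_amount_, 1920, 2010);
  array pop_diff_prcnt  {*} %decade_names(pop_diff_prcnt_, 1920, 2010);
  do i = 1 to dim(population_array) - 1;
    pop_diff_amount{i} = population_array{i+1} - population_array{i};
    pop_diff_prcnt{i}  = (population_array{i+1} / population_array{i} - 1) * 100;
  end;
run;
```

Because the names are generated at compile time, element i of each difference array lines up with the later of the two decades being compared, provided the start year passed to the macro is one step after the first population variable.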

Getting the year out of the metadata (the variable names) and into a data variable named year would make coding life easier.
proc transpose data=sashelp.us_data out=us_pop(rename=(col1=Population));
by statename;
var population_:;
run;
data us_pop;
set us_pop;
by statename;
year = input(scan(_name_,-1,'_'),4.);
pop_diff_amount=dif(population);
pop_diff_prcnt =(population/lag(population))-1;
format pop_diff_prcnt percent10.2;
if first.statename then call missing(of pop_diff_amount pop_diff_prcnt);
drop _:;
run;
proc print data=us_pop(obs=10);
run;

Related

SAS 9.4: How to use different weights on the same variable without datastep or proc sql

I can't find a way to summarize the same variable using different weights.
I will try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I have to store a huge amount of data with limited physical space, and otherwise I would have to construct about 120 fields (a0-a6, b0-b6, etc.) that are the same variables with fixed weights applied (wgt0-wgt5).
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and, on demand, run a "summary" without an intermediate data step that obliges me to create 120 fields.
Because of the huge amount of data (roughly 55 GB every month), I would also prefer not to use a proc sql statement:
proc sql;
create table pluto as
select sum(a * wgt1) as a1, sum(a * wgt2) as a2 /* , etc. */
from pippo;
quit;
Is there a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
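The data step view option mentioned at the start of this answer can be sketched as follows. This is a minimal illustration, assuming the small pippo example from the question (one analysis variable a and weights wgt1-wgt3); the view name pippo_v and the output names a_w1-a_w3 are made up for the example. The view computes the weighted products on the fly, so they are never stored on disk, and PROC SUMMARY reads the data once.

```sas
/* sample data from the question, in compact form */
data pippo;
  input a wgt1 wgt2 wgt3;
  datalines;
10 0.5 1 0
3 0 0 1
8.9 1.2 0.3 0.1
;
run;

/* view: the weighted columns exist only while the view is being read */
data pippo_v / view=pippo_v;
  set pippo;
  array w[3]  wgt1-wgt3;
  array aw[3] a_w1-a_w3;   /* hypothetical output names */
  do _i = 1 to dim(w);
    aw[_i] = a * w[_i];
  end;
  drop _i;
run;

proc summary data=pippo_v nway;
  var a_w1-a_w3;
  output out=pluto(drop=_type_ _freq_) sum=;
run;
```

With 20 variables and 6 weights, the two arrays would become a nested loop over variables and weights. The trade-off against the hash approach is simplicity versus wider intermediate rows passing through the view.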
You can SCORE your data set using a customized SCORE data set that you can generate with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;

SAS Array Variable Name Based on Another Array

I have data in the following format:
data have;
input id rtl_apples rtl_oranges rtl_berries;
datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;
I'm trying to create new variables that represent each RTL variable as a percent of the sum of the RTL variables: PCT_APPLES, PCT_ORANGES, PCT_BERRIES. The problem is that I'm doing this within a macro, so the names and number of RTL variables will vary with each iteration and the new variable names need to be generated dynamically.
This data step essentially gets what I need, but the new variables are named PCT1, PCT2, ..., PCTn, so it's difficult to know which RTL variable each PCT corresponds to.
data want;
set have;
array rtls[*] rtl_:;
total_sales = sum(of rtl_:);
call symput("dim",dim(rtls));
array pct[&dim.];
do i=1 to dim(rtls);
pct[i] = rtls[i] / total_sales;
end;
drop i;
run;
I also tried creating the new variable name by using a macro variable, but only the last variable in the array is created. In this case, PCT_BERRIES.
data want;
set have;
array rtls[*] rtl_:;
total_sales = sum(of rtl_:);
do i=1 to dim(rtls);
var_name = compress(tranwrd(upcase(vname(rtls[i])),'RTL','PCT'));
call symput("var_name",var_name);
&var_name. = rtls[i] / total_sales;
end;
drop i var_name;
run;
I have a feeling I'm overcomplicating this, so any help would be appreciated.
If you have the list of names in data already then use the list to create the names you need for your arrays.
proc sql noprint;
select distinct cats('RTL_',name),cats('PCT_',name)
into :rtl_list separated by ' '
, :pct_list separated by ' '
from dataset_with_names
;
quit;
data want;
set have;
array rtls &rtl_list;
array pcts &pct_list;
total_sales = sum(of rtls[*]);
do index=1 to dim(rtls);
pcts[index] = rtls[index] / total_sales;
end;
drop index ;
run;
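If there is no separate data set holding the names, the same two lists can be built from the metadata of HAVE itself. The following is a sketch under that assumption: it reads DICTIONARY.COLUMNS for WORK.HAVE, keeps the variables whose names start with RTL_, and derives the matching PCT_ name by replacing the first four characters. The sample data is the one from the question.

```sas
data have;
  input id rtl_apples rtl_oranges rtl_berries;
  datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;
run;

proc sql noprint;
  /* substr(name, 5) is the part of the name after the RTL_ prefix */
  select name, cats('PCT_', substr(name, 5))
    into :rtl_list separated by ' '
       , :pct_list separated by ' '
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'HAVE'
    and substr(upcase(name), 1, 4) = 'RTL_'
  order by varnum;
quit;

data want;
  set have;
  array rtls[*] &rtl_list;
  array pcts[*] &pct_list;
  total_sales = sum(of rtls[*]);
  do index = 1 to dim(rtls);
    pcts[index] = rtls[index] / total_sales;
  end;
  drop index;
run;
```

Both lists come from the same query and the same ordering, so element i of pcts always corresponds to element i of rtls.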
You can't create variables while a data step is executing. This program uses PROC TRANSPOSE to create a new data set whose variables are the RTL_ variables "renamed" to PCT_.
data have;
input id rtl_apples rtl_oranges rtl_berries;
datalines;
1 50 60 10
2 10 30 80
3 40 8 1
;;;;
run;
proc transpose data=have(obs=0) out=names;
var rtl_:;
run;
data pct;
set names;
_name_ = transtrn(_name_,'rtl_','PCT_');
y = .;
run;
proc transpose data=pct out=pct2;
id _name_;
var y;
run;
data want;
set have;
if 0 then set pct2(drop=_name_);
array _rtl[*] rtl_:;
array _pct[*] pct_:;
call missing(of _pct[*]);
total = sum(of _rtl[*]);
do i = 1 to dim(_rtl);
_pct[i] = _rtl[i]/total*1e2;
end;
drop i;
run;
proc print;
run;
You may want to just report the row percents:
proc transpose data=&data out=&data.T;
by id;
var rtl_:;
run;
proc tabulate data=&data.T;
class id _name_;
var col1;
table
id=''
, _name_='Result'*col1=''*sum=''
_name_='Percent'*col1=''*rowpctsum=''
/ nocellmerge;
run;

Dividing multiple columns dynamically

I have a dataset that has 415 columns. 15 are computed indicators and the 400 others are numerators and denominators of indicators I want to compute. The 400 variables all have the same format i.e. *variable-name*_NUM and *variable-name*_DEN. For example, from A_NUM and A_DEN I want to compute A = divide(A_NUM, A_DEN). In other words, from the initial 415 columns, I want to end up with 15 (already computed indicators) + 200 (400/2) indicators on my data set.
At the moment I am computing them manually as follows:
data want;
set have;
a = divide(a_NUM,a_DEN);
b = divide(b_NUM,b_DEN);
c = divide(c_NUM,c_DEN);
...
y = divide(y_NUM,y_DEN);
z = divide(z_NUM,z_DEN);
...
run;
But I am sure there is a dynamic way of doing this (maybe using arrays?).
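data want;
set have;
/* assumes the variables were renamed so that NUM/DEN is a prefix (num_a, den_a, ...) */
/* and that both lists are in the same order */
array _num (*) num_:;
array _den (*) den_:;
array _results(*) results1-results200;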
do i=1 to dim(_num);
_results(i) = _num(i)/_den(i);
end;
run;
Another option may be to transpose your data to a long structure so that you have numerator in one column and denominator in another and then do the math easily.
data long;
set have;
array _num (*) num_:;
array _den (*) den_:;
do i=1 to dim(_num);
numerator = _num(i);
denominator = _den(i);
var_num = scan(vname(_num(i)), 2, "_");
var_den = scan(vname(_den(i)), 2, "_");
output;
end;
run;
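data want;
set long; /* read the long data set created above, not have */
length flag $8.;
ratio = numerator/denominator;
/* flag rows where numerator and denominator names do not match */
if var_num ne var_den then flag = "CHECKME";
run;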
proc transpose data=want out=wide prefix=ratio_;
by someUniqueVariable;
id var_num ;
var ratio;
run;
This solution does not require you to rename the variables:
/* use a sql statement to generate the repeating code */
proc sql noprint;
select trim(indicator) ||' = divide('|| trim(indicator) ||'_NUM, '|| trim(indicator) ||'_DEN)'
/* store all statements in one macro variable */
into :divisions separated by '; '
/* but first list the indicators for which you need to do so
using the view SASHELP.VCOLUMN */
from (select substr(NUM.name, 1, length(NUM.name)-4) as indicator
from sasHelp.vColumn as NUM, sasHelp.vColumn as DEN
where NUM.libName eq 'WORK' and NUM.memName eq 'HAVE' and scan(NUM.name, -1, '_') eq 'NUM'
and DEN.libName eq 'WORK' and DEN.memName eq 'HAVE' and scan(DEN.name, -1, '_') eq 'DEN'
/* pair each _NUM column with the _DEN column of the same indicator */
and substr(NUM.name, 1, length(NUM.name)-4) eq substr(DEN.name, 1, length(DEN.name)-4)
);
quit;
data WANT;
set HAVE;
&divisions;
run;
Note that you might need to apply the uppercase function on all column names if upper and lower case are not used consistently in column names.

Concatenating all variables in an observation in SAS

Is there a general-purpose way of concatenating each variable in an observation into one larger variable while preserving the format of numeric/currency fields, so that the values look the way they do when you run proc print on the dataset (see sashelp.shoes for example)?
Here is some code you can run. As you can see when looking at the log, using the catx function to produce comma-separated output removes both the $ currency sign and the period from the numeric variables.
proc print data=sashelp.shoes (obs=10);
run;
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
Any solution needs to be general purpose as it will run on disparate datasets with differently formatted variables.
You can manually loop over PDV variables of the data set, concatenating each formatted value retrieved with vvaluex. A hash can be used to track which variables of the data set to process. If you are comma separating values you will probably want to double quote formatted values that contain a comma.
data want;
set sashelp.cars indsname=_data;
if _n_ = 1 then do;
declare hash vars();
length _varnum 8 _varname $32;
vars.defineKey('_n_');
vars.defineData('_varname');
vars.defineDone();
_dsid = open(_data);
do _n_ = 1 to attrn(_dsid,'NVAR');
rc = vars.add(key:_n_,data:varname(_dsid,_n_));
end;
_dsid = close(_dsid);
call missing (of _:);
end;
format weight comma7.;
length allcat $32000 _vvx $32000;
do _n_ = 1 to vars.NUM_ITEMS;
vars.find();
_vvx = strip(vvaluex(_varname));
if index(_vvx,",") then _vvx = quote(strip(_vvx));
if _n_ = 1
then allcat = _vvx;
else allcat = cats(allcat,',',_vvx);
end;
drop _:;
run;
You can use import and export to csv file:
filename tem temp;
proc export data=sashelp.SHOES file=tem dbms=csv replace;
run;
data l;
length all $ 200;
infile tem truncover firstobs=2;
input all 1-200;
run;
P.S.
If you only need to concatenate character variables, you can create an array of all CHARACTER columns in the dataset and just iterate through it:
data l;
length all $ 5000;
set sashelp.SHOES;
array ch [*] _CHARACTER_;
do i = 1 to dim(ch);
all=catx(',',all,ch[i]);
end;
run;
The PUT statement is the easiest way to do that. You don't need to know the variable names, as you can use the _all_ variable list.
put (_all_) (+0);
It will honor the formats attached to the variables, and if you have used the DSD option on the FILE statement then the result is a delimited list.
What is the ultimate goal of this exercise? If you want to create a file you can just write the file directly.
data _null_;
set sashelp.shoes(obs=3);
file 'myfile.csv' dsd ;
put (_all_) (+0);
run;
If you really do want to get that string into a dataset variable, there is no need to invent some new function. Just take advantage of the PUT statement's abilities by creating a file and then reading the lines from the file.
filename junk temp;
data _null_;
set sashelp.shoes(obs=3);
file junk dsd ;
put (_all_) (+0);
run;
data stuff ;
set sashelp.shoes(obs=3);
infile junk truncover ;
input all $5000.;
run;
You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.
filename junk temp;
data stuff;
set sashelp.shoes(obs=3);
file junk dsd lrecl=5000 ;
length all $5000;
put @1 (_all_) (+0) +(-2) ' ' @;
all = _file_;
output;
all=' ';
put @1 all $5000. @;
run;
Solution with vvalue and the concatenation operator (||):
It is similar to the 'solution without catx' (the last one), but it is simplified by using the vvalue function instead of put.
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for concat function (||)*/
proc sql;
select ('strip(vvalue('|| strip(name) ||'))') into :varstr4 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step to concat all variables*/
data stuff2;
format all $5000.;
set work.wocatx ;
all = &varstr4. ;
put all;
run;
Solution with catx:
proc print data=SASHELP.SHOES;
run;
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
If a column has no format in dictionary.columns, the macro variable varstr2 contains just its name. If it has a format, the generated expression applies that format when catx is called: for example, a numeric variable becomes put(Sales,DOLLAR12.), and a character variable is wrapped in the input function. You can add any further conditions to the select ... into if you need them.
If there is no need for the input function, just change the select to:
ifc(strip(format) is missing,strip(name),'put('|| strip(name) ||','|| strip(format) ||')')
Solution without catx:
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for catx*/
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*data step with catx*/
data stuff;
format all $5000.;
set work.wocatx ;
all = catx(',',&varstr2.) ;
put all;
run;
/*Macro variable for concat function (||)*/
proc sql;
select ifc(strip(format) is missing,
'strip(' || strip(name) || ')',
'strip(put('|| strip(name) ||','|| strip(format) ||'))') into :varstr3 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step without catx*/
data stuff1;
format all $5000.;
set work.wocatx ;
all = &varstr3. ;
put all;
run;

How to calculate a mean for the non zero values using proc means or proc summary

I want to compute a mean based only on the non-zero values of given variables, using proc means only.
I know we can calculate this using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate the mean based on non-zero values when I am using all of them in the var statement, as below:
proc means data=xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition to keep its non-zero values, it works, but can we have something that would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%local i;
/* one VAR statement per variable, each with its own weight */
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
/* avoid dividing by zero when every variable is zero or missing */
if qty > 0 then mean = mean/qty;
else mean = .;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.
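The repeated IF blocks in the data step above can be collapsed into an array loop. This is a minimal sketch, assuming the variables are named var1-var8 as in the question; the sample rows and the names total, qty and nonzero_mean are made up for the illustration. Like the data step above, it computes one mean per observation across the eight variables, not one mean per variable.

```sas
/* small hypothetical sample: row 1 has three non-zero values, row 2 has none */
data xyz;
  input var1-var8;
  datalines;
0 2 4 0 0 6 0 0
0 0 0 0 0 0 0 0
;
run;

data lala;
  set xyz;
  array v{*} var1-var8;
  total = 0;
  qty = 0;
  do i = 1 to dim(v);
    /* count and add only values that are neither missing nor zero */
    if not missing(v{i}) and v{i} ne 0 then do;
      total = total + v{i};
      qty = qty + 1;
    end;
  end;
  /* leave the mean missing when there is nothing to average */
  if qty > 0 then nonzero_mean = total / qty;
  drop i total qty;
run;
```

For the first sample row the non-zero values are 2, 4 and 6, so nonzero_mean is 4; for the second row it stays missing.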