I want to count the number of unique items in a variable (call it "categories") then use that count to set the number of iterations in a SAS macro (i.e., I'd rather not hard code the number of iterations).
I can get a count like this:
proc sql;
select count(*)
from (select DISTINCT categories from myData);
quit;
I can run a macro like this:
%macro superFreq;
%do i=1 %to &iterationVariable;
Proc freq data=myData;
table var&i / out=var&i.freq;
run;
%end;
%mend superFreq;
%superFreq
I want to know how to get the count into the iteration variable so that the macro iterates as many times as there are unique values in the variable "categories".
Sorry if this is confusing. Happy to clarify if need be. Thanks in advance.
You can achieve this by using the into clause in proc sql:
proc sql noprint;
select max(age),
max(height),
max(weight)
into :max_age,
:max_height,
:max_weight
from sashelp.class;
quit;
%put &=max_age &=max_height &=max_weight;
Result:
MAX_AGE= 16 MAX_HEIGHT= 72 MAX_WEIGHT= 150
You can also select a list of results into a macro variable by combining the into clause with the separated by clause:
proc sql noprint;
select name into :list_of_names separated by ' ' from sashelp.class;
quit;
%put &=list_of_names;
Result:
LIST_OF_NAMES=Alfred Alice Barbara Carol Henry James Jane Janet Jeffrey John Joyce Judy Louise Mary Philip Robert Ronald Thomas William
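Applied to the original question, the same INTO technique can feed the loop bound directly. The following is a minimal sketch under stated assumptions: the sample data, the variables var1 and var2, and the output dataset names var1_freq and var2_freq are hypothetical and only there to make the example self-contained. Note that count(distinct ...) ignores missing values, whereas counting the rows of a select DISTINCT subquery treats missing as one more value.

```sas
/* Hypothetical sample data: two distinct categories, variables var1 and var2 */
data myData;
    input categories $ var1 var2;
    datalines;
a 1 0
b 0 1
a 1 1
;
run;

/* Store the number of distinct categories in a macro variable.
   TRIMMED strips the leading blanks that INTO would otherwise keep. */
proc sql noprint;
    select count(distinct categories) into :iterationVariable trimmed
    from myData;
quit;

%macro superFreq;
    %local i;
    /* Loop once per distinct category (assumes var1..varN exist) */
    %do i = 1 %to &iterationVariable;
        proc freq data=myData noprint;
            table var&i / out=var&i._freq;
        run;
    %end;
%mend superFreq;

%superFreq
```

The period in var&i._freq marks the end of the macro variable reference, so the names resolve to var1_freq, var2_freq, and so on.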
First I created the dataset 'have' and sorted it.
Then I created a second dataset, 'havenot'. Now I need to subtract the two datasets, i.e. remove the rows of 'havenot' from 'have'.
data have;
input party_ID Preference_ID:$11.;
datalines;
101 Preference1
101 Preference2
102 Preference4
102 Preference1
102 Preference5
;
proc sort data = have;
by party_ID Preference_ID;
run;
data havenot;
set have;
by party_ID Preference_ID;
if first.party_id;
run;
(output of havenot)
party_ID Preference_ID
101 Preference1
102 Preference1
Desired output that I want
party_ID Preference_ID
101 Preference2
102 Preference4
102 Preference5
Are you asking how to remove the first record per PARTY_ID?
You could just reverse the logic in your subsetting IF statement.
data want;
set have;
by party_id;
if not first.party_id;
run;
Or another way is to explicitly delete the first observations.
if first.party_id then delete;
If you are asking how to remove exact row matches then PROC SQL can do that.
proc sql ;
create table want as
select * from have
except
select * from havenot
;
quit;
If you want to remove rows based on just key matches, then a data step might be the better choice.
data want ;
merge have havenot(in=in2 keep=party_id preference_id);
by party_id preference_id;
if not in2;
run;
Basically, subsetting with if not first.party_id gives you the dataset you want:
data other;
set have;
by party_ID Preference_ID;
if not first.party_id;
run;
The easiest option is to use a data step:
data output;
merge have(in=i1) havenot(in=i2);
by party_ID Preference_ID;
if not i2;
run;
If you want to use proc sql, you could do the following:
proc sql noprint;
create table output as
select a.*
from have as a
full outer join havenot as b
on a.party_ID eq b.party_ID and a.Preference_ID eq b.Preference_ID
where b.party_ID is missing;
quit;
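Another way to express the same key-based subtraction in PROC SQL is a NOT EXISTS subquery. This is a sketch, not necessarily what the answer's author had in mind; the data are rebuilt from the question so the block is self-contained, and the output table name want_sql is a hypothetical choice.

```sas
/* Rebuild the question's data so the example stands alone */
data have;
    input party_ID Preference_ID :$11.;
    datalines;
101 Preference1
101 Preference2
102 Preference4
102 Preference1
102 Preference5
;
run;

proc sort data=have;
    by party_ID Preference_ID;
run;

/* First row per party_ID, as in the question */
data havenot;
    set have;
    by party_ID;
    if first.party_ID;
run;

/* Keep only the rows of HAVE that have no key match in HAVENOT */
proc sql;
    create table want_sql as
    select a.*
    from have as a
    where not exists
          (select *
           from havenot as b
           where b.party_ID = a.party_ID
             and b.Preference_ID = a.Preference_ID);
quit;
```

Because only rows from have are ever selected, there is no need for the extra rows a full outer join would produce before filtering.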
I am trying to create an array of macro variables, each holding one value.
proc sql noprint;
select count(*) into :dscnt from study;
select libname into :libname1 - :libname&dscnt from study;
quit;
I think the syntax is correct, but I keep getting the following error message in SAS Studio.
***NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: Line generated by the macro variable "DSCNT".
79 libname 4
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: ',', FROM, NOTRIM.
ERROR 200-322: The symbol is not recognized and will be ignored.***
Can someone explain to me what i am doing wrong?
Thanks
You do not need to know the number of items ahead of time. If you leave the upper bound blank, SAS will automatically create the correct number of macro variables.
If you do want to use that number elsewhere you can create it using the TRIMMED option to remove any extra spaces. See the second example below.
proc sql noprint;
select name into :name1- from sashelp.class;
quit;
%put &name1;
%put &name19.;
proc sql noprint;
select count(distinct name) into :name_count TRIMMED from sashelp.class;
quit;
%put &name_count;
Results:
3068 proc sql noprint;
3069 select name into :name1- from sashelp.class;
3070 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
3071
3072 %put &name1;
Alfred
3073 %put &name19.;
William
3074
3075 proc sql noprint;
3076 select count(distinct name) into :name_count TRIMMED from
3076! sashelp.class;
3077 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
3078
3079 %put &name_count;
19
The into syntax in proc sql stores formatted values into macro variables. For example if you run this code:
proc sql noprint;
select count(*) into :dscnt from sashelp.class;
quit;
%put #&dscnt#;
You will see the output is:
# 19#
In other words, the result is left-padded with spaces. This means that in your example, the code resolves to something like:
select libname into :libname1 - :libname 19 from study;
which is invalid syntax, because the padding splits the macro variable name in two. To fix this, add the TRIMMED keyword to the INTO clause:
select count(*) into :dscnt TRIMMED from study;
Thanks to Reeza for the TRIMMED keyword.
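Putting the two answers together, a sketch of the full pattern might look like the following. It uses sashelp.class instead of the asker's study table, and the macro name show_names is hypothetical. It relies on the open-ended range (:name1-), which needs SAS 9.3 or later, and on the automatic macro variable SQLOBS, which holds the number of rows processed when a macro variable range is created.

```sas
/* Create name1, name2, ... without knowing the count in advance */
proc sql noprint;
    select name into :name1- from sashelp.class;
quit;

/* SQLOBS holds the number of rows used to fill the range */
%let name_count = &sqlobs;

%macro show_names;
    %local i;
    /* &&name&i resolves in two passes: &&name&i -> &name1 -> value */
    %do i = 1 %to &name_count;
        %put NOTE: name&i = &&name&i;
    %end;
%mend show_names;

%show_names
```

Copying SQLOBS into its own macro variable right after the step matters, because any later PROC SQL statement overwrites it.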
Alternatively, do something like the following, using %left to strip the leading blanks:
proc sql noprint;
select count(*) into :dscnt from sashelp.class;
select name into :name1 - :name%left(&dscnt) from sashelp.class;
quit;
I am looking to create multiple datasets from the city_variables dataset. It has 58 observations in total, and I stored that count in a macro variable (&count) to stop the do loop.
The city_variables dataset looks like this (one value per row):
CITY_NAME
City1
City2
City3
City4
City5
City6
City7
City8
City9
City10
..........
City58
I created the macro variable &name in a data _null_ step in order to put the city name into the dataset name.
Any help on how to automate the creation of the 58 files by name (not number) would be great. Thanks again.
/* Create macro variable with number of observations in concordance file */
proc sql;
select count(area_name)
into :count
from main.state_all;
quit;
%macro repeat;
data _null_;
set city_variables;
%do i= 1 %UNTIL (i = &count);
call symput('name',CITY_NAME);
run;
data &name;
set dataset;
where city_name = &name;
run;
%end;
%mend repeat;
%repeat
Well, if you're going to do
proc sql;
select count(area_name)
into :count
from main.state_all;
quit;
Then why not go all the way? Make a macro that does one dataset output, given the criteria as parameters, then make one call for each separate whatever-name. This might be close to what you're looking at.
%macro make_data(data_name=, set_name=, where=);
data &data_name.;
set &set_name.;
where &where.;
run;
%mend make_data;
proc sql;
select
cats('%make_data(data_name=',city_name,
', set_name=dataset, where=city_name="',
city_name,
'" )')
into :make_datalist
separated by ' '
from main.state_all;
quit;
&make_datalist.;
Some other options that I'll just link to:
Chris Hemedinger @ SAS Dummy blog, How to Split One Data Set Into Many, shows a similar concept, except he doesn't put the macro wrapper where I do.
Paul Dorfman, Data Step Hash Objects as Programming Tools is the seminal paper on using a hash table to do this. This is the "fastest" way to do this, likely, if you understand hash tables and have the memory available.
You don't need to use a macro to automate splitting up your data in this way. Since your example is really simple, I would consider using call execute in a data _null_ step:
data test;
infile datalines ;
input city_name $20.;
datalines;
City1
City2
City2
City3
City3
City3
;
run;
data _null_;
set test;
call execute("data "||strip(city_name)||";"||"
set test;
where city_name = '"||strip(city_name)||"';"||"
run;");
run;
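One caveat with the step above is that it generates a data step for every row, so a city that appears three times is rebuilt three times. A possible refinement, sketched here with a hypothetical intermediate dataset called cities, is to drive call execute from the distinct city names only. It assumes every city name is a valid SAS dataset name.

```sas
data test;
    input city_name :$20.;
    datalines;
City1
City2
City2
City3
City3
City3
;
run;

/* One row per distinct city, so each dataset is generated only once */
proc sort data=test(keep=city_name) out=cities nodupkey;
    by city_name;
run;

data _null_;
    set cities;
    /* CATX joins the pieces with single blanks; QUOTE adds the double quotes */
    call execute(catx(' ',
        'data', city_name, ';',
        'set test;',
        'where city_name =', quote(strip(city_name)), ';',
        'run;'));
run;
```

The generated steps run after the data _null_ step finishes, one per distinct city.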
Objective: Go from Have table + Help table to Want table. The current implementation (below) is slow. I believe this is a good example of how not to use SAS Macros, but I'm curious as to whether...
1. the macro approach could be salvaged / made fast enough to be viable
(e.g. proc append is supposed to speed up the action of stacking datasets, but I was unable to see any performance gains.)
2. what all the alternatives would look like.
I have written a non-macro solution that I will post below for comparison sake.
Data:
data have ;
input name $ term $;
cards;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
proc print ; run;
data help ;
input terms $ ;
cards;
2000
2001
2002
2003
2004
2005
2006
2007
2008
; run;
proc print ; run;
data want ;
input name $ term $ status $;
cards;
Joe 2000 here
Joe 2001 gone
Joe 2002 here
Joe 2003 gone
Joe 2004 gone
Joe 2005 gone
Joe 2006 gone
Joe 2007 gone
Joe 2008 here
Sally 2001 here
Sally 2002 gone
Sally 2003 here
; run;
proc print data=have ; run;
I can write a little macro to get me there for each individual:
%MACRO RET(NAME);
proc sql ;
create table studtermlist as
select distinct term
from have
where NAME = "&NAME"
;
SELECT Max(TERM) INTO :MAXTERM
FROM HAVE
WHERE NAME = "&NAME"
;
SELECT MIN(TERM) INTO :MINTERM
FROM HAVE
WHERE NAME = "&NAME"
;
CREATE TABLE TERMLIST AS
SELECT TERMS
FROM HELP
WHERE TERMS BETWEEN "&MINTERM." and "&MAXTERM."
ORDER BY TERMS
;
CREATE TABLE HEREGONE_&Name AS
SELECT
A.terms ,
"&Name" as Name,
CASE
WHEN TERMS EQ TERM THEN 'Here'
when term is null THEN 'Gone'
end as status
from termlist a left join studtermlist b
on a.terms eq b.term
;
quit;
%MEND RET ;
%RET(Joe);
%RET(Sally);
proc print data=HEREGONE_Joe; run;
proc print data=HEREGONE_Sally; run;
But it's incomplete. If I loop through for (presumably quite a few names)...
*******need procedure for all names - grab info on have ;
proc sql noprint;
select distinct name into :namelist separated by ' '
from have
; quit;
%let n=&sqlobs ;
%MACRO RETYA ;
OPTIONS NONOTEs ;
%do i = 1 %to &n ;
%let currentvalue = %scan(&namelist,&i);
%put &currentvalue ;
%put &i ;
%RET(&currentvalue);
%IF &i = 1 %then %do ;
data base; set HEREGONE_&currentvalue; run;
%end;
%IF &i gt 1 %then %do ;
proc sql ; create table base as
select * from base
union
select * from HEREGONE_&currentvalue
;
drop table HEREGONE_&currentvalue;
quit;
%end;
%end ;
OPTIONS NOTES;
%MEND;
%RETYA ;
proc sort data=base ; by name terms; run;
proc print data=base; run;
So now I have want, but with 6,000 names, it takes over 20 minutes.
Let's try an alternative solution. For each name, find the min/max term with PROC SQL. Then use a data step to create the table of time periods and join that with your original table.
*Sample data;
data have ;
input name $ term ;
cards;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
*find min/max of each name;
proc sql;
create table terms as
select name, min(term) as term_min, max(term) as term_max
from have
group by name
order by name;
quit;
*Create table with the time periods for each name;
data empty;
set terms;
do term=term_min to term_max;
output;
end;
drop term_min term_max;
run;
*Create final table by merging the original table with table previously generated;
proc sql;
create table want as
select a.name, a.term, case when missing(b.term) then 'Gone'
else 'Here' end as status
from empty a
left join have b
on a.name=b.name
and a.term=b.term
order by a.name, a.term;
quit;
EDIT: Now looking at your macro solution, part of the problem is that you're scanning your table too many times.
The first table, studtermlist, is not required; the last join can be filtered instead.
The two macro variables, min/max term, can be calculated in a single pass.
Avoid the smaller interim term list and use a where clause to filter your results.
Use Call Execute to call your macro rather than another macro loop.
Rather than loop through to append the data, take advantage of a naming convention and use a single data step to append all outputs.
%MACRO RET(NAME);
proc sql noprint;
SELECT MIN(TERM), Max(TERM) INTO :MINTERM, :MAXTERM
FROM HAVE
WHERE NAME = "&NAME"
;
CREATE TABLE _HG_&Name AS
SELECT
A.terms ,
"&Name" as Name,
CASE
WHEN TERMS EQ TERM THEN 'Here'
when term is null THEN 'Gone'
end as status
from help a
left join have b
on a.terms eq b.term
and b.name="&name"
where a.terms between "&minterm" and "&maxterm"
;
quit;
%MEND RET ;
*call macro;
proc sort data=have;
by name term;
run;
data _null_;
set have;
by name;
if first.name then do;
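/* %nrstr delays the macro call until after this data step ends, so the min/max macro variables resolve only after PROC SQL has set them */
str=catt('%nrstr(%ret(', name, '));');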
call execute(str);
end;
run;
*append results;
data all;
set _hg:;
run;
You can actually do this in a single nested SQL query. It would be messy and hard to read.
I'm going to break it out into the three components.
First, get the distinct names;
proc sql noprint;
create table names as
select distinct name from have;
quit;
Second, Cartesian product names and terms to get all the combos.
proc sql noprint;
create table temp as
select a.name, b.terms as term
from names as a,
help as b;
quit;
Third, left join to find the matches
proc sql noprint;
create table want as
select a.name,
a.term,
case
when missing(b.term) then "gone"
else "here"
end as Status
from temp as a
left join
have as b
on a.name=b.name
and a.term=b.term;
quit;
Last, delete the temp table to save space;
proc datasets lib=work nolist;
delete temp;
run;
quit;
As Reeza shows, there are other ways to do this. As I said above, you can merge all this into a single SQL join and get the results you want. Depending on computer memory and data size, it should be OK (and might be faster as everything is in memory).
proc sql;
create table want as
select c.name, c.terms, a.term,
( case when missing(a.term) then "Gone"
else "Here" end ) as status
from (select distinct a.name, b.terms
from have a, help b) c
left join have a
on c.terms = a.term and c.name = a.name
order by c.name, c.terms, a.term
;
quit;
I'm going to throw in my similar answer so I can compare them all later.
proc sql ;
create table studtermlist as
select distinct term,name
from have
;
create table MAXMINTERM as
SELECT Max(TERM) as MAXTERM, Min(TERM) as MINTERM, name
FROM HAVE
GROUP BY name
;
CREATE TABLE TERMLIST AS
SELECT TERMS,name
FROM HELP a,MAXMINTERM b
WHERE TERMS BETWEEN MINTERM and MAXTERM
ORDER BY name,TERMS
;
CREATE TABLE HEREGONE AS
SELECT
a.terms ,
a.Name ,
CASE
WHEN TERMS EQ TERM THEN 'Here'
when term is null THEN 'Gone'
end as status
from termlist a left join studtermlist b
on a.terms eq b.term
and a.name eq b.name
order by name, terms
;
quit;
I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
quit;
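For SAS releases that predate the TRIMMED keyword, another common idiom is to reassign the macro variable to itself with %LET, since %LET discards leading and trailing blanks from the value it assigns. This is a sketch of that idiom using the question's data, not something taken from the answers above.

```sas
data list;
    length id 8 name $20;
    input id name $;
    datalines;
1 reallylongname
2 Joe
;
run;

/* Without TRIMMED, the stored value keeps the padding of the $20 variable */
proc sql noprint;
    select name into :nameselected
    from list
    where id = 2;
quit;

/* Reassigning with %LET drops leading and trailing blanks */
%let nameselected = &nameselected;

%put ....&nameselected....;
```

After the %LET, the %put line should show Joe directly between the dots, with no blanks on either side.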