I am a beginner SAS user and am currently following one of the Coursera courses. The code given in a lecture doesn't work, although I can't see what is wrong with it. Here is the code:
PROC IMPORT DATAFILE ='/home/student123/my_courses/nesarc_pds.csv' OUT = data REPLACE;
LABEL TAB12MDX ="Tobacco Dependance Past 12 Months"
CHECK321 ="Smoked Cigarettes in Past 12 Months"
S3AQ3B1 ="Usual Smoking Frequency"
S3AQ3C1 ="Usual Smoking Quantity";
IF S3AQ3B1=9 THEN S3AQ3B1=.;
IF S3AQ3C1=99 THEN S3AQ3C1=.;
IF CHECK321=1;
IF AGE LE 25;
PROC SORT; BY IDNUM;
PROC FREQ; TABLES TAB12MDX CHECK321 S3AQ3B1 S3AQ3C1 AGE;
RUN;
The error I see in the log is:
80 IF S3AQ3B1=9 THEN S3AQ3B1=MISSING;
__
180
81 IF CHECK321=1;
__
180
82 IF AGE LE 25;
__
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
I use SAS Studio, but I don't know whether that matters.
Does anyone know where the error is?
1. You need a DATA step or a procedure to modify data.
2. You should use DATA= to point the procs to the correct input data set.
3. You only need one RUN, but adding one after each step helps to delineate your code.
PROC IMPORT DATAFILE ='/home/student123/my_courses/nesarc_pds.csv' OUT = data REPLACE;
RUN; /*3*/
Data yourData; /*1*/
set data;
LABEL TAB12MDX ="Tobacco Dependance Past 12 Months"
CHECK321 ="Smoked Cigarettes in Past 12 Months"
S3AQ3B1 ="Usual Smoking Frequency"
S3AQ3C1 ="Usual Smoking Quantity";
IF S3AQ3B1=9 THEN S3AQ3B1=.;
IF S3AQ3C1=99 THEN S3AQ3C1=.;
IF CHECK321=1;
IF AGE LE 25;
run; /*3*/
PROC SORT data=yourData; /*2*/
BY IDNUM;
RUN; /*3*/
PROC FREQ data=yourData; /*2*/
TABLES TAB12MDX CHECK321 S3AQ3B1 S3AQ3C1 AGE;
RUN;
You cannot run an IF statement or a LABEL statement in the middle of nowhere. They have to be part of a DATA step (or part of a PROC step that supports those statements). You need a DATA step. (Note that it is also a good idea to give your data sets more meaningful names than data.)
PROC IMPORT DATAFILE ='/home/student123/my_courses/nesarc_pds.csv'
OUT = nesarc_pds REPLACE
;
run;
data youth;
set nesarc_pds ;
LABEL TAB12MDX ="Tobacco Dependance Past 12 Months"
CHECK321 ="Smoked Cigarettes in Past 12 Months"
S3AQ3B1 ="Usual Smoking Frequency"
S3AQ3C1 ="Usual Smoking Quantity"
;
IF S3AQ3B1=9 THEN S3AQ3B1=.;
IF S3AQ3C1=99 THEN S3AQ3C1=.;
IF CHECK321=1;
IF AGE LE 25;
run;
PROC SORT data=youth; BY IDNUM;
run;
PROC FREQ data=youth; TABLES TAB12MDX CHECK321 S3AQ3B1 S3AQ3C1 AGE;
RUN;
Could anyone please tell the difference between table and tables in Proc freq with an example?
proc freq data= want;
table variable;
run;
proc freq data= want;
tables variable;
run;
There is no difference. The statement is the TABLES statement, but SAS will silently accept TABLE as a synonym without issuing any warning or note. Some misspellings will generate just a warning, while others will cause an error.
1668 proc freq data= sashelp.class;
1669 tablex age name;
------
1
WARNING 1-322: Assuming the symbol TABLE was misspelled as tablex.
1670 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
1671
1672 proc freq data= sashelp.class;
1673 tabl age name;
----
1
WARNING 1-322: Assuming the symbol TABLE was misspelled as tabl.
1674 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
1675
1676 proc freq data= sashelp.class;
1677 tab age name;
---
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
1678 run;
NOTE: The SAS System stopped processing this step because of errors.
I am trying to extract all the Time occurrences for only the most recent visit. Can someone please help me with the code?
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var value;
run;
data want;
set temp1;
by name date;
if last.name;
format value $2000.;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
Very similar to the question yesterday from another user, you can use quite a few solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not on the group by it's legal in SAS to put them on the select, and then SAS automatically merges (inner joins) the result of select name, max(date) from have group by name having date=max(date) to the original have dataset, returning multiple rows as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
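A minimal sketch of that "build the maximum dates, then merge" approach, assuming the input data set is called have as in the other examples. The names maxdates, want and in_max are made up for illustration.

```sas
/* Step 1: one row per name holding that name's latest date.      */
/* MAXDATES is a hypothetical table name.                          */
proc sql;
  create table maxdates as
  select name, max(date) as date
  from have
  group by name
  order by name, date;
quit;

/* Step 2: the MERGE needs both inputs sorted by the BY variables. */
proc sort data=have;
  by name date;
run;

/* Step 3: keep only the rows of HAVE that match a latest date.    */
data want;
  merge have maxdates(in=in_max);
  by name date;
  if in_max;
run;
```

Because maxdates is a real table, you can print it and check the dates before merging, which is what makes this version easy to troubleshoot.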
The DoW loop also appeals here. This is basically the precise SAS data step implementation of the SQL above. First iterate over that name, figure out the max, then iterate again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course, here you can more easily collapse the rows, too:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;
I have the following dataset and code:
options nocenter;
DATA survey;
INPUT product_id department;
DATALINES;
1212 Sales
1213 Sales
1214 Marketing
;
PROC PRINT; RUN;
data sales marketing;
set survey;
if department = 'Sales' then output sales;
else if department = 'Marketing' then output marketing;
run;
title 'Sales employees';
proc print data= sales;
run;
title;
title 'Marketing employees';
proc print data= marketing;
run;
title;
This, however, gives me two tables with all the values, while I want one table with only the Sales values and one with only the Marketing values. Also, the title appears above the second table but not above the first. Any thoughts on what goes wrong?
You're missing a '$' sign after your variable 'department', so it is read as numeric and you get '.' (the numeric missing value). In addition, a character variable read this way defaults to a length of 8, which truncates Marketing to Marketin, so the data set Marketing never finds a string that equals 'Marketing'. Your input statement should therefore be INPUT product_id department $10.; . The title statements work fine for me.
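For reference, a sketch of the corrected program. It uses the colon modifier (:$10.), a small variation on the informat suggested above that stops reading at the next blank. Everything else is taken from the question.

```sas
DATA survey;
  /* $ makes department character; :$10. allows up to 10 characters */
  INPUT product_id department :$10.;
  DATALINES;
1212 Sales
1213 Sales
1214 Marketing
;
RUN;

/* Route each row to the data set that matches its department */
data sales marketing;
  set survey;
  if department = 'Sales' then output sales;
  else if department = 'Marketing' then output marketing;
run;

title 'Sales employees';
proc print data=sales;
run;

title 'Marketing employees';
proc print data=marketing;
run;
title;
```

With the variable read as character at its full length, the comparisons in the second step can match, so each data set should receive only its own rows.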
I want to sum over a specific variable in my dataset without losing all the other columns. I have tried the following code:
proc summary data=work.test nway missing;
class var_1 var_2 ; *groups;
var salary;
id _character_ _numeric_; * keeps all variables;
output out=test2(drop=_:) sum= ;
run;
But it does not seem to sum properly, and for the "salary" column I'm just left with the last value in each group (var_1 and var_2). If I remove
id _character_ _numeric_;
it works fine, but I lose all the other columns.
Example:
data:
data salary;
input name $ dept $ Salary Sex $;
datalines;
John Sales 23 M
John Sales 43 M
Mary Acctng 21 F
;
desired output:
John Sales 66 M
Mary Acctng 21 F
I think this does what you want. You still get warnings about name conflicts and variables being dropped, but at least the ones you want are kept. The ID statement is effectively superseded by the newer and more flexible IDGROUP option of the OUTPUT statement.
You could add the AUTONAME option to the output statement if you wanted PROC SUMMARY to automatically rename the conflicting variables.
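A sketch of that AUTONAME variant, assuming the salary data set from the question. The output name test3 is made up, and I have not listed the variable names AUTONAME generates, so check them with PROC CONTENTS before relying on them.

```sas
proc summary data=salary nway missing;
  class name dept;
  var salary;
  /* AUTONAME goes after the slash on the OUTPUT statement */
  output out=test3(drop=_:) sum= idgroup(out(_all_)=) / autoname;
run;

/* Inspect the names that were actually generated */
proc contents data=test3;
run;
```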
data salary;
input name $ dept $ Salary Sex $;
datalines;
John Sales 23 M
John Sales 43 M
Mary Acctng 21 F
;;;;
run;
proc print;
run;
proc summary nway missing;
class name dept;
var salary;
output out=test2(drop=_:) sum= idgroup(out(_all_)=);
run;
proc print;
run;
Try this:
data salary;
input name $ dept $ Salary Sex $;
datalines;
John Sales 23 M
John Sales 43 M
Mary Acctng 21 F
;
proc sql;
create table salary2 as
select *,
monotonic() as n,
sum(salary) as sum_salary
from salary
group by name
having max(n)=n;
quit;
I wasn't aware that SAS did this, but the problem appears to lie in the fact that the id statement takes precedence over the var statement. By including all variables in the id statement, all the output shows is the maximum value for each variable, including Salary.
One option is to pull a list of the variables not included in the class or var statements from dictionary.columns, then use that list in the id statement. Just be aware that proc summary runs in memory, and I have come across out-of-memory problems in the past when many variables were included in the id statement.
data salary;
input name $ dept $ Salary Sex $;
datalines;
John Sales 23 M
John Sales 43 M
Mary Acctng 21 F
;
proc sql noprint;
select name into :cols separated by ' '
from dictionary.columns
where libname='WORK'
and
memname='SALARY'
and
name not in ('name','Salary');
quit;
%put &cols.;
proc summary data=salary nway missing;
class name;
var salary;
id &cols.;
output out=want (drop=_:) sum=;
run;
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have nway; /* nway keeps the overall-total row out of WANT */
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
quit;
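Since the question mentions trying first., here is a minimal DATA step sketch of that approach as well. It assumes have is already sorted by date, as in the question. The output name want_ds is made up.

```sas
data want_ds(keep=date total);
  set have;
  by date;
  /* Reset the running total at the start of each date */
  if first.date then total = 0;
  /* Sum statement: total is retained across rows automatically */
  total + count;
  /* Write one row per date once the group is complete */
  if last.date then output;
  format date date9.;
run;
```

For the sample data this should give 50 for 20APR2012 and 20 for 27APR2012.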