I have the following dataset and code:
options nocenter;
DATA survey;
INPUT product_id department;
DATALINES;
1212 Sales
1213 Sales
1214 Marketing
;
PROC PRINT; RUN;
data sales marketing;
set survey;
if department = 'Sales' then output sales;
else if department = 'Marketing' then output marketing;
run;
title 'Sales employees';
proc print data= sales;
run;
title;
title 'Marketing employees';
proc print data= marketing;
run;
title;
This, however, gives me two tables with all the values, while I only want one table with the Sales values and one with the Marketing values. Also, the title appears above the second table but not above the first. Any thoughts on what goes wrong?
You're missing a '$' sign after your variable 'department', so it is read as numeric and you get '.' (the missing numeric value). In addition, a character variable read with a plain '$' gets the default length of 8, which truncates 'Marketing' to 'Marketin', so the Marketing data set never finds a string that equals 'Marketing'. Your input statement should therefore be INPUT product_id department $10.; . The title statements work fine for me.
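A minimal sketch of the corrected program, assuming 10 characters is wide enough for the longest department name. It uses a LENGTH statement with plain list input rather than the $10. informat quoted above; that is a swapped-in variant, chosen because it does not depend on how the values line up in columns.

```sas
data survey;
   /* Declare department as character, wide enough for 'Marketing' (9 chars). */
   length department $ 10;
   input product_id department $;
   datalines;
1212 Sales
1213 Sales
1214 Marketing
;
run;

data sales marketing;
   set survey;
   /* Route each row to the data set that matches its department. */
   if department = 'Sales' then output sales;
   else if department = 'Marketing' then output marketing;
run;
```

Because LENGTH comes before INPUT, department becomes the first variable in the data set; move the LENGTH statement or use a RETAIN/ATTRIB ordering trick if column order matters.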
I have the following dataset:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
What I would like to do now is classify the records based on their last visit. So what I want to do is:
Set a reference date (e.g., 10SEP2016)
Classify all records whose last visit is more than 30 days before that date as 1, all records whose last visit is more than 60 days before it as 2, etc.
Any thoughts on how I need to program this?
You could build something like this: count the days between the dates, divide by 30, and take the ceiling. Alternatively, if you want to use calendar months instead of 30-day blocks, you can replace the first intck parameter with 'month' and remove the ceil and the /30:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
4 09SEP2016
5 10AUG2016
;
RUN;
%let lastvisit=10SEP2016;
data result;
set survey;
days_30=ceil(intck('day', order_date,"&lastvisit"d)/30)-1;
run;
PROC PRINT data = result;
format order_date date9.;
RUN;
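The month-based variant mentioned above could look like the sketch below. It applies the described change literally (interval 'month', no ceil, no /30) and keeps the trailing -1 from the original expression. The names result_months and months_cls are hypothetical, chosen only for illustration, and the step assumes the survey data set from the example above.

```sas
%let lastvisit=10SEP2016;

data result_months;
   set survey;
   /* INTCK with 'month' counts month boundaries crossed between the two dates. */
   /* The -1 is carried over unchanged from the 30-day version above.           */
   months_cls = intck('month', order_date, "&lastvisit"d) - 1;
run;

proc print data=result_months;
   format order_date date9.;
run;
```

Note that boundary counting is not the same as elapsed time: 31AUG2016 and 01SEP2016 are one month boundary apart even though they are only one day apart, so the two variants can classify the same record differently.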
I have the following dataset and code:
DATA survey;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
PROC PRINT; RUN;
data work;
set survey;
where '11JAN2007'<= order_date <= '13JAN2007';
proc print data=work;
run;
When I run this code, however, it does not give the desired output. It only gives a table with three empty order_date values.
Any thoughts on what goes wrong here?
This would work:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
data work;
set survey;
where '11JAN2007'd<= order_date <= '13JAN2007'd;
run;
proc print data=work;
format order_date date9. ;
run;
See the SAS help topics on dates and informats.
If you want to query based on date, you need to tell SAS that your string is a date. You do this by putting a 'd' after the date string, e.g.
'11JAN2007'd
The following simplified, inherited code is meant to replace missing values in a column with the non-missing value from the same group:
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
DATA WORK.OUTTOYDATA;
SET WORK.TOYDATA;
BY Category ;
RETAIN _PRICE;
IF FIRST.Category THEN _PRICE=PRICE;
IF NOT MISSING(PRICE) THEN _PRICE=PRICE;
ELSE PRICE=_PRICE;
DROP _PRICE;
RUN;
Unfortunately, this will not work if the first entry in a group is missing. How could this be fixed?
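Because SAS processes the data set row by row, there is no earlier value to carry forward when the first value in a group is missing.
You could sort the data by Category and by descending PRICE to circumvent this. Missing values sort lowest, so a non-missing price comes first in each group.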
proc sort data= WORK.TOYDATA; by Category DESCENDING PRICE; run;
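Put together with the fill step, the sort-based approach might look like the following sketch. It writes the sorted rows to a separate data set (WORK.SORTED, a hypothetical name) so that TOYDATA itself is not overwritten, and it assumes every category has at least one non-missing PRICE.

```sas
proc sort data=WORK.TOYDATA out=WORK.SORTED;
   by Category descending PRICE;
run;

data WORK.OUTTOYDATA;
   set WORK.SORTED;
   by Category;
   retain _PRICE;
   /* After the sort, the first row of each group holds a non-missing price. */
   if first.Category then _PRICE = PRICE;
   /* Later rows with a missing price take the retained value. */
   if missing(PRICE) then PRICE = _PRICE;
   drop _PRICE;
run;
```

The original row order within each category is lost by the sort; if it matters, a row counter would have to be added beforehand and used to re-sort afterwards.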
Or, if there is only one distinct non-missing value per category, you could use a SQL join, e.g.
proc sql;
create table WORK.OUTTOYDATA as
select a.Category, coalesce(a.PRICE, b.PRICE) as PRICE
from WORK.TOYDATA a
left join (select distinct Category, PRICE
from WORK.TOYDATA
where PRICE ne .
) b
on a.Category eq b.Category
;
quit;
As @Jetzler pointed out, the easiest way is just to sort the data. However, if you have multiple columns with missing values, you would need to do multiple sorts, which isn't efficient.
An alternative to a join is PROC STDIZE, which can replace missing values with a simple measure (mean, median, sum, etc.). The default method will suffice in your example; you just need to add the REPONLY option, which only replaces missing values and does not standardize the data.
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
run;
proc stdize data=TOYDATA out=want reponly;
by category;
var price;
run;
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have;
class date;
var count;
output out=want sum=total;
run;
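One thing to be aware of: with a CLASS statement, the OUT= data set also contains an overall row (_TYPE_ = 0) in addition to one row per date. A sketch that keeps only the per-date rows, assuming the printed PROC MEANS report is not needed:

```sas
proc means data=have nway noprint;
   class date;
   var count;
   /* NWAY keeps only the rows for the full CLASS combination (one per date). */
   output out=want (drop=_type_ _freq_) sum=total;
run;

proc print data=want;
   format date date9.;
run;
```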
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
quit;
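Since the question mentions trying first., here is a sketch of the DATA step version with BY-group processing. It relies on have being sorted by date, as in the question; want_ds is a hypothetical output name.

```sas
data want_ds (keep=date total);
   set have;
   by date;
   /* Reset the running total at the start of each date group. */
   if first.date then total = 0;
   /* Sum statement: total is retained across rows automatically. */
   total + count;
   /* Write one row per date, once the group is complete. */
   if last.date then output;
   format date date9.;
run;
```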
I need a column total added as an observation.
Input dataset:
data input;
input name$ mark;
datalines;
a 10
b 20
c 30
;
run;
Output dataset:
Name Mark
a 10
b 20
c 30
Total 60
The code below, which I wrote, works fine.
data output;
set input end=eof;
tot + mark;
if eof then
do;
output;
name = 'Total';
mark = tot;
output;
end;
else output;
run;
Please suggest a better way of doing this, if there is one.
PROC REPORT is a good solution for this. The example below summarizes the entire report; other options let you summarize by group.
proc report out=outds data=input nowd;
columns name mark;
define name/group;
define mark/analysis sum;
compute after;
name = "Total";
line "Total" mark.sum;
endcomp;
run;
Your code is fine in general; the issue might be performance. If the input table is huge, you end up rewriting the full table.
I'd suggest something like this:
proc sql;
delete from input where name = 'Total';
create table total as
select 'Total' as name length=8, sum(mark) as mark
from input
;
quit;
proc append base=input data=total;
run;
Here you read the full table but write only a single row to the existing table.