The following inherited simplified code is meant to replace missing values of a column with the values of not missing entries in a group:
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
DATA WORK.OUTTOYDATA;
SET WORK.TOYDATA;
BY Category ;
RETAIN _PRICE;
IF FIRST.Category THEN _PRICE=PRICE;
IF NOT MISSING(PRICE) THEN _PRICE=PRICE;
ELSE PRICE=_PRICE;
DROP _PRICE;
RUN;
Unfortunately, this will not work if the first entry in a group is missing. How could this be fixed?
As SAS works row by row through the dataset there is no value to replace if the first value is missing.
You could sort the data by Category and Price DESCENDING to circumvent this.
proc sort data= WORK.TOYDATA; by Category DESCENDING PRICE; run;
Or if there is only one NON-missing value by category you could use a sql join e.g.
proc sql;
create table WORK.OUTTOYDATA as
select a.Category, coalesce(a.PRICE, b.PRICE) as PRICE
from WORK.TOYDATA a
left join (select distinct Category, PRICE
from WORK.TOYDATA
where PRICE ne .
) b
on a.Category eq b.Category
;
quit;
As #Jetzler pointed out, the easiest way is just to sort the data. However, if you have multiple columns with missing values then you'd need to do multiple sorts, which isn't efficient.
Another option from doing a join is proc stdize which can be used to replace missing values with a simple measure (mean, median, sum etc). The default method will suffice in your example, you just need to add the reponly option which only replaces missing values and does not standardize the data.
DATA WORK.TOYDATA;
INPUT Category $ PRICE;
DATALINES;
Cat1 2
Cat1 .
Cat1 .
Cat2 .
Cat2 3
Cat2 .
;
run;
proc stdize data=TOYDATA out=want reponly;
by category;
var price;
run;
Related
I have the following dataset and code:
options nocenter;
DATA survey;
INPUT product_id department;
DATALINES;
1212 Sales
1213 Sales
1214 Marketing
;
PROC PRINT; RUN;
data sales marketing;
set survey;
if department = 'Sales' then output sales;
else if department = 'Marketing' then output marketing;
run;
title 'Sales employees';
proc print data= sales;
run;
title;
title 'Marketing employees';
proc print data= marketing;
run;
title;
This however gives me two tables with all the values while I only a table with the marketing- and sales values. Also the title appears above the second table but not above the first. Any thoughts what goes wrong?
Your missing a '$' sign after your variable 'department', so you get the '.' for missing (numeric) values. In addition to that the variable is truncating my value of Marketing to Marketin, so the data set Marketing never finds a string that equals 'Marketing', so your input should be INPUT product_id department $10.; . The title statements work of for me.
Here is my data :
data example;
input id sports_name;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
This is just a sample. The variable sports_name is categorical with 56 types.
I am trying to transpose the data to wide form where each row would have a user_id and the names of sports as the variables with values being 1/0 indicating Presence or absence.
So far, I used proc freq procedure to get the cross tabulated frequency table and put that in a different data set and then transposed that data. Now i have missing values in some cases and count of the sports in rest of the cases.
Is there any better way to do this?
Thanks!!
You need a way to create something from nothing. You could have also used the SPARSE option in PROC FREQ. SAS names cannot have length greater than 32.
data example;
input id sports_name :$16.;
retain y 1;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;;;;
run;
proc print;
run;
proc summary data=example nway completetypes;
class id sports_name;
output out=freq(drop=_type_);
run;
proc print;
run;
proc transpose data=freq out=wide(drop=_name_);
by id;
var _freq_;
id sports_name;
run;
proc print;
run;
Same theory here, generate a list of all possible combinations using SQL instead of Proc Summary and then transposing the results.
data example;
informat sports_name $20.;
input id sports_name $;
datalines;
1 baseball
1 basketball
1 cricket
1 soccer
2 golf
2 fencing
;
run;
proc sql;
create table complete as
select a.id, a_x.sports_name, case when not missing(e.sports_name) then 1 else 0 end as Present
from (select distinct ID from example) a
cross join (select distinct sports_name from example) a_x
full join example as e
on e.id=a.id
and e.sports_name=a_x.sports_name;
quit;
proc transpose data=complete out=want;
by id;
id sports_name;
var Present;
run;
NAME DATE
---- ----------
BOB 24/05/2013
BOB 12/06/2012
BOB 19/10/2011
BOB 05/02/2010
BOB 05/01/2009
CARL 15/05/2011
LOUI 15/01/2014
LOUI 15/05/2013
LOUI 15/05/2012
DATA newdata;
SET mydata;
count + 1;
IF FIRST.name THEN count=1;
BY name DESCENDING date;
run;
here i got count group wise 1,2,3 so on..I want the output of name(all obs of bob) if count> 3. please help me..
The simplest way to do that is to output the last row for each ID if it is > 3, then merge that dataset back to your master dataset, keeping only matches. You could also use PROC FREQ to generate the dataset of counts and merge to that.
You can do it in a single datastep using a DoW loop, but that's more complicated, so I wouldn't recommend a new user do that.
I think this shows the power of SQL - though some would say since this generates a NOTE in the log it isn't good practice. Use the GROUP & HAVING clause in SQL to create a count of the names that you then limit to 3.
proc sql;
create table want as
select *
from have
group by name
having count(name)>3;
quit;
Here are a couple different ways to do this using SUBQUERIES in PROC SQL
Data HAVE;
Length NAME $50;
Input Name $ Date: ddmmyy10.;
Format date ddmmyy10.;
datalines;
BOB 24/05/2013
BOB 12/06/2012
BOB 19/10/2011
BOB 05/02/2010
BOB 05/01/2009
CARL 15/05/2011
LOUI 15/01/2014
LOUI 15/05/2013
LOUI 15/05/2012
;
Run;
Using a multiple-value subquery in the Where statement
Proc sql;
Create table WANT1 as
Select *
From Have
Where Name in (Select name from have b group by b.name having count(b.name)>3);
Quit;
Using a subquery in the From clause
Proc sql;
Create table WANT2 as
Select a.name, a.date
From Have a Inner Join (select name, count(name) as Count from have b group by b.name having Count>3)
On a.name=b.name
;
Quit;
Trying to make a more simple unique identifier from already existing identifier. Starting with just and ID column I want to make a new, more simple, id column so the final data looks like what follows. There are 1million + id's, so it isnt an option to do if thens, maybe a do statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
using proc sql..
(you can probably do this without the intermediate datasets using subqueries, but sometimes monotonic doesn't act the way you'd think in a subquery)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original_set a, uniq_id2 b
where a.id = b.id
;
quit;
I am trying to basically do this :
I have a frequency query running on a data set which will output the result in excel.
I also want to add a column to the output in which the value will be based on what is listed in a particular cell or a particular column.
How would I go about this? (*very new sas user)
Without hearing more information, I assume what you're trying to do is save the output of your proc freq and then manipulate it further with a data step.
Simple example of this:
data beer;
length firstname favbrand $20.;
input firstname $
favbrand $;
datalines;
John bud
Steve dogfishhead
Jason coors
Anna anchorsteam
Bob bud
Dan bud
;
run;
proc freq data=beer;
table favbrand / out=freqout;
run;
data beerstat(keep=favbrand status);
set freqout;
* create a new column called "status" based on the count column ;
if (count >=2) then status="popular";
else status = "hipster";
run;
* instead of proc print you can send your output to excel with proc export ;
proc print data=beerstat;
run;