I have a table in SAS Studio with a column named 'Price'.
ID Price
1 12.90
2 12.30
3 N/A
4 NA
5 NoValue
6 97.02
7 87.45
I plan to replace all the strings (N/A, NA, NoValue) in the dataset with SAS missing values. How could I do this with PROC SQL?
SQL:
proc sql noprint;
create table want as
select ID,
CASE
when(upcase(Price) IN("N/A", "NA", "NOVALUE") ) then ''
else Price
END as Price
from have
;
quit;
Data step:
data want;
set have;
if(upcase(Price) IN("N/A", "NA", "NOVALUE") ) then call missing(Price);
run;
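If the prices are needed for arithmetic afterwards, a numeric copy may be more convenient than a blanked-out character value. The following is only a minimal sketch, assuming Price is a character variable in have; Price_num is a hypothetical name for the new variable.

```sas
data want;
    set have;
    /* Assumption: Price is character. The ?? modifier suppresses the
       invalid-data notes, so text such as N/A, NA or NoValue simply
       becomes a numeric missing value (.) */
    Price_num = input(Price, ?? best32.);
run;
```

Values that already look like numbers (12.90, 97.02) are converted as usual; anything else ends up missing without being listed by name.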
Dataset a:-
cc dob enrolled
1 10-13-1981 10-13-2001
2 10-17-1984 12-15-2004
3 07-20-1957 12-20-2007
4 10-13-1989 12-24-2010
5 10-13-1996 12-28-2013
6 10-14-1996 12-11-1999
7 10-15-1996 12-24-2010
8 10-16-1996 12-24-2010
9 10-17-1996 12-24-2010
10 10-18-1996 12-24-2010
SAS Code:-
proc sql;
select distinct count(*) as cust_enrolled ,year(enrolled) as yr
from a
group by yr
order by cust_enrolled desc;
quit;
Result:-
cust_enrolled yr
5 2010
1 2013
1 2004
1 1999
1 2001
1 2007
My query is to get the first row from this result. How can I achieve this?
Typically I would use a HAVING clause that tests an aggregate, such as freq=max(freq). However, since freq is itself an aggregate (count(*)), it has to be computed in a sub-select first.
Example:
data have;
input cc dob: mmddyy10. enrolled: mmddyy10.;
format dob enrolled mmddyy10.;
datalines;
1 10-13-1981 10-13-2001
2 10-17-1984 12-15-2004
3 07-20-1957 12-20-2007
4 10-13-1989 12-24-2010
5 10-13-1996 12-28-2013
6 10-14-1996 12-11-1999
7 10-15-1996 12-24-2010
8 10-16-1996 12-24-2010
9 10-17-1996 12-24-2010
10 10-18-1996 12-24-2010
;
proc sql;
create table most_popular_enrollment_year as
select * from
(select count(*) as freq, year(enrolled) as yr_enroll
from have
group by yr_enroll
)
having freq=max(freq)
;
quit;
If several years tie for the maximum enrollment count, the query returns multiple rows. If you want only the earliest of those years, you need another level of nesting.
proc sql;
create table earliest_most_popular as
select * from
(
select * from
(
select count(*) as freq, year(enrolled) as yr_enroll
from have
group by yr_enroll
)
having freq=max(freq)
)
having yr_enroll=min(yr_enroll)
;
quit;
Another way is to sort by yr_enroll and use the Proc SQL option OUTOBS=1 to grab only the first row.
proc sql outobs=1;
create table earliest_most_popular as
select * from
(
select count(*) as freq, year(enrolled) as yr_enroll
from have
group by yr_enroll
)
having freq=max(freq)
order by yr_enroll
;
reset outobs=max;
quit;
You can use the OUTOBS option of PROC SQL to control how many observations the SELECT statement writes to the output destination(s).
First let's convert your listing into an actual dataset.
data have;
input cc dob :mmddyy. enrolled :mmddyy.;
format dob enrolled date9.;
datalines;
1 10-13-1981 10-13-2001
2 10-17-1984 12-15-2004
3 07-20-1957 12-20-2007
4 10-13-1989 12-24-2010
5 10-13-1996 12-28-2013
6 10-14-1996 12-11-1999
7 10-15-1996 12-24-2010
8 10-16-1996 12-24-2010
9 10-17-1996 12-24-2010
10 10-18-1996 12-24-2010
;
Now let's run your SELECT statement with OUTOBS set to 1. Make sure to give it some criteria for deciding which observation to take when there are ties for the largest count.
proc sql outobs=1;
select year(enrolled) as yr
, count(*) as cust_enrolled
from have
group by yr
order by cust_enrolled desc, yr
;
quit;
Results:
cust_
yr enrolled
----------------------
2010 5
You can use data set options anywhere. SQL doesn't guarantee an order, so you will often want logic that's more complicated than simply taking the first row, but if that is what you want, the OBS=1 data set option is a decent choice.
proc sql;
select * from sashelp.class(obs=1);
quit;
If you want something besides the first, use FIRSTOBS and OBS together.
proc sql;
select * from sashelp.class(firstobs=10 obs=10);
quit;
I have a SAS dataset that looks something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks for your help, I'm in my second week of using SAS <3
Edit: thanks for reminding me that I was not looking for a sorting method.
Good day. The following should be what you are after. I did not find an easy way to rename the columns in the same step, as they are not in the beginning data, so the renaming is done in a separate cleanup step.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means every variable whose name begins with date; compare with SQL LIKE 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
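If the Datetype values should read birthday, 1 and 2 exactly as in the requested output, a plain data step with explicit OUTPUT statements is one possible alternative. This is only a sketch, assuming the begin dataset created above, where all three date columns are character; the output name want_long is hypothetical.

```sas
data want_long;
    set begin;
    length Date $8 Datetype $8;
    /* Write one output row per source column, labelling each row by hand */
    Date = birthday; Datetype = 'birthday'; output;
    Date = Date1;    Datetype = '1';        output;
    Date = Date2;    Datetype = '2';        output;
    keep id Date Datetype;
run;
```

With only three source columns the repetition is tolerable; for many columns, PROC TRANSPOSE scales better.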
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
quit;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data-named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed by using an additional data view, or an output Proc that lets you specify the desired order.
I have a dataset whose column A contains the following data:
Column A
--------
1
2
?
2
I used the query:
proc sql;
select
if A= '?' then A=., count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
This is not working. Please suggest how I can replace the ? with a standard value.
In SQL it's a CASE statement, not IF/THEN.
proc sql;
select
case when a='?' then .
else a end as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Or you could use an IFC() function as well.
proc sql;
select
ifc(a='?', ., a) as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Column A contains "?", so it is a character column. Reeza's code should therefore use then "" in the CASE expression, or ifc(a='?', "", a). Also, if you do not select the grouping variable as well, the context of N_obs is lost.
Suggest
data have;
input a $ nsp ;
datalines;
1 2
2 3
? 7
2 7
run;
proc sql;
select
nsp
, case when a='?' then '' else a end as a
, count(*) as nsp_count
from have
group by nsp
;
quit;
The query will also log the message "NOTE: The query requires remerging summary statistics back with the original data." because Proc SQL automatically remerges the group aggregates with the individual rows within each group.
I have one table with 4 columns and I want to split it into 2 tables: 2 columns in one table and 2 columns in the other. The two tables should appear one below the other. I want this done with PROC REPORT, so the code should be in report form.
id name age gender
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
I want id and name in one table and age and gender in the other table, one below the other in ODS Excel.
I've split the main table into two tables using a data step and then appended them to each other. I added an extra column called "source" in order to differentiate between the tables. If you use Proc REPORT, you can group by "source".
Code:
/*Create input data*/
data have;
input id name $ age gender $ ;
datalines;
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
;;;;
run;
/*Split / create first table*/
data table1;
set have;
source="table1: id & name";
keep source id name ;
run;
/*Split / create second table*/
data table2;
set have;
source="table2: age & gender";
keep source age gender;
run;
/*create Empty table*/
data want;
length Source $30. column1 8. column2 $10.;
run;
proc sql; delete from want; quit;
/* Append both tables to each other*/
proc append base= want data=table1(rename=(id=column1 name=column2)) force ; run;
proc append base= want data=table2(rename=(age=column1 gender=column2)) force ; run;
/*Create Report*/
proc report data= want;
col source column1 column2 ;
define source / group;
run;
For data
data have;input
id name $ age gender $; datalines;
1 abc 21 m
2 pqr 23 f
3 qwe 25 f
4 ert 54 m
run;
Being output as Excel, the splitting into two parts can be done via two Proc REPORT steps; each step responsible for a single set of columns. Options are used in the ODS EXCEL to control how sheet processing is handled.
The first step manages the common header through DEFINE, the subsequent steps are NOHEADER and don't need DEFINE statements. Each step must define and compute the value of the new source column. There will be a one Excel row gap between each table.
ods _all_ close;
ods excel file='want.xlsx' options(sheet_interval='NONE');
proc report data=have;
column source id name;
define id / 'Column 1';
define name / 'Column 2';
define source / format=$20.;
compute source / character length=20; source='ID and NAME'; endcomp;
run;
proc report data=have noheader;
column source age gender;
define source / format=$20.;
compute source / character length=20; source='AGE and GENDER'; endcomp;
run;
ods excel close;
There is no reasonable single Proc REPORT step that would produce similar output from dataset have.
I've the below dataset as input
ID
--
1
2
2
3
4
4
4
5
And need a new dataset as below
ID count of ID
-- -----------
1 1
2 2
3 1
4 3
5 1
Could you please tell me how to do this in SAS without using PROC SQL?
How about Proc Freq or Proc Summary? These avoid having to presort the data.
proc freq data=have noprint;
table id / out=want1 (drop=percent);
run;
proc summary data=have nway;
class id;
output out=want2 (drop=_type_);
run;
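PROC FREQ names its count column COUNT and PROC SUMMARY names it _FREQ_. If a specific column name is wanted, a RENAME= data set option on the output is one way to get it. A minimal sketch, where want3 and count_of_id are hypothetical names chosen for illustration:

```sas
proc summary data=have nway;
    class id;
    /* Drop _TYPE_ and give the automatic _FREQ_ column a descriptive name */
    output out=want3 (drop=_type_ rename=(_freq_=count_of_id));
run;
```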
proc sql noprint;
create table test as select distinct id, count(id) as count
from your_table
group by ID
order by ID
;
quit;
Try this:
DATA Have;
input id ;
datalines;
1
2
2
3
4
4
4
5
;
Proc Sort data=Have;
by ID;
run;
Data Want;
Set Have;
By ID;
If first.ID then Count=0;
Count+1;
If Last.ID then Output;
Run;
PROC SORT DATA=YOURS;
BY ID; RUN;
PROC MEANS DATA=YOURS NOPRINT;
VAR ID;
BY ID;
OUTPUT OUT=NEWDATASET N=COUNT; RUN;
You can also choose to keep only the ID and COUNT variables in NEWDATASET.
We can use simple PROC SQL count to do this:
proc sql;
create table want as
select id, count(id) as count_of_id
from have
group by id;
quit;
Here is yet another possibility, often known as a DoW loop. It relies on BY-group processing, so have must already be sorted by id:
Data want;
do count=1 by 1 until(last.ID);
set have;
by id;
end;
run;
If the aggregation you want to do is complex, go with PROC SQL, as most people are more familiar with GROUP BY in SQL.
proc sql ;
create table solution_1 as select distinct ID, count(ID) as count
from table_1
group by ID
order by ID
;
quit;
OR
If you are using SAS Enterprise Guide (EG), the Query Builder is very useful for small analyses.
Just drag and drop the columns you want to aggregate and, in the summary option, select the operation you want to perform, such as Avg, Count, Miss, NMiss, etc.