Is there any PERIOD datatype for SAS SQL?

I have the following table of people running a marathon:
person  start     end
mike    2-Jun-14  2-Aug-14
nike    3-Jul-14  9-Aug-14
mini    1-Aug-14  3-Nov-14
I want to know whether a person was "running the marathon" on the 1st of each month. The desired table should look like this:
person  running on
mike    1-Jul-14
mike    1-Aug-14
nike    1-Aug-14
mini    1-Aug-14
mini    1-Sep-14
... and so on for all other months
Is there any way to get this result in SAS PROC SQL? Other DBMSs (Teradata, Oracle) have this. What is the best approach in SAS?

I would use a DATA STEP to solve this in SAS.
/* Sample data: one row per person with start (from) and end (to) dates */
data have;
  input person $ from :anydtdte. to :anydtdte.;
  format from to date11.;
  datalines;
mike 2-Jun-14 2-Aug-14
nike 3-Jul-14 9-Aug-14
mini 1-Aug-14 3-Nov-14
;
run;
/* Walk through every day in the range and keep only the 1st of each month */
data want(keep=person running_on);
  set have;
  format Running_on date11.;
  do Running_on=from to to;
    if day(running_on) = 1 then
      output;
  end;
run;
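If the date ranges are long, checking every single day can be wasteful. As an alternative sketch (not part of the answer above), the loop can jump from one month start to the next with INTNX. It assumes the same HAVE dataset with numeric SAS dates in from and to; the output name want_monthly is made up for this example.

```sas
data want_monthly(keep=person running_on);
  set have;
  format running_on date11.;
  /* Start at the first day of the month that contains the start date */
  running_on = intnx('month', from, 0, 'b');
  /* If that day is before the start date, move to the next month start */
  if running_on < from then running_on = intnx('month', running_on, 1, 'b');
  /* Output every month start that falls inside the range */
  do while (running_on <= to);
    output;
    running_on = intnx('month', running_on, 1, 'b');
  end;
run;
```

For the sample rows this should give the same rows as the day-by-day loop, just with far fewer iterations per person.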

Related

Proc Append Duplication (SAS)

I am working on a SAS problem where I need to append data. The program runs successfully, but it creates duplicates every time I run it.
Please check my code and the screenshot of the table:
Question: Create a new file "Total_Sales" by appending data file "Hyundai" with the file first created in problem 3.
/*Problem 3*/:
data avik1.var1;
length uniqueid $50 Manufacturer $ 50 Model $20 Sales_in_thousands 8 _4_year_resale_value 8 Price_in_thousands 8;
retain uniqueid Manufacturer Model Latest_Launch Sales_in_thousands _4_year_resale_value Price_in_thousands;
set avik1.conc(drop= Vehicle_type Engine_size Horsepower Wheelbase Width Length Curb_weight Fuel_capacity Fuel_efficiency );
informat Latest_Launch date9.;
format Latest_Launch ddmmyy10.;
run;
proc print data = avik1.var1;
run;
/* Data To be Appended */
data avik1.hyundai;
length uniqueid $ 50 Manufacturer $ 50 Model $20 Sales_in_thousands 8 _4_year_resale_value 8;
informat Latest_Launch date7. ;
format Latest_Launch ddmmyy10.;
input Manufacturer $ Model $ Sales_in_thousands _4_year_resale_value Latest_Launch;
uniqueid=(Model||Manufacturer);
cards;
Hyundai Tuscon 16.919 16.36 2Feb12
Hyundai i45 39.384 19.875 3Jun11
Hyundai Verna 14.114 18.225 4Jan12
Hyundai Terracan 8.558 29.775 10Mar11
;
run;
Proc Print data = avik1.hyundai;
run;
Now I used the following code to append:
data avik1.total_sales;
set avik1.var1 avik1.hyundai;
proc append base=avik1.var1 new=avik1.hyundai force;
run;
proc print data= avik1.total_sales;
run;
The program runs but gives me duplicates, which you can check in the image.
[Screenshot: duplicates marked in yellow]
I am new to SAS and would really appreciate your help with this problem. Please also tell me why this is happening.
Thanks!
Did you run it twice? I'm guessing, but that could be the reason you see duplicates. I'll try to explain.
In your append code here, you are creating the new dataset total_sales by combining var1 and hyundai:
data avik1.total_sales;
set avik1.var1 avik1.hyundai;
In the code below, you are not creating a new dataset; you are expanding var1 by adding the records from hyundai.
proc append base=avik1.var1 new=avik1.hyundai force;
run;
If you run this proc append and then run the first data step again, you will get duplicates of all the hyundai records, because you are taking the EXPANDED var1 and re-adding the hyundai records to it.
So, to answer the original question: the proc append step is unnecessary. You already achieved the result with the data step alone.
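To make that concrete, here is a minimal sketch of two ways to build total_sales so that re-running the program does not pile up duplicates. It assumes var1 has first been rebuilt by re-running the Problem 3 step, so it no longer contains the hyundai rows appended earlier. Use one option or the other, not both.

```sas
/* Option 1: the DATA step alone rebuilds total_sales from scratch on every run */
data avik1.total_sales;
  set avik1.var1 avik1.hyundai;
run;

/* Option 2: copy var1 first, then append to the copy, so var1 is never modified */
data avik1.total_sales;
  set avik1.var1;
run;
proc append base=avik1.total_sales data=avik1.hyundai force;
run;
```

In both cases the base table var1 stays untouched, so running the program again gives the same total_sales.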

The Subsetting IF Statement with date in SAS

I have been working on a SAS problem in the University Edition where the task is:
Separate out the data only for passenger vehicle launched after 1-October-2014;
data passenger;
set avik1.clean;
informat Latest_Launch ddmmyy10.;
if Vehicle_type = "Passenger" and Latest_Launch > "01-10-2014";
run;
proc print data=passenger;
run;
I am able to separate out the passenger vehicles, but the date condition has no effect: it doesn't restrict the data to launches after 01/10/2014.
I ran PROC CONTENTS in case you would like to have a look at my data attributes.
[Image: PROC CONTENTS print output]
I am new to SAS and I run into issues whenever dates are involved.
In SAS, date constants are written as 'DDMONYYYY'd: the date in DATE9. style, in quotes, followed by the letter D.
For you, that is '01OCT2014'd.
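Applied to the step from the question, that would look roughly like the sketch below. It assumes Latest_Launch is stored as a numeric SAS date (the PROC CONTENTS output is not reproduced here, so this is an assumption). If so, the original comparison against the string "01-10-2014" most likely failed because SAS could not convert that string to a number and compared against a missing value instead.

```sas
data passenger;
  set avik1.clean;
  /* Date constant: quoted DDMONYYYY followed by d */
  if Vehicle_type = "Passenger" and Latest_Launch > '01OCT2014'd;
run;

proc print data=passenger;
run;
```

The INFORMAT statement from the original step is left out because an informat only matters when reading raw data, not when subsetting an existing dataset.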

How do I find the percentage of people in a group in SAS?

DATA test;
INPUT name$ group_no$;
CARDS;
John 1
Michelle 1
Peter 1
Kai 2
Peter 2
Liam 2
Claire 2
Sam 3
Jim 3
run;
How do I find the percentage of people within each group, i.e. 33.3% in group 1, 44.4% in group 2, etc.?
I tried the code below, but it did not answer my question. I believe I may have to use SQL:
Proc FREQ data = test;
TABLE group_no;
BY group_no;
RUN;
Please let me know how to solve the issue.
Proc FREQ data = test;
TABLE group_no;
RUN;
PROC FREQ without the BY statement, as shown above, is the way to go. In SQL you can do it as shown below.
proc sql;
  select group_no,
         count(group_no) *100/(select count(*) from test) as percentage format= 5.2
  from test
  group by group_no
  ;
quit;

How to work across two datasets in SAS

I have two datasets described below
data1:
$restaurant $reviewers
A Tom
B Jack.Mary.Joan
C Tom.Joan
D Rose
data2 (sorted by number of friends):
$user $friends
Tom Joan.Mary.Jack
Jack Tom.Rose
Mary Tom
Joan Tom
The task is to calculate, for each user, the overlap between that user's reviews and the reviews of their friends.
Take Tom as an example: the restaurants Tom's friends reviewed are B and C, of which C was also reviewed by Tom. So the percentage is C/(B+C) = 1/2, i.e. the overlap is 50%.
I think I need a loop that works across the two datasets, but with only very basic knowledge of SAS, I don't know how. Does anybody have an idea?
Thank you very much.
You should try something like this.
data reviews;
infile datalines dsd dlm=",";
input restaurant $ reviewer $;
datalines;
A,Tom
B,Jack
B,Mary
B,Joan
C,Tom
C,Joan
D,Rose
;
run;
data users;
infile datalines dsd dlm=",";
input user $ friend $;
datalines;
Tom,Joan
Tom,Mary
Tom,Jack
Jack,Tom
Jack,Rose
Mary,Tom
Joan,Tom
;
run;
proc sql;
  create table want as
  select t1.user
        ,sum(case when t3.restaurant=t2.restaurant then 1 else 0 end)/count(*) as percentage
  from users t1
  inner join reviews t2
    on t1.user=t2.reviewer
  inner join reviews t3
    on t1.friend=t3.reviewer
  group by t1.user
  ;
quit;
I didn't get your 0.5 value for Tom, but maybe you made a mistake.
You can adapt the code as needed.
I followed the logic from here:
How to check percentage overlap in SAS
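The answer above retypes the data in long form (one row per restaurant and reviewer). If the data really arrives in the dot-separated layout from the question, a sketch like the following could do that reshaping. The dataset name data1 and the variable name reviewers follow the question; the lengths $50 and $8 are assumptions.

```sas
data data1;
  infile datalines truncover;
  input restaurant $ reviewers :$50.;
  datalines;
A Tom
B Jack.Mary.Joan
C Tom.Joan
D Rose
;
run;

/* One output row per name in the dot-separated list */
data reviews(keep=restaurant reviewer);
  set data1;
  length reviewer $ 8;
  do i = 1 to countw(reviewers, '.');
    reviewer = scan(reviewers, i, '.');
    output;
  end;
run;
```

The same COUNTW/SCAN loop can be applied to data2 to get one row per user and friend.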

SAS: PROC MEANS Grouping in Class Variable

I have the following sample data and 'proc means' command.
data have;
input measure country $;
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS step prints a line for each country above. How can I group some of those countries (e.g. UK and Ireland together, Slovakia and Slovenia as Central Europe) within the PROC MEANS step itself, rather than adding another data step with a 'case when' or similar?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS step itself, such as limiting the countries like this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')));
I'd like to do the grouping in the PROC MEANS step as well, for brevity.
Thanks.
This is very easy with a format, and it works for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data, then apply it with a FORMAT statement in the PROC MEANS step.
proc format lib=work;
  value $countrygroup
    "UK"="British Isles"
    "Ireland"="British Isles"
    "Slovakia","Slovenia"="Central Europe"
  ;
quit;
proc means data=have;
  class country;
  var measure;
  format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to whichever set of names is needed at any one time, particularly as capitalization and the like can be pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT lets you build a format from a dataset, with FMTNAME holding the format name (what you would put in the VALUE statement), START the value to be labelled, and LABEL the label (plus END for the end of a range, if numeric). There are other options as well; the documentation goes into more detail.
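As a rough sketch of the CNTLIN= route, the same grouping could be built from a dataset like this. The dataset name country_fmt is made up for the example, and in practice the rows would usually come from an existing lookup table rather than datalines.

```sas
data country_fmt;
  length fmtname $ 32 type $ 1 start $ 8 label $ 20;
  /* Same format name for every row; TYPE 'C' marks a character format */
  retain fmtname 'countrygroup' type 'C';
  /* The & modifier lets LABEL contain single embedded blanks */
  input start $ label $ &;
  datalines;
UK British Isles
Ireland British Isles
Slovakia Central Europe
Slovenia Central Europe
;
run;

proc format lib=work cntlin=country_fmt;
run;
```

After this, format country $countrygroup.; in the PROC MEANS step should behave the same as with the hand-written VALUE statement.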