Select specific rows by row number in SAS

I am new to SAS. I have a SAS dataset like the one below (the dataset itself does not contain the Obs column):
Obs ID Name Score1 Score2 Score3
1 101 90 95 98
2 203 78 77 75
3 223 88 67 75
4 280 68 87 75
.
.
.
.
100 468 78 77 75
I want the rows with row numbers 2, 6, 8, 10 and 34. The output should look like this:
Obs ID Name Score1 Score2 Score3
1 203 78 77 75
2 227 88 67 75
3 280 68 87 75
.
.
.
Thanks in advance.

The other answer is ok for small tables, but if you are working with a very large table it's inefficient as it reads every row in the table to see whether it has the right row number. Here's a more direct approach:
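data example;
  /* POINT= reads each requested observation directly by its number */
  do i = 2, 6, 8, 10;
    set sashelp.class point = i;
    output;
  end;
  /* STOP is needed because a step using POINT= never reaches end-of-file on its own */
  stop;
run;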
This picks out just the rows you actually want and doesn't read all the others.
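One caveat with POINT= is that a row number beyond the end of the table is treated as invalid (SAS sets _ERROR_ to 1). A minimal sketch, assuming you want to request row 34 from the question as well and simply skip any row number the table does not have, uses the NOBS= option; the variable name n is my own choice:

```sas
data example;
  do i = 2, 6, 8, 10, 34;
    /* n is filled in from NOBS= at compile time, so it is usable before SET runs */
    if i <= n then do;
      set sashelp.class point = i nobs = n;
      output;
    end;
  end;
  stop;
run;
```

Neither i nor n is written to the output dataset, because variables named in POINT= and NOBS= are temporary.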

A data step reads the table one row per iteration, so you can keep only the rows you want with a condition like this:
data test;
  set LIB.TABLE;
  if _N_ in (2, 6, 8, 10, 34) then output;
run;
Here _N_ is the data step's iteration counter, which in this simple case equals the row number.

Get frequency from dataset with repeated measurements over time

This is my problem: I have a dataset with 10 repeated measurements per ID over time, something like this:
ID Expenditure Age
25 100 89
25 102 89
25 178 89
25 290 89
25 200 89
.
.
.
26 100 79
26 102 79
26 178 79
26 290 79
26 200 79
.
.
.
27 100 80
27 102 80
27 178 80
27 290 80
27 200 80
.
.
.
Now I want to obtain the frequency of age, so I did this:
proc freq data=Expenditure;
table Age / out= Age_freq outexpect sparse;
run;
Output:
Age Frequency Count Percent of total frequency
79 10 0.1
80 140 1.4
89 50 0.5
The problem is that this counts all rows and doesn't take into account the repeated measurements per ID. So I wanted to create a new column with the actual frequencies, like this:
data Age;
set Age_freq;
freq = Frequency Count /10;
run;
but I think SAS doesn't recognize this 'Frequency Count' variable. Can anybody give me some insight on this?
thanks
You have to remove the duplicate records so that each ID has a single record containing the age.
Solution: create a new table with the distinct values of ID and Age, then run proc freq on it.
Code:
I created a new table called Expenditure_ids that has no duplicate ID and Age combinations.
data Expenditure;
input ID Expenditure Age ;
datalines;
25 100 89
25 102 89
25 178 89
25 290 89
25 200 89
26 100 79
26 102 79
26 178 79
26 290 79
26 200 79
27 100 80
27 102 80
27 178 80
27 290 80
27 200 80
28 100 80
28 102 80
28 178 80
28 290 80
28 200 80
;
run;
proc sql;
create table Expenditure_ids as
select distinct ID, Age from Expenditure ;
quit;
proc freq data=Expenditure_ids;
table Age / out= Age_freq outexpect sparse;
run;
Output:
Age=79 COUNT=1 PERCENT=25
Age=80 COUNT=2 PERCENT=50
Age=89 COUNT=1 PERCENT=25
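As the output above shows, the count variable in the OUT= dataset is named COUNT ('Frequency Count' is only its label), which is why the original data step could not find it. If you would rather skip the intermediate table, a minimal sketch that counts distinct IDs per age in one query could look like this; the names Age_freq_by_id and n_ids are hypothetical:

```sas
proc sql;
  /* one row per Age, counting each ID once however many measurements it has */
  create table Age_freq_by_id as
  select Age,
         count(distinct ID) as n_ids
  from Expenditure
  group by Age;
quit;
```

This assumes, as in the example data, that an ID always carries the same Age across its repeated measurements.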

How to extract the next 12 months of data from a master table for each ID in a sample table, based on yearmonth and ID, using SAS

I am currently practicing SAS programming using two SAS datasets (sample and master). Below is dummy data created to illustrate my problem. I would like to extract the data for the IDs in the sample dataset from the master dataset (test). I have given an example with a few IDs as the sample dataset; for each of those IDs I need to extract the next 12 months of information from the master table (test), based on yearmonth (the desired output is given in the third listing below).
Below is the code that extracts the previous 12 months of data, but I can't work out how to extract the next 12 months in the same way. Can anyone help me solve this efficiently in SAS?
proc sort data=test;
by id yearmonth;
run;
data result;
set test;
array prev_month {13} PREV_MONTH_0-PREV_MONTH_12;
by id;
if first.id then do;
do i =1 to 13;
prev_month(i)=0;
end;
end;
do i = 13 to 2 by -1;
prev_month(i)=prev_month(i-1);
end;
prev_month(1)=no_of_cust;
drop i prev_month_0;
retain prev_month:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
One sample dataset (dataset name - sample).
ID YEARMONTH NO_OF_CUST
1 200909 50
1 201005 65
1 201008 78
1 201106 95
2 200901 65
2 200902 45
2 200903 69
2 201005 14
2 201006 26
2 201007 98
One master dataset (dataset name: test), a huge dataset covering each id from the start of the account to date.
ID YEARMONTH NO_OF_CUST
1 200808 125
1 200809 125
1 200810 111
1 200811 174
1 200812 98
1 200901 45
1 200902 74
1 200903 73
1 200904 101
1 200905 164
1 200906 104
1 200907 22
1 200908 35
1 200909 50
1 200910 77
1 200911 86
1 200912 95
1 201001 95
1 201002 87
1 201003 79
1 201004 71
1 201005 65
1 201006 66
1 201007 66
1 201008 78
1 201009 88
1 201010 54
1 201011 45
1 201012 100
1 201101 136
1 201102 111
1 201103 17
1 201104 77
1 201105 111
1 201106 95
1 201107 79
1 201108 777
1 201109 758
1 201110 32
1 201111 15
1 201112 22
2 200711 150
2 200712 150
2 200801 44
2 200802 385
2 200803 65
2 200804 66
2 200805 200
2 200806 333
2 200807 285
2 200808 265
2 200809 222
2 200810 220
2 200811 205
2 200812 185
2 200901 65
2 200902 45
2 200903 69
2 200904 546
2 200905 21
2 200906 256
2 200907 214
2 200908 14
2 200909 44
2 200910 65
2 200911 88
2 200912 79
2 201001 65
2 201002 45
2 201003 69
2 201004 54
2 201005 14
2 201006 26
2 201007 98
The desired output should look like this:
ID YEARMONTH NO_OF_CUST AFTER_MONTH_1 AFTER_MONTH_2 AFTER_MONTH_3 AFTER_MONTH_4 AFTER_MONTH_5 AFTER_MONTH_6 AFTER_MONTH_7 AFTER_MONTH_8 AFTER_MONTH_9 AFTER_MONTH_10 AFTER_MONTH_11 AFTER_MONTH_12
1 200909 50 77 86 95 95 87 79 71 65 66 66 78 88
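Step 1: Join your sample table to the master (test) table, using intnx to pick up all the values for the next 12 months. intnx works on SAS date values, so this assumes yearmonth is stored as a date; a plain number such as 200909 would first need converting, for example with input(put(yearmonth, 6.), yymmn6.).
Step 2: Build a column holding the names "after_month_0", "after_month_1", and so on.
Step 3: Transpose to get your final output.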
proc sql;
create table abc as
select a.id,a.yearmonth,b.yearmonth as yearmonth1, b.no_of_cust
from
sample a
left join
test b
on a.id = b.id and a.yearmonth <= b.yearmonth <= intnx("month",a.yearmonth,12)
order by a.id,a.yearmonth,b.yearmonth;
quit;
data abc1(drop=col yearmonth1);
set abc;
by id yearmonth;
if first.yearmonth then col=-1;
col+1;
columns = compress("after_month_"||col);
run;
proc transpose data=abc1 out=abc2(rename=(after_month_0 = no_of_cust) drop=_name_);
by id yearmonth;
id columns;
var no_of_cust;
run;
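The join above passes yearmonth straight to intnx, which expects SAS date values. If yearmonth is instead stored as a plain number such as 200909 (an assumption; the question does not say), a sketch of the same join that converts both sides to dates first might look like this:

```sas
proc sql;
  create table abc as
  select a.id, a.yearmonth, b.yearmonth as yearmonth1, b.no_of_cust
  from sample a
  left join test b
    on a.id = b.id
   /* convert 200909-style numbers to dates, then keep months 0 to 12 ahead */
   and intck('month',
             input(put(a.yearmonth, 6.), yymmn6.),
             input(put(b.yearmonth, 6.), yymmn6.)) between 0 and 12
  order by a.id, a.yearmonth, b.yearmonth;
quit;
```

The remaining data step and transpose work unchanged on this table, since it has the same columns as the original abc.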
Alternatively, if you would rather adapt your own query, you could use the code below.
proc sort data=test;
by id descending yearmonth;
run;
data result;
set test;
array after_month {13} after_MONTH_0-after_MONTH_12;
by id;
if first.id then do;
do i = 1 to 13;
after_month(i) = 0;
end;
end;
do i = 13 to 2 by -1;
after_month(i) = after_month(i-1);
end;
after_month(1) = NO_OF_CUST;
drop i after_MONTH_0;
retain after_MONTH:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=result;
by id yearmonth;
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
Let me know in case of any queries.

plotting a vertical line exactly in the middle and at specific value of X in SAS

For the x, y data below, how can I plot a vertical line exactly in the middle of the plot? (Please note that X should not be sorted; the plot should stay as it is.)
Also, how can I plot a vertical line at x=5 (a distance along X from X=0) when X in the data below is taken as 0, 1, 2, 3 and so forth?
data sample;
infile cards truncover expandtabs;
input X Y;
cards;
29 21
18 23
28 24
16 26
3 27
18 29
2 33
3 37
26 39
2 42
25 47
9 54
13 57
17 58
29 60
5 63
23 66
4 69
3 72
17 73
7 73
12 72
8 69
20 66
12 63
8 60
28 58
3 57
18 54
11 47
21 42
8 39
1 37
16 29
3 27
17 22
3 19
6 17
19 14
18 10
;
run;
I tried:
proc sort data=sample ;
by x;
run;
proc sgplot data=sample;
needle x=x y=y;
run;
data Trapezoidal;
set sample end=last;
dif_x=dif(x);
mean_y=mean(lag(y),y);
integral + (dif_x*mean_y);
if last then putlog 'area under curve is ' integral;
run;
Vertical lines are plotted with refline. Determine what the 'middle' is however you wish (using proc means or proc sql or similar), get it into a variable in your dataset or a macro variable, and use refline in proc sgplot to produce the line.
The same applies to your specific values of x (except that there is nothing to compute first). Add refline 5 / axis=x; or similar to your plots to get them.
You could also use band plots if you're trying to highlight certain areas.
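A minimal sketch of the refline approach, assuming that 'middle' means the midpoint of the observed X range (any other definition can be computed the same way) and using a hypothetical macro variable name mid_x:

```sas
/* store the midpoint of the X range in a macro variable */
proc sql noprint;
  select (min(x) + max(x)) / 2 into :mid_x trimmed
  from sample;
quit;

proc sgplot data=sample;
  needle x=x y=y;
  /* AXIS=X makes the reference lines vertical */
  refline &mid_x / axis=x lineattrs=(color=red);
  refline 5 / axis=x lineattrs=(pattern=dash);
run;
```

The line colors and patterns are only there to tell the two lines apart.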

Use ODS Graphics to produce grouped histogram

I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried the following code, but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
Sounds like you should use SGPANEL, not SGPLOT. SGPLOT can make grouped bar charts, but it won't bin the values automatically unless you apply a format (you could do that if you want), and its histogram plot doesn't support group. However, SGPANEL can handle that.
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;
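If you do want to stay with SGPLOT, the format route mentioned above could be sketched as follows. The format name wbcbin and the bin boundaries are hypothetical, chosen only for illustration, and groupdisplay=cluster needs a SAS release that supports it:

```sas
/* hypothetical bins for wbc; adjust the boundaries to suit the data */
proc format;
  value wbcbin
    low -< 50  = 'Under 50'
    50  -< 100 = '50 to 99'
    100 - high = '100 and over';
run;

proc sgplot data=a1q1;
  format wbc wbcbin.;
  /* cluster places the gender bars side by side instead of stacking them */
  vbar wbc / group=gender groupdisplay=cluster stat=freq;
run;
```

VBAR treats each formatted value as one category, so the format does the binning that a histogram would otherwise do.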

Selecting groups of records using FIRST.var

My dataset is:
ID AGE
1 65
1 66
1 67
1 68
1 69
1 70
1 71
2 70
2 71
2 72
3 68
3 69
3 70
[...]
My (basic) question is: what is the most direct way to obtain a dataset containing the IDs whose records start with 65 <= AGE <= 68? (In the above example I would like to get the first 7 rows and the last three.) Thanks!
Just to have another method...
proc sql;
delete from input_dataset I where not exists
(select 1 from input_dataset D where I.id=D.id having 65 le min(age) le 68);
quit;
If you want to create a new dataset, the same basic query would work as part of a SELECT, reversing the NOT.
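A sketch of that SELECT form, written here with GROUP BY and IN rather than a correlated EXISTS. Like the DELETE above, it treats the minimum age of an ID as its starting age, and the output name want is hypothetical:

```sas
proc sql;
  create table want as
  select *
  from input_dataset
  where id in
    /* IDs whose lowest (starting) age falls between 65 and 68 */
    (select id
     from input_dataset
     group by id
     having min(age) between 65 and 68)
  order by id, age;
quit;
```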
data input_dataset;
input ID AGE;
cards;
1 65
1 66
1 67
1 68
1 69
1 70
1 71
2 70
2 71
2 72
;
run;
proc sort data=input_dataset out=sorted;
by ID;
run;
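data work.first_age65to68;
  set sorted;
  retain keepit 0;
  by ID;
  /* decide once per ID, from its first record, whether to keep the group */
  if first.ID then do;
    if AGE ge 65 and AGE le 68 then keepit=1;
    else keepit=0;
  end;
  /* subsetting IF: keeps every row of a qualifying ID */
  if keepit;
  drop keepit;
run;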