compute variable after datalines - sas

I have the following dataset (fictional data).
DATA test;
INPUT name $ age height weight;
DATALINES;
Peter 20 1.70 80
Hans 30 1.72 75
Tina 25 1.67 65
Luisa 10 1.20 50
;
RUN;
How can I compute a new variable "bmi" (weight / height^2) directly after the end of the DATALINES statement? Unfortunately, all the examples in my SAS book use DATA ... INFILE= instead of DATALINES.
PROC PRINT
DATA = test;
TITLE 'Fictional Data';
RUN;

DATALINES must come at the end of the data step. Place your computation statements before DATALINES, after the INPUT statement:
INPUT name $ age height weight;
bmi = weight / height**2;
DATALINES;
…
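Put together with the data from the question, the complete step might look like the sketch below. The FORMAT statement in PROC PRINT is an addition for readability and is not part of the original answer.

```sas
data test;
    input name $ age height weight;
    /* Runs once per record read by INPUT, before the row is written out */
    bmi = weight / height**2;
    datalines;
Peter 20 1.70 80
Hans 30 1.72 75
Tina 25 1.67 65
Luisa 10 1.20 50
;
run;

proc print data=test;
    title 'Fictional Data';
    format bmi 5.1;   /* assumption: one decimal place is enough for display */
run;
```

The assignment is compiled as part of the data step and executes for every input line, so its position relative to DATALINES only matters syntactically.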

Related

Complete columns based on values that precede data as table format in SAS

How should the code be completed to make this work?
Code:
data ms;
infile 'C';
input cr ls ms color $;
if input @; *statement that reads the line with one word and completes the color column;
run;
Input:
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
Expected output:
cr  ls       ms      color
10  83287    10.00   Blars
20  1748956  30.00   Blars
30  2222222  50.00   Blars
40  833709   73.00   Blars
10  922222   90.50   Klirs
20  1222222  10.00   Klirs
30  1111111  93.33   Klirs
40  8998877  300.90  Klirs
Attempted to read it
Just RETAIN the extra variable. You need some way to detect which type of line you currently are reading. When it has the COLOR just update the COLOR variable and do not write out an observation. When it has the actual data then read all of the fields and write an observation.
data ms;
infile 'C' truncover ;
length color $10 cr ls ms 8;
retain color;
input cr ?? @ ;
if missing(cr) then do;
color = _infile_;
delete;
end;
input ls ms ;
run;
Make sure to define the COLOR column long enough to store the longest value. This assumes there are no blank lines, as you mentioned in your comment on the original question.
Slightly different method than other solution.
Use INPUT @@ to read the full line and hold it in the automatic variable _infile_.
Check _infile_ variable to see if it contains any numeric values, if so, process as data.
Otherwise, process as a colour.
data have;
infile cards truncover;
*set length and retain color across rows;
length color $10 cr ls ms 8;
retain color;
*read in string;
input @@;
*check for any digits in string, if any are found, process as data;
if anydigit(_infile_) then do;
input cr ls ms;
output;
end;
*otherwise read in as color;
else input color $;
cards;
Blars
10 83287 10.00
20 1748956 30.00
30 2222222 73.00
40 833709 90.00
Klirs
10 922222 90.50
20 1222222 10.00
30 1111111 93.33
40 8998877 300.90
;;;;
run;
Richard, your code could even be more succinct.
* attempt to read first 2 chars as number;
* ?? suppresses errors;
input num ?? 2. @;
if missing(num) then
input @1 color $;
else do;
input @1 cr ls ms;
output;
end;
You can scan a held generic input line and then choose which input statement you want based on the scan.
data want;
length color $20 cr ls ms 8;
retain color;
infile 'c' missover;
input @;
if missing(input(scan(_infile_,1),??best12.)) then
input @1 color ;
else
input @1 cr ls ms ;
if not missing(cr);
run;
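To see how the numeric test in that last step behaves on its own, here is a small sketch that applies it to a few literal tokens. The variable names token and value are illustrative only; the ?? modifier suppresses the invalid-data notes that INPUT() would otherwise write to the log.

```sas
data _null_;
    length token $12;
    /* Loop over a few sample tokens: one word, two numbers */
    do token = 'Blars', '10', '83287';
        /* ?? keeps the log quiet and leaves value missing on failure */
        value = input(token, ?? best12.);
        if missing(value) then put token= 'is not numeric';
        else put token= value=;
    end;
run;
```

A token that cannot be read as a number yields a missing value, which is what the answers above use to tell a color line from a data line.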

How to combine dated data rows in SAS?

I have longitudinal data, but I wish to combine rows if the value of one variable is the same, and update the time variable so that the start and finish time reflects the combined time period. At the end of this only the combined rows and unique rows are kept.
Here is an example
Data have:
Person  Start     Finish      Weight
A       1/1/1988  31/12/1988  78
A       1/1/1989  31/12/1989  78
A       1/1/1990  31/12/1990  78
A       1/1/1991  31/12/1991  81
A       1/1/1992  31/12/1992  82
A       1/1/1993  31/12/1993  82
B       1/1/1968  31/12/1968  56
B       1/1/1969  31/12/1969  55
B       1/1/1970  31/12/1970  55
Data want:
Person  Start     Finish      Weight
A       1/1/1988  31/12/1990  78
A       1/1/1991  31/12/1991  81
A       1/1/1992  31/12/1993  82
B       1/1/1968  31/12/1968  56
B       1/1/1969  31/12/1970  55
What would be the best way of doing this? Thank you all for your time!
The code below will produce your required results from the sample data. It relies on the data being correctly sorted by start and finish.
data want (keep=person start finish weight);
set have (rename= (start=original_start finish=original_finish)); * rename so that the names do not clash with the final variable names;
by person weight notsorted; * assumes that the data are sorted by START and FINISH dates;
retain start; * remembers this variable WHILE we read multiple rows;
format start finish ddmmyy10.;
if first.weight then start=original_start; * record the first START date of each combination of PERSON and WEIGHT;
if last.weight then do;
finish=original_finish; * record the last FINISH date;
output; * only output when we have read the last row for this combination of PERSON and WEIGHT;
end;
run;
If your real data has complications like overlapping time periods then you could use proc summary to get the result:
* use PROC SUMMARY to calculate minimum and maximum for START and FINISH. Only keep the minimum for START and the maximum for FINISH;
proc summary data=have nway;
class person weight;
var start finish;
output out=want2 (drop=_type_ _freq_ min_finish max_start) min=start min_finish max=max_start finish;
run;
Or, if you need to keep the rows in their original order, replace class person weight; with by person weight notsorted;. Note that this gives wrong results if the rows with the same person and weight values are not all adjacent in the dataset.
proc summary data=have nway;
by person weight notsorted;
var start finish;
output out=want2 (drop=_type_ _freq_ min_finish max_start) min=start min_finish max=max_start finish;
run;
Suppose the data values for a person contain a time ordered pattern such as
X X X Y X X
and you want 3 rows, 1 for each contiguous period of same valued data values.
You can use DOW processing to compute the start and finish of each contiguous group.
Example:
data have;
input Person $ Start :ddmmyy10. Finish :ddmmyy10. Weight; * colon modifier: read each date up to the next blank rather than a fixed 10 columns;
format start finish ddmmyy10.;
datalines;
A 1/1/1988 31/12/1988 78
A 1/1/1989 31/12/1989 78
A 1/1/1990 31/12/1990 78
A 1/1/1991 31/12/1991 81
A 1/1/1992 31/12/1992 82
A 1/1/1993 31/12/1993 82
B 1/1/1968 31/12/1968 56
B 1/1/1969 31/12/1969 55
B 1/1/1970 31/12/1970 55
;
proc sort data=have;
by person start;
run;
data want(keep=person start finish weight);
do until (last.weight);
set have;
by person weight notsorted;
if first.weight then gstart=start;
end;
start = gstart;
run;
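The X X X Y X X pattern mentioned above can be tried directly with the same DOW loop. This is a minimal sketch on made-up data; the dataset and variable names (pattern, runs, seq, value, first_seq, last_seq) are hypothetical and not from the question.

```sas
data pattern;
    input person $ seq value $;
    datalines;
A 1 X
A 2 X
A 3 X
A 4 Y
A 5 X
A 6 X
;
run;

data runs(keep=person value first_seq last_seq);
    /* One pass of the loop consumes one contiguous run of equal values */
    do until (last.value);
        set pattern;
        by person value notsorted;
        if first.value then first_seq = seq;
    end;
    /* After the loop, seq still holds the last row of the run */
    last_seq = seq;
    /* Implicit OUTPUT here writes one row per run */
run;
```

Because NOTSORTED makes first. and last. fire on every change of value, the second block of X rows becomes its own group instead of being merged with the first.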

Proc Transpose With multiple ID values per Group

In this first data-set each employee has one team lead and one supervisor. I can transpose that no problem.
data a;
input employee_id ReportsTo $ ReportsToType $12.;
cards;
100 Jane Supervisor
100 Mark Team_lead
101 Max Supervisor
101 Marie Team_lead
102 Sarah Supervisor
102 Sam Team_lead
;
run;
proc transpose data = a
out = aTP(drop = _:);
by employee_id;
id ReportsToType;
var ReportsTo;
run;
/* Output */
/*employee_id Supervisor Team_lead */
/*100 Jane Mark */
/*101 Max Marie */
/*102 Sarah Sam */
Now, what if an employee can have anywhere from 1 to 3 team leads?
data b;
input employee_id ReportsTo :$12. ReportsToType $12.; /* :$12. keeps 9-character names such as Satyendra from being truncated to 8 */
cards;
100 Jane Supervisor
100 Mark Team_lead
100 Jamie Team_lead
101 Max Supervisor
101 Marie Team_lead
101 Satyendra Team_lead
101 Usha Team_lead
102 Sarah Supervisor
102 Sam Team_lead
;
run;
/* Desired Output */
/*employee_id Supervisor Team_lead1 Team_lead2 Team_lead3 */
/*100 Jane Mark Jamie */
/*101 Max Marie Satyendra Usha */
/*102 Sarah Sam */
Using proc transpose gives an error telling me I can't have more than one identical ID variable in each group. Is there a procedure for transposing which does allow this?
ERROR: The ID value "Team_lead" occurs twice in the same BY group
You need to change your input data so that, rather than repeating the value Team_lead, it holds an incrementing value: Team_lead1, Team_lead2, and so on.
You can use by-group processing and the retain statement to achieve this:
proc sort data=b;
by employee_id reportstotype;
run;
data want;
set b;
by employee_id reportstotype;
retain cnt .;
if first.reportstotype then do;
cnt = 1;
end;
if upcase(reportsToType) eq 'TEAM_LEAD' then do;
reportsToType = cats(reportsToType,cnt);
end;
cnt = cnt + 1;
run;
Then simply call proc transpose like you did beforehand:
proc transpose data=want out=trans;
by employee_id;
id reportsToType;
var reportsTo;
run;

matching two datasets with one month lag

I am trying to match max daily data within a month to a monthly data.
data daily;
input permno $ date ret;
datalines;
1000 19860101 88
1000 19860102 90
1000 19860201 70
1000 19860202 55
1001 19860201 97
1001 19860202 74
1001 19860203 79
1002 19860301 55
1002 19860302 100
1002 19860301 10
;
run;
data monthly;
input permno $ date ret;
datalines;
1000 19860131 1
1000 19860228 2
1000 19860331 5
1001 19860331 3
1002 19860430 4
;
run;
The result I want is the following (each month's maximum daily value matched to the monthly row one month later):
1000 19860102 90 1000 19860228 2
1000 19860201 70 1000 19860331 5
1001 19860201 97 1001 19860331 3
1002 19860302 100 1002 19860430 4
Below is what I have tried so far.
I want the maximum ret value within each month, so I created yrmon to give all daily rows of the same month the same yyyymm value:
data a1; set daily;
yrmon=year(date)*100 + month(date);
run;
To pick the maximum ret within each yrmon group for the same permno, I used the code below:
proc means data=a1 noprint;
class permno yrmon ;
var ret;
output out= a2 max=maxret;
run;
However, this only gave me permno, yrmon, and the maximum ret; the original date was dropped. I then tried to shift yrmon forward by one month:
data a3;
set a2;
new=intnx('month',yrmon,1);
format date new yymmn6.;
run;
But this does not work, since yrmon is a plain number and no longer a SAS date value.
Thank you in advance.
Hello
I am trying to match two different datasets by permno (same company) but with a one-month lag (e.g. daily9 dataset yrmon=198601 and monthly2 dataset yrmon=198602).
This is difficult for me to handle: if I simply add 1 to yrmon, 198612 + 1 does not become 198701, and I am not sure how to deal with this.
Can anyone help?
1) informat date1/date2 yymmn6. reads the dates in yyyymm form as SAS date values
2) format date1/date2 yymmn6. displays the dates in yyyymm form
3) intnx("months",b.date2,-1) shifts date2 back one month, so the join matches rows with a one-month lag
data data1;
input date1 value1;
informat date1 yymmn6.;
format date1 yymmn6.;
cards;
200101 200
200212 300
200211 400
;
run;
data data2;
input date2 value2;
informat date2 yymmn6.;
format date2 yymmn6.;
cards;
200101 3000000
200102 4000000
200301 2000000
200212 2000000
;
run;
proc sql;
create table result as
select a.*,b.date2,b.value2 from
data1 a
left join
data2 b
on a.date1 = intnx("months",b.date2,-1);
quit;
My Output:
date1 |value1 |date2 |value2
200101 |200 |200102 |4000000
200211 |400 |200212 |2000000
200212 |300 |200301 |2000000
Let me know in case of any queries.
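Applied to the daily and monthly data from the question, the same INTNX idea might look like the sketch below. It is one possible arrangement, not the answerer's code: it assumes the dates are read as real SAS dates with the yymmdd8. informat, and the names month_start, daily_max, and matched are made up for illustration.

```sas
data daily;
    input permno $ date :yymmdd8. ret;
    /* First day of the row's month; used as the grouping key */
    month_start = intnx('month', date, 0, 'b');
    format date month_start yymmddn8.;
    datalines;
1000 19860101 88
1000 19860102 90
1000 19860201 70
1000 19860202 55
1001 19860201 97
1001 19860202 74
1001 19860203 79
1002 19860301 55
1002 19860302 100
1002 19860301 10
;
run;

data monthly;
    input permno $ date :yymmdd8. ret;
    format date yymmddn8.;
    datalines;
1000 19860131 1
1000 19860228 2
1000 19860331 5
1001 19860331 3
1002 19860430 4
;
run;

proc sql;
    /* Keep the daily row(s) holding the maximum ret per permno and month.
       PROC SQL remerges the MAX() back onto the detail rows (log note). */
    create table daily_max as
    select permno, date, ret, month_start
    from daily
    group by permno, month_start
    having ret = max(ret);

    /* Match each daily maximum to the monthly row of the following month */
    create table matched as
    select d.permno, d.date as daily_date, d.ret as max_ret,
           m.date as monthly_date, m.ret as monthly_ret
    from daily_max as d
    inner join monthly as m
      on  d.permno = m.permno
      and intnx('month', d.month_start, 1, 'b') = intnx('month', m.date, 0, 'b')
    order by d.permno, d.date;
quit;
```

Because INTNX works on date values, December rolls over to January of the next year without special handling. If two days in a month tie for the maximum, both rows are kept; an inner join also drops daily maxima that have no monthly row one month later.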

SAS PROC REPORT how to display analysis variables as rows?

I don't know where to start with this. I've tried listing the columns in every possible order but they are always listed horizontally. The dataset is:
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;
run;
Here's an example of the proc report code for just one analysis variable:
proc report data = job2;
columns apply_count year;
define year / across " ";
define apply_count / analysis "Applied" format = comma8.;
run;
Ideally the final report would look like this:
2012 2013 2014
Applied 349 338 354
Interv. 52 69 70
Hired 12 20 18
Inter % 15% 20% 20%
Hired % 23% 29% 26%
I don't know if this is the best way to do this.
data job2;
input year apply_count interviewed_count hired_count interviewed_mean hired_mean;
datalines;
2012 349 52 12 0.149 0.23077
2013 338 69 20 0.20414 0.28986
2014 354 70 18 0.19774 0.25714
;;;;
run;
proc transpose data=job2 out=job3;
by year;
run;
data job3;
set job3;
length y atype $8;
y = propcase(scan(_name_,1,'_'));
atype = scan(_name_,-1,'_');
if atype eq 'mean' then substr(y,8,1)='%';
run;
proc print;
run;
proc report data=job3 list;
columns atype y year, col1 dummy;
define atype / group noprint;
define y / group order=data ' ';
define year / across ' ';
define dummy / noprint;
define col1 / format=12. ' ';
compute before atype;
xatype = atype;
endcomp;
compute after atype;
line ' ';
endcomp;
compute col1;
if xatype eq 'mean' then do;
call define('_C3_','format','percent12.');
call define('_C4_','format','percent12.');
call define('_C5_','format','percent12.');
end;
endcomp;
run;