TS-140 Record layout not working.
Below is the header record for variables whose labels are longer than 40 characters.
HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!!nnnnn
where nnnnn is the number of variables for which long labels will be defined.
Can we get a sample XPT (V8) file in which at least one label is longer than 40 characters?
In the File Open menu of SAS Universal Viewer it definitely says V5 transport files. The V5 format supports only the limits that existed in SAS version 5. So 8 character names, 40 character labels, max length of 200 for character variables.
But it looks like it does support the longer values, at least in version 1.42.
Code to make example:
data v8;
attrib var1 label=
'This label is so long that it will have more than 40 characters';
input (var1 var2 ThisNameIsMoreThan8Chars) (:$1.);
cards;
1 2 3
4 5 6
;
%loc2xpt(libref=work,memlist=V8,filespec=xpt,format=auto);
Resulting file:
305 data _null_;
306 infile xpt lrecl=80 recfm=f;
307 input;
308 list;
309 run;
NOTE: The infile XPT is:
Filename=...,
File Size (bytes)=1440
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 HEADER RECORD*******LIBV8 HEADER RECORD!!!!!!!000000000000000000000000000000
2 CHAR SAS SAS SASLIB 9.1 LIN X64. 07JUN21:11:11:51
ZONE 54522222545222225454442232322222444253302222222222222222222222223345433333333333
NUMR 3130000031300000313C92009E100000C9E0864000000000000000000000000007A5E21A11A11A51
3 07JUN21:11:11:51
4 HEADER RECORD*******MEMBV8 HEADER RECORD!!!!!!!000000000000000001600000000140
5 HEADER RECORD*******DSCPTV8 HEADER RECORD!!!!!!!000000000000000000000000000000
6 CHAR SAS V8 SASDATA 9.1 LIN X64.07JUN21:11:11:51
ZONE 54522222532222222222222222222222222222225454454232322222444253303345433333333333
NUMR 3130000068000000000000000000000000000000313414109E100000C9E0864007A5E21A11A11A51
7 07JUN21:11:11:51
8 HEADER RECORD*******NAMSTV8 HEADER RECORD!!!!!!!000000000300000000000000000000
9 CHAR ........var1 This label is so long that it will have ........
ZONE 00000000767322225667266666267276266662766726727666266762222222220000000022222222
NUMR 020001016121000048930C125C09303F0CFE704814094079CC081650000000000000000000000000
10 CHAR ........var1 .?..........................var2
ZONE 00000000767322222222222222222222222222220300000000000000000000000000767322222222
NUMR 00000000612100000000000000000000000000000F01010000000000000002000102612200000000
11 CHAR ........ ........var2
ZONE 22222222222222222222222222222222222222222222000000002222222200000000767322222222
NUMR 00000000000000000000000000000000000000000000000000000000000000000001612200000000
12 CHAR ............................ThisName
ZONE 22222222222222222222000000000000000000000000000056674666222222222222222222222222
NUMR 0000000000000000000001010100000000000000020001034893E1D5000000000000000000000000
13 CHAR ........ ........ThisNameIsMoreThan8Chars
ZONE 22222222222222222222222200000000222222220000000056674666474676566634667722222222
NUMR 0000000000000000000000000000000000000000000000024893E1D593DF25481E83812300000000
14 CHAR ....................
ZONE 00000000000000000000222222222222222222222222222222222222222222222222222222222222
NUMR 01010100000000000000000000000000000000000000000000000000000000000000000000000000
15 HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!!1
16 CHAR .....?var1This label is so long that it will have more than 40 characters
ZONE 00000376735667266666267276266662766726727666266762667627666233266676676772222222
NUMR 01040F612148930C125C09303F0CFE704814094079CC081650DF250481E040038121345230000000
17 HEADER RECORD*******OBSV8 HEADER RECORD!!!!!!! 2
18 123456
NOTE: 18 records were read from the infile XPT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
I was working on a problem that involved creating dummy variables, but I ran into an issue where I'm having missing values for the dummy variables in the corresponding reference category even though the dataset doesn't have missing values. Even if I'm selecting one of the categories to be the reference category or variable, shouldn't the dummy variable values be zero? I had the same issue even when I did not account for missing values. I've included my code, log, output, and the content of the text file for context and so that my question will be clearer.
The part of the homework assignment that I'm having issues with is the following:
Fibromyalgia is a syndrome of widespread body pain that is often treated by rheumatologists. One way of measuring the impact of fibromyalgia on patients is the Fibromyalgia Impact Questionnaire (FIQ). On the FIQ, high values show greater impact of disease (bad) and low values show lesser impact of disease (good). We have data on women with fibromyalgia who attended one of two types of disease self-management classes or who received standard care (the control group).
Data from this study are in the file fibr03_sum18.txt on the BS 805 web site in the Assignments section for Class 6. The variables in the data file are:
FIQ score (3.1 format) taken after the classes
Group (1 = class 1, 2 = class 2, 3 = standard care)
Disease Severity (on a scale of 1 to 6) before the classes
Age (years)
Since the data were entered into this file, information on a new patient and a correction to the data have been found. The new patient is in the control group, has FIQ = 8.2, Disease Severity = 2, and Age = 25 years. The correction is that the second subject in class 1 was 17 rather than 18 years old.
A) Create a temporary SAS data set using these data. In the data set, create a set of indicator variables that code for group membership. Use PROC PRINT to list the data.
I read in the text file using formatted input with column pointers, but I think it could be read in using list input as well. The text file, called fibr03_sum18.txt, contained the data below.
3.1 1 6 21
1.8 1 6 18
3.3 1 5 22
2.9 1 4 15
4.3 1 3 24
4.8 1 3 22
4.9 1 2 17
6.4 1 2 18
5.7 2 5 17
6.1 2 5 25
8.5 2 3 31
7.1 2 2 17
7.7 2 1 25
9.8 2 1 22
5.1 3 4 23
7.2 3 1 15
8.3 3 1 22
6.7 3 2 20
My code for reading in the data and creating the temporary dataset with the dummy variables was:
*Part A: Reading in Data and Creating a Temporary Dataset;
libname HW6 'C:\Users\jackz\Desktop\SAS';
filename HW6new 'C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt';
proc format;
value grpf 1='class 1' 2='class 2' 3='standard care';
run;
data one;
infile HW6new;
input @1 FIQ 3.1 @5 grp 1. @7 disev 1. @9 age 2.;
*Creating Dummy Variables;
if grp=1 then classc1=1; else if grp=2 then classc1=0;
if grp=2 then classc2=1; else if grp=1 then classc2=0;
if grp=. then classc1=.;
if grp=. then classc2=.;
label FIQ='FIQ Score'
grp='Group'
disev='Disease Severity'
age='Age';
format grp grpf.;
run;
*Printout of Dataset one;
proc print data=one label;
run;
My log for this code was:
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M5)
Licensed to BOSTON UNIVERSITY - SFA T&R, Site 70009029.
NOTE: This session is executing on the W32_10HOME platform.
NOTE: Updated analytical products:
SAS/STAT 14.3
SAS/ETS 14.3
SAS/OR 14.3
SAS/IML 14.3
SAS/QC 14.3
NOTE: Additional host information:
W32_10HOME WIN 10.0.16299 Workstation
NOTE: SAS initialization used:
real time 0.96 seconds
cpu time 0.95 seconds
1 *Part A: Reading in Data and Creating a Temporary Dataset;
2 libname HW6 'C:\Users\jackz\Desktop\SAS';
NOTE: Libref HW6 was successfully assigned as follows:
Engine: V9
Physical Name: C:\Users\jackz\Desktop\SAS
3 filename HW6new 'C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt';
4 proc format;
5 value grpf 1='class 1' 2='class 2' 3='standard care';
NOTE: Format GRPF has been output.
6 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
7 data one;
8 infile HW6new;
9 input @1 FIQ 3.1 @5 grp 1. @7 disev 1. @9 age 2.;
10 *Creating Dummy Variables;
11 if grp=1 then classc1=1; else if grp=2 then classc1=0;
12 if grp=2 then classc2=1; else if grp=1 then classc2=0;
13 if grp=. then classc1=.;
14 if grp=. then classc2=.;
15 label FIQ='FIQ Score'
16 grp='Group'
17 disev='Disease Severity'
18 age='Age';
19 format grp grpf.;
20 run;
NOTE: The infile HW6NEW is:
Filename=C:\Users\jackz\Desktop\SAS\fibr03_sum18.txt,
RECFM=V,LRECL=32767,File Size (bytes)=214,
Last Modified=15Jun2018:12:56:26,
Create Time=15Jun2018:12:56:26
NOTE: 18 records were read from the infile HW6NEW.
The minimum record length was 10.
The maximum record length was 10.
NOTE: The data set WORK.ONE has 18 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
21 *Printout of Dataset one;
22 proc print data=one label;
NOTE: Writing HTML Body file: sashtml.htm
23 run;
NOTE: There were 18 observations read from the data set WORK.ONE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.27 seconds
cpu time 0.06 seconds
Here is the output, although it is not lined up:
The SAS System
Obs FIQ Score Group Disease
Severity Age classc1 classc2
1 3.1 class 1 6 21 1 0
2 1.8 class 1 6 18 1 0
3 3.3 class 1 5 22 1 0
4 2.9 class 1 4 15 1 0
5 4.3 class 1 3 24 1 0
6 4.8 class 1 3 22 1 0
7 4.9 class 1 2 17 1 0
8 6.4 class 1 2 18 1 0
9 5.7 class 2 5 17 0 1
10 6.1 class 2 5 25 0 1
11 8.5 class 2 3 31 0 1
12 7.1 class 2 2 17 0 1
13 7.7 class 2 1 25 0 1
14 9.8 class 2 1 22 0 1
15 5.1 standard care 4 23 . .
16 7.2 standard care 1 15 . .
17 8.3 standard care 1 22 . .
18 6.7 standard care 2 20 . .
You can see that there are missing values for the dummy variables classc1 and classc2 even though there are no missing values in the original dataset. Should those values read 0, since group 3 does not fall in either grp=1 or grp=2?
Can anyone give me any hints as to what I have done wrong, if I have done anything wrong? Thanks for all of your help!
The output shows that the rows where the flag variables are missing have group = 3 (standard care). The values are missing not because an if statement set them to missing, but because the data step implicitly resets its variables to missing at the start of each iteration of the implicit loop.
When grp=3, no if statement assigns anything to the flag variables, so they keep that initial missing value. The put statements below write the flag values to the log before and after the assignments:
* when grp=3 neither classc1 nor classc2 is changed from its initial missing value;
put 'NOTE: ' _n_= (classc:) (=);
if grp=1 then classc1=1; else if grp=2 then classc1=0;
if grp=2 then classc2=1; else if grp=1 then classc2=0;
if grp=. then classc1=.;
if grp=. then classc2=.;
put 'NOTE: ' _n_= (classc:) (=);
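One way to make the reference group come out as 0 is to assign each flag from a comparison, since a SAS comparison evaluates to 1 or 0. This is only a minimal sketch: it uses list input and three rows copied from the question's data (one per group) so that it runs on its own, rather than the original infile.

```sas
data one;
   input FIQ grp disev age;
   /* A comparison returns 1 when true and 0 when false, */
   /* so group 3 gets 0 in both flags.                   */
   classc1 = (grp = 1);
   classc2 = (grp = 2);
   /* Keep the flags missing only when grp itself is missing. */
   if missing(grp) then call missing(classc1, classc2);
   datalines;
3.1 1 6 21
5.7 2 5 17
5.1 3 4 23
;
run;

proc print data=one;
run;
```

An equivalent fix that stays closer to the original code is to end each if/else chain with a plain else, for example: if grp=1 then classc1=1; else classc1=0;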
Hello,
I want to write a dynamic program that flags events whose start and end dates are nested within the consolidated periods listed at the top of each Pt.ID in the attached example. I can easily do this if there is only one consolidated period per Pt.ID. However, there could be more than one consolidated period per Pt.ID (as shown for the second Pt.ID, 1002). As shown in the example, events that fall within a consolidated period are flagged as "Y" in the flag variable, and events that do not are flagged as "N". How can I write a program that accounts for all of the consolidated periods per Pt.ID, compares them with the dates of the rest of that patient's events, and flags the events that fall within any of those periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
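If you want the "Y"/"N" coding from the question rather than a 1/0 variable, the same join can be wrapped in a CASE expression. This is a sketch that assumes the Sample data set created above; the names want_yn and Flag are made up for illustration.

```sas
proc sql;
   create table want_yn as
   select a.*
        /* MAX over the joined period rows is 1 if any period contains the event */
        , case when max(b.start_date <= a.start_date
                        and b.end_date >= a.end_date) = 1
               then 'Y' else 'N'
          end as Flag
   from sample a
   left join sample b
     on a.pt_id = b.pt_id and missing(b.event_id)
   group by a.pt_id, a.event_id, a.category, a.start_date,
            a.start_day, a.end_date, a.end_day, a.duration
   order by a.pt_id, a.event_id, a.start_date, a.end_date
   ;
quit;
```

The period rows themselves (those with a missing Event_ID) also come out as "Y" here, because each one matches itself in the join; add a WHERE clause on a.event_id if you want to leave them out.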
I am new to SAS, so this might be a silly type of question.
Assume there are several datasets with similar structure but different column names. I want to get new datasets with the same number of rows but only a subset of columns.
In the following example, A_auto and B_auto are the original datasets and SubA and SubB are what I want. What is an efficient way of deriving SubA and SubB?
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
DATA SubA;
set A_auto;
keep A_make A_price;
RUN;
DATA SubB;
set B_auto;
keep B_make B_price;
RUN;
Here's my new answer. This introduces quite a few concepts, but all are necessary to complete this task.
First of all I would store the required part variable names (the suffixes that are common to all datasets) in a new dataset. This keeps them all in one place and makes it easier to change if required.
The next step is to create a regular expression (regex) search string that combines all the names, separated by a pipe (|), which is the regex symbol for "or". I've also added a $ symbol to the end of each name, which ensures that only variables ending with the part names will be selected.
select ... into :[macroname] is the way to create macro variables within proc sql.
Then I set up a macro to extract the specific variable names for the current dataset and use those names to create a view (like my original answer).
The dictionary library referenced in the proc sql is a metadata library that contains information on all active libraries, tables, columns, etc., so it is a good place to look up the actual variable names (based on the regex search string created earlier).
You won't need the proc print in your code, I just put it in to show everything is working as expected.
Let me know if this works for you.
/* create initial datasets */
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
/* create dataset containing partial name of variables to keep */
data keepvars;
input part_name :$20.;
datalines;
_make
_price
;
run;
/* create regular expression search string from partial names */
proc sql noprint;
select
cats(part_name,'$') /* '$' matches end of string */
into
:name_str separated by '|' /* '|' is an 'or' search operator in regular expressions */
from
keepvars;
quit;
%put &name_str.; /* print search string to log */
/* macro to create views from datasets */
%macro create_views (dsname, vwname); /* inputs are dataset name being read in and view name being created */
/* extract specific variable names to be kept, based on search string */
proc sql noprint;
select
name
into
:vars separated by ' '
from
dictionary.columns
where
libname = 'WORK'
and memname = upper("&dsname.")
and prxmatch("/&name_str./",strip(name))>0; /* prxmatch is regular expression search function */
quit;
%put &vars.; /* print variables to keep to log */
/* create views */
data &vwname. / view=&vwname.;
set &dsname. (keep=&vars.);
run;
/* test view by printing */
proc print data=&vwname.;
run;
%mend create_views;
/* run macro for each dataset */
%create_views(A_auto, SubA);
%create_views(B_auto, SubB);
I have a SAS dataset chapter5.pressure, and I verified that it is fine by printing it with proc print:
Obs SBPbefore SBPafter
1 120 128
2 124 131
3 130 131
4 118 127
5 140 132
6 128 125
7 140 141
8 135 137
9 126 118
10 130 132
11 126 129
12 127 135
So, I want to export it to the .dat file, and the following method does not work:
libname chapter5 'c:\users\owner\desktop\sas\chapter5';
data _null_;
set chapter5.pressure;
file 'c:\users\owner\desktop\sas\chapter5\xxx.dat';
put a b ;
run;
The resulting file has all missing values. Why?
Try using the variable names instead of "a" and "b".
data _null_;
set chapter5.pressure;
file 'c:\users\owner\desktop\sas\chapter5\xxx.dat';
put SBPbefore SBPafter;
run;
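The original attempt wrote only missing values because a and b do not exist in the data set: the PUT statement creates them as new numeric variables that are never assigned, and the log reports them as uninitialized. The sketch below shows this on a small stand-in data set (two rows copied from the listing above, named pressure in WORK purely for illustration) and writes to the log instead of a file.

```sas
data pressure;
   input SBPbefore SBPafter;
   datalines;
120 128
124 131
;
run;

data _null_;
   set pressure;
   /* a and b are not in the data set, so each prints as a period */
   put a b;
   /* the real variable names print the stored values */
   put SBPbefore SBPafter;
run;
```

Checking the log for "uninitialized" notes is a quick way to catch this kind of misnamed variable.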
Instead of using a put statement, you can also use PROC EXPORT to create a delimited file from a SAS dataset:
PROC EXPORT
DATA=chapter5.pressure
OUTFILE='c:\users\owner\desktop\sas\chapter5\xxx.dat'
DBMS=DLM
REPLACE;
RUN;
The default delimiter is a blank, which should match what you are trying to do. To create a tab- or comma-delimited file, change the DBMS option value to TAB or CSV respectively. PROC EXPORT also writes a header row with the variable names to the external file. See the SAS 9.2 documentation for PROC EXPORT, or check the SAS support site if you are using a different version.
I have the following matrix of data, which I am reading into SAS:
1 5 12 19 13
6 3 1 3 14
2 7 12 19 21
22 24 21 29 18
17 15 22 9 18
It represents 5 different species of animal (the rows) in 5 different areas of an environment (the columns). I want to get a Shannon diversity index for the whole environment, so I sum the rows to get:
48 54 68 79 84
Then calculate the Shannon index from this, to get:
1.5873488
What I need to do, however, is calculate a confidence interval for this Shannon index. So I want to perform a nonparametric bootstrap on the initial matrix.
Can anyone advise how this is possible in SAS?
There are several ways to do this in SAS. I would use proc surveyselect to generate the bootstrap samples, and then calculate the Shannon Index for each replicate. (I didn't know what the Shannon Index was, so my code is just based on what I read on Wikipedia.)
data animals;
input v1-v5;
cards;
1 5 12 19 13
6 3 1 3 14
2 7 12 19 21
22 24 21 29 18
17 15 22 9 18
run;
/* Generate 5000 bootstrap samples, with replacement */
proc surveyselect data=animals method=urs n=5 reps=5000 seed=10024 out=boots;
run;
/* For each replicate, calculate the sum of each variable */
proc means data=boots noprint nway;
class replicate;
var v:;
output out=sums sum=;
run;
/* Calculate the proportions, and p*log(p), which will be used next */
data sums;
set sums;
ttl=sum(of v1-v5);
array ps{*} p1-p5;
array vs{*} v1-v5;
array hs{*} h1-h5;
do i=1 to dim(vs);
ps{i}=vs{i}/ttl;
hs{i}=ps{i}*log(ps{i});
end;
keep replicate h:;
run;
/* Calculate the Shannon Index, again for each replicate */
data shannon;
set sums;
shannon = -sum(of h:);
keep replicate shannon;
run;
We now have a data set, shannon, which contains the Shannon Index calculated for each of 5000 bootstrap samples. You could use this to calculate p-values, but if you just want critical values, you can run proc means (or proc univariate if you want the 2.5% and 97.5% values, as I don't think it's possible to get those quantiles with proc means).
proc means data=shannon mean p1 p5 p95 p99;
var shannon;
run;
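For a 95% percentile interval you need the 2.5th and 97.5th percentiles, which proc univariate can produce through the PCTLPTS= option of its OUTPUT statement. This is a sketch that assumes the shannon data set created above; the output data set name shannon_ci and the prefix p are arbitrary choices.

```sas
/* Percentile bootstrap interval: 2.5th and 97.5th percentiles */
proc univariate data=shannon noprint;
   var shannon;
   output out=shannon_ci pctlpts=2.5 97.5 pctlpre=p;
run;

/* The limits are stored in the variables p2_5 and p97_5 */
proc print data=shannon_ci;
run;
```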