I know that this is a very basic question, but I really am having trouble.
I wrote the following code and all I want to do is to read the data correctly, but I cannot find a way to instruct SAS to read the BMI.
There are two things I want to do.
1), Have SAS store the entire number including all the decimals.
2), when printed, I would like to approximate the number to two decimal points.
data HW02EX01;
input Patient 1-2 Weight 5-7 Height 9-10 Age 13-14 BMI 17-26 Smoking $ 29-40 Asthma $ 45-48;
cards;
14 167 70 65 23.9593878 never no
run;
Note: I left only the first observation since the display becomes really ugly and wearisome to edit by hand.
friend.
Maybe it could be useful the following code:
data HW02EX01_;
input Patient Weight Height Age BMI Smoking : $20. Asthma $10.;
format BMI 32.2;
cards;
14 167 70 65 23.9593878 never no
;
By way of comment, I would like to indicate some details:
If your input data has a fixed length, use the column reading method as you propose in your code in the input statement. If not, use a list reading entry (assuming there is a delimiter between your input data).
When SAS reads in the list form, it converts and saves all the digits of a numerical value. So you do not have to worry about reading decimals too.
To display a numerical value the way you like it, you can use the format statement to assign a representation of the value. In this case with two decimals we use the format "w.d". Where w is the total length of the number that can occur and d indicates the number of decimals to show. It should be mentioned that the formats do not change the value of the variable, only its presentation.
When using the cards statement, it is not necessary to use a run statement.
I hope it will be useful.
See the comments in the code, specifically the usage of LENGTH, FORMAT and INFORMAT statements to control the input and output appearance of data.
data HW02EX01;
*specify the length of the variables;
length patient $8. weight height age bmi 8. smoking asthma $8.;
*specify the informats of the variables;
*an informat is does the variable look like when trying to read it in;
informat patient $8. weight height age bmi best32.;
*formats control how infomraiton is displayed in output/tables;
format bmi 32.2 weight height age 12.;
input Patient $ Weight Height Age BMI Smoking Asthma ;
cards;
14 167 70 65 23.9593878 never no
;
run;
Related
I am working with a SAS dataset that includes up to 30 medications prescribed to an individual patient. The medications are coded med1, med2 ... med30. Each medication is represented by a 5-digit character variable. Using the identifier, I can then code the name of the drug, and whether that particular medication is a topical antibiotic or a systemic antibiotic.
For each patient, I want to use all 30 medication codes to create one variable indicating whether the patient got a topical antibiotic only, a systemic antibiotic only, or both a topical and an oral antibiotic. So if any of the 30 medications is a systemic antibiotic, I want the patient coded as oral_antibiotic=1.
I currently have this code:
data want;
set have;
array meds[30] med1-med30;
if meds[i] in ('06925' '06920') then do;
penicillin=1;
oral_antibiotic=1;
end;
else if meds[i] in ('03197') then do;
neosporin=1;
topical_antibiotic=1;
end;
.... (many more do loops with many more medications)
run;
The problem is that this code creates one indicator variable instead of 30, overwriting previous information.
I think that I really need 30 indicator variables, indicating whether each of the 30 drugs is an oral or topical antibiotic, before I write code that says if any of the drugs are oral antibiotics, the patient received an oral antibiotic.
I am new to macros and would really appreciate help.
data current;
input med1 med2 med3;
cards;
'06925' '06920' '03197' ;
run;
And I want this:
data want;
input med1 topical_antibiotic1 oral_antibiotic1 med2 topical_antibiotic2 oral_antibiotic2 med3 topical_antibiotic3 oral_antibiotic3;
cards;
'06925' 0 1 '06920' 0 1 '03197' 1 0
;
run;
I think that I really need 30 indicator variables, indicating whether
each of the 30 drugs is an oral or topical antibiotic, before I write
code that says if any of the drugs are oral antibiotics, the patient
received an oral antibiotic.
That's not true. Your current approach is fine as long as you're not resetting them. You don't show us the full code, so it's hard to say, but I'm going to assume that's what is happening here.
Your loop should look like:
array med(30) med1-med30;
*set to 0 at top of the loop;
topical_antibiotic=0; oral_antibiotic=0;
do i=1 to dim(med);
if med(i) in (.....) /*list of topical codes*/ then topical_antibiotic=1;
else if med(i) in (.....) /*list of oral codes*/ then oral_antibiotic=1;
end;
This assumes that an antibiotic cannot be in both Topical/Oral groups. If it can, you need to remove the ELSE from the second IF statement.
I agree that you probably only need one indicator variable for each drug group, (medication of interest). Seems like you just want to know for each subject, "Do they have it?" This example flips the arguments of the IN operator. If you had given more example data I could have done better with this example.
data current;
infile cards missover;
array med[3] $5;
input med[*];
oral_antibotic = '069' in: med; /*Assume oral all start with '069'*/;
topical_antibotic = '03197' in med;
cards;
06925 06920 03197
06925
;;;;
run;
I'm trying to create variables Cap1 through Cap6. I'm not sure how to have read them as character data. My code is:
DATA Capture;
INFILE '/folders/myfolders/sasuser.v94/Capture.txt' DLM='09'x DSD MISSOVER FIRSTOBS=2;
INPUT Sex $ AgeGroup $ Weight Cap1 - Cap6 $;
RUN;
And my issue is Cap1 through Cap5 are interpreted as numerical data. How do I solve this?
Your issue is simple: you are using a variable list, but you aren't applying the $ to the whole variable list! You need ( ) around the list and the modifier to apply it to the whole list.
See:
DATA Capture;
INFILE datalines DLM=' ' DSD;
INPUT Sex $ AgeGroup $ Weight (Cap1 - Cap6) ($);
datalines;
M 18-34 135 A B C D E F
F 35-54 115 G H I J K L
;;;;
RUN;
Indeed,
I would also expect this input statement to work as you did, but it does not. Putting a $ after Cap1 does not resolve it either, as this log shows.
26 INPUT Sex $ AgeGroup $ Weight Cap1 $ - Cap6 $;
_
22
ERROR 22-322: Expecting a name.
You can solve it
by assigning a format to your variables before reading them, for instance format Cap1 - Cap6 $2.;
To test it,
I included the data in the source file, i.e. using datalines
DATA Capture;
INFILE datalines DLM='09'x DSD missover FIRSTOBS=1;
format Sex $1. AgeGroup $9. Weight 8.2 Cap1 - Cap6 $2.;
INPUT Sex AgeGroup Weight Cap1 - Cap6;
datalines;
M 1-5 24.5 11 12 13 14 15 16
M 6-10 34.2 21 22 23 24 25 26
;
proc print;
proc contents;
RUN;
How to understand this:
SAS was originally created as a programming language for non-developers (i.c. statisticians) who rather don't care about data formats, so SAS does a lot of guess work for you (just like VBA if you don't use option explicit).
So, the first time you mention a variable name in a data step, SAS ads a variable to the Program Data Vector (PDV) with an apropriate type (numeric or charater) and length, but this is guess work.
For instance: as the first student in the test dataset CLASS included in the standard instalation of SAS is male,
data WORK.CLASS;
set sasHelp.CLASS;
select (sex);
when ('M') gender = 'male';
when ('F') gender = 'female';
otherwise gender = 'unknown';
end;
run;
results in truncating 'female' to four positions:
You can correct that by instructing sas to add the variable to the PDV beforehand.
For a character variable,
format myName $20.; and
length myName $20.; are equivalent and
informat myName $20.; is also about the same.
(The storry becomes more complex with user defined formats, though.)
For numerics, there is a huge difference:
length mySize 8.; preserves 8 bytes in the PDV for mySize
format mySize 8.; tells SAS to print or display mySize with up to 8 digits and no decimals
informat mySize $20.; tells SAS a expect 8 digits without decimals when reading mySize.
Numericals can only have certain lengths, depending on the operatin system. On windowns
8. is the default and corresponds to a double on most databases
4. corresponds to a float
3. is the minimum, which I use for booleans
Formats can be very different
format mySize 8.3; tells SAS tot print mySize with 8 characters, including 3 decimals for the fraction (which leaves room for up to 4 decimals before the decimal dot if it has a positive value. Less decimals will be printed to display larger numbers)
format mySize 8.3; tells SAS tot read mySize assuming the last 3 decimals are the fraction, so 12345678 will be interpreted as 12345.678
Then there are special formats to read and write dates, times and so on and user defined value and picture formats, but that lead me too far.
I am taking an intro to SAS course and am having serious issues reading some data in.
This is the code I have thus far (the assignment tells us to use Column input) is as follows:
DATA shirtCol;
INPUT Name $ 1-6 Color $ 8-13 Price 15-19 ShippingCost 21-24;
DATALINES;
Large Red 18.97 0.25
Medium Blue 24.68 1.10
XLarge Black 29.99 1.75
Small Orange 15.89 0.50
;
RUN;
PROC print data=shirtCol;
RUN;
I am using SAS university edition to run this code and when I run the program the Price and Shipping column only have one number after the decimal point. Is there anything I am doing wrong? How can I make it so the program no longer truncates my output?
It seems that SAS studio counts the blanks spaces as columns. Try removing the leading blanks before your input:
DATA shirtCol;
INPUT Name $ 1-6 Color $ 8-13 Price 15-19 ShippingCost 21-24;
DATALINES;
Large Red 18.97 0.25
Medium Blue 24.68 1.10
XLarge Black 29.99 1.75
Small Orange 15.89 0.50
;
RUN;
Count the number of characters in your data lines to see why it is not reading what you want. Either correct the column numbers in your INPUT statement or remove the extra blanks you seem to have added to every line. Inserting a ruler line can help.
DATA shirtCol;
INPUT Name $ 1-6 Color $ 8-13 Price 15-19 ShippingCost 21-24;
*---+---10----+---20----+---30;
DATALINES;
Large Red 18.97 0.25
Medium Blue 24.68 1.10
XLarge Black 29.99 1.75
Small Orange 15.89 0.50
;
RUN;
Using a consistent indentation style can avoid this type of issue. I never indent the DATALINES; statement. Actually I never use DATALINES; statements, but I never indent CARDS; statements and I use them a lot. :)
If that doesn't work then make sure you haven't introduced tabs to replace spaces in your data. I think that SAS/Studio might have trouble with actually storing the tabs in the code lines that it sends to SAS to execute. You can add an INFILE statement with the EXPANDTABS option to have SAS expand the tabs into spaces for you. While you are at it you might want to add the TRUNCOVER option also.
infile datalines expandtabs truncover ;
I have some complicated string parsing which would be very difficult to accomplish using regular SAS functions because of the string value inconsistency; as a result
I think I will need to use Perl Regular Expressions. Below have 4 variables (price, date, size, bundle) which I have to create using parts of the text string. I'm have trouble getting the syntax correct - I am new to regular expressions.
Here is a sample data set.
data have;
infile cards truncover;
input text $80.;
cards;
acq_newsale_0_CartChat_0_Flash_1192014.jpg
acq_old_3x_GadgetPotomac_7999_Flash_112014.swf
acq_sale_3xconoffer_8999_nacpg_2102014.sfw
acq_is_3X_ItsEasy_8999_NACPG_Flash_272014_728x90.hgp
awa_os_3xMZ1_FiOSPresents_FF_160x600_12252014.mov
awa_fs_0_TWCMLP_v2_switch_0_0_Static_462014_300x250.jpg
acq_fi_2x_incrediblemz1_7999_nac_flash_1192014_160x600.swf
acq_fio_3x_bringhome_6499_0_flash_12162013_728x90.swf
;run;
/The first variable is price it is normally located near the end or middle of the string/
data want;
set have;
price =(input(prxchange('s/(\w+)_(\d+)_(\w+)/$2/',-1,text),8.))/100;
format price dollar8.2;
run;
Using the data set above I need to have this result:
price
0
79.99
89.99
89.99
79.99
64.99
/Date is always a series of consecutive digits. Either 6, 7 or 8. Using | which means 'or' I thought I would be able to pull that way/
data want;
set have;
date=prxparse('/\d\d\d\d\d\d|\d\d\d\d\d\d\d|\d\d\d\d\d\d\d\d/',text);
run;
Using the data set above I need to have this result:
Date
1192014
112014
2102014
272014
12252014
462014
1192014
12162013
/* For size there is always an ‘x’ in the middle of the sub-string which is with followed by two or three digits on either side*/
data want;
set have;
size=prxparse('/(\w+)_(\d+)'x'(\d+)_(\w+)/',text);
run;
Size
728x90
160x600
300x250
160x600
728x90
/*This is normally located towards the beginning of the string. It’s always a single digit number followed by an x It in never followed by additional digits but can also be just 0. */
data want;
set have;
Bundle=prxparse('/(\d+)'x'',text);
run;
Bundle
0
3x
3x
3X
3x
0
2x
3x
The final product I am looking for should look like this:
Text Date price Size Bundle
acq_newsale_0_CartChat_0_Flash_1192014.jpg 1192014 0 0
acq_old_3x_GadgetPotomac_7999_Flash_112014.swf 112014 79.99 3x
acq_sale_3xconoffer_8999_nacpg_2102014.sfw 2102014 89.99 3x
acq_is_3X_ItsEasy_8999_NACPG_Flash_272014_728x90.hgp 272014 89.99 728x90 3X
awa_os_3xMZ1_FiOSPresents_FF_160x600_12252014.mov 12252014 160x600 3x
awa_fs_0_TWCMLP_v2_switch_0_0_Static_462014_300x250.jpg 462014 300x250 0
acq_fi_2x_incrediblemz1_7999_nac_flash_1192014_160x600.swf 1192014 79.99 160x600 2x
acq_fio_3x_bringhome_6499_0_flash_12162013_728x90.swf 12162013 64.99 728x90 3
x
If you're extracting, don't use PRXCHANGE. Use PRXPARSE, PRXMATCH, and PRXPOSN.
Sample usage, with date:
data have;
infile cards truncover;
input text $80.;
cards;
acq_newsale_0_CartChat_0_Flash_1192014.jpg
acq_old_3x_GadgetPotomac_7999_Flash_112014.swf
acq_sale_3xconoffer_8999_nacpg_2102014.sfw
acq_is_3X_ItsEasy_8999_NACPG_Flash_272014_728x90.hgp
awa_os_3xMZ1_FiOSPresents_FF_160x600_12252014.mov
awa_fs_0_TWCMLP_v2_switch_0_0_Static_462014_300x250.jpg
acq_fi_2x_incrediblemz1_7999_nac_flash_1192014_160x600.swf
acq_fio_3x_bringhome_6499_0_flash_12162013_728x90.swf
;
run;
data want;
set have;
rx_date = prxparse('~(\d{6,8})~io');
rc_date = prxmatch(rx_date,text);
if rc_date then datevar = prxposn(rx_date,1,text);
run;
Just enclose in parens the section you want to extract (in this case, all of it).
Date was easy - as you say, 6-8 numbers. The others may be harder. The 3x etc. bit you can probably find, depending on how strict you need to be; the price I think you'll have a very hard time finding. You need to be able to better articulate the rules. "Towards the beginning" isn't a regex rule. "The second set of digits" is; "The second to last set", perhaps might work. I'll see if I can figure out a few.
In your example data, this works. I in particular don't like the price search; that one may well fail with a more complicated set of data. You can figure out adding the decimal for yourself.
data have;
infile cards truncover;
input text $80.;
cards;
acq_newsale_0_CartChat_0_Flash_1192014.jpg
acq_old_3x_GadgetPotomac_7999_Flash_112014.swf
acq_sale_3xconoffer_8999_nacpg_2102014.sfw
acq_is_3X_ItsEasy_8999_NACPG_Flash_272014_728x90.hgp
awa_os_3xMZ1_FiOSPresents_FF_160x600_12252014.mov
awa_fs_0_TWCMLP_v2_switch_0_0_Static_462014_300x250.jpg
acq_fi_2x_incrediblemz1_7999_nac_flash_1192014_160x600.swf
acq_fio_3x_bringhome_6499_0_flash_12162013_728x90.swf
blahblah :23 blahblah
blahblahblah 23 blah blah
;
run;
data want;
set have;
rx_date = prxparse('~_(\d{6,8})[_\.]~io');
rx_price = prxparse('~_(\d+)_.*?(?=_\d+[_\.]).*?(?!_\d+[_\.])~io');
rx_bundle = prxparse('~(?!_\d+_)_(\dx)~io');
rx_size = prxparse('~_(\d+x\d+)[_\.]~io');
rx_adnum = prxparse('~\s:?(\d\d)\s~io');
rc_date = prxmatch(rx_date,text);
rc_price = prxmatch(rx_price,text);
rc_bundle = prxmatch(rx_bundle,text);
rc_size = prxmatch(rx_size,text);
rc_adnum = prxmatch(rx_adnum,text);
if rc_date then datevar = prxposn(rx_date,1,text);
if rc_price then price = prxposn(rx_price,1,text);
if rc_bundle then bundle = prxposn(rx_bundle,1,text);
if rc_size then size = prxposn(rx_size,1,text);
if rc_adnum then adnum = prxposn(rx_adnum,1,text);
run;
I'm new to SAS, and would greatly appreciate anyone who can help me formulate a code. Can someone please help me with formatting changing arrays based on the first column values?
So basically here's the original data:
Category Name1 Name2......... (Changes invariably)
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
I would like to format the values under Name1 to infinite Name# and reformat them to dollar10.2 for any values under Category called 'AmountBilled','AmountPaid','AmountDed'.
Thank you so much for your help!
You can't conditionally format a column (like you might in excel). A variable/column has one format for the entire column. There are tricks to get around this, but they're invariably more complex than should be considered useful.
You can store the formatted value in a character variable, but it loses the ability to do math.
data have;
input category :$10. name1 name2;
datalines;
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
;;;;
run;
data want;
set have;
array names name:; *colon is wildcard (starts with);
array newnames $10 newname1-newname10; *Arbitrarily 10, can be whatever;
if substr(category,1,6)='Amount' then do;
do _t = 1 to dim(names);
newnames[_t] = put(names[_t],dollar10.2);
end;
end;
run;
You could programmatically figure out the newname1000 endpoint using PROC CONTENTS or SQL's DICTIONARY.COLUMNS / SAS's SASHELP.VCOLUMN. Alternately, you could put out the original dataset as a three column dataset with many rows for each category (was it this way to begin with prior to a PROC TRANSPOSE?) and put the character variable there (not needing an array). To me that's the cleanest option.
data have_t;
set have;
array names name:;
format nameval $10.;
do namenum = 1 to dim(names);
if substr(category,1,6)='Amount' then nameval = put(names[namenum],dollar10.2 -l);
else nameval=put(names[namenum],10. -l); *left aligning here, change this if you want otherwise;
output; *now we have (namenum) rows per line. Test for missing(name) if you want only nonmissing rows output (if not every row has same number of names).
end;
run;
proc transpose data=have_t out=want_T(drop=_name_) prefix=name;
by category notsorted;
var nameval;
run;
Finally, depending on what you're actually doing with this, you may have superior options in terms of the output method. If you're doing PROC REPORT for example, you can use compute blocks to set the style (format) of the column conditionally in the report output.