I have a blood.txt dataset like this (first 5 obs):
1 Female AB Young 7710 7.4 258
2 Male AB Old 6560 4.7 .
3 Male A Young 5690 7.53 184
4 Male B Old 6680 6.85 .
5 Male A Young . 7.72 187
I used the following program to read it:
data blood_sum;
infile "/path/blood.txt";
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
But the last column, "Cholesterol", is not read correctly: every value comes out as ".". My log has numerous notes like this:
NOTE: Invalid data for Cholesterol in line 1 34-37.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 CHAR 1 Female AB Young 7710 7.4 258. 37
ZONE 3222246666624425676623333222323223330
NUMR 1000065D1C501209F5E7077100007E400258D
SubjID=1 Gender=Female BloodType=AB AgeGrp=Young RBC=7710 WBC=7.4 Cholesterol=. _ERROR_=1
Can anyone help?
I'm going to guess that you are running this on a UNIX system but the file you are reading (blood.txt) was created on a Windows system and copied to your system in binary mode.
If you look at the log, you should notice there is a "dot" after the last value in your input line (in column 37). The ZONE and NUMR parts of the display reveal the hex code for that position, in this case '0D', which is a carriage return character. If you open the file with a UNIX editor (like vi), you will see those characters represented as ^M at the end of each line.
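If you would rather confirm this from inside SAS than with vi, something like the following might help. It is only a sketch: the fileref `demo` and the counter `n_cr` are made-up names, and the first step builds a throwaway one-line file that mimics the first record of blood.txt with a carriage return in column 37. To check your own file, point the second step at it instead.

```sas
/* Build a one-line test file that ends in CR+LF, as a Windows text file would. */
filename demo temp;
data _null_;
  file demo termstr=lf;                      /* write only LF as the terminator */
  put @1 "1" @6 "Female" @13 "AB" @16 "Young"
      @22 "7710" @29 "7.4" @34 "258" '0D'x;  /* explicit CR lands in column 37 */
run;

/* Count the records that still contain a carriage return when read UNIX-style. */
data _null_;
  infile demo termstr=lf end=done;
  input;                                     /* load the record into _INFILE_ */
  if index(_infile_, '0D'x) then n_cr + 1;
  if done then do;
    put "Records containing a carriage return: " n_cr;
    call symputx('n_cr', n_cr);              /* keep the count for later use */
  end;
run;
```

A non-zero count suggests the file still has Windows line endings.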
You can either download a fresh copy from wherever you received it (making sure to transfer the file in TEXT mode) or convert your copy to a UNIX text file. To convert, you can use the dos2unix command like this:
dos2unix /path/blood.txt /path/blood.txt
Note that if you use the same name it will overwrite the original file. Of course, I assume you have permission to do that.
In case you cannot convert the file for some reason, you can use a pipe to do the conversion. In other words, use this FILENAME statement and change your INFILE statement to read from the filename:
filename mydata pipe "tr -d '\r' < /path/blood.txt";
data blood_sum;
infile mydata truncover;
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
I added the truncover option although you may not need it. Read more about it in the docs if interested.
By the way, this is a very common error and happens to everyone at least once. Welcome to StackOverflow.
I'll give a slightly different solution for the problem, which I agree with Bob is caused by the carriage return at the end of the line.
You can control the terminating character for a line (normally, for Windows, CR/LF or '0d'x '0a'x ; for Unix, '0a'x or LF only) with the TERMSTR option on the infile.
http://support.sas.com/kb/14/178.html
data blood_sum;
infile "/path/blood.txt" termstr=CRLF;
input @1 SubjID $
@6 Gender $
@13 BloodType $
@16 AgeGrp $
@22 RBC
@29 WBC
@34 Cholesterol ;
run;
By the way, I find your input method a bit confusing. You're mixing input styles here (column pointers with list input), so you might not always get consistent results. In fact, this probably would never have happened if you had explicitly specified the informats!
input
@1 subjid $4.
@6 gender $6.
@13 bloodtype $2.
@16 agegrp $5.
@22 rbc best8.
@29 wbc best4.
@34 Cholesterol 3.
;
Then Cholesterol would be read from columns 34-36, and SAS would never have tried to include column 37 in the variable.
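To see the difference between the two input styles side by side, here is a small self-contained sketch. The fileref `demo`, the dataset `check`, and the variable names `chol_list` and `chol_fmt` are invented for illustration. The test record copies the layout of the first line in the question, with a carriage return in column 37.

```sas
/* One record laid out like the question's first line, ending in CR. */
filename demo temp;
data _null_;
  file demo termstr=lf;
  put @1 "1" @6 "Female" @13 "AB" @16 "Young"
      @22 "7710" @29 "7.4" @34 "258" '0D'x;
run;

data check;
  infile demo termstr=lf truncover;
  input @34 chol_list ??   /* list input: reads up to the next blank, CR included */
        @34 chol_fmt 3.;   /* formatted input: reads exactly columns 34-36 */
run;

proc print data=check;
run;
```

Here `chol_list` should come out missing, because the carriage return becomes part of the field, while `chol_fmt` picks up 258. The `??` is only there to keep the log quiet.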
I have the following code where users will be presented with the following window and they are to enter a text
Code:
%let study_code=;
%macro startme ;
%global study_code;
%window first
#3 @45 'Electronic Filing System' color=blue ////
@20 'Study code:' color=black +2 study_code 30 color=green required=yes attr=underline //
@10 '**************** Hit ENTER to begin ******************' color=green
;
%display first ;
%let study_code_new=%sysfunc(strip(%nrbquote(&study_code)));
%put &study_code_new.;
%mend;
%startme;
When run, the window is displayed.
I type 123, hit Enter, and it writes 123 to the log as expected.
However, if a user enters 123" by accident in the field, I get an unmatched quote error:
ERROR: Literal contains unmatched quote.
ERROR: The macro STARTME will stop executing.
How do I prevent SAS from reading " as code and have it treated as a literal string? I want to capture it in the study_code_new macro variable so that I can tell the user that they have mistyped it.
It is not the %WINDOW command or the %DISPLAY command that is the issue. It is the code that you write that uses the macro variable's value. You need to add macro quoting.
So first immediately add macro quoting to the macro variable populated by the %DISPLAY statement call.
%window first
#3 @45 'Electronic Filing System' color=blue
#7 @20 'Study code:' color=black +2 study_code 30 color=green
required=yes attr=underline
#9 @10 '**************** Hit ENTER to begin ******************' color=green
;
%display first ;
%let study_code=%superq(study_code);
Then make sure to keep the macro quoting on any macro variable you derive from it (at least until you are sure it no longer needs the macro quoting).
46 %window first
47 #3 @45 'Electronic Filing System' color=blue
48 #7 @20 'Study code:' color=black +2 study_code 30 color=green required=yes attr=underline
49 #9 @10 '**************** Hit ENTER to begin ******************' color=green
50 ;
51 %display first ;
52 %let study_code=%superq(study_code);
53 %let study_code_new=%qsysfunc(strip(&study_code));
54 %put &=study_code &=study_code_new;
STUDY_CODE= 123" STUDY_CODE_NEW=123"
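Once the value is captured, one possible way to tell the user they mistyped it is to inspect it in a DATA step with SYMGET, which fetches the text at run time without passing it through the macro tokenizer. This is just a sketch: the first step stands in for the %DISPLAY call by storing a sample value, and `has_quote` is a made-up macro variable name.

```sas
/* Stand-in for the window: store a value with a stray double quote. */
data _null_;
  call symputx('study_code', '123"');
run;

/* Check the raw value for quotation marks without resolving it as code. */
data _null_;
  length code $30;
  code = symget('study_code');
  has_quote = (indexc(code, '"', "'") > 0);  /* 1 if either quote character is present */
  call symputx('has_quote', has_quote);
  if has_quote then put "Study code contains a quotation mark: " code;
run;

%put &=has_quote;
```

A later `%if &has_quote` test could then redisplay the window or print a message of your choosing.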
I have two datasets in two different SAS tables that also have completely different data structures. I am being asked (not my idea) to export these datasets to one .dat file and essentially stack them on top of each other using a fixed-width layout. The snippet below shows how the export should ultimately look in the .dat file. The first row is the result of the first dataset. The second row is the result of the second dataset.
UH INCR000000XXXXXXXXXXXXXXXX
XXX SFLXXXXXXXXXXXX 000 M SMITH XXXXXX XXXXXXXXXXXXX9991231
I can't figure out exactly how to do this. The code I've come up with exports the data, but the second data step just overwrites the output of the first.
Here's an example using the MOD option on the FILE statement.
Note this may not work on all operating systems.
filename test1 '/home/reeza/Demo1/testfile.dat';
data exportClass;
set sashelp.class;
file test1;
if _n_=1 then do;
put @1 "Name" @20 "Age" @30 "Sex";
end;
put @1 Name @20 Age @30 Sex;
run;
data exportClass;
set sashelp.class;
file test1 mod;
if _n_=1 then do;
put @1 "Name" @20 "Weight" @30 "Height";
end;
put @1 Name @20 Weight @30 Height;
run;
filename test1;
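If MOD turns out not to be available on your system, a different technique (not the one shown above) is to write both layouts from a single DATA step, so the file is opened only once. The sketch below reuses sashelp.class for both blocks purely for illustration. The fileref `test2` is a made-up name that you would point at the real .dat path.

```sas
filename test2 temp;   /* hypothetical fileref; replace with the real .dat path */

data _null_;
  file test2;
  /* First block of records, first layout */
  do until (end1);
    set sashelp.class end=end1;
    put @1 Name @20 Age @30 Sex;
  end;
  /* Second block, different layout, written to the same open file */
  do until (end2);
    set sashelp.class end=end2;
    put @1 Name @20 Weight @30 Height;
  end;
  stop;   /* everything is written in a single pass */
run;
```

Each loop reads its dataset to the end before the next one starts, so the second block lands below the first. This assumes neither dataset is empty.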
I have a text file downloaded from the BLS website that has a lot of spaces in between columns.
Code:
data unemployment;
infile 'P:\Projects\la.data.2.AllStatesU.txt' dsd firstobs=2;
input @1 series_id : $20.
@32 year
@36 period : $3.
@51 value
@57 footnote_codes : $1.;
run;
But I get a mess of errors
NOTE: Invalid data for year in line 2 32-53.
NOTE: Invalid data for value in line 2 51-53.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--
3 CHAR LAUST010000000000003 .1976.M02. 7.7. 53
ZONE 44555333333333333333222222222203333043302222222223230
NUMR C15340100000000000030000000000919769D0290000000007E79
The period column has the first two characters right, but the year and everything else is wrong. How do I fix this?
The file from that website
filename bls url
"https://download.bls.gov/pub/time.series/la/la.data.2.AllStatesU"
;
has tab characters in it. That is shown in the example you posted of line 3 from the SAS LOG.
You can either tell the INFILE statement to expand the tabs into spaces and read it as fixed column format.
data unemployment;
infile bls expandtabs firstobs=2 truncover;
input
series_id $ 1-20
year 33-36
period $ 41-43
value ?? 50-60
footnote_codes $ 65
;
run;
Or tell it that the tab character is the delimiter.
data unemployment;
infile bls dlm='09'x dsd firstobs=2 truncover;
input
series_id :$20.
year
period :$3.
value ??
footnote_codes :$1.
;
run;
Note: The ?? modifier for VALUE is because the file has a hyphen to represent missing values in that field. The ?? input modifier will tell the data step to not flag those as data errors.
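A tiny illustration of what `??` does, using made-up inline data rather than the BLS file (the dataset name `demo` and the two sample rows are assumptions for the sketch):

```sas
data demo;
  input year value ??;   /* ?? suppresses the invalid-data note and leaves _ERROR_ at 0 */
  datalines;
1976 7.7
1977 -
;

proc print data=demo;
run;
```

The hyphen in the second row is not a valid number, so `value` ends up missing there, but the log stays clean. Without `??` you would get an "Invalid data" note for that line.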
That's the dataset. I need a variable for ShipID, Received, Shipped, City, Zip Code. How would I go about doing that?
This is my first statistical programming language course and I am struggling. My professor hasn't been much of a help either.
ShipID Received Shipped Address .
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
There's a ton more lines and I've been lost on it for an hour.
infile "/home/rossfosher0/SAS Homework/SAS Sessions/WarehouseA.txt" firstobs = 2;
input @2-7 ShipID $ @9-18 Received: YYYYMMDD8. @20-28 Shipped: YYYYMMDD8. @City $;
run;
I'm trying to set up a data set for this warehouse.
data mydata;
input @1 shipid $ @7 received yymmdd10. @18 shipped yymmdd10. @28 address $30.;
format received yymmdd10. shipped yymmdd10.;
datalines;
X8742 2018/03/14 2018/03/17 blue ridge, MA 02391
;
run;
Assuming that all rows have values for the first three variables you could just read those using list mode input. Then read the rest of the line as the address.
data want;
infile "..." firstobs=2 truncover;
input shipid $ received shipped address $50. ;
informat received shipped yymmdd.;
format received shipped yymmdd10.;
run;
If the data is really in fixed columns then you can use column locations in your INPUT statement, but that is not compatible with using informats. So either use formatted input for the two date fields or read them as strings.
input shipid $1-7 @8 Received yymmdd10. @19 Shipped yymmdd10. Address $ 30-79 ;
format Received Shipped yymmdd10.;
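The question also asks for separate City and Zip Code variables. One possible way to get them, assuming every address follows the "City, ST, Zip" pattern of the sample line, is to split Address on commas with SCAN. The dataset name `shipments` and the variable lengths are guesses for the sketch.

```sas
data shipments;
  infile datalines truncover;
  length ShipID $7 Address $50 City $30 State $2 Zip $5;
  /* List input for the first three fields, then the rest of the line as Address */
  input ShipID $ Received :yymmdd10. Shipped :yymmdd10. Address $50.;
  City  = scan(Address, 1, ',');          /* text before the first comma */
  State = strip(scan(Address, 2, ','));   /* strip removes the leading blank */
  Zip   = strip(scan(Address, 3, ','));   /* kept as character to preserve leading zeros */
  format Received Shipped yymmdd10.;
  datalines;
X8742 2018/03/14 2018/03/17 Little River, KS, 67457
;
```

If some rows omit the comma before the zip code, the third SCAN call returns a blank and the split would need adjusting.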
Tom and DCR are both right. I prefer an easier route using Proc import.
proc import datafile='c:\personal\My_file.csv'
out=SAS_data replace;
DELIMITER=";" ;
getnames=yes;
guessingrows= 32767;
run;
What this does is guess the structure from the file contents and automatically generate the DATA step code (INFILE and INPUT statements), which it writes to the log. (I just copy that from the log and make adjustments if something is read incorrectly.)
If you know the structure of the data, follow the other answers, but this is a more beginner-friendly approach (IMHO). For more, see the documentation.
How can I add an input box to a SAS SQL query that asks the user for a parameter? (Something like an Access input box, in Enterprise Guide.)
Here is a solution using Base SAS.
You could use the %WINDOW statement together with %DISPLAY:
DATA _NULL_;
%LET BATCH1=;
%WINDOW BATCH_ANALYSIS COLOR = WHITE
ICOLUMN = 30 IROW = 11
COLUMNS = 88 ROWS = 20
#1 @28 "CLIENT BATCH REPORT"
#4 @12 "Date must be entered YYYY-MM-DD Format, ascending order."
#6 @28 "Example = '2015-01-31'"
#9 @5 "Enter Batch Date - [ENTER] when complete:"
#11 @5 BATCH1 12 attr=underline
#13 @5 "Reports will be written to 'location'";
%DISPLAY BATCH_ANALYSIS;
STOP;
RUN;
%put &batch1;
This above is an example of using the "user input" to operate on your query/data step. In this case, I am prompting the user to enter a date, which creates that string value as a macro variable that can be passed anywhere in your SAS code (I am only using the string date format because it gets passed to an RSUBMIT in a DB2 environment). May be a good idea to play with the Input Lines/etc to display the text you want in your prompt window...
Are you using Enterprise Guide?
If that's the case, you can create prompts, which will create macro variables when you run your code.
You will just have to use those macro variables in your code.
Right click your program > Properties > Prompts > Prompt Manager and so on.
Have a look at it and see if it solves your problem.
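As a rough idea of what "using those macro variables" looks like in a query, here is a sketch. The name `min_age` is hypothetical. In Enterprise Guide the prompt would create the macro variable for you, so the %LET line is only a stand-in to make the example run on its own.

```sas
/* Stand-in for a prompt value; drop this line when a real prompt supplies it. */
%let min_age = 14;   /* hypothetical prompt name */

proc sql;
  select name, age
  from sashelp.class
  where age >= &min_age;   /* the prompt value is substituted before the query runs */
quit;
```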