I'm not sure why this code doesn't work in SAS. Could somebody help, please?
DATA WORK.POLLUTION;
INPUT State $ County $ City $ Month $ Year $ O3MAX $ Category;
IF O3MAX < 0.054 THEN Category = "Good";
ELSE IF O3MAX < 0.070 THEN Category = "Moderate";
ELSE IF O3MAX < 0.085 THEN Category = "UnhealthySensitive";
ELSE IF O3MAX < 0.105 THEN Category = "Unhealthy";
ELSE IF O3MAX < 0.200 THEN Category = "VeryUnhealthy";
ELSE Category = "Dangerous";
RUN;
PROC PRINT DATA = WORK.POLLUTION;
TITLE= "O3";
RUN;
The INPUT statement tells SAS how to read from a text file, but it does not specify where to look for the text. You do that with an INFILE statement, as in:
DATA WORK.POLLUTION;
INFILE "C:\myFolder\myInput.txt";
INPUT State $ County $ City $ Month $ Year $ O3MAX $ Category;
...;
run;
Optionally, you can declare the file up front by giving it a name (a fileref) and referring to it:
filename MY_TEXT "C:\myFolder\myInput.txt";
DATA WORK.POLLUTION;
INFILE MY_TEXT;
INPUT State $ County $ City $ Month $ Year $ O3MAX $ Category;
...;
run;
A special filename is datalines, which refers to inline data placed between a DATALINES statement and a line containing only a semicolon:
DATA WORK.POLLUTION;
INFILE datalines;
INPUT State $ County $ City $ Month $ Year $ O3MAX $ Category;
...;
datalines;
<your data comes here>
;
run;
If you don't specify an INFILE statement, infile datalines; is used implicitly, but since you don't give a datalines; statement, there is no input. I bet your log says something about zero lines read.
By the way, why do you specify Category in the input statement? I suppose it is only in your output.
There is a lot more to say about the INFILE statement. For instance, it has options to handle lines where not all fields are filled in, such as TRUNCOVER, which you should read about.
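Putting the points above together, a minimal sketch of the whole step might look like this. It assumes O3MAX is meant to be numeric (the original INPUT read it as character with $, yet the IF statements compare it to numbers), drops Category from INPUT, and declares its length before the first assignment so that longer values are not truncated to the length of "Good". The two data rows are made up purely for illustration.

```sas
data work.pollution;
  /* Assumption: 18 characters is enough for the longest label, "UnhealthySensitive" */
  length Category $ 18;
  infile datalines truncover;
  /* O3MAX is read as numeric here; Category is computed, not read */
  input State $ County $ City $ Month $ Year $ O3MAX;
  if O3MAX < 0.054 then Category = "Good";
  else if O3MAX < 0.070 then Category = "Moderate";
  else if O3MAX < 0.085 then Category = "UnhealthySensitive";
  else if O3MAX < 0.105 then Category = "Unhealthy";
  else if O3MAX < 0.200 then Category = "VeryUnhealthy";
  else Category = "Dangerous";
/* Hypothetical sample rows, for illustration only */
datalines;
CA Kern Arvin Jul 2015 0.091
TX Harris Houston Aug 2015 0.062
;
run;

proc print data=work.pollution;
  title "O3";
run;
```

Note that a missing O3MAX compares lower than any number in SAS, so such rows would fall into "Good" unless you test for missing values first.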
I have this csv dataset named Movie:
ID,Underage,Name,Rating,Year, Rank on IMDb ,
M1021,,Elanor, Melanor,12,1879,5
M1203,Yes,IT,12,1999,1,
M0081,,Cars 2,13,1999,2,
M1371,No,Kiminonawa,12,2017,3,
M3416,,Living in the past, fading future,13,2018,12
I would like to import Movie into SAS such that "Elanor, Melanor" is the Name instead of 'Elanor' being under Name while 'Melanor' being in Rating.
I tried the following code:
FILENAME XX '....Movie.csv';
data movieYY (drop=DLM1at field2);
infile XX dlm=',' firstobs=2 dsd;
format ID $5. Underage $3. Name $50. Year 4. Rating $3. 'Rank on IMDb'n 2.;
input @;
DLM1at = find(_INFILE_, ',');
length field2 $4;
field2 = substr(_INFILE_, DLM1at + 1, 4);
if lengthn(compress(field2, '1234567890')) ne 0 then do;
_INFILE_ = substr(_INFILE_, 1, dlm1at - 1) || ' ' ||
substr(_INFILE_, dlm1at + 1);
end;
input ID Underage Name Year Rating 'Rank on IMDb'n;
run;
May I know what I should do? I am still a beginner in SAS. Thank you!
Add quotes to the name of each movie, or use another delimiter. Any data within a delimited file that also has the same delimiter must be in quotes. For example:
data foo;
infile datalines dlm="," dsd;
length id 8. name $25.;
input id name$;
datalines;
1,"Smith, John"
2,"Cage, Nicolas"
;
run;
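Applied to the Movie file, the same idea could be sketched as follows. This assumes you are able to edit the CSV so that names containing a comma are quoted; the rows are taken from the question with quotes added, and the variable name Rank is a shortened, hypothetical stand-in for 'Rank on IMDb'n.

```sas
data movie;
  /* DSD strips the quotes and keeps the embedded comma inside Name */
  infile datalines dsd dlm=',' truncover;
  length ID $5 Underage $3 Name $50 Rating $3;
  input ID Underage Name Rating Year Rank;
datalines;
M1021,,"Elanor, Melanor",12,1879,5
M1203,Yes,IT,12,1999,1,
M3416,,"Living in the past, fading future",13,2018,12
;
run;

proc print data=movie;
run;
```

For the real file you would replace datalines with the fileref and keep firstobs=2 to skip the header line.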
I am confused about what DSD actually does in terms of "moving the pointer" and reading in data. To better explain, look at the following code:
data one;
infile cards dlm=',' TRUNCOVER ; /*using dlm','*/
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
data two;
infile cards dsd TRUNCOVER ;
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
The dataset one contains values for A and B of 5000 and dr but the dataset two contains values of A as missing whereas B and C are 5000 and dr. I don't get why the dsd sets A to missing.
Thanks!
Your problem is not DLM or DSD. It is "DATE DDMMYY10.": that is formatted input, which is not compatible with delimited (list) input in any form, with or without DSD.
You need an INFORMAT statement or the colon (:) modifier in front of the informat:
date :DDMMYY10.
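As a sketch, here is the second step from the question with only that change applied (the dataset name two_fixed and the FORMAT statement are additions for illustration):

```sas
data two_fixed;
  infile cards dsd truncover;
  /* The colon makes SAS read up to the next delimiter, then apply the informat */
  input cust_id date :ddmmyy10. A $ B $ C $;
  format date ddmmyy10.;
cards;
1,10/01/2015,5000,dr
;
run;

proc print data=two_fixed;
run;
```

With the colon modifier the pointer should stop after the comma that follows the date, so A and B receive 5000 and dr, and C stays missing because the line has no fifth field.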
data test;
infile cards dsd dlm=', .';
input stmt : $ @@;
cards;
T
;run;
/*-----------------------------------------------*/
data test;
infile cards dsd dlm=', .';
input stmt : $ @@;
cards;
Th
;run;
/*-----------------------------------------------*/
data test;
infile cards dsd dlm=', .';
input stmt : $ @@;
cards;
This is SAS.
;run;
When the first program is run, 80 observations are created.
When the second program is run, 79 observations are created.
When the third program is run, 72 observations are created.
I know these programs have terrible programming style. The wrong options are set for the wrong technique: the DSD option is set, the double trailing @@ (line hold) and the colon modifier (:) are used, and more than one delimiter is used, which is about the worst SAS programming ever.
Aside from this, I want to know why so many observations are created. Why 80? Why 79? How is the program executed? I think the DSD option and the multiple delimiters have a major impact. Can anyone explain?
The reason you get more records than you expect is that CARDS lines are fixed-length records (padded with blanks, by default to 80 columns), and with DSD every one of those trailing blanks is a delimiter that produces a null field. The reason you get a different number of records each time is that a different number of null fields is left after reading the non-null field(s). You can see this by adding the COL= option to the INFILE statement, which shows you where the column pointer is after reading each field. Col=3, 4, 13
data test;
infile cards dsd dlm=', .' col=c;
input stmt : $ @@;
col=c;
cards;
T
;run;
proc print data=test(obs=5);
/*-----------------------------------------------*/
data test;
infile cards dsd dlm=', .' col=c;
input stmt : $ @@;
col=c;
cards;
Th
;run;
proc print data=test(obs=5);
/*-----------------------------------------------*/
data test;
infile cards dsd dlm=', .' col=c;
input stmt : $ @@;
col=c;
cards;
This is SAS.
;run;
proc print data=test(obs=5);
run;
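For contrast, here is a sketch of the third program without DSD (the dataset name test_nodsd is made up). Without DSD, consecutive delimiters count as one and the trailing blanks produce no fields, so this should create one observation per word instead of dozens of null ones:

```sas
data test_nodsd;
  /* No DSD: runs of blanks, commas and periods act as a single delimiter */
  infile cards dlm=', .' col=c;
  input stmt : $ @@;
  col = c;   /* column pointer position after each field */
cards;
This is SAS.
;
run;

proc print data=test_nodsd;
run;
```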
I would like to create a variable called DATFL that would have the following value for the last observation:
DATFL
gender/scan
Here is the code :
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M F
2 jill F L
3 james F M
4 jonas M M
;
run;
data mix_3; set mix_;
length datfl datfl_ $ 50;
array m4(*) id name gender scan;
retain datfl;
do i=1 to dim(m4);
if index(m4(i) ,'M') then do;
datfl_=vname(m4(i)) ;
if missing(datfl) then datfl=datfl_;
else datfl=strip(datfl)||"/"||datfl_;
end;
end;
run;
Unfortunately, the value I get for DATFL at the last observation is 'gender/scan/gender/scan'. Obviously, because of the RETAIN statement that I used for DATFL, I ended up with duplicates. At the end of this data step I was planning to use a CALL SYMPUT statement to load the last value into a macro variable, but I won't do that until I fix this issue. Can anyone give me some guidance on how to prevent DATFL from having duplicate values at the end of the dataset? Cheers
sas_kappel
Don't retain DATFL. Instead, retain DATFL_.
data mix_3; set mix_;
length datfl datfl_ $ 50;
array m4(*) id name gender scan;
retain datfl_;
do i=1 to dim(m4);
if index(m4(i) ,'M') then do;
datfl_=vname(m4(i)) ;
if missing(datfl) then datfl=datfl_;
else datfl=strip(datfl)||"/"||datfl_;
end;
end;
if missing(datfl) then datfl = datfl_;
run;
It doesn't work. Let me change the dataset (mix_) so you can see that RETAIN DATFL_ does not work in this scenario.
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M M
2 Marc F L
3 james F M
4 jonas H M
;
run;
To summarize, what I want is the DISTINCT values of DATFL in a macro variable. For each record, my code searches for variables containing the letter M; if one is found, DATFL receives the name of that array variable. If there are multiple variable names, they are separated by '/'. For the following records it should do the same, BUT add only variable names that satisfy the condition AND are not already kept in DATFL. Currently, if you run my program, at observation 4 I get DATFL=gender/scan/name/scan/scan, but I would like DATFL=gender/scan/name, because those are the distinct values. Ultimately, I will then write the following code:
if eof then CALL SYMPUT('DATFL',datfl);
sas_kappel
Your revised data makes it much clearer what you're looking for. Here is some code that should give the correct result.
I've used the CALL CATX routine to add new values to DATFL, separated by a /. Before adding a variable name, the code checks that it doesn't already exist in the string.
data mix_ ;
input id $ name $ gender $ scan $;
datalines;
1 jon M M
2 Marc F L
3 james F M
4 jonas H M
;
run;
data _null_;
set mix_ end=eof;
length datfl $100; /*or whatever*/
retain datfl;
array m4{*} $ id name gender scan;
do i = 1 to dim(m4);
if index(m4{i},'M') and not index(datfl,vname(m4{i})) then call catx('/',datfl,vname(m4{i}));
end;
if eof then call symput('DATFL', datfl);
run;
%put datfl = &DATFL.;
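One caveat with the INDEX check is that it matches substrings, so a variable named scan would be treated as already present if a name such as scan2 were in the list. A possible variation, sketched here under the assumption that mix_ exists as created above, uses FINDW to compare whole '/'-separated names and CALL SYMPUTX to strip trailing blanks from the macro variable:

```sas
data _null_;
  set mix_ end=eof;
  length datfl $100;   /* assumed to be long enough for all names */
  retain datfl;
  array m4{*} $ id name gender scan;
  do i = 1 to dim(m4);
    /* FINDW with '/' as the only delimiter matches complete names only; */
    /* TRIM removes the padding so the last name in the list can match   */
    if index(m4{i}, 'M')
       and not findw(trim(datfl), strip(vname(m4{i})), '/')
      then call catx('/', datfl, vname(m4{i}));
  end;
  if eof then call symputx('DATFL', datfl);
run;
%put datfl = &DATFL.;
```

For the revised sample data this should give the same list as the version above, gender/scan/name.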
I'm learning 4GL and I have a small problem with informats.
I have this file:
Imie;Nazwisko;Wiek;indeks;PESEL;Kierunek;Rok;Urodziny;Srednia;Frekwencja
Tomasz;Szan;23;114132;9134765445;Informatyka;5;5.32;99%;14.03.91
Karolina;Herl;21;134294;93543245;;3;4.57;92%;29.09.93
Damian;Kwak;24;189994;1234567890;Informatyka;5;3.50;80%;24.09.90
Ebenezer;Scrooge;AA;882741;78899609;Automatyka;4;3.72;34%;30.02.88
And 4GL code:
DATA projekt.project1;
length PESEL $ 11;
length nazwisko $ 15;
length kierunek $ 15;
INFILE 'c:\lasa_do_sasa\studenty.txt' DLM=';' MISSOVER DSD FIRSTOBS=2;
INPUT imie $ nazwisko $ wiek $ nr_indeksu PESEL $ kierunek $ rok srednia_ocen frekwencja PERCENT3. urodziny ddmmyy8. ;
RUN;
The problem is that when a line contains xx%;date, SAS won't read the date. I'm getting this error:
Invalid data for urodziny
Could anyone help me? I think I'm doing something obviously wrong...
The trick here is to use the : format modifier, which stops SAS from trying to read beyond the next delimiter after the % sign. You can also set the lengths of your other variables on the INPUT statement this way:
data want;
infile cards4 dsd dlm = ';' firstobs = 2;
input imie $ nazwisko :$15. wiek $ indeks $ PESEL :$11. kierunek :$15. rok urodziny srednia :PERCENT3. frekwencja ddmmyy8. ;
format frekwencja ddmmyy8.;
cards4;
Imie;Nazwisko;Wiek;indeks;PESEL;Kierunek;Rok;Urodziny;Srednia;Frekwencja
Tomasz;Szan;23;114132;9134765445;Informatyka;5;5.32;99%;14.03.91
Karolina;Herl;21;134294;93543245;;3;4.57;92%;29.09.93
Damian;Kwak;24;189994;1234567890;Informatyka;5;3.50;80%;24.09.90
Ebenezer;Scrooge;AA;882741;78899609;Automatyka;4;3.72;34%;30.02.88
;;;;
run;
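In the sample rows the values appear in the order average, attendance, birthday (5.32;99%;14.03.91), which differs from the header line. A sketch that keeps the variable names from the question's own INPUT statement, so that each name matches the value it receives, might look like this. The dataset name want2, the display formats, and the use of datalines4 instead of the external file are choices made for illustration.

```sas
data want2;
  infile datalines4 dsd dlm=';' missover firstobs=2;
  /* Colon modifiers: read up to the next ';' and then apply the informat */
  input imie $ nazwisko :$15. wiek $ nr_indeksu PESEL :$11. kierunek :$15.
        rok srednia_ocen frekwencja :percent4. urodziny :ddmmyy8.;
  format frekwencja percent8. urodziny ddmmyy10.;
datalines4;
Imie;Nazwisko;Wiek;indeks;PESEL;Kierunek;Rok;Urodziny;Srednia;Frekwencja
Tomasz;Szan;23;114132;9134765445;Informatyka;5;5.32;99%;14.03.91
Karolina;Herl;21;134294;93543245;;3;4.57;92%;29.09.93
Damian;Kwak;24;189994;1234567890;Informatyka;5;3.50;80%;24.09.90
Ebenezer;Scrooge;AA;882741;78899609;Automatyka;4;3.72;34%;30.02.88
;;;;
run;
```

The last row should still produce an "Invalid data for urodziny" note, because 30.02.88 is not a valid date; in that case the cause is the data rather than the INPUT statement.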