Can you only use the END= option in SAS with the SET statement? For example, why isn't this working?
filename FS '/folders/myfolders/list4.txt';
data steward;
infile FS dlm = ',' END = EOF;
input Name $ Age Gender $;
if EOF = 1;
run;
Most SAS data steps actually stop when the INPUT or SET statement reads past the end of the file.
I suspect that your input file is either empty or does not have enough data to satisfy your INPUT statement.
You don't need an EOF check or a subsetting IF to stop the step, as the data step will terminate automatically once it reads past the last record.
Solution:
DATA WORK.input1;
LENGTH
name $ 5
age 8
gender $ 1 ;
FORMAT
name $CHAR5.
age BEST2.
gender $CHAR1. ;
INFORMAT
name $CHAR5.
age BEST2.
gender $CHAR1. ;
INFILE 'E:\saswork\Input.txt'
LRECL=256
FIRSTOBS=2 /* skip the first row, as it contains the column names */
ENCODING="WLATIN1"
DLM='2c'x /* this is "," delimiter; I am using windows*/
MISSOVER
DSD ;
INPUT
name : $CHAR5.
age : ?? BEST2.
gender : $CHAR1. ;
put _all_;
RUN;
/*Contents of the Input.txt*/
/*name, age, gender*/
/*jack,32,M*/
/*John,45,M*/
/*Sally,38,F*/
Output:
name=jack age=32 gender=M _ERROR_=0 _N_=1
name=John age=45 gender=M _ERROR_=0 _N_=2
name=Sally age=38 gender=F _ERROR_=0 _N_=3
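To answer the original question directly: END= is not limited to SET, and INFILE accepts it as well. The sketch below is only an illustration under a few assumptions: the fileref `demo` and its three records are made up, and a temporary file is used because, as far as I know, END= cannot be combined with in-stream DATALINES.

```sas
/* Hypothetical scratch file so the example is self-contained */
filename demo temp;

data _null_;
  file demo;
  put 'jack,32,M';
  put 'John,45,M';
  put 'Sally,38,F';
run;

data steward;
  infile demo dlm=',' end=eof;
  input Name $ Age Gender $;
  /* eof is set to 1 while the last record is being processed */
  if eof then put 'Last record read: ' Name= Age= Gender=;
run;
```

Unlike the subsetting `if EOF = 1;` in the question, this keeps every record in the output dataset and only writes a log message for the final one.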
I want to create a variable that holds the character immediately before a specified character (*) in a string. But when the specified character appears several times in the string (as in the example below), how can I build one variable that concatenates all of the preceding characters, separated by commas?
Example:
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;
run;
Code:
data want;
set have;
cnt = count(string, "*");
_startpos = 0;
do i=0 to cnt until(_startpos=0);
before = catx(",",substr(string, find(string, "*", _startpos+1)-1,1));
end;
drop i _startpos;
run;
That code outputs before=C for both the first and the second observation. However, I want before=C,E for the first observation and before=C,W,d for the second.
You can use a Perl regular expression replacement pattern to transform the original string.
Example:
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;
data want;
set have;
csl = prxchange('s/([^*]*?)([^*])\*/$2,/',-1,string); /* comma separated letters */
csl = prxchange('s/, *$//',1,csl); /* remove trailing comma */
run;
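To see what each PRXCHANGE call contributes, the two steps can be run on a single literal value. This is just a sketch for inspection; the variable names `step1` and `step2` are made up for the illustration.

```sas
data _null_;
  string = 'ABC*EDE*';
  /* Step 1: keep only the character in front of each asterisk, followed by a comma */
  step1 = prxchange('s/([^*]*?)([^*])\*/$2,/', -1, string);
  /* Step 2: drop the final comma together with any trailing blanks */
  step2 = prxchange('s/, *$//', 1, step1);
  put step1= step2=;
run;
```

The log line should show the intermediate value still ending in a comma and the final value without it.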
Make sure to increment _STARTPOS so your loop will finish. You can use CATX() to add the commas. Simplify selecting the character by using CHAR() instead of SUBSTR(). Also make sure to tell the data step how to define the new variable (with a LENGTH statement) instead of forcing it to guess. I also included a test to handle the situation where * is in the first position.
data have;
input string $20.;
datalines;
ABC*EDE*
EFCC*W*d*
*XXXX*
asdf
;
data want;
set have;
length before $20 ;
_startpos = 0;
do cnt=0 to length(string) until(_startpos=0);
_startpos = find(string,'*',_startpos+1);
if _startpos>1 then before = catx(',',before,char(string,_startpos-1));
end;
cnt=cnt-(string=:'*');
drop _startpos;
run;
Results:
Obs string before cnt
1 ABC*EDE* C,E 2
2 EFCC*W*d* C,W,d 3
3 *XXXX* X 1
4 asdf 0
CALL SCAN is also a good choice for getting the position of each *.
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
****
asdf
;
data want;
length before $20.;
set have;
do i = 1 to count(string,'*');
call scan(string,i,pos,len,'*');
before = catx(',',before,substrn(string,pos+len-1,1));
end;
put _n_ = +7 before=;
run;
Result:
_N_=1 before=C,E
_N_=2 before=C,W,d
_N_=3 before=
_N_=4 before=
I have a large data file with data in the following format: country, datatype, year1month1 to year2018month7.
Reading the data using PROC IMPORT did not work for all data fields. I ended up modifying the SAS data step code to ensure the data formats were correct.
However, I am having trouble simplifying the code. I would like a DO loop to go through all the years and months. That way I could use the current date to work out the range of dates in the file, and the code that creates the year/month variables would not have to be repeated 100 times.
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
informat _(i)M(j) best32.;
end;
end;
informat Base_Year $1. ;
format Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
format _(i)M(j) best12.;
end;
end;
format Base_Year $1. ;
input
Country_Name $
do i = 1940 to 2018;
do j = 1 to 12;
_(i)M(j) $;
end;
end;
Base_Year $;
run;
There are a few approaches here that could work. The most directly translatable to your approach is to use the macro language.
You need to translate those two loops to something like this:
%do i = 1940 %to 2018;
%do j = 1 %to 12;
informat _&i.M&j. best32.;
%end;
%end;
Notice the % prefixes. These %DO loops also have to be inside a macro definition; you can't use them in normal data step code.
I would rewrite it to use a macro like so:
%macro make_ym(startyear=, endyear=, separator=);
%local i j;
%do i = &startyear. %to &endyear.;
%do j = 1 %to 12;
_&i.&separator.&j.
%end;
%end;
%mend make_ym;
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
informat %make_ym(startyear=1940,endyear=2018,separator=M) best32.;
informat Base_Year $1. ;
format %make_ym(startyear=1940,endyear=2018,separator=M) best12.;
format Base_Year $1. ;
input
Country_Name $
%make_ym(startyear=1940,endyear=2018,separator=M)
Base_Year $;
run;
I took out the $ after the yMm bits in the input since you declared them as numeric.
Don't model your data step after the code generated by PROC IMPORT. It does a lot of useless things, like attaching formats and informats to variables that don't need them.
For your problem you just need a simple program like this:
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. Y1940M01 .... Y2018M08 Base_Year :$1. ;
run;
Now the only tricky part is building that list of numeric variable names. If the list is small enough you can just put it into a macro variable. Fortunately that is not a problem here: with 8-character names (YyyyyMmm) plus a separating space, each year needs 108 bytes, so a data step character variable (maximum 32,767 bytes) has room for over 300 years' worth. A variable of length 10,800 bytes has room for 100 years of month names.
So just run this data step first.
data _null_;
length names $10800 ;
basedate = mdy(1,1,1940);
lastdate = today();
do i=0 to intck('month',basedate,lastdate);
date=intnx('month',basedate,i);
names=catx(' ',names,cats('Y',year(date),'M',put(month(date),Z2.)));
end;
call symputx('names',names);
run;
Now you can use the macro variable in your INPUT statement.
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. &names Base_Year :$1. ;
run;
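To check the name-building logic before using it in the INPUT statement, the same loop can be run over a short, fixed range and the result printed to the log. This is only a sketch: the three-month range and the macro variable name `demo_names` are made up for the test.

```sas
data _null_;
  length names $100;
  basedate = mdy(1, 1, 2018);   /* hypothetical short range: Jan to Mar 2018 */
  lastdate = mdy(3, 1, 2018);
  do i = 0 to intck('month', basedate, lastdate);
    date = intnx('month', basedate, i);
    /* Build names like Y2018M01, zero-padding the month with Z2. */
    names = catx(' ', names, cats('Y', year(date), 'M', put(month(date), z2.)));
  end;
  call symputx('demo_names', names);
run;

/* Writes DEMO_NAMES=Y2018M01 Y2018M02 Y2018M03 to the log */
%put &=demo_names;
```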
I am confused about what DSD actually does in terms of "moving the pointer" and reading in data. To better explain, look at the following code:
data one;
infile cards dlm=',' TRUNCOVER ; /*using dlm','*/
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
data two;
infile cards dsd TRUNCOVER ;
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
The dataset one contains values for A and B of 5000 and dr but the dataset two contains values of A as missing whereas B and C are 5000 and dr. I don't get why the dsd sets A to missing.
Thanks!
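Your problem is not DLM or DSD. It is "DATE DDMMYY10.": that is formatted input, which is not compatible with delimited (list) input in any form, with DSD or without. Formatted input reads exactly 10 columns and leaves the pointer on the comma after the date. Without DSD, the next list input skips that leading delimiter, so A gets 5000; with DSD, the comma at the pointer is treated as the end of an empty field, so A is missing.
You need an INFORMAT statement or the colon (:) informat modifier.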
date :DDMMYY10.
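Applied to the second data step from the question, the change might look like the sketch below. The dataset name `two_fixed`, the FORMAT statement and the PUT statement are additions for the illustration only.

```sas
data two_fixed;
  infile cards dsd truncover;
  /* The colon modifier makes the date a delimited (list) read using the informat */
  input cust_id date :ddmmyy10. A $ B $ C $;
  format date ddmmyy10.;
  put cust_id= date= A= B= C=;
  cards;
1,10/01/2015,5000,dr
;
run;
```

With the colon modifier in place, A and B should come out as 5000 and dr, and C should be missing because the record has only four fields.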
Here is some simple code to read two files:
filename one "C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 one.txt" ;
filename two "C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 two.txt" ;
data test ;
input extfile $ ;
infile dummy filevar=extfile end=last ;
do until (last) ;
input name $ score ;
output ;
end ;
datalines ;
one
two
;
run ;
proc print ;
run ;
Why do I get this error? How can I improve my file references?
You cannot refer to a file assigned with a FILENAME statement through FILEVAR=. The FILEVAR= variable must hold the physical file name, so use the full path to the files.
data test;
infile datalines dsd;
length extfile $128;
input extfile $;
infile dummy filevar=extfile end=last;
do until (last);
input name $ score;
output;
end;
datalines;
"C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 one.txt"
"C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 two.txt"
;
run;
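If you would rather keep the FILENAME statements, one possible alternative (not part of the answer above) is to look up the physical path behind each fileref with the PATHNAME function and pass that to FILEVAR=. A minimal sketch, assuming the two filerefs from the question; the variable name `fileref` is made up for the illustration:

```sas
filename one "C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 one.txt";
filename two "C:\Users\Owner\Desktop\SAS\nbExce\ch1\1-12 two.txt";

data test;
  length fileref $8 extfile $256;
  input fileref $;                 /* read the fileref name from the in-stream data */
  extfile = pathname(fileref);     /* resolve it to the physical path */
  infile dummy filevar=extfile end=last;
  do until (last);
    input name $ score;
    output;
  end;
  drop fileref;
  datalines;
one
two
;
run;
```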
This seems straightforward, but it's not working as expected:
data names;
input name $12.;
cards;
John
Jacob
Jingleheimer
Schmidt
;
run;
data names;
length namelist $100.;
set names end=eof;
retain namelist;
if _n_=1 then namelist=name;
else namelist = namelist || "|" || name;
if eof then output;
run;
I would expect the result to have one observation containing
John|Jacob|Jingleheimer|Schmidt
but namelist is just John. What am I doing wrong?
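You need to trim the trailing blanks before concatenating to your list. namelist is padded with blanks to its full length of 100, so anything appended with || lands beyond position 100 and is truncated when the result is assigned back.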
data names;
length namelist $100.;
set names end=eof;
retain namelist;
if _n_=1 then namelist=trim(name);
else namelist = trim(namelist) || "|" || trim(name);
if eof then output;
run;
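The effect of the padding can be checked by measuring the concatenated expressions directly. A small sketch, with the variable names `untrimmed_len` and `trimmed_len` made up for the illustration:

```sas
data _null_;
  length namelist $100 name $12;
  namelist = 'John';
  name = 'Jacob';
  /* Untrimmed: 100 + 1 + 12 characters, so the bar sits at position 101 */
  untrimmed_len = lengthc(namelist || "|" || name);
  /* Trimmed: 4 + 1 + 5 characters */
  trimmed_len = lengthc(trim(namelist) || "|" || trim(name));
  /* Writes untrimmed_len=113 trimmed_len=10 to the log */
  put untrimmed_len= trimmed_len=;
run;
```

Because the untrimmed result is longer than 100 characters and the first 100 are just the old padded value, assigning it back to namelist leaves the variable unchanged.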
You could also use the cats() function (which does the trimming and concatenation for you):
data names;
length namelist $100.;
set names end=eof;
retain namelist;
if _n_=1 then namelist=name;
else namelist = cats(namelist,"|",name);
if eof then output;
run;
If you added STRIP to your assignment:
strip(namelist) || "|" || name
it would also work (but CATS is a really good solution).
Using the catx function allows you to specify the delimiter...
data names;
length namelist $100.;
set names end=eof;
retain namelist;
namelist = catx("|",namelist,name);
if eof then output;
run;