I have an Excel file (saved as CSV) where I want to split words into different columns in SAS.
In the file, each record sits in a single column like this. I want to split it into separate columns and get rid of the quotation marks:
ID;"City";"Year"
1;"New york";NULL
2;"stockton";"18"
This is what I tried to do:
data work.project ;
infile "&path\users.csv" delimiter=';' missover dsd;
input ID: $30.
City: $200.
Year: $5. ;
run;
proc print data=work.project;
run;
My output:
Obs ID City Year
1 ,,,"ID ""City"" ""Year
2 ,,,"1 ""new york"" NULL"
3 ,,,"2 ""stockton"" ""18"
4 ,,,"3 ""moscow "" NULL"
Rather than using the colon modifier and informats in the INPUT statement, use an INFORMAT statement.
data work.project;
infile datalines4 delimiter=';' truncover dsd;
informat id $30. city $200. year $4.;
input ID City Year;
datalines4;
1;"New York";NULL
2;"Stockton";"18"
;;;;
run;
proc print data=project;
run;
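A minimal sketch of the same idea applied to the file from the question instead of inline data. It assumes the file really is laid out as shown in the question (one record per line, semicolon-delimited, header on line 1), which the printed output above puts in some doubt. The firstobs=2 option and the NULL cleanup are additions for illustration, not part of the answer above.

```sas
data work.project;
  /* firstobs=2 skips the header line; &path is the macro variable from the question */
  infile "&path\users.csv" delimiter=';' dsd truncover firstobs=2;
  informat id $30. city $200. year $4.;
  input ID City Year;
  /* assumption: the literal text NULL should become a blank value */
  if Year = 'NULL' then Year = ' ';
run;

proc print data=work.project;
run;
```

With DSD in effect, the surrounding quotation marks are removed from City and Year as the values are read.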
I have the following dataset and code:
DATA survey;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
PROC PRINT; RUN;
data work;
set survey;
where '11JAN2007'<= order_date <= '13JAN2007';
proc print data=work;
run;
When I run this code, however, it does not give the desired output. It only gives a table in which order_date is empty for all three rows.
Any thoughts on what goes wrong here?
This would work:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
data work;
set survey;
where '11JAN2007'd<= order_date <= '13JAN2007'd;
run;
proc print data=work;
format order_date date9. ;
run;
See the SAS help topics on dates, informats, etc.
If you want to query based on date, you need to tell SAS that your string is a date. You do this by putting a 'd' after the date string, e.g.
'11JAN2007'd
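A small sketch of why the 'd' suffix matters: a SAS date is stored as a number (days since 01JAN1960), so the WHERE comparison has to be made against a date value, not a character string. The variable name d is made up for illustration.

```sas
data _null_;
  d = '11JAN2007'd;               /* date literal: a numeric SAS date value */
  put 'Raw value:       ' d;      /* written to the log as a plain number */
  put 'Formatted value: ' d date9.;
run;
```

Without the suffix, '11JAN2007' is just a nine-character string, which cannot be compared meaningfully with a numeric date variable.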
I want to calculate 'age last birthday' on a specific evaluation date, given a specific date of birth, using a SAS PROC SQL command.
How can I do this and are there any limitations?
Sample Input
DATA INPUTS;
infile cards dlm=',' dsd;
INPUT DOBDt :DATE9. EvalDt :DATE9. expected;
FORMAT DOBDt date9. EvalDt date9.;
CARDS;
11MAY2009,10MAY2015,5
11MAY2009,11MAY2015,6
11MAY2009,12MAY2015,6
28FEB1984,01DEC2015,31
29FEB1984,28FEB2012,27
29FEB1984,29FEB2012,28
29FEB1984,01MAR2012,28
;
RUN;
The goal is to take DOBDt as an input, evaluate on EvalDt, and produce the value in expected.
This can be done as follows:
PROC SQL
PROC SQL;
CREATE TABLE outputs2 AS
select
*
,intck('year',DOBDt,EvalDt,'c') AS actual
,((calculated actual) eq expected) AS check
FROM
inputs
;
QUIT;
actual, the calculated value, matches expected, the desired outcome, for all the examples provided. I am not aware of any limitations to this approach although there are probably some extreme ages that it cannot calculate due to SAS dates having a limited range of values.
As a bonus:
DATA STEP
DATA outputs;
set inputs;
actual = intck('year',DOBDt,EvalDt,'c');
check = (actual eq expected);
RUN;
This is how we used to do it back in the day. Also "age at last birthday" seems pretty clear to me.
DATA INPUTS;
infile cards dlm=',' dsd;
INPUT DOBDt :DATE9. EvalDt :DATE9. expected;
FORMAT DOBDt date9. EvalDt date9.;
age = year(evaldt)-year(dobdt) - (month(evaldt) eq month(dobdt) and day(evaldt) lt day(dobdt)) - (month(evaldt) lt month(dobdt));
CARDS;
11MAY2009,10MAY2015,5
11MAY2009,11MAY2015,6
11MAY2009,12MAY2015,6
28FEB1984,01DEC2015,31
29FEB1984,28FEB2012,27
29FEB1984,29FEB2012,28
29FEB1984,01MAR2012,28
;;;;
RUN;
proc print;
run;
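The sample data only evaluates the 29 February birth date in a leap year. The following is a sketch for checking one possible limitation, not a verified result: it prints what the INTCK approach and the arithmetic approach each return around 28 February of a non-leap year, so the two can be compared. The names leapcheck, age_intck and age_arith are made up for illustration.

```sas
data leapcheck;
  format DOBDt EvalDt date9.;
  DOBDt = '29FEB1984'd;
  /* evaluation dates around the birthday in a non-leap year */
  do EvalDt = '27FEB2011'd, '28FEB2011'd, '01MAR2011'd;
    /* continuous method, as in the PROC SQL and DATA step answers */
    age_intck = intck('year', DOBDt, EvalDt, 'c');
    /* year/month/day arithmetic, as in the older approach */
    age_arith = year(EvalDt) - year(DOBDt)
              - (month(EvalDt) eq month(DOBDt) and day(EvalDt) lt day(DOBDt))
              - (month(EvalDt) lt month(DOBDt));
    output;
  end;
run;

proc print data=leapcheck;
run;
```

If the two columns differ on 28FEB2011, which one is right depends on whether a leap-day birthday should count as reached on 28 February or on 1 March in a non-leap year.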
My datalines have 2 variables, date1 and date2, in the formats ddmmyy10. and mmddyy10. respectively.
data date;
input date1 ddmmyy10. date2 mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;
I tried adding the following line, but it still does not work:
infile datalines delimiter=' ';
The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column, the defined length of the variable (character only), or the end of the data line, whichever comes first.
Format Modifier
data date;
input date1: ddmmyy10. date2: mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;
The code is not working because it reads space-delimited values with formatted input, which ignores the delimiter and simply reads a fixed number of columns. To read non-standard data with list input, you need the colon input modifier or an INFORMAT statement.
For non-standard data (values containing commas, dollar signs, dates, etc.; see Reading Raw Data -> Kinds of Data in the SAS documentation) or standard character data longer than 8 bytes, list input needs either an INFORMAT statement or the colon modifier in the INPUT statement.
1) Assign informats to the input variables using an INFORMAT statement or an ATTRIB statement
data date;
informat date1 ddmmyy10. date2 mmddyy10.;
input date1 date2;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
2) Use Colon (:) input modifier
data date;
input date1: ddmmyy10. date2: mmddyy10.;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
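A third option, offered as a sketch and not taken from the answers above, is to keep formatted input and step over the separating blank with a column pointer control. In the original code the first informat reads columns 1-10, so the second one starts at column 11 (the blank) and cuts off the last digit of the year. The +1 moves the pointer past that blank. This assumes every value is exactly 10 characters wide, as in the sample data.

```sas
data date;
  /* +1 skips the single blank between the two 10-character fields */
  input date1 ddmmyy10. +1 date2 mmddyy10.;
  format date1 date2 yymmdd10.;
  datalines;
09/01/2015 01/09/2015
13/01/2015 01/13/2015
;
run;
```

The colon modifier is more forgiving when field widths vary, which is why it is usually the safer choice for delimited data.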
I want to read the following dat file into SAS. Since the names and values are separated by 2 spaces I use the ampersand in the input statement. But it seems that the DLM='/' in the infile statement conflicts with it. Can someone tell me what the mistake in my code is?
File:
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
Code:
data mylib.D_report;
infile Dinning dlm='/' dsd missover;
input ID 1-4 Name & $17. M1-M6;
run;
You're mixing input styles, which is understandable given how mixed your input data is, but it isn't permitted the way you're doing it.
Your best option is to read M1-M6 into one variable, then split it up using SCAN.
data work.D_report;
infile datalines missover dlm=' ';
input ID :4.
Name & $17.
Ms :$40.;
array M[6];
do _t = 1 to countc(Ms,'/')+1;
if _t > dim(M) then leave;
M[_t]=scan(Ms,_t,'/','m');
end;
datalines;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
;;;;
run;
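A variation on the SCAN approach above, as a sketch only: the assignment M[_t]=scan(...) relies on automatic character-to-numeric conversion, which works but writes a conversion note to the log. Converting explicitly with INPUT avoids that. The dataset name D_report2 and the DROP statement are additions for illustration, and the data lines assume two blanks between the name and the values, as described in the question.

```sas
data work.D_report2;
  infile datalines truncover dlm=' ';
  input ID :4.
        Name & $17.      /* & reads until two consecutive blanks */
        Ms :$40.;        /* the whole slash-separated list as one string */
  array M[6];
  do _t = 1 to min(dim(M), countc(Ms,'/')+1);
    /* 'm' keeps empty tokens, so // yields a missing value;   */
    /* ?? suppresses invalid-data messages for those tokens    */
    M[_t] = input(scan(Ms,_t,'/','m'), ?? best32.);
  end;
  drop _t Ms;
datalines;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
;;;;
run;
```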
You just need to change the delimiter.
data D_report;
dlm = ' ';
infile cards dlm=dlm missover dsd;
input ID 1-4 Name & $17. @;
dlm = '/';
input M1-M6;
cards;
1118 ART CONTUCK  57.69/65.20/120.50//152.60
2287 MICHAEL WINSTONE  145.89
run;
proc print;
run;
I have the below raw data
1,,35,000
2,100,45,000
and need the below in a dataset
1 . 35000
2 100 45000
This would require both the DSD option and the comma. informat.
How can I carry this out?
DSD alone does not solve this. DSD protects embedded commas only when the field is quoted, as in input like
1,,"35,000"
2,100,"45,000"
If that is what you have, then you can use the : operator to read it in with the comma informat.
data test;
infile datalines dlm=',' dsd;
input id
num
dollar :comma8.;
datalines;
1,,"35,000"
2,100,"45,000"
;;;;
run;
If you do not have the quotes around the field, then you will need to parse this somehow. One solution is below, which will work as long as the field with commas is the final field.
data test;
infile datalines dlm=',' dsd;
input @;
if countc(_infile_,',') =3 then do;
_commapos = findc(_infile_,',',-1*length(_infile_));
_infile_ = substr(_infile_,1,_commapos-1)||substr(_infile_,_commapos+1);
end;
input id
num
dollar ;
put _all_;
datalines;
1,,35,000
2,100,45,000
;;;;
run;
If the field that may contain the comma is a consistent field, but NOT the last one, you can modify the above solution to handle it. If the comma can appear in more than one field, you have a much more difficult problem to solve.
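For the case where the comma-containing field is the final one, here is an alternative sketch that avoids counting commas: read the leading fields with list input, then convert everything after the second delimiter with the COMMA informat. It assumes exactly two fields precede the amount; the names test2 and _pos are made up for illustration.

```sas
data test2;
  infile datalines dlm=',' dsd truncover;
  input id num @;                      /* hold the line so _infile_ stays available */
  /* position of the second comma: search again, starting after the first one */
  _pos = findc(_infile_, ',', findc(_infile_, ',') + 1);
  /* everything after the second comma is the amount, thousands separators included */
  dollar = input(substr(_infile_, _pos + 1), comma32.);
  drop _pos;
datalines;
1,,35,000
2,100,45,000
;;;;
run;
```

Because the COMMA informat strips embedded commas itself, this form should also cope with amounts that contain more than one thousands separator.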