SAS: PROC IMPORT: CSV WITH DATES AS VAR NAMES

I'm importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I'm using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
Then transposing the data to long format, as follows:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from default to date?

Importing and transposing are handy procedures, but if you understand your data well, a little data step program can deal with this in one step:
data want(keep=sedol v_date v_value);
infile have dsd dlm=',' truncover;
informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
format v_date yymmdd10.;
array d(50) d1-d50;
array v(50) v1-v50;
/* Retain the date values and the count of dates */
retain d1-d50 idx;
/* Read header */
if _n_ = 1 then do;
input sedol d1-d50;
/* loop to find how many date columns there are */
do idx=1 to 50 while(d(idx) ne .);
end;
idx = idx - 1; /* must subtract one here */
delete;
end;
/* Read data lines */
input sedol v1-v50;
do i=1 to idx;
v_date = d(i);
v_value = v(i);
output;
end;
run;
As long as your input file is exactly as you describe (a header record with a leading ID variable of at most 8 characters, followed by some number of date values representing columns), this will process up to 50 measurements. It should be easy enough to modify if your needs change.
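To try the data step, the fileref have must point at the CSV file. A minimal sketch, assuming a temporary file and a shortened copy of the sample rows (the TEMP fileref and the three-date header are illustrative assumptions, not part of the original post):

```sas
/* Assumption: a throwaway file stands in for the real CSV */
filename have temp;

data _null_;
  file have;
  /* Header: ID column name followed by dd/mm/yyyy dates */
  put 'SEDOL,12/08/2009,13/08/2009,14/08/2009';
  /* Data rows: one SEDOL and one value per date */
  put 'B1YVN39,7.8431,7.8431,7.8431';
  put 'B00G7R3,3.8,3.61,3.81';
run;
```

For a real file, the FILENAME statement would name the path instead of TEMP.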

In this case I would suggest importing the data and the headers separately.
First, we import data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row with variables' names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: add a letter as the first character (it doesn't matter which one; I chose 'D') and replace all slashes '/' with underscores '_':
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
Put these new 'cleaned' column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
And generate a DATA step for renaming using the CALL EXECUTE routine:
data _null_;
dsid=open("want","i");
num=attrn(dsid,"nvars");
call execute("data want;");
call execute("set want;");
call execute("rename");
do i=1 to num;
call execute(varname(dsid,i)||"="||scan("&names",i," "));
end;
call execute(";run;");
rc=close(dsid);
run;
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
Finally, 'unmask' those dates (delete the leading D and replace _ with /) and convert them to real dates with INPUT(). The RETAIN statement is added just to put the new variable DATE in second place, right after SEDOL.
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;
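The unmasking step can be checked in isolation. A minimal sketch, assuming a single masked name D12_08_2009 (a made-up test value) and using SUBSTR from position 2 instead of blanking the first character:

```sas
data _null_;
  length _name_ $32;
  _name_ = 'D12_08_2009';   /* hypothetical masked column name */
  /* Drop the leading D, turn underscores back into slashes, read as a date */
  date = input(translate(substr(_name_, 2), '/', '_'), ddmmyy10.);
  put date= ddmmyy10.;      /* writes the converted date to the log */
run;
```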

Related

Missing the last few column names when exporting to CSV file

I am trying to export a dataset in my Library/Work. It looks normal in SAS. However, when I export the data as a CSV or txt file (either from right click -> export, or using SAS code), the last few column names are missing (showing empty in the CSV), while the values are kept. The missing column names are all in the format "Log_xxx", but some columns in the same format were exported correctly. There are around 4000+ columns in my dataset.
The code I've tried is like:
proc export data=logdata
outfile="path.csv"
dbms=csv
replace;
run;
I've exported many datasets before, but it's the first time I have this kind of problem. I've tried to restart SAS and it's still not working.
I simply wanted to export the whole dataset completely with all column names and values.
Do you have any ideas?
I don't think PROC EXPORT is the issue. You have to tell SAS that you want to write lines that are longer than 32,767 bytes (the default setting for the LRECL option).
This code works:
data test;
array longname [3500] ;
run;
filename csv temp lrecl=1000000 ;
proc export data=test dbms=csv file=csv ;
run;
So change your code to set the LRECL long enough for all of the variable names.
filename csv "path.csv" lrecl=1000000 ;
proc export data=logdata
outfile=csv
dbms=csv
replace
;
run;
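To judge whether the header line is the culprit, the length of the name row can be estimated from the dictionary tables. A rough sketch, assuming the dataset is WORK.LOGDATA and that names are written unquoted with one comma between them (the macro variable header_len is an illustrative name):

```sas
proc sql noprint;
  /* Sum of name lengths plus one delimiter between each pair of names */
  select sum(length(name)) + count(*) - 1
    into :header_len trimmed
  from dictionary.columns
  where libname = 'WORK' and memname = 'LOGDATA';
quit;

%put NOTE: header line needs roughly &header_len bytes;
```

If the estimate is above 32,767, the LRECL value on the FILENAME statement needs to be at least that large.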
Based on this post, your header is likely exceeding 32k characters, which causes the issues.
The solution is to create the file manually without PROC EXPORT; alternatively, PROC EXPORT to XLSX doesn't appear to have the issue.
*Create demo data;
data class;
set sashelp.class;
label age='Age, Years' weight = 'Weight(lbs)' height='Height, inches';
run;
proc sql noprint;
create table temp as
select name as _name_, label as _label_
from dictionary.columns
where libname="WORK" and upcase(memname)="CLASS";
select nliteral(name) into :varList separated by ' '
from dictionary.columns
where libname="WORK" and upcase(memname)="CLASS";
quit;
data _null_;
file "&sasforum.\datasets\TwoLinesHeader.csv" dsd lrecl = 40000;
set class;
if _n_ = 1 then do;
do until(eof);
set temp end=eof;
put _name_ @;
end;
put;
end;
put (&varList) (:);
run;

SAS proc export without comma thousands

I noticed in the SAS log that when I call a proc export data=mydata outfile="csv.csv" dbms=csv replace; run;, I get a generated internal set which declares a comma data format: comma20.3.
138 format YEAR best12. ;
145 format RATE_SPREAD comma20.3 ;
How can I get proc export not to do this, and to export without comma separators? Eg 9000 instead of 9,000?
Unfortunately PROC EXPORT does not support the FORMAT statement.
You could make a view to the original data with the format removed and export that.
data for_export / view=for_export;
set mydata;
format rate_spread ;
run;
proc export data=for_export outfile="csv.csv" dbms=csv replace;
run;
But you really don't need to use PROC EXPORT to write a CSV file. A data step works just as well. You might have to do a little work to add the header row.
proc transpose data=mydata(obs=0) out=names ;
var _all_;
run;
data _null_;
file "csv.csv" dsd ;
set names;
put _name_ @;
run;
data _null_;
file "csv.csv" dsd mod ;
set mydata;
put (_all_) (+0);
format rate_spread ;
run;

How do I calculate range of a variable in SAS?

I have a table in SAS which has a variable say X.
I want to know only the range of X, I used PROC UNIVARIATE, but it gives out a lot of other information.
I have been trying to use RANGE function in the following way, but doesn't yield any result. Please help!
DATA DATASET2;
SET DATASET1;
R=RANGE(X);
KEEP R;
RUN;
PROC PRINT DATASET2;
RUN;
The RANGE function works across the arguments within a row, but you applied it to a single column, so you probably got zeros.
Within a row, the RANGE function can be used as follows:
R = range(x,y,z);
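A minimal sketch of the row-wise behaviour, with made-up values for three variables:

```sas
data _null_;
  x = 3; y = 10; z = 7;    /* illustrative values */
  /* RANGE returns largest minus smallest of its arguments: 10 - 3 */
  r = range(x, y, z);
  put r=;                  /* writes r=7 to the log */
run;
```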
Within a column, you need to use PROC MEANS:
proc means data=sashelp.class range maxdec=2;
var age;
run;
or by using proc sql as shown below.
proc sql;
select max(age) - min(age) as range
from sashelp.class;
quit;
You can also use the range function in proc sql, where it operates on columns rather than rows:
proc sql;
select range(age) from sashelp.class;
quit;
This is also possible within a data step, if you don't like sql:
data _null_;
set sashelp.class end = eof;
retain min_age max_age;
min_age = min(age,min_age);
max_age = max(age,max_age);
if eof then do;
range = max_age - min_age;
put range= min_age= max_age=;
end;
run;
Or equivalently:
data _null_;
do until (eof);
set sashelp.class end = eof;
min_age = min(age,min_age);
max_age = max(age,max_age);
end;
range = max_age - min_age;
put range= min_age= max_age=;
run;

SAS: Printing monthly and weekly average

How can I print (and export to file) monthly and weekly average of value? The data is stored in a library and the form is following:
Obs. Date Value
1 08FEB2016:00:00:00 29.00
2 05FEB2016:00:00:00 29.30
3 04FEB2016:00:00:00 29.93
4 03FEB2016:00:00:00 28.65
5 02FEB2016:00:00:00 28.40
(...)
3078 08MAR2004:00:00:00 32.59
3079 05MAR2004:00:00:00 32.75
3080 04MAR2004:00:00:00 32.05
3081 03MAR2004:00:00:00 31.82
EDIT: I somehow managed to get the monthly data, but I'm returning the average for each month separately. I would like to have it done as one result, namely Month-Average, and export it to a file or a data set. And I still have no idea how to deal with weeks.
%macro printAvgM(start,end);
proc summary data=sur1.dane(where=(Date>=&start
and Date<=&end)) nway;
var Value;
output out=want (drop=_:) mean=;
proc print;
run;
%mend printAvgM;
%printAvgM('01jan2003'd,'31jan2003'd);
EDIT2: Here is my code, step by step:
libname sur 'C:\myPath';
run;
proc import datafile="C:\myPath\myData.csv"
out=SUR.DANE
dbms=csv replace;
getnames=yes;
run;
proc sort data=sur.dane out=sur.dane;
by Date;
run;
libname sur1 "C:\myPath\myDB.accdb";
run;
proc datasets;
copy in=sur out=sur1;
select dane;
run;
data sur1.dane2;
set sur1.dane;
date2=datepart(Date);
format date2 WEEKV11.;
run;
The last step results in NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables. and the format of dane2 variable is DATETIME19..
Ok, it's small enough to handle easily then. I would recommend first converting your datetime variable to a date variable using DATEPART() function and then use a format within PROC MEANS. You can look up the WEEKU and WEEKV formats to see if they meet your needs. The code below should be enough to get you started. You could do the monthly without the date conversion, but I couldn't find a weekly format for the datetime variable.
*Fake data generated;
data fd;
start=datetime();
do i=1 to 3000000 by 120;
datetime=start+(i-1)*30;
var=rand('normal', 25, 5);
output;
end;
keep datetime var;
format datetime datetime21.;
run;
*Get date variable;
data fd_date;
set fd;
date_var = datepart(datetime);
date_month = put(date_var, yymon7.);
Date_week = put(date_var, weekv11.);
run;
*Monthly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_monthly mean(var)=avg_var std(var)=std_var;
format date_var monyy7.;
run;
*Weekly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_weekly mean(var)=avg_var std(var)=std_var;
format date_var weekv11.;
run;
You could replace date_var with the new monthly and weekly variables, but because these are character variables they won't sort chronologically.
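The monthly summary without the date conversion can be sketched by applying a datetime format in PROC MEANS. This assumes the DTMONYY7. format is acceptable for grouping; the dataset dt_demo and its values are made up for illustration:

```sas
/* Illustrative data: one datetime per day for three months */
data dt_demo;
  do datetime = '01JAN2016:00:00:00'dt to '31MAR2016:00:00:00'dt by 86400;
    var = mod(datetime / 86400, 7);
    output;
  end;
  format datetime datetime19.;
run;

/* CLASS groups on the formatted value, so each month becomes one row */
proc means data=dt_demo noprint nway;
  class datetime;
  var var;
  output out=monthly_dt mean(var)=avg_var;
  format datetime dtmonyy7.;
run;
```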

Create new variables from format values

What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that i need to recode the variables on this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one of data_null_ are working but stackoverflow won't let me pin more than one right answer.
Update to add an underscore to the end of each name. It looks like there is no option in PROC TRANSREG to put an underscore between the variable name and the value of the class variable, so we can just do a temporary rename. Create name=newname pairs to rename the class variables to end in an underscore and to rename them back, using the CAT functions and SQL into macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
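If PROC EXPAND is not available, the missing values can be set to 0 in a data step. A minimal sketch on a small made-up dataset (the names demo and demo_filled and the two dummy columns are illustrative):

```sas
/* Illustrative input with missing values in the dummy columns */
data demo;
  input caseID fumert1_0 fumert1_1;
  datalines;
1 1 .
2 . 1
;
run;

data demo_filled;
  set demo;
  /* The array covers every numeric variable that exists at this point */
  array nums {*} _numeric_;
  do i = 1 to dim(nums);
    if missing(nums{i}) then nums{i} = 0;
  end;
  drop i;
run;
```

Because the array is declared before the loop counter i is created, i is not part of the array.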
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;