Simple question.
PROC IMPORT OUT= braw.address
DATAFILE= "&path.\address_data.csv"
DBMS=csv REPLACE;
GETNAMES=YES;
RUN;
This statement will create the dataset columns as character or numeric depending on the values, which is smart, but not what I want.
I want to import them all as character, to make for easier regex evaluation.
Is there a simple way to do this?
I would generally just write my own input statement for the CSV, then you can make them whatever you want.
IE:
data braw.address;
infile "&path.\address_data.csv" dlm=',' dsd missover;
input
field1 $
field2 $
....
;
run;
You can use the log from the PROC IMPORT to generate this the first time and just edit it to contain $ for each variable.
If you do not want to write a SAS macro to read all the columns as character, you could try a "cheat". Manually edit the file and duplicate the first row (the one containing column headers. Since those will most likely all be character strings, SAS should import all the columns as character.
Of course, a macro to do this would not be that difficult. You can try something like this:
%macro readme(dsn,fn);
/* Macro to read all columns of a CSV as character */
/* Parameters: */
/* DSN - The name of the SAS data set to create */
/* FN - The external file to read (quoted) */
/* Example: */
/* %readme(want, 'c:\temp\tempfile.csv'); */
data _null_;
infile &fn;
input;
i = 1;
length headers inputstr $200;
headers = compress(_infile_,"'");
newvar = scan(headers,1,',');
do until (newvar = ' ');
inputstr = trim(inputstr) || ' ' || trim(newvar) || ' $';
i + 1;
newvar = scan(headers,i,',');
end;
call symput('inputstr',inputstr);
stop;
run;
data &dsn;
infile &fn firstobs=2 dsd dlm=',' truncover;
input &inputstr.;
run;
%mend;
%readme(want, 'c:\temp\tempfile.csv');
Here is my macro to read dlm file with all vars as char:
%MACRO ImportText(file,dsn,dlm);
* Read data use proc import to get variable name and length;
PROC IMPORT DATAFILE="&file" OUT=temp DBMS=dlm REPLACE;
DELIMITER = &dlm;
GETNAMES = YES;
GUESSINGROWS = 32767;
RUN;
* Put variable names into macro variable;
PROC CONTENTS DATA=temp out=vars NOPRINT; RUN;
PROC SQL NOPRINT;
SELECT CATT(name,' : $',length,'.') INTO :vars SEPARATED BY ' ' FROM vars ORDER BY varnum;
QUIT;
* Read real data;
DATA &dsn;
INFILE "&file" DELIMITER=&dlm MISSOVER DSD FIRSTOBS=2 LRECL=32767;
INPUT &vars;
RUN;
%MEND;
Related
I have many datasets for each month with the same name, changing just the end with specific month so for instance my datasets that i am calling with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all datasets but some of them doesn't exist, so when i run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("¤tmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping on every file to see whether it exist or not, why don't you just extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the where statement to only extract the data that you wish to append. Just remember to put double quotes if you specify a macro-variable in it.
Is there a general purpose way of concatenating each variable in an observation into one larger variable whilst preserving the format of numeric/currency fields in terms of how it looks when you do a proc print on the dataset. (see sashelp.shoes for example)
Here is some code you can run, as you can see when looking at the log, using the catx function to produce a comma separated output removes both the $ currency sign as well as the period from the numeric variables
proc print data=sashelp.shoes (obs=10);
run;
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
Any solution needs to be general purpose as it will run on disparate datasets with differently formatted variables.
You can manually loop over PDV variables of the data set, concatenating each formatted value retrieved with vvaluex. A hash can be used to track which variables of the data set to process. If you are comma separating values you will probably want to double quote formatted values that contain a comma.
data want;
set sashelp.cars indsname=_data;
if _n_ = 1 then do;
declare hash vars();
length _varnum 8 _varname $32;
vars.defineKey('_n_');
vars.defineData('_varname');
vars.defineDone();
_dsid = open(_data);
do _n_ = 1 to attrn(_dsid,'NVAR');
rc = vars.add(key:_n_,data:varname(_dsid,_n_));
end;
_dsid = close(_dsid);
call missing (of _:);
end;
format weight comma7.;
length allcat $32000 _vvx $32000;
do _n_ = 1 to vars.NUM_ITEMS;
vars.find();
_vvx = strip(vvaluex(_varname));
if index(_vvx,",") then _vvx = quote(strip(_vvx));
if _n_ = 1
then allcat = _vvx;
else allcat = cats(allcat,',',_vvx);
end;
drop _:;
run;
You can use import and export to csv file:
filename tem temp;
proc export data=sashelp.SHOES file=tem dbms=csv replace;
run;
data l;
length all $ 200;
infile tem truncover firstobs=2;
input all 1-200;
run;
P.S.
If you need concatenate only char, uou can create array of all CHARACTER columns in dataset, and just iterate thru:
data l;
length all $ 5000;
set sashelp.SHOES;
array ch [*] _CHARACTER_;
do i = 1 to dim(ch);
all=catx(',',all,ch[i]);
end;
run;
The PUT statement is the easiest way to do that. You don't need to know the variables names as you can use the _all_ variable list.
put (_all_) (+0);
It will honor the formats attached the variables and if you have used DSD option on the FILE statement then the result is a delimited list.
What is the ultimate goal of this exercise? If you want to create a file you can just write the file directly.
data _null_;
set sashelp.shoes(obs=3);
file 'myfile.csv' dsd ;
put (_all_) (+0);
run;
If you really do want to get that string into a dataset variable there is no need to invent some new function. Just take advantage of the PUT statements abilities by creating a file and then reading the lines from the file.
filename junk temp;
data _null_;
set sashelp.shoes(obs=3);
file junk dsd ;
put (_all_) (+0);
run;
data stuff ;
set sashelp.shoes(obs=3);
infile junk truncover ;
input all $5000.;
run;
You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.
filename junk temp;
data stuff;
set sashelp.shoes(obs=3);
file junk dsd lrecl=5000 ;
length all $5000;
put #1 (_all_) (+0) +(-2) ' ' #;
all = _file_;
output;
all=' ';
put #1 all $5000. #;
run;
Solution with vvalue and concat function (||):
It is similar with 'solution without catx' (the last one), but it is simplified by vvalue function instead put.
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for concat function (||)*/
proc sql;
select ('strip(vvalue('|| strip(name) ||'))') into :varstr4 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step to concat all variables*/
data stuff2;
format all $5000.;
set work.wocatx ;
all = &varstr4. ;
put all;
run;
Solution with catx:
proc print data=SASHELP.SHOES;
run;
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
If there isn't in dictionary.columns format, then in macro variable varstr2 will just name, if there is format, then when it would call in catx it will convert in format, that you need, for example,if variable is num type then put(Sales,DOLLAR12.), or if it char type then input function . You could add any conditions in select into if you need.
If there is no need of using of input function just change select:
ifc(strip(format) is missing,strip(name),'put('|| strip(name) ||','|| strip(format) ||')')
Solution without catx:
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for catx*/
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*data step with catx*/
data stuff;
format all $5000.;
set work.wocatx ;
all = catx(',',&varstr2.) ;
put all;
run;
/*Macro variable for concat function (||)*/
proc sql;
select ifc(strip(format) is missing,
'strip(' || strip(name) || ')',
'strip(put('|| strip(name) ||','|| strip(format) ||'))') into :varstr3 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step without catx*/
data stuff1;
format all $5000.;
set work.wocatx ;
all = &varstr3. ;
put all;
run;
Result with catx and missing values:
Result without catx and with missing values:
I have a large data file with data in the following format: country, datatype, year1month1 to year2018month7.
Reading the data using proc import did not work for all data fields. I ended up modifying the SAS datastep code to ensure data format was correct.
However, I am having trouble simplifying the code, namely I would like a do loop to go through all the years and month. This way, I could use current date to figure out the range of dates for the file and the code to create Year/Month variable does not have to repeat 100 times in the file.
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
informat _(i)M(j) best32.;
end;
end;
informat Base_Year $1. ;
format Country_Name $34. ;
do i = 1940 to 2018;
do j = 1 to 12;
format _(i)M(j) best12.;
end;
end;
format Base_Year $1. ;
input
Country_Name $
do i = 1940 to 2018;
do j = 1 to 12;
_(i)M(j) $;
end;
end;
Base_Year $;
run;
There are a few approaches here that could work. The most directly translatable to your approach is to use the macro language.
You need to translate those two loops to something like this:
%do i = 1940 %to 2018;
%do j = 1 %to 12;
informat _&i.M&j. best32.;
%end;
%end;
Notice the % there. This also has to be in a macro; you can't do this in normal datastep code.
I would rewrite it to use a macro like so:
%macro make_ym(startyear=, endyear=, separator=);
%local i j;
%do i = &startyear. %to &endyear.;
%do j = 1 %to 12;
_&i.&separator.&j.
%end;
%end;
%mend make_ym;
data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Country_Name $34. ;
informat %make_ym(startyear=1940,endyear=2018,separator=M) best32.;
informat Base_Year $1. ;
format %make_ym(startyear=1940,endyear=2018,separator=M) best12.;
format Base_Year $1. ;
input
Country_Name $
%make_ym(startyear=1940,endyear=2018,separator=M)
Base_Year $;
run;
I took out the $ after the yMm bits in the input since you declared them as numeric.
Don't model your data step after the code generated by PROC IMPORT. It does a lot of useless things, like attaching formats and informats to variables that don't need them.
For your problem you just need a simple program like this:
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. Y1940M01 .... Y2018M08 Base_Year :$1. ;
run;
Now the only tricky part is building that list of numerical variables. If the list is small enough you could just put it into a macro variable. Fortunately that is not a problem in this case since using 8 character names (YyyyyMmm) there is room for over 300 years worth in a data step character variable. A variable of length 10,800 bytes should have room for 100 years of month names.
So just run this data step first.
data _null_;
length names $10800 ;
basedate = mdy(1,1,1940);
lastdate = today();
do i=0 to intck('month',basedate,lastdate);
date=intnx('month',basedate,i);
names=catx(' ',names,cats('Y',year(date),'M',put(month(date),Z2.)));
end;
call symputx('names',names);
run;
Now you can use the macro variable in your INPUT statement.
data test;
infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
input Country_Name :$34. &names Base_Year :$1. ;
run;
%let dirname = C:\Users\data;
filename DIRLIST pipe 'dir/B &dirname\*.dbf';
/* Create a data set with one observation for each file name */
data dirlist;
length fname $8.;
infile dirlist length=reclen;
input fname $8.;
run;
data all_text (drop=fname);
set dirlist;
filepath = "&dirname\"||fname||".dbf";
infile dummy filevar = filepath length=reclen end=done missover;
do while(not done);
INPUT
F1 : 2.
F2 : 2.
F3 : 2.
F4 : 10.
F5 : 4.;
output;
end;
run;
The problem is that it is only reading the first line of each files and not the whole file before moving to the next. Also variable F1 are shown as missing.
Suggestions are welcome
So a standard proc import would be:
proc import out=sample1 datafile="path to dbf file.dbf" dbms=DBF replace;
run;
The problem now, is how to generate this set of code for every file in your file list. Using the CALL EXECUTE statement from #Tom is your best bet. You call also create a small macro and call it for each filename, using CALL EXECUTE. If you're new to SAS this can be easier to understand.
*Create a macro that imports the DBF
%macro import_dbf(input= , output=);
proc import out=&out datafile="&output" dbms=DBF replace;
run;
%mend;
Then call macro from dataset. I'm naming the datasets DBF001, DBF0002 etc.
%let dirname=C:\_localdata;
data dirlist;
informat fname $20.;
input fname;
cards;
data1.dbf
data2.dbf
data3.dbf
data4.dbf
;
run;
data out;
set dirlist;
str=catt('%import_dbf(input="', "&dirname", '\', fname, '", output=dbf',
put(_n_, z4.), ');');
run;
proc print data=out;
run;
Import them one by one and then combine them.
%let dirname = C:\Users\data;
data filelist ;
infile "dir /b &dirname\*.dbf" pipe truncover end=eof;
fileno + 1;
input fname $256. ;
tempname = 'temp'||put(fileno,z4.);
call execute(catx(' ','proc import replace dbms=dbf'
,'out=',tempname,'datafile=',quote(trim(fname)),';run;'
));
if eof then call symputx('lastname',tempname);
run;
data want ;
set temp0001-&lastname;
run;
I am trying to parse a delimited dataset with over 300 fields. Instead of listing all the input fields like
data test;
infile "delimited_filename.txt"
DSD delimiter="|" lrecl=32767 STOPOVER;
input field_A:$200.
field_B :$200.
field_C:$200.
/*continues on */
;
I am thinking I can dump all the field names into a file, read in as a sas dataset, and populate the input fields - this also gives me the dynamic control if any of the field names changes (add/remove) in the dataset. What would be some good ways to accomplish this?
Thank you very much - I just started sas, still trying to wrap my head around it.
This worked for me - Basically "write" data open code using macro language and run it.
Note: my indata_header_file contains 5 columns: Variable_Name, Variable_Length, Variable_Type, Variable_Label, and Notes.
%macro ReadDsFromFile(filename_to_process, indata_header_file, out_dsname);
%local filename_to_process indata_header_file out_dsname;
/* This macro var contain code to read data file*/
%local read_code input_in_line;
%put *** Processing file: &filename_to_process ...;
/* Read in the header file */
proc import OUT = ds_header
DATAFILE = &indata_header_file.
DBMS = EXCEL REPLACE; /* REPLACE flag */
SHEET = "Names";
GETNAMES = YES;
MIXED = NO;
SCANTEXT = YES;
run;
%let id = %sysfunc(open(ds_header));
%let NOBS = %sysfunc(attrn(&id.,NOBS));
%syscall set(id);
/*
Generates:
data &out_dsname.;
infile "&filename_to_process."
DSD delimiter="|" lrecl=32767 STOPOVER FIRSTOBS=3;
input
'7C'x
*/
%let read_code = data &out_dsname. %str(;)
infile &filename_to_process.
DSD delimiter=%str("|") lrecl=32767 STOPOVER %str(;)
input ;
/*
Generates:
<field_name> : $<field_length>;
*/
%do i = 1 %to &NObs;
%let rc = %sysfunc(fetchobs(&id., &i));
%let VAR_NAME = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Name)) ));
%let VAR_LENGTH = %sysfunc(getvarn(&id., %sysfunc(varnum(&id., Variable_Length)) ));
%let VAR_TYPE = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Type)) ));
%let VAR_LABEL = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Label)) ));
%let VAR_NOTES = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Notes)) ));
%if %upcase(%trim(&VAR_TYPE.)) eq CHAR %then
%let input_in_line = &VAR_NAME :$&VAR_LENGTH..;
%else
%let input_in_line = &VAR_NAME :&VAR_LENGTH.;
/* append in_line statment to main macro var*/
%let read_code = &read_code. &input_in_line. ;
%end;
/* Close the fid */
%let rc = %sysfunc(close(&id));
%let read_code = &read_code. %str(;) run %str(;) ;
/* Run the generated code*/
&read_code.
%mend ReadDsFromFile;
Sounds like you want to generate code based on metadata. A data step is actually a lot easier to code and debug than a macro.
Let's assume you have metadata that describes the input data. For example let's use the metadata about the SASHELP.CARS. We can build our metadata from the existing DICTIONARY.COLUMNS metadata on the existing dataset. Let's set the INFORMAT to the FORMAT since that table does not have INFORMAT value assigned.
proc sql noprint ;
create table varlist as
select memname,varnum,name,type,length,format,format as informat,label
from dictionary.columns
where libname='SASHELP' and memname='CARS'
;
quit;
Now let's make a sample text file with the data in it.
filename mydata temp;
data _null_;
set sashelp.cars ;
file mydata dsd ;
put (_all_) (:);
run;
Now we just need to use the metadata to write a program that could read that data. All we really need to do is define the variables and then add a simple INPUT firstvar -- lastvar statement to read the data.
filename code temp;
data _null_;
set varlist end=eof ;
by varnum ;
file code ;
if _n_=1 then do ;
firstvar=name ;
retain firstvar ;
put 'data ' memname ';'
/ ' infile mydata dsd truncover lrecl=1000000;'
;
end;
put ' attrib ' name 'length=' #;
if type = 'char' then put '$'# ;
put length ;
if informat ne ' ' then put #10 informat= ;
if format ne ' ' then put #10 format= ;
if label ne ' ' then put #10 label= :$quote. ;
put ' ;' ;
if eof then do ;
put ' input ' firstvar '-- ' name ';' ;
put 'run;' ;
end;
run;
Now we can just run the generated code using %INCLUDE.
%include code / source2 ;