%let dirname = C:\Users\data;
filename DIRLIST pipe 'dir/B &dirname\*.dbf';
/* Create a data set with one observation for each file name */
data dirlist;
length fname $8.;
infile dirlist length=reclen;
input fname $8.;
run;
data all_text (drop=fname);
set dirlist;
filepath = "&dirname\"||fname||".dbf";
infile dummy filevar = filepath length=reclen end=done missover;
do while(not done);
INPUT
F1 : 2.
F2 : 2.
F3 : 2.
F4 : 10.
F5 : 4.;
output;
end;
run;
The problem is that it is only reading the first line of each files and not the whole file before moving to the next. Also variable F1 are shown as missing.
Suggestions are welcome
So a standard proc import would be:
proc import out=sample1 datafile="path to dbf file.dbf" dbms=DBF replace;
run;
The problem now, is how to generate this set of code for every file in your file list. Using the CALL EXECUTE statement from #Tom is your best bet. You call also create a small macro and call it for each filename, using CALL EXECUTE. If you're new to SAS this can be easier to understand.
*Create a macro that imports the DBF
%macro import_dbf(input= , output=);
proc import out=&out datafile="&output" dbms=DBF replace;
run;
%mend;
Then call macro from dataset. I'm naming the datasets DBF001, DBF0002 etc.
%let dirname=C:\_localdata;
data dirlist;
informat fname $20.;
input fname;
cards;
data1.dbf
data2.dbf
data3.dbf
data4.dbf
;
run;
data out;
set dirlist;
str=catt('%import_dbf(input="', "&dirname", '\', fname, '", output=dbf',
put(_n_, z4.), ');');
run;
proc print data=out;
run;
Import them one by one and then combine them.
%let dirname = C:\Users\data;
data filelist ;
infile "dir /b &dirname\*.dbf" pipe truncover end=eof;
fileno + 1;
input fname $256. ;
tempname = 'temp'||put(fileno,z4.);
call execute(catx(' ','proc import replace dbms=dbf'
,'out=',tempname,'datafile=',quote(trim(fname)),';run;'
));
if eof then call symputx('lastname',tempname);
run;
data want ;
set temp0001-&lastname;
run;
Related
I have a SAS dataset where I keep 50 diagnoses codes and 50 diagnoses descriptions.
It looks something like this:
data diags;
set diag_list;
keep claim_id diagcode1-diagcode50 diagdesc1-diagdesc50;
run;
I need to print all of the variables but I need diagnosis description right next to corresponding diagnosis code. Something like this:
proc print data=diags;
var claim_id diagcode1 diagdesc1 diagcode2 diagdesc2 diagcode3 diagdesc3; *(and so on all the way to 50);
run;
Is there a way to do this (possibly using arrays) without having to type it all up?
Here's one approach then, using Macros. If you have other variables make sure to include them BEFORE the %loop_names(n=50) portion in the VAR statement.
*generate fake data to test/run solution;
data demo;
array diag(50);
array diagdesc(50);
do claim_id=1 to 100;
do i=1 to 50;
diag(i)=rand('normal');
diagdesc(i)=rand('uniform');
end;
output;
end;
run;
%macro loop_names(n=);
%do i=1 %to &n;
diag&i diagdesc&i.
%end;
%mend;
proc print data=demo;
var claim_ID %loop_names(n=20);
run;
Here is some example SAS code that uses actual ICD 10 CM codes and their descriptions and #Reeza proc print:
%* Copy government provided Medicare code data zip file to local computer;
filename cms_cm url 'https://www.cms.gov/Medicare/Coding/ICD10/Downloads/2020-ICD-10-CM-Codes.zip' recfm=s;
filename zip_cm "%sysfunc(pathname(work))/2020-ICD-10-CM-Codes.zip" lrecl=200000000 recfm=n ;
%let rc = %sysfunc(fcopy(cms_cm, zip_cm));
%put %sysfunc(sysmsg());
%* Define fileref to the zip file member that contains ICD 10 CM codes and descriptions;
filename cm_codes zip "%sysfunc(pathname(zip_cm))" member="2020 Code Descriptions/icd10cm_codes_2020.txt";
%* input the codes and descriptions, there are 72,184 of them;
%* I cheated and looked at the data (more than once) in order
%* to determine the variable sizes needed;
data icd10cm_2020;
infile cm_codes lrecl=250 truncover;
attrib
code length=$7
desc length=$230
;
input
code 1-7 desc 9-230;
;
run;
* simulate claims sample data with mostly upto 8 diagnoses, and
* at least one claim with 50 diagnoses;
data have;
call streaminit(123);
do claim_id = 1 to 10;
array codes(50) $7 code1-code50;
array descs(50) $230 desc1-desc50;
call missing(of code:, of desc:);
if mod(claim_id, 10) = 0
then top = 50;
else top = rand('uniform', 8);
do _n_ = 1 to top;
p = ceil(rand('uniform', n)); %* pick a random diagnosis code, 1 of 72,184;
set icd10cm_2020 nobs=n point=p; %* read the data for that random code;
codes(_n_) = code;
descs(_n_) = desc;
end;
output;
end;
stop;
drop top;
run;
%macro loop_names(n=);
%do i=1 %to &n;
code&i desc&i.
%end;
%mend;
ods _all_ close;
ods html;
proc print data=have;
var claim_id %loop_names(n=50);
run;
Is there a general purpose way of concatenating each variable in an observation into one larger variable whilst preserving the format of numeric/currency fields in terms of how it looks when you do a proc print on the dataset. (see sashelp.shoes for example)
Here is some code you can run, as you can see when looking at the log, using the catx function to produce a comma separated output removes both the $ currency sign as well as the period from the numeric variables
proc print data=sashelp.shoes (obs=10);
run;
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
Any solution needs to be general purpose as it will run on disparate datasets with differently formatted variables.
You can manually loop over PDV variables of the data set, concatenating each formatted value retrieved with vvaluex. A hash can be used to track which variables of the data set to process. If you are comma separating values you will probably want to double quote formatted values that contain a comma.
data want;
set sashelp.cars indsname=_data;
if _n_ = 1 then do;
declare hash vars();
length _varnum 8 _varname $32;
vars.defineKey('_n_');
vars.defineData('_varname');
vars.defineDone();
_dsid = open(_data);
do _n_ = 1 to attrn(_dsid,'NVAR');
rc = vars.add(key:_n_,data:varname(_dsid,_n_));
end;
_dsid = close(_dsid);
call missing (of _:);
end;
format weight comma7.;
length allcat $32000 _vvx $32000;
do _n_ = 1 to vars.NUM_ITEMS;
vars.find();
_vvx = strip(vvaluex(_varname));
if index(_vvx,",") then _vvx = quote(strip(_vvx));
if _n_ = 1
then allcat = _vvx;
else allcat = cats(allcat,',',_vvx);
end;
drop _:;
run;
You can use import and export to csv file:
filename tem temp;
proc export data=sashelp.SHOES file=tem dbms=csv replace;
run;
data l;
length all $ 200;
infile tem truncover firstobs=2;
input all 1-200;
run;
P.S.
If you need concatenate only char, uou can create array of all CHARACTER columns in dataset, and just iterate thru:
data l;
length all $ 5000;
set sashelp.SHOES;
array ch [*] _CHARACTER_;
do i = 1 to dim(ch);
all=catx(',',all,ch[i]);
end;
run;
The PUT statement is the easiest way to do that. You don't need to know the variables names as you can use the _all_ variable list.
put (_all_) (+0);
It will honor the formats attached the variables and if you have used DSD option on the FILE statement then the result is a delimited list.
What is the ultimate goal of this exercise? If you want to create a file you can just write the file directly.
data _null_;
set sashelp.shoes(obs=3);
file 'myfile.csv' dsd ;
put (_all_) (+0);
run;
If you really do want to get that string into a dataset variable there is no need to invent some new function. Just take advantage of the PUT statements abilities by creating a file and then reading the lines from the file.
filename junk temp;
data _null_;
set sashelp.shoes(obs=3);
file junk dsd ;
put (_all_) (+0);
run;
data stuff ;
set sashelp.shoes(obs=3);
infile junk truncover ;
input all $5000.;
run;
You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.
filename junk temp;
data stuff;
set sashelp.shoes(obs=3);
file junk dsd lrecl=5000 ;
length all $5000;
put #1 (_all_) (+0) +(-2) ' ' #;
all = _file_;
output;
all=' ';
put #1 all $5000. #;
run;
Solution with vvalue and concat function (||):
It is similar with 'solution without catx' (the last one), but it is simplified by vvalue function instead put.
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for concat function (||)*/
proc sql;
select ('strip(vvalue('|| strip(name) ||'))') into :varstr4 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step to concat all variables*/
data stuff2;
format all $5000.;
set work.wocatx ;
all = &varstr4. ;
put all;
run;
Solution with catx:
proc print data=SASHELP.SHOES;
run;
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
If there isn't in dictionary.columns format, then in macro variable varstr2 will just name, if there is format, then when it would call in catx it will convert in format, that you need, for example,if variable is num type then put(Sales,DOLLAR12.), or if it char type then input function . You could add any conditions in select into if you need.
If there is no need of using of input function just change select:
ifc(strip(format) is missing,strip(name),'put('|| strip(name) ||','|| strip(format) ||')')
Solution without catx:
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for catx*/
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*data step with catx*/
data stuff;
format all $5000.;
set work.wocatx ;
all = catx(',',&varstr2.) ;
put all;
run;
/*Macro variable for concat function (||)*/
proc sql;
select ifc(strip(format) is missing,
'strip(' || strip(name) || ')',
'strip(put('|| strip(name) ||','|| strip(format) ||'))') into :varstr3 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step without catx*/
data stuff1;
format all $5000.;
set work.wocatx ;
all = &varstr3. ;
put all;
run;
Result with catx and missing values:
Result without catx and with missing values:
I am trying to parse a delimited dataset with over 300 fields. Instead of listing all the input fields like
data test;
infile "delimited_filename.txt"
DSD delimiter="|" lrecl=32767 STOPOVER;
input field_A:$200.
field_B :$200.
field_C:$200.
/*continues on */
;
I am thinking I can dump all the field names into a file, read in as a sas dataset, and populate the input fields - this also gives me the dynamic control if any of the field names changes (add/remove) in the dataset. What would be some good ways to accomplish this?
Thank you very much - I just started sas, still trying to wrap my head around it.
This worked for me - Basically "write" data open code using macro language and run it.
Note: my indata_header_file contains 5 columns: Variable_Name, Variable_Length, Variable_Type, Variable_Label, and Notes.
%macro ReadDsFromFile(filename_to_process, indata_header_file, out_dsname);
%local filename_to_process indata_header_file out_dsname;
/* This macro var contain code to read data file*/
%local read_code input_in_line;
%put *** Processing file: &filename_to_process ...;
/* Read in the header file */
proc import OUT = ds_header
DATAFILE = &indata_header_file.
DBMS = EXCEL REPLACE; /* REPLACE flag */
SHEET = "Names";
GETNAMES = YES;
MIXED = NO;
SCANTEXT = YES;
run;
%let id = %sysfunc(open(ds_header));
%let NOBS = %sysfunc(attrn(&id.,NOBS));
%syscall set(id);
/*
Generates:
data &out_dsname.;
infile "&filename_to_process."
DSD delimiter="|" lrecl=32767 STOPOVER FIRSTOBS=3;
input
'7C'x
*/
%let read_code = data &out_dsname. %str(;)
infile &filename_to_process.
DSD delimiter=%str("|") lrecl=32767 STOPOVER %str(;)
input ;
/*
Generates:
<field_name> : $<field_length>;
*/
%do i = 1 %to &NObs;
%let rc = %sysfunc(fetchobs(&id., &i));
%let VAR_NAME = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Name)) ));
%let VAR_LENGTH = %sysfunc(getvarn(&id., %sysfunc(varnum(&id., Variable_Length)) ));
%let VAR_TYPE = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Type)) ));
%let VAR_LABEL = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Label)) ));
%let VAR_NOTES = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Notes)) ));
%if %upcase(%trim(&VAR_TYPE.)) eq CHAR %then
%let input_in_line = &VAR_NAME :$&VAR_LENGTH..;
%else
%let input_in_line = &VAR_NAME :&VAR_LENGTH.;
/* append in_line statment to main macro var*/
%let read_code = &read_code. &input_in_line. ;
%end;
/* Close the fid */
%let rc = %sysfunc(close(&id));
%let read_code = &read_code. %str(;) run %str(;) ;
/* Run the generated code*/
&read_code.
%mend ReadDsFromFile;
Sounds like you want to generate code based on metadata. A data step is actually a lot easier to code and debug than a macro.
Let's assume you have metadata that describes the input data. For example let's use the metadata about the SASHELP.CARS. We can build our metadata from the existing DICTIONARY.COLUMNS metadata on the existing dataset. Let's set the INFORMAT to the FORMAT since that table does not have INFORMAT value assigned.
proc sql noprint ;
create table varlist as
select memname,varnum,name,type,length,format,format as informat,label
from dictionary.columns
where libname='SASHELP' and memname='CARS'
;
quit;
Now let's make a sample text file with the data in it.
filename mydata temp;
data _null_;
set sashelp.cars ;
file mydata dsd ;
put (_all_) (:);
run;
Now we just need to use the metadata to write a program that could read that data. All we really need to do is define the variables and then add a simple INPUT firstvar -- lastvar statement to read the data.
filename code temp;
data _null_;
set varlist end=eof ;
by varnum ;
file code ;
if _n_=1 then do ;
firstvar=name ;
retain firstvar ;
put 'data ' memname ';'
/ ' infile mydata dsd truncover lrecl=1000000;'
;
end;
put ' attrib ' name 'length=' #;
if type = 'char' then put '$'# ;
put length ;
if informat ne ' ' then put #10 informat= ;
if format ne ' ' then put #10 format= ;
if label ne ' ' then put #10 label= :$quote. ;
put ' ;' ;
if eof then do ;
put ' input ' firstvar '-- ' name ';' ;
put 'run;' ;
end;
run;
Now we can just run the generated code using %INCLUDE.
%include code / source2 ;
This problem might be trivial but I got stuck.
My problem is I've to go to each folder for a dataset and transpose the data.
I wrote the following code and it works fine.
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%LET LOC=E:\folder;
%macro test1(k,l);
libname libary "&loc.\&k\&l.";
data dataset_&l.;
set libary.dataset_original;
run;
proc transpose data=dataset_&l. out=dataset_&l._T;
run;
%mend;
%test1(var_1,var'_1);
%test1(var_2,var'_2);
%test1(var_3,var'_3);
The issue with this code is it's not dynamic in terms of folder structure. E.g. if there's another 4 extra folders, I've to write "%test1(var_3,var'_3);"4 times.
So I tried writing the following code to make it more dynamic. But unfortunately it's not working. Can anybody please tell me where I'm making the mistake.
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%LET LOC=E:\folder;
%let k=var_1 var_2 var_3;
%let l=var'_1 var'_2 var'_3;
%macro words(string);
%local count word;
%let count=1;
/* The third argument of the %QSCAN function specifies the delimiter */
%let word=%qscan(&string,&count,%str( ));
%do %while(&word ne);
%let count=%eval(&count+1);
%let word=%qscan(&string,&count,%str( ));
%end;
%eval(&count-1)
%mend words;
%macro test1(k,l);
libname libary "&loc.\&k\&l.";
data dataset_&l.;
set libary.dataset_original;
run;
proc transpose data=dataset_&l. out=dataset_&l._T;
run;
%mend;
%macro test();
%do i=1 %to %words(&k.);
%do j=1 %to %words(&l.);
%let var=%scan(&k.,&i.,str());
%let var1=%scan(&l.,&j.,str());
%test1(&var.,&var1.);
%end;
%end;
%mend;
%test();
Thanks!
Try this:
/* Set your base directory */
%let base = E:\Folder;
/* Pipe output from dir */
filename flist pipe "dir /s /b /a:-h &base";
/* Read files from pipe */
data files;
length file dir $ 200 name $ 50 ext $ 10;
infile flist;
input #1 file $ &;
/* File extension */
ext = scan(file, -1, ".");
/* File name */
name = scan(scan(file, -1, "\"), 1, ".");
/* Directory */
rfile = reverse(file);
dir = reverse(substr(rfile, index(rfile, "\") + 1));
/* Select only SAS datasets */
if upcase(ext) = "SAS7BDAT" then output;
drop rfile;
run;
/* Define a macro to process each file */
%macro trans_file(dir, name);
libname d "&dir";
proc transpose data = d.&name out = d.&name._t;
run;
libname d clear;
%mend trans_file;
/* Run on all files */
data _null_;
set files;
call execute(cats('%trans_file(', dir, ",", name, ");"));
run;
This gets the file list by submitting the Windows command dir. It gets all files in the specified directory and its subdirectories.
This approach then uses a simple macro that defines a data library, reads a dataset from the library, writes a transposed dataset to it, then clears it. The macro is called for each file in the list using call execute.
Simple question.
PROC IMPORT OUT= braw.address
DATAFILE= "&path.\address_data.csv"
DBMS=csv REPLACE;
GETNAMES=YES;
RUN;
This statement will create the dataset columns as character or numeric depending on the values, which is smart, but not what I want.
I want to import them all as character, to make for easier regex evaluation.
Is there a simple way to do this?
I would generally just write my own input statement for the CSV, then you can make them whatever you want.
IE:
data braw.address;
infile "&path.\address_data.csv" dlm=',' dsd missover;
input
field1 $
field2 $
....
;
run;
You can use the log from the PROC IMPORT to generate this the first time and just edit it to contain $ for each variable.
If you do not want to write a SAS macro to read all the columns as character, you could try a "cheat". Manually edit the file and duplicate the first row (the one containing column headers. Since those will most likely all be character strings, SAS should import all the columns as character.
Of course, a macro to do this would not be that difficult. You can try something like this:
%macro readme(dsn,fn);
/* Macro to read all columns of a CSV as character */
/* Parameters: */
/* DSN - The name of the SAS data set to create */
/* FN - The external file to read (quoted) */
/* Example: */
/* %readme(want, 'c:\temp\tempfile.csv'); */
data _null_;
infile &fn;
input;
i = 1;
length headers inputstr $200;
headers = compress(_infile_,"'");
newvar = scan(headers,1,',');
do until (newvar = ' ');
inputstr = trim(inputstr) || ' ' || trim(newvar) || ' $';
i + 1;
newvar = scan(headers,i,',');
end;
call symput('inputstr',inputstr);
stop;
run;
data &dsn;
infile &fn firstobs=2 dsd dlm=',' truncover;
input &inputstr.;
run;
%mend;
%readme(want, 'c:\temp\tempfile.csv');
Here is my macro to read dlm file with all vars as char:
%MACRO ImportText(file,dsn,dlm);
* Read data use proc import to get variable name and length;
PROC IMPORT DATAFILE="&file" OUT=temp DBMS=dlm REPLACE;
DELIMITER = &dlm;
GETNAMES = YES;
GUESSINGROWS = 32767;
RUN;
* Put variable names into macro variable;
PROC CONTENTS DATA=temp out=vars NOPRINT; RUN;
PROC SQL NOPRINT;
SELECT CATT(name,' : $',length,'.') INTO :vars SEPARATED BY ' ' FROM vars ORDER BY varnum;
QUIT;
* Read real data;
DATA &dsn;
INFILE "&file" DELIMITER=&dlm MISSOVER DSD FIRSTOBS=2 LRECL=32767;
INPUT &vars;
RUN;
%MEND;