SAS Filename exceeds 32 characters

I have multiple files in .cpt format.
These filenames exceed 32 characters and cannot be imported into sas using data step.
# ERROR
Filename file "path/&FILENAME..cpt " ;
PROC CIMPORT INFILE = file LIBRARY = WORK ;
RUN;
data test;
set &FILENAME.;
run;
I tried proc sql but getting stuck.
%macro test(FILENAME= );
Filename file "path/&FILENAME..cpt " ;
PROC CIMPORT INFILE = file LIBRARY = WORK ;
RUN;
proc sql;
create table test as
(
select * from &FILENAME.
);
quit;
%MEND;
%test(FILENAME = filename_exceeds_thirtytwo_characters_1);
ERROR: Member name "filename_exceeds_thirtytwo_characters_1" exceeds 32 characters
Any help?

The name of the file used to store the CPORT file does not necessarily have anything to do with the name(s) of any datasets that might be contained in the CPORT file.
To see which members (if any) are in the CPORT file, check the notes that your PROC CIMPORT step writes to the log.
Since you told PROC CIMPORT to write the datasets to the WORK library, your DATA or PROC SQL steps are not needed.
Also, do not append an extra space to the end of the filename.
PROC CIMPORT INFILE = "/somepath/&FILENAME..cpt" LIBRARY = WORK ;
RUN;
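If there are several CPORT files, the same step can be wrapped in a small macro. This is only a sketch: the macro name import_cpt and the second file name are hypothetical, and /somepath is a placeholder for the real folder. The macro parameter is used only to build the physical file name, never as a member name, so the 32-character limit does not apply to it.

```sas
/* Hypothetical helper: import one CPORT file into WORK.            */
/* &filename is only part of the physical path, so its length is    */
/* not restricted by the 32-character member-name limit.            */
%macro import_cpt(filename=);
  proc cimport infile="/somepath/&filename..cpt" library=work;
  run;
%mend import_cpt;

%import_cpt(filename=filename_exceeds_thirtytwo_characters_1)
%import_cpt(filename=filename_exceeds_thirtytwo_characters_2)
```

The datasets end up in WORK under whatever names they had when the CPORT file was created.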

Add the log information that Proc CIMPORT creates.
Your problem is the from clause in the Proc SQL select statement
select * from &FILENAME.
You are using the long filename as the table to select from. SAS table names (also known as library member names) are limited to 32 characters, which is why you get the error.
After PROC CIMPORT writes the data sets to WORK, what are the table names? The procedure lists the members it imported in the log.
Are any of the members listed named the same as the file? Probably not.
Do any of the members listed have sequential suffixes on prefixes clipped to fewer than 32 characters? That would be an extreme edge case, but it is possible. Some SAS procedures have a best-try naming algorithm for when they have to create multiple member names longer than 32 characters in the same library.
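Besides reading the log, one way to see what ended up in WORK is to query the dictionary tables after the import. A minimal sketch, assuming the import has just been run in the current session:

```sas
/* List the data sets currently in WORK, newest first, so the       */
/* members created by PROC CIMPORT appear at the top of the list.   */
proc sql;
  select memname, nobs, crdate
    from dictionary.tables
    where libname = 'WORK' and memtype = 'DATA'
    order by crdate desc;
quit;
```

The member names shown there are the ones to use in later DATA or PROC SQL steps.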

Related

SAS PROC IMPORT Multiple SAV Files- Force SPSS Value Labels to Create UNIQUE SAS Format Names

Sometimes if I import multiple SAV files into the SAS work library, one variable imported later on overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this is because the later dataset's variable produces a format name for the custom format (from SPSS Values Labels) that is identical to format name from the earlier variable, even though the variables have different definitions in the Value Labels attributes in the SAV files.
Is there a way to force SAS to not re-use the same format names by automatically checking at PROC IMPORT whether a format name already exists in the work library format library before auto-naming a new custom format? Or is there any other way of preventing this from happening?
Here is my code as well as an example of the variable names, format names, etc.
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains variable Question_1. The original SPSS Values Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the Format Name QUESTION., for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV
DatasetA contains variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the Format Name QUESTION. for Question_A, even though the work library already contains a format named QUESTION.. Therefore, this overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV
As a result, when Dataset1 and DatasetA are both imported, variables Question_1 and Question_A both have the format name QUESTION. assigned to them, and the definition of the format QUESTION. in the SAS work library corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. Question_1 will therefore display as 1=Agree 2=Unsure, even though the variable values actually mean 1=Yes 2=No.
I would ideally like these two variables to produce distinct custom format names at their import step, automatically. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting from occurring?
Thank you.
The way to prevent literal overwriting is to point to a different format catalog for each SPSS file that is being read using the FMTLIB= optional statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
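To confirm that the two catalogs really hold different definitions, their contents can be printed with the FMTLIB option of PROC FORMAT. This sketch assumes the two imports above have been run. Note that formats stored outside WORK.FORMATS are only found if the catalog is listed in the FMTSEARCH option, and that the first catalog containing a given format name wins, so FMTSEARCH on its own does not resolve the name clash.

```sas
/* Print the format definitions stored in each catalog */
proc format library=work.fmtcat1 fmtlib;
run;
proc format library=work.fmtcat2 fmtlib;
run;

/* Make the catalogs searchable; for a name that exists in both,    */
/* the definition in the first catalog listed is the one SAS uses.  */
options fmtsearch=(work.fmtcat1 work.fmtcat2);
```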
You can then work later to rename the conflicting formats (and change the attached format in the dataset to use the new name).
So if the member name and format name are short enough you should be able to generate a unique new name by appending the two (add something in between to avoid conflict). So something like this will rename the formats, change the format name attached to the variables and rebuild the formats into the WORK.FORMATS catalog.
%macro sav_import(file,memname);
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
proc import datafile=%sysfunc(quote(&file)) dbms=sav
out=&memname replace
;
fmtlib=work.&memname ;
run;
proc format lib=work.&memname cntlout=formats;
run;
data formats ;
set formats end=eof;
by fmtname type notsorted;
oldname=fmtname;
fmtname=catx('_',"&memname",oldname);
run;
proc contents data=&memname noprint out=contents;
run;
proc sql noprint;
select distinct catx(' ',c.name,cats(f.fmtname,'.'))
into :fmtlist separated by ' '
from contents c inner join formats f
on c.format = f.oldname
;
quit;
proc datasets nolist lib=work;
modify &memname;
format &fmtlist ;
run;
quit;
proc format lib=work.formats cntlin=formats;
run;
%mend sav_import;
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);

Proc contents looping through table names from a different data set

I am a newbie to SAS and I am trying to execute below code to obtain all the information for a particular library. However it fails in between due to data in a particular dataset. Is there any way to read dataset names from a different dataset and loop through them creating a different dataset specific to each datasetname from the list?
Proc contents data=testlib._ALL_ out=x;
Run;
Instead I want something like this
Proc contents data in (work. Tbnames) out = x;
Run;
And read data from below data set.
Data tbnames(keep=tablename);
Set WORK.tablenames;
Run;
Please help
Proc contents data = work.Tbnames out = x;
Run;
Use Proc COPY to copy data sets from one library to another.
libname testlib '<os-path-to-folder>';
proc copy in=testlib out=work memtype=DATA;
run;
Read the data from dictionary.tables instead.
This assumes that you have the list of tables in a data set called tableNames and that it has a variable called tName, which holds the table name. Note that the comparison is case sensitive and MEMNAME is stored in upper case, so UPCASE() is used to make the names all upper case.
proc sql;
create table summary as
select *
from dictionary.tables
where memname in (select upcase(tName) from tableNames);
quit;
Or look at PROC DATASETS which operates on a library, not a single data set.
proc datasets lib=myLib;
run;quit;
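If a separate PROC CONTENTS output per table is really what is wanted, the list of names can drive the steps through CALL EXECUTE. This is a sketch under assumptions taken from the question: the list is in work.tbnames with a variable tablename, and the tables live in testlib. The output names work.contents_1, work.contents_2, and so on are hypothetical and chosen only to stay well under 32 characters.

```sas
/* Generate one PROC CONTENTS step per row of work.tbnames.         */
/* The generated steps run after this DATA step has finished.       */
data _null_;
  set work.tbnames;
  length stmt $400;
  stmt = catx(' ',
              cats('proc contents data=testlib.', tablename),
              cats('out=work.contents_', _n_),
              'noprint; run;');
  call execute(stmt);
run;
```

Because each table gets its own step, a problem with one data set should not prevent the others from being processed.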

SAS - Input all variables in a data step without naming every variable

How does one input all variables/columns within a data step using INPUT but without naming every variable? This can be done by naming each variable, for example:
DATA dataset;
INFILE '/folders/myfolders/file.txt';
INPUT variable1 variable2 variable3 variable4 $ variable5;
RUN;
However, this is very tedious for large datasets containing 200+ variables.
The original question implied that you already had a SAS data set. In that case all variables are automatically included when you SET the dataset.
data copy ;
set '/folders/myfolders/file.sas7bdat';
run;
Or just reference it in the analysis you want to do.
proc means data='/folders/myfolders/file.sas7bdat';
run;
If you actually have a TEXT file and you want to read it into a SAS dataset you could use PROC IMPORT to guess what is in the file. If it has a header row then proc import will try to convert those into valid variable names. It will also try to guess how to define the variables based on what values it sees in the text file.
proc import out=want datafile='/folders/myfolders/file.txt' dbms=dlm ;
delimiter=',';
run;
Or, if the issue is that it is too hard to come up with 200 unique variable names, you could just use a variable list with numeric suffixes to save a lot of typing.
DATA dataset;
INFILE '/folders/myfolders/file.txt' dsd ;
length var1-var200 $20 ;
input var1-var200 ;
RUN;
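Numbered variable lists can also be mixed with individually named variables. The following is a small self-contained sketch using inline data; the variable names and values are made up for illustration.

```sas
/* id and score1-score3 are numeric, name is character.             */
/* The colon modifier reads name with a $20. informat while still   */
/* stopping at the next delimiter.                                  */
data demo;
  infile datalines dsd;
  input id name :$20. score1-score3;
datalines;
1,Ann,10,20,30
2,Bob,40,50,60
;
run;

proc print data=demo;
run;
```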

Assigning Headers from one file to multiple data files

I have a list of ~100 files. The first file contains header information for the other 98 data files. The information should be in table format, however each table is a different size (with regards to column and row number).
My goal is to import these files such that the column headers from the first file are correctly assigned.
Additional information:
I am told this list of files was generated using SAS (however, I am not familiar with the file format). Furthermore, the "CIMPORT" command does not work on these files.
The files are "|" delimited.
Thank you very much for any help.
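This was a fun issue. I came up with the following approach.
First, let's load some data. The file without headers is read with getnames=no so that its columns are named VAR1, VAR2, and so on.
proc import datafile = "\\Datadrive\mydata.csv"
out=w_headers dbms=dlm replace;
delimiter=";";
guessingrows=32767;
run;
proc import datafile = "\\Datadrive\no_headers.csv"
out=no_headers dbms=dlm replace;
delimiter=";";
getnames=no;
guessingrows=32767;
run;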
Then I extract the names of the columns and variable number to a dataset.
proc contents data=w_headers out=meta(keep=NAME VARNUM) noprint ; run ;
Then I create the rename pairs that give the unnamed columns (VAR1, VAR2, ...) proper names based on the existing ones.
data meta;
set meta;
cmd = cats('VAR',VARNUM,'=', name);
run;
Here comes the kicker: I put the commands into a macro variable, which is then fed to PROC DATASETS to rename the columns.
proc sql noprint;
select cmd into :cmd_list separated by ' ' from meta;
quit;
proc datasets library = work nolist;
modify no_headers;
rename &cmd_list;
quit;
At this point my two datasets have identical column names. The method is a bit tricky, but it works. I'm sure there is another way, but this was a fun one. :)

Stop sas macro from overwriting different imported csv files as the same sas dataset

I found a macro and have been using it to import datasets that are given to me in csv format. Now I need to edit it, because the file names contain an ID number and I want the SAS datasets to have the same names as the files.
The csv files are named things like IDSTUDY233_first.csv, so I want the SAS dataset to be IDSTUDY233_first. It should appear in my work library.
I thought it would just create a SAS dataset for each csv, named IDSTUDY233_first or something like that (and so on for each additional study). However, it names them this way:
IDSTUDY_FIRST
and overwrites itself for every ID. I am new to macros and have been trying to figure out WHY it does this and how to fix it. Suggestions?
%let subdir=Y:\filepath\; *MACRO VARIABLE FOR FILEPATH;
filename dir "&subdir.*.csv "; *give the file the name from the path that your at whatever the csv is named;
data new; *create the dataset new it has all those filepath names csv names;
length filename fname $ 200;
infile dir eof=last filename=fname;
input ;
last: filename=fname;
run;
proc sort data=new nodupkey; *sort but don't keep duplicate files;
by filename;
run;
data null; *create the dataset null;
set new;
call symputx(cats('filename',_n_),filename); *call the file name for this observation n;
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka')); *call the dataset for this file compress then read the file;
call symputx('nobs',_n_); *call for the number of observations;
run;
%put &nobs.; *but each observation in;
%macro import; *start the macro import;
%do i=1 %to &nobs; *Do for each fie to number of observations;
proc import datafile="&&filename&i" out=&&dsn&i dbms=csv replace;
getnames=yes;
run;
%end;
%mend import;
%import
*call import macro;
As you can see I added my comments of my understanding. Like I said macros are new to me. I may be incorrect in my understanding. I am guessing the problem is either in
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
or in the import statement, probably out=&&dsn&i, since it rapidly overwrites the previous SAS dataset until it has processed every file. I need all of the SAS datasets, not just the last one.
My guess is that you are right, it is to do with this line:
call symputx(cats('dsn',_n_),compress(scan(filename,-2,'\.'), ,'ka'));
The gotcha is in the arguments passed to compress. Compress can be used to remove or keep certain characters in a string. In the above example, it is being used to keep only alphabetic characters by passing in the 'ka' modifiers (k = keep, a = alphabetic). The digits are dropped, so files whose names differ only by their ID number end up with the same dataset name.
You can modify this behaviour to keep alphabetic characters, digits, and the underscore character by changing the modifiers from ka to kn.
This change does mean that you also need to make sure that none of your file names begin with a digit (as SAS dataset names can't begin with a digit).
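The difference between the two modifier sets can be checked in isolation. A minimal sketch using the example file name from the question:

```sas
/* Compare what 'ka' and 'kn' keep from the same file name */
data _null_;
  length filename $200;
  filename = 'Y:\filepath\IDSTUDY233_first.csv';
  stem   = scan(filename, -2, '\.');   /* name without path or extension */
  dsn_ka = compress(stem, , 'ka');     /* keeps letters only             */
  dsn_kn = compress(stem, , 'kn');     /* keeps letters, digits and _    */
  put stem= dsn_ka= dsn_kn=;
run;
```

The log should show dsn_ka=IDSTUDYfirst and dsn_kn=IDSTUDY233_first, which is why every study collapses onto one name with 'ka'.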
The documentation for the compress function is here:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
An easy way to debug this would be to take the dataset with all of the call symput statements, and in addition to storing these values in macro variables, write them to variables in the dataset. Also change it from a data _null_ to a data tmp statement. You can then see for each file what the destination table name will be.
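That debugging step could look something like the sketch below. It assumes the dataset new from the question already exists; the dataset name tmp and the variable dsn are just illustrative choices, and the 'kn' modifiers are the suggested fix rather than the original code.

```sas
/* Same logic as the original step, but the computed dataset name   */
/* is also kept as a variable so it can be inspected.               */
data tmp;
  set new;
  length dsn $32;
  dsn = compress(scan(filename, -2, '\.'), , 'kn');
  call symputx(cats('filename', _n_), filename);
  call symputx(cats('dsn', _n_), dsn);
  call symputx('nobs', _n_);
run;

/* One row per csv file: source path and destination dataset name */
proc print data=tmp;
  var filename dsn;
run;
```

If two rows show the same dsn value, those two files would still overwrite each other.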