Importing file into SAS using data step

I'm trying to import a tab-delimited file into SAS using a data step. Here is my code so far. It runs, but the output data looks all wrong. Is there a better way to do this?
data medicare;
infile '/folders/myfolders/sasuser.v94/medicare.sas' dlm='09'X;
input NPI NPPES_CREDENTIALS $ NPPES_PROVIDER_GENDER $ NPPES_ENTITY_CODE $ NPPES_PROVIDER_ZIP $ NPPES_PROVIDER_STATE $ PROVIDER_TYPE $ MEDICARE_PARTICIPATION_INDICATOR $ PLACE_OF_SERVICE $ HCPCS_CODE $ HCPCS_DRUG_INDICATOR $ LINE_SRVC_CNT BENE_UNIQUE_CNT BENE_DAY_SRVC_CNT AVERAGE_MEDICARE_ALLOWED_AMT STDEV_MEDICARE_ALLOWED_AMT AVERAGE_SUBMITTED_CHRG_AMT STDEV_SUBMITTED_CHRG_AMT AVERAGE_MEDICARE_PAYMENT_AMT sSTDEV_MEDICARE_PAYMENT_AMT;
run;

If you are looking for a beginner-friendly approach, you can first run proc import. SAS will guess the correct data types, lengths, etc. Here is an example:
filename imp "C:\Users\&sysuserid.\Documents\xxx.txt" encoding="cp1252" TERMSTR=CRLF;
proc import datafile=imp
out=imported_table
dbms=dlm
replace;
delimiter='09'x;
getnames=yes;
guessingrows = 1000000;
run;
Then, copy the data step code SAS prints into the log and update it (if needed).
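As a rough idea of where that edited data step might end up, here is a minimal sketch for the file in the question. It assumes the file has a header row (hence firstobs=2) and that the character lengths shown are wide enough; both are guesses, since the question does not show the data. The last variable is written as STDEV_MEDICARE_PAYMENT_AMT on the assumption that the leading "s" in the question is a typo.

```sas
data medicare;
    /* dsd: two consecutive tabs mean a missing value;
       truncover: a short record does not pull values from the next line */
    infile '/folders/myfolders/sasuser.v94/medicare.sas'
           dlm='09'x dsd truncover firstobs=2 lrecl=32767;

    /* Assumed lengths. Without them, list input stores character
       variables in 8 bytes and silently truncates longer values. */
    length NPI 8
           NPPES_CREDENTIALS $20
           NPPES_PROVIDER_GENDER $1
           NPPES_ENTITY_CODE $1
           NPPES_PROVIDER_ZIP $10
           NPPES_PROVIDER_STATE $2
           PROVIDER_TYPE $50
           MEDICARE_PARTICIPATION_INDICATOR $1
           PLACE_OF_SERVICE $1
           HCPCS_CODE $5
           HCPCS_DRUG_INDICATOR $1;

    /* Variables declared as character above need no $ here */
    input NPI NPPES_CREDENTIALS NPPES_PROVIDER_GENDER NPPES_ENTITY_CODE
          NPPES_PROVIDER_ZIP NPPES_PROVIDER_STATE PROVIDER_TYPE
          MEDICARE_PARTICIPATION_INDICATOR PLACE_OF_SERVICE
          HCPCS_CODE HCPCS_DRUG_INDICATOR
          LINE_SRVC_CNT BENE_UNIQUE_CNT BENE_DAY_SRVC_CNT
          AVERAGE_MEDICARE_ALLOWED_AMT STDEV_MEDICARE_ALLOWED_AMT
          AVERAGE_SUBMITTED_CHRG_AMT STDEV_SUBMITTED_CHRG_AMT
          AVERAGE_MEDICARE_PAYMENT_AMT STDEV_MEDICARE_PAYMENT_AMT;
run;
```

If the file has no header row, drop firstobs=2. The lengths are the part most likely to need adjusting once you have looked at the real values.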

Related

SAS says data contains no columns when using linear regression

I'm new to SAS and am having issues with using Linear Regression.
I loaded a CSV file and then in Tasks and Utilities > Tasks > Statistics > Linear Regression I selected WORK.BP (BP = filename) for my data. When I try to select my dependent variable SAS says "No columns are available."
The CSV file appears to have loaded correctly and has 2 columns so I can't figure out what the issue is.
Thanks for the help.
This is the code I used for loading the file:
data BP;
infile '/folders/myfolders/BP.csv' dlm =',' firstobs=2;
input BP $Pressure$;
run;
And this is what the output looks like
Running your code imports the .csv file with the 'PRESSURE' variable as a character variable (a $ after a name in the INPUT statement marks it as character); in a linear regression model, all variables need to be numeric.
To fix this, I suggest using PROC IMPORT to import the .csv file, instead of a DATA step with an INPUT statement.
In your case, you should follow these steps:
Define the path where the .csv file is located:
%let path = the_folder_path_where_the_csv_file_is_located ;
Define the row at which your data starts (counting the header row with the variable names):
%let datarow = 2;
Import the .csv file, here named 'BP', as follows:
proc import datafile="&path.\BP.csv"
out=BP
dbms=csv
replace;
delimiter=",";
datarow=&datarow.;
getnames=YES;
run;
I assumed that the output data set should also be called BP (you'll find it in the WORK library!) and that the delimiter is the comma.
Hope this helps!
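If you would rather keep the DATA step, a minimal sketch is to drop the $ signs so both columns are read as numeric. This assumes both columns in BP.csv really contain only numbers; the PROC CONTENTS call is only there to confirm the resulting types.

```sas
data BP;
    /* same file as in the question; dsd and truncover are added
       so empty or short fields become missing values */
    infile '/folders/myfolders/BP.csv' dlm=',' dsd truncover firstobs=2;
    /* no $ after the names, so BP and Pressure are numeric */
    input BP Pressure;
run;

/* check that both variables are listed with Type = Num */
proc contents data=BP;
run;
```

If a column does contain text, the log will show "Invalid data" notes for it, which tells you which variable still needs a $.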

import csv file in SAS virtual machine

I'm new to SAS and I have a large data set, around 3000 rows and 10 columns, in a CSV file that I want to import into SAS. I'm on a Mac and run SAS in a virtual machine. How can I import it?
I tried to copy it but it does not work.
3000 rows isn't big! I can't comment on the specifics of your VM and file access configuration, but one easy way is to simply copy and paste your CSV values into SAS Studio and read them in using the datalines statement, e.g.:
/* set up a temp fileref to hold your csv */
filename tmp temp;
/* read in the raw data using datalines, and write to fileref */
data _null_;
infile datalines ;
file tmp ;
input;
put _infile_;
datalines;
col1,col2,col3,col4
your,data,goes,here
see,how,it,works?
;
run;
/* import the csv any way you like */
proc import datafile=tmp out=work.want dbms=csv replace;
getnames=yes;
run;
A more efficient option would be to build the dataset directly from the datalines - I'll leave it to you to decide which is more convenient, but here's a head start:
data work.want;
infile datalines delimiter=',';
input col1 $ col2 $ col3 $ col4 $;
datalines;
your,data,goes,here
see,how,it,works?
;
run;

SAS Error: "No logical assign for filename SALARIES"

I am running SAS Studio 3.6 Basic Edition. I am a beginner at SAS and I can't get past this error that I've been having. I have the code below and the file is in the correct place. I created a folder under the "my folders" in the sidebar called "Exercises" and under that I created a folder called "data". It seems that it is not reading the file but I'm not sure why because the path is correct (to my knowledge).
Any ideas? I have already tried googling and most of the results with this error have to do with _WEBOUT which I don't believe is my problem.
DATA SALARIES;
INFILE '/Exercises/data/AAUP_data.txt';
INFILE SALARIES delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
I appreciate the help.
Your second infile statement is meant to be used in conjunction with a filename statement.
If your SALARIES infile is meant to be the text file stored at '/Exercises/data/AAUP_data.txt', then there are two ways you could write this:
FILENAME SALARIES '/Exercises/data/AAUP_data.txt';
DATA SALARIES;
INFILE SALARIES delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
or simply
DATA SALARIES;
INFILE '/Exercises/data/AAUP_data.txt' delimiter=',';
INPUT FICE College_Name $ State $ Type $ Average_Salary_Full
Average_Salary_Assoc Average_Salary_Asst Average_Salary_All
Average_Comp_Full Average_Comp_Assoc Average_Comp_Asst Average_Comp_All
Number_of_Professors_Full Number_of_Professors_Assoc
Number_of_Professors_Asst Number_of_Instructors Number_of_Faculty_All
;
RUN;
PROC PRINT;
RUN;
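If SAS then complains that the physical file does not exist, the path is the next thing to check. The sketch below uses the FILEEXIST function to test a path before reading it. The /folders/myfolders prefix is an assumption, based on how "My Folders" is usually mapped in the SAS Studio virtual machine; adjust it to whatever your setup shows for the file's location.

```sas
/* hypothetical full path; replace with the location shown for the file in SAS Studio */
%let f = /folders/myfolders/Exercises/data/AAUP_data.txt;

/* FILEEXIST returns 1 if the file is found and 0 if it is not */
%put File exists? %sysfunc(fileexist(&f));
```

A result of 0 means the path in the INFILE or FILENAME statement needs correcting before the DATA step can read anything.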

Reading n files into SAS to create n datasets

I have just started learning SAS, and I'm using the following code to read xlsx files:
proc import out = data_lib.dataset_1
datafile = 'C:\data_folder\data_file_1.xlsx'
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;
This has been working fine for me, but I'd like to supply the code with a list of filenames to read and a list of dataset names to create, so that the code need only appear once. I have looked at several instructional web pages about using macros, but I've been unable to translate that information into working code. Any help would be greatly appreciated. I'm using SAS 9.4, 64 bit.
I'd offer a modified version of kl78's suggestion, avoiding macros. Again, assuming you have the file names in a SAS data set, use a data step to read the list of file names and use call execute to run your proc import code for each file name.
data _null_;
set t_list;
call execute (
"proc import out = " || datasetname || "
datafile = '"|| filename ||"'
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;");
run;
So, suppose you have your file names and data set names in a table called t_list with the variables datasetname and filename, you could try something like this:
%macro readexcels;
data _null_;
set t_list nobs=nobs; /* nobs= is a SET statement option, so no parentheses */
call symputx(cat("libname_",_n_), datasetname);
call symputx(cat("filename_",_n_), filename);
if _n_=1 then
call symputx("nobs", nobs);
run;
%do i=1 %to &nobs;
proc import out = &&libname_&i
datafile = "&&filename_&i"
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;
%end;
%mend;
%readexcels;
In the data step you read every row of your table with datasetname and filename and create macro variables with a numeric suffix. You only need to create the macro variable for the number of entries once, so I did it when _n_ = 1; you could also do this at end of file.
Then you have a %do loop, and on every iteration you read the specific Excel file and write it to the specific data set.
You need to write it like &&libname_&i, because on the first pass this resolves to &libname_1, and on the second pass that resolves to the variable's value.
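The two-pass resolution can be seen in isolation with a few lines. The macro variable value here is made up for the demonstration.

```sas
%let i = 1;
%let libname_1 = work.sales;   /* hypothetical value for illustration */

/* pass 1: && becomes & and &i becomes 1, giving &libname_1
   pass 2: &libname_1 resolves to work.sales */
%put &&libname_&i;
```

Writing &libname_&i instead would make SAS look for a macro variable named libname_, which does not exist.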

Import text file into SAS

I'm importing a text file into SAS, using the code below :
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
However, I get error messages in the log and certain fields are populated with "." in place of the real data and I don't know what is the problem.
The error message is :
Invalid data for DIPL in line 26 75-76.
Invalid data for DIPL in line 28 75-76.
Invalid data for DIPL in line 31 75-76.
Invalid data for DIPL in line 34 75-76.
A sample of the data is available here http://m.uploadedit.com/b029/1392916373370.txt
Don't use PROC IMPORT in most cases for delimited files; you should use data step input. You can use PROC IMPORT to generate initial code (to your log), but most of the time you will want to make at least some changes. This sounds like one of those times.
data want;
infile "blah.dat" dlm=';' dsd lrecl=32767 missover;
informat
trans $1.
triris $1.
typc $6.
;
input
trans $
triris $
typc $
... rest of variables ...
;
run;
PROC IMPORT generates code just like this in your log, so you can use that as a starting point, and then correct things that are wrong (numeric instead of character, add variables if it has too few as the above apparently does, etc.).
I copied the text file from your link, and ran your code (without the apostrophe):
proc import datafile="C:\temp\test.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
And it worked fine despite the following:
Number of names found is less than number of variables found.
Result:
NOTE: WORK.INDIVS data set was successfully created.
NOTE: The data set WORK.INDIVS has 50 observations and 89 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
cpu time 0.26 seconds
If the log contains "Number of names found is less than number of variables found.",
then PROC IMPORT creates additional variables for the unnamed columns, and these have blank values.