I'm new to SAS and am having issues with using Linear Regression.
I loaded a CSV file and then in Tasks and Utilities > Tasks > Statistics > Linear Regression I selected WORK.BP (BP = filename) for my data. When I try to select my dependent variable SAS says "No columns are available."
The CVS file appears to have loaded correctly and has 2 columns so I can't figure out what the issue is.
Thanks for the help.
This is the code I used for loading the file:
data BP;
infile '/folders/myfolders/BP.csv' dlm =',' firstobs=2;
input BP $Pressure$;
run;
And this is what the output looks like
By running your code. you import the .csv file with the 'PRESSURE' variable as character variable; in a linear regression model, you need to have all varaible as _numeric_.
In order to do this, I suggest to use the PROC IMPORT to import a .csf format file, instead of a DATA step with an INPUT statement.
In your case, you shold follows those steps:
Define the path where the .csv file is located:
%let path = the_folder_path_where_the_csv_file_is_located ;
Define the number of rows from which start your data (by including the labels/variables names in the count):
%let datarow = 2;
Import the .csv file, here named 'BP', as follows:
proc import datafile="&path.\BP.csv"
out=BP
dbms=csv
replace;
delimiter=",";
datarow=&datarow.;
getnames=YES;
run;
I assumed that the file you want as output has to be called BP too (you'll find it in the work library!) and that the delimiter is the colon.
Hope this helps!
Related
I am trying to download a file from SAS and import it to Hadoop.
Its a huge dataset - 6GB.
When I export the sas dataset to csv file and then import back to sas.(as I was facing few in issues in hadoop, I tried importing back to SAS and verify values). The import shows problems in the dataset in the same tool itself..
The column values are jumbled up.
Few columns have junk values, few columns are overlapped
How can I export the dataset in csv format with the column values intact.
filename output 'AAA.csv' encoding="utf-8";
Proc export data= input_data
outfile= output
dbms = CSV;
run;
Just a guess, but try removing any end of line characters that might exist in your character strings.
For example you could use a simple data step view to convert the strings on the fly. Here is one that replaces any CR or LF character with a pipe character.
data for_export / view=for_export;
set input_data;
array _c _character_;
do over _c;
_c = translate(_c,'||','0D0A'x);
end;
run;
proc export data=for_export outfile=output dbms=CSV;
run;
You might also watch out for backslash characters. Some readers try to interpret those as an escape character.
Use Case: I have 200+ SAS data sets whose names are in an Excel file. I want to write a SAS script which reads the names of the datasets (2 at a time) from Excel and run a proc compare for them.
I do not want to write 200+ proc compare statements. Is there a way to attain this?
P.S. I am an absolute beginner in SAS
I assumed the excel file has two columns, called Set1 and Set2, and that we want to compare the Set1 in a row with set2 in the same row. I've included a picture of my test data.
The code below provides a simple solution. Import the list of data sets, write code for the compare steps for each row in the imported list, run the code. You can of course easily add options or modifications to the comapare steps by adding them in the quotes in the put statement. Replace "yourpath" and "yourfile" in the macro call with your actual path and file name.
%macro compare (file=);
/* Import list*/
proc import dbms=xlsx file="&file" out=list replace;
run;
/* For each line in list, write code for a proc compare step to a temporary text file.*/
filename code temp;
data _null_;
set list;
file code;
put "proc compare data=" set1 " compare=" set2 "; run;";
run;
/* Run code*/
%include code;
%mend;
%compare(file=%str(yourpath\yourfile.xlsx))
One technique is to import the data set names data and use a data step that generates the step code and invokes it with call execute.
For example:
proc import ... out=namesOfDataSets;
...
data _null_;
set namesOfDataSets;
stepcode = 'Proc Compare base=' || dataset1 || ' compare=' || dataset2 || ';';
call execute (stepcode);
if _n_ = 3 then stop; * remove this line once you are sure the step code is ok and you want to do all 200 ;
run;
We can import an XLS file using namerow and startrow, like in this example :
%let dir_n=TheDir_name;
%let fichimp=file_name.xls;
PROC IMPORT DATAFILE= "&dir_n.\&file_name."
out=want
dbms=xls replace;
sheet=theSheet_name;
getnames=no;
namerow=2;
startrow=3;
run;
I have read : To import XLSX file, use RANGE if the data is not starting on the first line.
Is there similar option to STARTROW to import XLSX file starting from a specific row?
No, there is not. dbms=XLSX only has a limited set of options, listed in the documentation: GETNAMES, SHEET, and RANGE.
EXCEL has a few more options (including DBDSOPTS which opens up several database-type options), but still uses range to control what is read in.
I have just started learning SAS, and I'm using the following code to read xlsx files:
proc import out = data_lib.dataset_1
datafile = 'C:\data_folder\data_file_1.xlsx'
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;
This has been working fine for me, but I'd like to supply the code with a list of filenames to read and a list of dataset names to create, so that the code need only appear once. I have looked at several instructional web pages about using macros, but I've been unable to translate that information into working code. Any help would be greatly appreciated. I'm using SAS 9.4, 64 bit.
I'd offer a modified version of kl78's suggestion, avoiding macros. Again, assuming you have the file names in a SAS data set, use a data step to read the list of file names and use call execute to run your proc import code for each file name.
data _null_;
set t_list;
call execute (
"proc import out = " || datasetname || "
datafile = '"|| filename ||"'
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;");
run;
So, suppose you have your filenames and datanames in a table called t_list with variablename datasetname and filename, you could try something like this:
%macro readexcels;
data _null_;
set t_list (nobs=nobs);
call symputx(cat("libname_",_n_), datasetname);
call symputx(cat("filename_",_n_), filename);
if _n_=1 then
call symputx("nobs", nobs);
run;
%do i=1 %to &nobs;
proc import out = &&libname_&i;
datafile = "&&filename_&i"
dbms = xlsx replace;
sheet = 'Sheet1';
getnames = yes;
run;
%end;
%mend;
%readexcels;
In the datastep you read every entry of your table with datasetname and listname and create macrovariables with a numeric suffix. You only need to create a macrovariable for the number of entries once, so i did it when n = 1, you could also do this at eof.
Then you have a do loop, and with every loop you read the specific excel and write it in the specific dataset.
You need to write it like &&libname&i, because at first this resolves to &libname_1, and after this resolves to the variablevalue...
Situation: i have a workbook .xls with 4 spreadsheets named "SheetA", "SheetB", "SheetC", "SheetD".
For import one spreadsheet i do as following.
proc import
out = outputtableA
datafile = "C:\User\Desktop\excel.xls"
dbms = xls replace;
sheet = 'SheetA';
namerow = 3;
startrow = 5;
run;
All spreadsheet have same number of variables and format. I would like to combine all four outputtableX together using data step:
data combinedata;
set outputtableA outputtableB outputtableC outputtableD;
run;
I am new to SAS, i m thinking whether array and do-loop can help.
I would not use a do loop (as they're almost always overly complicated). Instead, I would make it data driven. I also would use Reese's solution if you can; but if you must use PROC IMPORT due to the namerow/datarow options, this works.
First, create the libname.
libname mylib excel "c:\blah\excelfile.xls";
We won't actually use it, if you prefer the xls options, but this lets us get the sheets.
proc sql;
select cats('%xlsimport(sheet=',substr(memname,1,length(memname)-1),')')
into :importlist separated by ' '
from dictionary.tables
where libname='MYLIB' and substr(memname,length(memname))='$';
quit;
libname mylib clear;
Now we've got a list of macro calls, one per sheet. (A sheet is a dataset but it has a '$' on the end.)
Now we need a macro. Good thing you wrote this already. Let's just substitute a few things in here.
%macro xlsimport(sheet=);
proc import
out = out&sheet.
datafile = "C:\User\Desktop\excel.xls"
dbms = xls replace;
sheet = "&sheet.";
namerow = 3;
startrow = 5;
run;
%mend xlsimport;
And now we call it.
&importlist.
I leave as an exercise for the viewers at home wrapping all of this in another macro that is able to run this given a filename as a macro parameter; once you have done so you have an entire macro that operates with little to no work to import an entire excel libname.
If you an xls file and are using a 32 bit version of SAS something like this would work:
libname inxls excel 'C:\User\Desktop\excel.xls';
proc datasets library=excel;
copy out=work;
run; quit;
libname inxls;
Then you can do your step above to append the files together. I'm not sure Proc Import with excel recognizes the option name row and start row so you may need to modify your code somehow to accommodate that, possibly using firstobs and then renaming the variables manually.
What you have will work assuming the variable names are the same. If they are not use the rename statement to make them all the same.
data combinedata;
set outputtableA(rename=(old_name1=new_name1 old_name2=new_name2 ... ))
outputtableB(...)
...
;
run;
Obviously, fill in the ellipses.