I was given a list of files that need to be imported into SAS, however I am struggling to import them correctly. The files are formatted as such:
There is one "Header File" that contains a few lines of metadata followed by:
RECORD 1
Header column 1
Header column 2
Header column 3
Header column 4
Record 2
Header column 1
Header column 2
Header column 3
Header column 4
Header column 5
Header column 6
.
.
.
RECORD 3
.
.
.
And then "data files" which contain no meta data (that I am aware of) and are simply column ("|") delineated.
I was told these files were generated using SAS and I believed them to be a library, however:
Proc CIMPORT data="C..."
did not work.
I can import them individually using
Proc Import data="";
DBMS=DLM;
Run;
I asked this question earlier to no avail, I included more information this time. I feel like this is something that is really easy that I am just missing somehow. Thank you very much in advance.
You can use PROC IMPORT to read the pipe delimited files. Use the getnames=no; statement to tell it to generate its own names. You can then use the metadata from the first file to generate RENAME statements to change the names.
PROC CIMPORT is for reading transport files generated by PROC CPORT.
For a more complete example with code please post some actual examples of the data files, especially the one with the metadata. If the metadata is complete then you could probably skip the PROC IMPORT and just use the metadata to directly write data steps to read the data files.
Related
I have a macro that I use to import Excel files from a Windows directory to SAS (version 9.3) on a Linux server. In general the macro has worked fine, but now I'm trying to import an Excel file with a column that contains mostly numeric data with some character records thrown in.
The variable looks something like this:
Var2
1111111
2222222
3333333
4444444
Multiple
5555555
H6666-01
The variable is getting read in as numeric so I'm losing the data in the fifth and seventh records. I've tried a few of the suggestions listed in this answer, but nothing seems to change the variable type.
Here's a portion of the macro I have:
proc import replace
out=&d_set
dbms=excelcs
file="\\path\to\file\&xlsx_nm";
sheet="&sheet_nm";
server="Server";
port=0000;
serveruser="&sysget_USER";
serverpassword="&pw";
range="&rng";
DBDSOPTS = "DBTYPE=(Var2='CHAR(8)')";
run;
I just added the statement DBDSOPTS = "DBTYPE=(Var2='CHAR(8)')"; based on the suggestion on the link above, but the output in the log did not change.
I have also tried padding the original Excel file with a "dummy" record (which I'd like to avoid) with character data in the column that I'm having issues with, but this also did not work.
I'd like to solve this in the import procedure but I'm open to other suggestions.
I have a list of ~100 files. The first file contains header information for the other 98 data files. The information should be in table format, however each table is a different size (with regards to column and row number).
My goal is to import these files such that the column headers from the first file are correctly assigned.
Additional information:
I am told this list of files was generated using SAS (however I am not familiar with the file format) Furthermore, the "CIMPORT" command does not work on these files.
The files are "|" delineated
Thank you very much for any help.
This was a fun issue. I came up with following way:
First lets load up some data.
proc import datafile = "\\Datadrive\mydata.csv"
out=w_headers;
delimiter=";";
guessingrows=32767;
run;
proc import datafile = "\\Datadrive\no_headers.csv"
out=no_headers;
delimiter=";";
guessingrows=32767;
run;
Then I extract the names of the columns and variable number to a dataset.
proc contents data=w_headers out=meta(keep=NAME VARNUM) noprint ; run ;
Then I create commands to renaming the columns without names to have proper names based on the existing. ones.
data meta;
set meta;
cmd = cats('VAR',VARNUM,'=', name);
run;
Here comes the kicker, I put the the commends to a variable. Next the variable is fed to proc datasets for renaming the columns.
proc sql noprint;
select cmd into :cmd_list separated by ' ' from meta;
quit;
proc datasets library = work nolist;
modify no_headers;
rename &cmd_list;
quit;
At this point my two datasets have identical column names. the method is a bit tricky, but works. I'm sure there is another way, but this was fun one. :)
I have no working knowledge of SAS, but I have an excel file that I need to import and work with. In the excel file there are about 100 rows (observations) and 7 columns (quantities). In some cases, a particular observation may not have any data in one column. I need to completely ignore that observation when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows in the excel file with missing data, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
datafile = 'C:\...\file.xlsx'
dbms = xlsx
out = xldata
replace;
mixed = YES;
getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement.
REPLACE replaces the dataset created the first time you run this step.
MIXED=YES tells SAS that the data may be of mixed types.
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
set xldata;
where cmiss(col1, col2, ..., col7) = 0;
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: proc setinit) you can import the Excel file with this code. The where option lets you select which rows you want to keep, in this example you will keep the rows where the column "name" is not blank:
proc import
DATAFILE="your file.xlsx"
DBMS=XLSX
OUT=resulttabel(where=(name ne ""))
REPLACE;
MIXED=YES;
QUIT;
is it possible to read only specific variabiles from text files into sas as sas dataset using proc import? I have very big data in my text file which contains like 1000 observations and more than 42,000 variables. I tried reading this file into sas using proc import but i failed in doing so i thought may be because of size issues. Now i decided to read only specific variables (columns) really i do not need all of them from this big text file so that i can reduce size of file to read into sas system. is there any ideas to read like this by using data step? Suggestion or help would be appreciated
Thanking you very much,
If you're talking about a delimited text file, you can read in specific variables, contingent on them being consecutive from the first variable. You can't use PROC IMPORT to do this, however; you'd have to write the datastep yourself, though you could conceivably have PROC IMPORT help you write it.
For example, if you have 10 variables and only want the first 3, then you can read it in like this:
data want;
infile "mydata.txt" dlm=',' lrecl=255 missover dsd;
input x $ y $ z $;
run;
However, you must read in all variables from the first variable to the last variable you are interested in, even if you aren't interested in all of the intermediate variables. You can't "skip" them with a delimited text file.
If you have a fixed width text file, you can read in whatever columns you want (but you can't use PROC IMPORT to read in a fixed width text file).
I need to import data from European Social Survey databank to SAS.
I'm not very good at using SAS so I just naively tried importing the text file one gets but it stores it all in one variable.
Can someone maybe help me with what to do? Since there doesn't seem to be a guide on their webpage I reckon it has to be pretty easy.
It's free to register (and takes 5 secs) and I need all possible data for Denmark.
Edit: When downloading what they call a SAS file, what i get is a huge proc format and the same text file as one gets by choosing text.
The data in the text file isn't comma separated and the first row does not contain variable names.
Download it in SAS format. Save the text file in a location you can remember, and open the SAS file. It's not just one big proc format; it's a big proc format followed by a datastep with input code. It was probably created by SPSS (it fits the pattern of an SPSS saved .sas file anyhow). Look for:
DATA OUT.ESS1_4e01_0_F1;
Or something like that (that's what it is when I downloaded it). It's probably about 3/4 of the way down the page. You just need to change the code:
INFILE 'ESS1_4e01_0_F1.txt';
or similar, to be the directory you placed the text file in. Create a LIBNAME for OUT that goes to wherever you want to permanently save this, and do that at the start of the .sas file, replacing the top 3 lines like so.
Originally:
LIBNAME LIBRARY '';
LIBNAME OUT '';
PROC FORMAT LIBRARY=LIBRARY ;
Change these to:
libname out "c:\mystuff\"; *but probably not c:\mystuff :);
options fmtsearch=(out);
proc format lib=out;
Then run the entire thing.
This is the best solution if you want the formatted values (value labels) and variable labels. If you don't care about that, then it might be easier to deal with the CSV like Bob shows.
But the website says yu can download SAS format, why don't you?
You need a delimiter if all goes into one column.
data temp;
length ...;
infile 'file.csv' dlm=',';
input ...;
run;
As Dirk says, the web site says you can download a SAS dataset directly. However, if there's some reason you don't want to do that, choose a comma separated file (CSV) and use PROC IMPORT. Here is an example:
proc import out=some_data
datafile='c:\path\somedata.csv'
dbms=csv replace;
getnames=yes;
run;
Of course, this assumes the first row contains column names.