How to use filters when importing in SAS

I have a very large data table in "dsv" (delimiter-separated values) format and I'm trying to import it into SAS. However, I don't have enough space to import the full table and then filter it (I've done that for smaller tables).
Is there any way to filter the data while importing it, since in the end I will only use part of that table? For example, I want to import only the rows that have the value 103 for Var2.
PS: I'm using PROC IMPORT rather than a DATA step with INFILE because I don't know the exact number of columns.
Var1  Var2  Var3
A10   103   Test
A02   102   Hiis
...   ...   ...
Thank you

You can add dataset options to the dataset listed in the OUT= option of PROC IMPORT.
Example:
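filename dsv temp;
/* Write a small pipe-delimited sample file to test with */
data _null_;
input (var1-var3) (:$20.);
file dsv dsd dlm='|';
put var1-var3;
cards;
Var1 Var2 Var3
A10 103 Test
A02 102 Hiis
;
/* WHERE= on the OUT= dataset keeps only the matching rows */
proc import file=dsv dbms=csv out=want(where=(var2=102)) replace ;
delimiter='|';
run;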
The result is a dataset with just one observation.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
If you don't know the name of the second variable you could always just read the header row first and put the name into a macro variable.
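/* Read only the header row; (2*name) reads two fields into NAME, */
/* so it ends up holding the second column name                   */
data _null_;
infile dsv dsd dlm='|' truncover obs=1;
input (2*name) (:$32.);
call symputx('var2',nliteral(name));
run;
proc import file=dsv dbms=csv out=want(where=(&var2=102)) replace ;
delimiter='|';
run;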

You can add a WHERE= dataset option to the OUT= option. For example:
proc import
file = 'myfile.txt'
out = want(where=(var2=103))
...;
run;
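The WHERE= option accepts any WHERE expression, and since disk space is the concern it can be paired with KEEP= so that only the needed rows and columns are ever stored. Below is a minimal sketch under assumed sample data: the file contents, the fileref DSV2 and the dataset name WANT2 are hypothetical. It uses DBMS=DLM, where the DELIMITER statement is the documented way to name the separator.

```sas
/* Hypothetical sample file: pipe-delimited, header row plus three records */
filename dsv2 temp;

data _null_;
  file dsv2;
  put 'Var1|Var2|Var3';
  put 'A10|103|Test';
  put 'A02|102|Hiis';
  put 'A11|104|Test';
run;

/* Keep two columns, and only the rows where Var2 is 103 or 104; */
/* the other rows and Var3 are never written to WANT2.           */
proc import datafile=dsv2 dbms=dlm
            out=want2(keep=var1 var2 where=(var2 in (103 104)))
            replace;
  delimiter='|';
  getnames=yes;
run;
```

The variable used in WHERE= (Var2) is also in the KEEP= list, so the result does not depend on the order in which the two options are applied.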

Related

SAS proc export without comma thousands

I noticed in the SAS log that when I run proc export data=mydata outfile="csv.csv" dbms=csv replace; run;, the internally generated DATA step declares a comma format, comma20.3:
138 format YEAR best12. ;
145 format RATE_SPREAD comma20.3 ;
How can I get PROC EXPORT not to do this, and export without thousands separators, e.g. 9000 instead of 9,000?
Unfortunately PROC EXPORT does not support the FORMAT statement.
You could create a view of the original data with the format removed and export that instead.
data for_export / view=for_export;
set mydata;
format rate_spread ;
run;
proc export data=for_export outfile="csv.csv" dbms=csv replace;
run;
But you don't really need PROC EXPORT to write a CSV file; a data step works just as well. You just have to do a little extra work to write the header row.
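/* Get the variable names as rows of a dataset (no data values needed) */
proc transpose data=mydata(obs=0) out=names ;
var _all_;
run;
/* Write the header row; the trailing @ keeps all names on one line */
data _null_;
file "csv.csv" dsd ;
set names;
put _name_ @;
run;
/* Append the data rows, with the COMMA format removed */
data _null_;
file "csv.csv" dsd mod ;
set mydata;
put (_all_) (+0);
format rate_spread ;
run;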
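As an alternative to the PROC TRANSPOSE trick, the header line can also be built from DICTIONARY.COLUMNS with PROC SQL. This is a sketch, not the answer's method. The stand-in MYDATA values and the fileref CSVOUT are hypothetical, and variable names are written exactly as they are stored. Like the answer, it writes the header and the data in two separate steps, the second using MOD.

```sas
/* Hypothetical stand-in for MYDATA */
data work.mydata;
  YEAR = 2020;
  RATE_SPREAD = 9000;
  format YEAR best12. RATE_SPREAD comma20.3;
run;

/* Collect the variable names, in column order, as one comma-separated string */
proc sql noprint;
  select name into :hdr separated by ','
    from dictionary.columns
    where libname = 'WORK' and memname = 'MYDATA'
    order by varnum;
quit;

filename csvout temp;

/* Header line */
data _null_;
  file csvout;
  put "&hdr";
run;

/* Data lines, appended; the FORMAT statement drops COMMA20.3 for this step only */
data _null_;
  file csvout dsd mod;
  set work.mydata;
  put YEAR RATE_SPREAD;
  format RATE_SPREAD;
run;
```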

Mixed Delimiters in Proc Export

Is there a way to make the first delimiter in each observation different from the rest? In Microsoft SQL Server Integration Services (SSIS), there is an option to set the delimiter per column. I wonder whether there is a similar way to do this in SAS by amending the code below, so that the first delimiter is a tab and the rest are pipes:
proc export
dbms=csv
data=mydata.dataset1
outfile="E:\OutPutFile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt"
replace
label;
delimiter='|';
run;
For example, from:
var1|var2|var3|var4
to:
var1<TAB>var2|var3|var4
...where <TAB> stands for a tab character between var1 and var2.
Many thanks in advance.
It sounds like you just want to create a new variable that combines the first two variables with a tab between them, and then write that out using a pipe delimiter.
data fix ;
length new1 $50 ;
set have ;
/* Join VAR1 and VAR2 with a tab ('09'x) between them */
new1=catx('09'x,var1,var2);
drop var1 var2 ;
run;
proc export data=fix ... delimiter='|' ...
Note that you can reference a variable in the DLM= option on the FILE statement in a data step.
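data _null_;
/* DLM= can name a variable; its current value is used as the delimiter */
dlm='09'x ;
file 'outfile.txt' dsd dlm=dlm ;
set have ;
/* The trailing @ holds the line so the next PUT continues it */
put var1 @ ;
dlm='|' ;
put var2-var4 ;
run;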
Or you could use the catx() trick in a data _null_ step. You might also want to use the vvalue() function to ensure formats are applied.
data _null_;
length newvar $200;
file 'outfile.txt' dsd dlm='|' ;
set have ;
newvar = catx('09'x,vvalue(var1),vvalue(var2));
put newvar var3-var4 ;
run;
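To see why vvalue() matters here: CATX() converts a numeric argument with a BEST. format, so a date variable would come out as its raw day count rather than its formatted value. The small sketch below contrasts the two; the dataset DEMO and the variables D, TXT, RAW and FMT are hypothetical.

```sas
data demo;
  length raw fmt $40;
  d = '01JAN2020'd;
  txt = 'x';
  format d date9.;
  /* CATX converts D with a BEST. format: the raw SAS date value */
  raw = catx('09'x, d, txt);
  /* VVALUE returns D as formatted by DATE9. */
  fmt = catx('09'x, vvalue(d), txt);
  put raw= fmt=;
run;
```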
Final code, based on Tom's accepted answer:
data _null_;
dlm='09'x ;
file "E:\outputfile_%sysfunc(putn("&sysdate9"d,yymmdd10.)).txt" dsd dlm=dlm ;
set work.have;
put var1 @ ;
dlm='|';
put var2 var3 var4;
run;

Value labels to be created using data from another data set

I have two data sets. The first has airport codes (JFK, LGA, EWR) in a variable 'airport'. The second has a list of all the major airports in the world, with two variables: 'faa', holding the FAA code (like JFK, LGA, EWR), and 'name', holding the actual name of the airport (John F. Kennedy, LaGuardia, etc.).
My requirement is to create value labels for the first data set, so that the actual name of the airport appears instead of the airport code. I know I can use custom formats to achieve this. But can I write SAS code that reads the unique airport codes, gets the names from the other data set, and creates the value labels automatically?
PS: Otherwise, the only option I see is to use MS Excel to get the unique list of FAA codes in dataset 1, then use VLOOKUP to get the airport names, and then create a custom format by listing each unique FAA code with its airport name.
I think "value label" is SPSS terminology. It looks like you want to create a format. Just use your lookup table to create an input (CNTLIN=) dataset for PROC FORMAT.
So if your second table looks like this:
data table2;
length FAA $4 Name $40 ;
input FAA Name $40. ;
cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
You can use this code to convert it into a dataset that PROC FORMAT can use to create a format.
data fmt ;
fmtname='$FAA';
hlo=' ';
set table2 (rename=(faa=start name=label));
run;
proc format cntlin=fmt lib=work.formats;
run;
Now you can use that format with your other data.
proc freq data=table1 ;
tables airport ;
/* $FAA is a character format, so the $ is part of its name */
format airport $faa. ;
run;
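If codes missing from the lookup table should show a fallback label (as the macro answer below does with OTHER=), the CNTLIN dataset can carry that too: a row with HLO='O' defines the OTHER range. This is a sketch assuming the TABLE2 layout above; the fallback text 'Unknown FAA code' is illustrative.

```sas
/* Lookup table in the layout used above */
data table2;
  length FAA $4 Name $40;
  input FAA Name $40.;
  cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
run;

/* One row per code, plus one HLO='O' row that covers every other value */
data fmt;
  length fmtname $8 start $4 label $40 hlo $1;
  set table2(rename=(faa=start name=label)) end=last;
  fmtname = '$FAA';
  hlo = ' ';
  output;
  if last then do;
    start = ' ';
    label = 'Unknown FAA code';
    hlo = 'O';
    output;
  end;
run;

proc format cntlin=fmt;
run;
```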
First, consider whether a format is really what you need. For example, you could simply do a left join to retrieve the airport name column from table2 (the FAA-to-name table).
Anyway, I believe the following macro does the trick:
Create auxiliary tables:
data have1;
input airport $;
datalines;
a
d
e
;
run;
data have2;
input faa $ name $;
datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;
Macro to create Format:
%macro create_format;
%local n i;
/* count the number of FAA codes */
proc sql noprint;
select count(faa) into :n
from have2;
quit;
/* create one macro variable per FAA code and per name */
proc sql noprint;
select faa, name
into :faa1-:faa%left(&n), :name1-:name%left(&n)
from have2;
quit;
/* create the format; &&faa&i resolves to the value of FAA1, FAA2, ... */
proc format;
value $airport
%do i=1 %to &n;
"&&faa&i" = "&&name&i"
%end;
other = "Unknown FAA code";
run;
%mend create_format;
%create_format;
Apply format:
data want;
set have1;
format airport $airport.;
run;
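The left join suggested at the start of this answer could look like the sketch below, which attaches the name as a new column instead of defining a format. It reuses the hypothetical HAVE1/HAVE2 sample tables; the output name WANT_JOIN, the column AIRPORT_NAME and its length of 40 are assumptions.

```sas
/* Same hypothetical sample tables as above */
data have1;
  input airport $;
  datalines;
a
d
e
;
run;

data have2;
  input faa $ name $;
  datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;

/* Left join keeps every row of HAVE1; unmatched codes get a fallback text */
proc sql;
  create table want_join as
  select h1.airport,
         coalesce(h2.name, 'Unknown FAA code') as airport_name length=40
    from have1 as h1
         left join have2 as h2
           on h1.airport = h2.faa
    order by h1.airport;
quit;
```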

import csv file in SAS virtual machine

I'm new to SAS. I have big data, around 3000 rows and 10 columns, in a CSV file that I want to import into SAS, but I have a Mac and I run SAS in a virtual machine. How can I import it?
I tried copying it, but that doesn't work.
3000 rows isn't big! I can't comment on the specifics of your VM and file access configuration, but one easy way is to copy and paste your CSV values into SAS Studio and read them in using the DATALINES statement, e.g.:
/* set up a temp fileref to hold your csv */
filename tmp temp;
/* read in the raw data using datalines, and write to fileref */
data _null_;
infile datalines ;
file tmp ;
input;
put _infile_;
datalines;
col1,col2,col3,col4
your,data,goes,here
see,how,it,works?
;
run;
/* import the csv any way you like */
proc import datafile=tmp out=work.want dbms=csv replace;
getnames=yes;
run;
A more efficient option would be to build the dataset directly from the datalines. I'll leave it to you to decide which is more convenient, but here's a head start:
data work.want;
infile datalines delimiter=',';
input col1 $ col2 $ col3 $ col4 $;
datalines;
your,data,goes,here
see,how,it,works?
;
run;
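One refinement worth considering for real CSV data: without the DSD option, consecutive commas are treated as a single delimiter, so an empty field shifts the later values one column to the left. A minimal sketch with made-up rows and an assumed length of 20 characters per column:

```sas
data work.want;
  /* DSD: two commas in a row mean a missing value                       */
  /* TRUNCOVER: a short line does not make INPUT jump to the next line   */
  infile datalines dsd truncover;
  length col1-col4 $20;
  input col1-col4;
  datalines;
your,data,goes,here
see,,it,works?
;
run;
```

Here the second row gets a blank COL2 and 'it' in COL3; without DSD, 'it' would land in COL2.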

SAS: PROC IMPORT: CSV WITH DATES AS VAR NAMES

I'm importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I'm using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
Then transposing the data to long format, as follows:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from default to date?
Importing and transposing are handy procedures, but if you understand your data well, a little data step program can deal with this in one step:
data want(keep=sedol v_date v_value);
infile have dsd dlm=',' truncover;
informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
format v_date yymmdd10.;
array d(50) d1-d50;
array v(50) v1-v50;
/* Retain the date values and the count of dates */
retain d1-d50 idx;
/* Read header */
if _n_ = 1 then do;
input sedol d1-d50;
/* loop to find how many date columns there are */
do idx=1 to 50 while(d(idx) ne .);
end;
idx = idx - 1; /* must subtract one here */
delete;
end;
/* Read data lines */
input sedol v1-v50;
do i=1 to idx;
v_date = d(i);
v_value = v(i);
output;
end;
run;
As long as your input file is exactly as you describe (a header record with a leading ID variable of at most 8 characters, followed by some number of date values as column headings), this will process up to 50 measurements. It should be easy to modify if your needs change.
In this case I would suggest importing the data and the headers separately.
First, we import the data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row, which holds the variable names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: we add a letter as the first character (it doesn't matter which one; I chose 'D') and replace all slashes '/' with underscores '_':
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
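The header row could also be read directly with a DATA step, which avoids changing the global OBS= option and does the masking in the same pass. This is an alternative sketch, not the answer's code: the three-column sample file and the fileref HAVECSV are assumptions, and the loop stops at the first empty header field.

```sas
/* Hypothetical sample file with the question's layout */
filename havecsv temp;

data _null_;
  file havecsv;
  put 'SEDOL,12/08/2009,13/08/2009';
  put 'B1YVN39,7.8431,7.8431';
run;

/* One output row per header field, already masked as a valid SAS name */
data header;
  infile havecsv dsd truncover obs=1;
  length COL1 $32;
  do until (COL1 = ' ');
    /* trailing @ keeps reading fields from the same header line */
    input COL1 @;
    if COL1 ne ' ' then do;
      if anydigit(substr(COL1, 1, 1)) then COL1 = cats('D', COL1);
      COL1 = translate(COL1, '_', '/');
      output;
    end;
  end;
run;
```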
Put these new "cleaned" column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
Then generate a DATA step for the renaming with the CALL EXECUTE routine:
data _null_;
dsid=open("want","i");
num=attrn(dsid,"nvars");
call execute("data want;");
call execute("set want;");
call execute("rename");
do i=1 to num;
call execute(varname(dsid,i)||"="||scan("&names",i," "));
end;
call execute(";run;");
rc=close(dsid);
run;
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
Finally, "unmask" the dates (delete the leading D and replace _ with /) and convert them to real dates with INPUT(). The RETAIN statement is there only to place the new variable DATE second, right after SEDOL.
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;