How to generate test data with no input data in SAS?
Why don't you use one of the example datasets in the SASHELP library?
For instance
proc print data=help.class;
run;
More examples can be found here
Another alternative is to use inline data,
like in
data person;
infile datalines delimiter=',';
input name $ dept $;
datalines;
John,Sales
Mary,Acctng
;
run;
If you realy want to use no input at all, use a data step with a loop.
like in
data whatever;
do Key = -99 to 99 by 2;
Square= Key*Key;
Value= rand("Uniform");
output;
end;
run;
There is the sashelp library, but if you are looking for something different - check out the Bizarro Ball data here: https://github.com/allanbowe/BizarroBall
To import, just run the following:
filename bizarro url
"https://raw.githubusercontent.com/allanbowe/BizarroBall/master/bizarroball.sas";
%inc bizarro;
Related
TOPIC: Change dataset with loop count into append statement
I have a macro that will loop and create a new dataset with a counter behind.
Code like this:
PROC IMPORT OUT=WORK.out&i DATAFILE= "&dir/&name"
/excelout/
DBMS=csv REPLACE; delimiter='09'x; getnames=no; RUN;
data test&i (drop= %do k=1 %to &cnt; &&col&k.. %end;
);
length station $10 voltage $10 year 8 month $20 transformer $10 Day $20 Date Time MW_Imp MW_Exp MVAR_Imp MVAR_Exp MVA
Power_Factor 8; format Time hhmm.; set out&i. end=last;
Currently the script will generate about 4 data sets if i have 4 external files by PROC IMPORT.
What i want is to eliminate the creation of multiple datasets but just append them into the master file. Is there a way to do so?
An append statement inside the loop should be sufficient to achieve this. SAS will copy first dataset as base since it was not existing.
proc append base=test data=test&i force; run;
Appending is probably just as easy, but if you don't want to create many datasets to begin with, you could use a data step to read in several files at once, using wildcards. That would eliminate the need to loop through the files, but does require that the files have the same structure and aren't stored in a folder with other similarly named files. The firstobs-option caused som issues in my tests, but as you have specified getnames=no in your import, I guess you have no need for it.
The snippet below inputs all csv files in c:\test.
data test;
infile "c:\test\*.csv" dsd delimiter='09'x;
input varA $ varB $;
run;
I am having two data sets. The first data set has airport codes (JFK, LGA, EWR) in a variable 'airport'. The second dataset has the list of all major airports in the world. This dataset has two variables 'faa' holding the FAA Code (like JFG, LGA, EWR) and 'name' holding the actual name of the airport (John. F Kennedy, Le Guardia etc.).
My requirement is to create value labels for in the first data set, so that instead of airport code, the actual name of the airport comes up. I know I can use custom formats to achieve this. But can I write SAS code which can read the unique airport codes, then get the names from another data set and create a value label automatically?
PS: Other wise, the only option I see is to use MS Excel to get the unique list of FAA codes in dataset 1, and then use VLOOKUP to get the names of the airports. And then create one custom format by listing each unique FAA code and the airport name.
I think "value label" is SPSS terminology. Looks like you want to create a format. Just use your lookup table to create an input dataset for PROC FORMAT.
So if your second table looks like this:
data table2;
length FAA $4 Name $40 ;
input FAA Name $40. ;
cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
You can use this code to convert it into a dataset that PROC FORMAT can use to create a format.
data fmt ;
fmtname='$FAA';
hlo=' ';
set table2 (rename=(faa=start name=label));
run;
proc format cntlin=fmt lib=work.formats;
run;
Now you can use that format with your other data.
proc freq data=table1 ;
tables airport ;
format airport faa. ;
run;
Firstly, consider if it is really a format what is needed. For example, you may just do a left join to retrieve the column (airport) name from table2 (FAA-Name table).
Anyway, I believe the following macro does the trick:
Create auxiliary tables:
data have1;
input airport $;
datalines;
a
d
e
;
run;
data have2;
input faa $ name $;
datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;
Macro to create Format:
%macro create_format;
*count number of faa;
proc sql noprint;
select distinct count(faa) into:n
from have2;
quit;
*create macro variables for each faa and name;
proc sql noprint;
select faa, name
into:faa1-:faa%left(&n),:name1-:name%left(&n)
from have2;
quit;
*create format;
proc format;
value $airport
%do i=1 %to &n;
"&faa%left(&i)" = "&name%left(&i)"
%end;
other = "Unknown FAA code";
run;
%mend create_format;
%create_format;
Apply format:
data want;
set have1;
format airport $airport.;
run;
I am looking to create multiple datasets from city_variables dataset. There are a total of 58 observations that I summed up into macrovariable (&count) to stop the do loop.
The city_variables dataset looks like (vertically ofcourse):
CITY_NAME
City1
City2
City3
City4
City5
City6
City7
City8
City9
City10
..........
City58
I created macrovariable &name from a data null statement in order to input the cityname into the dataset name.
Any help would be great on how to automate the creation of the 48 files by name (not number). Thanks again.
/Create macro with number of observations in concordinate file/
proc sql;
select count(area_name);
into :count
from main.state_all;
quit;
%macro repeat;
data _null_;
set city_variables;
%do i= 1 %UNTIL (i = &count);
call symput('name',CITY_NAME);
run;
data &name;
set dataset;
where city_name = &name;
run;
%end;
%mend repeat;
%repeat
Well, if you're going to do
proc sql;
select count(area_name);
into :count
from main.state_all;
quit;
Then why not go all the way? Make a macro that does one dataset output, given the criteria as parameters, then make one call for each separate whatever-name. This might be close to what you're looking at.
%macro make_data(data_name=, set_name=, where=);
data &data_name.;
set &set_name.;
where &where.;
run;
%mend make_data;
proc sql;
select
cats('%make_data(data_name=',city_name,
', set_name=dataset, where=city_name="',
city_name,
'" )')
into :make_datalist
separated by ' '
from main.state_all;
quit;
&make_datalist.;
Some other options that I'll just link to:
Chris Hemedinger # SAS Dummy blog How to Split One Data Set Into Many shows a similar concept except he doesn't put the macro wrapper where I do.
Paul Dorfman, Data Step Hash Objects as Programming Tools is the seminal paper on using a hash table to do this. This is the "fastest" way to do this, likely, if you understand hash tables and have the memory available.
You don't need to use a macro to automate splitting up your data in this way. Since your example is really simple, I would consider using call execute in a null data step:
data test;
infile datalines ;
input city_name $20.;
datalines;
City1
City2
City2
City3
City3
City3
;
run;
data _null_;
set test;
call execute("data "||strip(city_name)||";"||"
set test;
where city_name = '"||strip(city_name)||"';"||"
run;");
run;
filename Source 'C:\Source.txt';
Data Example;
Infile Source;
Input Var1 Var2;
Run;
Is there a way I can import all the variables from Source.txt without the "Input Var1 Var2" line? If there are many variables, I think it's too time consuming to list out all the variables, so I was wondering if there's any way to bypass that.
Thanks
Maybe you can use proc import ?
For a CSV I use this and I don't have to define every variable
proc import datafile="&CSVFILE"
out=myCsvData
dbms=dlm
replace;
delimiter=';';
getnames=yes;
run;
It depends on what you have in your txt file. Try different delimiters.
If you are looking at a solution which is INFILE statement based then following reference code should help.
data _null_;
set sashelp.class;
file '/tester/sashelp_class.txt' dsd dlm='09'x;
put name age sex weight height;
run;
/* Version #1 : When data has mixed data(numeric and character) */
data reading_data_w_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
format name $10. age 8. gender $1. weight height 8.2;
input (name--height) (:);
run;
proc print data=reading_data_w_format;run;
proc contents data=reading_data_w_format;run;
/* Version #2 : When all data can be read a character.
I know this version doesn't make sense, but it's still an option*/
data reading_data_wo_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
input (var1-var5) (:$8.); /* Length would be max length of value in all the columns */
run;
proc print data=reading_data_wo_format;run;
proc contents data=reading_data_wo_format;run;
I'd suggest to write down the informat for the variables to be read so that you are sure that the file is as per your specification. PROC IMPORT will try to scan the data first from 1st row till GUESSINGROWS(do not set it to high, if each column is of consistent length) value and based on the length and type, it will use an informat and length which it finds suitable for the reading the variables in the file.
I am new to SAS and facing few difficulties while creating following program.
My requirement is to pass the filename generated dynamically and read it so that don't have to write code five times to read data from 5 different files and then run freqs on the datasets.
I have provided the code below and have to write this code for more than 50 files:
Code
filename inp1 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw';
filename inp2 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw';
filename inp3 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw';
filename inp4 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw';
filename inp5 '/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw';
data test;
Do i = 1 to 5;
infile_name = 'inp' || i;
infile infile_name recfm = v lrecl=1800 end=eof truncover;
INPUT
#1 E_CUSTDEF1_CLIENT_ID $CHAR5.
#1235 E_MED_PLAN_CODE $CHAR20.
#1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
#1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
#1072 MED_COVERAGE_TERM_DATE $CHAR8.
;
if E_CUSTDEF1_CLIENT_ID ='00002' then
output test;
end;
run;
proc freq data = test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
Please help!!
Here's an example you can adapt. There are different ways to do this, but this is one- depending no how you want the frequencies.
Step 1: Create a dataset, 'my_filenames', that stores the filename you want to read in, one per line, in a variable FILE_NAME.
Step 2: Read in the files.
data my_data;
set my_filenames;
infile a filevar=file_name <the rest of your options>;
<your input statement>;
run;
proc freq data=mydata;
by file_name;
<your table statements>;
run;
This is simple, data driven code that doesn't require macros or storing large amounts of data in things that shouldn't have data in them (macro variables, filenames, etc.)
To directly answer your question, here is a SAS macro to read each file and run PROC FREQ:
%macro freqme(dsn);
data test;
infile "&dsn" recfm = v lrecl=1800 end=eof truncover;
INPUT #1 E_CUSTDEF1_CLIENT_ID $CHAR5.
#1235 E_MED_PLAN_CODE $CHAR20.
#1090 MED_INS_ELIG_COVERAGE_IND $CHAR20.
#1064 MED_COVERAGE_BEGIN_DATE $CHAR8.
#1072 MED_COVERAGE_TERM_DATE $CHAR8.
;
if E_CUSTDEF1_CLIENT_ID = '00002';
run;
proc freq data=test;
tables E_CUSTDEF1_CLIENT_ID*E_MED_PLAN_CODE / list missing;
run;
proc delete data=test;
run;
%mend;
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1102_t1102_c10216_vEL5535.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1103_t1103_c10317_vEL8312.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1104_t1104_c10420_vEL11614.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1105_t1105_c10510_vEL13913.raw);
%freqme(/chshttp/prod/clients/coms/raw/coms_coms_relg_f1106_t1106_c10628_vEL17663.raw);
Note that I added a PROC DELETE step to delete the SAS data set after creating the report. I did that more for illustration, since you don't say you need the file as a SAS data set for subsequent processing.
You can use this as a template for other macro programming.