I want to use PROC SQL to create a data set that contains statistics such as the min and max for a large number of variables. The code below returns a data set with the min and max for the first variable only; the min and max for the remaining variables do not appear in the data set.
proc sql;
CREATE TABLE Lib.VarNum AS
%do i=1 %to &nvars;
select min(%SCAN(&numvar,&i)) as Min%SCAN(&numvar,&i),
max(%SCAN(&numvar,&i)) as Max%SCAN(&numvar,&i)
from &data (keep= _numeric_);
%end;
quit;
Can somebody help me?
Using PROC MEANS is the best way to do this.
proc means data=sashelp.cars noprint;
var _numeric_;
output out=want (drop= _type_ _freq_ )min(_numeric_) =
max(_numeric_) =/autoname;
run;
But if you want to do it with PROC SQL, the easiest way is to build macro variables from dictionary.columns and use them in your query.
/* creating macrovariables using dictionary.columns*/
proc sql noprint;
select 'min('|| trim(name)||') as min_'||name,
'max('|| trim(name)||') as max_'||name
into :min separated by ',' , :max separated by ','
from dictionary.columns
where libname ='SASHELP'
and memname ='CARS'
and upcase(type) ='NUM';
quit;
The values of the macro variables can be checked with %put (only part of the output is shown here):
%put &min;
min(MSRP) as min_MSRP,min(Invoice) as min_Invoice,min(EngineSize) as min_EngineSize
Use these macro variables in a PROC SQL statement as shown below.
proc sql;
create table want as
select &min, &max from sashelp.cars;
quit;
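As for why the code in the question returns only the first variable: the semicolon after the FROM clause sits inside the %do loop, so the CREATE TABLE statement ends after the first iteration and every later iteration generates a separate, standalone SELECT. A minimal sketch of a macro that loops over the column expressions only, inside a single SELECT, might look like this. The macro name minmax, its parameters, and the Min_/Max_ naming are illustrative assumptions, and it assumes each variable name is short enough that the prefixed name stays within 32 characters.

```sas
/* Sketch only: minmax, its parameters, and the output names are hypothetical */
%macro minmax(data, numvar, out);
  %local i v nvars;
  %let nvars = %sysfunc(countw(&numvar, %str( )));
  proc sql;
    create table &out as
    select
    %do i = 1 %to &nvars;
      %let v = %scan(&numvar, &i, %str( ));
      /* emit a comma before every column pair except the first */
      %if &i > 1 %then ,;
      min(&v) as Min_&v, max(&v) as Max_&v
    %end;
    from &data;   /* one FROM clause and one semicolon for the whole statement */
  quit;
%mend minmax;

%minmax(sashelp.class, Age Height Weight, work.class_minmax);
```

The result is a one-row table with a Min_ and Max_ column for each variable in the list.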
I created more than 40 tables (using a macro, which I just learned how to do), and I would like to run PROC SORT on each of them. I want each table sorted by the same variable, 'Account_Description' (each table contains this variable).
The table names are June_53410_v1, June_53420_v1, June_53430_v1, etc. Can I use a macro to avoid writing a separate PROC SORT step for each table, and if so, how?
Thanks!
I found this sample code online, but I'm not really sure how it works:
%Macro sorter(dsn, var);
proc sort data=&dsn.;
by &var.;
run;
%mend;
%sorter(sample_dataset, age);
The macro that will be used (PROC SORT writes the sorted output to WORK):
%Macro sorter(lib,dsn, var);
proc sort data=&lib..&dsn. out=&dsn.;
by &var.;
run;
%mend;
Get the list of tables whose names contain certain characters (in your case this might be "June_" instead of "AIR"):
data sashelp_tables;
set sashelp.vtable;
where LIBNAME="SASHELP" and MEMNAME contains "AIR"
;
run;
Build the macro call as a string and execute it for every table:
data _NULL_;
length code $ 200;
set sashelp_tables;
code=cat('%sorter(',LIBNAME,',',MEMNAME,',AIR);');
call execute(code);
run;
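Adapted to the tables in the question, the same idea could be sketched as follows. This assumes the three-parameter %sorter macro above has already been compiled and that the June_ tables live in WORK; both are assumptions, not something the thread confirms. Wrapping the call in %nrstr is a common precaution that delays macro execution until after the DATA step finishes.

```sas
/* Sketch only: assumes %sorter(lib, dsn, var) is defined and the tables are in WORK */
data _null_;
  set sashelp.vtable;
  /* =: is a "starts with" comparison; member names are stored in upper case */
  where libname = 'WORK' and memname =: 'JUNE_';
  /* cats() strips trailing blanks from libname and memname */
  call execute(cats('%nrstr(%sorter(', libname, ',', memname,
                    ',Account_Description))'));
run;
```

Each generated call appears in the log after the DATA step ends, which makes it easy to confirm which tables were sorted.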
I appreciate everyone's input. I think I found an answer, though, using this code:
%macro st (ds);
proc sort data = &ds;
by Account_Description;
run;
%mend;
%st(June_53410_v1);
%st(June_53420_v1);
You can use this solution, where lib is the libname, mask_table is the prefix of the table names (June_ in your case), and var is the variable to sort the tables by:
%macro sorter(lib,mask_table, var); %macro d;%mend d;
%local i table_list n_tables table;
proc sql noprint;
/* build a space-separated list of libref.table names that start with the mask */
select strip(libname) || '.' || strip(memname)
into: table_list separated by ' '
from dictionary.tables
where libname = UPCASE("&lib.") and memname LIKE UPCASE("&mask_table.")||"%";
quit;
/* SQLOBS holds the number of rows selected, so the loop is skipped when nothing matches */
%let n_tables = &sqlobs;
%do i=1 %to &n_tables;
%let table = %scan(&table_list, &i, %str( ));
proc sort data=&table.;
by &var.;
run;
%end;
%mend sorter;
%sorter(WORK,June,Account_Description);
I am exporting the SAS contents to an Excel file, and it works well. However, the VARNUM option doesn't seem to work: the variables are in alphabetical order in the Excel sheet.
Here is the loop:
proc sql;
select count(Name) into :NumOfDatasets from Datas;
select Name into :Dataset1-:Dataset%trim(%left(&NumOfDatasets)) from datas;
quit;
%do index = 1 %to &NumOfDatasets;
proc contents data=&ImportLibrary..&&Dataset&index. varnum
out=&ExportLibrary..&&Dataset&index. (keep=name label);run;
proc export data=&ExportLibrary..&&Dataset&index.
outfile="&ExportLocation"
dbms=excelcs replace;
sheet="&&Dataset&index";
run;
%end;
The varnum option on proc contents affects only the report output of the procedure, not the dataset generated with the out= option.
You could just add a proc sort between your contents and export procedures (and move the keep= dataset option from the contents to the export procedure):
proc sql;
select count(Name) into :NumOfDatasets from Datas;
select Name into :Dataset1-:Dataset%trim(%left(&NumOfDatasets)) from datas;
quit;
%do index = 1 %to &NumOfDatasets;
proc contents data=&ImportLibrary..&&Dataset&index.
out=&ExportLibrary..&&Dataset&index.;
run;
proc sort data=&ExportLibrary..&&Dataset&index.;
by varnum;
run;
proc export data=&ExportLibrary..&&Dataset&index.(keep=name label)
outfile="&ExportLocation"
dbms=excelcs
replace;
sheet="&&Dataset&index";
run;
%end;
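An alternative that skips the sort step is to read the variable list from dictionary.columns and order it by varnum directly. This is only a sketch: SASHELP.CLASS and the output table name work.class_vars are stand-ins for the libraries and data sets in the question.

```sas
/* Sketch only: SASHELP.CLASS and work.class_vars are placeholder names */
proc sql;
  create table work.class_vars as
  select name, label
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CLASS'
  order by varnum;   /* position of the variable in the data set */
quit;
```

The resulting table can be passed to PROC EXPORT in the same way as the PROC CONTENTS output.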
I have an output table that contains 300+ variables from 30 different tables that are joined by UNION, which is used for modelling. I have created a macro that creates a report with a number of statistics, such as mean, min/max values etc. using this output table. I am trying to add a column to the report that details which table(s) the variables come from. I say table(s) as some of the variables are shared across different tables. I want to avoid having the same variable in the report multiple times as the statistics are the same irrespective of what table the variable comes from. Is there an efficient way to do this?
Instead of UNION, consider combining the tables in a DATA step and using the INDSNAME= option on the SET statement to record which table each row came from.
data want;
set sashelp.class sashelp.cars indsname=source;
source_dataset = source;
run;
If it were me, I would loop over each of the union datasets and just put the table name and variable names into a compiled dataset. You probably have all the table names in either a macro list or typed out, so you can just add a few more lines of code to run proc contents on each of those to compile a full list of table and variable names. Note that, as in your example, there will be duplicate variable names, which you can deal with after the table is compiled:
** create different tables **;
data height; set sashelp.class(keep=name height); run;
data weight; set sashelp.class(keep=name weight); run;
data sex; set sashelp.class(keep=name sex); run;
** put your datasets into a list either manually or dynamically **;
/* manually */
%let ds_list=height weight sex;
/* dynamically -- be careful to include only tables in your union */
proc sql noprint;
select MEMNAME
into: ds_list separated by " "
from sashelp.vmember
where libname = "WORK" and memname not in ("SASMACR","FORMATS");
quit;
%put &ds_list.;
** loop over each table to put the table name and variables in a dataset **;
%MACRO get_names(ds_list);
%do i=1 %to %sysfunc(countw(&ds_list.));
%let ds = %scan(&ds_list.,&i.);
proc contents data = &ds. noprint
out=names_&ds.(keep=MEMNAME NAME rename=(MEMNAME=SOURCE_DATASET));
run;
proc append data = names_&ds. base=full force; run;
%end;
%MEND;
%get_names(&ds_list.);
I managed to do this using the following:
Create table with source tables.
PROC SQL;
CREATE TABLE SOURCES AS
SELECT NAME
,MEMNAME
FROM DICTIONARY.COLUMNS
WHERE LIBNAME='LIBNAME'
ORDER BY 1,2;
QUIT;
Join to my stats table.
PROC SQL;
CREATE TABLE STATS_NEW AS
SELECT memname AS TABLE_NAME,a.*
FROM STATS a
LEFT JOIN SOURCES b
ON a.name = b.name
ORDER BY a.name;
QUIT;
Transpose data and add in comma separators.
DATA STATS_TRANSPOSE (drop=TABLE_NAME);
LENGTH INPUT_TABLES $1000;
SET STATS_NEW;
BY name;
RETAIN INPUT_TABLES;
IF FIRST.name THEN DO; INPUT_TABLES=TABLE_NAME; END;
IF NOT FIRST.name
THEN DO;
INPUT_TABLES=CATX(', ',INPUT_TABLES,TABLE_NAME);
END;
IF LAST.name THEN DO;
IF name IN ('FIELD1','FIELD2')
THEN DO; INPUT_TABLES='ALL'; END;
OUTPUT;
END;
RUN;
I have a data set with many variables - many of which are character valued. I have the following code to count the number of missing values for each variable:
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
proc freq data=dataname;
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;
However, this results in a huge output (300 page pdf if I print to pdf) with 90% of the variables having no missing values. How do I tell PROC FREQ to only display the tables which have missing values?
You can identify which variables have a missing value from the NLEVELS option in PROC FREQ. So my process would be to create a dataset that just held the variables with missing values, then store them in a macro variable so the following PROC FREQ can be run against them only.
Here is the code to do that.
/* set up dummy dataset */
data have;
set sashelp.class;
if _n_ in (10,13) then call missing(age,sex);
run;
/* create dataset that holds variables with missing values */
ods select nlevels;
ods output nlevels=miss_vars (where=(nmisslevels>0));
ods noresults;
proc freq data=have nlevels;
run;
ods results;
/* store names in a macro variable */
proc sql noprint;
select tablevar into :missvar separated by ' '
from miss_vars;
quit;
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
proc freq data=have (keep=&missvar.);
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;
This one removes all blank columns:
%macro removeblanks(dataset,output);
/* create dataset that holds variables with missing values */
ods select nlevels;
ods output nlevels=miss_vars (where=(nmisslevels>0 and nnonmisslevels=0));
ods noresults;
proc freq data=&dataset. nlevels;
run;
ods results; /* turn results tracking back on after the NLEVELS step */
/* store names in a macro variable */
proc sql noprint;
select tablevar into :missvar separated by ' '
from miss_vars;
quit;
data &output.;
set &dataset.(drop=&missvar.);
run;
%mend removeblanks;
How do you read multiple specific datasets and append them into one big dataset?
For example, within a library I have hundreds of datasets, but I only want to append the ones whose names start with _du (_du1, _du2, and so on).
The format and column names are the same.
My stab at it doesn't work:
PROC SQL NOPRINT;
SELECT memname INTO :tab1-:tab103 FROM sashelp.vtable
where memname like '_DU%';
SELECT count(*) INTO :obs FROM sashelp.vtable
where memname like '_DU%';
QUIT;
%macro rubber;
%do i=1 %to i=&obs;
proc append base=tot_comb data=&&tab&i force;
run;
%end;
%mend;
%rubber;
PROC APPEND may not actually be faster in this case, or at least not faster by enough to justify doing it, than just writing a datastep.
data tot_comb;
set work._DU:; *or your libname;
run;
This will work if you are on SAS 9.2 or later. If you're on 9.1 or earlier, you'll need to do one proc sql step, like
proc sql;
select memname into :namelist separated by ' '
from dictionary.tables
where libname='WORK' /* or your libname */
and memname eqt '_DU';
quit;
*eqt is like starts with;
data tot_comb;
set &namelist;
run;
That only requires one pass to write, and I'm not sure it will be much slower than so many calls to PROC APPEND.
Here is some code that will get you all the data set names from a given library with some characteristics (starts with _DU). You could use the final macro in a variety of ways to append data sets.
Data _DU1;
var="One";
Run;
Data _DU2;
var="Two";
Run;
PROC SQL;
create table main as
SELECT *
FROM DICTIONARY.TABLES
WHERE UPCASE(LIBNAME)="WORK" AND
UPCASE(MEMNAME) like '_DU%';
Select memname
into :dsn separated by ' '
from main;
QUIT;
%Put &dsn;
EDIT (according to your comment)
I added some UPCASE function calls and used your count macro variable for the number of tab macro variables.
Narrowing your WHERE clause should make your code more efficient.
Try this (some of the code is untested):
PROC SQL NOPRINT;
SELECT count(*)
INTO :obs
FROM sashelp.vtable
where UPCASE(LIBNAME)="<YOUR LIB IN UPCASE>" AND
upcase(memname) like '_DU%';
%Let obs=&obs;
SELECT memname
INTO :tab1-:tab&obs
FROM sashelp.vtable
where UPCASE(LIBNAME)="<YOUR LIB IN UPCASE>" AND
upcase(memname) like '_DU%';
QUIT;
%macro rubber;
%do i=1 %to &obs;
proc append base=tot_comb data=&&tab&i force; run;
%end;
%mend;
%rubber;