I'm completely new in SAS 4GL...
Is it possible to extract from table, which columns are primary key or parts of compound primary key? I need their values to be merged into one column of an output dataset.
The problem is, that as an input I can get different tables, and I don't know theirs definition.
If an index is defined, then you can find out what variable(s) is/are used in that index. See for example:
data blah(index=(name));
set sashelp.class;
run;
proc contents data=blah out=blahconts;
run;
blahconts has columns that indicate that name is in a simple index, and that it has 1 index total.
Also, you can have foreign key contraints, such as the following from this SAS documentation example:
proc sql;
create table work.mystates
(state char(15),
population num,
continent char(15),
/* contraint specifications */
constraint prim_key primary key(state),
constraint population check(population gt 0),
constraint continent check(continent in ('North America', 'Oceania')));
create table work.uspostal
(name char(15),
code char(2) not null, /* constraint specified as */
/* a column attribute */
constraint for_key foreign key(name) /* links NAME to the */
references work.mystates /* primary key in MYSTATES */
on delete restrict /* forbids deletions to STATE */
/* unless there is no */
/* matching NAME value */
on update set null); /* allows updates to STATE, */
/* changes matching NAME */
/* values to missing */
quit;
proc contents data=uspostal out=postalconts;
run;
proc sql;
describe table constraints uspostal;
quit;
That writes the constraint information to the output window. From the output dataset you can see that the variable is in a simple index. You can wrap either of these (the PROC CONTENTS or the DESCRIBE TABLE CONSTRAINTS) in ODS OUTPUT to get the information to a dataset:
ods output IntegrityConstraints=postalICs;
proc contents data=uspostal out=postalconts;
run;
ods output close;
or
ods output IntegrityConstraints=postalICs;
proc sql;
describe table constraints uspostal;
quit;
ods output close;
Related
I have a dataset with 20 columns all starting with the name morb_, which are all 1 or 2, coded as No and Yes. There is an additional column called Pat_TNO which is the patient reference number. Patients have more than one row.
I wish to create a new dataset which summarises whether each patient has had at least one of each type of event. So far the code I have written works perfectly, but is there a way to simplify it using an array?
proc sql;
select
Pat_TNO,
max(morb_1) as morb_1 format yn.,
max(morb_2) as morb_2 format yn. /* etc etc */
from morbidity
group by Pat_TNO;
quit;
COumn names aren't morb_1 and morb_2, rather morb_amputation, morb_mi, morb_tia, etc.
proc summary data=morbidity nway missing;
class pat_tno;
output out=max max(morb_:) = ;
run;
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
In SAS, Proc HPBIN, OUTPUT option does not keep original variables as explained below
OUTPUT=SAS-data-set
creates an output SAS data set in single-machine mode or a database table that is saved alongside the distributed database in distributed mode. The output data set or table contains binning variables. To avoid data duplication for large data sets, the variables in the input data set are not included in the output data set.
--> How can I keep original variables and bin number?
Do a 1:1 merge of the output data set with the input data set.
For example:
proc hpbin data=sashelp.orsales noprint out=profitbinned;
var profit;
run;
data want;
merge sashelp.orsales profitbinned;
* 1:1 merge does not have a BY statement;
run;
If the input data has a primary, or unique key, you can specify those key variables in an ID statement to ensure a more robust post-binning merge:
proc hpbin data=sashelp.citiday noprint out=dowcmp_binned;
var SNYDJCM;
id date;
run;
proc sql;
create table want as
select bin.date, ticker.SNYDJCM, bin.bin_SNYDJCM
from sashelp.citiday as ticker
join work.dowcmp_binned as bin
on ticker.date = bin.date
order by bin.date;
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
I have 8 tables, all containing the same order and number of columns, while one specific column named ATTRIBUTE contains different data which is of length 4 to 25. When I use PROC SQL and UNION ALL tables, the ATTRIBUTE column data length in minimizes to the lowest (4 digits).
How do I solve that i.e keeping full length of the data ?
Example, per #Lee
data have1;
attrib name length=$10 format=$10.;
name = "Anton Short";
run;
data have2;
attrib name length=$50 format=$50.;
name = "Pippy Longstocking of Stoyville";
run;
* column attributes such as format, informat and label of the selected columns
* in the result set are 'inherited' on a first found first kept order, dependent on
* the SQL join plan (i.e. the order of the tables as coded for the query);
proc sql;
create table want as
select name from have1 union
select name from have2
;
proc contents data=want varnum;
run;
Format is shorted than Length, any output display of longer values will appear to have been truncated at the data level.
* attributes of columns can be reset,
* (cleared so as to be dealt with in default manners),
* without rewriting the entire data set;
proc datasets nolist lib=work;
modify want;
attrib name format=; * format= removes the format of a variable;
run;
proc contents data=want varnum;
run;