Run a PROC FREQ on var in dataset 1 computing correlations on a variable in dataset 2 - sas

I'm sorry if I'm asking a stupid question, I have some experience in R and am just starting to learn SAS. In enterprise guide I'm trying to compute a correlation matrix (cramv only) for categorical variables. The problem is that explanatory variables are on dataset1 while my objective variable is on dataset2. I cannot append the obj var column to dataset one for external reasons.
Is there a way to perform the procedure without having to create another dataset?
Thank you in advance!
this is how I imagine it would work out:
ods output ChiSq=CRAMV;
%put &charvar;
proc freq
data= dataset1 dataset2
tables (&charvar) * (objvar) / chisq;
run;

SAS procedures only operate against a single dataset or view. If you didn't want to create another dataset then you could create a view that appends the objvar column to dataset one.
Creating a view can be done with either proc sql; create view x as... or in a data step, data x / view=x...

Related

Proc contents looping through table names from a different data set

I am a newbie to SAS and I am trying to execute below code to obtain all the information for a particular library. However it fails in between due to data in a particular dataset. Is there any way to read dataset names from a different dataset and loop through them creating a different dataset specific to each datasetname from the list?
Proc contents data= testlib. _ALL_ out=x;
Run;
Instead I want something like this
Proc contents data in (work. Tbnames) out = x;
Run;
And read data from below data set.
Data tbnames(keep tablename) ;
Set WORK. tablenames;
Run;
Please help
St
Proc contents data = work.Tbnames out = x;
Run;
Use Proc COPY to copy data sets from one library to another.
libname testlib '<os-path-to-folder>';
proc copy in=testlib out=work memtype=DATA;
run;
Read the data from dictionary.table instead.
This assumes that you have the list of tables in a data set called tableNames and it has a variable called tName, which is the variable name. Note that it is a case sensitive comparison so UPCASE() is used make it all upper case.
proc sql;
create table summary as
select *
from dictionary.table
where memname in (select upcase(tName) from tableNames);
quit;
Or look at PROC DATASETS which operates on a library, not a single data set.
proc datasets lib=myLib;
run;quit;

SAS Proc format cntlin equivalent in Teradata

Is there an equivalent to the SAS format cntlin procedure in Teradata. I have a reference value table (code_value), which is used a lot and rather than doing many outer joins to the reference value table, I'd like to have a lookup function similar to the solution below in SAS. Any help is greatly appreciated.
data CodeValueFormat;
set grp.code_value (keep=code_value_id description);
fmtname = 'fmtCodeValue';
start = code_value_id;
label = description;
run;
proc format cntlin=work.codevalueformat;
run;
proc sql;
select foo_code_id format=fmtCodeValue.
from bar;
quit;
There is no way you can emulate SAS format cntlin procedure in Teradata or any other database other than using lookup tables. One way to avoid doing same joins again and again is to do index join. please look into below link to see whether this is what you want to do. https://info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/Database_Management%2FB035-1094-160K%2Fqiq1472240587768.html%23wwID0EFK1R
One another way is to maintain a denormalized table and do joins with your incremental/daily records in your staging area and then append this records to your final table

Adding a column is SAS using MODIFY (no sql)

I'm new to SAS and have some problems with adding a column to existing data set in SAS using MODIFY statement (without proc sql).
Let's say I have data like this
id name salary perks
1 John 2000 50
2 Mary 3000 120
What I need to get is a new column with the sum of salary and perks.
I tried to do it this way
data data1;
modify data1;
money=salary+perks;
run;
but apparently it doesn't work.
I would be grateful for any help!
As #Tom mentioned you use SET to access the dataset.
I generally don't recommend programming this way with the same name in set and data statements, especially as you're learning SAS. This is because it's harder to detect errors, since once run and encounter an error, you destroy your original dataset and have to recreate it before you start again.
If you want to work step by step, consider intermediary datasets and then clean up after you're done by using proc datasets to delete any unnecessary intermediary datasets. Use a naming conventions to be able to drop them all at once, i.e. data1, data2, data3 can be referenced as data1-data3 or data:.
data data2;
set data1;
money = salary + perks;
run;
You do now have two datasets but it's easy to drop datasets later on and you can now run your code in sections rather than running all at once.
Here's how you would drop intermediary datasets
proc datasets library=work nodetails holist;
delete data1-data3;
run;quit;
You can't add a column to an existing dataset. You can make a new dataset with the same name.
data data1;
set data1;
money=salary+perks;
run;
SAS will build it as a new physical file (with a temporary name) and when the step finishes without error it deletes the original and renames the new one.
If you want to use a data set you do it like this:
data dataset;
set dataset;
format new_column $12;
new_column = 'xxx';
run;
Or use Proc SQL and ALTER TABLE.
proc sql;
alter table dataset
add new_column char(8) format = $12.
;
quit;

How to merge multiple datasets by common variable in SAS?

I have multiple datasets. Each of them has different number of attributes. I want to merge them all by common variable. This is 'union' if I use proc SQL. But there is hunderds of variables.
Example.
Dataset_Name Number of columns
dataset1 110
dataset2 120
dataset3 130
... ...
Say they have 100 columns in common. The final dataset which contains all dataset1,dataset2,dataset3..etc
only has common columns(in this case, 100 columns).
How do I do this?
And how do I get columns for each dataset this is not in common with the final dataset.
example: dataset1 will have 10 columns that are not in the final dataset, and list the name of 10 columns.
Thanks!!!!
UNION in SQL is equivalent to sequential SET in SAS.
data want;
set dataset1 dataset2 dataset3;
run;
Now, SAS by default includes all columns present in any dataset. To limit to just what's in all datasets, you have to use a keep statement.
You can determine this using proc sql, among other ways.
proc sql;
select name into :commonlist separated by ' '
from dictionary.columns C, dictionary.columns D
where C.libname=D.libname
and C.memname='DATASET1'
and D.memname='DATASET2'
and C.name=D.name
;
quit;
For more than two datasets it's more complicated and partially depends on your, but if you're comfortable in SQL you can figure that out pretty easily. A similar construct can create a list of just dataset 1 variables. The important part is the into :commonlist separated by ' ', which says to pull the select results into a macro variable called commonlist, separating rows by space. (The colon says to create a macro variable, not a table.)
So you can then run:
data want (keep=&commonlist.) dset1(keep=&dset1list.) dset2(keep=&dset2list.);
set dataset1(in=ds1) dataset2(in=ds2) dataset3(in=ds3);
output want;
if ds1 then output dset1;
else if ds2 then output dset2;
else if ds3 then output dset3;
run;
The in=xyz indicates which dataset a row came from. Each output dataset can have a separate list of variables to keep. You might want to keep the ID variable in those other datasets as well.
I will say that usually in SAS you don't do what you're doing here: it's not easy to do because it doesn't tend to be the best way to handle things - specifically, the little split off datasets. In general you would just keep those extra variables on the master dataset, and they'd just be nulls for anyone not in a dataset with that variable - assuming it makes sense to make this 'master' dataset at all.

Running proc neural inside a loop

I have 10 datasets all same x and y but different observations for x and y in each dataset. Each dataset has 120 observations.
I am running proc neural on this dataset but I have to do this manually. Each time I have to change the data= ....and dmdbcat=..... option to include the correct dataset (10 times) and run Proc dmdb and Proc Neural,
Is there a way to to automate this ? Can this Proc Dmdb and Proc Neural run inside a loop so that it can pick up the right dataset iteratively and not have me do this manually ?
You could use the macro language to do this.
But just about every SAS PROC supports a BY statement, which is much more efficient than looping over a list of datasets.
Suggest you combine the datasets:
data all;
set data1 data2 data3 ... indsname=dsn;
datasetname=dsn;
run;
Then analyze:
proc neural data=all;
by datasetname;
run;