PROC SQL: Warning variable already exists on multiple dataset join - sas

I have this data check integrity code for an oncology study I'm working on. This is ostensibly to confirm TU,TR and RS are consistent.
proc sql ;
create table tu_tr_rs as
select tu.*,tr.*,rs.*
from trans.tu as tu
left join trans.tr as tr on tu.usubjid=tr.usubjid and tu.TULNKID
=tr.TRLNKID and tu.tudtc=tr.trdtc
left join trans.rs as rs on tr.usubjid=rs.usubjid and tr.trdtc=
rs.rsdtc
;
quit;
However, when I run this code I get the warning
"Variable XXXX already exists on file WORK.TU_TR_RS."
When I add the feedback option to PROC SQL to get a more granular look I get this
So I know if it's one variable that brings this warning up you can use a rename/DROP combination to work around it but for this case is it just the case that I have to explicitly state the variables for each dataset in the select statement or is there something fundamentally wrong with the code?

Yes, if you want to select columns with the same name from 2 (or more) data sets, you simply need to select them explicitly and give them distinct names. Something like this:
create table tu_tr_rs as
select
tu.ColA as tu_ColA
,tu.ColB as tu_ColB
/* etc */
,tr.ColA as tr_ColA
,tr.ColB as tr_ColB
/* etc */
,rs.ColA as rs_ColA
,rs.ColB as rs_ColB
/* etc */
from trans.tu as tu
/* etc */

Related

One sql query creating two seperate tables

Sometimes I like to make an instant copy of a data set outside of the work library, so if development gets messy I always know where my back up copy of a critical data set is located.
Ex - Lets say I have a permanent library already established, named source.
I can use a data step in order to create a set (set_1) in two different spots.
data set_1 source.set_1;
set sashelp.cars;
run;
I do understand the below sql (or even a copy procedure for that matter) would be equivalent results to the data step above:
proc sql;
create table set_1 as
select distinct *
from sashelp.cars
;
create table source.set_1 as
select *
from set_1
;
quit;
I sound lazy here, but I am interested to know if there is a method in proc sql in which I can just call two sets to be made off the same query, such as the data step example above.
You cannot. Stick to using data steps.

Filter a SAS dataset to contain only identifiers given in a list

I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting on of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to make a subquery that includes what variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in proc sql but probably simpler to use a merge in a data step with an in= statement.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A;
run;
You merge the two datasets together, and then using if A you only take the ID's that appear in the single column dataset. For this to work you have to merge on id_list which must be in both datasets, and both datasets must be sorted by id_list.
The problem with using a Data Step instead of a PROC SQL is that for the Data step the Data-set must be sorted on the variable used for the merge. If this is not yet the case, the complete Data-set must be sorted first.
If I have a very large SAS Data-set, which is not sorted on the variable to be merged, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the Data-set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large Data-sets from which you want only a small subset.

SAS : Select rows from a relationnal database

I work with SAS on a relationnal database that I can access with a libname odbc statement as below :
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' which values can be 0,1 or . (missing). Now say I want to exclude all rows for which var_ex=1.
If I use the following :
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get rows for which 'var_ex' is missing. Here is a screenshot of the log, with my actual data :
Whereas if I do the exact same thing after importing the table in the Work :
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I select rows for which 'var_ex' is 0 or missing, as intended. Here is a screenshot of the log, with my actual data :
The same happens if I use PROC SQL instead of a DATA step, or another NE-like.
I did some research and more or less understood here that unintended stuff like that can happen if you work directly on a DBMS table.
Does that mean is it simply not recommended to work with a DBMS table, and one has to import table locally as below before doing anything ?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables ?
The best way to test how SAS is translating the data step code into database code is through the sastrace system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS is translating the code (if it can at all). If it can't, you'll see,
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate data step code into dbms-specific code, it will pull everything to locally manipulate the data. By viewing this output, you can determine precisely how to get the data step to translate into what you need.
If all else fails, you can use explicit SQL pass-through. The code in parentheses operates the same way as if you're running SQL directly from some other client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;

Dynamically create define in a PROC REPORT

I have a dataset(liste_institution) that contain all the name of the variable that I want to "define" in my proc report statement. Here is my code that work when I call my macro not dynamically(%create_institution(815);). If I use the data statement with the call execute(in comment in my code) it not working. The reason seem to be that when I use the call execute the code is not interpreted in a PROC REPORT that is why it give me error.
proc report data = ventes_all_inst4
missing split = "*" nowd
style(header)=[font_weight=bold background = #339966 foreground = white]
style(column)=[cellwidth=15cm];
%macro create_institution(institution);
define TOTAL_&institution. / display "TOTAL*($)" style(column)=[cellwidth=4cm];
%mend;
/* Give error when I use this data step */
/*data _null_;
set liste_institution;
call execute('%create_institution(' || INS || ');');
run;*/
%create_institution(815);
run;
Is there an easy way to create dynamically define statement in a PROC REPORT from a dataset that contain the column name.
Basically, you have a misunderstanding of how macros work and timing. You need to compile the macro list previous to the proc report, but you can't use call execute because that actually executes code. You need to create a macro variable.
Easiest way to do it is like so:
proc sql;
select cats('%create_institution(',ins,')')
into :inslist separated by ' '
from liste_institution
;
quit;
which makes &inslist which is now the list of institutions (with the macro call).
You also may be able to use across variables to allow this to be easier; what you'd have is one row per ins, with a single variable with that value (which defines the column name) and another single variable with the value that goes in the data table portion. Then SAS will automatically create columns for each across value. Across variables are one of the things that makes proc report extremely powerful.

SAS Proc SQL to add a constant to a variable

I have a SAS dataset with numeric variables to, from, and weight. Some of the observations have value 0 for weight. I need all the weight values to be positive, so I wish to simply add 1 to all weight values.
How can I do that using Proc SQL?
I have tried the following, but it doesn't work:
proc sql;
update mylib.mydata
set weight=weight+1;
quit;
The error is:
ERROR: A CURRENT-OF-CURSOR operation cannot be initiated because
the column "weight" cannot be used to uniquely identify a row
because of its data type.
Also, mylib refers to a Greenplum appliance. This might be the problem...
If you have the database permissions to update that table, you might want to use the SAS/Access pass-through facility. You will need to know the correct syntax for this to work. Here is a non-working example:
proc sql;
connect to greenplm as dbcon
(server=greenplum04 db=sample port=5432 user=gpusr1 password=gppwd1);
execute (
/* Native code goes here */
update sample.mydata
set weight=weight+1
) by dbcon;
quit;
The connection string would be the same as used on the LIBNAME that defined your "mylib' libref.
However, if you are really trying to create a SAS dataset (not update the real table), you can do that with a simple data step:
data mydata;
set mylib.mydata
weight = weight + 1;
run;
That will create a copy of the table that can be used with other SAS procedures.
Check out this note at prosgress.com. You probably need to add UPDATE_MULT_ROWS=YES to your library definition.