One sql query creating two seperate tables - sas

Sometimes I like to make an instant copy of a data set outside of the work library, so if development gets messy I always know where my back up copy of a critical data set is located.
Ex - Lets say I have a permanent library already established, named source.
I can use a data step in order to create a set (set_1) in two different spots.
data set_1 source.set_1;
set sashelp.cars;
run;
I do understand the below sql (or even a copy procedure for that matter) would be equivalent results to the data step above:
proc sql;
create table set_1 as
select distinct *
from sashelp.cars
;
create table source.set_1 as
select *
from set_1
;
quit;
I sound lazy here, but I am interested to know if there is a method in proc sql in which I can just call two sets to be made off the same query, such as the data step example above.

You cannot. Stick to using data steps.

Related

Can't use LAG function in Proc SQL in SAS

I have created proc sql query in SAS program, but need to use LAG function and it tell me it can't be used in proc sql, just in data step.
Code:
proc sql;
CREATE TABLE agg_table AS
SELECT USER, MAX(TIME) AS LAST_TIME, SUM(BONUS) AS BONUS_SUM, LAG(EXPDT) AS EXPDT_LAG FROM WORK.MY_DATA GROUP BY USER_ID;
So, I don't how to combine proc sql and datastep into one query to get one table as an output?
Or maybe there is a better approach to the whole problem?
Thanks
PROC SQL does not have a concept of rows the same way the datastep does. SQL may process rows in any order, not necessarily sequential, and may use hash tables, parallel processing, or various binary tree and similar methods to process its query; and the same query may be processed in different methods. Thus lag is not usable in SQL, nor are diff or other functions that expect row data.
It's unclear from your question what exactly you're doing, so it's not really possible to give a direct answer how to do this separately; but you may be able to accomplish this entirely in one datastep, or you may combine a datastep and a SQL query, or two datasteps. You can perform the lag in a prior datastep or a view, then the rest in SQL; or you may use a DoW loop datastep to perform the max/sum elements.

Filter a SAS dataset to contain only identifiers given in a list

I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting on of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to make a subquery that includes what variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in proc sql but probably simpler to use a merge in a data step with an in= statement.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A;
run;
You merge the two datasets together, and then using if A you only take the ID's that appear in the single column dataset. For this to work you have to merge on id_list which must be in both datasets, and both datasets must be sorted by id_list.
The problem with using a Data Step instead of a PROC SQL is that for the Data step the Data-set must be sorted on the variable used for the merge. If this is not yet the case, the complete Data-set must be sorted first.
If I have a very large SAS Data-set, which is not sorted on the variable to be merged, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the Data-set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large Data-sets from which you want only a small subset.

SAS : Select rows from a relationnal database

I work with SAS on a relationnal database that I can access with a libname odbc statement as below :
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' which values can be 0,1 or . (missing). Now say I want to exclude all rows for which var_ex=1.
If I use the following :
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get rows for which 'var_ex' is missing. Here is a screenshot of the log, with my actual data :
Whereas if I do the exact same thing after importing the table in the Work :
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I select rows for which 'var_ex' is 0 or missing, as intended. Here is a screenshot of the log, with my actual data :
The same happens if I use PROC SQL instead of a DATA step, or another NE-like.
I did some research and more or less understood here that unintended stuff like that can happen if you work directly on a DBMS table.
Does that mean is it simply not recommended to work with a DBMS table, and one has to import table locally as below before doing anything ?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables ?
The best way to test how SAS is translating the data step code into database code is through the sastrace system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS is translating the code (if it can at all). If it can't, you'll see,
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate data step code into dbms-specific code, it will pull everything to locally manipulate the data. By viewing this output, you can determine precisely how to get the data step to translate into what you need.
If all else fails, you can use explicit SQL pass-through. The code in parentheses operates the same way as if you're running SQL directly from some other client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;

SAS Proc SQL to add a constant to a variable

I have a SAS dataset with numeric variables to, from, and weight. Some of the observations have value 0 for weight. I need all the weight values to be positive, so I wish to simply add 1 to all weight values.
How can I do that using Proc SQL?
I have tried the following, but it doesn't work:
proc sql;
update mylib.mydata
set weight=weight+1;
quit;
The error is:
ERROR: A CURRENT-OF-CURSOR operation cannot be initiated because
the column "weight" cannot be used to uniquely identify a row
because of its data type.
Also, mylib refers to a Greenplum appliance. This might be the problem...
If you have the database permissions to update that table, you might want to use the SAS/Access pass-through facility. You will need to know the correct syntax for this to work. Here is a non-working example:
proc sql;
connect to greenplm as dbcon
(server=greenplum04 db=sample port=5432 user=gpusr1 password=gppwd1);
execute (
/* Native code goes here */
update sample.mydata
set weight=weight+1
) by dbcon;
quit;
The connection string would be the same as used on the LIBNAME that defined your "mylib' libref.
However, if you are really trying to create a SAS dataset (not update the real table), you can do that with a simple data step:
data mydata;
set mylib.mydata
weight = weight + 1;
run;
That will create a copy of the table that can be used with other SAS procedures.
Check out this note at prosgress.com. You probably need to add UPDATE_MULT_ROWS=YES to your library definition.

how to get timing information of a data step query

I just wanted to know like in proc sql we define stimer option.
The PROC SQL option STIMER | NOSTIMER specifies whether PROC SQL writes timing information for each statement to the SAS log, instead of writing a cumulative value for the entire procedure. NOSTIMER is the default.
Now in same way how to specify timing information in data set step. I am not using proc sql step
data h;
select name,empid
from employeemaster;
quit;
PROC SQL steps individually are effectively separate data steps, so in a sense you always get the identical information from SAS. What you're asking is effectively how to find out how long 'select name' takes versus 'empid'.
There's not a direct way to get the timing of an individual statement in a data step, but you could write data step code to find out. The problem is that the data step is executed row-wise, so it's really quite different from the PROC SQL STIMER details; almost nothing you do in a data step will take very long by itself, unless you are doing something more complex like a hash table lookup. What takes long is writing out the data first, and reading in the data second.
You do have some options for troubleshooting long data steps, if that's your concern. OPTIONS MSGLEVEL=I will give you information about index usage, merge details, etc., which can be helpful if you aren't sure why it is taking a long time to do certain things (see http://goo.gl/bpGWL in SAS documentation for more info). You can write your own timestamp:
data test;
set sashelp.class sashelp.class;
_t=time();
put _t=;
run;
Odds are that won't show you much of use since most data step iterations won't take very long but if you are doing something fancy it might help. You could also use conditional statements to only print the time at certain intervals - when at FIRST.ID for example in a process that works BY ID;.
Ultimately though the information you already get from notes is what is most useful. In PROC SQL you need the STIMER information because SQL is doing several things at once, while SAS lets/makes you do everything out step-wise. Example:
PROC SQL;
create table X as select * from A,B where A.ID=B.ID;
quit;
is one step - but in SAS this would be:
proc sort data=a; by ID; run;
proc sort data=b; by ID; run;
data x;
merge a(in=a) b(in=b);
by id;
if a and b;
run;
For that you would get information on the duration of each of those steps (the two sorts and the merge) in SAS, which is similar to what STIMER would tell you.
No way.
PROC SQL STIMER logs timing for each separately executable SQL statement/query.
In data step, as you may know, the data step looping occurs, observation per observation, so the data step statement timing would be something like per observation, let's say transactional. Anyway this would not describe all the details where the time is being spent - waiting for disk reads, writes, etc.
So I guess this won't be very usable. In general, SAS performance is I/O driven.