Can anybody explain to me how to get PROC SQL to give the results of my custom function the length I specify in the function definition? Datastep does it fine, but SQL gives me the default length of 200 characters.
Here is code that demonstrates the issue:
proc fcmp outlib = work.funcs.funcs ;
* Note type/length specification ;
function testy(istr $) $11 ;
return ('bibbitybobb') ;
endsub ;
quit ;
options cmplib = work.funcs ;
data from_dstep ;
set sashelp.class ;
tes = testy(name) ;
run ;
proc sql ;
create table from_sql as
select *
, testy(name) as tes
from sashelp.class
;
describe table from_dstep ;
describe table from_sql ;
quit ;
My log on that is:
47 describe table from_dstep ;
NOTE: SQL table WORK.FROM_DSTEP was created like:
create table WORK.FROM_DSTEP( bufsize=65536 )
(
Name char(8),
Sex char(1),
Age num,
Height num,
Weight num,
tes char(11)
);
48 describe table from_sql ;
NOTE: SQL table WORK.FROM_SQL was created like:
create table WORK.FROM_SQL( bufsize=65536 )
(
Name char(8),
Sex char(1),
Age num,
Height num,
Weight num,
tes char(200)
);
As you can see, the datastep gave me a length of 11 on my 'tes' variable, but sql gives me 200.
Is there a way to get length 11 when using SQL?
Unfortunately, I don't think so. SQL and data step work differently in this regard, other built-in functions have some of the same issues (CATS/CATX for example have different defaults in SQL than in data step). I think it has to do with how compilation works in the data step vs. interpretation in SQL. I believe I've seen something specifying this was expected behavior, but I can't seem to find it right now; if you'd like more detail and nobody else here can provide it, perhaps start a track with SAS support and see what they say.
You can directly set it in SQL of course:
proc sql ;
create table from_sql as
select *
, testy(name) as tes length 11
from sashelp.class
;
quit;
Related
I have a pass-through query using proc sql (excerpt below), in which some of the resulting names are modified since they aren't valid SAS names. 1A is replaced by _A, 2A is replaced by _A0, and some other changes are made. My questions are:
Is there a document which explains the rules for name replacement e.g. 2A becomes _A0?
Is it possible for me to change the way SAS corrects the names? For example, can I make 1A become _1A instead of _A?
.
proc sql;
connect to oracle as clc([omitted]);
CREATE table out.bk_ald as
SELECT *
FROM connection to bpm (
SELECT
, "1A"
, "1B"
, "1C"
, "1D"
, "1E"
, "2A"
, "2B"
, "2C"
...
You cannot change the algorithm and I am not sure if it is published. But you could either rename the column yourself on the Oracle side.
select * from connection to oracle (select "1A" as "_1A", ...);
Or rename on the SAS side. SAS will store the original name as the variable's LABEL. You could query the metadata and use that to rename the variables.
proc contents data=bk_ald noprint out=contents; run;
proc sql noprint ;
select catx(name,'=',cats('_',label)) into :rename separated by ' '
from contents
where upcase(name) ne upcase(label)
;
quit;
data want ;
set bk_ald;
rename &rename ;
run;
I want to store count of dataset in variable like below
%let Cnt ;
create table work.delaycheck as
select * from connection to oracle
(
SELECT PTNR_ID,CLNT_ID,REPORTING_DATE_KEY,NET_SALES
FROM FACT_TABLE
MINUS
SELECT PTNR_ID,CLNT_ID,REPORTING_DATE_KEY,NET_SALES
FROM HIST_FCT
);
I want to store count of this table in the variable Cnt like below
%put = (Select count(*) from work.delaycheck )
And Then
If(Cnt=0)
THEN
DO NOTHING
ELSE
execute(
Insert into Oracle_table
select * from work.delaycheck
) by oracle;
disconnect from oracle;
quit;
How can I acheive these steps? Thanks In advance!!
All of the SQL and data shown is occurring remotely. You can perform all the activity there without involving SAS. Oracle will process
PROC SQL;
CONNECT TO ORACLE ...;
EXECUTE (
INSERT INTO <TARGET_TABLE>
SELECT * FROM
( SELECT PTNR_ID,CLNT_ID,REPORTING_DATE_KEY,NET_SALES
FROM FACT_TABLE
MINUS
SELECT PTNR_ID,CLNT_ID,REPORTING_DATE_KEY,NET_SALES
FROM HIST_FCT
)
) BY ORACLE;
and not insert any records if the fact table is comprised of only historical facts.
EXECUTE can also submit PL/SQL statements, which in-turn can reduce the need for extraneous system interplay.
Delete this line from your code
%let Cnt ;
In order to get the count: Add the code below which will create the macro variable Cnt with the count:
proc sql;
Select count(*) into: Cnt from work.delaycheck ;
quit;
Update the if statement: the "&" is used to reference macro variables
If &cnt=0
The Code below shows how to use the if/else and the use of Call Execute:
data _null_;
if &cnt=0 then put 'Cnt is 0';/*if true: a note is written to the log*/
else call execute ('proc print data=work.e; run;');
/*else clause: the Proc Print code is executed*/
run;
I have a "time" var of years in my data. I need to create a new var based on the following with PROC SQL
if time>mean(time)then new var=1 else, new var=0
I keep getting different error, how can I improve my code?
proc sql;
create table v3 as
select*,case
when time>mean(time)then time_group=1
else time_group=0 as time_group,*
from v2;
quit;
You're nearly there:
proc sql ;
create table v3 as select *, case when time>mean(time) then 1 else 0 end
as time_group from v2;
quit;
There are couple issues in your code. 1. The syntax of CASE WHEN is a little off. 2. When using summary functions, such Mean(), you need to make sure the scope of the mean. if no 'group by' is issued to define the scope, the scope is universal.
proc sql;
select *, case when age > mean(age) then 1 else 0 end as _age from sashelp.class group by sex;
quit;
I have a table with postings by category (a number) that I transposed. I got a table with each column name as _number for example _16, _881, _853 etc. (they aren't in order).
I need to do the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write all of the columns names either . I tried this but doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going to the first number to the last and also from the number corresponding to the first place to the one corresponding to the last place. Gives me a number that it's not correct.
Any ideas?
Thanks!
You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work.
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *I confirmed this does not include _N_ for some reason - not sure why!;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
sum_height_
Obs Sex and_weight
1 F 1356.3
2 M 1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a null value (ie. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.
I have a table in SAS which consists of data from stock exchange. One of its columns holds information about date. I would like to create subtables, each one hold data from only one specific year.
Assuming you want to do this (often, this is an inferior option as analyses run separately by year can be run from one dataset using by year;, but certainly this can sometimes be appropriate), the gold standard method for doing this is the hash table, as the hash table can produce unlimited tables based on the data. I will edit in a method for doing this with hash table if I have time while running things this afternoon; it's the 'hashing' method described on this page.
Hashing code, adapted from the sascommunity.org page above:
data have;
call streaminit(7);
do year=1998 to 2014;
do id= 1 to 10;
x=rand('Uniform');
output;
end;
end;
run;
data _null_ ;
dcl hash byyear () ;
byyear.definekey ('k') ; if `id` or similar is a safe unique ID you could use that here, otherwise `k` is your unique identifier - hash requires unique;
byyear.definedata ('year','id','x') ;
byyear.definedone () ;
do k = 1 by 1 until ( last.year ) ;
set have;
by year ;
byyear.add () ;
end ;
dsetname=cats('year',year);
byyear.output (dataset: dsetname) ;
run ;
There is a similar set of methods that revolve around using a macro to generate the code. This paper goes into detail about one method to do that; I won't explain it in detail as I consider it inferior to the hash method (even if it is lower CPU time, it is more complicated to write than either a pure macro method or a pure hash method) but in certain cases it could be better.
A simple example of the macro method using the conceptual have aframe defined:
proc sql;
select distinct(cats('year',year(date))) into :dsetlist
separated by ' '
from have;
select distinct(cats('%outputyear(year=',year(date),')')) into :outputlist
separated by ' '
from have;
quit;
%macro outpuyear(year=);
if year(date)=&year. then output year&dset.;
%mend outputyear;
data &dsetlist.;
set have;
&outputlist.;
run;
data year1 year2 year3 yearN;
set stockdata;
if year(date) = 2014 then
output year1;
else if year(date) = 2013 then
output year2;
else if year(date) = 2012 then
output year3;
else
output yearN;
run;
You could also use case statements I guess.