I have a "time" var of years in my data. I need to create a new var based on the following with PROC SQL
if time>mean(time)then new var=1 else, new var=0
I keep getting different errors. How can I improve my code?
proc sql;
create table v3 as
select*,case
when time>mean(time)then time_group=1
else time_group=0 as time_group,*
from v2;
quit;
You're nearly there:
proc sql ;
create table v3 as select *, case when time>mean(time) then 1 else 0 end
as time_group from v2;
quit;
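There are a couple of issues in your code. 1. The CASE WHEN syntax is off: CASE is an expression that returns a value (then 1 else 0 end), so it must not contain assignments like time_group=1, and it has to be closed with END. The extra * after it would also select every column a second time. 2. When using summary functions such as MEAN(), you need to be clear about the scope of the mean. If no GROUP BY clause defines the scope, the mean is computed over the whole table; with GROUP BY it is computed within each group, for example by sex: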
proc sql;
select *, case when age > mean(age) then 1 else 0 end as _age from sashelp.class group by sex;
quit;
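When MEAN() appears next to *, SAS remerges the summary statistic back onto every row and writes a NOTE about it. If you prefer to make the scope explicit, here is a sketch that computes the per-group mean once in an inline view and joins it back. It uses sashelp.class so it runs as-is; the output table name class_flag is only an illustration:

```sas
/* Per-sex mean computed once in an inline view, then joined back,
   so the scope of MEAN() is spelled out instead of remerged */
proc sql;
  create table class_flag as
  select c.*,
         case when c.age > m.mean_age then 1 else 0 end as _age
  from sashelp.class as c
       inner join
       (select sex, mean(age) as mean_age
          from sashelp.class
          group by sex) as m
    on c.sex = m.sex;
quit;
```

For the original question without groups, the inline view would simply be (select mean(time) as mean_time from v2) with no join condition.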
So I have a rather interesting problem. I am trying to insert the current date in specific formats and styles, but for some reason it fails. I know it's not a formatting issue, but I don't know how to fix it. A data step solution is welcome as well. Here's what works:
proc sql;
create table work.test
(test_Id char(50), test_Name char(50), cur_Mo char(1), cur_Qtr char(1), entered_Date char(8));
insert into work.test
values('201703','2017 Mar','0','0','24APR17')
values('201704','2017 Apr','0','0','24APR17')
values('201706','2017 Jun','1','0','23JUN17');
quit;
Here's what doesn't:
proc sql;
insert into work.test
values(catx('',put(year(today()),4.),case when month(today())< 10 then catx('','0',put(month(today()),2.)) else put(month(today()),2.)end) ,catx(' ',[put(year(today()),4.),put(today(),monname3.))],'1','0',put(today(),date7.));
quit;
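In PROC SQL the VALUES clause accepts constants, not function calls. You can, however, use the %SYSFUNC() macro function to call most other SAS functions in macro code; the macro processor resolves it to a constant before PROC SQL sees the statement. So to generate today's date in DATE7 format you could use: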
proc sql;
insert into work.test (entered_Date)
values("%sysfunc(date(),date7.)");
quit;
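The same idea covers every column of the failing INSERT. A sketch, assuming test_Id is meant to be the year plus zero-padded month and test_Name the year followed by the abbreviated month name, as in the sample rows; YYMMN6., YEAR4., MONNAME3. and DATE7. are standard SAS formats:

```sas
/* Each %SYSFUNC call resolves to a constant before PROC SQL runs,
   so the VALUES clause never sees a function call.
   Assumed layout: test_Id = yyyymm, test_Name = 'yyyy Mon' */
proc sql;
insert into work.test
  values("%sysfunc(today(),yymmn6.)",
         "%sysfunc(today(),year4.) %sysfunc(today(),monname3.)",
         '1', '0',
         "%sysfunc(today(),date7.)");
quit;
```

The YYMMN6. format zero-pads the month, which replaces the CASE expression in the original attempt.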
The way I'd probably do it is to use a data step to make a dataset to insert, and then insert that dataset.
You can use the insert into (...) select (...) from (...) syntax in SAS, and the data step is much more flexible in how it lets you define columns.
For example:
proc sql;
create table class like sashelp.class;
quit;
proc sql;
insert into class
select * from sashelp.class;
quit;
Or you can specify only certain variables:
proc sql;
insert into class (name, age)
select name, age from sashelp.class;
quit;
data to_insert;
name= 'Wilma';
sex = 'F';
age = 29;
height = 61.2;
weight = 95.3;
run;
proc sql;
insert into class
select * from to_insert;
quit;
Just make sure you either explicitly list the variables to insert/select, or you have the order exactly right (it matches up by position if you use * like I do above).
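Applied to the date question, the pattern could look like the sketch below: build today's row in a DATA step, where functions such as TODAY(), PUT() and CATX() are allowed, then insert it with an explicit column list. The dataset name today_row is hypothetical, and the yyyymm / 'yyyy Mon' layout of test_Id and test_Name is inferred from the sample rows:

```sas
/* Hypothetical helper dataset holding one row dated today.
   Column layout inferred from the sample rows in work.test */
data today_row;
  length test_Id test_Name $50 cur_Mo cur_Qtr $1 entered_Date $8;
  run_date     = today();
  test_Id      = put(run_date, yymmn6.);            /* e.g. 201706   */
  test_Name    = catx(' ', put(year(run_date), 4.),
                      put(run_date, monname3.));    /* e.g. 2017 Jun */
  cur_Mo       = '1';
  cur_Qtr      = '0';
  entered_Date = put(run_date, date7.);             /* e.g. 23JUN17  */
  drop run_date;
run;

/* Explicit column list, so the insert does not depend on column order */
proc sql;
insert into work.test (test_Id, test_Name, cur_Mo, cur_Qtr, entered_Date)
  select test_Id, test_Name, cur_Mo, cur_Qtr, entered_Date
  from today_row;
quit;
```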
I am trying to write a PROC SQL query in SAS to determine the maximum of many columns starting with a particular prefix (say RF*). The existing PROC MEANS step I have goes like this:
proc means data = input_table nway noprint missing;
var age x y z RF: ST: ;
class a b c;
output out = output_table (drop = _type_ _freq_) max=;
run;
Here RF: refers to all columns starting with RF, and likewise for ST. I was wondering whether there is something similar in PROC SQL that I can use.
Thanks!
Dynamic SQL is indeed the way to go with this, if you must use SQL. The good news is that you can do it all in one proc sql call using only one macro variable, e.g.:
proc sql noprint;
select catx(' ','max(',name,') as',name) into :MAX_LIST separated by ','
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and type = 'num'
/*The eq: (=:) prefix comparison is not available in PROC SQL (EQT is the SQL counterpart), but substr() also matches partial variable names*/
and upcase(substr(name,1,1)) in ('A','W') /*Match all numeric vars that have names starting with A or W*/
;
create table want as select SEX, &MAX_LIST
from sashelp.class
group by SEX;
quit;
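Adapted to the PROC MEANS step in the question, a sketch might look like this. It assumes input_table lives in WORK and has class variables a, b, c plus numeric columns age, x, y, z and columns prefixed RF or ST (all names taken from the question; the library is an assumption). If no column matches, &max_list stays unresolved and the CREATE TABLE fails:

```sas
proc sql noprint;
  /* Build "max( col ) as col" for every numeric column to summarise */
  select catx(' ', 'max(', name, ') as', name)
    into :max_list separated by ', '
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'INPUT_TABLE'
      and type = 'num'
      and (upcase(name) in ('AGE', 'X', 'Y', 'Z')
           or upcase(name) like 'RF%'
           or upcase(name) like 'ST%');

  /* GROUP BY a, b, c plays the role of CLASS a b c with NWAY */
  create table output_table as
    select a, b, c, &max_list
    from input_table
    group by a, b, c;
quit;
```

Like the MISSING option in PROC MEANS, GROUP BY keeps rows with missing class values as their own group.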
It’s the first time that I’ve opened SAS today, and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur, but I want only the rows with a unique value of the number VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally, she creates a table with all the duplicates.
But here she hardcodes the numbers, for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess SAS must have something like R's $ too?
To me, the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
would be SQL code:
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
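If you prefer to stay in a DATA step, the closest analogue of is.element() is a MERGE with IN= flags. A sketch, assuming IMPORT is already sorted by VTNR (the colleague's BY VTNR step requires that anyway); dup_keys is a hypothetical intermediate dataset:

```sas
/* Reduce the duplicated keys to one row per VTNR so the
   MERGE below is one-to-many */
proc sort data=import_dup1(keep=VTNR) out=dup_keys nodupkey;
  by VTNR;
run;

/* Keep rows of IMPORT whose VTNR also appears in dup_keys,
   like is.element(import$VTNR, import_dup1$VTNR) in R */
data import_dup2;
  merge import(in=in_import) dup_keys(in=in_dup);
  by VTNR;
  if in_import and in_dup;
run;
```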
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value, there is no need to create another table first.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset; otherwise, you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
The solution above gives you the duplicates but not the unique values. There are many ways to get both in SAS; an SQL solution is very easy to understand:
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
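The same split can also be done in one DATA step pass. A sketch, assuming IMPORT is sorted by VTNR: a VTNR value that is both the first and the last of its BY group occurs exactly once.

```sas
/* One pass over IMPORT writes both tables */
data no_duplicates all_duplicates;
  set import;
  by VTNR;
  if first.VTNR and last.VTNR then output no_duplicates;
  else output all_duplicates;
run;
```

Unlike the GROUP BY query, this keeps the original row order of IMPORT.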
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and to your existing code) takes three steps. Again, I wouldn't use it here, since it's more work than the simpler approaches, but the concept is helpful in other situations.
First, get the dataset of duplicates, using either her method or PROC SORT:
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in (&duplist.));
run;
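The macro list works as-is for a numeric byvar. If byvar is character, each value needs quotes inside the IN list; a sketch that also drops repeated values with DISTINCT:

```sas
/* Character BY variable: wrap each trimmed value in quotes so the
   list reads like "A","B",... inside IN (...) */
proc sql noprint;
  select distinct quote(trim(byvar))
    into :duplist separated by ','
    from dups;
quit;
```

The resulting &duplist. is used exactly as before, inside if not (byvar in (&duplist.));.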
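data want;
set import;
if _n_ = 1 then do; /* load the VTNR values of import_dup1 once */
  declare hash dup(dataset: 'import_dup1');
  dup.defineKey('VTNR'); dup.defineDone();
end;
if dup.check() = 0; /* keep rows whose VTNR is in import_dup1 */
run;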
I know that in Teradata and other SQL platforms you can find the distinct count of a combination of variables by doing:
select count(distinct x1||x2)
from db.table
This gives the number of unique (x1, x2) pairs.
This syntax, however, does not work in proc sql.
Is there any way to perform such a count in PROC SQL?
Thanks.
That syntax works perfectly fine in PROC SQL.
proc sql;
select count(distinct name||sex)
from sashelp.class;
quit;
If the fields are numeric, you must convert them to character (using PUT) or use CAT or one of its siblings (CATS, CATX, ...), which happily accept either numeric or character arguments.
proc sql;
select count(distinct cats(age,sex))
from sashelp.class;
quit;
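One caveat: because CATS strips blanks, different pairs can map to the same string when the columns have no fixed width (for example 'ab','c' and 'a','bc'), which undercounts. A sketch of two ways around it on sashelp.class. Option 1 assumes the delimiter | never occurs in the data (CATX also skips blank arguments, so blanks can still collide); option 2 needs no string conversion and, unlike COUNT(DISTINCT ...), also counts combinations that contain missing values:

```sas
proc sql;
  /* Option 1: a delimiter keeps 'ab'|'c' and 'a'|'bc' apart */
  select count(distinct catx('|', age, sex)) as n_combos
    from sashelp.class;

  /* Option 2: count the rows of a SELECT DISTINCT subquery */
  select count(*) as n_combos
    from (select distinct age, sex from sashelp.class);
quit;
```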
This may be redundant, but when you mentioned "combination", it instantly triggered "permutation" in my mind. So here is one way to differentiate the two:
DATA TEST;
INPUT (X1 X2) (:$8.);
CARDS;
A B
B A
C D
C D
;
PROC SQL;
SELECT COUNT(*) AS TOTAL, COUNT(DISTINCT CATS(X1,X2)) AS PERMUTATION,
COUNT(DISTINCT CATS(IFC(X1<=X2,X1,X2),IFC(X1>X2,X1,X2))) AS COMBINATION
FROM TEST;
QUIT;
I need to create multiple tables using proc sql
proc sql;
/* first city */
create table London as
select * from connection to myDatabase
(select * from mainTable
where city = 'London');
/* second city */
create table Beijing as
select * from connection to myDatabase
(select * from mainTable
where city = 'Beijing');
/* . . the same thing for other cities */
quit;
The names of those cities are in the SAS table myCities.
How can I embed a DATA step into PROC SQL in order to iterate through all the cities?
proc sql noprint;
select quote(trim(city_varname)) into :cities separated by ',' from myCities;
quit;
*This step above creates a list as a macro variable to be used with the in() operator below. EDIT: Per Joe's comment, added quote() function so that each city will go into the macro-var list within quotes, for proper referencing by in() operator below.
proc sql;
create table all_cities as
select * from connection to myDatabase
(select * from mainTable
where city in (&cities));
quit;
*This step is just the step you provided in your question, slightly modified to use in() with the macro-variable list defined above.
One relatively simple solution is to do this entirely in a DATA step. Assuming you can connect via a LIBNAME statement (if you can connect with CONNECT TO, you probably can), let's say the libref is mydb. Using a construction similar to Max Power's for the first part:
proc sql noprint;
select city_varname
into :citylist separated by ' '
from myCities;
select cats('%when(var=',city_varname,')')
into :whenlist separated by ' '
from myCities;
quit;
%macro when(var=);
when "&var." output &var.;
%mend when;
data &citylist.;
set mydb.mainTable;
select(city);
&whenlist.;
otherwise;
end;
run;
If you're using most of the data in mainTable, this probably wouldn't be much slower than doing it database-side, as you're moving all of the data anyway - and likely it would be faster since you only hit the database once.
Even better would be to pull this to one table (like Max shows), but this is a reasonable method if you do need to create multiple tables.
One option is to put your PROC SQL code into a SAS macro.
Give the macro a parameter for the city (in my example it is called "city").
Then execute the macro from a DATA step program. Since the DATA step runs once for each observation, there is no need for complex iteration logic.
data mycities;
infile datalines dsd;
input macrocity :$32.;
datalines;
London
Beijing
Buenos_Aires
;
run;
%macro createtablecity(city=);
proc sql;
/* all cities */
create table &city. as
select * from connection to myDatabase
(select * from mainTable
where city = "&city.");
quit;
%mend;
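data _null_;
set mycities;
/* city= matches the keyword parameter; %nrstr delays the macro call until after this step */
call execute(cats('%nrstr(%createtablecity(city=', macrocity, '))'));
run;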
This is similar to the other solutions here, perhaps a bit simpler: pull out a distinct list of cities, store them in macro variables, and run the SQL query inside a %DO loop.
Proc sql noprint;
Select distinct city
Into :n1-:n999
From connection to mydb
(Select distinct city
From mainTable)
;
%let c = &sqlobs; /* number of distinct cities stored in n1, n2, ... */
Quit;
%macro createTables;
%do a=1 %to &c;
Proc sql;
Create table &&n&a as
Select *
From connection to myDb
(Select *
From mainTable
Where city="&&n&a")
;
Quit;
%end;
%mend createTables;
%createTables;