I am working with a huge dataset in sas trying to use proc sql and I need help setting up a like statement. I'm trying to extract all the columns that have 'eco' in the name
I'm getting an error in the where statement as it is not registering the second *.
Any help?
proc sql
select *
from cfy19e8
where * LIKE %eco%;
You could concatenate all of your columns with catx() and find any one that has the word eco.
data have;
input col1$ col2$ col3$;
datalines;
sadfeco kdoa wrfs
asdf asdf sadf
mfecosa mawoeco mfzeco
;
run;
data want;
set have;
where catx('|', col1, col2, col3) LIKE '%eco%';
run;
If you have a lot of character columns, you could use the shortcut _CHARACTER_ to concatenate all variables, then use find() within an if statement in a data step.
data want;
set have;
if(find(catx('|', of _CHARACTER_), 'eco') );
run;
Perhaps
proc contents noprint data=cfy19e8 out=eco_columns(where=(upcase(name) like '%ECO%'));
run;
title 'Columns with ECO in their name';
proc print data=eco_columns;
var name;
run;
I'm a beginner in SAS and i have difficulties with this exercice:
I have a very simple table with 2 columns and three lines
I try to find the request that will return me the name of the most little people (so it must return titi)
All what I found is to return the most little size (157) but i don't want this, I want the name related to the most little value!
Could you help me please?
Larapa
A SQL having clause is a good one for this. SAS will automatically summarize the data and merge it back to the original table, giving you one a one-line table with the name of the smallest value of taille.
proc sql noprint;
create table want as
select nom
from have
having taille = min(taille)
;
quit;
Here are some other ways you can do it:
Using PROC MEANS:
proc means data=have noprint;
id nom;
output out=want
min(taille) = min_taille;
run;
Using sort and a data step to keep only the first observation:
proc sort data=have;
by taille;
run;
data want;
set have;
if(_N_ = 1);
run;
I've created a user defined format by using proc format statements.Would like to create a macro over it in a way that if the input data changes, the code should able to do change accordingly.
Here is the code:
proc format ;
value $a 1='1-sepstrata'
0='0-Non-sepstrata'
A='A-sepstrata';
run;
In the dateset I've,a columns named stratum which has unique values such as 1,0,A.
Select the distinct values of STRATA and use it to generate the format definition in a file. Then use PROC FORMAT to create the format.
proc sql;
create table fmtdef as
select '$A' as fmtname
, strata as start
, catx('-',strata
,case when (strata='0') then 'Non-sepstrata' else 'sepstrata' end
) as label
from have
group by strata
order by fmtname,start
;
quit;
proc format lib=work.formats cntlin=fmtdef;
run;
I am a newbie to SAS Base, and I am struggling to create a simple program that extracts data from a table on my database, runs e.g. PROC MEANS, and writes the data back to the table.
I know how to use PROC SQL (read and update tables) and PROC MEANS, but I can't figure out how to combine the steps.
PROC SQL;
SELECT make,model,type,invoice,horsepower
FROM
SASHELP.CARS
;
QUIT;
PROC Means;
RUN;
What I want to accomplish is create an additional column in the dataset with e.g. the mean of the horsepower.. and in the end I want to write that computed column to the table on the database.
Edit
What I was looking for is this:
PROC SQL;
create table want as
select make,model,type,invoice,horsepower
, mean(horsepower) as mean_horsepower
from sashelp.cars
;
QUIT;
PROC MEANS DATA=want;
RUN;
SAS makes this very easy to do with SQL since it will automatically remerge summary statistics back to detailed records.
create table want as
select make,model,type,invoice,horsepower
, mean(horsepower) as mean_horsepower
from sashelp.cars
;
Or using normal SAS code.
proc means data=sashelp.cars nway noprint ;
var horsepower ;
output out=mean_horsepower mean=mean_horsepower ;
run;
data want ;
set sashelp.cars ;
if _n_=1 then set mean_horsepower (keep=mean_horsepower);
run;
What i want to do: I need to create a new variables for each value labels of a variable and do some recoding. I have all the value labels output from a SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that i need to recode the variables on this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one of data_null_ are working but stackoverflow won't let me pin more than one right answer.
Update to add an _ underscore to the end of each name. It looks like there is not option for PROC TRANSREG to put an underscore between the variable name and the value of the class variable so we can just do a temporary rename. Create rename name=newname pairs to rename class variable to end in underscore and to rename them back. CAT functions and SQL into macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;