SAS rename table with name with blanks - sas

I have two tables: retail, which holds my data, and col_dic, a dictionary of column names. col_dic has two columns: eng_name and eng_name_bl.
So the code is:
data _null_;
set col_dic end = last;
if _n_ eq 1 then call execute('proc datasets nolist lib=work; modify retail; rename');
call execute(catx('=', eng_name,eng_name_bl));
if last then call execute(';quit;');
run;
After execution the log reports an error: it expects '=' after the blank in the new column name.
How can I avoid it?
Example that does work:
data col_dic;
length eng_name eng_name_bl $20;
eng_name = 'AGE';
eng_name_bl = 'AGE_FIX';
output;
eng_name = 'HEIGHT';
eng_name_bl = 'HEIGHT_FIX';
output;
run;
data class;
set sashelp.class;
run;
data _null_;
set col_dic end = last;
if _n_ eq 1 then call execute('proc datasets nolist lib=work; modify class; rename');
call execute(catx('=', eng_name,eng_name_bl));
if last then call execute(';quit;');
run;

First of all, don't do this. Variable names with spaces in them are a pain in the neck. Why don't you just use the value with spaces in it as the LABEL instead of the NAME?
If you do want to specify a variable name that contains spaces, then you need to make sure to set
option validvarname=any;
Then in the code generation step use the NLITERAL() function to convert the string with spaces to a valid SAS name literal to avoid the syntax errors.
call execute(catx('=', nliteral(eng_name),nliteral(eng_name_bl)));
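The LABEL alternative suggested at the top of this answer can reuse the same code-generation step. A minimal sketch, assuming eng_name_bl holds the text with blanks and contains no quote characters, & or %; the dictionary values and the class copy are hypothetical test data:

```sas
/* Hypothetical dictionary: the second column holds label text with blanks */
data col_dic;
  length eng_name $32 eng_name_bl $40;
  eng_name = 'AGE';    eng_name_bl = 'Age fixed';    output;
  eng_name = 'HEIGHT'; eng_name_bl = 'Height fixed'; output;
run;

data class;
  set sashelp.class;
run;

/* Generates a LABEL statement of the form: label AGE="Age fixed" HEIGHT="Height fixed" */
data _null_;
  set col_dic end=last;
  if _n_ eq 1 then call execute('proc datasets nolist lib=work; modify class; label');
  /* QUOTE() wraps the trimmed label text in double quotes */
  call execute(catx('=', eng_name, quote(trim(eng_name_bl))));
  if last then call execute(';quit;');
run;

/* Check that the names are unchanged and the labels are set */
proc contents data=class;
run;
```

The variable names stay valid SAS names, so no VALIDVARNAME=ANY setting or name literals are needed downstream.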

Related

How can I make the first row of a SAS dataset the variable names?

I have an already imported dataset where the first row contains the variable names. I know that typically when importing a dataset you use getnames = yes. However, if the data is already imported how can I make the first row the variable names using a data step?
Data looks like:
   A       B       C
1  Name 1  Name 2  Name 3
2  2       4       66
3  3       5       6
Since reading the names as data probably made all of your variables character you can try just transposing the data twice to fix it. That will work well for small datasets.
So the first transpose will place the current name into the _NAME_ variable and convert each row into a column. The second proc transpose can drop the original name and use the first row (new COL1 variable) as the names.
proc transpose data=have out=wide ;
var _all_;
run;
proc transpose data=wide(drop=_name_ rename=(col1=_name_)) out=want(drop=_name_ _label_);
var col:;
id _name_;
run;
The problem with the already imported data is that all the numeric data was likely placed in character variables, because the 'first row' of data seen by the import process contained some character data and drove the inference for automatic column construction.
Regardless, you will need to construct renaming pairs old-name=new-name for each variable that has to be renamed. The new-name being in row 1 makes it possible to transpose that row to arrange those name parts as data. SQL with :into and separated by can populate a macro variable for use in a proc datasets step that performs the column renaming without rewriting the entire data set. Finally, a DATA step with modify can remove a row in place, again without rewriting the entire data set.
filename sandbox temp;
data _null_;
file sandbox;
put 'A,B,C';
put 'Name 1, Name 2, Name 3';
put '2,4,66';
put '3,5,6';
run;
proc import datafile=sandbox dbms=csv replace out=work.oops;
run;
proc transpose data=oops(obs=1) out=renames;
var _all_;
run;
proc sql noprint;
select cats(_name_,"=",compress(col1,,"KN"))
into :renames separated by ' '
from renames;
%put NOTE: &=renames;
proc datasets nolist lib=work;
modify oops;
rename &renames;
run;
data oops;
modify oops;
remove;
stop;
run;
%let syslast=oops;

set a dataset by dereferencing a variable

I would like to set a dataset by using a reference to the dataset name; however, I am getting the error message: ERROR: File dataset_name123 does not exist (work.dataset123 does exist). What is wrong?
data _null_;
%let product = 'dataset_name123';
set work.&product nobs = row_no;
put row_no;
put &product;
run;
Member names are not quoted. Remove the quotes from your macro variable. In macro code everything is character so there is no need to add quotes around string literals. The quotes become part of the value of the macro variable.
%let product = dataset_name123;
%put &=product;
data _null_;
set work.&product nobs = row_no;
put row_no;
put "&product";
stop;
run;
If you do include quotes in a dataset reference then SAS will interpret it as the physical name of the dataset file itself. So code like:
data want;
set 'dataset_name123';
run;
would look for a filename 'dataset_name123.sas7bdat' in the current working directory.
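It is not a great idea to put a %let statement inside a data step. Macro variables and data step variables are created differently: %let is handled by the macro processor while the step is still being compiled, not when the step runs.
There are two problems in this code. The first is the quotes around the macro variable's value: they become part of the resolved text, so the SET statement no longer sees a plain member name and the step fails.
The second is the PUT statement. Once the quotes are removed, put &product; resolves to a PUT of a data step variable with that name; to write the macro variable's value to the log, use %put instead.
Below is the modified code.
data class;
set sashelp.class;
run;
/* define the macro variable before the step that uses it */
%let product = class;
data _null_;
set work.&product nobs = row_no;
put row_no;
%put &product;
run;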
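To illustrate the timing difference mentioned in this answer: a value that only exists while the step is running (such as the NOBS= count) cannot be captured with %let, but it can be written to a macro variable with CALL SYMPUTX. A minimal sketch, using sashelp.class as a stand-in dataset and a hypothetical macro variable name row_no:

```sas
data _null_;
  /* NOBS= is filled in at compile time, so the SET never has to execute */
  if 0 then set sashelp.class nobs=row_no;
  /* copy the data step value into a macro variable while the step runs */
  call symputx('row_no', row_no);
  stop;
run;

/* the macro variable is available once the step has finished */
%put NOTE: &=row_no;
```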

Create new variables from format values

What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that I need to recode the variables following this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both of Joe's answers and the one from data_null_ work, but Stack Overflow won't let me accept more than one answer.
Update: adding an underscore between the variable name and the level. It looks like PROC TRANSREG has no option to put an underscore between the variable name and the value of the class variable, so we can do a temporary rename instead: build name=newname pairs that rename each class variable to end in an underscore, plus the reverse pairs to rename them back, using the CAT functions and SQL INTO macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
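The data step alternative to PROC EXPAND mentioned above could look like the following. It is only a sketch: it assumes the WANT dataset produced by the second PROC TRANSPOSE, and the array name and index variable are hypothetical. Like the CONVERT _numeric_ version, it zero-fills every numeric variable, including caseID, fumert1 and sex if any of those happen to be missing.

```sas
data want_e;
  set want;
  /* all numeric variables present in WANT at this point */
  array nums[*] _numeric_;
  do _i = 1 to dim(nums);
    if missing(nums[_i]) then nums[_i] = 0;
  end;
  drop _i;
run;
```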
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;
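The where clause mentioned above might be sketched as follows, as a replacement for the PROC SQL step. It assumes the FORMATS dataset and the %spread_var macro from the preceding code, and that each format is named after its variable plus a trailing F; it keeps only numeric formats (type 'N' in the CNTLOUT dataset) whose stripped name matches a variable that actually exists in WORK.HAVE.

```sas
proc sql noprint;
  select cats('%spread_var(var=', substr(fmtname, 1, length(fmtname)-1),
              ',val=', start, ')')
    into :spreadlist separated by ' '
    from formats
    where type = 'N'
      /* keep only formats whose base name is a variable in WORK.HAVE */
      and upcase(substr(fmtname, 1, length(fmtname)-1)) in
          (select upcase(name)
             from dictionary.columns
            where libname = 'WORK' and memname = 'HAVE');
quit;
```

Formats that define ranges or keywords such as OTHER would still need separate handling, since their START values do not form valid name suffixes.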

Compress Newline character for dynamic variables

Dataset: Have
F1 F2
Student Section
Name No
The dataset "Have" contains newline characters in its data.
I need to remove the newline characters from the data.
I want to do this dynamically, as the "Have" dataset may sometimes contain additional variables such as F3, F4, F5, etc.
I have written a macro to do this, but it is not working as expected.
When I execute the code below for the first time, I get an error about an invalid reference to newcnt. If I execute it a second time in the same session, I do not get the error.
My code:
%macro update_2(newcnt);
data HAVE;
set HAVE;
%do i= 1 %to &newcnt;
%let colname = F&i;
&colname=compress(&colname,,'c');
%end;
run;
%mend update_2;
%macro update_1();
proc sql noprint;
select count(*) into :cnt from dictionary.columns where libname="WORK" and memname="HAVE";
quit;
%update_2(&cnt)
%mend update_1;
Note: All the variables are named F1, F2, F3, F4, and so on.
Please tell me what is going wrong.
If there are any other procedures that would do this, please let me know.
In your macro %update_1 you're creating a macro variable called &cnt, but when you call %update_2 you refer to another macro variable, &colcnt. Try fixing this reference and see if your code behaves as expected.
We created our own function to clean unwanted characters from strings using proc fcmp. In this case, our function cleans tab characters, line feeds, and carriage returns.
proc fcmp outlib=common.funcs.funcs; /* REPLACE TARGET DESTINATION AS NECESSARY */
function clean(iField $) $200;
length cleaned $200;
bad_char_list = byte(10) || byte(9) || byte(13);
cleaned = translate(iField," ",bad_char_list);
return (cleaned );
endsub;
run;
Create some test data with a new line character in the middle of it, then export it and view the results. You can see the string has been split across lines:
data x;
length employer $200;
employer = cats("blah",byte(10),"diblah");
run;
proc export data=x outfile="%sysfunc(pathname(work))\x.csv" dbms=csv replace;
run;
Run our newly created clean() function against the string and export it again. You can see it is now on a single line as desired:
options cmplib=common.funcs; /* tell SAS where to find clean(); matches the outlib above */
data y;
set x;
employer = clean(employer);
run;
proc export data=y outfile="%sysfunc(pathname(work))\y.csv" dbms=csv replace;
run;
Now to apply this method to all character variables in our desired dataset. No need for macros, just define an array referencing all the character variables, and iterate over them applying the clean() function as we go:
data cleaned;
set x;
array a[*] _char_;
do cnt=lbound(a) to hbound(a);
a[cnt] = clean(a[cnt]);
end;
run;
EDIT: Also note that functions built with PROC FCMP can carry a performance cost. If you are working with very large amounts of data, other solutions may perform better.
EDIT 6/15/2020 : Corrected missing length statement that could result in truncated responses.
Here's an example of Robert Penridge's function, as a call routine with an array as an argument. This probably only works in 9.4+ or possibly later updates of 9.3, when permanent arrays began being allowed to be used as arguments in this way.
I'm not sure if this could be done flexibly with an array as a function; without using macros (which require recompilation of the function constantly) I don't know how one could make the right size of array be returned without doing it as a call routine.
I added 'Z' to the drop list so it's obvious that it works.
options cmplib=work.funcs;
proc fcmp outlib=work.funcs.funcs;
sub clean(iField[*] $);
outargs iField;
bad_char_list = byte(11)|| byte(10) || byte(9) || byte(13)||"Z";
do _i = 1 to dim(iField);
iField[_i] = translate(iField[_i],trimn(" "),bad_char_list);
end;
endsub;
quit;
data y;
length employer1-employer5 $20;
array employer[4] $;
do _i = 1 to dim(employer);
employer[_i] = "Hello"||byte(32)||"Z"||"Goodbye";
end;
employer5 = "Hello"||byte(32)||"Z"||"Goodbye";
call clean(employer);
run;
proc print data=y;
run;
Here is another alternative. If newline characters are the only thing you want to remove, only character variables are involved, so you can use an implicit array and DO OVER:
data want;
set have;
array chr _character_;
do over chr;
chr=compress(chr,,'c');
end;
run;

Sas renaming variable with do loop and if then condition

I'm trying to rename variables x0 - x40 so that x0 will become y_q1_2014, x1 will become y_q4_2013, x2 will become y_q3_2013 and so on till x40 that will become y_q1_2004.
I want each new variable name to show the quarter and year of the observation. I currently have the following macro in SAS, which is not working properly: the values of j and k do not change according to the if-then condition. What am I doing wrong?
%macro rename(data);
%let j=1;
%let k=2014;
%do i = 0 %to 40 %by 1;
data mydata;
set &data.;
y_q&j._&k. = x&i.;
if &j.=1 then do k = &k.-1 and j = 4;
else do j=&j.-1;
run;
%end;
%mend;
This will likely be easier to do using the data step rather than a macro loop (as most things are!).
In this case, you have two problems:
How to mass-rename variables
How to convert x# to y_q#_####
An easy way to rename variables is to create a dataset with the variable names as rows, then create the new variable names. You can then pull that into a rename list very easily.
So something like this would do that.
*Create dataset with names in it;
data names;
set sashelp.vcolumn;
where memname='HAVE' and libname='WORK' and name =: 'X';
keep name;
run;
*some operation to determine new_name needs to go in that dataset also - coming later;
*Now create a list of rename macro calls;
proc sql;
select cats('%rename(var=',name,',newvar=',new_name,')')
into :renamelist separated by ' '
from names;
quit;
*Here is the simple rename macro;
%macro rename(var=,newvar=);
rename &var.=&newvar.;
%mend rename;
*Now do the renames. Can also go in a data step;
proc datasets lib=work;
modify have;
&renamelist.
quit;
How to convert is a more interesting question, and begs the question: is this a one time thing, or is this a repeated process? If it's a repeated process, does X0 always mean the most recent quarter in the data, or does it always mean q1 2014?
Assuming it is always the most recent quarter, you can use intnx to do this.
%let initdate='01JAN2014'd;
data have;
do x = 0 to 40;
qtr = intnx('QUARTER',&initdate,-1*x);
format qtr YYQ.;
output;
end;
run;
You can thus use this code (the portion inside the do loop, operating on an x that you pull out of the name in the dataset) in the earlier names data step to create new_name however you want. You might use the YYQ format in your new name if you have flexibility here (as it's standard, and the easiest solution). Otherwise, you would want to pull this apart either using put and then substring, or quarter() and year() functions off of the date variable here.
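Putting the two pieces together, the INTNX logic can be applied inside the names data step and the result fed to PROC DATASETS. This is a sketch under stated assumptions: x0 is the most recent quarter, every variable in WORK.HAVE that starts with X has a purely numeric suffix, the y_q#_#### pattern from the question is wanted, and the test data, offset and qdate are hypothetical. For brevity it builds old=new pairs directly rather than going through the %rename macro.

```sas
%let initdate = '01JAN2014'd;   /* quarter that x0 stands for */

/* Hypothetical input: x0-x40 with arbitrary values */
data have;
  array x[0:40] x0-x40;
  do _i = 0 to 40;
    x[_i] = _i;
  end;
  drop _i;
run;

/* One row per x-variable, with the new name derived from its numeric suffix */
data names;
  set sashelp.vcolumn;
  where libname = 'WORK' and memname = 'HAVE';
  if upcase(name) =: 'X';
  length new_name $32;
  offset = input(substr(name, 2), 8.);             /* 0 for x0, 40 for x40 */
  qdate  = intnx('QUARTER', &initdate, -1*offset); /* step back one quarter per offset */
  new_name = cats('y_q', qtr(qdate), '_', year(qdate));
  keep name new_name;
run;

/* Build the list of old=new pairs */
proc sql noprint;
  select catx('=', name, new_name)
    into :renamelist separated by ' '
    from names;
quit;

/* Rename in place without rewriting the data */
proc datasets nolist lib=work;
  modify have;
  rename &renamelist;
quit;
```

With this start date x0 maps to y_q1_2014, x1 to y_q4_2013 and x40 to y_q1_2004, matching the mapping described in the question.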