Convert multiple variables from char to numeric

I have been working on converting a number of variables in my table from character to numeric type. I found a way to convert one variable and could repeat it for each variable. However, I wanted to ask SE because I am having trouble developing a solution that scales.
How can I edit multiple variables at once in SAS Studio 3.5?
My attempt thus far:
What works:
data work.want(rename=(age_group='Age Group'n));
set work.import;
age_group=input('Age Group'n,8.);
drop 'Age Group'n;
run;
What doesn't work:
data work.want(rename=(age_group='Age Group'n), rename=(dwelling_type='Dwelling Type'n));
set work.import;
age_group=input('Age Group'n,8.);
dwelling_type=input('Dwelling Type'n,8.);
drop 'Age Group'n, 'Dwelling Type'n;
run;

For starters, your RENAME= data set option is incorrect. I don't recommend using that type of variable notation (name literals), though, so I'm going to suggest labels instead. To convert multiple variables, use an array. You do have to list them out at least once, though, in the ARRAY statements.
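data work.want;
set work.import;
/* new numeric variables and existing character variables, listed in the same order */
array num_vars(*) age_group dwelling_type;
array char_vars(*) 'Age Group'n 'Dwelling Type'n;
do i=1 to dim(num_vars);
num_vars(i) = input(char_vars(i), 8.);
end;
label age_group = 'Age Group'
dwelling_type = 'Dwelling Type';
drop i; /* the loop counter is not needed in the output */
run;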
If you wanted to do the RENAME as a data set option, you would do it as follows: no commas, and the keyword rename only once.
(rename=(age_group='Age Group'n dwelling_type='Dwelling Type'n));
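Putting the two pieces together, a minimal sketch of the asker's original goal (numeric variables that keep the original names) might look like the following. The sample work.import data and the validvarname=any option are assumptions added for illustration; the DROP statement plus RENAME= pattern is the same one the question already uses for a single variable.

```sas
options validvarname=any;  /* assumption: name literals such as 'Age Group'n are allowed */

/* Hypothetical stand-in for work.import: both columns arrive as character */
data work.import;
  length 'Age Group'n 'Dwelling Type'n $8;
  input 'Age Group'n $ 'Dwelling Type'n $;
  datalines;
1 3
2 1
;

data work.want(rename=(age_group='Age Group'n dwelling_type='Dwelling Type'n));
  set work.import;
  array char_vars(*) 'Age Group'n 'Dwelling Type'n;  /* existing character variables */
  array num_vars(*) age_group dwelling_type;         /* new numeric variables */
  do i = 1 to dim(char_vars);
    num_vars(i) = input(char_vars(i), 8.);
  end;
  /* drop the character originals so the numeric copies can take over their names */
  drop i 'Age Group'n 'Dwelling Type'n;
run;
```

The DROP statement removes the character columns before the RENAME= option is applied to the output data set, so the names do not collide.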

Related

Delete all observations starting with a list of values from database (SAS)

I am trying to find the optimized way to do this :
I want to delete from a character variable all the observations STARTING with different possible strings such as :
"Subtotal" "Including:"
So if it starts with any of these values (or many others that I didn't write here), then delete the observation from the dataset.
The best solution would be a macro variable containing all the values, but I don't know how to deal with it (%let list = Subtotal Including: treats them as variables, while they are values).
I did this :
data a ; set b ;
if findw(product,"Subtotal") then delete ;
if findw(product,"Including:") then delete;
...
...
Would appreciate any suggestions! Thanks

First figure out what SAS code you want. Then you can begin to worry about how to use macro logic or macro variables.
Do you just want to exclude the strings that start with the values?
data want ;
set have ;
where product not in: ("Subtotal" "Including");
run;
Or do you want to subset based on the first "word" in the string variable?
where scan(product,1) not in ("Subtotal" "Including");
Or perhaps case insensitive?
where lowcase(scan(product,1)) not in ("subtotal" "including");
Now if the list of values is small enough (less than 64K bytes) then you could put the list into a macro variable.
%let list="Subtotal" "Including";
And then later use the macro variable to generate the WHERE statement.
where product not in: (&list);
You could even generate the macro variable from a dataset of prefix values.
proc sql noprint;
select quote(trim(prefix)) into :list separated by ' '
from prefixes
;
quit;
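As an end-to-end sketch of the steps above, the following builds the macro variable from a prefix table and then uses it in the WHERE statement. The prefixes and have data sets are made-up sample data, not from the question.

```sas
/* Hypothetical prefix list, for illustration only */
data prefixes;
  length prefix $20;
  input prefix $;
  datalines;
Subtotal
Including
;

/* Hypothetical input data */
data have;
  length product $40;
  infile datalines truncover;
  input product $40.;
  datalines;
Subtotal for region A
Including: taxes
Widget
Gadget
;

/* Build the quoted, space-separated list: "Subtotal" "Including" */
proc sql noprint;
  select quote(trim(prefix)) into :list separated by ' '
  from prefixes;
quit;

/* The colon modifier on IN compares only the leading characters */
data want;
  set have;
  where product not in: (&list);
run;
```

With this sample data, only the Widget and Gadget rows should remain in want.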

Using SAS finance XIRR function on multiple rows with different numbers of variables

I am trying to use the SAS XIRR function on a dataset. The syntax is:
finance('XIRR',value1, value2, value3...valuen,date1,date2,date3...daten);
My problem is that the data has different numbers of values/dates on each row. There could be up to 122 values/dates per row.
Where there are missing values the XIRR function fails, so I set all missing values to 0. Now the function fails as the 'missing' dates are now Jan1960. Anyone got any ideas?
in the code below cf1-cf122 are the cash flow values and ed1-ed122 are the dates.
/* remove blanks */
data irrtable3;
set irrtable2;
array change _numeric_;
do over change;
if change=. then change=0;
end;
run;
/* create irr */
data irrtable4;
set irrtable3;
IRR=finance('XIRR',OF CF1-CF122,OF ED1-ED122);
run;

You can use codegen to construct a dynamic FINANCE(..) call, with a variable number of arguments, that is resolved by the macro system at DATA step run-time.
Using RESOLVE to compute the result in the macro environment for many, many rows will likely be noticeably slower than a plain DATA step.
Example:
data have;
v1=-10000; d1=mdy(1, 1, 2008);
v2=2750; d2=mdy(3, 1, 2008);
v3=4250; d3=mdy(10, 30, 2008);
v4=3250; d4=mdy(2, 15, 2009);
v5=2750; d5=mdy(4, 1, 2009);
output;
call missing(v5,d5); output;
call missing(v4,d4); output;
call missing(v3,d3); output;
call missing(v2,d2); output;
run;
options missing=' ';
data want;
set have;
args = catx(',', of v1-v5, of d1-d5);
result = resolve( cats (
'%sysfunc(FINANCE(XIRR,', args, '))'
));
run;
options missing='.';

From what I can tell (and I don't work with the FINANCE functions, so I'm not an expert), if you have all of the 'filled' arguments prior to the 'unfilled', you are okay to just set everything that's missing to zero (both on the 'value' and the 'date' side). Using the example Richard provides (which is the one from the SAS documentation):
data want2;
set have;
array v v1-v5;
array d d1-d5;
do _i_ = 1 to dim(v);
if missing(v[_i_]) then do;
v[_i_]=0; d[_i_]=0;
end;
end;
args = catx(',', of v1-v5, of d1-d5);
result =FINANCE('XIRR',of v1-v5, of d1-d5);
run;
That works and gets the same result as Richard's, and is probably faster.
This does require the 0s to all be at the end. If they're interspersed, you can't use CALL SORTN to move them all to one end, and your data is too big to use with RESOLVE, then I would construct this entirely in the macro language. You could do a few things, all of which are too long for this answer, but the simplest is probably to generate code for every row and wrap each piece in if _n_ = 5 then do; &row5code.; end; (one such block per row). This would be very long, certainly, but should be faster than RESOLVE (just a lot less maintainable). You could also do a CALL EXECUTE for each row, which is also slow but a possibility, or even use DOSUBL.
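If the gaps are interspersed, another option that stays in the plain DATA step is to shift the filled value/date pairs to the front of the arrays before zero-filling the tail. This is only a sketch and is not part of the answers above: the have_gaps data set is made up, and it relies on the observation above that trailing zeros are tolerated by XIRR.

```sas
/* Hypothetical row with a gap in position 2 */
data have_gaps;
  v1=-10000; d1=mdy(1, 1, 2008);
  v2=.;      d2=.;
  v3=4250;   d3=mdy(10, 30, 2008);
  v4=3250;   d4=mdy(2, 15, 2009);
  v5=2750;   d5=mdy(4, 1, 2009);
run;

data want3;
  set have_gaps;
  array v v1-v5;
  array d d1-d5;
  /* move each filled pair into the next free slot, keeping value and date together */
  k = 0;
  do j = 1 to dim(v);
    if not missing(v[j]) then do;
      k = k + 1;
      v[k] = v[j];
      d[k] = d[j];
    end;
  end;
  /* zero-fill everything after the last filled pair */
  do j = k + 1 to dim(v);
    v[j] = 0;
    d[j] = 0;
  end;
  result = finance('XIRR', of v1-v5, of d1-d5);
  drop j k;
run;
```

Unlike CALL SORTN, the loop preserves the original order of the cash flows and keeps each value paired with its date.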

Keep Variables Created by Macro

I have the following code where I rename column names; I would like to keep only the variables created by the macro. I do realize I can drop the old variables but am curious if there is a keep option I can place inside the macro.
So for example, in the data step, I would want to keep only the variables that start with '%transform_this(JUNE19)';
Thanks!
%macro transform_this(x);
&x._Actual=input(Current_Month, 9.0);
&x._Actual_Per_Unit = input(B, 9.);
&x._Budget=input(C, 9.);
&x._Budget_Per_Unit=input(D, 9.);
&x._Variance=input(E, 9.);
&x._Prior_Year_Act=input(G, 9.);
Account_Number=input(H, 9.);
Account_Description=put(I, 35.);
&x._YTD_Actual=input(Year_to_Date, 9.);
&x._YTD_Actual_Per_Unit=input(L, 9.);
%mend transform_this;
data June_53410_v1;
set June_53410;
%transform_this(JUNE19);
if Account_Description='Account Description' then DELETE;
Drop Current_Month B C D E G H I Year_to_Date L M N;
run;

keep June19_: Account_:;
This keeps all variables starting with June19_ and Account_ which are the ones you need evidently.

"am curious if there is a keep option I can place inside the macro."
You can definitely use keep in your macro:
%macro transform_this(x);
keep &x._Actual &x._Actual_Per_Unit
&x._Budget &x._Budget_Per_Unit
&x._Variance &x._Prior_Year_Act
Account_Number Account_Description
&x._YTD_Actual &x._YTD_Actual_Per_Unit
;
&x._Actual=input(Current_Month, 9.0);
/* ...and the rest of your code */
%mend transform_this;
Any reason you thought you can't?

Add two sentinel variables to the data step, one before the macro call and one after. Use the double dash -- variable name list construct in a keep statement and drop the sentinels in the data step output data set specifier drop= option.
data want (drop=sentinel1 sentinel2); /* remove sentinels */
set have;
retain sentinel1 0;
%myMacro (…)
retain sentinel2 0;
…
keep sentinel1--sentinel2; * keep all variables created by code between sentinel declarations;
run;
Name Range Lists
Name range lists rely on the order of variable definition, as shown in the following table:

Variable List      Included Variables
x -- a             all variables in order of variable definition, from variable x to variable a inclusive
x -NUMERIC- a      all numeric variables from variable x to variable a inclusive
x -CHARACTER- a    all character variables from variable x to variable a inclusive

Note: Notice that name range lists use a double hyphen ( -- ) to designate the range between variables, and numbered range lists use a single hyphen to designate the range.
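A small self-contained version of the sentinel approach is sketched below. The macro make_vars and the have data set are hypothetical stand-ins for the asker's macro and data.

```sas
/* Hypothetical macro that creates two new variables */
%macro make_vars(x);
  &x._a = 1;
  &x._b = 2;
%mend make_vars;

/* Hypothetical input with variables that should not be kept */
data have;
  old1 = 10;
  old2 = 20;
run;

data want (drop=sentinel1 sentinel2);  /* remove sentinels on output */
  set have;
  retain sentinel1 0;                  /* defined after old1 and old2 */
  %make_vars(JUNE19)
  retain sentinel2 0;                  /* defined after the macro's variables */
  keep sentinel1--sentinel2;           /* name range list follows definition order */
run;
```

Because old1 and old2 are defined by the SET statement before sentinel1, they fall outside the range, and want should contain only JUNE19_a and JUNE19_b.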

How to use proc format with the number of lines?

I have a table like this :
|Num | Label
-----------------------
1|1 | a thing
2|2 | another thing
3|3 | something else
4|4 | whatever
I want to replace my values of my label column by something more generic for example the first two lines : label One, the two next ones label Two ...
|Num | Label
-----------------------
1|1 | label One
2|2 | label One
3|3 | label Two
4|4 | label Two
How can I do that using proc format procedure ? I was wondering if I can use either the number of lines or another column like Num.
I need to do something like this :
proc format;
value label_f
low-2 = "label One"
3-high = "label Two"
;
run;
But I want to specify the number of the line or the value of the Num column.

You could do what you are describing using the WORDS. format. You could swap out num for _N_ in the ceil function below in order to use the observation number instead of the value of num (if they are not always equal):
data have;
length num 8 label $20;
infile datalines dlm='|';
input num label $;
datalines;
1|a thing
2|another thing
3|something else
4|whatever
5|whatever else
6|so many things
;
run;
data want;
set have;
label=catx(' ','label',propcase(put(ceil(num/2),words.)));
run;
Although this answer is probably a bit too specific to your example and it may not apply in your actual context.

Gatsby:
It sounds like you want to format NUM instead of LABEL.
Where you want to use the 'generic' representation defined by your format, simply place a FORMAT statement in the Proc being used:
PROC PRINT data=have;
format num label_f.;
RUN;
If you want both num and generic, you will need to add a new column to the data for use during processing. This can be done with a view:
data have_view / view=have_view;
set have;
num_replicate1 = num;
attrib num_replicate1 format=label_f. label='Generic';
num_replacement = put (num,label_f.);
attrib num_replacement label='Generic'; %* no format because the value is the formatted value of the original num;
run;
PROC PRINT data=have_view;
var num num_replicate1 num_replacement;
RUN;
If you want the 'generic' representation of the NUM column to be used in BY-group processing as a grouping variable, you have several scenarios:
- know a priori that the generic representation is by-group clustered
  - use a view and process with BY or BY ... NOTSORTED if clusters are not in sort order
- force ordering for use with by-group processing
  - use an ordered SQL view containing the replicate and process with BY
  - add a replicate variable to the data set, sort by the formatted value and process with BY
A direct backmap from label to num to generic is possible only if the label is known to be unique, or you know a priori the transformation backmap-num + num-map is unique.
Proc FORMAT also has a special value construct [format] that can be used to map different ranges of values according to different formatting rules. The other range can also map to a different format that itself has an other range that maps to yet another different format. The SAS format engine will log an error if you happen to define a recursive loop using this advanced kind of format mapping.
propaedeutics
One of my favorite Dorfman words.
Format does not replace underlying values. A format is a map from the underlying data value to a rendered representation. The map can be 1:1 or many:1. The MultiLabel Format (MLF) feature of the format system can even perform 1:many and many:many mappings in the many MLF-enabled procedures (which is most of them).
To replace an underlying value with its formatted version you need to use the PUT, PUTC or PUTN functions. The PUT functions always output a character value.
character ⇒ PUT ⇒ character [ FILE / PUT ]
numeric ⇒ PUT ⇒ character [ FILE / PUT ]
There is no guarantee that a mapped value will map to the same value; it depends on the format.
INFORMATs are similar to FORMATs; however, the type of the target value depends on the informat type.
character ⇒ INPUT ⇒ character [ INFILE / INPUT ]
numeric ⇒ INPUT ⇒ character
character ⇒ INPUT ⇒ numeric [ INFILE / INPUT ]
numeric ⇒ INPUT ⇒ numeric
Custom formats are created with Proc FORMAT. The construction of a format is specified by either the VALUE statement, or the CNTLIN= option. CNTLIN lets you create formats directly from data and avoids really large VALUE statements that are hand-entered or code-generated (via say macro)
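As a rough illustration of the CNTLIN= route, the sketch below builds the label_f format from a small control data set and applies it with PUT. The fmt_ctl and have data sets are made up for the example; FMTNAME, TYPE, START, END and LABEL are the standard control data set variables.

```sas
/* Control data set: one row per range of the format */
data fmt_ctl;
  retain fmtname 'label_f' type 'N';   /* numeric format named label_f */
  length label $20;
  input start end label $ &;           /* & lets the label contain single blanks */
  datalines;
1 2 label One
3 4 label Two
;

proc format cntlin=fmt_ctl;
run;

/* Hypothetical data to apply the format to */
data have;
  do num = 1 to 4;
    output;
  end;
run;

data want;
  set have;
  length generic $20;
  generic = put(num, label_f.);        /* stores the formatted value as character */
run;
```

The same control data set could be produced from any table of ranges, which is what avoids the large hand-entered VALUE statement.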
Data-centric 'formatting' performs the mapping through a left-join. This is prevalent in SQL databases. Left-joins in SAS can be done through SQL, DATA Step MERGE BY and FORMAT application. 1:1 left-joins can also be done via Hash object SET POINT=

Is it possible to filter a data step on a newly computed variable?

In a basic data step I'm creating a new variable and I need to filter the dataset based on this new variable.
data want;
set have;
newVariable = 'aaa';
*lots of computations that change newVariable ;
*if xxx then newVariable = 'bbb';
*if yyy AND not zzz then newVariable = 'ccc';
*etc.;
where newVariable ne 'aaa';
run;
ERROR: Variable newVariable is not on file WORK.have.
I usually do this in 2 steps, but I'm wondering if there is a better way.
(Of course you could always write a complex where statement based on variables present in WORK.have. But in this case the computation of newVariable is too complex, and it is more efficient to do the filter in a 2nd data step.)
I couldn't find any info on this, I apologize for the dumb question if the answer is in the documentation and I didn't find it. I'll remove the question if needed.
Thanks!

Use a subsetting if statement:
if newVariable ne 'aaa';
In general, if <condition>; is equivalent to if not(<condition>) then delete;. The delete statement tells SAS to abandon this iteration of the data step and go back to the start for the next iteration. Unless you have used an explicit output statement before your subsetting if statement, this will prevent a row from being output.
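A minimal runnable sketch of the difference is shown below. The have data set and the condition on x are invented stand-ins for the asker's computations.

```sas
/* Hypothetical input */
data have;
  input x;
  datalines;
1
2
3
;

data want;
  set have;
  length newVariable $3;
  newVariable = 'aaa';
  if x > 1 then newVariable = 'bbb';  /* stand-in for the real computations */
  /* subsetting IF: evaluated during execution, so it can see newVariable;
     WHERE is applied to the input data set and cannot */
  if newVariable ne 'aaa';
run;
```

With this sample data, want should contain the two rows where x is 2 or 3.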
