Create a new, named column based on a formatted numeric variable - sas

I have a dataset (DIN) that consists of formatted numeric variables (e.g., column 1 'BLD' has values 1-3, but they are formatted as 'Yes', 'No', 'Unknown'). All columns have slightly different formatting.
In each row, only one column has a value, the rest are missing. I am trying to use the following to get the maximum of each row (which will always be the non-missing value)
data DIN;
set DIN;
MAX = max(of BLD--VASC);
run;
Unfortunately as these columns are numeric the MAX column is showing as numbers, not the formatted value. I have tried using vvalue to get the formatted value, like below but I don't know how to do this for all columns at once.
data _null_;
set DIN;
BLD_C = vvalue(BLD);
run;
I felt like a do loop might help, and I tried looping over an array of variable names, but it just doesn't work. Nothing seems to happen
data DIN_C;
set DIN;
array nums(*) _numeric_;
do i = 1 to dim(nums);
nums_C = vvalue(nums(i));
end;
run;
Can anyone help me? Or is there another approach I could take for this problem?

You can use MAX() to find the actual non-missing numeric value. Then use WHICHN() to find the index of the variable that holds that value. Finally, use VVALUE() to get the formatted value of that variable.
data DIN_FIXED;
set DIN;
array _num BLD--VASC;
length max 8 max_formatted $50 ;
MAX = max(of _num[*]);
if not missing(max) then max_formatted=vvalue(_num[whichn(max,of _num[*])]);
run;
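To see the approach end to end, here is a small self-contained sketch. The formats yn. and sev., the middle column CARD, and the sample rows are made up for illustration; only BLD and VASC come from the question.

```sas
/* Hypothetical formats, for illustration only */
proc format;
  value yn  1='Yes' 2='No' 3='Unknown';
  value sev 1='Mild' 2='Moderate' 3='Severe';
run;

/* Hypothetical sample data: exactly one non-missing value per row */
data DIN_SAMPLE;
  input BLD CARD VASC;
  format BLD VASC yn. CARD sev.;
  datalines;
1 . .
. 3 .
. . 2
;
run;

data DIN_DEMO;
  set DIN_SAMPLE;
  array _num BLD--VASC;
  length max 8 max_formatted $50;
  max = max(of _num[*]);
  /* whichn returns the position of the first array element equal to max */
  if not missing(max) then
    max_formatted = vvalue(_num[whichn(max, of _num[*])]);
run;

proc print data=DIN_DEMO;
run;
```

Each row should pick up the label from its own column's format, so the second row is labelled by sev. rather than yn.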

Related

SAS proc transpose duplicate values issue

I need your help, please!
I'm doing a proc transpose in SAS, from a table that has only unique lines. However, it is returning the following error:
ERROR: The ID value "'OUTROS_CANAIS_Fatura Eletrónica'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
ID_CLIENTE=xxxxxxxxxx
When I check the original table the ID_CLIENTE xxxxxxxxxxx has two lines:
ID_CLIENTE MOTIVO Nr_Solicitacoes
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão 1
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento 1
I believe it is the '-' that is causing the issue (that comes with the original data), since they are clearly two different values.
Any ideas how to solve this?
EDIT: I've managed to replace the '-' value, however it still returns the same error...
Thank you!!
The PROC TRANSPOSE ID statement turns data values into column names when pivoting data. Column names are limited to 32 characters (and column labels are limited to 256 characters). Your ID values, when truncated to 32 characters, are the same value, so you get the 'occurs twice' message in the log.
You can add a new variable to distinguish the id values and use the IDLABEL statement to store the original id values in the variable labels.
Example:
idnum is added to the data and is used to distinguish the id values. If you have many id values a hash can be used to dynamically assign a unique idnum for each id value
options validvarname = v7;
data have;
length id $100; * long enough that the longer second value is not truncated;
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão';
idnum = 1;
count = 1;
output;
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento 1';
idnum = 2;
output;
run;
proc transpose data=have out=want;
id idnum;
idlabel id;
var count;
run;
proc contents data=work.want;
run;
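The hash idea mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the dataset names have_raw, have_numbered and want_wide and the prefix motivo_ are hypothetical, while ID_CLIENTE, MOTIVO and Nr_Solicitacoes are the column names from the question.

```sas
/* Hypothetical sample data shaped like the table in the question */
data have_raw;
  length ID_CLIENTE $10 MOTIVO $60;
  ID_CLIENTE = 'xxxxxxxxxx';
  Nr_Solicitacoes = 1;
  MOTIVO = 'OUTROS_CANAIS_Fatura Eletrónica - adesão';       output;
  MOTIVO = 'OUTROS_CANAIS_Fatura Eletrónica - cancelamento'; output;
run;

/* Give every distinct MOTIVO value its own idnum, in order of first appearance */
data have_numbered;
  set have_raw;
  if _n_ = 1 then do;
    declare hash h();
    h.defineKey('MOTIVO');
    h.defineData('idnum');
    h.defineDone();
  end;
  /* find() fills idnum when MOTIVO was seen before; otherwise assign the next number */
  if h.find() ne 0 then do;
    idnum = h.num_items + 1;
    h.add();
  end;
run;

proc transpose data=have_numbered out=want_wide prefix=motivo_;
  by ID_CLIENTE;
  id idnum;
  idlabel MOTIVO;
  var Nr_Solicitacoes;
run;
```

The column names then come from the short idnum values, and the full MOTIVO text survives as the column labels.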
Figured it out!
SAS only allows column names of up to 32 bytes... It was a coincidence that the truncated name ended in '-'.

Extracting coordinates from SVG path syntax

*the title may be misleading
I have (column) cells values as follows:
d="M200,170L149,385"
d="M200,170L150,387"
d="M200,170L275,384"
d="M200,170L49,317"
d="M200,170L92,347"
The values 200 & 170 in each cell represent the x and y origins respectively, while the second set of values (i.e. 149 and 385) represent the x and y values.
I want to separate the x-origin, y-origin, x and y values into four columns. (I'm relatively new to SAS... I think these are Cartesian coordinates)
How would I go about doing this?
Use the scan function. It is used to select the nth word of a string. First argument is the string you want parsed, second is the word (1st, 2nd, etc), and third lists your delimiters (characters that separate the words). That should be all you need.
data want;
set have;
origx = scan(d,1,'M,L');
origy = scan(d,2,'M,L');
x = scan(d,3,'M,L');
y = scan(d,4,'M,L');
run;
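Note that SCAN returns character values. If numeric columns are wanted, or if the cells still carry the d=" wrapper shown in the question, a variation along these lines might help. The dataset names have_raw and want_num are hypothetical, and the extra delimiters d, = and " are an assumption about how the raw values are stored.

```sas
/* Hypothetical sample data, with the d="..." wrapper kept as in the question */
data have_raw;
  input d $25.;
  datalines;
d="M200,170L149,385"
d="M200,170L49,317"
;
run;

data want_num;
  set have_raw;
  /* treat d, =, the double quote, M, L and the comma all as delimiters */
  x_origin = input(scan(d, 1, 'd="ML,'), best12.);
  y_origin = input(scan(d, 2, 'd="ML,'), best12.);
  x        = input(scan(d, 3, 'd="ML,'), best12.);
  y        = input(scan(d, 4, 'd="ML,'), best12.);
run;

proc print data=want_num;
run;
```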
Do you have a SAS dataset with a variable named d in it, or do you have a text file? My first read was that you have a SAS dataset already, in which case you need to parse the variable. You could use SCAN() function, or plenty of other methods, e.g.:
data have;
input d $16.;
cards;
M200,170L149,385
M200,170L150,387
M200,170L275,384
M200,170L49,317
M200,170L92,347
;
run;
data want;
set have;
x_origin=scan(d,1,"M,L");
y_origin=scan(d,2,"M,L");
x=scan(d,3,"M,L");
y=scan(d,4,"M,L");
run;
proc print data=want;
run;

In SAS, use one variable to choose another variable to change

I have a data set with variables col1-col5 that contain a numeric identifier. There are 2000 possible values of the identifier. I also have 2000 variables named dX, where X is one of the 2000 different identifier values.
I want to loop over the col variables and then set the corresponding d variable that is indexed by the identifier to equal 1.
For example, suppose I have the observation:
col1 col2 col3 col4 col5 d10007 d10010 d10031 ... d10057 ...
10031 10057 . . . . . . .
I would want to set d10031 and d10057 to both equal 1.
Is this possible? If the numbers were sequential I see how to use an array, but given that they're not I can't see how to do it.
It can be done in an array. I'll explain, after the mandatory polemic about data structure.
This looks like it should be a vertical data structure, i.e. one col variable and one d variable, with multiple rows per observation (and some ID tying them together).
Now, to do this in the structure you have:
You need to use the VNAME function. This allows you to get at the name of the array variable as a string. You can't take col1=10531 and create a statement d10531=1, but you can look at d10531 and compare the number in its name to col1.
This is slow, because you need to loop twice over your variables, unless you have a reliable ordering. Your data above does respect the ordering (ie, COL1-n are in order, and D1-n are in order, so you can move left to right and not loop twice). If this isn't the case, then you may want to use call sortn with the COL array, if that's acceptable. The Dxx array should be able to be defined in proper order (if it's not in proper order on the dataset, you can construct the array statement in a macro variable ordering the variables there - the order on the array statement matters, the order in the dataset does not, unless you're using d:.)
Here's an example of the left to right structure.
data want;
set have;
array cols col1-col5;
array ds d1:; *or whatever defines d;
_citer=1;
do _diter = 1 to dim(ds) while (_citer le dim(cols)); *safety check;
if compress(vname(ds[_diter]),,'kd') = cols[_citer] then do;
ds[_diter] = 1;
_citer+1;
end;
end;
run;
It iterates over ds, checks each one against the current col, and when it finds a match, sets that d variable to 1 and advances to the next col. This should be flexible: it works with any set of d variables, even a very large one. It will not, however, work if cols is not sorted in ascending order of value. In that case you would need an inner loop that checks each cols variable, meaning up to [dim(ds)*dim(cols)] loop iterations instead of at most [dim(ds)].
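For the unsorted case, the double loop could look roughly like the sketch below. The sample values, the three col variables, and the helper names _c, _d and _found are made up for illustration; the loops are nested with cols on the outside, which visits the same pairs.

```sas
/* Hypothetical sample data: cols deliberately not in ascending order */
data have;
  col1 = 10057; col2 = 10031; col3 = .;
  array d_init d10007 d10010 d10031 d10057;  /* creates the d variables, all missing */
run;

data want;
  set have;
  array cols col1-col3;
  array ds d1:;
  do _c = 1 to dim(cols);
    if not missing(cols[_c]) then do;
      _found = 0;
      /* scan the d variables until the one named after this identifier turns up */
      do _d = 1 to dim(ds) while (not _found);
        if input(compress(vname(ds[_d]), , 'kd'), best12.) = cols[_c] then do;
          ds[_d] = 1;
          _found = 1;
        end;
      end;
    end;
  end;
  drop _c _d _found;
run;
```

In the worst case this visits every pair of col and d variables, which is the [dim(ds)*dim(cols)] cost described above.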
Another alternative is to just create the entire sequential D array, then drop the 'fake' d variables at the end, like so:
data have;
Col1 = 10;
Col2 = 35;
Col3=.;
Array Dvars {*} d1 d10 d25 d35;
run;
/* Get a list of all actual D variables */
proc sql noprint ;
select name into :dColumnsToKeep separated by ' '
From SASHELP.VColumn
where libname="WORK" and memname = "HAVE"
AND name LIKE 'd%';
;quit;
%put &dColumnsToKeep;
data want (keep=Col: &dColumnsToKeep);
set have;
array AllDVars {*} d1-d9999; *Set d9999 to as big as needed;
array ColVars {*} Col:;
do i = 1 to Dim(ColVars);
if colvars(i) ne . Then AllDvars(Colvars(i)) = 1;
end;
run;
This may be quicker to process, since it avoids looping over the D variables (it only loops over the Col variables). I don't know, though, what the memory tradeoff is of having SAS create 10K or 100K variables in the data step.

Using informats when creating a dataset from another dataset

I've got a dataset that's full of data all in character format.
Now I want to create another dataset from this one, but put everything in its correct decimal, date or character format.
Here's what I'm trying.
data work.testout;
attrib account_open_date informat = mmddyy10.;
do i = 1 to nobs;
set braw.accounts point = i nobs = nobs;
output;
end;
stop;
run;
this gives me:
Variable 'account_open_date' from data set braw.accounts (at line 7 column 21) has a different type (character) to the variable type on the data vector (numeric)
What's the best way of doing this?
You cannot use an informat to convert a variable directly from character to numeric. At least in SAS proper, you cannot convert a variable from character to numeric, period, without using an intermediary. You must do something along the lines of the following:
data want;
set have(rename=(varwant=temp));
varwant=input(temp,MMDDYY10.);
drop temp;
run;
There you rename the (character) variable to a temporary name, then convert it to numeric using INPUT.
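Applied to the variable from the question, the same pattern might look like the sketch below. The sample value, the dataset name accounts_raw (standing in for braw.accounts), the temporary name _open_date_c and the date9. display format are all assumptions for illustration.

```sas
/* Hypothetical stand-in for braw.accounts: the date is stored as text */
data accounts_raw;
  length account_open_date $10;
  account_open_date = '03/15/2012';
run;

data testout;
  /* move the character version out of the way under a temporary name */
  set accounts_raw(rename=(account_open_date=_open_date_c));
  /* read the text with the MMDDYY10. informat into a new numeric variable */
  account_open_date = input(_open_date_c, mmddyy10.);
  format account_open_date date9.;
  drop _open_date_c;
run;

proc print data=testout;
run;
```

The FORMAT statement only affects how the numeric date is displayed; the conversion itself is done by INPUT.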

SAS date or numeric data?

%let months_back = %sysget(months_back);
data;
m = intnx('month', "&sysdate9"d, -&months_back - 2, 'begin');
m = intnx('day', put(m, date9.), 26, 'same');
m2back = put(m, yymmddd10.);
put m2back;
run;
NOTE: Character values have been converted to numeric values at the
places given by: (Line):(Column).
5:19 NOTE: Invalid numeric data, '01OCT2012' , at line 5 column 19.
I really don't know why this goes wrong. Isn't the date string numeric data?
PUT(m, date9.) is the culprit here. The 2nd argument of INTNX needs to be numeric (i.e. a date), the PUT function always returns a character value, in this instance '01OCT2012'. Just take out the PUT function completely and the code should work.
m = intnx('day', m, 26, 'same');
SAS stores dates as numbers - and in fact does not have a truly separate type for them. A SAS date is the number of days since 1/1/1960, so a bit over 19000 for today. The date format is entirely irrelevant to any date calculations - it is solely for human readability.
The bit where you say:
"&sysdate9"d
actually converts the string "01JAN2012" to a numeric value (18993).
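A quick way to convince yourself of this is to write the same variable to the log with and without a format. This is just a small sketch; the variable names d and d_plus are arbitrary.

```sas
data _null_;
  d = '01JAN2012'd;     /* a date literal is simply a number */
  put d=;               /* unformatted: the raw day count since 01JAN1960 */
  put d= date9.;        /* same value, shown through the DATE9. format */
  d_plus = d + 26;      /* plain arithmetic moves the date by 26 days */
  put d_plus= date9.;
  if d_plus ne '27JAN2012'd then put 'ERROR: unexpected date arithmetic result';
run;
```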
There's actually a quicker way to accomplish what you're trying to do. Because days correspond to whole numbers in SAS, to increment by one day you can simply add one to the value.
For example:
%let months_back=5;
data _null_;
m = intnx('month', today(), -&months_back - 2, 'begin');
m2 = intnx('day', m, 26, 'same');
m3 = intnx('month',"&sysdate9"d, -&months_back - 2)+26;
m2back = put(m2, yymmdd10.);
put m= date9. m2= yymmdd10. m3= yymmdd10.;
run;
M3 does your entire calculation in one step, by using the MONTH interval, then adding 26. INTNX('day'...) is basically pointless, unless there's some other value to using the function (using a shift index for example).
You also can see the use of a format in the PUT(log) statement here - you don't have to PUT it to a character value and then put that to the log to get the formatted value, just put (var) (format.); - and string together as many as you want that way.
Also, "&sysdate9."d is not the best way to get the current date. &sysdate. is only defined on startup of SAS, so if your session ran for 3 days you would not be on the current day (though perhaps that's desired?). Instead, the TODAY() function gets the current date, up to date no matter how long your SAS session has been running.
Finally - I recommend data _null_; if you don't want a dataset (and naming the result dataset if you do want it). data _null_ does not create a dataset. data; simply creates increasing numbers of datasets (data1, data2, ...) which quickly fill up your workspace and make it hard to tell what you're doing.