SAS proc transpose duplicate values issue - sas

I need your help, please!
I'm running PROC TRANSPOSE in SAS on a table that has only unique rows. However, it returns the following error:
ERROR: The ID value "'OUTROS_CANAIS_Fatura Eletrónica'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
ID_CLIENTE=xxxxxxxxxx
When I check the original table the ID_CLIENTE xxxxxxxxxxx has two lines:
ID_CLIENTE MOTIVO Nr_Solicitacoes
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão 1
xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento 1
I believe it is the '-' that is causing the issue (that comes with the original data), since they are clearly two different values.
Any ideas how to solve this?
EDIT: I've managed to replace the '-' value, however it still returns the same error...
Thank you!!

The ID statement in PROC TRANSPOSE turns data values into column names when pivoting data. Column names are limited to 32 characters (column labels are limited to 256 characters). Truncated to 32 characters, your two ID values are identical, which is why you get the 'occurs twice' message in the log.
You can add a new variable to distinguish the id values and use the IDLABEL statement to store the original id values in the variable labels.
Example:
idnum is added to the data and is used to distinguish the id values. If you have many id values a hash can be used to dynamically assign a unique idnum for each id value
options validvarname = v7;
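data have;
length id $100; /* set explicitly; otherwise the first assignment fixes the length and truncates the second value */
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - adesão';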
idnum = 1;
count = 1;
output;
id = 'xxxxxxxxxx OUTROS_CANAIS_Fatura Eletrónica - cancelamento 1';
idnum = 2;
output;
run;
proc transpose data=have out=want;
id idnum;
idlabel id;
var count;
run;
proc contents data=work.want;
run;
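The hash approach mentioned above could look roughly like the following. This is only a sketch: it assumes HAVE holds the character variable id but does not yet contain idnum, and the names have_numbered, seen and next_idnum are made up for illustration.

```sas
/* Sketch: give every distinct id value its own idnum using a hash object. */
data have_numbered;
  if _n_ = 1 then do;
    declare hash seen();          /* lookup table: id -> idnum */
    seen.defineKey('id');
    seen.defineData('idnum');
    seen.defineDone();
  end;
  set have;
  /* find() returns 0 and loads idnum when this id was seen before */
  if seen.find() ne 0 then do;
    next_idnum + 1;               /* sum statement: retained counter */
    idnum = next_idnum;
    seen.add();                   /* remember the new id/idnum pair */
  end;
  drop next_idnum;
run;
```

The resulting have_numbered can then be passed to PROC TRANSPOSE with id idnum and idlabel id, as in the example above.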

Figured it out!
SAS only allows column names of up to 32 bytes... It was a coincidence that the truncated name ended right before the '-'.

Related

How to remove first 4 and last 3 letters, digits or punctuation using PROC SQL in SAS Enterprise Guide?

I am trying to remove 009, and ,N from 009,A,N just to obtain the letter A in my dataset. Please guide how do I obtain such using PROC SQL in SAS or even in the data step. I want to keep the variable name the same and just remove the above-mentioned digits, letters and punctuation from the data.
Is your original value saved in one variable? If so, you can utilize the scan function in a data step or in Proc SQL to extract the second element from a string that contains comma delimiter:
data want (drop=str rename=(new_str=str));
set orig;
length new_str $ 1;
new_str = strip(scan(str,2,','));
run;
proc sql;
create table work.want
as select *, strip(scan(str,2,',')) as new_str length=1
from orig;
quit;
In Proc SQL, if you want to replace the original column with the updated column, you can replace the * in the SELECT clause with the names of all columns other than original variable you are modifying.
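As a rough sketch of that last point, assuming ORIG has one other column besides str (the name id here is hypothetical), the SELECT clause could list the columns explicitly and reuse the original name for the extracted value:

```sas
/* Sketch: replace str with its second comma-delimited element.       */
/* The column name id is an assumption; list your real columns here. */
proc sql;
  create table work.want as
  select id,
         strip(scan(str, 2, ',')) as str length=1
  from orig;
quit;
```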
What you should code really depends on the values in the other rows of the data set.
From the one sample value you provide the following code could be what you want.
data have;
input myvar $char80.;
datalines;
009,A,N
009-A-N
Okay
run;
proc sql;
update work.have
set myvar = scan(myvar,2,',')
where count(myvar,',') > 1
;
quit;

Insert into function with SAS/SQL

I want to insert values into a new table, but I keep getting the same error: VALUES clause 1 attempts to insert more columns than specified after the INSERT table name. This is if I don't put apostrophes around my date. If I do put apostrophes then I get told that the data types do not correspond for the second value.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', 2015-02-01)
values ('201502', 2015-03-01)
values ('201503', 2015-04-01)
values ('201504', 2015-05-01);
quit;
The second value has to remain a date because it is used with the > and < operators later on. I think the problem may be that 2015-02-01 just isn't a valid date format, since I couldn't find it on the SAS website, but I would rather not change my whole table.
Date literals (constants) are quoted strings with the letter d immediately after the close quote. The string needs to be in a format that is valid for the DATE informat.
'01FEB2015'd
"01-feb-2015"d
'1feb15'd
If you really want to insert a series of dates then just use a data step with a DO loop. Also make sure to attach one of the many date formats to your date values so that they will print as human understandable text.
data date_table ;
length cvmo $6 next_beg_dt 8;
format next_beg_dt yymmdd10.;
do _n_=1 to 4;
cvmo=put(intnx('month','01JAN2015'd,_n_-1,'b'),yymmn6.);
next_beg_dt=intnx('month','01JAN2015'd,_n_,'b');
output;
end;
run;
@Tom explained in the comments how to write date literals, and his answer shows an efficient way to generate the rows that is less error-prone than typing the values. I am just applying the same date literal to the INSERT statement.
proc sql;
create table date_table
(cvmo char(6), next_beg_dt DATE);
quit;
proc sql;
insert into date_table
values ('201501', "01FEB2015"D)
;
quit;
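Continuing from the single row above, a sketch of inserting the remaining rows and then using the date column in a comparison might look like this. The cut-off date in the WHERE clause is only an example.

```sas
/* Sketch: add the other three rows, then filter on the date column. */
proc sql;
  insert into date_table
    values ('201502', '01MAR2015'd)
    values ('201503', '01APR2015'd)
    values ('201504', '01MAY2015'd);

  /* a format makes the stored numeric date readable in the output */
  select cvmo, next_beg_dt format=yymmdd10.
    from date_table
    where next_beg_dt > '01MAR2015'd;
quit;
```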

How do I do a cluster analysis on table with both character and numeric variables in SAS?

Account_id <- c("00qwerf1", "00uiowe3", "11heooiue", "11heooihe",
"00sdffrg3", "03vthjygjj", "11mpouhhu", "1poihbusw")
Postcode <- c("EN8 7WD", "EN7 9BB", "EN6 8YQ", "EN8 7TT", "EN7 9BC", "EN6 8YQ",
"EN8 7WD", "EN7 7WB")
Age <- c("30", "35", "40", "50", "60", "32", "34", "45")
DF <- data.frame(Account_id, Postcode, Age)
I want to do cluster analysis on my dataframe in SAS. I understand that technically a dataframe is not used in SAS, however I have just used this format for illustration purposes. Account_id and Postcode are both character variables and Age is a numeric variable.
Below is the code that I used after the data step:
Proc fastclus data=DF maxc=8 maxiter=10 seed=5 out=clus;
Run;
The cluster analysis does not work because Account_id and Postcode are character variables. Is there a way to change these variables into numeric variables, or is there a clustering method that works with both character and numeric variables?
Before you can do clustering you need to define a metric that can be used to calculate the distance between observations. By default proc fastclus uses the Euclidean metric. This requires that all input variables are numeric and works best if they are all rescaled to have the same mean and variance, so that they are all equally important when growing clusters.
You could use postcode in a by statement if you wanted to perform a separate cluster analysis for each postcode, but if you want to use postcode itself as a clustering variable you will need to convert it to a numeric form. Replacing postcode with two variables for the latitude and longitude of postcode centroid might be a good option.
It's less obvious what would be a good option for your account ID variable, as this doesn't appear to be a measurement of anything. I would try to get hold of something else like account creation date or last activity date, which can be converted to a numeric value in a more obvious way.
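The rescaling step mentioned above could be sketched as follows. The data set df_num and the variables latitude and longitude are hypothetical stand-ins for whatever numeric inputs you end up with, and the number of clusters is arbitrary.

```sas
/* Sketch: standardize numeric inputs to mean 0 and standard deviation 1, */
/* then cluster on the standardized values.                               */
/* df_num, latitude and longitude are assumed names, not from the post.   */
proc standard data=df_num mean=0 std=1 out=df_std;
  var age latitude longitude;
run;

proc fastclus data=df_std maxclusters=3 maxiter=10 out=clus;
  var age latitude longitude;
run;
```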
You can determine the unique values of each variable and then use the ordinal position of each original value as its numeric representation for the purpose of fastclus.
Sample code
Note: The FASTCLUS seed= option is a data set specifier, not a simple number (as is used with random number generators)
* hacky tweak to place your R coded data values in a SAS data set;
data have;
array _Account_id(8) $20 _temporary_ ("00qwerf1", "00uiowe3", "11heooiue" , "11heooihe" ,
"00sdffrg3", "03vthjygjj", "11mpouhhu" , "1poihbusw");
array _postcode(8) $7 _temporary_ ("EN8 7WD", "EN7 9BB", "EN6 8YQ", "EN8 7TT", "EN7 9BC", "EN6 8YQ",
"EN8 7WD", "EN7 7WB");
array _age (8) $3 _temporary_ ("30", "35", "40", "50", "60", "32", "34", "45");
do _n_ = 1 to dim (_account_id);
Account_id = _account_id(_n_);
Postcode = _postcode(_n_);
Age = _age(_n_);
output;
end;
run;
* get lists of distinct values for each variable;
proc means noprint data=have;
class _all_;
ways 1;
output out=have_freq;
run;
* compute ordinal of the original value of each variable;
data have_freq2;
set have_freq;
if not missing(Account_id) then unum_Account_id + 1;
if not missing(Postcode) then unum_Postcode + 1;
if not missing(Age) then unum_Age + 1;
run;
* merge back by original value to obtain ordinal values;
proc sql;
create table have_unumified as
select
Account_id, Postcode, Age
, (select unum_Account_id from have_freq2 where have_freq2.Account_id = have.Account_id) as unum_Account_id
, (select unum_Postcode from have_freq2 where have_freq2.Postcode = have.Postcode) as unum_Postcode
, (select unum_Age from have_freq2 where have_freq2.Age = have.Age) as unum_Age
from have
;
quit;
* fastclus on the ordinal values (seed= not specified);
Proc fastclus data=have_unumified maxc=8 maxiter=10 out=clus_on_unum;
var unum_:;
Run;

I want to add auto_increment column in a table in SAS

I want to add an auto-increment column to a table in SAS. The following code adds a column but does not increment the value.
Thanks In Advance.
proc sql;
alter table pmt.W_cur_qtr_recoveries
add ID integer;
quit;
Wow, going to try for my second "SAS doesn't do that" answer this morning. Risky stuff.
A SAS dataset cannot define an auto-increment column. Whether you are creating a new dataset or inserting records into an existing dataset, you are responsible for creating any increment counters (ie they are just normal numeric vars where you have set the values to what you want).
That said, there are DATA step statements such as the sum statement (e.g. MyCounter+1) that make it easier to implement counters. If you describe more details of your problem, people could provide some alternatives.
The correct answer at this time is to create the ID yourself, BUT the discussion wouldn't be complete without mentioning that there is an unsupported SQL function Monotonic that can do what you want. It's not reliable, yet it persists.
The code pattern for its usage is
select monotonic() as ID, ....
Use the _N_ automatic variable in a data step like:
DATA TEMPLIB.my_dataset (label="my dataset with auto increment variables");
SET TEMPREP.my_dataset;
sas_incr_num = _N_; * add an auto increment 'sas_incr_num' variable;
sas_incr_cat = cat("AB.",cats(repeat("0",5-ceil(log10(sas_incr_num+1))),sas_incr_num),".YZ"); * auto increment the sas_incr_num variable and add 5 leading zeros and concatenate strings on either end;
LABEL
sas_incr_num="auto number each row"
sas_incr_cat="auto number each row, leading zeros, and add strings along for fun"
...
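For the zero-padded label, a simpler route may be the Zw. format, which pads a number with leading zeros to a fixed width. The sketch below uses SASHELP.CLASS and the output name work.numbered purely as placeholders, and assumes a width of 6 digits is enough.

```sas
/* Sketch: row number plus a zero-padded label built with the Zw. format. */
/* SASHELP.CLASS stands in for the real input data set.                   */
data work.numbered;
  set sashelp.class;
  sas_incr_num = _n_;                       /* row counter */
  length sas_incr_cat $12;
  sas_incr_cat = cats("AB.", put(sas_incr_num, z6.), ".YZ");
run;
```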
There is no such thing as an auto increment column in a SAS dataset. You can use a data step to create a new dataset that has the new variable. You can use the same name to have it replace the old one when done.
data pmt.W_cur_qtr_recoveries;
set pmt.W_cur_qtr_recoveries;
ID+1;
run;
It really depends on what your intended outcome is, but I have thrown together an example of how you may want to tackle this. It is a little rough, but it gives you something to work from.
/*JUST SETTING UP THE DAY ONE DATA WITH AN ID ATTACHED
YOU WOULD MAKE THE FIRST RUN EXECUTE DIFFERENTLY TO SUBSEQUENT RUNS BY USING THE EXISTS FUNCTION AND MACRO LANGUAGE,
BUT I WILL LET YOU INVESTIGATE THIS FURTHER AS IT MAY BE IRRELEVANT.*/
DATA DAY1;
SET SASHELP.CLASS;
ID+1;
RUN;
/*ON DAY 2 WE ARE APPENDING ADDITIONAL RECORDS TO THE EXISTING DATASET*/
DATA DAY2;
/*APPEND DATASETS*/
SET DAY1 SASHELP.CLASS;
/*HOLD VALUE IN PROGRAM DATA VECTOR (PDV) UNTIL EXPLICITLY CHANGED*/
RETAIN _ID;
/*ADD VARIABLE _ID AND POPULATE WITH ID. IN DOING THIS THE LAST INSTANCE OF THE ID WILL BE HELD IN THE PDV FOR THE
FIRST OF THE NEW RECORDS*/
IF ID ~= . THEN _ID = ID;
/*INCREMENT THE VALUE IN _ID BY 1 AND DO SO FOR EACH RECORD ADDED*/
ELSE DO;
_ID+1;
END;
/*DROP THE ORIGINAL ID;*/
DROP ID;
/*RENAME _ID TO ID*/
RENAME _ID = ID;
RUN;
Thanks to user2337871. In the code below, "W_prv_qtr_recoveries" is the table name and "pmt" is the library name.
DATA pmt.W_prv_qtr_recoveries;
SET pmt.W_prv_qtr_recoveries;
RETAIN _ID;
IF ID ~= . THEN _ID = ID;
ELSE DO;
_ID+1;
END;
DROP ID;
RENAME _ID = ID;
RUN;
Assuming that this autoincrement column will be used for every record that is inserted.
We can accomplish the same as follows:-
We will first check the latest key in the dataset
PROC SQL;
SELECT MAX(KEY) INTO :MK FROM MYDATA;
QUIT;
%put KeyOld=&MK;
Then we increment this key
Data _NULL_;
call symputx('KeyNew',&MK+1); /* symputx converts the number without a log note and trims blanks */
run;
%put KeyNew=&KeyNew;
Here we hold the new record that we want to insert and add the corresponding key
Data TEMP1;
set TEMP;
Key=&KeyNew;
run;
Finally we load the new record in our dataset
PROC APPEND BASE=MYDATA DATA=TEMP1 FORCE;
RUN;

How to create a new variable in SAS by extracting part of the value of an existing numeric variable?

I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL(using sample data from Data Step code):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
run;
An alternative, if you are comfortable with SQL, would be to use proc sql with a join on substr(), just as explained above.
If your "subject_id" variable is numeric, the substr function won't work as expected, because SAS first converts the number to a string and, by default, pads it with spaces on the left, which shifts the character positions.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.
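Put together, the numeric variant might look like the sketch below. The sample values are invented for illustration, and the subsetting IF keeps only ids present in both data sets, which is an assumption about the desired result.

```sas
/* Hypothetical sample data: numeric 9-digit ids, digits 4-6 hold the shared code. */
data have1;
  input subject_id other $;
  datalines;
123001456 other1
123002456 other2
123003456 other3
;

data have2;
  input mom_subject_id misc $;
  datalines;
789001012 misc1
789003012 misc3
;

/* Extract digits 4-6 as a numeric key in each data set */
data have1;
  set have1;
  id = mod(floor(subject_id / 1000), 1000);
run;

data have2;
  set have2;
  id = mod(floor(mom_subject_id / 1000), 1000);
run;

proc sort data=have1; by id; run;
proc sort data=have2; by id; run;

data work.want;
  merge have1(in=a) have2(in=b);
  by id;
  if a and b;   /* keep matches only; drop this line to keep all rows */
run;
```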