SAS/IML: creating a dataset from multiple matrices - sas

Let's say I have a number of matrices in IML. They can be either numeric or character. How would I go about creating a single SAS dataset out of them?
I tried something like
n = {1 2 3, 4 5 6}; /* 2 x 3 numeric */
c = {'a' 'b', 'c' 'd'}; /* 2 x 2 character */
dsvars = {n c};
create dat var dsvars; /* should be a 2-obs, 5-variable dataset */
append;
but this turns n and c into column vectors and exports those, which is not what I want. Should I export n and c separately and merge them in a DATA step instead?

Your approach works when n and c are vectors. When they are matrices, there are a couple of ways to do this. I like to use the CREATE FROM and APPEND FROM syntax, and write the numerical and character matrices to separate data sets that I later merge:
proc iml;
n = {1 2 3, 4 5 6}; /* 2 x 3 numeric */
c = {'a' 'b', 'c' 'd'}; /* 2 x 2 character */
nNames = "n1":"n3";
cNames = "c1":"c2";
create ndat from n[colname=nNames];
append from n;
create cdat from c[colname=cNames];
append from c;
quit;
data dat;
merge ndat cdat;
run;
proc print;run;

Related

SAS changes all numerics to length 8 even when input data sets define numerics otherwise

I have two input datasets which I need to interweave. The input files have defined lengths for numeric fields depending on the size of the integer. When I interweave the datasets -- either a DATA or PROC SQL statement -- the lengths of numeric fields are all reset to the default of 8. Outside of explicitly defining the length for each field in a LENGTH statement, is there an option for SAS to keep the original attributes of the input columns?
More details ...
data A ;
length numeric_variable 3 ;
{input data}
;
data B ;
length numeric_variable 3 ;
{input data}
;
data AB ;
set A B ;
by some_id_variable ;
{stuff};
;
In the data set AB, the variable NUMERIC_VARIABLE is length 8 instead of 3. I can explicitly put another length statement in the "data AB" statement, but I have tons of columns.
Your description is wrong. A data step will set the length based on how it is first defined. If you just select the variable in SQL it keeps its length. However in SQL if you are doing something like UNION that combines variables from different sources then the length will be set to 8.
Example:
data one; length x 3; x=1; run;
data two; length x 5; x=2; run;
data one_two; set one two; run;
data two_one; set two one; run;
proc sql ;
create table sql_one as select * from one;
create table sql_two as select * from two;
create table sql_one_two as select * from one union select * from two;
create table sql_two_one as select * from two union select * from one;
quit;
proc sql;
select memname,name,length
from dictionary.columns
where libname='WORK'
and memname like '%ONE%'
or memname like '%TWO%'
;
quit;
Results:
Column
Member Name Column Name Length
----------------------------------------------------------------------------
ONE x 3
ONE_TWO x 3
SQL_ONE x 3
SQL_ONE_TWO x 8
SQL_TWO x 5
SQL_TWO_ONE x 8
TWO x 5
TWO_ONE x 5
So if you want define your variables then either add the length statement as you mentioned or create a template datasets and reference that in your data steps before referencing the other dataset(s). For complex SQL code you will need to include the LENGTH= option in your SELECT clause to force the lengths for the variables you are creating.
Can you post code that demonstrates the problem?
This code does NOT exhibit a final data set in which the numeric lengths get changed from 3 to 8.
data A; id = 'A'; length x 3; x=1;
data B; id = 'A'; length x 3; x=2;
data AB;
set A B;
by id;
run;
proc contents data=AB; run;
Contents
# Variable Type Len
1 id Char 1
2 x Num 3

Aggregate multiple vars on different groupings in one Proc SQL query

I need to aggregate about ten different vars on different groupings using Proc SQL;
Is there a way to achieve SUM () OVER ( [ partition_by_clause ] order_by_clause) in one sql query with different partition by clauses.
I've made an example here
data have;
infile cards;
input a b c d e f;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
proc sql;
create table want as
select *,
sum a over partiton by (b,c) as a1,
sum b over partiton by (c,d) as b1
sum c over partiton by (d,e) as c1
sum d over partiton by (a,c) as d1
from have
;
quit;
I don't want to wirte multiple sql queries and grouping on different vars and calculating one var in each step.
Hope that makes sense.
Proc SQL does not implement windowing functions and thus partition syntax therein as found in other SQL implementations. You can only do partition by with passthrough SQL to a connection that allows such syntax.
You could perform such a computation in DATA step using hashes.
data have;
infile cards;
input a b c d e ;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
data want;
if 0 then set have;
length a1 b1 c1 d1 8;
declare hash a1s();
a1s.defineKey('b', 'c');
a1s.defineData('a1');
a1s.defineDone();
declare hash b1s();
b1s.defineKey('c', 'd');
b1s.defineData('b1');
b1s.defineDone();
declare hash c1s();
c1s.defineKey('d', 'e');
c1s.defineData('c1');
c1s.defineDone();
declare hash d1s();
d1s.defineKey('a', 'c');
d1s.defineData('d1');
d1s.defineDone();
do while (not end);
set have end=end;
if a1s.find() = 0 then a1+a; else a1=a; a1s.replace();
if b1s.find() = 0 then b1+b; else b1=b; b1s.replace();
if c1s.find() = 0 then c1+c; else c1=c; c1s.replace();
if d1s.find() = 0 then d1+d; else d1=d; d1s.replace();
end;
do while (not last);
set have end=last;
a1s.find();
b1s.find();
c1s.find();
d1s.find();
output;
end;
format _numeric_ 4.;
stop;
run;

Order of columns(variables) in proc append

I am fairly new to SAS.
I wanted to append two datasets Dataset1 and Dataset2
The order of columns in Dataset1 is A B C
The order of columns in Dataset2 is b A c
Note the case of the column names(upper case and lower case)
So if I do
PROC APPEND BASE=Dataset1 DATA=Dataset2 FORCE;
RUN;
Will the appending happend in a desired way :
A should append to A
B should append to b
C should append to c
Neither case nor position matter for columns.
Columns are identified by their names, not their position.
Examples can help demonstrate this; try running the following one step at a time; read the comments, check the log and examine the data sets:
/* creates data set have1 with columns a (char), b (numeric) then c (numeric) */
data have1;
length a $ 1;
input a b c;
datalines;
1 2 3
4 5 6
7 8 9
;
/* creates data set have2 with columns b (char), a (numeric) then c (numeric) */
data have2;
length b $ 1;
input a b c;
datalines;
1 2 3
4 5 6
7 8 9
;
/* attempts append, but as a & b have different types, missing values result */
proc append base = have1
data = have2
force
;
run;
/* creates data set have3 with columns a (char), b (numeric) then c (numeric) */
data have3;
length a $ 1;
input a b c;
datalines;
1 2 3
4 5 6
7 8 9
;
/* creates data set have4 with columns b (numeric), a (char) then c (numeric) */
data have4;
length b 8;
length a $ 1;
input a b c;
datalines;
1 2 3
4 5 6
7 8 9
;
/* Appends successfully as variable types are the same even though order is different. */
/* Columns are identified by their names, not their position. */
proc append base = have3
data = have4
force
;
run;
EDIT: In answer to the question in the comment:
Having same type but different format.Example num type but DATE9.
format and other column has num type but ddmmyy format will this
cause any problem ?
The format of a variable affects the way it is displayed, the underlying data remains unchanged, so appending one numeric column with another is possible, the only difference would be is that the appended data will be in the same format as the base data, as noted by #J_Lard in the first comment to your question.
Again, an example might help to demonstrate this:
/* creates data set have5 with columns a (numeric, formst date9.) and text */
data have5;
format a date9.;
input text $char10.;
a = input(text,ddmmyy10.);
datalines;
31/07/2018
;
/* creates data set have6 with columns a (numeric, formst ddmmyy.) and text */
data have6;
format a ddmmyy.;
input text $char10.;
a = input(text,ddmmyy10.);
datalines;
31/07/2018
;
/* appends, but see warning in log about format */
proc append base = have5
data = have6
force
;
run;
Hopefully you can see the approach to take if you have more questions you need answering (create test data then append / process). If you still have problems then I would suggest asking a new question, with a link to this one, if relevant, supplying test data steps for others to run and the code you've tried with any log messages.

Get the unique combinations per variable in SAS

I am attempting to group by a variable that is not unique with a discrete variable to get the unique combinations per non-unique variable. For example:
A B
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
I want:
A Unique_combos
1 a, b
2 a
3 a
4 b, d
5 e
My current attempt is something along the lines of:
proc sql outobs=50;
title 'Unique Combinations of b per a';
select a, b
from mylib.mydata
group by distinct a;
run;
If you are happy to use a data step instead of proc sql you can use the retain keyword combined with first/last processing:
Example data:
data have;
attrib b length=$1 format=$1. informat=$1.;
input a
b $
;
datalines;
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
;
run;
Eliminate duplicates and make sure the data is sorted for first/last processing:
proc sql noprint;
create table tmp as select distinct a,b from have order by a,b;
quit;
Iterate over the distinct list and concatenate the values of b together:
data want;
length combinations $200; * ADJUST TO BE BIG ENOUGH TO STORE ALL THE COMBINATIONS;
set tmp;
by a;
retain combinations '';
if first.a then do;
combinations = '';
end;
combinations = catx(', ',combinations, b);
if last.a then do;
output;
end;
drop b;
run;
Result:
combinations a
a, b 1
a 2
a 3
b, d 4
c, e 5
You just need to put a distinct keyword in the select clause, eg:
title 'Unique Combinations of b per a';
proc sql outobs=50;
select distinct a, b
from mylib.mydata;
The run statement is unnecessary, the sql procedure is normally ended with a quit - although I personally never use it, as the statement will execute upon hitting the semicolon and the procedure quits anyway upon hitting the next step boundary.

How do I assign numeric values to the alphabet in SAS

I'm trying to convert a character string to a numeric variable and then sum the values of each character to use as a unique identifier for that field.
So for example, I would like A=1, B=2, C=3.....X=24 Y=25 Z=26.
Say my string is "CAB" so after running the code I would like the result to be an intermidiary column of numbers, where the value for CAB IS 3 1 2 and the result column would be derived by summing the string 3+1+2= 6 and show the value of the intermideate column, so the final value woud be 6.
Here is the sas code I used to convert the characters to numbers, but I need help with the result column.
DATA CHAR_VALUE;
SET WORK.XYZ;
CHAR_2_NUM=TRANSLATE(MY_VAR_CHAR, '1 2 3 ...24 25 26', 'A B C ...X Y Z');
NUM_CHAR=INPUT(CHAR_2_NUM,32.);
RUN;
Thanks in advance...I appreciate any help or suggestions.
-rachel
RANK will give the ASCII numeric value underlying a character; so A=65, B=66, Z=90, a=97, z=122.
So this should work (if you want only the uppercase values - not a different value for a than A):
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,rank(char(upcase(charval),_t))-64);
end;
put _all_;
run;
Another option (Based on the comments below), is to build an informat with the relationships between letter and value. My loop iterates over each character A to Z, you can then put whatever value you want for each letter as label (I just put 1,2,3,4... but label= will change that).
data fmts;
retain fmtname 'CHARNUM' type 'i';
do _t=65 to 90;
start=byte(_t); *the character, so byte(65)='A';
label=_t-64; *the resulting number;
output;
end;
run;
proc format cntlin=fmts;
quit;
data test;
charval='CAB';
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
end;
put _all_;
run;
Finally, if you want to be able to construct this in the same datastep, you could construct the relationships in a hash table and look up the result. I can explain that if desired, though I'd like to see a more detailed example of what you want to do in terms of defining the relationship between a letter and its code.
If you need to see the intermediate values, you can do that by inserting a CAT function in the loop- I recommend CATX:
data test;
charval='CAB';
format intermed $100.;
do _t=1 to length(Charval);
numval=sum(numval,input(char(upcase(charval),_t),CHARNUM.));
intermed=catx('|',intermed,input(char(upcase(charval),_t),CHARNUM.)); *or the RANK portion from earlier;
end;
put _all_;
run;
That would give you 3|1|2, which you could then do math on via SCAN:
do _t = 1 to countc(intermed,'|')+1;
numval2 = sum(numval2,scan(intermed,_t,'|'));
end;
Your method to try and translate is a good attempt, but it will not really work. Here is a simple solution:
DATA CHAR_VALUE;
retain all_chars 'ABCDEFGHIJKLMMOPQRSTUVXXYZ';
set XYZ;
length CHAR_2_NUM $200;
CHAR_2_NUM = ' ';
NUM_CHAR = 0;
do i=1 to length(MY_VAR_CHAR);
if i=1 then CHAR_2_NUM = substr(MY_VAR_CHAR,i,1);
else CHAR_2_NUM = trim(CHAR_2_NUM) || ' ' || substr(MY_VAR_CHAR,i,1);
NUM_CHAR + index(all_chars,substr(MY_VAR_CHAR,i,1));
end;
drop i all_chars;
RUN;
This takes advantage of the fact that the indexed position of each character of your source variable in the all_chars variable corresponds to the mapping you desired.
UPDATED to also create your CHAR_2_NUM variable, which I overlooked in the original question.
Another simple solution is based on the collate function:
To convert a variable called MyNumbers (in the range of 1 to 26) to English upper-case characters, one can use:
collate(64 + MyNumbers, 64 + MyNumbers)
To obtain lower-case characters, one can use:
collate(96 + MyNumbers, 96 + MyNumbers)
Here's a quick example:
data _null_;
do MyNumbers = 1 to 26;
MyLettersUpper = collate(64 + MyNumbers, 64 + MyNumbers);
MyLettersLower = collate(96 + MyNumbers, 96 + MyNumbers);
put MyNumbers MyLettersUpper MyLettersLower;
end;
run;
1 A a
2 B b
3 C c
4 D d
5 E e
6 F f
7 G g
8 H h
9 I i
10 J j
11 K k
12 L l
13 M m
14 N n
15 O o
16 P p
17 Q q
18 R r
19 S s
20 T t
21 U u
22 V v
23 W w
24 X x
25 Y y
26 Z z
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds