PROC REPORT to create a table as it is - SAS

I am working with PROC REPORT. I want to create a table with exactly the structure shown below.
data have;
input A B $ C E F G I K L M N;
datalines;
1 japan 190 46 15 0 0 0 0 0 1
2 us 152 39 47 86 0 0 0 0 1
3 aus 50 6 36 41 0 0 0 0 1
;
proc report data=have;
column ("A" ("" A)) ("B" ("" B)) ("C" ("" C)) ("D" (("E" E) ("F" F))) ("G" ("" G))
("H" ("I" I) ('J' K L M) ("N" N));
define A / "" display;
define B / "" display;
define C / "" display;
define E / "" display;
define F / "" display;
define G / "" display;
define I / "" display;
define K / display;
define L / display;
define M / display;
define N / "" display;
run;
I want this type of table structure:
[image of the desired table layout]

I was not able to get PROC REPORT to do this. An alternative approach is given in this paper, which shows how to use the SAS Report Writing Interface to achieve this result in a data step.
(It is not compatible with SAS Report output but works for HTML, PDF etc.)
For your example, the code would be:
data _null_;
set have end=done;
* first start the table and create the header;
if _n_ eq 1 then do;
declare odsout t(); * create a report writing interface object named t;
t.table_start(); * start the table;
t.head_start(); * start the header (so that these items get the default header style);
* in this case the header is 3 rows in height. the ROWSPAN and COLSPAN items
control the size of each cell in the header;
t.row_start();
t.format_cell(text: 'A', rowspan:3);
t.format_cell(text: 'B', rowspan:3);
t.format_cell(text: 'C', rowspan:3);
t.format_cell(text: 'D', rowspan:2, colspan:2);
t.format_cell(text: 'G', rowspan:3);
t.format_cell(text: 'H', colspan:5);
t.row_end();
t.row_start();
t.format_cell(text: 'I', rowspan:2);
t.format_cell(text: 'J', colspan:3);
t.format_cell(text: 'N', rowspan:2);
t.row_end();
t.row_start();
t.format_cell(text:'E');
t.format_cell(text:'F');
t.format_cell(text:'K');
t.format_cell(text:'L');
t.format_cell(text:'M');
t.row_end();
t.head_end();
end;
* the rest of the cells are all simply data values;
t.row_start();
t.format_cell(text: A);
t.format_cell(text: B);
t.format_cell(text: C);
t.format_cell(text: E);
t.format_cell(text: F);
t.format_cell(text: G);
t.format_cell(text: I);
t.format_cell(text: K);
t.format_cell(text: L);
t.format_cell(text: M);
t.format_cell(text: N);
t.row_end();
if done then t.table_end();
run;
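Since the answer notes that this technique works for HTML and PDF output, here is a minimal sketch of wrapping such a step in an ODS destination. The file name and the header labels are placeholders made up for illustration, and the argument names (text, colspan) simply follow the code above.

```sas
/* Hypothetical output file; adjust the name and path for your system. */
ods html file="nested_header.html";

data _null_;
  declare odsout t();            /* Report Writing Interface object */
  t.table_start();
  t.head_start();
    /* Header row 1: one cell spanning both columns */
    t.row_start();
      t.format_cell(text: 'Group', colspan: 2);
    t.row_end();
    /* Header row 2: one cell per column */
    t.row_start();
      t.format_cell(text: 'X');
      t.format_cell(text: 'Y');
    t.row_end();
  t.head_end();
  /* A single data row */
  t.row_start();
    t.format_cell(text: '1');
    t.format_cell(text: '2');
  t.row_end();
  t.table_end();
run;

ods html close;
```

Replacing ODS HTML with ODS PDF should work the same way, since the table is built by the object rather than by a procedure.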


Identifying groups/networks of customers

I am trying to create unique customer groups which are determined by customer interactivity across transactions.
Here is an example of the data:
Transaction #   Primary Customer   Cosigner   WANT: Customer Group
1               1                  2          A
2               1                  3          A
3               1                  4          A
4               1                  2          A
5               2                  5          A
6               3                  6          A
7               2                  1          A
8               3                  1          A
9               7                  8          B
10              9                             C
In this example, customer 1 is connected to customers 2-6 either directly or indirectly, so all transactions associated with customers 1-6 would be part of an "A" group. Customers 7 and 8 are directly connected and would be labeled as a "B" group. Customer 9 has no connections and is the sole member of the "C" group.
Any suggestions are appreciated!
Your data can be considered the edges of a graph, so your request is to find the connected subgraphs of that graph. That question has an answer on Stack Overflow and on SAS Communities, but this question is more on topic than the older SO question. So let's post the %SUBNET macro from the SAS Communities answer here on SO, where it will be easier to find.
This simple macro uses repeated PROC SQL queries to build the list of connected subgraphs until all of the original records have been assigned to a subgraph.
The macro is set up to let you pass in the name of the source dataset and the names of the two variables that hold the ids of the nodes.
So first let's convert your printout into an actual SAS dataset.
data have;
input id primary cosign want $;
cards;
1 1 2 A
2 1 3 A
3 1 4 A
4 1 2 A
5 2 5 A
6 3 6 A
7 2 1 A
8 3 1 A
9 7 8 B
10 9 . C
;
Now we can call the macro and tell it that PRIMARY and COSIGN are the variables with the node ids and that SUBNET is the name for the new variable to hold the ids of the connected subgraphs. NOTE: This version treats the graph as directed by default.
%subnet(in=have,out=want,from=primary,to=cosign,subnet=subnet);
Results:
Obs id primary cosign want subnet
1 1 1 2 A 1
2 2 1 3 A 1
3 3 1 4 A 1
4 4 1 2 A 1
5 5 2 5 A 1
6 6 3 6 A 1
7 7 2 1 A 1
8 8 3 1 A 1
9 9 7 8 B 2
10 10 9 . C 3
Here is the code of the %SUBNET() macro.
%macro subnet(in=,out=,from=from,to=to,subnet=subnet,directed=1);
/*----------------------------------------------------------------------
SUBNET - Build connected subnets from pairs of nodes.
Input Table :FROM TO pairs of rows
Output Table:input data with &subnet added
Work Tables:
NODES - List of all nodes in input.
NEW - List of new nodes to assign to current subnet.
Algorithm:
Pick next unassigned node and grow the subnet by adding all connected
nodes. Repeat until all unassigned nodes are put into a subnet.
To treat the graph as undirected set the DIRECTED parameter to 0.
----------------------------------------------------------------------*/
%local subnetid next getnext ;
%*----------------------------------------------------------------------
Put code to get next unassigned node into a macro variable. This query
is used in two places in the program.
-----------------------------------------------------------------------;
%let getnext= select node into :next from nodes where subnet=.;
%*----------------------------------------------------------------------
Initialize subnet id counter.
-----------------------------------------------------------------------;
%let subnetid=0;
proc sql noprint;
*----------------------------------------------------------------------;
* Get list of all nodes ;
*----------------------------------------------------------------------;
create table nodes as
select . as subnet, &from as node from &in where &from is not null
union
select . as subnet, &to as node from &in where &to is not null
;
*----------------------------------------------------------------------;
* Get next unassigned node ;
*----------------------------------------------------------------------;
&getnext;
%do %while (&sqlobs) ;
*----------------------------------------------------------------------;
* Set subnet to next id ;
*----------------------------------------------------------------------;
%let subnetid=%eval(&subnetid+1);
update nodes set subnet=&subnetid where node=&next;
%do %while (&sqlobs) ;
*----------------------------------------------------------------------;
* Get list of connected nodes for this subnet ;
*----------------------------------------------------------------------;
create table new as
select distinct a.&to as node
from &in a, nodes b, nodes c
where a.&from= b.node
and a.&to= c.node
and b.subnet = &subnetid
and c.subnet = .
;
%if "&directed" ne "1" %then %do;
insert into new
select distinct a.&from as node
from &in a, nodes b, nodes c
where a.&to= b.node
and a.&from= c.node
and b.subnet = &subnetid
and c.subnet = .
;
%end;
*----------------------------------------------------------------------;
* Update subnet for these nodes ;
*----------------------------------------------------------------------;
update nodes set subnet=&subnetid
where node in (select node from new )
;
%end;
*----------------------------------------------------------------------;
* Get next unassigned node ;
*----------------------------------------------------------------------;
&getnext;
%end;
*----------------------------------------------------------------------;
* Create output dataset by adding subnet number. ;
*----------------------------------------------------------------------;
create table &out as
select distinct a.*,b.subnet as &subnet
from &in a , nodes b
where a.&from = b.node
;
quit;
%mend subnet ;
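To illustrate what the DIRECTED parameter changes, here is a small sketch that assumes the %SUBNET macro above has already been compiled. The dataset EDGES and the variables SRC and DST are hypothetical names chosen for this example.

```sas
/* Hypothetical edge list: both edges point INTO node 1. */
data edges;
  input src dst;
  cards;
2 1
3 1
;

/* Directed (the default): links are followed only from SRC to DST. */
%subnet(in=edges, out=directed_out, from=src, to=dst, subnet=subnet);

/* Undirected: links are followed in both directions. */
%subnet(in=edges, out=undirected_out, from=src, to=dst, subnet=subnet, directed=0);
```

Because no edge leaves node 1, the directed run cannot reach both node 2 and node 3 from a single starting node, so the two rows end up in different subnets. The undirected run puts both rows into one subnet, which is probably the behaviour you want when a cosigner relationship has no inherent direction.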
You can use hashes to compute the group identities and their members.
Example:
PROC DS2 is used because it allows succinct hash declarations and clearer code. The final pair, Q H, bridges two groups that were independent up to that point, so the two groups have to be merged.
data customer;
length id1-id2 $8;
input id1-id2 @@; output;
datalines;
A B A C B A B D C A C D D C D .
E F E . F E F .
H J H K K L K M
P Q Q R R S S T
Q H
;
run;
%if %sysfunc(exist(vs)) %then %do;
proc delete data=vs;
proc delete data=gs;
%end;
options nosource;
proc ds2 ;
data _null_ ;
declare char(8) v1 v2 v;
declare double g gnew;
declare package hash vs([v], [v g], 0, '', 'ascending');
declare package hash gs([g], [g v], 0, '', 'ascending', '', '', 'multidata');
method add11(char(8) x1, char(8) x2); /* neither vertex has been seen before */
g + 1;
v = x1; vs.add(); gs.add();
v = x2; vs.add(); gs.add();
* put 'add11' x1 $char1. x2 $char1. ' ' g;
end;
method add10(char(8) x1, char(8) x2); /* x1 is not in a group, x2 is */
v = x2; vs.find(); * get group;
v = x1; vs.add(); * apply group to x1;
gs.add();
* put 'add10' x1 $char1. x2 $char1. ' ' g;
end;
method add01(char(8) x1, char(8) x2); /* x1 is in a group, x2 is not */
v = x1; vs.find(); * get group;
v = x2; vs.add(); * apply group to x2;
gs.add();
* put 'add01' x1 $char1. x2 $char1. ' ' g;
end;
method add00(char(8) x1, char(8) x2); /* both x1 and x2 are in a group */
declare double g1 g2;
v = x1; vs.find(); g1 = g; * get group of x1;
v = x2; vs.find(); g2 = g; * get group of x2;
if g1 ^= g2 then do;
* merge groups, v of higher group moved to lower group;
gnew = min(g1,g2);
g = max(g1,g2);
gs.find();
vs.replace([v], [v gnew]);
do while (gs.has_next() = 0);
gs.find_next();
vs.replace([v], [v gnew]);
end;
gs.removeall();
end;
* put 'add00' x1 $char1. x2 $char1. ' ' g g1 g2;
end;
method run();
declare int e1 e2;
declare char(2) f;
set customer;
if not missing(id1) and not missing(id2);
e1 = vs.check([id1]);
e2 = vs.check([id2]);
select (cats(e1^=0,e2^=0));
when ('11') add11(id1,id2);
when ('10') add10(id1,id2);
when ('01') add01(id1,id2);
when ('00') add00(id1,id2);
otherwise stop;
end;
end;
method term();
vs.output('vs');
gs.output('gs');
end;
run;
quit;

Aggregate multiple vars on different groupings in one Proc SQL query

I need to aggregate about ten different variables on different groupings using PROC SQL.
Is there a way to achieve SUM () OVER ( [ partition_by_clause ] order_by_clause) in one SQL query with different PARTITION BY clauses?
I've made an example here
data have;
infile cards;
input a b c d e;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
proc sql;
create table want as
select *,
sum(a) over (partition by b, c) as a1,
sum(b) over (partition by c, d) as b1,
sum(c) over (partition by d, e) as c1,
sum(d) over (partition by a, c) as d1
from have
;
quit;
I don't want to write multiple SQL queries, each grouping on different variables and calculating one variable per step.
Hope that makes sense.
PROC SQL does not implement window functions, so the OVER (PARTITION BY ...) syntax found in other SQL implementations is not available in it. You can use PARTITION BY only through pass-through SQL to a connection that supports such syntax.
You could perform such a computation in DATA step using hashes.
data have;
infile cards;
input a b c d e ;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
data want;
if 0 then set have;
length a1 b1 c1 d1 8;
declare hash a1s();
a1s.defineKey('b', 'c');
a1s.defineData('a1');
a1s.defineDone();
declare hash b1s();
b1s.defineKey('c', 'd');
b1s.defineData('b1');
b1s.defineDone();
declare hash c1s();
c1s.defineKey('d', 'e');
c1s.defineData('c1');
c1s.defineDone();
declare hash d1s();
d1s.defineKey('a', 'c');
d1s.defineData('d1');
d1s.defineDone();
do while (not end);
set have end=end;
if a1s.find() = 0 then a1+a; else a1=a; a1s.replace();
if b1s.find() = 0 then b1+b; else b1=b; b1s.replace();
if c1s.find() = 0 then c1+c; else c1=c; c1s.replace();
if d1s.find() = 0 then d1+d; else d1=d; d1s.replace();
end;
do while (not last);
set have end=last;
a1s.find();
b1s.find();
c1s.find();
d1s.find();
output;
end;
format _numeric_ 4.;
stop;
run;
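If you would rather stay in a single PROC SQL query, one possible sketch is to use a correlated scalar subquery per aggregate. This is a different technique from the hash approach above, not a window function, and the output name WANT_SQL is just a placeholder. It can be slow on large tables because each subquery is evaluated against the whole table.

```sas
data have;
  input a b c d e;
  cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;

proc sql;
  create table want_sql as
  select h.*,
         /* each subquery sums one variable over the rows sharing the grouping keys */
         (select sum(x.a) from have as x where x.b = h.b and x.c = h.c) as a1,
         (select sum(x.b) from have as x where x.c = h.c and x.d = h.d) as b1,
         (select sum(x.c) from have as x where x.d = h.d and x.e = h.e) as c1,
         (select sum(x.d) from have as x where x.a = h.a and x.c = h.c) as d1
  from have as h;
quit;

/* For the four rows above: a1 = 1 2 1 3, b1 = 6 6 6 6, c1 = 3 4 3 4, d1 = 8 5 8 5 */
```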

product of common variables in two datasets

data a1
a b c
2 3 4
1 2 3
data a2
a b d
0 .3 1
0 .2 0
proc sql;
create table a3 as
select a.*, a.a * b.a + a.b * b.b as Value
from a1 a, a2 b;
There are many common columns in a1 and a2 (numeric columns with different values). I want to calculate Value as the 'sumproduct' of those common columns.
I want to avoid writing out something like a.common1 * b.common1 + a.common2 * b.common2 + ...
A few steps of preprocessing are needed as far as I can tell....
Load your data:
data a1 ;
input a b c ;
cards ;
2 3 4
1 2 3
;run ;
data a2 ;
input a b d ;
cards ;
0 0.3 1
0 0.2 0
;run ;
Pull all variable names in A1 and A2 datasets (update your libname if required):
proc sql ;
create table data1 as
select libname, memname, name, label
from sashelp.vcolumn
where libname= 'WORK' and memname in ('A1','A2')
order by name
;quit ;
Keep only variables which are common to both datasets:
data data2 ;
set data1 ;
by name ;
if last.name and not first.name ;
run ;
Put both a list and a count of the common variables into macro variables:
proc sql ;
select name
into :commvarnames separated by ' '
from data2
;
select count(name)
into :commoncount
from data2
;quit ;
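Read in your source datasets: load the first, copy its values to a temporary array (so that they are not overwritten when the second dataset is read), then load the second dataset and do the calculation in a DO loop. Note that this pairs the rows of A1 and A2 by position, whereas the query in the question forms every combination of rows: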
data output ;
set a1(keep=&commvarnames) ;
array one(&commoncount) _temporary_ ;
array two(&commoncount) &commvarnames ;
* Load A1 to temporary array ;
do i=1 to &commoncount ;
one(i)=two(i) ;
end ;
* Load A2 to variables ;
set a2(keep=&commvarnames) ;
do i=1 to &commoncount ;
product=sum(product,one(i)*two(i)) ;
end ;
run ;
It would take quite a bit of code to make this dynamic. I'd break it down like so:
1. Get lists of the variables present in each dataset
2. Merge the lists to get a list of the common variables
3. Feed this into some array logic in a data step
Will post some code later, but hopefully that's enough to give you some ideas.
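One way to sketch these steps while keeping the cross join from the question is to let PROC SQL build the expression text from DICTIONARY.COLUMNS and store it in a macro variable, instead of using array logic. The macro variable name SUMPROD is an assumption made for this example, and the sketch assumes the two datasets share at least one numeric column.

```sas
data a1;
  input a b c;
  cards;
2 3 4
1 2 3
;
run;

data a2;
  input a b d;
  cards;
0 0.3 1
0 0.2 0
;
run;

proc sql noprint;
  /* Build "a.a*b.a + a.b*b.b" from the numeric columns found in both tables */
  select cats('a.', x.name, '*b.', x.name)
    into :sumprod separated by ' + '
    from dictionary.columns as x, dictionary.columns as y
    where x.libname = 'WORK' and x.memname = 'A1'
      and y.libname = 'WORK' and y.memname = 'A2'
      and upcase(x.name) = upcase(y.name)
      and x.type = 'num' and y.type = 'num';

  /* Same cross join as in the question, with the generated expression */
  create table a3 as
  select a.*, &sumprod as Value
    from a1 as a, a2 as b;
quit;
```

Adding `%put &=sumprod;` after the step shows the generated expression in the log, which is a quick way to check which columns were treated as common.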

aggregate by column value and paste row values together in SAS

I have a data set that looks like:
Have:
data have;
input a b c d e f g h ;
datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;
Columns a, b, c and d are the four options for question 1 on a 4-point scale. The value "1" in column a of observation 1 means that the respondent chose option a for that question, which corresponds to 4 on the 4-point scale.
a = 4, b = 3, c = 2 and d = 1.
The next question's options are e, f, g and h. The respondent has chosen option g which is 2 on the 4 point scale. e = 4, f = 3, g = 2 and h = 1.
The data set contains hundreds of columns like this. My idea is to collapse 4 columns into one getting values like : "1000", "0100", "0010", "0001" and then converting 1000 = 4, 0100 = 3, 0010 = 2 and 0001 = 1.
I want it to be like :
block col1 col2 col3 col4
1 1000 0100 0010 0001
2 0100 0010 1000 0001
3 1000 0100 1000 0010
I've gotten this far:
proc transpose data = have out = have_t;
run;
data have_t_block;
set have_t;
retain block;
if _n_ = 1 then block = 1;
if mod(_n_/4,1) = 0.25 and _n_ gt 1 then block +1;
run;
Is there a way to concatenate the row values while aggregating by block in SAS? I do this in R, like this:
#Create data
data <- data.frame(a = c(1, 0, 0), b = c(0, 1, 0), c = c(0, 0, 1), d = c(0, 0, 0), e = c(0, 1, 0), f = c(1, 0, 0), g = c(0, 0, 1), h = c(0, 0, 0), i = c(0, 0, 1), j = c(1, 0, 0), k = c(0, 0, 0), l = c(0, 1, 0))
#transpose
data <- data.frame(t(data))
#create a key for each group of 4
data$block <- rep(1:(nrow(data)/4), each = 4)
#convert data to long format and group by key (block) and use paste to concatenate
require(reshape2)
data_melt <- melt(data, id = c("block"))
trial <- data.frame(t(dcast(data_melt, block ~ variable, paste, collapse = "")))
First off, unless you misexplained your data, your transpose didn't help very much here, as there's no particular reason to have one column for each respondent. Let's just have one column, period. Here's a better way to do this.
data have_t;
set have;
array cols a--h;
do _i = 1 to dim(cols);
value = cols[_i];
output;
end;
keep value; *and an ID I hope?;
run;
Making a dataset 'vertical' (one column) is very easy. Just loop over an array of all of your columns, set a common variable to each value in turn, and output. Normally I'd also keep track of the name of the variable being output, but perhaps that's not necessary here.
For your main problem, what you'll want to do is use retain, most likely, not dissimilar to how you handle block. Here I just calculate score directly:
data want;
set have_t;
retain score;
counter = mod(_n_-1,4)+1; *position within the block of four: 1, 2, 3, 4;
if counter=1 then block+1; *slightly easier version of what you wrote;
if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
if counter=4 then output;
*We never "clear" score here - to be safer you may want to do that in the if counter=1 block;
run;
If you want the intermediate '0010' or whatever, you can include that as well.
data want;
set have_t;
retain score int_Value;
length int_Value $4;
counter = mod(_n_-1,4)+1; *position within the block of four: 1, 2, 3, 4;
if counter=1 then block+1; *slightly easier version of what you wrote;
if value=1 then score = 5-counter; *first=4, second=3, third=2, fourth=1;
int_value = cats(int_value,value);
if counter=4 then do;
output;
int_value=' '; *have to clear this every 4;
score=.; *here we might as well clear it;
end;
run;
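The same scoring can also be sketched without reshaping the data at all, by looping over the option columns in each row and writing one score per question. This keeps one row per respondent. The output names WANT_SCORES, Q1 and Q2 are hypothetical, and the array size of 2 is hard-coded for the eight example columns (number of columns divided by four).

```sas
data have;
  input a b c d e f g h;
  datalines;
1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0
0 0 0 1 0 1 0 0
0 1 0 0 0 0 0 1
;
run;

data want_scores;
  set have;
  array opts{*} a--h;   /* all option columns, in question order */
  array q{2} q1-q2;     /* one score per block of four options */
  do _i = 1 to dim(opts);
    /* position 1..4 within the block maps to score 4..1 */
    if opts{_i} = 1 then q{ceil(_i / 4)} = 5 - (mod(_i - 1, 4) + 1);
  end;
  keep q1-q2;
run;

/* For the four rows above: (q1, q2) = (4, 2), (2, 4), (1, 3), (3, 1) */
```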
If I understood your question correctly, try this:
data want;
do i=1 by 1 until(last.block);
set have_t_block;
array var $4. var1-var4;
array col col1-col4;
length var1-var4 $4.;
by block notsorted;
do over var;
var=cats(var,col);
end;
if last.block then output;
end;
keep var: block;
run;

SAS/IML: creating a dataset from multiple matrices

Let's say I have a number of matrices in IML. They can be either numeric or character. How would I go about creating a single SAS dataset out of them?
I tried something like
n = {1 2 3, 4 5 6}; /* 2 x 3 numeric */
c = {'a' 'b', 'c' 'd'}; /* 2 x 2 character */
dsvars = {n c};
create dat var dsvars; /* should be a 2-obs, 5-variable dataset */
append;
but this turns n and c into column vectors and exports those, which is not what I want. Should I export n and c separately and merge them in a DATA step instead?
Your approach works when n and c are vectors. When they are matrices, there are a couple of ways to do this. I like to use the CREATE FROM and APPEND FROM syntax, and write the numerical and character matrices to separate data sets that I later merge:
proc iml;
n = {1 2 3, 4 5 6}; /* 2 x 3 numeric */
c = {'a' 'b', 'c' 'd'}; /* 2 x 2 character */
nNames = "n1":"n3";
cNames = "c1":"c2";
create ndat from n[colname=nNames];
append from n;
create cdat from c[colname=cNames];
append from c;
quit;
data dat;
merge ndat cdat;
run;
proc print;run;