This is my data:
ID  DATA1  DATA2  DATA3  DATA4
1   yes    yes           yes
2          yes    yes
3   yes                  yes
What I would like to get:
DATA1
- ID1
- ID3
DATA2
- ID1
- ID2
DATA3
- ID2
DATA4
- ID1
- ID3
Does anyone know what query or formula I can use for this?
Try:
=ARRAYFORMULA(TRANSPOSE(SPLIT(TEXTJOIN("♦", 1, TRANSPOSE({B1:E1;
IF(B2:E="yes", "- "&A1&A2:A, )})), "♦")))
I have row-level data at the account level and want to summarize it by account owner as a new data set. Within each owner, Yes takes priority: if any account has Yes for a flag, the result should be Yes.
Account_Owner Account_No Ever_Purchase Ever_Purchase_within_2days Ever_Deliver_in_2weeks
Tom 12345 Yes Yes No
Tom 34567 Yes No Yes
Tom 09876 No No No
Desired Outcome
Account_Owner Ever_Purchase Ever_Purchase_within_2days Ever_Deliver_in_2weeks
Tom Yes Yes Yes
I am sorry that I don't have any code because I don't know where to start.
You can use a DOW loop to track the group result for each ever_* variable in a temporary array.
proc format;
  value yesno .,0 = 'No' other='Yes';
run;
data have;
  * Account_No is read as character so the leading zero in 09876 is kept;
  input Account_Owner $ Account_No $ Ever_Purchase $ Ever_Purchase_within_2days $ Ever_Deliver_in_2weeks $;
datalines;
Tom 12345 Yes Yes No
Tom 34567 Yes No Yes
Tom 09876 No No No
;
data want;
  array evals(100) _temporary_; * presume never more than 100 flag variables;
  call missing (of evals(*));
  * dow loop;
  do until (last.account_owner);
    set have;
    by account_owner;
    array flags ever:;
    do _n_ = 1 to dim(flags);
      evals(_n_) = evals(_n_) or flags(_n_) = 'Yes'; * compute aggregate result;
    end;
  end;
  * move results back into original variables;
  do _n_ = 1 to dim(flags);
    flags(_n_) = put(evals(_n_), yesno.);
  end;
  * implicit output, one row per group combination;
run;
Note: As an alternative, you can convert Yes/No to numeric 1/0 and use PROC SUMMARY or PROC MEANS to compute the group result (the max of each variable is 1 if any row is Yes and 0 if all rows are No).
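The alternative in the note could look roughly like the sketch below. It assumes the HAVE data set created earlier in this answer. The n_* variable names and the HAVE_NUM, WANT_NUM and WANT2 data set names are made up for illustration and are not part of the original answer.

```sas
/* Step 1: convert the character Yes/No flags to numeric 1/0.
   A comparison in SAS evaluates to 1 (true) or 0 (false). */
data have_num;
  set have;
  n_purchase       = (Ever_Purchase = 'Yes');
  n_purchase_2days = (Ever_Purchase_within_2days = 'Yes');
  n_deliver_2weeks = (Ever_Deliver_in_2weeks = 'Yes');
  keep Account_Owner n_:;
run;

/* Step 2: one row per owner; MAX is 1 if any row is Yes, 0 if all are No. */
proc summary data=have_num nway;
  class Account_Owner;
  var n_:;
  output out=want_num(drop=_type_ _freq_) max=;
run;

/* Step 3: map the 1/0 results back to Yes/No under the original names. */
data want2;
  set want_num;
  length Ever_Purchase Ever_Purchase_within_2days Ever_Deliver_in_2weeks $3;
  Ever_Purchase              = ifc(n_purchase,       'Yes', 'No');
  Ever_Purchase_within_2days = ifc(n_purchase_2days, 'Yes', 'No');
  Ever_Deliver_in_2weeks     = ifc(n_deliver_2weeks, 'Yes', 'No');
  drop n_:;
run;
```

Unlike the DOW loop, this version names each flag variable explicitly, so it needs editing when flags are added. PROC SUMMARY with a CLASS statement does not require the input to be sorted by Account_Owner.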
Difficult to put this in a question, but this is what I mean:
This is my data:
ID  DATA1  DATA2  DATA3  DATA4
1   yes    yes           yes
2          yes    yes
3   yes                  yes
What I would like to get:
ID1
- DATA1
- DATA2
- DATA4
ID2
- DATA2
- DATA3
ID3
- DATA1
- DATA4
Does anyone know what query or formula I can use for this?
Try:
=ARRAYFORMULA(TRANSPOSE(SPLIT(TEXTJOIN("♦"; 1; {
IF(A2:A<>""; A1&A2:A; )\ IF(B2:E="yes"; "- "&B1:E1; )}); "♦")))
I have a table1 that contains four different kinds of IDs:
Data table1;
Input id1 $ id2 $ id3 $ final_id $;
Datalines;
1 a a1 p
2 b b2 q
- c c2 r
3 d - s
4 - d4 t
;
run;
A second table, table2, contains any of the IDs from id1, id2, or id3 of table1:
Data table2;
Input id $ col1 $ col2 $;
Datalines;
1 gsh ywu
b hsjs kall
c2 jsjs ywe
3 sja weei
d4 ase uwh
;
run;
I want to left join table1 on table2 such that I get a new column in table2 giving me final_id from table1.
How do I go about this problem?
Please help.
Thank you.
You can do it using SQL:
proc SQL noprint;
create table merged as
select b.final_id, a.*
from table2 as a left join table1 as b
on (a.id eq b.id1 or a.id eq b.id2 or a.id eq b.id3)
;
quit;
Using SAS, I am trying to transpose the data in a table so that each unique value for variables Class and Subclass become a dummy variable, by variable ID.
Have:
ID Class Subclass
-------------------------------
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
Want:
ID Class_1 Class_2 Class_3 Subclass_1a ... Subclass_3b
----------------------------------------------------...---------------
ID1 1 1 0 1 ... 0
ID2 1 1 1 1 ... 0
ID3 1 1 1 1 ... 0
I have tried transposing the data by variable ID with Class and Subclass in the ID-statement of the transpose procedure. This however produces variables consisting of concatenations of unique combinations of the values of Class and Subclass. Neither does that approach produce 0 and 1 values where no VAR is defined in the transpose procedure.
Do I need to create the actual dummy variables first before transposing the data to achieve the want table, or is there a more straightforward way?
It seems you need PROC TRANSREG to generate a design matrix of dummy variables, which PROC SUMMARY then reduces to one row per ID.
data id;
infile datalines firstobs=3;
input ID :$3. class subclass :$2.;
datalines;
ID Class Subclass
-------------------------------
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
;;;;
run;
proc print;
run;
proc transreg;
id id;
model class(class subclass / zero=none);
output design out=dummy(drop=class subclass);
run;
proc print;
run;
proc summary nway;
class id;
output out=want(drop=_type_) max(class: subclass:)=;
run;
proc print;
run;
You can also select the distinct values, use PROC TRANSPOSE for each variable, and merge the results back together.
data have;
input ID $ Class $ Subclass $ ;
datalines;
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
;
proc sql;
create table want1 as
select distinct id, class from have;
proc transpose data = want1 out=want1a(drop =_name_) prefix = class_;
by id;
id class;
var class;
run;
proc sql;
create table want2 as
select distinct id, subclass from have;
proc transpose data = want2 out=want2a(drop =_name_) prefix = Subclass_;
by id;
id subclass;
var Subclass;
run;
data want;
merge want1a want2a;
by id;
array class(*) class_: subclass_:;
do i = 1 to dim(class);
if missing(class(i)) then class(i)= "0";
else class(i) ="1";
end;
drop i;
run;
Here is some tricky code generation that uses a hash to map a value to an array index. Each index corresponds to a flag variable that records whether <name>_<value> occurs in the group. Note that flags for values that do not occur are left missing rather than set to 0.
data have;
input ID $ Class Subclass $; datalines;
ID1 1 1a
ID1 1 1b
ID1 1 1c
ID1 2 2a
ID2 1 1a
ID2 1 1b
ID2 2 2a
ID2 2 2b
ID2 3 3a
ID3 1 1a
ID3 1 1d
ID3 2 2a
ID3 3 3a
ID3 3 3b
run;
* create indexed name_value data for variable name construction and hash initialization;
proc sql ; * fresh proc to reset within proc monotonic tracker;
create table map1 as
select class, monotonic() as index
from (select distinct class from have);
proc sql noprint;
create table map2 as
select subclass, monotonic() as index
from (select distinct subclass from have);
* populate macro variable with pdv target variable names to be arrayed;
proc sql noprint;
select catx('_','class',class)
into :map1vars separated by ' '
from map1 order by index;
select catx('_','subclass',subclass)
into :map2vars separated by ' '
from map2 order by index;
* group wise flag <variable>_<value> combinations;
data want;
if _n_ = 1 then do;
if 0 then set map1 map2; * prep pdv with hash variables;
declare hash map1(dataset:'map1');
declare hash map2(dataset:'map2');
map1.defineKey('class');
map1.defineData('index');
map1.defineDone();
map2.defineKey('subclass');
map2.defineData('index');
map2.defineDone();
end;
* group wise flag pivot vars (existential extrusion);
do until (last.id);
set have;
by id;
array map1_ &map1vars; * array for <name>_<value> combinations;
array map2_ &map2vars;
* use hash lookup on value to find index into target array;
map1.find(); put index=; map1_[index] = 1;
map2.find(); put index=; map2_[index] = 1;
end;
keep id &map1vars &map2vars;
run;
Proc REPORT can show values across with counts of occurrence within the group.
proc report data=have;
define id / group;
define class / across;
define subclass / across;
run;
I have two datasets. Both have a common column- ID. I would like to check if ID from df1 lies in df2 and extract all such rows from df1. I'm doing this in SAS.
This is easily done in one SQL query.
proc sql;
create table extract_from_df1 as
select
*
from
df1
where
id in (select id from df2)
;
quit;
There are lots of ways to do this. For example:
proc sql;
create table compare as select distinct
a.id as id1, b.id as id2
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
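and then keep the rows where id2 is not missing (here table1 and table2 stand for df1 and df2). Or you can delete the non-matching rows from table1 in place:
proc sql;
delete from table1 where id not in (select distinct id from table2);
quit;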
data df1;
input id name $;
cards;
1 abc
2 cde
3 fgh
4 ijk
;
run;
data df2;
input id address $;
cards;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;
* match-merge on id (both data sets must be sorted by id) and keep ids found in both;
data c;
merge df1(in=x) df2(in=y);
by id;
if x and y;
keep id name;
run;
proc print data=c;
run;