Please help friends
data have;
input v_202002 $1. v_202003 $1. v_202001 $1.;
datalines;
a . b
. . b
a b b
. b a
b b a
;
What I am looking for - First time the value became 'b'
want dataset:
v_202002 v_202003 v_202001 output
a . b 202001
. . b 202001
a b b 202001
. b a 202003
b b a 202002
You can use the WHICHC() function to find the index into an array where the value appears. Then use the VNAME() function to get the name.
data want;
set have;
array vlist v: ;
index=whichc('b',of vlist[*]);
if index then output = substr(vname(vlist[index]),3);
run;
Results
Obs v_202002 v_202003 v_202001 index output
1 a b 3 202001
2 b 3 202001
3 a b b 2 202003
4 b a 2 202003
5 b b a 1 202002
Related
I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
input ID date $10. type typea $10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date $10. type typea $10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried with something like that based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use UPDATE statement to help you carry-forward the DATE values for a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;
What I have
A dataset with 8 row x4 col
"Condition" "A_1" "B_1"
A 1 .
A 3 .
A 2 .
A 4 .
B . 4
B . 3
B . 5
B . 6
[
What I want is either:
What I want 1
(1)
"Condition" "A_1" "B_1"
A 1 .
A 3 .
A 2 .
A 4 .
B 4 .
B 3 .
B 5 .
B 6 .
OR, (2):
What I want 2
"Condition" "A_1" "B_1" "AB_1"
A 1 . 1
A 3 . 3
A 2 . 2
A 4 . 4
B . 4 4
B . 3 3
B . 5 5
B . 6 6
It was easy with STATA, R, and Excel (of course), but for the life of me I can't figure out this simple thing in SAS.
I tried,
data want;
if condition = "B" then A_1 = B_1;
set have;
run;
I also tried
data want;
if condition = "A" then AB_1 = A_1;
else AB_1 = B_1;
set have;
run;
The second code almost does the job except that the resulting AB_1 lags by 1 row.
What the hack...
Use coalesce. You also need your set statement before doing any of your logic. SAS reads a row when it encounters the set statement.
data want;
set have;
AB_1 = coalesce(A_1, B_1);
run;
for (2) you can try to use cats() function
data want;
set have;
AB_1 = cats(A_1,B_1);
run;
But in order to do the concatenate columns of different types you should also use the explicit PUT() function.
I have 2 sas output tables. First table has a,b,c columns and second table has d,e,f columns
First table is :
a b c
1 2 3
4 5 6
Second table is :
d e f
7 8 9
Is it possible to append them in one sheet with desired output
a b c
1 2 3
4 5 6
d e f
7 8 9
Yes, you can append the second table to the first using proc append , you just need to rename the columns in second table before appending.
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Full Code:
data table1;
input a b c;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d e f;
datalines;
7 8 9
;
run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Table1 will look like this after appending:
a=1 b=2 c=3
a=4 b=5 c=6
a=7 b=8 c=9
Another Option: If all your data is characters, you just need a create a third table to hold the column names you want to append.
Full Code:
data table1;
input a $ b $ c $;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d $ e $ f $;
datalines;
7 8 9
;
run;
data table2_names;
input d $ e $ f $;
datalines;
d e f
;
run;
proc append base=table1 data=table2_names(rename=(d=a e=b f=c)) ;run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Output:
a=1 b=2 c=3
a=4 b=5 c=6
a=d b=e c=f
a=7 b=8 c=9
I have observations with column ID, a, b, c, and d. I want to count the number of unique values in columns a, b, c, and d. So:
I want:
I can't figure out how to count distinct within each row, I can do it among multiple rows but within the row by the columns, I don't know.
Any help would be appreciated. Thank you
********************************************UPDATE*******************************************************
Thank you to everyone that has replied!!
I used a different method (that is less efficient) that I felt I understood more. I am still going to look into the ways listed below however to learn the correct method. Here is what I did in case anyone was wondering:
I created four tables where in each table I created a variable named for example ‘abcd’ and placed a variable under that name.
So it was something like this:
PROC SQL;
CREATE TABLE table1_a AS
SELECT
*
a as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table2_b AS
SELECT
*
b as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table3_c AS
SELECT
*
c as abcd
FROM table_I_have_with_all_columns
;
QUIT;
PROC SQL;
CREATE TABLE table4_d AS
SELECT
*
d as abcd
FROM table_I_have_with_all_columns
;
QUIT;
Then I stacked them (this means I have duplicate rows but that ok because I just want all of the variables in 1 column and I can do distinct count.
data ALL_STACK;
set
table1_a
table1_b
table1_c
table1_d
;
run;
Then I counted all unique values in ‘abcd’ grouped by ID
PROC SQL ;
CREATE TABLE count_unique AS
SELECT
My_id,
COUNT(DISTINCT abcd) as Count_customers
FROM ALL_STACK
GROUP BY my_id
;
RUN;
Obviously, it’s not efficient to replicate a table 4 times just to put a variables under the same name and then stack them. But my tables were somewhat small enough that I could do it and then immediately delete them after the stack. If you have a very large dataset this method would most certainly be troublesome. I used this method over the others because I was trying to use Procs more than loops, etc.
A linear search for duplicates in an array is O(n2) and perfectly fine for small n. The n for a b c d is four.
The search evaluates every pair in the array and has a flow very similar to a bubble sort.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
run;
The linear search for duplicates will occur on every row, and the count_distinct will be initialized automatically in each row to a missing (.) value. The sum function is used to increment the count when a non-missing value is not found in any prior array indices.
* linear search O(N**2);
data want;
set have;
array x a b c d;
do i = 1 to dim(x) while (missing(x(i)));
end;
if i <= dim(x) then count_distinct = 1;
do j = i+1 to dim(x);
if missing(x(j)) then continue;
do k = i to j-1 ;
if x(k) = x(j) then leave;
end;
if k = j then count_distinct = sum(count_distinct,1);
end;
drop i j k;
run;
Try to transpose dataset, each ID becomes one column, frequency each ID column by option nlevels, which count frequency of value, then merge back with original dataset.
Proc transpose data=have prefix=ID out=temp;
id ID;
run;
Proc freq data=temp nlevels;
table ID:;
ods output nlevels=count(keep=TableVar NNonMisslevels);
run;
data count;
set count;
ID=compress(TableVar,,'kd');
drop TableVar;
run;
data want;
merge have count;
by id;
run;
one more way using sortn and using conditions.
data have;
input id a b c d; datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
55 . . . .
66 1 2 3 4
77 . 3 . 4
88 . 9 5 .
99 . . 2 2
76 . . . 2
58 1 1 . .
50 2 . 2 .
66 2 . 7 .
89 1 1 1 .
75 1 2 3 .
76 . 5 6 7
88 . 1 1 1
43 1 . . 1
31 1 . . 2
;
data want;
set have;
_a=a; _b=b; _c=c; _d=d;
array hello(*) _a _b _c _d;
call sortn(of hello(*));
if a=. and b = . and c= . and d =. then count=0;
else count=1;
do i = 1 to dim(hello)-1;
if hello(i) = . then count+ 0;
else if hello(i)-hello(i+1) = . then count+0;
else if hello(i)-hello(i+1) = 0 then count+ 0;
else if hello(i)-hello(i+1) ne 0 then count+ 1;
end;
drop i _:;
run;
You could just put the unique values into a temporary array. Let's convert your photograph into data.
data have;
input id a b c d;
datalines;
11 2 3 4 4
22 1 8 1 1
33 6 . 1 2
44 . 1 1 .
;
So make an array of the input variables and another temporary array to hold the unique values. Then loop over the input variables and save the unique values. Finally count how many unique values there are.
data want ;
set have ;
array unique (4) _temporary_;
array values a b c d ;
call missing(of unique(*));
do _n_=1 to dim(values);
if not missing(values(_n_)) then
if not whichn(values(_n_),of unique(*)) then
unique(_n_)=values(_n_)
;
end;
count=n(of unique(*));
run;
Output:
Obs id a b c d count
1 11 2 3 4 4 3
2 22 1 8 1 1 2
3 33 6 . 1 2 3
4 44 . 1 1 . 1
I am attempting to group by a variable that is not unique with a discrete variable to get the unique combinations per non-unique variable. For example:
A B
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
I want:
A Unique_combos
1 a, b
2 a
3 a
4 b, d
5 e
My current attempt is something along the lines of:
proc sql outobs=50;
title 'Unique Combinations of b per a';
select a, b
from mylib.mydata
group by distinct a;
run;
If you are happy to use a data step instead of proc sql you can use the retain keyword combined with first/last processing:
Example data:
data have;
attrib b length=$1 format=$1. informat=$1.;
input a
b $
;
datalines;
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
;
run;
Eliminate duplicates and make sure the data is sorted for first/last processing:
proc sql noprint;
create table tmp as select distinct a,b from have order by a,b;
quit;
Iterate over the distinct list and concatenate the values of b together:
data want;
length combinations $200; * ADJUST TO BE BIG ENOUGH TO STORE ALL THE COMBINATIONS;
set tmp;
by a;
retain combinations '';
if first.a then do;
combinations = '';
end;
combinations = catx(', ',combinations, b);
if last.a then do;
output;
end;
drop b;
run;
Result:
combinations a
a, b 1
a 2
a 3
b, d 4
c, e 5
You just need to put a distinct keyword in the select clause, eg:
title 'Unique Combinations of b per a';
proc sql outobs=50;
select distinct a, b
from mylib.mydata;
The run statement is unnecessary, the sql procedure is normally ended with a quit - although I personally never use it, as the statement will execute upon hitting the semicolon and the procedure quits anyway upon hitting the next step boundary.