SAS code - sum of last N rows for every row

I have a dataset like this for each ID:
Months ID Number
2018-07-01 1 0
2018-08-01 1 0
2018-09-01 1 1
2018-10-01 1 3
2018-11-01 1 1
2018-12-01 1 2
2019-01-01 1 0
2019-02-01 1 0
2019-03-01 1 1
2019-04-01 1 0
2019-05-01 1 0
2019-06-01 1 0
2019-07-01 1 1
2019-08-01 1 0
2019-09-01 1 0
2019-10-01 1 2
2019-11-01 1 0
2019-12-01 1 0
2020-01-01 1 0
2020-02-01 1 0
2020-03-01 1 0
2020-04-01 1 0
2020-05-01 1 0
2020-06-01 1 0
2020-07-01 1 0
2020-08-01 1 1
2020-09-01 1 0
2020-10-01 1 0
2020-11-01 1 1
2020-12-01 1 0
2021-01-01 1 0
2021-02-01 1 1
2021-03-01 1 1
2021-04-01 1 0
2018-07-01 2 0
.......
(Similar values for each ID)
I want a dataset like this:
Months ID Number Sum_Next_6Number
2018-07-01 1 0 7
2018-08-01 1 0 7
2018-09-01 1 1 7
2018-10-01 1 3 4
2018-11-01 1 1 3
2018-12-01 1 2 1
2019-01-01 1 0 2
2019-02-01 1 0 2
2019-03-01 1 1 1
2019-04-01 1 0 3
2019-05-01 1 0 3
2019-06-01 1 0 3
2019-07-01 1 1 2
2019-08-01 1 0 2
2019-09-01 1 0 2
2019-10-01 1 2 0
2019-11-01 1 0 0
2019-12-01 1 0 0
2020-01-01 1 0 0
2020-02-01 1 0 1
2020-03-01 1 0 1
2020-04-01 1 0 1
2020-05-01 1 0 2
2020-06-01 1 0 2
2020-07-01 1 0 2
2020-08-01 1 1 2
2020-09-01 1 0 3
2020-10-01 1 0 3
2020-11-01 1 1 NaN
2020-12-01 1 0 NaN
2021-01-01 1 0 NaN
2021-02-01 1 1 NaN
2021-03-01 1 1 NaN
2021-04-01 1 0 NaN
2018-07-01 2 0 0
.......
If fewer than 6 months remain for an ID, the value should be NaN.
Is there a way to do this? Thank you in advance.

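data want(drop = i n);
    /* CUROBS= holds the current observation number, NOBS= the total row count */
    set have curobs = c nobs = nobs;
    Sum_Next_6Numbers = 0;
    /* Read the next 6 rows by direct access and add up Number */
    do p = c + 1 to c + 6;
        /* Fewer than 6 rows left in the data set: result is missing */
        if p > nobs then do;
            Sum_Next_6Numbers = .;
            leave;
        end;
        set have(keep = Number ID rename = (Number = n ID = i)) point = p;
        /* The look-ahead row belongs to another ID: result is missing */
        if ID ne i then do;
            Sum_Next_6Numbers = .;
            leave;
        end;
        Sum_Next_6Numbers + n;
    end;
run;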
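For readers who prefer to check the logic outside SAS, here is a minimal pandas sketch of the same look-ahead sum. It is not part of the original answer; the function name `sum_next_n` and its parameters are made up for illustration, and it assumes the rows are already sorted by ID and month.

```python
import pandas as pd

def sum_next_n(df, n=6, id_col="ID", value_col="Number"):
    """Sum value_col over the next n rows within each id_col group.

    Rows with fewer than n following rows in their group get NaN.
    Assumes df is already sorted by id_col and month (hypothetical helper).
    """
    grouped = df.groupby(id_col)[value_col]
    # shift(-k) pulls the value k rows ahead within the same group (NaN past the end),
    # so adding the n shifted series gives NaN whenever any look-ahead row is missing
    return sum(grouped.shift(-k) for k in range(1, n + 1))

# First 12 months of ID 1 from the question
numbers = [0, 0, 1, 3, 1, 2, 0, 0, 1, 0, 0, 0]
have = pd.DataFrame({
    "Months": pd.date_range("2018-07-01", periods=len(numbers), freq="MS"),
    "ID": 1,
    "Number": numbers,
})
have["Sum_Next_6Number"] = sum_next_n(have)
print(have)
```

Because the shifts are done per group, a look-ahead never crosses into the next ID, which mirrors the `ID ne i` check in the data step.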

Related

Count the number of unique ids for every subset of variables

I want to find the number of unique ids for every subset combination of the variables. For example:
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
run;
I want the result to be
var1 var2 var3 count
. . 0 5
. . 1 5
. 0 . 7
. 0 0 5
. 0 1 3
. 1 . 2
. 1 1 2
0 . . 3
0 . 0 2
0 . 1 1
0 0 . 3
0 0 0 2
0 0 1 1
1 . . 7
1 . 0 4
1 . 1 4
1 0 . 5
1 0 0 4
1 0 1 2
1 1 . 2
1 1 1 2
which is the result of appending all the possible proc sql; group bys (var1 is shown below)
proc sql;
create table sub1 as
select var1, count(distinct id) as count
from have
where not missing(var1)
group by var1
;
quit;
I don't care about the case where all variables are missing or when any of the variables in the group by are missing. Is there a more efficient way of doing this?

You can use Proc SUMMARY to compute the combinations of var1-var3 values within each id BY group. A SQL query over the SUMMARY output can then count the distinct ids per combination.
Example:
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
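/* For each id, list every combination of var1-var3 values it has;
   MISSING keeps rows with a missing class value (id 12) */
proc summary noprint missing data=have;
    by id;
    class var1-var3;
    output out=combos;
run;
/* Count distinct ids per combination; "." marks a variable left out of the subset */
proc sql;
    create table want as
    select var1, var2, var3, count(distinct id) as count
    from combos
    group by var1, var2, var3
    ;
quit;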
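As a cross-check of the expected counts, the "append every group by" idea from the question can be sketched in pandas. This is an illustration under assumptions, not the answer's method: the helper name `count_ids_by_subset` is hypothetical, and rows with a missing value in any grouped variable are dropped for that subset, as the question requests.

```python
from itertools import combinations

import pandas as pd

def count_ids_by_subset(df, id_col, var_cols):
    """Count distinct ids for every non-empty subset of var_cols (hypothetical helper)."""
    pieces = []
    for size in range(1, len(var_cols) + 1):
        for subset in combinations(var_cols, size):
            subset = list(subset)
            counts = (
                df.dropna(subset=subset)      # ignore rows missing a grouped variable
                  .groupby(subset)[id_col]
                  .nunique()                  # distinct ids per value combination
                  .reset_index(name="count")
            )
            pieces.append(counts)
    # Variables that are not part of a subset show up as NaN after concatenation
    return pd.concat(pieces, ignore_index=True)[list(var_cols) + ["count"]]

nan = float("nan")
rows = [
    (5, 1, 0, 0), (5, 1, 1, 1), (5, 1, 0, 1), (5, 0, 0, 0),
    (6, 1, 0, 0), (7, 1, 1, 1), (8, 1, 0, 1), (9, 0, 0, 0),
    (10, 1, 0, 0), (11, 1, 0, 0), (12, 1, nan, 1), (13, 0, 0, 1),
]
have = pd.DataFrame(rows, columns=["id", "var1", "var2", "var3"])
want = count_ids_by_subset(have, "id", ["var1", "var2", "var3"])
print(want)
```

This runs one group-by per subset, so it is a readable reference rather than the efficient route; the single Proc SUMMARY pass above avoids the repeated scans.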

Convert word Python Pandas Data Frame into Zero One Data Frame

Input
userID col1 col2 col3 col4 col5 col6 col7 col8 col9
1 Java c c++ php python perl html hadoop nodejs
2 nodejs c# c++ oops css html angular java php
3 php python html java angular hadoop c nodejs c#
4 python php css perl hadoop c nodejs c# html
5 perl css python hadoop c nodejs c# java php
6 Java python css perl nodejs c# java php hadoop
7 javascript java perl nodejs angular php mysql hadoop html
8 angular mysql mongodb cs hadoop angular oops html perl
9 nodejs hadoop mysql mongodb angular oops html python java
Desired output
userID Java C C++ php python perl html hadoop nodejs oops mysql mongo
1 1 1 1 1 1 1 1 1 1 0 0 0
2 1 0 1 1 0 0 1 0 1 0 0 0
3 1 1 0 1 1 1 1 1 1 0 0 0
4 0 0 0 0 1 1 1 0 1 1 1 1

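Use get_dummies, then group the resulting columns by name and aggregate with max:
import pandas as pd
# dtype=int keeps 0/1 output (pandas 2.x returns booleans by default)
df = pd.get_dummies(df.set_index('userID'), prefix='', prefix_sep='', dtype=int)
# group duplicate column names via the transpose; groupby(..., axis=1) is deprecated in recent pandas
df = df.T.groupby(level=0).max().T.reset_index()
print(df)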
userID Java angular c c# c++ cs css hadoop html java javascript \
0 1 1 0 1 0 1 0 0 1 1 0 0
1 2 0 1 0 1 1 0 1 0 1 1 0
2 3 0 1 1 1 0 0 0 1 1 1 0
3 4 0 0 1 1 0 0 1 1 1 0 0
4 5 0 0 1 1 0 0 1 1 0 1 0
5 6 1 0 0 1 0 0 1 1 0 1 0
6 7 0 1 0 0 0 0 0 1 1 1 1
7 8 0 1 0 0 0 1 0 1 1 0 0
8 9 0 1 0 0 0 0 0 1 1 1 0
mongodb mysql nodejs oops perl php python
0 0 0 1 0 1 1 1
1 0 0 1 1 0 1 0
2 0 0 1 0 0 1 1
3 0 0 1 0 1 1 1
4 0 0 1 0 1 1 1
5 0 0 1 0 1 1 1
6 0 1 1 0 1 1 0
7 1 1 0 1 1 0 0
8 1 1 1 1 0 0 1
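A different route to the same zero-one table is to reshape to long form and cross-tabulate. The sketch below swaps in `melt` and `pd.crosstab` in place of the answer's get_dummies approach, on a three-user, three-column excerpt of the input; the `long` and `onehot` names are illustrative only.

```python
import pandas as pd

# Three users and three skill columns taken from the question's input
df = pd.DataFrame({
    "userID": [1, 2, 3],
    "col1": ["Java", "nodejs", "php"],
    "col2": ["c", "c#", "python"],
    "col3": ["c++", "c++", "html"],
})

# One row per (userID, skill) pair
long = df.melt(id_vars="userID", value_name="skill")

# Count occurrences per user and skill, then cap at 1 to get a 0/1 indicator
onehot = pd.crosstab(long["userID"], long["skill"]).clip(upper=1).reset_index()
onehot.columns.name = None
print(onehot)
```

As in the answer, matching is case-sensitive, so "Java" and "java" would end up in separate columns unless the values are normalized first.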

Convert this Word DataFrame into Zero One Matrix Format DataFrame in Python Pandas

I want to convert a DataFrame of user_Id and skills into a zero-one matrix DataFrame of users and their corresponding skills.
Input DataFrame
user_Id skills
0 user1 [java, hdfs, hadoop]
1 user2 [python, c++, c]
2 user3 [hadoop, java, hdfs]
3 user4 [html, java, php]
4 user5 [hadoop, php, hdfs]
Desired Output DataFrame
user_Id java c c++ hadoop hdfs python html php
user1 1 0 0 1 1 0 0 0
user2 0 1 1 0 0 1 0 0
user3 1 0 0 1 1 0 0 0
user4 1 0 0 0 0 0 1 1
user5 0 0 0 1 1 0 0 1

Join a new DataFrame built from the skills column: convert the lists to strings with astype(str) if needed (otherwise omit that step), remove the [] with strip, and use str.get_dummies:
df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
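If the column names still carry quotes (the case when skills holds real lists), build the dummies separately from the original df and strip the quotes:
df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
# if necessary, remove ' from the column names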
df1.columns = df1.columns.str.strip("'")
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
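When the skills column holds real Python lists, the string round-trip can be skipped. The following is a small sketch under that assumption (it does not apply if the column already contains strings such as "[java, hdfs, hadoop]"); it joins each list with "|" and lets `str.get_dummies` split on its default separator.

```python
import pandas as pd

# Assumes each cell of "skills" is an actual Python list
df = pd.DataFrame({
    "user_Id": ["user1", "user2", "user3"],
    "skills": [
        ["java", "hdfs", "hadoop"],
        ["python", "c++", "c"],
        ["hadoop", "java", "hdfs"],
    ],
})

# Join each list into "java|hdfs|hadoop", then split into 0/1 columns on "|"
dummies = df["skills"].str.join("|").str.get_dummies()
result = df[["user_Id"]].join(dummies)
print(result)
```

This relies on no skill name containing the "|" character; pick another separator for both calls if that could happen.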

Using first function

I need to create a new variable WHLDR based on the conditions below, and I'm not sure the last else if branch is correct. The intent for that branch: when multi > 1 and ref_1 = 0, then if rel = 0 and ref_1 = 1, the first observation for the id that meets this condition gets whldr = 1 and the others get whldr = 0, and so on. My code and sample data are below.
data temp_all;
    merge temp_1 (in=inA)
          temp_2 (in=inB)
          temp_3 (in=inC)
    ;
    by id;
    firstid=first.id;
    if multi = 1 then do;
        if rel = 0 then whldr=1;
        else whldr = 0;
    end;
    else if multi > 1 and ref_1 >= 1 then do;
        if rel =0 and ref_1=1 then whldr=1;
        else whldr = 0;
    end;
    else if multi > 1 and ref_1 = 0 then do;
        if rel =0 and ref_1=1 then do;
            if rel =0 and ref_0 ne '0' then do;
                if first.id=1 then whldr=1 ;
                else whldr=0;
            end;
        end;
    end;
run;
Here is sample data:
data have ;
input id a rel b multi ;
cards;
105 . 0 0 1
110 1 0 1 1
110 0 1 1 1
110 . 2 1 1
113 1 0 1 1
113 2 1 1 1
113 0 2 1 1
113 0 2 1 1
135 1 0 1 1
135 0 1 1 1
176 1 0 1 1
176 0 1 1 1
189 1 0 1 1
189 2 1 1 1
189 0 4 1 1
189 0 4 1 1
;

If you have a variable named WHLDR and want the first observation where it has the value 1, you can run a data step like this:
data want ;
    /* WHERE is applied before OBS=, so this keeps the first matching row */
    set have (obs=1);
    where whldr=1 ;
run;
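The question seems to ask for a flag on the first matching row within each id rather than in the whole data set. A hedged pandas sketch of that pattern follows; the helper `flag_first_match` is hypothetical, and the condition used here (rel equal to 0) is only a stand-in for the asker's full rule, which the question does not pin down.

```python
import pandas as pd

def flag_first_match(df, condition, id_col="id"):
    """Return 1 for the first row of each id where condition is True, else 0."""
    hits = condition.astype(int)
    # Running number of matches seen so far within each id
    running = hits.groupby(df[id_col]).cumsum()
    # A row is the first match when it matches and the running count is exactly 1
    return ((hits == 1) & (running == 1)).astype(int)

have = pd.DataFrame({
    "id":  [110, 110, 110, 113, 113, 113],
    "rel": [1, 0, 0, 0, 2, 0],
})
# Placeholder condition: the real rule would combine multi, rel and ref_1
have["whldr"] = flag_first_match(have, have["rel"].eq(0))
print(have)
```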

find the count of num column changing by id

Can anyone please help me with the following problem?
I have the following dummy data:
id num
1 1
1 2
1 1
1 2
1 1
1 2
2 1
2 15
2 1
2 1
2 1
2 15
2 1
2 15
How can I count the number of times the num column changes for each id?
I need results like this, with the count in a new column:
id number no_of_times
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
I hope the expected results make the requirement clear.

The following hash approach works for the test data provided with the question:
data have;
input id number no_of_times_target;
cards;
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
;
run;
data want;
    set have;
    by id;
    /* Create the hash once: key = (number, previous number), data = running count */
    if _n_ = 1 then do;
        length prev_number no_of_times 8;
        declare hash h();
        rc = h.definekey('number','prev_number');
        rc = h.definedata('no_of_times');
        rc = h.definedone();
    end;
    prev_number = lag(number);
    /* An increase within the same id: look up the count for this pair and add 1 */
    if number > prev_number and not(first.id) then do;
        rc = h.find();
        no_of_times = sum(no_of_times,1);
        rc = h.replace();
    end;
    else no_of_times = 1;
    /* Start counting afresh for the next id */
    if last.id then rc = h.clear();
    drop rc prev_number;
run;
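The same counting rule can be sketched in pandas to verify the expected column. This is an illustration, not part of the answer: the helper name `count_increases` is made up, and like the hash step it only counts rows where the value is larger than the previous value in the same id, numbering each (previous, current) pair separately.

```python
import pandas as pd

def count_increases(df, id_col="id", num_col="num"):
    """Running count of each increase (previous -> current) within an id; 1 elsewhere."""
    prev = df.groupby(id_col)[num_col].shift()
    # False on the first row of each id, because prev is NaN there
    increased = df[num_col] > prev
    counts = pd.Series(1, index=df.index, name="no_of_times")
    # Keep only the increasing rows and number each (id, prev, current) pair from 1
    ups = df.loc[increased, [id_col, num_col]].assign(prev=prev[increased])
    counts.loc[increased] = ups.groupby([id_col, "prev", num_col]).cumcount() + 1
    return counts

have = pd.DataFrame({
    "id":  [1] * 6 + [2] * 8,
    "num": [1, 2, 1, 2, 1, 2, 1, 15, 1, 1, 1, 15, 1, 15],
})
have["no_of_times"] = count_increases(have)
print(have)
```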
