Yearly conditional sum in SAS

Yearly conditional sum in SAS - sas

I have a below table
+------+------+------+------+------+-----+
| Yr | col1 | col2 | col3 | col4 | PQR |
+------+------+------+------+------+-----+
| 2012 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 4 |
| 2013 | 1 | 1 | 1 | 1 | 6 |
| 2014 | 0 | 0 | 0 | 0 | 8 |
| 2012 | 1 | 0 | 1 | 1 | 7 |
| 2013 | 0 | 1 | 0 | 0 | 3 |
| 2014 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 10 |
| 2014 | 0 | 0 | 1 | 0 | 12 |
| 2014 | 0 | 0 | 0 | 0 | 5 |
+------+------+------+------+------+-----+
The output I want is as below
+------+-------+------+------+------+
| | Total | 2012 | 2013 | 2014 |
+------+-------+------+------+------+
| col1 | 17 | 9 | 6 | 2 |
| col2 | 23 | 14 | 9 | 0 |
| col3 | 29 | 9 | 6 | 14 |
| col4 | 17 | 9 | 6 | 2 |
+------+-------+------+------+------+
For row col1 in my output table
The column `Total` is `SUM(PQR)` when `col1` is 1 my input table
The value `17` is `SUM(PQR)` when `col1` is 1 in my input table
The value in col `2012` is `SUM(PQR)` when `col1` is 1 and `Yr=2012` in my input table
The value `9` is `SUM(PQR)` when `col1` is 1 and `Yr=2012` in my input table
Similarly 6 in column 2013 is SUM(PQR) when col1 is 1 and Yr is 2013
Hope the process to get output table is understood
I want to achieve the above result with SAS.
Any help will be really appreciated

Transpose the data into a categorical form and use PQR as a weight in your aggregating sum. Proc TABULATE is very adept at creating such tabulations.
data have;
infile datalines dlm='|'; input
Yr col1 col2 col3 col4 PQR ; datalines;
| 2012 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 4 |
| 2013 | 1 | 1 | 1 | 1 | 6 |
| 2014 | 0 | 0 | 0 | 0 | 8 |
| 2012 | 1 | 0 | 1 | 1 | 7 |
| 2013 | 0 | 1 | 0 | 0 | 3 |
| 2014 | 1 | 0 | 1 | 1 | 2 |
| 2012 | 0 | 1 | 0 | 0 | 10 |
| 2014 | 0 | 0 | 1 | 0 | 12 |
| 2014 | 0 | 0 | 0 | 0 | 5 |
run;
data have_row_id / view=have_row_id;
set have;
rowid+1;
run;
proc transpose data=have_row_id out=have_categorical;
by rowid yr pqr;
run;
proc tabulate data=have_categorical;
class yr _name_;
var col1;
weight pqr;
table _name_='', col1='' * sum=''*f=8. * (all='Total' yr='') / nocellmerge;
run;
The ='' removes labelling cells and compactifies the output.

Related

How to Sum all working days for each month but restart from 0 for every month in power Bi Dax

I would like to know how could I get the Sum of all working days for specific month but in the table starting each month's Sum over again.
This is my DateTable Now with this query for Work Days Sum:
Work Days Sum =
CALCULATE (
SUM ( 'DateTable'[Is working Day] ),
ALL ( 'DateTable' ),
'DateTable'[Date] <= EARLIER ( 'DateTable'[Date] )
)
Date | Month Order | Is working day | Work Days Sum |
January - 21 331
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 890
2022/02/01 | 2 | 1 | 22 |
2022/02/02 | 2 | 1 | 23 |
2022/02/03 | 2 | 1 | 24 |
2022/02/04 | 2 | 1 | 25 |
|
|
V
Date | Month Order | Is working day | Work Days Sum |
January - 21 21
2022/01/01 | 1 | 0 | |
2022/01/02 | 1 | 0 | |
2022/01/03 | 1 | 1 | 1 |
2022/01/04 | 1 | 1 | 2 |
2022/01/05 | 1 | 1 | 3 |
2022/01/06 | 1 | 1 | 4 |
.....
2022/01/27 | 1 | 1 | 19 |
2022/01/28 | 1 | 1 | 20 |
2022/01/29 | 1 | 0 | 20 |
2022/01/30 | 1 | 0 | 20 |
2022/01/31 | 1 | 1 | 21 |
February 20 41
2022/02/01 | 2 | 1 | 1 |
2022/02/02 | 2 | 1 | 2 |
2022/02/03 | 2 | 1 | 3 |
2022/02/04 | 2 | 1 | 4 |
2022/02/05 | 2 | 0 | 4 |
.....
Any idea on how I can change my dax query to achieve output of second table below the down arrow would be much appreciated.

Calculating if Start time occurs within 1 hour range for each person (Single column)

I'm trying to figure out how to calculate if start time for each subject occurs within 1 hour of each other. However I only have one column and two groups with two different dates for each. I have no comparative variable to a dhms time difference as they occur under the same column variable. I have thought of doing a lag on the first time and then an intchk to calculate the 24 hour time difference between each but I don't think i have sufficient arguments for the intchk function. Alternatively could maybe do a proc transpose and then do a timediff between each array variable but that seems messy. Anyone have less clunky and more efficient solutions as i might be overthinking this.
Sample Data:
+----------+-------+------+------------+------------+
| CLIENTID | GRPID | date | start_date | start_time |
+----------+-------+------+------------+------------+
| 2 | 1 | -2 | 10Nov2019 | 23:19:52 |
| 3 | 1 | -2 | 10Nov2019 | 23:22:51 |
| 4 | 1 | -2 | 10Nov2019 | 23:20:16 |
| 5 | 1 | -2 | 10Nov2019 | 23:21:30 |
| 6 | 1 | -2 | 10Nov2019 | 23:23:51 |
| 23 | 2 | -2 | 11Nov2019 | 23:11:38 |
| 24 | 2 | -2 | 11Nov2019 | 23:38:33 |
| 25 | 2 | -2 | 11Nov2019 | 23:15:01 |
| 26 | 2 | -2 | 11Nov2019 | 23:08:43 |
+----------+-------+------+------------+------------+

You can compile the start date and time into a temporary datetime variable (_start_dt) to ease the comparison. Then, taking the first datetime for each GRPID as the baseline, you could use a RETAIN statement to pass that baseline datetime (_base_dt) down the related data rows and find the time difference (time_diff) using the INTCK function with a dtsecond interval.
proc sort data=your_data;
by grpid clientid;
run;
data your_results (drop=_:);
retain CLIENTID GRPID DATE start_date start_time _base_dt;
format _base_dt _start_dt datetime16. time_diff time8.;
set your_data;
by grpid clientid;
_start_dt = dhms(start_date,hour(start_time),minute(start_time),second(start_time));
if first.grpid then _base_dt = _start_dt;
time_diff = intck('dtsecond', _base_dt, _start_dt);
run;
This gives the following results dataset:
+----------+-------+------+------------+------------+-----------+
| CLIENTID | GRPID | date | start_date | start_time | time_diff |
+----------+-------+------+------------+------------+-----------+
| 2 | 1 | -2 | 10Nov2019 | 23:19:52 | 00:00:00 |
| 3 | 1 | -2 | 10Nov2019 | 23:22:51 | 00:02:59 |
| 4 | 1 | -2 | 10Nov2019 | 23:20:16 | 00:00:24 |
| 5 | 1 | -2 | 10Nov2019 | 23:21:30 | 00:01:38 |
| 6 | 1 | -2 | 10Nov2019 | 23:23:51 | 00:03:59 |
| 23 | 2 | -2 | 11Nov2019 | 23:11:38 | 00:00:00 |
| 24 | 2 | -2 | 11Nov2019 | 23:38:33 | 00:26:55 |
| 25 | 2 | -2 | 11Nov2019 | 23:15:01 | 00:03:23 |
| 26 | 2 | -2 | 11Nov2019 | 23:08:43 | -0:02:55 |
+----------+-------+------+------------+------------+-----------+
I think I’ve interpreted your requirements correctly.. Let me know if not.

It sounds like you want to check if the RANGE of the start_time over each group is < 1 hour:
Coerce the start_date to a datetime value and add the start_time before computing the range.
data have;
input
CLIENTID GRPID date start_date: date9. start_time: hhmmss6.;
format start_date date9. start_time time8.;
datalines;
2 1 -2 10Nov2019 23:19:52
3 1 -2 10Nov2019 23:22:51
4 1 -2 10Nov2019 23:20:16
5 1 -2 10Nov2019 23:21:30
6 1 -2 10Nov2019 23:23:51
23 2 -2 11Nov2019 23:11:38
24 2 -2 11Nov2019 23:38:33
25 2 -2 11Nov2019 23:15:01
26 2 -2 11Nov2019 23:08:43
run;
proc sql;
create table want (label="start range status by group") as
select
grpid,
range(dhms(start_date,0,0,0)+start_time) as start_range format time8.,
calculated start_range < '24:00:00't as one_hr_start_flag
from have
group by grpid;
If you want to disregard the groups and focus only on the time of day, disregarding the date, the range computation would be:
* Presuming 'noon' is the center of the day;
proc sql;
create table want (label="time of day start range status overall") as
select
range(start_time) as range format time8.,
calculated range < '24:00:00't as one_hr_start_flag
from have;
Looking at only time is always troublesome for the cases of when the time value is slightly after midnight.

Proc sql and macro variables

I am trying to run a code that should work on tables created considering different factors. As these factors can be more than 1, I decided to create a macro %let to list them:
%let list= factor1 factor2 ...;
What I would like to do is run a code to create these tables using different factors. For each factor, I computed using proc means the mean and the standard deviation, so I should have the variables &list._mean and &list._stddev in the table created by the proc means for each factor. This table is labelled as t2 and I need to join to another table, t1. From t1 I am considering all the variables.
My main difficulties are, therefore, in the proc sql:
proc sql;
create table new_table as
select t1.*
, t2.&list._mean as mean
, t2.&list._stddev as stddev
from table1 as t1
left join table2 as t2
on t1.time=t2.time
order by t2.&list.
quit;
This code is returning an error and I think because I am considering t2.factor1 factor2, i.e. t2 is only applied to the first factor, not to the second one.
What I would expect is the following:
proc sql;
create table new_table as
select t1.*
, t2.factor1._mean as mean
, t2.factor1._stddev as stddev
from table1 as t1
left join table2 as t2
on t1.time=t2.time
order by t2.factor1.
quit;
and another one for factor2.
UPDATE CODE:
%macro test_v1(
_dtb
,_input
,_output
,_time
,_factor
);
data &_input.;
set &_dtb..&_input.;
keep &_col_period. &_factor.;
run;
proc sort data = work.&_input.
out = &_input._1;
by &_factor. &_time.;
run;
%put ERROR: 2
proc means data=&_input._1 nonobs mean stddev;
class &_time.;
var &_factor.;
output out=&_input._n (drop=_TYPE_) mean= stddev= /autoname ;
run;
%put ERROR: 3
proc sql;
create table work.&_input._data as
select t1.*
,t2.&_factor._mean as mean
,t2.&_factor._stddev as stddev
from &_input. as t1
left join &_input._n as t2
on t1.&_time.=t2.&_time.
order by &_factor.;
quit;
%mend test_v1;
Then my question is on how I can consider multiple factors, defined into a macro as a list, as columns of tables and as input data into a macro (for example: %test(dataset, tablename, list).

I suspect that trying to use PROC SQL is what is making the problem hard. If you stick to just using normal SAS syntax your space delimited list of variable names is easy to use.
So taking your code and tweaking it a little:
%macro test_v1
(_dtb /* Input libref */
,_input /* Input member name */
,_output /* Output dataset */
,_time /* Class/By variable(s) */
,_factor /* Analysis variable(s) */
);
proc sort data= &_dtb..&_input. out=_temp1;
by &_time. ;
run;
proc means data=_temp1 nonobs mean stddev;
by &_time.;
var &_factor.;
output out=_temp2 (drop=_TYPE_) mean= stddev= /autoname ;
run;
data &_output. ;
merge _temp1 _temp2 ;
by &_time.;
run;
%mend test_v1;
We can then test it using SASHELP.CLASS by using SEX as the "time" variable and HEIGHT and WEIGHT as the analysis variables.
%test_v1(_dtb=sashelp,_input=class,_output=want,_time=sex,_factor=height weight);

You can try to add macro loop to your macros by scanning list of factors. It could look like:
%macro test(list);
%do i=1 to %sysfunc(countw(&list,%str( )));
%let factorname=%scan(&list,&i,%str( ));
/* if macro variable list equals factor1 factor2 then there would be
two iterations in loop, i=1 factorname=factor1 and i=2 factorname=2*/
/*your code here*/
%end
%mend test;
UPDATE:
%macro test(_input, _output, factors_list); %macro d; %mend d;
%do i=1 %to %sysfunc(countw(&factors_list,%str( )));
%let tfactor=%scan(&factors_list,&i,%str( ));
proc sort data = work.&_input.
out = &_input._1;
by &factors_list. time;
run;
proc means data=&_input._1 nonobs mean stddev;
class time;
var &tfactor.;
output out=&_input._num (drop=_TYPE_) mean= stddev= /autoname ;
run;
proc sql;
create table &_output._&tfactor as
select t1.*
, t2.&tfactor._mean as mean
, t2.&tfactor._stddev as stddev
from &_input as t1
left join &_input._num as t2
on t1.time=t2.time
order by t1.&tfactor;
quit;
%end;
%mend test;
%test(have,newdata,factor1 factor2);
Have dataset:
+------+---------+---------+
| time | factor1 | factor2 |
+------+---------+---------+
| 1 | 12345 | 1234 |
| 2 | 123 | 12 |
| 3 | 1 | -1 |
| 4 | -12 | -123 |
| 5 | -1234 | -12345 |
| 6 | 9876 | 987 |
| 7 | 98 | 8 |
| 8 | 9 | 7 |
| 1 | 1234 | 123 |
| 2 | 12 | 1 |
| 3 | 12 | -12 |
| 4 | -123 | -1234 |
| 5 | -12345 | -123456 |
| 6 | 987 | 98 |
| 7 | 9 | -9 |
| 8 | 1234 | 1234 |
+------+---------+---------+
NEWDATA_FACTOR1:
+------+---------+---------+---------+--------------+
| time | factor1 | factor2 | mean | stddev |
+------+---------+---------+---------+--------------+
| 5 | -12345 | -123456 | -6789.5 | 7856.6634458 |
| 5 | -1234 | -12345 | -6789.5 | 7856.6634458 |
| 4 | -123 | -1234 | -67.5 | 78.488852712 |
| 4 | -12 | -123 | -67.5 | 78.488852712 |
| 3 | 1 | -1 | 6.5 | 7.7781745931 |
| 7 | 9 | -9 | 53.5 | 62.932503526 |
| 8 | 9 | 7 | 621.5 | 866.20580695 |
| 3 | 12 | -12 | 6.5 | 7.7781745931 |
| 2 | 12 | 1 | 67.5 | 78.488852712 |
| 7 | 98 | 8 | 53.5 | 62.932503526 |
| 2 | 123 | 12 | 67.5 | 78.488852712 |
| 6 | 987 | 98 | 5431.5 | 6285.472178 |
| 1 | 1234 | 123 | 6789.5 | 7856.6634458 |
| 8 | 1234 | 1234 | 621.5 | 866.20580695 |
| 6 | 9876 | 987 | 5431.5 | 6285.472178 |
| 1 | 12345 | 1234 | 6789.5 | 7856.6634458 |
+------+---------+---------+---------+--------------+
NEWDATA_FACTOR2:
+------+---------+---------+----------+--------------+
| time | factor1 | factor2 | mean | stddev |
+------+---------+---------+----------+--------------+
| 5 | -12345 | -123456 | -67900.5 | 78567.341564 |
| 5 | -1234 | -12345 | -67900.5 | 78567.341564 |
| 4 | -123 | -1234 | -678.5 | 785.5956339 |
| 4 | -12 | -123 | -678.5 | 785.5956339 |
| 3 | 12 | -12 | -6.5 | 7.7781745931 |
| 7 | 9 | -9 | -0.5 | 12.02081528 |
| 3 | 1 | -1 | -6.5 | 7.7781745931 |
| 2 | 12 | 1 | 6.5 | 7.7781745931 |
| 8 | 9 | 7 | 620.5 | 867.62002052 |
| 7 | 98 | 8 | -0.5 | 12.02081528 |
| 2 | 123 | 12 | 6.5 | 7.7781745931 |
| 6 | 987 | 98 | 542.5 | 628.61792847 |
| 1 | 1234 | 123 | 678.5 | 785.5956339 |
| 6 | 9876 | 987 | 542.5 | 628.61792847 |
| 1 | 12345 | 1234 | 678.5 | 785.5956339 |
| 8 | 1234 | 1234 | 620.5 | 867.62002052 |
+------+---------+---------+----------+--------------+

SAS:add one column from tableB to tableA

I have two table looks like and I want to add column score to tableA from tableB, then get tableC, how to do in SAS?
the only rule is to add a column in tableA name "score " and its value is same as column "score" in tableB (which are all the same in tableB)
+----+---+---+---+
| id | b | c | d |
+----+---+---+---+
| 1 | 5 | 7 | 2 |
| 2 | 6 | 8 | 3 |
| 3 | 7 | 8 | 1 |
| 4 | 5 | 7 | 2 |
| 5 | 6 | 8 | 3 |
| 6 | 7 | 8 | 1 |
+----+---+---+---+
tableA
+---+---+-------+
| e | f | score |
+---+---+-------+
| 3 | 7 | 11 |
| 4 | 6 | 11 |
| 5 | 5 | 11 |
+---+---+-------+
tableB
+----+---+---+---+-------+
| id | b | c | d | score |
+----+---+---+---+-------+
| 1 | 5 | 7 | 2 | 11 |
| 2 | 6 | 8 | 3 | 11 |
| 3 | 7 | 8 | 1 | 11 |
| 4 | 5 | 7 | 2 | 11 |
| 5 | 6 | 8 | 3 | 11 |
| 6 | 7 | 8 | 1 | 11 |
+----+---+---+---+-------+
tableC

If the "id" is present in both tables, you can use the following to create Table C:
PROC SQL;
CREATE TABLE tableC AS
SELECT a.*, b.score
FROM tableA a JOIN tableB b
ON a.id = b.id;
QUIT;
Please confirm that this is what you need?

Pandas: consecutive rows' value change comparison

I have a Dataframe with date as index:
Index | Opp id | Pipeline_Type |Amount
20170104 | 1 | Null | 10
20170104 | 2 | Sou | 20
20170104 | 3 | Inf | 25
20170118 | 1 | Inf | 12
20170118 | 2 | Null | 27
20170118 | 3 | Inf | 25
Now I want to calculate number of records(Opp id) for which Pipeline type has changed or amount has changed (+/-diff). Above no of records will be 2 for pipeline_type as well as for amount.
Please help me frame the solution.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Yearly conditional sum in SAS - sas

Related

How to Sum all working days for each month but restart from 0 for every month in power Bi Dax

Calculating if Start time occurs within 1 hour range for each person (Single column)

Proc sql and macro variables

SAS:add one column from tableB to tableA

Pandas: consecutive rows' value change comparison

Categories

Resources