I want to write code like this:
DATA TEST;
SET original;
IF hour = 30 AND minutes = . or 0 THEN new_minutes = 30;
ELSE if hour = 60 AND minutes = . or 0 then new_minutes = 60;
RUN;
If I use
IF hour = 30 and minutes = . OR minutes = 0 THEN new_minutes = 30;
then new_minutes is set to 30 even when it should be 60. Is there a way around this besides changing all 0 to . (or all . to 0)?
It sounds like you need parentheses, because AND is evaluated before OR:
IF hour = 30 AND (minutes = . or minutes = 0) THEN new_minutes = 30;
Or you could use the IN operator:
IF hour = 30 AND minutes in ( . , 0) THEN new_minutes = 30;
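The grouping issue is not specific to SAS. As a hedged illustration in Python (where `and` also binds more tightly than `or`), the sketch below contrasts the ungrouped and the parenthesized condition. `None` stands in for a SAS missing value, and the function names are made up for this example.

```python
def ungrouped(hour, minutes):
    # Parsed as (hour == 30 and minutes is None) or (minutes == 0)
    return hour == 30 and minutes is None or minutes == 0


def grouped(hour, minutes):
    # Parentheses force the OR to be evaluated before the AND
    return hour == 30 and (minutes is None or minutes == 0)


print(ungrouped(60, 0))   # True: matches even though hour is 60
print(grouped(60, 0))     # False
print(grouped(30, None))  # True
```

With the ungrouped form, any row with minutes equal to 0 satisfies the first IF, which is why the ELSE branch for 60 is never reached.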
You could use coalesce for this; that returns the first nonmissing result.
if hour = 30 and coalesce(minutes,0) = 0 then minutes=30;
I would also suggest that you could write it this way to simplify the code:
if hour in (30,60) and coalesce(minutes,0) = 0 then minutes=hour;
You could potentially generalize that first part even further, depending on which values are otherwise legal for hour. If hour can't otherwise be over 24, then
if hour > 24 and coalesce(minutes,0) = 0 then minutes=hour;
would be one possibility, as would checking whether hour is divisible by 15 or 10 or something like that.
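For readers less familiar with coalesce, here is a minimal Python sketch of the same idea. The `coalesce` and `fill_minutes` helpers are hypothetical stand-ins written for this example (with `None` playing the role of a missing value); they are not SAS or library functions.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


def fill_minutes(hour, minutes):
    # Mirrors: if hour in (30,60) and coalesce(minutes,0) = 0 then minutes=hour;
    if hour in (30, 60) and coalesce(minutes, 0) == 0:
        return hour
    return minutes


print(fill_minutes(30, None))  # 30
print(fill_minutes(60, 0))     # 60
print(fill_minutes(60, 15))    # 15
```

Treating missing as 0 up front means only one comparison is needed, so the AND/OR grouping problem never arises.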
You could even write it without any if statements, provided you can accept new_minutes being zero when the criteria are not satisfied:
data want ;
set have ;
new_minutes = (hour in(30,60) and minutes in(.,0)) * hour ;
run ;
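The same trick of multiplying by a true/false result can be sketched in Python, where `True` and `False` behave as 1 and 0 in arithmetic. The function name is an assumption for illustration, and `None` again stands in for a missing value.

```python
def new_minutes(hour, minutes):
    # The condition evaluates to True or False, which act as 1 or 0 when multiplied
    return (hour in (30, 60) and minutes in (None, 0)) * hour


print(new_minutes(30, None))  # 30
print(new_minutes(60, 0))     # 60
print(new_minutes(60, 15))    # 0
```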
So, I have a table with many columns, and what I am trying to do is accumulate the number of sales within each hour and then reset the total when the next hour starts. I have tried to use the summarize keyword, but it doesn't seem to let me accumulate. At the moment my data is in 15-minute bands, so each row shows the sales in that 15-minute period, but I would like to accumulate them within each hour.
This is what I have now:
15minperiod  Sales
09:00:00     10
09:15:00     10
09:30:00     10
09:45:00     10
10:00:00     10
10:15:00     9
10:20:00     13
This is what I would like to get:
15minperiod  Sales  Sales in hour
09:00:00     10     10
09:15:00     10     20
09:30:00     10     30
09:45:00     10     40
10:00:00     10     10
10:15:00     9      19
10:20:00     13     32
Yes, this can be done with a calculated column like this:
Sales in an hour =
var currentTime = [15minperiod]
return
    CALCULATE(
        SUM('Data'[Sales]);
        FILTER(
            ALL('Data');
            'Data'[15minperiod] <= currentTime && HOUR('Data'[15minperiod]) = HOUR(currentTime)
        )
    )
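To see what that filter computes, here is a plain-Python sketch of the same logic applied to the sample rows from the question. It is only an illustration of the filter condition, not DAX, and it assumes all rows fall on a single day (the sample has times only).

```python
from datetime import time

rows = [
    (time(9, 0), 10), (time(9, 15), 10), (time(9, 30), 10), (time(9, 45), 10),
    (time(10, 0), 10), (time(10, 15), 9), (time(10, 20), 13),
]


def sales_in_hour(rows):
    result = []
    for current_time, _ in rows:
        # Same condition as the FILTER: same hour, and not later than the current row
        total = sum(
            sales
            for t, sales in rows
            if t <= current_time and t.hour == current_time.hour
        )
        result.append(total)
    return result


print(sales_in_hour(rows))  # [10, 20, 30, 40, 10, 19, 32]
```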
I have a dataframe with an article ID and its publication date (timestamp) in each row. I need to use this information to compute a freshness score for each article.
articleId publicationDate
0 581354 2017-09-17 15:16:55
1 581655 2017-09-18 07:37:51
2 580864 2017-09-16 06:44:39
3 581610 2017-09-18 06:30:30
4 581605 2017-09-18 07:22:24
The most recent article should get the highest score. The time window should be half an hour (two articles published within the same half hour must get the same score).
Some of the code below might be redundant but it seems to work:
import numpy as np
df['score'] = df['publicationDate'] - df['publicationDate'].max()
df['score'] = (df['score'] / np.timedelta64(1, 'm')).apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x).rank(method='max')
So you convert timedelta to minutes, then round it to 30, and finally rank that value.
It can also be a one-liner if you please:
df['score'] = ((df['publicationDate'] - df['publicationDate'].max()) / np.timedelta64(1, 'm')).apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x).rank(method='max')
Explanation:
(df['publicationDate'] - df['publicationDate'].max()) - subtracts the most recent date from every date
(df['score'] / np.timedelta64(1, 'm')) - converts the timedelta into minutes
.apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x) - rounds to 30-minute steps, leaving the most recent timestamp (difference 0) as is
.rank(method='max') - ranks the results, giving tied values the highest rank in their group
EDIT:
To set the score of articles older than 2 days to 0, you can use this:
df['diff'] = (df['publicationDate'] - df['publicationDate'].max()).apply(lambda x: x.days)
df.loc[df['diff']<=-2, 'score'] = 0
The first line gives the timedelta in whole days, and the second sets the score to 0 where that value is less than or equal to -2.
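The half-hour window requirement can also be sketched without pandas. The version below is a simplified variant rather than the formula used above: it floors each article's age into fixed 30-minute windows counted back from the newest article and then assigns a dense rank. The function name and the dense-rank choice are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

published = {
    581354: datetime(2017, 9, 17, 15, 16, 55),
    581655: datetime(2017, 9, 18, 7, 37, 51),
    580864: datetime(2017, 9, 16, 6, 44, 39),
    581610: datetime(2017, 9, 18, 6, 30, 30),
    581605: datetime(2017, 9, 18, 7, 22, 24),
}


def freshness_scores(published, window=timedelta(minutes=30)):
    newest = max(published.values())
    # Number of whole windows between each article and the newest one
    buckets = {aid: (newest - ts) // window for aid, ts in published.items()}
    # Dense rank: the oldest bucket gets 1, the newest bucket gets the highest score
    order = sorted(set(buckets.values()), reverse=True)
    rank = {bucket: i + 1 for i, bucket in enumerate(order)}
    return {aid: rank[bucket] for aid, bucket in buckets.items()}


print(freshness_scores(published))
```

On the sample data, articles 581655 and 581605 were published about 15 minutes apart, so they land in the same window and share the top score.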
I'm attempting to create a new variable in my dataset that stores a number, which is derived from a computation on another number from the same observation.
* Here is what my dataset looks like:
SubjectID Score MyNewScore
1001 5442822 0
1002 6406134 0
1003 16 0
Now, the variable Score is the sum of up to 23 distinct numbers (I'll call them "Responses"), ranging from 1 to 8,388,608.
/* Example of response values
1st response = 1
2nd response = 2
3rd response = 4
4th response = 8
5th response = 16
6th response = 32
...
23rd response = 8,388,608
*/
MyNewScore contains a count of these distinct responses used to obtain the value in Score. In my example dataset, MyNewScore should equal 9 as there are 9 responses used to arrive at a sum of 5,442,822.
I have nested a forvalues loop within a while loop in Stata that successfully calculates MyNewScore but I do not know how to replace the 0 that currently exists in the dataset with the result of my nested-loops.
Stata code used to calculate the value I'm after:
// Build a loop to create a Roland Morris Score
local score = 16
local count = 0
while `score' != 0 {
    local ItemCode
    forvalues i=1/24 {
        local j = 2^(`i' - 1)
        if `j' >= `score' continue, break
        local ItemCode `j'
        * display "`ItemCode'"
    }
    local score = `score' - `ItemCode'
    if `score' > 1 {
        local count = `count' + 1
        display "`count'"
    }
    else if `score' == 1 {
        local count = `count' + 1
        display "`count'"
        continue, break
    }
}
How do I replace the 0s in MyNewScore with the output from the nested-loops? I have tried nesting these two loops in another while loop, with a `replace' command although that simply applies the count from the first observation, to all observations in the dataset.
I think there's an error in the value of the 23rd response: it should be 2^(23-1), which is 4,194,304.
The sum of the first 4 responses is 15; that's 1+2+4+8 or 2^4-1. The sum of all 23 responses is 2^23 - 1 so the largest possible value for Score is 8,388,607.
There's no need for a loop over observations here. Start with a cloned copy of the Score variable, then loop over the responses from the highest down to 1. At each pass, if the current score is greater than or equal to the value of the response, count that response and subtract its value from the score.
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(SubjectID Score)
1001 5442822
1002 6406134
1003 16
1004 1
1005 19
1006 15
1007 8388607
end
clonevar x = Score
gen wanted = 0
qui forvalues i=23(-1)1 {
local response = 2^(`i'-1)
replace wanted = wanted + 1 if x >= `response'
replace x = x - `response' if x >= `response'
}
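For a single score, the same highest-to-lowest pass can be sketched in Python. The function name and the `n_responses` parameter are illustrative assumptions. Because each response is a distinct power of two, the result equals the number of 1 bits in the score.

```python
def count_responses(score, n_responses=23):
    remaining = score
    count = 0
    # Walk from the largest response (2^22) down to the smallest (2^0)
    for i in range(n_responses, 0, -1):
        response = 2 ** (i - 1)
        if remaining >= response:
            count += 1
            remaining -= response
    return count


for score in (5442822, 16, 19, 15, 8388607):
    print(score, count_responses(score))
```

For 5442822 this gives 9, matching the count stated in the question, and for 8388607 (all 23 responses) it gives 23.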
I think all you need to do is nest your code in a loop that goes through each observation in your dataset, like so:
// get total number of observations in dataset
local N = _N
// go through each observation and run the while loop
forvalues observation = 1/`N' {
    local score = Score[`observation']
    local count = 0
    // your while loop here
    while `score' != 0 {
        ...
    }
    replace MyNewScore = `count' in `observation' // (or whatever value you're after)
}
Is this what you're after?
My question is about a conditional cumulative sum in SAS. I think it is best explained with a sample. I have the following dataset:
Date Value
01/01/2001 10
02/01/2001 20
03/01/2001 30
04/01/2001 15
05/01/2001 25
06/01/2001 35
07/01/2001 20
08/01/2001 45
09/01/2001 35
I want to find the cumulative sum of Value. My condition is that if the cumulative sum exceeds 70, it should be set to 70, and the next cumulative sum should start from the excess over 70, and so on. More precisely, my new data should be:
Date        Value  Cumulative
01/01/2001  10     10
02/01/2001  20     30
03/01/2001  30     60
04/01/2001  15     70
05/01/2001  25     30  (75-70=5, 5+25=30)
06/01/2001  35     65
07/01/2001  20     70
08/01/2001  45     60  (85-70=15, 15+45=60)
09/01/2001  35     95  (because it is the last value)
Many thanks in advance
Here is a solution, although there is bound to be one more elegant. It's split into two parts with if eof to satisfy the last observation condition.
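data want;
    set test end = eof;
    if eof ^= 1 then do;
        /* after hitting the cap, restart from the carried-over excess */
        if cumulative = 70 then cumulative = extra;
        cumulative + value;
        extra = cumulative - 70;
        /* cap at 70; the excess stays in extra for the next row */
        if extra > 0 then do;
            cumulative = 70;
        end;
    end;
    retain extra;
    retain cumulative;
    /* last observation: add the value without capping */
    if eof = 1 then cumulative + value;
run;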
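The rule from the question (cap at 70, carry the excess forward, leave the last row uncapped) can also be sketched in Python. This is an illustration of the rule as stated, not a line-by-line port of the data step above, and the function name and `cap` parameter are assumptions.

```python
def capped_cumsum(values, cap=70):
    result = []
    running = 0
    last_index = len(values) - 1
    for i, value in enumerate(values):
        running += value
        if i != last_index and running >= cap:
            # Report the cap and carry the excess into the next row
            result.append(cap)
            running -= cap
        else:
            # Below the cap, or the last row, which is left uncapped
            result.append(running)
    return result


values = [10, 20, 30, 15, 25, 35, 20, 45, 35]
print(capped_cumsum(values))  # [10, 30, 60, 70, 30, 65, 70, 60, 95]
```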
I am trying to construct Table 2 with the SAS code below, but what I get is Table 1. I cannot figure out what I missed. Help is much appreciated. Thank you.
%let counter = 4;
data new;set set1;
total = 0;
a = 1;
do i = 1 to &counter;
call symputX('a',a);
total = total + Tem_&a.;
a = symget('a')+1;
call symputX('a',a);
end;
run;
Table 1
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 3600
5 200 50 100 200 0 0
9 50 40 0 0 0 0
10 500 70 100 250 0 0
Table 2
ID Amt Tem_1 Tem_2 Tem_3 Tem_4 total
4 500 1 4 5 900 910
5 200 50 100 200 0 350
9 50 40 0 0 0 40
10 500 70 100 250 0 420
You cannot use SYMPUT and SYMGET that way, unfortunately. While you can use them to store and retrieve macro variable values, you cannot use them to change code that has already been compiled.
Basically, SAS has to figure out the machine code for what it's supposed to do on every iteration of the data step loop before it looks at any data (this is called compiling). So the problem is, you can't write tem_&a. and expect to be allowed to change what &a. is during execution, because that would change what the machine code needs to do, and SAS couldn't prepare for that sufficiently.
So, in what you wrote, &a. is resolved when the program is compiled, and whatever value &a. had before your data step is what tem_&a. turns into. Presumably the first time you ran this it errored (&a. does not resolve, followed by an error about & being illegal in variable names), and then the call symput did its job and &a. held 4 at the end of the loop, so from then on tem_&a. resolved to tem_4.
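A loose Python analogy may help here; it is not how SAS works internally. If the column name is built once before any row is processed, the loop keeps adding the same column. Assuming the leftover value of a is 4, as described above, this reproduces the totals in Table 1. All names in the sketch are made up for the illustration.

```python
rows = [
    {"ID": 4, "Tem_1": 1, "Tem_2": 4, "Tem_3": 5, "Tem_4": 900},
    {"ID": 5, "Tem_1": 50, "Tem_2": 100, "Tem_3": 200, "Tem_4": 0},
    {"ID": 9, "Tem_1": 40, "Tem_2": 0, "Tem_3": 0, "Tem_4": 0},
    {"ID": 10, "Tem_1": 70, "Tem_2": 100, "Tem_3": 250, "Tem_4": 0},
]

counter = 4
a = 4  # assumed leftover value from an earlier run
column = f"Tem_{a}"  # resolved once, before any row is processed


def totals_resolved_once(rows):
    totals = []
    for row in rows:
        total = 0
        for _ in range(counter):
            total += row[column]  # always the same column, whatever the loop index
        totals.append(total)
    return totals


print(totals_resolved_once(rows))  # [3600, 0, 0, 0]
```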
The solution? Don't use macros for this. Instead, use arrays.
data new;
    set set1;
    total = 0;
    array tem[&counter.] tem_1-tem_&counter.;
    do i = 1 to &counter; *or do i = 1 to dim(tem);
        total = total + tem[i];
    end;
run;
Or, of course, just directly sum them.
data new;
set set1;
total = sum(of tem_1-tem_4);
run;
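As a quick cross-check of the expected totals in Table 2, the array-style approach can be sketched in Python, where the column is looked up by index at run time for each row. The data layout and function name are assumptions made for this example.

```python
rows = [
    {"ID": 4, "Tem_1": 1, "Tem_2": 4, "Tem_3": 5, "Tem_4": 900},
    {"ID": 5, "Tem_1": 50, "Tem_2": 100, "Tem_3": 200, "Tem_4": 0},
    {"ID": 9, "Tem_1": 40, "Tem_2": 0, "Tem_3": 0, "Tem_4": 0},
    {"ID": 10, "Tem_1": 70, "Tem_2": 100, "Tem_3": 250, "Tem_4": 0},
]


def row_totals(rows, counter=4):
    totals = []
    for row in rows:
        total = 0
        for i in range(1, counter + 1):
            # The index i selects a different column on each pass, like tem[i]
            total += row[f"Tem_{i}"]
        totals.append(total)
    return totals


print(row_totals(rows))  # [910, 350, 40, 420]
```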
If you REALLY like macro variables, you could of course do this in a macro do loop, though this is not recommended for this purpose as it's really better to stick with data step techniques. But this should work, anyway, if you run this inside a macro (this won't be valid in open code).
data new;
set set1;
total = 0;
%do i = 1 %to &counter;
total = total + Tem_&i.;
%end;
run;