Count instances within a certain timeframe - sas

Good afternoon,
I have a set of data that has an ID and a create timestamp. If the timestamps for an ID are within 15 minutes of the first one, they count as 1 "occurrence". There can be more than 1 occurrence per ID. As soon as a timestamp is more than 15 minutes after the first for the ID, it needs to be counted as a new occurrence, and the process starts over: from the first record of the new occurrence, look at the next record, and if it is within 15 minutes of that timestamp, consider it part of the same occurrence, and so on.
I hope that makes sense.
example below.
ID TIMESTAMP OCCURRENCE
123abc 7/19/2022 16:32 1
123abc 7/19/2022 16:35
123abc 7/19/2022 16:37
123abc 7/19/2022 16:39
123abc 7/19/2022 17:32 1
123abc 7/19/2022 17:40
123abc 7/19/2022 17:42
123abc 7/20/2022 19:35 1
123abc 7/21/2022 16:35 1
123abc 7/22/2022 23:42 1
123abc 7/22/2022 23:44
123abc 7/22/2022 23:45
123abc 7/22/2022 23:58 1
456deg 7/19/2022 16:42 1
456deg 7/19/2022 16:44
456deg 7/19/2022 17:15 1
456deg 7/19/2022 17:18
I'm not sure where to start. I'm intermediate in sas but not a code macro'er. Could someone help me out or point me into the right direction?

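If the data is sorted by ID and TIMESTAMP then you can quickly number the occurrences by using the DIF() function to calculate the difference in seconds between consecutive datetime values. Note that this starts a new occurrence whenever the gap from the previous record exceeds 15 minutes; it does not measure from the first record of the occurrence.
data want;
  set have;
  by id timestamp;
  gap = dif(timestamp);  /* seconds since the previous record */
  format gap time12.;
  if first.id then occurrence=1;
  else if gap > '00:15:00't then occurrence+1;  /* sum statement, so OCCURRENCE is retained across rows */
run;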
Post actual data as text (not photographs) to get tested code.
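The question measures the 15 minutes from the first record of each occurrence rather than from the previous record. A minimal Python sketch of that anchored rule follows, assuming the rows are already sorted by ID and timestamp; the function name number_occurrences and the sample list are made up for illustration, and the sample rows are taken from the question's example.

```python
from datetime import datetime, timedelta

def number_occurrences(rows, window=timedelta(minutes=15)):
    """Number occurrences per ID; rows must be sorted by (id, timestamp)."""
    numbered = []
    current_id = None
    window_start = None
    occurrence = 0
    for row_id, ts in rows:
        if row_id != current_id:
            # First record of a new ID: restart the numbering and the window.
            current_id, window_start, occurrence = row_id, ts, 1
        elif ts - window_start > window:
            # More than 15 minutes after the window's first record: new occurrence.
            window_start = ts
            occurrence += 1
        numbered.append((row_id, ts, occurrence))
    return numbered

def parse(text):
    return datetime.strptime(text, "%m/%d/%Y %H:%M")

sample = [
    ("123abc", parse("7/22/2022 23:42")),
    ("123abc", parse("7/22/2022 23:44")),
    ("123abc", parse("7/22/2022 23:45")),
    ("123abc", parse("7/22/2022 23:58")),
    ("456deg", parse("7/19/2022 16:42")),
    ("456deg", parse("7/19/2022 16:44")),
    ("456deg", parse("7/19/2022 17:15")),
    ("456deg", parse("7/19/2022 17:18")),
]

for row_id, ts, occurrence in number_occurrences(sample):
    print(row_id, ts, occurrence)
```

Here 23:58 starts a new occurrence because it is 16 minutes after 23:42, even though it is only 13 minutes after the previous record.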

Get output with flag if same record comes next time in informatica power center

I need to set a flag when the same record comes from the source again in Informatica PowerCenter.
This needs to be achieved in Informatica PowerCenter. After that I will use a Filter transformation to pass only the flag = 1 records to the output. Basically I need to track changed records using the flag and load them as SCD Type 2 in the target table.
Input
Number Code Date
1234 3 2022/01/22
1234 3 2022/01/23
1234 4 2022/01/24
1234 3 2022/01/25
1234 3 2022/01/26
1234 2 2022/01/27
1234 4 2022/01/28
4567 1 2022/01/29
4567 1 2022/01/20
4567 3 2022/01/21
Output
Number Code Date Flag
1234 3 2022/01/22 1
1234 3 2022/01/23 2
1234 4 2022/01/24 1
1234 3 2022/01/25 1
1234 3 2022/01/26 2
1234 2 2022/01/27 1
4567 1 2022/01/29 1
4567 1 2022/01/20 2
4567 3 2022/01/21 1
You need to use variable ports in an Expression transformation to track values from the previous record and set a flag depending on whether a value has changed or not.
Informatica evaluates variable ports in order, from top to bottom. So if the variable port that compares the current record (input port) with the previous record (variable port x) is placed before variable port x, then variable port x still holds the value from the previous record at the moment the comparison runs.
There are plenty of detailed examples of this common pattern if you google for them.
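The ordering idea can be sketched outside Informatica. The following Python snippet is only an analogy for the variable-port pattern, not PowerCenter syntax: the comparison happens first and the "previous" value is updated last. The function name flag_repeats is made up, and the rows are the question's input.

```python
def flag_repeats(rows):
    """Return rows with a flag that restarts at 1 whenever (number, code) changes."""
    prev_key = None
    flag = 0
    flagged = []
    for number, code, date in rows:
        key = (number, code)
        # Compare against the previous row first (prev_key is still the old value).
        flag = flag + 1 if key == prev_key else 1
        flagged.append((number, code, date, flag))
        # Only now remember the current row, like the variable port placed last.
        prev_key = key
    return flagged

rows = [
    (1234, 3, "2022/01/22"),
    (1234, 3, "2022/01/23"),
    (1234, 4, "2022/01/24"),
    (1234, 3, "2022/01/25"),
    (1234, 3, "2022/01/26"),
    (1234, 2, "2022/01/27"),
    (1234, 4, "2022/01/28"),
    (4567, 1, "2022/01/29"),
    (4567, 1, "2022/01/20"),
    (4567, 3, "2022/01/21"),
]

for row in flag_repeats(rows):
    print(*row)
```

Filtering the result on flag == 1 then keeps only the rows where the code changed.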

Flag everytime when ID change date DAX

I have a table with orders, the articles belonging to those orders, and their shipping dates. What I want to do is set a flag every time the shipping date changes within an order or, when all dates for an OrderID are the same, flag only once.
I tried to use calculated columns written in DAX, like nextdate, prevdate, nextorder and prevorder, and refer to them, but it doesn't work.
I would appreciate any tip on how to solve my problem. Thanks!
OrderID  Article ID  Shipping date  Flag
123      1           01.01.2012     1
123      2           01.01.2012     0
123      1           02.01.2012     1
1234     12          15.03.2012     1
678      12          25.05.2014     1
678      345         25.05.2014     0
678      567         25.05.2014     0
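One reading of the expected Flag column is "1 for the first row of each distinct (OrderID, Shipping date) pair, 0 otherwise". The Python sketch below only checks that reading against the sample rows; it is not DAX, and the function name flag_first_per_order_date is made up for illustration.

```python
def flag_first_per_order_date(rows):
    """Flag the first row seen for each (order_id, shipping_date) pair."""
    seen = set()
    flags = []
    for order_id, _article_id, shipping_date in rows:
        key = (order_id, shipping_date)
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags

rows = [
    (123, 1, "01.01.2012"),
    (123, 2, "01.01.2012"),
    (123, 1, "02.01.2012"),
    (1234, 12, "15.03.2012"),
    (678, 12, "25.05.2014"),
    (678, 345, "25.05.2014"),
    (678, 567, "25.05.2014"),
]

print(flag_first_per_order_date(rows))
```

If that reading is right, the same rule in DAX would amount to ranking rows within each OrderID and shipping date and flagging the first one.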

SQL retrieving 1st run data from a date range

I need to retrieve 1st run information from an Oracle database for a particular date range. First run means keeping only the earliest row for each serial number and ignoring any rows where the same serial number is run again at a later time.
Note: 1 = Passed, 0 = Fail
Example of my data is:
SERIALNUM TIMESTAMP_ PASSED …{more data}
001 2015-01-07T11:22:50 0
002 2015-01-07T11:24:00 0
003 2015-01-07T11:25:50 1
001 2015-01-07T11:26:50 1
004 2015-01-07T11:28:50 1
005 2015-01-07T11:29:50 1
006 2015-01-07T11:31:50 1
002 2015-01-07T11:30:50 0
002 2015-01-07T11:33:50 1
007 2015-01-07T11:35:50 1
008 2015-01-07T11:36:50 1
0010 2015-01-07T11:39:50 1
009 2015-01-07T11:37:50 1
Desired results, 10 units tested, 2 failed, 8 passed.
Using Excel to get my proper 1st run data I:
[step1] Delete rows outside of my date range.
[step2] Sort by SERIALNUM (1st level) TIMESTAMP_ (2nd level).
[step3] Remove Duplicate SERIALNUM.
[step4] Then count the number of passed units (1 = pass).
This gives me my desired results.
Changing the order gives me undesired results.
I can get the data from the database from my selected range by using:
SELECT SERIALNUM, TIMESTAMP_, PASSED
FROM dbTble
WHERE TO_DATE('01/07/2015 09:00:00', 'MM/DD/YYYY HH24:MI:SS') <= TIMESTAMP_
AND TIMESTAMP_ < TO_DATE('01/07/2015 14:59:59', 'MM/DD/YYYY HH24:MI:SS')
ORDER BY SERIALNUM, TIMESTAMP_
It seems like I should be using subquery, but I saw a note that subquery cannot be sorted.
How can I accomplish this with SQL command?
This returns, for each serial number, only its first run within the date range. Note that if a serial number also has a run earlier than the start of the range, that serial number is not excluded; its earliest run inside the range is still returned:
SELECT serialnum, timestamp_, passed FROM (
    SELECT serialnum, timestamp_, passed
         , ROW_NUMBER() OVER ( PARTITION BY serialnum ORDER BY timestamp_ ) AS rn
      FROM dbtable
     WHERE TO_DATE('01/07/2015 09:00:00', 'MM/DD/YYYY HH24:MI:SS') <= timestamp_
       AND timestamp_ < TO_DATE('01/07/2015 14:59:59', 'MM/DD/YYYY HH24:MI:SS')
) WHERE rn = 1
ORDER BY serialnum, timestamp_
The window function ROW_NUMBER() numbers the rows of each serialnum starting from the earliest timestamp_, so rn = 1 is the first run (add DESC after timestamp_ to get the latest run instead).
Hope this helps.
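To check the logic against the sample data, here is a small Python sketch of the same "earliest row per serial number within the range" rule. It is an illustration only, not a replacement for the query; the function name first_runs is made up, and the rows are the question's example.

```python
from datetime import datetime

def first_runs(rows, start, end):
    """Keep the earliest (timestamp, passed) per serial number with start <= ts < end."""
    first = {}
    for serial, ts_text, passed in rows:
        ts = datetime.fromisoformat(ts_text)
        if not (start <= ts < end):
            continue
        if serial not in first or ts < first[serial][0]:
            first[serial] = (ts, passed)
    return first

rows = [
    ("001", "2015-01-07T11:22:50", 0),
    ("002", "2015-01-07T11:24:00", 0),
    ("003", "2015-01-07T11:25:50", 1),
    ("001", "2015-01-07T11:26:50", 1),
    ("004", "2015-01-07T11:28:50", 1),
    ("005", "2015-01-07T11:29:50", 1),
    ("006", "2015-01-07T11:31:50", 1),
    ("002", "2015-01-07T11:30:50", 0),
    ("002", "2015-01-07T11:33:50", 1),
    ("007", "2015-01-07T11:35:50", 1),
    ("008", "2015-01-07T11:36:50", 1),
    ("0010", "2015-01-07T11:39:50", 1),
    ("009", "2015-01-07T11:37:50", 1),
]

runs = first_runs(rows, datetime(2015, 1, 7, 9, 0, 0), datetime(2015, 1, 7, 14, 59, 59))
tested = len(runs)
passed = sum(p for _, p in runs.values())
print(tested, "tested,", tested - passed, "failed,", passed, "passed")
```

This reproduces the desired result from the question: 10 units tested, 2 failed, 8 passed.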

Pandas dataframe applying NA to part of the data

Let me preface this by saying I am new to pandas, so I'm sorry if this question is basic or has been answered before; I looked online and couldn't find what I needed.
I have a dataframe that consists of a baseball team's schedule. Some of the games have already been played, so their results have been entered in the dataframe. However, for games that are yet to happen, there is only the time they are to be played (e.g. 1:35 pm).
So, I would like to convert all of the values for the games yet to happen into NAs.
Thank you
As requested here is what the results dataframe for the Arizona Diamondbacks contains
print(MLB['ARI'])
0 0
1 0
2 0
3 1
4 0
5 0
6 0
7 0
8 1
9 0
10 1
...
151 3:40 pm
152 8:40 pm
153 8:10 pm
154 4:10 pm
155 4:10 pm
156 8:10 pm
157 8:10 pm
158 1:10 pm
159 9:40 pm
160 8:10 pm
161 4:10 pm
Name: ARI, Length: 162, dtype: object
Couldn't figure out any direct solution, only an iterative one (assuming numpy is imported as np):
    col = MLB.columns.get_loc('ARI')  # integer position of the 'ARI' column
    for i in range(len(MLB)):
        value = MLB.iat[i, col]
        if 'pm' in value or 'am' in value:
            MLB.iat[i, col] = np.nan
This should work if your actual values (1s and 0s) are also strings. If they are numbers, try:
    col = MLB.columns.get_loc('ARI')
    for i in range(len(MLB)):
        if not isinstance(MLB.iat[i, col], int):  # anything that is not an integer result
            MLB.iat[i, col] = np.nan
The more idiomatic way to do this would be with the vectorised string methods.
http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods
    mask = MLB['ARI'].str.contains('am|pm', na=False)  # boolean mask; non-string values count as False
    MLB.loc[mask, 'ARI'] = np.nan  # select rows and column in a single .loc call
Create the boolean mask first and then use it to select the data you want to overwrite.
Do the selection in one indexing step. With chained indexing you may be acting on a copy of the data, in which case your original dataframe won't get updated:
    MLB.loc[mask, 'ARI']  # one indexing step; assigning through it updates MLB
    MLB[mask]['ARI']  # returns a copy of MLB; assigning to it won't update MLB

rrd graph configurate query

I am updating my RRD file with some counts...
For example:
time: value:
12:00 120
12:05 135
12:10 154
12:20 144
12:25 0
12:30 23
13:35 36
Here my RRD is updated using the following logic:
((current value)-(previous value))/((current time)-(previous time))
e.g. (135-120)/5 = 3
But my problem is that when a 0 comes in, the reading will be negative:
(0-144)/5
The "0" value only occurs on a system failure (in the system the data is fetched from). This reading must not be displayed on the graph.
How can I configure it so that when a 0 comes in, the RRD graph is not updated (skipping the reading (0-144)/5), and the next reading is taken as (23-0)/5 and not (23-144)/10?
When you define the data sources while creating the RRD, you can specify which range of values is acceptable.
DS:data_source:GAUGE:10:1:U (heartbeat 10, minimum 1, no maximum) will only accept values of 1 or higher.
So if you get a 0 during an update, RRDtool will store it as unknown instead, and I assume the graph can then leave it out.
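To make the arithmetic in the question concrete, here is a small Python sketch of the rate formula that skips any interval ending in a 0 reading. It models only the behaviour the question asks for, not what RRDtool does internally; the function name rates_skipping_zero is made up, and times are expressed as minutes after 12:00.

```python
def rates_skipping_zero(readings):
    """readings: list of (minute, value) in time order; returns (minute, rate or None)."""
    rates = []
    for (prev_t, prev_v), (cur_t, cur_v) in zip(readings, readings[1:]):
        if cur_v == 0:
            # A 0 reading marks a system failure: record no rate for this interval.
            rates.append((cur_t, None))
            continue
        rates.append((cur_t, (cur_v - prev_v) / (cur_t - prev_t)))
    return rates

readings = [(0, 120), (5, 135), (10, 154), (20, 144), (25, 0), (30, 23)]

for minute, rate in rates_skipping_zero(readings):
    print(minute, rate)
```

The 12:25 interval yields no rate, and the 12:30 interval is computed as (23-0)/5, as the question requests.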