I have a dataframe with article IDs and their publication dates (timestamps). I need to use this information to compute a freshness score for each article.
articleId publicationDate
0 581354 2017-09-17 15:16:55
1 581655 2017-09-18 07:37:51
2 580864 2017-09-16 06:44:39
3 581610 2017-09-18 06:30:30
4 581605 2017-09-18 07:22:24
The most recent article should get the highest score. The time window should be half an hour (two articles published within the same half hour must get the same score).
Some of the code below might be redundant, but it seems to work:
import numpy as np
df['score'] = (df['publicationDate'] - df['publicationDate'].max()) / np.timedelta64(1, 'm')  # minutes before the newest article
df['score'] = np.ceil(df['score'] / 30).rank(method='max')  # same 30-minute window -> same value, then rank
So you convert the timedelta to minutes, round it up to a 30-minute window, and finally rank that value.
It can also be a one-liner if you prefer:
df['score'] = np.ceil((df['publicationDate'] - df['publicationDate'].max()) / np.timedelta64(1, 'm') / 30).rank(method='max')
Explanation:
df['publicationDate'] - df['publicationDate'].max() - subtract the most recent date from all dates (0 for the newest article, negative for older ones)
/ np.timedelta64(1, 'm') - convert the timedeltas into minutes
np.ceil(... / 30) - round up to 30-minute windows counted back from the most recent timestamp, so all articles published within the same half hour get the same value
.rank(method='max') - rank the results, giving all tied values the highest rank in their group, so the newest window gets the highest score.
EDIT:
To set the score of articles older than 2 days to 0, you can use this:
df['diff'] = (df['publicationDate'].max() - df['publicationDate']).dt.days
df.loc[df['diff'] >= 2, 'score'] = 0
The first line gives the age of each article in whole days, measured back from the most recent article, and the second one sets the score to 0 where the age is 2 days or more.
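A self-contained run on the sample data from the question may help to check the scoring. This is a sketch that buckets with np.ceil (30-minute windows counted back from the newest article) and measures the 2-day cutoff as a positive age in whole days; the column names are the ones from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "articleId": [581354, 581655, 580864, 581610, 581605],
    "publicationDate": pd.to_datetime([
        "2017-09-17 15:16:55", "2017-09-18 07:37:51", "2017-09-16 06:44:39",
        "2017-09-18 06:30:30", "2017-09-18 07:22:24",
    ]),
})

# Minutes before the newest article (0 for the newest, negative for older ones)
minutes = (df["publicationDate"] - df["publicationDate"].max()) / np.timedelta64(1, "m")
# Same 30-minute window -> same bucket; newer windows -> higher rank
df["score"] = np.ceil(minutes / 30).rank(method="max")

# Articles at least 2 whole days older than the newest one get score 0
age_days = (df["publicationDate"].max() - df["publicationDate"]).dt.days
df.loc[age_days >= 2, "score"] = 0
print(df)
```

Articles 581655 and 581605 are 15 minutes apart, so they share the top score, while 580864 is more than 2 days older than the newest article and drops to 0.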
So, I have a table with many columns, and what I am trying to do is keep a running total of the sales within each hour and then reset it when the next hour starts. I have tried the SUMMARIZE keyword, but it doesn't seem to let me accumulate the values. At the moment my data is in 15-minute bands, so each row shows the sales in that 15-minute period, but I would like to accumulate them within each hour.
This is what I have now:
15minperiod  Sales
09:00:00     10
09:15:00     10
09:30:00     10
09:45:00     10
10:00:00     10
10:15:00     9
10:20:00     13
This is what I would like to get:
15minperiod  Sales  Sales in hour
09:00:00     10     10
09:15:00     10     20
09:30:00     10     30
09:45:00     10     40
10:00:00     10     10
10:15:00     9      19
10:20:00     13     32
Yes, this can be done with a calculated column like this:
Sales in an hour =
var currentTime = [15minperiod]
return
CALCULATE(
SUM('Data'[Sales]);
FILTER(
ALL('Data');
'Data'[15minperiod] <= currentTime && HOUR('Data'[15minperiod]) = HOUR(currentTime)
)
)
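To check the logic of the calculated column outside Power BI, the same "sum everything up to this row within the same hour" rule can be sketched in pandas. This is an equivalent illustration, not a Power BI feature; like the DAX filter, it looks only at the hour, and it assumes the rows are sorted by time and belong to a single day.

```python
import pandas as pd

# Sample data from the question, one row per time band
data = pd.DataFrame({
    "15minperiod": ["09:00:00", "09:15:00", "09:30:00", "09:45:00",
                    "10:00:00", "10:15:00", "10:20:00"],
    "Sales": [10, 10, 10, 10, 10, 9, 13],
})

# Hour of each band, playing the role of HOUR('Data'[15minperiod])
hour = pd.to_datetime(data["15minperiod"], format="%H:%M:%S").dt.hour

# Running total that restarts every hour; the sorted row order stands in
# for the 'Data'[15minperiod] <= currentTime condition
data["Sales in hour"] = data.groupby(hour)["Sales"].cumsum()
print(data)
```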
So I am programming in ARM assembly on Raspbian and I am trying to convert the epoch time using C/C++ library functions, because that is what I am allowed to use, but I am confused about how to do it. If I simply bl time, it gives me the epoch time, but I don't know how to take the return value in r0 and convert it into local time from assembly using the C or C++ libraries. I know localtime/gmtime and strftime exist, but it's not as easy as getting the epoch and just calling bl localtime or bl strftime. I then want to format it so that I only get the local time and maybe AM/PM; I am not interested in the date. I just need some helpful code, or a push in the right direction. Thanks
Edit: If it's easier to just convert using math, that would also be helpful.
So as not to do your homework for you, here is an example. All of this is basic grade-school math, no magic.
If I have 345678 pennies, how does that break down into various dollar and coin amounts?
There are 100 pennies in a dollar and 100 dollars in a 100 dollar bill, so
345678 / (100*100) = 34 remainder 5678
Looking at the units:
pennies / (pennies/dollar * dollars/hundred) =
(pennies * dollars * hundreds) / (pennies * dollars )
The pennies and dollars cancel out, leaving hundreds, which is correct,
so 34 hundred-dollar bills with a remainder of 5678 pennies.
repeat for 20 dollar bills
5678 / (100 * 20) = 2 remainder 1678
10 dollar bills
1678 / (100 * 10) = 1 remainder 678
5 dollar bills
678 / (100 * 5) = 1 remainder 178
one dollar bills
178 / (100 * 1) = 1 remainder 78
50 cent pieces
78 / 50 = 1 remainder 28
quarters (25 cents)
28 / 25 = 1 remainder 3
dimes (10 cents) 0 remainder 3
nickels (5 cents) 0 remainder 3
pennies the remainder 3
so 345678 pennies is equal to
34 100 dollar bills +
2 20 dollar bills +
1 10 dollar bill +
1 5 dollar bill +
1 1 dollar bill +
1 half dollar coin +
1 quarter +
3 pennies
check the work
34 * 100 * 100 = 340000
2 * 100 * 20 = 4000
1 * 100 * 10 = 1000
1 * 100 * 5 = 500
1 * 100 = 100
1 * 50 = 50
1 * 25 = 25
3 * 1 = 3
add that up you get 345678
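The repeated "divide, keep the remainder, move to the next denomination" steps above map directly onto divmod. A minimal sketch, assuming the same US denominations used in the example; DENOMINATIONS and break_down are names introduced here for illustration.

```python
# Value of each denomination in pennies, largest first
DENOMINATIONS = [
    ("100 dollar bill", 100 * 100),
    ("20 dollar bill", 100 * 20),
    ("10 dollar bill", 100 * 10),
    ("5 dollar bill", 100 * 5),
    ("1 dollar bill", 100 * 1),
    ("half dollar", 50),
    ("quarter", 25),
    ("dime", 10),
    ("nickel", 5),
    ("penny", 1),
]


def break_down(pennies):
    """Split an amount in pennies into counts per denomination."""
    counts = {}
    for name, value in DENOMINATIONS:
        # quotient = how many of this denomination, remainder = what is left over
        counts[name], pennies = divmod(pennies, value)
    return counts


print(break_down(345678))
```

The same pattern, with 60, 60 and 24 in place of the coin values, is what turns a count of seconds into hours, minutes and seconds.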
If I simply wanted to know how many quarters
345678 / (25 * 1) = 13827 quarters with a remainder of 3.
It all works the same way with 60 seconds per minute, 60 minutes per hour, 24 hours per day, 365 days for 1970, 365 days for 1971, 366 days for 1972, 365 days for 1973, and so on;
31 days for January, 28 days for February 2019, 31 days for March, and so on.
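The year and month walk described above can be sketched in Python before porting it to assembly. This assumes a whole, non-negative number of days since 1970-01-01 (dates from the epoch onward) and the Gregorian leap-year rule; is_leap and days_to_date are names introduced for illustration.

```python
def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_to_date(days):
    """Convert whole days since 1970-01-01 (days >= 0) into (year, month, day)."""
    # Subtract off whole years
    year = 1970
    while True:
        year_length = 366 if is_leap(year) else 365
        if days < year_length:
            break
        days -= year_length
        year += 1
    # Then subtract off whole months of that year
    month_lengths = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    month = 1
    for length in month_lengths:
        if days < length:
            break
        days -= length
        month += 1
    # What is left is the zero-based day of the month
    return year, month, days + 1


days, seconds_today = divmod(1551398400, 24 * 60 * 60)
print(days_to_date(days))  # (2019, 3, 1)
```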
It is easier to adjust for the timezone first: 60 seconds * 60 minutes * hours of adjustment.
Add or subtract that as needed, then work that number through the years/months/days, or, if you only care about the time of day, you only need to divide by the seconds per day. Dividing by the seconds per day gives the total days as the quotient and the fraction of a day as the remainder; the fraction of a day is the time of day today, and from the total days you can later subtract off the years and then the months to find the date.
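Following the advice to prove the algorithm in a high-level language first, here is a minimal Python sketch of the time-of-day part: adjust for the timezone, keep the remainder after dividing by seconds per day, then split it with divmod and format it as 12-hour time with AM/PM. It assumes an integer epoch value and a fixed UTC offset in hours (utc_offset_hours is a hypothetical parameter); daylight saving time is not handled.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 60 * SECONDS_PER_MINUTE
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def local_time_of_day(epoch_seconds, utc_offset_hours):
    """Return the local time of day as 'h:mm:ss AM/PM' (fixed offset, no DST)."""
    # Adjust for the timezone first (int() keeps half-hour offsets exact)
    local = epoch_seconds + int(utc_offset_hours * SECONDS_PER_HOUR)
    # Quotient = whole days since the epoch (unused here), remainder = seconds into today
    _whole_days, seconds_today = divmod(local, SECONDS_PER_DAY)
    hours, rest = divmod(seconds_today, SECONDS_PER_HOUR)
    minutes, seconds = divmod(rest, SECONDS_PER_MINUTE)
    suffix = "AM" if hours < 12 else "PM"
    hour12 = hours % 12 or 12  # 0 -> 12 AM, 13 -> 1 PM
    return f"{hour12}:{minutes:02d}:{seconds:02d} {suffix}"


print(local_time_of_day(1551398400, -5))  # 7:00:00 PM
```

In assembly the same steps are a couple of integer divisions with remainders, done on the value that bl time leaves in r0.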
Extra credit: in what year will computers with 32-bit time counters using the 1970 epoch have a Y2K-like rollover event (causing crashes and death and destruction across the planet, just like Y2K)?
The programming language is irrelevant until the algorithm is understood and, ideally, coded in a favorite high-level language to confirm/prove the algorithm. Then port that knowledge to some other programming language.
Shortcutting the steps will sometimes get you there faster but when the shortcut fails it fails in spectacular fashion.
from datetime import timedelta
time1 = timedelta(days=2, hours=6.20)
time2 = timedelta(hours=20.10)
sum_time = time1 + time2
print(str(sum_time))
print(sum_time.total_seconds() / 3600)
Output:
3 days, 2:18:00
74.3
How do I get the output 74:18:00?
With total_seconds() / 3600 you only get the hours in decimal format.
You can use divmod to break down the seconds into full hours, minutes and seconds:
divmod(a, b)
Take two (non complex) numbers as arguments and return a pair of numbers consisting of their quotient and remainder when using long division
The code would look like this (I added 34 seconds in time2 to check if the seconds part is correct):
from datetime import timedelta
time1 = timedelta(days=2, hours=6.20)
time2 = timedelta(hours=20.10, seconds=34)
sum_time = time1 + time2
print(str(sum_time))
hours, seconds = divmod(sum_time.total_seconds(), 3600)
minutes, seconds = divmod(seconds, 60)
print("%d:%02d:%02d" % (hours, minutes, seconds))
and the output will be:
3 days, 2:18:34
74:18:34
The result of the first divmod is 74 hours (quotient) and a remainder of 1114 seconds.
The second divmod is fed the remaining seconds from the line before (1114) and gives a result of 18 minutes and 34 seconds.
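If this formatting is needed in several places, the two divmod steps can be wrapped in a small helper. This sketch (format_total_hours is a name introduced here) builds the integer second count from the timedelta's days and seconds attributes instead of the float from total_seconds(); it assumes a non-negative duration and drops any microseconds.

```python
from datetime import timedelta


def format_total_hours(td):
    """Format a non-negative timedelta as H:MM:SS without wrapping hours into days."""
    # Whole seconds as an integer (microseconds are dropped)
    total_seconds = td.days * 86400 + td.seconds
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "%d:%02d:%02d" % (hours, minutes, seconds)


time1 = timedelta(days=2, hours=6.20)
time2 = timedelta(hours=20.10, seconds=34)
print(format_total_hours(time1 + time2))  # 74:18:34
```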
I'm using SQLite with C++ and have two similar problems:
1) I need to select 4 entries to make an interpolation.
For example, my table could look like this :
angle (double) | color (double)
0 0.1
30 0.5
60 0.9
90 1.5
... ...
300 2.9
330 3.5
If I want to interpolate the value corresponding to 95°, I will use the entries 60°, 90°, 120° and 150°.
To get those entries, my query will be SELECT color FROM mytable WHERE angle BETWEEN 60 and 150, no big deal.
Now if I want 335°, I will need 300°, 330°, 360° (= 0°) and 390° (= 30°).
My query will then be SELECT color FROM mytable WHERE angle BETWEEN 300 and 330 OR angle BETWEEN 0 and 30.
I can't use SELECT color FROM mytable WHERE angle BETWEEN 300 and 390 because this will only return 2 colors.
Can I use the C API and user-defined functions to give my queries some kind of modulo behavior?
It would be nice if I could use a user-defined function so that the query [...] BETWEEN 300 and 390 returns the rows 300, 330, 0 and 30.
2) Another table looks like this:
speed (double) | color (double) | var (double)
0 0.1 0
10 0.5 1
20 0.9 2
30 1.5 3
... ... ...
In reality, due to symmetry, color(speed) = color(-speed) but var(-speed) = myfunc(var(speed)).
I would like to make queries such as SELECT * FROM mytable WHERE speed BETWEEN -20 and 10, perform a few operations through the API on the "virtual" rows with a negative speed, and return them as a regular result.
For example, I would like the result of the query SELECT * FROM mytable WHERE speed BETWEEN -20 and 10 to look like this:
speed (double) | color (double) | var (double)
-20 0.9 myfunc(2)
-10 0.5 myfunc(1)
0 0.1 0
10 0.5 1
Is that possible ?
Thanks for your help :)
I would suggest using a query with two intervals:
SELECT * FROM mytable WHERE (angle >= MIN(?1,?2) AND angle <= MAX(?1,?2)) OR ((MAX(?1,?2) > 360) AND (angle >= 0 AND angle <= MAX(?1,?2)%360));
This example works fine if ?1 and ?2 are positive.
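The two-interval query can be tried quickly with Python's built-in sqlite3 module, which talks to the same SQLite engine as the C API. In this sketch the table holds one row every 30 degrees on the angle column, and the color column is left NULL because the question lists only some of the colors; the numbered parameters ?1 and ?2 are bound from a two-element tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (angle REAL, color REAL)")
# One row every 30 degrees; colors omitted (NULL)
conn.executemany("INSERT INTO mytable (angle) VALUES (?)",
                 [(float(a),) for a in range(0, 360, 30)])

# Two intervals: the requested range, plus the part that wraps past 360
query = """
SELECT angle FROM mytable
WHERE (angle >= MIN(?1, ?2) AND angle <= MAX(?1, ?2))
   OR ((MAX(?1, ?2) > 360) AND (angle >= 0 AND angle <= MAX(?1, ?2) % 360))
ORDER BY angle
"""
angles = [row[0] for row in conn.execute(query, (300, 390))]
print(angles)  # [0.0, 30.0, 300.0, 330.0]
```

SQLite's % operator works on integers, so 390 % 360 gives 30 here; the ORDER BY is only for a stable output and does not restore the 300, 330, 0, 30 order needed for interpolation.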
I'm already (successfully) recording and plotting three different temperature values (preset, room and outside):
"rrdtool create " + config.app_dir + "/" + config.rrd_name + " " + //
"--start N --step 300 " + // data bucket 5 min long
"DS:temp_preset:GAUGE:600:-30:40 " + // human defined
"DS:temp_living:GAUGE:600:-30:40 " + // measured in living room
"DS:temp_outside:GAUGE:600:-30:40 " + // online value
"RRA:AVERAGE:0.5:1:288 " + // 5 min avg., last 24 hours
"RRA:AVERAGE:0.5:12:168 " + // 1 hour avg., last 7 days
"RRA:AVERAGE:0.5:48:315 " + // 4 hour avg., last 30 days
"RRA:AVERAGE:0.5:288:365" // 1 day avg., last 365 days
Let's say I want to add another DS, but this one records an on/off (1/0) value: heater working / heater not working.
Would this be a correct DST and xfiles factor (XFF)?
DS:heater_state:GAUGE:600:0:1 \
RRA:LAST:0:1:288
GAUGE is fine; just note that as the data is consolidated you will end up with values between 0 and 1 representing the fraction of the observation interval during which the heater was on. If you multiply the data by 100, you get a percentage.
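A simplified illustration of why a 0/1 GAUGE turns into an "on" fraction: an AVERAGE RRA that consolidates 12 five-minute steps stores their mean. The readings below are made up, and the sketch assumes each reading fills exactly one step (rrdtool's own time-weighting is ignored).

```python
# Hypothetical heater states for twelve consecutive 5-minute steps (1 = on, 0 = off)
readings = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

# Mean of the 12 steps, as stored by an hourly AVERAGE consolidation
hourly_average = sum(readings) / len(readings)
# Fraction of the hour the heater was on, as a percentage
heater_on_percent = hourly_average * 100
print(f"{heater_on_percent:.1f}% of the hour")
```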
Skip the special XFF and the LAST RRA; just your normal
"RRA:AVERAGE:0.5:1:288 " + // 5 min avg., last 24 hours
"RRA:AVERAGE:0.5:12:168 " + // 1 hour avg., last 7 days
"RRA:AVERAGE:0.5:48:315 " + // 4 hour avg., last 30 days
"RRA:AVERAGE:0.5:288:365" // 1 day avg., last 365 days
will do fine. For added detail, you may want to add MIN and MAX variants for the top three consolidation levels.
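Here is a sketch of the full argument list with the heater DS and the extra MIN/MAX archives, built in Python. It reads "top three consolidation levels" as the three levels that combine more than one step (with a single step, MIN and MAX would just repeat the AVERAGE). rrd_create_args and rrd_path are hypothetical names, and nothing is executed here.

```python
def rrd_create_args(rrd_path):
    """Build an rrdtool create argument list (sketch; not executed here)."""
    args = ["rrdtool", "create", rrd_path, "--start", "N", "--step", "300"]
    # The three temperature sources from the question
    for name in ("temp_preset", "temp_living", "temp_outside"):
        args.append(f"DS:{name}:GAUGE:600:-30:40")
    # On/off heater state as a plain GAUGE between 0 and 1
    args.append("DS:heater_state:GAUGE:600:0:1")
    # (steps, rows) of each consolidation level from the question
    levels = [(1, 288), (12, 168), (48, 315), (288, 365)]
    for steps, rows in levels:
        args.append(f"RRA:AVERAGE:0.5:{steps}:{rows}")
    # MIN and MAX variants for the three multi-step levels
    for steps, rows in levels[1:]:
        for cf in ("MIN", "MAX"):
            args.append(f"RRA:{cf}:0.5:{steps}:{rows}")
    return args


print(" ".join(rrd_create_args("example.rrd")))
```

Where rrdtool is installed, the list could be passed to subprocess.run(..., check=True) instead of being printed.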