rrdtool fetch command always returns -nan value

I have an rrd file whose output from rrdtool info is as follows:
$rrdtool info ifInOctets.rrd
filename = "ifInOctets.rrd"
rrd_version = "0003"
step = 300
last_update = 1497312000
header_size = 1416
ds[ifInOctets].index = 0
ds[ifInOctets].type = "COUNTER"
ds[ifInOctets].minimal_heartbeat = 900
ds[ifInOctets].min = 0.0000000000e+00
ds[ifInOctets].max = 9.9999999990e+09
ds[ifInOctets].last_ds = "3.4019552910E+09"
ds[ifInOctets].value = 8.1322780295e+08
ds[ifInOctets].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 2016
rra[0].cur_row = 1544
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 1488
rra[1].cur_row = 754
rra[1].pdp_per_row = 12
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 1.0418342945e+08
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 366
rra[2].cur_row = 84
rra[2].pdp_per_row = 288
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 1.1808838469e+09
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "MAX"
rra[3].rows = 366
rra[3].cur_row = 29
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 1.3983258476e+07
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[4].cf = "MIN"
rra[4].rows = 366
rra[4].cur_row = 101
rra[4].pdp_per_row = 288
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = 5.7478020724e+05
rra[4].cdp_prep[0].unknown_datapoints = 0
And the XML export of the data looks (in part) like:
<!-- Tue Jun 06 10:00:00 EDT 2017 / 1496757600 -->
<row>
<v>+6.3341370319E06</v>
</row>
<!-- Tue Jun 06 10:05:00 EDT 2017 / 1496757900 -->
<row>
<v>+3.0319877350E06</v>
</row>
<!-- Tue Jun 06 10:10:00 EDT 2017 / 1496758200 -->
<row>
<v>+9.8097124846E06</v>
</row>
<!-- Tue Jun 06 10:15:00 EDT 2017 / 1496758500 -->
<row>
<v>+1.0005290356E07</v>
</row>
<!-- Tue Jun 06 10:20:00 EDT 2017 / 1496758800 -->
<row>
But for some reason all that rrdtool will output is
$rrdtool fetch ifInOctets.rrd AVERAGE
ifInOctets
1497541500: -nan
1497541800: -nan
1497542100: -nan
1497542400: -nan
1497542700: -nan
So far I've tried adjusting the min, max, and step, but I've had no luck. Any help would be much appreciated.

This is because you haven't specified the resolution, start and end. Without --start and --end, fetch defaults to a window ending at the current time, which here falls after the file's last_update (1497312000), so every row in that window comes back as unknown (-nan).
From the rrdtool usage:
Usage: rrdtool [options] command command_options
* fetch - fetch data out of an RRD
rrdtool fetch filename.rrd CF
[-r|--resolution resolution]
[-s|--start start] [-e|--end end]
[-a|--align-start]
[-d|--daemon <address>]
This will work,
rrdtool fetch ifInOctets.rrd AVERAGE -r 300 -s 1496757600 -e 1496758800
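More generally, any window that ends at or before the file's last_update (1497312000) should return real data; for example, this (untested) variant fetches the last full day of data recorded in the file:
rrdtool fetch ifInOctets.rrd AVERAGE -r 300 -s 1497225600 -e 1497312000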

Related

How to create a measure which counts the # of rows divided by the total # of rows

I have a table like this:
Table
Name  HasEntry  Time
A     true      jan 22
A     false     jan 22
A     true      jan 22
A     true      jan 22
B     true      jan 22
B     false     jan 22
B     true      jan 22
I want a measure which gives Ratio = (# of rows with HasEntry = true) / (total # of rows for each Name),
which means for A the ratio is 3/4 = 0.75 and for B it is 2/3 = 0.66.
I tried doing
Ratio = DIVIDE(COUNTROWS(FILTER(Table, Table[HasENtry] = TRUE)), COUNT(Table[HasENtry]))
But when I use the ratio on the y-axis of my line chart, I get the error "Can't display the visual: The function COUNT cannot work with values of type BOOLEAN."
So how do I count the # of rows for each name in my measure?
Use COUNTA() instead; it works on boolean columns.
https://dax.guide/counta/
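For reference, a sketch of the corrected measure (assuming the column is actually named HasEntry, as in the sample table):
Ratio = DIVIDE(COUNTROWS(FILTER(Table, Table[HasEntry] = TRUE())), COUNTA(Table[HasEntry]))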

All unique combinations of given length from list of values in Libre Office

I have several, let's say six, different values. Can be numbers from 1 to 6.
I want to quickly list all unique combinations of four, so 1-2-3-4, 1-2-3-5 ... 3-4-5-6, all of them, but without any numbers showing more than once.
I'd like to do it in Libre Office Calc or Libre Office Base, but thus far I haven't had much luck searching for a way to do it. I'd be really grateful for any ideas.
there you go:
1234 1235 1236 1243 1245 1246 1253 1254 1256 1263 1264 1265 1324 1325 1326 1342 1345 1346 1352 1354 1356 1362 1364 1365 1423 1425 1426 1432 1435 1436 1452 1453 1456 1462 1463 1465 1523 1524 1526 1532 1534 1536 1542 1543 1546 1562 1563 1564 1623 1624 1625 1632 1634 1635 1642 1643 1645 1652 1653 1654 2134 2135 2136 2143 2145 2146 2153 2154 2156 2163 2164 2165 2314 2315 2316 2341 2345 2346 2351 2354 2356 2361 2364 2365 2413 2415 2416 2431 2435 2436 2451 2453 2456 2461 2463 2465 2513 2514 2516 2531 2534 2536 2541 2543 2546 2561 2563 2564 2613 2614 2615 2631 2634 2635 2641 2643 2645 2651 2653 2654 3124 3125 3126 3142 3145 3146 3152 3154 3156 3162 3164 3165 3214 3215 3216 3241 3245 3246 3251 3254 3256 3261 3264 3265 3412 3415 3416 3421 3425 3426 3451 3452 3456 3461 3462 3465 3512 3514 3516 3521 3524 3526 3541 3542 3546 3561 3562 3564 3612 3614 3615 3621 3624 3625 3641 3642 3645 3651 3652 3654 4123 4125 4126 4132 4135 4136 4152 4153 4156 4162 4163 4165 4213 4215 4216 4231 4235 4236 4251 4253 4256 4261 4263 4265 4312 4315 4316 4321 4325 4326 4351 4352 4356 4361 4362 4365 4512 4513 4516 4521 4523 4526 4531 4532 4536 4561 4562 4563 4612 4613 4615 4621 4623 4625 4631 4632 4635 4651 4652 4653 5123 5124 5126 5132 5134 5136 5142 5143 5146 5162 5163 5164 5213 5214 5216 5231 5234 5236 5241 5243 5246 5261 5263 5264 5312 5314 5316 5321 5324 5326 5341 5342 5346 5361 5362 5364 5412 5413 5416 5421 5423 5426 5431 5432 5436 5461 5462 5463 5612 5613 5614 5621 5623 5624 5631 5632 5634 5641 5642 5643 6123 6124 6125 6132 6134 6135 6142 6143 6145 6152 6153 6154 6213 6214 6215 6231 6234 6235 6241 6243 6245 6251 6253 6254 6312 6314 6315 6321 6324 6325 6341 6342 6345 6351 6352 6354 6412 6413 6415 6421 6423 6425 6431 6432 6435 6451 6452 6453 6512 6513 6514 6521 6523 6524 6531 6532 6534 6541 6542 6543
PS: I don't think there is a way to generate them in LibreOffice, since I'm not aware of a programming language in that program; however, you can compute them online or with your own script.
If you need a script, save this code in a .html file and open it in a browser:
<html>
<body>
<script>
function finish(arr, n){
    for(let el in arr)
        if(el != n)
            return true;
    return false;
}
function updateIndexes(arr, n){
    for( i = 0; i < arr.length ; i++ ){
        if(arr[i] < n-1){
            arr[i]++;
            return true;
        }
        arr[i] = 0;
    }
    return false;
}
let from = [1,2,3,4,5,6].map((el)=>el.toString());
let length = 4;
let separator = '-';
let indexes = Array(length).fill().map(el=>el=0);
let results = [];
do{
    results.push(indexes.map(index => from[index]).join(separator));
} while (updateIndexes(indexes, from.length));
body = document.getElementsByTagName('body')[0];
results.filter((el)=>{
    for(i = 0; i < el.length ; i++)
        for(j = i+1 ; j < el.length ; j++)
            if(el.charAt(i) == el.charAt(j) && el.charAt(i) != separator)
                return false;
    return true;
}).forEach(el => body.innerHTML += el.toString()+'<br>');
</script>
</body>
</html>
what you can customize is:
let from = [1,2,3,4,5,6]; to what numbers/letters you want
let length = 4; to the length of the string you want
let separator = '-' to the separator you want (the separator here intended is the one between each sequence generated, so in this case will be 1-2-3-4 for example)
Python has a library called itertools that does this.
import itertools
l = itertools.permutations(range(1,7), 4)  # between 1 and 6, of length 4
for t in list(l):
    print("{}, ".format("-".join(str(i) for i in t)), end='')
Result:
1-2-3-4, 1-2-3-5, 1-2-3-6, 1-2-4-3, 1-2-4-5, 1-2-4-6, 1-2-5-3, 1-2-5-4, 1-2-5-6, 1-2-6-3, 1-2-6-4, 1-2-6-5, 1-3-2-4, 1-3-2-5, 1-3-2-6, 1-3-4-2, 1-3-4-5, 1-3-4-6, 1-3-5-2, 1-3-5-4, 1-3-5-6, 1-3-6-2, 1-3-6-4, 1-3-6-5, 1-4-2-3, 1-4-2-5, 1-4-2-6, 1-4-3-2, 1-4-3-5, 1-4-3-6, 1-4-5-2, 1-4-5-3, 1-4-5-6, 1-4-6-2, 1-4-6-3, 1-4-6-5, 1-5-2-3, 1-5-2-4, 1-5-2-6, 1-5-3-2, 1-5-3-4, 1-5-3-6, 1-5-4-2, 1-5-4-3, 1-5-4-6, 1-5-6-2, 1-5-6-3, 1-5-6-4, 1-6-2-3, 1-6-2-4, 1-6-2-5, 1-6-3-2, 1-6-3-4, 1-6-3-5, 1-6-4-2, 1-6-4-3, 1-6-4-5, 1-6-5-2, 1-6-5-3, 1-6-5-4, 2-1-3-4, 2-1-3-5, 2-1-3-6, 2-1-4-3, 2-1-4-5, 2-1-4-6, 2-1-5-3, 2-1-5-4, 2-1-5-6, 2-1-6-3, 2-1-6-4, 2-1-6-5, 2-3-1-4, 2-3-1-5, 2-3-1-6, 2-3-4-1, 2-3-4-5, 2-3-4-6, 2-3-5-1, 2-3-5-4, 2-3-5-6, 2-3-6-1, 2-3-6-4, 2-3-6-5, 2-4-1-3, 2-4-1-5, 2-4-1-6, 2-4-3-1, 2-4-3-5, 2-4-3-6, 2-4-5-1, 2-4-5-3, 2-4-5-6, 2-4-6-1, 2-4-6-3, 2-4-6-5, 2-5-1-3, 2-5-1-4, 2-5-1-6, 2-5-3-1, 2-5-3-4, 2-5-3-6, 2-5-4-1, 2-5-4-3, 2-5-4-6, 2-5-6-1, 2-5-6-3, 2-5-6-4, 2-6-1-3, 2-6-1-4, 2-6-1-5, 2-6-3-1, 2-6-3-4, 2-6-3-5, 2-6-4-1, 2-6-4-3, 2-6-4-5, 2-6-5-1, 2-6-5-3, 2-6-5-4, 3-1-2-4, 3-1-2-5, 3-1-2-6, 3-1-4-2, 3-1-4-5, 3-1-4-6, 3-1-5-2, 3-1-5-4, 3-1-5-6, 3-1-6-2, 3-1-6-4, 3-1-6-5, 3-2-1-4, 3-2-1-5, 3-2-1-6, 3-2-4-1, 3-2-4-5, 3-2-4-6, 3-2-5-1, 3-2-5-4, 3-2-5-6, 3-2-6-1, 3-2-6-4, 3-2-6-5, 3-4-1-2, 3-4-1-5, 3-4-1-6, 3-4-2-1, 3-4-2-5, 3-4-2-6, 3-4-5-1, 3-4-5-2, 3-4-5-6, 3-4-6-1, 3-4-6-2, 3-4-6-5, 3-5-1-2, 3-5-1-4, 3-5-1-6, 3-5-2-1, 3-5-2-4, 3-5-2-6, 3-5-4-1, 3-5-4-2, 3-5-4-6, 3-5-6-1, 3-5-6-2, 3-5-6-4, 3-6-1-2, 3-6-1-4, 3-6-1-5, 3-6-2-1, 3-6-2-4, 3-6-2-5, 3-6-4-1, 3-6-4-2, 3-6-4-5, 3-6-5-1, 3-6-5-2, 3-6-5-4, 4-1-2-3, 4-1-2-5, 4-1-2-6, 4-1-3-2, 4-1-3-5, 4-1-3-6, 4-1-5-2, 4-1-5-3, 4-1-5-6, 4-1-6-2, 4-1-6-3, 4-1-6-5, 4-2-1-3, 4-2-1-5, 4-2-1-6, 4-2-3-1, 4-2-3-5, 4-2-3-6, 4-2-5-1, 4-2-5-3, 4-2-5-6, 4-2-6-1, 4-2-6-3, 4-2-6-5, 4-3-1-2, 4-3-1-5, 4-3-1-6, 4-3-2-1, 4-3-2-5, 4-3-2-6, 4-3-5-1, 4-3-5-2, 4-3-5-6, 4-3-6-1, 4-3-6-2, 4-3-6-5, 4-5-1-2, 4-5-1-3, 4-5-1-6, 4-5-2-1, 4-5-2-3, 4-5-2-6, 4-5-3-1, 4-5-3-2, 4-5-3-6, 4-5-6-1, 4-5-6-2, 4-5-6-3, 4-6-1-2, 4-6-1-3, 4-6-1-5, 4-6-2-1, 4-6-2-3, 4-6-2-5, 4-6-3-1, 4-6-3-2, 4-6-3-5, 4-6-5-1, 4-6-5-2, 4-6-5-3, 5-1-2-3, 5-1-2-4, 5-1-2-6, 5-1-3-2, 5-1-3-4, 5-1-3-6, 5-1-4-2, 5-1-4-3, 5-1-4-6, 5-1-6-2, 5-1-6-3, 5-1-6-4, 5-2-1-3, 5-2-1-4, 5-2-1-6, 5-2-3-1, 5-2-3-4, 5-2-3-6, 5-2-4-1, 5-2-4-3, 5-2-4-6, 5-2-6-1, 5-2-6-3, 5-2-6-4, 5-3-1-2, 5-3-1-4, 5-3-1-6, 5-3-2-1, 5-3-2-4, 5-3-2-6, 5-3-4-1, 5-3-4-2, 5-3-4-6, 5-3-6-1, 5-3-6-2, 5-3-6-4, 5-4-1-2, 5-4-1-3, 5-4-1-6, 5-4-2-1, 5-4-2-3, 5-4-2-6, 5-4-3-1, 5-4-3-2, 5-4-3-6, 5-4-6-1, 5-4-6-2, 5-4-6-3, 5-6-1-2, 5-6-1-3, 5-6-1-4, 5-6-2-1, 5-6-2-3, 5-6-2-4, 5-6-3-1, 5-6-3-2, 5-6-3-4, 5-6-4-1, 5-6-4-2, 5-6-4-3, 6-1-2-3, 6-1-2-4, 6-1-2-5, 6-1-3-2, 6-1-3-4, 6-1-3-5, 6-1-4-2, 6-1-4-3, 6-1-4-5, 6-1-5-2, 6-1-5-3, 6-1-5-4, 6-2-1-3, 6-2-1-4, 6-2-1-5, 6-2-3-1, 6-2-3-4, 6-2-3-5, 6-2-4-1, 6-2-4-3, 6-2-4-5, 6-2-5-1, 6-2-5-3, 6-2-5-4, 6-3-1-2, 6-3-1-4, 6-3-1-5, 6-3-2-1, 6-3-2-4, 6-3-2-5, 6-3-4-1, 6-3-4-2, 6-3-4-5, 6-3-5-1, 6-3-5-2, 6-3-5-4, 6-4-1-2, 6-4-1-3, 6-4-1-5, 6-4-2-1, 6-4-2-3, 6-4-2-5, 6-4-3-1, 6-4-3-2, 6-4-3-5, 6-4-5-1, 6-4-5-2, 6-4-5-3, 6-5-1-2, 6-5-1-3, 6-5-1-4, 6-5-2-1, 6-5-2-3, 6-5-2-4, 6-5-3-1, 6-5-3-2, 6-5-3-4, 6-5-4-1, 6-5-4-2, 6-5-4-3,
LibreOffice allows Python scripting, so the code can be added to Calc or Base by including it in a Python-UNO macro.
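If the order of the numbers does not matter (so 1-2-3-4 and 1-2-4-3 count as the same selection), itertools.combinations yields just the 15 unique four-element subsets; a minimal sketch:
import itertools

# each 4-element subset of 1..6 appears exactly once, already in sorted order
for c in itertools.combinations(range(1, 7), 4):
    print("-".join(str(i) for i in c))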

pandas - group by: create aggregation function using multiple columns

I have the following data frame:
id my_year my_month waiting_time target
001 2018 1 95 1
002 2018 1 3 3
003 2018 1 4 0
004 2018 1 40 1
005 2018 2 97 1
006 2018 2 3 3
007 2018 3 4 0
008 2018 3 40 1
I want to group by my_year and my_month, and then within each group compute my_rate as
(# of records with waiting_time <= 90 and target = 1) / (total records in the group)
i.e. I am expecting output like:
my_year my_month my_rate
2018 1 0.25
2018 2 0.0
2018 3 0.5
I wrote the following code to compute the desired value my_rate:
def my_rate(data):
waiting_time_list = data['waiting_time']
target_list = data['target']
total = len(data)
my_count = 0
for i in range(len(data)):
if total_waiting_time_list[i] <= 90 and target_list[i] == 1:
my_count += 1
rate = float(my_count)/float(total)
return rate
df.groupby(['my_year','my_month']).apply(my_rate)
However, I got the following error:
KeyError 0
KeyErrorTraceback (most recent call last)
<ipython-input-29-5c4399cefd05> in <module>()
17
---> 18 df.groupby(['my_year','my_month']).apply(my_rate)
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
714 # ignore SettingWithCopy here in case the user mutates
715 with option_context('mode.chained_assignment', None):
--> 716 return self._python_apply_general(f)
717
718 def _python_apply_general(self, f):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, f)
718 def _python_apply_general(self, f):
719 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 720 self.axis)
721
722 return self._wrap_applied_output(
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, f, data, axis)
1727 # group might be modified
1728 group_axes = _get_axes(group)
-> 1729 res = f(group)
1730 if not _is_indexed_like(res, group_axes):
1731 mutated = True
<ipython-input-29-5c4399cefd05> in conversion_rate(data)
8 #print total_waiting_time_list[i], target_list[i]
9 #print i, total_waiting_time_list[i], target_list[i]
---> 10 if total_waiting_time_list[i] <= 90:# and target_list[i] == 1:
11 convert_90_count += 1
12 #print 'convert ', convert_90_count
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_value(self, series, key)
2426 try:
2427 return self._engine.get_value(s, k,
-> 2428 tz=getattr(series.dtype, 'tz', None))
2429 except KeyError as e1:
2430 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4363)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4046)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13913)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13857)()
KeyError: 0
Any idea what I did wrong here? And how do I fix it? Thanks!
I believe it is better to use the mean of a boolean mask per group:
def my_rate(x):
    return ((x['waiting_time'] <= 90) & (x['target'] == 1)).mean()

df = df.groupby(['my_year','my_month']).apply(my_rate).reset_index(name='my_rate')
print (df)
   my_year  my_month  my_rate
0     2018         1     0.25
1     2018         2     0.00
2     2018         3     0.50
Any idea what I did wrong here?
The problem is that waiting_time_list and target_list are not lists, but Series:
waiting_time_list = data['waiting_time']
target_list = data['target']
print (type(waiting_time_list))
<class 'pandas.core.series.Series'>
print (type(target_list))
<class 'pandas.core.series.Series'>
So indexing with i fails, because the second group keeps its original index labels (4, 5, ...), not 0, 1, ...:
if waiting_time_list[i] <= 90 and target_list[i] == 1:
To avoid this, convert the Series to lists:
waiting_time_list = data['waiting_time'].tolist()
target_list = data['target'].tolist()
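A variant of the same idea that avoids apply entirely is to build the boolean mask as a column first and then take its mean per group; a sketch using the question's column names (the helper column name ok is arbitrary):
# True where the row should be counted in the numerator
df['ok'] = (df['waiting_time'] <= 90) & (df['target'] == 1)
# mean of a boolean column per group = fraction of True rows
out = df.groupby(['my_year', 'my_month'])['ok'].mean().reset_index(name='my_rate')
print(out)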

python2 pandas: how to merge a part of another dataframe to a dataframe

I have a dataframe(df1) as following:
datetime m d 1d 2d 3d
2014-01-01 1 1 2 2 3
2014-01-02 1 2 3 4 3
2014-01-03 1 3 1 2 3
...........
2014-12-01 12 1 2 2 3
2014-12-31 12 31 2 2 3
Also I have another dataframe(df2) as following:
datetime m d
2015-01-02 1 2
2015-01-03 1 3
...........
2015-12-01 12 1
2015-12-31 12 31
I want to merge the 1d, 2d, and 3d column values of df1 into df2.
There are two conditions:
(1) Only rows where m and d are the same in both df1 and df2 are merged.
(2) If the index of a df2 row satisfies index % 30 == 0, don't merge; the 1d, 2d, and 3d values of those rows can be NaN.
So I want the new df2 to look like the following:
datetime m d 1d 2d 3d
2015-01-02 1 2 Nan Nan Nan
2015-01-03 1 3 1 2 3
...........
2015-12-01 12 1 2 2 3
2015-12-31 12 31 2 2 3
Thanks in advance!
I think you need to add NaNs with loc and then merge with a left join:
np.random.seed(10)
N = 365
rng = pd.date_range('2015-01-01', periods=N)
df_tr_2014 = pd.DataFrame(np.random.randint(10, size=(N, 3)), index=rng).reset_index()
df_tr_2014.columns = ['datetime','7d','15d','20d']
df_tr_2014.insert(1,'month', df_tr_2014['datetime'].dt.month)
df_tr_2014.insert(2,'day_m', df_tr_2014['datetime'].dt.day)
#print (df_tr_2014.head())
N = 366
rng = pd.date_range('2016-01-01', periods=N)
df_te = pd.DataFrame(index=rng)
df_te['month'] = df_te.index.month
df_te['day_m'] = df_te.index.day
df_te = df_te.reset_index()
#print (df_te.tail())
df2 = df_te.copy()
df1 = df_tr_2014.copy()
df1 = df1.set_index('datetime')
df1.index += pd.offsets.DateOffset(years=1)
#correct 29 February
y = df1.index[0].year
df1 = df1.reindex(pd.date_range(pd.datetime(y,1,1), pd.datetime(y,12,31)))
idx = df1.index[(df1.index.month == 2) & (df1.index.day == 29)]
df1.loc[idx, :] = df1.loc[idx - pd.Timedelta(1, unit='d'), :].values
df1.loc[idx, 'day_m'] = idx.day
df1[['month','day_m']] = df1[['month','day_m']].astype(int)
df1[['7d','15d', '20d']] = df1[['7d','15d', '20d']].astype(float)
df1.loc[np.arange(len(df1.index)) % 30 == 0, ['7d','15d','20d']] = 0
df1 = df1.reset_index()
print (df1.iloc[57:62])
index month day_m 7d 15d 20d
57 2016-02-27 2 27 2.0 0.0 1.0
58 2016-02-28 2 28 2.0 3.0 5.0
59 2016-02-29 2 29 2.0 3.0 5.0
60 2016-03-01 3 1 0.0 0.0 0.0
61 2016-03-02 3 2 7.0 6.0 9.0
Why don't you just remove the rows in df1 that don't match (m, d) pairs in df2?
df_new = df2.drop(df2[(not ((df2.m == df1.m) & (df2.n == df1.n)).any()) or (df2.index % 30 == 0)].index)
Or something along those lines.
Link to a related answer.
I'm not enormously familiar with Pandas and have not tested the above example.
In the code below, df_te is df2 and df_tr_2014 is df1; 7d, 15d, and 20d are 1d, 2d, and 3d respectively from the question; size_df_te is the length of df_te; month and day_m are m and d in df2.
df_te['7d'] = 0
df_te['15d'] = 0
df_te['20d'] = 0
size_df_te = len(df_te)
mj = 0
dj = 0
for i in range(size_df_te):
    if i%30 != 0:
        m = df_te.loc[i,'month']
        d = df_te.loc[i,'day_m']
        if (m == 2) & (d == 29):
            m = 2
            d = 28
        dk_7 = df_tr_2014.loc[(df_tr_2014['month']==m) & (df_tr_2014['day_m']==d)]['7d']
        dk_15 = df_tr_2014.loc[(df_tr_2014['month']==m) & (df_tr_2014['day_m']==d)]['15d']
        dk_20 = df_tr_2014.loc[(df_tr_2014['month']==m) & (df_tr_2014['day_m']==d)]['20d']
        df_te.loc[i,'7d'] = float(dk_7)
        df_te.loc[i,'15d'] = float(dk_15)
        df_te.loc[i,'20d'] = float(dk_20)
EDIT:
Sample data:
np.random.seed(10)
N = 365
rng = pd.date_range('2014-01-01', periods=N)
df_tr_2014 = pd.DataFrame(np.random.randint(10, size=(N, 3)), index=rng).reset_index()
df_tr_2014.columns = ['datetime','7d','15d','20d']
df_tr_2014.insert(1,'month', df_tr_2014['datetime'].dt.month)
df_tr_2014.insert(2,'day_m', df_tr_2014['datetime'].dt.day)
#print (df_tr_2014.head())
N = 365
rng = pd.date_range('2015-01-01', periods=N)
df_te = pd.DataFrame(index=rng)
df_te['month'] = df_te.index.month
df_te['day_m'] = df_te.index.day
df_te = df_te.reset_index()
#print (df_te.head())
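For comparison, a much shorter sketch of the basic idea from the question (left-join on m and d, then blank out every 30th row of df2); it assumes each (m, d) pair occurs at most once in df1:
import numpy as np

# bring df1's value columns into df2 by matching (m, d)
out = df2.merge(df1[['m', 'd', '1d', '2d', '3d']], on=['m', 'd'], how='left')
# rows whose position is a multiple of 30 keep NaN in the value columns
out.loc[out.index % 30 == 0, ['1d', '2d', '3d']] = np.nan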

Structure the output

I am trying to get a proper structured output into a csv.
Input:
00022d9064bc,1073260801,1073260803,819251,440006
00022d9064bc,1073260803,1073260810,819213,439954
00904b4557d3,1073260803,1073261920,817526,439458
00022de73863,1073260804,1073265410,817558,439525
00904b14b494,1073260804,1073262625,817558,439525
00022d1406df,1073260807,1073260809,820428,438735
00022d9064bc,1073260801,1073260803,819251,440006
00022dba8f51,1073260801,1073260803,819251,440006
00022de1c6c1,1073260801,1073260803,819251,440006
003065f30f37,1073260801,1073260803,819251,440006
00904b48a3b6,1073260801,1073260803,819251,440006
00904b83a0ea,1073260803,1073260810,819213,439954
00904b85d3cf,1073260803,1073261920,817526,439458
00904b14b494,1073260804,1073265410,817558,439525
00904b99499c,1073260804,1073262625,817558,439525
00904bb96e83,1073260804,1073265163,817558,439525
00904bf91b75,1073260804,1073263786,817558,439525
Code:
import pandas as pd
from datetime import datetime,time
import numpy as np
fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
# building reporting DF: `r`
freq = '1H' # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
#df.to_csv((r[r.LogCount > 0])'example.csv')
#print(r[r.LogCount > 0]) -- This gives the correct count and unique count but I want to write the output in a structure.
print (r['StartTime'], ['EndTime'], ['Day'], ['LogCount'], ['UniqueIDCount'])
Output: This is the output that I am getting which is not what I am looking for.
(2004-01-05 00:00:00 00:00:00
2004-01-05 01:00:00 01:00:00
2004-01-05 02:00:00 02:00:00
2004-01-05 03:00:00 03:00:00
2004-01-05 04:00:00 04:00:00
2004-01-05 05:00:00 05:00:00
2004-01-05 06:00:00 06:00:00
2004-01-05 07:00:00 07:00:00
2004-01-05 08:00:00 08:00:00
2004-01-05 09:00:00 09:00:00
And the Expected output headers are
StartTime, EndTime, Day, Count, UniqueIDCount
How do I structure the write statement in my code so that the output CSV has the columns mentioned above?
Try This:
rout = r[['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount'] ]
print rout
rout.to_csv('results.csv', index=False)
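If you only want the hours that actually contain log entries (as in the commented-out print(r[r.LogCount > 0]) line in the question), the same idea works with the filter applied before the column selection; a sketch:
cols = ['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']
# keep only non-empty hours, then write the selected columns
r[r.LogCount > 0][cols].to_csv('results.csv', index=False)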