What is RRD last_ds? - rrdtool

I am playing with MRTG and have configured it to use RRD to record the performance data (a switch interface byte counter). When I use "rrdtool info" to check the RRD file, I see that ds[ds0].last_ds is a number and it changes every time new data is fed in:
# rrdtool info 10.0.3.129_24_bw.rrd
filename = "10.0.3.129_24_bw.rrd"
rrd_version = "0003"
step = 60
last_update = 1482950882
header_size = 2912
ds[ds0].index = 0
ds[ds0].type = "COUNTER"
ds[ds0].minimal_heartbeat = 600
ds[ds0].min = 0.0000000000e+00
ds[ds0].max = 1.2500000000e+08
ds[ds0].last_ds = "6332648954"
ds[ds0].value = 3.5016393443e+01
ds[ds0].unknown_sec = 0
ds[ds1].index = 1
ds[ds1].type = "COUNTER"
ds[ds1].minimal_heartbeat = 600
ds[ds1].min = 0.0000000000e+00
ds[ds1].max = 1.2500000000e+08
ds[ds1].last_ds = "32104385407"
ds[ds1].value = 5.3344262295e+01
ds[ds1].unknown_sec = 0
What is it exactly? Thanks!

last_ds is the last received value of the DS, prior to the rate calculation, at last_update time. When a new update comes in with a new DS value, it is used to compute the value for the update interval, new_value = (new_ds - last_ds) / (current_time - last_update), and that value is then assigned to one or more intervals (according to data normalisation) so that values can be set in the various RRAs.
last_ds differs from value because it is taken before the rate calculation and normalisation.
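For example, here is a rough Python sketch of that arithmetic, reusing the ds0 numbers from the rrdtool info output above; the "new" counter value and timestamp are made up purely for illustration:
last_update = 1482950882      # from the info output above
last_ds = 6332648954          # raw counter value stored at that update
new_time = 1482950942         # hypothetical next update, 60 seconds later
new_ds = 6332651054           # hypothetical new counter reading
# for a COUNTER data source the stored value is the rate over the interval
rate = (new_ds - last_ds) / float(new_time - last_update)
print(rate)                   # 35.0 bytes per second for this made-up interval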

Related

Python script | long running | Need suggestions to optimize

I have written this script to generate a dataset of 15-minute time intervals, based on the operational hours provided for every day of the week, over 365 days.
Example: say Store 1 opens at 9 AM and closes at 9 PM every day. That is 12 hours a day, 12 * 4 = 48 fifteen-minute periods per day, and 48 * 365 = 17520 fifteen-minute periods for the year.
The sample dataset only contains 5 sites, but there are about 9000 sites that this script needs to generate data for.
The script obviously runs for a handful of sites (100) and a couple of days (2), but it needs to run for 9000 sites and 365 days.
Looking for suggestions to make this run faster. This will be running on a local machine.
input data: https://drive.google.com/open?id=1uLYRUsJ2vM-TIGPvt5RhHDhTq3vr4V2y
output data: https://drive.google.com/open?id=13MZCQXfVDLBLFbbmmVagIJtm6LFDOk_T
Please let me know if I can help with anything more to get this answered.
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta
import pandas as pd
import numpy as np
import cProfile
from datetime import timedelta, date, datetime
# inputs
empty_data = pd.DataFrame(columns=['store', 'timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"
for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        if weekno < 5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year, curr_date_month, curr_date_day, data['m_f_open_hrs'].iloc[j], data['m_f_open_min'].iloc[j]),
                                  datetime(curr_date_year, curr_date_month, curr_date_day, data['m_f_close_hrs'].iloc[j], data['m_f_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts, columns=['timestamp'])
            vert['store'] = data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        elif weekno == 5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year, curr_date_month, curr_date_day, data['sat_open_hrs'].iloc[j], data['sat_open_min'].iloc[j]),
                                  datetime(curr_date_year, curr_date_month, curr_date_day, data['sat_close_hrs'].iloc[j], data['sat_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts, columns=['timestamp'])
            vert['store'] = data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        else:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year, curr_date_month, curr_date_day, data['sun_open_hrs'].iloc[j], data['sun_open_min'].iloc[j]),
                                  datetime(curr_date_year, curr_date_month, curr_date_day, data['sun_close_hrs'].iloc[j], data['sun_close_min'].iloc[j]),
                                  timedelta(minutes=15))]
            vert = pd.DataFrame(dts, columns=['timestamp'])
            vert['store'] = data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
final_data = empty_data
I think the most time-consuming tasks in your script are the datetime calculations.
You should try to do all of those calculations using UNIX time. It represents time as an integer count of seconds, so you can take two UNIX timestamps and get the difference with a simple subtraction.
In my opinion you should perform all the operations that way, and once the process has finished you can convert everything back to a more readable date format.
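As a rough sketch of the idea (this is not your script; the opening hours below are just hypothetical numbers), you can convert the interval boundaries to UNIX time once and then step a plain integer:
import calendar
import time
from datetime import datetime
open_dt = datetime(2019, 1, 1, 9, 0)            # hypothetical 09:00 opening
close_dt = datetime(2019, 1, 1, 21, 0)          # hypothetical 21:00 closing
start = calendar.timegm(open_dt.timetuple())    # boundaries as integer seconds
end = calendar.timegm(close_dt.timetuple())
step = 15 * 60                                  # 15 minutes in seconds
stamps = [time.strftime('%Y-%m-%d %H:%M', time.gmtime(t))
          for t in range(start, end, step)]     # only format once, at the end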
Another thing you should change in your script is the almost identical repeated code. Removing it won't improve performance, but it improves readability, debugging, and your skills as a programmer. As a simple example I have refactored some of the code (you can probably do better than what I did, but this is just an example).
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta
from datetime import timedelta, date, datetime
import numpy as np
import cProfile
import pandas as pd
# inputs
empty_data = pd.DataFrame(columns=['store', 'timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"
for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        week_range = 'sun'
        if weekno < 5:
            week_range = 'm_f'
        elif weekno == 5:
            week_range = 'sat'
        first_time = datetime(curr_date_year, curr_date_month, curr_date_day, data[week_range + '_open_hrs'].iloc[j], data[week_range + '_open_min'].iloc[j])
        second_time = datetime(curr_date_year, curr_date_month, curr_date_day, data[week_range + '_close_hrs'].iloc[j], data[week_range + '_close_min'].iloc[j])
        dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in datetime_range(first_time, second_time, timedelta(minutes=15))]
        vert = pd.DataFrame(dts, columns=['timestamp'])
        vert['store'] = data['store'].iloc[j]
        empty_data = pd.concat([vert, empty_data])
final_data = empty_data
Good luck!
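A further performance note, offered only as a sketch: pd.concat inside the loop copies the whole accumulated DataFrame on every iteration, so it is usually much faster to collect the pieces in a plain list and concatenate once at the end. The loop body below is just a placeholder for the real work:
import pandas as pd
pieces = []                                   # collect the small DataFrames here
for j in range(3):                            # placeholder loop; the real nested loops go here
    vert = pd.DataFrame({'timestamp': ['2019-01-01 09:00'], 'store': [j]})
    pieces.append(vert)                       # cheap, nothing is copied yet
final_data = pd.concat(pieces, ignore_index=True)   # one concatenation at the end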

How to use PostgreSQL 'interval' in Django?

Here is my PostgreSQL statement.
select round(sum("amount") filter(where "date">=now()-interval '12 months')/12,0) as avg_12month from "amountTab"
How to use this in Django?
I have an object called 'Devc', with attribute 'date'.
I want to get the sum of the specific data within past 12 months, not past 365 days.
You can try this to get the data within the past 12 months.
from datetime import datetime, timedelta
from django.db.models import Sum
today = datetime.now()
current_month_first_day = today.replace(day=1)
previous_month_last_day = current_month_first_day - timedelta(days=1)
past_12_month_first_day = previous_month_last_day - timedelta(days=360)
past_12_month_first_day = past_12_month_first_day.replace(day=1)
past_12_month_sum = Devc.objects.filter(date__range=(past_12_month_first_day, current_month_first_day)).aggregate(Sum('amount'))['amount__sum']
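If you would rather not approximate twelve months with 360 days, a hedged alternative (assuming python-dateutil is installed and that date on Devc is a date field) is to compute the boundary with relativedelta:
from datetime import date
from dateutil.relativedelta import relativedelta
from django.db.models import Sum
current_month_first_day = date.today().replace(day=1)
past_12_month_first_day = current_month_first_day - relativedelta(months=12)
total = Devc.objects.filter(date__gte=past_12_month_first_day, date__lt=current_month_first_day).aggregate(Sum('amount'))['amount__sum']
avg_12month = round(total / 12.0) if total is not None else 0   # matches the /12 in the original SQL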

Joining of curve fitting models

I have these 7 quasi-Lorentzian curves fitted to my data,
and I would like to join them to make one connected curve. Do you have any ideas how to do this? I've read about ComposingModel in the lmfit documentation, but it's not clear to me how to use it.
Here is a sample of my code of two fitted curves.
for dataset in [Bxfft]:
    dataset = np.asarray(dataset)
    freqs, psd = signal.welch(dataset, fs=266336/300, window='hamming', nperseg=16192, scaling='spectrum')
    plt.semilogy(freqs[0:-7000], psd[0:-7000]/dataset.size**0, color='r', label='Bx')
    x = freqs[100:-7900]
    y = psd[100:-7900]
    # 8 Hz
    model = Model(lorentzian)
    params = model.make_params(amp=6, cen=5, sig=1, e=0)
    result = model.fit(y, params, x=x)
    final_fit = result.best_fit
    print "8 Hz mode"
    print(result.fit_report(min_correl=0.25))
    plt.plot(x, final_fit, 'k-', linewidth=2)
    # 14 Hz
    x2 = freqs[220:-7780]
    y2 = psd[220:-7780]
    model2 = Model(lorentzian)
    pars2 = model2.make_params(amp=6, cen=10, sig=3, e=0)
    pars2['amp'].value = 6
    result2 = model2.fit(y2, pars2, x=x2)
    final_fit2 = result2.best_fit
    print "14 Hz mode"
    print(result2.fit_report(min_correl=0.25))
    plt.plot(x2, final_fit2, 'k-', linewidth=2)
UPDATE!!!
I've used some hints from user #MNewville, who posted an answer, and using his code I got this:
So my code is similar to his, but extended with each peak. What I'm struggling with now is replacing the ready-made LorentzianModel with my own.
The problem is that when I do this, the code gives me an error like this:
C:\Python27\lib\site-packages\lmfit\printfuncs.py:153: RuntimeWarning: invalid value encountered in double_scalars
  spercent = '({0:.2%})'.format(abs(par.stderr/par.value))
[[Model]]
About my own model:
def lorentzian(x, amp, cen, sig, e):
    return (amp*(1-e)) / ((pow((1.0 * x - cen), 2)) + (pow(sig, 2)))
peak1 = Model(lorentzian, prefix='p1_')
peak2 = Model(lorentzian, prefix='p2_')
peak3 = Model(lorentzian, prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amp=6, p1_cen=8, p1_sig=1, p1_e=0,
                           p2_ampe=16, p2_cen=14, p2_sig=3, p2_e=0,
                           p3_amp=16, p3_cen=21, p3_sig=3, p3_e=0,)
The rest of the code is similar to #MNewville's.
A composite model for 3 Lorentzians would look like this:
from lmfit import Model
from lmfit.models import LorentzianModel
peak1 = LorentzianModel(prefix='p1_')
peak2 = LorentzianModel(prefix='p2_')
peak3 = LorentzianModel(prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amplitude=10, p1_center=8, p1_sigma=3,
p2_amplitude=10, p2_center=15, p2_sigma=3,
p3_amplitude=10, p3_center=20, p3_sigma=3)
# perhaps set bounds to prevent peaks from swapping or crazy values
params['p1_amplitude'].min = 0
params['p2_amplitude'].min = 0
params['p3_amplitude'].min = 0
params['p1_sigma'].min = 0
params['p2_sigma'].min = 0
params['p3_sigma'].min = 0
params['p1_center'].min = 2
params['p1_center'].max = 11
params['p2_center'].min = 10
params['p2_center'].max = 18
params['p3_center'].min = 17
params['p3_center'].max = 25
# then do a fit over the full data range
result = model.fit(y, params, x=x)
I think the key parts you were missing were: a) just add models together, and b) use prefix to avoid name collisions of parameters.
I hope that is enough to get you started...
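If it helps, once that fit has run you can also pull out each Lorentzian separately with eval_components() and plot it next to the combined curve; this is only a small sketch, assuming the result, x and y from the code above:
import matplotlib.pyplot as plt
comps = result.eval_components(x=x)        # dict keyed by prefix: 'p1_', 'p2_', 'p3_'
plt.plot(x, y, 'b.', label='data')
plt.plot(x, result.best_fit, 'k-', label='sum of 3 Lorentzians')
for name, comp in comps.items():
    plt.plot(x, comp, '--', label=name)    # each individual peak
plt.legend()
plt.show()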

How to get the right demand in the newsvendor model?

What I'm trying to accomplish is the newsvendor problem: the program is supposed to give me the demand level with the best chance of turning a profit, as per this link.
But the issue is that when I run the code below, it prints the right demand but also another demand above it.
q = {5: 0.2, 6: 0.25, 7: 0.3, 8: 0.25}
w = 55
p = 80
s = 40
cul5 = 0.2
cul6 = 0.25
cul7 = 0.3
cul8 = 0.25
overage = w - s
underage = p - w
crit = overage/float((underage)+(overage)) # its better to use floats within the parenthesis in python 2
cumul_q = {}
cumulativevalue = 0
for key, value in sorted(q.iteritems()):
    cumulativevalue = cumulativevalue + value
    # print key , value
    cumul_q[key] = cumulativevalue
# print cumul_q
previous_key = None
for key, value in sorted(cumul_q.iteritems(), reverse=True):
    cumulprob = 1 - value
    cumulprob1 = float(cumulprob)
    if crit < cumulprob1:
        continue
    elif crit > cumulprob1:
        print key
        previous_key = key
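For comparison, here is a hedged sketch of one way to stop at a single demand level, assuming the usual critical-fractile rule (pick the smallest demand whose cumulative probability reaches underage / (underage + overage), which is 0.625 for these numbers):
q = {5: 0.2, 6: 0.25, 7: 0.3, 8: 0.25}
w, p, s = 55, 80, 40
underage = p - w                              # profit lost per unit of unmet demand
overage = w - s                               # loss per unsold unit
crit = underage / float(underage + overage)   # critical ratio, 0.625 here
cumulative = 0
for key, value in sorted(q.items()):
    cumulative += value
    if cumulative >= crit:                    # first demand level reaching the ratio
        print(key)                            # prints 7 for these numbers
        break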

Memory Error when exporting data to csv file

Hello, I was hoping someone could help me with my college coursework; I have an issue with my code. I keep running into a memory error with my data export.
Is there any way I can reduce the memory that is being used or is there a different approach I can take?
For the coursework I am given a CSV file of 300 records about customer orders, and I have to export the Friday records to a new CSV file. I am also required to print the most popular order method and the total money raised from the orders, but I have an easy plan for that.
This is my first time working with CSV, so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it showed 'MEMORY ERROR', but that is all it showed. I'm using a college-provided computer, so I'm not sure of the exact specs, but I know it has 4 GB of memory.
Defining the countOccurences predefined function:
def countOccurences(target, array):
    counter = 0
    for element in array:
        if element == target:
            counter = counter + 1
    print counter
    return counter
Creating the user-defined functions for the program.
The dataInput function is used for collecting data from the provided file:
def dataInput():
    import csv
    recordArray = []
    customerArray = []
    f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        customerArray.append(row[0])
        ticketID = row[1]
        day, area = datasplit(ticketID)
        customerArray.append(day)
        customerArray.append(area)
        customerArray.append(row[2])
        customerArray.append(row[3])
        recordArray.append(customerArray)
    f.close
    return recordArray
def datasplit(variable):
    day = variable[0]
    area = variable[1]
    return day, area
def dataProcessing(recordArray):
    methodArray = []
    wed_thursCost = 5
    friCost = 10
    record = 0
    while record < 300:
        method = recordArray[record][4]
        methodArray.append(method)
        record = record + 1
    school = countOccurences('S', methodArray)
    website = countOccurences('W', methodArray)
    if school > website:
        school = True
    elif school < website:
        website = True
    dayArray = []
    record = 0
    while record < 300:
        day = recordArray[record][1]
        dayArray.append(day)
        record = record + 1
    fridays = countOccurences('F', dayArray)
    wednesdays = countOccurences('W', dayArray)
    thursdays = countOccurences('T', dayArray)
    totalFriCost = fridays * friCost
    totalWedCost = wednesdays * wed_thursCost
    totalThurCost = thursdays * wed_thursCost
    totalCost = totalFriCost + totalWedCost + totalThurCost
    return totalCost, school, website
My first attempt at writing to a CSV file:
def dataExport(recordArray):
    import csv
    fridayRecords = []
    record = 0
    customerIDArray = []
    ticketIDArray = []
    numberArray = []
    methodArray = []
    record = 0
    while record < 300:
        if recordArray[record][1] == 'F':
            fridayRecords.append(recordArray[record])
        record = record + 1
    with open('\Courswork output.csv', "wb") as f:
        writer = csv.writer(f)
        for record in fridayRecords:
            writer.writerows(fridayRecords)
    f.close
My second attempt at writing to the CSV file
def write_file(recordArray): # write selected records to a new csv file
    CustomerID = []
    TicketID = []
    Number = []
    Method = []
    counter = 0
    while counter < 300:
        if recordArray[counter][2] == 'F':
            CustomerID.append(recordArray[counter][0])
            TicketID.append(recordArray[counter][1]+recordArray[counter[2]])
            Number.append(recordArray[counter][3])
            Method.append(recordArray[counter][4])
            fridayRecords = [] # a list to contain the lists before writing to file
            for x in range(len(CustomerID)):
                one_record = CustomerID[x], TicketID[x], Number[x], Method[x]
                fridayRecords.append(one_record)
            # open file for writing
            with open("sample_output.csv", "wb") as f:
                # create the csv writer object
                writer = csv.writer(f)
                # write one row (item) of data at a time
                writer.writerows(recordArray)
            f.close
            counter = counter + 1
#Main Program
recordArray = dataInput()
totalCost,school,website = dataProcessing(recordArray)
write_file(recordArray)
In the function write_file(recordArray) in your second attempt, the counter variable in the first while loop is never updated, so the loop continues forever until you run out of memory.
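A minimal sketch of how write_file could be restructured so the counter always advances, assuming the record layout built by dataInput above (index 1 holds the day letter, as dataExport already uses):
import csv
def write_file(recordArray):
    fridayRecords = []
    counter = 0
    while counter < 300:
        if recordArray[counter][1] == 'F':       # day column, as in dataExport
            fridayRecords.append(recordArray[counter])
        counter = counter + 1                    # advance on every pass, not only for Fridays
    with open("sample_output.csv", "wb") as f:   # "wb" for the csv module on Python 2
        writer = csv.writer(f)
        writer.writerows(fridayRecords)          # write only the Friday rows, once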