RRDTool: RRD file not updating

My RRD file is not updating; what is the reason?
The graph legend shows: -nanv
I created the RRD file using this syntax:
rrdtool create ups.rrd --step 300 \
    DS:input:GAUGE:600:0:360 \
    DS:output:GAUGE:600:0:360 \
    DS:temp:GAUGE:600:0:100 \
    DS:load:GAUGE:600:0:100 \
    DS:bcharge:GAUGE:600:0:100 \
    DS:battv:GAUGE:600:0:100 \
    RRA:AVERAGE:0.5:12:24 \
    RRA:AVERAGE:0.5:288:31
Then I updated the file with this syntax:
rrdtool update ups.rrd N:$inputv:$outputv:$temp:$load:$bcharge:$battv
And graphed it with this:
rrdtool graph ups-day.png \
    -t "ups " \
    -s -1day \
    -h 120 -w 616 \
    -a PNG \
    -cBACK#F9F9F9 \
    -cSHADEA#DDDDDD \
    -cSHADEB#DDDDDD \
    -cGRID#D0D0D0 \
    -cMGRID#D0D0D0 \
    -cARROW#0033CC \
    DEF:input=ups.rrd:input:AVERAGE \
    DEF:output=ups.rrd:output:AVERAGE \
    DEF:temp=ups.rrd:temp:AVERAGE \
    DEF:load=ups.rrd:load:AVERAGE \
    DEF:bcharge=ups.rrd:bcharge:AVERAGE \
    DEF:battv=ups.rrd:battv:AVERAGE \
    LINE:input#336600 \
    AREA:input#32CD3260:"Input Voltage" \
    GPRINT:input:MAX:" Max %lgv" \
    GPRINT:input:AVERAGE:" Avg %lgv" \
    GPRINT:input:LAST:"Current %lgv\n" \
    LINE:output#4169E1:"Output Voltage" \
    GPRINT:output:MAX:"Max %lgv" \
    GPRINT:output:AVERAGE:" Avg %lgv" \
    GPRINT:output:LAST:"Current %lgv\n" \
    LINE:load#FD570E:"Load" \
    GPRINT:load:MAX:" Max %lg%%" \
    GPRINT:load:AVERAGE:" Avg %lg%%" \
    GPRINT:load:LAST:" Current %lg%%\n" \
    LINE:temp#000ACE:"Temperature" \
    GPRINT:temp:MAX:" Max %lgc" \
    GPRINT:temp:AVERAGE:" Avg %lgc" \
    GPRINT:temp:LAST:" Current %lgc"

You will need at least 13 updates, each 5 minutes apart (i.e., 12 PDPs, or primary data points), before a single CDP (consolidated data point) is written to your RRAs and a data point can appear on the graph. This is because your smallest-resolution RRA has a count of 12, meaning you need 12 PDPs to make one CDP.
Until you have enough data to write a CDP, you have nothing to graph, and your graph will always show unknown data.
Alternatively, add a smaller-resolution RRA (maybe with a count of 1) so that you do not need to collect data for so long before you have a full CDP.
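A count-1 RRA can be included at create time; here is a minimal sketch reusing your existing DS definitions, where the 288 rows are an assumption chosen to keep one day of 5-minute samples:
rrdtool create ups.rrd --step 300 \
    DS:input:GAUGE:600:0:360 \
    DS:output:GAUGE:600:0:360 \
    DS:temp:GAUGE:600:0:100 \
    DS:load:GAUGE:600:0:100 \
    DS:bcharge:GAUGE:600:0:100 \
    DS:battv:GAUGE:600:0:100 \
    RRA:AVERAGE:0.5:1:288 \
    RRA:AVERAGE:0.5:12:24 \
    RRA:AVERAGE:0.5:288:31
With the count-1 RRA, every completed 5-minute step becomes a CDP immediately, so the graph can show data after just a couple of updates.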

The update script needs to run at exactly the same interval as defined in your database.
I see it has a step value of 300, so the database should be updated every 5 minutes.
Just place your update script in a cron job (you can do the same for your graph script).
For example,
sudo crontab -e
If running it for the first time, choose your favorite editor (I usually go with Vim), then add the full path of your script and run it every 5 minutes. So add this (don't forget to adjust the paths):
*/5 * * * * /usr/local/update_script > /dev/null && /usr/local/graph_script > /dev/null
Save it and wait a couple of minutes. I usually redirect the output to /dev/null because of the output a script can generate: if an executed script produces any output, cron will send it to you as a mail notification.

Related

Create cron in chef

I want to create a cron job in Chef that checks the size of a log file and deletes the file if it is larger than 30 MB. Here is my code:
cron_d 'ganglia_tomcat_thread_max' do
  hour '0'
  minute '1'
  command "rm -f /srv/node/current/app/log/simplesamlphp.log"
  only_if { ::File.size('/srv/node/current/app/log/simplesamlphp.log').to_f / 1024000 > 30 }
end
Can you help me with it, please?
Welcome to Stack Overflow!
I suggest going with an existing tool like logrotate; there is a Chef cookbook available to manage logrotate.
Please note that "cron" in Chef manages the system cron service, which runs independently of Chef, so you'll have to do the file size check within the "command". It's also better to use the cron_d resource as documented here.
The way you create the cron_d resource, the cron task is only added when your log file is larger than 30 MB at converge time; in all other cases the cron_d entry is never created.
You can use this Ruby code
File.size('file').to_f / 2**20
to get the file size in megabytes; there is a slight difference in the result, which I believe is more correct.
So you can go with two solutions for your specific case:
create a new cron_d resource when the log file is less than 30 MB to remove the existing cron entry, and provision your node periodically; or
move the file size check into the command with bash and glue the parts together with && (see the sketch below); that way the file is only deleted if its size is greater than 30 MB. Something like
du -k file.txt | cut -f1
will return the size of the file in kilobytes.
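Putting that together, a sketch of what the resource could look like; the resource name is arbitrary, the log path is taken from your question, and 30 MB is 30720 KB because du -k reports kilobytes:
cron_d 'cleanup_simplesamlphp_log' do
  hour '0'
  minute '1'
  # the size check runs inside cron itself, not at Chef converge time
  command "[ $(du -k /srv/node/current/app/log/simplesamlphp.log | cut -f1) -gt 30720 ] && rm -f /srv/node/current/app/log/simplesamlphp.log"
end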
That said, to me the correct way to do this is still to use the logrotate service and the Chef recipe for it.

Fluentd S3 output plugin not recognizing index

I am facing problems while using the S3 output plugin with fluentd.
s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
Using %{index} at the end never resolves to _0, _1, and so on. I always end up with log file names like
sflow_.log
while I need sflow_0.log.
Can you paste your fluent.conf? It's hard to find the problem without full info. File creation is mainly controlled by the time slice flag and the buffer configuration:
<buffer>
  # @type file or memory
  path /fluentd/buffer/s3
  timekey_wait 1m
  timekey 1m
  chunk_limit_size 64m
</buffer>
time_slice_format %Y%m%d%H%M
With the above you create a file every minute, and if within that minute your buffer limit is reached (or another flush is forced for some other reason), another file is created with index 1 under the same minute.
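For reference, a minimal match section might look something like this; the tag, bucket name, and paths are placeholder assumptions, not taken from your setup:
<match sflow.**>
  @type s3
  s3_bucket my-log-bucket
  path sflow_
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  time_slice_format %Y%m%d%H%M
  <buffer time>
    @type file
    path /fluentd/buffer/s3
    timekey 1m
    timekey_wait 1m
    chunk_limit_size 64m
  </buffer>
</match>
As noted above, it is the additional chunks flushed within the same time slice that push the index up to _1, _2, and so on.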

Write data to an RRD at irregular intervals

I'm trying to find out if I can store values captured at irregular intervals into an RRD.
I have a script which connects to an ActiveMQ server, subscribes to a queue or topic, and looks at the message header timestamp, comparing it with Time.now to give me a delta.
The data I get from my script is as below:
000000.681 Time Delta
000000.793 Time Delta
000000.583 Time Delta
000001.994 Time Delta
The issue I face is that messages from ActiveMQ don't necessarily come in at a 'regular interval' (e.g. 1/sec, 1/2 sec). They could come in at 5 a second during peak times and 1 every 10 seconds during quiet times.
I'd like to be able to capture the output into an RRD so I can graph against it, but having looked around on the internet it's not clear if this can be done, or if I'd be better off using another database/store to capture the data.
The eventual output I'd like would be a graph showing the time delta for each message.
It looks like I could set the RRD --step to 1 second and the heartbeat to 2 seconds, having had a read of the docs.
I found a couple of posts here and here which talk about being careful with the intervals, and about the fact that my data might be averaged, smoothed, or otherwise messed about with when written to the RRD. But nothing I've found online has a usage case similar to mine, so it's a bit hard to know where I should be looking. I'd like my data to be stored as a point for each message received.
I have a couple of RRDs set up for testing; one is taking the AVERAGE, the other is taking the LAST, to produce some graphs. My heartbeat is set to 100 seconds, but the interval is set to 1. I'm now getting data which looks correct. I'm also guessing that the empty spaces in the graph from the LAST RRA are due to my data coming in slower than 1 per second?
I'll post my create code & output as an answer.
rrdtool will always store data at regular intervals. As data is handed over to rrdtool, it first gets re-sampled to the --step interval and then further consolidated to the intervals set up in the RRAs.
The exact arrival time of the data (to the millisecond) is taken into account as the re-sampling takes place.
If two data points are further apart than specified by the minimum heartbeat (mrhb), the data is considered non-continuous and rrdtool will store 'unknown' for the affected interval.
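To illustrate, updates can carry an explicit timestamp (fractional seconds are accepted) instead of N, and rrdtool re-samples the values onto its step grid; the timestamps and values here are made up:
rrdtool update test.rrd 1424100000.250:0.681
rrdtool update test.rrd 1424100001.120:0.793
rrdtool update test.rrd 1424100003.610:0.583
With a 1-second --step and a heartbeat of 5 (as in the create commands below), the 2.5-second gap before the last update is still within the heartbeat, so no 'unknown' is stored.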
I ended up making two sets of RRDs to experiment with;
rrdtool create test1.rrd \
--step '1' \
'DS:ds0:GAUGE:5:0:U' \
'RRA:AVERAGE:0.5:1:86400' \
'RRA:MAX:0.5:1:86400' \
'RRA:AVERAGE:0.5:60:10080' \
'RRA:MAX:0.5:60:10080' \
'RRA:AVERAGE:0.5:120:21600' \
'RRA:MAX:0.5:120:21600' \
'RRA:AVERAGE:0.5:300:105120' \
'RRA:MAX:0.5:300:105120'
and
rrdtool create test.rrd \
--step '1' \
'DS:ds0:GAUGE:5:0:U' \
'RRA:AVERAGE:0.5:1:86400' \
'RRA:LAST:0.5:1:86400' \
'RRA:AVERAGE:0.5:60:10080' \
'RRA:LAST:0.5:60:10080' \
'RRA:AVERAGE:0.5:120:21600' \
'RRA:LAST:0.5:120:21600' \
'RRA:AVERAGE:0.5:300:105120' \
'RRA:LAST:0.5:300:105120'
Which allows me to store:
1-sec resolution, archive kept for 1 day back
1-min resolution, archive kept for 7 days back
2-min resolution, archive kept for 30 days back
5-min resolution, archive kept for 1 year back
Which makes for some nice graphs.
The graphs were made in PHP with the following code:
<?php
$opts = array(
    '--width', '600',
    '--height', '100',
    '--title', 'Avg Time Delta xxxxxxxxxx (Last 1 Hr)',
    '--vertical-label', 'Time Delta',
    '--watermark', 'xxxxxxxxxx',
    '--start', 'end-1h',
    // test1.rrd holds the AVERAGE and MAX archives
    'DEF:out=test1.rrd:ds0:AVERAGE',
    'DEF:max=test1.rrd:ds0:MAX',
    'AREA:out#9966FF:Avg Time Delta',
    'LINE:max#996600:Max Time Delta',
);
$ret = rrd_graph("graphs/1hr-graph.png", $opts);
if (!is_array($ret)) {
    $err = rrd_error();
    echo "rrd_graph() ERROR: $err\n";
}
echo '<img src="http://server/graphs/1hr-graph.png">';
echo '<BR>';
?>
<?php
$opts = array(
    '--width', '600',
    '--height', '100',
    '--title', 'Last Time Delta xxxxxxxxxx (Last 1 Hr)',
    '--vertical-label', 'Time Delta',
    '--watermark', 'xxxxxxxxxx',
    '--start', 'end-1h',
    // test.rrd holds the AVERAGE and LAST archives
    'DEF:avg=test.rrd:ds0:AVERAGE',
    'DEF:last=test.rrd:ds0:LAST',
    'AREA:avg#99AAFF:Avg Time Delta',
    'LINE:last#99AA00:Last Time Delta',
);
$ret = rrd_graph("graphs/1hr-last.png", $opts);
if (!is_array($ret)) {
    $err = rrd_error();
    echo "rrd_graph() ERROR: $err\n";
}
echo '<img src="http://server/graphs/1hr-last.png">';
?>
From my own sanity checking and watching the data in real time, it looks like both of those graphs are correct, but they behave in slightly different ways. When the data feed being monitored is quiet and I'm only getting 1 message every 10 seconds, I get a lot of gaps in the LAST graphs, whereas the AVERAGE graphs are smoothed out to fill the gaps. I also tried setting up another RRD as ABSOLUTE, but the graphs for that looked 'wrong' and the times were all below 1.0.
So it looks like I can feed my RRD at whatever interval I like from my script. The RRD will re-sample my data to its defined interval (in my case 1 sec) and then do what it needs to do based on the data source type (GAUGE, ABSOLUTE, etc.). With my heartbeat set to 100, I should always receive some data before that 100-sec timeout, thus avoiding NaN entries in my database.
At the moment I can't tell how well-behaved this config will be during times of disruption (e.g. delayed messages from the AMQ server). I will try to run some tests when I get some spare time and report back with anything significant.

How to handle FTE queued transfers

I have an FTE monitor with '*.txt' as the trigger condition; whenever a text file lands at the source, FTE transfers the file to the destination. But when 10 files land at the source at the same time, FTE triggers 10 transfer requests simultaneously and all the transfers get queued and stuck.
Please suggest how to handle this scenario.
OK, I have just tested this case:
I want to transfer four *.xml files from a directory right when they appear in that directory. So I have the monitor set to *.xml and the transfer pattern set to *.xml (see the commands below).
Created with following commands:
fteCreateTransfer -sa AGENT1 -sm QM.FTE -da AGENT2 -dm QM.FTE -dd c:\\workspace\\FTE_tests\\OUT -de overwrite -sd delete -gt /var/IBM/WMQFTE/config/QM.FTE/FTE_TEST_TRANSFER.xml c:\\workspace\\FTE_tests\\IN\\*.xml
fteCreateMonitor -ma AGENT1 -mn FTE_TEST_TRANSFER -md c:\\workspace\\FTE_tests\\IN -mt /var/IBM/WMQFTE/config/TQM.FTE/FTE_TEST_TRANSFER.xml -tr match,*.xml
I got three different results depending on configuration changes:
1) Commands just as above, default agent.properties:
4 transfers appeared in the transfer log
all 4 transfers tried to transfer all four XML files
3 of them finished with partial success, because the agent couldn't delete the source files
1 finished with success, having transferred all files and deleted all source files
Well, with transfer type File to File, the final state is in fact OK: four files in the destination directory, because the previous files are overwritten. But with File to Queue I got 16 messages in the destination queue.
2) fteCreateMonitor command modified with parameter "-bs 100", default agent.properties:
there is only one transfer in the transfer log
this transfer finished with a partial-success result
this transfer tried to transfer 16 files (each XML four times)
the agent was not able to delete any file, so the source files remained in the source directory
So in sum I got the same total number of files transferred (16) as in the first result, and the source files were not even deleted.
3) Commands just as above, agent.properties modified with parameter "monitorMaxResourcesInPoll=1":
there is only one transfer in the transfer log
this transfer finished with a success result
this transfer tried to transfer four files and succeeded
the agent was able to delete all source files
So I was able to get the expected result only with these settings. But I am still not sure about the appropriateness of setting the monitorMaxResourcesInPoll parameter to "1".
Therefore, for me the answer is: add
monitorMaxResourcesInPoll=1
to agent.properties. But this is in collision with the other answers posted here, so I am a little bit confused now.
Tested on version 7.0.4.4.
Check the box that says "Batch together the file transfers when multiple trigger files are found in one poll interval" (screen three).
Make sure that you set the maxFilesForTransfer in the agent.properties file to a value that is large enough for you, but be careful as this will affect all transfers.
You can also set monitorMaxResourcesInPoll=1 in the agent.properties file. I don't recommend this for 2 reasons: 1) it will affect all monitors, and 2) it may mean you can never catch up on all the files you have to transfer, depending on your volume and poll interval.
Set your "Batch together the file transfers..." to a value greater than 10:
Max Batch Size = 100
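For example, the batch size can also be set from the command line when creating the monitor; this sketch reuses the agent and monitor names from the commands above, and the values are only illustrative:
fteCreateMonitor -ma AGENT1 -mn FTE_TEST_TRANSFER -md c:\\workspace\\FTE_tests\\IN -mt /var/IBM/WMQFTE/config/QM.FTE/FTE_TEST_TRANSFER.xml -tr match,*.xml -bs 100
and in agent.properties:
maxFilesForTransfer=100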

Limiting the lifetime of a file in Python

Hello there,
I'm looking for a way to limit the lifetime of a file in Python, that is, to make a file which will be auto-deleted 5 minutes after creation.
Problem:
I have a Django-based webpage with a service that generates plots (from user-submitted input data) which are shown on the page as .png images. The images are stored on disk upon creation.
Image files are created on a per-session basis and should only be available for a limited time after the user has seen them; they should be deleted 5 minutes after they have been created.
Possible solutions:
I've looked at Python's tempfile module, but that is not what I need, because the user should be able to return to the page containing the image without waiting for it to be generated again. In other words, the file shouldn't be destroyed as soon as it is closed.
The other way that comes to mind is to call some sort of external bash script which would delete files older than 5 minutes.
Does anybody know a preferred way of doing this?
Ideas can also include changing the logic of showing/generating the image files.
You should write a Django custom management command to delete old files, which you can then call from cron, as sketched below.
If you want no files older than 5 minutes, then you need to call it every 5 minutes, of course. And yes, it would run unnecessarily when there are no users, but that shouldn't worry you too much.
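A minimal sketch of such a command, assuming the plots live in a single directory; the app layout, directory path, and command name are all made up for illustration:
# yourapp/management/commands/delete_expired_images.py
import os
import time

from django.core.management.base import BaseCommand

IMAGE_DIR = '/path/to/plot/images'  # wherever the .png plots are written
MAX_AGE = 5 * 60                    # five minutes, in seconds

class Command(BaseCommand):
    help = 'Delete plot images older than five minutes'

    def handle(self, *args, **options):
        now = time.time()
        for name in os.listdir(IMAGE_DIR):
            path = os.path.join(IMAGE_DIR, name)
            # compare against the file modification time
            if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE:
                os.remove(path)
The matching crontab entry would then be something like
*/5 * * * * python /path/to/manage.py delete_expired_images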
OK, that might be a good approach, I guess...
You can write a script that checks your directory, deletes outdated files, and picks the oldest of the remaining (un-deleted) files. Calculate how much time has passed since that file was created, and from that the time remaining until its deletion. Then call the sleep function with the remaining time. When the sleep ends and another loop begins, there will be (at least) one file to delete. If there are no files in the directory, set the sleep time to 5 minutes.
That way you ensure each file is deleted roughly 5 minutes after creation; but when lots of files are created simultaneously, the sleep time will shrink greatly and your function will check the files more and more often. To avoid that, add a proper latency to the sleep before starting another loop; for example, if the oldest file is 4 minutes old, you can set sleep to 60+30 seconds (adding 30 seconds to all time calculations).
An example:
import time
import os

DIRECTORY = '/path/to/your/directory'
MAX_AGE = 300  # five minutes, in seconds

def clearDirectory():
    while True:
        _time_list = []
        _now = time.time()
        for _name in os.listdir(DIRECTORY):
            _f = os.path.join(DIRECTORY, _name)  # listdir() returns bare names, so build the full path
            if os.path.isfile(_f):
                _f_time = os.path.getmtime(_f)  # file creation/modification time
                if _now - _f_time >= MAX_AGE:
                    os.remove(_f)  # delete outdated file
                else:
                    _time_list.append(_f_time)  # remember files that have not expired yet
        # after checking all files, sleep until the oldest remaining file expires;
        # if the directory is empty, sleep the full five minutes
        _sleep_time = (MAX_AGE - (_now - min(_time_list))) if _time_list else MAX_AGE
        time.sleep(_sleep_time)
But as I said, if files are created often, it is better to add some latency to the sleep time:
time.sleep(_sleep_time + 30)  # sleep 30 seconds more, so other files may become outdated during that time too
Also, it is worth reading up on the os.path.getmtime function for details.