Can RRDtool extract values from a text file?

I have connected a temperature sensor to my Raspberry Pi; the temperature data are sent by email in a text file every two hours.
I do not want to use RRDtool directly with the temperature sensor; instead, I want RRDtool to extract these values from the text file.
Is this possible?
I searched Google but found no solution, only ways of extracting values from an .rrd file.
Thank you for your help.

RRDtool does not extract data itself. You have to write a little script to extract the data and then hand the data to rrdtool for storing and graphing. Here is a little example in Perl:
#!/usr/bin/perl
use strict;
use warnings;

# follow the text file as new lines arrive
open my $textfile, '-|', 'tail', '-f', '/path/to/file'
    or die "cannot run tail: $!";

while (<$textfile>) {
    # adjust the pattern to your file format; capture the numeric value
    /regex-match (\d+(?:\.\d+)?)/ && do {
        # "N" tells rrdtool to use the current time as the timestamp
        system 'rrdtool', 'update', 'data.rrd', "N:$1";
    };
}
P.S. You can also use the RRDs module that comes with RRDtool to access rrdtool functionality directly from within Perl.
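If you prefer Python, here is a minimal sketch of the same approach; it reads the file once instead of following it with tail -f, and the temp= pattern and file path are assumptions to adapt to your sensor's actual output:
#!/usr/bin/env python3
import re
import subprocess

with open('/path/to/file') as textfile:
    for line in textfile:
        # hypothetical pattern; adjust to your sensor's output format
        match = re.search(r'temp=(\d+(?:\.\d+)?)', line)
        if match:
            # "N" tells rrdtool to timestamp the sample with the current time
            subprocess.run(['rrdtool', 'update', 'data.rrd', 'N:' + match.group(1)])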

Related

Check_Mk RRD File Average Value Mismatch

I am trying to pull the data from the Check_Mk server's RRD files for CPU and memory. From that, I am trying to find the MAX and AVERAGE values for a particular host over a one-month period. For the maximum value, I fetched the file from the Check_Mk server using the rrdtool fetch command, and I get the exact value when I compare it with the Check_Mk graph. But when I try to do the same for the average value, I get wrong output that matches neither the Check_Mk graph value nor the RRD file's raw values. Kindly refer to the attached images, where I have verified the average value manually by fetching the data, but it shows the wrong output.
Hello @Steve Shipway,
Please find the requested data.
1) Structure of the RRD file: see the attached image.
2) We are not generating the graph from Check_Mk. We are pulling the data from the RRD file using:
rrdtool dump CPU_utilization.xml > /tmp/CPU_utilization1.xml
rrdtool fetch CPU_utilization_user.rrd MAX -r 6h -s <starting date> -e <ending date>
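For the average, the equivalent fetch just swaps the consolidation function (same placeholder dates):
rrdtool fetch CPU_utilization_user.rrd AVERAGE -r 6h -s <starting date> -e <ending date>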

ClientError: Unable to parse csv: rows 1-1000, file

I've looked at the other answers to this issue and none of them are helping me. I am trying to run a simple Random Cut Forest algorithm. I have a small data set of IPs which have been stripped down to contain only numbers, but I still get this error. The data has only one column of these numbers. The CSV looks like this:
176162144
176862141
176762141
176761141
176562141
Have you looked at this sample notebook, and tried using it with your own data?
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb
In a nutshell, it reads the CSV file with Pandas and trains the model like this:
rcf = RandomCutForest(role=execution_role,
                      train_instance_count=1,
                      train_instance_type='ml.m4.xlarge',
                      data_location='s3://{}/{}/'.format(bucket, prefix),
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      num_samples_per_tree=512,
                      num_trees=50)

# automatically upload the training data to S3 and run the training job
rcf.fit(rcf.record_set(taxi_data.value.as_matrix().reshape(-1, 1)))
You didn't say what your use case was, but as you're working with IP addresses, you may find the IP Insights built-in algorithm useful too: https://docs.aws.amazon.com/sagemaker/latest/dg/ip-insights.html
I was using the sample notebook Julien Simon mentioned earlier, but at some point the data was ending up as strings! The funny thing about RCF algorithms is that they have to run on numeric data, not strings.
What I did was make sure to cast the array as an int array as a double check, and voilà! It worked. I am at a loss over how the data ended up in string format, but alas, that was the issue. Simple solution.
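A minimal sketch of that cast, reusing the notebook's taxi_data and rcf variables (to_numpy() is the modern replacement for the deprecated as_matrix()):
# cast the values to a numeric dtype before building the record set;
# astype(int) fails fast if any entry is still a non-numeric string
train_data = taxi_data.value.to_numpy().astype(int).reshape(-1, 1)
rcf.fit(rcf.record_set(train_data))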

sd_journal_send to send binary data. How can I retrieve the data using journalctl?

I'm looking at systemd-journal as a method of collecting logs from external processors. I'm very interested in its ability to collect binary data when necessary.
I'm simply testing and investigating journal right now. I'm well aware there are other, probably better, solutions.
I'm logging binary data like so:
// strData is a std::string containing binary data
strData += '\0';
sd_journal_send(
    "MESSAGE=test_msg",
    "MESSAGE_ID=12345",
    "BINARY=%s", strData.c_str(),
    NULL);
The log line shows up when using the journalctl tool. I can find the log line like this from the terminal:
journalctl MESSAGE_ID=12345
I can get the binary data of all logs in the journal like so from the terminal:
journalctl --field=BINARY
I need to get the binary data into a file so that I can access it from a program and decode it. How can I do this?
This does not work:
journalctl --field=BINARY MESSAGE_ID=12345
I get this error:
"Extraneous arguments starting with 'MESSAGE_ID=1234567890987654321"
Any suggestions? The documentation on systemd-journal seems slim. Thanks in advance.
You just got the wrong option. See the docs for:
-F, --field=
Print all possible data values the specified field can take in all entries of the journal.
vs
--output-fields=
A comma separated list of the fields which should be included in the output.
You also have to specify the plain output format (-o cat) to get the raw content:
journalctl --output-fields=BINARY MESSAGE_ID=12345 -o cat
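To get those bytes into a file from a program, one option is to capture that command's output, for example in Python (binary.dat is a hypothetical file name):
import subprocess

# capture the raw field content selected by the journalctl match
raw = subprocess.run(
    ['journalctl', '--output-fields=BINARY', 'MESSAGE_ID=12345', '-o', 'cat'],
    stdout=subprocess.PIPE, check=True).stdout

# write it out for the decoder to pick up
with open('binary.dat', 'wb') as f:
    f.write(raw)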

PDI - Multiple file input based on date in filename

I'm working on a project using Kettle (PDI).
I have to read multiple .csv or .xls files and insert their contents into a DB.
The file names have the form AAMMDDBBBB, where AA is a city code, MMDD is the date (month and day), and BBBB is a shop code. For example: LA0326F5CA.csv.
The regexp I use in the input file steps looks like LA.*\.csv or DT.*\.xls, which returns all the files to insert into the DB.
Can you tell me how to select only the files for yesterday (based on the MMDD part of the file name)?
As you need some "complex" logic in your selection, you cannot filter based on a regexp alone. I suggest you first read all the file names, then filter them based on their "age", and then read the files based on the selected names.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d....\.csv, to ensure MM and DD are numbers and the shop code BBBB is exactly 4 characters.
Filter based on the date. You could do this with a Java Filter, but it is an order of magnitude easier to use a Modified Java Script Value step to compute the "age" of your file and then a Filter Rows step to keep only the files of yesterday (a sketch of that condition follows these steps).
To compute the age of the file from the extracted MM and DD, you can use something like this (other methods are available):
var regexp = filename.match(/..(\d\d)(\d\d).*/);
if (regexp) {
    // JavaScript months are 0-based, hence the -1 on MM;
    // 2018 is hard-coded here, so adjust it to the current year
    var age = new Date() - new Date(2018, regexp[1] - 1, regexp[2]);
    age = age / 1000 / 60 / 60 / 24;  // milliseconds -> days
}
If you are not familiar with JavaScript regexps: the match tests the file name against the regexp and keeps the values of the parentheses in an array. If the test succeeds (which you must explicitly check to avoid a run-time failure), use the captured values to compute the corresponding date, and subtract it from today's date to get the age. This age is in milliseconds, so it is converted into days.
Use the Text File Input or Excel Input step with the option Accept filenames from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
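The Filter Rows condition mentioned in step 2 can be sketched in the same JavaScript step (a file from yesterday ends up with an age between 1 and 2 days, because age keeps the time-of-day fraction):
// keep only yesterday's files
var is_yesterday = (age >= 1 && age < 2);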
Well, I replaced the Java Filter with a Modified Java Script Value and it works fine now.
Another question: how can I increase the performance and speed of my current transformations (I now have 2 transformations for 2 cities)? My Insert/Update step makes the transformation slow: it needs almost 1 hour and 30 minutes to process 500k rows with a lot of fields (300 MB), and that is not all of my data. If it runs faster and my company likes it, I am going to use it with 10 TB of data per year, which means a lot of transformations and rows. I need suggestions about this.

Export stata graph (data) to Excel?

Is there a simple way to export the "underlying" data of a Stata graph in order to reproduce that graph in MS Excel? Imagine you create a ROC curve using roctab y yhat, graph and you want to reproduce that graph in Excel.
I assume that you do not have access to the actual raw data that was used to compile the .gph in the first place, and somehow want to reverse-engineer the .gph file... then, eek, good luck!
If you do, however, have access to the data originally used, then you can use the new putexcel command available in Stata 13.
A more detailed description of the putexcel command can be found here: Stata press release on exporting tables to Excel.
The data in the .gph file are stored in the serset format between the corresponding begin and end tags. There's no utility I know of that will parse the serset information, but it is very similar to Stata's .dta format (v115 and below). I wrote up the basic file format information here. The Python library pandas has code for reading/writing .dta files, so with that you could probably create your own serset reader/writer.
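As a starting point, here is a minimal pandas sketch of the .dta half of that round trip (the file names are hypothetical); a serset reader would apply the same pattern to the serset layout:
import pandas as pd

# read a Stata dataset and hand it to Excel for re-plotting
df = pd.read_stata('mydata.dta')
df.to_excel('mydata.xlsx', index=False)  # requires openpyxl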