I have an RRD database to which I have been writing power consumption data; however, I recently found that it is 10 times too large.
Is there an easy way to rewrite all of the values in it?
Obviously I could export it as XML and modify it, but that is very tedious.
If by '10 times too large' you simply mean that the RRAs are too long, then you can use the command 'rrdtool resize rrdfile.rrd 1 shrink 100' to shrink RRA number 1 by 100 rows in the file rrdfile.rrd (take a backup first!). Note that you'll have to run this for each RRA that needs to be resized; use 'rrdtool info' to find out which RRAs are defined.
See 'rrdtool help resize' for more details, or see the manual page.
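For example, a rough shell sketch of that per-RRA loop might look like this (the row count and file name are placeholders, the RRA indices follow the rra[N] numbering that 'rrdtool info' prints, and resize writes its output to resize.rrd):

    cp rrdfile.rrd rrdfile.rrd.bak                      # take a backup first!
    rra_count=$(rrdtool info rrdfile.rrd | grep -c '\.rows =')
    for i in $(seq 0 $((rra_count - 1))); do
        rrdtool resize rrdfile.rrd "$i" SHRINK 100      # adjust the row count per RRA to taste
        mv resize.rrd rrdfile.rrd                       # resize leaves its result in ./resize.rrd
    done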
If you want to add or remove DSs or add/remove RRAs entirely, then the only way to do it is to export to XML, modify the XML, and reimport. There is an 'rrdmerge' utility in the utils directory of v2.23beta of Routers2 that can help with other more drastic changes.
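If, on the other hand, the problem is the one in the question (the stored readings themselves are ten times too large), the XML round trip can be scripted instead of edited by hand. A rough sketch, assuming plain numeric <v> elements (NaN entries are left untouched) and illustrative file names:

    rrdtool dump power.rrd power.xml                              # export everything to XML
    perl -pe 's{<v>\s*([-+0-9.eE]+)\s*</v>}{"<v>" . $1/10 . "</v>"}ge' \
        power.xml > power_scaled.xml                              # divide each stored reading by 10
    rrdtool restore power_scaled.xml power_fixed.rrd              # re-import into a fresh RRD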
What I'm trying to do is deposit blocks of 128 MB into HDFS. I've been trying several processors but can't find the right one, or I haven't identified the correct property:
This is pretty much what the flow looks like:
Right now I'm using PutParquet, but this processor doesn't have a property to do that.
The processor before it is a MergeContent, and this is its configuration:
and on the SplitAvro I have the following configuration:
I hope someone can help; I'm really stuck trying to do this.
You shouldn't need the SplitAvro or ConvertAvroToJSON. If you use MergeRecord instead, you can supply an AvroReader and a JsonRecordSetWriter and it will do the conversion for you. If you know the approximate number of records that will fit in an HDFS block, you can set that as the Maximum Number of Entries and also the Max Group Size. Keep in mind that those are soft limits, though, so you might want to set it to something safer like 100 MB.
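For reference, the merge settings might look roughly like this (the exact property names and values vary a bit between NiFi versions and between MergeContent and MergeRecord, so treat this as a sketch rather than exact defaults):

    MergeRecord
      Record Reader              AvroReader
      Record Writer              JsonRecordSetWriter
      Merge Strategy             Bin-Packing Algorithm
      Minimum Number of Records  100000        (rough guess at records per 128 MB block)
      Maximum Number of Records  150000
      Minimum Bin Size           100 MB        (stay safely under the HDFS block size)
      Maximum Bin Size           120 MB
      Max Bin Age                5 min         (so partial bins eventually flush)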
When you tried with your flow from the description, what did you observe? Were the files still too big, or did it not seem to obey the min/max limits, etc.?
Say I have a BigQuery table that contains 3M rows, and I want to export it to GCS.
What I do is the standard bq extract <flags> ... <project_id>:<dataset_id>.<table_id> gs://<bucket>/file_name_*.<extension>
I am bound by a limit on the number of rows a file (part) can have. Is there a way to set a hard limit on the size of a file part?
For example, can I require each partition to be no larger than 10 MB, or, even better, set the maximum number of rows allowed to go into a file part? The documentation doesn't seem to mention any flags for this purpose.
You can't do it with the BigQuery extract API.
But you can script it (perform an export of thousands of rows at a time in a loop), although you will have to pay for the processed data (the extract itself is free!). You can also set up a Dataflow job for this (but that isn't free either!).
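A rough sketch of that loop with the bq CLI, assuming plain-named tables in your default project and an illustrative limit of 100,000 rows per part (every slice query scans the table again, which is what you pay for, and you need an ORDER BY if the slices have to be deterministic):

    ROWS_PER_FILE=100000
    TOTAL_ROWS=$(bq --quiet query --use_legacy_sql=false --format=csv \
        'SELECT COUNT(*) FROM my_dataset.my_table' | tail -n 1)
    PARTS=$(( (TOTAL_ROWS + ROWS_PER_FILE - 1) / ROWS_PER_FILE ))

    for i in $(seq 0 $((PARTS - 1))); do
        # materialise one slice into a helper table (billed), then extract it (free)
        bq --quiet query --use_legacy_sql=false \
            --destination_table "my_dataset.slice_${i}" --replace \
            "SELECT * FROM my_dataset.my_table LIMIT ${ROWS_PER_FILE} OFFSET $(( i * ROWS_PER_FILE ))"
        bq --quiet extract "my_dataset.slice_${i}" "gs://my_bucket/file_name_${i}_*.csv"
    done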
We need to create a graph of the top 10 items, which will change from time to time (for example, the top 10 processes consuming CPU, or any other top 10 items we can generate values for on the monitored server), with the possibility of showing the names of the items on the graph.
Please tell me, is there any way to store this information using rrdtool?
Thanks
If you want to store this kind of information with rrdtool, you will have to create a separate rrd database for each item, update them accordingly, and finally generate the charts by picking the 10 'top' rrd files ...
In other words, quite a lot of the magic has to happen in the script you write around rrdtool ... rrdtool will take care of storing the time series data ...
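As a concrete (and deliberately rough) sketch of the collection side, assuming the items are CPU usage per process name on a Linux box with GNU ps, and that paths, step and RRA sizes are just placeholders:

    mkdir -p /var/lib/toprrd
    ps -eo comm,%cpu --no-headers |
    awk '{cpu[$1] += $2} END {for (p in cpu) print p, cpu[p]}' |   # sum CPU per process name
    while read -r name cpu; do
        safe=$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_')     # make the name file-safe
        rrd="/var/lib/toprrd/${safe}.rrd"
        [ -f "$rrd" ] || rrdtool create "$rrd" --step 60 \
            DS:cpu:GAUGE:120:0:U \
            RRA:AVERAGE:0.5:1:1440                                 # one day of 1-minute samples
        rrdtool update "$rrd" "N:${cpu}"
    done

The 'top 10' selection then happens at graph time: read the latest value of each .rrd (e.g. with rrdtool lastupdate), sort, and build the rrdtool graph command with one DEF/LINE per winner.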
I am running a NetLogo model in BehaviorSpace, varying the number of runs each time. I have a turtle breed of pigs, and they accumulate a table with patch types as keys and the number of visits to each patch type as values.
In the end I calculate a list of the mean number of visits across all pigs. The list has the same length as long as the original table has the same number of keys (number of patch types). I would like to export this mean number of visits to each patch type with BehaviorSpace.
Perhaps I could write a separate CSV file (I tried; it creates many files, so there is a lot of work later putting them together), but I would rather have everything in the same output file after a run.
I could make a global variable for each patch type, but this seems crude and wrong, especially if I upload a different patch configuration.
I tried just exporting the list, but then in Excel I see it with brackets, e.g. [49 0 31.5 76 7 0].
So my questions. Q1: is there a proper way to export a list of values so that the BehaviorSpace table output CSV has a column for each value?
Q2: Or perhaps there is an example of how to output a single CSV that looks exactly the way I want from BehaviorSpace?
PS: In my case the patch types are costs, and I might change those in the future and rerun everything. Ideally I would like to have as output a graph of costs vs. frequency of visits.
Thanks
If the lists are a fixed length that doesn't vary from run to run, you can get the items into separate columns by using one metric for each item. So in your BehaviorSpace experiment definition, instead of putting mylist, put item 0 mylist and item 1 mylist and so on.
If the lists aren't always the same length, you're out of luck. BehaviorSpace isn't flexible that way. You would have to write a separate program (in the programming language of your choice, perhaps NetLogo itself, perhaps an Excel macro, perhaps something else) to postprocess the BehaviorSpace output and make it look how you want.
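For the postprocessing route, here is a rough shell sketch (the file name is illustrative, and it assumes the only quoted fields containing spaces and square brackets are the exported lists):

    # split each quoted NetLogo list such as "[49 0 31.5 76 7 0]" into one column per item
    awk -F'"' 'BEGIN { OFS = "\"" } {
        for (i = 2; i <= NF; i += 2) {        # quoted field contents sit at even positions
            if ($i ~ /^\[.*\]$/) {            # looks like a NetLogo list
                gsub(/\[|\]/, "", $i)         # drop the brackets
                gsub(/ /, "\",\"", $i)        # each space becomes "," so items become columns
            }
        }
        print
    }' experiment-table.csv > experiment-table-split.csv

You will probably still want to adjust the header row by hand afterwards, since the single list column turns into several columns.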
I have downloaded the yago.n3 dataset.
However, for testing I wish to work on a smaller version of the dataset (as the full dataset is 2 GB); even when I make a small change, it takes me a lot of time to debug.
Therefore, I tried to copy a small portion of the data into a separate file; however, this did not work and threw lexical errors.
I saw the earlier posts, but they are about big datasets, whereas I am searching for smaller ones.
Is there any means by which I may obtain a smaller amount of the same dataset?
If you have an RDF parser at hand to read your yago.n3 file, you can parse it and write to a separate file as many RDF triples as you want/need for your smaller dataset to run your experiments with.
If you find some data in N-Triples format (i.e. one RDF triple per line), you can just take as many lines as you want and make the dataset as small as you like: head -n 10 filename.nt would give you a tiny dataset of 10 triples.
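For example, with the rapper tool from the Raptor RDF library (any parser that can re-serialise to N-Triples would do; the file names are illustrative):

    rapper -i guess -o ntriples yago.n3 > yago.nt     # one complete triple per line
    head -n 10000 yago.nt > yago-sample.nt            # keep the first 10,000 triples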