Updating values in an existing RRD file - rrdtool

I have an RRD file working fine under Cacti, drawing a graph from one data source for more than a year.
Today I discovered that the data in the RRD file is not correct. I have corrected the data source, so the problem is fixed going forward. However, I also need to divide the previous data by 2 to correct it historically.
Is it possible to update the values of an rrd file? If yes, how can I achieve this?

Data in the RRAs cannot be directly changed once written.
However, what you can do is dump the data to XML (using rrdtool dump), edit it by some other means, and then load the data into a new RRD file (using rrdtool restore). This of course requires you to script the modification of the values in the XML file, but that is simpler than trying to modify the RRD binary directly.
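As a minimal sketch of that round trip, assuming the rrdtool CLI is on the PATH and using placeholder file names (old.rrd, fixed.xml, fixed.rrd), the stored data points could be halved like this:
import math
import subprocess
import xml.etree.ElementTree as ET

# 1. Dump the existing RRD to XML on stdout.
xml_text = subprocess.run(
    ["rrdtool", "dump", "old.rrd"],
    check=True, capture_output=True, text=True,
).stdout

# 2. Halve every stored data point (the <v> elements), leaving NaN gaps untouched.
#    Note this only rescales the archived rows, not the cdp_prep scratch values.
root = ET.fromstring(xml_text)
for v in root.iter("v"):
    value = float(v.text)
    if not math.isnan(value):
        v.text = format(value / 2.0, ".10e")

# 3. Restore the edited XML into a fresh RRD file.
ET.ElementTree(root).write("fixed.xml", encoding="utf-8", xml_declaration=True)
subprocess.run(["rrdtool", "restore", "fixed.xml", "fixed.rrd"], check=True)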

Related

Can I add a new column without rewriting an entire file?

I've been experimenting with Apache Arrow. I have used column-oriented memory-mapped files for many years. In the past, I've used a separate file for each column. Arrow seems to prefer storing everything in one file. Is there a way to add a new column without rewriting the entire file?
The short answer is probably no.
Arrow's in-memory format and libraries do support this. You can add a chunked array to a table by creating a new table that references the existing column buffers (this should be zero-copy).
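For example, with pyarrow (a sketch; the column names and values here are made up):
import pyarrow as pa

table = pa.table({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# append_column returns a new Table that reuses the existing buffers
# for "a" and "b", so nothing is copied.
extra = pa.chunked_array([["x", "y", "z"]])
wider = table.append_column("c", extra)
print(wider.column_names)  # ['a', 'b', 'c']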
However, it appears you are talking about storing tables in files. None of the common file formats in use (Parquet, CSV, Feather) support partitioning a table in this way.
Keep in mind that if you are reading a Parquet file, you can specify which column(s) you want to read, and only the necessary data will be read from disk. So if your goal is only to support retrieval or querying of individual columns, you can simply build one large table with all your columns.
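Column pruning at read time looks like this in pyarrow (the file name and column names are placeholders):
import pyarrow.parquet as pq

# Only the byte ranges for columns "a" and "c" are read from the file;
# the other columns are never touched.
subset = pq.read_table("data.parquet", columns=["a", "c"])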

Google Dataprep: Save GCS file name as one of the columns

I have a Dataprep flow configured. The Dataset is a GCS folder (all files from it). Target is BigQuery table.
Since the data is coming from multiple files, I want to have the filename as one of the columns in the resulting data.
Is that possible?
UPDATE: There's now a source metadata reference called $filepath which, as you would expect, stores the local path to the file in Cloud Storage (starting at the top-level bucket). You can use it in formulas or add it to a new formula column, and then do anything you want with it in additional recipe steps. (If your data source sample was created before this feature was released, you'll need to generate a new sample to see it in the interface.)
Full notes for these metadata fields are available here: https://cloud.google.com/dataprep/docs/html/Source-Metadata-References_136155148
Original Answer
This is not currently possible out of the box. If you're manually merging datasets with a UNION, you could first process each one to add a column with its source, so that the information is present in the combined output.
If you're bulk-ingesting files, that doesn't help, but there is an open feature request that you can comment on and/or follow for updates:
https://issuetracker.google.com/issues/74386476

Save GeoTools query result in a shapefile/CSV file

I have implemented the GeoTools Query tutorial. Can someone tell me how I can apply a query to two shapefiles at a time and save the results in a new shapefile or a CSV file?
Or can I even save the query results of a single shapefile, as done in the Query tutorial, to a new shapefile or CSV file?
You can apply your Query to as many shapefiles as you like: just loop through them and keep adding to the feature collection where you store the answers.
So in pseudocode:
collection = new List
for file in shapefiles:
    collection.add(file.queryResult(query))
writeShapefile(collection)
You can use a normal Java ArrayList (or Set) to store the results and then wrap it in a ListFeatureCollection to write them out.
FeatureCollection features = new ListFeatureCollection(schema, feats);
Caveat: all your shapefiles must have the same schema (or your Query must extract only the attributes they have in common). See my answer to this question for an example of shapefile writing.

Deleting a record within a binary file?

I've been messing around with binary files for the last week. One thing I can't figure out is how to selectively delete a record from a binary file.
Is this possible in C++? If not, what ways are there to do this?
I'm considering creating a duplicate file without the record in it, but that seems rather inefficient for very large data files. Are there more efficient ways, or at least a way to duplicate a file efficiently? I'm surprised there are ways to delete files, but no way to selectively delete from a file.
Unlike data structures in memory, you can't pull out a record from a file. You have to:
Read the contents of the file into an in-memory data structure.
Remove the relevant data from that in-memory structure.
Write the updated data structure back to the file.
Alternatively, make sure every record has a "deleted" flag, which you set to false when you first write the record and change to true when you want to mark the record as deleted.
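A sketch of that flag ("tombstone") approach with fixed-size records, shown here in Python for brevity; the same layout maps directly onto a packed C++ struct read and written with fstream, and the record fields below are made up:
import struct

# One fixed-size record: 1-byte deleted flag, 20-byte name, 4-byte signed value.
RECORD = struct.Struct("<?20si")

def append_record(path, name, value):
    with open(path, "ab") as f:
        f.write(RECORD.pack(False, name.encode(), value))

def mark_deleted(path, index):
    # Overwrite just the flag byte of record `index`; the file is never rewritten.
    with open(path, "r+b") as f:
        f.seek(index * RECORD.size)
        f.write(struct.pack("<?", True))

def live_records(path):
    with open(path, "rb") as f:
        while len(chunk := f.read(RECORD.size)) == RECORD.size:
            deleted, name, value = RECORD.unpack(chunk)
            if not deleted:
                yield name.rstrip(b"\0").decode(), value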

Is it possible to validate the column order when uploading data from flat files using the AWS COPY command

I'm uploading data from zipped flat files to Redshift using the COPY command. I would like to understand whether there is any way to validate that the column order of the files is correct (for example, if the fields are all varchar, the data could be loaded into the wrong columns).
The COPY command documentation shows that you can specify the column order, but not for flat files. I was wondering whether there are other approaches that would let me check how the columns have been supplied (for example, loading only the header row into a dummy table to check, but that doesn't seem to be possible).
You can't really do this inside Redshift. COPY doesn't provide any options to only load a specific number of rows or perform any validation.
Your best option would be to do this in the tool where you schedule the loads. You can get the first line from a compressed file easily enough (zcat < file.z | head -1), but for a file on S3 you may have to download the whole thing first.
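A sketch of that check done from the scheduling side, assuming gzip-compressed CSV files with a header row and boto3 available (the bucket, key, and expected column list are placeholders):
import gzip
import io
import boto3

EXPECTED = ["id", "name", "amount", "created_at"]  # placeholder column order

def header_matches(bucket, key, expected=EXPECTED):
    # Download the compressed file, then do the zcat | head -1 equivalent
    # and compare the header row against the expected column order.
    s3 = boto3.client("s3")
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as f:
        header = [h.strip().lower() for h in f.readline().split(",")]
    return header == [c.lower() for c in expected]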
FWIW, the process generating the load file should be fully automated in such a way that the column order can't change. If these files are being prepared manually, you're asking for all sorts of trouble.