I am trying to write an HDF5 file from C++. The file basically contains a large time-series matrix in the following format:
TimeStamp Property1 Property2
I have managed to write the data successfully: I created a dataset (dset) and used the H5Dwrite function.
Now my question is how to create a file header. In other words, I want to write the following array to the file...
['TimeStamp', 'Property1', 'Property2']
...and tag it to the columns for ease of later use (I am planning to analyze the matrix in Python). How can I do that?
I tried to use H5Dwrite to write a string array but it failed. I guess it wanted consistent datatypes, so it only accepted floats, which is the datatype of my data. Then I read about this metadata thing, but I am a bit lost as to how to use it. Any help would be much appreciated.
A related side question: can the first row of a matrix be a string and the other rows contain doubles?
Clean solution(s)
If you store your data as a 1D array of a compound datatype with members TimeStamp, Property1, Property2, etc., then the field names will be stored as metadata and it should be easy to read in Python.
I think there is another clean option, HDF5's Table Interface, but I will only mention it since I have never used it myself. Read the docs to see if you would prefer it.
Direct answers to your question
Now the dirty options: you could add string attributes to your existing dataset. There are multiple ways to do that: a single string attribute with all the field names separated by semicolons, or one attribute per column. I don't recommend either, since both would be terribly non-standard.
A related side question: can the first row of a matrix be a string and the other rows contain doubles?
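No. Every element of an HDF5 dataset has the same datatype, so a row of strings cannot sit above rows of doubles in the same dataset; the column names belong in the datatype (as with a compound type) or in attributes.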
Example using a compound datatype
Assuming you have a struct defined like this:
struct Point { double timestamp, property1, property2; };
and a vector of Points:
std::vector<Point> points;
as well as a dataset dset and appropriate memory and file dataspaces, then you can create a compound datatype like this:
H5::CompType type(sizeof(Point));
type.insertMember("TimeStamp", HOFFSET(Point, timestamp), H5::PredType::NATIVE_DOUBLE);
type.insertMember("Property1", HOFFSET(Point, property1), H5::PredType::NATIVE_DOUBLE);
type.insertMember("Property2", HOFFSET(Point, property2), H5::PredType::NATIVE_DOUBLE);
and write data to file like this:
dset.write(points.data(), type, mem_space, file_space);
Related
Multi-index column csv is
Its size is (8, 8415).
This CSV file was made from a pandas multi-index DataFrame (Python).
Its columns are [codes X financial items].
codes are
financial items are
How can I use this CSV file so that its year (2014, 2015, ...) is the index and codes × financial items are the multi-level columns?
What kind of output you want is unclear. There are not many libraries that imitate pandas in C++. A very messy, convoluted and inelegant way of doing it is to declare a structure and then put it into a list. Something like:
#include <string>

struct dataframe {
    double data;       // cell value
    int year;          // row index
    int code;          // first column level
    std::string item;  // second column level (financial item)
};
Make a list of this structure, either with a custom class or with the standard library's std::list.
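A minimal sketch of that idea, assuming one struct per table cell and a simple linear lookup. `find_cell` is a hypothetical helper name, not part of any library.

```cpp
#include <list>
#include <string>

// One cell of the table: the value at (year, code, financial item).
struct dataframe {
    double data;       // cell value
    int year;          // row index, e.g. 2014
    int code;          // first column level
    std::string item;  // second column level (financial item)
};

// Linear search for one (year, code, item) cell; returns nullptr if absent.
const dataframe* find_cell(const std::list<dataframe>& table,
                           int year, int code, const std::string& item) {
    for (const dataframe& cell : table) {
        if (cell.year == year && cell.code == code && cell.item == item) {
            return &cell;
        }
    }
    return nullptr;
}
```

With 8 × 8415 = 67,320 cells, a linear scan gets slow if it runs often; a `std::map` keyed on `std::tuple<int, int, std::string>` would give logarithmic lookups at the cost of more memory.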
If you can provide a more detailed explanation of what kind of data structure you want in the program or what do you want to do with it, I would try to provide a better solution.
I am simulating an SQL DELETE command in C++. I've tried every data type I could think of to temporarily store the data from a certain table (a .bin file in my case), but none of them worked.
If I store the entire file at once in a char buffer[5000], the text is fine, but the integers are somehow not stored as readable text: in the buffer they look like " }+;-." and so on.
If I read the values one by one and append each one to a similar buffer with strncat (converting integers with itoa() where needed), they come out perfectly, but then I can't navigate the buffer using fixed data-type sizes.
I would really appreciate a suggestion on this. I've literally searched the whole internet by now. Thank you!
I've come across a problem and have no idea how to approach it. After a couple of days of searching for an answer, I decided to start my own thread.
I'm trying to create a simple database whose data structure is dictated by a configuration file. Let me give an example:
The program initially has a structure of a couple of strings:
#include <string>

struct Data
{
    std::string name, surname, id;
};
The config file:
VAR NAME TYPE MAX LENGTH
NAME STRING 30
SURNAME STRING 30
ID INT 10
The program is supposed to open the file, check the type of each variable and, if needed, change struct Data accordingly or create a new one. Opening the file and navigating in it is easy, but I have no clue how to check the types.
I really hope I described the problem clearly.
Thanks for any hints!
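Store the data as strings. A struct's members are fixed at compile time, so the program cannot add or retype members of struct Data at runtime. Instead, implement a way for the program to convert the strings to the types it needs, for example to integers, doubles, or floats.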
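A minimal sketch of that approach, assuming the config format shown in the question (name, type, max length per line, with the header line skipped) and only the STRING and INT types. Each row is a map from field name to string, values are validated against the schema before they are stored, and `std::stoi` converts them back when a number is needed. `FieldSpec`, `Row`, `parse_config_line`, `check_value` and `get_int` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// One field description from the config file, e.g. "ID INT 10".
struct FieldSpec {
    std::string type;        // "STRING" or "INT" in this sketch
    std::size_t max_length;  // maximum number of characters
};

// A row stores every value as a string, keyed by field name.
using Row = std::map<std::string, std::string>;

// Parse one config line such as "ID INT 10" into (name, spec).
// The header line "VAR NAME TYPE MAX LENGTH" must be skipped by the caller.
std::pair<std::string, FieldSpec> parse_config_line(const std::string& line) {
    std::istringstream in(line);
    std::string name;
    FieldSpec spec{};
    if (!(in >> name >> spec.type >> spec.max_length)) {
        throw std::invalid_argument("malformed config line: " + line);
    }
    return {name, spec};
}

// Check a raw value against its spec before storing it.
// Throws std::invalid_argument (or std::out_of_range from std::stoi).
void check_value(const FieldSpec& spec, const std::string& value) {
    if (value.size() > spec.max_length) {
        throw std::invalid_argument("value too long: " + value);
    }
    if (spec.type == "INT") {
        std::size_t used = 0;
        std::stoi(value, &used);  // throws if there is no leading number
        if (used != value.size()) {
            throw std::invalid_argument("not an integer: " + value);
        }
    }
}

// Convert a stored INT field back to a number when it is needed.
int get_int(const Row& row, const std::string& field) {
    return std::stoi(row.at(field));
}
```

Other types, such as a DOUBLE column checked with `std::stod`, can be added as another branch in `check_value`.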
RocksDB: Multiple values per key (C++)
What I am trying to do
I am trying to adapt my simple blockchain implementation to save the blockchain to the hard drive periodically, so I looked into different DB solutions. I decided to use RocksDB due to its ease of use and good documentation and examples. I read through the documentation but could not figure out how to adapt it to my use case.
I have a class Block:
#include <cstdint>
#include <string>
#include <vector>

// blockheader and tx_data are defined elsewhere in the project.
class Block {
public:
    std::string PrevHash;
private:
    blockheader header;                 // The header of the block
    std::uint32_t index;                // Height of this block
    std::vector<tx_data> transactions;  // All transactions in the block
    std::string hash;                   // The hash of the block
    std::uint64_t timestamp;            // The time this block was created by the node
    std::string data;                   // Extra data appended to the block (e.g. text or a smart contract);
                                        // the larger this field, the higher the fee; max size is defined in config.h
};
It contains a few fields and a vector of struct tx_data. I want to store this data in a RocksDB database.
What I have tried
After Google failed to return any results on storing multiple values under one key, I decided I would just have to enclose each block's data with 0xa1 at the beginning and 0x2a at the end:
0xa1
header
index
txns
hash
timestamp
data
0x2a
But I decided there was surely a simpler way. I tried looking at the code used by TurtleCoin, a cryptocurrency that uses RocksDB for its database, but the code there is practically indecipherable. I have heard about serialization, but there seems to be little information out there on it.
Perhaps I am misunderstanding the use of a DB?
You need to serialize it. Serialization is the process of taking a structured set of data and turning it into a single string, number or vector of bytes that can later be deserialized back into that struct. One method would be to use the hash of the block as the key in the DB, then create a new struct that does not contain the hash. Then write a function that takes a Block struct and a path, constructs a BlockNoHash struct and saves it, and another function that reads a block by its hash and returns a Block struct. At the most basic level you could separate the fields with a character that never occurs in the data (e.g. ` or |), though this means that if one piece of the data is corrupted, you can't recover any of the other data.
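A minimal sketch of the delimiter idea, assuming a simplified block with only numeric and string fields and that `|` never appears in the data. `BlockNoHash`, its fields, `serialize` and `deserialize` are hypothetical; the real `Block` also holds a `std::vector<tx_data>`, which would need its own encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified block without its hash (the hash becomes the DB key).
struct BlockNoHash {
    std::uint32_t index;
    std::uint64_t timestamp;
    std::string prev_hash;
    std::string data;
};

// Join the fields with '|'. Assumes no field ever contains '|'.
std::string serialize(const BlockNoHash& b) {
    std::ostringstream out;
    out << b.index << '|' << b.timestamp << '|' << b.prev_hash << '|' << b.data;
    return out.str();
}

// Split on '|' and rebuild the struct; returns false if the field count is wrong.
// std::stoul / std::stoull throw if a numeric field is not a number.
bool deserialize(const std::string& s, BlockNoHash& b) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t bar = s.find('|', start);
        parts.push_back(s.substr(start, bar == std::string::npos ? std::string::npos : bar - start));
        if (bar == std::string::npos) break;
        start = bar + 1;  // an empty last field (e.g. "...|") is kept as ""
    }
    if (parts.size() != 4) return false;
    b.index = static_cast<std::uint32_t>(std::stoul(parts[0]));
    b.timestamp = std::stoull(parts[1]);
    b.prev_hash = parts[2];
    b.data = parts[3];
    return true;
}
```

With RocksDB, the resulting string would be stored under the block hash, e.g. `db->Put(rocksdb::WriteOptions(), hash, serialize(b))`, and read back with `db->Get(rocksdb::ReadOptions(), hash, &value)`. A length-prefixed binary format or a framework such as Protobuf avoids the reserved-character restriction.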
There are two related questions here.
One is: how do you store complex data -- more than just a simple integer or string -- within a key-value store like RocksDB. As Leo says, you need to serialize them.
Rather than writing your own code, the typical easier way is to use a framework like Protobuf or Thrift to generate code to translate between your in-memory structures and a flat bytes representation suitable to store in a database (or send over the network.)
A related question, from the title: how do you store multiple values per key?
There are two main options:
Use a compound key that distinguishes the various values. By walking a key prefix you can find all the values in a set of related keys. This is better if the values get very large, or if you want to find and update them independently.
Or, make the value for a single key actually be a compound object that includes several inner values. This is easiest if you always want to fetch all the sub-values in a single operation.
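A sketch of the compound-key option. To keep it self-contained it uses a `std::map` as a stand-in for RocksDB, since both keep keys in sorted order; the key scheme `<block hash>/<field>` and the helper names are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for a sorted key-value store such as RocksDB.
using KvStore = std::map<std::string, std::string>;

// Build a compound key "<block hash>/<field>", e.g. "ab12/timestamp".
std::string make_key(const std::string& block_hash, const std::string& field) {
    return block_hash + "/" + field;
}

// Walk every key that starts with prefix, in sorted order.
std::vector<std::pair<std::string, std::string>>
scan_prefix(const KvStore& db, const std::string& prefix) {
    std::vector<std::pair<std::string, std::string>> out;
    for (auto it = db.lower_bound(prefix); it != db.end(); ++it) {
        if (it->first.compare(0, prefix.size(), prefix) != 0) break;  // left the prefix range
        out.push_back(*it);
    }
    return out;
}
```

Including the trailing `/` in the prefix keeps a scan for `ab12/` from matching keys of a block whose hash merely starts with `ab12`. Against a real RocksDB handle the same loop is usually written with an iterator: `it->Seek(prefix)`, then continue while `it->Valid() && it->key().starts_with(prefix)`, calling `it->Next()`.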
I have been trying to figure out the best way to do this but haven't quite found an answer yet. I have a float array full of data collected from an inertial sensor, and I would like to put it into the right format and output it to a CSV file. I'm using an mbed microcontroller with a local file system to store the file. Getting the format right is the part that confuses me at the moment.
I'd like my gyroscope/accelerometer values to be displayed in rows such as:
gx1, gy1, gz1, ax1, ay1, az1
gx2, gy2, gz2, ax2, ay2, az2
gx3, gy3, gz3, ax3, ay3, az3
I think these values first need to be converted to char before being written to the file, so I will need to do that and store them in a new array of type char. That's where I get confused: I don't just want to copy the data into this new array all at once (I'm thinking of using a for loop and sprintf()); I also want it formatted as shown above, with the right breaks between rows.
The function that writes the contents of the array to the file takes the array, the size of its element type, the array size and the file object:
fwrite(converted_array, sizeof(char), sizeof(converted_array), FileObject);
What would be the best way to make sure that the array content is formatted like I want it to be?