Runtime Error in HDF5 file manipulation - c++

I am writing a program that converts an array of structures to a byte array and then saves it to an HDF5 dataset multiple times. (The dataset has a dimension of 100, so I do the write operation 100 times.) I don't have any problem converting the structures to a byte array; I seem to run into a problem when I try to select the hyperslab where the data should be written in the dataset. I am new to HDF5. Please help me with this problem.
#include "stdafx.h"
#include "h5cpp.h"
#include <iostream>
#include <conio.h>
#include <string>
#ifndef H5_NO_NAMESPACE
using namespace H5;
#endif
using std::cout;
using std::cin;
using std::string;
const H5std_string fName( "dset.h5" );
const H5std_string dsName( "dset" );
struct MyStruct
{
int x[1000],y[1000];
double z[1000];
};
int main()
{
try
{
MyStruct obj[10];
char* totalData;
char* inData;
hsize_t offset[1],count[1];
H5File file("sample.h5", H5F_ACC_TRUNC);
StrType type(PredType::C_S1,100*sizeof(obj));
Group *myGroup = new Group(file.createGroup("\\myGroup"));
hsize_t dim[] = {100};
DataSpace dSpace(1,dim);
DataSet dSet = myGroup->createDataSet("dSet", type, dSpace);
for(int m = 0; m < 100 ; m++)
{
for(int j = 0 ; j < 10 ; j++)
{
for(int i = 0 ; i < 1000 ; i++) // some random values stored
{
obj[j].x[i] = i*13 + i*19;
obj[j].y[i] = i*37 - i*18;
obj[j].z[i] = (i + 1) / (0.4 * i);
}
}
totalData = new char[sizeof(obj)]; // converting struct to byte array
memcpy(totalData, &obj, sizeof(obj));
cout<<"Start Write.\n";
cout<<"Total Size : "<<sizeof(obj)/1000<<"KB\n";
//Exception::dontPrint();
hsize_t dim[] = { 1 }; //I think am screwing up between this line and following 5 lines
DataSpace memSpace(1, dim);
offset[0] = m;
count[0] = 1;
dSpace.selectHyperslab(H5S_SELECT_SET, count, offset);
dSet.write(totalData, type, memSpace, dSpace);
cout<<"Write Done.\n";
cout<<"Read Start.\n";
inData = new char[sizeof(obj)];
dSet.read(inData, type);
cout<<"Read Done\n";
}
delete myGroup;
}
catch(Exception e)
{
e.printError();
}
_getch();
return 0;
}
The Output I get is,
And when I use H5S_SELECT_APPEND instead of H5S_SELECT_SET, the output says
Start Write.
Total Size : 160KB
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
#000: ..\..\src\H5Shyper.c line 6611 in H5Sselect_hyperslab(): unable to set hyperslab selection
major: Dataspace
minor: Unable to initialize object
#001: ..\..\src\H5Shyper.c line 6477 in H5S_select_hyperslab(): invalid selection operation
major: Invalid arguments to routine
minor: Feature is unsupported
Please, help me with this situation. Thanks in advance..

The main problem is the size of your type datatype. It should be sizeof(obj) and not 100*sizeof(obj).
And anyway, you shouldn't be using a string datatype but an opaque datatype since that's what it is, so you can replace this whole line by:
DataType type(H5T_OPAQUE, sizeof(obj));
The second problem is in the read. Either you read everything and you need to make sure inData is big enough, that is 100*sizeof(obj) instead of sizeof(obj), or you need to select just the element you want to read just like for the write.
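To make the two sizes concrete, here is a small sketch of the arithmetic only, with no HDF5 calls. It reuses the question's MyStruct; the names kElementBytes, kDatasetElements, kFullReadBytes and elementByteOffset are made up for this illustration and are not part of any API.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct in the question.
struct MyStruct
{
    int x[1000], y[1000];
    double z[1000];
};

// One dataset element holds a whole MyStruct[10], so the element datatype
// must be exactly this many bytes (not 100 times as many).
constexpr std::size_t kElementBytes = sizeof(MyStruct[10]);

// The dataspace in the question has 100 elements.
constexpr std::size_t kDatasetElements = 100;

// Reading the entire dataset in one call needs room for every element.
constexpr std::size_t kFullReadBytes = kDatasetElements * kElementBytes;

// Byte offset of element m inside a buffer that holds the full dataset.
constexpr std::size_t elementByteOffset(std::size_t m)
{
    return m * kElementBytes;
}
```

A buffer of kFullReadBytes is what a whole-dataset read would need; a buffer of kElementBytes is only enough when a single element is selected for the read.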

Related

Reading H5T_STRING datatype via C++ HDF5 API

The dataset I have is:
DATASET "/test_dataset" {
DATATYPE H5T_STRING {
STRSIZE 18;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
DATA {
(0): "Test_String_1\000\000\000\000\000\000\000\000",
(1): "Test_String_2\000\000\000\000", "Test_String_3",
(3): "Test_String_4\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
(4): "Test_String_5\000\000\000\000\000\000", "Test_String_6\000\000"
}
}
I have been trying to read it as follows:
std::vector<std::string> temp_container;
const H5std_string DATASET_NAME("/test_dataset");
H5::DataSet dataset= h5_file.openDataSet(DATASET_NAME);
H5::DataSpace dataspace= dataset.getSpace();
ndims = dataspace.getSimpleExtentDims(dims_out, NULL);
temp_container.resize(dims_out[0]);
H5::StrType datatype= dataset.getStrType();
dataset.read(&temp_container[0], datatype, dataspace);
I also tried to read it via native H5::PredTypes, but I couldn't find any types that are related to string.
First, I don't think a variable declared as std::vector<std::string> can be used as the read buffer for a string dataset: HDF5 writes raw bytes into the buffer, and a std::string object is not a plain character array. For a variable-length string dataset, the buffer needs to be declared as char **.
Second, looking at the layout (i.e. the dump) of the dataset above, it is a one-dimensional dataset (size 6) of fixed-length (size 18) strings. Therefore, the variable needs to be declared as char temp_container[6][18];.
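Once the bytes are in a fixed-size char buffer, they still have to be turned into std::string values. The following is a minimal sketch of that step only, assuming NULLPAD strings as in the dump above; the function name unpackFixedStrings is hypothetical and the code makes no HDF5 calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a flat buffer of `count` fixed-width records into std::string values.
// Each record is `width` bytes and may be padded with '\0' on the right
// (H5T_STR_NULLPAD); a record that fills all `width` bytes has no terminator.
std::vector<std::string> unpackFixedStrings(const char* flat,
                                            std::size_t count,
                                            std::size_t width)
{
    assert(flat != nullptr || count == 0);
    std::vector<std::string> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        const char* record = flat + i * width;
        std::size_t len = 0;
        while (len < width && record[len] != '\0')
            ++len;                       // stop at the first pad byte
        out.emplace_back(record, len);   // copies exactly len characters
    }
    return out;
}
```

With the question's dataset, flat would be &temp_container[0][0] after reading into char temp_container[6][18], with count 6 and width 18.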
If you are not bound to a specific library, take a look at HDFql, as it spares you most of HDF5's low-level details. Using HDFql in C++, the dataset above can be read as follows (assuming it is stored in the HDF5 file test.h5):
// declare variables
char temp_container[6][18];
char script[100];
int i;
int j;
// prepare script to read dataset 'test_dataset' (from HDF5 file test.h5) and populate variable 'temp_container' with it
sprintf(script, "SELECT FROM test.h5 test_dataset INTO MEMORY %d", HDFql::variableTransientRegister(temp_container));
// execute script
HDFql::execute(script);
// display content of variable 'temp_container'
for(i = 0; i < 6; i++)
{
for(j = 0; j < 18; j++)
{
std::cout << temp_container[i][j];
}
std::cout << std::endl;
}

Reading a string array HDF5 Attribute in C++

I have working C++ code that writes HDF5 data with the column names stored in an attribute. I can successfully read and process the data in Matlab, but I am trying to create a C++ reader. It reads the data OK, but when I attempt to read the header, I only get the first column name.
A snippet of the attribute creation process looks like:
// Snip of working code during the creation/recording of a DataSet named mpcDset:
std::vector<std::string> lcFieldnames;
lcFieldnames.clear();
lcFieldnames.push_back("Field1");
lcFieldnames.push_back("Field2");
lcFieldnames.push_back("Field3");
uint lnMaxStringLen = 10;
uint lnNumFields = lcFieldnames.size();
char* lpnBuffer = new char[lnNumFields*lnMaxStringLen];
memset((void*)lpnBuffer,0,lnNumFields*lnMaxStringLen);
int lnCount = 0;
for (auto& lnIndex : lcFieldnames)
{
lnIndex.copy(lpnBuffer + (lnCount *
lnMaxStringLen), lnMaxStringLen -1);
lnCount++;
}
hsize_t lpnHwriteDims[] = { lnNumFields, lnMaxStringLen };
H5::DataSpace lcAdspace(2, lpnHwriteDims, NULL);
H5::Attribute lcAttr = mpcDset->createAttribute(
std::string("header"),
H5::PredType::NATIVE_CHAR, lcAdspace);
lcAdspace.close();
lcAttr.write(H5::PredType::NATIVE_CHAR, lpnBuffer);
lcAttr.close();
delete [] lpnBuffer;
The code in question looks like:
// In another program, given an opened DataSet named mpcDset:
H5::Attribute lcAttr = mpcDset.openAttribute("header");
H5::DataType lcType = lcAttr.getDataType();
hsize_t lnSize = lcAttr.getStorageSize();
char* lpnBuffer = new char[lnSize];
lcAttr.read(lcType, lpnBuffer);
for (uint i=0;i<lnSize; i++)
{
std::cout<<lpnBuffer[i];
}
std::cout<<std::endl;
delete [] lpnBuffer;
lcAttr.close();
lnSize is large enough for all three fields (through inspection), but only "Field1" is output. Any suggestions as to what I am doing wrong?
Personally, to create an attribute that is a list of strings in C++, I do something similar to the following.
This code writes an attribute made of 3 strings, then reads each of them back.
#include "H5Cpp.h"
#ifndef H5_NO_NAMESPACE
using namespace H5;
#endif
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
using std::string;
using std::vector;
using std::cout;
using std::endl;
int main(int argc, char *argv[])
{
//WRITE ATTRIBUTE
{
try
{
//Example:
//Suppose that in the HDF5 file: 'myH5file_forExample.h5' there is a dataset named 'channel001'
//In that dataset we will create an attribute named 'Column_Names_Attribute'
//That attribute is a list of strings, each string is of variable length.
//The data of the attribute.
vector<string> att_vector;
att_vector.push_back("ColName1");
att_vector.push_back("ColName2 more characters");
att_vector.push_back("ColName3");
//HDF5 FILE
H5::H5File m_h5File;
m_h5File = H5File("myH5file_forExample.h5", H5F_ACC_RDWR); //Open file for read and write
DataSet theDataSet = m_h5File.openDataSet("/channel001"); //Open dataset
H5Object * myObject = &theDataSet;
//DATASPACE
StrType str_type(PredType::C_S1, H5T_VARIABLE);
const int RANK = 1;
hsize_t dims[RANK];
dims[0] = att_vector.size(); //The attribute will have 3 strings
DataSpace att_datspc(RANK, dims);
//ATTRIBUTE
Attribute att(myObject->createAttribute("Column_Names_Attribute" , str_type, att_datspc));
//Convert the vector into a C string array.
//Because the input function ::write requires that.
vector<const char *> cStrArray;
for(int index = 0; index < att_vector.size(); ++index)
{
cStrArray.push_back(att_vector[index].c_str());
}
//WRITE DATA
//att_vector must not change during this operation
att.write(str_type, (void*)&cStrArray[0]);
}
catch(H5::Exception &e)
{
std::cout << "Error in the H5 file: " << e.getDetailMsg() << endl;
}
}
//READ ATTRIBUTE
{
try
{
//HDF5 FILE
H5::H5File m_h5File;
m_h5File = H5File("myH5file_forExample.h5", H5F_ACC_RDONLY); //Open file for read
DataSet theDataSet = m_h5File.openDataSet("/channel001"); //Open dataset
H5Object * myObject = &theDataSet;
//ATTRIBUTE
Attribute att(myObject->openAttribute("Column_Names_Attribute"));
// READ ATTRIBUTE
// Read Attribute DataType
DataType attDataType = att.getDataType();
// Read the Attribute DataSpace
DataSpace attDataSpace = att.getSpace();
// Read size of DataSpace
// Dimensions of the array. Since we are working with 1-D, this is just one number.
hsize_t dim = 0;
attDataSpace.getSimpleExtentDims(&dim); //The number of strings.
// Read the Attribute Data. Depends on the kind of data
switch(attDataType.getClass())
{
case H5T_STRING:
{
// Value-initialize so that every pointer starts out as NULL.
char **rdata = new char*[dim]();
try
{
StrType str_type(PredType::C_S1, H5T_VARIABLE);
att.read(str_type,(void*)rdata);
for(hsize_t iStr=0; iStr<dim; ++iStr)
{
cout << rdata[iStr] << endl;
// By default the library allocates each string with malloc,
// so release it with free(), not delete[].
free(rdata[iStr]);
rdata[iStr] = NULL;
}
delete[] rdata;
}
catch(...)
{
for(hsize_t iStr=0; iStr<dim; ++iStr)
{
free(rdata[iStr]); // free(NULL) is a no-op
}
delete[] rdata;
throw std::runtime_error("Error while reading attribute.");
}
break;
}
case H5T_INTEGER:
{
break;
}
case H5T_FLOAT:
{
break;
}
default:
{
throw std::runtime_error("Not a valid datatype class.");
}
}
}
catch(H5::Exception &e)
{
std::cout << "Error in the H5 file: " << e.getDetailMsg() << endl;
}
catch(std::runtime_error &e)
{
std::cout << "Error in the execution: " << e.what() << endl;
}
}
return 0;
}
Result of the write operation, seen in the HDFview program:

C++ Declaring arrays in class and declaring 2d arrays in class

I'm new to using classes and I ran into a problem while declaring an array in a class. I want to declare a char array for text limited to 50 characters and then replace the text with a function.
#ifndef MAP_H
#define MAP_H
#include "Sprite.h"
#include <SFML/Graphics.hpp>
#include <iostream>
class Map : public sprite
{
private:
char mapname[50];
int columnnumber;
int linenumber;
char casestatematricia[];
public:
void setmapname(char newmapname[50]);
void battlespace(int column, int line);
void setcasevalue(int col, int line, char value);
void printcasematricia();
};
#endif
By the way, could I declare my 2D array like this?
char casestatematricia[][];
Later I want to make this 2D array dynamic, where I enter a column number and a line number like this
casestatematricia[linenumber][columnnumber]
to create a battlefield.
This is the cpp code, so that you have an idea of what I want to do.
#include "Map.h"
#include <SFML/Graphics.hpp>
#include <iostream>
using namespace sf;
void Map::setmapname(char newmapname[50])
{
this->mapname = newmapname;
}
void Map::battlespace(int column, int line)
{
}
void Map::setcasevalue(int col, int line, char value)
{
}
void Map::printcasematricia()
{
}
thank you in advance.
Consider following common practice on this one.
Most (e.g. numerical) libraries don't use 2D arrays inside classes.
They use dynamically allocated 1D arrays and overload the () or [] operator to access the right elements in a 2D-like fashion.
So on the outside you never can tell that you're actually dealing with consecutive storage, it looks like a 2D array.
In this way arrays are easier to resize, more efficient to store, transpose and reshape.
Just a proposition for your problem:
class Map : public sprite
{
private:
    std::string mapname;
    int columnnumber;
    int linenumber;
    std::vector<char> casestatematricia;
    static constexpr std::size_t maxRow = 50;
    static constexpr std::size_t maxCol = 50;
public:
    Map():
        casestatematricia(maxRow * maxCol, 0)
    {}
    void setmapname(std::string newmapname)
    {
        if (newmapname.size() > 50)
        {
            // Handle the error if you really need at most 50 characters...
            // Or just truncate when you serialize!
        }
        mapname = newmapname;
    }
    void battlespace(int col, int row);
    void setcasevalue(int col, int row, char value)
    {
        // check that row is in [0, maxRow - 1] and col is in [0, maxCol - 1]
        // row-major layout: each row holds maxCol cells
        casestatematricia[row * maxCol + col] = value;
    }
    void printcasematricia()
    {
        for (std::size_t row = 0; row < maxRow; ++row)
        {
            for (std::size_t col = 0; col < maxCol; ++col)
            {
                char currentCell = casestatematricia[row * maxCol + col];
                std::cout << currentCell;
            }
            std::cout << "\n";
        }
    }
};
For access to 1D array like a 2D array, take a look at Access a 1D array as a 2D array in C++.
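As a rough illustration of the "1D storage, 2D access" idea on its own, here is a minimal sketch with run-time dimensions. The class name Grid and its members are hypothetical and not taken from any library; bounds are only checked with assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid of char stored in one vector.
class Grid
{
public:
    Grid(std::size_t rows, std::size_t cols, char fill = 0)
        : rows_(rows), cols_(cols), cells_(rows * cols, fill) {}

    // Row-major addressing: one full row is cols_ cells long.
    char& operator()(std::size_t row, std::size_t col)
    {
        assert(row < rows_ && col < cols_);
        return cells_[row * cols_ + col];
    }
    char operator()(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return cells_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<char> cells_;
};
```

A caller writes g(2, 4) = 'X'; and never sees the index arithmetic. Resizing the battlefield then amounts to constructing a new Grid with different dimensions.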
When you think about serialization, I guess you want to save the map to a file. Just a piece of advice: don't store raw memory to a file just to "save" time when you relaunch your software. You end up with a non-portable solution! And seriously, with the power of your computer, you don't have to worry about the time it takes to load from a file!
I propose adding two methods to your class to save the Map to a file and load it back:
void dump(std::ostream &os)
{
    os << mapname << "\n";
    std::size_t currentCol = 0;
    for (auto c : casestatematricia)
    {
        os << static_cast<int>(c) << " ";
        ++currentCol;
        if (currentCol >= maxCol) // end of a row
        {
            currentCol = 0;
            os << "\n";
        }
    }
}
void load(std::istream &is)
{
    std::string line;
    std::getline(is, line);
    mapname = line;
    std::size_t current_cell = 0;
    while (std::getline(is, line))
    {
        std::istringstream row(line); // requires <sstream>
        int value;
        // dump() wrote each cell as an integer, so read integers back
        while (row >> value && current_cell < casestatematricia.size())
        {
            casestatematricia[current_cell] = static_cast<char>(value);
            ++current_cell;
        }
    }
}
This solution is given only as an example. It doesn't handle errors, and I have chosen to store the map as ASCII text in the file. You can change it to store binary data, but don't write raw memory directly. You can take a look at C - serialization techniques (you just have to translate it to C++). But please, don't use memcpy or a similar technique to serialize.
I hope I get this right. You have two questions: you want to know how to assign the value of char mapname[50]; via void setmapname(char newmapname[50]);, and you want to know how to create a dynamically sized 2D array.
I hope you are comfortable with pointers, because you need them in both cases.
For the first question, I would like to first correct your understanding of void setmapname(char newmapname[50]);. C++ functions do not take arrays by value. They take a pointer to the array's first element, so the declaration is equivalent to void setmapname(char *newmapname);. For a better understanding, see Passing Arrays to Function in C++.
With that, I am going to change the function to also take the length of the new map name. To assign mapname, just use a loop to copy each char.
void setmapname(const char *newmapname, int length) {
// ensure that the string passed in is not longer than what
// mapname can hold, leaving room for the terminating '\0'.
length = length < 49 ? length : 49;
// loop over each value and assign one by one.
for(int i = 0; i < length; ++i) {
mapname[i] = newmapname[i];
}
mapname[length] = '\0';
}
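The point about array parameters being pointers can be checked at compile time. The sketch below is only an illustration: takesArraySyntax, takesPointer and copyName are hypothetical names, and copyName shows one alternative (taking the destination array by reference so that its size is not lost) rather than the approach used in this answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Declarations only: the two parameter spellings give the same function type.
void takesArraySyntax(char name[50]);
void takesPointer(char* name);
static_assert(std::is_same<decltype(takesArraySyntax),
                           decltype(takesPointer)>::value,
              "char name[50] as a parameter is really char*");

// Passing the destination by reference keeps its size N in the type,
// so the copy can always leave room for the terminating '\0'.
template <std::size_t N>
void copyName(char (&dest)[N], const char* src)
{
    static_assert(N > 0, "destination must have room for the terminator");
    std::strncpy(dest, src, N - 1);   // copies at most N - 1 characters
    dest[N - 1] = '\0';               // terminate even when src was truncated
}
```

Calling copyName(mapname, "some long name") with char mapname[50] would deduce N as 50 without the caller passing a length.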
For the second question, you can use a vector as proposed by Garf365, but I prefer to just use a pointer, and I will use a 1D array to represent the 2D battlefield. (You can read the link Garf365 provided.)
// Declare like this
char *casestatematricia; // remember to initialize this to 0.
// Create the battlefield
void Map::battlespace(int column, int line) {
columnnumber = column;
linenumber = line;
// Clear the previous battlefield.
clearspace();
// Creating the battlefield
casestatematricia = new char[column * line];
// initialise casestatematricia...
}
// Call this after you done using the battlefield
void Map::clearspace() {
if (!casestatematricia) return;
delete [] casestatematricia;
casestatematricia = 0;
}
Just remember to call clearspace() when you are no longer using it.
Just for your benefit, this is how you create a dynamic size 2D array
// Declare like this
char **casestatematricia; // remember to initialize this to 0.
// Create the battlefield
void Map::battlespace(int column, int line) {
// Clear the previous battlefield first, while columnnumber
// still describes the old allocation.
clearspace();
columnnumber = column;
linenumber = line;
// Creating the battlefield
casestatematricia = new char*[column];
for (int i = 0; i < column; ++i) {
casestatematricia[i] = new char[line];
}
// initialise casestatematricia...
}
// Call this after you are done using the battlefield
void Map::clearspace() {
if (!casestatematricia) return;
for(int i = 0; i < columnnumber; ++i) {
delete [] casestatematricia[i];
}
delete [] casestatematricia;
casestatematricia = 0;
}
Hope this helps.
PS: If you need to serialize the string, you can use the Pascal string format so that you can support strings of variable length, e.g. "11hello world" or "3foo".
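A length-prefixed format along these lines could look like the sketch below. Note one deliberate change: a decimal prefix such as "3foo" becomes ambiguous when the text itself starts with a digit, so this sketch assumes a fixed 4-byte little-endian length instead. The function names appendLengthPrefixed and readLengthPrefixed are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append `s` to `out` as a 4-byte little-endian length followed by the bytes.
void appendLengthPrefixed(std::string& out, const std::string& s)
{
    assert(s.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFFu));
    out.append(s);
}

// Read one record starting at `pos` and advance `pos` past it.
// Returns false (leaving `pos` untouched) if the buffer is truncated.
bool readLengthPrefixed(const std::string& in, std::size_t& pos, std::string& s)
{
    if (pos > in.size() || in.size() - pos < 4)
        return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(
                 static_cast<unsigned char>(in[pos + i])) << (8 * i);
    if (in.size() - pos - 4 < n)
        return false;
    s.assign(in, pos + 4, n);
    pos += 4 + static_cast<std::size_t>(n);
    return true;
}
```

Because the length is stored explicitly, the payload may contain any bytes, including digits and '\0'.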

c++ data structure for storing millions of int16

Good afternoon.
I have the following situation: there are three sets of data, each a two-dimensional table with about 50 million cells (~6000 rows and ~8000 columns).
The data is stored in binary files.
Language: C++
I only need to display this data.
But I got stuck when I tried to read it (I used std::vector, but the waiting time is too long).
What is the best way to read/store such an amount of data (std::vector, plain pointers, special libraries)?
Maybe you can point to articles, books, or just share personal experience?
Well, if you don't need all this data at once, you can use a memory-mapped file and read the data as if it were a giant array. Generally the operating system / file system cache works well enough for most applications, but certainly YMMV.
There's no reason you shouldn't use plain old read and write on ifstream/ofstream. The following code doesn't take very long for a BigArray b( 6000, 8000 );
#include <fstream>
#include <iostream>
#include <string>
#include <stdlib.h>
class BigArray {
public:
BigArray( int r, int c ) : rows(r), cols(c){
data = (int*)malloc(rows*cols*sizeof(int));
if( NULL == data ){
std::cout << "ERROR\n";
}
}
virtual ~BigArray(){ free( data ); }
void fill( int n ){
int v = 0;
int * intptr = data;
for( int irow = 0; irow < rows; irow++ ){
for( int icol = 0; icol < cols; icol++ ){
*intptr++ = v++;
v %= n;
}
}
}
void readFromFile( std::string path ){
std::ifstream inf( path.c_str(), std::ifstream::binary );
inf.read( (char*)data, rows*cols*sizeof(*data) );
inf.close();
}
void writeToFile( std::string path ){
std::ofstream outf( path.c_str(), std::ofstream::binary );
outf.write( (char*)data, rows*cols*sizeof(*data) );
outf.close();
}
private:
int rows;
int cols;
int* data;
};
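Since the question is about int16 values, the same single-call read can also be done into a std::vector. This is a sketch under assumptions: the file is a headerless row-major dump in the host's byte order, and loadTable and saveTable are hypothetical names. For a 6000 x 8000 table this reads roughly 96 MB in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read rows*cols int16 values from a raw binary file into one vector.
// Returns an empty vector if the file is missing or too short.
std::vector<std::int16_t> loadTable(const std::string& path,
                                    std::size_t rows, std::size_t cols)
{
    std::vector<std::int16_t> table(rows * cols);
    std::ifstream in(path.c_str(), std::ios::binary);
    const std::streamsize bytes =
        static_cast<std::streamsize>(table.size() * sizeof(std::int16_t));
    if (!in || !in.read(reinterpret_cast<char*>(table.data()), bytes))
        table.clear();
    return table;
}

// Counterpart that produces such a file (same byte order as the host).
bool saveTable(const std::string& path, const std::vector<std::int16_t>& table)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    out.write(reinterpret_cast<const char*>(table.data()),
              static_cast<std::streamsize>(table.size() * sizeof(std::int16_t)));
    out.flush();                       // surface write errors before reporting
    return static_cast<bool>(out);
}
```

Element (row, col) is then table[row * cols + col]. If the vector-based reader in the question was slow, a likely cause is element-by-element reads or push_back, which this single read avoids.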

How to best write out a std::vector < std::string > container to a HDF5 dataset?

Given a vector of strings, what is the best way to write them out to a HDF5 dataset? At the moment I'm doing something like the following:
const unsigned int MaxStrLength = 512;
struct TempContainer {
char string[MaxStrLength];
};
void writeVector (hid_t group, std::vector<std::string> const & v)
{
//
// Firstly copy the contents of the vector into a temporary container
std::vector<TempContainer> tc;
for (std::vector<std::string>::const_iterator i = v.begin ()
, end = v.end ()
; i != end
; ++i)
{
TempContainer t;
strncpy (t.string, i->c_str (), MaxStrLength);
tc.push_back (t);
}
//
// Write the temporary container to a dataset
hsize_t dims[] = { tc.size () } ;
hid_t dataspace = H5Screate_simple(sizeof(dims)/sizeof(*dims)
, dims
, NULL);
hid_t strtype = H5Tcopy (H5T_C_S1);
H5Tset_size (strtype, MaxStrLength);
hid_t datatype = H5Tcreate (H5T_COMPOUND, sizeof (TempContainer));
H5Tinsert (datatype
, "string"
, HOFFSET(TempContainer, string)
, strtype);
hid_t dataset = H5Dcreate1 (group
, "files"
, datatype
, dataspace
, H5P_DEFAULT);
H5Dwrite (dataset, datatype, H5S_ALL, H5S_ALL, H5P_DEFAULT, &tc[0] );
H5Dclose (dataset);
H5Sclose (dataspace);
H5Tclose (strtype);
H5Tclose (datatype);
}
At a minimum, I would really like to change the above so that:
It uses variable length strings
I don't need to have a temporary container
I have no restrictions over how I store the data so for example, it doesn't have to be a COMPOUND datatype if there is a better way to do this.
EDIT: Just to narrow the problem down, I'm relatively familiar with playing with the data on the C++ side, it's the HDF5 side where I need most of the help.
Thanks for your help.
[Many thanks to dirkgently for his help in answering this.]
To write a variable length string in HDF5 use the following:
// Create the datatype as follows
hid_t datatype = H5Tcopy (H5T_C_S1);
H5Tset_size (datatype, H5T_VARIABLE);
//
// Pass the string to be written to H5Dwrite
// using the address of the pointer!
const char * s = v.c_str ();
H5Dwrite (dataset
, datatype
, H5S_ALL
, H5S_ALL
, H5P_DEFAULT
, &s );
One solution for writing a container is to write each element individually. This can be achieved using hyperslabs.
For example:
class WriteString
{
public:
WriteString (hid_t dataset, hid_t datatype
, hid_t dataspace, hid_t memspace)
: m_dataset (dataset), m_datatype (datatype)
, m_dataspace (dataspace), m_memspace (memspace)
, m_pos () {}
private:
hid_t m_dataset;
hid_t m_datatype;
hid_t m_dataspace;
hid_t m_memspace;
hsize_t m_pos; // hsize_t avoids a narrowing conversion in the offset initializer below
//...
public:
void operator ()(std::vector<std::string>::value_type const & v)
{
// Select the file position, 1 record at position 'pos'
hsize_t count[] = { 1 } ;
hsize_t offset[] = { m_pos++ } ;
H5Sselect_hyperslab( m_dataspace
, H5S_SELECT_SET
, offset
, NULL
, count
, NULL );
const char * s = v.c_str ();
H5Dwrite (m_dataset
, m_datatype
, m_memspace
, m_dataspace
, H5P_DEFAULT
, &s );
}
};
// ...
void writeVector (hid_t group, std::vector<std::string> const & v)
{
hsize_t dims[] = { v.size () } ;
hid_t dataspace = H5Screate_simple(sizeof(dims)/sizeof(*dims)
, dims, NULL);
dims[0] = 1;
hid_t memspace = H5Screate_simple(sizeof(dims)/sizeof(*dims)
, dims, NULL);
hid_t datatype = H5Tcopy (H5T_C_S1);
H5Tset_size (datatype, H5T_VARIABLE);
hid_t dataset = H5Dcreate1 (group, "files", datatype
, dataspace, H5P_DEFAULT);
//
// Select the "memory" to be written out - just 1 record.
hsize_t offset[] = { 0 } ;
hsize_t count[] = { 1 } ;
H5Sselect_hyperslab( memspace, H5S_SELECT_SET, offset
, NULL, count, NULL );
std::for_each (v.begin ()
, v.end ()
, WriteString (dataset, datatype, dataspace, memspace));
H5Dclose (dataset);
H5Sclose (dataspace);
H5Sclose (memspace);
H5Tclose (datatype);
}
Here is some working code for writing a vector of variable length strings using the HDF5 c++ API.
I incorporate some of the suggestions in the other posts:
use H5T_C_S1 and H5T_VARIABLE
use string::c_str() to obtain pointers to the strings
place the pointers into a vector of char* and pass to the HDF5 API
It is not necessary to create expensive copies of the string (e.g. with strdup()). c_str() returns a pointer to the null terminated data of the underlying string. This is precisely what the function is intended for. Of course, strings with embedded nulls will not work with this...
std::vector is guaranteed to have contiguous underlying storage, so using vector and vector::data() is the same as using raw arrays but is of course much neater and safer than the clunky, old-fashioned c way of doing things.
#include <stdexcept>
#include <string>
#include <vector>
#include "H5Cpp.h"
void write_hdf5(H5::H5File file, const std::string& data_set_name,
const std::vector<std::string>& strings )
{
H5::Exception::dontPrint();
try
{
// HDF5 only understands vector of char* :-(
std::vector<const char*> arr_c_str;
for (unsigned ii = 0; ii < strings.size(); ++ii)
arr_c_str.push_back(strings[ii].c_str());
//
// one dimension
//
hsize_t str_dimsf[1] {arr_c_str.size()};
H5::DataSpace dataspace(1, str_dimsf);
// Variable length string
H5::StrType datatype(H5::PredType::C_S1, H5T_VARIABLE);
H5::DataSet str_dataset = file.createDataSet(data_set_name, datatype, dataspace);
str_dataset.write(arr_c_str.data(), datatype);
}
catch (H5::Exception& err)
{
throw std::runtime_error(std::string("HDF5 Error in " )
+ err.getFuncName()
+ ": "
+ err.getDetailMsg());
}
}
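The pointer-gathering step can be tried on its own, without HDF5. The sketch below only illustrates the lifetime rule and the embedded-null caveat mentioned above; the helper name cStringViews is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Collect one const char* per string. The pointers refer to memory owned by
// `strings`, so `strings` must outlive the result and must not be modified
// while the pointers are in use.
std::vector<const char*> cStringViews(const std::vector<std::string>& strings)
{
    std::vector<const char*> ptrs;
    ptrs.reserve(strings.size());
    for (const std::string& s : strings)
        ptrs.push_back(s.c_str());   // no copy of the characters is made
    return ptrs;
}
```

A consumer that treats each pointer as a C string stops at the first '\0', so a std::string containing "a\0b" would be seen as "a".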
If you are looking for cleaner code: I suggest you create a functor that takes a string and saves it to the HDF5 container (in a desired mode). Richard, I used the wrong algorithm, please re-check!
std::for_each(v.begin(), v.end(), write_hdf5);
struct hdf5 : public std::unary_function<std::string, void> {
hdf5() : _dataset(...) {} // initialize the HDF5 db
~hdf5() { /* close the HDF5 db */ }
void operator()(const std::string& s) {
// append
// use s.c_str() ?
}
};
Does that help get started?
I had a similar issue, with the caveat that I wanted a vector of strings stored as an attribute. The tricky thing with attributes is that we can't use fancy dataspace features like hyperslabs (at least with the C++ API).
But in either case, it may be useful to enter a vector of strings into a single entry in a dataset (if, for example, you always expect to read them together). In this case all the magic comes with the type, not with the dataspace itself.
There are basically 4 steps:
Make a vector<const char*> which points to the strings.
Create a hvl_t structure that points to the vector and contains its length.
Create the datatype. This is a H5::VarLenType wrapping a (variable length) H5::StrType.
Write the hvl_t type to a dataset.
The really nice part of this method is that you're stuffing the whole entry into what HDF5 considers a scalar value. This means that making it an attribute (rather than a dataset) is trivial.
Whether you choose this solution or the one with each string in its own dataset entry is probably also a matter of the desired performance: if you're looking for random access to specific strings, it's probably better to write the strings out in a dataset so they can be indexed. If you're always going to read them all out together this solution may work just as well.
Here's a short example of how to do this, using the C++ API and a simple scalar dataset:
#include <vector>
#include <string>
#include "H5Cpp.h"
int main(int argc, char* argv[]) {
// Part 0: make up some data
std::vector<std::string> strings;
for (int iii = 0; iii < 10; iii++) {
strings.push_back("this is " + std::to_string(iii));
}
// Part 1: grab pointers to the chars
std::vector<const char*> chars;
for (const auto& str: strings) {
chars.push_back(str.data());
}
// Part 2: create the variable length type
hvl_t hdf_buffer;
hdf_buffer.p = chars.data();
hdf_buffer.len = chars.size();
// Part 3: create the type
auto s_type = H5::StrType(H5::PredType::C_S1, H5T_VARIABLE);
s_type.setCset(H5T_CSET_UTF8); // just for fun, you don't need this
auto svec_type = H5::VarLenType(&s_type);
// Part 4: write the output to a scalar dataset
H5::H5File out_file("vtest.h5", H5F_ACC_EXCL);
H5::DataSet dataset(
out_file.createDataSet("the_ds", svec_type, H5S_SCALAR));
dataset.write(&hdf_buffer, svec_type);
return 0;
}
I am late to the party, but I've modified Leo Goodstadt's answer based on the comments regarding segfaults. I am on Linux, and I don't have such problems. I wrote two functions: one writes a vector of std::string to a dataset of a given name in an open H5File, and the other reads the resulting dataset back into a vector of std::string. Note that there may be some unnecessary copying between types that could be optimised. Here is working code for writing and reading:
void write_varnames( const std::string& dsetname, const std::vector<std::string>& strings, H5::H5File& f)
{
H5::Exception::dontPrint();
try
{
// HDF5 only understands vector of char* :-(
std::vector<const char*> arr_c_str;
for (size_t ii = 0; ii < strings.size(); ++ii)
{
arr_c_str.push_back(strings[ii].c_str());
}
//
// one dimension
//
hsize_t str_dimsf[1] {arr_c_str.size()};
H5::DataSpace dataspace(1, str_dimsf);
// Variable length string
H5::StrType datatype(H5::PredType::C_S1, H5T_VARIABLE);
H5::DataSet str_dataset = f.createDataSet(dsetname, datatype, dataspace);
str_dataset.write(arr_c_str.data(), datatype);
}
catch (H5::Exception& err)
{
throw std::runtime_error(std::string("HDF5 Error in ")
+ err.getFuncName()
+ ": "
+ err.getDetailMsg());
}
}
And to read:
std::vector<std::string> read_string_dset( const std::string& dsname, H5::H5File& f )
{
H5::DataSet cdataset = f.openDataSet( dsname );
H5::DataSpace space = cdataset.getSpace();
int rank = space.getSimpleExtentNdims();
hsize_t dims_out[1];
int ndims = space.getSimpleExtentDims( dims_out, NULL);
size_t length = dims_out[0];
std::vector<char*> tmpvect( length, NULL );
fprintf(stdout, "In read STRING dataset, got number of strings: [%zu]\n", length );
std::vector<std::string> strs(length);
H5::StrType datatype(H5::PredType::C_S1, H5T_VARIABLE);
cdataset.read( tmpvect.data(), datatype);
for(size_t x=0; x<tmpvect.size(); ++x)
{
fprintf(stdout, "GOT STRING [%s]\n", tmpvect[x] );
strs[x] = tmpvect[x];
// HDF5 allocated this string during the read; release it once copied.
free(tmpvect[x]);
}
return strs;
}
As you know, an HDF5 dataset of fixed-length strings takes its data as a char*, that is, the address of one contiguous block.
So the most natural way is to allocate one contiguous block (of known size) and copy the contents of the vector into it.
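// every slot is as wide as the longest string plus its terminating '\0'
size_t width = 0;
for (size_t i = 0; i < date.size(); i++) {
if (date[i].size() + 1 > width) width = date[i].size() + 1;
}
char* strs = (char*)calloc(date.size(), width); // zero-filled
for (size_t i = 0; i < date.size(); i++) {
strcpy(strs + i * width, date[i].c_str());
}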
The complete code is shown as below,
bool writeString(hid_t file_id, vector<string>& date, string dateSetName) {
hid_t dataset_id, dataspace_id; /* identifiers */
herr_t status;
hid_t dtype;
if (date.empty()) return false; // nothing to write
// every slot is as wide as the longest string plus its terminating '\0'
size_t width = 0;
for (size_t i = 0; i < date.size(); i++) {
if (date[i].size() + 1 > width) width = date[i].size() + 1;
}
hsize_t dims[1] = { date.size() };
dataspace_id = H5Screate_simple(1, dims, NULL);
dtype = H5Tcopy(H5T_C_S1);
status = H5Tset_size(dtype, width);
char* strs = (char*)calloc(date.size(), width); // zero-filled
for (size_t i = 0; i < date.size(); i++) {
strcpy(strs + i * width, date[i].c_str());
}
dataset_id = H5Dcreate(file_id, dateSetName.c_str(), dtype, dataspace_id, H5P_DEFAULT,
H5P_DEFAULT, H5P_DEFAULT);
status = H5Dwrite(dataset_id, dtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, strs);
bool ok = (status >= 0); // report whether the write succeeded
status = H5Dclose(dataset_id);
status = H5Sclose(dataspace_id);
status = H5Tclose(dtype);
free(strs);
return ok;
}
Don't forget to free the pointer.
Instead of a TempContainer, you can use a simple std::vector<char> (you could also templatize it on the character type to match basic_string).
Something like this:
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>
// Converts one std::string into a NUL-terminated std::vector<char>.
class StringToVector {
public:
    std::vector<char> operator()(const std::string &s) const {
        // assumes you want a NUL-terminated string
        const char* str = s.c_str();
        // strlen stops at the first embedded NUL, so size may be
        // smaller than s.size() + 1
        std::size_t size = 1 + std::strlen(str);
        return std::vector<char>(str, str + size);
    }
};
void conv(const std::vector<std::string> &vi,
          std::vector<std::vector<char> > &vo)
{
    // std::transform writes through vo.begin(), so vo must already
    // have one slot per input string
    vo.resize(vi.size());
    std::transform(vi.begin(), vi.end(),
                   vo.begin(),
                   StringToVector());
}
In the interest of having the ability to read std::vector<std::string> I'm posting my solution, based on the hints from Leo here https://stackoverflow.com/a/15220532/364818.
I've mixed C and C++ APIs. Please feel free to edit this and make it simpler.
Note that the HDF5 API returns a list of char* pointers when you call read. These char* pointers must be freed after use, otherwise there is a memory leak.
Usage example
H5::Attribute Foo = file.openAttribute("Foo");
std::vector<std::string> foos;
Foo >> foos;
Here's the code
const H5::Attribute& operator>>(const H5::Attribute& attr0, std::vector<std::string>& array)
{
H5::Exception::dontPrint();
try
{
hid_t attr = attr0.getId();
hid_t atype = H5Aget_type(attr);
hid_t aspace = H5Aget_space(attr);
int rank = H5Sget_simple_extent_ndims(aspace);
hsize_t sdim[1] = { 0 };
// only query the dimensions when they fit into sdim
if (rank == 1) H5Sget_simple_extent_dims(aspace, sdim, NULL);
size_t size = H5Tget_size (atype);
// The ids returned by H5Aget_type/H5Aget_space must be closed by the caller.
H5Sclose(aspace);
H5Tclose(atype);
if (rank != 1) throw PBException("Attribute " + attr0.getName() + " is not a string array");
if (size != sizeof(void*))
{
throw PBException("Internal inconsistency. Expected pointer size element");
}
// HDF5 only understands vector of char* :-(
std::vector<char*> arr_c_str(sdim[0]);
H5::StrType stringType(H5::PredType::C_S1, H5T_VARIABLE);
attr0.read(stringType, arr_c_str.data());
array.resize(sdim[0]);
for(int i=0;i<sdim[0];i++)
{
// std::cout << i << "=" << arr_c_str[i] << std::endl;
array[i] = arr_c_str[i];
free(arr_c_str[i]);
}
}
catch (H5::Exception& err)
{
throw std::runtime_error(std::string("HDF5 Error in " )
+ err.getFuncName()
+ ": "
+ err.getDetailMsg());
}
return attr0;
}
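The copy-then-free loop in the operator above can be isolated into a small helper. This is a sketch that assumes, as the answer does, that the strings were allocated with malloc (the default when no custom variable-length memory manager is installed); the name takeCStrings is hypothetical and the code makes no HDF5 calls.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Copy each malloc-allocated C string into a std::string, then free it.
// Entries are reset to nullptr so a second call cannot double-free.
std::vector<std::string> takeCStrings(std::vector<char*>& raw)
{
    std::vector<std::string> out;
    out.reserve(raw.size());
    for (char*& p : raw)
    {
        out.push_back(p ? std::string(p) : std::string());
        std::free(p);      // free(nullptr) is a no-op
        p = nullptr;
    }
    return out;
}
```

In the operator above this would replace the manual loop with array = takeCStrings(arr_c_str); after the read.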
I don't know about HDF5, but you can use
struct TempContainer {
char* string;
};
and then copy the strings this way:
TempContainer t;
t.string = strdup(i->c_str());
tc.push_back (t);
This allocates a string of exactly the right size, and it also makes inserting into or reading from the container much cheaper (in your example a whole array is copied; in this case only a pointer). Each string returned by strdup must later be released with free(). You can also use std::vector:
std::vector<char *> tc;
...
tc.push_back(strdup(i->c_str()));