Writing a dataset of type H5T_ARRAY - C++

I'm trying to write data in HDF5 using the C++ API.
I work on Windows XP 64-bit with Visual Studio 2010, and I use HDF5 version 1.8.9.
The target is set to x86, so I had to use the 32-bit version of HDF5 (to be honest, I'm very new to programming on Windows and VS and didn't configure the whole thing myself, so I'm really not sure it was the right choice).
My issue happens when trying to write a part of a dataset of type H5T_ARRAY.
The HDF5 file structure I want to achieve is a dataset of 4 dimensions (i1,i2,i3,i4), whose datatype is an array of doubles with 2 dimensions (a1,a2).
Here is the DDL to sum it up:
HDF5 "result.h5" {
GROUP "/" {
DATASET "mydata" {
DATATYPE H5T_ARRAY { [a1][a2] H5T_IEEE_F64LE }
DATASPACE SIMPLE { ( i1,i2,i3,i4) / ( i1,i2,i3,i4 ) }
DATA { <my data> }
}
}
Due to my program structure, I write this dataset element by element, i.e. H5T_ARRAY by H5T_ARRAY.
I've defined a class OutputFile to manage all the HDF5 I/O. It contains these attributes:
H5::H5File    *_H5fileHandle;    // HDF5 file
H5::DataSpace *_dataspaceHandle; // Handle of the dataspace of the datasets
int            _dataspaceRank;   // Rank of the dataspace
H5::ArrayType *_datatypeHandle;  // Handle of the datatype of the datasets (= array of N dimensions)
int            _datatypeRank;    // Rank of the datatype
H5::DataSet   *_datasetHandle;   // Handle of the dataset
The file is opened right at the beginning of the program, and all the handles (dataspace, datatype and dataset) are set there:
void OutputFile::createFile(std::string filename,
                            std::vector<int> dsdims,
                            std::vector<int> adims,
                            std::vector<std::string> datasetName) {
    _filename = filename;
    _H5fileHandle = new H5::H5File(_filename.c_str(), H5F_ACC_TRUNC);
    // Defining the dataspace
    _dataspaceRank = dsdims.size();
    hsize_t *h5dsdims = new hsize_t[_dataspaceRank];
    for (int iDim = 0; iDim < _dataspaceRank; iDim++) h5dsdims[iDim] = hsize_t(dsdims[iDim]);
    _dataspaceHandle = new H5::DataSpace(_dataspaceRank, h5dsdims, NULL);
    // Defining the datatype = array type
    _datatypeRank = adims.size();
    hsize_t *h5adims = new hsize_t[_datatypeRank];
    for (int iDim = 0; iDim < _datatypeRank; iDim++) h5adims[iDim] = hsize_t(adims[iDim]);
    _datatypeHandle = new H5::ArrayType(H5::PredType::IEEE_F64LE, _datatypeRank, h5adims);
    // Creating the dataset (only the first name is used in this snippet)
    _datasetHandle = _H5fileHandle->createDataSet(datasetName[0].c_str(), *_datatypeHandle, *_dataspaceHandle);
    // Clean up (arrays allocated with new[] must be released with delete[])
    delete[] h5dsdims;
    delete[] h5adims;
}
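For context, a hypothetical call matching this signature could look like this (the instance name and all dimension values are made up for illustration):
std::vector<int> dsdims = {10, 10, 5, 3};      // dataspace dims (i1,i2,i3,i4)
std::vector<int> adims  = {6, 5};              // array-type dims (a1,a2)
std::vector<std::string> names = {"mydata"};
outputFile.createFile("result.h5", dsdims, adims, names);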
Then, I write the data each time I get an element ready (i.e. an H5T_ARRAY):
void OutputFile::writeMyData(double **Values, int *positionInDataset) {
    // Set the element position
    hsize_t position[1][4];
    position[0][0] = hsize_t(positionInDataset[0]);
    position[0][1] = hsize_t(positionInDataset[1]);
    position[0][2] = hsize_t(positionInDataset[2]);
    position[0][3] = hsize_t(positionInDataset[3]);
    _dataspaceHandle->selectElements(H5S_SELECT_SET, 1, (const hsize_t *)position);
    // Set the memory dataspace (a single array element)
    hsize_t memdims[] = {1};
    H5::DataSpace memspace(1, memdims, NULL);
    // Set the memory datatype (dataDims presumably holds the array dims (a1,a2))
    int memTypeRank = 2;
    hsize_t *memTypedims = new hsize_t[memTypeRank];
    for (int iDim = 0; iDim < memTypeRank; iDim++) memTypedims[iDim] = hsize_t(dataDims[iDim]);
    H5::ArrayType memtypeHandle(H5::PredType::IEEE_F64LE, memTypeRank, memTypedims);
    _datasetHandle->write(Values, memtypeHandle, memspace, *_dataspaceHandle);
    _H5fileHandle->flush(H5F_SCOPE_GLOBAL);
    delete[] memTypedims;
}
The Values argument is allocated in the calling function, with size [a1][a2].
Unfortunately, it doesn't work properly. I get invalid data in my HDF5 file, and all elements are equal (meaning that all H5T_ARRAY elements contain the same values).
Example:
(0,0,0,0): [ 5.08271e-275, 5.08517e-275, -7.84591e+298, -2.53017e-098, 0, 2.18992e-303,
5.08094e-275, 0, 2.122e-314, -7.84591e+298, 5.08301e-275, 5.08652e-275,
-7.84591e+298, -2.53017e-098, 0, 2.18994e-303, 5.08116e-275, 0,
2.122e-314, -7.84591e+298, 5.08332e-275, 5.08683e-275, -7.84591e+298, -2.53017e-098,
0, 2.18995e-303, 5.08138e-275, 0, 2.122e-314, -7.84591e+298 ],
... and so on for every element.
So far, I have:
checked that the content of the "Values" array in writeMyData() is correct and contains valid data
checked that, if I only write one element, then this element, and only this one, contains invalid data in the HDF5 file (the other ones contain only zeroes)
used these additional type combinations, without success:
memType = NATIVE_DOUBLE, fileType = IEEE_F64LE
memType = NATIVE_DOUBLE, fileType = NATIVE_DOUBLE
memType = IEEE_F32LE, fileType = IEEE_F32LE
checked that double-valued attributes are written correctly, using the type IEEE_F64LE
tried to close the file at the end of writeMyData() and reopen it at the beginning, to force the data to be written to disk. The results are the same.
passed &Values instead of Values in the call to DataSet::write() (the results are the same).
I'm a bit at my wits' end. I've found examples for partial I/O of a dataset and others for array datatypes, but nothing for partial writing of array-type datasets.
I guess it's a memory issue; my feeling is that I do something wrong when passing the "Values" array to DataSet::write(), but I can't pinpoint the problem.
Thanks in advance for any pointers you have.
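One note for future readers: DataSet::write() expects a single contiguous buffer, but a double** is an array of row pointers, so the library would read the pointer values themselves (and whatever memory follows them) as doubles; that could explain denormal-looking garbage such as 2.122e-314 above. A minimal sketch of a contiguous-buffer version inside writeMyData(), assuming a1 and a2 are the array-type dimensions (flat, a1 and a2 are illustrative names, and <vector> is required):
// Flatten the [a1][a2] element into one contiguous, row-major buffer.
std::vector<double> flat(a1 * a2);
for (int i = 0; i < a1; i++)
    for (int j = 0; j < a2; j++)
        flat[i * a2 + j] = Values[i][j];
// Pass the contiguous buffer instead of the double** itself.
_datasetHandle->write(flat.data(), memtypeHandle, memspace, *_dataspaceHandle);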

Related

HDF5 -> H5Ldelete, H5F_FSPACE_STRATEGY_FSM_AGGR and free space

I am in quite a pickle trying to understand the space management of HDF5 files.
I have written code that creates a file containing a group, which in turn contains a series of datasets. The user can decide to remove one or several datasets after they have been added to the file.
This is how I create my HDF5 file:
hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
hsize_t fsm_size = 0;
H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_FSM_AGGR, 0, fsm_size);
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl, H5F_LIBVER_V110, H5F_LIBVER_LATEST);
H5::H5File data_file(filename, H5F_ACC_TRUNC, fcpl, fapl);
Then, I create a group with:
H5::Group data_group = data_file.createGroup("MyData");
Finally, I add a series of datasets (similar data in this simple case) to this group using:
std::vector<float> array = {1, 2, 3, 4, 5, 6};
for (uint8_t idx = 0; idx < 10; idx++)
{
    std::string name = std::string("mydata_") + std::to_string(idx);
    hsize_t dims_pol[1] = {array.size()};
    H5::DataSpace dataspace_x(1, dims_pol);
    H5::FloatType datatype_x(H5::PredType::IEEE_F32LE); // float data, so FloatType rather than IntType
    H5::DataSet dataset_x;
    dataset_x = data_group.createDataSet(name.c_str(), datatype_x, dataspace_x);
    dataset_x.write(array.data(), H5::PredType::IEEE_F32LE);
    dataset_x.close();
}
Doing so, I do get a file filled with the correct data, as expected. The superblock of the file (correct, as far as my limited understanding goes) is:
SUPER_BLOCK {
   SUPERBLOCK_VERSION 3
   FREELIST_VERSION 0
   SYMBOLTABLE_VERSION 0
   OBJECTHEADER_VERSION 0
   OFFSET_SIZE 8
   LENGTH_SIZE 8
   BTREE_RANK 16
   BTREE_LEAF 4
   ISTORE_K 32
   FILE_SPACE_STRATEGY H5F_FSPACE_STRATEGY_FSM_AGGR
   FREE_SPACE_PERSIST FALSE
   FREE_SPACE_SECTION_THRESHOLD 0
   FILE_SPACE_PAGE_SIZE 4096
   USER_BLOCK {
      USERBLOCK_SIZE 0
   }
}
The problem arises when I try to delete the datasets. To do that (while the file is still open), I use the following procedure:
for (uint8_t idx = 0; idx < 10; idx++)
{
    std::string name = std::string("mydata_") + std::to_string(idx);
    H5Ldelete(data_group.getId(), name.c_str(), H5P_DEFAULT);
}
data_file.close();
From my understanding, the space should be freed when the file is closed. However, even though the file no longer contains any reference to the data when opened with a viewer, its size doesn't decrease...
I also tried unlinking the datasets. This is not working either...
Any help would be greatly appreciated...
Thanks
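For what it's worth, deleting objects generally does not shrink an HDF5 file in place: freed space is at best tracked so that later allocations in the same file can reuse it. With the persist flag set to 0, as in the creation code above, the free-space information is even discarded when the file is closed. A hedged sketch of the persistent variant (same creation code, persist enabled, threshold of 1 byte):
hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
/* persist = 1: keep free-space tracking across close/reopen so freed
   space can be reused by future writes (the file still won't shrink) */
H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_FSM_AGGR, 1, 1);
To actually reduce the size on disk, the usual route is to rewrite the file, e.g. with the h5repack command-line tool shipped with HDF5.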

A problem in saving "long double" variables to an HDF5 file

I'm new to HDF5. I'm trying to save the results of a simulation in an HDF5 file. The variables are long double, so I mapped them to NATIVE_LDOUBLE. However, the saved values are completely wrong (fluctuating between very small and very large values).
When I save them with NATIVE_DOUBLE, everything is OK, but I need to save long double.
My question is how to properly save long double variables, and moreover quadruple-precision variables?
I deeply appreciate your help. Examples are also appreciated.
Here is the code:
void createHDF5_2DProjectionFile(char* file_name,
CarGrid1D3V<long double>& ph_grid,
std::string first_dim,
std::string second_dim,
long double *x1, int size_x1,
long double *x2, int size_x2)
{
try
{
/* define the size of the datasets containing the coordinates x1
and x2
*/
PredType h5Int = PredType::NATIVE_INT;
PredType h5DoubleL = PredType::NATIVE_LDOUBLE;
PredType h5Double = PredType::NATIVE_DOUBLE;
/* Define the parameters of grid space
DS --> Data Space
*/
hsize_t x1_dims[1], x2_dims[1];
x1_dims[0] = size_x1;
x2_dims[0] = size_x2;
H5File *file_id = new H5File(file_name, H5F_ACC_TRUNC);
/* Saving string attribute
Create dataspace with H5S_SCALAR
Create string datatype of specific length of characters
Create attribute and write to it
*/
DataSpace attr_stringDS = DataSpace(H5S_SCALAR);
StrType strdatatype(PredType::C_S1, 64);
Attribute original_DistFun = file_id->createAttribute("/OriginalDistFun",
strdatatype, attr_stringDS);
original_DistFun.write(strdatatype, "1D3V");
Attribute projection = file_id->createAttribute("/Projection",
strdatatype, attr_stringDS);
projection.write(strdatatype, first_dim + " - " + second_dim);
/* Create the data spaces for grid points along each direction */
DataSpace* first_dimDS_id = new DataSpace(1, x1_dims, NULL);
DataSpace* second_dimDS_id = new DataSpace(1, x2_dims, NULL);
/* Create and fill the datasets for grid points along each direction */
DataSet *data_dim1 = new DataSet(file_id->createDataSet(first_dim,
h5DoubleL, *first_dimDS_id));
data_dim1->write(x1, h5DoubleL);
DataSet *data_dim2 = new DataSet(file_id->createDataSet(second_dim,
h5DoubleL, *second_dimDS_id));
data_dim2->write(x2, h5DoubleL);
/* Important attributes added to the file */
long double x_minmax[2], px_minmax[2],
py_minmax[2], pz_minmax[2], mom_steps[3],
ph_vols[3], spatial_steps[1];
x_minmax[0] = ph_grid.x_min_;
x_minmax[1] = ph_grid.x_max_;
px_minmax[0] = ph_grid.px_min_;
px_minmax[1] = ph_grid.px_max_;
py_minmax[0] = ph_grid.py_min_;
py_minmax[1] = ph_grid.py_max_;
pz_minmax[0] = ph_grid.pz_min_;
pz_minmax[1] = ph_grid.pz_max_;
mom_steps[0] = ph_grid.dpx_;
mom_steps[1] = ph_grid.dpy_;
mom_steps[2] = ph_grid.dpz_;
ph_vols[0] = ph_grid.dvs_;
ph_vols[1] = ph_grid.dvp_;
ph_vols[2] = ph_grid.dv_;
spatial_steps[0] = ph_grid.dx_;
ph_grid.print_characteristics();
std::cout << x_minmax[0] << " , " << x_minmax[1] << "\n";
/* define attributes configuration */
hsize_t space_1[1];
space_1[0] = 1;
hsize_t space_2[1];
space_2[0] = 2;
hsize_t space_3[1];
space_3[0] = 3;
DataSpace attr_space_1 = DataSpace(1, space_1);
DataSpace attr_space_2 = DataSpace(1, space_2);
DataSpace attr_space_3 = DataSpace(1, space_3);
Attribute x_interval = file_id->createAttribute("[x_min,x_max]",
h5DoubleL, attr_space_2);
x_interval.write(h5DoubleL, x_minmax);
Attribute px_interval = file_id->createAttribute("[px_min,px_max]",
h5DoubleL, attr_space_2);
px_interval.write(h5DoubleL, px_minmax);
Attribute py_interval = file_id->createAttribute("[py_min,py_max]",
h5DoubleL, attr_space_2);
py_interval.write(h5DoubleL, py_minmax);
Attribute pz_interval = file_id->createAttribute("[pz_min,pz_max]",
h5DoubleL, attr_space_2);
pz_interval.write(h5DoubleL, pz_minmax);
Attribute MomVolumes = file_id->createAttribute("[dpx,dpy,dpz]",
h5DoubleL, attr_space_3);
MomVolumes.write(h5DoubleL, mom_steps);
Attribute PhVolumes = file_id->createAttribute("[dv_s, dv_m, dv_t]",
h5DoubleL, attr_space_3);
PhVolumes.write(h5DoubleL, ph_vols);
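/* NB: the next attribute is created with NATIVE_DOUBLE as its datatype
   but written with NATIVE_LDOUBLE as the memory type -- this mismatch
   looks unintentional */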
Attribute SpatialVolumes = file_id->createAttribute("[dx]", PredType::NATIVE_DOUBLE,
attr_space_1);
SpatialVolumes.write(h5DoubleL, spatial_steps);
/* Free memory */
delete data_dim1;
delete data_dim2;
delete first_dimDS_id;
delete second_dimDS_id;
delete file_id;
}
catch (DataSetIException &error)
{
error.printErrorStack();
}
catch (DataSpaceIException &error)
{
error.printErrorStack();
}
catch (FileIException &error)
{
error.printErrorStack();
}
}
Update
A great discussion with explanations is available on the HDF5 forum, where I posted the same question:
https://forum.hdfgroup.org/t/a-problem-when-saving-native-ldouble-variables/9504
Also, Steven Varga provided examples answering this question on his GitHub, by constructing a user-defined datatype (see this link).
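In short, what typically goes wrong here: on x86 with GCC, long double is usually an 80-bit extended-precision format padded to 12 or 16 bytes; NATIVE_LDOUBLE describes that in-memory layout, but tools and platforms that don't understand the format will display such data as nonsense. HDF5 also has no predefined quadruple-precision type, so if 64-bit precision is acceptable in the file, one workaround (a sketch reusing the names from the code above, not the original author's solution) is to store IEEE 64-bit in the file and let the library convert from long double during the write:
/* File datatype: portable 64-bit IEEE. Memory datatype: long double.
   HDF5 converts between the two during the write. */
DataSet data_dim1 = file_id->createDataSet(first_dim,
                                           PredType::IEEE_F64LE, *first_dimDS_id);
data_dim1.write(x1, PredType::NATIVE_LDOUBLE);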

How to insert array of bytes in PostgreSQL table via libpq C++ API

I am trying to update the table
CREATE TABLE some_table
(
    id integer NOT NULL,
    client_fid bigint NOT NULL,
    index bytea[],
    update_time timestamp without time zone
)
WITH (
    OIDS = FALSE
);
using a modified code snippet from here: How to insert text array in PostgreSQL table in binary format using libpq?
#define BYTEAARRAYOID 1001
#define BYTEAOID 17
Here is the pgvals_t structure definition:
struct pgvals_t
{
    /* number of array dimensions */
    int32_t ndims;
    /* flag describing whether the array has NULL values */
    int32_t hasNull;
    /* Oid of the element type stored in the array (17 for BYTEA here) */
    Oid oidType;
    /* number of elements in the array */
    int32_t totalLen;
    /* Not sure about this one.
       I think it describes the dimensions of elements in the case of arrays storing arrays */
    int32_t subDims;
    /* Here our data begins */
} __attribute__ ((__packed__));
I've removed the dataBegins pointer from the struct as it affects the data layout in memory.
std::size_t nElems = _data.size();
uint32_t valsDataSize = sizeof(prx::pgvals_t) + sizeof(int32_t) * nElems +
                        sizeof(uint8_t) * nElems;
void *pData = malloc(valsDataSize);
prx::pgvals_t *pvals = (prx::pgvals_t *)pData;
/* our array has one dimension */
pvals->ndims = ntohl(1);
/* our array has no NULL elements */
pvals->hasNull = ntohl(0);
/* the type of our elements is bytea */
pvals->oidType = ntohl(BYTEAOID);
/* our array has nElems elements */
pvals->totalLen = ntohl(nElems);
pvals->subDims = ntohl(1);
int32_t elemLen = ntohl(sizeof(uint8_t));
std::size_t offset = sizeof(elemLen) + sizeof(_data[0]);
char *ptr = (char *)(pvals + sizeof(prx::pgvals_t));
for (auto byte : _data) {
    memcpy(ptr, &elemLen, sizeof(elemLen));
    memcpy(ptr + sizeof(elemLen), &byte, sizeof(byte));
    ptr += offset;
}
Oid paramTypes[] = { BYTEAARRAYOID };
char *paramValues[] = { (char *)pData };
int paramLengths[] = { (int)valsDataSize };
int paramFormats[] = { 1 };
PGresult *res = PQexecParams(m_conn, _statement.c_str(),
                             1,
                             paramTypes,
                             paramValues,
                             paramLengths,
                             paramFormats,
                             1);
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
    std::string errMsg = PQresultErrorMessage(res);
    PQclear(res);
    throw std::runtime_error(errMsg);
}
free(pData);
The binary data is contained in a std::vector variable (_data), and I am using the following query in a _statement variable of type std::string:
INSERT INTO some_table \
(id, client_id, \"index\", update_time) \
VALUES \
(1, 2, $1, NOW())
Now, after the call to PQexecParams, I get an exception with the message
"incorrect binary data format in bind parameter 1"
What can the problem be here?
If you want to pass a bytea[] in binary format, you have to use the binary array format as read by array_recv and written by array_send.
You cannot just pass a C array.
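As far as I can tell, the header fields in the question actually match array_send's layout (ndims, has-nulls flag, element Oid, then one {length, lower bound} pair per dimension), so the more concrete bug is likely this line: char * ptr = (char*)(pvals + sizeof(prx::pgvals_t)); — pointer arithmetic on a prx::pgvals_t* advances in units of sizeof(prx::pgvals_t), so this skips sizeof(pgvals_t) * sizeof(pgvals_t) bytes and the element data lands far past the header. A hedged sketch of the corrected element loop, with the same variable names as the question:
/* Element data must start immediately after the packed header:
   cast to char* BEFORE the arithmetic (equivalently, use pvals + 1). */
char *ptr = (char *)pvals + sizeof(prx::pgvals_t);
for (auto byte : _data) {
    int32_t elemLen = htonl(sizeof(uint8_t)); /* 4-byte big-endian length prefix */
    memcpy(ptr, &elemLen, sizeof(elemLen));
    memcpy(ptr + sizeof(elemLen), &byte, sizeof(byte)); /* the single data byte */
    ptr += sizeof(elemLen) + sizeof(byte);
}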

multiple lists into a map

I have a question regarding my code below. I'm reading a file containing lots of data, some of which is irrelevant. The data is all written on one line, so I cannot use nextLine or the like.
For each vertex, I save the relevant information into dataperpoint. When I go to the next vertex, I want to clear the list and fill it with new relevant information.
The issue I have is that each time I clear dataperpoint, all values in the Map get cleared as well. When I then try to refill it, all previous positions in the Map get the same values.
How can I do this and make sure that each vertex gets its own list?
Looking forward to your suggestions!
public static Map<Integer, List<Double>> readData(File f) // throws IOException?
{
    // Create a Map to store each vertex with its list of relevant information
    List<Double> dataperpoint = new ArrayList<Double>();
    Map<Integer, List<Double>> data = new HashMap<>();
    // Open the scanner
    try (Scanner in = new Scanner(f))
    {
        // To make sure the correct localization is used
        in.useLocale(Locale.US);
        // The first six integers contain irrelevant information
        for (int step = 1; step <= 6; step++)
        {
            in.nextInt();
        }
        // Do the information for vertex 0 separately, since it has one data point less
        int vertex = in.nextInt();
        for (int doubleinfo = 1; doubleinfo <= 4; doubleinfo++) // four double values
        {
            dataperpoint.add(in.nextDouble());
        }
        // irrelevant information
        for (int irrelevantinfo = 1; irrelevantinfo <= 2; irrelevantinfo++)
        {
            in.nextInt();
        }
        // Opening and closing of the time window
        dataperpoint.add((double) in.nextInt());
        dataperpoint.add((double) in.nextInt());
        data.put(vertex, dataperpoint);
        while (in.hasNext()) // think of a different condition later
        {
            // A fresh list per vertex, so lists already stored in the map stay untouched
            dataperpoint = new ArrayList<Double>();
            vertex = in.nextInt();
            for (int doubleinfo = 1; doubleinfo <= 4; doubleinfo++) // four double values
            {
                dataperpoint.add(in.nextDouble());
            }
            // irrelevant information
            for (int irrelevantinfo = 1; irrelevantinfo <= 3; irrelevantinfo++)
            {
                in.nextInt();
            }
            // Opening and closing of the time window
            dataperpoint.add((double) in.nextInt());
            dataperpoint.add((double) in.nextInt());
            data.put(vertex, dataperpoint);
        }
        // no in.close() needed: try-with-resources closes the Scanner
    }
    catch (FileNotFoundException e)
    {
        e.printStackTrace();
    }
    return data;
}
Use LinkedHashMap<> instead of HashMap<>; it should solve your problem. Read this: Difference between HashMap, LinkedHashMap and TreeMap

HDF5: Create a Dataset with string

I am using the HDF5 API and I am trying to create a dataset with a variable-length string.
The struct is:
struct dataX
{
    std::string data;
};
I was using char[256] with a static, hard-coded size.
But I want it to be dynamic, so after reading the HDF5 docs I found H5T_VARIABLE and used it as follows, but it still fails:
H5Dcreate returns a negative value (meaning an error).
hid_t mem_type;
mem_type = H5Tcopy(H5T_C_S1);
H5Tset_size(mem_type, H5T_VARIABLE);
/* Create the memory data type. */
if ((mem_type_id = H5Tcreate(H5T_COMPOUND, mem_type)) < 0) {
    return -1;
}
/* Insert fields. */
if (H5Tinsert(mem_type_id, "field", 0, mem_type_id) < 0) {
    return -1;
}
/* Create a simple data space with unlimited size */
// hsize_t dims[1] = {0};
// hsize_t maxdims[1] = { H5S_UNLIMITED };
if ((sid = H5Screate_simple(1, dims, maxdims)) < 0) {
    return -1;
}
/* Modify dataset creation properties, i.e. enable chunking */
plist_id = H5Pcreate(H5P_DATASET_CREATE);
// chunk == 1
if (H5Pset_chunk(plist_id, 1, chunk) < 0) {
    return -1;
}
H5Pset_alloc_time(plist_id, H5D_ALLOC_TIME_EARLY);
/* Set the fill value using a struct as the data type. */
// fill_data = 0
if (fill_data)
{
    if (H5Pset_fill_value(plist_id, mem_type_id, fill_data) < 0) {
        LOG_ERROR << "cannot fill value " << LOG_ENDL;
        return -1;
    }
}
else {
    if (H5Pset_fill_time(plist_id, H5D_FILL_TIME_NEVER) < 0) {
        LOG_ERROR << "error" << LOG_ENDL;
    }
}
/* Create the dataset. */
did = H5Dcreate(loc_id, dset_name, mem_type_id, sid, plist_id);
I tried H5D_ALLOC_TIME_LATE, thinking that it might work if the memory were allocated just before writing, but... it didn't.
Now I'm stuck and I don't know what to do.
Did I miss something?
Your mem_type_id is doubly invalid:
the second argument of H5Tcreate should be the size of the compound datatype
in H5Tinsert, the last argument should be the datatype of the inserted field. Here I guess you meant mem_type instead of mem_type_id.
I don't know the specifics of what you are doing, but to write a variable-length string you do not need to create a compound type or set any special property lists. Basically, your first 3 lines are enough to create a valid variable-length string datatype (mem_type). Then you create the simple dataspace, then the dataset.
Have a look at this example; you will see it's pretty simple.
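For reference, a minimal sketch of what this answer describes, using the C API like the question (file_id, the dataset name, dimensions, and wdata are illustrative):
/* A variable-length string datatype: mirrors the first lines of the question. */
hid_t str_type = H5Tcopy(H5T_C_S1);
H5Tset_size(str_type, H5T_VARIABLE);
/* A plain, fixed-size 1-D dataspace: no chunking or fill value needed. */
hsize_t dims[1] = {2};
hid_t sid = H5Screate_simple(1, dims, NULL);
/* Create and write the dataset: the buffer is an array of char* pointers. */
hid_t did = H5Dcreate2(file_id, "strings", str_type, sid,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
const char *wdata[2] = {"short", "a longer string"};
H5Dwrite(did, str_type, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);
H5Dclose(did); H5Sclose(sid); H5Tclose(str_type);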