HDF5: Create a Dataset with string - c++

I am using the HDF5 API and I am trying to create a dataset with variable-length strings.
The struct is
    struct dataX
    {
        std::string data;
    };
I was using char[256] with a static, hard-coded size.
But I want it to be dynamic, so after reading the HDF5 documentation I found H5T_VARIABLE and used it as follows, but it still fails.
H5Dcreate returns a negative value (which means an error).
    hid_t mem_type;
    mem_type = H5Tcopy( H5T_C_S1 );
    H5Tset_size(mem_type,H5T_VARIABLE);
    /* Create the memory data type. */
    if ((mem_type_id = H5Tcreate (H5T_COMPOUND, mem_type )) < 0 ) {
        return -1;
    }
    /* Insert fields. */
    if ( H5Tinsert(mem_type_id, "field", 0, mem_type_id ) < 0 ) {
        return -1;
    }
    /* Create a simple data space with unlimited size */
    // hsize_t dims[1]={0};
    // hsize_t maxdimsk[1]={ H5S_UNLIMITED };
    if ( (sid = H5Screate_simple( 1, dims, maxdims )) < 0 ){
        return -1;
    }
    /* Modify dataset creation properties, i.e. enable chunking */
    plist_id = H5Pcreate (H5P_DATASET_CREATE);
    //chunk==1
    if ( H5Pset_chunk ( plist_id, 1, chunk ) < 0 ){
        return -1;
    }
    H5Pset_alloc_time( plist_id, H5D_ALLOC_TIME_EARLY );
    /* Set the fill value using a struct as the data type. */
    // fill_data=0
    if ( fill_data )
    {
        if ( H5Pset_fill_value( plist_id, mem_type_id, fill_data ) < 0 ){
            LOG_ERROR << "cannot fill value " << LOG_ENDL;
            return -1;
        }
    }
    else {
        if ( H5Pset_fill_time( plist_id, H5D_FILL_TIME_NEVER ) < 0 ) {
            LOG_ERROR << "error" << LOG_ENDL;
        }
    }
    /* Create the dataset. */
    did = H5Dcreate( loc_id, dset_name, mem_type_id, sid, plist_id );
I tried H5D_ALLOC_TIME_LATE, thinking that maybe if it allocated the memory just before writing it would work but ... it didn't.
Now I'm stuck and I don't know what to do.
Did I miss something ?

Your mem_type_id is doubly invalid:
- the second argument of H5Tcreate should be the size of the compound datatype;
- in H5Tinsert, the last argument should be the datatype of the inserted field. Here I guess you meant mem_type instead of mem_type_id.
I don't know exactly what you are trying to do, but to write variable-length strings you do not need to create a compound type or set any special property lists. Basically, your first three lines are enough to create a valid variable-length string datatype (mem_type). Then you create the simple dataspace, then the dataset.
Have a look at this example; you will see it's pretty simple.
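As a rough sketch of the data-preparation side of this answer: a datatype made variable-length with H5Tset_size(mem_type, H5T_VARIABLE) is read and written through a buffer holding one char pointer per element, so a std::string member cannot be handed to HDF5 as-is. The helper name to_c_strings is hypothetical, and the block uses only the standard library; no HDF5 call is made here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same shape as the struct in the question.
struct dataX
{
    std::string data;
};

// Hypothetical helper: collect one C-string pointer per record.
// The pointers stay valid only while `records` is alive and unmodified.
std::vector<const char*> to_c_strings(const std::vector<dataX>& records)
{
    std::vector<const char*> pointers;
    pointers.reserve(records.size());
    for (const dataX& record : records) {
        pointers.push_back(record.data.c_str());
    }
    assert(pointers.size() == records.size());
    return pointers;
}
```

The resulting pointers.data() is the kind of buffer that would be passed to H5Dwrite together with the variable-length string datatype (mem_type in the answer) and a one-dimensional dataspace of records.size() elements.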

Related

How to insert array of bytes in PostgreSQL table via libpq C++ API

I am trying to update the table
    CREATE TABLE some_table
    (
        id integer NOT NULL,
        client_fid bigint NOT NULL,
        index bytea[],
        update_time timestamp without time zone
    )
    WITH (
        OIDS = FALSE
    );
using a modified code snippet from "How to insert text array in PostgreSQL table in binary format using libpq?"
    #define BYTEAARRAYOID 1001
    #define BYTEAOID 17
Here is the pgvals_t structure definition:
    struct pgvals_t
    {
        /* number of array dimensions */
        int32_t ndims;
        /* flag describing if array has NULL values */
        int32_t hasNull;
        /* Oid of data stored in array. In our case is 25 for TEXT */
        Oid oidType;
        /* Number of elements in array */
        int32_t totalLen;
        /* Not sure for this one.
           I think it describes dimensions of elements in case of arrays storing arrays */
        int32_t subDims;
        /* Here our data begins */
    } __attribute__ ((__packed__));
I've removed the dataBegins pointer from the struct, as it affects the data layout in memory.
    std::size_t nElems = _data.size();
    uint32_t valsDataSize = sizeof(prx::pgvals_t) + sizeof(int32_t) * nElems +
                            sizeof(uint8_t)*nElems;
    void *pData = malloc(valsDataSize);
    prx::pgvals_t* pvals = (prx::pgvals_t*)pData;
    /* our array has one dimension */
    pvals->ndims = ntohl(1);
    /* our array has no NULL elements */
    pvals->hasNull = ntohl(0);
    /* type of our elements is bytea */
    pvals->oidType = ntohl(BYTEAOID);
    /* our array has nElems elements */
    pvals->totalLen = ntohl(nElems);
    pvals->subDims = ntohl(1);
    int32_t elemLen = ntohl(sizeof(uint8_t));
    std::size_t offset = sizeof(elemLen) + sizeof(_data[0]);
    char * ptr = (char*)(pvals + sizeof(prx::pgvals_t));
    for(auto byte : _data){
        memcpy(ptr, &elemLen, sizeof(elemLen));
        memcpy(ptr + sizeof(elemLen), &byte, sizeof(byte));
        ptr += offset;
    }
    Oid paramTypes[] = { BYTEAARRAYOID };
    char * paramValues[] = {(char* )pData};
    int paramLengths[] = { (int)valsDataSize };
    int paramFormats[] = {1};
    PGresult *res = PQexecParams(m_conn, _statement.c_str(),
                                 1,
                                 paramTypes,
                                 paramValues,
                                 paramLengths,
                                 paramFormats,
                                 1
                                 );
    if (PQresultStatus(res) != PGRES_COMMAND_OK) {
        std::string errMsg = PQresultErrorMessage(res);
        PQclear(res);
        throw std::runtime_error(errMsg);
    }
    free(pData);
The binary data is contained in a std::vector variable, and I am using the following query in a _statement variable of type std::string:
    INSERT INTO some_table \
    (id, client_id, \"index\", update_time) \
    VALUES \
    (1, 2, $1, NOW())
Now, after the call to PQexecParams, I get an exception with the message
"incorrect binary data format in bind parameter 1"
What can be the problem here?

If you want to pass a bytea[] in binary format, you have to use the binary array format as read by array_recv and written by array_send.
You cannot just pass a C array.
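A minimal sketch of that layout for a one-dimensional bytea[] without NULL elements, as array_send writes it: a header of five 32-bit integers in network byte order (number of dimensions, has-NULL flag, element type OID, dimension length, lower bound), followed by a 32-bit length and the raw bytes for each element. The helper names append_be32 and encode_bytea_array are hypothetical, and the code uses only the standard library, so the byte order is produced by hand instead of with htonl.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit value in network (big-endian) byte order.
void append_be32(std::vector<char>& out, std::uint32_t value)
{
    out.push_back(static_cast<char>((value >> 24) & 0xFF));
    out.push_back(static_cast<char>((value >> 16) & 0xFF));
    out.push_back(static_cast<char>((value >> 8) & 0xFF));
    out.push_back(static_cast<char>(value & 0xFF));
}

// Hypothetical helper: encode a one-dimensional bytea[] without NULLs.
std::vector<char> encode_bytea_array(
    const std::vector<std::vector<std::uint8_t>>& elements)
{
    assert(elements.size() <= 0x7FFFFFFFu);  // lengths are signed 32-bit on the wire
    const std::uint32_t kByteaOid = 17;      // BYTEAOID from the question
    std::vector<char> out;
    append_be32(out, 1);                     // number of dimensions
    append_be32(out, 0);                     // flags: no NULL elements
    append_be32(out, kByteaOid);             // element type OID
    append_be32(out, static_cast<std::uint32_t>(elements.size()));  // dimension length
    append_be32(out, 1);                     // lower bound of the dimension
    for (const auto& element : elements) {
        append_be32(out, static_cast<std::uint32_t>(element.size()));  // element length
        out.insert(out.end(), element.begin(), element.end());         // element bytes
    }
    return out;
}
```

The returned buffer would be passed as the parameter value, with its size() as the parameter length and format 1, exactly as the question already does with pData. It may also be worth checking the pointer arithmetic in the question: pvals + sizeof(prx::pgvals_t) advances by that many structs, not bytes, so the element data lands far past the header, whereas pvals + 1 points to the first byte after it.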

A function to display contents of 1 or 2 dimensional array of any type

I needed to be able to display the contents of my various arrays (for debugging purposes at this point), and decided to write a function to help me with that. This is what I came up with.
The goal is to be able to display any type of incoming array (int, double, etc).
Because I never had any official programming training, I am wondering if what I have is too "inelegant" and could be improved by doing something obvious to a good computer science person, but not so to a layperson.
    int
    DisplayArrayInDebugWindow(
        void** incoming_array,
        char* array_type_str,
        int array_last_index_dim_size,
        int array_terminator,
        HWND handle_to_display_window,
        wchar_t* optional_array_name )
    {
        wchar_t message_bufferw[1000];
        message_bufferw[0] = L'\0';
        wchar_t temp_buffer[400];
        if ( array_last_index_dim_size == 0 ) { array_last_index_dim_size = 1; }
        // ----------------------------------------------------------------------------
        // Processing for "int" type array
        // ----------------------------------------------------------------------------
        if ( 0 == (strcmp( array_type_str, "int" )) )
        {
            int j = 0;
            swprintf( temp_buffer, L"%s\r\n", optional_array_name );
            wcscat( message_bufferw, temp_buffer );
            for ( int i = 0; ((int)(*((int*)( (int)incoming_array + i * (int)sizeof(int) * array_last_index_dim_size + j * (int)sizeof(int))))) != array_terminator; i++ )
            {
                swprintf( temp_buffer, L"%02i:\t", i );
                wcscat( message_bufferw, temp_buffer );
                for ( j; j < array_last_index_dim_size; j++ )
                {
                    swprintf( temp_buffer, L"%i\t", ((int)(*((int*)( (int)incoming_array + i * (int)sizeof(int) * array_last_index_dim_size + j * (int)sizeof(int) )))) );
                    wcscat( message_bufferw, temp_buffer );
                }
                wcscat( message_bufferw, L"\r\n" );
                // --------------------------------------------------------------------
                // reset j to 0 each time
                // --------------------------------------------------------------------
                j = 0;
            }
            swprintf( temp_buffer, L"\nEnd of Array\n" );
            wcscat( message_bufferw, temp_buffer );
            SetWindowText( handle_to_display_window, message_bufferw );
        }
        return 0;
    }
NB: When I pass in "incoming array", I type cast it as (void**) obviously.

When the data type changes but the algorithm doesn't, it's time to consider using templates.
    template<class Element_Type>
    void print_array(Element_Type const * p_begin,
                     Element_Type const * p_end)
    {
        while (p_begin != p_end)
        {
            std::cout << *p_begin;
            ++p_begin;
        }
    }
The conversion from single dimension to multiple dimensions is left as an exercise to the OP and readers.
Edit 1: Another alternative
At some point, the output function will need information about how to print the data you gave it.
One option is to write your own printf-style function that has format specifiers for the data you send it.
Another option is to pass a pointer to a function that prints the data.
The fundamental issue is that the output function needs to know how to print the data.
For C++, I suggest overloading operator<< in the class / structure. Since the class/structure knows the data, it can easily know how to print the data.
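One way the template idea could be extended to two dimensions is sketched below, assuming the elements sit contiguously in row-major order and the caller knows both sizes (so no terminator value is needed). The four-argument print_array and the array_to_string wrapper are hypothetical names, and any element type with an operator<< works.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Print rows x cols elements stored contiguously in row-major order.
// A one-dimensional array is the special case rows == 1.
template <class Element_Type>
void print_array(std::ostream& out, Element_Type const* p_begin,
                 std::size_t rows, std::size_t cols)
{
    assert(p_begin != nullptr || rows * cols == 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (c != 0) out << '\t';  // tab between columns
            out << p_begin[r * cols + c];
        }
        out << '\n';                  // one line per row
    }
}

// Convenience wrapper: build the text in memory, e.g. to hand to a window.
template <class Element_Type>
std::string array_to_string(Element_Type const* p_begin,
                            std::size_t rows, std::size_t cols)
{
    std::ostringstream out;
    print_array(out, p_begin, rows, cols);
    return out.str();
}
```

For a flat buffer int grid[6] = {1, 2, 3, 4, 5, 6}, calling array_to_string(grid, 2, 3) gives two tab-separated lines. The string could then be converted and passed to something like SetWindowText in place of the hand-built wchar_t buffer.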

How do I cast a void pointer to a int[3]?

I need to call a 3rd-party library and pass in an int[3] as a void * like this [works]:
    int pattern[3] = {2,4,10};
    if ( OSTaskCreate( BlinkLED,
                       ( void * ) pattern,
                       ( void * ) &BlinkTaskStack[USER_TASK_STK_SIZE],
                       ( void * ) BlinkTaskStack,
                       MAIN_PRIO - 1 ) != OS_NO_ERR )
    {
        iprintf( "*** Error creating blink task\r\n" );
    }
But now I need to parse a string to get the pattern array and I can't seem to get it right.
First I pass the string into the parser and get back the array:
    int (&ParseBlinkOnCommand(char rxbuffer[3]))[3]
    {
        // Code parses rxbuffer and creates the 3 ints needed
        int pattern[3] = {repeats, onTicks, offTicks};
        return pattern;
    }
Then I try to pass it to OSTaskCreate just like I did before:
    int pattern2[3] = ParseBlinkOnCommand(rxbuffer);
    if ( OSTaskCreate( BlinkLED,
                       ( void * ) pattern2,
                       ( void * ) &BlinkTaskStack[USER_TASK_STK_SIZE],
                       ( void * ) BlinkTaskStack,
                       MAIN_PRIO - 1 ) != OS_NO_ERR )
    {
        iprintf( "*** Error creating remote blink task\r\n" );
    }
but I get the error 'array must be initialized with a brace-enclosed initializer'.
What is the right way to do this?

First, ParseBlinkOnCommand returns a reference to a local object, so it returns a dangling reference.
Second, C arrays are not copyable, so int pattern2[3] = ParseBlinkOnCommand(rxbuffer); should be int (&pattern2)[3] = ParseBlinkOnCommand(rxbuffer);.
But why not use std::vector or std::array (or a custom structure)?
    std::vector<int> ParseBlinkOnCommand(const char (&rxbuffer)[3])
    {
        // Code parses rxbuffer and creates the 3 ints needed
        return {repeats, onTicks, offTicks};
    }
And then
    auto pattern2 = ParseBlinkOnCommand(rxbuffer);
    if ( OSTaskCreate( BlinkLED,
                       pattern2.data(),
                       &BlinkTaskStack[USER_TASK_STK_SIZE],
                       BlinkTaskStack,
                       MAIN_PRIO - 1 ) != OS_NO_ERR )
    {
        iprintf( "*** Error creating remote blink task\r\n" );
    }
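The std::array variant mentioned above could look roughly like this. MakePattern and SumPattern are hypothetical stand-ins, since neither the real parsing code nor the body of BlinkLED is shown; the sketch only illustrates returning the three ints by value and recovering them from a void * on the receiving side.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the parser: the three values are passed in
// directly because the real parsing code is not shown in the question.
std::array<int, 3> MakePattern(int repeats, int onTicks, int offTicks)
{
    return {{repeats, onTicks, offTicks}};  // returned by value, no dangling reference
}

// Hypothetical stand-in for the task entry point, which receives void *.
int SumPattern(void* pdata)
{
    assert(pdata != nullptr);
    const int* pattern = static_cast<const int*>(pdata);  // back to the 3 ints
    return pattern[0] + pattern[1] + pattern[2];
}
```

With std::array<int, 3> pattern2 = MakePattern(2, 4, 10);, the call SumPattern(pattern2.data()) sees the same three values. Because the task only receives a pointer, the array presumably has to stay alive for as long as the task reads it, so a local variable in a function that returns right after OSTaskCreate may not be enough.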

Writing dataset of type H5T_ARRAY

I'm trying to write data to HDF5 using the C++ API.
I work on 64-bit Windows XP with Visual Studio 2010. I use HDF5 version 1.8.9.
The target is set to x86, so I had to use the 32-bit version of HDF5 (to be honest, I'm very new to programming with Windows and VS and didn't configure the whole thing myself, so I'm really not sure it was the right choice).
My issue happens when trying to write part of a dataset of type H5T_ARRAY.
The HDF5 file structure I want to achieve is a dataset of 4 dimensions (i1,i2,i3,i4) whose datatype is an array of doubles with 2 dimensions (a1,a2).
Here is the DDL to sum it up:
    HDF5 "result.h5" {
    GROUP "/" {
        DATASET "mydata" {
            DATATYPE H5T_ARRAY { [a1][a2] H5T_IEEE_F64LE }
            DATASPACE SIMPLE { ( i1,i2,i3,i4) / ( i1,i2,i3,i4 ) }
            DATA { <my data> }
        }
    }
    }
Due to my program structure, I write this dataset element by element, i.e. H5T_ARRAY by H5T_ARRAY.
I've defined a class OutputFile to manage all the HDF5 I/O. It contains these attributes:
    H5::H5File *_H5fileHandle ;        // HDF5 file
    H5::DataSpace *_dataspaceHandle ;  // Handle of the Dataspace of the datasets
    int _dataspaceRank ;               // Rank of the dataspace
    H5::ArrayType *_datatypeHandle ;   // Handle of the datatype of the datasets (= array of N dimensions)
    int _datatypeRank ;                // Rank of the datatype
    H5::DataSet *_datasetHandle ;      // Handle of the dataset
The file is opened right at the beginning of the program, and all the handles (dataspace, datatype and dataset) are set at that point:
    void OutputFile ::createFile(std::string filename,
                                 std::vector<int> dsdims,
                                 std::vector<int> adims,
                                 std::vector<std::string> datasetName) {
        _filename = filename ;
        _H5fileHandle = new H5::H5File(_filename.c_str(), H5F_ACC_TRUNC);
        // Defining the dataspace
        _dataspaceRank = dsdims.size() ;
        hsize_t *h5dsdims = new hsize_t[_dataspaceRank] ;
        for (int iDim=0 ; iDim < _dataspaceRank ; iDim++) h5dsdims[iDim] = hsize_t(dsdims[iDim]) ;
        _dataspaceHandle = new H5::DataSpace(_dataspaceRank, h5dsdims, NULL);
        // Defining the datatype = array type
        _datatypeRank = adims.size() ;
        hsize_t *h5adims = new hsize_t[_datatypeRank] ;
        for (int iDim=0 ; iDim < _datatypeRank ; iDim++) h5adims[iDim] = hsize_t(adims[iDim]) ;
        _datatypeHandle = new H5::ArrayType(H5::PredType::IEEE_F64LE, _datatypeRank, h5adims);
        // Creating the dataset
        _datasetHandle = _H5fileHandle->createDataSet( _datasetName.c_str(),*_datatypeHandle, *_dataspaceHandle );
        // Clean up
        delete h5dsdims ;
        delete h5adims ;
    }
Then, I write the data each time I get an element ready (i.e. an H5T_ARRAY):
    void OutputFile::writeMyData(double **Values, int *positionInDataSet) {
        // set the element position
        hsize_t position[1][4] ;
        position[0][0] = hsize_t(positionInDataset[0]);
        position[0][1] = hsize_t(positionInDataset[1]);
        position[0][2] = hsize_t(positionInDataset[2]);
        position[0][3] = hsize_t(positionInDataset[3]);
        _fileDataspace->selectElements( H5S_SELECT_SET, 1, (const hsize_t *)position);
        //Set the memory dataspace
        hsize_t memdims[] = {1} ;
        H5::DataSpace memspace(1, memdims, NULL);
        // set the memory datatype
        hsize_t memTypeRank = 2 ;
        hsize_t *memTypedims = new hsize_t[memTypeRank] ;
        for (int iDim=0 ; iDim < memTypeRank ; iDim++) memTypedims[iDim] = hsize_t(dataDims[iDim]) ;
        H5::ArrayType memtypeHandle(H5::PredType::IEEE_F64LE, memTypeRank, memTypedims);
        _datasetHandle->write(Values, memtypeHandle, memspace, *_dataspaceHandle);
        _H5fileHandle->flush(H5F_SCOPE_GLOBAL) ;
    }
The Values argument is allocated in the calling function, with size [a1][a2].
Unfortunately, it doesn't work properly. I get invalid data in my HDF5 file, and all elements are equal (meaning that every H5T_ARRAY contains the same values).
Example:
(0,0,0,0): [ 5.08271e-275, 5.08517e-275, -7.84591e+298, -2.53017e-098, 0, 2.18992e-303,
5.08094e-275, 0, 2.122e-314, -7.84591e+298, 5.08301e-275, 5.08652e-275,
-7.84591e+298, -2.53017e-098, 0, 2.18994e-303, 5.08116e-275, 0,
2.122e-314, -7.84591e+298, 5.08332e-275, 5.08683e-275, -7.84591e+298, -2.53017e-098,
0, 2.18995e-303, 5.08138e-275, 0, 2.122e-314, -7.84591e+298 ],
... and so on for every element.
So far, I have:
- checked that the content of the "Values" array in writeMyData() is correct and contains valid data
- checked that, if I only write one element, then this element, and only this one, contains invalid data in the HDF5 file (the other ones contain only zeroes)
- used these additional type combinations, without success:
    memType = NATIVE_DOUBLE, fileType = IEEE_64LE
    memType = NATIVE_DOUBLE, fileType = NATIVE_DOUBLE
    memType = IEEE_32LE, fileType = IEEE_32_LE
- checked that double-value attributes are written correctly, using the type IEEE_F64LE
- tried to close the file at the end of writeMyData(), and open it at the beginning, to force writing data to the disk. The results are the same.
- passed &Values instead of Values in the call to DataSet::write() (the results are the same).
I'm a bit at my wits' end. I've found examples for partial I/O of a dataset and others for array datatypes, but nothing for partial writing of array-type datasets.
I guess it's a memory issue; my feeling is that I do something wrong when passing the "Values" array to DataSet::write(), but I can't pinpoint the problem.
Thanks in advance for any pointers you have.
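One thing that may be worth checking, offered as an assumption and not as a confirmed diagnosis: DataSet::write() reads its buffer as one contiguous block of memory, while a double** is an array of row pointers, so the bytes found at Values are addresses and not the numbers themselves. A minimal sketch of copying such a structure into a contiguous row-major buffer, using only the standard library (the helper name flatten is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy an a1 x a2 array stored as a1 separately allocated rows
// into a single contiguous row-major buffer.
std::vector<double> flatten(double const* const* values,
                            std::size_t a1, std::size_t a2)
{
    assert(values != nullptr || a1 == 0);
    std::vector<double> flat;
    flat.reserve(a1 * a2);
    for (std::size_t i = 0; i < a1; ++i) {
        flat.insert(flat.end(), values[i], values[i] + a2);  // append row i
    }
    return flat;
}
```

Under that assumption, flat.data() would be passed to write() in place of Values. It may also be worth confirming that the dataspace passed to write() is the same object on which selectElements() was called, since the code shown selects on _fileDataspace but passes *_dataspaceHandle.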

Retrieving data type information for columns in an Oracle OCCI ResultSet

After sending a simple query via OCCI (for example: select * from ALL_USERS), I need to know the datatype of each column. So far I've been playing with the ResultSet::getColumnListMetaData() method without success.
Questions:
1. How can I get the datatype by using the aforementioned method and the MetaData class?
2. Is there any better documentation out there than the one already provided by Oracle?

I've got this old code lying around; I guess it does exactly what you want. It's using OCI, not OCCI, but maybe it helps.
    /* Get the number of columns in the query */
    ub4 colCount = 0;
    oraCheckErr( m_err, OCIAttrGet((dvoid *)_stmt, OCI_HTYPE_STMT, (dvoid *)&colCount,
                                   0, OCI_ATTR_PARAM_COUNT, m_err));
    ub2 oraType = 0;
    OCIParam *col = 0;
    ub4 nameLen, colWidth, charSemantics;
    text *name;
    for (ub4 i = 1; i <= colCount; i++)
    {
        /* get parameter for column i */
        oraCheckErr( m_err, OCIParamGet((dvoid *)_stmt, OCI_HTYPE_STMT, m_err, (dvoid**)&col, i));
        /* get data-type of column i */
        oraType = 0;
        oraCheckErr( m_err, OCIAttrGet((dvoid *)col, OCI_DTYPE_PARAM,
                                       (dvoid *)&oraType, 0, OCI_ATTR_DATA_TYPE, m_err));
        /* Retrieve the column name attribute */
        nameLen = 0;
        oraCheckErr( m_err, OCIAttrGet((dvoid*)col, OCI_DTYPE_PARAM,
                                       (dvoid**) &name, &nameLen, OCI_ATTR_NAME, m_err ));
        /* Retrieve the length semantics for the column */
        charSemantics = 0;
        oraCheckErr( m_err, OCIAttrGet((dvoid*)col, OCI_DTYPE_PARAM,
                                       (dvoid*) &charSemantics,0, OCI_ATTR_CHAR_USED, m_err ));
        colWidth = 0;
        if (charSemantics)
            /* Retrieve the column width in characters */
            oraCheckErr( m_err, OCIAttrGet((dvoid*)col, OCI_DTYPE_PARAM,
                                           (dvoid*) &colWidth, 0, OCI_ATTR_CHAR_SIZE, m_err ));
        else
            /* Retrieve the column width in bytes */
            oraCheckErr( m_err, OCIAttrGet((dvoid*)col, OCI_DTYPE_PARAM,
                                           (dvoid*) &colWidth,0, OCI_ATTR_DATA_SIZE, m_err ));
        _elements.output.push_back( SQLElement( String(reinterpret_cast<char*>(name), nameLen), getSQLTypes( oraType ), i, colWidth ));
    }
    OCIHandleFree ( (dvoid*) _stmt, OCI_HTYPE_STMT );
EDIT: As per your request:
    SQLTypes getSQLTypes(ub2 _oracleType)
    {
        switch( _oracleType )
        {
        case SQLT_INT:
            return stInt;
        case SQLT_FLT:
        case SQLT_BDOUBLE:
            return stDouble;
        case SQLT_BFLOAT:
            return stFloat;
        case SQLT_ODT:
            return stDate;
        case SQLT_DATE:
        case SQLT_TIMESTAMP:
        case SQLT_TIMESTAMP_TZ:
        case SQLT_TIMESTAMP_LTZ:
            return stTimeStamp;
        case SQLT_CHR:
        case SQLT_NUM:
        case SQLT_STR:
        case SQLT_VCS:
        default:
            return stText;
        }
    }

You can use the method:
    MetaData::getInt(occi::MetaData::ATTR_DATA_TYPE);
and compare the returned value with the constants from the enumeration of possible types, which you can find in occiCommon.h:
    enum Type { OCCI_SQLT_CHR=SQLT_CHR, OCCI_SQLT_NUM=SQLT_NUM ... }

Can't add a comment due to low reputation.
In case someone is interested, here is an OCCI example based on the answer by Pustovalov Dmitry.
    auto results = statement->executeQuery(selectCommand);
    auto columnMetaData = results->getColumnListMetaData();
    while (results->next())
    {
        for ( size_t index = 0; index < columnMetaData.size(); ++index )
        {
            // Column metadata is a std::vector (zero-based indexing), while
            // Oracle result-set getxyz() methods use one-based indexing.
            cout << "Column name: " << columnMetaData[index].getString(MetaData::ATTR_NAME) << endl;
            switch(columnMetaData[index].getInt(MetaData::ATTR_DATA_TYPE))
            {
            case OCCI_SQLT_CHR:
                cout << results->getString(index+1) << endl;
                break;
            case OCCI_SQLT_TIMESTAMP:
                cout << results->getTimestamp(index+1).toText("YYYYMMDD HH24:MI:SS.FF", 0) << endl;
                break;
            }
        }
    }
More details available here
