Reading a string array HDF5 Attribute in C++ - c++

I have working C++ code that writes HDF5 data with the column names stored in an attribute. I can successfully read and process the data in Matlab, but am trying to create a C++ reader. It reads the data OK, but when I attempt to read the header, I only get the first column name.
A snippet of the attribute creation process looks like:
// Snip of working code during the creation/recording of a DataSet named mpcDset:
std::vector<std::string> lcFieldnames;
lcFieldnames.clear();
lcFieldnames.push_back("Field1");
lcFieldnames.push_back("Field2");
lcFieldnames.push_back("Field3");
uint lnMaxStringLen = 10;
uint lnNumFields = lcFieldnames.size();
char* lpnBuffer = new char[lnNumFields*lnMaxStringLen];
memset((void*)lpnBuffer,0,lnNumFields*lnMaxStringLen);
int lnCount = 0;
for (auto& lnIndex : lcFieldnames)
{
lnIndex.copy(lpnBuffer + (lnCount *
lnMaxStringLen), lnMaxStringLen -1);
lnCount++;
}
hsize_t lpnHwriteDims[] = { lnNumFields, lnMaxStringLen };
H5::DataSpace lcAdspace(2, lpnHwriteDims, NULL);
H5::Attribute lcAttr = mpcDset->createAttribute(
std::string("header"),
H5::PredType::NATIVE_CHAR, lcAdspace);
lcAdspace.close();
lcAttr.write(H5::PredType::NATIVE_CHAR, lpnBuffer);
lcAttr.close();
delete [] lpnBuffer;
The code in question looks like:
// In another program, given an opened DataSet named mpcDset:
H5::Attribute lcAttr = mpcDset.openAttribute("header");
H5::DataType lcType = lcAttr.getDataType();
hsize_t lnSize = lcAttr.getStorageSize();
char* lpnBuffer = new char[lnSize];
lcAttr.read(lcType, lpnBuffer);
for (uint i=0;i<lnSize; i++)
{
std::cout<<lpnBuffer[i];
}
std::cout<<std::endl;
delete [] lpnBuffer;
lcAttr.close();
lnSize is large enough for all three fields (through inspection), but only "Field1" is output. Any suggestions as to what I am doing wrong?
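Given the layout the writer above produces (lnNumFields records of lnMaxStringLen bytes each, padded with NUL characters), the bytes returned by the read can be cut into one std::string per record. The following is only a sketch under that assumption: splitFixedWidth is a hypothetical helper, not part of the HDF5 API, and it works on any plain char buffer with that layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not an HDF5 call): cut a buffer holding numFields
// fixed-width, NUL-padded records into one std::string per record.
std::vector<std::string> splitFixedWidth(const char* buffer,
                                         std::size_t numFields,
                                         std::size_t width)
{
    assert(buffer != nullptr && width > 0);
    std::vector<std::string> fields;
    fields.reserve(numFields);
    for (std::size_t i = 0; i < numFields; ++i)
    {
        const char* start = buffer + i * width;
        // The record ends at its first NUL, or at `width` if it has none.
        const void* nul = std::memchr(start, '\0', width);
        const std::size_t len = nul
            ? static_cast<std::size_t>(static_cast<const char*>(nul) - start)
            : width;
        fields.emplace_back(start, len);
    }
    return fields;
}
```

With the reader from the question this would be called as splitFixedWidth(lpnBuffer, 3, 10), ideally taking the two extents from the attribute's dataspace instead of hard-coding them.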

Personally, to create an attribute that is a list of strings in C++, I do something like the following.
This code writes an attribute made of 3 strings, then reads each of them back.
#include "H5Cpp.h"
#ifndef H5_NO_NAMESPACE
using namespace H5;
#endif
#include <iostream>
#include <string>
#include <vector>
#include <stdexcept> // std::runtime_error
using std::string;
using std::vector;
using std::cout;
using std::endl;
int main(int argc, char *argv[])
{
//WRITE ATTRIBUTE
{
try
{
//Example:
//Suppose that in the HDF5 file: 'myH5file_forExample.h5' there is a dataset named 'channel001'
//In that dataset we will create an attribute named 'Column_Names_Attribute'
//That attribute is a list of strings, each string is of variable length.
//The data of the attribute.
vector<string> att_vector;
att_vector.push_back("ColName1");
att_vector.push_back("ColName2 more characters");
att_vector.push_back("ColName3");
//HDF5 FILE
H5::H5File m_h5File;
m_h5File = H5File("myH5file_forExample.h5", H5F_ACC_RDWR); //Open file for read and write
DataSet theDataSet = m_h5File.openDataSet("/channel001"); //Open dataset
H5Object * myObject = &theDataSet;
//DATASPACE
StrType str_type(PredType::C_S1, H5T_VARIABLE);
const int RANK = 1;
hsize_t dims[RANK];
dims[0] = att_vector.size(); //The attribute will have 3 strings
DataSpace att_datspc(RANK, dims);
//ATTRIBUTE
Attribute att(myObject->createAttribute("Column_Names_Attribute" , str_type, att_datspc));
//Convert the vector into a C string array.
//Because the input function ::write requires that.
vector<const char *> cStrArray;
for(int index = 0; index < att_vector.size(); ++index)
{
cStrArray.push_back(att_vector[index].c_str());
}
//WRITE DATA
//att_vector must not change during this operation
att.write(str_type, (void*)&cStrArray[0]);
}
catch(H5::Exception &e)
{
std::cout << "Error in the H5 file: " << e.getDetailMsg() << endl;
}
}
//READ ATTRIBUTE
{
try
{
//HDF5 FILE
H5::H5File m_h5File;
m_h5File = H5File("myH5file_forExample.h5", H5F_ACC_RDONLY); //Open file for read
DataSet theDataSet = m_h5File.openDataSet("/channel001"); //Open dataset
H5Object * myObject = &theDataSet;
//ATTRIBUTE
Attribute att(myObject->openAttribute("Column_Names_Attribute"));
// READ ATTRIBUTE
// Read Attribute DataType
DataType attDataType = att.getDataType();
// Read the Attribute DataSpace
DataSpace attDataSpace = att.getSpace();
// Read size of DataSpace
// Dimensions of the array. Since we are working with 1-D, this is just one number.
hsize_t dim = 0;
attDataSpace.getSimpleExtentDims(&dim); //The number of strings.
// Read the Attribute Data. Depends on the kind of data
switch(attDataType.getClass())
{
case H5T_STRING:
{
char **rdata = new char*[dim];
try
{
StrType str_type(PredType::C_S1, H5T_VARIABLE);
att.read(str_type,(void*)rdata);
for(int iStr=0; iStr<dim; ++iStr)
{
cout << rdata[iStr] << endl;
delete[] rdata[iStr];
}
delete[] rdata;
break;
}
catch(...)
{
for(int iStr=0; iStr<dim; ++iStr)
{
delete[] rdata[iStr];
}
delete[] rdata;
throw std::runtime_error("Error while reading attribute.");
}
throw std::runtime_error("Not valid rank.");
break;
}
case H5T_INTEGER:
{
break;
}
case H5T_FLOAT:
{
break;
}
default:
{
throw std::runtime_error("Not a valid datatype class.");
}
}
}
catch(H5::Exception &e)
{
std::cout << "Error in the H5 file: " << e.getDetailMsg() << endl;
}
catch(std::runtime_error &e)
{
std::cout << "Error in the execution: " << e.what() << endl;
}
}
return 0;
}
Result of the write operation, seen in the HDFview program:

Related

(C++) Fastest way possible for reading in matrix files (arbitrary size)

I'm developing a bioinformatic tool, which requires reading in millions of matrix files (average dimension = (20k, 20k)). They are tab-delimited text files, and they look something like:
0.53 0.11
0.24 0.33
Because the software reads the matrix files one at a time, memory is not an issue, but it's very slow. The following is my current function for reading in a matrix file. I first make a matrix object using a double pointer, then fill in the matrix by looping through an input file.
float** make_matrix(int nrow, int ncol, float val){
float** M = new float *[nrow];
for(int i = 0; i < nrow; i++) {
M[i] = new float[ncol];
for(int j = 0; j < ncol; j++) {
M[i][j] = val;
}
}
return M;
}
float** read_matrix(string fname, int dim_1, int dim_2){
float** K = make_matrix(dim_1, dim_2, 0);
ifstream ifile(fname);
for (int i = 0; i < dim_1; ++i) {
for (int j = 0; j < dim_2; ++j) {
ifile >> K[i][j];
}
}
ifile.clear();
ifile.seekg(0, ios::beg);
return K;
}
Is there a much faster way to do this? From my experience with python, reading in a matrix file using pandas is so much faster than using python for-loops. Is there a trick like that in c++?
(added)
Thanks so much everyone for all your suggestions and comments!
The fastest way, by far, is to change the way you write those files: write them in binary format, with two ints first (width, height) followed by a raw dump of your values.
You will be able to load it in just three read calls.
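As a rough illustration of that layout, here is a minimal sketch. The function names saveBinaryMatrix and loadBinaryMatrix are made up for this example, the header is assumed to be two 32-bit unsigned integers, and the file is assumed to be read on a machine with the same byte order and float format as the one that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Assumed file layout: two 32-bit integers (width, height) followed by
// width*height floats in row-major order, all in native byte order.
bool saveBinaryMatrix(const char* path, std::uint32_t width,
                      std::uint32_t height, const std::vector<float>& values)
{
    assert(values.size() == static_cast<std::size_t>(width) * height);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&width, sizeof width, 1, f) == 1
           && std::fwrite(&height, sizeof height, 1, f) == 1
           && std::fwrite(values.data(), sizeof(float), values.size(), f)
                  == values.size();
    ok = (std::fclose(f) == 0) && ok;
    return ok;
}

// Loads the matrix with three read calls: width, height, then all values.
bool loadBinaryMatrix(const char* path, std::uint32_t& width,
                      std::uint32_t& height, std::vector<float>& values)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&width, sizeof width, 1, f) == 1
           && std::fread(&height, sizeof height, 1, f) == 1;
    if (ok)
    {
        values.resize(static_cast<std::size_t>(width) * height);
        ok = std::fread(values.data(), sizeof(float), values.size(), f)
                 == values.size();
    }
    std::fclose(f);
    return ok;
}
```

The values end up in one contiguous std::vector<float>, so element (row, col) is values[row * width + col].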
Just for fun, I measured the program posted above (using a 20,000x20,000 ASCII input file, as described) on my Mac Mini (3.2GHz i7 with SSD drive) and found that it took about 102 seconds to parse in the file using the posted code.
Then I wrote a version of the same function that uses the C stdio API (fopen()/fread()/fclose()) and does character-by-character parsing into a 1D float array. This implementation takes about 13 seconds to parse in the file on the same hardware, so it's about 7 times faster.
Both programs were compiled with g++ -O3 test_read_matrix.cpp.
float* faster_read_matrix(string fname, int numRows, int numCols)
{
FILE * fpIn = fopen(fname.c_str(), "r");
if (fpIn == NULL)
{
printf("Couldn't open file [%s] for input!\n", fname.c_str());
return NULL;
}
float* K = new float[numRows*numCols];
// We'll hold the current number in (numberBuf) until we're ready to parse it
char numberBuf[128] = {'\0'};
int numCharsInBuffer = 0;
int curRow = 0, curCol = 0;
while(curRow < numRows)
{
char tempBuf[4*1024]; // an arbitrary size
const size_t bytesRead = fread(tempBuf, 1, sizeof(tempBuf), fpIn);
if (bytesRead == 0)
{
// fread() returns a size_t, so errors are detected with ferror(), not a negative value
if (ferror(fpIn)) perror("fread");
break;
}
for (size_t i=0; i<bytesRead; i++)
{
const char c = tempBuf[i];
if ((c=='.')||(c=='+')||(c=='-')||(isdigit(c)))
{
if ((numCharsInBuffer+1) < sizeof(numberBuf)) numberBuf[numCharsInBuffer++] = c;
else
{
printf("Error, number string was too long for numberBuf!\n");
}
}
else
{
if (numCharsInBuffer > 0)
{
// Parse the current number-chars we have assembled into (numberBuf) and reset (numberBuf) to empty
numberBuf[numCharsInBuffer] = '\0';
if (curCol < numCols) K[curRow*numCols+curCol] = strtod(numberBuf, NULL);
else
{
printf("Error, too many values in row %i! (Expected %i, found at least %i)\n", curRow, numCols, curCol);
}
curCol++;
}
numCharsInBuffer = 0;
if (c == '\n')
{
curRow++;
curCol = 0;
if (curRow >= numRows) break;
}
}
}
}
fclose(fpIn);
if (curRow != numRows) printf("Warning: I read %i lines in the file, but I expected there would be %i!\n", curRow, numRows);
return K;
}
I am dissatisfied with Jeremy Friesner’s otherwise excellent answer because it:
blames the problem on C++'s I/O system (which is not at fault)
fixes the problem by circumventing the actual I/O problem without being explicit about how it is a significant contributor to speed
modifies memory accesses which (may or may not) contribute to speed, and does so in a way that very large matrices may not be supported
The reason his code runs so much faster is because he removes the single most important bottleneck: unoptimized disk access. JWO’s original code can be brought to match with three extra lines of code:
float** read_matrix(std::string fname, int dim_1, int dim_2){
float** K = make_matrix(dim_1, dim_2, 0);
const std::size_t buffer_size = 4*1024; // 1
char buffer[buffer_size]; // 2
std::ifstream ifile;
// Install the buffer before opening the file: some implementations
// (libstdc++, for one) ignore pubsetbuf() on a file that is already open.
ifile.rdbuf()->pubsetbuf(buffer, buffer_size); // 3
ifile.open(fname);
for (int i = 0; i < dim_1; ++i) {
for (int j = 0; j < dim_2; ++j) {
ifile >> K[i][j];
}
}
// ifile.clear();
// ifile.seekg(0, std::ios::beg);
return K;
}
The addition exactly replicates Friesner’s design, but using the C++ library capabilities without all the extra programming grief on our end.
You’ll notice I also removed a couple lines at the bottom that should be inconsequential to program function and correctness, but which may cause a minor cumulative time issue as well. (If they are not inconsequential, that is a bug and should be fixed!)
How much difference this all makes depends entirely on the quality of the C++ Standard Library implementation. AFAIK the big three modern C++ compilers (MSVC, GCC, and Clang) all have sufficiently-optimized I/O handling to make the issue moot.
locale
One other thing that may also make a difference is to .imbue() the stream with the default "C" locale, which avoids a lot of special handling for numbers in locale-dependent formats other than what your files use. You only need to bother to do this if you have changed your global locale, though.
ifile.imbue(std::locale::classic());
redundant initialization
Another thing that is killing your time is the effort to zero-initialize the array when you create it. Don’t do that if you don’t need it! (You don’t need it here because you know the total extents and will fill them properly. Since C++11, an extraction that fails to parse a number stores zero in its target, but once the stream is in a failed state, later extractions leave their targets untouched. So check the stream state after reading rather than relying on leftover zeros.)
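To make the behaviour of a failed extraction concrete, here is a small sketch. The function name demonstrateFailedExtraction and the input string are invented for illustration. It relies on the C++11 rule that a failed numeric extraction stores zero, and on the fact that operator>> does nothing when the stream is already in a failed state.

```cpp
#include <cassert>
#include <sstream>

// Illustration only: what operator>> leaves in its targets when the
// input contains something that is not a number.
void demonstrateFailedExtraction()
{
    std::istringstream in("1.5 ? 2.5");
    float a = -1.0f, b = -1.0f, c = -1.0f;
    in >> a >> b >> c;

    assert(a == 1.5f);    // parsed normally
    assert(b == 0.0f);    // parse failed here: zero is stored (C++11 and later)
    assert(c == -1.0f);   // stream had already failed: c is left untouched
    assert(in.fail());    // the caller can and should test for this
}
```

In the matrix reader, this means one check of the stream after the loops is enough to know whether every element was actually read.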
dynamic memory block size
Finally, keeping the matrix as an array of arrays should not significantly affect speed, but it still might be worth testing a single-block layout if you can change it. This assumes that the resulting matrix will never be too large for the memory manager to return as a single block (in which case the allocation fails and, if unhandled, crashes your program).
A common design is to allocate the entire array as a single block, with the requested size plus size for the array of pointers to the rest of the block. This allows you to delete the array in a single delete[] statement. Again, I don’t believe this should be an optimization issue you need to care about until your profiler says so.
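For comparison, here is a sketch of a simpler variant of the single-block idea: it drops the table of row pointers entirely and computes the element position from the row and column. FlatMatrix is a hypothetical name, not something from the code above. Note that std::vector value-initializes its elements, so this version does pay for zeroing the block once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: the whole matrix lives in one contiguous block and
// element (row, col) is located by index arithmetic (row-major order).
class FlatMatrix
{
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    float& at(std::size_t row, std::size_t col)
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    const float& at(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw access to the block, e.g. as the target of one large read call.
    float* data() { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<float> data_;
};
```

Cleanup is automatic here, since the vector releases the block when the FlatMatrix goes out of scope.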
At the risk of the answer being considered incomplete (no code examples), I would like to add to the other answers additional options how to tackle the problem:
Use a binary format (width,height, values...) as file format and then use file mapping (MapViewOfFile() on Windows, mmap() or so on posix/unix systems).
Then, you can simply point your "matrix structure" pointer to the mapped address space and you are done. And in case, you do something like sparse access to the matrix, it can even save some real IO. If you always do full access to all elements of the matrix (no sparse matrices etc.), it is still quite elegant and probably faster than malloc/read.
Replacements for c++ iostream, which is known to be quite slow and should not be used for performance critical stuff:
Have a look at the {fmt} library, which has become quite popular in recent years and claims to be quite fast.
Back in the day, when I did a lot of numerics on large data sets, I always opted for binary files for storage. (That was when the fastest CPU you could get your hands on was the Pentium 1, with the floating point bug :).) Back then, everything was slower and memory was much more limited (we measured RAM in MB, not GB), and all in all, nearly 20 years have passed since.
So, as a refresher, I wrote some code to show how much faster than iostream and text files you can go if you do not have extra constraints (such as the endianness of different CPUs).
So far, my little test only has an iostream and a binary file version with a) stdio fread() kind of loading and b) mmap(). Since I sit in front of a debian bullseye computer, my code uses linux specific stuff for the mmap() approach. To run it on Windows, you have to change a few lines of code and some includes.
Edit: I added a save function using {fmt} now as well.
Edit: I added a load function with stdio now as well.
Edit: To reduce memory workload, I reordered the code somewhat
and now only keep 2 matrix instances in memory at any given time.
The program does the following:
create a 20k x 20k matrix in ram (in a struct named Matrix_t). With random values, slowly generated by std::random.
Write the matrix with iostream to a text file.
Write the matrix with stdio to a binary file.
Create a new matrix textMatrix by loading its data from the text file.
Create a new matrix inMemoryMatrix by loading its data from the binary file with a few fread() calls.
mmap() the binary file and use it under the name mappedMatrix.
Compare each of the loaded matrices to the original randomMatrix to see if the round-trip worked.
Here are the results I got on my machine after compiling this work of wonder with clang++ -O3 -o fmatio fast-matrix-io.cpp -lfmt:
./fmatio
creating random matrix (20k x 20k) (27.0775seconds)
the first 10 floating values in randomMatrix are:
57970.2 -365700 -986079 44657.8 826968 -506928 668277 398241 -828176 394645
saveMatrixAsText_IOSTREAM()
saving matrix with iostream. (192.749seconds)
saveMatrixAsText_FMT(mat0_fmt.txt)
saving matrix with {fmt}. (34.4932seconds)
saveMatrixAsBinary()
saving matrix into a binary file. (30.7591seconds)
loadMatrixFromText_IOSTREAM()
loading matrix from text file with iostream. (102.074seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix. (0.125328seconds)
loadMatrixFromText_STDIO(mat0_fmt.txt)
loading matrix from text file with stdio. (71.2746seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix (stdio). (0.124684seconds)
loadMatrixFromBinary(mat0.bin)
loading matrix from binary file into memory. (0.495685seconds)
randomMatrix == inMemoryMatrix
comparing randomMatrix with inMemoryMatrix. (0.124206seconds)
mapMatrixFromBinaryFile(mat0.bin)
mapping a view to a matrix in a binary file. (4.5883e-05seconds)
randomMatrix == mappedMatrix
comparing randomMatrix with mappedMatrix. (0.158459seconds)
And here is the code:
#include <cinttypes>
#include <memory>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <chrono>
#include <limits>
#include <iomanip>
// includes for mmap()...
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
// includes for {fmt}...
#include <fmt/core.h>
#include <fmt/os.h>
struct StopWatch {
using Clock = std::chrono::high_resolution_clock;
using TimePoint =
std::chrono::time_point<Clock>;
using Duration =
std::chrono::duration<double>;
void start(const char* description) {
this->description = std::string(description);
tstart = Clock::now();
}
void stop() {
TimePoint tend = Clock::now();
Duration elapsed = tend - tstart;
std::cout << description << " (" << elapsed.count()
<< "seconds)" << std::endl;
}
TimePoint tstart;
std::string description;
};
struct Matrix_t {
uint32_t ncol;
uint32_t nrow;
float values[];
inline uint32_t to_index(uint32_t col, uint32_t row) const {
return ncol * row + col;
}
};
template <class Initializer>
Matrix_t *createMatrix
( uint32_t ncol,
uint32_t nrow,
Initializer initFn
) {
size_t nfloats = ncol*nrow;
size_t nbytes = UINTMAX_C(8) + nfloats * sizeof(float);
Matrix_t * result =
reinterpret_cast<Matrix_t*>(operator new(nbytes));
if (nullptr != result) {
result->ncol = ncol;
result->nrow = nrow;
for (uint32_t row = 0; row < nrow; row++) {
for (uint32_t col = 0; col < ncol; col++) {
result->values[result->to_index(col,row)] =
initFn(ncol,nrow,col,row);
}
}
}
return result;
}
void saveMatrixAsText_IOSTREAM(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_IOSTREAM()" << std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
return; // nothing to write; avoid dereferencing a null pointer below
}
std::ofstream outFile(filePath);
if (outFile) {
outFile << matrix->ncol << " " << matrix->nrow << std::endl;
const auto defaultPrecision = outFile.precision();
outFile.precision
(std::numeric_limits<float>::max_digits10);
for (uint32_t row = 0; row < matrix->nrow; row++) {
for (uint32_t col = 0; col < matrix->ncol; col++) {
outFile << matrix->values[matrix->to_index(col,row)]
<< " ";
}
outFile << std::endl;
}
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
void saveMatrixAsText_FMT(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_FMT(" << filePath << ")"
<< std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
return; // nothing to write; avoid dereferencing a null pointer below
}
auto outFile = fmt::output_file(filePath);
outFile.print("{} {}\n", matrix->ncol, matrix->nrow);
for (uint32_t row = 0; row < matrix->nrow; row++) {
outFile.print("{}", matrix->values[matrix->to_index(0,row)]);
for (uint32_t col = 1; col < matrix->ncol; col++) {
outFile.print(" {}",
matrix->values[matrix->to_index(col,row)]);
}
outFile.print("\n");
}
}
void saveMatrixAsBinary(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsBinary()" << std::endl;
FILE * outFile = fopen(filePath, "wb");
if (nullptr != outFile) {
fwrite( &matrix->ncol, 4, 1, outFile);
fwrite( &matrix->nrow, 4, 1, outFile);
size_t nfloats = matrix->ncol * matrix->nrow;
fwrite( &matrix->values, sizeof(float), nfloats, outFile);
fclose(outFile);
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
Matrix_t* loadMatrixFromText_IOSTREAM(const char* filePath) {
std::cout << "loadMatrixFromText_IOSTREAM()" << std::endl;
std::ifstream inFile(filePath);
if (inFile) {
uint32_t ncol;
uint32_t nrow;
inFile >> ncol;
inFile >> nrow;
uint32_t nfloats = ncol * nrow;
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
inFile >> value;
return value;
};
Matrix_t * matrix = createMatrix( ncol, nrow, loader);
return matrix;
} else {
std::cout << "could not open " << filePath << " for reading."
<< std::endl;
}
return nullptr;
}
Matrix_t* loadMatrixFromText_STDIO(const char* filePath) {
std::cout << "loadMatrixFromText_STDIO(" << filePath << ")"
<< std::endl;
Matrix_t* matrix = nullptr;
FILE * inFile = fopen(filePath, "rt");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fscanf(inFile, "%" SCNu32 " %" SCNu32, &ncol, &nrow); // matching format for uint32_t
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
fscanf(inFile, "%f", &value);
return value;
};
matrix = createMatrix( ncol, nrow, loader);
fclose(inFile);
} else {
std::cout << "could not open " << filePath << " for reading."
<< std::endl;
}
return matrix;
}
Matrix_t* loadMatrixFromBinary(const char* filePath) {
std::cout << "loadMatrixFromBinary(" << filePath << ")"
<< std::endl;
FILE * inFile = fopen(filePath, "rb");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fread( &ncol, 4, 1, inFile);
fread( &nrow, 4, 1, inFile);
uint32_t nfloats = ncol * nrow;
uint32_t nbytes = nfloats * sizeof(float) + UINT32_C(8);
Matrix_t* matrix =
reinterpret_cast<Matrix_t*>
(operator new (nbytes));
if (nullptr != matrix) {
matrix->ncol = ncol;
matrix->nrow = nrow;
fread( &matrix->values[0], sizeof(float), nfloats, inFile);
fclose(inFile); // close the file on the success path too
return matrix;
} else {
std::cout << "could not find memory for the matrix."
<< std::endl;
}
fclose(inFile);
} else {
std::cout << "could not open file "
<< filePath << " for reading." << std::endl;
}
return nullptr;
}
void freeMatrix(Matrix_t* matrix) {
operator delete(matrix);
}
Matrix_t* mapMatrixFromBinaryFile(const char* filePath) {
std::cout << "mapMatrixFromBinaryFile(" << filePath << ")"
<< std::endl;
Matrix_t * matrix = nullptr;
int fd = open( filePath, O_RDONLY);
if (-1 != fd) {
struct stat sb;
if (-1 != fstat(fd, &sb)) {
auto fileSize = sb.st_size;
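void* mapped =
mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
// mmap() reports failure with MAP_FAILED, not with a null pointer
if (MAP_FAILED == mapped) {
std::cout << "mmap() failed!" << std::endl;
} else {
matrix = reinterpret_cast<Matrix_t*>(mapped);
}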
} else {
std::cout << "fstat() failed!" << std::endl;
}
close(fd);
} else {
std::cout << "open() failed!" << std::endl;
}
return matrix;
}
void unmapMatrix(Matrix_t* matrix) {
if (nullptr == matrix)
return;
size_t nbytes =
UINTMAX_C(8) +
sizeof(float) * matrix->ncol * matrix->nrow;
munmap(matrix, nbytes);
}
bool areMatricesEqual( const Matrix_t* m1, const Matrix_t* m2) {
if (nullptr == m1) return false;
if (nullptr == m2) return false;
if (m1->ncol != m2->ncol) return false;
if (m1->nrow != m2->nrow) return false;
// both exist and have same size...
size_t nfloats = m1->ncol * m1->nrow;
size_t nbytes = nfloats * sizeof(float);
return 0 == memcmp( m1->values, m2->values, nbytes);
}
int main(int argc, const char* argv[]) {
std::random_device rdev;
std::default_random_engine reng(rdev());
std::uniform_real_distribution<> rdist(-1.0E6F, 1.0E6F);
StopWatch sw;
auto randomInitFunction =
[&reng,&rdist]
(uint32_t ncol, uint32_t nrow, uint32_t col, uint32_t row)
-> float
{
return rdist(reng);
};
sw.start("creating random matrix (20k x 20k)");
Matrix_t * randomMatrix =
createMatrix(UINT32_C(20000),
UINT32_C(20000),
randomInitFunction);
sw.stop();
if (nullptr != randomMatrix) {
std::cout
<< "the first 10 floating values in randomMatrix are: "
<< std::endl;
std::cout << randomMatrix->values[0];
for (size_t i = 1; i < 10; i++) {
std::cout << " " << randomMatrix->values[i];
}
std::cout << std::endl;
sw.start("saving matrix with iostream.");
saveMatrixAsText_IOSTREAM("mat0_iostream.txt", randomMatrix);
sw.stop();
sw.start("saving matrix with {fmt}.");
saveMatrixAsText_FMT("mat0_fmt.txt", randomMatrix);
sw.stop();
sw.start("saving matrix into a binary file.");
saveMatrixAsBinary("mat0.bin", randomMatrix);
sw.stop();
sw.start("loading matrix from text file with iostream.");
Matrix_t* textMatrix =
loadMatrixFromText_IOSTREAM("mat0_iostream.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix.");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from text file with stdio.");
textMatrix =
loadMatrixFromText_STDIO("mat0_fmt.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix (stdio).");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from binary file into memory.");
Matrix_t* inMemoryMatrix =
loadMatrixFromBinary("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with inMemoryMatrix.");
if (!areMatricesEqual(randomMatrix, inMemoryMatrix)) {
std::cout << "randomMatrix != inMemoryMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == inMemoryMatrix" << std::endl;
}
sw.stop();
freeMatrix(inMemoryMatrix);
inMemoryMatrix = nullptr;
sw.start("mapping a view to a matrix in a binary file.");
Matrix_t* mappedMatrix =
mapMatrixFromBinaryFile("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with mappedMatrix.");
if (!areMatricesEqual(randomMatrix, mappedMatrix)) {
std::cout << "randomMatrix != mappedMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == mappedMatrix" << std::endl;
}
sw.stop();
unmapMatrix(mappedMatrix);
mappedMatrix = nullptr;
freeMatrix(randomMatrix);
} else {
std::cout << "could not create random matrix!" << std::endl;
}
return 0;
}
Please note that binary formats where you simply cast to a struct pointer also depend on how the compiler handles alignment and padding within structures. In my case, I was lucky and it worked. On other systems, you might have to tweak things a little (#pragma pack(4) or something along those lines) to make it work.
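One way to turn that luck into a checked assumption is to let the compiler verify the layout. The sketch below uses a hypothetical MatrixHeader struct that stands in for the two leading fields of the binary file. It is not the Matrix_t from the program above, and the expected sizes are assumptions about this particular file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the fixed part of the on-disk format:
// two 32-bit integers, immediately followed by the float values.
struct MatrixHeader
{
    std::uint32_t ncol;
    std::uint32_t nrow;
};

// Fail at compile time if the compiler lays the header out differently
// from the 8-byte prefix the file format assumes.
static_assert(sizeof(MatrixHeader) == 8, "header must occupy exactly 8 bytes");
static_assert(offsetof(MatrixHeader, ncol) == 0, "ncol must come first");
static_assert(offsetof(MatrixHeader, nrow) == 4, "nrow must directly follow ncol");
static_assert(sizeof(float) == 4, "file format assumes 4-byte floats");

// Copy the header out of a raw byte buffer (for example a mapped file)
// instead of casting the buffer pointer, which sidesteps alignment issues.
MatrixHeader readHeader(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    MatrixHeader header;
    std::memcpy(&header, bytes, sizeof header);
    return header;
}
```

These checks do not cover byte order, which still has to match between the writing and the reading machine.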

Unable to delete contents of dynamic array in C++

I've been beating my head against this one for a while. In the destructor of my class, I have a for loop that is supposed to iterate through an array of objects and delete them. When I try, though, I get a read access violation. The attached code is supposed to read info from two documents and use that to create Country objects.
#include "pch.h"
#include "CountryCatalogue.h"
#include "Country.h"
#include <iterator>
#include <map>
//imports for reading the files
#include <iostream>
#include <fstream>
CountryCatalogue::CountryCatalogue()
{
_maxSize = 10;
_curSize = 0;
_catalogue = new Country*[_maxSize];
}
CountryCatalogue::CountryCatalogue(std::string continentFileName, std::string countryFileName)
{
//block that opens the files and checks to make sure they can be read
//open up the files
std::ifstream inFile1;
std::ifstream inFile2;
//opening both text files and ensuring that the file is readable to the program
inFile1.open(continentFileName);
if (!inFile1) {
std::cout << "Unable to open file";
exit(1); // terminate with error
}
inFile2.open(countryFileName);
if (!inFile2) {
std::cout << "Unable to open file";
exit(1); // terminate with error
}
// read the continent file
// while there is still stuff to read in the file
std::string str;
while (!inFile1.eof())
{
std::string Country, Cont;
//reading lines from file and assigning to variables
std::getline(inFile1, Country);
std::getline(inFile1, Cont);
//mapping to variables read from file
_countryContinent.insert(std::pair<std::string, std::string>(Country, Cont));
_curSize++;
}
//closing file after use
inFile1.close();
//creating array
_catalogue = new Country*[_curSize+2];
//resetting size to zero for later iteration
_curSize = 0;
// read the country file
// while there is still stuff to read in the file
while (!inFile2.eof())
{
std::string name, POP, AREA;
int pop;
double area = 0.0;
std::getline(inFile2, name);
std::getline(inFile2, POP);
std::getline(inFile2, AREA);
if (!POP.empty() && POP[POP.length() - 1] == '\n') {
POP.erase(POP.length() - 1);
}
if (!AREA.empty() && AREA[AREA.length() - 1] == '\n') {
AREA.erase(AREA.length() - 1);
}
pop = std::stoi(POP);
area = std::stod(AREA);
//creating iterator to search through mapped values
std::map<std::string, std::string>::iterator it;
it = _countryContinent.find(name);
//creating empty string variable to store continent
std::string cont;
//using value found by iterator to make continent string
//ensuring value isn't the end value of the map
if (it != _countryContinent.end()){
cont = it->second;
}
//std::cout << name << pop << area << cont << std::endl;
// add the country to the catalogue
addCountry(name, pop, area, cont);
}
}
CountryCatalogue::~CountryCatalogue() {
/*for (int i = 0; i < _curSize; i++){
delete _catalogue[i];
std::cout << "deleted" << i << std::endl;
}*/
delete[] _catalogue;
}
void CountryCatalogue::addCountry(std::string name, int pop, double area, std::string cont) {
//std::cout << name << pop << area << cont << std::endl;
//std::cout << _curSize << std::endl;
Country* toAdd = new Country(name, pop, area, cont);
if (_curSize == _maxSize) {
expandCapacity();
}
//adding country object to array
_catalogue[_curSize] = toAdd;
//adding to _curSize for next iteration
_curSize++;
}
void CountryCatalogue::printCountryCatalogue() {
std::string s;
/*for (int i = 0; i < _curSize; i++) {
s += _catalogue[i]->to_string() + "\n";
}*/
std::cout << _curSize << std::endl;
}
void CountryCatalogue::expandCapacity() {
//doubling array size
_maxSize = _maxSize * 2;
//creating pointer to new array of new size
Country** newCatalogue = new Country*[_maxSize];
//copying old array into new
for (int i = 0; i < _curSize; i++) {
newCatalogue[i] = _catalogue[i];
}
//deleting old array
delete[] _catalogue;
//making _catalogue point to newCatalogue
_catalogue = newCatalogue;
}
UPDATE:
What my code is supposed to do is get information from text files and create objects using that data. I am required to use an array instead of a vector. The code runs fine and I can create the country object. The issue is that I cannot add the created object to the _catalogue array, as I cannot delete it afterwards. When I attempt to iterate through the array, I receive a message saying Heap Corruption was detected.
Your problem is due to this line
_catalogue = new Country*[_curSize+2];
in the second constructor. You have forgotten to update _maxSize so you have a mismatch between _maxSize and the real allocated amount of memory.
Try:
_maxSize = _curSize+2;
_catalogue = new Country*[_maxSize];
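The general rule behind this fix is that the stored capacity must change in the same place as the allocation it describes. Here is a minimal sketch of that rule, using a hypothetical OwnedIntArray class that holds int pointers in place of Country pointers. None of these names come from the question's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the catalogue: a growable array of owned
// pointers whose capacity field always matches the allocated size.
class OwnedIntArray
{
public:
    explicit OwnedIntArray(std::size_t capacity)
        : size_(0), capacity_(capacity), items_(new int*[capacity]) {}

    ~OwnedIntArray()
    {
        for (std::size_t i = 0; i < size_; ++i)
            delete items_[i];   // each element was allocated with new
        delete[] items_;        // the pointer array was allocated with new[]
    }

    // Copying would lead to double deletion, so it is disabled.
    OwnedIntArray(const OwnedIntArray&) = delete;
    OwnedIntArray& operator=(const OwnedIntArray&) = delete;

    void add(int value)
    {
        if (size_ == capacity_)
            grow();
        items_[size_++] = new int(value);
    }

    int get(std::size_t i) const { assert(i < size_); return *items_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    void grow()
    {
        const std::size_t newCapacity = capacity_ == 0 ? 1 : capacity_ * 2;
        int** bigger = new int*[newCapacity];
        for (std::size_t i = 0; i < size_; ++i)
            bigger[i] = items_[i];
        delete[] items_;
        items_ = bigger;
        capacity_ = newCapacity;   // updated together with the allocation
    }

    std::size_t size_;
    std::size_t capacity_;
    int** items_;
};
```

Because add() compares against a capacity that is always accurate, it never writes past the end of the array, and the destructor can safely delete every stored element.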
You created _catalogue as a dynamic array.
To release the memory allocated for arrays of elements using new TYPE[SIZE] the syntax is:
delete[] _catalogue;
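A loop is needed only when each element was itself allocated separately, as with the rows of a matrix. For example:
int** matrix = new int*[rows];
for (int i = 0; i < rows; ++i)
matrix[i] = new int[cols];
for (int i = 0; i < rows; ++i)
delete [] matrix[i];
delete [] matrix;
The matrix is deleted row by row, and then the array of row pointers is deleted.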

Runtime Error in HDF5 file manipulation

I was trying a program that converts an array of structures to a byte array and then saves it to an HDF5 dataset multiple times. (The dataset has a dimension of 100, so I'll do the write operation 100 times.) I don't have any problems converting the structure to a byte array; I seem to run into a problem when I try to select the hyperslab where I need to write data in the dataset. I am new to HDF5. Please help me with this problem.
#include "stdafx.h"
#include "h5cpp.h"
#include <iostream>
#include <conio.h>
#include <string>
#ifndef H5_NO_NAMESPACE
using namespace H5;
#endif
using std::cout;
using std::cin;
using std::string;
const H5std_string fName( "dset.h5" );
const H5std_string dsName( "dset" );
struct MyStruct
{
int x[1000],y[1000];
double z[1000];
};
int main()
{
try
{
MyStruct obj[10];
char* totalData;
char* inData;
hsize_t offset[1],count[1];
H5File file("sample.h5", H5F_ACC_TRUNC);
StrType type(PredType::C_S1,100*sizeof(obj));
Group *myGroup = new Group(file.createGroup("\\myGroup"));
hsize_t dim[] = {100};
DataSpace dSpace(1,dim);
DataSet dSet = myGroup->createDataSet("dSet", type, dSpace);
for(int m = 0; m < 100 ; m++)
{
for(int j = 0 ; j < 10 ; j++)
{
for(int i = 0 ; i < 1000 ; i++) // some random values stored
{
obj[j].x[i] = i*13 + i*19;
obj[j].y[i] = i*37 - i*18;
obj[j].z[i] = (i + 1) / (0.4 * i);
}
}
totalData = new char[sizeof(obj)]; // converting struct to byte array
memcpy(totalData, &obj, sizeof(obj));
cout<<"Start Write.\n";
cout<<"Total Size : "<<sizeof(obj)/1000<<"KB\n";
//Exception::dontPrint();
hsize_t dim[] = { 1 }; //I think am screwing up between this line and following 5 lines
DataSpace memSpace(1, dim);
offset[0] = m;
count[0] = 1;
dSpace.selectHyperslab(H5S_SELECT_SET, count, offset);
dSet.write(totalData, type, memSpace, dSpace);
cout<<"Write Done.\n";
cout<<"Read Start.\n";
inData = new char[sizeof(obj)];
dSet.read(inData, type);
cout<<"Read Done\n";
}
delete myGroup;
}
catch(Exception e)
{
e.printError();
}
_getch();
return 0;
}
The Output I get is,
And when I use H5S_SELECT_APPEND instead of H5S_SELECT_SET, the output says
Start Write.
Total Size : 160KB
HDF5-DIAG: Error detected in HDF5 (1.8.12) thread 0:
#000: ..\..\src\H5Shyper.c line 6611 in H5Sselect_hyperslab(): unable to set hyperslab selection
major: Dataspace
minor: Unable to initialize object
#001: ..\..\src\H5Shyper.c line 6477 in H5S_select_hyperslab(): invalid selection operation
major: Invalid arguments to routine
minor: Feature is unsupported
Please, help me with this situation. Thanks in advance..
The main problem is the size of your type datatype. It should be sizeof(obj) and not 100*sizeof(obj).
And anyway, you shouldn't be using a string datatype but an opaque datatype since that's what it is, so you can replace this whole line by:
DataType type(H5T_OPAQUE, sizeof(obj));
The second problem is in the read. Either you read everything and you need to make sure inData is big enough, that is 100*sizeof(obj) instead of sizeof(obj), or you need to select just the element you want to read just like for the write.
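To make the sizes concrete, here is a small sketch of the arithmetic. It assumes the MyStruct layout from the question and a platform where int is 4 bytes and double is 8 bytes (common, but an assumption). The constant names are made up for illustration and are not part of the HDF5 API.

```cpp
#include <cstddef>

// Same layout as the struct in the question.
struct MyStruct
{
    int x[1000], y[1000];
    double z[1000];
};

// One dataset element holds the whole obj[10] array from the question.
constexpr std::size_t kElementBytes = 10 * sizeof(MyStruct);

// The dataset has 100 such elements.
constexpr std::size_t kDatasetElements = 100;

// Buffer size needed when the entire dataset is read in one call.
constexpr std::size_t kFullReadBytes = kDatasetElements * kElementBytes;

// Buffer size needed when a single element is selected for the read.
constexpr std::size_t kSingleReadBytes = kElementBytes;

static_assert(kFullReadBytes == 100 * kSingleReadBytes,
              "a full read needs 100 times the single-element buffer");
```

With 4-byte int and 8-byte double, kElementBytes is 160,000, which matches the "Total Size : 160KB" line printed by the program, and kFullReadBytes is 16,000,000. With the original type size of 100*sizeof(obj), every single element would be declared as 16,000,000 bytes, so writing one element would read far past the end of the 160,000-byte totalData buffer.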

C++ Returning results from several threads into an array

I've a pattern-matching program which takes as input a string and returns a string closely matched by a dictionary. Since the algorithm takes several seconds to run one match query, I am attempting to use multi-threading to run batch queries.
I first read in a file containing a list of queries and for each query dispatch a new thread to perform the matching algorithm, returning the results into an array using pthread_join.
However, I'm getting some inconsistent results. For example, if my query file contains the terms "red, green, blue", I may receive "red, green, green" as the result. Another run may generate the correct "red, green, blue" result. It appears to sometimes be writing over the result in the array, but why would this happen since the array value is set according to the thread id?
Dictionary dict; // global, which performs the matching algorithm
void *match_worker(void *arg) {
char* temp = (char *)arg;
string strTemp(temp);
string result = dict.match(strTemp);
return (void *)(result.c_str());
}
void run(const string& queryFilename) {
// read in query file
vector<string> queries;
ifstream inquery(queryFilename.c_str());
string line;
while (getline(inquery, line)) {
queries.push_back(line);
}
inquery.close();
pthread_t threads[queries.size()];
void *results[queries.size()];
int rc;
size_t i;
for (i = 0; i < queries.size(); i++) {
rc = pthread_create(&threads[i], NULL, match_worker, (void *)(queries[i].c_str()));
if (rc) {
cout << "Failed pthread_create" << endl;
exit(1);
}
}
for (i = 0; i < queries.size(); i++) {
rc = pthread_join(threads[i], &results[i]);
if (rc) {
cout << "Failed pthread_join" << endl;
exit(1);
}
}
for (i = 0; i < queries.size(); i++) {
cout << (char *)results[i] << endl;
}
}
int main(int argc, char* argv[]) {
string queryFilename = arg[1];
dict.init();
run(queryFilename);
return 0;
}
Edit: As suggested by Zac, I modified the thread to explicitly put the result on the heap:
void *match_worker(void *arg) {
char* temp = (char *)arg;
string strTemp(temp);
int numResults = 1;
cout << "perform match for " << strTemp << endl;
string result = dict.match(strTemp, numResults);
string* tmpResult = new string(result);
return (void *)((*tmpResult).c_str());
}
Although, in this case, where would I put the delete calls? If I try putting the following at the end of the run() function it gives an invalid pointer error.
for (i = 0; i < queries.size(); i++) {
delete (char*)results[i];
}
Without debugging it, my guess is that it has something to do with the following:
void *match_worker(void *arg)
{
char* temp = (char *)arg;
string strTemp(temp);
string result = dict.match(strTemp); // create an automatic
return (void *)(result.c_str()); // return the automatic ... but it gets destructed right after this!
}
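The pointer you return refers to memory that is released as soon as the function returns. When a later thread runs, its result may happen to land in the same memory location, so the stale pointer you stored earlier now shows that thread's value. That is why the same result can appear twice.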
You should put the result on the heap to ensure it does not get destroyed between the time your thread exits and you store it in your main thread.
With your edit, you are trying to mix things up a bit too much. I've fixed it below:
void *match_worker(void *arg)
{
char* temp = (char *)arg;
string strTemp(temp);
int numResults = 1;
cout << "perform match for " << strTemp << endl;
string result = dict.match(strTemp, numResults);
string* tmpResult = new string(result);
return (void *)(tmpResult); // just return the pointer to the std::string object
}
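Declare results as an array of std::string pointers. pthread_join still expects a void**, so join into a temporary void* and static_cast it to std::string* before storing it in results[i]:
// void* results[queries.size()]; // variable-length array, not standard C++
std::string** results = new std::string*[queries.size()];
for (size_t i = 0; i < queries.size(); ++i)
{
results[i] = NULL; // initialize pointers in the array
}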
When you clean up the memory:
for (i = 0; i < queries.size(); i++)
{
delete results[i];
}
delete [] results; // delete the results array
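Putting the pieces together, a minimal sketch might look like the following. fake_match is a made-up stand-in for dict.match, and run_queries is a hypothetical name. The sketch also copies each result into a std::vector<std::string> instead of keeping an array of raw pointers. The point is only that the worker returns a pointer to a heap-allocated std::string, and the joining thread deletes it through that same type.

```cpp
#include <pthread.h>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for dict.match() from the question.
static std::string fake_match(const std::string& query)
{
    return "match:" + query;
}

// Thread entry point: arg points to a std::string owned by the caller.
static void* match_worker(void* arg)
{
    const std::string* query = static_cast<const std::string*>(arg);
    // Heap-allocate the result so it outlives this thread.
    return new std::string(fake_match(*query));
}

// Runs one thread per query and returns the results in query order.
std::vector<std::string> run_queries(const std::vector<std::string>& queries)
{
    std::vector<pthread_t> threads(queries.size());
    std::vector<std::string> results(queries.size());
    std::size_t started = 0;

    for (; started < queries.size(); ++started) {
        void* arg = const_cast<std::string*>(&queries[started]);
        if (pthread_create(&threads[started], nullptr, match_worker, arg) != 0)
            break; // stop launching, but still join what was started
    }
    for (std::size_t i = 0; i < started; ++i) {
        void* raw = nullptr;
        if (pthread_join(threads[i], &raw) == 0 && raw != nullptr) {
            std::string* heapResult = static_cast<std::string*>(raw);
            results[i] = *heapResult; // copy into caller-owned storage
            delete heapResult;        // delete through the type used for new
        }
    }
    return results;
}
```

The delete from the question's edit failed because it was applied to a char* obtained from c_str(), not to the std::string object that new returned. Here the delete happens right after the join, on the std::string* itself.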
That said, you would have a much easier time if you used the C++11 threading facilities (std::thread, std::async, std::future) instead of mixing the C pthread library and C++.
The problem is caused by the lifetime of the local variable result and of the data returned by result.c_str(). You make this task unnecessarily difficult by mixing C with C++. Consider using C++11 and its threading library, which makes the task much easier:
#include <future>
#include <iostream>
#include <string>
#include <vector>
std::string match_worker(const std::string& query);
void run(const std::vector<std::string>& queries)
{
std::vector<std::future<std::string>> results;
results.reserve(queries.size());
for (auto& query : queries)
results.emplace_back(
std::async(std::launch::async, match_worker, query));
for (auto& result : results)
std::cout << result.get() << '\n';
}

Insert an array of tables into one table SQLite C/C++

I made my own database format, but it required too much memory, its size became horrendous, and upkeep was horrible.
So I'm looking for a way to store an array of a struct that's in an object into a table.
I'm guessing I need to use a blob, but all other options are welcome. An easy way to implement a blob would be helpful as well.
I've attached my saving code and related structures(Updated from my horrible post earlier)
#include "stdafx.h"
#include <string>
#include <stdio.h>
#include <vector>
#include "sqlite3.h"
using namespace std;
struct PriceEntry{
float cardPrice;
string PriceDate;
int Edition;
int Rarity;
};
struct cardEntry{
string cardName;
long pesize;
long gsize;
vector<PriceEntry> cardPrices;
float vThreshold;
int fav;
};
vector<cardEntry> Cards;
void FillCards(){
int i=0;
int j=0;
char z[32]={0};
for(j=0;j<3;j++){
cardEntry tmpStruct;
sprintf(z, "Card Name: %d" , i);
tmpStruct.cardName=z;
tmpStruct.vThreshold=1.00;
tmpStruct.gsize=0;
tmpStruct.fav=1;
for(i=0;i<3;i++){
PriceEntry ss;
ss.cardPrice=i+1;
ss.Edition=i;
ss.Rarity=i-1;
sprintf(z,"This is struct %d", i);
ss.PriceDate=z;
tmpStruct.cardPrices.push_back(ss);
}
tmpStruct.pesize=tmpStruct.cardPrices.size();
Cards.push_back(tmpStruct);
}
}
int SaveCards(){
// Create an int variable for storing the return code for each call
int retval;
int CardCounter=0;
int PriceEntries=0;
char tmpQuery[256]={0};
int q_cnt = 5,q_size = 256;
sqlite3_stmt *stmt;
sqlite3 *handle;
retval = sqlite3_open("sampledb.sqlite3",&handle);
if(retval)
{
printf("Database connection failed\n");
return -1;
}
printf("Connection successful\n");
//char create_table[100] = "CREATE TABLE IF NOT EXISTS users (uname TEXT PRIMARY KEY,pass TEXT NOT NULL,activated INTEGER)";
char create_table[] = "CREATE TABLE IF NOT EXISTS Cards (CardName TEXT, PriceNum NUMERIC, Threshold NUMERIC, Fav NUMERIC);";
retval = sqlite3_exec(handle,create_table,0,0,0);
printf( "could not prepare statemnt: %s\n", sqlite3_errmsg(handle) );
for(CardCounter=0;CardCounter<Cards.size();CardCounter++){
char Query[512]={0};
for(PriceEntries=0;PriceEntries<Cards[CardCounter].cardPrices.size();PriceEntries++){
//Here is where I need to find out the process of storing the vector of PriceEntry for Cards then I can modify this loop to process the data
}
sprintf(Query,"INSERT INTO Cards VALUES('%s', %d, %f, %d)",
Cards[CardCounter].cardName.c_str(),
Cards[CardCounter].pesize,
Cards[CardCounter].vThreshold,
Cards[CardCounter].fav); //My insert command
retval = sqlite3_exec(handle,Query,0,0,0);
if(retval){
printf( "Could not prepare statement: %s\n", sqlite3_errmsg(handle) );
}
}
// Insert first row and second row
sqlite3_close(handle);
return 0;
}
I tried googling but my results didn't suffice.
You have two types here: Cards and PriceEntries. And for each Card there can be many PriceEntries.
You can store Cards in one table, one Card per row. But you're puzzled about how to store the PriceEntries, right?
What you'd normally do here is have a second table for PriceEntries, keyed off a unique column (or columns) of the Cards table. I guess the CardName is unique to each card? Let's go with that. So your PriceEntry table would have a column CardName, followed by columns of PriceEntry information. You'll have a row for each PriceEntry, even if there are duplicates in the CardName column.
The PriceEntry table might look like:
CardName | Some PE value | Some other PE value
Ace | 1 | 1
Ace | 1 | 5
2 | 2 | 3
and so on. So when you want to find the array of PriceEntries for a card, you'd do
select * from PriceEntry where CardName = 'Ace'
And from the example data above you'd get back 2 rows, which you could shove into an array (if you wanted to).
No need for BLOBs!
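As a rough illustration of that layout, the sketch below flattens the nested cardPrices vectors into one list of rows, each carrying the card's name as its key. It does not talk to SQLite at all; it only models the rows in memory. PriceRow, to_price_rows and entries_for are hypothetical names, and cardEntry is trimmed to the two fields that matter here. Each PriceRow would correspond to one INSERT into the second table.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct PriceEntry {
    float cardPrice;
    std::string PriceDate;
    int Edition;
    int Rarity;
};

// Trimmed version of the question's struct (assumption: other fields omitted).
struct cardEntry {
    std::string cardName;
    std::vector<PriceEntry> cardPrices;
};

// One row of the hypothetical PriceEntry table: key column plus entry fields.
struct PriceRow {
    std::string cardName; // refers back to the CardName column of Cards
    PriceEntry entry;
};

// Flattens every card's cardPrices vector into one list of rows.
std::vector<PriceRow> to_price_rows(const std::vector<cardEntry>& cards)
{
    std::vector<PriceRow> rows;
    for (const cardEntry& card : cards) {
        for (const PriceEntry& pe : card.cardPrices) {
            rows.push_back(PriceRow{card.cardName, pe});
        }
    }
    return rows;
}

// In-memory equivalent of: select * from PriceEntry where CardName = ...
std::vector<PriceEntry> entries_for(const std::vector<PriceRow>& rows,
                                    const std::string& cardName)
{
    std::vector<PriceEntry> found;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].cardName == cardName)
            found.push_back(rows[i].entry);
    }
    return found;
}
```

Rebuilding a card's array is then just a lookup by name, which is what the select statement above does on the database side.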
This is a simple serialization and deserialization system. The class PriceEntry has been extended with serialization support (very simply). Now all you have to do is serialize a PriceEntry (or a set of them) to binary data and store it in a blob column. Later on, you get the blob data and from that deserialize a new PriceEntry with the same values. An example of how it is used is given at the bottom. Enjoy.
#include <iostream>
#include <vector>
#include <string>
#include <cstring> // for memcpy
using std::vector;
using std::string;
// deserialization archive
struct iarchive
{
explicit iarchive(vector<unsigned char> data)
: _data(data)
, _cursor(0)
{}
void read(float& v) { read_var(v); }
void read(int& v) { read_var(v); }
void read(size_t& v) { read_var(v); }
void read(string& v) { read_string(v); }
vector<unsigned char> data() { return _data; }
private:
template <typename T>
void read_var(T& v)
{
// todo: check that the cursor will not be past-the-end after the operation
// read the binary data
std::memcpy(reinterpret_cast<void*>(&v), reinterpret_cast<const void*>(&_data[_cursor]), sizeof(T));
// advance the cursor
_cursor += sizeof(T);
}
inline
void
read_string(string& v)
{
// get the array size
size_t sz;
read_var(sz);
// get alignment padding
size_t padding = sz % 4;
if (padding == 1) padding = 3;
else if (padding == 3) padding = 1;
// todo: check that the cursor will not be past-the-end after the operation
// resize the string
v.resize(sz);
// read the binary data
std::memcpy(reinterpret_cast<void*>(&v[0]), reinterpret_cast<const void*>(&_data[_cursor]), sz);
// advance the cursor
_cursor += sz + padding;
}
vector<unsigned char> _data; // archive data
size_t _cursor; // current position in the data
};
// serialization archive
struct oarchive
{
void write(float v) { write_var(v); }
void write(int v) { write_var(v); }
void write(size_t v) { write_var(v); }
void write(const string& v) { write_string(v); }
vector<unsigned char> data() { return _data; }
private:
template <typename T>
void write_var(const T& v)
{
// record the current data size
size_t s(_data.size());
// enlarge the data
_data.resize(s + sizeof(T));
// store the binary data
std::memcpy(reinterpret_cast<void*>(&_data[s]), reinterpret_cast<const void*>(&v), sizeof(T));
}
void write_string(const string& v)
{
// write the string size
write(v.size());
// get alignment padding
size_t padding = v.size() % 4;
if (padding == 1) padding = 3;
else if (padding == 3) padding = 1;
// record the data size
size_t s(_data.size());
// enlarge the data
_data.resize(s + v.size() + padding);
// store the binary data
std::memcpy(reinterpret_cast<void*>(&_data[s]), reinterpret_cast<const void*>(&v[0]), v.size());
}
vector<unsigned char> _data; /// archive data
};
struct PriceEntry
{
PriceEntry()
{}
PriceEntry(iarchive& in) // <<< deserialization support
{
in.read(cardPrice);
in.read(PriceDate);
in.read(Edition);
in.read(Rarity);
}
void save(oarchive& out) const // <<< serialization support
{
out.write(cardPrice);
out.write(PriceDate);
out.write(Edition);
out.write(Rarity);
}
float cardPrice;
string PriceDate;
int Edition;
int Rarity;
};
int main()
{
// create a PriceEntry
PriceEntry x;
x.cardPrice = 1;
x.PriceDate = "hi";
x.Edition = 3;
x.Rarity = 0;
// serialize it
oarchive out;
x.save(out);
// create a deserializer archive, from serialized data
iarchive in(out.data());
// deserialize a PriceEntry
PriceEntry y(in);
std::cout << y.cardPrice << std::endl;
std::cout << y.PriceDate << std::endl;
std::cout << y.Edition << std::endl;
std::cout << y.Rarity << std::endl;
}
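For a whole set of entries, one possible approach is to write a count first and then each entry in turn. The sketch below is a compact, standalone variant of that idea rather than an extension of the archive classes above. put, get, serialize and deserialize are hypothetical names, the 4-byte string padding is left out, and values are stored in the machine's native byte order, so the blob is not portable between different architectures.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

struct PriceEntry {
    float cardPrice;
    std::string PriceDate;
    int Edition;
    int Rarity;
};

typedef std::vector<unsigned char> Bytes;

// Appends the raw bytes of a trivially copyable value.
template <typename T>
void put(Bytes& out, const T& v)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

// Reads a trivially copyable value and advances the cursor.
template <typename T>
T get(const Bytes& in, std::size_t& cursor)
{
    if (sizeof(T) > in.size() - cursor) // cursor never exceeds in.size()
        throw std::runtime_error("truncated blob");
    T v;
    std::memcpy(&v, in.data() + cursor, sizeof(T));
    cursor += sizeof(T);
    return v;
}

// Layout: entry count, then per entry: price, date length, date bytes,
// edition, rarity.
Bytes serialize(const std::vector<PriceEntry>& entries)
{
    Bytes out;
    put(out, static_cast<std::uint64_t>(entries.size()));
    for (const PriceEntry& e : entries) {
        put(out, e.cardPrice);
        put(out, static_cast<std::uint64_t>(e.PriceDate.size()));
        out.insert(out.end(), e.PriceDate.begin(), e.PriceDate.end());
        put(out, e.Edition);
        put(out, e.Rarity);
    }
    return out;
}

std::vector<PriceEntry> deserialize(const Bytes& in)
{
    std::size_t cursor = 0;
    const std::uint64_t count = get<std::uint64_t>(in, cursor);
    std::vector<PriceEntry> entries;
    for (std::uint64_t i = 0; i < count; ++i) {
        PriceEntry e;
        e.cardPrice = get<float>(in, cursor);
        const std::uint64_t len = get<std::uint64_t>(in, cursor);
        if (len > in.size() - cursor)
            throw std::runtime_error("truncated blob");
        e.PriceDate.assign(reinterpret_cast<const char*>(in.data()) + cursor,
                           static_cast<std::size_t>(len));
        cursor += static_cast<std::size_t>(len);
        e.Edition = get<int>(in, cursor);
        e.Rarity = get<int>(in, cursor);
        entries.push_back(e);
    }
    return entries;
}
```

The byte vector returned by serialize is what would go into the blob column, and deserialize rebuilds the vector from the bytes read back. Unlike the archive above, get checks that the cursor stays inside the data and throws on a truncated blob.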