I'm developing a bioinformatic tool, which requires reading in millions of matrix files (average dimension = (20k, 20k)). They are tab-delimited text files, and they look something like:
0.53 0.11
0.24 0.33
Because the software reads the matrix files one at a time, memory is not an issue, but it's very slow. The following is my current function for reading in a matrix file. I first make a matrix object using a double pointer, then fill in the matrix by looping through an input file .
float** make_matrix(int nrow, int ncol, float val){
float** M = new float *[nrow];
for(int i = 0; i < nrow; i++) {
M[i] = new float[ncol];
for(int j = 0; j < ncol; j++) {
M[i][j] = val;
}
}
return M;
}
float** read_matrix(string fname, int dim_1, int dim_2){
float** K = make_matrix(dim_1, dim_2, 0);
ifstream ifile(fname);
for (int i = 0; i < dim_1; ++i) {
for (int j = 0; j < dim_2; ++j) {
ifile >> K[i][j];
}
}
ifile.clear();
ifile.seekg(0, ios::beg);
return K;
}
Is there a much faster way to do this? From my experience with python, reading in a matrix file using pandas is so much faster than using python for-loops. Is there a trick like that in c++?
(added)
Thanks so much everyone for all your suggestions and comments!
The fastest way, by far, is to change the way you write those files: write in binary format, two int first (width, height) then just dump your values.
You will be able to load it in just three read calls.
Just for fun, I measured the program posted above (using a 20,000x20,000 ASCII input file, as described) on my Mac Mini (3.2GHz i7 with SSD drive) and found that it took about 102 seconds to parse in the file using the posted code.
Then I wrote a version of the same function that uses the C stdio API (fopen()/fread()/fclose()) and does character-by-character parsing into a 1D float array. This implementation takes about 13 seconds to parse in the file on the same hardware, so it's about 7 times faster.
Both programs were compiled with g++ -O3 test_read_matrix.cpp.
float* faster_read_matrix(string fname, int numRows, int numCols)
{
FILE * fpIn = fopen(fname.c_str(), "r");
if (fpIn == NULL)
{
printf("Couldn't open file [%s] for input!\n", fname.c_str());
return NULL;
}
float* K = new float[numRows*numCols];
// We'll hold the current number in (numberBuf) until we're ready to parse it
char numberBuf[128] = {'\0'};
int numCharsInBuffer = 0;
int curRow = 0, curCol = 0;
while(curRow < numRows)
{
char tempBuf[4*1024]; // an arbitrary size
const size_t bytesRead = fread(tempBuf, 1, sizeof(tempBuf), fpIn);
if (bytesRead <= 0)
{
if (bytesRead < 0) perror("fread");
break;
}
for (size_t i=0; i<bytesRead; i++)
{
const char c = tempBuf[i];
if ((c=='.')||(c=='+')||(c=='-')||(isdigit(c)))
{
if ((numCharsInBuffer+1) < sizeof(numberBuf)) numberBuf[numCharsInBuffer++] = c;
else
{
printf("Error, number string was too long for numberBuf!\n");
}
}
else
{
if (numCharsInBuffer > 0)
{
// Parse the current number-chars we have assembled into (numberBuf) and reset (numberBuf) to empty
numberBuf[numCharsInBuffer] = '\0';
if (curCol < numCols) K[curRow*numCols+curCol] = strtod(numberBuf, NULL);
else
{
printf("Error, too many values in row %i! (Expected %i, found at least %i)\n", curRow, numCols, curCol);
}
curCol++;
}
numCharsInBuffer = 0;
if (c == '\n')
{
curRow++;
curCol = 0;
if (curRow >= numRows) break;
}
}
}
}
fclose(fpIn);
if (curRow != numRows) printf("Warning: I read %i lines in the file, but I expected there would be %i!\n", curRow, numRows);
return K;
}
I am dissatisfied with Jeremy Friesner’s otherwise excellent answer because it:
blames the problem to be with C++'s I/O system (which it is not)
fixes the problem by circumventing the actual I/O problem without being explicit about how it is a significant contributor to speed
modifies memory accesses which (may or may not) contribute to speed, and does so in a way that very large matrices may not be supported
The reason his code runs so much faster is because he removes the single most important bottleneck: unoptimized disk access. JWO’s original code can be brought to match with three extra lines of code:
float** read_matrix(std::string fname, int dim_1, int dim_2){
float** K = make_matrix(dim_1, dim_2, 0);
std::size_t buffer_size = 4*1024; // 1
char buffer[buffer_size]; // 2
std::ifstream ifile(fname);
ifile.rdbuf()->pubsetbuf(buffer, buffer_size); // 3
for (int i = 0; i < dim_1; ++i) {
for (int j = 0; j < dim_2; ++j) {
ss >> K[i][j];
}
}
// ifile.clear();
// ifile.seekg(0, std::ios::beg);
return K;
}
The addition exactly replicates Friesner’s design, but using the C++ library capabilities without all the extra programming grief on our end.
You’ll notice I also removed a couple lines at the bottom that should be inconsequential to program function and correctness, but which may cause a minor cumulative time issue as well. (If they are not inconsequential, that is a bug and should be fixed!)
How much difference this all makes depends entirely on the quality of the C++ Standard Library implementation. AFAIK the big three modern C++ compilers (MSVC, GCC, and Clang) all have sufficiently-optimized I/O handling to make the issue moot.
locale
One other thing that may also make a difference is to .imbue() the stream with the default "C" locale, which avoids a lot of special handling for numbers in locale-dependent formats other than what your files use. You only need to bother to do this if you have changed your global locale, though.
ifile.imbue(std::locale(""));
redundant initialization
Another thing that is killing your time is the effort to zero-initialize the array when you create it. Don’t do that if you don’t need it! (You don’t need it here because you know the total extents and will fill them properly. C++17 and later is nice enough to give you a zero value if the input stream goes bad, too. So you get zeros for unread values either way.)
dynamic memory block size
Finally, keeping memory accesses to an array of array should not significantly affect speed, but it still might be worth testing if you can change it. This is assuming that the resulting matrix will never be too large for the memory manager to return as a single block (and consequently crash your program).
A common design is to allocate the entire array as a single block, with the requested size plus size for the array of pointers to the rest of the block. This allows you to delete the array in a single delete[] statement. Again, I don’t believe this should be an optimization issue you need to care about until your profiler says so.
At the risk of the answer being considered incomplete (no code examples), I would like to add to the other answers additional options how to tackle the problem:
Use a binary format (width,height, values...) as file format and then use file mapping (MapViewOfFile() on Windows, mmap() or so on posix/unix systems).
Then, you can simply point your "matrix structure" pointer to the mapped address space and you are done. And in case, you do something like sparse access to the matrix, it can even save some real IO. If you always do full access to all elements of the matrix (no sparse matrices etc.), it is still quite elegant and probably faster than malloc/read.
Replacements for c++ iostream, which is known to be quite slow and should not be used for performance critical stuff:
Have a look at the {fmt} library, which has become quite popular in recent years and claims to be quite fast.
Back in the days, when I did a lot of numerics on large data sets, I always opted for binary files for storage. (It was back in the days, when the fastest CPU you get your hands on were the Pentium 1 (with the floating point bug :)). Back then, all was slower, memory was much more limited (we had MB not GB as units for RAM in our systems) and all in all, nearly 20 years have passed since.
So, as a refresher, I did write some code to show, how much faster than iostream and text files you can do if you do not have extra constraints (such as endianess of different cpus etc.).
So far, my little test only has an iostream and a binary file version with a) stdio fread() kind of loading and b) mmap(). Since I sit in front of a debian bullseye computer, my code uses linux specific stuff for the mmap() approach. To run it on Windows, you have to change a few lines of code and some includes.
Edit: I added a save function using {fmt} now as well.
Edit: I added a load function with stdio now as well.
Edit: To reduce memory workload, I reordered the code somewhat
and now only keep 2 matrix instances in memory at any given time.
The program does the following:
create a 20k x 20k matrix in ram (in a struct named Matrix_t). With random values, slowly generated by std::random.
Write the matrix with iostream to a text file.
Write the matrix with stdio to a binary file.
Create a new matrix textMatrix by loading its data from the text file.
Create a new matrix inMemoryMatrix by loading its data from the binary file with a few fread() calls.
mmap() the binary file and use it under the name mappedMatrix.
Compare each of the loaded matrices to the original randomMatrix to see if the round-trip worked.
Here the results I got on my machine after compiling this work of wonder with clang++ -O3 -o fmatio fast-matrix-io.cpp -lfmt:
./fmatio
creating random matrix (20k x 20k) (27.0775seconds)
the first 10 floating values in randomMatrix are:
57970.2 -365700 -986079 44657.8 826968 -506928 668277 398241 -828176 394645
saveMatrixAsText_IOSTREAM()
saving matrix with iostream. (192.749seconds)
saveMatrixAsText_FMT(mat0_fmt.txt)
saving matrix with {fmt}. (34.4932seconds)
saveMatrixAsBinary()
saving matrix into a binary file. (30.7591seconds)
loadMatrixFromText_IOSTREAM()
loading matrix from text file with iostream. (102.074seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix. (0.125328seconds)
loadMatrixFromText_STDIO(mat0_fmt.txt)
loading matrix from text file with stdio. (71.2746seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix (stdio). (0.124684seconds)
loadMatrixFromBinary(mat0.bin)
loading matrix from binary file into memory. (0.495685seconds)
randomMatrix == inMemoryMatrix
comparing randomMatrix with inMemoryMatrix. (0.124206seconds)
mapMatrixFromBinaryFile(mat0.bin)
mapping a view to a matrix in a binary file. (4.5883e-05seconds)
randomMatrix == mappedMatrix
comparing randomMatrix with mappedMatrix. (0.158459seconds)
And here is the code:
#include <cinttypes>
#include <memory>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <chrono>
#include <limits>
#include <iomanip>
// includes for mmap()...
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
// includes for {fmt}...
#include <fmt/core.h>
#include <fmt/os.h>
struct StopWatch {
using Clock = std::chrono::high_resolution_clock;
using TimePoint =
std::chrono::time_point<Clock>;
using Duration =
std::chrono::duration<double>;
void start(const char* description) {
this->description = std::string(description);
tstart = Clock::now();
}
void stop() {
TimePoint tend = Clock::now();
Duration elapsed = tend - tstart;
std::cout << description << " (" << elapsed.count()
<< "seconds)" << std::endl;
}
TimePoint tstart;
std::string description;
};
struct Matrix_t {
uint32_t ncol;
uint32_t nrow;
float values[];
inline uint32_t to_index(uint32_t col, uint32_t row) const {
return ncol * row + col;
}
};
template <class Initializer>
Matrix_t *createMatrix
( uint32_t ncol,
uint32_t nrow,
Initializer initFn
) {
size_t nfloats = ncol*nrow;
size_t nbytes = UINTMAX_C(8) + nfloats * sizeof(float);
Matrix_t * result =
reinterpret_cast<Matrix_t*>(operator new(nbytes));
if (nullptr != result) {
result->ncol = ncol;
result->nrow = nrow;
for (uint32_t row = 0; row < nrow; row++) {
for (uint32_t col = 0; col < ncol; col++) {
result->values[result->to_index(col,row)] =
initFn(ncol,nrow,col,row);
}
}
}
return result;
}
void saveMatrixAsText_IOSTREAM(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_IOSTREAM()" << std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
}
std::ofstream outFile(filePath);
if (outFile) {
outFile << matrix->ncol << " " << matrix->nrow << std::endl;
const auto defaultPrecision = outFile.precision();
outFile.precision
(std::numeric_limits<float>::max_digits10);
for (uint32_t row = 0; row < matrix->nrow; row++) {
for (uint32_t col = 0; col < matrix->ncol; col++) {
outFile << matrix->values[matrix->to_index(col,row)]
<< " ";
}
outFile << std::endl;
}
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
void saveMatrixAsText_FMT(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_FMT(" << filePath << ")"
<< std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
}
auto outFile = fmt::output_file(filePath);
outFile.print("{} {}\n", matrix->ncol, matrix->nrow);
for (uint32_t row = 0; row < matrix->nrow; row++) {
outFile.print("{}", matrix->values[matrix->to_index(0,row)]);
for (uint32_t col = 1; col < matrix->ncol; col++) {
outFile.print(" {}",
matrix->values[matrix->to_index(col,row)]);
}
outFile.print("\n");
}
}
void saveMatrixAsBinary(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsBinary()" << std::endl;
FILE * outFile = fopen(filePath, "wb");
if (nullptr != outFile) {
fwrite( &matrix->ncol, 4, 1, outFile);
fwrite( &matrix->nrow, 4, 1, outFile);
size_t nfloats = matrix->ncol * matrix->nrow;
fwrite( &matrix->values, sizeof(float), nfloats, outFile);
fclose(outFile);
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
Matrix_t* loadMatrixFromText_IOSTREAM(const char* filePath) {
std::cout << "loadMatrixFromText_IOSTREAM()" << std::endl;
std::ifstream inFile(filePath);
if (inFile) {
uint32_t ncol;
uint32_t nrow;
inFile >> ncol;
inFile >> nrow;
uint32_t nfloats = ncol * nrow;
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
inFile >> value;
return value;
};
Matrix_t * matrix = createMatrix( ncol, nrow, loader);
return matrix;
} else {
std::cout << "could not open " << filePath << "for reading."
<< std::endl;
}
return nullptr;
}
Matrix_t* loadMatrixFromText_STDIO(const char* filePath) {
std::cout << "loadMatrixFromText_STDIO(" << filePath << ")"
<< std::endl;
Matrix_t* matrix = nullptr;
FILE * inFile = fopen(filePath, "rt");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fscanf(inFile, "%d %d", &ncol, &nrow);
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
fscanf(inFile, "%f", &value);
return value;
};
matrix = createMatrix( ncol, nrow, loader);
fclose(inFile);
} else {
std::cout << "could not open " << filePath << "for reading."
<< std::endl;
}
return matrix;
}
Matrix_t* loadMatrixFromBinary(const char* filePath) {
std::cout << "loadMatrixFromBinary(" << filePath << ")"
<< std::endl;
FILE * inFile = fopen(filePath, "rb");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fread( &ncol, 4, 1, inFile);
fread( &nrow, 4, 1, inFile);
uint32_t nfloats = ncol * nrow;
uint32_t nbytes = nfloats * sizeof(float) + UINT32_C(8);
Matrix_t* matrix =
reinterpret_cast<Matrix_t*>
(operator new (nbytes));
if (nullptr != matrix) {
matrix->ncol = ncol;
matrix->nrow = nrow;
fread( &matrix->values[0], sizeof(float), nfloats, inFile);
return matrix;
} else {
std::cout << "could not find memory for the matrix."
<< std::endl;
}
fclose(inFile);
} else {
std::cout << "could not open file "
<< filePath << " for reading." << std::endl;
}
return nullptr;
}
void freeMatrix(Matrix_t* matrix) {
operator delete(matrix);
}
Matrix_t* mapMatrixFromBinaryFile(const char* filePath) {
std::cout << "mapMatrixFromBinaryFile(" << filePath << ")"
<< std::endl;
Matrix_t * matrix = nullptr;
int fd = open( filePath, O_RDONLY);
if (-1 != fd) {
struct stat sb;
if (-1 != fstat(fd, &sb)) {
auto fileSize = sb.st_size;
matrix =
reinterpret_cast<Matrix_t*>
(mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0));
if (nullptr == matrix) {
std::cout << "mmap() failed!" << std::endl;
}
} else {
std::cout << "fstat() failed!" << std::endl;
}
close(fd);
} else {
std::cout << "open() failed!" << std::endl;
}
return matrix;
}
void unmapMatrix(Matrix_t* matrix) {
if (nullptr == matrix)
return;
size_t nbytes =
UINTMAX_C(8) +
sizeof(float) * matrix->ncol * matrix->nrow;
munmap(matrix, nbytes);
}
bool areMatricesEqual( const Matrix_t* m1, const Matrix_t* m2) {
if (nullptr == m1) return false;
if (nullptr == m2) return false;
if (m1->ncol != m2->ncol) return false;
if (m1->nrow != m2->nrow) return false;
// both exist and have same size...
size_t nfloats = m1->ncol * m1->nrow;
size_t nbytes = nfloats * sizeof(float);
return 0 == memcmp( m1->values, m2->values, nbytes);
}
int main(int argc, const char* argv[]) {
std::random_device rdev;
std::default_random_engine reng(rdev());
std::uniform_real_distribution<> rdist(-1.0E6F, 1.0E6F);
StopWatch sw;
auto randomInitFunction =
[&reng,&rdist]
(uint32_t ncol, uint32_t nrow, uint32_t col, uint32_t row)
-> float
{
return rdist(reng);
};
sw.start("creating random matrix (20k x 20k)");
Matrix_t * randomMatrix =
createMatrix(UINT32_C(20000),
UINT32_C(20000),
randomInitFunction);
sw.stop();
if (nullptr != randomMatrix) {
std::cout
<< "the first 10 floating values in randomMatrix are: "
<< std::endl;
std::cout << randomMatrix->values[0];
for (size_t i = 1; i < 10; i++) {
std::cout << " " << randomMatrix->values[i];
}
std::cout << std::endl;
sw.start("saving matrix with iostream.");
saveMatrixAsText_IOSTREAM("mat0_iostream.txt", randomMatrix);
sw.stop();
sw.start("saving matrix with {fmt}.");
saveMatrixAsText_FMT("mat0_fmt.txt", randomMatrix);
sw.stop();
sw.start("saving matrix into a binary file.");
saveMatrixAsBinary("mat0.bin", randomMatrix);
sw.stop();
sw.start("loading matrix from text file with iostream.");
Matrix_t* textMatrix =
loadMatrixFromText_IOSTREAM("mat0_iostream.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix.");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from text file with stdio.");
textMatrix =
loadMatrixFromText_STDIO("mat0_fmt.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix (stdio).");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from binary file into memory.");
Matrix_t* inMemoryMatrix =
loadMatrixFromBinary("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with inMemoryMatrix.");
if (!areMatricesEqual(randomMatrix, inMemoryMatrix)) {
std::cout << "randomMatrix != inMemoryMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == inMemoryMatrix" << std::endl;
}
sw.stop();
freeMatrix(inMemoryMatrix);
inMemoryMatrix = nullptr;
sw.start("mapping a view to a matrix in a binary file.");
Matrix_t* mappedMatrix =
mapMatrixFromBinaryFile("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with mappedMatrix.");
if (!areMatricesEqual(randomMatrix, mappedMatrix)) {
std::cout << "randomMatrix != mappedMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == mappedMatrix" << std::endl;
}
sw.stop();
unmapMatrix(mappedMatrix);
mappedMatrix = nullptr;
freeMatrix(randomMatrix);
} else {
std::cout << "could not create random matrix!" << std::endl;
}
return 0;
}
Please note, that binary formats where you simply cast to a struct pointer also depend on how the compiler does alignment and padding within structures. In my case, I was lucky and it worked. On other systems, you might have to tweak a little (#pragma pack(4) or something along that line) to make it work.
I'm new to C++ and I'm trying to return a struct from a vector of structs by using 2 search criteria.
The function find_city is returning me everything from the defined range, regardless of whether it exists inside the vector of struct.
Here's my code:
struct cityLoc
{
int hRange;
int vRange;
int cityCode;
string cityName;
};
vector<cityLoc> cl1;
// the vector has already been preloaded with data
// function to return my struct from the vector
cityLoc find_city(int hRange, int vRange)
{
for (size_t i = 0; i < cl1.size(); i++)
{
if ((cl1[i].hRange = hRange) && (cl1[i].vRange = vRange))
{
return cl1[i];
}
}
}
int main()
{
for (int i = 0; i < 8; i++)
{
for (int j = 0; j <= 8; j++)
{
cityLoc this_city;
this_city = find_city(i, j);
cout << this_city.hRange << ", " << this_city.vRange << endl;
}
}
return 0;
}
Also, aside from this question, I was previously looking into std::find_if and didn't understand it. If I have the following code, what is the output? How do I modify it such that it returns a struct?
auto it = find_if(cl1.begin(), cl1.end(), [](cityLoc& cl) { return cl.hRange == 1; } );
You have a bug here:
if ((cl1[i].hRange = hRange) && (cl1[i].vRange = vRange))
Those = are assignments, not comparisons! Please enable compiler warnings and you won't be hurt by such obvious typos in future.
std::find_if will return the iterator to the found struct entry if it is successful, std::vector::end() otherwise. So, you should first validate the returning iterator if it is valid or not.
For example:
auto it = std::find_if( cl1.begin(), cl1.end(),
[](const cityLoc& cl) { return cl.hRange == 1; } );
if ( it == cl1.end() )
{
// ERROR: Not found! Return error code etc.
return -1;
}
// And, if found, process it here...
std::cout << it->hRange << '\n';
std::cout << it->vRange << '\n';
The criteria (predicate) part in std::find_if is a lambda expression.
In my exercise with OpenDDS I would like to create multiple topics from a single IDL structure, is that possible? otherwise please let me know how to do it.
I do it as below, please correct me if it is not the right way to do it.
The sample I use is available at OpenDDS-3.12/examples/DCPS/IntroductionToOpenDDS
The IDL is as follows,
StockQuoter.idl
---------------
module StockQuoter
{
#pragma DCPS_DATA_TYPE "StockQuoter::Quote"
#pragma DCPS_DATA_KEY "StockQuoter::Quote ticker"
struct Quote {
string ticker;
string exchange;
string full_name;
double value;
string data;
TimeBase::TimeT timestamp;
};
}
publisher.cpp
// Create TOPICS and TYPES Vector
std::stringstream ss;
for(unsigned int idx = 0; idx < 100; ++idx)
{
ss << (idx+1);
TOPICS.push_back("TOPIC" + std::string(ss.str()));
TYPES.push_back("TYPE" + std::string(ss.str()));
ss.clear();
ss.str(std::string());
}
// Register
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_servent.push_back(new StockQuoter::QuoteTypeSupportImpl());
if (DDS::RETCODE_OK != vec_quote_servent[idx]->register_type(participant.in (), TYPES[idx].c_str()))
{
cerr << "register_type for " << TYPES[idx] << " failed." << endl;
ACE_OS::exit(1);
}
}
// Create a topics
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_topic.push_back( participant->create_topic (TOPICS[idx].c_str(),
TYPES[idx].c_str(),
default_topic_qos,
DDS::TopicListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK));
if (CORBA::is_nil (vec_quote_topic[idx].in ())) {
cerr << "create_topic for " << TOPICS[idx] << " failed." << endl;
ACE_OS::exit(1);
}
}
// Create DataWriters
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_base_dw.push_back( pub->create_datawriter(vec_quote_topic[idx].in (),
dw_default_qos,
DDS::DataWriterListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK) );
if (CORBA::is_nil (vec_quote_base_dw[idx].in ())) {
cerr << "create_datawriter for " << TOPICS[idx] << " failed." << endl;
ACE_OS::exit(1);
}
vec_quote_dw.push_back( StockQuoter::QuoteDataWriter::_narrow(vec_quote_base_dw[idx].in()) );
if (CORBA::is_nil (vec_quote_dw[idx].in ())) {
cerr << TOPICS[idx] << " could not be narrowed"<< endl;
ACE_OS::exit(1);
}
}
// Create handle
for( unsigned int idx = 0; idx < 100 ; ++idx )
{
{
StockQuoter::Quote topic2;
topic2.ticker = CORBA::string_dup(TOPICS[idx].c_str());
vec_topic_handle.push_back(vec_quote_dw[idx]->register_instance(topic2));
}
}
// Publish data
StockQuoter::Quote vec_quote;
vec_quote.exchange = STOCK_EXCHANGE_NAME;
vec_quote.ticker = CORBA::string_dup("VEC_TOPIC");
vec_quote.full_name = CORBA::string_dup("TOPIC Receipts");
vec_quote.value = 1600.0 + 10.0*i;
vec_quote.timestamp = get_timestamp();
for(unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote.value += idx + 10;
cout << "Publishing " << TOPICS[idx] << " : " << vec_quote.value <<endl;
ret = vec_quote_dw[idx]->write(vec_quote, vec_topic_handle[idx]);
if (ret != DDS::RETCODE_OK) {
ACE_ERROR ((LM_ERROR, ACE_TEXT("(%P|%t) ERROR: TOPIC2 write returned %d.\n"), ret));
}
}
a, now I get the point you wanted to ask. you can define different topic types either in one file per topic, or all in one file. If you define more than one topic type in an IDL file, type support is generated for each file. Let me describe this more precisely with the same example you used. The IDL file for the IntroductionToOpenDDS example looks as follows:
#include "orbsvcs/TimeBase.idl"
module StockQuoter
{
#pragma DCPS_DATA_TYPE "StockQuoter::Quote"
#pragma DCPS_DATA_KEY "StockQuoter::Quote ticker"
struct Quote {
string ticker;
string exchange;
string full_name;
double value;
TimeBase::TimeT timestamp;
};
#pragma DCPS_DATA_TYPE "StockQuoter::ExchangeEvent"
#pragma DCPS_DATA_KEY "StockQuoter::ExchangeEvent exchange"
enum ExchangeEventType { TRADING_OPENED,
TRADING_CLOSED,
TRADING_SUSPENDED,
TRADING_RESUMED };
struct ExchangeEvent {
string exchange;
ExchangeEventType event;
TimeBase::TimeT timestamp;
};
};
As you can see, two types are defined: Quote and ExchangeEvent. When this IDL file gets compiled, type support for both Quote and ExchangeEvent is generated.
You already used the type support for using this line (QuoteTypeSupportImpl):
vec_quote_servent.push_back(new StockQuoter::QuoteTypeSupportImpl());
The same type support is generated for ExchangeEvent, you will find a type support called StockQuoter::ExchangeEvent with a StockQuoter::ExchangeEventTypeSupportImpl() method. Simply use this to create a topic of type ExchangeEvent.
I hope this helps. If more details are needed, feel free to ask.
you can create as many topics as you wish from a single IDL file. you are already doing it with this line:
participant->create_topic (TOPICS[idx].c_str(),
TYPES[idx].c_str(),
default_topic_qos,
DDS::TopicListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK);
however, each topic you created has the same type. you can also create different types for topics if you have to.
I am practicing using pointers to create objects and access data. I created a stuct called BigNum to represent a number with multiple digits. When I try to print the content of the struct inside the readDigits function, it can be printed pretty well. However, after passing the pointer to the main function, the content of the stuct is printed out to be random numbers. Why? How to fix it?
struct BigNum{
int numDigits; //the number of digits
int *digits; //the content of the big num
};
int main(){
BigNum *numPtr = readDigits();
for (int i=0; i<(numPtr->numDigits);i++ ){
std::cout << (numPtr->digits)[i] << std::endl;
}
return 0;
}
BigNum* readDigits(){
std::string digits;
std::cout << "Input a big number:" << std::endl;
std::cin >> digits;
int result[digits.length()];
toInt(digits,result);
BigNum *numPtr = new BigNum();
numPtr->numDigits = digits.length();
numPtr->digits = result;
/* When I try to print in here, it's totally okay!
std::cout << "Here is the content:" << std::endl;
for (int i=0; i<numPtr->numDigits;i++ ){
std::cout << (numPtr->digits)[i] << std::endl;
}
*/
return numPtr;
}
void toInt(std::string& str, int result[]){
for (int i=0;i<str.length() ;i++ ){
result[str.length()-i-1] = (int)(str[i]-'0');
}
}
BigNum* readDigits(){
//....
int result[digits.length()];
//....
numPtr->digits = result;
return numPtr;
}
result is stored on the stack. So if you return it as part of numPtr, it will be invalid as soon as you exit the function. Instead of storing it on the stack you have to allocate it with new.
You have undefined behavior because you assign address of automatic object to digits pointer. When readDigits() returns this memory is not valid anymore. You should assign to this pointer address of heap-based object (or some equivalent, e.g. use vector or smart pointer):
#include <vector>
struct BigNum{
int numDigits; //the number of digits
std::vector<int> digits; //the content of the big num
};
Then you can insert numbers into vector this way:
int input;
while ( std::cin >> input) //enter any non-integer to end the loop
{
digits.push_back(input);
}
The problem is that within the function BigNum* readDigits() you assign apointer to stack memory to the pointer of your newly allocated BigNum:
int result[digits.length()]; // <--- variable is on the stack!!!
toInt(digits,result);
BigNum *numPtr = new BigNum();
numPtr->numDigits = digits.length();
numPtr->digits = result; // <--- make pointer to stack memory available to caller of readDigits
Now if you proceed the access to numPtr->digits is ok since the memory of result is still valid on the stack (as long as you are within readDigits). Once you've left ´readDigits()´ the memory of result is overwritten depending on what you do (calling other functions, ...).
Right now I'm even wondering why you don't get a compiler error with ´int result[digits.length()];´ since ´digits.length()´ is not constant and the size of required stack memory has to be defined at compile time... so I'm thinking that the size of result is actually 0...?? Would be a nice thing to test!
My recommendation is to modify the code of readDigits as follows:
BigNum* readDigits()
{
std::string digits;
int i;
std::cout << "Input a big number:" << std::endl;
std::cin >> digits;
//int result[digits.length()];
//toInt(digits,result);
BigNum *numPtr = new BigNum();
numPtr->numDigits = digits.length();
numPtr->digits = (int *)malloc(sizeof(int) * numPtr->numDigits); // allocate heap memory for digits
toInt(digits, numPtr->digits);
/* When I try to print in here, it's totally okay!
std::cout << "Here is the content:" << std::endl;
for (i = 0; i <numPtr->numDigits; i++)
{
std::cout << (numPtr->digits)[i] << std::endl;
}
*/
return numPtr;
}
Remember to free your memory if ´BigNum *numPtr´ is no longer used (´free(numPtr->digits);´) otherwise you'll get a memory leak (sooner or later):
int main()
{
BigNum *numPtr = readDigits();
int i;
for (i = 0; i < (numPtr->numDigits); i++)
{
std::cout << (numPtr->digits)[i] << std::endl;
}
free(numPtr->digits); // free memory allocated by readDigits(..)
return 0;
}
Here is the code I have, it compiles and runs using g++ but I get a segmentation fault. I know it happens around the pthread_join statement but I cant figure out why.
#include <iostream>
#include <stdio.h>
#include <fstream>
#include <pthread.h>
#include <stdlib.h>
#include <sstream>
using namespace std;
struct data{
string filename;
int x;
int y;
};
void *threadFunction(void *input){
data *file = (data *) input;
string filename = file->filename;
ifstream myFile;
int xCount = 0;
int yCount = 0;
myFile.open(filename.c_str());
string line;
while(myFile >> line){
if(line == "X"){
xCount++;
}else if(line == "Y"){
yCount++;
}
}
file->x = xCount;
file->y = yCount;
return (void *) file;
}
int main(){
pthread_t myThreads[20];
data *myData = new data[20];
for(int i = 0; i < 20; i++){
ostringstream names;
names << "/filepath/input" << i+1 << ".txt";
myData[i].filename = names.str();
myData[i].x = 0;
myData[i].y = 0;
}
for(int i = 0; i < 20; i++){
int check = pthread_create(&myThreads[i], NULL, threadFunction, (void *) &myData[i]);
if(check != 0){
cout << "Error Creating Thread\n";
exit(-1);
}
}
int xCount = 0;
int yCount = 0;
for(int i = 0; i < 20; i++){
data* returnedItem;
pthread_join(myThreads[i], (void**) returnedItem);
xCount += returnedItem->x;
yCount += returnedItem->y;
}
cout << "Total X: " << xCount << "\n";
cout << "Total Y: " << yCount << "\n";
}
Am I not calling return properly from my threadFunction? I've been trying a bunch of different things and I still don't know what's going on...any help would be greatly appreciated! (the text file I open contain either an X or Y per line. My goal is to count the total number of Xs and Ys in 20 text files)
Second argument of pthread_join will be used to return, return value of the thread, so somewhere inside pthread_join we have a code that call *secondArgument = thread_return_value, but lets look at what you are doing here:
// You are not initializing returnedItem, so it contain some garbage value
// For example 0x12345678
data* returnedItem;
// Now you cast that garbage to (void**), and pthread_join will call
// *(0x12345678) = thread_return_value that will cause segmentation fault
pthread_join(myThreads[i], (void**) returnedItem);
But you want return value to be copied to returnedItem, am I right? in case that you answer is yes you should pass address of returnedItem to pthread_join so it can copy it there. So change your call to:
pthread_join(myThreads[i], (void**) &returnedItem);
pthread_join(myThreads[i], (void**) returnedItem);
should be
pthread_join(myThreads[i], (void**) &returnedItem);
You're asking join to set the value of returnedItem to be whatever void* your thread function returned ... so you need to give the address of returnedItem.
The second argument to pthread_join() is a void** into which the result is to be stored. You are passing a random value, however. This could should rather look something like this:
void* result;
pthread_join(myThread[i], &result);
data* returnedItem = static_cast<data*>(result);
Of course, this assumes that a data* was, indeed, returned.