Dynamic batch is not supported on Intel NCS2 VPU - C++

I'm trying to run FP16 person-detection-retail-0013 and person-reidentification-retail-0079 on Intel Neural Compute Stick hardware, but as soon as the application loads the nets onto the device I get this exception:
[INFERENCE ENGINE EXCEPTION] Dynamic batch is not supported
I've loaded the net with the max batch size set to 1, and I've started my project from the pedestrian tracker demo in the OpenVINO toolkit:
main.cpp --> CreatePedestrianTracker
CnnConfig reid_config(reid_model, reid_weights);
reid_config.max_batch_size = 16;
try {
    if (ie.GetConfig(deviceName, CONFIG_KEY(DYN_BATCH_ENABLED)).as<std::string>() !=
            PluginConfigParams::YES) {
        reid_config.max_batch_size = 1;
        std::cerr << "[DEBUG] Dynamic batch is not supported for " << deviceName
                  << ". Fall back to batch 1." << std::endl;
    }
}
catch (const InferenceEngine::details::InferenceEngineException& e) {
    reid_config.max_batch_size = 1;
    std::cerr << e.what() << " for " << deviceName << ". Fall back to batch 1." << std::endl;
}
Cnn.cpp --> void CnnBase::InferBatch
void CnnBase::InferBatch(
    const std::vector<cv::Mat>& frames,
    std::function<void(const InferenceEngine::BlobMap&, size_t)> fetch_results) const {
    const size_t batch_size = input_blob_->getTensorDesc().getDims()[0];
    size_t num_imgs = frames.size();
    for (size_t batch_i = 0; batch_i < num_imgs; batch_i += batch_size) {
        const size_t current_batch_size = std::min(batch_size, num_imgs - batch_i);
        for (size_t b = 0; b < current_batch_size; b++) {
            matU8ToBlob<uint8_t>(frames[batch_i + b], input_blob_, b);
        }
        if ((deviceName_.find("MYRIAD") == std::string::npos) &&
            (deviceName_.find("HDDL") == std::string::npos)) {
            infer_request_.SetBatch(current_batch_size);
        }
        infer_request_.Infer();
        fetch_results(outputs_, current_batch_size);
    }
}
I suppose the problem could be the topology of the detection net, but I'm asking in case anyone has had the same problem and solved it.
Thanks.

I am afraid the MYRIAD plugin does not support dynamic batch. Please try an updated version of the demo. You can find it, for example, here: https://github.com/opencv/open_model_zoo/tree/master/demos/pedestrian_tracker_demo
The demo has been updated not to use dynamic batch at all.
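For illustration, here is a minimal sketch of the idea behind the updated demo: keep the network at batch size 1 and run one Infer() per frame, so no dynamic-batch configuration is needed at all. It reuses the members and helpers from the question (input_blob_, infer_request_, outputs_, matU8ToBlob) and is an outline rather than a drop-in replacement:
void CnnBase::InferBatch(
    const std::vector<cv::Mat>& frames,
    std::function<void(const InferenceEngine::BlobMap&, size_t)> fetch_results) const {
    for (const auto& frame : frames) {
        matU8ToBlob<uint8_t>(frame, input_blob_, 0);  // always write into batch slot 0
        infer_request_.Infer();                       // no SetBatch() call, so MYRIAD/HDDL are fine
        fetch_results(outputs_, 1);                   // hand back one result per call
    }
}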

Related

C++ code apparently executing out of sequence

The code
This is a project on a Raspberry Pi using WiringPi. I have the following three member functions of a template class, along with pure virtuals for read() and write(). This base class is then subclassed by more specialized classes that provide the read() and write() functions (sample shown down below):
// IChip.hpp (Root abstract class)
class IChip {
public:
virtual bool test() noexcept = 0;
};
// End IChip.hpp
// IMemory.hpp (class of interest to the question)
template <typename TAddr, typename TWord>
class IMemory: public IChip {
protected:
...
TAddr m_wordCount;
TWord m_dataMax;
// ctor and dtor, and more member fields
public:
virtual TWord read(const TAddr addr) const noexcept = 0;
virtual void write(const TAddr addr, const TWord data) const noexcept = 0;
// accessors and whatnot ...
bool march(bool keepGoing = false) noexcept;
bool checkerboard(bool keepGoing = false) noexcept;
bool test() noexcept final override;
};
// End IMemory.hpp
// IMemory.cpp
template <typename TAddr, typename TWord>
bool IMemory<TAddr, TWord>::march(bool keepGoing) noexcept {
bool result = true;
TAddr i;
TWord r;
const uint64_t totalIter = (m_wordCount * 6) - 1;
uint64_t counter = 0;
std::cout << "Starting MARCH test." << std::endl;
for (i = 0; i < m_wordCount; i++) {
this->write(i, 0);
std::cout << '\r' << counter << " / " << totalIter << std::flush;
counter++;
}
for (i = 0; i < m_wordCount; i++) {
r = this->read(i);
if (r != 0) {
result = false;
if (!keepGoing)
return result;
}
this->write(i, m_dataMax);
std::cout << '\r' << counter << " / " << totalIter << std::flush;
counter++;
}
// 4 more similar loops
std::cout << std::endl;
std::cout << "MARCH test done." << std::endl;
return result;
}
template <typename TAddr, typename TWord>
bool IMemory<TAddr, TWord>::checkerboard(bool keepGoing) noexcept {
bool result = true;
TAddr i;
TWord r;
TWord curWord;
const uint64_t totalIter = (m_wordCount * 4) - 1;
uint64_t counter = 0;
std::cout << "Starting CHECKERBOARD test." << std::endl;
curWord = 0;
for (i = 0; i < m_wordCount; i++) {
this->write(i, curWord);
std::cout << '\r' << counter << " / " << totalIter << std::flush;
counter++;
curWord = curWord == 0 ? m_dataMax : 0;
}
curWord = 0;
for (i = 0; i < m_wordCount; i++) {
r = this->read(i);
if (r != curWord) {
result = false;
if (!keepGoing)
return result;
}
std::cout << '\r' << counter << " / " << totalIter << std::flush;
counter++;
curWord = curWord == 0 ? m_dataMax : 0;
}
// 2 more similar loops ...
std::cout << std::endl;
std::cout << "CHECKERBOARD test done." << std::endl;
return result;
}
template <typename TAddr, typename TWord>
bool IMemory<TAddr, TWord>::test() noexcept {
bool march_result = this->march();
bool checkerboard_result = this->checkerboard();
bool result = march_result && checkerboard_result;
std::cout << "MARCH: " << (march_result ? "Passed" : "Failed") << std::endl;
std::cout << "CHECKERBOARD: " << (checkerboard_result ? "Passed" : "Failed") << std::endl;
return result;
}
// Explicit instantiation
template class IMemory<uint16_t, uint8_t>;
// End IMemory.cpp
// Sample read() and write() from HM62256, a subclass of IMemory<uint16_t, uint8_t>
// These really just bitbang onto / read data from pins with appropriate timings for each chip.
// m_data and m_address are instances of a Bus class, that is just a wrapper around an array of pins, provides bit-banging and reading functionality.
uint8_t HM62256::read(uint16_t addr) const noexcept {
uint8_t result = 0;
m_data->setMode(INPUT);
m_address->write(addr);
digitalWrite(m_CSPin, LOW);
digitalWrite(m_OEPin, LOW);
delayMicroseconds(1);
result = m_data->read();
digitalWrite(m_OEPin, HIGH);
digitalWrite(m_CSPin, HIGH);
delayMicroseconds(1);
return result;
}
void HM62256::write(uint16_t addr, uint8_t data) const noexcept {
digitalWrite(m_OEPin, HIGH);
delayMicroseconds(1);
m_address->write(addr);
delayMicroseconds(1);
m_data->setMode(OUTPUT);
m_data->write(data);
digitalWrite(m_CSPin, LOW);
digitalWrite(m_WEPin, LOW);
delayMicroseconds(1);
digitalWrite(m_WEPin, HIGH);
digitalWrite(m_CSPin, HIGH);
delayMicroseconds(1);
}
// main.cpp
void hm62256_test() {
const uint8_t ADDR_PINS[] = {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18};
const uint8_t DATA_PINS[] = {19, 20, 21, 22, 23, 24, 25, 26};
Chiptools::Memory::HM62256 *device = new Chiptools::Memory::HM62256(ADDR_PINS, DATA_PINS, 2, 3, 27);
device->setup();
bool result = device->test();
std::cout << "Device " << ( result ? "passed all" : "failed some") << " tests." << std::endl;
delete device;
}
int main(int argc, char *argv[]) {
wiringPiSetupGpio();
hm62256_test();
}
The output
Now when I run this, sometimes it works just fine:
Starting MARCH test.
196607 / 196607
MARCH test done.
Starting CHECKERBOARD test.
131071 / 131071
CHECKERBOARD test done.
MARCH: Passed
CHECKERBOARD: Passed
Device passed all tests.
But randomly I will get this output:
Starting MARCH test.
67113 / 196607Starting CHECKERBOARD test.
33604 / 131071MARCH: Failed
CHECKERBOARD: Failed
Device failed some tests.
Toolchain info
gcc 8.3.0 arm-linux / C++14
CMake 3.16.3
No threading.
Compiler & Linker flags:
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fsanitize=address,leak,undefined")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=address,leak,undefined -static-libasan")
The issue and what I tried
I have a couple dozen chips. All of the chips work fine with a TL866ii programmer / tester. This happens with all of these chips. So that rules out the chips as a source of the issue.
Well, at first I thought maybe I'm not flushing the cout stream properly, but AFAIK std::endl does flush the output, so that's not that.
Next, I set a few breakpoints: (A) right before march() returns, (B) right where checkerboard() is called (2nd line in test()), (C) at the 1st line inside checkerboard().
When output was as expected, breakpoints were hit in this order A, B, C.
When output was not as expected, breakpoints were hit in this order B, C, A.
What it looks like is happening is, sometimes checkerboard() is called while march() is still running, causing random GPIO output at which point one or both tests fail.
While I'm looking for a solution to this, I'm more interested in some insight into what is happening. Since my code is not using multithreading, my understanding of the C++ standard is that statements are executed one by one, to completion, before the next statement is executed. I'm aware that some compiler implementations do reorder statements for optimization, but AFAIK that should not affect the semantics of my code. I might be wrong, as that stuff is way over my head.
This might not be an answer, but it's too long for a comment.
Is A at the return statement after the "March test done" line?
I'm basing the following comments based off this output:
Starting MARCH test.
67113 / 196607Starting CHECKERBOARD test.
33604 / 131071MARCH: Failed
CHECKERBOARD: Failed
Device failed some tests.
What appears to be happening is your MARCH test is failing in the 3rd loop, thus returning early (within the loop). Your Checkerboard test then fails within the 2nd loop, and also returns early. If A is at the position I mentioned, then I think it's just luck or a compiler quirk that that breakpoint is hit.
That is to say, logically, I wouldn't expect breakpoint A to be hit at all when the failure occurs, only B and C. I think A being hit at the end is probably down to how the program was compiled, and maybe some odd optimizations. Or perhaps to where in the assembly the debugger is putting the breakpoint; it might just be on a final instruction that's going to be executed anyway. Try putting the breakpoint on the std::cout line before the return and see if it's still hit.
To expand on your comment, this is what I'm seeing in the problem output:
Starting MARCH test.
67113 / 196607 [march() returns early] [checkerboard() starts] Starting CHECKERBOARD test.
33604 / 131071 [checkerboard() returns early] [test() reports results] MARCH: Failed
CHECKERBOARD: Failed
Device failed some tests.
All in all, I think the output will match your expectations if you change your return lines from this:
if (!keepGoing)
return result;
to something like this:
if (!keepGoing) {
std::cout << std::endl;
std::cout << "MARCH test failed." << std::endl;
return result;
}
Which I would expect to produce an output like this:
Starting MARCH test.
67113 / 196607
MARCH test failed
Starting CHECKERBOARD test.
33604 / 131071
CHECKERBOARD test failed
MARCH: Failed
CHECKERBOARD: Failed
Device failed some tests.
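If you want both tests to report an early failure the same way, one option (a sketch with my own naming, not code from the question) is to factor the clean-up into a small helper and call it before every early return:
static void finishProgress(const char* testName) {
    std::cout << std::endl;                                // terminate the '\r'-overwritten line
    std::cout << testName << " test failed." << std::endl;
}
// inside march() / checkerboard():
if (!keepGoing) {
    finishProgress("MARCH");                               // or "CHECKERBOARD"
    return result;
}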

(C++) Fastest way possible for reading in matrix files (arbitrary size)

I'm developing a bioinformatic tool, which requires reading in millions of matrix files (average dimension = (20k, 20k)). They are tab-delimited text files, and they look something like:
0.53 0.11
0.24 0.33
Because the software reads the matrix files one at a time, memory is not an issue, but it's very slow. The following is my current function for reading in a matrix file. I first make a matrix object using a double pointer, then fill in the matrix by looping through the input file.
float** make_matrix(int nrow, int ncol, float val){
    float** M = new float *[nrow];
    for(int i = 0; i < nrow; i++) {
        M[i] = new float[ncol];
        for(int j = 0; j < ncol; j++) {
            M[i][j] = val;
        }
    }
    return M;
}
float** read_matrix(string fname, int dim_1, int dim_2){
    float** K = make_matrix(dim_1, dim_2, 0);
    ifstream ifile(fname);
    for (int i = 0; i < dim_1; ++i) {
        for (int j = 0; j < dim_2; ++j) {
            ifile >> K[i][j];
        }
    }
    ifile.clear();
    ifile.seekg(0, ios::beg);
    return K;
}
Is there a much faster way to do this? From my experience with Python, reading in a matrix file using pandas is much faster than using Python for-loops. Is there a trick like that in C++?
(added)
Thanks so much everyone for all your suggestions and comments!
The fastest way, by far, is to change the way you write those files: write in a binary format, two ints first (width, height), then just dump your values.
You will be able to load it in just three read calls.
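A minimal sketch of that approach (the on-disk layout here, two 32-bit dimensions followed by the raw floats, is an assumption you would standardize yourself; error handling for fopen/fread failures is omitted):
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

void write_matrix_binary(const char* fname, int32_t nrow, int32_t ncol, const float* data) {
    FILE* f = fopen(fname, "wb");
    fwrite(&nrow, sizeof(nrow), 1, f);                                     // header: dimensions
    fwrite(&ncol, sizeof(ncol), 1, f);
    fwrite(data, sizeof(float), static_cast<std::size_t>(nrow) * ncol, f); // then the values
    fclose(f);
}

std::vector<float> read_matrix_binary(const char* fname, int32_t& nrow, int32_t& ncol) {
    FILE* f = fopen(fname, "rb");
    fread(&nrow, sizeof(nrow), 1, f);                                      // read 1: rows
    fread(&ncol, sizeof(ncol), 1, f);                                      // read 2: cols
    std::vector<float> data(static_cast<std::size_t>(nrow) * ncol);
    fread(data.data(), sizeof(float), data.size(), f);                     // read 3: all values at once
    fclose(f);
    return data;
}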
Just for fun, I measured the program posted above (using a 20,000x20,000 ASCII input file, as described) on my Mac Mini (3.2GHz i7 with SSD drive) and found that it took about 102 seconds to parse in the file using the posted code.
Then I wrote a version of the same function that uses the C stdio API (fopen()/fread()/fclose()) and does character-by-character parsing into a 1D float array. This implementation takes about 13 seconds to parse in the file on the same hardware, so it's about 7 times faster.
Both programs were compiled with g++ -O3 test_read_matrix.cpp.
float* faster_read_matrix(string fname, int numRows, int numCols)
{
FILE * fpIn = fopen(fname.c_str(), "r");
if (fpIn == NULL)
{
printf("Couldn't open file [%s] for input!\n", fname.c_str());
return NULL;
}
float* K = new float[numRows*numCols];
// We'll hold the current number in (numberBuf) until we're ready to parse it
char numberBuf[128] = {'\0'};
int numCharsInBuffer = 0;
int curRow = 0, curCol = 0;
while(curRow < numRows)
{
char tempBuf[4*1024]; // an arbitrary size
const size_t bytesRead = fread(tempBuf, 1, sizeof(tempBuf), fpIn);
if (bytesRead == 0)
{
if (ferror(fpIn)) perror("fread");
break;
}
for (size_t i=0; i<bytesRead; i++)
{
const char c = tempBuf[i];
if ((c=='.')||(c=='+')||(c=='-')||(isdigit(c)))
{
if ((numCharsInBuffer+1) < sizeof(numberBuf)) numberBuf[numCharsInBuffer++] = c;
else
{
printf("Error, number string was too long for numberBuf!\n");
}
}
else
{
if (numCharsInBuffer > 0)
{
// Parse the current number-chars we have assembled into (numberBuf) and reset (numberBuf) to empty
numberBuf[numCharsInBuffer] = '\0';
if (curCol < numCols) K[curRow*numCols+curCol] = strtod(numberBuf, NULL);
else
{
printf("Error, too many values in row %i! (Expected %i, found at least %i)\n", curRow, numCols, curCol);
}
curCol++;
}
numCharsInBuffer = 0;
if (c == '\n')
{
curRow++;
curCol = 0;
if (curRow >= numRows) break;
}
}
}
}
fclose(fpIn);
if (curRow != numRows) printf("Warning: I read %i lines in the file, but I expected there would be %i!\n", curRow, numRows);
return K;
}
I am dissatisfied with Jeremy Friesner’s otherwise excellent answer because it:
blames the problem on C++'s I/O system (which is not at fault)
fixes the problem by circumventing the actual I/O problem without being explicit about how big a contributor to speed it is
modifies memory accesses which (may or may not) contribute to speed, and does so in a way that very large matrices may not be supported
The reason his code runs so much faster is that he removes the single most important bottleneck: unoptimized disk access. JWO’s original code can be brought to match with three extra lines of code:
float** read_matrix(std::string fname, int dim_1, int dim_2){
    float** K = make_matrix(dim_1, dim_2, 0);
    constexpr std::size_t buffer_size = 4*1024;      // 1
    char buffer[buffer_size];                        // 2
    std::ifstream ifile(fname);
    ifile.rdbuf()->pubsetbuf(buffer, buffer_size);   // 3
    for (int i = 0; i < dim_1; ++i) {
        for (int j = 0; j < dim_2; ++j) {
            ifile >> K[i][j];
        }
    }
    // ifile.clear();
    // ifile.seekg(0, std::ios::beg);
    return K;
}
The addition exactly replicates Friesner’s design, but using the C++ library capabilities without all the extra programming grief on our end.
You’ll notice I also removed a couple lines at the bottom that should be inconsequential to program function and correctness, but which may cause a minor cumulative time issue as well. (If they are not inconsequential, that is a bug and should be fixed!)
How much difference this all makes depends entirely on the quality of the C++ Standard Library implementation. AFAIK the big three modern C++ compilers (MSVC, GCC, and Clang) all have sufficiently-optimized I/O handling to make the issue moot.
locale
One other thing that may also make a difference is to .imbue() the stream with the default "C" locale, which avoids a lot of special handling for numbers in locale-dependent formats other than what your files use. You only need to bother to do this if you have changed your global locale, though.
ifile.imbue(std::locale::classic());
redundant initialization
Another thing that is killing your time is the effort to zero-initialize the array when you create it. Don’t do that if you don’t need it! (You don’t need it here because you know the total extents and will fill them properly. C++17 and later is nice enough to give you a zero value if the input stream goes bad, too. So you get zeros for unread values either way.)
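For example, a variant of make_matrix that skips the fill (the name is mine; it is safe here only because every element is overwritten by the subsequent reads):
float** make_matrix_uninit(int nrow, int ncol) {
    float** M = new float*[nrow];
    for (int i = 0; i < nrow; i++) {
        M[i] = new float[ncol];   // deliberately left uninitialized
    }
    return M;
}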
dynamic memory block size
Finally, keeping memory accesses to an array of array should not significantly affect speed, but it still might be worth testing if you can change it. This is assuming that the resulting matrix will never be too large for the memory manager to return as a single block (and consequently crash your program).
A common design is to allocate the entire array as a single block, with the requested size plus size for the array of pointers to the rest of the block. This allows you to delete the array in a single delete[] statement. Again, I don’t believe this should be an optimization issue you need to care about until your profiler says so.
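If you do want to try it, here is a sketch of the single-allocation layout described above (row pointers and values in one block, so a single delete[] releases everything; the name is mine):
#include <cstddef>

float** make_matrix_block(std::size_t nrow, std::size_t ncol) {
    const std::size_t header = nrow * sizeof(float*);
    char* block = new char[header + nrow * ncol * sizeof(float)];  // one allocation
    float** rows = reinterpret_cast<float**>(block);               // row pointers at the front
    float* data  = reinterpret_cast<float*>(block + header);       // values right behind them
    for (std::size_t i = 0; i < nrow; ++i) {
        rows[i] = data + i * ncol;
    }
    return rows;   // free with: delete[] reinterpret_cast<char*>(rows);
}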
At the risk of the answer being considered incomplete (no code examples), I would like to add to the other answers additional options how to tackle the problem:
Use a binary format (width,height, values...) as file format and then use file mapping (MapViewOfFile() on Windows, mmap() or so on posix/unix systems).
Then, you can simply point your "matrix structure" pointer to the mapped address space and you are done. And in case, you do something like sparse access to the matrix, it can even save some real IO. If you always do full access to all elements of the matrix (no sparse matrices etc.), it is still quite elegant and probably faster than malloc/read.
Replacements for C++ iostream, which is known to be quite slow and should not be used for performance-critical work:
Have a look at the {fmt} library, which has become quite popular in recent years and claims to be quite fast.
Back in the day, when I did a lot of numerics on large data sets, I always opted for binary files for storage. (That was back when the fastest CPU you could get your hands on was the Pentium 1, with the floating-point bug :).) Back then everything was slower, memory was much more limited (we had MB, not GB, as the unit of RAM in our systems), and all in all nearly 20 years have passed since.
So, as a refresher, I wrote some code to show how much faster you can be than iostream with text files if you do not have extra constraints (such as endianness across different CPUs, etc.).
So far, my little test only has an iostream and a binary file version with a) stdio fread() kind of loading and b) mmap(). Since I sit in front of a debian bullseye computer, my code uses linux specific stuff for the mmap() approach. To run it on Windows, you have to change a few lines of code and some includes.
Edit: I added a save function using {fmt} now as well.
Edit: I added a load function with stdio now as well.
Edit: To reduce memory workload, I reordered the code somewhat
and now only keep 2 matrix instances in memory at any given time.
The program does the following:
Create a 20k x 20k matrix in RAM (in a struct named Matrix_t), filled with random values slowly generated by std::random.
Write the matrix with iostream to a text file.
Write the matrix with stdio to a binary file.
Create a new matrix textMatrix by loading its data from the text file.
Create a new matrix inMemoryMatrix by loading its data from the binary file with a few fread() calls.
mmap() the binary file and use it under the name mappedMatrix.
Compare each of the loaded matrices to the original randomMatrix to see if the round-trip worked.
Here the results I got on my machine after compiling this work of wonder with clang++ -O3 -o fmatio fast-matrix-io.cpp -lfmt:
./fmatio
creating random matrix (20k x 20k) (27.0775seconds)
the first 10 floating values in randomMatrix are:
57970.2 -365700 -986079 44657.8 826968 -506928 668277 398241 -828176 394645
saveMatrixAsText_IOSTREAM()
saving matrix with iostream. (192.749seconds)
saveMatrixAsText_FMT(mat0_fmt.txt)
saving matrix with {fmt}. (34.4932seconds)
saveMatrixAsBinary()
saving matrix into a binary file. (30.7591seconds)
loadMatrixFromText_IOSTREAM()
loading matrix from text file with iostream. (102.074seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix. (0.125328seconds)
loadMatrixFromText_STDIO(mat0_fmt.txt)
loading matrix from text file with stdio. (71.2746seconds)
randomMatrix == textMatrix
comparing randomMatrix with textMatrix (stdio). (0.124684seconds)
loadMatrixFromBinary(mat0.bin)
loading matrix from binary file into memory. (0.495685seconds)
randomMatrix == inMemoryMatrix
comparing randomMatrix with inMemoryMatrix. (0.124206seconds)
mapMatrixFromBinaryFile(mat0.bin)
mapping a view to a matrix in a binary file. (4.5883e-05seconds)
randomMatrix == mappedMatrix
comparing randomMatrix with mappedMatrix. (0.158459seconds)
And here is the code:
#include <cinttypes>
#include <memory>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <chrono>
#include <limits>
#include <iomanip>
// includes for mmap()...
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
// includes for {fmt}...
#include <fmt/core.h>
#include <fmt/os.h>
struct StopWatch {
using Clock = std::chrono::high_resolution_clock;
using TimePoint =
std::chrono::time_point<Clock>;
using Duration =
std::chrono::duration<double>;
void start(const char* description) {
this->description = std::string(description);
tstart = Clock::now();
}
void stop() {
TimePoint tend = Clock::now();
Duration elapsed = tend - tstart;
std::cout << description << " (" << elapsed.count()
<< "seconds)" << std::endl;
}
TimePoint tstart;
std::string description;
};
struct Matrix_t {
uint32_t ncol;
uint32_t nrow;
float values[];
inline uint32_t to_index(uint32_t col, uint32_t row) const {
return ncol * row + col;
}
};
template <class Initializer>
Matrix_t *createMatrix
( uint32_t ncol,
uint32_t nrow,
Initializer initFn
) {
size_t nfloats = ncol*nrow;
size_t nbytes = UINTMAX_C(8) + nfloats * sizeof(float);
Matrix_t * result =
reinterpret_cast<Matrix_t*>(operator new(nbytes));
if (nullptr != result) {
result->ncol = ncol;
result->nrow = nrow;
for (uint32_t row = 0; row < nrow; row++) {
for (uint32_t col = 0; col < ncol; col++) {
result->values[result->to_index(col,row)] =
initFn(ncol,nrow,col,row);
}
}
}
return result;
}
void saveMatrixAsText_IOSTREAM(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_IOSTREAM()" << std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
}
std::ofstream outFile(filePath);
if (outFile) {
outFile << matrix->ncol << " " << matrix->nrow << std::endl;
const auto defaultPrecision = outFile.precision();
outFile.precision
(std::numeric_limits<float>::max_digits10);
for (uint32_t row = 0; row < matrix->nrow; row++) {
for (uint32_t col = 0; col < matrix->ncol; col++) {
outFile << matrix->values[matrix->to_index(col,row)]
<< " ";
}
outFile << std::endl;
}
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
void saveMatrixAsText_FMT(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsText_FMT(" << filePath << ")"
<< std::endl;
if (nullptr == matrix) {
std::cout << "cannot save matrix - no matrix!" << std::endl;
}
auto outFile = fmt::output_file(filePath);
outFile.print("{} {}\n", matrix->ncol, matrix->nrow);
for (uint32_t row = 0; row < matrix->nrow; row++) {
outFile.print("{}", matrix->values[matrix->to_index(0,row)]);
for (uint32_t col = 1; col < matrix->ncol; col++) {
outFile.print(" {}",
matrix->values[matrix->to_index(col,row)]);
}
outFile.print("\n");
}
}
void saveMatrixAsBinary(const char* filePath,
const Matrix_t* matrix) {
std::cout << "saveMatrixAsBinary()" << std::endl;
FILE * outFile = fopen(filePath, "wb");
if (nullptr != outFile) {
fwrite( &matrix->ncol, 4, 1, outFile);
fwrite( &matrix->nrow, 4, 1, outFile);
size_t nfloats = matrix->ncol * matrix->nrow;
fwrite( &matrix->values, sizeof(float), nfloats, outFile);
fclose(outFile);
} else {
std::cout << "could not open " << filePath << " for writing."
<< std::endl;
}
}
Matrix_t* loadMatrixFromText_IOSTREAM(const char* filePath) {
std::cout << "loadMatrixFromText_IOSTREAM()" << std::endl;
std::ifstream inFile(filePath);
if (inFile) {
uint32_t ncol;
uint32_t nrow;
inFile >> ncol;
inFile >> nrow;
uint32_t nfloats = ncol * nrow;
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
inFile >> value;
return value;
};
Matrix_t * matrix = createMatrix( ncol, nrow, loader);
return matrix;
} else {
std::cout << "could not open " << filePath << "for reading."
<< std::endl;
}
return nullptr;
}
Matrix_t* loadMatrixFromText_STDIO(const char* filePath) {
std::cout << "loadMatrixFromText_STDIO(" << filePath << ")"
<< std::endl;
Matrix_t* matrix = nullptr;
FILE * inFile = fopen(filePath, "rt");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fscanf(inFile, "%d %d", &ncol, &nrow);
auto loader =
[&inFile]
(uint32_t , uint32_t , uint32_t , uint32_t )
-> float
{
float value;
fscanf(inFile, "%f", &value);
return value;
};
matrix = createMatrix( ncol, nrow, loader);
fclose(inFile);
} else {
std::cout << "could not open " << filePath << "for reading."
<< std::endl;
}
return matrix;
}
Matrix_t* loadMatrixFromBinary(const char* filePath) {
std::cout << "loadMatrixFromBinary(" << filePath << ")"
<< std::endl;
FILE * inFile = fopen(filePath, "rb");
if (nullptr != inFile) {
uint32_t ncol;
uint32_t nrow;
fread( &ncol, 4, 1, inFile);
fread( &nrow, 4, 1, inFile);
uint32_t nfloats = ncol * nrow;
uint32_t nbytes = nfloats * sizeof(float) + UINT32_C(8);
Matrix_t* matrix =
reinterpret_cast<Matrix_t*>
(operator new (nbytes));
if (nullptr != matrix) {
matrix->ncol = ncol;
matrix->nrow = nrow;
fread( &matrix->values[0], sizeof(float), nfloats, inFile);
} else {
std::cout << "could not find memory for the matrix."
<< std::endl;
}
fclose(inFile); // close the file on both paths
return matrix;
} else {
std::cout << "could not open file "
<< filePath << " for reading." << std::endl;
}
return nullptr;
}
void freeMatrix(Matrix_t* matrix) {
operator delete(matrix);
}
Matrix_t* mapMatrixFromBinaryFile(const char* filePath) {
std::cout << "mapMatrixFromBinaryFile(" << filePath << ")"
<< std::endl;
Matrix_t * matrix = nullptr;
int fd = open( filePath, O_RDONLY);
if (-1 != fd) {
struct stat sb;
if (-1 != fstat(fd, &sb)) {
auto fileSize = sb.st_size;
void* mapped =
mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
if (MAP_FAILED == mapped) {
std::cout << "mmap() failed!" << std::endl;
} else {
matrix = reinterpret_cast<Matrix_t*>(mapped);
}
} else {
std::cout << "fstat() failed!" << std::endl;
}
close(fd);
} else {
std::cout << "open() failed!" << std::endl;
}
return matrix;
}
void unmapMatrix(Matrix_t* matrix) {
if (nullptr == matrix)
return;
size_t nbytes =
UINTMAX_C(8) +
sizeof(float) * matrix->ncol * matrix->nrow;
munmap(matrix, nbytes);
}
bool areMatricesEqual( const Matrix_t* m1, const Matrix_t* m2) {
if (nullptr == m1) return false;
if (nullptr == m2) return false;
if (m1->ncol != m2->ncol) return false;
if (m1->nrow != m2->nrow) return false;
// both exist and have same size...
size_t nfloats = m1->ncol * m1->nrow;
size_t nbytes = nfloats * sizeof(float);
return 0 == memcmp( m1->values, m2->values, nbytes);
}
int main(int argc, const char* argv[]) {
std::random_device rdev;
std::default_random_engine reng(rdev());
std::uniform_real_distribution<> rdist(-1.0E6F, 1.0E6F);
StopWatch sw;
auto randomInitFunction =
[&reng,&rdist]
(uint32_t ncol, uint32_t nrow, uint32_t col, uint32_t row)
-> float
{
return rdist(reng);
};
sw.start("creating random matrix (20k x 20k)");
Matrix_t * randomMatrix =
createMatrix(UINT32_C(20000),
UINT32_C(20000),
randomInitFunction);
sw.stop();
if (nullptr != randomMatrix) {
std::cout
<< "the first 10 floating values in randomMatrix are: "
<< std::endl;
std::cout << randomMatrix->values[0];
for (size_t i = 1; i < 10; i++) {
std::cout << " " << randomMatrix->values[i];
}
std::cout << std::endl;
sw.start("saving matrix with iostream.");
saveMatrixAsText_IOSTREAM("mat0_iostream.txt", randomMatrix);
sw.stop();
sw.start("saving matrix with {fmt}.");
saveMatrixAsText_FMT("mat0_fmt.txt", randomMatrix);
sw.stop();
sw.start("saving matrix into a binary file.");
saveMatrixAsBinary("mat0.bin", randomMatrix);
sw.stop();
sw.start("loading matrix from text file with iostream.");
Matrix_t* textMatrix =
loadMatrixFromText_IOSTREAM("mat0_iostream.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix.");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from text file with stdio.");
textMatrix =
loadMatrixFromText_STDIO("mat0_fmt.txt");
sw.stop();
sw.start("comparing randomMatrix with textMatrix (stdio).");
if (!areMatricesEqual(randomMatrix, textMatrix)) {
std::cout << "randomMatrix != textMatrix!" << std::endl;
} else {
std::cout << "randomMatrix == textMatrix" << std::endl;
}
sw.stop();
freeMatrix(textMatrix);
textMatrix = nullptr;
sw.start("loading matrix from binary file into memory.");
Matrix_t* inMemoryMatrix =
loadMatrixFromBinary("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with inMemoryMatrix.");
if (!areMatricesEqual(randomMatrix, inMemoryMatrix)) {
std::cout << "randomMatrix != inMemoryMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == inMemoryMatrix" << std::endl;
}
sw.stop();
freeMatrix(inMemoryMatrix);
inMemoryMatrix = nullptr;
sw.start("mapping a view to a matrix in a binary file.");
Matrix_t* mappedMatrix =
mapMatrixFromBinaryFile("mat0.bin");
sw.stop();
sw.start("comparing randomMatrix with mappedMatrix.");
if (!areMatricesEqual(randomMatrix, mappedMatrix)) {
std::cout << "randomMatrix != mappedMatrix!"
<< std::endl;
} else {
std::cout << "randomMatrix == mappedMatrix" << std::endl;
}
sw.stop();
unmapMatrix(mappedMatrix);
mappedMatrix = nullptr;
freeMatrix(randomMatrix);
} else {
std::cout << "could not create random matrix!" << std::endl;
}
return 0;
}
Please note, that binary formats where you simply cast to a struct pointer also depend on how the compiler does alignment and padding within structures. In my case, I was lucky and it worked. On other systems, you might have to tweak a little (#pragma pack(4) or something along that line) to make it work.
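One cheap safeguard (my addition, not part of the test program above) is a compile-time check that the header of Matrix_t really occupies exactly the 8 bytes the file format assumes:
static_assert(sizeof(Matrix_t) == 2 * sizeof(uint32_t),
              "unexpected padding in Matrix_t; adjust packing before mmap()ing files");
If the assertion fires on some platform, the on-disk layout and the in-memory struct no longer line up, and a pack pragma or explicit (de)serialization is needed.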

Is my program causing my out-of-memory (killed) error?

I have written my own code using Tensorflow's C API to do inference (= using a trained artificial neural network) within a C++ Fluid Dynamics simulation program. However at some point the computation stops and gives me this error:
mpirun noticed that process rank 10 with PID 0 on node node134 exited on signal 9 (Killed).
I have meanwhile noticed that this is probably happening because there is no memory left: the moment the computation stops, both RAM and swap are fully occupied.
I do not understand why this is the case. The only thing I have changed since the program last ran without error is the code I added to it.
Within the fluid dynamics software I programmed this:
auto t_start_0 = std::chrono::high_resolution_clock::now();
const char* frozenGraphName = "/home/elias/Lr75-57_FPVANN_premix/data/FPV_ANN_tabulated_Standard_500.pb";
const char* inputOperationName = "input_1";
const char* outputOperationName = "dense_2/BiasAdd";
int no_of_inputs = in_mean.size();
int no_of_outputs = out_mean.size();
int cellsAndPatches = (input_f_zeta_PVNorm.size())/no_of_inputs;
std::vector<int64_t> input_dimensions = {cellsAndPatches,no_of_inputs};
std::vector<int64_t> output_dimensions = {cellsAndPatches,no_of_outputs};
Inference* inf = new Inference(frozenGraphName,inputOperationName,outputOperationName,no_of_inputs,no_of_outputs,input_dimensions,output_dimensions,cellsAndPatches);
output_real = inf->doInference(input_f_zeta_PVNorm);
delete inf;
auto t_end_0 = std::chrono::high_resolution_clock::now();
auto total_0 = std::chrono::duration<float, std::milli>(t_end_0 - t_start_0).count();
std::cout << "TOTAL INFERENCE TIME C API: " << total_0 << std::endl;
The constructor of my class Inference looks like this:
Inference::Inference(const char* fgn, const char* iname, const char* oname, int nIn, int nOut, std::vector<int64_t> dimIn,std::vector<int64_t> dimOut, int CP):no_input_sizes(nIn),no_output_sizes(nOut),noCellsPatches(CP)
{
TF_Buffer* graph_def = read_file(fgn);
graph = TF_NewGraph();
status = TF_NewStatus();
TF_ImportGraphDefOptions* graph_opts = TF_NewImportGraphDefOptions();
TF_GraphImportGraphDef(graph, graph_def, graph_opts, status);
if(TF_GetCode(status)!=TF_OK)
{
std::cout << "ERROR: Unable to import graph " << TF_Message(status) << std::endl;
}
num_bytes_in = noCellsPatches*no_input_sizes*sizeof(float);
num_bytes_out = noCellsPatches*no_output_sizes*sizeof(float);
in_dims = dimIn;
out_dims = dimOut;
in_name = strdup(iname);
out_name = strdup(oname);
TF_DeleteImportGraphDefOptions(graph_opts);
TF_DeleteBuffer(graph_def);
}
The doInference-method looks like this:
std::vector<float> Inference::doInference(std::vector<float> inVals)
{
assert((inVals.size()%no_input_sizes)==0);
std::cout << "EFFECTIVE BATCH SIZE: " << inVals.size() << std::endl;
float **normalizedInputs = new float* [noCellsPatches]; // allocate pointers
normalizedInputs[0] = new float [noCellsPatches*no_input_sizes]; // allocate data
// set pointers
for (int i = 1; i < noCellsPatches; ++i) {
normalizedInputs[i] = &normalizedInputs[i-1][no_input_sizes];
}
for(int i=0;i<noCellsPatches;i++)
{
for(int j=0;j<no_input_sizes;j++)
{
normalizedInputs[i][j]=inVals.at(no_input_sizes*i+j);
}
}
const char* iname = in_name;
TF_Operation* input_op = TF_GraphOperationByName(graph,iname); // assure string value is correct by viewing the frozen graph in Tensorboard
TF_Output input = {input_op,0};
inputs = &input;
assert(inputs!=0);
const char* oname = out_name;
TF_Operation* output_op = TF_GraphOperationByName(graph,oname); // assure string value is correct by viewing the frozen graph in Tensorboard
TF_Output output = {output_op,0};
outputs = &output;
assert(outputs!=0);
int64_t in_dims_arr[] = {noCellsPatches,no_input_sizes};
TF_Tensor* input_value = TF_NewTensor(TF_FLOAT,in_dims_arr,2,&normalizedInputs[0][0],num_bytes_in,&Deallocator, 0); // normalizedInputs at Arg 4 before
TF_Tensor* const input_value_const = input_value; // const pointer to TF_Tensor
TF_Tensor* const* input_values = &input_value_const; // pointer to const pointer to TF_Tensor
assert(input_values!=0);
int64_t out_dims_arr[] = {noCellsPatches,no_output_sizes};
TF_Tensor* output_value = TF_AllocateTensor(TF_FLOAT, out_dims_arr, 2, num_bytes_out); // pointer to TF_Tensor //Arg2!
TF_Tensor** output_values = &output_value; // pointer to pointer to TF_Tensor
assert(output_values!=0);
std::cout << "Running session..." << std::endl;
TF_SessionOptions* sess_opts = TF_NewSessionOptions();
int limitCPUThreads = 1; // if you want to limit the inference to a number of CPU Threads you can do that here
int limitNumberOfCPUs = 0;
if((limitCPUThreads!=0)&&(limitNumberOfCPUs!=0))
{
std::cout << "ERROR! You cannnot limit both number of CPUs and number of threads!" << std::endl;
}
if((limitCPUThreads!=0)&&(limitNumberOfCPUs==0))
{
std::cout << "WARNING! You are limiting CPU inference to " << limitCPUThreads << " CPU Thread(s) / Core(s)!" << std::endl;
uint8_t intra_op_parallelism_threads = limitCPUThreads; // for operations that can be parallelized internally, such as matrix multiplication
uint8_t inter_op_parallelism_threads = limitCPUThreads; // for operationss that are independent in your TensorFlow graph because there is no directed path between them in the dataflow graph
uint8_t config[]={0x10,intra_op_parallelism_threads,0x28,inter_op_parallelism_threads};
TF_SetConfig(sess_opts,config,sizeof(config),status);
if (TF_GetCode(status) != TF_OK)
{
printf("ERROR: %s\n", TF_Message(status));
}
}
if((limitCPUThreads==0)&&(limitNumberOfCPUs!=0)) // SOMETHING HERE STILL DOES NOT SEEM RIGHT!
{
std::cout << "WARNING! You are limiting CPU inference to " << limitNumberOfCPUs << " CPU(s)!" << std::endl;
uint8_t numberOfCPUs = limitNumberOfCPUs;
uint8_t config[] = {0xa, 0x7, 0xa, 0x3, 0x43, 0x50, 0x55, 0x10, 0x01};
std::cout << config << std::endl;
TF_SetConfig(sess_opts,config,sizeof(config),status);
if (TF_GetCode(status) != TF_OK)
{
printf("ERROR: %s\n", TF_Message(status));
}
}
TF_Session* session = TF_NewSession(graph, sess_opts, status);
assert(TF_GetCode(status)==TF_OK);
auto t_start = std::chrono::high_resolution_clock::now();
TF_SessionRun(session,nullptr,inputs,input_values,1,outputs,output_values,1,nullptr,0,nullptr,status);
auto t_end = std::chrono::high_resolution_clock::now();
auto total = std::chrono::duration<float, std::milli>(t_end - t_start).count();
std::cout << "time required for inference: " << total << std::endl;
float* out_vals = static_cast<float*>(TF_TensorData(*output_values));
std::vector<float> results(no_output_sizes*noCellsPatches,0);
for(int i=0;i<noCellsPatches;i++)
{
for(int j=0;j<no_output_sizes;j++)
{
results.at(i*no_output_sizes+j) = *out_vals;
out_vals++;
}
}
std::cout << "Successfully ran session!" << std::endl;
TF_CloseSession(session,status);
TF_DeleteSession(session,status);
TF_DeleteSessionOptions(sess_opts);
delete [] normalizedInputs[0];
delete [] normalizedInputs;
return results;
}
Is there some kind of memory leak that I did not recognize? Or what could be the reason it works for some hundred timesteps and then crashes?
Thanks in advance!

How to insert and retrieve a multidimensional array with static data in it into a database using the **insert** function of mongo-cxx-driver?

I tried the conventional way of passing an array to a wrapper function in which I use insert_one to insert the data in a for loop. There are no build issues, but at runtime I hit this error: Microsoft C++ exception: mongocxx::v_noabi::bulk_write_exception at memory location 0x000000B26C12DF30. Here is my source code.
int main(void) {
    char EUI[][20] = { "10205E3710014240", "10205e37100142cc" ,"10205E6910001E58", "10205E371001426C" };
    char IP[][15] = { "192.168.85.117" , "192.168.85.114", "192.168.85.186", "192.168.85.168" };
    int i = 4;
    push_data(IP, EUI, i);
    while (1);
}
void push_data(char IP[][15], char EUI[][20], int count)
{
    mongocxx::instance inst{};
    mongocxx::client conn{ mongocxx::uri{} };
    auto collection = conn["new"]["collection"];
    int a;
    builder::stream::document builder{};
    auto in_array = builder << "subdocs" << builder::stream::open_array;
    for (a = 0; a < count; a++) {
        in_array = in_array << builder::stream::open_document << EUI[a] << IP[a]
                            << builder::stream::close_document;
    }
    auto after_array = in_array << builder::stream::close_array;
    bsoncxx::document::value doc = after_array << builder::stream::finalize;
    bsoncxx::document::view view = doc.view();
    for (a = 0; a < count; a++) {
        collection.insert_one(doc.view());
    }
    auto cursor = collection.find({});
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}
Almost certainly, an exception has been thrown from collection.insert_one(doc.view());. You should catch that exception (using try and catch) and inspect its contents, which should tell you more about what is going wrong.
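A minimal sketch of that, wrapped around the insert loop from the question (the extra include needed is mongocxx/exception/bulk_write_exception.hpp):
for (a = 0; a < count; a++) {
    try {
        collection.insert_one(doc.view());
    }
    catch (const mongocxx::bulk_write_exception& e) {
        std::cerr << "insert_one failed: " << e.what() << std::endl;
    }
}
Inspecting what() (and the raw server error document, if you need more detail) should show why the write was rejected.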

OpenDDS - Create multiple topics from single IDL structure

In my exercise with OpenDDS I would like to create multiple topics from a single IDL structure. Is that possible? If not, please let me know how to do it.
I do it as shown below; please correct me if it is not the right way to do it.
The sample I use is available at OpenDDS-3.12/examples/DCPS/IntroductionToOpenDDS
The IDL is as follows,
StockQuoter.idl
---------------
module StockQuoter
{
#pragma DCPS_DATA_TYPE "StockQuoter::Quote"
#pragma DCPS_DATA_KEY "StockQuoter::Quote ticker"
struct Quote {
string ticker;
string exchange;
string full_name;
double value;
string data;
TimeBase::TimeT timestamp;
};
}
publisher.cpp
// Create TOPICS and TYPES Vector
std::stringstream ss;
for(unsigned int idx = 0; idx < 100; ++idx)
{
ss << (idx+1);
TOPICS.push_back("TOPIC" + std::string(ss.str()));
TYPES.push_back("TYPE" + std::string(ss.str()));
ss.clear();
ss.str(std::string());
}
// Register
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_servent.push_back(new StockQuoter::QuoteTypeSupportImpl());
if (DDS::RETCODE_OK != vec_quote_servent[idx]->register_type(participant.in (), TYPES[idx].c_str()))
{
cerr << "register_type for " << TYPES[idx] << " failed." << endl;
ACE_OS::exit(1);
}
}
// Create a topics
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_topic.push_back( participant->create_topic (TOPICS[idx].c_str(),
TYPES[idx].c_str(),
default_topic_qos,
DDS::TopicListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK));
if (CORBA::is_nil (vec_quote_topic[idx].in ())) {
cerr << "create_topic for " << TOPICS[idx] << " failed." << endl;
ACE_OS::exit(1);
}
}
// Create DataWriters
for( unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote_base_dw.push_back( pub->create_datawriter(vec_quote_topic[idx].in (),
dw_default_qos,
DDS::DataWriterListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK) );
if (CORBA::is_nil (vec_quote_base_dw[idx].in ())) {
cerr << "create_datawriter for " << TOPICS[idx] << " failed." << endl;
ACE_OS::exit(1);
}
vec_quote_dw.push_back( StockQuoter::QuoteDataWriter::_narrow(vec_quote_base_dw[idx].in()) );
if (CORBA::is_nil (vec_quote_dw[idx].in ())) {
cerr << TOPICS[idx] << " could not be narrowed"<< endl;
ACE_OS::exit(1);
}
}
// Create handle
for( unsigned int idx = 0; idx < 100 ; ++idx )
{
{
StockQuoter::Quote topic2;
topic2.ticker = CORBA::string_dup(TOPICS[idx].c_str());
vec_topic_handle.push_back(vec_quote_dw[idx]->register_instance(topic2));
}
}
// Publish data
StockQuoter::Quote vec_quote;
vec_quote.exchange = STOCK_EXCHANGE_NAME;
vec_quote.ticker = CORBA::string_dup("VEC_TOPIC");
vec_quote.full_name = CORBA::string_dup("TOPIC Receipts");
vec_quote.value = 1600.0 + 10.0*i;
vec_quote.timestamp = get_timestamp();
for(unsigned int idx = 0; idx < 100; ++idx )
{
vec_quote.value += idx + 10;
cout << "Publishing " << TOPICS[idx] << " : " << vec_quote.value <<endl;
ret = vec_quote_dw[idx]->write(vec_quote, vec_topic_handle[idx]);
if (ret != DDS::RETCODE_OK) {
ACE_ERROR ((LM_ERROR, ACE_TEXT("(%P|%t) ERROR: TOPIC2 write returned %d.\n"), ret));
}
}
Ah, now I get the point of your question. You can define different topic types either in one IDL file per type, or all in one file. If you define more than one topic type in an IDL file, type support is generated for each type. Let me describe this more precisely with the same example you used. The IDL file for the IntroductionToOpenDDS example looks as follows:
#include "orbsvcs/TimeBase.idl"
module StockQuoter
{
#pragma DCPS_DATA_TYPE "StockQuoter::Quote"
#pragma DCPS_DATA_KEY "StockQuoter::Quote ticker"
struct Quote {
string ticker;
string exchange;
string full_name;
double value;
TimeBase::TimeT timestamp;
};
#pragma DCPS_DATA_TYPE "StockQuoter::ExchangeEvent"
#pragma DCPS_DATA_KEY "StockQuoter::ExchangeEvent exchange"
enum ExchangeEventType { TRADING_OPENED,
TRADING_CLOSED,
TRADING_SUSPENDED,
TRADING_RESUMED };
struct ExchangeEvent {
string exchange;
ExchangeEventType event;
TimeBase::TimeT timestamp;
};
};
As you can see, two types are defined: Quote and ExchangeEvent. When this IDL file gets compiled, type support for both Quote and ExchangeEvent is generated.
You already use the type support for Quote in this line (QuoteTypeSupportImpl):
vec_quote_servent.push_back(new StockQuoter::QuoteTypeSupportImpl());
The same kind of type support is generated for ExchangeEvent: you will find a class called StockQuoter::ExchangeEventTypeSupportImpl. Simply use it to register the type and create a topic of type ExchangeEvent.
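For example, a sketch that follows the same pattern as the Quote code in the question (the topic and type names here are placeholders):
StockQuoter::ExchangeEventTypeSupport_var event_servant =
    new StockQuoter::ExchangeEventTypeSupportImpl();
if (DDS::RETCODE_OK != event_servant->register_type(participant.in(), "EXCHANGE_EVENT_TYPE")) {
    cerr << "register_type for ExchangeEvent failed." << endl;
    ACE_OS::exit(1);
}
DDS::Topic_var event_topic =
    participant->create_topic("EXCHANGE_EVENT_TOPIC",
                              "EXCHANGE_EVENT_TYPE",
                              default_topic_qos,
                              DDS::TopicListener::_nil(),
                              ::OpenDDS::DCPS::DEFAULT_STATUS_MASK);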
I hope this helps. If more details are needed, feel free to ask.
You can create as many topics as you wish from a single IDL file. You are already doing it with this line:
participant->create_topic (TOPICS[idx].c_str(),
TYPES[idx].c_str(),
default_topic_qos,
DDS::TopicListener::_nil(),
::OpenDDS::DCPS::DEFAULT_STATUS_MASK);
However, each topic you created has the same type. You can also create different types for your topics if you have to.