The low performance for C++ thread pool


int step = 100;
for (int i = 0; i < jobs.size(); i += step) {
    thread_pool* osgbpool = new thread_pool(mt);
    for (size_t k = 0; k < step; ++k) {
        if (i + k == jobs.size())
        {
            break;
        }
        else {
            auto it = jobs[i + k];
            osgbpool->push_task(processProjectMainNoTrans, it, out_path);
        }
    }
    osgbpool->wait_for_tasks();
    delete osgbpool;
}
The code above is my C++ code. Some explanation: jobs is my job queue and osgbpool is my thread pool. As time goes by, CPU usage drops and the application becomes slower. Why does this happen?
Here are my threadpool.h and threadpool.cpp:
#pragma once
/**
* @file thread_pool.hpp
* @author Barak Shoshany (baraksh@gmail.com) (http://baraksh.com)
* @version 2.0.0
* @date 2021-08-14
* @copyright Copyright (c) 2021 Barak Shoshany. Licensed under the MIT license. If you use this library in published research, please cite it as follows:
* - Barak Shoshany, "A C++17 Thread Pool for High-Performance Scientific Computing", doi:10.5281/zenodo.4742687, arXiv:2105.00613 (May 2021)
*
* @brief A C++17 thread pool for high-performance scientific computing.
* @details A modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight and self-contained class, and does not have any dependencies other than the C++17 standard library, thus allowing a great degree of portability. In particular, this implementation does not utilize OpenMP or any other high-level multithreading APIs, and thus gives the programmer precise low-level control over the details of the parallelization, which permits more robust optimizations. The thread pool was extensively tested on both AMD and Intel CPUs with up to 40 cores and 80 threads. Other features include automatic generation of futures and easy parallelization of loops. Two helper classes enable synchronizing printing to an output stream by different threads and measuring execution time for benchmarking purposes. Please visit the GitHub repository at https://github.com/bshoshany/thread-pool for documentation and updates, or to submit feature requests and bug reports.
*/
#define THREAD_POOL_VERSION "v2.0.0 (2021-08-14)"
#include <atomic> // std::atomic
#include <chrono> // std::chrono
#include <cstdint> // std::int_fast64_t, std::uint_fast32_t
#include <functional> // std::function
#include <future> // std::future, std::promise
#include <iostream> // std::cout, std::ostream
#include <memory> // std::shared_ptr, std::unique_ptr
#include <mutex> // std::mutex, std::scoped_lock
#include <queue> // std::queue
#include <thread> // std::this_thread, std::thread
#include <type_traits> // std::common_type_t, std::decay_t, std::enable_if_t, std::is_void_v, std::invoke_result_t
#include <utility> // std::move
// ============================================================================================= //
// Begin class thread_pool //
/**
* #brief A C++17 thread pool class. The user submits tasks to be executed into a queue. Whenever a thread becomes available, it pops a task from the queue and executes it. Each task is automatically assigned a future, which can be used to wait for the task to finish executing and/or obtain its eventual return value.
*/
class thread_pool
{
typedef std::uint_fast32_t ui32;
typedef std::uint_fast64_t ui64;
public:
// ============================
// Constructors and destructors
// ============================
/**
* #brief Construct a new thread pool.
*
* #param _thread_count The number of threads to use. The default value is the total number of hardware threads available, as reported by the implementation. With a hyperthreaded CPU, this will be twice the number of CPU cores. If the argument is zero, the default value will be used instead.
*/
thread_pool(const ui32 &_thread_count = std::thread::hardware_concurrency())
: thread_count(_thread_count ? _thread_count : std::thread::hardware_concurrency()), threads(new std::thread[_thread_count ? _thread_count : std::thread::hardware_concurrency()])
{
create_threads();
}
/**
* #brief Destruct the thread pool. Waits for all tasks to complete, then destroys all threads. Note that if the variable paused is set to true, then any tasks still in the queue will never be executed.
*/
~thread_pool()
{
wait_for_tasks();
running = false;
destroy_threads();
}
// =======================
// Public member functions
// =======================
/**
* #brief Get the number of tasks currently waiting in the queue to be executed by the threads.
*
* #return The number of queued tasks.
*/
ui64 get_tasks_queued() const
{
const std::scoped_lock lock(queue_mutex);
return tasks.size();
}
/**
* #brief Get the number of tasks currently being executed by the threads.
*
* #return The number of running tasks.
*/
ui32 get_tasks_running() const
{
return tasks_total - (ui32)get_tasks_queued();
}
/**
* #brief Get the total number of unfinished tasks - either still in the queue, or running in a thread.
*
* #return The total number of tasks.
*/
ui32 get_tasks_total() const
{
return tasks_total;
}
/**
* #brief Get the number of threads in the pool.
*
* #return The number of threads.
*/
ui32 get_thread_count() const
{
return thread_count;
}
/**
* #brief Parallelize a loop by splitting it into blocks, submitting each block separately to the thread pool, and waiting for all blocks to finish executing. The user supplies a loop function, which will be called once per block and should iterate over the block's range.
*
* #tparam T1 The type of the first index in the loop. Should be a signed or unsigned integer.
* #tparam T2 The type of the index after the last index in the loop. Should be a signed or unsigned integer. If T1 is not the same as T2, a common type will be automatically inferred.
* #tparam F The type of the function to loop through.
* #param first_index The first index in the loop.
* #param index_after_last The index after the last index in the loop. The loop will iterate from first_index to (index_after_last - 1) inclusive. In other words, it will be equivalent to "for (T i = first_index; i < index_after_last; i++)". Note that if first_index == index_after_last, the function will terminate without doing anything.
* #param loop The function to loop through. Will be called once per block. Should take exactly two arguments: the first index in the block and the index after the last index in the block. loop(start, end) should typically involve a loop of the form "for (T i = start; i < end; i++)".
* #param num_blocks The maximum number of blocks to split the loop into. The default is to use the number of threads in the pool.
*/
template <typename T1, typename T2, typename F>
void parallelize_loop(const T1 &first_index, const T2 &index_after_last, const F &loop, ui32 num_blocks = 0)
{
typedef std::common_type_t<T1, T2> T;
T the_first_index = (T)first_index;
T last_index = (T)index_after_last;
if (the_first_index == last_index)
return;
if (last_index < the_first_index)
{
T temp = last_index;
last_index = the_first_index;
the_first_index = temp;
}
last_index--;
if (num_blocks == 0)
num_blocks = thread_count;
ui64 total_size = (ui64)(last_index - the_first_index + 1);
ui64 block_size = (ui64)(total_size / num_blocks);
if (block_size == 0)
{
block_size = 1;
num_blocks = (ui32)total_size > 1 ? (ui32)total_size : 1;
}
std::atomic<ui32> blocks_running = 0;
for (ui32 t = 0; t < num_blocks; t++)
{
T start = ((T)(t * block_size) + the_first_index);
T end = (t == num_blocks - 1) ? last_index + 1 : ((T)((t + 1) * block_size) + the_first_index);
blocks_running++;
push_task([start, end, &loop, &blocks_running]
{
loop(start, end);
blocks_running--;
});
}
while (blocks_running != 0)
{
sleep_or_yield();
}
}
/**
* #brief Push a function with no arguments or return value into the task queue.
*
* #tparam F The type of the function.
* #param task The function to push.
*/
template <typename F>
void push_task(const F &task)
{
tasks_total++;
{
const std::scoped_lock lock(queue_mutex);
tasks.push(std::function<void()>(task));
}
}
/**
* #brief Push a function with arguments, but no return value, into the task queue.
* #details The function is wrapped inside a lambda in order to hide the arguments, as the tasks in the queue must be of type std::function<void()>, so they cannot have any arguments or return value. If no arguments are provided, the other overload will be used, in order to avoid the (slight) overhead of using a lambda.
*
* #tparam F The type of the function.
* #tparam A The types of the arguments.
* #param task The function to push.
* #param args The arguments to pass to the function.
*/
template <typename F, typename... A>
void push_task(const F &task, const A &...args)
{
push_task([task, args...]
{ task(args...); });
}
/**
* #brief Reset the number of threads in the pool. Waits for all currently running tasks to be completed, then destroys all threads in the pool and creates a new thread pool with the new number of threads. Any tasks that were waiting in the queue before the pool was reset will then be executed by the new threads. If the pool was paused before resetting it, the new pool will be paused as well.
*
* #param _thread_count The number of threads to use. The default value is the total number of hardware threads available, as reported by the implementation. With a hyperthreaded CPU, this will be twice the number of CPU cores. If the argument is zero, the default value will be used instead.
*/
void reset(const ui32 &_thread_count = std::thread::hardware_concurrency())
{
bool was_paused = paused;
paused = true;
wait_for_tasks();
running = false;
destroy_threads();
thread_count = _thread_count ? _thread_count : std::thread::hardware_concurrency();
threads.reset(new std::thread[thread_count]);
paused = was_paused;
running = true;
create_threads();
}
/**
* #brief Submit a function with zero or more arguments and no return value into the task queue, and get an std::future<bool> that will be set to true upon completion of the task.
*
* #tparam F The type of the function.
* #tparam A The types of the zero or more arguments to pass to the function.
* #param task The function to submit.
* #param args The zero or more arguments to pass to the function.
* #return A future to be used later to check if the function has finished its execution.
*/
template <typename F, typename... A, typename = std::enable_if_t<std::is_void_v<std::invoke_result_t<std::decay_t<F>, std::decay_t<A>...>>>>
std::future<bool> submit(const F &task, const A &...args)
{
std::shared_ptr<std::promise<bool>> task_promise(new std::promise<bool>);
std::future<bool> future = task_promise->get_future();
push_task([task, args..., task_promise]
{
try
{
task(args...);
task_promise->set_value(true);
}
catch (...)
{
try
{
task_promise->set_exception(std::current_exception());
}
catch (...)
{
}
}
});
return future;
}
/**
* #brief Submit a function with zero or more arguments and a return value into the task queue, and get a future for its eventual returned value.
*
* #tparam F The type of the function.
* #tparam A The types of the zero or more arguments to pass to the function.
* #tparam R The return type of the function.
* #param task The function to submit.
* #param args The zero or more arguments to pass to the function.
* #return A future to be used later to obtain the function's returned value, waiting for it to finish its execution if needed.
*/
template <typename F, typename... A, typename R = std::invoke_result_t<std::decay_t<F>, std::decay_t<A>...>, typename = std::enable_if_t<!std::is_void_v<R>>>
std::future<R> submit(const F &task, const A &...args)
{
std::shared_ptr<std::promise<R>> task_promise(new std::promise<R>);
std::future<R> future = task_promise->get_future();
push_task([task, args..., task_promise]
{
try
{
task_promise->set_value(task(args...));
}
catch (...)
{
try
{
task_promise->set_exception(std::current_exception());
}
catch (...)
{
}
}
});
return future;
}
/**
* #brief Wait for tasks to be completed. Normally, this function waits for all tasks, both those that are currently running in the threads and those that are still waiting in the queue. However, if the variable paused is set to true, this function only waits for the currently running tasks (otherwise it would wait forever). To wait for a specific task, use submit() instead, and call the wait() member function of the generated future.
*/
void wait_for_tasks()
{
while (true)
{
if (!paused)
{
if (tasks_total == 0)
break;
}
else
{
if (get_tasks_running() == 0)
break;
}
sleep_or_yield();
}
}
// ===========
// Public data
// ===========
/**
* #brief An atomic variable indicating to the workers to pause. When set to true, the workers temporarily stop popping new tasks out of the queue, although any tasks already executed will keep running until they are done. Set to false again to resume popping tasks.
*/
std::atomic<bool> paused = false;
/**
* #brief The duration, in microseconds, that the worker function should sleep for when it cannot find any tasks in the queue. If set to 0, then instead of sleeping, the worker function will execute std::this_thread::yield() if there are no tasks in the queue. The default value is 1000.
*/
ui32 sleep_duration = 1000;
private:
// ========================
// Private member functions
// ========================
/**
* #brief Create the threads in the pool and assign a worker to each thread.
*/
void create_threads()
{
for (ui32 i = 0; i < thread_count; i++)
{
threads[i] = std::thread(&thread_pool::worker, this);
}
}
/**
* #brief Destroy the threads in the pool by joining them.
*/
void destroy_threads()
{
for (ui32 i = 0; i < thread_count; i++)
{
threads[i].join();
}
}
/**
* #brief Try to pop a new task out of the queue.
*
* #param task A reference to the task. Will be populated with a function if the queue is not empty.
* #return true if a task was found, false if the queue is empty.
*/
bool pop_task(std::function<void()> &task)
{
const std::scoped_lock lock(queue_mutex);
if (tasks.empty())
return false;
else
{
task = std::move(tasks.front());
tasks.pop();
return true;
}
}
/**
* #brief Sleep for sleep_duration microseconds. If that variable is set to zero, yield instead.
*
*/
void sleep_or_yield()
{
if (sleep_duration)
std::this_thread::sleep_for(std::chrono::microseconds(sleep_duration));
else
std::this_thread::yield();
}
/**
* #brief A worker function to be assigned to each thread in the pool. Continuously pops tasks out of the queue and executes them, as long as the atomic variable running is set to true.
*/
void worker()
{
while (running)
{
std::function<void()> task;
if (!paused && pop_task(task))
{
task();
tasks_total--;
}
else
{
sleep_or_yield();
}
}
}
// ============
// Private data
// ============
/**
* #brief A mutex to synchronize access to the task queue by different threads.
*/
mutable std::mutex queue_mutex = {};
/**
* #brief An atomic variable indicating to the workers to keep running. When set to false, the workers permanently stop working.
*/
std::atomic<bool> running = true;
/**
* #brief A queue of tasks to be executed by the threads.
*/
std::queue<std::function<void()>> tasks = {};
/**
* #brief The number of threads in the pool.
*/
ui32 thread_count;
/**
* #brief A smart pointer to manage the memory allocated for the threads.
*/
std::unique_ptr<std::thread[]> threads;
/**
* #brief An atomic variable to keep track of the total number of unfinished tasks - either still in the queue, or running in a thread.
*/
std::atomic<ui32> tasks_total = 0;
};
// End class thread_pool //
// ============================================================================================= //
// ============================================================================================= //
// Begin class synced_stream //
/**
* #brief A helper class to synchronize printing to an output stream by different threads.
*/
class synced_stream
{
public:
/**
* #brief Construct a new synced stream.
*
* #param _out_stream The output stream to print to. The default value is std::cout.
*/
synced_stream(std::ostream &_out_stream = std::cout)
: out_stream(_out_stream) {};
/**
* #brief Print any number of items into the output stream. Ensures that no other threads print to this stream simultaneously, as long as they all exclusively use this synced_stream object to print.
*
* #tparam T The types of the items
* #param items The items to print.
*/
template <typename... T>
void print(const T &...items)
{
const std::scoped_lock lock(stream_mutex);
(out_stream << ... << items);
}
/**
* #brief Print any number of items into the output stream, followed by a newline character. Ensures that no other threads print to this stream simultaneously, as long as they all exclusively use this synced_stream object to print.
*
* #tparam T The types of the items
* #param items The items to print.
*/
template <typename... T>
void println(const T &...items)
{
print(items..., '\n');
}
private:
/**
* #brief A mutex to synchronize printing.
*/
mutable std::mutex stream_mutex = {};
/**
* #brief The output stream to print to.
*/
std::ostream &out_stream;
};
// End class synced_stream //
// ============================================================================================= //
// ============================================================================================= //
// Begin class timer //
/**
* #brief A helper class to measure execution time for benchmarking purposes.
*/
class timer
{
typedef std::int_fast64_t i64;
public:
/**
* #brief Start (or restart) measuring time.
*/
void start()
{
start_time = std::chrono::steady_clock::now();
}
/**
* #brief Stop measuring time and store the elapsed time since start().
*/
void stop()
{
elapsed_time = std::chrono::steady_clock::now() - start_time;
}
/**
* #brief Get the number of milliseconds that have elapsed between start() and stop().
*
* #return The number of milliseconds.
*/
i64 ms() const
{
return (std::chrono::duration_cast<std::chrono::milliseconds>(elapsed_time)).count();
}
private:
/**
* #brief The time point when measuring started.
*/
std::chrono::time_point<std::chrono::steady_clock> start_time = std::chrono::steady_clock::now();
/**
* #brief The duration that has elapsed between start() and stop().
*/
std::chrono::duration<double> elapsed_time = std::chrono::duration<double>::zero();
};
// End class timer //
// ============================================================================================= //

I'm the author of the thread pool library you are using.
First of all, your question implicitly presents my work as your own. Please give credit when credit is due...
Second, you seem to be creating a new thread_pool object and then deleting it in every iteration of the for loop. That's not how you're supposed to use the thread pool, and indeed it defeats the whole purpose of a thread pool.
The point of the thread pool is to avoid the overhead of creating and destroying a new thread for each task, but the way you're using it now, you are forcing your program to create and destroy all the threads all over again in each iteration of the loop, which is most likely the reason your program is slow.
Instead, create the thread_pool object only once when the program starts, and then use the same object throughout the program by submitting jobs to it.
Also, there is absolutely no reason to use manual memory allocation (new and delete) here. The thread_pool object itself is small, and you only need to create one such object, so it takes a negligible amount of memory on the stack, and you do not need to allocate memory for it on the heap. Furthermore, manual memory allocation in C++ can easily lead to memory leaks if not used correctly. Instead, just create the object as usual (e.g. thread_pool pool).
I suggest that you check out the documentation for the thread pool library for examples of how to use it correctly.
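As a rough illustration of the advice above, here is a minimal sketch of the batch loop with a single pool that lives for the whole run. To keep it self-contained, mini_pool is a hypothetical stand-in that only mimics the two calls used in the question (push_task and wait_for_tasks); it is not the library's implementation. run_all_jobs and the integer jobs are likewise assumptions made for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's thread_pool, reduced to the two
// calls used in the question. Not the library's implementation.
class mini_pool {
public:
    explicit mini_pool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~mini_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void push_task(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(f));
            ++pending_;
        }
        wake_.notify_one();
    }
    void wait_for_tasks() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            {
                std::lock_guard<std::mutex> lock(m_);
                if (--pending_ == 0) idle_.notify_all();
            }
        }
    }
    std::mutex m_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Runs every job in batches of `step` on ONE pool; returns how many jobs ran.
std::size_t run_all_jobs(const std::vector<int>& jobs, unsigned thread_count,
                         std::size_t step = 100) {
    std::atomic<std::size_t> done{0};
    mini_pool pool(thread_count);  // created once, on the stack, outside the loop
    for (std::size_t i = 0; i < jobs.size(); i += step) {
        const std::size_t end = std::min(i + step, jobs.size());
        for (std::size_t k = i; k < end; ++k)
            pool.push_task([&done, job = jobs[k]] { (void)job; ++done; });
        pool.wait_for_tasks();  // same per-batch barrier as in the question
    }
    return done.load();
}
```

The only structural change from the loop in the question is where the pool is constructed: the threads are started once and reused by every batch, and the pool is destroyed automatically when run_all_jobs returns.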

A few things
If the number of threads is greater than or equal to the number of cores and all threads are doing non-blocking work, then of course your computer is going to slow down. Consider having the thread pool use hardware_concurrency() - 1 threads to save some cycles for the rest of the application or computer.
Your sleep_or_yield function is inefficient. Learn how to use std::condition_variable with a mutex and a lock, so that a thread sleeps until a condition changes instead of waking up periodically to poll.
I'm not sure if this is the root issue of your performance problems, but I think it will help.
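To make the condition-variable suggestion concrete, here is a minimal sketch of a queue whose consumer blocks until there is work, instead of sleeping and polling. blocking_queue and sum_on_worker are hypothetical names made up for this example; they are not part of the library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// Hypothetical queue: pop() sleeps on a condition variable rather than polling.
class blocking_queue {
public:
    void push(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_);  // pushing after close() is a usage error
            q_.push(v);
        }
        ready_.notify_one();  // wake one waiting consumer
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        ready_.notify_all();  // wake everyone so they can observe the close
    }
    // Blocks until an item arrives, or returns nullopt once the queue is
    // closed and drained. The predicate guards against spurious wake-ups.
    std::optional<int> pop() {
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        int v = q_.front();
        q_.pop();
        return v;
    }

private:
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<int> q_;
    bool closed_ = false;
};

// Pushes 1..n from the calling thread and sums them on a consumer thread.
long sum_on_worker(int n) {
    blocking_queue q;
    long sum = 0;
    std::thread consumer([&] {
        while (std::optional<int> v = q.pop()) sum += *v;
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    q.close();
    consumer.join();  // join orders the consumer's writes before the read below
    return sum;
}
```

While the queue is empty the consumer uses no CPU, and it is woken as soon as push or close is called, so there is no fixed sleep interval to tune.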

Related

FreeRTOS task mutually exclusive execution

I have a FreeRTOS task, generator_task, that generates a random number once a second and stores it in a buffer (an int32_t, for simplicity).
Some of the generated values are magic and some are not. By design, generator_task cannot check whether a generated value is magic.
I have two identical FreeRTOS tasks, magic_printer_task, each of which detects a button press, calls int32_t get_magic_value(), and prints "ButtonX pressed, magic value is %d". The function get_magic_value() must be blocking and very simple to use.
get_magic_value() is allowed to check whether a value is magic. By design, a call to get_magic_value() from one task must monopolize reading the values generated by generator_task until it gets a value that is magic. From the start of the call until generator_task generates a magic value, get_magic_value() called from (for example) task1 must not allow any other task to read generated values, so other tasks wait until task1 gets a magic value. If get_magic_value() reads a non-magic value, it keeps its monopoly on the generator and waits for the next value.
The question is: how should I swap the running and blocked states between two specific tasks without letting other tasks take control away from those two tasks?
Could you help me write the lines marked with // =>>> to achieve this?
FreeRTOS-like C pseudocode:
int32_t buffer;
SemaphoreHandle_t buffer_lock;
SemaphoreHandle_t generator_attachment_lock;
bool result_accepted;
void generator_task(void* pvParams)
{
for(;;)
{
xSemaphoreTake(buffer_lock, portMAX_DELAY);
buffer = random(); buffer *= 2; buffer += random();
result_accepted = false;
// =>>> Check if some task locked generator_attachment_lock
// =>>> Give control to task who locked generator_attachment_lock
// =>>> Suspend until task who locked generator_attachment_lock wakes me
if (!result_accepted) {printf("Not accepted: %d", buffer);}
xSemaphoreGive(buffer_lock);
vTaskDelay(1000);
}
}
int32_t get_magic_value()
{
int32_t value;
xSemaphoreTake(generator_attachment_lock, portMAX_DELAY);
bool is_value_magic = false;
do {
// =>>> Suspend until generator_task wakes me
value = get_generator_value();
is_value_magic = value_is_magic(value);
result_accepted = is_value_magic;
// =>>> Wake generator_task
} while (!is_value_magic);
xSemaphoreGive(generator_attachment_lock);
return value;
}
int32_t get_generator_value()
{
int32_t value;
xSemaphoreTake(buffer_lock, portMAX_DELAY);
value = buffer;
xSemaphoreGive(buffer_lock);
return value;
}
void magic_printer_task(void* pvParams)
{
char * btnName = (char*)pvParams;
for(;;)
{
if (button_pressed(btnName)) { printf("%s magic value: %d", btnName, get_magic_value());}
vTaskDelay(50);
}
}
void main()
{
buffer_lock = xSemaphoreCreateMutex();
generator_attachment_lock = xSemaphoreCreateMutex();
xSemaphoreGive(buffer_lock);
xSemaphoreGive(generator_attachment_lock);
xTaskCreate(generator_task, ...);
xTaskCreate(magic_printer_task, ..., "Button1");
xTaskCreate(magic_printer_task, ..., "Button2");
}
//MOCK
bool value_is_magic(int32_t value) {return (value % 5 == 0); }
//MOCK
bool button_pressed(char* btnName) { return random() % 50 == 0; }
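One possible way to reason about the hand-off described in the question is to model it in portable C++ before mapping it onto FreeRTOS primitives. The sketch below is not FreeRTOS code and is only one assumed design: generator_channel, offer and demo are hypothetical names, the generator counts upward instead of calling random(), and a mutex plus a condition variable stand in for whatever RTOS objects (for example binary semaphores or task notifications) would carry the two wake-ups.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical model of the generator/reader hand-off from the question.
class generator_channel {
public:
    // Called by the generator. If a reader is attached, hands it the value
    // and blocks until the reader has judged it. Returns true if accepted.
    bool offer(std::int32_t v) {
        std::unique_lock<std::mutex> lock(state_);
        if (!attached_) return false;  // nobody is monopolizing the generator
        assert(!value_ready_);         // previous value was already consumed
        value_ = v;
        value_ready_ = true;
        cv_.notify_all();              // "give control" to the attached reader
        cv_.wait(lock, [this] { return verdict_ready_; });
        verdict_ready_ = false;
        return accepted_;
    }

    // Called by a reader. attach_ is held for the whole call, so a second
    // reader blocks here until the first one has received a magic value.
    std::int32_t get_magic_value() {
        std::lock_guard<std::mutex> only_one_reader(attach_);
        std::unique_lock<std::mutex> lock(state_);
        attached_ = true;
        for (;;) {
            cv_.wait(lock, [this] { return value_ready_; });
            value_ready_ = false;
            const std::int32_t v = value_;
            accepted_ = (v % 5 == 0);          // mock magic test from the question
            if (accepted_) attached_ = false;  // give up the monopoly
            verdict_ready_ = true;
            cv_.notify_all();                  // "wake generator_task"
            if (accepted_) return v;
        }
    }

private:
    std::mutex attach_;  // serializes readers
    std::mutex state_;   // protects everything below
    std::condition_variable cv_;
    bool attached_ = false, value_ready_ = false, verdict_ready_ = false;
    bool accepted_ = false;
    std::int32_t value_ = 0;
};

// One generator and two readers; true if both readers got distinct magic values.
bool demo() {
    generator_channel ch;
    std::atomic<bool> stop{false};
    std::thread gen([&] {
        std::int32_t next = 1;
        while (!stop) {
            ch.offer(next++);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    std::int32_t a = 1, b = 1;
    std::thread r1([&] { a = ch.get_magic_value(); });
    std::thread r2([&] { b = ch.get_magic_value(); });
    r1.join();
    r2.join();
    stop = true;
    gen.join();
    return a % 5 == 0 && b % 5 == 0 && a != b;
}
```

In this model the generator never holds a lock while a reader is judging a value; it only waits for the verdict flag. That is the part that differs most from the pseudocode above, where generator_task keeps buffer_lock while get_generator_value() tries to take the same lock.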

physx multithreading copy transform data

I have a scene with a large number of similar objects moved by PhysX, and I want to draw all of them using OpenGL instancing. To do that, I need to build an array with the transform data of each object and pass it to an OpenGL shader. Currently, filling the array is the bottleneck of my app, because the PhysX simulation uses 16 threads but the array is built on just one thread.
So I created a data_transfer_task class, which holds two indexes, start and stop, and copies the transform data of the PhysX objects between these indexes into the array.
class data_transfer_task : public physx::PxTask {
public:
int start;
int stop;
start_transfer_task* base_task;
data_transfer_task(int start, int stop, start_transfer_task* task, physx::PxTaskManager *mtm) :physx::PxTask() {
this->start = start;
this->stop = stop;
this->mTm = mtm;
base_task = task;
}
void update_transforms();
virtual const char* getName() const { return "data_transfer_task"; }
virtual void run();
};
void data_transfer_task::update_transforms() {
for (int i = start; i < stop; i++) {
auto obj = base_task->objects->at(i);
auto transform = obj->getGlobalPose();
DrawableObject* dr = (DrawableObject*)obj->userData;
auto pos = transform.p;
auto rot = transform.q;
dr->set_position(glm::vec3(pos.x, pos.y, pos.z));
dr->set_rotation(glm::quat(rot.w, rot.x, rot.y, rot.z));
}
}
void data_transfer_task::run() { update_transforms(); }
I created another class, start_transfer_task, which creates and schedules tasks according to the thread count.
class start_transfer_task : public physx::PxLightCpuTask{
public:
start_transfer_task(physx::PxCpuDispatcher* disp, std::vector<physx::PxRigidDynamic*>* obj, physx::PxTaskManager* mtm) :physx::PxLightCpuTask() {
this->mTm = mtm;
this->dispatcher = disp;
this->objects = obj;
}
physx::PxCpuDispatcher* dispatcher;
std::vector<physx::PxRigidDynamic*>* objects;
void start();
virtual const char* getName() const { return "start_transfer_task"; }
virtual void run();
};
void start_transfer_task::start() {
int thread_count = dispatcher->getWorkerCount();
int obj_count = objects->size();
int batch_size = obj_count / thread_count;
int first_size = batch_size + obj_count % thread_count;
auto task = new data_transfer_task(0, first_size, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
if (batch_size > 0) {
for (int i = 1; i < thread_count; i++) {
task = new data_transfer_task(first_size + batch_size * (i - 1), first_size + batch_size * i, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
}
}
}
void data_transfer_task::run() { update_transforms(); }
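For reference, the index arithmetic used in start() can be checked in isolation. The sketch below reproduces only that partitioning, with no PhysX types involved; split_batches is a hypothetical helper, and the guard against a zero worker count is an addition that the code above does not have.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits [0, obj_count) into half-open [start, stop) ranges, one per worker.
// As in start(), the first range absorbs the remainder of the division.
std::vector<std::pair<int, int>> split_batches(int obj_count, int thread_count) {
    std::vector<std::pair<int, int>> ranges;
    if (obj_count <= 0 || thread_count <= 0) return ranges;  // added guard
    const int batch_size = obj_count / thread_count;
    const int first_size = batch_size + obj_count % thread_count;
    ranges.emplace_back(0, first_size);
    // With fewer objects than workers, batch_size is 0 and only the first
    // range is produced.
    for (int i = 1; i < thread_count && batch_size > 0; ++i) {
        ranges.emplace_back(first_size + batch_size * (i - 1),
                            first_size + batch_size * i);
    }
    assert(ranges.back().second == obj_count);  // ranges cover every object
    return ranges;
}
```

For 10 objects and 4 workers this yields [0, 4), [4, 6), [6, 8) and [8, 10), so every object is covered exactly once.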
I create a start_transfer_task instance before calling simulate and pass it to simulate. I expect start_transfer_task to run after all PhysX tasks have done their jobs, so that write and read API calls don't overlap, and I expect fetchResults(block=true) to return only when all of my tasks have finished copying the transform data.
while (is_simulate) {
    auto transfer_task = new start_transfer_task(gScene->getCpuDispatcher(), &objects, gScene->getTaskManager());
    gScene->simulate(1.0f / 60.0f, transfer_task);
    gScene->fetchResults(true);
    // some other logic to call the graphics API and sleep to sustain 30 updates per second
}
But I get many warnings about overlapping read and write API calls, like this:
\physx\source\physx\src\NpWriteCheck.cpp (53) : invalid operation : Concurrent API write call or overlapping API read and write call detected during physx::NpScene::simulateOrCollide from thread 8492! Note that write operations to the SDK must be sequential, i.e., no overlap with other write or read calls, else the resulting behavior is undefined. Also note that API writes during a callback function are not permitted.
And sometimes, after starting my app, I get a strange assertion message:
physx\source\task\src\TaskManager.cpp(195) : Assertion failed: !mPendingTasks"
So, what am I doing wrong?

The concurrent API call warning is essentially telling you that you are calling, from multiple threads at once, PhysX API functions that are supposed to be called from a single thread at a time.
You have to be very careful when using the PhysX API because it is not thread safe; thread safety is left to the user.
Read this for more information.

Is that any potential memory leak or deadlock with following threading code?

I have created a threading library that is used in our company's program, a customized C++-based application. As the code below shows, it uses some custom variable types, such as FloatM1D and IntS.
In the threading library, I use pthread_create (I can't use the newer threading options because our C++ version is old) to start threads that do calculations, then store the results in a dictionary for later use.
Multiple threads are started in one go (the DoEvm routine shown in the code is called multiple times at a higher level), and I wait for every thread to complete before retrieving the results. The wait routine (WaitThreadCompletion(), shown in the code) is called only once.
After retrieving the results, I clean up all the dictionaries.
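For readers unfamiliar with the pattern just described (start several pthreads, join them all once, then collect the results into a map), a minimal generic sketch follows. It is not the library's code: job, job_worker and run_jobs are hypothetical names, doubling a float stands in for the real calculation, and each thread is assumed to work only on its own record rather than on shared globals. It avoids C++11 features because the question mentions an old compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread record: each thread touches only its own copy.
struct job {
    int tnum;      // test number, used as the key of the result map
    float input;   // private copy of the input data
    float result;  // written by the worker, read only after pthread_join
};

// Thread entry point; doubling stands in for the real calculation.
extern "C" void* job_worker(void* arg) {
    job* j = static_cast<job*>(arg);
    j->result = j->input * 2.0f;
    return NULL;
}

// Starts one thread per input, joins them all, then collects the results.
std::map<int, float> run_jobs(const std::vector<float>& inputs) {
    std::vector<job> jobs(inputs.size());
    std::vector<pthread_t> ids(inputs.size());
    std::size_t started = 0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        jobs[i].tnum = static_cast<int>(i);
        jobs[i].input = inputs[i];
        jobs[i].result = 0.0f;
        // jobs is already fully sized, so &jobs[i] stays valid for the thread.
        if (pthread_create(&ids[i], NULL, job_worker, &jobs[i]) != 0) break;
        ++started;
    }
    std::map<int, float> results;
    for (std::size_t i = 0; i < started; ++i) {
        const int rc = pthread_join(ids[i], NULL);  // join only started threads
        assert(rc == 0);
        (void)rc;
        results[jobs[i].tnum] = jobs[i].result;  // safe: the thread has finished
    }
    return results;
}
```

Because every record is owned by exactly one thread until it is joined, this version needs no mutex at all; locking only becomes necessary for data that several threads read and write.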
My problem is that the calculation routine in the thread (VSA.calculateEvm(), shown in the code) intermittently raises a segmentation error, which causes my program to hang when it checks for thread completion in pthread_join. I have no visibility into the VSA.calculateEvm() routine because it comes from a third-party library, so I can't debug any deeper.
The error:
An invalid memory reference (segmentation violation) has occurred.
The code from the threading library is provided below. I need some help checking whether anything in it could cause the segmentation error.
Any advice is welcome. Thanks in advance.
#define PARALLEL_DEVM_VERSION 2.0
#include <iostream>
#include <cstdio>
#include <cstring>
#include <string>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <EVM.h>
#include <cstdlib>
#include <fstream>
ThreadingModule EVMx;
void *thread_EVM(void *arg);
pthread_mutex_t lock_x = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_y = PTHREAD_MUTEX_INITIALIZER;
/*THREAD SHARED DATA STRUCTURE*/
struct ThreadData
{
//Warning: Do not use custom variable types in struct, it does not work good with pthread
float if_Freq;
float sample_Rate;
int tnum;
bool enable_Plots;
bool ResequenceIsNeeded;
int if_Bin;
int RAT;
};
typedef struct ThreadData m_ThreadData_Type;
BoolS m_Parallel_EVM;
map<int, FloatM> m_TnumVsDEVM_map;
map<int, FloatM1D> m_Tnum_ReseqArray_map;
map<int, pthread_t> m_TnumVsThreadIds_map;
map<int, FloatM> m_TnumVsResDEVM_map;
map<int, int> m_TnumVsUTPaver_map;
IntS m_tnum;
/*THREAD SHARED DATA - THESE HAVE TO USE WITH MUTEX LOCK */
FloatM1D m_measArrayTmp;
FloatM1D m_refArrayTmp;
UnsignedS m_if_Bin;
UnsignedS m_RAT;
Dot11acChannelType chan_Type;
Dot11PilotTracking pilot_Tracking;
Dot11EqualizerTraining EQ_Train;
Dot11Trigger trig;
StringS wave_ID;
/* RESERVED STRING */
IntS TEST_ID_PREFIX = 1000;
IntS ResequenceTimeDomainToComplex(FloatM1D measArray, FloatM1D complexOutArray, const UnsignedS ifBin, const UnsignedS stride);
IntS Evm (FloatM1D measArray, FloatM1D refArray, const Dot11acChannelType chanType, const Dot11PilotTracking PILOT_TRACK, const FloatS ifFreq, const FloatS sampleRate, const Dot11EqualizerTraining EQ_TRAIN, const Dot11Trigger TRIGGER, OfdmEvmData &evmResults, const BoolS enablePlots = false , const StringS plotTag = "");
BoolS Init(int TestNum, FloatM1D resdevm, IntS resdevmIndex, BoolS enabledataPlots=false, BoolS enableEVMsweep=false, BoolS UTP_AVG=true, IntS NumOfAverage=1);
void WaitThreadCompletion(IntS &totalThreads);
void PostThreadsCleanup();
map<int, FloatM> GetDEVMs();
map<int, FloatM1D> GetResDEVMs();
ThreadingModule::ThreadingModule()
{
//empty constr
}
IntS ThreadingModule::ResequenceTimeDomainToComplex(FloatM1D measArray, FloatM1D &complexOutArray, const UnsignedS ifBin, const UnsignedS stride)
{
/**
* #brief Resequence array wrapper. If in parallel mode, it will do it at EVM thread.
* #param measArray measured array.
* #param complexOutArray array
* #param ifBin argument
* #param stride argument
* #param enableEVMsweep argument
* #return Parallel EVM flag.
*/
IntS rslt;
IntS ret;
/*## PARALLEL MODE ##*/
//Push the array to dictionary, resequence it later
if (!m_Tnum_ReseqArray_map.count(m_tnum)>0){
m_Tnum_ReseqArray_map[m_tnum] = measArray;
m_if_Bin = ifBin;
m_RAT = stride;
} else {
//Fail safe
ERR.ReportError(ERR_GENERIC_WARNING,"Duplicated TNUM found in the dictionary in ResequenceTimeDomainToComplex.",UTL_VOID, NO_SITES, UTL_VOID);
}
return rslt;
}
IntS ThreadingModule::DoEvm(FloatM1D measArray, FloatM1D refArray,const Dot11acChannelType chanType, const Dot11PilotTracking PILOT_TRACK, const FloatS ifFreq, const FloatS sampleRate, const Dot11EqualizerTraining EQ_TRAIN, const Dot11Trigger TRIGGER, OfdmEvmData &evmResults, const BoolS enablePlots, const StringS plotTag)
{
/**
* @brief EVM wrapper.
* @param measArray measured array.
* @param refArray reference array.
* @param chanType channel type.
* @param PILOT_TRACK pilot tracking
* @param ifFreq IF freq.
* @param sampleRate EVM flag sample rate.
* @param Dot11EqualizerTraining equalizer training.
* @param Dot11Trigger trigger.
* @param OfdmEvmData devm results.
* @param enablePlots enable plots
* @param plotTag plot tagging.
* @return IntS, not used.
*
*/
IntS rslt, ret;
BoolS m_isResequenceNeeded(false);
FloatM1D m_tmp_meas_Arr;
UnsignedS m_tmpifBin(0);
UnsignedS m_tmpRAT(0);
IntS m_tmp_tnum;
/*## PARALLEL MODE ##*/
/* check if resequence array is needed */
if(m_Tnum_ReseqArray_map.count(m_tnum)>0){
m_isResequenceNeeded=true;
m_tmp_meas_Arr=m_Tnum_ReseqArray_map[m_tnum];
m_tmpifBin= m_if_Bin;
m_tmpRAT=m_RAT;
m_Tnum_ReseqArray_map.erase(m_tnum);
}
/* instantiate the structure */
m_ThreadData_Type *tdata;
tdata =(m_ThreadData_Type *)malloc(sizeof (m_ThreadData_Type));
/* mutex lock to initialize all thread shared variables */
pthread_mutex_lock(&lock_y);
m_measArrayTmp = m_isResequenceNeeded ? m_tmp_meas_Arr : measArray;
m_refArrayTmp = refArray;
chan_Type = chanType;
pilot_Tracking = PILOT_TRACK;
EQ_Train = EQ_TRAIN;
trig = TRIGGER;
wave_ID = plotTag;
pthread_mutex_unlock(&lock_y);
/* thread specific data */
tdata->if_Freq = ifFreq;
tdata->sample_Rate = sampleRate;
tdata->enable_Plots = enablePlots;
tdata->tnum = m_tnum;
tdata->ResequenceIsNeeded=m_isResequenceNeeded;
tdata->if_Bin = m_tmpifBin;
tdata->RAT = m_tmpRAT;
/* create thread */
ret = pthread_create(&m_TnumVsThreadIds_map[m_tnum], NULL, thread_EVM, (void*) tdata);
if (ret!=0) ERR.ReportError(ERR_GENERIC_WARNING , "Error in create EVM thread for TNUM="+ m_tnum.GetText()+".", ret, NO_SITES, UTL_VOID);
return rslt;
}
void* thread_EVM(void* arg)
{
/**
* @brief EVM wrapper.
* @param arg Arguments passed in with structure
*
*/
FloatM1D m_meas_Arr;
FloatM1D m_ref_Arr;
FloatM1D m_cmplx_Arr;
Dot11acChannelType m_chan_Type;
Dot11PilotTracking m_pilot_Tracking;
FloatS m_freq;
FloatS m_sample_Rate;
Dot11EqualizerTraining m_EQ_Train;
Dot11Trigger m_trig;
IntS m_testnum;
BoolS m_enable_Plots;
StringS m_wave_ID;
OfdmEvmData m_evm_Results;
UnsignedS m_Bin;
UnsignedS m_RAT;
BoolS m_isResequencedNeeded(false);
/* initialize all the local variables from arg */
m_ThreadData_Type *tdata;
tdata=(m_ThreadData_Type*)arg;
/* mutex lock to retrieve thread shared data */
pthread_mutex_lock(&lock_y);
m_meas_Arr = m_measArrayTmp;
m_ref_Arr = m_refArrayTmp;
m_pilot_Tracking = pilot_Tracking;
m_EQ_Train = EQ_Train;
m_chan_Type = chan_Type;
m_trig = trig;
m_wave_ID = wave_ID;
pthread_mutex_unlock(&lock_y);
/* initialize thread specific data */
m_freq = tdata->if_Freq;
m_sample_Rate = tdata->sample_Rate;
m_testnum = tdata->tnum;
m_enable_Plots = tdata->enable_Plots;
m_Bin = tdata->if_Bin;
m_RAT = tdata->RAT;
m_isResequencedNeeded = tdata->ResequenceIsNeeded;
/* check if resequence array is needed */
if (m_isResequencedNeeded){
m_cmplx_Arr.Resize(m_ref_Arr.GetSize());
VSA.DOT11.AC.ResequenceTimeDomainToComplex(m_meas_Arr, m_cmplx_Arr, m_Bin, m_RAT);
m_meas_Arr=m_cmplx_Arr;
}
/* perform evm calculation */
VSA.calculateEvm(m_meas_Arr, m_ref_Arr, m_chan_Type, m_pilot_Tracking, m_freq, m_sample_Rate, m_EQ_Train, m_trig, m_evm_Results, m_enable_Plots, m_wave_ID);
pthread_mutex_lock(&lock_x);
m_TnumVsDEVM_map[m_testnum]= m_evm_Results.evm;
pthread_mutex_unlock(&lock_x);
free(tdata);
return NULL;
}
void ThreadingModule::WaitThreadCompletion(IntS &totalThreads)
{
/**
* @brief Check all the created threads are complete.
* @param totalThreads Total number of created threads.
*
*/
int ret;
map<int, pthread_t>::iterator itr;
IntS tnum;
pthread_t threadiD;
/* check each threads if completed */
for (itr = m_TnumVsThreadIds_map.begin(); itr != m_TnumVsThreadIds_map.end(); ++itr) {
tnum = itr->first;
threadiD = itr->second;
ret=pthread_join(threadiD, NULL);
if (ret!=0) ERR.ReportError(ERR_GENERIC_WARNING , "Can't complete the thread for TNUM="+ tnum.GetText()+".", UTL_VOID, NO_SITES, UTL_VOID);
}
totalThreads = m_TnumVsThreadIds_map.size();
}
map<int, FloatM> ThreadingModule::GetDEVMs()
{
return m_TnumVsDEVM_map;
}
map<int, FloatM> ThreadingModule::GetResDEVMs()
{
return m_TnumVsResDEVM_map;
}
void ThreadingModule::PostThreadsCleanup()
{
/**
* @brief Post-thread-completion cleanup. Make sure to run this only after datalogging the results.
* WARNING: Only call this routine after all threads have finished executing.
*/
TEST_ID_PREFIX =1000;
m_TnumVsDEVM_map.clear();
m_Tnum_ReseqArray_map.clear();
m_TnumVsThreadIds_map.clear();
m_TnumVsResDEVM_map.clear();
m_TnumVsUTPaver_map.clear();
pthread_mutex_lock(&lock_y);
m_measArrayTmp.Clear();
m_refArrayTmp.Clear();
pthread_mutex_unlock(&lock_y);
}

c++11 multi-reader / multi-writer queue using atomics for object state and perpetual incremented indexes

I am using atomics and a circular buffer to implement an object pool shared by multiple reader threads and multiple writer threads.
It is difficult to investigate because instrumenting the code makes the bug vanish!
The model
Producers (or writer threads) request an Element from the Ring in order to 'prepare' the element. When finished, the writer thread changes the element state so a reader can 'consume' it. After that, the element becomes available again for writing.
Consumers (or reader threads) request an object from the Ring in order to 'read' the object.
After 'releasing' the object, the object is in the state::Ready state, i.e. available to be consumed by a reader thread.
It can fail if no object is available, i.e. the next free object in the Ring is not in the state::Unused state.
The 2 classes, Element and Ring
Element :
to be written, a writer thread must successfully exchange the _state member from state::Unused to state::LockForWrite
when finished, the writer thread forces the state to state::Ready (it should be the only thread handling this Element)
to be read, a reader thread must successfully exchange the _state member from state::Ready to state::LockForRead
when finished, the reader thread forces the state to state::Unused (it should be the only thread handling this Element)
Summarized :
writers lifecycle : state::Unused -> state::LockForWrite -> state::Ready
readers lifecycle : state::Ready -> state::LockForRead -> state::Unused
Ring
has a vector of Element, seen as a circular buffer.
std::atomic<int64_t> _read, _write; are the 2 indexes used to access the elements via :
_elems[ _write % _elems.size() ] for writers,
_elems[ _read % _elems.size() ] for readers.
When a reader has successfully LockForRead an object, the _read index is incremented.
When a writer has successfully LockForWrite an object, the _write index is incremented.
The main :
We add some writer and reader threads to a vector, all sharing the same Ring. Each thread just tries to get an element via get_for_read or get_for_write and releases it right afterwards.
Based on the Element transitions everything should be fine, but one can observe that at some point the Ring gets blocked: some elements in the ring are in state state::Ready with the _write % _elems.size() index pointing at them and, symmetrically, some elements in the ring are in state state::Unused with the _read % _elems.size() index pointing at them! Both together mean deadlock.
#include<atomic>
#include<vector>
#include<thread>
#include<iostream>
#include<cstdint>
#include<functional> // needed for std::function used in main()
typedef enum : int
{
Unused, LockForWrite, Ready, LockForRead
}state;
class Element
{
std::atomic<state> _state;
public:
Element():_state(Unused){ }
// a reader needs to successfully make the transition Ready => LockForRead
bool lock_for_read() { state s = Ready; return _state.compare_exchange_strong(s, LockForRead); }
void unlock_read() { state s = Unused; _state.store(s); }
// a writer needs to successfully make the transition Unused => LockForWrite
bool lock_for_write() { state s = Unused; return _state.compare_exchange_strong(s, LockForWrite); }
void unlock_write() { state s = Ready; _state.store(s); }
};
class Ring
{
std::vector<Element> _elems;
std::atomic<int64_t> _read, _write;
public:
Ring(size_t capacity)
: _elems(capacity), _read(0), _write(0) {}
Element * get_for_read() {
Element * ret = &_elems[ _read.load() % _elems.size() ];
if (!ret->lock_for_read()) // if success, the object belongs to the caller thread as reader
return NULL;
_read.fetch_add(1); // success! incr _read index
return ret;
}
Element * get_for_write() {
Element * ret = &_elems[ _write.load() % _elems.size() ];
if (!ret->lock_for_write())// if success, the object belongs to the caller thread as writer
return NULL;
_write.fetch_add(1); // success! incr _write index
return ret;
}
void release_read(Element* e) { e->unlock_read();}
void release_write(Element* e) { e->unlock_write();}
};
int main()
{
const int capacity = 10; // easy to process modulo
std::atomic<bool> stop(false); // direct-initialization: copy-initializing an atomic is ill-formed before C++17
Ring ring(capacity);
std::function<void()> writer_job = [&]()
{
std::cout << "writer starting" << std::endl;
Element * e;
while (!stop)
{
if (!(e = ring.get_for_write()))
continue;
// do some real writer job ...
ring.release_write(e);
}
};
std::function<void()> reader_job = [&]()
{
std::cout << "reader starting" << std::endl;
Element * e;
while (!stop)
{
if (!(e = ring.get_for_read()))
continue;
// do some real reader job ...
ring.release_read(e);
}
};
int nb_writers = 1;
int nb_readers = 2;
std::vector<std::thread> threads;
threads.reserve(nb_writers + nb_readers);
std::cout << "adding writers" << std::endl;
while (nb_writers--)
threads.push_back(std::thread(writer_job));
std::cout << "adding readers" << std::endl;
while (nb_readers--)
threads.push_back(std::thread(reader_job));
// wait user key press, halt in debugger after 1 or 2 seconds
// in order to reproduce problem and watch ring
std::cin.get();
stop = true;
std::cout << "waiting all threads...\n";
for (auto & th : threads)
th.join();
std::cout << "end" << std::endl;
}
This debugger watch screenshot was taken by pausing the program after it had run for 1 second. As you can see, _read is pointing to element 8, marked as state::Unused, so no transition can unblock this state for this reader except a writer; but the _write index is pointing to element 0, which is in state state::Ready!
My question: what did I miss here? Structurally I am sure the sequence is correct, but I am missing some atomic trick...
OS/compilers tested: RHEL 5/gcc 4.1.2, RHEL 7/gcc 4.8, Windows 10/MS Visual Studio 2015, Windows 10/MinGW

Yann's answer is correct about the problem: your threads can create "holes" in the sequence by reading and writing elements out-of-order if there's a delay between the read/write lock and the increment of the index. The fix is to verify that the index has not changed between the initial read and the increment, a la:
class Element
{
std::atomic<state> _state;
public:
Element():_state(Unused){ }
// a reader needs to successfully make the transition Ready => LockForRead
bool lock_for_read() {
state s = Ready;
return _state.compare_exchange_strong(s, LockForRead);
}
void abort_read() { _state = Ready; }
void unlock_read() { state s = Unused; _state.store(s); }
// a writer needs to successfully make the transition Unused => LockForWrite
bool lock_for_write() {
state s = Unused;
return _state.compare_exchange_strong(s, LockForWrite);
}
void abort_write() { _state = Unused; }
void unlock_write() { state s = Ready; _state.store(s); }
};
class Ring
{
std::vector<Element> _elems;
std::atomic<int64_t> _read, _write;
public:
Ring(size_t capacity)
: _elems(capacity), _read(0), _write(0) {}
Element * get_for_read() {
auto i = _read.load();
Element * ret = &_elems[ i % _elems.size() ];
if (ret->lock_for_read()) {
// if success, the object belongs to the caller thread as reader
if (_read.compare_exchange_strong(i, i + 1))
return ret;
// Woops, reading out of order.
ret->abort_read();
}
return NULL;
}
Element * get_for_write() {
auto i = _write.load();
Element * ret = &_elems[ i % _elems.size() ];
if (ret->lock_for_write()) {
// if success, the object belongs to the caller thread as writer
if (_write.compare_exchange_strong(i, i + 1))
return ret;
// Woops, writing out of order.
ret->abort_write();
}
return NULL;
}
void release_read(Element* e) { e->unlock_read();}
void release_write(Element* e) { e->unlock_write();}
};
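To make the failure mode concrete, here is a minimal single-threaded sketch that replays one unlucky interleaving step by step. It is a model, not the code from the question: ModelRing, transition, stuck and replay are hypothetical names, plain variables stand in for the atomics, and the capacity of 2 is chosen only to keep the trace short. With check_index set to false it follows the original logic; with true it applies the index check shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum State : int { Unused, LockForWrite, Ready, LockForRead };

// Hypothetical single-threaded model of the ring (no real atomics or threads).
struct ModelRing {
    std::vector<State> elems;
    std::int64_t read = 0;
    std::int64_t write = 0;

    explicit ModelRing(std::size_t capacity) : elems(capacity, Unused) {}

    State& slot(std::int64_t index) {
        return elems[static_cast<std::size_t>(index) % elems.size()];
    }

    // Stand-in for compare_exchange_strong on the element state.
    static bool transition(State& s, State from, State to) {
        if (s != from) return false;
        s = to;
        return true;
    }

    // One complete, uninterrupted write: lock, advance index, release.
    bool write_once() {
        State& e = slot(write);
        if (!transition(e, Unused, LockForWrite)) return false;
        ++write;
        e = Ready;
        return true;
    }

    // One complete, uninterrupted read: lock, advance index, release.
    bool read_once() {
        State& e = slot(read);
        if (!transition(e, Ready, LockForRead)) return false;
        ++read;
        e = Unused;
        return true;
    }

    // Neither the next read nor the next write can make progress.
    bool stuck() { return slot(read) != Ready && slot(write) != Unused; }
};

// Replays the interleaving; returns true if the ring ends up stuck.
bool replay(bool check_index) {
    ModelRing r(2);
    bool ok = r.write_once();            // W fills slot 0
    const std::int64_t seen = r.read;    // R2 loads _read (0) ...
    State& stale = r.slot(seen);         // ... picks slot 0, then is preempted
    ok = ok && r.read_once();            // R1 consumes slot 0, _read becomes 1
    ok = ok && r.write_once();           // W fills slot 1
    ok = ok && r.write_once();           // W wraps around and refills slot 0
    assert(ok);
    // R2 resumes with its stale slot, which happens to be Ready again.
    if (ModelRing::transition(stale, Ready, LockForRead)) {
        if (!check_index || r.read == seen) {
            ++r.read;                    // original logic: blind increment
            stale = Unused;              // release_read
        } else {
            stale = Ready;               // abort_read: the index moved meanwhile
        }
    }
    return r.stuck();
}
```

Under these assumptions replay(false) reports a stuck ring (_read points at an Unused slot and _write at a Ready one), while replay(true) does not. This covers only this one interleaving; it is not a proof that the checked version is correct under every schedule.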

You do not have an atomic section around the increment of the two shared counters _read and _write.
That looks bad to me: you could switch another element without meaning to.
Imagine this scenario:
One reader, R1, and one writer, W, are happily cooperating.
A second reader, R2, executes: Element * ret = &_elems[ _read.load() % _elems.size() ];
and gets pushed off the CPU.
Now R1 and W are still playing together, so the positions of _read and _write are now arbitrary w.r.t. the element ret that R2 is pointing to.
Now at some point R2 gets scheduled, and it so happens that *ret is readable (again possibly; R1 and W may have gone around the block a few times).
Ouch. As you see, we will read it and increment _read, but _read has no relation to ret. This creates holes of a sort: elements that have not been read but that are below the _read index.
So, make critical sections to ensure that the increment of _read/_write is done in the same semantic step as the actual lock.
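As a rough illustration of the critical-section idea, here is a minimal sketch that guards the state change and the index increment with a single std::mutex, so they always happen as one step. This is an assumption about what such a version could look like, not code from the question: LockedRing, claim and release are hypothetical names, and the getters return a slot index (or -1) instead of an Element pointer to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

enum State : int { Unused, LockForWrite, Ready, LockForRead };

// Hypothetical mutex-guarded variant of the ring from the question.
class LockedRing {
    std::vector<State> elems_;
    std::int64_t read_;
    std::int64_t write_;
    std::mutex mutex_;

    // Claims the slot under `index` if it is in state `from`.
    // The caller must already hold mutex_.
    std::int64_t claim(std::int64_t& index, State from, State to) {
        const std::size_t i = static_cast<std::size_t>(index) % elems_.size();
        if (elems_[i] != from) return -1;
        elems_[i] = to;   // state change ...
        ++index;          // ... and index increment in the same critical section
        return static_cast<std::int64_t>(i);
    }

    void release(std::int64_t slot, State from, State to) {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(elems_[static_cast<std::size_t>(slot)] == from);
        elems_[static_cast<std::size_t>(slot)] = to;
    }

public:
    explicit LockedRing(std::size_t capacity)
        : elems_(capacity, Unused), read_(0), write_(0) {}

    // Returns the claimed slot index, or -1 if the next slot is not Ready.
    std::int64_t get_for_read() {
        std::lock_guard<std::mutex> guard(mutex_);
        return claim(read_, Ready, LockForRead);
    }

    // Returns the claimed slot index, or -1 if the next slot is not Unused.
    std::int64_t get_for_write() {
        std::lock_guard<std::mutex> guard(mutex_);
        return claim(write_, Unused, LockForWrite);
    }

    void release_read(std::int64_t slot) { release(slot, LockForRead, Unused); }
    void release_write(std::int64_t slot) { release(slot, LockForWrite, Ready); }
};
```

The trade-off is that every claim and release now serializes on one mutex, which gives up the lock-free property the original design was aiming for; only the work done between get and release still runs in parallel.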

a simple function

Hi all
I need to declare a variable in Node to keep the position of that node, so I declared it in node.h
like this: std::vector<double> exPosition; (public)
Then I declared a simple function for getting this variable, like this:
virtual Vector GetmyPosition (void) const=0;
Then, in node.cc, I wrote this simple function:
node::GetmyPosition (void) const
{
return exPosition;
}
But when I run it, I get this error:
cannot allocate an object of abstract type 'ns3::Node'
note: because the following virtual functions are pure within 'ns3::Node';
Here is the complete code for node.h:
#ifndef NODE_H
#define NODE_H
#include <vector>
#include "ns3/object.h"
#include "ns3/callback.h"
#include "ns3/ptr.h"
#include "ns3/net-device.h"
namespace ns3 {
class Application;
class Packet;
class Address;
/**
* \ingroup node
*
* \brief A network Node.
*
* This class holds together:
* - a list of NetDevice objects which represent the network interfaces
* of this node which are connected to other Node instances through
* Channel instances.
* - a list of Application objects which represent the userspace
* traffic generation applications which interact with the Node
* through the Socket API.
* - a node Id: a unique per-node identifier.
* - a system Id: a unique Id used for parallel simulations.
*
* Every Node created is added to the NodeList automatically.
*/
class Node : public Object
{
public:
/// exposition of the node
//std::vector<Ptr<Node> > exPosition;
std::vector<double> exPosition;
/// current position of the node
//std::vector<double> cPosition;
static TypeId GetTypeId (void);
/**
* Must be invoked by subclasses only.
*/
Node();
/**
* \param systemId a unique integer used for parallel simulations.
*
* Must be invoked by subclasses only.
*/
Node(uint32_t systemId);
virtual ~Node();
/**
* \returns the unique id of this node.
*
* This unique id happens to be also the index of the Node into
* the NodeList.
*/
uint32_t GetId (void) const;
/**
* \returns the system id for parallel simulations associated
* to this node.
*/
uint32_t GetSystemId (void) const;
/**
* \param device NetDevice to associate to this node.
* \returns the index of the NetDevice into the Node's list of
* NetDevice.
*
* Associate this device to this node.
*/
uint32_t AddDevice (Ptr<NetDevice> device);
/**
* \param index the index of the requested NetDevice
* \returns the requested NetDevice associated to this Node.
*
* The indexes used by the GetDevice method start at one and
* end at GetNDevices ()
*/
Ptr<NetDevice> GetDevice (uint32_t index) const;
/**
* \returns the number of NetDevice instances associated
* to this Node.
*/
uint32_t GetNDevices (void) const;
/**
* \param application Application to associate to this node.
* \returns the index of the Application within the Node's list
* of Application.
*
* Associated this Application to this Node. This method is called
* automatically from Application::Application so the user
* has little reasons to call this method directly.
*/
uint32_t AddApplication (Ptr<Application> application);
/**
* \param index
* \returns the application associated to this requested index
* within this Node.
*/
Ptr<Application> GetApplication (uint32_t index) const;
/**
* \returns the number of applications associated to this Node.
*/
uint32_t GetNApplications (void) const;
/**
* A protocol handler
*
* \param device a pointer to the net device which received the packet
* \param packet the packet received
* \param protocol the 16 bit protocol number associated with this packet.
* This protocol number is expected to be the same protocol number
* given to the Send method by the user on the sender side.
* \param sender the address of the sender
* \param receiver the address of the receiver; Note: this value is
* only valid for promiscuous mode protocol
* handlers. Note: If the L2 protocol does not use L2
* addresses, the address reported here is the value of
* device->GetAddress().
* \param packetType type of packet received
* (broadcast/multicast/unicast/otherhost); Note:
* this value is only valid for promiscuous mode
* protocol handlers.
*/
typedef Callback<void,Ptr<NetDevice>, Ptr<const Packet>,uint16_t,const Address &,
const Address &, NetDevice::PacketType> ProtocolHandler;
/**
* \param handler the handler to register
* \param protocolType the type of protocol this handler is
* interested in. This protocol type is a so-called
* EtherType, as registered here:
* http://standards.ieee.org/regauth/ethertype/eth.txt
* the value zero is interpreted as matching all
* protocols.
* \param device the device attached to this handler. If the
* value is zero, the handler is attached to all
* devices on this node.
* \param promiscuous whether to register a promiscuous mode handler
*/
void RegisterProtocolHandler (ProtocolHandler handler,
uint16_t protocolType,
Ptr<NetDevice> device,
bool promiscuous=false);
/**
* \param handler the handler to unregister
*
* After this call returns, the input handler will never
* be invoked anymore.
*/
void UnregisterProtocolHandler (ProtocolHandler handler);
/**
* \returns true if checksums are enabled, false otherwise.
*/
static bool ChecksumEnabled (void);
protected:
/**
* The dispose method. Subclasses must override this method
* and must chain up to it by calling Node::DoDispose at the
* end of their own DoDispose method.
*/
virtual void DoDispose (void);
virtual void DoStart (void);
private:
/**
* \param device the device added to this Node.
*
* This method is invoked whenever a user calls Node::AddDevice.
*/
virtual void NotifyDeviceAdded (Ptr<NetDevice> device);
bool NonPromiscReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet>, uint16_t protocol, const Address &from);
bool PromiscReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet>, uint16_t protocol,
const Address &from, const Address &to, NetDevice::PacketType packetType);
bool ReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet>, uint16_t protocol,
const Address &from, const Address &to, NetDevice::PacketType packetType, bool promisc);
void Construct (void);
struct ProtocolHandlerEntry {
ProtocolHandler handler;
Ptr<NetDevice> device;
uint16_t protocol;
bool promiscuous;
};
typedef std::vector<struct Node::ProtocolHandlerEntry> ProtocolHandlerList;
uint32_t m_id; // Node id for this node
uint32_t m_sid; // System id for this node
std::vector<Ptr<NetDevice> > m_devices;
std::vector<Ptr<Application> > m_applications;
ProtocolHandlerList m_handlers;
};
} //namespace ns3
#endif /* NODE_H */
And here is the complete code for node.cc:
/* -*- Mode:C++; c-file-style:"gnu"; indent-tabs-mode:nil; -*- */
/*
* Copyright (c) 2006 Georgia Tech Research Corporation, INRIA
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation;
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Authors: George F. Riley<riley@ece.gatech.edu>
* Mathieu Lacage <mathieu.lacage@sophia.inria.fr>
*/
#include "node.h"
#include "node-list.h"
#include "net-device.h"
#include "application.h"
#include "ns3/packet.h"
#include "ns3/simulator.h"
#include "ns3/object-vector.h"
#include "ns3/uinteger.h"
#include "ns3/log.h"
#include "ns3/assert.h"
#include "ns3/global-value.h"
#include "ns3/boolean.h"
#include "ns3/simulator.h"
#include "ns3/vector.h"
NS_LOG_COMPONENT_DEFINE ("Node");
namespace ns3{
NS_OBJECT_ENSURE_REGISTERED (Node);
GlobalValue g_checksumEnabled = GlobalValue ("ChecksumEnabled",
"A global switch to enable all checksums for all protocols",
BooleanValue (false),
MakeBooleanChecker ());
//Vector exposition = (0.0, 0.0, 0.0);
/*.AddAttribute ("exPosition", "The previous position of this node.",
TypeId::ATTR_GET,
VectorValue (Vector (0.0, 0.0, 0.0)), // ignored initial value.
MakeVectorAccessor (&Node::m_exposition),
MakeVectorChecker ())
.AddAttribute ("cPosition", "The current position of this node.",
TypeId::ATTR_GET,
VectorValue (Vector (0.0, 0.0, 0.0)), // ignored initial value.
MakeVectorAccessor (&Node::m_cposition),
MakeVectorChecker ())*/
TypeId
Node::GetTypeId (void)
{
static TypeId tid = TypeId ("ns3::Node")
.SetParent<Object> ()
.AddConstructor<Node> ()
.AddAttribute ("DeviceList", "The list of devices associated to this Node.",
ObjectVectorValue (),
MakeObjectVectorAccessor (&Node::m_devices),
MakeObjectVectorChecker<NetDevice> ())
.AddAttribute ("ApplicationList", "The list of applications associated to this Node.",
ObjectVectorValue (),
MakeObjectVectorAccessor (&Node::m_applications),
MakeObjectVectorChecker<Application> ())
.AddAttribute ("Id", "The id (unique integer) of this Node.",
TypeId::ATTR_GET, // allow only getting it.
UintegerValue (0),
MakeUintegerAccessor (&Node::m_id),
MakeUintegerChecker<uint32_t> ())
;
return tid;
}
Node::Node()
: m_id(0),
m_sid(0)
{
exPosition.at(1)= 0;
exPosition.at(2)= 0;
exPosition.at(3)= 0;
Construct ();
}
Node::Node(uint32_t sid)
: m_id(0),
m_sid(sid)
{
exPosition.at(1)= 0;
exPosition.at(2)= 0;
exPosition.at(3)= 0;
Construct ();
}
void
Node::Construct (void)
{
m_id = NodeList::Add (this);
//exPosition =(0.0,0.0,0.0);
}
Node::~Node ()
{}
uint32_t
Node::GetId (void) const
{
return m_id;
}
uint32_t
Node::GetSystemId (void) const
{
return m_sid;
}
uint32_t
Node::AddDevice (Ptr<NetDevice> device)
{
uint32_t index = m_devices.size ();
m_devices.push_back (device);
device->SetNode (this);
device->SetIfIndex(index);
device->SetReceiveCallback (MakeCallback (&Node::NonPromiscReceiveFromDevice, this));
Simulator::ScheduleWithContext (GetId (), Seconds (0.0),
&NetDevice::Start, device);
NotifyDeviceAdded (device);
return index;
}
Ptr<NetDevice>
Node::GetDevice (uint32_t index) const
{
NS_ASSERT_MSG (index < m_devices.size (), "Device index " << index <<
" is out of range (only have " << m_devices.size () << " devices).");
return m_devices[index];
}
uint32_t
Node::GetNDevices (void) const
{
return m_devices.size ();
}
uint32_t
Node::AddApplication (Ptr<Application> application)
{
uint32_t index = m_applications.size ();
m_applications.push_back (application);
application->SetNode (this);
Simulator::ScheduleWithContext (GetId (), Seconds (0.0),
&Application::Start, application);
return index;
}
Ptr<Application>
Node::GetApplication (uint32_t index) const
{
NS_ASSERT_MSG (index < m_applications.size (), "Application index " << index <<
" is out of range (only have " << m_applications.size () << " applications).");
return m_applications[index];
}
uint32_t
Node::GetNApplications (void) const
{
return m_applications.size ();
}
void
Node::DoDispose()
{
m_handlers.clear ();
for (std::vector<Ptr<NetDevice> >::iterator i = m_devices.begin ();
i != m_devices.end (); i++)
{
Ptr<NetDevice> device = *i;
device->Dispose ();
*i = 0;
}
m_devices.clear ();
for (std::vector<Ptr<Application> >::iterator i = m_applications.begin ();
i != m_applications.end (); i++)
{
Ptr<Application> application = *i;
application->Dispose ();
*i = 0;
}
m_applications.clear ();
Object::DoDispose ();
}
void
Node::DoStart (void)
{
for (std::vector<Ptr<NetDevice> >::iterator i = m_devices.begin ();
i != m_devices.end (); i++)
{
Ptr<NetDevice> device = *i;
device->Start ();
}
for (std::vector<Ptr<Application> >::iterator i = m_applications.begin ();
i != m_applications.end (); i++)
{
Ptr<Application> application = *i;
application->Start ();
}
Object::DoStart ();
}
void
Node::NotifyDeviceAdded (Ptr<NetDevice> device)
{}
void
Node::RegisterProtocolHandler (ProtocolHandler handler,
uint16_t protocolType,
Ptr<NetDevice> device,
bool promiscuous)
{
struct Node::ProtocolHandlerEntry entry;
entry.handler = handler;
entry.protocol = protocolType;
entry.device = device;
entry.promiscuous = promiscuous;
// On demand enable promiscuous mode in netdevices
if (promiscuous)
{
if (device == 0)
{
for (std::vector<Ptr<NetDevice> >::iterator i = m_devices.begin ();
i != m_devices.end (); i++)
{
Ptr<NetDevice> dev = *i;
dev->SetPromiscReceiveCallback (MakeCallback (&Node::PromiscReceiveFromDevice, this));
}
}
else
{
device->SetPromiscReceiveCallback (MakeCallback (&Node::PromiscReceiveFromDevice, this));
}
}
m_handlers.push_back (entry);
}
void
Node::UnregisterProtocolHandler (ProtocolHandler handler)
{
for (ProtocolHandlerList::iterator i = m_handlers.begin ();
i != m_handlers.end (); i++)
{
if (i->handler.IsEqual (handler))
{
m_handlers.erase (i);
break;
}
}
}
bool
Node::ChecksumEnabled (void)
{
BooleanValue val;
g_checksumEnabled.GetValue (val);
return val.Get ();
}
bool
Node::PromiscReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet> packet, uint16_t protocol,
const Address &from, const Address &to, NetDevice::PacketType packetType)
{
NS_LOG_FUNCTION(this);
return ReceiveFromDevice (device, packet, protocol, from, to, packetType, true);
}
bool
Node::NonPromiscReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet> packet, uint16_t protocol,
const Address &from)
{
NS_LOG_FUNCTION(this);
return ReceiveFromDevice (device, packet, protocol, from, from, NetDevice::PacketType (0), false);
}
bool
Node::ReceiveFromDevice (Ptr<NetDevice> device, Ptr<const Packet> packet, uint16_t protocol,
const Address &from, const Address &to, NetDevice::PacketType packetType, bool promiscuous)
{
NS_ASSERT_MSG (Simulator::GetContext () == GetId (), "Received packet with erroneous context ; " <<
"make sure the channels in use are correctly updating events context " <<
"when transfering events from one node to another.");
NS_LOG_DEBUG("Node " << GetId () << " ReceiveFromDevice: dev "
<< device->GetIfIndex () << " (type=" << device->GetInstanceTypeId ().GetName ()
<< ") Packet UID " << packet->GetUid ());
bool found = false;
for (ProtocolHandlerList::iterator i = m_handlers.begin ();
i != m_handlers.end (); i++)
{
if (i->device == 0 ||
(i->device != 0 && i->device == device))
{
if (i->protocol == 0 ||
i->protocol == protocol)
{
if (promiscuous == i->promiscuous)
{
i->handler (device, packet, protocol, from, to, packetType);
found = true;
}
}
}
}
return found;
}
}//namespace ns3
I'd be thankful if someone could help me with this.
Best,
Bahar

You cannot instantiate an object that has a virtual function declared as =0.
You can have pointers to them, but the pointed-to objects will actually have a type that inherits from your base class.
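A minimal sketch of that rule, independent of ns-3: AbstractNode, ConcreteNode and make_node are hypothetical names used only for illustration. The base class has a pure virtual function and therefore cannot be instantiated, but a pointer to it can refer to an object of a derived class that overrides the function.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical base class with a pure virtual function: it is abstract.
struct AbstractNode {
    virtual ~AbstractNode() = default;
    virtual std::vector<double> GetmyPosition() const = 0;
};

// Hypothetical derived class that overrides the pure virtual function.
struct ConcreteNode : AbstractNode {
    std::vector<double> exPosition{0.0, 0.0, 0.0};
    std::vector<double> GetmyPosition() const override { return exPosition; }
};

// "AbstractNode n;" would fail to compile with the same kind of error
// as in the question; the derived class can be instantiated normally.
static_assert(std::is_abstract<AbstractNode>::value, "base is abstract");
static_assert(!std::is_abstract<ConcreteNode>::value, "derived is concrete");

// A base-class pointer whose pointee is really a ConcreteNode.
std::unique_ptr<AbstractNode> make_node() {
    return std::unique_ptr<AbstractNode>(new ConcreteNode());
}
```

Calling make_node()->GetmyPosition() dispatches to ConcreteNode::GetmyPosition through the base pointer.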

GetmyPosition is defined, so does it need to be pure virtual? Have you tried removing =0 from the function declaration?
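A minimal sketch of that suggestion, assuming a stand-alone class rather than the real ns3::Node (PositionedNode is a hypothetical name): the getter is declared virtual without "= 0" and defined out of line with a return type that matches the member. Two details in the question's snippet look worth checking as well: the declaration returns Vector while the member is a std::vector<double>, and the definition in node.cc has no return type. Separately, calling at() on an empty std::vector throws std::out_of_range, so the sketch sizes the vector in the constructor first.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the Node class from the question.
class PositionedNode {
public:
    // Give the vector three zeroed entries up front, so later
    // at(0)..at(2) accesses are within range.
    PositionedNode() : exPosition(3, 0.0) {}
    virtual ~PositionedNode() {}

    // Virtual but not pure: no "= 0", so the class stays instantiable.
    virtual std::vector<double> GetmyPosition() const;

    std::vector<double> exPosition;
};

// Out-of-line definition: same return type, class name and const
// qualifier as the declaration above.
std::vector<double> PositionedNode::GetmyPosition() const {
    return exPosition;
}

static_assert(!std::is_abstract<PositionedNode>::value,
              "the class can be instantiated once the function is not pure");
```

With this shape, "PositionedNode n;" compiles and n.GetmyPosition() returns a copy of the three stored coordinates.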
