physx multithreading copy transform data - c++

I have a scene with tons of similar objects moved by physx, and i want to draw all of this using opengl instansing. So, i need to form a array with transform data of each object and pass it in to opengl shader. And, currently, filling an array is bottleneck of my app, because physx simulation using 16 thread, but creating array use just one thread.
So, i created data_transfer_task class, which contain two indexes, start and stop and move transform data of physx objects between this indexes to array.
class data_transfer_task : public physx::PxTask {
public:
int start;
int stop;
start_transfer_task* base_task;
data_transfer_task(int start, int stop, start_transfer_task* task, physx::PxTaskManager *mtm) :physx::PxTask() {
this->start = start;
this->stop = stop;
this->mTm = mtm;
base_task = task;
}
void update_transforms();
virtual const char* getName() const { return "data_transfer_task"; }
virtual void run();
};
void data_transfer_task::update_transforms() {
for (int i = start; i < stop; i++) {
auto obj = base_task->objects->at(i);
auto transform = obj->getGlobalPose();
DrawableObject* dr = (DrawableObject*)obj->userData;
auto pos = transform.p;
auto rot = transform.q;
dr->set_position(glm::vec3(pos.x, pos.y, pos.z));
dr->set_rotation(glm::quat(rot.w, rot.x, rot.y, rot.z));
}
}
void data_transfer_task::run() { update_transforms(); }
I created another class start_transfer_task, which creates and sheduled tasks according to thread count.
class start_transfer_task : public physx::PxLightCpuTask{
public:
start_transfer_task(physx::PxCpuDispatcher* disp, std::vector<physx::PxRigidDynamic*>* obj, physx::PxTaskManager* mtm) :physx::PxLightCpuTask() {
this->mTm = mtm;
this->dispatcher = disp;
this->objects = obj;
}
physx::PxCpuDispatcher* dispatcher;
std::vector<physx::PxRigidDynamic*>* objects;
void start();
virtual const char* getName() const { return "start_transfer_task"; }
virtual void run();
};
void start_transfer_task::start() {
int thread_count = dispatcher->getWorkerCount();
int obj_count = objects->size();
int batch_size = obj_count / thread_count;
int first_size = batch_size + obj_count % thread_count;
auto task = new data_transfer_task(0, first_size, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
if (batch_size > 0) {
for (int i = 1; i < thread_count; i++) {
task = new data_transfer_task(first_size + batch_size * (i - 1), first_size + batch_size * i, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
}
}
}
void data_transfer_task::run() { update_transforms(); }
I create start_transfer_task instance before call simulate, pass start_transfer_task to simulate, and i expect that start_transfer_task should run after all physx task done its own job, so write and read api calls dont owerlap, and calling fetchResults(block=true) continue execution only where all of my tasks finish copy transform data.
while (is_simulate) {
auto transfer_task = new start_transfer_task(gScene->getCpuDispatcher(), &objects, gScene->getTaskManager());
gScene->simulate(1.0f / 60.0f, transfer_task);
gScene->fetchResults(true);
//some other logic to call graphics api and sleep to sustain 30 updates per second
But i got many warnings about read and write api call owerlapping like this.
\physx\source\physx\src\NpWriteCheck.cpp (53) : invalid operation : Concurrent API write call or overlapping API read and write call detected during physx::NpScene::simulateOrCollide from thread 8492! Note that write operations to the SDK must be sequential, i.e., no overlap with other write or read calls, else the resulting behavior is undefined. Also note that API writes during a callback function are not permitted.
And, sometimes after start my app, i got a strange assert message.
physx\source\task\src\TaskManager.cpp(195) : Assertion failed: !mPendingTasks"
So, what i doing wrong ?

The concurrent API call warning is essentially telling you that you are calling for multiple thread PhysX API functions that are supposed to be single threaded.
Using the PhysX API you have to be very careful because it is not thread safe, and the thread safeness is left to the user.
Read this for more information.

Related

How can I create optimized graphical multicolor logger?

I am setting up a logger class and want to optimize logging.
It has to be multicolor logger, so std::string::append(...) is not an option.
Adding new log to a vector of strings is not a good idea, because every push_back memory rises and fps goes down. I thought to create Log struct that would hold string msg and color or flag that inform us what kind of message is that and double buffer Log struct. Write to the first then pass it to the second Log object and draw from it, then clear first Log object ... and so on. I tried to implement it, but it does not work as I wish, though.
At the moment I left vector of Logs
class Logger
{
public:
struct Log {
std::string text;
Uint color;
}
void Draw() {
for(const auto& log : logs) {
renderer->DrawString(log.text, log.color);
}
}
void AddLog(const std::string& text, Uint color) {
logs.emplace_back(text, color);
}
std::vector<Log> logs;
};
int main() {
//window stuff, opengl context, etc.
Logger logger;
while(!quit) {
// Do not flood logger with logs, just add it sometimes
static double lastTime = -1.0;
if(time - lastTime >= 0.20f) {
logger.AddLog("Log", 0xff00ffff);
lastTime = time;
}
logger.Draw();
}
return 0;
}
we do not pass position to renderere->DrawString(...), because it gets automatically moved down to another line.
This approach works, but with very, very, very, very, superb poor speed.
How could I optimize it? I would like to get something like cs go console has. It is also a multicolor logger and it can log massive message with no fps drops.
In order to avoid reallocations of your vector when you push log messages to your logger, you can use a ring buffer structure. It preallocates memory for N message and it stores only the latest N messages pushed. A simple implementation can be as follows:
template <std::size_t bufferSize>
class Logger {
public:
struct Data {
std::string msg = " - ";
uint32_t color = 0x00000000;
};
void addLog(Data && item) {
buffer_[head_] = std::move(item);
if (++head_ >= bufferSize) head_ -= bufferSize;
}
void draw() const {
for (std::size_t i = 0; i < bufferSize; ++i) {
auto idx = head_ + i;
if (idx >= bufferSize) idx -= bufferSize;
// print the log whatever way you like, for example:
printf("%s\n", buffer_[idx].msg.c_str());
}
}
private:
std::size_t head_ = 0;
std::array<Data, bufferSize> buffer_;
};
Here, the template parameter bufferSize specifies the size of the ring buffer. The draw() method processes the oldest messages first, so your newest logs will be at the bottom.
Here is a live example: link

Waiting-time of thread switches systematicly between 0 and 30000 microseconds for the same task

I'm writing a little Console-Game-Engine and for better performance I wanted 2 threads (or more but 2 for this task) using two buffers. One thread is drawing the next frame in the first buffer while the other thread is reading the current frame from the second buffer. Then the buffers get swapped.
Of cause I can only swap them if both threads finished their task and the drawing/writing thread happened to be the one waiting. But the time it is waiting systematicly switches more or less between two values, here a few of the messurements I made (in microseconds):
0, 36968, 0, 36260, 0, 35762, 0, 38069, 0, 36584, 0, 36503
It's pretty obvious that this is not a coincidence but I wasn't able to figure out what the problem was as this is the first time I'm using threads.
Here the code, ask for more if you need it, I think it's too much to post it all:
header-file (Manager currently only adds a pointer to my WinAppBase-class):
class SwapChain : Manager
{
WORD *pScreenBuffer1, *pScreenBuffer2, *pWritePtr, *pReadPtr, *pTemp;
bool isRunning, writingFinished, readingFinished, initialized;
std::mutex lockWriting, lockReading;
std::condition_variable cvWriting, cvReading;
DWORD charsWritten;
COORD startPosition;
int screenBufferWidth;
// THREADS (USES NORMAL THREAD AS SECOND THREAD)
void ReadingThread();
// THIS FUNCTION IS ONLY FOR INTERN USE
void SwapBuffers();
public:
// USE THESE TO CONTROL WHEN THE BUFFERS GET SWAPPED
void BeginDraw();
void EndDraw();
// PUT PIXEL | INLINED FOR BETTER PERFORMANCE
inline void PutPixel(short xPos, short yPos, WORD color)
{
this->pWritePtr[(xPos * 2) + yPos * screenBufferWidth] = color;
this->pWritePtr[(xPos * 2) + yPos * screenBufferWidth + 1] = color;
}
// GENERAL CONTROL OVER SWAP CHAIN
void Initialize();
void Run();
void Stop();
// CONSTRUCTORS
SwapChain(WinAppBase * pAppBase);
virtual ~SwapChain();
};
Cpp-file
SwapChain::SwapChain(WinAppBase * pAppBase)
:
Manager(pAppBase)
{
this->isRunning = false;
this->initialized = false;
this->pReadPtr = NULL;
this->pScreenBuffer1 = NULL;
this->pScreenBuffer2 = NULL;
this->pWritePtr = NULL;
this->pTemp = NULL;
this->charsWritten = 0;
this->startPosition = { 0, 0 };
this->readingFinished = 0;
this->writingFinished = 0;
this->screenBufferWidth = this->pAppBase->screenBufferInfo.dwSize.X;
}
SwapChain::~SwapChain()
{
this->Stop();
if (_CrtIsValidHeapPointer(pReadPtr))
delete[] pReadPtr;
if (_CrtIsValidHeapPointer(pScreenBuffer1))
delete[] pScreenBuffer1;
if (_CrtIsValidHeapPointer(pScreenBuffer2))
delete[] pScreenBuffer2;
if (_CrtIsValidHeapPointer(pWritePtr))
delete[] pWritePtr;
}
void SwapChain::ReadingThread()
{
while (this->isRunning)
{
this->readingFinished = 0;
WriteConsoleOutputAttribute(
this->pAppBase->consoleCursor,
this->pReadPtr,
this->pAppBase->screenBufferSize,
this->startPosition,
&this->charsWritten
);
memset(this->pReadPtr, 0, this->pAppBase->screenBufferSize);
this->readingFinished = true;
this->cvWriting.notify_all();
if (!this->writingFinished)
{
std::unique_lock<std::mutex> lock(this->lockReading);
this->cvReading.wait(lock);
}
}
}
void SwapChain::SwapBuffers()
{
this->pTemp = this->pReadPtr;
this->pReadPtr = this->pWritePtr;
this->pWritePtr = this->pTemp;
this->pTemp = NULL;
}
void SwapChain::BeginDraw()
{
this->writingFinished = false;
}
void SwapChain::EndDraw()
{
TimePoint tpx1, tpx2;
tpx1 = Clock::now();
if (!this->readingFinished)
{
std::unique_lock<std::mutex> lock2(this->lockWriting);
this->cvWriting.wait(lock2);
}
tpx2 = Clock::now();
POST_DEBUG_MESSAGE(std::chrono::duration_cast<std::chrono::microseconds>(tpx2 - tpx1).count(), "EndDraw wating time");
SwapBuffers();
this->writingFinished = true;
this->cvReading.notify_all();
}
void SwapChain::Initialize()
{
if (this->initialized)
{
POST_DEBUG_MESSAGE(Result::CUSTOM, "multiple initialization");
return;
}
this->pScreenBuffer1 = (WORD *)malloc(sizeof(WORD) * this->pAppBase->screenBufferSize);
this->pScreenBuffer2 = (WORD *)malloc(sizeof(WORD) * this->pAppBase->screenBufferSize);
for (int i = 0; i < this->pAppBase->screenBufferSize; i++)
{
this->pScreenBuffer1[i] = 0x0000;
}
for (int i = 0; i < this->pAppBase->screenBufferSize; i++)
{
this->pScreenBuffer2[i] = 0x0000;
}
this->pWritePtr = pScreenBuffer1;
this->pReadPtr = pScreenBuffer2;
this->initialized = true;
}
void SwapChain::Run()
{
this->isRunning = true;
std::thread t1(&SwapChain::ReadingThread, this);
t1.detach();
}
void SwapChain::Stop()
{
this->isRunning = false;
}
This is where I run the SwapChain-class from:
void Application::Run()
{
this->engine.graphicsmanager.swapChain.Initialize();
Sprite<16, 16> sprite(&this->engine);
sprite.LoadSprite("engine/resources/TestData.xml", "root.test.sprites.baum");
this->engine.graphicsmanager.swapChain.Run();
int a, b, c;
for (int i = 0; i < 60; i++)
{
this->engine.graphicsmanager.swapChain.BeginDraw();
for (c = 0; c < 20; c++)
{
for (a = 0; a < 19; a++)
{
for (b = 0; b < 10; b++)
{
sprite.Print(a * 16, b * 16);
}
}
}
this->engine.graphicsmanager.swapChain.EndDraw();
}
this->engine.graphicsmanager.swapChain.Stop();
_getch();
}
The for-loops above simply draw the sprite 20 times from the top-left corner to the bottom-right corner of the console - the buffers don't get swapped during that, and that again for a total of 60 times (so the buffers get swapped 60 times).
sprite.Print uses the PutPixel function of SwapChain.
Here the WinAppBase (which consits more or less of global-like variables)
class WinAppBase
{
public:
// SCREENBUFFER
CONSOLE_SCREEN_BUFFER_INFO screenBufferInfo;
long screenBufferSize;
// CONSOLE
DWORD consoleMode;
HWND consoleWindow;
HANDLE consoleCursor;
HANDLE consoleInputHandle;
HANDLE consoleHandle;
CONSOLE_CURSOR_INFO consoleCursorInfo;
RECT consoleRect;
COORD consoleSize;
// FONT
CONSOLE_FONT_INFOEX fontInfo;
// MEMORY
char * pUserAccessDataPath;
public:
void reload();
WinAppBase();
virtual ~WinAppBase();
};
There are no errors, simply this alternating waitng time.
Maybe you'd like to start by looking if I did the synchronisation of the threads correctly? I'm not exactly sure how to use a mutex or condition-variables so it might comes from that.
Apart from that it is working fine, the sprites are shown as they should.
The clock you are using may have limited resolution. Here is a random example of a clock provided by Microsoft with 15 ms (15000 microsecond) resolution: Why are .NET timers limited to 15 ms resolution?
If one thread is often waiting for the other, it is entirely possible (assuming the above clock resolution) that it sometimes waits two clockticks and sometimes none. Maybe your clock only has 30 ms resolution. We really can't tell from the code. Do you get more precise measurements elsewhere with this clock?
There are also other systems in play such as the OS scheduler or whatever controls your std::threads. That one is (hopefully) much more granular, but how all these interactions play out doesn't have to be obvious or intuitive.

c/c++ get large size data like 180 array from another class in stm32

I have an 32-bit ARM Cortex M4 (the processor in Pixhawk) to write two classes, each one is one threading in Pixhawk codebase setting.
The first one is LidarScanner, which dealing with incoming serial data and generates "obstacle situation". The second one is Algorithm, which handle "obstacle situation" and take some planning strategy. Here are my solution right now, use the reference function LidarScanner::updateObstacle(uint8_t (&array)[181]) to update "obstacle situation" which is 181 size array.
LidarScanner.cpp:
class LidarScanner{
private:
struct{
bool available = false;
int AngleArr[181];
int RangeArr[181];
bool isObstacle[181] = {}; //1: unsafe; 0:safe;
}scan;
......
public:
LidarScanner();
//main function
void update()
{
while(hal.uartE->available()) //incoming serial data is available
{
decode_data(); //decode serial data into three kind data: Range, Angle and Period_flag
if(complete_scan()) //determine if the lidarscanner one period is completed
{
scan.available = false;
checkObstacle(); //check obstacle situation and store safety in isObstacle[181]
scan.available = true;
}
}
}
//for another API recall
void updateObstacle(uint8_t (&array)[181])
{
for(int i=0; i<=181; i++)
{
array[i]=scan.isObstacle[i];
}
}
//for another API recall
bool ScanAvailable() const { return scan.available; }
......
}
Algorithm.cpp:
class Algorithm{
private:
uint8_t Obatcle_Value[181] = {};
class LidarScanner& _lidarscanner;
......
public:
Algorithm(class LidarScanner& _lidarscanner);
//main funcation
void update()
{
if (hal.uartE->available() && _lidarscanner.ScanAvailable())
{
//Update obstacle situation into Algorithm phase and do more planning strategy
_lidarscanner.updateObstacle(Obatcle_Value);
}
}
......
}`
Usually, it works fine. But I want to improve the performances so that I want to know what's the most effective way to do that. thanks!!!!
The most efficient way to copy data is to use the DMA.
DMAx_Channelx->CNDTR = size;
DMAx_Channelx->CPAR = (uint32_t)&source;
DMAx_Channelx->CMAR = (uint32_t)&destination;
DMAx_Channelx->CCR = (0<<DMA_CCR_MSIZE_Pos) | (0<<DMA_CCR_PSIZE_Pos)
| DMA_CCR_MINC | DMA_CCR_PINC | DMA_CCR_MEM2MEM ;
while(!(DMAx->ISR & DMA_ISR_TCIFx ));
AN4031 Using the DMA controller.

std::map pass by reference Pointer to Object

I'm coding a plugin for XPLANE10 which gets a MSG from ROS.
My IDE is QTcreator 4.1.0 based QT 5.7.0 for Ubuntu 64 Bit. I would like to use C++11 Standards
My code explained
The main initializes ROS and creates a map -> container.
ROS spins in a loop till my GUI sends a MSG where my AirPlane should fly.
The MSG contains 3 floats(phi, theta, psi) where "phi" is the AirPlane ID, theta contains the ID for my ETA(Estimated Time of Arrival)
and psi contains the ID for my pose All of the IDs are saved in the ParameterServer(lookuptable).
So at the beginning i look up the activeAirplanes which returns a vector . I would like to store them in a map where the key is the AirCraft ID and the second param is an instance of the Object.
So i have initialized the for example(looked in container while debugging):
[0] first = 1 // Airplane ID1
[0] second = new CObject(freq)
[1] first = 2 // Airplane ID2
[1] second = new CObject(freq)
If i get a MSG from GUI
phi = 1
theta=2
psi=3
,
ROS will callback
MSG(....std::map<i32, CObject> &container)
// if phi is 1 so use the mapkey 1 and trigger the method do_stuff from CObject
do_stuff(phi, theta, psi,freq)
I would like to call the in a function from main
int getPlanes(std::map<i32,CObject>& container)
{
...
getActiveAirplanesFromServer(activePlanes);
}
First Question:
How do i pass the container to my callback?
Second Question:
How do i parallelize do_stuff() so my callback will return to main and i'm able to command more aircrafts while the others are calculated?
Third Question:
How would be the correct syntax for getPlanes to pass the container by reference so getPlanes() can edit it?
Fourth Question:
Is there a difference between
std::map<i32,CObject*> map
std::map<i32,CObject>* map
and
std::map<i32,CObject*>::iterator it=container->begin();
std::map<i32,CObject*>::iterator* it=container->begin();
If yes, what do i want ? #4Solved
// I have to edit stuff 'cause of some restrictions in my company.
#include "Header.h"
int main()
{
f64 freq = 10;
std::map<i32, CObject>* container;
std::map<i32,CObject>::iterator* it=container->begin();
// ROS
if(!ros::isInitialized())
{
int rosargc = 0;
char** rosargv = NULL;
ros::init(rosargc, rosargv, "MainNode");//), ros::init_options::AnonymousName);
}
else
{
printf("Ros has already been initialized.....\n");
}
ros::NodeHandle* mainNodeHandle=new ros::NodeHandle;
ros::AsyncSpinner spinner(2);
ParameterServer * ptrParam= new ParameterServer(mainNodeHandle);
ros::Subscriber airSub=mainNodeHandle->subscribe<own_msgs::ownStruct>("/MSG",
1000,
boost::bind(MSG,
_1,
freq,
container));
std::vector<i32> activePlanes;
i32 retVal=0;
retVal += ptrParam-> ParameterServer::getActiveAirplanesFromServer(activePlanes);
if (retVal == 0 && activePlanes.size()>0)
{
for (u32 j =0; j <activePlanes.size(); j++)
{
container->insert (std::pair<i32,CObject> (activePlanes[j] , new CObject(freq)));
}
}
while (ros::ok())
{
spinner.start(); //spinnt sehr viel :-)
ros::waitForShutdown ();
}
std::cout<<"ENDE"<<std::endl;
int retval = 1;
return retval;
}
void MSG(const own_msgs::ownStruct<std::allocator<void> >::ConstPtr &guiMSG,
f64 freq,
std::map<i32, CObject> &container)
{
if ((guiMSG->phi != 0) && (guiMSG->theta != 0) && (guiMSG->psi != 0))
{
std::string alpha = std::to_string(guiMSG->phi)+std::to_string(guiMSG->theta)+to_string(guiMSG->psi);
container.at(guiMSG->phi) -> do_stuff(guiMSG->phi,guiMSG->theta,guiMSG->psi, freq);
}
else
{
std::cout<<" Did not receive anything\n"<<endl;
}
}
void do_stuff(...)
{
//copy the IDs to private Member of this single Object
//setROS() for this single Object
//callback the current AC pose via ID from XPLANE
//callback the wished AC pose via ID from ParamServer
// do some calculations for optimum flight path
// publish the Route to XPlane
}
EDIT::
Problem is i get it to compile now and if debug it and set a breakpoint at :
void MSG(const own_msgs::ownStruct<std::allocator<void> >::ConstPtr &guiMSG,f64 freq,std::map<i32, CObject*> &container)
{
..
/*->*/ container.at(guiMSG->)...
}
The Container remains empty.
So i read some stuff about pointers and i saw my errors..
I confused * and &
if i want to pass the adress of a variable i have to write like
int main()
{
int a = 0;
AddTwo(&a)
cout<<a<<endl; // Output: 2
}
void AddTwo(int* a)
{
a+=2;
}

D parallel loop

First, how D create parallel foreach (underlying logic)?
int main(string[] args)
{
int[] arr;
arr.length = 100000000;
/* Why it is working?, it's simple foreach which working with
reference to int from arr, parallel function return ParallelForeach!R
(ParallelForeach!int[]), but I don't know what it is.
Parallel function is part od phobos library, not D builtin function, then what
kind of magic is used for this? */
foreach (ref e;parallel(arr))
{
e = 100;
}
foreach (ref e;parallel(arr))
{
e *= e;
}
return 0;
}
And second, why it is slower then simple foreach?
Finally, If I create my own taskPool (and don't use global taskPool object), program never end. Why?
parallel returns a struct (of type ParallelForeach) that implements the opApply(int delegate(...)) foreach overload.
when called the struct submits a parallel function to the private submitAndExecute which submits the same task to all threads in the pool.
this then does:
scope(failure)
{
// If an exception is thrown, all threads should bail.
atomicStore(shouldContinue, false);
}
while (atomicLoad(shouldContinue))
{
immutable myUnitIndex = atomicOp!"+="(workUnitIndex, 1);
immutable start = workUnitSize * myUnitIndex;
if(start >= len)
{
atomicStore(shouldContinue, false);
break;
}
immutable end = min(len, start + workUnitSize);
foreach(i; start..end)
{
static if(withIndex)
{
if(dg(i, range[i])) foreachErr();
}
else
{
if(dg(range[i])) foreachErr();
}
}
}
where workUnitIndex and shouldContinue are shared variables and dg is the foreach delegate
The reason it is slower is simply because of the overhead required to pass the function to the threads in the pool and atomically accessing the shared variables.
the reason your custom pool doesn't shut down is likely you don't shut down the threadpool with finish