I have an application that is run in a for loop:
// initialization
for (std::vector< VerifObj >::const_iterator itVOV = verifObjVector.begin(); itVOV != verifObjVector.end(); itVOV++)
{
// run my application for itVOV
std::cout << "\b\b\b\b" << std::setw(3) << static_cast< int >(100.f * ++photoCntr / verifObjVecSz) << "%"
<< std::flush;
std::this_thread::sleep_for(std::chrono::milliseconds(30));
}
std::cout << "\b\b\b\b" << std::setw(3) << "100%" << std::endl;
Because each iteration takes several minutes, I would like to make it multithreaded so that it runs faster. I am a beginner in multithreading, so how should I go about it?
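One possible direction, as a minimal sketch rather than a definitive answer: if the iterations are independent of each other, the indices can be handed out to a small pool of std::thread workers through an atomic counter. The helper name parallelFor and its parameters are made up for illustration, and the sketch assumes that work(i) neither throws nor touches data shared with other iterations.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls work(i) once for every i in [0, count),
// spreading the indices over numThreads worker threads.
// Returns the number of items that were processed.
std::size_t parallelFor(std::size_t count, unsigned numThreads,
                        const std::function<void(std::size_t)>& work)
{
    std::atomic<std::size_t> next{0};  // next index to hand out
    std::atomic<std::size_t> done{0};  // finished items, usable for a progress display

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= count) break;     // nothing left to do
            work(i);                   // assumed not to throw
            ++done;
        }
    };

    numThreads = std::max(1u, numThreads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();  // wait for every worker before returning
    return done.load();
}
```

With the loop above, count would be verifObjVector.size() and work(i) would run the application for verifObjVector[i]. A reasonable starting value for numThreads is std::thread::hardware_concurrency(), and the percentage output would have to come from a single thread reading a counter like done.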
I'm trying to write a simple single-header benchmarker, and I understand that std::clock gives me the CPU time that a process actually uses (as opposed to wall-clock time).
So, given the following simplified program:
int main() {
using namespace std::literals::chrono_literals;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
std::this_thread::sleep_for(1s);
// clobber();
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
Demo
We get the following output:
cpu: 4820 4839 1.9e-05 s
wall: 1.00007 s
I just want to confirm that the CPU time covers only the code my process executes itself and not the sleep_for call, since the sleeping is handled by the kernel, which std::clock doesn't track. To check, I changed what I was timing:
int main() {
using namespace std::literals::chrono_literals;
int value = 0;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
for (int i = 0; i < 1000000; ++i) {
srand(value);
value = rand();
}
// clobber();
std::cout << "value = " << value << std::endl;
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
Demo
This gave me an output of:
cpu: 4949 1398224 1.39328 s
wall: 2.39141 s
value = 354531795
So far, so good. I then tried this on my Windows box using MSYS2's g++ compiler. The last program gave me this output:
value = 0
cpu: 15 15 0 s
wall: 0.0080039 s
std::clock() is always outputting 15? Is the compiler implementation of std::clock() broken?
It turns out I had assumed that CLOCKS_PER_SEC would be the same everywhere. However, on the MSYS2 compiler it was 1000x smaller than on godbolt.org.
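To illustrate the point about CLOCKS_PER_SEC, here is a small sketch that always converts tick counts through the macro instead of comparing raw std::clock() values between platforms. The helper names ticksToSeconds and cpuSeconds are made up for this example.

```cpp
#include <ctime>

// Converts a std::clock() tick difference to seconds.
// CLOCKS_PER_SEC is implementation-defined, so raw tick counts
// from two different platforms are not directly comparable.
double ticksToSeconds(std::clock_t ticks)
{
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Measures the CPU time consumed while running a callable.
template <class F>
double cpuSeconds(F&& f)
{
    const std::clock_t start = std::clock();
    f();
    return ticksToSeconds(std::clock() - start);
}
```

Printing CLOCKS_PER_SEC once at startup also makes it obvious how coarse the tick values can be on a given platform, which matters when the timed region is only a few milliseconds long.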
On a project I'm working on, we are trying to test the line_walk() function and work with the Line_face_circulator. Is there a way to visualise the faces in the Line_face_circulator? The problem I am having is figuring out how to extract information from the circulator. I've tried some of the methods below, which didn't work, and the documentation for the LFC is a bit difficult to navigate.
Delaunay::Line_face_circulator lfc = dt.line_walk(Point_2(13, 166), Point_2(42, 1761));
std::cout << "linewalk res" << std::endl;
Container container(lfc);
Iterator i = container.begin();
std::cout << typeid(i).name() << std::endl;
//std::cout << (container.begin()->vertex(0)) << std::endl;
std::cout << lfc.is_empty() << std::endl;
//std::cout << lfc->vertex(0) << std::endl;
//std::cout << *i->vertex(0)->point() << std::endl;
while (i != container.end()){
std::cout << *i << std::endl;
i++;
}
May I ask if there are particular methods to extract the faces or the vertices of the faces from the line_face_circulator?
Thank you!
Something like the following should work:
Delaunay::Line_face_circulator lfc = dt.line_walk(Point_2(13, 166), Point_2(42, 1761)), start=lfc;
std::vector<Delaunay::Face_handle> visited_faces;
if (lfc!=nullptr)
{
do{
visited_faces.push_back(lfc);
} while(++lfc!=start);
}
Circulator doc is here.
I'm seeing unexpected performance with my OpenCL code (more precisely, I use boost::compute 1.67.0). For now, I just want to add the elements of two buffers: c[i] = a[i] + b[i].
I noticed a slowdown compared to an existing SIMD implementation, so I timed each step separately to find out which one is time-consuming. Here is my code sample:
Chrono chrono2;
chrono2.start();
Chrono chrono;
ipReal64 elapsed;
// creating the OpenCL context and other stuff
// ...
std::string kernel_src = BOOST_COMPUTE_STRINGIZE_SOURCE(
__kernel void add_knl(__global const uchar* in1, __global const uchar* in2, __global uchar* out)
{
size_t idx = get_global_id(0);
out[idx] = in1[idx] + in2[idx];
}
);
boost::compute::program* program = new boost::compute::program;
try {
chrono.start();
*program = boost::compute::program::create_with_source(kernel_src, context);
elapsed = chrono.elapsed();
std::cout << "Create program : " << elapsed << "s" << std::endl;
chrono.start();
program->build();
elapsed = chrono.elapsed();
std::cout << "Build program : " << elapsed << "s" << std::endl;
}
catch (boost::compute::opencl_error& e) {
std::cout << "Error building program : " << std::endl << program->build_log() << std::endl << e.what() << std::endl;
return;
}
boost::compute::kernel* kernel = new boost::compute::kernel;
try {
chrono.start();
*kernel = program->create_kernel("add_knl");
elapsed = chrono.elapsed();
std::cout << "Create kernel : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error creating kernel : " << std::endl << e.what() << std::endl;
return;
}
try {
chrono.start();
// Pass the argument to the kernel
kernel->set_arg(0, bufIn1);
kernel->set_arg(1, bufIn2);
kernel->set_arg(2, bufOut);
elapsed = chrono.elapsed();
std::cout << "Set args : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error setting kernel arguments: " << std::endl << e.what() << std::endl;
return;
}
try {
chrono.start();
queue.enqueue_1d_range_kernel(*kernel, 0, sizeX*sizeY, 0);
elapsed = chrono.elapsed();
std::cout << "Kernel calculation : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error executing kernel : " << std::endl << e.what() << std::endl;
return;
}
std::cout << "[Function] Full duration " << chrono2.elapsed() << std::endl;
chrono.start();
delete program;
elapsed = chrono.elapsed();
std::cout << "Delete program : " << elapsed << "s" << std::endl;
delete kernel;
elapsed = chrono.elapsed();
std::cout << "Delete kernel : " << elapsed << "s" << std::endl;
And here is a sample result (I run my program on an NVIDIA GeForce GT 630 with the NVIDIA SDK Toolkit):
Create program : 0.0013123s
Build program : 0.0015421s
Create kernel : 6.6e-06s
Set args : 1.7e-06s
Kernel calculation : 0.0001639s
[Function] Full duration : 0.0077794
Delete program : 4.1e-06s
Delete kernel : 0.0879901s
I know my program is simple, and I don't expect the kernel execution to be the most time-consuming step. However, I thought deleting the kernel would take only a few ms, like creating or building the program.
Is this a normal behaviour?
Thanks
I'll point out that I've never used boost::compute, but it looks like it's a fairly thin wrapper over OpenCL, so the following should be correct:
Enqueueing the kernel does not wait for it to complete. The enqueue function returns an event, which you can then wait for, or you can wait for all tasks enqueued onto the queue to complete. You are timing neither of those things. What is likely happening is that when you destroy your kernel, it waits for all queued instances which are still pending to complete before returning from the destructor.
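The same effect can be reproduced without a GPU. The sketch below is only an analogy built on std::async from the standard library, not OpenCL or boost::compute code: starting the task returns almost immediately, and the real cost only shows up when you wait for it. In OpenCL terms, the wait corresponds to waiting on the returned event or calling clFinish on the queue. The names Timings and timeAsyncTask are made up for the example.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Seconds elapsed since the given time point.
double secondsSince(Clock::time_point start)
{
    return std::chrono::duration<double>(Clock::now() - start).count();
}

struct Timings {
    double launch;  // time spent starting the asynchronous task
    double wait;    // time spent waiting for it to finish
};

// Starts a task that sleeps for `work` and times launch and wait separately.
Timings timeAsyncTask(std::chrono::milliseconds work)
{
    Clock::time_point t0 = Clock::now();
    std::future<void> pending = std::async(std::launch::async, [work] {
        std::this_thread::sleep_for(work);  // stands in for the kernel execution
    });
    const double launch = secondsSince(t0);  // only the cost of starting the task

    t0 = Clock::now();
    pending.wait();                          // blocks until the task has finished
    const double wait = secondsSince(t0);
    return {launch, wait};
}
```

If the explicit wait is left out, the time does not disappear. It is simply charged to whichever later call happens to block, which matches the long "Delete kernel" measurement in the question.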
This is my third question on this topic. Instead of asking a new question in the comments, I thought it would be better to start a new thread.
The full code can be found here:
C++ CvSeq Accessing arrays that are stored
Using the following code, I can display the most recent vector that has been added to the RECT array (note that this is placed inside the for loop):
RECT& lastRect = detectBox->back();
std::cout << "Left: " << lastRect.left << std::endl;
std::cout << "Right: " << lastRect.right << std::endl;
std::cout << "Top: " << lastRect.top << std::endl;
std::cout << "Bottom: " << lastRect.bottom << std::endl;
What I am now trying to do is create a loop outside of this for loop that will display all of the vectors present in detectBox. I haven't been able to determine how many vectors are actually present in the array, and therefore cannot loop through them.
I tried using the following:
int i = 0;
while ((*detectBox)[i].left!=NULL)
{
std::cout << "Left: " << (*detectBox)[i].left << std::endl;
std::cout << "Right: " << (*detectBox)[i].right << std::endl;
std::cout << "Top: " << (*detectBox)[i].top << std::endl;
std::cout << "Bottom: " << (*detectBox)[i].bottom << std::endl;
i++;
}
I have also tried playing around with sizeof(*detectBox), but it only ever returns 32.
Okay, you are using the wrong terms here. The variable detectBox is a vector (or rather a pointer to a vector it seems). There are three ways to iterate over it (I'll show them a little later). It is not an array, it is not an array of vectors. It is a pointer to a vector of RECT structures.
Now as for how to iterate over the vector: you iterate over it like any other vector.
The first way is to use the C way, by using indexes:
for (unsigned i = 0; i < detectBox->size(); ++i)
{
RECT rect = detectBox->at(i);
std::cout << "Left: " << rect.left << std::endl;
...
}
The second way is the traditional C++ way using iterators:
for (std::vector<RECT>::iterator i = detectBox->begin();
i != detectBox->end();
++i)
{
std::cout << "Left: " << i->left << std::endl;
...
}
The last way is to use range for loops introduced in the C++11 standard:
for (RECT const& rect : *detectBox)
{
std::cout << "Left: " << rect.left << std::endl;
...
}
The problem with your loop attempt, with the condition (*detectBox)[i].left!=NULL, is that the member variable left is not a pointer, and that when you go out of bounds you are not guaranteed to find a "NULL" value (reading past the end is undefined behavior, so the value will be indeterminate and seem random).
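As for the sizeof attempt: sizeof(*detectBox) is the size of the vector object itself, which is fixed at compile time and does not change as elements are added. The element count comes from size(). Here is a small sketch that uses a stand-in Rect struct (hypothetical, so that it builds without the Windows headers) and a made-up helper totalWidth:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the Windows RECT structure, same member order.
struct Rect {
    long left, top, right, bottom;
};

// Sums the widths of all rectangles. The loop bound comes from size(),
// not from probing elements for a sentinel value.
long totalWidth(const std::vector<Rect>* boxes)
{
    long sum = 0;
    for (std::size_t i = 0; i < boxes->size(); ++i)
        sum += (*boxes)[i].right - (*boxes)[i].left;
    return sum;
}
```

Calling boxes->size() before and after a push_back shows the count growing by one, while sizeof(*boxes) stays the same.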
//function creating my class and thread
extractor = new FeatureExtractor(receiveBufferCurrent);
HANDLE hth1;
unsigned uiThread1ID;
hth1 = (HANDLE)_beginthreadex(NULL,
0,
FeatureExtractor::ThreadStaticEntryPoint,
extractor,
CREATE_SUSPENDED,
&uiThread1ID);
//Header file
class FeatureExtractor
{
private:
float sensorData[200][10];
public:
FeatureExtractor(float receiveBufferCurrent[][10]);
~FeatureExtractor();
//Thread for parallel input and motion detection
static unsigned int __stdcall ThreadStaticEntryPoint(void * pThis);
void ThreadEntryPoint();
void outputTest();
};
FeatureExtractor::FeatureExtractor(float receiveBufferCurrent[][10])
{
memcpy(sensorData, receiveBufferCurrent, sizeof(sensorData));
}
unsigned __stdcall FeatureExtractor::ThreadStaticEntryPoint(void * pThis)
{
FeatureExtractor * pthX = (FeatureExtractor*)pThis;
pthX->ThreadEntryPoint();
return 1;
}
void FeatureExtractor::ThreadEntryPoint()
{
outputTest();
}
//output function
for (int i = 0; i < 200; i = i + 50)
{
std::cout << "-------------------------------------------------------------------------" << std::endl;
std::cout << "AccelX=" << sensorData[i][1] << ", AccelY=" << sensorData[i][2] << ", AccelZ=" << sensorData[i][3] << std::endl;
std::cout << std::endl;
std::cout << "MagX=" << sensorData[i][4] << ", MagY=" << sensorData[i][5] << ", MagZ=" << sensorData[i][6] << std::endl;
std::cout << std::endl;
std::cout << "GyroX=" << sensorData[i][7] << ", GyroY=" << sensorData[i][8] << ", GyroZ=" << sensorData[i][9] << std::endl;
std::cout << std::endl;
std::cout << "-------------------------------------------------------------------------" << std::endl;
}
I have a problem accessing the float array "sensorData" inside a thread.
If I output the sensorData array inside the constructor, everything is fine, but if I access it from inside my thread, the array just contains -1.58839967e+038, which I guess means that I cannot access my array this way from a thread.
What am I doing wrong?
I got the thread code from a tutorial which accesses class member variables in the same way although just integers not multi dimensional arrays.
I tried to minimize the length of my code snippets while keeping the important parts, I'm thankful for anybody taking the time to analyze my code!
I found the solution myself now.
WaitForSingleObject(hth1, INFINITE);
Once I waited for my thread, the issue was resolved.
The issue occurred because I deleted my class instance before the thread could finish executing.
Simply removing the delete statement also worked.
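For comparison, here is a minimal sketch of the same lifetime rule using std::thread instead of _beginthreadex. It is a simplified stand-in, not my original code: the class Extractor and the function processOnThread are made up, and join() plays the role of WaitForSingleObject(hth1, INFINITE).

```cpp
#include <cstring>
#include <memory>
#include <thread>

// Simplified stand-in for FeatureExtractor: copies the buffer and sums it on a thread.
class Extractor {
    float sensorData[200][10];
    float sum = 0.f;

public:
    explicit Extractor(float src[][10])
    {
        std::memcpy(sensorData, src, sizeof(sensorData));
    }

    // Runs on the worker thread and reads the member array.
    void run()
    {
        for (auto& row : sensorData)
            for (float v : row) sum += v;
    }

    float result() const { return sum; }
};

float processOnThread(float src[][10])
{
    std::unique_ptr<Extractor> extractor(new Extractor(src));
    std::thread worker(&Extractor::run, extractor.get());
    worker.join();                // wait until the thread is done with the object
    return extractor->result();   // safe: the worker has finished, the object is still alive
}                                 // the Extractor is destroyed only here
```

Destroying the object before join() returns would let the thread read freed memory, which is the kind of garbage value I was seeing.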