I've been using pthreads, but I've realized that my code takes the same amount of time whether I use 1 thread or split the task into 1/N chunks across N threads. To illustrate, I reduced my code to this example:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <boost/progress.hpp>

#define SIZEEXEC 200000000

using namespace boost;
using std::cout;
using std::endl;

typedef struct t_d{
    int intArg;
} Thread_data;

void* function(void *threadarg)
{
    Thread_data *my_data = (Thread_data *) threadarg;
    int size = my_data->intArg;
    int i = 0;
    unsigned rand_state = 0;
    for(i = 0; i < size; i++) rand_r(&rand_state);
    return 0;
}

void withOutThreads(void)
{
    Thread_data* t1 = new Thread_data();
    t1->intArg = SIZEEXEC/3;
    function((void *) t1);
    Thread_data* t2 = new Thread_data();
    t2->intArg = SIZEEXEC/3;
    function((void *) t2);
    Thread_data* t3 = new Thread_data();
    t3->intArg = SIZEEXEC/3;
    function((void *) t3);
}
void withThreads(void)
{
    pthread_t* h1 = new pthread_t;
    pthread_t* h2 = new pthread_t;
    pthread_t* h3 = new pthread_t;
    pthread_attr_t* atr = new pthread_attr_t;
    pthread_attr_init(atr);
    pthread_attr_setscope(atr, PTHREAD_SCOPE_SYSTEM);

    Thread_data* t1 = new Thread_data();
    t1->intArg = SIZEEXEC/3;
    pthread_create(h1, atr, function, (void *) t1);
    Thread_data* t2 = new Thread_data();
    t2->intArg = SIZEEXEC/3;
    pthread_create(h2, atr, function, (void *) t2);
    Thread_data* t3 = new Thread_data();
    t3->intArg = SIZEEXEC/3;
    pthread_create(h3, atr, function, (void *) t3);

    pthread_join(*h1, 0);
    pthread_join(*h2, 0);
    pthread_join(*h3, 0);

    pthread_attr_destroy(atr);
    delete h1;
    delete h2;
    delete h3;
    delete atr;
}
int main(int argc, char *argv[])
{
    bool multThread = bool(atoi(argv[1]));
    if(!multThread){
        cout << "NO THREADS" << endl;
        progress_timer timer;
        withOutThreads();
    }
    else {
        cout << "WITH THREADS" << endl;
        progress_timer timer;
        withThreads();
    }
    return 0;
}
Either the code is wrong, or something on my system is preventing parallel execution. I'm running on Ubuntu 11.10 x86_64-linux-gnu, gcc 4.6, Intel® Xeon(R) CPU E5620 @ 2.40GHz × 4.
Thanks for any advice!
EDIT:
Given the answers, I have realized that (1) progress_timer did not let me measure differences in "real" (wall-clock) time, and (2) the task I give to function() does not seem heavy enough for my machine to produce different times with 1 or 3 threads (which is odd; I get around 10 seconds in both cases...). I tried allocating memory to make the work heavier, and then I do see a difference. Although my other code is more complex, there is a good chance it still runs in roughly the same time with 1 or 3 threads. Thanks!
This is expected. You are measuring CPU time, not wall time.
time ./test 1
WITH THREADS
2.55 s
real 0m1.387s
user 0m2.556s
sys 0m0.008s
Real time is less than user time, which is identical to your measured time. Real time is what your wall clock shows; user and sys are CPU time spent in user and kernel mode by all CPUs combined.
time ./test 0
NO THREADS
2.56 s
real 0m2.578s
user 0m2.560s
sys 0m0.008s
Your measured time, real time and user time are all virtually the same.
The culprit seems to be progress_timer, or rather a misunderstanding of what it measures.
Try replacing main() with this. It shows that the program doesn't take as much wall time as progress_timer reports; maybe progress_timer reports total CPU time?
#include <sys/time.h>

void PrintTime() {
    struct timeval tv;
    if(!gettimeofday(&tv, NULL))
        cout << "Sec=" << tv.tv_sec << " usec=" << tv.tv_usec << endl;
}
int main(int argc, char *argv[])
{
    bool multThread = bool(atoi(argv[1]));
    PrintTime();
    if(!multThread){
        cout << "NO THREADS" << endl;
        progress_timer timer;
        withOutThreads();
    }
    else {
        cout << "WITH THREADS" << endl;
        progress_timer timer;
        withThreads();
    }
    PrintTime();
    return 0;
}
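For reference, the same wall-clock measurement can be done portably in C++11 with std::chrono::steady_clock instead of gettimeofday. This is a minimal sketch of the idea, not part of the original program:

#include <chrono>
#include <iostream>

int main()
{
    // steady_clock is monotonic, so it measures elapsed wall time
    // regardless of how many threads are burning CPU in parallel.
    auto start = std::chrono::steady_clock::now();
    // ... call withThreads() or withOutThreads() here ...
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "wall time: " << elapsed.count() << " s" << std::endl;
    return 0;
}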
#include <sys/time.h>
#include <pthread.h>
#include <cstdio>
#include <iostream>

timespec m_timeToWait;
pthread_mutex_t m_lock;
pthread_cond_t m_cond;

timespec& calculateNextCheckTime(int intervalSeconds){
    timeval now{};
    gettimeofday(&now, nullptr);
    m_timeToWait.tv_sec = now.tv_sec + intervalSeconds;
    //m_timeToWait.tv_nsec = (1000 * now.tv_usec) + intervalSeconds;
    return m_timeToWait;
}

void *run(void *){
    int i = 0;
    pthread_mutex_lock(&m_lock);
    while (i < 10) {
        std::cout << "Waiting .." << std::endl;
        int ret = pthread_cond_timedwait(&m_cond, &m_lock, &calculateNextCheckTime(1));
        std::cout << "doing work" << std::endl;
        i++;
    }
    pthread_mutex_unlock(&m_lock);
    return nullptr; // the thread function must return a void*
}

int main()
{
    pthread_t thread;
    int ret;
    int i;
    std::cout << "In main: creating thread" << std::endl;
    ret = pthread_create(&thread, NULL, &run, NULL);
    pthread_join(reinterpret_cast<pthread_t>(&thread), reinterpret_cast<void **>(ret));
    return 0;
}
There are similar examples on SO, but I can't seem to figure it out. Also, the CLion IDE insists that I use reinterpret_casts on the pthread_join params, even though examples on SO don't have those casts in place. I am using C++11.
This is just maths.
You have access to tv_sec, and you have access to tv_nsec.
Currently you're only setting tv_sec, to "the seconds part of now, plus X seconds".
You can also set tv_nsec, to "the nanoseconds part of now, plus Y nanoseconds".
The result is "now, plus X seconds and Y nanoseconds"… which is when you want the program to wait (at the earliest), with nanoseconds resolution.
Just uncomment the line that does this, then provide the appropriate numbers for what you want to do.
You could have the function take an additional "milliseconds" argument (don't forget to multiply it by 1,000,000 to convert to nanoseconds!) and leave the "seconds" argument at zero if you want that:
timespec& calculateNextCheckTime(const int intervalSeconds, const int intervalMillis)
{
    timeval now{};
    gettimeofday(&now, nullptr);
    m_timeToWait.tv_sec = now.tv_sec + intervalSeconds;
    m_timeToWait.tv_nsec = (1000 * now.tv_usec) + (1000 * 1000 * intervalMillis);
    return m_timeToWait;
}
You may or may not wish to perform some range checking (i.e. verify that intervalMillis >= 0 && intervalMillis < 1000) to avoid nasty overflows.
Or, instead, you may wish to allow calculateNextCheckTime(1, 234) to be treated the same as calculateNextCheckTime(3, 34). That will work, but only if you also implement "carry" semantics to ensure that m_timeToWait.tv_nsec ends up less than 1,000,000,000 after adding the (1000 * now.tv_usec) component, over which the caller has no control. (I have not implemented that in the above example.)
Also, you may or may not wish to make those arguments unsigned.
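A minimal sketch of that carry normalization, reusing the names from the example above (this variant is illustrative, not the original answer's code):

timespec& calculateNextCheckTime(const int intervalSeconds, const int intervalMillis)
{
    timeval now{};
    gettimeofday(&now, nullptr);
    // Accumulate in 64 bits so the sum can exceed one second without overflow.
    long long nsec = 1000LL * now.tv_usec + 1000000LL * intervalMillis;
    // Carry whole seconds out of the nanosecond part, keeping tv_nsec < 1e9.
    m_timeToWait.tv_sec = now.tv_sec + intervalSeconds + nsec / 1000000000;
    m_timeToWait.tv_nsec = nsec % 1000000000;
    return m_timeToWait;
}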
I would like to measure the execution time of some code. The code starts in the main() function and finishes in an event handler.
I have C++11 code that looks like this:
#include <iostream>
#include <time.h>
...

volatile clock_t t;

void EventHandler()
{
    // when this function is called, it is the end of the part that I want to measure
    t = clock() - t;
    std::cout << "time in seconds: " << ((float)t)/CLOCKS_PER_SEC;
}

int main()
{
    MyClass* instance = new MyClass(EventHandler); // this constructor starts a new std::thread
    instance->start(...); // this only passes some data to the thread; later the thread will call EventHandler()
    t = clock();
    return 0;
}
So it is guaranteed that the EventHandler() will be called only once, and only after an instance->start() call.
It works and gives me some output, but it is horrible code: it uses a global variable that different threads access. However, I can't change the API in use (the constructor, or the way the thread calls EventHandler).
I would like to ask if a better solution exists.
Thank you.
A global variable is unavoidable, as long as MyClass expects a plain function and there's no way to pass a context pointer along with the function...
You could write the code in a slightly tidier way, though:
#include <future>
#include <thread>
#include <chrono>
#include <iostream>

struct MyClass
{
    typedef void (CallbackFunc)();

    constexpr explicit MyClass(CallbackFunc* handler)
        : m_handler(handler)
    {
    }

    void Start()
    {
        std::thread(&MyClass::ThreadFunc, this).detach();
    }

private:
    void ThreadFunc()
    {
        std::this_thread::sleep_for(std::chrono::seconds(5));
        m_handler();
    }

    CallbackFunc* m_handler;
};

std::promise<std::chrono::time_point<std::chrono::high_resolution_clock>> gEndTime;

void EventHandler()
{
    gEndTime.set_value(std::chrono::high_resolution_clock::now());
}

int main()
{
    MyClass task(EventHandler);
    auto trigger = gEndTime.get_future();
    auto startTime = std::chrono::high_resolution_clock::now();
    task.Start();
    trigger.wait();
    std::chrono::duration<double> diff = trigger.get() - startTime;
    std::cout << "Duration = " << diff.count() << " secs." << std::endl;
    return 0;
}
A clock() call will not filter out the work of other processes and threads that the scheduler runs in parallel with the program's event-handler thread. There are alternatives like times() and getrusage(), which report the CPU time of the process. Their documentation does not clearly describe per-thread behaviour, but on Linux threads are treated much like processes, so it is worth investigating.
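On Linux specifically, getrusage() also accepts RUSAGE_THREAD (available since kernel 2.6.26, with _GNU_SOURCE defined), which reports the CPU time of just the calling thread. A minimal sketch, assuming a Linux target:

#define _GNU_SOURCE            /* required for RUSAGE_THREAD */
#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
    struct rusage ru;
    /* RUSAGE_THREAD: resource usage of the calling thread only. */
    if (getrusage(RUSAGE_THREAD, &ru) == 0)
        printf("thread CPU: %ld.%06ld s user, %ld.%06ld s sys\n",
               (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
               (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
    return 0;
}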
clock() is the wrong tool here, because it counts the CPU time of the whole process rather than the time actually required to run your operation: even if your thread is not running at all, time consumed by the process's other threads is still counted.
Instead you have to use platform-specific APIs, such as pthread_getcpuclockid for POSIX-compliant systems (check whether _POSIX_THREAD_CPUTIME is defined), which counts the time actually spent by a specific thread.
You can take a look at a benchmarking library I wrote for C++ that supports thread-aware measuring (see struct thread_clock implementation).
Or, you can use the code snippet from the man page:
/* Link with "-lrt" */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

#define handle_error_en(en, msg) \
    do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

static void *
thread_start(void *arg)
{
    printf("Subthread starting infinite loop\n");
    for (;;)
        continue;
}

static void
pclock(char *msg, clockid_t cid)
{
    struct timespec ts;

    printf("%s", msg);
    if (clock_gettime(cid, &ts) == -1)
        handle_error("clock_gettime");
    printf("%4ld.%03ld\n", ts.tv_sec, ts.tv_nsec / 1000000);
}

int
main(int argc, char *argv[])
{
    pthread_t thread;
    clockid_t cid;
    int j, s;

    s = pthread_create(&thread, NULL, thread_start, NULL);
    if (s != 0)
        handle_error_en(s, "pthread_create");

    printf("Main thread sleeping\n");
    sleep(1);

    printf("Main thread consuming some CPU time...\n");
    for (j = 0; j < 2000000; j++)
        getppid();

    pclock("Process total CPU time: ", CLOCK_PROCESS_CPUTIME_ID);

    s = pthread_getcpuclockid(pthread_self(), &cid);
    if (s != 0)
        handle_error_en(s, "pthread_getcpuclockid");
    pclock("Main thread CPU time: ", cid);

    /* The preceding 4 lines of code could have been replaced by:
       pclock("Main thread CPU time: ", CLOCK_THREAD_CPUTIME_ID); */

    s = pthread_getcpuclockid(thread, &cid);
    if (s != 0)
        handle_error_en(s, "pthread_getcpuclockid");
    pclock("Subthread CPU time: ", cid);

    exit(EXIT_SUCCESS);    /* Terminates both threads */
}
I have been dabbling with writing a C++ program that would control spark timing on a gas engine and have been running into some trouble. My code is very simple. It starts by creating a second thread that emulates the output signal of a Hall effect sensor that is triggered once per engine revolution. My main code processes the fake sensor output, recalculates engine RPM, and then determines the time necessary to wait for the crankshaft to rotate to the correct angle to send spark to the engine. The problem I'm running into is that I am using a sleep function with millisecond resolution, and at higher RPMs I am losing a significant amount of data.
My question is: how are real automotive ECUs programmed to control spark accurately at high RPMs?
My code is as follows:
#include <iostream>
#include <Windows.h>
#include <process.h>
#include <fstream>
#include "GetTimeMs64.cpp"

using namespace std;

void HEEmulator(void *);

int HE_Sensor1;
int *sensor;
HANDLE handles[1];
bool run;
bool *areRun;

int main( void )
{
    int sentRpm = 4000;

    areRun = &run;
    sensor = &HE_Sensor1;
    *sensor = 1;
    run = TRUE;

    int rpm, advance, dwell, oHE_Sensor1, spark;
    oHE_Sensor1 = 1;
    advance = 20;

    uint64 rtime1, rtime2, intTime, curTime, sparkon, sparkoff;

    handles[0] = (HANDLE)_beginthread(HEEmulator, 0, &sentRpm);

    ofstream myfile;
    myfile.open("output.out");

    intTime = GetTimeMs64();
    rtime1 = intTime;
    rpm = 0;
    spark = 0;
    dwell = 10000;
    sparkoff = 0;

    while(run == TRUE)
    {
        rtime2 = GetTimeMs64();
        curTime = rtime2 - intTime;
        myfile << "Current Time = " << curTime << " ";
        myfile << "HE_Sensor1 = " << HE_Sensor1 << " ";
        myfile << "RPM = " << rpm << " ";
        myfile << "Spark = " << spark << " ";

        if(oHE_Sensor1 != HE_Sensor1)
        {
            if(HE_Sensor1 > 0)
            {
                rpm = (1/(double)(rtime2 - rtime1))*60000;
                dwell = (1 - ((double)advance/360))*(rtime2 - rtime1);
                rtime1 = rtime2;
            }
            oHE_Sensor1 = HE_Sensor1;
        }

        if(rtime2 >= (rtime1 + dwell))
        {
            spark = 1;
            sparkoff = rtime2 + 2;
        }

        if(rtime2 >= sparkoff)
        {
            spark = 0;
        }

        myfile << "\n";
        Sleep(1);
    }

    myfile.close();
    return 0;
}

void HEEmulator(void *arg)
{
    int *rpmAd = (int*)arg;
    int rpm = *rpmAd;
    int milliseconds = (1/(double)rpm)*60000;

    for(int i = 0; i < 10; i++)
    {
        *sensor = 1;
        Sleep(milliseconds * 0.2);
        *sensor = 0;
        Sleep(milliseconds * 0.8);
    }

    *areRun = FALSE;
}
A desktop PC is not a real-time processing system.
When you use Sleep to pause a thread, you don't have any guarantees that it will wake up exactly after the specified amount of time has elapsed. The thread will be marked as ready to resume execution, but it may still have to wait for the OS to actually schedule it. From the documentation of the Sleep function:
Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses.
Also, the resolution of the system clock ticks is limited.
To more accurately simulate an ECU and the attached sensors, you should not use threads. Your simulation should not even depend on the passage of real time. Instead, use a single loop that updates the state of your simulation (both ECU and sensors) with each tick. This also means that your simulation should include the clock of the ECU.
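A minimal sketch of that approach (the names and constants here are illustrative, not from the original code): the crank angle and the ECU state advance together in simulated time, so no Sleep() call or real clock is involved.

#include <iostream>

int main()
{
    const double rpm = 4000.0;
    const double tickSeconds = 1e-6;                   // 1 us of simulated time per tick
    const double degPerTick = rpm * 360.0 / 60.0 * tickSeconds;
    const double advance = 20.0;                       // spark advance, in degrees
    double crankAngle = 0.0;                           // 0..360, 0 = sensor trigger
    bool spark = false;

    for (long tick = 0; tick < 2000000; ++tick)        // simulate 2 seconds
    {
        crankAngle += degPerTick;
        if (crankAngle >= 360.0) crankAngle -= 360.0;  // one full revolution
        // Fire the spark when the crank reaches (360 - advance) degrees.
        bool fire = crankAngle >= 360.0 - advance;
        if (fire && !spark)
            std::cout << "spark at tick " << tick << "\n";
        spark = fire;
    }
    return 0;
}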
I'm trying to figure out how to calculate time in C++. I'm making a program where an event happens every 3 seconds, for example printing out "hello", etc.
Here's an example using this_thread::sleep_for() in C++11, with two threads so your program won't freeze:
#include <iostream>
#include <chrono>
#include <thread>

using namespace std;

void hello()
{
    while(1)
    {
        cout << "Hello" << endl;
        chrono::milliseconds duration( 3000 );
        this_thread::sleep_for( duration );
    }
}

int main()
{
    // start the hello thread
    thread help1(hello);

    // do other stuff in the main thread
    for(int i = 0; i < 10; i++)
    {
        cout << "Hello2" << endl;
        chrono::milliseconds duration( 3000 );
        this_thread::sleep_for( duration );
    }

    // wait for the other thread to finish; in this case it waits forever (while(1))
    help1.join();
}
You can use boost::timer to calculate time in C++:

using boost::timer::cpu_timer;
using boost::timer::cpu_times;
using boost::timer::nanosecond_type;

...

nanosecond_type const three_seconds(3 * 1000000000LL);

cpu_timer timer;
cpu_times const elapsed_times(timer.elapsed());
nanosecond_type const elapsed(elapsed_times.system + elapsed_times.user);
if (elapsed >= three_seconds)
{
    // more than 3 seconds elapsed
}
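Note that system + user is CPU time; if the 3-second interval should be wall-clock time instead, cpu_times also exposes a wall field, e.g.:

nanosecond_type const wall_elapsed(timer.elapsed().wall);
if (wall_elapsed >= three_seconds)
{
    // more than 3 seconds of wall-clock time elapsed
}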
It depends on your OS/compiler.
Case 1:
If you have C++11, then you can use std::this_thread::sleep_for(), as suggested by Chris (you have to include the header <thread>).
Case 2:
If you are on the Windows platform, then you can also use something like:

#include <windows.h>

int main()
{
    // event 1
    Sleep(1000); // the argument is in milliseconds: 1 sec = 1000 ms
    // event 2
    return 0;
}

Case 3:
On the Linux platform you can simply use sleep() from <unistd.h>, whose argument is in seconds:

sleep(3); // sleeps for 3 seconds
Suppose there are several boost strand shared_ptrs stored in a vector m_poStrands, and tJobType is an enum indicating different types of job.
I found the time difference from posting a job on one strand (JOBA) to the call of onJob on another strand (JOBB) is around 50 milliseconds.
I want to know if there is any way to reduce that time difference.
void postJob(tJobType oType, UINT8* pcBuffer, size_t iSize)
{
    //...
    m_poStrands[oType]->post(boost::bind(&onJob, this, oType, pcDestBuffer, iSize));
}

void onJob(tJobType oType, UINT8* pcBuffer, size_t iSize)
{
    if (oType == JOBA)
    {
        //....
        struct timeval sTV;
        gettimeofday(&sTV, 0);
        memcpy(pcDestBuffer, &sTV, sizeof(sTV));
        pcDestBuffer += sizeof(sTV);
        iSize += sizeof(sTV);
        memcpy(pcDestBuffer, pcBuffer, iSize);
        m_poStrands[JOBB]->post(boost::bind(&onJob, this, JOBB, pcDestBuffer, iSize));
    }
    else if (oType == JOBB)
    {
        // get the time from the buffer
        // and calculate the time diff
        struct timeval eTV;
        gettimeofday(&eTV, 0);
    }
}
Your latency is probably coming from the memcpys between your gettimeofday calls. Here's an example program I ran on my machine (2 GHz Core 2 Duo). I'm getting thousands of nanoseconds, so a few microseconds. I doubt that your system is running 4 orders of magnitude slower than mine; the worst I ever saw it run was 100 microseconds for one of the two tests. I tried to make the code as close to the posted code as possible.
#include <boost/asio.hpp>
#include <boost/chrono.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <iostream>

struct Test {
    boost::shared_ptr<boost::asio::strand>* strands;
    boost::chrono::high_resolution_clock::time_point start;
    int id;

    Test(int i, boost::shared_ptr<boost::asio::strand>* strnds)
        : strands(strnds),
          id(i)
    {
        strands[0]->post(boost::bind(&Test::callback, this, 0));
    }

    void callback(int i) {
        if (i == 0) {
            start = boost::chrono::high_resolution_clock::now();
            strands[1]->post(boost::bind(&Test::callback, this, 1));
        } else {
            boost::chrono::nanoseconds sec = boost::chrono::high_resolution_clock::now() - start;
            std::cout << "test " << id << " took " << sec.count() << " ns" << std::endl;
        }
    }
};

int main() {
    boost::asio::io_service io_service_;
    boost::shared_ptr<boost::asio::strand> strands[2];
    strands[0] = boost::shared_ptr<boost::asio::strand>(new boost::asio::strand(io_service_));
    strands[1] = boost::shared_ptr<boost::asio::strand>(new boost::asio::strand(io_service_));

    // Queue the test jobs before starting the worker threads, so that
    // io_service::run() does not return early for lack of work.
    Test test1(1, strands);
    Test test2(2, strands);

    boost::thread t1(boost::bind(&boost::asio::io_service::run, &io_service_));
    boost::thread t2(boost::bind(&boost::asio::io_service::run, &io_service_));
    t1.join();
    t2.join();
}