I have a function with a factor that needs to be adjusted according to the load on the machine, so that the function consumes exactly the wall time passed to it. The factor can vary with the machine's load.
void execute_for_wallTime(int factor, int wallTime)
{
    double d = 0;
    for (int n = 0; n < factor; ++n)
        for (int m = 0; m < wallTime; ++m)
            d += d * n * m;
}
Is there a way to dynamically check the load on the machine and adjust the factor accordingly in order to consume the exact wall time passed to the function?
The wall time is read from a file and passed to this function. The values are in microseconds, e.g.:
73
21
44
According to the OP's comment:
#include <sys/time.h>

// Time in microseconds between the two timevals; might require longs anyway.
int deltaTime(struct timeval *tv1, struct timeval *tv2){
    return ((tv2->tv_sec - tv1->tv_sec) * 1000000) + tv2->tv_usec - tv1->tv_usec;
}

void execute_for_wallTime(int wallTime)
{
    struct timeval tvStart, tvNow;
    gettimeofday(&tvStart, NULL);

    double d = 0;
    for (int m = 0; ; ++m){
        gettimeofday(&tvNow, NULL);
        if (deltaTime(&tvStart, &tvNow) >= wallTime) { // if wallTime is 1000 microseconds,
                                                       // this function returns after
                                                       // 1000 microseconds (and a
                                                       // little more due to overhead)
            return;
        }
        d += d * m;
    }
}
Now deal with wallTime by increasing or decreasing it in logic outside this function, depending on your performance calculations. This function simply runs for wallTime microseconds.
For C++ style, you can use std::chrono.
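For illustration, a minimal sketch of the same busy-wait written with std::chrono (steady_clock is chosen here only because it is monotonic; the signature mirrors the function above, with wallTime still in microseconds):

#include <chrono>

void execute_for_wallTime(int wallTime) // wallTime in microseconds
{
    using namespace std::chrono;
    const auto deadline = steady_clock::now() + microseconds(wallTime);

    volatile double d = 0; // volatile so the busy work is not optimized away
    for (int m = 0; steady_clock::now() < deadline; ++m)
        d += d * m;
}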
I must comment that I would handle things differently, for example by calling nanosleep(). The operations make no sense unless, in the actual code, you plan to substitute these "fillers" with real operations. In that case you might consider threads and schedulers. Besides, the clock calls add overhead.
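As a sketch of that nanosleep() alternative (the name wait_for_wallTime is made up for this example), the idea is simply to let the OS suspend the thread instead of spinning; note that the actual sleep may overshoot slightly, depending on the scheduler:

#include <time.h>

void wait_for_wallTime(int wallTime) // wallTime in microseconds
{
    struct timespec req;
    req.tv_sec  = wallTime / 1000000;
    req.tv_nsec = (long)(wallTime % 1000000) * 1000L;
    nanosleep(&req, NULL);
}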
Consider the loop below. This is a simplified example of a problem I am trying to solve. I want to limit the number of times the doSomething function is called each second. Since the loop runs very fast, I thought I could use a rate limiter. Let's assume that I have found an appropriate value of x by running it with different numbers.
unsigned int incrementionRate = x;
unsigned int counter = 0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);
    counter = (counter + 1) % incrementionRate;
    if (counter == 0) {
        doSomething();
    }
}
I wonder if the number of calls to the doSomething function would be lower if I were running at a lower clock rate. In that case, I would like to limit the number of calls to doSomething to once per second. The second loop I have written is below.
float epsilon = 0.0001;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);
    if (abs(seconds - floor(seconds)) <= epsilon) {
        doSomething();
    }
}
Would that do the trick for different clock rates, or are there still problems? Also, I would like to know if there is a better way of doing this. I have never worked with clock rates before and am trying to understand how concerns differ when working with limited resources.
Note: Using sleep is not an option.
If I understand the issue properly, you could use a std::chrono::steady_clock time point that you just add a second to every time a second has passed.
Example:
#include <chrono>

auto end_time = std::chrono::steady_clock::now();

while (true) {
    // only call doSomething once a second
    if (end_time < std::chrono::steady_clock::now()) {
        doSomething();
        // set a new end time a second after the previous one
        end_time += std::chrono::seconds(1);
    }
    // do something else
}
Ted's answer is fine if you are really doing something else in the loop; if not, though, this results in a busy wait that just consumes your CPU for nothing.
In such a case you should prefer to let your thread sleep:
#include <chrono>
#include <thread>

std::chrono::milliseconds offset(200);
auto next = std::chrono::steady_clock::now();

for (;;)
{
    doSomething();
    next += offset;
    std::this_thread::sleep_until(next);
}
You'll need to include the <chrono> and <thread> headers for this, as shown above.
I decided to go with a much simpler approach in the end: an adjustable time interval, just storing the latest update time, without introducing any new mechanism. Honestly, now I don't know why I couldn't think of it at first. Overthinking is a problem. :)
double lastUpdateTimestamp = 0;
const double updateInterval = 1.0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);
    if ((seconds - lastUpdateTimestamp) >= updateInterval) {
        doSomething();
        lastUpdateTimestamp = seconds;
    }
}
I have a function here that can make a program count, wait, etc. with a least count of 1 millisecond. But I was wondering if I can do the same with lower precision. I have read other answers, but they are mostly about switching to Linux, or say that sleep is a guesstimate; what's more, those answers are around a decade old, so maybe a new function has appeared to do this.
Here's the function:
#include <ctime>

void sleep(unsigned int mseconds)
{
    // Busy-waits; assumes one clock() tick is 1 ms (i.e. CLOCKS_PER_SEC == 1000).
    clock_t goal = mseconds + clock();
    while (goal > clock());
}
Actually, I was trying to make a function similar to secure_compare, but I don't think it is a wise idea to waste 1 millisecond (the current least count) on just comparing two strings.
Here is the function I made for this:
bool secure_compare(string a, string b){
    clock_t limit = wait + clock();   // 'wait' is a clock_t defined elsewhere: the time limit a comparison may take
    bool x = (a == b);
    if (clock() > limit){             // if the comparison took longer, increase wait so other
                                      // comparisons also get this new maximum time
        wait = clock() - limit;
        cout << "Error";
        secure_compare(a, b);
    }
    while (clock() < limit);          // burn the time left, to make it a constant-time function
    return x;
}
You're trying to make a comparison function time-independent. There are basically two ways to do this:
1. Measure the time taken for the call and sleep for the appropriate amount afterwards. This might only swap out one side channel (timing) for another (power consumption, since sleeping and computation may have different power-usage characteristics).
2. Make the control flow more data-independent:
Instead of using the normal string comparison, you could implement your own comparison that compares all characters and not just up until the first mismatch, like this:
bool match = true;
size_t min_length = min(a.size(), b.size());
for (size_t i = 0; i < min_length; ++i) {
    match &= (a[i] == b[i]);
}
return match;
Here, no branching (conditional operations) takes place, so every call of this method with strings of the same length should take roughly the same time. The only side-channel information you leak is the length of the strings you compare, but that would be difficult to hide anyway if they are of arbitrary length.
EDIT: Incorporating Passer By's comment:
If we want to reduce the size leakage, we could try to round the size up and clamp the index values.
bool match = true;
size_t min_length = min(a.size(), b.size());
size_t rounded_length = (min_length + 1023) / 1024 * 1024;
for (size_t i = 0; i < rounded_length; ++i) {
    size_t clamped_i = min(i, min_length - 1);
    match &= (a[clamped_i] == b[clamped_i]);
}
return match;
There might be a tiny cache-timing side channel present (because we don't get any more cache misses once i > clamped_i), but since a and b should be in the cache hierarchy anyway, I doubt the difference is usable in any way.
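For completeness, a sketch of how the question's secure_compare might look when built on this data-independent loop instead of timing padding (only the function name comes from the question; the rest is an assumption for illustration):

#include <algorithm>
#include <string>

bool secure_compare(const std::string& a, const std::string& b)
{
    // A length mismatch counts as a mismatch, but we still scan the common prefix
    // so the loop's running time does not depend on where the strings differ.
    bool match = (a.size() == b.size());
    size_t min_length = std::min(a.size(), b.size());
    for (size_t i = 0; i < min_length; ++i)
        match &= (a[i] == b[i]);
    return match;
}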
I wanted to see which would be faster to access, a struct or a tuple, so I wrote a small program. However, when it finishes running, the time recorded is 0.000000 for both. I'm pretty sure the program isn't finishing that fast (I'm running it on an online compiler since I am away from home).
#include <iostream>
#include <time.h>
#include <tuple>
#include <cstdlib>
using namespace std;
struct Item
{
    int x;
    int y;
};
typedef tuple<int, int> dTuple;
int main() {
    printf("Let's see which is faster...\n");

    //Timers
    time_t startTime;
    time_t currentTimeTuple;
    time_t currentTimeItem;

    //Delta times
    double deltaTimeTuple;
    double deltaTimeItem;

    //Collections
    dTuple tupleArray[100000];
    struct Item itemArray[100000];

    //Seed random number
    srand(time(NULL));

    printf("Generating tuple array...\n");
    //Initialize an array of tuples with random ints
    for(int i = 0; i < 100000; ++i)
    {
        tupleArray[i] = dTuple(rand() % 1000, rand() % 1000);
    }

    printf("Generating Item array...\n");
    //Initialize an array of Items
    for(int i = 0; i < 100000; ++i)
    {
        itemArray[i].x = rand() % 1000;
        itemArray[i].y = rand() % 1000;
    }

    //Begin timer for tuple array
    time(&startTime);
    //Iterate through the array of tuples and print out each value, timing how long it takes
    for(int i = 0; i < 100000; ++i)
    {
        printf("%d: %d", get<0>(tupleArray[i]), get<1>(tupleArray[i]));
    }
    //Get the time it took to go through the tuple array
    time(&currentTimeTuple);
    deltaTimeTuple = difftime(startTime, currentTimeTuple);

    //Start the timer for the array of Items
    time(&startTime);
    //Iterate through the array of Items and print out each value, timing how long it takes
    for(int i = 0; i < 100000; ++i)
    {
        printf("%d: %d", itemArray[i].x, itemArray[i].y);
    }
    //Get the time it took to go through the item array
    time(&currentTimeItem);
    deltaTimeItem = difftime(startTime, currentTimeItem);

    printf("\n\n");
    printf("It took %f seconds to go through the tuple array\nIt took %f seconds to go through the struct Item array\n", deltaTimeTuple, deltaTimeItem);
    return 0;
}
According to www.cplusplus.com/reference/ctime/time/, difftime should return the difference between two time_t.
time() generally returns the number of seconds since 00:00 hours, Jan 1, 1970 UTC (i.e., the current unix timestamp). So depending on your library implementation, anything below 1 second might appear as 0.
You should prefer use of <chrono> for benchmarking purposes:
#include <chrono>
#include <iostream>
using namespace std;

chrono::high_resolution_clock::time_point t = chrono::high_resolution_clock::now();
// do something to benchmark
chrono::high_resolution_clock::time_point t2 = chrono::high_resolution_clock::now();
cout << "Exec in ms: " << chrono::duration_cast<chrono::milliseconds>(t2 - t).count() << endl;
You nevertheless have to consider the clock resolution. For example, on Windows it is about 15 ms, so if your measurement is around 15 ms or even below, you should increase the number of iterations.
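As an illustration of that last point, here is a small helper (benchmark_ms is a name made up for this sketch): it repeats the measured work enough times that the total comfortably exceeds the clock resolution, then reports the per-iteration average in milliseconds.

#include <chrono>

template <typename Work>
double benchmark_ms(Work work, int iterations = 1000)
{
    using namespace std::chrono;
    auto t0 = high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();                       // run the code being benchmarked
    auto t1 = high_resolution_clock::now();
    // total nanoseconds / 1e6 = milliseconds, divided by the iteration count
    return duration_cast<nanoseconds>(t1 - t0).count() / 1e6 / iterations;
}

It could be called with a lambda wrapping the question's loop, e.g. benchmark_ms([&]{ /* iterate over tupleArray */ });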
I recommend checking the assembly language listing before you start performance timings.
The first thing to check is that the assembly language is significantly different between accessing a tuple versus accessing your structure. You can get a rough estimate of the timing difference by counting the number of different instructions.
Secondly, I recommend looking at the definition of std::tuple. Something tells me that it is a structure declared similarly to yours. Thus I predict your timings should be near equal.
Thirdly, you should also compare std::pair since you only have 2 items in your tuple and a std::pair has two elements. Again, it should be the same as your structure.
Lastly, be prepared for "noise" in your data. The noise may come from data cache misses, other programs using the data cache, migration of your code between cores, and any I/O. The closer you can get to running your program on a single core, free of interruptions, the better the quality of your data will be.
time_t is a count of seconds, and 100,000 trivial operations on a fast, modern computer could indeed finish in less than a second.
In a function that updates all particles I have the following code:
for (int i = 0; i < _maxParticles; i++)
{
    // check if active
    if (_particles[i].lifeTime > 0.0f)
    {
        _particles[i].lifeTime -= _decayRate * deltaTime;
    }
}
This decreases the lifetime of the particle based on the time that has passed.
The product gets recalculated on every iteration, so with 10000 particles that wouldn't be very efficient, because it doesn't need to be recalculated (it doesn't change within the loop anyway).
So I came up with this:
float lifeMin = _decayRate * deltaTime;

for (int i = 0; i < _maxParticles; i++)
{
    // check if active
    if (_particles[i].lifeTime > 0.0f)
    {
        _particles[i].lifeTime -= lifeMin;
    }
}
This calculates it once and stores it in a variable that is read every iteration, so the CPU doesn't have to recalculate it each time, which should theoretically increase performance.
Would it run faster than the old code? Or does the release compiler do optimizations like this?
I wrote a program that compares both methods:
#include <time.h>
#include <iostream>

const unsigned int MAX = 1000000000;

int main()
{
    float deltaTime = 20;
    float decayRate = 200;
    float foo = 2041.234f;

    unsigned int start = clock();
    for (unsigned int i = 0; i < MAX; i++)
    {
        foo -= decayRate * deltaTime;
    }
    std::cout << "Method 1 took " << clock() - start << "ms\n";

    start = clock();
    float calced = decayRate * deltaTime;
    for (unsigned int i = 0; i < MAX; i++)
    {
        foo -= calced;
    }
    std::cout << "Method 2 took " << clock() - start << "ms\n";

    int n;
    std::cin >> n;
    return 0;
}
Result in debug mode:
Method 1 took 2470ms
Method 2 took 2410ms
Result in release mode:
Method 1 took 0ms
Method 2 took 0ms
But that doesn't work. I know it doesn't do exactly the same thing, but it gives an idea.
In debug mode, they take roughly the same time. Sometimes Method 1 is faster than Method 2 (especially with fewer iterations), sometimes Method 2 is faster.
In release mode, both take 0 ms, which is a little weird.
I tried measuring it in the game itself, but there aren't enough particles to get a clear result.
EDIT
I tried disabling optimizations and letting the variables be user input via std::cin.
Here are the results:
Method 1 took 2430ms
Method 2 took 2410ms
It will almost certainly make no difference whatsoever, at least if you compile with optimization (and of course, if you're concerned with performance, you are compiling with optimization). The optimization in question is called loop-invariant code motion, and it is universally implemented (and has been for about 40 years).
On the other hand, it may make sense to use the separate variable anyway, to make the code clearer. This depends on the application, but in many cases, giving a name to the result of an expression can make code clearer. (In other cases, of course, throwing in a lot of extra variables can make it less clear. It all depends on the application.)
In any case, for such things, write the code as clearly as possible first, and then, if (and only if) there is a performance problem, profile to see where it is, and fix that.
EDIT:
Just to be perfectly clear: I'm talking about this sort of code optimization in general. In the exact case you show, since you don't use foo, the compiler will probably remove it (and the loops) completely.
In theory, yes. But your loop is extremely simple and thus likely to be heavily optimized.
Try the -O0 option to disable all compiler optimizations.
The release runtime might be caused by the compiler statically computing the result.
I am pretty confident that any decent compiler will replace your loops with the following code:
foo -= MAX * decayRate * deltaTime;
and
foo -= MAX * calced;
You can make MAX depend on some kind of input (e.g. a command-line parameter) to avoid that, as sketched below.
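For example, a minimal sketch of that idea (only the first method is shown, and the argument handling is deliberately simple): taking the iteration count from the command line means the compiler cannot fold the loop into a constant at compile time, and printing foo keeps the loop from being removed as dead code.

#include <cstdlib>
#include <ctime>
#include <iostream>

int main(int argc, char* argv[])
{
    // Iteration count comes from the command line, so it is unknown at compile time.
    unsigned int max = (argc > 1) ? std::strtoul(argv[1], nullptr, 10) : 1000000000u;

    float deltaTime = 20;
    float decayRate = 200;
    float foo = 2041.234f;

    clock_t start = clock();
    for (unsigned int i = 0; i < max; i++)
    {
        foo -= decayRate * deltaTime;
    }
    // Printing foo makes the result observable, so the loop cannot be discarded.
    std::cout << "Method 1 took " << clock() - start << " ticks, foo = " << foo << "\n";
    return 0;
}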
I've made a small application that averages the numbers between 1 and 1000000. It's not hard to see (using a very basic algebraic formula) that the average is 500000.5 but this was more of a project in learning C++ than anything else.
Anyway, I made clock variables that were designed to find the amount of clock steps required for the application to run. When I first ran the script, it said that it took 3770000 clock steps, but every time that I've run it since then, it's taken "0.0" seconds...
I've attached my code at the bottom.
Either a.) It's saved the variables from the first time I ran it, and it's just running quickly to the answer...
or b.) something is wrong with how I'm declaring the time variables.
Regardless... it doesn't make sense.
Any help would be appreciated.
FYI (I'm running this through a Linux computer, not sure if that matters)
double avg (int arr[], int beg, int end)
{
    int nums = end - beg + 1;
    double sum = 0.0;
    for(int i = beg; i <= end; i++)
    {
        sum += arr[i];
    }
    //for(int p = 0; p < nums*10000; p ++){}
    return sum/nums;
}

int main (int argc, char *argv[])
{
    int nums = 1000000;//atoi(argv[0]);
    int myarray[nums];
    double timediff;

    //printf("Arg is: %d\n",argv[0]);
    printf("Nums is: %d\n",nums);

    clock_t begin_time = clock();
    for(int i = 0; i < nums; i++)
    {
        myarray[i] = i+1;
    }
    double average = avg(myarray, 0, nums - 1);
    printf("%f\n",average);
    clock_t end_time = clock();

    timediff = (double) difftime(end_time, begin_time);
    printf("Time to Average: %f\n", timediff);
    return 0;
}
You are measuring the I/O operation (printf) too, which depends on external factors and might affect the run time. Also, clock() might not be as precise as needed to measure such a small task; look into higher-resolution functions such as clock_get_time(). Even then, other processes might affect the run time by generating page-fault interrupts, occupying the memory bus, etc. So this kind of fluctuation is not abnormal at all.
On the machine I tested, Linux's clock call was only accurate to 1/100th of a second. If your code runs in less than 0.01 seconds, it will usually say zero seconds have passed. Also, I ran your program a total of 50 times in 0.13 seconds, so I find it suspicious that you claim it takes 2 seconds to run it once on your computer.
Your code also uses difftime incorrectly (it expects time_t values in seconds, not the clock_t tick counts returned by clock()), so it may display incorrect output even when clock does say time has passed.
I'd guess that the first timing you got was with different code than that posted in this question, because I can't think of any way the code in this question could produce a time of 3770000.
Finally, benchmarking is hard, and your code has several benchmarking mistakes:
You're timing how long it takes to (1) fill an array, (2) calculate an average, (3) format the result string, and (4) make an OS call (slow) that prints said string in the right language/font/color/etc., which is especially slow.
You're attempting to time a task which takes less than a hundredth of a second, which is WAY too small for any accurate measurement.
Here is my take on your code, measuring that the average takes ~0.001968 seconds on this machine.
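The code of that version is not included above; as a stand-in, here is a minimal sketch of what such a timing setup might look like, using std::chrono so that only the averaging call is measured (the ~0.001968 s figure comes from the original answer, not from this sketch):

#include <chrono>
#include <cstdio>
#include <vector>

double avg(const std::vector<int>& arr)
{
    double sum = 0.0;
    for (int v : arr)
        sum += v;
    return sum / arr.size();
}

int main()
{
    const int nums = 1000000;
    std::vector<int> myarray(nums);
    for (int i = 0; i < nums; i++)
        myarray[i] = i + 1;

    // Time only the averaging, not the array fill or the printing.
    auto t0 = std::chrono::steady_clock::now();
    double average = avg(myarray);
    auto t1 = std::chrono::steady_clock::now();

    std::printf("%f\n", average);
    std::printf("Time to Average: %f s\n", std::chrono::duration<double>(t1 - t0).count());
    return 0;
}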