I am trying to measure the time of atomic operations, bitwise operations for example.
The problem I had is that I can't just compute 0&1, because the compiler optimises the statement away and ignores it, so I had to use an assignment:
num = 0&1.
So, to get an accurate time for the operation without the assignment, I measured the time it takes to do the assignment alone; I did that with x=0;
and return at the end something like this:
return assign_and_comp - assign_only;
The problem is that I'm getting negative results pretty frequently.
Is it possible that num=0&1 costs less than x=0?
I can't use any time-measuring function except gettimeofday(), unfortunately.
I've seen this solution. First, I'm forced to use gettimeofday(), but most importantly, I'm measuring in the same way: getting the time before and after the operation and returning the difference.
BUT I'm trying to isolate the assignment from the operation, and that is not what they do in that solution.
This is my full code.
#include <iostream>
#include <sys/time.h>
#include "osm.h"
#define NUM_ITERATIONS 1000000
#define SECOND_TO_NANO 1000000000.0
#define MICRO_TO_NANO 1000.0
using namespace std;
// global variables
struct timeval tvalBefore, tvalAfter;
double assign_only = 0.0;
int main() {
osm_init();
cout << osm_operation_time(50000) << endl;
return 0;
}
int osm_init(){
int x=0;
gettimeofday(&tvalBefore,NULL);
for (int i=0; i<NUM_ITERATIONS; i++){
x = 0;
}
gettimeofday(&tvalAfter,NULL);
assign_only = ((tvalAfter.tv_sec-tvalBefore.tv_sec)*SECOND_TO_NANO+
(tvalAfter.tv_usec-tvalBefore.tv_usec)*MICRO_TO_NANO)/NUM_ITERATIONS;
return 0;
}
double osm_operation_time(unsigned int iterations){
volatile int num=0;
gettimeofday(&tvalBefore,NULL);
for (unsigned int i=0; i<iterations; i++){
num = 0&1;
}
gettimeofday(&tvalAfter,NULL);
double assign_and_comp = ((tvalAfter.tv_sec-tvalBefore.tv_sec)*SECOND_TO_NANO+
(tvalAfter.tv_usec-tvalBefore.tv_usec)*MICRO_TO_NANO)/iterations;
return assign_and_comp-assign_only;
}
Related
I've written some code in c++ that is meant to find the minimum and maximum values that can be calculated by summing 4 of the 5 integers presented in an array. My thinking was that I could add up all elements of the array and loop through subtracting each of the elements to figure out which subtraction would lead to the smallest and largest totals. I know this isn't the smartest way to do it, but I'm just curious why this brute force method isn't working when I code it. Any feedback would be very much appreciated.
#include <iostream>
#include <vector>
#include <limits.h>
using namespace std;
void minimaxsum(vector<int> arr){
int i,j,temp;
int n=sizeof(arr);
int sum=0;
int low=INT_MAX;
int high=0;
for (j=0;j<n;j++){
for (i=0;i<n;i++){
sum+=arr[i];
}
temp=sum-arr[j];
if(temp<low){
low=temp;
}
else if(temp>high){
high=temp;
}
}
cout<<low;
cout<<high<<endl;
}
int main (){
vector<int> arr;
arr.push_back(1.0);
arr.push_back(2.0);
arr.push_back(3.0);
arr.push_back(1.0);
arr.push_back(2.0);
minimaxsum(arr);
return 0;
}
There are 2 problems.
Your code is unfortunately buggy and cannot deliver the correct result.
The solution approach, the design, is wrong.
I will show you what is wrong and how it could be refactored.
But first and most important: Before you start coding, you need to think. At least 1 day. After that, take a piece of paper and sketch your solution idea. Refactor this idea several times, which will take a complete additional day.
Then, start to write your code. This will take 3 minutes and if you do it with high quality, then it takes 10 minutes.
Let us look first at your code. I will add comments in the source code to indicate some of the problems. Please see:
#include <iostream>
#include <vector>
#include <limits.h> // Do not use .h include files from C-language. Use limits
using namespace std; // Never open the complete std namespace. Use fully qualified names
void minimaxsum(vector<int> arr) { // Pass by reference, not by value, to avoid copies
int i, j, temp; // Always define variables when you need them, not before. Always initialize
int n = sizeof(arr); // This will not work. You mean "arr.size();"
int sum = 0;
int low = INT_MAX; // Use numeric_limits from C++
int high = 0; // Initialize with MIN value. Otherwise it will fail for negative integers
for (j = 0; j < n; j++) { // It is not clear why you use a nested loop over the same range
for (i = 0; i < n; i++) { // The overall sum should be calculated only once
sum += arr[i]; // You keep adding to sum; it is never reset
}
temp = sum - arr[j];
if (temp < low) {
low = temp;
}
else if (temp > high) {
high = temp;
}
}
cout << low; // You are missing a '\n' at the end
cout << high << endl; // endl is not necessary for cout. '\n' is sufficient
}
int main() {
vector<int> arr; // use an initializer list
arr.push_back(1.0); // Do not push back doubles into an integer vector
arr.push_back(2.0);
arr.push_back(3.0);
arr.push_back(1.0);
arr.push_back(2.0);
minimaxsum(arr);
return 0;
}
Basically your idea to subtract only one value from the overall sum is correct. But there is no need to calculate the overall sum all the time.
Refactoring your code to a working, but still not an optimal C++ solution could look like:
#include <iostream>
#include <vector>
#include <limits>
// Function to show the min and max sum from 4 out of 5 values
void minimaxsum(std::vector<int>& arr) {
// Initialize the resulting values such that the first comparison will always be true
int low = std::numeric_limits<int>::max();
int high = std::numeric_limits<int>::min();
// Calculate the sum of all 5 values
int sumOf5 = 0;
for (const int i : arr)
sumOf5 += i;
// Now subtract one value from the sum of 5
for (const int i : arr) {
if (sumOf5 - i < low) // Check for new min
low = sumOf5 - i;
if (sumOf5 - i > high) // Check for new max
high = sumOf5 - i;
}
std::cout << "Min: " << low << "\tMax: " << high << '\n';
}
int main() {
std::vector<int> arr{ 1,2,3,1,2 }; // The test Data
minimaxsum(arr); // Show min and max result
}
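For reference, here is a sketch of how the "not optimal" part could be addressed more idiomatically: the minimal sum of 4 values is the total minus the largest element, and the maximal sum is the total minus the smallest, which std::accumulate and std::minmax_element express directly (C++17 here, for the structured binding):
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    const std::vector<int> arr{ 1, 2, 3, 1, 2 };
    // Sum all 5 values once
    const long long sumOf5 = std::accumulate(arr.begin(), arr.end(), 0LL);
    // Min sum of 4 = total minus largest element; max sum of 4 = total minus smallest
    const auto [minIt, maxIt] = std::minmax_element(arr.begin(), arr.end());
    std::cout << "Min: " << sumOf5 - *maxIt << "\tMax: " << sumOf5 - *minIt << '\n';
}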
When I run the source code on the sample cases of their example it runs fine, but when I submit it, the judge reports a runtime error.
Here are my source code and the link to the question.
Question link - https://www.codechef.com/SEPT21C/submit/MNDIGSM2
Below is the code.
#include <iostream>
#include <vector>
// #include <bits/stdc++.h>
using namespace std;
// #define fast ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0);
int converter(int n , int b){
vector<int> vec;
int sum = 0;
while(n>0){
vec.push_back(n%b);
n = n / b;
}
int vecSize = vec.size();
for(int i = 0;i<vecSize;i++){
// cout<<
sum = sum + vec[i];
}
return sum;
}
int minVal(vector<int> arr , int len){
int min = arr[0], c = 0;
// if(arr)
for(int i = 1 ; i< len;i++){
if (arr[i] < min){
min = arr[i];
c = i;
}
}
return c;
}
int main() {
// your code goes here
// fast;
int test;
cin>>test;
while(test--){
int n ,r;
cin>>n>>r;
int l = 2;
// ll copy = l;
int arSize = (r-2)+1;
vector<int> arr(arSize);
for(int i = 0;i< arSize ;i++){
arr[i] = converter(n,l);
l++;
}
int tobe = minVal(arr , arSize);
cout<<tobe + 2<<endl;
}
return 0;
}
Maybe I do not understand the question fully. There is not enough information available.
It could be that the program slows down because of the use of std::vector: first you calculate, then store the values, and then iterate again over all values.
This is not necessary. You can do all calculations inline without the need for additional storage.
And, additionally, all these "contest" questions are not intended to improve your programming skills.
Basically, the language doesn't matter; the important thing is the algorithm. They want you to find a "good" algorithm.
Brute-forcing is nearly never a feasible solution. If you read about big numbers like 10^12, then you already know in advance that you will get a TLE with the brute-force solution.
Regarding the horrible and non-compliant C++ slang that is used on these "competition" sites, please note that it is nearly never necessary. You have no time constraints for submitting a solution, so you could also use real C++ code.
Anyway, I corrected your code and added meaningful variable names, comments and formatting. So, logically, the approach is the same, but it is readable.
Of course it may fail as well, because it is still brute forcing . . .
#include <iostream>
#include <limits>
constexpr unsigned long long BaseStart = 2ull;
int main() {
// Get number of test cases
unsigned int numberOfTestCases{};
std::cin >> numberOfTestCases;
// Work on all test cases
for (unsigned int testCase{}; testCase < numberOfTestCases; ++testCase) {
// Get the value to check and the upper limit for the base
unsigned long long valueToCheck{}, upperLimitBase{};
std::cin >> valueToCheck >> upperLimitBase;
// Here we will store the minimum sum to check
unsigned long long minimumSumOfDigits{std::numeric_limits<unsigned long long>::max()};
// Here we will store the result
unsigned long long minimumBase{};
for (unsigned long long base{BaseStart}; base <= upperLimitBase; ++base) {
// And this will be the running sumOfDigits
unsigned long long runningSumOfDigits{};
// get the digits of the value and calculate the running sum
unsigned long long value{valueToCheck};
while (value > 0) {
// Get digits via modulo division and add up
runningSumOfDigits += value % base;
value /= base;
}
// Get current minimum
if (runningSumOfDigits < minimumSumOfDigits) {
minimumSumOfDigits = runningSumOfDigits;
minimumBase = base;
}
}
std::cout << minimumBase << '\n';
}
return 0;
}
This code can of course be optimized further, but for this I would need more information . . .
For the following code, which generates random numbers for a Monte Carlo simulation, I need to get exactly the same sum on each run, but this does not happen, although I have fixed the seed. I would appreciate it if anyone could point out the problem with this code.
#include <cmath>
#include <random>
#include <iostream>
#include <chrono>
#include <cfloat>
#include <iomanip>
#include <cstdlib>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/mt19937_64.hpp>
#include <trng/uniform01_dist.hpp>
using namespace std;
using namespace chrono;
const double landa = 1;
const double exact_solution = landa / (pow(landa, 2) + 1);
double function(double x) {
return cos(x) / landa;
}
int main() {
int rank;
const int N = 1000000;
double sum = 0.0;
trng::yarn2 r[6];
for (int i = 0; i <6; i++)
{
r[i].seed(0);
}
for (int i = 0; i < 6; i++)
{
r[i].split(6,i);
}
trng::uniform01_dist<double> u;
auto start = high_resolution_clock::now();
#pragma omp parallel num_threads(6)
{
rank=omp_get_thread_num();
#pragma omp for reduction (+: sum)
for (int i = 0; i<N; ++i) {
//double x = distribution(g);
double x= u(r[rank]);
x = (-1.0 / landa) * log(1.0 - x);
sum = sum+function(x);
}
}
double app = sum / static_cast<double> (N);
auto end = high_resolution_clock::now();
auto diff=duration_cast<milliseconds>(end-start);
cout << "Approximation is: " <<setprecision(17) << app << "\t"<<"Time: "<< setprecision(17) << diff.count()<<" Error: "<<(app-exact_solution)<< endl;
return 0;
}
TL;DR The problem is two-fold:
Floating point addition is not associative;
You are generating different random numbers for each thread.
I need to get exactly the same sum on each run, but this does not happen, although I have fixed the seed. I would appreciate it if anyone could point out the problem with this code
First, you have a race condition on rank=omp_get_thread_num(); the variable rank is shared among all threads. To fix that, you can declare the variable rank inside the parallel region, hence making it private to each thread.
#pragma omp parallel num_threads(6)
{
int rank=omp_get_thread_num();
...
}
In your code, you should not expect that the value of the sum will be the same for different numbers of threads. Why? Because you are adding doubles in parallel:
double sum = 0.0;
...
#pragma omp for reduction (+: sum)
for (int i = 0; i<N; ++i) {
//double x = distribution(g);
double x= u(r[rank]);
x = (-1.0 / landa) * log(1.0 - x);
sum = sum+function(x);
}
and from What Every Computer Scientist Should Know About Floating-Point Arithmetic one can read:
Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).
Hence, you can conclude that floating-point addition is not associative, which is why you might get different sum values for different numbers of threads.
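This is easy to verify; a minimal demonstration using exactly the values from the quote:
#include <iostream>

int main() {
    double x = 1e30, y = -1e30, z = 1.0;
    std::cout << "(x + y) + z = " << ((x + y) + z) << '\n'; // prints 1
    std::cout << "x + (y + z) = " << (x + (y + z)) << '\n'; // prints 0: the 1 is absorbed by -1e30
}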
You are generating different random values per thread:
for (int i = 0; i < 6; i++)
{
r[i].split(6,i);
}
Consequently, for different numbers of threads, the variable sum gets different results as well.
As kindly pointed out by jérôme-richard in the comments:
Note that a more precise algorithm like Kahan summation can significantly reduce the rounding issue while still being relatively fast.
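For illustration only (this is the textbook algorithm, not code from the question), a minimal sketch of Kahan summation over a vector of doubles:
#include <vector>

// Kahan (compensated) summation: carries a running compensation term
// for the low-order bits that plain addition would lose.
double kahan_sum(const std::vector<double>& v) {
    double sum = 0.0;
    double c = 0.0;           // running compensation for lost low-order bits
    for (double x : v) {
        double y = x - c;     // apply the correction to the next term
        double t = sum + y;   // low-order bits of y may be lost here
        c = (t - sum) - y;    // algebraically zero; captures what was lost
        sum = t;
    }
    return sum;
}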
I have a small piece of code that I would like to parallelize as I upscale. I've been using cilk_for from Cilk Plus to run the multithreading. The trouble is that I get a different result depending on the number of workers.
I've read that this might be due to a race condition, but I'm not sure what specifically about the code causes that or how to ameliorate it. Also, I realize that long and __float128 are overkill for this problem, but might be necessary in the upscaling.
Code:
#include <assert.h>
#include "cilk/cilk.h"
#include <cstring>
#include <iostream>
#include <math.h>
#include <stdio.h>
#include <string>
#include <vector>
using namespace std;
__float128 direct(const vector<double>& Rpct, const vector<unsigned>& values, double Rbase, double toWin) {
unsigned count = Rpct.size();
__float128 sumProb = 0.0;
__float128 rProb = 0.0;
long nCombo = static_cast<long>(pow(2, count));
// for (long j = 0; j < nCombo; ++j) { //over every combination
cilk_for (long j = 0; j < nCombo; ++j) { //over every combination
vector<unsigned> binary;
__float128 prob = 1.0;
unsigned point = Rbase;
for (unsigned i = 0; i < count; ++i) { //over all the individual events
long exp = static_cast<long>(pow(2, count-i-1));
bool odd = (j/exp) % 2;
if (odd) {
binary.push_back(1);
point += values[i];
prob *= static_cast<__float128>(Rpct[i]);
} else {
binary.push_back(0);
prob *= static_cast<__float128>(1.0 - Rpct[i]);
}
}
sumProb += prob;
if (point >= toWin) rProb += prob;
assert(sumProb >= rProb);
}
//print sumProb
cout << " sumProb = " << (double)sumProb << endl;
assert( fabs(1.0 - sumProb) < 0.01);
return rProb;
}
int main(int argc, char *argv[]) {
vector<double> Rpct;
vector<unsigned> value;
value.assign(20,1);
Rpct.assign(20,0.25);
unsigned Rbase = 22;
unsigned win = 30;
__float128 rProb = direct(Rpct, value, Rbase, win);
cout << (double)rProb << endl;
return 0;
}
Sample output for export CILK_NWORKERS=1 && ./code.exe:
sumProb = 1
0.101812
Sample output for export CILK_NWORKERS=4 && ./code.exe:
sumProb = 0.948159
Assertion failed: (fabs(1.0 - sumProb) < 0.01), function direct, file code.c, line 61.
Abort trap: 6
It is because of a race condition. cilk_for is an implementation of the parallel-for algorithm. If you want to use a parallel for, the iterations must be independent (independent data). This is very important. You have to use Cilk reducers for your case: https://www.cilkplus.org/tutorial-cilk-plus-reducers
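A minimal sketch of what a reducer looks like, assuming the reducer_opadd header shipped with the Cilk Plus runtime (see the linked tutorial for the exact API of your toolchain):
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <iostream>

int main() {
    const long n = 1L << 20;
    // Each worker accumulates into its own private view of the reducer;
    // the runtime merges the views when strands join, so there is no
    // data race on the shared variable.
    cilk::reducer_opadd<double> sum(0.0);
    cilk_for (long j = 0; j < n; ++j) {
        sum += 1.0; // race-free update through the reducer
    }
    std::cout << sum.get_value() << '\n'; // read the merged result after the loop
    return 0;
}
Note that with floating-point values tiny run-to-run rounding differences can remain, because the merge points depend on where the scheduler steals work.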
To clarify, there is at least one race on sumProb. Each of the parallel workers will do a read/modify/write on that location. As sribin mentioned above, solving problems like this is what reducers are for.
It's entirely possible that there's more than one race in your program. The only way to be sure is to run it under a race detector, since finding races is one of the things that computers are much better at than humans. A free possibility is the Cilkscreen race detector, available from the cilkplus.org website. Unfortunately it doesn't support gcc/g++.
I am trying to learn how to use clock(). Here is a piece of code that I have.
int main()
{
srand(time(NULL));
clock_t t;
int num[100000];
int total=0;
t=clock();
cout<<"tick:"<<t<<endl;
for (int i=0;i<100000;i++)
{
num[i]=rand();
//cout<<num[i]<<endl;
}
for(int j=0;j<100000;j++)
{
total+=num[j];
}
t=clock();
cout<<"total:"<<total<<endl;
cout<<"ticks after loop:"<<t<<endl;
//std::cout<<"The number of ticks for the loop to caluclate total:"<<t<<"\t time is seconds:"<<((float)t)/CLOCKS_PER_SEC<<endl;
cin.get();
}
The result that I get is in the image below. I don't understand why the tick counts are the same even though there are two big loops in between.
The clock() function has a finite resolution. On VC2013 it ticks once per millisecond. (Your system may vary.) If you call clock() twice within the same millisecond (or whatever the resolution is), you get the same value.
In <ctime> there is a constant CLOCKS_PER_SEC which tells you how many ticks there are per second. For VC2012 that is 1000.
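A quick way to observe the resolution on your own system is to spin until the value returned by clock() changes; a minimal sketch:
#include <ctime>
#include <iostream>

int main() {
    std::cout << "CLOCKS_PER_SEC = " << CLOCKS_PER_SEC << '\n';
    // Busy-wait until clock() ticks over, to see the smallest observable step
    clock_t start = clock();
    clock_t next = start;
    while (next == start)
        next = clock();
    std::cout << "Smallest observable step: " << (next - start) << " tick(s) = "
              << 1000.0 * (next - start) / CLOCKS_PER_SEC << " millisec\n";
    return 0;
}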
** Update 1 **
You said you're in Windows. Here's some Win-specific code that gets higher resolution time. If I get time I'll try to do something portable.
#include <iostream>
#include <vector>
#include <ctime>
#include <Windows.h>
int main()
{
::srand(::time(NULL));
FILETIME ftStart, ftEnd;
const int nMax = 1000*1000;
std::vector<unsigned> vBuff(nMax);
int nTotal=0;
::GetSystemTimeAsFileTime(&ftStart);
for (int i=0;i<nMax;i++)
{
vBuff[i]=rand();
}
for(int j=0;j<nMax;j++)
{
nTotal+=vBuff[j];
}
::GetSystemTimeAsFileTime(&ftEnd);
ULARGE_INTEGER uStart, uEnd; // combine both 32-bit halves so the difference cannot wrap
uStart.LowPart = ftStart.dwLowDateTime; uStart.HighPart = ftStart.dwHighDateTime;
uEnd.LowPart = ftEnd.dwLowDateTime; uEnd.HighPart = ftEnd.dwHighDateTime;
double dElapsed = (uEnd.QuadPart - uStart.QuadPart) / 10000.0; // FILETIME is in 100ns units
std::cout << "Elapsed time = " << dElapsed << " millisec\n";
return 0;
}
** Update 2 **
Ok, here's the portable version.
#include <iostream>
#include <vector>
#include <ctime>
#include <chrono>
#include <cstdint> // for uint64_t
// abbreviations to avoid long lines
typedef std::chrono::high_resolution_clock Clock_t;
typedef std::chrono::time_point<Clock_t> TimePoint_t;
typedef std::chrono::microseconds usec;
uint64_t ToUsec(Clock_t::duration t)
{
return std::chrono::duration_cast<usec>(t).count();
}
int main()
{
::srand(static_cast<unsigned>(::time(nullptr)));
const int nMax = 1000*1000;
std::vector<unsigned> vBuff(nMax);
int nTotal=0;
TimePoint_t tStart(Clock_t::now());
for (int i=0;i<nMax;i++)
{
vBuff[i]=rand();
}
for(int j=0;j<nMax;j++)
{
nTotal+=vBuff[j];
}
TimePoint_t tEnd(Clock_t::now());
uint64_t nMicroSec = ToUsec(tEnd - tStart);
std::cout << "Elapsed time = "
<< nMicroSec / 1000.0
<< " millisec\n";
return 0;
}
Strong suggestion:
Run the same benchmark, but try multiple, alternative methods. For example:
clock_gettime
/proc/pid/stat
GetProcessTimes
getrusage
Etc.
The problem with (POSIX-compliant) "clock()" is that it isn't necessarily accurate enough for meaningful benchmarks, depending on your compiler library/platform.
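For example, a minimal sketch using clock_gettime (POSIX only; on older glibc you may need to link with -lrt):
#include <time.h>
#include <iostream>

int main() {
    timespec tsStart, tsEnd;
    clock_gettime(CLOCK_MONOTONIC, &tsStart);
    // Dummy workload standing in for the code under test
    volatile long sink = 0;
    for (long i = 0; i < 10000000L; ++i)
        sink += i;
    clock_gettime(CLOCK_MONOTONIC, &tsEnd);
    double msec = (tsEnd.tv_sec - tsStart.tv_sec) * 1000.0
                + (tsEnd.tv_nsec - tsStart.tv_nsec) / 1000000.0;
    std::cout << "Elapsed time = " << msec << " millisec\n";
    return 0;
}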
Time has limited accuracy (perhaps only several milliseconds)... And on Linux, clock has been slightly improved in very recent libc. Lastly, your loop is too small (a typical elementary C instruction runs in less than a few nanoseconds). Make it bigger, e.g. do it a billion times. But then you should declare static int num[1000000000]; to avoid eating too much stack space.