I have an integer parameter which is supposed to control how many times in a particular run an event occurs.
For example, if the number of iterations for each run is 1000, then the parameter FREQ would be 5, if I wanted to have the event occur every 200 iterations. However, I want to be able to change the number of iterations, but keep the ratio the same, and also to be able to set the FREQ parameter to 0 if I don't want the event to occur at all.
Here is what I am currently using:
int N_ITER = 1000;
int FREQ = 5;
void doRun(){
int count = 0;
for (int i = 0; i < N_ITER; ++i){
if (FREQ > 0){
if (count < N_ITER/FREQ){
doSomething();
count++;
}
else{
doSomethingElse();
count = 0;
}
}
else{
doSomething();
}
}
}
This works fine, but the nested conditionals don't sit right with me, especially since doSomething() appears twice. I feel there should be a simpler way to accomplish this.
I tried making the one conditional if (FREQ > 0 && count < N_ITER/FREQ) but obviously that threw a Floating point exception because of the divide by zero.
I also tried using a try/catch block, but it really was no different, in terms of messiness, to using the nested conditionals. Is there a more elegant solution to this problem?
How about rearranging the condition? Instead of count < N_ITER/FREQ, use count*FREQ < N_ITER. If FREQ = 0, the expression will still be true.
int N_ITER = 1000;
int FREQ = 5;
void doRun() {
int count = 0;
for (int i = 0; i < N_ITER; ++i) {
if (count*FREQ < N_ITER) {
doSomething();
count++;
} else {
doSomethingElse();
count = 0;
}
}
}
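As a quick check of the rearranged condition, here is a minimal sketch that counts how often the event branch runs instead of calling doSomething() and doSomethingElse(). The names countEventsNested and countEventsFlat, the parameters, and the returned counter are assumptions added for illustration; they are not part of the original code. The product is widened to long long because count*FREQ could overflow an int for very large values.

```cpp
#include <cassert>

// Original shape from the question: nested conditionals guard the division.
// Returns how many times the event branch (doSomethingElse) would run.
int countEventsNested(int nIter, int freq) {
    assert(freq >= 0);
    int events = 0;
    int count = 0;
    for (int i = 0; i < nIter; ++i) {
        if (freq > 0) {
            if (count < nIter / freq) {
                ++count;      // stands in for doSomething()
            } else {
                ++events;     // stands in for doSomethingElse()
                count = 0;
            }
        }
        // freq == 0: only doSomething() would run, so no event is counted
    }
    return events;
}

// Rearranged shape from the answer: multiply instead of divide,
// so freq == 0 never triggers a division by zero.
int countEventsFlat(int nIter, int freq) {
    assert(freq >= 0);
    int events = 0;
    int count = 0;
    for (int i = 0; i < nIter; ++i) {
        if (static_cast<long long>(count) * freq < nIter) {
            ++count;          // stands in for doSomething()
        } else {
            ++events;         // stands in for doSomethingElse()
            count = 0;
        }
    }
    return events;
}
```

With N_ITER = 1000 and FREQ = 5, both versions report 4 events rather than 5, because the iteration that runs the event is itself counted in the total. The two conditions agree exactly when FREQ divides N_ITER; otherwise the multiplied form lets count run one step further before the event fires.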
I'm trying to fix a SIGSEGV error in my program, but I cannot locate where the error occurs. The program compiles successfully in Xcode but does not produce any results.
The goal of the program is to check whether the same element occurs in three separate arrays and to return the element if it occurs in two or more of them.
#include <iostream>
using namespace std;
int main()
{
int i = 0 ,j = 0,k = 0;
int a[5]={23,30,42,57,90};
int b[6]={21,23,35,57,90,92};
int c[5]={21,23,30,57,90};
while(i< 5 or j< 6 or k< 5)
{
int current_a = 0;
int current_b = 0;
int current_c = 0;
{ if (i<5) {
current_a = a[i];
} else
{
;;
}
if (j<6)
{
current_b = b[j];
} else
{
;;
}
if (k<5)
{
current_c= c[k];
} else
{
;;
}
}
int minvalue = min((current_a,current_b),current_c);
int countoo = 0;
if (minvalue==current_a)
{
countoo += 1;
i++;
}
if (minvalue==current_b)
{
countoo +=1;
j++;
}
if (minvalue==current_c)
{
countoo += 1;
k++;
}
if (countoo >=2)
{
cout<< minvalue;
}
}
}
I am not getting any output for the code.
This is surely not doing what you want
int minvalue = min((current_a,current_b),current_c);
If min() is defined meaningfully (you really should provide an MCVE for a question like this), you want
int minvalue = min(min(current_a,current_b),current_c);
This yields the minimum of (the minimum of a and b) and c, i.e. the minimum of all three, instead of the minimum of b and c. The comma operator is the key to understanding this: (current_a,current_b) evaluates current_a, discards it, and yields current_b.
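To make the difference concrete, here is a small sketch of the two expressions side by side. The function names minWithComma and minOfThree are made up for illustration, and std::min from <algorithm> is assumed to be the min() the question relies on.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the checks mentioned below

// What the question's expression computes: (a, b) is the comma operator,
// which evaluates a, discards it, and yields b. The (void) cast only
// silences the compiler's "unused value" warning.
int minWithComma(int a, int b, int c) {
    return std::min(((void)a, b), c);
}

// What was presumably intended: the minimum of all three values.
int minOfThree(int a, int b, int c) {
    return std::min(std::min(a, b), c);
}
```

For example, assert(minWithComma(1, 5, 3) == 3) holds because the first argument is ignored, while assert(minOfThree(1, 5, 3) == 1) holds as expected.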
This seems to be a flag/counter meant to record or count something across loop iterations:
int countoo = 0;
However, that cannot work if the variable is defined inside the loop.
You need to move that line BEFORE the while.
This line does not keep the indexes within the bounds of the arrays,
which is very likely the source of the segfault:
while(i< 5 or j< 6 or k< 5)
To prevent segfaults, make sure that ALL indexes stay small enough,
instead of only at least one:
while(i< 5 && j< 6 && k< 5)
(By the way, I initially seriously doubted that or would compile. I thought
it might work through a macro, but I do not see one here. It turns out that or is a standard alternative token for || in C++, so no macro is needed.
I learned something here.)
This should fix the segfault.
To achieve the goal of the code I think you need to spend some additional effort on the algorithm. I do not see the code being related to the goal.
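One possible way to rework the algorithm is a three-way merge over the sorted arrays, sketched below. This is not the asker's code: the function name inAtLeastTwo, the use of std::vector, and the INT_MAX sentinel for an exhausted array are all assumptions made for illustration. The sketch assumes each array is sorted in ascending order, contains no duplicates, and never holds INT_MAX itself.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Returns the values that occur in at least two of three sorted arrays.
std::vector<int> inAtLeastTwo(const std::vector<int>& a,
                              const std::vector<int>& b,
                              const std::vector<int>& c) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    assert(std::is_sorted(c.begin(), c.end()));

    std::vector<int> result;
    std::size_t i = 0, j = 0, k = 0;
    while (i < a.size() || j < b.size() || k < c.size()) {
        // An exhausted array contributes INT_MAX so it can never be the minimum.
        const int ca = i < a.size() ? a[i] : INT_MAX;
        const int cb = j < b.size() ? b[j] : INT_MAX;
        const int cc = k < c.size() ? c[k] : INT_MAX;
        const int smallest = std::min({ca, cb, cc});

        // Count how many arrays currently show the minimum and advance them.
        int hits = 0;
        if (i < a.size() && ca == smallest) { ++hits; ++i; }
        if (j < b.size() && cb == smallest) { ++hits; ++j; }
        if (k < c.size() && cc == smallest) { ++hits; ++k; }

        if (hits >= 2) {
            result.push_back(smallest);
        }
    }
    return result;
}
```

Note that the per-iteration counter is deliberately reset on every pass here, since it counts how many arrays share the current minimum. Each pass advances at least one index, so the loop terminates even though it uses || in its condition.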
I picked up "Programming Principles and Practice using C++", and was doing an early problem involving the Sieve of Eratosthenes, and I'm having unexpected output, but I cannot pin down exactly what the problem is. Here is my code:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> prime;
std::vector<int> nonPrime;
int multiple = 0;
for(int i = 2; i < 101; i++) //initialized to first prime number, i will
// be the variable that should contain prime numbers
{
for(int j = 0; j < nonPrime.size(); j++) //checks i against
// vector to see if
// marked as nonPrime
{
if(i == nonPrime[j])
{
goto outer;//jumps to next iteration if number
// is on the list
}
}
prime.push_back(i); //adds value of i to Prime vector if it
//passes test
for(int j = i; multiple < 101; j++) //This loop is where the
// sieve bit comes in
{
multiple = i * j;
nonPrime.push_back(multiple);
}
outer:
;
}
for(int i = 0; i < prime.size(); i++)
{
std::cout << prime[i] << std::endl;
}
return 0;
}
The exercise currently only asks me to find the prime numbers up to 100 using this method. I tried the 'goto' approach shown here to skip out of the double loop under certain conditions, and I also tried a Boolean flag with an if statement right after the check loop plus a "continue;" statement; neither had any effect.
(Honestly, I figured that since people say goto is evil, perhaps it had consequences I hadn't foreseen, which is why I tried to switch it out.) The problem doesn't call for separate functions, so I assume it wants me to solve it all in main, hence my problem with nested loops in main. To be more specific about my output issue: it seems that only multiples of 2 are added to the nonPrime vector, and everything else passes the test (e.g. 9).
Can someone help me understand where I went wrong?
Although this is not a good way to implement a Sieve of Eratosthenes, I'll point out some changes to your code so that it at least outputs the correct sequence.
Please also note that the indentation you chose is a bit misleading after the first inner loop.
#include <iostream>
#include <vector>
int main()
{
std::vector<int> prime;
std::vector<int> nonPrime;
int multiple = 0;
for(int i = 2; i < 101; i++)
{
// you can use a flag, but note that usually it could be more
// efficiently implemented with a vector of bools. Try it yourself
bool is_prime = true;
for(int j = 0; j < nonPrime.size(); j++)
{
if(i == nonPrime[j])
{
is_prime = false;
break;
}
}
if ( is_prime )
{
prime.push_back(i);
// You tested 'multiple' before initializing it for every
// new prime value
for(multiple = i; multiple < 101; multiple += i)
{
nonPrime.push_back(multiple);
}
}
}
for(int i = 0; i < prime.size(); i++)
{
std::cout << prime[i] << std::endl;
}
return 0;
}
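The comment in the code above suggests a vector of bools as a more efficient representation. A minimal sketch of that variant might look like the following; the function name primesUpTo and the limit parameter are assumptions for illustration, and starting the marking at i * i is a common refinement rather than something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes using one flag per number instead of a list
// of non-primes that has to be searched for every candidate.
std::vector<int> primesUpTo(int limit) {
    assert(limit >= 0);
    std::vector<int> primes;
    if (limit < 2) {
        return primes;
    }
    // composite[n] == true means n has been marked as a multiple of a prime.
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    for (int i = 2; i <= limit; ++i) {
        if (composite[i]) {
            continue;                 // already marked, so not prime
        }
        primes.push_back(i);
        // Smaller multiples of i were already marked by smaller primes.
        for (long long m = static_cast<long long>(i) * i; m <= limit; m += i) {
            composite[static_cast<std::size_t>(m)] = true;
        }
    }
    return primes;
}
```

Calling primesUpTo(100) returns the 25 primes from 2 to 97, and checking whether a number is marked becomes a single index lookup rather than a scan of the nonPrime vector.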
I am trying to evaluate a statistics problem via a Monte Carlo method. In this problem I generate a random number and compare it to a fixed probability stored in a vector titled comms_reliability. If the vector holds only one value, I compare the random number with that probability and tally the result if the random number is greater than the reliability number. However, the vector could also hold two values, in which case I produce two random numbers and compare them to the two reliability numbers. If both random numbers are bigger than the corresponding reliability numbers, I tally the scenario. Theoretically this could continue for as many values in the vector as I want. However, through a failure of imagination I only know how to code this with the for statement wrapped in a separate if statement for each possible scenario.
In this implementation I have to copy the same lines of code multiple times, and the comms_reliability sizes that can be evaluated are limited by how many times I have copied these lines to handle the next element. How can I do this with only one if statement? An example of how I currently have it coded is shown below.
int main(int argc, const char * argv[]) {
int sample_size = 1000000;
std::vector<float> comms_reliability = {0.6,0.6};
float tally = 0.0;
// rang() = random number generator
// if statement for comms_reliability array of size 1
if (comms_reliability.size() == 1) {
for (int i = 0; i < sample_size; i++){
if (rang() > comms_reliability[0]) tally = tally + 1.0;
}
}
// if statement 2 for comms_reliability array of size 2
if (comms_reliability.size() == 2) {
for (int i = 0; i < sample_size; i++){
if (rang() > comms_reliability[0] && rang() > comms_reliability[1]) tally = tally + 1.0;
}
}
// if statement 3 for comms_reliability array of size 3
if (comms_reliability.size() == 3) {
for (int i = 0; i < sample_size; i++){
if (rang() > comms_reliability[0] && rang() > comms_reliability[1] &&
rang() > comms_reliability[2]) tally = tally + 1.0;
}
}
If I understand you correctly you want to make sure that all elements of comms_reliability satisfy some criterion (namely being less than rang()) for each sample.
So make a loop over all elements and test each, or just use std::all_of:
// Lambda function used to test a single comm_reliability
auto is_reliable = [] (float r) { return rang() > r; };
// Iterate over your samples
for (int i = 0; i < sample_size; ++i) {
// If all elements satisfy your criterion ...
if (std::all_of(std::begin(comms_reliability),
std::end(comms_reliability),
is_reliable)) {
// .. perform your action
tally += 1.0;
}
}
Instead of the lambda function you could also use a normal function defined somewhere before:
bool is_reliable(float r) {
return rang() > r;
}
Note: Try to improve your variable/function naming.
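For completeness, here is a self-contained sketch of the std::all_of approach. The question does not show how rang() is implemented, so this sketch assumes a uniform generator on [0, 1) built from std::mt19937; the function name countAllExceed, the seed parameter, and the integer tally are likewise assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Counts the samples in which every reliability value is exceeded by a
// freshly drawn random number. Works for any number of reliability values.
long countAllExceed(const std::vector<float>& reliability,
                    long sampleSize,
                    unsigned seed) {
    assert(sampleSize >= 0);
    std::mt19937 gen(seed);                                  // assumed stand-in for rang()
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // Draws one random number per reliability value that gets tested.
    auto exceeds = [&](float r) { return dist(gen) > r; };

    long tally = 0;
    for (long s = 0; s < sampleSize; ++s) {
        // all_of stops at the first failing element, like the && chain did.
        if (std::all_of(reliability.begin(), reliability.end(), exceeds)) {
            ++tally;
        }
    }
    return tally;
}
```

With reliabilities {0.6, 0.6} the tally should come out near 16% of the samples, since each comparison succeeds with probability of about 0.4. An empty vector makes std::all_of return true for every sample, so callers may want to handle that case explicitly.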
Use a flag to keep track of the result:
int main(int argc, const char * argv[]) {
int sample_size = 1000000;
std::vector<float> comms_reliability = {0.6,0.6};
float tally = 0.0;
// rang() = random number generator
for (int i = 0; i < sample_size; i++){
bool flag = true;
for(int j = 0; j < comms_reliability.size(); j++)
{
if (rang() <= comms_reliability[j])
{
flag = false;
break;
}
}
tally = flag ? tally + 1.0 : tally;
}
I am trying to solve this problem:
Given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share common letters. You may assume that each word will contain only lower case letters. If no such two words exist, return 0.
https://leetcode.com/problems/maximum-product-of-word-lengths/
You can create a bitmap of char for each word to check if they share chars in common and then calc the max product.
I have two nearly identical methods, but the first passes the checks while the second is too slow. Can you see why?
class Solution {
public:
int maxProduct2(vector<string>& words) {
int len = words.size();
int *num = new int[len];
// compute the bit O(n)
for (int i = 0; i < len; i ++) {
int k = 0;
for (int j = 0; j < words[i].length(); j ++) {
k = k | (1 <<(char)(words[i].at(j)));
}
num[i] = k;
}
int c = 0;
// O(n^2)
for (int i = 0; i < len - 1; i ++) {
for (int j = i + 1; j < len; j ++) {
if ((num[i] & num[j]) == 0) { // if no common letters
int x = words[i].length() * words[j].length();
if (x > c) {
c = x;
}
}
}
}
delete []num;
return c;
}
int maxProduct(vector<string>& words) {
vector<int> bitmap(words.size());
for(int i=0;i<words.size();++i) {
int k = 0;
for(int j=0;j<words[i].length();++j) {
k |= 1 << (char)(words[i][j]);
}
bitmap[i] = k;
}
int maxProd = 0;
for(int i=0;i<words.size()-1;++i) {
for(int j=i+1;j<words.size();++j) {
if ( !(bitmap[i] & bitmap[j])) {
int x = words[i].length() * words[j].length();
if ( x > maxProd )
maxProd = x;
}
}
}
return maxProd;
}
};
Why is the second function (maxProduct) too slow for LeetCode?
Solution
The second method calls words.size() repeatedly. If you save that value in a variable, it works fine.
Since my comment turned out to be correct I'll turn my comment into an answer and try to explain what I think is happening.
I wrote some simple code to benchmark on my own machine with two solutions of two loops each. The only difference is the call to words.size() is inside the loop versus outside the loop. The first solution is approximately 13.87 seconds versus 16.65 seconds for the second solution. This isn't huge, but it's about 20% slower.
Even though vector.size() is a constant-time operation, that doesn't mean it's as fast as checking against a variable that's already in a register. Constant time can still have large variances, and inside nested loops that adds up.
The other thing that could be happening (someone much smarter than me will probably chime in and let us know) is that you're hurting CPU optimizations like branch prediction and pipelining. Every time it gets to the end of the loop it has to stop, wait for the call to size() to return, and then check the loop variable against that return value. If the CPU can look ahead and guess that j is still going to be less than len because it hasn't seen len change (len isn't even inside the loop!), it can make a good branch prediction each time and not have to wait.
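A minimal sketch of the hoisted version is shown below, written as a free function named maxProductHoisted (a name made up here). Two details are my own assumptions rather than part of the answers: the bit index is ch - 'a' instead of the raw character code, because shifting a 32-bit int by 97 or more is undefined behaviour, and the outer loop tests i + 1 < n so that an empty input does not underflow the unsigned size. Whether the hoisting changes the timing will depend on the compiler and optimisation level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

int maxProductHoisted(const std::vector<std::string>& words) {
    const std::size_t n = words.size();      // read the size once, outside the loops

    // One 26-bit mask per word: bit k is set if the word contains 'a' + k.
    std::vector<unsigned> mask(n, 0u);
    for (std::size_t i = 0; i < n; ++i) {
        for (char ch : words[i]) {
            assert(ch >= 'a' && ch <= 'z');  // the problem guarantees lower case
            mask[i] |= 1u << (ch - 'a');
        }
    }

    int best = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if ((mask[i] & mask[j]) == 0) {  // no letters in common
                const int product =
                    static_cast<int>(words[i].length() * words[j].length());
                if (product > best) {
                    best = product;
                }
            }
        }
    }
    return best;
}
```

For the word list {"abcw", "baz", "foo", "bar", "xtfn", "abcdef"} this returns 16, from the pair "abcw" and "xtfn".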
I'm trying to get a good understanding of branch prediction by measuring the time to run loops with predictable branches vs. loops with random branches.
So I wrote a program that takes large arrays of 0's and 1's arranged in different orders (i.e. all 0's, repeating 0-1, all rand), and iterates through the array branching based on if the current index is 0 or 1, doing time-wasting work.
I expected that harder-to-guess arrays would take longer to run on, since the branch predictor would guess wrong more often, and that the time-delta between runs on two sets of arrays would remain the same regardless of the amount of time-wasting work.
However, as the amount of time-wasting work increased, the difference in time-to-run between arrays increased, A LOT.
(Graph: X-axis is the amount of time-wasting work, Y-axis is time-to-run.)
Does anyone understand this behavior? Here is the code I'm running:
#include <stdlib.h>
#include <time.h>
#include <chrono>
#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;
static const int s_iArrayLen = 999999;
static const int s_iMaxPipelineLen = 60;
static const int s_iNumTrials = 10;
int doWorkAndReturnMicrosecondsElapsed(int* vals, int pipelineLen){
int* zeroNums = new int[pipelineLen];
int* oneNums = new int[pipelineLen];
for(int i = 0; i < pipelineLen; ++i)
zeroNums[i] = oneNums[i] = 0;
chrono::time_point<chrono::system_clock> start, end;
start = chrono::system_clock::now();
for(int i = 0; i < s_iArrayLen; ++i){
if(vals[i] == 0){
for(int i = 0; i < pipelineLen; ++i)
++zeroNums[i];
}
else{
for(int i = 0; i < pipelineLen; ++i)
++oneNums[i];
}
}
end = chrono::system_clock::now();
int elapsedMicroseconds = (int)chrono::duration_cast<chrono::microseconds>(end-start).count();
//This should never fire, it just exists to guarantee the compiler doesn't compile out our zeroNums/oneNums
for(int i = 0; i < pipelineLen - 1; ++i)
if(zeroNums[i] != zeroNums[i+1] || oneNums[i] != oneNums[i+1])
return -1;
delete[] zeroNums;
delete[] oneNums;
return elapsedMicroseconds;
}
struct TestMethod{
string name;
void (*func)(int, int&);
int* results;
TestMethod(string _name, void (*_func)(int, int&)) { name = _name; func = _func; results = new int[s_iMaxPipelineLen]; }
};
int main(){
srand( (unsigned int)time(nullptr) );
vector<TestMethod> testMethods;
testMethods.push_back(TestMethod("all-zero", [](int index, int& out) { out = 0; } ));
testMethods.push_back(TestMethod("repeat-0-1", [](int index, int& out) { out = index % 2; } ));
testMethods.push_back(TestMethod("repeat-0-0-0-1", [](int index, int& out) { out = (index % 4 == 0) ? 0 : 1; } ));
testMethods.push_back(TestMethod("rand", [](int index, int& out) { out = rand() % 2; } ));
int* vals = new int[s_iArrayLen];
for(int currentPipelineLen = 0; currentPipelineLen < s_iMaxPipelineLen; ++currentPipelineLen){
for(int currentMethod = 0; currentMethod < (int)testMethods.size(); ++currentMethod){
int resultsSum = 0;
for(int trialNum = 0; trialNum < s_iNumTrials; ++trialNum){
//Generate a new array...
for(int i = 0; i < s_iArrayLen; ++i)
testMethods[currentMethod].func(i, vals[i]);
//And record how long it takes
resultsSum += doWorkAndReturnMicrosecondsElapsed(vals, currentPipelineLen);
}
testMethods[currentMethod].results[currentPipelineLen] = (resultsSum / s_iNumTrials);
}
}
cout << "\t";
for(int i = 0; i < s_iMaxPipelineLen; ++i){
cout << i << "\t";
}
cout << "\n";
for (int i = 0; i < (int)testMethods.size(); ++i){
cout << testMethods[i].name.c_str() << "\t";
for(int j = 0; j < s_iMaxPipelineLen; ++j){
cout << testMethods[i].results[j] << "\t";
}
cout << "\n";
}
int end;
cin >> end;
delete[] vals;
}
Pastebin link: http://pastebin.com/F0JAu3uw
I think you may be measuring cache/memory performance more than branch prediction. Your inner 'work' loop is accessing an ever-increasing chunk of memory, which may explain the linear growth, the periodic behaviour, etc.
I could be wrong, as I've not tried replicating your results, but if I were you I'd factor out memory accesses before timing other things. Perhaps sum one volatile variable into another, rather than working in an array.
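As a rough sketch of that suggestion, the work loop could add into two volatile scalars instead of walking arrays, so the amount of memory touched no longer grows with the work size. This is not the code behind my own test; the names WorkResult and doWorkNoArrays are hypothetical, and std::chrono::steady_clock is used here as an assumption because it is the monotonic clock intended for measuring intervals.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct WorkResult {
    long zeroWork;      // total increments done on the "zero" branch
    long oneWork;       // total increments done on the "one" branch
    long long micros;   // elapsed wall time in microseconds
};

WorkResult doWorkNoArrays(const std::vector<int>& vals, int workPerElement) {
    assert(workPerElement >= 0);
    // volatile keeps the compiler from removing or collapsing the increments.
    volatile long zeroSum = 0;
    volatile long oneSum = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int v : vals) {
        if (v == 0) {
            for (int w = 0; w < workPerElement; ++w) {
                zeroSum = zeroSum + 1;
            }
        } else {
            for (int w = 0; w < workPerElement; ++w) {
                oneSum = oneSum + 1;
            }
        }
    }
    const auto end = std::chrono::steady_clock::now();

    const long long micros = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count());
    const long zeros = zeroSum;
    const long ones = oneSum;
    return {zeros, ones, micros};
}
```

The returned counters give a cheap sanity check that the work was really performed, which serves the same purpose as the final consistency loop in the original program.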
Note also that, depending on the CPU, the branch prediction can be a lot smarter than just recording the last time a branch was taken - repeating patterns, for example, aren't as bad as random data.
Ok, a quick and dirty test I knocked up on my tea break which tried to mirror your own test method, but without thrashing the cache, looks like this:
Is that more what you expected?
If I can spare any time later there's something else I want to try, as I've not really looked at what the compiler is doing...
Edit:
And, here's my final test - I recoded it in assembler to remove the loop branching, ensure an exact number of instructions in each path, etc.
I also added an extra case, of a 5-bit repeating pattern. It seems pretty hard to upset the branch predictor on my ageing Xeon.
In addition to what JasonD pointed out, I would also like to note that there are conditions inside the for loop itself, which may affect branch prediction:
if(vals[i] == 0)
{
for(int i = 0; i < pipelineLen; ++i)
++zeroNums[i];
}
i < pipelineLen; is a condition just like your ifs. Of course the compiler may unroll this loop; however, pipelineLen is an argument passed to the function, so it probably does not.
I'm not sure if this can explain the wavy pattern of your results, but:
Since the BTB is only 16 entries long in the Pentium 4 processor, the prediction will eventually fail for loops that are longer than 16 iterations. This limitation can be avoided by unrolling a loop until it is only 16 iterations long. When this is done, a loop conditional will always fit into the BTB, and a branch misprediction will not occur on loop exit. The following is an example of loop unrolling:
Read full article: http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts
So your loops are not only measuring memory throughput; they are also affecting the BTB.
If you pass the 0-1 pattern in your list and then execute a for loop with pipelineLen = 2, your BTB will be filled with something like 0-1-1-0 - 1-1-1-0 - 0-1-1-0 - 1-1-1-0 and then it will start to overlap, so this can indeed explain the wavy pattern of your results (some overlaps will be more harmful than others).
Take this as an example of what may happen rather than a literal explanation. Your CPU may have a much more sophisticated branch prediction architecture.
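To illustrate what unrolling means for a loop like the ++zeroNums[i] one, here is a hand-written sketch. It is not the example from the Intel article, which is not reproduced here; the function name incrementAllUnrolled4 and the unroll factor of 4 are arbitrary choices for illustration.

```cpp
#include <cassert>

// Increments every element of nums[0..len). The main loop handles four
// elements per iteration, so the loop condition is evaluated roughly a
// quarter as often as in the straightforward one-element-per-pass loop.
void incrementAllUnrolled4(int* nums, int len) {
    assert(len >= 0);
    int i = 0;
    for (; i + 3 < len; i += 4) {
        ++nums[i];
        ++nums[i + 1];
        ++nums[i + 2];
        ++nums[i + 3];
    }
    // Remainder loop for the last len % 4 elements.
    for (; i < len; ++i) {
        ++nums[i];
    }
}
```

An optimising compiler may perform this transformation on its own, so it is worth inspecting the generated assembly before unrolling by hand.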