Most of the for loops I have read or written start at 0. To be fair, most of the code I have read is for embedded systems and was written in C/C++, and in embedded systems readability is sometimes less important than code efficiency. Therefore, I am not sure which of the following versions would be the better choice:
Version 1
for(i = 0; i < allowedNumberOfIteration; i++)
{
//something that may take from 1 iteration to allowedNumberOfIteration before it happens
if(somethingHappened)
{
if(i + 1 > maxIteration)
{
maxIteration = i + 1;
}
}
}
Version 2
for(i = 1; i <= allowedNumberOfIteration; i++)
{
//something that may take from 1 iteration to allowedNumberOfIteration before it happens
if(somethingHappened)
{
if(i > maxIteration)
{
maxIteration = i;
}
}
}
Why the first version is better, in my opinion:
1. Most loops start at 0, so experienced programmers may find it more natural when the loop starts at 0.
Why the second version is better, in my opinion:
To be fair, if an array were used in this function, starting at 0 would be great, because array indexes start at zero. But no arrays are used in this part of the code.
Besides, the second version looks simpler because you do not have to think about the '+1'.
Things I do not know
1) Is there any performance difference?
2) Which version is better?
3) Are there any other aspects that should be considered in deciding the starting point?
4) Am I worrying too much?
1) No
2) Neither
3) Arrays in C and C++ are zero-based.
4) Yes.
Arrays of all forms in C++ are zero-based, i.e. their indexes start at zero and go up to the size of the array minus one. For example, an array of five elements has the indexes 0 to 4 (inclusive).
That is why most loops in C++ start at zero.
As for your specific list of questions: for 1, there might be a performance difference. If you start a loop at 1, then you might need to subtract 1 in each iteration if you use the value as an array index. Or, if you instead increase the size of the arrays, you use more memory.
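For illustration, here is a minimal, self-contained sketch of that index adjustment (the array and the summing are just an example, not taken from the question):

#include <iostream>

int main() {
    int data[5] = {10, 20, 30, 40, 50};

    // Conventional: the loop variable is the array index directly.
    int sum0 = 0;
    for (int i = 0; i < 5; ++i)
        sum0 += data[i];

    // Starting at 1: an extra "- 1" is needed on every access (or a larger array).
    int sum1 = 0;
    for (int i = 1; i <= 5; ++i)
        sum1 += data[i - 1];

    std::cout << sum0 << " " << sum1 << "\n"; // both print 150
}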
For 2, it really depends on what you're iterating over. If it is over array indexes, then starting the loop at zero is clearly better. But you might need to start a loop at any value; it really depends on what you're doing and the problem you're trying to solve.
For 3, what you need to consider is what you're using the loop for.
And 4, maybe a little. ;)
This argument comes from a small, 3-page note by the famous computer scientist Dijkstra (the one from Dijkstra's algorithm). In it, he lays out the reasons we might index starting at zero, and the story begins with trying to iterate over a sequence of natural numbers (meaning a sequence on the number line 0, 1, 2, 3, ...).
There are 4 possibilities to denote the sequence 2, 3, ..., 12.
a.) 2 <= i < 13
b.) 1 < i <= 12
c.) 2 <= i <= 12
d.) 1 < i < 13
He mentions that a.) and b.) have the advantage that the difference of the two bounds equals the number of elements in the sequence. He also mentions if two sequences are adjacent, the upper bound of one equals the lower bound of the other. He says this doesn't help decide between a.) or b.) so he will start afresh.
He immediately removes b.) and d.) from the list since, if we were to start a natural sequence with zero, they would have bounds outside the natural numbers (-1), which is "ugly". He completes the observation by saying we prefer <= for the lower bound -- leaving us with a.) and c.).
For an empty sequence, he notes that b.) and c.) would have -1 as the upper bound, which is also "ugly".
All three of these observations lead to the convention of representing a sequence of natural numbers with a.), and that is indeed how most people write a for loop that goes over an array: for(int i = 0; i < size; ++i). We include the lower bound (0 <= i), and we exclude the upper bound (i < size).
If you were to use something like for(int i = 0; i <= iterations - 1; ++i) to run iterations times, you can see the ugliness he refers to in the case of the empty set: iterations - 1 would be -1 for zero iterations.
So, by convention, we use a.), and because arrays are indexed from zero, we start a huge number of for loops with i = 0. Then we reason by parsimony: we might as well do different things the exact same way if there is no other reason to do one of them differently.
Now, if we were to use a.) with 1-based indexing into an array instead of 0-based indexing, we would get for(int i = 1; i < size + 1; ++i). The + 1 is "ugly", so we prefer to start our range with i = 0.
In conclusion, you should do a for iterations times with for(int i = 0; i < iterations; ++i). Something like for(int i = 1; i <= iterations; ++i) is fairly understandable and works, but is there any good reason to add a different way to loop iterations times? Just use the same pattern as when indexing an array. In other words, use 0 <= i < size. Worse, the loop based on 1 <= i <= iterations doesn't have all the reasons Dijkstra came up with to support using 0 <= i < iterations as a convention.
You're not worrying too much. In fact, Dijkstra himself wondered the exact same question as has pretty much any serious programmer. Tuning your style like a craftsman who loves their trade is the ground a great programmer stands on. Pursuing parsimony and writing code the way others tend to write code (including yourself - the looping of an array!) are both sane, great things to pursue.
Due to this convention, when I see for(i = 1, I notice a departure from a convention. I am then more cautious around that code, thinking the logic within the for might depend on starting at 1 instead of 0. This is slight, but there's no reason to add that possibility when a convention is so widely used. If you happen to have a large for body, this complaint becomes less slight.
To understand why starting at one makes no sense, consider taking the argument to its natural conclusion - the argument of "but it makes sense to me!": You can start i at anything! If we free ourselves from convention, why not loop for(int i = 5; i <= iterations + 4; ++i)? Or for(int i = -5; i > -iterations - 5; --i)? Just do it the way a majority of programmers do in the majority of cases, and save being different for when there's a good reason - the difference signals to the programmer reading your code that the body of the for contains something unusual. With the standard way, we know the for is either indexing/ordering/doing arithmetic with a sequence starting at 0 or executing some logic iterations times in a row.
Note how prevalent this convention is too. In C++, every standard container iterates between [start, end), which corresponds to a.) above. There, they do it so that the end condition can be iter != end, but the fact that we already do the logic one way and that that one way has no immediate drawbacks flows naturally into the argument of "Why do it two different ways when we already do it this way in this context?" In his little paper, Dijkstra also notes a language called Mesa that can do a.), b.), c.), or d.) with particular syntax. He claims that there, a.) has won out in practice, and the others are associated with the cause of bugs. He then laments how FORTRAN indexes at 1 and how PASCAL took on c.) by convention.
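Just to illustrate how natural the [start, end) convention feels in practice, here is a tiny, self-contained example of my own (not Dijkstra's): the same loop shape handles a filled container and an empty one without any special cases.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> values{10, 20, 30};

    // [begin, end): the end iterator is excluded, exactly like convention a.)
    for (auto it = values.begin(); it != values.end(); ++it)
        std::cout << *it << '\n';

    std::vector<int> empty;
    // The empty range needs no special handling: begin() == end(), so the body never runs.
    for (auto it = empty.begin(); it != empty.end(); ++it)
        std::cout << *it << '\n';
}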
class Solution {
public:
vector<vector<int>> threeSum(vector<int>& nums) {
vector<int> v;
vector<vector<int>> ans;
int n=nums.size();
sort(nums.begin(),nums.end());
for(int i=0;i<n;i++){
for(int j=i+1;j<n;j++){
for(int k=j+1;k<n;k++){
if(nums[i]+nums[j]+nums[k]==0 && i!=j && i!=k && j!=k){
v.push_back(nums[i]);
v.push_back(nums[j]);
v.push_back(nums[k]);
ans.push_back(v);
}
}
}
}
return ans;
}
};
It is not showing an error, but it is displaying the wrong answer, as I have given in the attachment:
Input: [-1, 0, 1, 2, -1, 4]
Your output: [[-1, -1, 2], [-1, -1, 2, -1, 0, 1], [-1, -1, 2, -1, 0, 1, -1, 0, 1]]
Expected output: [[-1, -1, 2], [-1, 0, 1]]
I can understand the problem with pushing back more and more values into my vector v. OK.
But maybe, somebody could give me a hint on how to tackle the problem with the duplicates?
Any help for me as a new user is highly welcome and appreciated.
Of course, we will help you here on SO.
Starting with a new language is never that easy, and there may be some things that are not immediately clear in the beginning. Additionally, I apologize for any rude comments that you may see, but you can be assured that the vast majority of the members of SO are very supportive.
First, I want to give you some information on pages like Leetcode, Codeforces and the like, often referred to as "competitive programming" pages. Sometimes people misunderstand this and think that you have only a limited time to submit the code. But that is not the case. There are such competitions, but usually not on the mentioned pages. The bad thing is that the coding style used in real competition events is also used on these online pages. And that is really bad, because this coding style is so horrible that no serious developer would survive one day in a real company that needs to earn money with software and is then liable for it.
So, these pages will never teach you or guide you how to write good C++ code. And even worse, if newbies start learning the language and see this bad code, then they learn bad habits.
But what is then the purpose of such pages?
The purpose is to find a good algorithm, mostly optimized for runtime execution speed and often also for low memory consumption.
So, they are aiming at a good design. The language or coding style does not matter to them. You can submit even completely obfuscated code or "code golf" solutions; as long as it is fast, it does not matter.
So, never start to code immediately as a first step. First, think for 3 days. Then take some design tool, for example a piece of paper, and sketch a design. Then refactor your design, and refactor it again, and again, and so on. This may take a week.
Next, search for an appropriate programming language that you know and that can handle your design.
And finally, start coding. Because you did a good design beforehand, you can use long and meaningful variable names and write many, many comments, so that other people (and you, after one month) can understand your code AND your design.
OK, maybe understood.
Now, let's analyze your code. You selected a brute-force solution with a triple nested loop. That could work for a low number of elements, but in most cases it will result in a so-called TLE (Time Limit Exceeded) error. Nearly all problems on those pages cannot be solved with brute force. Brute-force solutions are always an indicator that you did not do the above design steps. And this leads to additional bugs.
Your code has two major semantic bugs.
You define at the beginning a std::vector with the name "v". Then, in the loop, after you find a triplet meeting the given condition, you push_back the results into this std::vector. This means you add 3 values to the std::vector "v", and now there are 3 elements in it. In the next loop run, after finding the next fit, you again push_back 3 additional values into your std::vector "v", and now there are 6 elements in it. In the next round 9 elements, and so on.
How to solve that?
You could use the std::vector's clear function to delete the old elements from the std::vector, for example at the beginning of the innermost loop. But that is basically not that good and, additionally, time consuming. Better is to follow the general idiom of defining variables as late as possible, at the point where they are needed. So, if you defined your std::vector "v" only within the if statement, the problem would be gone. But then you would also notice that it is used only there and nowhere else, and hence you do not need it at all.
You may have seen that you can add values to a std::vector by using an initializer list. Something like:
std::vector<int> v {1,2,3};
With that know-how, you can delete your std::vector “v” and all related code and directly write:
ans.push_back( { nums[i], nums[j], nums[k] } );
Then you save 3 unnecessary push_back (and clear) operations, and, more importantly, you no longer get result sets with more than 3 elements.
Next problem: duplicates. You try to prevent the storage of duplicates by writing && i!=j && i!=k && j!=k. But this will not work in general, because you compare indices and not values, and because the comparison itself is wrong. The Boolean expression is a tautology: it is always true. You initialize your variable j with i+1, and therefore "i" can never be equal to "j". So the condition i != j is always true. The same is valid for the other variables.
But how do you prevent duplicate entries? You could do some logical comparisons, or first store all the triplets and later use std::unique (or other functions) to eliminate duplicates, or use a container that only stores unique elements, like a std::set. For the given design, which has a time complexity of O(n^3) and is therefore already extremely slow, adding a std::set will not make things noticeably worse; I checked that in a small benchmark. Still, the real fix is a completely different design. We will come to that later. Let us first fix the code, still using the brute-force approach.
Please look at the somewhat short and elegant solution below.
vector<vector<int>> threeSum(vector<int>& nums) {
std::set<vector<int>> ans;
int n = nums.size();
sort(nums.begin(), nums.end());
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
for (int k = j + 1; k < n; k++)
if (nums[i] + nums[j] + nums[k] == 0)
ans.insert({ nums[i], nums[j], nums[k] });
return { ans.begin(), ans.end() };
}
But, unfortunately, because of the brute-force design decision, it is 20000 times slower for big input than a better design. And, because the online test programs work with big input vectors, the program will not pass the runtime constraints.
How do we come to a better solution? We need to carefully analyze the requirements and can also use some existing know-how for similar kinds of problems.
And if you read some books or internet articles, you often get the hint that the so-called "sliding window" is the proper approach to get a reasonable solution.
You will find useful information here. But you can of course also search here on SO for answers.
For this problem, we would use a typical two-pointer approach, but modified for the specific requirements of this problem. Basically a start value and a moving and closing window . . .
The analysis of the requirements leads to the following idea.
If all evaluated numbers are > 0, then we can never have a sum of 0.
It would be easy to identify duplicate numbers if they were next to each other
--> Sorting the input values will be very beneficial.
This will eliminate the test for half of the values with randomly distributed input vectors. See:
std::vector<int> nums { 5, -1, 4, -2, 3, -3, -1, 2, 1, -1 };
std::sort(nums.begin(), nums.end());
// Will result in
// -3, -2, -1, -1, -1, 1, 2, 3, 4, 5
And with that we see that, if we shift our window to the right, we can stop the evaluation as soon as the start of the window hits a positive number. Additionally, we can immediately identify duplicate numbers.
Then next: if we start at the beginning of the sorted vector, this value will most likely be very small. And if we start the next window at one plus the start of the current window, then we will have "very" negative numbers. And to get a 0 by summing 2 "very" negative numbers, we need a very positive number. And this is at the end of the std::vector.
Start with
startPointerIndex 0, value -3
Window start = startPointerIndex + 1 --> value -2
Window end = lastIndexInVector --> 5
And yes, we have already found a solution. Now we need to check for duplicates. If there were an additional 5 at the second-to-last position, then we could skip it; it would not add a different solution. So, we can decrement the end window pointer in such a case. The same is valid if there were an additional -2 at the beginning of the window. Then we would need to increment the start window pointer, to avoid a duplicate finding from that end.
The same is valid for the start pointer index. Example: startPointerIndex = 3 (counting indices from 0), the value will be -1. But the value before, at index 2, is also -1. So there is no need to evaluate that, because we evaluated it already.
The above methods will prevent the creation of duplicate entries.
But how do we continue the search? If we cannot find a solution, then we will narrow down the window. This we will also do in a smart way. If the sum is too big, then obviously the right window value was too big, and we should better use the next smaller value for the next comparison.
The same on the starting side of the window: if the sum was too small, then we obviously need a bigger value, so let us increment the start window pointer. And we do this (making the window smaller) until we find a solution or until the window is closed, meaning the start window pointer is no longer smaller than the end window pointer.
Now we have developed a somewhat good design and can start coding.
We additionally try to implement a good coding style and refactor the code for some faster implementations.
Please see:
class Solution {
public:
// Define some type aliases for later easier typing and understanding
using DataType = int;
using Triplet = std::vector<DataType>;
using Triplets = std::vector<Triplet>;
using TestData = std::vector<DataType>;
// Function to identify all unique Triplets(3 elements) in a given test input
Triplets threeSum(TestData& testData) {
// In order to save the function call overhead of repeatedly getting the size of the test data,
// we will store the size of the input data in a const temporary variable
const size_t numberOfTestDataElements{ testData.size()};
// If the given input test vector is empty, we also immediately return an empty result vector
if (!numberOfTestDataElements) return {};
// In later code we often need the last valid element of the input test data
// Since indices in C++ start with 0 the value will be size -1
// With that we later avoid unnecessary subtractions in the loop
const size_t numberOfTestDataElementsMinus1{ numberOfTestDataElements -1u };
// Here we will store all the found, valid and unique triplets
Triplets result{};
// In order to save the time for later memory reallocations and copying tons of data, we reserve
// memory to hold all results only one time. This will speed up operations by 5 to 10%
result.reserve(numberOfTestDataElementsMinus1);
// Now sort the input test data to be able to find an end condition, if all elements are
// greater than 0 and to easier identify duplicates
std::sort(testData.begin(), testData.end());
// These variables will define the size of the sliding window
size_t leftStartPositionOfSlidingWindow, rightEndPositionOfSlidingWindow;
// Now, we will evaluate all values of the input test data from left to right
// As an optimization, we additionally define a 2nd running variable k,
// to avoid later additions in the loop, where i+1 would need to be calculated.
// This can be better done with a running variable that will be just incremented
for (size_t i = 0, k = 1; i < numberOfTestDataElements; ++i, ++k) {
// If the current value from the input test data is greater than 0,
// a sum with the result of 0 will no longer be possible. We can stop now
if (testData[i] > 0) break;
// Prevent evaluation of duplicates based on the current input test data
if (i and (testData[i] == testData[i-1])) continue;
// Open the window and determine the start and end index
// The start index is always the currently evaluated index from the input test data
// End index is always the last element
leftStartPositionOfSlidingWindow = k;
rightEndPositionOfSlidingWindow = numberOfTestDataElementsMinus1;
// Now, as long as the window is not closed, meaning not empty, we will evaluate
while (leftStartPositionOfSlidingWindow < rightEndPositionOfSlidingWindow) {
// Calculate the sum of the current addressed values
const int sum = testData[i] + testData[leftStartPositionOfSlidingWindow] + testData[rightEndPositionOfSlidingWindow];
// If the sum is too small, then the value on the left side of the sorted window is too small
// Therefore take the next value on the left side and try again. So, make the window smaller
if (sum < 0) {
++leftStartPositionOfSlidingWindow;
}
// Else, if the sum is too big, then the value on the right side of the window was too big
// Use one smaller value. One to the left of the current closing address of the window
// So, make the window smaller
else if (sum > 0) {
--rightEndPositionOfSlidingWindow;
}
else {
// According to the above conditions, we have now found a triplet fulfilling the requirements.
// So store this triplet as a result
result.push_back({ testData[i], testData[leftStartPositionOfSlidingWindow], testData[rightEndPositionOfSlidingWindow] });
// We now need to handle duplicates at the edges of the window. So, left and right edge
// For this, we remember the current left and right values
const DataType lastLeftValue = testData[leftStartPositionOfSlidingWindow];
const DataType lastRightValue = testData[rightEndPositionOfSlidingWindow];
// Check left edge. As long as we have duplicates here, we will shift the opening position of the window to the right
// Because of boolean short-circuit evaluation we will first do the comparison for duplicates. This will give us 5% more speed
while (testData[leftStartPositionOfSlidingWindow] == lastLeftValue && leftStartPositionOfSlidingWindow < rightEndPositionOfSlidingWindow)
++leftStartPositionOfSlidingWindow;
// Check right edge. As long as we have duplicates here, we will shift the closing position of the window to the left
// Because of boolean short-circuit evaluation we will first do the comparison for duplicates. This will give us 5% more speed
while (testData[rightEndPositionOfSlidingWindow] == lastRightValue && leftStartPositionOfSlidingWindow < rightEndPositionOfSlidingWindow)
--rightEndPositionOfSlidingWindow;
}
}
}
return result;
}
};
The above solution will outperform 99% of other solutions. I made many benchmarks to prove that.
It additionally contains tons of comments to explain what is going on there. And I have selected "speaking" and meaningful variable names for a better understanding.
I hope that I could help you a little.
And finally: I dedicate this answer to Sam Varshavchik and PaulMcKenzie.
Problem 1: suppose you have an array of n floats and you want to calculate an array of n running averages over three elements. The middle part would be straightforward:
for (int i=0; i<n; i++)
b[i] = (a[i-1] + a[i] + a[i+1])/3.;
But you need to have separate code to handle the cases i==0 and i==(n-1). This is often done with extra code before the loop, extra code after the loop, and adjusting the loop range, e.g.
b[0] = (a[0] + a[1])/2.;
for (int i=1; i<n-1; i++)
b[i] = (a[i-1] + a[i] + a[i+1])/3.;
b[n-1] = (a[n-1] + a[n-2])/2.;
Even that is not enough, because the cases of n<3 need to be handled separately.
Problem 2. You are reading a variable-length code from an array (say implementing a UTF-8 to UTF-32 converter). The code reads a byte, and accordingly may read one or more bytes to determine the output. However, before each such step, it also needs to check if the end of the input array has been reached, and if so, perhaps load more data into a buffer, or terminate with an error.
Both of these problems are cases of loops where the interior of the loop can be expressed neatly, but the edges need special handling. I find these sorts of problems the most prone to error and to messy programming. So here's my question:
Are there any C++ idioms which generalize wrapping such loop patterns in a clean way?
Efficiently and elegantly handling boundary conditions is troublesome in any programming language -- C++ has no magic hammer for this. This is a common problem with applying convolution filters to signals / images -- what do you do at the image boundaries where your kernel goes outside the image support?
There are generally two things you are trying to avoid:
out of bounds array indexing (which you must avoid), and
special computation (which is inelegant and results in slower code due to extra branching).
There are usually three approaches:
Avoid the boundaries -- this is the simplest approach and is often sufficient since the boundary cases make up a tiny slice of the problem and can be ignored.
Extend the bounds of your buffer -- add extra columns/rows of padding to the array so the same code used in the general case can be used at the edges. Of course this raises the problem of what values to place in the padding -- this often depends on the problem you are solving and is considered in the next approach.
Special computation at the boundary -- this is what you do in your example. Of course how you do this is problem dependent and raises a similar issue as the previous approach -- what is the correct thing to do when my filter (in your case an averaging filter) extends beyond the array support? What should I consider the values to be outside the array support? Most image filter libraries provide some form of extrapolation options -- for example:
assume a value zero or some other constant (define a[i] = 0 if i < 0 || i >= n),
replicate the boundary value (e.g. a[i] = a[0] if i < 0 and a[i] = a[n-1] if i >= n)
wrap the value (define a[i] = a[(i + n) % n] -- makes sense in some cases -- e.g., texture filters)
mirror the border (e.g. a[i] = a[abs(i+1)] if i < 0 and a[i] = a[2n - i - 1] if i >= n)
other special case (what you do)
When reasonable, it's best to separate the special case from the general case (like you do) to avoid inelegant and slow general cases. One could always wrap/hide the special case and the general case in a function or operator (e.g., overload operator[]), but this only sugar-coats the problem, like any contrived C++ idiom would. In a multi-threaded environment (e.g. CUDA / SIMD) you can do some other tricks by preloading out-of-bounds values, but you are still stuck with the same problem.
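As a rough sketch of that wrapping idea (the class name, the choice of replicating the border value, and reusing the running-average example are my own assumptions, not something the above prescribes; it uses C++17's std::clamp):

#include <algorithm>
#include <cstddef>
#include <vector>

// A thin view that replicates the boundary value for out-of-range indices,
// so the general-case loop body can also be used at the edges.
// Assumes a non-empty input (std::clamp requires lo <= hi).
struct ClampedView {
    const std::vector<double>& data;

    double operator[](long i) const {
        const long last = static_cast<long>(data.size()) - 1;
        return data[static_cast<std::size_t>(std::clamp(i, 0L, last))];
    }
};

// The running average from Problem 1, with no special edge code
// (note: this replicates the border instead of averaging only two values).
std::vector<double> runningAverage(const std::vector<double>& a) {
    ClampedView v{a};
    std::vector<double> b(a.size());
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        b[i] = (v[i - 1] + v[i] + v[i + 1]) / 3.0;
    return b;
}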
This is why programmers use the phrase "edge case" when referring to any kind of special-case programming; it is often a time sink and a source of annoying errors. Some languages that efficiently support exception handling for out-of-bounds array indexing (e.g. Ada) can make for prettier code, but still cause the same pain.
Unfortunately, the answer is NO.
There are no C++ idioms which generalize wrapping such loop patterns in a clean way!
You can do something like this, but you still need to adjust the window size.
template <typename T, int N>
T subscript(T (&data)[N], int index) {
if (index < 0 || index >= N) {
return 0;
}
return data[index];
}
for (int i = 0; i < n; ++i) {
b[i] = (subscript(a, i - 1) + subscript(a, i) + subscript(a, i + 1)) / 3.;
}
My question's title is similar to this link; however, that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 elements have been masked, I need to stop. However, I might reach the end of the array with only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, while not being biased towards picking some elements?
Edit:
I need to preserve the order of elements. For instance, I might have:
[12, 14, 1, 24, 5, 8]
Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero.
Just do a Fisher-Yates shuffle but stop after only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
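For what it's worth, here is a sketch of the same partial Fisher-Yates using <random> instead of std::rand (the function name, the std::vector, and the hard-coded 30% are my own choices for illustration):

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Move a uniformly chosen 30% of the elements to the back of the vector,
// exactly like the std::rand version above, but with a better generator.
void selectRandom30Percent(std::vector<int>& array) {
    std::mt19937 gen(std::random_device{}());
    std::size_t size = array.size();           // e.g. 1000000
    const std::size_t picks = size * 3 / 10;   // e.g. 300000

    for (std::size_t i = 0; i < picks; ++i) {
        std::uniform_int_distribution<std::size_t> dist(0, size - 1);
        std::swap(array[dist(gen)], array[size - 1]);
        --size;
    }
    // The last `picks` elements are now a uniform random selection.
}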
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
// assumes the original values are all non-zero, so 0 can mark "already masked"
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
This has no bias and does not reorder elements, but it is inferior to Fisher-Yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take a decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, sub_count, list_count_lo );
int sub_count_hi = sub_count - sub_count_lo;
mask_random_sublist( list, list_count_lo, sub_count_lo );
mask_random_sublist( list + list_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
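If it helps, the distribution that splitting RNG needs is the hypergeometric one ("of sub_count marked positions among count positions, how many land in the first split_point positions?"). Below is a deliberately simple sequential sketch of such a draw, only to make the distribution concrete; it is O(split_point), so it is not what you would use to keep the divide-and-conquer fast, and the extra generator parameter is my own addition:

#include <random>

// Hypergeometric draw: of `sub_count` marked positions among `count` positions,
// how many fall into the first `split_point` positions?
// Sequential simulation: each position in the lower half is marked with the
// conditional probability remaining_marks / remaining_positions.
int random_split_sub_count_lo(int count, int sub_count, int split_point, std::mt19937& gen)
{
    int in_lower_half = 0;
    int remaining_positions = count;
    int remaining_marks = sub_count;

    for (int pos = 0; pos < split_point && remaining_marks > 0; ++pos) {
        std::uniform_int_distribution<int> dist(0, remaining_positions - 1);
        if (dist(gen) < remaining_marks) {
            ++in_lower_half;
            --remaining_marks;
        }
        --remaining_positions;
    }
    return in_lower_half;
}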
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct a collection of all the indices and, for each of the 700,000 (or 300,000 if, as mentioned, masking means you want to process the other ones) you want selected:
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
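Here is a rough sketch of this suggestion, assuming the goal from the question (set the chosen 30% to zero while keeping the order); the function name and the use of std::vector are mine:

#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Pick `toMask` distinct random positions via a partial Fisher-Yates over the
// indices, mark them in a bit array, then zero the data in one ordered pass.
// Assumes toMask <= data.size().
void maskRandomElements(std::vector<int>& data, std::size_t toMask) {
    std::mt19937 gen(std::random_device{}());

    std::vector<std::size_t> indices(data.size());
    std::iota(indices.begin(), indices.end(), 0);

    std::vector<bool> mask(data.size(), false);   // one bit per element, cheap for a million entries
    std::size_t remaining = indices.size();
    for (std::size_t i = 0; i < toMask; ++i) {
        std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
        const std::size_t pick = dist(gen);
        mask[indices[pick]] = true;
        std::swap(indices[pick], indices[remaining - 1]);
        --remaining;
    }

    // Single cache-friendly pass over the original array, order preserved.
    for (std::size_t i = 0; i < data.size(); ++i)
        if (mask[i]) data[i] = 0;
}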
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
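A hedged C++ translation of that pseudocode, applied here to sampling k indices (so the chosen elements can then be zeroed in place); the function and variable names are mine:

#include <cstddef>
#include <random>
#include <vector>

// Reservoir sampling: returns k indices drawn uniformly without replacement from 0..n-1.
std::vector<std::size_t> sampleIndices(std::size_t n, std::size_t k, std::mt19937& gen) {
    std::vector<std::size_t> reservoir(k);

    // Fill the reservoir with the first k indices.
    for (std::size_t i = 0; i < k; ++i)
        reservoir[i] = i;

    // Replace elements with gradually decreasing probability.
    for (std::size_t i = k; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> dist(0, i);   // inclusive range
        const std::size_t j = dist(gen);
        if (j < k)
            reservoir[j] = i;
    }
    return reservoir;
}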
I am using OpenCL C++ for the implementation of my project. I want to get the maximum speed/performance out of my GPU/s (depending on whether I have multiple GPUs or a single one). But for the purpose of this question, let's assume I have only one device.
Suppose I have an array of length 100.
double arr[100];
What I am currently doing is calling the kernel in the following way:
kernelGA(cl::EnqueueArgs(queue[iter],
cl::NDRange(100)),
d_arr, // and some other buffers.
)
Now on the kernel side, I have one global id, that is:
int idx = get_global_id(0);
The way I want my kernel to work is the following:
Each of the 100 work groups will take care of one element.
There are some rules by which each work group updates its element of the array, e.g.:
if (arr[idx] < 5) {
arr[idx] = 10; // a very simple example.
}
For the most part, it is okay. But then there is one point where I want to interchange values and where I want the threads/work items to communicate with each other. At that point, it doesn't seem to work; they don't seem to communicate.
E.g.:
if(arr[idx] < someNumber) {
arr[idx] = arr[idx + 1];
}
At this point, nothing seems to work. I tried to implement a for loop and to create a barrier
barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
but it also doesn't work. It doesn't change the values of the array elements.
I have the following questions:
1. Why doesn't it work? Is my implementation wrong? The threads seem to update their own indexed array element correctly. But when it comes to communication between them, they don't work. Why?
2. Is my implementation of the barriers and letting only one work item wrong? Is there a better way to let one item take care of this part while the other items are waiting for this one to finish?
The code you wrote is serial:
if(arr[idx] < someNumber) {
arr[idx] = arr[idx + 1];
}
The worker at index N takes the result of the worker at index N+1, that one the result of the worker at N+2, and so on.
So it means worker N needs to wait for the others to complete, which means the code is not parallel and never will be. You are far better off computing that with a CPU than a GPU.
The OpenCL design model allows you to run multiple work items in parallel, but the synchronization model only allows synchronization inside a work group.
If you need global sync, that is a clear sign that your algorithm is not suited for OpenCL.
Now, if I assume you just want the value of the last element, and what you really want is to perform a "sum" of the whole array, then this is a reduction problem, and it is possible to perform it in log(N) time by parallelizing in this fashion:
1st step, array[x] = array[x] + array[N/2+x] (x from 0 to N/2)
2nd step, array[x] = array[x] + array[N/4+x] (x from 0 to N/4)
...
log(N) passes
Each step will be a separate kernel, and therefore ensures all work items have finished before starting the next batch.
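To make that concrete, one pass of such a global reduction could look roughly like the kernel below; the kernel name and argument are my own, it assumes the array length is a power of two, and the host would enqueue it log2(N) times with a halving size:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // doubles need this extension on many devices

// One reduction pass: each work item adds its counterpart from the upper half.
// Enqueue with a global size of halfSize; the host halves halfSize after each pass.
__kernel void reduce_pass(__global double* data, const uint halfSize)
{
    const uint x = get_global_id(0);
    if (x < halfSize)
        data[x] += data[x + halfSize];
}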
Another faster option is to perform reduction inside the work group, so if the work group size is 256, you can sum groups of 256 together in each pass. Which is faster than just reducing by 2 in each pass.
I suspect that your problem represents a problem that has limited ability to be made parallel, and thus is a poor fit for any kind of GPGPU solution.
Consider the following array of elements:
1 5 2 6 5 3 6 7 2 8 1 8 3 4 2
Now suppose we perform the following transformation on this data:
//A straightforward, serially-executed iteration on the entire array.
for(int i = 0; i < arr.size() - 1; i++) {
if(arr[i] < 5) arr[i] = arr[i + 1];
}
The result will then be
5 5 6 6 5 6 6 7 8 8 8 8 4 2 2
But what happens if the for loop executes in reverse?
for(int i = arr.size() - 2; i >= 0; i--) {
if(arr[i] < 5) arr[i] = arr[i + 1];
}
The result will then be
5 5 6 6 5 6 6 7 8 8 8 8 2 2 2
Note how the third-to-last number is different depending on the order of execution. My example input doesn't change much, but if your data has lots of numbers below the chosen threshold, it could completely change the entire array! Because GPGPU APIs make no guarantees about the order of execution of individual work items (your order of execution could be like the first for-loop I wrote, like the second for-loop I wrote, or a completely randomly shuffled order), you've written non-deterministic code, and the only way to make it deterministic is to use so many barriers that you're guaranteeing sequential ordering, at which point there's literally no reason to be using a GPGPU API in the first place.
You could write something like the following instead, which would be deterministic:
if(arr[i] < 5) output[i] = arr[i + 1];
else output[i] = arr[i];
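As a sketch only, that idea written out as a double-buffered kernel (the kernel name, the bounds check, and keeping the last element unchanged are my assumptions, not part of the answer itself):

#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // doubles need this extension on many devices

// Reads only from `in` and writes only to `out`, so the result does not
// depend on the order in which work items happen to execute.
__kernel void shift_if_small(__global const double* in,
                             __global double* out,
                             const uint n)
{
    const uint i = get_global_id(0);
    if (i >= n) return;

    if (i + 1 < n && in[i] < 5.0)   // the last element has no right neighbour
        out[i] = in[i + 1];
    else
        out[i] = in[i];
}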
But that might require a reconsideration of your design constraints. I don't know, as I don't know what your program is ultimately doing.
Either way though, you need to spend some time reconsidering what you're actually trying to do.
I am given an array A[] having N elements, which are positive integers. I have to find the number of sequences of lengths 1, 2, 3, ..., N that satisfy a particular property.
I have built an interval tree with O(n log n) complexity. Now I want to count the number of sequences that satisfy a certain property.
All the properties required for the problem are related to the sum of the sequences.
Note that an array will have N*(N+1)/2 sequences. How can I iterate over all of them in O(n log n) or O(n)?
If we let k be the moving index from 0 to N (elements), we will run an algorithm that essentially looks for the MIN R that satisfies the condition (let's say I); then every other subset for L = k is also satisfied for R >= I (this is your short circuit). After you find I, simply output the results for (L=k, R>=I). This of course assumes that all numerics in your set are >= 0.
To find I, for every k, begin at element k + (N-k)/2. Figure out whether the subset defined by (L=k, R=k+(N-k)/2) satisfies your condition. If it does, then decrement R until your condition is NOT met; then R+1 is your MIN (you could choose to print these results as you go, but the results in these cases would essentially be printed backwards). If (L=k, R=k+(N-k)/2) does not satisfy your condition, then INCREMENT R until it does, and this becomes your MIN for that L=k. This reduces your search space for each L=k by a factor of 2. As k increases and approaches N, your search space continuously decreases.
// This declaration won't work unless N is either a constant or a MACRO defined above
unsigned int myVals[N];
unsigned int Ndiv2 = N / 2;
unsigned int I;
for(unsigned int k = 0; k < N; k++){
    if(TRUE == TESTVALS(myVals, k, Ndiv2)){ // It Passes
        for(I = Ndiv2; I >= k; I--){
            if(FALSE == TESTVALS(myVals, k, I)){
                I++;
                break;
            }
        }
    }else{ // It Didn't Pass
        for(I = Ndiv2; I < N; I++){
            if(TRUE == TESTVALS(myVals, k, I)){
                break;
            }
        }
    }
    // PRINT ALL PAIRS from L=k, from R=I to R=N-1
    if((k & 0x00000001) == 0) Ndiv2++;
} // END --> for(unsigned int k = 0; k < N; k++)
The complexity of the algorithm above is O(N^2). This is because for each k in N (i.e. N iterations/tests) there are no more than N/2 values that need testing. Big O notation is not concerned with the N/2, nor with the fact that N effectively gets smaller as k grows; it is really only concerned with the gross magnitude. Thus it would say N tests for every N values, and therefore O(N^2).
There is an alternative approach which would be FASTER. That approach is, whenever you wish to move within the secondary (inner) for loops, to perform a halve-the-distance move instead of a linear one. This would get you to your O(n log n) set of steps. For each k in N (which all have to be tested), you run this half-distance approach to find your MIN R value in log N time. As an example, let's say you have a 1000 element array. When k = 0, we essentially begin the search for MIN R at index 500. If the test passes, instead of linearly moving downward from 500 to 0, we test 250. Let's say the actual MIN R for k = 0 is 300. Then the tests to find MIN R would look as follows:
R=500
R=250
R=375
R=312
R=280
R=296
R=304
R=300
While this is oversimplified, you are most likely going to have to optimize, and test 301 as well as 299 to make sure you're in the sweet spot. Another note is to be careful when dividing by 2 when you have to move in the same direction more than once in a row.
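Since the elements are positive, the prefix sums are strictly increasing, so the MIN R for each L can also be found with an ordinary binary search over the prefix sums. Here is a sketch under the assumption that the property is "subarray sum >= target"; the target, names, and types are hypothetical, not taken from the question:

#include <algorithm>
#include <cstddef>
#include <vector>

// Count the (L, R) pairs whose subarray sum A[L..R] is >= target.
// prefix[i] = A[0] + ... + A[i-1]; positive elements make it strictly increasing,
// so for each L the smallest valid R is found with std::lower_bound: O(n log n) overall.
long long countSequences(const std::vector<long long>& A, long long target) {
    const std::size_t n = A.size();
    std::vector<long long> prefix(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i)
        prefix[i + 1] = prefix[i] + A[i];

    long long count = 0;
    for (std::size_t L = 0; L < n; ++L) {
        // Smallest index idx in [L+1, n] with prefix[idx] - prefix[L] >= target.
        auto it = std::lower_bound(prefix.begin() + L + 1, prefix.end(), prefix[L] + target);
        count += prefix.end() - it;   // every larger R also qualifies
    }
    return count;
}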
#user1907531: First of all, if you are participating in an online contest of such importance at the national level, you should refrain from these cheap tricks and methodologies to get ahead of other deserving guys. Second, a cheater like you is always a cheater, and all this hampers the hard work of those who have put effort into making the questions and of the competitors who are unlike you. Thirdly, when #trumetlicks asks why you haven't tagged the question as homework, you tell another lie there. And finally, I don't know how so many people could answer this question without knowing the origin/website/source of it. This surely can't be given by a teacher as homework in any Indian school. To tell everyone: this cheater has asked you for the complete solution of a running collegiate contest in India 6 hours before the contest ended, has surely got a lot of direct help, and on top of that has invited hundreds of others to cheat from the answers given here. So, good luck to all these cheaters.