Using modulus operator to keep within indices of container - c++

Assume I have a vector v with m elements in it, and a random access index to the vector called i.
When I increment the index, if it goes out of bounds, I want to index the first (zeroth) element. Similarly, when I decrement the index, if the index is < 0, I want to index the last element. At the moment I'm only moving through the container one element at a time, so I came up with this function:
unsigned int GetIndexModM(int index,unsigned int m) {return (index + m) % m;}
The call-site might look like this:
std::vector<Whatever> v = ... // initialise with 5 elements
unsigned int i = 0;
unsigned int j = GetIndexModM(static_cast<int>(i) - 1,v.size()); // get preceding index
This function will fail however if one subtracts a value > m from index:
unsigned int j = GetIndexModM(static_cast<int>(i) - 17,v.size()); // oops: wrong result (the expected index is 3)
My question: What's the most elegant implementation of a function that takes any integer and returns its place as an index?

The trick for handling MOD is this, which works with positive as well as negative numbers:
val = ((val % mod_val) + mod_val) % mod_val;
For example, assume we want to keep value between 0 and 359 inclusive. We could use this:
val = ((val % 360) + 360) % 360;
Here's a simple example in C++.
int getmod(int val, int mod) {
return ((val % mod) + mod) % mod;
}
int main() {
printf("%d\n", getmod(50,360)); // prints 50
printf("%d\n", getmod(-400,360)); // prints 320
printf("%d\n", getmod(350,360)); // prints 350
printf("%d\n", getmod(375,360)); // prints 15
printf("%d\n", getmod(-725,360)); // prints 355
return 0;
}
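Applied to the container index from the question, the same trick might look like the sketch below. The name wrap_index and the use of long long for the incoming index are assumptions made for illustration; the size must be non-zero and must fit in a long long.

```cpp
#include <cstddef>

// Hypothetical helper: maps any signed index onto [0, size).
// Assumes size > 0 and that size fits in a long long.
std::size_t wrap_index(long long index, std::size_t size) {
    const long long m = static_cast<long long>(size);
    // index % m lies in (-m, m); adding m and reducing again gives [0, m).
    return static_cast<std::size_t>(((index % m) + m) % m);
}
```

With a five-element vector, wrap_index(-1, 5) gives 4 and wrap_index(-17, 5) gives 3.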

Unfortunately, C++ doesn’t implement a proper modulus that still works correctly for negative integers.
I think the cleanest solution is indeed using if to take care of all cases properly. This at least makes the code obvious (because every case is explicit) and errors easier to find:
unsigned GetIndexModM(int index, unsigned m) {
if (index < 0)
return GetIndexModM(index + m, m);
if (index >= m)
return index % m;
return index;
}

The following ensures that index is in [0,n) with only one modulus operation and no branches:
index %= n; index += (index < 0)*n;
where the first statement (containing the modulus operator) gets the value into (-n,n) and the second ensures that the value is in [0,n). The sign test has to be applied to the remainder rather than to the original index; otherwise a negative multiple of n would produce n instead of 0.
Note that this is unreliable when n is an unsigned type and in older (pre-11) versions of C++ where the result of the % operator is implementation-defined for negative arguments.
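A minimal sketch of the single-modulus form as a function, assuming C++11 or later (where % truncates toward zero) and a signed, positive n. The name wrap_once is made up for illustration. The sign test is applied to the remainder, so negative multiples of n map to 0.

```cpp
// Hypothetical helper; assumes n > 0 and C++11 semantics for %.
int wrap_once(int index, int n) {
    int r = index % n;       // r is in (-n, n)
    return r + (r < 0) * n;  // shift negative remainders into [0, n)
}
```

For n = 5 this maps -1 to 4, -5 to 0 and 7 to 2.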

Related

Modify hash value on replacing a single character in string (c++)

I am using a polynomial hash function to calculate the hash value of a string (consisting of only lowercase english letters) as follows:
int SZ = 105, P = 31;
long long M = 1e12 + 9;
vector <long long> pw;
pw.resize(SZ, 1);
for(int i = 1; i < SZ; i++) {
pw[i] = (pw[i - 1] * P) % M;
}
long long calculateHash(string &s) {
long long h = 0;
for(int i = 0; i < s.length(); i++) {
h = (h + (s[i] - 'a' + 1) * pw[i]) % M;
}
return h;
}
I don't want to re-calculate the hash of the entire string in O(N) time when I have to replace just one character at any given position. So in order to do this in O(1) time, I do the following operation:
long long h1 = calculateHash(s1);
long long h2 = calculateHash(s2);
// Only one character differs in `s1` and `s2` at index `idx`
// Modifying hash for h1 to incorporate s2[idx] and removing s1[idx]
h1 = (h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;
Now when I check h1 == h2, they should ideally be equal, right? It works for smaller strings but sometimes fails: I get negative values for h1. I am not sure whether this is an overflow issue or whether ((s2[idx] - s1[idx]) * pw[idx]) is negative enough to push h1 below zero.
Could anyone suggest a way to re-calculate the hash in O(1) time when only one character is changed? Thank you in advance!
In principle your idea of adjusting the resulting value is correct, but what you need is a modulo operator whose result is always non-negative, even for negative inputs.
To emulate this behaviour with the C++ modulo operator you could do the following:
long long tmp=(h1 + ((s2[idx] - s1[idx]) * pw[idx])) % M;
h1=(tmp+M)%M;
The first line is the same operation you have done, and the second line makes the result non-negative, because tmp cannot be less than or equal to -M after the C++ modulo operation. The additional modulo is needed to ensure that the number stays smaller than M, even if tmp was already non-negative.
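Put together, the update could be sketched as follows. The function names make_powers, calculate_hash and replace_char are illustrative rather than taken from the question, and the sketch assumes lowercase input and the question's constants, for which every intermediate product stays well inside the range of long long.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const long long M = 1000000000000LL + 9;  // modulus from the question
const long long P = 31;                   // base from the question

// Powers of P modulo M for positions 0..n-1.
std::vector<long long> make_powers(std::size_t n) {
    std::vector<long long> pw(n, 1);
    for (std::size_t i = 1; i < n; ++i) pw[i] = (pw[i - 1] * P) % M;
    return pw;
}

// Full O(N) hash, same formula as in the question.
long long calculate_hash(const std::string& s, const std::vector<long long>& pw) {
    long long h = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = (h + (s[i] - 'a' + 1) * pw[i]) % M;
    return h;
}

// O(1) update: hash after replacing old_c with new_c at position idx.
long long replace_char(long long h, char old_c, char new_c, std::size_t idx,
                       const std::vector<long long>& pw) {
    long long delta = (new_c - old_c) * pw[idx];  // may be negative
    long long tmp = (h + delta) % M;              // in (-M, M)
    return (tmp + M) % M;                         // normalised to [0, M)
}
```

Calling replace_char(h1, s1[idx], s2[idx], idx, pw) should then give the same value as calculate_hash(s2, pw) whenever s1 and s2 differ only at idx.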

How to Write Recursive Majority Element Algorithm [duplicate]

An array is said to have a majority element if more than half of its elements are the same. Is there a divide-and-conquer algorithm for determining if an array has a majority element?
I normally do the following, but it is not using divide-and-conquer. I do not want to use the Boyer-Moore algorithm.
int find(int[] arr, int size) {
int count = 0, i, mElement;
for (i = 0; i < size; i++) {
if (count == 0) mElement = arr[i];
if (arr[i] == mElement) count++;
else count--;
}
count = 0;
for (i = 0; i < size; i++) {
if (arr[i] == mElement) count++;
}
if (count > size / 2) return mElement;
return -1;
}
I can see at least one divide and conquer method.
Start by finding the median, such as with Hoare's Select algorithm. If one value forms a majority of the elements, the median must have that value, so we've just found the value we're looking for.
From there, find (for example) the 25th and 75th percentile items. Again, if there's a majority element, at least one of those would need to have the same value as the median.
Assuming you haven't ruled out there being a majority element yet, you can continue the search. For example, let's assume the 75th percentile was equal to the median, but the 25th percentile wasn't.
We then continue searching for the item halfway between the 25th percentile and the median, as well as the one halfway between the 75th percentile and the end.
Continue finding the median of each partition that must contain the end of the elements with the same value as the median until you've either confirmed or denied the existence of a majority element.
As an aside: "Boyer-Moore" here presumably means the Boyer-Moore majority vote algorithm, which is what the code in the question implements, rather than the better-known Boyer-Moore algorithm for finding a substring in a string.
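The first step of this answer (the median must carry the majority value) can be sketched with std::nth_element standing in for Hoare's Select. Note that this sketch swaps the percentile refinement described above for a single counting pass, and the function name is an assumption. It needs C++17 for std::optional.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the majority element if one exists. The vector is taken by value
// because std::nth_element reorders its input.
std::optional<int> majority_by_median(std::vector<int> v) {
    if (v.empty()) return std::nullopt;
    auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    std::nth_element(v.begin(), mid, v.end());  // *mid is now the median
    const int candidate = *mid;
    // Confirm the candidate with one counting pass.
    const auto count = std::count(v.begin(), v.end(), candidate);
    if (2 * static_cast<std::size_t>(count) > v.size()) return candidate;
    return std::nullopt;
}
```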
There is, and it does not require the elements to have an order.
To be formal, we're dealing with multisets (also called bags.) In the following, for a multiset S, let:
v(e;S) be the multiplicity of an element e in S, i.e. the number of times it occurs (the multiplicity is zero if e is not a member of S at all.)
#S be the cardinality of S, i.e. the number of elements in S counting multiplicity.
⊕ be the multiset sum: if S = L ⊕ R then S contains all the elements of L and R counting multiplicity, i.e. v(e;S) = v(e;L) + v(e;R) for any element e. (This also shows that the multiplicity can be calculated by 'divide-and-conquer'.)
[x] be the largest integer less than or equal to x.
The majority element m of S, if it exists, is that element such that 2 v(m;S) > #S.
Let's call L and R a splitting of S if L ⊕ R = S and an even splitting if |#L - #R| ≤ 1. That is, if n=#S is even, L and R have exactly half the elements of S, and if n is odd, then one has cardinality [n/2] and the other has cardinality [n/2]+1.
For an arbitrary split of S into L and R, two observations:
If neither L nor R has a majority element, then S cannot: for any element e, 2 v(e;S) = 2 v(e;L) + 2 v(e;R) ≤ #L + #R = #S.
If one of L and R has a majority element m with multiplicity k, then it is the majority element of S only if it has multiplicity r in the other half, with 2(k+r) > #S.
The algorithm majority(S) below returns either a pair (m,k), indicating that m is the majority element with k occurrences, or none:
If S is empty, return none; if S has just one element m, then return (m,1). Otherwise:
Make an even split of S into two halves L and R.
Let (m,k) = majority(L), if not none:
a. Let k' = k + v(m;R).
b. Return (m,k') if 2 k' > n.
Otherwise let (m,k) = majority(R), if not none:
a. Let k' = k + v(m;L).
b. Return (m,k') if 2 k' > n.
Otherwise return none.
Note that the algorithm is still correct even if the split is not an even one. Splitting evenly though is likely to perform better in practice.
Addendum
Made the terminal case explicit in the algorithm description above. Some sample C++ code:
struct majority_t {
int m; // majority element
size_t k; // multiplicity of m; zero => no majority element
constexpr majority_t(): m(0), k(0) {}
constexpr majority_t(int m_,size_t k_): m(m_), k(k_) {}
explicit operator bool() const { return k>0; }
};
static constexpr majority_t no_majority;
size_t multiplicity(int x,const int *arr,size_t n) {
if (n==0) return 0;
else if (n==1) return arr[0]==x?1:0;
size_t r=n/2;
return multiplicity(x,arr,r)+multiplicity(x,arr+r,n-r);
}
majority_t majority(const int *arr,size_t n) {
if (n==0) return no_majority;
else if (n==1) return majority_t(arr[0],1);
size_t r=n/2;
majority_t left=majority(arr,r);
if (left) {
left.k+=multiplicity(left.m,arr+r,n-r);
if (left.k>r) return left;
}
majority_t right=majority(arr+r,n-r);
if (right) {
right.k+=multiplicity(right.m,arr,r);
if (right.k>r) return right;
}
return no_majority;
}
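For comparison, here is a variant sketch of the same algorithm over std::vector iterators. It uses std::count for the multiplicity step and std::optional for "none"; both are substitutions for the hand-written helpers above, the names majority_dc, Iter and Result are assumptions, and C++17 is required.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using Iter = std::vector<int>::const_iterator;
using Result = std::optional<std::pair<int, std::size_t>>;  // (m, k) or none

// Divide-and-conquer majority over the range [first, last).
Result majority_dc(Iter first, Iter last) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n == 0) return std::nullopt;
    if (n == 1) return std::make_pair(*first, std::size_t{1});
    const Iter mid = first + static_cast<std::ptrdiff_t>(n / 2);

    if (Result left = majority_dc(first, mid)) {
        // k' = k + v(m;R)
        const std::size_t k = left->second +
            static_cast<std::size_t>(std::count(mid, last, left->first));
        if (2 * k > n) return std::make_pair(left->first, k);
    }
    if (Result right = majority_dc(mid, last)) {
        // k' = k + v(m;L)
        const std::size_t k = right->second +
            static_cast<std::size_t>(std::count(first, mid, right->first));
        if (2 * k > n) return std::make_pair(right->first, k);
    }
    return std::nullopt;
}
```

For the array 1, 2, 1, 1, 3, 1, 4, 1, 6, 1 this returns the pair (1, 6).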
A simpler divide and conquer algorithm is intended for the case where more than half of the elements are the same and there are n = 2^k elements for some integer k.
FindMost(A, startIndex, endIndex)
{ // input array A
if (startIndex == endIndex) // base case
return A[startIndex];
x = FindMost(A, startIndex, (startIndex + endIndex - 1)/2);
y = FindMost(A, (startIndex + endIndex - 1)/2 + 1, endIndex);
if (x == null && y == null)
return null;
else if (x == null && y != null)
return y;
else if (x != null && y == null)
return x;
else if (x != y)
return null;
else return x
}
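This algorithm could be modified so that it works for n which is not a power of 2, but boundary cases must be handled carefully. Note also that, because only the candidate and not its count is passed upward, a local majority in one half can cancel the true majority: for 2, 2, 3, 1, 1, 1, 1, 1 the halves return 2 and 1, so the result is null even though 1 occurs five times out of eight. Recounting each candidate over the combined range, as in the previous answer, avoids this.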
Let's say the array is 1, 2, 1, 1, 3, 1, 4, 1, 6, 1.
If more than half of the elements of an even-length array are the same, then there must be a position where two consecutive elements are equal. (For odd lengths this can fail, e.g. 1, 2, 1.)
In the above example, 1 occurs in more than half of the positions, and indices 2 and 3 (counting from 0) hold the same element.

Printing prime numbers from x to y algorithm

I have this snippet of code that generates the primes up to "max" in reasonable time with the Sieve of Eratosthenes.
I want to give the function the possibility to use a starting value to calculate a range of primes. So I wonder at what point in the algorithm I have to hand over the starting value.
e.g.
get_primes(unsigned long from, unsigned long to);
get_primes(200, 5000);
-> Saves the prime numbers from 200 to 5000 in the vector.
Unfortunately I don't understand the algorithm completely. [Especially lines 3 to 5, 7 & 10 are unclear]
I tried to follow the steps by using a debugger, but that did not make me any wiser.
It would be great if anyone could explain this code to me and tell me how to set a start value.
Thank you.
vector<unsigned long long> get_primes(unsigned long max) {
vector<unsigned long long> primes;
char *sieve;
sieve = new char[max / 8 + 1];
memset(sieve, 0xFF, (max / 8 + 1) * sizeof(char));
for (unsigned long long x = 2; x <= max; x++)
if (sieve[x / 8] & (0x01 << (x % 8))) {
primes.push_back(x);
for (unsigned long long j = 2 * x; j <= max; j += x)
sieve[j / 8] &= ~(0x01 << (j % 8));
}
delete[] sieve;
return primes;
}
You must start at 2 since the sieve first removes all multiples of 2 to find the next prime as 3. It then removes all multiples of 3 to find the next prime as 5 and so on.
If you want to generate unsigned long long primes using a version of get_primes() then you are in for a very long wait.
For generating primes in the range lo ... hi (inclusive) you need to consider only factors up to sqrt(hi). Hence you need a small sieve (up to 32 bits) for factors and another small sieve of size (hi - lo + 1) for sieving the target range.
Here's an abbreviated version of a sieve that runs up to 2^64 - 1; it uses a full sieve instead of sieving only odd numbers, because it is the reference code I use to verify optimised implementations. The changes for that are straightforward but add even more pitfalls to the whole shebang. As it is it sieves the 10 million primes between 999560010209 and 999836351599 in about 3 seconds, and those between 18446744073265777349u and 18446744073709551557u (i.e. just below 2^64) in about 20 seconds
The factor sieve is global because it gets reused a lot, and sieving the factors can take a while too. I.e. prepping the factors for a range close to 2^64 means sieving all (or most) of the factors up to 2^32 - 1, and thus it can easily take up to 10 seconds.
I wrapped my bitmap code (moral equivalent to std::bitset<>) and the factor sieve into classes; using raw vectors would make the code inflexible and unreadable. I shortened my code, removed a lot of asserts and other noise, and substituted calls to external functions with inlined code (like the call to std::sqrt()), for the sake of exposition. That way you can cull answers like what to do with the offset (here called lo) directly from verified working code.
The point of having separate number_t and index_t is that number_t can be unsigned long long but index_t must be uint32_t for my current infrastructure. The member functions of bitmap_t use the name of the underlying CPU instructions. BTS ... bit test and set, BT ... bit test. Bitmaps are initialised to 0 and a set bit signifies non-prime.
typedef uint32_t index_t;
sieve_t g_factor_sieve;
template<typename number_t, typename OutputIterator>
index_t generate_primes (number_t lo, number_t hi, OutputIterator sink)
{
// ...
index_t max_factor = index_t(std::sqrt(double(hi)));
g_factor_sieve.extend_to_cover(max_factor);
number_t range = hi - lo; assert( range <= index_t(index_t(0) - 1) );
index_t range32 = index_t(range);
bitmap_t bm(range32);
if (lo < 2) bm.bts(1 - index_t(lo)); // 1 is not a prime
if (lo == 0) bm.bts(0); // 0 is not a prime
for (index_t n = 2; n <= max_factor && n > 1; n += 1 + (n & 1))
{
if (g_factor_sieve.not_prime(n)) continue;
number_t start = square(number_t(n));
index_t stride = n << (int(n) > 2 ? 1 : 0); // double stride for n > 2
if (start >= lo)
start -= lo;
else
start = (stride - (lo - start) % stride) % stride;
// double test because of the possibility of wrapping
for (index_t i = index_t(start); i <= bm.max_bit; )
{
bm.bts(i);
if ((i += stride) < stride)
{
break;
}
}
}
// output
for (index_t i = 0; ; ++i)
{
if (!bm.bt(i))
{
*sink = lo + i;
++sink;
++n;
}
if (i >= bm.max_bit) break;
}
return n;
}
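The two-sieve idea described in this answer (a small sieve for the factors up to sqrt(hi), plus a window of hi - lo + 1 flags) can also be shown in a much smaller, unoptimised form. The sketch below is not the author's code: it uses std::vector<bool> instead of the bitmap and sieve classes, the function name is an assumption, and it assumes lo <= hi, a window that fits in memory, and hi far enough below 2^64 that the loop counters cannot wrap.

```cpp
#include <cstdint>
#include <vector>

// Returns all primes p with lo <= p <= hi (see the assumptions above).
std::vector<std::uint64_t> primes_in_range(std::uint64_t lo, std::uint64_t hi) {
    // Largest candidate factor: floor(sqrt(hi)), found without floating point.
    std::uint64_t limit = 0;
    while ((limit + 1) <= hi / (limit + 1)) ++limit;

    // Small sieve for the factors 2..limit (true = composite).
    std::vector<bool> small(limit + 1, false);
    // Window sieve: flag i stands for the number lo + i.
    std::vector<bool> composite(hi - lo + 1, false);
    for (std::uint64_t i = 0; i < composite.size() && lo + i < 2; ++i)
        composite[i] = true;  // 0 and 1 are not prime

    for (std::uint64_t p = 2; p <= limit; ++p) {
        if (small[p]) continue;  // p is composite, skip it
        for (std::uint64_t q = p * p; q <= limit; q += p) small[q] = true;
        // First multiple of p inside the window, but never below p * p.
        std::uint64_t start = (lo + p - 1) / p * p;
        if (start < p * p) start = p * p;
        for (std::uint64_t q = start; q <= hi; q += p) composite[q - lo] = true;
    }

    std::vector<std::uint64_t> primes;
    for (std::uint64_t i = 0; i < composite.size(); ++i)
        if (!composite[i]) primes.push_back(lo + i);
    return primes;
}
```

The call from the question, get_primes(200, 5000), would correspond to primes_in_range(200, 5000) here.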
Try this one;
I used it to set starting and ending numbers
for(int x = m;x<n;x++){
if(x%2!=0 && x%3!=0 && x%5!=0 && x%7!=0 && x%11!=0)
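// then x has no prime factor up to 11, so it is prime only if x < 169 (13 * 13); note that 2, 3, 5, 7 and 11 themselves are rejected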
}
where m is starting value, and n is the ending value
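The fixed list of divisors above only rules out factors up to 11. A hedged sketch of plain trial division, which tries every divisor up to sqrt(x), is given here for comparison; the function names are illustrative, and the approach is only practical for small ranges.

```cpp
#include <vector>

// Trial division: x is prime if no d with d * d <= x divides it.
bool is_prime(unsigned long x) {
    if (x < 2) return false;
    for (unsigned long d = 2; d <= x / d; ++d)
        if (x % d == 0) return false;
    return true;
}

// Primes x with m <= x < n, matching the loop bounds used above.
std::vector<unsigned long> primes_between(unsigned long m, unsigned long n) {
    std::vector<unsigned long> out;
    for (unsigned long x = m; x < n; ++x)
        if (is_prime(x)) out.push_back(x);
    return out;
}
```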

Is there an expression using modulo to do backwards wrap-around ("reverse overflow")?

For any whole number input W restricted by the range R = [x,y], the "overflow," for lack of a better term, of W over R is (W - x) % (y-x+1) + x, which reduces to W % (y+1) when x = 0. This causes it to wrap back around if W exceeds y.
As an example of this principle, suppose we iterate over a calendar's months:
int this_month = 5;
int next_month = (this_month + 1) % 12;
where both integers will be between 0 and 11, inclusive. Thus, the expression above "clamps" the integer to the range R = [0,11]. This approach of using an expression is simple, elegant, and advantageous as it omits branching.
Now, what if we want to do the same thing, but backwards? The following expression works:
int last_month = ((this_month - 1) % 12 + 12) % 12;
but it's abstruse. How can it be beautified?
tl;dr - Can the expression ((x-1) % k + k) % k be simplified further?
Note: C++ tag specified because other languages handle negative operands for the modulo operator differently.
Your expression should be ((x-1) + k) % k. This will properly wrap x=0 around to 11. In general, if you want to step back more than 1, you need to make sure that you add enough so that the first operand of the modulo operation is >= 0.
Here is an implementation in C++:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
if (delta >= 0) {return (v + delta - minval) % mod + minval;}
else {return ((v + delta) - delta * mod - minval) % mod + minval;}
}
This also allows months to be labeled from 0 to 11 or from 1 to 12, by setting minval and maxval accordingly.
Since this answer is so highly appreciated, here is an improved version without branching, which also handles the case where the initial value v is smaller than minval. I keep the other example because it is easier to understand:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
v += delta - minval;
v += (1 - v / mod) * mod;
return v % mod + minval;
}
The only issue remaining is if minval is larger than maxval. Feel free to add an assertion if you need it.
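A sketch of how the suggested assertion might be added. This variant normalises the remainder instead of pre-adding a multiple of mod, which is a different technique from the two versions above; the name wrap_range is an assumption, and the intermediate sums are assumed not to overflow int.

```cpp
#include <cassert>

// Hypothetical variant: wraps v + delta into [minval, maxval].
// Assumes the intermediate sums do not overflow int.
int wrap_range(int v, int delta, int minval, int maxval) {
    assert(minval <= maxval);  // the check suggested above
    const int mod = maxval + 1 - minval;
    const int r = (v + delta - minval) % mod;  // in (-mod, mod)
    return (r < 0 ? r + mod : r) + minval;     // shift into [minval, maxval]
}
```

With months labeled 0 to 11, wrap_range(0, -1, 0, 11) gives 11; with months labeled 1 to 12, wrap_range(1, -1, 1, 12) gives 12.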
k % k will always be 0. I'm not 100% sure what you're trying to do but it seems you want the last month to be clamped between 0 and 11 inclusive.
(this_month + 11) % 12
Should suffice.
The general solution is to write a function that computes the value that you want:
//Returns floor(a/n) (with the division done exactly).
//Let ÷ be mathematical division, and / be C++ division.
//We know
// a÷b = a/b + f (f is the remainder, not all
// divisions have exact Integral results)
//and
// (a/b)*b + a%b == a (from the standard).
//Together, these imply (through algebraic manipulation):
// sign(f) == sign(a%b)*sign(b)
//We want the remainder (f) to always be >=0 (by definition of flooredDivision),
//so when sign(f) < 0, we subtract 1 from a/n to make f > 0.
template<typename Integral>
Integral flooredDivision(Integral a, Integral n) {
Integral q(a/n);
if ((a%n < 0 && n > 0) || (a%n > 0 && n < 0)) --q;
return q;
}
//flooredModulo: Modulo function for use in the construction of
//looping topologies. The result will always be between 0 and the
//denominator, and will loop in a natural fashion (rather than swapping
//the looping direction over the zero point (as in C++11),
//or being unspecified (as in earlier C++)).
//Returns x such that:
//
//Real a = Real(numerator)
//Real n = Real(denominator)
//Real r = a - n*floor(a/n)
//x = Integral(r)
template<typename Integral>
Integral flooredModulo(Integral a, Integral n) {
return a - n * flooredDivision(a, n);
}
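To see how the floored variants behave on the month example, here is a compact non-template sketch for int. The names floored_div and floored_mod are illustrative; the sketch assumes n != 0 and C++11 truncating division.

```cpp
// Floored division for int: rounds the quotient toward negative infinity.
int floored_div(int a, int n) {
    int q = a / n;  // truncates toward zero
    // If there is a remainder and its sign differs from n's, round down.
    if ((a % n != 0) && ((a % n < 0) != (n < 0))) --q;
    return q;
}

// Floored modulo: the result has the sign of n, or is zero.
int floored_mod(int a, int n) {
    return a - n * floored_div(a, n);
}
```

The previous month is then floored_mod(this_month - 1, 12), which gives 11 for this_month = 0, and the same formula keeps working for larger steps back, e.g. floored_mod(-13, 12) is also 11.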
Easy peasy: do not use the first modulo operator, it is superfluous:
int last_month = (this_month - 1 + 12) % 12;
which is the general case
In this instance you can write 11, but I would still write -1 + 12, as it more clearly states what you want to achieve.
Note that normal mod causes the pattern 0...11 to repeat at 12...23, 24...35, etc. but doesn't wrap on -11...-1. In other words, it has two sets of behaviors. One from -infinity...-1, and a different set of behavior from 0...infinity.
The expression (x - 1 + k) % k fixes -11...-1 but has the same problem as normal mod with -23...-12. I.e. while it fixes 12 additional numbers, it doesn't wrap around infinitely. It still has one set of behavior from -infinity...-12, and a different behavior from -11...+infinity. (The double-modulo form ((x-1) % k + k) % k does not share this limitation, because the inner % first brings the value into (-k, k).)
This means that if you're using the function for offsets, it could lead to buggy code.
If you want a truly wrap around mod, it should handle the entire range, -infinity...infinity in exactly the same way.
There is probably a better way to implement this, but here is an easy to understand implementation:
// n must be greater than 0
func wrapAroundMod(a: Int, n: Int) -> Int {
var offsetTimes: Int = 0
if a < 0 {
offsetTimes = (-a / n) + 1
}
return (a + n * offsetTimes) % n
}
Not sure if you were having the same problem as me, but my problem was essentially that I wanted to constrain all numbers to a certain range. Say that range was 0-6, so using %7 means that any number higher than 6 will wrap back around to 0 or above. The actual problem is that numbers less than zero didn't wrap back around to 6. I have a solution to that (where X is the number of values in the range, 7 in this example, and 0 is the minimum):
int result;
if (inputNumber < 0) // negative input: wrap backwards from X
{
result = (X - ((-inputNumber) % X)) % X;
}
else
{
result = inputNumber % X;
}