C++ - Checking whether two substrings are equal using hashing


I'm taking a Coursera course, and one of the assignments is to process a large number of queries (up to the order of 10^6), each checking whether two substrings of a single string (maximum length 500000) are equal, using polynomial hashing. A single query consists of two start indices a and b and a length l.
I've implemented code that precomputes prefix hashes for the whole string twice (for two separate moduli) and computes two hashes for each substring. This is to reduce the number of collisions.
The code:
#include <cmath>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class Solver {
string s;
unsigned long long x, m1, m2;
vector<long long> h1, h2;
public:
Solver(string s) : s(s) {
// initialization, precalculation inside constructor
h1.resize(s.size() + 1); //first array to store precomputed hashes
h2.resize(s.size() + 1); //second array
x = 263; // random multiplier for polynomial hash and 0 < x < m1, m2
m1 = 1000000007;
m2 = 1000000009;
h1[0] = 0, h2[0] = 0;
for (size_t i = 1; i <= s.size(); ++i) {
h1[i] = ((x * h1[i - 1]) + s[i - 1]) % m1;
h2[i] = ((x * h2[i - 1]) + s[i - 1]) % m2;
}
}
bool ask(int a, int b, int l) {
if (a + l > s.size() || b + l > s.size())
return false;
long long y = 1, z = 1;
y = (long long)pow(x, l) % m1;
z = (long long)pow(x, l) % m2;
long long H1 = (((h1[a + l] - (y * h1[a])) % m1) + m1) % m1;
long long H2 = (((h1[b + l] - (y * h1[b])) % m1) + m1) % m1;
long long H3 = (((h2[a + l] - (z * h2[a])) % m2) + m2) % m2;
long long H4 = (((h2[b + l] - (z * h2[b])) % m2) + m2) % m2;
return ((H1 == H2) && (H3 == H4));
}
};
int main() {
ios_base::sync_with_stdio(0), cin.tie(0);
string s;
int q;
cin >> s >> q;
Solver solver(s);
for (int i = 0; i < q; i++) {
int a, b, l;
cin >> a >> b >> l;
cout << (solver.ask(a, b, l) ? "Yes\n" : "No\n");
}
}
Sample input and output:
trololo
4 // number of queries
0 0 7 // a b l
2 4 3
3 5 1
1 3 2
yes
yes
yes
no
The code works for the above example. But when a is 0 and b is some other value, the code outputs the wrong answer (the answer I work out on paper differs), and it also gives wrong answers for simple strings such as "abcabc".
Example (input, then output):
abcabcabc
4
0 3 3
1 7 2
2 8 1
1 4 4
no
no
no
yes
Where am I going wrong when a = 0, and are simple inputs like "thethethe" valid test cases?
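A likely source of the wrong answers: m1 and m2 are declared unsigned long long, so in an expression such as (h1[b + l] - (y * h1[b])) % m1 the signed difference is converted to unsigned long long before the %. Whenever that difference is negative, it wraps to a huge value and the remainder is no longer the correct residue. With a = 0 the a-side term is never negative (h1[0] is 0), which is consistent with the a = 0 pattern described. Separately, pow works in double, which cannot represent 263^l exactly once l reaches about 7, and it overflows to infinity for large l. The minimal sketch below keeps every value unsigned and precomputes x^k mod m instead; the class name DoubleHash and its method equal are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical rewrite of Solver: double polynomial hashing with all
// arithmetic in uint64_t and precomputed powers x^k mod m.
class DoubleHash {
    static constexpr std::uint64_t X = 263;
    static constexpr std::uint64_t M1 = 1000000007ULL;
    static constexpr std::uint64_t M2 = 1000000009ULL;
    std::vector<std::uint64_t> h1, h2, p1, p2;  // prefix hashes and powers

    // Hash of s[a, a + l) for one modulus. Every operand is below m, so the
    // product fits in 64 bits, and adding m before subtracting avoids negatives.
    static std::uint64_t sub(const std::vector<std::uint64_t>& h,
                             const std::vector<std::uint64_t>& p,
                             std::uint64_t m, std::size_t a, std::size_t l) {
        std::uint64_t shifted = p[l] * h[a] % m;
        return (h[a + l] + m - shifted) % m;
    }

public:
    explicit DoubleHash(const std::string& s)
        : h1(s.size() + 1, 0), h2(s.size() + 1, 0),
          p1(s.size() + 1, 1), p2(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            std::uint64_t c = static_cast<unsigned char>(s[i - 1]);
            h1[i] = (X * h1[i - 1] + c) % M1;
            h2[i] = (X * h2[i - 1] + c) % M2;
            p1[i] = p1[i - 1] * X % M1;
            p2[i] = p2[i - 1] * X % M2;
        }
    }

    // True if s[a, a + l) and s[b, b + l) hash equally under both moduli.
    bool equal(std::size_t a, std::size_t b, std::size_t l) const {
        const std::size_t n = h1.size() - 1;
        if (a + l > n || b + l > n) return false;
        return sub(h1, p1, M1, a, l) == sub(h1, p1, M1, b, l) &&
               sub(h2, p2, M2, a, l) == sub(h2, p2, M2, b, l);
    }
};
```

For example, DoubleHash("trololo").equal(2, 4, 3) compares "olo" with "olo".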

If the string is 500k characters long, there are over a hundred billion substrings (about n^2/2), far too many to prehash in advance and fit in memory.
Instead, you can hash the substrings of length 1, 2, 4, 8, 16, ... starting at every position. There are only on the order of 10 million of those (about 500000 × 19).
Then, given two substrings, we can compare at least half of each of them with a single lookup, which makes the comparison take logarithmic time.
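A minimal sketch of the table this answer describes, assuming a single modulus and the question's multiplier 263 (both are assumptions; a second modulus could be added as in the question). win[k][i] holds the hash of s[i, i + 2^k) and is built from two windows of length 2^(k-1). For the query step, this sketch differs slightly from the logarithmic walk described: with 2^k <= l < 2^(k+1), one window of length 2^k at the start and one at the end already cover the whole substring, so two comparisons suffice. With n = 500000 the table holds about 19 levels of at most n 64-bit values, roughly 76 MB at most. The class name PowTwoHashes is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: hashes of every window of length 1, 2, 4, 8, ...
class PowTwoHashes {
    static constexpr std::uint64_t X = 263;
    static constexpr std::uint64_t M = 1000000007ULL;
    std::size_t n_;
    std::vector<std::vector<std::uint64_t>> win;  // win[k][i]: hash of s[i, i + 2^k)

public:
    explicit PowTwoHashes(const std::string& s) : n_(s.size()) {
        win.emplace_back(n_);
        for (std::size_t i = 0; i < n_; ++i)
            win[0][i] = static_cast<unsigned char>(s[i]);
        std::uint64_t shift = X;  // X^(len/2) mod M for the level being built
        for (std::size_t k = 1, len = 2; len <= n_; ++k, len *= 2) {
            const std::size_t half = len / 2;
            std::vector<std::uint64_t> level(n_ - len + 1);
            for (std::size_t i = 0; i + len <= n_; ++i)
                // left half shifted up by X^half, plus the right half
                level[i] = (win[k - 1][i] * shift + win[k - 1][i + half]) % M;
            win.push_back(std::move(level));
            shift = shift * shift % M;
        }
    }

    // Compares s[a, a + l) and s[b, b + l) via two overlapping windows.
    bool equal(std::size_t a, std::size_t b, std::size_t l) const {
        if (a + l > n_ || b + l > n_) return false;
        if (l == 0) return true;
        std::size_t k = 0;
        while ((std::size_t{2} << k) <= l) ++k;  // largest k with 2^k <= l
        const std::size_t len = std::size_t{1} << k;
        return win[k][a] == win[k][b] &&
               win[k][a + l - len] == win[k][b + l - len];
    }
};
```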

Related

Substrings of equal length comparison using hashing

In an assignment I have, for a string S, I need to compare two substrings of equal length. The output should be "Yes" if they are equal and "No" if they are not. I am given the starting indices of the two substrings (a and b) and their length L.
For example, for S = "Hello", a = 1, b = 3, L = 2, the substrings are:
substring1 = "el" and substring2 = "lo", which aren't equal, so answer will be "No".
I think hashing each substring of the main string S and storing them all in memory would be a good approach. Here is the code I have written for this (I have tried to implement what I learned about this from the Coursera course I was taking):
This function takes a string and the hashing parameters p and x, and computes a polynomial hash of the string.
#include <iostream>
#include <string>
using namespace std;
long long PolyHash(string str, long long p, int x){
long long res = 0;
for(int i = str.length() - 1; i > -1; i--){
res = (res * x + (str[i] - 'a' + 1)) % p;
}
return res;
}
The function below precomputes all hashes and fills an array called ah, which is created in the main function. The array ah has n = string length rows and n = string length columns (about half of which is wasted, because I couldn't figure out how to store it as a triangle, so I had to use a full rectangular array). Assuming n = 7, ah[0]-ah[6] are the hash values for string[0]-string[6] (all substrings of length 1), ah[7]-ah[12] are the hash values for string[0-1]-string[5-6] (all substrings of length 2), and so on until the end.
void PreComputeAllHashes(string str, int len, long long p, int x, long long* ah){
int n = str.length();
string S = str.substr(n - len, len);
ah[len * n + n - len] = PolyHash(S, p, x);
long long y = 1;
for(int _ = 0; _ < len; _++){
y = (y * x) % p;
}
for(int i = n - len - 1; i > -1; i--){
ah[n * len + i] = (x * ah[n * len + i + 1] + (str[i] - 'a' + 1) - y * (str[i + len] - 'a' + 1)) % p;
}
}
And below is the main function. I took p equal to some large prime number, and x to be some manually picked, somewhat "random" prime number.
I take the text as input, initialize hash array, fill the hash array, and then take queries as input, to answer all queries from my array.
int main(){
long long p = 1e9 + 9;
int x = 78623;
string text;
cin >> text;
long long* allhashes = new long long[text.length() * text.length()];
for(int i = 1; i <= text.length(); i++){
PreComputeAllHashes(text, i, p, x, allhashes);
}
int queries;
cin >> queries;
int a, b, l;
for(int _ = 0; _ < queries; _++){
cin >> a >> b >> l;
if(a == b){
cout << "Yes" << endl;
}else{
cout << ((allhashes[l * text.length() + a] == allhashes[l * text.length() + b]) ? "Yes" : "No") << endl;
}
}
return 0;
}
However, one of the test cases for this assignment on Coursera is throwing an error like this:
Failed case #7/14: unknown signal 6 (Time used: 0.00/1.00, memory used: 29396992/536870912.)
Which, I have looked up online, and means the following:
Unknown signal 6 (or 7, or 8, or 11, or some other).This happens when your program crashes. It can be
because of division by zero, accessing memory outside of the array bounds, using uninitialized
variables, too deep recursion that triggers stack overflow, sorting with contradictory comparator,
removing elements from an empty data structure, trying to allocate too much memory, and many other
reasons. Look at your code and think about all those possibilities.
And I've been looking at my code the entire day, and still haven't been able to come up with a solution to this error. Any help to fix this would be appreciated.
Edit: The assignment states that the input string can be up to 500000 characters long, and the number of queries can be up to 100000. The task also has a 1-second time limit, which is too short to compare the substrings character by character for every query.
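For scale, here is the size of the allocation in main at the limits stated in the edit, assuming an 8-byte long long. This is a back-of-the-envelope estimate, not a diagnosis of case #7, whose input size is not shown. A request that large cannot be satisfied; a failed new throws std::bad_alloc, and an uncaught exception ends the program through std::terminate, which by default calls std::abort and is typically reported as signal 6 (SIGABRT). Separately, for len = n the first write in PreComputeAllHashes goes to ah[len * n + n - len] = ah[n * n], one element past the end of the array.

```latex
n^2 \cdot 8\,\text{bytes} = (5 \times 10^{5})^2 \cdot 8\,\text{bytes} = 2 \times 10^{12}\,\text{bytes} \approx 2\,\text{TB},
\qquad \text{versus the limit } 536\,870\,912\,\text{bytes} = 2^{29}\,\text{bytes} = 512\,\text{MiB}.
```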

So, I did some research as to how I can reduce the complexity of this algorithm that I have implemented, and finally found it! Turns out there is a super-simple way (well, not if you count the theory involved behind it) to get hash value of any substring, given the prefix hashes of the initial string!
You can read more about it here, but I will try to explain it briefly.
So here is what we do: we precalculate the hash values of all prefix substrings.
Prefix substrings for a string "hello" would be the following:
h
he
hel
hell
hello
Once we have the hash values of all these prefix substrings, we collect them in a vector h such that:
$h[i] = \left(s[0] + s[1] \cdot P + s[2] \cdot P^2 + \dots + s[i] \cdot P^i\right) \bmod m$
where s[k] is the code of the k-th character ('a' = 1, 'b' = 2, ...) and P (called p in the code) is a prime number (I chose p = 263).
The modulus m is a large number that keeps all the values from growing too large; I chose m = 10^9 + 9.
First I am creating a vector to hold the precalculated powers of P:
vector<long long> p_pow (s.length());
p_pow[0] = 1;
for(size_t i=1; i<p_pow.size(); ++i){
p_pow[i] = (m + (p_pow[i-1] * p) % m) % m;
}
Then I calculate the vector of hash values for prefix substrings:
vector<long long> h (s.length());
for (size_t i=0; i<s.length(); ++i){
h[i] = (m + (s[i] - 'a' + 1) * p_pow[i] % m) % m;
if(i){
h[i] = (m + (h[i] + h[i-1]) % m) % m;
}
}
Suppose I have q queries, each of which consists of 3 integers: a, b, and L.
To check equality of the substrings s1 = str[a...a+L-1] and s2 = str[b...b+L-1], I can compare their hash values. To get the hash value of a substring from the hash values of the prefix substrings we just created, we use the following formula:
$H[i..j] \cdot P^{i} \equiv H[0..j] - H[0..i-1] \pmod{m}$
Again, you can read about the proof of this in the link.
So, to address each query, I would do the following:
cin >> a >> b >> len;
if(a == b){ // just avoid extra calculation, saves little time
cout << "Yes" << endl;
}else{
long long h1 = h[a+len-1] % m;
if(a){
h1 = (m + (h1 - h[a-1]) % m) % m;
}
long long h2 = h[b+len-1] % m;
if(b){
h2 = (m + (h2 - h[b-1]) % m) % m;
}
if (a < b && h1 * p_pow[b-a] % m == h2 % m || a > b && h1 % m == h2 * p_pow[a-b] % m){
cout << "Yes" << endl;
}else{
cout << "No" << endl;
}
}
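The fragments above can be assembled into one runnable sketch. Wrapping them in a struct, and the names PrefixHash, raw and same_substring, are illustrative choices, not part of the original answer. As in the answer, characters are mapped with s[i] - 'a' + 1, so this assumes lowercase input; it also assumes len >= 1 with both ranges inside the string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Prefix hashes h[i] = sum over k <= i of (s[k] - 'a' + 1) * p^k, mod m,
// following the answer's construction.
struct PrefixHash {
    static constexpr long long p = 263;
    static constexpr long long m = 1000000009LL;
    std::vector<long long> p_pow, h;

    explicit PrefixHash(const std::string& s) : p_pow(s.size()), h(s.size()) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            p_pow[i] = (i == 0) ? 1 : p_pow[i - 1] * p % m;
            h[i] = (s[i] - 'a' + 1) * p_pow[i] % m;
            if (i) h[i] = (h[i] + h[i - 1]) % m;
        }
    }

    // H[0..i+len-1] - H[0..i-1]: hash of s[i..i+len-1], still multiplied by p^i.
    long long raw(int i, int len) const {
        long long v = h[i + len - 1];
        if (i) v = (v - h[i - 1] + m) % m;
        return v;
    }

    // Bring both substrings to the same power of p by multiplying the one
    // with the smaller start index, then compare.
    bool same_substring(int a, int b, int len) const {
        if (a == b) return true;
        long long h1 = raw(a, len), h2 = raw(b, len);
        if (a < b) return h1 * p_pow[b - a] % m == h2;
        return h1 == h2 * p_pow[a - b] % m;
    }
};
```

For example, PrefixHash("hello").same_substring(1, 3, 2) compares "el" with "lo" and returns false.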

Your approach is very complex for such a simple task. Assuming that you only need to do this operation once, you can compare the substrings manually with a for loop; no hashing is needed. Take a look at this code:
for(int i = a, j = b, counter = 0 ; counter < L ; counter++, i++, j++){
if(S[i] != S[j]){
cout << "Not the same" << endl;
return 0;
}
}
cout << "They are the same" << endl;
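The same manual comparison can also be delegated to the standard library: std::string::compare has an overload taking a position and a length for both strings, and it returns 0 when the two ranges are equal. A minimal sketch; the helper name same_range is hypothetical, and it assumes a and b are valid positions in S. Each call still costs O(L), so for many queries on a long string the hashing approaches remain relevant.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if S[a, a + L) equals S[b, b + L).
// Uses the compare(pos1, count1, str, pos2, count2) overload.
bool same_range(const std::string& S, std::size_t a, std::size_t b, std::size_t L) {
    return S.compare(a, L, S, b, L) == 0;
}
```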

Simplest way to split n objects into m sets, where m doesn't divide n?

I have two integers m and n, with m < n. In general, m doesn't divide n.
Say that n = m*q + r, where q is the integer quotient of n and m, and r is the remainder (0 <= r < m). If we split n objects into m boxes as homogeneously as possible, r of the boxes will contain q+1 objects and the remaining boxes will contain q objects. Suppose that the objects are indexed from 1 to n, and that they are inserted into the boxes in order. Moreover, suppose that the first r boxes contain q+1 objects.
I want to write a function that returns a list of indices i1, i2, ..., im, such that i1 is the index of the smallest object in the first box, i2 the index of the smallest object in the second box, and so on.
I can think of a couple of ways to write this function myself, but I think they are too complicated. I believe there's a simple way to do this that I am not seeing.

I did not fully understand what you mean by 'inserted into the boxes in order', so I'll offer you a solution for the 2 possible meanings.
a) The objects are inserted into the boxes like this:
9|
5|6|7|8
1|2|3|4
in which case the solution is fairly simple: just print all the numbers from 1 to m.
Code:
#include <iostream>
void foo(int n, int m) {
for (int k=1; k<=m; k++)
std::cout<<k<<std::endl;
}
b) The objects are inserted into the boxes like this:
3|6|
2|5|8|10
1|4|7|9
in which case, for every box, the object with the smallest index in box k is: (n / m) * (k - 1) + min(k, n % m + 1)
Code:
#include <algorithm>
#include <iostream>
void foo(int n, int m) {
for (int k=1; k<=m; k++)
std::cout<<(n / m) * (k - 1) + std::min(k, n % m + 1)<<std::endl;
}

Add q objects to every box, and if the box is one of the first n - m*q == n % m boxes, add one extra object:
#include <vector>
std::vector<int> starts_of(int n, int m)
{
std::vector<int> v;
int q = n / m;
int s = 1;
for (int i = 0; i < m; i++) {
v.push_back(s);
s += q;
if (i < n % m) s++;
}
return v;
}

r = n % m
d = n / m
//with two loops without conditionals:
for i = 0..r - 1
I[i] = 1 + i * d + i
for i = r..m - 1
I[i] = 1 + i * d + r
//or with single loop:
for i = 0..m - 1
I[i] = 1 + i * d + min(i, r)
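The single-loop pseudocode above translates directly into C++; the function name box_starts is hypothetical, and indices are 1-based as in the question.

```cpp
#include <algorithm>
#include <vector>

// First (smallest) index in each of m boxes when objects 1..n are filled
// in order and the first n % m boxes receive one extra object.
std::vector<int> box_starts(int n, int m) {
    const int d = n / m, r = n % m;
    std::vector<int> I(m);
    for (int i = 0; i < m; ++i)
        I[i] = 1 + i * d + std::min(i, r);  // min(i, r) extra objects precede box i
    return I;
}
```

For n = 10 and m = 4 this yields {1, 4, 7, 9}: boxes 1-3, 4-6, 7-8 and 9-10.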

Most efficient way to calculate lexicographic index

Can anybody find any potentially more efficient algorithms for accomplishing the following task?:
For any given permutation of the integers 0 thru 7, return the index which describes the permutation lexicographically (indexed from 0, not 1).
For example,
The array 0 1 2 3 4 5 6 7 should return an index of 0.
The array 0 1 2 3 4 5 7 6 should return an index of 1.
The array 0 1 2 3 4 6 5 7 should return an index of 2.
The array 1 0 2 3 4 5 6 7 should return an index of 5040 (that's 7!, i.e. factorial(7)).
The array 7 6 5 4 3 2 1 0 should return an index of 40319 (that's 8!-1). This is the maximum possible return value.
My current code looks like this:
int lexic_ix(int* A){
int value = 0;
for(int i=0 ; i<7 ; i++){
int x = A[i];
for(int j=0 ; j<i ; j++)
if(A[j]<A[i]) x--;
value += x*factorial(7-i); // actual unrolled version doesn't have a function call
}
return value;
}
I'm wondering if there's any way I can reduce the number of operations by removing that inner loop, or if I can reduce conditional branching in any way (other than unrolling - my current code is actually an unrolled version of the above), or if there are any clever bitwise hacks or filthy C tricks to help.
I already tried replacing
if(A[j]<A[i]) x--;
with
x -= (A[j]<A[i]);
and I also tried
x = A[j]<A[i] ? x-1 : x;
Both replacements actually led to worse performance.
And before anyone says it - YES this is a huge performance bottleneck: currently about 61% of the program's runtime is spent in this function, and NO, I don't want to have a table of precomputed values.
Aside from those, any suggestions are welcome.
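One way to remove the inner loop, not taken from the answers here: keep a bitmask of the values already seen, and count the seen values smaller than A[i] with a population count. A minimal sketch for permutations of 0..7; it uses std::bitset::count for the population count (a compiler intrinsic such as GCC's __builtin_popcount is a common alternative) and an eight-entry table of factorials, not a table of permutation indices. The name lexic_ix_mask is hypothetical, and whether it is actually faster than the unrolled original would need profiling.

```cpp
#include <bitset>

// Hypothetical variant of lexic_ix for permutations of 0..7.
int lexic_ix_mask(const int* A) {
    static const int fact[8] = {1, 1, 2, 6, 24, 120, 720, 5040};  // k!
    unsigned seen = 0;  // bit v is set once value v has appeared
    int value = 0;
    for (int i = 0; i < 7; ++i) {  // the last element always contributes 0
        unsigned smaller_seen = seen & ((1u << A[i]) - 1u);  // seen values below A[i]
        int x = A[i] - static_cast<int>(std::bitset<8>(smaller_seen).count());
        value += x * fact[7 - i];
        seen |= 1u << A[i];
    }
    return value;
}
```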

Don't know if this helps, but here's another solution:
int lexic_ix(int* A, int n){ //n = last index = number of digits - 1
int value = 0;
int x = 0;
for(int i=0 ; i<n ; i++){
int diff = (A[i] - x); //pb1
if(diff > 0)
{
for(int j=0 ; j<i ; j++)//pb2
{
if(A[j]<A[i] && A[j] > x)
{
if(A[j]==x+1)
{
x++;
}
diff--;
}
}
value += diff;
}
else
{
x++;
}
value *= n - i;
}
return value;
}
I couldn't get rid of the inner loop, so the complexity is O(n^2) in the worst case but O(n) in the best case, versus your solution, which is O(n^2) in all cases.
Alternatively, you can replace the inner loop by the following to remove some worst cases at the expense of another verification in the inner loop :
int j=0;
while(diff>1 && j<i)
{
if(A[j]<A[i])
{
if(A[j]==x+1)
{
x++;
}
diff--;
}
j++;
}
Explanation :
(or rather "How I ended up with that code"; I think it is not that different from yours, but it might give you ideas)
(to reduce confusion, I used characters instead of digits, and only four characters)
abcd 0 = ((0 * 3 + 0) * 2 + 0) * 1 + 0
abdc 1 = ((0 * 3 + 0) * 2 + 1) * 1 + 0
acbd 2 = ((0 * 3 + 1) * 2 + 0) * 1 + 0
acdb 3 = ((0 * 3 + 1) * 2 + 1) * 1 + 0
adbc 4 = ((0 * 3 + 2) * 2 + 0) * 1 + 0
adcb 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 //pb1
bacd 6 = ((1 * 3 + 0) * 2 + 0) * 1 + 0
badc 7 = ((1 * 3 + 0) * 2 + 1) * 1 + 0
bcad 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 //First reflexion
bcda 9 = ((1 * 3 + 1) * 2 + 1) * 1 + 0
bdac 10 = ((1 * 3 + 2) * 2 + 0) * 1 + 0
bdca 11 = ((1 * 3 + 2) * 2 + 1) * 1 + 0
cabd 12 = ((2 * 3 + 0) * 2 + 0) * 1 + 0
cadb 13 = ((2 * 3 + 0) * 2 + 1) * 1 + 0
cbad 14 = ((2 * 3 + 1) * 2 + 0) * 1 + 0
cbda 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 //pb2
cdab 16 = ((2 * 3 + 2) * 2 + 0) * 1 + 0
cdba 17 = ((2 * 3 + 2) * 2 + 1) * 1 + 0
[...]
dcba 23 = ((3 * 3 + 2) * 2 + 1) * 1 + 0
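The nested form in this table is a mixed-radix (Horner) evaluation: the digit for position i counts the later characters that are smaller than the current one, and the running value is multiplied by the number of remaining positions before the digit is added. A minimal sketch of that reading, with no factorials or divisions; the function name lexic_ix_horner is hypothetical, and it assumes the characters are distinct.

```cpp
#include <string>

// Hypothetical helper: lexicographic index of a string of distinct
// characters; for four characters this is ((c0 * 3 + c1) * 2 + c2) * 1 + c3.
long long lexic_ix_horner(const std::string& s) {
    const int n = static_cast<int>(s.size());
    long long value = 0;
    for (int i = 0; i < n; ++i) {
        int c = 0;  // digit i: later characters smaller than s[i]
        for (int j = i + 1; j < n; ++j)
            c += s[j] < s[i];
        value = value * (n - i) + c;  // radix for position i is n - i
    }
    return value;
}
```

For example, lexic_ix_horner("bcad") evaluates ((1 * 3 + 1) * 2 + 0) * 1 + 0 = 8, matching the table.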
First "reflexion" :
An entropy point of view: abcd has the least "entropy". If a character is in a place where it "shouldn't" be, it creates entropy, and the earlier the misplaced character appears, the more entropy it creates.
For bcad for example, lexicographic index is 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 and can be calculated that way :
value = 0;
value += max(b - a, 0); // = 1; (a "should be" in the first place [to create the less possible entropy] but instead it is b)
value *= 3 - 0; //last index - current index
value += max(c - b, 0); // = 1; (b "should be" in the second place but instead it is c)
value *= 3 - 1;
value += max(a - c, 0); // = 0; (a "should have been" put earlier, so it does not create entropy to put it there)
value *= 3 - 2;
value += max(d - d, 0); // = 0;
Note that the last operation will always do nothing, that's why "i
First problem (pb1) :
For adcb, for example, the first logic doesn't work: it leads to a lexicographic index of ((0 * 3 + 2) * 2 + 0) * 1 = 4, because max(c - d, 0) = 0, yet putting c before b does create entropy. I added x because of that: it represents the first digit/character that isn't placed yet. With x, diff cannot be negative.
For adcb, lexicographic index is 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 and can be calculated that way :
value = 0; x=0;
diff = a - a; // = 0; (a is in the right place)
diff == 0 => x++; //x=b now and we don't modify value
value *= 3 - 0; //last index - current index
diff = d - b; // = 2; (b "should be" there (it's x) but instead it is d)
diff > 0 => value += diff; //we add diff to value and we don't modify x
value *= 3 - 1;
diff = c - b; // = 1; (b "should be" there but instead it is c) This is where it differs from the first reflexion
diff > 0 => value += diff;
value *= 3 - 2;
Second problem (pb2) :
For cbda, for example, lexicographic index is 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0, but the first reflexion gives : ((2 * 3 + 0) * 2 + 1) * 1 + 0 = 13 and the solution to pb1 gives ((2 * 3 + 1) * 2 + 3) * 1 + 0 = 17. The solution to pb1 doesn't work because the two last characters to place are d and a, so d - a "means" 1 instead of 3. I had to count the characters placed before that comes before the character in place, but after x, so I had to add an inner loop.
Putting it all together :
I then realised that pb1 was just a particular case of pb2, and that if you remove x, and you simply take diff = A[i], we end up with the unnested version of your solution (with factorial calculated little by little, and my diff corresponding to your x).
So, basically, my "contribution" (I think) is to add a variable, x, which can avoid doing the inner loop when diff equals 0 or 1, at the expense of checking if you have to increment x and doing it if so.
I also checked whether you have to increment x in the inner loop (if(A[j]==x+1)) because if you take, for example, badce, x will be b at the end because a comes after b, and you will enter the inner loop one more time, encountering c. If you check x in the inner loop, when you encounter d you have no choice but to run the inner loop, but x will update to c, and when you encounter c you will not enter the inner loop. You can remove this check without breaking the program.
With the alternative version and the check in the inner loop, that makes 4 different versions. The alternative one with the check is the one that enters the inner loop least often, so in terms of "theoretical complexity" it is the best, but in terms of performance/number of operations, I don't know.
Hope all of this helps (since the question is rather old, and I didn't read all the answers in details). If not, I still had fun doing it. Sorry for the long post. Also I'm new on Stack Overflow (as a member), and not a native speaker, so please be nice, and don't hesitate to let me know if I did something wrong.

Linear traversal of memory that is already in cache really doesn't take much time at all. Don't worry about it. You won't be traversing enough distance before factorial() overflows.
Move the 8 out as a parameter.
#include <iostream>
int factorial ( int input )
{
return input ? input * factorial (input - 1) : 1;
}
int lexic_ix ( int* arr, int N )
{
int output = 0;
int fact = factorial (N);
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
output += order * (fact /= N - i);
}
return output;
}
int main()
{
int arr [ ] = { 11, 10, 9, 8, 7 , 6 , 5 , 4 , 3 , 2 , 1 , 0 };
const int length = 12;
for ( int i = 0; i < length; ++i )
std::cout << lexic_ix ( arr + i, length - i ) << std::endl;
}

Say, for an M-digit permutation, your code gives a lexicographic serial number (SN) of the form $SN = A_{M-1} \cdot (M-1)! + A_{M-2} \cdot (M-2)! + \dots + A_0 \cdot 0!$, where each $A_j$ ranges from 0 to j. You can calculate SN starting from $A_0 \cdot 0!$, then $A_1 \cdot 1!$, ..., up to $A_{M-1} \cdot (M-1)!$, and add these together (assuming your integer type does not overflow); building each factorial from the previous one this way, you do not need to calculate factorials recursively and repeatedly. SN ranges from 0 to M! - 1, because $\sum_{k=0}^{n} k \cdot k! = (n+1)! - 1$.
If you are not calculating factorials recursively, I cannot think of anything that could make any big improvement.
Sorry for posting the code a little late; I did some more research and found this:
http://swortham.blogspot.com.au/2011/10/how-much-faster-is-multiplication-than.html
According to this author, integer multiplication can be 40 times faster than integer division. For floating-point numbers the difference is less dramatic, but here everything is integer arithmetic.
#include <iostream>
#include <memory> // std::make_unique requires C++14
int lexic_ix ( int arr[], int N )
{
// if this function will be called repeatedly, consider pass in this pointer as parameter
std::unique_ptr<int[]> coeff_arr = std::make_unique<int[]>(N);
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
coeff_arr[i] = order; // save this into coeff_arr for later multiplication
}
//
// There are 2 points about the following code:
// 1). most modern processors have a built-in multiplier,
// and multiplication is much faster than division
// 2). In your code, you only test the maximum permutation serial number;
// if you put in a random sequence, say, when length is 10,
// {3, 7, 2, 9, 0, 1, 5, 8, 4, 6}, and look into
// coeff_arr[] in a debugger, you can see that coeff_arr[] is:
// {3, 6, 2, 6, 0, 0, 1, 2, 0, 0}; the last number will always be zero anyway,
// so there is a good chance to skip many multiplications.
// I did not do any performance profiling; you could have a go, and it would be
// much appreciated if you could give some feedback about the result.
//
long fac = 1;
long sn = 0;
for (int i = 1; i < N; ++i) // start from 1, because coeff_arr[N-1] is always 0
{
fac *= i;
if (coeff_arr[N - 1 - i])
sn += coeff_arr[N - 1 - i] * fac;
}
return sn;
}
int main()
{
int arr [ ] = { 3, 7, 2, 9, 0, 1, 5, 8, 4, 6 }; // try this and check coeff_arr
const int length = 10;
std::cout << lexic_ix(arr, length ) << std::endl;
return 0;
}

This is the whole profiling code. I only ran the test on Linux; the code was compiled with g++ 8.4 using the '-std=c++11 -O3' options. To be fair, I slightly rewrote your code to precalculate N! and pass it into the function, but it seems this does not help much.
The performance profiling for N = 9 (362,880 permutations) is:
Time durations are: 34, 30, 25 milliseconds
Time durations are: 34, 30, 25 milliseconds
Time durations are: 33, 30, 25 milliseconds
The performance profiling for N=10 (3,628,800 permutations) is:
Time durations are: 345, 335, 275 milliseconds
Time durations are: 348, 334, 275 milliseconds
Time durations are: 345, 335, 275 milliseconds
The first number is your original function, the second is the rewritten function that gets N! passed in, and the last number is my version. The permutation generation function is very primitive and runs slowly, but as long as it generates all permutations as a test dataset, that is alright. By the way, these tests were run on a quad-core 3.1 GHz desktop with 4 GB of RAM running Ubuntu 14.04.
EDIT: I forgot a factor that the first function may need to expand the lexi_numbers vector, so I put an empty call before timing. After this, the times are 333, 334, 275.
EDIT: Another factor that could influence performance is that I am using long integers in my code; if I change those two 'long' declarations to 'int', the running times become 334, 333, 264.
#include <iostream>
#include <vector>
#include <chrono>
using namespace std::chrono;
int factorial(int input)
{
return input ? input * factorial(input - 1) : 1;
}
int lexic_ix(int* arr, int N)
{
int output = 0;
int fact = factorial(N);
for (int i = 0; i < N - 1; i++)
{
int order = arr[i];
for (int j = 0; j < i; j++)
order -= arr[j] < arr[i];
output += order * (fact /= N - i);
}
return output;
}
int lexic_ix1(int* arr, int N, int N_fac)
{
int output = 0;
int fact = N_fac;
for (int i = 0; i < N - 1; i++)
{
int order = arr[i];
for (int j = 0; j < i; j++)
order -= arr[j] < arr[i];
output += order * (fact /= N - i);
}
return output;
}
int lexic_ix2( int arr[], int N , int coeff_arr[])
{
for ( int i = 0; i < N - 1; i++ )
{
int order = arr [ i ];
for ( int j = 0; j < i; j++ )
order -= arr [ j ] < arr [ i ];
coeff_arr[i] = order;
}
long fac = 1;
long sn = 0;
for (int i = 1; i < N; ++i)
{
fac *= i;
if (coeff_arr[N - 1 - i])
sn += coeff_arr[N - 1 - i] * fac;
}
return sn;
}
std::vector<std::vector<int>> gen_permutation(const std::vector<int>& permu_base)
{
if (permu_base.size() == 1)
return std::vector<std::vector<int>>(1, std::vector<int>(1, permu_base[0]));
std::vector<std::vector<int>> results;
for (int i = 0; i < permu_base.size(); ++i)
{
int cur_int = permu_base[i];
std::vector<int> cur_subseq = permu_base;
cur_subseq.erase(cur_subseq.begin() + i);
std::vector<std::vector<int>> temp = gen_permutation(cur_subseq);
for (auto x : temp)
{
x.insert(x.begin(), cur_int);
results.push_back(x);
}
}
return results;
}
int main()
{
#define N 10
std::vector<int> arr;
int buff_arr[N];
const int length = N;
int N_fac = factorial(N);
for(int i=0; i<N; ++i)
arr.push_back(N-i-1); // for N=10, arr is {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
std::vector<std::vector<int>> all_permus = gen_permutation(arr);
std::vector<int> lexi_numbers;
// This call is not timed, only to expand the lexi_numbers vector
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
lexi_numbers.clear();
auto t0 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix(&x[0], length));
auto t1 = high_resolution_clock::now();
lexi_numbers.clear();
auto t2 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix1(&x[0], length, N_fac));
auto t3 = high_resolution_clock::now();
lexi_numbers.clear();
auto t4 = high_resolution_clock::now();
for (auto x : all_permus)
lexi_numbers.push_back(lexic_ix2(&x[0], length, buff_arr));
auto t5 = high_resolution_clock::now();
std::cout << std::endl << "Time durations are: " << duration_cast<milliseconds> \
(t1 -t0).count() << ", " << duration_cast<milliseconds>(t3 - t2).count() << ", " \
<< duration_cast<milliseconds>(t5 - t4).count() <<" milliseconds" << std::endl;
return 0;
}

Finding MAX of numbers without conditional IF statements c++ [duplicate]

This question already has answers here:
Mathematically Find Max Value without Conditional Comparison
(18 answers)
Closed 9 years ago.
So I have to get two numbers from user input and find the max of the two numbers without using if statements.
The class is a beginner class, and we have to use what we already know. I kind of worked something out, but it only works if the numbers are entered with the max number first.
#include <iostream>
using namespace std;
int main()
{
int x = 0, y = 0, max = 0;
int smallest, largest;
cout << "Please enter 2 integer numbers, and i will show you which one is larger: ";
cin >> x >> y;
smallest = (x < y == 1) + (x - 1);
smallest = (y < x == 1) + (y - 1);
largest = (x < y == 1) + (y - 1);
largest = (y > x == 1) + (x + 1 - 1);
cout << "Smallest: " << smallest << endl;
cout << "Largest: " << largest << endl;
return 0;
}
That's what I have so far, but after trying different test data, I found out it only works for numbers such as 4,5 or 6,7, and not for numbers that are two or more apart, such as 4,8 or 5,7. Any help would be appreciated.

I saw this question in the Cracking the Coding Interview book.
Let’s try to solve this by “re-wording” the problem. We will re-word the problem until we get something that has no if statements left.
Rewording 1: If a > b, return a; else, return b.
Rewording 2: If (a - b) is negative, return b; else, return a.
Rewording 3: If (a - b) is negative, let k = 1; else, let k = 0. Return a - k * (a - b).
Rewording 4: Let c = a - b. Let k = the most significant bit of c. Return a - k * c.
#include <climits>
int getMax(int a, int b) {
int c = a - b;
int k = (c >> ((sizeof(int) * CHAR_BIT) - 1)) & 0x1;
int max = a - k * c;
return max;
}
Source: http://www.amazon.com/Cracking-Coding-Interview-Programming-Questions/dp/098478280X
Edit: The following version also handles the case where a - b overflows.
Let k equal the sign of a - b, such that if a - b >= 0 then k is 1, else k = 0. Let q be the inverse of k. The code above overflows when a is positive and b is negative, or the other way around. If a and b have different signs, then we want k to equal sign(a).
#include <climits>
/* Flips 1 to 0 and vice versa */
int flip(int bit){
return 1^bit;
}
/* Returns 1 if a is non-negative, and 0 if a is negative */
int sign(int a){
return flip((a >> ((sizeof(int) * CHAR_BIT) - 1)) & 0x1);
}
int getMax(int a, int b){
// In C++, signed overflow in a - b is undefined behaviour; this still
// relies on the subtraction wrapping around, as it does in Java.
int c = a - b;
int sa = sign(a); // if a >= 0, then 1 else 0
int sb = sign(b); // if b >= 0, then 1 else 0
int sc = sign(c); // depends on whether or not a - b overflows
/* If a and b have different signs, then k = sign(a) */
int use_sign_of_a = sa ^ sb;
/* If a and b have the same sign, then k = sign(a - b) */
int use_sign_of_c = flip(sa ^ sb);
int k = use_sign_of_a * sa + use_sign_of_c * sc;
int q = flip(k); // opposite of k
return a * k + b * q;
}

Here is a funny solution:
int max_num = (x>y)*x + (y>=x)*y;

Assuming that you have covered bitwise operators already you can do this:
max = a-((a-b)&((a-b)>>(sizeof(int)*8-1)));
This is based on the solution from Mathematically Find Max Value without Conditional Comparison that @user93353 pointed out in the comments above.
This may be overkill if you really are just trying to avoid if statements, not comparisons in general.
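Both bit tricks above compute a - b in int, which can overflow for inputs far apart (for example INT_MAX and a negative number), and in C++ signed overflow is undefined behaviour. A minimal sketch that sidesteps this by computing the difference in long long, assuming long long is wider than int (true on common platforms, where int is 32 bits and long long is 64). The name max_no_if is hypothetical.

```cpp
#include <climits>

// Hypothetical branch-free max: the difference is computed exactly in
// long long, so it cannot overflow for any pair of ints.
int max_no_if(int a, int b) {
    long long d = static_cast<long long>(a) - b;
    // The top bit of d is 1 exactly when d < 0; masking with 1 gives k = 1
    // whether the shift is arithmetic or logical.
    long long k = (d >> (sizeof(long long) * CHAR_BIT - 1)) & 1;
    return static_cast<int>(a - k * d);  // a when d >= 0, otherwise b
}
```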

You can try this code to find max and min for two input variables.
((a > b) && (max = a)) || (max=b);
((a < b) && (min = a)) || (min=b);
For three input variables you can use similar method like this:
#include <iostream>
using namespace std;
int main()
{
int a = 10, b = 9 , c = 8;
cin >> a >> b >> c;
int max = a, min = a;
// For Max (>= so that ties between two inputs are handled)
((a >= b) && (a >= c) && (max=a)) ||
((b >= c) && (b >= a) && (max=b)) ||
(max=c) ;
// For min
((a <= b) && (a <= c) && (min=a)) ||
((b <= c) && (b <= a) && (min=b)) ||
(min=c) ;
cout << "max = " << max;
cout << " and min = " << min << endl;
return 0;
}
One run is:
:~$ ./a.out
1
2
3
max = 3 and min = 1
Edit
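Thanks to @Tony D: this code can fail. If the value being assigned is 0, the assignment expression evaluates to false, so, for example, with a = 0 and b = -5, max ends up as b.
For two inputs, one may instead use the comma operator, so that the && operand is always true regardless of the assigned value:
((a > b) && ((max = a), true)) || (max = b);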

Find the nearest number of specific number which has specific digit (7)

Well, I have to write a program to find the NEAREST number to a given number N that has exactly K 7s.
For example, if input is:
N K
1773 3
Output:
1777
Oh, one more thing: N can be at most 100 000 000 000 000; will long long be enough to handle this?
My code so far which is not working :(
#include <iostream>
using namespace std;
int main()
{
unsigned long long a, i;
int b, num=0, dig, tmp;
cin>>a>>b;
i=a+1;
do
{
num=0;
tmp=i;
while (tmp>0)
{
dig=tmp%10;
tmp=tmp/10;
if (dig==7)
num++;
}
i++;
}
while(num<b);
cout<<i-1;
return 0;
}

Your problem is not a programming problem but a math problem.
Let $m = 1 + \lfloor \log_{10} N \rfloor$, i.e. the number of digits in the decimal representation of N (it will probably be faster to compute it by counting digits than by using a logarithm).
Let mK be the number of 7s in N.
Let N' be the output number.
I see 4 cases:
K >= m : then N' = 7..7 (K digits).
K == mK : then N' = N.
K > mK and K < m : then you replace all non-7 digits with 7, starting from the least significant digits. Ex: N = 1 357 975 , K = 4 => N' = 1 357 777. Warning : there is a special case, if you have a 8, ex: N = 80, N' = 79. You can do this case by using a common prefix, and then generating an all 7 suffix (special case: remove one more from the prefix and add 7 9 7 7 ... 7). See special case in the code.
K < mK : there are two possible numbers.
Lets decompose N: N = a1 a2 ... ap 7 b1 b2 ... bq, where
a1 ... ap are p numbers in [0..9] and
b1 ... bq are q numbers in [0..9] \ {7}
Let A = a1 ... ap 6 9 ... 9 and B = a1 ... ap 8 0 ... 0 (q digits after the 6 or the 8). Then N' = closestToN(A, B). If both numbers are equally close, the choice is up to you.
Sorry for the bad math formatting.
The code is now easier to write. Here is my implementation:
#include <iostream>
unsigned long long getClosestWith7(unsigned long long n, unsigned int k)
{
// Count number of digits
unsigned long long tmp = n;
unsigned int m = 0, mK = 0;
while(tmp > 0)
{
if(tmp % 10 == 7) mK++;
tmp /= 10;
m++;
}
// Distinct cases
if(k == mK && n != 0)
return n;
else if(k >= m || n == 0) // implicit: k != mK
{
unsigned long long r = 0;
while(k > 0)
{
r = 10 * r + 7;
k--;
}
return r;
}
else if(k > mK) // implicit: k != mK, k < m
{
unsigned long long r = n;
unsigned long long s = 0;
m = 0;
while(mK < k)
{
if(r % 10 != 7) mK++;
r /= 10;
m++;
}
if(r % 10 == 8) // special case
s = 79 + 100 * (r / 10);
while(m > 0)
{
r = 10 * r + 7;
if(s != 0 && m > 1) // special case
s = 10 * s + 7;
m--;
}
return (r < n && n - r < n - s) || (r >= n && r - n < n - s) ? r : s;
}
else // implicit : k < mK
{
// Generate a and b
unsigned long long a = n;
unsigned long long b = 0;
m = 0;
while(mK > k)
{
if(a % 10 == 7) mK--;
a /= 10;
m++;
}
b = 10 * a + 8;
a = 10 * a + 6;
m--;
while(m > 0)
{
a = 10 * a + 9;
b = 10 * b + 0;
m--;
}
// Compare (return lowest if equal)
return n - a <= b - n ? a : b;
}
}
#define CLOSEST7( N , K ) \
std::cout << "N = " << N << ", K = " << K << " => N' = " << getClosestWith7(N,K) << "\n"
int main()
{
CLOSEST7(1773,3);
CLOSEST7(83,1);
CLOSEST7(17273,3);
CLOSEST7(1273679750,6);
CLOSEST7(1773,1);
CLOSEST7(83,5);
CLOSEST7(0,2);
CLOSEST7(0,0);
}
For your question about long long: it depends on the compiler. Often, the size of this type is 64 bits, so you can store number from 0 to 2^64 - 1 (unsigned), which is 18 446 744 073 709 551 615, so it should be ok for your data range on most implementations.

Some problems:
ans=i records some i after you've divided it a few times, you need to record the original i
You only loop in 1 direction, you need to check in both directions at the same time
Looping through all numbers is fundamentally too slow
If the number is 100 000 000 000 000 and k = 14, you'd need to check 22 222 222 222 223 (100 000 000 000 000-77 777 777 777 777) numbers, which is not viable
Side note - the maximum for long long is 9223372036854775807.
Here is some pseudo-code which should work:
num = number of 7s in input
if (num == k)
print input
if (num < k)
a = input with (k-num) non-7 digits from least significant digit set to 7
let x = last position set
b = substring(input, 1, x)
c = b + 1
d = b - 1
ba = concat(b, substring(a, x, end))
ca = concat(c, substring(a, x, end))
da = concat(d, substring(a, x, end))
if (abs(input - ba) <= abs(input - ca) &&
abs(input - ba) <= abs(input - da))
print ba
else
if (abs(input - ca) <= abs(input - ba) &&
abs(input - ca) <= abs(input - da))
print ca
else
print da
if (num > k)
x = (num-k)th 7 from least significant digit
a = input with x set to 6 and all less significant digits to 9
b = input with x set to 8 and all less significant digits to 0
if (input - a > b - input)
print b
else
print a

How about this algorithm?
1. Convert the number into a string.
2. Count the number of 7s in it.
3. If it has fewer 7s than K, change the digits from right to left into 7s one by one until K is reached, then go to step 5.
4. If it has more 7s than K, change the digits from right to left into 6s one by one, only if they are 7, until K is reached, then go to step 5.
5. Convert it back into an integer.
long long is usable according to Dukeling's answer.
