How to hash very large substrings quickly without collisions? - c++

I have an app that, as one of its steps, finds all palindromic substrings of the input string. The input string can be up to 100,000 characters long, so the substrings can be very large. For example, one input produced over 300,000 palindromic substrings longer than 10,000 characters. The app later compares the palindromes for equality and counts the unique ones, using a hash computed with the standard hash inside the function that finds the palindromes. The hashes are stored in a vector and counted for uniqueness afterwards. The problem with input and output of this size is that hashing the very large substrings takes too long, and the hashes also collide. Is there a hash algorithm that can quickly and uniquely hash a very large substring (preferably from the substring's index range, for speed, but accurately enough for uniqueness)? The hashing is done at the end of the function get_palins. The code is below.
#include <iostream>
#include <string>
#include <cstdlib>
#include <time.h>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <map>
#include <cstdio>
#include <cmath>
#include <ctgmath>
using namespace std;
#define MAX 100000
#define mod 1000000007
vector<long long> palins[MAX+5];
// Finds all palindromes for the string
void get_palins(string &s)
{
int N = s.length();
int i, j, k, // iterators
rp, // length of 'palindrome radius'
R[2][N+1]; // table for storing results (2 rows for odd- and even-length palindromes
s = "#" + s + "#"; // insert 'guards' to iterate easily over s
for(j = 0; j <= 1; j++)
{
R[j][0] = rp = 0; i = 1;
while(i <= N)
{
while(s[i - rp - 1] == s[i + j + rp]) { rp++; }
R[j][i] = rp;
k = 1;
while((R[j][i - k] != rp - k) && (k < rp))
{
R[j][i + k] = min(R[j][i - k],rp - k);
k++;
}
rp = max(rp - k,0);
i += k;
}
}
s = s.substr(1,N); // remove 'guards'
for(i = 1; i <= N; i++)
{
for(j = 0; j <= 1; j++)
for(rp = R[j][i]; rp > 0; rp--)
{
int begin = i - rp - 1;
int end_count = 2 * rp + j;
int end = begin + end_count - 1;
if (!(begin == 0 && end == N -1 ))
{
string ss = s.substr(begin, end_count);
long long hsh = hash<string>{}(ss);
palins[begin].push_back(hsh);
}
}
}
}
unordered_map<long long, int> palin_counts;
unordered_map<char, int> end_matches;
// Solve when at least 1 character in string is different
void solve_all_not_same(string &s)
{
int n = s.length();
long long count = 0;
get_palins(s);
long long palin_count = 0;
// Gets all palindromes into unordered map
for (int i = 0; i <= n; i++)
{
for (auto& it : palins[i])
{
if (palin_counts.find(it) == palin_counts.end())
{
palin_counts.insert({it,1});
}
else
{
palin_counts[it]++;
}
}
}
// From total palindromes, get proper border count
// minus end characters of substrings
for ( auto it = palin_counts.begin(); it != palin_counts.end(); ++it )
{
int top = it->second - 1;
palin_count += (top * (top + 1)) / 2;
palin_count %= mod;
}
// Store string character counts in unordered map
for (int i = 0; i <= n; i++)
{
char c = s[i];
//long long hsh = hash<char>{}(c);
if (end_matches[c] == 0)
end_matches[c] = 1;
else
end_matches[c]++;
}
// From substring end character matches, get proper border count
// for end characters of substrings
for ( auto it = end_matches.begin(); it != end_matches.end(); it++ )
{
int f = it->second - 1;
count += (f * (f + 1)) / 2;
}
cout << (count + palin_count) % mod << endl;
for (int i = 0; i < MAX+5; i++)
palins[i].clear();
}
int main()
{
string s;
cin >> s;
solve_all_not_same(s);
return 0;
}

Faced with problem X (find all palindrome substrings), you ask how to solve Y (hash substrings quickly): The XY Problem.
For palindrome detection, consider suffix arrays (one for the reverse of the input, or that appended to the input).
For fast hashes of overlapping strings, look into rolling hashes.
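As a rough sketch of the rolling-hash idea (not the asker's code): precompute polynomial prefix hashes once, and the hash of any substring then follows from its index range in O(1), with no substr copy. The struct name PrefixHasher, the bases, and the two moduli below are illustrative assumptions. No hash can guarantee uniqueness; combining two moduli only makes collisions much less likely, so keying on (length, hash) or confirming equal hashes with a real comparison may still be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: polynomial prefix hashes under two moduli.
struct PrefixHasher {
    static constexpr std::uint64_t kMod1 = 1000000007ULL;
    static constexpr std::uint64_t kMod2 = 998244353ULL;
    static constexpr std::uint64_t kBase1 = 131ULL;  // arbitrary bases (assumption)
    static constexpr std::uint64_t kBase2 = 137ULL;

    std::vector<std::uint64_t> pre1, pre2, pow1, pow2;

    explicit PrefixHasher(const std::string& s)
        : pre1(s.size() + 1, 0), pre2(s.size() + 1, 0),
          pow1(s.size() + 1, 1), pow2(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            // +1 so that no character maps to zero
            const std::uint64_t c = static_cast<unsigned char>(s[i]) + 1ULL;
            pre1[i + 1] = (pre1[i] * kBase1 + c) % kMod1;
            pre2[i + 1] = (pre2[i] * kBase2 + c) % kMod2;
            pow1[i + 1] = pow1[i] * kBase1 % kMod1;
            pow2[i + 1] = pow2[i] * kBase2 % kMod2;
        }
    }

    // Combined hash of s[begin, begin + len), computed in O(1).
    std::uint64_t hash(std::size_t begin, std::size_t len) const {
        assert(begin + len < pre1.size());
        const std::uint64_t h1 =
            (pre1[begin + len] + kMod1 - pre1[begin] * pow1[len] % kMod1) % kMod1;
        const std::uint64_t h2 =
            (pre2[begin + len] + kMod2 - pre2[begin] * pow2[len] % kMod2) % kMod2;
        return (h1 << 32) | h2;  // both values are below 2^30, so they pack cleanly
    }
};
```

In get_palins, a call such as hasher.hash(begin, end_count) on a PrefixHasher built once from s could stand in for the substr plus hash<string> pair, which removes the per-palindrome copy.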

Related

Optimizing algorithm for constructing a string given costs to operations

I'm doing the following exercise (not homework), and I decided to go with backtracking. The problem says:
You are given as input a target string. Starting with an empty string,
you add characters to it until your new string is the same as the target.
You have two options to add characters to a string:
You can append an arbitrary character to your new string, with cost x.
You can clone any substring of your new string so far and append it to the end of your new string, with cost y.
For a given target, append cost x, and clone cost y, we want to know
the cheapest cost of building the target string.
And some examples:
Target "aa", append cost 1, clone cost 2: the cheapest cost is 2:
Start with an empty string, ""
Append 'a' (cost 1), giving the string "a"
Append 'a' (cost 1), giving the string "aa"
Target "aaaa", append cost 2, clone cost 3: the cheapest cost is 7:
Start with an empty string, ""
Append 'a' (cost 2), giving the string "a"
Append 'a' (cost 2), giving the string "aa"
Clone "aa" (cost 3), giving the string "aaaa"
Target "xzxpzxzxpq", append cost 10, clone cost 11: the cheapest cost is 71:
Start with an empty string, ""
Append 'x' (cost 10): "x"
Append 'z' (cost 10): "xz"
Append 'x' (cost 10): "xzx"
Append 'p' (cost 10): "xzxp"
Append 'z' (cost 10): "xzxpz"
Clone "xzxp" (cost 11): "xzxpzxzxp"
Append 'q' (cost 10) : "xzxpzxzxpq"
So far so good. I first tried to do it with backtracking, but then the following test case came:
string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblg
mbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblg
blgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
string doubleIt = bigString + bigString;
Now that's big.
Given costs of 1234 and 1235 to append and clone respectively, the total cost of building it is 59249.
So no more backtracking for this one because of the stack overflow.
I tried a more efficient approach:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int isWorthClone(const int size, const std::string& target) {
int worth = 0;
for (int j = size; j < target.size() and worth < size; j++) {
if (target[j] == target[worth]) {
worth++;
}
else break;
}
return worth;
}
int buildSolution(const std::string& target, int cpyCst, int apndCst) {
int index = 0;
int cost = 0;
while (int(target.size()) != (index)) {
int hasta = isWorthClone(index, target);
if (cpyCst < hasta * apndCst) {
cost += cpyCst;
index += hasta ;
}
else {
cost += apndCst;
index++;
}
}
return cost;
}
int main() {
std::string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodb
lgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcn
tdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
std::string doubleIt = bigString + bigString;
std::string target = bigString;
int copyCost = 1235;
int appendCost = 1234;
std::cout << buildSolution(target, copyCost, appendCost) << std::endl;
}
but the output is 3588498, whereas according to the test case the correct output is 59249.
I can't find why this approach gives that result. I tried debugging it, and it seems that isWorthClone does not find the right position to clone in some cases. It is also a little strange, because it works for the other cases; since this case is somewhat "clone expensive", I think the error propagates.
Any clues as to why this is happening? This is O(n^2), so I think this should be the optimal solution.
Edit:
My code now looks like the following, trying to follow the dp approach:
int canCopy(const int i, const string& target, int posCopied) {
int iStartArray = 0;
bool canCopy = true;
int aux = i;
while (canCopy) {
if (aux - 1 + posCopied > target.size() or target[iStartArray] != target[aux - 1]) {
canCopy = false;
}
else {
posCopied += 1;
iStartArray++;
aux++;
}
}
return posCopied;
}
int stringConstruction(string target, int copyCost, int appendCost) {
vector<int> dp(target.size() + 1, std::numeric_limits<int>::max());
dp[1] = appendCost;
for (int i = 2; i < dp.size(); i++) {
dp[i] = std::min(dp[i], dp[i - 1] + appendCost);
int posCopied = canCopy(i, target, 0);
if (posCopied != 0 and (posCopied + i) < dp.size()) {
dp[posCopied + i] = dp[i] + copyCost;
}
}
return dp[dp.size()-1];
}
This still doesn't work for the test case presented here.
Edit2:
Finally I implemented the solution provided by @David Eisenstat (thanks!), with a really naive approach:
int best_clone(const string& s) {
int j = s.size() - 1;
while (s.substr(0, j).find(s.substr(j, s.size() - j)) != std::string::npos) {
j--;
}
return j + 1;
}
int stringConstruction(string target, int copyCost, int appendCost) {
vector<int> v = vector<int> (1, 0);
for (int i = 0; i < target.size(); i++) {
int cost = v[i] + appendCost;
int j = best_clone(target.substr(0, i+1));
if (j <= i) {
cost = std::min(cost, v[j] + copyCost);
}
v.push_back(cost);
}
return v[v.size() - 1];
}
It seems I misunderstood the problem. This gives the right answer for the test cases, but it takes too long. best_clone needs to be optimized.
Edit 3:
(Hope this is the last one)
I added the following class SA for storing the suffix array:
#pragma once
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <chrono>
using namespace std;
typedef struct {
int index;
string s;
} suffix;
struct comp
{
inline bool operator() (const suffix& s1, const suffix& s2)
{
return (s1.s < s2.s);
}
};
class SA
{
private:
vector<suffix> values;
public:
SA(const string& s) : values(s.size()) {
string aux = s;
for (int i = 0; i < s.length(); i++) {
values[i].index = i;
values[i].s = s.substr(i, s.size() - i);
}
sort(values.begin(), values.end(), comp());
}
friend ostream& operator<<(ostream& os, const SA& dt)
{
for (int i = 0; i < dt.values.size(); i++) {
os << dt.values[i].index << ": " << dt.values[i].s << "\n";
}
return os;
}
int search(const string& subst, int i, int j) {
while (j >= i) {
int mid = (i + j) / 2;
if (this->values[mid].s > subst) {
j = mid-1;
}
else if (this->values[mid].s < subst) {
i = mid+1;
}
else return mid;
}
return -1;
}
};
But now I don't know how to search this array for the best clone. (I know this is slow, n*2log(n) I would say, but I think it is going to be good enough for this one.) So now I need to put these parts together.
The problem is that you're making the decision to clone greedily. Let's look at a case where the append cost is 2 and the clone cost is 3. If you process the string aabaaaba, you'll append aab, clone aa, and clone aba, whereas the best solution is to append aaba and clone it.
The fix is dynamic programming, specifically, to build an array of the cost to make each prefix of the target string. To fill each entry, take the min of (append cost plus previous entry, clone cost plus cost for the shortest prefix that can be completed with one clone). Since the clone cost is constant, the array is nondecreasing, and therefore we don't need to check all of the possible prefixes.
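To check the example above, here is a brute-force sketch of that table. It tries every possible clone start for each prefix instead of only the shortest one, so it is far slower than quadratic and only suitable for small inputs. The name constructionCostBruteForce is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

using Cost = unsigned long long;

// Illustrative brute force, not the optimized method:
// dp[i] is the cheapest cost of building the first i characters of s.
Cost constructionCostBruteForce(std::string_view s, Cost appendCost, Cost cloneCost) {
    std::vector<Cost> dp(s.size() + 1, 0);
    for (std::size_t i = 1; i <= s.size(); ++i) {
        // Option 1: append the single character s[i - 1].
        dp[i] = dp[i - 1] + appendCost;
        // Option 2: finish the prefix by cloning s[j, i),
        // allowed only if that piece already occurs inside s[0, j).
        for (std::size_t j = 1; j < i; ++j) {
            if (s.substr(0, j).find(s.substr(j, i - j)) != std::string_view::npos) {
                dp[i] = std::min(dp[i], dp[j] + cloneCost);
            }
        }
    }
    return dp[s.size()];
}
```

With append cost 2 and clone cost 3, this returns 11 for aabaaaba (append aaba for 8, then one clone for 3), while the greedy sequence described above costs 6 + 3 + 3 = 12.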
Depending on the constraints you may need to construct a suffix array/longest common prefix array (using e.g., SA-IS) to identify all of the best clones quickly. This will run in time o(n²) for sure (quite possibly O(n), but there are enough moving parts that I don't want to claim that).
This Python is too slow but gets the right answer on the large test case:
def best_clone(s):
j = len(s) - 1
while s[j:] in s[:j]:
j -= 1
return j + 1
def construction_cost(s, append_cost, clone_cost):
table = [0]
for i in range(len(s)):
cost = table[i] + append_cost
j = best_clone(s[: i + 1])
if j <= i:
cost = min(cost, table[j] + clone_cost)
table.append(cost)
return table[len(s)]
If the limit of your ambitions is quadratic, then we can put the Z function for string matching to good use.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
using Cost = unsigned long long;
// Adapted from https://cp-algorithms.com/string/z-function.html
std::vector<std::size_t> ZFunction(std::string_view s) {
std::size_t n = s.length();
std::vector<std::size_t> z(n);
for (std::size_t i = 1, l = 0, r = 0; i < n; i++) {
if (i <= r) {
z[i] = std::min(r - i + 1, z[i - l]);
}
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
z[i]++;
}
if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
}
return z;
}
std::size_t BestClone(std::string_view s) {
std::string r{s};
std::reverse(r.begin(), r.end());
auto z = ZFunction(r);
std::size_t best = 0;
for (std::size_t i = 0; i < z.size(); i++) {
best = std::max(best, std::min(z[i], i));
}
return s.length() - best;
}
Cost ConstructionCost(std::string_view s, Cost append_cost, Cost clone_cost) {
std::vector<Cost> costs = {0};
for (std::size_t j = 0; j < s.length(); j++) {
std::size_t i = BestClone(s.substr(0, j + 1));
if (i <= j) {
costs.push_back(
std::min(costs.back() + append_cost, costs[i] + clone_cost));
} else {
costs.push_back(costs.back() + append_cost);
}
}
return costs.back();
}
int main() {
std::string s;
while (std::cin >> s) {
std::cout << ConstructionCost(s, 1234, 1235) << '\n';
}
}

My Boyer-Moore algorithm only searches the first 3000ish characters in my text file

I'm trying to implement a Boyer-Moore string search algorithm. The search algorithm itself seems to work fine, up until a point. It prints out all occurrences until it reaches around the 3300 character area, then it does not search any further.
I am unsure if this is to do with the text file being too big to fit into my string or something entirely different. When I try and print the string holding the text file, it cuts off the first 185122 characters as well. For reference, the text file is Lord of the Rings: Fellowship of the Ring - it is 1016844 characters long.
Here is my code for reference:
#include <fstream>
#include <iostream>
#include <algorithm>
#include <vector>
#include <chrono>
using namespace std;
# define number_chars 256
typedef std::chrono::steady_clock clocktime;
void boyer_moore(string text, string pattern, int textlength, int patlength) {
clocktime::time_point start = clocktime::now();
vector<int> indexes;
int chars[number_chars];
for (int i = 0; i < number_chars; i++) {
chars[i] = -1;
}
for (int i = 0; i < patlength; i++) {
chars[(int)pattern[i]] = i;
}
int shift = 0;
while (shift <= (textlength - patlength)) {
int j = patlength - 1;
while (j >= 0 && pattern[j] == text[shift + j]) {
j--;
}
if (j < 0) {
indexes.push_back(shift);
if (shift + patlength < textlength) {
shift += patlength - chars[text[shift + patlength]];
}
else {
shift += 1;
}
}
else {
shift += max(1, j - chars[text[shift + j]]);
}
}
clocktime::time_point end = clocktime::now();
auto time_taken = chrono::duration_cast<chrono::milliseconds>(end - start).count();
for (int in : indexes) {
cout << in << endl;
}
}
int main() {
ifstream myFile;
//https://www.kaggle.com/ashishsinhaiitr/lord-of-the-rings-text/version/1#01%20-%20The%20Fellowship%20Of%20The%20Ring.txt
myFile.open("lotr.txt");
if (!myFile) {
cout << "no text file found";
}
string text((istreambuf_iterator<char>(myFile)), (istreambuf_iterator<char>()));
cout << text;
string pattern;
cin >> pattern;
int n = text.size();
int m = pattern.size();
boyer_moore(text, pattern, n, m);
}
I have tried to do some researching about what could be the cause but couldn't find anyone with this particular issue. Would appreciate any nudges in the right direction.

Find maximum strength given the number of elements to be skipped on the left and right. Why does my code give wrong output for certain test cases?

Given an array "s" of "n" items, each item has a left value "L[i]", a right value "R[i]", and a strength "S[i]". If you pick an element, you cannot pick the L[i] elements immediately to its left or the R[i] elements immediately to its right. Find the maximum total strength possible.
Example input:
5 //n
1 3 7 3 7 //strength
0 0 2 2 2 //Left Value
3 0 1 0 0 //Right Value
Output:
10
Code:
#include <bits/stdc++.h>
using namespace std;
unsigned long int getMax(int n, int * s, int * l, int * r) {
unsigned long int dyn[n + 1] = {};
dyn[1] = s[1];
for (int i = 2; i <= n; i++) {
dyn[i] = dyn[i - 1];
unsigned long int onInc = s[i];
int left = i - l[i] - 1;
if (left >= 1) {
unsigned int k = left;
while ((k > 0) && ((r[k] + k) >= i)) {
k--;
}
if (k != 0) {
if ((dyn[k] + s[i]) > dyn[i]) {
onInc = dyn[k] + s[i];
}
}
}
dyn[i] = (dyn[i] > onInc) ? dyn[i] : onInc;
}
return dyn[n];
}
int main() {
int n;
cin >> n;
int s[n + 1] = {}, l[n + 1] = {}, r[n + 1] = {};
for (int i = 1; i <= n; i++) {
cin >> s[i];
}
for (int i = 1; i <= n; i++) {
cin >> l[i];
}
for (int i = 1; i <= n; i++) {
cin >> r[i];
}
cout << getMax(n, s, l, r) << endl;
return 0;
}
Problem in your approach:
In your DP table, the information you store is the maximum possible so far. The information about whether the ith index itself was picked is lost. You can extend earlier indices with the strength at the current index only if each previously seen index is either out of range, or in range but not picked.
Solution:
Reconfigure your DP recurrence. Let DP[i] denote the maximum answer when the ith index is picked. Now you only need to extend those entries that satisfy the range condition. The answer is the maximum value over all DP entries.
Code:
vector<long> DP(n,0);
DP[0]=strength[0]; // base condition
for(int i = 1; i < n ; i++){
DP[i] = strength[i];
for(int j = 0; j < i ; j++){
if(j >= (i-l[i]) || i <= (j+r[j])){ // can't extend
}
else{
DP[i]=max(DP[i],strength[i]+DP[j]); // extend to maximize result
}
}
}
long ans=*max_element(DP.begin(),DP.end());
Time Complexity: O(n^2)
Possible Optimizations:
There are better ways to calculate maximum values which you might want to look into. You can start by looking into Segment tree and Binary Indexed Trees.
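For reference, the O(n^2) recurrence above can be wrapped into a self-contained function as follows. This is a minimal sketch using 0-based indices; the function name maxStrength and the use of std::vector parameters are assumptions for illustration, not part of the original fragment.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the recurrence: dp[i] is the best total when index i is picked
// and is the rightmost picked element.
long maxStrength(const std::vector<long>& strength,
                 const std::vector<int>& l,
                 const std::vector<int>& r) {
    assert(l.size() == strength.size() && r.size() == strength.size());
    const int n = static_cast<int>(strength.size());
    if (n == 0) return 0;
    std::vector<long> dp(n, 0);
    for (int i = 0; i < n; ++i) {
        dp[i] = strength[i];  // pick i alone
        for (int j = 0; j < i; ++j) {
            // j is blocked if it lies within i's left range,
            // or if i lies within j's right range.
            const bool blocked = (j >= i - l[i]) || (i <= j + r[j]);
            if (!blocked) {
                dp[i] = std::max(dp[i], strength[i] + dp[j]);
            }
        }
    }
    return *std::max_element(dp.begin(), dp.end());
}
```

For the example input from the question (strengths 1 3 7 3 7, left values 0 0 2 2 2, right values 3 0 1 0 0), this returns 10, from picking the second and fifth elements.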

Given an integer N, print numbers from 1 to N in lexicographic order

I'm trying to print the numbers from 1 to N in lexicographic order, but my output is wrong. For the input 100, I do get 100, but it is shifted and doesn't match the expected output. There is a bug in my code, but I cannot trace it.
class Solution {
public:
vector<int> lexicalOrder(int n) {
vector<int> result;
for(int i = 1; i <= 9; i ++){
int j = 1;
while( j <= n){
for(int m = 0; m < j ; ++ m){
if(m + j * i <= n){
result.push_back(m+j*i);
}
}
j *= 10;
}
}
return result;
}
};
Input:
100
Output:
[1,10,11,12,13,14,15,16,17,18,19,100,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,45,46,47,48,49,5,50,51,52,53,54,55,56,57,58,59,6,60,61,62,63,64,65,66,67,68,69,7,70,71,72,73,74,75,76,77,78,79,8,80,81,82,83,84,85,86,87,88,89,9,90,91,92,93,94,95,96,97,98,99]
Expected:
[1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,45,46,47
Think about what happens when i=1 and j=10 in
for(int m = 0; m < j ; ++ m){
if(m + j * i <= n){
result.push_back(m+j*i);
}
}
Yes, result will push_back 10 (0+10*1), 11 (1+10*1), 12 (2+10*1), and so on, all before 100 is reached.
Here is a solution:
#include <algorithm>
#include <iostream>
#include <vector>
#include <string>
std::vector<int> fun(int n)
{
std::vector<std::string> result;
for (int i = 1; i <= n; ++i) {
result.push_back(std::to_string(i));
}
std::sort(result.begin(),result.end());
std::vector<int> ret;
for (auto i : result) {
ret.push_back(std::stoi(i));
}
return ret;
}
int main(int argc, char *argv[])
{
std::vector<int> result = fun(100);
for (auto i : result) {
std::cout << i << ",";
}
std::cout << std::endl;
return 0;
}
You are looping through all 2 digit numbers starting with 1 before outputting the first 3 digit number, so your approach won't work.
One way to do this is to output the digits in base 11, padded out with leading spaces to the maximum number of digits, in this case 3. Output 0 as a space, 1 as 0, 2 as 1 etc. Reject any numbers that have any non-trailing spaces in this representation, or are greater than n when interpreted as a base 10 number. It should be possible to jump past multiple rejects at once, but that's an unnecessary optimization. Keep a count of the numbers you have output and stop when it reaches n. This will give you a lexicographical ordering in base 10.
Example implementation that uses O(1) space, where you don't have to generate and sort all the numbers up front before you can output the first one:
void oneToNLexicographical(int n)
{
if(n < 1) return;
// count max digits
int digits = 1, m = n, max_digit11 = 1, max_digit10 = 1;
while(m >= 10)
{
m /= 10; digits++; max_digit11 *= 11; max_digit10 *= 10;
}
int count = 0;
bool found_n = false;
// count up starting from max_digit * 2 (first valid value with no leading spaces)
for(int i = max_digit11 * 2; ; i++)
{
int val = 0, trailing_spaces = 0;
int place_val11 = max_digit11, place_val10 = max_digit10;
// bool valid_spaces = true;
for(int d = 0; d < digits; d++)
{
int base11digit = (i / place_val11) % 11;
if(base11digit == 0)
{
trailing_spaces++;
val /= 10;
}
else
{
// if we got a non-space after a space, it's invalid
// if(trailing_spaces > 0)
// {
// valid_spaces = false;
// break; // trailing spaces only
// }
val += (base11digit - 1) * place_val10;
}
place_val11 /= 11;
place_val10 /= 10;
}
// if(valid_spaces && (val <= n))
{
cout << val << ", ";
count++;
}
if(val == n)
{
found_n = true;
i += 10 - (i % 11); // skip to next number with one trailing space
}
// skip past invalid numbers:
// if there are multiple trailing spaces then the next run of numbers will have spaces in the middle - invalid
if(trailing_spaces > 1)
i += (int)pow(11, trailing_spaces - 1) - 1;
// if we have already output the max number, then all remaining numbers
// with the max number of digits will be greater than n
else if(found_n && (trailing_spaces == 1))
i += 10;
if(count == n)
break;
}
}
This skips past all invalid numbers, so it's not necessary to test valid_spaces before outputting each.
The inner loop can be removed by doing the base11 -> base 10 conversion using differences, making the algorithm O(N) - the inner while loop tends towards a constant:
int val = max_digit10;
for(int i = max_digit11 * 2; ; i++)
{
int trailing_spaces = 0, pow11 = 1, pow10 = 1;
int j = i;
while((j % 11) == 0)
{
trailing_spaces++;
pow11 *= 11;
pow10 *= 10;
j /= 11;
}
int output_val = val / pow10;
if(output_val <= n)
{
cout << output_val << ", ";
count++;
}
if(output_val == n)
found_n = true;
if(trailing_spaces > 1)
{
i += (pow11 / 11) - 1;
}
else if(found_n && (trailing_spaces == 1))
{
i += 10;
val += 10;
}
else if(trailing_spaces == 0)
val++;
if(count == n)
break;
}
Demonstration
The alternative, simpler approach is just to generate N strings from the numbers and sort them.
Maybe more general solution?
#include <vector>
#include <algorithm>
using namespace std;
// returns true if i1 < i2 according to lexical order
bool lexicalLess(int i1, int i2)
{
int base1 = 1;
int base2 = 1;
for (int c = i1/10; c > 0; c/=10) base1 *= 10;
for (int c = i2/10; c > 0; c/=10) base2 *= 10;
while (base1 > 0 && base2 > 0) {
int d1 = i1 / base1;
int d2 = i2 / base2;
if (d1 != d2) return (d1 < d2);
i1 %= base1;
i2 %= base2;
base1 /= 10;
base2 /= 10;
}
return (base1 < base2);
}
vector<int> lexicalOrder(int n) {
vector<int> result;
for (int i = 1; i <= n; ++i) result.push_back(i);
sort(result.begin(), result.end(), lexicalLess);
return result;
}
Another option for lexicalLess(...) is to convert the integers to strings before comparison:
#include <vector>
#include <algorithm>
#include <string>
#include <boost/lexical_cast.hpp>
using namespace std;
// returns true if i1 < i2 according to lexical order
bool lexicalLess(int i1, int i2)
{
string s1 = boost::lexical_cast<string>(i1);
string s2 = boost::lexical_cast<string>(i2);
return (s1 < s2);
}
You need Boost to run the second version.
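If Boost is not available, the same comparison can probably be written with std::to_string from C++11. This is only a sketch: the names lexicalLessStd and lexicalOrderStd are made up here to avoid clashing with the versions above, and the inputs are assumed to be non-negative, as in the 1..n problem.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if i1 sorts before i2 when both are written in decimal.
// Hypothetical name; assumes non-negative inputs.
bool lexicalLessStd(int i1, int i2)
{
    return std::to_string(i1) < std::to_string(i2);
}

// Builds 1..n and sorts it with the comparator above.
std::vector<int> lexicalOrderStd(int n)
{
    assert(n >= 0); // the problem only deals with the range 1..n
    std::vector<int> result;
    for (int i = 1; i <= n; ++i) result.push_back(i);
    std::sort(result.begin(), result.end(), lexicalLessStd);
    return result;
}
```

Calling lexicalOrderStd(13) should give 1, 10, 11, 12, 13, 2, 3, ..., 9. Each comparison builds two temporary strings, so this trades speed for brevity compared with the digit-by-digit version.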
An easy approach to implement is to convert the numbers to strings, sort the strings with std::sort from the <algorithm> header (which orders strings lexicographically), and then convert the strings back to integers:
1. Make a vector of the integers you want to sort lexicographically; name it numbers.
2. Make another vector and populate it with the string form of each number in the first vector; name it strs.
3. Sort the strs vector.
4. Convert the strings in strs back to integers and put them in a new vector.
#include <cstdlib>
#include <string>
#include <algorithm>
#include <vector>
#include <iostream>
using namespace std;
string int_to_string(int x){
string ret;
while(x > 0){
ret.push_back('0' + x % 10);
x /= 10;
}
reverse(ret.begin(), ret.end());
return ret;
}
int main(){
vector<int> ints;
ints.push_back(1);
ints.push_back(2);
ints.push_back(100);
vector<string> strs;
for(int i = 0; i < ints.size(); i++){
strs.push_back(int_to_string((ints[i])));
}
sort(strs.begin(), strs.end());
vector<int> sorted_ints;
for(int i = 0; i < strs.size(); i++){
sorted_ints.push_back(atoi(strs[i].c_str()));
}
for(int i = 0; i < sorted_ints.size(); i++){
cout<<sorted_ints[i]<<endl;
}
}
As the numbers are unique from 1 to n, you can use a set of size n and insert all of them into it and then print them out.
set will automatically keep them sorted in lexicographical order if you store the numbers as a string.
Here is the code, short and simple:
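#include <iostream>
#include <set>
#include <string>
using namespace std;
// Prints 1..n in lexicographical order; set<string> keeps its elements sorted.
void lexicographicalOrder(int n){
    set<string> ans;
    for(int i = 1; i <= n; i++)
        ans.insert(to_string(i));
    for(const auto& ele : ans)
        cout << ele << "\n";
}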

Significant C++ code execution slowdown

I have to solve a USACO problem that involves computing all the primes <= 100M and printing those that are palindromes, under restrictions of 16 MB of memory and 1 second of execution time, so I have had to make a lot of optimisations.
Please take a look at the following block of code:
for(int i = 0; i < all.size(); ++i)
{
if(all[i] < a) continue;
else if(all[i] > b) break;
if(isPrime(all[i]))
{
char buffer[50];
//toString(all[i], buffer);
int c = all[i];
log10(2);
buffer[3] = 2;
//buffer[(int)log10(all[i])+1] = '\n';
//buffer[(int)log10(all[i])+2] = '\0';
//fputs(buffer, pFile);
}
}
As written, it executes in a satisfactory 0.5 seconds or so, but when I change log10(2) to log10(all[i]) the time jumps to nearly 2 seconds, for no apparent reason. Assigning all[i] to the variable c doesn't slow down the execution at all, but passing all[i] as the argument makes the code 4 times slower. Any ideas why this is happening and how I can fix it?
Whole code:
/*
ID: xxxxxxxx
PROG: pprime
LANG: C++11
*/
#include <fstream>
#include <iostream>
#include <vector>
#include <queue>
#include <stack>
#include <set>
#include <string>
#include <cstring>
#include <algorithm>
#include <list>
#include <ctime>
#include <cstdio>
using namespace std;
typedef struct number Number;
ifstream fin("pprime.in");
ofstream fout("pprime.out");
int MAXN = 100000000;
unsigned short bits[2000000] = {};
vector<int> primes;
vector<int> all;
int a, b;
short getBit(int atPos)
{
int whichNumber = (atPos-1) / 16;
int atWhichPosInTheNumber = (atPos-1) % 16;
return ((bits[whichNumber] & (1 << atWhichPosInTheNumber)) >> atWhichPosInTheNumber);
}
void setBit(int atPos)
{
int whichNumber = (atPos-1) / 16;
int atWhichPosInTheNumber = (atPos-1) % 16;
int old = bits[whichNumber];
bits[whichNumber] = bits[whichNumber] | (1 << atWhichPosInTheNumber);
}
void calcSieve()
{
for(int i = 2; i < MAXN; ++i)
{
if(getBit(i) == 0)
{
for(int j = 2*i; j <= (MAXN); j += i)
{
setBit(j);
}
primes.push_back(i);
}
}
}
int toInt(list<short> integer)
{
int number = 0;
while(!integer.empty())
{
int current = integer.front();
integer.pop_front();
number = number * 10 + current;
}
return number;
}
void toString(int number, char buffer[])
{
int i = 0;
while(number != 0)
{
buffer[i] = number % 10 + '0';
number /= 10;
}
}
void DFS(list<short> integer, int N, int atLeast)
{
if(integer.size() > N)
{
return;
}
if(!(integer.size() > 0 && (integer.front() == 0 || integer.back() % 2 == 0)) && atLeast <= integer.size())
{
int toI = toInt(integer);
if(toI <= b) all.push_back(toInt(integer));
}
for(short i = 0; i <= 9; ++i)
{
integer.push_back(i);
integer.push_front(i);
DFS(integer, N, atLeast);
integer.pop_back();
integer.pop_front();
}
}
bool isPrime(int number)
{
for(int i = 0; i < primes.size() && number > primes[i]; ++i)
{
if(number % primes[i] == 0) return false;
}
return true;
}
int main()
{
int t = clock();
ios::sync_with_stdio(false);
fin >> a >> b;
MAXN = min(MAXN, b);
int N = (int)log10(b) + 1;
int atLeast = (int)log10(a) + 1;
for(short i = 0; i <= 9; ++i)
{
list<short> current;
current.push_back(i);
DFS(current, N, atLeast);
}
list<short> empty;
DFS(empty, N, atLeast);
sort(all.begin(), all.end());
//calcSieve
calcSieve();
//
string output = "";
int ends = clock() - t;
cout<<"Exexution time: "<<((float)ends)/CLOCKS_PER_SEC<<" seconds";
cout<<"\nsize: "<<all.size()<<endl;
FILE* pFile;
pFile = fopen("pprime.out", "w");
for(int i = 0; i < all.size(); ++i)
{
if(all[i] < a) continue;
else if(all[i] > b) break;
if(isPrime(all[i]))
{
char buffer[50];
//toString(all[i], buffer);
int c = all[i];
log10(c);
buffer[3] = 2;
//buffer[(int)log10(all[i])+1] = '\n';
//buffer[(int)log10(all[i])+2] = '\0';
//fputs(buffer, pFile);
}
}
ends = clock() - t;
cout<<"\nExexution time: "<<((float)ends)/CLOCKS_PER_SEC<<" seconds";
ends = clock() - t;
cout<<"\nExexution time: "<<((float)ends)/CLOCKS_PER_SEC<<" seconds";
fclose(pFile);
//fout<<output;
return 0;
}
I think you've done this backwards. It seems odd to generate all the possible palindromes (if that's what DFS actually does... that function confuses me) and then check which of them are prime. Especially since you have to generate the primes anyway.
The other thing is that you are doing a linear search in isPrime, which is not taking advantage of the fact that the array is sorted. Use a binary search instead.
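A minimal sketch of that binary search, assuming primes is the ascending vector filled by calcSieve and that it holds every prime up to the largest number tested. The name isPrimeBinary and the extra parameter are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumes sortedPrimes is ascending and complete up to the largest number
// that will be queried. Hypothetical helper, not part of the original code.
bool isPrimeBinary(const std::vector<int>& sortedPrimes, int number)
{
    // Cheap sanity check: a complete table of primes starts at 2.
    assert(sortedPrimes.empty() || sortedPrimes.front() == 2);
    return std::binary_search(sortedPrimes.begin(), sortedPrimes.end(), number);
}
```

This turns each primality test into a lookup of roughly log2(primes.size()) steps instead of a scan over every smaller prime.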
And also, using list instead of vector for your DFS function will hurt your runtime. Try using a deque.
Now, all that said, I think that you should do this the other way around. There are a huge number of palindromes that won't be prime. What's the point in generating them? A simple stack is all you need to check if a number is a palindrome. Like this:
bool IsPalindrome( unsigned int val )
{
int digits[10];
int multiplier = 1;
int *d = digits;
// Add half of number's digits to a stack
// (<= rather than <, so that a value such as 101 also pushes its middle digit)
while( multiplier <= val ) {
*d++ = val % 10;
val /= 10;
multiplier *= 10;
}
// Adjust for odd-length palindrome
if( val * 10 < multiplier ) --d;
// Check remaining digits
while( val != 0 ) {
if(*(--d) != val % 10) return false;
val /= 10;
}
return true;
}
This avoids the need to call log10 at all, as well as eliminates all that palindrome generation. The sieve is pretty fast, and after that you only have the primes themselves to test (about 5.76 million below 100 million), most of which will not be palindromes.
Now your whole program becomes something like this:
calcSieve();
for( vector<int>::iterator it = primes.begin(); it != primes.end(); it++ ) {
if( IsPalindrome(*it) ) cout << *it << "\n";
}
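Put together, the whole approach might look like the sketch below. It makes some assumptions that are not in the code above: it uses std::vector<bool> for the sieve instead of the hand-rolled bits array, it checks each prime as soon as the sieve reaches it rather than storing all primes first (about 5.76 million ints would not fit in 16 MB), and it swaps in a simpler digit-reversal palindrome test. The names isPalindromeSimple and palindromicPrimes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reverses the decimal digits and compares with the original value.
// A simpler stand-in for the stack-based IsPalindrome, not the same code.
bool isPalindromeSimple(unsigned int val)
{
    unsigned long long reversed = 0;
    for (unsigned int rest = val; rest != 0; rest /= 10)
        reversed = reversed * 10 + rest % 10;
    return reversed == val;
}

// Returns the palindromic primes in [a, b], in ascending order.
std::vector<int> palindromicPrimes(int a, int b)
{
    assert(0 <= a && a <= b);
    // composite[i] == true means i has been crossed out by the sieve.
    std::vector<bool> composite(static_cast<std::size_t>(b) + 1, false);
    std::vector<int> found;
    for (long long i = 2; i <= b; ++i) {
        if (composite[i]) continue;            // i is not prime
        for (long long j = i * i; j <= b; j += i)
            composite[j] = true;               // cross out multiples of i
        if (i >= a && isPalindromeSimple(static_cast<unsigned int>(i)))
            found.push_back(static_cast<int>(i));
    }
    return found;
}
```

For the range 5 to 500 this should return 5, 7, 11, 101, 131, 151, 181, 191, 313, 353, 373 and 383. Only the matching primes are kept, so the memory use is dominated by the sieve itself.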
One other thing to point out. Two things actually:
int MAXN = 100000000;
unsigned short bits[2000000] = {};
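bits is too short to represent 100 million flags: 2,000,000 16-bit values hold only 32 million bits.
bits needs to start out zeroed. As a global with = {} it already does, so keep the initialiser when resizing it.
To address the size, try the following. MAXN has to be a compile-time constant for the array bound to compile, so the assignment to MAXN in main would need a separate runtime variable:
const int MAXN = 100000000;
unsigned short bits[1 + MAXN / 16] = { 0 };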
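The sizes involved can be checked at compile time. The following is a small sketch with made-up constant names, assuming unsigned short is 16 bits (2 bytes) as the code above does, and taking the 16 MB limit from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; the limits come from the problem as quoted in the question.
constexpr std::size_t kMaxN = 100000000;                  // largest number to flag
constexpr std::size_t kBitsPerWord = 16;                  // assumed width of unsigned short
constexpr std::size_t kWords = 1 + kMaxN / kBitsPerWord;  // elements needed in bits
constexpr std::size_t kBytes = kWords * 2;                // 2 bytes per element

static_assert(2000000 * kBitsPerWord < kMaxN,
              "the original 2,000,000 elements hold too few bits");
static_assert(kWords * kBitsPerWord >= kMaxN,
              "the resized array covers every flag");
static_assert(kBytes <= 16u * 1024 * 1024,
              "the resized array still fits in 16 MB");
```

The resized array comes to about 12.5 MB, which leaves only a few megabytes for everything else, including the primes vector.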