Optimizing algorithm for constructing a string given costs to operations - c++

I'm doing an exercise (not homework) and decided to go with backtracking. The problem reads as follows:
You are given as input a target string. Starting with an empty string,
you add characters to it until your new string is the same as the target.
You have two options to add characters to a string:
You can append an arbitrary character to your new string, with cost x.
You can clone any substring of your new string so far, and append it to the end of your new string, with cost y.
For a given target, append cost x, and clone cost y, we want to know
what the cheapest cost is of building the target string.
And some examples:
Target "aa", append cost 1, clone cost 2: the cheapest cost is 2:
Start with an empty string, ""
Append 'a' (cost 1), giving the string "a"
Append 'a' (cost 1), giving the string "aa"
Target "aaaa", append cost 2, clone cost 3: the cheapest cost is 7:
Start with an empty string, ""
Append 'a' (cost 2), giving the string "a"
Append 'a' (cost 2), giving the string "aa"
Clone "aa" (cost 3), giving the string "aaaa"
Target "xzxpzxzxpq", append cost 10, clone cost 11: the cheapest cost is 71:
Start with an empty string, ""
Append 'x' (cost 10): "x"
Append 'z' (cost 10): "xz"
Append 'x' (cost 10): "xzx"
Append 'p' (cost 10): "xzxp"
Append 'z' (cost 10): "xzxpz"
Clone "xzxp" (cost 11): "xzxpzxzxp"
Append 'q' (cost 10) : "xzxpzxzxpq"
So far so good. I first tried to do it with backtracking, but then the following test case came:
string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblg
mbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblg
blgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
string doubleIt = bigString + bigString;
Now that's big.
Given costs of 1234 and 1235 to append and clone respectively, the total cost of building it is 59249.
So backtracking is out for this one, because it causes a stack overflow.
I tried a more efficient approach:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int isWorthClone(const int size, const std::string& target) {
int worth = 0;
for (int j = size; j < target.size() and worth < size; j++) {
if (target[j] == target[worth]) {
worth++;
}
else break;
}
return worth;
}
int buildSolution(const std::string& target, int cpyCst, int apndCst) {
int index = 0;
int cost = 0;
while (int(target.size()) != (index)) {
int hasta = isWorthClone(index, target);
if (cpyCst < hasta * apndCst) {
cost += cpyCst;
index += hasta ;
}
else {
cost += apndCst;
index++;
}
}
return cost;
}
int main() {
std::string bigString = "abcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodb
lgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjoirmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcn
tdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipiblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifpblgmbtmblgmbaipfdmbcntdblgblgmbaipmbcntdblgblgmbaipcbcntdblgblgmbaipobacjodblgblgmbaiabcblgmbcntdblgblgmbaipmbcntdblgblgmbaipfdmbcqaipfdmbcntdblgblgmbaipmbcntdblgblgmbaiprtifbcntdblgblgmbaipmbcntdblgblgmbaip";
std::string doubleIt = bigString + bigString;
std::string target = bigString;
int copyCost = 1235;
int appendCost = 1234;
std::cout << buildSolution(target, copyCost, appendCost) << std::endl;
}
but the output is 3588498, and from the test case, the correct output should be 59249.
I can't work out why this approach gives me that result. I tried debugging it, and it seems that isWorthClone does not find the right position to clone in some cases. It is also a little strange, because it works for the other cases, but since this case is somewhat "clone expensive" I think the error propagates.
Any clues as to why this is happening? This is O(n^2), so I think this should be the optimal solution.
Edit:
My code now looks like the following, trying to follow the dp approach:
int canCopy(const int i, const string& target, int posCopied) {
int iStartArray = 0;
bool canCopy = true;
int aux = i;
while (canCopy) {
if (aux - 1 + posCopied > target.size() or target[iStartArray] != target[aux - 1]) {
canCopy = false;
}
else {
posCopied += 1;
iStartArray++;
aux++;
}
}
return posCopied;
}
int stringConstruction(string target, int copyCost, int appendCost) {
vector<int> dp(target.size() + 1, std::numeric_limits<int>::max());
dp[1] = appendCost;
for (int i = 2; i < dp.size(); i++) {
dp[i] = std::min(dp[i], dp[i - 1] + appendCost);
int posCopied = canCopy(i, target, 0);
if (posCopied != 0 and (posCopied + i) < dp.size()) {
dp[posCopied + i] = dp[i] + copyCost;
}
}
return dp[dp.size()-1];
}
This still doesn't work for the test case presented here.
Edit2:
Finally, I implemented the solution provided by @David Eisenstat (thanks!), with a really naive approach:
int best_clone(const string& s) {
int j = s.size() - 1;
while (s.substr(0, j).find(s.substr(j, s.size() - j)) != std::string::npos) {
j--;
}
return j + 1;
}
int stringConstruction(string target, int copyCost, int appendCost) {
vector<int> v = vector<int> (1, 0);
for (int i = 0; i < target.size(); i++) {
int cost = v[i] + appendCost;
int j = best_clone(target.substr(0, i+1));
if (j <= i) {
cost = std::min(cost, v[j] + copyCost);
}
v.push_back(cost);
}
return v[v.size() - 1];
}
It seems I misunderstood the problem. This gives the right answer for the test cases, but it takes too long. best_clone needs to be optimized.
Edit 3:
(Hope this is the last one)
I added the following class SA for storing the suffix array:
#pragma once
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <chrono>
using namespace std;
typedef struct {
int index;
string s;
} suffix;
struct comp
{
inline bool operator() (const suffix& s1, const suffix& s2)
{
return (s1.s < s2.s);
}
};
class SA
{
private:
vector<suffix> values;
public:
SA(const string& s) : values(s.size()) {
string aux = s;
for (int i = 0; i < s.length(); i++) {
values[i].index = i;
values[i].s = s.substr(i, s.size() - i);;
}
sort(values.begin(), values.end(), comp());
}
friend ostream& operator<<(ostream& os, const SA& dt)
{
for (int i = 0; i < dt.values.size(); i++) {
os << dt.values[i].index << ": " << dt.values[i].s << "\n";
}
return os;
}
int search(const string& subst, int i, int j) {
while (j >= i) {
int mid = (i + j) / 2;
if (this->values[mid].s > subst) {
j = mid-1;
}
else if (this->values[mid].s < subst) {
i = mid+1;
}
else return mid;
}
return -1;
}
};
But now I don't know how to search this array for the best clone. (I know this is slow, n*2log(n) I would say, but I think it is going to be good enough for this one.) So now I need to put these parts together.

The problem is that you're making the decision to clone greedily. Let's look at a case where the append cost is 2 and the clone cost is 3. If you process the string aabaaaba, you'll append aab, clone aa, and clone aba, whereas the best solution is to append aaba and clone it.
The fix is dynamic programming, specifically, to build an array of the cost to make each prefix of the target string. To fill each entry, take the min of (append cost plus previous entry, clone cost plus cost for the shortest prefix that can be completed with one clone). Since the clone cost is constant, the array is nondecreasing, and therefore we don't need to check all of the possible prefixes.
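As a rough check of the recurrence just described, here is a minimal and deliberately slow sketch. The function name cheapestBuild is hypothetical, and the substring search is the naive one. It relies on the observation above that the cost array is nondecreasing, so only the shortest prefix that can be completed with one clone is tried.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post).
// cost[k] is the cheapest cost to build the first k characters of s.
long long cheapestBuild(const std::string& s, long long appendCost,
                        long long cloneCost) {
    const std::size_t n = s.size();
    std::vector<long long> cost(n + 1, 0);
    for (std::size_t k = 1; k <= n; ++k) {
        // Option 1: append the k-th character to the previous prefix.
        cost[k] = cost[k - 1] + appendCost;
        // Option 2: clone s[j, k) from somewhere inside s[0, j).
        // Scan j upward so the first hit is the shortest usable prefix.
        for (std::size_t j = 1; j < k; ++j) {
            const std::string piece = s.substr(j, k - j);
            if (s.substr(0, j).find(piece) != std::string::npos) {
                cost[k] = std::min(cost[k], cost[j] + cloneCost);
                break;  // cost is nondecreasing, so larger j cannot be cheaper
            }
        }
    }
    return cost[n];
}
```

With append cost 2 and clone cost 3, cheapestBuild("aabaaaba", 2, 3) gives 11 (append aaba, then clone it), compared with 12 for the greedy sequence. It also reproduces the three small examples from the question: 2, 7 and 71.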
Depending on the constraints you may need to construct a suffix array/longest common prefix array (using e.g., SA-IS) to identify all of the best clones quickly. This will run in time o(n²) for sure (quite possibly O(n), but there are enough moving parts that I don't want to claim that).
This Python is too slow but gets the right answer on the large test case:
def best_clone(s):
j = len(s) - 1
while s[j:] in s[:j]:
j -= 1
return j + 1
def construction_cost(s, append_cost, clone_cost):
table = [0]
for i in range(len(s)):
cost = table[i] + append_cost
j = best_clone(s[: i + 1])
if j <= i:
cost = min(cost, table[j] + clone_cost)
table.append(cost)
return table[len(s)]
If the limit of your ambitions is quadratic, then we can put the Z function for string matching to good use.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
using Cost = unsigned long long;
// Adapted from https://cp-algorithms.com/string/z-function.html
std::vector<std::size_t> ZFunction(std::string_view s) {
std::size_t n = s.length();
std::vector<std::size_t> z(n);
for (std::size_t i = 1, l = 0, r = 0; i < n; i++) {
if (i <= r) {
z[i] = std::min(r - i + 1, z[i - l]);
}
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
z[i]++;
}
if (i + z[i] - 1 > r) {
l = i;
r = i + z[i] - 1;
}
}
return z;
}
std::size_t BestClone(std::string_view s) {
std::string r{s};
std::reverse(r.begin(), r.end());
auto z = ZFunction(r);
std::size_t best = 0;
for (std::size_t i = 0; i < z.size(); i++) {
best = std::max(best, std::min(z[i], i));
}
return s.length() - best;
}
Cost ConstructionCost(std::string_view s, Cost append_cost, Cost clone_cost) {
std::vector<Cost> costs = {0};
for (std::size_t j = 0; j < s.length(); j++) {
std::size_t i = BestClone(s.substr(0, j + 1));
if (i <= j) {
costs.push_back(
std::min(costs.back() + append_cost, costs[i] + clone_cost));
} else {
costs.push_back(costs.back() + append_cost);
}
}
return costs.back();
}
int main() {
std::string s;
while (std::cin >> s) {
std::cout << ConstructionCost(s, 1234, 1235) << '\n';
}
}

Related

Resolving a Bug in My Code which is not passing the test cases

There is this problem on LeetCode: https://leetcode.com/problems/largest-time-for-given-digits/
I have written code for this problem and I believe it is correct, but it is not passing all the test cases, and I am stuck trying to find where the issue is.
Can anybody please help me with this?
class Solution {
public:
bool isValid(string s){
if(s[0] > '2') return false;
if(s[0] == '2'){
if(s[1] >= '4'){
return false ;
}
}
if(s[2] >=6) return false ;
return true ;
}
vector<vector<int>> permute(vector<int> &nums)
{
vector<vector<int>> result;
//Base Case For The Problem:
if (nums.size() <= 1)
return {nums};
for (int i = 0; i < nums.size(); i++)
{
vector<int> v(nums.begin(), nums.end());
v.erase(v.begin() + i);
auto res = permute(v);
for (int j = 0; j < res.size(); j++)
{
vector<int> _v = res[j];
_v.insert(_v.begin(), nums[i]);
result.push_back(_v);
}
}
return result;
}
string largestTimeFromDigits(vector<int>& A) {
vector<vector<int>> res ;
vector<string> valid ; //For Only Storing the Valid Time Permutations
res = permute(A);
//Now , Iterating Over All the Permutations:
for(int i=0 ; i<res.size() ; i++){
string curr = "";
for(int j=0 ; j<res[i].size() ; ++j){
curr += res[i][j];
}
if(isValid(curr)) valid.push_back(curr);
}
sort(valid.begin() , valid.end());
string ans = ""; //The Final Answer that we have to return at the end.
if(valid.size() > 0){
//Now , perform the Required Operations:
string temp = valid[valid.size() - 1];
ans = temp.substr(0,2) + ":" + temp.substr(2);
}
return ans;
}
};
Two problems in your code, both related to mixing int with char. The first is here:
if(s[2] >=6 ) {
return false ;
}
Because of this condition your isValid returns false always. No character in the range '0'...'9' is smaller than the integer 6. Compare the char to a char:
if(s[2] >='6' ) {
return false ;
}
Next, here
curr += res[i][j];
res[i][j] is an integer, but you want to add a character to the string:
curr += static_cast<char>(res[i][j] + '0');
After fixing those two I get expected output at least for input {2,2,2,2}, see here: https://godbolt.org/z/35r3f9.
I have to mention that you would have found those problems yourself if you had used a debugger. Getting better at coding is not so much about making fewer mistakes as about getting better at finding and fixing them. The debugger is an essential tool for that.
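The two int/char mix-ups described in this answer can be isolated in a small sketch. The helper names digitsToString and minuteTensOk are hypothetical and only illustrate the conversions; they are not part of the asker's class.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turn digit values 0..9 into their character form.
// Adding '0' maps the integer 7 to the character '7'; appending the raw
// integer would append the control character with code 7 instead.
std::string digitsToString(const std::vector<int>& digits) {
    std::string out;
    for (int d : digits) {
        out += static_cast<char>('0' + d);
    }
    return out;
}

// Hypothetical helper: the tens digit of the minutes must be below '6'.
// The comparison is char against char, not char against the integer 6.
bool minuteTensOk(const std::string& hhmm) {
    return hhmm[2] < '6';
}
```

For the digits {2, 3, 5, 9} this yields the string "2359", which passes the minute check, while "2369" does not.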
C++
You can use std::prev_permutation and sort first:
// The following block might slightly improve the execution time;
// Can be removed;
static const auto __optimize__ = []() {
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
std::cout.tie(nullptr);
return 0;
}();
// Most of headers are already included;
// Can be removed;
#include <cstdint>
#include <string>
#include <vector>
#include <algorithm>
static const struct Solution {
static const std::string largestTimeFromDigits(std::vector<int>& A) {
std::sort(std::begin(A), std::end(A), std::greater<int>());
do if (
(A[0] < 2 || A[0] == 2 && A[1] < 4) &&
A[2] < 6
) {
return std::to_string(A[0]) + std::to_string(A[1]) + ":" + std::to_string(A[2]) + std::to_string(A[3]);
}
while (std::prev_permutation(std::begin(A), std::end(A)));
return "";
}
};
Here is LeetCode's official solution in C++:
class Solution {
public:
string largestTimeFromDigits(vector<int>& A) {
int max_time = -1;
// prepare for the generation of permutations next.
std::sort(A.begin(), A.end());
do {
int hour = A[0] * 10 + A[1];
int minute = A[2] * 10 + A[3];
if (hour < 24 && minute < 60) {
int new_time = hour * 60 + minute;
max_time = new_time > max_time ? new_time : max_time;
}
} while(next_permutation(A.begin(), A.end()));
if (max_time == -1) {
return "";
} else {
std::ostringstream strstream;
strstream << std::setw(2) << std::setfill('0') << max_time / 60
<< ":" << std::setw(2) << std::setfill('0') << max_time % 60;
return strstream.str();
}
}
};
Alternative solution with regular expression:
This'd be difficult in C++ though:
class Solution:
def largestTimeFromDigits(self, A: List[int]) -> str:
for i in range(2359, -1, -1):
if i < 1000:
i = format(i, '04')
if int(re.findall(r'\d{2}$', str(i))[0]) > 59:
continue
l = list(map(int, str(i)))
for j in A:
if j in l:
l.remove(j)
if len(l) == 0:
hm = re.findall(r'.{2}', str(i))
return f'{hm[0]}:{hm[1]}'
return ""
Alternative solution using three loops in Java:
public final class Solution {
public static final String largestTimeFromDigits(
final int[] A
) {
String res = "";
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 4; ++j) {
for (int k = 0; k < 4; ++k) {
if (i == j || i == k || j == k) {
continue;
}
String hour = "" + A[i] + A[j];
String minute = "" + A[k] + A[6 - i - j - k];
String time = hour + ":" + minute;
if (
hour.compareTo("24") < 0 &&
minute.compareTo("60") < 0 &&
res.compareTo(time) < 0
) {
res = time;
}
}
}
}
return res;
}
}
References
For additional details, please see the Discussion Board, where you can find plenty of well-explained accepted solutions in a variety of languages, including low-complexity algorithms and asymptotic runtime/memory analysis.

How to get every possible string of n characters in c++?

I know it is possible to use n nested for loops to get the result. This however isn't very flexible. If I wanted to get every string of n+2 characters I would have to write an extra two for loops.
I'm pretty sure I should use a parameter called n_Letters and use some kind of recursion. Any ideas? This is how my code looks right now. It gives all the 3 character combinations.
#include <iostream>
#include <string>
using namespace std;
void StringMaker(){
for(int firstLetter = 97; firstLetter < 123; firstLetter++){
char a = firstLetter;
for(int secondLetter = 97; secondLetter < 123; secondLetter++){
char b = secondLetter;
for(int thirdLetter = 97; thirdLetter < 123; thirdLetter++){
char c = thirdLetter;
cout << a << b << c << endl;
}
}
}
}
int main() {
StringMaker(); // I could add a parameter n_Letters here
}
This is a simple tree traversal problem that can easily be solved using recursion. Using a counter (count) and accumulator (partial) recur on your function for each letter until count is zero then print partial.
#include <iostream>
#include <string>
void StringMaker(int count, std::string partial = "") {
if (count == 0) {
std::cout << partial << '\n';
}
else {
for (char letter = 'a'; letter <= 'z'; ++letter) {
StringMaker(count - 1, partial + letter);
}
}
}
int main() {
StringMaker(3);
return 0;
}
Edit: It seems there are some concerns with my answer regarding memory allocations. If that is a concern for you, consider this alternative solution. Increment the first character if it isn't 'z'; otherwise set it to 'a' and repeat with the second character. Do this until the last character wraps from 'z' to 'a'. This acts as a sort of base 26 counter with count digits.
#include <iostream>
#include <string>
void StringMaker(size_t count)
{
std::string data(count, 'a');
size_t i = 0;
do
{
std::cout << data << '\n';
for (i = 0; i < count; ++i)
{
auto & next_char = data[i];
if (next_char < 'z') {
++next_char;
break;
}
else {
next_char = 'a';
}
}
} while (i != count);
}
int main() {
StringMaker(3);
return 0;
}
Here is my just-for-fun solution:
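#include <iostream>
#include <string>
// Helper assumed by the original snippet: integer power base^exp.
int int_pow(int base, int exp)
{
    int result = 1;
    for (int k = 0; k < exp; ++k) result *= base;
    return result;
}
void StringMaker(int n)
{
    int base = ('z' - 'a' + 1);
    std::string str(n, '\0');
    const int total = int_pow(base, n); // number of strings: 26^n
    for(int i = 0; i < total; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            // digit j of i in base 26; the least significant digit goes last
            str[n - j - 1] = 'a' + i / int_pow(base, j) % base;
        }
        std::cout << str << '\n';
    }
}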
Think of i as a number written in base 26, with the digits a to z. Incrementing i with n = 4 then gives aaaa, aaab, and so on.

How to hash very large substrings quickly without collisions?

I have an app that, as part of its work, finds all palindromic substrings of the input string. The input string can be up to 100,000 characters long, so the substrings can be very large. For example, one input to the app produced over 300,000 palindromic substrings longer than 10,000 characters. The app later groups equal palindromes by hash: each palindrome is hashed with the standard std::hash inside the function that finds the palindromes, the hashes are stored in a vector, and the app then counts how often each hash occurs. The problem with inputs and outputs of this size is that hashing the very large substrings takes too long and also produces collisions. So I was wondering whether there is a hash algorithm that can quickly and uniquely hash a very large substring (preferably by the substring's index range for speed, but still accurate enough for uniqueness). The hashing is done at the end of the function get_palins. The code is below.
#include <iostream>
#include <string>
#include <cstdlib>
#include <time.h>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <map>
#include <cstdio>
#include <cmath>
#include <ctgmath>
using namespace std;
#define MAX 100000
#define mod 1000000007
vector<long long> palins[MAX+5];
// Finds all palindromes for the string
void get_palins(string &s)
{
int N = s.length();
int i, j, k, // iterators
rp, // length of 'palindrome radius'
R[2][N+1]; // table for storing results (2 rows for odd- and even-length palindromes
s = "#" + s + "#"; // insert 'guards' to iterate easily over s
for(j = 0; j <= 1; j++)
{
R[j][0] = rp = 0; i = 1;
while(i <= N)
{
while(s[i - rp - 1] == s[i + j + rp]) { rp++; }
R[j][i] = rp;
k = 1;
while((R[j][i - k] != rp - k) && (k < rp))
{
R[j][i + k] = min(R[j][i - k],rp - k);
k++;
}
rp = max(rp - k,0);
i += k;
}
}
s = s.substr(1,N); // remove 'guards'
for(i = 1; i <= N; i++)
{
for(j = 0; j <= 1; j++)
for(rp = R[j][i]; rp > 0; rp--)
{
int begin = i - rp - 1;
int end_count = 2 * rp + j;
int end = begin + end_count - 1;
if (!(begin == 0 && end == N -1 ))
{
string ss = s.substr(begin, end_count);
long long hsh = hash<string>{}(ss);
palins[begin].push_back(hsh);
}
}
}
}
unordered_map<long long, int> palin_counts;
unordered_map<char, int> end_matches;
// Solve when at least 1 character in string is different
void solve_all_not_same(string &s)
{
int n = s.length();
long long count = 0;
get_palins(s);
long long palin_count = 0;
// Gets all palindromes into unordered map
for (int i = 0; i <= n; i++)
{
for (auto& it : palins[i])
{
if (palin_counts.find(it) == palin_counts.end())
{
palin_counts.insert({it,1});
}
else
{
palin_counts[it]++;
}
}
}
// From total palindromes, get proper border count
// minus end characters of substrings
for ( auto it = palin_counts.begin(); it != palin_counts.end(); ++it )
{
int top = it->second - 1;
palin_count += (top * (top + 1)) / 2;
palin_count %= mod;
}
// Store string character counts in unordered map
for (int i = 0; i <= n; i++)
{
char c = s[i];
//long long hsh = hash<char>{}(c);
if (end_matches[c] == 0)
end_matches[c] = 1;
else
end_matches[c]++;
}
// From substring end character matches, get proper border count
// for end characters of substrings
for ( auto it = end_matches.begin(); it != end_matches.end(); it++ )
{
int f = it->second - 1;
count += (f * (f + 1)) / 2;
}
cout << (count + palin_count) % mod << endl;
for (int i = 0; i < MAX+5; i++)
palins[i].clear();
}
int main()
{
string s;
cin >> s;
solve_all_not_same(s);
return 0;
}
Faced with problem X (find all palindrome substrings), you ask how to solve Y (hash substrings quickly): The XY Problem.
For palindrome detection, consider suffix arrays (one for the reverse of the input, or that appended to the input).
For fast hashes of overlapping strings, look into rolling hashes.
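A minimal sketch of the rolling-hash suggestion, assuming polynomial prefix hashes taken under two moduli. The class name SubstringHasher, the base 131 and both moduli are illustrative choices, not something from the original post. Equal substrings always receive equal hashes, but unequal substrings can still collide, so this lowers the collision probability rather than eliminating it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative constants (assumptions, not from the original post).
constexpr std::uint64_t kBase = 131;
constexpr std::uint64_t kMod1 = 1000000007ULL;
constexpr std::uint64_t kMod2 = 998244353ULL;

// Hypothetical helper: precomputes prefix hashes in O(n) so that the hash
// of any substring can be read off in O(1) from its index range.
class SubstringHasher {
public:
    explicit SubstringHasher(const std::string& s)
        : pre1_(s.size() + 1, 0), pre2_(s.size() + 1, 0),
          pow1_(s.size() + 1, 1), pow2_(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            // +1 keeps every character value nonzero.
            const std::uint64_t c = static_cast<unsigned char>(s[i]) + 1ULL;
            pre1_[i + 1] = (pre1_[i] * kBase + c) % kMod1;
            pre2_[i + 1] = (pre2_[i] * kBase + c) % kMod2;
            pow1_[i + 1] = pow1_[i] * kBase % kMod1;
            pow2_[i + 1] = pow2_[i] * kBase % kMod2;
        }
    }

    // Pair of hashes for s[begin, begin + len).
    // The caller must keep begin + len <= s.size().
    std::pair<std::uint64_t, std::uint64_t> hash(std::size_t begin,
                                                 std::size_t len) const {
        const std::size_t end = begin + len;
        const std::uint64_t h1 =
            (pre1_[end] + kMod1 - pre1_[begin] * pow1_[len] % kMod1) % kMod1;
        const std::uint64_t h2 =
            (pre2_[end] + kMod2 - pre2_[begin] * pow2_[len] % kMod2) % kMod2;
        return {h1, h2};
    }

private:
    std::vector<std::uint64_t> pre1_, pre2_, pow1_, pow2_;
};
```

In get_palins, a call such as hasher.hash(begin, end_count) could stand in for building the substring and hashing it, which avoids copying each palindrome. If exact uniqueness is required, candidates with equal hashes would still need a direct comparison.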

Given an integer N, print numbers from 1 to N in lexicographic order

I'm trying to print the numbers from 1 to N in lexicographic order, but my output is wrong. For the input 100, the output does contain 100, but it appears in the wrong position and so doesn't match the expected output. There is a bug in my code, but I can't track it down.
class Solution {
public:
vector<int> lexicalOrder(int n) {
vector<int> result;
for(int i = 1; i <= 9; i ++){
int j = 1;
while( j <= n){
for(int m = 0; m < j ; ++ m){
if(m + j * i <= n){
result.push_back(m+j*i);
}
}
j *= 10;
}
}
return result;
}
};
Input:
100
Output:
[1,10,11,12,13,14,15,16,17,18,19,100,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,45,46,47,48,49,5,50,51,52,53,54,55,56,57,58,59,6,60,61,62,63,64,65,66,67,68,69,7,70,71,72,73,74,75,76,77,78,79,8,80,81,82,83,84,85,86,87,88,89,9,90,91,92,93,94,95,96,97,98,99]
Expected:
[1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,45,46,47
Think about what happens when i=1, j=10 in
for(int m = 0; m < j ; ++ m){
if(m + j * i <= n){
result.push_back(m+j*i);
}
}
The result will push_back 10 (0+10*1), 11 (1+10*1), 12 (2+10*1), and so on, all before 100 is ever added.
Here is a solution:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
std::vector<int> fun(int n)
{
std::vector<std::string> result;
for (int i = 1; i <= n; ++i) {
result.push_back(std::to_string(i));
}
std::sort(result.begin(),result.end());
std::vector<int> ret;
for (auto i : result) {
ret.push_back(std::stoi(i));
}
return ret;
}
int main(int argc, char *argv[])
{
std::vector<int> result = fun(100);
for (auto i : result) {
std::cout << i << ",";
}
std::cout << std::endl;
return 0;
}
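As a possible alternative to sorting strings, the same order can be produced directly by walking the numbers as digit prefixes in pre-order (visit 1, then 10, then 100, ..., then 11, and so on). This is only a sketch of that idea, not part of the answer above; the names visitPrefix and lexicalOrderDfs are made up for illustration.

```cpp
#include <vector>

// Hypothetical helper: emits `prefix`, then every number <= n that starts
// with `prefix`, in lexicographic order. long long avoids overflow when
// prefix * 10 exceeds the int range.
void visitPrefix(long long prefix, int n, std::vector<int>& out) {
    if (prefix > n) return;
    out.push_back(static_cast<int>(prefix));
    for (int d = 0; d <= 9; ++d) {
        long long next = prefix * 10 + d;
        if (next > n) break; // larger digits only give larger numbers
        visitPrefix(next, n, out);
    }
}

// Returns 1..n in lexicographic order without building or sorting strings.
std::vector<int> lexicalOrderDfs(int n) {
    std::vector<int> out;
    for (int first = 1; first <= 9; ++first) {
        visitPrefix(first, n, out);
    }
    return out;
}
```

For n = 13 this yields 1, 10, 11, 12, 13, 2, 3, ..., 9. The recursion depth is bounded by the number of decimal digits of n.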
You are looping through all 2 digit numbers starting with 1 before outputting the first 3 digit number, so your approach won't work.
One way to do this is to output the digits in base 11, padded out with leading spaces to the maximum number of digits, in this case 3. Output 0 as a space, 1 as 0, 2 as 1 etc. Reject any numbers that have any non-trailing spaces in this representation, or are greater than n when interpreted as a base 10 number. It should be possible to jump past multiple rejects at once, but that's an unnecessary optimization. Keep a count of the numbers you have output and stop when it reaches n. This will give you a lexicographical ordering in base 10.
Example implementation that uses O(1) space, where you don't have to generate and sort all the numbers up front before you can output the first one:
void oneToNLexicographical(int n)
{
if(n < 1) return;
// count max digits
int digits = 1, m = n, max_digit11 = 1, max_digit10 = 1;
while(m >= 10)
{
m /= 10; digits++; max_digit11 *= 11; max_digit10 *= 10;
}
int count = 0;
bool found_n = false;
// count up starting from max_digit * 2 (first valid value with no leading spaces)
for(int i = max_digit11 * 2; ; i++)
{
int val = 0, trailing_spaces = 0;
int place_val11 = max_digit11, place_val10 = max_digit10;
// bool valid_spaces = true;
for(int d = 0; d < digits; d++)
{
int base11digit = (i / place_val11) % 11;
if(base11digit == 0)
{
trailing_spaces++;
val /= 10;
}
else
{
// if we got a non-space after a space, it's invalid
// if(trailing_spaces > 0)
// {
// valid_spaces = false;
// break; // trailing spaces only
// }
val += (base11digit - 1) * place_val10;
}
place_val11 /= 11;
place_val10 /= 10;
}
// if(valid_spaces && (val <= n))
{
cout << val << ", ";
count++;
}
if(val == n)
{
found_n = true;
i += 10 - (i % 11); // skip to next number with one trailing space
}
// skip past invalid numbers:
// if there are multiple trailing spaces then the next run of numbers will have spaces in the middle - invalid
if(trailing_spaces > 1)
i += (int)pow(11, trailing_spaces - 1) - 1;
// if we have already output the max number, then all remaining numbers
// with the max number of digits will be greater than n
else if(found_n && (trailing_spaces == 1))
i += 10;
if(count == n)
break;
}
}
This skips past all invalid numbers, so it's not necessary to test valid_spaces before outputting each.
The inner loop can be removed by doing the base11 -> base 10 conversion using differences, making the algorithm O(N) - the inner while loop tends towards a constant:
int val = max_digit10;
for(int i = max_digit11 * 2; ; i++)
{
int trailing_spaces = 0, pow11 = 1, pow10 = 1;
int j = i;
while((j % 11) == 0)
{
trailing_spaces++;
pow11 *= 11;
pow10 *= 10;
j /= 11;
}
int output_val = val / pow10;
if(output_val <= n)
{
cout << output_val << ", ";
count++;
}
if(output_val == n)
found_n = true;
if(trailing_spaces > 1)
{
i += (pow11 / 11) - 1;
}
else if(found_n && (trailing_spaces == 1))
{
i += 10;
val += 10;
}
else if(trailing_spaces == 0)
val++;
if(count == n)
break;
}
The alternative, simpler approach is just to generate N strings from the numbers and sort them.
Maybe a more general solution?
#include <vector>
#include <algorithm>
using namespace std;
// returns true if i1 < i2 according to lexical order
bool lexicalLess(int i1, int i2)
{
int base1 = 1;
int base2 = 1;
for (int c = i1/10; c > 0; c/=10) base1 *= 10;
for (int c = i2/10; c > 0; c/=10) base2 *= 10;
while (base1 > 0 && base2 > 0) {
int d1 = i1 / base1;
int d2 = i2 / base2;
if (d1 != d2) return (d1 < d2);
i1 %= base1;
i2 %= base2;
base1 /= 10;
base2 /= 10;
}
return (base1 < base2);
}
vector<int> lexicalOrder(int n) {
vector<int> result;
for (int i = 1; i <= n; ++i) result.push_back(i);
sort(result.begin(), result.end(), lexicalLess);
return result;
}
The other idea for lexicalLess(...) is to convert the integers to strings before comparison:
#include <vector>
#include <algorithm>
#include <string>
#include <boost/lexical_cast.hpp>
using namespace std;
// returns true if i1 < i2 according to lexical order
bool lexicalLess(int i1, int i2)
{
string s1 = boost::lexical_cast<string>(i1);
string s2 = boost::lexical_cast<string>(i2);
return (s1 < s2);
}
You need Boost to run the second version.
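If Boost is not available, the same string-based comparison can probably be written with std::to_string from C++11. The sketch below assumes a C++11 compiler; the names lexicalLessStd and lexicalOrderStd are chosen here only to avoid clashing with the functions above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if i1 sorts before i2 when both are compared as decimal strings.
bool lexicalLessStd(int i1, int i2) {
    return std::to_string(i1) < std::to_string(i2);
}

// Builds 1..n and sorts it with the string-based comparator.
std::vector<int> lexicalOrderStd(int n) {
    std::vector<int> result;
    for (int i = 1; i <= n; ++i) result.push_back(i);
    std::sort(result.begin(), result.end(), lexicalLessStd);
    return result;
}
```

Each comparison converts both numbers again, so this trades some speed for brevity.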
An easy approach to implement is to convert the numbers to strings, sort the array of strings with std::sort from the <algorithm> header (which sorts strings in lexicographical order), and then convert the strings back to integers:
1. Make a vector of the integers you want to sort lexicographically; name it numbers.
2. Make another vector and populate it with the string forms of the numbers in the first vector; name it strs.
3. Sort the strs vector.
4. Convert the strings in strs back to integers and put them in a vector.
#include <cstdlib>
#include <string>
#include <algorithm>
#include <vector>
#include <iostream>
using namespace std;
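// Converts a non-negative integer to its decimal string form.
string int_to_string(int x){
if(x == 0) return "0"; // the loop below would produce an empty string for 0
string ret;
while(x > 0){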
ret.push_back('0' + x % 10);
x /= 10;
}
reverse(ret.begin(), ret.end());
return ret;
}
int main(){
vector<int> ints;
ints.push_back(1);
ints.push_back(2);
ints.push_back(100);
vector<string> strs;
for(int i = 0; i < ints.size(); i++){
strs.push_back(int_to_string((ints[i])));
}
sort(strs.begin(), strs.end());
vector<int> sorted_ints;
for(int i = 0; i < strs.size(); i++){
sorted_ints.push_back(atoi(strs[i].c_str()));
}
for(int i = 0; i < sorted_ints.size(); i++){
cout<<sorted_ints[i]<<endl;
}
}
As the numbers are unique from 1 to n, you can use a set of size n and insert all of them into it and then print them out.
A std::set keeps its elements sorted, so if you store the numbers as strings they come out in lexicographical order.
Here is the code, short and simple:
#include <iostream>
#include <set>
#include <string>
using namespace std;
void lexicographicalOrder(int n){
set<string> ans;
for(int i = 1; i <= n; i++)
ans.insert(to_string(i));
for(auto ele : ans)
cout <<ele <<"\n";
}

How to apply longest common subsequence algorithm on large strings?

How can I apply the longest common subsequence algorithm to bigger strings (600000 characters)? Is there any way to do it with DP? I have done this for shorter strings:
#include <iostream>
#include <algorithm>
#include <cstring>
#include <cstdio>
using namespace std;
int dp[1005][1005];
char a[1005], b[1005];
int lcs(int x,int y)
{
if(x==strlen(a)||y==strlen(b))
return 0;
if(dp[x][y]!=-1)
return dp[x][y];
else if(a[x]==b[y])
dp[x][y]=1+lcs(x+1,y+1);
else
dp[x][y]=max(lcs(x+1,y),lcs(x,y+1));
return dp[x][y];
}
int main()
{
while(gets(a)&&gets(b))
{
memset(dp,-1,sizeof(dp));
int ret=lcs(0,0);
printf("%d\n",ret);
}
}
You should take a look at this article, which discusses the various design and implementation considerations. It points out that you can use Hirschberg's algorithm, which finds optimal alignments between two strings under edit distance (or Levenshtein distance). It can reduce the amount of space you need.
At the bottom you will find the "space-efficient LCS", given as a kind of mixed C/pseudocode, where m is the length of A and n is the length of B:
int lcs_length(char *A, char *B) {
// Allocate storage for one-dimensional arrays X and Y.
for (int i = m; i >= 0; i--) {
for (int j = n; j >= 0; j--) {
if (A[i] == '\0' || B[j] == '\0') {
X[j] = 0;
}
else if (A[i] == B[j]) {
X[j] = 1 + Y[j+1];
}
else {
X[j] = max(Y[j], X[j+1]);
}
}
// Copy contents of X into Y. Note that the "=" operator here
// might not do what you expect. If Y and X are pointers then
// it will assign the address and not copy the contents, so in
// that case you'd do a memcpy. But they could be a custom
// data type with an overridden "=" operator.
Y = X;
}
return X[0];
}
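The pseudocode above leaves the storage for X and Y and the copy step unspecified. One way it might be made runnable is sketched below, assuming std::string inputs and std::vector rows; the name lcsLengthTwoRows is made up for this example. Note that the per-row copy (Y = X) is replaced here by std::swap, which exchanges the two buffers in constant time.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common subsequence of A and B, using two rows of
// n + 1 ints instead of the full (m + 1) x (n + 1) table.
int lcsLengthTwoRows(const std::string& A, const std::string& B) {
    const std::size_t m = A.size(), n = B.size();
    std::vector<int> X(n + 1, 0); // row currently being filled (row i)
    std::vector<int> Y(n + 1, 0); // previously filled row (row i + 1)
    for (std::size_t i = m; i-- > 0; ) {
        X[n] = 0; // an empty suffix of B has LCS length 0
        for (std::size_t j = n; j-- > 0; ) {
            if (A[i] == B[j]) {
                X[j] = 1 + Y[j + 1];
            } else {
                X[j] = std::max(Y[j], X[j + 1]);
            }
        }
        std::swap(X, Y); // Y now holds row i; X is reused for row i - 1
    }
    return Y[0]; // row 0 after the final swap (all zeros if A is empty)
}
```

Memory drops to O(n), but the running time is still O(m * n), so two strings of 600000 characters remain expensive to process this way.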
If you are interested, here is a paper about LCS on strings from large alphabets. See algorithm Approx2LCS in section 3.2.
First, use the bottom-up approach to dynamic programming:
// #includes and using namespace std;
const int SIZE = 1000;
int dp[SIZE + 1][SIZE + 1];
char a[SIZE + 1], b[SIZE + 1];
int lcs_bottomUp(){
int strlenA = strlen(a), strlenB = strlen(b);
for(int y = 0; y <= strlenB; y++)
dp[strlenA][y] = 0;
for(int x = strlenA - 1; x >= 0; x--){
dp[x][strlenB] = 0;
for(int y = strlenB - 1; y >= 0; y--)
dp[x][y] = (a[x]==b[y]) ? 1 + dp[x+1][y+1] :
max(dp[x+1][y], dp[x][y+1]);
}
return dp[0][0];
}
int main(){
while(gets(a) && gets(b)){
printf("%d\n", lcs_bottomUp());
}
}
Observe that you only need to keep 2 rows (or columns), one for dp[x] and another for dp[x + 1]:
// #includes and using namespace std;
const int SIZE = 1000;
int dp_x[SIZE + 1]; // dp[x]
int dp_xp1[SIZE + 1]; // dp[x + 1]
char a[SIZE + 1], b[SIZE + 1];
int lcs_bottomUp_2row(){
int strlenA = strlen(a), strlenB = strlen(b);
for(int y = 0; y <= strlenB; y++)
dp_x[y] = 0; // assume x == strlenA
for(int x = strlenA - 1; x >= 0; x--){
// x has been decreased
memcpy(dp_xp1, dp_x, sizeof(dp_x)); // dp[x + 1] <- dp[x]
dp_x[strlenB] = 0;
for(int y = strlenB - 1; y >= 0 ; y--)
dp_x[y] = (a[x]==b[y]) ? 1 + dp_xp1[y+1] :
max(dp_xp1[y], dp_x[y+1]);
}
return dp_x[0]; // assume x == 0
}
int main(){
while(gets(a) && gets(b)){
printf("%d\n", lcs_bottomUp_2row());
}
}
Now it's safe to change SIZE to 600000.
As OP stated, the other answers take too much time, mainly because 600000 elements are copied on every outer iteration.
To improve on this, instead of physically copying the column, one can switch the two columns logically by swapping pointers. Thus:
#include <algorithm>
#include <string>
#include <vector>
int spaceEfficientLCS(const std::string& a, const std::string& b){
int i, j, n = a.size(), m = b.size();
// Size of columns is based on the size of the biggest string
int maxLength = (n < m) ? m : n;
// Heap-allocated and zero-initialized: variable-length arrays are not
// standard C++, and two arrays of this size may overflow the stack
std::vector<int> costs1(maxLength+1, 0), costs2(maxLength+1, 0);
// Choose columns in a way that the return value will be costs1[0]
int* mainCol, *secCol;
if (n%2){
mainCol = costs2.data();
secCol = costs1.data();
}
else{
mainCol = costs1.data();
secCol = costs2.data();
}
// Compute costs
for (i = n; i >= 0; i--){
for (j = m; j >= 0; j--){
if (a[i] == '\0' || b[j] == '\0') mainCol[j] = 0;
else mainCol[j] = (a[i] == b[j]) ? secCol[j+1] + 1 :
std::max(secCol[j], mainCol[j+1]);
}
// Switch logic column
int* aux = mainCol;
mainCol = secCol;
secCol = aux;
}
return costs1[0];
}