How to memoize or make a recursive function with no apparent pattern? - c++

Consider the following code, which I wrote for Codeforces Round #731 (Div. 3), problem B: https://codeforces.com/contest/1547/problem/B
In short, you are given a string and must check whether it can be built by starting from an empty string and adding letters in alphabetical order ('a', then 'b', then 'c', ...), each one either to the front or to the back.
For example, take the string "bac". You first turn the empty string into "a", which can then become either "ba" or "ab". Adding 'c' to those gives "bac", "cba", "abc" or "cab". Since "bac" is among them, it is possible, so we return true.
This procedure can be applied at most 26 times (once per letter).
My code builds a tree: the root is a base string, and it has two children, one with the next letter added to the front and one with it added to the back. If neither child matches, the process repeats on the children.
I rewrote my solution and submitted a completely different one, but I still want to know whether this approach can be optimized enough to actually run. The code works when n is around 14 or 15 (it stutters a little but finishes), but once n reaches 20 it does not finish at all.
#include <iostream>
#include <string>
using namespace std;
bool solve(string fs,string s = "", int n = 0){
if(s == fs){
return true;
}
if(n > 26 || s.size() > fs.size()){
return false;
}
if(solve(fs,s+(char)(96+n+1),n+1) ||solve(fs,(char)(96+n+1)+s,n+1)){
return true;
}
return false;
}
int main(){
int t;cin>>t;
for(int i = 0; i < t; i++){
string p;
cin>>p;
if(solve(p)){
cout<<"YES"<<endl;
}
else{
cout<<"NO"<<endl;
}
}
}

You are using a brute-force approach with a time complexity of O(n * 2^n), so it is reasonable for it to fail (TLE) when n is around 20, especially since t can be up to 10000.
I can't come up with an efficient memoization scheme, but this problem can easily be solved with a greedy approach: you don't have to check all the combinations. Check out the official editorial.
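A sketch of one greedy formulation (an assumption on my part; it is a common approach, not necessarily the editorial's exact code, and the name isAlphabetical is hypothetical). The last letter added, 'a' + n - 1, must sit at one of the two ends of the string; peel it off and repeat with the next smaller letter. Each letter is checked once, so this is O(n) per string instead of O(n * 2^n).

```cpp
#include <string>

// Greedy check: the largest letter must be at the left or right end.
// Remove it, then the next largest must be at one of the new ends, and so on.
bool isAlphabetical(const std::string& s) {
    if (s.size() > 26) return false;  // only 'a'..'z' can be used
    int lo = 0;
    int hi = static_cast<int>(s.size()) - 1;
    for (char c = static_cast<char>('a' + s.size() - 1); c >= 'a'; --c) {
        if (s[lo] == c) {
            ++lo;            // c was added to the front
        } else if (s[hi] == c) {
            --hi;            // c was added to the back
        } else {
            return false;    // c is buried in the middle (or missing)
        }
    }
    return true;
}
```

For example, "bac" peels off 'c' on the right, then 'b' on the left, then 'a', so it is accepted; "acb" is rejected because 'c' is not at either end.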

Related

Undesired output of a palindrome program in C++

I started learning C++ two weeks ago, and I want to build a program that checks whether a string is a palindrome.
I tried different ways, including the str1 == str2 method, as follows:
#include<iostream>
#include<string>
using namespace std;
string empty;
string word;
bool inverse(string word)
{
for (int i=0;i<=word.length();i++)
{
empty+=word[word.length()-i];
}
return empty==word;
}
int main()
{
cout<<inverse("civic");
}
The output is always 0.
Second way: the str1.compare(str2) method
#include<iostream>
#include<string>
using namespace std;
string empty;
string word;
bool inverse(string word)
{
for (int i=0;i<=word.length();i++)
{empty+=word[word.length()-i];}
if (word.compare(empty))
return true;
else
return false;
}
int main()
{
if (inverse(word)==true)
cout<<"is a palindrome";
else
cout<<"is not a palindrome";
cout<<inverse("ano");
cout<<inverse("madam");
}
The output is always "is a palindrome" followed by one or two 1s, even if the string is not a palindrome.
Please explain what mistakes I made and how I can correct them.
Also, if I want my program to handle a string that contains whitespace, how can I do it?
There are a couple of problems:
Your code loops too many times. For example, a word of three letters should loop three times, but your code loops four times (i=0, i=1, i=2, and i=3). To fix this, change the loop condition to use < instead of <=.
You are computing the mirrored index with the wrong formula. For a word of length three, the letters are word[0], word[1] and word[2]. However, your code uses length - i, and for i=0 this reads word[3], which is past the last letter of the word. Index with length - 1 - i instead of length - i.
Both of these errors are quite common in programming and they're called "off-by-one" errors. Remember to always double-check the boundary conditions when you write code so that you can keep this kind of error away from your programs.
For the first version, you need to change
for (int i=0;i<=word.length();i++)
{empty+=word[word.length()-i];}
to this
for (int i=0;i<word.length();i++)
{empty+=word[word.length()-(i+1)];}
Your program misbehaves because of this loop:
for (int i = 0;i <= word.length(); i++)
empty += word[word.length() - i];
Since the first index is zero, the last character is at word.length() - 1. When i is 0, word[word.length()] refers to the position one past the last character. For std::string that position holds the terminating null character (reading it is well-defined since C++11 and yields '\0'; in older standards it was undefined behavior for a non-const string), so a stray '\0' ends up at the front of your reversed string and the comparison fails. The <= condition also makes the loop run one time too many, so change <= (less than or equal to) to < (less than).
So, it should be:
for (int i = 0;i < word.length(); i++)
empty += word[word.length() - 1 - i];
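A minimal corrected sketch (the name isPalindrome is hypothetical, and it assumes spaces should simply be ignored). It keeps the reversed string local rather than global: the question's global `empty` is never cleared, so it keeps growing across calls. Note also that std::string::compare returns 0 when the strings are equal, so the second version's `if (word.compare(empty)) return true;` has its logic inverted; comparing with == avoids that trap.

```cpp
#include <cstddef>
#include <string>

// Returns true if `word` reads the same forwards and backwards,
// ignoring space characters (assumption: spaces should not count).
bool isPalindrome(const std::string& word) {
    std::string letters;  // local, so nothing carries over between calls
    for (char c : word)
        if (c != ' ') letters += c;

    // Build the reversed copy with the corrected bounds: i < length, index length - 1 - i.
    std::string reversed;
    for (std::size_t i = 0; i < letters.length(); i++)
        reversed += letters[letters.length() - 1 - i];

    return letters == reversed;  // == is true on equality; compare() would return 0
}
```

To read input that contains spaces, use std::getline(std::cin, line) rather than std::cin >> line, which stops at the first whitespace character.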

Need suggestions to improve the speed of word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "hithere",
dict = ["hi", "there"].
Return true because "hithere" can be segmented as "hi there".
My implementation is below. It works for normal cases, but it is very slow for input like:
s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.
I want to memoize the processed substrings, but I can't get it right. Any suggestions on how to improve it? Thanks a lot!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
for(int i(0); i<len; i++) {
string tmp = s.substr(0, i+1);
if((wordDict.find(tmp)!=wordDict.end())
&& (wordBreak(s.substr(i+1), wordDict)) )
return true;
}
return false;
}
};
It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.
So for your example you'd get
aa: {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa: {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}
This is a graph with nodes 0-23 and a bunch of edges. But node 23 (b) is entirely unreachable: it has no incoming edge. This is now a simple graph theory problem.
Finding all places where dictionary words occur is pretty easy if your dictionary is organized as a trie. But even a std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop over begin and end positions, with an O(log N) lookup of each word. But you can quickly determine whether s.substr(begin, end - begin) is still a viable prefix, and which dictionary words remain with that prefix.
Also note that you can build the graph lazily. Starting at begin=0, you find edges {0,2}, {0,6} and {0,8} (and no others). You can now search nodes 2, 6 and 8. You even have a good algorithm, A*, which suggests trying node 8 first (it is the farthest along, so the closest to the end). Thus, you'll find edges {8,10}, {8,14}, {8,16}, etc. As you can see, you'll never need to build the part of the graph that contains {1,3}, as it's simply unreachable.
Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly, and each time search the subgraph from there on.
A further optimization is to run bidirectional A*. This would give you a very fast solution. In the second half of the first step, you look for edges leading to 23 (b). As none exist, you immediately know that node {23} is isolated.
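A sketch of the graph view described above, using a plain breadth-first search instead of the A* or bidirectional A* the answer suggests (a deliberate simplification; the name wordBreakBfs is hypothetical). Positions 0..n are nodes, an edge i -> j exists when s.substr(i, j - i) is a dictionary word, and edges are only generated from nodes that are actually reached. Each node is expanded at most once, which is exactly what the brute-force recursion lacks.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_set>
#include <vector>

// s can be segmented iff node n = s.size() is reachable from node 0.
bool wordBreakBfs(const std::string& s, const std::unordered_set<std::string>& dict) {
    const std::size_t n = s.size();
    std::vector<bool> visited(n + 1, false);
    std::queue<std::size_t> frontier;
    frontier.push(0);
    visited[0] = true;
    while (!frontier.empty()) {
        const std::size_t i = frontier.front();
        frontier.pop();
        if (i == n) return true;
        // Build edges lazily: only from nodes we have actually reached.
        for (std::size_t j = i + 1; j <= n; ++j) {
            if (!visited[j] && dict.count(s.substr(i, j - i))) {
                visited[j] = true;  // never expand the same position twice
                frontier.push(j);
            }
        }
    }
    return false;
}
```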
In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.
You can enable this remembering, for example, by storing the results based on the starting position of the string s within the original string, or even based on its length (because anyway the strings you are working with are suffixes of the original string, and therefore its length uniquely identifies it). Then, in the beginning of your wordBreak function, just check whether such length has already been processed and, if it has, do not rerun the computations, just return the stored value. Otherwise, run computations and store the result.
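A minimal sketch of that memoization, keyed by the starting position of the suffix (the names wordBreakMemo and breakFrom are hypothetical). Passing the whole string plus a start index also avoids copying a new suffix with substr on every recursive call.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// memo[start]: -1 = not computed yet, 0 = false, 1 = true.
// The suffix s.substr(start) is identified by its starting position alone.
bool breakFrom(const std::string& s, std::size_t start,
               const std::unordered_set<std::string>& dict,
               std::vector<int>& memo) {
    if (start == s.size()) return true;                  // empty suffix: done
    if (memo[start] != -1) return memo[start] == 1;      // already solved
    bool ok = false;
    for (std::size_t end = start + 1; end <= s.size() && !ok; ++end) {
        if (dict.count(s.substr(start, end - start)))    // prefix is a word
            ok = breakFrom(s, end, dict, memo);
    }
    memo[start] = ok ? 1 : 0;                            // remember the result
    return ok;
}

bool wordBreakMemo(const std::string& s, const std::unordered_set<std::string>& dict) {
    std::vector<int> memo(s.size() + 1, -1);
    return breakFrom(s, 0, dict, memo);
}
```

With the cache, each of the n + 1 starting positions is solved once, so the "aaaa...b" input no longer triggers an exponential blow-up.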
Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution that I can think of is O(N^2) by storing all the words in a trie (not in a map!) and following this trie as you walk along the given string. This achieves O(1) per loop iteration not counting the recursion call.
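A sketch of the trie idea, combined here with a bottom-up reachability table rather than recursion (that combination, the names TrieNode and wordBreakTrie, and the restriction to lowercase 'a'..'z' are all assumptions for illustration). From each reachable start position, the code walks the trie one character at a time, which is O(1) per step, so the whole scan is O(N^2) in the length of s.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Trie node over lowercase letters (assumption: words use only 'a'..'z').
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> next;
    bool isWord = false;
};

bool wordBreakTrie(const std::string& s, const std::unordered_set<std::string>& dict) {
    TrieNode root;
    for (const std::string& w : dict) {
        TrieNode* node = &root;
        for (char c : w) {
            if (c < 'a' || c > 'z') { node = nullptr; break; }  // unsupported: skip word
            auto& child = node->next[c - 'a'];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        if (node && node != &root) node->isWord = true;
    }

    // reachable[i] == true means s[0, i) can be segmented.
    std::vector<bool> reachable(s.size() + 1, false);
    reachable[0] = true;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!reachable[i]) continue;
        const TrieNode* node = &root;
        for (std::size_t j = i; j < s.size(); ++j) {
            const char c = s[j];
            if (c < 'a' || c > 'z') break;
            node = node->next[c - 'a'].get();
            if (!node) break;                       // no dictionary word has this prefix
            if (node->isWord) reachable[j + 1] = true;
        }
    }
    return reachable[s.size()];
}
```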
Thanks for all the comments. I changed my previous solution to the implementation below. I haven't tried optimizing the dictionary lookup yet, but those insights are very valuable and much appreciated.
For the current implementation, do you think it can be further improved? Thanks!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
if(wordDict.size()==0) return false;
vector<bool> dq (len+1,false);
dq[0] = true;
for(int i(0); i<len; i++) {// start point
if(dq[i]) {
for(int j(1); j<=len-i; j++) {// length of substring, 1:len
if(!dq[i+j]) {
auto pos = wordDict.find(s.substr(i, j));
dq[i+j] = dq[i+j] || (pos!=wordDict.end());
}
}
}
if(dq[len]) return true;
}
return false;
}
};
Try the following:
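class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict)
{
// Base case: nothing left to segment. Without it, every call
// eventually reaches an empty piece and returns false.
if (s.empty())
return true;
for (const auto& w : wordDict)
{
if (w.empty())
continue; // an empty word would recurse forever
auto pos = s.find(w);
if (pos != string::npos)
{
if (wordBreak(s.substr(0, pos), wordDict) &&
wordBreak(s.substr(pos + w.size()), wordDict))
return true;
}
}
return false;
}
};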
Essentially, once you find a match, remove the matching part from the input string and continue testing on the smaller pieces that remain.

Recursive String Transformations

EDIT: I've made the main change of using iterators to keep track of successive positions in the bit and character strings and pass the latter by const ref. Now, when I copy the sample inputs onto themselves multiple times to test the clock, everything finishes within 10 seconds for really long bit and character strings and even up to 50 lines of sample input. But, still when I submit, CodeEval says the process was aborted after 10 seconds. As I mention, they don't share their input so now that "extensions" of the sample input work, I'm not sure how to proceed. Any thoughts on an additional improvement to increase my recursive performance would be greatly appreciated.
NOTE: Memoization was a good suggestion but I could not figure out how to implement it in this case since I'm not sure how to store the bit-to-char correlation in a static look-up table. The only thing I thought of was to convert the bit values to their corresponding integer but that risks integer overflow for long bit strings and seems like it would take too long to compute. Further suggestions for memoization here would be greatly appreciated as well.
This is actually one of the moderate CodeEval challenges. They don't share the sample input or output for moderate challenges but the output "fail error" simply says "aborted after 10 seconds," so my code is getting hung up somewhere.
The assignment is simple enough. You take a filepath as the single command-line argument. Each line of the file will contain a sequence of 0s and 1s and a sequence of As and Bs, separated by a white space. You are to determine whether the binary sequence can be transformed into the letter sequence according to the following two rules:
1) Each 0 can be converted to any non-empty sequence of As (e.g., 'A', 'AA', 'AAA', etc.)
2) Each 1 can be converted to any non-empty sequence of As OR Bs (e.g., 'A', 'AA', etc., or 'B', 'BB', etc.), but not a mixture of the two letters.
The constraints are to process up to 50 lines from the file and that the length of the binary sequence is in [1,150] and that of the letter sequence is in [1,1000].
The most obvious starting algorithm is to do this recursively. My approach: for each bit, first consume the entire next allowed group of characters and test the shortened bit and character strings. If that fails, add back one character from the consumed group at a time and call again.
Here is my complete code. I removed cmd-line argument error checking for brevity.
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
using namespace std;
//typedefs
typedef string::const_iterator str_it;
//declarations
//use const ref and iterators to save time on copying and erasing
bool TransformLine(const string & bits, str_it bits_front, const string & chars, str_it chars_front);
int main(int argc, char* argv[])
{
//check there are at least two command line arguments: binary executable and file name
//ignore additional arguments
if(argc < 2)
{
cout << "Invalid command line argument. No input file name provided." << "\n"
<< "Goodbye...";
return -1;
}
//create input stream and open file
ifstream in;
in.open(argv[1], ios::in);
while(!in.is_open())
{
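string name; //std::string, not an uninitialized char* that cin would write through
cout << "Invalid file name. Please enter file name: ";
cin >> name;
in.clear(); //reset the failbit left by the failed open
in.open(name.c_str(), ios::in);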
}
//variables
string line_bits, line_chars;
//reserve space up to constraints to reduce resizing time later
line_bits.reserve(150);
line_chars.reserve(1000);
int line = 0;
//loop over lines (<=50 by constraint, ignore the rest)
while((in >> line_bits >> line_chars) && (line < 50))
{
line++;
//impose bit and char constraints
if(line_bits.length() > 150 ||
line_chars.length() > 1000)
continue; //skip this line
(TransformLine(line_bits, line_bits.begin(), line_chars, line_chars.begin()) == true) ? (cout << "Yes\n") : (cout << "No\n");
}
//close file
in.close();
return 0;
}
bool TransformLine(const string & bits, str_it bits_front, const string & chars, str_it chars_front)
{
//using iterators, so compute the remaining lengths locally
//(bits_length is updated again below after a bit is consumed)
int bits_length = distance(bits_front, bits.end());
int chars_length = distance(chars_front, chars.end());
//check success rule
if(bits_length == 0 && chars_length == 0)
return true;
//Check fail rules (length checks first, so an end iterator is never dereferenced):
//1. bits length is zero (but char is not, by previous if)
//2. char length is zero (but bits length is not, by previous if)
//3. next bit is 0 but next char is B
if(bits_length == 0 ||
chars_length == 0 ||
(*bits_front == '0' && *chars_front == 'B'))
return false;
//we now know that chars_length != 0 => chars_front != chars.end()
//kill a bit and then call recursively with each possible reduction of front char group
bits_length = distance(++bits_front, bits.end());
//current char group tracker
const char curr_char_type = *chars_front; //use const so compiler can optimize
int curr_pos = distance(chars.begin(), chars_front); //position of current front in char string
//since chars are 0-indexed, the following is also length of current char group
//start searching from curr_pos and length is relative to curr_pos so subtract it!!!
int curr_group_length = chars.find_first_not_of(curr_char_type, curr_pos)-curr_pos;
//make sure this isn't the last group!
if(curr_group_length < 0 || curr_group_length > chars_length)
curr_group_length = chars_length; //distance to end is precisely distance(chars_front, chars.end()) = chars_length
//kill the curr_char_group
//if curr_group_length = char_length then this will make chars_front = chars.end()
//and this will mean that chars_length will be 0 on the next recursive call.
chars_front += curr_group_length;
curr_pos = distance(chars.begin(), chars_front);
//call recursively, adding back a char from the current group until 1 less than starting point
int added_back = 0;
while(added_back < curr_group_length)
{
if(TransformLine(bits, bits_front, chars, chars_front))
return true;
//insert back one char from the current group
else
{
added_back++;
chars_front--; //represents adding back one character from the group
}
}
//if here then all recursive checks failed so initial must fail
return false;
}
They give the following test cases, which my code solves correctly:
Sample input:
1| 1010 AAAAABBBBAAAA
2| 00 AAAAAA
3| 01001110 AAAABAAABBBBBBAAAAAAA
4| 1100110 BBAABABBA
Correct output:
1| Yes
2| Yes
3| Yes
4| No
Since a transformation is possible if and only if copies of it are, I tried just copying each binary and letter sequences onto itself various times and seeing how the clock goes. Even for very long bit and character strings and many lines it has finished in under 10 seconds.
My question is: since CodeEval is still saying it is running longer than 10 seconds but they don't share their input, does anyone have any further suggestions to improve the performance of this recursion? Or maybe a totally different approach?
Thank you in advance for your help!
Here's what I found:
Pass by constant reference
Strings and other large data structures should be passed by constant reference.
This allows the compiler to pass a pointer to the original object, rather than making a copy of the data structure.
Call functions once, save result
You are calling bits.length() twice. You should call it once and save the result in a constant variable. This allows you to check the status again without calling the function.
Function calls are expensive for time critical programs.
Use constant variables
If you are not going to modify a variable after assignment, use the const in the declaration:
const char curr_char_type = chars[0];
The const allows compilers to perform higher order optimization and provides safety checks.
Change data structures
Since you may be performing inserts in the middle of a string, you should use a different data structure for the characters. The std::string data type may need to reallocate after an insertion AND move the letters further down. Insertion is faster with a std::list<char> because a linked list only swaps pointers. There may be a trade-off because a linked list needs to dynamically allocate memory for each character.
Reserve space in your strings
When you create the destination strings, you should use a constructor that preallocates or reserves room for the largest size string. This will prevent the std::string from reallocating. Reallocations are expensive.
Don't erase
Do you really need to erase characters in the string?
By using starting and ending indices, you overwrite existing letters without having to erase the entire string.
Partial erasures are expensive. Complete erasures are not.
For more assistance, post to Code Review at StackExchange.
This is a classic recursion problem. However, a naive implementation of the recursion leads to an exponential number of re-evaluations of previously computed function values. Using a simpler example for illustration, compare the runtime of the following two functions for a reasonably large N. Let's not worry about the int overflowing.
int RecursiveFib(int N)
{
if(N<=1)
return 1;
return RecursiveFib(N-1) + RecursiveFib(N-2);
}
int IterativeFib(int N)
{
if(N<=1)
return 1;
int a_0 = 1, a_1 = 1;
for(int i=2;i<=N;i++)
{
int temp = a_1;
a_1 += a_0;
a_0 = temp;
}
return a_1;
}
You would need to follow a similar approach here. There are two common ways of approaching the problem: dynamic programming and memoization. Memoization is the easiest way of modifying your approach. Below is a memoized Fibonacci implementation to illustrate how your implementation can be sped up.
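int MemoFib(int N)
{
// requires #include <vector> and using namespace std
static vector<int> memo; // cache shared by all calls
if(N<=1)
return 1;
if((int)memo.size() <= N)
memo.resize(N+1, -1); // make memo[N] a valid index
int& res = memo[N]; // recursive calls use smaller N, so no resize invalidates this
if(res!=-1)
return res;
return res = MemoFib(N-1) + MemoFib(N-2);
}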
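Applying the same idea to the bits-to-letters problem, as a sketch (the names canTransform and canTransformFrom are hypothetical, and this is a rewrite of the recursion rather than the question's exact code). The answer for TransformLine depends only on two positions, how many bits and how many characters have been consumed, so there is no need to convert the bit string to an integer for a lookup table. A table of (bits + 1) x (chars + 1) entries, at most 151 x 1001 under the stated constraints, caches every answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// memo[i][j]: -1 = unknown, 0 = false, 1 = true.
// Question answered: can bits[i..] be transformed into chars[j..]?
bool canTransformFrom(const std::string& bits, std::size_t i,
                      const std::string& chars, std::size_t j,
                      std::vector<std::vector<signed char>>& memo) {
    if (i == bits.size() || j == chars.size())
        return i == bits.size() && j == chars.size();  // both used up together
    signed char& cached = memo[i][j];                   // table never resizes
    if (cached != -1) return cached == 1;

    const char letter = chars[j];
    bool ok = false;
    if (!(bits[i] == '0' && letter == 'B')) {           // a 0 may only produce As
        // Bit i produces chars[j, k) for every k that stays inside the current run.
        for (std::size_t k = j + 1; !ok && k <= chars.size() && chars[k - 1] == letter; ++k)
            ok = canTransformFrom(bits, i + 1, chars, k, memo);
    }
    cached = ok ? 1 : 0;
    return ok;
}

bool canTransform(const std::string& bits, const std::string& chars) {
    std::vector<std::vector<signed char>> memo(
        bits.size() + 1, std::vector<signed char>(chars.size() + 1, -1));
    return canTransformFrom(bits, 0, chars, 0, memo);
}
```

Each (i, j) state is computed once, so the total work is bounded by the number of states times the length of a letter run, instead of growing exponentially with the input.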
Your failure message is "Aborted after 10 seconds" -- implying that the program was working fine as far as it went, but it took too long. This is understandable, given that your recursive program takes exponentially more time for longer input strings -- it works fine for the short (2-8 digit) strings, but will take a huge amount of time for 100+ digit strings (which the test allows for). To see how your running time goes up, you should construct yourself some longer test inputs and see how long they take to run. Try things like
0000000011111111 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBAAAAAAAA
00000000111111110000000011111111 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBAAAAAAAA
and longer. You need to be able to handle up to 150 digits and 1000 letters.
At CodeEval, you can submit a "solution" that just outputs what the input is, and do that to gather their test set. They may have variations so you may wish to submit it a few times to gather more samples. Some of them are too difficult to solve manually though... the ones you can solve manually will also run very quickly at CodeEval too, even with inefficient solutions, so there's that to consider.
Anyway, I did this same problem at CodeEval (using VB of all things), and my solution recursively looked for the "next index" of both A and B depending on what the "current" index is for where I was in a translation (after checking stoppage conditions first thing in the recursive method). I did not use memoization but that might've helped speed it up even more.
PS, I have not run your code, but it does seem curious that the recursive method contains a while loop within which the recursive method is called... since it's already recursive and should therefore encompass every scenario, is that while() loop necessary?

Extracting data between { }: is there any other method with a better run time?

I have this piece of code that extracts the data between { }, and it takes around O(n). Is there any other method that is more efficient?
#include <iostream>
#include <string>
#include <cstring>
int main()
{
const char *blah = "[{post:banb {bbbbbbbb}: ananmsdb},{dsgdf{9090909090}fdgsdfg}";
std::string op;
unsigned int i = 0;
int im = 0;
int found = 0;
while(strlen(blah) != i){
if(blah[i] == '{'){
found = 1;
// copy what ever u got
op+=blah[i];
im++;
}
else if(blah[i] == '}'){
//copy wat ever u got
op+=blah[i];
im--;
}
if(found ==1){
//copy wat ever u got.
op+=blah[i];
}
if(found ==1 && im == 0) {
found = 0;
std::cout << op << std::endl;
op.clear() ;
// u have found the full one post so send it for processing.
}
i++;
}
}
Output:
post:banb {bbbbbbbb}: ananmsdb
dsgdf{9090909090}fdgsdfg
No. You can use library functions to make this code shorter, but it will never be more efficient than O(n) where n is the length of the input string, since you need to examine each character at least once because each one could potentially be a token that you need to extract.
I don't think you can improve on O(n) for the underlying algorithm, but you can probably improve your implementation.
Currently your implementation may well be O(n^2) rather than O(n), as strlen() could be called on every iteration (unless your compiler is particularly smart). You should probably cache the call to strlen() explicitly, e.g. change:
while(strlen(blah) != i){
...
to:
const size_t len = strlen(blah);
while(len != i){
...
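A complete single-pass sketch along these lines (the name extractTopLevel is hypothetical, and returning the groups in a vector instead of printing them is an assumption). It is still O(n), since every character is examined exactly once, but it also avoids appending each brace twice, which the question's loop does because a brace goes through both the brace branch and the `found == 1` branch.

```cpp
#include <string>
#include <vector>

// Collect the contents of each top-level {...} group in one O(n) pass,
// keeping nested braces but dropping the outermost pair.
std::vector<std::string> extractTopLevel(const std::string& text) {
    std::vector<std::string> groups;
    std::string current;
    int depth = 0;
    for (char c : text) {
        if (c == '{') {
            if (depth > 0) current += c;   // nested brace: keep it
            ++depth;
        } else if (c == '}') {
            if (depth == 0) continue;      // stray closing brace: ignored (assumption)
            --depth;
            if (depth == 0) {              // one full top-level group finished
                groups.push_back(current);
                current.clear();
            } else {
                current += c;              // nested closing brace: keep it
            }
        } else if (depth > 0) {
            current += c;                  // ordinary character inside a group
        }
    }
    return groups;
}
```

On the question's input this yields "post:banb {bbbbbbbb}: ananmsdb" and "dsgdf{9090909090}fdgsdfg", matching the output the question expects.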
As everybody else has said, there is no way to make this faster, as you have to check every character in case you miss one.
That being said, if you already knew some information about the data, such as how many pairs of '{' and '}' there are, then I can only think of one way (besides sorting, but that brings it back to O(n) + etc.).
This would be to choose indexes between 0 - x and randomly check at spots if you can find the '{' or '}'. Just to be clear this will only work if you know how many '{' and '}'s are already in the set of data.
Edit: Additionally, with the code you provided (just from reading it in Notepad), when it hits the first cout<< it should output "{{post:banb {{bbbbbbbb}}: ananmsdb}}", as it's creating additional {'s and }'s.

How are recursive backtracking returns handled with the void type

To generalize this question I am borrowing material from a Zelenski CS class handout. It is relevant to my specific question because I took the class from a different instructor several years ago and learned this approach to C++ there. The handout is here. My understanding of C++ is limited since I use it only occasionally. Basically, the few times I have needed to write a program, I return to the class material, find something similar, and start from there.
In this example (page 4) Julie is looking for a word using a recursive algorithm in a string function. To reduce the number of recursive calls she added a decision point bool containsWord().
string FindWord(string soFar, string rest, Lexicon &lex)
{
if (rest.empty()) {
return (lex.containsWord(soFar)? soFar : "");
} else {
for (int i = 0; i < rest.length(); i++) {
string remain = rest.substr(0, i) + rest.substr(i+1);
string found = FindWord(soFar + rest[i], remain, lex);
if (!found.empty()) return found;
}
}
return ""; // empty string indicates failure
}
To add flexibility to how this algorithm is used, can this be implemented as a void type?
void FindWord(string soFar, string rest, Lexicon &lex, Set::StructT &words)
{
if (rest.empty()) {
if (lex.containsWord(soFar)) //this is a bool
updateSet(soFar, words); //add soFar to referenced Set struct tree
} else {
for (int i = 0; i < rest.length(); i++) {
string remain = rest.substr(0, i) + rest.substr(i+1);
return FindWord(soFar + rest[i], remain, lex, words); //<-this is where I am confused conceptually
}
}
return; // indicates failure
}
And how about without the returns?
void FindWord(string soFar, string rest, Lexicon &lex, Set::StructT &words)
{
if (rest.empty()) {
if (lex.containsWord(soFar))
updateSet(soFar, words); //add soFar to Set memory tree
} else {
for (int i = 0; i < rest.length(); i++) {
string remain = rest.substr(0, i) + rest.substr(i+1);
FindWord(soFar + rest[i], remain, lex, words); //<-this is where I am confused conceptually
}
}
}
The first code fragment will try all permutations of rest, appended to the initial value of soFar (probably an empty string?). It will stop on the first word found that is in lex. That word will be returned immediately as it is found, and the search will be cut short at that point. If none were in lex, an empty string will be returned eventually, when all the for loops have run their course to the end.
The second fragment will only try one word: the concatenation of initial soFar and rest strings. If that concatenated string is in lex, it will call updateSet with it. Then it will return, indicating failure. No further search will be performed, because the return from inside the for loop is unconditional.
So these two functions are completely different. To make the second code behave like the first, you need it to return something else to indicate success, and only return from within the for loop when the FindWord call's return value indicates success. Obviously, void cannot be used to signal both failure and success. At the very least, you need to return a bool value for that.
And without the returns, your third version will perform an exhaustive search. Every possible permutation of the initial value of rest will be tried and looked up in the lexicon.
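A standard-C++ sketch of both variants discussed above. The course's Lexicon and Set types are not part of standard C++, so here std::set<std::string> stands in for both (an assumption), and the names findFirstWord and findAllWords are hypothetical. The first returns bool so the loop can stop only on success; the second is void because it visits every permutation and accumulates results through a reference parameter.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Stand-in (assumption): std::set<std::string> replaces the course's Lexicon.
using Lexicon = std::set<std::string>;

// Early-stop version: the bool result signals success so callers can cut the loop short.
bool findFirstWord(const std::string& soFar, const std::string& rest,
                   const Lexicon& lex, std::string& found) {
    if (rest.empty()) {
        if (lex.count(soFar)) { found = soFar; return true; }
        return false;
    }
    for (std::size_t i = 0; i < rest.length(); i++) {
        std::string remain = rest.substr(0, i) + rest.substr(i + 1);
        if (findFirstWord(soFar + rest[i], remain, lex, found))
            return true;  // conditional return: stop only on success
    }
    return false;         // every choice at this level failed
}

// Exhaustive version: void is fine, every permutation is visited
// and matches are collected in `words`.
void findAllWords(const std::string& soFar, const std::string& rest,
                  const Lexicon& lex, std::set<std::string>& words) {
    if (rest.empty()) {
        if (lex.count(soFar)) words.insert(soFar);
        return;
    }
    for (std::size_t i = 0; i < rest.length(); i++) {
        std::string remain = rest.substr(0, i) + rest.substr(i + 1);
        findAllWords(soFar + rest[i], remain, lex, words);  // no return: keep looping
    }
}
```

For example, with a lexicon containing "tea", "eat" and "ate", findFirstWord on "tea" stops at "tea", while findAllWords collects all three.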
You can visualize what's going on like this:
FindWord: soFar="" rest=...........
for: i=... rest[i]=a
call findWord
FindWord: soFar=a rest=..........
for: i=... rest[i]=b
call findWord
FindWord: soFar=ab rest=.........
for: i=... rest[i]=c
call findWord
if return, the loop will be cut short
if not, the loop continues and next i will be tried
......
FindWord: soFar=abcdefgh... rest=z
for: i=0 rest[0]=z
call findWord
FindWord: soFar=abcdefgh...z rest="" // base case
// for: i=N/A rest[i]=N/A
if soFar is_in lex // base case
then do_some and return soFar OR success
else return "" OR failure
Each time the base case is reached (rest is empty) we have n+1 FindWord call frames on the stack, for n letters in the initial rest string.
Each time we hit the bottom, we've picked all the letters from rest. The check is performed to see whether it's in lex, and control returns back one level up.
So if there are no returns, each for loop will run to its end. If the return is unconditional, only one permutation will be tried - the trivial one. But if the return is conditional, the whole thing will stop only on first success.