String matching algorithm trying to correct it - c++

I'm trying to implement a brute-force string matching algorithm, but it is not working correctly: I get an out-of-bounds index error.
Here is my algorithm:
int main() {
    string s = "NOBODY_NOTICED_HIM";
    string pattern="NOT";
    int index = 0;
    for (int i = 0; i < s.size();)
    {
        for (int j = 0; j < pattern.size();)
        {
            if(s[index] == pattern[j])
            {
                j++;
                i++;
            }
            else
            {
                index = i;
                j = 0;
            }
        }
    }
    cout<<index<<endl;
    return 0;
}
FIXED VERSION
I fixed the out-of-bounds error, but I don't know whether the algorithm will work with different strings.
int main() {
    string s = "NOBODY_NOTICED_HIM";
    string pattern="NOT";
    int index = 0;
    int i = 0;
    while( i < s.size())
    {
        i++;
        for (int j = 0; j < pattern.size();)
        {
            if(s[index] == pattern[j])
            {
                index++;
                j++;
                cout<<"i is " <<i << " j is "<<j <<endl;
            }
            else
            {
                index = i;
                break;
            }
        }
    }
    cout<<i<<endl;
    return 0;
}

The inner for loop keeps running while j is less than pattern.size(), but you are also incrementing i inside its body, where nothing checks i against s.size(). When i goes past the end of s, index follows it, and s[index] is then an out-of-bounds access.
The brute-force method has to test the pattern against every possible substring of s. The main condition is the length, which has to be the same as the pattern's. All substrings of length 3 in s are:
['NOB', 'OBO', 'BOD', 'ODY', 'DY_', 'Y_N', '_NO', 'NOT', 'OTI', 'TIC',
'ICE', 'CED', 'ED_', 'D_H', '_HI', 'HIM']
There are many ways to do it: you can compare char by char, or use string operations such as taking a substring. Both are nice exercises for learning.
Starting at index zero in s, you take the first three chars and compare them to the pattern; if they are equal, you have the answer. Otherwise you move on to the substring starting at index one, and so on.

Related

How is this line returning the length of the array in recursion?

I am trying to understand this recursion by stepping through it in the debugger. The debugger shows that smallAns ends up holding the size of the output array, and I can't understand how. Can anyone explain this?
#include<iostream>
using namespace std;
int subsequences(char input[], int startIndex,char output[][50]){
    if(input[startIndex] == '\0'){
        output[0][0] = '\0';
        return 1;
    }
    int smallAns = subsequences(input, startIndex+1, output);
    for(int i = smallAns; i < 2*smallAns; i++){
        int row = i - smallAns;
        output[i][0] = input[startIndex];
        int j = 0;
        for(; output[row][j] != '\0'; j++){
            output[i][j + 1] = output[row][j];
        }
        output[i][j + 1] = '\0';
    }
    return 2*smallAns;
}
int main(){
    char input[] = "abc";
    char output[100][50];
    int ans = subsequences(input, 0, output);
    for(int i = 0; i < ans; i++){
        for(int j = 0; output[i][j] != '\0'; j++){
            cout << output[i][j];
        }
        cout << endl;
    }
}
Here's what the algorithm is doing:
Start at the end, with the empty subsequence (or "\0"). You have 1 subsequence.
Look at the last character not yet considered. For each subsequence you have found so far, you can either put this character in front of it or leave it as it is. Therefore you have doubled the number of subsequences.
Repeat.
Therefore, 2 * smallAns means "Take the number of subsequences found in the lower recursive call, and double it." And this makes sense after you know how it was implemented. Thus the importance of comments and documentation in code. :)

DP approach for minimum number of characters that should be added to make a string palindrome?

Find the minimum number of characters that must be added to a string S to make it a palindrome. For instance, if S = "fft", the string should be changed to "tfft", adding only 1 character.
I used a DP approach for this problem, as follows:
Let the given input string be S[1.....L]. Then for any substring S[i....j] of the input string, the minimum number of insertions is:
min_insertions(S[i+1 ...... j-1]) if S[i] is equal to S[j]
min(min_insertions(S[i+1......j]), min_insertions(S[i....j-1])) + 1 otherwise
I coded this as follows:
#include <iostream>
#include <cstring>   // strlen
#include <algorithm> // min
using namespace std;
int dp[100][100];
int main (void)
{
    int n,i,j;
    char arr[100];
    cin>>arr;
    n = strlen(arr);
    //cout<<"You entered the string as "<<arr<<"\n";
    for (i = 0; i < n; i++ )
        dp[i][0] = 0;
    for ( i = 0; i < n; i++ )
    {
        for ( j = 0; j < n; j++ )
        {
            if (arr[i] == arr[j])
                dp[i][j] = dp[i+1][j-1];
            else
                dp[i][j] = min(dp[i+1][j],dp[i][j-1])+1;
        }
        // cout<<dp[0][n-1];
    }
    cout<<dp[0][n-1]<<"\n";
    return 0;
}
However, this gives a wrong value. Why is that happening? For example, if I enter the string abc, it outputs 1, although the answer should be 2. Is there anything wrong with my logic?
You are not filling your array dp in the right order. For instance dp[0][2] will ask for dp[1][2] which has not been computed yet.
So the assigning loop should fill the table by increasing substring length h, so that every shorter substring is computed before a longer one needs it:
for (int h = 0; h < n; h++) {
    for (int i = 0; i < n-h; i++) {
        dp[i][i+h] = .. // your part
    }
}
You also need to be careful about the case h = 0 above, where you don't want to read dp[i+1][i] or dp[i][i-1], and the case h = 1 with arr[i] == arr[i+1], where you don't want to read dp[i+1][i].

c++: string.replace inserting too many characters

I'm trying to step through a given string with a for loop, replacing one character per iteration with a character from a vector<char>.
The problem is that replace inserts everything in the vector from position k onward instead of just the character at position k, and I cannot figure out what I've done wrong.
Any and all help is appreciated.
(alphabet is a const string holding a-z, FirstWord is the given string.)
vector<char> VectorAlphabet;
for (int i=0; i<alphabet.length(); ++i)
{
    VectorAlphabet.push_back(alphabet.at(i));
}
for (int i = 0; i < FirstWord.length(); ++i )
{
    for (int k = 0; k < VectorAlphabet.size(); ++k)
    {
        string TempWord = FirstWord;
        TempWord.replace(i, 1, &VectorAlphabet[k]);
        if (CheckForValidWord(TempWord, WordSet))
        {
            if(CheckForDuplicateChain(TempWord, DuplicateWordSet))
            {
                DuplicateWordSet.insert(TempWord);
                stack<string> TempStack = WordStack;
                TempStack.push(TempWord);
                WordQueue.push(TempStack);
            }
        }
    }
}
E.g. if TempWord is tempword, then after TempWord.replace() on the first iteration it is abcde...zempword and not aempword. On the second-to-last iteration of the inner for loop it is yzempword.
What have I missed?
Problem solved, thanks to Dieter Lücking.
Looking closer at the string.replace reference, I see that I was using an overload of replace that takes a C-string, so the address of the vector element was interpreted as a C-string starting at position k.
With the fill version of replace, the vector element is correctly used as a single char instead.
New code is:
for (int i = 0; i < FirstWord.length(); ++i )
{
    for (int k = 0; k < VectorAlphabet.size(); ++k)
    {
        string TempWord = WordStack.top();
        // Change:
        TempWord.replace(i, 1, 1, VectorAlphabet[k]);
        if (CheckForValidWord(TempWord, WordSet))
        {
            if(CheckForDuplicateChain(TempWord, DuplicateWordSet))
            {
                DuplicateWordSet.insert(TempWord);
                stack<string> TempStack = WordStack;
                TempStack.push(TempWord);
                WordQueue.push(TempStack);
            }
        }
    }
}

Getting run time error working with vectors

So, I read problem 4.5 from Accelerated C++ and interpreted it rather wrongly. I wrote a program which is supposed to display how many times each word occurs in the input. However, I have probably done something very stupid and very wrong, and I can't figure it out.
Here's the code: http://ideone.com/87zA7E.
Stack Overflow says links to ideone.com must be accompanied by code. Instead of pasting all of it, I will just paste the function which I think is most likely at fault:
vector<str_info> words(const vector<string>& s) {
    vector<str_info> rex;
    str_info record;
    typedef vector<string>::size_type str_sz;
    str_sz i = 0;
    while (i != s.size()) {
        record.str = s[i];
        record.count = 0;
        ++i; //edit
        for (str_sz j = 0; j != s.size(); ++j) {
            if (compare(record, s[j]))
                ++record.count;
        }
        for (vector<str_info>::size_type k = 0; k != s.size(); ++k) {
            if (!compare(record, rex[k].str))
                rex.push_back(record);
        }
    }
    return rex;
}
One problem is that you have this:
str_sz i = 0;
while (i != s.size()) {
but you never increment i, leading to an endless loop. Inside of that loop, you're pushing elements into vector rex. A vector cannot contain an infinite number of elements.
Also, you are trying to access:
rex[k].str
in
for (vector<str_info>::size_type k = 0; k != s.size(); ++k) {
    if (!compare(record, rex[k].str)) // rex is empty in the beginning!!
        rex.push_back(record);
}
But you do not know whether rex has k+1 elements in it.
EDIT: Change your code to:
while (i != s.size()) {
    // read new string into a record (initial count should be one).
    str_info record;
    record.str = s[i];
    record.count = 1;
    // check if this string already exists in rex
    bool found = false;
    for (vector<str_info>::size_type k = 0; k < rex.size(); ++k) {
        if ( record.str == rex[k].str ) {
            rex[k].count++;
            found = true;
            break;
        }
    }
    i++;
    if ( found )
        continue;
    // if it is not found then push_back to rex
    rex.push_back( record );
}

Boyer Moore k-mismatches algorithm fails

I've written a program for string comparison with one mismatch for a programming website, and it gives a wrong answer. I've been working on it extensively, but I couldn't find test cases where my code fails. Can somebody provide test cases where it fails? I did the comparison using the Boyer-Moore-Horspool k-mismatches algorithm, as I believe it is the fastest searching algorithm.
The code is as follows:
int BMSearch_k(string text, string pattern, int tlen, int mlen,int pos)
{
    int i, j=0,ready[256],skip2[256][mlen-1],neq;
    for(i=0; i<256; ++i) ready[i] = mlen;
    for(int a=0; a<256;a++) {
        for(i = mlen;i>mlen-k;i--)
            skip2[i][a] = mlen;
    }
    for(i = mlen-2;i>=1;i--) {
        for(j=ready[pattern[i]]-1;j>=max(i,mlen-k);j--)
            skip2[j][pattern[i]] = j-i;
        ready[pattern[i]] = max(i,mlen-k);
    }
    j = mlen-1+pos;
    //cout<<"\n--jafffa--\n"<<pos<<"+"<<mlen<<"="<<j<<endl;
    while(j<tlen+k) {
        //cout<<"\t--"<<j<<endl;
        int h = j;
        i=mlen-1;
        int neq=0,shift = mlen-k;
        while(i>=0&&neq<=k) {
            //cout<<"\t--"<<i<<endl;
            if(i>=mlen-k)
                shift = min(shift,skip2[i][text[h]]);
            if(text[h]!= pattern[i])
                neq++;
            i--;
            h--;
        }
        if(neq<=k)
            return j-1;
        j += shift;
    }
    return -1;
}
You aren't initialising your arrays correctly,
int i, j=0,ready[256],skip2[256][mlen-1],neq;
for(i=0; i<256; ++i) ready[i] = mlen;
for(int a=0; a<256;a++) {
    for(i = mlen;i>mlen-k;i--)
        skip2[i][a] = mlen;
}
On the one hand, you declare skip2 as a 256×(mlen-1) array, on the other hand, you fill it as a (mlen+1)×256 array.
In the next loop,
for(i = mlen-2;i>=1;i--) {
    for(j=ready[pattern[i]]-1;j>=max(i,mlen-k);j--)
        skip2[j][pattern[i]] = j-i;
    ready[pattern[i]] = max(i,mlen-k);
}
you use ready[pattern[i]] before it has been set. I don't know if those mistakes are what's causing the failing test case, but it's easily imaginable that they are.
If Daniel's suggestions do not solve the problem, here are a couple more things that look odd:
return j-1; // I would expect "return j;" here
This seems odd because with k=0 and mlen=1 the highest value that j can take is tlen+k-1 = tlen-1, so the highest return value is tlen-2. In other words, matching the pattern 'a' against the string 'a' will not return a match at position 0.
Another oddity is the loop:
for(i = mlen-2;i>=1;i--) // I would expect "for(i = mlen-2;i>=0;i--)" here
it seems odd that in the preprocessing you will never access the first character in your pattern (i.e. pattern[0] is not read).