I read problem 4.5 from Accelerated C++ and misinterpreted it. I wrote a program that is supposed to display how many times each word occurs in the input. However, I have probably done something very stupid and very wrong, and I can't figure out what.
Here's the code: http://ideone.com/87zA7E.
Stack Overflow says links to ideone.com must be accompanied by code. Instead of pasting all of it, I will just paste the function that I think is most likely at fault:
vector<str_info> words(const vector<string>& s) {
vector<str_info> rex;
str_info record;
typedef vector<string>::size_type str_sz;
str_sz i = 0;
while (i != s.size()) {
record.str = s[i];
record.count = 0;
++i; //edit
for (str_sz j = 0; j != s.size(); ++j) {
if (compare(record, s[j]))
++record.count;
}
for (vector<str_info>::size_type k = 0; k != s.size(); ++k) {
if (!compare(record, rex[k].str))
rex.push_back(record);
}
}
return rex;
}
One problem is that you have this:
str_sz i = 0;
while (i != s.size()) {
but you never increment i, leading to an endless loop. Inside of that loop, you're pushing elements into vector rex. A vector cannot contain an infinite number of elements.
Also, you are trying to access:
rex[k].str
in
for (vector<str_info>::size_type k = 0; k != s.size(); ++k) {
if (!compare(record, rex[k].str)) // rex is empty in the beginning!!
rex.push_back(record);
}
But you do not know whether rex has k+1 elements in it.
EDIT: Change your code to:
while (i != s.size()) {
// read new string into a record (initial count should be one).
str_info record;
record.str = s[i];
record.count = 1;
// check if this string already exists in rex
bool found = false;
for (vector<str_info>::size_type k = 0; k < rex.size(); ++k) {
if ( record.str == rex[k].str ) {
rex[k].count++;
found = true;
break;
}
}
i++;
if ( found )
continue;
// if it is not found then push_back to rex
rex.push_back( record );
}
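For reference, here is a minimal self-contained sketch of the whole function along the lines described above. The layout of str_info is an assumption (the question only shows its str and count members being used), and the inner loop is bounded by rex.size() so it never reads an element that does not exist yet.

```cpp
#include <string>
#include <vector>

// Assumed layout of the record type used in the question.
struct str_info {
    std::string str;
    int count;
};

// Returns one record per distinct word, in order of first appearance.
std::vector<str_info> words(const std::vector<std::string>& s) {
    std::vector<str_info> rex;
    for (std::vector<std::string>::size_type i = 0; i != s.size(); ++i) {
        bool found = false;
        // Only scan the records that exist so far: k is bounded by rex.size().
        for (std::vector<str_info>::size_type k = 0; k != rex.size(); ++k) {
            if (rex[k].str == s[i]) {
                ++rex[k].count;
                found = true;
                break;
            }
        }
        if (!found) {
            // First time this word is seen: start its count at one.
            str_info record;
            record.str = s[i];
            record.count = 1;
            rex.push_back(record);
        }
    }
    return rex;
}
```

Calling words() on the strings "a", "b", "a" gives two records: "a" with count 2 and "b" with count 1.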
I'm trying to implement a brute-force string matching algorithm, but it is not working correctly: I get an out-of-bounds index error.
Here is my algorithm:
int main() {
string s = "NOBODY_NOTICED_HIM";
string pattern="NOT";
int index = 0;
for (int i = 0; i < s.size();)
{
for (int j = 0; j < pattern.size();)
{
if(s[index] == pattern[j])
{
j++;
i++;
}
else
{
index = i;
j = 0;
}
}
}
cout<<index<<endl;
return 0;
}
FIXED VERSION
I fixed the out-of-bounds exception. I don't know if the algorithm will work with different strings.
int main() {
string s = "NOBODY_NOTICED_HIM";
string pattern="NOT";
int index = 0;
int i = 0;
while( i < s.size())
{
i++;
for (int j = 0; j < pattern.size();)
{
if(s[index] == pattern[j])
{
index++;
j++;
cout<<"i is " <<i << " j is "<<j <<endl;
}
else
{
index = i;
break;
}
}
}
cout<<i<<endl;
return 0;
}
This happens because the inner for loop only checks that j is less than pattern.size(), while you also increment i inside its body. Once i goes past s.size(), index goes out of bounds as well, and you get an out-of-bounds error.
The brute force method has to test the pattern against every possible substring of s. The main condition is the length, which has to be the same as the pattern's. All such substrings of s are:
['NOB', 'OBO', 'BOD', 'ODY', 'DY_', 'Y_N', '_NO', 'NOT', 'OTI', 'TIC',
'ICE', 'CED', 'ED_', 'D_H', '_HI', 'HIM']
There are many ways to do it: you can do it char by char, or by using string operations like taking a substring. Both are nice exercises for learning.
Starting at zero in the s string you take the first three chars, compare to the pattern, and if equal you give the answer. Otherwise you move on to the char starting at one, etc.
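The char-by-char variant could be sketched like this. The function name bruteForceFind is made up for illustration; it returns the index of the first match, or std::string::npos when there is none, and it only tries start positions that leave room for the whole pattern, so it never indexes past the end of s.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first occurrence of pattern in s,
// or std::string::npos if there is none.
std::size_t bruteForceFind(const std::string& s, const std::string& pattern) {
    if (pattern.size() > s.size()) return std::string::npos;
    // The last valid start position leaves room for the whole pattern.
    for (std::size_t start = 0; start + pattern.size() <= s.size(); ++start) {
        std::size_t j = 0;
        // Compare char by char until a mismatch or the end of the pattern.
        while (j < pattern.size() && s[start + j] == pattern[j]) ++j;
        if (j == pattern.size()) return start;
    }
    return std::string::npos;
}
```

With s = "NOBODY_NOTICED_HIM" and pattern = "NOT", this returns 7.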
I am implementing a pattern search algorithm that has a vital if statement whose result seems to be unpredictable. Random files are searched, so sometimes the branch predictions are okay and sometimes they are terrible, for example when the file is completely random. My goal is to eliminate the if statement. I have tried, but the alternatives, such as preallocating a vector, were slow: the number of pattern possibilities can be very large, so preallocating takes a lot of time. I therefore have a dynamic vector whose entries I initialize to NULL up front, and I then check with the if statement whether a pattern is present. The if seems to be killing me, specifically the cmp assembly instruction. Bad branch predictions are flushing the pipeline a lot and causing huge slowdowns. Any ideas on how to eliminate the if statement marked in the code below would be great; I'm stuck in a rut.
for (PListType i = 0; i < prevLocalPListArray->size(); i++)
{
vector<vector<PListType>*> newPList(256, NULL);
vector<PListType>* pList = (*prevLocalPListArray)[i];
PListType pListLength = (*prevLocalPListArray)[i]->size();
PListType earlyApproximation = ceil(pListLength/256);
for (PListType k = 0; k < pListLength; k++)
{
//If pattern is past end of string stream then stop counting this pattern
if ((*pList)[k] < file->fileStringSize)
{
uint8_t indexer = ((uint8_t)file->fileString[(*pList)[k]]);
if(newPList[indexer] != NULL) //Problem if statement!!!!!!!!!!!!!!!!!!!!!
{
newPList[indexer]->push_back(++(*pList)[k]);
}
else
{
newPList[indexer] = new vector<PListType>(1, ++(*pList)[k]);
newPList[indexer]->reserve(earlyApproximation);
}
}
}
//Deallocate or stuff patterns in global list
for (int z = 0; z < newPList.size(); z++)
{
if(newPList[z] != NULL)
{
if (newPList[z]->size() >= minOccurrence)
{
globalLocalPListArray->push_back(newPList[z]);
}
else
{
delete newPList[z];
}
}
}
delete (*prevLocalPListArray)[i];
}
Here is the code without indirection with the changes proposed...
vector<vector<PListType>> newPList(256);
for (PListType i = 0; i < prevLocalPListArray.size(); i++)
{
const vector<PListType>& pList = prevLocalPListArray[i];
PListType pListLength = prevLocalPListArray[i].size();
for (PListType k = 0; k < pListLength; k++)
{
//If pattern is past end of string stream then stop counting this pattern
if (pList[k] < file->fileStringSize)
{
uint8_t indexer = ((uint8_t)file->fileString[pList[k]]);
newPList[indexer].push_back((pList[k] + 1));
}
else
{
totalTallyRemovedPatterns++;
}
}
for (int z = 0; z < 256; z++)
{
if (newPList[z].size() >= minOccurrence/* || (Forest::outlierScans && pList->size() == 1)*/)
{
globalLocalPListArray.push_back(newPList[z]);
}
else
{
totalTallyRemovedPatterns++;
}
newPList[z].clear();
}
vector<PListType> temp;
temp.swap(prevLocalPListArray[i]);
}
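To show the idea in isolation, here is a small self-contained sketch of the by-value bucketing used above. The function name bucketByNextByte and its parameters are invented for the example, and std::string stands in for the file buffer; the point is only that holding the 256 vectors by value removes the null check before push_back.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// For every position that is still inside `data`, append position + 1 to
// the bucket selected by the byte found at that position.
// The 256 buckets are held by value, so no null check is needed.
std::vector<std::vector<std::size_t> > bucketByNextByte(
        const std::string& data, const std::vector<std::size_t>& positions) {
    std::vector<std::vector<std::size_t> > buckets(256);
    for (std::size_t k = 0; k < positions.size(); ++k) {
        const std::size_t pos = positions[k];
        // Positions past the end of the data are simply dropped.
        if (pos < data.size()) {
            const std::uint8_t byte = static_cast<std::uint8_t>(data[pos]);
            buckets[byte].push_back(pos + 1);
        }
    }
    return buckets;
}
```

The bounds check on pos is still a branch, but it is far more predictable than a check that depends on the byte value.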
Here is the most up-to-date program, which manages not to use 3 times the memory and does not require an if statement. The only bottleneck seems to be the newPList[indexIntoFile].push_back(++index); statement. This bottleneck could be caused by cache coherency issues when indexing the array, because the patterns are random. When I search a binary file with just 1s and 0s, I don't see any latency on the push_back statement. That is why I believe it is cache thrashing. Do you guys see any room for optimization in this code still? You guys have been a great help so far. #bogdan #harold
vector<PListType> newPList[256];
PListType prevPListSize = prevLocalPListArray->size();
PListType indexes[256] = {0};
PListType indexesToPush[256] = {0};
for (PListType i = 0; i < prevPListSize; i++)
{
vector<PListType>* pList = (*prevLocalPListArray)[i];
PListType pListLength = (*prevLocalPListArray)[i]->size();
if(pListLength > 1)
{
for (PListType k = 0; k < pListLength; k++)
{
//If pattern is past end of string stream then stop counting this pattern
PListType index = (*pList)[k];
if (index < file->fileStringSize)
{
uint_fast8_t indexIntoFile = (uint8_t)file->fileString[index];
newPList[indexIntoFile].push_back(++index);
indexes[indexIntoFile]++;
}
else
{
totalTallyRemovedPatterns++;
}
}
int listLength = 0;
for (PListType k = 0; k < 256; k++)
{
if( indexes[k])
{
indexesToPush[listLength++] = k;
}
}
for (PListType k = 0; k < listLength; k++)
{
int insert = indexes[indexesToPush[k]];
if (insert >= minOccurrence)
{
int index = globalLocalPListArray->size();
globalLocalPListArray->push_back(new vector<PListType>());
(*globalLocalPListArray)[index]->insert((*globalLocalPListArray)[index]->end(), newPList[indexesToPush[k]].begin(), newPList[indexesToPush[k]].end());
indexes[indexesToPush[k]] = 0;
newPList[indexesToPush[k]].clear();
}
else if(insert == 1)
{
totalTallyRemovedPatterns++;
indexes[indexesToPush[k]] = 0;
newPList[indexesToPush[k]].clear();
}
}
}
else
{
totalTallyRemovedPatterns++;
}
delete (*prevLocalPListArray)[i];
}
Here are the benchmarks. I didn't think they would be readable in the comments, so I am posting them as an answer. The percentages on the left show what share of the time is spent on each line of code.
I'm having some trouble adding and removing elements from an std::vector (population, in the example code). What I want to do is erase an element if one condition is satisfied and copy the element if other conditions are satisfied instead. Here's the code:
for( int i = 0; i < walkers_num; i++) {
if( population[i].molteplicity == 0 ) {
population[i] = population.back();
population.pop_back();
i--;
} else {
for( int j = population[i].molteplicity; j > 1; j-- ) {
population.push_back(population[i]);
}
}
}
walkers_num = population.size();
What I get is:
*** error for object 0x7f86a1404498: incorrect checksum for freed object - object was probably modified after being freed.
I guess I'm using some std::vector property in a wrong way, since a very similar algorithm (conceptually they seem identical to me) seems to work if population is instead an std::list:
list<Walker>::iterator it;
list<Walker>::iterator end = thread_population[i].end();
for ( it = thread_population[i].begin(); it != end; ) {
if( it->molteplicity == 0 ) {
it = thread_population[i].erase(it);
continue;
}
for( int j = it->molteplicity; j > 1; j-- ) {
population.push_back(*it);
}
++it;
}
walkers_num = population.size();
Can you help me?
You haven't posted quite enough code.
I'm assuming you omitted at the start of the fragment:
walkers_num = population.size();
And are trying to visit the whole array. In that case try:
walkers_num = population.size();
for( int i = 0; i < walkers_num; i++) {
if( population[i].molteplicity == 0 ) {
population[i] = population.back();
population.pop_back();
i--;
--walkers_num; //Array has been shortened.
}
//....
You seem to have realised the length has changed because you put walkers_num = population.size(); at the end. You need to keep track throughout.
There are subtle reasons why your iterator code is likely to work but is technically just as invalid. You're not allowed to assume end is valid after a modification.
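A different way to sidestep the bookkeeping entirely, offered as an alternative sketch rather than as the fix described above, is to build a new vector instead of editing population while looping over it. The Walker struct here is a hypothetical stand-in (only molteplicity comes from the question; the id field is added for illustration), and the sketch assumes molteplicity is never negative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's walker type.
struct Walker {
    int molteplicity;
    int id; // illustrative field, not in the original
};

// Builds the next population: each walker appears `molteplicity` times
// (zero times means it is dropped). The input is never modified while it
// is being read, so no index or size can go stale.
std::vector<Walker> resample(const std::vector<Walker>& population) {
    std::vector<Walker> next;
    next.reserve(population.size());
    for (std::size_t i = 0; i < population.size(); ++i) {
        for (int j = 0; j < population[i].molteplicity; ++j) {
            next.push_back(population[i]);
        }
    }
    return next;
}
```

Usage would be something like population = resample(population); followed by walkers_num = population.size();. It costs one extra vector, but freshly appended copies can never be visited and expanded a second time.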
I'm trying to step through a given string with a for loop, replacing one character per iteration with a character from a vector<char>.
The problem is that replace inserts the whole vector from position k onward instead of just the character at position k, and I cannot figure out what I've done wrong.
Any and all help is appreciated.
(alphabet is a const string a-z, FirstWord is the given string).
vector<char> VectorAlphabet;
for (int i=0; i<alphabet.length(); ++i)
{
VectorAlphabet.push_back(alphabet.at(i));
}
for (int i = 0; i < FirstWord.length(); ++i )
{
for (int k = 0; k < VectorAlphabet.size(); ++k)
{
string TempWord = FirstWord;
TempWord.replace(i, 1, &VectorAlphabet[k]);
if (CheckForValidWord(TempWord, WordSet))
{
if(CheckForDuplicateChain(TempWord, DuplicateWordSet))
{
DuplicateWordSet.insert(TempWord);
stack<string> TempStack = WordStack;
TempStack.push(TempWord);
WordQueue.push(TempStack);
}
}
}
}
For example, if TempWord is tempword, then after TempWord.replace() on the first iteration it is abcde...zempword rather than aempword. On the second-to-last iteration of the inner for loop it is yzempword.
What have I missed?
Problem solved, thanks to Dieter Lücking.
Looking closer at the string::replace reference, I see that I used an overload of replace that takes a C-string as input, so the address of the vector element was interpreted as a C-string starting at position k.
By using the fill version of replace, the element at that position is correctly used as a single char instead.
New code is:
for (int i = 0; i < FirstWord.length(); ++i )
{
for (int k = 0; k < VectorAlphabet.size(); ++k)
{
string TempWord = WordStack.top();
// Change:
TempWord.replace(i, 1, 1, VectorAlphabet[k]);
if (CheckForValidWord(TempWord, WordSet))
{
if(CheckForDuplicateChain(TempWord, DuplicateWordSet))
{
DuplicateWordSet.insert(TempWord);
stack<string> TempStack = WordStack;
TempStack.push(TempWord);
WordQueue.push(TempStack);
}
}
}
}
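The difference between the two overloads can be shown in a tiny helper (the name withCharReplaced is made up for this sketch). Passing &VectorAlphabet[k] yields a char*, which selects replace(pos, len, const char*); that overload reads characters until it finds a '\0', and a vector<char> is not null-terminated, so the read can run past the end of the vector. The fill overload replace(pos, len, n, ch) takes a count and a single char instead.

```cpp
#include <cstddef>
#include <string>

// Returns a copy of `word` with the character at `pos` replaced by `c`.
std::string withCharReplaced(const std::string& word, std::size_t pos, char c) {
    std::string result = word;
    // replace(pos, len, n, ch): swap `len` characters at `pos`
    // for `n` copies of `ch`. Here both are 1.
    result.replace(pos, 1, 1, c);
    return result;
}
```

withCharReplaced("tempword", 0, 'a') gives "aempword". For a single character, plain assignment (result[pos] = c) would do the same job.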
It works with Visual Studio, but segfaults in Cygwin, which is weird because I'm compiling the same source, and both generate a Windows executable. GDB doesn't work very well for me in Cygwin for some reason, and the error doesn't appear in VS so I can't really debug it there.
Any ideas?
int main(void)
{
Pair ***occurences = new Pair**[20];
int i, j, k;
for (i = 0; i < 20; i++)
{
occurences[i] = new Pair*[i+1];
for (j = 0; j < i+1; j++)
{
occurences[i][j] = new Pair[26];
for (k = 0; k < 26; k++)
{
Pair pair;
pair.c = k + 'a';
pair.occurs = 0;
occurences[i][j][k] = pair;
}
}
}
std::fstream sin;
sin.open("dictionary.txt");
std::string word;
while (std::getline(sin, word))
{
if (word.size() < 21)
{
for (i = 0; i < word.size(); i++)
{
// SEGFAULTING HERE
occurences[word.size()-1][i][word[i] - 'a'].occurs++;
}
}
}
for (i = 0; i < 20; i++)
{
for (j = 0; j < i+1; j++)
{
delete [] occurences[i][j];
}
delete [] occurences[i];
}
delete [] occurences;
return 0;
}
You marked this line as the critical point:
occurences[word.size()-1][i][word[i] - 97].occurs++;
All three array accesses might go wrong here, and you would have to check them all:
It seems like the first dimension of the array has length 20, so the valid values for the index are [0..19]. word.size()-1 will be out of range if the size of the word is zero (size() is unsigned, so the subtraction wraps around to a very large value instead of going below 0), and it will be larger than 19 if the size of the word is 21 or more.
Are you sure the length of the word is always in the range [1..20]?
The second dimension has variable length, depending on the index of the first dimension. Are you sure this never gets out of bounds?
The third dimension strikes me as the most obvious. You subtract 97 from the character code, and use the result as index into an array with 26 entries. This assumes that all characters are in the range of [97..122], meaning ['a'..'z']. Are you sure that there will never be other characters in the input? For example, if there are any capital characters, the resulting index will be negative.
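The three checks above could be combined into one guard that is called before touching the array. This is only a sketch: isCountable is a made-up name, and it assumes the 20 x (i+1) x 26 layout allocated in the question. One possible source of an unexpected character, and this is a guess rather than something the question confirms, is a trailing '\r' left on each line when the dictionary file has Windows line endings and they are not translated on reading.

```cpp
#include <cstddef>
#include <string>

// Returns true only if every index used by
//   occurences[word.size()-1][i][word[i] - 'a']
// would be in range, assuming the 20 x (i+1) x 26 layout from the question.
bool isCountable(const std::string& word) {
    // First dimension: word.size()-1 must be in [0..19].
    if (word.empty() || word.size() > 20) return false;
    // Second dimension: i < word.size() always fits, because row
    // word.size()-1 was allocated with word.size() entries.
    for (std::size_t i = 0; i < word.size(); ++i) {
        // Third dimension: only 'a'..'z' map to [0..25]; this also rejects '\r'.
        if (word[i] < 'a' || word[i] > 'z') return false;
    }
    return true;
}
```

Wrapping the counting loop in if (isCountable(word)) would skip any line that could index out of range.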
Just reformulating my comment as an answer:
occurences[word.size()-1][i][word[i] - 'a'].occurs++;
If word.size() is 100 (for example), this will crash (for i == 0), since occurences has only 20 elements.