Briefly, I am trying to write a routine that reads comma-separated values from a file into an STL vector. This works fine. The values in the CSV file might also be in double quotes, so I have handled this too by trimming them. However, there is one problem: the values between the quotes might also contain commas, which are not to be treated as delimiters.
If I have a file containing the line
"test1","te,st2","test3","test4"
My file routine reads this into a vector as
"test1"
"te
st2"
"test3"
"test4"
I wrote a routine which I just called PostProcessing. This would go through the vector and correct this problem. It would take each element and check if the first character was a quote. If so, it would remove it. It would then look for another quote at the end of the string. If it found one, it would just remove it and move on to the next item. If it didn't find one, it would keep going through the vector, merging all the following items together until it did find the next quote.
However, this works in merging "te and st2" together into element 2 (index 1) but when I try and erase the required element from the vector it must be failing as the resulting vector output is as follows:
test1
test2
st2"
test3
"test4"
Note also that the last element has not been processed, because I decrement the count but, as the vector erase has failed, the true size hasn't actually changed.
The PostProcessing code is below. What am I doing wrong?
bool MyClass::PostProcessing()
{
bool bRet = false;
int nCount = m_vecFields.size();
for (int x = 0; x < nCount; x++)
{
string sTemp = m_vecFields[x];
if (sTemp[0] == '"')
{
sTemp.erase(0,1);
if (sTemp[sTemp.size()-1] == '"')
{
sTemp.erase(sTemp.size()-1, 1);
m_vecFields[x] = sTemp;
}
else
{
// find next double quote and merge these vector items
int offset = 1;
bool bFound = false;
while (x+offset < nCount && !bFound)
{
sTemp = sTemp + m_vecFields[x+offset];
if (sTemp[sTemp.size()-1] == '"')
{
// found corresponding "
sTemp.erase(sTemp.size()-1,1);
bFound = true;
}
else
{
offset++;
}
}
if (bFound)
{
m_vecFields[x] = sTemp;
// now remove required items from vector
m_vecFields.erase(m_vecFields.begin()+x+1, m_vecFields.begin()+x+offset);
nCount -= offset;
}
}
}
}
return bRet;
}
Edit: I've spotted a couple of issues with the code which I will be correcting but they don't affect the question being asked.
m_vecFields.erase(m_vecFields.begin()+x+1, m_vecFields.begin()+x+offset);
This function takes a half-open interval [first, last), which means the "end" of the interval to erase should point one past the last element to erase. In your case, it points to that element. Change the second argument to m_vecFields.begin()+x+offset+1.
x += offset;
Since you've just processed an item and deleted everything up to the next item, you shouldn't skip offset items here. The x++ from the loop will do just fine.
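To make the half-open range concrete, here is a minimal sketch of the merge step written as a free function. The name mergeQuotedFields is hypothetical (it is not the asker's MyClass::PostProcessing), and re-inserting the comma between merged pieces is my assumption about the desired result; the original code concatenates the pieces without it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for PostProcessing: strips surrounding quotes and
// merges fields that were split on a comma inside a quoted value.
std::vector<std::string> mergeQuotedFields(std::vector<std::string> fields)
{
    for (std::size_t x = 0; x < fields.size(); ++x)
    {
        std::string s = fields[x];
        if (s.empty() || s[0] != '"')
            continue;                      // unquoted field: leave it alone
        s.erase(0, 1);                     // drop the opening quote

        // 'end' is the index one past the last field merged into s.
        // An unterminated quote swallows everything up to the end of the vector.
        std::size_t end = x + 1;
        while ((s.empty() || s.back() != '"') && end < fields.size())
        {
            s += ',';                      // assumption: restore the consumed delimiter
            s += fields[end];
            ++end;
        }
        if (!s.empty() && s.back() == '"')
            s.pop_back();                  // drop the closing quote

        assert(end <= fields.size());      // the erase range must stay inside the vector
        fields[x] = s;
        // erase removes [first, last): the element at 'end' itself survives.
        fields.erase(fields.begin() + x + 1, fields.begin() + end);
    }
    return fields;
}
```

Because the loop re-reads fields.size() on every pass, there is no separate count to keep in sync after the erase.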
I've been assigned this question for my lab (and yes I understand there will be backlash because it's homework). I've been working on this question for a couple of days to no avail and I feel like I'm missing something glaringly obvious.
My code:
int processSuitors(vector<int>& currentSuitors, list<int>& rekt)
{
int sizeSuitors = currentSuitors.size();
int eliminated = 2;
while(sizeSuitors != 1)
{
rekt.push_back(currentSuitors[eliminated]);
currentSuitors.erase(currentSuitors.begin() + eliminated);
sizeSuitors--;
if(eliminated > sizeSuitors)
{
eliminated -= sizeSuitors;
}
}
return currentSuitors[0];
}
Prompt:
In an ancient land, the beautiful princess Eve had many suitors. She decided on the following procedure to determine which suitor she would marry. First, all of the suitors would be lined up one after the other and be assigned numbers. The first suitor would be number 1, the second number 2, and so on up to the last suitor, number n. Starting at the first suitor she would then count three suitors down the line (because of the three letters in her name) and the third suitor would be eliminated from winning her hand and he would be removed from the line. Eve would then continue, counting three more suitors and eliminating every third suitor. When she reached the end of the line she would continue counting from the beginning.
Write a function named processSuitors that takes as arguments an STL vector of type int containing the suitors, and an STL list of type int that will collect all the suitors that are eliminated. The function returns an int storing the position a suitor should stand in to marry the princess if there are n suitors. The function that calls processSuitors will send the vector already filled with n suitors (1, 2, 3... n), and an empty list that needs to be filled with the position number of the suitors that were eliminated, in the order they were eliminated.
Restrictions: You may not create any containers (no arrays, no vectors, etc.); you need to use the vector and the list that are passed as parameters.
Use ONLY the following STL functions:
vector::size
vector::erase
vector::begin
list::push_back
vector::operator[ ]
The adjacent files are hidden since we are to rely on what is given. Any clean-up of my code would be extremely appreciated as well.
What do you think of this solution?
Keep another vector that marks whether an index in your currentSuitors vector has been removed. Then have a helper function that will always find the next free index.
Instead of trying to reduce currentSuitors, you just keep marking elements in the taken list.
size_t findNextFreeSlot(const vector<bool>& taken, size_t pos)
{
// increment to the next candidate position
pos = (pos + 1) % taken.size();
// search for the first free slot
for (size_t i = 0; i < taken.size(); i++)
{
if (taken[pos] == false)
{
return pos;
}
pos = (pos + 1) % taken.size();
}
// assert(false); // we should never get here as long as there's one free slot index in taken
return -1;
}
int processSuitors(vector<int>& currentSuitors, list<int>& rekt)
{
size_t len = currentSuitors.size();
vector<bool> taken(len); // keep a vector of eliminated indices from current
size_t index = len - 1; // start at the last element so the first advance lands on index 0
size_t eliminated = 0;
if (len == 0)
{
return -1;
}
while (eliminated < (len-1))
{
// advance the index three times to the next "untaken" index
index = findNextFreeSlot(taken, index);
index = findNextFreeSlot(taken, index);
index = findNextFreeSlot(taken, index);
taken[index] = true; // claim this index as taken
rekt.push_back(currentSuitors[index]); // add the value at this index to the eliminated list
eliminated++;
}
index = findNextFreeSlot(taken, index); // find the last free index
return currentSuitors[index];
}
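Since the assignment forbids creating extra containers, the taken vector above may not be acceptable. As an alternative, here is a minimal sketch that uses only the five allowed functions and keeps a single running position. It is my reading of the prompt, not the answerer's code; the assert on a non-empty vector is my own precondition.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Sketch using only vector::size, vector::erase, vector::begin,
// vector::operator[] and list::push_back.
int processSuitors(std::vector<int>& currentSuitors, std::list<int>& rekt)
{
    assert(currentSuitors.size() > 0);   // assumed precondition: at least one suitor

    std::size_t pos = 0;                 // index where counting starts
    while (currentSuitors.size() > 1)
    {
        // Counting "one, two, three" from pos lands two slots further on;
        // the modulo wraps the count around the end of the line.
        pos = (pos + 2) % currentSuitors.size();
        rekt.push_back(currentSuitors[pos]);
        currentSuitors.erase(currentSuitors.begin() + pos);
        // After the erase, pos already refers to the suitor that followed
        // the eliminated one, so the next count starts there.
    }
    return currentSuitors[0];
}
```

With five suitors this eliminates 3, 1, 5, 2 in that order and leaves suitor 4.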
The code is meant to read instructions from a text file and print out graphic patterns. One of my functions is not working properly. The function is supposed to read the vectors of strings I've got from the file into structs.
Below is my output, and my second, third, and sixth graphs are wrong. It seems like the 2nd and 3rd vectors are not putting the correct row and column numbers; and the last one skipped "e" in the alphabetical order.
I tried to debug many times and still can't find the problem.
typedef struct Pattern{
int rowNum;
int colNum;
char token;
bool isTriangular;
bool isOuter;
}Pattern;
void CommandProcessing(vector<string>& , Pattern& );
int main()
{
for (int i = 0; i < command.size(); i++)
{
Pattern characters;
CommandProcessing(command[i], characters);
}
system("pause");
return 0;
}
void CommandProcessing(vector<string>& c1, Pattern& a1)
{
reverse(c1.begin(), c1.end());
string str=" ";
for (int j = 0; j < c1.size(); j++)
{
bool foundAlpha = find(c1.begin(), c1.end(), "alphabetical") != c1.end();
bool foundAll = find(c1.begin(), c1.end(), "all") != c1.end();
a1.isTriangular = find(c1.begin(), c1.end(), "triangular") != c1.end() ? true : false;
a1.isOuter = find(c1.begin(), c1.end(), "outer") != c1.end() ? true : false;
if (foundAlpha ==false && foundAll == false){
a1.token = '*';
}
//if (c1[0] == "go"){
else if (c1[j] == "rows"){
str = c1[++j];
a1.rowNum = atoi(str.c_str());
j--;
}
else if (c1[j] == "columns"){
str = c1[++j];
a1.colNum = atoi(str.c_str());
j--;
}
else if (c1[j] == "alphabetical")
a1.token = 0;
else if (c1[j] == "all"){
str = c1[--j];
a1.token = *str.c_str();
j++;
}
}
}
Before debugging (or posting) your code, you should try to make it cleaner. It contains many strange / unnecessary parts, making your code harder to understand (and resulting in the buggy behaviour you just described).
For example, you have an if in the beginning:
if (foundAlpha ==false && foundAll == false){
If the command contains neither alphabetical nor all, this condition is true on every iteration of your loop, and since the other commands are all handled in else if branches, none of them will ever be executed.
Because of this, in your second and third example, no commands will be read, except the isTriangular and isOuter flags.
Instead of a mixed structure like this, consider the following changes:
add a default constructor to your Pattern struct, initializing its members. For example if you initialize token to *, you can remove that if, and even the two bool variables required for it.
Do the parsing in one way, consistently - the easiest would be moving your triangular and outer bool to the same if structure as the others. (or if you really want to keep this find lookup, move them before the for loop - you only have to set them once!)
Do not modify your loop variable ever, it's an error magnet! Okay, there are some rare exceptions for this rule, but this is not one of them.
Instead of str = c1[++j]; and decrementing later, you could just write str = c1[j+1];
Also, are you sure you need that reverse? That makes your relative +/-1 indexing unclear. For example, c1[j+1] after the reverse is the element at j-1 in the original command string.
About the last one: that's probably a bug in your outer printing code, which you didn't post.
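The suggestions above could look roughly like the following sketch. The word order is an assumption inferred from the posted indexing (a number directly before "rows" or "columns", the token character directly after "all"), the reverse is dropped, and parseCommand and demoParse are hypothetical names. Default member initializers stand in for the suggested default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct Pattern {
    int  rowNum = 0;
    int  colNum = 0;
    char token = '*';          // default token, so no special-case "if" is needed
    bool isTriangular = false;
    bool isOuter = false;
};

// Parses one command without reversing it and without touching the loop variable.
Pattern parseCommand(const std::vector<std::string>& words)
{
    Pattern p;
    for (std::size_t j = 0; j < words.size(); ++j)
    {
        const std::string& w = words[j];
        if (w == "rows" && j > 0)
            p.rowNum = std::atoi(words[j - 1].c_str());   // number precedes "rows" (assumed)
        else if (w == "columns" && j > 0)
            p.colNum = std::atoi(words[j - 1].c_str());   // number precedes "columns" (assumed)
        else if (w == "triangular")
            p.isTriangular = true;
        else if (w == "outer")
            p.isOuter = true;
        else if (w == "alphabetical")
            p.token = 0;                                  // 0 marks alphabetical mode, as in the question
        else if (w == "all" && j + 1 < words.size() && !words[j + 1].empty())
            p.token = words[j + 1][0];                    // token follows "all" (assumed)
    }
    return p;
}

// Small self-check of the assumed word order.
void demoParse()
{
    const Pattern p = parseCommand({"go", "4", "rows", "6", "columns", "outer"});
    assert(p.rowNum == 4 && p.colNum == 6);
    assert(p.token == '*' && p.isOuter && !p.isTriangular);
}
```

Every keyword is handled in the same if/else chain, so a command without "alphabetical" or "all" still gets its row and column counts.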
I'm writing an autocomplete program that finds all possible matches to a letter or set of characters given a dictionary file and input file. I just finished a version that implements a binary search over an iterative search and thought I could boost the overall performance of the program.
Thing is, the binary search is almost 9 times slower than an iterative search. What gives? I thought I was improving performance by using a binary search over iterative.
Run time (binary search on the left): [image not included]
Here is the important part of each version, full code can be built and run at my github with cmake.
Binary Search Function(called while looping through input given)
bool search(std::vector<std::string>& dict, std::string in,
std::queue<std::string>& out)
{
//tick makes sure the loop found at least one thing. if not then break the function
bool tick = false;
bool running = true;
while(running) {
//for each element in the input vector
//find all possible word matches and push onto the queue
int first=0, last= dict.size() -1;
while(first <= last)
{
tick = false;
int middle = (first+last)/2;
std::string sub = (dict.at(middle)).substr(0,in.length());
int comp = in.compare(sub);
//if comp returns 0(found word matching case)
if(comp == 0) {
tick = true;
out.push(dict.at(middle));
dict.erase(dict.begin() + middle);
}
//if not, take top half
else if (comp > 0)
first = middle + 1;
//else go with the lower half
else
last = middle - 1;
}
if(tick==false)
running = false;
}
return true;
}
Iterative Search(included in main loop):
for(int k = 0; k < input.size(); ++k) {
int len = (input.at(k)).length();
// truth false variable to end out while loop
bool found = false;
// create an iterator pointing to the first element of the dictionary
vecIter i = dictionary.begin();
// this while loop is not complete, a condition needs to be made
while(!found && i != dictionary.end()) {
// take a substring the dictionary word(the length is dependent on
// the input value) and compare
if( (*i).substr(0,len) == input.at(k) ) {
// so a word is found! push onto the queue
matchingCase.push(*i);
}
// move iterator to next element of data
++i;
}
}
example input file:
z
be
int
nor
tes
terr
on
Instead of erasing elements in the middle of the vector (which is quite expensive) and then starting your search over, just compare the elements before and after the found item (because they should all be adjacent to each other) until you have found all the items that match.
Or use std::equal_range, which does exactly that.
This will be the culprit:
dict.erase(dict.begin() + middle);
You are repeatedly removing items from your dictionary to naively use binary search to find all valid prefixes. This adds huge complexity, and is unnecessary.
Instead, once you have found a match, step backwards until you find the first match, then step forwards, adding all matches to your queue. Remember that because your dictionary is sorted and you are using only the prefixes, all valid matches will appear consecutively.
The dict.erase operation is linear in the size of dict: it shifts every element after middle down by one position. This makes the "binary search" algorithm possibly quadratic in the length of dict, with O(N^2) expensive memory copy operations.
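A minimal sketch of the "find the first match, then walk forward" idea, assuming the dictionary is already sorted. It uses std::lower_bound to locate the first candidate rather than std::equal_range, and prefixMatches is a hypothetical name; nothing is erased from the dictionary.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns every word in the sorted dictionary that starts with 'prefix'.
std::vector<std::string> prefixMatches(const std::vector<std::string>& dict,
                                       const std::string& prefix)
{
    assert(std::is_sorted(dict.begin(), dict.end()));   // binary search needs sorted input

    // First word that is not less than the prefix: O(log N) comparisons.
    auto it = std::lower_bound(dict.begin(), dict.end(), prefix);

    // In a sorted dictionary all words sharing the prefix sit next to each
    // other, so walk forward until the prefix no longer matches.
    std::vector<std::string> out;
    for (; it != dict.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        out.push_back(*it);
    return out;
}
```

The cost is one binary search plus one step per match, instead of one linear-time erase per match.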
I'm trying to adapt the Boyer-Moore c(++) Wikipedia implementation to get all of the matches of a pattern in a string. As it is, the Wikipedia implementation returns the first match. The main code looks like:
char* boyer_moore (uint8_t *string, uint32_t stringlen, uint8_t *pat, uint32_t patlen) {
int i;
int delta1[ALPHABET_LEN];
int *delta2 = malloc(patlen * sizeof(int));
make_delta1(delta1, pat, patlen);
make_delta2(delta2, pat, patlen);
i = patlen-1;
while (i < stringlen) {
int j = patlen-1;
while (j >= 0 && (string[i] == pat[j])) {
--i;
--j;
}
if (j < 0) {
free(delta2);
return (string + i+1);
}
i += max(delta1[string[i]], delta2[j]);
}
free(delta2);
return NULL;
}
I have tried to modify the block after if (j < 0) to add the index to an array/vector and letting the outer loop continue, but it doesn't appear to be working. In testing the modified code I still only get a single match. Perhaps this implementation wasn't designed to return all matches, and it needs more than a few quick changes to do so? I don't understand the algorithm itself very well, so I'm not sure how to make this work. If anyone can point me in the right direction I would be grateful.
Note: The functions make_delta1 and make_delta2 are defined earlier in the source (check Wikipedia page), and the max() function call is actually a macro also defined earlier in the source.
Boyer-Moore's algorithm exploits the fact that when searching for, say, "HELLO WORLD" within a longer string, the letter you find in a given position restricts what can be found around that position if a match is to be found at all, sort of a Naval Battle game: if you find open sea at four cells from the border, you needn't test the four remaining cells in case there's a 5-cell carrier hiding there; there can't be.
If you found for example a 'D' in eleventh position, it might be the last letter of HELLO WORLD; but if you found a 'Q', 'Q' not being anywhere inside HELLO WORLD, this means that the searched-for string can't be anywhere in the first eleven characters, and you can avoid searching there altogether. A 'L' on the other hand might mean that HELLO WORLD is there, starting at position 11-3 (third letter of HELLO WORLD is a L), 11-4, or 11-10.
When searching, you keep track of these possibilities using the two delta arrays.
So when you find a pattern, you ought to do,
if (j < 0)
{
// Found a pattern from position i+1 to i+1+patlen
// Add vector or whatever is needed; check we don't overflow it.
if (index_counter+1 >= index_size)
{
index[index_counter] = 0;
return index_size;
}
index[index_counter++] = i+1;
// Reinitialize j to restart search
j = patlen-1;
// Reinitialize i to start at i+1+patlen
i += patlen +1; // (not completely sure of that +1)
// Do not free delta2
// free(delta2);
// Continue loop without altering i again
continue;
}
i += max(delta1[string[i]], delta2[j]);
}
free(delta2);
index[index_counter] = 0;
return index_counter;
This should return a zero-terminated list of indexes, provided you pass something like a size_t *indexes to the function.
The function will then return 0 (not found), index_size (too many matches) or the number of matches between 1 and index_size-1.
This allows for example to add additional matches without having to repeat the whole search for the already found (index_size-1) substrings; you increase num_indexes by new_num, realloc the indexes array, then pass to the function the new array at offset old_index_size-1, new_num as the new size, and the haystack string starting from the offset of match at index old_index_size-1 plus one (not, as I wrote in a previous revision, plus the length of the needle string; see comment).
This approach will report also overlapping matches, for example searching ana in banana will find b*ana*na and ban*ana*.
UPDATE
I tested the above and it appears to work. I modified the Wikipedia code by adding these two includes to keep gcc from grumbling
#include <stdio.h>
#include <string.h>
then I modified the if (j < 0) to simply output what it had found
if (j < 0) {
printf("Found %s at offset %d: %s\n", pat, i+1, string+i+1);
//free(delta2);
// return (string + i+1);
i += patlen + 1;
j = patlen - 1;
continue;
}
and finally I tested with this
int main(void)
{
char *s = "This is a string in which I am going to look for a string I will string along";
char *p = "string";
boyer_moore(s, strlen(s), p, strlen(p));
return 0;
}
and got, as expected:
Found string at offset 10: string in which I am going to look for a string I will string along
Found string at offset 51: string I will string along
Found string at offset 65: string along
If the string contains two overlapping sequences, BOTH are found:
char *s = "This is an andean andeandean andean trouble";
char *p = "andean";
Found andean at offset 11: andean andeandean andean trouble
Found andean at offset 18: andeandean andean trouble
Found andean at offset 22: andean andean trouble
Found andean at offset 29: andean trouble
To avoid overlapping matches, the quickest way is to not store the overlaps. It could be done in the function but it would mean to reinitialize the first delta vector and update the string pointer; we also would need to store a second i index as i2 to keep saved indexes from going nonmonotonic. It isn't worth it. Better:
if (j < 0) {
// We have found a patlen match at i+1
// Is it an overlap?
if (index && (indexes[index] + patlen > i+1))
{
// Yes, it is. So we don't store it.
// We could store the last of several overlaps
// It's not exactly trivial, though:
// searching 'anana' in 'Bananananana'
// finds FOUR matches, and the fourth is NOT overlapped
// with the first. So in case of overlap, if we want to keep
// the LAST of the bunch, we must save info somewhere else,
// say last_conflicting_overlap, and check twice.
// Then again, the third match (which is the last to overlap
// with the first) would overlap with the fourth.
// So the "return as many non overlapping matches as possible"
// is actually accomplished by doing NOTHING in this branch of the IF.
}
else
{
// Not an overlap, so store it.
indexes[++index] = i+1;
if (index == max_indexes) // Too many matches already found?
break; // Stop searching and return found so far
}
// Adapt i and j to keep searching
i += patlen + 1;
j = patlen - 1;
continue;
}
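For comparison, here is a self-contained sketch that collects every match offset and then filters out overlaps in a separate pass. Note that it uses the simpler Horspool variant (bad-character shifts only), not the Wikipedia code with its two delta tables, and the names findAll and dropOverlaps are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Horspool search: returns the offsets of all matches, overlapping ones included.
std::vector<std::size_t> findAll(const std::string& text, const std::string& pat)
{
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pat.size();
    if (m == 0 || m > n)
        return hits;

    // shift[c]: how far the window may move when its last character is c.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t k = 0; k + 1 < m; ++k)
        shift[static_cast<unsigned char>(pat[k])] = m - 1 - k;

    std::size_t pos = 0;
    while (pos + m <= n)
    {
        if (text.compare(pos, m, pat) == 0)
            hits.push_back(pos);           // record the match and keep searching
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return hits;
}

// Keeps a match only if it starts at or after the end of the last kept match.
std::vector<std::size_t> dropOverlaps(const std::vector<std::size_t>& hits,
                                      std::size_t patLen)
{
    assert(std::is_sorted(hits.begin(), hits.end()));   // offsets must be ascending
    std::vector<std::size_t> kept;
    for (std::size_t h : hits)
        if (kept.empty() || kept.back() + patLen <= h)
            kept.push_back(h);
    return kept;
}
```

Searching "ana" in "banana" yields offsets 1 and 3. Searching "anana" in "Bananananana" yields 1, 3, 5 and 7, of which dropOverlaps keeps 1 and 7.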
Trying not to lose it here. As you can see below I have assigned intFrontPtr to point to the first cell in the array. And intBackPtr to point to the last cell in the array...:
bool quack::popFront(int& nPopFront)
{
nPopFront = items[top+1].n;
if ( count >= maxSize ) return false;
else
{
items[0].n = nPopFront;
intFrontPtr = &items[0].n;
intBackPtr = &items[count-1].n;
}
for (int temp; intFrontPtr < intBackPtr ;)
{
++intFrontPtr;
temp = *intFrontPtr;
*intFrontPtr = temp;
}
return true;
}
In the else statement I'm simply reassigning to ensure that my ptrs are where I want them. For some reason I'm popping off the back instead of off the front.
Anyone care to explain?
I'm not entirely sure I understand what you're trying to do, but if I'm guessing right, you're trying to 'pop' the 1st element of the array (items[0]) into the nPopFront int reference, then move all the subsequent elements of the array over by one so that the 1st element is replaced by the 2nd, the 2nd by the 3rd, and so on. After this operation, the array will contain one fewer element.
Not having the full declaration of the quack class makes most of the following guesswork, but here goes:
I'm assuming that items[0] represents the 'front' of your array (so it's the element you want 'popped').
I'm also assuming that 'count' is the number of valid elements (so items[count-1] is the last valid element, or the 'back' of the array).
Given these assumptions, I'm honestly not sure what top is supposed to represent (so I might be entirely wrong on these guesses).
Problem #1: your nPopFront assignment is reversed, it should be:
nPopFront = items[0].n;
Problem #2: your for loop is a big no-op. It walks through the array assigning elements back to their original location. I think you want it to look more like:
for (int i = 1; i < count; ++i)
{
items[i-1].n = items[i].n; // move elements from back to front
}
Finally, you'll want to adjust count (and probably top - if you need it at all) before you return to adjust the new number of elements in the data structure. The whole thing might look like:
bool quack::popFront(int& nPopFront)
{
if ( count >= maxSize ) return false;
if ( count == 0 ) return false; // nothing to pop
nPopFront = items[0].n;
intFrontPtr = &items[0].n; // do we really need to maintain these pointers?
intBackPtr = &items[count-1].n;
for (int i = 1; i < count; ++i)
{
items[i-1].n = items[i].n; // move elements from back to front
}
count -= 1; // one less item in the array
return true;
}
The original question seems to be that you don't understand why the function popFront returns 3 times when there are 3 elements?
If that's the case, I think you are missing the point of recursion.
When you make a recursive call, you are calling the same function again, basically creating a new stack frame and jumping back to the same function. So if there are 3 elements, it will recurse by encountering the first element, encountering the second element, encountering the third element, returning from the third encounter, returning from the second encounter, and returning from the first encounter (assuming you are properly consuming your array, which you don't appear to be).
The current function cannot return until the recursive call has iterated, thus it may appear to return from the last element before the second, and the second before the first.
That is how recursion works.
I wasn't able to make sense of your example, so I whipped one up real fast:
#include <iostream>
using namespace std;
bool popfront(int* ptr_, int* back_) {
cerr << ptr_[0] << endl;
if(ptr_ != back_) {
popfront(++ptr_, back_);
}
return true;
}
int main() {
int ar[4] = {4,3,2,1};
popfront(ar, ar + 3);
return 0;
}
That's not great, but it should get the point across.
Can't you just use a std::list?
That makes it really easy to pop from either end using pop_front or pop_back. You can also add to the front and the back. It also has the advantage that after popping from the front (or even removing from the middle of the list) you don't have to shift anything around (the link is simply removed), which makes it much more efficient than what you are, seemingly, proposing.
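A minimal sketch of what that could look like, keeping the bool-returning signature from the question. The free function popFront and demoPopFront are hypothetical names, not part of the asker's class.

```cpp
#include <cassert>
#include <list>

// Pops the first element into 'out'; returns false if the list is empty.
bool popFront(std::list<int>& items, int& out)
{
    if (items.empty()) return false;
    out = items.front();     // read the first element
    items.pop_front();       // unlink it; no other element is moved
    return true;
}

// Pops once from {4, 3, 2, 1} and checks what is left.
void demoPopFront()
{
    std::list<int> items = {4, 3, 2, 1};
    int value = 0;
    assert(popFront(items, value) && value == 4);
    assert(items.front() == 3 && items.size() == 3);
}
```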
I'm assuming you're trying to assign the popped value to nPopFront?
bool stack::popFront(int& nPopFront)
{
//items[4] = {4,3,2,1}
if ( intFrontPtr < intBackPtr )
{
nPopFront = *intFrontPtr;
++intFrontPtr;
}
return true;
}
bool quack::popFront(int& nPopFront)
{
if (items.n == 0) throw WhateverYouUseToSignalError;
nPopFront = items[0];
// shift the remaining elements one slot towards the front
for (int i = 0; i < items.n - 1; ++i) {
items[i] = items[i+1];
}
// update size of items array
return true;
}