I've got a task that I'm stuck on. I need to write a program that reads an input file and stores each word in a vector along with how many times that word was read (hence the struct). The words then need to be printed out in alphabetical order.
I've come up with something that I think is along the right lines:
struct WordInfo {
    string text;
    int count;
} uwords, temp;
string word;
int count = 0; //ignore this. For a different part of the task
vector<WordInfo> uwords;
while (cin >> word) {
    bool flag = false;
    count += 1;
    for (int i = 0; i < uwords.size(); i++) {
        if (uwords[i].text == word) {
            flag = true;
            uwords[i].count += 1;
        }
    }
    if (flag == false) {
        if (count == 1) { //stores first word into vector
            uwords.push_back(WordInfo());
            uwords[0].count = 1;
            uwords[0].text = word;
        } else {
            for (int i = 0; i < uwords.size(); i++) {
                if (word < uwords[i].text) {
                    uwords.push_back(WordInfo());
                    WordInfo temp = {word, 1};
                    uwords.insert(uwords.begin() + i, temp);
                }
            }
        }
    }
}
Now the problem I'm having is that when I run the program it appears to get stuck in an infinite loop, and I can't see why. I've done enough testing to realise it's probably in that last if statement, but my attempts to fix it were no good. Any help is appreciated. Cheers.
EDIT: I forgot to mention that we must use the vector class and we're limited in what we can use; sort is not an option :(
Take a good look at this piece of code:
if (word < uwords[i].text) {
    uwords.push_back(WordInfo());
    WordInfo temp = {word, 1};
    uwords.insert(uwords.begin() + i, temp);
}
First, it actually inserts two elements into your vector: an "empty" one with push_back, and another one with insert. And it does that whenever the current word is smaller than the one at position i.
And as soon as it has inserted, there are two new elements to walk over, one of them at the current position i: the element you just compared against has moved to i+1. In the next iteration you therefore compare the same word against that same element again, the test is true again, and you insert again. Since i increases by 1 per iteration while the vector grows by 2, i never reaches uwords.size(), so your loop gets stuck.
For a quick solution, you want to (1) search for the position where the word before is "smaller" than the current one, but the next one is bigger. Something like
if (uwords[i-1].text < word && word < uwords[i].text) {
(2) and you want to get rid of the push_back call.
Furthermore, (3) you can break out of the loop once the if condition is true, since you have already inserted and there is no need to iterate further. And (4), with a bit of condition tweaking, the count == 1 case can be merged into the loop. Modified code part (it replaces your whole if (flag == false) block; warning, not tested yet):
if (!flag) {
    for (int i = 0; i <= uwords.size(); ++i) {
        if ((i == 0 || uwords[i-1].text < word) &&
            (i == uwords.size() || word < uwords[i].text)) {
            WordInfo temp = {word, 1};
            uwords.insert(uwords.begin() + i, temp);
            break;
        }
    }
}
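Putting points (1) to (4) together, here is a minimal sketch of the whole insert-or-count step as a standalone function so it can be checked in isolation. It assumes a `WordInfo` struct like the question's (with `std::string`/`std::vector` spelled out) and a hypothetical helper name `addWord`. Instead of testing both neighbours, it skips every entry that sorts before the new word; because the vector is always kept sorted, the position it stops at is either the existing entry (just count it) or the single correct insertion point, so `std::sort` is never needed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct WordInfo {
    std::string text;
    int count;
};

// Hypothetical helper: records one occurrence of `word`, keeping `uwords`
// sorted alphabetically without calling std::sort.
void addWord(std::vector<WordInfo>& uwords, const std::string& word)
{
    std::size_t i = 0;
    // Skip every entry that sorts before the new word.
    while (i < uwords.size() && uwords[i].text < word) {
        ++i;
    }
    if (i < uwords.size() && uwords[i].text == word) {
        ++uwords[i].count;          // already present: just count it
    } else {
        WordInfo temp = {word, 1};  // new word: insert exactly once, at i
        uwords.insert(uwords.begin() + i, temp);
    }
}
```

In the question's loop, `while (cin >> word) addWord(uwords, word);` would replace the flag and both inner loops; printing `uwords` front to back then gives the alphabetical listing.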
You should not push your words into a vector, but into a map:
std::map<std::string,int>
Since a map keeps its keys in sorted order, iterating over it automatically yields a sorted range, which can later be pushed into a vector if needed.
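A minimal sketch of that idea, assuming the counts have already been collected in a `std::map<std::string, int>` (for example with `++freq[word]` in the read loop). Because the map iterates its keys in ascending order, copying the pairs out yields an alphabetically sorted `std::vector`. The `WordInfo` struct mirrors the question's; `toSortedVector` is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

struct WordInfo {
    std::string text;
    int count;
};

// Hypothetical helper: a std::map already keeps its keys ordered, so walking
// it from begin() to end() produces the words alphabetically.
std::vector<WordInfo> toSortedVector(const std::map<std::string, int>& freq)
{
    std::vector<WordInfo> result;
    result.reserve(freq.size());
    for (const auto& entry : freq) {
        WordInfo info = {entry.first, entry.second};
        result.push_back(info);
    }
    return result;
}
```

This only helps if `std::map` is allowed; the question's edit says the assignment restricts which library facilities may be used.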
I'm trying to figure out how to implement an algorithm to find the power set of a given set, but I'm having some trouble. The sets are actually vectors, so for example I am given Set<char> set1{ 'a','b','c' };
When I call PowerSet(set1); I get all the subsets,
but if I use Set<char> set2{ 'a','b','c', 'd' };
and call PowerSet(set2), I miss a few of them.
Set<Set<char>> PowerSet(const Set<char>& set1)
{
    Set<Set<char>> result;
    Set<char> temp;
    result.insertElement({});
    int card = set1.cardinality();
    int powSize = pow(2, card);
    for (int i = 0; i < powSize; ++i)
    {
        for (int j = 0; j < card; ++j)
        {
            if (i % static_cast<int> ((pow(2, j)) + 1))
            {
                temp.insertElement(set1[j]);
                result.insertElement(temp);
            }
        }
        temp.clear();
    }
    return result;
}
For reference:
cardinality() is a function in my .h where it returns the size of the set.
insertElement() inserts element into the set while duplicates are ignored.
Also, the reason I did temp.insertElement(set1[j]) and then result.insertElement(temp) is that result is a set of sets, so I needed to create a temporary set to insert the elements into and then insert it into result.
clear() is a function that empties the set.
I also have removeElem() which removes that element specified if it exists, otherwise it'll ignore it.
Your if test is nonsense -- it should be something like
if ((i / static_cast<int>(pow(2,j))) % 2)
you also need to move the insertion of temp into result after the inner loop (just before the temp.clear()).
With those changes, this should work as long as pow(2, card) does not overflow an int -- that is up to about card == 30 on most machines.
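With both changes applied, the algorithm reads as in the sketch below. The `Set` class and its `insertElement`/`cardinality` members belong to the asker and aren't shown, so this version assumes plain `std::vector<char>` in their place (and a hypothetical name `powerSet`). It also swaps `(i / static_cast<int>(pow(2, j))) % 2` for the equivalent bit test `(i >> j) & 1`, which avoids floating-point `pow` entirely.

```cpp
#include <cstddef>
#include <vector>

// Sketch: builds the power set of `set` by treating each i in [0, 2^n) as a
// bit mask; bit j set means element j belongs to the i-th subset.
std::vector<std::vector<char>> powerSet(const std::vector<char>& set)
{
    const std::size_t card = set.size();
    const unsigned long long powSize = 1ULL << card;  // assumes card < 64
    std::vector<std::vector<char>> result;
    for (unsigned long long i = 0; i < powSize; ++i) {
        std::vector<char> temp;                 // fresh subset for this mask
        for (std::size_t j = 0; j < card; ++j) {
            if ((i >> j) & 1ULL) {              // is bit j of i set?
                temp.push_back(set[j]);
            }
        }
        result.push_back(temp);                 // insert once, after the inner loop
    }
    return result;
}
```

Each mask `i` yields exactly one subset, and `temp` is stored once per mask after the inner loop, which is the second fix described above.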
Basically my program reads a text file with the following format:
3
chairs
tables
refrigerators
The number on the first line indicates the number of items in the file to read.
Here's my hash function:
int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();
    for(int i = 0; i < len; i++)
        hashVal = hashVal*37 + item[i];
    hashVal %= n;
    if(hashVal < 0) hashVal += n;
    return hashVal;
}
When my program read the text file above, it worked. But when I tried another one:
5
sabel
ziyarah
moustache
math
pedobear
The program would freeze. Not a segmentation fault or anything; it would just stop.
Any ideas?
Edit:
int n, tableSize;
myFile >> n;
tableSize = generateTableSize(n);
string item, hashTable[tableSize];
for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";
while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);
    if(hashTable[index] == "--")
        hashTable[index] = item;
    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n) {
    return hash(item,n+1);
}
The code freezes because it ends in an endless loop:
int index = hash(item,tableSize);
if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}
You keep recalculating the index, but you never change the input parameters, so the output stays the same, the slot is still occupied, and therefore it is recalculated again.
In the code above, newIndex is calculated from the same inputs as index, but with a different function, so it will most likely have a different value than the first time. If that new index is also occupied, newIndex is recalculated using the same function as before with exactly the same input, which gives exactly the same output. You look up the same slot in the hash table, which still holds the same value as last time, so you recalculate again with the same input, get the same output, look it up again, and so on forever.
The reason you didn't see this with the first file (3 words) is that you had no collision, or at most a single collision where the newIndex from the rehash function pointed to a free slot on the first try.
The solution is not to increment the table size (doing so will at best lower the chance of a collision, which in itself can be good, but won't solve your problem entirely), but either to alter the inputs to your functions on each attempt so that you get a different output, or to change the hash table structure.
I always found Sedgewick's book Algorithms in C++ useful; there is a chapter on hashing in it.
Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it, but I would suggest for the simple educational purpose of solving your problem, starting by simply incrementing the index by 1 until you find a free slot in the hash table.
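A minimal sketch of that linear-probing idea, assuming the table is a `std::vector<std::string>` with `"--"` marking empty slots as in the question. The names `hashIndex` and `insertLinearProbe` are hypothetical. The hash mirrors the question's polynomial but uses `unsigned` arithmetic so the running product wraps instead of overflowing a signed `int` (which is undefined behaviour). Each failed probe moves to the next slot, wrapping around, so the search is bounded by the table size. As an aside, the question's `rehash(item, tableSize)` returns a value in `[0, tableSize]`, so it can also index one past the end of the array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Polynomial hash like the question's, but in unsigned arithmetic so the
// running product wraps instead of overflowing.
std::size_t hashIndex(const std::string& item, std::size_t n)
{
    unsigned int hashVal = 0;
    for (unsigned char c : item) {
        hashVal = hashVal * 37u + c;
    }
    return hashVal % n;
}

// Inserts `item` using linear probing: on a collision, try the next slot
// (wrapping around) until a free "--" slot is found. Returns false if the
// table is full, so it can never loop forever.
bool insertLinearProbe(std::vector<std::string>& table, const std::string& item)
{
    const std::size_t n = table.size();
    if (n == 0) {
        return false;
    }
    std::size_t index = hashIndex(item, n);
    for (std::size_t tries = 0; tries < n; ++tries) {
        if (table[index] == "--") {
            table[index] = item;
            return true;
        }
        index = (index + 1) % n;  // step to the next slot, wrapping at the end
    }
    return false;  // every slot was occupied
}
```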
I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must update its count in one array and not add it to the other array, since it already exists. However, I am getting blank entries in my array whenever I meet a duplicate. I think it's because my pointer keeps advancing when it shouldn't, and I don't know how to prevent this. The only workaround I have found is to skip empty entries with continue when printing the results: if (*(words + i) == "") continue;. This ignores the blanks in the array, but I think it is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.
I would like guidance in what my problem is, rather than solutions.
I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.
Now this occurs.
Instead of incrementing your loop variable [i] on every iteration, you need to increment it only when a NEW word is found [i.e. not one already in the words array].
Also, you're wasting time in your inner loop by scanning the entire words array, since words only exist up to index i.
int idx = 0;
while (idx < count && file >> hold) {
    if (!valid_word(hold)) {
        continue;
    }
    // You don't need to check past idx because you
    // only have <idx> words so far.
    bool isFound = false;
    for (int i = 0; i < idx; i++) {
        if (toLower(words[i]) == toLower(hold)) {
            frequency[i]++;
            isFound = true;
            break;
        }
    }
    if (!isFound) {
        words[idx] = hold;
        frequency[idx] = 1;
        idx++;
    }
}
First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.
void addWords(const std::string& fname, int count, std::string *words, int *frequency)
{
    std::ifstream file(fname);
    std::string hold;
    int i = 0;
    while (i < count && (file >> hold))
    {
        int j = 0;
        for (; j<i; ++j)
        {
            if (toLower(words[j]) == toLower(hold))
            {
                // found a duplicate at j
                ++frequency[j];
                break;
            }
        }
        if (j == i)
        {
            // didn't find a duplicate
            words[i] = hold;
            frequency[i] = 1;
            ++i;
        }
    }
}
Second, to really address your code, this is what it should actually look like:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
//
// Your implementation of toLower() goes here.
//
typedef std::map<std::string, unsigned int> WordMap;
WordMap addWords(const std::string& fname)
{
    WordMap words;
    std::ifstream inf(fname);
    std::string word;
    while (inf >> word)
        ++words[toLower(word)];
    return words;
}
If it isn't obvious by now how a std::map<> makes this task easier, it never will be.
Check out SEEK_CUR if you want to move the read position back: it is the "relative to the current position" origin for fseek, and for C++ streams the counterpart is seekg with std::ios::cur.
The problem is a logical one. Consider two situations:
1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
2. Your algorithm does find the word. The frequency of the word is incremented, but so is i, which leaves a blank entry in your arrays whenever a word is already present.
To conclude, 1 works as expected but 2 doesn't.
My advice is not to rely on a for loop to traverse the input, but to use a "get-next-until-end" approach with a while loop. That way you can track your next insertion point separately and get rid of the blank entries.
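int currentCount = 0;
std::string hold;
while (file >> hold)
{
    bool found = false;
    // your inner for loop: only scan the words stored so far
    for (int j = 0; j < currentCount; ++j)
    {
        if (*(words + j) == hold)
        {
            ++*(frequency + j);
            found = true;
            break;
        }
    }
    if (!found)
    {
        *(words + currentCount) = hold;
        *(frequency + currentCount) = 1;
        currentCount++;
    }
}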
Why not use a std::map?
void collect( std::string name, std::map<std::string,int> & freq ){
    std::ifstream file;
    file.open(name.c_str(), std::ifstream::in );
    std::string word;
    // test the extraction itself: checking eof() after the read would drop
    // the last word when the file doesn't end with whitespace
    while( file >> word ){
        freq[word]++; // add toLower
    }
    file.close();
}
The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as the limit in the inner loop and incremented whenever you add a word that hasn't been seen yet.
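A minimal sketch of that `nocc` approach, assuming `words` and `frequency` are dynamic arrays with room for `capacity` entries as in the question. The helper name `recordWord` is hypothetical, and the comparison is case-sensitive (wrap both sides in the question's `toLower()` if needed). Only the first `nocc` slots are ever searched or written, so no blank entries can appear.

```cpp
#include <string>

// Hypothetical helper: records one occurrence of `word` in the parallel
// arrays and returns the updated number of distinct words (nocc).
// Only indices [0, nocc) are searched, and a new word goes exactly at nocc.
int recordWord(std::string* words, int* frequency, int nocc, int capacity,
               const std::string& word)
{
    for (int i = 0; i < nocc; ++i) {
        if (words[i] == word) {
            ++frequency[i];     // duplicate: count it, nocc is unchanged
            return nocc;
        }
    }
    if (nocc < capacity) {
        words[nocc] = word;     // first sighting: append without leaving gaps
        frequency[nocc] = 1;
        ++nocc;
    }
    return nocc;
}
```

In the read loop this becomes `while (file >> hold) nocc = recordWord(words, frequency, nocc, count, hold);`, and printing the first `nocc` entries no longer needs the `continue` for blanks.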
I cannot for the life of me figure out why this doesn't work. I'm having to do a frequency check of a list of words from a file, and when reading them in I'm trying to check the current word against the elements in the string array, and making sure they're not equal before I add it. Here's the code:
fin.open(finFile, fstream::in);
if(fin.is_open()) {
    int wordArrSize;
    while(!fin.eof()) {
        char buffer[49]; //Max number chars of any given word in the file
        wordArrSize = words.length();
        fin >> buffer;
        if(wordArrSize == 0) words.push_back(buffer);
        for(int i = 0; i < wordArrSize; i++) { //Check the read-in word against the array
            if(strcmp(words.at(i), buffer) != 0) { //If not equal, add to array
                words.push_back(buffer);
                break;
            }
        }
        totNumWords++; //Keeps track of the total number of words in the file
    }
    fin.close();
}
This is for a school project. We're not allowed to use any container classes so I built a structure to handle expanding the char** array, pushing back and popping out elements, etc.
for(int i = 0; i < wordArrSize; i++) { //this part is just fine
    if(strcmp(words.at(i), buffer) != 0) { //here lies the problem
        words.push_back(buffer);
        break;
    }
}
You will enter your if statement every time the current word doesn't match the ith word in the array. So most of the time you will enter it on the very first iteration. This means that on the first word in your string list that doesn't match the buffer, you add the buffer to the list and break out of the loop.
What you should do is finish checking the whole words array, and only then add the buffer into the array. So you should have something like this:
bool bufferIsInTheArray = false;//assume that the buffered word is not in the array.
for(int i = 0; i < wordArrSize; i++) {
    if(strcmp(words.at(i), buffer) == 0) {
        //if we found a MATCH, we set the flag to true
        //and break the loop (because we already found a match,
        //there is no point in continuing to check)
        bufferIsInTheArray = true;
        break;
    }
}
//if the flag is false here, that means we did not find a match in the array, and
//should add the buffer to it.
if( bufferIsInTheArray == false )
    words.push_back(buffer);
I think your words.push_back(buffer); should come outside the for loop.
Use a flag inside the for loop to record whether you found the buffer in the array, and after the loop add it to the array depending on that flag.
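Both answers describe the same fix: finish the search first, then decide. A minimal self-contained sketch of that pattern, assuming a fixed-size 2D `char` array in place of the asker's custom `char**` structure (whose interface isn't shown), so `MAX_WORDS`, `MAX_LEN` and `addIfNew` are hypothetical names:

```cpp
#include <cstring>

const int MAX_WORDS = 100;
const int MAX_LEN = 50;  // room for 49 characters plus the terminating '\0'

// Hypothetical helper: appends `buffer` to `words` only if no existing entry
// matches it. Returns the new number of stored words.
int addIfNew(char words[][MAX_LEN], int wordArrSize, const char* buffer)
{
    bool bufferIsInTheArray = false;
    for (int i = 0; i < wordArrSize; i++) {
        if (std::strcmp(words[i], buffer) == 0) {
            bufferIsInTheArray = true;  // found a match, stop searching
            break;
        }
    }
    // Only after checking every entry do we know the word is new.
    if (!bufferIsInTheArray && wordArrSize < MAX_WORDS) {
        std::strncpy(words[wordArrSize], buffer, MAX_LEN - 1);
        words[wordArrSize][MAX_LEN - 1] = '\0';
        ++wordArrSize;
    }
    return wordArrSize;
}
```

Because the search loop simply doesn't run when the array is empty, the separate `wordArrSize == 0` special case from the question is no longer needed.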
I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
    for(int j = 0; j < fileSize - 1; j++)
    {
        if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
        {
            words[j].Word = textFile[i];
            words[j].Times = index++;
        }
    }
    index = 0;
}
Any help would be appreciated.
Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.
Using an associative container:
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<std::string, unsigned> WordFrequencies;

WordFrequencies count(std::vector<std::string> const& words) {
    WordFrequencies wf;
    for (std::string const& word: words) {
        wf[word] += 1;
    }
    return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can supply a custom comparison to treat them case-insensitively.
Try this code instead if you do not want to use a map container:
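For the case-insensitive variant mentioned above, here is a minimal sketch of a custom comparator for `std::map`. The names `CaseInsensitiveLess`, `SortedFrequencies` and `countSorted` are hypothetical, and lowercasing character by character with `std::tolower` is only correct for single-byte, ASCII-style text.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Orders strings alphabetically while ignoring letter case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

typedef std::map<std::string, unsigned, CaseInsensitiveLess> SortedFrequencies;

SortedFrequencies countSorted(const std::vector<std::string>& words)
{
    SortedFrequencies wf;
    for (const std::string& word : words) {
        wf[word] += 1;  // "The" and "the" land on the same key
    }
    return wf;
}
```

The stored key keeps whichever spelling was seen first (e.g. "Cat" rather than "cat"), while lookups with any capitalisation find it.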
struct wordFreq{
    string word;
    int count;
    wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
    for(;i<j;i++){
        if((*i).word == s)
            return 1;
    }
    return 0;
}
The code for counting the number of occurrences in the textfile vector is then:
for(size_t i=0; i< textfile.size();i++){
    if(ffind(words.begin(),words.end(),textfile[i])) // Check whether word already checked for, if so move to the next one, i.e. avoid repetitions
        continue;
    words.push_back(wordFreq(textfile[i],1)); // Add the word to vector as it was not checked before and set its count to 1
    for(size_t j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
        if(textfile[j] == words.back().word)
            words.back().count++;
    }
}