I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must update its count in one array and not add it to the other array again, since it already exists there. However, I am getting blank entries in my array whenever I meet a duplicate. I think it's because my index continues to advance when it shouldn't, and I do not know how to prevent this. The only workaround I have is to skip empty entries when I print the results: if (*(words + i) == "") continue;. This simply ignores the blanks in the array, but I think that is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.
I would like guidance in what my problem is, rather than solutions.
I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.
Now this occurs.
Instead of incrementing your loop variable [i] every loop, you need to only increment it when a NEW word is found [i.e. not one already in the words array].
Also, you're wasting time in your inner loop by looping through your entire words array, since words will only exist up to index i.
int idx = 0;
bool isFound = false;
while (file >> hold && idx < count) {
    if (!valid_word(hold)) {
        continue;
    }
    // You don't need to check past idx because you
    // only have <idx> words so far.
    for (int i = 0; i < idx; i++) {
        if (toLower(words[i]) == toLower(hold)) {
            frequency[i]++;
            isFound = true;
            break;
        }
    }
    if (!isFound) {
        words[idx] = hold;
        frequency[idx] = 1;
        idx++;
    }
    isFound = false;
}
First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.
void addWords(const std::string& fname, int count, std::string *words, int *frequency)
{
std::ifstream file(fname);
std::string hold;
int i = 0;
while (i < count && (file >> hold))
{
int j = 0;
for (; j<i; ++j)
{
if (toLower(words[j]) == toLower(hold))
{
// found a duplicate at j
++frequency[j];
break;
}
}
if (j == i)
{
// didn't find a duplicate
words[i] = hold;
frequency[i] = 1;
++i;
}
}
}
Second, to really address your code, this is what it should actually look like:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
//
// Your implementation of toLower() goes here.
//
typedef std::map<std::string, unsigned int> WordMap;
WordMap addWords(const std::string& fname)
{
WordMap words;
std::ifstream inf(fname);
std::string word;
while (inf >> word)
++words[toLower(word)];
return words;
}
If it isn't obvious by now how a std::map<> makes this task easier, it never will be.
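For completeness, here is one way the toLower() placeholder above might be filled in. This is only a sketch: the name toLower comes from the question, but the body is an assumption and it handles plain ASCII text only.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of s (ASCII only).
std::string toLower(std::string s)
{
    // Go through unsigned char so std::tolower is never handed a negative value.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

With a helper like this, "The" and "the" end up under the same key in the map.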
Check out SEEK_CUR (the origin constant used with fseek) if you want to set the file cursor back.
The problem is a logical one. Consider the two possible situations:
1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
2. Your algorithm does find the word. The frequency of the word is incremented along with i, which leaves you with blank entries in your arrays whenever there's a word which is already present.
To conclude, 1 works as expected but 2 doesn't.
My advice is not to rely on a for loop to traverse the input, but to use a "get-next-until-end" approach built on a while loop. That way you can track your next insertion point separately and get rid of the blank entries.
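int currentCount = 0;
while (file >> hold)   // stops as soon as a read fails
{
    bool found = false;
    // your inner for loop, searching words[0 .. currentCount) and setting found
    if (!found)
    {
        *(words + currentCount) = hold;
        *(frequency + currentCount) = 1;
        currentCount++;
    }
}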
Why not use a std::map?
void collect( std::string name, std::map<std::string,int> & freq ){
    std::ifstream file;
    file.open(name.c_str(), std::ifstream::in );
    std::string word;
    // Test the read itself: checking eof() after reading can drop the last
    // word and loops forever if the file failed to open.
    while( file >> word ){ // add toLower
        freq[word]++;
    }
    file.close();
}
The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.
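A minimal sketch of that idea, assuming the words arrive on any std::istream and that the two arrays have room for capacity entries. The function name countWords and the capacity parameter are made up for illustration; nocc is the counter suggested above. The comparison here is exact, so a case-insensitive version would still need something like toLower on both sides.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads words from `in` into the caller's arrays and returns the number of
// distinct words stored. `capacity` is the size of both arrays.
int countWords(std::istream& in, std::string* words, int* frequency, int capacity)
{
    int nocc = 0;                 // number of distinct words stored so far
    std::string hold;
    while (in >> hold) {
        int j = 0;
        while (j < nocc && words[j] != hold)   // search only the filled part
            ++j;
        if (j < nocc) {
            ++frequency[j];       // duplicate: bump its counter, nocc stays put
        } else if (nocc < capacity) {
            words[nocc] = hold;   // new word: store it in the next free slot
            frequency[nocc] = 1;
            ++nocc;
        }
        // otherwise the arrays are full and the new word is dropped
    }
    return nocc;
}
```

Because nocc only moves when a new word is stored, the first nocc slots never contain blanks. For a quick check, a std::istringstream can stand in for the file.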
I need to find the most frequently occurring word and return that value. I must use hash maps, and the function takes a file name. This is what I've done so far, but I'm very confused.
int most_frequent_word(string filename)
{
string words;
ifstream in(filename.c_str());
unordered_map<string, int> word_map;
while(in >> words)
{
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
}
return words;
}
Any help would be appreciated. Thanks!
There are several issues in your code that might cause it not to work as expected.
First is the for loop over i. Why would you need that loop at all? You need to count words, so leave it like this:
while(in >> words)
{
word_map[words]++;
}
Second, rename words to word, since in >> words actually reads one word at a time.
The third is the return statement. You cannot return a string when the function is declared to return int.
However, there is nothing to return yet, because so far we only know how many times each word occurred. Run a loop to find the max value.
int result = 0;
for(unordered_map<string, int>::iterator it = word_map.begin(); it != word_map.end(); it++)
result = max(result, it->second);
return result;
Here word_map consists of pairs of a word and its number of occurrences. We need to iterate over all of these pairs looking for max occurrences. To do this we use iterator it.
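Putting the pieces together, the whole function might look roughly like the sketch below. It takes a std::istream instead of a file name so that it can be tried without a file; that parameter and the name most_frequent_count are assumptions made here, not part of the original task.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Returns the highest occurrence count of any word in the stream (0 if empty).
int most_frequent_count(std::istream& in)
{
    std::unordered_map<std::string, int> word_map;
    std::string word;
    while (in >> word)
        ++word_map[word];            // one counter per distinct word

    int result = 0;
    for (const auto& entry : word_map)
        result = std::max(result, entry.second);   // entry.second is the count
    return result;
}
```

A std::ifstream opened on the file name can be passed in directly, since it is also a std::istream.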
I'm also confused!
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
What are you doing here? Where does the 100 come from? Why do you care for single letters of your words (which is what words[i] gets you)?
If I understand your task correctly, wouldn't it suffice to
++word_map[words];
instead?
Also, why do you return words? It's a string, and your function should return an int. Instead, find the largest value in your map and you're done.
Sorry for the title, but I really have no idea what the problem is. The code looks like this (here it makes no sense, but in the bigger project it does, so please do not ask "why do you want to do....")
#include <iostream>
#include <vector>
#include <fstream>
using namespace std;
string sort (string slowo){
string litery = slowo;
for (int i=0; i<litery.length()-1; i++)
for (int j=0; j<litery.length()-1; j++)
if (litery[j]>litery[j+1])
swap(litery[j], litery[j+1]); // (3)
return litery;
}
int main()
{
fstream wordlist;
wordlist.open("wordlist_test",ios::in);
vector<string> words;
while (!wordlist.eof()){ // (4)
bool ok = true;
string word;
getline(wordlist,word);
string sorted = sort(word);
if (ok){
cout<<word<<endl; // (1)
words.push_back(word);
}
}
for (int i = 0; i<words.size(); i++){
cout<<words[i]<<endl; // (2)
}
}
There are four words in the file "wordlist_test". In the end the program should just store them in the vector and write the vector's contents to standard output. The problem is:
the vector appears to be empty at line (2), even though line (1) proves that all the words are OK.
Now the interesting (probably just for me) part:
there are two ways to make it right:
I can just remove line(3) (however, if I am right, as the variable is passed to sort function through the value, it just swap two letters in independent variable; it has nothing to do with my vector), or:
I can change condition in while loop (4).
for example just like this:
int tmp = 0;
while (tmp < 5){
tmp++;
/..../
What is wrong with this code? What should I do to store these words in the vector while still sorting them and using this while loop? I cannot find the connection between these things (OK, I see that the connection is the variable word, but I do not know in what way). Any help appreciated.
What happens in sort() if one of the words is the empty string ""?
If this happens, litery = "".
The condition in the loops will be to iterate from 0 to (unsigned) 0 - 1, which is a very large number.
You'll then execute if (litery[0] > litery[1])
litery[1] will access beyond the end of the empty string, which causes undefined behavior.
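The wrap-around is easy to see in isolation. The small helper below is hypothetical (the name loopBound is made up here); it only evaluates the same expression that the loop condition uses.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Returns what the loop bound `litery.length() - 1` evaluates to for a given string.
std::size_t loopBound(const std::string& s)
{
    return s.length() - 1;   // unsigned arithmetic: wraps around when s is empty
}
```

For an empty string the result equals std::numeric_limits<std::size_t>::max(), so the comparison against an int loop counter is true for every value the counter can take.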
Let's fix this:
The common fix for this is to iterate from 1 to litery.length(). Here's an example:
string sort (string litery){
for (int i=1; i<litery.length(); i++)
for (int j=1; j<litery.length(); j++)
if (litery[j-1]>litery[j])
swap(litery[j-1], litery[j]);
return litery;
}
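As for where the empty string probably comes from: with while (!wordlist.eof()), the eof flag is only set once a read has already failed, so if the file ends with a newline the last getline() call typically leaves word empty. A hedged sketch of a reading loop that avoids this is shown below; the function name readWords is an assumption, and it takes any std::istream so it is not tied to the file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads all non-empty lines from `in`.
std::vector<std::string> readWords(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (std::getline(in, word)) {   // the loop ends as soon as a read fails
        if (word.empty())
            continue;                  // skip blank lines
        words.push_back(word);
    }
    return words;
}
```

Passing the opened fstream from main() would work the same way as the std::istringstream used for a quick check.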
I've got a task that I'm stuck on. I need to create a program that reads an input file, stores each word into a vector along with how many times that word was read (hence the struct). Those values then need to print out in alphabetical order.
I've come up with something that I think is along the right lines:
struct WordInfo {
string text;
int count;
} uwords, temp;
string word;
int count = 0; //ignore this. For a different part of the task
vector<WordInfo> uwords;
while (cin >> word) {
bool flag = false;
count += 1;
for (int i = 0; i < uwords.size(); i++) {
if (uwords[i].text == word) {
flag = true;
uwords[i].count += 1;
}
}
if (flag == false) {
if (count == 1) { //stores first word into vector
uwords.push_back(WordInfo());
uwords[0].count = 1;
uwords[0].text = word;
} else {
for (int i = 0; i < uwords.size(); i++) {
if (word < uwords[i].text) {
uwords.push_back(WordInfo());
WordInfo temp = {word, 1};
uwords.insert(uwords.begin() + i, temp);
}
}
}
}
}
Now the problem I'm having, is that when I run the program it appears to get stuck in an infinite loop and I can't see why. Although I've done enough testing to realise it's probably in that last if statement, but my attempts to fix it were no good. Any help is appreciated. Cheers.
EDIT: I forgot to mention, we must use vector class and we're limited in what we can use, and sort is not an option :(
Take a good look at this piece of code:
if (word < uwords[i].text) {
    uwords.push_back(WordInfo());
    WordInfo temp = {word, 1};
    uwords.insert(uwords.begin() + i, temp);
}
First, it will actually insert 2 words into your list; one time an "empty" one with push_back, and one time with insert. And it will do that whenever the current word is smaller than the one at the position i.
And as soon as it has inserted, there's 2 new elements to walk over; one actually being at the current position of i, so in the next iteration, we will again compare the same word - so your loop gets stuck because index i increases by 1 each iteration, but the increase of i only steps over the just inserted element!
For a quick solution, you want to (1) search for the position where the word before is "smaller" than the current one, but the next one is bigger. Something like
if (uwords[i-1].text < word && word < uwords[i].text) {
(2) and you want to get rid of the push_back call.
Furthermore, (3) you can break the loop after the if condition was true - you have already inserted then, no need to iterate further. And (4), with a bit of condition tweaking, the count == 1 can actually be merged into the loop. Modified code part (will replace your whole if (flag == false) block - warning, not tested yet):
if (!flag) {
for (int i = 0; i <= uwords.size(); ++i) {
if ((i == 0 || uwords[i-1].text < word) &&
(i == uwords.size() || word < uwords[i].text)) {
WordInfo temp = {word, 1};
uwords.insert(uwords.begin() + i, temp);
break;
}
}
}
You should not push your words into a vector, but into a map:
std::map<std::string,int>
Since a map keeps its keys ordered, iterating over it automatically yields a sorted range, which can later be copied into a vector if needed.
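A minimal sketch of that suggestion, assuming the task allows std::map at all. The function name countSorted and the use of std::pair instead of the WordInfo struct are choices made here for brevity.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Counts the words in `in` and returns (word, count) pairs in alphabetical order.
std::vector<std::pair<std::string, int>> countSorted(std::istream& in)
{
    std::map<std::string, int> counts;   // keys are kept sorted by the map
    std::string word;
    while (in >> word)
        ++counts[word];
    // Copying the map's range into a vector preserves the sorted order.
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```

Calling it with std::cin gives the same input handling as the loop in the question, without any manual insertion logic.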
I am quite new to C++ and have a recent homework assignment in which I need to store the 1000 most common words in a string array. I was wondering how I would go about this. Here is my example code so far:
if(infile.good() && outfile.good())
{
//save 1000 common words to a string
for (int i=0; i<1000; i++)
{
totalWordsIn1000MostCommon++;
break;
}
while (infile.good())
{
string commonWords[1000];
infile >> commonWords;
}
}
Thanks!
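#include <cstdio>
#include <iostream>
#include <string>
using namespace std;
// inside main(): redirect stdin to the input file, then read up to 1000 lines
freopen(inputfileName,"r",stdin);
const int words = 1000;
string myArr[words];
for(int i=0;i<words;i++){
    string line;
    getline(cin,line);
    myArr[i] = line;
}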
The for loop at the beginning of your code does nothing; it just breaks on the first iteration. It would be better to read up on how loops work in C++. Also take a look at variable scope in C++. In your case commonWords is declared inside the while loop, so it is created anew and destroyed on each loop iteration.
What you want is something like this:
int i = 0;
std::string commonWords[1000];
// Put the read in the condition so a failed read at end of file is not counted.
while (i < 1000 && infile >> commonWords[i]) {
    ++i;
}
I'm leaving the remaining part for you to complete your homework.
I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
{
words[j].Word = textFile[i];
words[j].Times = index++;
}
}
index = 0;
}
Any help would be appreciated.
Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.
Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
WordFrequencies wf;
for (std::string const& word: words) {
wf[word] += 1;
}
return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can write custom comparison operations to treat them case-insensitively.
Try this code instead if you do not want to use a map container:
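One way such a custom comparison could look, as a sketch only: the comparator name CaseInsensitiveLess, the typedef SortedWordFrequencies and the function countIgnoringCase are invented for this example, and the comparison handles ASCII letters only.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings alphabetically, ignoring ASCII case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

typedef std::map<std::string, unsigned, CaseInsensitiveLess> SortedWordFrequencies;

// Same counting loop as above, but "Pear" and "pear" share one entry.
SortedWordFrequencies countIgnoringCase(std::vector<std::string> const& words)
{
    SortedWordFrequencies wf;
    for (std::string const& word : words)
        wf[word] += 1;
    return wf;
}
```

Note that the key keeps whichever spelling was seen first; lower-casing the word before inserting it would be the alternative if a uniform spelling matters.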
struct wordFreq{
string word;
int count;
wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
for(;i<j;i++){
if((*i).word == s)
return 1;
}
return 0;
}
The code for finding the number of occurrences in the textfile vector is then:
for(int i=0; i< textfile.size();i++){
if(ffind(words.begin(),words.end(),textfile[i])) // Check whether word already checked for, if so move to the next one, i.e. avoid repetitions
continue;
words.push_back(wordFreq(textfile[i],1)); // Add the word to vector as it was not checked before and set its count to 1
for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
if(textfile[j] == (*(words.end()-1)).word)
(*(words.end()-1)).count++;
}
}