Unexpected Values Being Read Into Memory with getline() - C++

Unexpected Values Being Read Into Memory with getline() - C++ - c++

I wrote this program for an intro to C++ course. My issue is that unexpected values are being stored in memory. I assume it has to do with input.getline() or the way certain characters are stored, but I don't know enough about what is happening "under the hood" to fix it.
Specifically, certain characters like apostrophes and quotation marks appear to not read as their hex ASCII counterparts.
I'm pretty certain the issue lies in the lines
input.getline(raw_paragraph, MAX_PARAGRAPH_CHARS);
charCount = strlen(raw_paragraph);
Below I've included the complete code, a screenshot of the Memory from Visual Studio 2022, the test case, and the program output .
Thank you in advance!
#pragma warning(disable : 4996) //DEV
/**************************************************************************************
Header Content
**************************************************************************************/
// Includes and namespaces ------------------------------------------------------------
#include <cstdlib> // Defines functions such as exit().
#include <cstring> // Defines functions such as strcmp, etc.
#include <fstream> // Supports file I/O
#include <iostream> // Supports terminal I/O
using namespace std;
// Constants Declared -----------------------------------------------------------------
// Maximum allowable space for input / Defines space for memory allocation
const int MAX_WORD_CHARS = 50; // Longest word = 50 chars
const int MAX_WORDS = 1000; // Longest paragraph = 1000 words
const int MAX_PARAGRAPH_CHARS = 50000; // 50 * 1000
// "to be" Semantics
const char TO[] = "to";
const char BE[] = "be";
const int NUM_TO_BE_VERBS = 5; // Qty of "to be verbs below
const char TO_BE_VERBS[NUM_TO_BE_VERBS][MAX_WORD_CHARS] =
{ "am", "are", "is", "was", "were" };
// Conjunctions
const int NUM_CONJUNCTIONS = 7; // Qty objects in CONJUNCTIONS below.
const char CONJUNCTIONS[NUM_CONJUNCTIONS][MAX_WORD_CHARS] =
{ "for", "and", "nor", "but", "or", "yet", "so" };
// Punctuation
const int NUM_PUNCTUATIONS = 4;
const char PUNCTUATIONS[NUM_PUNCTUATIONS] = { '.', ',', '?', '!' };
// Functions Declared -----------------------------------------------------------------
int countComplex(char a[][MAX_WORD_CHARS], int b);
int countSentences(char a[], int b);
int count_to_be_verbs(char a[][MAX_WORD_CHARS], int wc);
void init_array(char* a);
void modify_tokens(char a[][MAX_WORD_CHARS], int wc);
int tokenizeParagraph(char p[], char tp[][MAX_WORD_CHARS]);
/**************************************************************************************
Begin Main
**************************************************************************************/
int main()
{
// Format Output
cout.setf(ios::fixed);
cout.setf(ios::showpoint);
cout.precision(1);
// Create input space for user's file request
char filename[256]; // Stores the user defined filename containing plaintext
init_array(filename);
char raw_paragraph[MAX_PARAGRAPH_CHARS]; // Stores plaintext from filename
init_array(raw_paragraph);
// Declare Variables
int charCount = 0; // Number of chars contained in input file except eof.
int complex_count; // Number of complex sentences
int sentenceCount = 0; // Total number of sentences in input
int simpleSent = 0; // Number of simple sentences in input
int to_be_count; // Number of instances of "to be" verbs in input.
int wordCount = 0; // Number of words in input
double averageWordsPerSentence;
// Asks the user for the name of an input file which contains a paragraph
cout << "Enter a filename: ";
cin.getline(filename, 256);
// Try to load the file in filename:
ifstream input;
input.open(filename);
// If file does not exist, cout error then exit(1)
if (input.fail())
{
cout << "Input file " << filename << " does not exist." << endl;
cout << "Thank you for using the English Analyzer." << endl;
exit(1);
}
// If file is empty, cout "Input file _____ is empty." Then exit(1)
char c;
input.get(c);
if (input.eof())
{
cout << "File " << filename << " is empty." << endl;
cout << "Thank you for using the English Analyzer." << endl;
exit(1);
}
else
input.putback(c);
// Store plaintext from file to raw_paragraph
input.getline(raw_paragraph, MAX_PARAGRAPH_CHARS);
// Close ifstream input, will not need it again.
input.close();
// Allocate memory for the output of tokenizeParagraph
char tkn_para[MAX_WORDS][MAX_WORD_CHARS];
// Count chars
charCount = strlen(raw_paragraph);
// Tokenize paragraph, count words
wordCount = tokenizeParagraph(raw_paragraph, tkn_para);
// Count Sentences
sentenceCount = countSentences(raw_paragraph, charCount);
// Average words per sentence
averageWordsPerSentence = double(wordCount) / double(sentenceCount);
// Count Complex Sentences
complex_count = countComplex(tkn_para, wordCount);
// Calculate Simple Sentences
simpleSent = sentenceCount - complex_count;
// Count "to be" verbs
modify_tokens(tkn_para, wordCount);
to_be_count = count_to_be_verbs(tkn_para, wordCount);
// Cout results
cout << "Number of Characters: " << charCount << endl;
cout << "Number of words: " << wordCount << endl;
cout << "Number of sentences: " << sentenceCount << endl;
cout << "Average number words in a sentence: " << averageWordsPerSentence << endl;
cout << "Number of simple sentences: " << simpleSent << endl;
cout << "Number of \"to be\" verbs: " << to_be_count << endl;
}
/**************************************************************************************
Function Definitions
**************************************************************************************/
int countComplex(char a[][MAX_WORD_CHARS], int b)
{
// counter will keep the number of complex sentences found.
int counter = 0;
// For each word in tkn_para,
for (int i = 0; i < b; i++)
{
// If a comma is at the end of tkn_m,
int s = strlen(a[i]) -1;
if (a[i][s] == ',')
{
// For each word in CONJUNCTIONS
for (int x = 0; x < NUM_CONJUNCTIONS; x++)
{
// If the words match,
if (strcmp(a[i + 1], CONJUNCTIONS[x]) == 0)
{
// Increment counter
counter++;
// If a word from a has already been matched, there
// is no reason to try to compare it to more items
// from CONJUNCTIONS. Therefore,
break;
}
}
}
}
// After all iteration has been completed:
return(counter);
}
int countSentences(char a[], int b)
{
int counter = 0;
// For each char in a[]
for (int i = 0; i < b; i++)
{
// If a[i] is an end of sentence punctuation,
if (a[i] == '.' || a[i] == '?' || a[i] == '!')
// Increment counter
counter++;
}
return counter;
}
int count_to_be_verbs(char a[][MAX_WORD_CHARS], int wc)
{
int counter = 0;
// For each word in a:
for (int i = 0; i < wc; i++)
// For each word in TO_BE_VERBS:
for (int y = 0; y < NUM_TO_BE_VERBS; y++)
{
// If words match:
if (strcmp(a[i], TO_BE_VERBS[y]) == 0)
counter++;
}
// For loop checks for "to" token followed by "be"
for (int i = 0; i < wc; i++)
if (strcmp(a[i], TO) == 0 && strcmp(a[i + 1], BE) == 0)
counter++;
return(counter);
}
void init_array(char* a)
{
// For every char in a:
for (int i = 0; i < strlen(a); i++)
// Set the value of a[i] to NULL
a[i] = NULL;
}
void modify_tokens(char a[][MAX_WORD_CHARS], int wc)
{
// For each word in a:
for (int i = 0; i < wc; i++)
{
// Does the computation once instead of 4 times below.
int s = strlen(a[i]) -1;
// Converts first char if uppercase, into lowercase
if (int('#') < a[i][0] && a[i][0] < int('['))
a[i][0] = a[i][0] + 32;
// Convert last char, if punctuation mark, into NULL
if (a[i][s] == ',' || a[i][s] == '!' || a[i][s] == '?' || a[i][s] == '.')
a[i][s] = NULL;
}
}
int tokenizeParagraph(char p[], char tp[][MAX_WORD_CHARS])
{
int i = 0;
char* cPtr;
cPtr = strtok(p, " \n\t");
while (cPtr != NULL)
{
strcpy(tp[i], cPtr);
i++;
cPtr = strtok(NULL, " \n\t");
}
return(i);
}

Turns out that the issue has to do with copying the test case into MS Word or another like application. There are character equivalents to apostrophes and quotation marks that "lean" left or right. Those characters are actually distinct and are responsible for the memory values I've been encountering. It suggests to me that a future iteration of the code would have to parse the raw input for those types of characters and replace them.

Related

Converting this program from 1 function to a modular, multi-function program?

I am very new to C++ and have a program which I included below.
The program I am working on reads text from an input file and counts the number of words and number of occurrences of each letter in the text and then prints the results. My program is working fine but the problem is all code is written in the main function and I need to break it up into a couple more functions to make the program modular, but I am unsure of how to go about doing this.
I am sure this is pretty simple but I'm not sure where to start. I was thinking of implementing two void functions, one for reading / interpreting what is read from the data file and another that displays the results; and then call them both in the main function, but I'm not sure what to take as arguments for those functions.
int main()
{
// Declaring variables
char c; // char that will store letters of alphabet found in the data file
int count[26] = {0}; // array that will store the # of occurences of each letter
int words = 1; // int that will store the # of words
string s; // declaring string found in data file
// Opening input file stream
ifstream in;
in.open("word_data.txt");
// Reading text from the data file
getline(in, s);
//cout << s << endl;
// If input file fails to open, displays an error message
if (in.fail())
{
cout << "Input file did not open correctly" << endl;
}
// For loop for interpreting what is read from the data file
for (int i = 0; i < s.length(); i++) {
// Increment word count if new line or space is found
if (s[i] == ' ' || s[i] == '\n')
words++;
//If upper case letter is found, convert to lower case.
if (s[i] >= 'A' && s[i] <= 'Z')
s[i] = (tolower(s[i]));
//If the letters are found, increment the counter for each letter.
if (s[i] >= 'a' && s[i] <= 'z')
count[s[i] - 97]++;
}
// Display the words count
cout << words << " words" << endl;
// Display the count of each letter
for (int i = 0; i < 26; i++) {
if (count[i] != 0) {
c = i + 97;
cout << count[i] << " " << c << endl;
}
}
// Always close opened files
in.close();
return 0;
}

I would rewrite it like:
class FileReader {
public:
FileReader() {
// Any init logic goes here...
}
~FileReader() {
// Always close opened files.
in.close();
}
void open(std::string &filePath) {
in.open(filePath);
}
std::string readLine() {
std::string s;
getline(in, s);
return s;
}
bool hasErrors() const { // remove const if you get compile-error here.
return in.fail();
}
private:
ifstream in;
};
class LetterCounter {
public:
void process(std::string &s) {
// For loop for interpreting what is read from the data file
for (int i = 0; i < s.length(); i++) {
// Increment word count if new line or space is found
if (s[i] == ' ' || s[i] == '\n')
words++;
//If upper case letter is found, convert to lower case.
if (s[i] >= 'A' && s[i] <= 'Z')
s[i] = (tolower(s[i]));
//If the letters are found, increment the counter for each letter.
if (s[i] >= 'a' && s[i] <= 'z')
count[s[i] - 97]++;
}
}
void logResult() {
char c; // char that will store letters of alphabet found in the data file.
// Display the words count
cout << words << " words" << endl;
// Display the count of each letter
for (int i = 0; i < 26; i++) {
if (count[i] != 0) {
c = i + 97;
cout << count[i] << " " << c << endl;
}
}
}
private:
int count[26] = {0}; // array that will store the # of occurences of each letter
int words = 1; // int that will store the # of words
};
int main()
{
// Opening input file stream.
FileReader reader;
reader.open("word_data.txt");
// Reading text from the data file.
std::string s = reader.readLine();
// If input file fails to open, displays an error message
if (reader.hasErrors()) {
cout << "Input file did not open correctly" << endl;
return -1;
}
LetterCounter counter;
counter.process(s);
// Display word and letter count.
counter.logResult();
return 0;
}
Note that I did write without testing (excuse any mistake),
but this should give you a general idea how it should be.

How to count words in a file?

I'm creating a program that counts how many words there are in a input file. I can't seem to figure out how to make it define a word with either whitespace, a period, a comma, or the beginning or end of a line.
Contents of input file:
hello world ALL is great. HELLO WORLD ALL IS GREAT. hellO worlD alL iS great.
Output should be 15 words meanwhile my output is 14
I've tried adding or's that include periods, commas etc. but it just counts those on top of the spaces as well.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
//Function Declarations
void findFrequency(int A[], string &x);
void findWords(int A[], string &x);
//Function Definitions
void findFrequency(int A[], string &x)
{
//Counts the number of occurences in the string
for (int i = 0; x[i] != '\0'; i++)
{
if (x[i] >= 'A' && x[i] <= 'Z')
A[toascii(x[i]) - 64]++;
else if (x[i] >= 'a' && x[i] <= 'z')
A[toascii(x[i]) - 96]++;
}
//Displaying the results
char ch = 'a';
for (int count = 1; count < 27; count++)
{
if (A[count] > 0)
{
cout << A[count] << " : " << ch << endl;
}
ch++;
}
}
void findWords(int A[], string &x)
{
int wordcount = 0;
for (int count = 0; x[count] != '\0'; count++)
{
if (x[count] == ' ')
{
wordcount++;
A[0] = wordcount;
}
}
cout << A[0] << " Words " << endl;
}
int main()
{
string x;
int A[27] = { 0 }; //Array assigned all elements to zero
ifstream in; //declaring an input file stream
in.open("mytext.dat");
if (in.fail())
{
cout << "Input file did not open correctly" << endl;
}
getline(in,x);
findWords(A, x);
findFrequency(A, x);
in.close();
system("pause");
return 0;
}
The output should be 15 when the result I am getting is 14.

Perhaps this is what you need?
size_t count_words(std::istream& is) {
size_t co = 0;
std::string word;
while(is >> word) { // read a whitespace separated chunk
for(char ch : word) { // step through its characters
if(std::isalpha(ch)) {
// it contains at least one alphabetic character so
// count it as a word and move on
++co;
break;
}
}
}
return co;
}

Here is an approach with a few test cases as well.
The test cases are a series of char arrays with particular strings to test the findNextWord() method of the RetVal struct/class.
char line1[] = "this is1 a line. \t of text \n "; // multiple white spaces
char line2[] = "another line"; // string that ends with zero terminator, no newline
char line3[] = "\n"; // line with newline only
char line4[] = ""; // empty string with no text
And here is the actual source code.
#include <iostream>
#include <cstring>
#include <cstring>
struct RetVal {
RetVal(char *p1, char *p2) : pFirst(p1), pLast(p2) {}
RetVal(char *p2 = nullptr) : pFirst(nullptr), pLast(p2) {}
char *pFirst;
char *pLast;
bool findNextWord()
{
if (pLast && *pLast) {
pFirst = pLast;
// scan the input line looking for the first non-space character.
// the isspace() function indicates true for any of the following
// characters: space, newline, tab, carriage return, etc.
while (*pFirst && isspace(*pFirst)) pFirst++;
if (pFirst && *pFirst) {
// we have found a non-space character so now we look
// for a space character or the end of string.
pLast = pFirst;
while (*pLast && ! isspace(*pLast)) pLast++;
}
else {
// indicate we are done with this string.
pFirst = pLast = nullptr;
}
}
else {
pFirst = nullptr;
}
// return value indicates if we are still processing, true, or if we are done, false.
return pFirst != nullptr;
}
};
void printWords(RetVal &x)
{
int iCount = 0;
while (x.findNextWord()) {
char xWord[128] = { 0 };
strncpy(xWord, x.pFirst, x.pLast - x.pFirst);
iCount++;
std::cout << "word " << iCount << " is \"" << xWord << "\"" << std::endl;
}
std::cout << "total word count is " << iCount << std::endl;
}
int main()
{
char line1[] = "this is1 a line. \t of text \n ";
char line2[] = "another line";
char line3[] = "\n";
char line4[] = "";
std::cout << "Process line1[] \"" << line1 << "\"" << std::endl;
RetVal x (line1);
printWords(x);
std::cout << std::endl << "Process line2[] \"" << line2 << "\"" << std::endl;
RetVal x2 (line2);
printWords(x2);
std::cout << std::endl << "Process line3[] \"" << line3 << "\"" << std::endl;
RetVal x3 (line3);
printWords(x3);
std::cout << std::endl << "Process line4[] \"" << line4 << "\"" << std::endl;
RetVal x4(line4);
printWords(x4);
return 0;
}
And here is the output from this program. In some cases the line to be processed has a new line in it which affects the output by performing a new line when printed to the console.
Process line1[] "this is1 a line. of text
"
word 1 is "this"
word 2 is "is1"
word 3 is "a"
word 4 is "line."
word 5 is "of"
word 6 is "text"
total word count is 6
Process line2[] "another line"
word 1 is "another"
word 2 is "line"
total word count is 2
Process line3[] "
"
total word count is 0
Process line4[] ""
total word count is 0
If you need to treat punctuation similar to white space, as something to be ignored, then you can modify the findNextWord() method to include the ispunct() test of characters in the loops as in:
bool findNextWord()
{
if (pLast && *pLast) {
pFirst = pLast;
// scan the input line looking for the first non-space character.
// the isspace() function indicates true for any of the following
// characters: space, newline, tab, carriage return, etc.
while (*pFirst && (isspace(*pFirst) || ispunct(*pFirst))) pFirst++;
if (pFirst && *pFirst) {
// we have found a non-space character so now we look
// for a space character or the end of string.
pLast = pFirst;
while (*pLast && ! (isspace(*pLast) || ispunct (*pLast))) pLast++;
}
else {
// indicate we are done with this string.
pFirst = pLast = nullptr;
}
}
else {
pFirst = nullptr;
}
// return value indicates if we are still processing, true, or if we are done, false.
return pFirst != nullptr;
}
In general if you need to refine the filters for the beginning and ending of words, you can modify those two places with some other function that looks at a character and classifies it as either a valid character for a word or not.

How to debug my word search code in C++?

I need some help with my C++ word search code. I got it to work but not as what I intended.
When I run the code it scans my text file I inputted and only output one of the words in the text file that match the array in my code. When I add other words to the text file that is in the array it gives me a error.
And can someone please help me change it so it more in c++ code?
This is what I want to look like:
And this is what I get:
#include <fstream>
#include <iostream>
#include <cmath>
#include <iomanip>
#define SIZE 30
using namespace std;
const char *Phish[SIZE] ={"Amazon","official","bank","security",
"urgent","Alert","important","inform ation", "ebay", "password", "credit", "verify",
"confirm", "account","bill", "immediately", "address", "telephone","SSN", "charity",
"check", "secure", "personal", "confidential",
"ATM", "warning","fraud","Citibank","IRS", "paypal"};
int point[SIZE] = {2,2,1,1,1,1,2,2,3,3,3,1,1,1,1,1,2,2,3,2,1,1,1,1,2,2,2,2,2,1};
int totalPoints[SIZE];
void outputResults();
int main(void)
{
FILE *cPtr;
char filename[100];
char message[5000];
char *temp[100];
int i;
int counter=0;
int words=0;
char *tokenPtr;
cout << "Enter the name of the file to be read: \n";
cin >> filename;
if ( (cPtr = fopen(filename,"rb")) == NULL)
{
cout <<"File cannot be opened.\n";
}
else
{
fgets(message, 5000, cPtr);
tokenPtr = strtok(message, " ");
temp[0] = tokenPtr;
while (tokenPtr!=NULL)
{
for(i=0; i< SIZE; i++)
{
if(strncmp(temp[0], Phish[i], strlen(Phish[i]))==0)
{
totalPoints[i]++;
break;
}
tokenPtr =strtok(NULL, " ");
temp[0] = tokenPtr;
words++;
}
outputResults();
cout << "\n";
return 0;
}
}
}
void outputResults()
{
int i;
int count =0;
int a;
cout<<left<<setw(5) << "WORD "
<< setw(7)<<"# OF OCCURRENCE "
<< setw(15)<<"POINT TOTAL";
for(i=0; i<SIZE; i++)
{
if(totalPoints[i] !=0)
{
cout<<"\n"<<left << setw(10)<< Phish[i]
<< setw(11)<< totalPoints[i]
<< setw(13)<< point[i]*totalPoints[i];
count += point[i] * totalPoints[i];
}
}
cout<< "\nPoint total for entire file: \n"<< count;
}

First, your for loop is not written correctly. What you want to do is take a word from the file, and search your phishing array for that word. Right now, you are going through your phishing array without starting at the beginning of this array when you get a word.
Instead you should be taking a word, and then loop through the phishing array to see if the word exists. If it's not found, then get the next word, check if it's in the array starting from the beginning, etc.
fgets(message, 5000, cPtr);
tokenPtr = strtok(message, " ");
while (tokenPtr != NULL)
{
// start the search in the array
for (i = 0; i < SIZE; i++)
{
if (strncmp(tokenPtr, Phish[i], strlen(Phish[i])) == 0)
{
totalPoints[i]++;
break;
}
}
// get the next word in the file
tokenPtr = strtok(NULL, " ");
words++;
}
outputResults();
You also don't need temp[0] at all in the program. Just use tokenPtr.

Trouble with dynamic arrays and string occurence (C++)

I am working on a lab for my C++ class. I have a very basic working version of my lab running, however it is not quite how it is supposed to be.
The assignment:
Write a program that reads in a text file one word at a time. Store a word into a dynamically created array when it is first encountered. Create a parallel integer array to hold a count of the number of times that each particular word appears in the text file. If the word appears in the text file multiple times, do not add it into your dynamic array, but make sure to increment the corresponding word frequency counter in the parallel integer array. Remove any trailing punctuation from all words before doing any comparisons.
Create and use the following text file containing a quote from Bill Cosby to test your program.
I don't know the key to success, but the key to failure is trying to please everybody.
At the end of your program, generate a report that prints the contents of your two arrays in a format similar to the following:
Word Frequency Analysis
Word Frequency
I 1
don't 1
know 1
the 2
key 2
...
I can figure out if a word repeats more than once in the array, but I cannot figure out how to not add/remove that repeated word to/from the array. For instance, the word "to" appears three times, but it should only appear in the output one time (meaning it is in one spot in the array).
My code:
using namespace std;
int main()
{
ifstream file;
file.open("Quote.txt");
if (!file)
{
cout << "Error: Failed to open the file.";
}
else
{
string stringContents;
int stringSize = 0;
// find the number of words in the file
while (file >> stringContents)
{
stringSize++;
}
// close and open the file to start from the beginning of the file
file.close();
file.open("Quote.txt");
// create dynamic string arrays to hold the contents of the file
// these will be used to compare with each other the frequency
// of the words in the file
string *mainContents = new string[stringSize];
string *compareContents = new string[stringSize];
// holds the frequency of each word found in the file
int frequency[stringSize];
// initialize frequency array
for (int i = 0; i < stringSize; i++)
{
frequency[i] = 0;
}
stringContents = "";
cout << "Word\t\tFrequency\n";
for (int i = 0; i < stringSize; i++)
{
// if at the beginning of the iteration
// don't check for the reoccurence of the same string in the array
if (i == 0)
{
file >> stringContents;
// convert the current word to a c-string
// so we can remove any trailing punctuation
int wordLength = stringContents.length() + 1;
char *word = new char[wordLength];
strcpy(word, stringContents.c_str());
// set this to no value so that if the word has punctuation
// needed to remove, we can modify this string
stringContents = "";
// remove punctuation except for apostrophes
for (int j = 0; j < wordLength; j++)
{
if (ispunct(word[j]) && word[j] != '\'')
{
word[j] = '\0';
}
stringContents += word[j];
}
mainContents[i] = stringContents;
compareContents[i] = stringContents;
frequency[i] += 1;
}
else
{
file >> stringContents;
int wordLength = stringContents.length() + 1;
char *word = new char[wordLength];
strcpy(word, stringContents.c_str());
// set this to no value so that if the word has punctuation
// needed to remove, we can modify this string
stringContents = "";
for (int j = 0; j < wordLength; j++)
{
if (ispunct(word[j]) && word[j] != '\'')
{
word[j] = '\0';
}
stringContents += word[j];
}
// stringContents = "dont";
//mainContents[i] = stringContents;
compareContents[i] = stringContents;
// search for reoccurence of the word in the array
// if the array already contains the word
// don't add the word to our main array
// this is where I am having difficulty
for (int j = 0; j < stringSize; j++)
{
if (compareContents[i].compare(compareContents[j]) == 0)
{
frequency[i] += 1;
}
else
{
mainContents[i] = stringContents;
}
}
}
cout << mainContents[i] << "\t\t" << frequency[i];
cout << "\n";
}
}
file.close();
return 0;
}
I apologize if the code is difficult to understand/follow through. Any feedback is appreciated :]

If you use stl, the entire problem can be solved easily, with less coding.
#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
#include <algorithm>
using namespace std;
int main()
{
ifstream file("Quote.txt");
string aword;
unordered_map<string,int> wordFreq;
if (!file.good()) {
cout << "Error: Failed to open the file.";
return 1;
}
else {
while( file >> aword ) {
aword.erase(remove_if(aword.begin (), aword.end (), ::ispunct), aword.end ()); //Remove Punctuations from string
unordered_map<string,int>::iterator got = wordFreq.find(aword);
if ( got == wordFreq.end() )
wordFreq.insert(std::make_pair<string,int>(aword.c_str(),1)); //insert the unique strings with default freq 1
else
got->second++; //found - increment freq
}
}
file.close();
cout << "\tWord Frequency Analyser\n"<<endl;
cout << " Frequency\t Unique Words"<<endl;
unordered_map<string,int>::iterator it;
for ( it = wordFreq.begin(); it != wordFreq.end(); ++it )
cout << "\t" << it->second << "\t\t" << it->first << endl;
return 0;
}

The algorithm that you use is very complex for such a simple task. Here is what you sahll do:
Ok, first reading pass for determining the maximum size of the
array
Then second reading pass, look directly at what to do: if string is already in the table just increment its frequency, otherwise add it to the table.
Output the table
The else block of your code would then look like:
string stringContents;
int stringSize = 0;
// find the number of words in the file
while (file >> stringContents)
stringSize++;
// close and open the file to start from the beginning of the file
file.close();
file.open("Quote.txt");
string *mainContents = new string[stringSize]; // dynamic array for strings found
int *frequency = new int[stringSize]; // dynamic array for frequency
int uniqueFound = 0; // no unique string found
for (int i = 0; i < stringSize && (file >> stringContents); i++)
{
//remove trailing punctuations
while (stringContents.size() && ispunct(stringContents.back()))
stringContents.pop_back();
// process string found
bool found = false;
for (int j = 0; j < uniqueFound; j++)
if (mainContents[j] == stringContents) { // if string already exist
frequency[j] ++; // increment frequency
found = true;
}
if (!found) { // if string not found, add it !
mainContents[uniqueFound] = stringContents;
frequency[uniqueFound++] = 1; // and increment number of found
}
}
// display results
cout << "Word\t\tFrequency\n";
for (int i=0; i<uniqueFound; i++)
cout << mainContents[i] << "\t\t" << frequency[i] <<endl;
}
Ok, it's an assignment. So you have to use arrays. Later you could sumamrize this code into:
string stringContents;
map<string, int> frequency;
while (file >> stringContents) {
while (stringContents.size() && ispunct(stringContents.back()))
stringContents.pop_back();
frequency[stringContents]++;
}
cout << "Word\t\tFrequency\n";
for (auto w:frequency)
cout << w.first << "\t\t" << w.second << endl;
and even have the words sorted alphabetically.

Depending on whether or not your assignment requires that you use an 'array', per se, you could consider using a std::vector or even a System::Collections::Generic::List for C++/CLI.
Using vectors, your code might look something like this:
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
using namespace std;
int wordIndex(string); //Protoype a function to check if the vector contains the word
void processWord(string); //Prototype a function to handle each word found
vector<string> wordList; //The dynamic word list
vector<int> wordCount; //The dynamic word count
void main() {
ifstream file("Quote.txt");
if (!file) {
cout << "Error: Failed to read file" << endl;
} else {
//Read each word into the 'word' variable
string word;
while (!file.eof()) {
file >> word;
//Algorithm to remove punctuation here
processWord(word);
}
}
//Write the output to the console
for (int i = 0, j = wordList.size(); i < j; i++) {
cout << wordList[i] << ": " << wordCount[i] << endl;
}
system("pause");
return;
}
void processWord(string word) {
int index = wordIndex(word); //Get the index of the word in the vector - if the word isn't in the vector yet, the function returns -1.
//This serves a double purpose: Check if the word exsists in the vector, and if it does, what it's index is.
if (index > -1) {
wordCount[index]++; //If the word exists, increment it's word count in the parallel vector.
} else {
wordList.push_back(word); //If not, add a new entry
wordCount.push_back(1); //in both vectors.
}
}
int wordIndex(string word) {
//Iterate through the word list vector
for (int i = 0, j = wordList.size(); i < j; i++) {
if (wordList[i] == word) {
return i; //The word has been found. return it's index.
}
}
return -1; //The word is not in the vector. Return -1 to tell the program that the word hasn't been added yet.
}
I've tried to annotate any new code/concepts with comments to make it easy to understand, so hopefully you can find it useful.
As a side note, you may notice that I've moved a lot of the repetative code out of the main function and into other functions. This allows for more efficient and readable coding because you can divide each problem into easily manageable, smaller problems.
Hope this can be of some use.

Concatenation only works when it feels like it?

I am trying to get the following code to work:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <cmath>
using namespace std;
bool prime_test(int num);
void stringRotation(string& str);
int main()
{
vector<string> primes;
ifstream infile("PRIMES1T.txt");
// checks to see if there was any problems opening the .txt
if (infile.is_open()) {
string line = "";
while(getline(infile,line)) {
primes.push_back(line);
}
// rotates our string and tests if the number is still prime
vector<string> primes2;
for (int i = 0; i < primes.size(); i++) {
string str = primes[i];
for (int j = 0; j < str.length(); j++) {
stringRotation(str);
int value = atoi(str.c_str());
if (prime_test(value) == false) {
break;
}
if (j == str.length()-1) {
if (prime_test(value) == true) {
primes2.push_back(primes[i]);
}
}
}
}
cout << "There are " << primes2.size() << " primes that work.";
cout << endl;
}
else {
cout << "File failed to open." << endl;
}
return 0;
}
// tests to see if num is a prime number
bool prime_test(int num) {
if (num == 1) {
return false;
}
// Finds first integer value larger than the sqrt of num
// since that is all we really need.
double dnum = num;
double sqrt_dnum = sqrt(dnum);
int counter = ceil(sqrt_dnum);
for (int i = 2; i < counter; i++) {
if (num == 2) {
break;
}
if (num%i == 0) {
return false;
}
}
return true;
}
// rotates a string
void stringRotation(string& str) {
int len = str.length();
// converts a char variable into a string variable
stringstream ss;
string ch;
char c = str.at(0);
ss << c;
ss >> ch;
str = str.substr(1,str.length());
str = str.append(ch);
cout << str << endl;
}
What it does is it takes a prime number say 999983, cuts off the first digit 9, and then adds it to the end of the rest of the number so that it spits out the new number 999839. It then tests whether or not this new number is prime or not and repeats the process until the original number is returned. If the number is prime every time we do this process, then we add that number to the vector primes2.
The problem I have is that the stringRotation function does not work properly for some reason. I have tested it by trying to outputting the string before adding the digit that was removed and outputting the string after adding the digit. It does not concatenate properly. It will cut off the first digit in 999983 so that we have str = '99983' and ch = '9' but then when I do str.append(ch), it still gives me 99983. I have also tried variations like str = str.append(ch) and str = str + ch.
I have tried copying just the function over to a different .cpp file to compile only adding a declaration for str by setting str to "999983" and it works fine.
EDIT
I changed stringRotation to:
void stringRotation(string& str) {
int len = str.length();
char ch = str.at(0);
cout << ch << endl;
str = str.substr(1,str.length());
str.append(1,ch);
cout << str << endl;
}
but the problem still persists. I have also tried string.push_back(ch) with no luck.

In your programmer career, you will need to always make sure that your input is handled well. If you are loading data from a file which is not guaranteed to have a specific content scheme, you will always need to make sure that you prepare your data before parsing. In this particular case you need to make sure that your "numbers" are indeed numbers and execute your stringRotation on values which are guaranteed to be numbers.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Unexpected Values Being Read Into Memory with getline() - C++ - c++

Related

Converting this program from 1 function to a modular, multi-function program?

How to count words in a file?

How to debug my word search code in C++?

Trouble with dynamic arrays and string occurence (C++)

Concatenation only works when it feels like it?

Categories

Resources