Wordlist transfer for an anagram program

Wordlist transfer for an anagram program - c++

I'm almost finished with my program, but there's one last bug that I'm having problems ferreting out. The program is supposed to check about 10 scrambled words against a wordlist to see what the scrambled words are anagrams of. To do this, I alphabetized each word in the wordlist (apple would become aelpp), set that as the key of a map, and made the corresponding entry the original, unalphabetized word.
The program is messing up when it comes to the entries in the map. When the entry is six characters or less, the program tags a random character on the end of the string. I've narrowed down what can be causing the problem to a single loop:
while(myFile){
myFile.getline(str, 30);
int h=0;
for (; str[h] != 0; h++)//setting the initial version of str
{
strInit[h]=str[h]; //strInit is what becomes the entry into the map.
}
strInit[h+1]='\0'; //I didn't know if the for loop would include the null char
cout<<strInit; //Personal error-checking; not necessary for the program
}
And if it's necessary, here's the entire program:
Program

Prevent issues, use normal functions:
getline(str, 30);
strncpy(strInit, str, 30);
Prevent more issues, use standard strings:
std::string strInit, str;
while (std::getline(myFile, str)) {
strInit = str;
// do stuff
}

Best not to use raw C arrays at all! Here's a version, using modern C++:
#include <string>
std::string str;
while (std::getline(myFile, str))
{
// do something useful with str
// Example: mymap[str] = f(str);
std::cout << str; //Personal error-checking; not necessary for the program
}

Related

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.

It's rather simple, std::map automatically sorts based on key in the key/value pair you get. The key/value pair represents word/count which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting std::string using whitespace delimiter as it's the default delimiter. Therefore, using stream >> word you will get whitespace-separated words. However, this might not be enough due to punctuation. For example: Often, has comma which we need to filter out. Therefore, I used std::replaceif which replaces puncts and digits with whitespaces.
Now a new problem arises. In your example, you have: "must.Few" which will be returned as one word. After replacing . with we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;, this can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," will be "Often " with trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added ignorecase boolean to check if you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true, if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair, and increment value by 1 so you will have newly added word. If it exists already in the map, this will increment the existing value by 1 and hence it acts as a counter.
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
std::string to_lower(const std::string& str) {
std::string ret;
for (char c : str)
ret.push_back(tolower(c));
return ret;
}
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](const char& c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words seperated by whitespaces, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is the part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1

As others have already noted, an std::map handles the counting you care about quite easily.
Iostreams already have a tokenize to break an input stream up into words. In this case, we want to to only "think" of letters as characters that can make up words though. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly:
std::fill(&rc['a'], &rc['z'], std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'], std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";

Id probably start by inserting all of those words into an array of strings, then start with the first index of the array and compare that with all of the other indexes if you find matches, add 1 to a counter and after you went through the array you could display the word you were searching for and how many matches there were and then go onto the next element and compare that with all of the other elements in the array and display etc. Or maybe if you wanna make a parallel array of integers that holds the number of matches you could do all the comparisons at one time and the displays at one time.

EDIT:
Everyone's answer seems more elegant because of the map's inherent sorting. My answer functions more as a parser, that later sorts the tokens. Therefore my answer is only useful to the extent of a tokenizer or lexer, whereas Everyone's answer is only good for sorted data.
You first probably want to read in the text file. You want to use a streambuf iterator to read in the file(found here).
You will now have a string called content, which is the content of you file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string, and push it back into the wordString vector. (Note: that means that this will ignore non-letter characters, and that only spaces denote word separation.)
Now that we have a vector of all of our words in strings, we can use std::sort, to sort the vector in alphabetical order.(Note: capitalized words take precedence over lowercase words, and therefore will be sorted first.) Then we will iterate over our vector of stringWords and convert them into Word objects (this is a little heavy-weight), that will store their appearances and the word string. We will push these Word objects into a Word vector, but if we discover a repeat word string, instead of adding it into the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
if(currentChar == ' ')
{
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
std::sort(wordStringV.begin(), wordStringV.end(), std::less<std::string>());
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead do
for(int i = 0; i < v.size(); ++i)
{
char currentChar = vector[i];
}
It's important to note that if you are capitalization agnostic simply use std::tolower on the current += *it; statement (ie: current += std::tolower(*it);).
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.

How to read a complex input with istream&, string& and getline in c++?

I am very new to C++, so I apologize if this isn't a good question but I really need help in understanding how to use istream.
There is a project I have to create where it takes several amounts of input that can be on one line or multiple and then pass it to a vector (this is only part of the project and I would like to try the rest on my own), for example if I were to input this...
>> aaa bb
>> ccccc
>> ddd fff eeeee
Makes a vector of strings with "aaa", "bb", "ccccc", "ddd", "fff", "eeeee"
The input can be a char or string and the program stops asking for input when the return key is hit.
I know getline() gets a line of input and I could probably use a while loop to try and get the input such as...(correct me if I'm wrong)
while(!string.empty())
getline(cin, string);
However, I don't truly understand istream and it doesn't help that my class has not gone over pointers so I don't know how to use istream& or string& and pass it into a vector. On the project description, it said to NOT use stringstream but use functionality from getline(istream&, string&). Can anyone give somewhat of a detailed explanation as to how to make a function using getline(istream&, string&) and then how to use it in the main function?
Any little bit helps!

You're on the right way already; solely, you'd have to pre-fill the string with some dummy to enter the while loop at all. More elegant:
std::string line;
do
{
std::getline(std::cin, line);
}
while(!line.empty());
This should already do the trick reading line by line (but possibly multiple words on one line!) and exiting, if the user enters an empty line (be aware that whitespace followed by newline won't be recognised as such!).
However, if anything on the stream goes wrong, you'll be trapped in an endless loop processing previous input again and again. So best check the stream state as well:
if(!std::getline(std::cin, line))
{
// this is some sample error handling - do whatever you consider appropriate...
std::cerr << "error reading from console" << std::endl;
return -1;
}
As there might be multiple words on a single line, you'd yet have to split them. There are several ways to do so, quite an easy one is using an std::istringstream – you'll discover that it ressembles to what you likely are used to using std::cin:
std::istringstream s(line);
std::string word;
while(s >> word)
{
// append to vector...
}
Be aware that using operator>> ignores leading whitespace and stops after first trailing one (or end of stream, if reached), so you don't have to deal with explicitly.
OK, you're not allowed to use std::stringstream (well, I used std::istringstream, but I suppose this little difference doesn't count, does it?). Changes matter a little, it gets more complex, on the other hand, we can decide ourselves what counts as words an what as separators... We might consider punctuation marks as separators just like whitespace, but allow digits to be part of words, so we'd accept e. g. ab.7c d as "ab", "7c", "d":
auto begin = line.begin();
auto end = begin;
while(end != line.end()) // iterate over each character
{
if(std::isalnum(static_cast<unsigned char>(*end)))
{
// we are inside a word; don't touch begin to remember where
// the word started
++end;
}
else
{
// non-alpha-numeric character!
if(end != begin)
{
// we discovered a word already
// (i. e. we did not move begin together with end)
words.emplace_back(begin, end);
// ('words' being your std::vector<std::string> to place the input into)
}
++end;
begin = end; // skip whatever we had already
}
}
// corner case: a line might end with a word NOT followed by whitespace
// this isn't covered within the loop, so we need to add another check:
if(end != begin)
{
words.emplace_back(begin, end);
}
It shouldn't be too difficult to adjust to different interpretations of what is a separator and what counts as word (e. g. std::isalpha(...) || *end == '_' to detect underscore as part of words, but digits not). There are quite a few helper functions you might find useful...

You could input the value of the first column, then call functions based on the value:
void Process_Value_1(std::istream& input, std::string& value);
void Process_Value_2(std::istream& input, std::string& value);
int main()
{
// ...
std::string first_value;
while (input_file >> first_value)
{
if (first_value == "aaa")
{
Process_Value_1(input_file, first_value);
}
else if (first_value = "ccc")
{
Process_Value_2(input_file, first_value);
}
//...
}
return 0;
}
A sample function could be:
void Process_Value_1(std::istream& input, std::string& value)
{
std::string b;
input >> b;
std::cout << value << "\t" << b << endl;
input.ignore(1000, '\n'); // Ignore until newline.
}
There are other methods to perform the process, such as using tables of function pointers and std::map.

Erase words from a string (in C++)

I want to delete some words from a string but my code doesn't work . I don't have any errors or warnings , but I'm thinking that my string becomes empty. Could someone help me with this? I tried to convert my initial strings into 2 vectors, so that I can navigate more easily then
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
string s("Somewhere down the road");
string t("down");
istringstream iss(s);
vector <string> plm;
vector <string> plm2;
do
{
string sub;
iss >> sub;
plm.push_back(sub);
} while (iss);
for(unsigned int i=0 ; i<plm.size();i++){
cout<<plm[i];}
istringstream ist(t);
do
{
string subb;
ist >> subb;
plm2.push_back(subb);
} while (ist);
for(int i=0;i<plm.size();i++){
for(int j=0;j<plm2.size();i++){
{if (plm[i]==plm2[j])
plm.erase(plm.begin()+j);}}}
for(int i=0 ; i<plm.size();i++)
cout<<plm[i];
}

Warning: this is really just a comment that's too long to fit in a comment field. Oh, and a bit of a rant at that.
I'm sure glad we have these modern languages to make life so much easier than it was decades ago. Consider, for example, what this job looked like an the long-since moribund SNOBOL 4 programming language:
s = 'somewhere down the road'
del s 'down' = :s(del)
OUTPUT = s
God, it's nice that we've since made so much progress that we don't have to deal with 3 whole lines of code, and we can now do the job with only 52 lines instead (oh, except that the 52 lines don't actually work, but let's ignore that for the moment).
I guess, in fairness, we can do the job a little more compactly in C++ though. One obvious way would be with std::remove_copy, some stream iterators, and a stringstream or two:
std::istringstream input("somewhere down the road");
std::string del_str("down");
std::istream_iterator<std::string> in(input), end;
std::ostringstream result;
std::remove_copy(in, end, std::ostream_iterator<std::string>(result, " "), del_str);
std::cout << result.str();

There is no benefit in converting to vector - string itself already provides all that is necessary for what you want to do. Anyway, do it this way:
vector<char> v;
v.assign(s.c_str(), s.c_str() + s.length()); // without...
v.assign(s.c_str(), s.c_str() + s.length() + ); // including...
// ... terminating null character
Now it gets easy:
size_t pos = s.find(t);
if(pos != string::npos)
{
s.erase(pos, t.length());
}
This does not care, however, about leaving multiple whitespace or if t is not an entire word within s (e. g. t = "down"; s = "I'm going to downtown."; would result in s == "I'm going to town."), but you did not do so either...

First problem is, if std::string::erase is called only with the beginning position, it erases everything until the end of string.
Second problem is, that the code will just erase all letters which are in the second string, one by one. I.e. not the entire word - for that, you would need to check if the entire word matches, and only then erase (the entire length of the word). Ask yourself - what will happen in the code, if e.g. the first two letters will match, but not the rest of the word?

In your second for loop you never incremented j and inside the if (plm[i]==plm2[j]) block you used j instead of i as your offset in erase().
for(int i=0;i<plm.size();i++)
{
for(int j=0;j<plm2.size();j++)//here you need to increment j
{
if (plm[i]==plm2[j])
plm.erase(plm.begin()+i);//here the offset should be i
}
}
Another thing don't use a do...while loop to read from the stringstream and push back on the vector. If the reading fails you will be pushing invalid data to the vector, instead try something like:
string sub;
while(iss >> sub;)
plm.push_back(sub);//only if reading is successful
...//do the same for the other istringstream too

You do not increment j this is the first thing I saw on your code. Write it correctly then if it still doesnt work, then ask!

Use of strtok to obtain certain strings from an input string

vector<string> CategoryWithoutHashTags;
string tester = "#hello junk #world something #cool";
char *pch;
char *str;
str = new char [tester.size()+1];
strcpy(str, tester.c_str());
pch = strtok(str,"#");
while(pch!=NULL)
{
CategoryWithoutHashTags.push_back(pch);
pch=strtok(NULL,"#");
}
cout<<CategoryWithoutHashTags[0]<<endl;
I want to write a program which involves storing all the hash tags words in a vector of strings. The above program stores "hello junk" in the first index rather than "hello". What changes can i make to the program to make it do so?

If you are set on using strtok, you should at the very least use its re-entrant version strtok_r. Then, you should change the code to split at spaces, not at hash marks. This would give you the tokens. Finally, in the loop you would need to look for the first character to be the hash mark, adding the item to the list if it's there, and disregarding the item when the hash mark is not there.
An even better approach would be using a string stream: put your string into it, read tokens one by one, and discard ones with no hash mark.
Here is how you can do it with very little code using C++11's lambdas:
stringstream iss("#hello junk #world something #cool");
istream_iterator<string> eos;
istream_iterator<string> iit(iss);
vector<string> res;
back_insert_iterator< vector<string> > back_ins(res);
copy_if(iit, eos, back_ins, [](const string s) { return s[0] == '#'; } );
Demo on ideone.

Program stops parsing when last position of string is punctuation

So I'm trying to read all words from a file, and get rid of the punctuation as I do that. Here is the logic that is stripping the punctuation:
Edit: The program actually stops running altogether, just want to make that clear
ifstream file("text.txt");
string str;
string::iterator cur;
for(file>>str; !file.eof(); file>>str){
for(cur = str.begin(); cur != str.end(); cur++){
if (!(isalnum(*cur))){
cur = str.erase(cur);
}
}
cout << str << endl;
...
}
Say I have a text file that reads:
This is a program. It has trouble with (non alphanumeric chars)
But it's my own and I love it...
When I cout and endl; my string right after this bit of logic, I'll get
This
is
a
program
It
has
trouble
with
non
alphanumeric
and that's all folks.
Is there something wrong with my iterator logic?
How could I fix this?
Thank you.

The main logical problem with iterators I see is that for non-alphanumeric characters the iterator gets increased twice: during erase it moves to the next symbol and then cur++ from the for loop increases it, so it skips every symbol after a non-alphanumeric one.
So probably something along the lines of:
string next;
string::iterator cur;
cur = next.begin()
while(cur != next.end()){
if (!(isalnum(*cur))){
cur = next.erase(cur);
} else {
cur++;
}
}
This just removes the non-alphanumeric characters. If you need to tokenize your input, you will have to implement a bit more, i.e. remember, whether you're inside a word (have read at least one alphanumeric character) or not and act accordingly.

How about just not copying in the punctuation when building the transformed list. OK. probably overkill.
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cctype>
using namespace std;
// takes the file being processed as only command line param
int main(int argc, char *argv[])
{
if (argc != 2)
return EXIT_FAILURE;
ifstream inf(argv[1]);
vector<string> res;
std::transform(istream_iterator<string>(inf),
istream_iterator<string>(),
back_inserter(res),
[](const string& s) {
string tmp; copy_if(s.begin(), s.end(), back_inserter(tmp),
[](char c) { return std::isalnum(c); });
return tmp;
});
// optional dump to output
copy(res.begin(), res.end(), ostream_iterator<string>(cout, "\n"));
return EXIT_SUCCESS;
}
Input
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts,
His acts being seven ages. At first, the infant,
Mewling and puking in the nurse's arms.
Output
All
the
worlds
a
stage
And
all
the
men
and
women
merely
players
They
have
their
exits
and
their
entrances
And
one
man
in
his
time
plays
many
parts
His
acts
being
seven
ages
At
first
the
infant
Mewling
and
puking
in
the
nurses
arms

You should be using ispunct to test for a punctuation character. If you also want to filter out control characters you should use iscntrl.
Once you've filtered out the punctutation you can split a spaces and newlines to get the words.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Wordlist transfer for an anagram program - c++

Prevent issues, use normal functions: getline(str, 30); strncpy(strInit, str, 30); Prevent more issues, use standard strings: std::string strInit, str; while (std::getline(myFile, str)) { strInit = str; // do stuff }

Best not to use raw C arrays at all! Here's a version, using modern C++: #include <string> std::string str; while (std::getline(myFile, str)) { // do something useful with str // Example: mymap[str] = f(str); std::cout << str; //Personal error-checking; not necessary for the program }

Related

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

How to read a complex input with istream&, string& and getline in c++?

Erase words from a string (in C++)

Use of strtok to obtain certain strings from an input string

Program stops parsing when last position of string is punctuation

Categories

Resources