How to use single letters to form words - C++

Hey, I currently have this code. It gets the user to input letters into an array, with a limit of 5. I then plan to use the array to form words. How can I achieve this?
const int row = 5;
char array[row];
int count = 0;
char letter;
while (count < row)
{
    cout << "Enter a letter: ";
    cin >> letter;
    array[count] = letter;
    count++;
}
cout << "Letters inputted" << endl;
for (count = 0; count < row; count++)
{
    cout << array[count] << " ";
}
cout << endl;
system("pause");

Here's a hint to get you started on the right track: don't even consider using std::next_permutation unless this is something you'll only ever use once or twice (and probably not even then, because it's actually more complicated than doing the job right).
Using std::next_permutation, your function will be approximately N! times slower than necessary¹ -- in the case of 5 letters, that'll be 120 times slower, and if you ever use longer words, it'll get worse very quickly (e.g., for 10 letters it's over 3.5 million).
Instead, start by pre-processing your dictionary. Instead of a std::set&lt;std::string&gt; of words, create a std::map&lt;std::string, std::vector&lt;std::string&gt;&gt; (or std::unordered_map, though English has few enough words that this probably won't make a huge difference). As you read in a word from the dictionary, create a sorted version of that string. Use that as the key, and push the original version of the word onto the vector for that key.
Then when you get a word from the user, sort it, look that up in the map, and the associated vector will contain every word (from your dictionary) that can be created from those letters.
1. If you use std::map instead of std::unordered_map, that should be something like N!/(log N), but N! grows so fast and log N grows so slowly that the difference is negligible. If you get N large enough that log N = 3, N! will be so large that running N!/log N computation steps gets you into questions of cosmology, like whether the universe will have died of heat death before the program finishes (to which the answer seems to be "yes, probably").
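As a sketch of the preprocessing just described (the four-word list is a stand-in for a real dictionary file, and the function names here are mine, not from any standard API):

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Build the index described above: key = the word's letters in sorted
// order, value = every dictionary word with exactly those letters.
std::map<std::string, std::vector<std::string>> buildIndex(
        const std::vector<std::string>& dictionary) {
    std::map<std::string, std::vector<std::string>> index;
    for (const std::string& word : dictionary) {
        std::string key = word;
        std::sort(key.begin(), key.end());   // e.g. "tale" -> "aelt"
        index[key].push_back(word);
    }
    return index;
}

// One O(log N) lookup returns every anagram of the user's letters.
std::vector<std::string> anagramsOf(
        const std::map<std::string, std::vector<std::string>>& index,
        std::string letters) {
    std::sort(letters.begin(), letters.end());
    auto it = index.find(letters);
    return it == index.end() ? std::vector<std::string>{} : it->second;
}
```

The expensive sorting work happens once per dictionary word at load time; each user query afterwards is a single map lookup.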

Here's a hint to get you started. There's a function in the standard library called std::next_permutation. Assuming you have a dictionary of words to check against, a possible solution could look like this:
std::sort(array, array + row);
do {
    // Check if array is a word.
} while (std::next_permutation(array, array + row));
This will cycle through every permutation of letters. It's now up to you to verify that it is a valid word.
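For completeness, one possible way to fill in that "check if array is a word" step, sketched against a tiny hard-coded dictionary (a stand-in for whatever word list you actually load):

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Return every dictionary word that is a permutation of `letters`.
// The string must start from sorted order, or the do/while loop will
// skip the permutations that come "before" the initial arrangement.
std::vector<std::string> findWords(std::string letters,
                                   const std::set<std::string>& dictionary) {
    std::vector<std::string> found;
    std::sort(letters.begin(), letters.end());
    do {
        if (dictionary.count(letters))
            found.push_back(letters);
    } while (std::next_permutation(letters.begin(), letters.end()));
    return found;
}
```

Because std::next_permutation generates permutations in lexicographic order, the matches come back alphabetically sorted.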

This solution uses an associative array to map from the sorted letters of a word to the words having those sorted letters. It's thus possible to get an answer with one lookup in the map, which takes O(log N) time asymptotically, where N is the size of your dictionary.
Create a file named dic.txt. If you're using Visual Studio, it should be in the same directory as your *.cpp files. Put several words inside, one word per line ("word in a row" format). Try the following code:
#include <iostream>
#include <string>
#include <map>
#include <vector>
#include <fstream>
#include <algorithm>
using namespace std;
int main() {
    // Dictionary file is in a "word in a row" format
    map< string, vector<string> > dictionary;
    ifstream dictionary_file("dic.txt");
    if (!dictionary_file.good()) {
        cout << "File doesn't exist" << endl;
        return 0;
    }
    string word;
    while (dictionary_file >> word) {
        string key = word;
        sort(key.begin(), key.end());
        dictionary[key].push_back(word);
    }
    // Read the letters
    string letters;
    cin >> letters;
    if (letters.size() > 5) {
        cout << "Too many letters" << endl;
        return 0;
    }
    // Sort the letters
    sort(letters.begin(), letters.end());
    // Output the answers
    vector<string> & ret = dictionary[letters];
    for (size_t i = 0, ilen = ret.size(); i < ilen; ++i) {
        cout << ret[i] << endl;
    }
}
Note that this solution is case-sensitive. If you don't want that, you can add calls to a strtolower function (name borrowed from PHP) before you add each word to your dictionary and before you sort your letters.
string strtolower(string const & word) {
    string ret = word;
    transform(ret.begin(), ret.end(), ret.begin(), ::tolower);
    return ret;
}
You'll need to include the <cctype> header for this function to work.

Related

What is a better way to get text data into a C++ program rather than using fstream

This is a very challenging question, and I'm not sure how to ask it correctly.
I wrote a program that reads in the same text data on load every time. The text data is a list of words in the dictionary, and the program solves anagram-type puzzles (like Jumble, Boggle, Scrabble, etc.). This text data never changes. Currently I read a text file which MUST be present in the same folder as the .exe that is built. This assumes the user would not just go in and erase, edit, or otherwise corrupt the text file, which is really within the realm of possibility for a user to do. Not only that, but locking a file and reading it are very noticeably slow operations.
Most of what the program does is convert the .txt file into an abstract data type (ADT) which sorts each word into a 'signature' and then builds a set of words that have the same signature. These sets are stored in a structure (called a map here) that is a key:value data structure where the key is the signature. Anyway, that info is irrelevant to the question; it suffices to say that on load I need to build my ADT in memory. I'm looking for a more sophisticated way than text files.
Since I'm new at programming there is probably a much better way. I just don't know how to even ask the question, since I don't know what is available out there.
I understand databases, but then again it seems like that relies on an external file. I have seen posts which talk about storing data in a .h file, but they always want to build a character array (char[i]), which would require converting this data into my ADT, and that seems again like a waste of time when the program is loading. (Why bother converting it into a char array just to read it back into the ADT?)
/*
* Project: myJumble
* Created by CS106 C++ Assignment Wizard 0.1
*
* Name: Brad Beall
* Section: Life
* This code will solve the jumble puzzles in the newspaper.
*/
#include <fstream>
#include <iostream>
#include "simpio.h"
#include "map.h"
#include "set.h"
#include "genlib.h"
//This function swaps two characters.
void Swap(char &ch1, char &ch2)
{
    char tmp = ch1;
    ch1 = ch2;
    ch2 = tmp;
}
//This function sorts all the chars in a word in alphabetical order
string SortWord(string inWord)
{
    inWord = ConvertToLowerCase(inWord);
    //these two for loops will sort the string alphabetically
    // - idea is starting from the front, find the 'smallest' character in the string
    //   (where 'a' is 'smaller' than 'b'),
    //   then move that smallest character to the front of the string.
    //   Now move to the next character and again look for the smallest character.
    //   Example: for "peach", first move the 'a' to the front to form "apech", then move 'c' to form "acpeh"...
    for (size_t i = 0; i < inWord.length(); i++)
    {
        size_t minIndex = i;
        for (size_t j = i+1; j < inWord.length(); j++)
        {
            if (inWord[j] < inWord[minIndex])
            {
                // looking for the 'smallest' character
                minIndex = j;
            }
        }
        Swap(inWord[i], inWord[minIndex]);
    }
    return inWord;
}
void BuildDictionary(Map<Set<string> > &kDict, ifstream &in)
{
    string nextWord = "";
    while (true)
    {
        //read in the next word from the dictionary
        in >> nextWord;
        if (in.fail()) break;
        //sort letters alphabetically using SortWord, use that as the key,
        //and then add that key:value pair to the set.
        kDict[SortWord(nextWord)].add(nextWord);
    }
}
//this function prints a set
void PrintSet(Set<string> &inputSet)
{
    Set<string>::Iterator it = inputSet.iterator();
    while (it.hasNext())
    {
        cout << it.next() << endl;
    }
}
int main()
{
    ////debug the function: string SortWord(string inWord)
    //cout << "Enter a word to sort" << endl;
    //string tempString = GetLine();
    //tempString = SortWord(tempString);
    //cout << tempString;

    //building the dictionary may take some time.
    cout << "Loading the dictionary. This may take some time." << endl;
    //read in the text file with all dictionary words
    ifstream in;
    in.open("enable1.txt");
    //call the function that will create our data structure
    //this will be a MAP:
    // - key: the alphabetized letters from a word, or the word's "signature"
    // - value: a Set of words with the matching signature
    Map<Set<string> > keyedDictionary;
    BuildDictionary(keyedDictionary, in);
    while (true)
    {
        //prompt user for a word to solve
        cout << "Enter a jumbled word to solve." << endl;
        cout << "Type '0' to exit." << endl << endl;
        string solveWord = GetLine();
        if (solveWord == "0")
        {
            break;
        }
        //sort the word into a signature key
        solveWord = SortWord(solveWord);
        //print the set of solutions for this signature key
        PrintSet(keyedDictionary[solveWord]);
    }
    return 0;
}

Most common words in text file never finishes for large files

My program works for small files, but if I use large files (the Bible, Artamène (the longest novel)) it never finishes. The program keeps using more memory: it starts at 5 MB and was up to over 350 MB after 7 hours. Is it because it is very inefficient, or am I missing something?
#include "stdafx.h"
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <algorithm>
using namespace std;
struct Pair // create a struct for each word so it includes not only the word, but its count
{
    string word; //the inputted word
    unsigned int frequency; //count for each word
    Pair(unsigned int f, const string& w) : word(w), frequency(f) {} //constructor
    bool operator <(const Pair& str) const //for sort (descending by frequency)
    {
        return (str.frequency < frequency);
    }
};
string rmPunct(string word)
{
    string::size_type position;
    while ((position = word.find_first_of("|.,:;\"'!¡?¿/()^[]{}\\;-_*+")) != string::npos) //remove any punctuation, etc.
    {
        word.erase(position, 1);
    }
    return word;
}
string allLower(string word)
{
    std::transform(word.begin(), word.end(), word.begin(), ::tolower); //convert any uppercase letters to lower case
    return word;
}
int main()
{
    vector<Pair> myVector; //create a vector of structs so I have a dynamic array (can extend)
    fstream dataFile; // create the file stream
    string fileName; // necessary to find the file
    cout << "Enter the file name: ";
    cin >> fileName;
    dataFile.open(fileName); // open the file in input mode only (no output for safeness)
    string word; //will be each word from the file
    while (dataFile >> word) // the >> imports each word until it hits a space then loops again
    {
        word = rmPunct(word);
        word = allLower(word);
        Pair *p = new Pair(1,word);
        myVector.push_back(*p); // pushes each newly created struct into the vector
        if (dataFile.fail())
            break; //stop when the file is done
    }
    for (unsigned int i = 0; i < myVector.size(); i++) //this double for-loop finds each word that was already found
    {
        for (unsigned int j = i+1; j < myVector.size();)
        {
            if (myVector[i].word == myVector[j].word) //simple comparing to find where the extra word lies
            {
                myVector.at(i).frequency++; //increment the count
                myVector.erase(myVector.begin()+j); //and... delete the duplicate struct (which has the word in it)
            }
            else
                j++;
        }
    }
    sort(myVector.begin(), myVector.end());
    ofstream results;
    results.open("results.txt");
    if (myVector.size() >= 60) //outputs the top 60 most common words
    {
        for (int i = 0; i < 60; i++)
        {
            double percent = ((double)myVector[i].frequency/(double)myVector.size()*100);
            results << (i+1) << ". '" << myVector[i].word << "' occurred " << myVector[i].frequency << " times. " << percent << "%" << '\n';
        }
    }
    else //if there are not 60 unique words in the file
    {
        for (unsigned int i = 0; i < myVector.size(); i++)
        {
            double percent = ((double)myVector[i].frequency/(double)myVector.size()*100);
            results << (i+1) << ". '" << myVector[i].word << "' occurred " << myVector[i].frequency << " times. " << percent << "%" << '\n';
        }
    }
    results.close();
}
This loop:
for (unsigned int i = 0; i < myVector.size(); i++) //this double for-loop finds each word that was already found
{
    for (unsigned int j = i+1; j < myVector.size();)
    {
        if (myVector[i].word == myVector[j].word) //simple comparing to find where the extra word lies
        {
            myVector.at(i).frequency++; //increment the count
            myVector.erase(myVector.begin()+j); //and... delete the duplicate struct (which has the word in it)
        }
        else
            j++;
    }
}
walks your words n^2 times (roughly). If we assume your 5MB file contains half a million words, that's 500000 * 500000 = 250 billion iterations, which will take some time to run through [and erasing words will "shuffle" the entire content of your vector, which is quite time-consuming if the vector is long and you erase an early item].
A better approach would be to build a data structure you can search through quickly, such as a map<std::string, int> words, where you do words[word]++; as you read the words. Then search for the most common words by iterating over words and saving the 60 most common [keeping a sorted list of the 60 most common...].
You could also do something clever like min(60, words.size()) to know how many words you have.
You have a small memory leak in your program, and as the data you read gets larger, so does the number of leaked objects.
The code causing the memory leak:
Pair *p = new Pair(1,word);
myVector.push_back(*p); // pushes each newly created struct into the vector
Here you dynamically allocate a Pair structure, copy the structure into the vector, and then completely ignore the originally allocated structure.
There is really no need for any dynamic allocation, or even a temporary variable; just do
myVector.push_back(Pair(1, word));
And if you have a newer compiler that supports C++11, then just do
myVector.emplace_back(1, word);
That should help you with part of the problem.
The other part is that your algorithm is slow, really really slow for large inputs.
That can be solved by using e.g. std::unordered_map (or std::map if you don't have std::unordered_map).
Then it becomes very simple: just use the word as the key and the frequency as the data. Then for every word you read, just do
frequencyMap[word]++;
No need for the loop-in-loop comparing words, which is what slows you down.
To get the 60 most frequent words, copy from the map into a vector using std::pair with the frequency as the first member of the pair and the word as the second, sort the vector, and simply print the 60 first entries in the vector.
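That last step might be sketched like this (topWords is my name; it assumes the frequency map has already been filled as described):

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Copy (frequency, word) pairs out of the map, sort them in descending
// order, and keep at most `limit` entries.
std::vector<std::pair<int, std::string>> topWords(
        const std::unordered_map<std::string, int>& freq, std::size_t limit) {
    std::vector<std::pair<int, std::string>> ranked;
    for (const auto& kv : freq)
        ranked.emplace_back(kv.second, kv.first);   // frequency first
    // std::pair compares by its first member first, so std::greater
    // yields descending frequency order
    std::sort(ranked.begin(), ranked.end(),
              std::greater<std::pair<int, std::string>>());
    if (ranked.size() > limit)
        ranked.resize(limit);
    return ranked;
}
```

Putting the frequency in the pair's first slot is what lets the default pair comparison do the sorting; no custom comparator is strictly needed.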

count even length words in vectors

I have 2 problems.
1) cout << v_setir.capacity(); does not return the correct number.
2) I want to count the words whose lengths are even, and I should do it with vectors.
Here is my code:
#include <iostream>
#include <vector>
#include <sstream>
using namespace std;
int main()
{
    int say = 0;
    cout << "Setiri daxil edin: ";
    string setir;
    getline(cin, setir);
    vector<string> v_setir;
    string ifadeler;
    istringstream yig(setir);
    while (yig >> ifadeler)
        v_setir.push_back(setir);
    // First problem
    cout << v_setir.capacity() << endl;
    // Second problem
    /* for (size_t i = 0; i < v_setir.capacity(); i++)
    {
        if (v_setir[i].size() % 2 == 0)
            say += 1;
    }
    cout << "Uzunlugu cut olan sozerin sayi: " << say << endl; */
    return 0;
}
For example, if I enter this string it returns "6" (I don't know why):
hi hello how are you
What is wrong? My brain stopped and I couldn't determine what is wrong in my code and/or algorithm.
Please help me to solve these problems.
Best regards.
capacity() is the currently allocated space, not the count of elements in the vector. Use size() instead.
See:
http://en.cppreference.com/w/cpp/container/vector/capacity
http://en.cppreference.com/w/cpp/container/vector/size
Your loop should work fine now, but you can also take a look at the example there which does something similar for integer divisible by 3.
You can count even-length words with std::count_if:
#include <algorithm>
int even_words = std::count_if(v_setir.begin(), v_setir.end(),
                               [](const string& str) { return str.size() % 2 == 0; });
1) cout << v_setir.capacity(); does not return the correct number.
Use vector::size for the number of elements in the vector.
2) I want to count the words whose lengths are even, and I should do it with vectors.
Firstly, you should use v_setir.size() instead of v_setir.capacity() as the condition in your loop.
And secondly, why not cout each string to check whether its length is even or not? You actually put 'hi hello how are you' into the vector five times.
I think you want to put every single word into the vector, not the whole sentence. If so, use v_setir.push_back(ifadeler); instead of v_setir.push_back(setir);
vector::capacity gives the capacity of the vector (how many elements it can store before reallocating). Here, you want to count the strings whose length is even. You need to iterate over the strings and count those whose length is even.
std::vector::capacity >= std::vector::size
The capacity is the maximum number of elements the vector can currently hold without reallocating.
The size is the number of elements in the vector.
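A small sketch makes the distinction concrete. The exact capacity values are implementation-defined; MSVC's roughly 1.5x growth policy is why five push_backs can report a capacity of 6, while libstdc++'s doubling commonly reports 8:

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns {size, capacity} after pushing n elements into a fresh vector.
// size() is exactly n; capacity() is whatever the implementation's
// growth policy produced, and is only guaranteed to be >= size().
std::pair<std::size_t, std::size_t> sizeVsCapacity(std::size_t n) {
    std::vector<std::string> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back("word");
    return {v.size(), v.capacity()};
}
```

Looping to capacity() instead of size() would also read elements past the end of the real data, which is undefined behavior.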

c++ string loops checking the same words

I have to get, let's say, 10 words from the user. The program will warn the user if the same word is entered again. What could be the general logic of the program? I managed to take 10 words from the user with a loop, but I can't check whether the entered words are all different or not.
Save the words and check if you already have them:
#include <iostream>
#include <set>
#include <string>

int main()
{
    std::set<std::string> words;
    for (std::string word; std::cin >> word; )
    {
        if (!words.insert(std::move(word)).second)
        {
            std::cout << "Word already encountered!\n";
        }
    }
    std::cout << "We got " << words.size() << " distinct words.\n";
    // use "words"
}
(You can add a counter or check words.size() if you want at most a certain number of words.)
You could use a std::set, which by definition may not contain any duplicates; all elements must be unique. You could do something like
std::set<std::string> uniqueWords;
while (uniqueWords.size() < 10)
{
    std::string user;
    std::cin >> user;
    uniqueWords.insert(user);
}
If the user inputs a duplicate word, set::insert will not add the duplicate, so the size of the set will not increase. The while loop will only terminate once the set grows to 10 elements. Then you can continue on.

Unique Lines and Words? How to implement it?

I'm having trouble with this program. The program is supposed to tell the user the number of lines, words, characters, unique lines, and unique words in a given input. So far, words and characters are okay. However, if the user wants to input more than one line, how do I do that? The functions only output the results of one line at a time, rather than adding the results of both lines together. Also, I can't get the unique lines and unique words to work properly. I just got into C++ so I don't have much experience. Can someone please help me?
Problems:
Program reads one line at a time, so when the user inputs multiple lines, the program produces the results separately rather than adding them together as one entity.
Unique lines and unique words are not working. Any ideas how to implement them using the library used in the program?
#include <iostream>
using std::cin;
using std::cout;
using std::endl;
#include <string>
using std::string;
#include <set>
using std::set;
// write this function to help you out with the computation.
unsigned long countLines()
{
    return 1;
}

unsigned long countWords(const string& s)
{
    int nw = 1;
    for (size_t i = 0; i < s.size(); i++)
    {
        if (s[i] == ' ') //every time the function encounters a whitespace, count increases by 1
        {
            nw++;
        }
    }
    return nw;
}

unsigned long countChars(const string& s)
{
    int nc = 0;
    for (size_t i = 0; i < s.size(); i++)
    {
        if (s[i] != ' ') //every time the function encounters a character other than a whitespace, count increases
        {
            nc++;
        }
    }
    return nc;
}
unsigned long countUnLines(const string& s, set<string>& wl)
{
    wl.insert(s);
    return wl.size();
}

unsigned long countUnWords(const string& s, set<string>& wl)
{
    size_t m1 = 0;
    string substring;
    for (size_t m2 = 0; m2 <= s.size(); m2++)
    {
        if (m2 != ' ')
        {
            substring = s.substr(m1, m2);
            wl.insert(substring);
            m1 = m2 + 2;
        }
    }
    return wl.size();
}
int main()
{
    //stores string
    string s;
    //stores stats
    unsigned long Lines = 0;
    unsigned long Words = 0;
    unsigned long Chars = 0;
    unsigned long ULines = 0;
    unsigned long UWords = 0;
    //declare sets
    set<string> wl;
    while (getline(cin, s))
    {
        Lines += countLines();
        Words += countWords(s);
        Chars += countChars(s);
        ULines += countUnLines(s, wl);
        UWords += countUnWords(s, wl);
        cout << Lines << endl;
        cout << Words << endl;
        cout << Chars << endl;
        cout << ULines << endl;
        cout << UWords << endl;
        Words = 0;
        Chars = 0;
        ULines = 0;
        UWords = 0;
    }
    return 0;
}
You are resetting your count variables to zero at the end of your getline while loop. This is why you are only getting results for one line. The user can input multiple lines in your program as it is right now; you are just resetting the counts.
I think you're headed in the right direction. In order to count unique lines and words you're going to have to store every line and word in a data structure of some kind; I'd suggest an unordered_map. For each element in the map you'll have a counter for the number of occurrences of that line/word.
I don't want to give the answer away wholesale, but here are some ideas to get you started.
The function getline() can read in an entire line of input. Do this until there's no more input.
You can use a container like std::set (or better, std::unordered_set) to store the lines read in. Not the most efficient, but it keeps track of all your lines, and only stores the unique ones.
Each line can then be broken down into words. Consider using something like std::stringstream for this.
Store the words in a different std::unordered_set.
The number of unique lines (words) is simply the number of lines (words) stored in the containers. Use the .size() method to obtain this.
Doing the total number of lines, words, and characters can be computed as you read the data in, so I won't go into much detail there.
Each item is googleable, and you may choose to implement different parts differently (if you don't want to use a stringstream, you can always iterate over the line read, for example). This should get you on the right track.
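Putting those ideas together, one possible sketch (all names here are mine, not from the original program):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

struct TextStats {
    unsigned long lines = 0, words = 0, chars = 0;
    std::unordered_set<std::string> uniqueLines, uniqueWords;
};

// One pass over the input: getline() per line, a stringstream splits it
// into words, and the unordered_sets keep only the unique entries.
TextStats tally(std::istream& in) {
    TextStats stats;
    std::string line;
    while (std::getline(in, line)) {
        ++stats.lines;
        stats.uniqueLines.insert(line);
        std::istringstream split(line);
        std::string word;
        while (split >> word) {
            ++stats.words;
            stats.chars += word.size();   // counts non-whitespace characters only
            stats.uniqueWords.insert(word);
        }
    }
    return stats;
}
```

The unique counts fall out of the sets' .size() methods, so nothing needs to be reset between lines.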
It's pretty easy to get fairly accurate counts, but can be surprisingly difficult to get correct counts for all of this.
The big problem is the character count. If you open the file (as you usually would) in text mode, the number of characters you count may not match what the OS thinks is there. For the obvious examples, under Windows a CR/LF pair will be translated to a single new-line character, so you'll typically count each line as one character shorter than it really is.
Technically, there's no way to deal with that entirely correctly either -- the translation from external to internal representation when a file is opened in text mode is theoretically arbitrary. At least in theory, opening in binary mode doesn't help a lot either; in binary mode, you can have an arbitrary number of NUL characters after the end of the data that was written to the file.
The latter, however, is pretty much theoretical these days (it was allowed primarily because of CP/M, which most people have long forgotten).
To read lines, but retain the line-end delimiters intact, you can use std::cin.get() instead of std::getline(), then read the delimiters separately from the line itself.
That gives us something like this:
#include <iostream>
#include <set>
#include <string>
#include <cstring>
#include <sstream>
#include <fstream>

int main(int argc, char **argv) {
    static char line[4096];
    unsigned long chars = 0;
    unsigned long words = 0;
    unsigned long lines = 0;
    std::set<std::string> unique_words;

    std::ifstream in(argv[1], std::ios::binary);
    while (in.get(line, sizeof(line), '\n')) {
        ++lines;
        chars += std::strlen(line);
        std::istringstream buffer(line);
        std::string word;
        while (buffer >> word) {
            ++words;
            unique_words.insert(word);
        }
        while (in.peek() == '\n' || in.peek() == '\r') {
            ++chars;
            in.ignore(1);
        }
    }
    std::cout << "words: " << words << "\n"
              << "lines: " << lines << "\n"
              << "chars: " << chars << "\n"
              << "unique words: " << unique_words.size() << "\n";
}
Note that although this does answer what the OP actually asked, at least for most typical OSes (Linux, *BSD, macOS, Windows), it's probably not what he really wants. My guess is that his teacher isn't really asking for this level of care in getting an accurate character count.
Also note that if you encounter a line longer than the buffer, this can still give an inaccurate line count: it'll count each buffer-full of data as a separate line, even if it didn't find a line delimiter. That can be fixed as well, but it adds still more complexity to a program that's almost certainly already more complex than intended.