unique words from a file c++ - c++

It's been 3 days and I just can't identify what's wrong with this program. It should compare word by word, but instead it compares character by character. For example, if the file contains words like (aaa bbb cc dd), the output is a b. The same happens with a file of sentences: if I put in paragraphs to compare, it only compares a few characters. Please help.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream myfile("unique.text");
int count = 0;
string temp;
string a;
int i,j;
while(getline(myfile,temp))
{
for(i=0 ; i < sizeof(temp); i++)
{
for(int j = 0; j < i; j++)
{
if (temp[i] == temp[j])
break;
}
if (i == j)
cout << temp [i] <<" , ";
}
myfile.close ();
}

You have a couple of problems
temp is of type string. sizeof is not the way to determine the length of a string (it's used for determining things like the number of bytes in an int). You want:
temp.length()
Secondly, indexing into a string (temp[n]) gives you the nth character, not the nth word.
You can make getline split on spaces by adding a third (delimiter) parameter:
getline(myfile, temp, ' ')
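To make the difference between characters and words concrete, here is a minimal sketch. The helper names splitIntoWords and characterCount are made up for illustration and are not part of the question's code. It splits one line into words with std::istringstream, which is an alternative to the delimiter form of getline, and it measures the text with length() rather than sizeof.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into whitespace-separated words.
std::vector<std::string> splitIntoWords(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream stream(line);
    std::string word;
    // operator>> skips whitespace and extracts one word per call
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}

// Hypothetical helper: number of characters stored in the line.
// sizeof(line) would instead give the size of the std::string object
// itself, which does not depend on the text it holds.
std::size_t characterCount(const std::string& line) {
    return line.length();
}
```

With the words in a vector, words[i] == words[j] compares whole words, whereas temp[i] == temp[j] in the question compares single characters.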

So, there are some bugs in your code:
mixing up characters and strings, closing the file inside the while loop, and not storing the words that were already read.
One recommendation: before you write code, write comments for what you want to do.
In other words, make a design before you start coding. That is very important.
For your problem at hand in the title of this thread:
unique words from a file c++
I prepared 3 different solutions. The first is just using very simple constructs. The second is using a std::vector. And, the 3rd is the C++ solution using the C++ algorithm library.
Please see:
Simple, but lengthy
And not recommended, because we should not use raw pointers for owned memory and should not use new
#include <iostream>
#include <fstream>
#include <string>
const std::string fileName{ "unique.text" };
unsigned int numberOfWords() {
// Here we will count the number of words in the file
unsigned int counter = 0;
// Open the file. File must not be already open
std::ifstream sourceFileStream(fileName);
// Check, if we could open the file
if (sourceFileStream) {
// Simply read all words and increment the counter
std::string temp;
while (sourceFileStream >> temp) ++counter;
}
else {
// In case of problem
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
return counter;
}
int main() {
// Get the number of words in the source file
unsigned size = numberOfWords();
// Allocate a dynamic array of strings. Size is the count of the words in the file
// Including doubles. So we will waste a little bit of space
std::string* words = new std::string[size+1];
// Open the source file
std::ifstream sourceFileStream(fileName);
// Check, if it could be opened
if (sourceFileStream) {
// We will read first into a temporary variable
std::string temp;
// Here we will count the number of unique words
unsigned int wordCounter = 0;
// Read all words in the file
while (sourceFileStream >> temp) {
// Check whether we have already read this word before. We assume NO to begin with
bool wordIsAlreadyPresent = false;
// Go through all words read so far and check whether the word just read already exists
for (unsigned int i = 0; i < wordCounter; ++i) {
// Check, if just read word is already in the word array
if (temp == words[i]) {
// Yes it is, set flag, and stop the loop.
wordIsAlreadyPresent = true;
break;
}
}
// if the word was not already there
if (! wordIsAlreadyPresent) {
// Then add the just read temporary word into our array
words[wordCounter] = temp;
// And increment the counter
++wordCounter;
}
}
// Show all read unique words
for (unsigned int i = 0; i < wordCounter; ++i) {
std::cout << words[i] << "\n";
}
}
else { // In case of error
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
delete[] words;
}
Using a vector. Already more compact and better readable
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
const std::string fileName{ "unique.text" };
int main() {
// Open the source file
std::ifstream sourceFileStream(fileName);
// Check whether the source file is open
if (sourceFileStream) {
// Temporary string for holding just read words
std::string temp;
// In this vector we will store all unique words
std::vector<std::string> words;
// Read all words from the source file
while (sourceFileStream >> temp) {
// Check whether we have already read this word before. We assume NO to begin with
bool wordIsAlreadyPresent = false;
// Go through all words read so far and check whether the word just read already exists
for (unsigned int i = 0; i < words.size(); ++i) {
// Check, if just read word is already in the word vector
if (temp == words[i]) {
// Yes it is, set flag, and stop the loop.
wordIsAlreadyPresent = true;
break;
}
}
// if the word was not already there
if (not wordIsAlreadyPresent) {
// Then add the just read temporary word into our array
words.push_back(temp);
}
}
for (unsigned int i = 0; i < words.size(); ++i) {
std::cout << words[i] << "\n";
}
}
else {
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
}
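And third, more advanced C++ programming: just a few lines of elegant code (it needs C++17 for the if statement with an initializer and for deducing the set's element type).
But it is probably too difficult to understand for beginners.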
#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <iterator>
#include <algorithm>
const std::string fileName{ "unique.text" };
int main() {
// Open the source file and check whether it could be opened without failure
if (std::ifstream sourceFileStream(fileName); sourceFileStream) {
// Read all words (everything delimited by a white space) into a set
std::set words(std::istream_iterator<std::string>(sourceFileStream), {});
// Now we have a set with all unique words. Show this on the screen
std::copy(words.begin(), words.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
// If we could not open the source file
else {
std::cerr << "\nCould not open file '" << fileName << "'\n";
}
return 0;
}

Related

C++ While loop scoping issue

I recently started learning C++, and I'm currently trying to build a tool that I recently built in Python. My issue is that I can't figure out how to make the list of words and its length available outside the loop. For example:
#include <iostream>
#include <fstream>
#include <string>
int main(int argc, char* argv[]) {
....
fstream file;
file.open(argv[i], ios::in);
if (file.is_open()) {
string tp;
while (getline(file, tp)) {
// cout << tp << "\n" << endl;
string words[] = {tp};
int word_count = sizeof(words) / sizeof(words[0]);
for (int e = 0; e < word_count; e++) {
cout << word_count[e];
}
}
file.close();
} else {
cout << "E: File / Directory " << argv[i] << " Does not exist";
return 0;
}
....
}
Where it says int word_count = sizeof(words) / sizeof(words[0]); and string words[] = {tp};, I want to be able to use those globally, so that later on I can use the array and its length to loop through it and use the words in another statement.
Can someone tell me how to do so?
And by the way, I've only been doing C++ for about 4 days so please dont get annoyed if I don't understand what you tell me.
string words[] = {tp}; creates an array with one element in it, the whole line. The name words implies that you instead want to store the individual words. The scope is also wrong if you want to use it after the loop is done. You need to declare it before the loop. Use a std::vector<std::string> to store the words. It could look like this:
#include <vector>
int main(int argc, char* argv[]) {
std::vector<std::string> words;
// ...
if (file) {
std::string word;
while (file >> word) { // read one word
words.push_back(word); // store it in the vector<string>
}
std::cout << "word count: " << words.size() << '\n';
// print all the words in the vector:
for(std::string& word : words) {
std::cout << word << '\n';
}
// file.close(); // not needed, the file is closed automatically when it goes out of scope
}
}
You have a number of issues here. First, rather than adding tp to an array called words on each iteration of the loop, you're actually creating a new array words and assigning tp to the first element of the array. words is destroyed at the end of each iteration of the while loop, but even if it wasn't, the {tp} expression would overwrite whatever was already in the array. Your line to calculate the length will therefore always calculate the length as 1.
Next, word_count[e] is trying to read element e of an array called word_count, but word_count is actually an integer. I'd be surprised if this actually compiled.
First I'd recommend you use a std::vector, and then if you just increase its scope to outside the loop then you can use it elsewhere. For example:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
int main(int argc, char* argv[]) {
std::fstream file;
file.open(argv[i], std::ios::in); // i comes from the part of the question's code elided as "...."
std::vector<std::string> words; // declared outside the loop, so it stays in scope afterwards
if (file.is_open()) {
std::string tp;
while (std::getline(file, tp)) {
words.push_back(tp); // Add tp to the vector of words
}
std::size_t word_count = words.size(); // vectors 'know' their own size.
for (std::size_t e = 0; e < word_count; e++) {
std::cout << words[e];
}
file.close();
} else {
std::cout << "E: File / Directory " << argv[i] << " Does not exist";
return 0;
}
// You can now use words as you please, because it is still in scope.
}
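Both answers come down to the same idea: declare the container before the loop and fill it inside the loop. A minimal sketch that combines them is shown below. The function name readWords is made up for illustration; it reads line by line as in the question, splits each line into words, and returns the vector so the caller can use the words and their count anywhere.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every whitespace-separated word of a stream.
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;  // declared before the loop, so it outlives it
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream lineStream(line);  // split the current line
        std::string word;
        while (lineStream >> word) {
            words.push_back(word);
        }
    }
    return words;  // words.size() is the word count
}
```

The caller can then write something like std::vector<std::string> words = readWords(file); and loop over words as often as needed.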

stock data from file into arrays c++

I have this specific code to read integers from a text file:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;
bool contains_number(const string &c);
int main()
{
int from[50], to[50];
int count = 0;
{
string line1[50];
ifstream myfile("test.txt");
int a = 0;
if (!myfile)
{
cout << "Error opening output file" << endl;
}
while (!myfile.eof())
{
getline(myfile, line1[a]);
if (contains_number(line1[a]))
{
count += 1;
myfile >> from[a];
myfile >> to[a];
//cout << "from:" << from[a] << "\n";
//cout << "to:" << to[a] << "\n";
}
}
}
return 0;
}
bool contains_number(const string &c)
{
return (c.find_first_of("1:50") != string::npos);
}
I need to store the values of from[] and to[] in two arrays so that I can use them in another function. I tried to create two arrays in a simple way and assign the values, for example:
int x[], y[];
myfile >> from[a];
for(int i=0; i<50;i++)
{
x[i] = from[i];
}
but it doesn't work. It seems that this approach only works for reading and displaying: a value in from is lost as soon as the next value comes in.
Any help?
Thanks.
You're not incrementing your array index a in your loop. This results in line1[0], to[0] and from[0] being overwritten for every line in the file where contains_number returns true.
There is no reason for you to save your lines into memory. You can just process your lines as you go through the file (i.e. create a string line variable in your while loop).
Make sure you properly close your file handle.
Aside from that you should check your index bounds in the loop (a < 50), else you might be writing out of bounds of your arrays if your file has more numbers than 50.
A better solution yet would be to use vectors instead of arrays, especially if your file may contain any number of numbers.
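The points above could be put together roughly as follows. This is only a sketch under a stated assumption: the layout of test.txt is not shown in the question, so the code assumes that each data line holds two integers separated by whitespace (from, then to) and skips every line that does not parse that way. The name readRanges is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper. Assumption: a data line looks like "<from> <to>".
std::vector<std::pair<int, int>> readRanges(std::istream& in) {
    std::vector<std::pair<int, int>> ranges;  // grows as needed, no fixed 50
    std::string line;                         // one line at a time, not stored
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int from = 0;
        int to = 0;
        // Keep the line only if both integers could be extracted
        if (fields >> from >> to) {
            ranges.emplace_back(from, to);
        }
    }
    return ranges;
}
```

The returned vector can be passed to any other function, and ranges.size() replaces the separate count variable.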

Why isn't a string vector value converted to cstring the equivalent of manually writing the string?

I have an input file that contains a list of .txt files in a folder. I loop through the input file just fine and put the .txt file filepaths in string vectors. However, when I try to open another ifstream using one of the filepaths in the sections vector (string vector value converted to cstring),
std::ifstream secFile(sections[i].c_str());
The line secFile.fail() returns true meaning it fails. If I instead use the currently commented out line that hardcodes a filepath (manually writing the string) rather than getting it from a vector,
//std::ifstream secFile("test2/main0.txt");
it no longer fails. I even tried outputting sections[0].c_str() and "test2/main0.txt" to a text file and the text for each is exactly the same. I even compared the hexadecimal values for the text file and there were no invisible characters that might cause such an issue.
Any idea what the problem might be?
Here is my code:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstring>
//using namespace std;
int main (int argc, char* argv[]){
if(argc != 2){
return(0);
}
std::vector<std::string> sections;
std::vector<std::string> overlaps;
std::ifstream file(argv[1]);
std::string str;
std::string secLine;
std::string overlapLine;
std::string strLow;
std::string wholePage = "";
//determine if input text file is overlap text or main text
while (getline(file, str))
{
if (str.find("overlap")!=-1){
overlaps.push_back(str);
}
else{
sections.push_back(str);
}
}
file.clear();
for(int i = 0; i < sections.size();i++){
//HERE IS MY QUESTION
std::ifstream secFile(sections[i].c_str());
//std::ifstream secFile("test2/main0.txt");
if(secFile.good()){
std::cout << "\ngood4\n";
}
if(secFile.bad()){
std::cout << "bad4\n";
}
if(secFile.fail()){
std::cout << "fail4\n";
}
if(secFile.eof()){
std::cout << "eof4\n";
}
int secLength = 0;
//determine number of files in test2/
while (getline(secFile,secLine)){
secLength++;
}
secFile.clear();
secFile.seekg(0);
int j = 0;
while (getline(secFile,secLine)){
if (i == 0 && j==0){
wholePage += std::string(secLine) + "\n";
}
else if(j==0){
//do nothing
}
else if(i == (sections.size()-1) && j == secLength){
wholePage += std::string(secLine) + "\n";
}
else if(j == secLength){
//do nothing
}
else{
wholePage += std::string(secLine) + "\n";
}
j++;
}
int k = 0;
if(i < sections.size()-1){
std::ifstream overFile(overlaps[i].c_str());
int overLength = 0;
while (getline(overFile,overlapLine)){
overLength++;
}
while (getline(overFile,overlapLine)){
std::cout << "Hi5";
if(k == 0){
//do nothing
}
else if(k == overLength){
//do nothing
}
else{
if (wholePage.find(overlapLine)){
//do nothing
}
else{
wholePage += std::string(secLine) + "\n";
}
}
}
k++;
}
}
std::ofstream out("output.txt");
out << wholePage;
out.close();
std::cout << "\n";
return 0;
}
You haven't provided enough information to be sure, but the most likely problem is whitespace. getline doesn't strip the trailing whitespace from the lines it produces, so you might be trying to open a file named "test2/main0.txt " (trailing space), which is distinct from "test2/main0.txt". You'll want to trim trailing whitespace in most cases, likely before storing the string in your vector. Since some whitespace can legally be part of a filename, the real solution would be to make sure the garbage whitespace isn't there in the first place, but trailing whitespace in filenames is rare enough that you could just hope the file names don't use it.
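A minimal sketch of such trimming is shown below; the function name trimTrailingWhitespace is made up for illustration. Besides spaces and tabs it also removes a trailing '\r', which getline leaves at the end of each line when the list file uses Windows (CRLF) line endings and is read where those are not translated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return a copy of text without trailing whitespace.
std::string trimTrailingWhitespace(std::string text) {
    const std::string whitespace = " \t\r\n\f\v";
    // Index of the last character that is not whitespace
    const std::size_t last = text.find_last_not_of(whitespace);
    if (last == std::string::npos) {
        text.clear();          // the string was empty or all whitespace
    } else {
        text.erase(last + 1);  // drop everything after that character
    }
    return text;
}
```

In the question's loop this would be applied before storing the line, for example sections.push_back(trimTrailingWhitespace(str));.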
Here you are passing a filename:
std::ifstream secFile("test2/main0.txt");
Here you are passing a line of text from a file:
std::ifstream secFile(sections[i].c_str());
ifstream expects a filename, not a line of text from a file. It is failing because the text you are inputting doesn't represent a file you are trying to open.

Read a file of strings with quotes and commas into string array

Let's say I have a file of names such as:
"erica","bosley","bob","david","janice"
That is, quotes around each name, each name separated by a comma with no space in between.
I want to read these into an array of strings, but can't seem to find the ignore/get/getline/whatever combo to work. I imagine this is a common problem but I'm trying to get better at file I/O and don't know much yet. Here's a basic version that just reads in the entire file as one string (NOT what I want, obviously):
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
fstream iFile("names.txt", ios::in);
string names[5];
int index = 0;
while(iFile)
{
iFile >> names[index];
index++;
}
for(int i = 0; i < 5; i++)
{
cout << "names[" << i << "]: " << names[i] << endl;
}
Output:
names[0]: "erica","bosley","bob","david","janice"
names[1]:
names[2]:
names[3]:
names[4]:
Also, I understand why it all gets read as a single string, but then why are the remaining elements not filled with garbage?
To be clear, I want the output to look like:
names[0]: erica
names[1]: bosley
names[2]: bob
names[3]: david
names[4]: janice
The easiest way to handle this:
1. Read the entire file into a string.
2. Split the string from step 1 on commas.
Stream extraction (>>) splits on whitespace, and this file contains none, so the entire file is read as one string. What you want instead is to split the string on commas.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>
std::ifstream iFile("names.txt");
std::string file;
iFile >> file; // the file has no whitespace, so this reads all of it
std::istringstream ss(file);
std::string token;
std::vector<std::string> names;
while(std::getline(ss, token, ',')) {
names.push_back(token);
}
To remove the quotes, use this code:
for (unsigned int i = 0; i < names.size(); i++) {
auto it = std::remove_if(names[i].begin(), names[i].end(), [&] (char c) { return c == '"'; });
names[i] = std::string(names[i].begin(), it);
}
remove_if returns the end iterator for the transformed string, which is why you construct the new string with (s.begin(), it).
Then output it:
for (unsigned int i = 0; i < names.size(); i++) {
std::cout << "names["<<i<<"]: " << names[i] << std::endl;
}
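As an alternative to splitting first and removing the quotes afterwards, the standard library manipulator std::quoted (in <iomanip>, available since C++14) can read one quoted field at a time and strips the quotes itself. The sketch below assumes the file looks exactly like the sample in the question: quoted names separated by single commas. The function name readQuotedNames is hypothetical.

```cpp
#include <iomanip>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper. Assumption: input is "name","name",... as in the question.
std::vector<std::string> readQuotedNames(std::istream& in) {
    std::vector<std::string> names;
    std::string name;
    // std::quoted extracts the text between a pair of double quotes
    while (in >> std::quoted(name)) {
        names.push_back(name);
        // Skip the comma that separates this field from the next one
        if (in.peek() == ',') {
            in.ignore();
        }
    }
    return names;
}
```

Because the result is a std::vector, the number of names does not have to be known in advance, unlike with string names[5].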

how to count the characters in a text file

I'm trying to count the characters inside a text file in C++. This is what I have so far. For some reason I'm getting 4, even though the file has 123456 characters in it. If I increase or decrease the number of characters, I still get 4. Please help, and thanks in advance.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
const char FileName[] = "text.txt";
int main ()
{
string line;
ifstream inMyStream (FileName);
int c;
if (inMyStream.is_open())
{
while( getline (inMyStream, line)){
cout<<line<<endl;
c++;
}
}
inMyStream.close();
system("pause");
return 0;
}
You're counting the lines.
You should count the characters. Change it to:
while( getline ( inMyStream, line ) )
{
cout << line << endl;
c += line.length();
}
There are probably hundreds of ways to do that.
I believe the most efficient is:
inMyStream.seekg(0,std::ios_base::end);
std::streampos end_pos = inMyStream.tellg();
return end_pos;
First of all, you have to init a local var, this means:
int c = 0;
instead of
int c;
I think the old and easy to understand way is to call get() until the stream reaches the end of the file:
char current_char;
if (inMyStream.is_open())
{
// get() returns the stream, which tests false once no character could be read,
// so no separate comparison against EOF is needed
while(inMyStream.get(current_char)){
c++;
}
}
Then c will be the count of the characters.
This is how I would approach the problem (note that it does not count spaces):
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
string line;
int sum=0;
ifstream inData ;
inData.open("countletters.txt");
while(getline(inData,line)) // stops as soon as no further line can be read
{
int numofChars= line.length();
for (unsigned int n = 0; n<line.length();n++)
{
if (line.at(n) == ' ')
{
numofChars--;
}
}
sum=numofChars+sum;
}
cout << "Number of characters: "<< sum << endl;
return 0 ;
}
Just use good old C FILE pointers:
int fileLen(std::string fileName)
{
FILE *f = fopen(fileName.c_str(), "rb");
if (f == NULL || ferror(f))
{
if (f)
fclose(f);
return -1;
}
fseek(f, 0, SEEK_END);
int len = ftell(f);
fclose(f);
return len;
}
I found this simple method; I hope it helps:
// peek() returns EOF once there is nothing left to read
while(txtFile.peek() != EOF)
{
c = txtFile.get(); // consume the character
noOfChars++;
}
This works for sure, it is designed to read character by character.
It could be easily put into a class and you may apply function for every char, so you may check for '\n', ' ' and so on. Just have some members in your class, where they can be saved, so you may only return 0 and use methods to get what exactly you want.
#include <iostream>
#include <fstream>
#include <string>
unsigned long int count(std::string string)
{
char c;
unsigned long int cc = 0;
std::ifstream FILE;
FILE.open(string);
if (!FILE.fail())
{
while (1)
{
FILE.get(c);
if (FILE.eof()) break;
cc++; //or apply a function to work with this char..eg: analyze(c);
}
FILE.close();
}
else
{
std::cout << "Counter: Failed to open file: " << string << std::endl;
}
return cc;
};
int main()
{
std::cout << count("C:/test/ovecky.txt") << std::endl;
std::cin.get(); // keep the console window open until Enter is pressed
return 0;
}
C++ provides a simple set of functions for retrieving the size of a stream.
In your case, we want to find the end of the file, which can be done by calling seekg with the end constant.
Note that end here is not an end iterator; it is the stream's own seek-direction constant (std::ios_base::end).
Once we have seeked to the end of the file, we read the position of the stream pointer with tellg (in our case, that position is the character count).
But we're not done yet. We also need to set the stream pointer back to its original position, otherwise we would keep reading from the end of the file, which we don't want.
So let's call seekg again, but this time set the position to the beginning of the file using the beg constant:
std::ifstream stream(filepath);
//Seek to end of opened file
stream.seekg(0, stream.end);
int size = stream.tellg();
//reset file pointer to the beginning of the file
stream.seekg(0, stream.beg);