Reading in only letters from a text file

I am trying to read a poem from a text file; it contains commas, spaces, periods, and newline characters. I am using getline to read in each separate word, and I do not want to read in any of the commas, spaces, periods, or newline characters. As I read in each word I capitalize each letter and then call my insert function to insert the word into a binary search tree as a separate node. I do not know the best way to separate the words. I have been able to separate them by spaces, but the commas, periods, and newline characters keep being read in.
Here is my text file:
Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.
The code I am using is this:
string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;
ifstream fin;
fin.open(inputFile);
//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
tree.Insert(input);
countNodes++;
countHeight = tree.Height(tree.Root);
}
Basically I am using the getline(fin,input, ' ') to read in my input.

I was able to figure out a solution. I read an entire line of text into the variable line, then examined each character of the line and kept only the letters, storing them in word. Then I called my insert function to insert the node into my tree.
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int lineIdx, wordIdx, lineLength;
//get a line
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
while (fin)
{
for (int lineIdx = 0; lineIdx < lineLength;)
{
//skip over non-alphas, and check for end of line null terminator
while (!isalpha(line[lineIdx]) && line[lineIdx] != '\0')
++lineIdx;
//make sure not at the end of the line
if (line[lineIdx] != '\0')
{
//copy alphas to word c-string
wordIdx = 0;
while (isalpha(line[lineIdx]))
{
word[wordIdx] = toupper(line[lineIdx]);
wordIdx++;
lineIdx++;
}
//make it a c-string with the null terminator
word[wordIdx] = '\0';
//THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
if (tree.Find(word) == false)
{
tree.Insert(word);
totalNodes++;
//output word
//cout << word << endl;
}
else
{
tree.Counter();
}
}
}
//get the next line
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
}

This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).
From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.

You can make a custom getline function for multiple delimiters:
std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
str.clear();
// the 3rd parameter type and the condition part on the right side of &&
// should be all that differs from std::getline
for(char c; is.get(c) && delims.find(c) == std::string::npos; )
str.push_back(c);
return is;
}
And use it:
getline(fin, input, " \n,.");

You can use std::regex to select your tokens
Depending on the size of your file you can read it either line by line or entirely in an std::string.
To read the file you can use :
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
and this will match each run of characters that is neither a comma nor whitespace:
std::regex word_regex("[^,\\s]+");
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (; what != wend; ++what) {
std::smatch match = *what;
v.push_back(match.str());
}
I think that to separate tokens delimited by a comma, a space, or a newline you could use a regex such as (,| \n| )[[:alpha:]].+ . I have not tested it, though, so you should check it yourself.
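As an alternative to matching the delimiters, a pattern can match the words themselves. The sketch below uses the POSIX character class [[:alpha:]] to pick out every run of letters; letter_runs is a hypothetical helper name, and the whole text is assumed to be in a std::string already.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every maximal run of letters in text, in order of appearance.
std::vector<std::string> letter_runs(const std::string& text)
{
    static const std::regex word_regex("[[:alpha:]]+");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), word_regex), end;
         it != end; ++it)
        words.push_back(it->str());
    return words;
}
```

With this pattern commas, periods, digits, and newlines never appear in a match, so no post-processing of the tokens is needed.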

Related

How do i only read the second word from the .txt file

I want to read and extract the actor's surname from an opened text file.
I tried to do it like this, but it could only read every other word from the sentences.
The actor's surname ends with a semicolon, but I don't know how to proceed.
(I don't want to use vectors as I don't fully understand them)
bool check=false;
while (!check) //while false
{
string ActorSurname = PromptString("Please enter the surname of the actor:");
while (getline (SecondFile,line)) //got the line. in a loop so keeps doing it
{
istringstream SeperatedWords(line); //seperate word from white spaces
string WhiteSpacesDontExist;
string lastname;
while (SeperatedWords >> WhiteSpacesDontExist >> lastname) //read every word in the line //Should be only second word of every line
{
//cout<<lastname<<endl;
ToLower(WhiteSpacesDontExist);
if (lastname == ActorSurname.c_str())
{
check = true;
}
}
}
}

Assuming that each line of your file contains two words separated by a space (and the second word ends with a semicolon), below is an example of how you can read the second word from such a string:
#include <string>
#include <iostream>
int main()
{
std::string text = "John Smith;"; // In your case, 'text' will contain your getline() result
std::string::size_type spacePos = text.find(' ');
std::string secondWord;
if (spacePos != std::string::npos) // only continue if a space was found
{
std::string::size_type beginPos = spacePos + 1; // +1 because we don't want to read the space character
secondWord = text.substr(beginPos, text.size() - beginPos - 1); // -1 because we don't want to read the semicolon
}
std::cout << secondWord;
}
Output:
Smith
In this example, we use the find method of the std::string class. This method returns the position of the character we are looking for (or std::string::npos if the character was not found), which we can use to determine the begin index required by the substr method.

How to keep track of distinct chars and words?

Trying to write a function that analyzes an input file and outputs info such as distinct characters, the average length of each word, and the number of total words. I'm having trouble figuring out how to keep track of the distinct characters in a string. As an example the line:
To be or not TO BE, THAT IS the question.
Should return 10 total words, 12 distinct characters, and 3.2 average word length.
This is the code I have so far:
void fileInfo(const string& fileName)
{
ifstream in(fileName);
if (in.fail())
{
cout << "Error, bad input file.";
}
string line = "";
int wordTotal = 0;
while (getline(in, line))
{
istringstream ss(line);
string word = "";
while (ss >> word)
{
wordTotal++;
for (size_t i = 0, len = word.size(); i < len; i++)
{
if (word.at(i))
}
}
{
}

One solution is to use a std::unordered_set<char> to store the letters of each word. Since an unordered_set does not store duplicates, you end up with a set of distinct letters.
Second, you only want to count alphabetic characters, not punctuation or digits, so you need to filter each character to ensure it is alphabetic before placing it in the set.
void fileInfo(const string& fileName)
{
std::unordered_set<char> cSet;
//...
while (ss >> word)
{
wordTotal++;
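for (auto v : word)
{
// cast to unsigned char: passing a negative char value to isalpha is undefined
if (std::isalpha(static_cast<unsigned char>(v)))
cSet.insert(static_cast<char>(std::tolower(static_cast<unsigned char>(v))));
}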
}
//...
}
A character is only inserted into the set if it is alphabetic. Also note that the character inserted is the lower case version.

How can I ignore the "end of line" or "new line" character when reading text files word by word?

Objective:
I am reading a text file word by word, and am saving each word as an element in an array. I am then printing out this array, word by word. I know this could be done more efficiently, but this is for an assignment and I have to use an array.
I'm doing more with the array, such as counting repeated elements, removing certain elements, etc. I also have successfully converted the files to be entirely lowercase and without punctuation.
Current Situation:
I have a text file that looks like this:
beginning of file
more lines with some bizzare spacing
some lines next to each other
while
others are farther apart
eof
Here is some of my code, with itemsInArray initialized to 0 and an array of words referred to as wordArray[ (appropriate length for my file) ]:
ifstream infile;
infile.open(fileExample);
while (!infile.eof()) {
string temp;
getline(infile,temp,' '); // Successfully reads words seperated by a single space
if ((temp != "") && (temp != '\n') && (temp != " ") && (temp != "\n") && (temp != "\0") {
wordArray[itemsInArray] = temp;
itemsInArray++;
}
The Problem:
My code is saving the end of line character as an item in my array. In my if statement, I've listed all of the ways I have tried to exclude the end of line character, but I've had no luck.
How can I prevent the end of line character from saving as an item in my array?
I've tried a few other methods I have found on threads similar to this, including something with a *const char that I couldn't make work, as well as iterating through and deleting the new line characters. I've been working on this for hours, I don't want to repost the same issue, and have tried many many methods.

The standard >> operator overloaded for std::string already uses white-space as word boundary so your program can be simplified a lot.
#include <iostream>
#include <string>
#include <vector>
int
main()
{
std::vector<std::string> words {};
{
std::string tmp {};
while (std::cin >> tmp)
words.push_back(tmp);
}
for (const auto& word : words)
std::cout << "'" << word << "'" << std::endl;
}
For the input you are showing, this will output:
'beginning'
'of'
'file'
'more'
'lines'
'with'
'some'
'bizzare'
'spacing'
'some'
'lines'
'next'
'to'
'each'
'other'
'while'
'others'
'are'
'farther'
'apart'
'eof'
Isn't this what you want?

The stream's extraction operator should take care of that for you
std::ifstream ifs("file.txt");
std::string word;
while (ifs >> word) // the loop ends as soon as no further word can be read
{
std::cout << word << "\n";
}

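#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string n; // a std::string instead of an uninitialized char*
int count = 0;
ofstream output("user.txt");
output << "aa bb cc";
output.close();
ifstream input("user.txt");
while (input >> n) // stops cleanly at end of file
{
if (count > 0)
cout << " "; // separate the words with a single space
cout << n;
count++;
}
cout << "\ncount=" << count;
}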

Reading Words From Strings C++ While Ignoring Whitespace, Numbers, and Symbols.

I am trying to write a program that reads a text file, counts each unique word, and then sorts the list of unique words and lists the number of occurrences of each word. However, I cannot seem to read in a single word from a string without messing up and reading in letters, numbers, and symbols. I've read other topics, but my logic is severely flawed in some way that I don't see.
int main()
{
fstream fp;
string line;
fp.open("syllabus.txt", ios::in);
getline(fp, line);
string word = findWords(line);
cout << word << endl;
}
string findWords(string &line)
{
int j = 0;
string word;
for(int i = 0; i < line.size(); i++)
{
while(isalpha((unsigned char)line[j]) != 0 && isdigit((unsigned char)line[j]) != 1)
j++;
word += line.substr(0, j) + " + ";
line = line.substr(j, (line.size() - j));
}
return word;
}

There are lots of things wrong with your chunk of code. For one, you don't want to change line while you iterate through it. As a rule, you shouldn't change what you're iterating over. You want a start index and an end index (that you find from a search).
Here's a trick for you: you can read a single word with the >> operator
ifstream fp( "syllabus.txt" );
string word;
vector<string> words;
while (fp>> word)
words.push_back(word);

You read just one line in your main, but in the question you said you want to read the whole file.
Why do you define findWords to take a reference to a string (string &line) when main just passes it a string?
Your for condition, i < line.size(), is wrong: it is quite possible to run past the end of your string and get a segfault with this condition.

This loop looks rather strange:
for(int i = 0; i < line.size(); i++)
{
while(isalpha((unsigned char)line[j]) != 0 && isdigit((unsigned char)line[j]) != 1)
j++;
word += line.substr(0, j) + " + ";
line = line.substr(j, (line.size() - j));
}
Your "line" is being modified inside the loop but your "i" does not reset to the start of your new string when that happens. "i" is irrelevant in your loop anyway, it doesn't appear anywhere in it.
So why this loop?
As for the solution, there are multiple ways of doing it.
The simplest if you want to loop is to load the line into a string then use string::find_first_not_of where you have a string of all the alphabetic characters. That might not be the most efficient or even the most elegant. This returns a position, which will be std::string::npos for end of string or the position of the first non-alphabetic character.
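A sketch of that first approach, using a start index and an end index rather than modifying the line. splitLetters is a hypothetical name, and the letters string only covers the ASCII alphabet.

```cpp
#include <string>
#include <vector>

std::vector<std::string> splitLetters(const std::string& line)
{
    static const std::string letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::vector<std::string> words;
    // start: index of the first letter of the next word
    std::string::size_type start = line.find_first_of(letters);
    while (start != std::string::npos) {
        // end: index of the first non-letter after start, or npos at end of line
        std::string::size_type end = line.find_first_not_of(letters, start);
        // substr clamps the count, so end == npos takes the rest of the line
        words.push_back(line.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = line.find_first_of(letters, end);
    }
    return words;
}
```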
The next simplest is the std::find_if algorithm, which takes iterators and lets you supply your own predicate, and you can base that predicate on a character not being alphabetic. Using C++11 it is easy enough to write a lambda based on isalpha (either the old C version or an enhanced C++ version using locale if your strings may contain characters outside the regular character set). This will return an iterator, either the end() of the string or the position of the first non-alphabetic character.
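The iterator-based approach might look like the following sketch. It uses std::find_if and std::find_if_not (both C++11) with a lambda built on the C isalpha; splitWords is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

std::vector<std::string> splitWords(const std::string& line)
{
    // Taking unsigned char avoids passing a negative value to isalpha.
    auto isLetter = [](unsigned char c) { return std::isalpha(c) != 0; };
    std::vector<std::string> words;
    auto it = line.begin();
    while (true) {
        // first: the next alphabetic character, or end() if none is left
        auto first = std::find_if(it, line.end(), isLetter);
        if (first == line.end())
            break;
        // last: one past the final letter of this word
        auto last = std::find_if_not(first, line.end(), isLetter);
        words.emplace_back(first, last);
        it = last;
    }
    return words;
}
```

The line itself is never modified; only the iterators advance, which avoids the problem with the loop in the question.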

c++ strings and file input

OK, it's been a while since I've done any file input or string manipulation, but what I'm attempting to do is as follows:
while(infile >> word) {
for(int i = 0; i < word.length(); i++) {
if(word[i] == '\n') {
cout << "Found a new line" << endl;
lineNumber++;
}
if(!isalpha(word[i])) {
word.erase(i);
}
if(islower(word[i]))
word[i] = toupper(word[i]);
}
}
Now I assume this is not working because >> skips the newline character? If so, what's a better way to do this?

I'll guess that word is a std::string. When using >>, the first white-space character terminates the 'word' and the next invocation will skip white-space, so no white-space will occur in word.
You don't say what you're actually trying to do but for line based input you should consider using the free function std::getline and then splitting each line into words as a separate step.
E.g.
std::string line;
while( std::getline( std::cin, line ) )
{
// parse line
}

There is the getline function.

How about using getline()?
string line;
while(getline(infile, line))
{
//Parse each line into individual words and do whatever you're going to do with them.
}
