I am trying to parse a large text file and split it up into single words using strtok. The delimiters remove all special characters, whitespace, and new lines. For some reason when I printf() it, it only prints the first word and a bunch of (null) for the rest.
ifstream textstream(textFile);
string textLine;
while (getline(textstream, textLine))
{
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
strcpy(line_c, textLine.c_str()); // copies the line string into the character array
char *word = strtok(line_c, delimiters); // removes all unwanted characters
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
printf("%s", word);
}
}
Rather than jumping through the hoops necessary to use strtok, I'd write a little replacement that works directly with strings, without modifying its input, something on this general order:
std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
std::vector<std::string> ret;
std::string::size_type start = 0; // same type as npos, so the comparison below is exact
while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
auto stop = input.find_first_of(delims, start+1);
ret.push_back(input.substr(start, stop-start));
start = stop;
}
return ret;
}
At least to me, this seems to simplify the rest of the code quite a bit:
std::string textLine;
while (std::getline(textStream, textLine)) {
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
auto words = tokenize(textLine, delims);
for (auto const &word : words) {
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
std::cout << word << '\n';
}
}
This also avoids (among other things) the massive memory leak you had, allocating memory every iteration of your loop, but never freeing any of it.
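Move the printf two lines up, so that it prints the current word before strtok advances to the next one. As written, the last call to strtok returns NULL, and that NULL is what gets passed to printf.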
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
printf("%s", word);
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
}
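To see why the order matters, here is a minimal sketch of the same strtok loop in isolation. The helper name split_with_strtok and the std::vector<char> buffer (used in place of new char[]) are assumptions made for illustration; they are not part of the original code.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on any character in `delimiters`.
// strtok modifies its input, so it works on a private, writable copy.
std::vector<std::string> split_with_strtok(const std::string& line,
                                           const char* delimiters) {
    // Copy the text including its terminating '\0'; the vector frees itself.
    std::vector<char> buffer(line.c_str(), line.c_str() + line.size() + 1);
    std::vector<std::string> words;
    for (char* word = std::strtok(buffer.data(), delimiters);
         word != nullptr;
         word = std::strtok(nullptr, delimiters)) {
        // Use the token here, before asking strtok for the next one.
        words.emplace_back(word);
    }
    return words;
}
```

Note also that the original loop condition ends the loop at the first word whose count is not above the threshold. If the intent is to skip such words and keep going, the count check belongs in an if inside the loop rather than in the while condition.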
As #j23 pointed out, your printf is in the wrong location.
As #Jerry-Coffin points out, there are more idiomatic, modern C++ ways to accomplish what you are trying to do. Besides avoiding mutation, you can also avoid copying the words out of the text string. (In my code below, we read line by line, but if you know your whole text fits into memory, you could just as well read the whole content into a std::string.)
Using std::string_view avoids the extra copies, since it is essentially just a pointer into your string plus a length.
Here is how it looks for a use case where you do not need to store the words in another data structure, that is, some kind of one-pass processing of words:
#include <iostream>
#include <fstream>
#include <string>
#include <string_view>
#include <cctype>
template <class F>
void with_lines(std::istream& stream, F body) {
for (std::string line; std::getline(stream,line);) {
body(line);
}
}
template <class F>
void with_words(std::istream& stream, F body) {
with_lines(stream,[&body](std::string& line) {
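std::string_view line_view{line}; // view over the whole line, no copy (works in C++17)
while (!line_view.empty()) {
// skip whitespace (cast to unsigned char: isspace is undefined for negative values)
for (; !line_view.empty() && std::isspace(static_cast<unsigned char>(line_view[0]));
line_view.remove_prefix(1));
size_t position = 0;
for (; position < line_view.size() &&
!std::isspace(static_cast<unsigned char>(line_view[position]));
position++);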
if (position > 0) {
body(line_view.substr(0,position));
line_view.remove_prefix(position);
}
}
});
}
int main (int argc, const char* argv[]) {
size_t word_count = 0;
std::ifstream stream{"input.txt"};
if(!stream) {
std::cerr
<< "could not open file input.txt" << std::endl;
return -1;
}
with_words(stream, [&word_count] (std::string_view word) {
std::cout << word_count << " " << word << std::endl;
word_count++;
});
std::cout
<< "input.txt contains "
<< word_count << " words."
<< std::endl;
return 0;
}
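The code above splits on whitespace only, while the question uses a set of delimiter characters. A minimal sketch of the same string_view idea with an explicit delimiter set could look like this; the name split_views and its signature are hypothetical.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into `text`, one per token.
// No characters are copied, so `text` must outlive the returned views.
std::vector<std::string_view> split_views(std::string_view text,
                                          std::string_view delims) {
    std::vector<std::string_view> tokens;
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string_view::npos) {
        std::size_t stop = text.find_first_of(delims, start);
        // substr clamps the count, so stop == npos takes the rest of the text.
        tokens.push_back(text.substr(start, stop - start));
        if (stop == std::string_view::npos) break;
        start = text.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

Because the result only points into the input, it must not be used after the underlying std::string has been destroyed or modified, for example after the next std::getline call overwrites the line.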
I have a program that reverses the letters in a sentence but keeps the words in the same order. I need to change the code from an iostream library to an fstream library where the user inputs a sentence into an input file("input.txt") and the program outputs the reverse into an output text file.
example of input:
This problem is too easy for me. I am an amazing programmer. Do you agree?
Example of output:
sihT melborp si oot ysae rof em. I ma na gnizama remmargorp. oD uoy eerga?
The code I already have:
int main()
{
int i=0, j=0, k=0, l=0;
char x[14] = "I LOVE CODING";
char y[14] = {'\0'};
for(i=0; i<=14; i++) {
if(x[i]==' ' || x[i]=='\0') {
for(j=i-1; j>=l; j--)
y[k++] = x[j];
y[k++] = ' ';
l=i+1;
}
}
cout << y;
return 0;
}
I would use std::string to store the string, and benefit from std::vector and const_iterator to make better use of C++ features:
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s("This problem is too easy for me. I am an amazing programmer. Do you agree?");
const char delim = ' ';
std::vector<std::string> v;
std::string tmp;
for(std::string::const_iterator i = s.begin(); i <= s.end(); ++i)
{
if(i != s.end() && *i != delim) // test for the end first; dereferencing s.end() is undefined
{
tmp += *i;
}else
{
v.push_back(tmp);
tmp = "";
}
}
for(std::vector<std::string>::const_iterator it = v.begin(); it != v.end(); ++it)
{
std::string str = *it,b;
for(int i=str.size()-1;i>=0;i--)
b+=str[i];
std::cout << b << " ";
}
std::cout << std::endl;
}
Output:
sihT melborp si oot ysae rof .em I ma na gnizama .remmargorp oD uoy ?eerga
The code that you submitted looks much more like C than C++. I'm not sure whether you are familiar with std::string and function calls. As the code you wrote is pretty sophisticated, I will assume that you are.
Here is an example of how to use fstream. I almost always use getline for the input because I find that it gets me into fewer problems.
I then almost always use stringstream for parsing the line because it neatly splits the lines at each space.
Finally, I try to figure out a while() or do{}while(); loop that will trigger off of the input from the getline() call.
Note that if the word ends in a punctuation character, to keep the punctuation at the end, the reverse_word() function has to look for non-alpha characters at the end and then save that aside. This could be done by only reversing runs of alphas.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
///////////////////
/// return true if ch is alpha
/// return false for digits, punctuation, and all else
bool is_letter(char ch){
if((ch >= 'A' && ch <= 'Z') ||
(ch >= 'a' && ch <= 'z')) {
return true;
} else {
return false;
}
}
////////
// Only reverse the letter portion of each word
//
std::string reverse_word(std::string str)
{
std::string output_str; // Probably have to create a copy for output
output_str.reserve(str.length()); // reserve size equal to input string
// iterate through each letter of the string, backwards,
// and copy the letters to the new string
char save_non_alpha = 0;
for (auto it = str.rbegin(); it != str.rend(); it++) {
/// If the last character is punctuation, then save it to paste on the end
if(it == str.rbegin() && !is_letter(*it)) {
save_non_alpha = *it;
} else {
output_str += *it;
}
}
if(save_non_alpha != 0) {
output_str += save_non_alpha;
}
return output_str; // send string back to caller
}
int main()
{
std::string input_file_name{"input.txt"};
std::string output_file_name{"output.txt"};
std::string input_line;
std::ifstream inFile;
std::ofstream outFile;
inFile.open(input_file_name, std::ios::in);
outFile.open(output_file_name, std::ios::out);
// if the file open failed, then exit
if (!inFile.is_open() || !outFile.is_open()) {
std::cout << "File " << input_file_name
<< " or file " << output_file_name
<< " could not be opened...exiting\n";
return -1;
}
while (std::getline(inFile, input_line)) {
std::string word;
std::string sentence;
std::stringstream stream(input_line);
// I just like stringstreams. Process the input_line
// as a series of words from stringstream. Stringstream
// will split on whitespace. Punctuation will be reversed with the
// word that it is touching
while (stream >> word) {
if(!sentence.empty()) // add a space before all but the first word
sentence += " ";
word = reverse_word(word);
sentence += word;
}
outFile << sentence << std::endl;
}
inFile.close();
outFile.close();
return 0;
}
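The note above mentions that reversing only runs of alphabetic characters would also keep punctuation in place. A minimal sketch of that alternative follows; the function name reverse_letter_runs is hypothetical, and it uses std::isalpha rather than the is_letter() helper above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reverses every maximal run of letters in place,
// leaving digits, punctuation and whitespace where they are.
std::string reverse_letter_runs(std::string text) {
    auto is_alpha = [](char ch) {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0;
    };
    std::size_t i = 0;
    while (i < text.size()) {
        if (!is_alpha(text[i])) { ++i; continue; }
        std::size_t j = i;
        while (j < text.size() && is_alpha(text[j])) ++j;  // find end of run
        std::reverse(text.begin() + i, text.begin() + j);
        i = j;
    }
    return text;
}
```

This can be applied to a whole line at once, so no stringstream is needed and the original spacing is preserved. One difference in behavior: an apostrophe splits a word into two runs, so "don't" becomes "nod't".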
I need to check the words inside a string to see whether any of them contains digits, and if one doesn't, erase that word. Then print out the modified string.
Here's my struggle to resolve the problem, but it doesn't work as I need it to:
void sentence_without_latin_character( std::string &s ) {
std::cout << std::endl;
std::istringstream is (s);
std::string word;
std::vector<std::string> words_with_other_characters;
while (is >> word) {
std::string::size_type temp_size = word.find(std::ctype_base::digit);
if (temp_size == std::string::npos) {
word.erase(word.begin(), word.begin() + temp_size);
}
words_with_other_characters.push_back(word);
}
for (const auto i: words_with_other_characters) {
std::cout << i << " ";
}
std::cout << std::endl;
}
This part is not doing what you think it does:
word.find(std::ctype_base::digit);
std::string::find only searches for complete substrings (or single characters).
If you want to search for a set of some characters in a string, use std::string::find_first_of instead.
Another option is testing each character using something like std::isdigit, possibly with an algorithm like std::any_of or with a simple loop.
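Both options can be sketched as small predicates. The function names has_digit_find and has_digit_any_of are hypothetical; the calls themselves are standard library.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Variant 1: search for any character from a set.
bool has_digit_find(const std::string& word) {
    return word.find_first_of("0123456789") != std::string::npos;
}

// Variant 2: test each character with std::isdigit.
// The lambda takes unsigned char because std::isdigit is undefined
// for negative values other than EOF.
bool has_digit_any_of(const std::string& word) {
    return std::any_of(word.begin(), word.end(), [](unsigned char ch) {
        return std::isdigit(ch) != 0;
    });
}
```

Either predicate can then decide, inside the while (is >> word) loop, whether a word is kept or dropped.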
As Acorn explained, word.find(std::ctype_base::digit) does not search for the first digit. std::ctype_base::digit is a constant that indicates a digit to specific std::ctype methods. In fact there's a std::ctype method called scan_is that you can use for this purpose.
void sentence_without_latin_character( std::string &s ) {
std::istringstream is (s);
std::string word;
s.clear();
auto& ctype = std::use_facet<std::ctype<char>>(std::locale("en_US.utf8"));
while (is >> word) {
auto p = ctype.scan_is(std::ctype_base::digit, word.data(), &word.back()+1);
if (p == &word.back()+1) {
s += word;
if (is.peek() == ' ') s += ' ';
}
}
std::cout << s << std::endl;
}
I'm having difficulty creating a function that reverses the order of the words in a sentence. I've read about many functions that recursively reverse the letters, and I have successfully done so, but I do not want to reverse the letters in the words. I want to reverse the placement of the words in the sentence.
Example would be:
This is a sentence.
sentence. a is This
This is my code so far. How do I go from reversing order of letters of the entire sentence to placement order of words in a sentence?
The output of the current code would provide: !dlroW olleH
void reverse(const std::string str)
{
int length = str.size();
if(length > 0)
{
reverse(str.substr(0,length-1));
std::cout << str[0];
}
}
Edit: Additional question. If this was a char array would the logic be different?
Simplify your logic by using a std::istringstream and a helper function. The program below works for me.
#include <sstream>
#include <iostream>
#include <string>
void reverse(std::istringstream& stream)
{
std::string word;
if ( stream >> word )
{
reverse(stream);
std::cout << word << " ";
}
}
void reverse(const std::string str)
{
std::istringstream stream(str);
reverse(stream);
std::cout << std::endl;
}
int main(int argc, char** argv)
{
if (argc < 2) return 1; // no sentence was given on the command line
reverse(argv[1]);
return 0;
}
// Pass string which comes after space
// reverse("This is a sentence.")
// reverse("is a sentence.")
// reverse("a sentence.")
// reverse("sentence.")
// will not find space
// start print only word in that function
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" "); // same type as npos
if (pos == string::npos) // exit condition
{
string str1 = str.substr(0, pos);
cout << str1.c_str() << " " ;
return;
}
reverse(str.substr(pos+1));
cout << str.substr(0, pos).c_str() << " ";
}
Simple to understand:
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" "); // same type as npos
if (pos != string::npos) // exit condition
{
reverse(str.substr(pos + 1));
}
cout << str.substr(0, pos).c_str() << " ";
}
std::vector<std::string> splitString(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> tokens;
while (getline(ss, item, delim)) {
tokens.push_back(item);
}
return tokens;
}
void reverseString(const std::string& string) {
std::vector<std::string> words = splitString(string, ' ');
auto end = words.rend();
for (auto it = words.rbegin(); it != end; it++) { // stop before rend(); it must not be dereferenced
std::cout << *it << std::endl;
}
}
reverseString("This is a sentence.");
You can split the input into words and print them in reverse order.
Or if you want to use recursive structure just move the cout after calling a function like this:
void reverse(const std::string str)
{
std::stringstream ss(str);
std::string firstWord, rest;
if(ss >> firstWord)
{
getline(ss , rest);
reverse(rest);
std::cout << firstWord << " ";
}
}
I am not a C++ programmer, but I would create another array (tempWord[]) to store the individual words.
1. Scan each word and store it in the tempWord array. In your case, the words are separated by spaces, so:
a. get the index of the next space,
b. take the substring up to the index of the next space, and
c. you should get {"This", "is", "a", "sentence."}
2. Add them up again in reverse:
a. loop index i from "tempWord.length - 1" to "0"
b. newString += tempWord[i] + " ";
3. Print out the result.
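The steps above can be sketched in C++ roughly as follows. This is one possible reading of the pseudocode: it uses a std::vector in place of the tempWord array and a std::istringstream to find the word boundaries, and the name reverse_word_order is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper following the steps above.
std::string reverse_word_order(const std::string& sentence) {
    // Step 1: collect the whitespace-separated words.
    std::vector<std::string> words;
    std::istringstream stream(sentence);
    for (std::string word; stream >> word;) {
        words.push_back(word);
    }
    // Step 2: append them from last to first, one space between words.
    std::string result;
    for (auto it = words.rbegin(); it != words.rend(); ++it) {
        if (!result.empty()) result += ' ';
        result += *it;
    }
    return result;
}
```

Unlike the recursive versions, this returns the result instead of printing it, and it does not leave a trailing space.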
I want to read data from a stream that has a specific format, such as:
"number:name_that_can_contain_spaces:string,string,string..." (without the quotes), where ... means that I don't know how many comma-separated strings there are. The strings can have spaces before and after them, but not in the middle. I want to stop reading at a new line.
I have only come up with using getline() and storing each line in a string, but I don't know how to continue: is there something like strtok(line, ":",":",",","\n") that would parse it for me, or do I have to parse it myself character by character?
example of valid line format is:
54485965:abc abc abc: some string, next string , third string\n
parsed result would be:
int 54485965
string "abc abc abc"
string "some string"
string "next string"
string "third string"
You can read a line with std::getline and then split it with std::string::find and std::string::substr. In the code below we read a line from the file data, then find the first : (everything before it is the number, which we parse into an int with std::stoi) and throw away that first part. We do the same for the name. Finally, we fill a std::list with the strings separated by ,.
#include <iostream>
#include <fstream>
#include <string>
#include <list>
#include <exception>
#include <stdexcept>
struct entry {
std::string name;
int number;
std::list<std::string> others;
};
int main(int argc, char** argv) {
std::ifstream input("data");
std::list<entry> list;
std::string line;
while(std::getline(input, line)) {
entry e;
std::string::size_type i = 0;
/* get number from line */
i = line.find(":");
if(i != std::string::npos) {
e.number = stoi(line.substr(0, i));
line = line.substr(i + 1);
} else {
throw std::runtime_error("error reading file");
}
/* get name from line */
i = line.find(":");
if(i != std::string::npos) {
e.name = line.substr(0, i);
line = line.substr(i + 1);
} else {
throw std::runtime_error("error reading file");
}
/* get other strings */
do {
i = line.find(",");
e.others.push_back(line.substr(0, i));
line = line.substr(i + 1);
} while(i != std::string::npos);
list.push_back(e);
}
/* output data */
for(entry& e : list) {
std::cout << "name: " << e.name << std::endl;
std::cout << "number: " << e.number << std::endl;
std::cout << "others: ";
for(std::string& s : e.others) {
std::cout << s << ",";
}
std::cout << std::endl;
}
return 0;
}
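The code above keeps the spaces around each comma-separated string, while the expected result in the question has them removed. A minimal sketch that also trims is shown below. It swaps in std::getline with a delimiter on a std::istringstream instead of find/substr, and the names parsed_line, trim and parse_line are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct parsed_line {  // hypothetical result type
    int number;
    std::string name;
    std::vector<std::string> others;
};

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

parsed_line parse_line(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    parsed_line result;
    if (!std::getline(in, field, ':')) throw std::runtime_error("missing number");
    result.number = std::stoi(field);  // throws if the field is not numeric
    if (!std::getline(in, result.name, ':')) throw std::runtime_error("missing name");
    while (std::getline(in, field, ',')) {  // remaining comma-separated fields
        result.others.push_back(trim(field));
    }
    return result;
}
```

This sketch is lenient: a line with only one colon is accepted with an empty list of strings, so stricter validation would need extra checks.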
vector<string> wordstocheck;
in.open("readin.txt");
string line;
string word = "";
int linecount = 0;
while (getline(in, line))
{
//cout << line << endl;
for (int i = 0; i < line.size(); i++)
{
if(isalpha(line[i]))
{
word.push_back(tolower(line[i]));
}
else if (line[i] == ' ' || ispunct(line[i]) || line[i] == '\n')
{
wordstocheck.push_back(word);
word = "";
}
}
linecount++;
}
for (int i = 0; i < wordstocheck.size(); i++)
{
cout << wordstocheck[i] << endl;
}
system("pause");
}
The code above reads in the following from a .txt file:
If debugging is the
process of removing bugs.
Then programming must be the
process of putting them in.
I'm trying to get the program to recognize each word, and save that individual word into a vector, and then print that vector of words out. It does pretty well with the exception of the two 'the's on the first and third lines.
Output:
if
debugging
is
theprocess
of
removing
bugs
then
programming
must
be
theprocess
of
putting
them
in
Press any key to continue . . .
It doesn't split up "theprocess" as I had hoped.
getline consumes the newline but does not store it in the string, so your loop never sees a '\n'. However, in this case it's relatively simple to work around this problem.
Where you currently have linecount++;, add these lines before it:
if (word != "")
{
wordstocheck.push_back(word);
word = "";
}
You may want to use the same if (word != "") at the first place where you push the word onto wordstocheck: if the text has "A  Word" (two spaces in a row), you'd add the word "A" followed by an empty word, because the second space also triggers a push onto the list.
As an alternative, you could get rid of getline and just use int ch = in.get() to read one character at a time from the input. Then, instead of counting lines in the while() loop, use ch instead of line[i] all through the loop and add a second if inside the else if section that checks for a newline and increments linecount. This would probably make for shorter code.
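A minimal sketch of that character-at-a-time alternative is shown here. The function name read_words and its signature are hypothetical, and unlike the original loop it treats every non-letter (including digits) as a word separator.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper: reads `in` one character at a time, collects
// lower-cased words, and counts newlines in `linecount`.
std::vector<std::string> read_words(std::istream& in, int& linecount) {
    std::vector<std::string> words;
    std::string word;
    int ch;  // int, so that EOF can be told apart from real characters
    while ((ch = in.get()) != std::istream::traits_type::eof()) {
        if (std::isalpha(ch)) {
            word.push_back(static_cast<char>(std::tolower(ch)));
        } else {
            // Any non-letter, including '\n', ends the current word.
            if (!word.empty()) {
                words.push_back(word);
                word.clear();
            }
            if (ch == '\n') ++linecount;
        }
    }
    if (!word.empty()) words.push_back(word);  // last word without a newline
    return words;
}
```

It can be tried on a std::istringstream holding the sample text, or called with the std::ifstream from the question.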
I believe the problem is that you're expecting the newline character to be included in the result from getline(), which it isn't. It seems like if you take the two lines you already have in that block:
wordstocheck.push_back(word);
word = "";
And add them alongside the line:
linecount++;
Then it should work as you expect.
If you want to read a word at a time, why use std::getline in the first place?
// read the words into a vector of strings:
std::vector<std::string> words{std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>()};
You can use std::for_each or std::transform to convert everything to lower case, and finally print them out with for (auto const &w : words) std::cout << w << "\n";
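Put together, that approach might look like the following sketch. The function name read_lowercase_words is hypothetical; the stream is assumed to be already open.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper: reads all whitespace-separated words from `in`
// and converts each one to lower case.
std::vector<std::string> read_lowercase_words(std::istream& in) {
    std::vector<std::string> words{std::istream_iterator<std::string>(in),
                                   std::istream_iterator<std::string>()};
    for (std::string& w : words) {
        std::transform(w.begin(), w.end(), w.begin(), [](unsigned char ch) {
            return static_cast<char>(std::tolower(ch));
        });
    }
    return words;
}
```

Note that operator>> splits on whitespace only, so punctuation stays attached to its word ("bugs." rather than "bugs"); stripping it would need an extra step.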
As far as I know, getline reads a whole line and does not keep the line break in the result. The only way I know to see the line breaks is to read the file character by character.
Here is an example that gives the correct result:
#include <iostream> // std::cin, std::cout
#include <fstream> // std::ifstream
int main ()
{
char str[256];
int line = 1;
int charcount = 0;
std::cout << "Enter the name of an existing text file: ";
std::cin.get (str,256);
std::ifstream is(str);
if (!is)
{
std::cerr << "Error opening file!" << std::endl;
return -1;
}
char c;
while (is.get(c)) // loop while a character can be extracted (also works if the file contains '\0')
{
if (c == 10 || c == 13 || c == 32) // if it is a line break or carriage return or space
{
std::cout << std::endl;
line++;
}
else // everything else
{
std::cout << c;
charcount++;
}
}
is.close(); // close file
std::cout << std::endl;
std::cout << line << " lines" << std::endl;
std::cout << charcount << " chars" << std::endl;
return 0;
}