I need to check each word in a string to see whether it contains digits and, if it doesn't, erase that word. Then print the modified string.
Here's my attempt to solve the problem, but it doesn't work the way I need it to:
void sentence_without_latin_character( std::string &s ) {
std::cout << std::endl;
std::istringstream is (s);
std::string word;
std::vector<std::string> words_with_other_characters;
while (is >> word) {
std::string::size_type temp_size = word.find(std::ctype_base::digit);
if (temp_size == std::string::npos) {
word.erase(word.begin(), word.begin() + temp_size);
}
words_with_other_characters.push_back(word);
}
for (const auto i: words_with_other_characters) {
std::cout << i << " ";
}
std::cout << std::endl;
}
This part is not doing what you think it does:
word.find(std::ctype_base::digit);
std::string::find only searches for complete substrings (or single characters).
If you want to search for a set of some characters in a string, use std::string::find_first_of instead.
Another option is testing each character using something like std::isdigit, possibly with an algorithm like std::any_of or with a simple loop.
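Both suggestions can be sketched roughly as follows. This is only a minimal sketch, not the asker's code: the helper names `has_digit_find_first_of`, `has_digit_any_of` and `keep_words_without_digits` are made up for illustration, and it assumes that "digit" means the ASCII characters '0' to '9' and that words containing a digit are the ones to drop.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: true if `word` contains at least one of '0'..'9'.
bool has_digit_find_first_of(const std::string& word) {
    return word.find_first_of("0123456789") != std::string::npos;
}

// Same test, written with std::any_of and std::isdigit.
// The cast to unsigned char avoids undefined behaviour for negative char values.
bool has_digit_any_of(const std::string& word) {
    return std::any_of(word.begin(), word.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
}

// Hypothetical helper: rebuilds the sentence from the words that contain no digit.
std::string keep_words_without_digits(const std::string& sentence) {
    std::istringstream in(sentence);
    std::string word, result;
    while (in >> word) {
        if (has_digit_any_of(word)) continue;  // negate this test to keep only words with digits
        if (!result.empty()) result += ' ';
        result += word;
    }
    return result;
}
```

If the opposite filter is wanted (keep only the words that do contain digits), negating the marked test should be enough.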
As Acorn explained, word.find(std::ctype_base::digit) does not search for the first digit. std::ctype_base::digit is a constant that indicates a digit to specific std::ctype methods. In fact there's a std::ctype method called scan_is that you can use for this purpose.
void sentence_without_latin_character( std::string &s ) {
std::istringstream is (s);
std::string word;
s.clear();
auto& ctype = std::use_facet<std::ctype<char>>(std::locale("en_US.utf8"));
while (is >> word) {
auto p = ctype.scan_is(std::ctype_base::digit, word.data(), &word.back()+1);
if (p == &word.back()+1) {
s += word;
if (is.peek() == ' ') s += ' ';
}
}
std::cout << s << std::endl;
}
I am trying to parse a large text file and split it up into single words using strtok. The delimiters remove all special characters, whitespace, and new lines. For some reason when I printf() it, it only prints the first word and a bunch of (null) for the rest.
ifstream textstream(textFile);
string textLine;
while (getline(textstream, textLine))
{
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
strcpy(line_c, textLine.c_str()); // copies the line string into the character array
char *word = strtok(line_c, delimiters); // removes all unwanted characters
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
printf("%s", word);
}
}
Rather than jumping through the hoops necessary to use strtok, I'd write a little replacement that works directly with strings, without modifying its input, something on this general order:
std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
std::vector<std::string> ret;
std::string::size_type start = 0; // same type that find_first_not_of returns, so the npos comparison is exact
while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
auto stop = input.find_first_of(delims, start+1);
ret.push_back(input.substr(start, stop-start));
start = stop;
}
return ret;
}
At least to me, this seems to simplify the rest of the code quite a bit:
std::string textLine;
while (std::getline(textStream, textLine)) {
struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
auto words = tokenize(textLine, delims);
for (auto const &word : words) {
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
std::cout << word << '\n';
}
}
This also avoids (among other things) the massive memory leak you had, allocating memory every iteration of your loop, but never freeing any of it.
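Move the printf two lines up, so that it prints the current word before strtok advances to the next one (which is null at the end of the line):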
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
printf("%s", word);
MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
word = strtok(NULL, delimiters); // move to next word
}
As @j23 pointed out, your printf is in the wrong location.
As @Jerry-Coffin points out, there are more modern, C++-style ways to accomplish what you are trying to do. Besides avoiding mutation, you can also avoid copying the words out of the text string. (In my code below, we read line by line, but if you know the whole text fits into memory, you could just as well read the entire content into a std::string.)
Using std::string_view avoids those extra copies, since it is essentially just a pointer into your string plus a length.
Here is how it looks for a use case where you do not need to store the words in another data structure, that is, one-pass processing of the words:
#include <iostream>
#include <fstream>
#include <string>
#include <string_view>
#include <cctype>
template <class F>
void with_lines(std::istream& stream, F body) {
for (std::string line; std::getline(stream,line);) {
body(line);
}
}
template <class F>
void with_words(std::istream& stream, F body) {
with_lines(stream,[&body](std::string& line) {
std::string_view line_view{line}; // view over the whole line, no copy (the iterator-pair constructor would need C++20)
while (!line_view.empty()) {
// skip whitespaces
for (; !line_view.empty() && isspace(line_view[0]);
line_view.remove_prefix(1));
size_t position = 0;
for (; position < line_view.size() &&
!isspace(line_view[position]);
position++);
if (position > 0) {
body(line_view.substr(0,position));
line_view.remove_prefix(position);
}
}
});
}
int main (int argc, const char* argv[]) {
size_t word_count = 0;
std::ifstream stream{"input.txt"};
if(!stream) {
std::cerr
<< "could not open file input.txt" << std::endl;
return -1;
}
with_words(stream, [&word_count] (std::string_view word) {
std::cout << word_count << " " << word << std::endl;
word_count++;
});
std::cout
<< "input.txt contains "
<< word_count << " words."
<< std::endl;
return 0;
}
I use this function to replace words and phrases in a file, and it works great.
There is just one problem:
If the sentence I want to replace contains a '+' sign, the function does not replace anything and the sentence stays the same, all because of this plus sign.
int Replace(std::string Rfrom, std::string Rto) {
auto from = "Replace+txt", to = "sentence";
for (auto filename : { "A.txt", "B.txt" }) {
ifstream infile{ filename };
string content{ ist {infile}, ist{} };
infile.close();
ofstream outfile{ filename };
regex_replace(ost{ outfile }, begin(content), end(content), regex{ from }, to);
}
return 0;
}
I also tried building the pattern from a std::string that contains the '+' sign, and that did not work either:
char c = '+';
std::string s;
s.push_back(c);
auto from = "Replace"+s+"txt", to = "sentence";
Update to my question
As suggested here, "Replace\\+txt" works great: put a "\" in front of the "+" (written as "\\" inside a C++ string literal).
But the problem is that I have a function that automatically finds the sentences I want to replace. It finds every sentence that starts with "//", because I want to automatically delete all the comments from my source file. So I also need a function that automatically adds a "\" before any "+" found in those sentences. Do you have any idea how to do this?
int countSub(const std::string& str, const std::string& sub)
{
if (sub.length() == 0) return 0;
int count = 0;
for (size_t offset = str.find(sub); offset != std::string::npos;
offset = str.find(sub, offset + sub.length()))
{
++count;
}
return count;
}
int line(std::string mfile) {
string fline;
std::string lineFix;
int i = 0;
ifstream myfile(mfile);
if (myfile.is_open()) {
while (!myfile.eof())
{
getline(myfile, fline);
if (countSub(fline, "//") != 0) {
cout << "line [" << ++i << "]:" << fline << " | Successfully deleted" << endl;
Replace(fline, " ");
}
}
}
return 0;
}
Many thanks
The character '+' has special meaning in a regular expression. It means "match the previous thing one or more times", so the pattern "Replace+txt" will match "Replacetxt" or "Replaceetxt" or "Replaceeetxt", etc.
If you want to match a literal '+' character, you need to escape it in your pattern: "Replace\\+txt".
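For patterns that are discovered at run time, the escaping can be automated. The following is a minimal sketch under a few assumptions: the names `escape_regex` and `replace_literal` are hypothetical, the regex uses the default ECMAScript grammar, and the set of characters to escape is the usual list of ECMAScript metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of every character that is
// special in the default (ECMAScript) std::regex grammar.
std::string escape_regex(const std::string& text) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string escaped;
    escaped.reserve(text.size() * 2);
    for (char c : text) {
        if (special.find(c) != std::string::npos) escaped += '\\';
        escaped += c;
    }
    return escaped;
}

// Hypothetical helper: replaces every literal occurrence of `from` in `content`.
// Note that `to` is still treated as a regex format string, so a '$' in it is special.
std::string replace_literal(const std::string& content,
                            const std::string& from,
                            const std::string& to) {
    return std::regex_replace(content, std::regex(escape_regex(from)), to);
}
```

If only literal text is ever replaced, a plain loop over std::string::find and std::string::replace would avoid the regex machinery altogether; the sketch above keeps regex_replace only because the original code already uses it.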
I'm having difficulty creating a function that reverses the order of the words in a sentence. I've read about many functions that recursively reverse the letters, and I have done that successfully, but I do not want to reverse the letters in the words. I want to reverse the placement of the words in the sentence.
An example would be:
This is a sentence.
sentence. a is This
This is my code so far. How do I go from reversing order of letters of the entire sentence to placement order of words in a sentence?
The output of the current code would provide: !dlroW olleH
void reverse(const std::string str)
{
int length = str.size();
if(length > 0)
{
reverse(str.substr(0,length-1));
std::cout << str[0];
}
}
Edit: Additional question. If this was a char array would the logic be different?
Simplify your logic by using a std::istringstream and a helper function. The program below works for me.
#include <sstream>
#include <string>
#include <iostream>
void reverse(std::istringstream& stream)
{
std::string word;
if ( stream >> word )
{
reverse(stream);
std::cout << word << " ";
}
}
void reverse(const std::string str)
{
std::istringstream stream(str);
reverse(stream);
std::cout << std::endl;
}
int main(int argc, char** argv)
{
if (argc < 2) return 1; // no sentence was given on the command line
reverse(argv[1]);
return 0;
}
// Pass string which comes after space
// reverse("This is a sentence.")
// reverse("is a sentence.")
// reverse("a sentence.")
// reverse("sentence.")
// will not find space
// start print only word in that function
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" "); // size_type, so the comparison with npos is exact
if (pos == string::npos) // exit condition
{
string str1 = str.substr(0, pos);
cout << str1.c_str() << " " ;
return;
}
reverse(str.substr(pos+1));
cout << str.substr(0, pos).c_str() << " ";
}
Simple to understand:
void reverse(const std::string str)
{
std::string::size_type pos = str.find_first_of(" "); // size_type, so the comparison with npos is exact
if (pos != string::npos) // exit condition
{
reverse(str.substr(pos + 1));
}
cout << str.substr(0, pos).c_str() << " ";
}
std::vector<std::string> splitString(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> tokens;
while (getline(ss, item, delim)) {
tokens.push_back(item);
}
return tokens;
}
void reverseString(const std::string& string) {
std::vector<std::string> words = splitString(string, ' ');
auto end = words.rend();
for (auto it = words.rbegin(); it != end; it++) { // stop before rend(); dereferencing it is undefined behaviour
std::cout << *it << std::endl;
}
}
reverseString("This is a sentence.");
You can split the input into words and print them in reverse order, as above.
Or, if you want to keep the recursive structure, just move the cout after the recursive call, like this:
void reverse(const std::string str)
{
std::stringstream ss(str);
std::string firstWord, rest;
if(ss >> firstWord)
{
getline(ss , rest);
reverse(rest);
std::cout << firstWord << " ";
}
}
I am not a C++ programmer, but I would create another array (tempWord[]) to store the individual words.
Scan each word and store it in the tempWord array. In your case, the words are separated by spaces, so:
a. get the index of the next space,
b. take the substring up to the index of the next space, and
c. you should get {"This", "is", "a", "sentence."}
Add them up again in reverse order:
a. loop index i from "tempWord.length - 1" down to "0"
b. new String += tempWord[i] + " ";
Print out the result.
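In C++ the steps above might look roughly like this. It is only a sketch: the function name `reverse_words` is made up, a std::vector stands in for the tempWord array, and stream extraction is used instead of searching for spaces by hand, so runs of several spaces collapse into one.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words of `sentence` in reverse order,
// separated by single spaces.
std::string reverse_words(const std::string& sentence) {
    // Step 1: scan the words and store them.
    std::istringstream in(sentence);
    std::vector<std::string> temp_words;
    for (std::string word; in >> word;) temp_words.push_back(word);

    // Step 2: add them up again, walking from the last word to the first.
    std::string result;
    for (auto it = temp_words.rbegin(); it != temp_words.rend(); ++it) {
        if (!result.empty()) result += ' ';
        result += *it;
    }
    return result;
}
```

Returning the string instead of printing it leaves step 3 to the caller, for example `std::cout << reverse_words("This is a sentence.") << '\n';`.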
How do I split a string into parts around a math operator? For example, given 4567*6789, I want to split the string into three parts:
First: 4567 Operation: * Second: 6789
The input comes from a text file.
char operation;
while (getline(ifs, line)){
stringstream ss(line.c_str());
char str;
//get string from stringstream
//delimiter here + - * / to split string to two part
while (ss >> str) {
if (ispunct(str)) {
operation = str;
}
}
}
Maybe, just maybe, by thinking this out, we can come up with a solution.
We know that operator>>, when reading into an int, will stop processing when it encounters a character that is not a digit. So we can use this fact.
int multiplier = 0;
ss >> multiplier;
The next characters are not digits, so they could be an operator character.
What happens if we read in a character:
char operation = '?';
ss >> operation;
Oh, I forgot to mention that the operator>> will skip spaces by default.
Lastly, we can input the second number:
int multiplicand = 0;
ss >> multiplicand;
To confirm, let's print out what we have read in:
std::cout << "First Number: " << multiplier << "\n";
std::cout << "Operation : " << operation << "\n";
std::cout << "Second Number: " << multiplicand << "\n";
Using a debugger here will help show what is happening as each statement is executed, one at a time.
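Put together, the three reads can be wrapped in one function that also reports whether the line had the expected shape. This is a minimal sketch; the `Expression` struct and the name `parse_expression` are hypothetical, and it assumes non-negative integer operands that fit in an int.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for one parsed line such as "4567*6789".
struct Expression {
    int first = 0;
    char operation = '?';
    int second = 0;
    bool ok = false;  // true only if all three parts were read
};

Expression parse_expression(const std::string& line) {
    Expression e;
    std::istringstream ss(line);
    // Each >> skips leading whitespace; the int read stops at the first non-digit.
    e.ok = static_cast<bool>(ss >> e.first >> e.operation >> e.second);
    return e;
}
```

Checking `ok` before using the fields guards against lines such as "abc" or "12+", where one of the reads fails.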
Edit 1: More complicated
You can always get more complicated and use a parser, lexer or write your own. A good method of implementation is to use a state machine.
For example, you would read a single character, then decide what to do with it depending on the state. For example, if the character is a digit, you may want to build a number. For a character (other than white space), convert it to a token and store it somewhere.
There are parse trees and other data structures which can ease the operation of parsing. There are parsing libraries out there too, such as boost::spirit, yacc, bison, flex and lex.
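A very small state machine along those lines could look like the following. This is a sketch under stated assumptions, not a full lexer: the `Token` struct and `tokenize_expression` are hypothetical names, there are only two states, numbers are unsigned digit runs, and every other non-space character becomes a one-character token.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token type: either a number or a single operator character.
struct Token {
    bool is_number;
    std::string text;
};

std::vector<Token> tokenize_expression(const std::string& input) {
    enum class State { Idle, InNumber };
    State state = State::Idle;
    std::vector<Token> tokens;
    for (char raw : input) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (std::isdigit(c)) {
            if (state == State::Idle) {        // a new number starts here
                tokens.push_back({true, std::string()});
                state = State::InNumber;
            }
            tokens.back().text += raw;         // keep building the current number
        } else {
            state = State::Idle;               // any non-digit ends the number
            if (!std::isspace(c)) tokens.push_back({false, std::string(1, raw)});
        }
    }
    return tokens;
}
```

For "4567*6789" this should yield three tokens: the number 4567, the operator *, and the number 6789.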
One way is:
char opr;
int firstNumber, SecondNumber;
ss>>firstNumber>>opr>>SecondNumber;
instead of:
while (ss >> str) {
if (ispunct(str)) {
operation = str;
}
}
Or use a regex for complex expressions. Here is an example of using regex in math expressions.
If you have a string at hand, you could simply split the string into left and right at the operator position as follows:
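char* linePtr = strdup("4567*6789"); // strdup to preserve original value
char* op = strpbrk(linePtr, "+-*/"); // first operator character, or a null pointer
if (op) {
string opStr(op,1);
*op = 0x0; // terminate the left-hand side at the operator
string lhs(linePtr);
string rhs(op+1);
cout << lhs << " " << opStr << " " << rhs;
}
free(linePtr); // strdup allocates with malloc, so release with free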
A simple solution would be to use sscanf:
int left, right;
char o;
if (sscanf("4567*6789", "%d%c%d", &left, &o, &right) == 3) {
// scan valid...
cout << left << " " << o << " " << right;
}
My proposal is to create two functions:
std::size_t delimiter_pos(const std::string line)
{
std::size_t found = std::string::npos;
(found = line.find('+')) != std::string::npos ||
(found = line.find('-')) != std::string::npos ||
(found = line.find('*')) != std::string::npos ||
(found = line.find('/')) != std::string::npos;
return found;
}
And a second function that extracts the operands:
void parse(const std::string line)
{
std::size_t pos = delimiter_pos(line);
if (pos != std::string::npos)
{
std::string first = line.substr(0, pos);
char operation = line[pos];
std::string second = line.substr(pos + 1, line.size() - (pos + 1));
}
}
I hope my examples helped you
What is the best way to read input like this:
(1,13) { (22,446) (200,66) (77,103) }
(779,22) { } // this is also possible, but always (X,X) in the beginning
I would like to use regular expressions for this, but there is little information on using regexes to parse a string that contains more than just numbers. Currently I'm trying something similar with sscanf (from the C library):
string data;
getline(in, data); // format: (X,X) { (Y,Y)* }
stringstream ss(data);
string point, tmp;
ss >> point; // (X,X)
// (X,X) the reason for three is that they could be more than one digit.
sscanf(point.c_str(), "(%3d,%3d)", &midx, &midy);
int x, y;
while(ss >> tmp) // { (Y,Y) ... (Y,Y) }
{
if(tmp.size() == 5)
{
sscanf(tmp.c_str(), "(%3d,%3d)", &x, &y);
cout << "X: " << x << " Y: " << y << endl;
}
}
The problem is that this does not work: as soon as there is more than one digit, sscanf does not read the numbers. So is this the best way to go, or is there a better solution with regexes? I don't want to use Boost or anything like that, as this is part of a school assignment.
Maybe the following piece of code matches your requirements:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::smatch m;
std::string str("(1,13) { (22,446) (200,66) (77,103) }");
std::string regexstring = "(\\(\\s*\\d+\\s*,\\s*\\d+\\s*\\))\\s*(\\{)(\\s*\\(\\s*\\d+\\s*,\\s*\\d+\\s*\\)\\s*)*\\s*(\\})";
if (std::regex_match(str, m, std::regex(regexstring))) {
std::cout << "string literal matched" << std::endl;
std::cout << "matches:" << std::endl;
for (std::smatch::iterator it = m.begin(); it != m.end(); ++it) {
std::cout << *it << std::endl;
}
}
return 0;
}
Output:
Assuming you're using C++11, you could use something like: std::regex pattern(R"(\((\d+),(\d+)\)\s*\{(\s*\(\d+,\d+\))+\s*\})") (Disclaimer: This hasn't been tested), and then use it on the whole line like so:
std::smatch match;
if (std::regex_match(data, match, pattern)) {
// match[0] contains the whole matched line
// match[1] contains the first number as a string
// match[2] contains the second number as a string
// match[3] contains only the last point of the list (a repeated group keeps its last repetition)
}
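Because a repeated capture group only remembers its last repetition, extracting every point is usually easier with std::sregex_iterator. The following is a minimal sketch; the function name `parse_points` is hypothetical, and it deliberately does not validate the braces, it only collects every "(X,Y)" pair it finds.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every "(X,Y)" pair found in `line`, in order.
// The first pair is the leading point; the rest are the points inside the braces.
std::vector<std::pair<int, int>> parse_points(const std::string& line) {
    static const std::regex point(R"(\(\s*(\d+)\s*,\s*(\d+)\s*\))");
    std::vector<std::pair<int, int>> points;
    for (std::sregex_iterator it(line.begin(), line.end(), point), end; it != end; ++it) {
        // (*it)[1] and (*it)[2] are the two captured numbers of this match.
        points.emplace_back(std::stoi((*it)[1].str()), std::stoi((*it)[2].str()));
    }
    return points;
}
```

A line such as "(779,22) { }" then gives a vector with a single element, so the empty-braces case needs no special handling.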