Using Boost-Regex to parse string into characters and numerals - c++

I'd like to use Boost's Regex library to separate a string containing labels and numbers into tokens. For example 'abc1def002g30' would be separated into {'abc','1','def','002','g','30'}. I modified the example given in Boost documentation to come up with this code:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc,char **argv){
string s,str;
int count;
do{
count=0;
if(argc == 1)
{
cout << "Enter text to split (or \"quit\" to exit): ";
getline(cin, s);
if(s == "quit") break;
}
else
s = "This is a string of tokens";
boost::regex re("[0-9]+|[a-z]+");
boost::sregex_token_iterator i(s.begin(), s.end(), re, 0);
boost::sregex_token_iterator j;
while(i != j)
{
str=*i;
cout << str << endl;
count++;
i++;
}
cout << "There were " << count << " tokens found." << endl;
}while(argc == 1);
return 0;
}
The number of tokens stored in count is correct. However, *i contains only an empty string, so nothing is printed. Any guesses as to what I am doing wrong?
EDIT: as per the fix suggested below, I modified the code and it now works correctly.

From the docs on the sregex_token_iterator:
Effects: constructs a regex_token_iterator that will enumerate one string for each regular expression match of the expression re found within the sequence [a,b), using match flags m (see match_flag_type). The string enumerated is the sub-expression submatch for each match found; if submatch is -1, then enumerates all the text sequences that did not match the expression re (that is to performs field splitting)
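Since your regex matches every token in the input (unlike the sample code, whose regex matched only the separators between the strings), a submatch of -1 leaves nothing but empty text between the matches, so you get empty results.
Try replacing the -1 with 0.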
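To make the difference between the two submatch values concrete, here is a minimal sketch. It uses std::regex from C++11 rather than Boost so that it builds without extra libraries; std::sregex_token_iterator takes the same submatch argument, but treat the equivalence with Boost's behavior as an assumption. The helper name tokenize is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects whatever the token iterator enumerates.
// submatch 0 yields each whole match; submatch -1 yields the text
// between matches (field splitting).
std::vector<std::string> tokenize(const std::string& s, int submatch) {
    assert(submatch >= -1);
    static const std::regex re("[0-9]+|[a-z]+");
    std::vector<std::string> tokens;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With submatch 0, tokenize("abc1def002g30", 0) yields abc, 1, def, 002, g, 30. With -1 the same input yields only empty strings, because the matches sit next to each other and there is no text between them.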

Related

The problem of analyzing a string and searching

I want to write code that takes a string of text from the user and shows the number of characters and the number of words, using the .find() function. It should then take a word from the user, search the text, and show the position of the word. I'm stuck, please help me.
#include<iostream>
#include <cctype>
#include<string>
#include<cstring>
using namespace std;
int main()
{ char quit;
int word=0;
string txt;
cout << "Enter a string: ";
getline(cin, txt);
cout << "The number of characters in the string is:" << txt.length() << endl;
while(string txt != NULL)
{ if(txt.find(" "))
++word;
}
cout<<"wors is "<<word;
while(quit!='q')
{
cout<<"wors is ";
cin>>search;
cout<<"Enter(c)if you want to continue, and enter(q)if you want quic:";
cin>>quit;
}
return 0;
}
Here's an example of extracting words. There are many other methods.
static const char end_of_word_chars[] = "!?., :\t";
//...
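// Start at the first character that belongs to a word.
std::string::size_type previous_position = txt.find_first_not_of(end_of_word_chars);
std::string::size_type position = txt.find_first_of(end_of_word_chars, previous_position);
while (position != std::string::npos)
{
    std::string word = txt.substr(previous_position, position - previous_position);
    std::cout << word << "\n";
    // Skip the delimiters, then find the end of the next word.
    previous_position = txt.find_first_not_of(end_of_word_chars, position);
    position = txt.find_first_of(end_of_word_chars, previous_position);
}
// The last word has no delimiter after it, so print it separately.
if (previous_position != std::string::npos)
{
    std::cout << txt.substr(previous_position) << "\n";
}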
The above code uses an array of "end of word characters" to denote the end of a word. The string txt is searched to find the position of the first character that is in the set of word-ending characters. In the while loop, the word is extracted, the spaces or other non-word characters are skipped, and the position of the next word-ending character is found, so the loop may repeat.
Edit 1: String as stream
Another method is to treat the txt as a string stream and use operator>> to skip whitespace:
std::istringstream text_stream(txt);
std::string word;
while (text_stream >> word)
{
std::cout << word << "\n";
}
One issue with the above code fragment is that it doesn't account for word ending characters that are not spaces or tabs. So for example, in the text "Yes. I'm Home.", the period is included as part of the "word", such as "Yes." and "Home."
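One way around that limitation, sketched under assumptions, is to turn every word-ending character into a space before handing the text to the stream. The helper name extract_words is hypothetical, and the separator set is copied from the find_first_of example above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits txt into words, treating every character in
// end_of_word_chars as a separator (same set as the find_first_of example).
std::vector<std::string> extract_words(std::string txt) {
    static const std::string end_of_word_chars = "!?., :\t";
    // Replace each separator with a space so operator>> can split on it.
    for (char& c : txt) {
        if (end_of_word_chars.find(c) != std::string::npos) {
            c = ' ';
        }
    }
    std::istringstream text_stream(txt);
    std::vector<std::string> words;
    std::string word;
    while (text_stream >> word) {
        words.push_back(word);
    }
    return words;
}
```

For "Yes. I'm Home." this gives Yes, I'm and Home. The apostrophe survives because it is not in the separator set, and the size of the returned vector is the word count.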

How to count how many words are in a line? Smarter way?

How do I find out how many words are in a line? I know the method where you count the spaces. But what if someone types two spaces in a row or starts the line with a space?
Is there any other or smarter way to solve this?
And is there any remark on my way of solving it or my code?
I solved it like this:
#include <iostream>
#include <cctype>
#include <cstring>
using namespace std;
int main( )
{
char str[80];
cout << "Enter a string: ";
cin.getline(str,80);
int len;
len=strlen(str);
int words = 0;
for(int i = 0; str[i] != '\0'; i++) //is space after character
{
if (isalpha(str[i]))
{
if(isspace(str[i+1]))
words++;
}
}
if(isalpha(str[len]))
{
words++;
}
cout << "The number of words = " << words+1 << endl;
return 0;
}
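The standard-library version is nearly a one-liner. The istringstream has to be a named object, because istream_iterator takes the stream by non-const reference and cannot bind to a temporary:
istringstream iss(str);
words = distance(istream_iterator<string>(iss), istream_iterator<string>());
Streams skip whitespace by default, including runs of several spaces.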
So if you do something like:
string word;
int numWords = 0;
while (cin >> word) ++numWords;
That should count the number of words for simple cases (not considering what the format of a word is, skipping spaces).
If you want per line, you could read first the line, create a stream from a string, and do a similar thing like this:
string line, word;
int wordCount = 0;
getline(cin, line);
stringstream lineStream(line);
while (lineStream >> word) ++wordCount;
Prefer the free function std::getline to cin.getline. It reads into a std::string that grows as needed, so there is no fixed-size buffer to size correctly or overrun.
First, you need a very specific definition of "word." Most of the answers will give slightly different counts than your attempt because you're using different definitions of what constitutes a word. Your example specifically requires alpha characters in certain positions. The answers based on streams will allow any non-space character to be part of a word.
The general solution is to come up with a precise definition of a word, transform this into a regular expression or finite state machine, and then count each instance of a match.
Here's a sample state machine solution:
std::size_t CountWords(const std::string &line) {
std::size_t count = 0;
enum { between_words, in_word } state = between_words;
for (const auto c : line) {
switch (state) {
case between_words:
if (std::isalpha(c)) {
state = in_word;
++count;
}
break;
case in_word:
if (std::isspace(c)) state = between_words;
break;
}
}
return count;
}
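A regular-expression counterpart to the state machine might look like the sketch below. The definition of a word used here (a run of ASCII letters, optionally joined by single apostrophes or hyphens) is an assumption chosen for illustration, and the name CountWordsRegex is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// One possible definition (an assumption, not the only choice): a word is
// a run of ASCII letters, optionally joined by single apostrophes or hyphens.
std::size_t CountWordsRegex(const std::string& line) {
    static const std::regex word_re("[A-Za-z]+(['-][A-Za-z]+)*");
    const auto first = std::sregex_iterator(line.begin(), line.end(), word_re);
    const auto last = std::sregex_iterator();
    // Each match is one word, so the distance between the iterators is the count.
    return static_cast<std::size_t>(std::distance(first, last));
}
```

The two definitions disagree on inputs such as "a,b": the state machine counts one word because only whitespace ends a word there, while this regex counts two.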
Some test cases to consider (and that highlight the differences among the definitions of a word):
"" empty string
" " just spaces
"a"
" one "
"count two"
"hyphenated-word"
"\"That's Crazy!\" she said." punctuation between alpha characters and adjacent spaces
"the answer is 42" should the number count as a word?

compare two strings by individual characters C++

Working on a program that compares an argument to text in a file (my file being a dictionary containing a lot of english words).
Right now the application works only if the strings match completely.
I wanted to know if there is a way to compare a partial input string against a complete string in the file and have it count as a match.
For example, if the arg is ap, it'll match apple, application, alliance, etc.
# include <iostream>
# include <fstream>
# include <cstdlib>
# include <string>
using namespace std;
int main ( int argc, char *argv[] ) {
ifstream inFile;
inFile.open("/Users/mikelucci/Desktop/american-english-insane");
//Check for error
if (inFile.fail()) {
cerr << "Fail to Open File" << endl;
exit(1);
}
string word;
int count = 0;
//Use loop to read through file until the end
while (!inFile.eof()) {
inFile >> word;
if (word == argv[1]) {
count++;
}
}
cout << count << " words matched." << endl;
inFile.close();
return 0;
}
If by "match" you mean "a string from file contains a string from the input" then you can use string::find method. In this case your condition would look like that:
word.find(argv[1]) != string::npos
If by "match" you mean "a string from file starts with a string from the input" then, again you can use string::find but with the following condition:
word.find(argv[1]) == 0
See the documentation for std::string::find.
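As a sketch of the "starts with" case: word.find(prefix) == 0 gives the right answer, but on a miss it searches the whole word first. The version below uses compare instead, which looks only at the leading characters. The helper names are hypothetical, and a generic std::istream stands in for the dictionary file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// True if word begins with prefix. compare() looks only at the first
// prefix.size() characters, so it never scans the rest of word.
bool starts_with(const std::string& word, const std::string& prefix) {
    return word.size() >= prefix.size() &&
           word.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: counts the whitespace-separated words in the stream
// that begin with prefix. Looping on the extraction itself avoids
// processing the last word twice, which a !eof() loop can do.
std::size_t count_prefix_matches(std::istream& in, const std::string& prefix) {
    std::size_t count = 0;
    std::string word;
    while (in >> word) {
        if (starts_with(word, prefix)) {
            ++count;
        }
    }
    return count;
}
```

An opened std::ifstream can be passed as the first argument, with std::string(argv[1]) as the prefix.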
Start by copying argv[1] into a string (not strictly necessary, but it makes the subsequent comparison a bit simpler):
std::string target(argv[1]);
Then use std::equal to test whether word starts with target:
if (word.size() >= target.size() && std::equal(target.begin(), target.end(), word.begin()))
Note that the four-iterator form of std::equal (added in C++14) returns false whenever the two ranges differ in length, so it cannot be used for a prefix test. The three-iterator form above, guarded by the size check, compares target against the first target.size() characters of word.

How to remove double quotation marks and comma in the output when I input space for the first string (C++)

I am trying to write a function that splits a string and returns the substrings. The code works. I just have one question: how do I remove the double quotation marks and comma in the output when I input a space as the first string? Any help is appreciated! Thank you!
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <ctype.h>
using namespace std;
vector<string> split(string targer, string delimiter);
int main()
{
string s, delimiter;
vector<string> tokens;
cout << "Enter string to split:" << endl;
getline (cin,s);
cout << "Enter delimiter string:" << endl;
getline (cin,delimiter);
tokens = split(s, delimiter);
cout << "The substrings are: ";
for(int i = 0; i < tokens.size() - 1; i++)
{
if (tokens[i].length() != 0){
cout << "\"" << tokens[i] << "\"" << "," << " ";
}
}
if (tokens.size() != 0)
{
cout << "\"" << tokens[tokens.size() - 1] << "\"";
}
cout<<endl;
return 0;
}
vector<string> split(string target, string delimiter){
stringstream ss(target);
string item;
vector<string> tokens;
while (getline(ss, item, delimiter.at(0))) {
tokens.push_back(item);
}
return tokens;
}
Your issue is not "how to remove double quotation marks and comma in the output when I input space for the first string".
Your issue is correctly splitting the string, so that a quoted string that contains spaces gets correctly extracted as a single string.
The simple methods the I/O library offers for parsing input up to the next delimiting character are not sufficient, on their own, to split a string in this manner. They don't know anything about quotes. You will have to do the job yourself:
Scan the input string one character at a time, using the following logic.
If the current character is a delimiting character, continue to the next character.
If the current character is a quote, then continue to scan until the next quote character is seen, then extract everything between the quotes into a single word.
Otherwise, continue to scan until the next delimiting character is seen, and then extract everything until that point into a single word.
This is a basic high-level outline of a typical splitting algorithm. You will end up with individual extracted words, with quoted content as a single word. You can take this high-level overview, and rewrite it as a lower-level, more detailed algorithm, then explain it to your rubber duck. After your rubber duck agrees that your detailed algorithm will work, you can then translate it directly into code.
Once you have implemented this correctly, you can refine it further to allow the quote characters themselves to be included in a quoted word. Typically that's done by using either two consecutive quotes or a backslash "escape" character.
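The outline above could be translated into code roughly as follows. This is a sketch under assumptions, not the only way to do it: the name split_quoted is hypothetical, the delimiter is a single character, the quote characters are dropped from the result, and escaped quotes are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits input on delimiter, keeping text between a
// pair of double quotes together as one word (the quotes are dropped).
// Escaped quotes are not handled in this sketch.
std::vector<std::string> split_quoted(const std::string& input, char delimiter) {
    assert(delimiter != '"');
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == delimiter) {
            ++i;  // skip delimiters between words
        } else if (input[i] == '"') {
            // Quoted word: take everything up to the closing quote,
            // or to the end of the input if the quote is never closed.
            const std::size_t close = input.find('"', i + 1);
            if (close == std::string::npos) {
                words.push_back(input.substr(i + 1));
                i = input.size();
            } else {
                words.push_back(input.substr(i + 1, close - i - 1));
                i = close + 1;
            }
        } else {
            // Plain word: take everything up to the next delimiter.
            std::size_t end = input.find(delimiter, i);
            if (end == std::string::npos) {
                end = input.size();
            }
            words.push_back(input.substr(i, end - i));
            i = end;
        }
    }
    return words;
}
```

Splitting one "two three" four on a space gives three words: one, two three, and four. Runs of delimiters never produce empty words, so the printing loop no longer needs to filter them out.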

How do you search a std::string for a substring in C++?

I'm trying to parse a simple string in C++. I know the string contains some text with a colon, followed immediately by a space, then a number. I'd like to extract just the number part of the string. I can't just tokenize on the space (using sstream and >>) because the text in front of the colon may or may not have spaces in it.
Some example strings might be:
Total disk space: 9852465
Free disk space: 6243863
Sectors: 4095
I'd like to use the standard library, but if you have another solution you can post that too, since others with the same question might like to see different solutions.
std::string strInput = "Total disk space: 9852465";
std::string strNumber = "0";
size_t iIndex = strInput.rfind(": ");
if(iIndex != std::string::npos && strInput.length() >= 2)
{
strNumber = strInput.substr(iIndex + 2); // everything after ": "
}
For completeness, here's a simple solution in C:
int value;
if (sscanf(mystring.c_str(), "%*[^:]:%d", &value) == 1) {
    // parsing succeeded; value holds the number
} else {
    // parsing failed
}
Explanation: the %*[^:] says to read in as many characters as possible that aren't colons, and the * suppresses assignment. Then the integer is read in, after the colon and any intervening white space.
I can't just tokenize on the space (using sstream and >>) because the text in front of the colon may or may not have spaces in it.
Right, but you can use std::getline:
string not_number;
int number;
if (not (getline(cin, not_number, ':') and cin >> number)) {
cerr << "No number found." << endl;
}
Similar to Konrad's answer, but using istream::ignore:
int number;
std::streamsize max = std::numeric_limits<std::streamsize>::max();
if (!(std::cin.ignore(max, ':') >> number)) {
std::cerr << "No number found." << std::endl;
} else {
std::cout << "Number found: " << number << std::endl;
}
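The same idea also works on a string that is already in memory, by wrapping it in a std::istringstream instead of reading std::cin. The sketch below assumes the number fits in a long; the helper name extract_number is hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: parses the integer after the first colon in line.
// Returns false (leaving number untouched) if there is no colon or no
// integer follows it.
bool extract_number(const std::string& line, long& number) {
    std::istringstream in(line);
    long parsed = 0;
    // ignore() discards everything up to and including the colon;
    // operator>> then skips the space and reads the digits.
    if (!(in.ignore(std::numeric_limits<std::streamsize>::max(), ':') >> parsed)) {
        return false;
    }
    number = parsed;
    return true;
}
```

Spaces in the label before the colon do not matter, since everything up to the colon is discarded without being tokenized.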
I'm surprised that no one mentioned regular expressions. They were added as part of TR1 and are included in Boost as well. Here's the solution using a regex:
typedef std::tr1::match_results<std::string::const_iterator> Results;
std::tr1::regex re(":[[:space:]]+([[:digit:]]+)", std::tr1::regex::extended);
std::string str("Sectors: 4095");
Results res;
if (std::tr1::regex_search(str, res, re)) {
std::cout << "Number found: " << res[1] << std::endl;
} else {
std::cerr << "No number found." << std::endl;
}
It looks like a lot more work but you get more out of it IMHO.
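Since C++11 the same classes are available in namespace std through the <regex> header. The sketch below is a hedged port of the TR1 snippet: it uses the default ECMAScript grammar, which also accepts these POSIX character classes, instead of the extended flag, and the helper name find_number is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the digits that follow a colon and
// whitespace in str, or an empty string if the pattern does not match.
std::string find_number(const std::string& str) {
    static const std::regex re(":[[:space:]]+([[:digit:]]+)");
    std::smatch res;
    if (std::regex_search(str, res, re)) {
        return res[1].str();  // first capture group: the digits
    }
    return std::string();
}
```

The result is still text; std::stol or a similar conversion would be needed to get an integer out of it.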
const std::string pattern(": ");
std::string s("Sectors: 4095");
size_t num_start = s.find(pattern) + pattern.size();