parse a string with regexp

What is the best way to read input like this:
(1,13) { (22,446) (200,66) (77,103) }
(779,22) {  } // this is also possible, but always (X,X) in the beginning
I would like to use regular expressions for this, but there is little information on using regexps to parse a string that contains more than just numbers. Currently I'm trying something similar with sscanf (from the C library):
string data;
getline(in, data); // format: (X,X) { (Y,Y)* }
stringstream ss(data);
string point, tmp;
ss >> point; // (X,X)
// (X,X) the reason for three is that they could be more than one digit.
sscanf(point.c_str(), "(%3d,%3d)", &midx, &midy);
int x, y;
while(ss >> tmp) // { (Y,Y) ... (Y,Y) }
{
if(tmp.size() == 5)
{
sscanf(tmp.c_str(), "(%3d,%3d)", &x, &y);
cout << "X: " << x << " Y: " << y << endl;
}
}
The problem is that this does not work: as soon as there is more than one digit, sscanf does not read the numbers. Is this the best way to go, or is there a better solution with regexps? I don't want to use Boost or anything like that, as this is part of a school assignment.

Maybe the following piece of code matches your requirements:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::smatch m;
std::string str("(1,13) { (22,446) (200,66) (77,103) }");
std::string regexstring = "(\\(\\s*\\d+\\s*,\\s*\\d+\\s*\\))\\s*(\\{)(\\s*\\(\\s*\\d+\\s*,\\s*\\d+\\s*\\)\\s*)*\\s*(\\})";
if (std::regex_match(str, m, std::regex(regexstring))) {
std::cout << "string literal matched" << std::endl;
std::cout << "matches:" << std::endl;
for (std::smatch::iterator it = m.begin(); it != m.end(); ++it) {
std::cout << *it << std::endl;
}
}
return 0;
}

Assuming you're using C++11, you could use something like: std::regex pattern(R"(\((\d+),(\d+)\)\s*\{(\s*\(\d+,\d+\))*\s*\})") (Disclaimer: This hasn't been tested), and then match it against the whole line rather than against each whitespace-separated token:
std::smatch match;
if (std::regex_match(data, match, pattern)) {
// match[0] contains the whole matched line
// match[1] contains the first number as a string
// match[2] contains the second number as a string
// match[3] contains only the last point of the list (a repeated group keeps its last iteration)
}

Related

My code with regular expressions for file doesn't run properly

#include <iostream>
#include <fstream>
#include <string>
#include <regex>
using namespace std;
int main()
{
regex r1("(.*\\blecture\\b.*)");
regex r2("(.* practice.*)");
regex r3("(.* laboratory practice.*)");
smatch base_match;
int lecture = 0;
int prakt = 0;
int lab = 0;
string name = "schedule.txt";
ifstream fin;
fin.open(name);
if (!fin.is_open()) {
cout << "didint open ";
}
else {
string str;
while (!fin.eof()) {
str = "";
getline(fin, str);
cout << str << endl;
if (regex_match(str, base_match, r1)) {
lecture++;
}
if (regex_match(str, base_match, r2)) {
prakt++;
}
if (regex_match(str, base_match, r3)) {
lab++;
}
}
}
cout << "The number of lectures: " << lecture << "\n";
cout << "The number of practices: " << prakt << "\n";
cout << "The number of laboratory work: " << lab << "\n";
fin.close();
}
In this program, I need to count the number of lectures, practice and laboratory work per week using regular expressions. I have got a text file, which you can see on the screen. But for lectures and practice, it doesn't work right.

You need the number of times each regex matches. C++ has std::sregex_iterator for performing multiple regex matches over a string.
That means you can do the following:
for (auto it = std::sregex_iterator{str.cbegin(), str.cend(), r1}; it != std::sregex_iterator{}; it++) {
lecture++;
}
If you want to get really fancy you can even do it in one go:
auto it = std::sregex_iterator{str.cbegin(), str.cend(), r1};
lecture += std::distance(it, std::sregex_iterator{});
Alternatively, you can call std::regex_search several times, starting from the end offset of the previous match (or 0 for the first).
EDIT: as remarked in the comments, this assumes that your regexes are suitable for incremental matching. Yours consume the whole string (presumably because regex_match is anchored whereas regex_search/regex_iterator are not), so you need to at least change your regular expression definitions to the following:
regex r1("\\blecture\\b");
regex r2(" practice");
regex r3(" laboratory practice");
... and of course every match for r3 is also a match for r2, but I leave that for you.

How to find a word which contains digits in a string

I need to check the words in a string to see whether any of them contains digits and, if a word doesn't, erase it. Then print out the modified string.
Here's my attempt to solve the problem, but it doesn't work as I need it to:
void sentence_without_latin_character( std::string &s ) {
std::cout << std::endl;
std::istringstream is (s);
std::string word;
std::vector<std::string> words_with_other_characters;
while (is >> word) {
std::string::size_type temp_size = word.find(std::ctype_base::digit);
if (temp_size == std::string::npos) {
word.erase(word.begin(), word.begin() + temp_size);
}
words_with_other_characters.push_back(word);
}
for (const auto i: words_with_other_characters) {
std::cout << i << " ";
}
std::cout << std::endl;
}

This part is not doing what you think it does:
word.find(std::ctype_base::digit);
std::string::find only searches for complete substrings (or single characters).
If you want to search for a set of some characters in a string, use std::string::find_first_of instead.
Another option is testing each character using something like std::isdigit, possibly with an algorithm like std::any_of or with a simple loop.

As Acorn explained, word.find(std::ctype_base::digit) does not search for the first digit. std::ctype_base::digit is a constant that indicates a digit to specific std::ctype methods. In fact there's a std::ctype method called scan_is that you can use for this purpose.
void sentence_without_latin_character( std::string &s ) {
std::istringstream is (s);
std::string word;
s.clear();
std::locale loc("en_US.utf8"); // keep the locale alive while its facet is in use
auto& ctype = std::use_facet<std::ctype<char>>(loc);
while (is >> word) {
auto p = ctype.scan_is(std::ctype_base::digit, word.data(), &word.back()+1);
if (p == &word.back()+1) {
s += word;
if (is.peek() == ' ') s += ' ';
}
}
std::cout << s << std::endl;
}

C++ Extract int from string using stringstream

I am trying to write a short line that gets a string using getline and checks it for an int using stringstream. I am having trouble with how to check if the part of the string being checked is an int. I've looked up how to do this, but most seem to throw exceptions - I need it to keep going until it hits an int.
Later I will adjust to account for a string that doesn't contain any ints, but for now any ideas on how to get past this part?
(For now, I'm just inputting a test string rather than use getline each time.)
int main() {
std::stringstream ss;
std::string input = "a b c 4 e";
ss.str("");
ss.clear();
ss << input;
int found;
std::string temp = "";
while(!ss.eof()) {
ss >> temp;
// if temp not an int
ss >> temp; // keep iterating
} else {
found = std::stoi(temp); // convert to int
}
}
std::cout << found << std::endl;
return 0;
}

You could make use of the validity of the stringstream-to-int conversion:
int main() {
std::stringstream ss;
std::string input = "a b c 4 e";
ss << input;
int found;
std::string temp;
while(std::getline(ss, temp,' ')) {
if(std::stringstream(temp)>>found)
{
std::cout<<found<<std::endl;
}
}
return 0;
}

While your question states that you wish to
get a string using getline and checks it for an int
using stringstream, it's worth noting that you don't need stringstream at all. You only use stringstreams when you want to do parsing and rudimentary string conversions.
A better idea would be to use functions defined by std::string to find if the string contains numbers as follows:
#include <iostream>
#include <string>
int main() {
std::string input = "a b c 4 e 9879";//I added some more extra characters to prove my point.
std::string numbers = "0123456789";
std::size_t found = input.find_first_of(numbers.c_str());
while (found != std::string::npos) {
std::cout << found << std::endl;
found = input.find_first_of(numbers.c_str(), found+1);
}
return 0;
}
And then perform the conversions.
Why use this? Think about what happens if you use a stringstream object on something like the following:
"abcdef123ghij"
which will simply be parsed and stored as a regular string.

Exceptions should not scare you.
int foundVal = 0;
bool found = false;
while(!found && ss >> temp) { // stop at the first int or when the input runs out
try
{
foundVal = std::stoi(temp); //try to convert
found = true;
}
catch(const std::exception&)
{
// temp is not an int: keep iterating
}
}
if(found)
std::cout << foundVal << std::endl;
else
std::cout << "No integers found" << std::endl;

Accept only letters

This should accept only letters, but it is not yet correct:
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
std::string line;
double d;
while (std::getline(std::cin, line))
{
std::stringstream ss(line);
if (ss >> d == false && line != "") //false because can convert to double
{
std::cout << "its characters!" << std::endl;
break;
}
std::cout << "Error!" << std::endl;
}
return 0;
}
Here is the output:
567
Error!
Error!
678fgh
Error!
567fgh678
Error!
fhg687
its characters!
Press any key to continue . . .
fhg687 should output error because of the numbers in the string.
Accepted output should contain letters only, such as ghggjh.

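You'd be much better off using std::all_of on the string, with an appropriate predicate. In your case, that predicate would be std::isalpha, wrapped in a lambda so that it receives an unsigned char and so that the call is not ambiguous with the <locale> overload (headers <algorithm> and <cctype> required). Since std::all_of returns true for an empty range, check for an empty line as well:
if (!line.empty() && std::all_of(begin(line), end(line), [](unsigned char c) { return std::isalpha(c) != 0; }))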
{
std::cout << "its characters!" << std::endl;
break;
}
std::cout << "Error!" << std::endl;

Updated: to show a fuller solution.
The simplest approach would probably be to iterate through each char in the input and check whether that char is within English-letter ranges in ascii (upper + lower):
char c;
while (std::getline(std::cin, line))
{
// Iterate through the string one letter at a time.
for (std::size_t i = 0; i < line.length(); i++) {
c = line.at(i); // Get a char from string
// if it's NOT within these bounds, then it's not a letter
if (! ( ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ) ) ) {
std::cout << "Error!" << std::endl;
// you can probably just return here as soon as you
// find a non-letter char, but it's up to you to
// decide how you want to handle it exactly
return 1;
}
}
}

You can also use regular expressions, which may come in handy if you need more flexibility.
For this question, Benjamin's answer is perfect, but just as a reference, this is how a regex could be used (notice that regex is also part of the C++11 standard):
boost::regex r("[a-zA-Z]+"); // At least one character in a-z or A-Z ranges
bool match = boost::regex_match(string, r);
if (match)
std::cout << "it's characters!" << std::endl;
else
std::cout << "Error!" << std::endl;
If string contains only alphabetic characters and at least one of them (the +), then match is true.
Requirements:
With boost: <boost/regex.hpp> and -lboost_regex.
With C++11: <regex>.

Simple string parsing without using boost [duplicate]

I'm working on an assignment for my C++ class and I was hoping I could get some help. One of my biggest problems in coding with C++ is parsing strings. I have found longer more complicated ways to parse strings but I have a very simple program I need to write which only needs to parse a string into 2 sections: a command and a data section. For instance: Insert 25 which will split it into Insert and 25.
I was planning on using an array of strings to store the data since I know that it will only split the string into 2 sections. However I also need to be able to read in strings that require no parsing such as Quit
What is the simplest way to accomplish this without using an outside library such as boost?

The simplest may be like this:
string s;
int i;
cin >> s;
if (s == "Insert")
{
cin >> i;
... // do stuff
}
else if (s == "Quit")
{
exit(0);
}
else
{
cout << "No good\n";
}
The simplest way may not be good enough if you need, e.g., good handling of user errors, extensibility, etc.

You can read strings from a stream using getline, and then do a split by finding the first position of a space character ' ' within the string, and using the substr function twice (for the command to the left of the space and for the data to the right of the space).
string line;
while (getline(cin, line)) {
size_t pos = line.find(' ');
string cmd, data;
if (pos != string::npos) {
cmd = line.substr(0, pos); // the second argument is a length, so this takes pos characters
data = line.substr(pos+1);
} else {
cmd = line;
}
cerr << "'" << cmd << "' - '" << data << "'" << endl;
}

This is another way :
string s("Insert 25");
istringstream iss(s);
string command; int value;
while (iss >> command >> value) // runs only while both extractions succeed
{
cout << "Values: " << command << " " << value << endl;
}

I like using streams for such things.
int main()
{
int Value;
std::string Identifier;
std::stringstream ss;
std::multimap<std::string, int> MyCollection;
ss << "Value 25\nValue 23\nValue 19";
while(ss >> Identifier >> Value) // stops cleanly when an extraction fails
{
MyCollection.insert(std::pair<std::string, int>(Identifier, Value));
}
for(std::multimap<std::string, int>::iterator it = MyCollection.begin(); it != MyCollection.end(); it++)
{
std::cout << it->first << std::endl;
std::cout << it->second << std::endl;
}
std::cin.get();
return 0;
}
This way you can already convert your data into the needed format, and the stream automatically splits on whitespace. It works the same way with std::fstream if you're working with files.
