How to seperate this string using by Reg Exp - c++

For this string [268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]
I want to get the numbers in [ , ] pair by pair.
I use c++ regex_search to parse this string.
This is my testing code:
ifstream file("output.txt");
char regex_base[] = "[\\[0-9, 0-9\\]]{10}";
char regex_num[] = "[0-9]{3}";
regex reg_base(regex_base, regex_constants::icase);
regex reg_num(regex_base, regex_constants::icase);
if (file.is_open())
{
string s;
while (!file.eof()){
getline(file, s);
smatch m;
while (regex_search(s, m, reg_num)) {
for (int i = 0; i < m.size(); i++)
cout << m[i] << endl;
}
}
}
But in the while of regex_search(), the variable m only get the[268, 950] and it make a infinity loop.
What's wrong in my regular expression or my code?

I have removed the capturing groups since you seem not to be using them anyway, and added some code to just show how to obtain the matches from your input string:
char regex_base[] = "\\[[0-9]+, [0-9]+\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]"; // FOR TEST
smatch m;
while (regex_search(s, m, reg_num))
{
for (auto x:m) std::cout << x << "\r\n";
s = m.suffix().str();
}
Output:
If you need the values, you can use a different regex:
char regex_base[] = "\\[([0-9]+), ([0-9]+)\\]";
...
s = "[268, 950][268, 954][269, 955][272, 955][270, 955][268, 953]";
smatch m;
while (regex_search(s, m, reg_num))
{
std::cout << m[1] << ", " << m[2] << std::endl;
s = m.suffix().str();
}

Related

Separating a single string into separate variables

In my program I receive a string: "09:07:38,50,100"
(the numbers will be changing, only commas are consistent)
My desired output would be separating the string into 3 different variables for use in other things.
like so:
a = 09:07:38
b = 50
c = 100
Currently I tried splitting the string by separating it at commas, but I still lack the ability to put the data into different variables, or at least the knowledge on how to.
Here is my current code:
#include<iostream>
#include<vector>
#include<sstream>
int main() {
std::string my_str = "09:07:38,50,100";
std::vector<std::string> result;
std::stringstream s_stream(my_str); //create stringstream from the string
while(s_stream.good()){
std::string substr;
getline(s_stream, substr, ','); //get first string delimited by comma
result.push_back(substr);
}
for(int i = 0; i<result.size(); i++){ //print all splitted strings
std::cout << result.at(i) << std::endl;
}
}
I think is a good idea to approach this kind of problem using regular expressions:
#include <string>
#include <boost\regex.hpp>
int main()
{
std::string my_str = "09:07:38,50,100";
std::string a,b,c;
boost::regex regEx("(\\d{2}:\\d{2}:\\d{2}),(\\d*),(\\d*)");
if(boost::regex_match(my_str, regEx, boost::match_extra))
{
boost::smatch what;
boost::regex_search(my_str, what, regEx);
a = what[1];
b = what[2];
c = what[3];
}
std::cout<< "a = " << a << "\n";
std::cout<< "b = " << b << "\n";
std::cout<< "c = " << c;
}
The stoi function is what you require.
a = stoi(result.at(i));
std::cout << a;

How to run a string search algorithm through whole body of text

I am using the brute force string search algorithm to search through a small sentence, however I want the algorithm to return every time it finds the certain string instead of finding it once and then stopping
//Declare and initialise variables
string pat, text;
text = "This is a test sentence, find test within this string";
cout << text << endl;
//User input for pat
cout << "Please enter the string you want to search for" << endl;
cin >> pat;
//Set the length of the pat and text
int patLength = pat.size();
int textLength = text.size();
//Algorithm
for (int i = 0; i < textLength - patLength; ++i)
{
//Do while loop to run through the whole text
do
{
int j;
for (j = 0; j < patLength; j++)
{
if (text[i + j] != pat[j])
break; // Doesn't match here.
}
if (j == patLength)
{
finds.push(i); // Matched here.
}
} while (i < textLength);
}
//Print output
cout << "String: " << pat << " was found at positions: " << finds.top();
The program stores each find in a queue. When I run this program, it asks for the 'pat', then does nothing. I have done a bit of debugging and found that it is probably the do while loop. However I can't find a fix
You could use the std::string::find function combined with a function that you call for each find.
#include <iostream>
#include <functional>
#include <vector>
#include <sstream>
void Algorithm(
const std::string& text, const std::string& pat,
std::function<void(const std::string&,size_t)> f, std::vector<size_t>& positions)
{
size_t pos=0;
while((pos=text.find(pat, pos)) != std::string::npos) {
// store the position
positions.push_back(pos);
// call the supplied function
f(text, pos++);
}
}
// function to call for each position in which the pattern is found
void gotit(const std::string& found_in, size_t pos) {
std::cout << "Found in \"" << found_in << "\" # " << pos << "\n";
}
int main(int argc, char* argv[]) {
std::vector<std::string> args(argv+1, argv+argc);
if(args.size()==0)
args.push_back("This is a test sentence, find test within this string");
for(const auto& text : args) {
std::vector<size_t> found_at;
std::cout << "Please enter the string you want to search for: ";
std::string pat;
std::cin >> pat;
Algorithm(text, pat, gotit, found_at);
std::cout << "collected positions:\n";
for(size_t pos : found_at) {
std::cout << pos << "\n";
}
}
}
My first bit of advice would be to structure your code into separate functions.
Let's say you have a function that returns the position of the pattern's first occurrence in a sequence of characters:
using position = typename std::string::const_iterator;
position first_occurrence(position text_begin, position text_end, const std::string& pattern);
If there is no more occurrence of the pattern, it returns text_end.
You can now write a very simple loop:
auto occurrence = first_occurrence(text_begin, pattern);
while (occurrence != text_end) {
occurrences.push_back(occurrence);
occurrence = first_occurence(occurrence + 1, text_end, pattern);
}
to accumulate all the occurrences of the pattern.
The first_occurrence function already exists in the standard library under the name of std::search. Since C++17, you can customize this function with pattern-searching specialized searchers, such as std::boyer_moore_searcher: it pre-processes the pattern to make it faster to look for in the string. Here's an example application to your problem:
#include <algorithm>
#include <string>
#include <vector>
#include <functional>
using occurrence = typename std::string::const_iterator;
std::vector<occurrence> find_occurrences(const std::string& input, const std::string& pattern) {
auto engine = std::boyer_moore_searcher(pattern.begin(), pattern.end());
std::vector<occurrence> occurrences;
auto it = std::search(input.begin(), input.end(), engine);
while (it != input.end()) {
occurrences.push_back(it);
it = std::search(std::next(it), input.end(), engine);
}
return occurrences;
}
#include <iostream>
int main() {
std::string text = "This is a test sentence, find test within this string";
std::string pattern = "st";
auto occs = find_occurrences(text, pattern);
for (auto occ: occs) std::cout << std::string(occ, std::next(occ, pattern.size())) << std::endl;
}

How do I use std::regex_replace to replace string into lowercase?

I find this regex for replacement Regex replace uppercase with lowercase letters
Find: (\w) Replace With: \L$1
My code
string s = "ABC";
cout << std::regex_replace(s, std::regex("(\\w)"), "\\L$1") << endl;
runs in Visual Studio 2017.
output:
\LA\LB\LC
How do I write the lowercase function mark in C++?
Since there is no the magic like \L, we have to take a compromise - use regex_search and manually covert the uppers to lowers.
template<typename ChrT>
void RegexReplaceToLower(std::basic_string<ChrT>& s, const std::basic_regex<ChrT>& reg)
{
using string = std::basic_string<ChrT>;
using const_string_it = string::const_iterator;
std::match_results<const_string_it> m;
std::basic_stringstream<ChrT> ss;
for (const_string_it searchBegin=s.begin(); std::regex_search(searchBegin, s.cend(), m, reg);)
{
for (int i = 0; i < m.length(); i++)
{
s[m.position() + i] += ('a' - 'A');
}
searchBegin += m.position() + m.length();
}
}
void _replaceToLowerTest()
{
string sOut = "I will NOT leave the U.S.";
RegexReplaceToLower(sOut, regex("[A-Z]{2,}"));
cout << sOut << endl;
}

MSVC regular expression match

I am trying to match a literal number, e.g. 1600442 using a set of regular expressions in Microsoft Visual Studio 2010. My regular expressions are simply:
1600442|7654321
7895432
The problem is that both of the above matches the string.
Implementing this in Python gives the expected result:
import re
serial = "1600442"
re1 = "1600442|7654321"
re2 = "7895432"
m = re.match(re1, serial)
if m:
print "found for re1"
print m.groups()
m = re.match(re2, serial)
if m:
print "found for re2"
print m.groups()
Gives output
found for re1
()
Which is what I expected. Using this code in C++ however:
#include <string>
#include <iostream>
#include <regex>
int main(){
std::string serial = "1600442";
std::tr1::regex re1("1600442|7654321");
std::tr1::regex re2("7895432");
std::tr1::smatch match;
std::cout << "re1:" << std::endl;
std::tr1::regex_search(serial, match, re1);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl << "re2:" << std::endl;
std::tr1::regex_search(serial, match, re2);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
gives me:
re1:
1600442
re2:
1600442
which is not what I expected. Why do I get match here?
The smatch does not get overwritten by the second call to regex_search thus, it is left intact and contains the first results.
You can move the regex searching code to a separate method:
void FindMeText(std::regex re, std::string serial)
{
std::smatch match;
std::regex_search(serial, match, re);
for (auto i = 0;i <match.length(); ++i)
std::cout << match[i].str().c_str() << " ";
std::cout << std::endl;
}
int main(){
std::string serial = "1600442";
std::regex re1("^(?:1600442|7654321)");
std::regex re2("^7895432");
std::cout << "re1:" << std::endl;
FindMeText(re1, serial);
std::cout << "re2:" << std::endl;
FindMeText(re2, serial);
std::cout << std::endl;
std::string s;
std::getline (std::cin,s);
}
Result:
Note that Python re.match searches for the pattern match at the start of string only, thus I suggest using ^ (start of string) at the beginning of each pattern.

How to match multiple results using std::regex

For example, If I have a string like "first second third forth" and I want to match every single word in one operation to output them one by one.
I just thought that "(\\b\\S*\\b){0,}" would work. But actually it did not.
What should I do?
Here's my code:
#include<iostream>
#include<string>
using namespace std;
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
regex_search(str, res, exp);
cout << res[0] <<" "<<res[1]<<" "<<res[2]<<" "<<res[3]<< endl;
}
Simply iterate over your string while regex_searching, like this:
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
string::const_iterator searchStart( str.cbegin() );
while ( regex_search( searchStart, str.cend(), res, exp ) )
{
cout << ( searchStart == str.cbegin() ? "" : " " ) << res[0];
searchStart = res.suffix().first;
}
cout << endl;
}
This can be done in regex of C++11.
Two methods:
You can use () in regex to define your captures(sub expressions).
Like this:
string var = "first second third forth";
const regex r("(.*) (.*) (.*) (.*)");
smatch sm;
if (regex_search(var, sm, r)) {
for (int i=1; i<sm.size(); i++) {
cout << sm[i] << endl;
}
}
See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7
You can use sregex_token_iterator():
string var = "first second third forth";
regex wsaq_re("\\s+");
copy( sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
sregex_token_iterator(),
ostream_iterator<string>(cout, "\n"));
See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0
sregex_token_iterator appears to be the ideal, efficient solution, but the example given in the selected answer leaves much to be desired. Instead, I found some great examples here:
http://www.cplusplus.com/reference/regex/regex_token_iterator/regex_token_iterator/
For your convenience, I've copy-pasted the sample code shown by that page. I claim no credit for the code.
// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
// default constructor = end-of-sequence:
std::regex_token_iterator<std::string::iterator> rend;
std::cout << "entire matches:";
std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) std::cout << " [" << *a++ << "]";
std::cout << std::endl;
std::cout << "2nd submatches:";
std::regex_token_iterator<std::string::iterator> b ( s.begin(), s.end(), e, 2 );
while (b!=rend) std::cout << " [" << *b++ << "]";
std::cout << std::endl;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::regex_token_iterator<std::string::iterator> c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
std::cout << "matches as splitters:";
std::regex_token_iterator<std::string::iterator> d ( s.begin(), s.end(), e, -1 );
while (d!=rend) std::cout << " [" << *d++ << "]";
std::cout << std::endl;
return 0;
}
Output:
entire matches: [subject] [submarine] [subsequence]
2nd submatches: [ject] [marine] [sequence]
1st and 2nd submatches: [sub] [ject] [sub] [marine] [sub] [sequence]
matches as splitters: [this ] [ has a ] [ as a ]
You could use the suffix() function, and search again until you don't find a match:
int main()
{
regex exp("(\\b\\S*\\b)");
smatch res;
string str = "first second third forth";
while (regex_search(str, res, exp)) {
cout << res[0] << endl;
str = res.suffix();
}
}
My code will capture all groups in all matches:
vector<vector<string>> U::String::findEx(const string& s, const string& reg_ex, bool case_sensitive)
{
regex rx(reg_ex, case_sensitive ? regex_constants::icase : 0);
vector<vector<string>> captured_groups;
vector<string> captured_subgroups;
const std::sregex_token_iterator end_i;
for (std::sregex_token_iterator i(s.cbegin(), s.cend(), rx);
i != end_i;
++i)
{
captured_subgroups.clear();
string group = *i;
smatch res;
if(regex_search(group, res, rx))
{
for(unsigned i=0; i<res.size() ; i++)
captured_subgroups.push_back(res[i]);
if(captured_subgroups.size() > 0)
captured_groups.push_back(captured_subgroups);
}
}
captured_groups.push_back(captured_subgroups);
return captured_groups;
}
My reading of the documentation is that regex_search searches for the first match and that none of the functions in std::regex do a "scan" as you are looking for. However, the Boost library seems to be support this, as described in C++ tokenize a string using a regular expression