C++ substring contained between 2 specific characters


In C++ I would like to extract all substrings of a string that are contained between specific characters, for example:
std::string str = "XPOINT:D#{MON 3};S#{1}";
std::vector<std::string> subsplit = my_needed_magic_function(str, "{}");
std::vector<std::string>::iterator it = subsplit.begin();
for (; it != subsplit.end(); it++) std::cout << *it << std::endl;
The result of this call should be:
MON 3
1
Using Boost is fine if needed.

You could try Regex:
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
std::string s = "XPOINT:D#{MON 3};S#{1}.";
std::regex word_regex(R"(\{(.*?)\})");
auto first = std::sregex_iterator(s.begin(), s.end(), word_regex),
last = std::sregex_iterator();
while (first != last)
std::cout << first++->str() << ' ';
}
Prints
{MON 3} {1}

Related

pattern search in text strings in c++

I just want to look for a pattern in a string. For example, for the string "abaxavabaabcabbc" the app should print "abc" and "abbc". So the pattern is "abc", but the number of "b"s can vary.
pattern => "abc" => the number of "b"s is variable.
The program should be in C++.

Using regex_search instead of the iterator:
#include <regex>
#include <string>
#include <iostream>
int main() {
std::regex const pattern("ab+c");
for (std::string const text :
{
"abaxavabaabcabbc",
}) //
{
std::smatch match;
for (auto it = text.cbegin(), e = text.cend();
std::regex_search(it, e, match, pattern); it = match[0].second) {
std::cout << "Match: " << match.str() << "\n";
}
}
}
Prints
Match: abc
Match: abbc

There is only one answer to this question. You MUST use a std::regex. Regular expressions are exactly made for this purpose.
C++ also supports regular expressions, through the <regex> header.
The regex "ab+c" matches every substring that starts with an "a", continues with one or more "b"s and ends with a "c".
See the following very short program:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
const std::regex re{ R"(ab+c)" };
using Iter = std::sregex_token_iterator;
int main() {
const std::string test{ "abaxavabaabcabbc" };
std::copy(Iter(test.begin(), test.end(), re), Iter(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
This program iterates over all matches of the pattern and copies them to std::cout.

Regex search overlapping matches c++11

What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit: clarification
I search the string using a regex. It selects the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it contains only base-16 (hex) digits and ends with a carriage return.
[start] [information part, minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in C++11 with the ECMAScript flag.
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
Edit: Working regex solution based on comments
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}

You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

Multiple substrings in between the same delimiters

I am new to c++ and would like to know how to extract multiple substrings, from a single string, in between the same delimiters?
ex.
"{("id":"4219","firstname":"Paul"),("id":"4349","firstname":"Joe"),("id":"4829","firstname":"Brandy")}"
I want the ids:
4219 , 4349 , 4829

You can use regex to match the ids:
#include <iostream>
#include <regex>
#include <string>
int main() {
// This is your string.
std::string s{ R"({("id":"4219","firstname":"Paul"),("id":"4349","firstname":"Joe"),("id":"4829","firstname":"Brandy")})"};
// Matches "id":"<any number of digits>"
// The id will be captured in the first group
std::regex r(R"("id"\s*:\s*"(\d+))");
// Make iterators that perform the matching
auto ids_begin = std::sregex_iterator(s.begin(), s.end(), r);
auto ids_end = std::sregex_iterator();
// Iterate the matches and print the first group of each of them
// (where the id is captured)
for (auto it = ids_begin; it != ids_end; ++it) {
std::smatch match = *it;
std::cout << match[1].str() << ',';
}
}

Well, here is the q&d hack:
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::string s{ "{(\"id\":\"4219\",\"firstname\":\"Paul\"),"
"(\"id\":\"4349\",\"firstname\":\"Joe\"),"
"(\"id\":\"4829\",\"firstname\":\"Brandy\")}"
};
std::string key{ "\"id\":\"" };
for (auto f = s.find(key); f != s.npos; f = s.find(key, f)) {
// skip past the key, then parse the digits that follow it
std::istringstream iss{ std::string{ s.begin() + (f += key.length()), s.end() } };
int id; iss >> id;
std::cout << id << '\n';
}
}
Reliable? Well, just hope nobody names children "id":" ...

Writing a very simple lexical analyser in C++

NOTE: I'm compiling with the C++14 flag. I am trying to create a very simple lexer in C++. I am using regular expressions to identify the different tokens. My program is able to identify the tokens and display them, but the output is of the form
int
main
hello
2
*
3
+
return
I want the output to be in the form
int IDENTIFIER
hello IDENTIFIER
* OPERATOR
3 NUMBER
and so on.
I am not able to achieve the above output.
Here is my program:
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <map>
using namespace std;
int main()
{
string str = " hello how are 2 * 3 you? 123 4567867*98";
// define list of token patterns
map<string, string> v
{
{"[0-9]+" , "NUMBERS"} ,
{"[a-z]+" , "IDENTIFIERS"},
{"\\*|\\+", "OPERATORS"}
};
// build the final regex
string reg = "";
for(auto it = v.begin(); it != v.end(); it++)
reg = reg + it->first + "|";
// remove extra trailing "|" from above instance of reg..
reg.pop_back();
cout << reg << endl;
regex re(reg);
auto words_begin = sregex_iterator(str.begin(), str.end(), re);
auto words_end = sregex_iterator();
for(sregex_iterator i = words_begin; i != words_end; i++)
{
smatch match = *i;
string match_str = match.str();
cout << match_str << "\t" << endl;
}
return 0;
}
What is the best way of doing this while also maintaining the order of the tokens as they appear in the source program?

I managed to do this with only one iteration over the parsed string. All you have to do is add parentheses around regex for each token type, then you'll be able to access the strings of these submatches. If you get a non-empty string for a submatch, that means it was matched. You know the index of the submatch and therefore the index in v.
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <vector>
int main()
{
std::string str = " hello how are 2 * 3 you? 123 4567867*98";
// use std::vector instead, we need to have it in this order
std::vector<std::pair<std::string, std::string>> v
{
{"[0-9]+" , "NUMBERS"} ,
{"[a-z]+" , "IDENTIFIERS"},
{"\\*|\\+", "OPERATORS"}
};
std::string reg;
for(auto const& x : v)
reg += "(" + x.first + ")|"; // parenthesize the submatches
reg.pop_back();
std::cout << reg << std::endl;
std::regex re(reg, std::regex::extended); // std::regex::extended for longest match
auto words_begin = std::sregex_iterator(str.begin(), str.end(), re);
auto words_end = std::sregex_iterator();
for(auto it = words_begin; it != words_end; ++it)
{
size_t index = 0;
for( ; index < it->size(); ++index)
if(!it->str(index + 1).empty()) // determine which submatch was matched
break;
std::cout << it->str() << "\t" << v[index].second << std::endl;
}
return 0;
}
std::regex re(reg, std::regex::extended); selects the POSIX extended grammar, which prefers the longest match among the alternatives, as a lexical analyzer needs. With the default ECMAScript grammar the first alternative that matches wins, so the result for an input such as while1213 depends on the order in which you define the patterns.

This is a quick and dirty solution iterating on each pattern, and for each pattern trying to match the entire string, then iterating over matches and storing each match with its position in a map. The map implicitly sorts the matches by key (position) for you, so then you just need to iterate the map to get the matches in positional order, regardless of their pattern name.
#include <iterator>
#include <iostream>
#include <string>
#include <regex>
#include <list>
#include <map>
using namespace std;
int main(){
string str = " hello how are 2 * 3 you? 123 4567867*98";
// define list of patterns
map<string,string> patterns {
{ "[0-9]+" , "NUMBERS" },
{ "[a-z]+" , "IDENTIFIERS" },
{ "\\*|\\+", "OPERATORS" }
};
// storage for results
map< size_t, pair<string,string> > matches;
for ( auto pat = patterns.begin(); pat != patterns.end(); ++pat )
{
regex r(pat->first);
auto words_begin = sregex_iterator( str.begin(), str.end(), r );
auto words_end = sregex_iterator();
for ( auto it = words_begin; it != words_end; ++it )
matches[ it->position() ] = make_pair( it->str(), pat->second );
}
for ( auto match = matches.begin(); match != matches.end(); ++match )
cout<< match->second.first << " " << match->second.second << endl;
}
Output:
hello IDENTIFIERS
how IDENTIFIERS
are IDENTIFIERS
2 NUMBERS
* OPERATORS
3 NUMBERS
you IDENTIFIERS
123 NUMBERS
4567867 NUMBERS
* OPERATORS
98 NUMBERS

Using std::search to find multiple occurrence of pattern

I want to use std::search or a similar function to find multiple occurrences of a given pattern.
This is my code:
#include <cstring>
#include <iostream>
#include <iomanip>
#include <set>
#include <list>
#include <vector>
#include <map>
#include <algorithm>
#include <functional>
using namespace std;
int main () {
std::vector<int> haystack;
string a = "abcabcabc";
string b = "abc";
string::iterator it;
it = search(a.begin(),a.end(),b.begin(),b.end());
if(it!=a.end()){
cout << it-a.begin()<<endl;
}
return 0;
}
This code returns 0, the index of the first occurrence of the pattern "abc". I would like it to return 0, 3, 6, that is, all of the indexes in the original string where the pattern begins.
Thank you for any help.

for(size_t pos=a.find(b,0); pos!=std::string::npos; pos=a.find(b,pos+1)) {
std::cout << pos << std::endl;
}
This uses std::basic_string::find directly to find the starting position of each occurrence of the substring.

The search function looks in the first string a for the first occurrence of the whole sequence b and returns an iterator to where it starts. (The function that looks for any single element of b is std::find_first_of.) That is why your code only reports the first hit.
Note that std::find is not a replacement here: it compares single elements, so std::find(a.begin(), a.end(), "abc") does not compile for a std::string. To get every occurrence, call search again, starting one past the previous hit:
string::iterator it = std::search(a.begin(), a.end(), b.begin(), b.end());
while (it != a.end()) {
// Do whatever you want to do, e.g. print it - a.begin()
++it;
it = std::search(it, a.end(), b.begin(), b.end());
}

//find first result index
auto find_index = str.find("st");
while(string::npos != find_index) {
cout << "found at: " << find_index << endl;
//find next
find_index = str.find("st",find_index+1);
}
