I wrote a small snippet to search for matched strings in an array, then output results for parallel arrays in a nicely formatted fashion. However, I must have some fundamental misunderstanding about how string outputs work, because for the life of me I cannot get this to output correctly, no matter where I put the tab, the newline, or whether I use an endl in my code.
Here is the relevant code below:
for (int i = 0; i < arrayCount; i++) {
if (arrayCopy[i].find(localString) != string::npos) {
cout << "\n\t"
<< array1[i] << " {"
<< subArray1[i] << ", "
<< subArray2[i] << "}";
}
}
I'm expecting results to have a tab at start of each line:
MatchedString1 {Data, MoreData}
MatchedString2 {Data, MoreData}
MatchedString3 {Data, MoreData}
Instead I am getting results like below, where the tabs appear on blank lines (except for the first result):
MatchedString1 {Data, MoreData}
MatchedString2 {Data, MoreData}
MatchedString3 {Data, MoreData}
What devilish quirk exists in C++ that is causing me so much pain?!
Using the following source to recreate your problem:
$ cat test.cpp
#include <iostream>
#include <string>
int main() {
using std::cout;
using std::string;
const std::string array1[] = {"MatchedString1", "MatchedString2", "MatchedString3"};
const std::string arrayCopy[] = {"MatchedString1", "MatchedString2", "MatchedString3"};
const std::string localString = "String";
const std::string subArray1[] = {"Data", "Data", "Data"};
const std::string subArray2[] = {"More data", "More data", "more data"};
const unsigned int arrayCount = 3;
for (int i = 0; i < arrayCount; i++) {
if (arrayCopy[i].find(localString) != string::npos) {
cout << "\n\t"
<< array1[i] << " {"
<< subArray1[i] << ", "
<< subArray2[i] << "}";
}
}
}
Compile and run:
$ g++ test.cpp
$ ./a.out
MatchedString1 {Data, More data}
MatchedString2 {Data, More data}
MatchedString3 {Data, more data}
Conclusion:
The extra line-breaks are in your data.
Suggestion:
Use square brackets to delineate your input:
std::cout << "[" << array1[i] << "] {[" << subArray1[i] << "], [" << subArray2[i] << "]}" << std::endl;
Stripping the strings:
If you find you need to strip your strings, you may find the following functions useful:
std::string lstrip(const std::string& s, const char* chars = " \t\r\n")
{
std::string::size_type begin = s.find_first_not_of(chars);
if (begin == std::string::npos)
{
return "";
}
return std::string(s, begin);
}
std::string rstrip(const std::string& s, const char* chars = " \t\r\n")
{
std::string::size_type end = s.find_last_not_of(chars);
return std::string(s, 0, end + 1);
}
std::string strip(const std::string& s)
{
return lstrip(rstrip(s));
}
Use like this:
std::cout << "\t"
<< strip(array1[i]) << " {"
<< strip(subArray1[i]) << ", "
<< strip(subArray2[i]) << "}"
<< std::endl;
(Restating my comment as a possible answer.)
The standard approach is to always put the \n at the end and everything else in the order it appears.
The only thing I can think of that would break this is a \r at the beginning of the second and third matched strings. (Someone else rightly observed that the double-spacing suggests a \n at the beginning of the input.)
Are you sure this isn't just something silly like "MatchedString" actually being "\nMatchedString"? You could try printing some extra characters around each value to delineate the whitespace more clearly.
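One way to make stray whitespace visible is to escape control characters before printing, so that a hidden \n or \r in the data shows up as text. This is only a minimal sketch; the helper name show_whitespace is made up for illustration and is not part of the original code:

```cpp
#include <string>

// Hypothetical helper: returns a copy of `s` in which newline, carriage
// return and tab are replaced by their visible escape sequences.
std::string show_whitespace(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:   out += c;     break;
        }
    }
    return out;
}
```

Printing show_whitespace(array1[i]) instead of array1[i] inside the loop would show immediately whether the matched strings carry a leading or trailing line break.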
I'm writing a C++ program that will need to take regular expressions that are defined in a XML Schema file and use them to validate XML data. The problem is, the flavor of regular expressions used by XML Schemas does not seem to be directly supported in C++.
For example, there are a couple special character classes \i and \c that are not defined by default and also the XML Schema regex language supports something called "character class subtraction" that does not seem to be supported in C++.
Allowing the use of the \i and \c special character classes is pretty simple, I can just look for "\i" or "\c" in the regular expression and replace them with their expanded versions, but getting character class subtraction to work is a much more daunting problem...
For example, this regular expression that is valid in an XML Schema definition throws an exception in C++ saying it has unbalanced square brackets.
#include <iostream>
#include <regex>
int main()
{
try
{
// Match any lowercase letter that is not a vowel
std::regex rx("[a-z-[aeiuo]]");
}
catch (const std::regex_error& ex)
{
std::cout << ex.what() << std::endl;
}
}
How can I get C++ to recognize character class subtraction within a regex? Or even better, is there a way to just use the XML Schema flavor of regular expressions directly within C++?
Character class subtraction and intersection are not available in any of the grammars supported by std::regex, so you will have to rewrite the expression into one of the supported forms.
The easiest way is to perform the subtraction yourself and pass the resulting set to std::regex, for instance [bcdfghjklmnpqrstvwxyz] for your example.
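Performing the subtraction by hand is error-prone for larger ranges, so it can be generated instead. The sketch below is an illustration only: subtract_class is a hypothetical helper, and it assumes the range and the removed characters are plain letters or digits, so nothing needs escaping inside the brackets.

```cpp
#include <string>

// Builds an explicit bracket expression equivalent to the XML Schema
// class "[first-last-[removed]]".
// Assumption: only plain letters or digits are involved, so no character
// needs escaping inside the brackets.
std::string subtract_class(char first, char last, const std::string& removed) {
    std::string result = "[";
    for (int c = first; c <= last; ++c) {
        if (removed.find(static_cast<char>(c)) == std::string::npos) {
            result += static_cast<char>(c);
        }
    }
    result += ']';
    return result;
}
```

The returned string can be passed straight to the std::regex constructor in place of the original class.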
Another solution is to find either a more featureful regular expression engine or a dedicated XML library that supports XML Schema and its regular expression language.
Starting from the cppreference examples
#include <iostream>
#include <regex>
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
// greedy match: repeats a lowercase non-vowel 2 to 4 times
show_matches("abcdefghi", "(?:(?![aeiou])[a-z]){2,4}");
}
You can test the regular expression and inspect its details in any online regex tester.
The choice of using a non-capturing group (?: ...) is to prevent it from changing your group numbering in case you use it inside a bigger regular expression.
(?![aeiou]) is a negative lookahead: it succeeds, without consuming input, only if the next character is not one of [aeiou]. [a-z] then matches a lowercase letter. Combining these two conditions is equivalent to your character class subtraction.
The {2,4} is a quantifier meaning from 2 to 4 repetitions; it could also be + for one or more, or * for zero or more.
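To see the equivalence in isolation, the lookahead form can be wrapped in a small predicate. This is a sketch under the assumption that the default ECMAScript grammar of std::regex is used; all_consonants is an illustrative name, not something from the question.

```cpp
#include <regex>
#include <string>

// Returns true if `s` is non-empty and consists only of lowercase
// consonants, using the lookahead form of the XML Schema class
// [a-z-[aeiou]] with a + quantifier.
bool all_consonants(const std::string& s) {
    static const std::regex re("(?:(?![aeiou])[a-z])+");
    return std::regex_match(s, re);
}
```

Because regex_match requires the whole input to match, a single vowel anywhere in the string makes the predicate fail.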
Edit
Reading the comments on the other answer, I understand that you want to support XML Schema.
The next program shows how to use an ECMAScript regular expression to translate the "character class differences" into an ECMAScript-compatible format.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
std::string translated_regex(const std::string &pattern){
// pattern to identify character class subtraction
std::regex class_subtraction_re(
"\\[((?:\\\\[\\[\\]]|[^[\\]])*)-\\[((?:\\\\[\\[\\]]|[^[\\]])*)\\]\\]"
);
// translate the regular expression to ECMA compatible
std::string translated = std::regex_replace(pattern,
class_subtraction_re, "(?:(?![$2])[$1])");
return translated;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
std::string re = translated_regex("[a-z-[aeiou]]{2,4}");
show_matches("abcdefghi", re);
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translated_regex(test) << '\n';
}
return 0;
}
Edit: Recursive and Named character classes
The above approach does not work with nested character class subtraction, and there is no way to deal with recursive substitutions using only regular expressions. This makes the solution far less straightforward.
The solution has the following levels:
one function scans the regular expression for a [;
when a [ is found, a second function handles the character class, calling itself recursively whenever -[ is found.
The pattern \p{xxxxx} is handled separately to identify named character classes. The named classes are defined in the specialCharClass map; I filled in two examples.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
#include <map>
std::map<std::string, std::string> specialCharClass = {
{"IsDigit", "0-9"},
{"IsBasicLatin", "a-zA-Z"}
// Feel free to add the character classes you want
};
const std::string getCharClassByName(const std::string &pattern, size_t &pos){
std::string key;
while(++pos < pattern.size() && pattern[pos] != '}'){
key += pattern[pos];
}
++pos;
return specialCharClass[key];
}
std::string translate_char_class(const std::string &pattern, size_t &pos){
std::string positive;
std::string negative;
if(pattern[pos] != '['){
return "";
}
++pos;
while(pos < pattern.size()){
if(pattern[pos] == ']'){
++pos;
if(negative.size() != 0){
return "(?:(?!" + negative + ")[" + positive + "])";
}else{
return "[" + positive + "]";
}
}else if(pattern[pos] == '\\'){
if(pos + 3 < pattern.size() && pattern[pos+1] == 'p'){
positive += getCharClassByName(pattern, pos += 2);
}else{
positive += pattern[pos++];
positive += pattern[pos++];
}
}else if(pattern[pos] == '-' && pos + 1 < pattern.size() && pattern[pos+1] == '['){
if(negative.size() != 0){
negative += '|'; // separate alternatives inside the negative lookahead
}
negative += translate_char_class(pattern, ++pos); // append, do not overwrite
}else{
positive += pattern[pos++];
}
}
return '[' + positive; // there is an error pass, forward it
}
std::string translate_regex(const std::string &pattern, size_t pos = 0){
std::string r;
while(pos < pattern.size()){
if(pattern[pos] == '\\'){
r += pattern[pos++];
r += pattern[pos++];
}else if(pattern[pos] == '['){
r += translate_char_class(pattern, pos);
}else{
r += pattern[pos++];
}
}
return r;
}
void show_matches(const std::string& in, const std::string& re)
{
std::smatch m;
std::regex_search(in, m, std::regex(re));
if(m.empty()) {
std::cout << "input=[" << in << "], regex=[" << re << "]: NO MATCH\n";
} else {
std::cout << "input=[" << in << "], regex=[" << re << "]: ";
std::cout << "prefix=[" << m.prefix() << "] ";
for(std::size_t n = 0; n < m.size(); ++n)
std::cout << " m[" << n << "]=[" << m[n] << "] ";
std::cout << "suffix=[" << m.suffix() << "]\n";
}
}
int main()
{
std::vector<std::string> tests = {
"[a]",
"[a-z]d",
"[\\p{IsBasicLatin}-[\\p{IsDigit}-[89]]]",
"[a-z-[aeiou]]{2,4}",
"[a-z-[aeiou-[e]]]",
"Some text [0-9-[4]] suffix",
"([abcde-[ae]])",
"[a-z-[aei]]|[A-Z-[OU]] "
};
for(std::string test : tests){
std::cout << " " << test << '\n'
<< " -- " << translate_regex(test) << '\n';
// Construct a regex (validates the syntax)
std::regex(translate_regex(test));
}
std::string re = translate_regex("[a-z-[aeiou-[e]]]{2,10}");
show_matches("abcdefghi", re);
return 0;
}
Try using a function from a library with XPath support, such as xmlregexp in libxml (a C library). It can handle the XML regexes and apply them to the XML directly:
http://www.xmlsoft.org/html/libxml-xmlregexp.html#xmlRegexp
http://web.mit.edu/outland/share/doc/libxml2-2.4.30/html/libxml-xmlregexp.html
An alternative could have been PugiXML (a C++ library; see "What XML parser should I use in C++?"), however I think it does not implement the XML regex functionality.
Okay after going through the other answers I tried out a few different things and ended up using the xmlRegexp functionality from libxml2.
The xmlRegexp related functions are very poorly documented so I figured I would post an example here because others may find it useful:
#include <iostream>
#include <libxml/xmlregexp.h>
int main()
{
LIBXML_TEST_VERSION;
xmlChar* str = xmlCharStrdup("bcdfg");
xmlChar* pattern = xmlCharStrdup("[a-z-[aeiou]]+");
xmlRegexp* regex = xmlRegexpCompile(pattern);
if (xmlRegexpExec(regex, str) == 1)
{
std::cout << "Match!" << std::endl;
}
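xmlRegFreeRegexp(regex); // a compiled regex must be released with libxml2's own function
xmlFree(pattern); // xmlFree is libxml2's deallocator (declared via <libxml/xmlmemory.h>)
xmlFree(str);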
}
Output:
Match!
I also attempted to use the XMLString::patternMatch from the Xerces-C++ library but it didn't seem to use an XML Schema compliant regex engine underneath. (Honestly I have no clue what regex engine it uses underneath and the documentation for that was pretty abysmal and I couldn't find any examples online so I just gave up on it.)
Okay, so I get this JSON object from my client:
{"command":"BrugerIndtastTF","brugerT":"\"10\"","brugerF":"\"20\""}
Then I need to use the int value from "brugerT", but as you can see it has "\"10\"" around it. When I code this in JavaScript I don't get this problem. Is there a way to use only the part of "brugerT" that says 10?
The code, where *temp should print only the int value 10:
socket_->hub_.onMessage([this](
uWS::WebSocket<uWS::SERVER> *ws,
char* message,
size_t length,
uWS::OpCode opCode
)
{
std::string data = std::string(message,length);
std::cout << "web::Server:\t Data received: " << data << std::endl;
// handle manual settings
std::cout << "Web::Server:\t Received request: manual. Redirecting message." << std::endl;
json test1 = json::parse(data);
auto test2 = test1.json::find("command");
std::cout << "Web::Server:\t Test 1" << test1 << std::endl;
std::cout << "Web::Server:\t Test 2" << *test2 << std::endl;
if (*test2 =="BrugerIndtastTF")
{
std::cout<<"Web::Server:\t BrugerIndtastTF modtaget" << std::endl;
auto temp= test1.json::find("brugerT");
auto humi= test1.json::find("brugerF");
std::cout << "Web::Server:\t temp: " << *temp << "humi: " << *humi << std::endl;
}
});
EDIT:
it should just say: temp: 10 humi: 20
You can get the string value of brugerT, strip the \" out of the string, and then convert the result into an int with std::stoi. You could even use a regular expression to find the integer inside the string and let the regex library do the matching for you. A regular expression for that would be something like: ([0-9]+)
P.S. Raw string literals (string literal type 6) might be of some use when manually filtering out \"
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string inputStr(R"("\"10\"")");
regex matchStr(R"(([0-9]+))");
auto matchesBegin = sregex_iterator(inputStr.begin(), inputStr.end(), matchStr);
auto matchesEnd = sregex_iterator();
for (sregex_iterator i = matchesBegin; i != matchesEnd; ++i) {
cout << i->str() << endl;
}
return 0;
}
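The first suggestion (strip everything that is not part of the number, then call std::stoi) can be sketched without a regex. The helper below is hypothetical and assumes the field's value has already been obtained as a std::string, that the number is non-negative, and that it fits in an int:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keeps only the decimal digits of `raw` and converts
// them with std::stoi. Returns false if `raw` contains no digits.
// Assumption: the value is non-negative and fits in an int.
bool quoted_to_int(const std::string& raw, int& value) {
    std::string digits;
    for (char c : raw) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            digits += c;
        }
    }
    if (digits.empty()) {
        return false;
    }
    value = std::stoi(digits);
    return true;
}
```

Returning a bool instead of throwing lets the caller decide what to do when the client sends a field without any number in it.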
I was just reviewing my C++. I tried to do this:
#include <iostream>
using std::cout;
using std::endl;
void printStuff(int x);
int main() {
printStuff(10);
return 0;
}
void printStuff(int x) {
cout << "My favorite number is " + x << endl;
}
The problem happens in the printStuff function. When I run it, the first 10 characters from "My favorite number is ", is omitted from the output. The output is "e number is ". The number does not even show up.
The way to fix this is to do
void printStuff(int x) {
cout << "My favorite number is " << x << endl;
}
I am wondering what the computer/compiler is doing behind the scenes.
The + operator in this case does not concatenate anything, since x is an integer and the left operand is a string literal, not a std::string. Instead, the pointer to the literal is advanced by the value of x, so the first 10 characters are not printed.
If you write
cout << "My favorite number is " + std::to_string(x) << endl;
it will work.
It's simple pointer arithmetic. The string literal is an array of chars and decays to a pointer. Adding 10 to the pointer says that you want to output starting from the 11th character.
There is no + operator that would convert a number into a string and concatenate it to a char array.
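The two behaviours can be put side by side. This is a small sketch with illustrative function names: the first function does exactly what the literal-plus-int expression does, the second does what was intended.

```cpp
#include <string>

// Pointer arithmetic: returns a pointer `n` characters into the C string.
// Assumption: 0 <= n <= strlen(text); anything else is undefined behaviour.
const char* skip_chars(const char* text, int n) {
    return text + n;
}

// What was intended: convert the number to text first, then concatenate.
std::string favorite(int x) {
    return "My favorite number is " + std::to_string(x);
}
```

With the question's literal, skip_chars(..., 10) yields "e number is ", which is the output the asker saw.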
Adding to or incrementing a pointer to a string doesn't change the value it contains, only its address.
It's not a problem of MSVC 2015 or of cout; the pointer is simply moving backward or forward in memory.
To prove to you that cout is innocent:
#include <cstring>
#include <iostream>
using std::cout;
using std::endl;
int main()
{
const char* str = "My favorite number is ";
// print the string starting from every possible offset
for(std::size_t i = 0; i < std::strlen(str); i++)
cout << str + i << endl;
const char* ptrTxt = "Hello";
// the pointer is advanced before each print, so output starts at "ello"
while(std::strlen(ptrTxt++))
cout << ptrTxt << endl;
// proving that cout is innocent:
const char* str2 = str + 10; // no copy is made: str2 points at element 10 of str
cout << str2 << endl; // cout prints exactly what str2 points to
return 0;
}
I am looking for a C++ fstream equivalent of C's fgets. I tried the get function of fstream but did not get what I wanted: get does not extract the delimiter character, whereas fgets does. So I wrote code to insert the delimiter character myself, but it behaves strangely. Please see my sample code below:
#include <stdio.h>
#include <fstream>
#include <iostream>
int main(int argc, char **argv)
{
char str[256];
int len = 10;
std::cout << "Using C fgets function" << std::endl;
FILE * file = fopen("C:\\cpp\\write.txt", "r");
if(file == NULL){
std::cout << " Error opening file" << std::endl;
}
int count = 0;
while(!feof(file)){
char *result = fgets(str, len, file);
std::cout << result << std::endl ;
count++;
}
std::cout << "\nCount = " << count << std::endl;
fclose(file);
std::fstream fp("C:\\cpp\\write.txt", std::ios_base::in);
int iter_count = 0;
while(!fp.eof() && iter_count < 10){
fp.get(str, len,'\n');
int count = fp.gcount();
std::cout << "\nCurrent Count = " << count << std::endl;
if(count == 0){
//only new line character encountered
//adding newline character
str[1] = '\0';
str[0] = '\n';
fp.ignore(1, '\n');
//std::cout << fp.get(); //ignore new line character from stream
}
else if(count != (len -1) ){
//adding newline character
str[count + 1] = '\0';
str[count ] = '\n';
//std::cout << fp.get(); //ignore new line character from stream
fp.ignore(1, '\n');
//std::cout << "Adding new line \n";
}
std::cout << str << std::endl;
std::cout << " Stream State : Good: " << fp.good() << " Fail: " << fp.fail() << std::endl;
iter_count++;
}
std::cout << "\nCount = " << iter_count << std::endl;
fp.close();
return 0;
}
The txt file that I am using is write.txt with following content:
This is a new lines.
Now writing second
line
DONE
If you observe my program, I am using fgets function first and then using the get function on same file. In case of get function, the stream state goes bad.
Can anyone please point me out what is going wrong here?
UPDATED: I am now posting the simplest code that does not work on my end. If I don't care about the delimiter character for now and just read the entire file 10 characters at a time using getline:
void read_file_getline_no_insert(){
char str[256];
int len =10;
std::cout << "\nREAD_GETLINE_NO_INSERT FUNCITON\n" << std::endl;
std::fstream fp("C:\\cpp\\write.txt", std::ios_base::in);
int iter_count = 0;
while(!fp.eof() && iter_count < 10){
fp.getline(str, len,'\n');
int count = fp.gcount();
std::cout << "\nCurrent Count = " << count << std::endl;
std::cout << str << std::endl;
std::cout << " Stream State : Good: " << fp.good() << " Fail: " << fp.fail() << std::endl;
iter_count++;
}
std::cout << "\nCount = " << iter_count << std::endl;
fp.close();
}
int main(int argc, char **argv)
{
read_file_getline_no_insert();
return 0;
}
If we look at the output of the above code:
READ_GETLINE_NO_INSERT FUNCITON
Current Count = 9
This is a
Stream State : Good: 0 Fail: 1
Current Count = 0
Stream State : Good: 0 Fail: 1
You would see that the state of stream goes Bad and the fail bit is set. I am unable to understand this behavior.
Rgds
Sapan
std::getline() will read a string from a stream, until it encounters a delimiter (newline by default).
Unlike fgets(), std::getline() discards the delimiter. But, also unlike fgets(), it will read the whole line (available memory permitting) since it works with a std::string rather than a char *. That makes it somewhat easier to use in practice.
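If the trailing newline matters, it can be put back after each call. The following is a minimal sketch, not a drop-in replacement for fgets(); the function name is made up, and it relies on eofbit being set only when the final line had no terminating newline.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` and re-appends the '\n' that std::getline
// discards, so each element resembles what fgets() would have returned.
std::vector<std::string> read_lines_like_fgets(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // eofbit is set here only if the last line had no trailing newline
        if (!in.eof()) {
            line += '\n';
        }
        lines.push_back(line);
    }
    return lines;
}
```

Using the stream itself as the loop condition also avoids the while(!fp.eof()) pattern, which only notices the end of the file after a read has already failed.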
All types derived from std::istream (which is the base class for all input streams) also have a member function called getline() which works a little more like fgets() - accepting a char * and a buffer size. It still discards the delimiter though.
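One detail explains the "Fail: 1" in the question's output: the member getline() sets failbit when the buffer fills up before a delimiter is found, and every later read fails until the state is cleared. A sketch of reading in fixed-size pieces anyway, assuming a buffer length of at least 2 (read_chunks is an illustrative name):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `in` in pieces of at most len - 1 characters with the member
// getline(). Assumption: len >= 2.
std::vector<std::string> read_chunks(std::istream& in, std::streamsize len) {
    std::vector<std::string> chunks;
    std::vector<char> buf(static_cast<std::size_t>(len));
    for (;;) {
        in.getline(buf.data(), len, '\n');
        if (in.gcount() == 0) {
            break;                      // nothing was extracted: end of input
        }
        chunks.emplace_back(buf.data());
        if (in.fail()) {
            in.clear();                 // buffer filled mid-line: reset failbit and go on
        }
    }
    return chunks;
}
```

With a length of 10, the first piece of the question's file is "This is a", matching the "Current Count = 9" line in the posted output; the difference is that the cleared state lets the next call continue with the rest of the line.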
The C++-specific options are overloaded functions (i.e. available in more than one version) so you need to read documentation to decide which one is appropriate to your needs.
I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a string called message. To print out a specific part of them, in this case the numbers, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message, "MC123, 456, 789!##"? "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this, assuming I know that every message ends with "!##"?
First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
//                     0123456789012345678
//                       ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
//                     0123456789012345678
//                       ^-----------^ desired text
The other answers are OK. If you wish to use the std::string find functions,
consider rfind and find_first_not_of, as in the following code:
#include <iostream>
#include <string>
// forward declaration
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
// 012345678901234567
// |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.
The correct approach is to look for existence / location of the "known termination" string, then take the substring up to (but not including) that substring.
Something like
std::string termination = "!#$";
std::size_t position = message.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE: You asked in your comment how to use the regular expression. Here is a very simple program. Note that this uses C++11, so you need to make sure your compiler supports it.
#include <iostream>
#include <regex>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .* means "zero or more of 'anything'", but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing - I just noticed you mentioned that you might have "more stuff" in your string. This is where using regular expressions quickly becomes the best. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (m[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
Note the use of the slightly tricky (?:USR)? expression. This says "this might (that's the trailing ?) be followed by the characters USR; if it is, match them without capturing them (that's the ?: part) and capture what follows in the next group".
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.
If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
if (message.size() < 6)
throw error;
if (message.substr(0,3) != "MCX") //the exact numbers do not match in your example, but you get the point...
throw error;
if (message.substr(message.size()-3) != "!##")
throw error;
string data = message.substr(3, message.size()-6);
Just calculate the offset first.
string str = ...;
size_t start = 3;
size_t end = str.find("!##");
assert(end != string::npos);
return str.substr(start, end - start);
You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.
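Put together, the find-then-substr approach with the npos check might look like the sketch below. The function name and the bool/out-parameter interface are choices made here for illustration; the prefix length of 2 corresponds to the "MC" header in the question's packets.

```cpp
#include <string>

// Extracts the text between a fixed-length prefix and the terminator.
// Returns false (leaving `payload` untouched) if the terminator is missing
// or starts before the end of the prefix.
bool extract_payload(const std::string& message, std::size_t prefix_len,
                     const std::string& terminator, std::string& payload) {
    const std::size_t end = message.find(terminator);
    if (end == std::string::npos || end < prefix_len) {
        return false;
    }
    payload = message.substr(prefix_len, end - prefix_len);
    return true;
}
```

Because the length passed to substr is computed from the position of the terminator, the same call works for packets with any number of fields.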
string msg = "MC4,512,541,3123!##";
for (std::size_t i = 2; i + 3 < msg.length(); i++) {
cout << msg[i]; // the loop bound already stops before the trailing "!##"
}
or use char[]
char msg[] = "MC4,123,54!##";
sizeof(msg) - 1; // instead of msg.length()
// -1 for the null byte at the end (each char takes 1 byte so the size -1 == number of chars)
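The sizeof trick only works on a real array; applied to a pointer it yields the size of the pointer instead. A small sketch that lets the compiler enforce this (char_count is an illustrative name): the array-reference parameter does not accept a plain char pointer, so misuse fails to compile.

```cpp
#include <cstddef>

// Number of characters in a char array initialised from a string literal,
// not counting the terminating null byte. A pointer will not bind to the
// array-reference parameter, so this cannot be misapplied to a char*.
template <std::size_t N>
constexpr std::size_t char_count(const char (&)[N]) {
    return N - 1;
}
```

For the msg array above this gives 13, the same value as sizeof(msg) - 1.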