Validating an NMEA sentence using C++ [duplicate] - c++

This question already has answers here:
Regex statement in C++ isn't working as expected [duplicate]
(3 answers)
Closed 3 years ago.
I need help creating a regular expression for NMEA sentences, because I want to validate in C++ whether incoming data is a well-formed NMEA sentence. Below are some example NMEA sentences of type GLL. If possible, I would also like a C++ sample that performs the validation.
$GPGLL,5425.32,N,106.92,W,82808*64
$GPGLL,5425.33,N,106.91,W,82826*6a
$GPGLL,5425.32,N,106.9,W,82901*5e
$GPGLL,5425.32,N,106.89,W,82917*61
I have also included an expression that I found online and tried. But when I compile it, the compiler reports an unknown escape sequence.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
// Target sequence
string s = "$GPGLL, 54 30.49, N, 1 06.74, W, 16 39 58 *5E";
// An object of regex for pattern to be searched
regex r("[A-Z] \w+,\d,\d,(?:\d{1}|),[A-B],[^,]+,0\*([A-Za-z0-9]{2})");
// flag type for determining the matching behavior
// here it is for matches on 'string' objects
smatch m;
// regex_search() for searching the regex pattern
// 'r' in the string 's'. 'm' is flag for determining
// matching behavior.
regex_search(s, m, r);
// for each loop
for (auto x : m)
cout << "The nmea sentence is correct ";
return 0;
}

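Inside an ordinary string literal, the C++ compiler interprets \w, \d and \* as character escape sequences before the regex library ever sees them, and none of them is a valid escape, hence the message.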
Either double the backslashes:
regex r("[A-Z] \\w+,\\d,\\d,(?:\\d{1}|),[A-B],[^,]+,0\\*([A-Za-z0-9]{2})");
or use a raw literal:
regex r(R"re([A-Z] \w+,\d,\d,(?:\d{1}|),[A-B],[^,]+,0\*([A-Za-z0-9]{2}))re");

Related

regex_match not returning true [duplicate]

This question already has an answer here:
Regex not working as expected with C++ regex_match
(1 answer)
Closed 4 days ago.
I am very confused why this regex match in C++ is not working.
#include <iostream>
#include <regex>
#include <string>
void test_code(){
const std::string test_string("this is a test of test");
const std::regex match_regex("test");
std::cout<<test_string<<std::endl;
std::smatch match;
if (std::regex_match(test_string, match, match_regex)){
std::cout<<match.size()<<std::endl;
}
}
int main() {
test_code();
}
I read the C++ reference documentation and tried to write a simple regex check. I am not sure why this is not working (i.e. the std::regex_match(...) call is not returning true).
As stated in documentation for std::regex_match() (emphasis is mine):
Determines if the regular expression e matches the entire target character sequence, which may be specified as std::string, a C-string, or an iterator pair.
and your regex pattern obviously does not match the whole string. So you either need to change your regex to something like ".*test.*" or use std::regex_search() if you want to check whether a substring matches:
Determines if there is a match between the regular expression e and some subsequence in the target character sequence.

When using regex_replace(), what's the correct way to back-reference a submatch when it's directly followed by another digit in the format string?

In the following code, I tried to use $1 to refer to the first submatch:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main()
{
string str {"1-2-3 4-5-6 7-8-9"};
int r = 1;
str = regex_replace(str, regex{R"((\d*-\d*-)\d*)"}, "$1" + to_string(r));
cout << str << "\n";
return 0;
}
What I expect is:
1-2-1 4-5-1 7-8-1
But it doesn't work because the actual format string passed to regex_replace() is $11 as if I were trying to refer to the 11th submatch.
So when using regex_replace(), what is the correct way to back-reference a submatch which is followed directly by another digit in the format string?
I tried using ${1} but it didn't work for any of the mainstream implementations that I tried.
According to Standard N3337, §28.5.2, Table 139:
format_default: When a regular expression match is to be replaced by a new string, the new string shall be constructed using the rules used
by the ECMAScript replace function in ECMA-262, part 15.5.4.11
String.prototype.replace. In addition, during search and replace
operations all non-overlapping occurrences of the regular expression
shall be located and replaced, and sections of the input that did not
match the expression shall be copied unchanged to the output string.
And according to ECMA-262 part 15.5.4.11 String.prototype.replace, Table 22
$nn: The nn-th capture, where nn is a two-digit decimal number in the range
01 to 99. If nn≤m and the nnth capture is undefined, use the empty
String instead. If nn>m, the result is implementation-defined.
So, there could be at most two decimal digits after $, which refers to matching group, therefore you could use
"$01" + to_string(r)

regular expression Can't find sequence of numbers

Let
exp = ^[0-9!@#$%^&*()_+-=[]{};':"\|,.<>/?\s]*$
be a regular expression that allows me to find all sequences of numbers with or without special characters.
By using exp I manage to extract all sequences of numbers that are greater than 5, but the number 98200 cannot be extracted. I am not placing any limit on how long the sequence of numbers should be.
Source code:
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s = "16000";
string exp = "^[0-9!@#$%^&*()_+-=[]{};':\"\\|,.<>\\/?\\s]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
In C#, you need to escape the ]. You don't need to escape [ {} () when they are inside a character class. Also, if you want to include the dash as an included character in the character class, it should be at the beginning or end of the list. The sequence that you have of +-= translates to [+,-./0123456789:;<=] which makes your regex redundant. Finally, because of the terminal quantifier, you are allowing matching of zero length strings. This may be what you want, but if not, consider the '+' quantifier.
What about simply
[^A-Za-z]+
with or without the ^ $ anchors at the beginning/end
Indiscriminately escaping everything works for me.. :)
string exp = "^[0-9\\!@#\\$\\%\\^&*\\(\\)_\\+\\-=\\[\\]\\{\\};\\\':\\\"\\\\|,\\.<>\\/?\\s]*$";
Note the double backslashes. I'm sure you can work out which of the characters in your list have a special meaning and escape only those. I didn't have time to look up what has special meaning in this context, so I escaped everything, and this works fine for the few cases I tested:
16000 => returns 1 16A000 => returns 0 16#000 => returns 1
Which I'm guessing is what you want...
I have shifted the brackets to the front of the character class and therewith I get the output 1 for 98200 using the following code:
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98200";
string exp = "^[][0-9!@#$%^&*()_+-={};':\"\\|,.<>\\/?\\s]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
EDIT: Note, that I used my experience with emacs regular
expressions. The info pages of emacs explain: "To include a ] in a
character set, you must make it the first character." I tried this
with boost::regexp and it worked. Later on when I had more time I read
in the boost manual
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
that this is not specified for the perl regular expression syntax.
The perl syntax is the standard setting for boost::regex. According to the
specification the comment by
https://stackoverflow.com/users/2872922/ron-rosenfeld is the best
answer.
In the following program I eliminate the character range which was incidentally encoded into your regular expression.
Testing shows that the bracket at the beginning of the character set is included into the character set. So it turns out that my statement was right even if it is not specified in the official manual of boost::regex.
Nevertheless, I suggest that https://stackoverflow.com/users/2872922/ron-rosenfeld inserts his comment as an answer and you mark it as the solution. This will help others reading this thread.
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98-[2]00";
string exp = "^[][0-9!@#$%^&*()_+={};':\"|,.<>/?\\s-]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
I asked at http://lists.boost.org/boost-users/2013/12/80707.php
The answer of John Maddock (the author of the boost::regex library) is:
>I discovered that if one uses an closing bracket as the first character of
>a
>character class the character class includes this bracket.
>This works with the standard setting of boost::regex (i.e., perl-regular
>expressions) but it is not documented in the
>manual page
>
>http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/
>perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
>
>Is this an undocumented feature, a bug or did I misinterpret something in
>the manual?
It's a feature, both Perl and POSIX extended regular expression behave the
same way.
John.

regular expression to extract this string [duplicate]

This question already has answers here:
Ruby: Split string at character, counting from the right side
(6 answers)
Closed 8 years ago.
I have the following string
jenkins-client-1.4
perl-5.16
ruby-1.9
10gen-mms-agent-1.0
Is it possible to use regular expression to extract terms with the dash and version stripped out to end up with something like the following?
jenkins-client
perl
ruby
10gen-mms-agent
Thx,
-peter
Example for C#
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
string[] terms = new string[] {
"jenkins-client-1.4",
"perl-5.16",
"ruby-1.9",
"10gen-mms-agent-1.0"
};
Regex termRegex = new Regex(@"^(.+)-(\d+[.]\d+)$");
foreach (string term in terms)
if (termRegex.IsMatch(term))
Console.WriteLine(termRegex.Match(term).Groups[1].Value);
Console.ReadLine();
}
}
This will do it: (.*?)-[0-9][0-9.]+
.*? matches anything, but as little as possible. This is the capture group you want to extract.
-[0-9][0-9.]+ matches a hyphen, then a digit, then one or more digits and periods.

Extracting a double from a string with text [duplicate]

This question already has an answer here:
std::stringstream strange behaviour
(1 answer)
Closed 9 years ago.
I have a string with lots of different characters similar to: "$: " "213.23453"
How do I extract the double value 213.23453 and store it in a variable? It's C++/C and I can't use lambdas.
You can use "poor man's regex" of the sscanf function to skip over the characters prior to the first digit, and then reading the double, like this:
const char *str = "\"$: \" \"213.23453\"";
double d;
sscanf(str, "%*[^0-9]%lf", &d);
Note the asterisk after the first percentage format: it instructs sscanf to read the string without writing its content into an output buffer.
Here is a demo on ideone.
Use a regular expression.
[$]?[0-9]*(\.)?[0-9]?[0-9]?
This should match those with a $ sign and those without.
Boost.Regex is a very good regular expression library
Personally, I find Boost.Xpressive much nicer to work with. It is a header-only library and it has some nice features such as static regexes (regexes compiled at compile time).
If you're using a C++11 compliant compiler, use std::regex unless you have good reason to use something else.
A pure C++ solution could be to manually cut off the trash characters preceding the number (the first digit is identified by std::isdigit) and then construct a temporary istringstream object to retrieve the double from:
std::string myStr("$:. :$$#&*$ :213.23453$:#$;");
// find the first digit (the cast avoids undefined behavior for negative char values):
std::size_t startPos = 0;
for (; startPos < myStr.size(); ++startPos)
if (std::isdigit(static_cast<unsigned char>(myStr[startPos]))) break;
// cut off the trash:
myStr = myStr.substr(startPos);
// retrieve the value:
double d = 0.0;
std::istringstream(myStr) >> d;
but C-style sscanf with appropriate format specified would suffice here as well :)