C++ regex - First group should be optional

I have the following C++ code to parse C++ code held in a string:
std::string cpp_code = " static int foo(int a, float b)\n{\n/* here could be your code */\n}\n";
std::string function_regex_str = R"(\s*(\w+)?\s+(\w+)\s+(\w+)\((.*)\)\s+\{\s+(.*)\s+\})";
std::regex function_regex(function_regex_str, std::regex::ECMAScript);
std::cmatch sm;
auto ret = std::regex_search(cpp_code.c_str(), sm, function_regex);
if (ret) {
std::cout << "fbound:\t" << sm[1] << std::endl;
std::cout << "ftype:\t" << sm[2] << std::endl;
std::cout << "fname:\t" << sm[3] << std::endl;
std::cout << "fparam:\t" << sm[4] << std::endl;
std::cout << "fbody:\t" << sm[5] << std::endl;
}
The code works fine. Now the first group (sm[1]) should be optional, so I appended ? to the first group (\w+). But when I test the code with the shortened string
cpp_code = "int foo(int a, float b)\n{\n/* here could be your code */\n}\n"
regex_search returns false.
How can I make the first group (in the code above for the substring static) optional?
I tested the code with Visual Studio 2022 C++.
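One possible explanation, offered as a sketch rather than a definitive answer: appending ? makes only (\w+) optional, while the \s+ right after it stays mandatory. The pattern therefore still demands two whitespace runs before the opening parenthesis, and the shortened string has only one. Wrapping the keyword and its trailing whitespace in a single optional non-capturing group, (?:(\w+)\s+)?, keeps the capture numbering unchanged. FunctionParts and parse_function are hypothetical names used only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the captured pieces (not part of the question).
struct FunctionParts {
    bool has_bound = false;  // true when the optional leading keyword matched
    std::string bound, type, name, params, body;
};

// Returns true and fills `out` when `code` looks like a simple function
// definition of the shape used in the question.
bool parse_function(const std::string& code, FunctionParts& out) {
    // (?:(\w+)\s+)? makes the keyword and its trailing whitespace optional
    // together, so "int foo(" and "static int foo(" are both accepted.
    static const std::regex re(
        R"(\s*(?:(\w+)\s+)?(\w+)\s+(\w+)\((.*)\)\s+\{\s+(.*)\s+\})");
    std::smatch m;
    if (!std::regex_search(code, m, re)) {
        return false;
    }
    out.has_bound = m[1].matched;  // false when the keyword is absent
    out.bound = m[1].str();        // empty string when not matched
    out.type = m[2].str();
    out.name = m[3].str();
    out.params = m[4].str();
    out.body = m[5].str();
    return true;
}
```

With the shortened string, group 1 does not participate in the match (its matched flag is false and its string is empty), while groups 2 to 5 keep their numbers.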

Related

Converting a json value that keeps changing to int in c++

Okay, so I get this JSON object from my client:
{"command":"BrugerIndtastTF","brugerT":"\"10\"","brugerF":"\"20\""}
Then I need to use the int value from "brugerT", but as you can see it is wrapped in "\"10\"". When I code this in JavaScript I don't get this problem. Is there a way to use only the part of "brugerT" that says 10?
The code, where *temp should print only the int value 10:
socket_->hub_.onMessage([this](
uWS::WebSocket<uWS::SERVER> *ws,
char* message,
size_t length,
uWS::OpCode opCode
)
{
std::string data = std::string(message,length);
std::cout << "web::Server:\t Data received: " << data << std::endl;
// handle manual settings
std::cout << "Web::Server:\t Received request: manual. Redirecting message." << std::endl;
json test1 = json::parse(data);
auto test2 = test1.json::find("command");
std::cout << "Web::Server:\t Test 1" << test1 << std::endl;
std::cout << "Web::Server:\t Test 2" << *test2 << std::endl;
if (*test2 =="BrugerIndtastTF")
{
std::cout<<"Web::Server:\t BrugerIndtastTF modtaget" << std::endl;
auto temp= test1.json::find("brugerT");
auto humi= test1.json::find("brugerF");
std::cout << "Web::Server:\t temp: " << *temp << "humi: " << *humi << std::endl;
}
});
EDIT:
The terminal output (screenshot not shown here) should just say: temp: 10 humi: 20
You can try to get the string value of brugerT, strip the \" out of the string, and then convert the resulting string into an int with stoi. You could even use a regular expression to find the integer inside the string and let that library figure out the best matching method. A regular expression for that would be something like: ([0-9]+)
PS: a raw string literal (R"(...)", "string literal type 6") might be of some use when manually filtering out \"
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string inputStr(R"("\"10\"")");
regex matchStr(R"(([0-9]+))");
auto matchesBegin = sregex_iterator(inputStr.begin(), inputStr.end(), matchStr);
auto matchesEnd = sregex_iterator();
for (sregex_iterator i = matchesBegin; i != matchesEnd; ++i) {
cout << i->str() << endl;
}
return 0;
}
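The first suggestion (strip the quotes, then call stoi) could look roughly like the sketch below. quoted_to_int is a hypothetical helper name, and the function assumes it is handed the string already extracted from the JSON value, so it does not depend on any particular JSON library.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: takes the string stored under "brugerT" (the four
// characters "10", quote marks included) and returns the number inside.
int quoted_to_int(std::string value) {
    // Erase-remove idiom: drop every double-quote character.
    value.erase(std::remove(value.begin(), value.end(), '"'), value.end());
    // std::stoi throws std::invalid_argument if no digits are left.
    return std::stoi(value);
}
```

If json in the question is nlohmann::json (an assumption, the question does not say), the extracted string would probably come from something like temp->get<std::string>() before being passed to the helper.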

string::replace throws std::out_of_range on valid iterators

Really can't figure out the reason for 'terminate called after throwing an instance of 'std::out_of_range''
std::cerr << std::string(s[0].first, s[0].second) << std::endl;
std::cerr << std::string(e[0].first, e[0].second) << std::endl;
std::cerr << std::string(s[0].first, e[0].second) << std::endl;
The above code returns valid strings with the matched results.
boost::regex start(elementStartTag);
boost::regex end(elementEndTag);
boost::match_results<std::string::const_iterator> s, e;
if(!boost::regex_search(tmpTemplate, s, start)) {
dDebug() << "No start token: " << elementStartTag << " was found in file: " << templatePath();
std::cerr << "No start token: " << elementStartTag << " was found in file: " << templatePath() << std::endl;
return;
}
if(!boost::regex_search(tmpTemplate, e, end)) {
dDebug() << "No end token: " << elementEndTag << " was found in file: " << templatePath();
std::cerr << "No end token: " << elementEndTag << " was found in file: " << templatePath() << std::endl;
return;
}
//std::string::iterator si, ei;
// si = fromConst(tmpTemplate.begin(), s[0].second);
// ei = fromConst(tmpTemplate.begin(), e[0].first);
// std::cerr << std::string(si, ei) << "\t" << ss.str(); // return valid string
std::cerr << std::string(s[0].first, s[0].second) << std::endl;
std::cerr << std::string(e[0].first, e[0].second) << std::endl;
std::cerr << std::string(s[0].first, e[0].second) << std::endl;
std::cerr << "s[0].first - tmpTemplate.begin()\t" << s[0].first - tmpTemplate.begin() << std::endl;
std::cerr << "s[0].first - e[0].second\t" << s[0].first - e[0].second << std::endl;
//tmpTemplate.replace(fi, se, ss.str()); // also throws exception
tmpTemplate.replace(s[0].first - tmpTemplate.begin(), s[0].first - e[0].second, "test"); // throws exception
gcc version: 4.7.3 if it really matters
boost version: 1.52.0
UPDATE:
First:
The following expression is wrong: s[0].first - e[0].second should be e[0].second - s[0].first. I wonder why nobody saw this (me included), but consider it a typo, because s[0].first - tmpTemplate.begin() returns a negative number anyway.
tmpTemplate is defined and initialized as
std::string tmpTemplate= getTemplate();
Great. As I said, s[0].first - tmpTemplate.begin() returns a negative number.
If tmpTemplate is instead defined and initialized as
std::string tmpTemplate(getTemplate().data(), getTemplate().length());
everything is fine.
Second:
Please stop the "boost::match_results is uninitialized" nonsense and read the regex_search documentation. It says: "If I find no match, I return false."
Third:
std::string tmpTemplate= getTemplate();
and
std::string tmpTemplate(getTemplate().data(), getTemplate().length());
DOES REALLY DIFFER.
Own conclusion:
It is either memory corruption that occurs elsewhere in my code and that I can't detect with valgrind, or a bug that is not part of my code.
What are the contents of tmpTemplate, elementStartTag and elementEndTag? If the elementEndTag precedes the elementStartTag in tmpTemplate, then you'll definitely get an out_of_range error.
In the end, I'd recommend using just one regular expression, along the lines of:
boost::regex matcher( ".*(" + elementStartTag + ")(.*)(" + elementEndTag + ").*");
and then using boost::regex_match rather than regex_search. This guarantees the order; it may cause problems if there is more than one matching element in the sequence, however. If this is an issue, you should use:
boost::regex_search( s[1].second, tmpTemplate.end(), e, end )
as the expression for matching the end.
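To illustrate the second suggestion, here is a minimal sketch that searches for the end tag only after the start tag and computes the replace arguments as (position, length). It uses std::regex as a stand-in for boost::regex so that it compiles without Boost; replace_between is a hypothetical name, and the tags are treated as regular expressions, as in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper; std::regex stands in for boost::regex here.
// Replaces everything from the start tag through the end tag with
// `replacement`. Returns false and leaves `text` untouched if either
// tag is missing or the end tag does not follow the start tag.
bool replace_between(std::string& text, const std::string& start_tag,
                     const std::string& end_tag,
                     const std::string& replacement) {
    const std::regex start(start_tag);
    const std::regex end(end_tag);
    std::smatch s, e;
    if (!std::regex_search(text.cbegin(), text.cend(), s, start)) {
        return false;
    }
    // Start the second search where the first match ended, so the end tag
    // can never precede the start tag.
    if (!std::regex_search(s[0].second, text.cend(), e, end)) {
        return false;
    }
    // Position is measured from the start of the string; the length is
    // end-of-end-tag minus start-of-start-tag (never the other way round).
    const auto pos =
        static_cast<std::string::size_type>(s[0].first - text.cbegin());
    const auto len =
        static_cast<std::string::size_type>(e[0].second - s[0].first);
    text.replace(pos, len, replacement);
    return true;
}
```

Both offsets are computed before replace is called, because modifying the string invalidates the iterators stored in the match results.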

Retrieving a regex search in C++

Hello, I am new to regular expressions, and from what I understood from the C++ reference website it is possible to get match results.
My question is: how do I retrieve these results? What is the difference between smatch and cmatch? For example, I have a string consisting of date and time and this is the regular expression I wrote:
"(1[0-2]|0?[1-9])([:][0-5][0-9])?(am|pm)"
Now when I do a regex_search with the string and the above expression, I can find whether there is a time in the string or not. But I want to store that time in a structure so I can separate hours and minutes. I am using Visual Studio 2010 C++.
If you use e.g. std::regex_search, it fills in a std::match_results object, on which you can use operator[] to get the matched strings.
Edit: Example program:
#include <iostream>
#include <string>
#include <regex>
void test_regex_search(const std::string& input)
{
std::regex rgx("((1[0-2])|(0?[1-9])):([0-5][0-9])((am)|(pm))");
std::smatch match;
if (std::regex_search(input.begin(), input.end(), match, rgx))
{
std::cout << "Match\n";
//for (auto m : match)
// std::cout << " submatch " << m << '\n';
std::cout << "match[1] = " << match[1] << '\n';
std::cout << "match[4] = " << match[4] << '\n';
std::cout << "match[5] = " << match[5] << '\n';
}
else
std::cout << "No match\n";
}
int main()
{
const std::string time1 = "9:45pm";
const std::string time2 = "11:53am";
test_regex_search(time1);
test_regex_search(time2);
}
Output from the program:
Match
match[1] = 9
match[4] = 45
match[5] = pm
Match
match[1] = 11
match[4] = 53
match[5] = am
Just use named groups.
(?<hour>(1[0-2]|0?[1-9]))([:](?<minute>[0-5][0-9]))?(am|pm)
OK, VS2010 doesn't support named groups. You are already using unnamed capture groups. Go through them.
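As a sketch of "going through" the unnamed groups and storing them in a structure: the code below keeps the minutes optional, as in the question's pattern, and shows the smatch/cmatch difference (smatch holds results for a std::string, cmatch for a const char*). ClockTime, fill_time and parse_time are hypothetical names, and the member initializers need a newer compiler than VS2010.

```cpp
#include <regex>
#include <string>

// Hypothetical structure for the parsed pieces.
struct ClockTime {
    int hour = 0;
    int minute = 0;
    bool pm = false;
};

// Group 1 = hour, group 2 = minutes (optional), group 3 = am/pm.
static const std::regex time_rgx("(1[0-2]|0?[1-9])(?::([0-5][0-9]))?(am|pm)");

// Works for both std::smatch and std::cmatch.
template <class Match>
void fill_time(const Match& m, ClockTime& out) {
    out.hour = std::stoi(m[1].str());
    // An unmatched optional group has matched == false.
    out.minute = m[2].matched ? std::stoi(m[2].str()) : 0;
    out.pm = (m[3].str() == "pm");
}

// std::string input: the results go into a std::smatch.
bool parse_time(const std::string& input, ClockTime& out) {
    std::smatch m;
    if (!std::regex_search(input, m, time_rgx)) return false;
    fill_time(m, out);
    return true;
}

// const char* input: same pattern, the results go into a std::cmatch.
bool parse_time(const char* input, ClockTime& out) {
    std::cmatch m;
    if (!std::regex_search(input, m, time_rgx)) return false;
    fill_time(m, out);
    return true;
}
```

Passing a string literal selects the const char* overload; wrapping it in std::string selects the other one. Both fill the structure the same way.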

Regex C++: extract substring

I would like to extract a substring between two others.
ex: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
And I would like to get : mysymbol and othersymbol
I don't want to use boost or other libs. Just standard stuffs from C++, except CERN's ROOT lib, with TRegexp, but I don't know how to use it...
Since last year C++ has regular expressions built into the standard. This program will show how to use them to extract the string you are after:
#include <regex>
#include <iostream>
int main()
{
const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
std::smatch match;
if (std::regex_search(s.begin(), s.end(), match, rgx))
std::cout << "match: " << match[1] << '\n';
}
It will output:
match: mysymbol
It should be noted though, that it will not work in GCC as its library support for regular expression is not very good. Works well in VS2010 (and probably VS2012), and should work in clang.
By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.
TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.
One possible solution:
[^_]*_([^_]*)_
will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.
But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.
If you want to use regular expressions, I'd really recommend using C++11's regexes or, if you have a compiler that doesn't yet support them, Boost. Boost is something I consider almost-part-of-standard-C++.
But for this particular question, you do not really need any form of regular expressions. Something like this sketch should work just fine, after you add all appropriate error checks (beg != npos, end != npos etc.), test code, and remove my typos:
std::string between(std::string const &in,
std::string const &before, std::string const &after) {
std::string::size_type beg = in.find(before);
beg += before.size();
std::string::size_type end = in.find(after, beg);
return in.substr(beg, end-beg);
}
Obviously, you could change the std::string to a template parameter and it should work just fine with std::wstring or more seldomly used instantiations of std::basic_string as well.
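For illustration, the error checks mentioned above (beg != npos, end != npos) might be added as in the sketch below. The name between_checked and the bool-plus-out-parameter interface are assumptions made for this example, not part of the original answer.

```cpp
#include <string>

// Sketch of the checked variant. Returns false (leaving `out` untouched)
// when either delimiter is missing; otherwise stores the text between
// the first `before` and the next `after` in `out`.
bool between_checked(const std::string& in, const std::string& before,
                     const std::string& after, std::string& out) {
    std::string::size_type beg = in.find(before);
    if (beg == std::string::npos) {
        return false;  // `before` does not occur at all
    }
    beg += before.size();  // skip past the leading delimiter
    const std::string::size_type end = in.find(after, beg);
    if (end == std::string::npos) {
        return false;  // `after` does not occur behind `before`
    }
    out = in.substr(beg, end - beg);
    return true;
}
```

Called with "FILE_" and "_EVENT" as delimiters, this handles both the full path and the bare file name from the question.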
I would study corner cases before trusting it.
But this is a good candidate:
std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
std::regex reg("(.*)(FILE_)(.*)(_EVENT.DAT)(.*)");
std::cout << std::regex_replace(text, reg, "$3") << '\n';
The answers of Some programmer dude, Tim Pietzcker, and Christopher Creutzig are cool and correct, but they seemed to me not very obvious for beginners.
The following function is an attempt to create an auxiliary illustration for Some programmer dude and Tim Pietzcker's answers:
void ExtractSubString(const std::string& start_string
, const std::string& string_regex_extract_substring_template)
{
std::regex regex_extract_substring_template(
string_regex_extract_substring_template);
std::smatch match;
std::cout << std::endl;
std::cout << "A substring extract template: " << std::endl;
std::cout << std::quoted(string_regex_extract_substring_template)
<< std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
if (std::regex_search(start_string.begin(), start_string.end()
, match, regex_extract_substring_template))
{
std::cout << "match0: " << match[0] << std::endl;
std::cout << "match1: " << match[1] << std::endl;
std::cout << "match2: " << match[2] << std::endl;
}
std::cout << std::endl;
}
The following overloaded function is an attempt to help illustrate Christopher Creutzig's answer:
void ExtractSubString(const std::string& start_string
, const std::string& before_substring, const std::string& after_substring)
{
std::cout << std::endl;
std::cout << "A before substring: " << std::endl;
std::cout << std::quoted(before_substring) << std::endl;
std::cout << std::endl;
std::cout << "An after substring: " << std::endl;
std::cout << std::quoted(after_substring) << std::endl;
std::cout << std::endl;
std::cout << "Start string: " << std::endl;
std::cout << start_string << std::endl;
std::cout << std::endl;
size_t before_substring_begin
= start_string.find(before_substring);
size_t extract_substring_begin
= before_substring_begin + before_substring.size();
size_t extract_substring_end
= start_string.find(after_substring, extract_substring_begin);
std::cout << "Extract substring: " << std::endl;
std::cout
<< start_string.substr(extract_substring_begin
, extract_substring_end - extract_substring_begin)
<< std::endl;
std::cout << std::endl;
}
This is the main function to run the overloaded functions:
#include <regex>
#include <iostream>
#include <iomanip>
int main()
{
const std::string start_string
= "/home/toto/FILE_mysymbol_EVENT.DAT";
const std::string string_regex_extract_substring_template(
".*FILE_(\\w+)_EVENT\\.DAT.*");
const std::string string_regex_extract_substring_template2(
"[^_]*_([^_]*)_");
ExtractSubString(start_string, string_regex_extract_substring_template);
ExtractSubString(start_string, string_regex_extract_substring_template2);
const std::string before_substring = "/home/toto/FILE_";
const std::string after_substring = "_EVENT.DAT";
ExtractSubString(start_string, before_substring, after_substring);
}
This is the result of executing the main function:
A substring extract template:
".*FILE_(\\w+)_EVENT\\.DAT.*"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
match0: /home/toto/FILE_mysymbol_EVENT.DAT
match1: mysymbol
match2:
A substring extract template:
"[^_]*_([^_]*)_"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
match0: /home/toto/FILE_mysymbol_
match1: mysymbol
match2:
A before substring:
"/home/toto/FILE_"
An after substring:
"_EVENT.DAT"
Start string:
"/home/toto/FILE_mysymbol_EVENT.DAT"
Extract substring:
mysymbol

Boost regex don't match tabs

I'm using boost regex_match and I have a problem with matching only text that contains no tab characters.
My test application looks as follows:
#include <iostream>
#include <string>
#include <boost/spirit/include/classic_regex.hpp>
int
main(int args, char** argv)
{
boost::match_results<std::string::const_iterator> what;
if(args == 3) {
std::string text(argv[1]);
boost::regex expression(argv[2]);
std::cout << "Text : " << text << std::endl;
std::cout << "Regex: " << expression << std::endl;
if(boost::regex_match(text, what, expression, boost::match_default) != 0) {
int i = 0;
std::cout << text;
if(what[0].matched)
std::cout << " matches with regex pattern!" << std::endl;
else
std::cout << " does not match with regex pattern!" << std::endl;
for(boost::match_results<std::string::const_iterator>::const_iterator it=what.begin(); it!=what.end(); ++it) {
std::cout << "[" << (i++) << "] " << it->str() << std::endl;
}
} else {
std::cout << "Expression does not match!" << std::endl;
}
} else {
std::cout << "Usage: $> ./boost-regex <text> <regex>" << std::endl;
}
return 0;
}
If I run the program with these arguments, I don't get the expected result:
$> ./boost-regex "`cat file`" "(?=.*[^\t]).*"
Text : This text includes some tabulators
Regex: (?=.*[^\t]).*
This text includes some tabulators matches with regex pattern!
[0] This text includes some tabulators
In this case I would have expected that what[0].matched is false, but it's not.
Is there any mistake in my regular expression?
Or do I have to use other format/match flag?
Thank you in advance!
I am not sure what you want to do. My understanding is, you want the regex to fail as soon as there is a tab in the text.
Your positive lookahead assertion (?=.*[^\t]) is true as soon as it finds a non-tab character, and there are a lot of non-tab characters in your text.
If you want it to fail when there is a tab, go the other way round and use a negative lookahead assertion.
(?!.*\t).*
This assertion will fail as soon as it finds a tab.
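A minimal sketch of the negative lookahead in use. It relies on std::regex (ECMAScript grammar) as a stand-in for boost::regex so that it builds without Boost, and it assumes single-line input, because . does not match line terminators in that grammar. has_no_tab is a hypothetical name.

```cpp
#include <regex>
#include <string>

// std::regex stands in for boost::regex in this sketch.
// Returns true only when `line` contains no tab character.
bool has_no_tab(const std::string& line) {
    // (?!.*\t) rejects the match if a tab occurs anywhere ahead;
    // the trailing .* then consumes the whole (tab-free) line.
    static const std::regex re(R"((?!.*\t).*)");
    return std::regex_match(line, re);
}
```

For this particular check, the plain character class [^\t]* used with regex_match would probably do the same job without any lookahead.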