string::replace throws std::out_of_range on valid iterators - c++

I really can't figure out the reason for "terminate called after throwing an instance of 'std::out_of_range'".
std::cerr << std::string(s[0].first, s[0].second) << std::endl;
std::cerr << std::string(e[0].first, e[0].second) << std::endl;
std::cerr << std::string(s[0].first, e[0].second) << std::endl;
The code above prints valid strings containing the matched results.
boost::regex start(elementStartTag);
boost::regex end(elementEndTag);
boost::match_results<std::string::const_iterator> s, e;
if(!boost::regex_search(tmpTemplate, s, start)) {
dDebug() << "No start token: " << elementStartTag << " was found in file: " << templatePath();
std::cerr << "No start token: " << elementStartTag << " was found in file: " << templatePath() << std::endl;
return;
}
if(!boost::regex_search(tmpTemplate, e, end)) {
dDebug() << "No end token: " << elementEndTag << " was found in file: " << templatePath();
std::cerr << "No end token: " << elementEndTag << " was found in file: " << templatePath() << std::endl;
return;
}
//std::string::iterator si, ei;
// si = fromConst(tmpTemplate.begin(), s[0].second);
// ei = fromConst(tmpTemplate.begin(), e[0].first);
// std::cerr << std::string(si, ei) << "\t" << ss.str(); // return valid string
std::cerr << std::string(s[0].first, s[0].second) << std::endl;
std::cerr << std::string(e[0].first, e[0].second) << std::endl;
std::cerr << std::string(s[0].first, e[0].second) << std::endl;
std::cerr << "s[0].first - tmpTemplate.begin()\t" << s[0].first - tmpTemplate.begin() << std::endl;
std::cerr << "s[0].first - e[0].second\t" << s[0].first - e[0].second << std::endl;
//tmpTemplate.replace(fi, se, ss.str()); // also throws an exception
tmpTemplate.replace(s[0].first - tmpTemplate.begin(), s[0].first - e[0].second, "test"); // throws an exception
gcc version: 4.7.3 if it really matters
boost version: 1.52.0
UPDATE:
First:
The expression s[0].first - e[0].second is wrong; it should be e[0].second - s[0].first. I wonder why nobody spotted this (me included), but consider it a typo, because s[0].first - tmpTemplate.begin() returns a negative number anyway.
When tmpTemplate is defined and initialized as
std::string tmpTemplate= getTemplate();
then, as I said, s[0].first - tmpTemplate.begin() returns a negative number.
If tmpTemplate is instead defined and initialized as
std::string tmpTemplate(getTemplate().data(), getTemplate().length());
everything is fine.
Second:
Please stop suggesting that boost::match_results is uninitialized and read the regex_search documentation, which says it returns false when no match is found.
Third:
std::string tmpTemplate= getTemplate();
and
std::string tmpTemplate(getTemplate().data(), getTemplate().length());
REALLY DO DIFFER.
Own conclusion:
It is either memory corruption that occurs elsewhere in my code and that I can't detect with valgrind, or a bug that is not part of my code.

What are the contents of tmpTemplate, elementStartTag and elementEndTag? If the elementEndTag precedes the elementStartTag in tmpTemplate, then you'll definitely get an out_of_range error.
In the end, I'd recommend using just one regular expression, along the lines of:
boost::regex matcher( ".*(" + elementStartTag + ")(.*)(" + elementEndTag + ").*");
and then using boost::regex_match rather than regex_search. This guarantees the order; it may cause problems, however, if there is more than one matching element in the sequence. If that is an issue, you should use:
boost::regex_search( s[1].second, tmpTemplate.end(), e, end )
as the expression for matching the end.
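The corrected offset arithmetic from the question's update, combined with the suggestion to search for the end tag only after the start tag, can be sketched as follows. This is a minimal sketch, not the asker's code: it uses std::regex in place of boost::regex, and replaceElement is a hypothetical helper name. All iterators are taken from a const view of the string, so the position and length are computed against the same buffer the search ran on.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces everything from the start of the first
// startTag match to the end of the next endTag match with `replacement`.
// Returns false (and leaves `text` untouched) if either tag is not found.
bool replaceElement(std::string& text, const std::string& startTag,
                    const std::string& endTag, const std::string& replacement)
{
    const std::regex start(startTag);
    const std::regex end(endTag);
    std::smatch s, e;

    // Const view: cbegin()/cend() never modify or reallocate the string.
    const std::string& ctext = text;

    if (!std::regex_search(ctext.cbegin(), ctext.cend(), s, start)) {
        return false;
    }
    // Look for the end tag only after the start tag, which guarantees the order.
    if (!std::regex_search(s[0].second, ctext.cend(), e, end)) {
        return false;
    }

    // Position: offset of the start match. Length: end minus start, not the reverse.
    const auto pos = static_cast<std::string::size_type>(s[0].first - ctext.cbegin());
    const auto len = static_cast<std::string::size_type>(e[0].second - s[0].first);

    text.replace(pos, len, replacement);  // s and e are invalid after this call
    return true;
}
```

Calling replaceElement(t, "<x>", "</x>", "test") on a string holding a<x>body</x>b would leave atestb in t.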

Related

C++ regex - First group should be optional

I have the following C++ code to parse a C++ code in a string:
std::string cpp_code = " static int foo(int a, float b)\n{\n/* here could be your code */\n}\n";
std::string function_regex_str = R"(\s*(\w+)?\s+(\w+)\s+(\w+)\((.*)\)\s+\{\s+(.*)\s+\})";
std::regex function_regex(function_regex_str, std::regex::ECMAScript);
std::cmatch sm;
auto ret = std::regex_search(cpp_code.c_str(), sm, function_regex);
if (ret) {
std::cout << "fbound:\t" << sm[1] << std::endl;
std::cout << "ftype:\t" << sm[2] << std::endl;
std::cout << "fname:\t" << sm[3] << std::endl;
std::cout << "fparam:\t" << sm[4] << std::endl;
std::cout << "fbody:\t" << sm[5] << std::endl;
}
The code works fine. Now the first group (sm[1]) should be optional, so I appended ? to the first group (\w+). But when I test the code with the shortened string
cpp_code = "int foo(int a, float b)\n{\n/* here could be your code */\n}\n"
regex_search returns false.
How can I make the first group (in the code above for the substring static) optional?
I tested the code with Visual Studio 2022 C++.
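One possible fix, offered as a sketch rather than a confirmed answer: in the pattern above only the word is optional, while the \s+ that follows it is still required, so a string that starts directly with the return type cannot match. Wrapping the word and its trailing whitespace together in an optional non-capturing group, (?:(\w+)\s+)?, keeps the group numbering unchanged. The helper name parseFunction and the output vector are assumptions made for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: fills `groups` with sm[0]..sm[5] and returns true on a match.
// groups[1] is empty when the optional leading specifier (e.g. "static") is absent.
bool parseFunction(const std::string& code, std::vector<std::string>& groups)
{
    // The specifier and the whitespace after it are optional as a unit.
    static const std::regex function_regex(
        R"(\s*(?:(\w+)\s+)?(\w+)\s+(\w+)\((.*)\)\s+\{\s+(.*)\s+\})",
        std::regex::ECMAScript);

    std::smatch sm;
    if (!std::regex_search(code, sm, function_regex)) {
        return false;
    }
    groups.clear();
    for (const auto& sub : sm) {
        groups.push_back(sub.str());  // an unmatched group yields an empty string
    }
    return true;
}
```

With this pattern both the original string and the shortened one should match; for the shortened one, groups[1] is empty and groups[2] holds int.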

Server Status to XML using fwrite?

// Update the server status xml
string filelocation ("/var/www/html/index.xml");
string firstline ("<server>\n");
string secondline ("\t<current>" + msg.getCount() + "</current>\n");
string thirdline ("\t<highest>" + "--" + "</highest>\n");
string fourthline ("\t<status>Online</status>\n")
string finalline ("</server>");
fstream file;
file.open(filelocation);
file.write(firstline + secondline + thirdline + fourthline + finalline);
string updateFlush ("Server Status updated.");
printf("%s\n", updateFlush);
file.close();
Note that msg.getCount() is a function in the same file to get player count from the central server.
It gives errors about operands of type const char*, something to do with + or -.
Thanks
Take a look at the line
string secondline ("\t<current>" + msg.getCount() + "</current>\n");
"\t<current>" is a const char *
msg.getCount() looks like it returns an int or size_t
"</current>\n" is again a const char *
Adding an int or size_t to a const char * does not concatenate anything; it yields a new const char * pointing to a different address.
The same happens in the line
string thirdline ("\t<highest>" + "--" + "</highest>\n");
Here you are adding two pointers together, which C++ does not allow, so the compiler rejects the line.
And in these two lines:
string updateFlush ("Server Status updated.");
printf("%s\n", updateFlush);
You are creating a C++ std::string object and passing it to the C function printf with a %s format, which expects a const char *. Pass updateFlush.c_str() instead, or print the string with std::cout.
You are mixing C and C++ or stream based I/O with conventional I/O.
In current C++ you should do it this way:
string filelocation ("/var/www/html/index.xml");
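ofstream file; // ofstream creates or truncates the file; a plain fstream opens for reading and writing and fails if the file does not exist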
file.open(filelocation);
file
<< "<server>\n"
<< "\t<current>" << msg.getCount() << "</current>\n"
<< "\t<highest>" << "--" << "</highest>\n"
<< "\t<status>Online</status>\n"
<< "</server>";
string updateFlush ("Server Status updated.");
cout << updateFlush << std::endl;
file.close();
Or even more readable:
auto file = std::ofstream("/var/www/html/index.xml");
file
<< "<server>" << std::endl
<< "\t<current>" << msg.getCount() << "</current>" << std::endl
<< "\t<highest>" << "--" << "</highest>" << std::endl
<< "\t<status>Online</status>" << std::endl
<< "</server>";
file.close();
std::cout << "Server status updated." << std::endl;
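When writing to streams you can end a line with either '\n' or std::endl. Both write the same newline character (a text-mode stream translates it to the line ending of the operating system, such as CRLF on Windows); std::endl additionally flushes the stream, so prefer '\n' when you do not need the flush.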
To use std::cout you have to include <iostream> and for std::ofstream include <fstream>.
If you like it short, you could even do this:
std::ofstream("/var/www/html/index.xml")
<< "<server>" << std::endl
<< "\t<current>" << msg.getCount() << "</current>" << std::endl
<< "\t<highest>" << "--" << "</highest>" << std::endl
<< "\t<status>Online</status>" << std::endl
<< "</server>";
std::cout << "Server status updated." << std::endl;
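If you would rather keep the separate string variables from the question, the concatenation works once at least one operand of each + is a std::string and the number is converted explicitly. This is a minimal sketch under assumptions: buildStatusXml is a hypothetical helper, and its int parameter stands in for msg.getCount(), whose real return type is not shown in the question.

```cpp
#include <string>

// Hypothetical helper: `count` stands in for msg.getCount().
std::string buildStatusXml(int count)
{
    const std::string firstline  = "<server>\n";
    // std::to_string turns the number into a std::string, so + concatenates
    // instead of doing pointer arithmetic on a string literal.
    const std::string secondline = "\t<current>" + std::to_string(count) + "</current>\n";
    // Starting from a std::string makes every following + a string concatenation.
    const std::string thirdline  = std::string("\t<highest>") + "--" + "</highest>\n";
    const std::string fourthline = "\t<status>Online</status>\n";
    const std::string finalline  = "</server>";

    return firstline + secondline + thirdline + fourthline + finalline;
}
```

The result can then be written in one step, for example with file << buildStatusXml(msg.getCount());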

Regex program not catching exception, other problems

When I run this code:
#include <iostream>
#include <regex>
using namespace std;
int main () {
const string source = "hello(abc_def)";
const regex regexp("he(l)lo.*");
smatch m;
if (regex_match(source, m, regexp)) {
cout << "Found, group 1 = " << m[1].str() << endl;
} else {
cout << "Not found" << endl;
}
const regex regexp2("hello\\((\\w+)\\)");
try {
if (regex_match(source, m, regexp2)) {
cout << "Found, group 1 = " << m[1].str() << endl;
} else {
cout << "Not found" << endl;
}
} catch(const exception& exc) {
cout << "Got exception: " << exc.what() << endl;
}
}
the output is:
Found, group 1 = el
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
accompanied by a dialog box that the program is crashing. I'm using g++ on Windows, 4.8.1 (yes, I specified -std=c++11), and I realize that the regular expression stuff was still experimental until 4.9, so that could explain why the first capture group is wrong and why it might have had a problem with the second regex. I'm still concerned about why it said it was throwing std::regex_error but my code didn't catch it. Changing exception& to regex_error& in the catch clause didn't change the behavior. Are all of these just library bugs, or did I do something wrong? I'm trying to relearn C++ after not having used it for 15 years or so (and also trying to learn C++11), so I'm concerned that I might have done something dumb.
The exception occurs in this line:
const regex regexp2("hello\\((\\w+)\\)");
This line is not inside the try block, so the exception thrown by the regex constructor is never caught.
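To catch the error, the construction itself has to happen inside the try block. A minimal sketch, with compilesAsRegex as a hypothetical helper name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a std::regex.
// On failure it stores the exception message in `error` and returns false.
bool compilesAsRegex(const std::string& pattern, std::string& error)
{
    try {
        // The constructor is what throws std::regex_error, so it must sit
        // inside the try block, not before it.
        const std::regex re(pattern);
        (void)re;  // only the construction matters here
        return true;
    } catch (const std::regex_error& exc) {
        error = exc.what();
        return false;
    }
}
```

On a standard library with a complete regex implementation, the pattern from the question should compile, while an unbalanced pattern such as "(" is reported through the catch clause instead of terminating the program.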

const char * changing value during loop

I have a function that iterates through a const char * and uses each character to add objects to an instance of std::map if it is one of a series of recognized characters.
#define CHARSEQ const char*
void compile(CHARSEQ s) throw (BFCompilationError)
{
std::cout << "#Receive call " << s << std::endl;
for(int i = 0; s[i] != '\0'; i++)
{
if (std::string("<>-+.,[]").find_first_of(s[i]) == std::string::npos)
{
throw BFCompilationError("Unknown operator",*s,i);
}
std::cout << "#Compiling: " << s[i] << std::endl;
std::cout << "#address s " << (void*)s << std::endl;
std::cout << "#var s " << s << std::endl;
controlstack.top().push_back(opmap[s[i]]);
}
}
The character sequence passed is "++++++++++."
For the first three iterations, the print statements display the expected values of '+', '+', and '+', and the value of s continues to be "++++++++++.". However, on the fourth iteration, s becomes mangled, producing bizarre values such as 'Ð', 'öê', 'cR ', 'œk' and many other character sequences. If the line that throws the exception is removed and the loop is allowed to continue, the value of s does not change again after that.
Other functions have access to s but since this is not a multithreaded program I don't see why that would matter. I am not so much confused about why s is changing but why it only changes on the fourth iteration.
I have searched SO and the only post that seems at all relevant is this one, but it still doesn't answer my question. (Research has been difficult because searching "const char* changing value" or similar terms just comes up with hundreds of posts about which part of it is const.)
Lastly, I know I should probably be using std::string, which I will if no answers come forth, but I would still like to understand this behavior.
EDIT:
Here is the code that calls this function.
CHARSEQ text = load(s);
std::cout << "#Receive load " << text << std::endl;
try
{
compile(text);
}
catch(BFCompilationError& err)
{
std::cerr << "\nError in bf code: caught BFCompilationError #" << err.getIndex() << " in file " << s << ":\n";
std::cerr << text << '\n';
for(int i = 0; i < err.getIndex(); i++)
{
std::cerr << " ";
}
std::cerr << "^\n";
std::cerr << err.what() << err.getProblemChar() << std::endl;
return 1;
}
Where load is:
CHARSEQ load(CHARSEQ fname)
{
std::ifstream infile (fname);
std::string data(""), line;
if (infile.is_open())
{
while(infile.good())
{
std::getline(infile,line);
std::cout << "#loading: "<< line << '\n';
data += line;
}
infile.close();
}
else
{
std::cerr << "Error: unable to open file: " << fname << std::endl;
}
return std::trim(data).c_str();
}
and the file fname is ++++++++++. spread such that there is one character per line.
EDIT 2:
Here is an example of console output:
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: +
#loading: .
#Receive load ++++++++++.
#Receive call ++++++++++.
#Compiling: +
#address s 0x7513e4
#var s ++++++++++.
#Compiling: +
#address s 0x7513e4
#var s ++++++++++.
#Compiling: +
#address s 0x7513e4
#var s ++++++++++.
#Compiling:
#address s 0x7513e4
#var s ßu
Error in bf code: caught BFCompilationError #4 in file bf_src/Hello.txt:
ßu
^
Unknown operatorß
Your load function is flawed. The const char* returned by c_str() is valid only as long as the underlying std::string object exists and is not modified. But data is a local variable in load and is destroyed when the function returns, so the returned pointer dangles. The freed buffer is not overwritten with zeroes; it is simply left as free memory. Printing the value immediately after returning is therefore likely to work, but the program may later reuse that memory, and the value your pointer refers to will change.
I suggest making std::string the return type of load.

Boost regex don't match tabs

I'm using boost regex_match and I have a problem with matching no tab characters.
My test application looks as follows:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int
main(int args, char** argv)
{
boost::match_results<std::string::const_iterator> what;
if(args == 3) {
std::string text(argv[1]);
boost::regex expression(argv[2]);
std::cout << "Text : " << text << std::endl;
std::cout << "Regex: " << expression << std::endl;
if(boost::regex_match(text, what, expression, boost::match_default) != 0) {
int i = 0;
std::cout << text;
if(what[0].matched)
std::cout << " matches with regex pattern!" << std::endl;
else
std::cout << " does not match with regex pattern!" << std::endl;
for(boost::match_results<std::string::const_iterator>::const_iterator it=what.begin(); it!=what.end(); ++it) {
std::cout << "[" << (i++) << "] " << it->str() << std::endl;
}
} else {
std::cout << "Expression does not match!" << std::endl;
}
} else {
std::cout << "Usage: $> ./boost-regex <text> <regex>" << std::endl;
}
return 0;
}
If I run the program with these arguments, I don't get the expected result:
$> ./boost-regex "`cat file`" "(?=.*[^\t]).*"
Text : This text includes some tabulators
Regex: (?=.*[^\t]).*
This text includes some tabulators matches with regex pattern!
[0] This text includes some tabulators
In this case I would have expected that what[0].matched is false, but it's not.
Is there any mistake in my regular expression?
Or do I have to use other format/match flag?
Thank you in advance!
I am not sure what you want to do. My understanding is that you want the regex to fail as soon as there is a tab in the text.
Your positive lookahead assertion (?=.*[^\t]) is true as soon as it finds a non-tab character, and there are a lot of non-tab characters in your text.
If you want it to fail when there is a tab, go the other way round and use a negative lookahead assertion:
(?!.*\t).*
This assertion fails as soon as it finds a tab.
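The difference between the two assertions can be checked with a short sketch. Note the assumptions: it uses std::regex (ECMAScript syntax) instead of boost::regex, the two function names are hypothetical, and in std::regex the dot does not match line terminators, so multi-line input may behave differently than it does with Boost.

```cpp
#include <regex>
#include <string>

// Hypothetical helper using the negative lookahead from the answer:
// matches only if no tab appears anywhere in `text`.
bool matchesWithoutTabs(const std::string& text)
{
    static const std::regex noTabs(R"((?!.*\t).*)");
    return std::regex_match(text, noTabs);
}

// The positive lookahead from the question, for comparison:
// matches as soon as `text` contains at least one non-tab character.
bool matchesQuestionPattern(const std::string& text)
{
    static const std::regex anyNonTab(R"((?=.*[^\t]).*)");
    return std::regex_match(text, anyNonTab);
}
```

For a single-line text that contains a tab, matchesWithoutTabs returns false while matchesQuestionPattern still returns true, which is the behaviour the question observed.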