Received signal: Segmentation fault (11) while using RocksDB - C++

I am using RocksDB as the database for my C++ program. For one use case, I store keys in the following format: key=<fix-prefix><string-type-element><foo-type>
I get the following error when accessing the "string-type-element" part of the key:
Received signal: Segmentation fault (11)
The piece of code to iterate the stored data is something like:
auto prefix = // defined here
auto from = // defined here
auto to = // defined here
std::unique_ptr<rocksdb::Transaction> trans(db_.BeginTransaction(rocksdb::WriteOptions()));
rocksdb::ReadOptions opts;
opts.snapshot = trans->GetSnapshot();
std::unique_ptr<rocksdb::Iterator> iter(trans->GetIterator(opts));
iter->Seek(from);
for (; iter->Valid() && iter->key().compare(to) < 0; iter->Next())
{
    if (iter->key().starts_with(prefix))
    {
        // This line of code is producing the error
        const auto string-type-element = *reinterpret_cast<const string-type*>(iter->key().data() + prefix.size());
        // some stuff here
    }
}
My Attempt:
As you can see, I marked the faulty line in the code above. Since a segmentation fault (signal 11) usually means an attempt to access an invalid or out-of-range memory location, my guess is that reinterpret_cast cannot deduce the size of the "string-type-element": unlike int, std::string is not a fixed-size type, so the code ends up accessing memory it should not access.
I want to ask:
If std::string really cannot be extracted from the middle of the key because its size is unknown, what can I do about it?
Could there be another cause for this issue, and how should I approach it?

reinterpret_cast is a wildly dangerous tool that should only be used in extraordinary circumstances, which you don't have here. Look at the std::string constructors and you'll find one that accepts a pointer to a null-terminated character array. It's not clear what the type of iter->key().data() is, but if it points to a null-terminated string, you could just change the line to:
const std::string string_element{iter->key().data() + prefix.size()};
to initialize a std::string from the part of the data after the prefix. (Spell out std::string here: with const auto, the variable would not be a std::string but a pointer type.)
In your case, reinterpret_cast pretends that the bytes it points to are a std::string object. A std::string is more than raw bytes of character type: it also has other members, such as its size and a pointer to (or small inline buffer for) its characters, so copying from those reinterpreted bytes reads garbage and can follow a bogus pointer.

Related

Segmentation Fault when trying to copy string from one to another at particular lengths?

#include <iostream>
#include <string>
using namespace std;
int main() {
    string s,s_new;
    cin>>s;
    int len=s.length();
    cout<<len<<"\n";
    for(int i=0;i<len;i++){
        s_new[i]=s[i];
    }
    cout<<s[len-1]<<"\n";
    cout<<s_new[len-1];
    return 0;
}
I am trying to copy string 's' to another string 's_new'. String 's' is read from user input. The snippet prints the string length for reference, and the other two lines print the last character of 's' and of 's_new'.
But for particular lengths this program causes a segmentation fault in various IDEs.
For example, I got a segmentation fault for length 25 on Ideone, and on OnlineGDB for length 1961. These lengths were not always the same in these IDEs.
I only used lowercase alphabetical characters for these strings, and I used randomly generated strings of various lengths for these tests.
I did not get any error when I used character arrays for the same job.
I wanted to know whether this issue comes from my code, or whether there is some problem with using the STL string the way I did.
s_new[i]=s[i]; has undefined behavior since s_new.size() == 0.
You need to resize it to do what you are doing:
s_new.resize(len);
for(int i=0;i<len;i++){
    s_new[i]=s[i];
}
for a particular length this program is creating segmentation fault in various IDE
Writing out-of-bounds always has undefined behavior. In this case, what happens is most likely this:
A std::string often uses the small string optimization, where the data of a short string is stored inside the std::string object itself; for longer strings a pointer to separately allocated data is used instead (the user of the class won't notice this). When you write s_new[i] = ... while the string's length is 0 and i goes past the small-string buffer, you start overwriting other internal data of the std::string, or whatever else is stored in memory after it.

Random behavior of an array of char arrays

I have an array of character arrays that is split on the pipe ('|') character (example below). The function I use to create this array sometimes works; other times it creates the array, then aborts with one of two different errors.
I am not sure what I am doing wrong. In particular, I don't understand why it creates the array successfully every time but then seems to break after creation about half the time, regardless of the input.
Example array:
"here is | an example | input" = {"here is", "an example", "input"}
Errors:
Error in './msh': malloc(): memory corruption (fast): 0x000...
Error in './msh': free(): invalid pointer: 0x0000....
Code:
char** allArgs = new char*[100];
void createArgArrays(const char* line) {
    char* token;
    token = strtok((char*)line, "|");
    int i = 0;
    while(token != NULL) {
        allArgs[i] = token;
        i++;
        token = strtok(NULL, "|");
    }
}
Where I call the code:
string input;
getline(cin, input);
createArgArrays(input.c_str());
Any insight/help is greatly appreciated.
c_str() returns a const char *. strtok() modifies the string it refers to.
Per http://www.cplusplus.com/reference/string/string/c_str/:
c++98
A program shall not alter any of the characters in this sequence.
Don't cast away const to force things to "work".
A few things:
The C++ way is sometimes different than the C way.
Andrew Henle's point about casting should be carved into stone tablets.
If you really want to use a C function, try walking the string using strchr().
Also, try something like std::vector<std::string> (see std::vector::push_back) to store your string chunks - it'll be a bit cleaner and avoids an arbitrary cap on the size of allArgs.
Another thing you could look at is boost::split(), which probably does exactly what you want anyway.

How to copy data into certain parts of a byte array

I want to create a byte array from a struct of unknown type and prepend a number to this byte array. How do I do this?
I currently have this code:
template <class T>
void CopterConnection::infoToByteArray(char *&bit_data, size_t *msglen,
                                       T data) {
    // Determine which kind of element is in the array, will change in the final code
    char typeID = -1;
    *msglen = sizeof(data);
    *msglen += 1; // take in account of typeID
    // Create the pointer to the byte representation of the struct
    bit_data = new char[*msglen];
    // copy the information from the struct into the byte array
    memcpy(bit_data, &data+1, *msglen-1);
    bit_data[1] = typeID;
}
But this is not working. I guess I am using memcpy wrong. I want to copy the unknown struct T into the positions bit_data[1] to bit_data[*end*]. What is the best way to achieve this?
One possible problem and one definitive problem:
The possible problem is that array indexing starts at zero. So you should copy to bit_data + 1 to skip over the first byte, and then of course use bit_data[0] to set the type id.
The definitive problem is that &data + 1 is the address one past data (the same as &(&data)[1]), so reading from it is out of bounds and leads to undefined behavior. You should just copy from &data.
Putting it all together, the last two lines should be
memcpy(bit_data + 1, &data, *msglen-1);
bit_data[0] = typeID;
There is another possible problem, which depends on what you're doing with the data in bit_data and what T is. If T is not a POD type (more precisely, not trivially copyable), then you simply cannot expect a bitwise copy (what memcpy does) to work very well.
Also if T is a class or structure with members that are pointers then you can't save those to disk or transfer to another computer or even to another process on the same computer.
There are a few bugs in there, in addition to the fact that you are managing memory by hand with new.
On the memcpy line you use &data + 1 as the source, which is undefined behaviour here. It adds sizeof(data) bytes to the address of the local copy data, which lives somewhere on the stack. A "one past the end" pointer is valid in pointer arithmetic, but nothing you read from it, nor from anything after it, is valid.
bit_data[1] is the 2nd character in your buffer.

C++ rapidjson Error: free(): invalid next size (normal)

I read data in JavaScript and pass a JSON string like this: {"data_size":500, "array":[0,0,0,0,..,0,0]} to the web server. The numbers in the array can be anything from 0 to 4294967295.
On the Mongoose web server I use the rapidjson library to work with the JSON string. To do so, I create a Document d and read the values from the JSON string into a uint32_t array using this:
#include "rapidjson/document.h"
int i_data_size=0;
Document d;
conn->content[conn->content_len]=0; //to zero terminate
if (d.Parse(conn->content).HasParseError())
{
    //Error
}
else
{
    Value& s = d["data_size"];
    i_data_size=s.GetInt();
    uint32_t *Data=NULL;
    Data=new uint32_t[i_data_size];
    Value& a = d["array"];
    for(SizeType i=0;i<a.Size();i++)
    {
        Data[i]=a[i].GetUint();
    }
}
conn->content contains the JSON as a char*.
When I send {"data_size":500, "array":[0,0,0,0,..,0,0]}, everything works fine. But sometimes (not every time), when the numbers become larger, like this:
{"data_size":500, "array":[123,222,0,0,..,0,0]}
I get the Error:
free(): invalid next size (normal)
This error is not related to rapidjson. It comes from the C runtime's memory allocator (glibc's malloc/free), which noticed that its bookkeeping had been corrupted. C++, the heir to C, doesn't check your memory accesses because it is meant to be fast and trusts you to get them right. Because of that, the error won't tell you exactly what is wrong. In your code you use an array but don't manage its boundaries carefully: double-check every read and write of that unsigned array (check the bounds more carefully), or use a container like std::vector, which manages its size for you. Take a look here:
http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c4027/C-Tutorial-A-Beginners-Guide-to-stdvector-Part-1.htm
Solved the problem!
Writing a zero past the end of the buffer was causing the error:
conn->content[conn->content_len]=0; //to zero terminate
Solved the problem by using a string instead:
string json(conn->content, conn->content_len); // copies exactly content_len bytes; no terminator needed
if (d.Parse(json.c_str()).HasParseError())
{ ...

Splitting a std::string into two const char*s resulting in the second const char* overwriting the first

I am taking a line of input containing two values separated by a space and trying to read them into two integer variables.
For instance, "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
    separator+1, //Start with the next char after the separator
    input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a C string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.
First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
Now, an explanation.
input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour.
The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.