Another C++ strange segmentation fault by object creation - c++

I've recently encountered a problem with C++ object creation. The problem is somewhat like the one in the question "C++ strange segmentation fault by object creation"; however, the code here is part of an open source project and may not contain easy errors.
The object is created in a method, and that method is called in two consecutive steps.
The class is defined in strtokenizer.h as follows:
class strtokenizer {
protected:
vector<string> tokens;
int idx;
public:
strtokenizer(string str, string seperators = " ");
void parse(string str, string seperators);
int count_tokens();
string next_token();
void start_scan();
string token(int i);
};
And in strtokenizer.cpp, it is like this:
using namespace std;
strtokenizer::strtokenizer(string str, string seperators) {
parse(str, seperators);
}
void strtokenizer::parse(string str, string seperators) {
int n = str.length();
int start, stop;
if (flag) {
printf("%d\n", n);
}
start = str.find_first_not_of(seperators);
while (start >= 0 && start < n) {
stop = str.find_first_of(seperators, start);
if (stop < 0 || stop > n) {
stop = n;
}
tokens.push_back(str.substr(start, stop - start));
start = str.find_first_not_of(seperators, stop + 1);
}
start_scan();
}
int strtokenizer::count_tokens() {
return tokens.size();
}
void strtokenizer::start_scan() {
idx = 0;
return;
}
string strtokenizer::next_token() {
if (idx >= 0 && idx < tokens.size()) {
return tokens[idx++];
} else {
return "";
}
}
string strtokenizer::token(int i) {
if (i >= 0 && i < tokens.size()) {
return tokens[i];
} else {
return "";
}
}
The method that creates the strtokenizer objects is as follows:
int dataset::read_wordmap(string wordmapfile, mapword2id * pword2id) {
pword2id->clear();
FILE * fin = fopen(wordmapfile.c_str(), "r");
if (!fin) {
printf("Cannot open file %s to read!\n", wordmapfile.c_str());
return 1;
}
char buff[BUFF_SIZE_SHORT];
string line;
fgets(buff, BUFF_SIZE_SHORT - 1, fin);
int nwords = atoi(buff);
for (int i = 0; i < nwords; i++) {
fgets(buff, BUFF_SIZE_SHORT - 1, fin);
line = buff;
strtokenizer strtok(line, " \t\r\n");
if (strtok.count_tokens() != 2) {
continue;
}
pword2id->insert(pair<string, int>(strtok.token(0), atoi(strtok.token(1).c_str())));
}
fclose(fin);
return 0;
}
When read_wordmap() is run for the first time (the first read_wordmap() call), the 'strtok' object is created about 87k times, and in the second read_wordmap() call the object is expected to be created more than 88k times. However, it raises an error (sometimes 'segmentation fault' and sometimes 'memory corruption (fast)') after about 86k creations in the second call, at the line:
strtokenizer strtok(line, " \t\r\n");
And when the object creation code is revised as below, there are no errors.
strtokenizer *strtok = new strtokenizer(line, " \t\r\n");
printf("line: %s", line.c_str());
if (strtok->count_tokens() != 2) {
continue;
}
pword2id->insert(pair<string, int>(strtok->token(0), atoi(strtok->token(1).c_str())));

It looks like you have memory corruption in your code. You should consider using a tool like valgrind (http://valgrind.org/) to check that the code does not write out of bounds.
Your revised code uses heap memory instead of stack memory, which may hide the problem (even if it still exists).
Reading your code, several checks are missing that would make it safe when the provided wordmapfile contains unexpected data.
For example, you do not check the result of fgets, so if the number of words at the beginning of the file is bigger than the real number of words, you will have issues.
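As a rough illustration of the fgets check mentioned above, here is a minimal sketch. The function name read_wordmap_checked, the 256-byte buffer and the use of sscanf in place of strtokenizer are my own assumptions for the example, not code from the project.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical variant of read_wordmap(): reads "<count>\n" followed by up to
// <count> lines of "<word> <id>". Returns the number of lines parsed, or -1 if
// the header line is missing.
int read_wordmap_checked(FILE* fin, std::map<std::string, int>& word2id) {
    assert(fin != NULL);  // the caller is expected to have checked fopen()
    char buff[256];
    if (!fgets(buff, sizeof buff, fin)) {
        return -1;  // empty file: there is no header line
    }
    int nwords = atoi(buff);
    int parsed = 0;
    for (int i = 0; i < nwords; i++) {
        if (!fgets(buff, sizeof buff, fin)) {
            break;  // the file ended before nwords lines were read
        }
        char word[256];
        int id = 0;
        // %255s bounds the write into word; both fields must be present.
        if (sscanf(buff, "%255s %d", word, &id) != 2) {
            continue;
        }
        word2id[word] = id;
        ++parsed;
    }
    return parsed;
}
```

With this shape, a header that claims more words than the file contains simply ends the loop early instead of re-processing a stale buffer.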

I carefully debugged my code following the suggestions of @Paul R and other friends, and found that the cause was dynamically allocated memory that I never freed.
The code posted above is a tiny part of my project, in which a Gibbs sampling algorithm is supposed to run for one thousand iterations.
In each iteration, the old matrices are supposed to be freed and new ones allocated with new. However, I forgot to free all the matrices and lists, and that is why my program crashes.
The reason I posted the code above is that the program crashed every time it reached the line:
strtokenizer strtok(line, " \t\r\n");
The "strtok" object is created 1000 times per line of the files (which have 10000+ lines). That made me think that perhaps too many objects were being created and were using up all of the stack memory, even though I later found there is no need to free them manually.
When I debugged the program in Visual Studio, the memory usage monitor showed dramatic growth in each iteration, and a "bad_alloc" error occurred every now and then. This made me realize that I had forgotten to free some large dynamic matrices.
Thank you all!
And I apologise for the wrongly described question that took up your time!
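For anyone hitting the same leak: one way to avoid forgetting the delete is to let std::vector own the per-iteration matrices. This is only a sketch under my own assumptions (the names Matrix, run_iteration and run_sampler are made up, and the real sampler obviously does more work per iteration).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Hypothetical stand-in for one sampling iteration: builds a fresh
// rows x cols matrix filled with iter and returns the sum of its entries.
double run_iteration(std::size_t rows, std::size_t cols, int iter) {
    assert(rows > 0 && cols > 0);
    Matrix counts(rows, std::vector<double>(cols, static_cast<double>(iter)));
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += counts[r][c];
        }
    }
    return total;
}  // counts goes out of scope here and releases its memory, no delete[] needed

// Runs the iterations one after another; memory use stays flat because each
// iteration's matrix is destroyed before the next one is built.
double run_sampler(int iterations, std::size_t rows, std::size_t cols) {
    double last = 0.0;
    for (int it = 1; it <= iterations; ++it) {
        last = run_iteration(rows, cols, it);
    }
    return last;
}
```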

Related

Looking for an error in a fairly basic C++ program I've written which deals with csv files

Can't seem to figure out why exactly this program won't work. It is supposed to store data from a csv file into a structure called SurnameInfo (when used with a loop that iterates through each line) but whenever I run it it gets to line 1280 of 151671 of the csv file, crashes, and gives the windows "program.exe has stopped working" popup. Anyone see anything that might cause this? Thanks!!
#include <iostream>
#include <fstream>
#include <cstring>
#include <cstdlib>
using namespace std;
const int MAXLINE = 1000;
const int MAXARRAY = 1000;
int numberOfNames;
struct SurnameInfo
{
char *name;
int count;
float pctrace[6];
};
SurnameInfo*surnames[MAXARRAY];
void processLine(char *line, int n)
{
surnames[n] = new SurnameInfo; //allocate memory
char * pch = strtok(line, ",");//start tokenizing
int len = strlen(pch); // name length
surnames[n]->name = new char[len+1]; //allocate memory
strcpy(surnames[n]->name, pch); // copy name
surnames[n]->count = atoi(strtok(NULL, ","));//get count
for (int i = 0; i < 6; i++)
{
pch = strtok(NULL, ",");
surnames[n]->pctrace[i] = pch[0] == '(' ? -1 : atof(pch);
}
}
void readLines()
{
char line[MAXLINE];
ifstream inputfile;
inputfile.open("names.csv");
if (!inputfile) return; // can't open
inputfile.getline(line, MAXLINE); //skip title
inputfile.getline(line, MAXLINE);
numberOfNames = 0;
while (!inputfile.eof()) //not end of file
{
processLine(line, numberOfNames++);
inputfile.getline(line, MAXLINE);
}
inputfile.close();
}
int main() {
readLines();
return 0;
}
I see a discrepancy between the code and what you describe.
const int MAXARRAY = 1000; and SurnameInfo*surnames[MAXARRAY]; give room for 1000 entries, but the csv file has 151671 lines.
The array holds only 1000 pointers, yet the loop keeps writing past its end, one entry per line. Those writes land in memory that belongs to other variables, or in memory the program is not supposed to access at all, which produces the segmentation fault.
Also, you need a way to free the SurnameInfo objects that are allocated dynamically.
My suggestion:
Approach 1: Read through the file first and get the number of lines. Allocate the corresponding memory for surnames and proceed the way you are.
This requires one additional scan of the file, which gets expensive when the file is large, but it would solve your problem. (Maybe you can cache the lines while reading, or use a vector?)
Approach 2: Implement functionality similar to the resize of vector. When the array is full, allocate a larger block on the heap, copy the existing entries over, free the old block, and then insert the new info.
Also,
surnames[n]->pctrace[i] = pch[0] == '(' ? -1 : atof(pch);
This parses the way you intend, because == binds tighter than ?:, which in turn binds tighter than =. Still, for safety and clearer code, I would put it in parentheses. Something like this:
surnames[n]->pctrace[i] = ((pch[0] == '(') ? -1 : atof(pch));
If this is your one of the first attempts on C++, this is nicely done. Cheers.
Hope the answer helps.
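To make the vector idea concrete, here is a rough sketch. It swaps in std::string and std::getline for the question's char* and strtok, and the names SurnameRecord, parse_line and read_lines are mine, so treat it as one possible shape rather than a drop-in fix.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct SurnameRecord {
    std::string name;  // owns its characters, so no new char[] / delete[]
    int count;
    float pctrace[6];
};

// Parses "name,count,p1,...,p6". Returns false if any field is missing.
bool parse_line(const std::string& line, SurnameRecord& out) {
    std::istringstream fields(line);
    std::string field;
    if (!std::getline(fields, out.name, ',')) return false;
    if (!std::getline(fields, field, ',')) return false;
    out.count = std::atoi(field.c_str());
    for (int i = 0; i < 6; ++i) {
        if (!std::getline(fields, field, ',') || field.empty()) return false;
        // Values starting with '(' are stored as -1, as in the question.
        out.pctrace[i] = (field[0] == '(')
            ? -1.0f
            : static_cast<float>(std::atof(field.c_str()));
    }
    return true;
}

// Reads every data line after the title row; the vector grows as needed,
// so there is no MAXARRAY limit to overrun.
std::vector<SurnameRecord> read_lines(std::istream& in) {
    std::vector<SurnameRecord> surnames;
    std::string line;
    std::getline(in, line);  // skip title row
    while (std::getline(in, line)) {
        SurnameRecord rec;
        if (parse_line(line, rec)) {
            surnames.push_back(rec);
        }
    }
    return surnames;
}
```

Passing an std::ifstream opened on names.csv to read_lines would take the place of readLines(), and the records are released automatically when the vector goes out of scope.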

Valgrind result explanation

I have a piece of code in a bigger program, and whenever I use the part that I ran through valgrind, I get seg faults later in the program.
I isolated the call to a function that does some TAG manipulations. Can someone explain to me whether this output is problematic?
http://pastebin.com/5J6PHxSs
and this is the code:
std::string Tag::GetDataAsString()
{
if(IsDataAvailable() <=0)
return "";
std::string retData="";
std::vector<byte>& tmp = *this->Data;
for(int i=0, length = tmp.size(); i!=length; ++i)
{
if(tmp[i] != 0x00)
retData+=tmp[i];
else
break;
}
return retData;
}
Edit 1
To explain:
Data is: std::vector<byte>* Data, which is allocated when needed.
std::vector<byte>& tmp = *this->Data; // this is only that they probably did not use () :D
I'm just trying to make some code work that someone else wrote.
What I know is that in the main() method Settings::GetInstance() is called, which reads from a file some data that is written in tags. Tag is a class that handles tags, which can be nested or contain data; that's why I believe Data and the vector of sub tags are dynamically allocated.
So what happens, there is a method:
LoadFromTaggedDump( TagCollection & coll )
{
SomeData scr;
Tag* someTag;
std::vector<SomeData*>& tmpVect = *_localizedSomeDb;
for(int id = 0, dbLen = tmpVect.size(); id != dbLen; ++id)
{
Tag* tmp;
someTag= coll.GetTag(SOME_TAG | id);
if(someTag== NULL)
continue;
tmpVect[id]->SomeDat[i]=tmp->GetDataAsString();
}
}
So valgrind finds some losses in GetDataAsString(), which is self-contained and should clean up everything it uses.
And I don't understand why something would be lost in operator new.

Segmentation fault : Address out of bounds for a pointer in C

I am trying to build and run some complicated code that was written by someone else, I don't know who they are and can't ask them to help. The code reads a bpf (brain potential file) and converts it to a readable ascii format. It has 3 C files, and 2 corresponding header files. I got it to build successfully with minor changes, however now it crashes with a segmentation fault.
I narrowed the problem down to FindSectionEnd() (in ReadBPFHeader.c) and find that the error occurs when sscanfLine() (in the file sscanfLine.c) is called (code for both is below).
ui1 is defined as unsigned char.
si1 is defined as char.
Just before returning from sscanfLine(), the address pointed to by dp is 0x7e5191, or something similar ending with 191. However, on returning to FindSectionEnd(), dp points to 0x20303035 and it says 'Address 0x20303035 is out of bounds', which then causes a fault at strstr(). The loop in FindSectionEnd() runs without problem for 14 iterations before the fault occurs. I have no idea what is going wrong. I really hope the information I have given here is adequate.
ui1 *FindSectionEnd(ui1 *dp)
{
si1 Line[256], String[256];
int cnt=0;
while (sscanfLine(dp, Line) != EOF){
dp = (ui1 *)strstr(dp, Line);
dp+= strlen(Line);
sscanf(Line,"%s",String);
if(SectionEnd(String))
return(dp);
}
return(NULL);
}
si1 *sscanfLine(ui1 *dp, si1 *s)
{
int i = 0;
*s = NULL;
int cnt = 0;
while (sscanf(dp, "%c", s + i) != EOF){
cnt++;
dp++;
if(*(s + i) == '\n') {
*(s + i + 1) = '\0';
return s;
}
++i;
}
*(s + i) = '\0';
return s;
}
The sscanfLine function doesn't respect the size of the buffer passed in, and if it doesn't find '\n' within the first 256 bytes, happily trashes the stack next to the Line array.
You may be able to work around this by making Line bigger.
If you're going to improve the code, you should pass the buffer size to sscanfLine and make it stop when the count is reached even if a newline wasn't found. While you're at it, instead of returning s, which the caller already has, make sscanfLine return the new value of dp, which will save the caller from needing to use strstr and strlen.
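A minimal sketch of that suggestion follows. The name copy_line is mine, I've used plain pointer reads instead of sscanf, and it assumes (like the original strstr/strlen calls do) that the input is null-terminated.

```cpp
#include <cassert>
#include <cstddef>

// Copies one line (including its '\n', if any) from dp into s, writing at most
// size - 1 characters plus a terminating '\0'. Returns a pointer just past the
// consumed input, or NULL when there is nothing left to read or no room in s.
const unsigned char* copy_line(const unsigned char* dp, char* s, std::size_t size) {
    assert(dp != NULL && s != NULL);
    if (size == 0) {
        return NULL;
    }
    if (*dp == '\0') {
        s[0] = '\0';
        return NULL;
    }
    std::size_t i = 0;
    while (*dp != '\0' && i + 1 < size) {
        char c = static_cast<char>(*dp++);
        s[i++] = c;
        if (c == '\n') {
            break;  // end of line reached within the buffer
        }
    }
    s[i] = '\0';
    return dp;  // the caller continues from here, no strstr/strlen needed
}
```

A line longer than the buffer is then delivered in several pieces rather than overwriting whatever sits next to Line on the stack.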
My first guess would be that your string is not null terminated and strstr() segfaults because it reads past the boundaries of the array.

Why does adding a cout line to this method stop my program from tripping a buffer overflow exception in Windows?

I have an object with the following method:
int PathSubstitution::updateField(Field4memo &field, int record_id, int field_id) const
{
int type = field.type();
if((type == r4str) || (type == r4memo) || (type == r4unicode))
{
string value = field.str();
trim(value);
if(!substituteDriveLetters(value))
return -1;
if(!substituteGridMount(value))
return -1;
return field.assign(value.c_str(), value.length());
}
return r4success;
}
When I build this code with my Debug profile in Visual Studio C++ 2010 everything works just fine. This method gets called 4 times, on four unique Field4memo objects, and it works.
When I build this code with my Release profile the method works the first time it's called, but causes Vista Enterprise to display a "program.exe has stopped working" dialog window. The "View problem details" area of the window says:
Problem signature:
Problem Event Name: BEX
Application Name: program.exe
Application Version: 0.0.0.0
Application Timestamp: 4ef4edc6
Fault Module Name: program.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4ef4edc6
Exception Offset: 0000668a
Exception Code: c0000409
Exception Data: 00000000
OS Version: 6.0.6002.2.2.0.256.4
Locale ID: 1033
Additional Information 1: 6243
Additional Information 2: 0d5daf38e26c963685a835e6f40ff03d
Additional Information 3: aa53
Additional Information 4: 5d02a603659cce53ff840117c3a9c7a7
The BEX event name indicates a buffer overflow. But which buffer, I cannot tell.
Here's where it gets weird for me though...
When I change this method and add an unnecessary cout line to it, it works with the Release profile:
int PathSubstitution::updateField(Field4memo &field, int record_id, int field_id) const
{
int type = field.type();
if((type == r4str) || (type == r4memo) || (type == r4unicode))
{
// THIS IS THE NEW LINE I ADDED RIGHT BELOW HERE!!!
cout << endl;
string value = field.str();
trim(value);
if(!substituteDriveLetters(value))
return -1;
if(!substituteGridMount(value))
return -1;
return field.assign(value.c_str(), value.length());
}
return r4success;
}
I can't tell why the method crashes with the Release profile or why adding the cout line resolves the crashing issue. I'm uncomfortable just accepting the "cout fixes it" answer -- can someone help me understand what my problem is here and why the cout fixes it? How does the cout call save me from a buffer overflow here?
Edit: some additional context for the call to this method was asked for. It's called in a loop. With the test input I'm using, it's called 4 times. The function that calls it looks like so:
int PathSubstitution::updateRecord(Data4 &dbf, int record_id) const
{
// Update all fields
int numFields = dbf.numFields();
for(int i = 1; i <= numFields; i++ )
{
Field4memo field(dbf, i);
int rc = updateField(field, record_id, i);
if(rc != r4success)
return rc;
}
return r4success;
}
Edit 2: Flushing the cout buffer also fixes the overflow problem as long as cout.flush() is called from within the PathSubstitution::updateField method and before the return field.assign(value.c_str(), value.length()); line.
Edit 3: This is promising. If I comment out the calls to the substituteDriveLetters() and substituteGridMount() methods the program doesn't crash. So it's something to do with those method calls (which use pcre to do some regular expression string substitutions).
Edit 4: If I comment out just the substituteDriveLetters() method it works. So I've got a prime suspect now. This method is supposed to replace a drive letter in a path with the corresponding UNC value. None of the fields in my test input are file paths so this should be a null op as far as data transformation is concerned.
bool PathSubstitution::substituteDriveLetters(string &str, string::size_type offset) const
{
int offsets[6];
int groups = pcre_exec(drivePattern, NULL, str.c_str(), str.size(), 0, 0, offsets, sizeof(offsets));
if(groups < 0)
{
switch(groups)
{
case PCRE_ERROR_NOMATCH:
case PCRE_ERROR_PARTIAL:
return true;
case PCRE_ERROR_NOMEMORY:
cerr << "WARNING: Out of memory." << endl;
break;
case PCRE_ERROR_BADUTF8:
case PCRE_ERROR_BADUTF8_OFFSET:
cerr << "WARNING: Bad UNICODE string." << endl;
break;
default:
cerr << "WARNING: Unable to substitute drive letters (Err: " << groups << ")" << endl;
break;
}
return false;
}
char driveLetter = toupper(str[offsets[2]]);
DriveMap::const_iterator i = driveMap.find(driveLetter);
if(i == driveMap.end())
{
cerr << "ERROR: The " << driveLetter << " drive is not mapped to a network share." << endl;
return false;
}
string::iterator start = str.begin() + offsets[0];
string::iterator end = str.begin() + offsets[1];
str.replace(start, end, i->second);
return substituteDriveLetters(str, offsets[1]);
}
Without a complete test case it's almost impossible to say what the exact problem is, but given the behaviour it is highly likely that your code has some form of undefined behavior and works in debug, and with the extra cout statement, through blind luck.
You should analyse the code and fix the underlying issue, otherwise it's highly likely that a related bug will recur at the least convenient moment.
If you want help analysing the actual problem then you need to post a complete compilable example. At the moment we don't know anything about Field4memo, trim, substituteDriveLetters or substituteGridMount for starters.
Edit: You may want to insert more checks for the string operations that you perform.
// need to check that offsets[2] >= 0 and < str.size()
char driveLetter = toupper(str[offsets[2]]);
// need to check that offsets[0] >= 0 and <= str.size()
string::iterator start = str.begin() + offsets[0];
// need to check that offsets[1] >= 0 and <= str.size()
string::iterator end = str.begin() + offsets[1];
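The checks in those comments could look roughly like the sketch below. The helper names are hypothetical. One more thing that may be worth checking, as far as I remember the PCRE documentation: the last argument of pcre_exec is the number of int elements in the output vector, not its size in bytes, so sizeof(offsets) would overstate a 6-element array. I can't confirm from the posted code that this is the cause, but array_length shows the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Element count of a fixed-size array, as opposed to sizeof(arr),
// which is its size in bytes.
template <typename T, std::size_t N>
std::size_t array_length(T (&)[N]) {
    return N;
}

// True if index refers to an existing character of str.
bool valid_index(const std::string& str, int index) {
    return index >= 0 && static_cast<std::size_t>(index) < str.size();
}

// True if [begin, end) is a valid range inside str.
bool valid_span(const std::string& str, int begin, int end) {
    return begin >= 0 && end >= begin &&
           static_cast<std::size_t>(end) <= str.size();
}

// Hypothetical helper: replaces [offsets[0], offsets[1]) with replacement,
// but only after checking that the range lies inside str.
bool replace_match(std::string& str, const int* offsets,
                   const std::string& replacement) {
    assert(offsets != NULL);
    if (!valid_span(str, offsets[0], offsets[1])) {
        return false;
    }
    str.replace(static_cast<std::size_t>(offsets[0]),
                static_cast<std::size_t>(offsets[1] - offsets[0]),
                replacement);
    return true;
}
```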

How can text in my main C++ file (in code that hasn't executed yet) show up in a string?

I'm new to C++ so there's a lot I don't really understand. I'm trying to narrow down how I'm getting exc_bad_access, but my attempts to print out values seem to be aggravating (or causing) the problem!
#include <iostream>
#include "SI_Term.h"
#include "LoadPrefabs.h"
int main() {
SI_Term * velocity = new SI_Term(1, "m/s");
std::cout<<"MAIN: FIRST UNITS "<<std::endl;
velocity->unitSet()->displayUnits();
return 0;
}
The above code produces an error (EXC_BAD_ACCESS) before the std::cout<< line even occurs. I traced it with xcode and it fails within the function call to new SI_Term(1, "m/s").
Re-running with the cout line commented out it runs and finishes. I would attach more code but I have a lot and I don't know what is relevant to this line seeming to sneak backwards and overwrite a pointer. Can anyone help me with where to look or how to debug this?
NEW INFO:
I narrowed it down to this block. I should explain at this point, this block is attempting to decompose a set of physical units written in the format kg*m/s^2 and break it down into kg, m, divide by s * s. Once something is broken down it uses LoadUnits(const char*) to read from a file. I am assuming (correctly at this point) that no string of units will contain anywhere near my limit of 40 characters.
UnitSet * decomposeUnits(const char* setOfUnits){
std::cout<<"Decomposing Units";
int i = 0;
bool divide = false;
UnitSet * nextUnit = 0;
UnitSet * temp = 0;
UnitSet * resultingUnit = new UnitSet(0, 0, 0, 1);
while (setOfUnits[i] != '\0') {
int j = 0;
char decomposedUnit[40];
std::cout<<"Wiped unit."<<std::endl;
while ((setOfUnits[i] != '\0') && (setOfUnits[i] != '*') && (setOfUnits[i] != '/') && (setOfUnits[i] != '^')) {
std::cout<<"Adding: " << decomposedUnit[i]<<std::endl;
decomposedUnit[j] = setOfUnits[i];
++i;
++j;
}
decomposedUnit[j] = '\0';
nextUnit = LoadUnits(decomposedUnit);
//The new unit has been loaded. now check for powers, if there is one read it, and apply it to the new unit.
//if there is a power, read the power, read the sign of the power and flip divide = !divide
if (setOfUnits[i] == '^') {
//there is a power. Analize.
++i;++j;
double power = atof(&setOfUnits[i]);
temp = *nextUnit^power;
delete nextUnit;
nextUnit = temp;
temp = 0;
}
//skip i and j till the next / or * symbol.
while (setOfUnits[i] != '\0' && setOfUnits[i] != '*' && setOfUnits[i] != '/') {
++i; ++j;
}
temp = resultingUnit;
if (divide) {
resultingUnit = *temp / *nextUnit;
} else {
resultingUnit = *temp * *nextUnit;
}
delete temp;
delete nextUnit;
temp = 0;
nextUnit = 0;
// we just copied a word and setOfUnits[i] is the multiply or divide or power character for the next set.
if (setOfUnits[i] == '/') {
divide = true;
}
++i;
}
return resultingUnit;
}
I'm tempted to say that SI_Term is messing with the stack (or maybe trashing the heap). Here's a great way to do that:
char buffer[16];
strcpy(buffer, "I'm writing too much into a buffer");
Your function will probably finish, but then wreak havoc. Check all arrays you have on the stack and make sure you don't write out of bounds.
Then apply standard debugging practices: Remove code one by one until it doesn't crash anymore, then start reinstating it to find your culprit.
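For comparison, a bounded copy that cannot run past the buffer might look like this. It is just a sketch: copy_bounded is a made-up name, and in real C++ code a std::string usually avoids the problem altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Copies src into dst, truncating if necessary so that at most size bytes
// (including the terminating '\0') are written. Returns true only if the
// whole of src fit.
bool copy_bounded(char* dst, std::size_t size, const char* src) {
    assert(dst != NULL && src != NULL);
    if (size == 0) {
        return false;
    }
    // snprintf never writes more than size bytes and always terminates dst;
    // its return value is the length src would have needed.
    int needed = std::snprintf(dst, size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

Called with the 16-byte buffer from the example above, it reports false and leaves a truncated, terminated string instead of writing past the end.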
You are mentioning xcode, so I assume you're on a Mac. I'd then suggest looking at the valgrind tool from http://valgrind.org/. It is a memory checker that gives you information when you're doing something wrong with memory. If your program was built with debugging symbols, it should give you a stack trace that helps you find the error.
Here, I removed the unimportant stuff:
while (setOfUnits[i] != '\0') {
while ((setOfUnits[i] != '\0') && (setOfUnits[i] != '*') && (setOfUnits[i] != '/') && (setOfUnits[i] != '^')) {
...
++i;
}
...
nextUnit = LoadUnits(decomposedUnit);
...
if (...) {
double power = ...;
temp = *nextUnit^power;
delete nextUnit;
}
....
temp = resultingUnit;
delete temp;
delete nextUnit;
...
++i;
}
There are a number of problems with this:
In the inner-loop, you increment i until setOfUnits[i] == '\0', the end of the string. Then you increment i again, past the end of the string.
nextUnit is of type UnitSet, which presumably overloads ^. Though it's possible that it overloads it to mean "exponentiation", it probably doesn't (and if it does, it shouldn't): in C-based languages, including C++, ^ means XOR, not exponentiation.
You are deleting pointers returned from other functions - that is, you have functions that return dynamically-allocated memory, and expect the caller to delete that memory. While not incorrect, and in fact common practice in C, it is considered bad practice in C++. Just have LoadUnits() return a UnitSet (rather than a UnitSet*), and make sure the UnitSet class has a correct copy constructor and operator=. If performance then becomes a concern, you could return a const UnitSet& instead (only to an object that outlives the call), or use smart pointers.
In similar vein, you are allocating and deleting inside the same function. There is no need for this: just make resultingUnit stack-allocated:
UnitSet resultingUnit(0, 0, 0, 1);
I know that last bullet-point sounds very confusing, but once you finally come to understand it, you'll likely know more about C++ than 90% of coders who claim to "know" C++. This site and this book are good places to start learning.
Good luck!
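To show what the points above add up to, here is a rough sketch of the decomposition with value semantics and no new/delete. Everything in it is a stand-in I made up: Units tracks only kg, m and s exponents, load_unit replaces the file-reading LoadUnits(), and power is a named function instead of an overloaded ^.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for UnitSet: exponents of kg, m and s.
struct Units {
    double kg, m, s;
    Units(double kg_ = 0, double m_ = 0, double s_ = 0) : kg(kg_), m(m_), s(s_) {}
};

Units operator*(const Units& a, const Units& b) {
    return Units(a.kg + b.kg, a.m + b.m, a.s + b.s);
}
Units operator/(const Units& a, const Units& b) {
    return Units(a.kg - b.kg, a.m - b.m, a.s - b.s);
}
Units power(const Units& u, double p) {
    return Units(u.kg * p, u.m * p, u.s * p);
}

// Hypothetical replacement for LoadUnits(): knows only three base units.
Units load_unit(const std::string& name) {
    if (name == "kg") return Units(1, 0, 0);
    if (name == "m") return Units(0, 1, 0);
    if (name == "s") return Units(0, 0, 1);
    return Units();
}

Units decompose_units(const std::string& text) {
    Units result;  // stack-allocated, returned by value
    bool divide = false;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t end = text.find_first_of("*/^", i);
        if (end == std::string::npos) end = text.size();
        Units next = load_unit(text.substr(i, end - i));
        i = end;
        if (i < text.size() && text[i] == '^') {
            ++i;  // step over '^' and read the exponent up to the next * or /
            std::size_t stop = text.find_first_of("*/", i);
            if (stop == std::string::npos) stop = text.size();
            next = power(next, std::atof(text.substr(i, stop - i).c_str()));
            i = stop;
        }
        result = divide ? result / next : result * next;
        if (i < text.size()) {
            if (text[i] == '/') divide = true;  // same rule as the question's code
            ++i;  // step over the operator only when one is actually present
        }
        assert(i <= text.size());  // i never runs past the end of the string
    }
    return result;
}
```

For example, decompose_units("kg*m/s^2") gives exponents 1, 1 and -2 for kg, m and s.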