I am using Ubuntu Linux to write two programs. I am attempting to change the value of an integer from another process. My first process (A) is a simple program that loops forever and displays the value to the screen. This program works as intended and simply displays the value -1430532899 (0xAABBCCDD) to the screen.
#include <stdio.h>
int main()
{
//The needle that I am looking for to change from another process
int x = 0xAABBCCDD;
//Loop forever printing out the value of x
int counter = 0;
while(1==1)
{
while(counter<100000000)
{
counter++;
}
counter = 0;
printf("%d",x);
fflush(stdout);
}
return 0;
}
In a separate terminal, I use the ps -e command to list the processes and note the process ID of process (A). Next, as root (using sudo), I run this next program (B) and enter the process ID that I noted for process (A).
The program searches memory for the needle, which is stored backwards (DD CC BB AA), finds it, and takes note of the address. It then tries to write the hex value 0xEEEEEEEE to that same location, but the write fails with errno set to 14 (bad address). The strange thing is that a little later in the address space I am able to write values successfully (at address 0x601000), but I cannot write to 0x6005DF, the address where the needle (0xAABBCCDD) is. (I can obviously read there, because that is where I found the needle.)
#include <stdio.h>
#include <iostream>
#include <sys/uio.h>
#include <string>
#include <errno.h>
#include <vector>
using namespace std;
char getHex(char value);
string printHex(unsigned char* buffer, int length);
int getProcessId();
int main()
{
//Get the process ID of the process we want to read and write
int pid = getProcessId();
//Lists of addresses where we find our needle 0xAABBCCDD and the addresses where we simply cannot read
vector<long> needleAddresses;
vector<long> unableToReadAddresses;
unsigned char buf1[1000]; //buffer used to store memory values read from other process
//Number of bytes read, also is -1 if an error has occurred
ssize_t nread;
//Structures used in the process_vm_readv system call
struct iovec local[1];
struct iovec remote[1];
local[0].iov_base = buf1;
local[0].iov_len = 1000;
remote[0].iov_base = (void * ) 0x00000; //start at address 0 and work up
remote[0].iov_len = 1000;
for(int i=0;i<10000;i++)
{
nread = process_vm_readv(pid, local, 1, remote, 1 ,0);
if(nread == -1)
{
//errno is 14 then the problem is "bad address"
if(errno == 14)
unableToReadAddresses.push_back((long)remote[0].iov_base);
}
else
{
cout<<printHex(buf1,local[0].iov_len);
for(int j=0;j<1000-3;j++)
{
if(buf1[j] == 0xDD && buf1[j+1] == 0xCC && buf1[j+2] == 0xBB && buf1[j+3] == 0xAA)
{
needleAddresses.push_back((long)(remote[0].iov_base+j));
}
}
}
remote[0].iov_base += 1000;
}
cout<<"Addresses found at...";
for(int i=0;i<needleAddresses.size();i++)
{
cout<<needleAddresses[i]<<endl;
}
//How many bytes written
int nwrite = 0;
struct iovec local2[1];
struct iovec remote2[1];
unsigned char data[] = {0xEE,0xEE,0xEE,0xEE};
local2[0].iov_base = data;
local2[0].iov_len = 4;
remote2[0].iov_base = (void*)0x601000;
remote2[0].iov_len = 4;
for(int i=0;i<needleAddresses.size();i++)
{
cout<<"Attempting to write "<<printHex(data,4)<<" to address "<<needleAddresses[i]<<endl;
remote2[0].iov_base = (void*)needleAddresses[i];
nwrite = process_vm_writev(pid,local2,1,remote2,1,0);
if(nwrite == -1)
{
cout<<"Error writing to "<<needleAddresses[i]<<endl;
}
else
{
cout<<"Successfully wrote data";
}
}
//For some reason THIS will work
remote2[0].iov_base = (void*)0x601000;
nwrite = process_vm_writev(pid,local2,1,remote2,1,0);
cout<<"Wrote "<<nwrite<<" Bytes to the address "<<0x601000 <<" "<<errno;
return 0;
}
string printHex(unsigned char* buffer, int length)
{
string retval;
char temp;
for(int i=0;i<length;i++)
{
temp = buffer[i];
temp = temp>>4;
temp = temp & 0x0F;
retval += getHex(temp);
temp = buffer[i];
temp = temp & 0x0F;
retval += getHex(temp);
retval += ' ';
}
return retval;
}
char getHex(char value)
{
if(value < 10)
{
return value+'0';
}
else
{
value = value - 10;
return value+'A';
}
}
int getProcessId()
{
int data = 0;
printf("Please enter the process id...");
scanf("%d",&data);
return data;
}
Bottom line is that I cannot modify the repeating printed integer from another process.
I can see at least these problems.
No one guarantees there's 0xAABBCCDD anywhere in the writable memory of the process. The compiler can optimize it away entirely, or put it in a register. One way to ensure a variable will be placed in main memory is to declare it volatile.
volatile int x = 0xAABBCCDD;
No one guarantees there's no 0xAABBCCDD somewhere in the read-only memory of the process. On the contrary, one could be quite certain there is in fact such a value there. Where else could the program possibly obtain it to initialise the variable? The initialisation probably translates to an assembly instruction similar to this
mov eax, 0xAABBCCDD
which, unsurprisingly, contains a bit pattern that matches 0xAABBCCDD. The address 0x6005DF could well be in the .text section. It is extremely unlikely it is on the stack, because stack addresses are typically close to the top of the address space.
The address space of a 64-bit process is huge. There is no hope to traverse it all in a reasonable amount of time. One needs to limit the range of addresses somehow.
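One way to limit the range, sketched here on the assumption that the target runs on Linux, is to read /proc/<pid>/maps and scan only the regions listed there. Each line starts with the start and end address in hex, followed by a permission string such as r-xp or rw-p. The names Region, parseMapsLine and canWrite are made up for this example, and the sample lines in the comments are illustrative, not taken from the asker's process. A needle that sits in a region without the w flag would be consistent with the bad address error on write.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical helper type: one mapped region of the target process.
struct Region {
    std::uintptr_t start = 0;
    std::uintptr_t end = 0;   // one past the last byte of the region
    bool readable = false;
    bool writable = false;
};

// Parses one line of /proc/<pid>/maps, for example
//   00601000-00602000 rw-p 00001000 08:01 1234 /home/user/a.out
// Returns false if the line does not have the expected shape.
bool parseMapsLine(const std::string& line, Region& out) {
    std::istringstream in(line);
    std::string range, perms;
    if (!(in >> range >> perms)) return false;
    const std::size_t dash = range.find('-');
    if (dash == std::string::npos || perms.size() < 2) return false;
    Region parsed;
    try {
        parsed.start = std::stoull(range.substr(0, dash), nullptr, 16);
        parsed.end = std::stoull(range.substr(dash + 1), nullptr, 16);
    } catch (const std::exception&) {
        return false;   // not a hexadecimal address pair
    }
    parsed.readable = perms[0] == 'r';
    parsed.writable = perms[1] == 'w';
    if (parsed.start >= parsed.end) return false;
    out = parsed;
    return true;
}

// True if addr lies inside the region and the region is mapped writable.
bool canWrite(const Region& r, std::uintptr_t addr) {
    return r.writable && addr >= r.start && addr < r.end;
}
```

With the regions in hand, program (B) could search only the readable ranges and attempt writes only where canWrite returns true, instead of stepping through addresses from 0 upward.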
I have a file.txt with the following values:
1234
567
8910
I want to create a program in c++, that creates x number of child threads with x being the number of lines in "file.txt". The child thread receives the line so it splits the line into digits and stores them in an array.
Then I want to create y number of grandchild threads under each child thread with y being the number of digits in that child thread array or received line value and pass each grandchild thread a single digit.
For example, for the above file.txt, my parent thread in main() will create 3 child threads. The first child thread will receive "1234", second "567", third "8910".
Then the first child will create 4 grandchild threads and pass the first grandchild thread "1", second "2", third "3", fourth "4".
Similarly, the second child will create 3 grandchild threads and pass the "5", "6", "7" digits.
And lastly, the third child will create 4 grandchild threads and pass the "8", "9", "1", "0" digits.
I want to pass all these values using multi-threading in parallel. I am able to get everything working fine till the child thread, and I am able to create the right amount of grandchild threads, but I am unable to pass them the values. Any help and guidance are much appreciated.
Here is my code: (please guide me to fix it)
#include <iostream>
#include <pthread.h>
#include <math.h>
#include <fstream>
#include <sys/wait.h>
#include <cstdlib>
#include <unistd.h>
using namespace std;
struct info {
int* digit = new int;
int* totalDigits = new int;
};
struct grandchildThreadData {
int GCTDdigit;
int GCTDgrandchildIndex;
int GCTDchildIndex;
info* GCTDinfo = new info;
info* GCTDparentInfo = new info;
};
struct childThreadData {
long int CTDlineValue;
int CTDchildIndex;
info* CTDinfo = new info;
};
// thread declaration
void* childThread(void*);
void* grandchildThread(void*);
// function to convert and store line to digits
int* digitSeparator(long int);
// MAIN PROGRAM
int main() {
// FILE OPENING
int totalLines{0};
ifstream file1("file.txt"), file2("file.txt");
string temp;
while (getline(file1, temp)) {
totalLines++;
}
file1.close();
long int* valueOnLine = new long int[totalLines];
int i =0;
while (!file2.eof()) {
file2 >> valueOnLine[i];
i++;
}
file2.close();
// FILE CLOSING
// THREAD START
static struct info* mainInfo = new info[totalLines];
pthread_t* child = new pthread_t[totalLines];
// will be used to pass values to child
static struct childThreadData* Carg = new childThreadData[totalLines];
// Creating Childthreads
for (int i = 0; i < totalLines; i++) {
Carg[i].CTDinfo[i] = mainInfo[i];
Carg[i].CTDchildIndex = i;
Carg[i].CTDlineValue = valueOnLine[i];
if (pthread_create(&child[i], nullptr, childThread, &Carg[i])) {
fprintf(stderr, "Error creating thread\n");
return 1;
}
}
// Joining Childthreads
for (int i = 0; i < totalLines; i++) {
if (pthread_join(child[i], nullptr)) {
fprintf(stderr, "Error joining thread\n");
return 2;
}
}
delete[] valueOnLine;
delete[] child;
delete[] Carg;
return 0;
}
void* childThread(void* i) {
struct childThreadData* CTptr = (struct childThreadData*)i;
int totalDigits = log10((float)CTptr->CTDlineValue) + 1;
int* numberArray = digitSeparator(CTptr->CTDlineValue);
// THIS LINE WILL PUT TOTAL DIGITS INTO mainInfo
CTptr->CTDinfo[CTptr->CTDchildIndex].totalDigits[0] = totalDigits;
static struct info* childInfo = new info[totalDigits]; // This can be used to print modified info in grandchild
pthread_t* grandchild = new pthread_t[totalDigits];
static struct grandchildThreadData* GCarg = new grandchildThreadData[totalDigits];
// THIS LINE WILL PUT EACH DIGIT ON CORRECT LOCATION of mainInfo
for (int i=0; i< totalDigits; i++) {
CTptr->CTDinfo[CTptr->CTDchildIndex].digit[i] = numberArray[i];
}
// GRANDCHILD THREAD
for (int i = 0; i < totalDigits; i++) {
GCarg[i].GCTDinfo[i] = childInfo[i]; // grandchild to child communication but does not work
// or
GCarg[i].GCTDparentInfo[i] = CTptr->CTDinfo[i]; // grandchild to parent communication but does not work
GCarg[i].GCTDgrandchildIndex = CTptr->CTDchildIndex; // Here CTptr->CTDchildIndex should pass 0, 1, 2 to grandchild but I get different values
GCarg[i].GCTDchildIndex = i; // This line works fine for some reason
GCarg[i].GCTDdigit = CTptr->CTDinfo[CTptr->CTDchildIndex].digit[i]; // This line should pass the correct digit, but again, I am getting different results in grandchild
if (pthread_create(&grandchild[i], nullptr, grandchildThread, &GCarg[i])) {
fprintf(stderr, "Error creating thread\n");
}
}
//Joining GrandChildthreads
for (int i = 0; i < totalDigits; i++) {
if (pthread_join(grandchild[i], nullptr)) {
fprintf(stderr, "Error joining thread\n");
}
}
return nullptr;
}
void* grandchildThread(void* i) {
struct grandchildThreadData* GCTptr = (struct grandchildThreadData*)i;
// THIS LINE SHOULD PRINT THE DIGIT
cout << GCTptr->GCTDdigit;
return nullptr;
}
int* digitSeparator(long int number) {
int totalDigits = log10((float)number) + 1;
int* separatorPtr = new int[totalDigits];
int j = 0;
for (int i = totalDigits - 1; i >= 0; i--) {
long int divisor = pow((float)10, i);
long int digit = number / divisor;
number -= digit * divisor;
separatorPtr[j] = digit;
j++;
}
return separatorPtr;
}
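The issue was caused by using static pointers for the argument arrays. A static local variable is initialized only once, so every child thread shares the single GCarg array allocated by the first call to childThread, and the threads overwrite each other's data.
So, if we replace:
static struct childThreadData* Carg = new childThreadData[totalLines];
static struct grandchildThreadData* GCarg = new grandchildThreadData[totalDigits];
with:
childThreadData* Carg = new childThreadData[totalLines];
grandchildThreadData* GCarg = new grandchildThreadData[totalDigits];
It will work fine.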
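To see the effect in isolation, here is a minimal sketch (the function names are made up for the example). The static version hands every caller the same array, sized for whoever called first; the non-static version gives each caller its own.

```cpp
#include <cassert>
#include <cstddef>

// Mimics the original pattern: the static pointer is initialized only on the
// first call, so every later call (from any thread) gets the same array,
// sized for the first caller. The array is intentionally never freed here.
int* sharedBuffer(std::size_t n) {
    static int* buffer = new int[n];
    return buffer;
}

// Without static, each call allocates its own array.
// The caller owns the result and releases it with delete[].
int* privateBuffer(std::size_t n) {
    assert(n > 0);
    return new int[n];
}
```

Calling sharedBuffer(4) and then sharedBuffer(3) returns the same pointer both times, which is what happens to GCarg when several child threads run childThread.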
This is not an exhaustive list of the problems in the program, but it deals with some fundamental ones.
Your program does not check if file.txt is successfully opened. If it's not, the program will continue to run until it gets a signed integer overflow - which means that the program has undefined behavior.
If the file is successfully opened, you'll run into heap-buffer-overflow
// Creating Childthreads
for(int i = 0; i < totalLines; i++) {
Carg[i].CTDinfo[i] = mainInfo[i]; // <- here
That's because CTDinfo[i] is out of bounds when i > 0. CTDinfo is a pointer to one single info - not totalLines of infos.
Same thing below. CTDinfo can't be treated as an array since it's a pointer to a single info - and digit is a pointer to a single int - not an array of int.
// THIS LINE WILL PUT EACH DIGIT ON CORRECT LOCATION of mainInfo
for(int i = 0; i < totalDigits; i++) {
CTptr->CTDinfo[CTptr->CTDchildIndex].digit[i] = numberArray[i];
}
Suggestions:
Write a much smaller program to get the hang of threading.
Write a much smaller program to learn how to build up very complicated structures (if you are really required to do so). Use C++ standard library classes, like std::vector. I doubt this program would need a single new/new[] if you replaced all the manual memory management with standard C++ containers. Things like info look like the result of a misunderstanding. There is no reason why pointers should be used here:
struct info {
int* digit = new int;
int* totalDigits = new int;
};
Use std::thread and its support functions and classes instead of the platform specific C API in pthread.
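As a rough sketch of these suggestions, assuming each line contains only decimal digits, the parent/child/grandchild structure could look like this with std::vector and std::thread. The function names are made up for the example, and the lines are passed in as strings (they could be read from file.txt with std::getline), which also avoids the log10/pow digit arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Grandchild: receives exactly one digit and stores it in its own slot.
void grandchild(char digit, int& slot) {
    assert(digit >= '0' && digit <= '9');
    slot = digit - '0';
}

// Child: receives one line and starts one grandchild per character.
void child(const std::string& line, std::vector<int>& digits) {
    digits.assign(line.size(), 0);   // one slot per grandchild, sized up front
    std::vector<std::thread> grandchildren;
    for (std::size_t i = 0; i < line.size(); ++i) {
        grandchildren.emplace_back(grandchild, line[i], std::ref(digits[i]));
    }
    for (std::thread& t : grandchildren) t.join();
}

// Parent: starts one child per line and returns the digits each child collected.
std::vector<std::vector<int>> processLines(const std::vector<std::string>& lines) {
    std::vector<std::vector<int>> result(lines.size());
    std::vector<std::thread> children;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        children.emplace_back(child, std::cref(lines[i]), std::ref(result[i]));
    }
    for (std::thread& t : children) t.join();
    return result;
}
```

Every thread writes only to its own pre-allocated slot, so no locking is needed, and nothing is allocated with new. On older toolchains this may need to be built with -pthread.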
I have a small example program here for the particle photon that has a memory bug that I cannot figure out.
What it does: loads up a buffer with small string chunks, converts that large buffer back into a string. Then it creates a bunch of objects that are only wrappers for small chunks of buffer. It does this repetitively, and I don't allocate any new memory after the setup(), yet the memory goes down slowly until it crashes.
main.cpp
includes, variable declarations
#include "application.h" //needed when compiling spark locally
#include <string>
#include <unordered_map>
#include "dummyclass.h"
using namespace std;
SYSTEM_MODE(MANUAL);
char* buffer;
unordered_map<int, DummyClass*> store;
string alphabet;
unsigned char alphabet_range;
unsigned char state;
int num_chars;
static const unsigned char STATE_INIT = 0;
static const unsigned char STATE_LOAD_BUFFER = 1;
static const unsigned char STATE_PREP_FOR_DESERIALIZE = 2;
static const unsigned char STATE_FAKE_DESERIALIZE = 3;
static const unsigned char STATE_FINISH_RESTART = 4;
delete objects helper function
bool delete_objects()
{
Serial.println("deleting objects in 'store'");
for(auto iter = store.begin(); iter != store.end(); iter++)
{
delete iter->second;
iter->second = nullptr;
}
store.clear();
if(store.empty())
return true;
else
return false;
}
set up function, allocates memory, initial assignments
void setup()
{
Serial.begin(9600);
Serial1.begin(38400);
delay(2000);
buffer = new char[9000];
alphabet = string("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!##$^&*()_-?/><[]{}|");
alphabet_range = alphabet.length() - 1;
state = STATE_INIT;
num_chars = 0;
}
loop function, gets run over and over
void loop()
{
switch(state){
case STATE_INIT: {
strcpy(buffer, "");
state = STATE_LOAD_BUFFER;
delay(1000);
break;
}
case STATE_LOAD_BUFFER: {
if(num_chars < 6000){
string chunk;
for(char i = 0; i < 200; i++){
int index = rand() % alphabet_range;
chunk.append(alphabet.substr(index, 1));
num_chars++;
}
strcat(buffer, chunk.c_str());
}
else{
num_chars = 0;
state = STATE_PREP_FOR_DESERIALIZE;
}
delay(500);
break;
}
case STATE_PREP_FOR_DESERIALIZE: {
Serial.println("\nAttempting to delete current object set...");
delay(500);
if(delete_objects())
Serial.println("_delete_objects succeeded");
else {
Serial.println("_delete_objects failed");
break;
}
state = STATE_FAKE_DESERIALIZE;
delay(1000);
break;
}
case STATE_FAKE_DESERIALIZE: {
string buff_string(buffer);
if(buff_string.length() == 0){
Serial.println("Main:: EMPTY STRING CONVERTED FROM BUFFER");
}
int index = 0;
int key = 1;
while(index < buff_string.length())
{
int amount = (rand() % 50) + 5;
DummyClass* dcp = new DummyClass(buff_string.substr(index, amount));
store[key] = dcp;
index += amount;
key++;
}
state = STATE_FINISH_RESTART;
delay(1000);
break;
}
case STATE_FINISH_RESTART: {
state = STATE_INIT;
break;
}
}
}
dummyclass.h
very minimal, constructor just stores a string in a character buffer. this object is just a wrapper.
using namespace std;
class DummyClass {
private:
char* _container;
public:
DummyClass(){
}
DummyClass(string input){
_container = new char[input.length()];
strcpy(_container, input.c_str());
}
~DummyClass(){
delete _container;
_container = nullptr;
}
char* ShowMeWhatYouGot(){
return _container;
}
};
EDIT:
This is a real problem that I am having, I'm not sure why it is getting downvoted. Help me out here, how can I be more clear? I'm reluctant to shrink the code since it imitates many aspects of a much bigger program that it is modeling simply. I want to keep the structure of the code in place in case this bug is an emergent property.
Always account for the string terminator:
DummyClass(string input){
_container = new char[input.length()];
strcpy(_container, input.c_str());
}
This allocates one byte too few to hold the input string plus the terminator that is then copied into it. The \0 that's appended at the end overwrites something, most likely metadata required to reintegrate the allocated memory fragment back into the heap successfully. I'm actually surprised it didn't crash...
It probably doesn't happen on every allocation (only when you overflow into a new 8 byte aligned chunk), but once is enough :)
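A corrected version of the class might look like the sketch below. Beyond the +1 for the terminator, it also uses delete[] to match new[] and disables copying so two objects can never free the same buffer; those two changes are additions of this sketch, not something the question asked about.

```cpp
#include <cstring>
#include <string>

class DummyClass {
private:
    char* _container = nullptr;

public:
    DummyClass() = default;

    explicit DummyClass(const std::string& input)
        : _container(new char[input.length() + 1]) {   // +1 for the '\0'
        std::strcpy(_container, input.c_str());
    }

    // The object owns a raw buffer, so copying is disabled to avoid a double delete.
    DummyClass(const DummyClass&) = delete;
    DummyClass& operator=(const DummyClass&) = delete;

    ~DummyClass() {
        delete[] _container;   // memory from new[] must be released with delete[]
    }

    const char* ShowMeWhatYouGot() const {
        return _container;
    }
};
```

Storing a std::string member instead of a char* would remove the manual allocation entirely.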
So, after some testing, I'd like to give a shout out to Russ Schultz who commented the right answer. If you want to post a solution formally, I would be happy to mark it as correct.
The memory bug is caused by allocating the char buffer _container without considering the null terminating character, meaning I am loading in a string that is too big. (not entirely sure why this causes a bug and doesn't throw an error?)
On a different site however, I also received this piece of advice:
string chunk;
for(char i = 0; i < 200; i++){
int index = rand() % alphabet_range;
chunk.append(alphabet.substr(index, 1));
// strcat(buffer, alphabet.substring(index, index + 1));
num_chars++;
}
This loop looks suspect to me. You are depending on the string append method to grow chunk as needed, but you know you are going to run that loop 200 times. Why not use the string reserve method to just allocate that much space? I bet that this chews up a lot of memory with each new char you append calling realloc, potentially fragmenting memory.
This ended up not being the solution, but it might be good to know.
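For reference, the reserve suggestion could look roughly like this. The helper name makeChunk is made up, and unlike the original loop it indexes over the full alphabet rather than alphabet.length() - 1.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Builds a chunk of `count` random characters drawn from `alphabet`.
// reserve() allocates once up front, and push_back() avoids the temporary
// string that substr(index, 1) creates on every iteration.
std::string makeChunk(const std::string& alphabet, std::size_t count) {
    std::string chunk;
    if (alphabet.empty()) return chunk;
    chunk.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        chunk.push_back(alphabet[std::rand() % alphabet.size()]);
    }
    return chunk;
}
```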
I'm a Java user coming over to C++, and I am having a hard time understanding what is going wrong with this statement. My program has been segfaulting anywhere I put the push_back command. So I'm wondering what exactly is going on.
class Process {
public:
int nice;
int arrivalTime;
int cpuBursts;
list<int> burstList;
Process() {
burstList.push_back(10); // Segfaults here...
}
};
Here is the full code:
#include<iostream>
#include<stdlib.h>
#include<fstream>
#include<list>
#include<string.h>
using namespace std;
int calcTimeslice(int priority);
int calcOriginalPrio(int nice);
int readFile(int ,char **);
int calcPrioBonus(int,int);
void tokenizeAndAdd(char *);
class Bursts {
public:
int isCPUBurst;
int time;
Bursts() {}
// Constructor to make it easier to add to list
Bursts(int tempIsCPU, int tempTime) {
isCPUBurst = tempIsCPU;
time = tempTime;
}
};
class Process {
public:
int nice;
int arrivalTime;
int cpuBursts;
list<int> burstList;
Process() {
burstList.push_back(10);
}
};
int main(int arg, char **argv) {
// This is if the file was not correctly read into the program
// or it doesnt exist ...
if(readFile(arg,argv)==-1) {
cout << "File could not be read. \n";
return -1;
}
//cout << "Original Calc Whatever: " << calcOriginal(19) << '\n';
return 0;
}
/*
* Calculates the timeslice based on the priority
*/
int calcTimeslice(int priority) {
double finalCalc;
// This is the given function in the prompt
finalCalc = ( (1 - (priority / 140)) * 290 + (.5) ) + 10;
// Cast to int, this will be a truncate
return ((int)finalCalc);
}
int readFile(int arg, char **argv) {
char *temp,*pointer;
int endOfFile = 1;
// While its not the end of the file
while(endOfFile) {
// Read in the input from stdin
fgets(temp,256,stdin);
// Check to see if this line had a * in it
if(*temp =='*')
endOfFile = 0;
else
tokenizeAndAdd(temp);
}
return 0;
}
void tokenizeAndAdd(char *string) {
char *token = strtok(string," \n");
int i = 0;
Process p;
while(token != NULL) {
cout << token << endl;
if(i>2) { // If it is odd (CPU burst)
if(i%2 == 1) {
int tempInt = atoi(token);
//p.burstList.push_back(tempInt);
}
else { // If it is even (IO burst)
int tempInt = atoi(token);
//p.burstLis.push_back(tempInt);
}
}
else if(i==0)
p.nice = atoi(token);
else if(i==1)
p.arrivalTime = atoi(token);
else if(i==2)
p.cpuBursts = atoi(token);
token = strtok(NULL," \n");
i++;
}
//cout << p.nice << " " << p.arrivalTime << " " << p.cpuBursts << "\n";
//i = 0;
//cout << p.burstList.size() << "\n";
// cout <<
//}
return;
}
/*
* Calculates and returns the original priority based on the nice number
* provided in the file.
*/
int calcOriginalPrio(int nice) {
double finalCalc;
// This is the given function from the prompt
finalCalc = (( nice + 20 ) / 39 ) * 30 + 105.5;
// Cast to int, this is a truncate in C++
return ((int)finalCalc);
}
/*
* Calculates the bonus time given to a process
*/
int calcPrioBonus(int totalCPU, int totalIO) {
double finalCalc;
// How to calculate bonus off of the prompt
if(totalCPU < totalIO)
finalCalc = ( (1 - (totalCPU / (double)totalIO)) * (-5)) - .5;
else
finalCalc = ( (1 - (totalIO / (double)totalCPU)) * 5) + .5;
// Cast to int
return ((int)finalCalc);
}
You are using temp uninitialized in the following code:
char *temp;
...
while(endOfFile) {
fgets(temp,256,stdin);
...
This can have any side effect, since it most likely destroys your stack or parts of the heap memory. It could fail immediately (when calling the fgets() function), it could fail later (as in your sample) or it could even run fine - maybe until you upgrade your OS, your compiler or anything else, or until you want to run the same executable on another machine. This is called undefined behaviour.
You need to allocate space for the temp variable, not a pointer only. Use something like
char temp[256];
...
while(endOfFile) {
fgets(temp,256,stdin);
...
For more information, see the fgets() documentation. The first parameter is a pointer to a char array - that is where fgets() will store the bytes which have been read. In your code, you pass an uninitialized pointer, which means that fgets() will store the bytes to an undefined memory location - here this is caught by the OS, which terminates your application with a segmentation fault.
BTW: You should consider enabling pedantic warnings when compiling - I compiled with
g++ -Wall -pedantic -o list list.cpp
which gave me the following warning:
list.cpp: In function 'int readFile(int, char**)':
list.cpp:76:26: warning: 'temp' may be used uninitialized in this function [-Wuninitialized]
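Put together, the read loop could be sketched like this. The function name and the FILE* parameter are choices made for the example (the original reads from stdin directly), and checking the return value of fgets() is an addition so the loop also stops at end of input.

```cpp
#include <cstdio>

// Reads lines from `in` until a line starting with '*' or end of input.
// Returns the number of lines seen before the terminator.
int countLinesUntilStar(std::FILE* in) {
    char temp[256];   // real storage for the line, not just a pointer
    int lines = 0;
    while (std::fgets(temp, sizeof temp, in) != nullptr) {
        if (temp[0] == '*') break;
        ++lines;      // the original would call tokenizeAndAdd(temp) here
    }
    return lines;
}
```

Calling countLinesUntilStar(stdin) gives the same behaviour as the loop in readFile, minus the undefined behaviour.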
This is probably not the actual code with the error you report. But here is one of the problems which gives you UB.
char *temp,*pointer; // uninitialized pointer; char temp[1000]; could work
int endOfFile = 1;
// While its not the end of the file
while(endOfFile) {
// Read in the input from stdin
fgets(temp,256,stdin);
The last function call reads at most 255 characters from stdin and writes them, followed by a null terminator, to the memory pointed to by temp. So you first need to "prepare" that memory. But char *temp; only defines a pointer with no defined value; that is, it points to some memory that may not exist or may be illegal or inaccessible for you. In contrast, char temp[1000]; defines a block of 1000 bytes in "stack memory", which you can refer to simply through the variable temp. Hope this is clear for you.
EDIT:
I don't know why that would change the behavior of the list,
You are right. That is Undefined Behavior (UB). When you write to some unknown memory (pointed to by an uninitialized pointer) you may overwrite data or even code, which can break the correct functioning of your program somewhere else in an unpredictable way.
You will need to learn more about pointers, but it is better to use std::string and look at how to parse your file using string and stringstream. They will manage the memory for you.
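A minimal sketch of that approach, assuming each line has the layout the question's tokenizeAndAdd implies (nice, arrival time, number of CPU bursts, then the burst values). The parseProcess helper is made up for the example, and Process is reduced to a plain struct.

```cpp
#include <list>
#include <sstream>
#include <string>

struct Process {
    int nice = 0;
    int arrivalTime = 0;
    int cpuBursts = 0;
    std::list<int> burstList;
};

// Parses "nice arrivalTime cpuBursts burst1 burst2 ..." from one line.
// Returns false if the three leading fields are missing or not numeric.
bool parseProcess(const std::string& line, Process& p) {
    std::istringstream in(line);
    if (!(in >> p.nice >> p.arrivalTime >> p.cpuBursts)) return false;
    int burst = 0;
    while (in >> burst) {
        p.burstList.push_back(burst);
    }
    return true;
}
```

The lines themselves can be read with std::getline(std::cin, line), so no fixed-size buffer or strtok is needed.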
I am building an FTP client in C++ for personal use and for the learning experience, but I have run into a problem when allocating memory for storing LIST responses. The library I am using for FTP requests is libcurl which will call the following function when it receives a response from the server:
size_t FTP_getList( char *ptr, size_t size, size_t nmemb, void *userdata) {
//GLOBAL_FRAGMENT is global
//libcurl will split the resulting list into smaller approx 2000 character
//strings to pass into this function so I compensate by storing the leftover
//fragment in a global variable.
size_t fraglen = 0;
if(GLOBAL_FRAGMENT!=NULL) {
fraglen = strlen(GLOBAL_FRAGMENT);
}
size_t listlen = size*nmemb+fraglen+1;
std::cout<<"Size="<<size<<" nmemb="<<nmemb;
char *list = new char[listlen];
if(GLOBAL_FRAGMENT!=NULL) {
snprintf(list,listlen,"%s%s",GLOBAL_FRAGMENT,ptr);
} else {
strncpy(list,ptr,listlen);
}
list[listlen]=0;
size_t packetSize = strlen(list);
std::cout<<list;
bool isComplete = false;
//Check to see if the last line is complete (i.e. newline terminated)
if(list[size]=='\n') {
isComplete = true;
}
if(GLOBAL_FRAGMENT!=NULL) {
delete[] GLOBAL_FRAGMENT;
}
GLOBAL_FRAGMENT = GLOBAL_FTP->listParse(list,isComplete);
delete[] list;
//We return the length of the new string to prove to libcurl we
//our function properly executed
return size*nmemb;
}
The function above calls the next function to split each line returned into individual
strings to be further processed:
char* FTP::listParse(char* list, bool isComplete) {
//std::cout << list;
//We split the list into seperate lines to deal with independently
char* line = strtok(list,"\n");
int count = 0;
while(line!=NULL) {
count++;
line = strtok(NULL,"\n");
}
//std::cout << "List Count: " << count << "\n";
int curPosition = 0;
for(int i = 0; i < count-1 ; i++) {
//std::cout << "Iteration: " << i << "\n";
curPosition = curPosition + lineParse((char*)&(list[curPosition])) + 1;
}
if(isComplete) {
lineParse((char*)&(list[curPosition]));
return NULL;
} else {
int fraglen = strlen((char*)&(list[curPosition]));
char* frag = new char[fraglen+1];
strcpy(frag,(char*)&(list[curPosition]));
frag[fraglen] = 0;
return frag;
}
}
The function above then calls the function below to split the individual entries in a line into separate tokens:
int FTP::lineParse(char *line) {
int result = strlen(line);
char* value = strtok(line, " ");
while(value!=NULL) {
//std::cout << value << "\n";
value = strtok(NULL, " ");
}
return result;
}
This program works for relatively small list responses but when I tried stress testing it by getting a listing for a remote directory with ~10,000 files in it, my program threw a SIGSEGV... I used backtrace in gdb and found that the segfault happens on the lines delete[] GLOBAL_FRAGMENT; and delete[] list; in FTP_getList. Am I not properly deleting these arrays? I am calling delete[] exactly once for each time I allocate them so I don't see why it wouldn't be allocating memory correctly...
On a side note: Is it necessary to check to see if an array is NULL before you try to delete it?
Also, I know this would be easier to do with std::string, but I am trying to learn C-style strings as practice, and the fact that it is crashing is a perfect example of why I need the practice. I will also be changing the code to store these in a dynamically allocated buffer that is only reallocated when the new ptr size is larger than the previous length, but I want to figure out why the current code isn't working first. :-) Any help would be appreciated.
In this code
size_t listlen = size*nmemb+fraglen+1;
std::cout<<"Size="<<size<<" nmemb="<<nmemb;
char *list = new char[listlen];
if(GLOBAL_FRAGMENT!=NULL) {
snprintf(list,listlen,"%s%s",GLOBAL_FRAGMENT,ptr);
} else {
strncpy(list,ptr,listlen);
}
list[listlen]=0;
You are overrunning your list buffer. You have allocated listlen bytes, but you write a 0 value one past the last allocated byte. This invokes undefined behavior. More practically speaking, it can cause heap corruption, which can cause the kind of errors you observed.
I didn't see any issues with the way you are calling delete[].
It is perfectly safe to delete a NULL pointer.
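A sketch of building the buffer without the overrun follows; the helper name joinFragment is made up for the example. It copies by explicit length instead of using %s or strncpy, because the data libcurl passes to a write callback is not null-terminated, and it writes the terminator at the last valid index.

```cpp
#include <cstddef>
#include <cstring>

// Returns a new[]-allocated, null-terminated string holding `fragment`
// (which may be nullptr) followed by the first `n` bytes of `data`.
// The caller releases the result with delete[].
char* joinFragment(const char* fragment, const char* data, std::size_t n) {
    const std::size_t fraglen = fragment ? std::strlen(fragment) : 0;
    const std::size_t total = fraglen + n;
    char* list = new char[total + 1];   // +1 for the terminator
    if (fraglen > 0) std::memcpy(list, fragment, fraglen);
    if (n > 0) std::memcpy(list + fraglen, data, n);
    list[total] = '\0';                 // last valid index is total, not total + 1
    return list;
}
```

In FTP_getList this would be called as joinFragment(GLOBAL_FRAGMENT, ptr, size*nmemb), and the NULL checks before delete[] can simply be dropped.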
I am working on a C++ program that uses some external C libraries. As far as I can tell though that is not the cause of the problem, and the issue is with my C++ code. The program runs fine with no errors or anything on my test datasets, but after going through nearly the entire full dataset, I get a segfault. Running GDB gives me this segfault:
(gdb) run -speciesMain=allMis1 -speciesOther=anoCar2 -speciesMain=allMis1 -speciesOther=anoCar2 /hive/data/genomes/allMis1/bed/lastz.anoCar2/mafRBestNet/*.maf.gz
Starting program: /cluster/home/jstjohn/bin/mafPairwiseSyntenyDecay -speciesMain=allMis1 -speciesOther=anoCar2 -speciesMain=allMis1 -speciesOther=anoCar2 /hive/data/genome
s/allMis1/bed/lastz.anoCar2/mafRBestNet/*.maf.gz
Detaching after fork from child process 3718.
Program received signal SIGSEGV, Segmentation fault.
0x0000003009cb7672 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib64/libstdc++.so.6
(gdb) up
#1 0x0000003009c9db59 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /usr/lib64/libstdc++.so.6
(gdb) up
#2 0x00000000004051e7 in PairAlnInfo::~PairAlnInfo (this=0x7fffffffcd70, __in_chrg=) at mafPairwiseSyntenyDecay.cpp:37
(gdb) up
#3 0x0000000000404eb0 in main (argc=2, argv=0x7fffffffcf78) at mafPairwiseSyntenyDecay.cpp:260
It looks like something is going on with a double free of my PairAlnInfo class. The weird thing is that I don't define a destructor, and I am not allocating anything with new. I have tried this both with g++44 and g++4.1.2 on the linux machine and have had the same results.
To make things even weirder, on my linux box (with more available RAM and everything, not that RAM is an issue with this program, but it is a beefy system) the seg fault happens as described above before the program reaches the loop to print output. On my much smaller macbook air using either g++ or clang++, the program still segfaults, but it doesn't do that until after the results are printed, right before the final return(0) out of the main function. Here is what the GDB trace looks like on my mac running on the same file after compiling with Mac's default g++4.2:
(more results)...
98000 27527 162181 0.83027
99000 27457 161467 0.829953
100000 27411 160794 0.829527
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00004a2c00106077
0x00007fff9365a6e5 in std::string::_Rep::_M_dispose ()
(gdb) up
#1 0x00007fff9365a740 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string ()
(gdb) up
#2 0x0000000100003938 in main (argc=1261, argv=0x851d5fbff533) at mafPairwiseSyntenyDecay.cpp:301
(gdb)
Just in case you didn't notice the time of my posting, it's about 2:30AM now... I have been hacking away at this problem for about 10 hours now. Thanks so much for taking the time to look at this and help me out! The code and some instructions for replicating my situation follow.
If you are interested in downloading and installing the whole thing with dependencies then download my KentLib repository, make in the base directory, and then go to examples/mafPairwiseSyntenyDecay and run make there. An example (rather large) that causes the bug I am discussing is the gzipped file available here: 100Mb file that the program crashes on. Then execute the program with these arguments -speciesMain=allMis1 -speciesOther=anoCar2 anoCar2.allMis1.rbest.maf.gz.
/**
* mafPairwiseSyntenyDecay
* Author: John St. John
* Date: 4/26/2012
*
* calculates the mean synteny decay in different range bins
*
*
*/
//Kent source C imports
extern "C" {
#include "common.h"
#include "options.h"
#include "maf.h"
}
#include <map>
#include <string>
#include <set>
#include <vector>
#include <sstream>
#include <iostream>
//#define NDEBUG
#include <assert.h>
using namespace std;
/*
Global variables
*/
class PairAlnInfo {
public:
string oname;
int sstart;
int send;
int ostart;
int oend;
char strand;
PairAlnInfo(string _oname,
int _sstart, int _send,
int _ostart, int _oend,
char _strand):
oname(_oname),
sstart(_sstart),
send(_send),
ostart(_ostart),
oend(_oend),
strand(_strand){}
PairAlnInfo():
oname("DUMMY"),
sstart(-1),
send(-1),
ostart(-1),
oend(-1),
strand(-1){}
};
vector<string> &split(const string &s, char delim, vector<string> &elems) {
stringstream ss(s);
string item;
while(getline(ss, item, delim)) {
elems.push_back(item);
}
return(elems);
}
vector<string> split(const string &s, char delim) {
vector<string> elems;
return(split(s, delim, elems));
}
#define DEF_MIN_LEN (200)
#define DEF_MIN_SCORE (200)
typedef map<int,PairAlnInfo> PairAlnInfoByPos;
typedef map<string, PairAlnInfoByPos > ChromToPairAlnInfoByPos;
ChromToPairAlnInfoByPos pairAlnInfoByPosByChrom;
void usage()
/* Explain usage and exit. */
{
errAbort(
(char*)"mafPairwiseSyntenyDecay -- Calculates pairwise syntenic decay from maf alignment containing at least the two specified species.\n"
"usage:\n"
"\tmafPairwiseSyntenyDecay [options] [*required options] file1.maf[.gz] ... \n"
"Options:\n"
"\t-help\tPrints this message.\n"
"\t-minScore=NUM\tMinimum MAF alignment score to consider (default 200)\n"
"\t-minAlnLen=NUM\tMinimum MAF alignment block length to consider (default 200)\n"
"\t-speciesMain=NAME\t*Name of the main species (exactly as it appears before the '.') in the maf file (REQUIRED)\n"
"\t-speciesOther=NAME\t*Name of the other species (exactly as it appears before the '.') in the maf file (REQUIRED)\n"
);
}//end usage()
static struct optionSpec options[] = {
/* Structure holding command line options */
{(char*)"help",OPTION_STRING},
{(char*)"minScore",OPTION_INT},
{(char*)"minAlnLen",OPTION_INT},
{(char*)"speciesMain",OPTION_STRING},
{(char*)"speciesOther",OPTION_STRING},
{NULL, 0}
}; //end options()
/**
* Reads every alignment block from one MAF file and stores the pairwise
* alignment info for speciesMain/speciesOther in pairAlnInfoByPosByChrom.
*/
int iterateOverAlignmentBlocksAndStorePairInfo(char *fileName, const int minScore, const int minAlnLen, const string speciesMain, const string speciesOther){
struct mafFile * mFile = mafOpen(fileName);
struct mafAli * mAli;
//loop over alignment blocks
while((mAli = mafNext(mFile)) != NULL){
struct mafComp *first = mAli->components;
int seqlen = mAli->textSize;
//First find and store set of duplicates in this block
set<string> seen;
set<string> dups;
if(mAli->score < minScore || seqlen < minAlnLen){
//free here and pre-maturely end
mafAliFree(&mAli);
continue;
}
for(struct mafComp *item = first; item != NULL; item = item->next){
string tmp(item->src);
string tname = split(tmp,'.')[0];
if(seen.count(tname)){
//seen this item
dups.insert(tname);
}else{
seen.insert(tname);
}
}
for(struct mafComp *item1 = first; item1->next != NULL; item1 = item1->next){
//stop one before the end
string tmp1(item1->src);
vector<string> nameSplit1(split(tmp1,'.'));
string name1(nameSplit1[0]);
if(dups.count(name1) || (name1 != speciesMain && name1 != speciesOther)){
continue;
}
for(struct mafComp *item2 = item1->next; item2 != NULL; item2 = item2->next){
string tmp2(item2->src);
vector<string> nameSplit2(split(tmp2,'.'));
string name2 = nameSplit2[0];
if(dups.count(name2) || (name2 != speciesMain && name2 != speciesOther)){
continue;
}
string chr1(nameSplit1[1]);
string chr2(nameSplit2[1]);
char strand;
if(item1->strand == item2->strand)
strand = '+';
else
strand = '-';
int start1,end1,start2,end2;
if(item1->strand == '+'){
start1 = item1->start;
end1 = start1 + item1->size;
}else{
end1 = item1->start;
start1 = end1 - item1->size;
}
if(item2->strand == '+'){
start2 = item2->start;
end2 = start2+ item2->size;
}else{
end2 = item2->start;
start2 = end2 - item2->size;
}
if(name1 == speciesMain){
PairAlnInfo aln(chr2,start1,end1,start2,end2,strand);
pairAlnInfoByPosByChrom[chr1][start1] = aln;
}else{
PairAlnInfo aln(chr1,start2,end2,start1,end1,strand);
pairAlnInfoByPosByChrom[chr2][start2] = aln;
}
} //end loop over item2
} //end loop over item1
mafAliFree(&mAli);
}//end loop over alignment blocks
mafFileFree(&mFile);
return(0);
}
int main(int argc, char *argv[])
/* Process command line. */
{
optionInit(&argc, argv, options);
if(optionExists((char*)"help") || argc <= 1){
usage();
}
int minAlnScore = optionInt((char*)"minScore",DEF_MIN_SCORE);
int minAlnLen = optionInt((char*)"minAlnLen",DEF_MIN_LEN);
string speciesMain(optionVal((char*)"speciesMain",NULL));
string speciesOther(optionVal((char*)"speciesOther",NULL));
if(speciesMain.empty() || speciesOther.empty())
usage();
//load the relevant alignment info from the maf(s)
for(int i = 1; i<argc; i++){
iterateOverAlignmentBlocksAndStorePairInfo(argv[i], minAlnScore, minAlnLen, speciesMain, speciesOther);
}
const int blockSize = 1000;
const int blockCount = 100;
int totalWindows[blockCount] = {0};
int containBreak[blockCount] = {0};
//we want the fraction of windows of each size that contain a break
//
for(ChromToPairAlnInfoByPos::iterator mainChromItter = pairAlnInfoByPosByChrom.begin();
mainChromItter != pairAlnInfoByPosByChrom.end();
mainChromItter++){
//process the alignments shared by this chromosome
//note that map stores them sorted by begin position
vector<int> keys;
for(PairAlnInfoByPos::iterator posIter = mainChromItter->second.begin();
posIter != mainChromItter->second.end();
posIter++){
keys.push_back(posIter->first);
}
for(int i = 0; i < keys.size(); i++){
//first check for trivial window (ie our block)
PairAlnInfo pi1 = mainChromItter->second[keys[i]];
assert(pi1.send > pi1.sstart);
assert(pi1.sstart == keys[i]);
int numBucketsThisWindow = (pi1.send - pi1.sstart) / blockSize;
for(int k = 0; k < numBucketsThisWindow && k < blockCount; k++)
totalWindows[k]++;
for(int j = i+1; j < keys.size(); j++){
PairAlnInfo pi2 = mainChromItter->second[keys[j]];
assert(pi2.sstart == keys[j]);
assert(pi2.send > pi2.sstart);
assert(pi2.sstart > pi1.sstart);
if(pi2.oname == pi1.oname){
int moreToInc = (pi2.send - pi1.sstart) / blockSize;
for(int k = numBucketsThisWindow; k < moreToInc && k < blockCount; k++)
totalWindows[k]++;
numBucketsThisWindow = moreToInc; //so we don't double count
}else{
int numDiscontigBuckets = (pi2.send - pi1.sstart) / blockSize;
for(int k = numBucketsThisWindow; k < numDiscontigBuckets && k < blockSize; k++){
containBreak[k]++;
totalWindows[k]++;
}
numBucketsThisWindow = numDiscontigBuckets;
}
if((keys[j] - keys[i]) >= (blockSize * blockCount)){
//i = j;
break;
}
}
}
}
cout << "#WindowSize\tNumContainBreak\tNumTotal\t1-(NumContainBreak/NumTotal)" << endl;
for(int i = 0; i < blockCount; i++){
cout << (i+1)*blockSize << '\t';
cout << containBreak[i] << '\t';
cout << totalWindows[i] << '\t';
cout << (totalWindows[i] > 0? 1.0 - (double(containBreak[i])/double(totalWindows[i])): 0) << endl;
}
return(0);
} //end main()
Try running your program under valgrind. It will give you a report of possibly or definitely lost memory, uses of uninitialised values, etc.
Your issues are probably due to memory corruption occurring at some point in the program, some time before the actual errors you are seeing.
One potential issue in the code you posted is the loop:
for(int k = numBucketsThisWindow; k<numDiscontigBuckets && k < blockSize; k++){
which uses blockSize instead of the correct blockCount as its upper bound. Since blockSize is 1000 and blockCount is 100, k can run far past the end of both the totalWindows[] and containBreak[] arrays, which have only 100 elements each. Those out-of-bounds writes could overwrite the speciesMain and speciesOther strings, along with anything else on the stack, which might very well result in the errors you are seeing.
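As a rough sketch of one way to guard against this kind of overrun, the bucket increments could go through a small helper that clamps the range to the real array length. The helper name bumpBuckets is hypothetical and not part of the original program; it only illustrates the bounds check, assuming the counters stay plain int arrays as in the posted code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not in the original program): increments counts[k]
// for every k in [from, to), clamped to the array's actual length N so an
// oversized 'to' (or a negative 'from') can never write out of bounds.
// Returns the number of buckets that were actually incremented.
template <std::size_t N>
int bumpBuckets(int (&counts)[N], int from, int to) {
    const int last = std::min(to, static_cast<int>(N));
    int touched = 0;
    for (int k = std::max(from, 0); k < last; ++k) {
        ++counts[k];
        ++touched;
    }
    return touched;
}
```

With something like this, the two increments in the else branch would become bumpBuckets(containBreak, numBucketsThisWindow, numDiscontigBuckets) and bumpBuckets(totalWindows, numBucketsThisWindow, numDiscontigBuckets), and the array length is deduced by the compiler rather than typed by hand. Simply changing blockSize to blockCount in the loop condition is the smaller fix; the helper just makes the same mistake harder to repeat.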