Segmentation fault when trying to create a buffer of 100MB - c++

I am trying to read a large binary file into a buffer in a C++ program. Under GDB, the program always segfaults after trying to create a buffer the same size as the file. It fails on fclose(fp), rewind, or fopen, which leads me to believe that something goes wrong when I create the buffer. My code segment is as follows.
static int fileTransfer(struct mg_connection *conn, char * filename){
FILE *fp = fopen(filename, "r");
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
char buf[size];
fclose(fp);
// This is an attempt to stop a segment fault from rewind.
fp = fopen(filename, "r");
conn->connection_param = (void *) fp;
size_t n = 0;
if(fp != NULL)
{
n = fread(buf, 1, sizeof(buf), fp);
mg_send_data(conn, buf, n);
if(n < sizeof(buf) || conn->wsbits != 0)
{
fclose(fp);
conn->connection_param = NULL;
}
}
return 1;
}
I have tried putting print statements in this code, but they don't print to the console because the code runs in a separate thread. Can someone give me some insight into why this segfault is happening, or some suggestions on how to make this code more efficient?
I should note that this code works properly on 1 and 10 MB files but not on anything larger.

Never do this:
int size = ftell(fp);
char buf[size];
You are creating the buffer on the stack, not on the heap, and 100 MB on the stack will not work.
Also, in standard C++ an array size must be a compile-time constant, not a value coming from ftell(). This compiles only because some compilers (g++, for example) accept variable-length arrays as an extension.
What you have to do is allocate the memory with malloc() or the new operator.
static int fileTransfer(struct mg_connection *conn, char * filename){
FILE *fp = fopen(filename, "r");
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
char * buf = new char[size]; // fix also here!
fclose(fp);
// This is an attempt to stop a segment fault from rewind.
fp = fopen(filename, "r");
conn->connection_param = (void *) fp;
size_t n = 0;
if(fp != NULL)
{
n = fread(buf, 1, size, fp); // fix also here!
mg_send_data(conn, buf, n);
if(n < size || conn->wsbits != 0)
{
fclose(fp);
conn->connection_param = NULL;
}
}
delete [] buf; // and you have to deallocate your buffer
return 1;
}

You are creating a buffer with automatic storage duration, which means g++ will put it on the stack. The default stack size on every OS known to me is below 100 MB, so a buffer that large overflows the stack and causes a segfault.
Try allocating your buffer with dynamic storage duration, which will place it on the heap.

What's going on is actually the namesake of this site! Basically, when your program is created, it is given a set amount of memory for the stack.
When you create char buf[size], you are using a C99 feature called a variable length array (VLA), which g++ accepts in C++ as an extension. This allocates space on the stack for buf. However, buf is too large for the stack, so your program fails.
In order to fix this problem, you should use char * buf; and then do buf = (char *) malloc(size) (the cast is required in C++). This will place buf on the heap, which is larger than the stack. It also lets you detect that you do not have enough memory, by checking whether malloc() returns NULL. You need to be sure to free(buf) before you exit though!
As a side note, you can check how much space you have on the stack by using the ulimit -s command.
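The same limit can also be queried from inside the program. The following is a minimal sketch, assuming a POSIX system that provides getrlimit(); the function name query_stack_limit is made up for illustration. Note that the shell's ulimit -s typically reports kilobytes, while getrlimit() reports bytes.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX only)

// Hypothetical helper: fetches the soft stack limit of the current process.
// Returns false if the query fails. When the limit is "unlimited",
// `unlimited` is set to true and `bytes` is left at 0.
bool query_stack_limit(unsigned long long &bytes, bool &unlimited) {
    struct rlimit lim;
    bytes = 0;
    unlimited = false;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        unlimited = true;
    } else {
        bytes = static_cast<unsigned long long>(lim.rlim_cur);
    }
    return true;
}
```

Comparing the reported value with the size of the buffer you want makes it obvious why a 100 MB local array cannot fit on a stack that is usually only a few megabytes.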

That seems like a lot to allocate on the stack. What if you put it on the heap instead?
char *buf = new char[size];

Use std::vector. Then you don't have the issues of stack space, or the other issue of writing non-standard C++ code:
#include <vector>
//...
static int fileTransfer(struct mg_connection *conn, char * filename)
{
FILE *fp = fopen(filename, "r");
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
std::vector<char> buf(size);
fclose(fp);
fp = fopen(filename, "r");
conn->connection_param = (void *) fp;
size_t n = 0;
if(fp != NULL)
{
n = fread(&buf[0], 1, buf.size(), fp);
mg_send_data(conn, &buf[0], n);
if(n < buf.size() || conn->wsbits != 0)
{
fclose(fp);
conn->connection_param = NULL;
}
}
return 1;
}
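The code above still assumes that fopen, fseek and ftell succeed, and it opens a binary file in text mode. Below is a sketch of just the file-reading part with those checks added; read_whole_file is a hypothetical helper name, and the mg_send_data call from the question is left out because it depends on the server library.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads the whole file into `out`.
// Returns false if the file cannot be opened, sized, or read completely.
bool read_whole_file(const char *filename, std::vector<char> &out) {
    FILE *fp = std::fopen(filename, "rb");  // "rb": no newline translation
    if (fp == NULL) {
        return false;
    }
    bool ok = false;
    if (std::fseek(fp, 0, SEEK_END) == 0) {
        long size = std::ftell(fp);  // ftell returns long, and -1 on error
        if (size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0) {
            out.resize(static_cast<std::size_t>(size));
            std::size_t n = 0;
            if (size > 0) {
                n = std::fread(&out[0], 1, out.size(), fp);
            }
            ok = (n == out.size());
        }
    }
    std::fclose(fp);  // closed exactly once on every path after a successful open
    return ok;
}
```

The vector lives on the heap and frees itself when it goes out of scope, so there is no delete [] to forget on an early return.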

Related

Buffer overflow in fread and strncpy in C++

I'm getting buffer overflow case from the appscan for the below set of code.
I'm not sure what is wrong in it.
If someone could suggest a solution, that would be great. The code is common to all platforms.
int main()
{
char* src = NULL;
char* chenv = getenv("HOME");
if (chenv == NULL || strlen(chenv) == 0)
return -1;
else
{
int len = strlen(chenv);
src = new char[len+1];
strncpy(src, chenv, len); // AppScan throws buffer overflow
src[len+1]='\0';
}
FILE* fp;
char content[4096];
int len = 0;
fp = fopen("filename.txt", "r");
if(fp)
{
while( (len = fread(content, sizeof(char), sizeof(content), fp))> 0) // AppScan throws buffer overflow on content
{
docopy(content, len); // External function call.
}
}
return 0;
}
Instead of strncpy I tried using strdup() and the issue was solved. But the fread call still has the issue.
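One likely reason for the strncpy warning (an assumption, since the scanner's rule is not shown): src holds len+1 bytes, so its valid indices are 0 through len, and src[len+1] writes one byte past the end. In addition, strncpy(src, chenv, len) copies no terminator, so src[len] is never set. The sketch below shows a copy that stays in bounds; copy_cstring and home_directory are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: copies a C string into a new buffer of exactly
// len + 1 bytes. The caller releases it with delete[].
char *copy_cstring(const char *s) {
    std::size_t len = std::strlen(s);
    char *dst = new char[len + 1];  // valid indices: 0 .. len
    std::memcpy(dst, s, len);
    dst[len] = '\0';                // last valid index, not len + 1
    return dst;
}

// The same idea with std::string, which manages the allocation itself.
// Returns an empty string when HOME is not set.
std::string home_directory() {
    const char *env = std::getenv("HOME");
    return env ? std::string(env) : std::string();
}
```

This says nothing about the fread warning; that call reads at most sizeof(content) bytes into content, so whatever the scanner objects to there is not visible in the snippet.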

Cannot Read Binary files in byte mode in C++

I am trying to read a binary file's data. Sadly, opening files in C++ is a lot different from Python for this, since Python has a byte mode. It seems C++ does not have that.
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
if (!is_directory(p->path()))
byte tmpdata;
std::ifstream tmpreader;
tmpreader.open(desfile, std::ios_base::binary);
int currentByte = tmpreader.get();
while (currentByte >= 0)
{
//std::cout << "Does this get Called?" << std::endl;
int currentByte = tmpreader.get();
tmpdata = currentByte;
}
tmpreader.close()
}
else
{
continue;
}
I basically want a clone of Python's way of opening a file in 'rb' mode, so that I have the actual byte data of all of the contents (which is not readable, as it has nonprintable chars, even for C++). Most of it probably can't be converted to signed chars, because it contains zlib-compressed data that I need to feed to my DLL to decompress.
I do know that in Python I can do something like this:
file_object = open('[file here]', 'rb')
It turns out that replacing the C++ code above with this helps. fopen is flagged as deprecated (MSVC, for one, warns about it), but I don't care.
The code above did not work because I was not reading from the buffer data. I realized later that fopen, fseek, fread, and fclose were the functions I needed for read-bytes mode ('rb').
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
if (!is_directory(p->path()))
{
std::string desfile = p->path().filename().string();
byte tmpdata;
unsigned char* data2;
FILE *fp = fopen("data.d", "rb");
fseek(fp, 0, SEEK_END); // GO TO END OF FILE
size_t size = ftell(fp);
fseek(fp, 0, SEEK_SET); // GO BACK TO START
data2 = new unsigned char[size];
tmpdata = fread(data2, 1, size, fp);
fclose(fp);
}
else
{
continue;
}
int currentByte = tmpreader.get();
while (currentByte >= 0)
{
//std::cout << "Does this get Called?" << std::endl;
int currentByte = tmpreader.get();
//^ here!
You are declaring a second variable hiding the outer one. However, this inner one is only valid within the while loop's body, so the while condition checks the outer variable which is not modified any more. Rather do it this way:
int currentByte;
while ((currentByte = tmpreader.get()) >= 0)
{
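A fuller sketch of that loop, wrapped in a function so that it is complete. The name read_bytes and the choice to collect the bytes in a std::vector are assumptions for illustration; the loop condition is the one shown above.

```cpp
#include <fstream>
#include <vector>

// Hypothetical helper: reads every byte of `path` into a vector.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_bytes(const char *path) {
    std::vector<unsigned char> data;
    std::ifstream in(path, std::ios_base::binary);  // binary: the 'rb' equivalent
    int currentByte;  // declared once, outside the loop
    // get() yields 0..255 for a byte and a negative value (EOF) at the end
    // of the file or on failure, so the same variable drives the condition.
    while ((currentByte = in.get()) >= 0) {
        data.push_back(static_cast<unsigned char>(currentByte));
    }
    return data;
}
```

Storing the bytes as unsigned char keeps values such as 0xFF intact, which matters for compressed data.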

How do I divide binary data into frames in C++?

I need to read a binary file and divide its contents into frames of 535 bytes each. The number of frames in the file is not known until runtime, so I need to allocate memory for them dynamically. The code below is a snippet; as you can see, I'm trying to create a pointer to an array of bytes (uint8_t) and then move on to the next frame, and so on, in the loop that copies the buffered data into the frames. How do I allocate memory at runtime, and is this the best way to do the task? Please let me know if there is a more elegant solution. Also, how do I manage the memory?
#include <cstdio>
using namespace std;
long getFileSize(FILE *file)
{
long currentPosition, endPosition;
currentPosition = ftell(file);
fseek(file, 0, 2);
endPosition = ftell(file);
fseek(file, currentPosition, 0);
return endPosition;
}
int main()
{
const char *filePath = "C:\Payload\Untitled.bin";
uint8_t *fileBuffer;
FILE *file = NULL;
if((file = fopen(filePath, "rb")) == NULL)
cout << "Failure. Either the file does not exist or this application lacks sufficient permissions to access it." << endl;
else
cout << "Success. File has been loaded." << endl;
long fileSize = getFileSize(file);
fileBuffer = new uint8_t[fileSize];
fread(fileBuffer, fileSize, 1, file);
uint8_t (*frameBuffer)[535];
for(int i = 0, j = 0; i < fileSize; i++)
{
frameBuffer[j][i] = fileBuffer[i];
if((i % 534) == 0)
{
j++;
}
}
struct frame {
unsigned char bytes[535];
};
std::vector<frame> frames;
Now your loop can simply read a frame and push it into frames. No explicit memory management needed: std::vector does that for you.
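Such a loop could look roughly like this. It is a sketch under two assumptions not stated in the question: the function name read_frames is made up, and a trailing partial frame (fewer than 535 bytes) is simply dropped.

```cpp
#include <cstdio>
#include <vector>

struct frame {
    unsigned char bytes[535];
};

// Hypothetical helper: reads all complete 535-byte frames from `path`.
// Returns an empty vector if the file cannot be opened.
std::vector<frame> read_frames(const char *path) {
    std::vector<frame> frames;
    FILE *file = std::fopen(path, "rb");
    if (file == NULL) {
        return frames;
    }
    frame f;
    // fread returns the number of bytes read; anything short of a full
    // frame means end of file (or an error), so the loop stops there.
    while (std::fread(f.bytes, 1, sizeof f.bytes, file) == sizeof f.bytes) {
        frames.push_back(f);  // the vector grows as needed
    }
    std::fclose(file);
    return frames;
}
```

There is no need to know the file size up front, and frames.size() gives the frame count afterwards.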

How to overcome Memory error when reading large file (over 1GB) -- C/C++

I am trying to read a file of over 1 GB (1,157,421,364 bytes). It gives a memory error when I use the fread() function, but works well when I use fgets().
Note: I have intermixed C and C++..
Can someone help me to overcome this memory error, am I doing something wrong?
Thanks in advance...
Error is "Memory Error"
#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <cerrno>
#include <cstring>
void read_file2(FILE* readFilePtr){
long file_size;
fseek(readFilePtr, 0L, SEEK_END);
file_size = ftell(readFilePtr);
rewind(readFilePtr);
char *buffer;
buffer = (char*) malloc (sizeof(char)*file_size);
if (buffer == NULL) {
fputs("Memory Error", stderr);
exit(2);
}
long lines = 0;
if (fread(buffer, 1, file_size, readFilePtr) != file_size){
fputs("Reading Error", stderr);
exit(1);
}
char *p = buffer;
while (p = (char*) memchr(p, '\n', (buffer + file_size) - p)){
++p;
++lines;
}
printf("Num of lines %ld\n", lines);
free(buffer);
}
int main(int argc, char** argv){
clock_t begin_time, end_time;
float time_consumed;
begin_time = clock();
FILE* inputFilePtr = fopen(argv[1], "rb");
if(inputFilePtr == NULL){
printf("Error Opening %s: %s (%u)\n", argv[1], strerror(errno), errno);
return 1;
}
read_file2(inputFilePtr);
end_time = clock();
time_consumed = ((float)end_time - (float)begin_time)/CLOCKS_PER_SEC;
printf("Time consumed is -- %f\n", time_consumed);
return 0;
}
You can read the file in chunks instead of reading it as a whole. Reading the entire file into one allocated buffer means a huge memory allocation for your application; do you really want that? This assumes you don't need to process it all at once (which is true in most cases).
You usually don't read big files in one go like that. You use something called buffered reads. Essentially you continuously call fread in a loop until there's nothing left to read.
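For the line-counting task in the question, a chunked version might look like the sketch below. The 64 KiB chunk size and the name count_lines_chunked are arbitrary choices for illustration, not something the question prescribes.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: counts '\n' bytes in a file while holding only one
// 64 KiB chunk in memory at a time. Returns -1 if the file cannot be opened.
long long count_lines_chunked(const char *path) {
    FILE *fp = std::fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    std::vector<char> buffer(64 * 1024);
    long long lines = 0;
    std::size_t n;
    // Keep calling fread until it returns 0 (end of file or error).
    while ((n = std::fread(&buffer[0], 1, buffer.size(), fp)) > 0) {
        const char *p = &buffer[0];
        const char *end = p + n;  // only the first n bytes are valid
        while ((p = static_cast<const char *>(
                    std::memchr(p, '\n', static_cast<std::size_t>(end - p)))) != NULL) {
            ++p;
            ++lines;
        }
    }
    std::fclose(fp);
    return lines;
}
```

Memory use stays constant regardless of file size, so the 1 GB allocation that failed in the question is never requested.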

How to implement readlink to find the path

Using the readlink function given as a solution to How do I find the location of the executable in C?, how would I get the path into a char array? Also, what do the variables buf and bufsize represent and how do I initialize them?
EDIT: I am trying to get the path of the currently running program, just like the question linked above. The answer to that question said to use readlink("/proc/self/exe"). I do not know how to implement that in my program. I tried:
char buf[1024];
string var = readlink("/proc/self/exe", buf, bufsize);
This is obviously incorrect.
See Use the readlink() function properly for the correct uses of the readlink function.
If you have your path in a std::string, you could do something like this:
#include <unistd.h>
#include <limits.h>
std::string do_readlink(std::string const& path) {
char buff[PATH_MAX];
ssize_t len = ::readlink(path.c_str(), buff, sizeof(buff)-1);
if (len != -1) {
buff[len] = '\0';
return std::string(buff);
}
/* handle error condition */
return std::string(); // placeholder: never fall off the end of a non-void function
}
If you're only after a fixed path:
std::string get_selfpath() {
char buff[PATH_MAX];
ssize_t len = ::readlink("/proc/self/exe", buff, sizeof(buff)-1);
if (len != -1) {
buff[len] = '\0';
return std::string(buff);
}
/* handle error condition */
return std::string(); // placeholder: never fall off the end of a non-void function
}
To use it:
int main()
{
std::string selfpath = get_selfpath();
std::cout << selfpath << std::endl;
return 0;
}
The accepted answer is almost correct, except that you can't rely on PATH_MAX because it is
not guaranteed to be defined per POSIX if the system does not have such
limit.
(From readlink(2) manpage)
Also, when it's defined it doesn't always represent the "true" limit. (See http://insanecoding.blogspot.fr/2007/11/pathmax-simply-isnt.html )
The readlink manpage also gives a way to do that for a symlink:
Using a statically sized buffer might not provide enough room for the
symbolic link contents. The required size for the buffer can be
obtained from the stat.st_size value returned by a call to lstat(2) on
the link. However, the number of bytes written by readlink() and read‐
linkat() should be checked to make sure that the size of the symbolic
link did not increase between the calls.
However, in the case of /proc/self/exe, as for most /proc files, stat.st_size would be 0. The only remaining solution I see is to grow the buffer until the result fits.
I suggest using vector<char> for this purpose, as follows:
std::string get_selfpath()
{
std::vector<char> buf(400);
ssize_t len;
do
{
buf.resize(buf.size() + 100);
len = ::readlink("/proc/self/exe", &(buf[0]), buf.size());
} while (buf.size() == len);
if (len > 0)
{
buf[len] = '\0';
return (std::string(&(buf[0])));
}
/* handle error */
return "";
}
Let's look at what the manpage says:
readlink() places the contents of the symbolic link path in the buffer
buf, which has size bufsiz. readlink does not append a NUL character to
buf.
OK. Should be simple enough. Given your buffer of 1024 chars:
char buf[1024];
/* The manpage says it won't null terminate. Let's zero the buffer. */
memset(buf, 0, sizeof(buf));
/* Note we use sizeof(buf)-1 since we may need an extra char for NUL. */
if (readlink("/proc/self/exe", buf, sizeof(buf)-1) < 0)
{
/* There was an error... Perhaps the path does not exist
* or the buffer is not big enough. errno has the details. */
perror("readlink");
return -1;
}
char *
readlink_malloc (const char *filename)
{
int size = 100;
char *buffer = NULL;
while (1)
{
buffer = (char *) xrealloc (buffer, size);
int nchars = readlink (filename, buffer, size);
if (nchars < 0)
{
free (buffer);
return NULL;
}
if (nchars < size)
return buffer;
size *= 2;
}
}
Taken from: http://www.delorie.com/gnu/docs/glibc/libc_279.html
#include <stdlib.h>
#include <unistd.h>
static char *exename(void)
{
char *buf;
char *newbuf;
size_t cap;
ssize_t len;
buf = NULL;
for (cap = 64; cap <= 16384; cap *= 2) {
newbuf = realloc(buf, cap);
if (newbuf == NULL) {
break;
}
buf = newbuf;
len = readlink("/proc/self/exe", buf, cap);
if (len < 0) {
break;
}
if ((size_t)len < cap) {
buf[len] = 0;
return buf;
}
}
free(buf);
return NULL;
}
#include <stdio.h>
int main(void)
{
char *e = exename();
printf("%s\n", e ? e : "unknown");
free(e);
return 0;
}
This uses the traditional "when you don't know the right buffer size, reallocate increasing powers of two" trick. We assume that allocating less than 64 bytes for a pathname is not worth the effort. We also assume that an executable pathname as long as 16384 (2**14) bytes has to indicate some kind of anomaly in how the program was installed, and it's not useful to know the pathname as we'll soon encounter bigger problems to worry about.
There is no need to bother with constants like PATH_MAX. Reserving so much memory is overkill for almost all pathnames, and as noted in another answer, it's not guaranteed to be the actual upper limit anyway. For this application, we can pick a common-sense upper limit such as 16384. Even for applications with no common-sense upper limit, reallocating increasing powers of two is a good approach. You only need log n calls for an n-byte result, and the amount of memory capacity you waste is proportional to the length of the result. It also avoids race conditions where the length of the string changes between the realloc() and the readlink().
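Since the question asks for the path in a C++ string, here is the same doubling idea as a sketch that returns std::string. The name read_link and the max_cap parameter are assumptions for illustration; the 64 and 16384 bounds are carried over from the C version above.

```cpp
#include <unistd.h>  // readlink, ssize_t (POSIX)
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the target of the symlink `path`.
// Returns an empty string on error or if the target needs max_cap bytes or more.
std::string read_link(const char *path, std::size_t max_cap = 16384) {
    for (std::size_t cap = 64; cap <= max_cap; cap *= 2) {
        std::vector<char> buf(cap);
        ssize_t len = ::readlink(path, &buf[0], buf.size());
        if (len < 0) {
            return std::string();  // errno has the details
        }
        if (static_cast<std::size_t>(len) < cap) {
            // readlink does not NUL-terminate, so pass the length explicitly.
            return std::string(&buf[0], static_cast<std::size_t>(len));
        }
        // len == cap: the result may have been truncated, so retry with more room.
    }
    return std::string();
}
```

Calling read_link("/proc/self/exe") on Linux should then give the executable path. Constructing the string from a pointer and a length avoids having to reserve a byte for a terminator.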