Unzip buffer with large data length is crashing

Unzip buffer with large data length is crashing - c++

This is the function I am using to unzip buffer.
string unzipBuffer(size_t decryptedLength, unsigned char * decryptedData)
{
z_stream stream;
stream.zalloc = Z_NULL;
stream.zfree = Z_NULL;
stream.avail_in = decryptedLength;
stream.next_in = (Bytef *)decryptedData;
stream.total_out = 0;
stream.avail_out = 0;
size_t dataLength = decryptedLength* 1.5;
char c[dataLength];
if (inflateInit2(&stream, 47) == Z_OK)
{
int status = Z_OK;
while (status == Z_OK)
{
if (stream.total_out >= dataLength)
{
dataLength += decryptedLength * 0.5;
}
stream.next_out = (Bytef *)c + stream.total_out;
stream.avail_out = (uint)(dataLength - stream.total_out);
status = inflate (&stream, Z_SYNC_FLUSH);
}
if (inflateEnd(&stream) == Z_OK)
{
if (status == Z_STREAM_END)
{
dataLength = stream.total_out;
}
}
}
std::string decryptedContentStr(c, c + dataLength);
return decryptedContentStr;
}
And it was working fine until today when I realized that it crashes with large data buffer (Ex: decryptedLength: 342792) on this line:
status = inflate (&stream, Z_SYNC_FLUSH);
after one or two iterations. Can anyone help me please?

If your code generally works correctly, but fails for large data sets, then this could be due to a stack overflow as indicated by #StillLearning in his comment.
A usual (default) stack size is 1 MB. When your decryptedLength is 342,792, then you try to allocate 514,188 byte in the following line:
char c[dataLength];
Together with other allocations in your code (and finally in the inflate() function), this might already be too much. To overcome this problem, you should allocate the memory dynamically:
char* c = new char[dataLength];
If you so this, then please do not forget to release the allocated memory at the end of your unzipBuffer() function:
delete[] c;
If you forget to delete the allocated memory, then you will have a memory leak.
In case this doesn't (fully) solve your problem, you should do it anyway, because for even larger data sets your code will break for sure due to the limited size of the stack.
In case you need to "reallocate" your dynamically allocated buffer in your while() loop, then please take a look at this Q&A. Basically you need to use a combination of new, std::copy, and delete[]. However, it would be more appropriate if your exchange your char array with a std::vector<char> or even std::vector<Bytef>. Then you would be able enlarge your buffer easily by using the resize() function. You can directly access the buffer of a vector by using &my_vector[0] in order to assign it to stream.next_out.

c is not going to get bigger just because you increase datalength. You are probably overwriting past the end of c because your initial guess of 1.5 times the compressed size was wrong, causing the fault.
(It might be a stack overflow as suggested in another answer here, but I think that 8 MB stack allocations are common nowadays.)

Related

how to avoid buffer overflow with ostrstream

I have a code where i compile & load it in vxworks machine, i see the buffer overflow.
#include<strstream>
#include<iostream>
#include<sstream>
using namespace std;
ostrstream *strm = 0;
int newcout()
{
if(strm == 0)
{
strm = new ostrstream();
}
while(1)
{
(*strm)<<".VXworks_print"<<endl;
}
return 0;
}
The problem here is, the memory request keeps on doublingfor every loop in while.
[maxBlock = 8497968/ allocSize = 12700]
[maxBlock = 8485176/ allocSize = 25500]
[maxBlock = 8459584/ allocSize = 51100]
[maxBlock = 8408392/ allocSize = 102300]
[maxBlock = 8306000/ allocSize = 204700]
[maxBlock = 8101208/ allocSize = 409500]
[maxBlock = 7691616/ allocSize = 819100]
[maxBlock = 7086744/ allocSize = 1638300]
[maxBlock = 7086744/ allocSize = 3276700]
[maxBlock = 7086744/ allocSize = 6553500]
[maxBlock = 8497288/ allocSize = 13107100]
when the alllocation request is going beyond max available block, it is resulting in trap.
I think we are seeing this behavior because of re-using ostrstream object.
How to correct this behavior?

As per the documentation your ostrstream will keep allocating memory for each call. This memory is never freed. To avoid this, declare the ostream as local object (in the stack) and call the freeze(false) once you are done (after each str()) this way the memory is freed when the destructor of the ostream is called.
from: http://en.cppreference.com/w/cpp/io/ostrstream/freeze
After a call to str(), dynamic streams become frozen automatically. A call to freeze(false) is required before exiting the scope in which this ostrstream object was created. otherwise the destructor will leak memory. Also, additional output to a frozen stream may be truncated once it reaches the end of the allocated buffer.

As someone else already said in the comments, ostrstream is decprecated.
Furthermore this is c++ not java, you shouldn't use new to allocate a resource.
Just declare the object on the stack.
ostrstream strm;
And you could consider forgetting the c++ streaming stuff, its syntax and look is rather clunky. Personally I prefer the good old printf, even when its not type-safe - it just is way more compact.

c++ Allocating memory on real time without vector

asking on stack again. I have an array wich I want to be always at the minimum size, because I have to send over the internet. The problem is, the program has no way to know what the minimum size is until the operation is finished. This leads me to having to ways: using vectors, or make an array of the maximum lenght the program could ever need, and then that it knows the minimum size, initialize a pointer with new and put the data there. But I can't use vectors because they require serialization to be sent, and both vector and serialization have overheads I don't want. Example:
unsigned short data[1270], // the maximum size the operation could take is 1270 shorts
*packet; // pointer
int counter; //this is to count how big "packet" will be
//example of operation, wich of course is different from my program
// in this case the operation takes 6 bytes
while(true) {
for (int i; i != 6; i++) {
counter++;
data[i]= 1;
}
packet=new unsigned short[counter];
for (int i; i!=counter; i++) {
packet[i]=data[i];
}
}
Like you might have noticed, this code runs in cycles, so the problem might be my way to repeatedly re-initialize the same pointer.
The problem in this code is, if I do:
std::cout<<counter<<" "<<sizeof(packet)/sizeof(unsigned short)<<" ";
counter variates in size (usually from 1 to 35), but the size of packet is always 2. I also tried delete [] before new, but it didn't solve the problem.
This issue could also be related to another part of the code, but here i am just asking:
Is my way of repeatedly allocate memory right?

Continually add to an std::vector while requesting to the compiler that the size allocated in heap memory not exceed the amount actually needed:
std::vector<int> vec;
std::size_t const maxSize = 10;
for (std::size_t i; i != maxSize; ++i)
{
vec.reserve(vec.size() + 1u);
vec.push_back(1234); // whatever you're adding
}
I should add though that I see no good reason for doing this under normal circumstances. The performance of this "program" could be severely hampered with no obvious benefit.

You can always use pointers and realloc. C++ is such a powerfull language because of its pointers, you don't need to use arrays.
Take a look at the cplusplus entry on realloc.
For your case you could use it like this:
new_packet = (unsigned short*) realloc (packet, new_size * sizeof(unsigned short));
if (new_packet!=NULL) {
packet = new_packet;
for(int i ; i < new_size ; i++)
packet[i] = new_values[i];
}
else {
if( packet != NULL )
free (packet);
puts ("Error (re)allocating memory");
exit (1);
}

Okay, I see a couple problems in your logic here. Lets start with the main one: Why do you need to alloc a whole fresh array with a copy of whats in data just to send it over a socket? It's not like sending a letter dude, send() will transfer a copy of the information, not literally move it over the network. It's perfectly fine to do this:
send(socket, data, counter * sizeof(unsigned short), 0);
There. You don't need a new pointer for anything.
Also, I don't know where you got the serialization thing from. Vectors are basically arrays that resize automatically, and will also delete themselves from memory once the function is done. You could do this:
std::vector<unsigned short> packet;
packet.reserve(counter);
for (std::size_t i = 0; i < counter; ++i)
packet[i] = data[i];
send(socket, &packet[0], packet.size() * sizeof(unsigned short), 0);
Or even shorten to:
std::vector<unsigned short> packet;
for (std::size_t i = 0; i < counter; ++i)
packet.push_back(data[i]);
But with this option the vector will resize counter times, what is performance consuming. Always set its size first if you have the information available.

C++ Optimal Block Size For Reading From A File

I have a program that generates files containing random distributions of the character A - Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE * fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
char *buffer = new char[block_size];
bzero(buffer, block_size);
struct timeb t;
ftime(&t);
long start_in_ms = t.time * 1000 + t.millitm;
size_t bytes_read = 0;
while (!feof(fp))
{
bytes_read += fread(buffer, 1, block_size, fp);
if (ferror (fp))
{
return -1;
}
int i;
for (i = 0; i < block_size; i++)
{
int j;
for (j = 0; j < 26; j++)
{
if (buffer[i] == 'A' + j)
{
hist[j]++;
}
}
}
}
ftime(&t);
long end_in_ms = t.time * 1000 + t.millitm;
*milliseconds = end_in_ms - start_in_ms;
*filelen = bytes_read;
return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
Regards.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.

There are many bugs in this code:
It uses new[], which is C++.
It doesn't free the allocated memory.
It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.

You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, the fread() involves some overhead, leaving user mode and returning. On the other hand, instead of setting the hist[] information directly, you're looping through the alphabet. This is unnecessary and, without optimization, causes some overhead per byte.
I'd re-test this with hist[j-26]++ or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.

A "free(): invalid next size (fast)" in C++

I just ran into a free(): invalid next size (fast) problem while writing a C++ program. And I failed to figure out why this could happen unfortunately. The code is given below.
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
bool result = true;
char *exp_checksum = (char*)malloc(size * sizeof(char));
char *rec_checksum = (char*)malloc(size * sizeof(char));
char *rec_data = (char*)malloc(size * sizeof(char));
//memcpy(rec_checksum, pkt->data+HEADER_SIZE+SEQ_SIZE+DATA_SIZE, size);
//memcpy(rec_data, pkt->data+HEADER_SIZE+SEQ_SIZE, size);
for (int i = 0; i < size; i++) {
rec_checksum[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE+i];
rec_data[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+i];
}
do_checksum(exp_checksum, rec_data, DATA_SIZE);
for (int i = 0; i < size; i++) {
if (exp_checksum[i] != rec_checksum[i]) {
result = false;
break;
}
}
free(exp_checksum);
free(rec_checksum);
free(rec_data);
return result;
}
The macros used are:
#define RDT_PKTSIZE 128
#define SEQ_SIZE 4
#define HEADER_SIZE 1
#define DATA_SIZE ((RDT_PKTSIZE - HEADER_SIZE - SEQ_SIZE) / 2)
The struct used is:
struct packet {
char data[RDT_PKTSIZE];
};
This piece of code doesn't go wrong every time. It would crash with the free(): invalid next size (fast) sometimes in the free(exp_checksum); part.
What's even worse is that sometimes what's in rec_checksum stuff is just not equal to what's in pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE] stuff, which should be the same according to the watch expressions from my debugging tools. Both memcpy and for methods are used but this problem remains.
I don't quite understand why this would happen. I would be very thankful if anyone could explain this to me.
Edit:
Here's the do_checksum() method, which is very simple:
void do_checksum(char* checksum, char* data, int size)
{
for (int i = 0; i < size; i++)
{
checksum[i] = ~data[i];
}
}
Edit 2:
Thanks for all.
I switched other part of my code from the usage of STL queue to STL vector, the results turn to be cool then.
But still I didn't figure out why. I am sure that I would never pop an empty queue.

The error you report is indicative of heap corruption. These can be hard to track down and tools like valgrind can be extremely helpful. Heap corruptions are often hard to debug with a simple debugger because the runtime error often occurs long after the actual corruption.
That said, the most obvious potential cause of your heap corruption, given the code posted so far, is if DATA_SIZE is greater than size. If that occurs then do_checksum will write beyond the end of exp_checksum.

Three immediate suggestions:
Check for size <= 0 (instead of "!size")
Check for size >= DATA_SIZE
Check for malloc returning NULL

Have you tried Valgrind?
Also, make sure to never send more than RDT_PKTSIZE as size to not_corrupt()
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
if (size > RDT_PKTSIZE) return false;
/* ... */

Valgrind is good ... but validating all your inputs and checking all error conditions is even better.
Stepping through the code in the debugger isn't a bad idea, either.
I would also call "do_checksum (size)" (your actual size), instead of DATA_SIZE (presumably "maximum size").

DATA_SIZE is a macro defined the max length in my program so the size
should be less than DATA_SIZE
even if that is true, your logic only creates enough memory to hold size characters.
so you should call
do_checksum(exp_checksum, rec_data, size);
and, if you do not want to use std::string (which is fine), you should switch from malloc/free to new/delete when talking C++

Stack around variable corrupt, not sure what the problem is

Problem solved, thank you all for the help
I've got a bit of a problem here it's not something that's blowing my program up, but it's just bothering me that I can't fix it. I have a function reading in some data from a file, at the end of the execution, the stack around variable longGarbage is corrupted. I've looked around a bit and found that a possible cause is writing to invalid memory. I cleaned up some memory leaks that I had and the problem still persists. What's confusing me is that it happens when the function finishes executing, so it appears to be happening when the variable goes out of scope. Here's the code...
CHCF::CHCF(std::string fileName)
: PAKID("HVST84838672")
{
FILE * archive = fopen(fileName.c_str(), "rb");
std::string strGarbage = "";
unsigned int intGarbage = 0;
unsigned long longGarbage = 0;
unsigned char * data = 0;
char charGarbage = '0';
if (!archive)
{
fclose (archive);
return;
}
for (int i = 0; i < 12; i++)
{
fread(&charGarbage, 1, 1, archive);
strGarbage += charGarbage;
}
if (strGarbage != PAKID)
{
fclose(archive);
throw "Incorrect archive format";
}
strGarbage = "";
fread(&_gameID, sizeof(_gameID),1,archive);
fread(&_fileCount, sizeof(_fileCount),1,archive);
for (int i = 0; i < _fileCount; i++)
{
fread(&longGarbage, 8,1,archive); //file offset
fread(&intGarbage, 4, 1, archive);//fileName
for (int i = 0; i < intGarbage; i++)
{
fread(&charGarbage, 1, 1, archive);
strGarbage += charGarbage;
}
fread(&longGarbage, 8, 1, archive); //fileSize
fread(&intGarbage, 4, 1, archive); //fileType
data = new unsigned char[longGarbage];
for (long i = 0; i < longGarbage; i++)
{
fread(&charGarbage, 1, 1, archive);
data[i] = charGarbage;
}
switch ((FILETYPES)intGarbage)
{
case MAP:
_maps.append(strGarbage, new CFileData(strGarbage, FILETYPES::MAP, data, longGarbage));
break;
default:
break;
}
delete [] data;
data = 0;
strGarbage.clear();
longGarbage = 0;
}
fclose(archive);
} //error happens here
Here is the CFileData constructor:
CFileData::CFileData(std::string fileName, FILETYPES type, unsigned char *data, long fileSize)
{
_fileName = fileName;
_type = type;
_data = new unsigned char[fileSize];
for (int i = 0; i < fileSize; i++)
_data[i] = data[i];
}

Might I suggest std::vector instead of calling new and delete manually? Your code is not exception safe -- you leak if an exception is thrown.
fread(&longGarbage, 8, 1, archive); //fileSize Are you sure sizeof(long) is 8? I suspect it's 4. I believe on Linux boxes sometimes it's 8, but most everywhere else sizeof(long) is 4, and sizeof(long long) is 8.
What about any constructors on members of this class? They can corrupt the stack too.

What's happening is that something is writing to memory around or over the location of longGarbage which is causing the corruption.
You don't say what development environment you are using. One way to diagnose this would be to set a breakpoint that triggers when a specific memory location changes. Choose a memory location that overlaps the area of corruption and wait for it to trigger unexpectedly.
Another way to diagnose this would be to examine code that changes memory around or over longGarbage. That could be almost anything of course but likely candidates are modifications to 'data', modifications to 'intGarbage' and modifications to 'longGarbage' itself.
We can narrow things down even further because we can (usually) be fairly sure the assignment operator itself is safe. Code like data = new... isn't likely to be the culprit so really we need to focus on memory changes that involve taking the address of 'data', 'intGarbage' or 'longGarbage'. In particular memory changes that change more bytes than they should.
Several others have already pointed out that a long is probably not eight bytes in length. If you pass the wrong length to fread, the extra bytes retrieved have to go somewhere.

You are using a lot of magic numbers for data sizes, so I would check that first. In particular, I doubt that sizeof(unsigned long)==8 and sizeof(unsigned in)==4 in all possible circumstances. Refer to your compiler's documentation, but you should still be wary, as this is very likely to change from one compiler/platform to another.
Check for these bits:
fread(&longGarbage, 8,1,archive); //file offset
You also might want to use C++ <iostream> library instead of the C FILE* stuff for reading. It would allow for a much shorter version because you wouldn't need to close the file 3 times.

It seems from the other comments and the information provided that the issue is on the C++ side you should use either __int64 for a windows environment, or int64_t for cross platform.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Unzip buffer with large data length is crashing - c++

Related

how to avoid buffer overflow with ostrstream

c++ Allocating memory on real time without vector

C++ Optimal Block Size For Reading From A File

A "free(): invalid next size (fast)" in C++

Stack around variable corrupt, not sure what the problem is

Categories

Resources