Windows readfile with dynamic buffer

Windows readfile with dynamic buffer - c++

I am trying to write a wrapper around Windows file functions, one would read num bytes amount of data from the file and retrun it. For some reason I fail to allocate the memory properly, but I just can't find the reason why:
PBYTE Read(int num_bytes, HANDLER hFile){
PBYTE bBuffer;
DWORD new_size = sizeof(BYTE)*num_bytes;
//after the allocation the debugger already displays a 16 char wide placeholder
bBuffer = (PBYTE)malloc(new_size);
OVERLAPPED o = { 0 };
o.Offset = 0;
BOOL bReadDone = ReadFile(hFile, (LPVOID)bBuffer, sizeof(BYTE)*num_bytes, NULL, &o);
return bBuffer;
}
Data gets copied, but the allocated buffer is always too wide and contains extra wierd filler characters. Can sby please explain what am I doing wrong?

"what am I doing wrong?"
sizeof(BYTE) is 1 so you can remove it everywhere and eliminate the redundant new_size variable.
You tagged your question C++ but used malloc to allocate the buffer. Your design makes the caller responsible for freeing the buffer, which is a poor design approach, and even more so by using malloc/free in C++ program. A good C++ solution to this quandry would be to return a
std::vector.
It is vital that you provide the lpNumberOfBytesRead parameter to ReadFile. Without it you don't know how many bytes were read. And if you don't know how many bytes were read you can't tell the difference between "extra wierd filler characters" and unused memory at the end of the buffer. If the data is characters then character-oriented output routines (and debugger tools) don't know the difference either, since there is no null terminator at the end of the data that was actually read. You could use NumberOfBytesRead to put in a nul terminator so you and the debugger don't read beyond the real data.

Related

Buffer Overflow Sample: Why is this code dangerous?

I'm fairly new to c and I'm reading a book regarding Software Vulnerabilities and I came across this buffer overflow sample, it mentions that this can cause a buffer overflow. I am trying to determine how this is the case.
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char buf[1024];
if(!query_string) {
return 0;
}
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL) {
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}
}
I am paying close attention to this block of code because this appears to be where the buffer overflow is caused.
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
sprintf(buf, "MODE=%s", ent->value);
putenv(buf);
}

I think here is it, because your buf is only 1024 and because ent->value can have more than 1024, then this may overflow.
sprintf(buf, "MODE=%s", ent->value);
But depends of implementations of split_keyvalue_pairs(query_string). If this function already checks the value and threat it (which I doubt).

klutt provided a good fix for the problem in a previous answer, so I'll try to go a bit more specific and in-depth on the exact nature of the overflow in your code.
char buf[1024];
This line allocates 1024 bytes on the stack, addressed by the pointer named buf. The big problem here is that it is on the stack. If you dynamically allocate using malloc (or my favorite: calloc), it will be on the heap. The location doesn't necessarily prevent or fix an overflow. But it can change the effect. Right above (give or take some bytes) this space on the stack would be the return address from the function, and an overflow can change that causing the program to redirect when it returns.
sprintf(buf, "MODE=%s", ent->value);
This line is what actually performs the overflow. sprintf = "string print format." This means that the destination is a string (char *), and you are printing a formatted string. It doesn't care about the length, it will just take the starting memory address of the destination string, and keep writing until it has finished. If there's more than 1024 characters to be written (in this case), then it will go past the end of your buffer and overflow into other parts of memory. The solution is to use the function snprint instead. The "n" tells you that it will limit the amount to be written to the destination, and avoid an overflow.
The ultimate thing to understand is that a "buffer" does not actually exist. It's simply not a thing. It is a concept we use to order the area in memory, but the computer has no idea what a buffer is, where it starts, or where it ends. So in writing, the computer doesn't really care if it is inside or outside of the buffer, and doesn't know where to stop writing. So, we need to tell it very explicitly how far it is allowed to write, or it will just keep writing.

A very big thing here is that you passed a pointer to a local variable to putenv. The buffer will cease to exist when handle_query_string returns. After that it will contain garbage variables. Note that what putenv does require that the string passed to it remains unchanged for the rest of the program. From the documentation for putenv (emphasis mine):
int putenv(char *string);
The putenv() function adds or changes the value of environment variables. The argument string is of the form name=value. If name does not already exist in the environment, then string is added to the environment. If name does exist, then the value of name in the environment is changed to value. The string pointed to by string becomes part of the environment, so altering the string changes the environment.
This can be corrected by using dynamic allocation. char *buf = malloc(1024) instead of char buf[1024]
Another thing is that sprintf(buf, "MODE=%s", ent->value); might overflow. That would happen if the string ent->value is too long. A solution there is to use snprintf instead.
snprintf(buf, sizeof buf, "MODE=%s", ent->value);
This prevents overflow, but it might still cause problems, because if ent->value is too big to fit in buf, then buf will for obvious reasons not contain the full string.
Here is a way to rectify both issues:
int handle_query_string(char *query_string)
{
struct keyval *qstring_values, *ent;
char *buf = NULL;
if(!query_string)
return 0;
qstring_values = split_keyvalue_pairs(query_string);
if((ent = find_entry(qstring_values, "mode")) != NULL)
{
// Make sure that the buffer is big enough instead of using
// a fixed size. The +5 on size is for "MODE=" and +1 is
// for the string terminator
const char[] format_string = "MODE=%s";
const size_t size = strlen(ent->value) + 5 + 1;
buf = malloc(size);
// Always check malloc for failure or chase nasty bugs
if(!buf) exit(EXIT_FAILURE);
sprintf(buf, format_string, ent->value);
putenv(buf);
}
}
Since we're using malloc the allocation will remain after the function exits. And for the same reason, we make sure that the buffer is big enough beforehand, and thus, using snprintf instead of sprintf is not necessary.
Theoretically, this has a memory leak unless you use free on all strings you have allocated, but in reality, not freeing before exiting main is very rarely a problem. Might be good to know though.
It can also be good to know that even though this code now is fairly protected, it's still not thread safe. The content of query_string and thus also ent->value may be altered. Your code does not show it, but it seems highly likely that find_entry returns a pointer that points somewhere in query_string. This can of course also be solved, but it can get complicated.

proper way to use GNU/Linux read() function

in the man pages of GNU/Linux the read function is described with following synopsis:
ssize_t read(int fd, void *buf, size_t count);
I would like to use this function to read data from a socket or a serial port. If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory so pointer decrement is necessary for bringing the pointer to the first byte of data. This is dangerous because using it in a language like C++ with it's dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function. I thought of using a C-style array instead of a pointer. Is this the correct approach? If not, what is the correct way to do this? The programming language I'm using is C++.
EDIT:
The code that caused the described situation is as follows:
QSerialPort class was used to configure and open the port with following parameters:
Baudrate of 115200
8 data bits
No parity
One stop bit
No flow control
and for the reading part as long as the stackoverflow is concerned the read is performed exactly like this:
A std::vector containing a number of structs defined this way:
struct DataMember
{
QString name;
size_t count;
char *buff;
}
then within a while loop until the end of the mentioned std::vector is reached, a read() is performed based on count member variable of the said struct and the data is stored in the same struct's buff:
ssize_t nbytes = read(port->handle(), v.at(i).buff, v.at(i).count);
and then the data is printed on the console. In my test case as long as the data is one byte the value printed is correct but for more than one byte the value displayed is the last value that was read from the port plus some garbage values. I don't know why is this happening. Note that the correct result is obtained when the char *buff is changed to char buff[count].

If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory
No. The pointer is passed to the read() method by value, so it is therefore completely and utterly impossible for the value to be any different after the call than it was before, regardless of the count.
so pointer decrement is necessary for bringing the pointer to the first byte of data.
The pointer already points to the first byte of data. No decrement is necessary.
This is dangerous because using it in a language like C++ with it's dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function.
This is all nonsense based on an impossibility.
You are mistaken about all this.

In my test case as long as the data is one byte the value printed is correct but for more than one byte the value displayed is the last value that was read from the port plus some garbage values.
From the read(2) manpage:
On success, the number of bytes read is returned (zero indicates end of file),
and the file position is advanced by this number. It is not an error if this number is
smaller than the number of bytes requested; this may happen for example because fewer
bytes are actually available right now (maybe because we were close to end-of-file, or
because we are reading from a pipe, or from a terminal), or because read() was interrupted
by a signal. On error, -1 is returned, and errno is set appropriately. In this case it
is left unspecified whether the file position (if any) changes.
In the case of pipes, sockets and character devices (that includes serial ports) and a blocking file descriptor (default) read will, in practice, not wait for the full count. In your case read() blocks until a byte comes in on the serial port and returns. That is why in the output the first byte is correct and the rest is garbage (uninitialized memory). You have to add a loop around the read() that repeats until count bytes have been read if you need the full count.

I don't know why is this happening.
But I know. char * is just a pointer, but that pointer needs to be initialized to something before you can use it. Without doing so you're invoking undefined behavior and everything might happen.
Instead of the size_t count; and char *buff elements you should just use a std::vector<char>, before making the read call, resize it to the number of bytes you want to read, then take the address of the first element of that vector and pass that to read:
struct fnord {
std::string name;
std::vector data;
};
and use it like this; note that using read requires some additional work to properly deal with signal and error conditions.
size_t readsomething(int fd, size_t count, fnord &f)
{
// reserve memory
f.data.reserve(count);
int rbytes = 0;
int rv;
do {
rv = read(fd, &f.data[rbytes], count - rbytes);
if( !rv ) {
// End of File / Stream
break;
}
if( 0 > rv ) {
if( EINTR == errno ) {
// signal interrupted read... restart
continue;
}
if( EAGAIN == errno
|| EWOULDBLOCK == errno ) {
// file / socket is in nonblocking mode and
// no more data is available.
break;
}
// some critical error happened. Deal with it!
break;
}
rbytes += rv;
} while(rbytes < count);
return rbyteS;
}
Looking at your first paragraph of gibberish:
If the count is greater than one, the pointer supplied in the function argument will point to the last byte that was read from the port in the memory
What makes you think so? This is not how it works. Most likely you passed some invalid pointer that wasn't properly initialized. Anything can happen.
so pointer decrement is necessary for bringing the pointer to the first byte of data.
Nope. That's not how it works.
This is dangerous because using it in a language like C++ with it's dynamic memory allocation of containers based on their size and space needs could corrupt data at the point of return from read() function.
Nope. That's not how it works!
C and C++ are an explicit languages. Everything happens in plain sight and nothing happens without you (the programmer) explicitly requesting it. No memory is allocated without you requesting this to happen. It can either be an explicit new, some RAII, automatic storage or the use of a container. But nothing happens "out of the blue" in C and C++. There's no built-in garbage collection^1 in C nor C++. Objects don't move around in memory or resize without you explicitly coding something into your program that makes this happen.
[1]: There are GC libraries you can use, but those never will stomp onto anything that can be reached by code that's executing. Essentially garbage collector libraries for C and C++ are memory leak detectors, which will free memory that can no longer be reached by normal program flow.

(C/C++) Size of an array of chars

How to find out the lenght of an array of chars that is not null terminated/zero terminated or anything like that?
Because I wrote a writeFile function and I wanna get rid of that 'len' parameter.
int writeFile(FILE * handle, char * data, int len)
{
fseek(handle, 0, SEEK_SET);
for(int i=0; i <= len; i++)
fputc(data[i], handle);
}

You cannot get rid of the len parameter. The computer is not an oracle to guess your intentions. But you can use the fwrite() function which will write your data much more efficiently than fputc().

there is no portable way*, that is why sentinel values like null terminators are used.
In fact, its better to specify a length parameter, as it allows partial writes from data buffers (though in your case I would use a size_t/std::size_t for the length).
*you could try using _msize on windows, but it will lead to tears.

#define writeFile(handle, data) writeFileImpl(handle, data, sizeof data)

As Seith Carnegie commented and others answered, you cannot do that (getting the length of any array of char).
Some C libraries provide you with an extension giving an (over-sized) estimate of the length of heap-allocated memory (e.g. pointers obtained by malloc).
If you uses Boehm's garbage collector (which is very useful!), it gives you in <gc/gc.h> the GC_size function.
But when the array of char is inside a structure, or on the call stack, there is no way to get its size at runtime. Only your program knows it.

You can't get rid of the len parameter unless you have another way of determining the length of your data (usually by using a null terminator). This is because C and C++ don't store the length of the data. Furthermore, programmers might appreciate the len parameter. You don't always want to write out all the bytes in your array.

boost memorybuffer and char array

I'm currently unpacking one of blizzard's .mpq file for reading.
For accessing the unpacked char buffer, I'm using a boost::interprocess::stream::memorybuffer.
Because .mpq files have a chunked structure always beginning with a version header (usually 12 bytes, see http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format#2.2_Archive_Header), the char* array representation seems to truncate at the first \0, even if the filesize (something about 1.6mb) remains constant and (probably) always allocated.
The result is a streambuffer with an effective length of 4 ('REVM' and byte nr.5 is \0). When attempting to read further, an exception is thrown. Here an example:
// (somewhere in the code)
{
MPQFile curAdt(FilePath);
size_t size = curAdt.getSize(); // roughly 1.6 mb
bufferstream memorybuf((char*)curAdt.getBuffer(), curAdt.getSize());
// bufferstream.m_buf.m_buffer is now 'REVM\0' (Debugger says so),
// but internal length field still at 1.6 mb
}
//////////////////////////////////////////////////////////////////////////////
// wrapper around a file oof the mpq_archive of libmpq
MPQFile::MPQFile(const char* filename) // I apologize my naming inconsistent convention :P
{
for(ArchiveSet::iterator i=gOpenArchives.begin(); i!=gOpenArchives.end();++i)
{
// gOpenArchives points to MPQArchive, wrapper around the mpq_archive, has mpq_archive * mpq_a as member
mpq_archive &mpq_a = (*i)->mpq_a;
// if file exists in that archive, tested via hash table in file, not important here, scroll down if you want
mpq_hash hash = (*i)->GetHashEntry(filename);
uint32 blockindex = hash.blockindex;
if ((blockindex == 0xFFFFFFFF) || (blockindex == 0)) {
continue; //file not found
}
uint32 fileno = blockindex;
// Found!
size = libmpq_file_info(&mpq_a, LIBMPQ_FILE_UNCOMPRESSED_SIZE, fileno);
// HACK: in patch.mpq some files don't want to open and give 1 for filesize
if (size<=1) {
eof = true;
buffer = 0;
return;
}
buffer = new char[size]; // note: size is 1.6 mb at this time
// Now here comes the tricky part... if I step over the libmpq_file_getdata
// function, I'll get my truncated char array, which I absolutely don't want^^
libmpq_file_getdata(&mpq_a, hash, fileno, (unsigned char*)buffer);
return;
}
}
Maybe someone could help me. I'm really new to STL and boost programming and also inexperienced in C++ programming anyways :P Hope to get a convenient answer (plz not suggest to rewrite libmpq and the underlying zlib architecture^^).
The MPQFile class and the underlying uncompress methods are acutally taken from a working project, so the mistake is either somewhere in the use of the buffer with the streambuffer class or something internal with char array arithmetic I haven't a clue of.
By the way, what is the difference between using signed/unsigned chars as data buffers? Has it anything to do with my problem (you might see, that in the code randomly char* unsigned char* is taken as function arguments)
If you need more infos, feel free to ask :)

How are you determining that your char* array is being 'truncated' as you call it? If you're printing it or viewing it in a debugger it will look truncated because it will be treated like a string, which is terminated by \0. The data in 'buffer' however (assuming libmpq_file_getdata() does what it's supposed to do) will contain the whole file or data chunk or whatever.

Sorry, messed up a bit with these terms (not memorybuffer actually, streambuffer is meant as in the code)
Yeah you where right... I had a mistake in my exception handling. Right after that first bit of code comes this:
// check if the file has been open
//if (!mpf.is_open())
pair<char*, size_t> temp = memorybuf.buffer();
if(temp.first)
throw AdtException(ADT_PARSEERR_EFILE);//Can't open the File
notice the missing ! before temp.first . I was surprized by the exception thrown, looked at the streambuffer .. internal buffer at was confused of its length (C# background :P).
Sorry for that, it's working as expected now.

Is there any way to determine how many characters will be written by sprintf?

I'm working in C++.
I want to write a potentially very long formatted string using sprintf (specifically a secure counted version like _snprintf_s, but the idea is the same). The approximate length is unknown at compile time so I'll have to use some dynamically allocated memory rather than relying on a big static buffer. Is there any way to determine how many characters will be needed for a particular sprintf call so I can always be sure I've got a big enough buffer?
My fallback is I'll just take the length of the format string, double it, and try that. If it works, great, if it doesn't I'll just double the size of the buffer and try again. Repeat until it fits. Not exactly the cleverest solution.
It looks like C99 supports passing NULL to snprintf to get the length. I suppose I could create a module to wrap that functionality if nothing else, but I'm not crazy about that idea.
Maybe an fprintf to "/dev/null"/"nul" might work instead? Any other ideas?
EDIT: Alternatively, is there any way to "chunk" the sprintf so it picks up mid-write? If that's possible it could fill the buffer, process it, then start refilling from where it left off.

The man page for snprintf says:
Return value
Upon successful return, these functions return the number of
characters printed (not including the trailing '\0' used to end
output to strings). The functions snprintf and vsnprintf do not
write more than size bytes (including the trailing '\0'). If
the output was truncated due to this limit then the return value
is the number of characters (not including the trailing '\0')
which would have been written to the final string if enough
space had been available. Thus, a return value of size or more
means that the output was truncated. (See also below under
NOTES.) If an output error is encountered, a negative value is
returned.
What this means is that you can call snprintf with a size of 0. Nothing will get written, and the return value will tell you how much space you need to allocate to your string:
int how_much_space = snprintf(NULL, 0, fmt_string, param0, param1, ...);

As others have mentioned, snprintf() will return the number of characters required in a buffer to prevent the output from being truncated. You can simply call it with a 0 buffer length parameter to get the required size then use an appropriately sized buffer.
For a slight improvement in efficiency, you can call it with a buffer that's large enough for the normal case and only do a second call to snprintf() if the output is truncated. In order to make sure the buffer(s) are properly freed in that case, I'll often use an auto_buffer<> object that handles the dynamic memory for me (and has the default buffer on the stack to avoid a heap allocation in the normal case).
If you're using a Microsoft compiler, MS has a non-standard _snprintf() that has serious limitations of not always null terminating the buffer and not indicating how big the buffer should be.
To work around Microsoft's non-support, I use a nearly public domain snprintf() from Holger Weiss.
Of course if your non-MS C or C++ compiler is missing snprintf(), the code from the above link should work just as well.

I would use a two-stage approach. Generally, a large percentage of output strings will be under a certain threshold and only a few will be larger.
Stage 1, use a reasonable sized static buffer such as 4K. Since snprintf() can restrict how many characters are written, you won't get a buffer overflow. What you will get returned from snprintf() is the number of characters it would have written if your buffer had been big enough.
If your call to snprintf() returns less than 4K, then use the buffer and exit. As stated, the vast majority of calls should just do that.
Some will not and that's when you enter stage 2. If the call to snprintf() won't fit in the 4K buffer, you at least now know how big a buffer you need.
Allocate, with malloc(), a buffer big enough to hold it then snprintf() it again to that new buffer. When you're done with the buffer, free it.
We worked on a system in the days before snprintf() and we acheived the same result by having a file handle connected to /dev/null and using fprintf() with that. /dev/null was always guaranteed to take as much data as you give it so we would actually get the size from that, then allocate a buffer if necessary.
Keep in kind that not all systems have snprintf() (for example, I understand it's _snprintf() in Microsoft C) so you may have to find the function that does the same job, or revert to the fprintf /dev/null solution.
Also be careful if the data can be changed between the size-checking snprintf() and the actual snprintf() to the buffer (i.e., wathch out for threads). If the sizes increase, you'll get buffer overflow corruption.
If you follow the rule that data, once handed to a function, belongs to that function exclusively until handed back, this won't be a problem.

For what it's worth, asprintf is a GNU extension that manages this functionality. It accepts a pointer as an output argument, along with a format string and a variable number of arguments, and writes back to the pointer the address of a properly-allocated buffer containing the result.
You can use it like so:
#define _GNU_SOURCE
#include <stdio.h>
int main(int argc, char const *argv[])
{
char *hi = "hello"; // these could be really long
char *everyone = "world";
char *message;
asprintf(&message, "%s %s", hi, everyone);
puts(message);
free(message);
return 0;
}
Hope this helps someone!

Take a look at CodeProject: CString-clone Using Standard C++. It uses solution you suggested with enlarging buffer size.
// -------------------------------------------------------------------------
// FUNCTION: FormatV
// void FormatV(PCSTR szFormat, va_list, argList);
//
// DESCRIPTION:
// This function formats the string with sprintf style format-specs.
// It makes a general guess at required buffer size and then tries
// successively larger buffers until it finds one big enough or a
// threshold (MAX_FMT_TRIES) is exceeded.
//
// PARAMETERS:
// szFormat - a PCSTR holding the format of the output
// argList - a Microsoft specific va_list for variable argument lists
//
// RETURN VALUE:
// -------------------------------------------------------------------------
void FormatV(const CT* szFormat, va_list argList)
{
#ifdef SS_ANSI
int nLen = sslen(szFormat) + STD_BUF_SIZE;
ssvsprintf(GetBuffer(nLen), nLen-1, szFormat, argList);
ReleaseBuffer();
#else
CT* pBuf = NULL;
int nChars = 1;
int nUsed = 0;
size_type nActual = 0;
int nTry = 0;
do
{
// Grow more than linearly (e.g. 512, 1536, 3072, etc)
nChars += ((nTry+1) * FMT_BLOCK_SIZE);
pBuf = reinterpret_cast<CT*>(_alloca(sizeof(CT)*nChars));
nUsed = ssnprintf(pBuf, nChars-1, szFormat, argList);
// Ensure proper NULL termination.
nActual = nUsed == -1 ? nChars-1 : SSMIN(nUsed, nChars-1);
pBuf[nActual+1]= '\0';
} while ( nUsed < 0 && nTry++ < MAX_FMT_TRIES );
// assign whatever we managed to format
this->assign(pBuf, nActual);
#endif
}

I've looked for the same functionality you're talking about, but as far as I know, something as simple as the C99 method is not available in C++, because C++ does not currently incorporate the features added in C99 (such as snprintf).
Your best bet is probably to use a stringstream object. It's a bit more cumbersome than a clearly written sprintf call, but it will work.

Since you're using C++, there's really no need to use any version of sprintf. The simplest thing to do is use a std::ostringstream.
std::ostringstream oss;
oss << a << " " << b << std::endl;
oss.str() returns a std::string with the contents of what you've written to oss. Use oss.str().c_str() to get a const char *. It's going to be a lot easier to deal with in the long run and eliminates memory leaks or buffer overruns. Generally, if you're worrying about memory issues like that in C++, you're not using the language to its full potential, and you should rethink your design.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Windows readfile with dynamic buffer - c++

Related

Buffer Overflow Sample: Why is this code dangerous?

proper way to use GNU/Linux read() function

(C/C++) Size of an array of chars

boost memorybuffer and char array

Is there any way to determine how many characters will be written by sprintf?

Categories

Resources