Memory consumption by printf( ) - c++

Does displaying a simple statement in C (or C++) occupy some memory?
For example,
//in C
printf("\nHello World");
//in C++
cout<<"Hello World" ;
and, will it make a difference if I attach some value of a variable to be displayed in the same statement?
For example,
printf("Value is %d" , var) ;

Code occupies memory. String literals occupy memory. Function calls (usually) use some stack.
Generally speaking I don't think printf should need to perform any dynamic memory allocations in order to work. But although (I believe) it's possible to avoid it I don't think they're forbidden from doing so. The same goes for cout << when outputting the types that have built-in support. If it ends up calling a user-defined overload then that can use whatever memory it likes.
Posix lists ENOMEM as a possible error for printf but not for snprintf. This suggests that on Posix systems (which of course is not all C implementations) the output might dynamically allocate memory, but the formatting will not.

Does displaying a simple statement in C(or C++) occupies some memory?
Yes, of course. The string constant has to be stored somewhere, usually in a read-only segment of memory. The printf and cout facilities also take up space.
will it make a difference if i attach some value of a variable to be displayed in the same statement?
Yes. The parameters have to be stored somewhere, usually on the stack, so the memory used by the parameters will be returned after the printf or cout call ends. Also the call itself probably generates a few more instructions in the calling routine to push the parameters on the stack.

Also note that printf and cout are buffered write. Chances are if the buffer is not yet filled for output and your program quits which triggers a closure of output stream port, the print string will never show.

Well, yes. The characters in the string requires memory, so these objects are placed in the global memory, but not in the process heap. When the function is called, its arguments passes trough stack, and then the call happens.
push offset string "hello, world"
call dword ptr printf
We have its address, calculated by the offset directive. So it use sizeof(uintptr_t) bytes of stack memory.
[UPD.]
This code:
int val = 5;
printf("val = %d", val);
Disassembles to:
mov eax, dword ptr[val]
push eax
push offset string "val = %d"
call dword ptr printf
So it use 2 * sizeof(uintptr_t) bytes of stack memory.

If you mean heap memory, then the answer is, yes (at least, in general).
We produced some middleware to run on a variety of game consoles, with GCC, visual studio and Metrowerks compilers. A common constraint in the gaming industry is that you may need to allocate to custom heaps -- so we had to ensure that our middleware library made no heap allocations (other than to a heap allocator explicitly provided to us). So ensure this, we had to:
drop calls to print, vsprintf, ...
drop the use of C++ streams
both of these sets of calls (in general) use the heap. printf (in general) requires the heap because of the use of format strings.

Related

Segmentation fault - why and how does it work?

In both the functions defined below, it tries to allocate 10M of memory in the stack. But the segmentation fault happens only in the second case and not it the first and I am trying to understand why so.
Function definition 1:
a(int *i)
{
char iptr[50000000];
*i = 1;
}
Function definition 2:
a()
{
char c;
char iptr[5000000];
printf("&c = 0x%lx, iptr = 0x%x ... ", &c, iptr);
fflush(stdout);
c = iptr[0];
printf("ok\n");
}
According to my understanding in case of local variables that are not alloted memory dynamically are stored in stack section of the program. So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
Hence if above stated is true, then segmentation fault should occur in both the cases (i.e. also in case 1).
The website (http://web.eecs.utk.edu/courses/spring2012/cs360/360/notes/Memory/lecture.html) from where I picked this states that the segfault happens in function 2 in a when the code attempts to push iptr on the stack for the printf call. This is because the stack pointer is pointing to the void. Had we not referenced anything at the stack pointer, our program should have worked.
I need help understanding this last statement and my earlier doubt related to this.
So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
No, that cannot be done. When compiling a function, the compiler does not know what the call stack will be when the function is called, so it will assume that you know what you are doing (which might or not be the case). Also note that the amount of stack space may be affected by both compile time and runtime restrictions (in Linux you can set the stack size with ulimit on the shell that starts the process).
I need help understanding this last statement and my earlier doubt related to this.
I would not attempt to look too much into that statement, it is not standard but rather based on knowledge of a particular implementation that is not even described there, and thus is built on some assumptions that are not necessarily true.
It assumes that the act of allocating the array does not 'touch' the allocated memory (in some debug builds in some implementations that is false) and thus whether you attempt to allocate 1 byte or 100M if the data is not touched by your program the allocation is fine --this need not be the case.
It also assumes that the arguments of the function printf are passed in the stack (this is actually the case in all implementations I know, due to the variadic arguments nature of the function). With the previous assumption, the array would overflow the stack (assuming an stack of <10M), but would not crash as the memory is not accessed, but to be able to call printf the value of the argument would be pushed to the stack beyond the array. This will write to memory and that write will be beyond the allocated space for the stack and crash.
Again, all this is implementation, not defined by the language.
Error in your code is being thrown by the following code:
; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page. "**This line throws the error**"
jmp short cs10
_chkstk endp
end
From chkstk.asm file, which Provide stack checking on procedure entry. And this file explicitically defines:
_PAGESIZE_ equ 1000h
Now as a explanation of your problem This Question tells everything you need as mentioned by: Shafik Yaghmour
Your printf format string assumes that pointers, ints (%x), and longs (%lx) are all the same size; this may be false on your platform, leading to undefined behavior. Use %p instead. I intended to make this a comment, but can't yet.
I am surprised no one noticed that the first function allocates 10 times the space than the second function. There are seven zeros after 5 in the first function whereas the second function has six zeros after 5 :-)
I compiled it with gcc-4.6.3 and got segmentation fault on the first function but not on the second function. After I removed the additional zero in the first function, seg fault went away. Adding a zero in the second function introduced the seg fault. So at least in my case, the reason of this seg fault is that the program could not allocate the required space on the stack. I would be happy to hear about the observations that differ from the above.

Does buffer overflow happen in C++ strings?

This is concerning strings in C++. I have not touched C/C++ for a very long time; infact I did programming in those languages only for the first year in my college, about 7 years ago.
In C to hold strings I had to create character arrays(whether static or dynamic, it is not of concern). So that would mean that I need to guess well in advance the size of the string that an array would contain. Well I applied the same approach in C++. I was aware that there was a std::string class but I never got around to use it.
My question is that since we never declare the size of an array/string in std::string class, does a buffer overflow occur when writing to it. I mean, in C, if the array’s size was 10 and I typed more than 10 characters on the console then that extra data would be writtein into some other object’s memory place, which is adjacent to the array. Can a similar thing happen in std::string when using the cin object.
Do I have to guess the size of the string before hand in C++ when using std::string?
Well! Thanks to you all. There is no one right answer on this page (a lot of different explanations provided), so I am not selecting any single one as such. I am satisfied with the first 5. Take Care!
Depending on the member(s) you are using to access the string object, yes. So, for example, if you use reference operator[](size_type pos) where pos > size(), yes, you would.
Assuming no bugs in the standard library implementation, no. A std::string always manages its own memory.
Unless, of course, you subvert the accessor methods that std::string provides, and do something like:
std::string str = "foo";
char *p = (char *)str.c_str();
strcpy(p, "blah");
You have no protection here, and are invoking undefined behaviour.
The std::string generally protects against buffer overflow, but there are still situations in which programming errors can lead to buffer overflows. While C++ generally throws an out_of_range exception when an operation references memory outside the bounds of the string, the subscript operator [] (which does not perform bounds checking) does not.
Another problem occurs when converting std::string objects to C-style strings. If you use string::c_str() to do the conversion, you get a properly null-terminated C-style string. However, if you use string::data(), which writes the string directly into an array (returning a pointer to the array), you get a buffer that is not null terminated. The only difference between c_str() and data() is that c_str() adds a trailing null byte.
Finally, many existing C++ programs and libraries have their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand the semantics of the selected library. Generally speaking, libraries should be evaluated based on how easy or complex they are to use, the type of errors that can be made, how easy these errors are to make, and what the potential consequences may be.
refer https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/295-BSI.html
In c the cause is explained as follow:
void function (char *str) {
char buffer[16];
strcpy (buffer, str);
}
int main () {
char *str = "I am greater than 16 bytes"; // length of str = 27 bytes
function (str);
}
This program is guaranteed to cause unexpected behavior, because a string (str) of 27 bytes has been copied to a location (buffer) that has been allocated for only 16 bytes. The extra bytes run past the buffer and overwrites the space allocated for the FP, return address and so on. This, in turn, corrupts the process stack. The function used to copy the string is strcpy, which completes no checking of bounds. Using strncpy would have prevented this corruption of the stack. However, this classic example shows that a buffer overflow can overwrite a function's return address, which in turn can alter the program's execution path. Recall that a function's return address is the address of the next instruction in memory, which is executed immediately after the function returns.
here is a good tutorial that can give your answer satisfactory.
In C++, the std::string class starts out with a minimum size (or you can specify a starting size). If that size is exceeded, std::string allocates more dynamic memory.
Assuming the library providing std::string is correctly written, you cannot cause a buffer overflow by adding characters to a std::string object.
Of course, bugs in the library are not impossible.
"Do buffer overflows occur in C++ code?"
To the extent that C programs are legal C++ code (they almost all are), and C programs have buffer overflows, C++ programs can have buffer overflows.
Being richer than C, I'm sure C++ can have buffer overflows in ways that C cannot :-}

may a whole array reside in some cpu register?

As I'm not too much familiar with cpu registers, in general and in any architecture specially x86 and if compiler-relevant using VC++ I'm curious that is it possible for all elements of an array with a tiny number of elements like an array of 1-byte characters with 4 elements to reside in some cpu register as I know this could be true for single primitives like double, integer, etc ?
when we have a parameter like below:
void someFunc(char charArray[4]){
//whatever
}
Will this parameter passing be definitely done through passing a pointer to the function or that array would be residing in some cpu register eliminating the need to pass a pointer to main memory?
This is not compiler dependent, nor is it possible. Arrays cannot be passed by value in the same way as other types, i.e. they cannot be copied when passed into a function. The C++ standard is clear in that when processing a function signature in a declaration the following are exact equivalencies:
void foo( char *a );
void foo( char a[] );
void foo( char a[4] );
void foo( char a[ 100000 ] );
A compliant compiler will convert the array in the function signature into a pointer. Now, at the place of call, a similar operation takes place: if the argument is an array, the compiler has to decay it into a pointer to the first element. Again, the size of the array is lost in the decay.
Specific registers can be used to hold more than one value and perform operations on them (google for vectorized operations, MME and variants). But while that means that the compiler can actually insert the contents of a small array into a single register, that cannot be used to change the function call that you refer to.
Within a single function, an array could be held in one or more registers, just so long as the compiler is able to produce CPU instructions to manipulate it as the code dictates. The standard doesn't really define what it means for something to "be" in a register. It's a private matter between the compiler and the debugger, and there may be a fine line between something being in a register, and being "optimized away" entirely.
In your example, the parameter is a pointer, not an array (see dribeas' answer). So it would be unusual that the array it points to could possibly be held a register. The "main" architectures that you probably deal with don't allow a pointer to a register, so even if the array was held in a register in the calling code, it would have to be written into memory in order to take a pointer to it, to pass to the callee.
If the function call was inlined, then better optimizations might be possible, just as if there were no call at all.
If you wrap your array in a struct, then you turn it into something that can be passed by value:
struct Foo {
char a[4];
};
void FooFunc(Foo f) {
// whatever
}
Now, the function is taking the actual array data as its parameter, so there's one less barrier to holding it in a register. Whether the implementation's calling convention actually does pass small structs in registers is another question, though. I don't know what calling conventions do this, if any.
Out of the 5 or so compilers I'm fairly familiar with, (Borland/Turbo C/C++ from 1.0, Watcom C/C++ from v8.0, MSC from 5.0, IBM Visual Age C/C++, gcc of various versions on DOS, Linux and Windows) I've not seen this optimization happen naturally.
There was a string library, whose name I cannot remember, that did optimizations similar to this in x86 ASM. It may have been part of the "Spontaneous Assembly" library, but no guarantees.
A function that accepts an array is probably going to index into that array. I know of no architecture that supports efficient indexing into a register, so it's probably pointless to pass arrays in registers.
(On an x86 architecture, you could access a[0] and a[1] by accessing al and ah of the eax register, but that is a special case that only works if the indexes are known at compile time.)
You asked if its possible with VC++ on an x86.
I doubt it's possible in that configuration. True, you could produce assembler code where that array is kept in a register, but due to the nature of arrays it would be by no means a natural optimization for a compiler, so I doubt they put it in.
You can try it out though and produce some code where the compiler would have an "incentive" to put it in a register, but it would look pretty weird like
char x[4];
*((int*)x) = 36587467;
Compile that with optimizations and the /FA switch and look at the assembler code produced (and then tell us the results :-))
If you use it in a more "natural" way, like accessing single characters or initializing it with a string there is no reason at all for the compiler to put that array into a register.
Even when passing it to a function - the compiler might put the address of the array into the register, but not the array itself
Only variables can be stored in a register. You can try to force register storage by using the register keyword: register int i;
Arrays are by default pointers.
You can get the value located at the 4 position like this (using pointer syntax):
char c = *(charArray + 4);

off-by-one error with string functions (C/C++) and security potentials

So this code has the off-by-one error:
void foo (const char * str) {
char buffer[64];
strncpy(buffer, str, sizeof(buffer));
buffer[sizeof(buffer)] = '\0';
printf("whoa: %s", buffer);
}
What can malicious attackers do if she figured out how the function foo() works?
Basically, to what kind of security potential problems is this code vulnerable?
I personally thought that the attacker can't really do anything in this case, but I heard that they can do a lot of things even if they are limited to work with 1 byte.
The only off-by-one error I see here is this line:
buffer[sizeof(buffer)] = '\0';
Is that what you're talking about? I'm not an expert on these things, so maybe I've overlooking something, but since the only thing that will ever get written to that wrong byte is a zero, I think the possibilities are quite limited. The attacker can't control what's being written there. Most likely it would just cause a crash, but it could also cause tons of other odd behavior, all of it specific to your application. I don't see any code injection vulnerability here unless this error causes your app to expose another such vulnerability that would be used as the vector for the actual attack.
Again, take with a grain of salt...
Read Shell Coder's Handbook 2nd Edition for lots of information.
Disclaimer: This is inferred knowledge from some research I just did, and should not be taken as gospel.
It's going to overwrite part or all of your saved frame pointer with a null byte - that's the reference point that your calling function will use to offset it's memory accesses. So at that point the calling function's memory operations are going to a different location. I don't know what that location will be, but you don't want to be accessing the wrong memory. I won't say you can do anything, but you might be able to do something.
How do I know this (really, how did I infer this)? Smashing the stack for Fun and Profit by Aleph One. It's quite old, and I don't know if Windows or Compilers have changed the way the stack behaves to avoid these problems. But it's a starting point.
example1.c:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
To understand what the program does to call function() we compile it with
gcc using the -S switch to generate assembly code output:
$ gcc -S -o example1.s example1.c
By looking at the assembly language output we see that the call to
function() is translated to:
pushl $3
pushl $2
pushl $1
call function
This pushes the 3 arguments to function backwards into the stack, and
calls function(). The instruction 'call' will push the instruction pointer
(IP) onto the stack. We'll call the saved IP the return address (RET). The
first thing done in function is the procedure prolog:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
This pushes EBP, the frame pointer, onto the stack. It then copies the
current SP onto EBP, making it the new FP pointer. We'll call the saved FP
pointer SFP. It then allocates space for the local variables by subtracting
their size from SP.
We must remember that memory can only be addressed in multiples of the
word size. A word in our case is 4 bytes, or 32 bits. So our 5 byte buffer
is really going to take 8 bytes (2 words) of memory, and our 10 byte buffer
is going to take 12 bytes (3 words) of memory. That is why SP is being
subtracted by 20. With that in mind our stack looks like this when
function() is called (each space represents a byte):
bottom of top of
memory memory
buffer2 buffer1 sfp ret a b c
<------ [ ][ ][ ][ ][ ][ ][ ]
top of bottom of
stack stack
What can malicious attackers do if she
figured out how the function foo()
works? Basically, to what kind of
security potential problems is this
code vulnerable?
This is probably not the best example of a bug that could be easily exploited for security purposes although it could exploited to potentially crash the code simply by using a string of 64-characters or longer.
While it certainly is a bug that will corrupt the address immediately after the array (on the stack) with a single zero byte, there is no easy way for a hacker to inject data into the corrupted area. Calling the printf() function will push parameters on the stack and may clear the zero that was written out of array bounds and lead to a potentially unterminated string being passed to printf.
However, without intimate knowledge of what goes on in printf (and needing to exploit printf as well as foo), a hacker would be hard pressed to do anything other than crash your code.
FWIW, this is a good reason to compile with warnings on or to use functions like strncpy_s which both respects buffer size and also includes a terminating null even if the copied string is larger than the buffer. With strncpy_s, the line "buffer[sizeof(buffer)] = '\0';" is not even necessary.
The issue is that you don't have permission to write to the item after the array. When you asked for 64 chars for buffer, the system is required to give you at least 64 bytes. It's normal for the system to give you more than that -- in which case the memory belongs to you and there is no problem in practice -- but it is possible that even the first byte after the array belongs to "somebody else."
So what happens if you overwrite it? If the "somebody else" is actually inside your program (maybe in a different structure or thread) the operating system probably won't notice you trampled on that data, but that other structure or thread might. There's no telling what data should be there or how trampling over it will affect things.
In this case you allocated buffer on the stack, which means (1) the somebody else is you, and in fact is your current stack frame, and (2) it's not in another thread (but could affect other local variables in the current stack frame).

Using sprintf without a manually allocated buffer

In the application that I am working on, the logging facility makes use of sprintf to format the text that gets written to file. So, something like:
char buffer[512];
sprintf(buffer, ... );
This sometimes causes problems when the message that gets sent in becomes too big for the manually allocated buffer.
Is there a way to get sprintf behaviour without having to manually allocate memory like this?
EDIT: while sprintf is a C operation, I'm looking for C++ type solutions (if there are any!) for me to get this sort of behaviour...
You can use asprintf(3) (note: non-standard) which allocates the buffer for you so you don't need to pre-allocate it.
No you can't use sprintf() to allocate enough memory. Alternatives include:
use snprintf() to truncate the message - does not fully resolve your problem, but prevent the buffer overflow issue
double (or triple or ...) the buffer - unless you're in a constrained environment
use C++ std::string and ostringstream - but you'll lose the printf format, you'll have to use the << operator
use Boost Format that comes with a printf-like % operator
I dont also know a version wich avoids allocation, but if C99 sprintfs allows as string the NULL pointer. Not very efficient, but this would give you the complete string (as long as enough memory is available) without risking overflow:
length = snprintf(NULL, ...);
str = malloc(length+1);
snprintf(str, ...);
"the logging facility makes use of sprintf to format the text that gets written to file"
fprintf() does not impose any size limit. If you can write the text directly to file, do so!
I assume there is some intermediate processing step, however. If you know how much space you need, you can use malloc() to allocate that much space.
One technique at times like these is to allocate a reasonable-size buffer (that will be large enough 99% of the time) and if it's not big enough, break the data into chunks that you process one by one.
With the vanilla version of sprintf, there is no way to prevent the data from overwriting the passed in buffer. This is true regardless of wether the memory was manually allocated or allocated on the stack.
In order to prevent the buffer from being overwritten you'll need to use one of the more secure versions of sprintf like sprintf_s (windows only)
http://msdn.microsoft.com/en-us/library/ybk95axf.aspx