Hypothetical Memory Usage Monitoring Program - C++

Would it be at all possible (I don't care about practicality or usefulness) to write a C or C++ program that monitored memory usage in the following, very basic way?
Given that declaring a variable without assigning it a value results in it having the value of whatever is already at its memory location, one could create a large array (thousands or millions of elements) and leave all the values unassigned. Then to see if any of these elements have been overwritten, we would simply need to repeatedly compare their current values to a previous value.
I highly doubt this would be as simple as I posited above. Assuming my doubt is well-founded, where would the problem lie and, more importantly, would it be something we could circumvent with some creative or esoteric code? I imagine the problem would be something along the lines of the declared, uninitialized elements not allowing other system processes to write to their memory addresses. Please give me some pointers! (heehee) Thanks.

Let's say your program is in C.
How large an array you can create is limited by how much free memory is available and by whatever limits the OS places on your process.
So let's say you created a pretty large array (uninitialized).
That memory is now given to your process (the program you ran) and no other process can access it! (It's the OS's job to prevent that; it's a basic requirement of virtualization/memory protection.)
So, since no other process can access it, its values won't change once it has been allocated to you.
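To make that concrete, here is a minimal sketch of the monitoring idea as described (the array size and the names are arbitrary, and strictly speaking reading indeterminate values is undefined behaviour in C++, so treat it purely as an illustration). Because the pages are private to the process, the comparison never reports changes unless the process itself writes to the buffer:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t kCount = 1'000'000;             // arbitrary size for the sketch
    unsigned char* probe = new unsigned char[kCount]; // deliberately left uninitialized

    // Snapshot whatever happens to be in those bytes right now.
    std::vector<unsigned char> snapshot(probe, probe + kCount);

    // ... time passes, other programs run ...

    // Compare the current contents against the snapshot.
    std::size_t changed = 0;
    for (std::size_t i = 0; i < kCount; ++i)
        if (probe[i] != snapshot[i]) ++changed;

    // The OS maps these pages exclusively into this process, so no other
    // program can write to them; 'changed' stays 0 unless we modify probe.
    std::cout << changed << " bytes changed\n";

    delete[] probe;
}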

Related

Dynamic initialization of memory at a given memory address

Ok, this might seem odd, but please bear with me; I'm just a beginner. Over the past few days I have been trying to develop a general-purpose hash function for maintaining an associative array with a hash table, using all the best parts of hash functions like RS, JS, ELF, etc. to reduce hash collisions. But the problem is that, even now, to avoid an appreciable number of collisions I have to use unsigned long hash values of at least 6 digits.
Let's just assume I need to map names of students to their marks, so I maintain an integer array for the marks.
Now back to my question.
The idea I thought of was to use these hash values as the lower-order bits of an actual memory address, and then dynamically allocate memory large enough to store an integer for the marks obtained. This process is repeated for each new value added.
Now, assuming I somehow managed to avoid all memory locations that would be reserved by the OS:
Is there any viable way in C++ to dynamically allocate memory at an address we choose, rather than letting the new operator pick the location and return a pointer to it? (I'm using gcc.)
It is platform-dependent. On Unix systems, you might try using mmap(). The Windows equivalent is VirtualAlloc(). But there is no guarantee, since the address might already be in use.
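For example, on a POSIX system you can pass a desired address to mmap as a hint and then construct your object there with placement new. This is only a sketch under assumptions (a 4096-byte page and an arbitrarily chosen example address); without MAP_FIXED the kernel is free to place the mapping elsewhere, so you must check the returned pointer:

#include <sys/mman.h>
#include <cstdio>
#include <new>

int main() {
    // 'wanted' is an arbitrary example address; the kernel treats it only as a hint.
    void* wanted = reinterpret_cast<void*>(0x700000000000);
    void* got = ::mmap(wanted, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }
    std::printf("asked for %p, got %p\n", wanted, got);

    // Construct the "marks" integer in the mapped memory with placement new.
    int* marks = new (got) int(85);
    std::printf("stored %d at %p\n", *marks, static_cast<void*>(marks));

    ::munmap(got, 4096);
}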

Why does my dynamically allocated array get initialized to 0?

I have some code that creates a dynamically allocated array with
int *Array = new int[size];
From what I understand, Array should be a pointer to the first item of Array in memory. When using gdb, I can call x Array to examine the value at the first memory location, x Array+1 to examine the second, etc. I expect to have junk values left over from whatever application was using those spots in memory prior to mine. However, using x Array returns 0x00000000 for all those spots. What am I doing wrong? Is my code initializing all of the values of the Array to zero?
EDIT: For the record, I ask because my program is an attempt to implement this: http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/. I want to make sure that my algorithm isn't incrementing through the array to initialize every element to 0.
In most modern OSes, the OS gives zeroed pages to applications, as opposed to letting information seep between unrelated processes. That's important for security reasons, for example. Back in the old DOS days, things were a bit more casual. Today, with memory protected OSes, the OS generally gives you zeros to start with.
So, if this new happens early in your program, you're likely to get zeros. You'd be crazy to rely on that though; it's undefined behavior if you do.
If you keep allocating, filling, and freeing memory, eventually new will return memory that isn't zeroed. Rather, it'll contain remnants of your process' own earlier scribblings.
And there's no guarantee that any particular call to new, even at the beginning of your program, will return memory filled with zeros. You're just likely to see that for calls to new early in your program. Don't let that mislead you.
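A small experiment that illustrates the point (the output is not guaranteed, and reading indeterminate values is formally undefined behaviour, so this is only a demonstration, not something to rely on):

#include <iostream>

int main() {
    // A fresh allocation early in the program: on most modern OSes these
    // pages arrive zeroed, although the language guarantees nothing here.
    int* a = new int[1000];
    std::cout << "first allocation, a[0] = " << a[0] << '\n';

    // Scribble on the block, free it, and allocate again. The allocator will
    // often hand back the same block, so any "junk" you see is your own
    // process's earlier writes, not another program's data.
    for (int i = 0; i < 1000; ++i) a[i] = 0xABAB;
    delete[] a;

    int* b = new int[1000];
    std::cout << "second allocation, b[0] = " << std::hex << b[0] << '\n'; // often 0xabab, not guaranteed

    delete[] b;
}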
I expect to have junk values left over from whatever application was using those spots
It's certainly possible but by no means guaranteed. Particularly in debug builds, you're just as likely to have the runtime zero out that memory (or fill it with some recognisable bit pattern) instead, to help you debug things if you use the memory incorrectly.
And, really, "those spots" is a rather loose term, given virtual addressing.
The important thing is that, no, your code is not setting all those values to zero.

What does "STL allocate memory internally" means?

I was reading this answer and, maybe because I have never encountered these words before, I don't understand what the user meant in the first point of that answer. Can someone use simpler words or an example to show what that statement means?
When you use something like vector or map, it belongs to the STL (Standard Template Library). You don't need to allocate memory yourself as you do with raw arrays. In real programs, fixed-size arrays are often not sufficient, because you cannot determine the size in advance.
STL containers allocate memory internally as you add elements to them, so the memory management is handled for you. [If users allocate manually, the memory might not be enough if too little is allotted, or it gets wasted if too much is allotted.]
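A small sketch of what "allocates memory internally" looks like in practice: with a raw array you have to guess a size up front, while std::vector grows its internal buffer as elements are appended (the exact growth steps are implementation-defined):

#include <iostream>
#include <vector>

int main() {
    // With a raw array you must pick a size in advance:
    //   int* marks = new int[50];  // too small? too big? you have to guess, and delete[] later
    // A std::vector allocates and grows its buffer internally instead.
    std::vector<int> marks;
    for (int i = 0; i < 100; ++i) {
        marks.push_back(i);
        // capacity() is how much the container has allocated internally.
        if (marks.size() == marks.capacity())
            std::cout << "size " << marks.size()
                      << ", capacity " << marks.capacity() << '\n';
    }
    // The memory is released automatically when 'marks' goes out of scope.
}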

How to manage large arrays

I have a C++ program that uses several very large arrays of doubles, and I want to reduce the memory footprint of this particular part of the program. Currently I'm allocating 100 of them, and they can be 100 MB each.
Now, I do have the advantage that parts of these arrays eventually become obsolete during later parts of the program's execution, and there is little need to ever have the whole of any one of them in memory at any one time.
My question is this:
Is there any way of telling the OS after I have created the array with new or malloc that a part of it is unnecessary any more ?
I'm coming to the conclusion that the only way to achieve this is to declare an array of pointers, each of which may point to a chunk of, say, 1 MB of the desired array, so that old chunks that are not needed any more can be reused for new bits of the array. This seems to me like writing a custom memory manager, which feels like a bit of a sledgehammer and is going to create a performance hit as well.
I can't move the data in the arrays because that would cause too many thread-contention issues. The arrays may be accessed by any one of a large number of threads at any time, though only one thread ever writes to any given array.
It depends on the operating system. POSIX - including Linux - has the madvise system call for giving the kernel hints that can improve memory performance. From the man page:
The madvise() system call advises the kernel about how to handle paging input/output in the address range beginning at address addr and with size length bytes. It allows an application to tell the kernel how it expects to use some mapped or shared memory areas, so that the kernel can choose appropriate read-ahead and caching techniques. This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.
See the man page of madvise for more information.
Edit: Apparently, the above description was not clear enough. So, here are some more details, and some of them are specific to Linux.
You can use mmap to allocate a block of memory (directly from the OS instead of through libc) that is not backed by any file. For large allocations, malloc does exactly the same thing under the hood. You have to use munmap to release the memory, regardless of any madvise calls:
void* data = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// ...
::munmap(data, size);
If you want to get rid of some parts of this chunk, you can use madvise to tell the kernel to do so:
madvise(static_cast<unsigned char*>(data) + 7 * page_size,
3 * page_size, MADV_DONTNEED);
The address range is still valid, but it is no longer backed - neither by physical RAM nor by storage. If you access the pages later, the kernel will allocate new pages on the fly and re-initialize them to zero. Be aware that the MADV_DONTNEED pages still count toward the virtual memory size of the process. It might be necessary to make some configuration changes to the virtual memory management, e.g. activating over-commit.
It would be easier to answer if we had more details.
1°) The answer to the question "Is there any way of telling the OS after I have created the array with new or malloc that a part of it is unnecessary any more?" is "not really". That's the point of C and C++, and of any language that lets you handle memory manually.
2°) If you're using C++ and not C, you should not be using malloc.
3°) Nor arrays, unless for a very specific reason. Use a std::vector.
4°) Preferably, if you often need to change the contents and want to reduce the memory footprint, use a linked list (std::list), though it will be more expensive to access individual elements (it will be almost as fast if you only iterate through it).
A std::deque of pointers to std::array<double, LARGE_NUMBER> may do the job, but you are better off building a dedicated container around the deque, so you can remap the indexes and, most importantly, define when entries are no longer used.
The dedicated container can also contain a read/write lock, so it can be used in a thread-safe way.
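A hypothetical sketch of that dedicated container (the names, the chunk size of 1 MiB of doubles, and the lazy allocation policy are all illustrative choices; the read/write locking mentioned above is omitted for brevity):

#include <array>
#include <cstddef>
#include <deque>
#include <memory>

constexpr std::size_t kChunkSize = 131072;  // 131072 doubles = 1 MiB per chunk

class ChunkedArray {
public:
    explicit ChunkedArray(std::size_t n)
        : chunks_((n + kChunkSize - 1) / kChunkSize) {}

    double& at(std::size_t i) {
        auto& chunk = chunks_[i / kChunkSize];
        if (!chunk)  // allocate a chunk lazily on first access
            chunk = std::make_unique<std::array<double, kChunkSize>>();
        return (*chunk)[i % kChunkSize];
    }

    // Release one chunk once that region of the array has become obsolete;
    // the rest of the "array" keeps its addresses.
    void discard_chunk(std::size_t chunk_index) { chunks_[chunk_index].reset(); }

private:
    std::deque<std::unique_ptr<std::array<double, kChunkSize>>> chunks_;
};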
You could try using lists instead of arrays. Of course, a list is 'heavier' than an array, but on the other hand it is easy to rebuild a list so that you can throw away a part of it when it becomes obsolete. You could also use a wrapper that only contains indexes saying which part of the list is up to date and which part may be reused.
This will help you improve performance, but will require a little more (reusable) memory.
Allocating by chunk, and delete[]-ing and new[]-ing chunks as you go, seems like a good solution. It may let you do as little memory management as possible: do not reuse chunks yourself, simply deallocate old ones and allocate new chunks when needed.

Why does the compiler not complain about accessing elements beyond the bounds of a dynamic array? [duplicate]

This question already has answers here:
Accessing an array out of bounds gives no error, why?
(18 answers)
Closed 7 years ago.
I am defining an array of size 9, but when I access index 10 it does not give any error.
int main() {
    bool* isSeedPos = new bool[9];
    isSeedPos[10] = true;
}
I expected to get a compiler error, because there is no array element isSeedPos[10] in my array.
Why don't I get an error?
It's not a problem.
There is no bounds checking for C++ arrays. You are able to access elements beyond the array's limit, but doing so is undefined behaviour and will often cause an error at some point.
If you want to use a raw array, you have to check that you are not going out of bounds yourself (you can keep the size in a separate variable).
Of course, a better solution would be to use the standard library containers such as std::vector.
With std::vector you can either
use the myVector.at(i) method to get the i-th element (which will throw an exception if you are out of bounds), or
use myVector[i] with the same syntax as C-style arrays, but then you have to do the bounds checking yourself (e.g. check if (i < myVector.size()) before accessing it)
Also note that in your case, std::vector<bool> is a specialized version implemented so that each bool takes only one bit of memory (therefore it uses less memory than an array of bool, which may or may not be what you want).
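For instance, using the array from the question rewritten as a vector (a minimal sketch):

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<bool> isSeedPos(9);

    try {
        isSeedPos.at(10) = true;   // bounds-checked: throws std::out_of_range
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }

    std::size_t i = 10;
    if (i < isSeedPos.size())      // manual check before using operator[]
        isSeedPos[i] = true;
    else
        std::cout << "index " << i << " is out of bounds\n";
}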
Use std::vector instead. Some implementations will do bounds checking in debug mode.
No, the compiler is not required to emit a diagnostic for this case. The compiler does not perform bounds checking for you.
It is your responsibility to make sure that you don't write broken code like this, because the compiler will not error on it.
Unlike in other languages such as Java and Python, array access is not bounds-checked in C or C++. That makes accessing arrays faster. It is your responsibility to make sure that you stay within bounds.
However, in such a simple case such as this, some compilers can detect the error at compile time.
Also, some tools such as valgrind can help you detect such errors at run time.
What compiler/debugger are you using?
MSVC++ would complain about it and tell you that you are writing out of bounds of the array.
But it is not required to do so by the standard.
The code can crash at any time; it causes undefined behaviour.
Primitive arrays do not do bounds checking. If you want bounds checking, you should use std::vector instead. You are accessing invalid memory after the end of the array, and it is purely by luck that it works.
There is no runtime checking on the index you are giving, accessing element 10 is incorrect but possible. Two things can happen:
if you are "unlucky", this will not crash and will return some data located after your array.
if you are "lucky", the data after the array is not allocated by your program, so access to the requested address is forbidden. This will be detected by the operating system and will produce a "segmentation fault".
There is no rule stating that memory accesses are checked in C, plain and simple. When you ask for an array of bools, it might be faster for the operating system (or the allocator) to give you a 16- or 32-byte block instead of a 9-byte one. This means that you might not even be writing into or reading from someone else's space.
C++ is fast, and one of the reasons it is fast is that there are very few checks on what you are doing. If you ask for some memory, the language will assume that you know what you are doing, and if the operating system does not complain, then everything will run.
There is no problem, as far as the compiler is concerned! You are just accessing memory that you shouldn't access - memory just after the end of the array.
isSeedPos doesn't know how big the array is; it is just a pointer to a position in memory. When you access isSeedPos[10], the behaviour is undefined. Chances are that sooner or later this will cause a segfault, but there is no requirement for a crash, and there is certainly no standard error checking.
Writing to that position is dangerous.
But the compiler will let you do it - effectively you're writing past the end of the memory assigned to that array, which is not a good thing.
C++ isn't like many other languages - it assumes that you know what you are doing!
Both C and C++ let you write to arbitrary areas of memory. This is because they originally derived from (and are still used for) low-level programming, where you may legitimately want to write to a memory-mapped peripheral or similar, and because it's more efficient to omit bounds checking when the programmer already knows the index will be within bounds (e.g. for a loop over an array of size N, they know every index stays within the bounds, so checking each intermediate value is superfluous).
However, in truth, nowadays you rarely want to do that. If you use the arr[i] syntax, you essentially always want to write to the array declared in arr, and never do anything else. But you still can if you want to.
If you do write to arbitrary memory (as you do in this case) either it will be part of your program, and it will change some other critical data without you knowing (either now, or later when you make a change to the code and have forgotten what you were doing); or it will write to memory not allocated to your program and the OS will shut it down to prevent worse problems.
Nowadays:
Many compilers will spot it if you make an obvious mistake like this one
There are tools which will test if your program writes to unallocated memory
You can and should use std::vector instead, which is there for the 99% of the time you want bounds checking. (Check whether you're using at() or [] to access it)
This is not Java. In C or C++ there is no bounds checking; it's pure luck that you can write to that index.