Is memory usage the same as data size? - c++

I'm doing a performance test of the SHA3 algorithm on a variable: I'm measuring the algorithm's execution time for different sizes of that variable. For this I'm using the char type and increasing its size, but I don't know if I'm doing it right. I'll use the line of code below to explain my doubt.
char s[1000] = "A text";
I know that each char has a size of 1 byte. My question is: when I declare an array like this, is the size of the variable the declared length, in this case 1000? Or is the size of the variable determined by its contents, in this case the text, which would be 6 bytes?
Is the test I'm doing right? Or does the allocated memory size not matter for the performance of SHA3? (I ask this because I intend to do the same test with larger values. If I want to run this test with 20 KBytes, for example, will I have to fill the variable with 20000 characters?)
I'm using C++.

The amount of memory allocated on the stack by that line of code will be 1,000 bytes. However, what you send to your SHA3 code may only be the number of bytes of the string "A text", depending on how you're calling it, and how it uses the data. If it calculates the length of the string using a function like strlen(), then it will likely only iterate over the 6 characters (and 1 NUL byte) of the string and ignore the remaining 993 bytes. So it really depends on how you're using it and how you're calculating the size for your tests.
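For example, a rough sketch of the difference (sha3_256 here is only a stand-in for whatever SHA3 function you actually call; the names are made up and it just reports how many bytes it was given):

#include <cstddef>
#include <cstring>
#include <iostream>

// Stand-in for the real SHA3 routine -- replace with your library's function.
void sha3_256(const unsigned char* /*data*/, std::size_t len) {
    std::cout << "hashing " << len << " bytes\n";
}

int main() {
    char s[1000] = "A text";

    // Hashes only the string's characters: strlen(s) == 6 (it stops at the NUL).
    sha3_256(reinterpret_cast<const unsigned char*>(s), std::strlen(s));

    // Hashes the whole buffer: sizeof(s) == 1000, including the unused tail.
    sha3_256(reinterpret_cast<const unsigned char*>(s), sizeof(s));

    // For a 20 KB test, fill the buffer so the hashed length really is 20000 bytes.
    static char big[20000];
    std::memset(big, 'A', sizeof(big));
    sha3_256(reinterpret_cast<const unsigned char*>(big), sizeof(big));
}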

Limit on vectors in c++

I have a question regarding vectors in C++. I know that, unlike arrays, there is no fixed limit on vectors. I have a graph with 6 million vertices and I am using a vector of a class. When I try to insert nodes into the vector it fails with a bad memory allocation error, whereas it works perfectly with 2 million nodes. I know a bad allocation can mean a failure caused by the pointers I am using in my code, but that does not seem to be the case here. My question: is it possible that it is failing because of the large size of the graph, i.e. some limit on the vector has been exceeded? If so, is there any way to increase that limit?
First of all you should verify how much memory a single element requires. What is the size of one vertex/node? (You can verify that by using the sizeof operator). Consider that if the answer is, say, 50 bytes, you need 50 bytes times 6 million vertices = 300 MBytes.
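For instance, a quick back-of-the-envelope check (Vertex here is a made-up placeholder; substitute your own node class to get a real number):

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical vertex type -- substitute your real class.
struct Vertex {
    std::int32_t id;
    double weight;
    std::vector<int> neighbours;  // only the vector object itself is counted here,
                                  // not whatever it allocates separately on the heap
};

int main() {
    const std::size_t count = 6000000;
    std::cout << "sizeof(Vertex) = " << sizeof(Vertex) << " bytes\n";
    std::cout << "contiguous storage needed: "
              << (sizeof(Vertex) * count) / (1024.0 * 1024.0) << " MB\n";
}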
Then consider the next problem: in a vector the memory must be contiguous. This means your program will ask the OS for a contiguous chunk of 300 MBytes, and there's no guarantee such a chunk is available even if the total free memory is more than 300 MB. You might have to split your data, or choose another, non-contiguous container. Address-space fragmentation is largely outside your control, which means a program that works on one run might fail on another (or vice versa).
Another possible approach is to resize the vector manually, instead of letting it choose its new size automatically. The vector tries to anticipate some future growth, so if it has to grow it will try to allocate more capacity than is needed. This extra capacity might be the difference between having enough memory and not having it. You can use std::vector::reserve for this, though I think the exact behaviour is implementation dependent - it might still decide to reserve more than the amount you have requested.
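A minimal sketch of that approach, assuming the elements are plain ints:

#include <iostream>
#include <new>
#include <vector>

int main() {
    std::vector<int> nodes;
    try {
        // Ask for the capacity we expect to need up front, instead of letting
        // push_back grow the buffer geometrically (which can overshoot).
        nodes.reserve(6000000);
    } catch (const std::bad_alloc&) {
        std::cerr << "could not get a contiguous block of that size\n";
        return 1;
    }
    std::cout << "capacity after reserve: " << nodes.capacity() << '\n';
}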
One more option you have is to optimize the data types you are using. For example, if inside your vertex class you are using 32-bit integers where you only need 16 bits, you could use int16_t, which takes half the space. See the full list of fixed-width integer types on cppreference.com.
There is std::vector::max_size that you can use to see the maximum number of elements the vector you declared can potentially hold.
Return maximum size
Returns the maximum number of elements that the vector can hold.
This is the maximum potential size the container can reach due to known system or library implementation limitations, but the container is by no means guaranteed to be able to reach that size: it can still fail to allocate storage at any point before that size is reached.
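For example:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    // Theoretical upper bound only -- an allocation can still fail long before this.
    std::cout << "max_size: " << v.max_size() << '\n';
}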

Is dynamic memory deletion also possible in arrays?

Suppose I've declared a character array and read a string from the user as follows:
char s[100000];
std::cin>>s;
Now say the user has entered the string "Program". My character array will be as follows:
'P''r''o''g''r''a''m''\0'......(99992 remaining indices with no/junk values)
Is there a way to free the space occupied by those 99992 indices? Similarly, if I have an integer array of size, say, 100000 and I'm using only the first 10 indices at run time, is there a way to resize the array during the run time of my program? I know we can use vectors for this purpose, but is it possible somehow with arrays? For integer arrays, I know we may allocate them dynamically and set the size as per our requirement, but say I have an array of 10 integers as follows:
1 2 3 4 5 6 7 8 9 10
Now I want to use only the first 9 indices and want to, in effect, delete the 10th index. In other words, along with dynamic allocation, is dynamic deletion also possible with arrays?
EDIT:
I know this is possible using the STL, but I want to know if we can do the same thing with plain arrays.
No.
If you have arrays defined with a fixed size, you cannot release part of those arrays at run-time. Use a dynamically allocated array of some sort — probably a string or vector<int> for your two example arrays respectively, though a vector<char> might also work sufficiently well for you.
When you write:
char s[100000];
You are telling the compiler to reserve 100000 bytes on the program stack.
However, when you allocate memory dynamically:
char* s = new char[100000];
you are asking the system to reserve 100000 bytes on the heap, so you can manage that memory however you want and even give it back to the system (with delete[] s;) when you no longer need it.
You can't free memory on the stack until its local context ends, for example by returning from the function where you declared char s[100000].
Check this question:
What and where are the stack and heap?
std::string is implemented using dynamic memory allocation at the heap and that is why it allows you to reduce its size.
That is not possible.
You could wrap the capture of the user's input in a subroutine that uses a stack buffer internally and then allocates heap memory of the actual required length, as in the sketch below.
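A rough sketch of that idea, using std::string for the heap copy (for very short input the string may keep the data inline rather than on the heap, and the stack buffer still has no bounds check, just like the original code):

#include <iostream>
#include <string>

// Reads one whitespace-delimited token into a large stack buffer that only
// lives for the duration of this call, then returns a heap copy of the
// exact length actually entered.
std::string read_token() {
    char buffer[100000];         // released automatically when the function returns
    std::cin >> buffer;          // no bounds check here, same caveat as the original
    return std::string(buffer);  // allocation sized to the actual input
}

int main() {
    std::string s = read_token();
    std::cout << s << " (" << s.size() << " characters kept)\n";
}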
You are confused over when to use static allocation and when to use dynamic allocation.
Static allocation is used when the maximum number of items is known in advance, at compile-time.
Dynamic allocation is used when the number of items is unknown until run-time.
There exists no other case than the two above. You cannot mix them and it wouldn't make sense to do so.
The only case where you should allocate a static array char s[100000]; is the case where you know, at some point, that there will be 100000 items that the program needs to handle.
You designed your program to handle the worst case of 100000 items. It must still be able to handle that many. If the program needs to have an array of variable, unknown size, you should have used dynamic allocation.
If we ignore that C++ exists, then what you would have done in C is this:
char* s = malloc(sizeof(*s) * 100000);
...
s = realloc(s, some_str_length);
Please note that huge fixed-size arrays allocated on the stack are bad practice on many operating systems. So you might have to allocate the 100000-byte array on the heap anyway, even though you won't resize it, simply because there is likely not enough stack space in your process for large, bulky variables like that.
(Also, because of the way C++ is designed, std::string and std::vector etc are always implemented with dynamic memory internally, even if you only use them with one fixed size.)

Why is the heap after array allocation so large

I've got a very basic application that boils down to the following code:
char* gBigArray[200][200][200];

unsigned int Initialise(){
    for(int ta=0;ta<200;ta++)
        for(int tb=0;tb<200;tb++)
            for(int tc=0;tc<200;tc++)
                gBigArray[ta][tb][tc]=new char;
    return sizeof(gBigArray);
}
The function returns the expected value of 32000000 bytes, which is approximately 30 MB, yet Windows Task Manager (granted, it's not 100% accurate) shows a Memory (Private Working Set) value of around 157 MB. I've loaded the application into VMMap by SysInternals and have the following values:
I'm unsure what Image means (listed under Type), although irrelevant of that its value is around what I'm expecting. What is really throwing things out for me is the Heap value, which is where the apparent enormous size is coming from.
What I don't understand is why this is? According to this answer if I've understood it correctly, gBigArray would be placed in the data or bss segment - however I'm guessing as each element is an uninitialised pointer it would be placed in the bss segment. Why then would the heap value be larger by a silly amount than what is required?
It doesn't sound silly if you know how memory allocators work. They keep track of the allocated blocks so there's a field storing the size and also a pointer to the next block, perhaps even some padding. Some compilers place guarding space around the allocated area in debug builds so if you write beyond or before the allocated area the program can detect it at runtime when you try to free the allocated space.
You are allocating one char at a time, and there is typically a space overhead per allocation.
Allocate the memory in one big chunk (or at least in a few chunks) instead.
Do not forget that char* gBigArray[200][200][200]; allocates space for 200*200*200 = 8,000,000 pointers, each of word size. That is 32 MB on a 32-bit system.
Add another 8,000,000 chars to that for another 8 MB. Since you are allocating them one by one, the allocator probably can't hand them out at one byte per item, so each will likely take at least a word as well, resulting in another 32 MB (on a 32-bit system).
The rest is probably overhead, which is also significant because the C++ system must remember how many elements an array allocated with new contains for delete [].
Owww! My embedded systems stuff would roll over and die if faced with that code. Each allocation has quite a bit of extra info associated with it and either is spaced to a fixed size, or is managed via a linked list type object. On my system, that 1 char new would become a 64 byte allocation out of a small object allocator such that management would be in O(1) time. But in other systems, this could easily fragment your memory horribly, make subsequent new and deletes run extremely slowly O(n) where n is number of things it tracks, and in general bring doom upon an app over time as each char would become at least a 32 byte allocation and be placed in all sorts of cubby holes in memory, thus pushing your allocation heap out much further than you might expect.
Do a single large allocation and map your 3D array over it if you need to with a placement new or other pointer trickery.
Allocating 1 char at a time is probably more expensive. Each allocation carries a metadata header, and a 1-byte payload is smaller than that header, so you might actually save space by doing one large allocation (if possible); that way you avoid the overhead of each individual allocation having its own metadata.
Perhaps this is an issue of memory stride? What size of gaps are between values?
30 MB is for the pointers. The rest is for the storage you allocated with the new call that the pointers are pointing to. Compilers are allowed to allocate more than one byte for various reasons, like to align on word boundaries, or give some growing room in case you want it later. If you want 8 MB worth of characters, leave the * off your declaration for gBigArray.
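In other words, something like this (a sketch of that last suggestion):

// 200*200*200 = 8,000,000 chars stored directly: about 8 MB total, with no
// per-element heap allocations and no extra 32/64 MB of pointers on top.
char gBigArray[200][200][200];

unsigned int Initialise(){
    for(int ta=0;ta<200;ta++)
        for(int tb=0;tb<200;tb++)
            for(int tc=0;tc<200;tc++)
                gBigArray[ta][tb][tc]=0;  // use the elements directly, no new needed
    return sizeof(gBigArray);             // now 8,000,000 bytes
}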
Edited out of the above post into a community wiki post:
As the answers below say, the issue here is I am creating a new char 200^3 times, and although each char is only 1 byte, there is overhead for every object on the heap. It seems creating a char array for all chars knocks the memory down to a more believable level:
char* gBigArray[200][200][200];
char* gCharBlock=new char[200*200*200];

unsigned int Initialise(){
    unsigned int mIndex=0;
    for(int ta=0;ta<200;ta++)
        for(int tb=0;tb<200;tb++)
            for(int tc=0;tc<200;tc++)
                gBigArray[ta][tb][tc]=&gCharBlock[mIndex++];
    return sizeof(gBigArray);
}

speed of handling input: growing an array, or counting input, allocating, and then reading

Basically I am wondering what would be a faster way of handling input from standard input:
Method one: declare an array of some arbitrary size and read into it; if the input exceeds that size, allocate a new array of twice the size, copy the contents into the new array, and deallocate the previous one.
Method two: read the whole input once, counting the number of lines as you go; seek back to the top of the input, declare an array sized to the number of lines, and then read the input into that array.
some background:
I'm not using vectors. Please don't say to just use vectors...
The input won't be typed; it will be redirected from a file on the command line, akin to ./program < input.txt
I understand that the first method is less efficient in terms of space, but is it faster than method two? If so, by how much? Method two essentially takes 2n time to finish. I want to know whether the first method would increase the runtime of my code.
Both methods are O(n). However, you're reading from stdin, so there's no way to rewind it back to the beginning unless something is already storing the data somewhere, so I don't see how you could use method 2.
You would need to use method 1. If you can use realloc, it might not even have to do any copying. If you're worried about the extra copying, you can store the items in a linked-list of buffers of exponentially increasing size, then create a single array at the end and copy each one only once.
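A rough sketch of method one with a doubling array (assuming whole lines are the unit of input):

#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

int main() {
    std::size_t capacity = 16;   // arbitrary starting size
    std::size_t count = 0;
    std::string* lines = new std::string[capacity];

    std::string line;
    while (std::getline(std::cin, line)) {
        if (count == capacity) {
            // Double the buffer and move the old contents over. The total copying
            // work across all doublings is O(n), so appends stay amortized O(1).
            capacity *= 2;
            std::string* bigger = new std::string[capacity];
            for (std::size_t i = 0; i < count; ++i)
                bigger[i] = std::move(lines[i]);
            delete[] lines;
            lines = bigger;
        }
        lines[count++] = line;
    }

    std::cout << count << " lines read\n";
    delete[] lines;
}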

C++: How does string vectors' random access time work?

I know a simple int vector has O(1) random access time, since it is easy to compute the position of the xth element, given that all elements have the same size.
Now what about a string vector?
Since the string lengths vary, it can't have O(1) random access time, can it? If it can, what is the logic behind it?
Thanks.
Update:
The answers are very clear and concise, thank you all for the help.
I accepted Joey's answer because it is simple and easy to understand.
The vector does have O(1) access time.
String objects are all the same size (on a given implementation), regardless of the size of the string they represent. Typically the string object contains a pointer to allocated memory which holds the string data.
So, if s is a std::string, then sizeof s is constant and equal to sizeof(std::string), but s.size() depends on the string value. The vector only cares about sizeof(std::string).
The string objects (each of which holds a pointer to its character data) are stored contiguously in one place; the character data itself may be anywhere in memory. So you still get O(1) random access time.
-----------------------------
| 4000 | 4200 | 5000 | 6300 |  <- data
-----------------------------
 [1000]  [1004]  [1008]  [1012]  <- address
 [4000]  [4200]  [5000]  [6300]  <- starting address
"string1" "string2" "string3" "string4"  <- string
Because the string object has a fixed size, just like any other type. The difference is that a string object stores its character data on the heap and keeps a pointer to it, and that pointer is fixed in size.
The actual string data held by a std::string is usually reached through a pointer. The sizeof a string object is always the same, even if the length of the string it holds varies.
You've gotten a number of answers (e.g., Steve Jessop's and AraK's) that are mostly correct already. I'll add just one minor detail: many current implementations of std::string use what's called a short string optimization (SSO), which means they allocate a small, fixed, amount of space in the string object itself that can be used to store short strings, and only when/if the length exceeds what's allocated in the string object itself does it actually allocate separate space on the heap to store the data.
As far as a vector of strings goes, this makes no real difference: each string object has a fixed size regardless of the length of the string itself. The difference is that with SSO that fixed size is larger -- and in many cases the string object does not have a block allocated on the heap to hold the actual data.
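A small demonstration of the point (the exact numbers vary by implementation):

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string shortOne = "hi";
    std::string longOne(100000, 'x');

    // Both string objects are the same size; the long one just points to a
    // separate heap buffer, while the short one may fit in the SSO buffer.
    std::cout << sizeof(shortOne) << " == " << sizeof(longOne) << '\n';

    // The vector stores these fixed-size string objects contiguously, so
    // element i is always at a constant offset: O(1) random access.
    std::vector<std::string> v{shortOne, longOne};
    std::cout << "stored element size: " << sizeof(v[0]) << '\n';
}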