I need to allocate space for a temporary array once per iteration. I try to use realloc each iteration to optimize memory usage, like this:
int *a = (int*)std::malloc(2 * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
a = (int*)std::realloc(m * sizeof(int));
// ...
}
std::free(a);
N is a big number, 1000000 for example. Here are some example m values per iteration: 8, 2, 6, 10, 4, 8.
Am I doing right when I realloc a at each iteration? Does it prevent redundant memory allocation?
Firstly, realloc takes 2 parameters. First is the original pointer and the second is the new size. You are trying to pass the size as the original pointer and the code shouldn't compile.
Secondly, the obligatory reminder: Don't optimize prematurely. Unless you've measured and found that the allocations are a bottleneck, just use std::vector.
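For illustration, a minimal sketch of what the std::vector version of that loop might look like (process_all is a hypothetical wrapper; N and calculate_enough_size come from your code):

#include <vector>

int calculate_enough_size();   // assumed to exist, as in the question

void process_all(int N)
{
    std::vector<int> a;
    for (int i = 0; i < N; ++i)
    {
        int m = calculate_enough_size();
        a.resize(m);   // only reallocates when m exceeds the current capacity
        // ... fill and use a[0] .. a[m - 1]
    }
}   // a releases its memory automatically when it goes out of scope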
A few issues I have noticed:
realloc should be used when you want the old values to remain in memory; since, as mentioned in one of your comments, you don't care about the old values, just use malloc.
Check the size of the already allocated memory before allocating again; allocate a new block only if the existing one is insufficient for the new data.
Please refer to the sample code below, which takes care of the problems mentioned above:
int size = 2;
int *a = (int*)std::malloc(size * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
if(m > size)
{
size = m;
std::free(a);
a = (int*)std::malloc(size * sizeof(int));
}
// ...
}
std::free(a);
You can also further optimize the memory allocation by allocating some extra memory, e.g.:
size = m*2; //!
To better understand this step, take an example: suppose m = 8, then you allocate memory for 16 elements, so when m later changes to 10, 12, or anything up to 16, there is no need to allocate again.
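Concretely, the allocation branch of the sample above would then become something like this:

if (m > size)
{
    size = m * 2;   // over-allocate, so later values of m up to 2*m need no new allocation
    std::free(a);
    a = (int*)std::malloc(size * sizeof(int));
}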
If you can get all the sizes beforehand, allocate the biggest you need before the cycle and then use as much as needed.
If, on the other hand, you can not do that, then reallocation is a good solution, I think.
You can also further optimize your solution by reallocating only when a bigger size is needed:
int *a = nullptr;
int size = 0;
for(int i = 0; i < N; ++i)
{
int new_size = calculate_enough_size();
if ( new_size > size ){
a = (int*)std::realloc(a, new_size * sizeof(int));
size = new_size;
}
// ...
}
std::free(a);
This way you will need fewer reallocations (about half as many in a randomized case).
I am trying to copy from an array of arrays to another one, while leaving a space between arrays in the target.
They are both contiguous. Each vector's size is between 5000 and 52000 floats,
output_jump is the vector size times eight, and vector_count varies in my tests.
I did my best with what I learned here https://stackoverflow.com/a/34450588/1238848 and here https://stackoverflow.com/a/16658555/1238848,
but it still seems slow.
void copyToTarget(const float *input, float *output, int vector_count, int vector_size, int output_jump)
{
int left_to_do,offset;
constexpr int block=2048;
constexpr int blockInBytes = block*sizeof(float);
float temp[2048];
for (int i = 0; i < vector_count; ++i)
{
left_to_do = vector_size;
offset = 0;
while(left_to_do > block)
{
memcpy(temp, input, blockInBytes);
memcpy(output, temp, blockInBytes);
left_to_do -= block;
input += block;
output += block;
}
if (left_to_do)
{
memcpy(temp, input, left_to_do*sizeof(float));
memcpy(output, temp, left_to_do*sizeof(float));
input += left_to_do;
output += left_to_do;
}
output += output_jump;
}
}
I'm skeptical of the answer you linked, which encourages avoiding a function call to memcpy. Surely the implementation of memcpy is very well optimized, probably hand written in assembly, and therefore hard to beat! Moreover for large-sized copies, the function call overhead is negligible compared to memory access latency. So simply calling memcpy is likely the fastest way to copy contiguous bytes around in memory.
If output_jump were zero, a single call to memcpy can copy input directly to output (and this would be hard to beat). For nonzero output_jump, the copy needs to be divided up over the contiguous vectors. Use one memcpy per vector, without the temp buffer, copying directly from input + i * vector_size to output + i * (vector_size + output_jump).
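A sketch of that simplification, keeping your function's signature (the size_t casts guard against overflow in the offset arithmetic):

#include <cstddef>
#include <cstring>

void copyToTarget(const float *input, float *output, int vector_count,
                  int vector_size, int output_jump)
{
    for (int i = 0; i < vector_count; ++i)
    {
        // One memcpy per contiguous vector, writing directly to the jumped position.
        std::size_t in_off  = static_cast<std::size_t>(i) * vector_size;
        std::size_t out_off = static_cast<std::size_t>(i) * (vector_size + output_jump);
        std::memcpy(output + out_off, input + in_off,
                    static_cast<std::size_t>(vector_size) * sizeof(float));
    }
}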
But better yet, like the top answer on that thread suggests, try if possible to find a way to avoid copying data in the first place.
My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory overflow if I set the target value too high. As suggested in that question, I am able to fix the problem by using a boolean <vector> instead of a normal array.
However, I am hitting the memory overflow at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that the normal C++ boolean array uses a byte for each entry, so with 2 GB of RAM, I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using <vector>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
int main() {
// Count and sum of primes below target
const int target = 100000;
// Code I want to use:
bool is_idx_prime[target];
for (unsigned int i = 0; i < target; i++) {
// initialize by assuming prime
is_idx_prime[i] = true;
}
// But doesn't work for target larger than ~1200000
// Have to use this instead
// vector <bool> is_idx_prime(target, true);
for (unsigned int i = 2; i < sqrt(target); i++) {
// All multiples of i * i are nonprime
// If i itself is nonprime, no need to check
if (is_idx_prime[i]) {
for (int j = i; i * j < target; j++) {
is_idx_prime[i * j] = 0;
}
}
}
// 0 and 1 are nonprime by definition
is_idx_prime[0] = 0; is_idx_prime[1] = 0;
unsigned long long int total = 0;
unsigned int count = 0;
for (int i = 0; i < target; i++) {
// cout << "\n" << i << ": " << is_idx_prime[i];
if (is_idx_prime[i]) {
total += i;
count++;
}
}
cout << "\nCount: " << count;
cout << "\nTotal: " << total;
return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, changing n to 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio compiler on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special section of memory to represent the call stack of your program. Each function call pushes a new stack frame onto the stack; when the function returns, its stack frame is removed again. The stack frame includes the memory for the parameters of the function and its local variables. The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. Only a limited amount of memory is reserved for the stack; when it gets full (e.g. due to too many nested function calls or too large local objects), you get a stack overflow. For this reason, large objects should be allocated on the heap.
General references on stack/heap: Link, Link
To allocate memory on the heap in C++, you can:
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. When you have a long-running program, such as a system service, creating repeated memory leaks will eventually fill up the entire memory (and speaking from personal experience, this absolutely does happen in practice). But in theory, if you would want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and then later deallocate again with delete [] is_idx_prime.
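For example, with the make_unique option, only the declaration and initialization of the sieve array change; a minimal sketch (assuming C++14 or later):

#include <algorithm>
#include <memory>

int main() {
    const int target = 100000;
    // Heap allocation via a smart pointer; the memory is freed automatically
    // when is_idx_prime goes out of scope.
    auto is_idx_prime = std::make_unique<bool[]>(target);
    std::fill_n(is_idx_prime.get(), target, true);   // initialize by assuming prime
    // ... the rest of the sieve can index is_idx_prime[i] exactly as before
}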
I've run into a strange situation.
In my program I have a loop that combines a bunch of data together in a giant vector. I was trying to figure out why it was running so slowly, even though it seemed like I was doing everything right to allocate memory in an efficient manner on the fly.
In my program it is difficult to determine how big the final vector of combined data should be, but the size of each piece of data is known as it is processed. So instead of reserving and resizing the combined data vector in one go, I was reserving enough space for each data chunk as it was added to the larger vector. That's when I ran into this issue, which is repeatable using the simple snippet below:
std::vector<float> arr1;
std::vector<float> arr2;
std::vector<float> arr3;
std::vector<float> arr4;
int numLoops = 10000;
int numSubloops = 50;
{
// Test 1
// Naive test where no pre-allocation occurs
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr1.push_back(q * g);
}
}
}
{
// Test 2
// Ideal situation where total amount of data is reserved beforehand
arr2.reserve(numLoops * numSubloops);
for (int q = 0; q < numLoops; q++)
{
for (int g = 0; g < numSubloops; g++)
{
arr2.push_back(q * g);
}
}
}
{
// Test 3
// Total data is not known beforehand, so allocations made for each
// data chunk as they are processed using 'resize' method
int arrInx = 0;
for (int q = 0; q < numLoops; q++)
{
arr3.resize(arr3.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr3[arrInx++] = q * g;
}
}
}
{
// Test 4
// Total data is not known beforehand, so allocations are made for each
// data chunk as they are processed using the 'reserve' method
for (int q = 0; q < numLoops; q++)
{
arr4.reserve(arr4.size() + numSubloops);
for (int g = 0; g < numSubloops; g++)
{
arr4.push_back(q * g);
}
}
}
The results of this test, after compilation in Visual Studio 2017, are as follows:
Test 1: 7 ms
Test 2: 3 ms
Test 3: 4 ms
Test 4: 4000 ms
Why is there the huge discrepancy in running times?
Why does calling reserve a bunch of times, followed by push_back, take 1000x longer than calling resize a bunch of times, followed by direct index access?
How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?
How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?
That's where you're mistaken. The 'naive' approach you speak of does do pre-allocations. They're just done behind the scenes, and infrequently, in the call to push_back. It doesn't just allocate room for one more element every time you call push_back. It allocates some amount that is a factor (usually between 1.5x and 2x) of the current capacity. And then it doesn't need to allocate again until that capacity runs out. This is much more efficient than your loop which does an allocation every time 50 elements are added, with no regard for the current capacity.
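You can watch this happening by printing the capacity whenever it changes (a small sketch; the exact growth factor is implementation-defined):

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<float> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 500000; ++i)
    {
        v.push_back(static_cast<float>(i));
        if (v.capacity() != last_capacity)
        {
            last_capacity = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last_capacity << '\n';
        }
    }
    // Typically prints a short list of geometrically growing capacities,
    // i.e. only a few dozen reallocations for half a million push_backs.
}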
@Benjamin Lindley's answer explains the capacity behavior of std::vector. However, as for exactly why the 4th test case is that slow, it is in fact an implementation detail of the standard library.
[vector.capacity]
void reserve(size_type n);
...
Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve().
Thus the C++ standard does not guarantee that after reserve() for a larger capacity, the actual capacity will be anything more than the requested one. Personally I think it would not be unreasonable for an implementation to follow some growth policy when such a larger capacity is requested. However, I also tested on my machine, and it seems the STL just does the simplest thing, i.e. allocates exactly the requested capacity.
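If you want to verify what your own implementation does, here is a small sketch that mimics Test 4's request pattern and prints the capacity after each reserve:

#include <iostream>
#include <vector>

int main()
{
    std::vector<float> v;
    for (int q = 0; q < 10; ++q)
    {
        v.reserve(v.size() + 50);        // same request pattern as Test 4
        v.insert(v.end(), 50, 0.0f);
        std::cout << "size " << v.size()
                  << ", capacity " << v.capacity() << '\n';
    }
    // If capacity always equals size here, every reserve() call triggered a
    // reallocation plus a full copy, which explains the quadratic slowdown.
}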
Asking on Stack Overflow again. I have an array which I want to always be at the minimum size, because I have to send it over the internet. The problem is, the program has no way to know what the minimum size is until the operation is finished. This leaves me with two ways: use vectors, or make an array of the maximum length the program could ever need and then, once it knows the minimum size, initialize a pointer with new and put the data there. But I can't use vectors because they require serialization to be sent, and both vectors and serialization have overheads I don't want. Example:
unsigned short data[1270], // the maximum size the operation could take is 1270 shorts
*packet; // pointer
int counter = 0; //this is to count how big "packet" will be
//example of operation, which of course is different from my program
// in this case the operation takes 6 shorts
while(true) {
for (int i = 0; i != 6; i++) {
counter++;
data[i]= 1;
}
packet=new unsigned short[counter];
for (int i = 0; i != counter; i++) {
packet[i]=data[i];
}
}
As you might have noticed, this code runs in cycles, so the problem might be the way I repeatedly re-initialize the same pointer.
The problem in this code is, if I do:
std::cout<<counter<<" "<<sizeof(packet)/sizeof(unsigned short)<<" ";
counter varies (usually from 1 to 35), but the size of packet is always 2. I also tried delete[] before new, but it didn't solve the problem.
This issue could also be related to another part of the code, but here I am just asking:
Is my way of repeatedly allocate memory right?
Continually add to a std::vector while requesting that the heap memory allocated not exceed the amount actually needed:
std::vector<int> vec;
std::size_t const maxSize = 10;
for (std::size_t i = 0; i != maxSize; ++i)
{
vec.reserve(vec.size() + 1u);
vec.push_back(1234); // whatever you're adding
}
I should add though that I see no good reason for doing this under normal circumstances. The performance of this "program" could be severely hampered with no obvious benefit.
You can always use pointers and realloc. C++ is such a powerful language because of its pointers; you don't need to use arrays.
Take a look at the cplusplus entry on realloc.
For your case you could use it like this:
new_packet = (unsigned short*) realloc (packet, new_size * sizeof(unsigned short));
if (new_packet!=NULL) {
packet = new_packet;
for (int i = 0; i < new_size; i++)
packet[i] = new_values[i];
}
else {
if( packet != NULL )
free (packet);
puts ("Error (re)allocating memory");
exit (1);
}
Okay, I see a couple of problems in your logic here. Let's start with the main one: why do you need to allocate a whole fresh array with a copy of what's in data just to send it over a socket? It's not like sending a letter, dude; send() will transfer a copy of the information, not literally move it over the network. It's perfectly fine to do this:
send(socket, data, counter * sizeof(unsigned short), 0);
There. You don't need a new pointer for anything.
Also, I don't know where you got the serialization thing from. Vectors are basically arrays that resize automatically and free their memory once they go out of scope. You could do this:
std::vector<unsigned short> packet;
packet.resize(counter);
for (std::size_t i = 0; i < counter; ++i)
packet[i] = data[i];
send(socket, &packet[0], packet.size() * sizeof(unsigned short), 0);
Or even shorten to:
std::vector<unsigned short> packet;
for (std::size_t i = 0; i < counter; ++i)
packet.push_back(data[i]);
But with this option the vector may have to reallocate several times as it grows, which costs performance. Always set its size first if you have the information available.
I'm trying to do this:
int main(void){
u_int64_t NNUM = 2<<19;
u_int64_t list[NNUM], i;
for(i = 0; i < 4; i++){
list[i] = 999;
}
}
Why am I getting a segfault on my Ubuntu 10.10 64-bit system (gcc 4.6.1)?
You try to create a very large array on the stack. This leads to a stack overflow.
Try allocating the array on the heap instead. For example:
// Allocate memory
u_int64_t *list = malloc(NNUM * sizeof(u_int64_t));
// work with `list`
// ...
// Free memory again
free(list);
You declare NNUM = 2<<19 == 2 * 2^19 == 1048576
and try to allocate 1048576 * 64 bits on the stack (number of cells * size of each cell).
That is 8 MB, which is just too much to allocate on the stack; you can try to allocate it on the heap instead and check whether it really worked using the return value of malloc.
heap VS. stack
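Putting the two points together (heap allocation plus checking malloc's return value), here is a C++ sketch of the fixed program, using the standard std::uint64_t in place of u_int64_t:

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
    const std::uint64_t NNUM = 2 << 19;   // 1048576 elements
    // Allocate on the heap instead of the stack (the cast is required in C++).
    std::uint64_t *list =
        static_cast<std::uint64_t*>(std::malloc(NNUM * sizeof(std::uint64_t)));
    if (list == nullptr) {                // check the return value of malloc
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (std::uint64_t i = 0; i < 4; i++)
        list[i] = 999;
    std::free(list);
    return 0;
}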
Your program requires a stack of at least 8 MB (8192 KB) for that array alone.
If you check with 'ulimit -s' (the limit is reported in KB), it is most likely not larger than that.
You can try 'ulimit -s 16384' and then re-execute.