Asking on Stack again. I have an array which I want to always be at the minimum size, because I have to send it over the internet. The problem is that the program has no way to know what the minimum size is until the operation is finished. This leaves me two ways: using vectors, or making an array of the maximum length the program could ever need and then, once the minimum size is known, initializing a pointer with new and putting the data there. But I can't use vectors because they require serialization to be sent, and both vector and serialization have overheads I don't want. Example:
unsigned short data[1270], // the maximum size the operation could take is 1270 shorts
*packet; // pointer
int counter = 0; // counts how big "packet" will be
// example of an operation, which of course is different from my program
// in this case the operation takes 6 bytes
while(true) {
for (int i = 0; i != 6; i++) {
counter++;
data[i]= 1;
}
packet=new unsigned short[counter];
for (int i = 0; i!=counter; i++) {
packet[i]=data[i];
}
}
As you might have noticed, this code runs in a loop, so the problem might be the way I repeatedly re-initialize the same pointer.
The problem in this code is that if I do:
std::cout<<counter<<" "<<sizeof(packet)/sizeof(unsigned short)<<" ";
counter varies in size (usually from 1 to 35), but the size of packet is always 2. I also tried delete [] before new, but it didn't solve the problem.
This issue could also be related to another part of the code, but here I am just asking:
Is my way of repeatedly allocating memory right?
Keep appending to a std::vector while asking it, via reserve, to allocate no more heap memory than is actually needed (reserve guarantees at least the requested capacity; the implementation may still allocate more):
std::vector<int> vec;
std::size_t const maxSize = 10;
for (std::size_t i = 0; i != maxSize; ++i)
{
vec.reserve(vec.size() + 1u);
vec.push_back(1234); // whatever you're adding
}
I should add though that I see no good reason for doing this under normal circumstances. The performance of this "program" could be severely hampered with no obvious benefit.
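You can always use pointers and realloc. Pointers are a large part of what makes C++ such a powerful language; you don't need to use arrays. Note that realloc may only be used on memory obtained from malloc, calloc or realloc, never on memory obtained from new[].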
Take a look at the cplusplus entry on realloc.
For your case you could use it like this:
new_packet = (unsigned short*) realloc (packet, new_size * sizeof(unsigned short));
if (new_packet!=NULL) {
packet = new_packet;
for(int i = 0; i < new_size ; i++)
packet[i] = new_values[i];
}
else {
if( packet != NULL )
free (packet);
puts ("Error (re)allocating memory");
exit (1);
}
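As a minimal sketch of the realloc pattern above, the resize step can be wrapped in a small helper. The name resize_packet is hypothetical, and the sketch assumes the buffer was obtained from malloc, calloc or realloc (or is still a null pointer) and that the new element count is greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: resize a buffer that came from malloc/calloc/realloc
// (never from new[]) so that it holds new_count elements.
// On failure it returns false and leaves *buf valid and unchanged.
bool resize_packet(unsigned short** buf, std::size_t new_count) {
    assert(buf != nullptr && new_count > 0);
    void* p = std::realloc(*buf, new_count * sizeof(unsigned short));
    if (p == nullptr) return false;          // old block is still owned by the caller
    *buf = static_cast<unsigned short*>(p);  // existing elements were preserved
    return true;
}
```

Assigning the result to a temporary first, as here, avoids losing the original pointer when realloc fails. The caller still has to release the buffer with free.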
Okay, I see a couple of problems in your logic here. Let's start with the main one: why do you need to allocate a whole fresh array with a copy of what's in data just to send it over a socket? It's not like sending a letter, dude; send() will transfer a copy of the information, not literally move it over the network. It's perfectly fine to do this:
send(socket, data, counter * sizeof(unsigned short), 0);
There. You don't need a new pointer for anything.
Also, I don't know where you got the serialization idea from. Vectors are basically arrays that resize automatically and free their memory when they go out of scope. A vector stores its elements contiguously, so you can pass a pointer to its first element straight to send(). You could do this:
std::vector<unsigned short> packet;
packet.resize(counter); // resize, not reserve: reserve does not change size()
for (std::size_t i = 0; i < counter; ++i)
packet[i] = data[i];
send(socket, &packet[0], packet.size() * sizeof(unsigned short), 0);
Or even shorten to:
std::vector<unsigned short> packet;
for (std::size_t i = 0; i < counter; ++i)
packet.push_back(data[i]);
But with this option the vector may reallocate and copy its contents several times as it grows, which costs performance. Call reserve(counter) first if you know the size in advance.
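A minimal sketch of the vector approach, assuming the same data and counter as in the question. The helper names make_packet and packet_bytes are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the first `count` elements of a scratch buffer
// into a vector that is exactly as long as the data it holds.
std::vector<unsigned short> make_packet(const unsigned short* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    return std::vector<unsigned short>(data, data + count);
}

// Number of bytes to hand to send(): element count times element size.
std::size_t packet_bytes(const std::vector<unsigned short>& packet) {
    return packet.size() * sizeof(unsigned short);
}
```

packet.data() and packet_bytes(packet) can then be passed to send(). By contrast, sizeof applied to an unsigned short* yields the size of the pointer itself (4 bytes on a typical 32-bit build, which would explain the constant 2 in the question), no matter how many elements were allocated.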
I am trying to copy from one array of arrays to another, while leaving a gap between the arrays in the target.
Both are contiguous. Each vector is between 5000 and 52000 floats long;
output_jump is the vector size times eight, and vector_count varies in my tests.
I applied what I learned from https://stackoverflow.com/a/34450588/1238848 and https://stackoverflow.com/a/16658555/1238848,
but it still seems slow.
void copyToTarget(const float *input, float *output, int vector_count, int vector_size, int output_jump)
{
int left_to_do,offset;
constexpr int block=2048;
constexpr int blockInBytes = block*sizeof(float);
float temp[2048];
for (int i = 0; i < vector_count; ++i)
{
left_to_do = vector_size;
offset = 0;
while(left_to_do > block)
{
memcpy(temp, input, blockInBytes);
memcpy(output, temp, blockInBytes);
left_to_do -= block;
input += block;
output += block;
}
if (left_to_do)
{
memcpy(temp, input, left_to_do*sizeof(float));
memcpy(output, temp, left_to_do*sizeof(float));
input += left_to_do;
output += left_to_do;
}
output += output_jump;
}
}
I'm skeptical of the answer you linked, which encourages avoiding a function call to memcpy. Surely the implementation of memcpy is very well optimized, probably hand written in assembly, and therefore hard to beat! Moreover for large-sized copies, the function call overhead is negligible compared to memory access latency. So simply calling memcpy is likely the fastest way to copy contiguous bytes around in memory.
If output_jump were zero, a single call to memcpy can copy input directly to output (and this would be hard to beat). For nonzero output_jump, the copy needs to be divided up over the contiguous vectors. Use one memcpy per vector, without the temp buffer, copying directly from input + i * vector_size to output + i * (vector_size + output_jump).
But better yet, like the top answer on that thread suggests, try if possible to find a way to avoid copying data in the first place.
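The one-memcpy-per-vector suggestion could look roughly like the sketch below. The function name copy_with_gaps is hypothetical, and the sketch assumes, as in the question's loop, that output_jump is the number of floats skipped after each copied vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy vector_count contiguous vectors of vector_size floats each into
// output, leaving a gap of output_jump floats after every copied vector.
void copy_with_gaps(const float* input, float* output,
                    std::size_t vector_count, std::size_t vector_size,
                    std::size_t output_jump) {
    assert(input != nullptr && output != nullptr);
    const std::size_t stride = vector_size + output_jump;  // distance between vector starts in output
    for (std::size_t i = 0; i < vector_count; ++i) {
        std::memcpy(output + i * stride,
                    input + i * vector_size,
                    vector_size * sizeof(float));
    }
}
```

There is no temporary buffer and no manual blocking, so every float is copied once instead of twice.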
I need to allocate space for a temporary array once per iteration. I try to use realloc each iteration to optimize memory usage, like this:
int *a = (int*)std::malloc(2 * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
a = (int*)std::realloc(m * sizeof(int));
// ...
}
std::free(a);
N is a big number, 1000000 for example. Example m values per iteration: 8, 2, 6, 10, 4, 8.
Am I doing the right thing when I realloc a at each iteration? Does it prevent redundant memory allocation?
Firstly, realloc takes 2 parameters. First is the original pointer and the second is the new size. You are trying to pass the size as the original pointer and the code shouldn't compile.
Secondly, the obligatory reminder: Don't optimize prematurely. Unless you've measured and found that the allocations are a bottleneck, just use std::vector.
A few issues I noticed:
realloc should be used when you want the old values to remain in memory. If you don't care about the old values, as you mentioned in one of your comments, just use malloc.
Check the size of the memory already allocated before allocating again, and allocate new memory only if the existing block is too small for the new data.
The following sample code takes care of both problems:
int size = 2;
int *a = (int*)std::malloc(size * sizeof(int));
for(int i=0; i<N; ++i)
{
int m = calculate_enough_size();
if(m > size)
{
size = m;
std::free(a);
a = (int*)std::malloc(size * sizeof(int));
}
// ...
}
std::free(a);
You can also optimize the allocation further by allocating some extra memory, e.g.:
size = m*2; //!
To see why this helps, suppose m = 8: you allocate room for 16 elements, so when m later changes to 10, 12 or anything up to 16, there is no need to allocate again.
If you can get all the sizes beforehand, allocate the biggest you need before the cycle and then use as much as needed.
If, on the other hand, you can not do that, then reallocation is a good solution, I think.
You can also further optimize your solution by reallocating only when a bigger size is needed:
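int *a = NULL; // realloc(NULL, n) behaves like malloc(n)
int size = 0;
for(int i = 0; i < N; ++i)
{
int new_size = calculate_enough_size();
if ( new_size > size ){
a = (int*)std::realloc(a, new_size * sizeof(int)); // pointer first, then the new size in bytes
size = new_size;
}
// ...
}
std::free(a);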
This way you need fewer reallocations: one happens only when a size larger than every previous one is requested.
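The grow-only idea can be sketched as a small wrapper. ScratchBuffer and its members are hypothetical names, and the reallocations counter exists only to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical scratch buffer that only ever grows; it owns malloc'd memory.
struct ScratchBuffer {
    int* data = nullptr;
    std::size_t capacity = 0;       // number of ints currently allocated
    std::size_t reallocations = 0;  // how often the block had to grow

    ScratchBuffer() = default;
    ScratchBuffer(const ScratchBuffer&) = delete;             // avoid double free
    ScratchBuffer& operator=(const ScratchBuffer&) = delete;
    ~ScratchBuffer() { std::free(data); }

    // Make room for at least `needed` ints. Returns false if allocation fails,
    // in which case the existing block stays valid.
    bool ensure(std::size_t needed) {
        if (needed <= capacity) return true;  // big enough already, nothing to do
        void* p = std::realloc(data, needed * sizeof(int));
        if (p == nullptr) return false;
        data = static_cast<int*>(p);
        capacity = needed;
        ++reallocations;
        return true;
    }
};
```

With the example sizes from the question (8, 2, 6, 10, 4, 8) the block grows only twice, at 8 and at 10. A std::vector<int> whose resize is called each iteration behaves similarly, since shrinking a vector does not release its capacity.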
I have a program that generates files containing random distributions of the characters A-Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE * fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
char *buffer = new char[block_size];
bzero(buffer, block_size);
struct timeb t;
ftime(&t);
long start_in_ms = t.time * 1000 + t.millitm;
size_t bytes_read = 0;
while (!feof(fp))
{
bytes_read += fread(buffer, 1, block_size, fp);
if (ferror (fp))
{
return -1;
}
int i;
for (i = 0; i < block_size; i++)
{
int j;
for (j = 0; j < 26; j++)
{
if (buffer[i] == 'A' + j)
{
hist[j]++;
}
}
}
}
ftime(&t);
long end_in_ms = t.time * 1000 + t.millitm;
*milliseconds = end_in_ms - start_in_ms;
*filelen = bytes_read;
return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
Regards.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.
There are many bugs in this code:
It uses new[], which is C++.
It doesn't free the allocated memory.
It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.
You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, the fread() involves some overhead, leaving user mode and returning. On the other hand, instead of setting the hist[] information directly, you're looping through the alphabet. This is unnecessary and, without optimization, causes some overhead per byte.
I'd re-test this with hist[buffer[i] - 'A']++ (after checking that the character is in the range A-Z) or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.
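Putting the points from both answers together, a corrected read loop might look like the sketch below. The name count_letters is hypothetical, timing is left out for brevity, and the direct indexing assumes an ASCII-compatible character set in which 'A' to 'Z' are contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Count the letters 'A'..'Z' read from fp into hist[0..25].
// Returns the number of bytes read, or -1 on a read error.
long count_letters(std::FILE* fp, long hist[26], std::size_t block_size) {
    assert(fp != nullptr && block_size > 0);
    std::vector<char> buffer(block_size);  // released automatically on return
    long total = 0;
    std::size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {  // only the bytes actually read
            const char c = buffer[i];
            if (c >= 'A' && c <= 'Z') ++hist[c - 'A'];  // direct index, no inner loop
        }
        total += static_cast<long>(n);
    }
    return std::ferror(fp) ? -1 : total;
}
```

Looping over the return value of fread rather than block_size matters most for large blocks, where the original code re-counted stale buffer contents after a short final read.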
I just ran into a free(): invalid next size (fast) problem while writing a C++ program, and unfortunately I could not figure out why it happens. The code is given below.
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
bool result = true;
char *exp_checksum = (char*)malloc(size * sizeof(char));
char *rec_checksum = (char*)malloc(size * sizeof(char));
char *rec_data = (char*)malloc(size * sizeof(char));
//memcpy(rec_checksum, pkt->data+HEADER_SIZE+SEQ_SIZE+DATA_SIZE, size);
//memcpy(rec_data, pkt->data+HEADER_SIZE+SEQ_SIZE, size);
for (int i = 0; i < size; i++) {
rec_checksum[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE+i];
rec_data[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+i];
}
do_checksum(exp_checksum, rec_data, DATA_SIZE);
for (int i = 0; i < size; i++) {
if (exp_checksum[i] != rec_checksum[i]) {
result = false;
break;
}
}
free(exp_checksum);
free(rec_checksum);
free(rec_data);
return result;
}
The macros used are:
#define RDT_PKTSIZE 128
#define SEQ_SIZE 4
#define HEADER_SIZE 1
#define DATA_SIZE ((RDT_PKTSIZE - HEADER_SIZE - SEQ_SIZE) / 2)
The struct used is:
struct packet {
char data[RDT_PKTSIZE];
};
This piece of code doesn't fail every time. It sometimes crashes with free(): invalid next size (fast) at the free(exp_checksum); line.
What's even worse is that sometimes the contents of rec_checksum do not equal the bytes starting at pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE], which they should, according to the watch expressions in my debugger. I tried both the memcpy and the for-loop versions, but the problem remains.
I don't quite understand why this would happen. I would be very thankful if anyone could explain this to me.
Edit:
Here's the do_checksum() method, which is very simple:
void do_checksum(char* checksum, char* data, int size)
{
for (int i = 0; i < size; i++)
{
checksum[i] = ~data[i];
}
}
Edit 2:
Thanks, everyone.
I switched another part of my code from an STL queue to an STL vector, and the problem went away.
But I still haven't figured out why. I am sure that I never pop from an empty queue.
The error you report is indicative of heap corruption. These can be hard to track down and tools like valgrind can be extremely helpful. Heap corruptions are often hard to debug with a simple debugger because the runtime error often occurs long after the actual corruption.
That said, the most obvious potential cause of your heap corruption, given the code posted so far, is if DATA_SIZE is greater than size. If that occurs then do_checksum will write beyond the end of exp_checksum.
Three immediate suggestions:
Check for size <= 0 (instead of "!size")
Check for size >= DATA_SIZE
Check for malloc returning NULL
Have you tried Valgrind?
Also, make sure to never send more than RDT_PKTSIZE as size to not_corrupt()
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
if (size > RDT_PKTSIZE) return false;
/* ... */
Valgrind is good ... but validating all your inputs and checking all error conditions is even better.
Stepping through the code in the debugger isn't a bad idea, either.
I would also call "do_checksum (size)" (your actual size), instead of DATA_SIZE (presumably "maximum size").
"DATA_SIZE is a macro defining the maximum length in my program, so size should be less than DATA_SIZE"
Even if that is true, your code only allocates enough memory to hold size characters,
so you should call
do_checksum(exp_checksum, rec_data, size);
And, if you do not want to use std::string (which is fine), you should switch from malloc/free to new/delete when writing C++.
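The checks suggested in the answers can be combined into one sketch. It swaps in a different technique from the question: instead of copying into three malloc'd buffers it compares in place, so there is nothing on the heap to overrun. The name not_corrupt_checked is hypothetical; the macros and the struct are repeated from the question so the block stands alone.

```cpp
#include <cassert>

#define RDT_PKTSIZE 128
#define SEQ_SIZE 4
#define HEADER_SIZE 1
#define DATA_SIZE ((RDT_PKTSIZE - HEADER_SIZE - SEQ_SIZE) / 2)

struct packet {
    char data[RDT_PKTSIZE];
};

// Verify the first `size` payload bytes against the stored checksum.
// Rejects any size that would read outside the data or checksum areas.
bool not_corrupt_checked(const packet* pkt, int size) {
    assert(pkt != nullptr);  // a null packet is a caller bug, not a corrupt packet
    if (size <= 0 || size > DATA_SIZE) return false;
    const char* rec_data = pkt->data + HEADER_SIZE + SEQ_SIZE;
    const char* rec_checksum = rec_data + DATA_SIZE;
    for (int i = 0; i < size; ++i) {
        // do_checksum in the question stores the bitwise complement of each byte
        if (static_cast<char>(~rec_data[i]) != rec_checksum[i]) return false;
    }
    return true;
}
```

Because the loop bound and the checksum length are the same value, the mismatch between size and DATA_SIZE that corrupted the heap in the original cannot occur here.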
Problem solved, thank you all for the help
I've got a bit of a problem here. It's not something that's blowing my program up, but it bothers me that I can't fix it. I have a function that reads some data from a file, and at the end of its execution the stack around the variable longGarbage is corrupted. I've looked around a bit and found that a possible cause is writing to invalid memory. I cleaned up some memory leaks that I had, but the problem persists. What confuses me is that it happens when the function finishes executing, so it appears to happen when the variable goes out of scope. Here's the code...
CHCF::CHCF(std::string fileName)
: PAKID("HVST84838672")
{
FILE * archive = fopen(fileName.c_str(), "rb");
std::string strGarbage = "";
unsigned int intGarbage = 0;
unsigned long longGarbage = 0;
unsigned char * data = 0;
char charGarbage = '0';
if (!archive)
{
fclose (archive);
return;
}
for (int i = 0; i < 12; i++)
{
fread(&charGarbage, 1, 1, archive);
strGarbage += charGarbage;
}
if (strGarbage != PAKID)
{
fclose(archive);
throw "Incorrect archive format";
}
strGarbage = "";
fread(&_gameID, sizeof(_gameID),1,archive);
fread(&_fileCount, sizeof(_fileCount),1,archive);
for (int i = 0; i < _fileCount; i++)
{
fread(&longGarbage, 8,1,archive); //file offset
fread(&intGarbage, 4, 1, archive);//fileName
for (int i = 0; i < intGarbage; i++)
{
fread(&charGarbage, 1, 1, archive);
strGarbage += charGarbage;
}
fread(&longGarbage, 8, 1, archive); //fileSize
fread(&intGarbage, 4, 1, archive); //fileType
data = new unsigned char[longGarbage];
for (long i = 0; i < longGarbage; i++)
{
fread(&charGarbage, 1, 1, archive);
data[i] = charGarbage;
}
switch ((FILETYPES)intGarbage)
{
case MAP:
_maps.append(strGarbage, new CFileData(strGarbage, FILETYPES::MAP, data, longGarbage));
break;
default:
break;
}
delete [] data;
data = 0;
strGarbage.clear();
longGarbage = 0;
}
fclose(archive);
} //error happens here
Here is the CFileData constructor:
CFileData::CFileData(std::string fileName, FILETYPES type, unsigned char *data, long fileSize)
{
_fileName = fileName;
_type = type;
_data = new unsigned char[fileSize];
for (int i = 0; i < fileSize; i++)
_data[i] = data[i];
}
Might I suggest std::vector instead of calling new and delete manually? Your code is not exception safe -- you leak if an exception is thrown.
fread(&longGarbage, 8, 1, archive); //fileSize: are you sure sizeof(long) is 8? I suspect it's 4. On 64-bit Linux and macOS long is 8 bytes, but on Windows (including 64-bit builds) and on most 32-bit platforms sizeof(long) is 4, while sizeof(long long) is 8.
What about any constructors on members of this class? They can corrupt the stack too.
What's happening is that something is writing to memory around or over the location of longGarbage which is causing the corruption.
You don't say what development environment you are using. One way to diagnose this would be to set a breakpoint that triggers when a specific memory location changes. Choose a memory location that overlaps the area of corruption and wait for it to trigger unexpectedly.
Another way to diagnose this would be to examine code that changes memory around or over longGarbage. That could be almost anything of course but likely candidates are modifications to 'data', modifications to 'intGarbage' and modifications to 'longGarbage' itself.
We can narrow things down even further because we can (usually) be fairly sure the assignment operator itself is safe. Code like data = new... isn't likely to be the culprit so really we need to focus on memory changes that involve taking the address of 'data', 'intGarbage' or 'longGarbage'. In particular memory changes that change more bytes than they should.
Several others have already pointed out that a long is probably not eight bytes in length. If you pass the wrong length to fread, the extra bytes retrieved have to go somewhere.
You are using a lot of magic numbers for data sizes, so I would check that first. In particular, I doubt that sizeof(unsigned long)==8 and sizeof(unsigned int)==4 in all possible circumstances. Refer to your compiler's documentation, but you should still be wary, as this is very likely to change from one compiler/platform to another.
Check for these bits:
fread(&longGarbage, 8,1,archive); //file offset
You also might want to use C++ <iostream> library instead of the C FILE* stuff for reading. It would allow for a much shorter version because you wouldn't need to close the file 3 times.
Judging from the other comments and the information provided, the issue is the size of the integer type on the C++ side. Use __int64 in a Windows environment, or int64_t (from <cstdint>) for cross-platform code.
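A minimal sketch of reading the 8-byte and 4-byte fields into fixed-width types, so that the byte count passed to fread always matches the destination. The function name read_entry_header and its parameter names are hypothetical, and the sketch assumes the file stores the fields in the byte order of the machine reading it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Read one 8-byte field followed by one 4-byte field from the archive.
// Returns false if either field could not be read completely.
bool read_entry_header(std::FILE* archive, std::uint64_t* offset, std::uint32_t* name_length) {
    assert(archive != nullptr && offset != nullptr && name_length != nullptr);
    // sizeof(*offset) is always 8 and sizeof(*name_length) is always 4,
    // so fread can never write past the end of the destination variable.
    if (std::fread(offset, sizeof(*offset), 1, archive) != 1) return false;
    if (std::fread(name_length, sizeof(*name_length), 1, archive) != 1) return false;
    return true;
}
```

Using sizeof on the destination instead of a literal 8 or 4 keeps the read size and the variable size in step even if the types change later.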