When running the release executable only (no problems occur when running through Visual Studio), my program crashes. When using the "attach to process" function, Visual Studio indicates the crash occurred in the following function:
World::blockmap World::newBlankBlockmap(int sideLen, int h){
    cout << "newBlankBlockmap side: "<<std::to_string((long long)sideLen) << endl;
    cout << "newBlankBlockmap height: "<<std::to_string((long long)h) << endl;
    short*** bm = new short**[sideLen];
    for(int i=0;i<sideLen;i++){
        bm[i] = new short*[h];
        for(int j=0;j<h;j++){
            bm[i][j] = new short[sideLen];
            for (int k = 0; k < sideLen ; k++)
            {
                bm[i][j][k] = blocks->getAIR_BLOCK();
            }
        }
    }
    return (blockmap)bm;
}
Which is called from a child class...
World::chunk* World_E::newChunkMap(World::floatmap north, World::floatmap east, World::floatmap south, World::floatmap west,
                                   float lowlow, float highlow, float highhigh, float lowhigh, bool displaceSides){
    World::chunk* c = newChunk(World::CHUNK_SIZE+1,World::HEIGHT);
    for (int i = 0; i < World::CHUNK_SIZE ; i++)
    {
        for (int k = 0; k < World::CHUNK_SIZE ; k++)
        {
            c->bm[i][0][k] = blocks->getDUMMY_BLOCK();
        }
    }
    c->bm[(int)floor((float)(World::CHUNK_SIZE+1)/2.0f)-1][1][(int)floor((float)(World::CHUNK_SIZE+1)/2.0f)-1] = blocks->getSTONE_BLOCK();
    c->bm[(int)ceil((float)(World::CHUNK_SIZE+1)/2.0f)-1][1][(int)floor((float)(World::CHUNK_SIZE+1)/2.0f)-1] = blocks->getSTONE_BLOCK();
    c->bm[(int)floor((float)(World::CHUNK_SIZE+1)/2.0f)-1][1][(int)ceil((float)(World::CHUNK_SIZE+1)/2.0f)-1] = blocks->getSTONE_BLOCK();
    c->bm[(int)ceil((float)(World::CHUNK_SIZE+1)/2.0f)-1][1][(int)ceil((float)(World::CHUNK_SIZE+1)/2.0f)-1] = blocks->getSTONE_BLOCK();
    return c;
}
where...
class World {
public: typedef short*** blockmap;
...
The line which VS points at is...
short*** bm = new short**[sideLen];
The "attach to process" function stats the Local variables are...
sideLen = 1911407648
h = 0
which is what I did NOT expect, but the cout outputs 9 and 30 respectively, which was expected.
I am aware that most "crashes in release only" problems are due to uninitialized variables, however, I fail to see that related here.
The only error message I get is...
Windows has triggered a breakpoint in Blocks Project.exe.
This may be due to a corruption of the heap
I am stumped on this problem. What is the error? How can I better debug a release executable?
I can post more code if needed, however, bear in mind there is a lot of it.
Thank you in advance.
"And I don't see World::newBlankBlockmap() called from that second chunk of code. – Michael Burr", I forgot that bit, here you go...
World::chunk* World::newChunk(int side, int height){
    cout << "newChunk side: "<<std::to_string((long long)side) << endl;
    cout << "newChunk height: "<<std::to_string((long long)height) << endl;
    chunk* ch = new chunk();
    ch->bm = newBlankBlockmap(side,height);
    ch->fm = newBlankFloatmap(side);
    return ch;
}
where...
struct chunk {
    blockmap bm;
    floatmap fm;
};
as defined in the World class
To reiterate what the comments were hinting at: from what you've posted, your code seems to be badly structured. Triple-pointer constructs like short*** are almost impossible to debug and should be avoided at all costs. The heap corruption error message you got suggests that you have a bad memory access somewhere in your code, which is impossible to find automatically with your current setup.
Your only options at this point are to either dig through your entire code manually until you've found the bug, or to start refactoring. The latter might seem more time-consuming now, but it won't be if you plan to work with this code in the future.
Consider the following as possible hints for a refactoring:
Don't use plain arrays for storing values. std::vector is just as effective and a lot easier to debug.
Avoid plain new and delete. In modern C++ with the STL containers and smart pointers, plain memory allocation should only happen in very rare exceptional cases.
Always range-check your array access operations. If you worry about performance, use asserts which disappear in release builds, but be sure the checks are there when you need them for debugging.
Modeling three-dimensional arrays in C++ can be tricky, since operator[] only offers support for one-dimensional arrays. A nice compromise is using operator() instead, which can take an arbitrary number of indices (see the sketch after this list).
Avoid C-style casts. They can be very unpredictable. Use the C++ casts static_cast, dynamic_cast and reinterpret_cast instead. If you find yourself using reinterpret_cast regularly, you probably have a mistake in your design somewhere.
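To make those hints concrete, here is a minimal sketch (the class name and memory layout are my own illustration, not the original design): a single std::vector<short> behind a checked operator(), replacing the short*** blockmap.

#include <cassert>
#include <vector>

class Grid3D {
public:
    Grid3D(int sideLen, int h, short fill)
        : side_(sideLen), height_(h),
          data_(static_cast<size_t>(sideLen) * h * sideLen, fill) {}

    // Checked access: the asserts vanish in release builds but catch
    // out-of-range indices while debugging.
    short& operator()(int x, int y, int z) {
        assert(x >= 0 && x < side_ && y >= 0 && y < height_ && z >= 0 && z < side_);
        return data_[(static_cast<size_t>(x) * height_ + y) * side_ + z];
    }

private:
    int side_;
    int height_;
    std::vector<short> data_;
};

A call like newBlankBlockmap(side, h) would then shrink to Grid3D bm(side, h, blocks->getAIR_BLOCK()); with bm(i, j, k) replacing bm[i][j][k], and the nested new calls (and their matching deletes) disappear entirely.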
There is a problem in the line short*** bm = new short**[sideLen];. The memory is allocated for sideLen elements, but the assignment line bm[i][j][k] = blocks->getAIR_BLOCK(); requires an array of size sideLen * sideLen * h. To fix this problem, the first line would need to be changed to short*** bm = new short**[sideLen * sideLen * h];.
My question is related to a problem described here. I have written a C++ implementation of the Sieve of Eratosthenes that hits a memory overflow if I set the target value too high. As suggested in that question, I am able to fix the problem by using a vector<bool> instead of a normal array.
However, I am hitting the memory overflow at a much lower value than expected, around n = 1 200 000. The discussion in the thread linked above suggests that the normal C++ boolean array uses a byte for each entry, so with 2 GB of RAM, I expect to be able to get to somewhere on the order of n = 2 000 000 000. Why is the practical memory limit so much smaller?
And why does using <vector>, which encodes the booleans as bits instead of bytes, yield more than an eightfold increase in the computable limit?
Here is a working example of my code, with n set to a small value.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
int main() {
    // Count and sum of primes below target
    const int target = 100000;

    // Code I want to use:
    bool is_idx_prime[target];
    for (unsigned int i = 0; i < target; i++) {
        // initialize by assuming prime
        is_idx_prime[i] = true;
    }
    // But doesn't work for target larger than ~1200000
    // Have to use this instead
    // vector <bool> is_idx_prime(target, true);

    for (unsigned int i = 2; i < sqrt(target); i++) {
        // All multiples of i * i are nonprime
        // If i itself is nonprime, no need to check
        if (is_idx_prime[i]) {
            for (int j = i; i * j < target; j++) {
                is_idx_prime[i * j] = 0;
            }
        }
    }

    // 0 and 1 are nonprime by definition
    is_idx_prime[0] = 0; is_idx_prime[1] = 0;

    unsigned long long int total = 0;
    unsigned int count = 0;
    for (int i = 0; i < target; i++) {
        // cout << "\n" << i << ": " << is_idx_prime[i];
        if (is_idx_prime[i]) {
            total += i;
            count++;
        }
    }

    cout << "\nCount: " << count;
    cout << "\nTotal: " << total;
    return 0;
}
outputs
Count: 9592
Total: 454396537
C:\Users\[...].exe (process 1004) exited with code 0.
Press any key to close this window . . .
Or, changing n = 1 200 000 yields
C:\Users\[...].exe (process 3144) exited with code -1073741571.
Press any key to close this window . . .
I am using the Microsoft Visual Studio compiler on Windows with the default settings.
Turning the comment into a full answer:
Your operating system reserves a special section in memory to represent the call stack of your program. Each function call pushes a new stack frame onto the stack; when the function returns, its frame is removed. The stack frame includes the memory for the parameters of your function and its local variables.

The remaining memory is referred to as the heap. On the heap, arbitrary memory allocations can be made, whereas the structure of the stack is governed by the control flow of your program. A limited amount of memory is reserved for the stack; when it gets full (e.g. due to too many nested function calls or too large local objects), you get a stack overflow. For this reason, large objects should be allocated on the heap.
General references on stack/heap: Link, Link
To allocate memory on the heap in C++, you can:
Use vector<bool> is_idx_prime(target);, which internally does a heap allocation and deallocates the memory for you when the vector goes out of scope. This is the most convenient way.
Use a smart pointer to manage the allocation: auto is_idx_prime = std::make_unique<bool[]>(target); This will also automatically deallocate the memory when the array goes out of scope.
Allocate the memory manually. I am mentioning this only for educational purposes. As mentioned by Paul in the comments, doing a manual memory allocation is generally not advisable, because you have to manually deallocate the memory again. If you have a large program with many memory allocations, inevitably you will forget to free some allocation, creating a memory leak. When you have a long-running program, such as a system service, creating repeated memory leaks will eventually fill up the entire memory (and speaking from personal experience, this absolutely does happen in practice). But in theory, if you would want to make a manual memory allocation, you would use bool *is_idx_prime = new bool[target]; and then later deallocate again with delete [] is_idx_prime.
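For illustration, here is a minimal sketch consolidating those three options for the sieve's flag array (variable names are mine; option 2 uses C++14's std::make_unique):

#include <memory>
#include <vector>

int main() {
    const int target = 2000000;   // too large for the stack, fine on the heap

    // Option 1: std::vector allocates on the heap and cleans up automatically.
    std::vector<bool> is_idx_prime(target, true);

    // Option 2: a smart pointer owning a heap array. Unlike vector<bool>,
    // this is not bit-packed, so it uses one byte per entry.
    auto flags = std::make_unique<bool[]>(target);   // value-initialized to false

    // Option 3: manual allocation, shown for completeness only.
    bool* raw = new bool[target];
    delete[] raw;   // every new[] must be paired with a delete[]
    return 0;
}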
I am looking for a free tool for Visual Studio 2015 which can check memory accesses. I found just this article (https://msdn.microsoft.com/en-us/library/e5ewb1h3%28v=vs.90%29.aspx?f=255&MSPPError=-2147217396), which covers just memory leaks - that's fine, but not enough. When I create an array and access outside of that array, I need to know about it. For example this code:
int* pointer = (int*)malloc(sizeof(int)*8);
for (int a = 0;a <= 8;a++)
    std::cout << *(pointer+a) << ' ';
free(pointer);
_CrtDumpMemoryLeaks();
This example will not throw an exception, but it still accesses memory outside the allocated space. Is there a tool for Visual Studio 2015 which will tell me about it? Something like Valgrind.
Thank you.
You can replace the given Microsoft'ish code
int* pointer = (int*)malloc(sizeof(int)*8);
for (int a = 0;a <= 8;a++)
    std::cout << *(pointer+a) << ' ';
free(pointer);
_CrtDumpMemoryLeaks();
… with this:
{
    vector<int> v( 8 );
    for( int i = 0; i <= 8; ++i )
    {
        cout << v.at( i ) << ' ';
    }
}
_CrtDumpMemoryLeaks();
And since there's no point in checking for memory leaks in that code (indeed the exception from the attempt to access v[8] ensures that the _CrtDumpMemoryLeaks() statement is not executed), you can simplify it to just
vector<int> v( 8 );
for( int i = 0; i <= 8; ++i )
{
    cout << v.at( i ) << ' ';
}
Now, it's been so long since I last checked for indexing errors and the like that I'm no longer sure of the exact magic incantations to add checking of ordinary [] indexing with g++ and Visual C++. I just remember that one could, at least with g++. But anyway, the practical way to go about things is not to laboriously check indexing code, but rather to avoid manual indexing, writing
// Fixed! No more problems! Hurray!
vector<int> v( 8 );
for( int const value : v )
{
    cout << value << ' ';
}
About the more general aspect of the question, how to ensure that every invalid memory access generates a trap or something of the sort, that's not generally practically possible, because the granularity of memory protection is a page of memory, which in the old days of Windows was 4 KB (imposed by the processor's memory access architecture). It's probably 4 KB still. There would just be too dang much overhead if every little allocation was rounded up to 4K with a 4K protected page after that.
Still I think it's done for the stack. That's affordable: a single case of great practical utility.
However, the C++ standard library has no functionality for that, and I don't think Visual C++ has any extensions for that either, but if you really want to go that route, then check out the Windows API, such as VirtualAlloc and friends.
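If you do want to experiment with that route, here is a minimal sketch of the guard-page idea (the layout is my own illustration; VirtualAlloc, VirtualProtect and VirtualFree are the real API calls): commit two pages, mark the second PAGE_NOACCESS, and place the array so it ends exactly at the page boundary. Any overrun then faults immediately instead of silently corrupting memory.

#include <windows.h>

int main() {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const SIZE_T page = si.dwPageSize;   // typically 4 KB

    // Two committed pages; the second becomes a no-access guard page.
    char* base = static_cast<char*>(
        VirtualAlloc(nullptr, 2 * page, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!base) return 1;
    DWORD oldProtect;
    VirtualProtect(base + page, page, PAGE_NOACCESS, &oldProtect);

    // Place an 8-int array flush against the guard page.
    int* arr = reinterpret_cast<int*>(base + page - 8 * sizeof(int));
    for (int a = 0; a < 8; ++a) arr[a] = a;   // fine
    // arr[8] = 0;   // would touch the guard page -> immediate access violation

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}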
I'm making an ASCII game and I need performance, so I decided to go with printf(). But there is a problem: I designed my char array as a multidimensional char** array, and printing it outputs memory garbage instead of the data. I know it's possible to print it with a for loop, but performance drops rapidly that way. I need to printf it like a static array[][]. Is there a way?
I made an example with a working and a notWorking array. I only need printf() to work with the notWorking array.
Edit: I'm using Visual Studio 2015 on Win 10, and yeah, I tested performance and cout is much slower than printf (but I don't really know why this is happening).
#include <iostream>
#include <cstdio>
int main()
{
    const int X_SIZE = 40;
    const int Y_SIZE = 20;

    char works[Y_SIZE][X_SIZE];
    char ** notWorking;
    notWorking = new char*[Y_SIZE];

    for (int i = 0; i < Y_SIZE; i++) {
        notWorking[i] = new char[X_SIZE];
    }

    for (int i = 0; i < Y_SIZE; i++) {
        for (int j = 0; j < X_SIZE; j++) {
            works[i][j] = '#';
            notWorking[i][j] = '#';
        }
        works[i][X_SIZE-1] = '\n';
        notWorking[i][X_SIZE - 1] = '\n';
    }
    works[Y_SIZE-1][X_SIZE-1] = '\0';
    notWorking[Y_SIZE-1][X_SIZE-1] = '\0';

    printf("%s\n\n", works);
    printf("%s\n\n", notWorking);

    system("PAUSE");
}
Note: I think I could make some kind of a buffer or static array for just copying and displaying data, but I wonder if that can be done without it.
If you would like to print a 2D structure with printf without a loop, you need to present it to printf as a contiguous one-dimensional C string. Since your game needs access to the string as a 2D structure, you can make an array of pointers into this flat structure, which would look like this:
Array of pointers partitions the buffer for use as a 2D structure, while the buffer itself can be printed by printf because it is a contiguous C string.
Here is the same structure in code:
// X_SIZE+1 is for '\n's; overall +1 is for '\0'
char buffer[Y_SIZE*(X_SIZE+1)+1];
char *array[Y_SIZE];
// Setup the buffer and the array
for (int r = 0 ; r != Y_SIZE ; r++) {
    array[r] = &buffer[r*(X_SIZE+1)];
    for (int c = 0 ; c != X_SIZE ; c++) {
        array[r][c] = '#';
    }
    array[r][X_SIZE] = '\n';
}
buffer[Y_SIZE*(X_SIZE+1)] = '\0';
printf("%s\n", buffer);
Demo.
Some things you can do to increase performance:
There is absolutely no reason to have an array of pointers, each pointing at an array. This will cause heap fragmentation, as your data will end up scattered all over the heap. Allocating memory in adjacent cells has many benefits in terms of speed; for example, it improves the use of the data cache.
Instead, allocate a true 2D array:
char (*array2D) [Y] = new char [X][Y];
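As a sketch of why this helps here (the sizes follow the question; the snippet itself is mine): because new char[Y_SIZE][X_SIZE] is one contiguous block, the whole thing can be handed to printf as a single string, exactly like the works array:

const int X_SIZE = 40;
const int Y_SIZE = 20;   // both must be compile-time constants
char (*array2D)[X_SIZE] = new char[Y_SIZE][X_SIZE];
for (int i = 0; i < Y_SIZE; i++) {
    for (int j = 0; j < X_SIZE - 1; j++)
        array2D[i][j] = '#';
    array2D[i][X_SIZE - 1] = '\n';        // end each row with a newline
}
array2D[Y_SIZE - 1][X_SIZE - 1] = '\0';   // terminate the whole block
printf("%s\n\n", &array2D[0][0]);         // rows are adjacent, so this works
delete[] array2D;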
printf as well as cout are both incredibly slow, as they come with tons of overhead and extra features which you don't need. Since they are just advanced wrappers around the system-specific console functions, you should consider using the system-specific functions directly. For example, the Windows console API. It will however turn your program non-portable.
If that's not an option, you could try to use puts instead of printf, since it has far less overhead.
The main performance issue with printf/cout is that they write to the end of the "standard output stream", meaning you can't write where you like but only at the bottom of the screen. This forces you to redraw the whole thing every time you change something, which is slow and may cause flicker.
Old DOS/Turbo C programs solved this with a non-standard function called gotoxy which allowed you to move the "cursor" and print where you liked. In modern programming, you can do this with the console API functions. Example for Windows.
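A minimal sketch of that idea on Windows (gotoxy is a name borrowed from the old DOS function; the two API calls are the real ones):

#include <windows.h>

// Move the console cursor to (x, y) so the next write lands there,
// instead of appending at the bottom of the screen.
void gotoxy(short x, short y) {
    COORD pos = { x, y };
    SetConsoleCursorPosition(GetStdHandle(STD_OUTPUT_HANDLE), pos);
}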
You could/should separate graphics from the rest of the program. If you have one thread handling graphics only and the main thread handling algorithms, the graphics updates will work more smoothly, without having to wait for whatever else the program is doing. It makes the program far more advanced though, as you have to consider thread-safety issues.
I'm writing code in which I read an image, process it, and get a Mat of double/float. I'm saving it to a file and later on, I'm reading it from that file.
When I use double, the space required is 8 MB for a 1K x 1K image; when I use float, it is 4 MB. So I want to use float.
Here is my code and output:
Mat data = readFloatFile("file_location");
cout << data.at<float>(0,0) << " " << data.at<double>(0,0);
When I run this code in DEBUG mode, the printout for float is -0, and double gives an exception, namely an assertion failure. But when I use RELEASE mode, the printout for float is -0, and 0.832 for double, which is the true value.
My question is: why can't I get the output when I use data.at<float>(0,0), and why don't I get an exception when I use data.at<double>(0,0) in RELEASE mode, which is supposed to be the case?
EDIT: Here is my code which writes and reads
void writeNoiseFloat(string imageName,string fingerprintname) throw(){
    Mat noise = getNoise(imageName);
    FILE* fp = fopen(fingerprintname.c_str(),"wb");
    if (!fp){
        cout << "not found ";
        perror("fopen");
    }
    float *buffer = new float[noise.cols];
    for(int i=0;i<noise.rows;++i){
        for(int j=0;j<noise.cols;++j)
            buffer[j]=noise.at<float>(i,j);
        fwrite(buffer,sizeof(float),noise.cols,fp);
    }
    fclose(fp);
    free(buffer);
}
void readNoiseFloat(string fpath,Mat& data){
    clock_t start = clock();
    cout << fpath << endl;
    FILE* fp = fopen(fpath.c_str(),"rb");
    if (!fp)perror("fopen");
    int size = 1024;
    data.create(size,size,CV_32F);
    float* buffer= new float[size];
    for(int i=0;i<size;++i) {
        fread(buffer,sizeof(float),size,fp);
        for(int j=0;j<size;++j){
            data.at<float>(i,j)=buffer[j];
            cout << data.at<float>(i,j) << " " ;
            cout << data.at<double>(i,j);
        }
    }
    fclose(fp);
}
Thanks in advance,
First of all, you cannot use both float and double in one cv::Mat, as the storage itself is only an array of bytes. The size of this array will be different for a matrix of floats and a matrix of doubles.
So, you have to decide what you are using.
Essentially, data.at<type>(x,y) is equivalent to (type*)data_ptr[x][y] (note this is not exact code; its purpose is to show what is happening).
EDIT:
Based on the code you added, you are creating a matrix of CV_32F. This means that you must use float to write and read each element. Using double causes a reinterpretation of the bytes and will definitely give you an incorrect result.
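A minimal sketch of the rule (using the cv::Mat API from the question):

cv::Mat data(1024, 1024, CV_32F);
data.at<float>(0, 0) = 0.832f;        // correct: float matches CV_32F
float v = data.at<float>(0, 0);       // correct read
// double d = data.at<double>(0, 0);  // wrong: reinterprets the raw bytes
//                                    // (asserts in debug, garbage in release)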
Regarding the assertion, I am sure that inside cv::Mat::at<class T> there is a kind of check like the following:
assert(sizeof(T) == this->getDepth());
Usually asserts are compiled only in DEBUG mode, so that's why you do not get this error in RELEASE.
EDIT2:
Not related to the issue, but never use free() with new, or delete with malloc(). The result can be a hard-to-debug problem.
So please use delete[] for buffer.
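Applied to writeNoiseFloat, the fix is one line (sketch):

float *buffer = new float[noise.cols];
// ... fill and fwrite as before ...
delete[] buffer;   // matches new[]; free(buffer) here is undefined behavior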
Difference between debug & release:
There's a bug in your code; it just doesn't show up in Release mode. That is what the debug build is for: the debug runtime adds checks (such as the assertion you hit) that surface bugs, while Release simply runs past them.
The compiler also optimizes a Release build to run faster, and the binary is smaller; a debug build takes more space on your HD because it carries the information needed to actually DEBUG it.
Uninitialized variables may also happen to hold different contents in the two builds (debug runtimes typically fill memory with recognizable guard patterns, while in Release you get whatever bytes are there, often zero), which can hide or expose bugs. This may vary between compilers.
When looking at some of our logging, I noticed in the profiler that we were spending a lot of time in operator<< formatting ints and such. It looks like there is a shared lock that is used whenever ostream::operator<< is called to format an int (and presumably doubles). Upon further investigation I've narrowed it down to this example:
Loop1 that uses ostringstream to do the formatting:
DWORD WINAPI doWork1(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        out << "[0";
        for (int j = 1; j < 100; ++j)
            out << ", " << j;
        out << "]\n";
    }
    return 0;
}
Loop2 that uses the same ostringstream to do everything but the int format, that is done with itoa:
DWORD WINAPI doWork2(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        char buffer[13];
        out << "[0";
        for (int j = 1; j < 100; ++j)
        {
            _itoa_s(j, buffer, 10);
            out << ", " << buffer;
        }
        out << "]\n";
    }
    return 0;
}
For my test I ran each loop a number of times with 1, 2, 3 and 4 threads (I have a 4 core machine). The number of trials is constant. Here is the output:
doWork1: all ostringstream
n   Total
1     557
2    8092
3   15916
4   15501

doWork2: use itoa
n   Total
1     200
2     112
3     100
4     105
As you can see, the performance when using ostringstream is abysmal. It gets 30 times worse when adding more threads, whereas itoa gets about 2 times faster.
One idea is to use _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) as recommended by M$ in this article. That doesn't seem to help me. Here's another user who seems to be having a similar issue.
We need to be able to format ints in several threads running in parallel for our application. Given this issue, we either need to figure out how to make this work or find another formatting solution. I may code up a simple class with operator<< overloaded for the integral and floating-point types, and then a templated version that just calls operator<< on the underlying stream. A bit ugly, but I think I can make it work, though maybe not for user-defined operator<<(ostream&, T), because it's not an ostream.
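A minimal sketch of that wrapper idea (the class name is mine; _itoa_s is the MSVC conversion routine from the second loop above):

#include <cstdlib>
#include <sstream>
#include <string>

class FastFormatter {
public:
    // Integers bypass the stream's locale machinery entirely.
    FastFormatter& operator<<(int v) {
        char buf[13];
        _itoa_s(v, buf, sizeof(buf), 10);
        out_ << buf;
        return *this;
    }
    // Everything else falls through to the underlying stream.
    template <typename T>
    FastFormatter& operator<<(const T& v) {
        out_ << v;
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

As noted above, this does not help user-defined operator<<(ostream&, T) overloads, since a FastFormatter is not an ostream.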
I should also make clear that this is being built with Microsoft Visual Studio 2005. And I believe this limitation comes from their implementation of the standard library.
If the Visual Studio 2005's standard library implementation has bugs why not try other implementations? Like:
STLport
Apache C++ Standard Library (STDCXX)
or even Dinkumware, upon which the Visual Studio 2005 standard library is based; maybe they have fixed the problem since 2005.
Edit: The other user you mentioned used Visual Studio 2008 SP1, which suggests that Dinkumware has probably not fixed this issue.
Doesn't surprise me, MS has put "global" locks on a fair few shared resources - the biggest headache for us was the BSTR memory lock a few years back.
The best thing you can do is copy the code and replace the ostream lock and shared conversion memory with your own class. I have done that where I write the stream using a printf-style logging system (i.e. I had to use a printf logger, and wrapped it with my stream operators). Once you've compiled that into your app, you should be as fast as itoa. When I'm in the office I'll grab some of the code and paste it for you.
EDIT:
as promised:
CLogger& operator<<(long l)
{
    if (m_LoggingLevel < m_levelFilter)
        return *this;
    // 33 is the max length of data returned from _ltot
    resize(33);
    _ltot(l, buffer+m_length, m_base);
    m_length += (long)_tcslen(buffer+m_length);
    return *this;
};

static CLogger& hex(CLogger& c)
{
    c.m_base = 16;
    return c;
};

void resize(long extra)
{
    if (extra + m_length > m_size)
    {
        // resize buffer to fit.
        TCHAR* old_buffer = buffer;
        m_size += extra;
        buffer = (TCHAR*)malloc(m_size*sizeof(TCHAR));
        _tcsncpy(buffer, old_buffer, m_length+1);
        free(old_buffer);
    }
}

static CLogger& endl(CLogger& c)
{
    if (c.m_length == 0 && c.m_LoggingLevel < c.m_levelFilter)
        return c;
    c.Write();
    return c;
};
Sorry I can't let you have all of it, but those three methods show the basics: I allocate a buffer, resize it if needed (m_size is the buffer size, m_length is the current text length), and keep it for the duration of the logging object. The buffer contents get written to a file (or OutputDebugString, or a listbox) in the endl method. I also have a logging 'level' to restrict output at runtime. So you just replace your calls to ostringstream with this, and the Write() method pumps the buffer to a file and clears the length. Hope this helps.
The problem could be memory allocation. malloc, which new uses internally, has an internal lock; you can see it if you step into it. Try using a thread-local allocator and see if the bad performance disappears.
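One sketch of the thread-local idea (this uses C++11 thread_local, which Visual Studio 2005 does not have; there you would need the TLS API, TlsAlloc and friends, instead): construct one ostringstream per thread and reuse it, so the per-iteration construction, locale setup, and buffer allocation (all potential lock sites) happen only once per thread.

#include <sstream>
#include <string>

std::string formatRow(int n) {
    thread_local std::ostringstream out;   // constructed once per thread
    out.str("");                           // reset contents for reuse
    out.clear();
    out << "[0";
    for (int j = 1; j < n; ++j)
        out << ", " << j;
    out << "]\n";
    return out.str();
}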