Clearing memory issue in C++

I have a C++ program that generates random 3D network structures. It works well, and if I run it manually (from the terminal) a couple of times, I get two different structures, as expected.
However, if I write a small loop to launch it 10 times in succession, it produces the exact same structure 10 times, which is not normal. If I add a sleep(1) line at the end of the code, it works again, so I guess it has something to do with C++ releasing the memory (I am absolutely not an expert, so I could be completely wrong).
The problem is that, by adding the sleep(1) command, it takes much more time to run (10x more). This is of course not an issue for 10 runs, but the aim is to do thousands of them.
Is there a way to force C++ to release the memory at the end of the code?

C++ does not release memory automatically at all (except for what destructors do), so that is not the cause.
But random number generators are usually seeded from a system clock counter (I may be wrong here).
In Pascal you have to call the randomize procedure to initialize the random generator with a seed. Without doing so, the random number generator produces the same results on every run, which is very much like your situation.
In C++ there is the srand function, which is typically seeded with the current time, as in the example here: http://en.cppreference.com/w/cpp/numeric/random/rand
I don't know how you initialize your random generator, but if you do it with a time that has only one-second resolution, and your code is fast enough to do 10 runs within one second, this could be the cause. It also explains how a one-second delay fixes the situation.
If that's the case, you can try a time function with higher resolution. Also, the C++11 standard library has a much more powerful random module (as do the Boost libraries, if you don't have C++11). Documentation is here: http://www.cplusplus.com/reference/random/
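For illustration, here is a minimal C++11 sketch (not the asker's actual code) that sidesteps the one-second-resolution problem by mixing std::random_device with a high-resolution clock tick when seeding:

#include <chrono>
#include <iostream>
#include <random>

int main() {
    // Seed from a non-deterministic source if available, mixed with a
    // high-resolution clock tick, so two runs started within the same
    // second still get different seeds.
    std::random_device rd;
    std::seed_seq seed{rd(), static_cast<unsigned>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count())};
    std::mt19937 gen(seed);

    std::uniform_int_distribution<int> dist(0, 99);
    for (int i = 0; i < 5; ++i)
        std::cout << dist(gen) << ' ';
    std::cout << '\n';
}

Two back-to-back runs of this should print different sequences even if they start within the same second.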

Related

To what extent do DLLs speed up calculations in code such as loops, etc.?

I have code with a loop that counts to 10000000000, and within that loop I do some calculations with conditional operators (if, etc.). It takes about 5 minutes to reach that number. So, my question is: can I reduce the time it takes by creating a DLL and calling functions in that DLL to do the calculations and return the values to the main program? Will it make a difference in the time it takes to do the calculations? Further, will it improve the overall efficiency of the program?
By a “DLL” I assume you mean going from managed .NET code to un-managed “native” compiled code. Yes, this approach can help.
It depends a lot. Remember, the loop code alone likely takes only about 25 seconds on a typical i3 (that is the cost and overhead of looping to 10 billion while doing little else).
I also assume you went to the project properties, then Compile. On that screen, select the advanced compile options. There you want to check “remove integer overflow checks”. Make sure your loop variables are integers, for speed.
At that point the “base” loop that does nothing will drop from about 20 seconds down to about 6 seconds.
So that is the base loop speed – now it comes down to what we are doing inside of that loop.
At this point, .NET DOES HAVE a JIT (a just-in-time native compiler). This means your source code goes to “CLR” code, and then in turn that code does get compiled down to native x86 assembly code. So this “does” get the source code down to REAL machine-code level. However, a JIT is certainly NOT as efficient, nor can it spend much “time” optimizing the code, since the JIT has to work on the “fly” without you noticing it. So C++ (or VB6, which runs about as fast as C++ when natively compiled) can certainly run faster, but the question then is by how much?
The optimizing compiler might get another doubling in speed for the actual LOOPING code, etc.
However, in BOTH cases (using .NET managed code, or code compiled down to native Intel code), they are BOTH LIKELY calling the SAME routines to do the math!
In other words, if 80% of the time is spent in “library” code that does the math etc., then calling such code from C++ or calling such code from .NET will make VERY LITTLE difference, since the BULK of the work is spent in the same system code!
The above concept is really “supervisor” mode vs. your application mode.
In other words, comparing the amount of time spent in your code vs. that spent in system “library” code shows that the bulk of the heavy lifting is occurring in supervisor code. That means jumping from .NET to native C++/VB6 DLLs will NOT yield much in the way of performance.
So I would first ensure that loops and array index references in your code are integer types. The above tip of removing the overflow checks will likely get you “close” to the performance of using a DLL. Worse, the time to “shuffle” the data to and from that external DLL routine will often cost you MORE than the time saved on the processing side.
And if your routines are doing database or file I/O, then all bets are off, as that is a VERY different problem.
So I would first test/try your application with the [x] remove integer overflow checks option checked (so the checks are turned off). And make sure during testing that you use Ctrl-F5 in place of F5, to run your code without DEBUGGING. The above overflow option will NOT show increased speed when running in debug mode.
So it's hard to tell – it really depends on how much math (especially floating-point calls) you are doing (supervisor code) vs. just moving values around in arrays. If most of the code is moving things around, then I suggest the integer optimizations above; going to a DLL likely will not help much.
Couldn't you utilize Parallel.ForEach and split this huge loop into some equal pieces?
Or try working with some BackgroundWorkers or even threads (more than one!) to achieve optimal CPU performance and reduce the time spent. A rough C++ sketch of the same idea follows.
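For illustration (the suggestion above is about .NET, but here is the equivalent idea in C++): a minimal, hypothetical sketch of splitting an independent-iteration loop across hardware threads with std::thread. The chunking and the per-iteration work are stand-ins, not the asker's code:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const std::uint64_t total = 10000000000ULL;   // the 10-billion-iteration loop
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::uint64_t> partial(nthreads, 0);   // one result slot per thread
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const std::uint64_t begin = total / nthreads * t;
            const std::uint64_t end   = (t == nthreads - 1) ? total
                                                            : total / nthreads * (t + 1);
            std::uint64_t local = 0;
            for (std::uint64_t i = begin; i < end; ++i)
                if (i % 3 == 0)          // stand-in for the real conditional work
                    ++local;
            partial[t] = local;          // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t sum = 0;
    for (auto p : partial) sum += p;
    std::cout << sum << '\n';
}

This only helps if the iterations are truly independent; if they share state, the synchronization cost can eat the gain.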

Capturing function exit time with __gnu_mcount_nc

I'm trying to do some performance profiling on a poorly supported prototype embedded platform.
I note that GCC's -pg flag causes thunks to __gnu_mcount_nc to be inserted on entry to every function. No implementation of __gnu_mcount_nc is available (and the vendor is not interested in assisting), however as it is trivial to write one that simply records the stack frame and current cycle count, I have done so; this works fine and is yielding useful results in terms of caller/callee graphs and most frequently called functions.
I would really like to obtain information about the time spent in function bodies as well; however, I am having difficulty understanding how to approach this when only the entry to each function, and not the exit, gets hooked: you can tell exactly when each function is entered, but without hooking the exit points you cannot know how much of the time until the next event should be attributed to the callee and how much to its callers.
Nevertheless, the GNU profiling tools are in fact demonstrably able to gather runtime information for functions on many platforms, so presumably the developers have some scheme in mind for achieving this.
I have seen some existing implementations that do things like maintain a shadow callstack and twiddle the return address on entry to __gnu_mcount_nc so that __gnu_mcount_nc will get invoked again when the callee returns; it can then match the caller/callee/sp triplet against the top of the shadow callstack and so distinguish this case from the call on entry, record the exit time and correctly return to the caller.
This approach leaves much to be desired:
it seems like it may be brittle in the presence of recursion and libraries compiled without the -pg flag
it seems like it would be difficult to implement with low overhead or at all in embedded multithreaded/multicore environments where toolchain TLS support is absent and current thread ID may be expensive/complex to obtain
Is there some obvious better way to implement a __gnu_mcount_nc so that a -pg build is able to capture function exit as well as entry time that I am missing?
gprof does not use that function for timing, of entry or exit, but for counting the calls from any function A to any function B.
Rather, it uses the self-time gathered by counting PC samples in each routine, and then uses the function-to-function call counts to estimate how much of that self-time should be charged back to callers.
For example, if A calls C 10 times, and B calls C 20 times, and C has 1000ms of self time (i.e. 100 PC samples), then gprof knows C has been called 30 times, and 33 of the samples can be charged to A, while the other 67 can be charged to B.
Similarly, sample counts propagate up the call hierarchy.
So you see, it doesn't time function entry and exit.
The measurements it does get are very coarse, because it makes no distinction between short calls and long calls.
Also, if a PC sample happens during I/O or in a library routine that is not compiled with -pg, it is not counted at all.
And, as you noted, it is very brittle in the presence of recursion, and can introduce notable overhead on short functions.
Another approach is stack-sampling, rather than PC-sampling.
Granted, it is more expensive to capture a stack sample than a PC-sample, but fewer samples are needed.
If, for example, a function, line of code, or any description you want to make, is evident on fraction F out of the total of N samples, then you know that the fraction of time it costs is F, with a standard deviation (in samples) of sqrt(N*F*(1-F)).
So, for example, if you take 100 samples, and a line of code appears on 50 of them, then you can estimate the line costs 50% of the time, with an uncertainty of sqrt(100*.5*.5) = +/- 5 samples or between 45% and 55%.
If you take 100 times as many samples, you can reduce the uncertainty by a factor of 10.
(Recursion doesn't matter. If a function or line of code appears 3 times in a single sample, that counts as 1 sample, not 3.
Nor does it matter if function calls are short - if they are called enough times to cost a significant fraction, they will be caught.)
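As a concrete check of that uncertainty arithmetic, here is a tiny sketch (only the formula comes from the answer above; the numbers are the worked example):

#include <cmath>
#include <cstdio>

int main() {
    const double N = 100.0;     // total stack samples taken
    const double hits = 50.0;   // samples on which the line of code appears
    const double F = hits / N;
    const double sigma = std::sqrt(N * F * (1.0 - F));   // std. dev. in samples
    std::printf("estimated fraction: %.0f%% +/- %.0f%% (one sigma)\n",
                100.0 * F, 100.0 * sigma / N);
    // prints: estimated fraction: 50% +/- 5% (one sigma)
}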
Notice, when you're looking for things you can fix to get speedup, the exact percent doesn't matter.
The important thing is to find it.
(In fact, you only need to see a problem twice to know it is big enough to fix.)
That's this technique.
P.S. Don't get suckered into call-graphs, hot-paths, or hot-spots.
Here's a typical call-graph rat's nest. Yellow is the hot-path, and red is the hot-spot.
And this shows how easy it is for a juicy speedup opportunity to be in none of those places:
The most valuable thing to look at is a dozen or so random raw stack samples, and relating them to the source code.
(That means bypassing the back-end of the profiler.)
ADDED: Just to show what I mean, I simulated ten stack samples from the call graph above, and here's what I found
3/10 samples are calling class_exists, one for the purpose of getting the class name, and two for the purpose of setting up a local configuration. class_exists calls autoload which calls requireFile, and two of those call adminpanel. If this can be done more directly, it could save about 30%.
2/10 samples are calling determineId, which calls fetch_the_id which calls getPageAndRootlineWithDomain, which calls three more levels, terminating in sql_fetch_assoc. That seems like a lot of trouble to go through to get an ID, and it's costing about 20% of time, and that's not counting I/O.
So the stack samples don't just tell you how much inclusive time a function or line of code costs, they tell you why it's being done, and what possible silliness it takes to accomplish it.
I often see this - galloping generality - swatting flies with hammers, not intentionally, but just following good modular design.
ADDED: Another thing not to get sucked into is flame graphs.
For example, here is a flame graph (rotated right 90 degrees) of the ten simulated stack samples from the call graph above. The routines are all numbered, rather than named, but each routine has its own color.
Notice that the problem we identified above, with class_exists (routine 219) being on 30% of the samples, is not at all obvious by looking at the flame graph.
More samples and different colors would make the graph look more "flame-like", but that does not expose routines which take a lot of time by being called many times from different places.
Here's the same data sorted by function rather than by time.
That helps a little, but it doesn't aggregate similar routines called from different places:
Once again, the goal is to find the problems that are hiding from you.
Anyone can find the easy stuff, but the problems that are hiding are the ones that make all the difference.
ADDED: Another kind of eye-candy is this one:
where the black-outlined routines could all be the same, just called from different places.
The diagram doesn't aggregate them for you.
If a routine has high inclusive percent by being called a large number of times from different places, it will not be exposed.

RNG crashing c++ program

I am currently coding a roguelike, and naturally am using a lot of random number generation.
The problem I'm running into is that if I "overheat" rand(), my program will crash.
If I'm only generating 20 or so ints per frame, it's fine... but when the number of random numbers goes into the hundreds, the program crashes. The more I produce every frame, the sooner it crashes... which leads me to believe there is some pileup going on.
I've done tests, and at 20 rand() calls per frame, it will run for 24 hours straight at max speed without crashing. Triple that and it doesn't make it ten minutes.
If I put srand() in the initialization, I can churn out thousands of random numbers before it locks up - but if I put srand() within the frame itself, I make it about 2-8 frames. If it matters, I'm using time(NULL) to seed.
The more frequently I call rand(), the sooner it crashes.
Help?
The function rand() is not reentrant or thread-safe, since it uses hidden state that is modified on each call. This might just be the seed value to be used by the next call, or it might be something more elaborate. In order to get reproducible behavior in a threaded application, this state must be made explicit. The function rand_r() is supplied with a pointer to an unsigned int, to be used as state. This is a very small amount of state, so this function will be a weak pseudo-random generator. Try drand48_r(3) instead.
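For example, a minimal sketch of the rand_r() approach, where each thread owns its own explicit state (rand_r() is POSIX, not standard C++, and the threading setup here is just an illustration, not the asker's engine):

#include <cstdio>
#include <ctime>
#include <stdlib.h>   // rand_r() is declared here on POSIX systems
#include <thread>

void worker(int id) {
    // Each thread keeps its own state, so no hidden shared state is touched.
    unsigned int state = static_cast<unsigned int>(std::time(nullptr)) + id;
    long sum = 0;
    for (int i = 0; i < 100000; ++i)
        sum += rand_r(&state) % 100;
    std::printf("thread %d done, checksum %ld\n", id, sum);
}

int main() {
    std::thread a(worker, 1), b(worker, 2);
    a.join();
    b.join();
}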
A few comments and ideas on how to narrow down the source of the issue:
It is almost certainly not the srand() or rand() functions causing the crash/lock-up. Chances are that one, or more, combinations of random numbers is getting your engine into a state where something bad happens.
The first step should be to duplicate the issue such that it always happens at the same time/place. Instead of seeding with the time, try using a constant seed like srand(12345). Depending on what other factors your engine uses (like user input), this may be enough to get it to crash in the same spot each time.
If using the debugger is having issues (which is suspicious in itself; perhaps a buffer overflow is corrupting the stack), use the tried-and-true method of outputting messages to a text log file. I would suggest logging all the random numbers generated; perhaps you will see a pattern in when it crashes (e.g., it crashes whenever a "42" is generated). A rough sketch of such a logging wrapper follows this answer. Another option is to start adding a few log messages in various functions (start with high-level functions like your game update loop). After a crash, check the log and keep adding log messages until you narrow it down to one line/function. This is not as quick as using the debugger can be, but it is sometimes a better choice, especially if you don't really know where to start looking.
Once you are able to reliably replicate the crash, start removing things until the crash point changes or disappears. This may involve #ifdefs, commenting out code, setting application options, or even creating a temporary copy of the project so you can simply delete code, compile, and test. This may be difficult if the project is large/complex.
More information on the type of "crash" would be helpful. Usually programs don't just crash generically; a specific exception occurs, they lock up, etc. Exception details can help you narrow down the source of the issue with some effort.
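For reference, here is a minimal sketch of the kind of logging wrapper mentioned above (the function name and log file name are made up for illustration):

#include <cstdio>
#include <cstdlib>

// Hypothetical drop-in replacement for rand() that logs every value produced,
// so the last lines of the log show what was generated just before a crash.
int logged_rand() {
    static std::FILE* log = std::fopen("rand_log.txt", "w");
    const int value = std::rand();
    if (log) {
        std::fprintf(log, "%d\n", value);
        std::fflush(log);   // flush immediately so the log survives a crash
    }
    return value;
}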
Try running it under a debugger
$ gdb myprog
(gdb) break main
(gdb) run
(gdb) record
e.g.
(gdb) break abort
(gdb) break exit
Since it is C++:
(gdb) catch throw
(gdb) catch catch
and finally
(gdb) continue
When it stops, reverse-continue until you find the culprit
Option 2:
valgrind --tool=massif --massif-out-file="massif.out.%p" myprog
ms_print massif.out.*
to examine the heap profile. It's not unlikely that you have a memory leak.
It's possible that the high number of calls to rand() eventually comes up with a number in a relatively small range which your code cannot handle. Try replacing your calls to rand() with a function that just increments a number and returns it, and see if it eventually fails.
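Something like this minimal, hypothetical stand-in would do for that test:

#include <cstdlib>

// Deterministic stand-in for rand(): returns 0, 1, 2, ... and wraps at RAND_MAX,
// so every value in rand()'s range gets exercised eventually.
int counting_rand() {
    static int next = 0;
    const int value = next;
    next = (next == RAND_MAX) ? 0 : next + 1;
    return value;
}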
You probably shouldn't be using rand(). There are far better PRNGs out there. Have a look at Boost.Random.
You should only srand() once, not every frame.
Find out where your code crashes. Using a debugger that's fairly easy: just start the program with the debugger attached and wait until it crashes.
After you have found out where it crashes, find out why it crashes.
After you have found out why, fix it. It probably doesn't have anything to do with rand().

Using gprof with sockets

I have a program I want to profile with gprof. The problem (seemingly) is that it uses sockets. So I get things like this:
::select(): Interrupted system call
I hit this problem a while back, gave up, and moved on. But I would really like to be able to profile my code, using gprof if possible. What can I do? Is there a gprof option I'm missing? A socket option? Is gprof totally useless in the presence of these types of system calls? If so, is there a viable alternative?
EDIT: Platform:
Linux 2.6 (x64)
GCC 4.4.1
gprof 2.19
The socket code needs to handle interrupted system calls regardless of the profiler, but under a profiler it's unavoidable, because the profiling timer signal interrupts blocking calls such as select(). This means having code like
if ( errno == EINTR ) { ...
after each system call.
Take a look, for example, here for the background.
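For example, a minimal retry wrapper around select() (a sketch, not the asker's code; error handling other than EINTR is left to the caller):

#include <cerrno>
#include <sys/select.h>

// Keep retrying select() when it is interrupted by a signal (e.g. the
// profiling timer) and let any other error propagate to the caller.
int select_retry(int nfds, fd_set* readfds, fd_set* writefds,
                 fd_set* exceptfds, struct timeval* timeout) {
    int rc;
    do {
        rc = select(nfds, readfds, writefds, exceptfds, timeout);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

Note that on Linux, select() may modify the timeout, so a retried call waits only for the remaining time.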
gprof (here's the paper) is reliable, but it was only ever intended to measure changes, and even for that, it only measures CPU-bound issues. It was never advertised to be useful for locating problems. That is an idea that other people layered on top of it.
Consider this method.
Another good option, if you don't mind spending some money, is Zoom.
Added: If I can just give you an example. Suppose you have a call hierarchy where Main calls A some number of times, A calls B some number of times, B calls C some number of times, and C waits for some I/O with a socket or file, and that's basically all the program does. Now, further suppose that the number of times each routine calls the next one down is 25% more than it really needs to be. Since 1.25^3 is about 2, that means the entire program takes twice as long to run as it really needs to.
In the first place, since all the time is spent waiting for I/O, gprof will tell you nothing about how that time is spent, because it only looks at "running" time.
Second, suppose (just for argument) it did count the I/O time. It could give you a call graph, basically saying that each routine takes 100% of the time. What does that tell you? Nothing more than you already know.
However, if you take a small number of stack samples, you will see on every one of them the lines of code where each routine calls the next.
In other words, it's not just giving you a rough percentage time estimate, it is pointing you at specific lines of code that are costly.
You can look at each line of code and ask if there is a way to do it fewer times. Assuming you do this, you will get the factor of 2 speedup.
People get big factors this way. In my experience, the number of call levels can easily be 30 or more. Every call seems necessary, until you ask if it can be avoided. Even small numbers of avoidable calls can have a huge effect over that many layers.

C++ Asymptotic Profiling

I have a performance issue where I suspect one standard C library function is taking too long and causing my entire system (a suite of processes) to basically "hiccup". Sure enough, if I comment out the library function call, the hiccup goes away. This prompted me to investigate: what standard methods are there to prove this type of thing? What would be the best practice for testing a function to see if it causes an entire system to hang for a second (causing other processes to be momentarily starved)?
I would at least like to definitively correlate the function being called and the visible freeze.
Thanks
The best way to determine this is to use a profiling tool to get information on how much time is spent in each function call.
Failing that, set up a function that reserves a block of memory. Then, at various points in your code, write a string into that memory that includes the current time. (This avoids the delays associated with writing to the display.)
After you have run your code, pull out the memory and parse it to determine how long parts of your code are taking.
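A rough sketch of that idea (the buffer size, clock, and label names are assumptions, not the answerer's code):

#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Pre-reserved in-memory event log: appending a record is cheap compared
// with writing to the console or a file while the program is running.
struct MemLog {
    std::vector<std::string> entries;
    MemLog() { entries.reserve(100000); }            // reserve capacity up front

    void mark(const char* label) {
        auto now = std::chrono::steady_clock::now().time_since_epoch();
        auto us  = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
        entries.push_back(std::to_string(us) + " " + label);
    }

    void dump() const {                              // inspect after the run
        for (const auto& e : entries)
            std::puts(e.c_str());
    }
};

int main() {
    MemLog log;
    log.mark("before suspect call");
    // ... call the suspect C library function here ...
    log.mark("after suspect call");
    log.dump();
}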
I'm trying to figure out what you mean by "hiccup". I'm imagining your program does something like this:
while (...) {
    // 1. do some computing and/or file I/O
    // 2. print something to the console or move something on the screen
}
and normally the printed or graphical output hums along in a subjectively continuous way, but sometimes it appears to freeze, while the computing part takes longer.
Is that what you meant?
If so, I suspect that in the normal running state it is almost always in step 2, but in the hiccup state it is spending time in step 1.
I would comment out step 2, so it would spend nearly all its time in the hiccup state, and then just pause it under the debugger to see what it's doing.
That technique tells you exactly what the problem is with very little effort.
