D compiler profiling

How can I figure out which parts of my D code take a long time to compile?
I tried using valgrind, but the method names were not very insightful: 87% of the time was spent in <cycle 7>, and 40% of the time in _D4ddmd5lexer5Lexer4scanMFPS4ddmd6tokens5TokenZv.
I'm looking for something like this: 40% of the time was spent on xy.d; of that, 80% went to compiling various instantiations of template xyz, and the reason is that it spent 99% of its time in memcpy.
I'm interested in profiling both DMD and LDC.

As the D compiler front end is written in D, profiling it with conventional tools is rather hard compared to something like C++. I have had some success with tools like gdb and valgrind on Linux and tools like VisualD on Windows; Mac users are kind of SOL.
You have five other options:
Stop trying to find the specific function in the compiler and turn to common knowledge about the problem (see below)
Use a tool like https://github.com/CyberShadow/DBuildStat. It doesn't give you the exact answer you're asking about, but if you're trying to get a large project to compile faster it's better than nothing.
Use the -v flag to see which parts of your program take a while to get through the compiler. Granted, this is a very brute-force approach and can take some time.
Modify the makefile of the DMD front-end to build the compiler with the -profile switch. Every time you run DMD you will then get a profile file with a lot of information. Granted, I don't think this has ever been tried; your mileage may vary.
Try asking the LDC team about this on their GitHub issues page. IIRC they made a patched version for profiling that they used on the Weka.io codebase.
When I say turn to common knowledge, I mean to say that your slow compilation is likely due to a few common problems. For example, when an SQL query is taking too long, my first reaction is not to try to profile the MySQL server code. Here are a couple of the most common issues
CTFE, while it speeds up your runtime, is slow. That's especially true if you're doing recursive templates like allSatisfy or using functions like ctRegex. If you're doing heavy CTFE and you want faster compiles at the price of possibly slower code, consider switching those to run-time calls.
DMD doesn't (yet) ignore symbols which aren't used in your program, meaning that if you import a module, code-gen happens for all of the functions in that module. This is true even for selective imports. If you don't use them, the linker will prune the functions from the resulting executable, but the compiler still took time to compile them. Avoid broad imports like import std.algorithm; or import std.range;. Instead use selective imports from specific modules, like import std.algorithm.iteration : map;.

How to deliberately slow down compilation?

There are a lot of questions asking how to speed up compilation of C++ code. I need to do the opposite.
I'm working with software that monitors compiler invocations in order to do static code analysis. But if the compiler process exits too quickly, the monitoring software can miss it. So I need to slow compilation down. I understand that's a terrible solution and hope it will be temporary.
I came up with two solutions:
Disable parallel build, enable preprocessor and compiler listing generation. It works but requires a lot of mouse clicking
Use a compiler option to force inclusion of a special header file that somehow slows compilation down.
Unfortunately I couldn't come up with something simple to write and hard to compile at the same time. Using a lot of #warning seems to work but obviously clutters the output significantly.
I'm using Keil with the armcc compiler, so I can use most of C++11, but the maximum template recursion depth is just 63.
Preferably this should not produce any overhead for binary size or running time.
UPD: I'll try to clarify this a bit. I know that's a horrible idea, I know that this problem should be solved differently. I will try to solve it differently but I also want to explore this possibility.
Maybe this solution will be slow enough =), along the lines of what @NathanOliver proposed.
It's a compile-time sine table that I use. It requires extra space, but you can tune it a little (the table size and sine accuracy are template parameters of the "staticSinus" function; hopefully you'll find what works best for you).
https://godbolt.org/z/DYZDF5
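For reference, here is a minimal sketch of the same idea, not the code behind the godbolt link (the answer's function is called staticSinus; the names below are made up). It assumes relaxed constexpr (C++14 or later); on a strict C++11 compiler the same trick needs recursive templates instead.
#include <cstddef>
// Compile-time sine table: every value is computed during compilation,
// so a bigger N / more Taylor terms means a slower build.
constexpr double taylorSin(double x, int terms) {
    double term = x, sum = x;
    for (int n = 1; n < terms; ++n) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}
template <std::size_t N>
struct SineTable {
    double value[N];
};
template <std::size_t N, int Terms>
constexpr SineTable<N> makeSineTable() {
    SineTable<N> t{};
    for (std::size_t i = 0; i < N; ++i)
        t.value[i] = taylorSin(6.283185307179586 * i / N, Terms);
    return t;
}
// Forcing the result into a constexpr variable makes the evaluation happen
// at compile time; tune N and Terms to make the build as slow as you need.
constexpr auto kSine = makeSineTable<4096, 25>();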
You don't want to do anything of the sort. Here are some solutions, of varying degrees of kludginess:
Ideal solution: invoke the code analysis from the Makefile.
Replace the compiler with e.g. a Python script that forwards the command line to the compiler, then triggers the analysis tool (see the sketch below).
Monitor make instead of the compiler - it tends to live longer.
Have a tiny wrapper script maintain a reference count in shared memory, and when the reference count is initially incremented, the wrapper should go to sleep for "long enough" after the compiler has finished. Monitor that script.
In a nutshell: the monitoring tool shouldn't be monitoring anything. The code analysis should be invoked from the build tool, i.e. given in the Makefile. If generating the Makefile by hand is too cumbersome, use cmake with ninja, or xmake with no dependencies. You can also generate whatever "project" file the IDE needs to make working on the project easier. But make something other than Keil-specific stuff the source of truth for the project: it'll make everything easier from then on.
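If you do end up wrapping the compiler, here is a rough sketch of the idea; the answer above suggests a Python script, and this is the same thing in C++ purely for illustration. The paths to the real compiler and to the analysis tool are placeholders, and argument quoting and exit-code propagation are only approximate.
// Hypothetical wrapper installed in place of the compiler: it forwards its
// command line to the real compiler, waits for it to finish, then kicks off
// the analysis tool. Both paths below are placeholders.
#include <cstdlib>
#include <string>
int main(int argc, char** argv) {
    std::string cmd = "\"/path/to/real/armcc\"";
    for (int i = 1; i < argc; ++i) {
        cmd += " \"";
        cmd += argv[i];
        cmd += "\"";
    }
    int status = std::system(cmd.c_str());   // run the real compile
    // Trigger the analysis once the compile has finished, so there is no
    // short-lived process for the monitoring tool to miss.
    std::system("\"/path/to/analysis-tool\"");
    return status == 0 ? 0 : 1;              // crude exit-code propagation
}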

How to export function names and variable names using GCC or clang?

I am making commercial software and I don't want it to be easily crackable. It is targeted at Linux and I am compiling it using GCC (8.2.1). The problem is that when I compile it, technically anyone can use a disassembler like IDA or Binary Ninja to see all of the function names.
Is there any way to protect my program from this kind of reverse engineering? Is there any way of exporting all of these function names and variable names from the code automatically (with GCC or clang), so I can make a simple script to change them to completely random names before compilation?
So you want to hide/mask the names of symbols in your binary. You've decided that, to do this, you need to get a list of them so that you can create a script to modify them. Well, you could get that list with nm but you don't need any of that (rewriting names inside a compiled binary? oof… recipe for disaster).
Instead, just do what everybody does in a release build and strip the symbols! You'll see a much smaller binary, too. Of course this doesn't prevent reverse engineering (nothing does), though it arguably makes said task more difficult.
Honestly you should be stripping your release binaries anyway, and not to prevent cracking. Common wisdom is not to try too hard to prevent cracking, because you'll inevitably fail, and at the cost of wasted dev time in the attempt (and possibly a more complex codebase that's harder to maintain / a more complex executable that is less fast and/or useful for the honest customer).

How to add compilation for profiling to static library?

My project currently has a library that is statically linked (compiled with gcc and linked with ar), and I am trying to profile my whole project with gprof, including this statically linked library. Is there any way of going about doing this?
Gprof requires that you provide -pg to GCC for compilation and -pg to the linker. However, ar complains when -pg is added to the list of flags for it.
I haven't used gprof in a long time, but is -pg even a valid argument to ar? Does profiling work if you compile all of the objects with -pg, then create your archive without -pg?
If you can't get gprof to work, gperftools contains a CPU profiler which I think should work very well in this case. You don't need to compile your application with any special flags, and you don't need to try to change how your static library is linked.
Before starting, there are two tradeoffs involved with using gperftools that you should be aware of:
gperftools is a sampling profiler. As such, your results won't be 100% accurate, but they should be really good. The big upside to using a sampling profiler is that it won't really slow your application down.
In multithreaded applications, in my experience, gperftools will only profile the main thread. The only way I've been able to successfully profile worker threads is by adding profiling code to my application (see the sketch below). With that said, profiling the main thread shouldn't require any code changes.
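On that second point, here is a minimal sketch of what "adding profiling code" can look like. ProfilerStart and ProfilerStop are the real gperftools API (from <gperftools/profiler.h>, linked with -lprofiler); the worker-thread scaffolding around them is just an illustration, and whether samples actually land on worker threads can still vary by platform.
// Sketch: drive the gperftools CPU profiler explicitly from application code
// instead of relying on $LD_PRELOAD. Build with: g++ app.cpp -lprofiler -pthread
#include <gperftools/profiler.h>
#include <thread>
// Hypothetical worker whose body we want to see in the profile.
void workerBody() {
    ProfilerStart("/tmp/worker_prof.out");  // start sampling, write to this file
    // ... the expensive work goes here ...
    ProfilerStop();                         // stop sampling and flush the file
}
int main() {
    std::thread worker(workerBody);
    worker.join();
    return 0;
}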
There are lots of different ways to use gperftools. My preferred way is to load the gperftools library with $LD_PRELOAD, specify a logging destination with $CPUPROFILE, and maybe bump up the sample frequency with $CPUPROFILE_FREQUENCY before starting my application up. Something like this:
export LD_PRELOAD=/usr/lib/libprofiler.so
export CPUPROFILE=/tmp/prof.out
export CPUPROFILE_FREQUENCY=10000
./my_application
This will write a bunch of profiling information to /tmp/prof.out. You can run a post-processing script to convert this file into something human readable. There are lots of supported output formats -- my preferred one is callgrind:
google-pprof --callgrind /path/to/my_application /tmp/prof.out > callgrind.dat
kcachegrind callgrind.dat &
This should provide a nice view of where your program is spending its time.
If you're interested, I spent some time over the weekend learning how to use gperftools to profile I/O bound applications, and I documented a lot of my findings here. There's a lot of overlap with what you're trying to do, so maybe it will be helpful.

What's a very easy C++ profiler (VC++)?

I've used a few profilers in the past and never found them particularly easy. Maybe I picked bad ones, maybe I didn't really know what I was expecting!
But I'd like to know if there are any 'standard' profilers which simply drop in and work? I don't believe I need massively fine-detailed reports, just to pick up major black-spots. Ease of use is more important to me at this point.
It's VC++ 2008 we're using (I run standard edition personally). I don't suppose there are any tools in the IDE for this, I can't see any from looking at the main menus?
I suggest a very simple method (which I learned from reading Mike Dunlavey's posts on SO):
Just pause the program.
Do it several times to get a reasonable sample. If a particular function is taking half of your program's execution time, the odds are that you will catch it in the act very quickly.
If you improve that function's performance by 50%, then you've just improved overall execution time by 25%. And if you discover that it's not even needed at all (I have found several such cases in the short time I've been using this method), you've just cut the execution time in half.
I must confess that at first I was quite skeptical of the efficacy of this approach, but after trying it for a couple of weeks, I'm hooked.
VS built in:
If you have team edition you can use the Visual Studio profiler.
Other options:
Otherwise check this thread.
Creating your own easily:
I personally use an internally built one based on the Win32 API QueryPerformanceCounter.
You can make something nice and easy to use within a hundred lines of code or less.
The process is simple: put a macro called PROFILE_FUNC() at the top of each function that you want to profile; it adds to internally managed stats. Then have another macro called PROFILE_DUMP() which dumps the output to a text document.
PROFILE_FUNC() creates an object that uses RAII to log the amount of time until the object is destroyed. Both the constructor and the destructor of this RAII object call QueryPerformanceCounter. You can also leave these lines in your code and control the behavior via a #define PROFILING_ON.
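A minimal sketch of such a helper is below. The macro names match the description above; everything else (the stats map, the output format) is made up for illustration and is not a drop-in library.
// RAII-based profiling helper built on QueryPerformanceCounter.
#include <windows.h>
#include <cstdio>
#include <map>
#include <string>
struct FuncStats {
    long long calls;
    long long ticks;
    FuncStats() : calls(0), ticks(0) {}
};
inline std::map<std::string, FuncStats>& profileTable() {
    static std::map<std::string, FuncStats> table;  // function name -> accumulated stats
    return table;
}
class ScopedProfile {
public:
    explicit ScopedProfile(const char* name) : name_(name) {
        QueryPerformanceCounter(&start_);           // timestamp on entry
    }
    ~ScopedProfile() {
        LARGE_INTEGER end;
        QueryPerformanceCounter(&end);              // timestamp on exit
        FuncStats& s = profileTable()[name_];
        ++s.calls;
        s.ticks += end.QuadPart - start_.QuadPart;
    }
private:
    const char* name_;
    LARGE_INTEGER start_;
};
inline void profileDump() {
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);               // ticks per second
    std::FILE* f = std::fopen("profile.txt", "w");
    if (!f) return;
    for (std::map<std::string, FuncStats>::const_iterator it = profileTable().begin();
         it != profileTable().end(); ++it) {
        double ms = 1000.0 * it->second.ticks / freq.QuadPart;
        std::fprintf(f, "%s: %lld calls, %.3f ms total\n",
                     it->first.c_str(), it->second.calls, ms);
    }
    std::fclose(f);
}
// Build with PROFILING_ON defined to enable the instrumentation; without it
// the macros expand to nothing and add zero overhead.
#ifdef PROFILING_ON
#define PROFILE_FUNC() ScopedProfile profileScopeLocal(__FUNCTION__)
#define PROFILE_DUMP() profileDump()
#else
#define PROFILE_FUNC()
#define PROFILE_DUMP()
#endif
// Usage: put PROFILE_FUNC() at the top of each function of interest and
// call PROFILE_DUMP() once before the program exits.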
I always used AMD CodeAnalyst; I find it quite easy to use and it gives interesting results. I always used the time-based profile, which I found cooperates well with my apps' debug information, letting me find where the time is spent at the procedure, C++ instruction, and single assembly instruction level.
I used lt prof in the past for a quick rundown of my C++ app. It is pretty easy to use and runs against a compiled program; it does not need any source code hooks or tweaks. There is a trial version available, I believe.
A very simple (and free) way to profile is to install the Windows debuggers (cdb/windbg), set a bp on the place of interest, and issue the wt command ("Trace and Watch Data"). Check out MSDN for more info.
Another super simple and useful profiling workflow that works in any programming language is to comment out blocks of code. After commenting out all of them, uncomment some and run your program to see the performance. If your program starts to run very slowly when some code has been uncommented, then you'll probably want to check the performance there.