Profiling static vs dynamic typing in Hack/PHP

I'm working in Hack and am trying to figure out whether code runs faster if it's typed, since the language supports both dynamic and static typing depending on the header of the file.
What tests would you run to see the efficiency differences between the two?

A recent paper by Facebook Research describing the current state of the HHVM JIT clarifies (emphasis mine):
§2.1: [Hack's] richer type hints are only used by a static type checker and are discarded by the HHVM runtime. The reason for discarding these richer type hints at runtime is that Hack’s gradual type system is unsound. The static type checker is optimistic and it ignores many dynamic features of the language. Therefore, even if a program type checks, different types can be observed at runtime.
The paper goes on to explain how the JIT uses type information to optimize programs, but for the moment that type information doesn't come from type hints. Still a great read, since it seems that you're invested in the performance of HHVM. I also want to point out that optimizing based on type hints has been requested for some time without being implemented, which suggests that such optimizations will not come anytime soon.

Related

Is runtime interpreter really part of C program execution?

As we know, C is a compiled language. The Wikipedia article on the C language says:
It was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support. It also says that by design, C provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, including operating systems, as well as various application software for computers ranging from supercomputers to embedded systems.
But then I read Thinking in C++, Volume 2 by Bruce Eckel, which says in the chapter titled Iostreams (I've omitted some parts):
The big stumbling block is the runtime interpreter used for the variable-argument list functions. This is the code that parses through your format string at runtime and grabs and interprets arguments from the variable argument list. It’s a problem for four reasons. Because the interpretation happens at runtime there’s a performance overhead you can’t get rid of. It’s frustrating because all the information is there in the format string at compile time, but it’s not evaluated until runtime. However, if you could parse the arguments in the format string at compile time you could make hard function calls that have the potential to be much faster than a runtime interpreter (although the printf() family of functions is usually quite well optimized).
This link also says:
More type-safe: With <iostream>, the type of object being I/O'd is known statically by the compiler. In contrast, cstdio uses "%" fields to figure out the types dynamically.
Before reading this I thought that no interpreter is used in a compiled language like C. Is it really true that a runtime interpreter is also present during the execution of a C program? Was I wrong before reading this? Is the overhead of this runtime interpretation really that large compared to iostreams?
What?
This is not runtime interpretation of your code; it happens only inside the functions that use format strings.
Of course those functions have to loop over the format string to learn about the arguments and the desired formatting, which takes time.
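To make the distinction concrete, here is a tiny sketch (purely illustrative, not a benchmark) showing the two styles side by side; the cstdio call has to scan the "%d" at runtime, while the iostreams call is resolved by overload resolution at compile time:

// Toy comparison, not a benchmark: both lines print an int, but cstdio
// discovers the type by scanning "%d" at runtime, while iostreams picks
// operator<<(ostream&, int) at compile time via overload resolution.
#include <cstdio>
#include <iostream>

int main() {
    int answer = 42;

    // The printf implementation walks this string character by character
    // at runtime and pulls one int out of the variadic argument list.
    std::printf("answer = %d\n", answer);

    // Here the compiler already knows `answer` is an int, so the right
    // overload is chosen statically; no format string is interpreted.
    std::cout << "answer = " << answer << '\n';
}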

Why doesn't compiler help us with type traits instead of resorting to language quirks?

Type traits are cool and I've used them since they originated in Boost a few years ago. However, when you look at their implementation, it is a pile of language quirks (check out the "How is_base_of works?" StackOverflow thread).
Why won't the compiler help here? For example, if you want to check whether some class is a base of another, the compiler already knows that, so why can't it tell us? This would make things like concepts so much easier to implement and use. You could use language constructs right there.
I am not sure, but I am guessing that it would also improve performance in general. It is like asking the compiler for help instead of the C++ language.
I suspect that the primary reason will sound something like "we need to maintain backwards compatibility" and I agree, but why won't the compiler be more active in generating generic templated code?
Actually... some do.
The thing is that if something can be implemented in pure C++ code, there is no reason, other than simplifying the code, to hard-wire it into the compiler. It is then a matter of trade-offs: is the value brought by the code simplification worth the hard-wiring?
This depends on several points:
correctness (sometimes a library can only partially emulate the trait)
complexity of the code ~~ maintenance burden
performance
...
Once all those points have been weighed, you can determine whether it's more advantageous to put things in the library or in the compiler; the most likely outcome is a mixed strategy: a couple of intrinsics in the compiler used as building blocks to provide the required interface in the library.
Note that the maintenance burden is much more significant in a compiler: any C++ developer sufficiently acquainted with the language can delve into a library implementation, whereas the compiler code is a black box. Therefore, it will be much easier to debug and patch the library than the compiler, so there is an incentive not to put things in the compiler unless you have a very good reason to.
It's hard to give an objective answer here, but I suspect the following.
The code using language quirks to find out this stuff has often already been written (Boost, etc).
The compiler does not have to be changed to implement this if it can be done with language quirks (which saves a lot of time in writing, compiling, debugging and testing).
It's basically a "don't fix what isn't broken" mentality.
Compiler help for type traits has always been a design goal. TR1 formally introduced type traits, and included a section that described acceptable incorrect results in some cases to enable writing the type traits in straight C++. When those type traits were added to C++11 (with some name changes that don't affect their implementation) the allowance for incorrect results was removed, effectively requiring compiler help to implement some of them. And even for those that can be implemented in straight C++, compiler writers prefer intrinsics to complicated templates so that you don't have to put a drip pan under your computer to catch the slag as the overworked compiler causes the computer to melt down.
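As a hedged sketch of that mixed library/compiler strategy: my_is_base_of below is a made-up library trait, while __is_base_of is a real intrinsic provided by GCC, Clang and MSVC (and typically used to implement std::is_base_of), assuming a C++11 compiler:

// Sketch of the "intrinsic as building block" approach: my_is_base_of is a
// hypothetical library trait; __is_base_of is a compiler intrinsic provided
// by GCC, Clang and MSVC, so no overload-resolution trickery is needed.
#include <type_traits>

template <class Base, class Derived>
struct my_is_base_of
    : std::integral_constant<bool, __is_base_of(Base, Derived)> {};

struct Shape {};
struct Circle : Shape {};

static_assert(my_is_base_of<Shape, Circle>::value, "Circle derives from Shape");
static_assert(!my_is_base_of<Circle, Shape>::value, "Shape does not derive from Circle");

int main() {}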

Open-source C++ scanning library

Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function in such a way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C or C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then it wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: Sending help requests to mailing lists or associated forums is likely to get a response in less than 2 days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that Clang Index is a close fit. However, I don't think that it stores data in a database.
Anyway, the Clang framework offers what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing/indexing capabilities. And since it's provided as a set of reusable libraries... it was crafted to be built upon!
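To give a feel for what building on Clang looks like, here is a minimal sketch using the libclang C API (assuming libclang is installed; the file name, build line and output format are illustrative) that parses a single translation unit and prints every call expression it finds, which is the raw material for a "who calls what" index:

// Minimal libclang sketch: parse one file and list its call expressions.
// Build (paths illustrative): clang++ list_calls.cpp -lclang
#include <clang-c/Index.h>
#include <cstdio>

static CXChildVisitResult visit(CXCursor cursor, CXCursor /*parent*/,
                                CXClientData /*data*/) {
    if (clang_getCursorKind(cursor) == CXCursor_CallExpr) {
        CXString name = clang_getCursorSpelling(cursor);
        CXSourceLocation loc = clang_getCursorLocation(cursor);
        unsigned line = 0, column = 0;
        clang_getSpellingLocation(loc, nullptr, &line, &column, nullptr);
        std::printf("call to %s at %u:%u\n", clang_getCString(name), line, column);
        clang_disposeString(name);
    }
    return CXChildVisit_Recurse;   // walk the whole AST
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    CXIndex index = clang_createIndex(/*excludeDeclsFromPCH=*/0,
                                      /*displayDiagnostics=*/1);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], /*command_line_args=*/nullptr, 0,
        /*unsaved_files=*/nullptr, 0, CXTranslationUnit_None);
    if (tu) {
        clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, nullptr);
        clang_disposeTranslationUnit(tu);
    }
    clang_disposeIndex(index);
}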
I have to admit that I haven't used either, because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that I don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for the results of code analysis (metrics, graphs, ...), why not use a tool (instead of an API) to do that? If you can, I suggest you take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could be the answer to some of your constraints, but not all; for example, Doxygen's analysis is not incremental.

RTTI and Portability in C++

If a compiler doesn't "support" RTTI, does that mean that the compiler can not handle class hierarchies that have virtual functions in them? Or have I been misunderstanding the literature about how RTTI isn't portable, and the issues lie elsewhere?
Thank you all for your comments!
This is probably way more of an answer than you were looking for, but here goes:
RTTI not being "portable" means that if you use compiler A to build dynamic library A, and use compiler B to build application B that links with A, then you cannot use RTTI, because the RTTI implementations of compilers A and B are different. Virtual functions are affected only because the virtual function mechanism may not be binary compatible either.
This issue was very important in the mid 90's, but it is now obsolete. Not because compilers have all become binary compatible with each other, but rather the opposite: C++ developers have now recognized that C++ libraries must be delivered as source code, not as linkable libraries. For those who view C++ as an extension of C, this is very discomforting, but for more modern programmers, who grew up in an open source environment, it is nothing special at all.
What changed between the mid-90's and now is differing attitudes about what constitutes valuable intellectual property and what doesn't. To wit: there is actually a patent registered with the USPTO on "expression templates." Even the holder of that patent realizes it is unenforceable.
C-style "header and binary" libraries were long seen as a way to obfuscate valuable source code. More and more, business came to recognize that the obfuscation was more self-defeating than protective: there is very little code out there that meets "valuable IP" status. Most people buy libraries not because of the special IP they contain, but because it is cheaper to buy than to roll their own. In fact, expertise in applying IP is far more valuable than the IP itself. But if no one cares about this IP because they don't know about it, then it is not worth very much.
This is how open source works: IP is freely distributed, and in return the distributors earn consultancy fees for applying that IP. Those who can figure it out for themselves profitably -- well, good on them. But that is not the norm. What actually happens is that a developer understands the IP and sells their employer on buying the product that implements it. Yeah, whole "developer communities" are founded on this premise.
To make a long story short: binary (and consequently RTTI) compatibility went the way of the dinosaur once the open source movement took off and, concurrently, C++ template libraries became the norm. C++ libraries long ago became "source distributable only", like Perl, Python, JavaScript, etc. To make your C++ compiler work with all the source you compile with it, make sure that RTTI is turned on (indeed, all standard C++ features, like exceptions), and that all the C++ libs you link are compiled with the same options you used to compile your app.
There is one (and only one) compiler I know of that does not enable RTTI by default, and that is because there are other legacy ways to do the same thing. To read about these, pick up a copy of Don Box's excellent work "Essential COM."
RTTI is not needed for virtual functions.
It is mainly used for dynamic_cast and typeid.
The only part of RTTI that is unportable is the format of strings returned from type_info::name().
Even this has a fighting chance so long as you can find a c++filt tool for your compiler that converts (demangles) such a string back into a compliant C++ type.
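To make those two points concrete, here is a small sketch: dynamic_cast and typeid behave the same on any conforming compiler, and only the exact text returned by type_info::name() (and whether it needs a demangler) is implementation-defined; the class names are made up for illustration:

// RTTI in portable use: dynamic_cast and typeid behave the same on every
// conforming compiler; only the text returned by type_info::name() is
// implementation-defined (GCC/Clang return a mangled name, MSVC a readable one).
#include <iostream>
#include <typeinfo>

struct Animal { virtual ~Animal() = default; };
struct Dog : Animal { void bark() const { std::cout << "woof\n"; } };
struct Cat : Animal {};

void inspect(const Animal& a) {
    // typeid on a polymorphic object is resolved at runtime via RTTI.
    std::cout << "dynamic type: " << typeid(a).name() << '\n';

    // dynamic_cast checks the actual dynamic type; returns nullptr on mismatch.
    if (const Dog* d = dynamic_cast<const Dog*>(&a))
        d->bark();
}

int main() {
    Dog d;
    Cat c;
    inspect(d);   // prints the (implementation-defined) name of Dog, then "woof"
    inspect(c);   // prints the name of Cat; the cast fails, so no bark
}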
If a compiler doesn't "support" RTTI, does that mean that the compiler can not handle class hierarchies that have virtual functions in them?
Generally all modern C++ compilers support RTTI... So forget about it.
Or have I been misunderstanding the literature about how RTTI isn't portable, and the issues lie elsewhere?
RTTI today is portable and works fine on any modern compiler... However, some special cases may occur.
On ELF platforms (Linux), when you load libraries dynamically (i.e. with dlopen) and try to perform a dynamic_cast to some class shared between the library and the executable, it may fail if you do not pass the correct flags when linking the executable (-rdynamic).
In almost any other case... it just works.

Runtime optimization of static languages: JIT for C++?

Is anyone using JIT tricks to improve the runtime performance of statically compiled languages such as C++? It seems like hotspot analysis and branch prediction based on observations made during runtime could improve the performance of any code, but maybe there's some fundamental strategic reason why making such observations and implementing changes during runtime are only possible in virtual machines. I distinctly recall overhearing C++ compiler writers mutter "you can do that for programs written in C++ too" while listening to dynamic language enthusiasts talk about collecting statistics and rearranging code, but my web searches for evidence to support this memory have come up dry.
Profile guided optimization is different than runtime optimization. The optimization is still done offline, based on profiling information, but once the binary is shipped there is no ongoing optimization, so if the usage patterns of the profile-guided optimization phase don't accurately reflect real-world usage then the results will be imperfect, and the program also won't adapt to different usage patterns.
You may be interested in looking up information on HP's Dynamo. That system focused on native-binary-to-native-binary translation, but since C++ is almost exclusively compiled to native code, I suppose that's exactly what you are looking for.
You may also want to take a look at LLVM, which is a compiler framework and intermediate representation that supports JIT compilation and runtime optimization, although I'm not sure if there are actually any LLVM-based runtimes that can compile C++ and execute + runtime optimize it yet.
I did that kind of optimization quite a lot in the last few years. It was for a graphics rendering API that I implemented. Since the API defined several thousand different drawing modes, a general-purpose function was way too slow.
I ended up writing my own little JIT compiler for a domain-specific language (very close to asm, but with some high-level control structures and local variables thrown in).
The performance improvement I got was between a factor of 10 and 60 (depending on the complexity of the compiled code), so the extra work paid off big time.
On the PC I would not start writing my own JIT compiler, but would use either LIBJIT or LLVM for the JIT compilation. In my case that wasn't possible because I was working on a non-mainstream embedded processor that is not supported by LIBJIT/LLVM, so I had to invent my own.
The answer is more likely: no one did more than PGO for C++ because the benefits are likely unnoticeable.
Let me elaborate: JIT engines/runtimes have both blessings and drawbacks from their developers' point of view: they have more information at runtime but much less time to analyze it.
Some optimizations are really expensive, and you are unlikely to see them in a JIT without a huge impact on startup time: loop unrolling, auto-vectorization (which in most cases is also based on loop unrolling), instruction selection (e.g. using SSE4.1 only on CPUs that support SSE4.1), combined with instruction scheduling and reordering (to make better use of super-scalar CPUs). These kinds of optimizations combine well with C-like code (which is accessible from C++).
The only full-blown JIT architecture that does this kind of advanced compilation (as far as I know) is Java's HotSpot, along with architectures built on similar principles using tiered compilation (Azul's Java systems, the popular-at-the-time JaegerMonkey JS engine).
But one of the biggest runtime optimizations is polymorphic inline caching: if the first run of a loop sees certain types, the next time around the loop's code is specialized for the types seen in the previous run; the JIT inserts a guard and makes the inlined, specialized types the default branch, and on this specialized form an SSA-based engine applies constant folding/propagation, inlining, and dead-code-elimination optimizations, and, depending on how "advanced" the JIT is, does a more or less thorough job of CPU register assignment.
As you may notice, the JIT (hotspot-style) mostly improves branchy code, and with runtime information it can beat equivalent C++ code; but a static compiler, having time on its side to do analysis and instruction reordering, will likely get slightly better performance for simple loops. Also, the areas of C++ code that need to be fast typically are not heavily OOP, so the information that JIT optimizations rely on will not bring such an amazing improvement.
Another advantage of JITs is that they work across assemblies, so they have more information available when they want to inline.
Let me elaborate: let's say that you have a base class A and just one implementation of it, namely B, in another package/assembly/gem/etc. that is loaded dynamically.
Because the JIT can see that B is the only implementation of A, it can replace the A calls in its internal representation with B's code, and the method calls will not do a dispatch (a vtable lookup) but will be direct calls. Those direct calls may also be inlined. For example, if B has a method getLength() which returns 2, all calls to getLength() can be reduced to the constant 2 everywhere. C++ code, by contrast, cannot skip the virtual call to B when B lives in another DLL.
Some implementations of C++ do not support optimizing across multiple .cpp files (though recent versions of GCC have the -flto flag, which makes this possible). But if you are a C++ developer concerned about speed, you will likely put all the performance-sensitive classes in the same static library or even in the same file, so the compiler can inline them nicely; the extra information that a JIT has by design is thus provided by the developer, so there is no performance loss.
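A minimal sketch of the devirtualization being described (class names are hypothetical): when the static compiler can see that B is the only implementation, for example because B is marked final and is visible in the same translation unit, it can turn the virtual call into a direct, inlinable call, which is exactly the information a JIT would otherwise gather at runtime:

// Sketch of devirtualization. A JIT that knows B is the only implementation
// of A can turn the virtual call into a direct, inlinable call; a static
// compiler can do the same when it can see the whole picture, e.g. via
// `final` in the same translation unit or via LTO.
#include <cstdio>

struct A {
    virtual ~A() = default;
    virtual int getLength() const = 0;
};

// Marking B `final` tells the static compiler there are no further
// overrides, so calls through a B reference can be devirtualized and inlined.
struct B final : A {
    int getLength() const override { return 2; }
};

int lengthViaBase(const A& a) {
    return a.getLength();     // virtual dispatch: vtable lookup in general
}

int lengthViaDerived(const B& b) {
    return b.getLength();     // B is final: compilers typically emit a direct
                              // call and can fold this to the constant 2
}

int main() {
    B b;
    std::printf("%d %d\n", lengthViaBase(b), lengthViaDerived(b));
}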
Visual Studio has an option for doing runtime profiling that can then be used to optimize code:
"Profile Guided Optimization"
Microsoft Visual Studio calls this "profile guided optimization"; you can learn more about it at MSDN. Basically, you run the program a bunch of times with a profiler attached to record its hotspots and other performance characteristics, and then you can feed the profiler's output into the compiler to get appropriate optimizations.
I believe LLVM attempts to do some of this. It attempts to optimize across the whole lifetime of the program (compile-time, link-time, and run-time).
Reasonable question - but with a doubtful premise.
As in Nils' answer, sometimes "optimization" means "low-level optimization", which is a nice subject in its own right.
However, it is based on the concept of a "hot-spot", which has nowhere near the relevance it is commonly given.
Definition: a hot-spot is a small region of code where a process's program counter spends a large percentage of its time.
If there is a hot-spot, such as a tight inner loop occupying a lot of time, it is worth trying to optimize at the low level, if it is in code that you control (i.e. not in a third-party library).
Now suppose that inner loop contains a call to a function, any function. Now the program counter is not likely to be found there, because it is more likely to be in the function. So while the code may be wasteful, it is no longer a hot-spot.
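A tiny hypothetical illustration of that point: in the sketch below the loop is where the waste is introduced, yet a PC-sampling profiler attributes nearly all the time to the callee, so the loop itself no longer registers as a hot-spot:

// Hypothetical illustration: the loop in countEntries() is where the waste is
// introduced, but a PC-sampling profiler attributes nearly all the time to
// parseConfig(), because that is where the program counter actually sits.
// By the narrow "hot-spot" definition, the wasteful loop is not a hot-spot.
#include <sstream>
#include <string>
#include <vector>

// Stand-in for some genuinely expensive call (parsing, allocation, I/O...).
std::vector<std::string> parseConfig(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line); )
        lines.push_back(line);
    return lines;
}

long countEntries(const std::string& text) {
    long n = 0;
    for (int i = 0; i < 100000; ++i)
        n += static_cast<long>(parseConfig(text).size());  // re-parses every time
    return n;                                              // could be hoisted out
}

int main() {
    std::string config = "host=localhost\nport=8080\nretries=3\n";
    return countEntries(config) > 0 ? 0 : 1;
}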
There are many common ways to make software slow, of which hot-spots are one. However, in my experience, that is the only one of which most programmers are aware, and the only one to which low-level optimization applies.
See this.