Encapsulation vs Performance - C++

Simple question:
I really like the idea of encapsulation, but I don't know whether it is worth it in a performance-critical situation.
For example:
x->var;
is faster than
x->getVar();
because of the function-call overhead. Is there any solution that is both fast AND encapsulated?

getVar() in all probability could be inlined. Even if there is a performance penalty, the benefits of encapsulation far outweigh the performance considerations.

There's no overhead if the function is inlined.
On the other hand, getters are often a code smell. And a bad one. They stick to the letter of encapsulation but violate its principles.

"There's no overhead if the getVar function is inlined"
"If getVar() is simply return var; and is inline and non-virtual the two expressions should be optimized to the same thing"
"getVar() in all possibility could be inlined"
Can Mr Rafferty assume that the code will be inlined? Not "should be" or "could be". In my opinion that's a problem with C++: it isn't especially WYSIWYG; you can't be sure what code it will generate. Sure, there are benefits to using OO, but if execution efficiency (performance) is important, C++ (or C# or Java) is not the obvious choice.
On another topic
There's a lot of talk about "premature optimization" being the root of all evil, and since few people pay attention to what the "premature" part means, a lot of programmers conclude that optimization itself is the root of all evil.
In these cases I find it helpful to bring out the original quote so everyone may see what they've been missing (not to say misunderstanding and misquoting):
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
Most people attribute the quote to Tony Hoare (father of QuickSort) and some to Donald Knuth (Art of Computer Programming).
An informative discussion as to what the quote may or may not mean may be found here: http://ubiquity.acm.org/article.cfm?id=1513451

You can write inline accessor functions.
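A minimal sketch of what that looks like (the class and names here are purely illustrative): a trivial getter defined inside the class body is implicitly inline, and any mainstream optimizer will normally compile a call to it down to the same load as reading the member directly.

class Widget {
public:
    int getVar() const { return var; }   // implicitly inline, non-virtual
    void setVar(int v) { var = v; }
private:
    int var = 0;
};

int twice(const Widget& w) {
    return w.getVar() * 2;   // typically generates the same code as w.var * 2
}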

You are right in that there often is a tradeoff between clean object oriented design and high performance. But do not make assumptions. If you go into these kinds of optimizations, you have to test every change for performance gains. Modern compilers are incredibly good at optimizing your code (like the comment from KennyTM says for your example), so do not fall into the trap of Premature Optimization.

It's important to realise that modern optimisers can do a lot for you, and to use C++ well you need to trust them. They will optimise this and give identical performance unless you deliberately code the accessors out-of-line (which has a different set of benefits: e.g. you can modify the implementation and relink without recompiling client code), or use a virtual function (but that's logically similar to a C program using a function pointer anyway, and has similar performance costs). This is a very basic issue: so many things - like iterators, operator[] on a vector etc. would be too costly if the optimiser failed to work well. All the mainstream C++ compilers are mature enough to have passed this stage many, many years ago.

As others have noted, the overhead is either negligible, or even entirely optimized away.
In any case, it is very unlikely that the bottleneck lies in this kind of function. And if you do find a performance problem with the access pattern, with direct access you are out of luck, whereas with accessor functions you can easily switch to better-performing patterns such as caching (see the sketch below).
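As a hypothetical illustration of that last point (all names here are invented): because callers go through an accessor, the class can later switch to a caching strategy without any caller changing, which direct member access would not allow.

#include <cstddef>
#include <optional>
#include <string>

class Path {
public:
    explicit Path(std::string s) : text_(std::move(s)) {}

    std::size_t getLength() const {
        if (!cachedLength_)                      // compute once, reuse afterwards
            cachedLength_ = computeLength();
        return *cachedLength_;
    }

private:
    std::size_t computeLength() const { return text_.size(); }  // stand-in for something costly
    std::string text_;
    mutable std::optional<std::size_t> cachedLength_;
};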

Related

Is it really better to have an unnecessary function call instead of using else?

So I had a discussion with a colleague today. He strongly suggested that I change some code from
if (condition) {
    function->setValue(true);
}
else {
    function->setValue(false);
}
to
function->setValue(false);
if (condition) {
    function->setValue(true);
}
in order to avoid the 'else'. I disagreed, because - while it might improve readability to some degree - in the case of the if-condition being true, we have 1 absolutely unnecessary function call.
What do you guys think?
Meh.
To do this just to avoid the else is silly (at least there should be a deeper rationale). There's typically no extra branching cost to it, especially after the optimizer goes through it.
Code compactness can sometimes be a desirable aesthetic, especially when more time is spent skimming and searching through code than reading it line by line. There can be legitimate reasons to favor terser code sometimes, but it's always pros and cons. Even then, code compactness should not be about cramming logic into fewer lines so much as keeping the logic straightforward.
Correctness here might be easier to achieve with one or the other. The point was made in a comment that you might not know the side effects associated with calling setValue(false), though I would suggest that's kind of moot. Functions should have minimal side effects, they should all be documented at the interface/usage level if they aren't totally obvious, and if we don't know exactly what they are, we should be spending more time looking up their documentation prior to calling them (and their side effects should not be changing once firm dependencies are established to them).
Given that, it may sometimes be easier to achieve correctness and maintain it with a solution that starts out initializing states to some default value, and using a form of code that opts in to overwrite it in specific branches of code. From that standpoint, what your colleague suggested may be valid as a way to avoid tripping over that code in the future. Then again, for a simple if/else pair of branches, it's hardly a big deal.
Don't worry about the cost of the extra most-likely-constant-time function call either way in this kind of knee-deep micro-level implementation case, especially with no super tight performance-critical loop around this code (and even then, still prefer to worry about that at least a little bit in hindsight after profiling).
I think there are far better things to think about than this kind of coding style, like testing procedure. Reliable code tends to need less revisiting, and has the freedom to be written in a wider variety of ways without causing disputes. Testing is what establishes reliability. The biggest disputes about coding style tend to follow teams where there's more toe-stepping and more debugging of the same bodies of code over and over and over from disparate people due to lack of reliability, modularity, excessive coupling, etc. It's a symptom of a problem but not necessarily the root cause.

What significant exceptions are there to the zero overhead principle, if any? [closed]

As a (possible) example, the LLVM coding standard forbids using standard RTTI or exceptions:
http://llvm.org/docs/CodingStandards.html#do-not-use-rtti-or-exceptions
Is this a good idea, or is that coding standard outdated or unreasonable for most programs?
Are there any other such features in C++ which significantly worsen the program's speed, memory usage or executable size even if you don't use them?
Is this still a good idea, or is that coding standard outdated?
RTTI is most definitely the most notorious violation of the zero-overhead principle, because it incurs a static cost (executable size, plus initialization code) that is proportional to the number of polymorphic classes (i.e., classes with at least one virtual function), and that cost does not depend on how much you use RTTI, if at all. But there is no way to really provide RTTI without some per-class overhead. That's why you can disable RTTI on most compilers if you don't need it at all, or if you want to replace it with an RTTI system you have more control over (as the LLVM developers did).
Nevertheless, if you have RTTI enabled and are not using it, the overhead is only in the form of code bloat (a larger executable, a larger memory footprint, a wider spread of code) and loading/unloading time, so the run-time execution overhead is almost non-existent. But in resource-deprived environments, or for small utility programs (e.g., ones invoked repeatedly from shell scripts), that static overhead can be too much to bear.
Moreover, there aren't that many practical situations where you need high-performance RTTI; most of the time you don't need it at all, and at other times you need it in a few special places that are usually not performance-critical in comparison to other things. LLVM is an exception here because writing a compiler involves dealing with abstract syntax trees and similar descriptive class hierarchies, which is hard to do without lots of down-casting, and since analysing these structures is the core operation a compiler performs, the performance of the down-casts (or other RTTI invocations) is critical. So, don't take "don't use RTTI" as a general rule; just know what overhead it entails, and whether it is acceptable for your application in terms of its costs and benefits.
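A rough sketch of the kind of lightweight replacement meant here (modelled loosely on LLVM's isa<>/dyn_cast<> idea, not its actual code, and with invented class names): each class carries a small "kind" tag set at construction, so a checked down-cast becomes a cheap integer comparison with no compiler RTTI involved.

class Shape {
public:
    enum Kind { K_Circle, K_Square };
    explicit Shape(Kind k) : kind_(k) {}
    Kind getKind() const { return kind_; }
    virtual ~Shape() = default;
private:
    const Kind kind_;
};

class Circle : public Shape {
public:
    Circle() : Shape(K_Circle) {}
    static bool classof(const Shape* s) { return s->getKind() == K_Circle; }
};

// Hypothetical helper (not a standard or LLVM API): a checked down-cast
// built only on the kind tag above.
template <typename T>
T* dyn_cast_lite(Shape* s) {
    return T::classof(s) ? static_cast<T*>(s) : nullptr;
}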
C++ exceptions are certainly next on the list of things that could have more overhead than you would be prepared to bargain for. This is a much more controversial issue, particularly when it comes to actually characterising the overall overhead of exceptions. Evaluating the overhead of exceptions empirically is very difficult because it is highly dependent on usage patterns: there are different ways to use exceptions, different levels of severity (do you use exceptions for bugs, fatal errors, exceptional conditions, or to replace every if-statement?), and different levels of care for error handling (with or without exceptions). And then, of course, different compilers can implement exceptions differently. The current implementations, so-called "zero-cost exceptions", are geared towards having zero run-time cost during normal execution, but that leaves quite a bit of static overhead and makes the throw-to-catch execution path slower. Here is a nice overview of that.
As far as exceptions being in violation of the "you only pay for what you use" principle, it is true (unless you disable them), but they are often justifiable. The basic assumption with exceptions is that you, as a programmer, intend to write robust error-handling code. If you do handle all errors appropriately, then the overhead of exceptions will pale in comparison to the error-handling code itself (catch blocks and destructors), and you will probably end up with a smaller and faster program than an equivalent C-style error-code implementation with the same amount of error handling. But if you don't intend to do much error handling (e.g., the "if anything goes wrong, just crash!" approach), then exceptions will incur significant overhead.
I'm not exactly sure why LLVM banned exceptions (if I had to be blunt, I would say it's because they aren't really serious about error handling, as far as I can tell from the code I have seen from that project). So, long story short, the guideline should be: "if you intend to seriously handle errors, use exceptions; otherwise, don't". But remember, this is a hotly debated topic.
Are there any other such features which violate "you only pay for what you use"?
You have named the two obvious ones, and unsurprisingly, they are the two main features that most compilers have options to disable (and they are often disabled when it is appropriate to do so). There are certainly other, more minor violations of the zero-overhead principle.
One notorious example is the IO-stream library (<iostream>) from the standard library. This library has often been criticized for having too much overhead for what most people need and use it for. The IO-stream library tends to pull in a lot of code, and require quite a bit of load-time initialization. And then, many of its classes like std::ostream or std::ofstream have too much run-time overhead, both for construction / destruction and for read/write performance. I think they packed a bit too many features into that library, and since most of the time, IO-streaming tasks are very simple, those features are often left unused and their overhead unjustified. But generally, I find the overhead is often acceptable, especially since most IO tasks are already very slow regardless. BTW, LLVM also bans the use of the IO-stream library, and again, that is because the target of LLVM is for writing lean and mean command-line utilities that do a lot of file IO (like a compiler or its related tools).
There might be other standard libraries that have more overhead than some might wish to have for particular situations. Library code often has to make compromises that fall somewhere that won't please everyone. I would suspect that some of the newer libraries like thread, chrono, regex, and random, provide a bit more features or robust guarantees than are necessary in many applications, and therefore, pull in some undue overhead. But then again, many applications do benefit from those features. This is the very meaning of compromise.
As for language rules that put undue overhead, there are many small issues that impose some overhead. For one, I can think of several places where the standard has to make conservative assumptions that prevent optimizations. One notable example is the inability to restrict pointer aliasing, which forces compilers to assume that any memory could be aliased by any pointer (even though, in practice, pointer aliasing is rare), limiting the opportunities for optimization. There are many similar cases where the compiler has to make the "safe" assumption, limiting optimizations. But most of these are rather small in scope and potential benefits, and they are often justified in terms of being able to guarantee correctness of the behaviour (and repeatability, robustness, portability, etc.). Also, note that in the vast majority of those cases, it doesn't really get any better in other languages, maybe marginally better in C, but that's about it. Some of those issues can also be circumvented with compiler extensions or platform-specific features, or as a last resort, with inline assembly code, that is, if you really need to optimize down to that level.
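A small sketch of the aliasing point (the functions are invented for illustration): because the two pointers might refer to the same object, the compiler must re-read *b after the store through a; the non-standard but widely supported __restrict extension (GCC/Clang/MSVC) lets you promise there is no overlap, so the value can stay in a register.

// Without any aliasing information, *b must be reloaded after the first store,
// since 'a' and 'b' could point to the same int.
void add_twice(int* a, const int* b) {
    *a += *b;
    *a += *b;   // second read of *b cannot be folded away
}

// With the __restrict extension the compiler may assume no overlap and merge
// the two reads of *b.
void add_twice_restrict(int* __restrict a, const int* __restrict b) {
    *a += *b;
    *a += *b;
}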
One example that is no longer valid is the problem of requiring the compiler to produce exception-handling (stack unwinding) code even for functions that will never throw. Now, that can be solved by specifying noexcept on the functions in question.
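A minimal sketch of that fix: marking a function noexcept documents (and enforces, via std::terminate) that it cannot throw, so the compiler does not need to prepare unwinding paths around calls to it.

int clamp_index(int i, int n) noexcept {   // promises never to throw
    if (i < 0)  return 0;
    if (i >= n) return n - 1;
    return i;
}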
But other than those microscopic issues, I can't really think of any other major source of undue overhead in C++ (aside from RTTI and exceptions). I mean, there are things in C++ that can create overhead of different kinds, but they are all "opt-in" features (per-use) like virtual functions, virtual inheritance, multiple inheritance, templates, etc... but those mostly obey the "you only pay for what you use" principle. For an example of the rules imposed for a low-overhead subset of C++, check out Embedded C++.

Efficiency of program

I want to know whether adopting an object-oriented approach to a problem, as compared to the structured programming approach, has an effect on program efficiency in any programming language, but especially in C++.
Maybe. Maybe not.
You can write efficient object-oriented code. You can write inefficient structured code.
It depends on the application, how well the code is written, and how heavily the code is optimized. In general, you should write code so that it has a good, clean, modular architecture and is well designed, then if you have problems with performance optimize the hot spots that are causing performance issues.
Use object oriented programming where it makes sense to use it and use structured programming where it makes sense to use it. You don't have to choose between one and the other: you can use both.
I remember back in the early 1990's when C++ was young there were studies done about this. If I remember correctly, the guys who took (well written) C++ programs and recoded them in C got around a 15% increase in speed. The guys who took C programs and recoded them in C++, and modified the imperative style of C to an OO style (but same algorithms) for C++ got the same or better performance. The apparent contradiction was explained by the observation that the C programs, in being translated to an object oriented style, became better organized. Things that you did in C because it was too much code and trouble to do better could more easily be done properly in C++.
Thinking back on this, I wonder about the conclusion a bit. Writing a program a second time will always result in a better program, so it didn't have to be the imperative-to-OO change of style that made the difference. Today's computer architectures are designed with hardware support for the common operations done by OO programs, and compilers have gotten better at using the instructions, so I think that whatever overhead a virtual function call had in 1992 is far smaller today.
There doesn't have to be, if you are very careful to avoid it. If you just take the most straightforward approach, using dynamic allocation, virtual functions, and (especially) passing objects by value, then yes there will be inefficiency.
It doesn't have to be. The algorithm is what matters most. I agree encapsulation may slow you down a little bit, but compilers are there to optimize.
You would say no if this were a question on a computer science paper.
However, in a real development environment this tends to be true if the OOP paradigm is used correctly. The reason is that in a real development process we generally need to maintain our code base, and that is where the OOP paradigm can help us. One strong point of OOP over structured programming like C is that OOP makes it easier to keep the code maintainable. When the code is more maintainable, there are fewer bugs, less time is spent fixing them, and less time is needed to implement new features. The bottom line is that we then have more time to focus on the efficiency of the application.
The problem is not technical, it is psychological. It is in what it encourages you to do by making it easy.
To make a mundane analogy, it is like a credit card. It is much more efficient than writing checks or using cash. If that is so, why do people get in so much trouble with credit cards? Because they are so easy to use that they abuse them. It takes great discipline not to over-use a good thing.
The way OO gets abused is by
Creating too many "layers of abstraction"
Creating too much redundant data structure
Encouraging the use of notification-style code, attempting to maintain consistency within redundant data structures.
It is better to minimize data structure, and if it must be redundant, be able to tolerate temporary inconsistency.
ADDED:
As an illustration of the kind of thing that OO encourages, here's what I see sometimes in performance tuning: Somebody sets SomeProperty = true;. That sounds innocent enough, right? Well that can ripple to objects that contain that object, often through polymorphism that's hard to trace. That can mean that some list or dictionary somewhere needs to have things added to it or removed from it. That can mean that some tree or list control needs controls added or removed or shuffled. That can mean windows are being created or destroyed. It can also mean some things need to be changed in a database, which might not be local so there's some I/O or mutex locking to be done.
It can really get crazy. But who cares? It's abstract.
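Purely as an illustration of that ripple effect (every name here is invented): an innocent-looking setter that fires notifications, each of which may touch lists, controls, windows, or a database somewhere else.

#include <functional>
#include <vector>

class Node {
public:
    void setVisible(bool v) {
        visible_ = v;
        for (auto& callback : observers_)
            callback(v);          // each observer may update containers, UI, a database...
    }
    void addObserver(std::function<void(bool)> cb) { observers_.push_back(std::move(cb)); }
private:
    bool visible_ = false;
    std::vector<std::function<void(bool)>> observers_;
};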
There could be: the OO approach tends to be closer to a decoupled approach where different modules don't go poking around inside each other. They are restricted to public interfaces, and there is always a potential cost in that. For example, calling a getter instead of just directly examining a variable; or calling a virtual function by default because the type of an object isn't sufficiently obvious for a direct call.
That said, there are several factors that diminish this as a useful observation.
A well-written structured program should have the same modularity (i.e. hiding implementations), and therefore incur the same costs of indirection. The cost of calling a function pointer in C is probably going to be very similar to the cost of calling a virtual function in C++ (a sketch of this comparison follows these points).
Modern JITs, and even the use of inline methods in C++, can remove the indirection cost.
The costs themselves are probably relatively small (typically just a few extra simple operations per function call). This will be insignificant in a program where the real work is done in tight loops.
Finally, a more modular style frees the programmer to tackle more complicated, but hopefully less complex algorithms without the peril of low level bugs.
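To make the first of those points concrete, here is a small sketch (illustrative names only) of the comparison: dispatching through a C-style function pointer and dispatching through a C++ virtual function both boil down to one indirect call.

#include <cstdio>

// C-style: behaviour selected through a function pointer.
typedef void (*LogFn)(const char*);
void log_stdout(const char* msg) { std::puts(msg); }

// C++-style: behaviour selected through a virtual function.
struct Logger {
    virtual void log(const char* msg) = 0;
    virtual ~Logger() = default;
};
struct StdoutLogger : Logger {
    void log(const char* msg) override { std::puts(msg); }
};

void use_both(LogFn fn, Logger& logger) {
    fn("via function pointer");   // indirect call
    logger.log("via vtable");     // indirect call through the vtable
}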

Is it worth writing part of code in C instead of C++ as micro-optimization?

I am wondering whether, with modern compilers and their optimizations, it is still worthwhile to write some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance when classes are copied while they could be passed by reference, or when objects are created automatically by the compiler, typically with overloaded operators and many other similar cases; but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is supported, intentionally (with divergence only in C99), in C++. Therefore all C++ compilers have to support it. In fact I think it's hard to find any dedicated C compilers anymore. For example, in GCC you'll actually end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question is then: does writing plain C code and compiling it as C++ suffer a performance penalty? The answer is, for all intents and purposes, no. There are a few tricky points about exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard pressed to find an example that actually takes a performance hit that it doesn't seem worth it to write a dedicated module.
What was said about what features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost -- in C you can also suffer this cost, but not as easily I'd say.
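A sketch of the "sloppy copy" trap mentioned above (illustrative functions only): the first signature copies the whole vector on every call, the second passes a reference and copies nothing.

#include <numeric>
#include <vector>

long sum_by_value(std::vector<int> v) {          // copies every element on each call
    return std::accumulate(v.begin(), v.end(), 0L);
}

long sum_by_ref(const std::vector<int>& v) {     // no copy at all
    return std::accumulate(v.begin(), v.end(), 0L);
}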
Virtual function calls are ever so slightly more expensive than normal functions. At the same time forced inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters from the stack that is more expensive. Worrying about function call overhead though should come quite late in the optimization process -- as it is rarely a significant problem.
Exceptions are costly at throw time (in GCC at least). But setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others) so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
measure! measure before thinking about optimizing, measure before applying optimization, measure after applying optimization, measure!
If you must run your code 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times in the next 1000 days and that second is very important) anything goes.
Yes! it is worth ...
changing languages (C++ to C; Python to COBOL; Matlab to Fortran; PHP to Lisp)
tweaking the compiler (enable/disable all the -f options)
use different libraries (even write your own)
etc
etc
What you must not forget is to measure!
pmg nailed it. Just measure instead of making global assumptions. Also think of it this way: compilers like GCC separate the front, middle, and back end, so the front ends (Fortran, C, C++, Ada, etc.) all end up in the same internal middle language, if you will, which is where most of the optimization happens. Then that generic middle representation is turned into assembler for the specific target, and target-specific optimizations occur there. So the language may or may not induce more code from the front end to the middle when the languages differ greatly, but for C/C++ I would assume it is the same or very similar. Binary size is another story: the libraries that get pulled into the binary for C only vs C++ (even if it is only C syntax) can and will vary. That doesn't necessarily affect execution performance, but it can bulk up the program file, costing storage and transfer time as well as memory if the program is loaded as a whole into RAM. Here again, just measure.
To the "measure" comment I would also add: compile to assembler and/or disassemble the output, and compare the results of your different language/compiler choices. This can and will supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic question, assuming you have measured, etc, and you have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job); and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a certain mindset that is common, trying to do everything with one tool (Java or SQL or C++). Not just Maslow's Hammer, but the actual belief that they can code a C construct in Java, etc. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL and C that will deliver performance. That produces an app that does not need to be re-visited; uneventful execution. In which case, it will not matter if or when C++ implements this construct or that.
I am wondering whether, with modern compilers and their optimizations, it is still worthwhile to write some critical code in C instead of C++ to make it faster.
no. keep it readable. if your team prefers c++ or c, prefer that - especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance when classes are copied while they could be passed by reference
then forbid copying and assigning
or when objects are created automatically by the compiler, typically with overloaded operators and many other similar cases
could you elaborate? if you are referring to templates, they don't have additional cost in runtime (although they can lead to additional exported symbols, resulting in a larger binary). in fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
in my experience, an expert c++ developer can create a faster, more maintainable program.
you have to be selective about the language features that you use (and do not use). if you break c++ features down to the set available in c (e.g., remove exceptions, virtual function calls, rtti) then you're off to a good start. if you learn to use templates, metaprogramming, optimization techniques, avoid type aliasing (which becomes increasingly difficult or verbose in c), etc. then you should be on par or faster than c - with a program which is more easily maintained (since you are familiar with c++).
if you're comfortable using the features of c++, use c++. it has plenty of features (many of which have been added with speed/cost in mind), and can be written to be as fast as c (or faster).
with templates and metaprogramming, you could turn many runtime variables into compile-time constants for exceptional gains. sometimes that goes well into micro-optimization territory.
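a small sketch of that idea (illustrative code, not from any particular project): when the length is a template parameter instead of a runtime argument, the compiler knows it at compile time and can unroll and constant-fold the loop for each size actually used.

#include <cstddef>

template <std::size_t N>
int dot(const int (&a)[N], const int (&b)[N]) {
    int sum = 0;
    for (std::size_t i = 0; i < N; ++i)   // N is a compile-time constant here
        sum += a[i] * b[i];
    return sum;
}

int example() {
    int a[4] = {1, 2, 3, 4};
    int b[4] = {5, 6, 7, 8};
    return dot(a, b);                     // instantiates dot<4>; the loop bound is baked in
}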

C++ standard library vs mortal-made code + where can I find the sources?

Two, maybe trivial questions:
1. Why can't I beat the std functions?
Really. I spent the last three days implementing something faster than std::sort, just for the sake of doing it. It is supposed to be an introsort, and I suspect it uses a single-pivot quicksort inside. Epic fail. Mine was at least twice as slow.
In my utter bitterness I even copy-pasted code from other, top-notch programmers. To no avail.
I benchmarked my other algorithms too... My binary search and my upper_bound and lower_bound versions are so stripped down they couldn't really be written with fewer instructions. Still, they are about twice as slow.
I ask, why, why, why? And this leads me to my next question...
2. Where can I find the source code of the STL library functions?
Of course, I want to look at their sources! Is it even possible to write more efficient code than that, or am I at an abstraction level, with my "simple" main.cpp, where I cannot reach the optimisations utilized by the STL?
I mean, for example... let's take maps... which are simple associative containers. The documentation says they are implemented with a red-black tree. Now... would it be worth it to try to implement my own red-black tree, or have they taken this joy :-) away from me, so I should just throw every piece of data I get my hands on into the map container?
I hope this does make sense.
If not, please forgive me.
The short answer is "if it was possible to write faster code which did the same thing, then the standard library would have done it already".
The standard library is designed by clever people, and the reason it was made part of C++ is that other clever people recognized it as being clever. And since then, 15 years have passed in which other clever people tried to take these specifications and write the absolutely most efficient code to implement it that they could.
That's a lot of cleverness you're trying to compete with. ;)
So there is no magic in the STL, they don't cheat, or use tricks unavailable to you. It is just very carefully designed to maximize performance.
The thing about C++ is that it's not a fast language as such. If you're not careful, it is easy to introduce all sorts of inefficiencies: virtual function calls, cache misses, excessive memory allocations, unnecessary copying of objects, all of this can cripple the performance of C++ code if you're not careful.
With care, you can write code that's about as efficient as the STL. It's not that special. But in general, the only way you're going to get faster code is to change the requirements. The standard library is required to be general, to work as well as possible across all use cases. If your requirement is more specific, it is sometimes possible to write specialized code that favors those specific cases. But then the tradeoff is that the code either will not work, or will be inefficient, in other cases.
A final point is that a key part of the reason why the STL is so clever, and why it was adopted into the standard, is that it is pretty much zero-overhead. The standard libraries in many languages are "fast enough", but not as fast as hand-rolled code. They have a sorting algorithm, but it's not quite as fast as if you wrote it yourself in place. It might use a few casts to and from a common "object" base class, or maybe use boxing on value types. The STL is designed so that pretty much everything can be inlined by the compiler, yielding code equivalent to what you'd get if you hand-rolled it yourself. It uses templates to specialize for the type you're using, so there's no overhead of converting to a type understood by the container or algorithm.
That's why it's hard to compete with. It is a ridiculously efficient library, and it had to be. With the mentality of your average C or C++ programmer, especially 10-15 years ago, no one would ever use a std::vector if it were 5% slower than a raw array. No one would use iterators and std algorithms if they weren't as fast as just writing the loop yourself.
So the STL pioneered a lot of clever C++ tricks in order to become just as efficient as hand-rolled C code.
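A classic illustration of that point (sketch only): C's qsort must call its comparator through a function pointer for every comparison, while std::sort is a template whose comparison can be inlined completely.

#include <algorithm>
#include <cstddef>
#include <cstdlib>

int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_c(int* data, std::size_t n) {
    std::qsort(data, n, sizeof(int), cmp_int);   // indirect call per comparison
}

void sort_cpp(int* data, std::size_t n) {
    std::sort(data, data + n);                   // operator< inlined into the sort
}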
They are probably optimized to a great extent. Such implementations consider memory page faults, cache misses etc.
Getting the source of those implementations depends on the compiler they are shipped with. I think most compilers (even Microsoft) will allow you to see them.
I think the most important things to know are the architecture you are compiling to and the operating system (if any) your program will be running on. Understanding these things will allow you to precisely target the hardware.
There are also countless optimization techniques. This is a good summary. Also, global optimization is a whole science, so there are certainly many things to learn.
There are some clever things on this site, too. Good luck!
Looking at the disassembled version of your code versus theirs and comparing them may give you some insight into why theirs is faster than yours.
It seems like a fool's errand to reimplement from scratch standard library functionality for the sake of making faster versions. You'd be far better served trying to modify their version to achieve your goals, although even then you really need to understand the underlying platform to be able to judge the value of the changes that you're making.
I would guess that were you to post your sort routine, it would be torn apart in minutes, and you would gain an understanding of why your version is so substantially slower than the standard library version.
Most IDEs have a command to open a named header file using the compiler's search paths. I use this pretty often and tend to keep the code for algorithm open.
For me, the code you're looking for is in
/usr/include/c++/4.2.1/bits/stl_algo.h
/usr/include/c++/4.2.1/bits/stl_tree.h
Note that a lot of people have done their theses on sorting and tree-balancing (fields I would think are picked to the bone, I would not attempt research there), and many of them are probably more determined than you to make GCC's standard library faster.
That said, there's always the possibility of exploiting patterns specific to your code (some subranges are already sorted, frequently used specific small sequence sizes, etc).
Two answers, probably generally-applicable:
You will probably not be able to implement more efficient versions of algorithms that many other smart people have spent much more time optimizing. By virtue of time and testing alone, the STD algorithms will be pretty good.
For identical algorithms, optimization is something which is "very hard" with all the current hardware and configuration variations. To give an example, the primary factor for algorithm performance on a particular platform might be which levels of cache its most frequently used routines can be stored in, which is not generally something you can optimize by hand. Hence, the compiler is generally much more of a factor for actual algorithm performance than any particular code you can write.
But yeah... if you're really serious about optimization, get down into the assembly and compare. My advice, though, would be to focus on other things, unless it's your job to optimize your implementation or something. Just my 2c.